Update readme.

Daniel 2020-04-04 10:50:28 +02:00
parent e16b72ff28
commit a291e23041

@@ -6,16 +6,16 @@ You can download the librispeech corpus with the following commands:
.. code-block:: bash
wget http://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz -O librispeech.txt.gz
gunzip librispeech.txt.gz
| Then use the `generate_lm.py` script to generate `lm.binary` and `vocab-500000.txt`.
| As input you can use a `file.txt` or `file.txt.gz` with one sentence in each line.
| If you are not using the DeepSpeech Docker container, you have to build `KenLM <https://github.com/kpu/kenlm>`_ first
and then pass the path to its binaries to the script with `--kenlm_bins /DeepSpeech/native_client/kenlm/build/bin/`.
.. code-block:: bash
python3 data/lm/generate_lm.py --input_txt path/to/vocab_sentences.txt --output_dir path/lm/
python3 data/lm/generate_lm.py --input_txt path/to/librispeech.txt.gz --output_dir path/lm/
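Before running the script it can help to confirm that the input really has one sentence per line, whether it is plain text or gzipped. The following is only a minimal sketch: ``count_sentences`` is a hypothetical helper, not part of the repository, and it assumes that ``gzip``, ``cat`` and ``grep`` are available.

```shell
# Hypothetical helper (not part of the repository): print the number of
# non-empty lines, i.e. sentences, in a plain or gzipped text file.
count_sentences() {
    case "$1" in
        # Gzipped input: decompress to stdout and leave the file untouched.
        *.gz) gzip -cd -- "$1" ;;
        # Plain text input: pass it through unchanged.
        *)    cat -- "$1" ;;
    esac | grep -c -v '^[[:space:]]*$'
}
```

For example, ``count_sentences path/to/librispeech.txt.gz`` prints the number of sentences that would be passed to the script via ``--input_txt``.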
Afterwards you can generate the scorer package with the above vocab-500000.txt and lm.binary files