Update readme.

2020-04-04 10:50:28 +02:00 · 2020-04-04 10:50:28 +02:00 · a291e23041
commit a291e23041
parent e16b72ff28
1 changed file with 2 additions and 2 deletions
--- a/data/lm/README.rst
+++ b/data/lm/README.rst
@@ -6,16 +6,16 @@ You can download the librispeech corpus with the following commands:
 .. code-block:: bash

    wget http://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz -O librispeech.txt.gz
-    gunzip librispeech.txt.gz


 | Then use the `generate_lm.py` script to generate `lm.binary` and `vocab-500000.txt`.
+| As input you can use a `file.txt` or `file.txt.gz` with one sentence in each line.
 | If you are not using the DeepSpeech docker container, you have to build `KenLM <https://github.com/kpu/kenlm>`_ first
  and then pass the build path to the script `--kenlm_bins /DeepSpeech/native_client/kenlm/build/bin/`.

 .. code-block:: bash

-    python3 data/lm/generate_lm.py --input_txt path/to/vocab_sentences.txt --output_dir path/lm/
+    python3 data/lm/generate_lm.py --input_txt path/to/librispeech.txt.gz --output_dir path/lm/


 Afterwards you can generate the scorer package with the above vocab-500000.txt and lm.binary files
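
With this change the ``--input_txt`` file can be either plain text or gzip-compressed, which is why the separate ``gunzip`` step was dropped. The Python sketch below shows one way to read both kinds of input and build a frequency-ranked vocabulary. It is a minimal illustration, not the code in ``generate_lm.py``. ``open_sentences`` and ``top_k_vocab`` are hypothetical helpers, UTF-8 input with one sentence per line is assumed, and the lowercasing and the ``top_k=500000`` default are assumptions inferred from the ``vocab-500000.txt`` file name.

```python
import gzip
from collections import Counter
from typing import Iterator, List


def open_sentences(path: str) -> Iterator[str]:
    """Yield sentences from a plain-text or gzip-compressed file (hypothetical helper)."""
    # Pick the opener from the extension: gzip.open for *.gz, the built-in open otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    # "rt" selects text mode for both gzip.open and open, so lines come back as str.
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            # Assumes one sentence per line; drop the trailing newline.
            yield line.rstrip("\n")


def top_k_vocab(path: str, top_k: int = 500000) -> List[str]:
    """Return the top_k most frequent words (lowercased, by assumption) in the corpus."""
    counter = Counter()
    for sentence in open_sentences(path):
        counter.update(sentence.lower().split())
    # most_common sorts by descending count; ties keep first-seen order.
    return [word for word, _ in counter.most_common(top_k)]
```

Because the opener is chosen per file, ``top_k_vocab("librispeech.txt.gz")`` and ``top_k_vocab("librispeech.txt")`` return the same vocabulary for the same corpus, so the compressed download can be used directly.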