Default to required params.

This commit is contained in:
Daniel 2020-04-08 20:23:04 +02:00
parent a291e23041
commit c29c0beb72
2 changed files with 14 additions and 11 deletions

View File

@ -11,15 +11,18 @@ You can download the librispeech corpus with the following commands:
| Then use the `generate_lm.py` script to generate `lm.binary` and `vocab-500000.txt`.
| As input you can use a `file.txt` or `file.txt.gz` with one sentence in each line.
| If you are not using the DeepSpeech docker container, you have to build `KenLM <https://github.com/kpu/kenlm>`_ first
and then pass the build path to the script `--kenlm_bins /DeepSpeech/native_client/kenlm/build/bin/`.
and then pass the build directory to the script.
.. code-block:: bash
python3 data/lm/generate_lm.py --input_txt path/to/librispeech.txt.gz --output_dir path/lm/
python3 data/lm/generate_lm.py --input_txt path/to/librispeech.txt.gz --output_dir path/lm/ --top_k 500000 \
--kenlm_bins /DeepSpeech/native_client/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" \
--arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie
Afterwards you can generate the scorer package with the above vocab-500000.txt and lm.binary files
.. code-block:: bash
python generate_package.py --alphabet ../alphabet.txt --lm lm.binary --vocab vocab-500000.txt --package kenlm.scorer --default_alpha 0.75 --default_beta 1.85
python3 generate_package.py --alphabet ../alphabet.txt --lm lm.binary --vocab vocab-500000.txt \
--package kenlm.scorer --default_alpha 0.75 --default_beta 1.85

View File

@ -144,49 +144,49 @@ def main():
"--top_k",
help="Use top_k most frequent words for the vocab.txt file. These will be used to filter the ARPA file.",
type=int,
default=500000,
required=True,
)
parser.add_argument(
"--kenlm_bins",
help="File path to the KENLM binaries lmplz, filter and build_binary",
type=str,
default="/DeepSpeech/native_client/kenlm/build/bin/",
required=True,
)
parser.add_argument(
"--arpa_order",
help="Order of k-grams in ARPA-file generation",
type=int,
default=5,
required=True,
)
parser.add_argument(
"--max_arpa_memory",
help="Maximum allowed memory usage for ARPA-file generation",
type=str,
default="75%",
required=True,
)
parser.add_argument(
"--arpa_prune",
help="ARPA pruning parameters. Separate values with '|'",
type=str,
default="0|0|1",
required=True,
)
parser.add_argument(
"--binary_a_bits",
help="Build binary quantization value a in bits",
type=int,
default=255,
required=True,
)
parser.add_argument(
"--binary_q_bits",
help="Build binary quantization value q in bits",
type=int,
default=8,
required=True,
)
parser.add_argument(
"--binary_type",
help="Build binary data structure type",
type=str,
default="trie",
required=True,
)
args = parser.parse_args()