Default to required params.
parent a291e23041
commit c29c0beb72
@@ -11,15 +11,18 @@ You can download the librispeech corpus with the following commands:
 Then use the `generate_lm.py` script to generate `lm.binary` and `vocab-500000.txt`.
 As input you can use a `file.txt` or `file.txt.gz` with one sentence in each line.
 If you are not using the DeepSpeech docker container, you have to build `KenLM <https://github.com/kpu/kenlm>`_ first
-and then pass the build path to the script `--kenlm_bins /DeepSpeech/native_client/kenlm/build/bin/`.
+and then pass the build directory to the script.

 .. code-block:: bash

-    python3 data/lm/generate_lm.py --input_txt path/to/librispeech.txt.gz --output_dir path/lm/
+    python3 data/lm/generate_lm.py --input_txt path/to/librispeech.txt.gz --output_dir path/lm/ --top_k 500000 \
+      --kenlm_bins /DeepSpeech/native_client/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" \
+      --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie


 Afterwards you can generate the scorer package with the above vocab-500000.txt and lm.binary files

 .. code-block:: bash

-    python generate_package.py --alphabet ../alphabet.txt --lm lm.binary --vocab vocab-500000.txt --package kenlm.scorer --default_alpha 0.75 --default_beta 1.85
+    python3 generate_package.py --alphabet ../alphabet.txt --lm lm.binary --vocab vocab-500000.txt \
+      --package kenlm.scorer --default_alpha 0.75 --default_beta 1.85
@@ -144,49 +144,49 @@ def main():
         "--top_k",
         help="Use top_k most frequent words for the vocab.txt file. These will be used to filter the ARPA file.",
         type=int,
-        default=500000,
+        required=True,
     )
     parser.add_argument(
         "--kenlm_bins",
         help="File path to the KENLM binaries lmplz, filter and build_binary",
         type=str,
-        default="/DeepSpeech/native_client/kenlm/build/bin/",
+        required=True,
     )
     parser.add_argument(
         "--arpa_order",
         help="Order of k-grams in ARPA-file generation",
         type=int,
-        default=5,
+        required=True,
     )
     parser.add_argument(
         "--max_arpa_memory",
         help="Maximum allowed memory usage for ARPA-file generation",
         type=str,
-        default="75%",
+        required=True,
     )
     parser.add_argument(
         "--arpa_prune",
         help="ARPA pruning parameters. Separate values with '|'",
         type=str,
-        default="0|0|1",
+        required=True,
     )
     parser.add_argument(
         "--binary_a_bits",
         help="Build binary quantization value a in bits",
         type=int,
-        default=255,
+        required=True,
     )
     parser.add_argument(
         "--binary_q_bits",
         help="Build binary quantization value q in bits",
         type=int,
-        default=8,
+        required=True,
     )
     parser.add_argument(
         "--binary_type",
         help="Build binary data structure type",
         type=str,
-        default="trie",
+        required=True,
     )
     args = parser.parse_args()

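Because each `default=` in this hunk gives way to `required=True`, a caller that omits any of these options no longer gets a silent fallback value; `argparse` reports an error and exits instead. The following is a minimal sketch of that behaviour, assuming a reduced three-option parser. The helper names `build_parser` and `try_parse` are hypothetical and are not part of `generate_lm.py`.

```python
import argparse


def build_parser():
    # Illustrative subset of the options touched by this commit (assumption:
    # the real script defines more options than the three shown here).
    parser = argparse.ArgumentParser(description="Sketch of required options")
    parser.add_argument("--top_k", type=int, required=True)
    parser.add_argument("--arpa_order", type=int, required=True)
    parser.add_argument("--binary_type", type=str, required=True)
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints a usage message to stderr and exits with status 2
        # when a required option is missing.
        return None


# All required options supplied: parsing succeeds.
complete = try_parse(["--top_k", "500000", "--arpa_order", "5", "--binary_type", "trie"])
# Two required options omitted: parsing is rejected, no default is applied.
missing = try_parse(["--top_k", "500000"])

print(complete.top_k, complete.arpa_order, complete.binary_type)
print(missing)
```

This is why the documentation hunk in the same commit spells out every option on the `generate_lm.py` command line: an invocation that relied on the old defaults would now fail at argument parsing.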