Merge pull request #3192 from mozilla/remove-scorer
Remove external scorer file and documentation and flag references
commit c1fd93ac8d
@@ -19,7 +19,6 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 g++ \
 gcc \
 git \
-git-lfs \
 libbz2-dev \
 libboost-all-dev \
 libgsm1-dev \
@@ -13,7 +13,6 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 cmake \
 curl \
 git \
-git-lfs \
 libboost-all-dev \
 libbz2-dev \
 locales \
@@ -32,7 +31,6 @@ RUN apt-get install -y --no-install-recommends libopus0 libsndfile1
 RUN rm -rf /var/lib/apt/lists/*

 WORKDIR /
-RUN git lfs install
 RUN git clone $DEEPSPEECH_REPO

 WORKDIR /DeepSpeech
@@ -5,7 +5,7 @@ This directory contains language-specific data files. Most importantly, you will

 1. A list of unique characters for the target language (e.g. English) in ``data/alphabet.txt``. After installing the training code, you can check ``python -m deepspeech_training.util.check_characters --help`` for a tool that creates an alphabet file from a list of training CSV files.

-2. A scorer package (``data/lm/kenlm.scorer``) generated with ``generate_scorer_package`` (``native_client/generate_scorer_package.cpp``). The scorer package includes a binary n-gram language model generated with ``data/lm/generate_lm.py``.
+2. A script used to generate a binary n-gram language model: ``data/lm/generate_lm.py``.

 For more information on how to build these resources from scratch, see the ``External scorer scripts`` section on `deepspeech.readthedocs.io <https://deepspeech.readthedocs.io/>`_.
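The alphabet-building step mentioned in the hunk above (the ``check_characters`` tool) can be sketched roughly as follows. This is a simplified illustration, not the actual ``deepspeech_training.util.check_characters`` implementation, and the ``transcript`` CSV column name is assumed from the DeepSpeech training CSV layout:

```python
import csv
import io

def collect_alphabet(csv_files):
    """Collect the sorted set of unique characters across all transcripts.

    Simplified sketch of what an alphabet-building tool might do; the
    real check_characters utility differs in options and output format.
    """
    chars = set()
    for f in csv_files:
        for row in csv.DictReader(f):
            chars.update(row["transcript"])  # 'transcript' column assumed
    return sorted(chars)

# An in-memory CSV standing in for a training file
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\na.wav,100,hello\nb.wav,120,world\n"
)
print("".join(collect_alphabet([sample])))  # -> dehlorw
```

Each character that appears in any transcript would then become one line of ``data/alphabet.txt``.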
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:d0cf926ab9cab54a8a7d70003b931b2d62ebd9105ed392d1ec9c840029867799
-size 953363776
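The deleted file above is not the scorer itself but a Git LFS pointer to it: three ``key value`` lines recording the spec version, the SHA-256 object id, and the size in bytes (here roughly 953 MB, which is why the file lived in LFS). A minimal sketch of parsing that pointer format:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d0cf926ab9cab54a8a7d70003b931b2d62ebd9105ed392d1ec9c840029867799
size 953363776
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # -> 953363776
```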
@@ -282,8 +282,9 @@ Please push DeepSpeech data to ``/sdcard/deepspeech/``\ , including:


 * ``output_graph.tflite`` which is the TF Lite model
-* ``kenlm.scorer``, if you want to use the scorer; please be aware that too big
-  scorer will make the device run out of memory
+* External scorer file (available from one of our releases), if you want to use
+  the scorer; please be aware that too big scorer will make the device run out
+  of memory

 Then, push binaries from ``native_client.tar.xz`` to ``/data/local/tmp/ds``\ :

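The push step described in the hunk above boils down to one ``adb push`` per file. A small sketch that only builds the command lines (file names and the ``adb`` invocation pattern come from the documentation above; nothing is executed):

```python
def adb_push_commands(files, device_dir="/sdcard/deepspeech/"):
    """Build `adb push <file> <device_dir>` argument lists for each file.

    Sketch only: constructs the commands without running adb, so it can
    be inspected or fed to subprocess.run by the caller.
    """
    return [["adb", "push", f, device_dir] for f in files]

cmds = adb_push_commands(["output_graph.tflite", "kenlm.scorer"])
for cmd in cmds:
    print(" ".join(cmd))
```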
@@ -6,15 +6,13 @@ Training Your Own Model
 Prerequisites for training a model
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-
 * `Python 3.6 <https://www.python.org/>`_
-* `Git Large File Storage <https://git-lfs.github.com/>`_
 * Mac or Linux environment

 Getting the training code
 ^^^^^^^^^^^^^^^^^^^^^^^^^

-Install `Git Large File Storage <https://git-lfs.github.com/>`_ either manually or through a package-manager if available on your system. Then clone the DeepSpeech repository normally:
+Clone the DeepSpeech repository:

 .. code-block:: bash

@@ -31,7 +31,6 @@ Prerequisites
 * Windows 10
 * `Windows 10 SDK <https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk>`_
 * `Visual Studio 2019 Community <https://visualstudio.microsoft.com/vs/community/>`_
-* `Git Large File Storage <https://git-lfs.github.com/>`_
 * `TensorFlow Windows pre-requisites <https://www.tensorflow.org/install/source_windows>`_

 Inside the Visual Studio Installer enable ``MS Build Tools`` and ``VC++ 2019 v16.00 (v160) toolset for desktop``.
@@ -157,7 +157,7 @@ def create_flags():

    f.DEFINE_boolean('utf8', False, 'enable UTF-8 mode. When this is used the model outputs UTF-8 sequences directly rather than using an alphabet mapping.')
    f.DEFINE_string('alphabet_config_path', 'data/alphabet.txt', 'path to the configuration file specifying the alphabet used by the network. See the comment in data/alphabet.txt for a description of the format.')
-   f.DEFINE_string('scorer_path', 'data/lm/kenlm.scorer', 'path to the external scorer file.')
+   f.DEFINE_string('scorer_path', '', 'path to the external scorer file.')
    f.DEFINE_alias('scorer', 'scorer_path')
    f.DEFINE_integer('beam_width', 1024, 'beam width used in the CTC decoder when building candidate transcriptions')
    f.DEFINE_float('lm_alpha', 0.931289039105002, 'the alpha hyperparameter of the CTC decoder. Language Model weight.')
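The flag change above swaps the ``scorer_path`` default from ``data/lm/kenlm.scorer`` to the empty string, so a scorer is now used only when the caller passes one explicitly. A minimal stand-in for that logic (not the real DeepSpeech flags module, which uses absl-style flag objects):

```python
# Hypothetical stand-in mirroring the new flag default: scorer_path
# defaults to '' and decoding proceeds without a scorer unless one
# is supplied explicitly.
DEFAULTS = {"scorer_path": "", "beam_width": 1024}

def resolve_scorer(flag_values):
    """Return the scorer path to load, or None to decode without one."""
    scorer_path = flag_values.get("scorer_path", DEFAULTS["scorer_path"])
    return scorer_path if scorer_path else None

print(resolve_scorer({}))  # -> None (no scorer by default after this change)
print(resolve_scorer({"scorer_path": "data/lm/kenlm.scorer"}))  # -> data/lm/kenlm.scorer
```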