Merge branch 'reference-training-decoder-docs' (Fixes #3140)
This commit is contained in: commit 48cd53e474
@@ -20,7 +20,7 @@ pip3:
 	$(PIP_INSTALL) -r ../taskcluster/docs-requirements.txt

 submodule:
-	git submodule update --init --remote
+	git submodule update --init --remote -- ../doc/examples

 # Add submodule update dependency to Sphinx's "html" target
 html: Makefile submodule pip3
@@ -1,3 +1,5 @@
+.. _training-docs:
+
 Training Your Own Model
 =======================
@@ -232,6 +234,8 @@ If your own data uses the *exact* same alphabet as the English release model (i

 N.B. - If you have access to a pre-trained model which uses UTF-8 bytes at the output layer you can always fine-tune, because any alphabet should be encodable as UTF-8.

+.. _training-fine-tuning:
+
 Fine-Tuning (same alphabet)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
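The UTF-8 note above can be checked with plain Python (no DeepSpeech dependency): every string, whatever its script, encodes losslessly to byte values 0-255, which is why a bytes-level output layer needs no per-language alphabet file. A minimal sketch:

```python
# Any script -- Latin, Cyrillic, CJK -- encodes to a sequence of
# byte values in 0..255, so a model whose output layer predicts raw
# UTF-8 bytes can represent any alphabet with the same 256 labels.
samples = ["hello", "straße", "здравствуйте", "你好"]
for text in samples:
    encoded = text.encode("utf-8")
    # Each output symbol is one of only 256 byte values...
    assert all(0 <= b <= 255 for b in encoded)
    # ...and the mapping is lossless: decoding recovers the text.
    assert encoded.decode("utf-8") == text
```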
@@ -52,7 +52,16 @@ There are several pre-trained model files available in official releases. Files
 | deepspeech-tflite  | ❌                  | ✅                  |
 +--------------------+---------------------+---------------------+

-Finally, the pre-trained model files also include files ending in ``.scorer``. These are external scorers (language models) that are used at inference time in conjunction with an acoustic model (``.pbmm`` or ``.tflite`` file) to produce transcriptions. We also provide further documentation on :ref:`the decoding process <decoder-docs>` and :ref:`how language models are generated <scorer-scripts>`.
+Finally, the pre-trained model files also include files ending in ``.scorer``. These are external scorers (language models) that are used at inference time in conjunction with an acoustic model (``.pbmm`` or ``.tflite`` file) to produce transcriptions. We also provide further documentation on :ref:`the decoding process <decoder-docs>` and :ref:`how scorers are generated <scorer-scripts>`.
+
+Important considerations on model inputs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The release notes include detailed information on how the released models were trained/constructed. Important considerations for users include the characteristics of the training data used and whether they match your intended use case. For acoustic models, an important characteristic is the demographic distribution of speakers. For external scorers, the texts should be similar to those of the expected use case. If the data used for training the models does not align with your intended use case, it may be necessary to adapt or train new models in order to get good accuracy in your transcription results.
+
+The process for training an acoustic model is described in :ref:`training-docs`. In particular, fine-tuning a release model using your own data can be a good way to leverage relatively small amounts of data that would not be sufficient for training a new model from scratch. See the :ref:`fine-tuning and transfer learning sections <training-fine-tuning>` for more information. :ref:`Data augmentation <training-data-augmentation>` can also be a good way to increase the value of smaller training sets.
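The value of augmentation for small training sets can be pictured with a toy sketch. This is not DeepSpeech's augmentation pipeline (those options are covered under the data-augmentation docs); it is a plain-Python illustration, with hypothetical names, of how perturbed copies of a clip (here, additive noise on raw samples) multiply the number of distinct training examples:

```python
import random

def augment(samples, copies=3, noise=0.01, seed=0):
    """Return the original clip plus `copies` noisy variants.

    `samples` is a list of floats in [-1.0, 1.0] standing in for a
    mono audio clip; real pipelines also perturb tempo, pitch,
    reverb, and spectrogram features.
    """
    rng = random.Random(seed)
    variants = [list(samples)]
    for _ in range(copies):
        variants.append([s + rng.uniform(-noise, noise) for s in samples])
    return variants

clip = [0.0, 0.25, -0.5, 0.125]
augmented = augment(clip)
# One small clip now yields four training examples, the first
# being the unmodified original.
assert len(augmented) == 4
assert augmented[0] == clip
```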
+
+Creating your own external scorer from text data is another way that you can adapt the model to your specific needs. The process and tools used to generate an external scorer package are described in :ref:`scorer-scripts` and an overview of how the external scorer is used by DeepSpeech to perform inference is available in :ref:`decoder-docs`. Generating a smaller scorer from a single-purpose text dataset is a quick process and can bring significant accuracy improvements, especially for more constrained, limited-vocabulary applications.
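Why a small domain-specific scorer helps can be pictured with a toy rescoring loop. This is not the CTC beam-search decoder or the KenLM-based scorer package described in the decoder docs; it is an illustrative sketch (all names hypothetical) in which a word-level "scorer" built from domain text breaks ties between acoustically similar candidate transcripts:

```python
from collections import Counter

def build_scorer(corpus):
    # Unigram counts over a domain text corpus stand in for the
    # real language model inside a .scorer package.
    counts = Counter(corpus.split())
    total = sum(counts.values())

    def score(transcript):
        # Product of unigram probabilities, with a small floor for
        # unseen words so no candidate scores exactly zero.
        p = 1.0
        for word in transcript.split():
            p *= counts.get(word, 0.1) / total
        return p

    return score

score = build_scorer("turn the lights on turn the heating on")
# The acoustic model alone finds both candidates plausible; the
# external scorer prefers the one matching the domain vocabulary.
candidates = ["turn the lights on", "turn the lice on"]
best = max(candidates, key=score)
assert best == "turn the lights on"
```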

 Model compatibility
 ^^^^^^^^^^^^^^^^^^^