From daf28086e58ffaa4e99b0b9343bb353ada3da036 Mon Sep 17 00:00:00 2001
From: Reuben Morais
Date: Tue, 7 Jul 2020 19:59:41 +0200
Subject: [PATCH] Add note on model input data considerations and reference training/scorer docs

---
 doc/TRAINING.rst | 4 ++++
 doc/USING.rst    | 9 +++++++++
 2 files changed, 13 insertions(+)

diff --git a/doc/TRAINING.rst b/doc/TRAINING.rst
index f7af448e..f42cf819 100644
--- a/doc/TRAINING.rst
+++ b/doc/TRAINING.rst
@@ -1,3 +1,5 @@
+.. _training-docs:
+
 Training Your Own Model
 =======================
 
@@ -232,6 +234,8 @@ If your own data uses the *extact* same alphabet as the English release model (i
 
 N.B. - If you have access to a pre-trained model which uses UTF-8 bytes at the output layer you can always fine-tune, because any alphabet should be encodable as UTF-8.
 
+.. _training-fine-tuning:
+
 Fine-Tuning (same alphabet)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/doc/USING.rst b/doc/USING.rst
index fe650a07..f1ae2c74 100644
--- a/doc/USING.rst
+++ b/doc/USING.rst
@@ -54,6 +54,15 @@ There are several pre-trained model files available in official releases. Files
 
 Finally, the pre-trained model files also include files ending in ``.scorer``. These are external scorers (language models) that are used at inference time in conjunction with an acoustic model (``.pbmm`` or ``.tflite`` file) to produce transcriptions. We also provide further documentation on :ref:`the decoding process <decoder-docs>` and :ref:`how language models are generated <scorer-scripts>`.
 
+Important considerations on model inputs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The release notes include detailed information on how the released models were trained/constructed. Important considerations for users include the characteristics of the training data used and whether they match your intended use case. For acoustic models, an important characteristic is the demographic distribution of speakers. For language models, it is the sources of text used in their construction. If the data used for training the models does not align with your intended use case, it may be necessary to adapt or train new models in order to get good accuracy in your transcription results.
+
+The process for training an acoustic model is described in :ref:`training-docs`. In particular, fine-tuning a release model using your own data can be a good way to leverage relatively small amounts of data that would not be sufficient for training a new model from scratch. See the :ref:`fine-tuning and transfer learning sections <training-fine-tuning>` for more information. :ref:`Data augmentation ` can also be a good way to increase the value of smaller training sets.
+
+Creating your own external scorer from text data is another way that you can adapt the model to your specific needs. The process and tools used to generate an external scorer package are described in :ref:`scorer-scripts` and an overview of how the external scorer is used by DeepSpeech to perform inference is available in :ref:`decoder-docs`. Generating a smaller scorer from a single-purpose text dataset is a quick process and can bring significant accuracy improvements, especially for more constrained, limited-vocabulary applications.
+
 Model compatibility
 ^^^^^^^^^^^^^^^^^^^
 