From a584c8e6b6a4f231d6ce37be61c23c9802417f10 Mon Sep 17 00:00:00 2001
From: Reuben Morais
Date: Mon, 27 Apr 2020 18:49:02 +0200
Subject: [PATCH 1/4] Docs centered on ReadTheDocs instead of GitHub

---
 README.rst                         | 80 ++----------------------
 SUPPORT.rst                        | 12 ++---
 doc/Decoder.rst                    |  2 +-
 doc/NodeJS-Examples.rst            |  2 +
 doc/Python-Examples.rst            |  4 +-
 doc/TRAINING.rst                   |  6 +--
 doc/USING.rst                      | 31 ++++++++----
 doc/index.rst                      | 44 ++++++++++++++++
 native_client/javascript/README.md |  2 +-
 native_client/python/README.rst    |  2 +-
 10 files changed, 84 insertions(+), 101 deletions(-)

diff --git a/README.rst b/README.rst
index 3e9320e5..17b849fa 100644
--- a/README.rst
+++ b/README.rst
@@ -14,82 +14,10 @@ Project DeepSpeech
 DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on `Baidu's Deep Speech research paper `_. Project DeepSpeech uses Google's `TensorFlow `_ to make the implementation easier.
 
-**NOTE:** This documentation applies to the **master version** of DeepSpeech only. **Documentation for all versions** is published on `deepspeech.readthedocs.io `_.
+Documentation for installation, usage, and training models is available on `deepspeech.readthedocs.io `_.
 
-To install and use DeepSpeech all you have to do is:
+For the latest release, including pre-trained models and checkpoints, `see the latest release on GitHub `_.
 
-.. code-block:: bash
+For contribution guidelines, see `CONTRIBUTING.rst `_.
-
-   # Create and activate a virtualenv
-   virtualenv -p python3 $HOME/tmp/deepspeech-venv/
-   source $HOME/tmp/deepspeech-venv/bin/activate
-
-   # Install DeepSpeech
-   pip3 install deepspeech
-
-   # Download pre-trained English model files
-   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.pbmm
-   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.scorer
-
-   # Download example audio files
-   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/audio-0.7.0.tar.gz
-   tar xvf audio-0.7.0.tar.gz
-
-   # Transcribe an audio file
-   deepspeech --model deepspeech-0.7.0-models.pbmm --scorer deepspeech-0.7.0-models.scorer --audio audio/2830-3980-0043.wav
-
-A pre-trained English model is available for use and can be downloaded using `the instructions below `_. A package with some example audio files is available for download in our `release notes `_.
-
-Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the `release notes `_ to find which GPUs are supported. To run ``deepspeech`` on a GPU, install the GPU specific package:
-
-.. code-block:: bash
-
-   # Create and activate a virtualenv
-   virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
-   source $HOME/tmp/deepspeech-gpu-venv/bin/activate
-
-   # Install DeepSpeech CUDA enabled package
-   pip3 install deepspeech-gpu
-
-   # Transcribe an audio file.
-   deepspeech --model deepspeech-0.7.0-models.pbmm --scorer deepspeech-0.7.0-models.scorer --audio audio/2830-3980-0043.wav
-
-Please ensure you have the required `CUDA dependencies `_.
-
-See the output of ``deepspeech -h`` for more information on the use of ``deepspeech``. (If you experience problems running ``deepspeech``\ , please check `required runtime dependencies `_\ ).
- ----- - -**Table of Contents** - -* `Using a Pre-trained Model `_ - - * `CUDA dependency `_ - * `Getting the pre-trained model `_ - * `Model compatibility `_ - * `Using the Python package `_ - * `Using the Node.JS package `_ - * `Using the Command Line client `_ - * `Installing bindings from source `_ - * `Third party bindings `_ - - -* `Trying out DeepSpeech with examples `_ - -* `Training your own Model `_ - - * `Prerequisites for training a model `_ - * `Getting the training code `_ - * `Installing Python dependencies `_ - * `Recommendations `_ - * `Common Voice training data `_ - * `Training a model `_ - * `Checkpointing `_ - * `Exporting a model for inference `_ - * `Exporting a model for TFLite `_ - * `Making a mmap-able model for inference `_ - * `Continuing training from a release model `_ - * `Training with Augmentation `_ - -* `Contribution guidelines `_ -* `Contact/Getting Help `_ +For contact and support information, see `SUPPORT.rst `_. diff --git a/SUPPORT.rst b/SUPPORT.rst index 8ef8ae11..c30e13a2 100644 --- a/SUPPORT.rst +++ b/SUPPORT.rst @@ -4,14 +4,10 @@ Contact/Getting Help There are several ways to contact us or to get help: -#. - `\ **FAQ** `_ - We have a list of common questions, and their answers, in our `FAQ `_. When just getting started, it's best to first check the `FAQ `_ to see if your question is addressed. +#. `FAQ `_ - We have a list of common questions, and their answers, in our `FAQ `_. When just getting started, it's best to first check the `FAQ `_ to see if your question is addressed. -#. - `\ **Discourse Forums** `_ - If your question is not addressed in the `FAQ `_\ , the `Discourse Forums `_ is the next place to look. They contain conversations on `General Topics `_\ , `Using Deep Speech `_\ , and `Deep Speech Development `_. +#. `Discourse Forums `_ - If your question is not addressed in the `FAQ `_\ , the `Discourse Forums `_ is the next place to look. 
They contain conversations on `General Topics `_\ , `Using Deep Speech `_\ , and `Deep Speech Development `_. -#. - `\ **Matrix chat** `_ - If your question is not addressed by either the `FAQ `_ or `Discourse Forums `_\ , you can contact us on the ``#machinelearning`` channel on `Mozilla Matrix `_\ ; people there can try to answer/help +#. `Matrix chat `_ - If your question is not addressed by either the `FAQ `_ or `Discourse Forums `_\ , you can contact us on the ``#machinelearning`` channel on `Mozilla Matrix `_\ ; people there can try to answer/help -#. - `\ **Issues** `_ - Finally, if all else fails, you can open an issue in our repo. +#. `Issues `_ - Finally, if all else fails, you can open an issue in our repo. diff --git a/doc/Decoder.rst b/doc/Decoder.rst index d7960fad..03cbd39d 100644 --- a/doc/Decoder.rst +++ b/doc/Decoder.rst @@ -76,4 +76,4 @@ The character, '|' in this case, will then have to be replaced with spaces as a Implementation ^^^^^^^^^^^^^^ -The decoder source code can be found in ``native_client/ctcdecode``. The decoder is included in the language bindings and clients. In addition, there is a separate Python module which includes just the decoder and is needed for evaluation. In order to build and install this package, see the :github:`native_client README `. +The decoder source code can be found in ``native_client/ctcdecode``. The decoder is included in the language bindings and clients. In addition, there is a separate Python module which includes just the decoder and is needed for evaluation. A pre-built version of this package is automatically downloaded and installed when installing the training code. If you want to manually build and install it from source, see the :github:`native_client README `. diff --git a/doc/NodeJS-Examples.rst b/doc/NodeJS-Examples.rst index 9c1197a3..ef7e7761 100644 --- a/doc/NodeJS-Examples.rst +++ b/doc/NodeJS-Examples.rst @@ -1,3 +1,5 @@ +.. 
_js-api-example: + JavaScript API Usage example ============================= diff --git a/doc/Python-Examples.rst b/doc/Python-Examples.rst index 9bbc4a3b..e00ac722 100644 --- a/doc/Python-Examples.rst +++ b/doc/Python-Examples.rst @@ -1,7 +1,9 @@ +.. _py-api-example: + Python API Usage example ======================== -Examples are from `native_client/python/client.cc`. +Examples are from `native_client/python/client.py`. Creating a model instance and loading model ------------------------------------------- diff --git a/doc/TRAINING.rst b/doc/TRAINING.rst index fecbbd53..3f0b584c 100644 --- a/doc/TRAINING.rst +++ b/doc/TRAINING.rst @@ -65,7 +65,7 @@ If you have a capable (NVIDIA, at least 8GB of VRAM) GPU, it is highly recommend pip3 uninstall tensorflow pip3 install 'tensorflow-gpu==1.15.2' -Please ensure you have the required `CUDA dependency `_. +Please ensure you have the required :ref:`CUDA dependency `. It has been reported for some people failure at training: @@ -74,7 +74,7 @@ It has been reported for some people failure at training: tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[{{node tower_0/conv1d/Conv2D}}]] -Setting the ``TF_FORCE_GPU_ALLOW_GROWTH`` environment variable to ``true`` seems to help in such cases. This could also be due to an incorrect version of libcudnn. Double check your versions with the `TensorFlow 1.15 documentation `_. +Setting the ``TF_FORCE_GPU_ALLOW_GROWTH`` environment variable to ``true`` seems to help in such cases. This could also be due to an incorrect version of libcudnn. Double check your versions with the :ref:`TensorFlow 1.15 documentation `. 
 
 Common Voice training data
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -179,7 +179,7 @@ Exporting a model for inference
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 If the ``--export_dir`` parameter is provided, a model will have been exported to this directory during training.
-Refer to the corresponding :github:`README.rst ` for information on building and running a client that can use the exported model.
+Refer to the :ref:`usage instructions ` for information on running a client that can use the exported model.
 
 Exporting a model for TFLite
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/doc/USING.rst b/doc/USING.rst
index 7a98813e..57ee279d 100644
--- a/doc/USING.rst
+++ b/doc/USING.rst
@@ -1,16 +1,19 @@
+.. _usage-docs:
+
 Using a Pre-trained Model
 =========================
 
 Inference using a DeepSpeech pre-trained model can be done with a client/language binding package. We have four clients/language bindings in this repository, listed below, and also a few community-maintained clients/language bindings in other repositories, listed `further down in this README <#third-party-bindings>`_.
-
-* `The Python package/language binding <#using-the-python-package>`_
-* `The Node.JS package/language binding <#using-the-nodejs-package>`_
-* `The Command-Line client <#using-the-command-line-client>`_
+
+* `The C API `_
+* :ref:`The Python package/language binding `
+* :ref:`The Node.JS package/language binding `
+* :ref:`The command-line client `
 * :github:`The .NET client/language binding `
 
-Running ``deepspeech`` might, see below, require some runtime dependencies to be already installed on your system:
+.. _runtime-deps:
+
+Running ``deepspeech`` might, see below, require some runtime dependencies to be already installed on your system:
 
 * ``sox`` - The Python and Node.JS clients use SoX to resample files to 16kHz.
 * ``libgomp1`` - libsox (statically linked into the clients) depends on OpenMP. Some people have had to install this manually.
@@ -20,6 +23,8 @@ Running ``deepspeech`` might, see below, require some runtime dependencies to be
 
 Please refer to your system's documentation on how to install these dependencies.
 
+.. _cuda-deps:
+
 CUDA dependency
 ^^^^^^^^^^^^^^^
 
@@ -40,6 +45,8 @@ Model compatibility
 
 DeepSpeech models are versioned to keep you from trying to use an incompatible graph with a newer client after a breaking change was made to the code. If you get an error saying your model file version is too old for the client, you should either upgrade to a newer model release, re-export your model from the checkpoint using a newer version of the code, or downgrade your client if you need to use the old model and can't re-export it.
 
+.. _py-usage:
+
 Using the Python package
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -110,7 +117,9 @@ Note: the following command assumes you `downloaded the pre-trained model <#gett
 
 The ``--scorer`` argument is optional, and represents an external language model to be used when transcribing the audio.
 
-See :github:`client.py ` for an example of how to use the package programatically.
+See :ref:`the Python client ` for an example of how to use the package programmatically.
+
+.. _nodejs-usage:
 
 Using the Node.JS / Electron.JS package
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -135,9 +144,11 @@ Alternatively, if you're using Linux and have a supported NVIDIA GPU, you can in
 
 See the `release notes `_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.
 
-See :github:`client.ts ` for an example of how to use the bindings.
+See the :ref:`TypeScript client ` for an example of how to use the bindings programmatically.
 
-Using the Command-Line client
+.. _cli-usage:
+
+Using the command-line client
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 To download the pre-built binaries for the ``deepspeech`` command-line (compiled C++) client, use ``util/taskcluster.py``\ :
 
@@ -168,12 +179,12 @@ Note: the following command assumes you `downloaded the pre-trained model <#gett
 
   ./deepspeech --model deepspeech-0.7.0-models.pbmm --scorer deepspeech-0.7.0-models.scorer --audio audio_input.wav
 
-See the help output with ``./deepspeech -h`` and the :github:`native client README ` for more details.
+See the help output with ``./deepspeech -h`` for more details.
 
 Installing bindings from source
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-If pre-built binaries aren't available for your system, you'll need to install them from scratch. Follow these :github:`native client installation instructions `.
+If pre-built binaries aren't available for your system, you'll need to install them from scratch. Follow the :github:`native client build and installation instructions `.
 
 Third party bindings
 ^^^^^^^^^^^^^^^^^^^^
diff --git a/doc/index.rst b/doc/index.rst
index 9eb761e1..fbf1a620 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -6,6 +6,50 @@ Welcome to DeepSpeech's documentation!
 ======================================
 
+DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on `Baidu's Deep Speech research paper `_. Project DeepSpeech uses Google's `TensorFlow `_ to make the implementation easier.
+
+To install and use DeepSpeech all you have to do is:
+
+.. code-block:: bash
+
+   # Create and activate a virtualenv
+   virtualenv -p python3 $HOME/tmp/deepspeech-venv/
+   source $HOME/tmp/deepspeech-venv/bin/activate
+
+   # Install DeepSpeech
+   pip3 install deepspeech
+
+   # Download pre-trained English model files
+   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.pbmm
+   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.scorer
+
+   # Download example audio files
+   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/audio-0.7.0.tar.gz
+   tar xvf audio-0.7.0.tar.gz
+
+   # Transcribe an audio file
+   deepspeech --model deepspeech-0.7.0-models.pbmm --scorer deepspeech-0.7.0-models.scorer --audio audio/2830-3980-0043.wav
+
+A pre-trained English model is available for use and can be downloaded following the instructions in :ref:`the usage docs `. For the latest release, including pre-trained models and checkpoints, `see the GitHub releases page `_.
+
+Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the `release notes `_ to find which GPUs are supported. To run ``deepspeech`` on a GPU, install the GPU specific package:
+
+.. code-block:: bash
+
+   # Create and activate a virtualenv
+   virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
+   source $HOME/tmp/deepspeech-gpu-venv/bin/activate
+
+   # Install DeepSpeech CUDA enabled package
+   pip3 install deepspeech-gpu
+
+   # Transcribe an audio file.
+   deepspeech --model deepspeech-0.7.0-models.pbmm --scorer deepspeech-0.7.0-models.scorer --audio audio/2830-3980-0043.wav
+
+Please ensure you have the required :ref:`CUDA dependencies `.
+
+See the output of ``deepspeech -h`` for more information on the use of ``deepspeech``. (If you experience problems running ``deepspeech``, please check :ref:`required runtime dependencies `).
+
 .. toctree::
    :maxdepth: 2
    :caption: Introduction
diff --git a/native_client/javascript/README.md b/native_client/javascript/README.md
index 267fbeba..39b291f6 100644
--- a/native_client/javascript/README.md
+++ b/native_client/javascript/README.md
@@ -1,4 +1,4 @@
-Full project description and documentation on GitHub: [https://github.com/mozilla/DeepSpeech](https://github.com/mozilla/DeepSpeech).
+Full project description and documentation on [https://deepspeech.readthedocs.io/](https://deepspeech.readthedocs.io/).
 
 ## Generating TypeScript Type Definitions
 
diff --git a/native_client/python/README.rst b/native_client/python/README.rst
index bde1e032..04d6bb29 100644
--- a/native_client/python/README.rst
+++ b/native_client/python/README.rst
@@ -1 +1 @@
-Full project description and documentation on GitHub: `https://github.com/mozilla/DeepSpeech `_
+Full project description and documentation on `https://deepspeech.readthedocs.io/ `_

From 1838a1e0d49f796d88140a0582298e2601c74b2e Mon Sep 17 00:00:00 2001
From: Reuben Morais
Date: Tue, 28 Apr 2020 11:55:20 +0200
Subject: [PATCH 2/4] Remove FAQ reference and reword SUPPORT.rst a bit

---
 SUPPORT.rst | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/SUPPORT.rst b/SUPPORT.rst
index c30e13a2..f2291e01 100644
--- a/SUPPORT.rst
+++ b/SUPPORT.rst
@@ -3,11 +3,8 @@ Contact/Getting Help
 
 There are several ways to contact us or to get help:
 
-
-#. `FAQ `_ - We have a list of common questions, and their answers, in our `FAQ `_. When just getting started, it's best to first check the `FAQ `_ to see if your question is addressed.
-
-#. `Discourse Forums `_ - If your question is not addressed in the `FAQ `_\ , the `Discourse Forums `_ is the next place to look. They contain conversations on `General Topics `_\ , `Using Deep Speech `_\ , and `Deep Speech Development `_.
+#. `Discourse Forums `_ - The `Deep Speech category on Discourse `_ is the first place to look. Search for keywords related to your question or problem to see if someone else has run into it already. If you can't find anything relevant there, search on our `issue tracker `_ to see if there is an existing issue about your problem.
 
 #. `Matrix chat `_ - If your question is not addressed by either the `FAQ `_ or `Discourse Forums `_\ , you can contact us on the ``#machinelearning`` channel on `Mozilla Matrix `_\ ; people there can try to answer/help
 
-#. `Issues `_ - Finally, if all else fails, you can open an issue in our repo.
+#. `Create a new issue `_ - Finally, if you have a bug report or a feature request that isn't already covered by an existing issue, please open an issue in our repo and fill in the appropriate information about your hardware and software setup.

From d85b0960eb645f287d961f8d0acfd0298e035839 Mon Sep 17 00:00:00 2001
From: Reuben Morais
Date: Tue, 28 Apr 2020 11:55:58 +0200
Subject: [PATCH 3/4] Address review comment

---
 doc/Decoder.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/Decoder.rst b/doc/Decoder.rst
index 03cbd39d..e337f031 100644
--- a/doc/Decoder.rst
+++ b/doc/Decoder.rst
@@ -76,4 +76,4 @@ The character, '|' in this case, will then have to be replaced with spaces as a
 Implementation
 ^^^^^^^^^^^^^^
 
-The decoder source code can be found in ``native_client/ctcdecode``. The decoder is included in the language bindings and clients. In addition, there is a separate Python module which includes just the decoder and is needed for evaluation. A pre-built version of this package is automatically downloaded and installed when installing the training code. If you want to manually build and install it from source, see the :github:`native_client README `.
+The decoder source code can be found in ``native_client/ctcdecode``. The decoder is included in the language bindings and clients. In addition, there is a separate Python module which includes just the decoder and is needed for evaluation. A pre-built version of this package is automatically downloaded and installed when installing the training code. If you want or need to manually build and install it from source, see the :github:`native_client README `.

From 6f9fcf302905d3e43872de4a419cbb3233806f23 Mon Sep 17 00:00:00 2001
From: Reuben Morais
Date: Tue, 28 Apr 2020 13:33:45 +0200
Subject: [PATCH 4/4] Embed flag definitions

---
 doc/Flags.rst                              | 16 ++++++++++++++++
 doc/TRAINING.rst                           |  2 +-
 training/deepspeech_training/util/flags.py |  3 +++
 3 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 doc/Flags.rst

diff --git a/doc/Flags.rst b/doc/Flags.rst
new file mode 100644
index 00000000..66b26f0c
--- /dev/null
+++ b/doc/Flags.rst
@@ -0,0 +1,16 @@
+.. _training-flags:
+
+Command-line flags for the training scripts
+===========================================
+
+Below you can find the definition of all command-line flags supported by the training scripts. This includes ``DeepSpeech.py``, ``evaluate.py``, ``evaluate_tflite.py``, ``transcribe.py`` and ``lm_optimizer.py``.
+
+Flags
+-----
+
+.. literalinclude:: ../training/deepspeech_training/util/flags.py
+   :language: python
+   :linenos:
+   :lineno-match:
+   :start-after: sphinx-doc: training_ref_flags_start
+   :end-before: sphinx-doc: training_ref_flags_end
diff --git a/doc/TRAINING.rst b/doc/TRAINING.rst
index 3f0b584c..36decbb9 100644
--- a/doc/TRAINING.rst
+++ b/doc/TRAINING.rst
@@ -123,7 +123,7 @@ The central (Python) script is ``DeepSpeech.py`` in the project's root directory
 
   ./DeepSpeech.py --helpfull
 
-To get the output of this in a slightly better-formatted way, you can also look up the option definitions in :github:`util/flags.py `.
+To get the output of this in a slightly better-formatted way, you can also look at the flag definitions in :ref:`training-flags`.
 
 For executing pre-configured training scenarios, there is a collection of convenience scripts in the ``bin`` folder. Most of them are named after the corpora they are configured for. Keep in mind that most speech corpora are *very large*, on the order of tens of gigabytes, and some aren't free. Downloading and preprocessing them can take a very long time, and training on them without a fast GPU (GTX 10 series or newer recommended) takes even longer.
 
diff --git a/training/deepspeech_training/util/flags.py b/training/deepspeech_training/util/flags.py
index fde946cc..2bd65be0 100644
--- a/training/deepspeech_training/util/flags.py
+++ b/training/deepspeech_training/util/flags.py
@@ -5,6 +5,7 @@ import absl.flags
 
 FLAGS = absl.flags.FLAGS
 
+# sphinx-doc: training_ref_flags_start
 def create_flags():
     # Importer
     # ========
@@ -198,3 +199,5 @@ def create_flags():
     f.register_validator('one_shot_infer',
                          lambda value: not value or os.path.isfile(value),
                          message='The file pointed to by --one_shot_infer must exist and be readable.')
+
+# sphinx-doc: training_ref_flags_end
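Aside: the ``sphinx-doc`` sentinel comments added to ``flags.py`` in PATCH 4 are what the new ``literalinclude`` directive in ``doc/Flags.rst`` keys on, via its ``start-after``/``end-before`` options. As a rough sketch of those extraction semantics (an illustration only, under the assumption of how Sphinx documents these options; ``extract_region`` is a hypothetical helper, not Sphinx's actual implementation):

```python
def extract_region(source, start_after, end_before):
    """Return the text between the first line containing `start_after` and the
    first later line containing `end_before`, excluding both marker lines,
    mimicking Sphinx literalinclude's start-after/end-before options."""
    lines = source.splitlines(keepends=True)
    # Skip everything up to and including the line holding the start marker.
    start = next(i for i, line in enumerate(lines) if start_after in line) + 1
    # Stop at the first subsequent line holding the end marker.
    end = next(i for i, line in enumerate(lines[start:], start) if end_before in line)
    return "".join(lines[start:end])

# Miniature stand-in for training/deepspeech_training/util/flags.py
flags_py = (
    "import absl.flags\n"
    "FLAGS = absl.flags.FLAGS\n"
    "# sphinx-doc: training_ref_flags_start\n"
    "def create_flags():\n"
    "    pass\n"
    "# sphinx-doc: training_ref_flags_end\n"
)

print(extract_region(flags_py, "training_ref_flags_start", "training_ref_flags_end"))
```

Because both marker lines are excluded, only the flag definitions between the sentinels end up in the rendered docs, which is why the markers can stay in ``flags.py`` indefinitely without appearing in the output.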