Docs welcome page and Development / Inference page overhaul (#1793)

* Docs welcome page and Development / Inference page overhaul

* Address review comments

* Fix broken refs and other small adjustments

Co-authored-by: Reuben Morais <reuben.morais@gmail.com>
Josh Meyer 2021-03-17 05:14:50 -04:00 committed by GitHub
parent 2d654706ed
commit 629706b262
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
15 changed files with 302 additions and 343 deletions

View File

@ -139,6 +139,8 @@ After following the above build and installation instructions, the Node.JS bindi
This will create the package ``stt-VERSION.tgz`` in ``native_client/javascript``.
.. _build-ctcdecoder-package:
Install the CTC decoder package
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

View File

@ -3,7 +3,7 @@
Building Coqui STT native client for Windows
============================================
Now we can build the native client of 🐸STT and run inference on Windows using the C# client, to do that we need to compile the ``native_client``.
Now we can build the native client of 🐸STT and deploy on Windows using the C# client; to do that, we need to compile the ``native_client``.
**Table of Contents**

View File

@ -13,8 +13,8 @@ Creating a model instance and loading model
:start-after: sphinx-doc: c_ref_model_start
:end-before: sphinx-doc: c_ref_model_stop
Performing inference
--------------------
Deploying trained model
-----------------------
.. literalinclude:: ../native_client/client.cc
:language: c

233
doc/DEPLOYMENT.rst Normal file
View File

@ -0,0 +1,233 @@
.. _usage-docs:
Deployment / Inference
======================
You might call the act of transcribing audio with a trained model either "deployment" or "inference". In this document we use "deployment", but we consider the terms interchangeable.
Introduction
^^^^^^^^^^^^
Deployment is the process of feeding audio (speech) into a trained 🐸STT model and receiving text (transcription) as output. In practice you probably want to use two models for deployment: an audio model and a text model. The audio model (a.k.a. the acoustic model) is a deep neural network which converts audio into text. The text model (a.k.a. the language model / scorer) returns the likelihood of a string of text. If the acoustic model makes spelling or grammatical mistakes, the language model can help correct them.
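To make the two-model pipeline concrete, here is a minimal sketch using the Python binding (a sketch only, assuming the v0.9.3 model file names and that the ``stt`` package exposes the ``Model`` API used by the clients below):

.. code-block:: python

   import numpy as np
   from stt import Model

   # Acoustic model: maps 16-bit, 16 kHz PCM audio to text.
   model = Model("coqui-stt-0.9.3-models.pbmm")

   # Language model (scorer): helps pick likelier transcriptions.
   model.enableExternalScorer("coqui-stt-0.9.3-models.scorer")
   # model.disableExternalScorer()  # acoustic model only

   audio = np.zeros(16000, dtype=np.int16)  # one second of silence, for illustration
   print(model.stt(audio))                  # -> transcription string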
You can deploy 🐸STT models either via a command-line client or a language binding. 🐸 provides a command-line client and several official language bindings (listed below), and there also exist several community-maintained clients and bindings, listed `further down in this document <#third-party-bindings>`_.
*Note that 🐸STT currently only provides packages for CPU deployment with Python 3.5 or higher on Linux. We're working to get the rest of our usually supported packages back up and running as soon as possible.*
* :ref:`The Python package + language binding <py-usage>`
* :ref:`The command-line client <cli-usage>`
* :ref:`The native C API <c-usage>`
* :ref:`The Node.JS package + language binding <nodejs-usage>`
* :ref:`The .NET client + language binding <build-native-client-dotnet>`
.. _download-models:
Download trained Coqui STT models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can find pre-trained models ready for deployment on the 🐸STT `releases page <https://github.com/coqui-ai/STT/releases>`_. You can also download the latest acoustic model (``.pbmm``) and language model (``.scorer``) from the command line as follows:
.. code-block:: bash
wget https://github.com/coqui-ai/STT/releases/download/v0.9.3/coqui-stt-0.9.3-models.pbmm
wget https://github.com/coqui-ai/STT/releases/download/v0.9.3/coqui-stt-0.9.3-models.scorer
In every official 🐸STT release, several kinds of model files are provided. For the acoustic model there are two file extensions: ``.pbmm`` and ``.tflite``. Files ending in ``.pbmm`` are compatible with clients and language bindings built against the standard TensorFlow runtime. ``.pbmm`` files are also compatible with CUDA-enabled clients and language bindings. Files ending in ``.tflite``, on the other hand, are only compatible with clients and language bindings built against the `TensorFlow Lite runtime <https://www.tensorflow.org/lite/>`_. TFLite models are optimized for size and performance on low-power devices. You can find a full list of supported platforms and TensorFlow runtimes at :ref:`supported-platforms-deployment`.
For language models, there is only one file extension: ``.scorer``. Language models can run on any supported device, regardless of TensorFlow runtime. You can read more about language models with regard to :ref:`the decoding process <decoder-docs>` and :ref:`how scorers are generated <scorer-scripts>`.
How will a model perform on my data?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
How well a 🐸STT model transcribes your audio will depend on a lot of things. The general rule is the following: the more similar your data is to the data used to train the model, the better the model will transcribe your data. The more your data differs from the data used to train the model, the worse the model will perform on your data. This general rule applies to both the acoustic model and the language model. There are many dimensions upon which data can differ, but here are the most important ones:
* Language
* Accent
* Speaking style
* Speaking topic
* Speaker demographics
If you take a 🐸STT model trained on English and pass Spanish into it, you should expect the model to perform horribly. Imagine you have a friend who only speaks English; if you asked her to write Spanish subtitles for a Spanish film, you wouldn't expect good subtitles. This is an extreme example, but it helps to form an intuition for what to expect from 🐸STT models. Imagine that 🐸STT models are like people who speak a certain language with a certain accent, and then think about what would happen if you asked one of them to transcribe your audio.
An acoustic model (i.e. ``.pbmm`` or ``.tflite``) has "learned" how to transcribe a certain language, and the model probably understands some accents better than others. In addition to languages and accents, acoustic models are sensitive to the style of speech, the topic of speech, and the demographics of the person speaking. The language model (``.scorer``) has been trained on text alone. As such, the language model is sensitive to how well the topic and style of speech matches that of the text used in training. The 🐸STT `release notes <https://github.com/coqui-ai/STT/releases/tag/v0.9.3>`_ include detailed information on the data used to train the models. If the data used for training the off-the-shelf models does not align with your intended use case, it may be necessary to adapt or train new models in order to improve transcription on your data.
Training your own language model is often a good way to improve transcription on your audio. The process and tools used to generate a language model are described in :ref:`scorer-scripts` and general information can be found in :ref:`decoder-docs`. Generating a scorer from a constrained topic dataset is a quick process and can bring significant accuracy improvements if your audio is from a specific topic.
Acoustic model training is described in :ref:`training-docs`. Fine tuning an off-the-shelf acoustic model to your own data can be a good way to improve performance. See the :ref:`fine tuning and transfer learning sections <training-fine-tuning>` for more information.
Model compatibility
^^^^^^^^^^^^^^^^^^^
🐸STT models are versioned to mitigate incompatibilities with clients and language bindings. If you get an error saying your model file version is too old for the client, you should either (1) upgrade to a newer model, (2) re-export your model from the checkpoint using a newer version of the code, or (3) downgrade your client if you need to use the old model and can't re-export it.
.. _py-usage:
Using the Python package
^^^^^^^^^^^^^^^^^^^^^^^^
Pre-built binaries for deploying a trained model can be installed with ``pip``. It is highly recommended that you use Python 3.5 or higher in a virtual environment. Both `pip <https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/#installing-pip>`_ and `venv <https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment>`_ are included in normal Python 3 installations.
When you create a new Python virtual environment, you create a directory containing a ``python`` binary and everything needed to run 🐸STT. For the purpose of this documentation, we will use ``$HOME/coqui-stt-venv``, but you can use whatever directory you like.
Let's make the virtual environment:
.. code-block::
$ python3 -m venv $HOME/coqui-stt-venv/
After this command completes, your new environment is ready to be activated. Each time you work with 🐸STT, you need to *activate* your virtual environment, as follows:
.. code-block::
$ source $HOME/coqui-stt-venv/bin/activate
After your environment has been activated, you can use ``pip`` to install ``stt``, as follows:
.. code-block::
(coqui-stt-venv)$ python3 -m pip install -U pip && python3 -m pip install stt
After installation has finished, you can call ``stt`` from the command-line.
The following command assumes you :ref:`downloaded the pre-trained models <download-models>`.
.. code-block:: bash
(coqui-stt-venv)$ stt --model stt-0.9.3-models.pbmm --scorer stt-0.9.3-models.scorer --audio my_audio_file.wav
See :ref:`the Python client <py-api-example>` for an example of how to use the package programmatically.
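As a rough sketch of what such a client does under the hood (assuming a mono, 16 kHz WAV file and the v0.9.3 Python API):

.. code-block:: python

   import wave
   import numpy as np
   from stt import Model

   model = Model("stt-0.9.3-models.pbmm")
   model.enableExternalScorer("stt-0.9.3-models.scorer")

   # Read 16-bit PCM samples; the pre-trained models expect 16 kHz input.
   with wave.open("my_audio_file.wav", "rb") as wav:
       assert wav.getframerate() == model.sampleRate()
       audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

   print(model.stt(audio))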
*GPUs will soon be supported:* If you have a supported NVIDIA GPU on Linux, you can install the GPU-specific package as follows:
.. code-block::
(coqui-stt-venv)$ python3 -m pip install -U pip && python3 -m pip install stt-gpu
See the `release notes <https://github.com/coqui-ai/STT/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.
.. _cli-usage:
Using the command-line client
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To download the pre-built binaries for the ``stt`` command-line (compiled C++) client, use ``util/taskcluster.py``\ :
.. code-block:: bash
python3 util/taskcluster.py --target .
or if you're on macOS:
.. code-block:: bash
python3 util/taskcluster.py --arch osx --target .
If you need binaries built from something other than the current main branch, like ``v0.2.0-alpha.6``\ , you can use ``--branch``\ :
.. code-block:: bash
python3 util/taskcluster.py --branch "v0.2.0-alpha.6" --target "."
The script ``taskcluster.py`` will download ``native_client.tar.xz`` (which includes the ``stt`` binary and associated libraries) and extract it into the current folder. ``taskcluster.py`` will download binaries for Linux/x86_64 by default, but you can override that behavior with the ``--arch`` parameter. See the help info with ``python3 util/taskcluster.py -h`` for more details. Specific branches of 🐸STT or TensorFlow can be specified as well.
Alternatively you may manually download the ``native_client.tar.xz`` from the `releases page <https://github.com/coqui-ai/STT/releases>`_.
Assuming you have :ref:`downloaded the pre-trained models <download-models>`, you can use the client as follows:
.. code-block:: bash
./stt --model coqui-stt-0.9.3-models.pbmm --scorer coqui-stt-0.9.3-models.scorer --audio audio_input.wav
See the help output with ``./stt -h`` for more details.
.. _nodejs-usage:
Using the Node.JS / Electron.JS package
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
*Note that 🐸STT currently only provides packages for CPU deployment with Python 3.5 or higher on Linux. We're working to get the rest of our usually supported packages back up and running as soon as possible.*
You can download the JS bindings using ``npm``\ :
.. code-block:: bash
npm install stt
Special thanks to `Huan - Google Developers Experts in Machine Learning (ML GDE) <https://github.com/huan>`_ for providing the STT project name on npmjs.org.
Please note that as of now, we support:
- Node.JS versions 4 to 13
- Electron.JS versions 1.6 to 7.1
TypeScript support is also provided.
If you're using Linux and have a supported NVIDIA GPU, you can install the GPU-specific package as follows:
.. code-block:: bash
npm install stt-gpu
See the `release notes <https://github.com/coqui-ai/STT/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.
See the :ref:`TypeScript client <js-api-example>` for an example of how to use the bindings programmatically.
Installing bindings from source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If pre-built binaries aren't available for your system, you'll need to build and install them from source. Follow the :ref:`native client build and installation instructions <native-build-client>`.
Dockerfile for building from source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We provide ``Dockerfile.build`` to automatically build ``libstt.so``, the C++ native client, Python bindings, and KenLM.
You need to generate the Dockerfile from the template using:
.. code-block:: bash
make Dockerfile.build
If you want to specify a different repository or branch, you can pass ``STT_REPO`` or ``STT_SHA`` parameters:
.. code-block:: bash
make Dockerfile.build STT_REPO=git://your/fork STT_SHA=origin/your-branch
.. _runtime-deps:
Runtime Dependencies
^^^^^^^^^^^^^^^^^^^^
Running ``stt`` may require runtime dependencies. Please refer to your system's documentation on how to install these dependencies.
* ``sox`` - The Python and Node.JS clients use SoX to resample files to 16kHz
* ``libgomp1`` - libsox (statically linked into the clients) depends on OpenMP
* ``libstdc++`` - Standard C++ Library implementation
* ``libpthread`` - Reported dependency on Linux. On Ubuntu, ``libpthread`` is part of the ``libpthread-stubs0-dev`` package
* ``Redistributable Visual C++ 2015 Update 3 (64-bit)`` - Reported dependency on Windows. Please `download from Microsoft <https://www.microsoft.com/download/details.aspx?id=53587>`_
CUDA Dependency
^^^^^^^^^^^^^^^
The GPU capable builds (Python, NodeJS, C++, etc) depend on CUDA 10.1 and cuDNN v7.6.
.. _cuda-inference-deps:
.. toctree::
:maxdepth: 1
:caption: Supported Platforms
SUPPORTED_PLATFORMS
.. Third party bindings
^^^^^^^^^^^^^^^^^^^^
In addition to the official 🐸STT bindings and clients, third party developers have provided bindings to other languages:
* `Asticode <https://github.com/asticode>`_ provides `Golang <https://golang.org>`_ bindings in its `go-astideepspeech <https://github.com/asticode/go-astideepspeech>`_ repo.
* `RustAudio <https://github.com/RustAudio>`_ provides a `Rust <https://www.rust-lang.org>`_ binding, the installation and use of which is described in their `deepspeech-rs <https://github.com/RustAudio/deepspeech-rs>`_ repo.
* `stes <https://github.com/stes>`_ provides preliminary `PKGBUILDs <https://wiki.archlinux.org/index.php/PKGBUILD>`_ to install the client and python bindings on `Arch Linux <https://www.archlinux.org/>`_ in the `arch-deepspeech <https://github.com/stes/arch-deepspeech>`_ repo.
* `gst-deepspeech <https://github.com/Elleo/gst-deepspeech>`_ provides a `GStreamer <https://gstreamer.freedesktop.org/>`_ plugin which can be used from any language with GStreamer bindings.
* `thecodrr <https://github.com/thecodrr>`_ provides `Vlang <https://vlang.io>`_ bindings, the installation and use of which are described in their `vspeech <https://github.com/thecodrr/vspeech>`_ repo.
* `eagledot <https://gitlab.com/eagledot>`_ provides `NIM-lang <https://nim-lang.org/>`_ bindings, the installation and use of which are described in their `nim-deepspeech <https://gitlab.com/eagledot/nim-deepspeech>`_ repo.

View File

@ -82,4 +82,4 @@ The character, '|' in this case, will then have to be replaced with spaces as a
Implementation
^^^^^^^^^^^^^^
The decoder source code can be found in ``native_client/ctcdecode``. The decoder is included in the language bindings and clients. In addition, there is a separate Python module which includes just the decoder and is needed for evaluation. A pre-built version of this package is automatically downloaded and installed when installing the training code. If you want or need to manually build and install it from source, see the :github:`native_client README <native_client/README.rst#install-the-ctc-decoder-package>`.
The decoder source code can be found in ``native_client/ctcdecode``. The decoder is included in the language bindings and clients. In addition, there is a separate Python module which includes just the decoder and is needed for evaluation. A pre-built version of this package is automatically downloaded and installed when installing the training code. If you want or need to manually build and install it from source, see the :ref:`decoder build and installation instructions <build-ctcdecoder-package>`.

View File

@ -13,7 +13,7 @@ Creating a model instance and loading model
:start-after: sphinx-doc: csharp_ref_model_start
:end-before: sphinx-doc: csharp_ref_model_stop
Performing inference
Deploying trained model
-----------------------
.. literalinclude:: ../native_client/dotnet/STTConsole/Program.cs

View File

@ -43,7 +43,7 @@ Previously mentioned problem where extensive boost value caused letter splitting
Example
-------
To use hot-word boosting just add hot-words of your choice before performing an inference to a ``Model``. You can also erase boosting of a chosen word or clear it for all hot-words.
To use hot-word boosting, just add hot-words of your choice before performing a speech-to-text operation with a ``Model``. You can also erase the boost of a chosen word or clear it for all hot-words.
.. code-block:: python
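   # A minimal sketch, assuming the v0.9.x hot-word methods of the
   # Python binding (addHotWord / eraseHotWord / clearHotWords).
   from stt import Model

   model = Model("coqui-stt-0.9.3-models.pbmm")
   model.addHotWord("activate", 10.0)  # boost a word of your choice
   model.eraseHotWord("activate")      # erase the boost for one word
   model.clearHotWords()               # clear boosts for all hot-words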

View File

@ -13,7 +13,7 @@ Creating a model instance and loading model
:start-after: sphinx-doc: java_ref_model_start
:end-before: sphinx-doc: java_ref_model_stop
Performing inference
Deploying trained model
-----------------------
.. literalinclude:: ../native_client/java/app/src/main/java/ai/coqui/STTActivity.java

View File

@ -15,7 +15,7 @@ Creating a model instance and loading model
:start-after: sphinx-doc: js_ref_model_start
:end-before: sphinx-doc: js_ref_model_stop
Performing inference
Deploying trained model
-----------------------
.. literalinclude:: ../native_client/javascript/client.ts

View File

@ -1,10 +1,10 @@
Parallel Optimization
=====================
This is how we implement optimization of the 🐸STT model across GPUs on a
single host. Parallel optimization can take on various forms. For example
one can use asynchronous updates of the model, synchronous updates of the model,
or some combination of the two.
This is how we train the 🐸STT model across GPUs on a single host.
Parallel optimization can take on various forms. For example one can use
asynchronous updates of the model, synchronous updates of the model, or some
combination of the two.
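As a toy illustration of the difference (a conceptual sketch, not 🐸STT source code):

.. code-block:: python

   import numpy as np

   def synchronous_step(weights, per_gpu_grads, lr=1e-4):
       # Wait for every GPU's gradient, average them, apply one update.
       return weights - lr * np.mean(per_gpu_grads, axis=0)

   def asynchronous_step(weights, one_gpu_grad, lr=1e-4):
       # Apply each GPU's gradient as soon as it is ready; updates may
       # be computed from slightly stale weights.
       return weights - lr * one_gpu_grad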
Asynchronous Parallel Optimization
----------------------------------

View File

@ -15,7 +15,7 @@ Creating a model instance and loading model
:start-after: sphinx-doc: python_ref_model_start
:end-before: sphinx-doc: python_ref_model_stop
Performing inference
Deploying trained model
-----------------------
.. literalinclude:: ../native_client/python/client.py

View File

@ -1,20 +1,22 @@
.. _supported-platforms-inference:
.. _supported-platforms-deployment:
Supported platforms for inference
=================================
Supported platforms
===================
Here we maintain the list of supported platforms for running inference. Note that for now we only have working packages for Python on Linux, without GPU support. We're working to get the rest of our supported languages and architectures up and running.
Here we maintain the list of supported platforms for deployment.
*Note that 🐸STT currently only provides packages for CPU deployment with Python 3.5 or higher on Linux. We're working to get the rest of our usually supported packages back up and running as soon as possible.*
Linux / AMD64 without GPU
^^^^^^^^^^^^^^^^^^^^^^^^^
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but this may reduce performance)
* Ubuntu 14.04+ (glibc >= 2.19, libstdc++6 >= 4.8)
* Full TensorFlow runtime (``stt`` packages)
* TensorFlow Lite runtime (``stt-tflite`` packages)
Linux / AMD64 with GPU
^^^^^^^^^^^^^^^^^^^^^^
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but this may reduce performance)
* Ubuntu 14.04+ (glibc >= 2.19, libstdc++6 >= 4.8)
* CUDA 10.0 (and capable GPU)
* Full TensorFlow runtime (``stt`` packages)
@ -48,21 +50,21 @@ Android / Aarch64
macOS / AMD64
^^^^^^^^^^^^^
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but this may reduce performance)
* macOS >= 10.10
* Full TensorFlow runtime (``stt`` packages)
* TensorFlow Lite runtime (``stt-tflite`` packages)
Windows / AMD64 without GPU
^^^^^^^^^^^^^^^^^^^^^^^^^^^
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but this may reduce performance)
* Windows Server >= 2012 R2 ; Windows >= 8.1
* Full TensorFlow runtime (``stt`` packages)
* TensorFlow Lite runtime (``stt-tflite`` packages)
Windows / AMD64 with GPU
^^^^^^^^^^^^^^^^^^^^^^^^
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but this may reduce performance)
* Windows Server >= 2012 R2 ; Windows >= 8.1
* CUDA 10.0 (and capable GPU)
* Full TensorFlow runtime (``stt`` packages)

View File

@ -1,7 +1,7 @@
.. _training-docs:
Training Your Own Model
=======================
Training
========
.. _cuda-training-deps:
@ -203,8 +203,8 @@ During training of a model so-called checkpoints will get stored on disk. This t
Be aware, however, that checkpoints are only valid for the same model geometry they were generated from. In other words: if you see error messages about certain ``Tensors`` having incompatible dimensions, this is most likely due to an incompatible model change. The usual way out is to wipe all checkpoint files in the checkpoint directory, or to change the directory, before starting the training.
Exporting a model for inference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exporting a model for deployment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the ``--export_dir`` parameter is provided, a model will have been exported to this directory during training.
Refer to the :ref:`usage instructions <usage-docs>` for information on running a client that can use the exported model.
@ -214,10 +214,10 @@ Exporting a model for TFLite
If you want to experiment with the TF Lite engine, you need to export a model that is compatible with it by using the ``--export_tflite`` flag. If you already have a trained model, you can re-export it for TFLite by running ``train.py`` again and specifying the same ``checkpoint_dir`` that you used for training, as well as passing ``--export_tflite --export_dir /model/export/destination``. If you changed the alphabet, you also need to add the ``--alphabet_config_path my-new-language-alphabet.txt`` flag.
Making a mmap-able model for inference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Making a mmap-able model for deployment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``output_graph.pb`` model file generated in the above step will be loaded in memory to be dealt with when running inference.
The ``output_graph.pb`` model file generated in the above step will be loaded in memory when deployed.
This will result in extra loading time and memory consumption. One way to avoid this is to directly read data from the disk.
TensorFlow has tooling to achieve this: it requires building the target ``//tensorflow/contrib/util:convert_graphdef_memmapped_format`` (binaries are produced by our TaskCluster for some systems, including Linux/amd64 and macOS/amd64). Use the ``util/taskcluster.py`` tool to download it:

View File

@ -1,240 +0,0 @@
.. _usage-docs:
Using a Pre-trained Model
=========================
Inference using a 🐸STT pre-trained model can be done with a client/language binding package. We have four clients/language bindings in this repository, listed below, and also a few community-maintained clients/language bindings in other repositories, listed `further down in this README <#third-party-bindings>`_. Note that for now we only have working packages for Python on Linux, without GPU support. We're working to get the rest of our supported languages and architectures up and running.
* :ref:`The C API <c-usage>`.
* :ref:`The Python package/language binding <py-usage>`
* :ref:`The Node.JS package/language binding <nodejs-usage>`
* :ref:`The command-line client <cli-usage>`
* :github:`The .NET client/language binding <native_client/dotnet/README.rst>`
.. _runtime-deps:
Running ``stt`` might require some runtime dependencies to already be installed on your system (see below):
* ``sox`` - The Python and Node.JS clients use SoX to resample files to 16kHz.
* ``libgomp1`` - libsox (statically linked into the clients) depends on OpenMP. Some people have had to install this manually.
* ``libstdc++`` - Standard C++ Library implementation. Some people have had to install this manually.
* ``libpthread`` - On Linux, some people have had to install libpthread manually. On Ubuntu, ``libpthread`` is part of the ``libpthread-stubs0-dev`` package.
* ``Redistributable Visual C++ 2015 Update 3 (64-bit)`` - On Windows, it might be required to ensure this is installed. Please `download from Microsoft <https://www.microsoft.com/download/details.aspx?id=53587>`_.
Please refer to your system's documentation on how to install these dependencies.
.. _cuda-inference-deps:
CUDA dependency (inference)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The GPU capable builds (Python, NodeJS, C++, etc) depend on CUDA 10.1 and CuDNN v7.6.
Getting the pre-trained model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you want to use the pre-trained English model for performing speech-to-text, you can download it (along with other important inference material) from the 🐸STT `releases page <https://github.com/coqui-ai/STT/releases>`_. Alternatively, you can run the following command to download the model files in your current directory:
.. code-block:: bash
wget https://github.com/coqui-ai/STT/releases/download/v0.9.3/coqui-stt-0.9.3-models.pbmm
wget https://github.com/coqui-ai/STT/releases/download/v0.9.3/coqui-stt-0.9.3-models.scorer
There are several pre-trained model files available in official releases. Files ending in ``.pbmm`` are compatible with clients and language bindings built against the standard TensorFlow runtime. Usually these packages are simply called ``stt``. These files are also compatible with CUDA-enabled clients and language bindings. These packages are usually called ``stt-gpu``. Files ending in ``.tflite`` are compatible with clients and language bindings built against the `TensorFlow Lite runtime <https://www.tensorflow.org/lite/>`_. These models are optimized for size and performance on low-power devices. On desktop platforms, the compatible packages are called ``stt-tflite``. On Android and Raspberry Pi, we only publish TensorFlow Lite enabled packages, and they are simply called ``stt``. You can see a full list of supported platforms and which TensorFlow runtime is supported at :ref:`supported-platforms-inference`.
+--------------------+---------------------+---------------------+
| Package/Model type | .pbmm | .tflite |
+====================+=====================+=====================+
| stt | Depends on platform | Depends on platform |
+--------------------+---------------------+---------------------+
| stt-gpu | ✅ | ❌ |
+--------------------+---------------------+---------------------+
| stt-tflite | ❌ | ✅ |
+--------------------+---------------------+---------------------+
Finally, the pre-trained model files also include files ending in ``.scorer``. These are external scorers (language models) that are used at inference time in conjunction with an acoustic model (``.pbmm`` or ``.tflite`` file) to produce transcriptions. We also provide further documentation on :ref:`the decoding process <decoder-docs>` and :ref:`how scorers are generated <scorer-scripts>`.
Important considerations on model inputs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The release notes include detailed information on how the released models were trained/constructed. Important considerations for users include the characteristics of the training data used and whether they match your intended use case. For acoustic models, an important characteristic is the demographic distribution of speakers. For external scorers, the texts should be similar to those of the expected use case. If the data used for training the models does not align with your intended use case, it may be necessary to adapt or train new models in order to get good accuracy in your transcription results.
The process for training an acoustic model is described in :ref:`training-docs`. In particular, fine tuning a release model using your own data can be a good way to leverage relatively smaller amounts of data that would not be sufficient for training a new model from scratch. See the :ref:`fine tuning and transfer learning sections <training-fine-tuning>` for more information. :ref:`Data augmentation <training-data-augmentation>` can also be a good way to increase the value of smaller training sets.
Creating your own external scorer from text data is another way that you can adapt the model to your specific needs. The process and tools used to generate an external scorer package are described in :ref:`scorer-scripts` and an overview of how the external scorer is used by 🐸STT to perform inference is available in :ref:`decoder-docs`. Generating a smaller scorer from a single-purpose text dataset is a quick process and can bring significant accuracy improvements, especially for more constrained, limited-vocabulary applications.
Model compatibility
^^^^^^^^^^^^^^^^^^^
🐸STT models are versioned to keep you from trying to use an incompatible graph with a newer client after a breaking change was made to the code. If you get an error saying your model file version is too old for the client, you should either upgrade to a newer model release, re-export your model from the checkpoint using a newer version of the code, or downgrade your client if you need to use the old model and can't re-export it.
.. _py-usage:
Using the Python package
^^^^^^^^^^^^^^^^^^^^^^^^
Pre-built binaries which can be used for performing inference with a trained model can be installed with ``pip3``. You can then use the ``stt`` binary to do speech-to-text on an audio file.
For the Python bindings, it is highly recommended that you perform the installation within a Python 3.5 or later virtual environment. You can find more information about those in `this documentation <http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_.
We will continue under the assumption that you already have your system properly setup to create new virtual environments.
Create a Coqui STT virtual environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In creating a virtual environment you will create a directory containing a ``python3`` binary and everything needed to run 🐸STT. You can use whatever directory you want. For the purpose of the documentation, we will rely on ``$HOME/tmp/coqui-stt-venv``. You can create it using this command:
.. code-block::
$ virtualenv -p python3 $HOME/tmp/coqui-stt-venv/
Once this command completes successfully, the environment will be ready to be activated.
Activating the environment
~~~~~~~~~~~~~~~~~~~~~~~~~~
Each time you need to work with 🐸STT, you have to *activate* this virtual environment. This is done with this simple command:
.. code-block::
$ source $HOME/tmp/coqui-stt-venv/bin/activate
Installing Coqui STT Python bindings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once your environment has been set up and loaded, you can use ``pip3`` to manage packages locally. On a fresh setup of the ``virtualenv``\ , you will have to install the 🐸STT wheel. You can check if ``stt`` is already installed with ``pip3 list``.
To perform the installation, just use ``pip3`` as such:
.. code-block::
$ pip3 install stt
If ``stt`` is already installed, you can update it as such:
.. code-block::
$ pip3 install --upgrade stt
Alternatively, if you have a supported NVIDIA GPU on Linux, you can install the GPU specific package as follows:
.. code-block::
$ pip3 install stt-gpu
See the `release notes <https://github.com/coqui-ai/STT/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.
You can update ``stt-gpu`` as follows:
.. code-block::
$ pip3 install --upgrade stt-gpu
In both cases, ``pip3`` should take care of installing all the required dependencies. After installation has finished, you should be able to call ``stt`` from the command-line.
Note: the following command assumes you `downloaded the pre-trained model <#getting-the-pre-trained-model>`_.
.. code-block:: bash
stt --model stt-0.9.3-models.pbmm --scorer stt-0.9.3-models.scorer --audio my_audio_file.wav
The ``--scorer`` argument is optional, and represents an external language model to be used when transcribing the audio.
See :ref:`the Python client <py-api-example>` for an example of how to use the package programmatically.
.. _nodejs-usage:
Using the Node.JS / Electron.JS package
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can download the JS bindings using ``npm``\ :
.. code-block:: bash
npm install stt
Special thanks to `Huan - Google Developers Experts in Machine Learning (ML GDE) <https://github.com/huan>`_ for providing the STT project name on npmjs.org
Please note that as of now, we support:
- Node.JS versions 4 to 13.
- Electron.JS versions 1.6 to 7.1
TypeScript support is also provided.
Alternatively, if you're using Linux and have a supported NVIDIA GPU, you can install the GPU specific package as follows:
.. code-block:: bash
npm install stt-gpu
See the `release notes <https://github.com/coqui-ai/STT/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.
See the :ref:`TypeScript client <js-api-example>` for an example of how to use the bindings programmatically.
.. _cli-usage:
Using the command-line client
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To download the pre-built binaries for the ``stt`` command-line (compiled C++) client, use ``util/taskcluster.py``\ :
.. code-block:: bash
python3 util/taskcluster.py --target .
or if you're on macOS:
.. code-block:: bash
python3 util/taskcluster.py --arch osx --target .
also, if you need some binaries different than current main branch, like ``v0.2.0-alpha.6``\ , you can use ``--branch``\ :
.. code-block:: bash
python3 util/taskcluster.py --branch "v0.2.0-alpha.6" --target "."
The script ``taskcluster.py`` will download ``native_client.tar.xz`` (which includes the ``stt`` binary and associated libraries) and extract it into the current folder. Also, ``taskcluster.py`` will download binaries for Linux/x86_64 by default, but you can override that behavior with the ``--arch`` parameter. See the help info with ``python util/taskcluster.py -h`` for more details. Specific branches of 🐸STT or TensorFlow can be specified as well.
Alternatively you may manually download the ``native_client.tar.xz`` from the `releases page <https://github.com/coqui-ai/STT/releases>`_.
Note: the following command assumes you `downloaded the pre-trained model <#getting-the-pre-trained-model>`_.
.. code-block:: bash
./stt --model coqui-stt-0.9.3-models.pbmm --scorer coqui-stt-0.9.3-models.scorer --audio audio_input.wav
See the help output with ``./stt -h`` for more details.
Installing bindings from source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If pre-built binaries aren't available for your system, you'll need to install them from scratch. Follow the :github:`native client build and installation instructions <native_client/README.rst>`.
Dockerfile for building from source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We provide ``Dockerfile.build`` to automatically build ``libstt.so``, the C++ native client, Python bindings, and KenLM.
You need to generate the Dockerfile from the template using:
.. code-block:: bash
make Dockerfile.build
If you want to specify a different repository / branch, you can pass ``STT_REPO`` or ``STT_SHA`` parameters:
.. code-block:: bash
make Dockerfile.build STT_REPO=git://your/fork STT_SHA=origin/your-branch
.. Third party bindings
^^^^^^^^^^^^^^^^^^^^
In addition to the bindings above, third party developers have started to provide bindings to other languages:
* `Asticode <https://github.com/asticode>`_ provides `Golang <https://golang.org>`_ bindings in its `go-astideepspeech <https://github.com/asticode/go-astideepspeech>`_ repo.
* `RustAudio <https://github.com/RustAudio>`_ provides a `Rust <https://www.rust-lang.org>`_ binding, the installation and use of which is described in their `deepspeech-rs <https://github.com/RustAudio/deepspeech-rs>`_ repo.
* `stes <https://github.com/stes>`_ provides preliminary `PKGBUILDs <https://wiki.archlinux.org/index.php/PKGBUILD>`_ to install the client and python bindings on `Arch Linux <https://www.archlinux.org/>`_ in the `arch-deepspeech <https://github.com/stes/arch-deepspeech>`_ repo.
* `gst-deepspeech <https://github.com/Elleo/gst-deepspeech>`_ provides a `GStreamer <https://gstreamer.freedesktop.org/>`_ plugin which can be used from any language with GStreamer bindings.
* `thecodrr <https://github.com/thecodrr>`_ provides `Vlang <https://vlang.io>`_ bindings, the installation and use of which are described in their `vspeech <https://github.com/thecodrr/vspeech>`_ repo.
* `eagledot <https://gitlab.com/eagledot>`_ provides `NIM-lang <https://nim-lang.org/>`_ bindings, the installation and use of which are described in their `nim-deepspeech <https://gitlab.com/eagledot/nim-deepspeech>`_ repo.

View File

@ -6,88 +6,48 @@
Coqui STT
=========
**Coqui STT** (🐸STT) is an open-source deep-learning toolkit for training and deploying speech-to-text models. 🐸STT is battle tested in both production and research 🚀
**Coqui STT** (🐸STT) is an open-source deep-learning toolkit for training and deploying speech-to-text models.
For now we only have working packages for Python on Linux, without GPU support. We're working to get the rest of our supported languages and architectures up and running.
🐸STT is battle tested in both production and research 🚀
To install and use 🐸STT all you have to do is:
.. code-block:: bash
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/stt/
source $HOME/tmp/stt/bin/activate
# Install 🐸STT
pip3 install stt
# Download pre-trained English model files
curl -LO https://github.com/coqui-ai/STT/releases/download/v0.9.3/coqui-stt-0.9.3-models.pbmm
curl -LO https://github.com/coqui-ai/STT/releases/download/v0.9.3/coqui-stt-0.9.3-models.scorer
# Download example audio files
curl -LO https://github.com/coqui-ai/STT/releases/download/v0.9.3/audio-0.9.3.tar.gz
tar xvf audio-0.9.3.tar.gz
# Transcribe an audio file
stt --model coqui-stt-0.9.3-models.pbmm --scorer coqui-stt-0.9.3-models.scorer --audio audio/2830-3980-0043.wav
A pre-trained English model is available for use and can be downloaded following the instructions in :ref:`the usage docs <usage-docs>`. For the latest release, including pre-trained models and checkpoints, `see the GitHub releases page <https://github.com/coqui-ai/STT/releases/latest>`_.
Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the `release notes <https://github.com/coqui-ai/STT/releases/latest>`_ to find which GPUs are supported. To run ``stt`` on a GPU, install the GPU specific package. Note that for now the GPU package is not available. We're working to get all of our supported languages and architectures up and running.
.. code-block:: bash
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/coqui-stt-gpu-venv/
source $HOME/tmp/coqui-stt-gpu-venv/bin/activate
# Install 🐸STT CUDA enabled package
pip3 install stt-gpu
# Transcribe an audio file.
stt --model coqui-stt-0.9.3-models.pbmm --scorer coqui-stt-0.9.3-models.scorer --audio audio/2830-3980-0043.wav
Please ensure you have the required :ref:`CUDA dependencies <cuda-inference-deps>`.
See the output of ``stt -h`` for more information on the use of ``stt``. (If you experience problems running ``stt``, please check :ref:`required runtime dependencies <runtime-deps>`).
.. toctree::
:maxdepth: 2
:caption: Introduction
:maxdepth: 1
:caption: Quick Reference
USING
DEPLOYMENT
TRAINING
SUPPORTED_PLATFORMS
BUILDING
Quickstart: Deployment
^^^^^^^^^^^^^^^^^^^^^^
BUILDING_DotNet
The fastest way to deploy a pre-trained 🐸STT model is with ``pip`` and Python 3.5 or higher (*Note: only Linux is supported at this time. We are working to get our normally supported packages back up and running.*):
.. include:: ../SUPPORT.rst
.. code-block:: bash
# Create a virtual environment
$ python3 -m venv venv-stt
$ source venv-stt/bin/activate
# Install 🐸STT
$ python3 -m pip install -U pip
$ python3 -m pip install stt
# Download 🐸's pre-trained English models
$ curl -LO https://github.com/coqui-ai/STT/releases/download/v0.9.3/coqui-stt-0.9.3-models.pbmm
$ curl -LO https://github.com/coqui-ai/STT/releases/download/v0.9.3/coqui-stt-0.9.3-models.scorer
# Download some example audio files
$ curl -LO https://github.com/coqui-ai/STT/releases/download/v0.9.3/audio-0.9.3.tar.gz
$ tar -xvf audio-0.9.3.tar.gz
# Transcribe an audio file
$ stt --model coqui-stt-0.9.3-models.pbmm --scorer coqui-stt-0.9.3-models.scorer --audio audio/2830-3980-0043.wav
.. toctree::
:maxdepth: 2
:caption: Decoder and scorer
Decoder
Scorer
.. toctree::
:maxdepth: 2
:caption: Architecture and training
Architecture
Geometry
ParallelOptimization
.. toctree::
:maxdepth: 3
:maxdepth: 1
:caption: API Reference
Error-Codes
@ -103,23 +63,25 @@ See the output of ``stt -h`` for more information on the use of ``stt``. (If you
Python-API
.. toctree::
:maxdepth: 2
:maxdepth: 1
:caption: Examples
Python-Examples
NodeJS-Examples
C-Examples
DotNet-Examples
Java-Examples
NodeJS-Examples
Python-Examples
HotWordBoosting-Examples
Contributed-Examples
.. include:: ../SUPPORT.rst
Indices and tables
==================