Address review comments

Reuben Morais 2020-08-05 23:21:41 +02:00
parent 4d98958b77
commit 0b51004081
27 changed files with 130 additions and 133 deletions

View File

@ -1,5 +1,5 @@
This file contains a list of papers in chronological order that have been published
using Mozilla's DeepSpeech.
using Mozilla Voice STT.
To appear
==========

View File

@ -1,5 +1,5 @@
Project DeepSpeech
==================
Mozilla Voice STT
=================
.. image:: https://readthedocs.org/projects/deepspeech/badge/?version=latest
@ -12,7 +12,7 @@ Project DeepSpeech
:alt: Task Status
DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on `Baidu's Deep Speech research paper <https://arxiv.org/abs/1412.5567>`_. Project DeepSpeech uses Google's `TensorFlow <https://www.tensorflow.org/>`_ to make the implementation easier.
Mozilla Voice STT is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on `Baidu's Deep Speech research paper <https://arxiv.org/abs/1412.5567>`_. Mozilla Voice STT uses Google's `TensorFlow <https://www.tensorflow.org/>`_ to make the implementation easier.
Documentation for installation, usage, and training models is available on `deepspeech.readthedocs.io <http://deepspeech.readthedocs.io/?badge=latest>`_.

View File

@ -99,7 +99,7 @@ Now, ``cd`` into the ``DeepSpeech/native_client`` directory and use the ``Makefi
.. code-block::
cd ../DeepSpeech/native_client
make deepspeech
make mozilla_voice_stt
Installing your own Binaries
----------------------------
@ -121,7 +121,7 @@ Included are a set of generated Python bindings. After following the above build
cd native_client/python
make bindings
pip install dist/deepspeech*
pip install dist/mozilla_voice_stt*
The API mirrors the C++ API and is demonstrated in `client.py <python/client.py>`_. Refer to the `C API <c-usage>` for documentation.
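As a quick sanity check of the freshly installed bindings, you could run that example client from your activated environment; the model and audio paths below are placeholders rather than files produced by the build:
.. code-block::
python3 native_client/python/client.py --model path/to/output_graph.pbmm --audio path/to/audio.wav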
@ -175,13 +175,13 @@ And your command line for ``LePotato`` and ``ARM64`` should look like:
While we test only on RPi3 Raspbian Buster and LePotato ARMBian Buster, anything compatible with ``armv7-a cortex-a53`` or ``armv8-a cortex-a53`` should be fine.
The ``deepspeech`` binary can also be cross-built, with ``TARGET=rpi3`` or ``TARGET=rpi3-armv8``. This might require you to setup a system tree using the tool ``multistrap`` and the multitrap configuration files: ``native_client/multistrap_armbian64_buster.conf`` and ``native_client/multistrap_raspbian_buster.conf``.
The ``mozilla_voice_stt`` binary can also be cross-built, with ``TARGET=rpi3`` or ``TARGET=rpi3-armv8``. This might require you to set up a system tree using the ``multistrap`` tool and the multistrap configuration files: ``native_client/multistrap_armbian64_buster.conf`` and ``native_client/multistrap_raspbian_buster.conf``.
The path of the system tree can be overridden from the default values defined in ``definitions.mk`` through the ``RASPBIAN`` ``make`` variable.
.. code-block::
cd ../DeepSpeech/native_client
make TARGET=<system> deepspeech
make TARGET=<system> mozilla_voice_stt
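For example, a cross-build against a system tree unpacked in a non-default location might look like the following (the path is illustrative):
.. code-block::
cd ../DeepSpeech/native_client
make TARGET=rpi3 RASPBIAN=/path/to/raspbian/systree mozilla_voice_stt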
Android devices support
-----------------------
@ -236,10 +236,10 @@ Please note that you might have to copy the file to a local Maven repository
and adapt file naming (when missing, the error message should state what
filename it expects and where).
Building C++ ``deepspeech`` binary
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Building C++ ``mozilla_voice_stt`` binary
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Building the ``deepspeech`` binary will happen through ``ndk-build`` (ARMv7):
Building the ``mozilla_voice_stt`` binary is done through ``ndk-build`` (ARMv7):
.. code-block::
@ -272,13 +272,13 @@ demo of one usage of the application. For example, it's only able to read PCM
mono 16kHz 16-bit files, and it might fail on some WAVE files that do not
follow the specification exactly.
Running ``deepspeech`` via adb
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Running ``mozilla_voice_stt`` via adb
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You should use ``adb push`` to send data to the device; please refer to the Android
documentation on how to use it.
Please push Mozilla Voice STT data to ``/sdcard/deepspeech/``\ , including:
Please push Mozilla Voice STT data to ``/sdcard/mozilla_voice_stt/``\ , including:
* ``output_graph.tflite`` which is the TF Lite model
@ -286,9 +286,9 @@ Please push Mozilla Voice STT data to ``/sdcard/deepspeech/``\ , including:
the scorer; please be aware that a scorer that is too big will make the device
run out of memory
Then, push binaries from ``native_client.tar.xz`` to ``/data/local/tmp/ds``\ :
Then, push binaries from ``native_client.tar.xz`` to ``/data/local/tmp/stt``\ :
* ``deepspeech``
* ``mozilla_voice_stt``
* ``libmozilla_voice_stt.so``
* ``libc++_shared.so``
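Putting the two destinations together, a minimal push sequence might look like this (run from the directories where the model and the extracted ``native_client.tar.xz`` live):
.. code-block::
adb push output_graph.tflite /sdcard/mozilla_voice_stt/
adb push mozilla_voice_stt /data/local/tmp/stt/
adb push libmozilla_voice_stt.so /data/local/tmp/stt/
adb push libc++_shared.so /data/local/tmp/stt/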
@ -296,8 +296,8 @@ You should then be able to run as usual, using a shell from ``adb shell``\ :
.. code-block::
user@device$ cd /data/local/tmp/ds/
user@device$ LD_LIBRARY_PATH=$(pwd)/ ./deepspeech [...]
user@device$ cd /data/local/tmp/stt/
user@device$ LD_LIBRARY_PATH=$(pwd)/ ./mozilla_voice_stt [...]
Please note that the Android linker does not support ``rpath``, so you have to set
``LD_LIBRARY_PATH``. Properly wrapped / packaged bindings do embed the library

View File

@ -2,17 +2,17 @@
==============
DeepSpeech Class
----------------
MozillaVoiceSttModel Class
--------------------------
.. doxygenclass:: DeepSpeechClient::DeepSpeech
.. doxygenclass:: MozillaVoiceSttClient::MozillaVoiceSttModel
:project: deepspeech-dotnet
:members:
DeepSpeechStream Class
----------------------
MozillaVoiceSttStream Class
---------------------------
.. doxygenclass:: DeepSpeechClient::Models::DeepSpeechStream
.. doxygenclass:: MozillaVoiceSttClient::Models::MozillaVoiceSttStream
:project: deepspeech-dotnet
:members:
@ -21,33 +21,33 @@ ErrorCodes
See also the main definition including descriptions for each error in :ref:`error-codes`.
.. doxygenenum:: DeepSpeechClient::Enums::ErrorCodes
.. doxygenenum:: MozillaVoiceSttClient::Enums::ErrorCodes
:project: deepspeech-dotnet
Metadata
--------
.. doxygenclass:: DeepSpeechClient::Models::Metadata
.. doxygenclass:: MozillaVoiceSttClient::Models::Metadata
:project: deepspeech-dotnet
:members: Transcripts
CandidateTranscript
-------------------
.. doxygenclass:: DeepSpeechClient::Models::CandidateTranscript
.. doxygenclass:: MozillaVoiceSttClient::Models::CandidateTranscript
:project: deepspeech-dotnet
:members: Tokens, Confidence
TokenMetadata
-------------
.. doxygenclass:: DeepSpeechClient::Models::TokenMetadata
.. doxygenclass:: MozillaVoiceSttClient::Models::TokenMetadata
:project: deepspeech-dotnet
:members: Text, Timestep, StartTime
DeepSpeech Interface
--------------------
IMozillaVoiceSttModel Interface
-------------------------------
.. doxygeninterface:: DeepSpeechClient::Interfaces::IDeepSpeech
.. doxygeninterface:: MozillaVoiceSttClient::Interfaces::IMozillaVoiceSttModel
:project: deepspeech-dotnet
:members:

View File

@ -1,12 +1,12 @@
.NET API Usage example
======================
Examples are from `native_client/dotnet/DeepSpeechConsole/Program.cs`.
Examples are from `native_client/dotnet/MozillaVoiceSttConsole/Program.cs`.
Creating a model instance and loading model
-------------------------------------------
.. literalinclude:: ../native_client/dotnet/DeepSpeechConsole/Program.cs
.. literalinclude:: ../native_client/dotnet/MozillaVoiceSttConsole/Program.cs
:language: csharp
:linenos:
:lineno-match:
@ -16,7 +16,7 @@ Creating a model instance and loading model
Performing inference
--------------------
.. literalinclude:: ../native_client/dotnet/DeepSpeechConsole/Program.cs
.. literalinclude:: ../native_client/dotnet/MozillaVoiceSttConsole/Program.cs
:language: csharp
:linenos:
:lineno-match:
@ -26,4 +26,4 @@ Performing inference
Full source code
----------------
See :download:`Full source code<../native_client/dotnet/DeepSpeechConsole/Program.cs>`.
See :download:`Full source code<../native_client/dotnet/MozillaVoiceSttConsole/Program.cs>`.

View File

@ -1,29 +1,29 @@
Java
====
DeepSpeechModel
---------------
MozillaVoiceSttModel
--------------------
.. doxygenclass:: org::mozilla::deepspeech::libdeepspeech::DeepSpeechModel
.. doxygenclass:: org::mozilla::voice::stt::MozillaVoiceSttModel
:project: deepspeech-java
:members:
Metadata
--------
.. doxygenclass:: org::mozilla::deepspeech::libdeepspeech::Metadata
.. doxygenclass:: org::mozilla::voice::stt::Metadata
:project: deepspeech-java
:members: getNumTranscripts, getTranscript
CandidateTranscript
-------------------
.. doxygenclass:: org::mozilla::deepspeech::libdeepspeech::CandidateTranscript
.. doxygenclass:: org::mozilla::voice::stt::CandidateTranscript
:project: deepspeech-java
:members: getNumTokens, getConfidence, getToken
TokenMetadata
-------------
.. doxygenclass:: org::mozilla::deepspeech::libdeepspeech::TokenMetadata
.. doxygenclass:: org::mozilla::voice::stt::TokenMetadata
:project: deepspeech-java
:members: getText, getTimestep, getStartTime

View File

@ -1,12 +1,12 @@
Java API Usage example
======================
Examples are from `native_client/java/app/src/main/java/org/mozilla/deepspeech/DeepSpeechActivity.java`.
Examples are from `native_client/java/app/src/main/java/org/mozilla/voice/sttapp/MozillaVoiceSttActivity.java`.
Creating a model instance and loading model
-------------------------------------------
.. literalinclude:: ../native_client/java/app/src/main/java/org/mozilla/deepspeech/DeepSpeechActivity.java
.. literalinclude:: ../native_client/java/app/src/main/java/org/mozilla/voice/sttapp/MozillaVoiceSttActivity.java
:language: java
:linenos:
:lineno-match:
@ -16,7 +16,7 @@ Creating a model instance and loading model
Performing inference
--------------------
.. literalinclude:: ../native_client/java/app/src/main/java/org/mozilla/deepspeech/DeepSpeechActivity.java
.. literalinclude:: ../native_client/java/app/src/main/java/org/mozilla/voice/sttapp/MozillaVoiceSttActivity.java
:language: java
:linenos:
:lineno-match:
@ -26,4 +26,4 @@ Performing inference
Full source code
----------------
See :download:`Full source code<../native_client/java/app/src/main/java/org/mozilla/deepspeech/DeepSpeechActivity.java>`.
See :download:`Full source code<../native_client/java/app/src/main/java/org/mozilla/voice/sttapp/MozillaVoiceSttActivity.java>`.

View File

@ -1,8 +1,8 @@
Parallel Optimization
=====================
This is how we implement optimization of the DeepSpeech model across GPUs on a
single host. Parallel optimization can take on various forms. For example
This is how we implement optimization of the Mozilla Voice STT model across GPUs
on a single host. Parallel optimization can take on various forms. For example
one can use asynchronous updates of the model, synchronous updates of the model,
or some combination of the two.

View File

@ -9,61 +9,61 @@ Linux / AMD64 without GPU
^^^^^^^^^^^^^^^^^^^^^^^^^
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
* Ubuntu 14.04+ (glibc >= 2.19, libstdc++6 >= 4.8)
* Full TensorFlow runtime (``deepspeech`` packages)
* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
* Full TensorFlow runtime (``mozilla_voice_stt`` packages)
* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
Linux / AMD64 with GPU
^^^^^^^^^^^^^^^^^^^^^^
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
* Ubuntu 14.04+ (glibc >= 2.19, libstdc++6 >= 4.8)
* CUDA 10.0 (and capable GPU)
* Full TensorFlow runtime (``deepspeech`` packages)
* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
* Full TensorFlow runtime (``mozilla_voice_stt`` packages)
* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
Linux / ARMv7
^^^^^^^^^^^^^
* Cortex-A53 compatible ARMv7 SoC with Neon support
* Raspbian Buster-compatible distribution
* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
Linux / Aarch64
^^^^^^^^^^^^^^^
* Cortex-A72 compatible Aarch64 SoC
* ARMbian Buster-compatible distribution
* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
Android / ARMv7
^^^^^^^^^^^^^^^
* ARMv7 SoC with Neon support
* Android 7.0-10.0
* NDK API level >= 21
* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
Android / Aarch64
^^^^^^^^^^^^^^^^^
* Aarch64 SoC
* Android 7.0-10.0
* NDK API level >= 21
* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
macOS / AMD64
^^^^^^^^^^^^^
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
* macOS >= 10.10
* Full TensorFlow runtime (``deepspeech`` packages)
* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
* Full TensorFlow runtime (``mozilla_voice_stt`` packages)
* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
Windows / AMD64 without GPU
^^^^^^^^^^^^^^^^^^^^^^^^^^^
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
* Windows Server >= 2012 R2 ; Windows >= 8.1
* Full TensorFlow runtime (``deepspeech`` packages)
* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
* Full TensorFlow runtime (``mozilla_voice_stt`` packages)
* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
Windows / AMD64 with GPU
^^^^^^^^^^^^^^^^^^^^^^^^
* x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
* Windows Server >= 2012 R2 ; Windows >= 8.1
* CUDA 10.0 (and capable GPU)
* Full TensorFlow runtime (``deepspeech`` packages)
* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
* Full TensorFlow runtime (``mozilla_voice_stt`` packages)
* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)

View File

@ -21,11 +21,11 @@ Clone the Mozilla Voice STT repository:
Creating a virtual environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In creating a virtual environment you will create a directory containing a ``python3`` binary and everything needed to run deepspeech. You can use whatever directory you want. For the purpose of the documentation, we will rely on ``$HOME/tmp/deepspeech-train-venv``. You can create it using this command:
Creating a virtual environment will set up a directory containing a ``python3`` binary and everything needed to run Mozilla Voice STT. You can use whatever directory you want. For the purposes of this documentation, we will rely on ``$HOME/tmp/stt-train-venv``. You can create it using this command:
.. code-block::
$ python3 -m venv $HOME/tmp/deepspeech-train-venv/
$ python3 -m venv $HOME/tmp/stt-train-venv/
Once this command completes successfully, the environment will be ready to be activated.
@ -36,7 +36,7 @@ Each time you need to work with Mozilla Voice STT, you have to *activate* this v
.. code-block::
$ source $HOME/tmp/deepspeech-train-venv/bin/activate
$ source $HOME/tmp/stt-train-venv/bin/activate
Installing Mozilla Voice STT Training Code and its dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

View File

@ -13,7 +13,7 @@ Inference using a Mozilla Voice STT pre-trained model can be done with a client/
.. _runtime-deps:
Running ``deepspeech`` might, see below, require some runtime dependencies to be already installed on your system:
Running ``mozilla_voice_stt`` might require some runtime dependencies to be already installed on your system (see below):
* ``sox`` - The Python and Node.JS clients use SoX to resample files to 16kHz.
* ``libgomp1`` - libsox (statically linked into the clients) depends on OpenMP. Some people have had to install this manually.
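On Debian- or Ubuntu-based systems, for example, both can usually be installed through the package manager (package names may differ on other distributions):
.. code-block:: bash
sudo apt-get install sox libgomp1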
@ -28,7 +28,7 @@ Please refer to your system's documentation on how to install these dependencies
CUDA dependency
^^^^^^^^^^^^^^^
The GPU capable builds (Python, NodeJS, C++, etc) depend on CUDA 10.1 and CuDNN v7.6.
The CUDA capable builds (Python, NodeJS, C++, etc) depend on CUDA 10.1 and CuDNN v7.6.
Getting the pre-trained model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -40,17 +40,17 @@ If you want to use the pre-trained English model for performing speech-to-text,
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.4/deepspeech-0.7.4-models.pbmm
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.4/deepspeech-0.7.4-models.scorer
There are several pre-trained model files available in official releases. Files ending in ``.pbmm`` are compatible with clients and language bindings built against the standard TensorFlow runtime. Usually these packages are simply called ``deepspeech``. These files are also compatible with CUDA enabled clients and language bindings. These packages are usually called ``deepspeech-gpu``. Files ending in ``.tflite`` are compatible with clients and language bindings built against the `TensorFlow Lite runtime <https://www.tensorflow.org/lite/>`_. These models are optimized for size and performance in low power devices. On desktop platforms, the compatible packages are called ``deepspeech-tflite``. On Android and Raspberry Pi, we only publish TensorFlow Lite enabled packages, and they are simply called ``deepspeech``. You can see a full list of supported platforms and which TensorFlow runtime is supported at :ref:`supported-platforms-inference`.
There are several pre-trained model files available in official releases. Files ending in ``.pbmm`` are compatible with clients and language bindings built against the standard TensorFlow runtime. Usually these packages are simply called ``mozilla_voice_stt``. These files are also compatible with CUDA enabled clients and language bindings. These packages are usually called ``mozilla_voice_stt_cuda``. Files ending in ``.tflite`` are compatible with clients and language bindings built against the `TensorFlow Lite runtime <https://www.tensorflow.org/lite/>`_. These models are optimized for size and performance in low power devices. On desktop platforms, the compatible packages are called ``mozilla_voice_stt_tflite``. On Android and Raspberry Pi, we only publish TensorFlow Lite enabled packages, and they are simply called ``mozilla_voice_stt``. You can see a full list of supported platforms and which TensorFlow runtime is supported at :ref:`supported-platforms-inference`.
+--------------------+---------------------+---------------------+
| Package/Model type | .pbmm | .tflite |
+====================+=====================+=====================+
| deepspeech | Depends on platform | Depends on platform |
+--------------------+---------------------+---------------------+
| deepspeech-gpu | ✅ | ❌ |
+--------------------+---------------------+---------------------+
| deepspeech-tflite | ❌ | ✅ |
+--------------------+---------------------+---------------------+
+--------------------------+---------------------+---------------------+
| Package/Model type | .pbmm | .tflite |
+==========================+=====================+=====================+
| mozilla_voice_stt | Depends on platform | Depends on platform |
+--------------------------+---------------------+---------------------+
| mozilla_voice_stt_cuda | ✅ | ❌ |
+--------------------------+---------------------+---------------------+
| mozilla_voice_stt_tflite | ❌ | ✅ |
+--------------------------+---------------------+---------------------+
Finally, the pre-trained model files also include files ending in ``.scorer``. These are external scorers (language models) that are used at inference time in conjunction with an acoustic model (``.pbmm`` or ``.tflite`` file) to produce transcriptions. We also provide further documentation on :ref:`the decoding process <decoder-docs>` and :ref:`how scorers are generated <scorer-scripts>`.
@ -73,7 +73,7 @@ Mozilla Voice STT models are versioned to keep you from trying to use an incompa
Using the Python package
^^^^^^^^^^^^^^^^^^^^^^^^
Pre-built binaries which can be used for performing inference with a trained model can be installed with ``pip3``. You can then use the ``deepspeech`` binary to do speech-to-text on an audio file:
Pre-built binaries which can be used for performing inference with a trained model can be installed with ``pip3``. You can then use the ``mozilla_voice_stt`` binary to do speech-to-text on an audio file:
For the Python bindings, it is highly recommended that you perform the installation within a Python 3.5 or later virtual environment. You can find more information about those in `this documentation <http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_.
@ -82,11 +82,11 @@ We will continue under the assumption that you already have your system properly
Create a Mozilla Voice STT virtual environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In creating a virtual environment you will create a directory containing a ``python3`` binary and everything needed to run deepspeech. You can use whatever directory you want. For the purpose of the documentation, we will rely on ``$HOME/tmp/deepspeech-venv``. You can create it using this command:
Creating a virtual environment will set up a directory containing a ``python3`` binary and everything needed to run Mozilla Voice STT. You can use whatever directory you want. For the purposes of this documentation, we will rely on ``$HOME/tmp/stt-venv``. You can create it using this command:
.. code-block::
$ virtualenv -p python3 $HOME/tmp/deepspeech-venv/
$ virtualenv -p python3 $HOME/tmp/stt-venv/
Once this command completes successfully, the environment will be ready to be activated.
@ -97,46 +97,46 @@ Each time you need to work with Mozilla Voice STT, you have to *activate* this v
.. code-block::
$ source $HOME/tmp/deepspeech-venv/bin/activate
$ source $HOME/tmp/stt-venv/bin/activate
Installing Mozilla Voice STT Python bindings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once your environment has been set-up and loaded, you can use ``pip3`` to manage packages locally. On a fresh setup of the ``virtualenv``\ , you will have to install the Mozilla Voice STT wheel. You can check if ``deepspeech`` is already installed with ``pip3 list``.
Once your environment has been set up and loaded, you can use ``pip3`` to manage packages locally. On a fresh setup of the ``virtualenv``\ , you will have to install the Mozilla Voice STT wheel. You can check if ``mozilla_voice_stt`` is already installed with ``pip3 list``.
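For example (depending on your ``pip3`` version, the listed name may use dashes rather than underscores):
.. code-block::
$ pip3 list | grep -i mozilla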
To perform the installation, just use ``pip3`` as such:
.. code-block::
$ pip3 install deepspeech
$ pip3 install mozilla_voice_stt
If ``deepspeech`` is already installed, you can update it as such:
If ``mozilla_voice_stt`` is already installed, you can update it as such:
.. code-block::
$ pip3 install --upgrade deepspeech
$ pip3 install --upgrade mozilla_voice_stt
Alternatively, if you have a supported NVIDIA GPU on Linux, you can install the GPU specific package as follows:
Alternatively, if you have a supported NVIDIA GPU on Linux, you can install the CUDA specific package as follows:
.. code-block::
$ pip3 install deepspeech-gpu
$ pip3 install mozilla_voice_stt_cuda
See the `release notes <https://github.com/mozilla/DeepSpeech/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.
You can update ``deepspeech-gpu`` as follows:
You can update ``mozilla_voice_stt_cuda`` as follows:
.. code-block::
$ pip3 install --upgrade deepspeech-gpu
$ pip3 install --upgrade mozilla_voice_stt_cuda
In both cases, ``pip3`` should take care of installing all the required dependencies. After installation has finished, you should be able to call ``deepspeech`` from the command-line.
In both cases, ``pip3`` should take care of installing all the required dependencies. After installation has finished, you should be able to call ``mozilla_voice_stt`` from the command-line.
Note: the following command assumes you `downloaded the pre-trained model <#getting-the-pre-trained-model>`_.
.. code-block:: bash
deepspeech --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio my_audio_file.wav
mozilla_voice_stt --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio my_audio_file.wav
The ``--scorer`` argument is optional, and represents an external language model to be used when transcribing the audio.
@ -151,7 +151,7 @@ You can download the JS bindings using ``npm``\ :
.. code-block:: bash
npm install deepspeech
npm install mozilla_voice_stt
Please note that as of now, we support:
- Node.JS versions 4 to 13.
@ -159,11 +159,11 @@ Please note that as of now, we support:
TypeScript support is also provided.
Alternatively, if you're using Linux and have a supported NVIDIA GPU, you can install the GPU specific package as follows:
Alternatively, if you're using Linux and have a supported NVIDIA GPU, you can install the CUDA specific package as follows:
.. code-block:: bash
npm install deepspeech-gpu
npm install mozilla_voice_stt_cuda
See the `release notes <https://github.com/mozilla/DeepSpeech/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.
@ -174,7 +174,7 @@ See the :ref:`TypeScript client <js-api-example>` for an example of how to use t
Using the command-line client
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To download the pre-built binaries for the ``deepspeech`` command-line (compiled C++) client, use ``util/taskcluster.py``\ :
To download the pre-built binaries for the ``mozilla_voice_stt`` command-line (compiled C++) client, use ``util/taskcluster.py``\ :
.. code-block:: bash
@ -192,7 +192,7 @@ also, if you need some binaries different than current master, like ``v0.2.0-alp
python3 util/taskcluster.py --branch "v0.2.0-alpha.6" --target "."
The script ``taskcluster.py`` will download ``native_client.tar.xz`` (which includes the ``deepspeech`` binary and associated libraries) and extract it into the current folder. Also, ``taskcluster.py`` will download binaries for Linux/x86_64 by default, but you can override that behavior with the ``--arch`` parameter. See the help info with ``python util/taskcluster.py -h`` for more details. Specific branches of Mozilla Voice STT or TensorFlow can be specified as well.
The script ``taskcluster.py`` will download ``native_client.tar.xz`` (which includes the ``mozilla_voice_stt`` binary and associated libraries) and extract it into the current folder. Also, ``taskcluster.py`` will download binaries for Linux/x86_64 by default, but you can override that behavior with the ``--arch`` parameter. See the help info with ``python util/taskcluster.py -h`` for more details. Specific branches of Mozilla Voice STT or TensorFlow can be specified as well.
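For example, a hypothetical fetch of binaries for another architecture could look like this; the ``--arch`` value here is only an assumption for illustration, so check the ``-h`` output for the values actually accepted:
.. code-block:: bash
python3 util/taskcluster.py --arch arm --target .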
Alternatively you may manually download the ``native_client.tar.xz`` from the `releases page <https://github.com/mozilla/DeepSpeech/releases>`_.
@ -200,9 +200,9 @@ Note: the following command assumes you `downloaded the pre-trained model <#gett
.. code-block:: bash
./deepspeech --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio_input.wav
./mozilla_voice_stt --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio_input.wav
See the help output with ``./deepspeech -h`` for more details.
See the help output with ``./mozilla_voice_stt -h`` for more details.
Installing bindings from source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

View File

@ -24,7 +24,7 @@ import sys
sys.path.insert(0, os.path.abspath('../'))
autodoc_mock_imports = ['deepspeech']
autodoc_mock_imports = ['mozilla_voice_stt']
# This is in fact only relevant on ReadTheDocs, but we want to run the same way
# on our CI as in RTD to avoid regressions on RTD that we would not catch on
@ -143,7 +143,7 @@ html_static_path = ['.static']
# -- Options for HTMLHelp output ------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'DeepSpeechdoc'
htmlhelp_basename = 'sttdoc'
# -- Options for LaTeX output ---------------------------------------------
@ -180,7 +180,7 @@ latex_documents = [
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'deepspeech', u'Mozilla Voice STT Documentation',
(master_doc, 'mozilla_voice_stt', u'Mozilla Voice STT Documentation',
[author], 1)
]

View File

@ -790,7 +790,7 @@ WARN_LOGFILE =
# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
# Note: If this tag is empty the current directory is searched.
INPUT = native_client/dotnet/DeepSpeechClient/ native_client/dotnet/DeepSpeechClient/Interfaces/ native_client/dotnet/DeepSpeechClient/Enums/ native_client/dotnet/DeepSpeechClient/Models/
INPUT = native_client/dotnet/MozillaVoiceSttClient/ native_client/dotnet/MozillaVoiceSttClient/Interfaces/ native_client/dotnet/MozillaVoiceSttClient/Enums/ native_client/dotnet/MozillaVoiceSttClient/Models/
# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses

View File

@ -790,7 +790,7 @@ WARN_LOGFILE =
# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
# Note: If this tag is empty the current directory is searched.
INPUT = native_client/java/libdeepspeech/src/main/java/org/mozilla/deepspeech/libdeepspeech/ native_client/java/libdeepspeech/src/main/java/org/mozilla/deepspeech/libdeepspeech_doc/
INPUT = native_client/java/libmozillavoicestt/src/main/java/org/mozilla/voice/stt/ native_client/java/libmozillavoicestt/src/main/java/org/mozilla/voice/stt_doc/
# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses

View File

@ -13,11 +13,11 @@ To install and use Mozilla Voice STT all you have to do is:
.. code-block:: bash
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-venv/
source $HOME/tmp/deepspeech-venv/bin/activate
virtualenv -p python3 $HOME/tmp/stt-venv/
source $HOME/tmp/stt-venv/bin/activate
# Install Mozilla Voice STT
pip3 install deepspeech
pip3 install mozilla_voice_stt
# Download pre-trained English model files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.4/deepspeech-0.7.4-models.pbmm
@ -28,27 +28,27 @@ To install and use Mozilla Voice STT all you have to do is:
tar xvf audio-0.7.4.tar.gz
# Transcribe an audio file
deepspeech --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio/2830-3980-0043.wav
mozilla_voice_stt --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio/2830-3980-0043.wav
A pre-trained English model is available for use and can be downloaded following the instructions in :ref:`the usage docs <usage-docs>`. For the latest release, including pre-trained models and checkpoints, `see the GitHub releases page <https://github.com/mozilla/DeepSpeech/releases/latest>`_.
Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the `release notes <https://github.com/mozilla/DeepSpeech/releases/latest>`_ to find which GPUs are supported. To run ``deepspeech`` on a GPU, install the GPU specific package:
Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the `release notes <https://github.com/mozilla/DeepSpeech/releases/latest>`_ to find which GPUs are supported. To run ``mozilla_voice_stt`` on a GPU, install the GPU specific package:
.. code-block:: bash
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
source $HOME/tmp/deepspeech-gpu-venv/bin/activate
virtualenv -p python3 $HOME/tmp/stt-gpu-venv/
source $HOME/tmp/stt-gpu-venv/bin/activate
# Install Mozilla Voice STT CUDA enabled package
pip3 install deepspeech-gpu
pip3 install mozilla_voice_stt_cuda
# Transcribe an audio file.
deepspeech --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio/2830-3980-0043.wav
mozilla_voice_stt --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio/2830-3980-0043.wav
Please ensure you have the required :ref:`CUDA dependencies <cuda-deps>`.
See the output of ``deepspeech -h`` for more information on the use of ``deepspeech``. (If you experience problems running ``deepspeech``, please check :ref:`required runtime dependencies <runtime-deps>`).
See the output of ``mozilla_voice_stt -h`` for more information on the use of ``mozilla_voice_stt``. (If you experience problems running ``mozilla_voice_stt``, please check :ref:`required runtime dependencies <runtime-deps>`).
.. toctree::
:maxdepth: 2

View File

@ -19,11 +19,8 @@ from six.moves import zip, range
r'''
This module should be self-contained:
- build libmozilla_voice_stt.so with TFLite:
- bazel build [...] --define=runtime=tflite [...] //native_client:libmozilla_voice_stt.so
- make -C native_client/python/ TFDIR=... bindings
- setup a virtualenv
- pip install native_client/python/dist/deepspeech*.whl
- pip install mozilla_voice_stt_tflite
- pip install -r requirements_eval_tflite.txt
Then run with a TF Lite model, a scorer and a CSV test file

View File

@ -1,6 +1,6 @@
Examples
========
DeepSpeech examples were moved to a separate repository.
Mozilla Voice STT examples were moved to a separate repository.
New location: https://github.com/mozilla/DeepSpeech-examples

View File

@ -1,5 +1,5 @@
This file contains some notes on coding style within the C++ portion of the
DeepSpeech project. It is very much a work in progress and incomplete.
Mozilla Voice STT project. It is very much a work in progress and incomplete.
General
=======

View File

@ -1,6 +1,6 @@
absl-py==0.9.0
attrdict==2.0.1
deepspeech
mozilla_voice_stt_tflite
numpy==1.16.0
progressbar2==3.47.0
python-utils==2.3.0

View File

@ -26,7 +26,7 @@ then:
DEEPSPEECH_AUDIO: "https://github.com/mozilla/DeepSpeech/releases/download/v0.4.1/audio-0.4.1.tar.gz"
PIP_DEFAULT_TIMEOUT: "60"
EXAMPLES_CLONE_URL: "https://github.com/mozilla/DeepSpeech-examples"
EXAMPLES_CHECKOUT_TARGET: "rename-test"
EXAMPLES_CHECKOUT_TARGET: "master"
command:
- "/bin/bash"

View File

@ -7,4 +7,4 @@ source $(dirname "$0")/tc-tests-utils.sh
mkdir -p ${TASKCLUSTER_ARTIFACTS} || true
# NodeJS package
cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/mozilla_voice_stt-*.tgz ${TASKCLUSTER_ARTIFACTS}/
cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/mozilla_voice_stt*.tgz ${TASKCLUSTER_ARTIFACTS}/

View File

@ -14,7 +14,7 @@ package_libdeepspeech_as_zip "libmozilla_voice_stt.zip"
if [ -d ${DS_ROOT_TASK}/DeepSpeech/ds/wheels ]; then
cp ${DS_ROOT_TASK}/DeepSpeech/ds/wheels/* ${TASKCLUSTER_ARTIFACTS}/
cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/mozilla_voice_stt-*.tgz ${TASKCLUSTER_ARTIFACTS}/
cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/mozilla_voice_stt*.tgz ${TASKCLUSTER_ARTIFACTS}/
fi;
if [ -f ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/wrapper.tar.gz ]; then

View File

@ -17,9 +17,9 @@ do_deepspeech_python_build()
SETUP_FLAGS=""
if [ "${package_option}" = "--cuda" ]; then
SETUP_FLAGS="--project_name mozilla_voice_stt-gpu"
SETUP_FLAGS="--project_name mozilla_voice_stt_cuda"
elif [ "${package_option}" = "--tflite" ]; then
SETUP_FLAGS="--project_name mozilla_voice_stt-tflite"
SETUP_FLAGS="--project_name mozilla_voice_stt_tflite"
fi
for pyver_conf in ${SUPPORTED_PYTHON_VERSIONS}; do
@ -133,7 +133,7 @@ do_deepspeech_nodejs_build()
done;
if [ "${rename_to_gpu}" = "--cuda" ]; then
make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt-gpu
make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt_cuda
else
make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt
fi
@ -165,9 +165,9 @@ do_deepspeech_npm_package()
done;
if [ "${package_option}" = "--cuda" ]; then
make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt-gpu
make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt_cuda
elif [ "${package_option}" = "--tflite" ]; then
make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt-tflite
make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt_tflite
else
make -C native_client/javascript clean npm-pack
fi

View File

@ -7,8 +7,8 @@ get_dep_npm_pkg_url()
{
local all_deps="$(curl -s https://community-tc.services.mozilla.com/api/queue/v1/task/${TASK_ID} | python -c 'import json; import sys; print(" ".join(json.loads(sys.stdin.read())["dependencies"]));')"
# We try "mozilla_voice_stt-tflite" and "mozilla_voice_stt-gpu" first and if we don't find it we try "mozilla_voice_stt"
for pkg_basename in "mozilla_voice_stt-tflite" "mozilla_voice_stt-gpu" "mozilla_voice_stt"; do
# We try "mozilla_voice_stt_tflite" and "mozilla_voice_stt_cuda" first and if we don't find it we try "mozilla_voice_stt"
for pkg_basename in "mozilla_voice_stt_tflite" "mozilla_voice_stt_cuda" "mozilla_voice_stt"; do
local deepspeech_pkg="${pkg_basename}-${DS_VERSION}.tgz"
for dep in ${all_deps}; do
local has_artifact=$(curl -s https://community-tc.services.mozilla.com/api/queue/v1/task/${dep}/artifacts | python -c 'import json; import sys; has_artifact = True in [ e["name"].find("'${deepspeech_pkg}'") > 0 for e in json.loads(sys.stdin.read())["artifacts"] ]; print(has_artifact)')

View File

@ -14,7 +14,7 @@ download_data
virtualenv_activate "${pyalias}" "deepspeech"
if [ "$3" = "cuda" ]; then
deepspeech_pkg_url=$(get_python_pkg_url "${pyver_pkg}" "${py_unicode_type}" "mozilla_voice_stt_gpu")
deepspeech_pkg_url=$(get_python_pkg_url "${pyver_pkg}" "${py_unicode_type}" "mozilla_voice_stt_cuda")
else
deepspeech_pkg_url=$(get_python_pkg_url "${pyver_pkg}" "${py_unicode_type}")
fi;

View File

@ -22,5 +22,5 @@ fi;
if [ -f ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/wrapper.tar.gz ]; then
cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/wrapper.tar.gz ${TASKCLUSTER_ARTIFACTS}/
cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/mozilla_voice_stt-*.tgz ${TASKCLUSTER_ARTIFACTS}/
cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/mozilla_voice_stt*.tgz ${TASKCLUSTER_ARTIFACTS}/
fi;

View File

@ -47,14 +47,14 @@ def check_ctcdecoder_version():
from ds_ctcdecoder import __version__ as decoder_version
except ImportError as e:
if e.msg.find('__version__') > 0:
print("DeepSpeech version ({ds_version}) requires CTC decoder to expose __version__. "
print("Mozilla Voice STT version ({ds_version}) requires CTC decoder to expose __version__. "
"Please upgrade the ds_ctcdecoder package to version {ds_version}".format(ds_version=ds_version_s))
sys.exit(1)
raise e
rv = semver.compare(ds_version_s, decoder_version)
if rv != 0:
print("DeepSpeech version ({}) and CTC decoder version ({}) do not match. "
print("Mozilla Voice STT version ({}) and CTC decoder version ({}) do not match. "
"Please ensure matching versions are in use.".format(ds_version_s, decoder_version))
sys.exit(1)