Address review comments

commit 0b51004081 (parent 4d98958b77)
@@ -1,5 +1,5 @@
 This file contains a list of papers in chronological order that have been published
-using Mozilla's DeepSpeech.
+using Mozilla Voice STT.
 
 To appear
 ==========
@@ -1,5 +1,5 @@
-Project DeepSpeech
-==================
+Mozilla Voice STT
+=================
 
 
 .. image:: https://readthedocs.org/projects/deepspeech/badge/?version=latest
@@ -12,7 +12,7 @@ Project DeepSpeech
    :alt: Task Status
 
 
-DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on `Baidu's Deep Speech research paper <https://arxiv.org/abs/1412.5567>`_. Project DeepSpeech uses Google's `TensorFlow <https://www.tensorflow.org/>`_ to make the implementation easier.
+Mozilla Voice STT is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on `Baidu's Deep Speech research paper <https://arxiv.org/abs/1412.5567>`_. Mozilla Voice STT uses Google's `TensorFlow <https://www.tensorflow.org/>`_ to make the implementation easier.
 
 Documentation for installation, usage, and training models is available on `deepspeech.readthedocs.io <http://deepspeech.readthedocs.io/?badge=latest>`_.
 
@@ -99,7 +99,7 @@ Now, ``cd`` into the ``DeepSpeech/native_client`` directory and use the ``Makefi
 .. code-block::
 
    cd ../DeepSpeech/native_client
-   make deepspeech
+   make mozilla_voice_stt
 
 Installing your own Binaries
 ----------------------------
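The target rename above is one instance of the mechanical ``deepspeech`` → ``mozilla_voice_stt`` substitution this commit applies everywhere. As an illustration only (not part of the commit), the mapping can be sketched in Python; the table is reconstructed from the hunks in this diff and is not exhaustive:

```python
# Old -> new identifiers used across this commit (reconstructed from the
# hunks in this diff; illustrative, not exhaustive).
RENAMES = {
    "deepspeech-gpu": "mozilla_voice_stt_cuda",
    "deepspeech-tflite": "mozilla_voice_stt_tflite",
    "deepspeech": "mozilla_voice_stt",
}

def apply_renames(text: str) -> str:
    # Replace longest keys first so "deepspeech-gpu" is not clobbered
    # by the bare "deepspeech" replacement.
    for old in sorted(RENAMES, key=len, reverse=True):
        text = text.replace(old, RENAMES[old])
    return text

print(apply_renames("make deepspeech"))              # make mozilla_voice_stt
print(apply_renames("pip3 install deepspeech-gpu"))  # pip3 install mozilla_voice_stt_cuda
```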
@@ -121,7 +121,7 @@ Included are a set of generated Python bindings. After following the above build
 
    cd native_client/python
    make bindings
-   pip install dist/deepspeech*
+   pip install dist/mozilla_voice_stt*
 
 The API mirrors the C++ API and is demonstrated in `client.py <python/client.py>`_. Refer to the `C API <c-usage>` for documentation.
 
@@ -175,13 +175,13 @@ And your command line for ``LePotato`` and ``ARM64`` should look like:
 
 While we test only on RPi3 Raspbian Buster and LePotato ARMBian Buster, anything compatible with ``armv7-a cortex-a53`` or ``armv8-a cortex-a53`` should be fine.
 
-The ``deepspeech`` binary can also be cross-built, with ``TARGET=rpi3`` or ``TARGET=rpi3-armv8``. This might require you to set up a system tree using the tool ``multistrap`` and the multistrap configuration files: ``native_client/multistrap_armbian64_buster.conf`` and ``native_client/multistrap_raspbian_buster.conf``.
+The ``mozilla_voice_stt`` binary can also be cross-built, with ``TARGET=rpi3`` or ``TARGET=rpi3-armv8``. This might require you to set up a system tree using the tool ``multistrap`` and the multistrap configuration files: ``native_client/multistrap_armbian64_buster.conf`` and ``native_client/multistrap_raspbian_buster.conf``.
 The path of the system tree can be overridden from the default values defined in ``definitions.mk`` through the ``RASPBIAN`` ``make`` variable.
 
 .. code-block::
 
    cd ../DeepSpeech/native_client
-   make TARGET=<system> deepspeech
+   make TARGET=<system> mozilla_voice_stt
 
 Android devices support
 -----------------------
@@ -236,10 +236,10 @@ Please note that you might have to copy the file to a local Maven repository
 and adapt file naming (when missing, the error message should state what
 filename it expects and where).
 
-Building C++ ``deepspeech`` binary
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Building C++ ``mozilla_voice_stt`` binary
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Building the ``deepspeech`` binary will happen through ``ndk-build`` (ARMv7):
+Building the ``mozilla_voice_stt`` binary will happen through ``ndk-build`` (ARMv7):
 
 .. code-block::
 
@@ -272,13 +272,13 @@ demo of one usage of the application. For example, it's only able to read PCM
 mono 16kHz 16-bit files and it might fail on some WAVE files that do not
 follow the specification exactly.
 
-Running ``deepspeech`` via adb
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Running ``mozilla_voice_stt`` via adb
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 You should use ``adb push`` to send data to the device; please refer to the Android
 documentation on how to use that.
 
-Please push Mozilla Voice STT data to ``/sdcard/deepspeech/``\ , including:
+Please push Mozilla Voice STT data to ``/sdcard/mozilla_voice_stt/``\ , including:
 
 
 * ``output_graph.tflite`` which is the TF Lite model
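The mono/16 kHz/16-bit constraint above can be checked before pushing audio to the device. A self-contained sketch using Python's standard ``wave`` module (illustrative only, not part of this commit):

```python
import os
import tempfile
import wave

def is_supported_wav(path: str) -> bool:
    """True for PCM mono 16 kHz 16-bit, the only format the demo client reads."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1
                and w.getframerate() == 16000
                and w.getsampwidth() == 2)

# Demo: write a tiny conforming file, then check it.
fd, path = tempfile.mkstemp(suffix=".wav")
os.close(fd)
with wave.open(path, "wb") as w:
    w.setnchannels(1)       # mono
    w.setframerate(16000)   # 16 kHz
    w.setsampwidth(2)       # 16-bit samples
    w.writeframes(b"\x00\x00" * 160)  # 10 ms of silence

print(is_supported_wav(path))  # True
```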
@@ -286,9 +286,9 @@ Please push Mozilla Voice STT data to ``/sdcard/deepspeech/``\ , including:
 the scorer; please be aware that too big a scorer will make the device run out
 of memory
 
-Then, push binaries from ``native_client.tar.xz`` to ``/data/local/tmp/ds``\ :
+Then, push binaries from ``native_client.tar.xz`` to ``/data/local/tmp/stt``\ :
 
-* ``deepspeech``
+* ``mozilla_voice_stt``
 * ``libmozilla_voice_stt.so``
 * ``libc++_shared.so``
 
@@ -296,8 +296,8 @@ You should then be able to run as usual, using a shell from ``adb shell``\ :
 
 .. code-block::
 
-   user@device$ cd /data/local/tmp/ds/
-   user@device$ LD_LIBRARY_PATH=$(pwd)/ ./deepspeech [...]
+   user@device$ cd /data/local/tmp/stt/
+   user@device$ LD_LIBRARY_PATH=$(pwd)/ ./mozilla_voice_stt [...]
 
 Please note that the Android linker does not support ``rpath`` so you have to set
 ``LD_LIBRARY_PATH``. Properly wrapped / packaged bindings do embed the library
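Wrapped bindings do the equivalent of the ``LD_LIBRARY_PATH`` trick above when spawning the binary. A hedged sketch of that pattern (the on-device paths and binary name are illustrative, taken from the hunk above):

```python
import os
import subprocess

def run_with_libs(binary, lib_dir, args):
    """Run `binary` with `lib_dir` prepended to LD_LIBRARY_PATH (no rpath needed)."""
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + env.get("LD_LIBRARY_PATH", "")
    return subprocess.run([binary] + args, env=env, capture_output=True, text=True)

# On-device this would be run_with_libs("./mozilla_voice_stt", "/data/local/tmp/stt", [...]);
# here a harmless command demonstrates the wrapper:
result = run_with_libs("echo", "/data/local/tmp/stt", ["ok"])
print(result.stdout.strip())  # ok
```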
@@ -2,17 +2,17 @@
 ==============
 
 
-DeepSpeech Class
-----------------
+MozillaVoiceSttModel Class
+--------------------------
 
-.. doxygenclass:: DeepSpeechClient::DeepSpeech
+.. doxygenclass:: MozillaVoiceSttClient::MozillaVoiceSttModel
    :project: deepspeech-dotnet
    :members:
 
-DeepSpeechStream Class
-----------------------
+MozillaVoiceSttStream Class
+---------------------------
 
-.. doxygenclass:: DeepSpeechClient::Models::DeepSpeechStream
+.. doxygenclass:: MozillaVoiceSttClient::Models::MozillaVoiceSttStream
    :project: deepspeech-dotnet
    :members:
 
@@ -21,33 +21,33 @@ ErrorCodes
 
 See also the main definition including descriptions for each error in :ref:`error-codes`.
 
-.. doxygenenum:: DeepSpeechClient::Enums::ErrorCodes
+.. doxygenenum:: MozillaVoiceSttClient::Enums::ErrorCodes
    :project: deepspeech-dotnet
 
 Metadata
 --------
 
-.. doxygenclass:: DeepSpeechClient::Models::Metadata
+.. doxygenclass:: MozillaVoiceSttClient::Models::Metadata
    :project: deepspeech-dotnet
    :members: Transcripts
 
 CandidateTranscript
 -------------------
 
-.. doxygenclass:: DeepSpeechClient::Models::CandidateTranscript
+.. doxygenclass:: MozillaVoiceSttClient::Models::CandidateTranscript
    :project: deepspeech-dotnet
    :members: Tokens, Confidence
 
 TokenMetadata
 -------------
 
-.. doxygenclass:: DeepSpeechClient::Models::TokenMetadata
+.. doxygenclass:: MozillaVoiceSttClient::Models::TokenMetadata
    :project: deepspeech-dotnet
    :members: Text, Timestep, StartTime
 
-DeepSpeech Interface
---------------------
+IMozillaVoiceSttModel Interface
+-------------------------------
 
-.. doxygeninterface:: DeepSpeechClient::Interfaces::IDeepSpeech
+.. doxygeninterface:: MozillaVoiceSttClient::Interfaces::IMozillaVoiceSttModel
    :project: deepspeech-dotnet
    :members:
 
@@ -1,12 +1,12 @@
 .NET API Usage example
 ======================
 
-Examples are from `native_client/dotnet/DeepSpeechConsole/Program.cs`.
+Examples are from `native_client/dotnet/MozillaVoiceSttConsole/Program.cs`.
 
 Creating a model instance and loading model
 -------------------------------------------
 
-.. literalinclude:: ../native_client/dotnet/DeepSpeechConsole/Program.cs
+.. literalinclude:: ../native_client/dotnet/MozillaVoiceSttConsole/Program.cs
    :language: csharp
    :linenos:
    :lineno-match:
@@ -16,7 +16,7 @@ Creating a model instance and loading model
 Performing inference
 --------------------
 
-.. literalinclude:: ../native_client/dotnet/DeepSpeechConsole/Program.cs
+.. literalinclude:: ../native_client/dotnet/MozillaVoiceSttConsole/Program.cs
    :language: csharp
    :linenos:
    :lineno-match:
@@ -26,4 +26,4 @@ Performing inference
 Full source code
 ----------------
 
-See :download:`Full source code<../native_client/dotnet/DeepSpeechConsole/Program.cs>`.
+See :download:`Full source code<../native_client/dotnet/MozillaVoiceSttConsole/Program.cs>`.
@@ -1,29 +1,29 @@
 Java
 ====
 
-DeepSpeechModel
----------------
+MozillaVoiceSttModel
+--------------------
 
-.. doxygenclass:: org::mozilla::deepspeech::libdeepspeech::DeepSpeechModel
+.. doxygenclass:: org::mozilla::voice::stt::MozillaVoiceSttModel
    :project: deepspeech-java
    :members:
 
 Metadata
 --------
 
-.. doxygenclass:: org::mozilla::deepspeech::libdeepspeech::Metadata
+.. doxygenclass:: org::mozilla::voice::stt::Metadata
    :project: deepspeech-java
    :members: getNumTranscripts, getTranscript
 
 CandidateTranscript
 -------------------
 
-.. doxygenclass:: org::mozilla::deepspeech::libdeepspeech::CandidateTranscript
+.. doxygenclass:: org::mozilla::voice::stt::CandidateTranscript
    :project: deepspeech-java
    :members: getNumTokens, getConfidence, getToken
 
 TokenMetadata
 -------------
-.. doxygenclass:: org::mozilla::deepspeech::libdeepspeech::TokenMetadata
+.. doxygenclass:: org::mozilla::voice::stt::TokenMetadata
    :project: deepspeech-java
    :members: getText, getTimestep, getStartTime
@@ -1,12 +1,12 @@
 Java API Usage example
 ======================
 
-Examples are from `native_client/java/app/src/main/java/org/mozilla/deepspeech/DeepSpeechActivity.java`.
+Examples are from `native_client/java/app/src/main/java/org/mozilla/voice/sttapp/MozillaVoiceSttActivity.java`.
 
 Creating a model instance and loading model
 -------------------------------------------
 
-.. literalinclude:: ../native_client/java/app/src/main/java/org/mozilla/deepspeech/DeepSpeechActivity.java
+.. literalinclude:: ../native_client/java/app/src/main/java/org/mozilla/voice/sttapp/MozillaVoiceSttActivity.java
    :language: java
    :linenos:
    :lineno-match:
@@ -16,7 +16,7 @@ Creating a model instance and loading model
 Performing inference
 --------------------
 
-.. literalinclude:: ../native_client/java/app/src/main/java/org/mozilla/deepspeech/DeepSpeechActivity.java
+.. literalinclude:: ../native_client/java/app/src/main/java/org/mozilla/voice/sttapp/MozillaVoiceSttActivity.java
    :language: java
    :linenos:
    :lineno-match:
@@ -26,4 +26,4 @@ Performing inference
 Full source code
 ----------------
 
-See :download:`Full source code<../native_client/java/app/src/main/java/org/mozilla/deepspeech/DeepSpeechActivity.java>`.
+See :download:`Full source code<../native_client/java/app/src/main/java/org/mozilla/voice/sttapp/MozillaVoiceSttActivity.java>`.
@@ -1,8 +1,8 @@
 Parallel Optimization
 =====================
 
-This is how we implement optimization of the DeepSpeech model across GPUs on a
-single host. Parallel optimization can take on various forms. For example
+This is how we implement optimization of the Mozilla Voice STT model across GPUs
+on a single host. Parallel optimization can take on various forms. For example
 one can use asynchronous updates of the model, synchronous updates of the model,
 or some combination of the two.
 
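A toy sketch of the synchronous variant (each worker's gradient is averaged before a single update is applied on the host) may make the distinction concrete; this is illustrative only, not the project's actual implementation:

```python
def synchronous_step(weights, shard_gradients, lr=0.1):
    """Average the per-GPU gradients, then apply one update on the host."""
    n = len(shard_gradients)
    avg = [sum(g[i] for g in shard_gradients) / n for i in range(len(weights))]
    return [w - lr * g for w, g in zip(weights, avg)]

weights = [1.0, 2.0]
shard_gradients = [[0.2, 0.4], [0.6, 0.8]]  # one gradient vector per GPU
print(synchronous_step(weights, shard_gradients))  # approximately [0.96, 1.94]
```

An asynchronous scheme would instead apply each shard's gradient as soon as it arrives, without waiting for the others.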
@@ -9,61 +9,61 @@ Linux / AMD64 without GPU
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 * x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
 * Ubuntu 14.04+ (glibc >= 2.19, libstdc++6 >= 4.8)
-* Full TensorFlow runtime (``deepspeech`` packages)
-* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
+* Full TensorFlow runtime (``mozilla_voice_stt`` packages)
+* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
 
 Linux / AMD64 with GPU
 ^^^^^^^^^^^^^^^^^^^^^^
 * x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
 * Ubuntu 14.04+ (glibc >= 2.19, libstdc++6 >= 4.8)
 * CUDA 10.0 (and capable GPU)
-* Full TensorFlow runtime (``deepspeech`` packages)
-* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
+* Full TensorFlow runtime (``mozilla_voice_stt`` packages)
+* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
 
 Linux / ARMv7
 ^^^^^^^^^^^^^
 * Cortex-A53 compatible ARMv7 SoC with Neon support
 * Raspbian Buster-compatible distribution
-* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
+* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
 
 Linux / Aarch64
 ^^^^^^^^^^^^^^^
 * Cortex-A72 compatible Aarch64 SoC
 * ARMbian Buster-compatible distribution
-* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
+* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
 
 Android / ARMv7
 ^^^^^^^^^^^^^^^
 * ARMv7 SoC with Neon support
 * Android 7.0-10.0
 * NDK API level >= 21
-* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
+* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
 
 Android / Aarch64
 ^^^^^^^^^^^^^^^^^
 * Aarch64 SoC
 * Android 7.0-10.0
 * NDK API level >= 21
-* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
+* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
 
 macOS / AMD64
 ^^^^^^^^^^^^^
 * x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
 * macOS >= 10.10
-* Full TensorFlow runtime (``deepspeech`` packages)
-* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
+* Full TensorFlow runtime (``mozilla_voice_stt`` packages)
+* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
 
 Windows / AMD64 without GPU
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 * x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
 * Windows Server >= 2012 R2 ; Windows >= 8.1
-* Full TensorFlow runtime (``deepspeech`` packages)
-* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
+* Full TensorFlow runtime (``mozilla_voice_stt`` packages)
+* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
 
 Windows / AMD64 with GPU
 ^^^^^^^^^^^^^^^^^^^^^^^^
 * x86-64 CPU with AVX/FMA (one can rebuild without AVX/FMA, but it might slow down inference)
 * Windows Server >= 2012 R2 ; Windows >= 8.1
 * CUDA 10.0 (and capable GPU)
-* Full TensorFlow runtime (``deepspeech`` packages)
-* TensorFlow Lite runtime (``deepspeech-tflite`` packages)
+* Full TensorFlow runtime (``mozilla_voice_stt`` packages)
+* TensorFlow Lite runtime (``mozilla_voice_stt_tflite`` packages)
@@ -21,11 +21,11 @@ Clone the Mozilla Voice STT repository:
 Creating a virtual environment
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-In creating a virtual environment you will create a directory containing a ``python3`` binary and everything needed to run deepspeech. You can use whatever directory you want. For the purposes of this documentation, we will rely on ``$HOME/tmp/deepspeech-train-venv``. You can create it using this command:
+In creating a virtual environment you will create a directory containing a ``python3`` binary and everything needed to run Mozilla Voice STT. You can use whatever directory you want. For the purposes of this documentation, we will rely on ``$HOME/tmp/stt-train-venv``. You can create it using this command:
 
 .. code-block::
 
-   $ python3 -m venv $HOME/tmp/deepspeech-train-venv/
+   $ python3 -m venv $HOME/tmp/stt-train-venv/
 
 Once this command completes successfully, the environment will be ready to be activated.
 
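The same environment can also be created from Python's standard ``venv`` module. A minimal sketch; the target path here is a throwaway stand-in for ``$HOME/tmp/stt-train-venv``:

```python
import sys
import tempfile
import venv
from pathlib import Path

# Throwaway location standing in for $HOME/tmp/stt-train-venv.
target = Path(tempfile.mkdtemp()) / "stt-train-venv"
venv.EnvBuilder(with_pip=False).create(target)  # with_pip=False keeps the demo fast

# The activation script lives under bin/ (Scripts/ on Windows).
bin_dir = target / ("Scripts" if sys.platform == "win32" else "bin")
print((bin_dir / "activate").exists())  # True
```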
@@ -36,7 +36,7 @@ Each time you need to work with Mozilla Voice STT, you have to *activate* this v
 
 .. code-block::
 
-   $ source $HOME/tmp/deepspeech-train-venv/bin/activate
+   $ source $HOME/tmp/stt-train-venv/bin/activate
 
 Installing Mozilla Voice STT Training Code and its dependencies
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -13,7 +13,7 @@ Inference using a Mozilla Voice STT pre-trained model can be done with a client/
 
 .. _runtime-deps:
 
-Running ``deepspeech`` might, see below, require some runtime dependencies to be already installed on your system:
+Running ``mozilla_voice_stt`` might require some runtime dependencies to already be installed on your system (see below):
 
 * ``sox`` - The Python and Node.JS clients use SoX to resample files to 16kHz.
 * ``libgomp1`` - libsox (statically linked into the clients) depends on OpenMP. Some people have had to install this manually.
@@ -28,7 +28,7 @@ Please refer to your system's documentation on how to install these dependencies
 CUDA dependency
 ^^^^^^^^^^^^^^^
 
-The GPU capable builds (Python, NodeJS, C++, etc) depend on CUDA 10.1 and CuDNN v7.6.
+The CUDA capable builds (Python, NodeJS, C++, etc) depend on CUDA 10.1 and CuDNN v7.6.
 
 Getting the pre-trained model
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -40,17 +40,17 @@ If you want to use the pre-trained English model for performing speech-to-text,
    wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.4/deepspeech-0.7.4-models.pbmm
    wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.4/deepspeech-0.7.4-models.scorer
 
-There are several pre-trained model files available in official releases. Files ending in ``.pbmm`` are compatible with clients and language bindings built against the standard TensorFlow runtime. Usually these packages are simply called ``deepspeech``. These files are also compatible with CUDA enabled clients and language bindings. These packages are usually called ``deepspeech-gpu``. Files ending in ``.tflite`` are compatible with clients and language bindings built against the `TensorFlow Lite runtime <https://www.tensorflow.org/lite/>`_. These models are optimized for size and performance on low power devices. On desktop platforms, the compatible packages are called ``deepspeech-tflite``. On Android and Raspberry Pi, we only publish TensorFlow Lite enabled packages, and they are simply called ``deepspeech``. You can see a full list of supported platforms and which TensorFlow runtime is supported at :ref:`supported-platforms-inference`.
+There are several pre-trained model files available in official releases. Files ending in ``.pbmm`` are compatible with clients and language bindings built against the standard TensorFlow runtime. Usually these packages are simply called ``mozilla_voice_stt``. These files are also compatible with CUDA enabled clients and language bindings. These packages are usually called ``mozilla_voice_stt_cuda``. Files ending in ``.tflite`` are compatible with clients and language bindings built against the `TensorFlow Lite runtime <https://www.tensorflow.org/lite/>`_. These models are optimized for size and performance on low power devices. On desktop platforms, the compatible packages are called ``mozilla_voice_stt_tflite``. On Android and Raspberry Pi, we only publish TensorFlow Lite enabled packages, and they are simply called ``mozilla_voice_stt``. You can see a full list of supported platforms and which TensorFlow runtime is supported at :ref:`supported-platforms-inference`.
 
-+--------------------+---------------------+---------------------+
-| Package/Model type | .pbmm               | .tflite             |
-+====================+=====================+=====================+
-| deepspeech         | Depends on platform | Depends on platform |
-+--------------------+---------------------+---------------------+
-| deepspeech-gpu     | ✅                  | ❌                  |
-+--------------------+---------------------+---------------------+
-| deepspeech-tflite  | ❌                  | ✅                  |
-+--------------------+---------------------+---------------------+
++--------------------------+---------------------+---------------------+
+| Package/Model type       | .pbmm               | .tflite             |
++==========================+=====================+=====================+
+| mozilla_voice_stt        | Depends on platform | Depends on platform |
++--------------------------+---------------------+---------------------+
+| mozilla_voice_stt_cuda   | ✅                  | ❌                  |
++--------------------------+---------------------+---------------------+
+| mozilla_voice_stt_tflite | ❌                  | ✅                  |
++--------------------------+---------------------+---------------------+
 
 Finally, the pre-trained model files also include files ending in ``.scorer``. These are external scorers (language models) that are used at inference time in conjunction with an acoustic model (``.pbmm`` or ``.tflite`` file) to produce transcriptions. We also provide further documentation on :ref:`the decoding process <decoder-docs>` and :ref:`how scorers are generated <scorer-scripts>`.
 
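The compatibility table above can equally be read as a lookup. A hypothetical helper (the package names mirror this commit's renaming; the helper itself is illustrative):

```python
# Package -> which acoustic-model formats it can load, mirroring the table above.
SUPPORTED = {
    "mozilla_voice_stt": {".pbmm": "depends on platform", ".tflite": "depends on platform"},
    "mozilla_voice_stt_cuda": {".pbmm": True, ".tflite": False},
    "mozilla_voice_stt_tflite": {".pbmm": False, ".tflite": True},
}

def can_load(package, model_file):
    """Return the table cell for the given package and model file extension."""
    ext = model_file[model_file.rfind("."):]
    return SUPPORTED[package][ext]

print(can_load("mozilla_voice_stt_cuda", "output_graph.pbmm"))    # True
print(can_load("mozilla_voice_stt_tflite", "output_graph.pbmm"))  # False
```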
@@ -73,7 +73,7 @@ Mozilla Voice STT models are versioned to keep you from trying to use an incompa
 Using the Python package
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
-Pre-built binaries which can be used for performing inference with a trained model can be installed with ``pip3``. You can then use the ``deepspeech`` binary to do speech-to-text on an audio file:
+Pre-built binaries which can be used for performing inference with a trained model can be installed with ``pip3``. You can then use the ``mozilla_voice_stt`` binary to do speech-to-text on an audio file:
 
 For the Python bindings, it is highly recommended that you perform the installation within a Python 3.5 or later virtual environment. You can find more information about those in `this documentation <http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_.
 
@@ -82,11 +82,11 @@ We will continue under the assumption that you already have your system properly
 Create a Mozilla Voice STT virtual environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-In creating a virtual environment you will create a directory containing a ``python3`` binary and everything needed to run deepspeech. You can use whatever directory you want. For the purposes of this documentation, we will rely on ``$HOME/tmp/deepspeech-venv``. You can create it using this command:
+In creating a virtual environment you will create a directory containing a ``python3`` binary and everything needed to run Mozilla Voice STT. You can use whatever directory you want. For the purposes of this documentation, we will rely on ``$HOME/tmp/stt-venv``. You can create it using this command:
 
 .. code-block::
 
-   $ virtualenv -p python3 $HOME/tmp/deepspeech-venv/
+   $ virtualenv -p python3 $HOME/tmp/stt-venv/
 
 Once this command completes successfully, the environment will be ready to be activated.
 
@@ -97,46 +97,46 @@ Each time you need to work with Mozilla Voice STT, you have to *activate* this v

 .. code-block::

-   $ source $HOME/tmp/deepspeech-venv/bin/activate
+   $ source $HOME/tmp/stt-venv/bin/activate

 Installing Mozilla Voice STT Python bindings
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Once your environment has been set-up and loaded, you can use ``pip3`` to manage packages locally. On a fresh setup of the ``virtualenv``\ , you will have to install the Mozilla Voice STT wheel. You can check if ``deepspeech`` is already installed with ``pip3 list``.
+Once your environment has been set-up and loaded, you can use ``pip3`` to manage packages locally. On a fresh setup of the ``virtualenv``\ , you will have to install the Mozilla Voice STT wheel. You can check if ``mozilla_voice_stt`` is already installed with ``pip3 list``.

 To perform the installation, just use ``pip3`` as such:

 .. code-block::

-   $ pip3 install deepspeech
+   $ pip3 install mozilla_voice_stt

-If ``deepspeech`` is already installed, you can update it as such:
+If ``mozilla_voice_stt`` is already installed, you can update it as such:

 .. code-block::

-   $ pip3 install --upgrade deepspeech
+   $ pip3 install --upgrade mozilla_voice_stt

-Alternatively, if you have a supported NVIDIA GPU on Linux, you can install the GPU specific package as follows:
+Alternatively, if you have a supported NVIDIA GPU on Linux, you can install the CUDA specific package as follows:

 .. code-block::

-   $ pip3 install deepspeech-gpu
+   $ pip3 install mozilla_voice_stt_cuda

 See the `release notes <https://github.com/mozilla/DeepSpeech/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.

-You can update ``deepspeech-gpu`` as follows:
+You can update ``mozilla_voice_stt_cuda`` as follows:

 .. code-block::

-   $ pip3 install --upgrade deepspeech-gpu
+   $ pip3 install --upgrade mozilla_voice_stt_cuda

-In both cases, ``pip3`` should take care of installing all the required dependencies. After installation has finished, you should be able to call ``deepspeech`` from the command-line.
+In both cases, ``pip3`` should take care of installing all the required dependencies. After installation has finished, you should be able to call ``mozilla_voice_stt`` from the command-line.

 Note: the following command assumes you `downloaded the pre-trained model <#getting-the-pre-trained-model>`_.

 .. code-block:: bash

-   deepspeech --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio my_audio_file.wav
+   mozilla_voice_stt --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio my_audio_file.wav

 The ``--scorer`` argument is optional, and represents an external language model to be used when transcribing the audio.
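An editorial aside for readers of this hunk: the renamed CLI consumes 16 kHz, 16-bit mono WAV input (the format of the released English models). The following stdlib-only Python sketch writes a dummy WAV and checks those properties before handing samples to a model; the file name and helper function are purely illustrative, not part of the package.

```python
import struct
import wave

def read_wav_samples(path):
    """Read a mono 16-bit WAV file; return (sample_rate, list of int16 samples)."""
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1, "model expects mono audio"
        assert w.getsampwidth() == 2, "model expects 16-bit samples"
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    # "<%dh" unpacks the byte string as little-endian signed 16-bit integers.
    samples = list(struct.unpack("<%dh" % (len(frames) // 2), frames))
    return rate, samples

# Write 100 ms of silence at 16 kHz so the sketch is self-contained.
with wave.open("probe.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 1600)

rate, samples = read_wav_samples("probe.wav")
print(rate, len(samples))
```

Feeding audio at another sample rate or bit depth degrades transcription quality, which is why the guard assertions come before any model call.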
@@ -151,7 +151,7 @@ You can download the JS bindings using ``npm``\ :

 .. code-block:: bash

-   npm install deepspeech
+   npm install mozilla_voice_stt

 Please note that as of now, we support:
 - Node.JS versions 4 to 13.
@@ -159,11 +159,11 @@ Please note that as of now, we support:

 TypeScript support is also provided.

-Alternatively, if you're using Linux and have a supported NVIDIA GPU, you can install the GPU specific package as follows:
+Alternatively, if you're using Linux and have a supported NVIDIA GPU, you can install the CUDA specific package as follows:

 .. code-block:: bash

-   npm install deepspeech-gpu
+   npm install mozilla_voice_stt_cuda

 See the `release notes <https://github.com/mozilla/DeepSpeech/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.
@@ -174,7 +174,7 @@ See the :ref:`TypeScript client <js-api-example>` for an example of how to use t
 Using the command-line client
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-To download the pre-built binaries for the ``deepspeech`` command-line (compiled C++) client, use ``util/taskcluster.py``\ :
+To download the pre-built binaries for the ``mozilla_voice_stt`` command-line (compiled C++) client, use ``util/taskcluster.py``\ :

 .. code-block:: bash
@@ -192,7 +192,7 @@ also, if you need some binaries different than current master, like ``v0.2.0-alp

    python3 util/taskcluster.py --branch "v0.2.0-alpha.6" --target "."

-The script ``taskcluster.py`` will download ``native_client.tar.xz`` (which includes the ``deepspeech`` binary and associated libraries) and extract it into the current folder. Also, ``taskcluster.py`` will download binaries for Linux/x86_64 by default, but you can override that behavior with the ``--arch`` parameter. See the help info with ``python util/taskcluster.py -h`` for more details. Specific branches of Mozilla Voice STT or TensorFlow can be specified as well.
+The script ``taskcluster.py`` will download ``native_client.tar.xz`` (which includes the ``mozilla_voice_stt`` binary and associated libraries) and extract it into the current folder. Also, ``taskcluster.py`` will download binaries for Linux/x86_64 by default, but you can override that behavior with the ``--arch`` parameter. See the help info with ``python util/taskcluster.py -h`` for more details. Specific branches of Mozilla Voice STT or TensorFlow can be specified as well.

 Alternatively you may manually download the ``native_client.tar.xz`` from the [releases](https://github.com/mozilla/DeepSpeech/releases).
@@ -200,9 +200,9 @@ Note: the following command assumes you `downloaded the pre-trained model <#gett

 .. code-block:: bash

-   ./deepspeech --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio_input.wav
+   ./mozilla_voice_stt --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio_input.wav

-See the help output with ``./deepspeech -h`` for more details.
+See the help output with ``./mozilla_voice_stt -h`` for more details.

 Installing bindings from source
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -24,7 +24,7 @@ import sys

 sys.path.insert(0, os.path.abspath('../'))

-autodoc_mock_imports = ['deepspeech']
+autodoc_mock_imports = ['mozilla_voice_stt']

 # This is in fact only relevant on ReadTheDocs, but we want to run the same way
 # on our CI as in RTD to avoid regressions on RTD that we would not catch on
@@ -143,7 +143,7 @@ html_static_path = ['.static']
 # -- Options for HTMLHelp output ------------------------------------------

 # Output file base name for HTML help builder.
-htmlhelp_basename = 'DeepSpeechdoc'
+htmlhelp_basename = 'sttdoc'


 # -- Options for LaTeX output ---------------------------------------------
@@ -180,7 +180,7 @@ latex_documents = [
 # One entry per manual page. List of tuples
 # (source start file, name, description, authors, manual section).
 man_pages = [
-    (master_doc, 'deepspeech', u'Mozilla Voice STT Documentation',
+    (master_doc, 'mozilla_voice_stt', u'Mozilla Voice STT Documentation',
      [author], 1)
 ]

@@ -790,7 +790,7 @@ WARN_LOGFILE =
 # spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
 # Note: If this tag is empty the current directory is searched.

-INPUT = native_client/dotnet/DeepSpeechClient/ native_client/dotnet/DeepSpeechClient/Interfaces/ native_client/dotnet/DeepSpeechClient/Enums/ native_client/dotnet/DeepSpeechClient/Models/
+INPUT = native_client/dotnet/MozillaVoiceSttClient/ native_client/dotnet/MozillaVoiceSttClient/Interfaces/ native_client/dotnet/MozillaVoiceSttClient/Enums/ native_client/dotnet/MozillaVoiceSttClient/Models/

 # This tag can be used to specify the character encoding of the source files
 # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
@@ -790,7 +790,7 @@ WARN_LOGFILE =
 # spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
 # Note: If this tag is empty the current directory is searched.

-INPUT = native_client/java/libdeepspeech/src/main/java/org/mozilla/deepspeech/libdeepspeech/ native_client/java/libdeepspeech/src/main/java/org/mozilla/deepspeech/libdeepspeech_doc/
+INPUT = native_client/java/libmozillavoicestt/src/main/java/org/mozilla/voice/stt/ native_client/java/libmozillavoicestt/src/main/java/org/mozilla/voice/stt_doc/

 # This tag can be used to specify the character encoding of the source files
 # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
@@ -13,11 +13,11 @@ To install and use Mozilla Voice STT all you have to do is:
 .. code-block:: bash

    # Create and activate a virtualenv
-   virtualenv -p python3 $HOME/tmp/deepspeech-venv/
-   source $HOME/tmp/deepspeech-venv/bin/activate
+   virtualenv -p python3 $HOME/tmp/stt-venv/
+   source $HOME/tmp/stt-venv/bin/activate

    # Install Mozilla Voice STT
-   pip3 install deepspeech
+   pip3 install mozilla_voice_stt

    # Download pre-trained English model files
    curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.4/deepspeech-0.7.4-models.pbmm
@@ -28,27 +28,27 @@ To install and use Mozilla Voice STT all you have to do is:
    tar xvf audio-0.7.4.tar.gz

    # Transcribe an audio file
-   deepspeech --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio/2830-3980-0043.wav
+   mozilla_voice_stt --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio/2830-3980-0043.wav

 A pre-trained English model is available for use and can be downloaded following the instructions in :ref:`the usage docs <usage-docs>`. For the latest release, including pre-trained models and checkpoints, `see the GitHub releases page <https://github.com/mozilla/DeepSpeech/releases/latest>`_.

-Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the `release notes <https://github.com/mozilla/DeepSpeech/releases/latest>`_ to find which GPUs are supported. To run ``deepspeech`` on a GPU, install the GPU specific package:
+Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the `release notes <https://github.com/mozilla/DeepSpeech/releases/latest>`_ to find which GPUs are supported. To run ``mozilla_voice_stt`` on a GPU, install the GPU specific package:

 .. code-block:: bash

    # Create and activate a virtualenv
-   virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
-   source $HOME/tmp/deepspeech-gpu-venv/bin/activate
+   virtualenv -p python3 $HOME/tmp/stt-gpu-venv/
+   source $HOME/tmp/stt-gpu-venv/bin/activate

    # Install Mozilla Voice STT CUDA enabled package
-   pip3 install deepspeech-gpu
+   pip3 install mozilla_voice_stt_cuda

    # Transcribe an audio file.
-   deepspeech --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio/2830-3980-0043.wav
+   mozilla_voice_stt --model deepspeech-0.7.4-models.pbmm --scorer deepspeech-0.7.4-models.scorer --audio audio/2830-3980-0043.wav

 Please ensure you have the required :ref:`CUDA dependencies <cuda-deps>`.

-See the output of ``deepspeech -h`` for more information on the use of ``deepspeech``. (If you experience problems running ``deepspeech``, please check :ref:`required runtime dependencies <runtime-deps>`).
+See the output of ``mozilla_voice_stt -h`` for more information on the use of ``mozilla_voice_stt``. (If you experience problems running ``mozilla_voice_stt``, please check :ref:`required runtime dependencies <runtime-deps>`).

 .. toctree::
    :maxdepth: 2
@@ -19,11 +19,8 @@ from six.moves import zip, range

 r'''
 This module should be self-contained:
-  - build libmozilla_voice_stt.so with TFLite:
-    - bazel build [...] --define=runtime=tflite [...] //native_client:libmozilla_voice_stt.so
-  - make -C native_client/python/ TFDIR=... bindings
   - setup a virtualenv
-  - pip install native_client/python/dist/deepspeech*.whl
+  - pip install mozilla_voice_stt_tflite
   - pip install -r requirements_eval_tflite.txt

 Then run with a TF Lite model, a scorer and a CSV test file
@@ -1,6 +1,6 @@
 Examples
 ========

-DeepSpeech examples were moved to a separate repository.
+Mozilla Voice STT examples were moved to a separate repository.

 New location: https://github.com/mozilla/DeepSpeech-examples
@@ -1,5 +1,5 @@
 This file contains some notes on coding style within the C++ portion of the
-DeepSpeech project. It is very much a work in progress and incomplete.
+Mozilla Voice STT project. It is very much a work in progress and incomplete.

 General
 =======
@@ -1,6 +1,6 @@
 absl-py==0.9.0
 attrdict==2.0.1
-deepspeech
+mozilla_voice_stt_tflite
 numpy==1.16.0
 progressbar2==3.47.0
 python-utils==2.3.0
@@ -26,7 +26,7 @@ then:
     DEEPSPEECH_AUDIO: "https://github.com/mozilla/DeepSpeech/releases/download/v0.4.1/audio-0.4.1.tar.gz"
     PIP_DEFAULT_TIMEOUT: "60"
     EXAMPLES_CLONE_URL: "https://github.com/mozilla/DeepSpeech-examples"
-    EXAMPLES_CHECKOUT_TARGET: "rename-test"
+    EXAMPLES_CHECKOUT_TARGET: "master"

   command:
     - "/bin/bash"
@@ -7,4 +7,4 @@ source $(dirname "$0")/tc-tests-utils.sh
 mkdir -p ${TASKCLUSTER_ARTIFACTS} || true

 # NodeJS package
-cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/mozilla_voice_stt-*.tgz ${TASKCLUSTER_ARTIFACTS}/
+cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/mozilla_voice_stt*.tgz ${TASKCLUSTER_ARTIFACTS}/
@@ -14,7 +14,7 @@ package_libdeepspeech_as_zip "libmozilla_voice_stt.zip"

 if [ -d ${DS_ROOT_TASK}/DeepSpeech/ds/wheels ]; then
     cp ${DS_ROOT_TASK}/DeepSpeech/ds/wheels/* ${TASKCLUSTER_ARTIFACTS}/
-    cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/mozilla_voice_stt-*.tgz ${TASKCLUSTER_ARTIFACTS}/
+    cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/mozilla_voice_stt*.tgz ${TASKCLUSTER_ARTIFACTS}/
 fi;

 if [ -f ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/wrapper.tar.gz ]; then
@@ -17,9 +17,9 @@ do_deepspeech_python_build()

   SETUP_FLAGS=""
   if [ "${package_option}" = "--cuda" ]; then
-    SETUP_FLAGS="--project_name mozilla_voice_stt-gpu"
+    SETUP_FLAGS="--project_name mozilla_voice_stt_cuda"
   elif [ "${package_option}" = "--tflite" ]; then
-    SETUP_FLAGS="--project_name mozilla_voice_stt-tflite"
+    SETUP_FLAGS="--project_name mozilla_voice_stt_tflite"
   fi

   for pyver_conf in ${SUPPORTED_PYTHON_VERSIONS}; do
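The ``SETUP_FLAGS`` branch above is the heart of this rename commit: one published package name per build flavor, with hyphens switched to underscores. A small, testable Python sketch of that same mapping (the function itself is illustrative; the default name mirrors the npm targets later in these scripts):

```python
def project_name(package_option=""):
    """Mirror the SETUP_FLAGS logic above: pick the published
    package name for a given build flavor flag."""
    if package_option == "--cuda":
        return "mozilla_voice_stt_cuda"
    if package_option == "--tflite":
        return "mozilla_voice_stt_tflite"
    return "mozilla_voice_stt"  # default CPU build

for opt in ("--cuda", "--tflite", ""):
    print(opt or "(default)", "->", project_name(opt))
```

Centralizing the mapping like this is why the diff only has to touch two string literals per build script rather than every consumer of the name.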
@@ -133,7 +133,7 @@ do_deepspeech_nodejs_build()
   done;

   if [ "${rename_to_gpu}" = "--cuda" ]; then
-    make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt-gpu
+    make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt_cuda
   else
     make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt
   fi
@@ -165,9 +165,9 @@ do_deepspeech_npm_package()
   done;

   if [ "${package_option}" = "--cuda" ]; then
-    make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt-gpu
+    make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt_cuda
   elif [ "${package_option}" = "--tflite" ]; then
-    make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt-tflite
+    make -C native_client/javascript clean npm-pack PROJECT_NAME=mozilla_voice_stt_tflite
   else
     make -C native_client/javascript clean npm-pack
   fi
@@ -7,8 +7,8 @@ get_dep_npm_pkg_url()
 {
   local all_deps="$(curl -s https://community-tc.services.mozilla.com/api/queue/v1/task/${TASK_ID} | python -c 'import json; import sys; print(" ".join(json.loads(sys.stdin.read())["dependencies"]));')"

-  # We try "mozilla_voice_stt-tflite" and "mozilla_voice_stt-gpu" first and if we don't find it we try "mozilla_voice_stt"
-  for pkg_basename in "mozilla_voice_stt-tflite" "mozilla_voice_stt-gpu" "mozilla_voice_stt"; do
+  # We try "mozilla_voice_stt_tflite" and "mozilla_voice_stt_cuda" first and if we don't find it we try "mozilla_voice_stt"
+  for pkg_basename in "mozilla_voice_stt_tflite" "mozilla_voice_stt_cuda" "mozilla_voice_stt"; do
     local deepspeech_pkg="${pkg_basename}-${DS_VERSION}.tgz"
     for dep in ${all_deps}; do
       local has_artifact=$(curl -s https://community-tc.services.mozilla.com/api/queue/v1/task/${dep}/artifacts | python -c 'import json; import sys; has_artifact = True in [ e["name"].find("'${deepspeech_pkg}'") > 0 for e in json.loads(sys.stdin.read())["artifacts"] ]; print(has_artifact)')
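The ``has_artifact`` one-liner in this hunk scans a TaskCluster task's artifact list for a package tarball whose name embeds the renamed basename, which is why the loop order above matters. An equivalent, testable sketch of that check (the sample payload is invented, shaped like the artifacts response):

```python
import json

def has_artifact(artifacts_json, pkg_tarball):
    """Reproduce the inline `find(...) > 0` test from the shell one-liner:
    true when some artifact name contains the package tarball name."""
    artifacts = json.loads(artifacts_json)["artifacts"]
    return any(a["name"].find(pkg_tarball) > 0 for a in artifacts)

# Invented sample payload shaped like the TaskCluster artifacts response.
sample = json.dumps({"artifacts": [
    {"name": "public/mozilla_voice_stt_cuda-0.8.0.tgz"},
    {"name": "public/logs/live.log"},
]})
print(has_artifact(sample, "mozilla_voice_stt_cuda-0.8.0.tgz"))
print(has_artifact(sample, "mozilla_voice_stt_tflite-0.8.0.tgz"))
```

Note the faithful ``> 0`` comparison: a name that *starts* with the tarball string (index 0) would be missed, a quirk the shell version shares because artifact names always carry a ``public/`` prefix.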
@@ -14,7 +14,7 @@ download_data
 virtualenv_activate "${pyalias}" "deepspeech"

 if [ "$3" = "cuda" ]; then
-    deepspeech_pkg_url=$(get_python_pkg_url "${pyver_pkg}" "${py_unicode_type}" "mozilla_voice_stt_gpu")
+    deepspeech_pkg_url=$(get_python_pkg_url "${pyver_pkg}" "${py_unicode_type}" "mozilla_voice_stt_cuda")
 else
     deepspeech_pkg_url=$(get_python_pkg_url "${pyver_pkg}" "${py_unicode_type}")
 fi;
@@ -22,5 +22,5 @@ fi;

 if [ -f ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/wrapper.tar.gz ]; then
     cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/wrapper.tar.gz ${TASKCLUSTER_ARTIFACTS}/
-    cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/mozilla_voice_stt-*.tgz ${TASKCLUSTER_ARTIFACTS}/
+    cp ${DS_ROOT_TASK}/DeepSpeech/ds/native_client/javascript/mozilla_voice_stt*.tgz ${TASKCLUSTER_ARTIFACTS}/
 fi;
@@ -47,14 +47,14 @@ def check_ctcdecoder_version():
         from ds_ctcdecoder import __version__ as decoder_version
     except ImportError as e:
         if e.msg.find('__version__') > 0:
-            print("DeepSpeech version ({ds_version}) requires CTC decoder to expose __version__. "
+            print("Mozilla Voice STT version ({ds_version}) requires CTC decoder to expose __version__. "
                   "Please upgrade the ds_ctcdecoder package to version {ds_version}".format(ds_version=ds_version_s))
             sys.exit(1)
         raise e

     rv = semver.compare(ds_version_s, decoder_version)
     if rv != 0:
-        print("DeepSpeech version ({}) and CTC decoder version ({}) do not match. "
+        print("Mozilla Voice STT version ({}) and CTC decoder version ({}) do not match. "
               "Please ensure matching versions are in use.".format(ds_version_s, decoder_version))
         sys.exit(1)

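The guard in the hunk above aborts unless the training package and ``ds_ctcdecoder`` report the same semver. A stdlib-only sketch of that exact-match idea (pre-release tags are dropped here for brevity; the real code's ``semver.compare`` also weighs them):

```python
def versions_match(pkg_version, decoder_version):
    """Exact-match guard in the spirit of `semver.compare(...) != 0` above:
    the training package and ds_ctcdecoder must carry the same
    major.minor.patch triple."""
    def core(version):
        # Keep only the numeric major.minor.patch, dropping any "-alpha.N" tag.
        return tuple(int(part) for part in version.split("-")[0].split("."))
    return core(pkg_version) == core(decoder_version)

print(versions_match("0.8.0", "0.8.0"))
print(versions_match("0.8.0", "0.7.4"))
```

Failing fast here is deliberate: a decoder built for a different release can load yet silently decode with mismatched alphabets, so an up-front version check beats a hard-to-diagnose accuracy bug.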