Rebranding WIP

Kelly Davis 2021-03-05 12:48:08 +01:00
parent 2bb42d4fb1
commit d2009582e9
144 changed files with 584 additions and 594 deletions

View File

@@ -1,5 +1,4 @@
-This file contains a list of papers in chronological order that have been published
-using DeepSpeech.
+This file contains a list of papers in chronological order that have been published using 🐸STT.
 To appear
 ==========

View File

@@ -1,7 +1,7 @@
-DeepSpeech code owners / governance system
+Coqui STT code owners / governance system
-==========================================
+=========================================
-DeepSpeech is run under a governance system inspired (and partially copied from) by the `Mozilla module ownership system <https://www.mozilla.org/about/governance/policies/module-ownership/>`_. The project is roughly divided into modules, and each module has its own owners, which are responsible for reviewing pull requests and deciding on technical direction for their modules. Module ownership authority is given to people who have worked extensively on areas of the project.
+🐸STT is run under a governance system inspired by (and partially copied from) the `Mozilla module ownership system <https://www.mozilla.org/about/governance/policies/module-ownership/>`_. The project is roughly divided into modules, and each module has its own owners, which are responsible for reviewing pull requests and deciding on technical direction for their modules. Module ownership authority is given to people who have worked extensively on areas of the project.
 Module owners also have the authority of naming other module owners or appointing module peers, which are people with authority to review pull requests in that module. They can also sub-divide their module into sub-modules with their own owners.
@@ -46,7 +46,7 @@ Testing & CI
 Native inference client
 -----------------------
-Everything that goes into libdeepspeech.so and is not specifically covered in another area fits here.
+Everything that goes into libstt.so and is not specifically covered in another area fits here.
 - Alexandre Lissy (@lissyx)
 - Reuben Morais (@reuben)
@@ -110,7 +110,7 @@ Documentation
 - Alexandre Lissy (@lissyx)
 - Reuben Morais (@reuben)
-Third party bindings
+.. Third party bindings
 --------------------
-Hosted externally and owned by the individual authors. See the `list of third-party bindings <https://deepspeech.readthedocs.io/en/master/USING.html#third-party-bindings>`_ for more info.
+Hosted externally and owned by the individual authors. See the `list of third-party bindings <https://stt.readthedocs.io/en/latest/USING.html#third-party-bindings>`_ for more info.

View File

@@ -1,14 +1,14 @@
 Contribution guidelines
 =======================
-Welcome to the DeepSpeech project! We are excited to see your interest, and appreciate your support!
+Welcome to the 🐸STT project! We are excited to see your interest, and appreciate your support!
 This repository is governed by Mozilla's code of conduct and etiquette guidelines. For more details, please read the `Mozilla Community Participation Guidelines <https://www.mozilla.org/about/governance/policies/participation/>`_.
 How to Make a Good Pull Request
 -------------------------------
-Here's some guidelines on how to make a good PR to DeepSpeech.
+Here are some guidelines on how to make a good PR to 🐸STT.
 Bug-fix PR
 ^^^^^^^^^^
@@ -18,20 +18,20 @@ You've found a bug and you were able to squash it! Great job! Please write a sho
 Documentation PR
 ^^^^^^^^^^^^^^^^
-If you're just making updates or changes to the documentation, there's no need to run all of DeepSpeech's tests for Continuous Integration (i.e. Taskcluster tests). In this case, at the end of your short but clear commit message, you should add **X-DeepSpeech: NOBUILD**. This will trigger the CI tests to skip your PR, saving both time and compute.
+If you're just making updates or changes to the documentation, there's no need to run all of 🐸STT's tests for Continuous Integration (i.e. Taskcluster tests). In this case, at the end of your short but clear commit message, you should add **X-DeepSpeech: NOBUILD**. This will trigger the CI tests to skip your PR, saving both time and compute.
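As an illustration of the trailer described above (not part of this commit), a documentation-only change could be committed as sketched below; the trailer text is taken verbatim from the paragraph and may itself be renamed later in the rebranding:

    # Hypothetical documentation-only commit; the second -m appends the trailer
    # that tells CI to skip the build for this PR.
    git commit -m "docs: fix broken links in SUPPORT.rst" -m "X-DeepSpeech: NOBUILD"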
 New Feature PR
 ^^^^^^^^^^^^^^
-You've made some core changes to DeepSpeech, and you would like to share them back with the community -- great! First things first: if you're planning to add a feature (not just fix a bug or docs) let the DeepSpeech team know ahead of time and get some feedback early. A quick check-in with the team can save time during code-review, and also ensure that your new feature fits into the project.
+You've made some core changes to 🐸STT, and you would like to share them back with the community -- great! First things first: if you're planning to add a feature (not just fix a bug or docs), let the 🐸STT team know ahead of time and get some feedback early. A quick check-in with the team can save time during code-review, and also ensure that your new feature fits into the project.
-The DeepSpeech codebase is made of many connected parts. There is Python code for training DeepSpeech, core C++ code for running inference on trained models, and multiple language bindings to the C++ core so you can use DeepSpeech in your favorite language.
+The 🐸STT codebase is made of many connected parts. There is Python code for training 🐸STT, core C++ code for running inference on trained models, and multiple language bindings to the C++ core so you can use 🐸STT in your favorite language.
-Whenever you add a new feature to DeepSpeech and what to contribute that feature back to the project, here are some things to keep in mind:
+Whenever you add a new feature to 🐸STT and want to contribute that feature back to the project, here are some things to keep in mind:
-1. You've made changes to the core C++ code. Core changes can have downstream effects on all parts of the DeepSpeech project, so keep that in mind. You should minimally also make necessary changes to the C client (i.e. **args.h** and **client.cc**). The bindings for Python, Java, and Javascript are SWIG generated, and in the best-case scenario you won't have to worry about them. However, if you've added a whole new feature, you may need to make custom tweaks to those bindings, because SWIG may not automagically work with your new feature, especially if you've exposed new arguments. The bindings for .NET and Swift are not generated automatically. It would be best if you also made the necessary manual changes to these bindings as well. It is best to communicate with the core DeepSpeech team and come to an understanding of where you will likely need to work with the bindings. They can't predict all the bugs you will run into, but they will have a good idea of how to plan for some obvious challenges.
+1. You've made changes to the core C++ code. Core changes can have downstream effects on all parts of the 🐸STT project, so keep that in mind. You should minimally also make necessary changes to the C client (i.e. **args.h** and **client.cc**). The bindings for Python, Java, and Javascript are SWIG generated, and in the best-case scenario you won't have to worry about them. However, if you've added a whole new feature, you may need to make custom tweaks to those bindings, because SWIG may not automagically work with your new feature, especially if you've exposed new arguments. The bindings for .NET and Swift are not generated automatically. It would be best if you also made the necessary manual changes to these bindings as well. It is best to communicate with the core 🐸STT team and come to an understanding of where you will likely need to work with the bindings. They can't predict all the bugs you will run into, but they will have a good idea of how to plan for some obvious challenges.
 2. You've made changes to the Python code. Make sure you run a linter (described below).
-3. Make sure your new feature doesn't regress the project. If you've added a significant feature or amount of code, you want to be sure your new feature doesn't create performance issues. For example, if you've made a change to the DeepSpeech decoder, you should know that inference performance doesn't drop in terms of latency, accuracy, or memory usage. Unless you're proposing a new decoding algorithm, you probably don't have to worry about affecting accuracy. However, it's very possible you've affected latency or memory usage. You should run local performance tests to make sure no bugs have crept in. There are lots of tools to check latency and memory usage, and you should use what is most comfortable for you and gets the job done. If you're on Linux, you might find [[perf](https://perf.wiki.kernel.org/index.php/Main_Page)] to be a useful tool. You can use sample WAV files for testing which are provided in the `DeepSpeech/data/` directory.
+3. Make sure your new feature doesn't regress the project. If you've added a significant feature or amount of code, you want to be sure your new feature doesn't create performance issues. For example, if you've made a change to the 🐸STT decoder, you should know that inference performance doesn't drop in terms of latency, accuracy, or memory usage. Unless you're proposing a new decoding algorithm, you probably don't have to worry about affecting accuracy. However, it's very possible you've affected latency or memory usage. You should run local performance tests to make sure no bugs have crept in. There are lots of tools to check latency and memory usage, and you should use what is most comfortable for you and gets the job done. If you're on Linux, you might find [[perf](https://perf.wiki.kernel.org/index.php/Main_Page)] to be a useful tool. You can use the sample WAV files provided in the `STT/data/` directory for testing (see the example below).
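For instance, a rough local check of inference latency and peak memory on one of those sample WAVs could look like the sketch below; the client binary name, its flags, and the file paths are assumptions based on the rebranded native client and a local build, so adjust them to your setup:

    # Average wall-clock time and hardware counters over five runs (Linux perf).
    perf stat -r 5 ./native_client/stt --model /tmp/exported/output_graph.tflite \
        --audio data/smoke_test/LDC93S1.wav
    # Peak resident memory of a single run, via GNU time.
    /usr/bin/time -v ./native_client/stt --model /tmp/exported/output_graph.tflite \
        --audio data/smoke_test/LDC93S1.wav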
 Requesting review on your PR
 ----------------------------

View File

@@ -3,8 +3,8 @@
 # Need devel version cause we need /usr/include/cudnn.h
 FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
-ENV DEEPSPEECH_REPO=#DEEPSPEECH_REPO#
+ENV STT_REPO=#STT_REPO#
-ENV DEEPSPEECH_SHA=#DEEPSPEECH_SHA#
+ENV STT_SHA=#STT_SHA#
 # >> START Install base software
@@ -113,15 +113,15 @@ RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
 WORKDIR /
-RUN git clone --recursive $DEEPSPEECH_REPO DeepSpeech
+RUN git clone --recursive $STT_REPO STT
-WORKDIR /DeepSpeech
+WORKDIR /STT
-RUN git checkout $DEEPSPEECH_SHA
+RUN git checkout $STT_SHA
 RUN git submodule sync tensorflow/
 RUN git submodule update --init tensorflow/
 # >> START Build and bind
-WORKDIR /DeepSpeech/tensorflow
+WORKDIR /STT/tensorflow
 # Fix for not found script https://github.com/tensorflow/tensorflow/issues/471
 RUN ./configure
@@ -132,7 +132,7 @@ RUN ./configure
 # passing LD_LIBRARY_PATH is required cause Bazel doesn't pickup it from environment
-# Build DeepSpeech
+# Build STT
 RUN bazel build \
 --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" \
 --config=monolithic \
@@ -149,22 +149,22 @@ RUN bazel build \
 --copt=-msse4.2 \
 --copt=-mavx \
 --copt=-fvisibility=hidden \
-//native_client:libdeepspeech.so \
+//native_client:libstt.so \
 --verbose_failures \
 --action_env=LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
-# Copy built libs to /DeepSpeech/native_client
+# Copy built libs to /STT/native_client
-RUN cp bazel-bin/native_client/libdeepspeech.so /DeepSpeech/native_client/
+RUN cp bazel-bin/native_client/libstt.so /STT/native_client/
 # Build client.cc and install Python client and decoder bindings
-ENV TFDIR /DeepSpeech/tensorflow
+ENV TFDIR /STT/tensorflow
 RUN nproc
-WORKDIR /DeepSpeech/native_client
+WORKDIR /STT/native_client
-RUN make NUM_PROCESSES=$(nproc) deepspeech
+RUN make NUM_PROCESSES=$(nproc) stt
-WORKDIR /DeepSpeech
+WORKDIR /STT
 RUN cd native_client/python && make NUM_PROCESSES=$(nproc) bindings
 RUN pip3 install --upgrade native_client/python/dist/*.whl
@@ -176,8 +176,8 @@ RUN pip3 install --upgrade native_client/ctcdecode/dist/*.whl
 # Allow Python printing utf-8
 ENV PYTHONIOENCODING UTF-8
-# Build KenLM in /DeepSpeech/native_client/kenlm folder
+# Build KenLM in /STT/native_client/kenlm folder
-WORKDIR /DeepSpeech/native_client
+WORKDIR /STT/native_client
 RUN rm -rf kenlm && \
 git clone https://github.com/kpu/kenlm && \
 cd kenlm && \
@@ -188,4 +188,4 @@ RUN rm -rf kenlm && \
 make -j $(nproc)
 # Done
-WORKDIR /DeepSpeech
+WORKDIR /STT

View File

@@ -3,8 +3,8 @@
 FROM tensorflow/tensorflow:1.15.4-gpu-py3
 ENV DEBIAN_FRONTEND=noninteractive
-ENV DEEPSPEECH_REPO=#DEEPSPEECH_REPO#
+ENV STT_REPO=#STT_REPO#
-ENV DEEPSPEECH_SHA=#DEEPSPEECH_SHA#
+ENV STT_SHA=#STT_SHA#
 RUN apt-get update && apt-get install -y --no-install-recommends \
 apt-utils \
@@ -20,7 +20,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 unzip \
 wget
-# We need to remove it because it's breaking deepspeech install later with
+# We need to remove it because it's breaking STT install later with
 # weird errors about setuptools
 RUN apt-get purge -y python3-xdg
@@ -31,10 +31,10 @@ RUN apt-get install -y --no-install-recommends libopus0 libsndfile1
 RUN rm -rf /var/lib/apt/lists/*
 WORKDIR /
-RUN git clone $DEEPSPEECH_REPO DeepSpeech
+RUN git clone $STT_REPO STT
-WORKDIR /DeepSpeech
+WORKDIR /STT
-RUN git checkout $DEEPSPEECH_SHA
+RUN git checkout $STT_SHA
 # Build CTC decoder first, to avoid clashes on incompatible versions upgrades
 RUN cd native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings
@@ -43,7 +43,7 @@ RUN pip3 install --upgrade native_client/ctcdecode/dist/*.whl
 # Prepare deps
 RUN pip3 install --upgrade pip==20.2.2 wheel==0.34.2 setuptools==49.6.0
-# Install DeepSpeech
+# Install STT
 # - No need for the decoder since we did it earlier
 # - There is already correct TensorFlow GPU installed on the base image,
 # we don't want to break that
@@ -54,7 +54,7 @@ RUN python3 util/taskcluster.py --source tensorflow --branch r1.15 \
 --artifact convert_graphdef_memmapped_format --target .
 # Build KenLM to generate new scorers
-WORKDIR /DeepSpeech/native_client
+WORKDIR /STT/native_client
 RUN rm -rf kenlm && \
 git clone https://github.com/kpu/kenlm && \
 cd kenlm && \
@@ -63,6 +63,6 @@ RUN rm -rf kenlm && \
 cd build && \
 cmake .. && \
 make -j $(nproc)
-WORKDIR /DeepSpeech
+WORKDIR /STT
 RUN ./bin/run-ldc93s1.sh

View File

@@ -1 +1 @@
-training/deepspeech_training/GRAPH_VERSION
+training/coqui_stt_training/GRAPH_VERSION

View File

@@ -1,8 +1,8 @@
-DEEPSPEECH_REPO ?= https://github.com/mozilla/DeepSpeech.git
+STT_REPO ?= https://github.com/coqui-ai/STT.git
-DEEPSPEECH_SHA ?= origin/master
+STT_SHA ?= origin/main
 Dockerfile%: Dockerfile%.tmpl
 sed \
--e "s|#DEEPSPEECH_REPO#|$(DEEPSPEECH_REPO)|g" \
+-e "s|#STT_REPO#|$(STT_REPO)|g" \
--e "s|#DEEPSPEECH_SHA#|$(DEEPSPEECH_SHA)|g" \
+-e "s|#STT_SHA#|$(STT_SHA)|g" \
 < $< > $@
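For reference, rendering a concrete Dockerfile from one of the templates above would look roughly like this; the ``Dockerfile.train`` target assumes a ``Dockerfile.train.tmpl`` next to this Makefile, and the image tag is arbitrary:

    # Render Dockerfile.train from Dockerfile.train.tmpl, overriding the defaults above.
    make Dockerfile.train STT_REPO=https://github.com/coqui-ai/STT.git STT_SHA=origin/main
    docker build -f Dockerfile.train -t stt-train .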

View File

@@ -1,12 +0,0 @@
-Making a (new) release of the codebase
-======================================
-* Update version in VERSION file, commit
-* Open PR, ensure all tests are passing properly
-* Merge the PR
-* Fetch the new master, tag it with (hopefully) the same version as in VERSION
-* Push that to Github
-* New build should be triggered and new packages should be made
-* TaskCluster should schedule a merge build **including** a "DeepSpeech Packages" task

View File

@@ -5,8 +5,8 @@ Contact/Getting Help
 There are several ways to contact us or to get help:
-#. `Discourse Forums <https://discourse.mozilla.org/c/deep-speech>`_ - The `Deep Speech category on Discourse <https://discourse.mozilla.org/c/deep-speech>`_ is the first place to look. Search for keywords related to your question or problem to see if someone else has run into it already. If you can't find anything relevant there, search on our `issue tracker <https://github.com/mozilla/deepspeech/issues>`_ to see if there is an existing issue about your problem.
+#. `GitHub Discussions <https://github.com/coqui-ai/STT/discussions>`_ - `GitHub Discussions <https://github.com/coqui-ai/STT/discussions>`_ is the first place to look. Search for keywords related to your question or problem to see if someone else has run into it already. If you can't find anything relevant there, search on our `issue tracker <https://github.com/coqui-ai/STT/issues>`_ to see if there is an existing issue about your problem.
-#. `Matrix chat <https://chat.mozilla.org/#/room/#machinelearning:mozilla.org>`_ - If your question is not addressed by either the `FAQ <https://github.com/mozilla/DeepSpeech/wiki#frequently-asked-questions>`_ or `Discourse Forums <https://discourse.mozilla.org/c/deep-speech>`_\ , you can contact us on the ``#machinelearning`` channel on `Mozilla Matrix <https://chat.mozilla.org/#/room/#machinelearning:mozilla.org>`_\ ; people there can try to answer/help
+#. `Matrix chat <https://matrix.to/#/+coqui:matrix.org>`_ - If your question is not addressed on `GitHub Discussions <https://github.com/coqui-ai/STT/discussions>`_\ , you can contact us on the ``#stt:matrix.org`` `channel on Matrix <https://matrix.to/#/#stt:matrix.org?via=matrix.org>`_.
-#. `Create a new issue <https://github.com/mozilla/deepspeech/issues>`_ - Finally, if you have a bug report or a feature request that isn't already covered by an existing issue, please open an issue in our repo and fill the appropriate information on your hardware and software setup.
+#. `Create a new issue <https://github.com/coqui-ai/STT/issues>`_ - Finally, if you have a bug report or a feature request that isn't already covered by an existing issue, please open an issue in our repo and fill in the appropriate information about your hardware and software setup.

View File

@@ -1 +1 @@
-training/deepspeech_training/VERSION
+training/coqui_stt_training/VERSION

View File

@@ -6,8 +6,8 @@ import sys
 import argparse
 import numpy as np
-from deepspeech_training.util.audio import AUDIO_TYPE_NP, mean_dbfs
+from coqui_stt_training.util.audio import AUDIO_TYPE_NP, mean_dbfs
-from deepspeech_training.util.sample_collections import load_sample
+from coqui_stt_training.util.sample_collections import load_sample
 def fail(message):

View File

@@ -8,20 +8,20 @@ import argparse
 import progressbar
 from pathlib import Path
-from deepspeech_training.util.audio import (
+from coqui_stt_training.util.audio import (
     AUDIO_TYPE_PCM,
     AUDIO_TYPE_OPUS,
     AUDIO_TYPE_WAV,
     change_audio_types,
 )
-from deepspeech_training.util.downloader import SIMPLE_BAR
+from coqui_stt_training.util.downloader import SIMPLE_BAR
-from deepspeech_training.util.sample_collections import (
+from coqui_stt_training.util.sample_collections import (
     CSVWriter,
     DirectSDBWriter,
     TarWriter,
     samples_from_sources,
 )
-from deepspeech_training.util.augmentations import (
+from coqui_stt_training.util.augmentations import (
     parse_augmentations,
     apply_sample_augmentations,
     SampleAugmentation

View File

@@ -5,7 +5,7 @@ import tarfile
 import pandas
-from deepspeech_training.util.importers import get_importers_parser
+from coqui_stt_training.util.importers import get_importers_parser
 COLUMN_NAMES = ["wav_filename", "wav_filesize", "transcript"]

View File

@@ -5,7 +5,7 @@ import tarfile
 import pandas
-from deepspeech_training.util.importers import get_importers_parser
+from coqui_stt_training.util.importers import get_importers_parser
 COLUMNNAMES = ["wav_filename", "wav_filesize", "transcript"]

View File

@@ -30,9 +30,9 @@ except ImportError as ex:
 import requests
 import json
-from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
+from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
-from deepspeech_training.util.helpers import secs_to_hours
+from coqui_stt_training.util.helpers import secs_to_hours
-from deepspeech_training.util.importers import (
+from coqui_stt_training.util.importers import (
     get_counter,
     get_importers_parser,
     get_imported_samples,

View File

@@ -10,13 +10,13 @@ from multiprocessing import Pool
 import progressbar
 import sox
-from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
+from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
-from deepspeech_training.util.importers import (
+from coqui_stt_training.util.importers import (
     get_counter,
     get_imported_samples,
     print_import_report,
 )
-from deepspeech_training.util.importers import validate_label_eng as validate_label
+from coqui_stt_training.util.importers import validate_label_eng as validate_label
 FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
 SAMPLE_RATE = 16000
@@ -35,7 +35,7 @@ def _download_and_preprocess_data(target_dir):
     archive_path = maybe_download(ARCHIVE_NAME, target_dir, ARCHIVE_URL)
     # Conditionally extract common voice data
     _maybe_extract(target_dir, ARCHIVE_DIR_NAME, archive_path)
-    # Conditionally convert common voice CSV files and mp3 data to DeepSpeech CSVs and wav
+    # Conditionally convert common voice CSV files and mp3 data to Coqui STT CSVs and wav
     _maybe_convert_sets(target_dir, ARCHIVE_DIR_NAME)

View File

@@ -3,7 +3,7 @@
 Broadly speaking, this script takes the audio downloaded from Common Voice
 for a certain language, in addition to the *.tsv files output by CorporaCreator,
 and the script formats the data and transcripts to be in a state usable by
-DeepSpeech.py
+train.py
 Use "python3 import_cv2.py -h" for help
 """
 import csv
@@ -15,8 +15,8 @@ from multiprocessing import Pool
 import progressbar
 import sox
-from deepspeech_training.util.downloader import SIMPLE_BAR
+from coqui_stt_training.util.downloader import SIMPLE_BAR
-from deepspeech_training.util.importers import (
+from coqui_stt_training.util.importers import (
     get_counter,
     get_imported_samples,
     get_importers_parser,
@@ -138,9 +138,9 @@ def _maybe_convert_set(dataset, tsv_dir, audio_dir, filter_obj, space_after_ever
     print_import_report(counter, SAMPLE_RATE, MAX_SECS)
     output_csv = os.path.join(os.path.abspath(audio_dir), dataset + ".csv")
-    print("Saving new DeepSpeech-formatted CSV file to: ", output_csv)
+    print("Saving new Coqui STT-formatted CSV file to: ", output_csv)
     with open(output_csv, "w", encoding="utf-8", newline="") as output_csv_file:
-        print("Writing CSV file for DeepSpeech.py as: ", output_csv)
+        print("Writing CSV file for train.py as: ", output_csv)
         writer = csv.DictWriter(output_csv_file, fieldnames=FIELDNAMES)
         writer.writeheader()
         bar = progressbar.ProgressBar(max_value=len(rows), widgets=SIMPLE_BAR)
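A typical run of the importer above might look like the following sketch; the ``--filter_alphabet`` flag and the positional directory argument are assumptions, so check ``python3 bin/import_cv2.py -h`` for the exact interface:

    # Assumed example: convert an extracted Common Voice release (with its
    # CorporaCreator *.tsv files) into train.py-ready CSVs and WAVs.
    python3 bin/import_cv2.py --filter_alphabet data/alphabet.txt /path/to/extracted/cv/en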

View File

@@ -11,7 +11,7 @@ import librosa
 import pandas
 import soundfile # <= Has an external dependency on libsndfile
-from deepspeech_training.util.importers import validate_label_eng as validate_label
+from coqui_stt_training.util.importers import validate_label_eng as validate_label
 # Prerequisite: Having the sph2pipe tool in your PATH:
 # https://www.ldc.upenn.edu/language-resources/tools/sphere-conversion-tools

View File

@@ -6,7 +6,7 @@ import tarfile
 import numpy as np
 import pandas
-from deepspeech_training.util.importers import get_importers_parser
+from coqui_stt_training.util.importers import get_importers_parser
 COLUMN_NAMES = ["wav_filename", "wav_filesize", "transcript"]

View File

@@ -12,7 +12,7 @@ import pandas as pd
 from sox import Transformer
 import swifter
-from deepspeech_training.util.importers import get_importers_parser, get_validate_label
+from coqui_stt_training.util.importers import get_importers_parser, get_validate_label
 __version__ = "0.1.0"
 _logger = logging.getLogger(__name__)

View File

@@ -4,7 +4,7 @@ import sys
 import pandas
-from deepspeech_training.util.downloader import maybe_download
+from coqui_stt_training.util.downloader import maybe_download
 def _download_and_preprocess_data(data_dir):

View File

@@ -12,7 +12,7 @@ import progressbar
 from sox import Transformer
 from tensorflow.python.platform import gfile
-from deepspeech_training.util.downloader import maybe_download
+from coqui_stt_training.util.downloader import maybe_download
 SAMPLE_RATE = 16000

View File

@@ -12,8 +12,8 @@ from multiprocessing import Pool
 import progressbar
 import sox
-from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
+from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
-from deepspeech_training.util.importers import (
+from coqui_stt_training.util.importers import (
     get_counter,
     get_imported_samples,
     get_importers_parser,

View File

@@ -10,8 +10,8 @@ from multiprocessing import Pool
 import progressbar
-from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
+from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
-from deepspeech_training.util.importers import (
+from coqui_stt_training.util.importers import (
     get_counter,
     get_imported_samples,
     get_importers_parser,

View File

@@ -6,7 +6,7 @@ import wave
 import pandas
-from deepspeech_training.util.importers import get_importers_parser
+from coqui_stt_training.util.importers import get_importers_parser
 COLUMN_NAMES = ["wav_filename", "wav_filesize", "transcript"]

View File

@@ -7,7 +7,7 @@ import tarfile
 import numpy as np
 import pandas
-from deepspeech_training.util.importers import get_importers_parser
+from coqui_stt_training.util.importers import get_importers_parser
 COLUMN_NAMES = ["wav_filename", "wav_filesize", "transcript"]

View File

@@ -9,8 +9,8 @@ from multiprocessing import Pool
 import progressbar
-from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
+from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
-from deepspeech_training.util.importers import (
+from coqui_stt_training.util.importers import (
     get_counter,
     get_imported_samples,
     get_importers_parser,

View File

@@ -1,7 +1,7 @@
 #!/usr/bin/env python
 # ensure that you have downloaded the LDC dataset LDC97S62 and tar exists in a folder e.g.
 # ./data/swb/swb1_LDC97S62.tgz
-# from the deepspeech directory run with: ./bin/import_swb.py ./data/swb/
+# from the Coqui STT directory run with: ./bin/import_swb.py ./data/swb/
 import codecs
 import fnmatch
 import os
@@ -17,7 +17,7 @@ import pandas
 import requests
 import soundfile # <= Has an external dependency on libsndfile
-from deepspeech_training.util.importers import validate_label_eng as validate_label
+from coqui_stt_training.util.importers import validate_label_eng as validate_label
 # ARCHIVE_NAME refers to ISIP alignments from 01/29/03
 ARCHIVE_NAME = "switchboard_word_alignments.tar.gz"

View File

@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 """
-Downloads and prepares (parts of) the "Spoken Wikipedia Corpora" for DeepSpeech.py
+Downloads and prepares (parts of) the "Spoken Wikipedia Corpora" for train.py
 Use "python3 import_swc.py -h" for help
 """
@@ -22,8 +22,8 @@ from multiprocessing.pool import ThreadPool
 import progressbar
 import sox
-from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
+from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
-from deepspeech_training.util.importers import validate_label_eng as validate_label
+from coqui_stt_training.util.importers import validate_label_eng as validate_label
 from ds_ctcdecoder import Alphabet
 SWC_URL = "https://www2.informatik.uni-hamburg.de/nats/pub/SWC/SWC_{language}.tar"

View File

@@ -10,8 +10,8 @@ import pandas
 from sox import Transformer
 from tensorflow.python.platform import gfile
-from deepspeech_training.util.downloader import maybe_download
+from coqui_stt_training.util.downloader import maybe_download
-from deepspeech_training.util.stm import parse_stm_file
+from coqui_stt_training.util.stm import parse_stm_file
 def _download_and_preprocess_data(data_dir):

View File

@@ -10,8 +10,8 @@ import progressbar
 import sox
 import unidecode
-from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
+from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
-from deepspeech_training.util.importers import (
+from coqui_stt_training.util.importers import (
     get_counter,
     get_imported_samples,
     get_importers_parser,
@@ -25,7 +25,7 @@ MAX_SECS = 15
 ARCHIVE_NAME = "2019-04-11_fr_FR"
 ARCHIVE_DIR_NAME = "ts_" + ARCHIVE_NAME
 ARCHIVE_URL = (
"https://deepspeech-storage-mirror.s3.fr-par.scw.cloud/" + ARCHIVE_NAME + ".zip" "https://Coqui STT-storage-mirror.s3.fr-par.scw.cloud/" + ARCHIVE_NAME + ".zip"
) )
@ -38,7 +38,7 @@ def _download_and_preprocess_data(target_dir, english_compatible=False):
) )
# Conditionally extract archive data # Conditionally extract archive data
_maybe_extract(target_dir, ARCHIVE_DIR_NAME, archive_path) _maybe_extract(target_dir, ARCHIVE_DIR_NAME, archive_path)
# Conditionally convert TrainingSpeech data to DeepSpeech CSVs and wav # Conditionally convert TrainingSpeech data to Coqui STT CSVs and wav
_maybe_convert_sets( _maybe_convert_sets(
target_dir, ARCHIVE_DIR_NAME, english_compatible=english_compatible target_dir, ARCHIVE_DIR_NAME, english_compatible=english_compatible
) )

View File

@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 """
-Downloads and prepares (parts of) the "German Distant Speech" corpus (TUDA) for DeepSpeech.py
+Downloads and prepares (parts of) the "German Distant Speech" corpus (TUDA) for train.py
 Use "python3 import_tuda.py -h" for help
 """
 import argparse
@@ -14,8 +14,8 @@ from collections import Counter
 import progressbar
-from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
+from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
-from deepspeech_training.util.importers import validate_label_eng as validate_label
+from coqui_stt_training.util.importers import validate_label_eng as validate_label
 from ds_ctcdecoder import Alphabet
 TUDA_VERSION = "v2"

View File

@@ -11,8 +11,8 @@ from zipfile import ZipFile
 import librosa
 import progressbar
-from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
+from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
-from deepspeech_training.util.importers import (
+from coqui_stt_training.util.importers import (
     get_counter,
     get_imported_samples,
     print_import_report,
@@ -35,7 +35,7 @@ def _download_and_preprocess_data(target_dir):
     archive_path = maybe_download(ARCHIVE_NAME, target_dir, ARCHIVE_URL)
     # Conditionally extract common voice data
     _maybe_extract(target_dir, ARCHIVE_DIR_NAME, archive_path)
-    # Conditionally convert common voice CSV files and mp3 data to DeepSpeech CSVs and wav
+    # Conditionally convert common voice CSV files and mp3 data to Coqui STT CSVs and wav
     _maybe_convert_sets(target_dir, ARCHIVE_DIR_NAME)

View File

@@ -14,7 +14,7 @@ from os import makedirs, path
 import pandas
 from bs4 import BeautifulSoup
 from tensorflow.python.platform import gfile
-from deepspeech_training.util.downloader import maybe_download
+from coqui_stt_training.util.downloader import maybe_download
 """The number of jobs to run in parallel"""
 NUM_PARALLEL = 8

View File

@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 """
-Tool for playing (and augmenting) single samples or samples from Sample Databases (SDB files) and DeepSpeech CSV files
+Tool for playing (and augmenting) single samples or samples from Sample Databases (SDB files) and 🐸STT CSV files
 Use "python3 play.py -h" for help
 """
@@ -9,9 +9,9 @@ import sys
 import random
 import argparse
-from deepspeech_training.util.audio import get_loadable_audio_type_from_extension, AUDIO_TYPE_PCM, AUDIO_TYPE_WAV
+from coqui_stt_training.util.audio import get_loadable_audio_type_from_extension, AUDIO_TYPE_PCM, AUDIO_TYPE_WAV
-from deepspeech_training.util.sample_collections import SampleList, LabeledSample, samples_from_source
+from coqui_stt_training.util.sample_collections import SampleList, LabeledSample, samples_from_source
-from deepspeech_training.util.augmentations import parse_augmentations, apply_sample_augmentations, SampleAugmentation
+from coqui_stt_training.util.augmentations import parse_augmentations, apply_sample_augmentations, SampleAugmentation
 def get_samples_in_play_order():
@@ -68,7 +68,7 @@ def play_collection():
 def handle_args():
     parser = argparse.ArgumentParser(
         description="Tool for playing (and augmenting) single samples or samples from Sample Databases (SDB files) "
-        "and DeepSpeech CSV files"
+        "and Coqui STT CSV files"
     )
     parser.add_argument("source", help="Sample DB, CSV or WAV file to play samples from")
     parser.add_argument(
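For instance, the tool above can be pointed at a CSV data set such as the one produced by the LDC93S1 importer; the ``--augment`` usage shown is only a presumed form based on the imports above, so check ``python3 bin/play.py -h`` for the exact augmentation syntax:

    # Play samples from a CSV data set (the positional "source" argument above).
    python3 bin/play.py data/ldc93s1/ldc93s1.csv
    # Presumed form for previewing an augmentation before training with it.
    python3 bin/play.py data/ldc93s1/ldc93s1.csv --augment volume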

View File

@@ -1,7 +1,7 @@
 #!/bin/sh
 set -xe
-if [ ! -f DeepSpeech.py ]; then
+if [ ! -f train.py ]; then
-echo "Please make sure you run this from DeepSpeech's top level directory."
+echo "Please make sure you run this from STT's top level directory."
 exit 1
 fi;
@@ -20,7 +20,7 @@ fi
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar \
+python -u train.py --noshow_progressbar \
 --train_files data/ldc93s1/ldc93s1.csv \
 --test_files data/ldc93s1/ldc93s1.csv \
 --train_batch_size 1 \

View File

@@ -14,7 +14,7 @@ fi;
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --train_files ${ldc93s1_csv} --train_batch_size 1 \
 --scorer "" \
 --augment dropout \

View File

@@ -14,7 +14,7 @@ fi;
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --train_files ${ldc93s1_csv} --train_batch_size 1 \
 --dev_files ${ldc93s1_csv} --dev_batch_size 1 \
 --test_files ${ldc93s1_csv} --test_batch_size 1 \

View File

@@ -14,7 +14,7 @@ fi;
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --train_files ${ldc93s1_csv} --train_batch_size 1 \
 --dev_files ${ldc93s1_csv} --dev_batch_size 1 \
 --test_files ${ldc93s1_csv} --test_batch_size 1 \

View File

@@ -20,7 +20,7 @@ fi;
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --train_files ${ldc93s1_sdb} --train_batch_size 1 \
 --dev_files ${ldc93s1_sdb} --dev_batch_size 1 \
 --test_files ${ldc93s1_sdb} --test_batch_size 1 \

View File

@@ -17,7 +17,7 @@ fi;
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --train_files ${ldc93s1_csv} --train_batch_size 1 \
 --feature_cache '/tmp/ldc93s1_cache' \
 --dev_files ${ldc93s1_csv} --dev_batch_size 1 \

View File

@@ -17,7 +17,7 @@ fi;
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --train_files ${ldc93s1_csv} --train_batch_size 1 \
 --feature_cache '/tmp/ldc93s1_cache' \
 --dev_files ${ldc93s1_csv} --dev_batch_size 1 \

View File

@@ -16,7 +16,7 @@ fi;
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar \
+python -u train.py --noshow_progressbar \
 --n_hidden 100 \
 --checkpoint_dir '/tmp/ckpt_bytes' \
 --export_dir '/tmp/train_bytes_tflite' \

View File

@@ -17,7 +17,7 @@ fi;
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --train_files ${ldc93s1_csv} --train_batch_size 1 \
 --dev_files ${ldc93s1_csv} --dev_batch_size 1 \
 --test_files ${ldc93s1_csv} --test_batch_size 1 \

View File

@@ -23,7 +23,7 @@ fi;
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --train_files ${ldc93s1_sdb} --train_batch_size 1 \
 --dev_files ${ldc93s1_sdb} --dev_batch_size 1 \
 --test_files ${ldc93s1_sdb} --test_batch_size 1 \

View File

@@ -23,7 +23,7 @@ fi;
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --train_files ${ldc93s1_sdb},${ldc93s1_csv} --train_batch_size 1 \
 --feature_cache '/tmp/ldc93s1_cache_sdb_csv' \
 --dev_files ${ldc93s1_sdb},${ldc93s1_csv} --dev_batch_size 1 \

View File

@@ -14,7 +14,7 @@ fi;
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --train_files ${ldc93s1_csv} --train_batch_size 1 \
 --dev_files ${ldc93s1_csv} --dev_batch_size 1 \
 --test_files ${ldc93s1_csv} --test_batch_size 1 \
@@ -23,7 +23,7 @@ python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
 --learning_rate 0.001 --dropout_rate 0.05 \
 --scorer_path 'data/smoke_test/pruned_lm.scorer'
-python -u DeepSpeech.py \
+python -u train.py \
 --n_hidden 100 \
 --checkpoint_dir '/tmp/ckpt' \
 --scorer_path 'data/smoke_test/pruned_lm.scorer' \

View File

@@ -16,7 +16,7 @@ fi;
 # and when trying to run on multiple devices (like GPUs), this will break
 export CUDA_VISIBLE_DEVICES=0
-python -u DeepSpeech.py --noshow_progressbar \
+python -u train.py --noshow_progressbar \
 --n_hidden 100 \
 --checkpoint_dir '/tmp/ckpt' \
 --export_dir '/tmp/train_tflite' \
@@ -26,7 +26,7 @@ python -u DeepSpeech.py --noshow_progressbar \
 mkdir /tmp/train_tflite/en-us
-python -u DeepSpeech.py --noshow_progressbar \
+python -u train.py --noshow_progressbar \
 --n_hidden 100 \
 --checkpoint_dir '/tmp/ckpt' \
 --export_dir '/tmp/train_tflite/en-us' \

View File

@@ -29,7 +29,7 @@ for LOAD in 'init' 'last' 'auto'; do
 echo "########################################################"
 echo "#### Train ENGLISH model with just --checkpoint_dir ####"
 echo "########################################################"
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --alphabet_config_path "./data/alphabet.txt" \
 --load_train "$LOAD" \
 --train_files "${ldc93s1_csv}" --train_batch_size 1 \
@@ -43,7 +43,7 @@ for LOAD in 'init' 'last' 'auto'; do
 echo "##############################################################################"
 echo "#### Train ENGLISH model with --save_checkpoint_dir --load_checkpoint_dir ####"
 echo "##############################################################################"
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --alphabet_config_path "./data/alphabet.txt" \
 --load_train "$LOAD" \
 --train_files "${ldc93s1_csv}" --train_batch_size 1 \
@@ -58,7 +58,7 @@ for LOAD in 'init' 'last' 'auto'; do
 echo "####################################################################################"
 echo "#### Transfer to RUSSIAN model with --save_checkpoint_dir --load_checkpoint_dir ####"
 echo "####################################################################################"
-python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
+python -u train.py --noshow_progressbar --noearly_stop \
 --drop_source_layers 1 \
 --alphabet_config_path "${ru_dir}/alphabet.ru" \
 --load_train 'last' \

View File

@@ -3,9 +3,9 @@ Language-Specific Data
 This directory contains language-specific data files. Most importantly, you will find here:
-1. A list of unique characters for the target language (e.g. English) in ``data/alphabet.txt``. After installing the training code, you can check ``python -m deepspeech_training.util.check_characters --help`` for a tool that creates an alphabet file from a list of training CSV files.
+1. A list of unique characters for the target language (e.g. English) in ``data/alphabet.txt``. After installing the training code, you can check ``python -m coqui_stt_training.util.check_characters --help`` for a tool that creates an alphabet file from a list of training CSV files.
 2. A script used to generate a binary n-gram language model: ``data/lm/generate_lm.py``.
-For more information on how to build these resources from scratch, see the ``External scorer scripts`` section on `deepspeech.readthedocs.io <https://deepspeech.readthedocs.io/>`_.
+For more information on how to build these resources from scratch, see the ``External scorer scripts`` section on `stt.readthedocs.io <https://stt.readthedocs.io/>`_.
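As an example of the character-checking tool mentioned in item 1, an invocation could look like the sketch below; the ``-csv`` and ``-alpha`` flags are assumptions, so verify them against ``--help``:

    # Presumed usage: scan training CSVs and print the characters they contain
    # in alphabet-file format.
    python -m coqui_stt_training.util.check_characters -csv train.csv,dev.csv,test.csv -alpha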

View File

@@ -130,7 +130,7 @@ def build_lm(args, data_lower, vocab_str):
 def main():
     parser = argparse.ArgumentParser(
-        description="Generate lm.binary and top-k vocab for DeepSpeech."
+        description="Generate lm.binary and top-k vocab for Coqui STT."
     )
     parser.add_argument(
         "--input_txt",

View File

@@ -1,5 +1,5 @@
-DeepSpeech Model
+STT Model
-================
+=========
 The aim of this project is to create a simple, open, and ubiquitous speech
 recognition engine. Simple, in that the engine should not require server-class

View File

@ -1,12 +1,12 @@
.. _build-native-client: .. _build-native-client:
Building DeepSpeech Binaries Building Coqui STT Binaries
============================ ===========================
This section describes how to rebuild binaries. We have already several prebuilt binaries for all the supported platform, This section describes how to rebuild binaries. We have already several prebuilt binaries for all the supported platform,
it is highly advised to use them except if you know what you are doing. it is highly advised to use them except if you know what you are doing.
If you'd like to build the DeepSpeech binaries yourself, you'll need the following pre-requisites downloaded and installed: If you'd like to build the 🐸STT binaries yourself, you'll need the following pre-requisites downloaded and installed:
* `Bazel 3.1.0 <https://github.com/bazelbuild/bazel/releases/tag/3.1.0>`_ * `Bazel 3.1.0 <https://github.com/bazelbuild/bazel/releases/tag/3.1.0>`_
* `General TensorFlow r2.3 requirements <https://www.tensorflow.org/install/source#tested_build_configurations>`_ * `General TensorFlow r2.3 requirements <https://www.tensorflow.org/install/source#tested_build_configurations>`_
@ -26,18 +26,18 @@ If you'd like to build the language bindings or the decoder package, you'll also
Dependencies Dependencies
------------ ------------
If you follow these instructions, you should compile your own binaries of DeepSpeech (built on TensorFlow using Bazel). If you follow these instructions, you should compile your own binaries of 🐸STT (built on TensorFlow using Bazel).
For more information on configuring TensorFlow, read the docs up to the end of `"Configure the Build" <https://www.tensorflow.org/install/source#configure_the_build>`_. For more information on configuring TensorFlow, read the docs up to the end of `"Configure the Build" <https://www.tensorflow.org/install/source#configure_the_build>`_.
Checkout source code Checkout source code
^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^
Clone DeepSpeech source code (TensorFlow will come as a submdule): Clone 🐸STT source code (TensorFlow will come as a submdule):
.. code-block:: .. code-block::
git clone https://github.com/mozilla/DeepSpeech.git git clone https://github.com/coqui-ai/STT.git
git submodule sync tensorflow/ git submodule sync tensorflow/
git submodule update --init tensorflow/ git submodule update --init tensorflow/
@ -56,24 +56,24 @@ After you have installed the correct version of Bazel, configure TensorFlow:
cd tensorflow cd tensorflow
./configure ./configure
Compile DeepSpeech Compile Coqui STT
------------------ -----------------
Compile ``libdeepspeech.so`` Compile ``libstt.so``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Within your TensorFlow directory, there should be a symbolic link to the DeepSpeech ``native_client`` directory. If it is not present, create it with the follow command: Within your TensorFlow directory, there should be a symbolic link to the 🐸STT ``native_client`` directory. If it is not present, create it with the follow command:
.. code-block:: .. code-block::
cd tensorflow cd tensorflow
ln -s ../native_client ln -s ../native_client
You can now use Bazel to build the main DeepSpeech library, ``libdeepspeech.so``. Add ``--config=cuda`` if you want a CUDA build. You can now use Bazel to build the main 🐸STT library, ``libstt.so``. Add ``--config=cuda`` if you want a CUDA build.
.. code-block:: .. code-block::
bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libstt.so
The generated binaries will be saved to ``bazel-bin/native_client/``. The generated binaries will be saved to ``bazel-bin/native_client/``.
@ -82,24 +82,24 @@ The generated binaries will be saved to ``bazel-bin/native_client/``.
Compile ``generate_scorer_package`` Compile ``generate_scorer_package``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Following the same setup as for ``libdeepspeech.so`` above, you can rebuild the ``generate_scorer_package`` binary by adding its target to the command line: ``//native_client:generate_scorer_package``. Following the same setup as for ``libstt.so`` above, you can rebuild the ``generate_scorer_package`` binary by adding its target to the command line: ``//native_client:generate_scorer_package``.
Using the example from above you can build the library and that binary at the same time: Using the example from above you can build the library and that binary at the same time:
.. code-block:: .. code-block::
bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:generate_scorer_package bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-fvisibility=hidden //native_client:libstt.so //native_client:generate_scorer_package
The generated binaries will be saved to ``bazel-bin/native_client/``. The generated binaries will be saved to ``bazel-bin/native_client/``.
Compile Language Bindings Compile Language Bindings
^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^
Now, ``cd`` into the ``DeepSpeech/native_client`` directory and use the ``Makefile`` to build all the language bindings (C++ client, Python package, Nodejs package, etc.). Now, ``cd`` into the ``STT/native_client`` directory and use the ``Makefile`` to build all the language bindings (C++ client, Python package, Nodejs package, etc.).
.. code-block:: .. code-block::
cd ../DeepSpeech/native_client cd ../STT/native_client
make deepspeech make stt
Installing your own Binaries Installing your own Binaries
---------------------------- ----------------------------
@ -121,9 +121,9 @@ Included are a set of generated Python bindings. After following the above build
cd native_client/python cd native_client/python
make bindings make bindings
pip install dist/deepspeech* pip install dist/stt-*
The API mirrors the C++ API and is demonstrated in `client.py <python/client.py>`_. Refer to `deepspeech.h <deepspeech.h>`_ for documentation. The API mirrors the C++ API and is demonstrated in `client.py <python/client.py>`_. Refer to `coqui-stt.h <coqui-stt.h>`_ for documentation.
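As an illustration, here is a minimal transcription sketch. It assumes the package built above imports as ``stt`` and that its ``Model`` class keeps the ``enableExternalScorer`` and ``stt`` methods of the previous Python API; the model, scorer and audio file names are placeholders.

.. code-block:: python

   import wave

   import numpy as np
   from stt import Model  # assumed import name for the package built above

   model = Model("model.tflite")                # placeholder acoustic model path
   model.enableExternalScorer("kenlm.scorer")   # optional external scorer

   # The bindings expect 16-bit, 16 kHz, mono PCM samples as a NumPy array.
   with wave.open("audio.wav", "rb") as wav:
       audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

   print(model.stt(audio))                      # prints the decoded transcript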
Install NodeJS / ElectronJS bindings Install NodeJS / ElectronJS bindings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -136,7 +136,7 @@ After following the above build and installation instructions, the Node.JS bindi
make build make build
make npm-pack make npm-pack
This will create the package ``deepspeech-VERSION.tgz`` in ``native_client/javascript``. This will create the package ``stt-VERSION.tgz`` in ``native_client/javascript``.
Install the CTC decoder package Install the CTC decoder package
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -196,23 +196,23 @@ So your command line for ``RPi3`` and ``ARMv7`` should look like:
.. code-block:: .. code-block::
bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic --config=rpi3 --config=rpi3_opt -c opt --copt=-O3 --copt=-fvisibility=hidden //native_client:libdeepspeech.so bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic --config=rpi3 --config=rpi3_opt -c opt --copt=-O3 --copt=-fvisibility=hidden //native_client:libstt.so
And your command line for ``LePotato`` and ``ARM64`` should look like: And your command line for ``LePotato`` and ``ARM64`` should look like:
.. code-block:: .. code-block::
bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic --config=rpi3-armv8 --config=rpi3-armv8_opt -c opt --copt=-O3 --copt=-fvisibility=hidden //native_client:libdeepspeech.so bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic --config=rpi3-armv8 --config=rpi3-armv8_opt -c opt --copt=-O3 --copt=-fvisibility=hidden //native_client:libstt.so
While we test only on RPi3 Raspbian Buster and LePotato ARMBian Buster, anything compatible with ``armv7-a cortex-a53`` or ``armv8-a cortex-a53`` should be fine. While we test only on RPi3 Raspbian Buster and LePotato ARMBian Buster, anything compatible with ``armv7-a cortex-a53`` or ``armv8-a cortex-a53`` should be fine.
The ``deepspeech`` binary can also be cross-built, with ``TARGET=rpi3`` or ``TARGET=rpi3-armv8``. This might require you to set up a system tree using the tool ``multistrap`` and the multistrap configuration files: ``native_client/multistrap_armbian64_buster.conf`` and ``native_client/multistrap_raspbian_buster.conf``. The ``stt`` binary can also be cross-built, with ``TARGET=rpi3`` or ``TARGET=rpi3-armv8``. This might require you to set up a system tree using the tool ``multistrap`` and the multistrap configuration files: ``native_client/multistrap_armbian64_buster.conf`` and ``native_client/multistrap_raspbian_buster.conf``.
The path of the system tree can be overridden from the default values defined in ``definitions.mk`` through the ``RASPBIAN`` ``make`` variable. The path of the system tree can be overridden from the default values defined in ``definitions.mk`` through the ``RASPBIAN`` ``make`` variable.
.. code-block:: .. code-block::
cd ../DeepSpeech/native_client cd ../STT/native_client
make TARGET=<system> deepspeech make TARGET=<system> stt
Android devices support Android devices support
----------------------- -----------------------
@ -224,64 +224,66 @@ Please refer to TensorFlow documentation on how to setup the environment to buil
Using the library from Android project Using the library from Android project
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We provide uptodate and tested ``libdeepspeech`` usable as an ``AAR`` package, Due to the discontinuation of Bintray JCenter we do not have pre-built Android packages published for now. We are working to move to Maven Central and will update this section when it's available.
for Android versions starting with 7.0 to 11.0. The package is published on
`JCenter <https://bintray.com/alissy/org.mozilla.deepspeech/libdeepspeech>`_,
and the ``JCenter`` repository should be available by default in any Android
project. Please make sure your project is setup to pull from this repository.
You can then include the library by just adding this line to your
``gradle.build``, adjusting ``VERSION`` to the version you need:
.. code-block:: .. We provide up-to-date and tested ``libstt`` usable as an ``AAR`` package,
for Android versions starting with 7.0 to 11.0. The package is published on
`JCenter <https://bintray.com/coqui/ai.coqui.stt/libstt>`_,
and the ``JCenter`` repository should be available by default in any Android
project. Please make sure your project is setup to pull from this repository.
You can then include the library by just adding this line to your
``gradle.build``, adjusting ``VERSION`` to the version you need:
implementation 'deepspeech.mozilla.org:libdeepspeech:VERSION@aar' .. code-block::
Building ``libdeepspeech.so`` implementation 'stt.coqui.ai:libstt:VERSION@aar'
Building ``libstt.so``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can build the ``libdeepspeech.so`` using (ARMv7): You can build the ``libstt.so`` using (ARMv7):
.. code-block:: .. code-block::
bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic --config=android --config=android_arm --define=runtime=tflite --action_env ANDROID_NDK_API_LEVEL=21 --cxxopt=-std=c++14 --copt=-D_GLIBCXX_USE_C99 //native_client:libdeepspeech.so bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic --config=android --config=android_arm --define=runtime=tflite --action_env ANDROID_NDK_API_LEVEL=21 --cxxopt=-std=c++14 --copt=-D_GLIBCXX_USE_C99 //native_client:libstt.so
Or (ARM64): Or (ARM64):
.. code-block:: .. code-block::
bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic --config=android --config=android_arm64 --define=runtime=tflite --action_env ANDROID_NDK_API_LEVEL=21 --cxxopt=-std=c++14 --copt=-D_GLIBCXX_USE_C99 //native_client:libdeepspeech.so bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" --config=monolithic --config=android --config=android_arm64 --define=runtime=tflite --action_env ANDROID_NDK_API_LEVEL=21 --cxxopt=-std=c++14 --copt=-D_GLIBCXX_USE_C99 //native_client:libstt.so
Building ``libdeepspeech.aar`` Building ``libstt.aar``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In the unlikely event you have to rebuild the JNI bindings, source code is In the unlikely event you have to rebuild the JNI bindings, source code is
available under the ``libdeepspeech`` subdirectory. Building depends on the shared available under the ``libstt`` subdirectory. Building depends on the shared
object: please ensure you place ``libdeepspeech.so`` into the object: please ensure you place ``libstt.so`` into the
``libdeepspeech/libs/{arm64-v8a,armeabi-v7a,x86_64}/`` matching subdirectories. ``libstt/libs/{arm64-v8a,armeabi-v7a,x86_64}/`` matching subdirectories.
Building the bindings is managed by ``gradle`` and should be limited to issuing Building the bindings is managed by ``gradle`` and should be limited to issuing
``./gradlew libdeepspeech:build``, producing an ``AAR`` package in ``./gradlew libstt:build``, producing an ``AAR`` package in
``./libdeepspeech/build/outputs/aar/``. ``./libstt/build/outputs/aar/``.
Please note that you might have to copy the file to a local Maven repository Please note that you might have to copy the file to a local Maven repository
and adapt file naming (when missing, the error message should state what and adapt file naming (when missing, the error message should state what
filename it expects and where). filename it expects and where).
Building C++ ``deepspeech`` binary Building C++ ``stt`` binary
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Building the ``deepspeech`` binary will happen through ``ndk-build`` (ARMv7): Building the ``stt`` binary will happen through ``ndk-build`` (ARMv7):
.. code-block:: .. code-block::
cd ../DeepSpeech/native_client cd ../STT/native_client
$ANDROID_NDK_HOME/ndk-build APP_PLATFORM=android-21 APP_BUILD_SCRIPT=$(pwd)/Android.mk NDK_PROJECT_PATH=$(pwd) APP_STL=c++_shared TFDIR=$(pwd)/../tensorflow/ TARGET_ARCH_ABI=armeabi-v7a $ANDROID_NDK_HOME/ndk-build APP_PLATFORM=android-21 APP_BUILD_SCRIPT=$(pwd)/Android.mk NDK_PROJECT_PATH=$(pwd) APP_STL=c++_shared TFDIR=$(pwd)/../tensorflow/ TARGET_ARCH_ABI=armeabi-v7a
And (ARM64): And (ARM64):
.. code-block:: .. code-block::
cd ../DeepSpeech/native_client cd ../STT/native_client
$ANDROID_NDK_HOME/ndk-build APP_PLATFORM=android-21 APP_BUILD_SCRIPT=$(pwd)/Android.mk NDK_PROJECT_PATH=$(pwd) APP_STL=c++_shared TFDIR=$(pwd)/../tensorflow/ TARGET_ARCH_ABI=arm64-v8a $ANDROID_NDK_HOME/ndk-build APP_PLATFORM=android-21 APP_BUILD_SCRIPT=$(pwd)/Android.mk NDK_PROJECT_PATH=$(pwd) APP_STL=c++_shared TFDIR=$(pwd)/../tensorflow/ TARGET_ARCH_ABI=arm64-v8a
Android demo APK Android demo APK
@ -303,13 +305,13 @@ demo of one usage of the application. For example, it's only able to read PCM
mono 16kHz 16-bit files and it might fail on some WAVE files that do not mono 16kHz 16-bit files and it might fail on some WAVE files that do not
follow the specification exactly. follow the specification exactly.
Running ``deepspeech`` via adb Running ``stt`` via adb
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You should use ``adb push`` to send data to the device; please refer to the Android You should use ``adb push`` to send data to the device; please refer to the Android
documentation on how to use that. documentation on how to use that.
Please push DeepSpeech data to ``/sdcard/deepspeech/``\ , including: Please push 🐸STT data to ``/sdcard/STT/``\ , including:
* ``output_graph.tflite`` which is the TF Lite model * ``output_graph.tflite`` which is the TF Lite model
@ -319,8 +321,8 @@ Please push DeepSpeech data to ``/sdcard/deepspeech/``\ , including:
Then, push binaries from ``native_client.tar.xz`` to ``/data/local/tmp/ds``\ : Then, push binaries from ``native_client.tar.xz`` to ``/data/local/tmp/ds``\ :
* ``deepspeech`` * ``stt``
* ``libdeepspeech.so`` * ``libstt.so``
* ``libc++_shared.so`` * ``libc++_shared.so``
You should then be able to run as usual, using a shell from ``adb shell``\ : You should then be able to run as usual, using a shell from ``adb shell``\ :
@ -328,7 +330,7 @@ You should then be able to run as usual, using a shell from ``adb shell``\ :
.. code-block:: .. code-block::
user@device$ cd /data/local/tmp/ds/ user@device$ cd /data/local/tmp/ds/
user@device$ LD_LIBRARY_PATH=$(pwd)/ ./deepspeech [...] user@device$ LD_LIBRARY_PATH=$(pwd)/ ./stt [...]
Please note that the Android linker does not support ``rpath``, so you have to set Please note that the Android linker does not support ``rpath``, so you have to set
``LD_LIBRARY_PATH``. Properly wrapped / packaged bindings do embed the library ``LD_LIBRARY_PATH``. Properly wrapped / packaged bindings do embed the library

View File

@ -1,9 +1,9 @@
.. _build-native-client-dotnet: .. _build-native-client-dotnet:
Building DeepSpeech native client for Windows Building Coqui STT native client for Windows
============================================= ============================================
Now we can build the native client of DeepSpeech and run inference on Windows using the C# client. To do that, we need to compile the ``native_client``. Now we can build the native client of 🐸STT and run inference on Windows using the C# client. To do that, we need to compile the ``native_client``.
**Table of Contents** **Table of Contents**
@ -44,11 +44,11 @@ We highly recommend sticking to the recommended versions of CUDA/cuDNN in order
Getting the code Getting the code
---------------- ----------------
We need to clone ``mozilla/DeepSpeech``. We need to clone ``coqui-ai/STT``.
.. code-block:: bash .. code-block:: bash
git clone https://github.com/mozilla/DeepSpeech git clone https://github.com/coqui-ai/STT
git submodule sync tensorflow/ git submodule sync tensorflow/
git submodule update --init tensorflow/ git submodule update --init tensorflow/
@ -61,8 +61,8 @@ There should already be a symbolic link, for this example let's suppose that we
. .
├── D:\ ├── D:\
│ ├── cloned # Contains DeepSpeech and tensorflow side by side │ ├── cloned # Contains 🐸STT and tensorflow side by side
│ │ └── DeepSpeech # Root of the cloned DeepSpeech │ │ └── STT # Root of the cloned 🐸STT
│ │ ├── tensorflow # Root of the cloned mozilla/tensorflow │ │ ├── tensorflow # Root of the cloned mozilla/tensorflow
└── ... └── ...
@ -71,7 +71,7 @@ Change your path accordingly to your path structure, for the structure above we
.. code-block:: bash .. code-block:: bash
mklink /d "D:\cloned\DeepSpeech\tensorflow\native_client" "D:\cloned\DeepSpeech\native_client" mklink /d "D:\cloned\STT\tensorflow\native_client" "D:\cloned\STT\native_client"
Adding environment variables Adding environment variables
---------------------------- ----------------------------
@ -119,7 +119,7 @@ Building the native_client
There's one last command to run before building: you need to run `configure.py <https://github.com/mozilla/tensorflow/blob/master/configure.py>`_ inside the cloned ``tensorflow`` directory. There's one last command to run before building: you need to run `configure.py <https://github.com/mozilla/tensorflow/blob/master/configure.py>`_ inside the cloned ``tensorflow`` directory.
At this point we are ready to start building the ``native_client``. Go to the ``tensorflow`` sub-directory; following our example it should be ``D:\cloned\DeepSpeech\tensorflow``. At this point we are ready to start building the ``native_client``. Go to the ``tensorflow`` sub-directory; following our example it should be ``D:\cloned\STT\tensorflow``.
CPU CPU
~~~ ~~~
@ -128,7 +128,7 @@ We will add AVX/AVX2 support in the command, please make sure that your CPU supp
.. code-block:: bash .. code-block:: bash
bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" -c opt --copt=/arch:AVX --copt=/arch:AVX2 //native_client:libdeepspeech.so bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" -c opt --copt=/arch:AVX --copt=/arch:AVX2 //native_client:libstt.so
GPU with CUDA GPU with CUDA
~~~~~~~~~~~~~ ~~~~~~~~~~~~~
@ -137,11 +137,11 @@ If you enabled CUDA in `configure.py <https://github.com/mozilla/tensorflow/blob
.. code-block:: bash .. code-block:: bash
bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" -c opt --config=cuda --copt=/arch:AVX --copt=/arch:AVX2 //native_client:libdeepspeech.so bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" -c opt --config=cuda --copt=/arch:AVX --copt=/arch:AVX2 //native_client:libstt.so
Be patient; if you enabled AVX/AVX2 and CUDA it will take a long time. Finally you should see it stop and show the path to the generated ``libdeepspeech.so``. Be patient; if you enabled AVX/AVX2 and CUDA it will take a long time. Finally you should see it stop and show the path to the generated ``libstt.so``.
Using the generated library Using the generated library
--------------------------- ---------------------------
For now we can only use the generated ``libdeepspeech.so`` with the C# clients. Go to `native_client/dotnet/ <https://github.com/mozilla/DeepSpeech/tree/master/native_client/dotnet>`_ in your DeepSpeech directory and open the Visual Studio solution, then build in debug or release mode. Finally, copy ``libdeepspeech.so`` to the generated ``x64/Debug`` or ``x64/Release`` directory. For now we can only use the generated ``libstt.so`` with the C# clients. Go to `native_client/dotnet/ <https://github.com/coqui-ai/STT/tree/master/native_client/dotnet>`_ in your STT directory and open the Visual Studio solution, then build in debug or release mode. Finally, copy ``libstt.so`` to the generated ``x64/Debug`` or ``x64/Release`` directory.

View File

@ -1,4 +1,4 @@
User contributed examples User contributed examples
========================= =========================
There are also several user contributed examples available on a separate examples repository: `https://github.com/mozilla/DeepSpeech-examples <https://github.com/mozilla/DeepSpeech-examples>`_. There are also several user contributed examples available on a separate examples repository: `https://github.com/coqui-ai/STT-examples <https://github.com/coqui-ai/STT-examples>`_.

View File

@ -6,7 +6,7 @@ CTC beam search decoder
Introduction Introduction
^^^^^^^^^^^^ ^^^^^^^^^^^^
DeepSpeech uses the `Connectionist Temporal Classification <http://www.cs.toronto.edu/~graves/icml_2006.pdf>`_ loss function. For an excellent explanation of CTC and its usage, see this Distill article: `Sequence Modeling with CTC <https://distill.pub/2017/ctc/>`_. This document assumes the reader is familiar with the concepts described in that article, and describes DeepSpeech specific behaviors that developers building systems with DeepSpeech should know to avoid problems. 🐸STT uses the `Connectionist Temporal Classification <http://www.cs.toronto.edu/~graves/icml_2006.pdf>`_ loss function. For an excellent explanation of CTC and its usage, see this Distill article: `Sequence Modeling with CTC <https://distill.pub/2017/ctc/>`_. This document assumes the reader is familiar with the concepts described in that article, and describes 🐸STT specific behaviors that developers building systems with 🐸STT should know to avoid problems.
Note: Documentation for the tooling for creating custom scorer packages is available in :ref:`scorer-scripts`. Note: Documentation for the tooling for creating custom scorer packages is available in :ref:`scorer-scripts`.
@ -16,19 +16,19 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S
External scorer External scorer
^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^
DeepSpeech clients support OPTIONAL use of an external language model to improve the accuracy of the predicted transcripts. In the code, command line parameters, and documentation, this is referred to as a "scorer". The scorer is used to compute the likelihood (also called a score, hence the name "scorer") of sequences of words or characters in the output, to guide the decoder towards more likely results. This improves accuracy significantly. 🐸STT clients support OPTIONAL use of an external language model to improve the accuracy of the predicted transcripts. In the code, command line parameters, and documentation, this is referred to as a "scorer". The scorer is used to compute the likelihood (also called a score, hence the name "scorer") of sequences of words or characters in the output, to guide the decoder towards more likely results. This improves accuracy significantly.
The use of an external scorer is fully optional. When an external scorer is not specified, DeepSpeech still uses a beam search decoding algorithm, but without any outside scoring. The use of an external scorer is fully optional. When an external scorer is not specified, 🐸STT still uses a beam search decoding algorithm, but without any outside scoring.
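As an illustration, this is roughly how the scorer is toggled from the Python bindings, assuming a ``Model`` class that keeps the ``enableExternalScorer``, ``disableExternalScorer`` and ``setScorerAlphaBeta`` methods of the previous API; the file names and weights below are placeholders.

.. code-block:: python

   from stt import Model  # assumed import name for the Python bindings

   model = Model("model.tflite")              # placeholder acoustic model

   # Decode with an external scorer package (see :ref:`scorer-scripts` for how
   # such a .scorer file is produced).
   model.enableExternalScorer("kenlm.scorer")
   model.setScorerAlphaBeta(0.93, 1.18)       # placeholder alpha/beta weights

   # Or fall back to plain beam search without any outside scoring.
   model.disableExternalScorer()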
Currently, the DeepSpeech external scorer is implemented with `KenLM <https://kheafield.com/code/kenlm/>`_, plus some tooling to package the necessary files and metadata into a single ``.scorer`` package. The tooling lives in ``data/lm/``. The scripts included in ``data/lm/`` can be used and modified to build your own language model based on your particular use case or language. See :ref:`scorer-scripts` for more details on how to reproduce our scorer file as well as create your own. Currently, the 🐸STT external scorer is implemented with `KenLM <https://kheafield.com/code/kenlm/>`_, plus some tooling to package the necessary files and metadata into a single ``.scorer`` package. The tooling lives in ``data/lm/``. The scripts included in ``data/lm/`` can be used and modified to build your own language model based on your particular use case or language. See :ref:`scorer-scripts` for more details on how to reproduce our scorer file as well as create your own.
The scripts are geared towards replicating the language model files we release as part of `DeepSpeech model releases <https://github.com/mozilla/DeepSpeech/releases/latest>`_, but modifying them to use different datasets or language model construction parameters should be simple. The scripts are geared towards replicating the language model files we release as part of `STT model releases <https://github.com/coqui-ai/STT/releases/latest>`_, but modifying them to use different datasets or language model construction parameters should be simple.
Decoding modes Decoding modes
^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^
DeepSpeech currently supports two modes of operation with significant differences at both training and decoding time. Note that Bytes output mode is experimental and has not been tested for languages other than Chinese Mandarin. 🐸STT currently supports two modes of operation with significant differences at both training and decoding time. Note that Bytes output mode is experimental and has not been tested for languages other than Chinese Mandarin.
Default mode (alphabet based) Default mode (alphabet based)

View File

@ -5,7 +5,7 @@ Error codes
Below is the definition for all error codes used in the API, their numerical values, and a human readable description. Below is the definition for all error codes used in the API, their numerical values, and a human readable description.
.. literalinclude:: ../native_client/deepspeech.h .. literalinclude:: ../native_client/coqui-stt.h
:language: c :language: c
:start-after: sphinx-doc: error_code_listing_start :start-after: sphinx-doc: error_code_listing_start
:end-before: sphinx-doc: error_code_listing_end :end-before: sphinx-doc: error_code_listing_end

View File

@ -3,12 +3,12 @@
Command-line flags for the training scripts Command-line flags for the training scripts
=========================================== ===========================================
Below you can find the definition of all command-line flags supported by the training scripts. This includes ``DeepSpeech.py``, ``evaluate.py``, ``evaluate_tflite.py``, ``transcribe.py`` and ``lm_optimizer.py``. Below you can find the definition of all command-line flags supported by the training scripts. This includes ``train.py``, ``evaluate.py``, ``evaluate_tflite.py``, ``transcribe.py`` and ``lm_optimizer.py``.
Flags Flags
----- -----
.. literalinclude:: ../training/deepspeech_training/util/flags.py .. literalinclude:: ../training/coqui_stt_training/util/flags.py
:language: python :language: python
:linenos: :linenos:
:lineno-match: :lineno-match:

View File

@ -1,7 +1,7 @@
Hot-word boosting API Usage example Hot-word boosting API Usage example
=================================== ===================================
With the DeepSpeech 0.9 release, a new API feature was introduced that allows boosting the scorer probability of given words. It is exposed in all bindings (C, Python, JS, Java and .Net). With the 🐸STT 0.9 release, a new API feature was introduced that allows boosting the scorer probability of given words. It is exposed in all bindings (C, Python, JS, Java and .Net).
Currently, it provides three methods for the Model class: Currently, it provides three methods for the Model class:
@ -19,11 +19,11 @@ It is worth noting that boosting non-existent words in scorer (mostly proper nou
Adjusting the boosting value Adjusting the boosting value
---------------------------- ----------------------------
For hot-word boosting it is hard to determine the optimal value one might be searching for. Additionally, this is dependent on the input audio file. In practice, as reported by DeepSpeech users, the value should not be bigger than 20.0 for positive value boosting. Nevertheless, each use case is different and you might need to adjust values on your own. For hot-word boosting it is hard to determine the optimal value one might be searching for. Additionally, this is dependent on the input audio file. In practice, as reported by 🐸STT users, the value should not be bigger than 20.0 for positive value boosting. Nevertheless, each use case is different and you might need to adjust values on your own.
There is a user-contributed script available in the ``DeepSpeech-examples`` repository for adjusting boost values: There is a user-contributed script available in the ``STT-examples`` repository for adjusting boost values:
`https://github.com/mozilla/DeepSpeech-examples/tree/master/hotword_adjusting <https://github.com/mozilla/DeepSpeech-examples/tree/master/hotword_adjusting>`_. `https://github.com/coqui-ai/STT-examples/tree/master/hotword_adjusting <https://github.com/coqui-ai/STT-examples/tree/master/hotword_adjusting>`_.
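As an illustration, a minimal sketch of the three methods is shown below. It assumes Python bindings that expose them as ``addHotWord``, ``eraseHotWord`` and ``clearHotWords``, mirroring the 0.9 API; the word and boost value are only examples.

.. code-block:: python

   from stt import Model  # assumed import name for the Python bindings

   model = Model("model.tflite")               # placeholder acoustic model
   model.enableExternalScorer("kenlm.scorer")  # boosting is applied by the scorer

   model.addHotWord("firefox", 7.5)   # boost one word (value is just an example)
   model.eraseHotWord("firefox")      # remove the boost for that word
   model.clearHotWords()              # remove all boosts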
Positive value boosting Positive value boosting

View File

@ -4,7 +4,7 @@
# You can set these variables from the command line. # You can set these variables from the command line.
SPHINXOPTS = SPHINXOPTS =
SPHINXBUILD = sphinx-build SPHINXBUILD = sphinx-build
SPHINXPROJ = DeepSpeech SPHINXPROJ = "Coqui STT"
SOURCEDIR = . SOURCEDIR = .
BUILDDIR = .build BUILDDIR = .build

View File

@ -1,7 +1,7 @@
Parallel Optimization Parallel Optimization
===================== =====================
This is how we implement optimization of the DeepSpeech model across GPUs on a This is how we implement optimization of the 🐸STT model across GPUs on a
single host. Parallel optimization can take on various forms. For example single host. Parallel optimization can take on various forms. For example
one can use asynchronous updates of the model, synchronous updates of the model, one can use asynchronous updates of the model, synchronous updates of the model,
or some combination of the two. or some combination of the two.

View File

@ -3,11 +3,11 @@
External scorer scripts External scorer scripts
======================= =======================
DeepSpeech pre-trained models include an external scorer. This document explains how to reproduce our external scorer, as well as adapt the scripts to create your own. 🐸STT pre-trained models include an external scorer. This document explains how to reproduce our external scorer, as well as adapt the scripts to create your own.
The scorer is composed of two sub-components, a KenLM language model and a trie data structure containing all words in the vocabulary. In order to create the scorer package, first we must create a KenLM language model (using ``data/lm/generate_lm.py``), and then use ``generate_scorer_package`` to create the final package file including the trie data structure. The scorer is composed of two sub-components, a KenLM language model and a trie data structure containing all words in the vocabulary. In order to create the scorer package, first we must create a KenLM language model (using ``data/lm/generate_lm.py``), and then use ``generate_scorer_package`` to create the final package file including the trie data structure.
The ``generate_scorer_package`` binary is part of the native client package that is included with official releases. You can find the appropriate archive for your platform in the `GitHub release downloads <https://github.com/mozilla/DeepSpeech/releases/latest>`_. The native client package is named ``native_client.{arch}.{config}.{plat}.tar.xz``, where ``{arch}`` is the architecture the binary was built for, for example ``amd64`` or ``arm64``, ``config`` is the build configuration, which for building decoder packages does not matter, and ``{plat}`` is the platform the binary was built-for, for example ``linux`` or ``osx``. If you wanted to run the ``generate_scorer_package`` binary on a Linux desktop, you would download ``native_client.amd64.cpu.linux.tar.xz``. The ``generate_scorer_package`` binary is part of the native client package that is included with official releases. You can find the appropriate archive for your platform in the `GitHub release downloads <https://github.com/coqui-ai/STT/releases/latest>`_. The native client package is named ``native_client.{arch}.{config}.{plat}.tar.xz``, where ``{arch}`` is the architecture the binary was built for, for example ``amd64`` or ``arm64``, ``config`` is the build configuration, which for building decoder packages does not matter, and ``{plat}`` is the platform the binary was built-for, for example ``linux`` or ``osx``. If you wanted to run the ``generate_scorer_package`` binary on a Linux desktop, you would download ``native_client.amd64.cpu.linux.tar.xz``.
Reproducing our external scorer Reproducing our external scorer
------------------------------- -------------------------------
@ -26,7 +26,7 @@ Then use the ``generate_lm.py`` script to generate ``lm.binary`` and ``vocab-500
As input you can use a plain text (e.g. ``file.txt``) or gzipped (e.g. ``file.txt.gz``) text file with one sentence in each line. As input you can use a plain text (e.g. ``file.txt``) or gzipped (e.g. ``file.txt.gz``) text file with one sentence in each line.
If you are using a container created from ``Dockerfile.build``, you can use ``--kenlm_bins /DeepSpeech/native_client/kenlm/build/bin/``. If you are using a container created from ``Dockerfile.build``, you can use ``--kenlm_bins /STT/native_client/kenlm/build/bin/``.
Otherwise you have to build `KenLM <https://github.com/kpu/kenlm>`_ first and then pass the build directory to the script. Otherwise you have to build `KenLM <https://github.com/kpu/kenlm>`_ first and then pass the build directory to the script.
.. code-block:: bash .. code-block:: bash
@ -44,7 +44,7 @@ Afterwards you can use ``generate_scorer_package`` to generate the scorer packag
cd data/lm cd data/lm
# Download and extract appropriate native_client package: # Download and extract appropriate native_client package:
curl -LO http://github.com/mozilla/DeepSpeech/releases/... curl -LO http://github.com/coqui-ai/STT/releases/...
tar xvf native_client.*.tar.xz tar xvf native_client.*.tar.xz
./generate_scorer_package --alphabet ../alphabet.txt --lm lm.binary --vocab vocab-500000.txt \ ./generate_scorer_package --alphabet ../alphabet.txt --lm lm.binary --vocab vocab-500000.txt \
--package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284 --package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
@ -59,6 +59,6 @@ Building your own scorer can be useful if you're using models in a narrow usage
The LibriSpeech LM training text used by our scorer is around 4GB uncompressed, which should give an idea of the size of a corpus needed for a reasonable language model for general speech recognition. For more constrained use cases with smaller vocabularies, you don't need as much data, but you should still try to gather as much as you can. The LibriSpeech LM training text used by our scorer is around 4GB uncompressed, which should give an idea of the size of a corpus needed for a reasonable language model for general speech recognition. For more constrained use cases with smaller vocabularies, you don't need as much data, but you should still try to gather as much as you can.
With a text corpus in hand, you can then re-use ``generate_lm.py`` and ``generate_scorer_package`` to create your own scorer that is compatible with DeepSpeech clients and language bindings. Before building the language model, you must first familiarize yourself with the `KenLM toolkit <https://kheafield.com/code/kenlm/>`_. Most of the options exposed by the ``generate_lm.py`` script are simply forwarded to KenLM options of the same name, so you must read the KenLM documentation in order to fully understand their behavior. With a text corpus in hand, you can then re-use ``generate_lm.py`` and ``generate_scorer_package`` to create your own scorer that is compatible with 🐸STT clients and language bindings. Before building the language model, you must first familiarize yourself with the `KenLM toolkit <https://kheafield.com/code/kenlm/>`_. Most of the options exposed by the ``generate_lm.py`` script are simply forwarded to KenLM options of the same name, so you must read the KenLM documentation in order to fully understand their behavior.
After using ``generate_lm.py`` to create a KenLM language model binary file, you can use ``generate_scorer_package`` to create a scorer package as described in the previous section. Note that we have a :github:`lm_optimizer.py script <lm_optimizer.py>` which can be used to find good default values for alpha and beta. To use it, you must first generate a package with any value set for default alpha and beta flags. For this step, it doesn't matter what values you use, as they'll be overridden by ``lm_optimizer.py`` later. Then, use ``lm_optimizer.py`` with this scorer file to find good alpha and beta values. Finally, use ``generate_scorer_package`` again, this time with the new values. After using ``generate_lm.py`` to create a KenLM language model binary file, you can use ``generate_scorer_package`` to create a scorer package as described in the previous section. Note that we have a :github:`lm_optimizer.py script <lm_optimizer.py>` which can be used to find good default values for alpha and beta. To use it, you must first generate a package with any value set for default alpha and beta flags. For this step, it doesn't matter what values you use, as they'll be overridden by ``lm_optimizer.py`` later. Then, use ``lm_optimizer.py`` with this scorer file to find good alpha and beta values. Finally, use ``generate_scorer_package`` again, this time with the new values.

View File

@ -15,11 +15,11 @@ Prerequisites for training a model
Getting the training code Getting the training code
^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^
Clone the latest released stable branch from Github (e.g. 0.9.3, check `here <https://github.com/mozilla/DeepSpeech/releases>`_): Clone the latest released stable branch from Github (e.g. 0.9.3, check `here <https://github.com/coqui-ai/STT/releases>`_):
.. code-block:: bash .. code-block:: bash
git clone --branch v0.9.3 https://github.com/mozilla/DeepSpeech git clone --branch v0.9.3 https://github.com/coqui-ai/STT
If you plan on committing code or you want to report bugs, please use the master branch. If you plan on committing code or you want to report bugs, please use the master branch.
@ -28,31 +28,31 @@ Creating a virtual environment
Throughout the documentation we assume you are using **virtualenv** to manage your Python environments. This setup is the one used and recommended by the project authors and is the easiest way to make sure you won't run into environment issues. If you're using **Anaconda, Miniconda or Mamba**, first read the instructions at :ref:`training-with-conda` and then continue from the installation step below. Throughout the documentation we assume you are using **virtualenv** to manage your Python environments. This setup is the one used and recommended by the project authors and is the easiest way to make sure you won't run into environment issues. If you're using **Anaconda, Miniconda or Mamba**, first read the instructions at :ref:`training-with-conda` and then continue from the installation step below.
In creating a virtual environment you will create a directory containing a ``python3`` binary and everything needed to run deepspeech. You can use whatever directory you want. For the purpose of the documentation, we will rely on ``$HOME/tmp/deepspeech-train-venv``. You can create it using this command: In creating a virtual environment you will create a directory containing a ``python3`` binary and everything needed to run 🐸STT. You can use whatever directory you want. For the purpose of the documentation, we will rely on ``$HOME/tmp/coqui-stt-train-venv``. You can create it using this command:
.. code-block:: .. code-block::
$ python3 -m venv $HOME/tmp/deepspeech-train-venv/ $ python3 -m venv $HOME/tmp/coqui-stt-train-venv/
Once this command completes successfully, the environment will be ready to be activated. Once this command completes successfully, the environment will be ready to be activated.
Activating the environment Activating the environment
^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^
Each time you need to work with DeepSpeech, you have to *activate* this virtual environment. This is done with this simple command: Each time you need to work with 🐸STT, you have to *activate* this virtual environment. This is done with this simple command:
.. code-block:: .. code-block::
$ source $HOME/tmp/deepspeech-train-venv/bin/activate $ source $HOME/tmp/coqui-stt-train-venv/bin/activate
Installing DeepSpeech Training Code and its dependencies Installing Coqui STT Training Code and its dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Install the required dependencies using ``pip3``\ : Install the required dependencies using ``pip3``\ :
.. code-block:: bash .. code-block:: bash
cd DeepSpeech cd STT
pip3 install --upgrade pip==20.2.2 wheel==0.34.2 setuptools==49.6.0 pip3 install --upgrade pip==20.2.2 wheel==0.34.2 setuptools==49.6.0
pip3 install --upgrade -e . pip3 install --upgrade -e .
@ -95,11 +95,11 @@ This should ensure that you'll re-use the upstream Python 3 TensorFlow GPU-enabl
make Dockerfile.train make Dockerfile.train
If you want to specify a different DeepSpeech repository / branch, you can pass ``DEEPSPEECH_REPO`` or ``DEEPSPEECH_SHA`` parameters: If you want to specify a different 🐸STT repository / branch, you can pass ``STT_REPO`` or ``STT_SHA`` parameters:
.. code-block:: bash .. code-block:: bash
make Dockerfile.train DEEPSPEECH_REPO=git://your/fork DEEPSPEECH_SHA=origin/your-branch make Dockerfile.train STT_REPO=git://your/fork STT_SHA=origin/your-branch
Common Voice training data Common Voice training data
^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -112,7 +112,7 @@ After extraction of such a data set, you'll find the following contents:
* the ``*.tsv`` files output by CorporaCreator for the downloaded language * the ``*.tsv`` files output by CorporaCreator for the downloaded language
* the mp3 audio files they reference in a ``clips`` sub-directory. * the mp3 audio files they reference in a ``clips`` sub-directory.
To bring this data into a form that DeepSpeech understands, you have to run the CommonVoice v2.0 importer (\ ``bin/import_cv2.py``\ ): To bring this data into a form that 🐸STT understands, you have to run the CommonVoice v2.0 importer (\ ``bin/import_cv2.py``\ ):
.. code-block:: bash .. code-block:: bash
@ -134,22 +134,22 @@ The CSV files comprise of the following fields:
* ``wav_filesize`` - sample size given in bytes, used for sorting the data before training. Expects an integer. * ``wav_filesize`` - sample size given in bytes, used for sorting the data before training. Expects an integer.
* ``transcript`` - transcription target for the sample. * ``transcript`` - transcription target for the sample.
To use Common Voice data during training, validation and testing, you pass (comma separated combinations of) their filenames into ``--train_files``\ , ``--dev_files``\ , ``--test_files`` parameters of ``DeepSpeech.py``. To use Common Voice data during training, validation and testing, you pass (comma separated combinations of) their filenames into ``--train_files``\ , ``--dev_files``\ , ``--test_files`` parameters of ``train.py``.
If, for example, Common Voice language ``en`` was extracted to ``../data/CV/en/``\ , ``DeepSpeech.py`` could be called like this: If, for example, Common Voice language ``en`` was extracted to ``../data/CV/en/``\ , ``train.py`` could be called like this:
.. code-block:: bash .. code-block:: bash
python3 DeepSpeech.py --train_files ../data/CV/en/clips/train.csv --dev_files ../data/CV/en/clips/dev.csv --test_files ../data/CV/en/clips/test.csv python3 train.py --train_files ../data/CV/en/clips/train.csv --dev_files ../data/CV/en/clips/dev.csv --test_files ../data/CV/en/clips/test.csv
Training a model Training a model
^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^
The central (Python) script is ``DeepSpeech.py`` in the project's root directory. For its list of command line options, you can call: The central (Python) script is ``train.py`` in the project's root directory. For its list of command line options, you can call:
.. code-block:: bash .. code-block:: bash
python3 DeepSpeech.py --helpfull python3 train.py --helpfull
To get the output of this in a slightly better-formatted way, you can also look at the flag definitions in :ref:`training-flags`. To get the output of this in a slightly better-formatted way, you can also look at the flag definitions in :ref:`training-flags`.
@ -157,7 +157,7 @@ For executing pre-configured training scenarios, there is a collection of conven
**If you experience GPU OOM errors while training, try reducing the batch size with the ``--train_batch_size``\ , ``--dev_batch_size`` and ``--test_batch_size`` parameters.** **If you experience GPU OOM errors while training, try reducing the batch size with the ``--train_batch_size``\ , ``--dev_batch_size`` and ``--test_batch_size`` parameters.**
As a simple first example you can open a terminal, change to the directory of the DeepSpeech checkout, activate the virtualenv created above, and run: As a simple first example you can open a terminal, change to the directory of the 🐸STT checkout, activate the virtualenv created above, and run:
.. code-block:: bash .. code-block:: bash
@ -165,9 +165,9 @@ As a simple first example you can open a terminal, change to the directory of th
This script will train on a small sample dataset composed of just a single audio file, the sample file for the `TIMIT Acoustic-Phonetic Continuous Speech Corpus <https://catalog.ldc.upenn.edu/LDC93S1>`_, which can be overfitted on a GPU in a few minutes for demonstration purposes. From here, you can alter any variables with regards to what dataset is used, how many training iterations are run and the default values of the network parameters. This script will train on a small sample dataset composed of just a single audio file, the sample file for the `TIMIT Acoustic-Phonetic Continuous Speech Corpus <https://catalog.ldc.upenn.edu/LDC93S1>`_, which can be overfitted on a GPU in a few minutes for demonstration purposes. From here, you can alter any variables with regards to what dataset is used, how many training iterations are run and the default values of the network parameters.
Also feel free to pass additional (or overriding) ``DeepSpeech.py`` parameters to these scripts. Then, just run the script to train the modified network. Also feel free to pass additional (or overriding) ``train.py`` parameters to these scripts. Then, just run the script to train the modified network.
Each dataset has a corresponding importer script in ``bin/`` that can be used to download (if it's freely available) and preprocess the dataset. See ``bin/import_librivox.py`` for an example of how to import and preprocess a large dataset for training with DeepSpeech. Each dataset has a corresponding importer script in ``bin/`` that can be used to download (if it's freely available) and preprocess the dataset. See ``bin/import_librivox.py`` for an example of how to import and preprocess a large dataset for training with 🐸STT.
Some importers might require additional code to properly handle your locale-specific requirements. Such handling is dealt with via the ``--validate_label_locale`` flag, which allows you to source an out-of-tree Python script that defines a ``validate_label`` function. Please refer to ``util/importers.py`` for an implementation example of that function. Some importers might require additional code to properly handle your locale-specific requirements. Such handling is dealt with via the ``--validate_label_locale`` flag, which allows you to source an out-of-tree Python script that defines a ``validate_label`` function. Please refer to ``util/importers.py`` for an implementation example of that function.
If you don't provide this argument, the default ``validate_label`` function will be used. This one is only intended for the English language, so you might have consistency issues in your data for other languages. If you don't provide this argument, the default ``validate_label`` function will be used. This one is only intended for the English language, so you might have consistency issues in your data for other languages.
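A minimal sketch of such an out-of-tree script is shown below. The file name and the character filtering are made up for illustration; the only assumed contract, following ``util/importers.py``, is that ``validate_label`` returns the cleaned transcript, or ``None`` when the sample should be dropped.

.. code-block:: python

   # validate_label_fr.py -- hypothetical out-of-tree label validation module,
   # sourced by the importer via the --validate_label_locale flag.
   import re

   # Example character set for a French-like locale; adjust to your alphabet.
   DISALLOWED = re.compile(r"[^a-zàâçéèêëîïôöùûü' ]")

   def validate_label(label):
       label = DISALLOWED.sub("", label.lower().strip())
       return label if label else None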
@ -191,10 +191,10 @@ Automatic Mixed Precision (AMP) training on GPU for TensorFlow has been recently
Mixed precision training makes use of both FP32 and FP16 precisions where appropriate. FP16 operations can leverage the Tensor cores on NVIDIA GPUs (Volta, Turing or newer architectures) for improved throughput. Mixed precision training also often allows larger batch sizes. Automatic mixed precision training can be enabled by including the flag `--automatic_mixed_precision` at training time: Mixed precision training makes use of both FP32 and FP16 precisions where appropriate. FP16 operations can leverage the Tensor cores on NVIDIA GPUs (Volta, Turing or newer architectures) for improved throughput. Mixed precision training also often allows larger batch sizes. Automatic mixed precision training can be enabled by including the flag `--automatic_mixed_precision` at training time:
``` ```
python3 DeepSpeech.py --train_files ./train.csv --dev_files ./dev.csv --test_files ./test.csv --automatic_mixed_precision python3 train.py --train_files ./train.csv --dev_files ./dev.csv --test_files ./test.csv --automatic_mixed_precision
``` ```
On a Volta generation V100 GPU, automatic mixed precision speeds up DeepSpeech training and evaluation by ~30%-40%. On a Volta generation V100 GPU, automatic mixed precision speeds up 🐸STT training and evaluation by ~30%-40%.
Checkpointing Checkpointing
^^^^^^^^^^^^^ ^^^^^^^^^^^^^
@ -212,7 +212,7 @@ Refer to the :ref:`usage instructions <usage-docs>` for information on running a
Exporting a model for TFLite Exporting a model for TFLite
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you want to experiment with the TF Lite engine, you need to export a model that is compatible with it, then use the ``--export_tflite`` flags. If you already have a trained model, you can re-export it for TFLite by running ``DeepSpeech.py`` again and specifying the same ``checkpoint_dir`` that you used for training, as well as passing ``--export_tflite --export_dir /model/export/destination``. If you changed the alphabet you also need to add the ``--alphabet_config_path my-new-language-alphabet.txt`` flag. If you want to experiment with the TF Lite engine, you need to export a model that is compatible with it, then use the ``--export_tflite`` flags. If you already have a trained model, you can re-export it for TFLite by running ``train.py`` again and specifying the same ``checkpoint_dir`` that you used for training, as well as passing ``--export_tflite --export_dir /model/export/destination``. If you changed the alphabet you also need to add the ``--alphabet_config_path my-new-language-alphabet.txt`` flag.
Making a mmap-able model for inference Making a mmap-able model for inference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -236,9 +236,9 @@ Upon sucessfull run, it should report about conversion of a non-zero number of n
Continuing training from a release model Continuing training from a release model
---------------------------------------- ----------------------------------------
There are currently two supported approaches to make use of a pre-trained DeepSpeech model: fine-tuning or transfer-learning. Choosing which one to use is a simple decision, and it depends on your target dataset. Does your data use the same alphabet as the release model? If "Yes": fine-tune. If "No" use transfer-learning. There are currently two supported approaches to make use of a pre-trained 🐸STT model: fine-tuning or transfer-learning. Choosing which one to use is a simple decision, and it depends on your target dataset. Does your data use the same alphabet as the release model? If "Yes": fine-tune. If "No" use transfer-learning.
If your own data uses the *exact* same alphabet as the English release model (i.e. `a-z` plus `'`) then the release model's output layer will match your data, and you can just fine-tune the existing parameters. However, if you want to use a new alphabet (e.g. Cyrillic `а`, `б`, `д`), the output layer of a release DeepSpeech model will *not* match your data. In this case, you should use transfer-learning (i.e. remove the trained model's output layer, and reinitialize a new output layer that matches your target character set). If your own data uses the *exact* same alphabet as the English release model (i.e. `a-z` plus `'`) then the release model's output layer will match your data, and you can just fine-tune the existing parameters. However, if you want to use a new alphabet (e.g. Cyrillic `а`, `б`, `д`), the output layer of a release 🐸STT model will *not* match your data. In this case, you should use transfer-learning (i.e. remove the trained model's output layer, and reinitialize a new output layer that matches your target character set).
N.B. - If you have access to a pre-trained model which uses UTF-8 bytes at the output layer you can always fine-tune, because any alphabet should be encodable as UTF-8. N.B. - If you have access to a pre-trained model which uses UTF-8 bytes at the output layer you can always fine-tune, because any alphabet should be encodable as UTF-8.
@ -247,14 +247,14 @@ N.B. - If you have access to a pre-trained model which uses UTF-8 bytes at the o
Fine-Tuning (same alphabet) Fine-Tuning (same alphabet)
^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you'd like to use one of the pre-trained models to bootstrap your training process (fine tuning), you can do so by using the ``--checkpoint_dir`` flag in ``DeepSpeech.py``. Specify the path where you downloaded the checkpoint from the release, and training will resume from the pre-trained model. If you'd like to use one of the pre-trained models to bootstrap your training process (fine tuning), you can do so by using the ``--checkpoint_dir`` flag in ``train.py``. Specify the path where you downloaded the checkpoint from the release, and training will resume from the pre-trained model.
For example, if you want to fine tune the entire graph using your own data in ``my-train.csv``\ , ``my-dev.csv`` and ``my-test.csv``\ , for three epochs, you can run something like the following, tuning the hyperparameters as needed: For example, if you want to fine tune the entire graph using your own data in ``my-train.csv``\ , ``my-dev.csv`` and ``my-test.csv``\ , for three epochs, you can run something like the following, tuning the hyperparameters as needed:
.. code-block:: bash .. code-block:: bash
mkdir fine_tuning_checkpoints mkdir fine_tuning_checkpoints
python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir path/to/checkpoint/folder --epochs 3 --train_files my-train.csv --dev_files my-dev.csv --test_files my_dev.csv --learning_rate 0.0001 python3 train.py --n_hidden 2048 --checkpoint_dir path/to/checkpoint/folder --epochs 3 --train_files my-train.csv --dev_files my-dev.csv --test_files my_dev.csv --learning_rate 0.0001
Notes about the release checkpoints: the released models were trained with ``--n_hidden 2048``\ , so you need to use that same value when initializing from the release models. Since v0.6.0, the release models are also trained with ``--train_cudnn``\ , so you'll need to specify that as well. If you don't have a CUDA-compatible GPU, then you can work around it by using the ``--load_cudnn`` flag. Use ``--helpfull`` to get more information on how the flags work. Notes about the release checkpoints: the released models were trained with ``--n_hidden 2048``\ , so you need to use that same value when initializing from the release models. Since v0.6.0, the release models are also trained with ``--train_cudnn``\ , so you'll need to specify that as well. If you don't have a CUDA-compatible GPU, then you can work around it by using the ``--load_cudnn`` flag. Use ``--helpfull`` to get more information on how the flags work.
@ -270,17 +270,17 @@ If you try to load a release model without following these steps, you'll get an
Transfer-Learning (new alphabet) Transfer-Learning (new alphabet)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you want to continue training an alphabet-based DeepSpeech model (i.e. not a UTF-8 model) on a new language, or if you just want to add new characters to your custom alphabet, you will probably want to use transfer-learning instead of fine-tuning. If you're starting with a pre-trained UTF-8 model -- even if your data comes from a different language or uses a different alphabet -- the model will be able to predict your new transcripts, and you should use fine-tuning instead. If you want to continue training an alphabet-based 🐸STT model (i.e. not a UTF-8 model) on a new language, or if you just want to add new characters to your custom alphabet, you will probably want to use transfer-learning instead of fine-tuning. If you're starting with a pre-trained UTF-8 model -- even if your data comes from a different language or uses a different alphabet -- the model will be able to predict your new transcripts, and you should use fine-tuning instead.
In a nutshell, DeepSpeech's transfer-learning allows you to remove certain layers from a pre-trained model, initialize new layers for your target data, stitch together the old and new layers, and update all layers via gradient descent. You will remove the pre-trained output layer (and optionally more layers) and reinitialize parameters to fit your target alphabet. The simplest case of transfer-learning is when you remove just the output layer. In a nutshell, 🐸STT's transfer-learning allows you to remove certain layers from a pre-trained model, initialize new layers for your target data, stitch together the old and new layers, and update all layers via gradient descent. You will remove the pre-trained output layer (and optionally more layers) and reinitialize parameters to fit your target alphabet. The simplest case of transfer-learning is when you remove just the output layer.
In DeepSpeech's implementation of transfer-learning, all removed layers will be contiguous, starting from the output layer. The key flag you will want to experiment with is ``--drop_source_layers``. This flag accepts an integer from ``1`` to ``5`` and allows you to specify how many layers you want to remove from the pre-trained model. For example, if you supply ``--drop_source_layers 3``, you will drop the last three layers of the pre-trained model: the output layer, penultimate layer, and LSTM layer. All dropped layers will be reinitialized, and (crucially) the output layer will be defined to match your supplied target alphabet. In 🐸STT's implementation of transfer-learning, all removed layers will be contiguous, starting from the output layer. The key flag you will want to experiment with is ``--drop_source_layers``. This flag accepts an integer from ``1`` to ``5`` and allows you to specify how many layers you want to remove from the pre-trained model. For example, if you supply ``--drop_source_layers 3``, you will drop the last three layers of the pre-trained model: the output layer, penultimate layer, and LSTM layer. All dropped layers will be reinitialized, and (crucially) the output layer will be defined to match your supplied target alphabet.
You need to specify the location of the pre-trained model with ``--load_checkpoint_dir`` and define where your new model checkpoints will be saved with ``--save_checkpoint_dir``. You need to specify how many layers to remove (aka "drop") from the pre-trained model: ``--drop_source_layers``. You also need to supply your new alphabet file using the standard ``--alphabet_config_path`` (remember, using a new alphabet is the whole reason you want to use transfer-learning). You need to specify the location of the pre-trained model with ``--load_checkpoint_dir`` and define where your new model checkpoints will be saved with ``--save_checkpoint_dir``. You need to specify how many layers to remove (aka "drop") from the pre-trained model: ``--drop_source_layers``. You also need to supply your new alphabet file using the standard ``--alphabet_config_path`` (remember, using a new alphabet is the whole reason you want to use transfer-learning).
.. code-block:: bash .. code-block:: bash
python3 DeepSpeech.py \ python3 train.py \
--drop_source_layers 1 \ --drop_source_layers 1 \
--alphabet_config_path my-new-language-alphabet.txt \ --alphabet_config_path my-new-language-alphabet.txt \
--save_checkpoint_dir path/to/output-checkpoint/folder \ --save_checkpoint_dir path/to/output-checkpoint/folder \
@ -292,7 +292,7 @@ You need to specify the location of the pre-trained model with ``--load_checkpoi
UTF-8 mode UTF-8 mode
^^^^^^^^^^ ^^^^^^^^^^
DeepSpeech includes a UTF-8 operating mode which can be useful to model languages with very large alphabets, such as Chinese Mandarin. For details on how it works and how to use it, see :ref:`decoder-docs`. 🐸STT includes a UTF-8 operating mode which can be useful to model languages with very large alphabets, such as Chinese Mandarin. For details on how it works and how to use it, see :ref:`decoder-docs`.
.. _training-data-augmentation: .. _training-data-augmentation:
@ -314,7 +314,7 @@ For example, for the ``overlay`` augmentation:
.. code-block:: .. code-block::
python3 DeepSpeech.py --augment overlay[p=0.1,source=/path/to/audio.sdb,snr=20.0] ... python3 train.py --augment overlay[p=0.1,source=/path/to/audio.sdb,snr=20.0] ...
In the documentation below, whenever a value is specified as ``<float-range>`` or ``<int-range>``, it supports one of the following formats: In the documentation below, whenever a value is specified as ``<float-range>`` or ``<int-range>``, it supports one of the following formats:
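As a hedged illustration of the range notation (assuming the ``<value>~<radius>`` randomization form described in the full flag reference, which is not reproduced in this excerpt), an overlay augmentation whose signal-to-noise ratio varies around 20 dB might be written as:

.. code-block:: bash

   # snr=20.0~5.0 is assumed to mean: sample a value within 20.0 +/- 5.0
   python3 train.py --augment overlay[p=0.1,source=/path/to/audio.sdb,snr=20.0~5.0] ...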
@ -485,7 +485,7 @@ Example training with all augmentations:
.. code-block:: bash .. code-block:: bash
python -u DeepSpeech.py \ python -u train.py \
--train_files "train.sdb" \ --train_files "train.sdb" \
--feature_cache ./feature.cache \ --feature_cache ./feature.cache \
--cache_for_epochs 10 \ --cache_for_epochs 10 \
@ -541,5 +541,5 @@ To prevent common problems, make sure you **always use a separate environment wh
.. code-block:: bash .. code-block:: bash
(base) $ conda create -n deepspeech python=3.7 (base) $ conda create -n coqui-stt python=3.7
(base) $ conda activate deepspeech (base) $ conda activate coqui-stt
View File
@ -3,7 +3,7 @@
Using a Pre-trained Model Using a Pre-trained Model
========================= =========================
Inference using a DeepSpeech pre-trained model can be done with a client/language binding package. We have four clients/language bindings in this repository, listed below, and also a few community-maintained clients/language bindings in other repositories, listed `further down in this README <#third-party-bindings>`_. Inference using a 🐸STT pre-trained model can be done with a client/language binding package. We have four clients/language bindings in this repository, listed below, and also a few community-maintained clients/language bindings in other repositories, listed `further down in this README <#third-party-bindings>`_.
* :ref:`The C API <c-usage>`. * :ref:`The C API <c-usage>`.
* :ref:`The Python package/language binding <py-usage>` * :ref:`The Python package/language binding <py-usage>`
@ -13,7 +13,7 @@ Inference using a DeepSpeech pre-trained model can be done with a client/languag
.. _runtime-deps: .. _runtime-deps:
Running ``deepspeech`` might require some runtime dependencies to already be installed on your system (see below): Running ``stt`` might require some runtime dependencies to already be installed on your system (see below):
* ``sox`` - The Python and Node.JS clients use SoX to resample files to 16kHz. * ``sox`` - The Python and Node.JS clients use SoX to resample files to 16kHz.
* ``libgomp1`` - libsox (statically linked into the clients) depends on OpenMP. Some people have had to install this manually. * ``libgomp1`` - libsox (statically linked into the clients) depends on OpenMP. Some people have had to install this manually.
@ -33,23 +33,23 @@ The GPU capable builds (Python, NodeJS, C++, etc) depend on CUDA 10.1 and CuDNN
Getting the pre-trained model Getting the pre-trained model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you want to use the pre-trained English model for performing speech-to-text, you can download it (along with other important inference material) from the DeepSpeech `releases page <https://github.com/mozilla/DeepSpeech/releases>`_. Alternatively, you can run the following command to download the model files in your current directory: If you want to use the pre-trained English model for performing speech-to-text, you can download it (along with other important inference material) from the 🐸STT `releases page <https://github.com/coqui-ai/STT/releases>`_. Alternatively, you can run the following command to download the model files in your current directory:
.. code-block:: bash .. code-block:: bash
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm wget https://github.com/coqui-ai/STT/releases/download/v0.9.3/coqui-stt-0.9.3-models.pbmm
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer wget https://github.com/coqui-ai/STT/releases/download/v0.9.3/coqui-stt-0.9.3-models.scorer
There are several pre-trained model files available in official releases. Files ending in ``.pbmm`` are compatible with clients and language bindings built against the standard TensorFlow runtime. Usually these packages are simply called ``deepspeech``. These files are also compatible with CUDA enabled clients and language bindings. These packages are usually called ``deepspeech-gpu``. Files ending in ``.tflite`` are compatible with clients and language bindings built against the `TensorFlow Lite runtime <https://www.tensorflow.org/lite/>`_. These models are optimized for size and performance in low power devices. On desktop platforms, the compatible packages are called ``deepspeech-tflite``. On Android and Raspberry Pi, we only publish TensorFlow Lite enabled packages, and they are simply called ``deepspeech``. You can see a full list of supported platforms and which TensorFlow runtime is supported at :ref:`supported-platforms-inference`. There are several pre-trained model files available in official releases. Files ending in ``.pbmm`` are compatible with clients and language bindings built against the standard TensorFlow runtime. Usually these packages are simply called ``stt``. These files are also compatible with CUDA enabled clients and language bindings. These packages are usually called ``stt-gpu``. Files ending in ``.tflite`` are compatible with clients and language bindings built against the `TensorFlow Lite runtime <https://www.tensorflow.org/lite/>`_. These models are optimized for size and performance in low power devices. On desktop platforms, the compatible packages are called ``stt-tflite``. On Android and Raspberry Pi, we only publish TensorFlow Lite enabled packages, and they are simply called ``stt``. You can see a full list of supported platforms and which TensorFlow runtime is supported at :ref:`supported-platforms-inference`.
+--------------------+---------------------+---------------------+ +--------------------+---------------------+---------------------+
| Package/Model type | .pbmm | .tflite | | Package/Model type | .pbmm | .tflite |
+====================+=====================+=====================+ +====================+=====================+=====================+
| deepspeech | Depends on platform | Depends on platform | | stt | Depends on platform | Depends on platform |
+--------------------+---------------------+---------------------+ +--------------------+---------------------+---------------------+
| deepspeech-gpu | ✅ | ❌ | | stt-gpu | ✅ | ❌ |
+--------------------+---------------------+---------------------+ +--------------------+---------------------+---------------------+
| deepspeech-tflite | ❌ | ✅ | | stt-tflite | ❌ | ✅ |
+--------------------+---------------------+---------------------+ +--------------------+---------------------+---------------------+
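As a rough sketch of how the package choice and the model file fit together (package names as in the table above; actual availability for your platform may differ), you would pair them like this:

.. code-block:: bash

   # Standard TensorFlow runtime packages consume the .pbmm model
   pip3 install stt
   # TensorFlow Lite packages consume the .tflite model instead
   pip3 install stt-tflite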
Finally, the pre-trained model files also include files ending in ``.scorer``. These are external scorers (language models) that are used at inference time in conjunction with an acoustic model (``.pbmm`` or ``.tflite`` file) to produce transcriptions. We also provide further documentation on :ref:`the decoding process <decoder-docs>` and :ref:`how scorers are generated <scorer-scripts>`. Finally, the pre-trained model files also include files ending in ``.scorer``. These are external scorers (language models) that are used at inference time in conjunction with an acoustic model (``.pbmm`` or ``.tflite`` file) to produce transcriptions. We also provide further documentation on :ref:`the decoding process <decoder-docs>` and :ref:`how scorers are generated <scorer-scripts>`.
@ -61,82 +61,82 @@ The release notes include detailed information on how the released models were t
The process for training an acoustic model is described in :ref:`training-docs`. In particular, fine tuning a release model using your own data can be a good way to leverage relatively smaller amounts of data that would not be sufficient for training a new model from scratch. See the :ref:`fine tuning and transfer learning sections <training-fine-tuning>` for more information. :ref:`Data augmentation <training-data-augmentation>` can also be a good way to increase the value of smaller training sets. The process for training an acoustic model is described in :ref:`training-docs`. In particular, fine tuning a release model using your own data can be a good way to leverage relatively smaller amounts of data that would not be sufficient for training a new model from scratch. See the :ref:`fine tuning and transfer learning sections <training-fine-tuning>` for more information. :ref:`Data augmentation <training-data-augmentation>` can also be a good way to increase the value of smaller training sets.
Creating your own external scorer from text data is another way that you can adapt the model to your specific needs. The process and tools used to generate an external scorer package are described in :ref:`scorer-scripts` and an overview of how the external scorer is used by DeepSpeech to perform inference is available in :ref:`decoder-docs`. Generating a smaller scorer from a single-purpose text dataset is a quick process and can bring significant accuracy improvements, especially for more constrained, limited-vocabulary applications. Creating your own external scorer from text data is another way that you can adapt the model to your specific needs. The process and tools used to generate an external scorer package are described in :ref:`scorer-scripts` and an overview of how the external scorer is used by 🐸STT to perform inference is available in :ref:`decoder-docs`. Generating a smaller scorer from a single-purpose text dataset is a quick process and can bring significant accuracy improvements, especially for more constrained, limited-vocabulary applications.
Model compatibility Model compatibility
^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
DeepSpeech models are versioned to keep you from trying to use an incompatible graph with a newer client after a breaking change was made to the code. If you get an error saying your model file version is too old for the client, you should either upgrade to a newer model release, re-export your model from the checkpoint using a newer version of the code, or downgrade your client if you need to use the old model and can't re-export it. 🐸STT models are versioned to keep you from trying to use an incompatible graph with a newer client after a breaking change was made to the code. If you get an error saying your model file version is too old for the client, you should either upgrade to a newer model release, re-export your model from the checkpoint using a newer version of the code, or downgrade your client if you need to use the old model and can't re-export it.
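For example, re-exporting an older checkpoint with a newer version of the training code might look roughly like the following (the ``--export_dir`` flag and paths are illustrative; check ``--helpfull`` for the exact export options in your version):

.. code-block:: bash

   python3 train.py \
       --checkpoint_dir path/to/old/checkpoint/folder \
       --export_dir path/to/exported/model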
.. _py-usage: .. _py-usage:
Using the Python package Using the Python package
^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^
Pre-built binaries which can be used for performing inference with a trained model can be installed with ``pip3``. You can then use the ``deepspeech`` binary to do speech-to-text on an audio file: Pre-built binaries which can be used for performing inference with a trained model can be installed with ``pip3``. You can then use the ``stt`` binary to do speech-to-text on an audio file:
For the Python bindings, it is highly recommended that you perform the installation within a Python 3.5 or later virtual environment. You can find more information about those in `this documentation <http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_. For the Python bindings, it is highly recommended that you perform the installation within a Python 3.5 or later virtual environment. You can find more information about those in `this documentation <http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_.
We will continue under the assumption that you already have your system properly set up to create new virtual environments. We will continue under the assumption that you already have your system properly set up to create new virtual environments.
Create a DeepSpeech virtual environment Create a Coqui STT virtual environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Creating a virtual environment will produce a directory containing a ``python3`` binary and everything needed to run deepspeech. You can use whatever directory you want. For the purpose of the documentation, we will rely on ``$HOME/tmp/deepspeech-venv``. You can create it using this command: Creating a virtual environment will produce a directory containing a ``python3`` binary and everything needed to run 🐸STT. You can use whatever directory you want. For the purpose of the documentation, we will rely on ``$HOME/tmp/coqui-stt-venv``. You can create it using this command:
.. code-block:: .. code-block::
$ virtualenv -p python3 $HOME/tmp/deepspeech-venv/ $ virtualenv -p python3 $HOME/tmp/coqui-stt-venv/
Once this command completes successfully, the environment will be ready to be activated. Once this command completes successfully, the environment will be ready to be activated.
Activating the environment Activating the environment
~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~
Each time you need to work with DeepSpeech, you have to *activate* this virtual environment. This is done with this simple command: Each time you need to work with 🐸STT, you have to *activate* this virtual environment. This is done with this simple command:
.. code-block:: .. code-block::
$ source $HOME/tmp/deepspeech-venv/bin/activate $ source $HOME/tmp/coqui-stt-venv/bin/activate
Installing DeepSpeech Python bindings Installing Coqui STT Python bindings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once your environment has been set up and loaded, you can use ``pip3`` to manage packages locally. On a fresh setup of the ``virtualenv``\ , you will have to install the DeepSpeech wheel. You can check if ``deepspeech`` is already installed with ``pip3 list``. Once your environment has been set up and loaded, you can use ``pip3`` to manage packages locally. On a fresh setup of the ``virtualenv``\ , you will have to install the 🐸STT wheel. You can check if ``stt`` is already installed with ``pip3 list``.
To perform the installation, just use ``pip3`` as such: To perform the installation, just use ``pip3`` as such:
.. code-block:: .. code-block::
$ pip3 install deepspeech $ pip3 install stt
If ``deepspeech`` is already installed, you can update it as such: If ``stt`` is already installed, you can update it as such:
.. code-block:: .. code-block::
$ pip3 install --upgrade deepspeech $ pip3 install --upgrade stt
Alternatively, if you have a supported NVIDIA GPU on Linux, you can install the GPU specific package as follows: Alternatively, if you have a supported NVIDIA GPU on Linux, you can install the GPU specific package as follows:
.. code-block:: .. code-block::
$ pip3 install deepspeech-gpu $ pip3 install stt-gpu
See the `release notes <https://github.com/mozilla/DeepSpeech/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_. See the `release notes <https://github.com/coqui-ai/STT/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.
You can update ``deepspeech-gpu`` as follows: You can update ``stt-gpu`` as follows:
.. code-block:: .. code-block::
$ pip3 install --upgrade deepspeech-gpu $ pip3 install --upgrade stt-gpu
In both cases, ``pip3`` should take care of installing all the required dependencies. After installation has finished, you should be able to call ``deepspeech`` from the command-line. In both cases, ``pip3`` should take care of installing all the required dependencies. After installation has finished, you should be able to call ``stt`` from the command-line.
Note: the following command assumes you `downloaded the pre-trained model <#getting-the-pre-trained-model>`_. Note: the following command assumes you `downloaded the pre-trained model <#getting-the-pre-trained-model>`_.
.. code-block:: bash .. code-block:: bash
deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio my_audio_file.wav stt --model stt-0.9.3-models.pbmm --scorer stt-0.9.3-models.scorer --audio my_audio_file.wav
The ``--scorer`` argument is optional and represents an external language model to be used when transcribing the audio. The ``--scorer`` argument is optional and represents an external language model to be used when transcribing the audio.
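For example, to transcribe with the acoustic model alone, simply omit the ``--scorer`` flag:

.. code-block:: bash

   stt --model stt-0.9.3-models.pbmm --audio my_audio_file.wav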
@ -151,7 +151,9 @@ You can download the JS bindings using ``npm``\ :
.. code-block:: bash .. code-block:: bash
npm install deepspeech npm install stt
Special thanks to `Huan - Google Developers Expert in Machine Learning (ML GDE) <https://github.com/huan>`_ for providing the STT project name on npmjs.org.
Please note that as of now, we support: Please note that as of now, we support:
- Node.JS versions 4 to 13. - Node.JS versions 4 to 13.
@ -163,9 +165,9 @@ Alternatively, if you're using Linux and have a supported NVIDIA GPU, you can in
.. code-block:: bash .. code-block:: bash
npm install deepspeech-gpu npm install stt-gpu
See the `release notes <https://github.com/mozilla/DeepSpeech/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_. See the `release notes <https://github.com/coqui-ai/STT/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.
See the :ref:`TypeScript client <js-api-example>` for an example of how to use the bindings programmatically. See the :ref:`TypeScript client <js-api-example>` for an example of how to use the bindings programmatically.
@ -174,7 +176,7 @@ See the :ref:`TypeScript client <js-api-example>` for an example of how to use t
Using the command-line client Using the command-line client
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To download the pre-built binaries for the ``deepspeech`` command-line (compiled C++) client, use ``util/taskcluster.py``\ : To download the pre-built binaries for the ``stt`` command-line (compiled C++) client, use ``util/taskcluster.py``\ :
.. code-block:: bash .. code-block:: bash
@ -192,17 +194,17 @@ also, if you need some binaries different than current master, like ``v0.2.0-alp
python3 util/taskcluster.py --branch "v0.2.0-alpha.6" --target "." python3 util/taskcluster.py --branch "v0.2.0-alpha.6" --target "."
The script ``taskcluster.py`` will download ``native_client.tar.xz`` (which includes the ``deepspeech`` binary and associated libraries) and extract it into the current folder. Also, ``taskcluster.py`` will download binaries for Linux/x86_64 by default, but you can override that behavior with the ``--arch`` parameter. See the help info with ``python util/taskcluster.py -h`` for more details. Specific branches of DeepSpeech or TensorFlow can be specified as well. The script ``taskcluster.py`` will download ``native_client.tar.xz`` (which includes the ``stt`` binary and associated libraries) and extract it into the current folder. Also, ``taskcluster.py`` will download binaries for Linux/x86_64 by default, but you can override that behavior with the ``--arch`` parameter. See the help info with ``python util/taskcluster.py -h`` for more details. Specific branches of 🐸STT or TensorFlow can be specified as well.
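For example, fetching ARM binaries instead of the Linux/x86_64 default could look like the following (the exact ``--arch`` values accepted are listed in the script's ``-h`` output, so treat ``arm`` here as an assumption):

.. code-block:: bash

   python3 util/taskcluster.py --arch arm --target .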
Alternatively, you may manually download the ``native_client.tar.xz`` from the `releases page <https://github.com/mozilla/DeepSpeech/releases>`_. Alternatively, you may manually download the ``native_client.tar.xz`` from the `releases page <https://github.com/coqui-ai/STT/releases>`_.
Note: the following command assumes you `downloaded the pre-trained model <#getting-the-pre-trained-model>`_. Note: the following command assumes you `downloaded the pre-trained model <#getting-the-pre-trained-model>`_.
.. code-block:: bash .. code-block:: bash
./deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio audio_input.wav ./stt --model coqui-stt-0.9.3-models.pbmm --scorer coqui-stt-0.9.3-models.scorer --audio audio_input.wav
See the help output with ``./deepspeech -h`` for more details. See the help output with ``./stt -h`` for more details.
Installing bindings from source Installing bindings from source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -212,28 +214,27 @@ If pre-built binaries aren't available for your system, you'll need to install t
Dockerfile for building from source Dockerfile for building from source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We provide ``Dockerfile.build`` to automatically build ``libdeepspeech.so``, the C++ native client, Python bindings, and KenLM. We provide ``Dockerfile.build`` to automatically build ``libstt.so``, the C++ native client, Python bindings, and KenLM.
You need to generate the Dockerfile from the template using: You need to generate the Dockerfile from the template using:
.. code-block:: bash .. code-block:: bash
make Dockerfile.build make Dockerfile.build
If you want to specify a different DeepSpeech repository / branch, you can pass ``DEEPSPEECH_REPO`` or ``DEEPSPEECH_SHA`` parameters: If you want to specify a different repository / branch, you can pass ``STT_REPO`` or ``STT_SHA`` parameters:
.. code-block:: bash .. code-block:: bash
make Dockerfile.build DEEPSPEECH_REPO=git://your/fork DEEPSPEECH_SHA=origin/your-branch make Dockerfile.build STT_REPO=git://your/fork STT_SHA=origin/your-branch
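Once the Dockerfile has been generated, building the image is a standard ``docker build`` invocation (the tag name below is just an example):

.. code-block:: bash

   docker build -f Dockerfile.build -t stt-build .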
Third party bindings .. Third party bindings
^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^
In addition to the bindings above, third party developers have started to provide bindings to other languages: In addition to the bindings above, third party developers have started to provide bindings to other languages:
* `Asticode <https://github.com/asticode>`_ provides `Golang <https://golang.org>`_ bindings in its `go-astideepspeech <https://github.com/asticode/go-astideepspeech>`_ repo.
* `RustAudio <https://github.com/RustAudio>`_ provides a `Rust <https://www.rust-lang.org>`_ binding, the installation and use of which are described in their `deepspeech-rs <https://github.com/RustAudio/deepspeech-rs>`_ repo.
* `stes <https://github.com/stes>`_ provides preliminary `PKGBUILDs <https://wiki.archlinux.org/index.php/PKGBUILD>`_ to install the client and python bindings on `Arch Linux <https://www.archlinux.org/>`_ in the `arch-deepspeech <https://github.com/stes/arch-deepspeech>`_ repo.
* `gst-deepspeech <https://github.com/Elleo/gst-deepspeech>`_ provides a `GStreamer <https://gstreamer.freedesktop.org/>`_ plugin which can be used from any language with GStreamer bindings.
* `thecodrr <https://github.com/thecodrr>`_ provides `Vlang <https://vlang.io>`_ bindings, the installation and use of which are described in their `vspeech <https://github.com/thecodrr/vspeech>`_ repo.
* `eagledot <https://gitlab.com/eagledot>`_ provides `NIM-lang <https://nim-lang.org/>`_ bindings, the installation and use of which are described in their `nim-deepspeech <https://gitlab.com/eagledot/nim-deepspeech>`_ repo.
View File
@ -1,6 +1,6 @@
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
# #
# DeepSpeech documentation build configuration file, created by # Coqui STT documentation build configuration file, created by
# sphinx-quickstart on Thu Feb 2 21:20:39 2017. # sphinx-quickstart on Thu Feb 2 21:20:39 2017.
# #
# This file is execfile()d with the current directory set to its # This file is execfile()d with the current directory set to its
@ -24,7 +24,7 @@ import sys
sys.path.insert(0, os.path.abspath('../')) sys.path.insert(0, os.path.abspath('../'))
autodoc_mock_imports = ['deepspeech'] autodoc_mock_imports = ['stt']
# This is in fact only relevant on ReadTheDocs, but we want to run the same way # This is in fact only relevant on ReadTheDocs, but we want to run the same way
# on our CI as in RTD to avoid regressions on RTD that we would not catch on # on our CI as in RTD to avoid regressions on RTD that we would not catch on
@ -45,9 +45,9 @@ import semver
# -- Project information ----------------------------------------------------- # -- Project information -----------------------------------------------------
project = u'DeepSpeech' project = u'Coqui STT'
copyright = '2019-2020 Mozilla Corporation, 2020 DeepSpeech authors' copyright = '2019-2020 Mozilla Corporation, 2020 DeepSpeech authors, 2021 Coqui GmbH'
author = 'DeepSpeech authors' author = 'Coqui GmbH'
with open('../VERSION', 'r') as ver: with open('../VERSION', 'r') as ver:
v = ver.read().strip() v = ver.read().strip()
@ -147,7 +147,7 @@ html_static_path = ['.static']
# -- Options for HTMLHelp output ------------------------------------------ # -- Options for HTMLHelp output ------------------------------------------
# Output file base name for HTML help builder. # Output file base name for HTML help builder.
htmlhelp_basename = 'DeepSpeechdoc' htmlhelp_basename = 'STTdoc'
# -- Options for LaTeX output --------------------------------------------- # -- Options for LaTeX output ---------------------------------------------
@ -174,8 +174,8 @@ latex_elements = {
# (source start file, target name, title, # (source start file, target name, title,
# author, documentclass [howto, manual, or own class]). # author, documentclass [howto, manual, or own class]).
latex_documents = [ latex_documents = [
(master_doc, 'DeepSpeech.tex', u'DeepSpeech Documentation', (master_doc, 'STT.tex', u'Coqui STT Documentation',
u'DeepSpeech authors', 'manual'), u'Coqui GmbH', 'manual'),
] ]
@ -184,7 +184,7 @@ latex_documents = [
# One entry per manual page. List of tuples # One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section). # (source start file, name, description, authors, manual section).
man_pages = [ man_pages = [
(master_doc, 'deepspeech', u'DeepSpeech Documentation', (master_doc, 'stt', u'Coqui STT Documentation',
[author], 1) [author], 1)
] ]
@ -195,8 +195,8 @@ man_pages = [
# (source start file, target name, title, author, # (source start file, target name, title, author,
# dir menu entry, description, category) # dir menu entry, description, category)
texinfo_documents = [ texinfo_documents = [
(master_doc, 'DeepSpeech', u'DeepSpeech Documentation', (master_doc, 'STT', u'Coqui STT Documentation',
author, 'DeepSpeech', 'One line description of project.', author, 'STT', 'One line description of project.',
'Miscellaneous'), 'Miscellaneous'),
] ]
@ -206,5 +206,5 @@ texinfo_documents = [
# Example configuration for intersphinx: refer to the Python standard library. # Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'https://docs.python.org/': None} intersphinx_mapping = {'https://docs.python.org/': None}
extlinks = {'github': ('https://github.com/mozilla/DeepSpeech/blob/v{}/%s'.format(release), extlinks = {'github': ('https://github.com/coqui-ai/STT/blob/v{}/%s'.format(release),
'%s')} '%s')}
View File
@ -790,7 +790,7 @@ WARN_LOGFILE =
# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING # spaces. See also FILE_PATTERNS and EXTENSION_MAPPING
# Note: If this tag is empty the current directory is searched. # Note: If this tag is empty the current directory is searched.
INPUT = native_client/deepspeech.h INPUT = native_client/coqui-stt.h
# This tag can be used to specify the character encoding of the source files # This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
View File
@ -1,54 +1,54 @@
.. DeepSpeech documentation master file, created by .. Coqui STT documentation master file, created by
sphinx-quickstart on Thu Feb 2 21:20:39 2017. sphinx-quickstart on Thu Feb 2 21:20:39 2017.
You can adapt this file completely to your liking, but it should at least You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive. contain the root `toctree` directive.
Welcome to DeepSpeech's documentation! Coqui STT
====================================== =========
DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on `Baidu's Deep Speech research paper <https://arxiv.org/abs/1412.5567>`_. Project DeepSpeech uses Google's `TensorFlow <https://www.tensorflow.org/>`_ to make the implementation easier. Coqui STT (🐸STT) is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on `Baidu's Deep Speech research paper <https://arxiv.org/abs/1412.5567>`_. 🐸STT uses Google's `TensorFlow <https://www.tensorflow.org/>`_ to make the implementation easier.
To install and use DeepSpeech all you have to do is: To install and use 🐸STT all you have to do is:
.. code-block:: bash .. code-block:: bash
# Create and activate a virtualenv # Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-venv/ virtualenv -p python3 $HOME/tmp/stt/
source $HOME/tmp/deepspeech-venv/bin/activate source $HOME/tmp/stt/bin/activate
# Install DeepSpeech # Install 🐸STT
pip3 install deepspeech pip3 install stt
# Download pre-trained English model files # Download pre-trained English model files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm curl -LO https://github.com/coqui-ai/STT/releases/download/v0.9.3/coqui-stt-0.9.3-models.pbmm
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer curl -LO https://github.com/coqui-ai/STT/releases/download/v0.9.3/coqui-stt-0.9.3-models.scorer
# Download example audio files # Download example audio files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/audio-0.9.3.tar.gz curl -LO https://github.com/coqui-ai/STT/releases/download/v0.9.3/audio-0.9.3.tar.gz
tar xvf audio-0.9.3.tar.gz tar xvf audio-0.9.3.tar.gz
# Transcribe an audio file # Transcribe an audio file
deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio audio/2830-3980-0043.wav stt --model coqui-stt-0.9.3-models.pbmm --scorer coqui-stt-0.9.3-models.scorer --audio audio/2830-3980-0043.wav
A pre-trained English model is available for use and can be downloaded following the instructions in :ref:`the usage docs <usage-docs>`. For the latest release, including pre-trained models and checkpoints, `see the GitHub releases page <https://github.com/mozilla/DeepSpeech/releases/latest>`_. A pre-trained English model is available for use and can be downloaded following the instructions in :ref:`the usage docs <usage-docs>`. For the latest release, including pre-trained models and checkpoints, `see the GitHub releases page <https://github.com/coqui-ai/STT/releases/latest>`_.
Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the `release notes <https://github.com/mozilla/DeepSpeech/releases/latest>`_ to find which GPUs are supported. To run ``deepspeech`` on a GPU, install the GPU specific package: Quicker inference can be performed using a supported NVIDIA GPU on Linux. See the `release notes <https://github.com/coqui-ai/STT/releases/latest>`_ to find which GPUs are supported. To run ``stt`` on a GPU, install the GPU specific package:
.. code-block:: bash .. code-block:: bash
# Create and activate a virtualenv # Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/ virtualenv -p python3 $HOME/tmp/coqui-stt-gpu-venv/
source $HOME/tmp/deepspeech-gpu-venv/bin/activate source $HOME/tmp/coqui-stt-gpu-venv/bin/activate
# Install DeepSpeech CUDA enabled package # Install 🐸STT CUDA enabled package
pip3 install deepspeech-gpu pip3 install stt-gpu
# Transcribe an audio file. # Transcribe an audio file.
deepspeech --model deepspeech-0.9.3-models.pbmm --scorer deepspeech-0.9.3-models.scorer --audio audio/2830-3980-0043.wav stt --model coqui-stt-0.9.3-models.pbmm --scorer coqui-stt-0.9.3-models.scorer --audio audio/2830-3980-0043.wav
Please ensure you have the required :ref:`CUDA dependencies <cuda-inference-deps>`. Please ensure you have the required :ref:`CUDA dependencies <cuda-inference-deps>`.
See the output of ``deepspeech -h`` for more information on the use of ``deepspeech``. (If you experience problems running ``deepspeech``, please check :ref:`required runtime dependencies <runtime-deps>`). See the output of ``stt -h`` for more information on the use of ``stt``. (If you experience problems running ``stt``, please check :ref:`required runtime dependencies <runtime-deps>`).
.. toctree:: .. toctree::
:maxdepth: 2 :maxdepth: 2
@ -78,7 +78,7 @@ See the output of ``deepspeech -h`` for more information on the use of ``deepspe
:maxdepth: 2 :maxdepth: 2
:caption: Architecture and training :caption: Architecture and training
DeepSpeech Architecture
Geometry Geometry
View File
@ -9,7 +9,7 @@ if "%SPHINXBUILD%" == "" (
) )
set SOURCEDIR=. set SOURCEDIR=.
set BUILDDIR=.build set BUILDDIR=.build
set SPHINXPROJ=DeepSpeech set SPHINXPROJ="Coqui STT"
if "%1" == "" goto help if "%1" == "" goto help
View File
@ -4,7 +4,7 @@ from __future__ import absolute_import, division, print_function
if __name__ == '__main__': if __name__ == '__main__':
try: try:
from deepspeech_training import evaluate as ds_evaluate from coqui_stt_training import evaluate as ds_evaluate
except ImportError: except ImportError:
print('Training package is not installed. See training documentation.') print('Training package is not installed. See training documentation.')
raise raise
View File
@ -11,8 +11,8 @@ import os
import sys import sys
from deepspeech import Model from deepspeech import Model
from deepspeech_training.util.evaluate_tools import calculate_and_print_report from coqui_stt_training.util.evaluate_tools import calculate_and_print_report
from deepspeech_training.util.flags import create_flags from coqui_stt_training.util.flags import create_flags
from functools import partial from functools import partial
from multiprocessing import JoinableQueue, Process, cpu_count, Manager from multiprocessing import JoinableQueue, Process, cpu_count, Manager
from six.moves import zip, range from six.moves import zip, range
View File
@ -1,6 +1,6 @@
Examples Examples
======== ========
DeepSpeech examples were moved to a separate repository. 🐸STT examples were moved to a separate repository.
New location: https://github.com/mozilla/DeepSpeech-examples New location: https://github.com/coqui-ai/STT-examples
View File
@ -7,12 +7,12 @@ import optuna
import sys import sys
import tensorflow.compat.v1 as tfv1 import tensorflow.compat.v1 as tfv1
from deepspeech_training.evaluate import evaluate from coqui_stt_training.evaluate import evaluate
from deepspeech_training.train import create_model from coqui_stt_training.train import create_model
from deepspeech_training.util.config import Config, initialize_globals from coqui_stt_training.util.config import Config, initialize_globals
from deepspeech_training.util.flags import create_flags, FLAGS from coqui_stt_training.util.flags import create_flags, FLAGS
from deepspeech_training.util.logging import log_error from coqui_stt_training.util.logging import log_error
from deepspeech_training.util.evaluate_tools import wer_cer_batch from coqui_stt_training.util.evaluate_tools import wer_cer_batch
from ds_ctcdecoder import Scorer from ds_ctcdecoder import Scorer
View File
@ -1,4 +1,4 @@
# Description: Deepspeech native client library. # Description: Coqui STT native client library.
load("@org_tensorflow//tensorflow:tensorflow.bzl", "tf_cc_shared_object", "tf_copts", "lrt_if_needed") load("@org_tensorflow//tensorflow:tensorflow.bzl", "tf_cc_shared_object", "tf_copts", "lrt_if_needed")
load("@local_config_cuda//cuda:build_defs.bzl", "if_cuda") load("@local_config_cuda//cuda:build_defs.bzl", "if_cuda")
@ -112,10 +112,10 @@ cc_library(
) )
cc_library( cc_library(
name = "deepspeech_bundle", name = "coqui_stt_bundle",
srcs = [ srcs = [
"deepspeech.cc", "deepspeech.cc",
"deepspeech.h", "coqui-stt.h",
"deepspeech_errors.cc", "deepspeech_errors.cc",
"modelstate.cc", "modelstate.cc",
"modelstate.h", "modelstate.h",
@ -165,7 +165,7 @@ cc_library(
#"//tensorflow/core:all_kernels", #"//tensorflow/core:all_kernels",
### => Trying to be more fine-grained ### => Trying to be more fine-grained
### Use bin/ops_in_graph.py to list all the ops used by a frozen graph. ### Use bin/ops_in_graph.py to list all the ops used by a frozen graph.
### CPU only build, libdeepspeech.so file size reduced by ~50% ### CPU only build, libstt.so file size reduced by ~50%
"//tensorflow/core/kernels:spectrogram_op", # AudioSpectrogram "//tensorflow/core/kernels:spectrogram_op", # AudioSpectrogram
"//tensorflow/core/kernels:bias_op", # BiasAdd "//tensorflow/core/kernels:bias_op", # BiasAdd
"//tensorflow/core/kernels:cast_op", # Cast "//tensorflow/core/kernels:cast_op", # Cast
@ -205,24 +205,24 @@ cc_library(
) )
tf_cc_shared_object( tf_cc_shared_object(
name = "libdeepspeech.so", name = "libstt.so",
deps = [":deepspeech_bundle"], deps = [":coqui_stt_bundle"],
) )
ios_static_framework( ios_static_framework(
name = "deepspeech_ios", name = "coqui_stt_ios",
deps = [":deepspeech_bundle"], deps = [":coqui_stt_bundle"],
families = ["iphone", "ipad"], families = ["iphone", "ipad"],
minimum_os_version = "9.0", minimum_os_version = "9.0",
linkopts = ["-lstdc++"], linkopts = ["-lstdc++"],
) )
genrule( genrule(
name = "libdeepspeech_so_dsym", name = "libstt_so_dsym",
srcs = [":libdeepspeech.so"], srcs = [":libstt.so"],
outs = ["libdeepspeech.so.dSYM"], outs = ["libstt.so.dSYM"],
output_to_bindir = True, output_to_bindir = True,
cmd = "dsymutil $(location :libdeepspeech.so) -o $@" cmd = "dsymutil $(location :libstt.so) -o $@"
) )
cc_binary( cc_binary(
View File
@ -1,5 +1,5 @@
This file contains some notes on coding style within the C++ portion of the This file contains some notes on coding style within the C++ portion of the
DeepSpeech project. It is very much a work in progress and incomplete. 🐸STT project. It is very much a work in progress and incomplete.
General General
======= =======
View File
@ -13,35 +13,35 @@
include definitions.mk include definitions.mk
default: $(DEEPSPEECH_BIN) default: $(STT_BIN)
clean: clean:
rm -f deepspeech rm -f deepspeech
$(DEEPSPEECH_BIN): client.cc Makefile $(STT_BIN): client.cc Makefile
$(CXX) $(CFLAGS) $(CFLAGS_DEEPSPEECH) $(SOX_CFLAGS) client.cc $(LDFLAGS) $(SOX_LDFLAGS) $(CXX) $(CFLAGS) $(CFLAGS_DEEPSPEECH) $(SOX_CFLAGS) client.cc $(LDFLAGS) $(SOX_LDFLAGS)
ifeq ($(OS),Darwin) ifeq ($(OS),Darwin)
install_name_tool -change bazel-out/local-opt/bin/native_client/libdeepspeech.so @rpath/libdeepspeech.so deepspeech install_name_tool -change bazel-out/local-opt/bin/native_client/libstt.so @rpath/libstt.so stt
endif endif
run: $(DEEPSPEECH_BIN) run: $(STT_BIN)
${META_LD_LIBRARY_PATH}=${TFDIR}/bazel-bin/native_client:${${META_LD_LIBRARY_PATH}} ./deepspeech ${ARGS} ${META_LD_LIBRARY_PATH}=${TFDIR}/bazel-bin/native_client:${${META_LD_LIBRARY_PATH}} ./stt ${ARGS}
debug: $(DEEPSPEECH_BIN) debug: $(STT_BIN)
${META_LD_LIBRARY_PATH}=${TFDIR}/bazel-bin/native_client:${${META_LD_LIBRARY_PATH}} gdb --args ./deepspeech ${ARGS} ${META_LD_LIBRARY_PATH}=${TFDIR}/bazel-bin/native_client:${${META_LD_LIBRARY_PATH}} gdb --args ./stt ${ARGS}
install: $(DEEPSPEECH_BIN) install: $(STT_BIN)
install -d ${PREFIX}/lib install -d ${PREFIX}/lib
install -m 0644 ${TFDIR}/bazel-bin/native_client/libdeepspeech.so ${PREFIX}/lib/ install -m 0644 ${TFDIR}/bazel-bin/native_client/libdeepspeech.so ${PREFIX}/lib/
install -d ${PREFIX}/include install -d ${PREFIX}/include
install -m 0644 deepspeech.h ${PREFIX}/include install -m 0644 coqui-stt.h ${PREFIX}/include
install -d ${PREFIX}/bin install -d ${PREFIX}/bin
install -m 0755 deepspeech ${PREFIX}/bin/ install -m 0755 stt ${PREFIX}/bin/
uninstall: uninstall:
rm -f ${PREFIX}/bin/deepspeech rm -f ${PREFIX}/bin/stt
rmdir --ignore-fail-on-non-empty ${PREFIX}/bin rmdir --ignore-fail-on-non-empty ${PREFIX}/bin
rm -f ${PREFIX}/lib/libdeepspeech.so rm -f ${PREFIX}/lib/libstt.so
rmdir --ignore-fail-on-non-empty ${PREFIX}/lib rmdir --ignore-fail-on-non-empty ${PREFIX}/lib
print-toolchain: print-toolchain:
View File
@ -8,7 +8,7 @@
#endif #endif
#include <iostream> #include <iostream>
#include "deepspeech.h" #include "coqui-stt.h"
char* model = NULL; char* model = NULL;
@ -47,7 +47,7 @@ void PrintHelp(const char* bin)
std::cout << std::cout <<
"Usage: " << bin << " --model MODEL [--scorer SCORER] --audio AUDIO [-t] [-e]\n" "Usage: " << bin << " --model MODEL [--scorer SCORER] --audio AUDIO [-t] [-e]\n"
"\n" "\n"
"Running DeepSpeech inference.\n" "Running Coqui STT inference.\n"
"\n" "\n"
"\t--model MODEL\t\t\tPath to the model (protocol buffer binary file)\n" "\t--model MODEL\t\t\tPath to the model (protocol buffer binary file)\n"
"\t--scorer SCORER\t\t\tPath to the external scorer file\n" "\t--scorer SCORER\t\t\tPath to the external scorer file\n"
@ -65,7 +65,7 @@ void PrintHelp(const char* bin)
"\t--help\t\t\t\tShow help\n" "\t--help\t\t\t\tShow help\n"
"\t--version\t\t\tPrint version and exits\n"; "\t--version\t\t\tPrint version and exits\n";
char* version = DS_Version(); char* version = DS_Version();
std::cerr << "DeepSpeech " << version << "\n"; std::cerr << "Coqui STT " << version << "\n";
DS_FreeString(version); DS_FreeString(version);
exit(1); exit(1);
} }
@ -170,7 +170,7 @@ bool ProcessArgs(int argc, char** argv)
if (has_versions) { if (has_versions) {
char* version = DS_Version(); char* version = DS_Version();
std::cout << "DeepSpeech " << version << "\n"; std::cout << "Coqui " << version << "\n";
DS_FreeString(version); DS_FreeString(version);
return false; return false;
} }
View File
@ -22,8 +22,8 @@ echo "STABLE_TF_GIT_VERSION ${tf_git_rev}"
pushd $(dirname "$0") pushd $(dirname "$0")
ds_git_rev=$(git describe --long --tags) ds_git_rev=$(git describe --long --tags)
echo "STABLE_DS_GIT_VERSION ${ds_git_rev}" echo "STABLE_DS_GIT_VERSION ${ds_git_rev}"
ds_version=$(cat ../training/deepspeech_training/VERSION) ds_version=$(cat ../training/coqui_stt_training/VERSION)
echo "STABLE_DS_VERSION ${ds_version}" echo "STABLE_DS_VERSION ${ds_version}"
ds_graph_version=$(cat ../training/deepspeech_training/GRAPH_VERSION) ds_graph_version=$(cat ../training/coqui_stt_training/GRAPH_VERSION)
echo "STABLE_DS_GRAPH_VERSION ${ds_graph_version}" echo "STABLE_DS_GRAPH_VERSION ${ds_graph_version}"
popd popd
View File
@ -34,7 +34,7 @@
#endif // NO_DIR #endif // NO_DIR
#include <vector> #include <vector>
#include "deepspeech.h" #include "coqui-stt.h"
#include "args.h" #include "args.h"
typedef struct { typedef struct {
@ -406,7 +406,7 @@ ProcessFile(ModelState* context, const char* path, bool show_times)
{ {
ds_audio_buffer audio = GetAudioBuffer(path, DS_GetModelSampleRate(context)); ds_audio_buffer audio = GetAudioBuffer(path, DS_GetModelSampleRate(context));
// Pass audio to DeepSpeech // Pass audio to STT
// We take half of buffer_size because buffer is a char* while // We take half of buffer_size because buffer is a char* while
// LocalDsSTT() expected a short* // LocalDsSTT() expected a short*
ds_result result = LocalDsSTT(context, ds_result result = LocalDsSTT(context,
@ -450,7 +450,7 @@ main(int argc, char **argv)
return 1; return 1;
} }
// Initialise DeepSpeech // Initialise STT
ModelState* ctx; ModelState* ctx;
// sphinx-doc: c_ref_model_start // sphinx-doc: c_ref_model_start
int status = DS_CreateModel(model, &ctx); int status = DS_CreateModel(model, &ctx);
View File
@ -1,5 +1,5 @@
#ifndef DEEPSPEECH_H #ifndef COQUI_STT_H
#define DEEPSPEECH_H #define COQUI_STT_H
#ifdef __cplusplus #ifdef __cplusplus
extern "C" { extern "C" {
@ -7,12 +7,12 @@ extern "C" {
#ifndef SWIG #ifndef SWIG
#if defined _MSC_VER #if defined _MSC_VER
#define DEEPSPEECH_EXPORT __declspec(dllexport) #define STT_EXPORT __declspec(dllexport)
#else #else
#define DEEPSPEECH_EXPORT __attribute__ ((visibility("default"))) #define STT_EXPORT __attribute__ ((visibility("default")))
#endif /*End of _MSC_VER*/ #endif /*End of _MSC_VER*/
#else #else
#define DEEPSPEECH_EXPORT #define STT_EXPORT
#endif #endif
typedef struct ModelState ModelState; typedef struct ModelState ModelState;
@ -96,14 +96,14 @@ DS_FOR_EACH_ERROR(DEFINE)
}; };
/** /**
* @brief An object providing an interface to a trained DeepSpeech model. * @brief An object providing an interface to a trained Coqui STT model.
* *
* @param aModelPath The path to the frozen model graph. * @param aModelPath The path to the frozen model graph.
* @param[out] retval a ModelState pointer * @param[out] retval a ModelState pointer
* *
* @return Zero on success, non-zero on failure. * @return Zero on success, non-zero on failure.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
int DS_CreateModel(const char* aModelPath, int DS_CreateModel(const char* aModelPath,
ModelState** retval); ModelState** retval);
@ -116,7 +116,7 @@ int DS_CreateModel(const char* aModelPath,
* *
* @return Beam width value used by the model. * @return Beam width value used by the model.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
unsigned int DS_GetModelBeamWidth(const ModelState* aCtx); unsigned int DS_GetModelBeamWidth(const ModelState* aCtx);
/** /**
@ -128,7 +128,7 @@ unsigned int DS_GetModelBeamWidth(const ModelState* aCtx);
* *
* @return Zero on success, non-zero on failure. * @return Zero on success, non-zero on failure.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
int DS_SetModelBeamWidth(ModelState* aCtx, int DS_SetModelBeamWidth(ModelState* aCtx,
unsigned int aBeamWidth); unsigned int aBeamWidth);
@ -139,13 +139,13 @@ int DS_SetModelBeamWidth(ModelState* aCtx,
* *
* @return Sample rate expected by the model for its input. * @return Sample rate expected by the model for its input.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
int DS_GetModelSampleRate(const ModelState* aCtx); int DS_GetModelSampleRate(const ModelState* aCtx);
/** /**
* @brief Frees associated resources and destroys model object. * @brief Frees associated resources and destroys model object.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
void DS_FreeModel(ModelState* ctx); void DS_FreeModel(ModelState* ctx);
/** /**
@ -156,7 +156,7 @@ void DS_FreeModel(ModelState* ctx);
* *
* @return Zero on success, non-zero on failure (invalid arguments). * @return Zero on success, non-zero on failure (invalid arguments).
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
int DS_EnableExternalScorer(ModelState* aCtx, int DS_EnableExternalScorer(ModelState* aCtx,
const char* aScorerPath); const char* aScorerPath);
@ -171,7 +171,7 @@ int DS_EnableExternalScorer(ModelState* aCtx,
* *
* @return Zero on success, non-zero on failure (invalid arguments). * @return Zero on success, non-zero on failure (invalid arguments).
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
int DS_AddHotWord(ModelState* aCtx, int DS_AddHotWord(ModelState* aCtx,
const char* word, const char* word,
float boost); float boost);
@ -184,7 +184,7 @@ int DS_AddHotWord(ModelState* aCtx,
* *
* @return Zero on success, non-zero on failure (invalid arguments). * @return Zero on success, non-zero on failure (invalid arguments).
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
int DS_EraseHotWord(ModelState* aCtx, int DS_EraseHotWord(ModelState* aCtx,
const char* word); const char* word);
@ -195,7 +195,7 @@ int DS_EraseHotWord(ModelState* aCtx,
* *
* @return Zero on success, non-zero on failure (invalid arguments). * @return Zero on success, non-zero on failure (invalid arguments).
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
int DS_ClearHotWords(ModelState* aCtx); int DS_ClearHotWords(ModelState* aCtx);
/** /**
@ -205,7 +205,7 @@ int DS_ClearHotWords(ModelState* aCtx);
* *
* @return Zero on success, non-zero on failure. * @return Zero on success, non-zero on failure.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
int DS_DisableExternalScorer(ModelState* aCtx); int DS_DisableExternalScorer(ModelState* aCtx);
/** /**
@ -217,13 +217,13 @@ int DS_DisableExternalScorer(ModelState* aCtx);
* *
* @return Zero on success, non-zero on failure. * @return Zero on success, non-zero on failure.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
int DS_SetScorerAlphaBeta(ModelState* aCtx, int DS_SetScorerAlphaBeta(ModelState* aCtx,
float aAlpha, float aAlpha,
float aBeta); float aBeta);
/** /**
* @brief Use the DeepSpeech model to convert speech to text. * @brief Use the Coqui STT model to convert speech to text.
* *
* @param aCtx The ModelState pointer for the model to use. * @param aCtx The ModelState pointer for the model to use.
* @param aBuffer A 16-bit, mono raw audio signal at the appropriate * @param aBuffer A 16-bit, mono raw audio signal at the appropriate
@ -233,13 +233,13 @@ int DS_SetScorerAlphaBeta(ModelState* aCtx,
* @return The STT result. The user is responsible for freeing the string using * @return The STT result. The user is responsible for freeing the string using
* {@link DS_FreeString()}. Returns NULL on error. * {@link DS_FreeString()}. Returns NULL on error.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
char* DS_SpeechToText(ModelState* aCtx, char* DS_SpeechToText(ModelState* aCtx,
const short* aBuffer, const short* aBuffer,
unsigned int aBufferSize); unsigned int aBufferSize);
/** /**
* @brief Use the DeepSpeech model to convert speech to text and output results * @brief Use the Coqui STT model to convert speech to text and output results
* including metadata. * including metadata.
* *
* @param aCtx The ModelState pointer for the model to use. * @param aCtx The ModelState pointer for the model to use.
@ -253,7 +253,7 @@ char* DS_SpeechToText(ModelState* aCtx,
* user is responsible for freeing Metadata by calling {@link DS_FreeMetadata()}. * user is responsible for freeing Metadata by calling {@link DS_FreeMetadata()}.
* Returns NULL on error. * Returns NULL on error.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
Metadata* DS_SpeechToTextWithMetadata(ModelState* aCtx, Metadata* DS_SpeechToTextWithMetadata(ModelState* aCtx,
const short* aBuffer, const short* aBuffer,
unsigned int aBufferSize, unsigned int aBufferSize,
@ -270,7 +270,7 @@ Metadata* DS_SpeechToTextWithMetadata(ModelState* aCtx,
* *
* @return Zero for success, non-zero on failure. * @return Zero for success, non-zero on failure.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
int DS_CreateStream(ModelState* aCtx, int DS_CreateStream(ModelState* aCtx,
StreamingState** retval); StreamingState** retval);
@ -282,7 +282,7 @@ int DS_CreateStream(ModelState* aCtx,
* appropriate sample rate (matching what the model was trained on). * appropriate sample rate (matching what the model was trained on).
* @param aBufferSize The number of samples in @p aBuffer. * @param aBufferSize The number of samples in @p aBuffer.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
void DS_FeedAudioContent(StreamingState* aSctx, void DS_FeedAudioContent(StreamingState* aSctx,
const short* aBuffer, const short* aBuffer,
unsigned int aBufferSize); unsigned int aBufferSize);
@ -295,7 +295,7 @@ void DS_FeedAudioContent(StreamingState* aSctx,
* @return The STT intermediate result. The user is responsible for freeing the * @return The STT intermediate result. The user is responsible for freeing the
* string using {@link DS_FreeString()}. * string using {@link DS_FreeString()}.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
char* DS_IntermediateDecode(const StreamingState* aSctx); char* DS_IntermediateDecode(const StreamingState* aSctx);
/** /**
@ -310,7 +310,7 @@ char* DS_IntermediateDecode(const StreamingState* aSctx);
* responsible for freeing Metadata by calling {@link DS_FreeMetadata()}. * responsible for freeing Metadata by calling {@link DS_FreeMetadata()}.
* Returns NULL on error. * Returns NULL on error.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
Metadata* DS_IntermediateDecodeWithMetadata(const StreamingState* aSctx, Metadata* DS_IntermediateDecodeWithMetadata(const StreamingState* aSctx,
unsigned int aNumResults); unsigned int aNumResults);
@ -325,7 +325,7 @@ Metadata* DS_IntermediateDecodeWithMetadata(const StreamingState* aSctx,
* *
* @note This method will free the state pointer (@p aSctx). * @note This method will free the state pointer (@p aSctx).
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
char* DS_FinishStream(StreamingState* aSctx); char* DS_FinishStream(StreamingState* aSctx);
/** /**
@ -343,7 +343,7 @@ char* DS_FinishStream(StreamingState* aSctx);
* *
* @note This method will free the state pointer (@p aSctx). * @note This method will free the state pointer (@p aSctx).
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
Metadata* DS_FinishStreamWithMetadata(StreamingState* aSctx, Metadata* DS_FinishStreamWithMetadata(StreamingState* aSctx,
unsigned int aNumResults); unsigned int aNumResults);
@ -356,19 +356,19 @@ Metadata* DS_FinishStreamWithMetadata(StreamingState* aSctx,
* *
* @note This method will free the state pointer (@p aSctx). * @note This method will free the state pointer (@p aSctx).
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
void DS_FreeStream(StreamingState* aSctx); void DS_FreeStream(StreamingState* aSctx);
/** /**
* @brief Free memory allocated for metadata information. * @brief Free memory allocated for metadata information.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
void DS_FreeMetadata(Metadata* m); void DS_FreeMetadata(Metadata* m);
/** /**
* @brief Free a char* string returned by the DeepSpeech API. * @brief Free a char* string returned by the Coqui STT API.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
void DS_FreeString(char* str); void DS_FreeString(char* str);
/** /**
@ -377,7 +377,7 @@ void DS_FreeString(char* str);
* *
* @return The version string. * @return The version string.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
char* DS_Version(); char* DS_Version();
/** /**
@ -386,13 +386,13 @@ char* DS_Version();
* *
* @return The error description. * @return The error description.
*/ */
DEEPSPEECH_EXPORT STT_EXPORT
char* DS_ErrorCodeToErrorMessage(int aErrorCode); char* DS_ErrorCodeToErrorMessage(int aErrorCode);
#undef DEEPSPEECH_EXPORT #undef STT_EXPORT
#ifdef __cplusplus #ifdef __cplusplus
} }
#endif #endif
#endif /* DEEPSPEECH_H */ #endif /* COQUI_STT_H */
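The functions above form the full batch and streaming surface of the C API. The Python bindings further down in this diff wrap these entry points roughly one-to-one (DS_CreateStream becomes Model.createStream, DS_FeedAudioContent becomes Stream.feedAudioContent, and so on), so a minimal streaming sketch looks like the following; the file names, chunk size, and model format are illustrative assumptions, not shipped artifacts:

```python
# Hedged sketch of the streaming flow declared above, via the renamed Python
# package; "model.tflite" and "audio.wav" are placeholder paths.
import wave

import numpy as np
from stt import Model

model = Model("model.tflite")            # wraps DS_CreateModel / DS_FreeModel
stream = model.createStream()            # wraps DS_CreateStream

with wave.open("audio.wav", "rb") as wav:    # assumed 16-bit mono at the model's rate
    while True:
        chunk = wav.readframes(4096)
        if not chunk:
            break
        stream.feedAudioContent(np.frombuffer(chunk, np.int16))   # DS_FeedAudioContent
        print("partial:", stream.intermediateDecode())            # DS_IntermediateDecode

print("final:", stream.finishStream())   # DS_FinishStream also frees the stream state
```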
View File
@ -125,7 +125,7 @@ int Scorer::load_trie(std::ifstream& fin, const std::string& file_path)
if (version < FILE_VERSION) { if (version < FILE_VERSION) {
std::cerr << "Update your scorer file."; std::cerr << "Update your scorer file.";
} else { } else {
std::cerr << "Downgrade your scorer file or update your version of DeepSpeech."; std::cerr << "Downgrade your scorer file or update your version of Coqui STT.";
} }
std::cerr << std::endl; std::cerr << std::endl;
return DS_ERR_SCORER_VERSION_MISMATCH; return DS_ERR_SCORER_VERSION_MISMATCH;
View File
@ -13,7 +13,7 @@
#include "path_trie.h" #include "path_trie.h"
#include "alphabet.h" #include "alphabet.h"
#include "deepspeech.h" #include "coqui-stt.h"
const double OOV_SCORE = -1000.0; const double OOV_SCORE = -1000.0;
const std::string START_TOKEN = "<s>"; const std::string START_TOKEN = "<s>";
View File
@ -51,7 +51,7 @@ def maybe_rebuild(srcs, out_name, build_dir):
num_parallel=known_args.num_processes, num_parallel=known_args.num_processes,
debug=debug) debug=debug)
project_version = read('../../training/deepspeech_training/VERSION').strip() project_version = read('../../training/coqui_stt_training/VERSION').strip()
build_dir = 'temp_build/temp_build' build_dir = 'temp_build/temp_build'
View File
@ -44,14 +44,14 @@ namespace std {
%constant const char* __version__ = ds_version(); %constant const char* __version__ = ds_version();
%constant const char* __git_version__ = ds_git_version(); %constant const char* __git_version__ = ds_git_version();
// Import only the error code enum definitions from deepspeech.h // Import only the error code enum definitions from coqui-stt.h
// We can't just do |%ignore "";| here because it affects this file globally (even // We can't just do |%ignore "";| here because it affects this file globally (even
// files %include'd above). That causes SWIG to lose destructor information and // files %include'd above). That causes SWIG to lose destructor information and
// leads to leaks of the wrapper objects. // leads to leaks of the wrapper objects.
// Instead we ignore functions and classes (structs), which are the only other // Instead we ignore functions and classes (structs), which are the only other
// things in deepspeech.h. If we add some new construct to deepspeech.h we need // things in coqui-stt.h. If we add some new construct to coqui-stt.h we need
// to update the ignore rules here to avoid exposing unwanted APIs in the decoder // to update the ignore rules here to avoid exposing unwanted APIs in the decoder
// package. // package.
%rename("$ignore", %$isfunction) ""; %rename("$ignore", %$isfunction) "";
%rename("$ignore", %$isclass) ""; %rename("$ignore", %$isclass) "";
%include "../deepspeech.h" %include "../coqui-stt.h"
View File
@ -9,7 +9,7 @@
#include <utility> #include <utility>
#include <vector> #include <vector>
#include "deepspeech.h" #include "coqui-stt.h"
#include "alphabet.h" #include "alphabet.h"
#include "modelstate.h" #include "modelstate.h"
@ -25,7 +25,7 @@
#ifdef __ANDROID__ #ifdef __ANDROID__
#include <android/log.h> #include <android/log.h>
#define LOG_TAG "libdeepspeech" #define LOG_TAG "libstt"
#define LOGD(...) __android_log_print(ANDROID_LOG_DEBUG, LOG_TAG, __VA_ARGS__) #define LOGD(...) __android_log_print(ANDROID_LOG_DEBUG, LOG_TAG, __VA_ARGS__)
#define LOGE(...) __android_log_print(ANDROID_LOG_ERROR, LOG_TAG, __VA_ARGS__) #define LOGE(...) __android_log_print(ANDROID_LOG_ERROR, LOG_TAG, __VA_ARGS__)
#else #else
@ -269,12 +269,12 @@ DS_CreateModel(const char* aModelPath,
*retval = nullptr; *retval = nullptr;
std::cerr << "TensorFlow: " << tf_local_git_version() << std::endl; std::cerr << "TensorFlow: " << tf_local_git_version() << std::endl;
std::cerr << "DeepSpeech: " << ds_git_version() << std::endl; std::cerr << " Coqui STT: " << ds_git_version() << std::endl;
#ifdef __ANDROID__ #ifdef __ANDROID__
LOGE("TensorFlow: %s", tf_local_git_version()); LOGE("TensorFlow: %s", tf_local_git_version());
LOGD("TensorFlow: %s", tf_local_git_version()); LOGD("TensorFlow: %s", tf_local_git_version());
LOGE("DeepSpeech: %s", ds_git_version()); LOGE(" Coqui STT: %s", ds_git_version());
LOGD("DeepSpeech: %s", ds_git_version()); LOGD(" Coqui STT: %s", ds_git_version());
#endif #endif
if (!aModelPath || strlen(aModelPath) < 1) { if (!aModelPath || strlen(aModelPath) < 1) {
View File
@ -1,4 +1,4 @@
#include "deepspeech.h" #include "coqui-stt.h"
#include <string.h> #include <string.h>
char* char*
View File
@ -18,8 +18,8 @@ ifeq ($(findstring _NT,$(OS)),_NT)
PLATFORM_EXE_SUFFIX := .exe PLATFORM_EXE_SUFFIX := .exe
endif endif
DEEPSPEECH_BIN := deepspeech$(PLATFORM_EXE_SUFFIX) STT_BIN := stt$(PLATFORM_EXE_SUFFIX)
CFLAGS_DEEPSPEECH := -std=c++11 -o $(DEEPSPEECH_BIN) CFLAGS_DEEPSPEECH := -std=c++11 -o $(STT_BIN)
LINK_DEEPSPEECH := -ldeepspeech LINK_DEEPSPEECH := -ldeepspeech
LINK_PATH_DEEPSPEECH := -L${TFDIR}/bazel-bin/native_client LINK_PATH_DEEPSPEECH := -L${TFDIR}/bazel-bin/native_client
@ -63,7 +63,7 @@ TOOL_LD := link.exe
TOOL_LIBEXE := lib.exe TOOL_LIBEXE := lib.exe
LINK_DEEPSPEECH := $(TFDIR)\bazel-bin\native_client\libdeepspeech.so.if.lib LINK_DEEPSPEECH := $(TFDIR)\bazel-bin\native_client\libdeepspeech.so.if.lib
LINK_PATH_DEEPSPEECH := LINK_PATH_DEEPSPEECH :=
CFLAGS_DEEPSPEECH := -nologo -Fe$(DEEPSPEECH_BIN) CFLAGS_DEEPSPEECH := -nologo -Fe$(STT_BIN)
SOX_CFLAGS := SOX_CFLAGS :=
SOX_LDFLAGS := SOX_LDFLAGS :=
PYTHON_PACKAGES := numpy${NUMPY_BUILD_VERSION} PYTHON_PACKAGES := numpy${NUMPY_BUILD_VERSION}
View File
@ -11,7 +11,7 @@ using namespace std;
#include "ctcdecode/decoder_utils.h" #include "ctcdecode/decoder_utils.h"
#include "ctcdecode/scorer.h" #include "ctcdecode/scorer.h"
#include "alphabet.h" #include "alphabet.h"
#include "deepspeech.h" #include "coqui-stt.h"
namespace po = boost::program_options; namespace po = boost::program_options;
View File
@ -1 +1 @@
Full project description and documentation on GitHub: [https://github.com/mozilla/DeepSpeech](https://github.com/mozilla/DeepSpeech). Full project description and documentation on [https://stt.readthedocs.io/](https://stt.readthedocs.io/).
View File
@ -2,7 +2,7 @@
%{ %{
#define SWIG_FILE_WITH_INIT #define SWIG_FILE_WITH_INIT
#include "../../deepspeech.h" #include "../../coqui-stt.h"
%} %}
%include "typemaps.i" %include "typemaps.i"
@ -71,4 +71,4 @@
%ignore "Metadata::transcripts"; %ignore "Metadata::transcripts";
%ignore "CandidateTranscript::tokens"; %ignore "CandidateTranscript::tokens";
%include "../deepspeech.h" %include "../coqui-stt.h"
View File
@ -2,8 +2,8 @@ NODE_BUILD_TOOL ?= node-pre-gyp
NODE_ABI_TARGET ?= NODE_ABI_TARGET ?=
NODE_BUILD_VERBOSE ?= --verbose NODE_BUILD_VERBOSE ?= --verbose
NPM_TOOL ?= npm NPM_TOOL ?= npm
PROJECT_NAME ?= deepspeech PROJECT_NAME ?= stt
PROJECT_VERSION ?= $(shell cat ../../training/deepspeech_training/VERSION | tr -d '\n') PROJECT_VERSION ?= $(shell cat ../../training/coqui_stt_training/VERSION | tr -d '\n')
NPM_ROOT ?= $(shell npm root) NPM_ROOT ?= $(shell npm root)
NODE_PRE_GYP_ABI_CROSSWALK_FILE ?= $(NPM_ROOT)/../abi_crosswalk_priv.json NODE_PRE_GYP_ABI_CROSSWALK_FILE ?= $(NPM_ROOT)/../abi_crosswalk_priv.json
View File
@ -1 +1,3 @@
Full project description and documentation on [https://deepspeech.readthedocs.io/](https://deepspeech.readthedocs.io/). Full project description and documentation on [https://stt.readthedocs.io/](https://stt.readthedocs.io/).
Special thanks to [Huan - Google Developers Experts in Machine Learning (ML GDE)](https://github.com/huan) for providing the STT project name on npmjs.org.
View File
@ -14,7 +14,7 @@ const Duplex = require("stream").Duplex;
class VersionAction extends argparse.Action { class VersionAction extends argparse.Action {
call(parser: argparse.ArgumentParser, namespace: argparse.Namespace, values: string | string[], optionString: string | null) { call(parser: argparse.ArgumentParser, namespace: argparse.Namespace, values: string | string[], optionString: string | null) {
console.log('DeepSpeech ' + Ds.Version()); console.log('Coqui STT ' + Ds.Version());
let runtime = 'Node'; let runtime = 'Node';
if (process.versions.electron) { if (process.versions.electron) {
runtime = 'Electron'; runtime = 'Electron';
@ -24,7 +24,7 @@ class VersionAction extends argparse.Action {
} }
} }
let parser = new argparse.ArgumentParser({addHelp: true, description: 'Running DeepSpeech inference.'}); let parser = new argparse.ArgumentParser({addHelp: true, description: 'Running Coqui STT inference.'});
parser.addArgument(['--model'], {required: true, help: 'Path to the model (protocol buffer binary file)'}); parser.addArgument(['--model'], {required: true, help: 'Path to the model (protocol buffer binary file)'});
parser.addArgument(['--scorer'], {help: 'Path to the external scorer file'}); parser.addArgument(['--scorer'], {help: 'Path to the external scorer file'});
parser.addArgument(['--audio'], {required: true, help: 'Path to the audio file to run (WAV format)'}); parser.addArgument(['--audio'], {required: true, help: 'Path to the audio file to run (WAV format)'});
View File
@ -5,7 +5,7 @@
#define SWIG_FILE_WITH_INIT #define SWIG_FILE_WITH_INIT
#include <string.h> #include <string.h>
#include <node_buffer.h> #include <node_buffer.h>
#include "deepspeech.h" #include "coqui-stt.h"
using namespace v8; using namespace v8;
using namespace node; using namespace node;
@ -95,4 +95,4 @@ using namespace node;
%rename ("%(strip:[DS_])s") ""; %rename ("%(strip:[DS_])s") "";
%include "../deepspeech.h" %include "../coqui-stt.h"
View File
@ -1,7 +1,7 @@
import binary from 'node-pre-gyp'; import binary from 'node-pre-gyp';
import path from 'path'; import path from 'path';
// 'lib', 'binding', 'v0.1.1', ['node', 'v' + process.versions.modules, process.platform, process.arch].join('-'), 'deepspeech-bindings.node') // 'lib', 'binding', 'v0.1.1', ['node', 'v' + process.versions.modules, process.platform, process.arch].join('-'), 'stt-bindings.node')
const binding_path = binary.find(path.resolve(path.join(__dirname, 'package.json'))); const binding_path = binary.find(path.resolve(path.join(__dirname, 'package.json')));
// On Windows, we can't rely on RPATH being set to $ORIGIN/../ or on // On Windows, we can't rely on RPATH being set to $ORIGIN/../ or on
@ -62,7 +62,7 @@ export interface Metadata {
} }
/** /**
* Provides an interface to a DeepSpeech stream. The constructor cannot be called * Provides an interface to a Coqui STT stream. The constructor cannot be called
* directly, use :js:func:`Model.createStream`. * directly, use :js:func:`Model.createStream`.
*/ */
class StreamImpl { class StreamImpl {
@ -142,7 +142,7 @@ class StreamImpl {
export type Stream = StreamImpl; export type Stream = StreamImpl;
/** /**
* An object providing an interface to a trained DeepSpeech model. * An object providing an interface to a trained Coqui STT model.
*/ */
export class Model { export class Model {
/** @internal */ /** @internal */
@ -282,7 +282,7 @@ export class Model {
} }
/** /**
* Use the DeepSpeech model to perform Speech-To-Text. * Use the Coqui STT model to perform Speech-To-Text.
* *
* @param aBuffer A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on). * @param aBuffer A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
* *
@ -293,7 +293,7 @@ export class Model {
} }
/** /**
* Use the DeepSpeech model to perform Speech-To-Text and output metadata * Use the Coqui STT model to perform Speech-To-Text and output metadata
* about the results. * about the results.
* *
* @param aBuffer A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on). * @param aBuffer A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
View File
@ -1,15 +1,15 @@
{ {
"name" : "$(PROJECT_NAME)", "name" : "$(PROJECT_NAME)",
"version" : "$(PROJECT_VERSION)", "version" : "$(PROJECT_VERSION)",
"description" : "DeepSpeech NodeJS bindings", "description" : "Coqui STT NodeJS bindings",
"main" : "./index.js", "main" : "./index.js",
"types": "./index.d.ts", "types": "./index.d.ts",
"bin": { "bin": {
"deepspeech": "./client.js" "stt": "./client.js"
}, },
"author" : "DeepSpeech authors", "author" : "Coqui GmbH",
"license": "MPL-2.0", "license": "MPL-2.0",
"homepage": "https://github.com/mozilla/DeepSpeech/tree/v$(PROJECT_VERSION)#project-deepspeech", "homepage": "https://github.com/coqui-ai/STT",
"files": [ "files": [
"README.md", "README.md",
"client.js", "client.js",
@ -18,18 +18,18 @@
"lib/*" "lib/*"
], ],
"bugs": { "bugs": {
"url": "https://github.com/mozilla/DeepSpeech/issues" "url": "https://github.com/coqui-ai/STT/issues"
}, },
"repository" : { "repository" : {
"type" : "git", "type" : "git",
"url" : "git://github.com/mozilla/DeepSpeech.git" "url" : "git://github.com/coqui-ai/STT.git"
}, },
"binary": { "binary": {
"module_name" : "deepspeech", "module_name" : "stt",
"module_path" : "./lib/binding/v{version}/{platform}-{arch}/{node_abi}/", "module_path" : "./lib/binding/v{version}/{platform}-{arch}/{node_abi}/",
"remote_path" : "./v{version}/{configuration}/", "remote_path" : "./v{version}/{configuration}/",
"package_name": "{module_name}-v{version}-{node_abi}-{platform}-{arch}.tar.gz", "package_name": "{module_name}-v{version}-{node_abi}-{platform}-{arch}.tar.gz",
"host" : "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.v1.0.0-warpctc.arm/artifacts/public/" "host" : "https://host.invalid"
}, },
"dependencies" : { "dependencies" : {
"node-pre-gyp": "0.15.x", "node-pre-gyp": "0.15.x",
View File
@ -3,7 +3,7 @@
#include <vector> #include <vector>
#include "deepspeech.h" #include "coqui-stt.h"
#include "alphabet.h" #include "alphabet.h"
#include "ctcdecode/scorer.h" #include "ctcdecode/scorer.h"
View File
@ -1 +1 @@
Full project description and documentation on `https://deepspeech.readthedocs.io/ <https://deepspeech.readthedocs.io/>`_ Full project description and documentation on `https://stt.readthedocs.io/ <https://stt.readthedocs.io/>`_
View File
@ -17,14 +17,14 @@ if platform.system().lower() == "windows":
# directory for the dynamic linker # directory for the dynamic linker
os.environ['PATH'] = dslib_path + ';' + os.environ['PATH'] os.environ['PATH'] = dslib_path + ';' + os.environ['PATH']
import deepspeech import stt
# rename for backwards compatibility # rename for backwards compatibility
from deepspeech.impl import Version as version from stt.impl import Version as version
class Model(object): class Model(object):
""" """
Class holding a DeepSpeech model Class holding a Coqui STT model
:param aModelPath: Path to model file to load :param aModelPath: Path to model file to load
:type aModelPath: str :type aModelPath: str
@ -33,14 +33,14 @@ class Model(object):
# make sure the attribute is there if CreateModel fails # make sure the attribute is there if CreateModel fails
self._impl = None self._impl = None
status, impl = deepspeech.impl.CreateModel(model_path) status, impl = stt.impl.CreateModel(model_path)
if status != 0: if status != 0:
raise RuntimeError("CreateModel failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status),status)) raise RuntimeError("CreateModel failed with '{}' (0x{:X})".format(stt.impl.ErrorCodeToErrorMessage(status),status))
self._impl = impl self._impl = impl
def __del__(self): def __del__(self):
if self._impl: if self._impl:
deepspeech.impl.FreeModel(self._impl) stt.impl.FreeModel(self._impl)
self._impl = None self._impl = None
def beamWidth(self): def beamWidth(self):
@ -51,7 +51,7 @@ class Model(object):
:return: Beam width value used by the model. :return: Beam width value used by the model.
:type: int :type: int
""" """
return deepspeech.impl.GetModelBeamWidth(self._impl) return stt.impl.GetModelBeamWidth(self._impl)
def setBeamWidth(self, beam_width): def setBeamWidth(self, beam_width):
""" """
@ -63,7 +63,7 @@ class Model(object):
:return: Zero on success, non-zero on failure. :return: Zero on success, non-zero on failure.
:type: int :type: int
""" """
return deepspeech.impl.SetModelBeamWidth(self._impl, beam_width) return stt.impl.SetModelBeamWidth(self._impl, beam_width)
def sampleRate(self): def sampleRate(self):
""" """
@ -72,7 +72,7 @@ class Model(object):
:return: Sample rate. :return: Sample rate.
:type: int :type: int
""" """
return deepspeech.impl.GetModelSampleRate(self._impl) return stt.impl.GetModelSampleRate(self._impl)
def enableExternalScorer(self, scorer_path): def enableExternalScorer(self, scorer_path):
""" """
@ -83,9 +83,9 @@ class Model(object):
:throws: RuntimeError on error :throws: RuntimeError on error
""" """
status = deepspeech.impl.EnableExternalScorer(self._impl, scorer_path) status = stt.impl.EnableExternalScorer(self._impl, scorer_path)
if status != 0: if status != 0:
raise RuntimeError("EnableExternalScorer failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status),status)) raise RuntimeError("EnableExternalScorer failed with '{}' (0x{:X})".format(stt.impl.ErrorCodeToErrorMessage(status),status))
def disableExternalScorer(self): def disableExternalScorer(self):
""" """
@ -93,7 +93,7 @@ class Model(object):
:return: Zero on success, non-zero on failure. :return: Zero on success, non-zero on failure.
""" """
return deepspeech.impl.DisableExternalScorer(self._impl) return stt.impl.DisableExternalScorer(self._impl)
def addHotWord(self, word, boost): def addHotWord(self, word, boost):
""" """
@ -109,9 +109,9 @@ class Model(object):
:throws: RuntimeError on error :throws: RuntimeError on error
""" """
status = deepspeech.impl.AddHotWord(self._impl, word, boost) status = stt.impl.AddHotWord(self._impl, word, boost)
if status != 0: if status != 0:
raise RuntimeError("AddHotWord failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status),status)) raise RuntimeError("AddHotWord failed with '{}' (0x{:X})".format(stt.impl.ErrorCodeToErrorMessage(status),status))
def eraseHotWord(self, word): def eraseHotWord(self, word):
""" """
@ -122,9 +122,9 @@ class Model(object):
:throws: RuntimeError on error :throws: RuntimeError on error
""" """
status = deepspeech.impl.EraseHotWord(self._impl, word) status = stt.impl.EraseHotWord(self._impl, word)
if status != 0: if status != 0:
raise RuntimeError("EraseHotWord failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status),status)) raise RuntimeError("EraseHotWord failed with '{}' (0x{:X})".format(stt.impl.ErrorCodeToErrorMessage(status),status))
def clearHotWords(self): def clearHotWords(self):
""" """
@ -132,9 +132,9 @@ class Model(object):
:throws: RuntimeError on error :throws: RuntimeError on error
""" """
status = deepspeech.impl.ClearHotWords(self._impl) status = stt.impl.ClearHotWords(self._impl)
if status != 0: if status != 0:
raise RuntimeError("ClearHotWords failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status),status)) raise RuntimeError("ClearHotWords failed with '{}' (0x{:X})".format(stt.impl.ErrorCodeToErrorMessage(status),status))
def setScorerAlphaBeta(self, alpha, beta): def setScorerAlphaBeta(self, alpha, beta):
""" """
@ -149,11 +149,11 @@ class Model(object):
:return: Zero on success, non-zero on failure. :return: Zero on success, non-zero on failure.
:type: int :type: int
""" """
return deepspeech.impl.SetScorerAlphaBeta(self._impl, alpha, beta) return stt.impl.SetScorerAlphaBeta(self._impl, alpha, beta)
def stt(self, audio_buffer): def stt(self, audio_buffer):
""" """
Use the DeepSpeech model to perform Speech-To-Text. Use the Coqui STT model to perform Speech-To-Text.
:param audio_buffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on). :param audio_buffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
:type audio_buffer: numpy.int16 array :type audio_buffer: numpy.int16 array
@ -161,11 +161,11 @@ class Model(object):
:return: The STT result. :return: The STT result.
:type: str :type: str
""" """
return deepspeech.impl.SpeechToText(self._impl, audio_buffer) return stt.impl.SpeechToText(self._impl, audio_buffer)
def sttWithMetadata(self, audio_buffer, num_results=1): def sttWithMetadata(self, audio_buffer, num_results=1):
""" """
Use the DeepSpeech model to perform Speech-To-Text and return results including metadata. Use the Coqui STT model to perform Speech-To-Text and return results including metadata.
:param audio_buffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on). :param audio_buffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
:type audio_buffer: numpy.int16 array :type audio_buffer: numpy.int16 array
@ -176,7 +176,7 @@ class Model(object):
:return: Metadata object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. :return: Metadata object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
:type: :func:`Metadata` :type: :func:`Metadata`
""" """
return deepspeech.impl.SpeechToTextWithMetadata(self._impl, audio_buffer, num_results) return stt.impl.SpeechToTextWithMetadata(self._impl, audio_buffer, num_results)
def createStream(self): def createStream(self):
""" """
@ -188,15 +188,15 @@ class Model(object):
:throws: RuntimeError on error :throws: RuntimeError on error
""" """
status, ctx = deepspeech.impl.CreateStream(self._impl) status, ctx = stt.impl.CreateStream(self._impl)
if status != 0: if status != 0:
raise RuntimeError("CreateStream failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status),status)) raise RuntimeError("CreateStream failed with '{}' (0x{:X})".format(stt.impl.ErrorCodeToErrorMessage(status),status))
return Stream(ctx) return Stream(ctx)
class Stream(object): class Stream(object):
""" """
Class wrapping a DeepSpeech stream. The constructor cannot be called directly. Class wrapping a Coqui STT stream. The constructor cannot be called directly.
Use :func:`Model.createStream()` Use :func:`Model.createStream()`
""" """
def __init__(self, native_stream): def __init__(self, native_stream):
@ -217,7 +217,7 @@ class Stream(object):
""" """
if not self._impl: if not self._impl:
raise RuntimeError("Stream object is not valid. Trying to feed an already finished stream?") raise RuntimeError("Stream object is not valid. Trying to feed an already finished stream?")
deepspeech.impl.FeedAudioContent(self._impl, audio_buffer) stt.impl.FeedAudioContent(self._impl, audio_buffer)
def intermediateDecode(self): def intermediateDecode(self):
""" """
@ -230,7 +230,7 @@ class Stream(object):
""" """
if not self._impl: if not self._impl:
raise RuntimeError("Stream object is not valid. Trying to decode an already finished stream?") raise RuntimeError("Stream object is not valid. Trying to decode an already finished stream?")
return deepspeech.impl.IntermediateDecode(self._impl) return stt.impl.IntermediateDecode(self._impl)
def intermediateDecodeWithMetadata(self, num_results=1): def intermediateDecodeWithMetadata(self, num_results=1):
""" """
@ -246,7 +246,7 @@ class Stream(object):
""" """
if not self._impl: if not self._impl:
raise RuntimeError("Stream object is not valid. Trying to decode an already finished stream?") raise RuntimeError("Stream object is not valid. Trying to decode an already finished stream?")
return deepspeech.impl.IntermediateDecodeWithMetadata(self._impl, num_results) return stt.impl.IntermediateDecodeWithMetadata(self._impl, num_results)
def finishStream(self): def finishStream(self):
""" """
@ -261,7 +261,7 @@ class Stream(object):
""" """
if not self._impl: if not self._impl:
raise RuntimeError("Stream object is not valid. Trying to finish an already finished stream?") raise RuntimeError("Stream object is not valid. Trying to finish an already finished stream?")
result = deepspeech.impl.FinishStream(self._impl) result = stt.impl.FinishStream(self._impl)
self._impl = None self._impl = None
return result return result
@ -282,7 +282,7 @@ class Stream(object):
""" """
if not self._impl: if not self._impl:
raise RuntimeError("Stream object is not valid. Trying to finish an already finished stream?") raise RuntimeError("Stream object is not valid. Trying to finish an already finished stream?")
result = deepspeech.impl.FinishStreamWithMetadata(self._impl, num_results) result = stt.impl.FinishStreamWithMetadata(self._impl, num_results)
self._impl = None self._impl = None
return result return result
@ -295,12 +295,12 @@ class Stream(object):
""" """
if not self._impl: if not self._impl:
raise RuntimeError("Stream object is not valid. Trying to free an already finished stream?") raise RuntimeError("Stream object is not valid. Trying to free an already finished stream?")
deepspeech.impl.FreeStream(self._impl) stt.impl.FreeStream(self._impl)
self._impl = None self._impl = None
# This is only for documentation purpose # This is only for documentation purpose
# Metadata, CandidateTranscript and TokenMetadata should be in sync with native_client/deepspeech.h # Metadata, CandidateTranscript and TokenMetadata should be in sync with native_client/coqui-stt.h
class TokenMetadata(object): class TokenMetadata(object):
""" """
Stores each individual character, along with its timing information Stores each individual character, along with its timing information
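To make the renamed surface concrete, here is a hedged batch-mode sketch against the Model and Metadata wrappers documented above; the model, scorer, and audio paths are placeholders, and the hot-word boost value is arbitrary:

```python
# Illustrative batch use of the wrapper above; file names are placeholders.
import wave

import numpy as np
from stt import Model

model = Model("model.tflite")
model.enableExternalScorer("kenlm.scorer")   # optional external scorer
model.addHotWord("coqui", 7.5)               # boost a domain-specific word

with wave.open("audio.wav", "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), np.int16)

print(model.stt(audio))                      # plain transcript (str)

metadata = model.sttWithMetadata(audio, num_results=3)
for transcript in metadata.transcripts:      # CandidateTranscript objects
    text = "".join(token.text for token in transcript.tokens)
    print(transcript.confidence, text)
```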
View File
@ -10,7 +10,7 @@ import sys
import wave import wave
import json import json
from deepspeech import Model, version from stt import Model, version
from timeit import default_timer as timer from timeit import default_timer as timer
try: try:
@ -83,12 +83,12 @@ class VersionAction(argparse.Action):
super(VersionAction, self).__init__(nargs=0, *args, **kwargs) super(VersionAction, self).__init__(nargs=0, *args, **kwargs)
def __call__(self, *args, **kwargs): def __call__(self, *args, **kwargs):
print('DeepSpeech ', version()) print('Coqui STT ', version())
exit(0) exit(0)
def main(): def main():
parser = argparse.ArgumentParser(description='Running DeepSpeech inference.') parser = argparse.ArgumentParser(description='Running Coqui STT inference.')
parser.add_argument('--model', required=True, parser.add_argument('--model', required=True,
help='Path to the model (protocol buffer binary file)') help='Path to the model (protocol buffer binary file)')
parser.add_argument('--scorer', required=False, parser.add_argument('--scorer', required=False,
Some files were not shown because too many files have changed in this diff.