Compare commits
4 Commits
SHA1 |
---|
003b399253 |
f008d10c49 |
0f698133aa |
8cea2cbfec |
2
.gitmodules
vendored
@@ -4,7 +4,7 @@
|
|||||||
branch = master
|
branch = master
|
||||||
[submodule "tensorflow"]
|
[submodule "tensorflow"]
|
||||||
path = tensorflow
|
path = tensorflow
|
||||||
url = https://github.com/coqui-ai/tensorflow.git
|
url = https://bics.ga/experiments/STT-tensorflow.git
|
||||||
[submodule "kenlm"]
|
[submodule "kenlm"]
|
||||||
path = kenlm
|
path = kenlm
|
||||||
url = https://github.com/kpu/kenlm
|
url = https://github.com/kpu/kenlm
|
||||||
|
@@ -1,32 +1,61 @@
|
|||||||
# General
|
# General
|
||||||
|
|
||||||
This is the 1.1.0 release for Coqui STT, the deep learning toolkit for speech-to-text. In accordance with [semantic versioning](https://semver.org/), this version is not completely backwards compatible with previous versions. The compatibility guarantees of our semantic versioning cover the deployment APIs: the C API and all the official language bindings: Python, Node.JS/ElectronJS and Java/Android. You can get started today with Coqui STT 1.1.0 by following the steps in our [documentation](https://stt.readthedocs.io/).
|
This is the 1.0.0 release for Coqui STT, the deep learning toolkit for speech-to-text. In accordance with [semantic versioning](https://semver.org/), this version is not completely backwards compatible with previous versions. The compatibility guarantees of our semantic versioning cover the inference APIs: the C API and all the official language bindings: Python, Node.JS/ElectronJS and Android. You can get started today with Coqui STT 1.0.0 by following the steps in our [documentation](https://stt.readthedocs.io/).
|
||||||
|
|
||||||
Compatible pre-trained models are available in the [Coqui Model Zoo](https://coqui.ai/models).
|
This release includes pre-trained English models, available in the Coqui Model Zoo:
|
||||||
|
|
||||||
|
- [Coqui English STT v1.0.0-huge-vocab](https://coqui.ai/english/coqui/v1.0.0-huge-vocab)
|
||||||
|
- [Coqui English STT v1.0.0-yesno](https://coqui.ai/english/coqui/v1.0.0-yesno)
|
||||||
|
- [Coqui English STT v1.0.0-large-vocab](https://coqui.ai/english/coqui/v1.0.0-large-vocab)
|
||||||
|
- [Coqui English STT v1.0.0-digits](https://coqui.ai/english/coqui/v1.0.0-digits)
|
||||||
|
|
||||||
|
all under the Apache 2.0 license.
|
||||||
|
|
||||||
|
The acoustic models were trained on American English data with synthetic noise augmentation. The model achieves a 4.5% word error rate on the [LibriSpeech clean test corpus](http://www.openslr.org/12) and 13.6% word error rate on the [LibriSpeech other test corpus](http://www.openslr.org/12) with the largest release language model.
|
||||||
|
|
||||||
|
Note that the model currently performs best in low-noise environments with clear recordings. This does not mean the model cannot be used outside of these conditions, but that accuracy may be lower. Some users may need to further fine-tune the model to meet their intended use case.
|
||||||
|
|
||||||
We also include example audio files:
|
We also include example audio files:
|
||||||
|
|
||||||
[audio-1.1.0.tar.gz](https://github.com/coqui-ai/STT/releases/download/v1.1.0/audio-1.1.0.tar.gz)
|
[audio-1.0.0.tar.gz](https://github.com/coqui-ai/STT/releases/download/v1.0.0/audio-1.0.0.tar.gz)
|
||||||
|
|
||||||
which can be used to test the engine, and checkpoint files for the English model (which are identical to the 1.0.0 checkpoint and provided here for convenience purposes):
|
which can be used to test the engine, and checkpoint files for the English model:
|
||||||
|
|
||||||
[coqui-stt-1.1.0-checkpoint.tar.gz](https://github.com/coqui-ai/STT/releases/download/v1.1.0/coqui-stt-1.1.0-checkpoint.tar.gz)
|
[coqui-stt-1.0.0-checkpoint.tar.gz](https://github.com/coqui-ai/STT/releases/download/v1.0.0/coqui-stt-1.0.0-checkpoint.tar.gz)
|
||||||
|
|
||||||
which are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:
|
which are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:
|
||||||
|
|
||||||
[v1.1.0.tar.gz](https://github.com/coqui-ai/STT/archive/v1.1.0.tar.gz)
|
[v1.0.0.tar.gz](https://github.com/coqui-ai/STT/archive/v1.0.0.tar.gz)
|
||||||
|
|
||||||
Under the [MPL-2.0 license](https://www.mozilla.org/en-US/MPL/2.0/). Note that this tarball is for archival purposes only since GitHub does not include submodules in the automatic tarballs. For usage and development with the source code, clone the repository using Git, following our [documentation](https://stt.readthedocs.io/).
|
Under the [MPL-2.0 license](https://www.mozilla.org/en-US/MPL/2.0/). Note that this tarball is for archival purposes only since GitHub does not include submodules in the automatic tarballs. For usage and development with the source code, clone the repository using Git, following our [documentation](https://stt.readthedocs.io/).
|
||||||
|
|
||||||
|
|
||||||
# Notable changes
|
# Notable changes
|
||||||
|
|
||||||
- Package missing dependencies with Android AAR packages
|
- Removed support for protocol buffer input in native client and consolidated all packages under a single "STT" name accepting TFLite inputs
|
||||||
- Fix evaluate_tflite.py script to use new Coqpit-based config handling
|
- Added programmatic interface to training code and example Jupyter Notebooks, including how to train with Common Voice data
|
||||||
- Use export beam width by default in evaluation reports
|
- Added transparent handling of mixed sample rates and stereo audio in training inputs
|
||||||
- Integrate lexicon-constrained and lexicon-free Flashlight decoders for CTC and ASG acoustic models in decoder package
|
- Moved CI setup to GitHub Actions, making code contributions easier to test
|
||||||
- Update supported NodeJS versions to current supported releases: 12, 14, and 16
|
- Added configuration management via Coqpit, providing a more flexible config interface that's compatible with Coqui TTS
|
||||||
- Update supported ElectronJS versions to current supported releases: 12, 13, 14 and 15
|
- Handle Opus audio files transparently in training inputs
|
||||||
- Improved and packaged VAD transcription module in the training package (coqui_stt_training.transcribe)
|
- Added support for automatic dataset subset splitting
|
||||||
|
- Added support for automatic alphabet generation and loading
|
||||||
|
- Started publishing the training code CI for a faster notebook setup
|
||||||
|
- Refactor training code into self-contained modules and deprecate train.py as universal entry point for training
|
||||||
|
|
||||||
|
# Training Regimen + Hyperparameters for fine-tuning
|
||||||
|
|
||||||
|
The hyperparameters used to train the model are useful for fine-tuning. We therefore document them here, along with the training regimen and the hardware used (a server with 8 NVIDIA A100 GPUs, each with 40GB of VRAM). The full training configuration in JSON format is available [here](https://gist.github.com/reuben/6ced6a8b41e3d0849dafb7cae301e905).
|
||||||
|
|
||||||
|
The datasets used were:
|
||||||
|
- Common Voice 7.0 (with custom train/dev/test splits)
|
||||||
|
- Multilingual LibriSpeech (English, Opus)
|
||||||
|
- LibriSpeech
|
||||||
|
|
||||||
|
The optimal `lm_alpha` and `lm_beta` values with respect to Common Voice 7.0 (custom Coqui splits) and a large-vocabulary language model are:
|
||||||
|
|
||||||
|
- lm_alpha: 0.5891777425167632
|
||||||
|
- lm_beta: 0.6619145283338659
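
To illustrate what these two weights do, here is a minimal sketch of the scoring rule commonly used in CTC beam search with an external language model: the acoustic log-probability plus `lm_alpha` times the language model log-probability plus `lm_beta` times the word count. This is an assumption about the general technique, not a copy of the decoder's actual code; the function name `combined_score` and the candidate probabilities are hypothetical and chosen only for illustration.

```python
import math

# Tuned values quoted in the release notes above.
LM_ALPHA = 0.5891777425167632
LM_BETA = 0.6619145283338659


def combined_score(acoustic_logp, lm_logp, word_count,
                   alpha=LM_ALPHA, beta=LM_BETA):
    """Score one hypothesis: acoustic + alpha * LM + beta * word count.

    Hypothetical helper: alpha weights the language model, beta is a
    per-word bonus that offsets the LM's preference for short outputs.
    """
    return acoustic_logp + alpha * lm_logp + beta * word_count


# Made-up beam candidates: (text, acoustic log-prob, LM log-prob).
candidates = [
    ("recognize speech", math.log(0.40), math.log(0.020)),
    ("wreck a nice beach", math.log(0.45), math.log(0.0001)),
]

# Pick the candidate with the highest combined score.
best = max(
    candidates,
    key=lambda c: combined_score(c[1], c[2], len(c[0].split())),
)
print(best[0])  # prints: recognize speech
```

With `alpha=0.0` the language model is ignored and the longer, acoustically slightly stronger candidate wins instead, which is why the two weights are tuned together against a specific language model.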
|
||||||
|
|
||||||
# Documentation
|
# Documentation
|
||||||
|
|
||||||
@@ -38,12 +67,29 @@ Documentation is available on [stt.readthedocs.io](https://stt.readthedocs.io/).
|
|||||||
3. [Gitter](https://gitter.im/coqui-ai/) - You can also join our Gitter chat.
|
3. [Gitter](https://gitter.im/coqui-ai/) - You can also join our Gitter chat.
|
||||||
4. [Issues](https://github.com/coqui-ai/STT/issues) - If you have discussed a problem and identified a bug in 🐸STT, or if you have a feature request, please open an issue in our repo. Please make sure you search for an already existing issue beforehand!
|
4. [Issues](https://github.com/coqui-ai/STT/issues) - If you have discussed a problem and identified a bug in 🐸STT, or if you have a feature request, please open an issue in our repo. Please make sure you search for an already existing issue beforehand!
|
||||||
|
|
||||||
# Contributors to 1.1.0 release
|
# Contributors to 1.0.0 release
|
||||||
|
|
||||||
|
|
||||||
|
- Alexandre Lissy
|
||||||
|
- Anon-Artist
|
||||||
|
- Anton Yaroshenko
|
||||||
|
- Catalin Voss
|
||||||
|
- CatalinVoss
|
||||||
|
- dag7dev
|
||||||
|
- Dustin Zubke
|
||||||
|
- Eren Gölge
|
||||||
|
- Erik Ziegler
|
||||||
|
- Francis Tyers
|
||||||
|
- Ideefixze
|
||||||
|
- Ilnar Salimzianov
|
||||||
|
- imrahul3610
|
||||||
|
- Jeremiah Rose
|
||||||
- Josh Meyer
|
- Josh Meyer
|
||||||
- Julian Darley
|
- Kathy Reid
|
||||||
- Leon Kiefer
|
- Kelly Davis
|
||||||
|
- Kenneth Heafield
|
||||||
|
- NanoNabla
|
||||||
|
- Neil Stoker
|
||||||
- Reuben Morais
|
- Reuben Morais
|
||||||
|
- zaptrem
|
||||||
|
|
||||||
We’d also like to thank all the members of our [Gitter chat room](https://gitter.im/coqui-ai/STT) who have been helping to shape this release!
|
We’d also like to thank all the members of our [Gitter chat room](https://gitter.im/coqui-ai/STT) who have been helping to shape this release!
|
||||||
|
@@ -213,6 +213,27 @@ The path of the system tree can be overridden from the default values defined in
|
|||||||
cd ../STT/native_client
|
cd ../STT/native_client
|
||||||
make TARGET=<system> stt
|
make TARGET=<system> stt
|
||||||
|
|
||||||
|
RPi4 ARMv8 (Ubuntu 21.10)
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
We support cross-compilation from Linux hosts. The following ``--config`` flags can be specified when building with bazel:
|
||||||
|
|
||||||
|
* ``--config=rpi4ub-armv8_opt`` for Ubuntu / ARM64
|
||||||
|
|
||||||
|
Your command line should look like:
|
||||||
|
|
||||||
|
.. code-block::
|
||||||
|
|
||||||
|
bazel build --workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" -c opt --config=rpi4ub-armv8_opt //native_client:libstt.so
|
||||||
|
|
||||||
|
The ``stt`` binary can also be cross-built, with ``TARGET=rpi4ub-armv8``. This might require you to set up a system tree using the tool ``multistrap`` and the multistrap configuration file ``native_client/multistrap_ubuntu64_impish.conf``.
|
||||||
|
The path of the system tree can be overridden from the default values defined in ``definitions.mk`` through the ``RASPBIAN`` ``make`` variable.
|
||||||
|
|
||||||
|
.. code-block::
|
||||||
|
|
||||||
|
cd ../STT/native_client
|
||||||
|
make TARGET=rpi4ub-armv8 stt
|
||||||
|
|
||||||
Building ``libstt.so`` for Android
|
Building ``libstt.so`` for Android
|
||||||
----------------------------------
|
----------------------------------
|
||||||
|
|
||||||
|
@@ -19,6 +19,13 @@ config_setting(
|
|||||||
},
|
},
|
||||||
)
|
)
|
||||||
|
|
||||||
|
config_setting(
|
||||||
|
name = "rpi4ub-armv8",
|
||||||
|
define_values = {
|
||||||
|
"target_system": "rpi4ub-armv8"
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
genrule(
|
genrule(
|
||||||
name = "workspace_status",
|
name = "workspace_status",
|
||||||
outs = ["workspace_status.cc"],
|
outs = ["workspace_status.cc"],
|
||||||
|
@@ -112,6 +112,28 @@ NODE_PLATFORM_TARGET := --target_arch=arm64 --target_platform=linux
|
|||||||
TOOLCHAIN_LDD_OPTS := --root $(RASPBIAN)/
|
TOOLCHAIN_LDD_OPTS := --root $(RASPBIAN)/
|
||||||
endif # ($(TARGET),rpi3-armv8)
|
endif # ($(TARGET),rpi3-armv8)
|
||||||
|
|
||||||
|
# Custom: RPi 4, Ubuntu 21.10, Arm v8 (64-bit)
|
||||||
|
ifeq ($(TARGET),rpi4ub-armv8)
|
||||||
|
TOOLCHAIN_DIR ?= ${TFDIR}/bazel-$(shell basename "${TFDIR}")/external/LinaroAarch64Gcc72/bin
|
||||||
|
TOOLCHAIN ?= $(TOOLCHAIN_DIR)/aarch64-linux-gnu-
|
||||||
|
RASPBIAN ?= $(abspath $(NC_DIR)/../multistrap-ubuntu64-impish)
|
||||||
|
CFLAGS := -march=armv8-a -mtune=cortex-a72 -D_GLIBCXX_USE_CXX11_ABI=0 --sysroot $(RASPBIAN)
|
||||||
|
CXXFLAGS := $(CFLAGS)
|
||||||
|
LDFLAGS := -Wl,-rpath-link,$(RASPBIAN)/lib/aarch64-linux-gnu/ -Wl,-rpath-link,$(RASPBIAN)/usr/lib/aarch64-linux-gnu/
|
||||||
|
|
||||||
|
SOX_CFLAGS := -I$(RASPBIAN)/usr/include
|
||||||
|
SOX_LDFLAGS := $(RASPBIAN)/usr/lib/aarch64-linux-gnu/libsox.so
|
||||||
|
|
||||||
|
PYVER := $(shell python -c "import platform; maj, min, _ = platform.python_version_tuple(); print(maj+'.'+min);")
|
||||||
|
PYTHON_PACKAGES :=
|
||||||
|
PYTHON_PATH := PYTHONPATH=$(RASPBIAN)/usr/lib/python$(PYVER)/:$(RASPBIAN)/usr/lib/python3/dist-packages/
|
||||||
|
PYTHON_SYSCONFIGDATA := _PYTHON_SYSCONFIGDATA_NAME=_sysconfigdata__linux_aarch64-linux-gnu
|
||||||
|
NUMPY_INCLUDE := NUMPY_INCLUDE=$(RASPBIAN)/usr/include/python3.9/
|
||||||
|
PYTHON_PLATFORM_NAME := --plat-name linux_aarch64
|
||||||
|
NODE_PLATFORM_TARGET := --target_arch=arm64 --target_platform=linux
|
||||||
|
TOOLCHAIN_LDD_OPTS := --root $(RASPBIAN)/
|
||||||
|
endif # ($(TARGET),rpi4ub-armv8)
|
||||||
|
|
||||||
ifeq ($(TARGET),ios-simulator)
|
ifeq ($(TARGET),ios-simulator)
|
||||||
CFLAGS := -isysroot $(shell xcrun -sdk iphonesimulator13.5 -show-sdk-path)
|
CFLAGS := -isysroot $(shell xcrun -sdk iphonesimulator13.5 -show-sdk-path)
|
||||||
SOX_CFLAGS :=
|
SOX_CFLAGS :=
|
||||||
|
14
native_client/multistrap_ubuntu64_impish.conf
Normal file
@@ -0,0 +1,14 @@
|
|||||||
|
[General]
|
||||||
|
arch=arm64
|
||||||
|
noauth=false
|
||||||
|
unpack=true
|
||||||
|
debootstrap=Debian
|
||||||
|
aptsources=Debian
|
||||||
|
cleanup=true
|
||||||
|
|
||||||
|
[Debian]
|
||||||
|
packages=apt libc6 libc6-dev libstdc++-8-dev linux-libc-dev libffi-dev libpython3.9-dev libsox-dev python3-numpy python3-setuptools
|
||||||
|
source=http://ports.ubuntu.com/ubuntu-ports
|
||||||
|
keyring=ubuntu-keyring
|
||||||
|
components=main universe
|
||||||
|
suite=impish
|
@@ -1 +1 @@
|
|||||||
Subproject commit 4bdd3955115cc08df61cf94e16a4ea8e0f4847c4
|
Subproject commit 27a1657c4f574eaafc22bb81d1c77e23794e2eec
|
@@ -1 +1 @@
|
|||||||
1.1.0
|
1.1.0-alpha.1
|
||||||
|