Compare commits

...

728 Commits

Author SHA1 Message Date
003b399253 Fix up the RPi4 build 2021-12-04 16:28:48 +00:00
f008d10c49 Use tensorflow fork with rpi4ub-armv8 build target 2021-12-04 16:08:39 +00:00
0f698133aa Add an rpi4ub-armv8 build variant 2021-12-04 15:45:55 +00:00
8cea2cbfec Use my fork of the fork of tensorflow 2021-12-04 12:34:10 +00:00
Reuben Morais
dbd38c3a89
Merge pull request #2032 from coqui-ai/transcription-scripts-docs
[transcribe] Fix multiprocessing hangs, clean-up target collection, write docs
2021-12-03 16:46:48 +01:00
Reuben Morais
b43e710959 Docs for transcription with training package 2021-12-03 16:22:43 +01:00
Reuben Morais
ff24a8b917 Undo late-imports 2021-12-03 16:22:43 +01:00
Reuben Morais
479d963155 Set training pkg python-requires<3.8 (due to TF 1.15.4 limit) 2021-12-03 16:22:34 +01:00
Reuben Morais
d90bb60506 [transcribe] Fix multiprocessing hangs, clean-up target collection 2021-12-01 15:44:25 +01:00
Reuben Morais
5cefd7069c Use known paths for Scorer and Alphabet copy in export 2021-11-23 14:21:11 +01:00
Reuben Morais
154a67fb2c
Merge pull request #2026 from coqui-ai/save-scorer-alphabet-savedmodel
Save Scorer and alphabet with SavedModel exports
2021-11-19 20:07:32 +01:00
Reuben Morais
d6456ae4aa Save Scorer and alphabet with SavedModel exports 2021-11-19 19:40:00 +01:00
Reuben Morais
3020949075
Merge pull request #2025 from coqui-ai/various-fixes
Docs fixes, SavedModel export, transcribe.py revival
2021-11-19 16:10:20 +01:00
Reuben Morais
efdaa61e2c Revive transcribe.py
Update to use Coqpit-based config handling, fix multiprocessing setup, and add CI coverage.
2021-11-19 13:57:44 +01:00
Reuben Morais
419b15b72a Allow exporting as SavedModel 2021-11-18 13:48:52 +01:00
Reuben Morais
6a9bd1e6b6 Add usage instructions for C API 2021-11-17 14:07:22 +01:00
Reuben Morais
922d668155
Merge pull request #2024 from juliandarley/fix-shlex-typo
Fix typo in client.py - shlex in line 17
2021-11-17 13:05:30 +01:00
Julian Darley
8ed0a827de Fix typo in client.py - shlex in line 17 2021-11-17 08:33:32 +00:00
Reuben Morais
11c2edb068
Merge pull request #2018 from coqui-ai/node-electron-version-bump
Update NodeJS and ElectronJS build/test versions to supported releases
2021-11-12 22:47:41 +01:00
Reuben Morais
e7c28ca3c9 Remove outdated comment in supported platforms doc [skip ci] 2021-11-12 22:47:15 +01:00
Reuben Morais
2af6f8da89 Explicitly name TF build cache destination file
GitHub's API has stopped sending the artifact name as the file name, so we ended up with a file matching the artifact ID.
Name the full file path explicitly so there's no room for changes.
2021-11-12 22:47:02 +01:00
Reuben Morais
a5c981bb48 Update NodeJS and ElectronJS build/test versions to supported releases 2021-11-12 22:47:02 +01:00
Reuben Morais
23af8bd095 Bump version to v1.1.0-alpha.1 2021-10-31 21:42:00 +01:00
Reuben Morais
2b955fc70f
Merge pull request #2004 from coqui-ai/flashlight-docs
Improve decoder docs and include in RTD
2021-10-31 21:40:04 +01:00
Reuben Morais
90feb63894 Improve decoder package docs and include in RTD 2021-10-31 20:12:37 +01:00
Reuben Morais
91f1307de4 Pin docutils version as 0.18 release breaks build
Build breaks when writing output for AUGMENTATION.rst with error:

AttributeError: 'Values' object has no attribute 'section_self_link'
2021-10-31 16:46:03 +01:00
Reuben Morais
3d1e3ed3ba Don't include RELEASE_NOTES for pre-releases [skip ci] 2021-10-30 17:31:04 +02:00
Reuben Morais
9a2c2028c7 Bump version to v1.1.0-alpha.0 2021-10-30 17:24:05 +02:00
Reuben Morais
6ef733be54
Merge pull request #2001 from coqui-ai/decoder-flashlight
Expose Flashlight LexiconDecoder/LexiconFreeDecoder in decoder package
2021-10-30 17:19:41 +02:00
Reuben Morais
a61180aeae Fix Flashlight multiplatform build 2021-10-30 16:23:44 +02:00
Reuben Morais
391036643c debug 2021-10-30 16:23:44 +02:00
Reuben Morais
04f62ac9f7 Exercise training graph inference/Flashlight decoder in extra training tests 2021-10-30 14:59:32 +02:00
Reuben Morais
755fb81a62 Expose Flashlight LexiconDecoder/LexiconFreeDecoder 2021-10-30 14:59:32 +02:00
Reuben Morais
5f2ff85fe8
Merge pull request #1977 from Legion2/patch-1
fixed duplicate deallocation of stream in Swift STTStream
2021-10-30 10:19:17 +02:00
Reuben Morais
489e49f698
Merge pull request #1990 from JRMeyer/evaluate_tflite
Update evaluate_tflite.py script for Coqpit
2021-10-30 10:18:56 +02:00
Reuben Morais
65e66117e2
Merge pull request #1998 from coqui-ai/aar-pack-deps
Package dynamic deps in AAR
2021-10-29 20:48:43 +02:00
Reuben Morais
a726351341 Bump Windows TF build cache due to worker upgrade 2021-10-29 20:05:22 +02:00
Reuben Morais
d753431d11 Fix build on Windows after internal GitHub Actions MSYS2 changes 2021-10-29 20:05:22 +02:00
Reuben Morais
83b40b2532 Rehost PCRE package to avoid external outages interrupting CI 2021-10-25 11:03:19 +02:00
Reuben Morais
1f7b43f94e Package libkenlm.so, libtensorflowlite.so and libtflitedelegates.so in AAR 2021-10-25 11:03:19 +02:00
Reuben Morais
5ff8d11393 Use export beam width by default in evaluation 2021-10-13 13:36:30 +02:00
Josh Meyer
157ce340b6 Update evaluate_tflite.py script for Coqpit 2021-10-07 14:46:03 -04:00
Reuben Morais
27584037f8 Bump version to v1.0.0 2021-10-04 16:30:39 +02:00
Reuben Morais
29e980473f Docs changes for 1.0.0 2021-10-04 16:30:39 +02:00
Reuben Morais
0b36745338 Bump version to v0.10.0-alpha.29 2021-10-02 14:18:39 +02:00
Reuben Morais
c6a91dad2a Fix permissions for Docker push, tagging of prereleases 2021-10-02 14:18:26 +02:00
Reuben Morais
1233fc7b71 Bump version to v0.10.0-alpha.28 2021-10-02 13:47:04 +02:00
Reuben Morais
bd45ecf56e Centralized handling of git tag/VERSION checks 2021-10-02 13:46:52 +02:00
Reuben Morais
a4faa4475a Bump version to v0.10.0-alpha.27 2021-10-02 13:40:28 +02:00
Reuben Morais
18812376dc
Merge pull request #1981 from coqui-ai/aar-publish
AAR build+publish
2021-10-02 13:39:28 +02:00
Reuben Morais
62effd9acb AAR build+publish 2021-10-02 13:38:46 +02:00
Reuben Morais
8a64ed2a1e Bump version to v0.10.0-alpha.26 2021-09-28 13:58:00 +02:00
Reuben Morais
178cdacf5e
Merge pull request #1980 from coqui-ai/docker-kenlm-base
Build KenLM in same base as final image
2021-09-28 13:57:17 +02:00
Reuben Morais
0b60e4dbbb Build KenLM in same base as final image 2021-09-28 13:38:11 +02:00
Leon Kiefer
fab1bbad73
fixed duplicate deallocation of stream
streamCtx must be unset after STT_FreeStream was called in STT_FinishStreamWithMetadata, otherwise STT_FreeStream is called again on destruction of STTStream, resulting in EXC_BAD_ACCESS errors
2021-09-26 12:56:28 +02:00
Reuben Morais
5691d4e053
Merge pull request #1975 from coqui-ai/android-builds
Android builds
2021-09-22 13:09:26 +02:00
Reuben Morais
c536d1bd01 Add Android build tasks 2021-09-22 11:55:19 +02:00
Reuben Morais
d4091badf9 Rename host-build action to libstt-build 2021-09-21 18:09:51 +02:00
Reuben Morais
1d75af5ab4 Fix and improve build instructions for Android and RPi 2021-09-21 18:09:51 +02:00
Reuben Morais
8bd5dac837 Declare delegate dependencies on Android 2021-09-21 18:09:51 +02:00
Josh Meyer
46bae2f3fc
Fix typo in link to colab 2021-09-16 06:59:45 -04:00
Josh Meyer
df67678220
Merge pull request #1966 from JRMeyer/cv-notebook
Python notebook for training on Common Voice
2021-09-16 06:51:30 -04:00
Reuben Morais
fd719ac013
Merge pull request #1968 from coqui-ai/rehost-sox-win
Rehost SoX Windows package to avoid Sourceforge outages
2021-09-16 11:58:17 +02:00
Reuben Morais
4861557a03 Rehost SoX Windows package to avoid Sourceforge outages 2021-09-16 11:22:32 +02:00
Reuben Morais
835d657648
Merge pull request #1967 from coqui-ai/batch-shuffling
Add support for shuffling batches after N epochs (Fixes #1901)
2021-09-16 11:05:06 +02:00
Reuben Morais
72599be9d4 Add support for shuffling batches after N epochs 2021-09-16 10:40:27 +02:00
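As an illustration of the batch-shuffling idea referenced in the commit above (not the project's actual implementation; the function and parameter names below are hypothetical), keeping the original batch order for the first N epochs and reshuffling afterwards can be sketched as:

```python
import random

def order_batches(batches, epoch, shuffle_after_epoch):
    # Keep the original batch order until `shuffle_after_epoch`,
    # then return a freshly shuffled copy each epoch.
    out = list(batches)
    if shuffle_after_epoch >= 0 and epoch >= shuffle_after_epoch:
        random.shuffle(out)
    return out
```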
Josh Meyer
7cbe879fc6 Use python 3.7, not 3.8 2021-09-16 03:36:32 -04:00
Josh Meyer
8cfc1163e2 Add checkout action 2021-09-16 03:28:11 -04:00
Josh Meyer
c78f98a7bc Add separate job to CI for notebook tests 2021-09-16 03:19:09 -04:00
Josh Meyer
90d4e43c58
Use sudo for installing opus things 2021-09-15 12:22:05 -04:00
Josh Meyer
be7500c8b7
Fix Typo 2021-09-15 12:04:34 -04:00
Josh Meyer
1a55ce8078
Add missing opus tools to CI 2021-09-15 12:03:30 -04:00
Josh Meyer
242d2eff2c
Add missing jupyter install in CI 2021-09-15 10:09:05 -04:00
Josh Meyer
56d1282642 Merge branch 'cv-notebook' of github.com:JRMeyer/STT into cv-notebook 2021-09-15 09:20:16 -04:00
Josh Meyer
bd7809421d Add notebooks to CI workflow 2021-09-15 09:19:46 -04:00
Josh Meyer
5e1e810102
Merge branch 'coqui-ai:main' into cv-notebook 2021-09-15 09:09:00 -04:00
Josh Meyer
6405bd1758 Add CI tests for notebooks 2021-09-15 09:08:12 -04:00
Josh Meyer
8a3cea8b6d Cosmetic changes 2021-09-15 07:57:53 -04:00
Josh Meyer
f6a64e7dd8 Typo 2021-09-15 07:19:35 -04:00
Josh Meyer
cbd3db9d28 Cosmetic notebook changes 2021-09-15 07:16:08 -04:00
Josh Meyer
2729da33a8 More notebook work 2021-09-15 06:54:25 -04:00
Reuben Morais
feeb2a222d Bump version to v0.10.0-alpha.25 2021-09-15 11:09:54 +02:00
Reuben Morais
76267ebdff Rename libstt and native_client archives when publishing on GitHub Actions 2021-09-15 11:08:20 +02:00
Josh Meyer
0e8920ed63
Use table to organize notebooks 2021-09-15 04:09:00 -04:00
Josh Meyer
903c2b4aca
Install STT from pypi in notebook 2021-09-15 04:03:27 -04:00
Josh Meyer
5201c2a10c
Install STT from pypi in notebook 2021-09-15 04:02:08 -04:00
Josh Meyer
7085fd3ed3 Add notebook for CV 2021-09-14 11:50:35 -04:00
Reuben Morais
ef8825f5f6 Bump version to v0.10.0-alpha.24 2021-09-14 13:06:22 +02:00
Reuben Morais
4744d0c9e4 Separate brace expansion into two upload calls 2021-09-14 13:05:51 +02:00
Reuben Morais
93e743d171 Bump version to v0.10.0-alpha.23 2021-09-14 12:47:36 +02:00
Reuben Morais
e0e5b0391c Don't overwrite asset_name for multiple files 2021-09-14 12:47:25 +02:00
Reuben Morais
7a20c9bd90 Bump version to v0.10.0-alpha.22 2021-09-14 12:34:10 +02:00
Reuben Morais
473d1a8e4f Fix filename when uploading multiple assets, upload training package 2021-09-14 12:33:54 +02:00
Reuben Morais
92aff6a8ef Bump version to v0.10.0-alpha.21 2021-09-14 12:22:10 +02:00
Reuben Morais
220cc8ab15 Fix escaping of variable when creating new release 2021-09-14 12:21:56 +02:00
Reuben Morais
39e57b522a Bump version to v0.10.0-alpha.20 2021-09-14 12:15:29 +02:00
Reuben Morais
2e588bd0b8 Fix GitHub upload logic for multiple assets 2021-09-14 12:15:19 +02:00
Reuben Morais
abc0399fdb Bump version to v0.10.0-alpha.19 2021-09-14 11:47:34 +02:00
Reuben Morais
26b578c1c7 Checkout source for upload-release-asset action 2021-09-14 11:47:10 +02:00
Reuben Morais
6f4a3c1200 Bump version to v0.10.0-alpha.18 2021-09-14 09:48:58 +02:00
Reuben Morais
e8a5e91151 Fix syntax errors in tag scripts 2021-09-14 09:41:35 +02:00
Reuben Morais
810164d679 Bump version to v0.10.0-alpha.16 2021-09-13 18:35:03 +02:00
Reuben Morais
aed43cc988 Fix syntax error in GitHub Release asset upload task 2021-09-13 18:34:48 +02:00
Reuben Morais
d437ecc69f Bump version to v0.10.0-alpha.15 2021-09-13 17:53:35 +02:00
Josh Meyer
ba581501f4
Merge pull request #1965 from JRMeyer/notebooks
Fix notebook syntax after train.py was split into parts
2021-09-13 09:31:14 -04:00
Josh Meyer
638874e925 Fix notebook syntax after train.py was split into parts 2021-09-13 09:29:55 -04:00
Reuben Morais
b5e8ebb943 Fix quickstart docs [skip ci] 2021-09-13 13:08:14 +02:00
Reuben Morais
822019bf05
Merge pull request #1961 from coqui-ai/upload-wheels-release
Upload built artifacts to GitHub releases
2021-09-09 18:50:39 +02:00
Reuben Morais
01c992caef Upload built artifacts to GitHub releases 2021-09-09 18:13:18 +02:00
Reuben Morais
97a2cb21ee
Merge pull request #1960 from coqui-ai/fix-dockerfile-build
Fix Dockerfile.build build after TFLite changes
2021-09-08 12:17:01 +02:00
Reuben Morais
e6d5a0ca8d Fix linter error [skip ci] 2021-09-08 12:16:25 +02:00
Reuben Morais
738874fb6f Fix Dockerfile.build build after TFLite changes 2021-09-08 12:00:11 +02:00
Reuben Morais
28f107fb96
Merge pull request #1956 from jeremiahrose/build-local-source
Fix #1955 Use local source instead of redownloading in Dockerfile.build
2021-09-08 11:11:39 +02:00
Jeremiah Rose
0917206827 Update Dockerfile.build documentation in DEPLOYMENT.rst 2021-09-08 10:20:24 +10:00
Reuben Morais
909b343ce0 Fix header logo scaling 2021-09-07 22:03:09 +02:00
Reuben Morais
a51cc78a3b git add missing logo image [skip ci] 2021-09-07 18:30:48 +02:00
Reuben Morais
083a9e1ecc Add logo and wordmark to docs [skip ci] 2021-09-07 18:28:52 +02:00
Reuben Morais
6635668eb3
Merge pull request #1951 from coqui-ai/docs-pass
Documentation cleanup pass to match recent changes
2021-09-07 10:15:46 +02:00
Jeremiah Rose
d85187aa44 Use local source instead of redownloading in Dockerfile.build 2021-09-07 09:45:31 +10:00
Reuben Morais
186bb63b57 Documentation cleanup pass to match recent changes 2021-08-27 14:24:23 +02:00
Reuben Morais
6214816e26 Merge branch 'publish-training-code' (Fixes #1950) 2021-08-27 13:12:14 +02:00
Reuben Morais
eb19d271fd Publish training package on PyPI 2021-08-27 13:11:27 +02:00
Reuben Morais
f94d16bcc3
Merge pull request #1948 from coqui-ai/remove-exception-box
Remove ExceptionBox and remember_exception
2021-08-26 21:13:18 +02:00
Reuben Morais
33c2190015 Remove ExceptionBox and remember_exception
TensorFlow already handles surfacing dataset exceptions internally.
2021-08-26 19:58:17 +02:00
Reuben Morais
497c828dd7
Merge pull request #1947 from coqui-ai/dataset-split
Automatic dataset split/alphabet generation
2021-08-26 19:45:06 +02:00
Reuben Morais
412de47623 Introduce --auto_input_dataset flag for input formatting
Automatically split data into sets and generate alphabet.
2021-08-26 18:03:32 +02:00
Reuben Morais
8458352255 Disable side-effects when importing train/evaluate scripts 2021-08-26 15:24:11 +02:00
Reuben Morais
b62fa678e6 Remove dead code 2021-08-26 12:00:15 +02:00
Reuben Morais
07ed417627 Bump version to v0.10.0-alpha.14 2021-08-26 10:57:27 +02:00
Reuben Morais
66b8a56454
Merge pull request #1945 from coqui-ai/alphabet-loading-generation
Convenience features for alphabet loading/saving/generation
2021-08-25 20:35:09 +02:00
Reuben Morais
02adea2d50 Generate and save alphabet automatically if dataset is fully specified 2021-08-25 19:39:05 +02:00
Reuben Morais
2b5a844c05 Load alphabet alongside checkpoint if present, some config fixes/cleanup 2021-08-25 19:39:03 +02:00
Reuben Morais
87f0a371b1 Serialize alphabet alongside checkpoint 2021-08-25 19:38:30 +02:00
Reuben Morais
5afe3c6e59
Merge pull request #1946 from coqui-ai/training-submodules
Split train.py into separate modules
2021-08-25 19:37:53 +02:00
Reuben Morais
2fd98de56f Split train.py into separate modules
Currently train.py is overloaded with many independent features.
Understanding the code and what the result of a training call will
be requires untangling the entire script. It's also an error-prone
UX. This is a first step toward separating independent parts into
their own scripts.
2021-08-25 18:57:30 +02:00
Reuben Morais
71da178138
Merge pull request #1942 from coqui-ai/nc-api-boundary
Python training API cleanup, mark nodes known by native client
2021-08-23 12:57:08 +02:00
Reuben Morais
3dff38ab3d Point to newer native_client build with lower glibc requirement [skip ci] 2021-08-20 16:35:25 +02:00
Reuben Morais
80a109b04e Pin Advanced Training Topics to docs sidebar [skip ci] (Fixes #1893) 2021-08-19 18:51:19 +02:00
Reuben Morais
fb2691ad70 Fix link to training with CV data in Playbook [skip ci] (Fixes #1932) 2021-08-19 18:44:58 +02:00
Reuben Morais
4c3537952a Fix lm_optimizer.py to use new Config/flags/logging setup 2021-08-19 18:42:07 +02:00
Reuben Morais
f9556d2236 Add comments marking nodes with names/shapes known by native client 2021-08-19 18:33:48 +02:00
Reuben Morais
f90408d3ab Move early_training_checks to train function 2021-08-19 18:33:32 +02:00
Reuben Morais
ad7335db0e Fix docs code listing for flags [skip ci] 2021-08-19 18:25:08 +02:00
Reuben Morais
392f4dbb25 Merge branch 'downgrade-docker-train-base' (Fixes #1941) 2021-08-19 18:22:28 +02:00
Reuben Morais
3995ec62c5 Bump Windows TF build cache due to upgraded MSVC 2021-08-19 18:22:18 +02:00
Reuben Morais
2936c72c08 Build and publish Docker train image on tag 2021-08-19 18:22:18 +02:00
Reuben Morais
32b44c5447 Downgrade training Dockerfile base image to one that has TFLite support
See https://github.com/NVIDIA/tensorflow/issues/16
2021-08-19 11:27:37 +02:00
Reuben Morais
4fc60bf5e9
Merge pull request #1938 from coqui-ai/non-quantized-export
Non quantized export + Better error message on missing alphabet
2021-08-12 11:29:21 +02:00
Reuben Morais
f71e32735f Add a more explicit error message when alphabet is not specified 2021-08-06 16:55:42 +02:00
Reuben Morais
3cff3dd0de Add an --export_quantize flag to control TFLite export quantization 2021-08-06 16:52:55 +02:00
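For context on what a flag like `--export_quantize` typically toggles, here is an illustrative sketch only, assuming TF 1.15 as pinned elsewhere in this history; `export_tflite` and its arguments are hypothetical and not the project's actual export code:

```python
import tensorflow as tf  # assumes TF 1.15.x

def export_tflite(graph_def_file, input_arrays, output_arrays, quantize=True):
    # Convert a frozen graph to TFLite; post-training quantization is opt-in.
    converter = tf.lite.TFLiteConverter.from_frozen_graph(
        graph_def_file, input_arrays, output_arrays
    )
    if quantize:
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()
```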
Reuben Morais
285b524299
Merge pull request #1931 from coqui-ai/pull_request_template 2021-08-03 15:28:45 +02:00
Reuben Morais
c3cc7aae2e Bump version to v0.10.0-alpha.13 2021-08-02 21:08:00 +02:00
Reuben Morais
b5db9b2f41 Merge branch 'npm-publish' (Fixes #1930) 2021-08-02 21:07:28 +02:00
Reuben Morais
f3df9b16d5 Publish Node package on npmjs.com on tags 2021-08-02 20:42:12 +02:00
kdavis-coqui
d2bcbcc6b7 Added CLA info to pull request template 2021-08-02 17:45:19 +02:00
Reuben Morais
800ddae12f Bump version to v0.10.0-alpha.12 2021-08-01 23:56:46 +02:00
Reuben Morais
5a5db45c7e
Merge pull request #1923 from coqui-ai/tf-libstt-manylinux
Build TensorFlow+libstt+Python packages in manylinux_2_24 containers
2021-08-01 23:56:23 +02:00
Reuben Morais
9d44e2f506 Disable wrapping of struct ctors to workaround NodeJS 16.6 ABI break 2021-08-01 23:25:23 +02:00
Reuben Morais
1a423a4c8d Force plat name in favor of auditwheel for Python packages
Auditwheel can't properly deal with the shared libraries we bundle
and ends up copying some of them, leaving the package with
duplicated images.
2021-08-01 23:00:57 +02:00
Reuben Morais
8f0b759103 Build TensorFlow+libstt+Py pkgs on manylinux_2_24 2021-08-01 23:00:57 +02:00
Josh Meyer
90a067df49
Merge pull request #1926 from JRMeyer/progressbar-to-tqdm
Change progressbar to tqdm
2021-07-30 17:18:47 -04:00
Josh Meyer
b77d33a108
Merge pull request #1927 from JRMeyer/tfv1-moving
Move tfv1 calls inside high-level functions
2021-07-30 17:18:25 -04:00
Josh Meyer
256af35a61 Move tfv1 calls inside high-level functions 2021-07-30 13:25:36 -04:00
Josh Meyer
da23122cca Add SIMPLE_BAR for other scripts 2021-07-30 13:09:14 -04:00
Josh Meyer
fb2d99e9e0 Change progressbar to tqdm 2021-07-30 12:52:09 -04:00
Josh Meyer
df26eca4d2
Merge pull request #1920 from JRMeyer/transfer-learning-notebook
Fix config checkpoint handling and add notebook
2021-07-30 10:15:12 -04:00
Josh Meyer
1e79b8703d checkpoint_dir always overrides {save,load}_checkpoint_dir 2021-07-30 07:57:11 -04:00
Josh Meyer
979dccdf58 Fix typo 2021-07-30 06:48:33 -04:00
Josh Meyer
aeeb2549b1 Install after git clone 2021-07-30 06:48:33 -04:00
Josh Meyer
6c26f616ba Fix typo in notebooks 2021-07-30 06:48:33 -04:00
Josh Meyer
4eb5d7814a Fix typo in notebooks 2021-07-30 06:48:33 -04:00
Josh Meyer
bc1839baf4 Add cell to install STT in notebooks 2021-07-30 06:48:33 -04:00
Josh Meyer
4f14420a25 Fix broken colab button 2021-07-30 06:48:33 -04:00
Josh Meyer
f4f0c1dba9 Add README with Colab links 2021-07-30 06:48:33 -04:00
Josh Meyer
e95c8fe0b0 Fix typo in notebook 2021-07-30 06:48:33 -04:00
Josh Meyer
73f7b765ef Fix config issue and add notebook 2021-07-30 06:48:33 -04:00
Reuben Morais
8c5c35a0ad Bump version to v0.10.0-alpha.11 2021-07-29 17:45:16 +02:00
Reuben Morais
58a8e813e4
Merge pull request #1921 from coqui-ai/tflite-only
Remove full TF backend from native client and CI
2021-07-29 17:44:38 +02:00
Reuben Morais
d119957586 Update supported architectures doc 2021-07-29 16:45:30 +02:00
Reuben Morais
c69735e3b6 Build swig and decoder on manylinux_2_24 2021-07-29 16:45:30 +02:00
Reuben Morais
42ebbf9120 Remove full TF backend 2021-07-28 17:19:27 +02:00
Reuben Morais
2020f1b15a Undo accidental auth removal in check_artifact_exists action 2021-07-28 17:19:27 +02:00
Josh Meyer
4cd1a1cec4
Merge pull request #1919 from coqui-ai/JRMeyer-linter
Exclude smoke testing data from linter
2021-07-27 10:14:40 -04:00
Josh Meyer
c9840e59b1
Exclude smoke testing data from linter 2021-07-27 08:06:02 -04:00
Josh Meyer
4b4f00da56
Merge pull request #1918 from coqui-ai/JRMeyer-alphabet-patch
Add missing space to russian sample alphabet
2021-07-27 08:01:11 -04:00
Josh Meyer
1c7539e9c9
Add missing space to russian sample alphabet 2021-07-27 05:40:20 -04:00
Reuben Morais
4b2af9ce6b
Merge pull request #1909 from coqui-ai/kenlm-dynamic
Dynamically link KenLM and distribute with packages
2021-07-27 00:36:15 +02:00
Reuben Morais
3a695f9c1c Fix packaging and linking of libkenlm on Windows 2021-07-26 19:30:25 +02:00
Reuben Morais
36923c1e93 Enable -fexceptions in decoder builds 2021-07-26 12:50:53 +02:00
Reuben Morais
8a40ff086d Force link against KenLM on Windows 2021-07-26 12:43:49 +02:00
Reuben Morais
cbbdcbf246 Revert factoring out of decoder library build definition 2021-07-26 12:43:49 +02:00
Reuben Morais
7846f4602e Export needed KenLM symbols manually for Windows 2021-07-26 12:42:59 +02:00
Reuben Morais
b7428d114e Dynamically link KenLM and distribute with packages 2021-07-26 12:41:46 +02:00
Reuben Morais
579c36c98c
Merge pull request #1912 from JRMeyer/jupyter
Add Jupyter notebook and Dockerfile with Jupyter
2021-07-24 20:33:06 +02:00
Josh Meyer
3119911657 Next core Coqui STT docker image will have notebooks dir 2021-07-23 12:17:55 -04:00
Josh Meyer
7d40d5d686 Specify latest for base Coqui STT docker image 2021-07-23 12:16:26 -04:00
Josh Meyer
ea82ab4cb8 Remove old unneeded files 2021-07-23 12:15:12 -04:00
Josh Meyer
9f7fda14cb Add first Jupyter notebook 2021-07-23 12:12:02 -04:00
Josh Meyer
d1b0aadfbc
Merge pull request #1910 from JRMeyer/main
Bump VERSION to 0.10.0-alpha.10
2021-07-22 10:31:59 -04:00
Josh Meyer
9b5176321d Bump VERSION to 0.10.0-alpha.10 2021-07-22 10:30:14 -04:00
Josh Meyer
c19faeb5d0
Merge pull request #1908 from JRMeyer/config-logic 2021-07-22 08:15:14 -04:00
Josh Meyer
b4827fa462 Formatting changes from pre-commit 2021-07-22 05:39:45 -04:00
Josh Meyer
b6d40a3451 Add CI test for in-script variable setting 2021-07-21 15:35:33 -04:00
Josh Meyer
414748f1fe Remove extra imports 2021-07-21 15:23:56 -04:00
Josh Meyer
ec37b3324a Add example python script with initialize_globals_from_args() 2021-07-21 15:13:39 -04:00
Josh Meyer
ae9280ef1a Cleaner lines for CI args 2021-07-21 12:09:43 -04:00
Josh Meyer
a050b076cb Cleaner lines for CI args 2021-07-21 12:07:24 -04:00
Josh Meyer
3438dd2beb Fix checkpoint setting logic 2021-07-21 11:53:11 -04:00
Josh Meyer
90ce16fa15 Shortening some print statements 2021-07-21 09:37:32 -04:00
Josh Meyer
6da7b5fc26 Raise error when alphabet and bytes_mode both specified 2021-07-21 08:44:57 -04:00
Josh Meyer
0389560a92 Remove alphabet.txt from CI tests with bytes_output_mode 2021-07-21 07:16:58 -04:00
Josh Meyer
4342906c50 Better file_size handling 2021-07-21 06:37:25 -04:00
Josh Meyer
f6bd7bcf7d Handle file_size passed as int 2021-07-21 06:23:19 -04:00
Josh Meyer
4dc565beca Move checking logic into __post_init__() 2021-07-21 05:00:05 -04:00
Josh Meyer
5b4fa27467 Add required alphabet path to CI tests 2021-07-20 09:50:47 -04:00
Josh Meyer
920e92d68a Remove check_values and default alphabet 2021-07-20 09:34:44 -04:00
Josh Meyer
59e32556a4 Currently working notebook 2021-07-20 09:07:54 -04:00
Josh Meyer
848a612efe Import _SttConfig 2021-07-20 08:41:22 -04:00
Josh Meyer
afbcc01369 Break out config instantiation and setting 2021-07-20 08:13:07 -04:00
Josh Meyer
a37ca2ec27 Simplify Dockerfile and add notebook 2021-07-20 04:20:57 -04:00
Josh Meyer
d0f8eb96cd Take out OVH run-time params 2021-07-16 11:52:51 -04:00
Josh Meyer
649bc53536 Remove extra installs from Dockerfile 2021-07-16 11:14:00 -04:00
Josh Meyer
ef5d472b29
Merge pull request #1894 from JRMeyer/dockerfile
Use multi-stage build process for Dockerfile.train
2021-07-16 10:05:33 -04:00
Josh Meyer
2c26497d96 Cleanup for smaller containers 2021-07-16 09:32:19 -04:00
Josh Meyer
f062f75e17 working on dockerfile with jupyter support 2021-07-16 09:24:00 -04:00
Reuben Morais
ba24f010eb Bump version to v0.10.0-alpha.9 2021-07-15 20:56:48 +02:00
Reuben Morais
b7380a6928 Comment out publishing of armv7 and aarch64 wheels 2021-07-15 20:56:28 +02:00
Reuben Morais
b12aa69922 Bump version to v0.10.0-alpha.8 2021-07-15 18:32:17 +02:00
Reuben Morais
5ded871a5e Separate PyPI publish jobs per API token and publish decoder package 2021-07-15 18:31:45 +02:00
Reuben Morais
550f5368a8
Merge pull request #1902 from coqui-ai/upload-tf-cache-to-release
Save build cache as release asset instead of artifact
2021-07-15 18:20:20 +02:00
Reuben Morais
460209d209 Pick up MSVC version automatically to handle worker upgrades cleanly 2021-07-15 16:27:37 +02:00
Josh Meyer
99fa146253 Fix wording in error message 2021-07-15 09:54:54 -04:00
Josh Meyer
52ecb5dbe2 Remove unneeded assets and only copy kenlm bins 2021-07-15 09:53:04 -04:00
Reuben Morais
d6da5191f5
Merge pull request #1903 from JRMeyer/minor-branding 2021-07-15 12:30:45 +02:00
Josh Meyer
a29db11a62 Change DS to STT in top python scripts 2021-07-15 06:26:06 -04:00
Reuben Morais
f026c75dae Setup MSVC env in Win TF build job 2021-07-15 11:06:44 +02:00
Reuben Morais
ed171b2efd Save build cache as release asset instead of artifact 2021-07-15 10:10:24 +02:00
Reuben Morais
2d9cbb2f06 Fix PYTHON_BIN_PATH in Windows build 2021-07-14 14:13:15 +02:00
Reuben Morais
283379775e Bump version to v0.10.0-alpha.7 2021-07-13 18:40:12 +02:00
Reuben Morais
432ca99db1 Add job to publish Python artifacts to PyPI on tag pushes 2021-07-13 18:40:12 +02:00
Reuben Morais
f1c0559406 Exclude a few more jobs from non-PR triggers 2021-07-13 18:25:58 +02:00
Reuben Morais
7b7f52f44c Enable build-and-test workflow on tags 2021-07-13 18:25:58 +02:00
Josh Meyer
8c65cbf064
Add missing Sox library for processing MP3 data 2021-06-14 16:09:26 -04:00
Josh Meyer
0385dfb5aa Fix broken multi-line error message 2021-06-14 10:32:38 -04:00
Josh Meyer
d3b337af09 Fixed error print statement 2021-06-14 07:20:34 -04:00
Josh Meyer
75fbd0ca30 Error message when KenLM build fails 2021-06-14 05:04:01 -04:00
Josh Meyer
769b310919 Use multistage building in dockerfile 2021-06-11 14:36:23 -04:00
Josh Meyer
6f2c7a8a7b
Merge pull request #1892 from JRMeyer/update-kenlm
Update kenlm submodule
2021-06-11 08:14:27 -04:00
Josh Meyer
806a16d1c0 Update kenlm submodule 2021-06-11 08:09:53 -04:00
Reuben Morais
866e15af7f Comment out isort pre-commit hook until we can figure out discrepancies between macOS and Linux 2021-06-10 16:57:57 +02:00
Reuben Morais
f2a21b2258
Merge pull request #1890 from JRMeyer/pre-commit-hook-changes
Add changes from pre-commit hook
2021-06-10 16:56:45 +02:00
Josh Meyer
9252cef138 Add changes from pre-commit hook 2021-06-10 10:49:54 -04:00
Reuben Morais
2e5efe5e15
Merge pull request #1889 from JRMeyer/dockerfile
Use NVIDIA image in Dockerfile
2021-06-10 16:01:52 +02:00
Josh Meyer
38e06e4635 Use NVIDIA image in Dockerfile 2021-06-10 09:47:15 -04:00
Reuben Morais
bf07f35420
Merge pull request #1887 from JRMeyer/dockerfile
Add dependencies for opus training
2021-06-10 00:00:21 +02:00
Josh Meyer
eba8e1ad4a Add dependencies for opus training 2021-06-09 14:02:07 -04:00
Reuben Morais
4ebcbea8b3
Merge pull request #1874 from erksch/ios-deployment-target-9.0
Set iOS deployment target to 9.0
2021-05-25 16:31:16 +02:00
Reuben Morais
a2515397cf
Merge pull request #1876 from erksch/rename-ios-framework
Change static ios framework name to stt_ios from coqui_stt_ios
2021-05-25 16:30:43 +02:00
Reuben Morais
0a38b72e34
Merge pull request #1877 from erksch/remove-libstt-from-ios-test-project
Remove libstt.so reference from stt_ios_test project
2021-05-25 16:30:13 +02:00
Erik Ziegler
d69c15db1a Remove libstt.so reference from stt_ios_test project 2021-05-24 17:29:36 +02:00
Erik Ziegler
ced136c657 Change static ios framework name to stt_ios from coqui_stt_ios 2021-05-24 17:22:55 +02:00
Erik Ziegler
b2fee574d8
Set iOS deployment target to 9.0 2021-05-23 20:59:09 +02:00
Reuben Morais
1bf8058379
Merge pull request #1871 from coqui-ai/training-tests
Training tests
2021-05-21 14:22:12 +02:00
Reuben Morais
f9ecf8370e Training unittests and lint check 2021-05-21 13:17:05 +02:00
Reuben Morais
3f17bba229 Training tests 2021-05-21 13:17:05 +02:00
Reuben Morais
5ba1e4d969 Remove broken TrainingSpeech importer temporarily
During the fork the archive URL was broken and nobody has mentioned it since.
Additionally the dependency on Unidecode (GPL) complicates licensing.

Removing it for now until both points are fixed.
2021-05-20 17:02:39 +02:00
Reuben Morais
debd1d9495
Merge pull request #1866 from coqui-ai/coqpit-config
Switch flag/config handling to Coqpit
2021-05-20 14:38:42 +02:00
Reuben Morais
eab6d3f5d9 Break dependency cycle between augmentation and config 2021-05-19 20:19:36 +02:00
Reuben Morais
d83630fef4 Print fully parsed augmentation config 2021-05-19 20:19:36 +02:00
Reuben Morais
5114362f6d Fix regression caused by PR #1868 2021-05-19 20:19:36 +02:00
Reuben Morais
5ad6e6abbf Switch flag/config handling to Coqpit 2021-05-19 20:19:36 +02:00
Reuben Morais
fb826f714d
Merge pull request #1828 from coqui-ai/global-cleanup
Run pre-commit hooks on all files
2021-05-18 13:47:20 +02:00
Reuben Morais
43a6c3e62a Run pre-commit hooks on all files 2021-05-18 13:45:52 +02:00
Reuben Morais
14aee5d35b Reintroduce excludes to pre-commit-hook 2021-05-18 13:45:09 +02:00
Josh Meyer
ac2bbd6a79
Merge pull request #1868 from JRMeyer/data-augmentation-cleaning
Add logging and clean up some augmentation code
2021-05-18 07:05:12 -04:00
Josh Meyer
7bec52c55d More compact return statement 2021-05-18 06:47:32 -04:00
Josh Meyer
9a708328e7 Use class name and review cleanup 2021-05-18 06:12:56 -04:00
Reuben Morais
d2c5f979ce Update pre-commit setup 2021-05-18 11:46:53 +02:00
Josh Meyer
f19ecbdd93 Add logging for augmentation and more transparent syntax 2021-05-17 10:05:49 -04:00
Josh Meyer
b793aa53bb
Merge pull request #1867 from ftyers/patch-2
Fix typo in augmentations.py
2021-05-13 11:36:55 -04:00
Francis Tyers
37cc7f2312
Update augmentations.py
Looks like `clock_to` got changed to `final_clock` but this one was missed.
2021-05-13 16:11:46 +01:00
Reuben Morais
8d62a6e154 Add some clarifying comments on building SWIG from source 2021-05-05 17:04:10 +02:00
Reuben Morais
b78894d7ab
Merge pull request #1864 from NanoNabla/pr_docu_ppc64_swig
Stop pointing people to a fork for docs on building SWIG
2021-05-05 16:59:08 +02:00
NanoNabla
af35faf67e Fix docs, use official SWIG release on ppc64le 2021-05-05 16:46:50 +02:00
Reuben Morais
397c351fa7
Merge pull request #1863 from IlnarSelimcan/patch-1
[docs playbook] fix a typo in TRAINING.md
2021-05-04 22:47:41 +02:00
Ilnar Salimzianov
c235841871
[docs playbook] fix a typo in TRAINING.md 2021-05-04 22:50:22 +03:00
Reuben Morais
ce71ec0c89 Include missing changes in MLS English importer 2021-05-04 19:06:18 +02:00
Reuben Morais
ad4025af7d Merge branch 'build-decoder-push' (Closes #1860) 2021-05-03 16:59:19 +02:00
Reuben Morais
3dcd56145c Expand build matrix of decoder package, build on push 2021-05-03 15:00:17 +02:00
Reuben Morais
1f3d2dab4c Bump version to v0.10.0-alpha.6 2021-04-30 14:01:16 +02:00
Reuben Morais
3d2ab809ee
Merge pull request #1858 from coqui-ai/ci-updates
CI: Add ARM build and tests / Add NodeJS 16.0.0
2021-04-30 13:59:54 +02:00
Alexandre Lissy
a4d5d14304 Add NodeJS 16.0.0 2021-04-30 12:14:18 +02:00
Alexandre Lissy
1eec25a9ab CI: Linux ARMv7 / Aarch64 2021-04-30 11:05:05 +02:00
Reuben Morais
f147c78a97
Merge pull request #1856 from coqui-ai/decoder-rename
Rename decoder package to coqui_stt_ctcdecoder
2021-04-28 12:56:54 +02:00
Reuben Morais
36e0223c07 Bump version to v0.10.0-alpha.5 2021-04-27 19:46:27 +02:00
Reuben Morais
c952ee0b0d Rename decoder package to coqui_stt_ctcdecoder 2021-04-27 19:46:12 +02:00
Reuben Morais
e5aff105d4
Merge pull request #1855 from coqui-ai/linux-ci
Linux CI base
2021-04-27 19:21:51 +02:00
Reuben Morais
5ddd7e0fa2 Try to reduce API calls in check_artifact_exists 2021-04-27 14:58:14 +02:00
Reuben Morais
93128cae5f Address review comments 2021-04-27 14:58:14 +02:00
Reuben Morais
01b5a79c5c Linux CI scripts fixes 2021-04-27 14:58:14 +02:00
Reuben Morais
3f85c1d8da Linux base CI 2021-04-27 14:58:14 +02:00
Reuben Morais
a0914d8915 Improve job name 2021-04-27 14:53:07 +02:00
Reuben Morais
46dab53e11 Remove unused SWIG native build job 2021-04-27 14:52:36 +02:00
Reuben Morais
b542e9e469 Update Windows SWIG build job to use caching 2021-04-27 14:44:50 +02:00
Reuben Morais
9639a27929 Run build/test workflow on pushes to main, not master 2021-04-19 14:37:27 +02:00
Reuben Morais
9c7003d77d
Merge pull request #1843 from coqui-ai/windows-ci
Windows base CI
2021-04-19 12:44:12 +02:00
Reuben Morais
59297e526c Windows base CI 2021-04-19 10:48:14 +02:00
Alexandre Lissy
df8d17fc4e Ensure proper termination for ElectronJS and NodeJS 2021-04-18 17:03:16 +02:00
Alexandre Lissy
5558f55701 Use caching for node_modules and headers 2021-04-18 17:03:00 +02:00
Alexandre Lissy
b0c38d5aa9 Remove references to TaskCluster from ci_scripts/ 2021-04-18 17:01:47 +02:00
Alexandre Lissy
d45149b02e NodeJS repackaging 2021-04-18 16:54:00 +02:00
Reuben Morais
5d4941067f Add explicit attribution, description of changes and link to original in playbook 2021-04-15 09:10:38 +02:00
Reuben Morais
09b04a8f83
Merge pull request #1831 from coqui-ai/windows-ci
Windows CI
2021-04-12 14:56:01 +02:00
Reuben Morais
ec271453c1 Ensure upstream Python is used 2021-04-12 14:10:28 +02:00
Reuben Morais
8fe4eb8357 CI rebrand pass 2021-04-12 13:24:54 +02:00
Reuben Morais
7855f0a563 Base Windows CI setup 2021-04-12 12:54:07 +02:00
Reuben Morais
7d017df80c Bump pygments from 2.6.1 to 2.7.4 in doc/requirements.txt 2021-04-12 12:52:59 +02:00
Alexandre Lissy
f5369c8f4b Remove code refs to TaskCluster 2021-04-12 12:51:55 +02:00
Reuben Morais
5b3119ad3f Remove CircleCI setup 2021-04-12 12:44:32 +02:00
Alexandre Lissy
54f232c51a Reduce non multiplatform NodeJS/ElectronJS package tests matrices 2021-04-12 12:43:50 +02:00
Alexandre Lissy
3d96b1d4fd Fix #3549 2021-04-10 18:31:20 +02:00
Kenneth Heafield
b6b8160810 MSVC doesn't like const Proxy operator*() const.
Fixes #308
2021-04-10 18:31:13 +02:00
Alexandre Lissy
c4a4ca2bf8 Fix #3586: NumPy versions 2021-04-10 16:13:53 +02:00
Alexandre Lissy
ef31be2e32 Fix #3593: Limit tests to PR 2021-04-10 16:13:48 +02:00
Alexandre Lissy
3e66adba01 Fix #3578: Re-instate Python TF/TFLite tests on GitHub Actions / macOS 2021-04-10 16:13:42 +02:00
Alexandre Lissy
7168e83ac0 Fix #3590: Move training to macOS 2021-04-10 16:13:33 +02:00
CatalinVoss
51fd6170fa Fix documentation for check_characters.py script 2021-04-10 16:10:57 +02:00
Reuben Morais
449a723bf6 Add missing imports for sample rate normalization 2021-04-06 12:45:32 +02:00
Reuben Morais
39627b282c
Merge pull request #1827 from coqui-ai/reuben-issue-templates-1
Update issue templates
2021-04-06 12:34:54 +02:00
Reuben Morais
5933634db5 Update issue templates 2021-04-06 12:34:25 +02:00
Reuben Morais
4696c3dd0a Create PR template and update issue template 2021-04-06 12:08:40 +02:00
Reuben Morais
4d764c0559 Add importer for English subset of Multilingual LibriSpeech 2021-04-06 11:59:41 +02:00
Reuben Morais
4b9b0743a8 Replace cardboardlinter + pylint setup with pre-commit + black 2021-04-06 11:58:58 +02:00
Reuben Morais
8cdaa18533 Normalize sample rate of dev/test sets to avoid feature computation errors 2021-04-06 11:41:12 +02:00
Reuben Morais
c78af058a5 Merge branch 'playbook-into-docs' 2021-03-30 19:39:27 +02:00
Reuben Morais
c0d068702e Commit non git add'ed dockerignore files 2021-03-30 19:39:10 +02:00
Reuben Morais
0bd653a975 Merge STT playbook into docs 2021-03-30 19:38:31 +02:00
Reuben Morais
a5c950e334 Fix .readthedocs.yml to point at the correct docs requirements file 2021-03-30 18:49:31 +02:00
Josh Meyer
ce0dacd3d2
Add link to generate_scorer_package releases 2021-03-30 10:58:20 -04:00
Reuben Morais
91c5f90f3c Merge pull request #1821 from JRMeyer/docs 2021-03-29 21:34:29 +02:00
Reuben Morais
3409fde4a0 Rename model export metadata flags 2021-03-29 21:06:38 +02:00
Reuben Morais
214a150c19 Fixes for Dockerfile.{train,build} and adjust instructions for new image 2021-03-29 21:05:49 +02:00
Reuben Morais
1029d06a23 Reinstate manylinux1 hack on Python package build 2021-03-29 19:24:11 +02:00
Reuben Morais
c95b89f3c5 Remove dummy workflow 2021-03-27 11:26:42 +01:00
Alexandre Lissy
719050f204 Fix #3581: GitHub Actions test model 2021-03-27 11:24:18 +01:00
Alexandre Lissy
63aeb6a945 Introduce ci_scripts/ for GitHub Actions 2021-03-27 11:24:11 +01:00
Alexandre Lissy
cd80708546 GitHub Actions for macOS 2021-03-27 11:23:59 +01:00
Kathy Reid
654a83a294 Replace remove_remote() method with remove method
Partially resolves #3569
2021-03-27 11:16:53 +01:00
CatalinVoss
c152be2343 Handle mono conversion within pcm_to_np() 2021-03-27 11:16:35 +01:00
CatalinVoss
be5f9627da Don't throw on mono audio any more since everything should work? 2021-03-27 11:16:27 +01:00
CatalinVoss
900a01305c Expose some internal layers for downstream applications 2021-03-27 11:16:19 +01:00
Josh Meyer
653ce25a7c
Merge pull request #1807 from JRMeyer/docs
Overhaul the language model docs + include in ToC
2021-03-24 11:58:49 -04:00
Josh Meyer
04451a681c Overhaul the language model docs + include in ToC 2021-03-24 11:34:28 -04:00
Josh Meyer
cb75dcb419
Merge pull request #1808 from JRMeyer/docs-building
building docs minor changes
2021-03-24 11:10:49 -04:00
Josh Meyer
d5e000427f Reword docs for building binaries + include in ToC 2021-03-24 11:03:03 -04:00
Reuben Morais
b5f72ca4cb
Remove missing feature from list
Raised in https://github.com/coqui-ai/STT/discussions/1814
2021-03-24 10:41:44 +01:00
Eren Gölge
116029aafe
Update README.rst (#1796)
* Update README.rst

* Update README.rst

* Update README.rst

fixes
2021-03-21 14:06:00 +01:00
Reuben Morais
6c9f3a52dc Add empty workflow file to main branch 2021-03-19 13:36:43 +01:00
Reuben Morais
b4e8802aff Switch doc theme to Furo 2021-03-19 10:25:55 +01:00
Reuben Morais
6b9de13ad1 Adjust name of Python package in build system 2021-03-19 10:25:55 +01:00
Josh Meyer
629706b262
Docs welcome page and Development / Inference page overhaul (#1793)
* Docs welcome page and Development / Inference page overhaul

* Address review comments

* Fix broken refs and other small adjustments

Co-authored-by: Reuben Morais <reuben.morais@gmail.com>
2021-03-17 10:14:50 +01:00
Reuben Morais
2d654706ed
Merge pull request #1794 from coqui-ai/erogol-doi-patch
Update README with DOI from Zenodo
2021-03-16 18:56:02 +01:00
Eren Gölge
f024b0ded6
Update README.rst
DOI from ZENODO
2021-03-15 23:29:59 +01:00
Reuben Morais
e64d62631c
Merge pull request #1792 from coqui-ai/erogol-patch-2
Gitter room
2021-03-14 16:42:01 +01:00
Eren Gölge
52a709c807
Update README.rst
gitter link

Note: without sub-def, the next badge wraps to a new line
2021-03-13 17:55:27 +01:00
Josh Meyer
120ff297af
🐸 instead of \:frog\: 2021-03-09 10:41:28 -05:00
Josh Meyer
89d9a53b86
readme: help + community 2021-03-08 12:47:57 -05:00
Kelly Davis
31f3a6a235 Changes for new Linux packages and bump VERSION 2021-03-08 16:55:43 +01:00
Kelly Davis
8a03f4bce5 Note on supported platforms 2021-03-07 19:50:30 +01:00
Kelly Davis
f02c12925a More updates 2021-03-07 19:25:10 +01:00
Josh Meyer
8c95f3ec20
readme 2021-03-07 13:13:41 -05:00
Kelly Davis
4c37313c3d Some leftover references 2021-03-07 14:47:47 +01:00
Kelly Davis
742b44dd2c Merge branch 'rebrand' onto main 2021-03-07 14:42:44 +01:00
Kelly Davis
57adefcc10 More rebranding, submodules, some internal names 2021-03-07 14:41:43 +01:00
Kelly Davis
6d4d1a7153 More rebranding, API names, iOS, .NET 2021-03-07 14:29:02 +01:00
Kelly Davis
136ca35ca2 Contributor covenant badge 2021-03-07 11:37:17 +01:00
Kelly Davis
95f122806e More rebranding, Java package, C++ impl 2021-03-07 11:34:01 +01:00
Kelly Davis
f33f0b382d More rebranding, Python and JS packages verified 2021-03-07 11:14:16 +01:00
Kelly Davis
99fc28a6c7 More rebranding 2021-03-05 16:46:18 +01:00
Kelly Davis
915886b3b7 Main README logo 2021-03-05 12:53:37 +01:00
Kelly Davis
d2009582e9 Rebranding WIP 2021-03-05 12:48:08 +01:00
lissyx
2bb42d4fb1
Merge pull request #3548 from lissyx/doc-net-build
Expose .Net building doc better
2021-03-03 15:45:25 +01:00
Alexandre Lissy
a087509ab7 Expose .Net building doc better 2021-03-03 15:42:31 +01:00
Reuben Morais
8c8b80dc0b
Merge pull request #3546 from dzubke/Iss-3511_split-sets
Fix #3511: split-sets on sample size
2021-03-01 18:09:38 +00:00
Dustin Zubke
6945663698 Fix #3511: split-sets on sample size 2021-02-28 16:09:37 -05:00
lissyx
385c8c769b
Merge pull request #3539 from lissyx/new-swig
Tentative merge of SWIG master
2021-02-25 18:54:58 +01:00
Alexandre Lissy
206b8355b1 Fix #3540: Force NAudio 1.10.0 2021-02-25 17:09:15 +01:00
Alexandre Lissy
fee12be4ff Update SWIG with upstream 4.1-aligned branch 2021-02-25 17:09:15 +01:00
lissyx
7b2eeb6734
Merge pull request #3524 from Ideefixze/master
Added hot-word boosting doc
2021-02-12 20:35:13 +01:00
Ideefixze
7cf257a2f5 Added hot-word boosting api example doc
Comments for API bindings
X-DeepSpeech: NOBUILD
2021-02-12 19:52:19 +01:00
lissyx
cc038c1263
Merge pull request #3527 from zaptrem/master
Fix incompatible Swift module error
2021-02-12 09:34:48 +01:00
zaptrem
9d83e18113 Fix incompatible Swift 2021-02-12 00:08:34 -05:00
lissyx
962a117f7e
Merge pull request #3518 from lissyx/rebuild-swig
Fix #3517: Update SWIG sha1
2021-02-01 16:54:06 +01:00
Alexandre Lissy
6eca9b4e0a Fix #3517: Update SWIG sha1 2021-02-01 16:21:44 +01:00
CatalinVoss
f27908e7e3 Fix copying remote AudioFile target to local 2021-01-26 10:02:59 +00:00
Reuben Morais
efbd6be727 Merge PR #3509 (Use pyyaml.safe_load in tc-decision.py) 2021-01-25 09:53:55 +00:00
lissyx
50c7ac6cf6
Merge pull request #3514 from lissyx/fix-decision-task-master
Set base image to ubuntu 18.04
2021-01-25 10:46:52 +01:00
Anton Yaroshenko
54565a056f Set base image to ubuntu 18.04 2021-01-25 10:41:23 +01:00
Reuben Morais
d7e0e89aed
Merge pull request #3510 from zaptrem/patch-1
Swift iOS Bindings: Expose DeepSpeechTokenMetadata fields
2021-01-22 10:29:50 +00:00
zaptrem
28ddc6b0e0
Expose DeepSpeechTokenMetadata fields
Currently, attempting to access member fields of DeepSpeechTokenMetadata objects output from intermediateDecodeWithMetadata causes a crash. Changing these lines makes the object work as (I assume) intended.
2021-01-22 03:42:08 -05:00
lissyx
93c7d1d5dc
Merge pull request #3508 from tud-zih-tools/docu_unsupported_architecture
Docu building ctc decoder on unsupported architecture
2021-01-21 23:33:09 +01:00
NanoNabla
5873145c8e ARM is not supported for building ctcdecoder 2021-01-21 23:31:07 +01:00
NanoNabla
334f6b1e47 Improve ctcdecode docs for unsupported platforms 2021-01-21 20:59:27 +01:00
NanoNabla
aec81bc048 add hints for building ctcdecode on unsupported platforms 2021-01-21 10:52:26 +01:00
lissyx
b9aa725900
Merge pull request #3505 from tud-zih-tools/ppc64le_integration
build ctcdecode on ppc64le
2021-01-20 23:38:18 +01:00
NanoNabla
d0f0a2d6e8 Apply lissyx's patch from mozilla#3379, making it possible to set PYTHON_PLATFORM_NAME in the environment on the target host 2021-01-20 20:18:03 +01:00
Reuben Morais
80b5fe10df
Merge pull request #3493 from mozilla/add-ogg-opus-training-support
Add ogg opus training support
2021-01-20 17:53:10 +00:00
NanoNabla
80da74c472 add build rules for ctcdecode on ppc64le 2021-01-20 17:25:29 +01:00
Reuben Morais
b2feb04763 Fix some test names/descs and drop Py3.5 training tests 2021-01-18 16:23:40 +00:00
Reuben Morais
f2e57467c6 Compare sample durations with an epsilon 2021-01-18 16:20:03 +00:00
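The epsilon comparison mentioned in the commit above amounts to the standard pattern of comparing float durations with a tolerance rather than exact equality; a minimal sketch (the function name and tolerance value are illustrative, not the project's actual values):

```python
import math

def durations_match(a_secs, b_secs, eps=1e-3):
    # Avoid exact float equality when comparing sample durations.
    return math.isclose(a_secs, b_secs, abs_tol=eps)
```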
Reuben Morais
db45057dcc Add missing metadata leak suppressions 2021-01-18 13:57:44 +00:00
Reuben Morais
64465cd93a Bump NCCL version due to NVIDIA base image update 2021-01-18 13:37:13 +00:00
Reuben Morais
79a42b345d Read audio format from data before running augmentation passes instead of assuming default 2021-01-18 12:11:31 +00:00
Reuben Morais
8c0d46cb7f Normalize sample rate of train_files by default 2021-01-18 12:11:31 +00:00
Reuben Morais
d4152f6e67 Add support for Ogg/Opus audio files for training 2021-01-18 12:11:31 +00:00
Reuben Morais
ad0f7d2ab7
Merge pull request #3486 from KathyReid/patch-3
Update refs to 0.9.3 from 0.9.2
2021-01-03 09:56:24 +00:00
Kathy Reid
bb47cf26d0
Update refs to 0.9.3 from 0.9.2
I'm using this documentation to build out a Playbook - please don't interpret this as nitpicking; I saw a minor change and made it.
2021-01-03 13:53:46 +11:00
Anon-Artist
5edfcdb92e
Update tc-decision.py 2020-12-21 15:34:00 +05:30
Reuben Morais
fcbd92d0d7 Bump version to v0.10.0-alpha.3 2020-12-19 09:28:21 +00:00
Reuben Morais
81c2a33f5b Separate branch and tag 2020-12-19 09:23:32 +00:00
Reuben Morais
239656c0f9 Bump version to v0.10.0-alpha.2 2020-12-19 09:11:06 +00:00
Reuben Morais
05654ef896 Expose GITHUB_HEAD_TAG, used by package upload scriptworker 2020-12-19 09:10:40 +00:00
Reuben Morais
55751e5d70 Bump version to v0.10.0-alpha.1 2020-12-19 08:48:27 +00:00
Reuben Morais
dc16a0e7f9 Separate ref and branch/tag metadata 2020-12-18 23:49:33 +00:00
Reuben Morais
9c988c764b Fix metadata.github.ref on push and tag 2020-12-18 23:42:29 +00:00
Reuben Morais
273d461f6a Bump version to v0.10.0-alpha.0 2020-12-18 23:29:54 +00:00
Reuben Morais
caaec68f59
Merge pull request #3473 from mozilla/taskcluster-v1
Convert to .taskcluster.yml v1
2020-12-18 20:36:38 +00:00
Reuben Morais
4723de25bf Use payload.env instead of forwarding variables manually 2020-12-18 17:00:00 +00:00
Reuben Morais
bb1ad00194 Convert to .taskcluster.yml v1
forward TASK_ID, add created and deadline

more fixes

typo

try without TASK_ID

fix task templates

add missing env vars to tc decision dry runs

avoid repetition in .taskcluster and manually forward variables to tc-decision.py

url -> clone_url

simulate GITHUB_EVENT

separate ref and sha

correct pull request actions

correct pull request policy
2020-12-18 09:35:14 +00:00
Reuben Morais
07d0e93083 Add paragraph on expected behavior from module owners
X-DeepSpeech: NOBUILD
2020-12-17 08:59:36 +00:00
Reuben Morais
8a88e6e063 Fix link in RST
X-DeepSpeech: NOBUILD
2020-12-17 08:53:59 +00:00
Reuben Morais
89cae68706 Improve explanation of governance model 2020-12-17 08:51:08 +00:00
lissyx
3e10163ec8
Merge pull request #3416 from lissyx/pr-3414
.NET Client Binding Fix
2020-12-08 14:48:32 +01:00
Reuben Morais
b3b9e268a7
Merge pull request #3460 from mozilla/more-doc-fixes
More documentation fixes
2020-12-08 15:42:11 +02:00
imrahul3610
1be44c63fc Hotword support for .NET client tests 2020-12-08 13:42:53 +01:00
Reuben Morais
d422955c4a Fix doc references to renamed StreamImpl class 2020-12-08 13:52:04 +02:00
Reuben Morais
1102185abf More branding fixes for docs & Java bindings 2020-12-08 13:36:28 +02:00
Reuben Morais
857ce297f0
Merge pull request #3459 from mozilla/move-linter-circleci
Move linting job to CircleCI
2020-12-08 13:24:35 +02:00
Reuben Morais
0e2209e2b3 Remove Travis 2020-12-08 13:21:05 +02:00
Reuben Morais
25c4f97aa7 Move linting job to CircleCI 2020-12-08 13:21:05 +02:00
Sjors Holtrop
8c8387c45a
Rename Stream class to StreamImpl, export its type as Stream (#3456) 2020-12-08 12:19:21 +01:00
Reuben Morais
4e55d63351
Fix package name reference in Java API docs (#3458) 2020-12-08 10:44:31 +01:00
Catalin Voss
6640cf2341
Remote training I/O once more (#3437)
* Redo remote I/O changes once more; this time without messing with taskcluster

* Add bin changes

* Fix merge-induced issue?

* For the interleaved case with multiple collections, unpack audio on the fly

To reproduce the previous failure

rm data/smoke_test/ldc93s1.csv
rm data/smoke_test/ldc93s1.sdb
rm -rf /tmp/ldc93s1_cache_sdb_csv
rm -rf /tmp/ckpt_sdb_csv
rm -rf /tmp/train_sdb_csv

./bin/run-tc-ldc93s1_new_sdb_csv.sh 109 16000
python -u DeepSpeech.py --noshow_progressbar --noearly_stop --train_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --train_batch_size 1 --feature_cache /tmp/ldc93s1_cache_sdb_csv --dev_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --dev_batch_size 1 --test_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --test_batch_size 1 --n_hidden 100 --epochs 109 --max_to_keep 1 --checkpoint_dir /tmp/ckpt_sdb_csv --learning_rate 0.001 --dropout_rate 0.05 --export_dir /tmp/train_sdb_csv --scorer_path data/smoke_test/pruned_lm.scorer --audio_sample_rate 16000

* Attempt to preserve length information with a wrapper around `map()`… this gets pretty python-y

* Call the right `__next__()`

* Properly implement the rest of the map wrappers here……

* Fix trailing whitespace situation and other linter complaints

* Remove data accidentally checked in

* Fix overlay augmentations

* Wavs must be opened in rb mode if we're passing in an external file pointer -- this confused me

* Lint whitespace

* Revert "Fix trailing whitespace situation and other linter complaints"

This reverts commit c3c45397a2f98e9b00d00c18c4ced4fc52475032.

* Fix linter issue but without such an aggressive diff

* Move unpack_maybe into sample_collections

* Use unpack_maybe in place of duplicate lambda

* Fix confusing comment

* Add clarifying comment for on-the-fly unpacking
2020-12-07 13:07:34 +01:00
Reuben Morais
18b66adf46
Merge pull request #3435 from olafthiele/scorerchange
Conditional msg for missing lm.binary added
2020-12-07 13:59:36 +02:00
Reuben Morais
a947e80f70
Merge pull request #3454 from mozilla/branding-cleanup
Branding cleanup
2020-12-07 13:59:03 +02:00
Reuben Morais
4639d57f81
Merge pull request #3455 from mozilla/conda-instructions
Add some guidelines for conda environments for training
2020-12-07 10:56:57 +02:00
Reuben Morais
f6ddc4f72c Add some guidelines for conda environments for training 2020-12-07 10:55:35 +02:00
Reuben Morais
c7ce999e02 Remove trademark from Swift binding project identifier 2020-12-07 10:20:02 +02:00
Reuben Morais
da0209de01 Remove trademark from Java binding package names 2020-12-07 10:18:56 +02:00
Reuben Morais
f822b04e1b Branding cleanup
Remove Mozilla trademarks.
2020-12-07 10:07:39 +02:00
Reuben Morais
ad7d61f837
Merge pull request #3452 from mozilla/codeowners
Add listing of code owners/reviewers and reference from CONTRIBUTING.rst
2020-12-04 15:23:28 +02:00
Reuben Morais
bc078423eb Merge branch 'pr-3436-leaks' (Fixes #3436 and #3451) 2020-12-04 15:21:17 +02:00
Reuben Morais
c6318859df Re-add missing TF flags to deepspeech_bundle library 2020-12-04 15:20:09 +02:00
CatalinVoss
32b6067a01 Enable static build of DeepSpeech iOS framework
Set up additional `deepspeech_ios` target with static build steps

Xcode config: lock swift version at 5.0, bundle framework rather than dynamic lib, never strip swift symbols, add framework search paths, and bring in lstdc++

Runtime schema config: disable the main thread checker as this causes trouble with the static build

Update model versions to 0.9.1

Remove libdeepspeech.so from example app bundling steps

Swift lib embed settings that are somehow essential

Attempt to adjust taskcluster build steps

Add a basic podspec

Add framework to gitignore

Fix podspec version code

Attempt to fix taskcluster unzip step

Switch deepspeech targets for iOS build

Try doing this unzip in one step

Remove packaging steps for unneeded stuff because libdeepspeech.so is no longer a thing here. I suppose we could add a step to package the iOS static lib instead.

Fix podspec version

Set up podspec relative assuming a clone from the repo root

Remove space in iOS package step

Fix buildfile nit

Link stdc++ in explicitly with iOS build only

Revert "Remove space in iOS package step"

This reverts commit 3e1922ea370c110f9854ae7e97101f2ea00f55c6.
2020-12-04 15:19:49 +02:00
Reuben Morais
73240a0f1d Add listing of code owners/reviewers and reference from contribution guidelines
X-DeepSpeech: NOBUILD
2020-12-04 15:17:09 +02:00
lissyx
bcfc74874f
Merge pull request #3444 from lissyx/doc-cuda
Fix #3443: Link to upstream Dockerfile for lack of correct TensorFlow…
2020-11-27 12:37:45 +01:00
Alexandre Lissy
c979e360da Fix #3443: Link to upstream Dockerfile for lack of correct TensorFlow GPU deps doc. 2020-11-27 12:36:23 +01:00
lissyx
da31812173
Merge pull request #3440 from lissyx/electronjs_11
Adding support for ElectronJS v11.0
2020-11-26 16:08:52 +01:00
Alexandre Lissy
c0c5e6ade8 Adding support for ElectronJS v11.0 2020-11-26 13:28:57 +01:00
lissyx
d217369839
Merge pull request #3428 from lissyx/import-ccef
Importer for XML file provided by Conference Centre for Economics, France
2020-11-24 09:51:36 +01:00
Alexandre Lissy
c822a6e875 Importer for dataset from Centre de Conférences Pierre Mendès-France
Released by Ministère de l'Economie, des Finances, et de la Relance
2020-11-24 09:49:39 +01:00
Olaf Thiele
3ae77ca75d Conditional msg for missing lm.binary added 2020-11-23 19:55:27 +01:00
Reuben Morais
ecc48062a7
Merge pull request #3432 from mozilla/revert-remote-io
Revert remote IO PR
2020-11-19 19:35:20 +02:00
Reuben Morais
88f7297215 Revert "Merge pull request #3420 from CatalinVoss/remote-io"
This reverts commit 08d18d7328c03eb0c65d28ffdc0d3755549585e0, reversing
changes made to 12badcce1ffc820bebc4cd2ed5d9787b248200f6.
2020-11-19 16:58:21 +02:00
Reuben Morais
f5cbda694a Revert "Merge pull request #3424 from mozilla/io-fixes"
This reverts commit ab1288ffde7118a76e5394e142b789adf3ad1bba, reversing
changes made to 08d18d7328c03eb0c65d28ffdc0d3755549585e0.
2020-11-19 16:58:01 +02:00
lissyx
ee68367580
Merge pull request #3430 from lissyx/fix-tc-gzip
Fix #3429: TaskCluster behavioral change wrt compression of artifacts
2020-11-19 14:54:36 +01:00
Alexandre Lissy
3caa474cce Fix #3429: TaskCluster behavioral change wrt compression of artifacts 2020-11-19 13:23:56 +01:00
Reuben Morais
ab1288ffde
Merge pull request #3424 from mozilla/io-fixes
Fix I/O issues introduced in #3420
2020-11-18 08:07:10 +02:00
CatalinVoss
6cb638211e Only unpack when we need to, to make things work with SDBs 2020-11-17 16:55:49 -08:00
CatalinVoss
24e9e6777c Make sure we properly unpack samples when changing audio types 2020-11-17 14:44:26 -08:00
CatalinVoss
9aaa0e406b Make sure to unpack samples now 2020-11-17 14:31:48 -08:00
CatalinVoss
8bf1e9ddb7 Fix too aggressive F&R 2020-11-17 14:21:31 -08:00
CatalinVoss
ffe2155733 Undo remote edits for taskcluster as this is all local 2020-11-17 13:47:55 -08:00
CatalinVoss
7121ca5a2b Add a dockerignore for slightly faster local docker builds 2020-11-17 13:40:35 -08:00
Reuben Morais
08d18d7328
Merge pull request #3420 from CatalinVoss/remote-io
Remote I/O Training Setup
2020-11-17 11:53:32 +02:00
CatalinVoss
d0678cd1b7 Remove unused unordered imap from LimitPool 2020-11-16 13:47:21 -08:00
CatalinVoss
611633fcf6 Remove unnecessary uses of open_remote() where we know __file__ will always be local 2020-11-16 13:47:06 -08:00
CatalinVoss
b5b3b2546c Clean up remote I/O docs 2020-11-16 13:46:34 -08:00
CatalinVoss
fb6d4ca361 Add disclaimers to CSV and Tar writers 2020-11-13 19:36:07 -08:00
CatalinVoss
8c1a183c67 Clean up print debugging statements 2020-11-13 19:24:09 -08:00
CatalinVoss
47020e4ecb Add an imap_unordered helper to LimitPool -- I might experiment with this 2020-11-13 19:20:02 -08:00
CatalinVoss
3d2b09b951 Linter seems unhappy with conditional imports. Make gfile a module-level import.
I usually do this as a conditional because tf takes a while to load and it's nice to skip it when you want to run a script that just preps data or something like that, but it doesn't seem like a big deal.
2020-11-13 10:47:06 -08:00
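The remote-I/O commits above mention an `open_remote()` helper and a module-level `gfile` import; a minimal sketch of that kind of helper, assuming TensorFlow's `tf.io.gfile` (only the helper name comes from the commit messages, the body is an assumption):

```python
import tensorflow as tf

def open_remote(path, mode="r"):
    # tf.io.gfile handles plain local paths as well as remote schemes
    # such as gs:// transparently.
    return tf.io.gfile.GFile(path, mode)
```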
CatalinVoss
2332e7fb76 Linter fix: define self.tmp_src_file_path in init 2020-11-13 10:45:53 -08:00
CatalinVoss
be39d3354d Perform data loading I/O within worker process rather than main process by wrapping Sample 2020-11-12 21:46:39 -08:00
CatalinVoss
fc0b495643 TODO: CSVWriter still totally breaks with remote paths 2020-11-12 16:46:59 -08:00
CatalinVoss
86cba458c5 Fix remote path handling for CSV sample reading 2020-11-12 16:40:59 -08:00
CatalinVoss
8fe972eb6f Fix wave file reading helpers 2020-11-12 16:40:40 -08:00
CatalinVoss
783cdad8db Fix downloader and taskcluster directory mgmt with remote I/O 2020-11-12 16:30:11 -08:00
CatalinVoss
64d278560d Why do we need absolute paths everywhere here? 2020-11-12 16:29:43 -08:00
CatalinVoss
0030cab220 Skip remote zipping for now 2020-11-12 16:29:23 -08:00
CatalinVoss
a6322b384e Fix remote I/O handling in train 2020-11-12 16:29:16 -08:00
CatalinVoss
8f31072998 Fix startswith check 2020-11-12 15:09:42 -08:00
CatalinVoss
90e2e1f7d2 Respect buffering, encoding, newline, closefd, and opener if we're looking at a local file 2020-11-12 14:45:05 -08:00
CatalinVoss
ad08830421 Work remote I/O into audio utils -- a bit more involved 2020-11-12 14:17:03 -08:00
CatalinVoss
3d503bd69e Add universal is_remote_path to I/O helper 2020-11-12 14:16:37 -08:00
CatalinVoss
c3dc4c0d5c Fix bad I/O helper fn replace errors 2020-11-12 14:06:22 -08:00
CatalinVoss
abe5dd2eb4 Remote I/O for taskcluster 2020-11-12 12:49:44 -08:00
CatalinVoss
296b74e01a Remote I/O for sample_collections 2020-11-12 10:54:44 -08:00
CatalinVoss
7de317cf59 Remote I/O for evaluate_tools 2020-11-12 10:49:33 -08:00
CatalinVoss
396ac7fe46 Remote I/O for downloader 2020-11-12 10:48:49 -08:00
CatalinVoss
933d96dc74 Fix relative imports 2020-11-12 10:47:26 -08:00
CatalinVoss
42170a57eb Remote I/O for config 2020-11-12 10:46:49 -08:00
CatalinVoss
83e5cf0416 Remote I/O for check_characters 2020-11-12 10:46:15 -08:00
CatalinVoss
579921cc92 Work remote I/O into train script 2020-11-12 10:45:35 -08:00
CatalinVoss
53e3f5374f Add I/O helpers for remote file access 2020-11-12 10:44:19 -08:00
lissyx
12badcce1f
Merge pull request #3393 from imrahul361/master
Run test On Java Client
2020-11-05 16:30:41 +01:00
imrahul3610
3ac6b4fda6 Run test On Java Client 2020-11-05 19:10:50 +05:30
lissyx
8f9d6ad024
Merge pull request #3408 from lissyx/pr-3406
Pr 3406
2020-11-05 13:13:40 +01:00
dag7dev
3a2879933f initial commit for py39 support 2020-11-04 20:16:35 +01:00
Reuben Morais
b72e2643c4
Merge pull request #3395 from CatalinVoss/patch-1
Minor Training Variable Consistency fix
2020-11-03 21:50:59 +01:00
Catalin Voss
98e75c3c03
Call the logits probs in create_inference_graph after they go thru softmax 2020-11-03 09:49:27 -08:00
lissyx
19eeadd0f3
Merge pull request #3398 from lissyx/fix-rtd
Force npm install on RTD and set appropriate PATH value
2020-11-03 14:36:47 +01:00
Alexandre Lissy
1cd5e44a52 Force npm install on RTD and set appropriate PATH value 2020-11-03 14:33:52 +01:00
Catalin Voss
9a92fa40ca
Make variables consistent 2020-11-02 21:09:35 -08:00
lissyx
d9a35d63b0
Merge pull request #3390 from JRMeyer/contributing-docs
note about perf testing
2020-10-29 10:27:51 +01:00
Josh Meyer
b732e39567 note about perf testing
X-DeepSpeech: NOBUILD
2020-10-28 10:22:19 -04:00
lissyx
4427cf9a42
Merge pull request #3389 from suriyaa/patch-1
Use HTTPS in README.md
2020-10-27 12:32:04 +01:00
Suriyaa Sundararuban
87c44d75a3
Use HTTPS in README.md 2020-10-27 11:04:32 +01:00
lissyx
e6a281ed4f
Merge pull request #3383 from ftyers/node15
update for NodeJS 15
2020-10-26 18:22:22 +01:00
Francis Tyers
55e31c4025 update for NodeJS 15 2020-10-26 15:44:06 +00:00
lissyx
5e2a916899
Merge pull request #3385 from liezl200/sys-import-voxforge
Add missing sys import to import_voxforge.py
2020-10-23 15:07:02 +02:00
Liezl P
af7c4e90df Add missing sys import to import_voxforge.py 2020-10-22 23:09:49 -10:00
Reuben Morais
0798698e97
Merge pull request #3380 from piraka9011/patch-1
Convert channels for CV2 dataset
2020-10-17 00:43:08 +02:00
Anas Abou Allaban
521842deea
Convert channels for CV2 dataset
When running a training session on the CV2 dataset, it is possible to get the following error:

```
ValueError: Mono-channel audio required
```

This makes the [pysox Transformer](https://pysox.readthedocs.io/en/latest/api.html#sox.transform.Transformer.convert) also convert the channels.
2020-10-15 11:22:39 -04:00
lissyx
e508cd30b7
Merge pull request #3377 from actual-kwarter/master
Minor spelling fixes to CONTRIBUTING.rst X-DeepSpeech: NOBUILD
2020-10-14 10:28:33 +02:00
THCKwarter
e9fc614d8a Minor spelling fixes to CONTRIBUTING.rst X-DeepSpeech: NOBUILD 2020-10-13 22:53:49 -05:00
Reuben Morais
51e351e895
Merge pull request #3370 from tiagomoraismorgado/patch-1
X-DeepSpeech: NOBUILD
2020-10-12 14:18:09 +02:00
tiagomoraismorgado
f753b86ca9
[docs/typos/enhance] - mozilla/deepspeech/readme.rst - update
[docs/typos/enhance] - mozilla/deepspeech/readme.rst - update
2020-10-12 12:46:32 +01:00
lissyx
435b20d530
Merge pull request #3369 from nmstoker/patch-1
Tiny fix to addHotWord doc string parameters
2020-10-12 09:03:33 +02:00
Neil Stoker
2ca91039c8
Tiny fix to addHotWord doc string parameters
As the parameter for boost was actually written as "word" in the doc string, it was replacing the previous type for word with the type intended for boost and not showing any type for boost, thus messing up what displayed on https://deepspeech.readthedocs.io/en/master/Python-API.html
2020-10-11 17:46:20 +01:00
lissyx
7ca237d19b
Merge pull request #3361 from imrahul361/master
enable hot-words boosting for Javascript
2020-10-10 16:00:47 +02:00
imrahul3610
9df89bd945 Fix JavaScript binding calls for Hot Words 2020-10-10 11:30:27 +05:30
imrahul3610
368f76557a Run Tests on CI for JS Client 2020-10-10 11:30:27 +05:30
imrahul3610
29b39fd2d5 JS Binding Fix 2020-10-10 11:30:27 +05:30
Reuben Morais
07fcd5bcd1
Merge pull request #3360 from mozilla/utf8alphabet-python-bindings
Fix binding of UTF8Alphabet class in decoder package
2020-10-06 22:07:45 +02:00
Reuben Morais
cc2763e0b7 Add small bytes output mode scorer for tests 2020-10-06 18:19:34 +02:00
Reuben Morais
09f0aa3d75 Rename --force_utf8 flag to --force_bytes_output_mode to avoid confusion 2020-10-06 18:19:34 +02:00
Reuben Morais
83a36b7a34 Rename --utf8 flag to --bytes_output_mode to avoid confusion 2020-10-06 18:19:33 +02:00
Reuben Morais
fb4f5b6a84 Add some coverage for training and inference in bytes output mode 2020-10-06 18:19:33 +02:00
Reuben Morais
2fd11dd74a Fix binding of UTF8Alphabet class in decoder package 2020-10-06 13:13:34 +02:00
lissyx
421f44cf73
Merge pull request #3357 from JRMeyer/mono-channel-error-message
mono-channel error, not just an assertion
2020-10-03 11:08:07 +02:00
josh meyer
afee570f3c mono-channel error, not just an assertion
X-DeepSpeech: NOBUILD
2020-10-02 13:27:43 -07:00
lissyx
dd4122a04a
Merge pull request #3356 from lissyx/linux-valgrind
Linux valgrind
2020-10-01 18:49:19 +02:00
Alexandre Lissy
fdd663829a Fix #3355: Add valgrind runs 2020-10-01 15:31:21 +02:00
Alexandre Lissy
86bba80b0e Fix #3292: Linux debug builds 2020-10-01 12:40:24 +02:00
lissyx
f20f939ade
Merge pull request #3351 from lissyx/leak-intermediate-decode
Fix leak in C++ client
2020-09-29 18:30:13 +02:00
Alexandre Lissy
9a34507023 Fix leak in C++ client 2020-09-29 16:02:27 +02:00
lissyx
0c020d11bc
Merge pull request #3350 from lissyx/test-lzma-bz2
Auto-discover lzma/bz2 linkage of libmagic
2020-09-29 12:52:42 +02:00
Alexandre Lissy
9674ced520 Auto-discover lzma/bz2 linkage of libmagic 2020-09-29 10:52:37 +02:00
lissyx
c7d58d628e
Merge pull request #3343 from lissyx/docker-1.15.4
Use correct 1.15.4 docker image
2020-09-28 14:54:14 +02:00
Alexandre Lissy
02548c17de Fix #3347: Disable Git-LFS on Windows 2020-09-28 13:33:59 +02:00
Alexandre Lissy
57c26827c0 Use correct 1.15.4 docker image 2020-09-28 12:43:12 +02:00
lissyx
731dd1b6bd
Merge pull request #3338 from lissyx/tf-1.15.4
Fix #3088: Use TensorFlow 1.15.4 with CUDNN fix
2020-09-25 16:00:36 +02:00
lissyx
d7be8e2789
Fix typo on DS_ClearHotWords 2020-09-25 14:45:37 +02:00
lissyx
5a88417547
Merge pull request #3339 from lissyx/missing-hotword-c-doc
Fix missing doc for new Hot Word API
2020-09-25 14:40:27 +02:00
Alexandre Lissy
25c2965da8 Fix missing doc for new Hot Word API
X-DeepSpeech: NOBUILD
2020-09-25 14:39:42 +02:00
lissyx
0728ac259e
Merge pull request #3320 from lissyx/build-kenlm
Fix #3299: Build KenLM on CI
2020-09-25 14:36:23 +02:00
Alexandre Lissy
16165f3ddc Fix #3088: Use TensorFlow 1.15.4 with CUDNN fix 2020-09-25 14:11:06 +02:00
Alexandre Lissy
bf5ae9cf8a Fix #3299: Build KenLM on CI 2020-09-25 13:25:38 +02:00
lissyx
34a62bd1d1
Merge pull request #3337 from lissyx/bump-0.9.0a10
Bump VERSION to 0.9.0-alpha.10
2020-09-25 13:25:16 +02:00
Alexandre Lissy
445ebb233a Bump VERSION to 0.9.0-alpha.10 2020-09-25 11:05:12 +02:00
Josh Meyer
1eb155ed93
enable hot-word boosting (#3297)
* enable hot-word boosting

* more consistent ordering of CLI arguments

* progress on review

* use map instead of set for hot-words, move string logic to client.cc

* typo bug

* pointer things?

* use map for hotwords, better string splitting

* add the boost, not multiply

* cleaning up

* cleaning whitespace

* remove <set> inclusion

* change typo set-->map

* rename boost_coefficient to boost

X-DeepSpeech: NOBUILD

* add hot_words to python bindings

* missing hot_words

* include map in swigwrapper.i

* add Map template to swigwrapper.i

* emacs intermediate file

* map things

* map-->unordered_map

* typu

* typu

* use dict() not None

* error out if hot_words without scorer

* two new functions: remove hot-word and clear all hot-words

* starting to work on better error messages

X-DeepSpeech: NOBUILD

* better error handling + .Net ERR codes

* allow for negative boosts:)

* adding TC test for hot-words

* add hot-words to python client, make TC test hot-words everywhere

* only run TC tests for C++ and Python

* fully expose API in python bindings

* expose API in Java (thanks spectie!)

* expose API in dotnet (thanks spectie!)

* expose API in javascript (thanks spectie!)

* java lol

* typo in javascript

* commenting

* java error codes from swig

* java docs from SWIG

* java and dotnet issues

* add hotword test to android tests

* dotnet fixes from carlos

* add DS_BINARY_PREFIX to tc-asserts.sh for hotwords command

* make sure lm is on android for hotword test

* path to android model + nit

* path

* path
2020-09-24 14:58:41 -04:00
Reuben Morais
d466fb09d4 Bump VERSION to 0.9.0-alpha.9 2020-09-21 12:11:53 +02:00
Reuben Morais
cc62aa2eb8
Merge pull request #3279 from godefv/decoder_timesteps
The CTC decoder timesteps now corresponds to the timesteps of the most probable CTC path, instead of the earliest timesteps of all possible paths.
2020-09-17 20:31:05 +02:00
godeffroy
188501a333 PR #3279 - Reverted unrelated and unwanted change. 2020-09-17 19:10:43 +02:00
godeffroy
371ddb84e5 PR #3279 - Added README.mozilla to tell where the object pool code is from and updated the object pool code from this origin (minor update). 2020-09-17 17:55:45 +02:00
godeffroy
5bf5124366 PR #3279 - Added some comments, harmonized a few names, removed unneeded spaces 2020-09-17 14:27:33 +02:00
lissyx
014479e650
Merge pull request #3324 from gtcooke94/fix_swb_import
Added `os` import in front of `makedirs`
2020-09-16 20:44:33 +02:00
Greg Cooke
20ad86c6ab Added os import in front of makedirs 2020-09-16 14:20:59 -04:00
godeffroy
23944b97db PR #3279 - Made the timestep tree thread safe 2020-09-16 14:03:59 +02:00
godeffroy
1fa2e4ebcc PR #3279 - Fixed buggy timestep tree root 2020-09-15 21:30:45 +02:00
lissyx
1b3e97c102
Merge pull request #3322 from lissyx/fix-docker-build
Fix #3321: Update NCCL dep to 2.7 following NVIDIA update
2020-09-15 16:37:29 +02:00
Alexandre Lissy
76d5fb6389 Fix #3321: Update NCCL dep to 2.7 following NVIDIA update 2020-09-15 13:40:17 +02:00
godeffroy
14bd9033d6 Revert "PR #3279 - removed unrelated code"
This reverts commit 78c4ef17b11fe681702cb0619a0b938a0b59f5bd.
2020-09-14 22:45:42 +02:00
godeffroy
15ce05aa01 PR #3279 - Fixed spaces 2020-09-14 14:40:56 +02:00
lissyx
346b5bdbae
Merge pull request #3318 from lissyx/electron-10
Fix #3316: Add Electron 10.x
2020-09-10 16:27:15 +02:00
Alexandre Lissy
2e92f53aac Use bigger build machine to avoid recurrent breakages of Linux/CUDA builds 2020-09-10 15:13:51 +02:00
Alexandre Lissy
a4d6c672d4 Fix #3316: Add Electron 10.x 2020-09-10 12:08:17 +02:00
lissyx
16a7a27275
Merge pull request #3319 from olafthiele/master-branch-error
Simplified git clone msg to prevent error reportings
2020-09-10 11:51:27 +02:00
Olaf Thiele
de1e3d7aa0 Simplified install text 2020-09-10 10:58:28 +02:00
lissyx
dda2d22310
Merge pull request #3314 from olafthiele/master-branch-errors
Trying to get fewer master branch training errors
2020-09-09 15:08:36 +02:00
godeffroy
f07c10452b PR #3279 - use unique_ptr instead of shared_ptr in the timestep tree 2020-09-09 11:04:37 +02:00
lissyx
ce95be1354
Merge pull request #3315 from lissyx/bump-v0.9.0-alpha.8
Bump VERSION to v0.9.0-alpha.8
2020-09-09 10:53:46 +02:00
Alexandre Lissy
b30e0fb815 Bump VERSION to v0.9.0-alpha.8 2020-09-09 08:49:46 +02:00
Olaf Thiele
a2e88a30de
More compact version 2020-09-08 15:32:23 +02:00
Olaf Thiele
39a963af90
Update TRAINING.rst 2020-09-08 14:54:08 +02:00
godeffroy
3a49344ccb PR #3279 - use an object pool to store timesteps tree nodes 2020-09-08 14:12:39 +02:00
lissyx
11be0a57d4
Merge pull request #3313 from mozilla/erogol-patch-1
fix missing import 'sys'
2020-09-08 10:43:02 +02:00
Eren Gölge
b2df360799
fix missing import 'sys' 2020-09-08 10:15:22 +02:00
godeffroy
ec55597412 PR #3279 - use a tree structure to store timesteps 2020-09-07 13:37:27 +02:00
lissyx
012e7bfb5e
Merge pull request #3309 from JRMeyer/docs-contributing
Docs contributing
2020-09-07 12:18:32 +02:00
Josh Meyer
ff057e86c7 bold instead of ticks 2020-09-02 10:54:13 -04:00
lissyx
91e70602ce
Merge pull request #3307 from techiaith/master
updating docs for #3295
2020-09-02 16:29:31 +02:00
Dewi Bryn Jones
8a8d140da8 updating docs for #3295 2020-09-02 15:13:57 +01:00
lissyx
b6f5ddfe54
Merge pull request #3301 from techiaith/master
Fix for setuptools._distutils issue (#3295)
2020-09-02 14:32:37 +02:00
Josh Meyer
fdf6aeb22b first stab at CONTRIBUTING.rst 2020-09-02 08:29:19 -04:00
Dewi Bryn Jones
a6dff311f6 fix for #3295 2020-09-02 13:10:02 +01:00
lissyx
9377aaf3a0
Merge pull request #3296 from lissyx/transcribe-ci
Fix #3129: Add CI coverage for transcribe.py
2020-09-01 19:13:28 +02:00
Alexandre Lissy
32ad25b088 Fix #3129: Add CI coverage for transcribe.py 2020-09-01 17:49:31 +02:00
godeffroy
1f89bef5f0 PR #3279 - avoid unnecessary copies of timesteps vectors 2020-08-31 19:01:47 +02:00
lissyx
ccb1a6b0d4
Merge pull request #3278 from DanBmh/refactor_rlrop_cond
Refactor rlrop condition
2020-08-31 16:49:01 +02:00
lissyx
26f99874a6
Merge pull request #3293 from lissyx/decouple-builds
Decouple builds
2020-08-31 14:53:05 +02:00
Daniel
c10f7f1ad6 Refactor rlrop condition. 2020-08-31 12:57:48 +02:00
Alexandre Lissy
4bc14acb12 Decouple builds
Fixes #3170
2020-08-31 12:04:04 +02:00
godeffroy
363121235e PR #3279 - revert to non RVO code (fix) 2020-08-31 10:15:34 +02:00
godeffroy
e9466160c7 PR #3279 - revert to non RVO code 2020-08-31 09:54:38 +02:00
godeffroy
59c73f1c46 PR #3279 - assert instead of reporting error to std::cerr 2020-08-31 09:38:54 +02:00
godeffroy
78c4ef17b1 PR #3279 - removed unrelated code 2020-08-31 09:33:22 +02:00
godeffroy
c3d6f8d923 PR #3279 - replaced tabulations by spaces 2020-08-31 08:53:26 +02:00
lissyx
555a265010
Merge pull request #3290 from lissyx/re-fix-swig-master
Fix SWIG prebuild URL
2020-08-28 18:27:39 +02:00
Alexandre Lissy
160fa76ddf Fix SWIG prebuild URL 2020-08-28 17:18:17 +02:00
lissyx
f554ac0b38
Merge pull request #3284 from lissyx/new-macOS-VMs
Switch to new macOS VM setup
2020-08-28 15:23:07 +02:00
Alexandre Lissy
3e6593d325 Switch to new macOS VM setup 2020-08-28 10:15:56 +02:00
Reuben Morais
1c9f3bc99d
Merge pull request #3286 from mozilla/test-pr-3268
Test PR #3268
2020-08-27 20:02:48 +02:00
Daniel
93a4de5489 Fix lr initialization on reload. 2020-08-27 15:08:32 +02:00
Reuben Morais
8965b29e81 Point back to examples master branch 2020-08-27 09:31:05 +02:00
Reuben Morais
becc3d9745
Merge pull request #3280 from mozilla/undo-renames
Undo renames
2020-08-27 09:27:45 +02:00
Reuben Morais
3aa3862fbc Fix TF cache references after rebase 2020-08-26 11:47:35 +02:00
Reuben Morais
b70db48f91 Rename new tasks 2020-08-26 11:46:09 +02:00
Reuben Morais
dc2503c5e0 Specify macOS SDK version along with minimum version in builds 2020-08-26 11:46:09 +02:00
Reuben Morais
b9e2d90a73 Point to reverted examples changes 2020-08-26 11:46:09 +02:00
Reuben Morais
81ce543670 Fix bad conflict resolution in bazel rebuild check 2020-08-26 11:46:09 +02:00
Reuben Morais
8f2c1e842a Explicitly name repository clone target in Dockerfiles 2020-08-26 11:46:09 +02:00
Reuben Morais
d1c964c5d5 Adjust TF cache indices for 2.3 + renames undone 2020-08-26 11:46:09 +02:00
Reuben Morais
ae0cf8db6a Revert "Merge branch 'rename-real'"
This reverts commit ae9fdb183ec6eb422635c0e3a44c0c2ee5732224, reversing
changes made to 2eb75b62064ac30c1c537f4174d00b6e521042c5.
2020-08-26 11:46:09 +02:00
Reuben Morais
386935e1fa Revert "Merge pull request #3230 from mozilla/rename-nuget-gpu-to-cuda"
This reverts commit 0610a7a76fba80df73a220b76b07946ba9ac4581, reversing
changes made to c31df0fd4cba77e632b1ad76c27162727a98e540.
2020-08-26 11:46:08 +02:00
Reuben Morais
01fd13b663 Revert "Merge pull request #3229 from mozilla/nodejs-scoped-name"
This reverts commit 402fc71abf01491cb6b99cc4f9cb69820c0fb842, reversing
changes made to 0610a7a76fba80df73a220b76b07946ba9ac4581.
2020-08-26 11:46:08 +02:00
Reuben Morais
da55cfae86 Revert "Merge pull request #3237 from lissyx/rename-training-package"
This reverts commit 3dcb3743acc14ed9de63110709446791892f8936, reversing
changes made to 457198c88d7ad96ee4596cb21deaeca77c277898.
2020-08-26 11:46:08 +02:00
Reuben Morais
fee45c425e Revert "Merge pull request #3233 from lissyx/examples-rename-master"
This reverts commit 86845dd022f9f77ddc4aff8023b9d5d2a663078a, reversing
changes made to 3dcb3743acc14ed9de63110709446791892f8936.
2020-08-26 11:46:08 +02:00
Reuben Morais
d000d76548 Revert "Merge pull request #3239 from lissyx/rename-circleci"
This reverts commit 08cebeda3c43b10bd8caa766ccd0feec7e305735, reversing
changes made to 86845dd022f9f77ddc4aff8023b9d5d2a663078a.
2020-08-26 11:46:08 +02:00
Reuben Morais
7f99007840 Revert "Merge pull request #3238 from lissyx/rename-index"
This reverts commit 1a7dd876017d0e7451abb1101d154b71b8d8edb5, reversing
changes made to 08cebeda3c43b10bd8caa766ccd0feec7e305735.
2020-08-26 11:46:06 +02:00
Reuben Morais
10e2fc16f2 Revert "Merge pull request #3243 from lissyx/rename-stt-master"
This reverts commit 3e99b0d8b2b2d6e47c8ff7eb1dfd9a88eba8e6d8, reversing
changes made to 3a8c45cb619589f5f6acf4bfb71e7d6b18e8eab5.
2020-08-26 11:45:06 +02:00
Reuben Morais
7a6508612d Revert "Merge pull request #3246 from lissyx/fix-docker"
This reverts commit c01fda56c058779cc9dba952ce940c47398c4ed3, reversing
changes made to 3e99b0d8b2b2d6e47c8ff7eb1dfd9a88eba8e6d8.
2020-08-26 11:45:06 +02:00
Reuben Morais
c62a604876 Revert "Merge pull request #3248 from lissyx/rtd-rename"
This reverts commit ce71910ab4533e84eaf7be92bc1eb447305f4bd6, reversing
changes made to 7c6108a199f1d8f892c2d52088850aaa5a8792e9.
2020-08-26 11:45:06 +02:00
Reuben Morais
9788811bc5 Revert "Merge pull request #3241 from lissyx/rename-ctcdecoder"
This reverts commit fd4185f1410a39af19742310403151646318faba, reversing
changes made to 1a7dd876017d0e7451abb1101d154b71b8d8edb5.
2020-08-26 11:45:06 +02:00
lissyx
9daa708047
Merge pull request #3276 from lissyx/pr3256
Pr3256
2020-08-25 22:39:50 +02:00
lissyx
903fec464a
Merge pull request #3272 from godefv/master
In ctc_beam_search_decoder(), added a sanity check between input class_dim and alphabet
2020-08-25 14:04:41 +02:00
Bernardo Henz
8284958f3d Updating tensorflow version in taskcluster/.build.yml 2020-08-25 13:22:35 +02:00
Bernardo Henz
9f3c40ce48 Replacing old sha with new ones
Replacing old sha references ('4336a5b49fa6d650e24dbdba55bcef9581535244') with the new one ('23ad988fcde60fb01f9533e95004bbc4877a9143')
2020-08-25 13:22:35 +02:00
Bernardo Henz
b4bc6bfb8a Updating commit of submodule 2020-08-25 13:21:12 +02:00
Bernardo Henz
1f54daf007 Default for layer_norm set to False 2020-08-25 13:18:30 +02:00
Bernardo Henz
2fcba677bb Implementation of layer-norm in the training script 2020-08-25 13:18:30 +02:00
godeffroy
95b6fccaf1 In ctc_beam_search_decoder(), added a sanity check between input class_dim and alphabet 2020-08-25 12:13:28 +02:00
godeffroy
04a36fbf68 The CTC decoder timesteps now corresponds to the timesteps of the most
probable CTC path, instead of the earliest timesteps of all possible paths.
2020-08-25 12:08:14 +02:00
lissyx
c5db91413f
Merge pull request #3277 from lissyx/doc-r2.3
Update docs for matching r2.3
2020-08-24 21:17:11 +02:00
Alexandre Lissy
e81ee24ede Update docs for matching r2.3 2020-08-24 21:13:16 +02:00
lissyx
a54b198d1e
Merge pull request #3266 from lissyx/electronjs-9.2
Add ElectronJS v9.2
2020-08-20 12:42:59 +02:00
Alexandre Lissy
4283b7e7de Add ElectronJS v9.2 2020-08-20 11:15:40 +02:00
Reuben Morais
d14c2b2e2d
Merge pull request #3261 from mozilla/reload-weights-plateau-tests
Tests #3245 Reload weights after plateau
2020-08-20 09:48:02 +02:00
lissyx
f4f8d2d7b7
Merge pull request #3264 from ptitloup/ptitloup-patch-python-client
Update client.py
2020-08-20 09:39:54 +02:00
Ptitloup
0c3aa6f472
Update client.py
remove space in key start_time of word dict
2020-08-20 09:19:53 +02:00
Reuben Morais
567a50087d
Merge pull request #3259 from mozilla/macos-min-10.10
Explicitly set minimum macOS version in bazel flags
2020-08-20 00:11:26 +02:00
Daniel
420ba808c8 Reload graph with extra function. 2020-08-19 18:45:09 +02:00
Daniel
4cf7a012a3 Don't drop layers in rlrop reload. 2020-08-19 18:45:09 +02:00
Daniel
09e1422278 Reload weights after plateau. 2020-08-19 18:45:09 +02:00
lissyx
b5c871616c
Merge pull request #3262 from lissyx/fix-docker-build
Use more beefy builder for Docker builds
2020-08-19 17:46:17 +02:00
Alexandre Lissy
5cc1ec32bd Use more beefy builder for Docker builds 2020-08-19 17:26:59 +02:00
Reuben Morais
2bceda0c56 Explicitly set minimum macOS version in bazel flags 2020-08-19 14:17:42 +02:00
lissyx
eb23728538
Merge pull request #3258 from Jendker/docu_filesize
Extend docu about the CSV files
2020-08-19 12:08:23 +02:00
Jedrzej Beniamin Orbik
9a6a1c7f3a Extend docu about the CSV files 2020-08-19 11:32:52 +02:00
lissyx
02afc2ac7e
Merge pull request #3254 from lissyx/bump-v0.9.0a7
Bump VERSION to 0.9.0-alpha.7
2020-08-18 15:14:13 +02:00
Alexandre Lissy
19ed4e950a Bump VERSION to 0.9.0-alpha.7 2020-08-18 12:23:41 +02:00
lissyx
c40f90cbff
Merge pull request #3227 from lissyx/use-r2.3
Move to TensorFlow r2.3
2020-08-18 10:56:50 +02:00
lissyx
90e04fb365
Merge pull request #3251 from Jendker/ctc_multiple_transc
Add num_results param to ctc_beam_search_decoder
2020-08-17 20:07:59 +02:00
Jedrzej Beniamin Orbik
c20af74d51 Add num_results param to ctc_beam_search_decoder 2020-08-17 18:29:08 +02:00
Alexandre Lissy
8619665fe1 Move to TensorFlow r2.3 2020-08-14 11:26:09 +02:00
lissyx
ce71910ab4
Merge pull request #3248 from lissyx/rtd-rename
Update name of readthedocs
2020-08-13 22:55:14 +02:00
Alexandre Lissy
fffc6ad455 Update name of readthedocs 2020-08-13 22:50:57 +02:00
lissyx
7c6108a199
Merge pull request #3236 from tilmankamp/tarexport
Resolves #3235 - Support for .tar(.gz) targets in bin/data_set_tool.py
2020-08-13 15:52:24 +02:00
Tilman Kamp
96f37a403d Resolves #3235 - Support for .tar(.gz) targets in bin/data_set_tool.py 2020-08-13 10:21:45 +02:00
lissyx
a6f40a3b2f
Merge pull request #3244 from lissyx/bump-v0.9.0a6
Bump VERSION to 0.9.0-alpha.6
2020-08-12 19:01:14 +02:00
Alexandre Lissy
2838df25e0 Bump VERSION to 0.9.0-alpha.6 2020-08-12 17:54:35 +02:00
lissyx
c01fda56c0
Merge pull request #3246 from lissyx/fix-docker
Fix docker path with new project name
2020-08-12 17:49:10 +02:00
Alexandre Lissy
1ad6ad9708 Fix docker path with new project name 2020-08-12 17:12:51 +02:00
lissyx
3e99b0d8b2
Merge pull request #3243 from lissyx/rename-stt-master
Rename DeepSpeech -> STT
2020-08-12 16:18:32 +02:00
Alexandre Lissy
9bca7a9044 Rename DeepSpeech -> STT 2020-08-12 13:52:17 +02:00
lissyx
3a8c45cb61
Merge pull request #3242 from lissyx/improve-tc-cleanup
Try and properly cleanup TaskCluster Workdir
2020-08-12 12:03:18 +02:00
Alexandre Lissy
60fe2450a7 Try and properly cleanup TaskCluster Workdir 2020-08-12 10:58:14 +02:00
lissyx
fd4185f141
Merge pull request #3241 from lissyx/rename-ctcdecoder
Rename ctcdecoder python package
2020-08-11 19:06:56 +02:00
lissyx
1a7dd87601
Merge pull request #3238 from lissyx/rename-index
Rename TaskCluster index
2020-08-11 19:06:14 +02:00
Alexandre Lissy
ccd9241bd0 Rename ctcdecoder python package 2020-08-10 22:45:43 +02:00
Alexandre Lissy
5795173c14 Rename TaskCluster index 2020-08-10 22:08:39 +02:00
lissyx
08cebeda3c
Merge pull request #3239 from lissyx/rename-circleci
Use new name for Docker container and Docker Hub repo
2020-08-10 20:26:15 +02:00
Alexandre Lissy
e83d92c93a Use new name for Docker container and Docker Hub repo 2020-08-10 20:24:45 +02:00
lissyx
86845dd022
Merge pull request #3233 from lissyx/examples-rename-master
Rename DeepSpeech-examples to STT-examples
2020-08-10 19:02:53 +02:00
Alexandre Lissy
7d31f5e349 Rename DeepSpeech-examples to STT-examples 2020-08-10 18:35:34 +02:00
lissyx
3dcb3743ac
Merge pull request #3237 from lissyx/rename-training-package
Rename deepspeech_training package
2020-08-10 18:35:03 +02:00
Alexandre Lissy
6f84bd1996 Rename deepspeech_training package 2020-08-10 16:58:18 +02:00
lissyx
457198c88d
Merge pull request #3232 from lissyx/bump-v0.9.0-alpha.5
Bump VERSION to 0.9.0-alpha.5
2020-08-07 13:16:38 +02:00
Alexandre Lissy
41dcb41691 Bump VERSION to 0.9.0-alpha.5 2020-08-07 11:39:47 +02:00
lissyx
402fc71abf
Merge pull request #3229 from mozilla/nodejs-scoped-name
Use scoped name for npm package
2020-08-07 00:52:05 +02:00
Reuben Morais
50de377953 Use scoped name for npm package 2020-08-06 18:55:42 +02:00
Reuben Morais
0610a7a76f
Merge pull request #3230 from mozilla/rename-nuget-gpu-to-cuda
Rename NuGet -GPU package to -CUDA
2020-08-06 17:43:25 +02:00
Reuben Morais
1e8213c385 Rename NuGet -GPU package to -CUDA 2020-08-06 16:16:33 +02:00
Reuben Morais
c31df0fd4c Bump VERSION to 0.9.0-alpha.4 2020-08-06 14:25:39 +02:00
Reuben Morais
ae9fdb183e Merge branch 'rename-real' 2020-08-06 14:20:39 +02:00
Reuben Morais
0b51004081 Address review comments 2020-08-06 14:20:05 +02:00
Reuben Morais
4d98958b77 iOS: Re-share workspace schemes and fix packaging 2020-08-05 17:49:51 +02:00
Reuben Morais
8c840bed23 Fix .NET build/package, resolve package conflict in Java app 2020-08-05 17:49:51 +02:00
lissyx
2eb75b6206
Merge pull request #3224 from lissyx/electronjs-example
Electron example
2020-08-04 22:23:27 +02:00
Alexandre Lissy
bb24fc89f0 Electron example 2020-08-04 21:57:23 +02:00
Reuben Morais
4d726e820d More renames 2020-08-04 18:04:08 +02:00
Reuben Morais
ee1235678d Missing renames in CI scripts 2020-08-04 15:25:46 +02:00
lissyx
3340cb6b8a
Merge pull request #3218 from lissyx/new-workerType
Fix #3181: Use finer-grained gcp workers
2020-08-04 13:12:06 +02:00
Reuben Morais
b301cdf83e JavaScript rename 2020-08-04 12:13:11 +02:00
Reuben Morais
5449f21a47 Python rename 2020-08-04 12:12:20 +02:00
Reuben Morais
b86a92a5b3 C docs 2020-08-04 12:10:31 +02:00
Reuben Morais
b18639f9c4 Swift rename 2020-08-04 12:09:41 +02:00
Reuben Morais
213590b326 Java rename 2020-08-04 11:39:22 +02:00
Reuben Morais
ee7bf86460 .NET rename 2020-08-04 11:15:27 +02:00
Alexandre Lissy
040f5eb2a3 Fix #3181: Use finer-grained gcp workers 2020-08-04 11:15:07 +02:00
lissyx
b65cd7e810
Merge pull request #3208 from lissyx/fix-linker
Fix #3207: do not force -shared on the linkage
2020-08-04 11:12:12 +02:00
lissyx
9cd6863e4a
Merge pull request #3214 from lissyx/win-workers
Fix #3211: Use win + win-gpu set
2020-08-03 21:55:53 +02:00
Alexandre Lissy
6d5d97abc4 Fix #3207: do not force -shared on the linkage 2020-08-03 18:58:43 +02:00
Alexandre Lissy
b6edcbe08c Fix #3211: Use win + win-gpu set 2020-08-03 18:32:20 +02:00
Reuben Morais
fa21911048 Rename packages, modules, headers, shared libraries to Mozilla Voice STT 2020-08-03 18:22:32 +02:00
Reuben Morais
21e5a74b0c
Merge pull request #3212 from mozilla/link-decoder-docs
Decoder docs: UTF-8 -> Bytes output mode, and link to scorer-scripts (Closes #2978)
2020-08-03 11:57:43 +02:00
Reuben Morais
d182cb7a58
Merge pull request #3213 from mozilla/remove-tensorflow-mention-cuda
Remove mention of TensorFlow docs for CUDA requirements
2020-08-03 11:57:27 +02:00
Reuben Morais
350575ba44 Remove mention of TensorFlow docs for CUDA requirements 2020-08-03 09:22:19 +02:00
Reuben Morais
d9f9d6ed89 Decoder docs: UTF-8 -> Bytes output mode, and link to scorer-scripts 2020-08-03 09:16:38 +02:00
Reuben Morais
04deda0239
Merge pull request #3206 from mrstegeman/alphabet-logic
Fix alphabet logic in generate_scorer_package.
2020-08-02 17:24:13 +02:00
lissyx
482cc534cf
Merge pull request #3204 from lissyx/no-dotnet-examples
Fix #3198: Do not rely on examples repo for building .Net
2020-08-02 11:35:17 +02:00
Alexandre Lissy
c55143d282 Fix #3198: Do not rely on examples repo for building .Net 2020-08-02 02:02:41 +02:00
Michael Stegeman
3024cffe49
Fix alphabet logic in generate_scorer_package.
Fixes #3205
2020-07-31 12:46:03 -08:00
Reuben Morais
41db367428
Merge pull request #3201 from mozilla/update-examples-models
Update examples model to match new naming
2020-07-31 08:42:48 +02:00
Reuben Morais
4b10f0b840 Update examples model to match new naming 2020-07-30 22:39:14 +02:00
Reuben Morais
d3efa4c438
Merge pull request #3199 from mozilla/ios-publish-all
Upload both native_client and .framework for iOS tasks
2020-07-30 22:26:25 +02:00
Reuben Morais
c8441d1f8d Upload both native_client and .framework for iOS tasks 2020-07-30 20:33:58 +02:00
883 changed files with 84281 additions and 8281 deletions

View File

@ -1,89 +0,0 @@
# These environment variables must be set in CircleCI UI
#
# DOCKERHUB_REPO - docker hub repo, format: <username>/<repo>
# DOCKER_USER - login info for docker hub
# DOCKER_PASS
#
version: 2
jobs:
build:
docker:
- image: docker:stable-git
working_directory: /dockerflow
steps:
- checkout
- setup_remote_docker
- run:
name: os-release
command: |
cat /etc/os-release
- run:
name: install make
command: |
apk add make
- run:
name: Create a Dockerfile.train
command: |
make Dockerfile.train \
DEEPSPEECH_REPO="https://github.com/$CIRCLE_PROJECT_USERNAME/$CIRCLE_PROJECT_REPONAME" \
DEEPSPEECH_SHA=$CIRCLE_SHA1
- run:
name: Build Docker image
command: docker build -t app:build -f Dockerfile.train .
# save the built docker container into CircleCI's cache. This is
# required since Workflows do not have the same remote docker instance.
- run:
name: docker save app:build
command: mkdir -p /cache; docker save -o /cache/docker.tar "app:build"
- save_cache:
key: v1-{{ .Branch }}-{{epoch}}
paths:
- /cache/docker.tar
deploy:
docker:
- image: docker:18.02.0-ce
steps:
- setup_remote_docker
- restore_cache:
key: v1-{{.Branch}}
- run:
name: Restore Docker image cache
command: docker load -i /cache/docker.tar
- run:
name: Deploy to Dockerhub
command: |
echo $DOCKER_PASS | docker login -u $DOCKER_USER --password-stdin
# deploy master
if [ "${CIRCLE_BRANCH}" == "master" ]; then
docker tag app:build ${DOCKERHUB_REPO}:latest
docker push ${DOCKERHUB_REPO}:latest
elif [ ! -z "${CIRCLE_TAG}" ]; then
# deploy a release tag...
echo "${DOCKERHUB_REPO}:${CIRCLE_TAG}"
docker tag app:build "${DOCKERHUB_REPO}:${CIRCLE_TAG}"
docker images
docker push "${DOCKERHUB_REPO}:${CIRCLE_TAG}"
fi
workflows:
version: 2
build-deploy:
jobs:
- build:
filters:
tags:
only: /.*/
- deploy:
requires:
- build
filters:
tags:
only: /.*/

5
.dockerignore Normal file
View File

@ -0,0 +1,5 @@
.git/lfs
native_client/ds-swig
native_client/python/dist/*.whl
native_client/ctcdecode/*.a
native_client/javascript/build/

1
.gitattributes vendored
View File

@ -1 +1,2 @@
data/lm/kenlm.scorer filter=lfs diff=lfs merge=lfs -text
.github/actions/check_artifact_exists/dist/index.js binary

40
.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file
View File

@ -0,0 +1,40 @@
---
name: Bug report
about: Create a report to help us improve
title: 'Bug: '
labels: bug
assignees: ''
---
Welcome to the 🐸STT project! We are excited to see your interest, and appreciate your support!
This repository is governed by the Contributor Covenant Code of Conduct. For more details, see the [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) file.
If you've found a bug, please provide the following information:
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Run the following command '...'
2. ...
3. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Environment (please complete the following information):**
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**:
- **TensorFlow installed from (our builds, or upstream TensorFlow)**:
- **TensorFlow version (use command below)**:
- **Python version**:
- **Bazel version (if compiling from source)**:
- **GCC/Compiler version (if compiling from source)**:
- **CUDA/cuDNN version**:
- **GPU model and memory**:
- **Exact command to reproduce**:
**Additional context**
Add any other context about the problem here.

8
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@ -0,0 +1,8 @@
blank_issues_enabled: false
contact_links:
- name: Coqui STT GitHub Discussions
url: https://github.com/coqui-ai/STT/discussions
about: Please ask and answer questions here.
- name: Coqui Security issue disclosure
url: mailto:info@coqui.ai
about: Please report security vulnerabilities here.

View File

@ -0,0 +1,26 @@
---
name: Feature request
about: Suggest an idea for this project
title: 'Feature request: '
labels: enhancement
assignees: ''
---
Welcome to the 🐸STT project! We are excited to see your interest, and appreciate your support!
This repository is governed by the Contributor Covenant Code of Conduct. For more details, see the [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) file.
If you have a feature request, then please provide the following information:
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

View File

@ -0,0 +1,11 @@
name: "Build TensorFlow"
description: "Build TensorFlow Build"
inputs:
flavor:
description: "Build flavor"
required: true
runs:
using: "composite"
steps:
- run: ./ci_scripts/tf-build.sh ${{ inputs.flavor }}
shell: bash

View File

@ -0,0 +1,43 @@
Building and using a TensorFlow cache:
======================================
This action checks for the existence of an artifact in the list of the repo's
artifacts. Since we don't always want to download the artifact, we can't rely
on the official download-artifact action.
Rationale:
----------
Because of the amount of code required to build TensorFlow, the library build
is split into two main parts to make PR runs much faster:
- a TensorFlow prebuild cache
- the actual code of the library
The TensorFlow prebuild cache exists because building TensorFlow (even just
`libtensorflow_cpp.so`) involves a huge amount of code and takes several hours
even on decent systems. So we build it once and cache it, because the
TensorFlow version does not change that often.
However, each PR might change the actual library code, so we rebuild that part
every time.
The `tensorflow_opt-macOS` job checks whether such a build cache already exists.
These caches are stored as artifacts because [GitHub Actions
cache](https://docs.github.com/en/actions/guides/caching-dependencies-to-speed-up-workflows)
has size limitations.
The `build-tensorflow-macOS` job depends on the cache check to know whether it
needs to run an actual build or not (see the workflow sketch after the Hacking
section below).
Hacking:
--------
For hacking into the action, please follow the [GitHub JavaScript
Actions](https://docs.github.com/en/actions/creating-actions/creating-a-javascript-action#commit-tag-and-push-your-action-to-github)
and specifically the usage of `ncc`.
```
$ npm install
$ npx ncc build main.js --license licenses.txt
$ git add dist/
```
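To make the rationale above concrete, here is a minimal workflow-level sketch of the check-then-build flow. It is not taken from this repository's actual workflow files: the runner labels, the cache artifact name, and the bare `tf-*.sh` invocations are assumptions for illustration; only the action path and its `name`/`download`/`status` interface come from the action definition below.
```yaml
jobs:
  # Check whether the TensorFlow prebuild cache already exists for this flavor/platform.
  tensorflow_opt-macOS:
    runs-on: macos-10.15                                 # hypothetical runner label
    outputs:
      status: ${{ steps.check_artifact_exists.outputs.status }}
    steps:
      - uses: actions/checkout@v2
      - id: check_artifact_exists
        uses: ./.github/actions/check_artifact_exists
        with:
          name: "tensorflow_prebuild-macOS.tar.xz"       # hypothetical cache artifact name
          download: false

  # Only rebuild TensorFlow when the cache artifact is missing.
  build-tensorflow-macOS:
    needs: tensorflow_opt-macOS
    if: needs.tensorflow_opt-macOS.outputs.status == 'missing'
    runs-on: macos-10.15
    steps:
      - uses: actions/checkout@v2
        with:
          submodules: recursive
      - run: ./ci_scripts/tf-setup.sh    # real jobs pass a flavor argument to these scripts
      - run: ./ci_scripts/tf-build.sh
      - run: ./ci_scripts/tf-package.sh
```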

View File

@ -0,0 +1,32 @@
name: "check/download artifacts"
description: "Check and download that an artifact exists"
inputs:
name:
description: "Artifact name"
required: true
github_token:
description: "GitHub token"
required: false
default: ${{ github.token }}
download:
description: "Should we download?"
required: false
default: false
path:
description: "Where to unpack the artifact"
required: false
default: "./"
repo:
description: "Repository name with owner (like actions/checkout)"
required: false
default: ${{ github.repository }}
release-tag:
description: "Tag of release to check artifacts under"
required: false
default: "v0.10.0-alpha.7"
outputs:
status:
description: "Status string of the artifact: 'missing' or 'found'"
runs:
using: "node12"
main: "dist/index.js"

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

View File

@ -0,0 +1,132 @@
const core = require('@actions/core');
const github = require('@actions/github');
const AdmZip = require('adm-zip');
const filesize = require('filesize');
const pathname = require('path');
const fs = require('fs');
const { throttling } = require('@octokit/plugin-throttling');
const { GitHub } = require('@actions/github/lib/utils');
const Download = require('download');
const Util = require('util');
const Stream = require('stream');
const Pipeline = Util.promisify(Stream.pipeline);
async function getGoodArtifacts(client, owner, repo, releaseId, name) {
console.log(`==> GET /repos/${owner}/${repo}/releases/${releaseId}/assets`);
const goodRepoArtifacts = await client.paginate(
"GET /repos/{owner}/{repo}/releases/{release_id}/assets",
{
owner: owner,
repo: repo,
release_id: releaseId,
per_page: 100,
},
(releaseAssets, done) => {
console.log(" ==> releaseAssets", releaseAssets);
const goodAssets = releaseAssets.data.filter((a) => {
console.log("==> Asset check", a);
return a.name == name
});
if (goodAssets.length > 0) {
done();
}
return goodAssets;
}
);
console.log("==> maybe goodRepoArtifacts:", goodRepoArtifacts);
return goodRepoArtifacts;
}
async function main() {
try {
const token = core.getInput("github_token", { required: true });
const [owner, repo] = core.getInput("repo", { required: true }).split("/");
const path = core.getInput("path", { required: true });
const name = core.getInput("name");
const download = core.getInput("download");
const releaseTag = core.getInput("release-tag");
const OctokitWithThrottling = GitHub.plugin(throttling);
const client = new OctokitWithThrottling({
auth: token,
throttle: {
onRateLimit: (retryAfter, options) => {
console.log(
`Request quota exhausted for request ${options.method} ${options.url}`
);
// Retry twice after hitting a rate limit error, then give up
if (options.request.retryCount <= 2) {
console.log(`Retrying after ${retryAfter} seconds!`);
return true;
} else {
console.log("Exhausted 2 retries");
core.setFailed("Exhausted 2 retries");
}
},
onAbuseLimit: (retryAfter, options) => {
// does not retry, only logs a warning
console.log(
`Abuse detected for request ${options.method} ${options.url}`
);
core.setFailed(`GitHub REST API Abuse detected for request ${options.method} ${options.url}`)
},
},
});
console.log("==> Repo:", owner + "/" + repo);
const releaseInfo = await client.repos.getReleaseByTag({
owner,
repo,
tag: releaseTag,
});
console.log(`==> Release info for tag ${releaseTag} = ${JSON.stringify(releaseInfo.data, null, 2)}`);
const releaseId = releaseInfo.data.id;
const goodArtifacts = await getGoodArtifacts(client, owner, repo, releaseId, name);
console.log("==> goodArtifacts:", goodArtifacts);
const artifactStatus = goodArtifacts.length === 0 ? "missing" : "found";
console.log("==> Artifact", name, artifactStatus);
console.log("==> download", download);
core.setOutput("status", artifactStatus);
if (artifactStatus === "found" && download == "true") {
console.log("==> # artifacts:", goodArtifacts.length);
const artifact = goodArtifacts[0];
console.log("==> Artifact:", artifact.id)
const size = filesize(artifact.size, { base: 10 })
console.log(`==> Downloading: ${artifact.name} (${size}) to path: ${path}`)
const dir = pathname.dirname(path)
console.log(`==> Creating containing dir if needed: ${dir}`)
fs.mkdirSync(dir, { recursive: true })
await Pipeline(
Download(artifact.url, {
headers: {
"Accept": "application/octet-stream",
"Authorization": `token ${token}`,
},
}),
fs.createWriteStream(path)
)
}
if (artifactStatus === "missing" && download == "true") {
core.setFailed("Required", name, "that is missing");
}
return;
} catch (err) {
console.error(err.stack);
core.setFailed(err.message);
}
}
main();

1139
.github/actions/check_artifact_exists/package-lock.json generated vendored Normal file

File diff suppressed because it is too large

View File

@ -0,0 +1,13 @@
{
"name": "check_artifact_exists",
"main": "main.js",
"devDependencies": {
"@actions/core": "^1.2.6",
"@actions/github": "^4.0.0",
"@octokit/plugin-throttling": "^3.4.1",
"@vercel/ncc": "^0.27.0",
"adm-zip": "^0.5.2",
"download": "^8.0.0",
"filesize": "^6.1.0"
}
}

View File

@ -0,0 +1,29 @@
name: "chroot bind mount"
description: "Bind mount into chroot"
inputs:
mounts:
description: "Path to consider"
required: true
runs:
using: "composite"
steps:
- id: install_qemu
run: |
sudo apt-get update -y
sudo apt-get install -y --no-install-recommends qemu-user-static
shell: bash
- id: bind_mount_chroot
run: |
set -xe
# Bind-mount so that we have the same tree inside the chroot
for dev in ${{ github.workspace }} ${{ inputs.mounts }};
do
sudo mount -o bind ${dev} ${{ env.SYSTEM_RASPBIAN }}${dev}
done;
for dev in ${{ inputs.mounts }};
do
sudo mount -o bind /${dev} ${{ env.SYSTEM_RASPBIAN }}/${dev}
done;
shell: bash

15
.github/actions/get_cache_key/README.md vendored Normal file
View File

@ -0,0 +1,15 @@
GitHub Action to compute cache key
==================================
It is intended to work in harmony with `check_artifact_exists`:
- compute a stable cache key
- be as simple to use as possible (fewer parameters)
It expects to be run in a GitHub Actions job whose name follows
`SUBMODULE_FLAVOR-PLATFORM`:
- the `SUBMODULE` part is used to look up the current SHA1 of that git submodule.
- the `FLAVOR` part distinguishes e.g. opt/dbg builds
- the `PLATFORM` part defines an OS/arch pair
It also accepts an `extras` input for further customization, such as forcing a
re-build (a usage sketch follows below).
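A minimal usage sketch, assuming a hypothetical runner label, artifact naming scheme, and `extras` value: the job name `tensorflow_opt-macOS` would be parsed as SUBMODULE=`tensorflow`, FLAVOR=`opt`, OSARCH=`macOS`, and the resulting key is then fed to `check_artifact_exists`.
```yaml
jobs:
  tensorflow_opt-macOS:
    runs-on: macos-10.15                      # hypothetical runner label
    steps:
      - uses: actions/checkout@v2             # the key is derived from the submodule SHA1
      - id: get_cache_key
        uses: ./.github/actions/get_cache_key
        with:
          extras: "1"                         # bump to force a cache rebuild
      - id: check_artifact_exists
        uses: ./.github/actions/check_artifact_exists
        with:
          name: ${{ steps.get_cache_key.outputs.key }}.tar.xz    # hypothetical naming
          download: true
          path: ${{ github.workspace }}/tensorflow_cache.tar.xz  # hypothetical path
```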

View File

@ -0,0 +1,34 @@
name: "get cache key for submodule"
description: "Compute a cache key based on git submodule"
inputs:
extras:
description: "Extra cache key value"
required: true
osarch:
description: "Override automatic OSARCH value"
required: false
outputs:
key:
description: "Computed cache key name"
value: ${{ steps.compute_cache_key.outputs.key }}
runs:
using: "composite"
steps:
- id: compute_cache_key
run: |
set -xe
JOB=${{ github.job }}
SUBMODULE=$(echo $JOB | cut -d'-' -f1 | cut -d'_' -f1)
FLAVOR=$(echo $JOB | cut -d'-' -f1 | cut -d'_' -f2)
if [ -z "${{ inputs.osarch }}" ]; then
OSARCH=$(echo $JOB | cut -d'-' -f2)
else
OSARCH=${{ inputs.osarch }}
fi
SHA=$(git submodule status ${SUBMODULE} | sed -e 's/^-//g' -e 's/^+//g' -e 's/^U//g' | awk '{ print $1 }')
KEY=${SUBMODULE}-${FLAVOR}_${OSARCH}_${SHA}_${{ inputs.extras }}
echo "::set-output name=key::${KEY}"
shell: bash

View File

@ -0,0 +1,30 @@
name: "Install Python"
description: "Installing an upstream python release"
inputs:
version:
description: "Python version"
required: true
runs:
using: "composite"
steps:
- shell: bash
run: |
set -xe
curl https://www.python.org/ftp/python/${{ inputs.version }}/python-${{ inputs.version }}-macosx10.9.pkg -o "python.pkg"
- shell: bash
run: ls -hal .
- shell: bash
run: |
set -xe
sudo installer -verbose -pkg python.pkg -target /
- shell: bash
run: |
set -xe
which python3
python3 --version
python3 -c "import sysconfig; print(sysconfig.get_config_var('MACOSX_DEPLOYMENT_TARGET'))"
- shell: bash
name: Set up venv with upstream Python
run: |
python3 -m venv /tmp/venv
echo "/tmp/venv/bin" >> $GITHUB_PATH

18
.github/actions/install-xldd/action.yml vendored Normal file
View File

@ -0,0 +1,18 @@
name: "xldd install"
description: "Install xldd"
inputs:
target:
description: "System target"
required: true
runs:
using: "composite"
steps:
- id: install_xldd
run: |
source ./ci_scripts/all-vars.sh
# -s required to avoid the noisy output like "Entering / Leaving directories"
toolchain=$(make -s -C ${DS_DSDIR}/native_client/ TARGET=${{ inputs.target }} TFDIR=${DS_TFDIR} print-toolchain)
if [ ! -x "${toolchain}ldd" ]; then
cp "${DS_DSDIR}/native_client/xldd" "${toolchain}ldd" && chmod +x "${toolchain}ldd"
fi
shell: bash

12
.github/actions/libstt-build/action.yml vendored Normal file
View File

@ -0,0 +1,12 @@
name: "Build libstt.so"
description: "Build libstt.so"
inputs:
arch:
description: "Target arch for loading script (host/armv7/aarch64)"
required: false
default: "host"
runs:
using: "composite"
steps:
- run: ./ci_scripts/${{ inputs.arch }}-build.sh
shell: bash

67
.github/actions/multistrap/action.yml vendored Normal file
View File

@ -0,0 +1,67 @@
name: "multistrap install"
description: "Install a system root using multistrap"
inputs:
arch:
description: "Target arch"
required: true
packages:
description: "Extra packages to install"
required: false
default: ""
runs:
using: "composite"
steps:
- id: install_multistrap
run: |
sudo apt-get update -y
sudo apt-get install -y --no-install-recommends multistrap qemu-user-static
shell: bash
- id: create_chroot
run: |
set -xe
multistrap_conf=""
if [ "${{ inputs.arch }}" = "armv7" ]; then
multistrap_conf=multistrap_raspbian_buster.conf
wget http://archive.raspbian.org/raspbian/pool/main/r/raspbian-archive-keyring/raspbian-archive-keyring_20120528.2_all.deb && sudo dpkg -i raspbian-archive-keyring_20120528.2_all.deb
fi
if [ "${{ inputs.arch }}" = "aarch64" ]; then
multistrap_conf=multistrap_armbian64_buster.conf
fi
multistrap -d ${{ env.SYSTEM_RASPBIAN }} -f ${{ github.workspace }}/native_client/${multistrap_conf}
if [ ! -z "${{ inputs.packages }}" ]; then
TO_MOUNT=${{ github.workspace }}
# Prepare target directory to bind-mount the github tree
mkdir -p ${{ env.SYSTEM_RASPBIAN }}/${{ github.workspace }}
# Bind-mount so that we have the same tree inside the chroot
for dev in ${TO_MOUNT};
do
sudo mount -o bind ${dev} ${{ env.SYSTEM_RASPBIAN }}${dev}
done;
# Copy some host data:
# resolv.conf: for getting DNS working
# passwd, group, shadow: to have user accounts and apt-get install working
for ff in resolv.conf passwd group shadow;
do
sudo cp /etc/${ff} ${{ env.SYSTEM_RASPBIAN }}/etc/
done;
# Perform apt steps.
# Preserving the env is required
sudo --preserve-env chroot ${{ env.SYSTEM_RASPBIAN }}/ apt-get update -y
sudo --preserve-env chroot ${{ env.SYSTEM_RASPBIAN }}/ apt-get install -y --no-install-recommends ${{ inputs.packages }}
# Cleanup apt info to save space
sudo --preserve-env chroot ${{ env.SYSTEM_RASPBIAN }}/ rm -fr /var/cache/apt/* /var/lib/apt/lists/*
# Unmount what has been mounted
for dev in ${TO_MOUNT};
do
sudo umount ${{ env.SYSTEM_RASPBIAN }}${dev}
done;
fi
shell: bash

77
.github/actions/node-build/action.yml vendored Normal file
View File

@ -0,0 +1,77 @@
name: "NodeJS binding"
description: "Binding a nodejs binding"
inputs:
nodejs_versions:
description: "NodeJS versions supported"
required: true
electronjs_versions:
description: "ElectronJS versions supported"
required: true
local_cflags:
description: "CFLAGS for NodeJS package"
required: false
default: ""
local_ldflags:
description: "LDFLAGS for NodeJS package"
required: false
default: ""
local_libs:
description: "LIBS for NodeJS package"
required: false
default: ""
target:
description: "TARGET value"
required: false
default: "host"
chroot:
description: "RASPBIAN value"
required: false
default: ""
runs:
using: "composite"
steps:
- run: |
node --version
npm --version
shell: bash
- run: |
npm update
shell: bash
- run: |
mkdir -p tmp/headers/nodejs tmp/headers/electronjs
shell: bash
- run: |
for node in ${{ inputs.nodejs_versions }}; do
EXTRA_CFLAGS=${{ inputs.local_cflags }} \
EXTRA_LDFLAGS=${{ inputs.local_ldflags }} \
EXTRA_LIBS=${{ inputs.local_libs }} \
make -C native_client/javascript \
TARGET=${{ inputs.target }} \
RASPBIAN=${{ inputs.chroot }} \
NODE_ABI_TARGET=--target=${node} \
NODE_DEVDIR=--devdir=headers/nodejs \
clean node-wrapper
done;
shell: bash
- run: |
for electron in ${{ inputs.electronjs_versions }}; do
EXTRA_CFLAGS=${{ inputs.local_cflags }} \
EXTRA_LDFLAGS=${{ inputs.local_ldflags }} \
EXTRA_LIBS=${{ inputs.local_libs }} \
make -C native_client/javascript \
TARGET=${{ inputs.target }} \
RASPBIAN=${{ inputs.chroot }} \
NODE_ABI_TARGET=--target=${electron} \
NODE_DIST_URL=--disturl=https://electronjs.org/headers \
NODE_RUNTIME=--runtime=electron \
NODE_DEVDIR=--devdir=headers/electronjs \
clean node-wrapper
done;
shell: bash
- run: |
make -C native_client/javascript clean npm-pack
shell: bash
- run: |
tar -czf native_client/javascript/wrapper.tar.gz \
-C native_client/javascript/ lib/
shell: bash

22
.github/actions/node-install/action.yml vendored Normal file
View File

@ -0,0 +1,22 @@
name: "nodejs install"
description: "Install nodejs in a chroot"
inputs:
node:
description: "NodeJS version"
required: true
runs:
using: "composite"
steps:
- id: add_apt_source
run: |
set -ex
(echo "Package: nodejs" && echo "Pin: origin deb.nodesource.com" && echo "Pin-Priority: 999") > ${{ env.SYSTEM_RASPBIAN }}/etc/apt/preferences
echo "deb http://deb.nodesource.com/node_${{ inputs.node }}.x buster main" > ${{ env.SYSTEM_RASPBIAN }}/etc/apt/sources.list.d/nodesource.list
wget -qO- https://deb.nodesource.com/gpgkey/nodesource.gpg.key | sudo --preserve-env chroot ${{ env.SYSTEM_RASPBIAN }}/ apt-key add -
shell: bash
- id: install_nodejs
run: |
set -ex
sudo --preserve-env chroot ${{ env.SYSTEM_RASPBIAN }}/ apt-get update -y
sudo --preserve-env chroot ${{ env.SYSTEM_RASPBIAN }}/ apt-get install -y nodejs
shell: bash

14
.github/actions/numpy_vers/README.md vendored Normal file
View File

@ -0,0 +1,14 @@
GitHub Action to set NumPy versions
===================================
This action computes correct values for the NumPy dependencies:
- `NUMPY_BUILD_VERSION`: range of accepted versions at Python binding build time
- `NUMPY_DEP_VERSION`: range of accepted versions at execution time
Versions are set considering several factors:
- API and ABI compatibility; otherwise the binding wrapper can throw errors
like "Illegal instruction", or compute wrong values because of a changed
memory layout
- Wheel availability: for CI and end users, we want to avoid having to rebuild
NumPy, so we stick to versions for which an upstream `wheel` file already
exists (a usage sketch follows below)
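A minimal usage sketch (hypothetical job and runner; the real workflows may wire this differently): the Python version goes in as `pyver`, the computed ranges come back as `build_version`/`dep_version`, and the `python-build` action then consumes them.
```yaml
jobs:
  python-bindings:
    runs-on: ubuntu-20.04                     # hypothetical runner label
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.7"
      - id: numpy_versions
        uses: ./.github/actions/numpy_vers
        with:
          pyver: "3.7"
      - uses: ./.github/actions/python-build
        with:
          numpy_build: ${{ steps.numpy_versions.outputs.build_version }}
          numpy_dep: ${{ steps.numpy_versions.outputs.dep_version }}
```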

93
.github/actions/numpy_vers/action.yml vendored Normal file
View File

@ -0,0 +1,93 @@
name: "get numpy versions"
description: "Get proper NumPy build and runtime versions dependencies range"
inputs:
pyver:
description: "Python version"
required: true
outputs:
build_version:
description: "NumPy build dependency"
value: ${{ steps.numpy.outputs.build }}
dep_version:
description: "NumPy runtime dependency"
value: ${{ steps.numpy.outputs.dep }}
runs:
using: "composite"
steps:
- id: numpy
run: |
set -ex
NUMPY_BUILD_VERSION="==1.7.0"
NUMPY_DEP_VERSION=">=1.7.0"
OS=$(uname -s)
ARCH=$(uname -m)
case "${OS}:${ARCH}" in
Linux:x86_64)
case "${{ inputs.pyver }}" in
3.7*)
NUMPY_BUILD_VERSION="==1.14.5"
NUMPY_DEP_VERSION=">=1.14.5,<=1.19.4"
;;
3.8*)
NUMPY_BUILD_VERSION="==1.17.3"
NUMPY_DEP_VERSION=">=1.17.3,<=1.19.4"
;;
3.9*)
NUMPY_BUILD_VERSION="==1.19.4"
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"
;;
esac
;;
Darwin:*)
case "${{ inputs.pyver }}" in
3.6*)
NUMPY_BUILD_VERSION="==1.9.0"
NUMPY_DEP_VERSION=">=1.9.0"
;;
3.7*)
NUMPY_BUILD_VERSION="==1.14.5"
NUMPY_DEP_VERSION=">=1.14.5,<=1.17.0"
;;
3.8*)
NUMPY_BUILD_VERSION="==1.17.3"
NUMPY_DEP_VERSION=">=1.17.3,<=1.17.3"
;;
3.9*)
NUMPY_BUILD_VERSION="==1.19.4"
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"
;;
esac
;;
${CI_MSYS_VERSION}:x86_64)
case "${{ inputs.pyver }}" in
3.5*)
NUMPY_BUILD_VERSION="==1.11.0"
NUMPY_DEP_VERSION=">=1.11.0,<1.12.0"
;;
3.6*)
NUMPY_BUILD_VERSION="==1.12.0"
NUMPY_DEP_VERSION=">=1.12.0,<1.14.5"
;;
3.7*)
NUMPY_BUILD_VERSION="==1.14.5"
NUMPY_DEP_VERSION=">=1.14.5,<=1.17.0"
;;
3.8*)
NUMPY_BUILD_VERSION="==1.17.3"
NUMPY_DEP_VERSION=">=1.17.3,<=1.17.3"
;;
3.9*)
NUMPY_BUILD_VERSION="==1.19.4"
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"
;;
esac
;;
esac
echo "::set-output name=build::${NUMPY_BUILD_VERSION}"
echo "::set-output name=dep::${NUMPY_DEP_VERSION}"
shell: bash

View File

@ -0,0 +1,7 @@
name: "Package TensorFlow"
description: "Package TensorFlow Build"
runs:
using: "composite"
steps:
- run: ./ci_scripts/tf-package.sh
shell: bash

7
.github/actions/package/action.yml vendored Normal file
View File

@ -0,0 +1,7 @@
name: "Package lib"
description: "Package of lib"
runs:
using: "composite"
steps:
- run: ./ci_scripts/package.sh
shell: bash

58
.github/actions/python-build/action.yml vendored Normal file
View File

@ -0,0 +1,58 @@
name: "Python binding"
description: "Binding a python binding"
inputs:
numpy_build:
description: "NumPy build dependecy"
required: true
numpy_dep:
description: "NumPy runtime dependecy"
required: true
local_cflags:
description: "CFLAGS for Python package"
required: false
default: ""
local_ldflags:
description: "LDFLAGS for Python package"
required: false
default: ""
local_libs:
description: "LIBS for Python package"
required: false
default: ""
target:
description: "TARGET value"
required: false
default: "host"
chroot:
description: "RASPBIAN value"
required: false
default: ""
runs:
using: "composite"
steps:
- run: |
python3 --version
pip3 --version
shell: bash
- run: |
set -xe
PROJECT_NAME="stt"
OS=$(uname)
if [ "${OS}" = "Linux" -a "${{ inputs.target }}" != "host" ]; then
python3 -m venv stt-build
source stt-build/bin/activate
fi
NUMPY_BUILD_VERSION="${{ inputs.numpy_build }}" \
NUMPY_DEP_VERSION="${{ inputs.numpy_dep }}" \
EXTRA_CFLAGS=${{ inputs.local_cflags }} \
EXTRA_LDFLAGS=${{ inputs.local_ldflags }} \
EXTRA_LIBS=${{ inputs.local_libs }} \
make -C native_client/python/ \
TARGET=${{ inputs.target }} \
RASPBIAN=${{ inputs.chroot }} \
SETUP_FLAGS="--project_name ${PROJECT_NAME}" \
bindings-clean bindings
shell: bash

35
.github/actions/run-tests/action.yml vendored Normal file
View File

@ -0,0 +1,35 @@
name: "Tests execution"
description: "Running tests"
inputs:
runtime:
description: "Runtime to use for running test"
required: true
model-kind:
description: "Running against CI baked or production model"
required: true
bitrate:
description: "Bitrate for testing"
required: true
chroot:
description: "Run using a chroot"
required: false
runs:
using: "composite"
steps:
- run: |
set -xe
build="_tflite"
model_kind=""
if [ "${{ inputs.model-kind }}" = "prod" ]; then
model_kind="-prod"
fi
prefix="."
if [ ! -z "${{ inputs.chroot }}" ]; then
prefix="${{ inputs.chroot }}"
fi
${prefix}/ci_scripts/${{ inputs.runtime }}${build}-tests${model_kind}.sh ${{ inputs.bitrate }}
shell: bash

11
.github/actions/select-xcode/action.yml vendored Normal file
View File

@ -0,0 +1,11 @@
name: "Select XCode version"
description: "Select XCode version"
inputs:
version:
description: "XCode version"
required: true
runs:
using: "composite"
steps:
- run: sudo xcode-select --switch /Applications/Xcode_${{ inputs.version }}.app
shell: bash

View File

@ -0,0 +1,12 @@
name: "Setup TensorFlow"
description: "Setup TensorFlow Build"
inputs:
flavor:
description: "Target flavor for setup script (empty/android-armv7/android-arm64)"
required: false
default: ""
runs:
using: "composite"
steps:
- run: ./ci_scripts/tf-setup.sh ${{ inputs.flavor }}
shell: bash

View File

@ -0,0 +1,89 @@
name: "Upload cache asset to release"
description: "Upload a build cache asset to a release"
inputs:
name:
description: "Artifact name"
required: true
path:
description: "Path of file to upload"
required: true
token:
description: "GitHub token"
required: false
default: ${{ github.token }}
repo:
description: "Repository name with owner (like actions/checkout)"
required: false
default: ${{ github.repository }}
release-tag:
description: "Tag of release to check artifacts under"
required: false
default: "v0.10.0-alpha.7"
runs:
using: "composite"
steps:
- run: |
set -xe
asset_name="${{ inputs.name }}"
filenames="${{ inputs.path }}"
if [ $(compgen -G "$filenames" | wc -l) -gt 1 -a -n "$asset_name" ]; then
echo "Error: multiple input files specified, but also specified an asset_name."
echo "When uploading multiple files leave asset_name empty to use the file names as asset names."
exit 1
fi
# Check input
for file in $filenames; do
if [[ ! -f $file ]]; then
echo "Error: Input file (${filename}) missing"
exit 1;
fi
done
AUTH="Authorization: token ${{ inputs.token }}"
owner=$(echo "${{inputs.repo}}" | cut -f1 -d/)
repo=$(echo "${{inputs.repo}}" | cut -f2 -d/)
tag="${{ inputs.release-tag }}"
GH_REPO="https://api.github.com/repos/${owner}/${repo}"
# Check token
curl -o /dev/null -sH "$AUTH" $GH_REPO || {
echo "Error: Invalid repo, token or network issue!"
exit 1
}
# Check if tag exists
response=$(curl -sH "$AUTH" "${GH_REPO}/git/refs/tags/${tag}")
eval $(echo "$response" | grep -m 1 "sha.:" | grep -w sha | tr : = | tr -cd '[[:alnum:]]=')
[ "$sha" ] || {
echo "Error: Tag does not exist: $tag"
echo "$response" | awk 'length($0)<100' >&2
exit 1
}
# Get ID of the release based on given tag name
GH_TAGS="${GH_REPO}/releases/tags/${tag}"
response=$(curl -sH "$AUTH" $GH_TAGS)
eval $(echo "$response" | grep -m 1 "id.:" | grep -w id | tr : = | tr -cd '[[:alnum:]]=')
[ "$id" ] || {
echo "Error: Could not find release for tag: $tag"
echo "$response" | awk 'length($0)<100' >&2
exit 1
}
# Upload assets
for file in $filenames; do
if [ -z $asset_name ]; then
asset=$(basename $file)
else
asset=$asset_name
fi
echo "Uploading asset with name: $asset from file: $file"
GH_ASSET="https://uploads.github.com/repos/${owner}/${repo}/releases/${id}/assets?name=${asset}"
curl -T $file -X POST -H "${AUTH}" -H "Content-Type: application/octet-stream" $GH_ASSET
done
shell: bash
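The least obvious part of the script above is how it turns the GitHub API's pretty-printed JSON into shell variables. A minimal sketch of that `grep`/`tr`/`eval` step, using a fabricated response (the id value is made up):

```bash
# Fabricated fragment of a releases API response (one field per line, as the API returns it).
response='
  "id": 12345,
  "tag_name": "v0.10.0-alpha.7"
'
# Keep the first line containing "id":, rewrite `"id": 12345,` into `id=12345`, then eval it.
eval $(echo "$response" | grep -m 1 '"id":' | grep -w id | tr : = | tr -cd '[[:alnum:]]=')
echo "$id"   # -> 12345
```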


@ -0,0 +1,12 @@
name: "Install SoX and add to PATH"
description: "Install SoX and add to PATH"
runs:
using: "composite"
steps:
- run: |
set -ex
curl -sSLO https://github.com/coqui-ai/STT/releases/download/v0.10.0-alpha.7/sox-14.4.2-win32.zip
"C:/Program Files/7-Zip/7z.exe" x -o`pwd`/bin/ -tzip -aoa sox-14.4.2-win32.zip
rm sox-*zip
echo "`pwd`/bin/sox-14.4.2/" >> $GITHUB_PATH
shell: bash


@ -0,0 +1,77 @@
name: "NodeJS binding"
description: "Binding a nodejs binding"
inputs:
nodejs_versions:
description: "NodeJS versions supported"
required: true
electronjs_versions:
description: "ElectronJS versions supported"
required: true
local_cflags:
description: "CFLAGS for NodeJS package"
required: false
default: ""
local_ldflags:
description: "LDFLAGS for NodeJS package"
required: false
default: ""
local_libs:
description: "LIBS for NodeJS package"
required: false
default: ""
target:
description: "TARGET value"
required: false
default: "host"
chroot:
description: "RASPBIAN value"
required: false
default: ""
runs:
using: "composite"
steps:
- run: |
node --version
npm --version
shell: msys2 {0}
- run: |
npm update
shell: msys2 {0}
- run: |
mkdir -p tmp/headers/nodejs tmp/headers/electronjs
shell: msys2 {0}
- run: |
for node in ${{ inputs.nodejs_versions }}; do
EXTRA_CFLAGS=${{ inputs.local_cflags }} \
EXTRA_LDFLAGS=${{ inputs.local_ldflags }} \
EXTRA_LIBS=${{ inputs.local_libs }} \
make -C native_client/javascript \
TARGET=${{ inputs.target }} \
RASPBIAN=${{ inputs.chroot }} \
NODE_ABI_TARGET=--target=${node} \
NODE_DEVDIR=--devdir=headers/nodejs \
clean node-wrapper
done;
shell: msys2 {0}
- run: |
for electron in ${{ inputs.electronjs_versions }}; do
EXTRA_CFLAGS=${{ inputs.local_cflags }} \
EXTRA_LDFLAGS=${{ inputs.local_ldflags }} \
EXTRA_LIBS=${{ inputs.local_libs }} \
make -C native_client/javascript \
TARGET=${{ inputs.target }} \
RASPBIAN=${{ inputs.chroot }} \
NODE_ABI_TARGET=--target=${electron} \
NODE_DIST_URL=--disturl=https://electronjs.org/headers \
NODE_RUNTIME=--runtime=electron \
NODE_DEVDIR=--devdir=headers/electronjs \
clean node-wrapper
done;
shell: msys2 {0}
- run: |
make -C native_client/javascript clean npm-pack
shell: msys2 {0}
- run: |
tar -czf native_client/javascript/wrapper.tar.gz \
-C native_client/javascript/ lib/
shell: msys2 {0}


@ -0,0 +1,14 @@
GitHub Action to set NumPy versions
===================================
This action aims at computing correct values for NumPy dependencies:
- `NUMPY_BUILD_VERSION`: range of accepted versions at Python binding build time
- `NUMPY_DEP_VERSION`: range of accepted versions for execution time
Versions are set considering several factors:
- API and ABI compatibility; otherwise the binding wrapper can throw errors
like "Illegal instruction", or compute wrong values because of a changed
memory layout
- Wheel availability: for CI and end users, we want to avoid having to
rebuild NumPy, so we stick to versions for which an upstream `wheel` file
already exists
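As an illustration only (the values are the ones the action below selects for Python 3.9 on Linux/x86_64), the two outputs end up being used as pip-style version specifiers:

```bash
# Hypothetical consumption of the two outputs; the real build wires them in
# through the NUMPY_BUILD_VERSION / NUMPY_DEP_VERSION environment variables.
NUMPY_BUILD_VERSION="==1.19.4"          # exact pin used while building the binding
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"   # range accepted at install/run time

pip3 install "numpy${NUMPY_BUILD_VERSION}"   # build against the pinned version
echo "numpy${NUMPY_DEP_VERSION}"             # runtime requirement the package will declare
```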


@ -0,0 +1,93 @@
name: "get numpy versions"
description: "Get proper NumPy build and runtime versions dependencies range"
inputs:
pyver:
description: "Python version"
required: true
outputs:
build_version:
description: "NumPy build dependency"
value: ${{ steps.numpy.outputs.build }}
dep_version:
description: "NumPy runtime dependency"
value: ${{ steps.numpy.outputs.dep }}
runs:
using: "composite"
steps:
- id: numpy
run: |
set -ex
NUMPY_BUILD_VERSION="==1.7.0"
NUMPY_DEP_VERSION=">=1.7.0"
OS=$(uname -s)
ARCH=$(uname -m)
case "${OS}:${ARCH}" in
Linux:x86_64)
case "${{ inputs.pyver }}" in
3.7*)
NUMPY_BUILD_VERSION="==1.14.5"
NUMPY_DEP_VERSION=">=1.14.5,<=1.19.4"
;;
3.8*)
NUMPY_BUILD_VERSION="==1.17.3"
NUMPY_DEP_VERSION=">=1.17.3,<=1.19.4"
;;
3.9*)
NUMPY_BUILD_VERSION="==1.19.4"
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"
;;
esac
;;
Darwin:*)
case "${{ inputs.pyver }}" in
3.6*)
NUMPY_BUILD_VERSION="==1.9.0"
NUMPY_DEP_VERSION=">=1.9.0"
;;
3.7*)
NUMPY_BUILD_VERSION="==1.14.5"
NUMPY_DEP_VERSION=">=1.14.5,<=1.17.0"
;;
3.8*)
NUMPY_BUILD_VERSION="==1.17.3"
NUMPY_DEP_VERSION=">=1.17.3,<=1.17.3"
;;
3.9*)
NUMPY_BUILD_VERSION="==1.19.4"
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"
;;
esac
;;
${CI_MSYS_VERSION}:x86_64)
case "${{ inputs.pyver }}" in
3.5*)
NUMPY_BUILD_VERSION="==1.11.0"
NUMPY_DEP_VERSION=">=1.11.0,<1.12.0"
;;
3.6*)
NUMPY_BUILD_VERSION="==1.12.0"
NUMPY_DEP_VERSION=">=1.12.0,<1.14.5"
;;
3.7*)
NUMPY_BUILD_VERSION="==1.14.5"
NUMPY_DEP_VERSION=">=1.14.5,<=1.17.0"
;;
3.8*)
NUMPY_BUILD_VERSION="==1.17.3"
NUMPY_DEP_VERSION=">=1.17.3,<=1.17.3"
;;
3.9*)
NUMPY_BUILD_VERSION="==1.19.4"
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"
;;
esac
;;
esac
echo "::set-output name=build::${NUMPY_BUILD_VERSION}"
echo "::set-output name=dep::${NUMPY_DEP_VERSION}"
shell: msys2 {0}


@ -0,0 +1,31 @@
name: "Python binding"
description: "Binding a python binding"
inputs:
numpy_build:
description: "NumPy build dependecy"
required: true
numpy_dep:
description: "NumPy runtime dependecy"
required: true
runs:
using: "composite"
steps:
- run: |
set -xe
python3 --version
pip3 --version
PROJECT_NAME="stt"
NUMPY_BUILD_VERSION="${{ inputs.numpy_build }}" \
NUMPY_DEP_VERSION="${{ inputs.numpy_dep }}" \
EXTRA_CFLAGS=${{ inputs.local_cflags }} \
EXTRA_LDFLAGS=${{ inputs.local_ldflags }} \
EXTRA_LIBS=${{ inputs.local_libs }} \
make -C native_client/python/ \
TARGET=${{ inputs.target }} \
RASPBIAN=${{ inputs.chroot }} \
SETUP_FLAGS="--project_name ${PROJECT_NAME}" \
bindings-clean bindings
shell: msys2 {0}


@ -0,0 +1,35 @@
name: "Tests execution"
description: "Running tests"
inputs:
runtime:
description: "Runtime to use for running test"
required: true
model-kind:
description: "Running against CI baked or production model"
required: true
bitrate:
description: "Bitrate for testing"
required: true
chroot:
description: "Run using a chroot"
required: false
runs:
using: "composite"
steps:
- run: |
set -xe
build="_tflite"
model_kind=""
if [ "${{ inputs.model-kind }}" = "prod" ]; then
model_kind="-prod"
fi
prefix="."
if [ ! -z "${{ inputs.chroot }}" ]; then
prefix="${{ inputs.chroot }}"
fi
${prefix}/ci_scripts/${{ inputs.runtime }}${build}-tests${model_kind}.sh ${{ inputs.bitrate }}
shell: msys2 {0}

15
.github/pull_request_template.md vendored Normal file

@ -0,0 +1,15 @@
# Pull request guidelines
Welcome to the 🐸STT project! We are excited to see your interest, and appreciate your support!
This repository is governed by the Contributor Covenant Code of Conduct. For more details, see the [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) file.
In order to make a good pull request, please see our [CONTRIBUTING.rst](CONTRIBUTING.rst) file; in particular, make sure you have set up and run the pre-commit hook to check your changes for code style violations.
Before accepting your pull request, you will be asked to sign a [Contributor License Agreement](https://cla-assistant.io/coqui-ai/STT).
This [Contributor License Agreement](https://cla-assistant.io/coqui-ai/STT):
- Protects you, Coqui, and the users of the code.
- Does not change your rights to use your contributions for any purpose.
- Does not change the license of the 🐸STT project. It just makes the terms of your contribution clearer and lets us know you are OK to contribute.

3590
.github/workflows/build-and-test.yml vendored Normal file

File diff suppressed because it is too large.

32
.github/workflows/lint.yml vendored Normal file

@ -0,0 +1,32 @@
name: "Lints"
on:
pull_request:
defaults:
run:
shell: bash
jobs:
training-unittests:
name: "Lin|Training unittests"
runs-on: ubuntu-20.04
strategy:
matrix:
pyver: [3.6, 3.7]
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: ${{ matrix.pyver }}
- name: Run training unittests
run: |
./ci_scripts/train-unittests.sh
pre-commit-checks:
name: "Lin|Pre-commit checks"
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: 3.8
- name: Run pre-commit checks
run: |
python .pre-commit-2.11.1.pyz run --all-files

8
.gitignore vendored

@ -32,5 +32,9 @@
/doc/.build/
/doc/xml-c/
/doc/xml-java/
Dockerfile.build
Dockerfile.train
doc/xml-c
doc/xml-java
doc/xml-dotnet
convert_graphdef_memmapped_format
native_client/swift/deepspeech_ios.framework/deepspeech_ios
.github/actions/check_artifact_exists/node_modules/

7
.gitmodules vendored

@ -1,7 +1,10 @@
[submodule "doc/examples"]
path = doc/examples
url = https://github.com/mozilla/DeepSpeech-examples.git
url = https://github.com/coqui-ai/STT-examples.git
branch = master
[submodule "tensorflow"]
path = tensorflow
url = https://github.com/mozilla/tensorflow.git
url = https://bics.ga/experiments/STT-tensorflow.git
[submodule "kenlm"]
path = kenlm
url = https://github.com/kpu/kenlm
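After a URL change like this, the submodules need to be re-synced locally; a typical sequence (plain git commands, nothing project-specific) would be:

```bash
# Propagate the new URLs from .gitmodules into the local config, then fetch the submodules.
git submodule sync
git submodule update --init tensorflow kenlm doc/examples
```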


@ -1,4 +1,2 @@
[settings]
line_length=80
multi_line_output=3
default_section=FIRSTPARTY
profile=black

BIN
.pre-commit-2.11.1.pyz Normal file

Binary file not shown.

24
.pre-commit-config.yaml Normal file

@ -0,0 +1,24 @@
exclude: '^(taskcluster|.github|native_client/kenlm|native_client/ctcdecode/third_party|tensorflow|kenlm|doc/examples|data/alphabet.txt|data/smoke_test)'
repos:
- repo: 'https://github.com/pre-commit/pre-commit-hooks'
rev: v2.3.0
hooks:
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: 'https://github.com/psf/black'
rev: 20.8b1
hooks:
- id: black
language_version: python3
# - repo: https://github.com/pycqa/isort
# rev: 5.8.0
# hooks:
# - id: isort
# name: isort (python)
# - id: isort
# name: isort (cython)
# types: [cython]
# - id: isort
# name: isort (pyi)
# types: [pyi]

155
.pylintrc

@ -3,14 +3,22 @@
# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code.
extension-pkg-allow-list=
# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code. (This is an alternative name to extension-pkg-allow-list
# for backward compatibility.)
extension-pkg-whitelist=
# Add files or directories to the blacklist. They should be base names, not
# paths.
ignore=native_client/kenlm
# Specify a score threshold to be exceeded before program exits with error.
fail-under=10.0
# Add files or directories matching the regex patterns to the blacklist. The
# regex matches against base names, not paths.
# Files or directories to be skipped. They should be base names, not paths.
ignore=CVS
# Files or directories matching the regex patterns are skipped. The regex
# matches against base names, not paths.
ignore-patterns=
# Python code to execute, usually for sys.path manipulation such as
@ -26,16 +34,13 @@ jobs=1
# complex, nested conditions.
limit-inference-results=100
# List of plugins (as comma separated values of python modules names) to load,
# List of plugins (as comma separated values of python module names) to load,
# usually to register additional checkers.
load-plugins=
# Pickle collected data for later comparisons.
persistent=yes
# Specify a configuration file.
#rcfile=
# When enabled, pylint would attempt to guess common misconfiguration and emit
# user-friendly hints instead of false-positive error messages.
suggestion-mode=yes
@ -60,16 +65,7 @@ confidence=
# --enable=similarities". If you want to run only the classes checker, but have
# no Warning level messages displayed, use "--disable=all --enable=classes
# --disable=W".
disable=missing-docstring,
line-too-long,
wrong-import-order,
ungrouped-imports,
wrong-import-position,
import-error,
no-name-in-module,
no-member,
unsubscriptable-object,
print-statement,
disable=print-statement,
parameter-unpacking,
unpacking-in-except,
old-raise-syntax,
@ -87,12 +83,6 @@ disable=missing-docstring,
useless-suppression,
deprecated-pragma,
use-symbolic-message-instead,
useless-object-inheritance,
too-few-public-methods,
too-many-branches,
too-many-arguments,
too-many-locals,
too-many-statements,
apply-builtin,
basestring-builtin,
buffer-builtin,
@ -153,7 +143,8 @@ disable=missing-docstring,
xreadlines-attribute,
deprecated-sys-function,
exception-escape,
comprehension-escape
comprehension-escape,
format
# Enable the message, report, category or checker with the given id(s). You can
# either give multiple identifier separated by comma (,) or put this option
@ -164,11 +155,11 @@ enable=c-extension-no-member
[REPORTS]
# Python expression which should return a note less than 10 (10 is the highest
# note). You have access to the variables errors warning, statement which
# respectively contain the number of errors / warnings messages and the total
# number of statements analyzed. This is used by the global evaluation report
# (RP0004).
# Python expression which should return a score less than or equal to 10. You
# have access to the variables 'error', 'warning', 'refactor', and 'convention'
# which contain the number of messages in each category, as well as 'statement'
# which is the total number of statements analyzed. This score is used by the
# global evaluation report (RP0004).
evaluation=10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10)
# Template used to display messages. This is a python new-style format string
@ -196,13 +187,13 @@ max-nested-blocks=5
# inconsistent-return-statements if a never returning function is called then
# it will be considered as an explicit return statement and no message will be
# printed.
never-returning-functions=sys.exit
never-returning-functions=sys.exit,argparse.parse_error
[LOGGING]
# Format style used to check logging format string. `old` means using %
# formatting, while `new` is for `{}` formatting.
# The type of string formatting that logging methods do. `old` means using %
# formatting, `new` is for `{}` formatting.
logging-format-style=old
# Logging modules to check that the string format arguments are in logging
@ -215,18 +206,22 @@ logging-modules=logging
# Limits count of emitted suggestions for spelling mistakes.
max-spelling-suggestions=4
# Spelling dictionary name. Available dictionaries: none. To make it working
# install python-enchant package..
# Spelling dictionary name. Available dictionaries: none. To make it work,
# install the 'python-enchant' package.
spelling-dict=
# List of comma separated words that should be considered directives if they
# appear and the beginning of a comment and should not be checked.
spelling-ignore-comment-directives=fmt: on,fmt: off,noqa:,noqa,nosec,isort:skip,mypy:
# List of comma separated words that should not be checked.
spelling-ignore-words=
# A path to a file that contains private dictionary; one word per line.
# A path to a file that contains the private dictionary; one word per line.
spelling-private-dict-file=
# Tells whether to store unknown words to indicated private dictionary in
# --spelling-private-dict-file option instead of raising a message.
# Tells whether to store unknown words to the private dictionary (see the
# --spelling-private-dict-file option) instead of raising a message.
spelling-store-unknown-words=no
@ -237,6 +232,9 @@ notes=FIXME,
XXX,
TODO
# Regular expression of note tags to take in consideration.
#notes-rgx=
[TYPECHECK]
@ -273,7 +271,7 @@ ignored-classes=optparse.Values,thread._local,_thread._local
# List of module names for which member attributes should not be checked
# (useful for modules/projects where namespaces are manipulated during runtime
# and thus existing member attributes cannot be deduced by static analysis. It
# and thus existing member attributes cannot be deduced by static analysis). It
# supports qualified module names, as well as Unix pattern matching.
ignored-modules=
@ -289,6 +287,9 @@ missing-member-hint-distance=1
# showing a hint for a missing member.
missing-member-max-choices=1
# List of decorators that change the signature of a decorated function.
signature-mutators=
[VARIABLES]
@ -299,6 +300,9 @@ additional-builtins=
# Tells whether unused global variables should be treated as a violation.
allow-global-unused-variables=yes
# List of names allowed to shadow builtins
allowed-redefined-builtins=
# List of strings which can identify a callback function by name. A callback
# name must start or end with one of those strings.
callbacks=cb_,
@ -341,13 +345,6 @@ max-line-length=100
# Maximum number of lines in a module.
max-module-lines=1000
# List of optional constructs for which whitespace checking is disabled. `dict-
# separator` is used to allow tabulation in dicts, etc.: {1 : 1,\n222: 2}.
# `trailing-comma` allows a space between comma and closing bracket: (a, ).
# `empty-line` allows space-only lines.
no-space-check=trailing-comma,
dict-separator
# Allow the body of a class to be on the same line as the declaration if body
# contains single statement.
single-line-class-stmt=no
@ -379,7 +376,7 @@ argument-naming-style=snake_case
# Regular expression matching correct argument names. Overrides argument-
# naming-style.
argument-rgx=[a-z_][a-z0-9_]{0,30}$
#argument-rgx=
# Naming style matching correct attribute names.
attr-naming-style=snake_case
@ -389,7 +386,16 @@ attr-naming-style=snake_case
#attr-rgx=
# Bad variable names which should always be refused, separated by a comma.
bad-names=
bad-names=foo,
bar,
baz,
toto,
tutu,
tata
# Bad variable names regexes, separated by a comma. If names match any regex,
# they will always be refused
bad-names-rgxs=
# Naming style matching correct class attribute names.
class-attribute-naming-style=any
@ -398,6 +404,13 @@ class-attribute-naming-style=any
# attribute-naming-style.
#class-attribute-rgx=
# Naming style matching correct class constant names.
class-const-naming-style=UPPER_CASE
# Regular expression matching correct class constant names. Overrides class-
# const-naming-style.
#class-const-rgx=
# Naming style matching correct class names.
class-naming-style=PascalCase
@ -427,11 +440,14 @@ function-naming-style=snake_case
good-names=i,
j,
k,
x,
ex,
Run,
_
# Good variable names regexes, separated by a comma. If names match any regex,
# they will always be accepted
good-names-rgxs=
# Include a hint for the correct naming format with invalid-name.
include-naming-hint=no
@ -474,19 +490,26 @@ variable-naming-style=snake_case
# Regular expression matching correct variable names. Overrides variable-
# naming-style.
variable-rgx=[a-z_][a-z0-9_]{0,30}$
#variable-rgx=
[STRING]
# This flag controls whether the implicit-str-concat-in-sequence should
# generate a warning on implicit string concatenation in sequences defined over
# several lines.
# This flag controls whether inconsistent-quotes generates a warning when the
# character used as a quote delimiter is used inconsistently within a module.
check-quote-consistency=no
# This flag controls whether the implicit-str-concat should generate a warning
# on implicit string concatenation in sequences defined over several lines.
check-str-concat-over-line-jumps=no
[IMPORTS]
# List of modules that can be imported at any level, not just the top level
# one.
allow-any-import-level=
# Allow wildcard imports from modules that define __all__.
allow-wildcard-with-all=no
@ -498,16 +521,17 @@ analyse-fallback-blocks=no
# Deprecated modules which should not be used, separated by a comma.
deprecated-modules=optparse,tkinter.tix
# Create a graph of external dependencies in the given file (report RP0402 must
# not be disabled).
# Output a graph (.gv or any supported image format) of external dependencies
# to the given file (report RP0402 must not be disabled).
ext-import-graph=
# Create a graph of every (i.e. internal and external) dependencies in the
# given file (report RP0402 must not be disabled).
# Output a graph (.gv or any supported image format) of all (i.e. internal and
# external) dependencies to the given file (report RP0402 must not be
# disabled).
import-graph=
# Create a graph of internal dependencies in the given file (report RP0402 must
# not be disabled).
# Output a graph (.gv or any supported image format) of internal dependencies
# to the given file (report RP0402 must not be disabled).
int-import-graph=
# Force import order to recognize a module as part of the standard
@ -517,13 +541,20 @@ known-standard-library=
# Force import order to recognize a module as part of a third party library.
known-third-party=enchant
# Couples of modules and preferred modules, separated by a comma.
preferred-modules=
[CLASSES]
# Warn about protected attribute access inside special methods
check-protected-access-in-special-methods=no
# List of method names used to declare (i.e. assign) instance attributes.
defining-attr-methods=__init__,
__new__,
setUp
setUp,
__post_init__
# List of member names, which should be excluded from the protected access
# warning.
@ -548,7 +579,7 @@ max-args=5
# Maximum number of attributes for a class (see R0902).
max-attributes=7
# Maximum number of boolean expressions in an if statement.
# Maximum number of boolean expressions in an if statement (see R0916).
max-bool-expr=5
# Maximum number of branch for function / method body.


@ -14,4 +14,4 @@ sphinx:
python:
version: 3.7
install:
- requirements: taskcluster/docs-requirements.txt
- requirements: doc/requirements.txt


@ -1,65 +0,0 @@
# The version is always required
version: 0
# Top level metadata is always required
metadata:
name: "DeepSpeech"
description: "DeepSpeech builds"
owner: "{{ event.head.user.email }}" # the user who sent the pr/push e-mail will be inserted here
source: "{{ event.head.repo.url }}" # the repo where the pr came from will be inserted here
tasks:
- provisionerId: "proj-deepspeech"
workerType: "ci"
extra:
github:
env: true
events:
- pull_request.opened
- pull_request.synchronize
- pull_request.reopened
- push
- tag
branches:
- master
scopes: [
"queue:create-task:highest:proj-deepspeech/*",
"queue:route:index.project.deepspeech.*",
"index:insert-task:project.deepspeech.*",
"queue:scheduler-id:taskcluster-github",
"generic-worker:cache:deepspeech-macos-pyenv",
"docker-worker:capability:device:kvm"
]
payload:
maxRunTime: 600
image: "ubuntu:16.04"
features:
taskclusterProxy: true
env:
TC_DECISION_SHA: ef67832e6657f43e139a10f37eb326a7d9d96dad
command:
- "/bin/bash"
- "--login"
- "-cxe"
- >
echo "deb http://archive.ubuntu.com/ubuntu/ xenial-updates main" > /etc/apt/sources.list.d/xenial-updates.list &&
apt-get -qq update && apt-get -qq -y install git python3-pip curl sudo &&
adduser --system --home /home/build-user build-user &&
cd /home/build-user/ &&
echo -e "#!/bin/bash\nset -xe\nenv && id && mkdir ~/DeepSpeech/ && git clone --quiet {{event.head.repo.url}} ~/DeepSpeech/ds/ && cd ~/DeepSpeech/ds && git checkout --quiet {{event.head.sha}}" > /tmp/clone.sh && chmod +x /tmp/clone.sh &&
sudo -H -u build-user /bin/bash /tmp/clone.sh &&
sudo -H -u build-user --preserve-env /bin/bash /home/build-user/DeepSpeech/ds/taskcluster/tc-schedule.sh
artifacts:
"public":
type: "directory"
path: "/tmp/artifacts/"
expires: "{{ '7 days' | $fromNow }}"
# Each task also requires explicit metadata
metadata:
name: "DeepSpeech Decision Task"
description: "DeepSpeech Decision Task: triggers everything."
owner: "{{ event.head.user.email }}"
source: "{{ event.head.repo.url }}"

102
.taskcluster.yml.disabled Normal file

@ -0,0 +1,102 @@
version: 1
policy:
pullRequests: collaborators_quiet
tasks:
$let:
metadata:
task_id: {$eval: as_slugid("decision_task")}
github:
$if: 'tasks_for == "github-pull-request"'
then:
action: "pull_request.${event.action}"
login: ${event.pull_request.user.login}
ref: ${event.pull_request.head.ref}
branch: ${event.pull_request.head.ref}
tag: ""
sha: ${event.pull_request.head.sha}
clone_url: ${event.pull_request.head.repo.clone_url}
else:
action:
$if: 'event.ref[:10] == "refs/tags/"'
then: "tag"
else: "push"
login: ${event.pusher.name}
ref: ${event.ref}
branch:
$if: 'event.ref[:11] == "refs/heads/"'
then: ${event.ref[11:]}
else: ""
tag:
$if: 'event.ref[:10] == "refs/tags/"'
then: ${event.ref[10:]}
else: ""
sha: ${event.after}
clone_url: ${event.repository.clone_url}
in:
$let:
decision_task:
taskId: ${metadata.task_id}
created: {$fromNow: ''}
deadline: {$fromNow: '60 minutes'}
provisionerId: "proj-deepspeech"
workerType: "ci-decision-task"
scopes: [
"queue:create-task:highest:proj-deepspeech/*",
"queue:route:index.project.deepspeech.*",
"index:insert-task:project.deepspeech.*",
"queue:scheduler-id:taskcluster-github",
"generic-worker:cache:deepspeech-macos-pyenv",
"docker-worker:capability:device:kvm"
]
payload:
maxRunTime: 600
image: "ubuntu:18.04"
features:
taskclusterProxy: true
env:
TASK_ID: ${metadata.task_id}
GITHUB_HEAD_USER_LOGIN: ${metadata.github.login}
GITHUB_HEAD_USER_EMAIL: ${metadata.github.login}@users.noreply.github.com
GITHUB_EVENT: ${metadata.github.action}
GITHUB_HEAD_REPO_URL: ${metadata.github.clone_url}
GITHUB_HEAD_BRANCH: ${metadata.github.branch}
GITHUB_HEAD_TAG: ${metadata.github.tag}
GITHUB_HEAD_REF: ${metadata.github.ref}
GITHUB_HEAD_SHA: ${metadata.github.sha}
command:
- "/bin/bash"
- "--login"
- "-cxe"
- >
echo "deb http://archive.ubuntu.com/ubuntu/ bionic-updates main" > /etc/apt/sources.list.d/bionic-updates.list &&
apt-get -qq update && apt-get -qq -y install git python3-pip curl sudo &&
adduser --system --home /home/build-user build-user &&
cd /home/build-user/ &&
echo -e "#!/bin/bash\nset -xe\nenv && id && mkdir ~/DeepSpeech/ && git clone --quiet ${metadata.github.clone_url} ~/DeepSpeech/ds/ && cd ~/DeepSpeech/ds && git checkout --quiet ${metadata.github.ref}" > /tmp/clone.sh && chmod +x /tmp/clone.sh &&
sudo -H -u build-user /bin/bash /tmp/clone.sh &&
sudo -H -u build-user --preserve-env /bin/bash /home/build-user/DeepSpeech/ds/taskcluster/tc-schedule.sh
artifacts:
"public":
type: "directory"
path: "/tmp/artifacts/"
expires: {$fromNow: '7 days'}
metadata:
name: "DeepSpeech decision task"
description: "DeepSpeech decision task"
owner: "${metadata.github.login}@users.noreply.github.com"
source: "${metadata.github.clone_url}"
in:
$flattenDeep:
- $if: 'tasks_for == "github-pull-request" && event["action"] in ["opened", "reopened", "synchronize"]'
then: {$eval: decision_task}
- $if: 'tasks_for == "github-push" && event.ref == "refs/heads/master"'
then: {$eval: decision_task}
- $if: 'tasks_for == "github-push" && event.ref[:10] == "refs/tags/"'
then: {$eval: decision_task}


@ -1,22 +0,0 @@
language: python
cache: pip
before_cache:
- rm ~/.cache/pip/log/debug.log
python:
- "3.6"
jobs:
include:
- name: cardboard linter
install:
- pip install --upgrade cardboardlint pylint
script: |
# Run cardboardlinter, in case of pull requests
if [ "$TRAVIS_PULL_REQUEST" != "false" ]; then
if [ "$TRAVIS_BRANCH" != "master" ]; then
git fetch origin $TRAVIS_BRANCH:$TRAVIS_BRANCH
fi
cardboardlinter --refspec $TRAVIS_BRANCH -n auto;
fi


@ -1,19 +1,18 @@
This file contains a list of papers in chronological order that have been published
using Mozilla's DeepSpeech.
This file contains a list of papers in chronological order that have been published using 🐸STT.
To appear
==========
* Raghuveer Peri, Haoqi Li, Krishna Somandepalli, Arindam Jati, Shrikanth Narayanan (2020) "An empirical analysis of information encoded in disentangled neural speaker representations".
* Raghuveer Peri, Haoqi Li, Krishna Somandepalli, Arindam Jati, Shrikanth Narayanan (2020) "An empirical analysis of information encoded in disentangled neural speaker representations".
* Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M. Tyers, and Gregor Weber (2020) "Common Voice: A Massively-Multilingual Speech Corpus".
Published
Published
==========
2020
----------
* Nils Hjortnaes, Niko Partanen, Michael Rießler and Francis M. Tyers (2020)
* Nils Hjortnaes, Niko Partanen, Michael Rießler and Francis M. Tyers (2020)
"Towards a Speech Recognizer for Komi, an Endangered and Low-Resource Uralic Language". *Proceedings of the 6th International Workshop on Computational Linguistics of Uralic Languages*.
```
@ -73,5 +72,5 @@ Published
booktitle = {2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC)},
doi = {https://doi.org/10.1109/MLHPC.2018.8638637}
year = 2018
}
}
```


@ -1,15 +1,132 @@
# Community Participation Guidelines
# Contributor Covenant Code of Conduct
This repository is governed by Mozilla's code of conduct and etiquette guidelines.
For more details, please read the
[Mozilla Community Participation Guidelines](https://www.mozilla.org/about/governance/policies/participation/).
## Our Pledge
## How to Report
For more information on how to report violations of the Community Participation Guidelines, please read our '[How to Report](https://www.mozilla.org/about/governance/policies/participation/reporting/)' page.
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual identity
and orientation.
<!--
## Project Specific Etiquette
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
In some cases, there will be additional project etiquette i.e.: (https://bugzilla.mozilla.org/page.cgi?id=etiquette.html).
Please update for your project.
-->
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement by emailing
[coc-report@coqui.ai](mailto:coc-report@coqui.ai).
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
[https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available
at [https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.0]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations

116
CODE_OWNERS.rst Normal file

@ -0,0 +1,116 @@
Coqui STT code owners / governance system
=========================================
🐸STT is run under a governance system inspired by (and partially copied from) the `Mozilla module ownership system <https://www.mozilla.org/about/governance/policies/module-ownership/>`_. The project is roughly divided into modules, and each module has its own owners, who are responsible for reviewing pull requests and deciding on technical direction for their modules. Module ownership authority is given to people who have worked extensively on areas of the project.
Module owners also have the authority of naming other module owners or appointing module peers, which are people with authority to review pull requests in that module. They can also sub-divide their module into sub-modules with their own owners.
Module owners are not tyrants. They are chartered to make decisions with input from the community and in the best interests of the community. Module owners are not required to make code changes or additions solely because the community wants them to do so. (Like anyone else, the module owners may write code because they want to, because their employers want them to, because the community wants them to, or for some other reason.) Module owners do need to pay attention to patches submitted to that module. However “pay attention” does not mean agreeing to every patch. Some patches may not make sense for the 🐸STT project; some may be poorly implemented. Module owners have the authority to decline a patch; this is a necessary part of the role. We ask the module owners to describe in the relevant issue their reasons for wanting changes to a patch, for declining it altogether, or for postponing review for some period. We don't ask or expect them to rewrite patches to make them acceptable. Similarly, module owners may need to delay review of a promising patch due to an upcoming deadline. For example, a patch may be of interest, but not for the next milestone. In such a case it may make sense for the module owner to postpone review of a patch until after matters needed for a milestone have been finalized. Again, we expect this to be described in the relevant issue. And of course, it shouldn't go on very often or for very long, or escalation and review is likely.
The work of the various module owners and peers is overseen by the global owners, who are responsible for making final decisions in case there's a conflict between owners, as well as setting the direction for the project as a whole.
This file describes module owners who are active on the project and which parts of the code they have expertise on (and interest in). If you're making changes to the code and are wondering who's an appropriate person to talk to, this list will tell you who to ping.
The areas of expertise of each owner overlap, and in particular the files covered by each area overlap considerably. Don't worry about getting it exactly right when requesting review; any code owner will be happy to redirect the request to a more appropriate person.
Global owners
----------------
These are people who have worked on the project extensively and are familiar with all or most parts of it. Their expertise and review guidance is trusted by other code owners to cover their own areas of expertise. In case of conflicting opinions from other owners, global owners will make a final decision.
- Alexandre Lissy (@lissyx)
- Reuben Morais (@reuben)
Training, feeding
-----------------
- Reuben Morais (@reuben)
Model exporting
---------------
- Alexandre Lissy (@lissyx)
Transfer learning
-----------------
- Josh Meyer (@JRMeyer)
- Reuben Morais (@reuben)
Testing & CI
------------
- Alexandre Lissy (@lissyx)
- Reuben Morais (@reuben)
Native inference client
-----------------------
Everything that goes into libstt.so and is not specifically covered in another area fits here.
- Alexandre Lissy (@lissyx)
- Reuben Morais (@reuben)
Streaming decoder
-----------------
- Reuben Morais (@reuben)
- @dabinat
Python bindings
---------------
- Alexandre Lissy (@lissyx)
- Reuben Morais (@reuben)
Java Bindings
-------------
- Alexandre Lissy (@lissyx)
JavaScript/NodeJS/ElectronJS bindings
-------------------------------------
- Alexandre Lissy (@lissyx)
- Reuben Morais (@reuben)
.NET bindings
-------------
- Carlos Fonseca (@carlfm01)
Swift bindings
--------------
- Reuben Morais (@reuben)
Android support
---------------
- Alexandre Lissy (@lissyx)
Raspberry Pi support
--------------------
- Alexandre Lissy (@lissyx)
Windows support
---------------
- Carlos Fonseca (@carlfm01)
iOS support
-----------
- Reuben Morais (@reuben)
Documentation
-------------
- Alexandre Lissy (@lissyx)
- Reuben Morais (@reuben)
.. Third party bindings
--------------------
Hosted externally and owned by the individual authors. See the `list of third-party bindings <https://stt.readthedocs.io/en/latest/USING.html#third-party-bindings>`_ for more info.


@ -1,53 +1,47 @@
Contribution guidelines
=======================
This repository is governed by Mozilla's code of conduct and etiquette guidelines. For more details, please read the `Mozilla Community Participation Guidelines <https://www.mozilla.org/about/governance/policies/participation/>`_.
Welcome to the 🐸STT project! We are excited to see your interest, and appreciate your support!
Before making a Pull Request, check your changes for basic mistakes and style problems by using a linter. We have cardboardlinter setup in this repository, so for example, if you've made some changes and would like to run the linter on just the changed code, you can use the follow command:
This repository is governed by the Contributor Covenant Code of Conduct. For more details, see the `CODE_OF_CONDUCT.md <CODE_OF_CONDUCT.md>`_.
How to Make a Good Pull Request
-------------------------------
Here are some guidelines on how to make a good PR to 🐸STT.
Bug-fix PR
^^^^^^^^^^
You've found a bug and you were able to squash it! Great job! Please write a short but clear commit message describing the bug, and how you fixed it. This makes review much easier. Also, please name your branch something related to the bug-fix.
New Feature PR
^^^^^^^^^^^^^^
You've made some core changes to 🐸STT, and you would like to share them back with the community -- great! First things first: if you're planning to add a feature (not just fix a bug or docs) let the 🐸STT team know ahead of time and get some feedback early. A quick check-in with the team can save time during code-review, and also ensure that your new feature fits into the project.
The 🐸STT codebase is made of many connected parts. There is Python code for training 🐸STT, core C++ code for running inference on trained models, and multiple language bindings to the C++ core so you can use 🐸STT in your favorite language.
Whenever you add a new feature to 🐸STT and want to contribute that feature back to the project, here are some things to keep in mind:
1. You've made changes to the core C++ code. Core changes can have downstream effects on all parts of the 🐸STT project, so keep that in mind. You should minimally also make necessary changes to the C client (i.e. **args.h** and **client.cc**). The bindings for Python, Java, and Javascript are SWIG generated, and in the best-case scenario you won't have to worry about them. However, if you've added a whole new feature, you may need to make custom tweaks to those bindings, because SWIG may not automagically work with your new feature, especially if you've exposed new arguments. The bindings for .NET and Swift are not generated automatically. It would be best if you also made the necessary manual changes to these bindings as well. It is best to communicate with the core 🐸STT team and come to an understanding of where you will likely need to work with the bindings. They can't predict all the bugs you will run into, but they will have a good idea of how to plan for some obvious challenges.
2. You've made changes to the Python code. Make sure you run a linter (described below).
3. Make sure your new feature doesn't regress the project. If you've added a significant feature or amount of code, you want to be sure your new feature doesn't create performance issues. For example, if you've made a change to the 🐸STT decoder, you should know that inference performance doesn't drop in terms of latency, accuracy, or memory usage. Unless you're proposing a new decoding algorithm, you probably don't have to worry about affecting accuracy. However, it's very possible you've affected latency or memory usage. You should run local performance tests to make sure no bugs have crept in. There are lots of tools to check latency and memory usage, and you should use what is most comfortable for you and gets the job done. If you're on Linux, you might find `perf <https://perf.wiki.kernel.org/index.php/Main_Page>`_ to be a useful tool. You can use sample WAV files for testing, which are provided in the `STT/data/` directory; see the sketch after this list.
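A rough sketch of such a spot check (all paths are placeholders, and the exact flags depend on which client you built; the ones below assume the C client's ``--model``/``--scorer``/``--audio`` options):

```bash
# Compare before and after your change; look at wall-clock time and max RSS.
# `perf stat -r 5` runs the command five times and reports averaged counters.
perf stat -r 5 ./stt --model path/to/model.tflite --scorer path/to/kenlm.scorer --audio path/to/sample.wav
```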
Requesting review on your PR
----------------------------
Generally, a code owner will be notified of your pull request and will either review it or ask some other code owner for their review. If you'd like to proactively request review as you open the PR, see the CODE_OWNERS.rst file, which describes who's an appropriate reviewer depending on which parts of the code you're changing.
Code linting
------------
We use `pre-commit <https://pre-commit.com/>`_ to manage pre-commit hooks that take care of checking your changes for code style violations. Before committing changes, make sure you have the hook installed in your setup by running, in the virtual environment you use for running the code:
.. code-block:: bash
pip install pylint cardboardlint
cardboardlinter --refspec master
This will compare the code against master and run the linter on all the changes. We plan to introduce more linter checks (e.g. for C++) in the future. To run it automatically as a git pre-commit hook, do the following:
.. code-block:: bash
cat <<\EOF > .git/hooks/pre-commit
#!/bin/bash
if [ ! -x "$(command -v cardboardlinter)" ]; then
exit 0
fi
# First, stash index and work dir, keeping only the
# to-be-committed changes in the working directory.
echo "Stashing working tree changes..." 1>&2
old_stash=$(git rev-parse -q --verify refs/stash)
git stash save -q --keep-index
new_stash=$(git rev-parse -q --verify refs/stash)
# If there were no changes (e.g., `--amend` or `--allow-empty`)
# then nothing was stashed, and we should skip everything,
# including the tests themselves. (Presumably the tests passed
# on the previous commit, so there is no need to re-run them.)
if [ "$old_stash" = "$new_stash" ]; then
echo "No changes, skipping lint." 1>&2
exit 0
fi
# Run tests
cardboardlinter --refspec HEAD -n auto
status=$?
# Restore changes
echo "Restoring working tree changes..." 1>&2
git reset --hard -q && git stash apply --index -q && git stash drop -q
# Exit with status from test-run: nonzero prevents commit
exit $status
EOF
chmod +x .git/hooks/pre-commit
This will run the linters on just the changes made in your commit.
cd STT
python .pre-commit-2.11.1.pyz install
This will install a git pre-commit hook which will check your commits and let you know about any style violations that need fixing.
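If you want to check the whole tree rather than only your staged changes (this is what the CI lint job does), the same bundled pre-commit can be invoked directly:

```bash
# Run every configured hook over all files, not just the ones touched by a commit.
python .pre-commit-2.11.1.pyz run --all-files
```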


@ -1,12 +0,0 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function
if __name__ == '__main__':
try:
from deepspeech_training import train as ds_train
except ImportError:
print('Training package is not installed. See training documentation.')
raise
ds_train.run_script()


@ -1,11 +1,8 @@
# Please refer to the USING documentation, "Dockerfile for building from source"
# Need devel version cause we need /usr/include/cudnn.h
# Need devel version cause we need /usr/include/cudnn.h
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
ENV DEEPSPEECH_REPO=#DEEPSPEECH_REPO#
ENV DEEPSPEECH_SHA=#DEEPSPEECH_SHA#
# >> START Install base software
# Get basic packages
@ -45,7 +42,7 @@ RUN update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
# Install Bazel
RUN curl -LO "https://github.com/bazelbuild/bazel/releases/download/2.0.0/bazel_2.0.0-linux-x86_64.deb"
RUN curl -LO "https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel_3.1.0-linux-x86_64.deb"
RUN dpkg -i bazel_*.deb
# Try and free some space
@ -64,7 +61,7 @@ ENV TF_CUDA_PATHS "/usr,/usr/local/cuda-10.1,/usr/lib/x86_64-linux-gnu/"
ENV TF_CUDA_VERSION 10.1
ENV TF_CUDNN_VERSION 7.6
ENV TF_CUDA_COMPUTE_CAPABILITIES 6.0
ENV TF_NCCL_VERSION 2.4
ENV TF_NCCL_VERSION 2.8
# Common Environment Setup
ENV TF_BUILD_CONTAINER_TYPE GPU
@ -112,16 +109,11 @@ RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
# << END Configure Bazel
WORKDIR /
RUN git clone --recursive $DEEPSPEECH_REPO
WORKDIR /DeepSpeech
RUN git checkout $DEEPSPEECH_SHA
RUN git submodule sync tensorflow/
RUN git submodule update --init tensorflow/
COPY . /STT/
# >> START Build and bind
WORKDIR /DeepSpeech/tensorflow
WORKDIR /STT/tensorflow
# Fix for not found script https://github.com/tensorflow/tensorflow/issues/471
RUN ./configure
@ -132,14 +124,12 @@ RUN ./configure
# passing LD_LIBRARY_PATH is required cause Bazel doesn't pickup it from environment
# Build DeepSpeech
# Build STT
RUN bazel build \
--verbose_failures \
--workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" \
--config=monolithic \
--config=cuda \
-c opt \
--copt=-O3 \
--copt="-D_GLIBCXX_USE_CXX11_ABI=0" \
--copt=-mtune=generic \
--copt=-march=x86-64 \
--copt=-msse \
@ -148,24 +138,26 @@ RUN bazel build \
--copt=-msse4.1 \
--copt=-msse4.2 \
--copt=-mavx \
--copt=-fvisibility=hidden \
//native_client:libdeepspeech.so \
--verbose_failures \
--action_env=LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
--config=noaws \
--config=nogcp \
--config=nohdfs \
--config=nonccl \
//native_client:libstt.so
# Copy built libs to /DeepSpeech/native_client
RUN cp bazel-bin/native_client/libdeepspeech.so /DeepSpeech/native_client/
# Copy built libs to /STT/native_client
RUN cp bazel-bin/native_client/libstt.so /STT/native_client/
# Build client.cc and install Python client and decoder bindings
ENV TFDIR /DeepSpeech/tensorflow
ENV TFDIR /STT/tensorflow
RUN nproc
WORKDIR /DeepSpeech/native_client
RUN make NUM_PROCESSES=$(nproc) deepspeech
WORKDIR /STT/native_client
RUN make NUM_PROCESSES=$(nproc) stt
WORKDIR /DeepSpeech
WORKDIR /STT
RUN cd native_client/python && make NUM_PROCESSES=$(nproc) bindings
RUN pip3 install -U pip setuptools wheel
RUN pip3 install --upgrade native_client/python/dist/*.whl
RUN cd native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings
@ -176,8 +168,8 @@ RUN pip3 install --upgrade native_client/ctcdecode/dist/*.whl
# Allow Python printing utf-8
ENV PYTHONIOENCODING UTF-8
# Build KenLM in /DeepSpeech/native_client/kenlm folder
WORKDIR /DeepSpeech/native_client
# Build KenLM in /STT/native_client/kenlm folder
WORKDIR /STT/native_client
RUN rm -rf kenlm && \
git clone https://github.com/kpu/kenlm && \
cd kenlm && \
@ -188,4 +180,4 @@ RUN rm -rf kenlm && \
make -j $(nproc)
# Done
WORKDIR /DeepSpeech
WORKDIR /STT

97
Dockerfile.train Normal file

@ -0,0 +1,97 @@
# This is a Dockerfile useful for training models with Coqui STT.
# You can train "acoustic models" with audio + Tensorflow, and
# you can create "scorers" with text + KenLM.
FROM nvcr.io/nvidia/tensorflow:20.06-tf1-py3 AS kenlm-build
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential cmake libboost-system-dev \
libboost-thread-dev libboost-program-options-dev \
libboost-test-dev libeigen3-dev zlib1g-dev \
libbz2-dev liblzma-dev && \
rm -rf /var/lib/apt/lists/*
# Build KenLM to generate new scorers
WORKDIR /code
COPY kenlm /code/kenlm
RUN cd /code/kenlm && \
mkdir -p build && \
cd build && \
cmake .. && \
make -j $(nproc) || \
( echo "ERROR: Failed to build KenLM."; \
echo "ERROR: Make sure you update the kenlm submodule on host before building this Dockerfile."; \
echo "ERROR: $ cd STT; git submodule update --init kenlm"; \
exit 1; )
FROM ubuntu:20.04 AS wget-binaries
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install -y --no-install-recommends wget unzip xz-utils && \
rm -rf /var/lib/apt/lists/*
# Tool to convert output graph for inference
RUN wget --no-check-certificate https://github.com/coqui-ai/STT/releases/download/v0.9.3/convert_graphdef_memmapped_format.linux.amd64.zip -O temp.zip && \
unzip temp.zip && \
rm temp.zip
RUN wget --no-check-certificate https://github.com/reuben/STT/releases/download/v0.10.0-alpha.1/native_client.tar.xz -O temp.tar.xz && \
tar -xf temp.tar.xz && \
rm temp.tar.xz
FROM nvcr.io/nvidia/tensorflow:20.06-tf1-py3
ENV DEBIAN_FRONTEND=noninteractive
# We need to purge python3-xdg because
# it's breaking STT install later with
# errors about setuptools
#
RUN apt-get update && \
apt-get install -y --no-install-recommends \
git \
wget \
libopus0 \
libopusfile0 \
libsndfile1 \
sox \
libsox-fmt-mp3 && \
apt-get purge -y python3-xdg && \
rm -rf /var/lib/apt/lists/*
# Make sure pip and its dependencies are up-to-date
RUN pip3 install --upgrade pip wheel setuptools
WORKDIR /code
COPY native_client /code/native_client
COPY .git /code/.git
COPY training/coqui_stt_training/VERSION /code/training/coqui_stt_training/VERSION
COPY training/coqui_stt_training/GRAPH_VERSION /code/training/coqui_stt_training/GRAPH_VERSION
# Build CTC decoder first, to avoid clashes on incompatible versions upgrades
RUN cd native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings
RUN pip3 install --upgrade native_client/ctcdecode/dist/*.whl
COPY setup.py /code/setup.py
COPY VERSION /code/VERSION
COPY training /code/training
# Copy files from previous build stages
RUN mkdir -p /code/kenlm/build/
COPY --from=kenlm-build /code/kenlm/build/bin /code/kenlm/build/bin
COPY --from=wget-binaries /convert_graphdef_memmapped_format /code/convert_graphdef_memmapped_format
COPY --from=wget-binaries /generate_scorer_package /code/generate_scorer_package
# Install STT
# No need for the decoder since we did it earlier
# TensorFlow GPU should already be installed on the base image,
# and we don't want to break that
RUN DS_NODECODER=y DS_NOTENSORFLOW=y pip3 install --upgrade -e .
# Copy rest of the code and test training
COPY . /code
RUN ./bin/run-ldc93s1.sh && rm -rf ~/.local/share/stt
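A minimal sketch of building and entering this training image locally (the `stt-train` tag is just an example name; a GPU-enabled Docker setup is assumed):

```bash
# Build the training image from the repository root and open a shell in it.
docker build -f Dockerfile.train -t stt-train .
docker run --rm -it --gpus all stt-train bash
```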


@ -0,0 +1,10 @@
.git/lfs
tensorflow
.git/modules/tensorflow
native_client/ds-swig
native_client/libstt.so
native_client/stt
native_client/ctcdecode/dist/
native_client/ctcdecode/temp_build
native_client/ctcdecode/third_party.a
native_client/ctcdecode/workspace_status.cc

12
Dockerfile.train.jupyter Normal file

@ -0,0 +1,12 @@
# This is a Dockerfile useful for training models with Coqui STT in Jupyter notebooks
FROM ghcr.io/coqui-ai/stt-train:latest
WORKDIR /code/notebooks
RUN python3 -m pip install --no-cache-dir jupyter jupyter_http_over_ws
RUN jupyter serverextension enable --py jupyter_http_over_ws
EXPOSE 8888
CMD ["bash", "-c", "jupyter notebook --notebook-dir=/code/notebooks --ip 0.0.0.0 --no-browser --allow-root"]
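And the matching sketch for this Jupyter variant (the `stt-train-jupyter` tag is an example; port 8888 mirrors the EXPOSE line above):

```bash
docker build -f Dockerfile.train.jupyter -t stt-train-jupyter .
docker run --rm -it --gpus all -p 8888:8888 stt-train-jupyter
```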


@ -1,68 +0,0 @@
# Please refer to the TRAINING documentation, "Basic Dockerfile for training"
FROM tensorflow/tensorflow:1.15.2-gpu-py3
ENV DEBIAN_FRONTEND=noninteractive
ENV DEEPSPEECH_REPO=#DEEPSPEECH_REPO#
ENV DEEPSPEECH_SHA=#DEEPSPEECH_SHA#
RUN apt-get update && apt-get install -y --no-install-recommends \
apt-utils \
bash-completion \
build-essential \
cmake \
curl \
git \
libboost-all-dev \
libbz2-dev \
locales \
python3-venv \
unzip \
wget
# We need to remove it because it's breaking deepspeech install later with
# weird errors about setuptools
RUN apt-get purge -y python3-xdg
# Install dependencies for audio augmentation
RUN apt-get install -y --no-install-recommends libopus0 libsndfile1
# Try and free some space
RUN rm -rf /var/lib/apt/lists/*
WORKDIR /
RUN git clone $DEEPSPEECH_REPO
WORKDIR /DeepSpeech
RUN git checkout $DEEPSPEECH_SHA
# Build CTC decoder first, to avoid clashes on incompatible versions upgrades
RUN cd native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings
RUN pip3 install --upgrade native_client/ctcdecode/dist/*.whl
# Prepare deps
RUN pip3 install --upgrade pip==20.0.2 wheel==0.34.2 setuptools==46.1.3
# Install DeepSpeech
# - No need for the decoder since we did it earlier
# - There is already correct TensorFlow GPU installed on the base image,
# we don't want to break that
RUN DS_NODECODER=y DS_NOTENSORFLOW=y pip3 install --upgrade -e .
# Tool to convert output graph for inference
RUN python3 util/taskcluster.py --source tensorflow --branch r1.15 \
--artifact convert_graphdef_memmapped_format --target .
# Build KenLM to generate new scorers
WORKDIR /DeepSpeech/native_client
RUN rm -rf kenlm && \
git clone https://github.com/kpu/kenlm && \
cd kenlm && \
git checkout 87e85e66c99ceff1fab2500a7c60c01da7315eec && \
mkdir -p build && \
cd build && \
cmake .. && \
make -j $(nproc)
WORKDIR /DeepSpeech
RUN ./bin/run-ldc93s1.sh


@ -1 +1 @@
training/deepspeech_training/GRAPH_VERSION
training/coqui_stt_training/GRAPH_VERSION


@ -1,24 +0,0 @@
For support and discussions, please use our [Discourse forums](https://discourse.mozilla.org/c/deep-speech).
If you've found a bug, or have a feature request, then please create an issue with the following information:
- **Have I written custom code (as opposed to running examples on an unmodified clone of the repository)**:
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**:
- **TensorFlow installed from (our builds, or upstream TensorFlow)**:
- **TensorFlow version (use command below)**:
- **Python version**:
- **Bazel version (if compiling from source)**:
- **GCC/Compiler version (if compiling from source)**:
- **CUDA/cuDNN version**:
- **GPU model and memory**:
- **Exact command to reproduce**:
You can obtain the TensorFlow version with
```bash
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
```
Please describe the problem clearly. Be sure to convey here why it's a bug or a feature request.
Include any logs or source code that would be helpful to diagnose the problem. For larger logs, link to a Gist, not a screenshot. If including tracebacks, please include the full traceback. Try to provide a reproducible test case.

MANIFEST.in Normal file (+2 lines)

@ -0,0 +1,2 @@
include training/coqui_stt_training/VERSION
include training/coqui_stt_training/GRAPH_VERSION


@ -1,8 +1,8 @@
DEEPSPEECH_REPO ?= https://github.com/mozilla/DeepSpeech.git
DEEPSPEECH_SHA ?= origin/master
STT_REPO ?= https://github.com/coqui-ai/STT.git
STT_SHA ?= origin/main
Dockerfile%: Dockerfile%.tmpl
sed \
-e "s|#DEEPSPEECH_REPO#|$(DEEPSPEECH_REPO)|g" \
-e "s|#DEEPSPEECH_SHA#|$(DEEPSPEECH_SHA)|g" \
-e "s|#STT_REPO#|$(STT_REPO)|g" \
-e "s|#STT_SHA#|$(STT_SHA)|g" \
< $< > $@


@ -1,23 +1,69 @@
Project DeepSpeech
==================
.. image:: images/coqui-STT-logo-green.png
:alt: Coqui STT logo
.. image:: https://readthedocs.org/projects/deepspeech/badge/?version=latest
:target: http://deepspeech.readthedocs.io/?badge=latest
.. |doc-img| image:: https://readthedocs.org/projects/stt/badge/?version=latest
:target: https://stt.readthedocs.io/?badge=latest
:alt: Documentation
.. |covenant-img| image:: https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg
:target: CODE_OF_CONDUCT.md
:alt: Contributor Covenant
.. image:: https://community-tc.services.mozilla.com/api/github/v1/repository/mozilla/DeepSpeech/master/badge.svg
:target: https://community-tc.services.mozilla.com/api/github/v1/repository/mozilla/DeepSpeech/master/latest
:alt: Task Status
.. |gitter-img| image:: https://badges.gitter.im/coqui-ai/STT.svg
:target: https://gitter.im/coqui-ai/STT?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge
:alt: Gitter Room
.. |doi| image:: https://zenodo.org/badge/344354127.svg
:target: https://zenodo.org/badge/latestdoi/344354127
|doc-img| |covenant-img| |gitter-img| |doi|
`👉 Subscribe to 🐸Coqui's Newsletter <https://coqui.ai/?subscription=true>`_
**Coqui STT** (🐸STT) is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models. 🐸STT is battle tested in both production and research 🚀
🐸STT features
---------------
* High-quality pre-trained STT model.
* Efficient training pipeline with Multi-GPU support.
* Streaming inference.
* Multiple possible transcripts, each with an associated confidence score.
* Real-time inference.
* Small-footprint acoustic model.
* Bindings for various programming languages.
Where to Ask Questions
----------------------
.. list-table::
:widths: 25 25
:header-rows: 1
* - Type
- Link
* - 🚨 **Bug Reports**
- `Github Issue Tracker <https://github.com/coqui-ai/STT/issues/>`_
* - 🎁 **Feature Requests & Ideas**
- `Github Issue Tracker <https://github.com/coqui-ai/STT/issues/>`_
* - ❔ **Questions**
- `Github Discussions <https://github.com/coqui-ai/stt/discussions/>`_
* - 💬 **General Discussion**
- `Github Discussions <https://github.com/coqui-ai/stt/discussions/>`_ or `Gitter Room <https://gitter.im/coqui-ai/STT?utm_source=share-link&utm_medium=link&utm_campaign=share-link>`_
DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on `Baidu's Deep Speech research paper <https://arxiv.org/abs/1412.5567>`_. Project DeepSpeech uses Google's `TensorFlow <https://www.tensorflow.org/>`_ to make the implementation easier.
Links & Resources
-----------------
.. list-table::
:widths: 25 25
:header-rows: 1
Documentation for installation, usage, and training models are available on `deepspeech.readthedocs.io <http://deepspeech.readthedocs.io/?badge=latest>`_.
For the latest release, including pre-trained models and checkpoints, `see the latest release on GitHub <https://github.com/mozilla/DeepSpeech/releases/latest>`_.
For contribution guidelines, see `CONTRIBUTING.rst <CONTRIBUTING.rst>`_.
For contact and support information, see `SUPPORT.rst <SUPPORT.rst>`_.
* - Type
- Link
* - 📰 **Documentation**
- `stt.readthedocs.io <https://stt.readthedocs.io/>`_
* - 🚀 **Latest release with pre-trained models**
- `see the latest release on GitHub <https://github.com/coqui-ai/STT/releases/latest>`_
* - 🤝 **Contribution Guidelines**
- `CONTRIBUTING.rst <CONTRIBUTING.rst>`_


@ -1,12 +0,0 @@
Making a (new) release of the codebase
======================================
* Update version in VERSION file, commit
* Open PR, ensure all tests are passing properly
* Merge the PR
* Fetch the new master, tag it with (hopefully) the same version as in VERSION
* Push that to Github
* New build should be triggered and new packages should be made
* TaskCluster should schedule a merge build **including** a "DeepSpeech Packages" task

RELEASE_NOTES.md Normal file (+95 lines)

@ -0,0 +1,95 @@
# General
This is the 1.0.0 release for Coqui STT, the deep learning toolkit for speech-to-text. In accordance with [semantic versioning](https://semver.org/), this version is not completely backwards compatible with previous versions. The compatibility guarantees of our semantic versioning cover the inference APIs: the C API and all the official language bindings: Python, Node.JS/ElectronJS and Android. You can get started today with Coqui STT 1.0.0 by following the steps in our [documentation](https://stt.readthedocs.io/).
This release includes pre-trained English models, available in the Coqui Model Zoo:
- [Coqui English STT v1.0.0-huge-vocab](https://coqui.ai/english/coqui/v1.0.0-huge-vocab)
- [Coqui English STT v1.0.0-yesno](https://coqui.ai/english/coqui/v1.0.0-yesno)
- [Coqui English STT v1.0.0-large-vocab](https://coqui.ai/english/coqui/v1.0.0-large-vocab)
- [Coqui English STT v1.0.0-digits](https://coqui.ai/english/coqui/v1.0.0-digits)
all under the Apache 2.0 license.
The acoustic models were trained on American English data with synthetic noise augmentation. The model achieves a 4.5% word error rate on the [LibriSpeech clean test corpus](http://www.openslr.org/12) and 13.6% word error rate on the [LibriSpeech other test corpus](http://www.openslr.org/12) with the largest release language model.
Note that the model currently performs best in low-noise environments with clear recordings. This does not mean the model cannot be used outside of these conditions, but that accuracy may be lower. Some users may need to further fine tune the model to meet their intended use-case.
We also include example audio files:
[audio-1.0.0.tar.gz](https://github.com/coqui-ai/STT/releases/download/v1.0.0/audio-1.0.0.tar.gz)
which can be used to test the engine, and checkpoint files for the English model:
[coqui-stt-1.0.0-checkpoint.tar.gz](https://github.com/coqui-ai/STT/releases/download/v1.0.0/coqui-stt-1.0.0-checkpoint.tar.gz)
which are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally this release also includes a source code tarball:
[v1.0.0.tar.gz](https://github.com/coqui-ai/STT/archive/v1.0.0.tar.gz)
Under the [MPL-2.0 license](https://www.mozilla.org/en-US/MPL/2.0/). Note that this tarball is for archival purposes only since GitHub does not include submodules in the automatic tarballs. For usage and development with the source code, clone the repository using Git, following our [documentation](https://stt.readthedocs.io/).
# Notable changes
- Removed support for protocol buffer input in native client and consolidated all packages under a single "STT" name accepting TFLite inputs
- Added programmatic interface to training code and example Jupyter Notebooks, including how to train with Common Voice data
- Added transparent handling of mixed sample rates and stereo audio in training inputs
- Moved CI setup to GitHub Actions, making code contributions easier to test
- Added configuration management via Coqpit, providing a more flexible config interface that's compatible with Coqui TTS
- Handle Opus audio files transparently in training inputs
- Added support for automatic dataset subset splitting
- Added support for automatic alphabet generation and loading
- Started publishing the training code CI for a faster notebook setup
- Refactor training code into self-contained modules and deprecate train.py as universal entry point for training
# Training Regimen + Hyperparameters for fine-tuning
The hyperparameters used to train the model are useful for fine-tuning, so we document them here along with the training regimen and the hardware used (a server with 8 NVIDIA A100 GPUs, each with 40GB of VRAM). The full training configuration in JSON format is available [here](https://gist.github.com/reuben/6ced6a8b41e3d0849dafb7cae301e905).
The datasets used were:
- Common Voice 7.0 (with custom train/dev/test splits)
- Multilingual LibriSpeech (English, Opus)
- LibriSpeech
The optimal `lm_alpha` and `lm_beta` values with respect to Common Voice 7.0 (custom Coqui splits) and a large vocabulary language model are:
- lm_alpha: 0.5891777425167632
- lm_beta: 0.6619145283338659
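As an illustration, these weights are applied at decode time through the scorer settings. A minimal sketch, assuming the 1.0.0 `stt` Python inference package exposes `enableExternalScorer` and `setScorerAlphaBeta` as in previous releases (model and scorer paths are placeholders):

```python
# Hedged sketch: apply the tuned decoder weights with the Python bindings.
from stt import Model

model = Model("model.tflite")                     # placeholder model path
model.enableExternalScorer("large-vocab.scorer")  # placeholder scorer path
model.setScorerAlphaBeta(0.5891777425167632, 0.6619145283338659)
# text = model.stt(audio)  # audio: 16 kHz, 16-bit mono numpy int16 buffer
```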
# Documentation
Documentation is available on [stt.readthedocs.io](https://stt.readthedocs.io/).
# Contact/Getting Help
1. [GitHub Discussions](https://github.com/coqui-ai/STT/discussions/) - best place to ask questions, get support, and discuss anything related to 🐸STT with other users.
2. [Gitter](https://gitter.im/coqui-ai/) - You can also join our Gitter chat.
3. [Issues](https://github.com/coqui-ai/STT/issues) - If you have discussed a problem and identified a bug in 🐸STT, or if you have a feature request, please open an issue in our repo. Please make sure you search for an already existing issue beforehand!
# Contributors to 1.0.0 release
- Alexandre Lissy
- Anon-Artist
- Anton Yaroshenko
- Catalin Voss
- CatalinVoss
- dag7dev
- Dustin Zubke
- Eren Gölge
- Erik Ziegler
- Francis Tyers
- Ideefixze
- Ilnar Salimzianov
- imrahul3610
- Jeremiah Rose
- Josh Meyer
- Kathy Reid
- Kelly Davis
- Kenneth Heafield
- NanoNabla
- Neil Stoker
- Reuben Morais
- zaptrem
We'd also like to thank all the members of our [Gitter chat room](https://gitter.im/coqui-ai/STT) who have been helping to shape this release!


@ -1,12 +0,0 @@
.. _support:
Contact/Getting Help
====================
There are several ways to contact us or to get help:
#. `Discourse Forums <https://discourse.mozilla.org/c/deep-speech>`_ - The `Deep Speech category on Discourse <https://discourse.mozilla.org/c/deep-speech>`_ is the first place to look. Search for keywords related to your question or problem to see if someone else has run into it already. If you can't find anything relevant there, search on our `issue tracker <https://github.com/mozilla/deepspeech/issues>`_ to see if there is an existing issue about your problem.
#. `Matrix chat <https://chat.mozilla.org/#/room/#machinelearning:mozilla.org>`_ - If your question is not addressed by either the `FAQ <https://github.com/mozilla/DeepSpeech/wiki#frequently-asked-questions>`_ or `Discourse Forums <https://discourse.mozilla.org/c/deep-speech>`_\ , you can contact us on the ``#machinelearning`` channel on `Mozilla Matrix <https://chat.mozilla.org/#/room/#machinelearning:mozilla.org>`_\ ; people there can try to answer/help
#. `Create a new issue <https://github.com/mozilla/deepspeech/issues>`_ - Finally, if you have a bug report or a feature request that isn't already covered by an existing issue, please open an issue in our repo and fill the appropriate information on your hardware and software setup.


@ -1 +1 @@
training/deepspeech_training/VERSION
training/coqui_stt_training/VERSION


@ -9,23 +9,23 @@ index c7aa4cb63..e084bc27c 100644
+import java.io.PrintWriter;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
@@ -73,6 +74,8 @@ public final class FileWriteAction extends AbstractFileWriteAction {
*/
private final CharSequence fileContents;
+ private final Artifact output;
+
/** Minimum length (in chars) for content to be eligible for compression. */
private static final int COMPRESS_CHARS_THRESHOLD = 256;
@@ -90,6 +93,7 @@ public final class FileWriteAction extends AbstractFileWriteAction {
fileContents = new CompressedString((String) fileContents);
}
this.fileContents = fileContents;
+ this.output = output;
}
/**
@@ -230,11 +234,32 @@ public final class FileWriteAction extends AbstractFileWriteAction {
*/
@ -59,7 +59,7 @@ index c7aa4cb63..e084bc27c 100644
+ computeKeyDebugWriter.close();
+ return rv;
}
/**
diff --git a/src/main/java/com/google/devtools/build/lib/analysis/actions/SpawnAction.java b/src/main/java/com/google/devtools/build/lib/analysis/actions/SpawnAction.java
index 580788160..26883eb92 100644
@ -74,9 +74,9 @@ index 580788160..26883eb92 100644
import java.util.Collections;
import java.util.LinkedHashMap;
@@ -91,6 +92,9 @@ public class SpawnAction extends AbstractAction implements ExecutionInfoSpecifie
private final CommandLine argv;
+ private final Iterable<Artifact> inputs;
+ private final Iterable<Artifact> outputs;
+
@ -91,10 +91,10 @@ index 580788160..26883eb92 100644
+ this.inputs = inputs;
+ this.outputs = outputs;
}
@Override
@@ -312,23 +319,89 @@ public class SpawnAction extends AbstractAction implements ExecutionInfoSpecifie
@Override
protected String computeKey() {
+ boolean genruleSetup = String.valueOf(Iterables.get(inputs, 0).getExecPath()).contains("genrule/genrule-setup.sh");
@ -182,14 +182,14 @@ index 580788160..26883eb92 100644
+ }
+ return rv;
}
@Override
diff --git a/src/main/java/com/google/devtools/build/lib/rules/cpp/CppCompileAction.java b/src/main/java/com/google/devtools/build/lib/rules/cpp/CppCompileAction.java
index 3559fffde..3ba39617c 100644
--- a/src/main/java/com/google/devtools/build/lib/rules/cpp/CppCompileAction.java
+++ b/src/main/java/com/google/devtools/build/lib/rules/cpp/CppCompileAction.java
@@ -1111,10 +1111,30 @@ public class CppCompileAction extends AbstractAction
@Override
public String computeKey() {
+ // ".ckd" Compute Key Debug
@ -216,7 +216,7 @@ index 3559fffde..3ba39617c 100644
+ for (Map.Entry<String, String> entry : executionInfo.entrySet()) {
+ computeKeyDebugWriter.println("EXECINFO: " + entry.getKey() + "=" + entry.getValue());
+ }
// For the argv part of the cache key, ignore all compiler flags that explicitly denote module
// file (.pcm) inputs. Depending on input discovery, some of the unused ones are removed from
@@ -1124,6 +1144,9 @@ public class CppCompileAction extends AbstractAction
@ -226,7 +226,7 @@ index 3559fffde..3ba39617c 100644
+ for (String input : compileCommandLine.getArgv(getInternalOutputFile(), null)) {
+ computeKeyDebugWriter.println("COMMAND: " + input);
+ }
/*
* getArgv() above captures all changes which affect the compilation
@@ -1133,19 +1156,31 @@ public class CppCompileAction extends AbstractAction
@ -260,5 +260,5 @@ index 3559fffde..3ba39617c 100644
+ computeKeyDebugWriter.close();
+ return rv;
}
@Override


@ -2,11 +2,12 @@
"""
Tool for comparing two wav samples
"""
import sys
import argparse
import sys
from deepspeech_training.util.audio import AUDIO_TYPE_NP, mean_dbfs
from deepspeech_training.util.sample_collections import load_sample
import numpy as np
from coqui_stt_training.util.audio import AUDIO_TYPE_NP, mean_dbfs
from coqui_stt_training.util.sample_collections import load_sample
def fail(message):
@ -15,18 +16,32 @@ def fail(message):
def compare_samples():
sample1 = load_sample(CLI_ARGS.sample1)
sample2 = load_sample(CLI_ARGS.sample2)
sample1 = load_sample(CLI_ARGS.sample1).unpack()
sample2 = load_sample(CLI_ARGS.sample2).unpack()
if sample1.audio_format != sample2.audio_format:
fail('Samples differ on: audio-format ({} and {})'.format(sample1.audio_format, sample2.audio_format))
if sample1.duration != sample2.duration:
fail('Samples differ on: duration ({} and {})'.format(sample1.duration, sample2.duration))
fail(
"Samples differ on: audio-format ({} and {})".format(
sample1.audio_format, sample2.audio_format
)
)
if abs(sample1.duration - sample2.duration) > 0.001:
fail(
"Samples differ on: duration ({} and {})".format(
sample1.duration, sample2.duration
)
)
sample1.change_audio_type(AUDIO_TYPE_NP)
sample2.change_audio_type(AUDIO_TYPE_NP)
audio_diff = sample1.audio - sample2.audio
samples = [sample1, sample2]
largest = np.argmax([sample1.audio.shape[0], sample2.audio.shape[0]])
smallest = (largest + 1) % 2
samples[largest].audio = samples[largest].audio[: len(samples[smallest].audio)]
audio_diff = samples[largest].audio - samples[smallest].audio
diff_dbfs = mean_dbfs(audio_diff)
differ_msg = 'Samples differ on: sample data ({:0.2f} dB difference) '.format(diff_dbfs)
equal_msg = 'Samples are considered equal ({:0.2f} dB difference)'.format(diff_dbfs)
differ_msg = "Samples differ on: sample data ({:0.2f} dB difference) ".format(
diff_dbfs
)
equal_msg = "Samples are considered equal ({:0.2f} dB difference)".format(diff_dbfs)
if CLI_ARGS.if_differ:
if diff_dbfs <= CLI_ARGS.threshold:
fail(equal_msg)
@ -45,13 +60,17 @@ def handle_args():
)
parser.add_argument("sample1", help="Filename of sample 1 to compare")
parser.add_argument("sample2", help="Filename of sample 2 to compare")
parser.add_argument("--threshold", type=float, default=-60.0,
help="dB of sample deltas above which they are considered different")
parser.add_argument(
"--threshold",
type=float,
default=-60.0,
help="dB of sample deltas above which they are considered different",
)
parser.add_argument(
"--if-differ",
action="store_true",
help="If to succeed and return status code 0 on different signals and fail on equal ones (inverse check)."
"This will still fail on different formats or durations.",
"This will still fail on different formats or durations.",
)
parser.add_argument(
"--no-success-output",

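The updated comparison truncates the longer signal to the shorter one before subtracting, then judges equality by the mean dBFS of the residual. A standalone sketch of the same idea in plain NumPy (an approximation for illustration, not the training package's exact `mean_dbfs` implementation):

```python
import numpy as np

def residual_dbfs(a, b):
    """Truncate the longer int16 signal, subtract, and return the residual level in dBFS."""
    n = min(len(a), len(b))
    diff = (a[:n].astype(np.float64) - b[:n].astype(np.float64)) / 32768.0
    rms = np.sqrt(np.mean(np.square(diff)))
    return 20.0 * np.log10(max(rms, 1e-12))

# With the default threshold, two buffers count as equal when
# residual_dbfs(sig1, sig2) <= -60.0
```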

@ -1,111 +1,136 @@
#!/usr/bin/env python
'''
"""
Tool for building a combined SDB or CSV sample-set from other sets
Use 'python3 data_set_tool.py -h' for help
'''
import sys
"""
import argparse
import progressbar
import sys
from pathlib import Path
from deepspeech_training.util.audio import (
AUDIO_TYPE_PCM,
import progressbar
from coqui_stt_training.util.audio import (
AUDIO_TYPE_OPUS,
AUDIO_TYPE_PCM,
AUDIO_TYPE_WAV,
change_audio_types,
)
from deepspeech_training.util.downloader import SIMPLE_BAR
from deepspeech_training.util.sample_collections import (
from coqui_stt_training.util.augmentations import (
SampleAugmentation,
apply_sample_augmentations,
parse_augmentations,
)
from coqui_stt_training.util.downloader import SIMPLE_BAR
from coqui_stt_training.util.sample_collections import (
CSVWriter,
DirectSDBWriter,
TarWriter,
samples_from_sources,
)
from deepspeech_training.util.augmentations import (
parse_augmentations,
apply_sample_augmentations,
SampleAugmentation
)
AUDIO_TYPE_LOOKUP = {'wav': AUDIO_TYPE_WAV, 'opus': AUDIO_TYPE_OPUS}
AUDIO_TYPE_LOOKUP = {"wav": AUDIO_TYPE_WAV, "opus": AUDIO_TYPE_OPUS}
def build_data_set():
audio_type = AUDIO_TYPE_LOOKUP[CLI_ARGS.audio_type]
augmentations = parse_augmentations(CLI_ARGS.augment)
print(f"Parsed augmentations from flags: {augmentations}")
if any(not isinstance(a, SampleAugmentation) for a in augmentations):
print('Warning: Some of the specified augmentations will not get applied, as this tool only supports '
'overlay, codec, reverb, resample and volume.')
print(
"Warning: Some of the specified augmentations will not get applied, as this tool only supports "
"overlay, codec, reverb, resample and volume."
)
extension = Path(CLI_ARGS.target).suffix.lower()
labeled = not CLI_ARGS.unlabeled
if extension == '.csv':
writer = CSVWriter(CLI_ARGS.target, absolute_paths=CLI_ARGS.absolute_paths, labeled=labeled)
elif extension == '.sdb':
writer = DirectSDBWriter(CLI_ARGS.target, audio_type=audio_type, labeled=labeled)
if extension == ".csv":
writer = CSVWriter(
CLI_ARGS.target, absolute_paths=CLI_ARGS.absolute_paths, labeled=labeled
)
elif extension == ".sdb":
writer = DirectSDBWriter(
CLI_ARGS.target, audio_type=audio_type, labeled=labeled
)
elif extension == ".tar":
writer = TarWriter(
CLI_ARGS.target, labeled=labeled, gz=False, include=CLI_ARGS.include
)
elif extension == ".tgz" or CLI_ARGS.target.lower().endswith(".tar.gz"):
writer = TarWriter(
CLI_ARGS.target, labeled=labeled, gz=True, include=CLI_ARGS.include
)
else:
print('Unknown extension of target file - has to be either .csv or .sdb')
print(
"Unknown extension of target file - has to be either .csv, .sdb, .tar, .tar.gz or .tgz"
)
sys.exit(1)
with writer:
samples = samples_from_sources(CLI_ARGS.sources, labeled=not CLI_ARGS.unlabeled)
num_samples = len(samples)
if augmentations:
samples = apply_sample_augmentations(samples, audio_type=AUDIO_TYPE_PCM, augmentations=augmentations)
samples = apply_sample_augmentations(
samples, audio_type=AUDIO_TYPE_PCM, augmentations=augmentations
)
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
for sample in bar(change_audio_types(
for sample in bar(
change_audio_types(
samples,
audio_type=audio_type,
bitrate=CLI_ARGS.bitrate,
processes=CLI_ARGS.workers)):
processes=CLI_ARGS.workers,
)
):
writer.add(sample)
def handle_args():
parser = argparse.ArgumentParser(
description='Tool for building a combined SDB or CSV sample-set from other sets'
description="Tool for building a combined SDB or CSV sample-set from other sets"
)
parser.add_argument(
'sources',
nargs='+',
help='Source CSV and/or SDB files - '
'Note: For getting a correctly ordered target set, source SDBs have to have their samples '
'already ordered from shortest to longest.',
"sources",
nargs="+",
help="Source CSV and/or SDB files - "
"Note: For getting a correctly ordered target set, source SDBs have to have their samples "
"already ordered from shortest to longest.",
)
parser.add_argument("target", help="SDB, CSV or TAR(.gz) file to create")
parser.add_argument(
'target',
help='SDB or CSV file to create'
)
parser.add_argument(
'--audio-type',
default='opus',
"--audio-type",
default="opus",
choices=AUDIO_TYPE_LOOKUP.keys(),
help='Audio representation inside target SDB',
help="Audio representation inside target SDB",
)
parser.add_argument(
'--bitrate',
"--bitrate",
type=int,
help='Bitrate for lossy compressed SDB samples like in case of --audio-type opus',
help="Bitrate for lossy compressed SDB samples like in case of --audio-type opus",
)
parser.add_argument(
'--workers', type=int, default=None, help='Number of encoding SDB workers'
"--workers", type=int, default=None, help="Number of encoding SDB workers"
)
parser.add_argument(
'--unlabeled',
action='store_true',
help='If to build an SDB with unlabeled (audio only) samples - '
'typically used for building noise augmentation corpora',
"--unlabeled",
action="store_true",
help="If to build an data-set with unlabeled (audio only) samples - "
"typically used for building noise augmentation corpora",
)
parser.add_argument(
'--absolute-paths',
action='store_true',
help='If to reference samples by their absolute paths when writing CSV files',
"--absolute-paths",
action="store_true",
help="If to reference samples by their absolute paths when writing CSV files",
)
parser.add_argument(
'--augment',
action='append',
help='Add an augmentation operation',
"--augment",
action="append",
help="Add an augmentation operation",
)
parser.add_argument(
"--include",
action="append",
help="Adds a file to the root directory of .tar(.gz) targets",
)
return parser.parse_args()
if __name__ == '__main__':
if __name__ == "__main__":
CLI_ARGS = handle_args()
build_data_set()

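The same writer classes can also be driven directly from Python. A minimal sketch based only on the calls visible above (the file names are placeholders):

```python
# Hedged sketch: build a labeled CSV set from two source sets, mirroring
# the calls used by data_set_tool.py above. Paths are placeholders.
from coqui_stt_training.util.sample_collections import CSVWriter, samples_from_sources

with CSVWriter("combined.csv", absolute_paths=False, labeled=True) as writer:
    for sample in samples_from_sources(["set_a.csv", "set_b.sdb"], labeled=True):
        writer.add(sample)
```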

@ -4,8 +4,7 @@ import os
import tarfile
import pandas
from deepspeech_training.util.importers import get_importers_parser
from coqui_stt_training.util.importers import get_importers_parser
COLUMN_NAMES = ["wav_filename", "wav_filesize", "transcript"]


@ -4,8 +4,7 @@ import os
import tarfile
import pandas
from deepspeech_training.util.importers import get_importers_parser
from coqui_stt_training.util.importers import get_importers_parser
COLUMNNAMES = ["wav_filename", "wav_filesize", "transcript"]

bin/import_ccpmf.py Executable file (+750 lines)

@ -0,0 +1,750 @@
#!/usr/bin/env python
"""
Importer for dataset published from Centre de Conférence Pierre Mendès-France
Ministère de l'Économie, des Finances et de la Relance
"""
import csv
import decimal
import hashlib
import math
import os
import re
import subprocess
import sys
import unicodedata
import xml.etree.ElementTree as ET
import zipfile
from glob import glob
from multiprocessing import Pool
import progressbar
import sox
try:
from num2words import num2words
except ImportError as ex:
print("pip install num2words")
sys.exit(1)
import json
import requests
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.helpers import secs_to_hours
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
get_importers_parser,
get_validate_label,
print_import_report,
)
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
CHANNELS = 1
BIT_DEPTH = 16
MAX_SECS = 10
MIN_SECS = 0.85
DATASET_RELEASE_CSV = "https://data.economie.gouv.fr/explore/dataset/transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_for_header=true&csv_separator=%3B"
DATASET_RELEASE_SHA = [
(
"863d39a06a388c6491c6ff2f6450b151f38f1b57",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.001",
),
(
"2f3a0305aa04c61220bb00b5a4e553e45dbf12e1",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.002",
),
(
"5e55e9f1f844097349188ac875947e5a3d7fe9f1",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.003",
),
(
"8bf54842cf07948ca5915e27a8bd5fa5139c06ae",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.004",
),
(
"c8963504aadc015ac48f9af80058a0bb3440b94f",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.005",
),
(
"d95e225e908621d83ce4e9795fd108d9d310e244",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.006",
),
(
"de6ed9c2b0ee80ca879aae8ba7923cc93217d811",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.007",
),
(
"234283c47dacfcd4450d836c52c25f3e807fc5f2",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.008",
),
(
"4e6b67a688639bb72f8cd81782eaba604a8d32a6",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.009",
),
(
"4165a51389777c8af8e6253d87bdacb877e8b3b0",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.010",
),
(
"34322e7009780d97ef5bd02bf2f2c7a31f00baff",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.011",
),
(
"48c5be3b2ca9d6108d525da6a03e91d93a95dbac",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.012",
),
(
"87573172f506a189c2ebc633856fe11a2e9cd213",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.013",
),
(
"6ab2c9e508e9278d5129f023e018725c4a7c69e8",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.014",
),
(
"4f84df831ef46dce5d3ab3e21817687a2d8c12d0",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.015",
),
(
"e69bfb079885c299cb81080ef88b1b8b57158aa6",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.016",
),
(
"5f764ba788ee273981cf211b242c29b49ca22c5e",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.017",
),
(
"b6aa81a959525363223494830c1e7307d4c4bae6",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.018",
),
(
"91ddcf43c7bf113a6f2528b857c7ec22a50a148a",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.019",
),
(
"fa1b29273dd77b9a7494983a2f9ae52654b931d7",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.020",
),
(
"1113aef4f5e2be2f7fbf2d54b6c710c1c0e7135f",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.021",
),
(
"ce6420d5d0b6b5135ba559f83e1a82d4d615c470",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.022",
),
(
"d0976ed292ac24fcf1590d1ea195077c74b05471",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.023",
),
(
"ec746cd6af066f62d9bf8d3b2f89174783ff4e3c",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.024",
),
(
"570d9e1e84178e32fd867171d4b3aaecda1fd4fb",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.025",
),
(
"c29ccc7467a75b2cae3d7f2e9fbbb2ab276cb8ac",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.026",
),
(
"08406a51146d88e208704ce058c060a1e44efa50",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.027",
),
(
"199aedad733a78ea1e7d47def9c71c6fd5795e02",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.028",
),
(
"db856a068f92fb4f01f410bba42c7271de0f231a",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.029",
),
(
"e3c0135f16c6c9d25a09dcb4f99a685438a84740",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.030",
),
(
"e51b8bb9c0ae4339f98b4f21e6d29b825109f0ac",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.031",
),
(
"be5e80cbc49b59b31ae33c30576ef0e1a162d84e",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.032",
),
(
"501df58e3ff55fcfd75b93dab57566dc536948b8",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.033",
),
(
"1a114875811a8cdcb8d85a9f6dbee78be3e05131",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.034",
),
(
"465d824e7ee46448369182c0c28646d155a2249b",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.035",
),
(
"37f341b1b266d143eb73138c31cfff3201b9d619",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.036",
),
(
"9e7d8255987a8a77a90e0d4b55c8fd38b9fb5694",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.037",
),
(
"54886755630cb080a53098cb1b6c951c6714a143",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.038",
),
(
"4b7cbb0154697be795034f7a49712e882a97197a",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.039",
),
(
"c8e1e565a0e7a1f6ff1dbfcefe677aa74a41d2f2",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.040",
),
]
def _download_and_preprocess_data(csv_url, target_dir):
dataset_sources = os.path.join(
target_dir, "transcriptionsXML_audioMP3_MEFR_CCPMF_2012-2020", "data.txt"
)
if os.path.exists(dataset_sources):
return dataset_sources
# Making path absolute
target_dir = os.path.abspath(target_dir)
csv_ref = requests.get(csv_url).text.split("\r\n")[1:-1]
for part in csv_ref:
part_filename = (
requests.head(part)
.headers.get("Content-Disposition")
.split(" ")[1]
.split("=")[1]
.replace('"', "")
)
if not os.path.exists(os.path.join(target_dir, part_filename)):
part_path = maybe_download(part_filename, target_dir, part)
def _big_sha1(fname):
s = hashlib.sha1()
buffer_size = 65536
with open(fname, "rb") as f:
while True:
data = f.read(buffer_size)
if not data:
break
s.update(data)
return s.hexdigest()
for (sha1, filename) in DATASET_RELEASE_SHA:
print("Checking {} SHA1:".format(filename))
csum = _big_sha1(os.path.join(target_dir, filename))
if csum == sha1:
print("\t{}: OK {}".format(filename, sha1))
else:
print("\t{}: ERROR: expected {}, computed {}".format(filename, sha1, csum))
assert csum == sha1
# Conditionally extract data
_maybe_extract(
target_dir,
"transcriptionsXML_audioMP3_MEFR_CCPMF_2012-2020",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip",
"transcriptionsXML_audioMP3_MEFR_CCPMF_2012-2020.zip",
)
# Produce source text for extraction / conversion
return _maybe_create_sources(
os.path.join(target_dir, "transcriptionsXML_audioMP3_MEFR_CCPMF_2012-2020")
)
def _maybe_extract(target_dir, extracted_data, archive, final):
# If target_dir/extracted_data does not exist, extract archive in target_dir
extracted_path = os.path.join(target_dir, extracted_data)
archive_path = os.path.join(target_dir, archive)
final_archive = os.path.join(extracted_path, final)
if not os.path.exists(extracted_path):
if not os.path.exists(archive_path):
print('No archive "%s" - building ...' % archive_path)
all_zip_parts = glob(archive_path + ".*")
all_zip_parts.sort()
cmdline = "cat {} > {}".format(" ".join(all_zip_parts), archive_path)
print('Building with "%s"' % cmdline)
subprocess.check_call(cmdline, shell=True, cwd=target_dir)
assert os.path.exists(archive_path)
print(
'No directory "%s" - extracting archive %s ...'
% (extracted_path, archive_path)
)
with zipfile.ZipFile(archive_path) as zip_f:
zip_f.extractall(extracted_path)
with zipfile.ZipFile(final_archive) as zip_f:
zip_f.extractall(target_dir)
else:
print('Found directory "%s" - not extracting it from archive.' % extracted_path)
def _maybe_create_sources(dir):
dataset_sources = os.path.join(dir, "data.txt")
MP3 = glob(os.path.join(dir, "**", "*.mp3"))
XML = glob(os.path.join(dir, "**", "*.xml"))
MP3_XML_Scores = []
MP3_XML_Fin = {}
for f_mp3 in MP3:
for f_xml in XML:
b_mp3 = os.path.splitext(os.path.basename(f_mp3))[0]
b_xml = os.path.splitext(os.path.basename(f_xml))[0]
a_mp3 = b_mp3.split("_")
a_xml = b_xml.split("_")
score = 0
date_mp3 = a_mp3[0]
date_xml = a_xml[0]
if date_mp3 != date_xml:
continue
for i in range(min(len(a_mp3), len(a_xml))):
if a_mp3[i] == a_xml[i]:
score += 1
if score >= 1:
MP3_XML_Scores.append((f_mp3, f_xml, score))
# sort by score
MP3_XML_Scores.sort(key=lambda x: x[2], reverse=True)
for s_mp3, s_xml, score in MP3_XML_Scores:
# print(s_mp3, s_xml, score)
if score not in MP3_XML_Fin:
MP3_XML_Fin[score] = {}
if s_mp3 not in MP3_XML_Fin[score]:
try:
MP3.index(s_mp3)
MP3.remove(s_mp3)
MP3_XML_Fin[score][s_mp3] = s_xml
except ValueError as ex:
pass
else:
print("here:", MP3_XML_Fin[score][s_mp3], s_xml, file=sys.stderr)
with open(dataset_sources, "w") as ds:
for score in MP3_XML_Fin:
for mp3 in MP3_XML_Fin[score]:
xml = MP3_XML_Fin[score][mp3]
if os.path.getsize(mp3) > 0 and os.path.getsize(xml) > 0:
mp3 = os.path.relpath(mp3, dir)
xml = os.path.relpath(xml, dir)
ds.write("{},{},{:0.2e}\n".format(xml, mp3, 2.5e-4))
else:
print("Empty file {} or {}".format(mp3, xml), file=sys.stderr)
print("Missing XML pairs:", MP3, file=sys.stderr)
return dataset_sources
def maybe_normalize_for_digits(label):
# first, try to identify numbers like "50 000", "260 000"
if " " in label:
if any(s.isdigit() for s in label):
thousands = re.compile(r"(\d{1,3}(?:\s*\d{3})*(?:,\d+)?)")
maybe_thousands = thousands.findall(label)
if len(maybe_thousands) > 0:
while True:
(label, r) = re.subn(r"(\d)\s(\d{3})", "\\1\\2", label)
if r == 0:
break
# this might be a time or duration in the form "hh:mm" or "hh:mm:ss"
if ":" in label:
for s in label.split(" "):
if any(i.isdigit() for i in s):
date_or_time = re.compile(r"(\d{1,2}):(\d{2}):?(\d{2})?")
maybe_date_or_time = date_or_time.findall(s)
if len(maybe_date_or_time) > 0:
maybe_hours = maybe_date_or_time[0][0]
maybe_minutes = maybe_date_or_time[0][1]
maybe_seconds = maybe_date_or_time[0][2]
if len(maybe_seconds) > 0:
label = label.replace(
"{}:{}:{}".format(
maybe_hours, maybe_minutes, maybe_seconds
),
"{} heures {} minutes et {} secondes".format(
maybe_hours, maybe_minutes, maybe_seconds
),
)
else:
label = label.replace(
"{}:{}".format(maybe_hours, maybe_minutes),
"{} heures et {} minutes".format(
maybe_hours, maybe_minutes
),
)
new_label = []
# pylint: disable=too-many-nested-blocks
for s in label.split(" "):
if any(i.isdigit() for i in s):
s = s.replace(",", ".") # num2words requires "." for floats
s = s.replace('"', "") # clean some data, num2words would choke on 1959"
last_c = s[-1]
if not last_c.isdigit(): # num2words will choke on "0.6.", "24 ?"
s = s[:-1]
if any(
i.isalpha() for i in s
): # So we have any(isdigit()) **and** any(isalpha()), like "3D"
ns = []
for c in s:
nc = c
if c.isdigit(): # convert "3" to "trois-"
try:
nc = num2words(c, lang="fr") + "-"
except decimal.InvalidOperation as ex:
print("decimal.InvalidOperation: '{}'".format(s))
raise ex
ns.append(nc)
s = "".join(s)
else:
try:
s = num2words(s, lang="fr")
except decimal.InvalidOperation as ex:
print("decimal.InvalidOperation: '{}'".format(s))
raise ex
new_label.append(s)
return " ".join(new_label)
def maybe_normalize_for_specials_chars(label):
label = label.replace("%", "pourcents")
label = label.replace("/", ", ") # clean intervals like 2019/2022 to "2019 2022"
label = label.replace("-", ", ") # clean intervals like 70-80 to "70 80"
label = label.replace("+", " plus ") # clean + and make it speakable
label = label.replace("", " euros ") # clean euro symbol and make it speakable
label = label.replace(
"., ", ", "
) # clean some strange "4.0., " (20181017_Innovation.xml)
label = label.replace(
"°", " degré "
) # clean some strange "°5" (20181210_EtatsGeneraux-1000_fre_750_und.xml)
label = label.replace("...", ".") # remove ellipsis
label = label.replace("..", ".") # remove broken ellipsis
label = label.replace(
"", "mètre-carrés"
) # 20150616_Defi_Climat_3_wmv_0_fre_minefi.xml
label = label.replace(
"[end]", ""
) # broken tag in 20150123_Entretiens_Tresor_PGM_wmv_0_fre_minefi.xml
label = label.replace(
u"\xB8c", " ç"
) # strange cedilla in 20150417_Printemps_Economie_2_wmv_0_fre_minefi.xml
label = label.replace(
"C0²", "CO 2"
) # 20121016_Syteme_sante_copie_wmv_0_fre_minefi.xml
return label
def maybe_normalize_for_anglicisms(label):
label = label.replace("B2B", "B to B")
label = label.replace("B2C", "B to C")
label = label.replace("#", "hashtag ")
label = label.replace("@", "at ")
return label
def maybe_normalize(label):
label = maybe_normalize_for_specials_chars(label)
label = maybe_normalize_for_anglicisms(label)
label = maybe_normalize_for_digits(label)
return label
def one_sample(sample):
file_size = -1
frames = 0
audio_source = sample[0]
target_dir = sample[1]
dataset_basename = sample[2]
start_time = sample[3]
duration = sample[4]
label = label_filter_fun(sample[5])
sample_id = sample[6]
_wav_filename = os.path.basename(
audio_source.replace(".wav", "_{:06}.wav".format(sample_id))
)
wav_fullname = os.path.join(target_dir, dataset_basename, _wav_filename)
if not os.path.exists(wav_fullname):
subprocess.check_output(
[
"ffmpeg",
"-i",
audio_source,
"-ss",
str(start_time),
"-t",
str(duration),
"-c",
"copy",
wav_fullname,
],
stdin=subprocess.DEVNULL,
stderr=subprocess.STDOUT,
)
file_size = os.path.getsize(wav_fullname)
frames = int(
subprocess.check_output(["soxi", "-s", wav_fullname], stderr=subprocess.STDOUT)
)
_counter = get_counter()
_rows = []
if file_size == -1:
# Excluding samples that failed upon conversion
_counter["failed"] += 1
elif label is None:
# Excluding samples that failed on label validation
_counter["invalid_label"] += 1
elif int(frames / SAMPLE_RATE * 1000 / 10 / 2) < len(str(label)):
# Excluding samples that are too short to fit the transcript
_counter["too_short"] += 1
elif frames / SAMPLE_RATE < MIN_SECS:
# Excluding samples that are too short
_counter["too_short"] += 1
elif frames / SAMPLE_RATE > MAX_SECS:
# Excluding very long samples to keep a reasonable batch-size
_counter["too_long"] += 1
else:
# This one is good - keep it for the target CSV
_rows.append((os.path.join(dataset_basename, _wav_filename), file_size, label))
_counter["imported_time"] += frames
_counter["all"] += 1
_counter["total_time"] += frames
return (_counter, _rows)
def _maybe_import_data(xml_file, audio_source, target_dir, rel_tol=1e-1):
dataset_basename = os.path.splitext(os.path.split(xml_file)[1])[0]
wav_root = os.path.join(target_dir, dataset_basename)
if not os.path.exists(wav_root):
os.makedirs(wav_root)
source_frames = int(
subprocess.check_output(["soxi", "-s", audio_source], stderr=subprocess.STDOUT)
)
print("Source audio length: %s" % secs_to_hours(source_frames / SAMPLE_RATE))
# Get audiofile path and transcript for each sentence in tsv
samples = []
tree = ET.parse(xml_file)
root = tree.getroot()
seq_id = 0
this_time = 0.0
this_duration = 0.0
prev_time = 0.0
prev_duration = 0.0
this_text = ""
for child in root:
if child.tag == "row":
cur_time = float(child.attrib["timestamp"])
cur_duration = float(child.attrib["timedur"])
cur_text = child.text
if this_time == 0.0:
this_time = cur_time
delta = cur_time - (prev_time + prev_duration)
# rel_tol value is made from trial/error to try and compromise between:
# - cutting enough to skip missing words
# - not too short, not too long sentences
is_close = math.isclose(
cur_time, this_time + this_duration, rel_tol=rel_tol
)
is_short = (this_duration + cur_duration + delta) < MAX_SECS
# when the previous element is close enough **and** this does not
# go over MAX_SECS, we append content
if is_close and is_short:
this_duration += cur_duration + delta
this_text += cur_text
else:
samples.append(
(
audio_source,
target_dir,
dataset_basename,
this_time,
this_duration,
this_text,
seq_id,
)
)
this_time = cur_time
this_duration = cur_duration
this_text = cur_text
seq_id += 1
prev_time = cur_time
prev_duration = cur_duration
# Keep track of how many samples are good vs. problematic
_counter = get_counter()
num_samples = len(samples)
_rows = []
print("Processing XML data: {}".format(xml_file))
pool = Pool()
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
_counter += processed[0]
_rows += processed[1]
bar.update(i)
bar.update(num_samples)
pool.close()
pool.join()
imported_samples = get_imported_samples(_counter)
assert _counter["all"] == num_samples
assert len(_rows) == imported_samples
print_import_report(_counter, SAMPLE_RATE, MAX_SECS)
print(
"Import efficiency: %.1f%%" % ((_counter["total_time"] / source_frames) * 100)
)
print("")
return _counter, _rows
def _maybe_convert_wav(mp3_filename, _wav_filename):
if not os.path.exists(_wav_filename):
print("Converting {} to WAV file: {}".format(mp3_filename, _wav_filename))
transformer = sox.Transformer()
transformer.convert(
samplerate=SAMPLE_RATE, n_channels=CHANNELS, bitdepth=BIT_DEPTH
)
try:
transformer.build(mp3_filename, _wav_filename)
except sox.core.SoxError:
pass
def write_general_csv(target_dir, _rows, _counter):
target_csv_template = os.path.join(target_dir, "ccpmf_{}.csv")
with open(target_csv_template.format("train"), "w") as train_csv_file: # 80%
with open(target_csv_template.format("dev"), "w") as dev_csv_file: # 10%
with open(target_csv_template.format("test"), "w") as test_csv_file: # 10%
train_writer = csv.DictWriter(train_csv_file, fieldnames=FIELDNAMES)
train_writer.writeheader()
dev_writer = csv.DictWriter(dev_csv_file, fieldnames=FIELDNAMES)
dev_writer.writeheader()
test_writer = csv.DictWriter(test_csv_file, fieldnames=FIELDNAMES)
test_writer.writeheader()
bar = progressbar.ProgressBar(max_value=len(_rows), widgets=SIMPLE_BAR)
for i, item in enumerate(bar(_rows)):
i_mod = i % 10
if i_mod == 0:
writer = test_writer
elif i_mod == 1:
writer = dev_writer
else:
writer = train_writer
writer.writerow(
{
"wav_filename": item[0],
"wav_filesize": item[1],
"transcript": item[2],
}
)
print("")
print("~~~~ FINAL STATISTICS ~~~~")
print_import_report(_counter, SAMPLE_RATE, MAX_SECS)
print("~~~~ (FINAL STATISTICS) ~~~~")
print("")
if __name__ == "__main__":
PARSER = get_importers_parser(
description="Import XML from Conference Centre for Economics, France"
)
PARSER.add_argument("target_dir", help="Destination directory")
PARSER.add_argument(
"--filter_alphabet",
help="Exclude samples with characters not in provided alphabet",
)
PARSER.add_argument(
"--normalize",
action="store_true",
help="Converts diacritic characters to their base ones",
)
PARAMS = PARSER.parse_args()
validate_label = get_validate_label(PARAMS)
ALPHABET = Alphabet(PARAMS.filter_alphabet) if PARAMS.filter_alphabet else None
def label_filter_fun(label):
if PARAMS.normalize:
label = (
unicodedata.normalize("NFKD", label.strip())
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
label = maybe_normalize(label)
label = validate_label(label)
if ALPHABET and label:
try:
ALPHABET.encode(label)
except KeyError:
label = None
return label
dataset_sources = _download_and_preprocess_data(
csv_url=DATASET_RELEASE_CSV, target_dir=PARAMS.target_dir
)
sources_root_dir = os.path.dirname(dataset_sources)
all_counter = get_counter()
all_rows = []
with open(dataset_sources, "r") as sources:
for line in sources.readlines():
d = line.split(",")
this_xml = os.path.join(sources_root_dir, d[0])
this_mp3 = os.path.join(sources_root_dir, d[1])
this_rel = float(d[2])
wav_filename = os.path.join(
sources_root_dir,
os.path.splitext(os.path.basename(this_mp3))[0] + ".wav",
)
_maybe_convert_wav(this_mp3, wav_filename)
counter, rows = _maybe_import_data(
this_xml, wav_filename, sources_root_dir, this_rel
)
all_counter += counter
all_rows += rows
write_general_csv(sources_root_dir, _counter=all_counter, _rows=all_rows)


@ -2,20 +2,20 @@
import csv
import os
import subprocess
import sys
import tarfile
from glob import glob
from multiprocessing import Pool
import progressbar
import sox
from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
from deepspeech_training.util.importers import (
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
print_import_report,
)
from deepspeech_training.util.importers import validate_label_eng as validate_label
from coqui_stt_training.util.importers import validate_label_eng as validate_label
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
@ -34,7 +34,7 @@ def _download_and_preprocess_data(target_dir):
archive_path = maybe_download(ARCHIVE_NAME, target_dir, ARCHIVE_URL)
# Conditionally extract common voice data
_maybe_extract(target_dir, ARCHIVE_DIR_NAME, archive_path)
# Conditionally convert common voice CSV files and mp3 data to DeepSpeech CSVs and wav
# Conditionally convert common voice CSV files and mp3 data to Coqui STT CSVs and wav
_maybe_convert_sets(target_dir, ARCHIVE_DIR_NAME)


@ -3,7 +3,7 @@
Broadly speaking, this script takes the audio downloaded from Common Voice
for a certain language, in addition to the *.tsv files output by CorporaCreator,
and the script formats the data and transcripts to be in a state usable by
DeepSpeech.py
train.py
Use "python3 import_cv2.py -h" for help
"""
import csv
@ -14,19 +14,19 @@ from multiprocessing import Pool
import progressbar
import sox
from deepspeech_training.util.downloader import SIMPLE_BAR
from deepspeech_training.util.importers import (
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
get_importers_parser,
get_validate_label,
print_import_report,
)
from ds_ctcdecoder import Alphabet
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
CHANNELS = 1
MAX_SECS = 10
PARAMS = None
FILTER_OBJ = None
@ -40,7 +40,11 @@ class LabelFilter:
def filter(self, label):
if self.normalize:
label = unicodedata.normalize("NFKD", label.strip()).encode("ascii", "ignore").decode("ascii", "ignore")
label = (
unicodedata.normalize("NFKD", label.strip())
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
label = self.validate_fun(label)
if self.alphabet and label and not self.alphabet.CanEncode(label):
label = None
@ -96,7 +100,15 @@ def one_sample(sample):
return (counter, rows)
def _maybe_convert_set(dataset, tsv_dir, audio_dir, filter_obj, space_after_every_character=None, rows=None, exclude=None):
def _maybe_convert_set(
dataset,
tsv_dir,
audio_dir,
filter_obj,
space_after_every_character=None,
rows=None,
exclude=None,
):
exclude_transcripts = set()
exclude_speakers = set()
if exclude is not None:
@ -115,7 +127,13 @@ def _maybe_convert_set(dataset, tsv_dir, audio_dir, filter_obj, space_after_ever
with open(input_tsv, encoding="utf-8") as input_tsv_file:
reader = csv.DictReader(input_tsv_file, delimiter="\t")
for row in reader:
samples.append((os.path.join(audio_dir, row["path"]), row["sentence"], row["client_id"]))
samples.append(
(
os.path.join(audio_dir, row["path"]),
row["sentence"],
row["client_id"],
)
)
counter = get_counter()
num_samples = len(samples)
@ -123,7 +141,9 @@ def _maybe_convert_set(dataset, tsv_dir, audio_dir, filter_obj, space_after_ever
print("Importing mp3 files...")
pool = Pool(initializer=init_worker, initargs=(PARAMS,))
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
for i, processed in enumerate(
pool.imap_unordered(one_sample, samples), start=1
):
counter += processed[0]
rows += processed[1]
bar.update(i)
@ -137,9 +157,9 @@ def _maybe_convert_set(dataset, tsv_dir, audio_dir, filter_obj, space_after_ever
print_import_report(counter, SAMPLE_RATE, MAX_SECS)
output_csv = os.path.join(os.path.abspath(audio_dir), dataset + ".csv")
print("Saving new DeepSpeech-formatted CSV file to: ", output_csv)
print("Saving new Coqui STT-formatted CSV file to: ", output_csv)
with open(output_csv, "w", encoding="utf-8", newline="") as output_csv_file:
print("Writing CSV file for DeepSpeech.py as: ", output_csv)
print("Writing CSV file for train.py as: ", output_csv)
writer = csv.DictWriter(output_csv_file, fieldnames=FIELDNAMES)
writer.writeheader()
bar = progressbar.ProgressBar(max_value=len(rows), widgets=SIMPLE_BAR)
@ -168,18 +188,26 @@ def _maybe_convert_set(dataset, tsv_dir, audio_dir, filter_obj, space_after_ever
def _preprocess_data(tsv_dir, audio_dir, space_after_every_character=False):
exclude = []
for dataset in ["test", "dev", "train", "validated", "other"]:
set_samples = _maybe_convert_set(dataset, tsv_dir, audio_dir, space_after_every_character)
set_samples = _maybe_convert_set(
dataset, tsv_dir, audio_dir, space_after_every_character
)
if dataset in ["test", "dev"]:
exclude += set_samples
if dataset == "validated":
_maybe_convert_set("train-all", tsv_dir, audio_dir, space_after_every_character,
rows=set_samples, exclude=exclude)
_maybe_convert_set(
"train-all",
tsv_dir,
audio_dir,
space_after_every_character,
rows=set_samples,
exclude=exclude,
)
def _maybe_convert_wav(mp3_filename, wav_filename):
if not os.path.exists(wav_filename):
transformer = sox.Transformer()
transformer.convert(samplerate=SAMPLE_RATE)
transformer.convert(samplerate=SAMPLE_RATE, n_channels=CHANNELS)
try:
transformer.build(mp3_filename, wav_filename)
except sox.core.SoxError:
@ -211,7 +239,9 @@ def parse_args():
def main():
audio_dir = PARAMS.audio_dir if PARAMS.audio_dir else os.path.join(PARAMS.tsv_dir, "clips")
audio_dir = (
PARAMS.audio_dir if PARAMS.audio_dir else os.path.join(PARAMS.tsv_dir, "clips")
)
_preprocess_data(PARAMS.tsv_dir, audio_dir, PARAMS.space_after_every_character)


@ -2,6 +2,7 @@
import codecs
import fnmatch
import os
import random
import subprocess
import sys
import unicodedata
@ -9,8 +10,7 @@ import unicodedata
import librosa
import pandas
import soundfile # <= Has an external dependency on libsndfile
from deepspeech_training.util.importers import validate_label_eng as validate_label
from coqui_stt_training.util.importers import validate_label_eng as validate_label
# Prerequisite: Having the sph2pipe tool in your PATH:
# https://www.ldc.upenn.edu/language-resources/tools/sphere-conversion-tools
@ -236,14 +236,18 @@ def _split_and_resample_wav(origAudio, start_time, stop_time, new_wav_file):
def _split_sets(filelist):
# We initially split the entire set into 80% train and 20% test, then
# split the train set into 80% train and 20% validation.
train_beg = 0
train_end = int(0.8 * len(filelist))
"""
Randomly split the dataset into train, validation, and test sets, where the sizes of the
validation and test sets are determined by the `get_sample_size` function.
"""
random.shuffle(filelist)
sample_size = get_sample_size(len(filelist))
dev_beg = int(0.8 * train_end)
dev_end = train_end
train_end = dev_beg
train_beg = 0
train_end = len(filelist) - 2 * sample_size
dev_beg = train_end
dev_end = train_end + sample_size
test_beg = dev_end
test_end = len(filelist)
@ -255,5 +259,24 @@ def _split_sets(filelist):
)
def get_sample_size(population_size):
"""calculates the sample size for a 99% confidence and 1% margin of error"""
margin_of_error = 0.01
fraction_picking = 0.50
z_score = 2.58 # Corresponds to confidence level 99%
numerator = (z_score ** 2 * fraction_picking * (1 - fraction_picking)) / (
margin_of_error ** 2
)
sample_size = 0
for train_size in range(population_size, 0, -1):
denominator = 1 + (z_score ** 2 * fraction_picking * (1 - fraction_picking)) / (
margin_of_error ** 2 * train_size
)
sample_size = int(numerator / denominator)
if 2 * sample_size + train_size <= population_size:
break
return sample_size
if __name__ == "__main__":
_download_and_preprocess_data(sys.argv[1])

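For a sense of scale, the sample-size search above can be run standalone. A sketch restating the same formula (the 20,000-file population is an arbitrary example):

```python
# Standalone restatement of get_sample_size: the largest dev/test size at 99%
# confidence and 1% margin of error that still leaves room for the train set.
def sample_size_for(population_size, z_score=2.58, margin_of_error=0.01, p=0.5):
    numerator = (z_score ** 2 * p * (1 - p)) / margin_of_error ** 2
    sample_size = 0
    for train_size in range(population_size, 0, -1):
        denominator = 1 + numerator / train_size
        sample_size = int(numerator / denominator)
        if 2 * sample_size + train_size <= population_size:
            break
    return sample_size

print(sample_size_for(20_000))  # about 5,700 files each for dev and test
```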

@ -5,8 +5,7 @@ import tarfile
import numpy as np
import pandas
from deepspeech_training.util.importers import get_importers_parser
from coqui_stt_training.util.importers import get_importers_parser
COLUMN_NAMES = ["wav_filename", "wav_filesize", "transcript"]


@ -9,10 +9,9 @@ import urllib
from pathlib import Path
import pandas as pd
from sox import Transformer
import swifter
from deepspeech_training.util.importers import get_importers_parser, get_validate_label
from coqui_stt_training.util.importers import get_importers_parser, get_validate_label
from sox import Transformer
__version__ = "0.1.0"
_logger = logging.getLogger(__name__)


@ -3,8 +3,7 @@ import os
import sys
import pandas
from deepspeech_training.util.downloader import maybe_download
from coqui_stt_training.util.downloader import maybe_download
def _download_and_preprocess_data(data_dir):


@ -9,11 +9,10 @@ import unicodedata
import pandas
import progressbar
from coqui_stt_training.util.downloader import maybe_download
from sox import Transformer
from tensorflow.python.platform import gfile
from deepspeech_training.util.downloader import maybe_download
SAMPLE_RATE = 16000


@ -11,16 +11,15 @@ from multiprocessing import Pool
import progressbar
import sox
from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
from deepspeech_training.util.importers import (
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
get_importers_parser,
get_validate_label,
print_import_report,
)
from ds_ctcdecoder import Alphabet
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
@ -137,9 +136,15 @@ def _maybe_convert_sets(target_dir, extracted_data):
pool.close()
pool.join()
with open(target_csv_template.format("train"), "w", encoding="utf-8", newline="") as train_csv_file: # 80%
with open(target_csv_template.format("dev"), "w", encoding="utf-8", newline="") as dev_csv_file: # 10%
with open(target_csv_template.format("test"), "w", encoding="utf-8", newline="") as test_csv_file: # 10%
with open(
target_csv_template.format("train"), "w", encoding="utf-8", newline=""
) as train_csv_file: # 80%
with open(
target_csv_template.format("dev"), "w", encoding="utf-8", newline=""
) as dev_csv_file: # 10%
with open(
target_csv_template.format("test"), "w", encoding="utf-8", newline=""
) as test_csv_file: # 10%
train_writer = csv.DictWriter(train_csv_file, fieldnames=FIELDNAMES)
train_writer.writeheader()
dev_writer = csv.DictWriter(dev_csv_file, fieldnames=FIELDNAMES)
@ -179,7 +184,9 @@ def _maybe_convert_sets(target_dir, extracted_data):
def _maybe_convert_wav(ogg_filename, wav_filename):
if not os.path.exists(wav_filename):
transformer = sox.Transformer()
transformer.convert(samplerate=SAMPLE_RATE, n_channels=N_CHANNELS, bitdepth=BITDEPTH)
transformer.convert(
samplerate=SAMPLE_RATE, n_channels=N_CHANNELS, bitdepth=BITDEPTH
)
try:
transformer.build(ogg_filename, wav_filename)
except sox.core.SoxError as ex:


@ -9,16 +9,15 @@ from glob import glob
from multiprocessing import Pool
import progressbar
from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
from deepspeech_training.util.importers import (
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
get_importers_parser,
get_validate_label,
print_import_report,
)
from ds_ctcdecoder import Alphabet
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
@ -60,9 +59,20 @@ def one_sample(sample):
file_size = -1
frames = 0
if os.path.exists(wav_filename):
tmp_filename = os.path.splitext(wav_filename)[0]+'.tmp.wav'
tmp_filename = os.path.splitext(wav_filename)[0] + ".tmp.wav"
subprocess.check_call(
['sox', wav_filename, '-r', str(SAMPLE_RATE), '-c', '1', '-b', '16', tmp_filename], stderr=subprocess.STDOUT
[
"sox",
wav_filename,
"-r",
str(SAMPLE_RATE),
"-c",
"1",
"-b",
"16",
tmp_filename,
],
stderr=subprocess.STDOUT,
)
os.rename(tmp_filename, wav_filename)
file_size = os.path.getsize(wav_filename)
@ -138,9 +148,15 @@ def _maybe_convert_sets(target_dir, extracted_data):
pool.close()
pool.join()
with open(target_csv_template.format("train"), "w", encoding="utf-8", newline="") as train_csv_file: # 80%
with open(target_csv_template.format("dev"), "w", encoding="utf-8", newline="") as dev_csv_file: # 10%
with open(target_csv_template.format("test"), "w", encoding="utf-8", newline="") as test_csv_file: # 10%
with open(
target_csv_template.format("train"), "w", encoding="utf-8", newline=""
) as train_csv_file: # 80%
with open(
target_csv_template.format("dev"), "w", encoding="utf-8", newline=""
) as dev_csv_file: # 10%
with open(
target_csv_template.format("test"), "w", encoding="utf-8", newline=""
) as test_csv_file: # 10%
train_writer = csv.DictWriter(train_csv_file, fieldnames=FIELDNAMES)
train_writer.writeheader()
dev_writer = csv.DictWriter(dev_csv_file, fieldnames=FIELDNAMES)


@ -5,8 +5,7 @@ import tarfile
import wave
import pandas
from deepspeech_training.util.importers import get_importers_parser
from coqui_stt_training.util.importers import get_importers_parser
COLUMN_NAMES = ["wav_filename", "wav_filesize", "transcript"]

bin/import_mls_english.py (new file, 99 lines)

@ -0,0 +1,99 @@
#!/usr/bin/env python
import argparse
import ctypes
import os
from pathlib import Path
import pandas
import pyogg
from tqdm import tqdm
def read_ogg_opus_duration(ogg_file_path):
error = ctypes.c_int()
opusfile = pyogg.opus.op_open_file(
ogg_file_path.encode("utf-8"), ctypes.pointer(error)
)
if error.value != 0:
raise ValueError(
("Ogg/Opus file could not be read." "Error code: {}").format(error.value)
)
pcm_buffer_size = pyogg.opus.op_pcm_total(opusfile, -1)
channel_count = pyogg.opus.op_channel_count(opusfile, -1)
sample_rate = 48000 # opus files are always 48kHz
sample_width = 2 # always 16-bit
pyogg.opus.op_free(opusfile)
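# op_pcm_total() gives the total number of PCM samples at the fixed 48 kHz Opus
# decode rate, so dividing by sample_rate below yields the clip duration in seconds.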
return pcm_buffer_size / sample_rate
def main(root_dir):
for subset in (
"train",
"dev",
"test",
):
print("Processing {} subset...".format(subset))
with open(Path(root_dir) / subset / "transcripts.txt") as fin:
subset_entries = []
for i, line in tqdm(enumerate(fin)):
audio_id, transcript = line.split("\t")
audio_id_parts = audio_id.split("_")
# e.g. 4800_10003_000000 -> train/audio/4800/10003/4800_10003_000000.opus
audio_path = (
Path(root_dir)
/ subset
/ "audio"
/ audio_id_parts[0]
/ audio_id_parts[1]
/ "{}.opus".format(audio_id)
)
audio_duration = read_ogg_opus_duration(audio_path)
# TODO: support other languages
transcript = (
transcript.strip()
.replace("-", " ")
.replace("ñ", "n")
.replace(".", "")
.translate(
{
ord(ch): None
for ch in (
"а",
"в",
"е",
"и",
"к",
"м",
"н",
"о",
"п",
"р",
"т",
"ы",
"я",
)
}
)
)
subset_entries.append(
(
audio_path.relative_to(root_dir),
audio_duration,
transcript.strip(),
)
)
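# Note: the second tuple element is the clip duration in seconds (from
# read_ogg_opus_duration above), so the "wav_filesize" column written below
# actually stores durations rather than byte sizes.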
df = pandas.DataFrame(
columns=["wav_filename", "wav_filesize", "transcript"],
data=subset_entries,
)
csv_name = Path(root_dir) / "{}.csv".format(subset)
df.to_csv(csv_name, index=False)
print("Wrote {}".format(csv_name))
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("root_dir", help="Path to the mls_english_opus directory.")
args = parser.parse_args()
main(args.root_dir)
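
As a quick sanity check of the CSVs this importer writes, here is a minimal pandas sketch; the mls_english_opus path is an assumption, and the column names come from the script above:

import pandas

# Hypothetical location: pass the same root_dir you gave the importer.
df = pandas.read_csv("mls_english_opus/train.csv")
print(df.columns.tolist())               # ["wav_filename", "wav_filesize", "transcript"]
print(df["wav_filesize"].sum() / 3600)   # total hours, since this column holds seconds here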


@ -6,8 +6,7 @@ import tarfile
import numpy as np
import pandas
from deepspeech_training.util.importers import get_importers_parser
from coqui_stt_training.util.importers import get_importers_parser
COLUMN_NAMES = ["wav_filename", "wav_filesize", "transcript"]


@ -8,16 +8,15 @@ from glob import glob
from multiprocessing import Pool
import progressbar
from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
from deepspeech_training.util.importers import (
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
get_importers_parser,
get_validate_label,
print_import_report,
)
from ds_ctcdecoder import Alphabet
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
@ -157,9 +156,15 @@ def _maybe_convert_sets(target_dir, extracted_data):
pool.close()
pool.join()
with open(target_csv_template.format("train"), "w", encoding="utf-8", newline="") as train_csv_file: # 80%
with open(target_csv_template.format("dev"), "w", encoding="utf-8", newline="") as dev_csv_file: # 10%
with open(target_csv_template.format("test"), "w", encoding="utf-8", newline="") as test_csv_file: # 10%
with open(
target_csv_template.format("train"), "w", encoding="utf-8", newline=""
) as train_csv_file: # 80%
with open(
target_csv_template.format("dev"), "w", encoding="utf-8", newline=""
) as dev_csv_file: # 10%
with open(
target_csv_template.format("test"), "w", encoding="utf-8", newline=""
) as test_csv_file: # 10%
train_writer = csv.DictWriter(train_csv_file, fieldnames=FIELDNAMES)
train_writer.writeheader()
dev_writer = csv.DictWriter(dev_csv_file, fieldnames=FIELDNAMES)


@ -1,10 +1,11 @@
#!/usr/bin/env python
# Ensure that you have downloaded the LDC dataset LDC97S62 and that the tarball exists in a folder, e.g.
# ./data/swb/swb1_LDC97S62.tgz
# from the deepspeech directory run with: ./bin/import_swb.py ./data/swb/
# from the Coqui STT directory run with: ./bin/import_swb.py ./data/swb/
import codecs
import fnmatch
import os
import random
import subprocess
import sys
import tarfile
@ -15,8 +16,7 @@ import librosa
import pandas
import requests
import soundfile # <= Has an external dependency on libsndfile
from deepspeech_training.util.importers import validate_label_eng as validate_label
from coqui_stt_training.util.importers import validate_label_eng as validate_label
# ARCHIVE_NAME refers to ISIP alignments from 01/29/03
ARCHIVE_NAME = "switchboard_word_alignments.tar.gz"
@ -43,7 +43,7 @@ def maybe_download(archive_url, target_dir, ldc_dataset):
ldc_path = archive_url + ldc_dataset
if not os.path.exists(target_dir):
print('No path "%s" - creating ...' % target_dir)
makedirs(target_dir)
os.makedirs(target_dir)
if not os.path.exists(archive_path):
print('No archive "%s" - downloading...' % archive_path)
@ -290,14 +290,18 @@ def _split_wav(origAudio, start_time, stop_time, new_wav_file):
def _split_sets(filelist):
# We initially split the entire set into 80% train and 20% test, then
# split the train set into 80% train and 20% validation.
train_beg = 0
train_end = int(0.8 * len(filelist))
"""
randomply split the datasets into train, validation, and test sets where the size of the
validation and test sets are determined by the `get_sample_size` function.
"""
random.shuffle(filelist)
sample_size = get_sample_size(len(filelist))
dev_beg = int(0.8 * train_end)
dev_end = train_end
train_end = dev_beg
train_beg = 0
train_end = len(filelist) - 2 * sample_size
dev_beg = train_end
dev_end = train_end + sample_size
test_beg = dev_end
test_end = len(filelist)
@ -309,6 +313,25 @@ def _split_sets(filelist):
)
def get_sample_size(population_size):
"""calculates the sample size for a 99% confidence and 1% margin of error"""
margin_of_error = 0.01
fraction_picking = 0.50
z_score = 2.58 # Corresponds to confidence level 99%
numerator = (z_score ** 2 * fraction_picking * (1 - fraction_picking)) / (
margin_of_error ** 2
)
sample_size = 0
for train_size in range(population_size, 0, -1):
denominator = 1 + (z_score ** 2 * fraction_picking * (1 - fraction_picking)) / (
margin_of_error ** 2 * train_size
)
sample_size = int(numerator / denominator)
if 2 * sample_size + train_size <= population_size:
break
return sample_size
def _read_data_set(
filelist,
thread_count,

View File

@ -1,6 +1,6 @@
#!/usr/bin/env python
"""
Downloads and prepares (parts of) the "Spoken Wikipedia Corpora" for DeepSpeech.py
Downloads and prepares (parts of) the "Spoken Wikipedia Corpora" for train.py
Use "python3 import_swc.py -h" for help
"""
@ -21,10 +21,9 @@ from multiprocessing.pool import ThreadPool
import progressbar
import sox
from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
from deepspeech_training.util.importers import validate_label_eng as validate_label
from ds_ctcdecoder import Alphabet
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.importers import validate_label_eng as validate_label
SWC_URL = "https://www2.informatik.uni-hamburg.de/nats/pub/SWC/SWC_{language}.tar"
SWC_ARCHIVE = "SWC_{language}.tar"
@ -173,7 +172,6 @@ def in_alphabet(alphabet, c):
return alphabet.CanEncode(c) if alphabet else True
ALPHABETS = {}
@ -202,8 +200,16 @@ def label_filter(label, language):
dont_normalize = DONT_NORMALIZE[language] if language in DONT_NORMALIZE else ""
alphabet = get_alphabet(language)
for c in label:
if CLI_ARGS.normalize and c not in dont_normalize and not in_alphabet(alphabet, c):
c = unicodedata.normalize("NFKD", c).encode("ascii", "ignore").decode("ascii", "ignore")
if (
CLI_ARGS.normalize
and c not in dont_normalize
and not in_alphabet(alphabet, c)
):
c = (
unicodedata.normalize("NFKD", c)
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
for sc in c:
if not in_alphabet(alphabet, sc):
return None, "illegal character"


@ -7,12 +7,11 @@ from glob import glob
from os import makedirs, path, remove, rmdir
import pandas
from coqui_stt_training.util.downloader import maybe_download
from coqui_stt_training.util.stm import parse_stm_file
from sox import Transformer
from tensorflow.python.platform import gfile
from deepspeech_training.util.downloader import maybe_download
from deepspeech_training.util.stm import parse_stm_file
def _download_and_preprocess_data(data_dir):
# Conditionally download data


@ -1,214 +0,0 @@
#!/usr/bin/env python3
import csv
import os
import re
import subprocess
import zipfile
from multiprocessing import Pool
import progressbar
import sox
import unidecode
from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
from deepspeech_training.util.importers import (
get_counter,
get_imported_samples,
get_importers_parser,
get_validate_label,
print_import_report,
)
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
MAX_SECS = 15
ARCHIVE_NAME = "2019-04-11_fr_FR"
ARCHIVE_DIR_NAME = "ts_" + ARCHIVE_NAME
ARCHIVE_URL = (
"https://deepspeech-storage-mirror.s3.fr-par.scw.cloud/" + ARCHIVE_NAME + ".zip"
)
def _download_and_preprocess_data(target_dir, english_compatible=False):
# Making path absolute
target_dir = os.path.abspath(target_dir)
# Conditionally download data
archive_path = maybe_download(
"ts_" + ARCHIVE_NAME + ".zip", target_dir, ARCHIVE_URL
)
# Conditionally extract archive data
_maybe_extract(target_dir, ARCHIVE_DIR_NAME, archive_path)
# Conditionally convert TrainingSpeech data to DeepSpeech CSVs and wav
_maybe_convert_sets(
target_dir, ARCHIVE_DIR_NAME, english_compatible=english_compatible
)
def _maybe_extract(target_dir, extracted_data, archive_path):
# If target_dir/extracted_data does not exist, extract archive in target_dir
extracted_path = os.path.join(target_dir, extracted_data)
if not os.path.exists(extracted_path):
print('No directory "%s" - extracting archive...' % extracted_path)
if not os.path.isdir(extracted_path):
os.mkdir(extracted_path)
with zipfile.ZipFile(archive_path) as zip_f:
zip_f.extractall(extracted_path)
else:
print('Found directory "%s" - not extracting it from archive.' % archive_path)
def one_sample(sample):
""" Take a audio file, and optionally convert it to 16kHz WAV """
orig_filename = sample["path"]
# Storing wav files next to the wav ones - just with a different suffix
wav_filename = os.path.splitext(orig_filename)[0] + ".converted.wav"
_maybe_convert_wav(orig_filename, wav_filename)
file_size = -1
frames = 0
if os.path.exists(wav_filename):
file_size = os.path.getsize(wav_filename)
frames = int(
subprocess.check_output(
["soxi", "-s", wav_filename], stderr=subprocess.STDOUT
)
)
label = sample["text"]
rows = []
# Keep track of how many samples are good vs. problematic
counter = get_counter()
if file_size == -1:
# Excluding samples that failed upon conversion
counter["failed"] += 1
elif label is None:
# Excluding samples that failed on label validation
counter["invalid_label"] += 1
elif int(frames / SAMPLE_RATE * 1000 / 10 / 2) < len(str(label)):
# Excluding samples that are too short to fit the transcript
counter["too_short"] += 1
elif frames / SAMPLE_RATE > MAX_SECS:
# Excluding very long samples to keep a reasonable batch-size
counter["too_long"] += 1
else:
# This one is good - keep it for the target CSV
rows.append((wav_filename, file_size, label))
counter["imported_time"] += frames
counter["all"] += 1
counter["total_time"] += frames
return (counter, rows)
def _maybe_convert_sets(target_dir, extracted_data, english_compatible=False):
extracted_dir = os.path.join(target_dir, extracted_data)
# override existing CSV with normalized one
target_csv_template = os.path.join(target_dir, "ts_" + ARCHIVE_NAME + "_{}.csv")
if os.path.isfile(target_csv_template):
return
path_to_original_csv = os.path.join(extracted_dir, "data.csv")
with open(path_to_original_csv) as csv_f:
data = [
d
for d in csv.DictReader(csv_f, delimiter=",")
if float(d["duration"]) <= MAX_SECS
]
for line in data:
line["path"] = os.path.join(extracted_dir, line["path"])
num_samples = len(data)
rows = []
counter = get_counter()
print("Importing {} wav files...".format(num_samples))
pool = Pool()
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
for i, processed in enumerate(pool.imap_unordered(one_sample, data), start=1):
counter += processed[0]
rows += processed[1]
bar.update(i)
bar.update(num_samples)
pool.close()
pool.join()
with open(target_csv_template.format("train"), "w", encoding="utf-8", newline="") as train_csv_file: # 80%
with open(target_csv_template.format("dev"), "w", encoding="utf-8", newline="") as dev_csv_file: # 10%
with open(target_csv_template.format("test"), "w", encoding="utf-8", newline="") as test_csv_file: # 10%
train_writer = csv.DictWriter(train_csv_file, fieldnames=FIELDNAMES)
train_writer.writeheader()
dev_writer = csv.DictWriter(dev_csv_file, fieldnames=FIELDNAMES)
dev_writer.writeheader()
test_writer = csv.DictWriter(test_csv_file, fieldnames=FIELDNAMES)
test_writer.writeheader()
for i, item in enumerate(rows):
transcript = validate_label(
cleanup_transcript(
item[2], english_compatible=english_compatible
)
)
if not transcript:
continue
wav_filename = os.path.join(target_dir, extracted_data, item[0])
i_mod = i % 10
if i_mod == 0:
writer = test_writer
elif i_mod == 1:
writer = dev_writer
else:
writer = train_writer
writer.writerow(
dict(
wav_filename=wav_filename,
wav_filesize=os.path.getsize(wav_filename),
transcript=transcript,
)
)
imported_samples = get_imported_samples(counter)
assert counter["all"] == num_samples
assert len(rows) == imported_samples
print_import_report(counter, SAMPLE_RATE, MAX_SECS)
def _maybe_convert_wav(orig_filename, wav_filename):
if not os.path.exists(wav_filename):
transformer = sox.Transformer()
transformer.convert(samplerate=SAMPLE_RATE)
try:
transformer.build(orig_filename, wav_filename)
except sox.core.SoxError as ex:
print("SoX processing error", ex, orig_filename, wav_filename)
PUNCTUATIONS_REG = re.compile(r"[\-,;!?.()\[\]*…—]")
MULTIPLE_SPACES_REG = re.compile(r"\s{2,}")
def cleanup_transcript(text, english_compatible=False):
text = text.replace("", "'").replace("\u00A0", " ")
text = PUNCTUATIONS_REG.sub(" ", text)
text = MULTIPLE_SPACES_REG.sub(" ", text)
if english_compatible:
text = unidecode.unidecode(text)
return text.strip().lower()
def handle_args():
parser = get_importers_parser(description="Importer for TrainingSpeech dataset.")
parser.add_argument(dest="target_dir")
parser.add_argument(
"--english-compatible",
action="store_true",
dest="english_compatible",
help="Remove diactrics and other non-ascii chars.",
)
return parser.parse_args()
if __name__ == "__main__":
cli_args = handle_args()
validate_label = get_validate_label(cli_args)
_download_and_preprocess_data(cli_args.target_dir, cli_args.english_compatible)


@ -1,6 +1,6 @@
#!/usr/bin/env python
"""
Downloads and prepares (parts of) the "German Distant Speech" corpus (TUDA) for DeepSpeech.py
Downloads and prepares (parts of) the "German Distant Speech" corpus (TUDA) for train.py
Use "python3 import_tuda.py -h" for help
"""
import argparse
@ -13,10 +13,9 @@ import xml.etree.ElementTree as ET
from collections import Counter
import progressbar
from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
from deepspeech_training.util.importers import validate_label_eng as validate_label
from ds_ctcdecoder import Alphabet
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.importers import validate_label_eng as validate_label
TUDA_VERSION = "v2"
TUDA_PACKAGE = "german-speechdata-package-{}".format(TUDA_VERSION)
@ -55,7 +54,11 @@ def check_and_prepare_sentence(sentence):
chars = []
for c in sentence:
if CLI_ARGS.normalize and c not in "äöüß" and not in_alphabet(c):
c = unicodedata.normalize("NFKD", c).encode("ascii", "ignore").decode("ascii", "ignore")
c = (
unicodedata.normalize("NFKD", c)
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
for sc in c:
if not in_alphabet(c):
return None
@ -118,7 +121,7 @@ def write_csvs(extracted):
sentence = list(meta.iter("cleaned_sentence"))[0].text
sentence = check_and_prepare_sentence(sentence)
if sentence is None:
reasons['alphabet filter'] += 1
reasons["alphabet filter"] += 1
continue
for wav_name in wav_names:
sample_counter += 1


@ -10,9 +10,8 @@ from zipfile import ZipFile
import librosa
import progressbar
from deepspeech_training.util.downloader import SIMPLE_BAR, maybe_download
from deepspeech_training.util.importers import (
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
print_import_report,
@ -35,7 +34,7 @@ def _download_and_preprocess_data(target_dir):
archive_path = maybe_download(ARCHIVE_NAME, target_dir, ARCHIVE_URL)
# Conditionally extract common voice data
_maybe_extract(target_dir, ARCHIVE_DIR_NAME, archive_path)
# Conditionally convert common voice CSV files and mp3 data to DeepSpeech CSVs and wav
# Conditionally convert common voice CSV files and mp3 data to Coqui STT CSVs and wav
_maybe_convert_sets(target_dir, ARCHIVE_DIR_NAME)


@ -2,6 +2,7 @@
import codecs
import os
import re
import sys
import tarfile
import threading
import unicodedata
@ -12,8 +13,8 @@ from os import makedirs, path
import pandas
from bs4 import BeautifulSoup
from coqui_stt_training.util.downloader import maybe_download
from tensorflow.python.platform import gfile
from deepspeech_training.util.downloader import maybe_download
"""The number of jobs to run in parallel"""
NUM_PARALLEL = 8


@ -1,22 +1,34 @@
#!/usr/bin/env python
"""
Tool for playing (and augmenting) single samples or samples from Sample Databases (SDB files) and DeepSpeech CSV files
Tool for playing (and augmenting) single samples or samples from Sample Databases (SDB files) and 🐸STT CSV files
Use "python3 play.py -h" for help
"""
import os
import sys
import random
import argparse
import os
import random
import sys
from deepspeech_training.util.audio import LOADABLE_AUDIO_EXTENSIONS, AUDIO_TYPE_PCM, AUDIO_TYPE_WAV
from deepspeech_training.util.sample_collections import SampleList, LabeledSample, samples_from_source
from deepspeech_training.util.augmentations import parse_augmentations, apply_sample_augmentations, SampleAugmentation
from coqui_stt_training.util.audio import (
AUDIO_TYPE_PCM,
AUDIO_TYPE_WAV,
get_loadable_audio_type_from_extension,
)
from coqui_stt_training.util.augmentations import (
SampleAugmentation,
apply_sample_augmentations,
parse_augmentations,
)
from coqui_stt_training.util.sample_collections import (
LabeledSample,
SampleList,
samples_from_source,
)
def get_samples_in_play_order():
ext = os.path.splitext(CLI_ARGS.source)[1].lower()
if ext in LOADABLE_AUDIO_EXTENSIONS:
if get_loadable_audio_type_from_extension(ext):
samples = SampleList([(CLI_ARGS.source, 0)], labeled=False)
else:
samples = samples_from_source(CLI_ARGS.source, buffering=0)
@ -40,14 +52,17 @@ def get_samples_in_play_order():
def play_collection():
augmentations = parse_augmentations(CLI_ARGS.augment)
print(f"Parsed augmentations from flags: {augmentations}")
if any(not isinstance(a, SampleAugmentation) for a in augmentations):
print("Warning: Some of the augmentations cannot be simulated by this command.")
samples = get_samples_in_play_order()
samples = apply_sample_augmentations(samples,
audio_type=AUDIO_TYPE_PCM,
augmentations=augmentations,
process_ahead=0,
clock=CLI_ARGS.clock)
samples = apply_sample_augmentations(
samples,
audio_type=AUDIO_TYPE_PCM,
augmentations=augmentations,
process_ahead=0,
clock=CLI_ARGS.clock,
)
for sample in samples:
if not CLI_ARGS.quiet:
print('Sample "{}"'.format(sample.sample_id), file=sys.stderr)
@ -57,10 +72,12 @@ def play_collection():
sample.change_audio_type(AUDIO_TYPE_WAV)
sys.stdout.buffer.write(sample.audio.getvalue())
return
wave_obj = simpleaudio.WaveObject(sample.audio,
sample.audio_format.channels,
sample.audio_format.width,
sample.audio_format.rate)
wave_obj = simpleaudio.WaveObject(
sample.audio,
sample.audio_format.channels,
sample.audio_format.width,
sample.audio_format.rate,
)
play_obj = wave_obj.play()
play_obj.wait_done()
@ -68,9 +85,11 @@ def play_collection():
def handle_args():
parser = argparse.ArgumentParser(
description="Tool for playing (and augmenting) single samples or samples from Sample Databases (SDB files) "
"and DeepSpeech CSV files"
"and Coqui STT CSV files"
)
parser.add_argument(
"source", help="Sample DB, CSV or WAV file to play samples from"
)
parser.add_argument("source", help="Sample DB, CSV or WAV file to play samples from")
parser.add_argument(
"--start",
type=int,
@ -90,7 +109,7 @@ def handle_args():
)
parser.add_argument(
"--augment",
action='append',
action="append",
help="Add an augmentation operation",
)
parser.add_argument(
@ -98,8 +117,8 @@ def handle_args():
type=float,
default=0.5,
help="Simulates clock value used for augmentations during training."
"Ranges from 0.0 (representing parameter start values) to"
"1.0 (representing parameter end values)",
"Ranges from 0.0 (representing parameter start values) to"
"1.0 (representing parameter end values)",
)
parser.add_argument(
"--pipe",
@ -120,7 +139,9 @@ if __name__ == "__main__":
try:
import simpleaudio
except ModuleNotFoundError:
print('Unless using the --pipe flag, play.py requires Python package "simpleaudio" for playing samples')
print(
'Unless using the --pipe flag, play.py requires Python package "simpleaudio" for playing samples'
)
sys.exit(1)
try:
play_collection()
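
A hedged sketch of driving the reformatted play_collection() flow above programmatically; every import and call signature is taken from this diff, while the augmentation spec string and the CSV path are illustrative assumptions:

import sys

from coqui_stt_training.util.audio import AUDIO_TYPE_PCM
from coqui_stt_training.util.augmentations import (
    apply_sample_augmentations,
    parse_augmentations,
)
from coqui_stt_training.util.sample_collections import samples_from_source

# Spec syntax assumed from the training docs; adjust as needed.
augmentations = parse_augmentations(["tempo[factor=1.1]"])
samples = samples_from_source("data/smoke_test/ldc93s1.csv", buffering=0)
for sample in apply_sample_augmentations(
    samples,
    audio_type=AUDIO_TYPE_PCM,
    augmentations=augmentations,
    process_ahead=0,
    clock=0.5,  # 0.0..1.0, simulating training progress like the --clock flag above
):
    print('Sample "{}"'.format(sample.sample_id), file=sys.stderr)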


@ -14,16 +14,17 @@ fi;
# and when trying to run on multiple devices (like GPUs), this will break
export CUDA_VISIBLE_DEVICES=0
python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
python -u train.py --alphabet_config_path "data/alphabet.txt" \
--show_progressbar false --early_stop false \
--train_files ${ldc93s1_csv} --train_batch_size 1 \
--scorer "" \
--augment dropout \
--augment pitch \
--augment tempo \
--augment warp \
--augment time_mask \
--augment frequency_mask \
--augment add \
--augment multiply \
pitch \
tempo \
warp \
time_mask \
frequency_mask \
add \
multiply \
--n_hidden 100 \
--epochs 1


@ -14,7 +14,8 @@ fi;
# and when trying to run on multiple devices (like GPUs), this will break
export CUDA_VISIBLE_DEVICES=0
python -u DeepSpeech.py --noshow_progressbar --noearly_stop \
python -u train.py --alphabet_config_path "data/alphabet.txt" \
--show_progressbar false --early_stop false \
--train_files ${ldc93s1_csv} --train_batch_size 1 \
--dev_files ${ldc93s1_csv} --dev_batch_size 1 \
--test_files ${ldc93s1_csv} --test_batch_size 1 \


@ -0,0 +1,31 @@
#!/bin/sh
set -xe
ldc93s1_dir="./data/smoke_test"
ldc93s1_csv="${ldc93s1_dir}/ldc93s1.csv"
if [ ! -f "${ldc93s1_dir}/ldc93s1.csv" ]; then
echo "Downloading and preprocessing LDC93S1 example data, saving in ${ldc93s1_dir}."
python -u bin/import_ldc93s1.py ${ldc93s1_dir}
fi;
# Force only one visible device because we have a single-sample dataset
# and when trying to run on multiple devices (like GPUs), this will break
export CUDA_VISIBLE_DEVICES=0
python -u train.py --show_progressbar false --early_stop false \
--train_files ${ldc93s1_csv} --train_batch_size 1 \
--dev_files ${ldc93s1_csv} --dev_batch_size 1 \
--test_files ${ldc93s1_csv} --test_batch_size 1 \
--n_hidden 100 --epochs 1 \
--max_to_keep 1 --checkpoint_dir '/tmp/ckpt_bytes' --bytes_output_mode true \
--learning_rate 0.001 --dropout_rate 0.05 \
--scorer_path 'data/smoke_test/pruned_lm.bytes.scorer' | tee /tmp/resume.log
if ! grep "Loading best validating checkpoint from" /tmp/resume.log; then
echo "Did not resume training from checkpoint"
exit 1
else
exit 0
fi

Some files were not shown because too many files have changed in this diff.