Compare commits

...

2194 Commits

Author SHA1 Message Date
003b399253 Fix up the RPi4 build 2021-12-04 16:28:48 +00:00
f008d10c49 Use tensorflow fork with rpi4ub-armv8 build target 2021-12-04 16:08:39 +00:00
0f698133aa Add an rpi4ub-armv8 build variant 2021-12-04 15:45:55 +00:00
8cea2cbfec Use my fork of the fork of tensorflow 2021-12-04 12:34:10 +00:00
Reuben Morais
dbd38c3a89
Merge pull request #2032 from coqui-ai/transcription-scripts-docs
[transcribe] Fix multiprocessing hangs, clean-up target collection, write docs
2021-12-03 16:46:48 +01:00
Reuben Morais
b43e710959 Docs for transcription with training package 2021-12-03 16:22:43 +01:00
Reuben Morais
ff24a8b917 Undo late-imports 2021-12-03 16:22:43 +01:00
Reuben Morais
479d963155 Set training pkg python-requires<3.8 (due to TF 1.15.4 limit) 2021-12-03 16:22:34 +01:00
Reuben Morais
d90bb60506 [transcribe] Fix multiprocessing hangs, clean-up target collection 2021-12-01 15:44:25 +01:00
Reuben Morais
5cefd7069c Use known paths for Scorer and Alphabet copy in export 2021-11-23 14:21:11 +01:00
Reuben Morais
154a67fb2c
Merge pull request #2026 from coqui-ai/save-scorer-alphabet-savedmodel
Save Scorer and alphabet with SavedModel exports
2021-11-19 20:07:32 +01:00
Reuben Morais
d6456ae4aa Save Scorer and alphabet with SavedModel exports 2021-11-19 19:40:00 +01:00
Reuben Morais
3020949075
Merge pull request #2025 from coqui-ai/various-fixes
Docs fixes, SavedModel export, transcribe.py revival
2021-11-19 16:10:20 +01:00
Reuben Morais
efdaa61e2c Revive transcribe.py
Update to use Coqpit based config handling, fix multiprocesing setup, and add CI coverage.
2021-11-19 13:57:44 +01:00
Reuben Morais
419b15b72a Allow exporting as SavedModel 2021-11-18 13:48:52 +01:00
Reuben Morais
6a9bd1e6b6 Add usage instructions for C API 2021-11-17 14:07:22 +01:00
Reuben Morais
922d668155
Merge pull request #2024 from juliandarley/fix-shlex-typo
Fix typo in client.py - shlex in line 17
2021-11-17 13:05:30 +01:00
Julian Darley
8ed0a827de Fix typo in client.py - shlex in line 17 2021-11-17 08:33:32 +00:00
Reuben Morais
11c2edb068
Merge pull request #2018 from coqui-ai/node-electron-version-bump
Update NodeJS and ElectronJS build/test versions to supported releases
2021-11-12 22:47:41 +01:00
Reuben Morais
e7c28ca3c9 Remove outdated comment in supported platforms doc [skip ci] 2021-11-12 22:47:15 +01:00
Reuben Morais
2af6f8da89 Explicitly name TF build cache destination file
GitHub's API has stopped sending the artifact name as the file name, so we ended up with a file matching the artifact ID.
Name the full file path explicitly so there's no room for changes.
2021-11-12 22:47:02 +01:00
Reuben Morais
a5c981bb48 Update NodeJS and ElectronJS build/test versions to supported releases 2021-11-12 22:47:02 +01:00
Reuben Morais
23af8bd095 Bump version to v1.1.0-alpha.1 2021-10-31 21:42:00 +01:00
Reuben Morais
2b955fc70f
Merge pull request #2004 from coqui-ai/flashlight-docs
Improve decoder docs and include in RTD
2021-10-31 21:40:04 +01:00
Reuben Morais
90feb63894 Improve decoder package docs and include in RTD 2021-10-31 20:12:37 +01:00
Reuben Morais
91f1307de4 Pin docutils version as 0.18 release breaks build
Build breaks when writing output for AUGMENTATION.rst with error:

AttributeError: 'Values' object has no attribute 'section_self_link'
2021-10-31 16:46:03 +01:00
Reuben Morais
3d1e3ed3ba Don't include RELEASE_NOTES for pre-releases [skip ci] 2021-10-30 17:31:04 +02:00
Reuben Morais
9a2c2028c7 Bump version to v1.1.0-alpha.0 2021-10-30 17:24:05 +02:00
Reuben Morais
6ef733be54
Merge pull request #2001 from coqui-ai/decoder-flashlight
Expose Flashlight LexiconDecoder/LexiconFreeDecoder in decoder package
2021-10-30 17:19:41 +02:00
Reuben Morais
a61180aeae Fix Flashlight multiplatform build 2021-10-30 16:23:44 +02:00
Reuben Morais
391036643c debug 2021-10-30 16:23:44 +02:00
Reuben Morais
04f62ac9f7 Exercise training graph inference/Flashlight decoder in extra training tests 2021-10-30 14:59:32 +02:00
Reuben Morais
755fb81a62 Expose Flashlight LexiconDecoder/LexiconFreeDecoder 2021-10-30 14:59:32 +02:00
Reuben Morais
5f2ff85fe8
Merge pull request #1977 from Legion2/patch-1
fixed duplicate deallocation of stream in Swift STTStream
2021-10-30 10:19:17 +02:00
Reuben Morais
489e49f698
Merge pull request #1990 from JRMeyer/evaluate_tflite
Update evaluate_tflite.py script for Coqpit
2021-10-30 10:18:56 +02:00
Reuben Morais
65e66117e2
Merge pull request #1998 from coqui-ai/aar-pack-deps
Package dynamic deps in AAR
2021-10-29 20:48:43 +02:00
Reuben Morais
a726351341 Bump Windows TF build cache due to worker upgrade 2021-10-29 20:05:22 +02:00
Reuben Morais
d753431d11 Fix build on Windows after internal GitHub Actions MSYS2 changes 2021-10-29 20:05:22 +02:00
Reuben Morais
83b40b2532 Rehost PCRE package to avoid external outages interrupting CI 2021-10-25 11:03:19 +02:00
Reuben Morais
1f7b43f94e Package libkenlm.so, libtensorflowlite.so and libtflitedelegates.so in AAR 2021-10-25 11:03:19 +02:00
Reuben Morais
5ff8d11393 Use export beam width by default in evaluation 2021-10-13 13:36:30 +02:00
Josh Meyer
157ce340b6 Update evaluate_tflite.py script for Coqpit 2021-10-07 14:46:03 -04:00
Reuben Morais
27584037f8 Bump version to v1.0.0 2021-10-04 16:30:39 +02:00
Reuben Morais
29e980473f Docs changes for 1.0.0 2021-10-04 16:30:39 +02:00
Reuben Morais
0b36745338 Bump version to v0.10.0-alpha.29 2021-10-02 14:18:39 +02:00
Reuben Morais
c6a91dad2a Fix permissions for Docker push, tagging of prereleases 2021-10-02 14:18:26 +02:00
Reuben Morais
1233fc7b71 Bump version to v0.10.0-alpha.28 2021-10-02 13:47:04 +02:00
Reuben Morais
bd45ecf56e Centralized handling of git tag/VERSION checks 2021-10-02 13:46:52 +02:00
Reuben Morais
a4faa4475a Bump version to v0.10.0-alpha.27 2021-10-02 13:40:28 +02:00
Reuben Morais
18812376dc
Merge pull request #1981 from coqui-ai/aar-publish
AAR build+publish
2021-10-02 13:39:28 +02:00
Reuben Morais
62effd9acb AAR build+publish 2021-10-02 13:38:46 +02:00
Reuben Morais
8a64ed2a1e Bump version to v0.10.0-alpha.26 2021-09-28 13:58:00 +02:00
Reuben Morais
178cdacf5e
Merge pull request #1980 from coqui-ai/docker-kenlm-base
Build KenLM in same base as final image
2021-09-28 13:57:17 +02:00
Reuben Morais
0b60e4dbbb Build KenLM in same base as final img 2021-09-28 13:38:11 +02:00
Leon Kiefer
fab1bbad73
fixed duplicate deallocation of stream
streamCtx must be unset after STT_FreeStream is called in STT_FinishStreamWithMetadata; otherwise STT_FreeStream is called again on destruction of the STTStream, resulting in EXC_BAD_ACCESS errors
2021-09-26 12:56:28 +02:00
Reuben Morais
5691d4e053
Merge pull request #1975 from coqui-ai/android-builds
Android builds
2021-09-22 13:09:26 +02:00
Reuben Morais
c536d1bd01 Add Android build tasks 2021-09-22 11:55:19 +02:00
Reuben Morais
d4091badf9 Rename host-build action to libstt-build 2021-09-21 18:09:51 +02:00
Reuben Morais
1d75af5ab4 Fix and improve build instructions for Android and RPi 2021-09-21 18:09:51 +02:00
Reuben Morais
8bd5dac837 Declare delegate dependencies on Android 2021-09-21 18:09:51 +02:00
Josh Meyer
46bae2f3fc
Fix typo in link to colab 2021-09-16 06:59:45 -04:00
Josh Meyer
df67678220
Merge pull request #1966 from JRMeyer/cv-notebook
Python notebook for training on Common Voice
2021-09-16 06:51:30 -04:00
Reuben Morais
fd719ac013
Merge pull request #1968 from coqui-ai/rehost-sox-win
Rehost SoX Windows package to avoid Sourceforge outages
2021-09-16 11:58:17 +02:00
Reuben Morais
4861557a03 Rehost SoX Windows package to avoid Sourceforge outages 2021-09-16 11:22:32 +02:00
Reuben Morais
835d657648
Merge pull request #1967 from coqui-ai/batch-shuffling
Add support for shuffling batches after N epochs (Fixes #1901)
2021-09-16 11:05:06 +02:00
Reuben Morais
72599be9d4 Add support for shuffling batches after N epochs 2021-09-16 10:40:27 +02:00
Josh Meyer
7cbe879fc6 Use python 3.7, not 3.8 2021-09-16 03:36:32 -04:00
Josh Meyer
8cfc1163e2 Add checkout action 2021-09-16 03:28:11 -04:00
Josh Meyer
c78f98a7bc Add separate job to CI for notebook tests 2021-09-16 03:19:09 -04:00
Josh Meyer
90d4e43c58
Use sudo for installing opus things 2021-09-15 12:22:05 -04:00
Josh Meyer
be7500c8b7
Fix Typo 2021-09-15 12:04:34 -04:00
Josh Meyer
1a55ce8078
Add missing opus tools to CI 2021-09-15 12:03:30 -04:00
Josh Meyer
242d2eff2c
Add missing jupyter install in CI 2021-09-15 10:09:05 -04:00
Josh Meyer
56d1282642 Merge branch 'cv-notebook' of github.com:JRMeyer/STT into cv-notebook 2021-09-15 09:20:16 -04:00
Josh Meyer
bd7809421d Add notebooks to CI workflow 2021-09-15 09:19:46 -04:00
Josh Meyer
5e1e810102
Merge branch 'coqui-ai:main' into cv-notebook 2021-09-15 09:09:00 -04:00
Josh Meyer
6405bd1758 Add CI tests for notebooks 2021-09-15 09:08:12 -04:00
Josh Meyer
8a3cea8b6d Cosmetic changes 2021-09-15 07:57:53 -04:00
Josh Meyer
f6a64e7dd8 Typo 2021-09-15 07:19:35 -04:00
Josh Meyer
cbd3db9d28 Cosmetic notebook changes 2021-09-15 07:16:08 -04:00
Josh Meyer
2729da33a8 More notebook work 2021-09-15 06:54:25 -04:00
Reuben Morais
feeb2a222d Bump version to v0.10.0-alpha.25 2021-09-15 11:09:54 +02:00
Reuben Morais
76267ebdff Rename libstt and native_client archives when publishing on GitHub Actions 2021-09-15 11:08:20 +02:00
Josh Meyer
0e8920ed63
Use table to organize notebooks 2021-09-15 04:09:00 -04:00
Josh Meyer
903c2b4aca
Install STT from pypi in notebook 2021-09-15 04:03:27 -04:00
Josh Meyer
5201c2a10c
Install STT from pypi in notebook 2021-09-15 04:02:08 -04:00
Josh Meyer
7085fd3ed3 Add notebook for CV 2021-09-14 11:50:35 -04:00
Reuben Morais
ef8825f5f6 Bump version to v0.10.0-alpha.24 2021-09-14 13:06:22 +02:00
Reuben Morais
4744d0c9e4 Separate brace expansion into two upload calls 2021-09-14 13:05:51 +02:00
Reuben Morais
93e743d171 Bump version to v0.10.0-alpha.23 2021-09-14 12:47:36 +02:00
Reuben Morais
e0e5b0391c Don't overwrite asset_name for multiple files 2021-09-14 12:47:25 +02:00
Reuben Morais
7a20c9bd90 Bump version to v0.10.0-alpha.22 2021-09-14 12:34:10 +02:00
Reuben Morais
473d1a8e4f Fix filename when uploading multiple assets, upload training package 2021-09-14 12:33:54 +02:00
Reuben Morais
92aff6a8ef Bump version to v0.10.0-alpha.21 2021-09-14 12:22:10 +02:00
Reuben Morais
220cc8ab15 Fix escaping of variable when creating new release 2021-09-14 12:21:56 +02:00
Reuben Morais
39e57b522a Bump version to v0.10.0-alpha.20 2021-09-14 12:15:29 +02:00
Reuben Morais
2e588bd0b8 Fix GitHub upload logic for multiple assets 2021-09-14 12:15:19 +02:00
Reuben Morais
abc0399fdb Bump version to v0.10.0-alpha.19 2021-09-14 11:47:34 +02:00
Reuben Morais
26b578c1c7 Checkout source for upload-release-asset action 2021-09-14 11:47:10 +02:00
Reuben Morais
6f4a3c1200 Bump version to v0.10.0-alpha.18 2021-09-14 09:48:58 +02:00
Reuben Morais
e8a5e91151 Fix syntax errors in tag scripts 2021-09-14 09:41:35 +02:00
Reuben Morais
810164d679 Bump version to v0.10.0-alpha.16 2021-09-13 18:35:03 +02:00
Reuben Morais
aed43cc988 Fix syntax error in GitHub Release asset upload task 2021-09-13 18:34:48 +02:00
Reuben Morais
d437ecc69f Bump version to v0.10.0-alpha.15 2021-09-13 17:53:35 +02:00
Josh Meyer
ba581501f4
Merge pull request #1965 from JRMeyer/notebooks
Fix notebook syntax after train.py was split into parts
2021-09-13 09:31:14 -04:00
Josh Meyer
638874e925 Fix notebook syntax after train.py was split into parts 2021-09-13 09:29:55 -04:00
Reuben Morais
b5e8ebb943 Fix quickstart docs [skip ci] 2021-09-13 13:08:14 +02:00
Reuben Morais
822019bf05
Merge pull request #1961 from coqui-ai/upload-wheels-release
Upload built artifacts to GitHub releases
2021-09-09 18:50:39 +02:00
Reuben Morais
01c992caef Upload built artifacts to GitHub releases 2021-09-09 18:13:18 +02:00
Reuben Morais
97a2cb21ee
Merge pull request #1960 from coqui-ai/fix-dockerfile-build
Fix Dockerfile.build build after TFLite changes
2021-09-08 12:17:01 +02:00
Reuben Morais
e6d5a0ca8d Fix linter error [skip ci] 2021-09-08 12:16:25 +02:00
Reuben Morais
738874fb6f Fix Dockerfile.build build after TFLite changes 2021-09-08 12:00:11 +02:00
Reuben Morais
28f107fb96
Merge pull request #1956 from jeremiahrose/build-local-source
Fix #1955 Use local source instead of redownloading in Dockerfile.build
2021-09-08 11:11:39 +02:00
Jeremiah Rose
0917206827 Update Dockerfile.build documentation in DEPLOYMENT.rst 2021-09-08 10:20:24 +10:00
Reuben Morais
909b343ce0 Fix header logo scaling 2021-09-07 22:03:09 +02:00
Reuben Morais
a51cc78a3b git add missing logo image [skip ci] 2021-09-07 18:30:48 +02:00
Reuben Morais
083a9e1ecc Add logo and wordmark to docs [skip ci] 2021-09-07 18:28:52 +02:00
Reuben Morais
6635668eb3
Merge pull request #1951 from coqui-ai/docs-pass
Documentation cleanup pass to match recent changes
2021-09-07 10:15:46 +02:00
Jeremiah Rose
d85187aa44 Use local source instead of redownloading in Dockerfile.build 2021-09-07 09:45:31 +10:00
Reuben Morais
186bb63b57 Documentation cleanup pass to match recent changes 2021-08-27 14:24:23 +02:00
Reuben Morais
6214816e26 Merge branch 'publish-training-code' (Fixes #1950) 2021-08-27 13:12:14 +02:00
Reuben Morais
eb19d271fd Publish training package on PyPI 2021-08-27 13:11:27 +02:00
Reuben Morais
f94d16bcc3
Merge pull request #1948 from coqui-ai/remove-exception-box
Remove ExceptionBox and remember_exception
2021-08-26 21:13:18 +02:00
Reuben Morais
33c2190015 Remove ExceptionBox and remember_exception
TensorFlow already handles surfacing dataset exceptions internally.
2021-08-26 19:58:17 +02:00
Reuben Morais
497c828dd7
Merge pull request #1947 from coqui-ai/dataset-split
Automatic dataset split/alphabet generation
2021-08-26 19:45:06 +02:00
Reuben Morais
412de47623 Introduce --auto_input_dataset flag for input formatting
Automatically split data into sets and generate alphabet.
2021-08-26 18:03:32 +02:00
Reuben Morais
8458352255 Disable side-effects when importing train/evaluate scripts 2021-08-26 15:24:11 +02:00
Reuben Morais
b62fa678e6 Remove dead code 2021-08-26 12:00:15 +02:00
Reuben Morais
07ed417627 Bump version to v0.10.0-alpha.14 2021-08-26 10:57:27 +02:00
Reuben Morais
66b8a56454
Merge pull request #1945 from coqui-ai/alphabet-loading-generation
Convenience features for alphabet loading/saving/generation
2021-08-25 20:35:09 +02:00
Reuben Morais
02adea2d50 Generate and save alphabet automatically if dataset is fully specified 2021-08-25 19:39:05 +02:00
Reuben Morais
2b5a844c05 Load alphabet alongside checkpoint if present, some config fixes/cleanup 2021-08-25 19:39:03 +02:00
Reuben Morais
87f0a371b1 Serialize alphabet alongside checkpoint 2021-08-25 19:38:30 +02:00
Reuben Morais
5afe3c6e59
Merge pull request #1946 from coqui-ai/training-submodules
Split train.py into separate modules
2021-08-25 19:37:53 +02:00
Reuben Morais
2fd98de56f Split train.py into separate modules
Currently train.py is overloaded with many independent features.
Understanding the code and what the result of a training call
will be requires untangling the entire script. It's also an
error-prone UX. This is a first step toward separating
independent parts into their own scripts.
2021-08-25 18:57:30 +02:00
Reuben Morais
71da178138
Merge pull request #1942 from coqui-ai/nc-api-boundary
Python training API cleanup, mark nodes known by native client
2021-08-23 12:57:08 +02:00
Reuben Morais
3dff38ab3d Point to newer native_client build with lower glibc requirement [skip ci] 2021-08-20 16:35:25 +02:00
Reuben Morais
80a109b04e Pin Advanced Training Topics to docs sidebar [skip ci] (Fixes #1893) 2021-08-19 18:51:19 +02:00
Reuben Morais
fb2691ad70 Fix link to training with CV data in Playbook [skip ci] (Fixes #1932) 2021-08-19 18:44:58 +02:00
Reuben Morais
4c3537952a Fix lm_optimizer.py to use new Config/flags/logging setup 2021-08-19 18:42:07 +02:00
Reuben Morais
f9556d2236 Add comments marking nodes with names/shapes known by native client 2021-08-19 18:33:48 +02:00
Reuben Morais
f90408d3ab Move early_training_checks to train function 2021-08-19 18:33:32 +02:00
Reuben Morais
ad7335db0e Fix docs code listing for flags [skip ci] 2021-08-19 18:25:08 +02:00
Reuben Morais
392f4dbb25 Merge branch 'downgrade-docker-train-base' (Fixes #1941) 2021-08-19 18:22:28 +02:00
Reuben Morais
3995ec62c5 Bump Windows TF build cache due to upgraded MSVC 2021-08-19 18:22:18 +02:00
Reuben Morais
2936c72c08 Build and publish Docker train image on tag 2021-08-19 18:22:18 +02:00
Reuben Morais
32b44c5447 Downgrade training Dockerfile base image to one that has TFLite support
See https://github.com/NVIDIA/tensorflow/issues/16
2021-08-19 11:27:37 +02:00
Reuben Morais
4fc60bf5e9
Merge pull request #1938 from coqui-ai/non-quantized-export
Non quantized export + Better error message on missing alphabet
2021-08-12 11:29:21 +02:00
Reuben Morais
f71e32735f Add a more explicit error message when alphabet is not specified 2021-08-06 16:55:42 +02:00
Reuben Morais
3cff3dd0de Add an --export_quantize flag to control TFLite export quantization 2021-08-06 16:52:55 +02:00
Reuben Morais
285b524299
Merge pull request #1931 from coqui-ai/pull_request_template 2021-08-03 15:28:45 +02:00
Reuben Morais
c3cc7aae2e Bump version to v0.10.0-alpha.13 2021-08-02 21:08:00 +02:00
Reuben Morais
b5db9b2f41 Merge branch 'npm-publish' (Fixes #1930) 2021-08-02 21:07:28 +02:00
Reuben Morais
f3df9b16d5 Publish Node package on npmjs.com on tags 2021-08-02 20:42:12 +02:00
kdavis-coqui
d2bcbcc6b7 Added CLA info to pull request template 2021-08-02 17:45:19 +02:00
Reuben Morais
800ddae12f Bump version to v0.10.0-alpha.12 2021-08-01 23:56:46 +02:00
Reuben Morais
5a5db45c7e
Merge pull request #1923 from coqui-ai/tf-libstt-manylinux
Build TensorFlow+libstt+Python packages in manylinux_2_24 containers
2021-08-01 23:56:23 +02:00
Reuben Morais
9d44e2f506 Disable wrapping of struct ctors to workaround NodeJS 16.6 ABI break 2021-08-01 23:25:23 +02:00
Reuben Morais
1a423a4c8d Force plat name in favor of auditwheel for Python packages
Auditwheel can't properly handle the shared libraries we bundle
and ends up copying some of them, producing a package with
duplicated images.
2021-08-01 23:00:57 +02:00
Reuben Morais
8f0b759103 Build TensorFlow+libstt+Py pkgs on manylinux_2_24 2021-08-01 23:00:57 +02:00
Josh Meyer
90a067df49
Merge pull request #1926 from JRMeyer/progressbar-to-tqdm
Change progressbar to tqdm
2021-07-30 17:18:47 -04:00
Josh Meyer
b77d33a108
Merge pull request #1927 from JRMeyer/tfv1-moving
Move tfv1 calls inside high-level functions
2021-07-30 17:18:25 -04:00
Josh Meyer
256af35a61 Move tfv1 calls inside high-level functions 2021-07-30 13:25:36 -04:00
Josh Meyer
da23122cca Add SIMPLE_BAR for other scripts 2021-07-30 13:09:14 -04:00
Josh Meyer
fb2d99e9e0 Change progressbar to tqdm 2021-07-30 12:52:09 -04:00
Josh Meyer
df26eca4d2
Merge pull request #1920 from JRMeyer/transfer-learning-notebook
Fix config checkpoint handling and add notebook
2021-07-30 10:15:12 -04:00
Josh Meyer
1e79b8703d checkpoint_dir always overrides {save,load}_checkpoint_dir 2021-07-30 07:57:11 -04:00
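A minimal sketch of the override rule this commit describes, using the field names from the commit message; the surrounding Coqpit-style config class and its defaults are assumptions, not the project's actual code:

```python
from dataclasses import dataclass


@dataclass
class CheckpointConfig:
    # Field names taken from the commit message; everything else is illustrative.
    checkpoint_dir: str = ""
    save_checkpoint_dir: str = ""
    load_checkpoint_dir: str = ""

    def __post_init__(self):
        # checkpoint_dir, when set, always overrides both specific directories.
        if self.checkpoint_dir:
            self.save_checkpoint_dir = self.checkpoint_dir
            self.load_checkpoint_dir = self.checkpoint_dir
```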
Josh Meyer
979dccdf58 Fix typo 2021-07-30 06:48:33 -04:00
Josh Meyer
aeeb2549b1 Install after git clone 2021-07-30 06:48:33 -04:00
Josh Meyer
6c26f616ba Fix typo in notebooks 2021-07-30 06:48:33 -04:00
Josh Meyer
4eb5d7814a Fix typo in notebooks 2021-07-30 06:48:33 -04:00
Josh Meyer
bc1839baf4 Add cell to install STT in notebooks 2021-07-30 06:48:33 -04:00
Josh Meyer
4f14420a25 Fix broken colab button 2021-07-30 06:48:33 -04:00
Josh Meyer
f4f0c1dba9 Add README with Colab links 2021-07-30 06:48:33 -04:00
Josh Meyer
e95c8fe0b0 Fix typo in notebook 2021-07-30 06:48:33 -04:00
Josh Meyer
73f7b765ef Fix config issue and add notebook 2021-07-30 06:48:33 -04:00
Reuben Morais
8c5c35a0ad Bump version to v0.10.0-alpha.11 2021-07-29 17:45:16 +02:00
Reuben Morais
58a8e813e4
Merge pull request #1921 from coqui-ai/tflite-only
Remove full TF backend from native client and CI
2021-07-29 17:44:38 +02:00
Reuben Morais
d119957586 Update supported architectures doc 2021-07-29 16:45:30 +02:00
Reuben Morais
c69735e3b6 Build swig and decoder on manylinux_2_24 2021-07-29 16:45:30 +02:00
Reuben Morais
42ebbf9120 Remove full TF backend 2021-07-28 17:19:27 +02:00
Reuben Morais
2020f1b15a Undo accidental auth removal in check_artifact_exists action 2021-07-28 17:19:27 +02:00
Josh Meyer
4cd1a1cec4
Merge pull request #1919 from coqui-ai/JRMeyer-linter
Exclude smoke testing data from linter
2021-07-27 10:14:40 -04:00
Josh Meyer
c9840e59b1
Exclude smoke testing data from linter 2021-07-27 08:06:02 -04:00
Josh Meyer
4b4f00da56
Merge pull request #1918 from coqui-ai/JRMeyer-alphabet-patch
Add missing space to russian sample alphabet
2021-07-27 08:01:11 -04:00
Josh Meyer
1c7539e9c9
Add missing space to russian sample alphabet 2021-07-27 05:40:20 -04:00
Reuben Morais
4b2af9ce6b
Merge pull request #1909 from coqui-ai/kenlm-dynamic
Dynamically link KenLM and distribute with packages
2021-07-27 00:36:15 +02:00
Reuben Morais
3a695f9c1c Fix packaging and linking of libkenlm on Windows 2021-07-26 19:30:25 +02:00
Reuben Morais
36923c1e93 Enable -fexceptions in decoder builds 2021-07-26 12:50:53 +02:00
Reuben Morais
8a40ff086d Force link against KenLM on Windows 2021-07-26 12:43:49 +02:00
Reuben Morais
cbbdcbf246 Revert factoring out of decoder library build definition 2021-07-26 12:43:49 +02:00
Reuben Morais
7846f4602e Export needed KenLM symbols manually for Windows 2021-07-26 12:42:59 +02:00
Reuben Morais
b7428d114e Dynamically link KenLM and distribute with packages 2021-07-26 12:41:46 +02:00
Reuben Morais
579c36c98c
Merge pull request #1912 from JRMeyer/jupyter
Add Jupyter notebook and Dockerfile with Jupyter
2021-07-24 20:33:06 +02:00
Josh Meyer
3119911657 Next core Coqui STT docker image will have notebooks dir 2021-07-23 12:17:55 -04:00
Josh Meyer
7d40d5d686 Specify latest for base Coqui STT docker image 2021-07-23 12:16:26 -04:00
Josh Meyer
ea82ab4cb8 Remove old unneeded files 2021-07-23 12:15:12 -04:00
Josh Meyer
9f7fda14cb Add first Jupyter notebook 2021-07-23 12:12:02 -04:00
Josh Meyer
d1b0aadfbc
Merge pull request #1910 from JRMeyer/main
Bump VERSION to 0.10.0-alpha.10
2021-07-22 10:31:59 -04:00
Josh Meyer
9b5176321d Bump VERSION to 0.10.0-alpha.10 2021-07-22 10:30:14 -04:00
Josh Meyer
c19faeb5d0
Merge pull request #1908 from JRMeyer/config-logic 2021-07-22 08:15:14 -04:00
Josh Meyer
b4827fa462 Formatting changes from pre-commit 2021-07-22 05:39:45 -04:00
Josh Meyer
b6d40a3451 Add CI test for in-script variable setting 2021-07-21 15:35:33 -04:00
Josh Meyer
414748f1fe Remove extra imports 2021-07-21 15:23:56 -04:00
Josh Meyer
ec37b3324a Add example python script with initialize_globals_from_args() 2021-07-21 15:13:39 -04:00
Josh Meyer
ae9280ef1a Cleaner lines for CI args 2021-07-21 12:09:43 -04:00
Josh Meyer
a050b076cb Cleaner lines for CI args 2021-07-21 12:07:24 -04:00
Josh Meyer
3438dd2beb Fix checkpoint setting logic 2021-07-21 11:53:11 -04:00
Josh Meyer
90ce16fa15 Shortening some print statements 2021-07-21 09:37:32 -04:00
Josh Meyer
6da7b5fc26 Raise error when alphabet and bytes_mode both specified 2021-07-21 08:44:57 -04:00
Josh Meyer
0389560a92 Remove alphabet.txt from CI tests with bytes_output_mode 2021-07-21 07:16:58 -04:00
Josh Meyer
4342906c50 Better file_size handling 2021-07-21 06:37:25 -04:00
Josh Meyer
f6bd7bcf7d Handle file_size passed as int 2021-07-21 06:23:19 -04:00
Josh Meyer
4dc565beca Move checking logic into __post_init__() 2021-07-21 05:00:05 -04:00
Josh Meyer
5b4fa27467 Add required alphabet path to CI tests 2021-07-20 09:50:47 -04:00
Josh Meyer
920e92d68a Remove check_values and default alphabet 2021-07-20 09:34:44 -04:00
Josh Meyer
59e32556a4 Currently working notebook 2021-07-20 09:07:54 -04:00
Josh Meyer
848a612efe Import _SttConfig 2021-07-20 08:41:22 -04:00
Josh Meyer
afbcc01369 Break out config instantiation and setting 2021-07-20 08:13:07 -04:00
Josh Meyer
a37ca2ec27 Simplify dockerfile and add notebook 2021-07-20 04:20:57 -04:00
Josh Meyer
d0f8eb96cd Take out OVH run-time params 2021-07-16 11:52:51 -04:00
Josh Meyer
649bc53536 Remove extra installs from Dockerfile 2021-07-16 11:14:00 -04:00
Josh Meyer
ef5d472b29
Merge pull request #1894 from JRMeyer/dockerfile
Use multi-stage build process for Dockerfile.train
2021-07-16 10:05:33 -04:00
Josh Meyer
2c26497d96 Cleanup for smaller containers 2021-07-16 09:32:19 -04:00
Josh Meyer
f062f75e17 working on dockerfile with jupyter support 2021-07-16 09:24:00 -04:00
Reuben Morais
ba24f010eb Bump version to v0.10.0-alpha.9 2021-07-15 20:56:48 +02:00
Reuben Morais
b7380a6928 Comment out publishing of armv7 and aarch64 wheels 2021-07-15 20:56:28 +02:00
Reuben Morais
b12aa69922 Bump version to v0.10.0-alpha.8 2021-07-15 18:32:17 +02:00
Reuben Morais
5ded871a5e Separate PyPI publish jobs per API token and publish decoder package 2021-07-15 18:31:45 +02:00
Reuben Morais
550f5368a8
Merge pull request #1902 from coqui-ai/upload-tf-cache-to-release
Save build cache as release asset instead of artifact
2021-07-15 18:20:20 +02:00
Reuben Morais
460209d209 Pick up MSVC version automatically to handle worker upgrades cleanly 2021-07-15 16:27:37 +02:00
Josh Meyer
99fa146253 Fix wording in error message 2021-07-15 09:54:54 -04:00
Josh Meyer
52ecb5dbe2 Remove unneeded assets and only copy kenlm bins 2021-07-15 09:53:04 -04:00
Reuben Morais
d6da5191f5
Merge pull request #1903 from JRMeyer/minor-branding 2021-07-15 12:30:45 +02:00
Josh Meyer
a29db11a62 Change DS to STT in top python scripts 2021-07-15 06:26:06 -04:00
Reuben Morais
f026c75dae Setup MSVC env in Win TF build job 2021-07-15 11:06:44 +02:00
Reuben Morais
ed171b2efd Save build cache as release asset instead of artifact 2021-07-15 10:10:24 +02:00
Reuben Morais
2d9cbb2f06 Fix PYTHON_BIN_PATH in Windows build 2021-07-14 14:13:15 +02:00
Reuben Morais
283379775e Bump version to v0.10.0-alpha.7 2021-07-13 18:40:12 +02:00
Reuben Morais
432ca99db1 Add job to publish Python artifacts to PyPI on tag pushes 2021-07-13 18:40:12 +02:00
Reuben Morais
f1c0559406 Exclude a few more jobs from non-PR triggers 2021-07-13 18:25:58 +02:00
Reuben Morais
7b7f52f44c Enable build-and-test workflow on tags 2021-07-13 18:25:58 +02:00
Josh Meyer
8c65cbf064
Add missing Sox library for processing MP3 data 2021-06-14 16:09:26 -04:00
Josh Meyer
0385dfb5aa Fix broken multi-line error message 2021-06-14 10:32:38 -04:00
Josh Meyer
d3b337af09 Fixed error print statement 2021-06-14 07:20:34 -04:00
Josh Meyer
75fbd0ca30 Error message when KenLM build fails 2021-06-14 05:04:01 -04:00
Josh Meyer
769b310919 Use multistage building in dockerfile 2021-06-11 14:36:23 -04:00
Josh Meyer
6f2c7a8a7b
Merge pull request #1892 from JRMeyer/update-kenlm
Update kenlm submodule
2021-06-11 08:14:27 -04:00
Josh Meyer
806a16d1c0 Update kenlm submodule 2021-06-11 08:09:53 -04:00
Reuben Morais
866e15af7f Comment out isort pre-commit hook until we can figure out discrepancies between macOS and Linux 2021-06-10 16:57:57 +02:00
Reuben Morais
f2a21b2258
Merge pull request #1890 from JRMeyer/pre-commit-hook-changes
Add changes from pre-commit hook
2021-06-10 16:56:45 +02:00
Josh Meyer
9252cef138 Add changes from pre-commit hook 2021-06-10 10:49:54 -04:00
Reuben Morais
2e5efe5e15
Merge pull request #1889 from JRMeyer/dockerfile
Use NVIDIA image in Dockerfile
2021-06-10 16:01:52 +02:00
Josh Meyer
38e06e4635 Use NVIDIA image in Dockerfile 2021-06-10 09:47:15 -04:00
Reuben Morais
bf07f35420
Merge pull request #1887 from JRMeyer/dockerfile
Add dependencies for opus training
2021-06-10 00:00:21 +02:00
Josh Meyer
eba8e1ad4a Add dependencies for opus training 2021-06-09 14:02:07 -04:00
Reuben Morais
4ebcbea8b3
Merge pull request #1874 from erksch/ios-deployment-target-9.0
Set iOS deployment target to 9.0
2021-05-25 16:31:16 +02:00
Reuben Morais
a2515397cf
Merge pull request #1876 from erksch/rename-ios-framework
Change static ios framework name to stt_ios from coqui_stt_ios
2021-05-25 16:30:43 +02:00
Reuben Morais
0a38b72e34
Merge pull request #1877 from erksch/remove-libstt-from-ios-test-project
Remove libstt.so reference from stt_ios_test project
2021-05-25 16:30:13 +02:00
Erik Ziegler
d69c15db1a Remove libstt.so reference from stt_ios_test project 2021-05-24 17:29:36 +02:00
Erik Ziegler
ced136c657 Change static ios framework name to stt_ios from coqui_stt_ios 2021-05-24 17:22:55 +02:00
Erik Ziegler
b2fee574d8
Set iOS deployment target to 9.0 2021-05-23 20:59:09 +02:00
Reuben Morais
1bf8058379
Merge pull request #1871 from coqui-ai/training-tests
Training tests
2021-05-21 14:22:12 +02:00
Reuben Morais
f9ecf8370e Training unittests and lint check 2021-05-21 13:17:05 +02:00
Reuben Morais
3f17bba229 Training tests 2021-05-21 13:17:05 +02:00
Reuben Morais
5ba1e4d969 Remove broken TrainingSpeech importer temporarily
During the fork the archive URL was broken and nobody has mentioned it since.
Additionally the dependency on Unidecode (GPL) complicates licensing.

Removing it for now until both points are fixed.
2021-05-20 17:02:39 +02:00
Reuben Morais
debd1d9495
Merge pull request #1866 from coqui-ai/coqpit-config
Switch flag/config handling to Coqpit
2021-05-20 14:38:42 +02:00
Reuben Morais
eab6d3f5d9 Break dependency cycle between augmentation and config 2021-05-19 20:19:36 +02:00
Reuben Morais
d83630fef4 Print fully parsed augmentation config 2021-05-19 20:19:36 +02:00
Reuben Morais
5114362f6d Fix regression caused by PR #1868 2021-05-19 20:19:36 +02:00
Reuben Morais
5ad6e6abbf Switch flag/config handling to Coqpit 2021-05-19 20:19:36 +02:00
Reuben Morais
fb826f714d
Merge pull request #1828 from coqui-ai/global-cleanup
Run pre-commit hooks on all files
2021-05-18 13:47:20 +02:00
Reuben Morais
43a6c3e62a Run pre-commit hooks on all files 2021-05-18 13:45:52 +02:00
Reuben Morais
14aee5d35b Reintroduce excludes to pre-commit-hook 2021-05-18 13:45:09 +02:00
Josh Meyer
ac2bbd6a79
Merge pull request #1868 from JRMeyer/data-augmentation-cleaning
Add logging and clean up some augmentation code
2021-05-18 07:05:12 -04:00
Josh Meyer
7bec52c55d More compact return statement 2021-05-18 06:47:32 -04:00
Josh Meyer
9a708328e7 Use class name and review cleanup 2021-05-18 06:12:56 -04:00
Reuben Morais
d2c5f979ce Update pre-commit setup 2021-05-18 11:46:53 +02:00
Josh Meyer
f19ecbdd93 Add logging for augmentation and more transparent syntax 2021-05-17 10:05:49 -04:00
Josh Meyer
b793aa53bb
Merge pull request #1867 from ftyers/patch-2
Fix typo in augmentations.py
2021-05-13 11:36:55 -04:00
Francis Tyers
37cc7f2312
Update augmentations.py
Looks like `clock_to` got changed to `final_clock` but this one was missed.
2021-05-13 16:11:46 +01:00
Reuben Morais
8d62a6e154 Add some clarifying comments on building SWIG from source 2021-05-05 17:04:10 +02:00
Reuben Morais
b78894d7ab
Merge pull request #1864 from NanoNabla/pr_docu_ppc64_swig
Stop pointing people to a fork for docs on building SWIG
2021-05-05 16:59:08 +02:00
NanoNabla
af35faf67e fixes docu, use official swig release on ppc64le 2021-05-05 16:46:50 +02:00
Reuben Morais
397c351fa7
Merge pull request #1863 from IlnarSelimcan/patch-1
[docs playbook] fix a typo in TRAINING.md
2021-05-04 22:47:41 +02:00
Ilnar Salimzianov
c235841871
[docs playbook] fix a typo in TRAINING.md 2021-05-04 22:50:22 +03:00
Reuben Morais
ce71ec0c89 Include missing changes in MLS English importer 2021-05-04 19:06:18 +02:00
Reuben Morais
ad4025af7d Merge branch 'build-decoder-push' (Closes #1860) 2021-05-03 16:59:19 +02:00
Reuben Morais
3dcd56145c Expand build matrix of decoder package, build on push 2021-05-03 15:00:17 +02:00
Reuben Morais
1f3d2dab4c Bump version to v0.10.0-alpha.6 2021-04-30 14:01:16 +02:00
Reuben Morais
3d2ab809ee
Merge pull request #1858 from coqui-ai/ci-updates
CI: Add ARM build and tests / Add NodeJS 16.0.0
2021-04-30 13:59:54 +02:00
Alexandre Lissy
a4d5d14304 Add NodeJS 16.0.0 2021-04-30 12:14:18 +02:00
Alexandre Lissy
1eec25a9ab CI: Linux ARMv7 / Aarch64 2021-04-30 11:05:05 +02:00
Reuben Morais
f147c78a97
Merge pull request #1856 from coqui-ai/decoder-rename
Rename decoder package to coqui_stt_ctcdecoder
2021-04-28 12:56:54 +02:00
Reuben Morais
36e0223c07 Bump version to v0.10.0-alpha.5 2021-04-27 19:46:27 +02:00
Reuben Morais
c952ee0b0d Rename decoder package to coqui_stt_ctcdecoder 2021-04-27 19:46:12 +02:00
Reuben Morais
e5aff105d4
Merge pull request #1855 from coqui-ai/linux-ci
Linux CI base
2021-04-27 19:21:51 +02:00
Reuben Morais
5ddd7e0fa2 Try to reduce API calls in check_artifact_exists 2021-04-27 14:58:14 +02:00
Reuben Morais
93128cae5f Address review comments 2021-04-27 14:58:14 +02:00
Reuben Morais
01b5a79c5c Linux CI scripts fixes 2021-04-27 14:58:14 +02:00
Reuben Morais
3f85c1d8da Linux base CI 2021-04-27 14:58:14 +02:00
Reuben Morais
a0914d8915 Improve job name 2021-04-27 14:53:07 +02:00
Reuben Morais
46dab53e11 Remove unused SWIG native build job 2021-04-27 14:52:36 +02:00
Reuben Morais
b542e9e469 Update Windows SWIG build job to use caching 2021-04-27 14:44:50 +02:00
Reuben Morais
9639a27929 Run build/test workflow on pushes to main, not master 2021-04-19 14:37:27 +02:00
Reuben Morais
9c7003d77d
Merge pull request #1843 from coqui-ai/windows-ci
Windows base CI
2021-04-19 12:44:12 +02:00
Reuben Morais
59297e526c Windows base CI 2021-04-19 10:48:14 +02:00
Alexandre Lissy
df8d17fc4e Ensure proper termination for ElectronJS and NodeJS 2021-04-18 17:03:16 +02:00
Alexandre Lissy
5558f55701 Use caching for node_modules and headers 2021-04-18 17:03:00 +02:00
Alexandre Lissy
b0c38d5aa9 Remove references to TaskCluster from ci_scripts/ 2021-04-18 17:01:47 +02:00
Alexandre Lissy
d45149b02e NodeJS repackaging 2021-04-18 16:54:00 +02:00
Reuben Morais
5d4941067f Add explicit attribution, description of changes and link to original in playbook 2021-04-15 09:10:38 +02:00
Reuben Morais
09b04a8f83
Merge pull request #1831 from coqui-ai/windows-ci
Windows CI
2021-04-12 14:56:01 +02:00
Reuben Morais
ec271453c1 Ensure upstream Python is used 2021-04-12 14:10:28 +02:00
Reuben Morais
8fe4eb8357 CI rebrand pass 2021-04-12 13:24:54 +02:00
Reuben Morais
7855f0a563 Base Windows CI setup 2021-04-12 12:54:07 +02:00
Reuben Morais
7d017df80c Bump pygments from 2.6.1 to 2.7.4 in doc/requirements.txt 2021-04-12 12:52:59 +02:00
Alexandre Lissy
f5369c8f4b Remove code refs to TaskCluster 2021-04-12 12:51:55 +02:00
Reuben Morais
5b3119ad3f Remove CircleCI setup 2021-04-12 12:44:32 +02:00
Alexandre Lissy
54f232c51a Reduce non multiplatform NodeJS/ElectronJS package tests matrices 2021-04-12 12:43:50 +02:00
Alexandre Lissy
3d96b1d4fd Fix #3549 2021-04-10 18:31:20 +02:00
Kenneth Heafield
b6b8160810 MSVC doesn't like const Proxy operator*() const.
Fixes #308
2021-04-10 18:31:13 +02:00
Alexandre Lissy
c4a4ca2bf8 Fix #3586: NumPy versions 2021-04-10 16:13:53 +02:00
Alexandre Lissy
ef31be2e32 Fix #3593: Limit tests to PR 2021-04-10 16:13:48 +02:00
Alexandre Lissy
3e66adba01 Fix #3578: Re-instate Python TF/TFLite tests on GitHub Actions / macOS 2021-04-10 16:13:42 +02:00
Alexandre Lissy
7168e83ac0 Fix #3590: Move training to macOS 2021-04-10 16:13:33 +02:00
CatalinVoss
51fd6170fa Fix documentation for check_characters.py script 2021-04-10 16:10:57 +02:00
Reuben Morais
449a723bf6 Add missing imports for sample rate normalization 2021-04-06 12:45:32 +02:00
Reuben Morais
39627b282c
Merge pull request #1827 from coqui-ai/reuben-issue-templates-1
Update issue templates
2021-04-06 12:34:54 +02:00
Reuben Morais
5933634db5 Update issue templates 2021-04-06 12:34:25 +02:00
Reuben Morais
4696c3dd0a Create PR template and update issue template 2021-04-06 12:08:40 +02:00
Reuben Morais
4d764c0559 Add importer for English subset of Multilingual LibriSpeech 2021-04-06 11:59:41 +02:00
Reuben Morais
4b9b0743a8 Replace cardboardlinter + pylint setup with pre-commit + black 2021-04-06 11:58:58 +02:00
Reuben Morais
8cdaa18533 Normalize sample rate of dev/test sets to avoid feature computation errors 2021-04-06 11:41:12 +02:00
Reuben Morais
c78af058a5 Merge branch 'playbook-into-docs' 2021-03-30 19:39:27 +02:00
Reuben Morais
c0d068702e Commit non git add'ed dockerignore files 2021-03-30 19:39:10 +02:00
Reuben Morais
0bd653a975 Merge STT playbook into docs 2021-03-30 19:38:31 +02:00
Reuben Morais
a5c950e334 Fix .readthedocs.yml to point at the correct docs requirements file 2021-03-30 18:49:31 +02:00
Josh Meyer
ce0dacd3d2
Add link to generate_scorer_package releases 2021-03-30 10:58:20 -04:00
Reuben Morais
91c5f90f3c Merge pull request #1821 from JRMeyer/docs 2021-03-29 21:34:29 +02:00
Reuben Morais
3409fde4a0 Rename model export metadata flags 2021-03-29 21:06:38 +02:00
Reuben Morais
214a150c19 Fixes for Dockerfile.{train,build} and adjust instructions for new image 2021-03-29 21:05:49 +02:00
Reuben Morais
1029d06a23 Reinstate manylinux1 hack on Python package build 2021-03-29 19:24:11 +02:00
Reuben Morais
c95b89f3c5 Remove dummy workflow 2021-03-27 11:26:42 +01:00
Alexandre Lissy
719050f204 Fix #3581: GitHub Actions test model 2021-03-27 11:24:18 +01:00
Alexandre Lissy
63aeb6a945 Introduce ci_scripts/ for GitHub Actions 2021-03-27 11:24:11 +01:00
Alexandre Lissy
cd80708546 GitHub Actions for macOS 2021-03-27 11:23:59 +01:00
Kathy Reid
654a83a294 Replace remove_remote() method with remove method
Partially resolves #3569
2021-03-27 11:16:53 +01:00
CatalinVoss
c152be2343 Handle mono conversion within pcm_to_np() 2021-03-27 11:16:35 +01:00
CatalinVoss
be5f9627da Don't throw on mono audio any more since everything should work? 2021-03-27 11:16:27 +01:00
CatalinVoss
900a01305c Expose some internal layers for downstream applications 2021-03-27 11:16:19 +01:00
Josh Meyer
653ce25a7c
Merge pull request #1807 from JRMeyer/docs
Overhaul the language model docs + include in ToC
2021-03-24 11:58:49 -04:00
Josh Meyer
04451a681c Overhaul the language model docs + include in ToC 2021-03-24 11:34:28 -04:00
Josh Meyer
cb75dcb419
Merge pull request #1808 from JRMeyer/docs-building
building docs minor changes
2021-03-24 11:10:49 -04:00
Josh Meyer
d5e000427f Reword docs for building binaries + include in ToC 2021-03-24 11:03:03 -04:00
Reuben Morais
b5f72ca4cb
Remove missing feature from list
Raised in https://github.com/coqui-ai/STT/discussions/1814
2021-03-24 10:41:44 +01:00
Eren Gölge
116029aafe
Update README.rst (#1796)
* Update README.rst

* Update README.rst

* Update README.rst

fixes
2021-03-21 14:06:00 +01:00
Reuben Morais
6c9f3a52dc Add empty workflow file to main branch 2021-03-19 13:36:43 +01:00
Reuben Morais
b4e8802aff Switch doc theme to Furo 2021-03-19 10:25:55 +01:00
Reuben Morais
6b9de13ad1 Adjust name of Python package in build system 2021-03-19 10:25:55 +01:00
Josh Meyer
629706b262
Docs welcome page and Development / Inference page overhaul (#1793)
* Docs welcome page and Development / Inference page overhaul

* Address review comments

* Fix broken refs and other small adjustments

Co-authored-by: Reuben Morais <reuben.morais@gmail.com>
2021-03-17 10:14:50 +01:00
Reuben Morais
2d654706ed
Merge pull request #1794 from coqui-ai/erogol-doi-patch
Update README with DOI from Zenodo
2021-03-16 18:56:02 +01:00
Eren Gölge
f024b0ded6
Update README.rst
DOI from ZENODO
2021-03-15 23:29:59 +01:00
Reuben Morais
e64d62631c
Merge pull request #1792 from coqui-ai/erogol-patch-2
Gitter room
2021-03-14 16:42:01 +01:00
Eren Gölge
52a709c807
Update README.rst
gitter link

Note: Without sub-def the next badge goes to the new line
2021-03-13 17:55:27 +01:00
Josh Meyer
120ff297af
🐸 instead of \:frog\: 2021-03-09 10:41:28 -05:00
Josh Meyer
89d9a53b86
readme: help + community 2021-03-08 12:47:57 -05:00
Kelly Davis
31f3a6a235 Changes for new Linux packages and bump VERSION 2021-03-08 16:55:43 +01:00
Kelly Davis
8a03f4bce5 Note on supported platforms 2021-03-07 19:50:30 +01:00
Kelly Davis
f02c12925a More updates 2021-03-07 19:25:10 +01:00
Josh Meyer
8c95f3ec20
readme 2021-03-07 13:13:41 -05:00
Kelly Davis
4c37313c3d Some leftover references 2021-03-07 14:47:47 +01:00
Kelly Davis
742b44dd2c Merge branch 'rebrand' onto main 2021-03-07 14:42:44 +01:00
Kelly Davis
57adefcc10 More rebranding, submodules, some internal names 2021-03-07 14:41:43 +01:00
Kelly Davis
6d4d1a7153 More rebranding, API names, iOS, .NET 2021-03-07 14:29:02 +01:00
Kelly Davis
136ca35ca2 Contributor covenant badge 2021-03-07 11:37:17 +01:00
Kelly Davis
95f122806e More rebranding, Java package, C++ impl 2021-03-07 11:34:01 +01:00
Kelly Davis
f33f0b382d More rebranding, Python and JS packages verified 2021-03-07 11:14:16 +01:00
Kelly Davis
99fc28a6c7 More rebranding 2021-03-05 16:46:18 +01:00
Kelly Davis
915886b3b7 Main README logo 2021-03-05 12:53:37 +01:00
Kelly Davis
d2009582e9 Rebranding WIP 2021-03-05 12:48:08 +01:00
lissyx
2bb42d4fb1
Merge pull request #3548 from lissyx/doc-net-build
Expose .Net building doc better
2021-03-03 15:45:25 +01:00
Alexandre Lissy
a087509ab7 Expose .Net building doc better 2021-03-03 15:42:31 +01:00
Reuben Morais
8c8b80dc0b
Merge pull request #3546 from dzubke/Iss-3511_split-sets
Fix #3511: split-sets on sample size
2021-03-01 18:09:38 +00:00
Dustin Zubke
6945663698 Fix #3511: split-sets on sample size 2021-02-28 16:09:37 -05:00
lissyx
385c8c769b
Merge pull request #3539 from lissyx/new-swig
Tentative merge of SWIG master
2021-02-25 18:54:58 +01:00
Alexandre Lissy
206b8355b1 Fix #3540: Force NAudio 1.10.0 2021-02-25 17:09:15 +01:00
Alexandre Lissy
fee12be4ff Update SWIG with upstream 4.1-aligned branch 2021-02-25 17:09:15 +01:00
lissyx
7b2eeb6734
Merge pull request #3524 from Ideefixze/master
Added hot-word boosting doc
2021-02-12 20:35:13 +01:00
Ideefixze
7cf257a2f5 Added hot-word boosting api example doc
Comments for API bindings
X-DeepSpeech: NOBUILD
2021-02-12 19:52:19 +01:00
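A hedged usage sketch of the hot-word boosting API this doc commit covers, based on the 0.9-era Python bindings; the model/scorer paths and boost value are placeholders:

```python
from deepspeech import Model

ds = Model("deepspeech-0.9.3-models.pbmm")
ds.enableExternalScorer("deepspeech-0.9.3-models.scorer")

# Boost the likelihood of a specific word during decoding, then clear it.
ds.addHotWord("firefox", 7.5)
ds.clearHotWords()
```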
lissyx
cc038c1263
Merge pull request #3527 from zaptrem/master
Fix incompatible Swift module error
2021-02-12 09:34:48 +01:00
zaptrem
9d83e18113 Fix incompatible Swift 2021-02-12 00:08:34 -05:00
lissyx
962a117f7e
Merge pull request #3518 from lissyx/rebuild-swig
Fix #3517: Update SWIG sha1
2021-02-01 16:54:06 +01:00
Alexandre Lissy
6eca9b4e0a Fix #3517: Update SWIG sha1 2021-02-01 16:21:44 +01:00
CatalinVoss
f27908e7e3 Fix copying remote AudioFile target to local 2021-01-26 10:02:59 +00:00
Reuben Morais
efbd6be727 Merge PR #3509 (Use pyyaml.safe_load in tc-decision.py) 2021-01-25 09:53:55 +00:00
lissyx
50c7ac6cf6
Merge pull request #3514 from lissyx/fix-decision-task-master
Set base image to ubuntu 18.04
2021-01-25 10:46:52 +01:00
Anton Yaroshenko
54565a056f Set base image to ubuntu 18.04 2021-01-25 10:41:23 +01:00
Reuben Morais
d7e0e89aed
Merge pull request #3510 from zaptrem/patch-1
Swift iOS Bindings: Expose DeepSpeechTokenMetadata fields
2021-01-22 10:29:50 +00:00
zaptrem
28ddc6b0e0
Expose DeepSpeechTokenMetadata fields
Currently, attempting to access member fields of DeepSpeechTokenMetadata objects output from intermediateDecodeWithMetadata causes a crash. Changing these lines makes the object work as (I assume) intended.
2021-01-22 03:42:08 -05:00
lissyx
93c7d1d5dc
Merge pull request #3508 from tud-zih-tools/docu_unsupported_architecture
Docu building ctc decoder on unsupported architecture
2021-01-21 23:33:09 +01:00
NanoNabla
5873145c8e arm is not supported for building ctcdecoder 2021-01-21 23:31:07 +01:00
NanoNabla
334f6b1e47 improve ctcdecode docu for unsupported platforms 2021-01-21 20:59:27 +01:00
NanoNabla
aec81bc048 add hints for building ctcdecode on unsupported platforms 2021-01-21 10:52:26 +01:00
lissyx
b9aa725900
Merge pull request #3505 from tud-zih-tools/ppc64le_integration
build ctcdecode on ppc64le
2021-01-20 23:38:18 +01:00
NanoNabla
d0f0a2d6e8 applying lissyx's patch from mozilla#3379, make it possible to set PYTHON_PLATFORM_NAME in environment on target host 2021-01-20 20:18:03 +01:00
Reuben Morais
80b5fe10df
Merge pull request #3493 from mozilla/add-ogg-opus-training-support
Add ogg opus training support
2021-01-20 17:53:10 +00:00
NanoNabla
80da74c472 add build rules for ctcdecode on ppc64le 2021-01-20 17:25:29 +01:00
Reuben Morais
b2feb04763 Fix some test names/descs and drop Py3.5 training tests 2021-01-18 16:23:40 +00:00
Reuben Morais
f2e57467c6 Compare sample durations with an epsilon 2021-01-18 16:20:03 +00:00
Reuben Morais
db45057dcc Add missing metadata leak suppressions 2021-01-18 13:57:44 +00:00
Reuben Morais
64465cd93a Bump NCCL version due to NVIDIA base image update 2021-01-18 13:37:13 +00:00
Reuben Morais
79a42b345d Read audio format from data before running augmentation passes instead of assuming default 2021-01-18 12:11:31 +00:00
Reuben Morais
8c0d46cb7f Normalize sample rate of train_files by default 2021-01-18 12:11:31 +00:00
Reuben Morais
d4152f6e67 Add support for Ogg/Opus audio files for training 2021-01-18 12:11:31 +00:00
Reuben Morais
ad0f7d2ab7
Merge pull request #3486 from KathyReid/patch-3
Update refs to 0.9.3 from 0.9.2
2021-01-03 09:56:24 +00:00
Kathy Reid
bb47cf26d0
Update refs to 0.9.3 from 0.9.2
I'm using this documentation to build out a Playbook - please don't interpret this as nitpicking, saw a minor change and made it.
2021-01-03 13:53:46 +11:00
Anon-Artist
5edfcdb92e
Update tc-decision.py 2020-12-21 15:34:00 +05:30
Reuben Morais
fcbd92d0d7 Bump version to v0.10.0-alpha.3 2020-12-19 09:28:21 +00:00
Reuben Morais
81c2a33f5b Separate branch and tag 2020-12-19 09:23:32 +00:00
Reuben Morais
239656c0f9 Bump version to v0.10.0-alpha.2 2020-12-19 09:11:06 +00:00
Reuben Morais
05654ef896 Expose GITHUB_HEAD_TAG, used by package upload scriptworker 2020-12-19 09:10:40 +00:00
Reuben Morais
55751e5d70 Bump version to v0.10.0-alpha.1 2020-12-19 08:48:27 +00:00
Reuben Morais
dc16a0e7f9 Separate ref and branch/tag metadata 2020-12-18 23:49:33 +00:00
Reuben Morais
9c988c764b Fix metadata.github.ref on push and tag 2020-12-18 23:42:29 +00:00
Reuben Morais
273d461f6a Bump version to v0.10.0-alpha.0 2020-12-18 23:29:54 +00:00
Reuben Morais
caaec68f59
Merge pull request #3473 from mozilla/taskcluster-v1
Convert to .taskcluster.yml v1
2020-12-18 20:36:38 +00:00
Reuben Morais
4723de25bf Use payload.env instead of forwarding variables manually 2020-12-18 17:00:00 +00:00
Reuben Morais
bb1ad00194 Convert to .taskcluster.yml v1
forward TASK_ID, add created and deadline

more fixes

typo

try without TASK_ID

fix task templates

add missing env vars to tc decision dry runs

avoid repetition in .taskcluster and manually forward variables to tc-decision.py

url -> clone_url

simulate GITHUB_EVENT

separate ref and sha

correct pull request actions

correct pull request policy
2020-12-18 09:35:14 +00:00
Reuben Morais
07d0e93083 Add paragraph on expected behavior from module owners
X-DeepSpeech: NOBUILD
2020-12-17 08:59:36 +00:00
Reuben Morais
8a88e6e063 Fix link in RST
X-DeepSpeech: NOBUILD
2020-12-17 08:53:59 +00:00
Reuben Morais
89cae68706 Improve explanation of governance model 2020-12-17 08:51:08 +00:00
lissyx
3e10163ec8
Merge pull request #3416 from lissyx/pr-3414
.NET Client Binding Fix
2020-12-08 14:48:32 +01:00
Reuben Morais
b3b9e268a7
Merge pull request #3460 from mozilla/more-doc-fixes
More documentation fixes
2020-12-08 15:42:11 +02:00
imrahul3610
1be44c63fc Hotword support for .NET client tests 2020-12-08 13:42:53 +01:00
Reuben Morais
d422955c4a Fix doc references to renamed StreamImpl class 2020-12-08 13:52:04 +02:00
Reuben Morais
1102185abf More branding fixes for docs & Java bindings 2020-12-08 13:36:28 +02:00
Reuben Morais
857ce297f0
Merge pull request #3459 from mozilla/move-linter-circleci
Move linting job to CircleCI
2020-12-08 13:24:35 +02:00
Reuben Morais
0e2209e2b3 Remove Travis 2020-12-08 13:21:05 +02:00
Reuben Morais
25c4f97aa7 Move linting job to CircleCI 2020-12-08 13:21:05 +02:00
Sjors Holtrop
8c8387c45a
Rename Stream class to StreamImpl, export its type as Stream (#3456) 2020-12-08 12:19:21 +01:00
Reuben Morais
4e55d63351
Fix package name reference in Java API docs (#3458) 2020-12-08 10:44:31 +01:00
Catalin Voss
6640cf2341
Remote training I/O once more (#3437)
* Redo remote I/O changes once more; this time without messing with taskcluster

* Add bin changes

* Fix merge-induced issue?

* For the interleaved case with multiple collections, unpack audio on the fly

To reproduce the previous failure

rm data/smoke_test/ldc93s1.csv
rm data/smoke_test/ldc93s1.sdb
rm -rf /tmp/ldc93s1_cache_sdb_csv
rm -rf /tmp/ckpt_sdb_csv
rm -rf /tmp/train_sdb_csv

./bin/run-tc-ldc93s1_new_sdb_csv.sh 109 16000
python -u DeepSpeech.py --noshow_progressbar --noearly_stop --train_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --train_batch_size 1 --feature_cache /tmp/ldc93s1_cache_sdb_csv --dev_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --dev_batch_size 1 --test_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --test_batch_size 1 --n_hidden 100 --epochs 109 --max_to_keep 1 --checkpoint_dir /tmp/ckpt_sdb_csv --learning_rate 0.001 --dropout_rate 0.05 --export_dir /tmp/train_sdb_csv --scorer_path data/smoke_test/pruned_lm.scorer --audio_sample_rate 16000

* Attempt to preserve length information with a wrapper around `map()`… this gets pretty python-y

* Call the right `__next__()`

* Properly implement the rest of the map wrappers here……

* Fix trailing whitespace situation and other linter complaints

* Remove data accidentally checked in

* Fix overlay augmentations

* Wavs must be opened in rb mode if we're passing in an external file pointer -- this confused me

* Lint whitespace

* Revert "Fix trailing whitespace situation and other linter complaints"

This reverts commit c3c45397a2f98e9b00d00c18c4ced4fc52475032.

* Fix linter issue but without such an aggressive diff

* Move unpack_maybe into sample_collections

* Use unpack_maybe in place of duplicate lambda

* Fix confusing comment

* Add clarifying comment for on-the-fly unpacking
2020-12-07 13:07:34 +01:00
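The bullet above about preserving length information with a wrapper around `map()` can be illustrated with a short sketch; the class below is illustrative only, not the project's actual wrapper:

```python
class LenMap:
    """Apply fn lazily like map(), but keep len() of the underlying collection."""

    def __init__(self, fn, values):
        self.fn = fn
        self.values = values

    def __len__(self):
        return len(self.values)

    def __getitem__(self, i):
        return self.fn(self.values[i])

    def __iter__(self):
        return (self.fn(v) for v in self.values)
```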
Reuben Morais
18b66adf46
Merge pull request #3435 from olafthiele/scorerchange
Conditional msg for missing lm.binary added
2020-12-07 13:59:36 +02:00
Reuben Morais
a947e80f70
Merge pull request #3454 from mozilla/branding-cleanup
Branding cleanup
2020-12-07 13:59:03 +02:00
Reuben Morais
4639d57f81
Merge pull request #3455 from mozilla/conda-instructions
Add some guidelines for conda environments for training
2020-12-07 10:56:57 +02:00
Reuben Morais
f6ddc4f72c Add some guidelines for conda environments for training 2020-12-07 10:55:35 +02:00
Reuben Morais
c7ce999e02 Remove trademark from Swift binding project identifier 2020-12-07 10:20:02 +02:00
Reuben Morais
da0209de01 Remove trademark from Java binding package names 2020-12-07 10:18:56 +02:00
Reuben Morais
f822b04e1b Branding cleanup
Remove Mozilla trademarks.
2020-12-07 10:07:39 +02:00
Reuben Morais
ad7d61f837
Merge pull request #3452 from mozilla/codeowners
Add listing of code owners/reviewers and reference from CONTRIBUTING.rst
2020-12-04 15:23:28 +02:00
Reuben Morais
bc078423eb Merge branch 'pr-3436-leaks' (Fixes #3436 and #3451) 2020-12-04 15:21:17 +02:00
Reuben Morais
c6318859df Re-add missing TF flags to deepspeech_bundle library 2020-12-04 15:20:09 +02:00
CatalinVoss
32b6067a01 Enable static build of DeepSpeech iOS framework
Set up additional `deepspeech_ios` target with static build steps

Xcode config: lock swift version at 5.0, bundle framework rather than dynamic lib, never strip swift symbols, add framework search paths, and bring in lstdc++

Runtime schema config: disable the main thread checker as this causes trouble with the static build

Update model versions to 0.9.1

Remove libdeepspeech.so from example app bundling steps

Swift lib embed settings that are somehow essential

Attempt to adjust taskcluster build steps

Add a basic podspec

Add framework to gitignore

Fix podspec version code

Attempt to fix taskcluster unzip step

Switch deepspeech targets for iOS build

Try doing this unzip in one step

Remove packaging steps for unneeded stuff because libdeepspeech.so is no longer a thing here. I suppose we could add a step to package the iOS static lib instead.

Fix podspec version

Set up podspec relative assuming a clone from the repo root

Remove space in iOS package step

Fix buildfile nit

Link stdc++ in explicitly with iOS build only

Revert "Remove space in iOS package step"

This reverts commit 3e1922ea370c110f9854ae7e97101f2ea00f55c6.
2020-12-04 15:19:49 +02:00
Reuben Morais
73240a0f1d Add listing of code owners/reviewers and reference from contribution guidelines
X-DeepSpeech: NOBUILD
2020-12-04 15:17:09 +02:00
lissyx
bcfc74874f
Merge pull request #3444 from lissyx/doc-cuda
Fix #3443: Link to upstream Dockerfile for lack of correct TensorFlow…
2020-11-27 12:37:45 +01:00
Alexandre Lissy
c979e360da Fix #3443: Link to upstream Dockerfile for lack of correct TensorFlow GPU deps doc. 2020-11-27 12:36:23 +01:00
lissyx
da31812173
Merge pull request #3440 from lissyx/electronjs_11
Adding support for ElectronJS v11.0
2020-11-26 16:08:52 +01:00
Alexandre Lissy
c0c5e6ade8 Adding support for ElectronJS v11.0 2020-11-26 13:28:57 +01:00
lissyx
d217369839
Merge pull request #3428 from lissyx/import-ccef
Importer for XML file provided by Conference Centre for Economics, France
2020-11-24 09:51:36 +01:00
Alexandre Lissy
c822a6e875 Importer for dataset from Centre de Conférences Pierre Mendès-France
Released by Ministère de l'Economie, des Finances, et de la Relance
2020-11-24 09:49:39 +01:00
Olaf Thiele
3ae77ca75d Conditional msg for missing lm.binary added 2020-11-23 19:55:27 +01:00
Reuben Morais
ecc48062a7
Merge pull request #3432 from mozilla/revert-remote-io
Revert remote IO PR
2020-11-19 19:35:20 +02:00
Reuben Morais
88f7297215 Revert "Merge pull request #3420 from CatalinVoss/remote-io"
This reverts commit 08d18d7328c03eb0c65d28ffdc0d3755549585e0, reversing
changes made to 12badcce1ffc820bebc4cd2ed5d9787b248200f6.
2020-11-19 16:58:21 +02:00
Reuben Morais
f5cbda694a Revert "Merge pull request #3424 from mozilla/io-fixes"
This reverts commit ab1288ffde7118a76e5394e142b789adf3ad1bba, reversing
changes made to 08d18d7328c03eb0c65d28ffdc0d3755549585e0.
2020-11-19 16:58:01 +02:00
lissyx
ee68367580
Merge pull request #3430 from lissyx/fix-tc-gzip
Fix #3429: TaskCluster behavioral change wrt compression of artifacts
2020-11-19 14:54:36 +01:00
Alexandre Lissy
3caa474cce Fix #3429: TaskCluster behavioral change wrt compression of artifacts 2020-11-19 13:23:56 +01:00
Reuben Morais
ab1288ffde
Merge pull request #3424 from mozilla/io-fixes
Fix I/O issues introduced in #3420
2020-11-18 08:07:10 +02:00
CatalinVoss
6cb638211e Only unpack when we need to, to make things work with SDBs 2020-11-17 16:55:49 -08:00
CatalinVoss
24e9e6777c Make sure we properly unpack samples when changing audio types 2020-11-17 14:44:26 -08:00
CatalinVoss
9aaa0e406b Make sure to unpack samples now 2020-11-17 14:31:48 -08:00
CatalinVoss
8bf1e9ddb7 Fix too aggressive F&R 2020-11-17 14:21:31 -08:00
CatalinVoss
ffe2155733 Undo remote edits for taskcluster as this is all local 2020-11-17 13:47:55 -08:00
CatalinVoss
7121ca5a2b Add a dockerignore for slightly faster local docker builds 2020-11-17 13:40:35 -08:00
Reuben Morais
08d18d7328
Merge pull request #3420 from CatalinVoss/remote-io
Remote I/O Training Setup
2020-11-17 11:53:32 +02:00
CatalinVoss
d0678cd1b7 Remove unused unordered imap from LimitPool 2020-11-16 13:47:21 -08:00
CatalinVoss
611633fcf6 Remove unnecessary uses of open_remote() where we know __file__ will always be local 2020-11-16 13:47:06 -08:00
CatalinVoss
b5b3b2546c Clean up remote I/O docs 2020-11-16 13:46:34 -08:00
CatalinVoss
fb6d4ca361 Add disclaimers to CSV and Tar writers 2020-11-13 19:36:07 -08:00
CatalinVoss
8c1a183c67 Clean up print debugging statements 2020-11-13 19:24:09 -08:00
CatalinVoss
47020e4ecb Add an imap_unordered helper to LimitPool -- I might experiment with this 2020-11-13 19:20:02 -08:00
CatalinVoss
3d2b09b951 Linter seems unhappy with conditional imports. Make gfile a module-level import.
I usually do this as a conditional because tf takes a while to load and it's nice to skip it when you want to run a script that just preps data or something like that, but it doesn't seem like a big deal.
2020-11-13 10:47:06 -08:00
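As a rough illustration of the trade-off described in this commit (module-level versus conditional import), a sketch under the assumption that the helper wraps TensorFlow's gfile; `open_remote` is named elsewhere in this history, but its real signature isn't shown here:

```python
# Module-level import keeps the linter happy, at the cost of loading
# TensorFlow even for scripts that only prepare data.
import tensorflow as tf


def open_remote(path, mode="r"):
    # tf.io.gfile.GFile accepts local paths as well as remote ones (e.g. gs://).
    return tf.io.gfile.GFile(path, mode)
```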
CatalinVoss
2332e7fb76 Linter fix: define self.tmp_src_file_path in init 2020-11-13 10:45:53 -08:00
CatalinVoss
be39d3354d Perform data loading I/O within worker process rather than main process by wrapping Sample 2020-11-12 21:46:39 -08:00
CatalinVoss
fc0b495643 TODO: CSVWriter still totally breaks with remote paths 2020-11-12 16:46:59 -08:00
CatalinVoss
86cba458c5 Fix remote path handling for CSV sample reading 2020-11-12 16:40:59 -08:00
CatalinVoss
8fe972eb6f Fix wave file reading helpers 2020-11-12 16:40:40 -08:00
CatalinVoss
783cdad8db Fix downloader and taskcluster directory mgmt with remote I/O 2020-11-12 16:30:11 -08:00
CatalinVoss
64d278560d Why do we need absolute paths everywhere here? 2020-11-12 16:29:43 -08:00
CatalinVoss
0030cab220 Skip remote zipping for now 2020-11-12 16:29:23 -08:00
CatalinVoss
a6322b384e Fix remote I/O handling in train 2020-11-12 16:29:16 -08:00
CatalinVoss
8f31072998 Fix startswith check 2020-11-12 15:09:42 -08:00
CatalinVoss
90e2e1f7d2 Respect buffering, encoding, newline, closefd, and opener if we're looking at a local file 2020-11-12 14:45:05 -08:00
CatalinVoss
ad08830421 Work remote I/O into audio utils -- a bit more involved 2020-11-12 14:17:03 -08:00
CatalinVoss
3d503bd69e Add universal is_remote_path to I/O helper 2020-11-12 14:16:37 -08:00
CatalinVoss
c3dc4c0d5c Fix bad I/O helper fn replace errors 2020-11-12 14:06:22 -08:00
CatalinVoss
abe5dd2eb4 Remote I/O for taskcluster 2020-11-12 12:49:44 -08:00
CatalinVoss
296b74e01a Remote I/O for sample_collections 2020-11-12 10:54:44 -08:00
CatalinVoss
7de317cf59 Remote I/O for evaluate_tools 2020-11-12 10:49:33 -08:00
CatalinVoss
396ac7fe46 Remote I/O for downloader 2020-11-12 10:48:49 -08:00
CatalinVoss
933d96dc74 Fix relative imports 2020-11-12 10:47:26 -08:00
CatalinVoss
42170a57eb Remote I/O for config 2020-11-12 10:46:49 -08:00
CatalinVoss
83e5cf0416 Remote I/O for check_characters 2020-11-12 10:46:15 -08:00
CatalinVoss
579921cc92 Work remote I/O into train script 2020-11-12 10:45:35 -08:00
CatalinVoss
53e3f5374f Add I/O helpers for remote file access 2020-11-12 10:44:19 -08:00
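A minimal sketch, under assumed prefix conventions, of the kind of helper these remote-I/O commits describe: route remote paths through TensorFlow's gfile and keep the built-in open() (with its full keyword surface) for local files. This is not the repository's exact code.

```
# Sketch only: dispatch between tf.io.gfile for remote URLs and builtin open()
# for local paths. The prefix list is an assumption for illustration.
import tensorflow as tf

REMOTE_PREFIXES = ("gs://", "hdfs://", "http://", "https://", "s3://")


def is_remote_path(path):
    return path.startswith(REMOTE_PREFIXES)


def open_remote(path, mode="r", **kwargs):
    if is_remote_path(path):
        # GFile handles remote filesystems but not open()'s extra keyword args.
        return tf.io.gfile.GFile(path, mode)
    # Local file: respect buffering, encoding, newline, closefd, opener, etc.
    return open(path, mode, **kwargs)
```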
lissyx
12badcce1f
Merge pull request #3393 from imrahul361/master
Run test On Java Client
2020-11-05 16:30:41 +01:00
imrahul3610
3ac6b4fda6 Run test On Java Client 2020-11-05 19:10:50 +05:30
lissyx
8f9d6ad024
Merge pull request #3408 from lissyx/pr-3406
Pr 3406
2020-11-05 13:13:40 +01:00
dag7dev
3a2879933f initial commit for py39 support 2020-11-04 20:16:35 +01:00
Reuben Morais
b72e2643c4
Merge pull request #3395 from CatalinVoss/patch-1
Minor Training Variable Consistency fix
2020-11-03 21:50:59 +01:00
Catalin Voss
98e75c3c03
Call the logits probs in create_inference_graph after they go thru softmax 2020-11-03 09:49:27 -08:00
lissyx
19eeadd0f3
Merge pull request #3398 from lissyx/fix-rtd
Force npm install on RTD and set appropriate PATH value
2020-11-03 14:36:47 +01:00
Alexandre Lissy
1cd5e44a52 Force npm install on RTD and set appropriate PATH value 2020-11-03 14:33:52 +01:00
Catalin Voss
9a92fa40ca
Make variables consistent 2020-11-02 21:09:35 -08:00
lissyx
d9a35d63b0
Merge pull request #3390 from JRMeyer/contributing-docs
note about perf testing
2020-10-29 10:27:51 +01:00
Josh Meyer
b732e39567 note about perf testing
X-DeepSpeech: NOBUILD
2020-10-28 10:22:19 -04:00
lissyx
4427cf9a42
Merge pull request #3389 from suriyaa/patch-1
Use HTTPS in README.md
2020-10-27 12:32:04 +01:00
Suriyaa Sundararuban
87c44d75a3
Use HTTPS in README.md 2020-10-27 11:04:32 +01:00
lissyx
e6a281ed4f
Merge pull request #3383 from ftyers/node15
update for NodeJS 15
2020-10-26 18:22:22 +01:00
Francis Tyers
55e31c4025 update for NodeJS 15 2020-10-26 15:44:06 +00:00
lissyx
5e2a916899
Merge pull request #3385 from liezl200/sys-import-voxforge
Add missing sys import to import_voxforge.py
2020-10-23 15:07:02 +02:00
Liezl P
af7c4e90df Add missing sys import to import_voxforge.py 2020-10-22 23:09:49 -10:00
Reuben Morais
0798698e97
Merge pull request #3380 from piraka9011/patch-1
Convert channels for CV2 dataset
2020-10-17 00:43:08 +02:00
Anas Abou Allaban
521842deea
Convert channels for CV2 dataset
When running a training session on the CV2 dataset, it is possible to get the following error:

```
ValueError: Mono-channel audio required
```

This makes the [pysox Transformer](https://pysox.readthedocs.io/en/latest/api.html#sox.transform.Transformer.convert) also convert the channels.
2020-10-15 11:22:39 -04:00
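A short sketch of the conversion described above, using the pysox Transformer and assuming a 16 kHz / 16-bit mono target (paths and values are placeholders):

```
# Have the pysox Transformer convert channel count as well, so stereo clips
# come out mono; target rate/bit depth here are illustrative assumptions.
import sox

transformer = sox.Transformer()
transformer.convert(samplerate=16000, n_channels=1, bitdepth=16)
transformer.build("clip_stereo.mp3", "clip_mono_16k.wav")
```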
lissyx
e508cd30b7
Merge pull request #3377 from actual-kwarter/master
Minor spelling fixes to CONTRIBUTING.rst X-DeepSpeech: NOBUILD
2020-10-14 10:28:33 +02:00
THCKwarter
e9fc614d8a Minor spelling fixes to CONTRIBUTING.rst X-DeepSpeech: NOBUILD 2020-10-13 22:53:49 -05:00
Reuben Morais
51e351e895
Merge pull request #3370 from tiagomoraismorgado/patch-1
X-DeepSpeech: NOBUILD
2020-10-12 14:18:09 +02:00
tiagomoraismorgado
f753b86ca9
[docs/typos/enhance] - mozilla/deepspeech/readme.rst - update
[docs/typos/enhance] - mozilla/deepspeech/readme.rst - update
2020-10-12 12:46:32 +01:00
lissyx
435b20d530
Merge pull request #3369 from nmstoker/patch-1
Tiny fix to addHotWord doc string parameters
2020-10-12 09:03:33 +02:00
Neil Stoker
2ca91039c8
Tiny fix to addHotWord doc string parameters
Because the parameter for boost was actually written as "word" in the doc string, it replaced the previously documented type for word with the type intended for boost and showed no type for boost at all, messing up what was displayed on https://deepspeech.readthedocs.io/en/master/Python-API.html
2020-10-11 17:46:20 +01:00
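An illustrative (not verbatim) Sphinx-style docstring showing the shape of the fix: each parameter documented under its own name, so autodoc renders a type for both `word` and `boost`:

```
def addHotWord(self, word, boost):
    """Add a word with a boost to the hot-words dictionary.

    :param word: The word to boost.
    :type word: str

    :param boost: The boost value to apply to the word.
    :type boost: float
    """
```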
lissyx
7ca237d19b
Merge pull request #3361 from imrahul361/master
enable hot-words boosting for Javascript
2020-10-10 16:00:47 +02:00
imrahul3610
9df89bd945 Fix JavaScript binding calls for Hot Words 2020-10-10 11:30:27 +05:30
imrahul3610
368f76557a Run Tests on CI for JS Client 2020-10-10 11:30:27 +05:30
imrahul3610
29b39fd2d5 JS Binding Fix 2020-10-10 11:30:27 +05:30
Reuben Morais
07fcd5bcd1
Merge pull request #3360 from mozilla/utf8alphabet-python-bindings
Fix binding of UTF8Alphabet class in decoder package
2020-10-06 22:07:45 +02:00
Reuben Morais
cc2763e0b7 Add small bytes output mode scorer for tests 2020-10-06 18:19:34 +02:00
Reuben Morais
09f0aa3d75 Rename --force_utf8 flag to --force_bytes_output_mode to avoid confusion 2020-10-06 18:19:34 +02:00
Reuben Morais
83a36b7a34 Rename --utf8 flag to --bytes_output_mode to avoid confusion 2020-10-06 18:19:33 +02:00
Reuben Morais
fb4f5b6a84 Add some coverage for training and inference in bytes output mode 2020-10-06 18:19:33 +02:00
Reuben Morais
2fd11dd74a Fix binding of UTF8Alphabet class in decoder package 2020-10-06 13:13:34 +02:00
lissyx
421f44cf73
Merge pull request #3357 from JRMeyer/mono-channel-error-message
mono-channel error, not just an assertion
2020-10-03 11:08:07 +02:00
josh meyer
afee570f3c mono-channel error, not just an assertion
X-DeepSpeech: NOBUILD
2020-10-02 13:27:43 -07:00
lissyx
dd4122a04a
Merge pull request #3356 from lissyx/linux-valgrind
Linux valgrind
2020-10-01 18:49:19 +02:00
Alexandre Lissy
fdd663829a Fix #3355: Add valgrind runs 2020-10-01 15:31:21 +02:00
Alexandre Lissy
86bba80b0e Fix #3292: Linux debug builds 2020-10-01 12:40:24 +02:00
lissyx
f20f939ade
Merge pull request #3351 from lissyx/leak-intermediate-decode
Fix leak in C++ client
2020-09-29 18:30:13 +02:00
Alexandre Lissy
9a34507023 Fix leak in C++ client 2020-09-29 16:02:27 +02:00
lissyx
0c020d11bc
Merge pull request #3350 from lissyx/test-lzma-bz2
Auto-discover lzma/bz2 linkage of libmagic
2020-09-29 12:52:42 +02:00
Alexandre Lissy
9674ced520 Auto-discover lzma/bz2 linkage of libmagic 2020-09-29 10:52:37 +02:00
lissyx
c7d58d628e
Merge pull request #3343 from lissyx/docker-1.15.4
Use correct 1.15.4 docker image
2020-09-28 14:54:14 +02:00
Alexandre Lissy
02548c17de Fix #3347: Disable Git-LFS on Windows 2020-09-28 13:33:59 +02:00
Alexandre Lissy
57c26827c0 Use correct 1.15.4 docker image 2020-09-28 12:43:12 +02:00
lissyx
731dd1b6bd
Merge pull request #3338 from lissyx/tf-1.15.4
Fix #3088: Use TensorFlow 1.15.4 with CUDNN fix
2020-09-25 16:00:36 +02:00
lissyx
d7be8e2789
Fix typo on DS_ClearHotWords 2020-09-25 14:45:37 +02:00
lissyx
5a88417547
Merge pull request #3339 from lissyx/missing-hotword-c-doc
Fix missing doc for new Hot Word API
2020-09-25 14:40:27 +02:00
Alexandre Lissy
25c2965da8 Fix missing doc for new Hot Word API
X-DeepSpeech: NOBUILD
2020-09-25 14:39:42 +02:00
lissyx
0728ac259e
Merge pull request #3320 from lissyx/build-kenlm
Fix #3299: Build KenLM on CI
2020-09-25 14:36:23 +02:00
Alexandre Lissy
16165f3ddc Fix #3088: Use TensorFlow 1.15.4 with CUDNN fix 2020-09-25 14:11:06 +02:00
Alexandre Lissy
bf5ae9cf8a Fix #3299: Build KenLM on CI 2020-09-25 13:25:38 +02:00
lissyx
34a62bd1d1
Merge pull request #3337 from lissyx/bump-0.9.0a10
Bump VERSION to 0.9.0-alpha.10
2020-09-25 13:25:16 +02:00
Alexandre Lissy
445ebb233a Bump VERSION to 0.9.0-alpha.10 2020-09-25 11:05:12 +02:00
Josh Meyer
1eb155ed93
enable hot-word boosting (#3297)
* enable hot-word boosting

* more consistent ordering of CLI arguments

* progress on review

* use map instead of set for hot-words, move string logic to client.cc

* typo bug

* pointer things?

* use map for hotwords, better string splitting

* add the boost, not multiply

* cleaning up

* cleaning whitespace

* remove <set> inclusion

* change typo set-->map

* rename boost_coefficient to boost

X-DeepSpeech: NOBUILD

* add hot_words to python bindings

* missing hot_words

* include map in swigwrapper.i

* add Map template to swigwrapper.i

* emacs intermediate file

* map things

* map-->unordered_map

* typo

* typo

* use dict() not None

* error out if hot_words without scorer

* two new functions: remove hot-word and clear all hot-words

* starting to work on better error messages

X-DeepSpeech: NOBUILD

* better error handling + .Net ERR codes

* allow for negative boosts:)

* adding TC test for hot-words

* add hot-words to python client, make TC test hot-words everywhere

* only run TC tests for C++ and Python

* fully expose API in python bindings

* expose API in Java (thanks spectie!)

* expose API in dotnet (thanks spectie!)

* expose API in javascript (thanks spectie!)

* java lol

* typo in javascript

* commenting

* java error codes from swig

* java docs from SWIG

* java and dotnet issues

* add hotword test to android tests

* dotnet fixes from carlos

* add DS_BINARY_PREFIX to tc-asserts.sh for hotwords command

* make sure lm is on android for hotword test

* path to android model + nit

* path

* path
2020-09-24 14:58:41 -04:00
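A brief usage sketch of the resulting API through the Python bindings (model and scorer paths are placeholders; per this PR, hot words require an external scorer to be loaded):

```
# Usage sketch; model and scorer paths are placeholders.
from deepspeech import Model

ds = Model("deepspeech-0.9.0-models.pbmm")
ds.enableExternalScorer("deepspeech-0.9.0-models.scorer")

ds.addHotWord("firefox", 7.5)    # positive boost: make the word more likely
ds.addHotWord("activate", -5.0)  # negative boosts are allowed as well
ds.eraseHotWord("activate")      # remove a single hot word
ds.clearHotWords()               # remove all hot words
```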
Reuben Morais
d466fb09d4 Bump VERSION to 0.9.0-alpha.9 2020-09-21 12:11:53 +02:00
Reuben Morais
cc62aa2eb8
Merge pull request #3279 from godefv/decoder_timesteps
The CTC decoder timesteps now correspond to the timesteps of the most probable CTC path, instead of the earliest timesteps of all possible paths.
2020-09-17 20:31:05 +02:00
godeffroy
188501a333 PR #3279 - Reverted unrelated and unwanted change. 2020-09-17 19:10:43 +02:00
godeffroy
371ddb84e5 PR #3279 - Added README.mozilla to tell where the object pool code is from and updated the object pool code from this origin (minor update). 2020-09-17 17:55:45 +02:00
godeffroy
5bf5124366 PR #3279 - Added some comments, harmonized a few names, removed unneeded spaces 2020-09-17 14:27:33 +02:00
lissyx
014479e650
Merge pull request #3324 from gtcooke94/fix_swb_import
Added `os` import in front of `makedirs`
2020-09-16 20:44:33 +02:00
Greg Cooke
20ad86c6ab Added os import in front of makedirs 2020-09-16 14:20:59 -04:00
godeffroy
23944b97db PR #3279 - Made the timestep tree thread safe 2020-09-16 14:03:59 +02:00
godeffroy
1fa2e4ebcc PR #3279 - Fixed buggy timestep tree root 2020-09-15 21:30:45 +02:00
lissyx
1b3e97c102
Merge pull request #3322 from lissyx/fix-docker-build
Fix #3321: Update NCCL dep to 2.7 following NVIDIA update
2020-09-15 16:37:29 +02:00
Alexandre Lissy
76d5fb6389 Fix #3321: Update NCCL dep to 2.7 following NVIDIA update 2020-09-15 13:40:17 +02:00
godeffroy
14bd9033d6 Revert "PR #3279 - removed unrelated code"
This reverts commit 78c4ef17b11fe681702cb0619a0b938a0b59f5bd.
2020-09-14 22:45:42 +02:00
godeffroy
15ce05aa01 PR #3279 - Fixed spaces 2020-09-14 14:40:56 +02:00
lissyx
346b5bdbae
Merge pull request #3318 from lissyx/electron-10
Fix #3316: Add Electron 10.x
2020-09-10 16:27:15 +02:00
Alexandre Lissy
2e92f53aac Use bigger build machine to avoid recurrent breakages of Linux/CUDA builds 2020-09-10 15:13:51 +02:00
Alexandre Lissy
a4d6c672d4 Fix #3316: Add Electron 10.x 2020-09-10 12:08:17 +02:00
lissyx
16a7a27275
Merge pull request #3319 from olafthiele/master-branch-error
Simplified git clone msg to prevent error reports
2020-09-10 11:51:27 +02:00
Olaf Thiele
de1e3d7aa0 Simplified install text 2020-09-10 10:58:28 +02:00
lissyx
dda2d22310
Merge pull request #3314 from olafthiele/master-branch-errors
Trying to get fewer master branch training errors
2020-09-09 15:08:36 +02:00
godeffroy
f07c10452b PR #3279 - use unique_ptr instead of shared_ptr in the timestep tree 2020-09-09 11:04:37 +02:00
lissyx
ce95be1354
Merge pull request #3315 from lissyx/bump-v0.9.0-alpha.8
Bump VERSION to v0.9.0-alpha.8
2020-09-09 10:53:46 +02:00
Alexandre Lissy
b30e0fb815 Bump VERSION to v0.9.0-alpha.8 2020-09-09 08:49:46 +02:00
Olaf Thiele
a2e88a30de
More compact version 2020-09-08 15:32:23 +02:00
Olaf Thiele
39a963af90
Update TRAINING.rst 2020-09-08 14:54:08 +02:00
godeffroy
3a49344ccb PR #3279 - use an object pool to store timesteps tree nodes 2020-09-08 14:12:39 +02:00
lissyx
11be0a57d4
Merge pull request #3313 from mozilla/erogol-patch-1
fix missing import 'sys'
2020-09-08 10:43:02 +02:00
Eren Gölge
b2df360799
fix missing import 'sys' 2020-09-08 10:15:22 +02:00
godeffroy
ec55597412 PR #3279 - use a tree structure to store timesteps 2020-09-07 13:37:27 +02:00
lissyx
012e7bfb5e
Merge pull request #3309 from JRMeyer/docs-contributing
Docs contributing
2020-09-07 12:18:32 +02:00
Josh Meyer
ff057e86c7 bold instead of ticks 2020-09-02 10:54:13 -04:00
lissyx
91e70602ce
Merge pull request #3307 from techiaith/master
updating docs for #3295
2020-09-02 16:29:31 +02:00
Dewi Bryn Jones
8a8d140da8 updating docs for #3295 2020-09-02 15:13:57 +01:00
lissyx
b6f5ddfe54
Merge pull request #3301 from techiaith/master
Fix for setuptools._distutils issue (#3295)
2020-09-02 14:32:37 +02:00
Josh Meyer
fdf6aeb22b first stab at CONTRIBUTING.rst 2020-09-02 08:29:19 -04:00
Dewi Bryn Jones
a6dff311f6 fix for #3295 2020-09-02 13:10:02 +01:00
lissyx
9377aaf3a0
Merge pull request #3296 from lissyx/transcribe-ci
Fix #3129: Add CI coverage for transcribe.py
2020-09-01 19:13:28 +02:00
Alexandre Lissy
32ad25b088 Fix #3129: Add CI coverage for transcribe.py 2020-09-01 17:49:31 +02:00
godeffroy
1f89bef5f0 PR #3279 - avoid unnecessary copies of timesteps vectors 2020-08-31 19:01:47 +02:00
lissyx
ccb1a6b0d4
Merge pull request #3278 from DanBmh/refactor_rlrop_cond
Refactor rlrop condition
2020-08-31 16:49:01 +02:00
lissyx
26f99874a6
Merge pull request #3293 from lissyx/decouple-builds
Decouple builds
2020-08-31 14:53:05 +02:00
Daniel
c10f7f1ad6 Refactor rlrop condition. 2020-08-31 12:57:48 +02:00
Alexandre Lissy
4bc14acb12 Decouple builds
Fixes #3170
2020-08-31 12:04:04 +02:00
godeffroy
363121235e PR #3279 - revert to non RVO code (fix) 2020-08-31 10:15:34 +02:00
godeffroy
e9466160c7 PR #3279 - revert to non RVO code 2020-08-31 09:54:38 +02:00
godeffroy
59c73f1c46 PR #3279 - assert instead of reporting error to std::cerr 2020-08-31 09:38:54 +02:00
godeffroy
78c4ef17b1 PR #3279 - removed unrelated code 2020-08-31 09:33:22 +02:00
godeffroy
c3d6f8d923 PR #3279 - replaced tabulations by spaces 2020-08-31 08:53:26 +02:00
lissyx
555a265010
Merge pull request #3290 from lissyx/re-fix-swig-master
Fix SWIG prebuild URL
2020-08-28 18:27:39 +02:00
Alexandre Lissy
160fa76ddf Fix SWIG prebuild URL 2020-08-28 17:18:17 +02:00
lissyx
f554ac0b38
Merge pull request #3284 from lissyx/new-macOS-VMs
Switch to new macOS VM setup
2020-08-28 15:23:07 +02:00
Alexandre Lissy
3e6593d325 Switch to new macOS VM setup 2020-08-28 10:15:56 +02:00
Reuben Morais
1c9f3bc99d
Merge pull request #3286 from mozilla/test-pr-3268
Test PR #3268
2020-08-27 20:02:48 +02:00
Daniel
93a4de5489 Fix lr initialization on reload. 2020-08-27 15:08:32 +02:00
Reuben Morais
8965b29e81 Point back to examples master branch 2020-08-27 09:31:05 +02:00
Reuben Morais
becc3d9745
Merge pull request #3280 from mozilla/undo-renames
Undo renames
2020-08-27 09:27:45 +02:00
Reuben Morais
3aa3862fbc Fix TF cache references after rebase 2020-08-26 11:47:35 +02:00
Reuben Morais
b70db48f91 Rename new tasks 2020-08-26 11:46:09 +02:00
Reuben Morais
dc2503c5e0 Specify macOS SDK version along with minimum version in builds 2020-08-26 11:46:09 +02:00
Reuben Morais
b9e2d90a73 Point to reverted examples changes 2020-08-26 11:46:09 +02:00
Reuben Morais
81ce543670 Fix bad conflict resolution in bazel rebuild check 2020-08-26 11:46:09 +02:00
Reuben Morais
8f2c1e842a Explicitly name repository clone target in Dockerfiles 2020-08-26 11:46:09 +02:00
Reuben Morais
d1c964c5d5 Adjust TF cache indices for 2.3 + renames undone 2020-08-26 11:46:09 +02:00
Reuben Morais
ae0cf8db6a Revert "Merge branch 'rename-real'"
This reverts commit ae9fdb183ec6eb422635c0e3a44c0c2ee5732224, reversing
changes made to 2eb75b62064ac30c1c537f4174d00b6e521042c5.
2020-08-26 11:46:09 +02:00
Reuben Morais
386935e1fa Revert "Merge pull request #3230 from mozilla/rename-nuget-gpu-to-cuda"
This reverts commit 0610a7a76fba80df73a220b76b07946ba9ac4581, reversing
changes made to c31df0fd4cba77e632b1ad76c27162727a98e540.
2020-08-26 11:46:08 +02:00
Reuben Morais
01fd13b663 Revert "Merge pull request #3229 from mozilla/nodejs-scoped-name"
This reverts commit 402fc71abf01491cb6b99cc4f9cb69820c0fb842, reversing
changes made to 0610a7a76fba80df73a220b76b07946ba9ac4581.
2020-08-26 11:46:08 +02:00
Reuben Morais
da55cfae86 Revert "Merge pull request #3237 from lissyx/rename-training-package"
This reverts commit 3dcb3743acc14ed9de63110709446791892f8936, reversing
changes made to 457198c88d7ad96ee4596cb21deaeca77c277898.
2020-08-26 11:46:08 +02:00
Reuben Morais
fee45c425e Revert "Merge pull request #3233 from lissyx/examples-rename-master"
This reverts commit 86845dd022f9f77ddc4aff8023b9d5d2a663078a, reversing
changes made to 3dcb3743acc14ed9de63110709446791892f8936.
2020-08-26 11:46:08 +02:00
Reuben Morais
d000d76548 Revert "Merge pull request #3239 from lissyx/rename-circleci"
This reverts commit 08cebeda3c43b10bd8caa766ccd0feec7e305735, reversing
changes made to 86845dd022f9f77ddc4aff8023b9d5d2a663078a.
2020-08-26 11:46:08 +02:00
Reuben Morais
7f99007840 Revert "Merge pull request #3238 from lissyx/rename-index"
This reverts commit 1a7dd876017d0e7451abb1101d154b71b8d8edb5, reversing
changes made to 08cebeda3c43b10bd8caa766ccd0feec7e305735.
2020-08-26 11:46:06 +02:00
Reuben Morais
10e2fc16f2 Revert "Merge pull request #3243 from lissyx/rename-stt-master"
This reverts commit 3e99b0d8b2b2d6e47c8ff7eb1dfd9a88eba8e6d8, reversing
changes made to 3a8c45cb619589f5f6acf4bfb71e7d6b18e8eab5.
2020-08-26 11:45:06 +02:00
Reuben Morais
7a6508612d Revert "Merge pull request #3246 from lissyx/fix-docker"
This reverts commit c01fda56c058779cc9dba952ce940c47398c4ed3, reversing
changes made to 3e99b0d8b2b2d6e47c8ff7eb1dfd9a88eba8e6d8.
2020-08-26 11:45:06 +02:00
Reuben Morais
c62a604876 Revert "Merge pull request #3248 from lissyx/rtd-rename"
This reverts commit ce71910ab4533e84eaf7be92bc1eb447305f4bd6, reversing
changes made to 7c6108a199f1d8f892c2d52088850aaa5a8792e9.
2020-08-26 11:45:06 +02:00
Reuben Morais
9788811bc5 Revert "Merge pull request #3241 from lissyx/rename-ctcdecoder"
This reverts commit fd4185f1410a39af19742310403151646318faba, reversing
changes made to 1a7dd876017d0e7451abb1101d154b71b8d8edb5.
2020-08-26 11:45:06 +02:00
lissyx
9daa708047
Merge pull request #3276 from lissyx/pr3256
Pr3256
2020-08-25 22:39:50 +02:00
lissyx
903fec464a
Merge pull request #3272 from godefv/master
In ctc_beam_search_decoder(), added a sanity check between input class_dim and alphabet
2020-08-25 14:04:41 +02:00
Bernardo Henz
8284958f3d Updating tensorflow version in taskcluster/.build.yml 2020-08-25 13:22:35 +02:00
Bernardo Henz
9f3c40ce48 Replacing old sha with new ones
Replacing the old SHA reference ('4336a5b49fa6d650e24dbdba55bcef9581535244') with the new one ('23ad988fcde60fb01f9533e95004bbc4877a9143')
2020-08-25 13:22:35 +02:00
Bernardo Henz
b4bc6bfb8a Updating commit of submodule 2020-08-25 13:21:12 +02:00
Bernardo Henz
1f54daf007 Default for layer_norm set to False 2020-08-25 13:18:30 +02:00
Bernardo Henz
2fcba677bb Implementation of layer-norm in the training script 2020-08-25 13:18:30 +02:00
godeffroy
95b6fccaf1 In ctc_beam_search_decoder(), added a sanity check between input class_dim and alphabet 2020-08-25 12:13:28 +02:00
godeffroy
04a36fbf68 The CTC decoder timesteps now correspond to the timesteps of the most
probable CTC path, instead of the earliest timesteps of all possible paths.
2020-08-25 12:08:14 +02:00
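A rough Python illustration of the timestep-tree idea from the #3279 series above (the actual implementation is C++ and uses an object pool): each beam candidate keeps a pointer into a shared tree, so extending a candidate never copies its whole timestep prefix.

```
class TimestepNode:
    """One node of the shared timestep tree; candidates hold a leaf pointer."""

    __slots__ = ("timestep", "parent")

    def __init__(self, timestep, parent=None):
        self.timestep = timestep
        self.parent = parent

    def extend(self, timestep):
        # O(1): add a child instead of copying the candidate's timestep vector.
        return TimestepNode(timestep, parent=self)

    def to_list(self):
        # Materialize root -> leaf only when emitting a final transcription.
        steps, node = [], self
        while node is not None:
            steps.append(node.timestep)
            node = node.parent
        return list(reversed(steps))
```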
lissyx
c5db91413f
Merge pull request #3277 from lissyx/doc-r2.3
Update docs for matching r2.3
2020-08-24 21:17:11 +02:00
Alexandre Lissy
e81ee24ede Update docs for matching r2.3 2020-08-24 21:13:16 +02:00
lissyx
a54b198d1e
Merge pull request #3266 from lissyx/electronjs-9.2
Add ElectronJS v9.2
2020-08-20 12:42:59 +02:00
Alexandre Lissy
4283b7e7de Add ElectronJS v9.2 2020-08-20 11:15:40 +02:00
Reuben Morais
d14c2b2e2d
Merge pull request #3261 from mozilla/reload-weights-plateau-tests
Tests #3245 Reload weights after plateau
2020-08-20 09:48:02 +02:00
lissyx
f4f8d2d7b7
Merge pull request #3264 from ptitloup/ptitloup-patch-python-client
Update client.py
2020-08-20 09:39:54 +02:00
Ptitloup
0c3aa6f472
Update client.py
remove space in key start_time of word dict
2020-08-20 09:19:53 +02:00
Reuben Morais
567a50087d
Merge pull request #3259 from mozilla/macos-min-10.10
Explicitly set minimum macOS version in bazel flags
2020-08-20 00:11:26 +02:00
Daniel
420ba808c8 Reload graph with extra function. 2020-08-19 18:45:09 +02:00
Daniel
4cf7a012a3 Don't drop layers in rlrop reload. 2020-08-19 18:45:09 +02:00
Daniel
09e1422278 Reload weights after plateau. 2020-08-19 18:45:09 +02:00
lissyx
b5c871616c
Merge pull request #3262 from lissyx/fix-docker-build
Use more beefy builder for Docker builds
2020-08-19 17:46:17 +02:00
Alexandre Lissy
5cc1ec32bd Use more beefy builder for Docker builds 2020-08-19 17:26:59 +02:00
Reuben Morais
2bceda0c56 Explicitly set minimum macOS version in bazel flags 2020-08-19 14:17:42 +02:00
lissyx
eb23728538
Merge pull request #3258 from Jendker/docu_filesize
Extend docs about the CSV files
2020-08-19 12:08:23 +02:00
Jedrzej Beniamin Orbik
9a6a1c7f3a Extend docs about the CSV files 2020-08-19 11:32:52 +02:00
lissyx
02afc2ac7e
Merge pull request #3254 from lissyx/bump-v0.9.0a7
Bump VERSION to 0.9.0-alpha.7
2020-08-18 15:14:13 +02:00
Alexandre Lissy
19ed4e950a Bump VERSION to 0.9.0-alpha.7 2020-08-18 12:23:41 +02:00
lissyx
c40f90cbff
Merge pull request #3227 from lissyx/use-r2.3
Move to TensorFlow r2.3
2020-08-18 10:56:50 +02:00
lissyx
90e04fb365
Merge pull request #3251 from Jendker/ctc_multiple_transc
Add num_results param to ctc_beam_search_decoder
2020-08-17 20:07:59 +02:00
Jedrzej Beniamin Orbik
c20af74d51 Add num_results param to ctc_beam_search_decoder 2020-08-17 18:29:08 +02:00
Alexandre Lissy
8619665fe1 Move to TensorFlow r2.3 2020-08-14 11:26:09 +02:00
lissyx
ce71910ab4
Merge pull request #3248 from lissyx/rtd-rename
Update name of readthedocs
2020-08-13 22:55:14 +02:00
Alexandre Lissy
fffc6ad455 Update name of readthedocs 2020-08-13 22:50:57 +02:00
lissyx
7c6108a199
Merge pull request #3236 from tilmankamp/tarexport
Resolves #3235 - Support for .tar(.gz) targets in bin/data_set_tool.py
2020-08-13 15:52:24 +02:00
Tilman Kamp
96f37a403d Resolves #3235 - Support for .tar(.gz) targets in bin/data_set_tool.py 2020-08-13 10:21:45 +02:00
lissyx
a6f40a3b2f
Merge pull request #3244 from lissyx/bump-v0.9.0a6
Bump VERSION to 0.9.0-alpha.6
2020-08-12 19:01:14 +02:00
Alexandre Lissy
2838df25e0 Bump VERSION to 0.9.0-alpha.6 2020-08-12 17:54:35 +02:00
lissyx
c01fda56c0
Merge pull request #3246 from lissyx/fix-docker
Fix docker path with new project name
2020-08-12 17:49:10 +02:00
Alexandre Lissy
1ad6ad9708 Fix docker path with new project name 2020-08-12 17:12:51 +02:00
lissyx
3e99b0d8b2
Merge pull request #3243 from lissyx/rename-stt-master
Rename DeepSpeech -> STT
2020-08-12 16:18:32 +02:00
Alexandre Lissy
9bca7a9044 Rename DeepSpeech -> STT 2020-08-12 13:52:17 +02:00
lissyx
3a8c45cb61
Merge pull request #3242 from lissyx/improve-tc-cleanup
Try to properly clean up the TaskCluster workdir
2020-08-12 12:03:18 +02:00
Alexandre Lissy
60fe2450a7 Try to properly clean up the TaskCluster workdir 2020-08-12 10:58:14 +02:00
lissyx
fd4185f141
Merge pull request #3241 from lissyx/rename-ctcdecoder
Rename ctcdecoder python package
2020-08-11 19:06:56 +02:00
lissyx
1a7dd87601
Merge pull request #3238 from lissyx/rename-index
Rename TaskCluster index
2020-08-11 19:06:14 +02:00
Alexandre Lissy
ccd9241bd0 Rename ctcdecoder python package 2020-08-10 22:45:43 +02:00
Alexandre Lissy
5795173c14 Rename TaskCluster index 2020-08-10 22:08:39 +02:00
lissyx
08cebeda3c
Merge pull request #3239 from lissyx/rename-circleci
Use new name for Docker container and Docker Hub repo
2020-08-10 20:26:15 +02:00
Alexandre Lissy
e83d92c93a Use new name for Docker container and Docker Hub repo 2020-08-10 20:24:45 +02:00
lissyx
86845dd022
Merge pull request #3233 from lissyx/examples-rename-master
Rename DeepSpeech-examples to STT-examples
2020-08-10 19:02:53 +02:00
Alexandre Lissy
7d31f5e349 Rename DeepSpeech-examples to STT-examples 2020-08-10 18:35:34 +02:00
lissyx
3dcb3743ac
Merge pull request #3237 from lissyx/rename-training-package
Rename deepspeech_training package
2020-08-10 18:35:03 +02:00
Alexandre Lissy
6f84bd1996 Rename deepspeech_training package 2020-08-10 16:58:18 +02:00
lissyx
457198c88d
Merge pull request #3232 from lissyx/bump-v0.9.0-alpha.5
Bump VERSION to 0.9.0-alpha.5
2020-08-07 13:16:38 +02:00
Alexandre Lissy
41dcb41691 Bump VERSION to 0.9.0-alpha.5 2020-08-07 11:39:47 +02:00
lissyx
402fc71abf
Merge pull request #3229 from mozilla/nodejs-scoped-name
Use scoped name for npm package
2020-08-07 00:52:05 +02:00
Reuben Morais
50de377953 Use scoped name for npm package 2020-08-06 18:55:42 +02:00
Reuben Morais
0610a7a76f
Merge pull request #3230 from mozilla/rename-nuget-gpu-to-cuda
Rename NuGet -GPU package to -CUDA
2020-08-06 17:43:25 +02:00
Reuben Morais
1e8213c385 Rename NuGet -GPU package to -CUDA 2020-08-06 16:16:33 +02:00
Reuben Morais
c31df0fd4c Bump VERSION to 0.9.0-alpha.4 2020-08-06 14:25:39 +02:00
Reuben Morais
ae9fdb183e Merge branch 'rename-real' 2020-08-06 14:20:39 +02:00
Reuben Morais
0b51004081 Address review comments 2020-08-06 14:20:05 +02:00
Reuben Morais
4d98958b77 iOS: Re-share workspace schemes and fix packaging 2020-08-05 17:49:51 +02:00
Reuben Morais
8c840bed23 Fix .NET build/package, resolve package conflict in Java app 2020-08-05 17:49:51 +02:00
lissyx
2eb75b6206
Merge pull request #3224 from lissyx/electronjs-example
Electron example
2020-08-04 22:23:27 +02:00
Alexandre Lissy
bb24fc89f0 Electron example 2020-08-04 21:57:23 +02:00
Reuben Morais
4d726e820d More renames 2020-08-04 18:04:08 +02:00
Reuben Morais
ee1235678d Missing renames in CI scripts 2020-08-04 15:25:46 +02:00
lissyx
3340cb6b8a
Merge pull request #3218 from lissyx/new-workerType
Fix #3181: Use finer-grained gcp workers
2020-08-04 13:12:06 +02:00
Reuben Morais
b301cdf83e JavaScript rename 2020-08-04 12:13:11 +02:00
Reuben Morais
5449f21a47 Python rename 2020-08-04 12:12:20 +02:00
Reuben Morais
b86a92a5b3 C docs 2020-08-04 12:10:31 +02:00
Reuben Morais
b18639f9c4 Swift rename 2020-08-04 12:09:41 +02:00
Reuben Morais
213590b326 Java rename 2020-08-04 11:39:22 +02:00
Reuben Morais
ee7bf86460 .NET rename 2020-08-04 11:15:27 +02:00
Alexandre Lissy
040f5eb2a3 Fix #3181: Use finer-grained gcp workers 2020-08-04 11:15:07 +02:00
lissyx
b65cd7e810
Merge pull request #3208 from lissyx/fix-linker
Fix #3207: do not force -shared on the linkage
2020-08-04 11:12:12 +02:00
lissyx
9cd6863e4a
Merge pull request #3214 from lissyx/win-workers
Fix #3211: Use win + win-gpu set
2020-08-03 21:55:53 +02:00
Alexandre Lissy
6d5d97abc4 Fix #3207: do not force -shared on the linkage 2020-08-03 18:58:43 +02:00
Alexandre Lissy
b6edcbe08c Fix #3211: Use win + win-gpu set 2020-08-03 18:32:20 +02:00
Reuben Morais
fa21911048 Rename packages, modules, headers, shared libraries to Mozilla Voice STT 2020-08-03 18:22:32 +02:00
Reuben Morais
21e5a74b0c
Merge pull request #3212 from mozilla/link-decoder-docs
Decoder docs: UTF-8 -> Bytes output mode, and link to scorer-scripts (Closes #2978)
2020-08-03 11:57:43 +02:00
Reuben Morais
d182cb7a58
Merge pull request #3213 from mozilla/remove-tensorflow-mention-cuda
Remove mention of TensorFlow docs for CUDA requirements
2020-08-03 11:57:27 +02:00
Reuben Morais
350575ba44 Remove mention of TensorFlow docs for CUDA requirements 2020-08-03 09:22:19 +02:00
Reuben Morais
d9f9d6ed89 Decoder docs: UTF-8 -> Bytes output mode, and link to scorer-scripts 2020-08-03 09:16:38 +02:00
Reuben Morais
04deda0239
Merge pull request #3206 from mrstegeman/alphabet-logic
Fix alphabet logic in generate_scorer_package.
2020-08-02 17:24:13 +02:00
lissyx
482cc534cf
Merge pull request #3204 from lissyx/no-dotnet-examples
Fix #3198: Do not rely on examples repo for building .Net
2020-08-02 11:35:17 +02:00
Alexandre Lissy
c55143d282 Fix #3198: Do not rely on examples repo for building .Net 2020-08-02 02:02:41 +02:00
Michael Stegeman
3024cffe49
Fix alphabet logic in generate_scorer_package.
Fixes #3205
2020-07-31 12:46:03 -08:00
Reuben Morais
41db367428
Merge pull request #3201 from mozilla/update-examples-models
Update examples model to match new naming
2020-07-31 08:42:48 +02:00
Reuben Morais
4b10f0b840 Update examples model to match new naming 2020-07-30 22:39:14 +02:00
Reuben Morais
d3efa4c438
Merge pull request #3199 from mozilla/ios-publish-all
Upload both native_client and .framework for iOS tasks
2020-07-30 22:26:25 +02:00
Reuben Morais
c8441d1f8d Upload both native_client and .framework for iOS tasks 2020-07-30 20:33:58 +02:00
Reuben Morais
4ce32c157b
Merge pull request #3195 from mozilla/remove-unused-decoder-method
Remove unused Scorer method
2020-07-30 13:50:52 +02:00
Reuben Morais
6141740f89 Remove unused Scorer method
This method was made unused by https://github.com/mozilla/DeepSpeech/pull/3021
after reports such as https://github.com/mozilla/DeepSpeech/issues/3004
of confusion interpreting the confidence values.
2020-07-30 11:00:00 +02:00
Reuben Morais
396504ea07
Merge pull request #3191 from reuben/swift-mic-streaming
iOS microphone streaming
2020-07-28 14:25:27 +02:00
Reuben Morais
c1fd93ac8d
Merge pull request #3192 from mozilla/remove-scorer
Remove external scorer file and documentation and flag references
2020-07-28 11:15:00 +02:00
Reuben Morais
216da91842 Remove Git LFS from docs 2020-07-28 11:05:10 +02:00
Reuben Morais
e3c34b29d6 Small adjustments to avoid hardcoding filenames and avoid generic DeepSpeech name 2020-07-28 10:28:29 +02:00
Erik Ziegler
35d2908db9 Add support for microphone streaming in swift native client test project 2020-07-28 10:28:28 +02:00
Reuben Morais
2835151951 Remove external scorer file and documentation and flag references 2020-07-27 21:09:32 +02:00
lissyx
d98bf84b41
Merge pull request #3188 from lissyx/msys2-keyring
Fix #3187: update msys2 installer
2020-07-27 15:56:26 +02:00
Alexandre Lissy
e13804321f Fix #3187: update msys2 installer 2020-07-27 13:05:50 +02:00
lissyx
aa597b3f1f
Merge pull request #3185 from lissyx/doc-generate-scorer
Doc generate scorer
2020-07-27 10:27:54 +02:00
Alexandre Lissy
9e3c4209b9 Fix #3184: add missing label for data augmentation doc
X-DeepSpeech: NOBUILD
2020-07-27 10:27:10 +02:00
Alexandre Lissy
8629573587 Fix #3182: document rebuild of generate_scorer_package
X-DeepSpeech: NOBUILD
2020-07-27 10:23:46 +02:00
Tilman Kamp
15a624134a
Merge pull request #3179 from tilmankamp/fix-for-librosa
Fixes #3178 - Librosa requires 1-dimensional array for mono samples
2020-07-24 14:32:17 +02:00
Tilman Kamp
9e023660ef
Merge pull request #3177 from tilmankamp/reverse
Resolves #1565 - Limiting and reversing data-sets
2020-07-24 11:26:53 +02:00
Tilman Kamp
ecbdf46940 Fixes #3178 - Librosa requires 1-dimensional array for mono samples 2020-07-24 10:37:30 +02:00
Tilman Kamp
9a5d19d7c5 Resolves #1565 - Limiting and reversing data-sets 2020-07-24 10:30:34 +02:00
Reuben Morais
816c2d84ce
Merge pull request #3176 from mozilla/alphabet-binding-docs
Document Alphabet methods in decoder binding as well
2020-07-23 16:06:54 +02:00
Reuben Morais
2cdc228db4 Use Alphabet.CanEncode in text_to_char_array 2020-07-23 13:16:12 +02:00
Reuben Morais
eb33fc1719 Document Alphabet methods in Python binding as well 2020-07-23 13:00:10 +02:00
Reuben Morais
38f6afdba8
Merge pull request #3173 from mozilla/ios-framework-publish
Build and publish iOS framework in GitHub release files
2020-07-22 11:44:51 +02:00
Reuben Morais
844b375e7d Address review comments 2020-07-22 10:35:56 +02:00
Reuben Morais
509d06d474 Fix typo in ios-package.sh 2020-07-22 09:49:19 +02:00
Reuben Morais
47685f059f Disable code signing in CI builds 2020-07-22 00:19:07 +02:00
Reuben Morais
2fd1474e69 Fix deepspeech_ios_project reference after folder move 2020-07-22 00:19:07 +02:00
Reuben Morais
ce0ef4fd1e Build and publish deepspeech_ios.framework 2020-07-22 00:19:06 +02:00
lissyx
a24d7ab5b1
Merge pull request #3100 from carlfm01/uwp
Add UWP Nuget packing support
2020-07-21 21:26:44 +02:00
Reuben Morais
9bdce0a305 Move deepspeech_ios_test projects to same level as deepspeech_ios 2020-07-21 19:09:14 +02:00
Carlos Fonseca M
48fb43c3eb Add UWP Nuget packing support 2020-07-21 10:38:37 -06:00
Tilman Kamp
b18a3a4ef5
Merge pull request #3147 from tilmankamp/data_set_tool
Resolves #3146 - Let build_sdb.py also output CSV files and rename it accordingly
2020-07-21 18:35:50 +02:00
Tilman Kamp
a982a61d83 Resolves #3146 - Let build_sdb.py also output CSV files and rename it accordingly 2020-07-21 17:02:01 +02:00
Reuben Morais
ffcec7f9aa
Merge pull request #3150 from mozilla/ios-build
iOS support
2020-07-20 14:10:52 +02:00
Reuben Morais
2672878618 Add docs to Swift bindings and missing methods 2020-07-20 11:52:35 +02:00
Reuben Morais
d9dac13343 Clean up tf_tc-build.sh 2020-07-20 11:17:05 +02:00
Reuben Morais
5e5db17371 Address review comments 2020-07-20 11:16:57 +02:00
Reuben Morais
de7a249fcd Fix linker issues during tests with new workers 2020-07-20 00:22:46 +02:00
Reuben Morais
be43b3fdc1 Bump caches for artifacts rebuilt on new worker 2020-07-20 00:22:46 +02:00
Reuben Morais
f0f4b0ddc1 Remove even more bazel flags 2020-07-20 00:22:46 +02:00
Reuben Morais
aa8e9b0647 Use correct build flags for ARM64 vs x86_64 2020-07-20 00:22:45 +02:00
Reuben Morais
2f568e7785 Don't use BAZEL_OPT_FLAGS in iOS builds 2020-07-20 00:22:45 +02:00
Reuben Morais
6c38d56968 Use submodule TF tc-vars.sh 2020-07-20 00:22:45 +02:00
Reuben Morais
e8d642bf44 Bump TensorFlow to remove usage of -z linker keyword on iOS 2020-07-20 00:22:45 +02:00
Reuben Morais
f7c50663e1 Checkout fixed formulas commit in tf_tc-brew.sh 2020-07-20 00:22:45 +02:00
Reuben Morais
a274c26a89 Add Swift wrapper framework 2020-07-20 00:22:45 +02:00
Reuben Morais
a1aa873259 Embed bitcode when linking 2020-07-20 00:22:44 +02:00
Reuben Morais
4ca0f94d78 client.cc iOS build 2020-07-20 00:22:44 +02:00
Reuben Morais
c85f95f781 Add DeepSpeech iOS tasks 2020-07-20 00:22:44 +02:00
Reuben Morais
3ce07afae0 Add TensorFlow iOS tasks 2020-07-20 00:22:42 +02:00
Reuben Morais
972f3031fe Merge branch 'new-workers' (Fixes #3168) 2020-07-19 22:36:48 +02:00
Reuben Morais
0e8a28de57 Bump caches and fix linker issues in new workers 2020-07-19 22:36:36 +02:00
lissyx
e1626c667e
Merge pull request #3149 from karansag/patch-1
Update TRAINING.rst
2020-07-17 10:20:21 +02:00
Karan Sagar
36a2f3b38d
Update TRAINING.rst
Update wording on relative / absolute paths.
2020-07-16 12:38:59 -04:00
lissyx
78ae08cdb4
Merge pull request #3161 from lissyx/bump-v0.9.0-alpha.3
Bump VERSION to 0.9.0-alpha.3
2020-07-15 16:22:46 +02:00
Alexandre Lissy
554cfae020 Bump VERSION to 0.9.0-alpha.3 2020-07-15 16:22:06 +02:00
lissyx
3fd8049bfd
Merge pull request #3160 from lissyx/circleci
Fix #3157: Add CircleCI config
2020-07-15 16:19:21 +02:00
Alexandre Lissy
c31c4843b3 Fix #3157: Add CircleCI config 2020-07-15 16:17:51 +02:00
lissyx
75804924f2
Merge pull request #3159 from tirkarthi/fix-xml
Use ElementTree instead of deprecated cElementTree.
2020-07-15 15:00:18 +02:00
Karthikeyan Singaravelan
0f27c802d9 Use ElementTree instead of deprecated cElementTree. 2020-07-15 12:46:00 +00:00
lissyx
bb7a0457a3
Merge pull request #3151 from pbxqdown/master
Fix several typos in docs.
2020-07-13 12:13:30 +02:00
Qian Xiao
37dc3e08a4 Fix several typos in docs. 2020-07-11 16:23:50 -07:00
Karan Sagar
058f53af3a
Update TRAINING.rst
I'm new to DeepSpeech, but I noticed when following the training instructions that the filenames _appear_ to be relative paths in the CSV. Let me know if I'm misinterpreting.

Thanks!
2020-07-10 15:24:45 -04:00
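For context, the training CSVs referenced here have wav_filename, wav_filesize and transcript columns; a minimal example using absolute paths (values are placeholders):

```
wav_filename,wav_filesize,transcript
/data/clips/sample_0001.wav,163214,the quick brown fox
/data/clips/sample_0002.wav,201874,jumped over the lazy dog
```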
Tilman Kamp
84f4c15278
Merge pull request #3145 from tilmankamp/build_sdb_aug
Resolves #3144 - Add augmentation support to build_sdb.py
2020-07-09 14:35:05 +02:00
Tilman Kamp
61bd5dd88d Resolves #3144 - Add augmentation support to build_sdb.py 2020-07-09 11:55:26 +02:00
lissyx
825923a652
Merge pull request #3142 from lissyx/electronjs-v9.1
Fix #3141: Add ElectronJS v9.1
2020-07-08 18:06:56 +02:00
Alexandre Lissy
9f953d12ba Fix nasty regression on some build/cache tasks 2020-07-08 17:09:25 +02:00
Alexandre Lissy
b832acb54b Fix #3141: Add ElectronJS v9.1 2020-07-08 17:00:33 +02:00
Reuben Morais
48cd53e474 Merge branch 'reference-training-decoder-docs' (Fixes #3140) 2020-07-08 13:18:29 +02:00
Reuben Morais
5fd4a0451f Address review comments 2020-07-08 13:18:18 +02:00
lissyx
fe7fdb95f6
Merge pull request #3139 from lissyx/electron-builder-win
Fix #3127: Adjust PATH for electronjs/windows with electron-builder
2020-07-08 01:13:10 +02:00
Alexandre Lissy
48f904ac27 Fix #3127: Adjust PATH for electronjs/windows with electron-builder 2020-07-07 22:09:32 +02:00
Reuben Morais
672ce377ac Only update examples submodule from remote 2020-07-07 20:02:31 +02:00
Reuben Morais
daf28086e5 Add note on model input data considerations and reference training/scorer docs 2020-07-07 20:02:31 +02:00
Tilman Kamp
6882248ab0
Merge pull request #3137 from tilmankamp/fix_missing_alphabet
Fix: #3130 - Missing deepspeech_training.util.text.Alphabet
2020-07-07 17:50:07 +02:00
Reuben Morais
d412b86b0d
Merge pull request #3135 from mozilla/java-inconsistencies
Fix some style inconsistencies in Java bindings (Fixes #3121)
2020-07-07 17:13:05 +02:00
Tilman Kamp
084da3724d Fix: #3130 - Missing deepspeech_training.util.text.Alphabet 2020-07-07 17:02:24 +02:00
Reuben Morais
2471b10c27 Update Java tests 2020-07-07 15:24:15 +02:00
Reuben Morais
16f89dff9e Update Java docs 2020-07-07 15:22:36 +02:00
Reuben Morais
417b8e4fe3 Fix style inconsistencies in Java bindings 2020-07-07 15:18:23 +02:00
Reuben Morais
18ea7391f3 Bump VERSION to 0.9.0-alpha.2 2020-07-07 10:53:28 +02:00
Reuben Morais
c64e416f61
Merge pull request #3131 from mozilla/alphabet-fallible
Add methods to check for label presence in Alphabet
2020-07-07 10:51:33 +02:00
Reuben Morais
c6dc7ba8c0 Add methods to check for label presence in Alphabet 2020-07-06 19:34:03 +02:00
Reuben Morais
7f2964e6ab Bump VERSION to 0.9.0-alpha.1 2020-07-06 11:30:35 +02:00
Reuben Morais
66d1f167fc
Merge pull request #3125 from mozilla/utf8-regressions
Fix some regressions from Alphabet refactoring (Fixes #3123)
2020-07-04 13:21:09 +02:00
Reuben Morais
03ed4a45f8 Don't add empty lines to Alphabet when parsing 2020-07-04 11:28:57 +02:00
Reuben Morais
30de5153bc Fix regressions in bytes output mode 2020-07-04 11:27:33 +02:00
Reuben Morais
1964e80efe
Merge pull request #3124 from DanBmh/patch-1
Update TRAINING.rst
2020-07-04 11:26:59 +02:00
DanBmh
91697ace49
Update TRAINING.rst
Related to #3123
2020-07-03 17:09:08 +02:00
lissyx
c2dfc7118a
Merge pull request #3117 from lissyx/docker-save
Name, tag and save docker image
2020-07-03 12:27:39 +02:00
Alexandre Lissy
436561b0e4 Support for Docker Hub automated builds 2020-07-03 11:56:52 +02:00
lissyx
556ea0c16f
Merge pull request #3103 from lissyx/tflite-delegated-r2.2
Enable TFLite delegations
2020-07-03 09:41:17 +02:00
Alexandre Lissy
67004ca137 Enable TFLite delegations 2020-07-03 01:34:54 +02:00
Reuben Morais
d2d46c3aee
Merge pull request #3113 from mozilla/generate-package-cpp
Rewrite generate_package.py in C++ to avoid training dependencies
2020-07-02 23:13:25 +02:00
Reuben Morais
65915c7f57 Address review comments 2020-07-02 14:09:42 +02:00
Reuben Morais
24526aa82d
Merge pull request #3118 from mozilla/model-type-docs
Add more doc text around distinction between various pre-trained model files (Fixes #2941)
2020-06-30 22:16:25 +02:00
Reuben Morais
5c41b8966e Fix broken link to C API docs
X-DeepSpeech: NOBUILD
2020-06-30 20:33:00 +02:00
Reuben Morais
d0bd1e5c8e Add more doc text around distinction between various pre-trained model files 2020-06-30 20:01:44 +02:00
Reuben Morais
8f6106b35d Update docs to refer to new generate_scorer_package 2020-06-30 16:47:41 +02:00
Tilman Kamp
3762a9b588
Merge pull request #3091 from tilmankamp/warp
Warp augmentation
2020-06-30 14:56:51 +02:00
Tilman Kamp
a48ebdfde8 Reverb augmentation: Workaround for import problem in scikit-learn dependency of librosa 2020-06-30 14:13:12 +02:00
Reuben Morais
2504360e95 Handle universal newlines in Alphabet file parsing 2020-06-30 09:52:45 +02:00
Reuben Morais
5039fb51d5 Package generate_scorer_package on Android 2020-06-30 09:52:45 +02:00
Reuben Morais
6618148e9b Update tensorflow with Boost rules 2020-06-30 09:52:45 +02:00
Reuben Morais
4a589dd897 Build/package/publish generate_scorer_package in CI 2020-06-30 09:52:45 +02:00
Reuben Morais
a84abf813c Deduplicate Alphabet implementations, use C++ one everywhere 2020-06-30 09:52:45 +02:00
Reuben Morais
f82c77392d Rewrite data/lm/generate_package.py into native_client/generate_scorer_package.cpp 2020-06-30 09:52:44 +02:00
Reuben Morais
03ca94887c Move DS_ErrorCodeToErrorMessage impl to its own object so it can be used without including all of libdeepspeech 2020-06-30 09:52:44 +02:00
lissyx
39696f0d67
Merge pull request #3116 from lissyx/fix-git-clone
Set git remote origin before fetching
2020-06-30 09:49:19 +02:00
Alexandre Lissy
f365576517 Set git remote origin before fetching 2020-06-30 00:53:12 +02:00
lissyx
837902ff10
Hotfix TensorFlow repo reference 2020-06-29 22:37:37 +02:00
lissyx
f340f96323
Merge pull request #3107 from lissyx/git-submodule
Use TensorFlow as a submodule
2020-06-29 22:11:28 +02:00
Alexandre Lissy
80ee63fac6 Use TensorFlow as a submodule 2020-06-29 17:03:07 +02:00
Tilman Kamp
eebf12134e Warp augmentation 2020-06-29 16:22:31 +02:00
lissyx
db717d7f73
Merge pull request #3108 from DanBmh/update_docker
Build kenlm in training container image.
2020-06-26 16:17:44 +02:00
Daniel
91b3db33c3 Build kenlm in training container image. 2020-06-26 15:07:18 +02:00
lissyx
be006da2d2
Merge pull request #3098 from lissyx/bump-0.9
Bump VERSION to 0.9.0-alpha.0
2020-06-24 00:09:47 +02:00
Alexandre Lissy
6d1f0c73ef Bump VERSION to 0.9.0-alpha.0 2020-06-23 19:30:09 +02:00
lissyx
7cc6ea959f
Merge pull request #3094 from lissyx/update-msys2
Update msys2 to 2020-06-20 release
2020-06-23 17:59:38 +02:00
Alexandre Lissy
da471ecbab Fix #3095: Update msys2 to 2020-06-20 release 2020-06-23 16:39:05 +02:00
lissyx
3a8bb2e066
Merge pull request #2952 from lissyx/r2.2
Use TensorFlow r2.2 in native client
2020-06-23 14:16:36 +02:00
Alexandre Lissy
eca69d1c84 Trusty -> Xenial 2020-06-22 10:48:12 +02:00
Alexandre Lissy
f169e8f921 Linux cleanup 2020-06-22 10:48:12 +02:00
Alexandre Lissy
4a174f6adc Remove libssl 1.0.2 hack 2020-06-22 10:48:12 +02:00
Alexandre Lissy
bc086ec998 Build DeepSpeech using TensorFlow r2.2 2020-06-22 10:48:12 +02:00
Alexandre Lissy
41d7b4e6f0 Use TensorFlow r2.2 artifacts 2020-06-22 10:48:12 +02:00
Tilman Kamp
6f2ba4b5b4
Merge pull request #3090 from tilmankamp/augext
Fix #3089 - Recreate overlay queue on augmentation restart
2020-06-19 12:22:34 +02:00
Tilman Kamp
da96d14eaa Fix #3089 - Recreate overlay queue on augmentation restart 2020-06-19 11:10:09 +02:00
lissyx
9c1dbd43a0
Merge pull request #3087 from DanBmh/fix_docker
Install checkpoint converting tool in container.
2020-06-18 18:11:55 +02:00
Daniel
e17619bec8 Make paths relative. 2020-06-18 17:26:38 +02:00
Reuben Morais
fcd9563fcd
Merge pull request #3085 from mozilla/new-version-074
Bump VERSION to 0.7.4
2020-06-18 16:56:15 +02:00
Tilman Kamp
4c6245d155
Merge pull request #3055 from tilmankamp/augext
Refactoring of TF based augmentations
2020-06-18 16:13:46 +02:00
Reuben Morais
5edc1cf503 Bump VERSION to 0.7.4 2020-06-18 15:21:31 +02:00
Reuben Morais
bc31eb4b9e Fix usage of ARG instead of ENV in Dockerfile.train 2020-06-18 15:21:31 +02:00
Daniel
eda5f69f2d Install checkpoint converting tool. 2020-06-18 15:20:28 +02:00
Reuben Morais
188a6f2c1e
Merge pull request #3080 from mozilla/install-instructions
Remove --force-reinstall from training code install
2020-06-18 14:38:08 +02:00
lissyx
12a24b8e98
Merge pull request #3083 from DanBmh/fix_docker
Add dependencies for new audio augmentation flags. Fixes #3082.
2020-06-18 13:18:56 +02:00
Daniel
3f8033e1f1 Add dependencies for new audio augmentation flags. Fixes #3082. 2020-06-18 12:24:56 +02:00
Reuben Morais
6ccbbede09 Remove --force-reinstall from training code install
No longer needed since we started publishing ds_ctcdecoder on PyPI.
2020-06-17 15:27:52 +02:00
lissyx
b7fa0ade33
Merge pull request #3072 from lissyx/docker-train
Fix #3071: Don't reinstall TensorFlow on top of TensorFlow
2020-06-17 12:50:25 +02:00
lissyx
07c8daef43
Update setup.py
Co-authored-by: Reuben Morais <reuben.morais@gmail.com>
2020-06-17 12:50:17 +02:00
lissyx
7fc1ac8fe1
Merge pull request #3069 from lissyx/generic-tc-caching
Fix #3068: More generic TaskCluster build/caching tasks
2020-06-17 12:49:15 +02:00
Alexandre Lissy
f4f4903b2b Fix #3071: Don't reinstall TensorFlow on top of TensorFlow 2020-06-17 11:34:11 +02:00
lissyx
2514b67933
Merge pull request #3066 from lissyx/output-stream-error
Fix #3053: Check output stream when producing scorer
2020-06-17 10:09:15 +02:00
Kelly Davis
e6135bbbfa
Merge pull request #3076 from eagledot/master
Added third-party bindings for NIM-lang.
2020-06-17 08:12:20 +02:00
Anubhav
4e3b4bb3a6
Added third-party bindings for NIM-lang. 2020-06-17 10:07:57 +05:30
Alexandre Lissy
a47c9a2b8c Request android default instead of google_apis
It seems some armv7a image disappeared
2020-06-17 02:06:25 +02:00
Alexandre Lissy
0fd28cfbdf Updating caches 2020-06-17 02:06:25 +02:00
Alexandre Lissy
b52139ceb6 Fix #3075: Add Android 11 to CI 2020-06-17 02:06:25 +02:00
Alexandre Lissy
4f7842c966 Fix #3068: More generic TaskCluster build/caching tasks 2020-06-17 02:06:25 +02:00
lissyx
c1353892b4
Merge pull request #3074 from lissyx/fix-ssl
Fix #3073: Update libssl version
2020-06-17 00:12:35 +02:00
Alexandre Lissy
7768c89e2a Fix #3073: Update libssl version 2020-06-16 23:38:12 +02:00
Alexandre Lissy
6c2cbbd725 Fix #3053: Check output stream when producing scorer 2020-06-16 23:28:01 +02:00
Tilman Kamp
5dd08d2f8e Deactivated scorer in graph augmentation test 2020-06-16 16:57:09 +02:00
Reuben Morais
d538d80ddb
Merge pull request #3067 from DanBmh/update_ignore
Ignore generated dockerfiles
2020-06-16 16:28:53 +02:00
Daniel
e10b807e92 Ignore generated dockerfiles. 2020-06-16 16:24:55 +02:00
Tilman Kamp
a28df45192 Respect None case for augmentations list 2020-06-16 15:46:28 +02:00
Tilman Kamp
7a835bee5a Updated training tests 2020-06-16 13:51:07 +02:00
Tilman Kamp
2d5dcc359a Tests for TF based value range picking 2020-06-16 13:32:32 +02:00
lissyx
c839ab5355
Merge pull request #3065 from lissyx/supported-platforms
Fix #2942: Document supported platforms
2020-06-16 12:31:46 +02:00
Alexandre Lissy
aeb4c5b105 Fix #2942: Document supported platforms 2020-06-16 12:30:50 +02:00
Tilman Kamp
5b6de213d8 Follow-up on PR comments; removed warp augmentation; split pitch_and_tempo augmentation 2020-06-16 11:07:57 +02:00
Tilman Kamp
ea21c7d24e
Apply suggestions from code review
Co-authored-by: Reuben Morais <reuben.morais@gmail.com>
2020-06-16 10:22:45 +02:00
Tilman Kamp
0bec67d74c
Update bin/play.py
Co-authored-by: Reuben Morais <reuben.morais@gmail.com>
2020-06-16 10:10:01 +02:00
lissyx
ff83f1b8f4
Merge pull request #3060 from lissyx/docker-decouple
Decouple Dockerfile into build and train
2020-06-15 14:30:13 +02:00
Alexandre Lissy
4d541394e8 Decouple Dockerfile into build and train 2020-06-15 14:13:22 +02:00
lissyx
cbb9c28e2c
Merge pull request #3062 from ObliviousParadigm/patch-1
DOC: Fixed grammatical mistake.
2020-06-14 13:31:48 +02:00
ObliviousParadigm
c294d80a93
DOC: Fixed grammatical mistake. 2020-06-14 16:00:32 +05:30
Tilman Kamp
d94db7ca43 Refactoring of TF based augmentations 2020-06-10 13:42:45 +02:00
lissyx
e99b938ebf
Merge pull request #3054 from lissyx/import-time
Report imported vs total audio time
2020-06-10 13:35:01 +02:00
Alexandre Lissy
cfc79799ec Report imported vs total audio time 2020-06-10 13:12:15 +02:00
Reuben Morais
bfaa68945a
Merge pull request #3051 from mozilla/add-metrics-tracking
Add read-only metrics tracking
2020-06-09 13:14:50 +02:00
Reuben Morais
ecd79531c8 Add training test with --metrics_files 2020-06-08 18:06:21 +02:00
Reuben Morais
07d2c39138 Split SDB tests from basic training tests to speed up CI dependents 2020-06-08 18:06:21 +02:00
Reuben Morais
e069b6d61f Add read only validation metrics
For now this is just CTC loss like a validation set, but without affecting
best validation checkpoint tracking logic. Eventually this could compute WER
on a smaller set, for example.
2020-06-08 15:26:37 +02:00
Reuben Morais
572963e7bd
Merge pull request #3043 from mozilla/version-not-symlink
Move VERSION and GRAPH_VERSION to training directory
2020-06-08 14:34:28 +02:00
Reuben Morais
fdf5700d37
Merge pull request #3045 from ricky-cck/master
Fix csv writer parameter
2020-06-08 14:01:15 +02:00
Reuben Morais
b3ae9701b1
Merge pull request #3046 from mozilla/setup-decoder-pypi
Use decoder package from PyPI (Fixes #3044)
2020-06-08 14:00:17 +02:00
Reuben Morais
ba7b0f7436 Merge branch 'alphabet-leak' (Fixes #3049) 2020-06-08 13:59:01 +02:00
Reuben Morais
daba4278ff Add explanation of SWIG ignore side effects 2020-06-08 13:58:32 +02:00
Reuben Morais
7a739c9b98
Merge pull request #3047 from mozilla/dev-test-no-rounding
Only use drop_remainder in dataset for train phase
2020-06-08 13:05:03 +02:00
Reuben Morais
06408b8ddd Flip direction of VERSION and GRAPH_VERSION links 2020-06-08 11:22:24 +02:00
Reuben Morais
53192b68b8 Be more specific in %ignoring symbols since it applies to all imports 2020-06-08 11:20:22 +02:00
Reuben Morais
28c7f4c35d Only use drop_remainder in dataset for train phase 2020-06-08 10:47:13 +02:00
Reuben Morais
209056ceb5 Test PyPI decoder package after upload 2020-06-08 10:27:14 +02:00
Reuben Morais
1a808b216e Download decoder wheel from PyPI 2020-06-05 16:28:07 +02:00
Reuben Morais
544aa364fc Publish decoder wheel to PyPI 2020-06-05 16:26:56 +02:00
RickyChan
a252ae01a0 Fix csv DictWriter parameter 2020-06-05 22:27:19 +09:00
RickyChan
3c83f9f24a Fix csv writer parameter [https://docs.python.org/3/library/csv.html#csv.writer] 2020-06-05 19:00:52 +08:00
lissyx
80a3d70686
Merge pull request #3042 from lissyx/tflite-upload
Add missing TFLite binaries
2020-06-05 11:10:16 +02:00
Alexandre Lissy
c074bb2f6d Add missing TFLite binaries 2020-06-05 11:09:17 +02:00
lissyx
11f347b4da
Merge pull request #3037 from lissyx/rtd-taskcluster-united
Make TaskCluster build the docs like RTD
2020-06-04 15:22:07 +02:00
Alexandre Lissy
23139b2430 Make TaskCluster build the docs like RTD 2020-06-04 14:27:16 +02:00
Tilman Kamp
c5ca78a4ed
Merge pull request #3030 from marekjg/master
Fix argument order in pcm_to_np call
2020-06-04 14:08:29 +02:00
lissyx
88584941bc
Merge pull request #3036 from lissyx/doc-fix
Install npm deps for ReadTheDocs
2020-06-04 11:25:58 +02:00
Alexandre Lissy
cd571ff4be Bump VERSION to 0.7.3 2020-06-04 09:48:10 +02:00
Alexandre Lissy
a3b0eb6589 Install npm deps for RTD 2020-06-04 09:48:10 +02:00
lissyx
61696afedc
Merge pull request #3034 from lissyx/bump-v0.7.2
Bump VERSION to 0.7.2
2020-06-04 09:04:36 +02:00
Alexandre Lissy
e3ae74f80a Bump VERSION to 0.7.2 2020-06-04 09:03:50 +02:00
lissyx
222a25f979
Merge pull request #3029 from lissyx/fix-nodejs-win-tests
Fix wrong nodejs version for Windows tests
2020-06-04 07:59:13 +02:00
Marek Grzegorek
8a87d1d100
Fix argument order in pcm_to_np call
pcm_to_np takes the segment buffer as its first argument and the audio format as its second. Passing them in the wrong order causes a "bytes object has no attribute 'channels'" error.
2020-06-03 18:05:52 +02:00
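A small, self-contained stand-in (not the project's actual util.audio code) illustrating the argument order described in the commit message — buffer first, format second:

```
# Stand-in types and function for illustration only.
from collections import namedtuple
import numpy as np

AudioFormat = namedtuple("AudioFormat", "rate channels width")


def pcm_to_np(pcm_buffer, audio_format):
    # Buffer first, format second; swapping them puts a bytes object here
    # and the .channels access below fails.
    channels = audio_format.channels
    samples = np.frombuffer(pcm_buffer, dtype=np.int16)
    return samples.reshape(-1, channels)


fmt = AudioFormat(rate=16000, channels=1, width=2)
buf = np.zeros(160, dtype=np.int16).tobytes()

pcm_to_np(buf, fmt)    # correct order
# pcm_to_np(fmt, buf)  # wrong order -> "bytes object has no attribute 'channels'"
```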
Alexandre Lissy
3ba3e10ecd Fix wrong nodejs version for Windows tests 2020-06-03 17:06:25 +02:00
lissyx
64fd79f9c1
Merge pull request #3027 from lissyx/node-v14-electron-v9
Node v14 electron v9
2020-06-03 15:56:35 +02:00
Alexandre Lissy
75a320e87b Enable ElectronJS / TFLite / Windows tests 2020-06-03 13:45:36 +02:00
Alexandre Lissy
f5d4f7f9d6 Fix typo in description for NodeJS / ARMbian 2020-06-03 11:52:32 +02:00
Alexandre Lissy
d24fb70869 Update node-gyp cache 2020-06-03 11:50:37 +02:00
Alexandre Lissy
aa4c746899 Maximize binary compatibility 2020-06-03 11:50:37 +02:00
Alexandre Lissy
cdeb933c0b Add ElectronJS v9.0 2020-06-03 11:50:37 +02:00
Alexandre Lissy
dc8dbbd398 Add NodeJS v14 2020-06-03 11:50:37 +02:00
Alexandre Lissy
f925dd9fc8 Fix Homebrew checks 2020-06-03 11:50:37 +02:00
Alexandre Lissy
00577873ce Update Homebrew 2020-06-03 11:50:37 +02:00
Reuben Morais
60397964e1 Add some native_client build outputs to .gitignore
X-DeepSpeech: NOBUILD
2020-06-02 12:29:52 +02:00
Reuben Morais
b327fa3c73
Merge pull request #3025 from reuben/pr3024
PR #3024
2020-05-30 18:27:21 +02:00
Reuben Morais
ab2ba41c7f Convert path to str to fix Python 3.5 compat 2020-05-30 15:16:30 +02:00
Shubham Kumar
84d2f2a5f1
update description of test-training-unittests*yml 2020-05-29 20:54:27 +05:30
Shubham Kumar
73c4f3a201
update tc-train-unittest.sh 2020-05-29 20:50:36 +05:30
Shubham Kumar
ccca1c1fed
add tests to TC and update travis 2020-05-29 20:28:52 +05:30
Shubham Kumar
0b78f4ff01
move unittest to TC 2020-05-29 17:06:08 +05:30
Reuben Morais
b9f9b3cedd Merge branch 'import_cv2_multiprocessing' (Fixes #3008) 2020-05-28 00:00:43 +02:00
Reuben Morais
3d0ec01853 Fix typo from argument reordering 2020-05-27 19:02:55 +02:00
Reuben Morais
af4bc31c27
Merge pull request #3021 from mozilla/confidence-raw-scores
Return raw scores in confidence value (Fixes #3004)
2020-05-27 18:40:02 +02:00
Reuben Morais
d548222518 Return raw scores in confidence value 2020-05-27 16:48:57 +02:00
Reuben Morais
65bbc9ae34
Explicitly mention Bazel 0.24.1 since the TensorFlow documentation page skips 1.15 2020-05-27 09:24:01 +02:00
Reuben Morais
99c34df368
Update dangling reference to removed scorer scripts docs
See #3016
2020-05-26 17:35:08 +02:00
Reuben Morais
aef9ed9792 Add missing artifacts_deps entries to RTD task 2020-05-26 15:12:29 +02:00
Reuben Morais
28cefa0c78 Fix .yaml -> .yml extension in RTD update task 2020-05-26 15:05:28 +02:00
Reuben Morais
05f246e48c Make sure Travis can handle PR w/ non-master base
X-DeepSpeech: NOBUILD
2020-05-26 12:55:44 +02:00
Reuben Morais
b12c7be710
Merge pull request #3012 from reuben/index-ts
Switch index.js to TypeScript
2020-05-25 19:57:02 +02:00
Reuben Morais
45d8f7cd61 Explicitly pass filter context to multiprocessing function 2020-05-25 18:00:08 +02:00
Reuben Morais
a462d951cf Remove unneeded npm dependencies for doc build 2020-05-25 17:09:50 +02:00
Reuben Morais
31ba2898a8
Merge pull request #3009 from mozilla/lm-docs-rtd
Add data/lm doc to RTD, and some general doc improvements and fixes
2020-05-25 16:18:57 +02:00
Reuben Morais
facdff8c70 Switch JavaScript index.js to TypeScript 2020-05-25 13:08:02 +02:00
Reuben Morais
0ed7d301e3
Merge pull request #3011 from mozilla/pr3010
PR #3010 - Fix Stream.intermediateDecodeWithMetadata + tests
2020-05-25 12:01:29 +02:00
Reuben Morais
fdd3b319a5 Exercise intermediateDecode and intermediateDecodeWithMetadata in streaming tests 2020-05-25 11:01:40 +02:00
Reuben Morais
1f30cf2717 Use Buffer type in TS definitions that take a Buffer 2020-05-25 11:01:15 +02:00
Reuben Morais
83320c1a10 Remove bogus Stream parameter in Stream.intermediateDecode TS definition 2020-05-25 11:00:58 +02:00
Greg Richardson
b8f9d036c2 Fix JS IntermediateDecodeWithMetadata binding 2020-05-24 21:38:56 -06:00
Reuben Morais
4356a2764b Add data/lm doc to RTD, and some general doc improvements and fixes 2020-05-24 15:35:10 +02:00
Reuben Morais
50b0b8c010 Propagate error code from load_lm in decoder binding 2020-05-20 12:05:35 +02:00
Reuben Morais
420b2c5673
Merge pull request #3002 from mozilla/update-msys2
Update MSYS2 base archive and work around startup problem
2020-05-19 22:21:55 +02:00
Reuben Morais
90ce0921bb Adjust TC_MSYS_VERSION to match uname in new MSYS2 base 2020-05-19 19:17:36 +02:00
Alexandre Lissy
b6b2ec6d64 Update MSYS2 base archive and work around startup problem 2020-05-19 18:05:52 +02:00
Reuben Morais
4ce7da717b
Merge pull request #3001 from mozilla/setup-py-decoder-windows
Windows support in setup.py decoder wheel installation (Fixes #2992)
2020-05-19 16:10:35 +02:00
Reuben Morais
ac2b63c0bf Windows support in setup.py decoder wheel installation 2020-05-19 14:50:26 +02:00
Reuben Morais
430132c5a5
Merge pull request #2998 from mozilla/scorer-error
Improve error handling around Scorer (Fixes #2995 and #2996)
2020-05-19 13:55:54 +02:00
Tilman Kamp
1c2d89723f
Merge pull request #2897 from tilmankamp/live_augmentation
Live augmentation
2020-05-19 09:05:47 +02:00
Reuben Morais
836707d3ab Disable pacman update to workaround zstd package issuee 2020-05-19 00:10:13 +02:00
Reuben Morais
4cfe5e535d
Merge pull request #2999 from mozilla/readthedocs-automation
Add task to trigger ReadTheDocs builds & version update
2020-05-18 21:05:58 +02:00
Reuben Morais
a590e3726b Add link to RTD, actually exit on error 2020-05-18 19:04:33 +02:00
Reuben Morais
1276c47d8f Trigger same error message for all input formats 2020-05-18 18:57:23 +02:00
Reuben Morais
e8647aa5fa Add missing import in generate_package.py 2020-05-18 18:52:30 +02:00
Reuben Morais
ce00feffaa Add task to trigger ReadTheDocs builds & version update
X-DeepSpeech: NOBUILD
2020-05-18 18:46:08 +02:00
Reuben Morais
bfd90f1f9b Include error descriptions in documentation page 2020-05-18 17:41:53 +02:00
Reuben Morais
361e216297 Only ignore (expected) missing trie error in generate_package.py 2020-05-18 17:30:49 +02:00
Reuben Morais
4db20b3cd6 Expose DS error codes in ds_ctcdecoder package 2020-05-18 17:30:48 +02:00
Reuben Morais
afb8c55b6e Basic coverage of DS_ErrorCodeToErrorMessage 2020-05-18 17:30:48 +02:00
Reuben Morais
d3d5398d6b Move error definition and description together 2020-05-18 17:30:48 +02:00
Tilman Kamp
ac9a17d8a7 Moved signal augmentation tests to own test config 2020-05-18 15:34:09 +02:00
Tilman Kamp
96caa2d115 Follow up on PR comments 2020-05-18 14:23:02 +02:00
Tilman Kamp
7b08e595a7 Value range unit tests 2020-05-18 12:39:15 +02:00
Reuben Morais
7ed900e333 Add better error information in Scorer initialization 2020-05-18 11:19:19 +02:00
Kelly Davis
3b4c39f80d
Merge pull request #2994 from Jendker/patch-1
Bug fix - test_csvs argument was ignored
2020-05-15 19:33:48 +02:00
Jędrzej Beniamin Orbik
0849261b38
Bug fix - test_csvs argument was not used
Removed spurious overwriting of argument 'test_csvs' in evaluate.py
This bug led to problems in lm_optimizer.py
2020-05-15 18:59:56 +02:00
Tilman Kamp
a5303ccca6 Renamed prepare_samples to augment_samples 2020-05-14 16:50:18 +02:00
Tilman Kamp
64e14886b8 Apply suggestions from code review
Co-authored-by: Reuben Morais <reuben.morais@gmail.com>
2020-05-14 15:04:52 +02:00
Reuben Morais
2e9c281d06
Merge pull request #2990 from mozilla/release-071
Bump VERSION to 0.7.1
2020-05-12 17:29:44 +02:00
Reuben Morais
d1b4ea8538 Bump VERSION to 0.7.1 2020-05-12 16:06:01 +02:00
Reuben Morais
e23390eb89
Merge pull request #2982 from mozilla/node-tflite-stream-tests
Run streaming tests in Node TFLite tasks too
2020-05-12 16:01:26 +02:00
Tilman Kamp
c5ceee26dd Live audio augmentation 2020-05-12 10:18:21 +02:00
Tilman Kamp
927859728f Named tuple AudioFormat, parameter re-ordering in util.audio and NP to PCM conversion support 2020-05-12 10:09:15 +02:00
Tilman Kamp
f8acf5cba7
Merge pull request #2989 from JRMeyer/pip-install-fix
remove bad reference to requirements.txt
2020-05-12 08:41:28 +02:00
josh meyer
de710ab3d6 remove bad reference to requirements.txt 2020-05-11 18:27:49 -07:00
Tilman Kamp
ef81bb9512
Merge pull request #2984 from tilmankamp/cv-train-all
CV2 importer: train-all.csv
2020-05-07 16:54:44 +02:00
Tilman Kamp
3871cdc67f CV2 importer: Writes additional train-all.csv with all validated samples except speakers and/or transcripts already in dev or test 2020-05-07 15:09:42 +02:00
Reuben Morais
33cba89227
Merge pull request #2983 from mozilla/alpha-071a2
Bump VERSION to 0.7.1-alpha.2
2020-05-07 14:11:25 +02:00
Reuben Morais
e188e255b0 Bump VERSION to 0.7.1-alpha.2 2020-05-07 13:12:43 +02:00
Reuben Morais
7394b13141 Handle missing xxd gracefully 2020-05-06 13:58:06 +02:00
Reuben Morais
3b53f92c0a Run streaming tests in Node TFLite tasks too 2020-05-06 13:58:06 +02:00
Reuben Morais
9a7ec1ae0d
Merge pull request #2981 from mozilla/pr2980-tests
PR #2980 + tests (Fixes #2979)
2020-05-06 13:26:58 +02:00
Reuben Morais
a02eddec38 Add JS test for streaming + metadata 2020-05-06 12:04:17 +02:00
Reuben Morais
b0e0972b78 Fix reference to Stream.finishStreamWithMetadata 2020-05-06 12:03:50 +02:00
Reuben Morais
e6e3cc539b Mark aNumResults parameters in *withMetadata methods as optional 2020-05-06 12:03:14 +02:00
Matt McCartney
41d990e538 fix(js): declare FinishStreamWithMetadata result object
- as demanded by strict mode

close #2979
2020-05-05 14:32:07 -07:00
Reuben Morais
553c8ec6f1
Merge pull request #2976 from lwalejko/fix_beam_width_setting_in_python_client
Fix beam width setting in python client
2020-05-05 17:06:57 +02:00
Łukasz Wałejko
d8c3b71033 fix beam width setting in python client 2020-05-05 14:54:36 +02:00
Reuben Morais
409d19346a Merge branch 'issue2968' (Fixes #2968) 2020-05-04 14:09:16 +02:00
Reuben Morais
def8214454 Remove dts-gen docs from JavaScript package README
X-DeepSpeech: NOBUILD
2020-05-04 14:08:52 +02:00
Reuben Morais
f848bf4940
Merge pull request #2971 from mozilla/bump-0.7.1a1
Bump VERSION to 0.7.1-alpha.1
2020-05-04 13:33:59 +02:00
Reuben Morais
524f7a7646
Bump VERSION to 0.7.1-alpha.1 2020-05-04 11:42:43 +02:00
Reuben Morais
d36092cd9b
Merge pull request #2970 from mozilla/enabledecoder-error-handling
Improve error handling for DS_EnableExternalScorer (Fixes #2969)
2020-05-04 11:41:32 +02:00
Reuben Morais
48971413e5 Improve EnableExternalScorer error handling in Python and JS bindings 2020-05-03 15:39:31 +02:00
Reuben Morais
5deb8a2f7b Don't leave partially initialized scorer on failure 2020-05-03 15:39:07 +02:00
Reuben Morais
fdee05f321
Merge pull request #2960 from mozilla/new-alpha-071
Bump VERSION to 0.7.1-alpha.0
2020-05-01 20:58:54 +02:00
Reuben Morais
8240666bb3 Bump VERSION to 0.7.1-alpha.0 2020-05-01 19:28:27 +02:00
Reuben Morais
de2b58ae1e Merge branch 'pr/2966' (Fixes #2966) 2020-05-01 19:27:21 +02:00
Reuben Morais
1afabc0e69 Minor tweaks to the automatic mixed precision docs 2020-05-01 19:27:13 +02:00
Reuben Morais
b71294d6c2
Merge pull request #2964 from mozilla/python-client-no-candidates
Add --candidate_transcripts flag to Python client
2020-05-01 19:07:13 +02:00
Reuben Morais
c324e4c8c2 Retry one more time 2020-05-01 17:32:39 +02:00
Reuben Morais
efd536bfa3 Retry tasks on TaskCluster 2020-05-01 00:14:46 +02:00
Łukasz Wałejko
ae2d3754e6
Update TRAINING.rst 2020-04-30 15:24:15 +02:00
Reuben Morais
e0283f529a Add --candidate_transcripts flag to Python client 2020-04-30 10:09:09 +02:00
Reuben Morais
26e2f88bfe
Merge pull request #2959 from mozilla/test-output-utf8
Don't escape non-ASCII chars in test_output_file JSON & other small fixes
2020-04-29 18:05:57 +02:00
Reuben Morais
6eb784bd3e Make DeepSpeech.py executable and call Python interpreter explicitly in docs
X-DeepSpeech: NOBUILD
2020-04-29 15:58:44 +02:00
Reuben Morais
aa143e1b9e Add missing log_warn import 2020-04-29 15:45:09 +02:00
Reuben Morais
b283aadae6 Don't escape non-ASCII characters in test_output_file 2020-04-29 15:45:09 +02:00
Reuben Morais
65b7c41746
Merge pull request #2957 from mozilla/nodejs-stream-wrapper
Return Stream wrapper in JS Model.createStream, add test coverage (Fixes #2956)
2020-04-29 15:18:20 +02:00
Reuben Morais
09b756aa0c
Merge pull request #2951 from mozilla/hack-mac-sox-static
Ugly, very ugly, incredibly ugly static linking of libsox on macOS
2020-04-29 14:16:39 +02:00
Reuben Morais
6f83e05341 Test JS client streaming mode 2020-04-29 14:05:18 +02:00
Reuben Morais
be1bd04b50 Add streaming mode to JS client 2020-04-29 13:43:40 +02:00
Reuben Morais
b0415af4b4 Return Stream wrapper in JS Model.createStream 2020-04-29 13:43:28 +02:00
lissyx
d120c4096a
Merge pull request #2954 from lissyx/py-tflite
Python TFLite tests
2020-04-29 11:27:10 +02:00
Alexandre Lissy
ecdcf9e28a Python TFLite tests 2020-04-29 02:05:14 +02:00
Reuben Morais
4930186197
Merge pull request #2949 from mozilla/docs-rtd
Docs centered on ReadTheDocs instead of GitHub
2020-04-28 16:06:44 +02:00
Reuben Morais
ea7475d09c Ugly, very ugly, incredibly ugly static linking of libsox on macOS
All of the brew installed dependencies have static libraries as well, but the macOS linker will always prefer a dynamic library if both exist under the same `-L/foo -lbar` resolution. The only way to force static linking is to include a full path to the static library. These changes basically reverse engineer the static library locations and then pass those to the linker.
2020-04-28 14:13:53 +02:00
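A minimal sketch of the approach described in the commit message above (illustrative Python, not the project's actual build scripts): resolve each Homebrew dependency's prefix and hand the full path of its static archive to the linker so the dynamic library cannot be picked up instead.

```python
# Sketch only: resolve the full path to a Homebrew formula's static archive,
# so it can be passed to the linker verbatim instead of "-L<dir> -l<name>".
# The helper name is hypothetical; "brew --prefix <formula>" is a real command.
import subprocess
from pathlib import Path


def static_lib_path(formula: str, lib: str) -> str:
    """Return the absolute path to the static archive installed by `formula`."""
    prefix = subprocess.check_output(
        ["brew", "--prefix", formula], text=True
    ).strip()
    archive = Path(prefix) / "lib" / f"lib{lib}.a"
    if not archive.is_file():
        raise FileNotFoundError(f"no static archive at {archive}")
    return str(archive)


if __name__ == "__main__":
    # e.g. prints something like /usr/local/opt/sox/lib/libsox.a
    print(static_lib_path("sox", "sox"))
```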
Reuben Morais
6f9fcf3029 Embed flag definitions 2020-04-28 13:33:45 +02:00
david gauchard
117324e665
Add a --discount_fallback option to generate_lm.py (#2945) 2020-04-28 11:58:41 +02:00
Reuben Morais
d85b0960eb Address review comment 2020-04-28 11:55:58 +02:00
Reuben Morais
1838a1e0d4 Remove FAQ reference and reword SUPPORT.rst a bit 2020-04-28 11:55:20 +02:00
lissyx
060bddde8c
Merge pull request #2946 from jschueller/patch-1
Install deepspeech header
2020-04-28 00:29:25 +02:00
Reuben Morais
a584c8e6b6 Docs centered on ReadTheDocs instead of GitHub 2020-04-27 20:31:11 +02:00
Reuben Morais
6e9b251da2 Re-introduce master warning to main README 2020-04-26 09:22:51 +02:00
Julien Schueller
1eed145aa1
Install deepspeech header 2020-04-26 09:19:03 +02:00
lissyx
6e834b088d
Merge pull request #2944 from jschueller/patch-1
Doc: Mention we explicitely need Bazel 0.24.1
2020-04-25 15:31:34 +02:00
Julien Schueller
1a6fabf346
Doc: Mention we explicitely need Bazel 0.24.1
ref https://github.com/mozilla/DeepSpeech/issues/2943
2020-04-25 14:15:16 +02:00
Reuben Morais
3fbbca2b55
Merge pull request #2939 from mozilla/new-version
Bump VERSION to 0.7.0 and update docs
2020-04-24 18:14:56 +02:00
Reuben Morais
b25404e294 Bump VERSION to 0.7.0 and update docs 2020-04-24 16:50:31 +02:00
Reuben Morais
8c0fbcbbbf Merge branch 'new-alpha' (Fixes #2938) 2020-04-24 14:37:59 +02:00
Reuben Morais
0720bcb713 Skip prebuilt decoder wheel in Dockerfile 2020-04-24 13:46:17 +02:00
Reuben Morais
43a85518d2 Bump VERSION to 0.7.0-alpha.4 2020-04-24 13:19:59 +02:00
Reuben Morais
01c269da9c
Merge pull request #2936 from mozilla/update-prod-models
Update prod model
2020-04-24 13:15:37 +02:00
Reuben Morais
ff8570c1f3 Update prod model expected inference results 2020-04-24 09:00:15 +02:00
Reuben Morais
3353f6a60a Update prod model 2020-04-23 21:29:31 +02:00
Reuben Morais
b8416bd4bf
Merge pull request #2935 from GoldschmittGabriel/patch-1
Update import_swc.py
2020-04-23 14:58:59 +02:00
GaGo
3daca6f209
Update import_swc.py
I tried to use the importer and hit an error: path not defined.

I think it appeared after the refactor by @reuben.
Nothing big, but I think it is worth a commit. :)
2020-04-23 14:45:39 +02:00
Kelly Davis
1cf002731f
Merge pull request #2934 from mozilla/unpin-numpy-training
Unpin numpy version in training package
2020-04-23 11:55:25 +02:00
Kelly Davis
ba6ff124d5
Merge pull request #2933 from mozilla/scorer_params
Updated alpha and beta
2020-04-23 11:17:13 +02:00
Reuben Morais
301def71e0 Unpin numpy in training package 2020-04-23 11:00:09 +02:00
kdavis-mozilla
f3b4943e18 Updated alpha and beta 2020-04-23 07:19:37 +02:00
lissyx
cc449e0443
Merge pull request #2925 from Jendker/evaluate_tflite_fix
Add missing external scorer
2020-04-21 15:34:29 +02:00
lissyx
de5bfed8b3
Merge pull request #2929 from lissyx/ci-py37-py38
Fix #2928: Add Python 3.7 CI coverage
2020-04-20 18:21:10 +02:00
Tilman Kamp
eed66e3b75
Merge pull request #2926 from tilmankamp/fix-mailab-importer
M-AILAB importer: Ensure all samples are 16 kHz
2020-04-20 17:44:06 +02:00
Tilman Kamp
e99e06a278 M-AILAB importer: Ensure all samples are 16 kHz 2020-04-20 17:34:58 +02:00
Alexandre Lissy
a48341f57f Fix #2928: Add Python 3.7, 3.8 CI coverage 2020-04-20 17:21:19 +02:00
lissyx
d4822a2f97
Merge pull request #2927 from lissyx/m_or_mu
Do not use m/mu ABI for Py3.8+
2020-04-20 17:12:34 +02:00
Alexandre Lissy
08ff548d26 Do not use m/mu ABI for Py3.8+ 2020-04-20 16:23:42 +02:00
lissyx
933623b9be
Merge pull request #2921 from lissyx/fix-dsswig-priority
Force ds-swig first in PATH to avoid conflicts if a system-wide version exists
2020-04-20 16:16:30 +02:00
Jedrzej Beniamin Orbik
a4b08594eb Added missing external scorer 2020-04-20 16:04:54 +02:00
Alexandre Lissy
8835b4d64a Force numba pinned version 2020-04-20 13:49:34 +02:00
Alexandre Lissy
67522cc986 Force numba pinned version 2020-04-20 13:28:25 +02:00
Alexandre Lissy
9b7724e559 Force ds-swig first in PATH to avoid conflicts if a system-wide version exists 2020-04-20 12:39:48 +02:00
Reuben Morais
bf3ebefd60
Merge pull request #2911 from jimregan/patch-1
import_lingua_libre.py: n channels + bitdepth
2020-04-17 21:21:12 +02:00
DanBmh
bfe778482c
Refactor generate_package.py (#2903)
* Improve formatting and paths in LM README

* Improve logging in generate_package.py

Co-authored-by: Daniel <daniel@mail.de>
2020-04-17 21:20:45 +02:00
Reuben Morais
a019e979ff
Merge pull request #2915 from mozilla/delay-beam-expansion (Fixes #2867)
Delay beam expansion until a non-blank label has probability >0.1%
2020-04-17 21:03:08 +02:00
Reuben Morais
33760a6bcd Delay beam expansion until a non-blank label has probability >0.1% 2020-04-17 14:33:40 +02:00
Kelly Davis
7efdfc54a6
Merge pull request #2912 from JRMeyer/transfer-learning-docs2
Transfer Learning docs
2020-04-17 13:11:49 +02:00
JRMeyer
2342ba7956 rebased docs on master 2020-04-16 18:29:53 -07:00
Jim Regan
5a7e4ea348
import_lingua_libre.py: n channels + bitdepth 2020-04-16 22:44:32 +02:00
lissyx
8c76c92694
Merge pull request #2909 from madprogramer/patch-1
Documentation Suggestion: native_client can be found in the releases
2020-04-16 21:21:26 +02:00
madprogramer
e6779e8f84
Added: native_client can be found in the releases
I wanted to install the binary without cloning the entire repository (and with it the required `util/taskcluster.py`) and was only able to find it pretty much by chance. I feel that adding this to the README could save people a few headaches.
2020-04-16 15:45:56 +03:00
Reuben Morais
40250988db Merge branch 'pr2801' (Fixes #2801) 2020-04-14 13:07:50 +02:00
Reuben Morais
c27387fd98 README tweaks 2020-04-14 13:07:44 +02:00
Daniel
f82a77f249 Update readme. 2020-04-14 13:00:23 +02:00
Kelly Davis
c80d7f6f3d
Merge pull request #2900 from chrillemanden/fix-train-doc-typos
Fix documentation typos in section Augmentation
2020-04-14 10:06:34 +02:00
Reuben Morais
2b88c76737
Merge pull request #2901 from mozilla/issue2526
Use Alphabet to compute string values in get_prev_* (Fixes #2526)
2020-04-13 13:48:38 +02:00
Reuben Morais
5fa1839a7f Use Alphabet to compute string values in get_prev_* 2020-04-13 10:23:23 +02:00
chrillemanden
737c92f962 Fix documentation typos in section Augmentation 2020-04-12 15:59:32 +02:00
Daniel
8c73bf6fbf Small fixes. 2020-04-09 16:58:22 +02:00
Daniel
00e4dbe3fd Merge remote-tracking branch 'upstream/master' 2020-04-08 20:27:43 +02:00
Daniel
c29c0beb72 Default to required params. 2020-04-08 20:23:04 +02:00
lissyx
0f71bd2493
Merge pull request #2894 from lissyx/local-swig
Fix #2885: Improve ds-swig integration
2020-04-08 19:45:49 +02:00
Alexandre Lissy
a7deb9ee79 Ensure docker build pip really install locally built package 2020-04-08 18:48:12 +02:00
Alexandre Lissy
d8d5e6f358 Fix #2885: Improve ds-swig integration 2020-04-08 18:48:12 +02:00
Reuben Morais
96418bea15
Merge pull request #2884 from mozilla/issue2883
Only allow graph/layer initialization at start of training (Fixes #2883)
2020-04-08 13:47:41 +02:00
lissyx
83675f5ad8
Merge pull request #2893 from lissyx/example-api-net
Example api net
2020-04-08 12:47:40 +02:00
Alexandre Lissy
a699896282 Update Sphinx deps 2020-04-08 12:27:35 +02:00
Alexandre Lissy
c28d7dd9c4 Fix useless parsing of doc/node_modules/ 2020-04-08 12:21:02 +02:00
Alexandre Lissy
2d398b64d8 Fix DeepSpeechStream reference 2020-04-08 12:20:46 +02:00
Alexandre Lissy
2b51316d68 Add .Net API usage example 2020-04-08 11:57:38 +02:00
lissyx
e22890c251
Merge pull request #2890 from lissyx/doc-training
Name section more explicit
2020-04-07 19:58:38 +02:00
Alexandre Lissy
75a0205b37 Name section more explicit 2020-04-07 19:49:05 +02:00
Reuben Morais
0c6e90868e Split --load into two to avoid unexpected behavior at evaluation time 2020-04-07 14:24:05 +02:00
Reuben Morais
cc7a0ada46 Only allow graph/layer initialization at start of training 2020-04-07 14:24:05 +02:00
lissyx
3c20251684
Merge pull request #2889 from lissyx/api-example-refs
Fix #2888: Use start-after / end-before for API example line references
2020-04-07 14:19:48 +02:00
Alexandre Lissy
5e3c5e9131 Fix #2888: Use start-after / end-before for API example line references 2020-04-07 14:01:09 +02:00
lissyx
b5a805056f
Merge pull request #2882 from lissyx/PR2876
Pr2876
2020-04-06 23:52:45 +02:00
Alexandre Lissy
88ac227ebe Fix decoder doc not generated 2020-04-06 22:12:00 +02:00
Alexandre Lissy
5723dba180 Update doc for TypeScript support 2020-04-06 22:12:00 +02:00
Alexandre Lissy
3581bdf9fe Add TypeScript CI 2020-04-06 22:12:00 +02:00
Alexandre Lissy
bf31b2e351 Expose Stream-related within Stream class 2020-04-06 22:12:00 +02:00
Alexandre Lissy
567595aa5a Package and expose TypeScript for JS interface 2020-04-06 11:15:42 +02:00
Daniel
a291e23041 Update readme. 2020-04-04 10:50:28 +02:00
Daniel
e16b72ff28 Use os.join and kenlm parameter usage description. 2020-04-03 17:58:52 +02:00
Anas Abou Allaban
510d71353f Update README
Signed-off-by: Anas Abou Allaban <aabouallaban@pm.me>
2020-04-03 13:28:56 +02:00
Anas Abou Allaban
be40b07307 Add type declaration file for v0.7.0
Signed-off-by: Anas Abou Allaban <aabouallaban@pm.me>
2020-04-03 13:28:56 +02:00
Tilman Kamp
510e29fe65
Merge pull request #2879 from JRMeyer/batch-transcribe
batch transcribe dir (recursively) with transcribe.py
2020-04-03 09:05:28 +02:00
JRMeyer
cad03d33ea simplify function 2020-04-02 21:40:57 -07:00
lissyx
0cee75c295
Merge pull request #2880 from NormanTUD/master
Corrected Typo in Dockerfile
2020-04-02 10:29:14 +02:00
Norman Koch
b88b63399a Corrected Typo in Dockerfile 2020-04-02 04:18:14 -04:00
JRMeyer
aa223d7090 batch transcribe dir (recursively) with transcribe.py 2020-04-01 20:40:53 -07:00
Daniel
e862cd41db Read from input.txt.gz again. 2020-04-01 17:29:02 +02:00
Daniel
38afe38f0b Implement some change request. 2020-04-01 17:15:52 +02:00
Daniel
b27e0347b1 Add more parameters.
Implement some change request.
2020-04-01 16:54:58 +02:00
Reuben Morais
4d567cef38
Merge pull request #2877 from mozilla/fix-compute-for-setup.py
Fix .compute for packaged training code
2020-04-01 16:35:28 +02:00
Reuben Morais
7dab19ebe5 Fix .compute for packaged training code 2020-04-01 16:24:36 +02:00
Kelly Davis
0cc815f1f0
Merge pull request #2826 from TeHikuMedia/add_trial_pruning
Add trial pruning to lm_optimizer.py
2020-04-01 14:56:31 +02:00
lissyx
d80cdb564a
Merge pull request #2868 from lissyx/doc-validate_label
Mention validate_label_locale in training doc
2020-04-01 14:55:00 +02:00
Alexandre Lissy
0a11a8293e Mention validate_label_locale in training doc
Fixes #2865
2020-04-01 14:54:20 +02:00
lissyx
7b8b678310
Merge pull request #2869 from lissyx/scorer-fail-early
Add some early checks, for Scorer at first
2020-04-01 14:51:05 +02:00
Tilman Kamp
5acc5282c7
Merge pull request #2859 from tilmankamp/dot-compute
Updated .compute
2020-04-01 11:42:00 +02:00
Reuben Morais
7c3c9d0b8d
Merge pull request #2871 from mozilla/decoder-as-dep
Automatically install ds_ctcdecoder in setup.py
2020-04-01 10:29:50 +02:00
Reuben Morais
af7e2c294d Pin versions of pip, setuptools, wheel in training readme
X-DeepSpeech: NOBUILD
2020-04-01 10:29:22 +02:00
Reuben Morais
c428acf478 Automatically install ds_ctcdecoder in setup.py 2020-03-31 18:47:45 +02:00
Alexandre Lissy
950d097ca1 Add some early checks, for Scorer at first
Fixes #2807
2020-03-31 15:46:57 +02:00
Reuben Morais
83d22e591b
Merge pull request #2856 from reuben/training-install
Package training code to avoid sys.path hacks
2020-03-31 15:42:42 +02:00
Reuben Morais
02fa9c781c Fix Python 3.5 compat issue 2020-03-31 14:59:32 +02:00
Reuben Morais
dc119880b9 Sync training package version with main version 2020-03-31 14:02:26 +02:00
Reuben Morais
c738d55012 Remove unneeded six.moves import 2020-03-31 13:57:44 +02:00
Reuben Morais
2f68ed1001 Remove unneeded future imports from importers 2020-03-31 13:55:32 +02:00
Reuben Morais
6f0bf3b3a8 Reformat importers with black 2020-03-31 13:43:30 +02:00
Reuben Morais
b7e6b8c3e6 Sort importer imports with isort 2020-03-31 13:43:00 +02:00
Reuben Morais
20b0ab17ea Remove unused GPU usage tools 2020-03-31 13:42:41 +02:00
lissyx
e1206f47d3
Merge pull request #2833 from alexcannan/master
Fixed sample rate logic in Python client
2020-03-30 23:45:59 +02:00
Reuben Morais
967043ef95
Merge pull request #2863 from mozilla/check-alphabet-generate-package
Error early in generate_package.py if no alphabet was specified and not using UTF-8 mode
2020-03-30 19:42:19 +02:00
Reuben Morais
09673581a4 Error early in generate_package.py if no alphabet was specified and not using UTF-8 mode
X-DeepSpeech: NOBUILD
2020-03-30 18:16:56 +02:00
Daniel
a79cc0cee9 Merge remote-tracking branch 'upstream/master' 2020-03-29 12:34:03 +02:00
Daniel
f97c79e0e8 Make generate_lm.py language independent. 2020-03-29 12:29:18 +02:00
Tilman Kamp
0c4a2050b8 Updated .compute 2020-03-27 17:08:57 +01:00
Reuben Morais
080dc7df3c
Add a little more documentation around the decoder and UTF-8 mode (#2850)
* Add decoder and UTF-8 docs

* Address review comments
2020-03-27 11:26:19 +01:00
Reuben Morais
a05baa35c9 Package training code to avoid sys.path hacks 2020-03-25 21:34:50 +01:00
lissyx
58bc2f2bb1
Merge pull request #2853 from lissyx/bump-v0.7.0-alpha.3
Bump VERSION to 0.7.0-alpha.3
2020-03-25 13:04:43 +01:00
Alexandre Lissy
60150bd0bd Bump VERSION to 0.7.0-alpha.3 2020-03-25 13:04:11 +01:00
lissyx
6720662cfd
Merge pull request #2848 from lissyx/PR2843+build-ctc-windows
Pr2843+build ctc windows
2020-03-25 10:50:16 +01:00
Alexandre Lissy
1598be8124 Add CTC decoder build on TaskCluster 2020-03-25 09:35:12 +01:00
lissyx
ea233439ce
Merge pull request #2843 from ryojiysd/ctcdecoder-wheels-win
Support building ctc_decoder wheel package on Windows system
2020-03-25 09:34:47 +01:00
lissyx
c8046cbff0
Fix -fPIC typo 2020-03-25 09:18:58 +01:00
Ryoji Yoshida
4ffbd46cea Address review comments 2020-03-25 10:33:22 +09:00
Tilman Kamp
8088b574fc
Merge pull request #2849 from tilmankamp/unlabeled-samples
Fix #2830 - Support for unlabeled samples
2020-03-24 18:20:22 +01:00
Tilman Kamp
41da7b2870 Fix #2830 - Support for unlabeled samples 2020-03-24 16:53:02 +01:00
Ryoji Yoshida
d7cca2a791 Support building ctc_decoder wheel package on Windows system 2020-03-24 09:48:30 +09:00
Kelly Davis
5740d64e6e
Merge pull request #2840 from NormanTUD/master
Corrected typo in flags
2020-03-22 20:57:16 +01:00
Norman Koch
6ec19b9498 Corrected typo in flags 2020-03-22 20:39:41 +01:00
Shubham Kumar
5d50d21da0
Exposing ErrorCode API in Tree bindings (#2806)
* Added API to Python bindings

* Added API to JavaScript bindings

* Added API to Java bindings

* Added API to .NET binding
2020-03-21 15:40:39 +01:00
Reuben Morais
dfd69e47f1
Merge pull request #2835 from mozilla/reuben-bump-tf
Bump dependency to TensorFlow 1.15.2
2020-03-20 21:59:53 +01:00
lissyx
a270d23814
Merge pull request #2834 from lissyx/new-android
Add CI for Android 8.0, 9.0 and 10.0
2020-03-20 20:28:34 +01:00
Alexandre Lissy
1bedf9ef60 Add CI for Android 8.0, 9.0 and 10.0
We limit ourselves to x86_64 because it seems Google does not provide
any system images after API level 25 for arm64-v8a and armeabi-v7a.
There is also no system image for API level 27 for x86_64.
2020-03-20 19:40:36 +01:00
Reuben Morais
017c9a6f8c
Bump dependency to TensorFlow 1.15.2
Fixes security issues in TensorFlow and stops GitHub nagging us.
2020-03-20 19:39:03 +01:00
Reuben Morais
903d0b8fe4
Merge pull request #2792 from reuben/multiple_transcriptions
Expose multiple transcriptions in "WithMetadata" API
2020-03-20 16:58:32 +01:00
Reuben Morais
ee30a1c9de Adapt Java bindings to const structs 2020-03-20 13:51:29 +01:00
Alex Cannan
4fd39175b3 Fixed sample rate logic in Python client 2020-03-19 14:37:59 -04:00
lissyx
5e46d702af
Merge pull request #2821 from lissyx/win-cuda-tests
Add Windows CUDA CI
2020-03-19 18:10:46 +01:00
Alexandre Lissy
28ff863b55 Add Windows CUDA CI
Fixes #1948
2020-03-19 16:56:02 +01:00
lissyx
ff9a720764
Merge pull request #2818 from lissyx/validate_label_locale+multiprocessing.notDummy
Validate label locale+multiprocessing.not dummy
2020-03-19 10:14:06 +01:00
Caleb Moses
7072daa05c
Remove try_loading from evaluate call 2020-03-19 10:54:22 +13:00
Reuben Morais
1547498e82 Const members in structs 2020-03-18 19:32:57 +01:00
Alexandre Lissy
7b2a409f9f Converting importers from multiprocessing.dummy to multiprocessing
Fixes #2817
2020-03-18 11:04:36 +01:00
Alexandre Lissy
ce59228824 Localizeable validate_label
Fixes #2804
2020-03-18 11:04:36 +01:00
Alexandre Lissy
f9e05fe0c3 Share argparser amongst importers 2020-03-18 11:04:36 +01:00
Caleb Moses
8e37a5cfb4 Run reset_default_graph before every evaluate 2020-03-18 10:38:51 +13:00
Caleb Moses
c9e6cbc958 Add trial pruning to lm_optimizer.py 2020-03-18 10:14:47 +13:00
Reuben Morais
29a2ac37f0
Merge pull request #2779 from reuben/export-metadata
Write model metadata to export folder unconditionally
2020-03-17 12:08:35 -03:00
Reuben Morais
2ec34d5a06 Address review comments 2020-03-17 14:48:31 +01:00
Reuben Morais
e9ae38bf47 Update docs 2020-03-17 14:48:31 +01:00
Reuben Morais
c52f3b32fa Adapt Java bindings to new API 2020-03-17 14:47:58 +01:00
Reuben Morais
bb709ff955 Adapt .NET bindings to new API 2020-03-17 14:47:58 +01:00
Reuben Morais
09048e2ea2 Adapt JavaScript bindings to new API 2020-03-17 14:47:58 +01:00
Reuben Morais
6e88a37ad4 Adapt Python bindings to new API 2020-03-17 14:47:58 +01:00
Reuben Morais
c74dcffe79 Adjust client.cc for new API and small cleanup of code and function names 2020-03-17 14:47:58 +01:00
Reuben Morais
ea8c7d2957 Add DS_IntermediateDecodeWithMetadata 2020-03-17 14:47:58 +01:00
Reuben Morais
69bd032605 Improve API naming around Metadata objects 2020-03-17 14:47:58 +01:00
dabinat
e1fec4e818 Client - Change JSON output to return alternatives transcripts in an "alternatives" array 2020-03-17 14:47:58 +01:00
dabinat
e0c42f01a4 Moved result limiting to ModelState instead of CTC decoder 2020-03-17 14:47:58 +01:00
dabinat
969b2ac4ba Changed variable names to match coding style 2020-03-17 14:47:58 +01:00
dabinat
004d66d224 Client changes to show multiple transcriptions in JSON output 2020-03-17 14:47:58 +01:00
dabinat
32c969c184 Expose multiple transcriptions through the API 2020-03-17 14:47:58 +01:00
Kelly Davis
b57eaa19d6
Merge pull request #2783 from mozilla/optuna
Added optimizer for lm_alpha + lm_beta
2020-03-17 09:58:01 +01:00
kdavis-mozilla
f0dbdf7855 Renamed optimizer 2020-03-17 09:56:33 +01:00
Reuben Morais
8ca087d955
Merge pull request #2820 from mozilla/taskcluster-py-nonstable
Correctly handle non stable versions in `--branch`
2020-03-12 14:02:10 -03:00
Reuben Morais
94cca3c651
Correctly handle non stable versions in --branch 2020-03-12 13:43:27 -03:00
Reuben Morais
e54eb5a783
Make readthedocs link more obvious 2020-03-12 13:31:42 -03:00
Tilman Kamp
87f70693c3
Merge pull request #2819 from tilmankamp/sdb
Process pool for audio preparation
2020-03-12 15:13:25 +01:00
Tilman Kamp
63bc695600 Process pool for audio preparation 2020-03-12 14:34:24 +01:00
Tilman Kamp
60304da5a6
Merge pull request #2723 from tilmankamp/sdb
Sample DBs
2020-03-12 13:57:28 +01:00
Tilman Kamp
6b1d6773de SDB support 2020-03-10 10:32:58 +01:00
Daniel
f808720b5b Update readme. 2020-03-09 16:34:04 +01:00
Daniel
9c73700ac7 Add error hint and default values for alpha and beta. 2020-03-09 16:26:57 +01:00
lissyx
3bd0b20bf7
Merge pull request #2814 from AI-ML-Projects/master
Enhancement debian package manager tweaks
2020-03-06 21:27:29 +01:00
Pratik Raj
00e3350ea1
Update Dockerfile 2020-03-07 01:33:32 +05:30
Pratik Raj
69586e8c75
Enhancement debian package manager tweaks
Major change 1: Debian package manager tweaks

By default, apt/apt-get on Ubuntu and Debian-based systems installs recommended (but not suggested) packages.

Passing the "--no-install-recommends" option tells apt-get not to treat recommended packages as dependencies to install.

This results in smaller downloads and fewer installed packages.

See the [Ubuntu Blog](https://ubuntu.com/blog/we-reduced-our-docker-images-by-60-with-no-install-recommends) for details.

Major change 2: added the apt-utils and ca-certificates packages

Because the build:

1. is slow, and the log shows this is because "apt-utils" is not installed;

2. exits with an error when no certificate is available.
2020-03-07 00:57:04 +05:30
lissyx
43b93f3164
Merge pull request #2813 from lissyx/enforce-newline-removal
Enforce proper line ending removal when reading alphabet
2020-03-06 15:52:21 +01:00
Alexandre Lissy
763ed38bae Enforce proper line ending removal when reading alphabet
Fixes #2611
2020-03-06 15:19:56 +01:00
lissyx
b52a4a96c6
Merge pull request #2812 from lissyx/real-alphabet-path
Show actual alphabet path in error message
2020-03-06 10:58:03 +01:00
lissyx
0f8291df71
Proper arguments ordering 2020-03-06 10:15:18 +01:00
Alexandre Lissy
61fa1ad428 Show actual alphabet path in error message 2020-03-06 10:13:18 +01:00
Daniel
ef095881ca Fix too many arguments for format string. 2020-03-03 16:58:47 +01:00
Daniel
c6109c30f3 Add some statistics. 2020-03-03 16:49:52 +01:00
Daniel
c9a433486f Add more arguments. Rename file variables. 2020-03-03 16:48:43 +01:00
lissyx
fe8ee4f778
Merge pull request #2802 from lissyx/sample_rate_checking
Ensure sample rate comparison with proper types
2020-02-28 11:56:27 +01:00
Alexandre Lissy
639a68d2ae Ensure sample rate comparison with proper types
Fixes #2798
2020-02-28 11:16:34 +01:00
Daniel
c505a4ec6c Update some comments. 2020-02-27 17:46:16 +01:00
Daniel
15a75c77ff Rewrite generate_lm.py to allow usage with other languages. 2020-02-27 17:18:19 +01:00
lissyx
d18720a7d8
Merge pull request #2790 from lissyx/test-kvm
Use KVM
2020-02-27 17:02:17 +01:00
lissyx
8c96634e6b
Merge pull request #2797 from dabinat/cudnn-doc-change
Doc: change cuDNN dependency to 7.6
2020-02-26 20:53:41 +01:00
dabinat
81dd30847c Doc: change cuDNN dependency to 7.6 2020-02-26 11:33:59 -08:00
Alexandre Lissy
af45400461 Use KVM for Android emulator 2020-02-26 19:49:02 +01:00
Reuben Morais
a9e72eb152 Merge branch 'pr/2794' (Fixes PR #2794) 2020-02-26 14:47:55 +01:00
Reuben Morais
a97e961d16 Fix nits 2020-02-26 14:47:45 +01:00
Shubham Kumar
03196c875d used strdup for showing error 2020-02-26 18:17:50 +05:30
kdavis-mozilla
561131a05c Added optimizer for lm_alpha + lm_beta 2020-02-26 11:41:22 +01:00
Shubham Kumar
c77d3d6f2d added documentation 2020-02-26 14:26:43 +05:30
lissyx
84ac39769c
Merge pull request #2787 from lissyx/gradle-android-prebuilt
Cache gradle deps and Android emulator setup
2020-02-26 09:29:31 +01:00
Shubham Kumar
4fd747e540 added implementation of DS_ErrorCodeToErrorMessage 2020-02-26 13:30:05 +05:30
Alexandre Lissy
c5ded5adfe Cache gradle deps and Android emulator setup 2020-02-25 19:17:13 +01:00
Reuben Morais
377f0bc4b8
Merge pull request #2789 from reuben/issue2786
Make const functions receive const ModelState pointers (Fixes #2786)
2020-02-25 11:57:24 +01:00
Reuben Morais
669aa497cc Address review comments 2020-02-25 11:52:48 +01:00
Reuben Morais
b74738a405 Make const functions receive const ModelState pointers 2020-02-25 11:15:45 +01:00
Reuben Morais
1f1f5a98e4
Merge pull request #2781 from rhamnett/patch-2
Add flag to force reinitialisation of learning rate after lr_plateau
2020-02-24 16:08:13 +01:00
lissyx
b3b357cdac
Merge pull request #2776 from lissyx/pyenv-prebuilt
Produce pyenv ready-to-use
2020-02-24 13:09:33 +01:00
lissyx
4059da2869
Merge pull request #2785 from imskr/sk-fix
added ctc decoder builds tasks
2020-02-24 12:34:21 +01:00
Shubham Kumar
b03d7fe4ee added against dependencies 2020-02-24 17:01:32 +05:30
Alexandre Lissy
e9f530f7c7 Make webrtcvad really optional 2020-02-24 12:08:12 +01:00
Alexandre Lissy
bd5044fe31 Ensure proper python ABI 2020-02-24 12:02:30 +01:00
Alexandre Lissy
8029f3d7dd Produce pyenv ready-to-use 2020-02-24 12:01:36 +01:00
Shubham Kumar
279dc947f1 added ctc decoder builds tasks 2020-02-23 15:01:57 +05:30
Richard Hamnett
a3268545ab
Update flags.py
change flag datatype to boolean
2020-02-21 19:24:13 +00:00
Reuben Morais
46e7993075
Merge pull request #2771 from reuben/warn-sample-rate
Warn if --audio_sample_rate does not match training sample
2020-02-21 19:46:05 +01:00
Richard Hamnett
5e1f54ae4f
Reset learning rate if force set 2020-02-21 18:33:43 +00:00
Richard Hamnett
0de9e4bf80
Add force_initialize_learning_rate
Ability to reset a learning rate that has been reduced by reduce_lr_on_plateau
2020-02-21 18:32:03 +00:00
Reuben Morais
aff310d73a
Merge pull request #2780 from rhamnett/patch-1
Fix transcribe.py - use new checkpoint load method
2020-02-21 18:04:24 +01:00
Reuben Morais
f264134a61
Merge pull request #2778 from mozilla/ftyers-patch-1
Create BIBLIOGRAPHY.md
2020-02-21 18:01:07 +01:00
Francis Tyers
965927d91b
Update BIBLIOGRAPHY.md 2020-02-21 16:59:23 +00:00
Francis Tyers
ba21d4434b
Update BIBLIOGRAPHY.md 2020-02-21 16:56:25 +00:00
Richard Hamnett
e101cb8cc5
Fix transcribe.py - use new checkpoint load method
Replaced the non-existent try_loading() method with the saver method and respected the load flag

Removed tf.train.Saver()
2020-02-21 15:56:58 +00:00
Reuben Morais
48178005a2 Write model metadata to export folder unconditionally 2020-02-21 12:46:44 +01:00
Francis Tyers
943f19e1d5
Create BIBLIOGRAPHY.md 2020-02-20 23:12:46 +00:00
Reuben Morais
4291db7309 Handle graph without learning rate variable for export case 2020-02-20 15:40:35 +01:00
lissyx
234a64c6ea
Merge pull request #2766 from lissyx/homebrew-prebuilt-caches
Generate one-time Homebrew tarball
2020-02-20 12:47:05 +01:00
Alexandre Lissy
7d1663b1c5 Generate one-time Homebrew tarball 2020-02-20 11:46:43 +01:00
lissyx
fc39433f9b
Merge pull request #2775 from lissyx/remove-irc
Remove IRC notifications
2020-02-20 10:47:33 +01:00
Alexandre Lissy
2c69273a49 Remove IRC notifications 2020-02-20 10:09:11 +01:00
Reuben Morais
536b821d24
Merge pull request #2772 from reuben/error-code-hex
Report error code as hexadecimal numbers for easier lookup
2020-02-19 15:19:21 +01:00
Reuben Morais
f47c7f8421 Report error code as hexadecimal numbers for easier lookup 2020-02-19 14:11:01 +01:00
Reuben Morais
0b82c751db
Merge pull request #2770 from reuben/lr_reduction_rebased
Reduce learning rate on plateau
2020-02-18 22:20:54 +01:00
Reuben Morais
1178215423 Warn if --audio_sample_rate does not match training sample
In PR #2688, we started specifying the upper frequency limit when computing MFCCs.
This value was computed as half of the --audio_sample_rate value. Although the MFCC
computation accepts a variable sample rate input, the TensorFlow op only takes
a constant upper frequency limit, so we can't pass a dynamic value computed from each
sample to the op.

This means we lost the ability to transparently train on data with multiple sample
rates. This commit adds a warning message in case a training sample does not match
the --audio_sample_rate flag.
2020-02-18 18:15:01 +01:00
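A minimal sketch of such a check (Python, assuming WAV input; the helper name is hypothetical and not the project's actual training code):

```python
# Sketch only: warn when a training sample's rate differs from the value of
# the --audio_sample_rate flag mentioned in the commit message above.
import logging
import wave


def warn_on_sample_rate_mismatch(wav_path: str, expected_rate: int) -> None:
    """Log a warning if the WAV file's sample rate differs from expected_rate."""
    with wave.open(wav_path, "rb") as wav_file:
        actual_rate = wav_file.getframerate()
    if actual_rate != expected_rate:
        logging.warning(
            "Sample %s has rate %d Hz but --audio_sample_rate is %d Hz; the "
            "MFCC upper frequency limit is a constant derived from the flag, "
            "so this sample will not be featurized as intended.",
            wav_path, actual_rate, expected_rate,
        )
```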
Reuben Morais
559042a218 Increase epoch count in train tests to guarantee outputs in 8kHz mode 2020-02-18 18:14:15 +01:00
Reuben Morais
78e8dfdf38 Disable early stopping and LR reduction on plateau by default 2020-02-18 16:18:37 +01:00
Reuben Morais
6e12b7caed Allow missing learning rate variable in older checkpoints 2020-02-18 16:18:31 +01:00
Daniel
17ddc5600e Reduce learning rate on plateau. 2020-02-18 16:17:51 +01:00
lissyx
47f702bf37
Merge pull request #2768 from lissyx/auto-decompress-artifacts
Automagically decompress GZipped artifacts
2020-02-18 15:29:46 +01:00
Alexandre Lissy
020619fa97 Automagically decompress GZipped artifacts
Fixes #2760
2020-02-18 13:59:30 +01:00
Reuben Morais
44ff4c54b9
Merge pull request #2767 from reuben/readable_repr_bindings
Add a better __repr__ for Metadata objects in Python bindings
2020-02-18 13:28:57 +01:00
Reuben Morais
ac26a785df Add a better __repr__ for Metadata objects in Python bindings 2020-02-18 12:19:10 +01:00
Reuben Morais
1c69d93b4e Update training docs to mention new CuDNN flags and checkpoint dir flags
X-DeepSpeech: NOBUILD
2020-02-17 17:40:44 +01:00
Reuben Morais
200a46711a
Merge pull request #2763 from reuben/transfer-learning-rebase
Transfer learning support
2020-02-17 17:29:03 +01:00
Reuben Morais
bd8b96c19d Remove unneeded Saver instances 2020-02-17 15:55:16 +01:00
Reuben Morais
0e2f34f8cf Synchronize TensorFlow logging with --log_level flag 2020-02-17 12:46:00 +01:00
Reuben Morais
c46d8396bc Respect --load when exporting 2020-02-17 12:45:37 +01:00
lissyx
9dd63d5c87
Merge pull request #2765 from lissyx/bump-v0.7.0-alpha.2
Bump VERSION to 0.7.0-alpha.2
2020-02-17 12:13:38 +01:00
Alexandre Lissy
685fb1cc9b Bump VERSION to 0.7.0-alpha.2 2020-02-17 12:12:57 +01:00
lissyx
8fc6b7b216
Merge pull request #2764 from lissyx/electronjsv8.0
Support ElectronJS v8.0
2020-02-17 12:11:44 +01:00
Reuben Morais
cedd72da9b Force UTF-8 IO encoding 2020-02-17 10:46:40 +01:00
Alexandre Lissy
82344b9fe2 Support ElectronJS v8.0
Fixes #2759
2020-02-17 09:59:39 +01:00
Reuben Morais
f32fd7a33f Add transfer learning test 2020-02-17 08:29:10 +01:00
JRMeyer
5bba9ea5d1 Transfer-learning support 2020-02-17 08:29:10 +01:00
Reuben Morais
f27457cbbe
Merge pull request #2762 from reuben/remove-generate-trie-docs
Remove references to generate_trie from docs (Fixes #2761)
2020-02-16 18:09:55 +01:00
Reuben Morais
27e2e44400 Remove references to generate_trie from docs
X-DeepSpeech: NOBUILD
2020-02-16 11:36:27 +01:00
lissyx
22b518f8fa
Merge pull request #2757 from lissyx/speed-python-builds
Ensure Python builds use all resources
2020-02-14 20:03:55 +01:00
Alexandre Lissy
0add08e30d Ensure Python builds use all resources 2020-02-14 18:55:03 +01:00
lissyx
551bded23e
Merge pull request #2756 from lissyx/no-remirror
Manually update the TC index when needed
2020-02-14 14:52:45 +01:00
Alexandre Lissy
e0277853b2 Manually update the TC index when needed 2020-02-14 12:55:47 +01:00
lissyx
5155f0afcb
Merge pull request #2755 from lissyx/ts-mirror
Use other storage for TrainingSpeech dataset
2020-02-13 20:27:05 +01:00
Alexandre Lissy
95b931df4f Use other storage for TrainingSpeech dataset
Fixes #2715

X-DeepSpeech: NOBUILD
2020-02-13 20:26:25 +01:00
Reuben Morais
40f1827a24
Merge pull request #2746 from reuben/example-linenos
Update example line numbers
2020-02-13 17:37:47 +01:00
Reuben Morais
e9cf022e21 Merge branch 'expose-version' (Fixes #2745) 2020-02-13 17:35:13 +01:00
Reuben Morais
7001a17418 Expose version in a consumable way 2020-02-13 17:34:58 +01:00
lissyx
aa6f84ac56
Merge pull request #2749 from lissyx/tc-scripts
Explode tc-tests-utils into several smaller chunks
2020-02-13 17:18:55 +01:00
PedroDKE
50c5e5b23a
Make the webrtcvad dependency optional (#2754) 2020-02-13 17:16:37 +01:00
Alexandre Lissy
d6f0a5026c Explode tc-tests-utils into several smaller chunks
Fixes #1840
2020-02-13 15:42:34 +01:00
lissyx
a334c333de
Merge pull request #2731 from lissyx/swig-build
Build SWIG locally
2020-02-12 16:17:19 +01:00
Alexandre Lissy
e5149ded12 Build SWIG locally 2020-02-12 15:12:30 +01:00
Reuben Morais
09224cea46 Update example line numbers 2020-02-12 15:02:52 +01:00
Reuben Morais
52775504e5 Point to master branch of examples 2020-02-12 13:18:37 +01:00
Reuben Morais
88a1048322 Merge branch 'embed-beam-width' (Fixes #2744) 2020-02-12 13:15:43 +01:00
Reuben Morais
c2f336b3e3 Remove hardcoded constants from evaluate_tflite.py 2020-02-12 13:08:34 +01:00
Reuben Morais
3637f88c06 Fix CI errors, address comments, update examples 2020-02-12 10:13:02 +01:00
Reuben Morais
c512383aec Fix consumers of DS_CreateModel 2020-02-12 10:13:02 +01:00
Reuben Morais
8e9b6ef7b3 Embed default beam width into exported graph and remove param from DS_CreateModel 2020-02-11 21:23:05 +01:00
Reuben Morais
5366f90375 Merge branch 'decoder-api-changes' (PR #2681) 2020-02-11 21:18:57 +01:00
Reuben Morais
6efc3ccf50 Update examples model asset 2020-02-11 19:44:36 +01:00
Reuben Morais
8dedda7759 Address review comments 2020-02-11 19:44:36 +01:00
Reuben Morais
1d3b3a31a1 Address review comments and update docs 2020-02-11 19:44:36 +01:00
Reuben Morais
efbed73d5c Improve error handling around Scorer loading 2020-02-11 19:44:36 +01:00
Reuben Morais
3b54f54524 Fix linter errors
X-DeepSpeech: NOBUILD
2020-02-11 19:44:36 +01:00
Reuben Morais
1e2eb96248 Update all API consumers 2020-02-11 19:44:36 +01:00
Reuben Morais
708b21a63e Add tool to extract vocabulary from the old LM binary format 2020-02-11 19:44:36 +01:00
Reuben Morais
a156d28504 Switch smoke test scorer to new format 2020-02-11 19:44:36 +01:00
Reuben Morais
b34723588d Switch to new scorer format 2020-02-11 19:44:29 +01:00
Reuben Morais
ab08f5ee5a Change decoder API 2020-02-11 19:44:26 +01:00
Reuben Morais
16d5632d6f Write default values for alpha and beta into trie header 2020-02-11 19:44:26 +01:00
Reuben Morais
b33d90b7bd Load combined format from Scorer 2020-02-11 19:44:26 +01:00
Reuben Morais
214b50f490 Add generate_package tool to create combined scorer package 2020-02-11 19:44:26 +01:00
Reuben Morais
be2229ef29 Refactor Scorer so model/trie package can be created by an external tool 2020-02-11 19:44:26 +01:00
Reuben Morais
7c0354483e Stop including vocabulary data in LM.binary. 2020-02-11 19:44:19 +01:00
Reuben Morais
d65422c8ab Update KenLM to b9f35777d112ce2fc10bd3986302517a16dc3883 2020-02-11 19:44:17 +01:00
lissyx
93e5ce498a
Merge pull request #2737 from lissyx/win-nodegyp-cache
Produce and use node-gyp cache
2020-02-11 18:57:43 +01:00
Reuben Morais
3b9a39cb67
Merge pull request #2740 from reuben/irc-to-matrix
Point people to Matrix room instead of IRC
2020-02-11 17:45:48 +01:00
Reuben Morais
5eac447de4 Point people to Matrix room instead of IRC
X-DeepSpeech: NOBUILD
2020-02-11 17:44:44 +01:00
Alexandre Lissy
ce5629d33a Produce and use node-gyp cache
Fixes #2718
2020-02-11 12:46:19 +01:00
Reuben Morais
ed335f42cb
Merge pull request #2739 from reuben/virtualenv-windows
Use venv instead of virtualenv package on Windows
2020-02-11 12:06:23 +01:00
Reuben Morais
0a138c717e Use venv instead of virtualenv package 2020-02-11 11:16:32 +01:00
Reuben Morais
49fe54e0e3
Merge pull request #2688 from mozilla/mfcc-upper-frequency
Specify upper frequency limit when computing Mfccs
2020-02-11 10:26:09 +01:00
Reuben Morais
550c572962
Merge pull request #2736 from reuben/fix-extraneous-or-tests
Remove extraneous OR operators in tc-tests-utils.sh
2020-02-10 17:38:58 +01:00
Reuben Morais
92be76657f Stop using 8kHz data with concurrent streams test as it does not resample 2020-02-10 16:22:10 +01:00
Reuben Morais
245fb24946 Fix expected outputs 2020-02-10 16:22:10 +01:00
Reuben Morais
a3ce59de58 Remove extraneous OR operators in tc-tests-utils.sh
Fixes:

+ '[' -o 16k = 8k ']'
/home/build-user/DeepSpeech/ds/taskcluster/tc-tests-utils.sh: line 262: [: too many arguments
2020-02-10 16:22:10 +01:00
lissyx
33efd9b7ff
Merge pull request #2724 from DanBmh/master
Print best and worst results in a WER report.
2020-02-07 11:27:16 +01:00
Daniel
726cc20586 Rename dataset param. 2020-02-06 14:47:59 +01:00
Daniel
8cc91fafb2 Moved summary printing to samples printing. 2020-02-06 14:44:41 +01:00
Daniel
de92142986 Named example sections. 2020-02-06 13:31:30 +01:00
Daniel
4186cbef88 Reverse ordered loss again. 2020-02-06 12:50:20 +01:00
Reuben Morais
f09aad48aa
Merge pull request #2726 from carlfm01/decoding-fix
Fix Intermediate decoding
2020-02-06 12:30:30 +01:00
Daniel
f5145526f0 Dont need flags. 2020-02-06 11:43:48 +01:00
Daniel
9ec88b7f28 Add whitespace again. 2020-02-06 11:41:40 +01:00
Daniel
63a07e6834 Added summary to evaluate_tflite.py and moved method to evaluate_tools.py. 2020-02-06 11:39:31 +01:00
Daniel
a0b5d3e7e0 Restore order of imports. 2020-02-06 11:07:11 +01:00
Daniel
320e815bb7 Remove semicolon. 2020-02-06 11:05:15 +01:00
Daniel
369e3c9fc3 Revert linebreak. 2020-02-06 11:01:37 +01:00
Daniel
272ed99d24 Add median examples. Fix sorting. 2020-02-06 10:55:48 +01:00
Carlos Fonseca M
17e011c18d Fix Intermediate decoding 2020-02-05 22:47:09 -06:00
Daniel
a2f05ccabe Print best and worst results in a WER report. 2020-02-05 17:52:10 +01:00
lissyx
6f5af8ec2c
Merge pull request #2719 from lissyx/doc-pydev
Ensure documentation mentions python3-dev
2020-02-04 20:43:46 +01:00
Alexandre Lissy
00d11e2fec Ensure documentation mentions python3-dev
Fixes #2712
2020-02-04 18:18:41 +01:00
lissyx
60cbe3b201
Merge pull request #2714 from lissyx/bump-v0.7.0-alpha.1
Bump VERSION to 0.7.0-alpha.1
2020-02-03 14:54:27 +01:00
Alexandre Lissy
dd982b2224 Bump VERSION to 0.7.0-alpha.1 2020-02-03 14:53:09 +01:00
lissyx
92d8bad7c1
Merge pull request #2711 from lissyx/ctcdecoder_version_check
Enforce CTC decoder version check
2020-02-03 13:40:25 +01:00
Alexandre Lissy
ff401732a3 Enforce CTC decoder version check
Fix #2710
2020-02-03 10:24:21 +01:00
lissyx
6af68efa00
Merge pull request #2708 from lissyx/bump-v0.7.0-alpha.0
Bump VERSION to 0.7.0-alpha.0
2020-01-31 12:03:16 +01:00
Alexandre Lissy
4e452dcd0b Bump VERSION to 0.7.0-alpha.0 2020-01-31 12:02:43 +01:00
lissyx
8c5a7b07c9
Merge pull request #2705 from lissyx/nodejs-tflite
Produce TFLite-specific NPM package
2020-01-31 11:48:35 +01:00
lissyx
5d0e4cc8ed
Merge pull request #2704 from lissyx/remove-benchmark-nc
Remove unused benchmark_nc
2020-01-31 11:47:30 +01:00
lissyx
3d18ab8058
Merge pull request #2706 from lissyx/win-tflite
Produce TFLite NuGet package
2020-01-30 16:03:32 +01:00
Alexandre Lissy
0f9869fb00 Produce TFLite NuGet package 2020-01-30 11:49:49 +01:00
Alexandre Lissy
6e521ff3a2 Produce TFLite-specific NPM package 2020-01-30 10:47:42 +01:00
Alexandre Lissy
dc25509950 Remove unused benchmark_nc 2020-01-30 08:47:59 +01:00
lissyx
0428c846f2
Merge pull request #2700 from lissyx/swig-doc
Improve SWIG reference
2020-01-29 11:16:03 +01:00
Alexandre Lissy
38db9a2441 Improve SWIG reference 2020-01-29 11:11:30 +01:00
lissyx
d74ab7dc1a
Merge pull request #2695 from lissyx/tf_unique_ptr
Use std::unique_ptr<> for TensorFlow session
2020-01-29 10:54:37 +01:00
Alexandre Lissy
656eea4622 Use std::unique_ptr<> for TensorFlow session 2020-01-29 09:42:05 +01:00
Reuben Morais
1d6a337ab4 Merge branch 'javascript-buffer-length' (Fixes #2693) 2020-01-27 21:22:25 +01:00
Reuben Morais
8d42c2bdd9 Adjust Buffer length to account for element size inside the JS binding 2020-01-27 18:20:57 +01:00
Reuben Morais
502436f8f3 Merge branch 'PedroDKE-args_export_model_name' (Fixes #2690) 2020-01-27 15:59:47 +01:00
Reuben Morais
50830d7022 Fix whitespace 2020-01-27 15:59:39 +01:00
PedroDKE
3e349497ed added an argument to choose the final export model name 2020-01-25 12:30:07 +01:00
Reuben Morais
9735d066c5 Bump graph version 2020-01-24 10:20:42 +01:00
Reuben Morais
709cd0d2f2 Specify upper frequency limit when computing Mfccs 2020-01-24 09:52:09 +01:00
lissyx
a9855f1e4e
Merge pull request #2685 from juandspy/patch-1
Update TRAINING.rst  (mmap-able model)
2020-01-22 16:58:55 +01:00
juandspy
29a92e098f
Update taskcluster.py
I copied ``maybe_download_tc_bin`` syntax in order to make the code easier to follow.
2020-01-22 16:22:15 +01:00
juandspy
616760eb52
Update TRAINING.rst 2020-01-22 16:15:26 +01:00
juandspy
b6008d0454
Update util/taskcluster.py 2020-01-22 14:50:42 +01:00
lissyx
d6ca542722
Merge pull request #2686 from lissyx/ensure-r1.15
Ensure properly link to TensorFlow r1.15
2020-01-22 12:31:27 +01:00
Alexandre Lissy
d9072c2a87 Ensure properly link to TensorFlow r1.15 2020-01-22 10:50:55 +01:00
juandspy
685c1f7c1b
Update TRAINING.rst (mmap-able model)
I had this "command not found" problem and solved it with https://discourse.mozilla.org/t/how-to-create-a-mmap-able-model-from-the-output-graph-pb-file/28984/13?, so I'm adding it to the documentation.
2020-01-22 08:58:33 +01:00
lissyx
f02a993820
Merge pull request #2682 from juandspy/patch-1
Update evaluate.py
2020-01-21 10:03:18 +01:00
juandspy
a0e528f52e
Update evaluate.py
The ```sys``` package was not imported, which gave a ```NameError: name 'sys' is not defined``` on line 96 when no test_file was provided. I added the import statement.
2020-01-21 09:57:33 +01:00
Reuben Morais
94882fb1c9
Mention use of CuDNN RNN in release checkpoints
X-DeepSpeech: NOBUILD
2020-01-18 18:07:55 +00:00
Reuben Morais
3cea430f7e
Merge pull request #2679 from mozilla/pr-2548-multistream
Re-land PR #2548 multistream support for .NET bindings
2020-01-18 15:04:21 +01:00
Carlos Fonseca M
923729d920 Multi-stream support .NET
Adds multi-stream support for the .NET client using the same acoustic model.
2020-01-18 12:35:29 +01:00
Carlos Fonseca M
fe2477b25c Remove unused members
FreeString and FreeMetadata are both for private use only.
2020-01-18 12:34:13 +01:00
lissyx
ff4906a45a
Merge pull request #2677 from lissyx/arm64-tflite
Arm64 tflite
2020-01-18 12:24:45 +01:00
Alexandre Lissy
c5136fd4ac Update ARM64 tests against TFLite assets
Fixes #2676
2020-01-18 10:08:04 +01:00
Tilman Kamp
65f2a09023
Merge pull request #2673 from tilmankamp/helpers
Introducing utils.helpers for miscellaneous helper functions
2020-01-15 16:49:48 +01:00
lissyx
c9385d4c11
Merge pull request #2674 from lissyx/fix-metadata-time
Fix word detection for time computation
2020-01-14 17:39:52 +01:00
Alexandre Lissy
66fe634cfe Fix word detection for time computation
Fixes #2623
2020-01-14 16:33:51 +01:00
Tilman Kamp
ad9f0c581b Introducing utils.helpers for miscellaneous helper functions 2020-01-14 16:04:18 +01:00
lissyx
7b3bc31171
Merge pull request #2667 from lissyx/remove-python-nodejs-oldies
Remove python nodejs oldies
2020-01-13 18:03:03 +01:00
lissyx
ea6377c2aa
Merge pull request #2666 from lissyx/fix-prodmodel
Remove v0.6.0 TFLite prod model workaround
2020-01-13 16:31:35 +01:00
lissyx
1752d6d03f
Merge pull request #2644 from lissyx/android-binding-typo
Fix DS_EnableDecoderWithLM typo in Android bindings
2020-01-13 16:31:27 +01:00
Alexandre Lissy
e4acdd7545 Remove Homebrew node workaround 2020-01-13 16:17:13 +01:00
Alexandre Lissy
b18675eae5 Switch to NodeJS v12 (LTS) for build 2020-01-13 16:17:13 +01:00
Alexandre Lissy
b216b943b9 Update Python, NodeJS and ElectronJS to latest stables 2020-01-13 16:17:13 +01:00
Alexandre Lissy
2eaa9e4a18 Remove ElectronJS < 5.0 (unsupported) 2020-01-13 13:53:17 +01:00
Alexandre Lissy
03892fb3fd Remove NodeJS < v10 2020-01-13 13:53:17 +01:00
Alexandre Lissy
d911ccb2b7 Fix DS_EnableDecoderWithLM typo in Android bindings
Fixes #2643
2020-01-13 12:43:19 +01:00
Alexandre Lissy
d76f1929b0 Remove v0.6.0 TFLite prod model workaround 2020-01-13 12:42:41 +01:00
Alexandre Lissy
197704b868 Remove Python 2.7
Fixes #2659
2020-01-13 12:42:34 +01:00
Reuben Morais
2df62da147
Merge pull request #2668 from mozilla/fix-evaluate-1.15
Upgrade pip, setuptools and wheel before installing requirements
2020-01-13 12:39:22 +01:00
Reuben Morais
700c9747d9 Pin versions of pip, setuptools, wheel 2020-01-13 11:24:10 +01:00
Reuben Morais
22aabae55a Upgrade pip, setuptools and wheel before installing requirements 2020-01-13 11:14:13 +01:00
Reuben Morais
7225336cea
Merge pull request #2645 from reuben/tf-1.15
Update to TensorFlow 1.15
2020-01-13 10:07:03 +01:00
Reuben Morais
faed282cfc
Merge pull request #2664 from mozilla/evaluate_tflite_fixes
evaluate_tflite.py fixes
2020-01-13 10:03:40 +01:00
Reuben Morais
17597f4526 Fix benchmark_model test 2020-01-12 15:26:08 +01:00
Reuben Morais
7e1f4a2d68 Fix linter errors
X-DeepSpeech: NOBUILD
2020-01-12 13:59:25 +01:00
Reuben Morais
d0e86fe10a Add a test for evaluate_tflite.py 2020-01-12 13:35:21 +01:00
Reuben Morais
cce1cec740 Upgrade pip, setuptools and wheel before installing requirements 2020-01-12 12:23:56 +01:00
Reuben Morais
42cb00dafd Switch TF dependency to r1.15 branch 2020-01-12 12:23:56 +01:00
Reuben Morais
fa66a04798 Update evaluate_tflite requirements 2020-01-12 11:02:15 +01:00
Reuben Morais
c28f61d370 Output full paths to results dump in evaluate_tflite.py 2020-01-12 11:01:54 +01:00
Reuben Morais
33e725bb25 Make evaluate_tflite.py work with v0.6.1 calculate_report 2020-01-12 11:01:33 +01:00
Reuben Morais
42fe30d572 Make evaluate_tflite.py work with relative paths in the CSV 2020-01-12 10:59:30 +01:00
Reuben Morais
3df20fee52
Merge pull request #2658 from mozilla/bump-v0.6.1
Bump version to v0.6.1
2020-01-10 18:08:45 +01:00
lissyx
c024464c12
Merge pull request #2640 from lissyx/train-8k
Add 8kHz training test coverage
2020-01-10 16:21:15 +01:00
Reuben Morais
fc63ce0c04 Bump version to v0.6.1 2020-01-10 15:02:47 +01:00
Alexandre Lissy
581515e094 Add 8kHz training test coverage
Fixes #2638
2020-01-10 14:51:45 +01:00
lissyx
2d47855e21
Merge pull request #2657 from lissyx/node-10
Temp workaround to move to node@10 on Homebrew
2020-01-10 13:38:49 +01:00
Alexandre Lissy
3bbed56f1e Temp workaround to move to node@10 on Homebrew 2020-01-10 13:10:04 +01:00
Reuben Morais
0facbb03d3
Merge pull request #2647 from mozilla/issue1709-feature-cache-help
Improve --feature_cache help text (Fixes #1709)
2020-01-10 11:31:31 +01:00
Reuben Morais
e5eedf5252
Improve --feature_cache help text
X-DeepSpeech: NOBUILD
2020-01-10 10:06:02 +00:00
lissyx
2cb48e72f7
Merge pull request #2639 from lissyx/allthedocs
Publish README/USING/TRAINING to readthedocs
2020-01-08 20:56:43 +01:00
Reuben Morais
f6cd28ba2d
Fix handling of InvalidArgumentError in training loop
X-DeepSpeech: NOBUILD
2020-01-08 17:10:47 +00:00
Alexandre Lissy
4c7d5fb0e1 Publish README/USING/TRAINING to readthedocs
Fixes #2581
2020-01-08 16:56:13 +01:00
Reuben Morais
1dfba839ea
Merge pull request #2637 from Jendker/patch-1
Remove information about 16 kHz only support

X-DeepSpeech: NOBUILD
2020-01-07 16:30:56 +00:00
Jędrzej Beniamin Orbik
7c5d37312e
Remove information about 16 kHz only support
Based on the discussion here: https://discourse.mozilla.org/t/inference-with-model-different-than-16khz/43217/17, models for data at sample rates other than 16 kHz can now be trained and used with the client.
2020-01-07 17:26:02 +01:00
lissyx
73278cf8d6
Merge pull request #2630 from lissyx/win-redist
Ensure vc_redist is documented
2020-01-03 17:14:45 +01:00
Alexandre Lissy
af8b64f3bc Ensure vc_redist is documented
Fixes #2606
2020-01-03 17:11:57 +01:00
Reuben Morais
8d1f52a677
Merge pull request #2629 from mychiux413/fix-axis-inversion-problem
fix axis inversion problem
2020-01-03 13:25:43 +01:00
Reuben Morais
e1d14eb9a9
Merge pull request #2560 from mychiux413/sparse_warp
[SpecAugment] Refactor sparse_image_warp for dynamic shape of spectrogram
2020-01-03 13:25:31 +01:00
Yi-Hua Chiu
6e2befacb2 swap freq <-> time 2020-01-03 11:14:28 +08:00
Yi-Hua Chiu
4133e620bd remove debugging leftover, add UNSUPPORTED note, just skip invertible error 2020-01-03 10:32:02 +08:00
lissyx
56eae497ce
Merge pull request #2615 from lissyx/tflite-prod-tests
Add TFLite prod tests
2020-01-02 17:49:45 +01:00
Alexandre Lissy
13d05c4a6f Run with fixed release model 2020-01-02 16:16:58 +01:00
Alexandre Lissy
89cd481d52 Add TFLite prod tests
Fixes #2614
2020-01-02 16:16:58 +01:00
lissyx
3cd79aecdf
Merge pull request #2628 from lissyx/bintray-readme
Re-enable Markdown small README for Bintray hosting
2020-01-02 16:11:54 +01:00
Alexandre Lissy
b90f80e7ed Re-enable Markdown small README for Bintray hosting 2020-01-02 16:11:19 +01:00
lissyx
48401e96fd
Merge pull request #2607 from lissyx/fix-silences
Don't OOV_SCORE on empty prefix
2020-01-02 15:07:47 +01:00
Tilman Kamp
242d70dc8c
Merge pull request #2625 from tilmankamp/swc_debug
Implements #2624 - SWC importer: CSV columns for article and speaker
2020-01-02 13:19:05 +01:00
Alexandre Lissy
f44d2ddeb9 Don't OOV_SCORE on empty prefix
Fixes #2579
2020-01-02 12:39:12 +01:00
Reuben Morais
6fa2babdfd
Merge pull request #2620 from mozilla/fix-js-readme
Fix JS package README
2020-01-02 10:22:33 +01:00
Reuben Morais
de5afc8871
Merge pull request #2626 from dabinat/mmap-readme-change
TRAINING.rst - Include exact command for getting mmap tool
2020-01-02 10:21:25 +01:00
Yi-Hua Chiu
c570cb670a sparse_warp can still occasionally raise an error even after millions of steps, so just recover from the invertible error during training; if the error is raised 3 times, training is aborted 2020-01-02 11:06:15 +08:00
dabinat
d1b8eaa402 TRAINING.rst - Include exact command for getting mmap tool 2019-12-31 14:20:25 -08:00
Tilman Kamp
259a60b7b1 Implements #2624 - SWC importer: CSV columns for article and speaker 2019-12-31 16:23:37 +01:00
Yi-Hua Chiu
fa41809a40 [MOD] change time_warping_para to 20 so the spectrogram does not sound too vague [FIX] make sure the invertible error is not raised after many epochs 2019-12-31 16:31:46 +08:00
Reuben Morais
3fd25badb5 Fix JS package README 2019-12-30 10:13:58 +01:00
Reuben Morais
85a61a3ab7
Merge pull request #2616 from KathyReid/patch-1
Update to specify which package libpthread is in
2019-12-24 08:03:36 +00:00
Kathy Reid
2a19c444d4
Update to specify which package libpthread is in
libpthread on Ubuntu (and presumably any other Debian derivative) is in the libpthread-stubs0-dev package; it took me a bit of digging to find it. Best, Kathy
2019-12-21 12:05:46 +11:00
lissyx
1d0035ce7f
Merge pull request #2613 from lissyx/fix-tflite-forget_bias
Set forget_bias=0 for static RNN implementation
2019-12-20 14:54:54 +01:00
Alexandre Lissy
5f003cfbd6 Set forget_bias=0 for static RNN implementation
Fixes #2612
2019-12-20 13:28:46 +01:00
Yi-Hua Chiu
3aedcc4222 [FIX] deprecate fixed-frequency-edge, which always has a chance of raising a tensor invertible error 2019-12-19 11:26:05 +08:00
Yi-Hua Chiu
72c09ebb38 [FIX] reversible error if dest_time == 0 2019-12-18 18:22:38 +08:00
Yi-Hua Chiu
533d15645f [FIX] constraint time_warping_para to protect short audio augment 2019-12-18 14:26:39 +08:00
Reuben Morais
551b3dd5f5
Merge pull request #2599 from mozilla/pypi-tflite-windows
Also upload Windows TFLite Python package to PyPI
2019-12-13 18:36:04 +01:00
Reuben Morais
83338541ae Also upload Windows TFLite Python package to PyPI
X-DeepSpeech: NOBUILD
2019-12-13 18:21:41 +01:00
Reuben Morais
dd645d5a06
Merge pull request #2593 from mozilla/make-new-alpha
Bump version to v0.6.1-alpha.0
2019-12-13 17:09:57 +01:00
Reuben Morais
d3d337c10e Bump version to v0.6.1-alpha.0 2019-12-13 15:57:48 +01:00
lissyx
a0d01a5186
Merge pull request #2597 from lissyx/android-emulator
Use Xvfb for emulator
2019-12-13 15:39:04 +01:00
Alexandre Lissy
399a4f76e1 Run emulator under xvfb 2019-12-13 14:12:17 +01:00
Reuben Morais
2675cfc6fc
Merge pull request #2598 from mozilla/fix-tests-ldc93s1
Use data/smoke_test in tests to avoid depending on LDC servers
2019-12-13 13:53:34 +01:00
Reuben Morais
fa8061022e Use data/smoke_test in tests to avoid depending on LDC servers 2019-12-13 12:48:05 +01:00
lissyx
7791f6245c
Merge pull request #2596 from thecodrr/master
Added third-party bindings for Vlang
2019-12-12 10:04:43 +01:00
Abdullah Atta
1c2a153841
added vspeech third-party bindings 2019-12-12 04:42:03 +05:00
Reuben Morais
3ea3a58f92
Merge pull request #2574 from mozilla/move-examples
Move examples to separate repository (Fixes #2564)
2019-12-10 19:14:35 +01:00
Reuben Morais
2471cf709d Download examples repository in Windows tasks 2019-12-10 16:25:01 +01:00
Reuben Morais
31991ff90c Remove individual example links from main README
X-DeepSpeech: NOBUILD
2019-12-10 16:25:01 +01:00
Reuben Morais
808b154ef9 Use submodule for building contrib examples into docs 2019-12-10 16:25:01 +01:00
Reuben Morais
bce5544595 Build WPF example from examples repo 2019-12-10 16:25:01 +01:00
Reuben Morais
a5deaa5f48 Check out and point to external examples repo in automation 2019-12-10 16:25:01 +01:00
Reuben Morais
5a0adc8846 Remove example code 2019-12-10 16:25:00 +01:00
Reuben Morais
911743a0b8
Merge pull request #2591 from mozilla/revert-2548-net-streams
Revert "Multi-stream support .NET"
2019-12-10 16:16:16 +01:00
Reuben Morais
03a822b670
Revert "Multi-stream support .NET" 2019-12-10 14:04:39 +01:00
Reuben Morais
35e04d383e
Merge pull request #2586 from mozilla/tflite-packages
Build and publish Python TFLite package as deepspeech-tflite
2019-12-09 19:38:36 +01:00
Reuben Morais
9e48845c51
Merge pull request #2585 from mozilla/package-readmes
Use simple README pointing to GitHub for JS/Python packages
2019-12-09 19:37:08 +01:00
Reuben Morais
a9fff3f866
Merge pull request #2548 from carlfm01/net-streams
Multi-stream support .NET
2019-12-06 19:48:10 +00:00
Reuben Morais
53fcfd5096 Build and publish Python TFLite package as deepspeech-tflite 2019-12-06 19:16:58 +01:00
Reuben Morais
c8a57b192c Use simple README pointing to GitHub for JS/Python packages 2019-12-06 17:33:00 +01:00
Reuben Morais
797ff4f8a9
Update example links in README
X-DeepSpeech: NOBUILD
2019-12-05 16:27:57 +00:00
Reuben Morais
8d87606f00
Update example links in README
X-DeepSpeech: NOBUILD
2019-12-05 16:22:30 +00:00
Reuben Morais
0427c1572a
Merge pull request #2582 from mozilla/intermediate-decode-docs
Remove outdated mention of DS_IntermediateDecode being expensive to call
2019-12-05 14:44:51 +00:00
Reuben Morais
13fdfee844 Remove outdated mention of DS_IntermediateDecode being expensive to call
X-DeepSpeech: NOBUILD
2019-12-05 15:44:06 +01:00
Reuben Morais
ae7b455c51
Merge pull request #2565 from JRMeyer/docs
Remove bad formatting in documentation

X-DeepSpeech: NOBUILD
2019-12-04 15:45:29 +00:00
Reuben Morais
b0beb1fefa
Merge pull request #2568 from mozilla/warn-fractional-window-size
Error early if audio sample rate and feature window/step length are invalid (Fixes #2323)
2019-12-04 15:44:22 +00:00
Reuben Morais
122b0f2d98
Merge pull request #2573 from mozilla/examples-v0.6.0
Pin examples to 0.6.0
2019-12-04 15:43:52 +00:00
Reuben Morais
1021ee10ed Pin examples to 0.6.0 2019-12-04 15:53:09 +01:00
lissyx
146ab7ae00
Merge pull request #2571 from pietrop/master
Update Readme.md
2019-12-04 15:32:32 +01:00
Pietro
62103fc505
Update Readme.md 2019-12-04 14:27:25 +00:00
Reuben Morais
240646b708 Error early if audio sample rate and feature window/step length are invalid 2019-12-04 11:06:16 +01:00
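For context, a minimal sketch of the kind of early validation this commit describes: the helper name and error wording below are hypothetical, but the idea is to reject feature window/step lengths that don't map to a whole number of samples at the given sample rate.

```python
def validate_feature_params(sample_rate_hz, window_ms, step_ms):
    # Fail fast if the feature window/step length does not map to a whole
    # number of samples at this sample rate (hypothetical helper; the real
    # check lives in the training configuration code).
    for name, length_ms in (("window", window_ms), ("step", step_ms)):
        samples = sample_rate_hz * length_ms / 1000
        if samples != int(samples):
            raise ValueError(
                "feature %s length of %s ms is %.2f samples at %d Hz; "
                "choose values that yield a whole number of samples"
                % (name, length_ms, samples, sample_rate_hz)
            )


validate_feature_params(16000, 32, 20)    # ok: 512 and 320 samples
# validate_feature_params(22050, 32, 20)  # raises: 705.6 samples per window
```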
Reuben Morais
b4ffbfa307
Merge pull request #2559 from mozilla/docs-master-warning
Warn about master docs applying to master only
2019-12-04 09:44:16 +00:00
Yi-Hua Chiu
5fbc7e8596 [FIX] use time_warping_para to constrain control width, add logic to disable cache [ADD] num_control_points 2019-12-04 10:25:36 +08:00
josh meyer
e7c70a67be remove bad ticks 2019-12-03 13:42:35 -08:00
Reuben Morais
6d43e213a7 Retry tag on TaskCluster 2019-12-03 18:12:34 +01:00
Reuben Morais
e6b5d227c4
Merge pull request #2562 from mozilla/bump-v0.6.0
Bump version to v0.6.0
2019-12-03 17:07:39 +00:00
Reuben Morais
c11cbd4b4b Bump version to v0.6.0 2019-12-03 16:46:28 +01:00
Reuben Morais
b2c4de3cb0 Update docs to refer to v0.6.0 release links 2019-12-03 16:46:28 +01:00
Reuben Morais
d79166514a Merge branch 'update-prod-models' (Fixes #2561) 2019-12-03 16:45:39 +01:00
Reuben Morais
a45706c40d Drop support for Python 3.4 as it is EOL and no longer builds on macOS 2019-12-03 16:44:21 +01:00
Reuben Morais
2aab67f520 Update prod model expected inference results 2019-12-03 15:28:12 +01:00
Reuben Morais
551570616b Reduce training task from 399 epochs to 220, enough to overfit LDC93S1 2019-12-03 13:07:15 +01:00
Reuben Morais
1f98373274 Update examples model asset for v0.6 release 2019-12-03 12:15:05 +01:00
Reuben Morais
9b3d2c1c37 Update prod model assets for v0.6 release 2019-12-03 12:10:07 +01:00
Yi-Hua Chiu
ec0ee65eb0 [FIX] prevent random_uniform generate from x to x 2019-12-03 18:40:52 +08:00
Yi-Hua Chiu
9b3e4aa9d3 don't touch original lint 2019-12-03 18:28:24 +08:00
Yi-Hua Chiu
450483c30b clean code 2019-12-03 18:28:24 +08:00
Yi-Hua Chiu
368f0d413a sparse image warp to dynamic tensor 2019-12-03 18:28:24 +08:00
Yi-Hua Chiu
271a58e464 prepare files for refactoring 2019-12-03 18:28:24 +08:00
Reuben Morais
34d4fb7b96 Warn about master docs applying to master only
X-DeepSpeech: NOBUILD
2019-12-02 17:55:01 +01:00
Kelly Davis
0ab348878c
Merge pull request #2558 from mozilla/readthedocs
Updated old docs in prep for 0.6.0
2019-12-02 15:02:55 +01:00
kdavis-mozilla
3d7dc179e9 Addressed review comments 2019-12-02 14:11:31 +01:00
kdavis-mozilla
f75b9cc926 Updating Geometry 2019-12-02 11:04:27 +01:00
kdavis-mozilla
7d96540d66 Updating Introduction 2019-12-02 08:13:38 +01:00
Carlos Fonseca M
d024c333bc Multi-stream support .NET
Adds multi-stream support for the .NET client using the same acoustic model.
2019-11-29 08:09:17 -06:00
Carlos Fonseca M
4c6a95db39 Remove unused members
FreeString and FreeMetadata are both for private use only.
2019-11-29 08:09:17 -06:00
Reuben Morais
b6bc46f3fb
Merge pull request #2554 from mozilla/disable-cache-to-memory
Disable caching features to memory
2019-11-29 12:17:21 +00:00
Reuben Morais
e3b1b5fd42 Disable caching features to memory 2019-11-29 10:17:15 +01:00
Reuben Morais
271e3639a7
Merge pull request #2555 from mozilla/ubuntu-16.04
Remove ubuntu-advantage-tools package to work around ESM repository 401 problem
2019-11-29 07:33:14 +00:00
Reuben Morais
a4b6431e2f Remove ubuntu-advantage-tools package to work around ESM repository availability 2019-11-28 22:25:54 +01:00
lissyx
caf039bcf8
Merge pull request #2553 from lissyx/fix-examples-link
Link to proper README for examples/
2019-11-27 13:47:25 +01:00
Alexandre Lissy
c1c038bbdf Link to proper README for examples/
Fixes #2552
2019-11-27 13:44:27 +01:00
Reuben Morais
d925e6b5fc
Merge pull request #2528 from mozilla/filter-lm
Filter LM by removing very rare words
2019-11-26 13:16:15 +00:00
lissyx
670d24ae03
Merge pull request #2546 from aayagar001/master
Fixed store_true in --json
2019-11-24 16:27:42 +01:00
aayagar001
2057305df7
Fixed store_true in --json 2019-11-24 20:31:24 +05:30
lissyx
abe9583b16
Merge pull request #2544 from lissyx/electron-7.1
Support ElectronJS v7.1.2
2019-11-24 15:45:32 +01:00
Alexandre Lissy
dde09757d4 Support ElectronJS v7.1.2
Fixes #2543
2019-11-24 14:23:37 +01:00
Tilman Kamp
b2338368f4
Merge pull request #2541 from tilmankamp/fixexeflag
Fix: Added executable flag to DeepSpeech.py again
2019-11-21 14:02:41 +01:00
Tilman Kamp
a160dc72c4 Fix: Added executable flag to DeepSpeech.py again 2019-11-21 13:08:53 +01:00
Tilman Kamp
f3d69147fe
Merge pull request #2538 from tilmankamp/transcribe
Tool for bulk transcription
2019-11-21 13:05:08 +01:00
Tilman Kamp
29528ed7b7 Separate process per file; less log noise 2019-11-20 17:29:13 +01:00
lissyx
a8081692a4
Merge pull request #2524 from aayagar001/master
Added support of --json timestamp in python client.py
2019-11-20 13:43:10 +01:00
Tilman Kamp
c24c510fd9 Tool for bulk transcription 2019-11-18 16:03:03 +01:00
Reuben Morais
381faaf6b6 Switch to --prune 0 0 1 model and move generation code to a script 2019-11-15 13:28:45 +01:00
lissyx
f7ee7e4995
Merge pull request #2535 from lissyx/ensure-tcworkdir-cleanup
Do not fail cleanup if some cache dir is missing on macOS
2019-11-14 17:06:38 +01:00
Alexandre Lissy
c04dd6798b Do not fail cleanup if some cache dir is missing on macOS 2019-11-14 15:52:36 +01:00
lissyx
8242bb4751
Merge pull request #2534 from lissyx/doc-line-refs
Doc line refs
2019-11-14 11:20:59 +01:00
Alexandre Lissy
3a068f3fac Update example line numbers
Fixes #2533

X-DeepSpeech: NOBUILD
2019-11-14 11:14:07 +01:00
Alexandre Lissy
0ad6c10e8c Update Python example line numbers 2019-11-14 11:13:42 +01:00
Alexandre Lissy
b1d1ef6450 Update JavaScript example line numbers 2019-11-14 10:55:06 +01:00
Alexandre Lissy
849ed712e1 Update Java example line numbers 2019-11-14 10:53:25 +01:00
Alexandre Lissy
20c4ced80b Update C example line numbers 2019-11-14 09:53:07 +01:00
lissyx
b3787ee188
Merge pull request #2532 from lissyx/bump-v0.6.0-alpha.15
Bump VERSION to 0.6.0-alpha.15
2019-11-14 09:19:19 +01:00
Alexandre Lissy
1bf246dfab Bump VERSION to 0.6.0-alpha.15 2019-11-14 09:18:30 +01:00
lissyx
dab5183a98
Merge pull request #2527 from lissyx/pyenv-virtualenv-cache
Don't fail on existing pyenv virtualenv symlink
2019-11-13 18:43:57 +01:00
Reuben Morais
ad2769f479 Filter LM by removing very rare words 2019-11-13 17:38:40 +01:00
Alexandre Lissy
024f8e4ddf Avoid force-reinstalling existing pyenv virtualenv 2019-11-13 17:02:55 +01:00
lissyx
f18643b1b9
Merge pull request #2525 from carlfm01/wpf-mvvm
Move WPF example to MVVM
2019-11-13 15:41:25 +01:00
lissyx
c5935644a1
Merge pull request #2522 from lissyx/package_as_zip
Support packaging as Zip file
2019-11-13 13:18:45 +01:00
Reuben Morais
1eaec6eb5e
Merge pull request #2521 from mozilla/utf8
UTF-8 target
2019-11-13 11:24:45 +00:00
Carlos Fonseca M
42e533016c Move WPF example to MVVM 2019-11-13 04:25:05 -06:00
Alexandre Lissy
fe6230020c Support packaging as Zip file 2019-11-13 11:16:11 +01:00
Reuben Morais
7cd8e20045 Re-export model used by examples with UTF-8 trie 2019-11-13 10:37:50 +01:00
aayagar001
c07e0a9208
Fixed PEP8 standard 2019-11-13 11:45:52 +05:30
aayagar001
7530543286
Added --json timestamp support in python client 2019-11-13 11:30:14 +05:30
aayagar001
c19ff7d3f1
Merge pull request #1 from aayagar001/json-timestamp
client.py for supporting --json argument for timestamp info
2019-11-13 11:17:25 +05:30
aayagar001
5b74d9b1ce
client.py for supporting --json argument for timestamp info
Added function to convert metadata info into timestamp based json.
2019-11-13 11:10:52 +05:30
Reuben Morais
b05b48a0df Force build 2019-11-12 22:38:55 +01:00
Reuben Morais
d2eb305b73 Address review comment and add missing check for presence of scorer 2019-11-12 21:56:42 +01:00
Reuben Morais
b70de5a9ba
Merge pull request #2509 from bprfh/patch-2
Add README.rst to example Folder
2019-11-12 13:02:12 +00:00
Reuben Morais
da8a3e546f
Adjust title formatting 2019-11-12 13:01:54 +00:00
Reuben Morais
0e6952c3a8 Avoid reconstructing strings twice on decode 2019-11-11 12:53:05 +01:00
Reuben Morais
c1b1a59423 Score prefix as soon as a grapheme is formed rather than 1 byte later 2019-11-11 12:52:48 +01:00
Reuben Morais
f4cdd988df UTF-8 target 2019-11-11 11:36:16 +01:00
lissyx
b1093b6cd3
Merge pull request #2516 from lissyx/fix-macos-cleanup
Fix cleanup on macOS
2019-11-08 18:03:13 +01:00
lissyx
6a9222ba2b
Merge pull request #2511 from bprfh/patch-1
Better instructions for example/vad_transcriber
2019-11-08 18:01:59 +01:00
Reuben Morais
acbf6583ea
Merge pull request #2514 from mozilla/remove-first-party-common
Remove first party code from common.a
2019-11-08 16:54:29 +00:00
bprfh
26f20f989d
Removed wrong bug note, spelling, wording. 2019-11-08 17:51:46 +01:00
Alexandre Lissy
3d3b8f1f13 Fix cleanup on macOS 2019-11-08 16:59:53 +01:00
Reuben Morais
ae705cf95f Remove first party code from common.a 2019-11-08 14:33:06 +01:00
lissyx
179d4d6f31
Merge pull request #2510 from bprfh/patch-3
Add links to examples to README.rst
2019-11-07 19:42:40 +01:00
bprfh
0f475f70a4
Moved Examples list
Moved the examples list between "Using a Pre-trained Model" and "Training your own Model" as requested
2019-11-07 19:40:00 +01:00
bprfh
65d81add96
Fixed the title markdown as requested
Changed wrong title markdown from "===============" to "=====", as requested.
2019-11-07 19:21:31 +01:00
bprfh
1f3f3cbf53
Add links to examples to README.rst
Show examples on the first page, so people can easily find them.
2019-11-07 16:34:21 +01:00
bprfh
0ddc83c707
Give a short description on what is in the folder
Give a short description and overview of the content of the folder, so we can link from the main README.rst
2019-11-07 16:31:54 +01:00
lissyx
d754e53f21
Merge pull request #2507 from lissyx/bump-v0.6.0-alpha.14
Bump VERSION to 0.6.0-alpha.14
2019-11-07 07:29:04 +01:00
Alexandre Lissy
c7816ee5dd Bump VERSION to 0.6.0-alpha.14 2019-11-07 07:27:18 +01:00
lissyx
23e72a6074
Merge pull request #2504 from lissyx/node-13
Add NodeJS v13
2019-11-07 06:34:19 +01:00
Alexandre Lissy
de71d6559d Split NodeJS testing per-arch/system
Fixes #2497
2019-11-06 14:35:00 +01:00
Alexandre Lissy
953bee9381 Add NodeJS v13
Fixes #2501
2019-11-06 14:35:00 +01:00
lissyx
43a187aecf
Merge pull request #2500 from lissyx/spurious-workspace
Improve spurious rebuild checks
2019-11-06 14:34:46 +01:00
Tilman Kamp
31b36c76b3
Merge pull request #2505 from tilmankamp/fixtudaswc
Fix: ms per char minimum for SWC and TUDA importers
2019-11-06 13:29:20 +01:00
Tilman Kamp
343d07173f Fix: ms per char minimum for SWC and TUDA importers 2019-11-06 13:14:55 +01:00
bprfh
5eb47053e7
Added clearer instructions for setup and errors
The tutorial is missing a few setup steps and doesn't mention some common errors.
I also tried to format it better.
2019-11-06 12:02:48 +01:00
Alexandre Lissy
ddb88e18e4 Improve spurious rebuild checks 2019-11-06 08:17:01 +01:00
lissyx
80493c83c3
Merge pull request #2498 from lissyx/bump-v0.6.0-alpha.13
Bump VERSION to 0.6.0-alpha.13
2019-11-05 16:22:49 +01:00
Alexandre Lissy
f5708a99d5 Bump VERSION to 0.6.0-alpha.13 2019-11-05 16:22:15 +01:00
lissyx
3d03353e64
Merge pull request #2495 from lissyx/electronjs7
Add ElectronJS v7.0
2019-11-05 15:15:52 +01:00
Reuben Morais
af5d18cf29
Merge pull request #2481 from mozilla/embed-alphabet
Embed alphabet in model file
2019-11-05 13:37:19 +00:00
Alexandre Lissy
b152802fd1 Add ElectronJS v7.0
Fixes #2494
2019-11-05 12:26:52 +01:00
Reuben Morais
10c652b420 Document serialization format 2019-11-05 09:15:18 +01:00
Reuben Morais
bd6a9d03b1 Use model from Python 3.6 training run 2019-11-05 09:10:10 +01:00
Reuben Morais
b8ebf9011b Address review comments 2019-11-05 09:10:10 +01:00
Reuben Morais
34314767f7 Fix prod model tests 2019-11-05 09:10:09 +01:00
Reuben Morais
3fdc7d422d Remove alphabet param usage 2019-11-05 09:02:42 +01:00
Reuben Morais
8c82081779 Embed alphabet directly in model 2019-11-05 09:02:21 +01:00
lissyx
493aaed151
Merge pull request #2493 from lissyx/bump-v0.6.0-alpha.12
Bump VERSION to 0.6.0-alpha.12
2019-11-05 09:00:15 +01:00
Alexandre Lissy
e1be01b1d6 Bump VERSION to 0.6.0-alpha.12 2019-11-05 08:57:31 +01:00
lissyx
1e400760b2
Merge pull request #2491 from lissyx/tc-community
Move to TC Community
2019-11-05 08:48:57 +01:00
Alexandre Lissy
2c898d92cb Move to TC Community 2019-11-05 07:42:39 +01:00
lissyx
1089b59e72
Merge pull request #2486 from djmitche/bug1574659
Bug 1574659 - migrate from taskcluster.net to community-tc
2019-11-04 20:14:27 +01:00
lissyx
8235dd2a4e
Merge pull request #2492 from lissyx/fix-js-doc
Update JS doc for changed API
2019-11-04 20:07:26 +01:00
Alexandre Lissy
8cbcf1da3c Update JS doc for changed API 2019-11-04 20:06:47 +01:00
lissyx
c1da680ae9
Merge pull request #2483 from lissyx/alphabet-consistency
Check unicode normalization
2019-11-04 11:43:52 +01:00
Alexandre Lissy
489dbad3a4 Check unicode normalization 2019-11-04 11:41:13 +01:00
lissyx
6929fee2d3
Merge pull request #2482 from lissyx/fix-eval-tflite
Update evaluate_tflite with wav_filename
2019-11-04 10:55:43 +01:00
Dustin J. Mitchell
c86eca944f remove unnecessary lowest-priority scope 2019-11-03 03:57:57 +00:00
Dustin J. Mitchell
bd12eacafa include /api/ in community-tc URLs 2019-11-03 03:54:18 +00:00
Miles Crabill
3501ce15c2
update taskcluster.yml for community-tc 2019-11-01 14:14:03 -07:00
Miles Crabill
27efcf470a
swap taskcluster.net references for community-tc.services.mozilla.com 2019-11-01 14:12:50 -07:00
Reuben Morais
c746361d4a
Merge pull request #2480 from tilmankamp/fixmailab
Relative paths in M-AILAB importer
2019-11-01 09:56:03 +00:00
Reuben Morais
19efb47a45
Merge pull request #2479 from tilmankamp/keepcolons
Removing exclamation-marks, colons and semi-colons from labels
2019-11-01 09:53:37 +00:00
Alexandre Lissy
f3240bffbc Update evaluate_tflite with wav_filename 2019-10-31 17:31:29 +01:00
Tilman Kamp
96a720c597 Relative paths in M-AILAB importer 2019-10-30 16:18:14 +01:00
Tilman Kamp
d38a3f13f7 Removing exclamation-marks, colons and semi-colons from labels 2019-10-30 16:14:58 +01:00
Tilman Kamp
0ba549b83f
Merge pull request #2478 from tilmankamp/fixmailab
Fix for empty skip list case; making linter happy
2019-10-30 13:08:45 +01:00
Tilman Kamp
df1df83720 Fix for empty skip list case; making linter happy 2019-10-30 12:59:16 +01:00
Reuben Morais
31ec7a71f2 Fix undefined variable when saving test samples to --test_output_file
X-DeepSpeech: NOBUILD
2019-10-30 10:30:00 +01:00
Reuben Morais
62d592fc1e
Merge pull request #2475 from mozilla/faster-startup
Improve training startup time
2019-10-29 12:41:42 +00:00
Reuben Morais
b39da7f8b7 Improve training startup time 2019-10-29 12:47:34 +01:00
Reuben Morais
02a1cc0cbf
Merge pull request #2473 from safa0/add-alias-for-flags-to-match-deepspeech-module-flags
Add aliases for lm, trie, alphabet in util/flags.py
2019-10-29 11:44:42 +00:00
Sam Safaei
4138a2571a added aliases for trie, alphabet, lm_binary in util/flags to match module flags
This matches $deepspeech --lm --trie --alphabet
with DeepSpeech.py and evaluate.py, which use other names for the same flags
2019-10-29 00:23:04 +01:00
Reuben Morais
26d10a5df3
Merge pull request #2471 from tilmankamp/exeflag
Added executable flag to some importers
2019-10-28 16:32:08 +00:00
Tilman Kamp
cf6245847f Added executable flag to some importers 2019-10-28 14:33:40 +01:00
Tilman Kamp
36b510221d
Merge pull request #2470 from tilmankamp/cvfixes
Making sample paths relative; additional sub-sets
2019-10-28 14:26:03 +01:00
Tilman Kamp
cef7c45f03 Making sample paths relative; additional sub-sets 2019-10-28 12:25:12 +01:00
lissyx
3a2eb28983
Merge pull request #2469 from vinhngx/amp-doc
adding amp doc
2019-10-28 08:05:50 +01:00
Vinh Nguyen
b105640d28 adding amp doc 2019-10-27 23:29:05 +00:00
lissyx
f3694efbca
Merge pull request #2467 from lissyx/bump-v0.6.0-alpha.11
Bump VERSION to 0.6.0-alpha.11
2019-10-26 12:03:43 +02:00
Alexandre Lissy
3889739a9b Bump VERSION to 0.6.0-alpha.11 2019-10-26 12:02:32 +02:00
lissyx
969146c908
Merge pull request #2462 from lissyx/py38
Build Python 3.8 wheels
2019-10-25 19:15:40 +02:00
Tilman Kamp
1af17c9e2d
Merge pull request #2464 from tilmankamp/tuda_importer
TUDA importer
2019-10-25 16:59:33 +02:00
Tilman Kamp
2cdfcff4c6 TUDA importer 2019-10-25 14:57:25 +02:00
Alexandre Lissy
f80dbcda75 Build Python 3.8 wheels
Fixes #2461
2019-10-25 12:28:27 +02:00
Reuben Morais
44a605c8b7
Merge pull request #2435 from mozilla/uplift-utf8-fixes
Uplift general fixes from UTF-8 work
2019-10-25 09:09:48 +00:00
Reuben Morais
1d86469b00
Merge pull request #2460 from mozilla/fix-evaluate-py
Avoid using references to the same object in sparse_tuple_to_text
2019-10-24 16:14:16 +00:00
Tilman Kamp
ede4dd6f93
Merge pull request #2459 from tilmankamp/import_swc
Spoken Wikipedia importer
2019-10-24 15:35:49 +02:00
Tilman Kamp
4be08fa6d3 Removed dutch ij digraph from normalization blacklist 2019-10-24 12:43:38 +02:00
Reuben Morais
68251d6944 Avoid using references to the same object in sparse_tuple_to_text 2019-10-24 10:17:18 +02:00
Tilman Kamp
122a007d33 Linter induced changes 2019-10-23 16:47:50 +02:00
Tilman Kamp
010f24578f Better alphabet access 2019-10-23 15:10:08 +02:00
lissyx
bde5c31fd4
Merge pull request #2458 from lissyx/doc-dotnet
Add .Net Framework API doc
2019-10-23 14:28:43 +02:00
Tilman Kamp
3424ab2b5d Spoken Wikipedia importer 2019-10-23 14:22:37 +02:00
Reuben Morais
d35107acdb
Merge pull request #2453 from mozilla/expose_cutoff
Expose cutoff_prob and cutoff_top_n as flags
2019-10-23 12:21:36 +00:00
Reuben Morais
ca401b0813
Merge pull request #2454 from mozilla/encapsulate-alphabet
Add Alphabet.encode analog to .decode and better encapsulate implementation details
2019-10-23 12:21:02 +00:00
Alexandre Lissy
60cec3722f Add .Net Framework API doc
Fixes #2457
2019-10-23 14:11:45 +02:00
Reuben Morais
f0688ec941 Add Alphabet.encode analog to .decode and better encapsulate implementation details 2019-10-23 11:22:56 +02:00
Reuben Morais
707281ce31
Merge pull request #2455 from Murcurio/patch-3
Update Dockerfile

X-DeepSpeech: NOBUILD
2019-10-23 09:18:09 +00:00
Reuben Morais
12baf5ffbc Expose cutoff_prob and cutoff_top_n as flags 2019-10-23 11:15:23 +02:00
Murcurio
4aa52738aa
Update Dockerfile
Added a line to initialise git-lfs before cloning the repo; without this command, lm.binary doesn't pull. If we want this to be version-specific, it might also be worth doing a git checkout <version>
2019-10-23 13:31:31 +11:00
Reuben Morais
6e287bd340
Merge pull request #2447 from Murcurio/patch-2
Use explicit encoding when opening files in import_cv2.py
2019-10-22 10:12:22 +00:00
lissyx
469ddd2cf7
Merge pull request #2448 from lissyx/fix-ctc-leak
Use std::shared_ptr instead of raw pointer for dictionary_
2019-10-18 11:59:08 +02:00
Alexandre Lissy
ef3f8004ce Use std::shared_ptr instead of raw pointer for dictionary_
Fixes #2403
2019-10-18 10:15:59 +02:00
Murcurio
9055d49b47
Update import_cv2.py
Requires UTF-8 encoding; without this it tries to read the file as ASCII and fails
2019-10-18 12:49:33 +11:00
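A minimal sketch of the explicit-encoding pattern this change applies, assuming a Common Voice-style TSV with a `sentence` column; the path and column name here are illustrative, not the actual importer code.

```python
import csv
import io

# Hypothetical TSV path; the point is the explicit encoding argument.
tsv_path = "clips/train.tsv"

# Without encoding="utf-8", Python may fall back to the locale default
# (often ASCII in minimal environments) and fail on non-ASCII transcripts.
with io.open(tsv_path, mode="r", encoding="utf-8") as fin:
    for row in csv.DictReader(fin, delimiter="\t"):
        print(row["sentence"])  # column name as used by Common Voice exports
```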
lissyx
336daa1641
Merge pull request #2445 from lissyx/bump-v0.6.0-alpha.10
Bump VERSION to 0.6.0-alpha.10
2019-10-17 08:54:21 +02:00
Alexandre Lissy
2cbc79fb8a Bump VERSION to 0.6.0-alpha.10 2019-10-17 08:53:38 +02:00
Reuben Morais
6eebdda7c5
Merge pull request #2437 from mozilla/update-brew
Update Homebrew to 2.1.14
2019-10-16 17:29:08 +00:00
lissyx
82fcaa5b23
Merge pull request #2439 from lissyx/tflite_metadata
Store graph version on TFLite
2019-10-16 18:35:21 +02:00
Alexandre Lissy
1939f74ec0 Store graph version on TFLite 2019-10-16 15:36:45 +02:00
Reuben Morais
4d5c9d1868 Point to updated TensorFlow artifacts 2019-10-16 11:30:51 +02:00
Reuben Morais
a902d7b343 Update Homebrew to 2.1.14 2019-10-16 11:29:39 +02:00
Reuben Morais
cd2e8c8947
Merge pull request #2438 from mozilla/debugging-utils
Add some debugging helpers behind a preprocessor flag
2019-10-15 14:54:25 +00:00
Reuben Morais
ef3bdb2540 Merge PR #2434 - Add flag for automatic mixed precision training 2019-10-15 13:45:19 +02:00
Reuben Morais
31922cb3dc Clarify docs and fix linter 2019-10-15 13:44:09 +02:00
Reuben Morais
31d81740ee Add debugging helpers to PathTrie 2019-10-15 12:49:38 +02:00
Reuben Morais
83a89dcae6 Add debugging code to trie_load.cc 2019-10-15 12:43:03 +02:00
lissyx
818ea6a40f
Merge pull request #2433 from lissyx/cli_lm
Expose beam width, lm_alpha and lm_beta in CLI args
2019-10-14 21:22:03 +02:00
Reuben Morais
abc687b3b9 Specify BOS for final scoring at decode if applicable 2019-10-14 21:03:14 +02:00
Reuben Morais
c8802a38e7 Don't add special tokens to vocabulary 2019-10-14 21:03:14 +02:00
Reuben Morais
3015237e8d Replace incomplete sorts with partial sorts 2019-10-14 21:03:14 +02:00
Reuben Morais
739841d731 Respect FLAGS.load in evaluate.py 2019-10-14 21:03:14 +02:00
Reuben Morais
e0d8ef75e8 Use try_loading and FLAGS.load in --one_shot_infer code 2019-10-14 21:03:14 +02:00
Reuben Morais
e75d1e4b61 Respect --test_output_files from DeepSpeech.py 2019-10-14 21:03:14 +02:00
Alexandre Lissy
ef5ae5c0b4 Expose beam width, lm_alpha and lm_beta in CLI args 2019-10-14 15:07:59 +02:00
Vinh Nguyen
e0bd1423b5 adding automatic mixed precision training support 2019-10-14 12:34:29 +00:00
Vinh Nguyen
909fa60601 adding automatic mixed precision training support 2019-10-14 12:34:10 +00:00
lissyx
65a53aeb4a
Merge pull request #2432 from lissyx/update-doc-link
Update doc link for model compatibility
2019-10-12 17:14:33 +02:00
Alexandre Lissy
c15d9b4b8b Update doc link for model compatibility
X-DeepSpeech: NOBUILD
2019-10-12 17:13:59 +02:00
lissyx
cb3d1a3f36
Merge pull request #2431 from lissyx/update-doc-api
Add missing doc for new API
2019-10-12 17:05:11 +02:00
Alexandre Lissy
b536f5761f Add missing doc for new API
X-DeepSpeech: NOBUILD
2019-10-12 17:04:12 +02:00
Reuben Morais
49b8509c7a Add macOS debug symbol generation target for libdeepspeech
X-DeepSpeech: NOBUILD
2019-10-11 18:13:12 +02:00
lissyx
b9bb019f30
Merge pull request #2429 from mozilla/ctcdecode-debug-build
Debug build for decoder package
2019-10-11 18:06:19 +02:00
Reuben Morais
aad1b2234b Debug build for ds_ctcdecoder package 2019-10-11 18:01:02 +02:00
Reuben Morais
d9b79a6689
Merge pull request #2428 from mone27/patch-1
Set default param of skiplist to '' in M-AILABS importer
2019-10-11 17:07:22 +02:00
mone27
a867c919bd
Set default param of skiplist to ''
fixing issue found in https://github.com/MozillaItalia/DeepSpeech-Italian-Model/pull/21
2019-10-11 16:42:32 +02:00
lissyx
3d20350502
Merge pull request #2427 from lissyx/bump-v0.6.0-alpha.9
Bump VERSION to 0.6.0-alpha.9
2019-10-11 12:57:07 +02:00
Alexandre Lissy
156a16e330 Bump VERSION to 0.6.0-alpha.9 2019-10-11 12:56:22 +02:00
lissyx
a1fcb6e350
Merge pull request #2426 from mozilla/bump-graph-version
Bump graph version due to forget_bias change
2019-10-11 12:54:09 +02:00
Reuben Morais
7dc2be1a9b Point examples to ldc93s1 model with new graph version 2019-10-11 11:40:23 +02:00
Reuben Morais
fcbebbe71a
Merge pull request #2425 from mozilla/expose-sample-rate-value
Expose sample rate value in API and use it in in-tree consumers
2019-10-11 09:39:31 +02:00
Reuben Morais
ce0292b92c Bump graph version due to forget_bias change 2019-10-10 23:59:54 +02:00
Reuben Morais
673d620a67 Expose and use model sample rate in Java 2019-10-10 23:03:45 +02:00
Reuben Morais
4dc18dd8ee Expose and use model sample rate in .NET 2019-10-10 23:03:45 +02:00
lissyx
f94ed4c744
Merge pull request #2424 from lissyx/readme-links
Fix broken references to README.md
2019-10-10 22:05:31 +02:00
Reuben Morais
5cb15ca6ed Use model sample rate in examples 2019-10-10 22:04:33 +02:00
Alexandre Lissy
32e9b8cd3e Fix broken references to README.md
Fixes #2423

X-DeepSpeech: NOBUILD
2019-10-10 22:04:31 +02:00
Reuben Morais
0be2787e4e Expose and use model sample rate in JavaScript 2019-10-10 21:57:26 +02:00
Reuben Morais
afea2b4231 Expose and use model sample rate in Python 2019-10-10 21:50:15 +02:00
Reuben Morais
c1ed6d711d Use model sample rate in client.cc 2019-10-10 21:46:01 +02:00
Reuben Morais
0241f725cd Expose model sample rate in API 2019-10-10 21:45:33 +02:00
Reuben Morais
315a67bf69
Merge pull request #2420 from mozilla/remove-sr-param
Remove unused sample rate param from API
2019-10-10 21:05:29 +02:00
Reuben Morais
2b68c56025 Sync all the docs with sample rate changes
X-DeepSpeech: NOBUILD
2019-10-10 17:15:58 +02:00
Reuben Morais
9200b720c3 Remove sample rate parameter usage from concurrent streams test 2019-10-10 15:44:11 +02:00
Reuben Morais
998daa5bca Remove sample rate parameter usage from evaluate_tflite.py 2019-10-10 15:44:11 +02:00
Reuben Morais
baaa5842b2 Remove sample rate parameter usage from examples 2019-10-10 15:44:11 +02:00
Reuben Morais
11ad23cc1f Remove sample rate parameter usage from .NET binding 2019-10-10 15:44:11 +02:00
Reuben Morais
1007d93da2 Remove sample rate parameter usage from Java binding 2019-10-10 15:44:11 +02:00
Reuben Morais
385279bc20 Remove sample rate parameter usage from JavaScript binding 2019-10-10 15:44:11 +02:00
Reuben Morais
97bab38a7e Remove sample rate parameter usage from Python binding 2019-10-10 15:44:11 +02:00
Reuben Morais
abb11f040d Remove sample rate parameter usage from client.cc 2019-10-10 15:44:11 +02:00
Reuben Morais
2f4116695f Remove unused sample rate param from API 2019-10-10 14:34:00 +02:00
lissyx
42726b3612
Merge pull request #2409 from lissyx/electrons-6
Add ElectronJS v6.0
2019-10-10 14:33:34 +02:00
Alexandre Lissy
eab8bf5dec Add ElectronJS v6.0
Fixes #2408
2019-10-10 13:48:37 +02:00
lissyx
3026d8706b
Merge pull request #2417 from lissyx/tflite-post_training_quantize
Use TFLite optimizations flag
2019-10-09 14:55:32 +02:00
lissyx
747889d16c
Merge pull request #2419 from lissyx/tc-from-tag
Move default branch to current VERSION content instead of master
2019-10-09 14:54:50 +02:00
Alexandre Lissy
5e7679593b Move default branch to current VERSION content instead of master
Fixes #2418

X-DeepSpeech: NOBUILD
2019-10-09 14:29:14 +02:00
Alexandre Lissy
4f1f67f55d Use TFLite optimizations flag
Fixes #2415
2019-10-09 12:21:35 +02:00
Reuben Morais
f893dc8c23
Merge pull request #2413 from mozilla/issue2410
Set sample_rate attribute of ds_audio_buffer in NO_SOX client (Fixes #2410)
2019-10-09 10:15:18 +02:00
Reuben Morais
ce785534fe Set sample_rate attribute of ds_audio_buffer in NO_SOX client 2019-10-09 09:38:38 +02:00
lissyx
031479d88b
Merge pull request #2406 from lissyx/disable-cache-dataaug
Disable cache when data augmentation is set
2019-10-08 18:10:32 +02:00
Alexandre Lissy
c35068f880 Disable cache when data augmentation is set
Fixes #2396
2019-10-08 17:29:59 +02:00
Reuben Morais
8c1fd5b31e
Merge pull request #2407 from lissyx/fix-doc-links
Fix bogus cross-file links
2019-10-08 15:46:41 +02:00
Reuben Morais
fe4451f22b
Merge pull request #2405 from mozilla/fix-cpu-graph
Set forget_bias=0 in CPU graph for compatibility with CudnnRNN
2019-10-08 12:10:51 +02:00
Alexandre Lissy
3dae00b4ab Fix bogus cross-file links 2019-10-08 06:46:52 +02:00
Reuben Morais
5a287a65e5 Set forget_bias=0 in CPU graph for compatibility with CudnnRNN 2019-10-07 21:06:29 +02:00
Reuben Morais
fb611efd00
Merge pull request #2400 from carlfm01/remove-intptr
Replace structs with IntPtr .NET
2019-10-07 08:44:32 +02:00
Carlos Fonseca
0f826f6324 Add thread-safe close 2019-10-05 04:00:00 +00:00
Carlos Fonseca
acabb26378 Move structs to IntPtr 2019-10-05 03:58:40 +00:00
lissyx
30e0da9029
Merge pull request #2395 from lissyx/md-to-rst
Move from Markdown to reStructuredText
2019-10-04 14:11:40 +02:00
Alexandre Lissy
65c942efbb Update cardboardlint configuration 2019-10-04 13:56:41 +02:00
Alexandre Lissy
d1936c60b3 Refer to examples from doc
Fixes #2338
2019-10-04 12:07:32 +02:00
Alexandre Lissy
9ce8c24165 Move from Markdown to reStructuredText 2019-10-04 12:07:32 +02:00
lissyx
9ac8cebb3b
Merge pull request #2394 from lissyx/importers-slr-mailabs
Importers slr mailabs
2019-10-02 15:24:40 +02:00
Alexandre Lissy
0ac4df6f82 Add M-AILABS importer 2019-10-02 13:40:19 +02:00
Alexandre Lissy
e22f9787be Add SLR57 importer: African Accented French 2019-10-02 13:40:19 +02:00
lissyx
b888058e4e
Merge pull request #2392 from lissyx/fix-macos-cleanup
Improve macOS tc-workdir cleanup
2019-09-30 17:43:16 +02:00
Alexandre Lissy
0969b4f9b9 Improve macOS tc-workdir cleanup 2019-09-30 15:24:20 +02:00
Reuben Morais
d0a578221d
Merge pull request #2391 from mozilla/optional-lm-test
Make language model scoring optional in Python inference code
2019-09-30 13:35:26 +02:00
Reuben Morais
4302a5f767 Make language model scoring optional in Python inference code 2019-09-30 11:43:00 +02:00
Reuben Morais
14c0db7294 Merge branch 'remove-wrong-docs' 2019-09-30 10:56:54 +02:00
Reuben Morais
67e3eefb95 Remove incorrect docs on Scorer
X-DeepSpeech: NOBUILD
2019-09-30 10:54:59 +02:00
lissyx
f0e954183f
Merge pull request #2387 from lissyx/bump-v0.6.0-alpha.8
Bump VERSION to 0.6.0-alpha.8
2019-09-27 11:11:02 +02:00
Alexandre Lissy
8595f2a7bb Bump VERSION to 0.6.0-alpha.8 2019-09-27 11:10:10 +02:00
Reuben Morais
ba56407376
Merge pull request #2383 from mozilla/scorer-cleanup
Don't explicitly score the BOS token, and avoid copies when scoring sentences
2019-09-27 11:06:24 +02:00
Reuben Morais
a323973521 Address review comments 2019-09-27 11:02:22 +02:00
lissyx
f51f9d9704
Merge pull request #2386 from artie-inc/typo
Fixing typo s/StreamingContext/StreamingState/
2019-09-27 10:46:56 +02:00
lissyx
9f988de6ba
Merge pull request #2385 from lissyx/lm-lazy
Load KenLM with LAZY
2019-09-26 23:13:48 +02:00
Alexandre Lissy
86b44a7cb7 Load KenLM with LAZY
Fixes #2384
2019-09-26 21:08:27 +02:00
JRMeyer
fed5039cc7 Fixing typo s/StreamingContext/StreamingState/
X-DeepSpeech: NOBUILD
2019-09-26 12:05:37 -07:00
Reuben Morais
6dba6d4a95 Don't explicitly score the BOS token, and avoid copies when scoring sentences 2019-09-26 14:08:32 +02:00
lissyx
513c8e9ab7
Merge pull request #2381 from lissyx/rtd-badge
Put back ReadTheDocs badge
2019-09-24 18:28:00 +02:00
Alexandre Lissy
3a8d395729 Put back ReadTheDocs badge
X-DeepSpeech: NOBUILD
2019-09-24 18:27:23 +02:00
lissyx
5196fa6e9b
Merge pull request #2362 from lissyx/all-the-docs
All the docs
2019-09-24 18:23:07 +02:00
Alexandre Lissy
2b7ab99478 Fix pylint 2019-09-24 18:22:45 +02:00
Alexandre Lissy
bf7cc1df54 Sphinx doc 2019-09-24 18:22:45 +02:00
lissyx
f67818e7b0
Merge pull request #2379 from lissyx/bump-v0.6.0-alpha.7
Bump VERSION to 0.6.0-alpha.7
2019-09-24 13:38:52 +02:00
Alexandre Lissy
693648657f Bump VERSION to 0.6.0-alpha.7 2019-09-24 11:06:29 +02:00
Alexandre Lissy
33281c4aac Add TaskCluster documentation generation 2019-09-24 10:55:26 +02:00
Alexandre Lissy
6c4fa52e42 Re-enable readthedocs.io 2019-09-24 10:55:26 +02:00
Alexandre Lissy
458692692e Fix header preprocessor alignment 2019-09-24 10:55:26 +02:00
lissyx
ea1f2b1995
Merge pull request #2373 from lissyx/tflite_arm
Switch to TFLite for RPi3/4
2019-09-24 08:11:52 +02:00
Alexandre Lissy
4103247a21 Use TFLite runtime on RPi3/RPi4 2019-09-23 14:44:27 +02:00
Chirag Ahuja
58fdd55eea Augmentation Documentation (#2355)
Augmentation Documentation
2019-09-23 12:43:17 +02:00
Reuben Morais
7995e4230b
Merge pull request #2358 from mozilla/rename-probability-confidence
Rename metadata probability field to confidence
2019-09-23 12:41:51 +02:00
Reuben Morais
005b5a8c3b
Merge pull request #2366 from mozilla/issue2365
Allow specifying --branch when getting decoder URL
2019-09-23 11:59:54 +02:00
Reuben Morais
b2e3a43767
Merge pull request #2375 from rhamnett/master
Create import_vctk.py
2019-09-23 11:59:22 +02:00
Richard Hamnett
792d8e0a27
Create import_vctk.py 2019-09-21 20:46:55 +01:00
lissyx
ccf1b2e73e
Merge pull request #2372 from lissyx/bump-v0.6.0-alpha.6
Bump VERSION to 0.6.0-alpha.6
2019-09-19 17:20:00 +02:00
Alexandre Lissy
579925483b Bump VERSION to 0.6.0-alpha.6 2019-09-19 16:30:03 +02:00
lissyx
f98bfefc77
Merge pull request #2354 from lissyx/run-examples-taskcluster
Run examples on TaskCluster
2019-09-18 20:56:22 +02:00
Alexandre Lissy
5465747e37 Fix linter errors 2019-09-18 15:54:19 +02:00
Reuben Morais
d0e11c73cd Address review comments 2019-09-18 15:09:11 +02:00
Alexandre Lissy
b5a3e328da Update examples to run latest DeepSpeech
Fixes #2351
2019-09-17 20:19:09 +02:00
Alexandre Lissy
5ef0117df0 Run examples on TaskCluster
Fixes #2353
2019-09-17 20:19:09 +02:00
Reuben Morais
82a5b37073 Allow specifying --branch when getting decoder URL 2019-09-15 15:11:15 +02:00
Reuben Morais
2bf8161ca4 Merge branch 'pr-2361' (Fixes #2361) 2019-09-13 12:16:49 +02:00
Mahmoud Hashem
bfcc7e86e7 Expose flag to allow incremental GPU memory allocation 2019-09-13 12:13:31 +02:00
Reuben Morais
150fb67a02
Validate WAV header duration against file size
X-DeepSpeech: NOBUILD
2019-09-12 10:16:54 +00:00
Reuben Morais
0ac498bc50 Rename metadata probability field to confidence 2019-09-11 11:09:44 +02:00
Reuben Morais
fcb9bf6d9f
Also remove samples with noise
X-DeepSpeech: NOBUILD
2019-09-11 09:02:21 +00:00
Reuben Morais
90c2acd810
Merge pull request #2357 from mozilla/magicdata-importer
Add MAGICDATA importer
2019-09-11 10:29:44 +02:00
Reuben Morais
732d0b221d Add MAGICDATA importer 2019-09-11 10:06:44 +02:00
Reuben Morais
889f069b1c
Merge pull request #2352 from mozilla/data-augmentation-pr
Data augmentation PR
2019-09-09 21:29:02 +02:00
Reuben Morais
6b7ebf47f2
Merge pull request #2350 from mozilla/breaking-api-cleanup
[BREAKING] API cleanup
2019-09-09 21:28:43 +02:00
Reuben Morais
7abbe077d8 Add a #warning to deepspeech_compat.h
X-DeepSpeech: NOBUILD
2019-09-09 16:10:19 +02:00
Reuben Morais
b95ebea9ba Fix linter error 2019-09-09 13:47:26 +02:00
Reuben Morais
b6af8c5dc7 Remove some duplicated code 2019-09-09 12:20:16 +02:00
Reuben Morais
d051d4fd0e Remove sparse image warp, fix boolean flags type, rebase to master 2019-09-09 12:11:28 +02:00
Bernardo
0e4eed7be3 removing trailing space 2019-09-09 12:07:51 +02:00
Bernardo
b89fb04b97 space after comma 2019-09-09 12:07:51 +02:00
Bernardo Henz
49c6a9c973 adding 'train_phase' to create_dataset. Now we can augment only the training-set. 2019-09-09 12:07:51 +02:00
Bernardo Henz
0cc5ff230f -spectrogram augmentations 2019-09-09 12:07:51 +02:00
Bernardo Henz
5d5ef15ab7 -data-aug via additive and multiplicative noise in feature-space 2019-09-09 12:07:51 +02:00
Reuben Morais
9c92f909b3
Merge pull request #2346 from mozilla/save-flags
Save flag values next to checkpoints (Fixes #2345)
2019-09-09 12:07:13 +02:00
Reuben Morais
f8f6b33cce Update .NET examples 2019-09-09 12:03:14 +02:00
Reuben Morais
f4e57902ba Update Java bindings 2019-09-09 12:03:14 +02:00
Reuben Morais
a8c53d2154 Update .NET bindings and client 2019-09-09 12:03:14 +02:00
Reuben Morais
bc6741cd41 Update JS bindings and client 2019-09-09 12:03:14 +02:00
Reuben Morais
249fdadc32 Update Python bindings and client 2019-09-09 12:03:14 +02:00
Reuben Morais
a815426918 Update client.cc 2019-09-09 11:38:28 +02:00
Reuben Morais
61b9b0e84d Add convenience header for backwards compatibility 2019-09-09 11:36:17 +02:00
Reuben Morais
c402b971d6 Remove unused params and make function names more consistent 2019-09-09 11:35:44 +02:00
Reuben Morais
896ac9d6c7
Merge pull request #2343 from mozilla/readme-update
Update README
2019-09-06 17:03:42 +02:00
Reuben Morais
36403cb64b Add model download and extraction to initial example
X-DeepSpeech: NOBUILD
2019-09-06 14:02:31 +02:00
Reuben Morais
a9851c949a Address review comments 2019-09-06 14:01:58 +02:00
Reuben Morais
371e73eb69 Create checkpoint dir before writing flags file in it 2019-09-06 13:55:55 +02:00
Reuben Morais
ba2d29b36f Save flag values next to checkpoints 2019-09-06 11:35:10 +02:00
Reuben Morais
4a5d6dcf00 Address review comments
X-DeepSpeech: NOBUILD
2019-09-04 17:17:07 +02:00
Reuben Morais
cdd4530e66 Update README 2019-09-04 16:52:56 +02:00
Reuben Morais
935ede3f28
Merge pull request #2329 from mozilla/tf_upgrade_v2
Run tf_upgrade_v2 to ease eventual transition to TF 2.0
2019-08-28 22:00:23 +02:00
Reuben Morais
85d646350f Update name of audio ops package in TF 1.14/TF 2.0 2019-08-28 18:16:13 +02:00
Reuben Morais
670e06365e Run tf_upgrade_v2 on our code 2019-08-28 17:53:24 +02:00
Reuben Morais
06dee673c7
Merge pull request #2327 from mozilla/abseil-flags
Switch from deprecated tfv1.app to absl-py
2019-08-28 17:22:20 +02:00
Reuben Morais
24bcdeb3d6 Switch from deprecated tfv1.app to absl-py 2019-08-28 10:55:33 +02:00
Reuben Morais
289e346a66
Merge pull request #2324 from bjornbytes/c-compat
Make deepspeech.h compilable as C
2019-08-28 10:38:15 +02:00
bjorn
f7fc74c078 rm trailing whitespace; 2019-08-27 15:38:39 -07:00
bjorn
73b2bbe8da Use struct typedefs for C compatibility; 2019-08-27 15:34:03 -07:00
bjorn
5d24f19115 Only use extern C when compiling as C++; 2019-08-27 15:25:30 -07:00
Reuben Morais
7e96961e35
Merge pull request #2322 from rcgale/master
Fixed issue where multiple csvs could not load
2019-08-26 22:46:50 +02:00
Robert Gale
05448441d3 Fixed issue where multiple csvs could not load
With the new `create_dataset` approach introduced by PR #2283 (read: mine, sorry!), duplicate
indices in the df would cause a fatal error where the columns could not be referenced by
name. Adding `ignore_index=True` during append allows pandas to assign new indices to
rows, and fixes the issue.
2019-08-26 13:37:05 -07:00
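A small illustration of the pandas behaviour described above, shown with `pd.concat` (the modern equivalent of the `append` call the commit refers to); the column names are placeholders.

```python
import pandas as pd

a = pd.DataFrame({"wav_filename": ["a.wav"], "transcript": ["hello"]})
b = pd.DataFrame({"wav_filename": ["b.wav"], "transcript": ["world"]})

# Plain concatenation keeps each source frame's indices, so both rows end up
# labelled 0 and label-based lookups can return more rows than expected.
combined = pd.concat([a, b])
print(combined.index.tolist())        # [0, 0]

# ignore_index=True renumbers the rows, giving every row a unique index.
combined = pd.concat([a, b], ignore_index=True)
print(combined.index.tolist())        # [0, 1]
```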
Reuben Morais
4c14c6b78b
Merge pull request #2303 from mozilla/simplify-decoder
Simplify decoder impl by making it object oriented, avoid pointers where possible
2019-08-26 17:02:39 +02:00
Reuben Morais
89f63dcd69 Make Scorer init fallible and check it in callers 2019-08-26 13:59:15 +02:00
Reuben Morais
47b9b71776 Automatically format BUILD file with buildifier 2019-08-26 12:01:26 +02:00
Reuben Morais
c730102867 Avoid rebuilding decoder sources for every binary target 2019-08-26 12:01:26 +02:00
Reuben Morais
1b18494e63 Address review comments 2019-08-23 12:19:32 +02:00
Reuben Morais
b2ef9cca83 Make Alphabet init fallible and check it in model creation 2019-08-23 12:19:32 +02:00
Reuben Morais
4dabd248bc Make Alphabet copyable and default-constructable and avoid pointers 2019-08-23 12:19:32 +02:00
Reuben Morais
4d882a8aec Simplify decoder impl by making it object oriented, avoid pointers where possible 2019-08-23 12:19:32 +02:00
Reuben Morais
f442b69aeb
Merge pull request #2320 from mozilla/swig-4.0.1
Use globbing instead of hardcoding SWIG version
2019-08-23 12:19:03 +02:00
Reuben Morais
43b60a621c Use globbing instead of hardcoding SWIG version 2019-08-23 10:45:13 +02:00
lissyx
5fa6d23782
Merge pull request #2316 from lissyx/bump-v0.6.0-alpha.5
Bump VERSION to 0.6.0-alpha.5
2019-08-22 09:53:46 +02:00
Alexandre Lissy
97c373a8a9 Bump VERSION to 0.6.0-alpha.5 2019-08-22 09:53:17 +02:00
lissyx
18281b9e89
Merge pull request #2311 from lissyx/armbian-buster
Move to ARMbian Buster
2019-08-22 09:31:51 +02:00
Alexandre Lissy
dfe8be30b4 Move to ARMbian Buster
Fixes #2310
2019-08-21 22:58:10 +02:00
lissyx
d5544b4a15
Merge pull request #2313 from lissyx/static-libsox
Statically link libsox
2019-08-21 22:53:18 +02:00
lissyx
1ec34190a8
Merge pull request #2309 from carlfm01/wpf-build
Add WPF example build
2019-08-21 22:24:38 +02:00
Alexandre Lissy
8534c0f93a Statically link libsox 2019-08-21 21:35:08 +02:00
Carlos Fonseca
4812276d88 Add WPF example build 2019-08-21 19:04:20 +00:00
lissyx
50e2a99316
Merge pull request #2308 from lissyx/remove-prealloc
Remove unused prealloc frames
2019-08-21 14:53:47 +02:00
lissyx
0e47048f9c
Merge pull request #2307 from lissyx/rpi-buster
Build for Raspbian Buster
2019-08-20 21:49:51 +02:00
Alexandre Lissy
81b3b159c4 Remove unused prealloc frames
Fixes #2298
2019-08-20 18:38:46 +02:00
Alexandre Lissy
e06fce51ac Move Raspbian support to Buster
Fixes #2272
2019-08-20 16:37:26 +02:00
Reuben Morais
b25de5ac05
Merge pull request #2302 from mozilla/issue2294
Only update time step of leaf prefixes (Fixes #2294)
2019-08-20 12:04:35 +02:00
Reuben Morais
e3bf5d3cc6 Only update time step of leaf prefixes
The intention of this check is to improve the accuracy of the timings by recording the time step where the character saw its highest probability rather than the first time step where it was seen. The problem happens when updating the time step of a prefix that already has children. In that case, if any of the children have a time step that is earlier than `new_timestep`, it'll break the linearity of the timings. My fix is to simply check that the prefix we're updating is a leaf.

For example, say during decoding we have the following beams (format is `(char | time)`, tree node id below, nodes with same id are the same object):

```
1. (-1 | 0 ) -> ('s' | 10) -> ('h' | 13) -> ('e' | 14)
       A            B             C             D

2. (-1 | 0 ) -> ('s' | 10) -> ('h' | 14)
       A            B             E
```

And the prefix list is [B, C, D, E]. Currently, if we process character 'h' in time step 15 with a probability higher than both C and E, we update both nodes to have time step 15, which breaks linearity in beam 1. With my fix, we only update node E, which is a leaf. In my tests this does fix the problem, but since we don't have any known good quality data to verify against, it's hard to know if it has other side effects.
2019-08-20 12:03:59 +02:00
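A toy Python sketch of the rule described in this message; the actual fix lives in the C++ CTC decoder's PathTrie, so the class and method names below are purely illustrative.

```python
class PrefixNode:
    # Toy stand-in for the decoder's PathTrie node, for illustration only.
    def __init__(self, character, timestep, parent=None):
        self.character = character
        self.timestep = timestep
        self.children = {}
        if parent is not None:
            parent.children[character] = self

    def is_leaf(self):
        return not self.children

    def maybe_update_timestep(self, new_timestep):
        # The fix described above: only a leaf may move its timestep,
        # otherwise a parent could end up later than its own children and the
        # per-character timings would no longer be monotonic along a beam.
        if self.is_leaf():
            self.timestep = new_timestep


root = PrefixNode(-1, 0)
s = PrefixNode("s", 10, parent=root)
h = PrefixNode("h", 13, parent=s)
e = PrefixNode("e", 14, parent=h)

h.maybe_update_timestep(15)   # ignored: 'h' has a child ('e' | 14)
e.maybe_update_timestep(15)   # applied: 'e' is a leaf
```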
lissyx
3e60413f27
Merge pull request #2305 from rhamnett/patch-3
Update import_voxforge.py
2019-08-18 13:53:09 +02:00
Richard Hamnett
57156fffd0
Update import_voxforge.py
Fix importer
2019-08-18 09:44:12 +01:00
Reuben Morais
7f642eda94
Merge pull request #2283 from rcgale/master
Checking for empty transcripts during character encoding
2019-08-16 09:18:22 +02:00
Reuben Morais
29836c9dcc
Merge pull request #2289 from mozilla/error-non-finite-loss
Error message if a sample has non-finite loss
2019-08-12 10:48:59 +02:00
Robert Gale
85e25fa2d7 Applying text_to_char_array to each row in DataFrame so we can provide wav_filename context on exception 2019-08-07 14:43:41 -07:00
Reuben Morais
86fff2f660
Merge pull request #2265 from mozilla/cudnnrnn_compatible
Allow loading a CuDNN RNN checkpoint in a CPU-capable graph (Fixes #2264)
2019-08-07 14:57:16 +02:00
Reuben Morais
248c01001e Error message if a sample has non-finite loss 2019-08-07 10:31:15 +02:00
Reuben Morais
c76070be19
Merge pull request #2282 from mozilla/dynamic-batch-size-in-train-val-graph
Use dynamic batch size in train/val graph
2019-08-07 10:03:53 +02:00
dabinat
1cf2d6a8e6
Merge pull request #2287 from dabinat/label-validation
Label validation - Replace hyphens with spaces
2019-08-06 06:19:45 -07:00
dabinat
abc05b4a4d Label validation - Replace hyphens with spaces 2019-08-05 09:20:13 -07:00
Robert Gale
a3e0e9f9bc
Update text.py
"characters" was a bad variable name now that I think about it
2019-08-01 12:14:13 -07:00
Robert Gale
8ec6ac8079 Checking for empty transcripts during character encoding
This way we can get a plain English exception early, rather than a matrix shape error during training.
2019-08-01 11:19:21 -07:00
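A minimal sketch of the early check this commit adds, assuming an `alphabet.encode`-style API; names are illustrative rather than the exact training code.

```python
def encode_transcript(transcript, alphabet):
    # Sketch only: `alphabet.encode` stands in for the project's Alphabet API.
    if not transcript or not transcript.strip():
        raise ValueError(
            "empty transcript found; it would produce a zero-length label "
            "vector and surface later as a confusing matrix shape error"
        )
    return alphabet.encode(transcript)
```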
Reuben Morais
3636d9b481 Use dynamic batch size in train/val graph
Avoid needing to use the same batch size for training and validation.
2019-08-01 14:53:46 +02:00
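A short TF 1.x-style sketch of what a dynamic batch dimension looks like, assuming graph mode; the tensor shape and names are illustrative, not the actual DeepSpeech graph code.

```python
import tensorflow.compat.v1 as tfv1

n_input = 26  # features per frame (illustrative value)

# Leaving the leading dimension as None lets one graph accept training and
# validation batches of different sizes; the batch size is resolved at run time.
features = tfv1.placeholder(tfv1.float32, shape=[None, None, n_input], name="features")
batch_size = tfv1.shape(features)[0]
```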
Tilman Kamp
daa6167829
Merge pull request #2268 from tilmankamp/reportfilename
Fix #2180 - Added wav_filename to WER report
2019-07-23 17:19:32 +02:00
Tilman Kamp
007e512c00 Fix #2180 - Added wav_filename to WER report 2019-07-23 16:36:10 +02:00
Reuben Morais
84e1fa98b9
Remove unnecessary validator for --export_dir
X-DeepSpeech: NOBUILD
2019-07-22 14:19:19 +00:00
Reuben Morais
9da99d74e4 Remove additional unneeded op lib deps 2019-07-22 15:40:11 +02:00
Reuben Morais
b68bfdbb6e
Merge pull request #2263 from mozilla/remove-unneeded-ops
Remove use of StridedSlice and update op/kernel deps (Fixes #2179)
2019-07-22 13:39:01 +00:00
Reuben Morais
2c29cda641 Remove trailing whitespace 2019-07-22 14:31:29 +02:00
Reuben Morais
7fb3b8f22d Update kernel/op dependencies 2019-07-22 14:31:29 +02:00
Reuben Morais
6afd96e30f Use static batch size whenever it's known 2019-07-22 14:29:45 +02:00
Reuben Morais
23f5bc090d Update prod model to version re-exported with latest master 2019-07-22 14:29:45 +02:00
Reuben Morais
e3d0a44e83 Allow loading a CuDNN RNN checkpoint in a CPU-capable graph 2019-07-22 12:56:26 +02:00
Reuben Morais
1d50667234
Merge pull request #2240 from mozilla/cudnnrnn
Add CuDNN RNN support
2019-07-22 07:27:28 +00:00
Reuben Morais
fd3fbcaa78 Address review comments 2019-07-20 09:15:07 +02:00
Reuben Morais
f7a715d506 Use CuDNN RNN for training 2019-07-20 09:15:07 +02:00
lissyx
7fd7381871
Merge pull request #2253 from lissyx/fix-clean-target
Remove useless make target 'bindings-clean'
2019-07-19 20:38:31 +02:00
Alexandre Lissy
eca7882b14 Remove useless make target 'bindings-clean'
Fixes #2175
2019-07-19 17:39:56 +02:00
lissyx
03e2818649
Merge pull request #2258 from lissyx/fix-lzma
Add liblzma-dev to build python from pyenv with _lzma
2019-07-19 17:23:13 +02:00
Alexandre Lissy
938fdaf118 Add liblzma-dev to build python from pyenv with _lzma
Fixes #2256
2019-07-19 16:45:38 +02:00
lissyx
89338546ac
Merge pull request #2254 from lissyx/fix-android-doc
Fix typo in Android doc
2019-07-19 15:24:45 +02:00
Alexandre Lissy
4fc3c48462 Fix typo in Android doc
X-DeepSpeech: NOBUILD
2019-07-19 15:24:10 +02:00
lissyx
3b04fdef88
Merge pull request #2250 from alchemi5t/master
Added reference about generate_trie for clarity
2019-07-18 19:31:46 +02:00
alchemi5t
5c3e8f5e79 Added reference about generate_trie for clarity 2019-07-18 16:59:40 +05:30
Reuben Morais
0d8eee195a
Merge pull request #2247 from mozilla/exclude_swig_wrapper
Exclude generated SWIG wrapper code from decoder sources
2019-07-17 14:35:45 +00:00
Reuben Morais
8af1286d87
Exclude generated SWIG wrapper code from decoder sources 2019-07-17 13:16:38 +00:00
lissyx
69ad6016a7
Merge pull request #2239 from lissyx/try-bottle
Use TaskCluster caching for Python and Homebrew setup on macOS
2019-07-16 16:27:14 +02:00
Alexandre Lissy
cfb8f81611 Leverage TaskCluster-level caching for Homebrew and Python on macOS 2019-07-16 14:14:06 +02:00
Reuben Morais
aa88e57403
Merge pull request #2245 from mozilla/revert-dc78f8d
Revert "Remove deprecated SessionConfig" as it inadvertently disables soft placement of ops
2019-07-15 13:46:09 +00:00
Reuben Morais
6566299adf Revert "Remove deprecated SessionConfig in favor of just using defaults" as it inadvertently disables soft placement of ops
This reverts commit dc78f8d1e65c4b387aa335df243898cd04704f98.
2019-07-15 15:03:23 +02:00
lissyx
078d179ae1
Merge pull request #2242 from lissyx/package-libdev-win
Package Windows link-lib
2019-07-12 20:07:38 +02:00
Alexandre Lissy
df2a1d6479 Force rebuild 2019-07-12 17:44:34 +02:00
Alexandre Lissy
8588c192b6 Force rebuild 2019-07-12 17:44:22 +02:00
Reuben Morais
4da92084d1 Bump version to 0.6.0-alpha.4
X-DeepSpeech: NOBUILD
2019-07-12 15:24:36 +02:00
Reuben Morais
0f67078549
Merge pull request #2244 from lissyx/fix-rename_to_gpu
Ensure --cuda flag passed for CUDA builds
2019-07-12 13:23:43 +00:00
Alexandre Lissy
e9d9b33ff6 Ensure --cuda flag passed for CUDA builds
Fixes #2243
2019-07-12 14:15:45 +02:00
Alexandre Lissy
e6c05a2313 Package Windows link-lib 2019-07-12 12:06:40 +02:00
Reuben Morais
d52498ff4f
Merge pull request #2241 from mozilla/silence-unknown-opkernel-warnings
Include additional op libs to silence warnings
2019-07-12 08:04:02 +00:00
Reuben Morais
fe88960c69 Include additional op libs to silence warnings 2019-07-12 01:25:45 +02:00
lissyx
e97cc41a91
Merge pull request #2238 from lissyx/bump-v0.6.0-alpha.3
Bump VERSION to 0.6.0-alpha.3
2019-07-11 13:48:55 +02:00
Alexandre Lissy
0c63cf80b3 Bump VERSION to 0.6.0-alpha.3 2019-07-11 13:48:07 +02:00
Reuben Morais
b2a76bd182
Merge pull request #2218 from mozilla/update-1.14
Update to TF 1.14
2019-07-11 10:03:05 +00:00
Reuben Morais
4ca74f72d1 Switch to mozilla/tensorflow r1.14 indexes 2019-07-11 09:34:46 +02:00
Reuben Morais
0b96ba40c2 Workaround 7z missing path problem by removing redundant destination argument 2019-07-10 21:47:33 +02:00
lissyx
1db549a52c
Merge pull request #2234 from lissyx/fix-invalid-prefix
Ensure proper removal of prefix
2019-07-10 15:10:29 +02:00
Alexandre Lissy
c3a2b6ea48 Ensure proper removal of prefix
Fixes #2230
2019-07-10 13:27:18 +02:00
Reuben Morais
615b31a094 Switch to deepspeech-win-b worker type with cudafe++ fix 2019-07-08 18:59:53 +02:00
Reuben Morais
93e9ac19c5 Bump references to TF 1.13.1 to TF 1.14.0 2019-07-08 18:56:59 +02:00
Reuben Morais
6f3e824ef7 Use tf.compat.v1 to silence deprecation warnings and enable TF 2.0 testing 2019-07-08 18:56:59 +02:00
Reuben Morais
dc78f8d1e6 Remove deprecated SessionConfig in favor of just using defaults 2019-07-08 18:56:59 +02:00
Reuben Morais
d52ab9df0b Add TF op libs to libdeepspeech dependencies and rename darwin -> macos 2019-07-08 18:56:59 +02:00
lissyx
fdda74d498
Merge pull request #2232 from lissyx/bump-v0.6.0-alpha.2
Bump VERSION to 0.6.0-alpha.2
2019-07-05 17:44:16 +02:00
Alexandre Lissy
922e0c54fb Bump VERSION to 0.6.0-alpha.2 2019-07-05 17:43:34 +02:00
lissyx
1515d3ebb7
Merge pull request #2225 from lissyx/debug-node9-macos
WIP: Debug NodeJS v9 macOS failures
2019-07-05 16:57:59 +02:00
Alexandre Lissy
39b388c1ea Fix NodeJS bindings leakage / GC crash with MetadataItem querying
Fixes #2217
2019-07-05 14:06:28 +02:00
Reuben Morais
5c8af86aca
Merge pull request #2200 from rhamnett/patch-2
Allow for different sample rate exports
2019-07-04 15:04:22 +00:00
Richard Hamnett
a5c61ea588 Abstract sample rate / update beam width (#2199)
* Abstract sample rate
2019-07-04 14:58:45 +00:00
lissyx
06056949bb
Merge pull request #2228 from lissyx/fix-npm-prefix
Ensure 'npm install' under task directory
2019-07-03 12:50:55 +02:00
Alexandre Lissy
46bf75074f Ensure 'npm install' under task directory 2019-07-03 10:32:28 +02:00
lissyx
464d89d9ff
Merge pull request #2222 from lissyx/update-nodejs-python
Update versions of Python, NodeJS and ElectronJS
2019-07-02 19:50:12 +02:00
Reuben Morais
dbd62fc1c6
Merge pull request #2226 from eggonlea/trivial_download_fix
Fix trivial shadow variable in util/taskcluster.py
2019-07-02 17:45:19 +00:00
Li Li
0190c48d5e Fix trivial shadow variable in util script
Signed-off-by: Li Li <eggonlea@msn.com>
2019-07-02 10:39:29 -07:00
Alexandre Lissy
8753b19eb3 Update versions of Python, NodeJS and ElectronJS 2019-07-02 16:41:23 +02:00
Reuben Morais
c45c70cdaa
Merge pull request #2216 from lissyx/doc-allow-growth
Document TF_FORCE_GPU_ALLOW_GROWTH

X-DeepSpeech: NOBUILD
2019-06-27 18:16:35 +00:00
Reuben Morais
182c405eeb
Merge pull request #2160 from mozilla/more-mandarin-importers
More mandarin importers
2019-06-27 17:43:24 +00:00
Reuben Morais
67b4f6826a Add importer for aidatatang_200zh corpus 2019-06-27 14:40:20 -03:00
Reuben Morais
67a769e0d7 Add importer for Free ST Chinese Mandarin Corpus 2019-06-27 14:40:20 -03:00
Reuben Morais
ee78d471a2 Add importer for Primewords Chinese Corpus Set 1 2019-06-27 14:40:20 -03:00
Alexandre Lissy
1acfadfc98 Document TF_FORCE_GPU_ALLOW_GROWTH
Fixes #2211
2019-06-27 16:51:19 +02:00
lissyx
f7ae19a16a
Merge pull request #2215 from lissyx/package-header
Package DeepSpeech header
2019-06-27 15:34:09 +02:00
Alexandre Lissy
2f225dc987 Package DeepSpeech header
Fixes #2214
2019-06-27 15:07:01 +02:00
lissyx
f50fefda90
Merge pull request #2195 from lissyx/tflite-all
TFLite runtime on Linux/macOS/Windows
2019-06-26 21:45:07 +02:00
Alexandre Lissy
ad11e02582 Testing TFLite runtime on Linux/macOS/Windows 2019-06-26 20:18:53 +02:00
Alexandre Lissy
a9ec2b5cd6 Building TFLite runtime on Linux/macOS/Windows 2019-06-26 20:18:53 +02:00
Reuben Morais
87a9605886
Merge pull request #2212 from mozilla/workspace-status-version
Use bazel workspace status to generate versions
2019-06-26 10:58:06 +00:00
Reuben Morais
d50c0397c7 Use bazel workspace status to generate versions 2019-06-25 18:03:52 -03:00
lissyx
570f9867d6
Merge pull request #2210 from lissyx/git-subdmodules
Handle git submodules for getting DeepSpeech / TensorFlow versions
2019-06-25 18:16:41 +02:00
Alexandre Lissy
fd156a63f6 Handle git submodules for getting DeepSpeech / TensorFlow versions
Fixes #2209
2019-06-25 15:01:12 +02:00
lissyx
90e79804ad
Merge pull request #2207 from lissyx/bump-0.6.0a1
Bump VERSION to 0.6.0-alpha.1
2019-06-25 11:09:53 +02:00
Alexandre Lissy
08bb1cbdef Bump VERSION to 0.6.0-alpha.1
Fixes #2206
2019-06-25 11:09:13 +02:00
Reuben Morais
67d6b5f8ca
Merge pull request #2190 from mozilla/update-prodmodel-0.6.0-alpha.0
Update prod tests to point to re-exported 0.5 model
2019-06-24 21:53:11 +00:00
Reuben Morais
a618652283
Merge pull request #2198 from mozilla/update-brew
Update to Homebrew 2.1.6 to fix macOS failures
2019-06-24 11:24:33 -03:00
Reuben Morais
d7241df241 Update to Homebrew 2.1.6 2019-06-24 09:24:26 -03:00
Richard Hamnett
c248ed0435
Allow for different sample rate exports
Sets output format to Sox transformer
2019-06-23 19:47:12 +01:00
Francis Tyers
6e53fd8fe0
typo
X-DeepSpeech: NOBUILD
2019-06-22 22:18:25 +01:00
Francis Tyers
bc4d266cb2
Update CODINGSTYLE.md 2019-06-22 21:41:48 +01:00
Francis Tyers
11d688fe1a
Create CODINGSTYLE.md
Make a file for keeping track of coding guidelines.
2019-06-22 21:39:55 +01:00
Reuben Morais
0ea580449c
Merge pull request #2194 from mozilla/const-fst
Switch to ConstFst from VectorFst and mmap trie file when reading
2019-06-22 09:41:15 -03:00
Reuben Morais
585d94df7f Update trie files to regenerated versions 2019-06-21 23:24:21 -03:00
Reuben Morais
31afc6811f Switch to ConstFst from VectorFst and mmap trie file when reading 2019-06-21 22:21:19 -03:00
Francis Tyers
58f3758a8c Add validators for command line arguments that require paths (#2192) 2019-06-21 00:02:17 -03:00
Reuben Morais
ceb9de9fad Fix concurrent_streams.py on Python 2.7 2019-06-20 21:53:39 -03:00
Reuben Morais
79ab22086e Fix tests to preserve multiline output and stderr output 2019-06-20 20:55:08 -03:00
Reuben Morais
f70f068484 Update prod tests to point to re-exported 0.5 model 2019-06-20 18:32:30 -03:00
Reuben Morais
b2ebb65a67
Merge pull request #2189 from mozilla/bump-0.6-alpha0
Bump version to 0.6.0-alpha.0
2019-06-20 17:54:14 -03:00
Reuben Morais
dc1c716907 Bump version to 0.6.0-alpha.0
X-DeepSpeech: NOBUILD
2019-06-20 17:31:06 -03:00
Reuben Morais
a2306cf822
Merge pull request #2146 from mozilla/refactor-model-impls
Refactor TF and TFLite implementations into their own classes/files and fix concurrent/interleaved stream bugs by tracking LSTM state in StreamingState
2019-06-20 17:03:33 -03:00
Reuben Morais
4b29b78832 Bump version to 0.5.1 2019-06-20 14:25:14 -03:00
Reuben Morais
080fc27c65
Merge pull request #2184 from mozilla/fix-intermediate-decode
Fix for DS_IntermediateDecode modifying live prefixes in DecoderState
2019-06-20 10:37:26 -03:00
Reuben Morais
37b82f93ac Add a test for calling DS_IntermediateDecode during streaming 2019-06-20 09:48:00 -03:00
Reuben Morais
95a54bfdb8 Modify copies rather than live prefix scores in decoder_decode 2019-06-19 19:39:57 -03:00
Reuben Morais
d5d55958da Clean up formatting in ctc_beam_saerch_decoder.cpp 2019-06-19 19:39:57 -03:00
Reuben Morais
f12ea5e958 Add a test for interleaved/concurrent streams with a single model instance 2019-06-18 19:38:59 -03:00
Reuben Morais
ea1422d47b Document vector/tensor copy functions 2019-06-18 19:38:59 -03:00
Reuben Morais
4b305d2f5e Remove --use_seq_length flag 2019-06-18 19:38:59 -03:00
Reuben Morais
e51b9d987d Remove previous state model variable, track by hand in StreamingState instead 2019-06-18 19:38:59 -03:00
Reuben Morais
6e78bac799 Address review comments 2019-06-18 19:38:59 -03:00
Reuben Morais
6f953837fa Refactor TF and TFLite model implementations into their own classes/files 2019-06-18 19:38:59 -03:00
Reuben Morais
e136b5299a
Merge pull request #2181 from eggonlea/streaming
Add option to native client binary to print intermediate transcriptions while streaming the input file
2019-06-18 16:57:31 -03:00
Li Li
2604da4bc2 native_client: option to run streaming mode
Signed-off-by: Li Li <eggonlea@msn.com>
2019-06-18 10:44:52 -07:00
lissyx
dce068f3cb
Merge pull request #2174 from lissyx/strip-unused
Apply strip_unused on inference graph
2019-06-14 19:39:00 +02:00
Alexandre Lissy
8466d5e4eb Apply strip_unused on inference graph 2019-06-14 17:34:23 +02:00
lissyx
fece16a9be
Merge pull request #2176 from lissyx/clean-tc-scripts
Move all tc-* scripts under taskcluster/
2019-06-14 17:32:47 +02:00
Alexandre Lissy
bec5be7640 Move all tc-* scripts under taskcluster/
Fixes #1912
2019-06-14 16:06:01 +02:00
lissyx
7c8ccf1d8a
Merge pull request #2173 from lissyx/electronjs-disturl
Update ElectronJS headers dist-url
2019-06-14 13:19:22 +02:00
Alexandre Lissy
e87e5616ed Update ElectronJS headers dist-url
Fixes #2172
2019-06-14 11:26:51 +02:00
Reuben Morais
193e054a14
Merge pull request #2168 from eggonlea/evaluate_tflite
Add option to dump output in evaluate_tflite.py
2019-06-13 15:07:39 -03:00
Li Li
863c5544ca evaluate_tflite: Fix shared Queue
Also dump output to a file
Fixed some trivial pylint issues at the same time

Signed-off-by: Li Li <eggonlea@msn.com>
2019-06-13 10:47:15 -07:00
Reuben Morais
3db7a99fad
Merge pull request #2167 from mozilla/bump-0.5.0
Bump version to 0.5.0
2019-06-11 12:23:58 -03:00
Reuben Morais
a7e1d819ad Bump version to 0.5.0
X-DeepSpeech: NOBUILD
2019-06-11 12:12:40 -03:00
Reuben Morais
6f8c902f25
Merge pull request #2165 from mozilla/keep-stream-absolute-timestep
Keep absolute per-stream time step in DecoderState (Fixes #2163)
2019-06-11 11:23:04 -03:00
Reuben Morais
3ceb05e8b9 Keep absolute per-stream time step in DecoderState 2019-06-11 07:58:16 -03:00
Reuben Morais
90efbf0b4f
Merge pull request #2161 from mozilla/fix-decoder-leak
Move decoder state to StreamingState and fix leak when creating multiple streams per model
2019-06-10 15:12:52 -03:00
Reuben Morais
d9c2bec35d Move decoder state to StreamingState and fix leak when creating multiple streams per model 2019-06-10 12:21:09 -03:00
lissyx
31e8a3834e
Merge pull request #2158 from lissyx/frames
Frame counter should be initialized on all importers
2019-06-09 08:01:05 +02:00
Alexandre Lissy
f33ead8af9 Frame counter should be initialized on all importers
Fixes #2150
2019-06-09 07:43:58 +02:00
Reuben Morais
9224d4de2b Remove unused flag
X-DeepSpeech: NOBUILD
2019-06-08 18:27:44 -03:00
Reuben Morais
94df405ec4
Merge pull request #2156 from mozilla/update-prodmodel-0.5
Update prod tests to point to 0.5 model
2019-06-07 14:34:36 -03:00
Reuben Morais
f02d79cca2 Update prod tests to point to 0.5 model 2019-06-07 13:14:45 -03:00
lissyx
3572ded5da
Merge pull request #2152 from lissyx/fix-frames
Ensure frames counter is initialized
2019-06-06 11:16:52 +02:00
Alexandre Lissy
1313b51a5d Ensure frames counter is initialized
Fixes #2150
2019-06-06 10:42:26 +02:00
lissyx
35cbc16697
Merge pull request #2149 from lissyx/fix-ll
Do not fail without --bogus-records
2019-06-05 16:11:31 +02:00
Alexandre Lissy
32a73b7224 Do not fail without --bogus-records 2019-06-05 16:04:45 +02:00
lissyx
8174f3f6db
Merge pull request #2148 from lissyx/lingua-libre-bogus
Do not import known bogus Lingua Libre records
2019-06-04 18:48:56 +02:00
Alexandre Lissy
3a17896463 Do not import known bogus Lingua Libre records
Fixes #2147
2019-06-04 18:45:34 +02:00
Reuben Morais
10d98e1df9
Merge pull request #2145 from mozilla/decoder-optimizations
Decoder optimizations
2019-06-04 13:37:59 -03:00
Reuben Morais
494b573c80 Address review comments 2019-06-04 11:58:38 -03:00
Reuben Morais
1c87bf781a Avoid sorting prefix array twice 2019-06-01 15:06:56 -03:00
Reuben Morais
a46288e1c8 Only create Output structure for beams that are returned in the API 2019-06-01 15:02:41 -03:00
Reuben Morais
1201739af2
Merge pull request #2143 from mozilla/bump-0.5-alpha.11
Bump VERSION to 0.5.0-alpha.11
2019-05-30 22:55:00 -03:00
Reuben Morais
47066bc9d8 Bump VERSION to 0.5.0-alpha.11
X-DeepSpeech: NOBUILD
2019-05-30 22:48:24 -03:00
Reuben Morais
4fcf47c3c6
Merge pull request #2141 from mozilla/tflite-separate-exec-plans
Use separate execution plans for acoustic model and feature computation invoke calls (Fixes #2139)
2019-05-30 21:07:09 -03:00
Reuben Morais
be0da45878 Use separate execution plans for acoustic model and feature computation invoke calls 2019-05-30 17:39:42 -03:00
Reuben Morais
b92cea1c0b
Merge pull request #2131 from carlfm01/dotnet-errorcodes
Error codes for .NET bindings
2019-05-29 10:00:07 -03:00
Carlos Fonseca Murillo
73511e48b2 Match native error codes for .NET 2019-05-29 08:08:35 +00:00
lissyx
df5545299e
Merge pull request #2134 from lissyx/audio-hours
Computing audio hours at import
2019-05-28 17:52:51 +02:00
Alexandre Lissy
17e3f284a5 Computing audio hours at import 2019-05-28 16:46:20 +02:00
Reuben Morais
f14460ffb8
Specify minimum SWIG version 2019-05-28 13:09:43 +00:00
lissyx
f352f0efc7
Merge pull request #2136 from lissyx/win-swig4
Windows/pacman provides SWIG 4.0.0
2019-05-28 15:06:11 +02:00
Alexandre Lissy
a9f237e654 Windows/pacman provides SWIG 4.0.0 2019-05-28 13:24:47 +02:00
Reuben Morais
07370cceca
Merge pull request #2117 from areyliu6/master
Add flag to Common Voice importer to separate every character with spaces
2019-05-27 12:33:53 -03:00
lissyx
16afca194d
Merge pull request #2132 from lissyx/fix-macOS-caches
Ensure proper cleanup on macOS workers
2019-05-27 11:13:50 +02:00
Alexandre Lissy
81b333c2f5 Ensure proper cleanup on macOS workers
It seems like `-e` does its job and bash catches the faulty exit code, thus
avoiding the required `mv` for dealing with caches. Relying on 'trap' is
the proper way to always go through this code path and ensure picking
the exit code.
2019-05-27 08:53:52 +02:00
Carlos Fonseca
5576b501b9
Merge pull request #2130 from carlfm01/master
Change wrong BEAM_WIDTH .NET console client
2019-05-26 05:17:59 +00:00
Carlos Fonseca Murillo
2bf4b4385f Change wrong BEAM_WIDTH 2019-05-26 02:12:56 +00:00
lissyx
685a0dbb89
Merge pull request #2127 from lissyx/bump-v0.5.0-alpha.10
Bump VERSION to 0.5.0-alpha.10
2019-05-22 20:07:42 +02:00
Alexandre Lissy
66b96c48a7 Bump VERSION to 0.5.0-alpha.10
X-DeepSpeech: NOBUILD
2019-05-22 20:07:05 +02:00
lissyx
325071a2a4
Merge pull request #2126 from lissyx/init-previous_state
Move init of previous_state_{c,h}
2019-05-22 20:06:14 +02:00
Alexandre Lissy
c95397c7ce Move init of previous_state_{c,h}
Fixes #2125
2019-05-22 18:29:24 +02:00
lissyx
82df4a326b
Merge pull request #2124 from lissyx/bump-v0.5.0-alpha.9
Bump VERSION to 0.5.0-alpha.9
2019-05-22 08:45:46 +02:00
Alexandre Lissy
475a18c9b2 Bump VERSION to 0.5.0-alpha.9
X-DeepSpeech: NOBUILD
2019-05-22 08:44:03 +02:00
Arey
8f806f7a3a fit PEP8 2019-05-22 13:05:17 +08:00
Arey
fbedbbc9f9 make flag name more explicit 2019-05-22 12:47:11 +08:00
dabinat
69538f2f62
Merge pull request #2121 from dabinat/streaming-decoder
CTC streaming decoder
2019-05-21 21:41:48 -07:00
dabinat
d9a269412e CTC beam search streaming decoder (+6 squashed commits)
Squashed commits:
[2941b47] Fixed nits
[700572e] Restored old CTC decoder API
[5aaf75d] Fixed nits
[969d71a] Added a destructor for DecoderState
[af0be6e] Removed accumulated_logits
[9dcb7b4] CTC beam search streaming decoder
2019-05-21 20:54:19 -07:00
Arey
53f88f0c33 common voice mandarin 2019-05-17 16:26:22 +08:00
Reuben Morais
df5bb31046
Merge pull request #2111 from mozilla/test-epoch-oom
Revert to a pipelined approach for test epochs to avoid CPU OOM with large alphabets
2019-05-14 18:57:30 +00:00
Reuben Morais
699e4ebcd7 Revert to a pipelined approach for test epochs to avoid CPU OOM with large alphabets 2019-05-13 23:49:14 -03:00
lissyx
a4b35d2f24
Merge pull request #2109 from lissyx/bump-v0.5.0-alpha.8
Bump VERSION to 0.5.0-alpha.8
2019-05-10 19:29:27 +02:00
Alexandre Lissy
6d7399add2 Bump VERSION to 0.5.0-alpha.8
X-DeepSpeech: NOBUILD
2019-05-10 19:28:52 +02:00
lissyx
e491705b40
Merge pull request #2105 from lissyx/export-c
Exported symbols should not be C++ mangled
2019-05-10 19:28:15 +02:00
Alexandre Lissy
5e7ee43174 Exported symbols should not be C++ mangled 2019-05-10 17:30:59 +02:00
lissyx
0c56d1111f
Merge pull request #2107 from lissyx/swig4-macos
Swig4 macos
2019-05-10 17:30:23 +02:00
Alexandre Lissy
af26d8b2bb Deprecate training on Python 2.7 2019-05-10 17:23:54 +02:00
Alexandre Lissy
d47839c71a Remove upstream-merged changed and move to SWIG 4.0.0 2019-05-10 12:42:07 +02:00
lissyx
41c3ffbed2
Merge pull request #2100 from lissyx/ts-wav-convert
Ensure TrainingSpeech is properly formatted
2019-05-06 16:11:28 +02:00
Alexandre Lissy
d41f98f25c Ensure TrainingSpeech is properly formatted
Fixes #2097
2019-05-06 15:58:36 +02:00
lissyx
687c07001b
Merge pull request #2096 from lissyx/ts_apr
Update TrainingSpeech dataset
2019-05-06 10:15:53 +02:00
Alexandre Lissy
8402e7ac9b Update TrainingSpeech dataset
Fixes #2092

X-DeepSpeech: NOBUILD
2019-05-06 10:13:42 +02:00
lissyx
2b05a02163
Merge pull request #2095 from mozilla/lissyx-patch-1
Bring back CUDA and CuDNN versions.
2019-05-05 15:32:13 +02:00
lissyx
f81d9ddb76
Bring back CUDA and CuDNN versions. 2019-05-05 15:31:41 +02:00
lissyx
c1816d6dfc
Merge pull request #2088 from lissyx/cuda-devices
Force only one GPU on LDC93S1 scripts
2019-04-30 19:13:52 +02:00
Alexandre Lissy
333a175dfd Force only one GPU on LDC93S1 scripts
Fixes #2087
2019-04-30 17:35:26 +02:00
Reuben Morais
f64aa73e7f
Merge pull request #2086 from mozilla/import_aishell
Add AISHELL dataset importer
2019-04-29 23:12:05 +00:00
Reuben Morais
feacdea4aa Add AISHELL dataset importer 2019-04-29 10:00:32 -03:00
Reuben Morais
cb0e9763be
Fix pre-commit hook docs on README.md
X-DeepSpeech: NOBUILD
2019-04-29 12:59:28 +00:00
lissyx
ccf7ab362b
Merge pull request #2079 from cfreemoser/patch-1
Updated Dockerfile
2019-04-29 13:40:30 +02:00
Cem Philipp Freimoser
7cfd7b85dd
Removed software-properties-common and curl 2019-04-28 20:08:31 +02:00
Cem Philipp Freimoser
2050aa4cf6
make git lfs optinal 2019-04-28 17:50:07 +02:00
Cem Philipp Freimoser
f5bb103f88
Updated Dockerfile
* use python 3
* include git-lfs
* allow checkout from GitHub instead of local copy step
2019-04-28 11:37:44 +02:00
lissyx
b240502de9
Merge pull request #2076 from lissyx/bump-v0.5.0-alpha.7
Bump VERSION to 0.5.0-alpha.7
2019-04-26 08:23:30 +02:00
Alexandre Lissy
5a77f09770 Bump VERSION to 0.5.0-alpha.7
X-DeepSpeech: NOBUILD
2019-04-26 08:22:57 +02:00
lissyx
e9af8b1634
Merge pull request #2068 from lissyx/lingua-libre
LinguaLibre importer
2019-04-25 18:48:45 +02:00
Alexandre Lissy
664813134e LinguaLibre importer
Fixes #2067
2019-04-25 18:47:59 +02:00
lissyx
34866df97b
Merge pull request #2073 from lissyx/nodejs-v12
Add NodeJS v12 support
2019-04-25 18:29:55 +02:00
Alexandre Lissy
348dd0e315 Add ElectronJS v5.0.0 2019-04-25 15:43:16 +02:00
Alexandre Lissy
f7f6a1480f Add NodeJS v12 support
Fixes #2070
2019-04-25 15:43:16 +02:00
Reuben Morais
656ab5734a
Merge pull request #2074 from mozilla/bindings-idiomatic
Some bindings clean-up
2019-04-25 13:20:50 +00:00
Reuben Morais
41e6daaff2 Expose deallocation functions in NodeJS binding 2019-04-25 08:49:25 -03:00
lissyx
ec06f942c4
Merge pull request #2072 from lissyx/bump-v0.5.0-alpha.6
Bump VERSION to 0.5.0-alpha.6
2019-04-25 08:41:12 +02:00
Alexandre Lissy
fab9c24fc7 Bump VERSION to 0.5.0-alpha.6
X-DeepSpeech: NOBUILD
2019-04-25 08:40:36 +02:00
Reuben Morais
f397006436 Make Metadata.items more idiomatic in Python bindings 2019-04-24 21:35:10 -03:00
lissyx
9815d54218
Merge pull request #2022 from lissyx/expose-metadata
Expose extended metadata information to bindings
2019-04-24 23:06:33 +02:00
Alexandre Lissy
a9717e702a Fix python linter 2019-04-24 20:12:40 +02:00
Alexandre Lissy
c3c3a3fb81 Expose extended metadata information to bindings
Fixes #2006
2019-04-24 20:12:39 +02:00
Reuben Morais
8f01cca448
Merge pull request #2058 from dabinat/json-output
Change CSV output to JSON
2019-04-18 01:12:54 +00:00
dabinat
fa09736d9b Simplified string formatting 2019-04-17 13:46:05 -07:00
dabinat
8ad7e8e6d5 Changed CSV output to JSON 2019-04-16 15:25:01 -07:00
Reuben Morais
1e601d5c4a
Merge pull request #2038 from mozilla/split-dev-test-epochs
Perform separate validation and test epochs per dataset when multiple files are specified (Fixes #1634 and #2043)
2019-04-16 15:23:16 +00:00
Reuben Morais
904ab1e288 Centralize progress logging and progress bar logic 2019-04-16 11:06:26 -03:00
Reuben Morais
9586fbbd30 Rename --train_cached_features_path to --feature_cache 2019-04-16 11:06:26 -03:00
Reuben Morais
bfa070e6c3 Compute weighted average of individual dev set losses 2019-04-16 11:06:26 -03:00
Reuben Morais
911a1ce4b1 Do separate test epochs if multiple input files are specified 2019-04-16 11:06:26 -03:00
Reuben Morais
58e9b1a78e Log total optimization time 2019-04-16 11:01:40 -03:00
Reuben Morais
a85af3da49 Do separate validation epochs if multiple input files are specified 2019-04-16 11:01:38 -03:00
Reuben Morais
68c17611c6
Merge pull request #2040 from mozilla/pylint-hook
Add linter config for Python and CI integration on PRs
2019-04-12 15:51:33 +00:00
Reuben Morais
f27ed522af Git pre-commit hook instructions 2019-04-12 11:06:14 -03:00
lissyx
718fd437ec
Merge pull request #2044 from dabinat/word-timings
Fix spaces being appended to word
2019-04-12 08:48:28 +02:00
lissyx
cdf4fe3dcd
Merge pull request #2046 from lissyx/fix-decision-task
Upgrade system for Python 3.5 for decision task
2019-04-12 00:40:23 +02:00
Alexandre Lissy
42e649809c Upgrade system for Python 3.5 for decision task
NetworkX upgrade has no more support for Python 3.4

Fixes #2045
X-DeepSpeech: NOBUILD
2019-04-12 00:28:04 +02:00
dabinat
ec50fb9839 Fixed an issue where spaces may be appended to word 2019-04-11 12:52:10 -07:00
Reuben Morais
2645da0290
Merge pull request #2042 from mozilla/move-dotnet
Move .NET bindings to native_client folder
2019-04-11 18:30:43 +00:00
Josh Meyer
6fcad513e8
Native Client README split (#2002)
Refactoring the `native_client` README.

This PR removes redundancies between the master README and the `native_client` README, keeping only instructions for building in the `native_client` README. 

All installation instructions for built binaries / language bindings remain in the master README.
2019-04-11 20:27:59 +02:00
Reuben Morais
97941db3d8 Move .NET bindings to native_client folder 2019-04-11 13:09:45 -03:00
Reuben Morais
6a0c186b5c Correct mistake in len check
X-DeepSpeech: NOBUILD
2019-04-11 12:07:04 -03:00
Reuben Morais
91421a3466 Mention linting in README 2019-04-11 07:02:24 -03:00
Reuben Morais
13757a4258 Fix pylint warnings 2019-04-11 07:02:21 -03:00
lissyx
a05989439e
Merge pull request #2035 from lissyx/electronjs
Build for ElectronJS
2019-04-11 09:43:55 +02:00
Alexandre Lissy
7bd1619120 Build for ElectronJS
Fixes #2032
2019-04-11 08:32:10 +02:00
Reuben Morais
a16e468498 Add pylint CI 2019-04-10 21:21:26 -03:00
lissyx
c51acc1c3b
Merge pull request #2033 from lissyx/test
Remove useless install step
2019-04-10 22:07:33 +02:00
Alexandre Lissy
5a6e32de5c Enable 'use strict' and remove 'node-pre-gyp install'
Fixes #2034
2019-04-10 22:03:51 +02:00
Tilman Kamp
78247c2377
Merge pull request #2037 from tilmankamp/fix2036
Fix #2036 - Using dev_loss in validation log message
2019-04-10 12:10:40 +02:00
Tilman Kamp
4f2d1ecc25 Fix #2036 - Using dev_loss in validation log message 2019-04-10 11:51:17 +02:00
Reuben Morais
625064c4ae
Rename Windows CUDA native client package to match Linux package
X-DeepSpeech: NOBUILD
2019-04-09 22:36:25 +00:00
Reuben Morais
ed7f6bf4ce
Update CUDA dependency section for TF 1.13
X-DeepSpeech: NOBUILD
2019-04-09 13:56:52 +00:00
Tilman Kamp
ca4300fe7f
Merge pull request #2028 from tilmankamp/fix2020
Fix #2020 - Testing best-dev checkpoint
2019-04-09 15:25:12 +02:00
Tilman Kamp
a0c0918e25 Lazy-create global_step 2019-04-09 15:08:09 +02:00
Tilman Kamp
42e5d78e9a Fix #2020 - Testing best-dev checkpoint 2019-04-09 15:08:09 +02:00
Reuben Morais
4b7c00fc36
Merge pull request #2023 from mozilla/dset-size
Don't calculate dataset size by hand, use tf.errors.OutOfRangeError
2019-04-09 09:19:35 -03:00
Reuben Morais
fdc7d77ad6 Log start and end of epoch if progress bar is disabled 2019-04-09 09:12:58 -03:00
Kelly Davis
1c84898f7b
Merge pull request #2029 from mdigiorgio/fix-clang-compilation
Use std::map to fix compilation issue with clang
2019-04-09 14:03:15 +02:00
Michele Di Giorgio
aa27c61d01 Use std::map to fix compilation issue with clang
When building with clang the following compilation error is thrown:

symbol-table.h:199:3: error: no template named 'map'
    map<int64, int64> key_map_;
    ^

This patch solves that issue by explicitly specifying the std namespace.
2019-04-09 11:37:59 +01:00
lissyx
a4a9d365d2
Merge pull request #2027 from lissyx/ec2-decision
Use EC2 Ubuntu mirrors for decision task
2019-04-09 09:49:20 +02:00
Alexandre Lissy
beb811e0f3 Use EC2 Ubuntu mirrors for decision task 2019-04-09 09:11:18 +02:00
Reuben Morais
8fa35518ea
Merge pull request #2024 from mozilla/fix-warning-typo
Fix typo in warning name
2019-04-08 21:33:46 -03:00
Reuben Morais
1c52008572
Fix typo in warning name 2019-04-08 22:29:30 +00:00
Reuben Morais
0b4b806cbf
Merge pull request #2011 from mozilla/free-strings
Provide an API function to free strings returned by API (Fixes #1979)
2019-04-08 19:25:34 -03:00
Reuben Morais
8053548e34 Check for KeyboardInterrupt directly instead of using tf.train.Coordinator 2019-04-08 18:05:20 -03:00
Reuben Morais
cc351cd607 Clean up progress bars for unknown size datasets 2019-04-08 18:04:08 -03:00
Reuben Morais
6ab91f37ec Don't calculate dataset size by hand, use tf.errors.OutOfRangeError 2019-04-08 16:18:15 -03:00
lissyx
15adf008ae
Merge pull request #2019 from lissyx/bump-v0.5.0-alpha-5
Bump VERSION to 0.5.0-alpha.5
2019-04-08 14:45:26 +02:00
Alexandre Lissy
4b3e896202 Bump VERSION to 0.5.0-alpha.5
X-DeepSpeech: NOBUILD
2019-04-08 14:44:44 +02:00
lissyx
3e93b1263b
Merge pull request #2018 from lissyx/win-pkgs
Add Windows Python packages to upload tasks
2019-04-08 14:44:31 +02:00
Alexandre Lissy
47ef5d3328 Add Windows Python packages to upload tasks
X-DeepSpeech: NOBUILD
2019-04-08 14:43:31 +02:00
lissyx
594cf5af99
Merge pull request #2003 from lissyx/py-win
Windows Python bindings
2019-04-08 14:39:09 +02:00
Reuben Morais
5054731751 Use DS_FreeStrings in JNI bindings 2019-04-08 09:38:47 -03:00
Alexandre Lissy
6dbf65cce8 Produce Windows Python bindings
Fixes #1937
2019-04-08 14:38:43 +02:00
Reuben Morais
89ed0f47f0
Merge pull request #2012 from mozilla/output-probability
Add probability to Metadata struct (Fixes #900)
2019-04-08 09:24:42 -03:00
lissyx
a4058e2199
Merge pull request #2017 from lissyx/fix-win-tests
Fix Windows tests execution
2019-04-08 12:35:48 +02:00
Alexandre Lissy
98418f7628 Use Windows paths for NodeJS extraction
This allows overcoming the 260-character limit being hit with some NodeJS
instances.
2019-04-08 10:55:55 +02:00
Alexandre Lissy
7f345fcccc Remove NodeJS v5.x tests on Windows
There is no nodejs-v5.*-win-x64.zip file
2019-04-08 10:22:29 +02:00
Alexandre Lissy
70fb52b125 Fix Windows tests execution 2019-04-08 09:54:06 +02:00
Reuben Morais
4b9f3fbe7d Provide an API function to free strings returned by API 2019-04-05 21:37:34 -03:00
Reuben Morais
a72b69020d Add probability to Metadata struct and fix memory management of metadata 2019-04-05 21:37:24 -03:00
Reuben Morais
d08fc4b6a2 Fix use of export_language flag 2019-04-05 21:37:05 -03:00
Reuben Morais
5779d298e1 Merge branch 'more-metadata' 2019-04-05 14:38:56 -03:00
Reuben Morais
5b80f21668 Rename language flag 2019-04-05 11:54:02 -03:00
Reuben Morais
243f69e682
Merge pull request #2005 from mozilla/relative-epochs
Ignore epochs in checkpoints, always start epoch count from zero
2019-04-05 10:31:19 -03:00
Reuben Morais
7f6fd8b48b Embed more metadata in exported model and read it in native client 2019-04-05 09:35:23 -03:00
Reuben Morais
97c36291af Rename epoch flag to epochs 2019-04-05 09:30:50 -03:00
Reuben Morais
2f3f095048 Ignore epochs in checkpoints, always start epoch count from zero 2019-04-05 00:21:04 -03:00
Reuben Morais
57450893ea
Merge pull request #1919 from mozilla/tfdatatest
Implement input pipeline with tf.data API
2019-04-05 00:13:48 -03:00
Reuben Morais
6154150317 Pass missing dropout rate parameters 2019-04-04 22:56:12 -03:00
Reuben Morais
5ee856d075 Clarify early stopping dependency on validation 2019-04-04 22:41:38 -03:00
Reuben Morais
d70753cc0f Use TASKCLUSTER_TMP_DIR instead of hardcoding /tmp 2019-04-04 22:41:38 -03:00
Reuben Morais
ed15caf3c5 Check if train/dev/test files were passed in instead of having explicit flags 2019-04-04 22:41:38 -03:00
lissyx
f458337710
Merge pull request #2001 from lissyx/cuda-3.5
WIP: TensorFlow
2019-04-03 20:49:50 +02:00
Reuben Morais
232df740db Fix TFLite bug in feature computation graph and clean up deepspeech.cc a bit 2019-04-03 10:19:22 -03:00
Alexandre Lissy
3aa286f615 Update to new TensorFlow r1.13 artifacts
Fixes #1970
2019-04-03 14:31:57 +02:00
Kelly Davis
25a254f1fc
Merge pull request #2000 from mozilla/issue1906
Gram Vaani importer
2019-04-03 13:43:46 +02:00
kdavis-mozilla
0bc132cabe Addressed review comments 2019-04-03 12:33:30 +02:00
Tilman Kamp
033ee0f4c2
Merge pull request #1996 from tilmankamp/fix1991
Fix #1991 - Additional import options for import_cv2.py
2019-04-03 10:58:31 +02:00
Tilman Kamp
c1e75eaa8d Pack to data set 2019-04-03 10:57:36 +02:00
Reuben Morais
a7cda8e761 Add version info to exported graphs 2019-04-02 21:06:03 -03:00
Reuben Morais
4e9e78fefe Infer number of MFCC features from input shape 2019-04-02 18:31:32 -03:00
Reuben Morais
d6babfb8f3 Speed up training tests and make sure they fully converge 2019-04-02 18:31:32 -03:00
Reuben Morais
6632504ad1 Don't overwrite exported graph from training task with the TFLite version 2019-04-02 18:31:32 -03:00
Reuben Morais
e7bbd4a70f Fix illegal summary names 2019-04-02 18:31:32 -03:00
Reuben Morais
12fe93bfe4 Remove c_speech_features and kiss_fft130 code 2019-04-02 18:31:32 -03:00
Reuben Morais
51f80744c6 Remove DS_AudioToInputVector and dep on c_speech_features 2019-04-02 18:31:32 -03:00
Reuben Morais
1cea2b0fe8 Rewrite input pipeline to use tf.data API 2019-04-02 18:31:32 -03:00
Reuben Morais
bd7358d94e Clarify meaning of build target
X-DeepSpeech: NOBUILD
2019-04-02 16:19:21 -03:00
kdavis-mozilla
441ac5869f Gram Vaani importer 2019-04-02 19:55:58 +02:00
Tilman Kamp
94c088be87 Updated README, some code beautification 2019-04-02 19:41:33 +02:00
Tilman Kamp
7dc236bab4 Removed unnecessary default value 2019-04-02 18:02:23 +02:00
Tilman Kamp
8e78e17904 Some code beautification 2019-04-02 18:02:23 +02:00
Tilman Kamp
5645285d25 Fix #1991 - Additional import options for import_cv2.py 2019-04-02 18:02:23 +02:00
Josh Meyer
7d7a7f7be5
Merge pull request #1965 from mozilla/import_cv2
import_cv2
2019-04-02 08:49:13 -07:00
Reuben Morais
9ca61b077e Fix checkpointing logic 2019-04-01 18:53:06 -03:00
Reuben Morais
b7b44f3573
Merge pull request #1988 from tilmankamp/remdist
Removed distributed training support
2019-04-01 16:54:37 -03:00
Josh Meyer
12d31c11bd
import_cv2 information correction
as per Kelly's review -- `import_cv2.py` does not download data, and assumes different args
2019-04-01 19:16:53 +02:00
Tilman Kamp
a179a2389f Fix #1986 - Remove distributed training support 2019-04-01 18:43:22 +02:00
lissyx
a009361e47
Merge pull request #1974 from dabinat/word-timings
Output word timings
2019-03-30 22:26:25 +01:00
Reuben Morais
cc0bc8b5a4
Merge pull request #1994 from Mozilla-GitHub-Standards/master
Add Mozilla Code of Conduct
2019-03-30 16:57:37 -03:00
dabinat
7cda855cb6 Client - Whitespace fix 2019-03-30 10:23:00 -07:00
dabinat
594f74efe9 API - Null pointer check 2019-03-30 10:22:43 -07:00
Mozilla-GitHub-Standards
1f7babda1a Add Mozilla Code of Conduct file
Fixes #1993.

_(Message COC002)_
2019-03-29 14:58:39 -07:00
dabinat
26af3b292d API - Minor cleanup 2019-03-22 10:08:13 -07:00
dabinat
0364bfa518 API - Switched to std::unique_ptr 2019-03-22 10:07:44 -07:00
dabinat
2f255307e3 Client - Added support for detecting UTF-8 spaces 2019-03-22 10:07:06 -07:00
dabinat
c56ec7ac7d Allocate memory C++-style in API 2019-03-22 02:20:00 -07:00
dabinat
c90828921e Removed unnecessary variables in API 2019-03-22 01:37:00 -07:00
dabinat
d79989c916 Client whitespace tweaks 2019-03-22 01:36:03 -07:00
dabinat
a0304eec1a Output word-level metadata from the client with the -e tag 2019-03-21 18:07:24 -07:00
dabinat
192e17f2d5 Expose letter timings on the API 2019-03-21 18:07:24 -07:00
dabinat
1fcf8a4cc3 Fixed whitespace 2019-03-21 18:03:47 -07:00
dabinat
6f667713bf Client whitespace fixes 2019-03-21 16:36:46 -07:00
dabinat
a3b81d054e Output word-level metadata from the client with the -e tag 2019-03-21 15:53:04 -07:00
dabinat
79830fe512 Expose letter timings on the API 2019-03-21 15:50:02 -07:00
Tilman Kamp
730ef1b5c8
Merge pull request #1973 from tilmankamp/fix1972
Fix #1972
2019-03-21 14:40:26 +01:00
Tilman Kamp
42f04dc9aa Fix #1972 2019-03-21 13:39:12 +01:00
lissyx
c092213096
Merge pull request #1969 from lissyx/bump-v0.5.0-alpha.4
Bump VERSION to v0.5.0-alpha.4
2019-03-20 19:29:43 +01:00
Alexandre Lissy
4f261b7d82 Bump VERSION to v0.5.0-alpha.4
X-DeepSpeech: NOBUILD
2019-03-20 19:29:00 +01:00
lissyx
97bacdd544
Merge pull request #1967 from lissyx/npm-win
Npm win
2019-03-20 19:27:35 +01:00
Alexandre Lissy
d421daa2ca Add NodeJS Windows tests 2019-03-20 17:51:15 +01:00
Alexandre Lissy
fd133c4bdc Add NPM GPU packaging task 2019-03-20 11:58:56 +01:00
Alexandre Lissy
d18697d432 Rename task node-package to node-package-cpu 2019-03-20 11:58:56 +01:00
Alexandre Lissy
2a73a76ac3 Add NodeJS build for Windows 2019-03-20 11:58:56 +01:00
Alexandre Lissy
16b7237e70 Install and patch swig on msys64 2019-03-20 09:41:49 +01:00
josh
69569aab0b import_cv2 2019-03-19 19:00:16 +01:00
1250 changed files with 118776 additions and 21455 deletions

3
.cardboardlint.yml Normal file
View File

@ -0,0 +1,3 @@
linters:
- pylint:
filefilter: ['+ *.py', '+ bin/*.py']

View File

@ -2,15 +2,15 @@
set -xe
apt-get install -y python3-venv
apt-get install -y python3-venv libopus0
python3 -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install -r <(grep -v tensorflow requirements.txt)
pip install tensorflow-gpu==1.13.0-rc2
# Install ds_ctcdecoder package from TaskCluster
pip install $(python3 util/taskcluster.py --decoder)
pip install -U setuptools wheel pip
pip install .
pip uninstall -y tensorflow
pip install tensorflow-gpu==1.14
mkdir -p ../keep/summaries
@ -18,19 +18,22 @@ data="${SHARED_DIR}/data"
fis="${data}/LDC/fisher"
swb="${data}/LDC/LDC97S62/swb"
lbs="${data}/OpenSLR/LibriSpeech/librivox"
cv="${data}/mozilla/CommonVoice/en_1087h_2019-06-12/clips"
npr="${data}/NPR/WAMU/sets/v0.3"
python -u DeepSpeech.py \
--train_files "${fis}-train.csv","${swb}-train.csv","${lbs}-train-clean-100.csv","${lbs}-train-clean-360.csv","${lbs}-train-other-500.csv" \
--dev_files "${lbs}-dev-clean.csv"\
--test_files "${lbs}-test-clean.csv" \
--train_files "${npr}/best-train.sdb","${npr}/good-train.sdb","${cv}/train.sdb","${fis}-train.sdb","${swb}-train.sdb","${lbs}-train-clean-100.sdb","${lbs}-train-clean-360.sdb","${lbs}-train-other-500.sdb" \
--dev_files "${lbs}-dev-clean.sdb" \
--test_files "${lbs}-test-clean.sdb" \
--train_batch_size 24 \
--dev_batch_size 48 \
--test_batch_size 48 \
--train_cudnn \
--n_hidden 2048 \
--learning_rate 0.0001 \
--dropout_rate 0.2 \
--epoch 13 \
--display_step 0 \
--validation_step 1 \
--dropout_rate 0.40 \
--epochs 150 \
--noearly_stop \
--feature_cache "../tmp/feature.cache" \
--checkpoint_dir "../keep" \
--summary_dir "../keep/summaries"

5
.dockerignore Normal file
View File

@ -0,0 +1,5 @@
.git/lfs
native_client/ds-swig
native_client/python/dist/*.whl
native_client/ctcdecode/*.a
native_client/javascript/build/

5
.gitattributes vendored
View File

@ -1,3 +1,2 @@
*.binary filter=lfs diff=lfs merge=lfs -crlf
data/lm/trie filter=lfs diff=lfs merge=lfs -crlf
data/lm/vocab.txt filter=lfs diff=lfs merge=lfs -text
data/lm/kenlm.scorer filter=lfs diff=lfs merge=lfs -text
.github/actions/check_artifact_exists/dist/index.js binary

40
.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file
View File

@ -0,0 +1,40 @@
---
name: Bug report
about: Create a report to help us improve
title: 'Bug: '
labels: bug
assignees: ''
---
Welcome to the 🐸STT project! We are excited to see your interest, and appreciate your support!
This repository is governed by the Contributor Covenant Code of Conduct. For more details, see the [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) file.
If you've found a bug, please provide the following information:
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Run the following command '...'
2. ...
3. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Environment (please complete the following information):**
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**:
- **TensorFlow installed from (our builds, or upstream TensorFlow)**:
- **TensorFlow version (use command below)**:
- **Python version**:
- **Bazel version (if compiling from source)**:
- **GCC/Compiler version (if compiling from source)**:
- **CUDA/cuDNN version**:
- **GPU model and memory**:
- **Exact command to reproduce**:
**Additional context**
Add any other context about the problem here.

8
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@ -0,0 +1,8 @@
blank_issues_enabled: false
contact_links:
- name: Coqui STT GitHub Discussions
url: https://github.com/coqui-ai/STT/discussions
about: Please ask and answer questions here.
- name: Coqui Security issue disclosure
url: mailto:info@coqui.ai
about: Please report security vulnerabilities here.

View File

@ -0,0 +1,26 @@
---
name: Feature request
about: Suggest an idea for this project
title: 'Feature request: '
labels: enhancement
assignees: ''
---
Welcome to the 🐸STT project! We are excited to see your interest, and appreciate your support!
This repository is governed by the Contributor Covenant Code of Conduct. For more details, see the [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) file.
If you have a feature request, then please provide the following information:
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

View File

@ -0,0 +1,11 @@
name: "Build TensorFlow"
description: "Build TensorFlow Build"
inputs:
flavor:
description: "Build flavor"
required: true
runs:
using: "composite"
steps:
- run: ./ci_scripts/tf-build.sh ${{ inputs.flavor }}
shell: bash

View File

@ -0,0 +1,43 @@
Building and using a TensorFlow cache:
======================================
This action checks whether an artifact exists in the list of the repo's
release artifacts. Since we don't always want to download the artifact, we
can't rely on the official download-artifact action.
Rationale:
----------
Because of the amount of code required to build TensorFlow, the library build
is split into two main parts to make PRs much faster to run:
- a TensorFlow prebuild cache
- the actual code of the library
The TensorFlow prebuild cache exists because building TensorFlow (even just
`libtensorflow_cpp.so`) involves a huge amount of code and can take several
hours even on decent systems. So we build it once and cache it, because the
TensorFlow version does not change that often.
However, each PR might change the actual library code, so we rebuild that part
every time.
The `tensorflow_opt-macOS` job checks whether such a build cache already exists.
These caches are stored as artifacts because [GitHub Actions
cache](https://docs.github.com/en/actions/guides/caching-dependencies-to-speed-up-workflows)
has size limitations.
The `build-tensorflow-macOS` job depends on the cache check to know whether it
needs to run an actual build or not.
Hacking:
--------
To hack on the action, please follow the [GitHub JavaScript
Actions](https://docs.github.com/en/actions/creating-actions/creating-a-javascript-action#commit-tag-and-push-your-action-to-github)
documentation, specifically the usage of `ncc`.
```
$ npm install
$ npx ncc build main.js --license licenses.txt
$ git add dist/
```
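Example usage:
--------------
Below is a minimal sketch of how the two jobs mentioned above can be wired
together. The artifact file name, runner label and the path of the TensorFlow
build action are assumptions for illustration; the `name`, `path` and
`download` inputs and the `status` output come from this action's `action.yml`.
```
# Sketch only: cache-check job feeding the dependent build job.
jobs:
  tensorflow_opt-macOS:
    runs-on: macos-latest
    outputs:
      status: ${{ steps.check_artifact_exists.outputs.status }}
    steps:
      - uses: actions/checkout@v2
      - id: check_artifact_exists
        uses: ./.github/actions/check_artifact_exists
        with:
          name: tensorflow_opt-macOS.tar.xz            # assumed artifact name
          path: ${{ github.workspace }}/tensorflow_opt-macOS.tar.xz
          download: false                              # only check existence here
  build-tensorflow-macOS:
    needs: tensorflow_opt-macOS
    # Only rebuild the TensorFlow cache when no artifact was found.
    if: needs.tensorflow_opt-macOS.outputs.status == 'missing'
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v2
        with:
          submodules: recursive
      - uses: ./.github/actions/build-tensorflow       # assumed local action path
        with:
          flavor: opt
```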

View File

@ -0,0 +1,32 @@
name: "check/download artifacts"
description: "Check and download that an artifact exists"
inputs:
name:
description: "Artifact name"
required: true
github_token:
description: "GitHub token"
required: false
default: ${{ github.token }}
download:
description: "Should we download?"
required: false
default: false
path:
description: "Where to unpack the artifact"
required: false
default: "./"
repo:
description: "Repository name with owner (like actions/checkout)"
required: false
default: ${{ github.repository }}
release-tag:
description: "Tag of release to check artifacts under"
required: false
default: "v0.10.0-alpha.7"
outputs:
status:
description: "Status string of the artifact: 'missing' or 'found'"
runs:
using: "node12"
main: "dist/index.js"

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,132 @@
const core = require('@actions/core');
const github = require('@actions/github');
const AdmZip = require('adm-zip');
const filesize = require('filesize');
const pathname = require('path');
const fs = require('fs');
const { throttling } = require('@octokit/plugin-throttling');
const { GitHub } = require('@actions/github/lib/utils');
const Download = require('download');
const Util = require('util');
const Stream = require('stream');
const Pipeline = Util.promisify(Stream.pipeline);
async function getGoodArtifacts(client, owner, repo, releaseId, name) {
console.log(`==> GET /repos/${owner}/${repo}/releases/${releaseId}/assets`);
const goodRepoArtifacts = await client.paginate(
"GET /repos/{owner}/{repo}/releases/{release_id}/assets",
{
owner: owner,
repo: repo,
release_id: releaseId,
per_page: 100,
},
(releaseAssets, done) => {
console.log(" ==> releaseAssets", releaseAssets);
const goodAssets = releaseAssets.data.filter((a) => {
console.log("==> Asset check", a);
return a.name == name
});
if (goodAssets.length > 0) {
done();
}
return goodAssets;
}
);
console.log("==> maybe goodRepoArtifacts:", goodRepoArtifacts);
return goodRepoArtifacts;
}
async function main() {
try {
const token = core.getInput("github_token", { required: true });
const [owner, repo] = core.getInput("repo", { required: true }).split("/");
const path = core.getInput("path", { required: true });
const name = core.getInput("name");
const download = core.getInput("download");
const releaseTag = core.getInput("release-tag");
const OctokitWithThrottling = GitHub.plugin(throttling);
const client = new OctokitWithThrottling({
auth: token,
throttle: {
onRateLimit: (retryAfter, options) => {
console.log(
`Request quota exhausted for request ${options.method} ${options.url}`
);
// Retry twice after hitting a rate limit error, then give up
if (options.request.retryCount <= 2) {
console.log(`Retrying after ${retryAfter} seconds!`);
return true;
} else {
console.log("Exhausted 2 retries");
core.setFailed("Exhausted 2 retries");
}
},
onAbuseLimit: (retryAfter, options) => {
// does not retry, only logs a warning
console.log(
`Abuse detected for request ${options.method} ${options.url}`
);
core.setFailed(`GitHub REST API Abuse detected for request ${options.method} ${options.url}`)
},
},
});
console.log("==> Repo:", owner + "/" + repo);
const releaseInfo = await client.repos.getReleaseByTag({
owner,
repo,
tag: releaseTag,
});
console.log(`==> Release info for tag ${releaseTag} = ${JSON.stringify(releaseInfo.data, null, 2)}`);
const releaseId = releaseInfo.data.id;
const goodArtifacts = await getGoodArtifacts(client, owner, repo, releaseId, name);
console.log("==> goodArtifacts:", goodArtifacts);
const artifactStatus = goodArtifacts.length === 0 ? "missing" : "found";
console.log("==> Artifact", name, artifactStatus);
console.log("==> download", download);
core.setOutput("status", artifactStatus);
if (artifactStatus === "found" && download == "true") {
console.log("==> # artifacts:", goodArtifacts.length);
const artifact = goodArtifacts[0];
console.log("==> Artifact:", artifact.id)
const size = filesize(artifact.size, { base: 10 })
console.log(`==> Downloading: ${artifact.name} (${size}) to path: ${path}`)
const dir = pathname.dirname(path)
console.log(`==> Creating containing dir if needed: ${dir}`)
fs.mkdirSync(dir, { recursive: true })
await Pipeline(
Download(artifact.url, {
headers: {
"Accept": "application/octet-stream",
"Authorization": `token ${token}`,
},
}),
fs.createWriteStream(path)
)
}
if (artifactStatus === "missing" && download == "true") {
core.setFailed("Required", name, "that is missing");
}
return;
} catch (err) {
console.error(err.stack);
core.setFailed(err.message);
}
}
main();

1139
.github/actions/check_artifact_exists/package-lock.json generated vendored Normal file

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,13 @@
{
"name": "check_artifact_exists",
"main": "main.js",
"devDependencies": {
"@actions/core": "^1.2.6",
"@actions/github": "^4.0.0",
"@octokit/plugin-throttling": "^3.4.1",
"@vercel/ncc": "^0.27.0",
"adm-zip": "^0.5.2",
"download": "^8.0.0",
"filesize": "^6.1.0"
}
}

View File

@ -0,0 +1,29 @@
name: "chroot bind mount"
description: "Bind mount into chroot"
inputs:
mounts:
description: "Path to consider"
required: true
runs:
using: "composite"
steps:
- id: install_qemu
run: |
sudo apt-get update -y
sudo apt-get install -y --no-install-recommends qemu-user-static
shell: bash
- id: bind_mount_chroot
run: |
set -xe
# Bind-mount so that we have the same tree inside the chroot
for dev in ${{ github.workspace }} ${{ inputs.mounts }};
do
sudo mount -o bind ${dev} ${{ env.SYSTEM_RASPBIAN }}${dev}
done;
for dev in ${{ inputs.mounts }};
do
sudo mount -o bind /${dev} ${{ env.SYSTEM_RASPBIAN }}/${dev}
done;
shell: bash

15
.github/actions/get_cache_key/README.md vendored Normal file
View File

@ -0,0 +1,15 @@
GitHub Action to compute cache key
==================================
It is intended to work in harmony with `check_artifact_exists`:
- compute a stable cache key
- be as simple to use as possible (fewer parameters)
It expects to run in a GitHub Actions job whose name follows
`SUBMODULE_FLAVOR-PLATFORM`:
- the `SUBMODULE` part is used to look up the current SHA1 of that git submodule
- the `FLAVOR` distinguishes e.g. opt/dbg builds
- the `PLATFORM` defines an OS/arch pair
An `extras` field is available for further customization, such as forcing a
rebuild.
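As a rough usage sketch (the runner label, the `.tar.xz` artifact naming and
the `extras` value are assumptions for illustration; the action paths, the
`extras` input and the `key` output come from the files in this tree), a job
named `tensorflow_opt-macOS` could compose the two actions like this:
```
# Sketch only: the job name encodes SUBMODULE_FLAVOR-PLATFORM, so here
# SUBMODULE=tensorflow, FLAVOR=opt, OSARCH=macOS.
jobs:
  tensorflow_opt-macOS:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v2
      - id: get_cache_key
        uses: ./.github/actions/get_cache_key
        with:
          extras: "1"                                  # bump to force a re-build
      - uses: ./.github/actions/check_artifact_exists
        with:
          name: ${{ steps.get_cache_key.outputs.key }}.tar.xz
          path: ${{ github.workspace }}/tensorflow-cache.tar.xz
          download: true
```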

View File

@ -0,0 +1,34 @@
name: "get cache key for submodule"
description: "Compute a cache key based on git submodule"
inputs:
extras:
description: "Extra cache key value"
required: true
osarch:
description: "Override automatic OSARCH value"
required: false
outputs:
key:
description: "Computed cache key name"
value: ${{ steps.compute_cache_key.outputs.key }}
runs:
using: "composite"
steps:
- id: compute_cache_key
run: |
set -xe
JOB=${{ github.job }}
SUBMODULE=$(echo $JOB | cut -d'-' -f1 | cut -d'_' -f1)
FLAVOR=$(echo $JOB | cut -d'-' -f1 | cut -d'_' -f2)
if [ -z "${{ inputs.osarch }}" ]; then
OSARCH=$(echo $JOB | cut -d'-' -f2)
else
OSARCH=${{ inputs.osarch }}
fi
SHA=$(git submodule status ${SUBMODULE} | sed -e 's/^-//g' -e 's/^+//g' -e 's/^U//g' | awk '{ print $1 }')
KEY=${SUBMODULE}-${FLAVOR}_${OSARCH}_${SHA}_${{ inputs.extras }}
echo "::set-output name=key::${KEY}"
shell: bash

View File

@ -0,0 +1,30 @@
name: "Install Python"
description: "Installing an upstream python release"
inputs:
version:
description: "Python version"
required: true
runs:
using: "composite"
steps:
- shell: bash
run: |
set -xe
curl https://www.python.org/ftp/python/${{ inputs.version }}/python-${{ inputs.version }}-macosx10.9.pkg -o "python.pkg"
- shell: bash
run: ls -hal .
- shell: bash
run: |
set -xe
sudo installer -verbose -pkg python.pkg -target /
- shell: bash
run: |
set -xe
which python3
python3 --version
python3 -c "import sysconfig; print(sysconfig.get_config_var('MACOSX_DEPLOYMENT_TARGET'))"
- shell: bash
name: Set up venv with upstream Python
run: |
python3 -m venv /tmp/venv
echo "/tmp/venv/bin" >> $GITHUB_PATH

18
.github/actions/install-xldd/action.yml vendored Normal file
View File

@ -0,0 +1,18 @@
name: "xldd install"
description: "Install xldd"
inputs:
target:
description: "System target"
required: true
runs:
using: "composite"
steps:
- id: install_xldd
run: |
source ./ci_scripts/all-vars.sh
# -s required to avoid the noisy output like "Entering / Leaving directories"
toolchain=$(make -s -C ${DS_DSDIR}/native_client/ TARGET=${{ inputs.target }} TFDIR=${DS_TFDIR} print-toolchain)
if [ ! -x "${toolchain}ldd" ]; then
cp "${DS_DSDIR}/native_client/xldd" "${toolchain}ldd" && chmod +x "${toolchain}ldd"
fi
shell: bash

12
.github/actions/libstt-build/action.yml vendored Normal file
View File

@ -0,0 +1,12 @@
name: "Build libstt.so"
description: "Build libstt.so"
inputs:
arch:
description: "Target arch for loading script (host/armv7/aarch64)"
required: false
default: "host"
runs:
using: "composite"
steps:
- run: ./ci_scripts/${{ inputs.arch }}-build.sh
shell: bash

67
.github/actions/multistrap/action.yml vendored Normal file
View File

@ -0,0 +1,67 @@
name: "multistrap install"
description: "Install a system root using multistrap"
inputs:
arch:
description: "Target arch"
required: true
packages:
description: "Extra packages to install"
required: false
default: ""
runs:
using: "composite"
steps:
- id: install_multistrap
run: |
sudo apt-get update -y
sudo apt-get install -y --no-install-recommends multistrap qemu-user-static
shell: bash
- id: create_chroot
run: |
set -xe
multistrap_conf=""
if [ "${{ inputs.arch }}" = "armv7" ]; then
multistrap_conf=multistrap_raspbian_buster.conf
wget http://archive.raspbian.org/raspbian/pool/main/r/raspbian-archive-keyring/raspbian-archive-keyring_20120528.2_all.deb && sudo dpkg -i raspbian-archive-keyring_20120528.2_all.deb
fi
if [ "${{ inputs.arch }}" = "aarch64" ]; then
multistrap_conf=multistrap_armbian64_buster.conf
fi
multistrap -d ${{ env.SYSTEM_RASPBIAN }} -f ${{ github.workspace }}/native_client/${multistrap_conf}
if [ ! -z "${{ inputs.packages }}" ]; then
TO_MOUNT=${{ github.workspace }}
# Prepare target directory to bind-mount the github tree
mkdir -p ${{ env.SYSTEM_RASPBIAN }}/${{ github.workspace }}
# Bind-mount so that we have the same tree inside the chroot
for dev in ${TO_MOUNT};
do
sudo mount -o bind ${dev} ${{ env.SYSTEM_RASPBIAN }}${dev}
done;
# Copy some host data:
# resolv.conf: for getting DNS working
# passwd, group, shadow: to have user accounts and apt-get install working
for ff in resolv.conf passwd group shadow;
do
sudo cp /etc/${ff} ${{ env.SYSTEM_RASPBIAN }}/etc/
done;
# Perform apt steps.
# Preserving the env is required
sudo --preserve-env chroot ${{ env.SYSTEM_RASPBIAN }}/ apt-get update -y
sudo --preserve-env chroot ${{ env.SYSTEM_RASPBIAN }}/ apt-get install -y --no-install-recommends ${{ inputs.packages }}
# Cleanup apt info to save space
sudo --preserve-env chroot ${{ env.SYSTEM_RASPBIAN }}/ rm -fr /var/cache/apt/* /var/lib/apt/lists/*
# Unmount what has been mounted
for dev in ${TO_MOUNT};
do
sudo umount ${{ env.SYSTEM_RASPBIAN }}${dev}
done;
fi
shell: bash

77
.github/actions/node-build/action.yml vendored Normal file
View File

@ -0,0 +1,77 @@
name: "NodeJS binding"
description: "Binding a nodejs binding"
inputs:
nodejs_versions:
description: "NodeJS versions supported"
required: true
electronjs_versions:
description: "ElectronJS versions supported"
required: true
local_cflags:
description: "CFLAGS for NodeJS package"
required: false
default: ""
local_ldflags:
description: "LDFLAGS for NodeJS package"
required: false
default: ""
local_libs:
description: "LIBS for NodeJS package"
required: false
default: ""
target:
description: "TARGET value"
required: false
default: "host"
chroot:
description: "RASPBIAN value"
required: false
default: ""
runs:
using: "composite"
steps:
- run: |
node --version
npm --version
shell: bash
- run: |
npm update
shell: bash
- run: |
mkdir -p tmp/headers/nodejs tmp/headers/electronjs
shell: bash
- run: |
for node in ${{ inputs.nodejs_versions }}; do
EXTRA_CFLAGS=${{ inputs.local_cflags }} \
EXTRA_LDFLAGS=${{ inputs.local_ldflags }} \
EXTRA_LIBS=${{ inputs.local_libs }} \
make -C native_client/javascript \
TARGET=${{ inputs.target }} \
RASPBIAN=${{ inputs.chroot }} \
NODE_ABI_TARGET=--target=${node} \
NODE_DEVDIR=--devdir=headers/nodejs \
clean node-wrapper
done;
shell: bash
- run: |
for electron in ${{ inputs.electronjs_versions }}; do
EXTRA_CFLAGS=${{ inputs.local_cflags }} \
EXTRA_LDFLAGS=${{ inputs.local_ldflags }} \
EXTRA_LIBS=${{ inputs.local_libs }} \
make -C native_client/javascript \
TARGET=${{ inputs.target }} \
RASPBIAN=${{ inputs.chroot }} \
NODE_ABI_TARGET=--target=${electron} \
NODE_DIST_URL=--disturl=https://electronjs.org/headers \
NODE_RUNTIME=--runtime=electron \
NODE_DEVDIR=--devdir=headers/electronjs \
clean node-wrapper
done;
shell: bash
- run: |
make -C native_client/javascript clean npm-pack
shell: bash
- run: |
tar -czf native_client/javascript/wrapper.tar.gz \
-C native_client/javascript/ lib/
shell: bash

22
.github/actions/node-install/action.yml vendored Normal file
View File

@ -0,0 +1,22 @@
name: "nodejs install"
description: "Install nodejs in a chroot"
inputs:
node:
description: "NodeJS version"
required: true
runs:
using: "composite"
steps:
- id: add_apt_source
run: |
set -ex
(echo "Package: nodejs" && echo "Pin: origin deb.nodesource.com" && echo "Pin-Priority: 999") > ${{ env.SYSTEM_RASPBIAN }}/etc/apt/preferences
echo "deb http://deb.nodesource.com/node_${{ inputs.node }}.x buster main" > ${{ env.SYSTEM_RASPBIAN }}/etc/apt/sources.list.d/nodesource.list
wget -qO- https://deb.nodesource.com/gpgkey/nodesource.gpg.key | sudo --preserve-env chroot ${{ env.SYSTEM_RASPBIAN }}/ apt-key add -
shell: bash
- id: install_nodejs
run: |
set -ex
sudo --preserve-env chroot ${{ env.SYSTEM_RASPBIAN }}/ apt-get update -y
sudo --preserve-env chroot ${{ env.SYSTEM_RASPBIAN }}/ apt-get install -y nodejs
shell: bash

14
.github/actions/numpy_vers/README.md vendored Normal file
View File

@ -0,0 +1,14 @@
GitHub Action to set NumPy versions
===================================
This action aims at computing correct values for the NumPy dependencies:
- `NUMPY_BUILD_VERSION`: range of accepted versions at Python binding build time
- `NUMPY_DEP_VERSION`: range of accepted versions at execution time
Versions are set considering several factors:
- API and ABI compatibility; otherwise the binding wrapper can throw errors
like "Illegal instruction", or compute wrong values because of a changed
memory layout
- wheel availability: for CI and end users, we want to avoid having to rebuild
NumPy, so we stick to versions for which an upstream `wheel` file already
exists
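A minimal usage sketch follows; the Python version string is an assumption for
illustration, while the action paths, the `pyver` input, the
`build_version`/`dep_version` outputs, and the `numpy_build`/`numpy_dep`
inputs of the Python build action come from the action files in this diff:
```
# Sketch only: feed the computed NumPy ranges into the Python binding build.
steps:
  - uses: actions/checkout@v2
  - id: numpy_vers
    uses: ./.github/actions/numpy_vers
    with:
      pyver: "3.7.9"                                   # assumed Python version
  - uses: ./.github/actions/python-build
    with:
      numpy_build: ${{ steps.numpy_vers.outputs.build_version }}
      numpy_dep: ${{ steps.numpy_vers.outputs.dep_version }}
```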

93
.github/actions/numpy_vers/action.yml vendored Normal file
View File

@ -0,0 +1,93 @@
name: "get numpy versions"
description: "Get proper NumPy build and runtime versions dependencies range"
inputs:
pyver:
description: "Python version"
required: true
outputs:
build_version:
description: "NumPy build dependency"
value: ${{ steps.numpy.outputs.build }}
dep_version:
description: "NumPy runtime dependency"
value: ${{ steps.numpy.outputs.dep }}
runs:
using: "composite"
steps:
- id: numpy
run: |
set -ex
NUMPY_BUILD_VERSION="==1.7.0"
NUMPY_DEP_VERSION=">=1.7.0"
OS=$(uname -s)
ARCH=$(uname -m)
case "${OS}:${ARCH}" in
Linux:x86_64)
case "${{ inputs.pyver }}" in
3.7*)
NUMPY_BUILD_VERSION="==1.14.5"
NUMPY_DEP_VERSION=">=1.14.5,<=1.19.4"
;;
3.8*)
NUMPY_BUILD_VERSION="==1.17.3"
NUMPY_DEP_VERSION=">=1.17.3,<=1.19.4"
;;
3.9*)
NUMPY_BUILD_VERSION="==1.19.4"
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"
;;
esac
;;
Darwin:*)
case "${{ inputs.pyver }}" in
3.6*)
NUMPY_BUILD_VERSION="==1.9.0"
NUMPY_DEP_VERSION=">=1.9.0"
;;
3.7*)
NUMPY_BUILD_VERSION="==1.14.5"
NUMPY_DEP_VERSION=">=1.14.5,<=1.17.0"
;;
3.8*)
NUMPY_BUILD_VERSION="==1.17.3"
NUMPY_DEP_VERSION=">=1.17.3,<=1.17.3"
;;
3.9*)
NUMPY_BUILD_VERSION="==1.19.4"
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"
;;
esac
;;
${CI_MSYS_VERSION}:x86_64)
case "${{ inputs.pyver }}" in
3.5*)
NUMPY_BUILD_VERSION="==1.11.0"
NUMPY_DEP_VERSION=">=1.11.0,<1.12.0"
;;
3.6*)
NUMPY_BUILD_VERSION="==1.12.0"
NUMPY_DEP_VERSION=">=1.12.0,<1.14.5"
;;
3.7*)
NUMPY_BUILD_VERSION="==1.14.5"
NUMPY_DEP_VERSION=">=1.14.5,<=1.17.0"
;;
3.8*)
NUMPY_BUILD_VERSION="==1.17.3"
NUMPY_DEP_VERSION=">=1.17.3,<=1.17.3"
;;
3.9*)
NUMPY_BUILD_VERSION="==1.19.4"
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"
;;
esac
;;
esac
echo "::set-output name=build::${NUMPY_BUILD_VERSION}"
echo "::set-output name=dep::${NUMPY_DEP_VERSION}"
shell: bash

View File

@ -0,0 +1,7 @@
name: "Package TensorFlow"
description: "Package TensorFlow Build"
runs:
using: "composite"
steps:
- run: ./ci_scripts/tf-package.sh
shell: bash

7
.github/actions/package/action.yml vendored Normal file
View File

@ -0,0 +1,7 @@
name: "Package lib"
description: "Package of lib"
runs:
using: "composite"
steps:
- run: ./ci_scripts/package.sh
shell: bash

58
.github/actions/python-build/action.yml vendored Normal file
View File

@ -0,0 +1,58 @@
name: "Python binding"
description: "Binding a python binding"
inputs:
numpy_build:
description: "NumPy build dependecy"
required: true
numpy_dep:
description: "NumPy runtime dependecy"
required: true
local_cflags:
description: "CFLAGS for Python package"
required: false
default: ""
local_ldflags:
description: "LDFLAGS for Python package"
required: false
default: ""
local_libs:
description: "LIBS for Python package"
required: false
default: ""
target:
description: "TARGET value"
required: false
default: "host"
chroot:
description: "RASPBIAN value"
required: false
default: ""
runs:
using: "composite"
steps:
- run: |
python3 --version
pip3 --version
shell: bash
- run: |
set -xe
PROJECT_NAME="stt"
OS=$(uname)
if [ "${OS}" = "Linux" -a "${{ inputs.target }}" != "host" ]; then
python3 -m venv stt-build
source stt-build/bin/activate
fi
NUMPY_BUILD_VERSION="${{ inputs.numpy_build }}" \
NUMPY_DEP_VERSION="${{ inputs.numpy_dep }}" \
EXTRA_CFLAGS=${{ inputs.local_cflags }} \
EXTRA_LDFLAGS=${{ inputs.local_ldflags }} \
EXTRA_LIBS=${{ inputs.local_libs }} \
make -C native_client/python/ \
TARGET=${{ inputs.target }} \
RASPBIAN=${{ inputs.chroot }} \
SETUP_FLAGS="--project_name ${PROJECT_NAME}" \
bindings-clean bindings
shell: bash

35
.github/actions/run-tests/action.yml vendored Normal file
View File

@ -0,0 +1,35 @@
name: "Tests execution"
description: "Running tests"
inputs:
runtime:
description: "Runtime to use for running test"
required: true
model-kind:
description: "Running against CI baked or production model"
required: true
bitrate:
description: "Bitrate for testing"
required: true
chroot:
description: "Run using a chroot"
required: false
runs:
using: "composite"
steps:
- run: |
set -xe
build="_tflite"
model_kind=""
if [ "${{ inputs.model-kind }}" = "prod" ]; then
model_kind="-prod"
fi
prefix="."
if [ ! -z "${{ inputs.chroot }}" ]; then
prefix="${{ inputs.chroot }}"
fi
${prefix}/ci_scripts/${{ inputs.runtime }}${build}-tests${model_kind}.sh ${{ inputs.bitrate }}
shell: bash

11
.github/actions/select-xcode/action.yml vendored Normal file
View File

@ -0,0 +1,11 @@
name: "Select XCode version"
description: "Select XCode version"
inputs:
version:
description: "XCode version"
required: true
runs:
using: "composite"
steps:
- run: sudo xcode-select --switch /Applications/Xcode_${{ inputs.version }}.app
shell: bash

View File

@ -0,0 +1,12 @@
name: "Setup TensorFlow"
description: "Setup TensorFlow Build"
inputs:
flavor:
description: "Target flavor for setup script (empty/android-armv7/android-arm64)"
required: false
default: ""
runs:
using: "composite"
steps:
- run: ./ci_scripts/tf-setup.sh ${{ inputs.flavor }}
shell: bash

View File

@ -0,0 +1,89 @@
name: "Upload cache asset to release"
description: "Upload a build cache asset to a release"
inputs:
name:
description: "Artifact name"
required: true
path:
description: "Path of file to upload"
required: true
token:
description: "GitHub token"
required: false
default: ${{ github.token }}
repo:
description: "Repository name with owner (like actions/checkout)"
required: false
default: ${{ github.repository }}
release-tag:
description: "Tag of release to check artifacts under"
required: false
default: "v0.10.0-alpha.7"
runs:
using: "composite"
steps:
- run: |
set -xe
asset_name="${{ inputs.name }}"
filenames="${{ inputs.path }}"
if [ $(compgen -G "$filenames" | wc -l) -gt 1 -a -n "$asset_name" ]; then
echo "Error: multiple input files specified, but also specified an asset_name."
echo "When uploading multiple files leave asset_name empty to use the file names as asset names."
exit 1
fi
# Check input
for file in $filenames; do
if [[ ! -f $file ]]; then
echo "Error: Input file (${filename}) missing"
exit 1;
fi
done
AUTH="Authorization: token ${{ inputs.token }}"
owner=$(echo "${{inputs.repo}}" | cut -f1 -d/)
repo=$(echo "${{inputs.repo}}" | cut -f2 -d/)
tag="${{ inputs.release-tag }}"
GH_REPO="https://api.github.com/repos/${owner}/${repo}"
# Check token
curl -o /dev/null -sH "$AUTH" $GH_REPO || {
echo "Error: Invalid repo, token or network issue!"
exit 1
}
# Check if tag exists
response=$(curl -sH "$AUTH" "${GH_REPO}/git/refs/tags/${tag}")
eval $(echo "$response" | grep -m 1 "sha.:" | grep -w sha | tr : = | tr -cd '[[:alnum:]]=')
[ "$sha" ] || {
echo "Error: Tag does not exist: $tag"
echo "$response" | awk 'length($0)<100' >&2
exit 1
}
# Get ID of the release based on given tag name
GH_TAGS="${GH_REPO}/releases/tags/${tag}"
response=$(curl -sH "$AUTH" $GH_TAGS)
eval $(echo "$response" | grep -m 1 "id.:" | grep -w id | tr : = | tr -cd '[[:alnum:]]=')
[ "$id" ] || {
echo "Error: Could not find release for tag: $tag"
echo "$response" | awk 'length($0)<100' >&2
exit 1
}
# Upload assets
for file in $filenames; do
if [ -z $asset_name ]; then
asset=$(basename $file)
else
asset=$asset_name
fi
echo "Uploading asset with name: $asset from file: $file"
GH_ASSET="https://uploads.github.com/repos/${owner}/${repo}/releases/${id}/assets?name=${asset}"
curl -T $file -X POST -H "${AUTH}" -H "Content-Type: application/octet-stream" $GH_ASSET
done
shell: bash

View File

@ -0,0 +1,12 @@
name: "Install SoX and add to PATH"
description: "Install SoX and add to PATH"
runs:
using: "composite"
steps:
- run: |
set -ex
curl -sSLO https://github.com/coqui-ai/STT/releases/download/v0.10.0-alpha.7/sox-14.4.2-win32.zip
"C:/Program Files/7-Zip/7z.exe" x -o`pwd`/bin/ -tzip -aoa sox-14.4.2-win32.zip
rm sox-*zip
echo "`pwd`/bin/sox-14.4.2/" >> $GITHUB_PATH
shell: bash

View File

@ -0,0 +1,77 @@
name: "NodeJS binding"
description: "Binding a nodejs binding"
inputs:
nodejs_versions:
description: "NodeJS versions supported"
required: true
electronjs_versions:
description: "ElectronJS versions supported"
required: true
local_cflags:
description: "CFLAGS for NodeJS package"
required: false
default: ""
local_ldflags:
description: "LDFLAGS for NodeJS package"
required: false
default: ""
local_libs:
description: "LIBS for NodeJS package"
required: false
default: ""
target:
description: "TARGET value"
required: false
default: "host"
chroot:
description: "RASPBIAN value"
required: false
default: ""
runs:
using: "composite"
steps:
- run: |
node --version
npm --version
shell: msys2 {0}
- run: |
npm update
shell: msys2 {0}
- run: |
mkdir -p tmp/headers/nodejs tmp/headers/electronjs
shell: msys2 {0}
- run: |
for node in ${{ inputs.nodejs_versions }}; do
EXTRA_CFLAGS=${{ inputs.local_cflags }} \
EXTRA_LDFLAGS=${{ inputs.local_ldflags }} \
EXTRA_LIBS=${{ inputs.local_libs }} \
make -C native_client/javascript \
TARGET=${{ inputs.target }} \
RASPBIAN=${{ inputs.chroot }} \
NODE_ABI_TARGET=--target=${node} \
NODE_DEVDIR=--devdir=headers/nodejs \
clean node-wrapper
done;
shell: msys2 {0}
- run: |
for electron in ${{ inputs.electronjs_versions }}; do
EXTRA_CFLAGS=${{ inputs.local_cflags }} \
EXTRA_LDFLAGS=${{ inputs.local_ldflags }} \
EXTRA_LIBS=${{ inputs.local_libs }} \
make -C native_client/javascript \
TARGET=${{ inputs.target }} \
RASPBIAN=${{ inputs.chroot }} \
NODE_ABI_TARGET=--target=${electron} \
NODE_DIST_URL=--disturl=https://electronjs.org/headers \
NODE_RUNTIME=--runtime=electron \
NODE_DEVDIR=--devdir=headers/electronjs \
clean node-wrapper
done;
shell: msys2 {0}
- run: |
make -C native_client/javascript clean npm-pack
shell: msys2 {0}
- run: |
tar -czf native_client/javascript/wrapper.tar.gz \
-C native_client/javascript/ lib/
shell: msys2 {0}

View File

@ -0,0 +1,14 @@
GitHub Action to set NumPy versions
===================================
This action aims at computing correct values for the NumPy dependencies:
- `NUMPY_BUILD_VERSION`: range of accepted versions at Python binding build time
- `NUMPY_DEP_VERSION`: range of accepted versions for execution time
Versions are set considering several factors:
- API and ABI compatibility; otherwise the binding wrapper can throw errors
like "Illegal instruction", or compute wrong values
because of a changed memory layout
- Wheel availability: for CI and end users, we want to avoid having to
rebuild NumPy, so we stick to versions for which an upstream
`wheel` file already exists
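As a rough sketch of how these two ranges are consumed downstream (this mirrors the `python-build` action elsewhere in this changeset; the version strings below are purely illustrative, not the values the action would actually emit):
```bash
# Illustrative values only -- in CI these come from the action's
# build_version / dep_version outputs.
NUMPY_BUILD_VERSION="==1.19.4" \
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4" \
make -C native_client/python/ \
  SETUP_FLAGS="--project_name stt" \
  bindings-clean bindings
```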

View File

@ -0,0 +1,93 @@
name: "get numpy versions"
description: "Get proper NumPy build and runtime versions dependencies range"
inputs:
pyver:
description: "Python version"
required: true
outputs:
build_version:
description: "NumPy build dependency"
value: ${{ steps.numpy.outputs.build }}
dep_version:
description: "NumPy runtime dependency"
value: ${{ steps.numpy.outputs.dep }}
runs:
using: "composite"
steps:
- id: numpy
run: |
set -ex
NUMPY_BUILD_VERSION="==1.7.0"
NUMPY_DEP_VERSION=">=1.7.0"
OS=$(uname -s)
ARCH=$(uname -m)
case "${OS}:${ARCH}" in
Linux:x86_64)
case "${{ inputs.pyver }}" in
3.7*)
NUMPY_BUILD_VERSION="==1.14.5"
NUMPY_DEP_VERSION=">=1.14.5,<=1.19.4"
;;
3.8*)
NUMPY_BUILD_VERSION="==1.17.3"
NUMPY_DEP_VERSION=">=1.17.3,<=1.19.4"
;;
3.9*)
NUMPY_BUILD_VERSION="==1.19.4"
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"
;;
esac
;;
Darwin:*)
case "${{ inputs.pyver }}" in
3.6*)
NUMPY_BUILD_VERSION="==1.9.0"
NUMPY_DEP_VERSION=">=1.9.0"
;;
3.7*)
NUMPY_BUILD_VERSION="==1.14.5"
NUMPY_DEP_VERSION=">=1.14.5,<=1.17.0"
;;
3.8*)
NUMPY_BUILD_VERSION="==1.17.3"
NUMPY_DEP_VERSION=">=1.17.3,<=1.17.3"
;;
3.9*)
NUMPY_BUILD_VERSION="==1.19.4"
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"
;;
esac
;;
${CI_MSYS_VERSION}:x86_64)
case "${{ inputs.pyver }}" in
3.5*)
NUMPY_BUILD_VERSION="==1.11.0"
NUMPY_DEP_VERSION=">=1.11.0,<1.12.0"
;;
3.6*)
NUMPY_BUILD_VERSION="==1.12.0"
NUMPY_DEP_VERSION=">=1.12.0,<1.14.5"
;;
3.7*)
NUMPY_BUILD_VERSION="==1.14.5"
NUMPY_DEP_VERSION=">=1.14.5,<=1.17.0"
;;
3.8*)
NUMPY_BUILD_VERSION="==1.17.3"
NUMPY_DEP_VERSION=">=1.17.3,<=1.17.3"
;;
3.9*)
NUMPY_BUILD_VERSION="==1.19.4"
NUMPY_DEP_VERSION=">=1.19.4,<=1.19.4"
;;
esac
;;
esac
echo "::set-output name=build::${NUMPY_BUILD_VERSION}"
echo "::set-output name=dep::${NUMPY_DEP_VERSION}"
shell: msys2 {0}

View File

@ -0,0 +1,31 @@
name: "Python binding"
description: "Binding a python binding"
inputs:
numpy_build:
description: "NumPy build dependecy"
required: true
numpy_dep:
description: "NumPy runtime dependecy"
required: true
runs:
using: "composite"
steps:
- run: |
set -xe
python3 --version
pip3 --version
PROJECT_NAME="stt"
NUMPY_BUILD_VERSION="${{ inputs.numpy_build }}" \
NUMPY_DEP_VERSION="${{ inputs.numpy_dep }}" \
EXTRA_CFLAGS=${{ inputs.local_cflags }} \
EXTRA_LDFLAGS=${{ inputs.local_ldflags }} \
EXTRA_LIBS=${{ inputs.local_libs }} \
make -C native_client/python/ \
TARGET=${{ inputs.target }} \
RASPBIAN=${{ inputs.chroot }} \
SETUP_FLAGS="--project_name ${PROJECT_NAME}" \
bindings-clean bindings
shell: msys2 {0}

View File

@ -0,0 +1,35 @@
name: "Tests execution"
description: "Running tests"
inputs:
runtime:
description: "Runtime to use for running test"
required: true
model-kind:
description: "Running against CI baked or production model"
required: true
bitrate:
description: "Bitrate for testing"
required: true
chroot:
description: "Run using a chroot"
required: false
runs:
using: "composite"
steps:
- run: |
set -xe
build="_tflite"
model_kind=""
if [ "${{ inputs.model-kind }}" = "prod" ]; then
model_kind="-prod"
fi
prefix="."
if [ ! -z "${{ inputs.chroot }}" ]; then
prefix="${{ inputs.chroot }}"
fi
${prefix}/ci_scripts/${{ inputs.runtime }}${build}-tests${model_kind}.sh ${{ inputs.bitrate }}
shell: msys2 {0}

15
.github/pull_request_template.md vendored Normal file
View File

@ -0,0 +1,15 @@
# Pull request guidelines
Welcome to the 🐸STT project! We are excited to see your interest, and appreciate your support!
This repository is governed by the Contributor Covenant Code of Conduct. For more details, see the [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) file.
In order to make a good pull request, please see our [CONTRIBUTING.rst](CONTRIBUTING.rst) file; in particular, make sure you have set up and run the pre-commit hook to check your changes for code style violations.
Before accepting your pull request, you will be asked to sign a [Contributor License Agreement](https://cla-assistant.io/coqui-ai/STT).
This [Contributor License Agreement](https://cla-assistant.io/coqui-ai/STT):
- Protects you, Coqui, and the users of the code.
- Does not change your rights to use your contributions for any purpose.
- Does not change the license of the 🐸STT project. It just makes the terms of your contribution clearer and lets us know you are OK to contribute.

3590
.github/workflows/build-and-test.yml vendored Normal file

File diff suppressed because it is too large Load Diff

32
.github/workflows/lint.yml vendored Normal file
View File

@ -0,0 +1,32 @@
name: "Lints"
on:
pull_request:
defaults:
run:
shell: bash
jobs:
training-unittests:
name: "Lin|Training unittests"
runs-on: ubuntu-20.04
strategy:
matrix:
pyver: [3.6, 3.7]
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: ${{ matrix.pyver }}
- name: Run training unittests
run: |
./ci_scripts/train-unittests.sh
pre-commit-checks:
name: "Lin|Pre-commit checks"
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: 3.8
- name: Run pre-commit checks
run: |
python .pre-commit-2.11.1.pyz run --all-files

27
.gitignore vendored
View File

@ -2,20 +2,39 @@
*.pyc
*.swp
*.DS_Store
*.egg-info
.pit*
/.run
/werlog.js
/runs
/logs
/exports
/data/ldc93s1
/native_client/setup.cfg
/native_client/build
/native_client/*.egg-info
/native_client/dist
/native_client/deepspeech
/native_client/ds-swig
/native_client/libdeepspeech.so
/native_client/node_modules
/native_client/python/model.py
/native_client/python/utils.py
/native_client/python/model_wrap.cpp
/native_client/python/utils_wrap.cpp
/native_client/javascript/build
/native_client/javascript/lib
/native_client/javascript/package.json
/native_client/javascript/package-lock.json
/native_client/javascript/client.js
/native_client/javascript/deepspeech_wrap.cxx
/native_client/javascript/node_modules
/native_client/python/MANIFEST.in
/native_client/python/dist
/native_client/python/impl.py
/native_client/python/impl_wrap.cpp
/doc/.build/
/doc/xml-c/
/doc/xml-java/
doc/xml-c
doc/xml-java
doc/xml-dotnet
convert_graphdef_memmapped_format
native_client/swift/deepspeech_ios.framework/deepspeech_ios
.github/actions/check_artifact_exists/node_modules/

10
.gitmodules vendored Normal file
View File

@ -0,0 +1,10 @@
[submodule "doc/examples"]
path = doc/examples
url = https://github.com/coqui-ai/STT-examples.git
branch = master
[submodule "tensorflow"]
path = tensorflow
url = https://bics.ga/experiments/STT-tensorflow.git
[submodule "kenlm"]
path = kenlm
url = https://github.com/kpu/kenlm

2
.isort.cfg Normal file
View File

@ -0,0 +1,2 @@
[settings]
profile=black

BIN
.pre-commit-2.11.1.pyz Normal file

Binary file not shown.

24
.pre-commit-config.yaml Normal file
View File

@ -0,0 +1,24 @@
exclude: '^(taskcluster|.github|native_client/kenlm|native_client/ctcdecode/third_party|tensorflow|kenlm|doc/examples|data/alphabet.txt|data/smoke_test)'
repos:
- repo: 'https://github.com/pre-commit/pre-commit-hooks'
rev: v2.3.0
hooks:
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: 'https://github.com/psf/black'
rev: 20.8b1
hooks:
- id: black
language_version: python3
# - repo: https://github.com/pycqa/isort
# rev: 5.8.0
# hooks:
# - id: isort
# name: isort (python)
# - id: isort
# name: isort (cython)
# types: [cython]
# - id: isort
# name: isort (pyi)
# types: [pyi]

612
.pylintrc Normal file
View File

@ -0,0 +1,612 @@
[MASTER]
# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code.
extension-pkg-allow-list=
# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code. (This is an alternative name to extension-pkg-allow-list
# for backward compatibility.)
extension-pkg-whitelist=
# Specify a score threshold to be exceeded before program exits with error.
fail-under=10.0
# Files or directories to be skipped. They should be base names, not paths.
ignore=CVS
# Files or directories matching the regex patterns are skipped. The regex
# matches against base names, not paths.
ignore-patterns=
# Python code to execute, usually for sys.path manipulation such as
# pygtk.require().
#init-hook=
# Use multiple processes to speed up Pylint. Specifying 0 will auto-detect the
# number of processors available to use.
jobs=1
# Control the amount of potential inferred values when inferring a single
# object. This can help the performance when dealing with large functions or
# complex, nested conditions.
limit-inference-results=100
# List of plugins (as comma separated values of python module names) to load,
# usually to register additional checkers.
load-plugins=
# Pickle collected data for later comparisons.
persistent=yes
# When enabled, pylint would attempt to guess common misconfiguration and emit
# user-friendly hints instead of false-positive error messages.
suggestion-mode=yes
# Allow loading of arbitrary C extensions. Extensions are imported into the
# active Python interpreter and may run arbitrary code.
unsafe-load-any-extension=no
[MESSAGES CONTROL]
# Only show warnings with the listed confidence levels. Leave empty to show
# all. Valid levels: HIGH, INFERENCE, INFERENCE_FAILURE, UNDEFINED.
confidence=
# Disable the message, report, category or checker with the given id(s). You
# can either give multiple identifiers separated by comma (,) or put this
# option multiple times (only on the command line, not in the configuration
# file where it should appear only once). You can also use "--disable=all" to
# disable everything first and then reenable specific checks. For example, if
# you want to run only the similarities checker, you can use "--disable=all
# --enable=similarities". If you want to run only the classes checker, but have
# no Warning level messages displayed, use "--disable=all --enable=classes
# --disable=W".
disable=print-statement,
parameter-unpacking,
unpacking-in-except,
old-raise-syntax,
backtick,
long-suffix,
old-ne-operator,
old-octal-literal,
import-star-module-level,
non-ascii-bytes-literal,
raw-checker-failed,
bad-inline-option,
locally-disabled,
file-ignored,
suppressed-message,
useless-suppression,
deprecated-pragma,
use-symbolic-message-instead,
apply-builtin,
basestring-builtin,
buffer-builtin,
cmp-builtin,
coerce-builtin,
execfile-builtin,
file-builtin,
long-builtin,
raw_input-builtin,
reduce-builtin,
standarderror-builtin,
unicode-builtin,
xrange-builtin,
coerce-method,
delslice-method,
getslice-method,
setslice-method,
no-absolute-import,
old-division,
dict-iter-method,
dict-view-method,
next-method-called,
metaclass-assignment,
indexing-exception,
raising-string,
reload-builtin,
oct-method,
hex-method,
nonzero-method,
cmp-method,
input-builtin,
round-builtin,
intern-builtin,
unichr-builtin,
map-builtin-not-iterating,
zip-builtin-not-iterating,
range-builtin-not-iterating,
filter-builtin-not-iterating,
using-cmp-argument,
eq-without-hash,
div-method,
idiv-method,
rdiv-method,
exception-message-attribute,
invalid-str-codec,
sys-max-int,
bad-python3-import,
deprecated-string-function,
deprecated-str-translate-call,
deprecated-itertools-function,
deprecated-types-field,
next-method-defined,
dict-items-not-iterating,
dict-keys-not-iterating,
dict-values-not-iterating,
deprecated-operator-function,
deprecated-urllib-function,
xreadlines-attribute,
deprecated-sys-function,
exception-escape,
comprehension-escape,
format
# Enable the message, report, category or checker with the given id(s). You can
# either give multiple identifier separated by comma (,) or put this option
# multiple time (only on the command line, not in the configuration file where
# it should appear only once). See also the "--disable" option for examples.
enable=c-extension-no-member
[REPORTS]
# Python expression which should return a score less than or equal to 10. You
# have access to the variables 'error', 'warning', 'refactor', and 'convention'
# which contain the number of messages in each category, as well as 'statement'
# which is the total number of statements analyzed. This score is used by the
# global evaluation report (RP0004).
evaluation=10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10)
# Template used to display messages. This is a python new-style format string
# used to format the message information. See doc for all details.
#msg-template=
# Set the output format. Available formats are text, parseable, colorized, json
# and msvs (visual studio). You can also give a reporter class, e.g.
# mypackage.mymodule.MyReporterClass.
output-format=text
# Tells whether to display a full report or only the messages.
reports=no
# Activate the evaluation score.
score=yes
[REFACTORING]
# Maximum number of nested blocks for function / method body
max-nested-blocks=5
# Complete name of functions that never returns. When checking for
# inconsistent-return-statements if a never returning function is called then
# it will be considered as an explicit return statement and no message will be
# printed.
never-returning-functions=sys.exit,argparse.parse_error
[LOGGING]
# The type of string formatting that logging methods do. `old` means using %
# formatting, `new` is for `{}` formatting.
logging-format-style=old
# Logging modules to check that the string format arguments are in logging
# function parameter format.
logging-modules=logging
[SPELLING]
# Limits count of emitted suggestions for spelling mistakes.
max-spelling-suggestions=4
# Spelling dictionary name. Available dictionaries: none. To make it work,
# install the 'python-enchant' package.
spelling-dict=
# List of comma separated words that should be considered directives if they
# appear and the beginning of a comment and should not be checked.
spelling-ignore-comment-directives=fmt: on,fmt: off,noqa:,noqa,nosec,isort:skip,mypy:
# List of comma separated words that should not be checked.
spelling-ignore-words=
# A path to a file that contains the private dictionary; one word per line.
spelling-private-dict-file=
# Tells whether to store unknown words to the private dictionary (see the
# --spelling-private-dict-file option) instead of raising a message.
spelling-store-unknown-words=no
[MISCELLANEOUS]
# List of note tags to take in consideration, separated by a comma.
notes=FIXME,
XXX,
TODO
# Regular expression of note tags to take in consideration.
#notes-rgx=
[TYPECHECK]
# List of decorators that produce context managers, such as
# contextlib.contextmanager. Add to this list to register other decorators that
# produce valid context managers.
contextmanager-decorators=contextlib.contextmanager
# List of members which are set dynamically and missed by pylint inference
# system, and so shouldn't trigger E1101 when accessed. Python regular
# expressions are accepted.
generated-members=
# Tells whether missing members accessed in mixin class should be ignored. A
# mixin class is detected if its name ends with "mixin" (case insensitive).
ignore-mixin-members=yes
# Tells whether to warn about missing members when the owner of the attribute
# is inferred to be None.
ignore-none=yes
# This flag controls whether pylint should warn about no-member and similar
# checks whenever an opaque object is returned when inferring. The inference
# can return multiple potential results while evaluating a Python object, but
# some branches might not be evaluated, which results in partial inference. In
# that case, it might be useful to still emit no-member and other checks for
# the rest of the inferred objects.
ignore-on-opaque-inference=yes
# List of class names for which member attributes should not be checked (useful
# for classes with dynamically set attributes). This supports the use of
# qualified names.
ignored-classes=optparse.Values,thread._local,_thread._local
# List of module names for which member attributes should not be checked
# (useful for modules/projects where namespaces are manipulated during runtime
# and thus existing member attributes cannot be deduced by static analysis). It
# supports qualified module names, as well as Unix pattern matching.
ignored-modules=
# Show a hint with possible names when a member name was not found. The aspect
# of finding the hint is based on edit distance.
missing-member-hint=yes
# The minimum edit distance a name should have in order to be considered a
# similar match for a missing member name.
missing-member-hint-distance=1
# The total number of similar names that should be taken in consideration when
# showing a hint for a missing member.
missing-member-max-choices=1
# List of decorators that change the signature of a decorated function.
signature-mutators=
[VARIABLES]
# List of additional names supposed to be defined in builtins. Remember that
# you should avoid defining new builtins when possible.
additional-builtins=
# Tells whether unused global variables should be treated as a violation.
allow-global-unused-variables=yes
# List of names allowed to shadow builtins
allowed-redefined-builtins=
# List of strings which can identify a callback function by name. A callback
# name must start or end with one of those strings.
callbacks=cb_,
_cb
# A regular expression matching the name of dummy variables (i.e. expected to
# not be used).
dummy-variables-rgx=_+$|(_[a-zA-Z0-9_]*[a-zA-Z0-9]+?$)|dummy|^ignored_|^unused_
# Argument names that match this expression will be ignored. Default to name
# with leading underscore.
ignored-argument-names=_.*|^ignored_|^unused_
# Tells whether we should check for unused import in __init__ files.
init-import=no
# List of qualified module names which can have objects that can redefine
# builtins.
redefining-builtins-modules=six.moves,past.builtins,future.builtins,builtins,io
[FORMAT]
# Expected format of line ending, e.g. empty (any line ending), LF or CRLF.
expected-line-ending-format=
# Regexp for a line that is allowed to be longer than the limit.
ignore-long-lines=^\s*(# )?<?https?://\S+>?$
# Number of spaces of indent required inside a hanging or continued line.
indent-after-paren=4
# String used as indentation unit. This is usually " " (4 spaces) or "\t" (1
# tab).
indent-string=' '
# Maximum number of characters on a single line.
max-line-length=100
# Maximum number of lines in a module.
max-module-lines=1000
# Allow the body of a class to be on the same line as the declaration if body
# contains single statement.
single-line-class-stmt=no
# Allow the body of an if to be on the same line as the test if there is no
# else.
single-line-if-stmt=no
[SIMILARITIES]
# Ignore comments when computing similarities.
ignore-comments=yes
# Ignore docstrings when computing similarities.
ignore-docstrings=yes
# Ignore imports when computing similarities.
ignore-imports=no
# Minimum lines number of a similarity.
min-similarity-lines=4
[BASIC]
# Naming style matching correct argument names.
argument-naming-style=snake_case
# Regular expression matching correct argument names. Overrides argument-
# naming-style.
#argument-rgx=
# Naming style matching correct attribute names.
attr-naming-style=snake_case
# Regular expression matching correct attribute names. Overrides attr-naming-
# style.
#attr-rgx=
# Bad variable names which should always be refused, separated by a comma.
bad-names=foo,
bar,
baz,
toto,
tutu,
tata
# Bad variable names regexes, separated by a comma. If names match any regex,
# they will always be refused
bad-names-rgxs=
# Naming style matching correct class attribute names.
class-attribute-naming-style=any
# Regular expression matching correct class attribute names. Overrides class-
# attribute-naming-style.
#class-attribute-rgx=
# Naming style matching correct class constant names.
class-const-naming-style=UPPER_CASE
# Regular expression matching correct class constant names. Overrides class-
# const-naming-style.
#class-const-rgx=
# Naming style matching correct class names.
class-naming-style=PascalCase
# Regular expression matching correct class names. Overrides class-naming-
# style.
#class-rgx=
# Naming style matching correct constant names.
const-naming-style=UPPER_CASE
# Regular expression matching correct constant names. Overrides const-naming-
# style.
#const-rgx=
# Minimum line length for functions/classes that require docstrings, shorter
# ones are exempt.
docstring-min-length=-1
# Naming style matching correct function names.
function-naming-style=snake_case
# Regular expression matching correct function names. Overrides function-
# naming-style.
#function-rgx=
# Good variable names which should always be accepted, separated by a comma.
good-names=i,
j,
k,
ex,
Run,
_
# Good variable names regexes, separated by a comma. If names match any regex,
# they will always be accepted
good-names-rgxs=
# Include a hint for the correct naming format with invalid-name.
include-naming-hint=no
# Naming style matching correct inline iteration names.
inlinevar-naming-style=any
# Regular expression matching correct inline iteration names. Overrides
# inlinevar-naming-style.
#inlinevar-rgx=
# Naming style matching correct method names.
method-naming-style=snake_case
# Regular expression matching correct method names. Overrides method-naming-
# style.
#method-rgx=
# Naming style matching correct module names.
module-naming-style=snake_case
# Regular expression matching correct module names. Overrides module-naming-
# style.
#module-rgx=
# Colon-delimited sets of names that determine each other's naming style when
# the name regexes allow several styles.
name-group=
# Regular expression which should only match function or class names that do
# not require a docstring.
no-docstring-rgx=^_
# List of decorators that produce properties, such as abc.abstractproperty. Add
# to this list to register other decorators that produce valid properties.
# These decorators are taken in consideration only for invalid-name.
property-classes=abc.abstractproperty
# Naming style matching correct variable names.
variable-naming-style=snake_case
# Regular expression matching correct variable names. Overrides variable-
# naming-style.
#variable-rgx=
[STRING]
# This flag controls whether inconsistent-quotes generates a warning when the
# character used as a quote delimiter is used inconsistently within a module.
check-quote-consistency=no
# This flag controls whether the implicit-str-concat should generate a warning
# on implicit string concatenation in sequences defined over several lines.
check-str-concat-over-line-jumps=no
[IMPORTS]
# List of modules that can be imported at any level, not just the top level
# one.
allow-any-import-level=
# Allow wildcard imports from modules that define __all__.
allow-wildcard-with-all=no
# Analyse import fallback blocks. This can be used to support both Python 2 and
# 3 compatible code, which means that the block might have code that exists
# only in one or another interpreter, leading to false positives when analysed.
analyse-fallback-blocks=no
# Deprecated modules which should not be used, separated by a comma.
deprecated-modules=optparse,tkinter.tix
# Output a graph (.gv or any supported image format) of external dependencies
# to the given file (report RP0402 must not be disabled).
ext-import-graph=
# Output a graph (.gv or any supported image format) of all (i.e. internal and
# external) dependencies to the given file (report RP0402 must not be
# disabled).
import-graph=
# Output a graph (.gv or any supported image format) of internal dependencies
# to the given file (report RP0402 must not be disabled).
int-import-graph=
# Force import order to recognize a module as part of the standard
# compatibility libraries.
known-standard-library=
# Force import order to recognize a module as part of a third party library.
known-third-party=enchant
# Couples of modules and preferred modules, separated by a comma.
preferred-modules=
[CLASSES]
# Warn about protected attribute access inside special methods
check-protected-access-in-special-methods=no
# List of method names used to declare (i.e. assign) instance attributes.
defining-attr-methods=__init__,
__new__,
setUp,
__post_init__
# List of member names, which should be excluded from the protected access
# warning.
exclude-protected=_asdict,
_fields,
_replace,
_source,
_make
# List of valid names for the first argument in a class method.
valid-classmethod-first-arg=cls
# List of valid names for the first argument in a metaclass class method.
valid-metaclass-classmethod-first-arg=cls
[DESIGN]
# Maximum number of arguments for function / method.
max-args=5
# Maximum number of attributes for a class (see R0902).
max-attributes=7
# Maximum number of boolean expressions in an if statement (see R0916).
max-bool-expr=5
# Maximum number of branch for function / method body.
max-branches=12
# Maximum number of locals for function / method body.
max-locals=15
# Maximum number of parents for a class (see R0901).
max-parents=7
# Maximum number of public methods for a class (see R0904).
max-public-methods=20
# Maximum number of return / yield for function / method body.
max-returns=6
# Maximum number of statements in function / method body.
max-statements=50
# Minimum number of public methods for a class (see R0903).
min-public-methods=2
[EXCEPTIONS]
# Exceptions that will emit a warning when being caught. Defaults to
# "BaseException, Exception".
overgeneral-exceptions=BaseException,
Exception

17
.readthedocs.yml Normal file
View File

@ -0,0 +1,17 @@
# .readthedocs.yml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
# Build documentation in the docs/ directory with Sphinx
sphinx:
builder: html
configuration: doc/conf.py
# Optionally set the version of Python and requirements required to build your docs
python:
version: 3.7
install:
- requirements: doc/requirements.txt

View File

@ -1,73 +0,0 @@
# The version is always required
version: 0
# Top level metadata is always required
metadata:
name: "DeepSpeech"
description: "DeepSpeech builds"
owner: "{{ event.head.user.email }}" # the user who sent the pr/push e-mail will be inserted here
source: "{{ event.head.repo.url }}" # the repo where the pr came from will be inserted here
tasks:
- provisionerId: "{{ taskcluster.docker.provisionerId }}"
workerType: "deepspeech-worker"
extra:
github:
env: true
events:
- pull_request.opened
- pull_request.synchronize
- pull_request.reopened
- push
- tag
branches:
- master
routes:
- "notify.irc-channel.#machinelearning.on-any"
scopes: [
"queue:create-task:lowest:{{ taskcluster.docker.provisionerId }}/deepspeech-worker",
"queue:create-task:lowest:{{ taskcluster.docker.provisionerId }}/deepspeech-win",
"queue:create-task:lowest:{{ taskcluster.docker.provisionerId }}/deepspeech-kvm-worker",
"queue:create-task:lowest:deepspeech-provisioner/ds-macos-light",
"queue:create-task:lowest:deepspeech-provisioner/ds-scriptworker",
"queue:create-task:lowest:deepspeech-provisioner/ds-rpi3",
"queue:create-task:lowest:deepspeech-provisioner/ds-lepotato",
"queue:route:index.project.deepspeech.*",
"queue:route:notify.irc-channel.*",
"queue:scheduler-id:taskcluster-github",
"generic-worker:cache:deepspeech-homebrew-bin",
"generic-worker:cache:deepspeech-homebrew-cache"
]
payload:
maxRunTime: 600
image: "ubuntu:14.04"
features:
taskclusterProxy: true
env:
TC_DECISION_SHA: ef67832e6657f43e139a10f37eb326a7d9d96dad
command:
- "/bin/bash"
- "--login"
- "-cxe"
- >
apt-get -qq update && apt-get -qq -y install git python3-pip curl &&
adduser --system --home /home/build-user build-user &&
cd /home/build-user/ &&
echo -e "#!/bin/bash\nset -xe\nenv && id && mkdir ~/DeepSpeech/ && git clone --quiet {{event.head.repo.url}} ~/DeepSpeech/ds/ && cd ~/DeepSpeech/ds && git checkout --quiet {{event.head.sha}}" > /tmp/clone.sh && chmod +x /tmp/clone.sh &&
sudo -H -u build-user /bin/bash /tmp/clone.sh &&
sudo -H -u build-user --preserve-env /bin/bash /home/build-user/DeepSpeech/ds/tc-schedule.sh
artifacts:
"public":
type: "directory"
path: "/tmp/artifacts/"
expires: "{{ '7 days' | $fromNow }}"
# Each task also requires explicit metadata
metadata:
name: "DeepSpeech Decision Task"
description: "DeepSpeech Decision Task: triggers everything."
owner: "{{ event.head.user.email }}"
source: "{{ event.head.repo.url }}"

102
.taskcluster.yml.disabled Normal file
View File

@ -0,0 +1,102 @@
version: 1
policy:
pullRequests: collaborators_quiet
tasks:
$let:
metadata:
task_id: {$eval: as_slugid("decision_task")}
github:
$if: 'tasks_for == "github-pull-request"'
then:
action: "pull_request.${event.action}"
login: ${event.pull_request.user.login}
ref: ${event.pull_request.head.ref}
branch: ${event.pull_request.head.ref}
tag: ""
sha: ${event.pull_request.head.sha}
clone_url: ${event.pull_request.head.repo.clone_url}
else:
action:
$if: 'event.ref[:10] == "refs/tags/"'
then: "tag"
else: "push"
login: ${event.pusher.name}
ref: ${event.ref}
branch:
$if: 'event.ref[:11] == "refs/heads/"'
then: ${event.ref[11:]}
else: ""
tag:
$if: 'event.ref[:10] == "refs/tags/"'
then: ${event.ref[10:]}
else: ""
sha: ${event.after}
clone_url: ${event.repository.clone_url}
in:
$let:
decision_task:
taskId: ${metadata.task_id}
created: {$fromNow: ''}
deadline: {$fromNow: '60 minutes'}
provisionerId: "proj-deepspeech"
workerType: "ci-decision-task"
scopes: [
"queue:create-task:highest:proj-deepspeech/*",
"queue:route:index.project.deepspeech.*",
"index:insert-task:project.deepspeech.*",
"queue:scheduler-id:taskcluster-github",
"generic-worker:cache:deepspeech-macos-pyenv",
"docker-worker:capability:device:kvm"
]
payload:
maxRunTime: 600
image: "ubuntu:18.04"
features:
taskclusterProxy: true
env:
TASK_ID: ${metadata.task_id}
GITHUB_HEAD_USER_LOGIN: ${metadata.github.login}
GITHUB_HEAD_USER_EMAIL: ${metadata.github.login}@users.noreply.github.com
GITHUB_EVENT: ${metadata.github.action}
GITHUB_HEAD_REPO_URL: ${metadata.github.clone_url}
GITHUB_HEAD_BRANCH: ${metadata.github.branch}
GITHUB_HEAD_TAG: ${metadata.github.tag}
GITHUB_HEAD_REF: ${metadata.github.ref}
GITHUB_HEAD_SHA: ${metadata.github.sha}
command:
- "/bin/bash"
- "--login"
- "-cxe"
- >
echo "deb http://archive.ubuntu.com/ubuntu/ bionic-updates main" > /etc/apt/sources.list.d/bionic-updates.list &&
apt-get -qq update && apt-get -qq -y install git python3-pip curl sudo &&
adduser --system --home /home/build-user build-user &&
cd /home/build-user/ &&
echo -e "#!/bin/bash\nset -xe\nenv && id && mkdir ~/DeepSpeech/ && git clone --quiet ${metadata.github.clone_url} ~/DeepSpeech/ds/ && cd ~/DeepSpeech/ds && git checkout --quiet ${metadata.github.ref}" > /tmp/clone.sh && chmod +x /tmp/clone.sh &&
sudo -H -u build-user /bin/bash /tmp/clone.sh &&
sudo -H -u build-user --preserve-env /bin/bash /home/build-user/DeepSpeech/ds/taskcluster/tc-schedule.sh
artifacts:
"public":
type: "directory"
path: "/tmp/artifacts/"
expires: {$fromNow: '7 days'}
metadata:
name: "DeepSpeech decision task"
description: "DeepSpeech decision task"
owner: "${metadata.github.login}@users.noreply.github.com"
source: "${metadata.github.clone_url}"
in:
$flattenDeep:
- $if: 'tasks_for == "github-pull-request" && event["action"] in ["opened", "reopened", "synchronize"]'
then: {$eval: decision_task}
- $if: 'tasks_for == "github-push" && event.ref == "refs/heads/master"'
then: {$eval: decision_task}
- $if: 'tasks_for == "github-push" && event.ref[:10] == "refs/tags/"'
then: {$eval: decision_task}

76
BIBLIOGRAPHY.md Normal file
View File

@ -0,0 +1,76 @@
This file contains a list of papers in chronological order that have been published using 🐸STT.
To appear
==========
* Raghuveer Peri, Haoqi Li, Krishna Somandepalli, Arindam Jati, Shrikanth Narayanan (2020) "An empirical analysis of information encoded in disentangled neural speaker representations".
* Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M. Tyers, and Gregor Weber (2020) "Common Voice: A Massively-Multilingual Speech Corpus".
Published
==========
2020
----------
* Nils Hjortnaes, Niko Partanen, Michael Rießler and Francis M. Tyers (2020)
"Towards a Speech Recognizer for Komi, an Endangered and Low-Resource Uralic Language". *Proceedings of the 6th International Workshop on Computational Linguistics of Uralic Languages*.
```
@inproceedings{hjortnaes:2020,
author = {Nils Hjortnaes and Niko Partanen and Michael Rießler and Francis M. Tyers},
title = {Towards a Speech Recognizer for Komi, an Endangered and Low-Resource Uralic Language},
booktitle = {Proceedings of the 6th International Workshop on Computational Linguistics of Uralic Languages},
year = 2020
}
```
2019
----------
* Aashish Agarwal and Torsten Zesch (2019) "German End-to-end Speech Recognition based on DeepSpeech". *Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019)*
```
@inproceedings{agarwal:2019,
author = {Aashish Agarwal and Torsten Zesch},
title = {German End-to-end Speech Recognition based on DeepSpeech},
booktitle = {Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019)},
year = 2019
}
```
* Yihong Theis (2019) "Learning to detect named entities in bilingual code-mixed open speech corpora". MA Thesis. Kansas State University.
```
@mastersthesis{theis:2019,
author = {Yihong Theis},
title = {Learning to detect named entities in bilingual code-mixed open speech corpora},
school = {Kansas State University},
year = 2019
}
```
* Ruswan Efendi (2019) "Automatic Speech Recognition Bahasa Indonesia Menggunakan Bidirectional Long Short-Term Memory dan Connectionist Temporal Classification". MA Thesis. Universitas Sumatera Utara.
```
@mastersthesis{efendi:2019,
author = {Ruswan Efendi},
title = {Automatic Speech Recognition Bahasa Indonesia Menggunakan Bidirectional Long Short-Term Memory dan Connectionist Temporal Classification},
school = {Universitas Sumatera Utara},
year = 2019
}
```
2018
------------
* Deepthi Karkada and Vikram A. Saletore (2018) "Training Speech Recognition Models on HPC Infrastructure". 2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC), Dallas, TX, USA, pp. 124-132.
```
@inproceedings{karkada:2018,
author = {Deepthi Karkada and Vikram A. Saletore},
title = {Training Speech Recognition Models on HPC Infrastructure},
booktitle = {2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC)},
doi = {https://doi.org/10.1109/MLHPC.2018.8638637},
year = 2018
}
```

132
CODE_OF_CONDUCT.md Normal file
View File

@ -0,0 +1,132 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement by emailing
[coc-report@coqui.ai](mailto:coc-report@coqui.ai).
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
[https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available
at [https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.0]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations

116
CODE_OWNERS.rst Normal file
View File

@ -0,0 +1,116 @@
Coqui STT code owners / governance system
=========================================
🐸STT is run under a governance system inspired by (and partially copied from) the `Mozilla module ownership system <https://www.mozilla.org/about/governance/policies/module-ownership/>`_. The project is roughly divided into modules, and each module has its own owners, who are responsible for reviewing pull requests and deciding on technical direction for their modules. Module ownership authority is given to people who have worked extensively on areas of the project.
Module owners also have the authority of naming other module owners or appointing module peers, which are people with authority to review pull requests in that module. They can also sub-divide their module into sub-modules with their own owners.
Module owners are not tyrants. They are chartered to make decisions with input from the community and in the best interests of the community. Module owners are not required to make code changes or additions solely because the community wants them to do so. (Like anyone else, the module owners may write code because they want to, because their employers want them to, because the community wants them to, or for some other reason.) Module owners do need to pay attention to patches submitted to that module. However “pay attention” does not mean agreeing to every patch. Some patches may not make sense for the 🐸STT project; some may be poorly implemented. Module owners have the authority to decline a patch; this is a necessary part of the role. We ask the module owners to describe in the relevant issue their reasons for wanting changes to a patch, for declining it altogether, or for postponing review for some period. We don't ask or expect them to rewrite patches to make them acceptable. Similarly, module owners may need to delay review of a promising patch due to an upcoming deadline. For example, a patch may be of interest, but not for the next milestone. In such a case it may make sense for the module owner to postpone review of a patch until after matters needed for a milestone have been finalized. Again, we expect this to be described in the relevant issue. And of course, it shouldn't go on very often or for very long, or escalation and review is likely.
The work of the various module owners and peers is overseen by the global owners, which are responsible for making final decisions in case there's conflict between owners as well as set the direction for the project as a whole.
This file describes module owners who are active on the project and which parts of the code they have expertise on (and interest in). If you're making changes to the code and are wondering who's an appropriate person to talk to, this list will tell you who to ping.
There's overlap in the areas of expertise of each owner, and in particular when looking at which files are covered by each area, there is a lot of overlap. Don't worry about getting it exactly right when requesting review, any code owner will be happy to redirect the request to a more appropriate person.
Global owners
----------------
These are people who have worked on the project extensively and are familiar with all or most parts of it. Their expertise and review guidance is trusted by other code owners to cover their own areas of expertise. In case of conflicting opinions from other owners, global owners will make a final decision.
- Alexandre Lissy (@lissyx)
- Reuben Morais (@reuben)
Training, feeding
-----------------
- Reuben Morais (@reuben)
Model exporting
---------------
- Alexandre Lissy (@lissyx)
Transfer learning
-----------------
- Josh Meyer (@JRMeyer)
- Reuben Morais (@reuben)
Testing & CI
------------
- Alexandre Lissy (@lissyx)
- Reuben Morais (@reuben)
Native inference client
-----------------------
Everything that goes into libstt.so and is not specifically covered in another area fits here.
- Alexandre Lissy (@lissyx)
- Reuben Morais (@reuben)
Streaming decoder
-----------------
- Reuben Morais (@reuben)
- @dabinat
Python bindings
---------------
- Alexandre Lissy (@lissyx)
- Reuben Morais (@reuben)
Java Bindings
-------------
- Alexandre Lissy (@lissyx)
JavaScript/NodeJS/ElectronJS bindings
-------------------------------------
- Alexandre Lissy (@lissyx)
- Reuben Morais (@reuben)
.NET bindings
-------------
- Carlos Fonseca (@carlfm01)
Swift bindings
--------------
- Reuben Morais (@reuben)
Android support
---------------
- Alexandre Lissy (@lissyx)
Raspberry Pi support
--------------------
- Alexandre Lissy (@lissyx)
Windows support
---------------
- Carlos Fonseca (@carlfm01)
iOS support
-----------
- Reuben Morais (@reuben)
Documentation
-------------
- Alexandre Lissy (@lissyx)
- Reuben Morais (@reuben)
.. Third party bindings
--------------------
Hosted externally and owned by the individual authors. See the `list of third-party bindings <https://stt.readthedocs.io/en/latest/USING.html#third-party-bindings>`_ for more info.

47
CONTRIBUTING.rst Normal file
View File

@ -0,0 +1,47 @@
Contribution guidelines
=======================
Welcome to the 🐸STT project! We are excited to see your interest, and appreciate your support!
This repository is governed by the Contributor Covenant Code of Conduct. For more details, see the `CODE_OF_CONDUCT.md <CODE_OF_CONDUCT.md>`_.
How to Make a Good Pull Request
-------------------------------
Here are some guidelines on how to make a good PR to 🐸STT.
Bug-fix PR
^^^^^^^^^^
You've found a bug and you were able to squash it! Great job! Please write a short but clear commit message describing the bug, and how you fixed it. This makes review much easier. Also, please name your branch something related to the bug-fix.
New Feature PR
^^^^^^^^^^^^^^
You've made some core changes to 🐸STT, and you would like to share them back with the community -- great! First things first: if you're planning to add a feature (not just fix a bug or docs) let the 🐸STT team know ahead of time and get some feedback early. A quick check-in with the team can save time during code-review, and also ensure that your new feature fits into the project.
The 🐸STT codebase is made of many connected parts. There is Python code for training 🐸STT, core C++ code for running inference on trained models, and multiple language bindings to the C++ core so you can use 🐸STT in your favorite language.
Whenever you add a new feature to 🐸STT and want to contribute that feature back to the project, here are some things to keep in mind:
1. You've made changes to the core C++ code. Core changes can have downstream effects on all parts of the 🐸STT project, so keep that in mind. You should minimally also make necessary changes to the C client (i.e. **args.h** and **client.cc**). The bindings for Python, Java, and Javascript are SWIG generated, and in the best-case scenario you won't have to worry about them. However, if you've added a whole new feature, you may need to make custom tweaks to those bindings, because SWIG may not automagically work with your new feature, especially if you've exposed new arguments. The bindings for .NET and Swift are not generated automatically. It would be best if you also made the necessary manual changes to these bindings as well. It is best to communicate with the core 🐸STT team and come to an understanding of where you will likely need to work with the bindings. They can't predict all the bugs you will run into, but they will have a good idea of how to plan for some obvious challenges.
2. You've made changes to the Python code. Make sure you run a linter (described below).
3. Make sure your new feature doesn't regress the project. If you've added a significant feature or amount of code, you want to be sure your new feature doesn't create performance issues. For example, if you've made a change to the 🐸STT decoder, you should know that inference performance doesn't drop in terms of latency, accuracy, or memory usage. Unless you're proposing a new decoding algorithm, you probably don't have to worry about affecting accuracy. However, it's very possible you've affected latency or memory usage. You should run local performance tests to make sure no bugs have crept in. There are lots of tools to check latency and memory usage, and you should use what is most comfortable for you and gets the job done. If you're on Linux, you might find `perf <https://perf.wiki.kernel.org/index.php/Main_Page>`_ to be a useful tool. You can use sample WAV files for testing which are provided in the `STT/data/` directory.
Requesting review on your PR
----------------------------
Generally, a code owner will be notified of your pull request and will either review it or ask some other code owner for their review. If you'd like to proactively request review as you open the PR, see the CODE_OWNERS.rst file, which describes who's an appropriate reviewer depending on which parts of the code you're changing.
Code linting
------------
We use `pre-commit <https://pre-commit.com/>`_ to manage pre-commit hooks that take care of checking your changes for code style violations. Before committing changes, make sure you have the hook installed in your setup by running, in the virtual environment you use for running the code:
.. code-block:: bash
cd STT
python .pre-commit-2.11.1.pyz install
This will install a git pre-commit hook which will check your commits and let you know about any style violations that need fixing.
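The same bundled pre-commit archive can also be run against every file at once, which is what the CI lint workflow does for pull requests:
.. code-block:: bash
   python .pre-commit-2.11.1.pyz run --all-files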

View File

@ -1,934 +0,0 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function
import os
import sys
log_level_index = sys.argv.index('--log_level') + 1 if '--log_level' in sys.argv else 0
os.environ['TF_CPP_MIN_LOG_LEVEL'] = sys.argv[log_level_index] if log_level_index > 0 and log_level_index < len(sys.argv) else '3'
import evaluate
import numpy as np
import progressbar
import shutil
import tensorflow as tf
import traceback
from ds_ctcdecoder import ctc_beam_search_decoder, Scorer
from six.moves import zip, range
from tensorflow.python.tools import freeze_graph
from util.audio import audiofile_to_input_vector
from util.config import Config, initialize_globals
from util.coordinator import TrainingCoordinator
from util.feeding import DataSet, ModelFeeder
from util.flags import create_flags, FLAGS
from util.logging import log_info, log_error, log_debug, log_warn
from util.preprocess import preprocess
from util.text import Alphabet
# Graph Creation
# ==============
def variable_on_worker_level(name, shape, initializer):
r'''
Next we concern ourselves with graph creation.
However, before we do so we must introduce a utility function ``variable_on_worker_level()``
used to create a variable in CPU memory.
'''
# Use the /cpu:0 device on worker_device for scoped operations
if len(FLAGS.ps_hosts) == 0:
device = Config.worker_device
else:
device = tf.train.replica_device_setter(worker_device=Config.worker_device, cluster=Config.cluster)
with tf.device(device):
# Create or get the appropriate variable
var = tf.get_variable(name=name, shape=shape, initializer=initializer)
return var
def BiRNN(batch_x, seq_length, dropout, reuse=False, batch_size=None, n_steps=-1, previous_state=None, tflite=False):
r'''
That done, we will define the learned variables, the weights and biases,
within the method ``BiRNN()`` which also constructs the neural network.
The variables named ``hn``, where ``n`` is an integer, hold the learned weight variables.
The variables named ``bn``, where ``n`` is an integer, hold the learned bias variables.
In particular, the first variable ``h1`` holds the learned weight matrix that
converts an input vector of dimension ``n_input + 2*n_input*n_context``
to a vector of dimension ``n_hidden_1``.
Similarly, the second variable ``h2`` holds the weight matrix converting
an input vector of dimension ``n_hidden_1`` to one of dimension ``n_hidden_2``.
The variables ``h3``, ``h5``, and ``h6`` are similar.
Likewise, the biases, ``b1``, ``b2``..., hold the biases for the various layers.
'''
layers = {}
# Input shape: [batch_size, n_steps, n_input + 2*n_input*n_context]
if not batch_size:
batch_size = tf.shape(batch_x)[0]
# Reshaping `batch_x` to a tensor with shape `[n_steps*batch_size, n_input + 2*n_input*n_context]`.
# This is done to prepare the batch for input into the first layer which expects a tensor of rank `2`.
# Permute n_steps and batch_size
batch_x = tf.transpose(batch_x, [1, 0, 2, 3])
# Reshape to prepare input for first layer
batch_x = tf.reshape(batch_x, [-1, Config.n_input + 2*Config.n_input*Config.n_context]) # (n_steps*batch_size, n_input + 2*n_input*n_context)
layers['input_reshaped'] = batch_x
# The next three blocks will pass `batch_x` through three hidden layers with
# clipped RELU activation and dropout.
# 1st layer
b1 = variable_on_worker_level('b1', [Config.n_hidden_1], tf.zeros_initializer())
h1 = variable_on_worker_level('h1', [Config.n_input + 2*Config.n_input*Config.n_context, Config.n_hidden_1], tf.contrib.layers.xavier_initializer())
layer_1 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(batch_x, h1), b1)), FLAGS.relu_clip)
layer_1 = tf.nn.dropout(layer_1, rate=dropout[0])
layers['layer_1'] = layer_1
# 2nd layer
b2 = variable_on_worker_level('b2', [Config.n_hidden_2], tf.zeros_initializer())
h2 = variable_on_worker_level('h2', [Config.n_hidden_1, Config.n_hidden_2], tf.contrib.layers.xavier_initializer())
layer_2 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(layer_1, h2), b2)), FLAGS.relu_clip)
layer_2 = tf.nn.dropout(layer_2, rate=dropout[1])
layers['layer_2'] = layer_2
# 3rd layer
b3 = variable_on_worker_level('b3', [Config.n_hidden_3], tf.zeros_initializer())
h3 = variable_on_worker_level('h3', [Config.n_hidden_2, Config.n_hidden_3], tf.contrib.layers.xavier_initializer())
layer_3 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(layer_2, h3), b3)), FLAGS.relu_clip)
layer_3 = tf.nn.dropout(layer_3, rate=dropout[2])
layers['layer_3'] = layer_3
# Now we create the forward LSTM cell.
# It has inputs of length `n_cell_dim` and a bias of `1.0` for the forget gate of the LSTM.
# Forward direction cell:
if not tflite:
fw_cell = tf.contrib.rnn.LSTMBlockFusedCell(Config.n_cell_dim, reuse=reuse)
layers['fw_cell'] = fw_cell
else:
fw_cell = tf.nn.rnn_cell.LSTMCell(Config.n_cell_dim, reuse=reuse)
# `layer_3` is now reshaped into `[n_steps, batch_size, 2*n_cell_dim]`,
# as the LSTM RNN expects its input to be of shape `[max_time, batch_size, input_size]`.
layer_3 = tf.reshape(layer_3, [n_steps, batch_size, Config.n_hidden_3])
if tflite:
# Generated StridedSlice, not supported by NNAPI
#n_layer_3 = []
#for l in range(layer_3.shape[0]):
# n_layer_3.append(layer_3[l])
#layer_3 = n_layer_3
# Unstack/Unpack is not supported by NNAPI
layer_3 = tf.unstack(layer_3, n_steps)
# We parametrize the RNN implementation as the training and inference graphs
# need to do different things here.
if not tflite:
output, output_state = fw_cell(inputs=layer_3, dtype=tf.float32, sequence_length=seq_length, initial_state=previous_state)
else:
output, output_state = tf.nn.static_rnn(fw_cell, layer_3, previous_state, tf.float32)
output = tf.concat(output, 0)
# Reshape output from a tensor of shape [n_steps, batch_size, n_cell_dim]
# to a tensor of shape [n_steps*batch_size, n_cell_dim]
output = tf.reshape(output, [-1, Config.n_cell_dim])
layers['rnn_output'] = output
layers['rnn_output_state'] = output_state
# Now we feed `output` to the fifth hidden layer with clipped RELU activation and dropout
b5 = variable_on_worker_level('b5', [Config.n_hidden_5], tf.zeros_initializer())
h5 = variable_on_worker_level('h5', [Config.n_cell_dim, Config.n_hidden_5], tf.contrib.layers.xavier_initializer())
layer_5 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(output, h5), b5)), FLAGS.relu_clip)
layer_5 = tf.nn.dropout(layer_5, rate=dropout[5])
layers['layer_5'] = layer_5
# Now we apply the weight matrix `h6` and bias `b6` to the output of `layer_5`
# creating `n_classes` dimensional vectors, the logits.
b6 = variable_on_worker_level('b6', [Config.n_hidden_6], tf.zeros_initializer())
h6 = variable_on_worker_level('h6', [Config.n_hidden_5, Config.n_hidden_6], tf.contrib.layers.xavier_initializer())
layer_6 = tf.add(tf.matmul(layer_5, h6), b6)
layers['layer_6'] = layer_6
# Finally we reshape layer_6 from a tensor of shape [n_steps*batch_size, n_hidden_6]
# to the slightly more useful shape [n_steps, batch_size, n_hidden_6].
# Note, that this differs from the input in that it is time-major.
layer_6 = tf.reshape(layer_6, [n_steps, batch_size, Config.n_hidden_6], name="raw_logits")
layers['raw_logits'] = layer_6
# Output shape: [n_steps, batch_size, n_hidden_6]
return layer_6, layers
# Accuracy and Loss
# =================
# In accord with 'Deep Speech: Scaling up end-to-end speech recognition'
# (http://arxiv.org/abs/1412.5567),
# the loss function used by our network should be the CTC loss function
# (http://www.cs.toronto.edu/~graves/preprint.pdf).
# Conveniently, this loss function is implemented in TensorFlow.
# Thus, we can simply make use of this implementation to define our loss.
def calculate_mean_edit_distance_and_loss(model_feeder, tower, dropout, reuse):
r'''
This routine computes the CTC loss for a tower's mini-batch
and returns the average loss across the batch.
'''
# Obtain the next batch of data
batch_x, batch_seq_len, batch_y = model_feeder.next_batch(tower)
# Calculate the logits of the batch using BiRNN
logits, _ = BiRNN(batch_x, batch_seq_len, dropout, reuse)
# Compute the CTC loss using TensorFlow's `ctc_loss`
total_loss = tf.nn.ctc_loss(labels=batch_y, inputs=logits, sequence_length=batch_seq_len)
# Calculate the average loss across the batch
avg_loss = tf.reduce_mean(total_loss)
# Finally we return the average loss
return avg_loss
# Adam Optimization
# =================
# In contrast to 'Deep Speech: Scaling up end-to-end speech recognition'
# (http://arxiv.org/abs/1412.5567),
# in which 'Nesterov's Accelerated Gradient Descent'
# (www.cs.toronto.edu/~fritz/absps/momentum.pdf) was used,
# we will use the Adam method for optimization (http://arxiv.org/abs/1412.6980),
# because, generally, it requires less fine-tuning.
def create_optimizer():
optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate,
beta1=FLAGS.beta1,
beta2=FLAGS.beta2,
epsilon=FLAGS.epsilon)
return optimizer
# Towers
# ======
# In order to properly make use of multiple GPUs, one must introduce new abstractions,
# not present when using a single GPU, that facilitate the multi-GPU use case.
# In particular, one must introduce a means to isolate the inference and gradient
# calculations on the various GPUs.
# The abstraction we introduce for this purpose is called a 'tower'.
# A tower is specified by two properties:
# * **Scope** - A scope, as provided by `tf.name_scope()`,
# is a means to isolate the operations within a tower.
# For example, all operations within 'tower 0' could have their name prefixed with `tower_0/`.
# * **Device** - A hardware device, as provided by `tf.device()`,
# on which all operations within the tower execute.
# For example, all operations of 'tower 0' could execute on the first GPU `tf.device('/gpu:0')`.
def get_tower_results(model_feeder, optimizer, dropout_rates):
r'''
With this preliminary step out of the way, we can, for each GPU, introduce a
tower for whose batch we calculate and return the optimization gradients
and the average loss across towers.
'''
# To calculate the mean of the losses
tower_avg_losses = []
# Tower gradients to return
tower_gradients = []
with tf.variable_scope(tf.get_variable_scope()):
# Loop over available_devices
for i in range(len(Config.available_devices)):
# Execute operations of tower i on device i
if len(FLAGS.ps_hosts) == 0:
device = Config.available_devices[i]
else:
device = tf.train.replica_device_setter(worker_device=Config.available_devices[i], cluster=Config.cluster)
with tf.device(device):
# Create a scope for all operations of tower i
with tf.name_scope('tower_%d' % i) as scope:
# Calculate the avg_loss and mean_edit_distance and retrieve the decoded
# batch along with the original batch's labels (Y) of this tower
avg_loss = calculate_mean_edit_distance_and_loss(model_feeder, i, dropout_rates, reuse=i>0)
# Allow for variables to be re-used by the next tower
tf.get_variable_scope().reuse_variables()
# Retain tower's avg losses
tower_avg_losses.append(avg_loss)
# Compute gradients for model parameters using tower's mini-batch
gradients = optimizer.compute_gradients(avg_loss)
# Retain tower's gradients
tower_gradients.append(gradients)
avg_loss_across_towers = tf.reduce_mean(tower_avg_losses, 0)
tf.summary.scalar(name='step_loss', tensor=avg_loss_across_towers, collections=['step_summaries'])
# Return gradients and the average loss
return tower_gradients, avg_loss_across_towers
def average_gradients(tower_gradients):
r'''
A routine for computing each variable's average of the gradients obtained from the GPUs.
Note also that this code acts as a synchronization point as it requires all
GPUs to be finished with their mini-batch before it can run to completion.
'''
# List of average gradients to return to the caller
average_grads = []
# Run this on cpu_device to conserve GPU memory
with tf.device(Config.cpu_device):
# Loop over gradient/variable pairs from all towers
for grad_and_vars in zip(*tower_gradients):
# Introduce grads to store the gradients for the current variable
grads = []
# Loop over the gradients for the current variable
for g, _ in grad_and_vars:
# Add 0 dimension to the gradients to represent the tower.
expanded_g = tf.expand_dims(g, 0)
# Append on a 'tower' dimension which we will average over below.
grads.append(expanded_g)
# Average over the 'tower' dimension
grad = tf.concat(grads, 0)
grad = tf.reduce_mean(grad, 0)
# Create a gradient/variable tuple for the current variable with its average gradient
grad_and_var = (grad, grad_and_vars[0][1])
# Add the current tuple to average_grads
average_grads.append(grad_and_var)
# Return result to caller
return average_grads
# Logging
# =======
def log_variable(variable, gradient=None):
r'''
We introduce a function for logging a tensor variable's current state.
It logs scalar values for the mean, standard deviation, minimum and maximum.
Furthermore it logs a histogram of its state and (if given) of an optimization gradient.
'''
name = variable.name
mean = tf.reduce_mean(variable)
tf.summary.scalar(name='%s/mean' % name, tensor=mean)
tf.summary.scalar(name='%s/sttdev' % name, tensor=tf.sqrt(tf.reduce_mean(tf.square(variable - mean))))
tf.summary.scalar(name='%s/max' % name, tensor=tf.reduce_max(variable))
tf.summary.scalar(name='%s/min' % name, tensor=tf.reduce_min(variable))
tf.summary.histogram(name=name, values=variable)
if gradient is not None:
if isinstance(gradient, tf.IndexedSlices):
grad_values = gradient.values
else:
grad_values = gradient
if grad_values is not None:
tf.summary.histogram(name='%s/gradients' % name, values=grad_values)
def log_grads_and_vars(grads_and_vars):
r'''
Let's also introduce a helper function for logging collections of gradient/variable tuples.
'''
for gradient, variable in grads_and_vars:
log_variable(variable, gradient=gradient)
# Helpers
# =======
def send_token_to_ps(session, kill=False):
# Sending our token (the task_index as a debug opportunity) to each parameter server.
# kill switch tokens are negative and decremented by 1 to deal with task_index 0
token = -FLAGS.task_index-1 if kill else FLAGS.task_index
kind = 'kill switch' if kill else 'stop'
for index, enqueue in enumerate(Config.done_enqueues):
log_debug('Sending %s token to ps %d...' % (kind, index))
session.run(enqueue, feed_dict={ Config.token_placeholder: token })
log_debug('Sent %s token to ps %d.' % (kind, index))
def train(server=None):
r'''
Trains the network on a given server of a cluster.
If no server provided, it performs single process training.
'''
# Initializing and starting the training coordinator
coord = TrainingCoordinator(Config.is_chief)
coord.start()
# Create a variable to hold the global_step.
# It will automagically get incremented by the optimizer.
global_step = tf.Variable(0, trainable=False, name='global_step')
dropout_rates = [tf.placeholder(tf.float32, name='dropout_{}'.format(i)) for i in range(6)]
# Reading training set
train_data = preprocess(FLAGS.train_files.split(','),
FLAGS.train_batch_size,
Config.n_input,
Config.n_context,
Config.alphabet,
hdf5_cache_path=FLAGS.train_cached_features_path)
train_set = DataSet(train_data,
FLAGS.train_batch_size,
limit=FLAGS.limit_train,
next_index=lambda i: coord.get_next_index('train'))
# Reading validation set
dev_data = preprocess(FLAGS.dev_files.split(','),
FLAGS.dev_batch_size,
Config.n_input,
Config.n_context,
Config.alphabet,
hdf5_cache_path=FLAGS.dev_cached_features_path)
dev_set = DataSet(dev_data,
FLAGS.dev_batch_size,
limit=FLAGS.limit_dev,
next_index=lambda i: coord.get_next_index('dev'))
# Combining all sets to a multi set model feeder
model_feeder = ModelFeeder(train_set,
dev_set,
Config.n_input,
Config.n_context,
Config.alphabet,
tower_feeder_count=len(Config.available_devices))
# Create the optimizer
optimizer = create_optimizer()
# Synchronous distributed training is facilitated by a special proxy-optimizer
if not server is None:
optimizer = tf.train.SyncReplicasOptimizer(optimizer,
replicas_to_aggregate=FLAGS.replicas_to_agg,
total_num_replicas=FLAGS.replicas)
# Get the data_set specific graph end-points
gradients, loss = get_tower_results(model_feeder, optimizer, dropout_rates)
# Average tower gradients across GPUs
avg_tower_gradients = average_gradients(gradients)
# Add summaries of all variables and gradients to log
log_grads_and_vars(avg_tower_gradients)
# Op to merge all summaries for the summary hook
merge_all_summaries_op = tf.summary.merge_all()
# These are saved on every step
step_summaries_op = tf.summary.merge_all('step_summaries')
step_summary_writers = {
'train': tf.summary.FileWriter(os.path.join(FLAGS.summary_dir, 'train'), max_queue=120),
'dev': tf.summary.FileWriter(os.path.join(FLAGS.summary_dir, 'dev'), max_queue=120)
}
# Apply gradients to modify the model
apply_gradient_op = optimizer.apply_gradients(avg_tower_gradients, global_step=global_step)
if FLAGS.early_stop is True and not FLAGS.validation_step > 0:
log_warn('Parameter --validation_step needs to be >0 for early stopping to work')
class CoordHook(tf.train.SessionRunHook):
r'''
Embedded coordination hook-class that will use variables of the
surrounding Python context.
'''
def after_create_session(self, session, coord):
log_debug('Starting queue runners...')
model_feeder.start_queue_threads(session, coord)
log_debug('Queue runners started.')
def end(self, session):
# Closing the data_set queues
log_debug('Closing queues...')
model_feeder.close_queues(session)
log_debug('Queues closed.')
# Telling the ps that we are done
send_token_to_ps(session)
# Collecting the hooks
hooks = [CoordHook()]
# Hook to handle initialization and queues for sync replicas.
if not server is None:
hooks.append(optimizer.make_session_run_hook(Config.is_chief))
# Hook to save TensorBoard summaries
if FLAGS.summary_secs > 0:
hooks.append(tf.train.SummarySaverHook(save_secs=FLAGS.summary_secs, output_dir=FLAGS.summary_dir, summary_op=merge_all_summaries_op))
# Hook with the number of checkpoint files to save in checkpoint_dir
if FLAGS.train and FLAGS.max_to_keep > 0:
saver = tf.train.Saver(max_to_keep=FLAGS.max_to_keep)
hooks.append(tf.train.CheckpointSaverHook(checkpoint_dir=FLAGS.checkpoint_dir, save_secs=FLAGS.checkpoint_secs, saver=saver))
no_dropout_feed_dict = {
dropout_rates[0]: 0.,
dropout_rates[1]: 0.,
dropout_rates[2]: 0.,
dropout_rates[3]: 0.,
dropout_rates[4]: 0.,
dropout_rates[5]: 0.,
}
# Progress Bar
def update_progressbar(set_name):
if not hasattr(update_progressbar, 'current_set_name'):
update_progressbar.current_set_name = None
if (update_progressbar.current_set_name != set_name or
update_progressbar.current_job_index == update_progressbar.total_jobs):
# finish prev pbar if it exists
if hasattr(update_progressbar, 'pbar') and update_progressbar.pbar:
update_progressbar.pbar.finish()
update_progressbar.total_jobs = None
update_progressbar.current_job_index = 0
current_epoch = coord._epoch-1
if set_name == "train":
log_info('Training epoch %i...' % current_epoch)
update_progressbar.total_jobs = coord._num_jobs_train
else:
log_info('Validating epoch %i...' % current_epoch)
update_progressbar.total_jobs = coord._num_jobs_dev
# recreate pbar
update_progressbar.pbar = progressbar.ProgressBar(max_value=update_progressbar.total_jobs,
redirect_stdout=True).start()
update_progressbar.current_set_name = set_name
if update_progressbar.pbar:
update_progressbar.pbar.update(update_progressbar.current_job_index+1, force=True)
update_progressbar.current_job_index += 1
# Initialize update_progressbar()'s child fields to safe values
update_progressbar.pbar = None
# The MonitoredTrainingSession takes care of session initialization,
# restoring from a checkpoint, saving to a checkpoint, and closing when done
# or an error occurs.
try:
with tf.train.MonitoredTrainingSession(master='' if server is None else server.target,
is_chief=Config.is_chief,
hooks=hooks,
checkpoint_dir=FLAGS.checkpoint_dir,
save_checkpoint_secs=None, # already taken care of by a hook
log_step_count_steps=0, # disable logging of steps/s to avoid TF warning in validation sets
config=Config.session_config) as session:
tf.get_default_graph().finalize()
try:
if Config.is_chief:
# Retrieving global_step from the (potentially restored) model
model_feeder.set_data_set(no_dropout_feed_dict, model_feeder.train)
step = session.run(global_step, feed_dict=no_dropout_feed_dict)
coord.start_coordination(model_feeder, step)
# Get the first job
job = coord.get_job()
while job and not session.should_stop():
log_debug('Computing %s...' % job)
is_train = job.set_name == 'train'
# The feed_dict (mainly for switching between queues)
if is_train:
feed_dict = {
dropout_rates[0]: FLAGS.dropout_rate,
dropout_rates[1]: FLAGS.dropout_rate2,
dropout_rates[2]: FLAGS.dropout_rate3,
dropout_rates[3]: FLAGS.dropout_rate4,
dropout_rates[4]: FLAGS.dropout_rate5,
dropout_rates[5]: FLAGS.dropout_rate6,
}
else:
feed_dict = no_dropout_feed_dict
# Sets the current data_set for the respective placeholder in feed_dict
model_feeder.set_data_set(feed_dict, getattr(model_feeder, job.set_name))
# Initialize loss aggregator
total_loss = 0.0
# Setting the training operation in case of training requested
train_op = apply_gradient_op if is_train else []
# So far the only extra parameter is the feed_dict
extra_params = { 'feed_dict': feed_dict }
step_summary_writer = step_summary_writers.get(job.set_name)
# Loop over the batches
for job_step in range(job.steps):
if session.should_stop():
break
log_debug('Starting batch...')
# Compute the batch
_, current_step, batch_loss, step_summary = session.run([train_op, global_step, loss, step_summaries_op], **extra_params)
# Log step summaries
step_summary_writer.add_summary(step_summary, current_step)
# Uncomment the next line for debugging race conditions / distributed TF
log_debug('Finished batch step %d.' % current_step)
# Add batch to loss
total_loss += batch_loss
# Gathering job results
job.loss = total_loss / job.steps
# Display progressbar
if FLAGS.show_progressbar:
update_progressbar(job.set_name)
# Send the current job to coordinator and receive the next one
log_debug('Sending %s...' % job)
job = coord.next_job(job)
if update_progressbar.pbar:
update_progressbar.pbar.finish()
except Exception as e:
log_error(str(e))
traceback.print_exc()
# Calling all hook's end() methods to end blocking calls
for hook in hooks:
hook.end(session)
# Only chief has a SyncReplicasOptimizer queue runner that needs to be stopped for unblocking process exit.
# A rather graceful way to do this is by stopping the ps.
# Only one party can send it w/o failing.
if Config.is_chief:
send_token_to_ps(session, kill=True)
sys.exit(1)
log_debug('Session closed.')
except tf.errors.InvalidArgumentError as e:
log_error(str(e))
log_error('The checkpoint in {0} does not match the shapes of the model.'
' Did you change alphabet.txt or the --n_hidden parameter'
' between train runs using the same checkpoint dir? Try moving'
' or removing the contents of {0}.'.format(FLAGS.checkpoint_dir))
sys.exit(1)
# Stopping the coordinator
coord.stop()
def test():
# Reading test set
test_data = preprocess(FLAGS.test_files.split(','),
FLAGS.test_batch_size,
Config.n_input,
Config.n_context,
Config.alphabet,
hdf5_cache_path=FLAGS.test_cached_features_path)
graph = create_inference_graph(batch_size=FLAGS.test_batch_size, n_steps=-1)
evaluate.evaluate(test_data, graph)
def create_inference_graph(batch_size=1, n_steps=16, tflite=False):
batch_size = batch_size if batch_size > 0 else None
# Input tensor will be of shape [batch_size, n_steps, 2*n_context+1, n_input]
input_tensor = tf.placeholder(tf.float32, [batch_size, n_steps if n_steps > 0 else None, 2*Config.n_context+1, Config.n_input], name='input_node')
seq_length = tf.placeholder(tf.int32, [batch_size], name='input_lengths')
if batch_size <= 0:
# no state management since n_step is expected to be dynamic too (see below)
previous_state = previous_state_c = previous_state_h = None
else:
if not tflite:
previous_state_c = variable_on_worker_level('previous_state_c', [batch_size, Config.n_cell_dim], initializer=None)
previous_state_h = variable_on_worker_level('previous_state_h', [batch_size, Config.n_cell_dim], initializer=None)
else:
previous_state_c = tf.placeholder(tf.float32, [batch_size, Config.n_cell_dim], name='previous_state_c')
previous_state_h = tf.placeholder(tf.float32, [batch_size, Config.n_cell_dim], name='previous_state_h')
previous_state = tf.contrib.rnn.LSTMStateTuple(previous_state_c, previous_state_h)
no_dropout = [0.0] * 6
logits, layers = BiRNN(batch_x=input_tensor,
seq_length=seq_length if FLAGS.use_seq_length else None,
dropout=no_dropout,
batch_size=batch_size,
n_steps=n_steps,
previous_state=previous_state,
tflite=tflite)
# TF Lite runtime will check that input dimensions are 1, 2 or 4
# by default we get 3, the middle one being batch_size which is forced to
# one on inference graph, so remove that dimension
if tflite:
logits = tf.squeeze(logits, [1])
# Apply softmax for CTC decoder
logits = tf.nn.softmax(logits)
if batch_size <= 0:
if tflite:
raise NotImplementedError('dynamic batch_size does not support tflite nor streaming')
if n_steps > 0:
raise NotImplementedError('dynamic batch_size expect n_steps to be dynamic too')
return (
{
'input': input_tensor,
'input_lengths': seq_length,
},
{
'outputs': tf.identity(logits, name='logits'),
},
layers
)
new_state_c, new_state_h = layers['rnn_output_state']
if not tflite:
zero_state = tf.zeros([batch_size, Config.n_cell_dim], tf.float32)
initialize_c = tf.assign(previous_state_c, zero_state)
initialize_h = tf.assign(previous_state_h, zero_state)
initialize_state = tf.group(initialize_c, initialize_h, name='initialize_state')
with tf.control_dependencies([tf.assign(previous_state_c, new_state_c), tf.assign(previous_state_h, new_state_h)]):
logits = tf.identity(logits, name='logits')
return (
{
'input': input_tensor,
'input_lengths': seq_length,
},
{
'outputs': logits,
'initialize_state': initialize_state,
},
layers
)
else:
logits = tf.identity(logits, name='logits')
new_state_c = tf.identity(new_state_c, name='new_state_c')
new_state_h = tf.identity(new_state_h, name='new_state_h')
return (
{
'input': input_tensor,
'previous_state_c': previous_state_c,
'previous_state_h': previous_state_h,
},
{
'outputs': logits,
'new_state_c': new_state_c,
'new_state_h': new_state_h,
},
layers
)
def export():
r'''
Restores the trained variables into a simpler graph that will be exported for serving.
'''
log_info('Exporting the model...')
with tf.device('/cpu:0'):
from tensorflow.python.framework.ops import Tensor, Operation
tf.reset_default_graph()
session = tf.Session(config=Config.session_config)
inputs, outputs, _ = create_inference_graph(batch_size=FLAGS.export_batch_size, n_steps=FLAGS.n_steps, tflite=FLAGS.export_tflite)
input_names = ",".join(tensor.op.name for tensor in inputs.values())
output_names_tensors = [ tensor.op.name for tensor in outputs.values() if isinstance(tensor, Tensor) ]
output_names_ops = [ tensor.name for tensor in outputs.values() if isinstance(tensor, Operation) ]
output_names = ",".join(output_names_tensors + output_names_ops)
input_shapes = ":".join(",".join(map(str, tensor.shape)) for tensor in inputs.values())
if not FLAGS.export_tflite:
mapping = {v.op.name: v for v in tf.global_variables() if not v.op.name.startswith('previous_state_')}
else:
# Create a saver using variables from the above newly created graph
def fixup(name):
if name.startswith('rnn/lstm_cell/'):
return name.replace('rnn/lstm_cell/', 'lstm_fused_cell/')
return name
mapping = {fixup(v.op.name): v for v in tf.global_variables()}
saver = tf.train.Saver(mapping)
# Restore variables from training checkpoint
checkpoint = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
checkpoint_path = checkpoint.model_checkpoint_path
output_filename = 'output_graph.pb'
if FLAGS.remove_export:
if os.path.isdir(FLAGS.export_dir):
log_info('Removing old export')
shutil.rmtree(FLAGS.export_dir)
try:
output_graph_path = os.path.join(FLAGS.export_dir, output_filename)
if not os.path.isdir(FLAGS.export_dir):
os.makedirs(FLAGS.export_dir)
def do_graph_freeze(output_file=None, output_node_names=None, variables_blacklist=None):
return freeze_graph.freeze_graph_with_def_protos(
input_graph_def=session.graph_def,
input_saver_def=saver.as_saver_def(),
input_checkpoint=checkpoint_path,
output_node_names=output_node_names,
restore_op_name=None,
filename_tensor_name=None,
output_graph=output_file,
clear_devices=False,
variable_names_blacklist=variables_blacklist,
initializer_nodes='')
if not FLAGS.export_tflite:
do_graph_freeze(output_file=output_graph_path, output_node_names=output_names, variables_blacklist='previous_state_c,previous_state_h')
else:
frozen_graph = do_graph_freeze(output_node_names=output_names, variables_blacklist='')
output_tflite_path = os.path.join(FLAGS.export_dir, output_filename.replace('.pb', '.tflite'))
converter = tf.lite.TFLiteConverter(frozen_graph, input_tensors=inputs.values(), output_tensors=outputs.values())
converter.post_training_quantize = True
tflite_model = converter.convert()
with open(output_tflite_path, 'wb') as fout:
fout.write(tflite_model)
log_info('Exported model for TF Lite engine as {}'.format(os.path.basename(output_tflite_path)))
log_info('Models exported at %s' % (FLAGS.export_dir))
except RuntimeError as e:
log_error(str(e))
def do_single_file_inference(input_file_path):
with tf.Session(config=Config.session_config) as session:
inputs, outputs, _ = create_inference_graph(batch_size=1, n_steps=-1)
# Create a saver using variables from the above newly created graph
mapping = {v.op.name: v for v in tf.global_variables() if not v.op.name.startswith('previous_state_')}
saver = tf.train.Saver(mapping)
# Restore variables from training checkpoint
# TODO: This restores the most recent checkpoint, but if we use validation to counteract
# over-fitting, we may want to restore an earlier checkpoint.
checkpoint = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
if not checkpoint:
log_error('Checkpoint directory ({}) does not contain a valid checkpoint state.'.format(FLAGS.checkpoint_dir))
exit(1)
checkpoint_path = checkpoint.model_checkpoint_path
saver.restore(session, checkpoint_path)
session.run(outputs['initialize_state'])
features = audiofile_to_input_vector(input_file_path, Config.n_input, Config.n_context)
num_strides = len(features) - (Config.n_context * 2)
# Create a view into the array with overlapping strides of size
# numcontext (past) + 1 (present) + numcontext (future)
window_size = 2*Config.n_context+1
features = np.lib.stride_tricks.as_strided(
features,
(num_strides, window_size, Config.n_input),
(features.strides[0], features.strides[0], features.strides[1]),
writeable=False)
logits = session.run(outputs['outputs'], feed_dict = {
inputs['input']: [features],
inputs['input_lengths']: [num_strides],
})
logits = np.squeeze(logits)
scorer = Scorer(FLAGS.lm_alpha, FLAGS.lm_beta,
FLAGS.lm_binary_path, FLAGS.lm_trie_path,
Config.alphabet)
decoded = ctc_beam_search_decoder(logits, Config.alphabet, FLAGS.beam_width, scorer=scorer)
# Print highest probability result
print(decoded[0][1])
def main(_):
initialize_globals()
if FLAGS.train or FLAGS.test:
if len(FLAGS.worker_hosts) == 0:
# Only one local task: this process (default case - no cluster)
with tf.Graph().as_default():
tf.set_random_seed(FLAGS.random_seed)
train()
# Now do a final test epoch
if FLAGS.test:
with tf.Graph().as_default():
test()
log_debug('Done.')
else:
# Create and start a server for the local task.
server = tf.train.Server(Config.cluster, job_name=FLAGS.job_name, task_index=FLAGS.task_index)
if FLAGS.job_name == 'ps':
# We are a parameter server and therefore we just wait for all workers to finish
# by waiting for their stop tokens.
with tf.Session(server.target) as session:
for worker in FLAGS.worker_hosts:
log_debug('Waiting for stop token...')
token = session.run(Config.done_dequeues[FLAGS.task_index])
if token < 0:
log_debug('Got a kill switch token from worker %i.' % abs(token + 1))
break
log_debug('Got a stop token from worker %i.' % token)
log_debug('Session closed.')
if FLAGS.test:
test()
elif FLAGS.job_name == 'worker':
# We are a worker and therefore we have to do some work.
# Assigns ops to the local worker by default.
with tf.device(tf.train.replica_device_setter(
worker_device=Config.worker_device,
cluster=Config.cluster)):
# Do the training
train(server)
log_debug('Server stopped.')
# Are we the main process?
if Config.is_chief:
# Doing solo/post-processing work just on the main process...
# Exporting the model
if FLAGS.export_dir:
export()
if len(FLAGS.one_shot_infer):
do_single_file_inference(FLAGS.one_shot_infer)
if __name__ == '__main__' :
create_flags()
tf.app.run(main)


@@ -1,220 +0,0 @@
# Need the devel version because we need /usr/include/cudnn.h
# for compiling libctc_decoder_with_kenlm.so
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
# >> START Install base software
# Get basic packages
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
curl \
wget \
git \
python \
python-dev \
python-pip \
python-wheel \
python-numpy \
libcurl3-dev \
ca-certificates \
gcc \
sox \
libsox-fmt-mp3 \
htop \
nano \
swig \
cmake \
libboost-all-dev \
zlib1g-dev \
libbz2-dev \
liblzma-dev \
locales \
pkg-config \
libsox-dev \
openjdk-8-jdk \
bash-completion \
g++ \
unzip
# Install NCCL 2.2
RUN apt-get install -qq -y --allow-downgrades --allow-change-held-packages libnccl2=2.3.7-1+cuda10.0 libnccl-dev=2.3.7-1+cuda10.0
# Install Bazel
RUN curl -LO "https://github.com/bazelbuild/bazel/releases/download/0.19.2/bazel_0.19.2-linux-x86_64.deb"
RUN dpkg -i bazel_*.deb
# Install CUDA CLI Tools
RUN apt-get install -qq -y cuda-command-line-tools-10-0
# Install pip
RUN wget https://bootstrap.pypa.io/get-pip.py && \
python get-pip.py && \
rm get-pip.py
# << END Install base software
# >> START Configure Tensorflow Build
# Clone TensorFlow from the Mozilla repo
RUN git clone https://github.com/mozilla/tensorflow/
WORKDIR /tensorflow
RUN git checkout r1.13
# GPU Environment Setup
ENV TF_NEED_CUDA 1
ENV CUDA_TOOLKIT_PATH /usr/local/cuda
ENV TF_CUDA_VERSION 10.0
ENV TF_CUDNN_VERSION 7
ENV CUDNN_INSTALL_PATH /usr/lib/x86_64-linux-gnu/
ENV TF_CUDA_COMPUTE_CAPABILITIES 6.0
ENV TF_NCCL_VERSION 2.3
# ENV NCCL_INSTALL_PATH /usr/lib/x86_64-linux-gnu/
# Common Environment Setup
ENV TF_BUILD_CONTAINER_TYPE GPU
ENV TF_BUILD_OPTIONS OPT
ENV TF_BUILD_DISABLE_GCP 1
ENV TF_BUILD_ENABLE_XLA 0
ENV TF_BUILD_PYTHON_VERSION PYTHON2
ENV TF_BUILD_IS_OPT OPT
ENV TF_BUILD_IS_PIP PIP
# Other Parameters
ENV CC_OPT_FLAGS -mavx -mavx2 -msse4.1 -msse4.2 -mfma
ENV TF_NEED_GCP 0
ENV TF_NEED_HDFS 0
ENV TF_NEED_JEMALLOC 1
ENV TF_NEED_OPENCL 0
ENV TF_CUDA_CLANG 0
ENV TF_NEED_MKL 0
ENV TF_ENABLE_XLA 0
ENV TF_NEED_AWS 0
ENV TF_NEED_KAFKA 0
ENV TF_NEED_NGRAPH 0
ENV TF_DOWNLOAD_CLANG 0
ENV TF_NEED_TENSORRT 0
ENV TF_NEED_GDR 0
ENV TF_NEED_VERBS 0
ENV TF_NEED_OPENCL_SYCL 0
ENV PYTHON_BIN_PATH /usr/bin/python2.7
ENV PYTHON_LIB_PATH /usr/lib/python2.7/dist-packages
# << END Configure Tensorflow Build
# >> START Configure Bazel
# Running bazel inside a `docker build` command causes trouble, cf:
# https://github.com/bazelbuild/bazel/issues/134
# The easiest solution is to set up a bazelrc file forcing --batch.
RUN echo "startup --batch" >>/etc/bazel.bazelrc
# Similarly, we need to workaround sandboxing issues:
# https://github.com/bazelbuild/bazel/issues/418
RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
>>/etc/bazel.bazelrc
# Put CUDA libraries where they are expected to be
RUN mkdir /usr/local/cuda/lib && \
ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/lib/libnccl.so.2 && \
ln -s /usr/include/nccl.h /usr/local/cuda/include/nccl.h && \
ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 && \
ln -s /usr/include/cudnn.h /usr/local/cuda/include/cudnn.h
# Set library paths
ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu/:/usr/local/cuda/lib64/stubs/
# << END Configure Bazel
# Copy DeepSpeech repo contents to container's /DeepSpeech
COPY . /DeepSpeech/
WORKDIR /DeepSpeech
RUN pip --no-cache-dir install -r requirements.txt
# Link DeepSpeech native_client libs to tf folder
RUN ln -s /DeepSpeech/native_client /tensorflow
# >> START Build and bind
WORKDIR /tensorflow
# Fix for not found script https://github.com/tensorflow/tensorflow/issues/471
RUN ./configure
# Using CPU optimizations:
# -mtune=generic -march=x86-64 -msse -msse2 -msse3 -msse4.1 -msse4.2 -mavx.
# Adding --config=cuda flag to build using CUDA.
# Passing LD_LIBRARY_PATH is required because Bazel doesn't pick it up from the environment
# Build DeepSpeech
RUN bazel build --config=monolithic --config=cuda -c opt --copt=-O3 --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-mtune=generic --copt=-march=x86-64 --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-fvisibility=hidden //native_client:libdeepspeech.so //native_client:generate_trie --verbose_failures --action_env=LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
###
### Using TensorFlow upstream should work
###
# # Build TF pip package
# RUN bazel build --config=opt --config=cuda --copt="-D_GLIBCXX_USE_CXX11_ABI=0" --copt=-mtune=generic --copt=-march=x86-64 --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx //tensorflow/tools/pip_package:build_pip_package --verbose_failures --action_env=LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
#
# # Build wheel
# RUN bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
#
# # Install tensorflow from our custom wheel
# RUN pip install /tmp/tensorflow_pkg/*.whl
# Copy built libs to /DeepSpeech/native_client
RUN cp /tensorflow/bazel-bin/native_client/generate_trie /DeepSpeech/native_client/ \
&& cp /tensorflow/bazel-bin/native_client/libdeepspeech.so /DeepSpeech/native_client/
# Install TensorFlow
WORKDIR /DeepSpeech/
RUN pip install tensorflow-gpu==1.13.1
# Make DeepSpeech and install Python bindings
ENV TFDIR /tensorflow
WORKDIR /DeepSpeech/native_client
RUN make deepspeech
WORKDIR /DeepSpeech/native_client/python
RUN make bindings
RUN pip install dist/deepspeech*
WORKDIR /DeepSpeech/native_client/ctcdecode
RUN make
RUN pip install dist/*.whl
# << END Build and bind
# Allow Python printing utf-8
ENV PYTHONIOENCODING UTF-8
# Build KenLM in /DeepSpeech/native_client/kenlm folder
WORKDIR /DeepSpeech/native_client
RUN rm -rf kenlm \
&& git clone --depth 1 https://github.com/kpu/kenlm && cd kenlm \
&& mkdir -p build \
&& cd build \
&& cmake .. \
&& make -j 4
# Done
WORKDIR /DeepSpeech

183
Dockerfile.build Normal file

@@ -0,0 +1,183 @@
# Please refer to the USING documentation, "Dockerfile for building from source"
# Need the devel version because we need /usr/include/cudnn.h
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
# >> START Install base software
# Get basic packages
RUN apt-get update && apt-get install -y --no-install-recommends \
apt-utils \
bash-completion \
build-essential \
ca-certificates \
cmake \
curl \
g++ \
gcc \
git \
libbz2-dev \
libboost-all-dev \
libgsm1-dev \
libltdl-dev \
liblzma-dev \
libmagic-dev \
libpng-dev \
libsox-fmt-mp3 \
libsox-dev \
locales \
openjdk-8-jdk \
pkg-config \
python3 \
python3-dev \
python3-pip \
python3-wheel \
python3-numpy \
sox \
unzip \
wget \
zlib1g-dev
RUN update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
# Install Bazel
RUN curl -LO "https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel_3.1.0-linux-x86_64.deb"
RUN dpkg -i bazel_*.deb
# Try and free some space
RUN rm -rf /var/lib/apt/lists/*
# << END Install base software
# >> START Configure Tensorflow Build
# GPU Environment Setup
ENV TF_NEED_ROCM 0
ENV TF_NEED_OPENCL_SYCL 0
ENV TF_NEED_OPENCL 0
ENV TF_NEED_CUDA 1
ENV TF_CUDA_PATHS "/usr,/usr/local/cuda-10.1,/usr/lib/x86_64-linux-gnu/"
ENV TF_CUDA_VERSION 10.1
ENV TF_CUDNN_VERSION 7.6
ENV TF_CUDA_COMPUTE_CAPABILITIES 6.0
ENV TF_NCCL_VERSION 2.8
# Common Environment Setup
ENV TF_BUILD_CONTAINER_TYPE GPU
ENV TF_BUILD_OPTIONS OPT
ENV TF_BUILD_DISABLE_GCP 1
ENV TF_BUILD_ENABLE_XLA 0
ENV TF_BUILD_PYTHON_VERSION PYTHON3
ENV TF_BUILD_IS_OPT OPT
ENV TF_BUILD_IS_PIP PIP
# Other Parameters
ENV CC_OPT_FLAGS -mavx -mavx2 -msse4.1 -msse4.2 -mfma
ENV TF_NEED_GCP 0
ENV TF_NEED_HDFS 0
ENV TF_NEED_JEMALLOC 1
ENV TF_NEED_OPENCL 0
ENV TF_CUDA_CLANG 0
ENV TF_NEED_MKL 0
ENV TF_ENABLE_XLA 0
ENV TF_NEED_AWS 0
ENV TF_NEED_KAFKA 0
ENV TF_NEED_NGRAPH 0
ENV TF_DOWNLOAD_CLANG 0
ENV TF_NEED_TENSORRT 0
ENV TF_NEED_GDR 0
ENV TF_NEED_VERBS 0
ENV TF_NEED_OPENCL_SYCL 0
ENV PYTHON_BIN_PATH /usr/bin/python3.6
ENV PYTHON_LIB_PATH /usr/local/lib/python3.6/dist-packages
# << END Configure Tensorflow Build
# >> START Configure Bazel
# Running bazel inside a `docker build` command causes trouble, cf:
# https://github.com/bazelbuild/bazel/issues/134
# The easiest solution is to set up a bazelrc file forcing --batch.
RUN echo "startup --batch" >>/etc/bazel.bazelrc
# Similarly, we need to workaround sandboxing issues:
# https://github.com/bazelbuild/bazel/issues/418
RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
>>/etc/bazel.bazelrc
# << END Configure Bazel
WORKDIR /
COPY . /STT/
# >> START Build and bind
WORKDIR /STT/tensorflow
# Fix for not found script https://github.com/tensorflow/tensorflow/issues/471
RUN ./configure
# Using CPU optimizations:
# -mtune=generic -march=x86-64 -msse -msse2 -msse3 -msse4.1 -msse4.2 -mavx.
# Adding --config=cuda flag to build using CUDA.
# Passing LD_LIBRARY_PATH is required because Bazel doesn't pick it up from the environment
# Build STT
RUN bazel build \
--verbose_failures \
--workspace_status_command="bash native_client/bazel_workspace_status_cmd.sh" \
-c opt \
--copt=-mtune=generic \
--copt=-march=x86-64 \
--copt=-msse \
--copt=-msse2 \
--copt=-msse3 \
--copt=-msse4.1 \
--copt=-msse4.2 \
--copt=-mavx \
--config=noaws \
--config=nogcp \
--config=nohdfs \
--config=nonccl \
//native_client:libstt.so
# Copy built libs to /STT/native_client
RUN cp bazel-bin/native_client/libstt.so /STT/native_client/
# Build client.cc and install Python client and decoder bindings
ENV TFDIR /STT/tensorflow
RUN nproc
WORKDIR /STT/native_client
RUN make NUM_PROCESSES=$(nproc) stt
WORKDIR /STT
RUN cd native_client/python && make NUM_PROCESSES=$(nproc) bindings
RUN pip3 install -U pip setuptools wheel
RUN pip3 install --upgrade native_client/python/dist/*.whl
RUN cd native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings
RUN pip3 install --upgrade native_client/ctcdecode/dist/*.whl
# << END Build and bind
# Allow Python printing utf-8
ENV PYTHONIOENCODING UTF-8
# Build KenLM in /STT/native_client/kenlm folder
WORKDIR /STT/native_client
RUN rm -rf kenlm && \
git clone https://github.com/kpu/kenlm && \
cd kenlm && \
git checkout 87e85e66c99ceff1fab2500a7c60c01da7315eec && \
mkdir -p build && \
cd build && \
cmake .. && \
make -j $(nproc)
# Done
WORKDIR /STT

97
Dockerfile.train Normal file

@@ -0,0 +1,97 @@
# This is a Dockerfile useful for training models with Coqui STT.
# You can train "acoustic models" with audio + Tensorflow, and
# you can create "scorers" with text + KenLM.
FROM nvcr.io/nvidia/tensorflow:20.06-tf1-py3 AS kenlm-build
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential cmake libboost-system-dev \
libboost-thread-dev libboost-program-options-dev \
libboost-test-dev libeigen3-dev zlib1g-dev \
libbz2-dev liblzma-dev && \
rm -rf /var/lib/apt/lists/*
# Build KenLM to generate new scorers
WORKDIR /code
COPY kenlm /code/kenlm
RUN cd /code/kenlm && \
mkdir -p build && \
cd build && \
cmake .. && \
make -j $(nproc) || \
( echo "ERROR: Failed to build KenLM."; \
echo "ERROR: Make sure you update the kenlm submodule on host before building this Dockerfile."; \
echo "ERROR: $ cd STT; git submodule update --init kenlm"; \
exit 1; )
FROM ubuntu:20.04 AS wget-binaries
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install -y --no-install-recommends wget unzip xz-utils && \
rm -rf /var/lib/apt/lists/*
# Tool to convert output graph for inference
RUN wget --no-check-certificate https://github.com/coqui-ai/STT/releases/download/v0.9.3/convert_graphdef_memmapped_format.linux.amd64.zip -O temp.zip && \
unzip temp.zip && \
rm temp.zip
RUN wget --no-check-certificate https://github.com/reuben/STT/releases/download/v0.10.0-alpha.1/native_client.tar.xz -O temp.tar.xz && \
tar -xf temp.tar.xz && \
rm temp.tar.xz
FROM nvcr.io/nvidia/tensorflow:20.06-tf1-py3
ENV DEBIAN_FRONTEND=noninteractive
# We need to purge python3-xdg because
# it breaks the STT install later with
# errors about setuptools
#
RUN apt-get update && \
apt-get install -y --no-install-recommends \
git \
wget \
libopus0 \
libopusfile0 \
libsndfile1 \
sox \
libsox-fmt-mp3 && \
apt-get purge -y python3-xdg && \
rm -rf /var/lib/apt/lists/*
# Make sure pip and its dependencies are up-to-date
RUN pip3 install --upgrade pip wheel setuptools
WORKDIR /code
COPY native_client /code/native_client
COPY .git /code/.git
COPY training/coqui_stt_training/VERSION /code/training/coqui_stt_training/VERSION
COPY training/coqui_stt_training/GRAPH_VERSION /code/training/coqui_stt_training/GRAPH_VERSION
# Build CTC decoder first, to avoid clashes on incompatible versions upgrades
RUN cd native_client/ctcdecode && make NUM_PROCESSES=$(nproc) bindings
RUN pip3 install --upgrade native_client/ctcdecode/dist/*.whl
COPY setup.py /code/setup.py
COPY VERSION /code/VERSION
COPY training /code/training
# Copy files from previous build stages
RUN mkdir -p /code/kenlm/build/
COPY --from=kenlm-build /code/kenlm/build/bin /code/kenlm/build/bin
COPY --from=wget-binaries /convert_graphdef_memmapped_format /code/convert_graphdef_memmapped_format
COPY --from=wget-binaries /generate_scorer_package /code/generate_scorer_package
# Install STT
# No need for the decoder since we did it earlier
# TensorFlow GPU should already be installed on the base image,
# and we don't want to break that
RUN DS_NODECODER=y DS_NOTENSORFLOW=y pip3 install --upgrade -e .
# Copy rest of the code and test training
COPY . /code
RUN ./bin/run-ldc93s1.sh && rm -rf ~/.local/share/stt


@@ -0,0 +1,10 @@
.git/lfs
tensorflow
.git/modules/tensorflow
native_client/ds-swig
native_client/libstt.so
native_client/stt
native_client/ctcdecode/dist/
native_client/ctcdecode/temp_build
native_client/ctcdecode/third_party.a
native_client/ctcdecode/workspace_status.cc

12
Dockerfile.train.jupyter Normal file

@@ -0,0 +1,12 @@
# This is a Dockerfile useful for training models with Coqui STT in Jupyter notebooks
FROM ghcr.io/coqui-ai/stt-train:latest
WORKDIR /code/notebooks
RUN python3 -m pip install --no-cache-dir jupyter jupyter_http_over_ws
RUN jupyter serverextension enable --py jupyter_http_over_ws
EXPOSE 8888
CMD ["bash", "-c", "jupyter notebook --notebook-dir=/code/notebooks --ip 0.0.0.0 --no-browser --allow-root"]

1
GRAPH_VERSION Symbolic link

@@ -0,0 +1 @@
training/coqui_stt_training/GRAPH_VERSION


@@ -1,24 +0,0 @@
For support and discussions, please use our [Discourse forums](https://discourse.mozilla.org/c/deep-speech).
If you've found a bug, or have a feature request, then please create an issue with the following information:
- **Have I written custom code (as opposed to running examples on an unmodified clone of the repository)**:
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**:
- **TensorFlow installed from (our builds, or upstream TensorFlow)**:
- **TensorFlow version (use command below)**:
- **Python version**:
- **Bazel version (if compiling from source)**:
- **GCC/Compiler version (if compiling from source)**:
- **CUDA/cuDNN version**:
- **GPU model and memory**:
- **Exact command to reproduce**:
You can obtain the TensorFlow version with
```bash
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
```
Please describe the problem clearly. Be sure to convey here why it's a bug or a feature request.
Include any logs or source code that would be helpful to diagnose the problem. For larger logs, link to a Gist, not a screenshot. If including tracebacks, please include the full traceback. Try to provide a reproducible test case.

2
MANIFEST.in Normal file

@@ -0,0 +1,2 @@
include training/coqui_stt_training/VERSION
include training/coqui_stt_training/GRAPH_VERSION

8
Makefile Normal file

@@ -0,0 +1,8 @@
STT_REPO ?= https://github.com/coqui-ai/STT.git
STT_SHA ?= origin/main
Dockerfile%: Dockerfile%.tmpl
sed \
-e "s|#STT_REPO#|$(STT_REPO)|g" \
-e "s|#STT_SHA#|$(STT_SHA)|g" \
< $< > $@

402
README.md

@@ -1,402 +0,0 @@
# Project DeepSpeech
[![Task Status](https://github.taskcluster.net/v1/repository/mozilla/DeepSpeech/master/badge.svg)](https://github.taskcluster.net/v1/repository/mozilla/DeepSpeech/master/latest)
DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on [Baidu's Deep Speech research paper](https://arxiv.org/abs/1412.5567). Project DeepSpeech uses Google's [TensorFlow](https://www.tensorflow.org/) to make the implementation easier.
![Usage](images/usage.gif)
Pre-built binaries for performing inference with a trained model can be installed with `pip3`. Proper setup using a virtual environment is recommended, and you can find that documentation [below](#using-the-python-package).
A pre-trained English model is available for use and can be downloaded using [the instructions below](#getting-the-pre-trained-model). Currently, only 16-bit, 16 kHz, mono-channel WAVE audio files are supported in the Python client.
Once everything is installed, you can then use the `deepspeech` binary to do speech-to-text on short (approximately 5-second long) audio files as such:
```bash
pip3 install deepspeech
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav
```
Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. See the [release notes](https://github.com/mozilla/DeepSpeech/releases) to find which GPUs are supported. To run `deepspeech` on a GPU, install the GPU specific package:
```bash
pip3 install deepspeech-gpu
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav
```
Please ensure you have the required [CUDA dependency](#cuda-dependency).
See the output of `deepspeech -h` for more information on the use of `deepspeech`. (If you experience problems running `deepspeech`, please check [required runtime dependencies](native_client/README.md#required-dependencies)).
**Table of Contents**
- [Prerequisites](#prerequisites)
- [Getting the code](#getting-the-code)
- [Getting the pre-trained model](#getting-the-pre-trained-model)
- [CUDA dependency](#cuda-dependency)
- [Using the model](#using-the-model)
- [Using the Python package](#using-the-python-package)
- [Using the command line client](#using-the-command-line-client)
- [Using the Node.JS package](#using-the-nodejs-package)
- [Installing bindings from source](#installing-bindings-from-source)
- [Third party bindings](#third-party-bindings)
- [Training](#training)
- [Installing prerequisites for training](#installing-prerequisites-for-training)
- [Recommendations](#recommendations)
- [Common Voice training data](#common-voice-training-data)
- [Training a model](#training-a-model)
- [Checkpointing](#checkpointing)
- [Exporting a model for inference](#exporting-a-model-for-inference)
- [Exporting a model for TFLite](#exporting-a-model-for-tflite)
- [Distributed computing across more than one machine](#distributed-training-across-more-than-one-machine)
- [Continuing training from a release model](#continuing-training-from-a-release-model)
- [Contact/Getting Help](#contactgetting-help)
## Prerequisites
* [Python 3.6](https://www.python.org/)
* [Git Large File Storage](https://git-lfs.github.com/)
* Mac or Linux environment
* Go to [build README](examples/net_framework/README.md) to start building DeepSpeech for Windows from source.
## Getting the code
Install [Git Large File Storage](https://git-lfs.github.com/) either manually or through a package-manager if available on your system. Then clone the DeepSpeech repository normally:
```bash
git clone https://github.com/mozilla/DeepSpeech
```
## Getting the pre-trained model
If you want to use the pre-trained English model for performing speech-to-text, you can download it (along with other important inference material) from the DeepSpeech [releases page](https://github.com/mozilla/DeepSpeech/releases). Alternatively, you can run the following command to download and unzip the model files in your current directory:
```bash
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.4.1/deepspeech-0.4.1-models.tar.gz
tar xvfz deepspeech-0.4.1-models.tar.gz
```
## Using the model
There are three ways to use DeepSpeech inference:
- [The Python package](#using-the-python-package)
- [The command-line client](#using-the-command-line-client)
- [The Node.JS package](#using-the-nodejs-package)
### CUDA dependency
The GPU capable builds (Python, NodeJS, C++ etc) depend on the same CUDA runtime as upstream TensorFlow. Currently with TensorFlow r1.12 it depends on CUDA 9.0 and CuDNN v7.2.
### Using the Python package
Pre-built binaries which can be used for performing inference with a trained model can be installed with `pip3`. You can then use the `deepspeech` binary to do speech-to-text on an audio file:
For the Python bindings, it is highly recommended that you perform the installation within a Python 3.5 or later virtual environment. You can find more information about those in [this documentation](http://docs.python-guide.org/en/latest/dev/virtualenvs/).
We will continue under the assumption that you already have your system properly setup to create new virtual environments.
#### Create a DeepSpeech virtual environment
In creating a virtual environment you will create a directory containing a `python3` binary and everything needed to run deepspeech. You can use whatever directory you want. For the purpose of the documentation, we will rely on `$HOME/tmp/deepspeech-venv`. You can create it using this command:
```
$ virtualenv -p python3 $HOME/tmp/deepspeech-venv/
```
Once this command completes successfully, the environment will be ready to be activated.
#### Activating the environment
Each time you need to work with DeepSpeech, you have to *activate* this virtual environment. This is done with this simple command:
```
$ source $HOME/tmp/deepspeech-venv/bin/activate
```
#### Installing DeepSpeech Python bindings
Once your environment has been set-up and loaded, you can use `pip3` to manage packages locally. On a fresh setup of the `virtualenv`, you will have to install the DeepSpeech wheel. You can check if `deepspeech` is already installed with `pip3 list`.
To perform the installation, just use `pip3` as such:
```
$ pip3 install deepspeech
```
If `deepspeech` is already installed, you can update it as such:
```
$ pip3 install --upgrade deepspeech
```
Alternatively, if you have a supported NVIDIA GPU on Linux, you can install the GPU specific package as follows:
```
$ pip3 install deepspeech-gpu
```
See the [release notes](https://github.com/mozilla/DeepSpeech/releases) to find which GPUs are supported. Please ensure you have the required [CUDA dependency](#cuda-dependency).
You can update `deepspeech-gpu` as follows:
```
$ pip3 install --upgrade deepspeech-gpu
```
In both cases, `pip3` should take care of installing all the required dependencies. After installation has finished, you should be able to call `deepspeech` from the command-line.
Note: the following command assumes you [downloaded the pre-trained model](#getting-the-pre-trained-model).
```bash
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav
```
The arguments `--lm` and `--trie` are optional, and represent a language model.
See [client.py](native_client/python/client.py) for an example of how to use the package programmatically.
### Using the command-line client
To download the pre-built binaries for the `deepspeech` command-line client, use `util/taskcluster.py`:
```bash
python3 util/taskcluster.py --target .
```
or if you're on macOS:
```bash
python3 util/taskcluster.py --arch osx --target .
```
Also, if you need binaries different from the current master, such as `v0.2.0-alpha.6`, you can use `--branch`:
```bash
python3 util/taskcluster.py --branch "v0.2.0-alpha.6" --target "."
```
The script `taskcluster.py` will download `native_client.tar.xz` (which includes the `deepspeech` binary and associated libraries) and extract it into the current folder. Also, `taskcluster.py` will download binaries for Linux/x86_64 by default, but you can override that behavior with the `--arch` parameter. See the help info with `python util/taskcluster.py -h` for more details. Specific branches of DeepSpeech or TensorFlow can be specified as well.
Note: the following command assumes you [downloaded the pre-trained model](#getting-the-pre-trained-model).
```bash
./deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio audio_input.wav
```
See the help output with `./deepspeech -h` and the [native client README](native_client/README.md) for more details.
### Using the Node.JS package
You can download the Node.JS bindings using `npm`:
```bash
npm install deepspeech
```
Alternatively, if you're using Linux and have a supported NVIDIA GPU, you can install the GPU specific package as follows:
```bash
npm install deepspeech-gpu
```
See the [release notes](https://github.com/mozilla/DeepSpeech/releases) to find which GPUs are supported. Please ensure you have the required [CUDA dependency](#cuda-dependency).
See [client.js](native_client/javascript/client.js) for an example of how to use the bindings. Or download the [wav example](examples/nodejs_wav).
### Installing bindings from source
If pre-built binaries aren't available for your system, you'll need to install them from scratch. Follow these [`native_client` installation instructions](native_client/README.md).
### Third party bindings
In addition to the bindings above, third party developers have started to provide bindings to other languages:
* [Asticode](https://github.com/asticode) provides [Golang](https://golang.org) bindings in its [go-astideepspeech](https://github.com/asticode/go-astideepspeech) repo.
* [RustAudio](https://github.com/RustAudio) provide a [Rust](https://www.rust-lang.org) binding, the installation and use of which is described in their [deepspeech-rs](https://github.com/RustAudio/deepspeech-rs) repo.
* [stes](https://github.com/stes) provides preliminary [PKGBUILDs](https://wiki.archlinux.org/index.php/PKGBUILD) to install the client and python bindings on [Arch Linux](https://www.archlinux.org/) in the [arch-deepspeech](https://github.com/stes/arch-deepspeech) repo.
* [gst-deepspeech](https://github.com/Elleo/gst-deepspeech) provides a [GStreamer](https://gstreamer.freedesktop.org/) plugin which can be used from any language with GStreamer bindings.
## Training
### Installing prerequisites for training
Install the required dependencies using `pip3`:
```bash
cd DeepSpeech
pip3 install -r requirements.txt
```
You'll also need to install the `ds_ctcdecoder` Python package. `ds_ctcdecoder` is required for decoding the outputs of the `deepspeech` acoustic model into text. You can use `util/taskcluster.py` with the `--decoder` flag to get a URL to a binary of the decoder package appropriate for your platform and Python version:
```bash
pip3 install $(python3 util/taskcluster.py --decoder)
```
This command will download and install the `ds_ctcdecoder` package. If you prefer building the binaries from source, see the [native_client README file](native_client/README.md). You can override the platform with `--arch` if you want the package for ARMv7 (`--arch arm`) or ARM64 (`--arch arm64`).
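For example, assuming `--arch` combines with `--decoder` as described above, you can fetch and install the ARM64 decoder package in one step:
```bash
# Resolve the ARM64 decoder package URL and install it in one command
pip3 install $(python3 util/taskcluster.py --decoder --arch arm64)
```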
### Recommendations
If you have a capable GPU (NVIDIA, with at least 8GB of VRAM), it is highly recommended to install TensorFlow with GPU support. Training will be significantly faster than on the CPU. To enable GPU support, run:
```bash
pip3 uninstall tensorflow
pip3 install 'tensorflow-gpu==1.13.1'
```
Please ensure you have the required [CUDA dependency](#cuda-dependency).
### Common Voice training data
The Common Voice corpus consists of voice samples that were donated through Mozilla's [Common Voice](https://voice.mozilla.org/) Initiative.
We provide an importer (`bin/import_cv.py`) which automates downloading and preparing the Common Voice corpus:
```bash
bin/import_cv.py path/to/target/directory
```
If you already downloaded Common Voice from [here](https://voice.mozilla.org/data), simply run `bin/import_cv.py` on the directory where the corpus is located. The importer will detect that you've already downloaded the data and immediately proceed to unpacking and importing. If you haven't downloaded the data yet, `bin/import_cv.py` will download it for you and save it to the path you've specified.
Please be aware that training with the Common Voice corpus archive requires at least 70GB of free disk space and quite some time to complete. As this process creates a huge number of small files, using an SSD is highly recommended. If the import script gets interrupted, it will try to continue from where it stopped the next time you run it. Unfortunately, there are some cases where it will need to start over. Once the import is done, the directory will contain a bunch of CSV files.
The following files are official user-validated sets for training, validating and testing:
- `cv-valid-train.csv`
- `cv-valid-dev.csv`
- `cv-valid-test.csv`
The following files are the non-validated unofficial sets for training, validating and testing:
- `cv-other-train.csv`
- `cv-other-dev.csv`
- `cv-other-test.csv`
`cv-invalid.csv` contains all samples that users flagged as invalid.
A sub-directory called `cv_corpus_{version}` contains the mp3 and wav files that were extracted from an archive named `cv_corpus_{version}.tar.gz`.
All entries in the CSV files refer to their samples by absolute paths. So moving this sub-directory would require another import or tweaking the CSV files accordingly.
To use Common Voice data during training, validation and testing, you pass (comma-separated combinations of) their filenames to the `--train_files`, `--dev_files` and `--test_files` parameters of `DeepSpeech.py`.
If, for example, Common Voice was imported into `../data/CV`, `DeepSpeech.py` could be called like this:
```bash
./DeepSpeech.py --train_files ../data/CV/cv-valid-train.csv --dev_files ../data/CV/cv-valid-dev.csv --test_files ../data/CV/cv-valid-test.csv
```
If you are brave enough, you can also include the `other` dataset, which contains not-yet-validated content:
```bash
./DeepSpeech.py --train_files ../data/CV/cv-valid-train.csv,../data/CV/cv-other-train.csv --dev_files ../data/CV/cv-valid-dev.csv --test_files ../data/CV/cv-valid-test.csv
```
### Training a model
The central (Python) script is `DeepSpeech.py` in the project's root directory. For its list of command line options, you can call:
```bash
./DeepSpeech.py --helpfull
```
To get the output of this in a slightly better-formatted way, you can also look up the option definitions at the top of `DeepSpeech.py`.
For executing pre-configured training scenarios, there is a collection of convenience scripts in the `bin` folder. Most of them are named after the corpora they are configured for. Keep in mind that the other speech corpora are *very large*, on the order of tens of gigabytes, and some aren't free. Downloading and preprocessing them can take a very long time, and training on them without a fast GPU (GTX 10 series recommended) takes even longer.
**If you experience GPU OOM errors while training, try reducing the batch size with the `--train_batch_size`, `--dev_batch_size` and `--test_batch_size` parameters.**
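For example, a run with conservatively reduced batch sizes could look like this (the CSV paths reuse the Common Voice example above; the batch size values are illustrative, not recommendations):
```bash
./DeepSpeech.py --train_files ../data/CV/cv-valid-train.csv \
                --dev_files ../data/CV/cv-valid-dev.csv \
                --test_files ../data/CV/cv-valid-test.csv \
                --train_batch_size 8 --dev_batch_size 8 --test_batch_size 8
```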
As a simple first example you can open a terminal, change to the directory of the DeepSpeech checkout and run:
```bash
./bin/run-ldc93s1.sh
```
This script will train on a small sample dataset called LDC93S1, which can be overfitted on a GPU in a few minutes for demonstration purposes. From here, you can alter any variables with regard to which dataset is used, how many training iterations are run, and the default values of the network parameters.
Feel free to pass additional (or overriding) `DeepSpeech.py` parameters to these scripts, then just run the script to train the modified network.
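For example, a sketch of overriding a couple of parameters when invoking the LDC93S1 script (the values are illustrative only):
```bash
# Extra flags are forwarded to DeepSpeech.py by the convenience script
./bin/run-ldc93s1.sh --epoch 200 --learning_rate 0.0001
```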
Each dataset has a corresponding importer script in `bin/` that can be used to download (if it's freely available) and preprocess the dataset. See `bin/import_librivox.py` for an example of how to import and preprocess a large dataset for training with DeepSpeech.
If you've run the old importers (in `util/importers/`), they may have removed source files that are needed by the new importers. In that case, simply remove the extracted folders and let the importer extract and process the dataset from scratch, and things should work.
### Checkpointing
During training, so-called checkpoints are stored on disk at a configurable time interval. The purpose of checkpoints is to allow training to be interrupted (including by unexpected failures) and later continued without losing hours of training time. Resuming from checkpoints happens automatically by simply (re)starting training with the same `--checkpoint_dir` as the former run.
Be aware, however, that checkpoints are only valid for the same model geometry they were generated from. In other words, if you see error messages about certain `Tensors` having incompatible dimensions, this is most likely due to an incompatible model change. The usual way out is to wipe all checkpoint files in the checkpoint directory, or to change the directory, before starting the training.
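For example, assuming a previous run used `--checkpoint_dir ~/checkpoints/my_model`, restarting training with the same flag (and the same model geometry) resumes from the latest checkpoint; the CSV file names are placeholders:
```bash
./DeepSpeech.py --checkpoint_dir ~/checkpoints/my_model \
                --train_files my-train.csv --dev_files my-dev.csv --test_files my-test.csv
```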
### Exporting a model for inference
If the `--export_dir` parameter is provided, a model will be exported to this directory during training.
Refer to the corresponding [README.md](native_client/README.md) for information on building and running a client that can use the exported model.
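For instance, appending `--export_dir` to a normal training invocation (paths and file names are placeholders) will write the exported model to that directory once training finishes:
```bash
./DeepSpeech.py --train_files my-train.csv --dev_files my-dev.csv --test_files my-test.csv \
                --export_dir /model/export/destination
```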
### Exporting a model for TFLite
If you want to experiment with the TF Lite engine, you need to export a model that is compatible with it; to do so, use the `--export_tflite` flag. If you already have a trained model, you can re-export it for TFLite by running `DeepSpeech.py` again, specifying the same `--checkpoint_dir` that you used for training and passing `--notrain --notest --export_tflite --export_dir /model/export/destination`.
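Putting those flags together, a re-export from an existing checkpoint directory looks like this (paths are placeholders):
```bash
./DeepSpeech.py --checkpoint_dir path/to/checkpoint/folder \
                --notrain --notest --export_tflite --export_dir /model/export/destination
```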
### Making a mmap-able model for inference
The `output_graph.pb` model file generated in the above step is loaded fully into memory when running inference.
This results in extra loading time and memory consumption. One way to avoid this is to read the data directly from disk.
TensorFlow has tooling to achieve this: it requires building the target `//tensorflow/contrib/util:convert_graphdef_memmapped_format` (binaries are produced by our TaskCluster for some systems, including Linux/amd64 and macOS/amd64). Use the `util/taskcluster.py` tool to download it, specifying `tensorflow` as the source and `convert_graphdef_memmapped_format` as the artifact.
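A sketch of that download step, assuming `util/taskcluster.py` exposes `--source` and `--artifact` flags matching the wording above:
```bash
python3 util/taskcluster.py --source tensorflow --artifact convert_graphdef_memmapped_format --target .
```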
Producing a mmap-able model is as simple as:
```
$ convert_graphdef_memmapped_format --in_graph=output_graph.pb --out_graph=output_graph.pbmm
```
Upon a successful run, it should report the conversion of a non-zero number of nodes. If it reports converting `0` nodes, something is wrong: make sure your model is a frozen one, and that you have not applied any incompatible changes (this includes `quantize_weights`).
### Distributed training across more than one machine
DeepSpeech has built-in support for [distributed TensorFlow](https://www.tensorflow.org/deploy/distributed). To get an idea of how this works, you can use the script `bin/run-cluster.sh` to run a cluster with workers on the local machine.
```bash
$ bin/run-cluster.sh --help
Usage: run-cluster.sh [--help] [--script script] [p:w:g] <arg>*
--help print this help message
--script run the provided script instead of DeepSpeech.py
p number of local parameter servers
w number of local workers
g number of local GPUs per worker
<arg>* remaining parameters will be forwarded to DeepSpeech.py or a provided script
Example usage - The following example will create a local DeepSpeech.py cluster
with 1 parameter server, and 2 workers with 1 GPU each:
$ run-cluster.sh 1:2:1 --epoch 10
```
Be aware that for the help example to run, you need at least two CUDA-capable GPUs (2 workers x 1 GPU). The script uses the `CUDA_VISIBLE_DEVICES` environment variable so that `DeepSpeech.py` only sees the provided number of GPUs per worker.
The script is meant to be a template for your own distributed computing instrumentation. Just modify the startup code for the different servers (workers and parameter servers) accordingly. You could use SSH or something similar to run them on your remote hosts.
### Continuing training from a release model
If you'd like to use one of the pre-trained models released by Mozilla to bootstrap your training process (transfer learning, fine tuning), you can do so by using the `--checkpoint_dir` flag in `DeepSpeech.py`. Specify the path where you downloaded the checkpoint from the release, and training will resume from the pre-trained model.
For example, if you want to fine tune the entire graph using your own data in `my-train.csv`, `my-dev.csv` and `my-test.csv`, for three epochs, you can run something like the following, tuning the hyperparameters as needed:
```bash
mkdir fine_tuning_checkpoints
python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir path/to/checkpoint/folder --epoch -3 --train_files my-train.csv --dev_files my-dev.csv --test_files my-test.csv --learning_rate 0.0001
```
Note: the released models were trained with `--n_hidden 2048`, so you need to use that same value when initializing from the release models. Note as well the use of a negative epoch count -3 (meaning 3 more epochs) since the checkpoint you're loading from was already trained for several epochs.
## Contact/Getting Help
There are several ways to contact us or to get help:
1. [**FAQ**](https://github.com/mozilla/DeepSpeech/wiki#frequently-asked-questions) - We have a list of common questions, and their answers, in our [FAQ](https://github.com/mozilla/DeepSpeech/wiki#frequently-asked-questions). When just getting started, it's best to first check the [FAQ](https://github.com/mozilla/DeepSpeech/wiki#frequently-asked-questions) to see if your question is addressed.
2. [**Discourse Forums**](https://discourse.mozilla.org/c/deep-speech) - If your question is not addressed in the [FAQ](https://github.com/mozilla/DeepSpeech/wiki#frequently-asked-questions), the [Discourse Forums](https://discourse.mozilla.org/c/deep-speech) is the next place to look. They contain conversations on [General Topics](https://discourse.mozilla.org/t/general-topics/21075), [Using Deep Speech](https://discourse.mozilla.org/t/using-deep-speech/21076/4), and [Deep Speech Development](https://discourse.mozilla.org/t/deep-speech-development/21077).
3. [**IRC**](https://wiki.mozilla.org/IRC) - If your question is not addressed by either the [FAQ](https://github.com/mozilla/DeepSpeech/wiki#frequently-asked-questions) or [Discourse Forums](https://discourse.mozilla.org/c/deep-speech), you can contact us on the `#machinelearning` channel on [Mozilla IRC](https://wiki.mozilla.org/IRC); people there can try to answer or help.
4. [**Issues**](https://github.com/mozilla/deepspeech/issues) - Finally, if all else fails, you can open an issue in our repo.

README.rst Normal file

@ -0,0 +1,69 @@
.. image:: images/coqui-STT-logo-green.png
:alt: Coqui STT logo
.. |doc-img| image:: https://readthedocs.org/projects/stt/badge/?version=latest
:target: https://stt.readthedocs.io/?badge=latest
:alt: Documentation
.. |covenant-img| image:: https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg
:target: CODE_OF_CONDUCT.md
:alt: Contributor Covenant
.. |gitter-img| image:: https://badges.gitter.im/coqui-ai/STT.svg
:target: https://gitter.im/coqui-ai/STT?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge
:alt: Gitter Room
.. |doi| image:: https://zenodo.org/badge/344354127.svg
:target: https://zenodo.org/badge/latestdoi/344354127
|doc-img| |covenant-img| |gitter-img| |doi|
`👉 Subscribe to 🐸Coqui's Newsletter <https://coqui.ai/?subscription=true>`_
**Coqui STT** (🐸STT) is a fast, open-source, multi-platform, deep-learning toolkit for training and deploying speech-to-text models. 🐸STT is battle tested in both production and research 🚀
🐸STT features
---------------
* High-quality pre-trained STT model.
* Efficient training pipeline with Multi-GPU support.
* Streaming inference.
* Multiple possible transcripts, each with an associated confidence score.
* Real-time inference.
* Small-footprint acoustic model.
* Bindings for various programming languages.
Where to Ask Questions
----------------------
.. list-table::
:widths: 25 25
:header-rows: 1
* - Type
- Link
* - 🚨 **Bug Reports**
- `Github Issue Tracker <https://github.com/coqui-ai/STT/issues/>`_
* - 🎁 **Feature Requests & Ideas**
- `Github Issue Tracker <https://github.com/coqui-ai/STT/issues/>`_
* - ❔ **Questions**
- `Github Discussions <https://github.com/coqui-ai/stt/discussions/>`_
* - 💬 **General Discussion**
- `Github Discussions <https://github.com/coqui-ai/stt/discussions/>`_ or `Gitter Room <https://gitter.im/coqui-ai/STT?utm_source=share-link&utm_medium=link&utm_campaign=share-link>`_
Links & Resources
-----------------
.. list-table::
:widths: 25 25
:header-rows: 1
* - Type
- Link
* - 📰 **Documentation**
- `stt.readthedocs.io <https://stt.readthedocs.io/>`_
* - 🚀 **Latest release with pre-trained models**
- `see the latest release on GitHub <https://github.com/coqui-ai/STT/releases/latest>`_
* - 🤝 **Contribution Guidelines**
- `CONTRIBUTING.rst <CONTRIBUTING.rst>`_


@ -1,9 +0,0 @@
Making a (new) release of the codebase
======================================
- Update version in VERSION file, commit
- Open PR, ensure all tests are passing properly
- Merge the PR
- Fetch the new master, tag it with (hopefully) the same version as in VERSION
- Push that to Github
- New build should be triggered and new packages should be made
- TaskCluster should schedule a merge build **including** a "DeepSpeech Packages" task

RELEASE_NOTES.md Normal file

@ -0,0 +1,95 @@
# General
This is the 1.0.0 release for Coqui STT, the deep learning toolkit for speech-to-text. In accordance with [semantic versioning](https://semver.org/), this version is not completely backwards compatible with previous versions. The compatibility guarantees of our semantic versioning cover the inference APIs: the C API and all the official language bindings: Python, Node.JS/ElectronJS and Android. You can get started today with Coqui STT 1.0.0 by following the steps in our [documentation](https://stt.readthedocs.io/).
This release includes pre-trained English models, available in the Coqui Model Zoo:
- [Coqui English STT v1.0.0-huge-vocab](https://coqui.ai/english/coqui/v1.0.0-huge-vocab)
- [Coqui English STT v1.0.0-yesno](https://coqui.ai/english/coqui/v1.0.0-yesno)
- [Coqui English STT v1.0.0-large-vocab](https://coqui.ai/english/coqui/v1.0.0-large-vocab)
- [Coqui English STT v1.0.0-digits](https://coqui.ai/english/coqui/v1.0.0-digits)
all under the Apache 2.0 license.
The acoustic models were trained on American English data with synthetic noise augmentation. The model achieves a 4.5% word error rate on the [LibriSpeech clean test corpus](http://www.openslr.org/12) and 13.6% word error rate on the [LibriSpeech other test corpus](http://www.openslr.org/12) with the largest release language model.
Note that the model currently performs best in low-noise environments with clear recordings. This does not mean the model cannot be used outside of these conditions, but that accuracy may be lower. Some users may need to further fine tune the model to meet their intended use-case.
We also include example audio files:
[audio-1.0.0.tar.gz](https://github.com/coqui-ai/STT/releases/download/v1.0.0/audio-1.0.0.tar.gz)
which can be used to test the engine, and checkpoint files for the English model:
[coqui-stt-1.0.0-checkpoint.tar.gz](https://github.com/coqui-ai/STT/releases/download/v1.0.0/coqui-stt-1.0.0-checkpoint.tar.gz)
which are under the Apache 2.0 license and can be used as the basis for further fine-tuning. Finally, this release also includes a source code tarball:
[v1.0.0.tar.gz](https://github.com/coqui-ai/STT/archive/v1.0.0.tar.gz)
under the [MPL-2.0 license](https://www.mozilla.org/en-US/MPL/2.0/). Note that this tarball is for archival purposes only, since GitHub does not include submodules in the automatic tarballs. For usage and development with the source code, clone the repository using Git, following our [documentation](https://stt.readthedocs.io/).
# Notable changes
- Removed support for protocol buffer input in native client and consolidated all packages under a single "STT" name accepting TFLite inputs
- Added programmatic interface to training code and example Jupyter Notebooks, including how to train with Common Voice data
- Added transparent handling of mixed sample rates and stereo audio in training inputs
- Moved CI setup to GitHub Actions, making code contributions easier to test
- Added configuration management via Coqpit, providing a more flexible config interface that's compatible with Coqui TTS
- Handle Opus audio files transparently in training inputs
- Added support for automatic dataset subset splitting
- Added support for automatic alphabet generation and loading
- Started publishing the training code CI for a faster notebook setup
- Refactor training code into self-contained modules and deprecate train.py as universal entry point for training
# Training Regimen + Hyperparameters for fine-tuning
The hyperparameters used to train the model are useful for fine tuning. Thus, we document them here along with the training regimen, hardware used (a server with 8 NVIDIA A100 GPUs each with 40GB of VRAM), along with the full training hyperparameters. The full training configuration in JSON format is available [here](https://gist.github.com/reuben/6ced6a8b41e3d0849dafb7cae301e905).
The datasets used were:
- Common Voice 7.0 (with custom train/dev/test splits)
- Multilingual LibriSpeech (English, Opus)
- LibriSpeech
The optimal `lm_alpha` and `lm_beta` values with respect to the Common Voice 7.0 (custom Coqui splits) and a large vocabulary language model:
- lm_alpha: 0.5891777425167632
- lm_beta: 0.6619145283338659
# Documentation
Documentation is available on [stt.readthedocs.io](https://stt.readthedocs.io/).
# Contact/Getting Help
1. [GitHub Discussions](https://github.com/coqui-ai/STT/discussions/) - best place to ask questions, get support, and discuss anything related to 🐸STT with other users.
2. [Gitter](https://gitter.im/coqui-ai/) - You can also join our Gitter chat.
3. [Issues](https://github.com/coqui-ai/STT/issues) - If you have discussed a problem and identified a bug in 🐸STT, or if you have a feature request, please open an issue in our repo. Please make sure you search for an already existing issue beforehand!
# Contributors to 1.0.0 release
- Alexandre Lissy
- Anon-Artist
- Anton Yaroshenko
- Catalin Voss
- CatalinVoss
- dag7dev
- Dustin Zubke
- Eren Gölge
- Erik Ziegler
- Francis Tyers
- Ideefixze
- Ilnar Salimzianov
- imrahul3610
- Jeremiah Rose
- Josh Meyer
- Kathy Reid
- Kelly Davis
- Kenneth Heafield
- NanoNabla
- Neil Stoker
- Reuben Morais
- zaptrem
We'd also like to thank all the members of our [Gitter chat room](https://gitter.im/coqui-ai/STT) who have been helping to shape this release!


@ -1 +0,0 @@
0.5.0-alpha.3

VERSION Symbolic link

@ -0,0 +1 @@
training/coqui_stt_training/VERSION


@ -9,23 +9,23 @@ index c7aa4cb63..e084bc27c 100644
+import java.io.PrintWriter;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
@@ -73,6 +74,8 @@ public final class FileWriteAction extends AbstractFileWriteAction {
*/
private final CharSequence fileContents;
+ private final Artifact output;
+
/** Minimum length (in chars) for content to be eligible for compression. */
private static final int COMPRESS_CHARS_THRESHOLD = 256;
@@ -90,6 +93,7 @@ public final class FileWriteAction extends AbstractFileWriteAction {
fileContents = new CompressedString((String) fileContents);
}
this.fileContents = fileContents;
+ this.output = output;
}
/**
@@ -230,11 +234,32 @@ public final class FileWriteAction extends AbstractFileWriteAction {
*/
@ -59,7 +59,7 @@ index c7aa4cb63..e084bc27c 100644
+ computeKeyDebugWriter.close();
+ return rv;
}
/**
diff --git a/src/main/java/com/google/devtools/build/lib/analysis/actions/SpawnAction.java b/src/main/java/com/google/devtools/build/lib/analysis/actions/SpawnAction.java
index 580788160..26883eb92 100644
@ -74,9 +74,9 @@ index 580788160..26883eb92 100644
import java.util.Collections;
import java.util.LinkedHashMap;
@@ -91,6 +92,9 @@ public class SpawnAction extends AbstractAction implements ExecutionInfoSpecifie
private final CommandLine argv;
+ private final Iterable<Artifact> inputs;
+ private final Iterable<Artifact> outputs;
+
@ -91,10 +91,10 @@ index 580788160..26883eb92 100644
+ this.inputs = inputs;
+ this.outputs = outputs;
}
@Override
@@ -312,23 +319,89 @@ public class SpawnAction extends AbstractAction implements ExecutionInfoSpecifie
@Override
protected String computeKey() {
+ boolean genruleSetup = String.valueOf(Iterables.get(inputs, 0).getExecPath()).contains("genrule/genrule-setup.sh");
@ -182,14 +182,14 @@ index 580788160..26883eb92 100644
+ }
+ return rv;
}
@Override
diff --git a/src/main/java/com/google/devtools/build/lib/rules/cpp/CppCompileAction.java b/src/main/java/com/google/devtools/build/lib/rules/cpp/CppCompileAction.java
index 3559fffde..3ba39617c 100644
--- a/src/main/java/com/google/devtools/build/lib/rules/cpp/CppCompileAction.java
+++ b/src/main/java/com/google/devtools/build/lib/rules/cpp/CppCompileAction.java
@@ -1111,10 +1111,30 @@ public class CppCompileAction extends AbstractAction
@Override
public String computeKey() {
+ // ".ckd" Compute Key Debug
@ -216,7 +216,7 @@ index 3559fffde..3ba39617c 100644
+ for (Map.Entry<String, String> entry : executionInfo.entrySet()) {
+ computeKeyDebugWriter.println("EXECINFO: " + entry.getKey() + "=" + entry.getValue());
+ }
// For the argv part of the cache key, ignore all compiler flags that explicitly denote module
// file (.pcm) inputs. Depending on input discovery, some of the unused ones are removed from
@@ -1124,6 +1144,9 @@ public class CppCompileAction extends AbstractAction
@ -226,7 +226,7 @@ index 3559fffde..3ba39617c 100644
+ for (String input : compileCommandLine.getArgv(getInternalOutputFile(), null)) {
+ computeKeyDebugWriter.println("COMMAND: " + input);
+ }
/*
* getArgv() above captures all changes which affect the compilation
@@ -1133,19 +1156,31 @@ public class CppCompileAction extends AbstractAction
@ -260,5 +260,5 @@ index 3559fffde..3ba39617c 100644
+ computeKeyDebugWriter.close();
+ return rv;
}
@Override


@ -1,3 +1,4 @@
# Utility scripts
Utility scripts
===============
This folder contains scripts that can be used to do training on the various included importers from the command line. This is useful to be able to run training without a browser open, or unattended on a remote machine. They should be run from the base directory of the repository. Note that the default settings assume a very well-specified machine. In the situation that out-of-memory errors occur, you may find decreasing the values of `--train_batch_size`, `--dev_batch_size` and `--test_batch_size` will allow you to continue, at the expense of speed.
This folder contains scripts that can be used to do training on the various included importers from the command line. This is useful to be able to run training without a browser open, or unattended on a remote machine. They should be run from the base directory of the repository. Note that the default settings assume a very well-specified machine. In the situation that out-of-memory errors occur, you may find decreasing the values of ``--train_batch_size``\ , ``--dev_batch_size`` and ``--test_batch_size`` will allow you to continue, at the expense of speed.


@ -1,506 +0,0 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function
import os
import sys
# To use util.tc
sys.path.append(os.path.abspath(os.path.dirname(os.path.dirname(sys.argv[0]))))
import util.taskcluster as tcu
from util.benchmark import keep_only_digits
import paramiko
import argparse
import tempfile
import shutil
import subprocess
import stat
import numpy
import matplotlib.pyplot as plt
import scipy.stats as scipy_stats
import csv
import getpass
import zipfile
from six import iteritems
from six.moves import range, map
r'''
Tool to:
- run local or remote (ssh) native_client
- handles copying models (as protocolbuffer files)
- run native_client in benchmark mode
- collect timing results
- compute mean values (with variances)
- output as CSV
'''
ssh_conn = None
def exec_command(command, cwd=None):
r'''
Helper to exec locally (subprocess) or remotely (paramiko)
'''
rc = None
stdout = stderr = None
if ssh_conn is None:
ld_library_path = {'LD_LIBRARY_PATH': '.:%s' % os.environ.get('LD_LIBRARY_PATH', '')}
p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True, env=ld_library_path, cwd=cwd)
stdout, stderr = p.communicate()
rc = p.returncode
else:
# environment= requires paramiko >= 2.1 (fails with 2.0.2)
final_command = command if cwd is None else 'cd %s && %s %s' % (cwd, 'LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH', command)
ssh_stdin, ssh_stdout, ssh_stderr = ssh_conn.exec_command(final_command)
stdout = ''.join(ssh_stdout.readlines())
stderr = ''.join(ssh_stderr.readlines())
rc = ssh_stdout.channel.recv_exit_status()
return rc, stdout, stderr
def assert_valid_dir(dir):
if dir is None:
raise AssertionError('Invalid temp directory')
return True
def get_arch_string():
r'''
Check local or remote system arch, to produce TaskCluster proper link.
'''
rc, stdout, stderr = exec_command('uname -sm')
if rc > 0:
raise AssertionError('Error checking OS')
stdout = stdout.lower().strip()
if not 'linux' in stdout:
raise AssertionError('Unsupported OS')
if 'armv7l' in stdout:
return 'arm'
if 'x86_64' in stdout:
nv_rc, nv_stdout, nv_stderr = exec_command('nvidia-smi')
nv_stdout = nv_stdout.lower().strip()
if 'NVIDIA-SMI' in nv_stdout:
return 'gpu'
else:
return 'cpu'
raise AssertionError('Unsupported arch:', stdout)
def maybe_download_binaries(dir):
assert_valid_dir(dir)
tcu.maybe_download_tc(target_dir=dir, tc_url=tcu.get_tc_url(get_arch_string()), progress=True)
def extract_native_client_tarball(dir):
r'''
Download a native_client.tar.xz file from TaskCluster and extract it to dir.
'''
assert_valid_dir(dir)
target_tarball = os.path.join(dir, 'native_client.tar.xz')
if os.path.isfile(target_tarball) and os.stat(target_tarball).st_size == 0:
return
subprocess.check_call(['pixz', '-d', 'native_client.tar.xz'], cwd=dir)
subprocess.check_call(['tar', 'xf', 'native_client.tar'], cwd=dir)
os.unlink(os.path.join(dir, 'native_client.tar'))
open(target_tarball, 'w').close()
def is_zip_file(models):
r'''
Ensure that a path is a zip file by:
- checking length is 1
- checking extension is '.zip'
'''
ext = os.path.splitext(models[0])[1]
return (len(models) == 1) and (ext == '.zip')
def maybe_inspect_zip(models):
r'''
Detect if models is a list of protocolbuffer files or a ZIP file.
If the latter, then unzip it and return the list of protocolbuffer files
that were inside.
'''
if not(is_zip_file(models)):
return models
if len(models) > 1:
return models
if len(models) < 1:
raise AssertionError('No models at all')
return zipfile.ZipFile(models[0]).namelist()
def all_files(models=[]):
r'''
Return a list of full path of files matching 'models', sorted in human
numerical order (i.e., 0 1 2 ..., 10 11 12, ..., 100, ..., 1000).
Files are supposed to be named identically except one variable component
e.g. the list,
test.weights.e5.lstm1200.ldc93s1.pb
test.weights.e5.lstm1000.ldc93s1.pb
test.weights.e5.lstm800.ldc93s1.pb
gets sorted:
test.weights.e5.lstm800.ldc93s1.pb
test.weights.e5.lstm1000.ldc93s1.pb
test.weights.e5.lstm1200.ldc93s1.pb
'''
def nsort(a, b):
fa = os.path.basename(a).split('.')
fb = os.path.basename(b).split('.')
elements_to_remove = []
assert len(fa) == len(fb)
for i in range(0, len(fa)):
if fa[i] == fb[i]:
elements_to_remove.append(fa[i])
for e in elements_to_remove:
fa.remove(e)
fb.remove(e)
assert len(fa) == len(fb)
assert len(fa) == 1
fa = keep_only_digits(fa[0])
fb = keep_only_digits(fb[0])
if fa < fb:
return -1
if fa == fb:
return 0
if fa > fb:
return 1
base = list(map(lambda x: os.path.abspath(x), maybe_inspect_zip(models)))
base.sort(cmp=nsort)
return base
def copy_tree(dir):
assert_valid_dir(dir)
sftp = ssh_conn.open_sftp()
# IOError will get triggered if the path does not exists remotely
try:
if stat.S_ISDIR(sftp.stat(dir).st_mode):
print('Directory already existent: %s' % dir)
except IOError:
print('Creating remote directory: %s' % dir)
sftp.mkdir(dir)
print('Copy files to remote')
for fname in os.listdir(dir):
fullpath = os.path.join(dir, fname)
local_stat = os.stat(fullpath)
try:
remote_mode = sftp.stat(fullpath).st_mode
except IOError:
remote_mode = 0
if not stat.S_ISREG(remote_mode):
print('Copying %s ...' % fullpath)
remote_mode = sftp.put(fullpath, fullpath, confirm=True).st_mode
if local_stat.st_mode != remote_mode:
print('Setting proper remote mode: %s' % local_stat.st_mode)
sftp.chmod(fullpath, local_stat.st_mode)
sftp.close()
def delete_tree(dir):
assert_valid_dir(dir)
sftp = ssh_conn.open_sftp()
# IOError will get triggered if the path does not exists remotely
try:
if stat.S_ISDIR(sftp.stat(dir).st_mode):
print('Removing remote files')
for fname in sftp.listdir(dir):
fullpath = os.path.join(dir, fname)
remote_stat = sftp.stat(fullpath)
if stat.S_ISREG(remote_stat.st_mode):
print('Removing %s ...' % fullpath)
sftp.remove(fullpath)
print('Removing directory %s ...' % dir)
sftp.rmdir(dir)
sftp.close()
except IOError:
print('No remote directory: %s' % dir)
def setup_tempdir(dir, models, wav, alphabet, lm_binary, trie, binaries):
r'''
Copy models, libs and binary to a directory (new one if dir is None)
'''
if dir is None:
dir = tempfile.mkdtemp(suffix='dsbench')
sorted_models = all_files(models=models)
if binaries is None:
maybe_download_binaries(dir)
else:
print('Using local binaries: %s' % (binaries))
shutil.copy2(binaries, dir)
extract_native_client_tarball(dir)
filenames = map(lambda x: os.path.join(dir, os.path.basename(x)), sorted_models)
missing_models = filter(lambda x: not os.path.isfile(x), filenames)
if len(missing_models) > 0:
# If we have a ZIP file, directly extract it to the proper path
if is_zip_file(models):
print('Extracting %s to %s' % (models[0], dir))
zipfile.ZipFile(models[0]).extractall(path=dir)
print('Extracted %s.' % models[0])
else:
# If one model is missing, let's copy everything again. Be safe.
for f in sorted_models:
print('Copying %s to %s' % (f, dir))
shutil.copy2(f, dir)
for extra_file in [ wav, alphabet, lm_binary, trie ]:
if extra_file and not os.path.isfile(os.path.join(dir, os.path.basename(extra_file))):
print('Copying %s to %s' % (extra_file, dir))
shutil.copy2(extra_file, dir)
if ssh_conn:
copy_tree(dir)
return dir, sorted_models
def teardown_tempdir(dir):
r'''
Cleanup temporary directory.
'''
if ssh_conn:
delete_tree(dir)
assert_valid_dir(dir)
shutil.rmtree(dir)
def get_sshconfig():
r'''
Read user's SSH configuration file
'''
with open(os.path.expanduser('~/.ssh/config')) as f:
cfg = paramiko.SSHConfig()
cfg.parse(f)
ret_dict = {}
for d in cfg._config:
_copy = dict(d)
# Avoid buggy behavior with strange host definitions, we need
# Hostname and not Host.
del _copy['host']
for host in d['host']:
ret_dict[host] = _copy['config']
return ret_dict
def establish_ssh(target=None, auto_trust=False, allow_agent=True, look_keys=True):
r'''
Establish a SSH connection to a remote host. It should be able to use
SSH's config file Host name declarations. By default, will not automatically
add trust for hosts, will use SSH agent and will try to load keys.
'''
def password_prompt(username, hostname):
r'''
If the Host is relying on password authentication, lets ask it.
Relying on SSH itself to take care of that would not work when the
remote authentication is password behind a SSH-key+2FA jumphost.
'''
return getpass.getpass('No SSH key for %s@%s, please provide password: ' % (username, hostname))
ssh_conn = None
if target is not None:
ssh_conf = get_sshconfig()
cfg = {
'hostname': None,
'port': 22,
'allow_agent': allow_agent,
'look_for_keys': look_keys
}
if ssh_conf.has_key(target):
user_config = ssh_conf.get(target)
# If ssh_config file's Host defined 'User' instead of 'Username'
if user_config.has_key('user') and not user_config.has_key('username'):
user_config['username'] = user_config['user']
del user_config['user']
for k in ('username', 'hostname', 'port'):
if k in user_config:
cfg[k] = user_config[k]
# Assume Password auth. If we don't do that, then when connecting
# through a jumphost we will run into issues and the user will
# not be able to input his password to the SSH prompt.
if 'identityfile' in user_config:
cfg['key_filename'] = user_config['identityfile']
else:
cfg['password'] = password_prompt(cfg['username'], cfg['hostname'] or target)
# Should be the last one, since ProxyCommand will issue connection to remote host
if 'proxycommand' in user_config:
cfg['sock'] = paramiko.ProxyCommand(user_config['proxycommand'])
else:
cfg['username'] = target.split('@')[0]
cfg['hostname'] = target.split('@')[1].split(':')[0]
cfg['password'] = password_prompt(cfg['username'], cfg['hostname'])
try:
cfg['port'] = int(target.split('@')[1].split(':')[1])
except IndexError:
# IndexError will happen if no :PORT is there.
# Default value 22 is defined above in 'cfg'.
pass
ssh_conn = paramiko.SSHClient()
if auto_trust:
ssh_conn.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh_conn.connect(**cfg)
return ssh_conn
def run_benchmarks(dir, models, wav, alphabet, lm_binary=None, trie=None, iters=-1):
r'''
Core of the running of the benchmarks. We will run on all of models, against
the WAV file provided as wav, and the provided alphabet.
'''
assert_valid_dir(dir)
inference_times = [ ]
for model in models:
model_filename = model
current_model = {
'name': model,
'iters': [ ],
'mean': numpy.infty,
'stddev': numpy.infty
}
if lm_binary and trie:
cmdline = './deepspeech --model "%s" --alphabet "%s" --lm "%s" --trie "%s" --audio "%s" -t' % (model_filename, alphabet, lm_binary, trie, wav)
else:
cmdline = './deepspeech --model "%s" --alphabet "%s" --audio "%s" -t' % (model_filename, alphabet, wav)
for it in range(iters):
sys.stdout.write('\rRunning %s: %d/%d' % (os.path.basename(model), (it+1), iters))
sys.stdout.flush()
rc, stdout, stderr = exec_command(cmdline, cwd=dir)
if rc == 0:
inference_time = float(stdout.split('\n')[1].split('=')[-1])
# print("[%d] model=%s inference=%f" % (it, model, inference_time))
current_model['iters'].append(inference_time)
else:
print('exec_command("%s") failed with rc=%d' % (cmdline, rc))
print('stdout: %s' % stdout)
print('stderr: %s' % stderr)
raise AssertionError('Execution failure: rc=%d' % (rc))
sys.stdout.write('\n')
sys.stdout.flush()
current_model['mean'] = numpy.mean(current_model['iters'])
current_model['stddev'] = numpy.std(current_model['iters'])
inference_times.append(current_model)
return inference_times
def produce_csv(input, output):
r'''
Take an input dictionary and write it to the object-file output.
'''
output.write('"model","mean","std"\n')
for model_data in input:
output.write('"%s",%f,%f\n' % (model_data['name'], model_data['mean'], model_data['stddev']))
output.flush()
output.close()
print("Wrote as %s" % output.name)
def handle_args():
parser = argparse.ArgumentParser(description='Benchmarking tooling for DeepSpeech native_client.')
parser.add_argument('--target', required=False,
help='SSH user:pass@host string for remote benchmarking. This can also be a name of a matching \'Host\' in your SSH config.')
parser.add_argument('--autotrust', action='store_true', default=False,
help='SSH Paramiko policy to automatically trust unknown keys.')
parser.add_argument('--allowagent', action='store_true', dest='allowagent',
help='Allow the use of a SSH agent.')
parser.add_argument('--no-allowagent', action='store_false', dest='allowagent',
help='Disallow the use of a SSH agent.')
parser.add_argument('--lookforkeys', action='store_true', dest='lookforkeys',
help='Allow to look for SSH keys in ~/.ssh/.')
parser.add_argument('--no-lookforkeys', action='store_false', dest='lookforkeys',
help='Disallow to look for SSH keys in ~/.ssh/.')
parser.add_argument('--dir', required=False, default=None,
help='Local directory where to copy stuff. This will be mirrored to the remote system if needed (make sure to use path that will work on both).')
parser.add_argument('--models', nargs='+', required=False,
help='List of files (protocolbuffer) to work on. Might be a zip file.')
parser.add_argument('--wav', required=False,
help='WAV file to pass to native_client. Supply again in plotting mode to draw realtime line.')
parser.add_argument('--alphabet', required=False,
help='Text file to pass to native_client for the alphabet.')
parser.add_argument('--lm_binary', required=False,
help='Path to the LM binary file used by the decoder.')
parser.add_argument('--trie', required=False,
help='Path to the trie file used by the decoder.')
parser.add_argument('--iters', type=int, required=False, default=5,
help='How many iterations to perform on each model.')
parser.add_argument('--keep', required=False, action='store_true',
help='Keeping run files (binaries & models).')
parser.add_argument('--csv', type=argparse.FileType('w'), required=False,
help='Target CSV file where to dump data.')
parser.add_argument('--binaries', required=False, default=None,
help='Specify non TaskCluster native_client.tar.xz to use')
return parser.parse_args()
def do_main():
cli_args = handle_args()
if not cli_args.models or not cli_args.wav or not cli_args.alphabet:
raise AssertionError('Missing arguments (models, wav or alphabet)')
if cli_args.dir is not None and not os.path.isdir(cli_args.dir):
raise AssertionError('Inexistent temp directory')
if cli_args.binaries is not None and cli_args.binaries.find('native_client.tar.xz') == -1:
raise AssertionError('Local binaries must be bundled in a native_client.tar.xz file')
global ssh_conn
ssh_conn = establish_ssh(target=cli_args.target, auto_trust=cli_args.autotrust, allow_agent=cli_args.allowagent, look_keys=cli_args.lookforkeys)
tempdir, sorted_models = setup_tempdir(dir=cli_args.dir, models=cli_args.models, wav=cli_args.wav, alphabet=cli_args.alphabet, lm_binary=cli_args.lm_binary, trie=cli_args.trie, binaries=cli_args.binaries)
dest_sorted_models = list(map(lambda x: os.path.join(tempdir, os.path.basename(x)), sorted_models))
dest_wav = os.path.join(tempdir, os.path.basename(cli_args.wav))
dest_alphabet = os.path.join(tempdir, os.path.basename(cli_args.alphabet))
if cli_args.lm_binary and cli_args.trie:
dest_lm_binary = os.path.join(tempdir, os.path.basename(cli_args.lm_binary))
dest_trie = os.path.join(tempdir, os.path.basename(cli_args.trie))
inference_times = run_benchmarks(dir=tempdir, models=dest_sorted_models, wav=dest_wav, alphabet=dest_alphabet, lm_binary=dest_lm_binary, trie=dest_trie, iters=cli_args.iters)
else:
inference_times = run_benchmarks(dir=tempdir, models=dest_sorted_models, wav=dest_wav, alphabet=dest_alphabet, iters=cli_args.iters)
if cli_args.csv:
produce_csv(input=inference_times, output=cli_args.csv)
if not cli_args.keep:
teardown_tempdir(dir=tempdir)
if __name__ == '__main__' :
do_main()


@ -1,146 +0,0 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function
import os
import sys
# To use util.tc
sys.path.append(os.path.abspath(os.path.dirname(os.path.dirname(sys.argv[0]))))
import util.taskcluster as tcu
from util.benchmark import keep_only_digits
import argparse
import numpy
import matplotlib.pyplot as plt
import scipy.stats as scipy_stats
import scipy.io.wavfile as wav
import csv
import getpass
from six import iteritems
from six.moves import range, map
r'''
Tool to:
- ingest CSV file produced by benchmark_nc and produce nice plots
'''
def reduce_filename(f):
r'''
Expects something like /tmp/tmpAjry4Gdsbench/test.weights.e5.XXX.YYY.pb
Where XXX is a variation on the model size for example
And where YYY is a const related to the training dataset
'''
f = os.path.basename(f).split('.')
return keep_only_digits(f[-3])
def ingest_csv(datasets=None, range=None):
existing_files = filter(lambda x: os.path.isfile(x[1]), datasets)
assert len(datasets) == len(existing_files)
if range:
range = map(int, range.split(','))
data = {}
for (dsname, dsfile) in datasets:
print('Reading %s from %s' % (dsname, dsfile))
with open(dsfile) as f:
d = csv.DictReader(f)
data[dsname] = []
for e in d:
if range:
re = reduce_filename(e['model'])
in_range = (re >= range[0] and re <= range[1])
if in_range:
data[dsname].append(e)
else:
data[dsname].append(e)
return data
def produce_plot(input=None, output=None):
x = range(len(input))
xlabels = list(map(lambda a: a['name'], input))
y = list(map(lambda a: a['mean'], input))
yerr = list(map(lambda a: a['stddev'], input))
print('y=', y)
print('yerr=', yerr)
plt.errorbar(x, y, yerr=yerr)
plt.show()
print("Wrote as %s" % output.name)
def produce_plot_multiseries(input=None, output=None, title=None, size=None, fig_dpi=None, source_wav=None):
fig, ax = plt.subplots()
# float() required because size.split()[] is a string
fig.set_figwidth(float(size.split('x')[0]) / fig_dpi)
fig.set_figheight(float(size.split('x')[1]) / fig_dpi)
nb_items = len(input[input.keys()[0]])
x_all = list(range(nb_items))
for serie, serie_values in iteritems(input):
xtics = list(map(lambda a: reduce_filename(a['model']), serie_values))
y = list(map(lambda a: float(a['mean']), serie_values))
yerr = list(map(lambda a: float(a['std']), serie_values))
linreg = scipy_stats.linregress(x_all, y)
ylin = linreg.intercept + linreg.slope * numpy.asarray(x_all)
ax.errorbar(x_all, y, yerr=yerr, label=('%s' % serie), fmt='-', capsize=4, elinewidth=1)
ax.plot(x_all, ylin, label=('%s ~= %0.4f*x+%0.4f (R=%0.4f)' % (serie, linreg.slope, linreg.intercept, linreg.rvalue)))
plt.xticks(x_all, xtics, rotation=60)
if source_wav:
audio = wav.read(source_wav)
print('Adding realtime')
for rt_factor in [ 0.5, 1.0, 1.5, 2.0 ]:
rt_secs = len(audio[1]) / audio[0] * rt_factor
y_rt = numpy.repeat(rt_secs, nb_items)
ax.plot(x_all, y_rt, label=('Realtime: %0.4f secs [%0.1f]' % (rt_secs, rt_factor)))
ax.set_title(title)
ax.set_xlabel('Model size')
ax.set_ylabel('Execution time (s)')
legend = ax.legend(loc='best')
plot_format = os.path.splitext(output.name)[-1].split('.')[-1]
plt.grid()
plt.tight_layout()
plt.savefig(output, transparent=False, frameon=True, dpi=fig_dpi, format=plot_format)
def handle_args():
parser = argparse.ArgumentParser(description='Benchmarking tooling for DeepSpeech native_client.')
parser.add_argument('--wav', required=False,
help='WAV file to pass to native_client. Supply again in plotting mode to draw realtime line.')
parser.add_argument('--dataset', action='append', nargs=2, metavar=('name','source'),
help='Include dataset NAME from file SOURCE. Repeat the option to add more datasets.')
parser.add_argument('--title', default=None, help='Title of the plot.')
parser.add_argument('--plot', type=argparse.FileType('w'), required=False,
help='Target file where to plot data. Format will be deduced from extension.')
parser.add_argument('--size', default='800x600',
help='Size (px) of the resulting plot.')
parser.add_argument('--dpi', type=int, default=96,
help='Set plot DPI.')
parser.add_argument('--range', default=None,
help='Range of model size to use. Comma-separated string of boundaries: min,max')
return parser.parse_args()
def do_main():
cli_args = handle_args()
if not cli_args.dataset or not cli_args.plot:
raise AssertionError('Missing arguments (dataset or target file)')
# This is required to avoid errors about missing DISPLAY env var
plt.switch_backend('agg')
all_inference_times = ingest_csv(datasets=cli_args.dataset, range=cli_args.range)
if cli_args.plot:
produce_plot_multiseries(input=all_inference_times, output=cli_args.plot, title=cli_args.title, size=cli_args.size, fig_dpi=cli_args.dpi, source_wav=cli_args.wav)
if __name__ == '__main__' :
do_main()

bin/compare_samples.py Executable file

@ -0,0 +1,85 @@
#!/usr/bin/env python
"""
Tool for comparing two wav samples
"""
import argparse
import sys
import numpy as np
from coqui_stt_training.util.audio import AUDIO_TYPE_NP, mean_dbfs
from coqui_stt_training.util.sample_collections import load_sample
def fail(message):
print(message, file=sys.stderr, flush=True)
sys.exit(1)
def compare_samples():
sample1 = load_sample(CLI_ARGS.sample1).unpack()
sample2 = load_sample(CLI_ARGS.sample2).unpack()
if sample1.audio_format != sample2.audio_format:
fail(
"Samples differ on: audio-format ({} and {})".format(
sample1.audio_format, sample2.audio_format
)
)
if abs(sample1.duration - sample2.duration) > 0.001:
fail(
"Samples differ on: duration ({} and {})".format(
sample1.duration, sample2.duration
)
)
sample1.change_audio_type(AUDIO_TYPE_NP)
sample2.change_audio_type(AUDIO_TYPE_NP)
samples = [sample1, sample2]
largest = np.argmax([sample1.audio.shape[0], sample2.audio.shape[0]])
smallest = (largest + 1) % 2
samples[largest].audio = samples[largest].audio[: len(samples[smallest].audio)]
audio_diff = samples[largest].audio - samples[smallest].audio
diff_dbfs = mean_dbfs(audio_diff)
differ_msg = "Samples differ on: sample data ({:0.2f} dB difference) ".format(
diff_dbfs
)
equal_msg = "Samples are considered equal ({:0.2f} dB difference)".format(diff_dbfs)
if CLI_ARGS.if_differ:
if diff_dbfs <= CLI_ARGS.threshold:
fail(equal_msg)
if not CLI_ARGS.no_success_output:
print(differ_msg, file=sys.stderr, flush=True)
else:
if diff_dbfs > CLI_ARGS.threshold:
fail(differ_msg)
if not CLI_ARGS.no_success_output:
print(equal_msg, file=sys.stderr, flush=True)
def handle_args():
parser = argparse.ArgumentParser(
description="Tool for checking similarity of two samples"
)
parser.add_argument("sample1", help="Filename of sample 1 to compare")
parser.add_argument("sample2", help="Filename of sample 2 to compare")
parser.add_argument(
"--threshold",
type=float,
default=-60.0,
help="dB of sample deltas above which they are considered different",
)
parser.add_argument(
"--if-differ",
action="store_true",
help="If to succeed and return status code 0 on different signals and fail on equal ones (inverse check)."
"This will still fail on different formats or durations.",
)
parser.add_argument(
"--no-success-output",
action="store_true",
help="Stay silent on success (if samples are equal of - with --if-differ - samples are not equal)",
)
return parser.parse_args()
if __name__ == "__main__":
CLI_ARGS = handle_args()
compare_samples()

bin/data_set_tool.py Executable file

@ -0,0 +1,136 @@
#!/usr/bin/env python
"""
Tool for building a combined SDB or CSV sample-set from other sets
Use 'python3 data_set_tool.py -h' for help
"""
import argparse
import sys
from pathlib import Path
import progressbar
from coqui_stt_training.util.audio import (
AUDIO_TYPE_OPUS,
AUDIO_TYPE_PCM,
AUDIO_TYPE_WAV,
change_audio_types,
)
from coqui_stt_training.util.augmentations import (
SampleAugmentation,
apply_sample_augmentations,
parse_augmentations,
)
from coqui_stt_training.util.downloader import SIMPLE_BAR
from coqui_stt_training.util.sample_collections import (
CSVWriter,
DirectSDBWriter,
TarWriter,
samples_from_sources,
)
AUDIO_TYPE_LOOKUP = {"wav": AUDIO_TYPE_WAV, "opus": AUDIO_TYPE_OPUS}
def build_data_set():
audio_type = AUDIO_TYPE_LOOKUP[CLI_ARGS.audio_type]
augmentations = parse_augmentations(CLI_ARGS.augment)
print(f"Parsed augmentations from flags: {augmentations}")
if any(not isinstance(a, SampleAugmentation) for a in augmentations):
print(
"Warning: Some of the specified augmentations will not get applied, as this tool only supports "
"overlay, codec, reverb, resample and volume."
)
extension = Path(CLI_ARGS.target).suffix.lower()
labeled = not CLI_ARGS.unlabeled
if extension == ".csv":
writer = CSVWriter(
CLI_ARGS.target, absolute_paths=CLI_ARGS.absolute_paths, labeled=labeled
)
elif extension == ".sdb":
writer = DirectSDBWriter(
CLI_ARGS.target, audio_type=audio_type, labeled=labeled
)
elif extension == ".tar":
writer = TarWriter(
CLI_ARGS.target, labeled=labeled, gz=False, include=CLI_ARGS.include
)
elif extension == ".tgz" or CLI_ARGS.target.lower().endswith(".tar.gz"):
writer = TarWriter(
CLI_ARGS.target, labeled=labeled, gz=True, include=CLI_ARGS.include
)
else:
print(
"Unknown extension of target file - has to be either .csv, .sdb, .tar, .tar.gz or .tgz"
)
sys.exit(1)
with writer:
samples = samples_from_sources(CLI_ARGS.sources, labeled=not CLI_ARGS.unlabeled)
num_samples = len(samples)
if augmentations:
samples = apply_sample_augmentations(
samples, audio_type=AUDIO_TYPE_PCM, augmentations=augmentations
)
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
for sample in bar(
change_audio_types(
samples,
audio_type=audio_type,
bitrate=CLI_ARGS.bitrate,
processes=CLI_ARGS.workers,
)
):
writer.add(sample)
def handle_args():
parser = argparse.ArgumentParser(
description="Tool for building a combined SDB or CSV sample-set from other sets"
)
parser.add_argument(
"sources",
nargs="+",
help="Source CSV and/or SDB files - "
"Note: For getting a correctly ordered target set, source SDBs have to have their samples "
"already ordered from shortest to longest.",
)
parser.add_argument("target", help="SDB, CSV or TAR(.gz) file to create")
parser.add_argument(
"--audio-type",
default="opus",
choices=AUDIO_TYPE_LOOKUP.keys(),
help="Audio representation inside target SDB",
)
parser.add_argument(
"--bitrate",
type=int,
help="Bitrate for lossy compressed SDB samples like in case of --audio-type opus",
)
parser.add_argument(
"--workers", type=int, default=None, help="Number of encoding SDB workers"
)
parser.add_argument(
"--unlabeled",
action="store_true",
help="If to build an data-set with unlabeled (audio only) samples - "
"typically used for building noise augmentation corpora",
)
parser.add_argument(
"--absolute-paths",
action="store_true",
help="If to reference samples by their absolute paths when writing CSV files",
)
parser.add_argument(
"--augment",
action="append",
help="Add an augmentation operation",
)
parser.add_argument(
"--include",
action="append",
help="Adds a file to the root directory of .tar(.gz) targets",
)
return parser.parse_args()
if __name__ == "__main__":
CLI_ARGS = handle_args()
build_data_set()


@ -1,11 +0,0 @@
#!/usr/bin/env python
import sys
import os
sys.path.append(os.path.abspath('.'))
from util.gpu_usage import GPUUsage
gu = GPUUsage()
gu.start()


@ -1,10 +0,0 @@
#!/usr/bin/env python
import sys
import os
sys.path.append(os.path.abspath('.'))
from util.gpu_usage import GPUUsageChart
GPUUsageChart(sys.argv[1], sys.argv[2])


@ -1,14 +1,21 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import sys
# Load and export as string
with tf.gfile.FastGFile(sys.argv[1], 'rb') as fin:
graph_def = tf.GraphDef()
graph_def.ParseFromString(fin.read())
import tensorflow.compat.v1 as tfv1
from google.protobuf import text_format
with tf.gfile.FastGFile(sys.argv[1] + 'txt', 'w') as fout:
from google.protobuf import text_format
fout.write(text_format.MessageToString(graph_def))
def main():
# Load and export as string
with tfv1.gfile.FastGFile(sys.argv[1], "rb") as fin:
graph_def = tfv1.GraphDef()
graph_def.ParseFromString(fin.read())
with tfv1.gfile.FastGFile(sys.argv[1] + "txt", "w") as fout:
fout.write(text_format.MessageToString(graph_def))
if __name__ == "__main__":
main()

bin/import_aidatatang.py Executable file

@ -0,0 +1,97 @@
#!/usr/bin/env python
import glob
import os
import tarfile
import pandas
from coqui_stt_training.util.importers import get_importers_parser
COLUMN_NAMES = ["wav_filename", "wav_filesize", "transcript"]
def extract(archive_path, target_dir):
print("Extracting {} into {}...".format(archive_path, target_dir))
with tarfile.open(archive_path) as tar:
tar.extractall(target_dir)
def preprocess_data(tgz_file, target_dir):
# First extract main archive and sub-archives
extract(tgz_file, target_dir)
main_folder = os.path.join(target_dir, "aidatatang_200zh")
for targz in glob.glob(os.path.join(main_folder, "corpus", "*", "*.tar.gz")):
extract(targz, os.path.dirname(targz))
# Folder structure is now:
# - aidatatang_200zh/
# - transcript/aidatatang_200_zh_transcript.txt
# - corpus/train/*.tar.gz
# - corpus/train/*/*.{wav,txt,trn,metadata}
# - corpus/dev/*.tar.gz
# - corpus/dev/*/*.{wav,txt,trn,metadata}
# - corpus/test/*.tar.gz
# - corpus/test/*/*.{wav,txt,trn,metadata}
# Transcripts file has one line per WAV file, where each line consists of
# the WAV file name without extension followed by a single space followed
# by the transcript.
# Since the transcripts themselves can contain spaces, we split on space but
# only once, then build a mapping from file name to transcript
transcripts_path = os.path.join(
main_folder, "transcript", "aidatatang_200_zh_transcript.txt"
)
with open(transcripts_path) as fin:
transcripts = dict((line.split(" ", maxsplit=1) for line in fin))
def load_set(glob_path):
set_files = []
for wav in glob.glob(glob_path):
try:
wav_filename = wav
wav_filesize = os.path.getsize(wav)
transcript_key = os.path.splitext(os.path.basename(wav))[0]
transcript = transcripts[transcript_key].strip("\n")
set_files.append((wav_filename, wav_filesize, transcript))
except KeyError:
print("Warning: Missing transcript for WAV file {}.".format(wav))
return set_files
for subset in ("train", "dev", "test"):
print("Loading {} set samples...".format(subset))
subset_files = load_set(
os.path.join(main_folder, "corpus", subset, "*", "*.wav")
)
df = pandas.DataFrame(data=subset_files, columns=COLUMN_NAMES)
# Trim train set to under 10s by removing the last couple hundred samples
if subset == "train":
durations = (df["wav_filesize"] - 44) / 16000 / 2
df = df[durations <= 10.0]
print("Trimming {} samples > 10 seconds".format((durations > 10.0).sum()))
dest_csv = os.path.join(target_dir, "aidatatang_{}.csv".format(subset))
print("Saving {} set into {}...".format(subset, dest_csv))
df.to_csv(dest_csv, index=False)
def main():
# https://www.openslr.org/62/
parser = get_importers_parser(description="Import aidatatang_200zh corpus")
parser.add_argument("tgz_file", help="Path to aidatatang_200zh.tgz")
parser.add_argument(
"--target_dir",
default="",
help="Target folder to extract files into and put the resulting CSVs. Defaults to same folder as the main archive.",
)
params = parser.parse_args()
if not params.target_dir:
params.target_dir = os.path.dirname(params.tgz_file)
preprocess_data(params.tgz_file, params.target_dir)
if __name__ == "__main__":
main()

bin/import_aishell.py Executable file

@ -0,0 +1,94 @@
#!/usr/bin/env python
import glob
import os
import tarfile
import pandas
from coqui_stt_training.util.importers import get_importers_parser
COLUMNNAMES = ["wav_filename", "wav_filesize", "transcript"]
def extract(archive_path, target_dir):
print("Extracting {} into {}...".format(archive_path, target_dir))
with tarfile.open(archive_path) as tar:
tar.extractall(target_dir)
def preprocess_data(tgz_file, target_dir):
# First extract main archive and sub-archives
extract(tgz_file, target_dir)
main_folder = os.path.join(target_dir, "data_aishell")
wav_archives_folder = os.path.join(main_folder, "wav")
for targz in glob.glob(os.path.join(wav_archives_folder, "*.tar.gz")):
extract(targz, main_folder)
# Folder structure is now:
# - data_aishell/
# - train/S****/*.wav
# - dev/S****/*.wav
# - test/S****/*.wav
# - wav/S****.tar.gz
# - transcript/aishell_transcript_v0.8.txt
# Transcripts file has one line per WAV file, where each line consists of
# the WAV file name without extension followed by a single space followed
# by the transcript.
# Since the transcripts themselves can contain spaces, we split on space but
# only once, then build a mapping from file name to transcript
transcripts_path = os.path.join(
main_folder, "transcript", "aishell_transcript_v0.8.txt"
)
with open(transcripts_path) as fin:
transcripts = dict((line.split(" ", maxsplit=1) for line in fin))
def load_set(glob_path):
set_files = []
for wav in glob.glob(glob_path):
try:
wav_filename = wav
wav_filesize = os.path.getsize(wav)
transcript_key = os.path.splitext(os.path.basename(wav))[0]
transcript = transcripts[transcript_key].strip("\n")
set_files.append((wav_filename, wav_filesize, transcript))
except KeyError:
print("Warning: Missing transcript for WAV file {}.".format(wav))
return set_files
for subset in ("train", "dev", "test"):
print("Loading {} set samples...".format(subset))
subset_files = load_set(os.path.join(main_folder, subset, "S*", "*.wav"))
df = pandas.DataFrame(data=subset_files, columns=COLUMNNAMES)
# Trim the train set to clips of at most 10 seconds (duration estimated from the WAV file size)
if subset == "train":
durations = (df["wav_filesize"] - 44) / 16000 / 2
df = df[durations <= 10.0]
print("Trimming {} samples > 10 seconds".format((durations > 10.0).sum()))
dest_csv = os.path.join(target_dir, "aishell_{}.csv".format(subset))
print("Saving {} set into {}...".format(subset, dest_csv))
df.to_csv(dest_csv, index=False)
def main():
# http://www.openslr.org/33/
parser = get_importers_parser(description="Import AISHELL corpus")
parser.add_argument("aishell_tgz_file", help="Path to data_aishell.tgz")
parser.add_argument(
"--target_dir",
default="",
help="Target folder to extract files into and put the resulting CSVs. Defaults to same folder as the main archive.",
)
params = parser.parse_args()
if not params.target_dir:
params.target_dir = os.path.dirname(params.aishell_tgz_file)
preprocess_data(params.aishell_tgz_file, params.target_dir)
if __name__ == "__main__":
main()

750
bin/import_ccpmf.py Executable file
View File

@ -0,0 +1,750 @@
#!/usr/bin/env python
"""
Importer for dataset published from Centre de Conférence Pierre Mendès-France
Ministère de l'Économie, des Finances et de la Relance
"""
import csv
import decimal
import hashlib
import math
import os
import re
import subprocess
import sys
import unicodedata
import xml.etree.ElementTree as ET
import zipfile
from glob import glob
from multiprocessing import Pool
import progressbar
import sox
try:
from num2words import num2words
except ImportError as ex:
print("pip install num2words")
sys.exit(1)
import json
import requests
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.helpers import secs_to_hours
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
get_importers_parser,
get_validate_label,
print_import_report,
)
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
CHANNELS = 1
BIT_DEPTH = 16
MAX_SECS = 10
MIN_SECS = 0.85
DATASET_RELEASE_CSV = "https://data.economie.gouv.fr/explore/dataset/transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_for_header=true&csv_separator=%3B"
DATASET_RELEASE_SHA = [
(
"863d39a06a388c6491c6ff2f6450b151f38f1b57",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.001",
),
(
"2f3a0305aa04c61220bb00b5a4e553e45dbf12e1",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.002",
),
(
"5e55e9f1f844097349188ac875947e5a3d7fe9f1",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.003",
),
(
"8bf54842cf07948ca5915e27a8bd5fa5139c06ae",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.004",
),
(
"c8963504aadc015ac48f9af80058a0bb3440b94f",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.005",
),
(
"d95e225e908621d83ce4e9795fd108d9d310e244",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.006",
),
(
"de6ed9c2b0ee80ca879aae8ba7923cc93217d811",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.007",
),
(
"234283c47dacfcd4450d836c52c25f3e807fc5f2",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.008",
),
(
"4e6b67a688639bb72f8cd81782eaba604a8d32a6",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.009",
),
(
"4165a51389777c8af8e6253d87bdacb877e8b3b0",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.010",
),
(
"34322e7009780d97ef5bd02bf2f2c7a31f00baff",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.011",
),
(
"48c5be3b2ca9d6108d525da6a03e91d93a95dbac",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.012",
),
(
"87573172f506a189c2ebc633856fe11a2e9cd213",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.013",
),
(
"6ab2c9e508e9278d5129f023e018725c4a7c69e8",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.014",
),
(
"4f84df831ef46dce5d3ab3e21817687a2d8c12d0",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.015",
),
(
"e69bfb079885c299cb81080ef88b1b8b57158aa6",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.016",
),
(
"5f764ba788ee273981cf211b242c29b49ca22c5e",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.017",
),
(
"b6aa81a959525363223494830c1e7307d4c4bae6",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.018",
),
(
"91ddcf43c7bf113a6f2528b857c7ec22a50a148a",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.019",
),
(
"fa1b29273dd77b9a7494983a2f9ae52654b931d7",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.020",
),
(
"1113aef4f5e2be2f7fbf2d54b6c710c1c0e7135f",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.021",
),
(
"ce6420d5d0b6b5135ba559f83e1a82d4d615c470",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.022",
),
(
"d0976ed292ac24fcf1590d1ea195077c74b05471",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.023",
),
(
"ec746cd6af066f62d9bf8d3b2f89174783ff4e3c",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.024",
),
(
"570d9e1e84178e32fd867171d4b3aaecda1fd4fb",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.025",
),
(
"c29ccc7467a75b2cae3d7f2e9fbbb2ab276cb8ac",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.026",
),
(
"08406a51146d88e208704ce058c060a1e44efa50",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.027",
),
(
"199aedad733a78ea1e7d47def9c71c6fd5795e02",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.028",
),
(
"db856a068f92fb4f01f410bba42c7271de0f231a",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.029",
),
(
"e3c0135f16c6c9d25a09dcb4f99a685438a84740",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.030",
),
(
"e51b8bb9c0ae4339f98b4f21e6d29b825109f0ac",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.031",
),
(
"be5e80cbc49b59b31ae33c30576ef0e1a162d84e",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.032",
),
(
"501df58e3ff55fcfd75b93dab57566dc536948b8",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.033",
),
(
"1a114875811a8cdcb8d85a9f6dbee78be3e05131",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.034",
),
(
"465d824e7ee46448369182c0c28646d155a2249b",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.035",
),
(
"37f341b1b266d143eb73138c31cfff3201b9d619",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.036",
),
(
"9e7d8255987a8a77a90e0d4b55c8fd38b9fb5694",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.037",
),
(
"54886755630cb080a53098cb1b6c951c6714a143",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.038",
),
(
"4b7cbb0154697be795034f7a49712e882a97197a",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.039",
),
(
"c8e1e565a0e7a1f6ff1dbfcefe677aa74a41d2f2",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip.040",
),
]
def _download_and_preprocess_data(csv_url, target_dir):
dataset_sources = os.path.join(
target_dir, "transcriptionsXML_audioMP3_MEFR_CCPMF_2012-2020", "data.txt"
)
if os.path.exists(dataset_sources):
return dataset_sources
# Making path absolute
target_dir = os.path.abspath(target_dir)
csv_ref = requests.get(csv_url).text.split("\r\n")[1:-1]
for part in csv_ref:
part_filename = (
requests.head(part)
.headers.get("Content-Disposition")
.split(" ")[1]
.split("=")[1]
.replace('"', "")
)
if not os.path.exists(os.path.join(target_dir, part_filename)):
part_path = maybe_download(part_filename, target_dir, part)
def _big_sha1(fname):
s = hashlib.sha1()
buffer_size = 65536
with open(fname, "rb") as f:
while True:
data = f.read(buffer_size)
if not data:
break
s.update(data)
return s.hexdigest()
for (sha1, filename) in DATASET_RELEASE_SHA:
print("Checking {} SHA1:".format(filename))
csum = _big_sha1(os.path.join(target_dir, filename))
if csum == sha1:
print("\t{}: OK {}".format(filename, sha1))
else:
print("\t{}: ERROR: expected {}, computed {}".format(filename, sha1, csum))
assert csum == sha1
# Conditionally extract data
_maybe_extract(
target_dir,
"transcriptionsXML_audioMP3_MEFR_CCPMF_2012-2020",
"transcriptionsxml_audiomp3_mefr_ccpmf_2012-2020_2.zip",
"transcriptionsXML_audioMP3_MEFR_CCPMF_2012-2020.zip",
)
# Produce source text for extraction / conversion
return _maybe_create_sources(
os.path.join(target_dir, "transcriptionsXML_audioMP3_MEFR_CCPMF_2012-2020")
)
def _maybe_extract(target_dir, extracted_data, archive, final):
# If target_dir/extracted_data does not exist, extract archive in target_dir
extracted_path = os.path.join(target_dir, extracted_data)
archive_path = os.path.join(target_dir, archive)
final_archive = os.path.join(extracted_path, final)
if not os.path.exists(extracted_path):
if not os.path.exists(archive_path):
print('No archive "%s" - building ...' % archive_path)
all_zip_parts = glob(archive_path + ".*")
all_zip_parts.sort()
cmdline = "cat {} > {}".format(" ".join(all_zip_parts), archive_path)
print('Building with "%s"' % cmdline)
subprocess.check_call(cmdline, shell=True, cwd=target_dir)
assert os.path.exists(archive_path)
print(
'No directory "%s" - extracting archive %s ...'
% (extracted_path, archive_path)
)
with zipfile.ZipFile(archive_path) as zip_f:
zip_f.extractall(extracted_path)
with zipfile.ZipFile(final_archive) as zip_f:
zip_f.extractall(target_dir)
else:
print('Found directory "%s" - not extracting it from archive.' % extracted_path)
def _maybe_create_sources(dir):
dataset_sources = os.path.join(dir, "data.txt")
MP3 = glob(os.path.join(dir, "**", "*.mp3"))
XML = glob(os.path.join(dir, "**", "*.xml"))
MP3_XML_Scores = []
MP3_XML_Fin = {}
for f_mp3 in MP3:
for f_xml in XML:
b_mp3 = os.path.splitext(os.path.basename(f_mp3))[0]
b_xml = os.path.splitext(os.path.basename(f_xml))[0]
a_mp3 = b_mp3.split("_")
a_xml = b_xml.split("_")
score = 0
date_mp3 = a_mp3[0]
date_xml = a_xml[0]
if date_mp3 != date_xml:
continue
for i in range(min(len(a_mp3), len(a_xml))):
if a_mp3[i] == a_xml[i]:
score += 1
if score >= 1:
MP3_XML_Scores.append((f_mp3, f_xml, score))
# sort by score
MP3_XML_Scores.sort(key=lambda x: x[2], reverse=True)
for s_mp3, s_xml, score in MP3_XML_Scores:
# print(s_mp3, s_xml, score)
if score not in MP3_XML_Fin:
MP3_XML_Fin[score] = {}
if s_mp3 not in MP3_XML_Fin[score]:
try:
MP3.index(s_mp3)
MP3.remove(s_mp3)
MP3_XML_Fin[score][s_mp3] = s_xml
except ValueError as ex:
pass
else:
print("here:", MP3_XML_Fin[score][s_mp3], s_xml, file=sys.stderr)
with open(dataset_sources, "w") as ds:
for score in MP3_XML_Fin:
for mp3 in MP3_XML_Fin[score]:
xml = MP3_XML_Fin[score][mp3]
if os.path.getsize(mp3) > 0 and os.path.getsize(xml) > 0:
mp3 = os.path.relpath(mp3, dir)
xml = os.path.relpath(xml, dir)
ds.write("{},{},{:0.2e}\n".format(xml, mp3, 2.5e-4))
else:
print("Empty file {} or {}".format(mp3, xml), file=sys.stderr)
print("Missing XML pairs:", MP3, file=sys.stderr)
return dataset_sources
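The MP3/XML pairing above scores each candidate pair by counting matching underscore-separated tokens in the two base names (pairs whose leading date token differs are skipped outright), then greedily keeps the highest-scoring XML for each MP3. A small illustration of the scoring, with made-up base names modelled on the file names mentioned in this importer's comments:

# Hypothetical base names; the scoring mirrors the loop in _maybe_create_sources above.
a_mp3 = "20181017_Innovation_wmv_0_fre_minefi".split("_")
a_xml = "20181017_Innovation".split("_")
score = sum(1 for i in range(min(len(a_mp3), len(a_xml))) if a_mp3[i] == a_xml[i])
print(score)  # 2 -> the date token and the "Innovation" token match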
def maybe_normalize_for_digits(label):
# first, try to identify numbers like "50 000", "260 000"
if " " in label:
if any(s.isdigit() for s in label):
thousands = re.compile(r"(\d{1,3}(?:\s*\d{3})*(?:,\d+)?)")
maybe_thousands = thousands.findall(label)
if len(maybe_thousands) > 0:
while True:
(label, r) = re.subn(r"(\d)\s(\d{3})", "\\1\\2", label)
if r == 0:
break
# this might be a time or duration in the form "hh:mm" or "hh:mm:ss"
if ":" in label:
for s in label.split(" "):
if any(i.isdigit() for i in s):
date_or_time = re.compile(r"(\d{1,2}):(\d{2}):?(\d{2})?")
maybe_date_or_time = date_or_time.findall(s)
if len(maybe_date_or_time) > 0:
maybe_hours = maybe_date_or_time[0][0]
maybe_minutes = maybe_date_or_time[0][1]
maybe_seconds = maybe_date_or_time[0][2]
if len(maybe_seconds) > 0:
label = label.replace(
"{}:{}:{}".format(
maybe_hours, maybe_minutes, maybe_seconds
),
"{} heures {} minutes et {} secondes".format(
maybe_hours, maybe_minutes, maybe_seconds
),
)
else:
label = label.replace(
"{}:{}".format(maybe_hours, maybe_minutes),
"{} heures et {} minutes".format(
maybe_hours, maybe_minutes
),
)
new_label = []
# pylint: disable=too-many-nested-blocks
for s in label.split(" "):
if any(i.isdigit() for i in s):
s = s.replace(",", ".") # num2words requires "." for floats
s = s.replace('"', "") # clean some data, num2words would choke on 1959"
last_c = s[-1]
if not last_c.isdigit(): # num2words will choke on "0.6.", "24 ?"
s = s[:-1]
if any(
i.isalpha() for i in s
):  # So we have any(isdigit()) **and** any(isalpha()), like "3D"
ns = []
for c in s:
nc = c
if c.isdigit(): # convert "3" to "trois-"
try:
nc = num2words(c, lang="fr") + "-"
except decimal.InvalidOperation as ex:
print("decimal.InvalidOperation: '{}'".format(s))
raise ex
ns.append(nc)
s = "".join(ns)  # join the digit-converted characters back into a token
else:
try:
s = num2words(s, lang="fr")
except decimal.InvalidOperation as ex:
print("decimal.InvalidOperation: '{}'".format(s))
raise ex
new_label.append(s)
return " ".join(new_label)
def maybe_normalize_for_specials_chars(label):
label = label.replace("%", "pourcents")
label = label.replace("/", ", ") # clean intervals like 2019/2022 to "2019 2022"
label = label.replace("-", ", ") # clean intervals like 70-80 to "70 80"
label = label.replace("+", " plus ") # clean + and make it speakable
label = label.replace("€", " euros ")  # clean euro symbol and make it speakable
label = label.replace(
"., ", ", "
) # clean some strange "4.0., " (20181017_Innovation.xml)
label = label.replace(
"°", " degré "
) # clean some strange "°5" (20181210_EtatsGeneraux-1000_fre_750_und.xml)
label = label.replace("...", ".") # remove ellipsis
label = label.replace("..", ".") # remove broken ellipsis
label = label.replace(
"m²", "mètre-carrés"
)  # 20150616_Defi_Climat_3_wmv_0_fre_minefi.xml
label = label.replace(
"[end]", ""
) # broken tag in 20150123_Entretiens_Tresor_PGM_wmv_0_fre_minefi.xml
label = label.replace(
u"\xB8c", " ç"
) # strange cedilla in 20150417_Printemps_Economie_2_wmv_0_fre_minefi.xml
label = label.replace(
"C0²", "CO 2"
) # 20121016_Syteme_sante_copie_wmv_0_fre_minefi.xml
return label
def maybe_normalize_for_anglicisms(label):
label = label.replace("B2B", "B to B")
label = label.replace("B2C", "B to C")
label = label.replace("#", "hashtag ")
label = label.replace("@", "at ")
return label
def maybe_normalize(label):
label = maybe_normalize_for_specials_chars(label)
label = maybe_normalize_for_anglicisms(label)
label = maybe_normalize_for_digits(label)
return label
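Chained together, the three passes above turn a raw label into something speakable: special characters are spelled out, a couple of anglicisms are rewritten, and digit groups and hh:mm times are expanded into French words via num2words. A hedged usage sketch; the sentence is invented and the exact output depends on the installed num2words version:

# Invented label, run after the maybe_normalize_* functions above are defined.
print(maybe_normalize("Le budget 2019/2022 est de 50 000 euros, soit +3 %"))
# Expected shape: the "/" interval is split, "+" and "%" become words, "50 000" is
# collapsed to 50000, and both years and the amount are written out by num2words.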
def one_sample(sample):
file_size = -1
frames = 0
audio_source = sample[0]
target_dir = sample[1]
dataset_basename = sample[2]
start_time = sample[3]
duration = sample[4]
label = label_filter_fun(sample[5])
sample_id = sample[6]
_wav_filename = os.path.basename(
audio_source.replace(".wav", "_{:06}.wav".format(sample_id))
)
wav_fullname = os.path.join(target_dir, dataset_basename, _wav_filename)
if not os.path.exists(wav_fullname):
subprocess.check_output(
[
"ffmpeg",
"-i",
audio_source,
"-ss",
str(start_time),
"-t",
str(duration),
"-c",
"copy",
wav_fullname,
],
stdin=subprocess.DEVNULL,
stderr=subprocess.STDOUT,
)
file_size = os.path.getsize(wav_fullname)
frames = int(
subprocess.check_output(["soxi", "-s", wav_fullname], stderr=subprocess.STDOUT)
)
_counter = get_counter()
_rows = []
if file_size == -1:
# Excluding samples that failed upon conversion
_counter["failed"] += 1
elif label is None:
# Excluding samples that failed on label validation
_counter["invalid_label"] += 1
elif int(frames / SAMPLE_RATE * 1000 / 10 / 2) < len(str(label)):
# Excluding samples that are too short to fit the transcript
_counter["too_short"] += 1
elif frames / SAMPLE_RATE < MIN_SECS:
# Excluding samples that are too short
_counter["too_short"] += 1
elif frames / SAMPLE_RATE > MAX_SECS:
# Excluding very long samples to keep a reasonable batch-size
_counter["too_long"] += 1
else:
# This one is good - keep it for the target CSV
_rows.append((os.path.join(dataset_basename, _wav_filename), file_size, label))
_counter["imported_time"] += frames
_counter["all"] += 1
_counter["total_time"] += frames
return (_counter, _rows)
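The third exclusion above is effectively a CTC feasibility check: the acoustic model must produce at least as many output steps as there are characters in the transcript. A small sketch of that arithmetic, assuming 10 ms feature windows and a 2x time reduction in the network, i.e. one reading of the constants baked into the expression above (the helper name is illustrative):

def enough_output_steps(frames, transcript, sample_rate=16000):
    # seconds of audio * 100 windows per second / 2 (time reduction) = approximate output steps
    output_steps = int(frames / sample_rate * 1000 / 10 / 2)
    return output_steps >= len(transcript)

# Example: 16000 frames (1 s at 16 kHz) give 50 output steps, enough for up to 50 characters.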
def _maybe_import_data(xml_file, audio_source, target_dir, rel_tol=1e-1):
dataset_basename = os.path.splitext(os.path.split(xml_file)[1])[0]
wav_root = os.path.join(target_dir, dataset_basename)
if not os.path.exists(wav_root):
os.makedirs(wav_root)
source_frames = int(
subprocess.check_output(["soxi", "-s", audio_source], stderr=subprocess.STDOUT)
)
print("Source audio length: %s" % secs_to_hours(source_frames / SAMPLE_RATE))
# Get audiofile path and transcript for each sentence in tsv
samples = []
tree = ET.parse(xml_file)
root = tree.getroot()
seq_id = 0
this_time = 0.0
this_duration = 0.0
prev_time = 0.0
prev_duration = 0.0
this_text = ""
for child in root:
if child.tag == "row":
cur_time = float(child.attrib["timestamp"])
cur_duration = float(child.attrib["timedur"])
cur_text = child.text
if this_time == 0.0:
this_time = cur_time
delta = cur_time - (prev_time + prev_duration)
# rel_tol value was chosen by trial and error as a compromise between:
# - cutting enough to skip missing words
# - not too short, not too long sentences
is_close = math.isclose(
cur_time, this_time + this_duration, rel_tol=rel_tol
)
is_short = (this_duration + cur_duration + delta) < MAX_SECS
# when the previous element is close enough **and** this does not
# go over MAX_SECS, we append content
if is_close and is_short:
this_duration += cur_duration + delta
this_text += cur_text
else:
samples.append(
(
audio_source,
target_dir,
dataset_basename,
this_time,
this_duration,
this_text,
seq_id,
)
)
this_time = cur_time
this_duration = cur_duration
this_text = cur_text
seq_id += 1
prev_time = cur_time
prev_duration = cur_duration
# Keep track of how many samples are good vs. problematic
_counter = get_counter()
num_samples = len(samples)
_rows = []
print("Processing XML data: {}".format(xml_file))
pool = Pool()
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
_counter += processed[0]
_rows += processed[1]
bar.update(i)
bar.update(num_samples)
pool.close()
pool.join()
imported_samples = get_imported_samples(_counter)
assert _counter["all"] == num_samples
assert len(_rows) == imported_samples
print_import_report(_counter, SAMPLE_RATE, MAX_SECS)
print(
"Import efficiency: %.1f%%" % ((_counter["total_time"] / source_frames) * 100)
)
print("")
return _counter, _rows
def _maybe_convert_wav(mp3_filename, _wav_filename):
if not os.path.exists(_wav_filename):
print("Converting {} to WAV file: {}".format(mp3_filename, _wav_filename))
transformer = sox.Transformer()
transformer.convert(
samplerate=SAMPLE_RATE, n_channels=CHANNELS, bitdepth=BIT_DEPTH
)
try:
transformer.build(mp3_filename, _wav_filename)
except sox.core.SoxError:
pass
def write_general_csv(target_dir, _rows, _counter):
target_csv_template = os.path.join(target_dir, "ccpmf_{}.csv")
with open(target_csv_template.format("train"), "w") as train_csv_file: # 80%
with open(target_csv_template.format("dev"), "w") as dev_csv_file: # 10%
with open(target_csv_template.format("test"), "w") as test_csv_file: # 10%
train_writer = csv.DictWriter(train_csv_file, fieldnames=FIELDNAMES)
train_writer.writeheader()
dev_writer = csv.DictWriter(dev_csv_file, fieldnames=FIELDNAMES)
dev_writer.writeheader()
test_writer = csv.DictWriter(test_csv_file, fieldnames=FIELDNAMES)
test_writer.writeheader()
bar = progressbar.ProgressBar(max_value=len(_rows), widgets=SIMPLE_BAR)
for i, item in enumerate(bar(_rows)):
i_mod = i % 10
if i_mod == 0:
writer = test_writer
elif i_mod == 1:
writer = dev_writer
else:
writer = train_writer
writer.writerow(
{
"wav_filename": item[0],
"wav_filesize": item[1],
"transcript": item[2],
}
)
print("")
print("~~~~ FINAL STATISTICS ~~~~")
print_import_report(_counter, SAMPLE_RATE, MAX_SECS)
print("~~~~ (FINAL STATISTICS) ~~~~")
print("")
if __name__ == "__main__":
PARSER = get_importers_parser(
description="Import XML from Conference Centre for Economics, France"
)
PARSER.add_argument("target_dir", help="Destination directory")
PARSER.add_argument(
"--filter_alphabet",
help="Exclude samples with characters not in provided alphabet",
)
PARSER.add_argument(
"--normalize",
action="store_true",
help="Converts diacritic characters to their base ones",
)
PARAMS = PARSER.parse_args()
validate_label = get_validate_label(PARAMS)
ALPHABET = Alphabet(PARAMS.filter_alphabet) if PARAMS.filter_alphabet else None
def label_filter_fun(label):
if PARAMS.normalize:
label = (
unicodedata.normalize("NFKD", label.strip())
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
label = maybe_normalize(label)
label = validate_label(label)
if ALPHABET and label:
try:
ALPHABET.encode(label)
except KeyError:
label = None
return label
dataset_sources = _download_and_preprocess_data(
csv_url=DATASET_RELEASE_CSV, target_dir=PARAMS.target_dir
)
sources_root_dir = os.path.dirname(dataset_sources)
all_counter = get_counter()
all_rows = []
with open(dataset_sources, "r") as sources:
for line in sources.readlines():
d = line.split(",")
this_xml = os.path.join(sources_root_dir, d[0])
this_mp3 = os.path.join(sources_root_dir, d[1])
this_rel = float(d[2])
wav_filename = os.path.join(
sources_root_dir,
os.path.splitext(os.path.basename(this_mp3))[0] + ".wav",
)
_maybe_convert_wav(this_mp3, wav_filename)
counter, rows = _maybe_import_data(
this_xml, wav_filename, sources_root_dir, this_rel
)
all_counter += counter
all_rows += rows
write_general_csv(sources_root_dir, _counter=all_counter, _rows=all_rows)
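write_general_csv above assigns rows round-robin: every row whose index ends in 0 goes to test, an index ending in 1 goes to dev, and the remaining eight out of ten go to train, which yields the deterministic 80/10/10 split noted in the comments. A quick check of those proportions:

from collections import Counter
split = Counter(
    "test" if i % 10 == 0 else "dev" if i % 10 == 1 else "train" for i in range(1000)
)
print(split)  # Counter({'train': 800, 'test': 100, 'dev': 100})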

View File

@ -1,61 +1,107 @@
#!/usr/bin/env python
from __future__ import absolute_import, division, print_function
# Make sure we can import stuff from util/
# This script needs to be run from the root of the DeepSpeech repository
import os
import sys
sys.path.insert(1, os.path.join(sys.path[0], '..'))
import csv
import sox
import tarfile
import os
import subprocess
import progressbar
import sys
import tarfile
from glob import glob
from os import path
from threading import RLock
from multiprocessing.dummy import Pool
from multiprocessing import cpu_count
from util.text import validate_label
from util.downloader import maybe_download, SIMPLE_BAR
from multiprocessing import Pool
FIELDNAMES = ['wav_filename', 'wav_filesize', 'transcript']
import progressbar
import sox
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
print_import_report,
)
from coqui_stt_training.util.importers import validate_label_eng as validate_label
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
MAX_SECS = 10
ARCHIVE_DIR_NAME = 'cv_corpus_v1'
ARCHIVE_NAME = ARCHIVE_DIR_NAME + '.tar.gz'
ARCHIVE_URL = 'https://s3.us-east-2.amazonaws.com/common-voice-data-download/' + ARCHIVE_NAME
ARCHIVE_DIR_NAME = "cv_corpus_v1"
ARCHIVE_NAME = ARCHIVE_DIR_NAME + ".tar.gz"
ARCHIVE_URL = (
"https://s3.us-east-2.amazonaws.com/common-voice-data-download/" + ARCHIVE_NAME
)
def _download_and_preprocess_data(target_dir):
# Making path absolute
target_dir = path.abspath(target_dir)
target_dir = os.path.abspath(target_dir)
# Conditionally download data
archive_path = maybe_download(ARCHIVE_NAME, target_dir, ARCHIVE_URL)
# Conditionally extract common voice data
_maybe_extract(target_dir, ARCHIVE_DIR_NAME, archive_path)
# Conditionally convert common voice CSV files and mp3 data to DeepSpeech CSVs and wav
# Conditionally convert common voice CSV files and mp3 data to Coqui STT CSVs and wav
_maybe_convert_sets(target_dir, ARCHIVE_DIR_NAME)
def _maybe_extract(target_dir, extracted_data, archive_path):
# If target_dir/extracted_data does not exist, extract archive in target_dir
extracted_path = path.join(target_dir, extracted_data)
if not path.exists(extracted_path):
extracted_path = os.path.join(target_dir, extracted_data)
if not os.path.exists(extracted_path):
print('No directory "%s" - extracting archive...' % extracted_path)
with tarfile.open(archive_path) as tar:
tar.extractall(target_dir)
else:
print('Found directory "%s" - not extracting it from archive.' % extracted_path)
def _maybe_convert_sets(target_dir, extracted_data):
extracted_dir = path.join(target_dir, extracted_data)
for source_csv in glob(path.join(extracted_dir, '*.csv')):
_maybe_convert_set(extracted_dir, source_csv, path.join(target_dir, os.path.split(source_csv)[-1]))
extracted_dir = os.path.join(target_dir, extracted_data)
for source_csv in glob(os.path.join(extracted_dir, "*.csv")):
_maybe_convert_set(
extracted_dir,
source_csv,
os.path.join(target_dir, os.path.split(source_csv)[-1]),
)
def one_sample(sample):
mp3_filename = sample[0]
# Storing wav files next to the mp3 ones - just with a different suffix
wav_filename = path.splitext(mp3_filename)[0] + ".wav"
_maybe_convert_wav(mp3_filename, wav_filename)
frames = int(
subprocess.check_output(["soxi", "-s", wav_filename], stderr=subprocess.STDOUT)
)
file_size = -1
if os.path.exists(wav_filename):
file_size = path.getsize(wav_filename)
frames = int(
subprocess.check_output(
["soxi", "-s", wav_filename], stderr=subprocess.STDOUT
)
)
label = validate_label(sample[1])
rows = []
counter = get_counter()
if file_size == -1:
# Excluding samples that failed upon conversion
counter["failed"] += 1
elif label is None:
# Excluding samples that failed on label validation
counter["invalid_label"] += 1
elif int(frames / SAMPLE_RATE * 1000 / 10 / 2) < len(str(label)):
# Excluding samples that are too short to fit the transcript
counter["too_short"] += 1
elif frames / SAMPLE_RATE > MAX_SECS:
# Excluding very long samples to keep a reasonable batch-size
counter["too_long"] += 1
else:
# This one is good - keep it for the target CSV
rows.append((wav_filename, file_size, label))
counter["imported_time"] += frames
counter["all"] += 1
counter["total_time"] += frames
return (counter, rows)
def _maybe_convert_set(extracted_dir, source_csv, target_csv):
print()
if path.exists(target_csv):
if os.path.exists(target_csv):
print('Found CSV file "%s" - not importing "%s".' % (target_csv, source_csv))
return
print('No CSV file "%s" - importing "%s"...' % (target_csv, source_csv))
@ -63,73 +109,47 @@ def _maybe_convert_set(extracted_dir, source_csv, target_csv):
with open(source_csv) as source_csv_file:
reader = csv.DictReader(source_csv_file)
for row in reader:
samples.append((row['filename'], row['text']))
samples.append((os.path.join(extracted_dir, row["filename"]), row["text"]))
# Mutable counters for the concurrent embedded routine
counter = { 'all': 0, 'failed': 0, 'invalid_label': 0, 'too_short': 0, 'too_long': 0 }
lock = RLock()
counter = get_counter()
num_samples = len(samples)
rows = []
def one_sample(sample):
mp3_filename = path.join(*(sample[0].split('/')))
mp3_filename = path.join(extracted_dir, mp3_filename)
# Storing wav files next to the mp3 ones - just with a different suffix
wav_filename = path.splitext(mp3_filename)[0] + ".wav"
_maybe_convert_wav(mp3_filename, wav_filename)
frames = int(subprocess.check_output(['soxi', '-s', wav_filename], stderr=subprocess.STDOUT))
file_size = -1
if path.exists(wav_filename):
file_size = path.getsize(wav_filename)
frames = int(subprocess.check_output(['soxi', '-s', wav_filename], stderr=subprocess.STDOUT))
label = validate_label(sample[1])
with lock:
if file_size == -1:
# Excluding samples that failed upon conversion
counter['failed'] += 1
elif label is None:
# Excluding samples that failed on label validation
counter['invalid_label'] += 1
elif int(frames/SAMPLE_RATE*1000/10/2) < len(str(label)):
# Excluding samples that are too short to fit the transcript
counter['too_short'] += 1
elif frames/SAMPLE_RATE > MAX_SECS:
# Excluding very long samples to keep a reasonable batch-size
counter['too_long'] += 1
else:
# This one is good - keep it for the target CSV
rows.append((wav_filename, file_size, label))
counter['all'] += 1
print('Importing mp3 files...')
pool = Pool(cpu_count())
print("Importing mp3 files...")
pool = Pool()
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
for i, _ in enumerate(pool.imap_unordered(one_sample, samples), start=1):
for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
counter += processed[0]
rows += processed[1]
bar.update(i)
bar.update(num_samples)
pool.close()
pool.join()
print('Writing "%s"...' % target_csv)
with open(target_csv, 'w') as target_csv_file:
with open(target_csv, "w", encoding="utf-8", newline="") as target_csv_file:
writer = csv.DictWriter(target_csv_file, fieldnames=FIELDNAMES)
writer.writeheader()
bar = progressbar.ProgressBar(max_value=len(rows), widgets=SIMPLE_BAR)
for filename, file_size, transcript in bar(rows):
writer.writerow({ 'wav_filename': filename, 'wav_filesize': file_size, 'transcript': transcript })
writer.writerow(
{
"wav_filename": filename,
"wav_filesize": file_size,
"transcript": transcript,
}
)
imported_samples = get_imported_samples(counter)
assert counter["all"] == num_samples
assert len(rows) == imported_samples
print_import_report(counter, SAMPLE_RATE, MAX_SECS)
print('Imported %d samples.' % (counter['all'] - counter['failed'] - counter['too_short'] - counter['too_long']))
if counter['failed'] > 0:
print('Skipped %d samples that failed upon conversion.' % counter['failed'])
if counter['invalid_label'] > 0:
print('Skipped %d samples that failed on transcript validation.' % counter['invalid_label'])
if counter['too_short'] > 0:
print('Skipped %d samples that were too short to match the transcript.' % counter['too_short'])
if counter['too_long'] > 0:
print('Skipped %d samples that were longer than %d seconds.' % (counter['too_long'], MAX_SECS))
def _maybe_convert_wav(mp3_filename, wav_filename):
if not path.exists(wav_filename):
if not os.path.exists(wav_filename):
transformer = sox.Transformer()
transformer.convert(samplerate=SAMPLE_RATE)
try:
@ -137,5 +157,6 @@ def _maybe_convert_wav(mp3_filename, wav_filename):
except sox.core.SoxError:
pass
if __name__ == "__main__":
_download_and_preprocess_data(sys.argv[1])
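The diff above replaces the thread pool, the shared counter dict and the RLock with a process pool whose workers each return their own counter and rows, which the parent then merges additively; get_counter() provides a Counter-like object, so no locking is needed. A minimal sketch of that pattern, with illustrative names:

from collections import Counter
from multiprocessing import Pool

def work(item):
    counter, rows = Counter(), []
    # ... convert and validate the item here ...
    counter["all"] += 1
    rows.append(item)
    return counter, rows

if __name__ == "__main__":
    total, all_rows = Counter(), []
    with Pool() as pool:
        for counter, rows in pool.imap_unordered(work, range(100)):
            total += counter   # additive merge replaces the lock-protected dict
            all_rows += rows
    print(total["all"], len(all_rows))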

View File

@ -1,144 +1,250 @@
#!/usr/bin/env python
from __future__ import absolute_import, division, print_function
# Make sure we can import stuff from util/
# This script needs to be run from the root of the DeepSpeech repository
import os
import sys
sys.path.insert(1, os.path.join(sys.path[0], '..'))
import csv
import sox
import subprocess
import progressbar
from os import path
from threading import RLock
from multiprocessing.dummy import Pool
from multiprocessing import cpu_count
from util.downloader import SIMPLE_BAR
from util.text import validate_label
'''
"""
Broadly speaking, this script takes the audio downloaded from Common Voice
for a certain language, in addition to the *.tsv files output by CorporaCreator,
and the script formats the data and transcripts to be in a state usable by
DeepSpeech.py
train.py
Use "python3 import_cv2.py -h" for help
"""
import csv
import os
import subprocess
import unicodedata
from multiprocessing import Pool
Usage:
$ python3 import_cv2.py /path/to/audio/data_dir /path/to/tsv_dir
import progressbar
import sox
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
get_importers_parser,
get_validate_label,
print_import_report,
)
Input:
(1) audio_dir (string) path to dir of audio downloaded from Common Voice
(2) tsv_dir (string) path to dir containing {train,test,dev}.tsv files
which were generated by CorporaCreator
Output:
(1) csv files in format needed by DeepSpeech.py, saved into audio_dir
(2) wav files, saved into audio_dir alongside their mp3s
'''
FIELDNAMES = ['wav_filename', 'wav_filesize', 'transcript']
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
CHANNELS = 1
MAX_SECS = 10
PARAMS = None
FILTER_OBJ = None
def _preprocess_data(audio_dir, tsv_dir):
for dataset in ['train','test','dev']:
input_tsv= path.join(path.abspath(tsv_dir), dataset+".tsv")
if os.path.isfile(input_tsv):
print("Loading TSV file: ", input_tsv)
_maybe_convert_set(audio_dir, input_tsv)
else:
print("ERROR: no TSV file found: ", input_tsv)
def _maybe_convert_set(audio_dir, input_tsv):
output_csv = path.join(audio_dir,os.path.split(input_tsv)[-1].replace('tsv', 'csv'))
print("Saving new DeepSpeech-formatted CSV file to: ", output_csv)
class LabelFilter:
def __init__(self, normalize, alphabet, validate_fun):
self.normalize = normalize
self.alphabet = alphabet
self.validate_fun = validate_fun
# Get audiofile path and transcript for each sentence in tsv
samples = []
with open(input_tsv) as input_tsv_file:
reader = csv.DictReader(input_tsv_file, delimiter='\t')
for row in reader:
samples.append((row['path'], row['sentence']))
def filter(self, label):
if self.normalize:
label = (
unicodedata.normalize("NFKD", label.strip())
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
label = self.validate_fun(label)
if self.alphabet and label and not self.alphabet.CanEncode(label):
label = None
return label
# Keep track of how many samples are good vs. problematic
counter = { 'all': 0, 'failed': 0, 'invalid_label': 0, 'too_short': 0, 'too_long': 0 }
lock = RLock()
num_samples = len(samples)
def init_worker(params):
global FILTER_OBJ # pylint: disable=global-statement
validate_label = get_validate_label(params)
alphabet = Alphabet(params.filter_alphabet) if params.filter_alphabet else None
FILTER_OBJ = LabelFilter(params.normalize, alphabet, validate_label)
def one_sample(sample):
""" Take an audio file, and optionally convert it to 16kHz WAV """
mp3_filename = sample[0]
if not os.path.splitext(mp3_filename.lower())[1] == ".mp3":
mp3_filename += ".mp3"
# Storing wav files next to the mp3 ones - just with a different suffix
wav_filename = os.path.splitext(mp3_filename)[0] + ".wav"
_maybe_convert_wav(mp3_filename, wav_filename)
file_size = -1
frames = 0
if os.path.exists(wav_filename):
file_size = os.path.getsize(wav_filename)
frames = int(
subprocess.check_output(
["soxi", "-s", wav_filename], stderr=subprocess.STDOUT
)
)
label = FILTER_OBJ.filter(sample[1])
rows = []
counter = get_counter()
if file_size == -1:
# Excluding samples that failed upon conversion
counter["failed"] += 1
elif label is None:
# Excluding samples that failed on label validation
counter["invalid_label"] += 1
elif int(frames / SAMPLE_RATE * 1000 / 10 / 2) < len(str(label)):
# Excluding samples that are too short to fit the transcript
counter["too_short"] += 1
elif frames / SAMPLE_RATE > MAX_SECS:
# Excluding very long samples to keep a reasonable batch-size
counter["too_long"] += 1
else:
# This one is good - keep it for the target CSV
rows.append((os.path.split(wav_filename)[-1], file_size, label, sample[2]))
counter["imported_time"] += frames
counter["all"] += 1
counter["total_time"] += frames
def one_sample(sample):
""" Take an audio file, and optionally convert it to 16kHz WAV """
mp3_filename = path.join(audio_dir, sample[0])
if not path.splitext(mp3_filename.lower())[1] == '.mp3':
mp3_filename += ".mp3"
# Storing wav files next to the mp3 ones - just with a different suffix
wav_filename = path.splitext(mp3_filename)[0] + ".wav"
_maybe_convert_wav(mp3_filename, wav_filename)
file_size = -1
if path.exists(wav_filename):
file_size = path.getsize(wav_filename)
frames = int(subprocess.check_output(['soxi', '-s', wav_filename], stderr=subprocess.STDOUT))
label = validate_label(sample[1])
with lock:
if file_size == -1:
# Excluding samples that failed upon conversion
counter['failed'] += 1
elif label is None:
# Excluding samples that failed on label validation
counter['invalid_label'] += 1
elif int(frames/SAMPLE_RATE*1000/10/2) < len(str(label)):
# Excluding samples that are too short to fit the transcript
counter['too_short'] += 1
elif frames/SAMPLE_RATE > MAX_SECS:
# Excluding very long samples to keep a reasonable batch-size
counter['too_long'] += 1
else:
# This one is good - keep it for the target CSV
rows.append((wav_filename, file_size, label))
counter['all'] += 1
return (counter, rows)
print("Importing mp3 files...")
pool = Pool(cpu_count())
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
for i, _ in enumerate(pool.imap_unordered(one_sample, samples), start=1):
bar.update(i)
bar.update(num_samples)
pool.close()
pool.join()
with open(output_csv, 'w') as output_csv_file:
print('Writing CSV file for DeepSpeech.py as: ', output_csv)
def _maybe_convert_set(
dataset,
tsv_dir,
audio_dir,
filter_obj,
space_after_every_character=None,
rows=None,
exclude=None,
):
exclude_transcripts = set()
exclude_speakers = set()
if exclude is not None:
for sample in exclude:
exclude_transcripts.add(sample[2])
exclude_speakers.add(sample[3])
if rows is None:
rows = []
input_tsv = os.path.join(os.path.abspath(tsv_dir), dataset + ".tsv")
if not os.path.isfile(input_tsv):
return rows
print("Loading TSV file: ", input_tsv)
# Get audiofile path and transcript for each sentence in tsv
samples = []
with open(input_tsv, encoding="utf-8") as input_tsv_file:
reader = csv.DictReader(input_tsv_file, delimiter="\t")
for row in reader:
samples.append(
(
os.path.join(audio_dir, row["path"]),
row["sentence"],
row["client_id"],
)
)
counter = get_counter()
num_samples = len(samples)
print("Importing mp3 files...")
pool = Pool(initializer=init_worker, initargs=(PARAMS,))
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
for i, processed in enumerate(
pool.imap_unordered(one_sample, samples), start=1
):
counter += processed[0]
rows += processed[1]
bar.update(i)
bar.update(num_samples)
pool.close()
pool.join()
imported_samples = get_imported_samples(counter)
assert counter["all"] == num_samples
assert len(rows) == imported_samples
print_import_report(counter, SAMPLE_RATE, MAX_SECS)
output_csv = os.path.join(os.path.abspath(audio_dir), dataset + ".csv")
print("Saving new Coqui STT-formatted CSV file to: ", output_csv)
with open(output_csv, "w", encoding="utf-8", newline="") as output_csv_file:
print("Writing CSV file for train.py as: ", output_csv)
writer = csv.DictWriter(output_csv_file, fieldnames=FIELDNAMES)
writer.writeheader()
bar = progressbar.ProgressBar(max_value=len(rows), widgets=SIMPLE_BAR)
for filename, file_size, transcript in bar(rows):
writer.writerow({ 'wav_filename': filename, 'wav_filesize': file_size, 'transcript': transcript })
for filename, file_size, transcript, speaker in bar(rows):
if transcript in exclude_transcripts or speaker in exclude_speakers:
continue
if space_after_every_character:
writer.writerow(
{
"wav_filename": filename,
"wav_filesize": file_size,
"transcript": " ".join(transcript),
}
)
else:
writer.writerow(
{
"wav_filename": filename,
"wav_filesize": file_size,
"transcript": transcript,
}
)
return rows
def _preprocess_data(tsv_dir, audio_dir, space_after_every_character=False):
exclude = []
for dataset in ["test", "dev", "train", "validated", "other"]:
set_samples = _maybe_convert_set(
dataset, tsv_dir, audio_dir, space_after_every_character
)
if dataset in ["test", "dev"]:
exclude += set_samples
if dataset == "validated":
_maybe_convert_set(
"train-all",
tsv_dir,
audio_dir,
space_after_every_character,
rows=set_samples,
exclude=exclude,
)
print('Imported %d samples.' % (counter['all'] - counter['failed'] - counter['too_short'] - counter['too_long']))
if counter['failed'] > 0:
print('Skipped %d samples that failed upon conversion.' % counter['failed'])
if counter['invalid_label'] > 0:
print('Skipped %d samples that failed on transcript validation.' % counter['invalid_label'])
if counter['too_short'] > 0:
print('Skipped %d samples that were too short to match the transcript.' % counter['too_short'])
if counter['too_long'] > 0:
print('Skipped %d samples that were longer than %d seconds.' % (counter['too_long'], MAX_SECS))
def _maybe_convert_wav(mp3_filename, wav_filename):
if not path.exists(wav_filename):
if not os.path.exists(wav_filename):
transformer = sox.Transformer()
transformer.convert(samplerate=SAMPLE_RATE)
transformer.convert(samplerate=SAMPLE_RATE, n_channels=CHANNELS)
try:
transformer.build(mp3_filename, wav_filename)
except sox.core.SoxError:
pass
def parse_args():
parser = get_importers_parser(description="Import CommonVoice v2.0 corpora")
parser.add_argument("tsv_dir", help="Directory containing tsv files")
parser.add_argument(
"--audio_dir",
help='Directory containing the audio clips - defaults to "<tsv_dir>/clips"',
)
parser.add_argument(
"--filter_alphabet",
help="Exclude samples with characters not in provided alphabet",
)
parser.add_argument(
"--normalize",
action="store_true",
help="Converts diacritic characters to their base ones",
)
parser.add_argument(
"--space_after_every_character",
action="store_true",
help="To help transcript join by white space",
)
return parser.parse_args()
def main():
audio_dir = (
PARAMS.audio_dir if PARAMS.audio_dir else os.path.join(PARAMS.tsv_dir, "clips")
)
_preprocess_data(PARAMS.tsv_dir, audio_dir, PARAMS.space_after_every_character)
if __name__ == "__main__":
audio_dir = sys.argv[1]
tsv_dir = sys.argv[2]
print('Expecting your audio from Common Voice to be in: ', audio_dir)
print('Looking for *.tsv files (generated by CorporaCreator) in: ', tsv_dir)
_preprocess_data(audio_dir, tsv_dir)
PARAMS = parse_args()
main()
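_preprocess_data above converts test and dev first and carries their rows forward as exclude, so that when the larger validated set is turned into a train-all CSV, any clip whose sentence or client_id already appears in test or dev is skipped, preventing leakage between splits. A compact restatement of that guard, with illustrative row tuples of the form (wav_filename, wav_filesize, transcript, client_id):

dev_rows = [("a.wav", 1000, "bonjour", "spk1")]
validated_rows = [
    ("b.wav", 1200, "bonjour", "spk2"),    # same sentence as a dev clip -> dropped
    ("c.wav", 900, "merci", "spk1"),       # same speaker as a dev clip -> dropped
    ("d.wav", 1100, "au revoir", "spk3"),  # kept for train-all
]
exclude_transcripts = {r[2] for r in dev_rows}
exclude_speakers = {r[3] for r in dev_rows}
kept = [r for r in validated_rows
        if r[2] not in exclude_transcripts and r[3] not in exclude_speakers]
print([r[0] for r in kept])  # ['d.wav']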

View File

@ -1,25 +1,20 @@
#!/usr/bin/env python
from __future__ import absolute_import, division, print_function
import codecs
import fnmatch
import os
import random
import subprocess
import sys
import unicodedata
import librosa
import pandas
import soundfile # <= Has an external dependency on libsndfile
from coqui_stt_training.util.importers import validate_label_eng as validate_label
# Prerequisite: Having the sph2pipe tool in your PATH:
# https://www.ldc.upenn.edu/language-resources/tools/sphere-conversion-tools
# Make sure we can import stuff from util/
# This script needs to be run from the root of the DeepSpeech repository
import sys
import os
sys.path.insert(1, os.path.join(sys.path[0], '..'))
import codecs
import fnmatch
import os
import pandas
import subprocess
import unicodedata
import librosa
import soundfile # <= Has an external dependency on libsndfile
from util.text import validate_label
def _download_and_preprocess_data(data_dir):
# Assume data_dir contains extracted LDC2004S13, LDC2004T19, LDC2005S13, LDC2005T19
@ -29,33 +24,55 @@ def _download_and_preprocess_data(data_dir):
_maybe_convert_wav(data_dir, "LDC2005S13", "fisher-2005-wav")
# Conditionally split Fisher wav data
all_2004 = _split_wav_and_sentences(data_dir,
original_data="fisher-2004-wav",
converted_data="fisher-2004-split-wav",
trans_data=os.path.join("LDC2004T19", "fe_03_p1_tran", "data", "trans"))
all_2005 = _split_wav_and_sentences(data_dir,
original_data="fisher-2005-wav",
converted_data="fisher-2005-split-wav",
trans_data=os.path.join("LDC2005T19", "fe_03_p2_tran", "data", "trans"))
all_2004 = _split_wav_and_sentences(
data_dir,
original_data="fisher-2004-wav",
converted_data="fisher-2004-split-wav",
trans_data=os.path.join("LDC2004T19", "fe_03_p1_tran", "data", "trans"),
)
all_2005 = _split_wav_and_sentences(
data_dir,
original_data="fisher-2005-wav",
converted_data="fisher-2005-split-wav",
trans_data=os.path.join("LDC2005T19", "fe_03_p2_tran", "data", "trans"),
)
# The following files have incorrect transcripts that are much longer than
# their audio source. The result is that we end up with more labels than time
# slices, which breaks CTC.
all_2004.loc[all_2004["wav_filename"].str.endswith("fe_03_00265-33.53-33.81.wav"), "transcript"] = "correct"
all_2004.loc[all_2004["wav_filename"].str.endswith("fe_03_00991-527.39-528.3.wav"), "transcript"] = "that's one of those"
all_2005.loc[all_2005["wav_filename"].str.endswith("fe_03_10282-344.42-344.84.wav"), "transcript"] = "they don't want"
all_2005.loc[all_2005["wav_filename"].str.endswith("fe_03_10677-101.04-106.41.wav"), "transcript"] = "uh my mine yeah the german shepherd pitbull mix he snores almost as loud as i do"
all_2004.loc[
all_2004["wav_filename"].str.endswith("fe_03_00265-33.53-33.81.wav"),
"transcript",
] = "correct"
all_2004.loc[
all_2004["wav_filename"].str.endswith("fe_03_00991-527.39-528.3.wav"),
"transcript",
] = "that's one of those"
all_2005.loc[
all_2005["wav_filename"].str.endswith("fe_03_10282-344.42-344.84.wav"),
"transcript",
] = "they don't want"
all_2005.loc[
all_2005["wav_filename"].str.endswith("fe_03_10677-101.04-106.41.wav"),
"transcript",
] = "uh my mine yeah the german shepherd pitbull mix he snores almost as loud as i do"
# The following file is just a short sound and not at all transcribed like provided.
# So we just exclude it.
all_2004 = all_2004[~all_2004["wav_filename"].str.endswith("fe_03_00027-393.8-394.05.wav")]
all_2004 = all_2004[
~all_2004["wav_filename"].str.endswith("fe_03_00027-393.8-394.05.wav")
]
# The following file is far too long and would ruin our training batch size.
# So we just exclude it.
all_2005 = all_2005[~all_2005["wav_filename"].str.endswith("fe_03_11487-31.09-234.06.wav")]
all_2005 = all_2005[
~all_2005["wav_filename"].str.endswith("fe_03_11487-31.09-234.06.wav")
]
# The following file is too large for its transcript, so we just exclude it.
all_2004 = all_2004[~all_2004["wav_filename"].str.endswith("fe_03_01326-307.42-307.93.wav")]
all_2004 = all_2004[
~all_2004["wav_filename"].str.endswith("fe_03_01326-307.42-307.93.wav")
]
# Conditionally split Fisher data into train/validation/test sets
train_2004, dev_2004, test_2004 = _split_sets(all_2004)
@ -71,6 +88,7 @@ def _download_and_preprocess_data(data_dir):
dev_files.to_csv(os.path.join(data_dir, "fisher-dev.csv"), index=False)
test_files.to_csv(os.path.join(data_dir, "fisher-test.csv"), index=False)
def _maybe_convert_wav(data_dir, original_data, converted_data):
source_dir = os.path.join(data_dir, original_data)
target_dir = os.path.join(data_dir, converted_data)
@ -88,10 +106,18 @@ def _maybe_convert_wav(data_dir, original_data, converted_data):
for filename in fnmatch.filter(filenames, "*.sph"):
sph_file = os.path.join(root, filename)
for channel in ["1", "2"]:
wav_filename = os.path.splitext(os.path.basename(sph_file))[0] + "_c" + channel + ".wav"
wav_filename = (
os.path.splitext(os.path.basename(sph_file))[0]
+ "_c"
+ channel
+ ".wav"
)
wav_file = os.path.join(target_dir, wav_filename)
print("converting {} to {}".format(sph_file, wav_file))
subprocess.check_call(["sph2pipe", "-c", channel, "-p", "-f", "rif", sph_file, wav_file])
subprocess.check_call(
["sph2pipe", "-c", channel, "-p", "-f", "rif", sph_file, wav_file]
)
def _parse_transcriptions(trans_file):
segments = []
@ -109,18 +135,23 @@ def _parse_transcriptions(trans_file):
# We need to do the encode-decode dance here because encode
# returns a bytes() object on Python 3, and text_to_char_array
# expects a string.
transcript = unicodedata.normalize("NFKD", transcript) \
.encode("ascii", "ignore") \
.decode("ascii", "ignore")
transcript = (
unicodedata.normalize("NFKD", transcript)
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
segments.append({
"start_time": start_time,
"stop_time": stop_time,
"speaker": speaker,
"transcript": transcript,
})
segments.append(
{
"start_time": start_time,
"stop_time": stop_time,
"speaker": speaker,
"transcript": transcript,
}
)
return segments
def _split_wav_and_sentences(data_dir, trans_data, original_data, converted_data):
trans_dir = os.path.join(data_dir, trans_data)
source_dir = os.path.join(data_dir, original_data)
@ -137,59 +168,115 @@ def _split_wav_and_sentences(data_dir, trans_data, original_data, converted_data
segments = _parse_transcriptions(trans_file)
# Open wav corresponding to transcription file
wav_filenames = [os.path.splitext(os.path.basename(trans_file))[0] + "_c" + channel + ".wav" for channel in ["1", "2"]]
wav_files = [os.path.join(source_dir, wav_filename) for wav_filename in wav_filenames]
wav_filenames = [
os.path.splitext(os.path.basename(trans_file))[0]
+ "_c"
+ channel
+ ".wav"
for channel in ["1", "2"]
]
wav_files = [
os.path.join(source_dir, wav_filename) for wav_filename in wav_filenames
]
print("splitting {} according to {}".format(wav_files, trans_file))
origAudios = [librosa.load(wav_file, sr=16000, mono=False) for wav_file in wav_files]
origAudios = [
librosa.load(wav_file, sr=16000, mono=False) for wav_file in wav_files
]
# Loop over segments and split wav_file for each segment
for segment in segments:
# Create wav segment filename
start_time = segment["start_time"]
stop_time = segment["stop_time"]
new_wav_filename = os.path.splitext(os.path.basename(trans_file))[0] + "-" + str(start_time) + "-" + str(stop_time) + ".wav"
new_wav_filename = (
os.path.splitext(os.path.basename(trans_file))[0]
+ "-"
+ str(start_time)
+ "-"
+ str(stop_time)
+ ".wav"
)
new_wav_file = os.path.join(target_dir, new_wav_filename)
channel = 0 if segment["speaker"] == "A:" else 1
_split_and_resample_wav(origAudios[channel], start_time, stop_time, new_wav_file)
_split_and_resample_wav(
origAudios[channel], start_time, stop_time, new_wav_file
)
new_wav_filesize = os.path.getsize(new_wav_file)
transcript = validate_label(segment["transcript"])
if transcript != None:
files.append((os.path.abspath(new_wav_file), new_wav_filesize, transcript))
files.append(
(os.path.abspath(new_wav_file), new_wav_filesize, transcript)
)
return pandas.DataFrame(
data=files, columns=["wav_filename", "wav_filesize", "transcript"]
)
return pandas.DataFrame(data=files, columns=["wav_filename", "wav_filesize", "transcript"])
def _split_audio(origAudio, start_time, stop_time):
audioData, frameRate = origAudio
nChannels = len(audioData.shape)
startIndex = int(start_time * frameRate)
stopIndex = int(stop_time * frameRate)
return audioData[startIndex: stopIndex] if 1 == nChannels else audioData[:, startIndex: stopIndex]
return (
audioData[startIndex:stopIndex]
if 1 == nChannels
else audioData[:, startIndex:stopIndex]
)
def _split_and_resample_wav(origAudio, start_time, stop_time, new_wav_file):
frameRate = origAudio[1]
chunkData = _split_audio(origAudio, start_time, stop_time)
soundfile.write(new_wav_file, chunkData, frameRate, "PCM_16")
def _split_sets(filelist):
# We initially split the entire set into 80% train and 20% test, then
# split the train set into 80% train and 20% validation.
train_beg = 0
train_end = int(0.8 * len(filelist))
dev_beg = int(0.8 * train_end)
dev_end = train_end
train_end = dev_beg
def _split_sets(filelist):
"""
randomly split the dataset into train, validation, and test sets, where the sizes of the
validation and test sets are determined by the `get_sample_size` function.
"""
random.shuffle(filelist)
sample_size = get_sample_size(len(filelist))
train_beg = 0
train_end = len(filelist) - 2 * sample_size
dev_beg = train_end
dev_end = train_end + sample_size
test_beg = dev_end
test_end = len(filelist)
return (filelist[train_beg:train_end],
filelist[dev_beg:dev_end],
filelist[test_beg:test_end])
return (
filelist[train_beg:train_end],
filelist[dev_beg:dev_end],
filelist[test_beg:test_end],
)
def get_sample_size(population_size):
"""Calculates the sample size for a 99% confidence level and a 1% margin of error."""
margin_of_error = 0.01
fraction_picking = 0.50
z_score = 2.58 # Corresponds to confidence level 99%
numerator = (z_score ** 2 * fraction_picking * (1 - fraction_picking)) / (
margin_of_error ** 2
)
sample_size = 0
for train_size in range(population_size, 0, -1):
denominator = 1 + (z_score ** 2 * fraction_picking * (1 - fraction_picking)) / (
margin_of_error ** 2 * train_size
)
sample_size = int(numerator / denominator)
if 2 * sample_size + train_size <= population_size:
break
return sample_size
if __name__ == "__main__":
_download_and_preprocess_data(sys.argv[1])
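get_sample_size above iterates a finite-population-corrected Cochran formula, shrinking the candidate train split until that split plus a dev and a test sample of the resulting size fit inside the full file list. The closed form it evaluates on each iteration, with the constants used above and a hypothetical train split of 100,000 files:

# n0 = z^2 * p * (1 - p) / e^2 with z = 2.58, p = 0.5, e = 0.01
n0 = (2.58 ** 2 * 0.5 * 0.5) / (0.01 ** 2)   # 16641.0
n = int(n0 / (1 + n0 / 100000))              # ~14266 after the finite-population correction
print(n0, n)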

93
bin/import_freestmandarin.py Executable file
View File

@ -0,0 +1,93 @@
#!/usr/bin/env python
import glob
import os
import tarfile
import numpy as np
import pandas
from coqui_stt_training.util.importers import get_importers_parser
COLUMN_NAMES = ["wav_filename", "wav_filesize", "transcript"]
def extract(archive_path, target_dir):
print("Extracting {} into {}...".format(archive_path, target_dir))
with tarfile.open(archive_path) as tar:
tar.extractall(target_dir)
def preprocess_data(tgz_file, target_dir):
# First extract main archive and sub-archives
extract(tgz_file, target_dir)
main_folder = os.path.join(target_dir, "ST-CMDS-20170001_1-OS")
# Folder structure is now:
# - ST-CMDS-20170001_1-OS/
# - *.wav
# - *.txt
# - *.metadata
def load_set(glob_path):
set_files = []
for wav in glob.glob(glob_path):
wav_filename = wav
wav_filesize = os.path.getsize(wav)
txt_filename = os.path.splitext(wav_filename)[0] + ".txt"
with open(txt_filename, "r") as fin:
transcript = fin.read()
set_files.append((wav_filename, wav_filesize, transcript))
return set_files
# Load all files, then deterministically split into train/dev/test sets
all_files = load_set(os.path.join(main_folder, "*.wav"))
df = pandas.DataFrame(data=all_files, columns=COLUMN_NAMES)
df.sort_values(by="wav_filename", inplace=True)
indices = np.arange(0, len(df))
np.random.seed(12345)
np.random.shuffle(indices)
# Total corpus size: 102600 samples. 5000 samples gives us 99% confidence
# level with a margin of error of under 2%.
test_indices = indices[-5000:]
dev_indices = indices[-10000:-5000]
train_indices = indices[:-10000]
train_files = df.iloc[train_indices]
durations = (train_files["wav_filesize"] - 44) / 16000 / 2
train_files = train_files[durations <= 10.0]
print("Trimming {} samples > 10 seconds".format((durations > 10.0).sum()))
dest_csv = os.path.join(target_dir, "freestmandarin_train.csv")
print("Saving train set into {}...".format(dest_csv))
train_files.to_csv(dest_csv, index=False)
dev_files = df.iloc[dev_indices]
dest_csv = os.path.join(target_dir, "freestmandarin_dev.csv")
print("Saving dev set into {}...".format(dest_csv))
dev_files.to_csv(dest_csv, index=False)
test_files = df.iloc[test_indices]
dest_csv = os.path.join(target_dir, "freestmandarin_test.csv")
print("Saving test set into {}...".format(dest_csv))
test_files.to_csv(dest_csv, index=False)
def main():
# https://www.openslr.org/38/
parser = get_importers_parser(description="Import Free ST Chinese Mandarin corpus")
parser.add_argument("tgz_file", help="Path to ST-CMDS-20170001_1-OS.tar.gz")
parser.add_argument(
"--target_dir",
default="",
help="Target folder to extract files into and put the resulting CSVs. Defaults to same folder as the main archive.",
)
params = parser.parse_args()
if not params.target_dir:
params.target_dir = os.path.dirname(params.tgz_file)
preprocess_data(params.tgz_file, params.target_dir)
if __name__ == "__main__":
main()
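The comment above claims that holding out 5,000 of the 102,600 samples for test (and another 5,000 for dev) gives a 99% confidence level with a margin of error under 2%. A quick check of that claim, assuming the worst-case proportion p = 0.5 and z = 2.58 for 99% confidence:

import math
n = 5000
margin_of_error = 2.58 * math.sqrt(0.25 / n)
print(margin_of_error)  # ~0.018, i.e. roughly 1.8%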

365
bin/import_gram_vaani.py Executable file
View File

@ -0,0 +1,365 @@
#!/usr/bin/env python
import csv
import logging
import math
import os
import subprocess
import sys
import urllib
from pathlib import Path
import pandas as pd
import swifter
from coqui_stt_training.util.importers import get_importers_parser, get_validate_label
from sox import Transformer
__version__ = "0.1.0"
_logger = logging.getLogger(__name__)
MAX_SECS = 10
BITDEPTH = 16
N_CHANNELS = 1
SAMPLE_RATE = 16000
DEV_PERCENTAGE = 0.10
TRAIN_PERCENTAGE = 0.80
def parse_args(args):
"""Parse command line parameters
Args:
args ([str]): Command line parameters as list of strings
Returns:
:obj:`argparse.Namespace`: command line parameters namespace
"""
parser = get_importers_parser(description="Imports GramVaani data for Deep Speech")
parser.add_argument(
"--version",
action="version",
version="GramVaaniImporter {ver}".format(ver=__version__),
)
parser.add_argument(
"-v",
"--verbose",
action="store_const",
required=False,
help="set loglevel to INFO",
dest="loglevel",
const=logging.INFO,
)
parser.add_argument(
"-vv",
"--very-verbose",
action="store_const",
required=False,
help="set loglevel to DEBUG",
dest="loglevel",
const=logging.DEBUG,
)
parser.add_argument(
"-c",
"--csv_filename",
required=True,
help="Path to the GramVaani csv",
dest="csv_filename",
)
parser.add_argument(
"-t",
"--target_dir",
required=True,
help="Directory in which to save the importer GramVaani data",
dest="target_dir",
)
return parser.parse_args(args)
def setup_logging(level):
"""Setup basic logging
Args:
level (int): minimum log level for emitting messages
"""
format = "[%(asctime)s] %(levelname)s:%(name)s:%(message)s"
logging.basicConfig(
level=level, stream=sys.stdout, format=format, datefmt="%Y-%m-%d %H:%M:%S"
)
class GramVaaniCSV:
"""GramVaaniCSV representing a GramVaani dataset.
Args:
csv_filename (str): Path to the GramVaani csv
Attributes:
data (:class:`pandas.DataFrame`): `pandas.DataFrame` Containing the GramVaani csv data
"""
def __init__(self, csv_filename):
self.data = self._parse_csv(csv_filename)
def _parse_csv(self, csv_filename):
_logger.info("Parsing csv file...%s", os.path.abspath(csv_filename))
data = pd.read_csv(
os.path.abspath(csv_filename),
names=[
"piece_id",
"audio_url",
"transcript_labelled",
"transcript",
"labels",
"content_filename",
"audio_length",
"user_id",
],
usecols=["audio_url", "transcript", "audio_length"],
skiprows=[0],
engine="python",
encoding="utf-8",
quotechar='"',
quoting=csv.QUOTE_ALL,
)
data.dropna(inplace=True)
_logger.info("Parsed %d lines csv file." % len(data))
return data
class GramVaaniDownloader:
"""GramVaaniDownloader downloads a GramVaani dataset.
Args:
gram_vaani_csv (GramVaaniCSV): A GramVaaniCSV representing the data to download
target_dir (str): The path to download the data to
Attributes:
data (:class:`pandas.DataFrame`): `pandas.DataFrame` Containing the GramVaani csv data
"""
def __init__(self, gram_vaani_csv, target_dir):
self.target_dir = target_dir
self.data = gram_vaani_csv.data
def download(self):
"""Downloads the data associated with this instance
Return:
mp3_directory (os.path): The directory into which the associated mp3's were downloaded
"""
mp3_directory = self._pre_download()
self.data.swifter.apply(
func=lambda arg: self._download(*arg, mp3_directory), axis=1, raw=True
)
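# swifter picks a vectorised or parallel backend for the row-wise apply, so the
# per-row _download calls can run concurrently instead of strictly one at a time.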
return mp3_directory
def _pre_download(self):
mp3_directory = os.path.join(self.target_dir, "mp3")
if not os.path.exists(self.target_dir):
_logger.info("Creating directory...%s", self.target_dir)
os.mkdir(self.target_dir)
if not os.path.exists(mp3_directory):
_logger.info("Creating directory...%s", mp3_directory)
os.mkdir(mp3_directory)
return mp3_directory
def _download(self, audio_url, transcript, audio_length, mp3_directory):
if audio_url == "audio_url":
return
mp3_filename = os.path.join(mp3_directory, os.path.basename(audio_url))
if not os.path.exists(mp3_filename):
_logger.debug("Downloading mp3 file...%s", audio_url)
urllib.request.urlretrieve(audio_url, mp3_filename)
else:
_logger.debug("Already downloaded mp3 file...%s", audio_url)
class GramVaaniConverter:
"""GramVaaniConverter converts the mp3's to wav's for a GramVaani dataset.
Args:
target_dir (str): The root directory the converted wav data will be written under
mp3_directory (os.path): The path containing the GramVaani mp3's
Attributes:
target_dir (str): The target directory passed as a command line argument
mp3_directory (os.path): The path containing the GramVaani mp3's
"""
def __init__(self, target_dir, mp3_directory):
self.target_dir = target_dir
self.mp3_directory = Path(mp3_directory)
def convert(self):
"""Converts the mp3's associated with this instance to wav's
Return:
wav_directory (os.path): The directory into which the associated wav's were written
"""
wav_directory = self._pre_convert()
for mp3_filename in self.mp3_directory.glob("**/*.mp3"):
wav_filename = os.path.join(
wav_directory,
os.path.splitext(os.path.basename(mp3_filename))[0] + ".wav",
)
if not os.path.exists(wav_filename):
_logger.debug(
"Converting mp3 file %s to wav file %s"
% (mp3_filename, wav_filename)
)
transformer = Transformer()
transformer.convert(
samplerate=SAMPLE_RATE, n_channels=N_CHANNELS, bitdepth=BITDEPTH
)
transformer.build(str(mp3_filename), str(wav_filename))
else:
_logger.debug(
"Already converted mp3 file %s to wav file %s"
% (mp3_filename, wav_filename)
)
return wav_directory
def _pre_convert(self):
wav_directory = os.path.join(self.target_dir, "wav")
if not os.path.exists(self.target_dir):
_logger.info("Creating directory...%s", self.target_dir)
os.mkdir(self.target_dir)
if not os.path.exists(wav_directory):
_logger.info("Creating directory...%s", wav_directory)
os.mkdir(wav_directory)
return wav_directory
class GramVaaniDataSets:
def __init__(self, target_dir, wav_directory, gram_vaani_csv):
self.target_dir = target_dir
self.wav_directory = wav_directory
self.csv_data = gram_vaani_csv.data
self.raw = pd.DataFrame(columns=["wav_filename", "wav_filesize", "transcript"])
self.valid = pd.DataFrame(
columns=["wav_filename", "wav_filesize", "transcript"]
)
self.train = pd.DataFrame(
columns=["wav_filename", "wav_filesize", "transcript"]
)
self.dev = pd.DataFrame(columns=["wav_filename", "wav_filesize", "transcript"])
self.test = pd.DataFrame(columns=["wav_filename", "wav_filesize", "transcript"])
def create(self):
self._convert_csv_data_to_raw_data()
self.raw.index = range(len(self.raw.index))
self.valid = self.raw[self._is_valid_raw_rows()]
self.valid = self.valid.sample(frac=1).reset_index(drop=True)
train_size, dev_size, test_size = self._calculate_data_set_sizes()
self.train = self.valid.loc[0:train_size]
self.dev = self.valid.loc[train_size : train_size + dev_size]
self.test = self.valid.loc[
train_size + dev_size : train_size + dev_size + test_size
]
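# Note: pandas .loc slices are end-inclusive, so adjacent splits share a boundary
# row and the resulting sizes are only approximately TRAIN/DEV_PERCENTAGE of valid.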
def _convert_csv_data_to_raw_data(self):
self.raw[["wav_filename", "wav_filesize", "transcript"]] = self.csv_data[
["audio_url", "transcript", "audio_length"]
].swifter.apply(
func=lambda arg: self._convert_csv_data_to_raw_data_impl(*arg),
axis=1,
raw=True,
)
self.raw.reset_index()
def _convert_csv_data_to_raw_data_impl(self, audio_url, transcript, audio_length):
if audio_url == "audio_url":
return pd.Series(["wav_filename", "wav_filesize", "transcript"])
mp3_filename = os.path.basename(audio_url)
wav_relative_filename = os.path.join(
"wav", os.path.splitext(os.path.basename(mp3_filename))[0] + ".wav"
)
wav_filesize = os.path.getsize(
os.path.join(self.target_dir, wav_relative_filename)
)
transcript = validate_label(transcript)
if transcript is None:
transcript = ""
return pd.Series([wav_relative_filename, wav_filesize, transcript])
def _is_valid_raw_rows(self):
is_valid_raw_transcripts = self._is_valid_raw_transcripts()
is_valid_raw_wav_frames = self._is_valid_raw_wav_frames()
is_valid_raw_row = [
(is_valid_raw_transcript & is_valid_raw_wav_frame)
for is_valid_raw_transcript, is_valid_raw_wav_frame in zip(
is_valid_raw_transcripts, is_valid_raw_wav_frames
)
]
series = pd.Series(is_valid_raw_row)
return series
def _is_valid_raw_transcripts(self):
return pd.Series([bool(transcript) for transcript in self.raw.transcript])
def _is_valid_raw_wav_frames(self):
transcripts = [str(transcript) for transcript in self.raw.transcript]
wav_filepaths = [
os.path.join(self.target_dir, str(wav_filename))
for wav_filename in self.raw.wav_filename
]
wav_frames = [
int(
subprocess.check_output(
["soxi", "-s", wav_filepath], stderr=subprocess.STDOUT
)
)
for wav_filepath in wav_filepaths
]
is_valid_raw_wav_frames = [
self._is_wav_frame_valid(wav_frame, transcript)
for wav_frame, transcript in zip(wav_frames, transcripts)
]
return pd.Series(is_valid_raw_wav_frames)
def _is_wav_frame_valid(self, wav_frame, transcript):
is_wav_frame_valid = True
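# Heuristic: frames / SAMPLE_RATE * 1000 / 10 / 2 is the number of 20 ms feature
# windows in the clip; if there are fewer windows than transcript characters, CTC
# cannot emit every label, so the sample is rejected as too short.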
if int(wav_frame / SAMPLE_RATE * 1000 / 10 / 2) < len(str(transcript)):
is_wav_frame_valid = False
elif wav_frame / SAMPLE_RATE > MAX_SECS:
is_wav_frame_valid = False
return is_wav_frame_valid
def _calculate_data_set_sizes(self):
total_size = len(self.valid)
dev_size = math.floor(total_size * DEV_PERCENTAGE)
train_size = math.floor(total_size * TRAIN_PERCENTAGE)
test_size = total_size - (train_size + dev_size)
return (train_size, dev_size, test_size)
def save(self):
datasets = ["train", "dev", "test"]
for dataset in datasets:
self._save(dataset)
def _save(self, dataset):
dataset_path = os.path.join(self.target_dir, dataset + ".csv")
dataframe = getattr(self, dataset)
dataframe.to_csv(
dataset_path,
index=False,
encoding="utf-8",
escapechar="\\",
quoting=csv.QUOTE_MINIMAL,
)
def main(args):
"""Main entry point allowing external calls
Args:
args ([str]): command line parameter list
"""
args = parse_args(args)
# GramVaaniDataSets looks up validate_label at module scope, so publish it there.
global validate_label
validate_label = get_validate_label(args)
setup_logging(args.loglevel)
_logger.info("Starting GramVaani importer...")
_logger.info("Starting loading GramVaani csv...")
csv = GramVaaniCSV(args.csv_filename)
_logger.info("Starting downloading GramVaani mp3's...")
downloader = GramVaaniDownloader(csv, args.target_dir)
mp3_directory = downloader.download()
_logger.info("Starting converting GramVaani mp3's to wav's...")
converter = GramVaaniConverter(args.target_dir, mp3_directory)
wav_directory = converter.convert()
datasets = GramVaaniDataSets(args.target_dir, wav_directory, csv)
datasets.create()
datasets.save()
_logger.info("Finished GramVaani importer...")
if __name__ == "__main__":
main(sys.argv[1:])


@ -1,28 +1,32 @@
#!/usr/bin/env python
from __future__ import absolute_import, division, print_function
# Make sure we can import stuff from util/
# This script needs to be run from the root of the DeepSpeech repository
import sys
import os
sys.path.insert(1, os.path.join(sys.path[0], '..'))
import sys
import pandas
from coqui_stt_training.util.downloader import maybe_download
from util.downloader import maybe_download
def _download_and_preprocess_data(data_dir):
# Conditionally download data
LDC93S1_BASE = "LDC93S1"
LDC93S1_BASE_URL = "https://catalog.ldc.upenn.edu/desc/addenda/"
local_file = maybe_download(LDC93S1_BASE + ".wav", data_dir, LDC93S1_BASE_URL + LDC93S1_BASE + ".wav")
trans_file = maybe_download(LDC93S1_BASE + ".txt", data_dir, LDC93S1_BASE_URL + LDC93S1_BASE + ".txt")
local_file = maybe_download(
LDC93S1_BASE + ".wav", data_dir, LDC93S1_BASE_URL + LDC93S1_BASE + ".wav"
)
trans_file = maybe_download(
LDC93S1_BASE + ".txt", data_dir, LDC93S1_BASE_URL + LDC93S1_BASE + ".txt"
)
with open(trans_file, "r") as fin:
transcript = ' '.join(fin.read().strip().lower().split(' ')[2:]).replace('.', '')
transcript = " ".join(fin.read().strip().lower().split(" ")[2:]).replace(
".", ""
)
df = pandas.DataFrame(data=[(os.path.abspath(local_file), os.path.getsize(local_file), transcript)],
columns=["wav_filename", "wav_filesize", "transcript"])
df = pandas.DataFrame(
data=[(os.path.abspath(local_file), os.path.getsize(local_file), transcript)],
columns=["wav_filename", "wav_filesize", "transcript"],
)
df.to_csv(os.path.join(data_dir, "ldc93s1.csv"), index=False)
if __name__ == "__main__":
_download_and_preprocess_data(sys.argv[1])


@ -1,31 +1,38 @@
#!/usr/bin/env python
from __future__ import absolute_import, division, print_function
# Make sure we can import stuff from util/
# This script needs to be run from the root of the DeepSpeech repository
import sys
import os
sys.path.insert(1, os.path.join(sys.path[0], '..'))
import codecs
import fnmatch
import pandas
import progressbar
import os
import subprocess
import sys
import tarfile
import unicodedata
import pandas
import progressbar
from coqui_stt_training.util.downloader import maybe_download
from sox import Transformer
from util.downloader import maybe_download
from tensorflow.python.platform import gfile
SAMPLE_RATE = 16000
def _download_and_preprocess_data(data_dir):
# Conditionally download data to data_dir
print("Downloading Librivox data set (55GB) into {} if not already present...".format(data_dir))
print(
"Downloading Librivox data set (55GB) into {} if not already present...".format(
data_dir
)
)
with progressbar.ProgressBar(max_value=7, widget=progressbar.AdaptiveETA) as bar:
TRAIN_CLEAN_100_URL = "http://www.openslr.org/resources/12/train-clean-100.tar.gz"
TRAIN_CLEAN_360_URL = "http://www.openslr.org/resources/12/train-clean-360.tar.gz"
TRAIN_OTHER_500_URL = "http://www.openslr.org/resources/12/train-other-500.tar.gz"
TRAIN_CLEAN_100_URL = (
"http://www.openslr.org/resources/12/train-clean-100.tar.gz"
)
TRAIN_CLEAN_360_URL = (
"http://www.openslr.org/resources/12/train-clean-360.tar.gz"
)
TRAIN_OTHER_500_URL = (
"http://www.openslr.org/resources/12/train-other-500.tar.gz"
)
DEV_CLEAN_URL = "http://www.openslr.org/resources/12/dev-clean.tar.gz"
DEV_OTHER_URL = "http://www.openslr.org/resources/12/dev-other.tar.gz"
@ -33,12 +40,20 @@ def _download_and_preprocess_data(data_dir):
TEST_CLEAN_URL = "http://www.openslr.org/resources/12/test-clean.tar.gz"
TEST_OTHER_URL = "http://www.openslr.org/resources/12/test-other.tar.gz"
def filename_of(x): return os.path.split(x)[1]
train_clean_100 = maybe_download(filename_of(TRAIN_CLEAN_100_URL), data_dir, TRAIN_CLEAN_100_URL)
def filename_of(x):
return os.path.split(x)[1]
train_clean_100 = maybe_download(
filename_of(TRAIN_CLEAN_100_URL), data_dir, TRAIN_CLEAN_100_URL
)
bar.update(0)
train_clean_360 = maybe_download(filename_of(TRAIN_CLEAN_360_URL), data_dir, TRAIN_CLEAN_360_URL)
train_clean_360 = maybe_download(
filename_of(TRAIN_CLEAN_360_URL), data_dir, TRAIN_CLEAN_360_URL
)
bar.update(1)
train_other_500 = maybe_download(filename_of(TRAIN_OTHER_500_URL), data_dir, TRAIN_OTHER_500_URL)
train_other_500 = maybe_download(
filename_of(TRAIN_OTHER_500_URL), data_dir, TRAIN_OTHER_500_URL
)
bar.update(2)
dev_clean = maybe_download(filename_of(DEV_CLEAN_URL), data_dir, DEV_CLEAN_URL)
@ -46,9 +61,13 @@ def _download_and_preprocess_data(data_dir):
dev_other = maybe_download(filename_of(DEV_OTHER_URL), data_dir, DEV_OTHER_URL)
bar.update(4)
test_clean = maybe_download(filename_of(TEST_CLEAN_URL), data_dir, TEST_CLEAN_URL)
test_clean = maybe_download(
filename_of(TEST_CLEAN_URL), data_dir, TEST_CLEAN_URL
)
bar.update(5)
test_other = maybe_download(filename_of(TEST_OTHER_URL), data_dir, TEST_OTHER_URL)
test_other = maybe_download(
filename_of(TEST_OTHER_URL), data_dir, TEST_OTHER_URL
)
bar.update(6)
# Conditionally extract LibriSpeech data
@ -59,11 +78,17 @@ def _download_and_preprocess_data(data_dir):
LIBRIVOX_DIR = "LibriSpeech"
work_dir = os.path.join(data_dir, LIBRIVOX_DIR)
_maybe_extract(data_dir, os.path.join(LIBRIVOX_DIR, "train-clean-100"), train_clean_100)
_maybe_extract(
data_dir, os.path.join(LIBRIVOX_DIR, "train-clean-100"), train_clean_100
)
bar.update(0)
_maybe_extract(data_dir, os.path.join(LIBRIVOX_DIR, "train-clean-360"), train_clean_360)
_maybe_extract(
data_dir, os.path.join(LIBRIVOX_DIR, "train-clean-360"), train_clean_360
)
bar.update(1)
_maybe_extract(data_dir, os.path.join(LIBRIVOX_DIR, "train-other-500"), train_other_500)
_maybe_extract(
data_dir, os.path.join(LIBRIVOX_DIR, "train-other-500"), train_other_500
)
bar.update(2)
_maybe_extract(data_dir, os.path.join(LIBRIVOX_DIR, "dev-clean"), dev_clean)
@ -89,28 +114,48 @@ def _download_and_preprocess_data(data_dir):
# data_dir/LibriSpeech/split-wav/1-2-2.txt
# ...
print("Converting FLAC to WAV and splitting transcriptions...")
with progressbar.ProgressBar(max_value=7, widget=progressbar.AdaptiveETA) as bar:
train_100 = _convert_audio_and_split_sentences(work_dir, "train-clean-100", "train-clean-100-wav")
with progressbar.ProgressBar(max_value=7, widget=progressbar.AdaptiveETA) as bar:
train_100 = _convert_audio_and_split_sentences(
work_dir, "train-clean-100", "train-clean-100-wav"
)
bar.update(0)
train_360 = _convert_audio_and_split_sentences(work_dir, "train-clean-360", "train-clean-360-wav")
train_360 = _convert_audio_and_split_sentences(
work_dir, "train-clean-360", "train-clean-360-wav"
)
bar.update(1)
train_500 = _convert_audio_and_split_sentences(work_dir, "train-other-500", "train-other-500-wav")
train_500 = _convert_audio_and_split_sentences(
work_dir, "train-other-500", "train-other-500-wav"
)
bar.update(2)
dev_clean = _convert_audio_and_split_sentences(work_dir, "dev-clean", "dev-clean-wav")
dev_clean = _convert_audio_and_split_sentences(
work_dir, "dev-clean", "dev-clean-wav"
)
bar.update(3)
dev_other = _convert_audio_and_split_sentences(work_dir, "dev-other", "dev-other-wav")
dev_other = _convert_audio_and_split_sentences(
work_dir, "dev-other", "dev-other-wav"
)
bar.update(4)
test_clean = _convert_audio_and_split_sentences(work_dir, "test-clean", "test-clean-wav")
test_clean = _convert_audio_and_split_sentences(
work_dir, "test-clean", "test-clean-wav"
)
bar.update(5)
test_other = _convert_audio_and_split_sentences(work_dir, "test-other", "test-other-wav")
test_other = _convert_audio_and_split_sentences(
work_dir, "test-other", "test-other-wav"
)
bar.update(6)
# Write sets to disk as CSV files
train_100.to_csv(os.path.join(data_dir, "librivox-train-clean-100.csv"), index=False)
train_360.to_csv(os.path.join(data_dir, "librivox-train-clean-360.csv"), index=False)
train_500.to_csv(os.path.join(data_dir, "librivox-train-other-500.csv"), index=False)
train_100.to_csv(
os.path.join(data_dir, "librivox-train-clean-100.csv"), index=False
)
train_360.to_csv(
os.path.join(data_dir, "librivox-train-clean-360.csv"), index=False
)
train_500.to_csv(
os.path.join(data_dir, "librivox-train-other-500.csv"), index=False
)
dev_clean.to_csv(os.path.join(data_dir, "librivox-dev-clean.csv"), index=False)
dev_other.to_csv(os.path.join(data_dir, "librivox-dev-other.csv"), index=False)
@ -118,6 +163,7 @@ def _download_and_preprocess_data(data_dir):
test_clean.to_csv(os.path.join(data_dir, "librivox-test-clean.csv"), index=False)
test_other.to_csv(os.path.join(data_dir, "librivox-test-other.csv"), index=False)
def _maybe_extract(data_dir, extracted_data, archive):
# If data_dir/extracted_data does not exist, extract archive in data_dir
if not gfile.Exists(os.path.join(data_dir, extracted_data)):
@ -125,6 +171,7 @@ def _maybe_extract(data_dir, extracted_data, archive):
tar.extractall(data_dir)
tar.close()
def _convert_audio_and_split_sentences(extracted_dir, data_set, dest_dir):
source_dir = os.path.join(extracted_dir, data_set)
target_dir = os.path.join(extracted_dir, dest_dir)
@ -147,20 +194,22 @@ def _convert_audio_and_split_sentences(extracted_dir, data_set, dest_dir):
# We also convert the corresponding FLACs to WAV in the same pass
files = []
for root, dirnames, filenames in os.walk(source_dir):
for filename in fnmatch.filter(filenames, '*.trans.txt'):
for filename in fnmatch.filter(filenames, "*.trans.txt"):
trans_filename = os.path.join(root, filename)
with codecs.open(trans_filename, "r", "utf-8") as fin:
for line in fin:
# Parse each segment line
first_space = line.find(" ")
seqid, transcript = line[:first_space], line[first_space+1:]
seqid, transcript = line[:first_space], line[first_space + 1 :]
# We need to do the encode-decode dance here because encode
# returns a bytes() object on Python 3, and text_to_char_array
# expects a string.
transcript = unicodedata.normalize("NFKD", transcript) \
.encode("ascii", "ignore") \
.decode("ascii", "ignore")
transcript = (
unicodedata.normalize("NFKD", transcript)
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
transcript = transcript.lower().strip()
@ -168,12 +217,17 @@ def _convert_audio_and_split_sentences(extracted_dir, data_set, dest_dir):
flac_file = os.path.join(root, seqid + ".flac")
wav_file = os.path.join(target_dir, seqid + ".wav")
if not os.path.exists(wav_file):
Transformer().build(flac_file, wav_file)
tfm = Transformer()
tfm.set_output_format(rate=SAMPLE_RATE)
tfm.build(flac_file, wav_file)
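# set_output_format(rate=SAMPLE_RATE) pins the WAV output to 16 kHz explicitly
# instead of inheriting whatever rate the source FLAC happens to use.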
wav_filesize = os.path.getsize(wav_file)
files.append((os.path.abspath(wav_file), wav_filesize, transcript))
return pandas.DataFrame(data=files, columns=["wav_filename", "wav_filesize", "transcript"])
return pandas.DataFrame(
data=files, columns=["wav_filename", "wav_filesize", "transcript"]
)
if __name__ == "__main__":
_download_and_preprocess_data(sys.argv[1])

266
bin/import_lingua_libre.py Executable file

@ -0,0 +1,266 @@
#!/usr/bin/env python3
import argparse
import csv
import os
import re
import subprocess
import unicodedata
import zipfile
from glob import glob
from multiprocessing import Pool
import progressbar
import sox
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
get_importers_parser,
get_validate_label,
print_import_report,
)
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
BITDEPTH = 16
N_CHANNELS = 1
MAX_SECS = 10
ARCHIVE_DIR_NAME = "lingua_libre"
ARCHIVE_NAME = "Q{qId}-{iso639_3}-{language_English_name}.zip"
ARCHIVE_URL = "https://lingualibre.fr/datasets/" + ARCHIVE_NAME
def _download_and_preprocess_data(target_dir):
# Making path absolute
target_dir = os.path.abspath(target_dir)
# Conditionally download data
archive_path = maybe_download(ARCHIVE_NAME, target_dir, ARCHIVE_URL)
# Conditionally extract data
_maybe_extract(target_dir, ARCHIVE_DIR_NAME, archive_path)
# Produce CSV files and convert ogg data to wav
_maybe_convert_sets(target_dir, ARCHIVE_DIR_NAME)
def _maybe_extract(target_dir, extracted_data, archive_path):
# If target_dir/extracted_data does not exist, extract archive in target_dir
extracted_path = os.path.join(target_dir, extracted_data)
if not os.path.exists(extracted_path):
print('No directory "%s" - extracting archive...' % extracted_path)
if not os.path.isdir(extracted_path):
os.mkdir(extracted_path)
with zipfile.ZipFile(archive_path) as zip_f:
zip_f.extractall(extracted_path)
else:
print('Found directory "%s" - not extracting it from archive.' % archive_path)
def one_sample(sample):
""" Take a audio file, and optionally convert it to 16kHz WAV """
ogg_filename = sample[0]
# Storing wav files next to the ogg ones - just with a different suffix
wav_filename = os.path.splitext(ogg_filename)[0] + ".wav"
_maybe_convert_wav(ogg_filename, wav_filename)
file_size = -1
frames = 0
if os.path.exists(wav_filename):
file_size = os.path.getsize(wav_filename)
frames = int(
subprocess.check_output(
["soxi", "-s", wav_filename], stderr=subprocess.STDOUT
)
)
label = label_filter(sample[1])
rows = []
counter = get_counter()
if file_size == -1:
# Excluding samples that failed upon conversion
counter["failed"] += 1
elif label is None:
# Excluding samples that failed on label validation
counter["invalid_label"] += 1
elif int(frames / SAMPLE_RATE * 1000 / 10 / 2) < len(str(label)):
# Excluding samples that are too short to fit the transcript
counter["too_short"] += 1
elif frames / SAMPLE_RATE > MAX_SECS:
# Excluding very long samples to keep a reasonable batch-size
counter["too_long"] += 1
else:
# This one is good - keep it for the target CSV
rows.append((wav_filename, file_size, label))
counter["imported_time"] += frames
counter["all"] += 1
counter["total_time"] += frames
return (counter, rows)
def _maybe_convert_sets(target_dir, extracted_data):
extracted_dir = os.path.join(target_dir, extracted_data)
# override existing CSV with normalized one
target_csv_template = os.path.join(
target_dir, ARCHIVE_DIR_NAME + "_" + ARCHIVE_NAME.replace(".zip", "_{}.csv")
)
if os.path.isfile(target_csv_template.format("train")):
return
ogg_root_dir = os.path.join(extracted_dir, ARCHIVE_NAME.replace(".zip", ""))
# Get audiofile path and transcript for each sentence in tsv
samples = []
glob_dir = os.path.join(ogg_root_dir, "**/*.ogg")
for record in glob(glob_dir, recursive=True):
record_file = record.replace(ogg_root_dir + os.path.sep, "")
if record_filter(record_file):
samples.append(
(
os.path.join(ogg_root_dir, record_file),
os.path.splitext(os.path.basename(record_file))[0],
)
)
counter = get_counter()
num_samples = len(samples)
rows = []
print("Importing ogg files...")
pool = Pool()
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
counter += processed[0]
rows += processed[1]
bar.update(i)
bar.update(num_samples)
pool.close()
pool.join()
with open(
target_csv_template.format("train"), "w", encoding="utf-8", newline=""
) as train_csv_file: # 80%
with open(
target_csv_template.format("dev"), "w", encoding="utf-8", newline=""
) as dev_csv_file: # 10%
with open(
target_csv_template.format("test"), "w", encoding="utf-8", newline=""
) as test_csv_file: # 10%
train_writer = csv.DictWriter(train_csv_file, fieldnames=FIELDNAMES)
train_writer.writeheader()
dev_writer = csv.DictWriter(dev_csv_file, fieldnames=FIELDNAMES)
dev_writer.writeheader()
test_writer = csv.DictWriter(test_csv_file, fieldnames=FIELDNAMES)
test_writer.writeheader()
for i, item in enumerate(rows):
transcript = validate_label(item[2])
if not transcript:
continue
wav_filename = os.path.join(
ogg_root_dir, item[0].replace(".ogg", ".wav")
)
i_mod = i % 10
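# Deterministic round-robin split on the row index: i % 10 == 0 goes to test,
# == 1 goes to dev, everything else to train, i.e. the 80/10/10 split noted above.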
if i_mod == 0:
writer = test_writer
elif i_mod == 1:
writer = dev_writer
else:
writer = train_writer
writer.writerow(
dict(
wav_filename=wav_filename,
wav_filesize=os.path.getsize(wav_filename),
transcript=transcript,
)
)
imported_samples = get_imported_samples(counter)
assert counter["all"] == num_samples
assert len(rows) == imported_samples
print_import_report(counter, SAMPLE_RATE, MAX_SECS)
def _maybe_convert_wav(ogg_filename, wav_filename):
if not os.path.exists(wav_filename):
transformer = sox.Transformer()
transformer.convert(
samplerate=SAMPLE_RATE, n_channels=N_CHANNELS, bitdepth=BITDEPTH
)
try:
transformer.build(ogg_filename, wav_filename)
except sox.core.SoxError as ex:
print("SoX processing error", ex, ogg_filename, wav_filename)
def handle_args():
parser = get_importers_parser(
description="Importer for LinguaLibre dataset. Check https://lingualibre.fr/wiki/Help:Download_from_LinguaLibre for details."
)
parser.add_argument(dest="target_dir")
parser.add_argument(
"--qId", type=int, required=True, help="LinguaLibre language qId"
)
parser.add_argument(
"--iso639-3", type=str, required=True, help="ISO639-3 language code"
)
parser.add_argument(
"--english-name", type=str, required=True, help="English name of the language"
)
parser.add_argument(
"--filter_alphabet",
help="Exclude samples with characters not in provided alphabet",
)
parser.add_argument(
"--normalize",
action="store_true",
help="Converts diacritic characters to their base ones",
)
parser.add_argument(
"--bogus-records",
type=argparse.FileType("r"),
required=False,
help="Text file listing well-known bogus record to skip from importing, from https://lingualibre.fr/wiki/LinguaLibre:Misleading_items",
)
return parser.parse_args()
if __name__ == "__main__":
CLI_ARGS = handle_args()
ALPHABET = Alphabet(CLI_ARGS.filter_alphabet) if CLI_ARGS.filter_alphabet else None
validate_label = get_validate_label(CLI_ARGS)
bogus_regexes = []
if CLI_ARGS.bogus_records:
for line in CLI_ARGS.bogus_records:
bogus_regexes.append(re.compile(line.strip()))
def record_filter(path):
if any(regex.match(path) for regex in bogus_regexes):
print("Reject", path)
return False
return True
def label_filter(label):
if CLI_ARGS.normalize:
label = (
unicodedata.normalize("NFKD", label.strip())
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
label = validate_label(label)
if ALPHABET and label and not ALPHABET.CanEncode(label):
label = None
return label
ARCHIVE_NAME = ARCHIVE_NAME.format(
qId=CLI_ARGS.qId,
iso639_3=CLI_ARGS.iso639_3,
language_English_name=CLI_ARGS.english_name,
)
ARCHIVE_URL = ARCHIVE_URL.format(
qId=CLI_ARGS.qId,
iso639_3=CLI_ARGS.iso639_3,
language_English_name=CLI_ARGS.english_name,
)
_download_and_preprocess_data(target_dir=CLI_ARGS.target_dir)

242
bin/import_m-ailabs.py Executable file

@ -0,0 +1,242 @@
#!/usr/bin/env python3
# pylint: disable=invalid-name
import csv
import os
import subprocess
import tarfile
import unicodedata
from glob import glob
from multiprocessing import Pool
import progressbar
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
get_importers_parser,
get_validate_label,
print_import_report,
)
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
MAX_SECS = 15
ARCHIVE_DIR_NAME = "{language}"
ARCHIVE_NAME = "{language}.tgz"
ARCHIVE_URL = "http://www.caito.de/data/Training/stt_tts/" + ARCHIVE_NAME
def _download_and_preprocess_data(target_dir):
# Making path absolute
target_dir = os.path.abspath(target_dir)
# Conditionally download data
archive_path = maybe_download(ARCHIVE_NAME, target_dir, ARCHIVE_URL)
# Conditionally extract data
_maybe_extract(target_dir, ARCHIVE_DIR_NAME, archive_path)
# Produce CSV files
_maybe_convert_sets(target_dir, ARCHIVE_DIR_NAME)
def _maybe_extract(target_dir, extracted_data, archive_path):
# If target_dir/extracted_data does not exist, extract archive in target_dir
extracted_path = os.path.join(target_dir, extracted_data)
if not os.path.exists(extracted_path):
print('No directory "%s" - extracting archive...' % extracted_path)
if not os.path.isdir(extracted_path):
os.mkdir(extracted_path)
tar = tarfile.open(archive_path)
tar.extractall(extracted_path)
tar.close()
else:
print('Found directory "%s" - not extracting it from archive.' % archive_path)
def one_sample(sample):
""" Take a audio file, and optionally convert it to 16kHz WAV """
wav_filename = sample[0]
file_size = -1
frames = 0
if os.path.exists(wav_filename):
tmp_filename = os.path.splitext(wav_filename)[0] + ".tmp.wav"
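# Re-encode in place to 16 kHz, mono, 16-bit via sox: write to a temp file first,
# then rename over the original so a failed conversion never clobbers the source.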
subprocess.check_call(
[
"sox",
wav_filename,
"-r",
str(SAMPLE_RATE),
"-c",
"1",
"-b",
"16",
tmp_filename,
],
stderr=subprocess.STDOUT,
)
os.rename(tmp_filename, wav_filename)
file_size = os.path.getsize(wav_filename)
frames = int(
subprocess.check_output(
["soxi", "-s", wav_filename], stderr=subprocess.STDOUT
)
)
label = label_filter(sample[1])
counter = get_counter()
rows = []
if file_size == -1:
# Excluding samples that failed upon conversion
print("conversion failure", wav_filename)
counter["failed"] += 1
elif label is None:
# Excluding samples that failed on label validation
counter["invalid_label"] += 1
elif int(frames / SAMPLE_RATE * 1000 / 15 / 2) < len(str(label)):
# Excluding samples that are too short to fit the transcript
counter["too_short"] += 1
elif frames / SAMPLE_RATE > MAX_SECS:
# Excluding very long samples to keep a reasonable batch-size
counter["too_long"] += 1
else:
# This one is good - keep it for the target CSV
rows.append((wav_filename, file_size, label))
counter["imported_time"] += frames
counter["all"] += 1
counter["total_time"] += frames
return (counter, rows)
def _maybe_convert_sets(target_dir, extracted_data):
extracted_dir = os.path.join(target_dir, extracted_data)
# override existing CSV with normalized one
target_csv_template = os.path.join(
target_dir, ARCHIVE_DIR_NAME, ARCHIVE_NAME.replace(".tgz", "_{}.csv")
)
if os.path.isfile(target_csv_template.format("train")):
return
wav_root_dir = os.path.join(extracted_dir)
# Get audiofile path and transcript for each sentence in tsv
samples = []
glob_dir = os.path.join(wav_root_dir, "**/metadata.csv")
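# Each book directory carries an LJSpeech-style metadata.csv with "|"-separated
# columns (assumed here to be: file id | raw transcript | normalised transcript);
# the normalised text in column 2 is used as the training transcript below.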
for record in glob(glob_dir, recursive=True):
if any(
map(lambda sk: sk in record, SKIP_LIST)
): # pylint: disable=cell-var-from-loop
continue
with open(record, "r") as rec:
for re in rec.readlines():
re = re.strip().split("|")
audio = os.path.join(os.path.dirname(record), "wavs", re[0] + ".wav")
transcript = re[2]
samples.append((audio, transcript))
counter = get_counter()
num_samples = len(samples)
rows = []
print("Importing WAV files...")
pool = Pool()
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
counter += processed[0]
rows += processed[1]
bar.update(i)
bar.update(num_samples)
pool.close()
pool.join()
with open(
target_csv_template.format("train"), "w", encoding="utf-8", newline=""
) as train_csv_file: # 80%
with open(
target_csv_template.format("dev"), "w", encoding="utf-8", newline=""
) as dev_csv_file: # 10%
with open(
target_csv_template.format("test"), "w", encoding="utf-8", newline=""
) as test_csv_file: # 10%
train_writer = csv.DictWriter(train_csv_file, fieldnames=FIELDNAMES)
train_writer.writeheader()
dev_writer = csv.DictWriter(dev_csv_file, fieldnames=FIELDNAMES)
dev_writer.writeheader()
test_writer = csv.DictWriter(test_csv_file, fieldnames=FIELDNAMES)
test_writer.writeheader()
for i, item in enumerate(rows):
transcript = validate_label(item[2])
if not transcript:
continue
wav_filename = item[0]
i_mod = i % 10
if i_mod == 0:
writer = test_writer
elif i_mod == 1:
writer = dev_writer
else:
writer = train_writer
writer.writerow(
dict(
wav_filename=os.path.relpath(wav_filename, extracted_dir),
wav_filesize=os.path.getsize(wav_filename),
transcript=transcript,
)
)
imported_samples = get_imported_samples(counter)
assert counter["all"] == num_samples
assert len(rows) == imported_samples
print_import_report(counter, SAMPLE_RATE, MAX_SECS)
def handle_args():
parser = get_importers_parser(
description="Importer for M-AILABS dataset. https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/."
)
parser.add_argument(dest="target_dir")
parser.add_argument(
"--filter_alphabet",
help="Exclude samples with characters not in provided alphabet",
)
parser.add_argument(
"--normalize",
action="store_true",
help="Converts diacritic characters to their base ones",
)
parser.add_argument(
"--skiplist",
type=str,
default="",
help="Directories / books to skip, comma separated",
)
parser.add_argument(
"--language", required=True, type=str, help="Dataset language to use"
)
return parser.parse_args()
if __name__ == "__main__":
CLI_ARGS = handle_args()
ALPHABET = Alphabet(CLI_ARGS.filter_alphabet) if CLI_ARGS.filter_alphabet else None
SKIP_LIST = filter(None, CLI_ARGS.skiplist.split(","))
validate_label = get_validate_label(CLI_ARGS)
def label_filter(label):
if CLI_ARGS.normalize:
label = (
unicodedata.normalize("NFKD", label.strip())
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
label = validate_label(label)
if ALPHABET and label and not ALPHABET.CanEncode(label):
label = None
return label
ARCHIVE_DIR_NAME = ARCHIVE_DIR_NAME.format(language=CLI_ARGS.language)
ARCHIVE_NAME = ARCHIVE_NAME.format(language=CLI_ARGS.language)
ARCHIVE_URL = ARCHIVE_URL.format(language=CLI_ARGS.language)
_download_and_preprocess_data(target_dir=CLI_ARGS.target_dir)

127
bin/import_magicdata.py Executable file

@ -0,0 +1,127 @@
#!/usr/bin/env python
import glob
import os
import tarfile
import wave
import pandas
from coqui_stt_training.util.importers import get_importers_parser
COLUMN_NAMES = ["wav_filename", "wav_filesize", "transcript"]
def extract(archive_path, target_dir):
print("Extracting {} into {}...".format(archive_path, target_dir))
with tarfile.open(archive_path) as tar:
tar.extractall(target_dir)
def is_file_truncated(wav_filename, wav_filesize):
with wave.open(wav_filename, mode="rb") as fin:
assert fin.getframerate() == 16000
assert fin.getsampwidth() == 2
assert fin.getnchannels() == 1
header_duration = fin.getnframes() / fin.getframerate()
filesize_duration = (wav_filesize - 44) / 16000 / 2
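# For an intact 16 kHz/16-bit/mono WAV the header's frame count and the payload
# size (file size minus the 44-byte header) imply the same duration; a mismatch
# means the file was truncated after the header was written.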
return header_duration != filesize_duration
def preprocess_data(folder_with_archives, target_dir):
# First extract subset archives
for subset in ("train", "dev", "test"):
extract(
os.path.join(
folder_with_archives, "magicdata_{}_set.tar.gz".format(subset)
),
target_dir,
)
# Folder structure is now:
# - magicdata_{train,dev,test}.tar.gz
# - magicdata/
# - train/*.wav
# - train/TRANS.txt
# - dev/*.wav
# - dev/TRANS.txt
# - test/*.wav
# - test/TRANS.txt
# The TRANS files are CSVs with three columns, one containing the WAV file
# name, one containing the speaker ID, and one containing the transcription
def load_set(set_path):
transcripts = pandas.read_csv(
os.path.join(set_path, "TRANS.txt"), sep="\t", index_col=0
)
glob_path = os.path.join(set_path, "*", "*.wav")
set_files = []
for wav in glob.glob(glob_path):
try:
wav_filename = wav
wav_filesize = os.path.getsize(wav)
transcript_key = os.path.basename(wav)
transcript = transcripts.loc[transcript_key, "Transcription"]
# Some files in this dataset are truncated, the header duration
# doesn't match the file size. This causes errors at training
# time, so check here if things are fine before including a file
if is_file_truncated(wav_filename, wav_filesize):
print(
"Warning: File {} is corrupted, header duration does "
"not match file size. Ignoring.".format(wav_filename)
)
continue
set_files.append((wav_filename, wav_filesize, transcript))
except KeyError:
print("Warning: Missing transcript for WAV file {}.".format(wav))
return set_files
for subset in ("train", "dev", "test"):
print("Loading {} set samples...".format(subset))
subset_files = load_set(os.path.join(target_dir, subset))
df = pandas.DataFrame(data=subset_files, columns=COLUMN_NAMES)
# Trim train set to under 10s
if subset == "train":
durations = (df["wav_filesize"] - 44) / 16000 / 2
df = df[durations <= 10.0]
print("Trimming {} samples > 10 seconds".format((durations > 10.0).sum()))
with_noise = df["transcript"].str.contains(r"\[(?:FIL|SPK)\]")
df = df[~with_noise]
print(
"Trimming {} samples with noise ([FIL] or [SPK])".format(
sum(with_noise)
)
)
dest_csv = os.path.join(target_dir, "magicdata_{}.csv".format(subset))
print("Saving {} set into {}...".format(subset, dest_csv))
df.to_csv(dest_csv, index=False)
def main():
# https://openslr.org/68/
parser = get_importers_parser(description="Import MAGICDATA corpus")
parser.add_argument(
"folder_with_archives",
help="Path to folder containing magicdata_{train,dev,test}.tar.gz",
)
parser.add_argument(
"--target_dir",
default="",
help="Target folder to extract files into and put the resulting CSVs. Defaults to a folder called magicdata next to the archives",
)
params = parser.parse_args()
if not params.target_dir:
params.target_dir = os.path.join(params.folder_with_archives, "magicdata")
preprocess_data(params.folder_with_archives, params.target_dir)
if __name__ == "__main__":
main()

99
bin/import_mls_english.py Normal file

@ -0,0 +1,99 @@
#!/usr/bin/env python
import argparse
import ctypes
import os
from pathlib import Path
import pandas
import pyogg
from tqdm import tqdm
def read_ogg_opus_duration(ogg_file_path):
error = ctypes.c_int()
opusfile = pyogg.opus.op_open_file(
ogg_file_path.encode("utf-8"), ctypes.pointer(error)
)
if error.value != 0:
raise ValueError(
("Ogg/Opus file could not be read." "Error code: {}").format(error.value)
)
pcm_buffer_size = pyogg.opus.op_pcm_total(opusfile, -1)
channel_count = pyogg.opus.op_channel_count(opusfile, -1)
sample_rate = 48000 # opus files are always 48kHz
sample_width = 2 # always 16-bit
pyogg.opus.op_free(opusfile)
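# op_pcm_total() counts decoded PCM samples at the fixed 48 kHz Opus decode rate,
# so dividing by sample_rate below yields the clip duration in seconds.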
return pcm_buffer_size / sample_rate
def main(root_dir):
for subset in (
"train",
"dev",
"test",
):
print("Processing {} subset...".format(subset))
with open(Path(root_dir) / subset / "transcripts.txt") as fin:
subset_entries = []
for i, line in tqdm(enumerate(fin)):
audio_id, transcript = line.split("\t")
audio_id_parts = audio_id.split("_")
# e.g. 4800_10003_000000 -> train/audio/4800/10003/4800_10003_000000.opus
audio_path = (
Path(root_dir)
/ subset
/ "audio"
/ audio_id_parts[0]
/ audio_id_parts[1]
/ "{}.opus".format(audio_id)
)
audio_duration = read_ogg_opus_duration(audio_path)
# TODO: support other languages
transcript = (
transcript.strip()
.replace("-", " ")
.replace("ñ", "n")
.replace(".", "")
.translate(
{
ord(ch): None
for ch in (
"а",
"в",
"е",
"и",
"к",
"м",
"н",
"о",
"п",
"р",
"т",
"ы",
"я",
)
}
)
)
subset_entries.append(
(
audio_path.relative_to(root_dir),
audio_duration,
transcript.strip(),
)
)
df = pandas.DataFrame(
columns=["wav_filename", "wav_filesize", "transcript"],
data=subset_entries,
)
csv_name = Path(root_dir) / "{}.csv".format(subset)
df.to_csv(csv_name, index=False)
print("Wrote {}".format(csv_name))
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("root_dir", help="Path to the mls_english_opus directory.")
args = parser.parse_args()
main(args.root_dir)

102
bin/import_primewords.py Executable file

@ -0,0 +1,102 @@
#!/usr/bin/env python
import glob
import json
import os
import tarfile
import numpy as np
import pandas
from coqui_stt_training.util.importers import get_importers_parser
COLUMN_NAMES = ["wav_filename", "wav_filesize", "transcript"]
def extract(archive_path, target_dir):
print("Extracting {} into {}...".format(archive_path, target_dir))
with tarfile.open(archive_path) as tar:
tar.extractall(target_dir)
def preprocess_data(tgz_file, target_dir):
# First extract main archive and sub-archives
extract(tgz_file, target_dir)
main_folder = os.path.join(target_dir, "primewords_md_2018_set1")
# Folder structure is now:
# - primewords_md_2018_set1/
# - audio_files/
# - [0-f]/[00-0f]/*.wav
# - set1_transcript.json
transcripts_path = os.path.join(main_folder, "set1_transcript.json")
with open(transcripts_path) as fin:
transcripts = json.load(fin)
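# set1_transcript.json is a list of objects with (at least) "file" and "text" keys;
# flatten it into a wav-filename -> transcript lookup for the glob pass below.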
transcripts = {entry["file"]: entry["text"] for entry in transcripts}
def load_set(glob_path):
set_files = []
for wav in glob.glob(glob_path):
try:
wav_filename = wav
wav_filesize = os.path.getsize(wav)
transcript_key = os.path.basename(wav)
transcript = transcripts[transcript_key]
set_files.append((wav_filename, wav_filesize, transcript))
except KeyError:
print("Warning: Missing transcript for WAV file {}.".format(wav))
return set_files
# Load all files, then deterministically split into train/dev/test sets
all_files = load_set(os.path.join(main_folder, "audio_files", "*", "*", "*.wav"))
df = pandas.DataFrame(data=all_files, columns=COLUMN_NAMES)
df.sort_values(by="wav_filename", inplace=True)
indices = np.arange(0, len(df))
np.random.seed(12345)
np.random.shuffle(indices)
# Total corpus size: 50287 samples. 5000 samples gives us 99% confidence
# level with a margin of error of under 2%.
test_indices = indices[-5000:]
dev_indices = indices[-10000:-5000]
train_indices = indices[:-10000]
train_files = df.iloc[train_indices]
durations = (train_files["wav_filesize"] - 44) / 16000 / 2
train_files = train_files[durations <= 15.0]
print("Trimming {} samples > 15 seconds".format((durations > 15.0).sum()))
dest_csv = os.path.join(target_dir, "primewords_train.csv")
print("Saving train set into {}...".format(dest_csv))
train_files.to_csv(dest_csv, index=False)
dev_files = df.iloc[dev_indices]
dest_csv = os.path.join(target_dir, "primewords_dev.csv")
print("Saving dev set into {}...".format(dest_csv))
dev_files.to_csv(dest_csv, index=False)
test_files = df.iloc[test_indices]
dest_csv = os.path.join(target_dir, "primewords_test.csv")
print("Saving test set into {}...".format(dest_csv))
test_files.to_csv(dest_csv, index=False)
def main():
# https://www.openslr.org/47/
parser = get_importers_parser(description="Import Primewords Chinese corpus set 1")
parser.add_argument("tgz_file", help="Path to primewords_md_2018_set1.tar.gz")
parser.add_argument(
"--target_dir",
default="",
help="Target folder to extract files into and put the resulting CSVs. Defaults to same folder as the main archive.",
)
params = parser.parse_args()
if not params.target_dir:
params.target_dir = os.path.dirname(params.tgz_file)
preprocess_data(params.tgz_file, params.target_dir)
if __name__ == "__main__":
main()

236
bin/import_slr57.py Executable file

@ -0,0 +1,236 @@
#!/usr/bin/env python3
import csv
import os
import subprocess
import tarfile
import unicodedata
from glob import glob
from multiprocessing import Pool
import progressbar
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.importers import (
get_counter,
get_imported_samples,
get_importers_parser,
get_validate_label,
print_import_report,
)
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
MAX_SECS = 15
ARCHIVE_DIR_NAME = "African_Accented_French"
ARCHIVE_NAME = "African_Accented_French.tar.gz"
ARCHIVE_URL = "http://www.openslr.org/resources/57/" + ARCHIVE_NAME
def _download_and_preprocess_data(target_dir):
# Making path absolute
target_dir = os.path.abspath(target_dir)
# Conditionally download data
archive_path = maybe_download(ARCHIVE_NAME, target_dir, ARCHIVE_URL)
# Conditionally extract data
_maybe_extract(target_dir, ARCHIVE_DIR_NAME, archive_path)
# Produce CSV files
_maybe_convert_sets(target_dir, ARCHIVE_DIR_NAME)
def _maybe_extract(target_dir, extracted_data, archive_path):
# If target_dir/extracted_data does not exist, extract archive in target_dir
extracted_path = os.path.join(target_dir, extracted_data)
if not os.path.exists(extracted_path):
print('No directory "%s" - extracting archive...' % extracted_path)
if not os.path.isdir(extracted_path):
os.mkdir(extracted_path)
tar = tarfile.open(archive_path)
tar.extractall(target_dir)
tar.close()
else:
print('Found directory "%s" - not extracting it from archive.' % archive_path)
def one_sample(sample):
""" Take a audio file, and optionally convert it to 16kHz WAV """
wav_filename = sample[0]
file_size = -1
frames = 0
if os.path.exists(wav_filename):
file_size = os.path.getsize(wav_filename)
frames = int(
subprocess.check_output(
["soxi", "-s", wav_filename], stderr=subprocess.STDOUT
)
)
label = label_filter(sample[1])
counter = get_counter()
rows = []
if file_size == -1:
# Excluding samples that failed upon conversion
counter["failed"] += 1
elif label is None:
# Excluding samples that failed on label validation
counter["invalid_label"] += 1
elif int(frames / SAMPLE_RATE * 1000 / 15 / 2) < len(str(label)):
# Excluding samples that are too short to fit the transcript
counter["too_short"] += 1
elif frames / SAMPLE_RATE > MAX_SECS:
# Excluding very long samples to keep a reasonable batch-size
counter["too_long"] += 1
else:
# This one is good - keep it for the target CSV
rows.append((wav_filename, file_size, label))
counter["imported_time"] += frames
counter["all"] += 1
counter["total_time"] += frames
return (counter, rows)
def _maybe_convert_sets(target_dir, extracted_data):
extracted_dir = os.path.join(target_dir, extracted_data)
# override existing CSV with normalized one
target_csv_template = os.path.join(
target_dir, ARCHIVE_DIR_NAME, ARCHIVE_NAME.replace(".tar.gz", "_{}.csv")
)
if os.path.isfile(target_csv_template.format("train")):
return
wav_root_dir = os.path.join(extracted_dir)
all_files = [
"transcripts/train/yaounde/fn_text.txt",
"transcripts/train/ca16_conv/transcripts.txt",
"transcripts/train/ca16_read/conditioned.txt",
"transcripts/dev/niger_west_african_fr/transcripts.txt",
"speech/dev/niger_west_african_fr/niger_wav_file_name_transcript.tsv",
"transcripts/devtest/ca16_read/conditioned.txt",
"transcripts/test/ca16/prompts.txt",
]
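# The listings above come in slightly different formats (space- or tab-separated,
# utterance ids with or without a ".wav"/".tdf" suffix); merge them all into a
# single wav-basename -> transcript map before matching against the audio files.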
transcripts = {}
for tr in all_files:
with open(os.path.join(target_dir, ARCHIVE_DIR_NAME, tr), "r") as tr_source:
for line in tr_source.readlines():
line = line.strip()
if ".tsv" in tr:
sep = " "
else:
sep = " "
audio = os.path.basename(line.split(sep)[0])
if not (".wav" in audio):
if ".tdf" in audio:
audio = audio.replace(".tdf", ".wav")
else:
audio += ".wav"
transcript = " ".join(line.split(sep)[1:])
transcripts[audio] = transcript
# Get audiofile path and transcript for each sentence in tsv
samples = []
glob_dir = os.path.join(wav_root_dir, "**/*.wav")
for record in glob(glob_dir, recursive=True):
record_file = os.path.basename(record)
if record_file in transcripts:
samples.append((record, transcripts[record_file]))
# Keep track of how many samples are good vs. problematic
counter = get_counter()
num_samples = len(samples)
rows = []
print("Importing WAV files...")
pool = Pool()
bar = progressbar.ProgressBar(max_value=num_samples, widgets=SIMPLE_BAR)
for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
counter += processed[0]
rows += processed[1]
bar.update(i)
bar.update(num_samples)
pool.close()
pool.join()
with open(
target_csv_template.format("train"), "w", encoding="utf-8", newline=""
) as train_csv_file: # 80%
with open(
target_csv_template.format("dev"), "w", encoding="utf-8", newline=""
) as dev_csv_file: # 10%
with open(
target_csv_template.format("test"), "w", encoding="utf-8", newline=""
) as test_csv_file: # 10%
train_writer = csv.DictWriter(train_csv_file, fieldnames=FIELDNAMES)
train_writer.writeheader()
dev_writer = csv.DictWriter(dev_csv_file, fieldnames=FIELDNAMES)
dev_writer.writeheader()
test_writer = csv.DictWriter(test_csv_file, fieldnames=FIELDNAMES)
test_writer.writeheader()
for i, item in enumerate(rows):
transcript = validate_label(item[2])
if not transcript:
continue
wav_filename = item[0]
i_mod = i % 10
if i_mod == 0:
writer = test_writer
elif i_mod == 1:
writer = dev_writer
else:
writer = train_writer
writer.writerow(
dict(
wav_filename=wav_filename,
wav_filesize=os.path.getsize(wav_filename),
transcript=transcript,
)
)
imported_samples = get_imported_samples(counter)
assert counter["all"] == num_samples
assert len(rows) == imported_samples
print_import_report(counter, SAMPLE_RATE, MAX_SECS)
def handle_args():
parser = get_importers_parser(
description="Importer for African Accented French dataset. More information on http://www.openslr.org/57/."
)
parser.add_argument(dest="target_dir")
parser.add_argument(
"--filter_alphabet",
help="Exclude samples with characters not in provided alphabet",
)
parser.add_argument(
"--normalize",
action="store_true",
help="Converts diacritic characters to their base ones",
)
return parser.parse_args()
if __name__ == "__main__":
CLI_ARGS = handle_args()
ALPHABET = Alphabet(CLI_ARGS.filter_alphabet) if CLI_ARGS.filter_alphabet else None
validate_label = get_validate_label(CLI_ARGS)
def label_filter(label):
if CLI_ARGS.normalize:
label = (
unicodedata.normalize("NFKD", label.strip())
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
label = validate_label(label)
if ALPHABET and label and not ALPHABET.CanEncode(label):
label = None
return label
_download_and_preprocess_data(target_dir=CLI_ARGS.target_dir)


@ -1,44 +1,38 @@
#!/usr/bin/env python
from __future__ import absolute_import, division, print_function
# Make sure we can import stuff from util/
# This script needs to be run from the root of the DeepSpeech repository
# ensure that you have downloaded the LDC dataset LDC97S62 and tar exists in a folder e.g.
# ./data/swb/swb1_LDC97S62.tgz
# from the deepspeech directory run with: ./bin/import_swb.py ./data/swb/
import sys
import os
sys.path.insert(1, os.path.join(sys.path[0], '..'))
# from the Coqui STT directory run with: ./bin/import_swb.py ./data/swb/
import codecs
import fnmatch
import pandas
import os
import random
import subprocess
import sys
import tarfile
import unicodedata
import wave
import codecs
import tarfile
import requests
from util.text import validate_label
import librosa
import soundfile # <= Has an external dependency on libsndfile
import pandas
import requests
import soundfile # <= Has an external dependency on libsndfile
from coqui_stt_training.util.importers import validate_label_eng as validate_label
# ARCHIVE_NAME refers to ISIP alignments from 01/29/03
ARCHIVE_NAME = 'switchboard_word_alignments.tar.gz'
ARCHIVE_URL = 'http://www.openslr.org/resources/5/'
ARCHIVE_DIR_NAME = 'LDC97S62'
LDC_DATASET = 'swb1_LDC97S62.tgz'
ARCHIVE_NAME = "switchboard_word_alignments.tar.gz"
ARCHIVE_URL = "http://www.openslr.org/resources/5/"
ARCHIVE_DIR_NAME = "LDC97S62"
LDC_DATASET = "swb1_LDC97S62.tgz"
def download_file(folder, url):
# https://stackoverflow.com/a/16696317/738515
local_filename = url.split('/')[-1]
local_filename = url.split("/")[-1]
full_filename = os.path.join(folder, local_filename)
r = requests.get(url, stream=True)
with open(full_filename, 'wb') as f:
for chunk in r.iter_content(chunk_size=1024):
if chunk: # filter out keep-alive new chunks
with open(full_filename, "wb") as f:
for chunk in r.iter_content(chunk_size=1024):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
return full_filename
@ -46,10 +40,10 @@ def download_file(folder, url):
def maybe_download(archive_url, target_dir, ldc_dataset):
# If archive file does not exist, download it...
archive_path = os.path.join(target_dir, ldc_dataset)
ldc_path = archive_url+ldc_dataset
ldc_path = archive_url + ldc_dataset
if not os.path.exists(target_dir):
print('No path "%s" - creating ...' % target_dir)
makedirs(target_dir)
os.makedirs(target_dir)
if not os.path.exists(archive_path):
print('No archive "%s" - downloading...' % archive_path)
@ -65,17 +59,23 @@ def _download_and_preprocess_data(data_dir):
archive_path = os.path.abspath(os.path.join(data_dir, LDC_DATASET))
# Check swb1_LDC97S62.tgz then extract
assert(os.path.isfile(archive_path))
assert os.path.isfile(archive_path)
_extract(target_dir, archive_path)
# Transcripts
transcripts_path = maybe_download(ARCHIVE_URL, target_dir, ARCHIVE_NAME)
_extract(target_dir, transcripts_path)
# Check swb1_d1/2/3/4/swb_ms98_transcriptions
expected_folders = ["swb1_d1","swb1_d2","swb1_d3","swb1_d4","swb_ms98_transcriptions"]
assert(all([os.path.isdir(os.path.join(target_dir,e)) for e in expected_folders]))
expected_folders = [
"swb1_d1",
"swb1_d2",
"swb1_d3",
"swb1_d4",
"swb_ms98_transcriptions",
]
assert all([os.path.isdir(os.path.join(target_dir, e)) for e in expected_folders])
# Conditionally convert swb sph data to wav
_maybe_convert_wav(target_dir, "swb1_d1", "swb1_d1-wav")
_maybe_convert_wav(target_dir, "swb1_d2", "swb1_d2-wav")
@ -83,13 +83,21 @@ def _download_and_preprocess_data(data_dir):
_maybe_convert_wav(target_dir, "swb1_d4", "swb1_d4-wav")
# Conditionally split wav data
d1 = _maybe_split_wav_and_sentences(target_dir, "swb_ms98_transcriptions", "swb1_d1-wav", "swb1_d1-split-wav")
d2 = _maybe_split_wav_and_sentences(target_dir, "swb_ms98_transcriptions", "swb1_d2-wav", "swb1_d2-split-wav")
d3 = _maybe_split_wav_and_sentences(target_dir, "swb_ms98_transcriptions", "swb1_d3-wav", "swb1_d3-split-wav")
d4 = _maybe_split_wav_and_sentences(target_dir, "swb_ms98_transcriptions", "swb1_d4-wav", "swb1_d4-split-wav")
d1 = _maybe_split_wav_and_sentences(
target_dir, "swb_ms98_transcriptions", "swb1_d1-wav", "swb1_d1-split-wav"
)
d2 = _maybe_split_wav_and_sentences(
target_dir, "swb_ms98_transcriptions", "swb1_d2-wav", "swb1_d2-split-wav"
)
d3 = _maybe_split_wav_and_sentences(
target_dir, "swb_ms98_transcriptions", "swb1_d3-wav", "swb1_d3-split-wav"
)
d4 = _maybe_split_wav_and_sentences(
target_dir, "swb_ms98_transcriptions", "swb1_d4-wav", "swb1_d4-split-wav"
)
swb_files = d1.append(d2).append(d3).append(d4)
train_files, dev_files, test_files = _split_sets(swb_files)
# Write sets to disk as CSV files
@ -97,7 +105,7 @@ def _download_and_preprocess_data(data_dir):
dev_files.to_csv(os.path.join(target_dir, "swb-dev.csv"), index=False)
test_files.to_csv(os.path.join(target_dir, "swb-test.csv"), index=False)
def _extract(target_dir, archive_path):
with tarfile.open(archive_path) as tar:
tar.extractall(target_dir)
@ -118,25 +126,46 @@ def _maybe_convert_wav(data_dir, original_data, converted_data):
# Loop over sph files in source_dir and convert each to 16-bit PCM wav
for root, dirnames, filenames in os.walk(source_dir):
for filename in fnmatch.filter(filenames, "*.sph"):
for channel in ['1', '2']:
for channel in ["1", "2"]:
sph_file = os.path.join(root, filename)
wav_filename = os.path.splitext(os.path.basename(sph_file))[0] + "-" + channel + ".wav"
wav_filename = (
os.path.splitext(os.path.basename(sph_file))[0]
+ "-"
+ channel
+ ".wav"
)
wav_file = os.path.join(target_dir, wav_filename)
temp_wav_filename = os.path.splitext(os.path.basename(sph_file))[0] + "-" + channel + "-temp.wav"
temp_wav_filename = (
os.path.splitext(os.path.basename(sph_file))[0]
+ "-"
+ channel
+ "-temp.wav"
)
temp_wav_file = os.path.join(target_dir, temp_wav_filename)
print("converting {} to {}".format(sph_file, temp_wav_file))
subprocess.check_call(["sph2pipe", "-c", channel, "-p", "-f", "rif", sph_file, temp_wav_file])
subprocess.check_call(
[
"sph2pipe",
"-c",
channel,
"-p",
"-f",
"rif",
sph_file,
temp_wav_file,
]
)
print("upsampling {} to {}".format(temp_wav_file, wav_file))
audioData, frameRate = librosa.load(temp_wav_file, sr=16000, mono=True)
soundfile.write(wav_file, audioData, frameRate, "PCM_16")
os.remove(temp_wav_file)
def _parse_transcriptions(trans_file):
segments = []
with codecs.open(trans_file, "r", "utf-8") as fin:
for line in fin:
if line.startswith("#") or len(line) <= 1:
if line.startswith("#") or len(line) <= 1:
continue
tokens = line.split()
@ -150,15 +179,19 @@ def _parse_transcriptions(trans_file):
# We need to do the encode-decode dance here because encode
# returns a bytes() object on Python 3, and text_to_char_array
# expects a string.
transcript = unicodedata.normalize("NFKD", transcript) \
.encode("ascii", "ignore") \
.decode("ascii", "ignore")
transcript = (
unicodedata.normalize("NFKD", transcript)
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
segments.append({
"start_time": start_time,
"stop_time": stop_time,
"transcript": transcript,
})
segments.append(
{
"start_time": start_time,
"stop_time": stop_time,
"transcript": transcript,
}
)
return segments
@@ -183,8 +216,16 @@ def _maybe_split_wav_and_sentences(data_dir, trans_data, original_data, converte
segments = _parse_transcriptions(trans_file)
# Open wav corresponding to transcription file
channel = ("2","1")[(os.path.splitext(os.path.basename(trans_file))[0])[6] == 'A']
wav_filename = "sw0" + (os.path.splitext(os.path.basename(trans_file))[0])[2:6] + "-" + channel + ".wav"
channel = ("2", "1")[
(os.path.splitext(os.path.basename(trans_file))[0])[6] == "A"
]
wav_filename = (
"sw0"
+ (os.path.splitext(os.path.basename(trans_file))[0])[2:6]
+ "-"
+ channel
+ ".wav"
)
wav_file = os.path.join(source_dir, wav_filename)
print("splitting {} according to {}".format(wav_file, trans_file))
@@ -200,26 +241,39 @@ def _maybe_split_wav_and_sentences(data_dir, trans_data, original_data, converte
# Create wav segment filename
start_time = segment["start_time"]
stop_time = segment["stop_time"]
new_wav_filename = os.path.splitext(os.path.basename(trans_file))[0] + "-" + str(
start_time) + "-" + str(stop_time) + ".wav"
new_wav_filename = (
os.path.splitext(os.path.basename(trans_file))[0]
+ "-"
+ str(start_time)
+ "-"
+ str(stop_time)
+ ".wav"
)
if _is_wav_too_short(new_wav_filename):
continue
new_wav_file = os.path.join(target_dir, new_wav_filename)
_split_wav(origAudio, start_time, stop_time, new_wav_file)
new_wav_filesize = os.path.getsize(new_wav_file)
transcript = segment["transcript"]
files.append((os.path.abspath(new_wav_file), new_wav_filesize, transcript))
files.append(
(os.path.abspath(new_wav_file), new_wav_filesize, transcript)
)
# Close origAudio
origAudio.close()
return pandas.DataFrame(data=files, columns=["wav_filename", "wav_filesize", "transcript"])
return pandas.DataFrame(
data=files, columns=["wav_filename", "wav_filesize", "transcript"]
)
def _is_wav_too_short(wav_filename):
short_wav_filenames = ['sw2986A-ms98-a-trans-80.6385-83.358875.wav', 'sw2663A-ms98-a-trans-161.12025-164.213375.wav']
short_wav_filenames = [
"sw2986A-ms98-a-trans-80.6385-83.358875.wav",
"sw2663A-ms98-a-trans-161.12025-164.213375.wav",
]
return wav_filename in short_wav_filenames
@@ -234,24 +288,61 @@ def _split_wav(origAudio, start_time, stop_time, new_wav_file):
chunkAudio.writeframes(chunkData)
chunkAudio.close()
def _split_sets(filelist):
# We initially split the entire set into 80% train and 20% test, then
# split the train set into 80% train and 20% validation.
train_beg = 0
train_end = int(0.8 * len(filelist))
dev_beg = int(0.8 * train_end)
dev_end = train_end
train_end = dev_beg
def _split_sets(filelist):
"""
Randomly split the dataset into train, validation, and test sets, where the size of the
validation and test sets is determined by the `get_sample_size` function.
"""
random.shuffle(filelist)
sample_size = get_sample_size(len(filelist))
train_beg = 0
train_end = len(filelist) - 2 * sample_size
dev_beg = train_end
dev_end = train_end + sample_size
test_beg = dev_end
test_end = len(filelist)
return (filelist[train_beg:train_end], filelist[dev_beg:dev_end], filelist[test_beg:test_end])
return (
filelist[train_beg:train_end],
filelist[dev_beg:dev_end],
filelist[test_beg:test_end],
)
def _read_data_set(filelist, thread_count, batch_size, numcep, numcontext, stride=1, offset=0, next_index=lambda i: i + 1, limit=0):
def get_sample_size(population_size):
"""calculates the sample size for a 99% confidence and 1% margin of error"""
margin_of_error = 0.01
fraction_picking = 0.50
z_score = 2.58 # Corresponds to confidence level 99%
numerator = (z_score ** 2 * fraction_picking * (1 - fraction_picking)) / (
margin_of_error ** 2
)
sample_size = 0
for train_size in range(population_size, 0, -1):
denominator = 1 + (z_score ** 2 * fraction_picking * (1 - fraction_picking)) / (
margin_of_error ** 2 * train_size
)
sample_size = int(numerator / denominator)
if 2 * sample_size + train_size <= population_size:
break
return sample_size
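For reference, the loop above is an integer search over Cochran's sample-size formula with a finite-population correction, n = (z^2 * p * (1 - p) / e^2) / (1 + z^2 * p * (1 - p) / (e^2 * N_train)) with z = 2.58, p = 0.5 and e = 0.01, stopping at the largest train size N_train that still leaves room for a dev and a test set of n samples each. A minimal sanity check (the population size is hypothetical):
# Hypothetical usage: for a corpus of 10,000 files, dev and test each receive
# get_sample_size(10000) files and the remainder stays in train.
population = 10000
n = get_sample_size(population)
print("dev/test: {} files each, train: {} files".format(n, population - 2 * n))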
def _read_data_set(
filelist,
thread_count,
batch_size,
numcep,
numcontext,
stride=1,
offset=0,
next_index=lambda i: i + 1,
limit=0,
):
# Optionally apply dataset size limit
if limit > 0:
filelist = filelist.iloc[:limit]
@@ -259,7 +350,9 @@ def _read_data_set(filelist, thread_count, batch_size, numcep, numcontext, strid
filelist = filelist[offset::stride]
# Return DataSet
return DataSet(txt_files, thread_count, batch_size, numcep, numcontext, next_index=next_index)
return DataSet(
txt_files, thread_count, batch_size, numcep, numcontext, next_index=next_index
)
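The `filelist[offset::stride]` slice above is what shards the file list between readers. A minimal sketch of the same slicing on a plain list (the stride of 3 is arbitrary):
# Illustrative only: offset/stride give each of three readers a disjoint shard.
files = list(range(10))
shards = [files[offset::3] for offset in range(3)]
print(shards)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]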
if __name__ == "__main__":

bin/import_swc.py (Executable file, 577 lines added)

@@ -0,0 +1,577 @@
#!/usr/bin/env python
"""
Downloads and prepares (parts of) the "Spoken Wikipedia Corpora" for train.py
Use "python3 import_swc.py -h" for help
"""
import argparse
import csv
import os
import random
import re
import shutil
import sys
import tarfile
import unicodedata
import wave
import xml.etree.ElementTree as ET
from collections import Counter
from glob import glob
from multiprocessing.pool import ThreadPool
import progressbar
import sox
from coqui_stt_ctcdecoder import Alphabet
from coqui_stt_training.util.downloader import SIMPLE_BAR, maybe_download
from coqui_stt_training.util.importers import validate_label_eng as validate_label
SWC_URL = "https://www2.informatik.uni-hamburg.de/nats/pub/SWC/SWC_{language}.tar"
SWC_ARCHIVE = "SWC_{language}.tar"
LANGUAGES = ["dutch", "english", "german"]
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
FIELDNAMES_EXT = FIELDNAMES + ["article", "speaker"]
CHANNELS = 1
SAMPLE_RATE = 16000
UNKNOWN = "<unknown>"
AUDIO_PATTERN = "audio*.ogg"
WAV_NAME = "audio.wav"
ALIGNED_NAME = "aligned.swc"
SUBSTITUTIONS = {
"german": [
(re.compile(r"\$"), "dollar"),
(re.compile(r""), "euro"),
(re.compile(r"£"), "pfund"),
(
re.compile(r"ein tausend ([^\s]+) hundert ([^\s]+) er( |$)"),
r"\1zehnhundert \2er ",
),
(re.compile(r"ein tausend (acht|neun) hundert"), r"\1zehnhundert"),
(
re.compile(
r"eins punkt null null null punkt null null null punkt null null null"
),
"eine milliarde",
),
(
re.compile(
r"punkt null null null punkt null null null punkt null null null"
),
"milliarden",
),
(re.compile(r"eins punkt null null null punkt null null null"), "eine million"),
(re.compile(r"punkt null null null punkt null null null"), "millionen"),
(re.compile(r"eins punkt null null null"), "ein tausend"),
(re.compile(r"punkt null null null"), "tausend"),
(re.compile(r"punkt null"), None),
]
}
DONT_NORMALIZE = {"german": "ÄÖÜäöüß"}
PRE_FILTER = str.maketrans(dict.fromkeys("/()[]{}<>:"))
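The substitution rules above rewrite spelled-out German currency and number sequences before alphabet filtering; rules whose replacement is None instead cause `label_filter` below to drop the whole sample. A minimal walk-through of the non-dropping rules (the input label is hypothetical):
# Hypothetical example: "$" becomes "dollar", then the spelled-out thousand is collapsed.
label = "eins punkt null null null $"
for pattern, replacement in SUBSTITUTIONS["german"]:
    if replacement is not None:
        label = pattern.sub(replacement, label)
print(label)  # ein tausend dollar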
class Sample:
def __init__(self, wav_path, start, end, text, article, speaker, sub_set=None):
self.wav_path = wav_path
self.start = start
self.end = end
self.text = text
self.article = article
self.speaker = speaker
self.sub_set = sub_set
def fail(message):
print(message)
sys.exit(1)
def group(lst, get_key):
groups = {}
for obj in lst:
key = get_key(obj)
if key in groups:
groups[key].append(obj)
else:
groups[key] = [obj]
return groups
def get_sample_size(population_size):
margin_of_error = 0.01
fraction_picking = 0.50
z_score = 2.58 # Corresponds to confidence level 99%
numerator = (z_score ** 2 * fraction_picking * (1 - fraction_picking)) / (
margin_of_error ** 2
)
sample_size = 0
for train_size in range(population_size, 0, -1):
denominator = 1 + (z_score ** 2 * fraction_picking * (1 - fraction_picking)) / (
margin_of_error ** 2 * train_size
)
sample_size = int(numerator / denominator)
if 2 * sample_size + train_size <= population_size:
break
return sample_size
def maybe_download_language(language):
lang_upper = language[0].upper() + language[1:]
return maybe_download(
SWC_ARCHIVE.format(language=lang_upper),
CLI_ARGS.base_dir,
SWC_URL.format(language=lang_upper),
)
def maybe_extract(data_dir, extracted_data, archive):
extracted = os.path.join(data_dir, extracted_data)
if os.path.isdir(extracted):
print('Found directory "{}" - not extracting.'.format(extracted))
else:
print('Extracting "{}"...'.format(archive))
with tarfile.open(archive) as tar:
members = tar.getmembers()
bar = progressbar.ProgressBar(max_value=len(members), widgets=SIMPLE_BAR)
for member in bar(members):
tar.extract(member=member, path=extracted)
return extracted
def ignored(node):
if node is None:
return False
if node.tag == "ignored":
return True
return ignored(node.find(".."))
def read_token(token):
texts, start, end = [], None, None
notes = token.findall("n")
if len(notes) > 0:
for note in notes:
attributes = note.attrib
if start is None and "start" in attributes:
start = int(attributes["start"])
if "end" in attributes:
token_end = int(attributes["end"])
if end is None or token_end > end:
end = token_end
if "pronunciation" in attributes:
t = attributes["pronunciation"]
texts.append(t)
elif "text" in token.attrib:
texts.append(token.attrib["text"])
return start, end, " ".join(texts)
def in_alphabet(alphabet, c):
return alphabet.CanEncode(c) if alphabet else True
ALPHABETS = {}
def get_alphabet(language):
if language in ALPHABETS:
return ALPHABETS[language]
alphabet_path = getattr(CLI_ARGS, language + "_alphabet")
alphabet = Alphabet(alphabet_path) if alphabet_path else None
ALPHABETS[language] = alphabet
return alphabet
def label_filter(label, language):
label = label.translate(PRE_FILTER)
label = validate_label(label)
if label is None:
return None, "validation"
substitutions = SUBSTITUTIONS[language] if language in SUBSTITUTIONS else []
for pattern, replacement in substitutions:
if replacement is None:
if pattern.match(label):
return None, "substitution rule"
else:
label = pattern.sub(replacement, label)
chars = []
dont_normalize = DONT_NORMALIZE[language] if language in DONT_NORMALIZE else ""
alphabet = get_alphabet(language)
for c in label:
if (
CLI_ARGS.normalize
and c not in dont_normalize
and not in_alphabet(alphabet, c)
):
c = (
unicodedata.normalize("NFKD", c)
.encode("ascii", "ignore")
.decode("ascii", "ignore")
)
for sc in c:
if not in_alphabet(alphabet, sc):
return None, "illegal character"
chars.append(sc)
label = "".join(chars)
label = validate_label(label)
return label, "validation" if label is None else None
def collect_samples(base_dir, language):
roots = []
for root, _, files in os.walk(base_dir):
if ALIGNED_NAME in files and WAV_NAME in files:
roots.append(root)
samples = []
reasons = Counter()
def add_sample(
p_wav_path, p_article, p_speaker, p_start, p_end, p_text, p_reason="complete"
):
if p_start is not None and p_end is not None and p_text is not None:
duration = p_end - p_start
text, filter_reason = label_filter(p_text, language)
skip = False
if filter_reason is not None:
skip = True
p_reason = filter_reason
elif CLI_ARGS.exclude_unknown_speakers and p_speaker == UNKNOWN:
skip = True
p_reason = "unknown speaker"
elif CLI_ARGS.exclude_unknown_articles and p_article == UNKNOWN:
skip = True
p_reason = "unknown article"
elif duration > CLI_ARGS.max_duration > 0 and CLI_ARGS.ignore_too_long:
skip = True
p_reason = "exceeded duration"
elif int(duration / 30) < len(text):
skip = True
p_reason = "too short to decode"
elif duration / len(text) < 10:
skip = True
p_reason = "length duration ratio"
if skip:
reasons[p_reason] += 1
else:
samples.append(
Sample(p_wav_path, p_start, p_end, text, p_article, p_speaker)
)
elif p_start is None or p_end is None:
reasons["missing timestamps"] += 1
else:
reasons["missing text"] += 1
print("Collecting samples...")
bar = progressbar.ProgressBar(max_value=len(roots), widgets=SIMPLE_BAR)
for root in bar(roots):
wav_path = os.path.join(root, WAV_NAME)
aligned = ET.parse(os.path.join(root, ALIGNED_NAME))
article = UNKNOWN
speaker = UNKNOWN
for prop in aligned.iter("prop"):
attributes = prop.attrib
if "key" in attributes and "value" in attributes:
if attributes["key"] == "DC.identifier":
article = attributes["value"]
elif attributes["key"] == "reader.name":
speaker = attributes["value"]
for sentence in aligned.iter("s"):
if ignored(sentence):
continue
split = False
tokens = list(map(read_token, sentence.findall("t")))
sample_start, sample_end, token_texts, sample_texts = None, None, [], []
for token_start, token_end, token_text in tokens:
if CLI_ARGS.exclude_numbers and any(c.isdigit() for c in token_text):
add_sample(
wav_path,
article,
speaker,
sample_start,
sample_end,
" ".join(sample_texts),
p_reason="has numbers",
)
sample_start, sample_end, token_texts, sample_texts = (
None,
None,
[],
[],
)
continue
if sample_start is None:
sample_start = token_start
if sample_start is None:
continue
token_texts.append(token_text)
if token_end is not None:
if (
token_start != sample_start
and token_end - sample_start > CLI_ARGS.max_duration > 0
):
add_sample(
wav_path,
article,
speaker,
sample_start,
sample_end,
" ".join(sample_texts),
p_reason="split",
)
sample_start = sample_end
sample_texts = []
split = True
sample_end = token_end
sample_texts.extend(token_texts)
token_texts = []
add_sample(
wav_path,
article,
speaker,
sample_start,
sample_end,
" ".join(sample_texts),
p_reason="split" if split else "complete",
)
print("Skipped samples:")
for reason, n in reasons.most_common():
print(" - {}: {}".format(reason, n))
return samples
def maybe_convert_one_to_wav(entry):
root, _, files = entry
transformer = sox.Transformer()
transformer.convert(samplerate=SAMPLE_RATE, n_channels=CHANNELS)
combiner = sox.Combiner()
combiner.convert(samplerate=SAMPLE_RATE, n_channels=CHANNELS)
output_wav = os.path.join(root, WAV_NAME)
if os.path.isfile(output_wav):
return
files = sorted(glob(os.path.join(root, AUDIO_PATTERN)))
try:
if len(files) == 1:
transformer.build(files[0], output_wav)
elif len(files) > 1:
wav_files = []
for i, file in enumerate(files):
wav_path = os.path.join(root, "audio{}.wav".format(i))
transformer.build(file, wav_path)
wav_files.append(wav_path)
combiner.set_input_format(file_type=["wav"] * len(wav_files))
combiner.build(wav_files, output_wav, "concatenate")
except sox.core.SoxError:
return
def maybe_convert_to_wav(base_dir):
roots = list(os.walk(base_dir))
print("Converting and joining source audio files...")
bar = progressbar.ProgressBar(max_value=len(roots), widgets=SIMPLE_BAR)
tp = ThreadPool()
for _ in bar(tp.imap_unordered(maybe_convert_one_to_wav, roots)):
pass
tp.close()
tp.join()
def assign_sub_sets(samples):
sample_size = get_sample_size(len(samples))
speakers = group(samples, lambda sample: sample.speaker).values()
speakers = list(sorted(speakers, key=len))
sample_sets = [[], []]
while any(map(lambda s: len(s) < sample_size, sample_sets)) and len(speakers) > 0:
for sample_set in sample_sets:
if len(sample_set) < sample_size and len(speakers) > 0:
sample_set.extend(speakers.pop(0))
train_set = sum(speakers, [])
if len(train_set) == 0:
print(
"WARNING: Unable to build dev and test sets without speaker bias as there is no speaker meta data"
)
random.seed(42) # same source data == same output
random.shuffle(samples)
for index, sample in enumerate(samples):
if index < sample_size:
sample.sub_set = "dev"
elif index < 2 * sample_size:
sample.sub_set = "test"
else:
sample.sub_set = "train"
else:
for sub_set, sub_set_samples in [
("train", train_set),
("dev", sample_sets[0]),
("test", sample_sets[1]),
]:
for sample in sub_set_samples:
sample.sub_set = sub_set
for sub_set, sub_set_samples in group(samples, lambda s: s.sub_set).items():
t = sum(map(lambda s: s.end - s.start, sub_set_samples)) / (1000 * 60 * 60)
print(
'Sub-set "{}" with {} samples (duration: {:.2f} h)'.format(
sub_set, len(sub_set_samples), t
)
)
def create_sample_dirs(language):
print("Creating sample directories...")
for set_name in ["train", "dev", "test"]:
dir_path = os.path.join(CLI_ARGS.base_dir, language + "-" + set_name)
if not os.path.isdir(dir_path):
os.mkdir(dir_path)
def split_audio_files(samples, language):
print("Splitting audio files...")
sub_sets = Counter()
src_wav_files = group(samples, lambda s: s.wav_path).items()
bar = progressbar.ProgressBar(max_value=len(src_wav_files), widgets=SIMPLE_BAR)
for wav_path, file_samples in bar(src_wav_files):
file_samples = sorted(file_samples, key=lambda s: s.start)
with wave.open(wav_path, "r") as src_wav_file:
rate = src_wav_file.getframerate()
for sample in file_samples:
index = sub_sets[sample.sub_set]
sample_wav_path = os.path.join(
CLI_ARGS.base_dir,
language + "-" + sample.sub_set,
"sample-{0:06d}.wav".format(index),
)
sample.wav_path = sample_wav_path
sub_sets[sample.sub_set] += 1
src_wav_file.setpos(int(sample.start * rate / 1000.0))
data = src_wav_file.readframes(
int((sample.end - sample.start) * rate / 1000.0)
)
with wave.open(sample_wav_path, "w") as sample_wav_file:
sample_wav_file.setnchannels(src_wav_file.getnchannels())
sample_wav_file.setsampwidth(src_wav_file.getsampwidth())
sample_wav_file.setframerate(rate)
sample_wav_file.writeframes(data)
def write_csvs(samples, language):
for sub_set, set_samples in group(samples, lambda s: s.sub_set).items():
set_samples = sorted(set_samples, key=lambda s: s.wav_path)
base_dir = os.path.abspath(CLI_ARGS.base_dir)
csv_path = os.path.join(base_dir, language + "-" + sub_set + ".csv")
print('Writing "{}"...'.format(csv_path))
with open(csv_path, "w", encoding="utf-8", newline="") as csv_file:
writer = csv.DictWriter(
csv_file, fieldnames=FIELDNAMES_EXT if CLI_ARGS.add_meta else FIELDNAMES
)
writer.writeheader()
bar = progressbar.ProgressBar(
max_value=len(set_samples), widgets=SIMPLE_BAR
)
for sample in bar(set_samples):
row = {
"wav_filename": os.path.relpath(sample.wav_path, base_dir),
"wav_filesize": os.path.getsize(sample.wav_path),
"transcript": sample.text,
}
if CLI_ARGS.add_meta:
row["article"] = sample.article
row["speaker"] = sample.speaker
writer.writerow(row)
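Each per-subset CSV produced above uses the FIELDNAMES header, with wav paths written relative to base_dir; with --add_meta the article and speaker columns are appended. A minimal, hypothetical check of one generated file (the path is a placeholder; csv is already imported at the top of this script):
# Hypothetical: print the first row of a generated subset CSV.
with open("data/swc/german-train.csv", encoding="utf-8", newline="") as csv_file:
    for row in csv.DictReader(csv_file):
        print(row["wav_filename"], row["wav_filesize"], row["transcript"])
        break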
def cleanup(archive, language):
if not CLI_ARGS.keep_archive:
print('Removing archive "{}"...'.format(archive))
os.remove(archive)
language_dir = os.path.join(CLI_ARGS.base_dir, language)
if not CLI_ARGS.keep_intermediate and os.path.isdir(language_dir):
print('Removing intermediate files in "{}"...'.format(language_dir))
shutil.rmtree(language_dir)
def prepare_language(language):
archive = maybe_download_language(language)
extracted = maybe_extract(CLI_ARGS.base_dir, language, archive)
maybe_convert_to_wav(extracted)
samples = collect_samples(extracted, language)
assign_sub_sets(samples)
create_sample_dirs(language)
split_audio_files(samples, language)
write_csvs(samples, language)
cleanup(archive, language)
def handle_args():
parser = argparse.ArgumentParser(description="Import Spoken Wikipedia Corpora")
parser.add_argument("base_dir", help="Directory containing all data")
parser.add_argument(
"--language", default="all", help="One of (all|{})".format("|".join(LANGUAGES))
)
parser.add_argument(
"--exclude_numbers",
type=bool,
default=True,
help="If sequences with non-transliterated numbers should be excluded",
)
parser.add_argument(
"--max_duration",
type=int,
default=10000,
help="Maximum sample duration in milliseconds",
)
parser.add_argument(
"--ignore_too_long",
type=bool,
default=False,
help="If samples exceeding max_duration should be removed",
)
parser.add_argument(
"--normalize",
action="store_true",
help="Converts diacritic characters to their base ones",
)
for language in LANGUAGES:
parser.add_argument(
"--{}_alphabet".format(language),
help="Exclude {} samples with characters not in provided alphabet file".format(
language
),
)
parser.add_argument(
"--add_meta", action="store_true", help="Adds article and speaker CSV columns"
)
parser.add_argument(
"--exclude_unknown_speakers",
action="store_true",
help="Exclude unknown speakers",
)
parser.add_argument(
"--exclude_unknown_articles",
action="store_true",
help="Exclude unknown articles",
)
parser.add_argument(
"--keep_archive",
type=bool,
default=True,
help="If downloaded archives should be kept",
)
parser.add_argument(
"--keep_intermediate",
type=bool,
default=False,
help="If intermediate files should be kept",
)
return parser.parse_args()
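For reference, a typical invocation of this importer (paths are placeholders; `python3 import_swc.py -h` lists all flags):
# Hypothetical invocation; base_dir and the alphabet path are placeholders.
#   python3 bin/import_swc.py data/swc --language german --normalize \
#       --german_alphabet data/alphabet_de.txt --add_meta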
if __name__ == "__main__":
CLI_ARGS = handle_args()
if CLI_ARGS.language == "all":
for lang in LANGUAGES:
prepare_language(lang)
elif CLI_ARGS.language in LANGUAGES:
prepare_language(CLI_ARGS.language)
else:
fail("Wrong language id")


@@ -1,24 +1,17 @@
#!/usr/bin/env python
from __future__ import absolute_import, division, print_function
# Make sure we can import stuff from util/
# This script needs to be run from the root of the DeepSpeech repository
import sys
import os
sys.path.insert(1, os.path.join(sys.path[0], '..'))
import codecs
import pandas
import tarfile
import unicodedata
import wave
from glob import glob
from os import makedirs, path, remove, rmdir
import pandas
from coqui_stt_training.util.downloader import maybe_download
from coqui_stt_training.util.stm import parse_stm_file
from sox import Transformer
from util.downloader import maybe_download
from tensorflow.python.platform import gfile
from util.stm import parse_stm_file
def _download_and_preprocess_data(data_dir):
# Conditionally download data
@@ -41,6 +34,7 @@ def _download_and_preprocess_data(data_dir):
dev_files.to_csv(path.join(data_dir, "ted-dev.csv"), index=False)
test_files.to_csv(path.join(data_dir, "ted-test.csv"), index=False)
def _maybe_extract(data_dir, extracted_data, archive):
# If data_dir/extracted_data does not exist, extract archive in data_dir
if not gfile.Exists(path.join(data_dir, extracted_data)):
@@ -48,6 +42,7 @@ def _maybe_extract(data_dir, extracted_data, archive):
tar.extractall(data_dir)
tar.close()
def _maybe_convert_wav(data_dir, extracted_data):
# Create extracted_data dir
extracted_dir = path.join(data_dir, extracted_data)
@@ -61,6 +56,7 @@ def _maybe_convert_wav(data_dir, extracted_data):
# Conditionally convert test sph to wav
_maybe_convert_wav_dataset(extracted_dir, "test")
def _maybe_convert_wav_dataset(extracted_dir, data_set):
# Create source dir
source_dir = path.join(extracted_dir, data_set, "sph")
@@ -84,6 +80,7 @@ def _maybe_convert_wav_dataset(extracted_dir, data_set):
# Remove source_dir
rmdir(source_dir)
def _maybe_split_sentences(data_dir, extracted_data):
# Create extracted_data dir
extracted_dir = path.join(data_dir, extracted_data)
@@ -99,6 +96,7 @@ def _maybe_split_sentences(data_dir, extracted_data):
return train_files, dev_files, test_files
def _maybe_split_dataset(extracted_dir, data_set):
# Create stm dir
stm_dir = path.join(extracted_dir, data_set, "stm")
@@ -116,14 +114,21 @@ def _maybe_split_dataset(extracted_dir, data_set):
# Open wav corresponding to stm_file
wav_filename = path.splitext(path.basename(stm_file))[0] + ".wav"
wav_file = path.join(wav_dir, wav_filename)
origAudio = wave.open(wav_file,'r')
origAudio = wave.open(wav_file, "r")
# Loop over stm_segments and split wav_file for each segment
for stm_segment in stm_segments:
# Create wav segment filename
start_time = stm_segment.start_time
stop_time = stm_segment.stop_time
new_wav_filename = path.splitext(path.basename(stm_file))[0] + "-" + str(start_time) + "-" + str(stop_time) + ".wav"
new_wav_filename = (
path.splitext(path.basename(stm_file))[0]
+ "-"
+ str(start_time)
+ "-"
+ str(stop_time)
+ ".wav"
)
new_wav_file = path.join(wav_dir, new_wav_filename)
# If the wav segment filename does not exist create it
@@ -131,23 +136,29 @@ def _maybe_split_dataset(extracted_dir, data_set):
_split_wav(origAudio, start_time, stop_time, new_wav_file)
new_wav_filesize = path.getsize(new_wav_file)
files.append((path.abspath(new_wav_file), new_wav_filesize, stm_segment.transcript))
files.append(
(path.abspath(new_wav_file), new_wav_filesize, stm_segment.transcript)
)
# Close origAudio
origAudio.close()
return pandas.DataFrame(data=files, columns=["wav_filename", "wav_filesize", "transcript"])
return pandas.DataFrame(
data=files, columns=["wav_filename", "wav_filesize", "transcript"]
)
def _split_wav(origAudio, start_time, stop_time, new_wav_file):
frameRate = origAudio.getframerate()
origAudio.setpos(int(start_time*frameRate))
chunkData = origAudio.readframes(int((stop_time - start_time)*frameRate))
chunkAudio = wave.open(new_wav_file,'w')
origAudio.setpos(int(start_time * frameRate))
chunkData = origAudio.readframes(int((stop_time - start_time) * frameRate))
chunkAudio = wave.open(new_wav_file, "w")
chunkAudio.setnchannels(origAudio.getnchannels())
chunkAudio.setsampwidth(origAudio.getsampwidth())
chunkAudio.setframerate(frameRate)
chunkAudio.writeframes(chunkData)
chunkAudio.close()
if __name__ == "__main__":
_download_and_preprocess_data(sys.argv[1])


@@ -1,6 +1,6 @@
#!/usr/bin/env python
'''
"""
NAME : LDC TIMIT Dataset
URL : https://catalog.ldc.upenn.edu/ldc93s1
HOURS : 5
@@ -8,29 +8,32 @@
AUTHORS : Garofolo, John, et al.
TYPE : LDC Membership
LICENCE : LDC User Agreement
'''
"""
import errno
import fnmatch
import os
from os import path
import subprocess
import sys
import tarfile
import fnmatch
from os import path
import pandas as pd
import subprocess
def clean(word):
# LC ALL & strip punctuation which are not required
new = word.lower().replace('.', '')
new = new.replace(',', '')
new = new.replace(';', '')
new = new.replace('"', '')
new = new.replace('!', '')
new = new.replace('?', '')
new = new.replace(':', '')
new = new.replace('-', '')
new = word.lower().replace(".", "")
new = new.replace(",", "")
new = new.replace(";", "")
new = new.replace('"', "")
new = new.replace("!", "")
new = new.replace("?", "")
new = new.replace(":", "")
new = new.replace("-", "")
return new
def _preprocess_data(args):
# Assume data is downloaded from LDC - https://catalog.ldc.upenn.edu/ldc93s1
@@ -40,16 +43,24 @@ def _preprocess_data(args):
if ignoreSASentences:
print("Using recommended ignore SA sentences")
print("Ignoring SA sentences (2 x sentences which are repeated by all speakers)")
print(
"Ignoring SA sentences (2 x sentences which are repeated by all speakers)"
)
else:
print("Using unrecommended setting to include SA sentences")
datapath = args
target = path.join(datapath, "TIMIT")
print("Checking to see if data has already been extracted in given argument: %s", target)
print(
"Checking to see if data has already been extracted in given argument: %s",
target,
)
if not path.isdir(target):
print("Could not find extracted data, trying to find: TIMIT-LDC93S1.tgz in: ", datapath)
print(
"Could not find extracted data, trying to find: TIMIT-LDC93S1.tgz in: ",
datapath,
)
filepath = path.join(datapath, "TIMIT-LDC93S1.tgz")
if path.isfile(filepath):
print("File found, extracting")
@@ -103,40 +114,58 @@ def _preprocess_data(args):
# if ignoreSAsentences we only want those without SA in the name
# OR
# if not ignoreSAsentences we want all to be added
if (ignoreSASentences and not ('SA' in os.path.basename(full_wav))) or (not ignoreSASentences):
if 'train' in full_wav.lower():
if (ignoreSASentences and not ("SA" in os.path.basename(full_wav))) or (
not ignoreSASentences
):
if "train" in full_wav.lower():
train_list_wavs.append(full_wav)
train_list_trans.append(trans)
train_list_size.append(wav_filesize)
elif 'test' in full_wav.lower():
elif "test" in full_wav.lower():
test_list_wavs.append(full_wav)
test_list_trans.append(trans)
test_list_size.append(wav_filesize)
else:
raise IOError
a = {'wav_filename': train_list_wavs,
'wav_filesize': train_list_size,
'transcript': train_list_trans
}
a = {
"wav_filename": train_list_wavs,
"wav_filesize": train_list_size,
"transcript": train_list_trans,
}
c = {'wav_filename': test_list_wavs,
'wav_filesize': test_list_size,
'transcript': test_list_trans
}
c = {
"wav_filename": test_list_wavs,
"wav_filesize": test_list_size,
"transcript": test_list_trans,
}
all = {'wav_filename': train_list_wavs + test_list_wavs,
'wav_filesize': train_list_size + test_list_size,
'transcript': train_list_trans + test_list_trans
}
all = {
"wav_filename": train_list_wavs + test_list_wavs,
"wav_filesize": train_list_size + test_list_size,
"transcript": train_list_trans + test_list_trans,
}
df_all = pd.DataFrame(all, columns=['wav_filename', 'wav_filesize', 'transcript'], dtype=int)
df_train = pd.DataFrame(a, columns=['wav_filename', 'wav_filesize', 'transcript'], dtype=int)
df_test = pd.DataFrame(c, columns=['wav_filename', 'wav_filesize', 'transcript'], dtype=int)
df_all = pd.DataFrame(
all, columns=["wav_filename", "wav_filesize", "transcript"], dtype=int
)
df_train = pd.DataFrame(
a, columns=["wav_filename", "wav_filesize", "transcript"], dtype=int
)
df_test = pd.DataFrame(
c, columns=["wav_filename", "wav_filesize", "transcript"], dtype=int
)
df_all.to_csv(
target + "/timit_all.csv", sep=",", header=True, index=False, encoding="ascii"
)
df_train.to_csv(
target + "/timit_train.csv", sep=",", header=True, index=False, encoding="ascii"
)
df_test.to_csv(
target + "/timit_test.csv", sep=",", header=True, index=False, encoding="ascii"
)
df_all.to_csv(target+"/timit_all.csv", sep=',', header=True, index=False, encoding='ascii')
df_train.to_csv(target+"/timit_train.csv", sep=',', header=True, index=False, encoding='ascii')
df_test.to_csv(target+"/timit_test.csv", sep=',', header=True, index=False, encoding='ascii')
if __name__ == "__main__":
_preprocess_data(sys.argv[1])

Some files were not shown because too many files have changed in this diff.