STT/bin at training-refactor

* Redo remote I/O changes once more; this time without messing with taskcluster

* Add bin changes

* Fix merge-induced issue?

* For the interleaved case with multiple collections, unpack audio on the fly

To reproduce the previous failure

rm data/smoke_test/ldc93s1.csv
rm data/smoke_test/ldc93s1.sdb
rm -rf /tmp/ldc93s1_cache_sdb_csv
rm -rf /tmp/ckpt_sdb_csv
rm -rf /tmp/train_sdb_csv

./bin/run-tc-ldc93s1_new_sdb_csv.sh 109 16000
python -u DeepSpeech.py --noshow_progressbar --noearly_stop --train_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --train_batch_size 1 --feature_cache /tmp/ldc93s1_cache_sdb_csv --dev_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --dev_batch_size 1 --test_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --test_batch_size 1 --n_hidden 100 --epochs 109 --max_to_keep 1 --checkpoint_dir /tmp/ckpt_sdb_csv --learning_rate 0.001 --dropout_rate 0.05 --export_dir /tmp/train_sdb_csv --scorer_path data/smoke_test/pruned_lm.scorer --audio_sample_rate 16000

* Attempt to preserve length information with a wrapper around `map()`… this gets pretty python-y

* Call the right `__next__()`

* Properly implement the rest of the map wrappers here……

* Fix trailing whitespace situation and other linter complaints

* Remove data accidentally checked in

* Fix overlay augmentations

* Wavs must be open in rb mode if we're passing in an external file pointer -- this confused me

* Lint whitespace

* Revert "Fix trailing whitespace situation and other linter complaints"

This reverts commit c3c45397a2f98e9b00d00c18c4ced4fc52475032.

* Fix linter issue but without such an aggressive diff

* Move unpack_maybe into sample_collections

* Use unpack_maybe in place of duplicate lambda

* Fix confusing comment

* Add clarifying comment for on-the-fly unpacking

2020-12-07 13:07:34 +01:00

compare_samples.py

Remote training I/O once more (#3437)

2020-12-07 13:07:34 +01:00

data_set_tool.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

graphdef_binary_to_text.py

Reformat importers with black

2020-03-31 13:43:30 +02:00

import_aidatatang.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_aishell.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_ccpmf.py

Importer for dataset from Centre de Conférences Pierre Mendès-France

2020-11-24 09:49:39 +01:00

import_cv2.py

Convert channels for CV2 dataset

2020-10-15 11:22:39 -04:00

import_cv.py

fix missing import 'sys'

2020-09-08 10:15:22 +02:00

import_fisher.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_freestmandarin.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_gram_vaani.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_ldc93s1.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_librivox.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_lingua_libre.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_m-ailabs.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_magicdata.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_primewords.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_slr57.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_swb.py

Added os import in front of makedirs

2020-09-16 14:20:59 -04:00

import_swc.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_ted.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_timit.py

Reformat importers with black

2020-03-31 13:43:30 +02:00

import_ts.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_tuda.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_vctk.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

import_voxforge.py

Add missing sys import to import_voxforge.py

2020-10-22 23:09:49 -10:00

ops_in_graph.py

Reformat importers with black

2020-03-31 13:43:30 +02:00

play.py

Revert "Merge pull request #3237 from lissyx/rename-training-package"

2020-08-26 11:46:08 +02:00

README.rst

Move from Markdown to reStructuredText

2019-10-04 12:07:32 +02:00

run-ldc93s1.sh

Force only one GPU on LDC93S1 scripts

2019-04-30 17:35:26 +02:00

run-tc-graph_augmentations.sh

Warp augmentation

2020-06-29 16:22:31 +02:00

run-tc-ldc93s1_checkpoint_bytes.sh

Rename --utf8 flag to --bytes_output_mode to avoid confusion

2020-10-06 18:19:33 +02:00

run-tc-ldc93s1_checkpoint_sdb.sh

Resolves #3146 - Let build_sdb.py also output CSV files and rename it accordingly

2020-07-21 17:02:01 +02:00

run-tc-ldc93s1_checkpoint.sh

Add transfer learning test

2020-02-17 08:29:10 +01:00

run-tc-ldc93s1_new_bytes_tflite.sh

Rename --utf8 flag to --bytes_output_mode to avoid confusion

2020-10-06 18:19:33 +02:00

run-tc-ldc93s1_new_bytes.sh

Rename --utf8 flag to --bytes_output_mode to avoid confusion

2020-10-06 18:19:33 +02:00

run-tc-ldc93s1_new_metrics.sh

Add training test with --metrics_files

2020-06-08 18:06:21 +02:00

run-tc-ldc93s1_new_sdb_csv.sh

Resolves #3146 - Let build_sdb.py also output CSV files and rename it accordingly

2020-07-21 17:02:01 +02:00

run-tc-ldc93s1_new_sdb.sh

Resolves #3146 - Let build_sdb.py also output CSV files and rename it accordingly

2020-07-21 17:02:01 +02:00

run-tc-ldc93s1_new.sh

Update all API consumers

2020-02-11 19:44:36 +01:00

run-tc-ldc93s1_singleshotinference.sh

Update all API consumers

2020-02-11 19:44:36 +01:00

run-tc-ldc93s1_tflite.sh

Update all API consumers

2020-02-11 19:44:36 +01:00

run-tc-sample_augmentations.sh

Refactoring of TF based augmentations

2020-06-10 13:42:45 +02:00

run-tc-transfer.sh

Split --load into two to avoid unexpected behavior at evaluation time

2020-04-07 14:24:05 +02:00

README.rst

Utility scripts
===============

This folder contains scripts for training on the various included importers from the command line. This makes it possible to run training without a browser open, or unattended on a remote machine. Run the scripts from the base directory of the repository. Note that the default settings assume a high-spec machine. If out-of-memory errors occur, decreasing the values of ``--train_batch_size``, ``--dev_batch_size`` and ``--test_batch_size`` may allow you to continue, at the expense of speed.
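
As a rough illustration of the batch-size advice above, the sketch below assembles a training command with all three batch sizes lowered to 1 and prints it rather than running it. The flags and the ``data/smoke_test/ldc93s1.csv`` path are taken from the smoke-test command shown in the commit message on this page; the ``BATCH_SIZE``, ``DATA`` and ``CMD`` variables are hypothetical helpers introduced only for this example and are not read by any script in this folder.

```shell
#!/bin/sh
# Dry-run sketch: builds the command line and prints it; nothing is trained.
# BATCH_SIZE, DATA and CMD are illustrative names (assumptions), not part of the repo.
BATCH_SIZE=1
DATA="data/smoke_test/ldc93s1.csv"

CMD="python -u DeepSpeech.py"
CMD="$CMD --train_files $DATA --train_batch_size $BATCH_SIZE"
CMD="$CMD --dev_files $DATA --dev_batch_size $BATCH_SIZE"
CMD="$CMD --test_files $DATA --test_batch_size $BATCH_SIZE"

# Print the assembled command for inspection.
echo "$CMD"
```

To launch training for real, run the printed command from the base directory of the repository. Smaller batch sizes reduce memory use per step but make each epoch take longer.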