STT/bin
Catalin Voss 6640cf2341
Remote training I/O once more (#3437)
* Redo remote I/O changes once more; this time without messing with taskcluster

* Add bin changes

* Fix merge-induced issue?

* For the interleaved case with multiple collections, unpack audio on the fly

To reproduce the previous failure

rm data/smoke_test/ldc93s1.csv
rm data/smoke_test/ldc93s1.sdb
rm -rf /tmp/ldc93s1_cache_sdb_csv
rm -rf /tmp/ckpt_sdb_csv
rm -rf /tmp/train_sdb_csv

./bin/run-tc-ldc93s1_new_sdb_csv.sh 109 16000
python -u DeepSpeech.py --noshow_progressbar --noearly_stop --train_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --train_batch_size 1 --feature_cache /tmp/ldc93s1_cache_sdb_csv --dev_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --dev_batch_size 1 --test_files ./data/smoke_test/ldc93s1.sdb,./data/smoke_test/ldc93s1.csv --test_batch_size 1 --n_hidden 100 --epochs 109 --max_to_keep 1 --checkpoint_dir /tmp/ckpt_sdb_csv --learning_rate 0.001 --dropout_rate 0.05 --export_dir /tmp/train_sdb_csv --scorer_path data/smoke_test/pruned_lm.scorer --audio_sample_rate 16000

* Attempt to preserve length information with a wrapper around `map()`… this gets pretty python-y

* Call the right `__next__()`

* Properly implement the rest of the map wrappers here……

* Fix trailing whitespace situation and other linter complaints

* Remove data accidentally checked in

* Fix overlay augmentations

* Wavs must be open in rb mode if we're passing in an external file pointer -- this confused me

* Lint whitespace

* Revert "Fix trailing whitespace situation and other linter complaints"

This reverts commit c3c45397a2f98e9b00d00c18c4ced4fc52475032.

* Fix linter issue but without such an aggressive diff

* Move unpack_maybe into sample_collections

* Use unpack_maybe in place of duplicate lambda

* Fix confusing comment

* Add clarifying comment for on-the-fly unpacking
2020-12-07 13:07:34 +01:00
compare_samples.py Remote training I/O once more (#3437) 2020-12-07 13:07:34 +01:00
data_set_tool.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
graphdef_binary_to_text.py Reformat importers with black 2020-03-31 13:43:30 +02:00
import_aidatatang.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_aishell.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_ccpmf.py Importer for dataset from Centre de Conférences Pierre Mendès-France 2020-11-24 09:49:39 +01:00
import_cv2.py Convert channels for CV2 dataset 2020-10-15 11:22:39 -04:00
import_cv.py fix missing import 'sys' 2020-09-08 10:15:22 +02:00
import_fisher.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_freestmandarin.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_gram_vaani.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_ldc93s1.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_librivox.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_lingua_libre.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_m-ailabs.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_magicdata.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_primewords.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_slr57.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_swb.py Added os import in front of makedirs 2020-09-16 14:20:59 -04:00
import_swc.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_ted.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_timit.py Reformat importers with black 2020-03-31 13:43:30 +02:00
import_ts.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_tuda.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_vctk.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
import_voxforge.py Add missing sys import to import_voxforge.py 2020-10-22 23:09:49 -10:00
ops_in_graph.py Reformat importers with black 2020-03-31 13:43:30 +02:00
play.py Revert "Merge pull request #3237 from lissyx/rename-training-package" 2020-08-26 11:46:08 +02:00
README.rst Move from Markdown to reStructuredText 2019-10-04 12:07:32 +02:00
run-ldc93s1.sh Force only one GPU on LDC93S1 scripts 2019-04-30 17:35:26 +02:00
run-tc-graph_augmentations.sh Warp augmentation 2020-06-29 16:22:31 +02:00
run-tc-ldc93s1_checkpoint_bytes.sh Rename --utf8 flag to --bytes_output_mode to avoid confusion 2020-10-06 18:19:33 +02:00
run-tc-ldc93s1_checkpoint_sdb.sh Resolves #3146 - Let build_sdb.py also output CSV files and rename it accordingly 2020-07-21 17:02:01 +02:00
run-tc-ldc93s1_checkpoint.sh Add transfer learning test 2020-02-17 08:29:10 +01:00
run-tc-ldc93s1_new_bytes_tflite.sh Rename --utf8 flag to --bytes_output_mode to avoid confusion 2020-10-06 18:19:33 +02:00
run-tc-ldc93s1_new_bytes.sh Rename --utf8 flag to --bytes_output_mode to avoid confusion 2020-10-06 18:19:33 +02:00
run-tc-ldc93s1_new_metrics.sh Add training test with --metrics_files 2020-06-08 18:06:21 +02:00
run-tc-ldc93s1_new_sdb_csv.sh Resolves #3146 - Let build_sdb.py also output CSV files and rename it accordingly 2020-07-21 17:02:01 +02:00
run-tc-ldc93s1_new_sdb.sh Resolves #3146 - Let build_sdb.py also output CSV files and rename it accordingly 2020-07-21 17:02:01 +02:00
run-tc-ldc93s1_new.sh Update all API consumers 2020-02-11 19:44:36 +01:00
run-tc-ldc93s1_singleshotinference.sh Update all API consumers 2020-02-11 19:44:36 +01:00
run-tc-ldc93s1_tflite.sh Update all API consumers 2020-02-11 19:44:36 +01:00
run-tc-sample_augmentations.sh Refactoring of TF based augmentations 2020-06-10 13:42:45 +02:00
run-tc-transfer.sh Split --load into two to avoid unexpected behavior at evaluation time 2020-04-07 14:24:05 +02:00

Utility scripts
===============

This folder contains scripts for running training from the command line on the datasets produced by the various included importers. This is useful for running training without a browser open, or unattended on a remote machine. Run them from the base directory of the repository. Note that the default settings assume a high-end machine. If out-of-memory errors occur, decreasing the values of ``--train_batch_size``, ``--dev_batch_size`` and ``--test_batch_size`` may allow you to continue, at the expense of speed.
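
Suppose you want to drive ``DeepSpeech.py`` directly from Python instead of through one of these scripts. The snippet below is a minimal sketch for that case: it builds a command line with reduced batch sizes. The three batch-size flags and ``--train_files``/``--dev_files``/``--test_files`` are the ones used in this folder, as is the ``data/smoke_test/ldc93s1.csv`` path. The helper names (``batch_size_flags``, ``halve``, ``build_command``, ``run_training``) and the "halve on out-of-memory" back-off are hypothetical illustrations, not part of this repository.

.. code-block:: python

   import shlex
   import subprocess
   import sys


   def batch_size_flags(train, dev, test):
       """Return the three batch-size flags as an argv fragment."""
       return [
           "--train_batch_size", str(train),
           "--dev_batch_size", str(dev),
           "--test_batch_size", str(test),
       ]


   def halve(batch_size):
       """Hypothetical back-off step: halve the batch size, never going below 1."""
       return max(1, batch_size // 2)


   def build_command(csv_path, batch_size):
       """Assemble a DeepSpeech.py invocation that trains, validates and tests on one CSV.

       The script path is relative, so the command must be run from the
       repository's base directory.
       """
       return [
           sys.executable, "-u", "DeepSpeech.py",
           "--train_files", csv_path,
           "--dev_files", csv_path,
           "--test_files", csv_path,
           *batch_size_flags(batch_size, batch_size, batch_size),
       ]


   def run_training(cmd):
       """Run the command; raises CalledProcessError if training exits non-zero."""
       subprocess.run(cmd, check=True)


   if __name__ == "__main__":
       # Example: after an out-of-memory error at batch size 64, retry with 32.
       cmd = build_command("data/smoke_test/ldc93s1.csv", halve(64))
       # Print a shell-quoted command line (shlex.join needs Python 3.8+).
       print(shlex.join(cmd))

Smaller batches lower peak memory per step but need more steps per epoch, which is the speed trade-off mentioned above.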