Augmentation Documentation (#2355)

Chirag Ahuja 2019-09-23 16:13:17 +05:30 committed by Reuben Morais
parent 7995e4230b
commit 58fdd55eea
1 changed files with 30 additions and 0 deletions

@@ -71,6 +71,7 @@ See the output of `deepspeech -h` for more information on the use of `deepspeech
- [Exporting a model for TFLite](#exporting-a-model-for-tflite)
- [Making a mmap-able model for inference](#making-a-mmap-able-model-for-inference)
- [Continuing training from a release model](#continuing-training-from-a-release-model)
- [Training with augmentation](#training-with-augmentation)
- [Contribution guidelines](#contribution-guidelines)
- [Contact/Getting Help](#contactgetting-help)
@@ -418,6 +419,34 @@ python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir path/to/checkpoint/folder
Note: the released models were trained with `--n_hidden 2048`, so you need to use that same value when initializing from the release models.
### Training with augmentation
Augmentation is a useful technique for improving the generalization of machine learning models. A pre-processing pipeline with several augmentation techniques for raw PCM audio and spectrograms has therefore been implemented and can be used while training the model. The following augmentation techniques can be enabled at training time with the corresponding command-line flags.
#### Audio Augmentation
1. **Standard deviation of the Gaussian additive noise:** `--data_aug_features_additive`
2. **Standard deviation of the normal distribution around 1 used for multiplicative noise:** `--data_aug_features_multiplicative`
3. **Standard deviation for speeding up the tempo (if 0, this augmentation is not performed):** `--augmentation_speed_up_std`
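
To make the first two flags concrete, here is a minimal sketch of additive and multiplicative Gaussian noise on a small feature matrix. It uses only the Python standard library and is not the code used by the training pipeline; the function name `add_feature_noise`, the list-of-lists layout and the order of the two steps are assumptions made for illustration.

```python
import random


def add_feature_noise(features, additive_std=0.0, multiplicative_std=0.0, seed=None):
    """Return a noisy copy of `features`, a list of frames (lists of floats).

    Hypothetical helper: a standard deviation of 0 disables that noise type.
    """
    rng = random.Random(seed)
    noisy = []
    for frame in features:
        row = []
        for value in frame:
            if multiplicative_std > 0:
                # Scale by a factor drawn from a normal distribution around 1.
                value *= rng.gauss(1.0, multiplicative_std)
            if additive_std > 0:
                # Shift by zero-mean Gaussian noise.
                value += rng.gauss(0.0, additive_std)
            row.append(value)
        noisy.append(row)
    return noisy


# 100 frames with 26 coefficients each (sizes chosen only for the example).
features = [[1.0] * 26 for _ in range(100)]
noisy = add_feature_noise(features, additive_std=0.1, multiplicative_std=0.05, seed=0)
```

With both standard deviations left at 0 the helper returns the features unchanged, which mirrors how a value of 0 disables an augmentation in the list above.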
#### Spectrogram Augmentation
Inspired by Google's paper [SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition](https://arxiv.org/abs/1904.08779).
1. **Keep rate of dropout augmentation on a spectrogram (if 1, no dropout is performed on the spectrogram):**
   * Keep rate: `--augmentation_spec_dropout_keeprate`, a value in the range [0, 1]
2. **Whether to use frequency and time masking augmentation:**
   * Enable / disable: `--augmentation_freq_and_time_masking` / `--noaugmentation_freq_and_time_masking`
   * Max range of masks in the frequency domain when performing freqtime-mask augmentation: `--augmentation_freq_and_time_masking_freq_mask_range` (e.g. 5)
   * Number of masks in the frequency domain when performing freqtime-mask augmentation: `--augmentation_freq_and_time_masking_number_freq_masks` (e.g. 3)
   * Max range of masks in the time domain when performing freqtime-mask augmentation: `--augmentation_freq_and_time_masking_time_mask_range` (e.g. 2)
   * Number of masks in the time domain when performing freqtime-mask augmentation: `--augmentation_freq_and_time_masking_number_time_masks` (e.g. 3)
3. **Whether to use spectrogram pitch and tempo scaling:**
   * Enable / disable: `--augmentation_pitch_and_tempo_scaling` / `--noaugmentation_pitch_and_tempo_scaling`
   * Min value of pitch scaling: `--augmentation_pitch_and_tempo_scaling_min_pitch` (e.g. 0.95)
   * Max value of pitch scaling: `--augmentation_pitch_and_tempo_scaling_max_pitch` (e.g. 1.2)
   * Max value of tempo scaling: `--augmentation_pitch_and_tempo_scaling_max_tempo` (e.g. 1.2)
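
The spectrogram dropout and frequency/time masking options above can be illustrated with a short sketch in the spirit of SpecAugment. It is a rough illustration only, not the implementation used by the training code: the function names `mask_spectrogram` and `spec_dropout`, the `[time][frequency]` list layout, masking with zeros and the rescaling of kept values in dropout are all assumptions. The default arguments reuse the example values from the list above.

```python
import random


def mask_spectrogram(spec, freq_mask_range=5, number_freq_masks=3,
                     time_mask_range=2, number_time_masks=3, seed=None):
    """Zero out random frequency bands and time spans of a [time][freq] spectrogram."""
    rng = random.Random(seed)
    n_time, n_freq = len(spec), len(spec[0])
    masked = [list(row) for row in spec]  # copy so the input stays untouched

    for _ in range(number_freq_masks):
        # Each frequency mask covers between 0 and freq_mask_range bins.
        width = min(rng.randint(0, freq_mask_range), n_freq)
        start = rng.randint(0, n_freq - width)
        for row in masked:
            row[start:start + width] = [0.0] * width

    for _ in range(number_time_masks):
        # Each time mask covers between 0 and time_mask_range frames.
        length = min(rng.randint(0, time_mask_range), n_time)
        start = rng.randint(0, n_time - length)
        for t in range(start, start + length):
            masked[t] = [0.0] * n_freq

    return masked


def spec_dropout(spec, keep_rate=1.0, seed=None):
    """Keep each value with probability keep_rate; a keep rate of 1 changes nothing.

    Kept values are divided by keep_rate (inverted dropout), which is an
    assumption of this sketch.
    """
    if not 0.0 <= keep_rate <= 1.0:
        raise ValueError("keep_rate must be in the range [0, 1]")
    rng = random.Random(seed)
    return [[v / keep_rate if rng.random() < keep_rate else 0.0 for v in row]
            for row in spec]


# 50 frames with 26 frequency bins each (sizes chosen only for the example).
spec = [[1.0] * 26 for _ in range(50)]
masked = mask_spectrogram(spec, seed=0)
dropped = spec_dropout(spec, keep_rate=0.5, seed=0)
```

With these defaults at most 3 x 5 = 15 frequency bins and 3 x 2 = 6 frames can be masked per example, which shows how the range and number flags together bound how much of the spectrogram is hidden.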
## Contribution guidelines
This repository is governed by Mozilla's code of conduct and etiquette guidelines. For more details, please read the [Mozilla Community Participation Guidelines](https://www.mozilla.org/about/governance/policies/participation/).
@@ -481,3 +510,4 @@ There are several ways to contact us or to get help:
3. [**IRC**](https://wiki.mozilla.org/IRC) - If your question is not addressed by either the [FAQ](https://github.com/mozilla/DeepSpeech/wiki#frequently-asked-questions) or [Discourse Forums](https://discourse.mozilla.org/c/deep-speech), you can contact us on the `#machinelearning` channel on [Mozilla IRC](https://wiki.mozilla.org/IRC); people there can try to answer/help
4. [**Issues**](https://github.com/mozilla/deepspeech/issues) - Finally, if all else fails, you can open an issue in our repo.