Merge pull request #3258 from Jendker/docu_filesize

Extend docu about the CSV files
lissyx 2020-08-19 12:08:23 +02:00 committed by GitHub
commit eb23728538

@ -117,12 +117,15 @@ Running the importer with ``-h`` will show you some additional options.
Once the import is done, the ``clips`` sub-directory will contain an additional ``.wav`` file for each required ``.mp3``.
It will also add the following ``.csv`` files:
* ``clips/train.csv``
* ``clips/dev.csv``
* ``clips/test.csv``
Entries in CSV files can refer to samples by their absolute or relative paths. Here, the importer produces relative paths.
The CSV files consist of the following fields:
* ``wav_filename`` - path of the sample, either absolute or relative. Here, the importer produces relative paths.
* ``wav_filesize`` - size of the sample in bytes, used for sorting the data before training. Must be an integer.
* ``transcript`` - transcription target for the sample.
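The three fields above can be illustrated with a small script that writes such a CSV for an existing set of ``.wav`` files. This is only a sketch and not the importer's actual code: the helper name ``write_manifest`` is hypothetical, and it assumes that relative paths are meant relative to the directory holding the CSV file.

```python
import csv
import os

# Column names as listed in the documentation above.
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]


def write_manifest(csv_path, samples):
    """Write (wav_path, transcript) pairs to csv_path and return the rows.

    Hypothetical helper for illustration only. Paths are stored relative to
    the CSV file's directory (an assumption about how they are resolved).
    """
    csv_dir = os.path.dirname(os.path.abspath(csv_path))
    rows = []
    for wav_path, transcript in samples:
        rows.append({
            "wav_filename": os.path.relpath(wav_path, start=csv_dir),
            # Size in bytes, stored as an integer.
            "wav_filesize": os.path.getsize(wav_path),
            "transcript": transcript,
        })
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling ``write_manifest("clips/train.csv", [("clips/sample.wav", "hello world")])`` would then produce a header row followed by one line per sample, with ``sample.wav`` as the relative path.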
To use Common Voice data during training, validation and testing, pass their filenames (or comma-separated combinations of them) to the ``--train_files``\ , ``--dev_files``\ , ``--test_files`` parameters of ``DeepSpeech.py``.
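As a rough illustration of the comma-separated form, the following sketch builds the argument list for the three parameters. The helper name ``training_flags`` is hypothetical; only the flag names come from the text above, and nothing here actually launches ``DeepSpeech.py``.

```python
def training_flags(train, dev, test):
    """Return DeepSpeech.py arguments for lists of CSV paths.

    Hypothetical helper: several CSV files for one set are joined
    with commas, as described in the text above.
    """
    return [
        "--train_files", ",".join(train),
        "--dev_files", ",".join(dev),
        "--test_files", ",".join(test),
    ]


args = training_flags(["clips/train.csv"], ["clips/dev.csv"], ["clips/test.csv"])
```

The resulting list could be appended to a ``python DeepSpeech.py`` command line, for example through ``subprocess.run``.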