Merge branch 'master' into r0.9

Reuben Morais 2020-10-30 17:31:35 +01:00
commit f7e816c014
4 changed files with 10 additions and 7 deletions

View File

@@ -18,7 +18,7 @@ You've found a bug and you were able to squash it! Great job! Please write a sho
Documentation PR
^^^^^^^^^^^^^^^^
If you're just making updates or changes to the documentation, there's no need to run all of DeepSpeech's tests for Contiguous Itegration (i.e. Taskcluster tests). In this case, at the end of your short but clear commit message, you should add **X-DeepSpeech: NOBUILD**. This will trigger the CI tests to skip your PR, saving both time and compute.
If you're just making updates or changes to the documentation, there's no need to run all of DeepSpeech's tests for Continuous Integration (i.e. Taskcluster tests). In this case, at the end of your short but clear commit message, you should add **X-DeepSpeech: NOBUILD**. This will trigger the CI tests to skip your PR, saving both time and compute.
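For example, a documentation-only change could end its commit message like this (the summary line is invented for illustration)::

    Fix broken links in the training docs

    X-DeepSpeech: NOBUILD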
New Feature PR
^^^^^^^^^^^^^^
@@ -29,8 +29,9 @@ The DeepSpeech codebase is made of many connected parts. There is Python code fo
Whenever you add a new feature to DeepSpeech and want to contribute that feature back to the project, here are some things to keep in mind:
1. You've made changes to the core C++ code. You should minimally also make neccesary changes to the C client (i.e. **args.h** and **client.cc**). The bindings for Python, Java, and Javascript are SWIG generated, so you don't need to worry about these. The bindings for .NET and Swift are, however, not generated automatically. It would be best if you also made the necessary manual changes to these bindings as well, but don't worry if you are unable to do so.
2. You've made changes to the training Python code. Make sure you run a linter (described below).
1. You've made changes to the core C++ code. Core changes can have downstream effects on all parts of the DeepSpeech project, so keep that in mind. You should minimally also make necessary changes to the C client (i.e. **args.h** and **client.cc**). The bindings for Python, Java, and JavaScript are SWIG generated, and in the best-case scenario you won't have to worry about them. However, if you've added a whole new feature, you may need to make custom tweaks to those bindings, because SWIG may not automagically work with your new feature, especially if you've exposed new arguments. The bindings for .NET and Swift are not generated automatically. It would be best if you made the necessary manual changes to these bindings as well. It is best to communicate with the core DeepSpeech team and come to an understanding of where you will likely need to work with the bindings. They can't predict all the bugs you will run into, but they will have a good idea of how to plan for some obvious challenges.
2. You've made changes to the Python code. Make sure you run a linter (described below).
3. Make sure your new feature doesn't regress the project. If you've added a significant feature or amount of code, you want to be sure your new feature doesn't create performance issues. For example, if you've made a change to the DeepSpeech decoder, you should know that inference performance doesn't drop in terms of latency, accuracy, or memory usage. Unless you're proposing a new decoding algorithm, you probably don't have to worry about affecting accuracy. However, it's very possible you've affected latency or memory usage. You should run local performance tests to make sure no bugs have crept in. There are lots of tools to check latency and memory usage, and you should use whatever is most comfortable for you and gets the job done. If you're on Linux, you might find `perf <https://perf.wiki.kernel.org/index.php/Main_Page>`_ to be a useful tool. You can use the sample WAV files provided in the `DeepSpeech/data/` directory for testing.
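As a rough illustration of a local latency check, you can time repeated inference calls with the ``deepspeech`` Python package (a minimal sketch; the model and WAV paths are placeholders to substitute with your own)::

    import time
    import wave

    import numpy as np
    from deepspeech import Model

    # Placeholder paths: use your own exported model and any sample WAV
    # from the DeepSpeech/data/ directory.
    ds = Model("deepspeech-0.9.0-models.pbmm")

    with wave.open("data/smoke_test/LDC93S1.wav", "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), np.int16)

    # Time several runs so one-off warmup cost doesn't dominate the result.
    timings = []
    for _ in range(5):
        start = time.perf_counter()
        ds.stt(audio)
        timings.append(time.perf_counter() - start)
    print("mean latency: %.3f s" % (sum(timings) / len(timings)))

Comparing numbers like these before and after your change gives a quick signal on latency; memory usage can be watched the same way with ``perf`` or your tool of choice.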
Python Linter
-------------

View File

@@ -3,7 +3,7 @@ Project DeepSpeech
.. image:: https://readthedocs.org/projects/deepspeech/badge/?version=latest
:target: http://deepspeech.readthedocs.io/?badge=latest
:target: https://deepspeech.readthedocs.io/?badge=latest
:alt: Documentation
@@ -12,9 +12,9 @@ Project DeepSpeech
:alt: Task Status
DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on `Baidu's Deep Speech research paper <https://arxiv.org/abs/1412.5567>`_. Project DeepSpeech uses Google's `TensorFlow <https://www.tensorflow.org/>`_ to make the implementation easier.
DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on `Baidu's Deep Speech research paper <https://arxiv.org/abs/1412.5567>`_. Project DeepSpeech uses Google's `TensorFlow <https://www.tensorflow.org/>`_ to make the implementation easier.
Documentation for installation, usage, and training models are available on `deepspeech.readthedocs.io <http://deepspeech.readthedocs.io/?badge=latest>`_.
Documentation for installation, usage, and training models is available on `deepspeech.readthedocs.io <https://deepspeech.readthedocs.io/?badge=latest>`_.
For the latest release, including pre-trained models and checkpoints, `see the latest release on GitHub <https://github.com/mozilla/DeepSpeech/releases/latest>`_.

View File

@@ -27,6 +27,7 @@ from ds_ctcdecoder import Alphabet
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]
SAMPLE_RATE = 16000
CHANNELS = 1
MAX_SECS = 10
PARAMS = None
FILTER_OBJ = None
@@ -179,7 +180,7 @@ def _preprocess_data(tsv_dir, audio_dir, space_after_every_character=False):
def _maybe_convert_wav(mp3_filename, wav_filename):
    if not os.path.exists(wav_filename):
        transformer = sox.Transformer()
        transformer.convert(samplerate=SAMPLE_RATE)
        transformer.convert(samplerate=SAMPLE_RATE, n_channels=CHANNELS)
        try:
            transformer.build(mp3_filename, wav_filename)
        except sox.core.SoxError:
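For context, making ``n_channels`` explicit means pysox downmixes any multi-channel MP3 to mono in the same pass that resamples to 16 kHz, the format the importer's WAVs are expected to have. A standalone sketch of the equivalent conversion (file names are placeholders)::

    import sox

    # Resample to 16 kHz and downmix to a single channel in one conversion.
    transformer = sox.Transformer()
    transformer.convert(samplerate=16000, n_channels=1)
    transformer.build("clip.mp3", "clip.wav")  # placeholder file names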

View File

@@ -2,6 +2,7 @@
import codecs
import os
import re
import sys
import tarfile
import threading
import unicodedata