Addressed review comments

kdavis-mozilla 2019-12-02 14:11:31 +01:00
parent f75b9cc926
commit 3d7dc179e9
2 changed files with 1 addition and 8 deletions

@@ -21,7 +21,7 @@ Let a single utterance :math:`x` and label :math:`y` be sampled from a training
 Each utterance, :math:`x^{(i)}` is a time-series of length :math:`T^{(i)}`
 where every time-slice is a vector of audio features,
 :math:`x^{(i)}_t` where :math:`t=1,\ldots,T^{(i)}`.
-We use MFCC coefficients as our features; so :math:`x^{(i)}_{t,p}` denotes the :math:`p`-th MFCC feature
+We use MFCC's as our features; so :math:`x^{(i)}_{t,p}` denotes the :math:`p`-th MFCC feature
 in the audio frame at time :math:`t`. The goal of our RNN is to convert an input
 sequence :math:`x` into a sequence of character probabilities for the transcription
 :math:`y`, with :math:`\hat{y}_t =\mathbb{P}(c_t \mid x)`,

@@ -3,13 +3,6 @@ Geometric Constants
 This is about several constants related to the geometry of the network.
-n_steps
--------
-The network views each speech sample as a sequence of time-slices :math:`x^{(i)}_t` of
-length :math:`T^{(i)}`. As the speech samples vary in length, we know that :math:`T^{(i)}`
-need not equal :math:`T^{(j)}` for :math:`i \ne j`. For each batch, RNN in TensorFlow needs
-to know ``n_steps`` which is the maximum :math:`T^{(i)}` for the batch.
 n_input
 -------
 Each of the at maximum ``n_steps`` vectors is a vector of MFCC features of a