Addressed review comments
parent f75b9cc926
commit 3d7dc179e9
@@ -21,7 +21,7 @@ Let a single utterance :math:`x` and label :math:`y` be sampled from a training
 Each utterance, :math:`x^{(i)}` is a time-series of length :math:`T^{(i)}`
 where every time-slice is a vector of audio features,
 :math:`x^{(i)}_t` where :math:`t=1,\ldots,T^{(i)}`.
-We use MFCC coefficients as our features; so :math:`x^{(i)}_{t,p}` denotes the :math:`p`-th MFCC feature
+We use MFCC's as our features; so :math:`x^{(i)}_{t,p}` denotes the :math:`p`-th MFCC feature
 in the audio frame at time :math:`t`. The goal of our RNN is to convert an input
 sequence :math:`x` into a sequence of character probabilities for the transcription
 :math:`y`, with :math:`\hat{y}_t =\mathbb{P}(c_t \mid x)`,
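For readers of this hunk, the notation :math:`\hat{y}_t =\mathbb{P}(c_t \mid x)` can be made concrete with a small sketch: one probability distribution over characters per time-slice. The alphabet, the ``logits`` values and the ``softmax`` helper below are hypothetical and chosen only for illustration; the hunk does not say how the network computes its outputs.

```python
import math


def softmax(logits):
    """Turn one frame's raw scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical alphabet and per-frame scores (T = 3 time-slices).
alphabet = ["a", "b", "c", " "]
logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.2, 1.5, 0.3, 0.1],
]

# y_hat[t][k] plays the role of P(c_t = alphabet[k] | x).
y_hat = [softmax(frame) for frame in logits]
```

Each row of ``y_hat`` sums to one, so it can be read as a distribution over ``alphabet`` at that time-slice.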
@@ -3,13 +3,6 @@ Geometric Constants
 
 This is about several constants related to the geometry of the network.
 
-n_steps
--------
-The network views each speech sample as a sequence of time-slices :math:`x^{(i)}_t` of
-length :math:`T^{(i)}`. As the speech samples vary in length, we know that :math:`T^{(i)}`
-need not equal :math:`T^{(j)}` for :math:`i \ne j`. For each batch, RNN in TensorFlow needs
-to know ``n_steps`` which is the maximum :math:`T^{(i)}` for the batch.
-
 n_input
 -------
 Each of the at maximum ``n_steps`` vectors is a vector of MFCC features of a
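The ``n_steps`` paragraph in this hunk defines ``n_steps`` as the maximum :math:`T^{(i)}` in a batch. A minimal sketch of that definition follows, assuming shorter utterances are zero-padded up to ``n_steps``; the padding scheme and the ``pad_batch`` helper are assumptions made for illustration and are not taken from the documented code.

```python
def pad_batch(batch, n_input):
    """Pad every utterance in `batch` to the longest one.

    `batch` is a list of utterances; each utterance is a list of
    time-slices, and each time-slice is a list of `n_input` features.
    """
    lengths = [len(utterance) for utterance in batch]
    n_steps = max(lengths)  # maximum T^(i) over the batch
    padded = []
    for utterance in batch:
        # Hypothetical choice: fill the missing time-slices with zeros.
        filler = [[0.0] * n_input for _ in range(n_steps - len(utterance))]
        padded.append([list(frame) for frame in utterance] + filler)
    return padded, lengths, n_steps


# Two utterances with T = 3 and T = 5, and 2 features per time-slice.
batch = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5], [1.6, 1.7], [1.8, 1.9]],
]
padded, lengths, n_steps = pad_batch(batch, n_input=2)
```

Keeping ``lengths`` alongside the padded batch lets later code tell real time-slices from padding.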