Updating Geometry

2019-12-02 11:04:27 +01:00 · f75b9cc926
commit f75b9cc926
parent 7d96540d66
2 changed files with 10 additions and 26 deletions
--- a/doc/Geometry.rst
+++ b/doc/Geometry.rst
@@ -7,7 +7,7 @@ n_steps
 -------
 The network views each speech sample as a sequence of time-slices :math:`x^{(i)}_t` of
 length :math:`T^{(i)}`. As the speech samples vary in length, we know that :math:`T^{(i)}`
-need not equal :math:`T^{(j)}` for :math:`i \ne j`. For each batch, BRNN in TensorFlow needs
+need not equal :math:`T^{(j)}` for :math:`i \ne j`. For each batch, RNN in TensorFlow needs
 to know ``n_steps`` which is the maximum :math:`T^{(i)}` for the batch.

 n_input
@@ -17,14 +17,14 @@ time-slice of the speech sample. We will make the number of MFCC features
 dependent upon the sample rate of the data set. Generically, if the sample rate
 is 8kHz we use 13 features. If the sample rate is 16kHz we use 26 features...
 We capture the dimension of these vectors, equivalently the number of MFCC
-features, in the variable ``n_input``.
+features, in the variable ``n_input``. By default ``n_input`` is 26.

 n_context
 ---------
-As previously mentioned, the BRNN is not simply fed the MFCC features of a given
-time-slice. It is fed, in addition, a context of :math:`C \in \{5, 7, 9\}` frames on
+As previously mentioned, the RNN is not simply fed the MFCC features of a given
+time-slice. It is fed, in addition, a context of :math:`C` frames on
 either side of the frame in question. The number of frames in this context is
-captured in the variable ``n_context``.
+captured in the variable ``n_context``. By default ``n_context`` is 9.

 Next we will introduce constants that specify the geometry of some of the
 non-recurrent layers of the network. We do this by simply specifying the number
@@ -36,20 +36,13 @@ n_hidden_1, n_hidden_2, n_hidden_5
 of units in the second, and  ``n_hidden_5`` the number in the fifth. We haven't
 forgotten about the third or sixth layer. We will define their unit count below.

-A LSTM BRNN consists of a pair of LSTM RNN's.
-One LSTM RNN that works "forward in time":
+The RNN consists of an LSTM RNN that works "forward in time":

 .. image:: ../images/LSTM3-chain.png
    :alt: Image shows a diagram of a recurrent neural network with LSTM cells, with arrows depicting the flow of data from earlier time steps to later timesteps within the RNN.

-and a second LSTM RNN that works "backwards in time":
-
-.. image:: ../images/LSTM3-chain-backwards.png
-    :alt: Image shows a diagram of a recurrent neural network with LSTM cells, this time with data flowing from later time steps to earlier timesteps within the RNN.
-
 The dimension of the cell state, the upper line connecting subsequent LSTM units,
-is independent of the input dimension and the same for both the forward and
-backward LSTM RNN.
+is independent of the input dimension.

 n_cell_dim
 ----------
@@ -63,11 +56,11 @@ determined by ``n_cell_dim`` as follows

 .. code:: python

-    n_hidden_3 = 2 * n_cell_dim
+    n_hidden_3 = n_cell_dim

-n_character
+n_hidden_6
 -----------
-The variable ``n_character`` will hold the number of characters in the target
+The variable ``n_hidden_6`` will hold the number of characters in the target
 language plus one, for the :math:`blank`.
 For English it is the cardinality of the set

@@ -75,12 +68,3 @@ For English it is the cardinality of the set
    \{a,b,c, . . . , z, space, apostrophe, blank\}

 we referred to earlier.
-
-n_hidden_6
----------
-The number of units in the sixth layer is determined by ``n_character`` as follows:
-
-.. code:: python
-
-    n_hidden_6 = n_character
-
--- a/images/LSTM3-chain-backwards.png
+++ b/images/LSTM3-chain-backwards.png
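
For reference, the geometry described by the updated document can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the helper names (`context_window_width`, `batch_n_steps`, `output_units`) and the value chosen for `n_cell_dim` are assumptions made for this example. Only the defaults `n_input = 26` and `n_context = 9`, the relation `n_hidden_3 = n_cell_dim`, and the English character set come from the text above.

```python
import string

# Defaults stated in the updated document.
n_input = 26    # MFCC features per time-slice
n_context = 9   # context frames on either side of the current frame


def context_window_width(n_input, n_context):
    """Values fed to the network per time-slice: the current frame plus
    n_context frames on each side, each holding n_input features."""
    return (2 * n_context + 1) * n_input


def batch_n_steps(sample_lengths):
    """n_steps for a batch is the largest T^(i) among its samples."""
    return max(sample_lengths)


def output_units(alphabet):
    """Units in the sixth layer: one per character, plus one for blank."""
    return len(alphabet) + 1


# English target alphabet from the document: a-z, space, apostrophe.
english_alphabet = list(string.ascii_lowercase) + [" ", "'"]

n_cell_dim = 2048  # hypothetical value, not taken from the document
n_hidden_3 = n_cell_dim  # unidirectional LSTM: no factor of 2 any more
n_hidden_6 = output_units(english_alphabet)
```

With the defaults, `context_window_width(n_input, n_context)` evaluates to 494 and `n_hidden_6` to 29. The factor of 2 in `n_hidden_3` disappears because there is no longer a backward LSTM whose output would be concatenated with the forward one.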