Layer 6 output now logits as expected by CTC
parent f45c3c4fb9
commit 0ce8adecf0
@@ -31,9 +31,9 @@
 "\n",
 "$$h^{(5)} = g(W^{(5)} h^{(4)} + b^{(5)})$$\n",
 "\n",
-"where $h^{(4)} = h^{(f)} + h^{(b)}$. The output layer is a standard softmax function that yields the predicted character probabilities for each time slice $t$ and character $k$ in the alphabet:\n",
+"where $h^{(4)} = h^{(f)} + h^{(b)}$. The output layer consists of standard logits that correspond to the predicted character probabilities for each time slice $t$ and character $k$ in the alphabet:\n",
 "\n",
-"$$h^{(6)}_{t,k} = \\hat{y}_{t,k} \\equiv \\mathbb{P}(c_t = k \\mid x) = \\frac{\\exp{ \\left( (W^{(6)} h^{(5)}_t)_k + b^{(6)}_k \\right)}}{\\sum_j \\exp{\\left( (W^{(6)} h^{(5)}_t)_j + b^{(6)}_j \\right)}}$$\n",
+"$$h^{(6)}_{t,k} = \\hat{y}_{t,k} = (W^{(6)} h^{(5)}_t)_k + b^{(6)}_k$$\n",
 "\n",
 "Here $b^{(6)}_k$ denotes the $k$-th bias and $(W^{(6)} h^{(5)}_t)_k$ the $k$-th element of the matrix product.\n",
 "\n",
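The equations above replace the per-step softmax with a plain affine map, deferring the normalisation to the CTC loss. As a rough illustration, the NumPy sketch below (made-up sizes rather than the notebook's real hyperparameters, and using the code's `layer_5 · W` orientation instead of the equation's `W · h`) computes the new logits for one time slice and shows that the previous softmax probabilities are still recoverable by normalising them:

```python
import numpy as np

# Hypothetical sizes for illustration only; not the notebook's real hyperparameters.
n_hidden_5, n_classes = 4, 29
rng = np.random.default_rng(0)

h5_t = rng.standard_normal(n_hidden_5)              # h^{(5)}_t for one time slice t
W6 = rng.standard_normal((n_hidden_5, n_classes))   # plays the role of _weights['h6']
b6 = rng.standard_normal(n_classes)                 # plays the role of _biases['b6']

# New layer-6 output: raw logits, one per character k
logits = h5_t @ W6 + b6

# The old softmax formulation is just the normalised exponential of those logits
probs = np.exp(logits) / np.exp(logits).sum()
assert np.isclose(probs.sum(), 1.0)
```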
@@ -534,8 +534,8 @@
 " #Hidden layer with clipped RELU activation and dropout\n",
 " layer_5 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(outputs, _weights['h5']), _biases['b5'])), relu_clip)\n",
 " layer_5 = tf.nn.dropout(layer_5, keep_prob)\n",
-" #Hidden layer with softmax function\n",
-" layer_6 = tf.nn.softmax(tf.add(tf.matmul(layer_5, _weights['h6']), _biases['b6']))\n",
+" #Hidden layer of logits\n",
+" layer_6 = tf.add(tf.matmul(layer_5, _weights['h6']), _biases['b6'])\n",
 " \n",
 " # Reshape layer_6 from a tensor of shape [n_steps*batch_size, n_hidden_6]\n",
 " # to a tensor of shape [batch_size, n_steps, n_hidden_6]\n",
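The point of this change, as the commit title says, is that CTC expects unnormalised logits and applies its own softmax internally, so `layer_6` must not be softmaxed beforehand. A small NumPy sketch (illustrative numbers only) shows what goes wrong when already-normalised probabilities are softmaxed a second time:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([4.0, 1.0, 0.0, -2.0])   # hypothetical layer_6 output for one time slice

once = softmax(logits)            # what CTC computes internally from raw logits
twice = softmax(softmax(logits))  # what it would compute if layer_6 were pre-softmaxed

print(np.round(once, 3))   # sharply peaked on the first class
print(np.round(twice, 3))  # much flatter: the double softmax washes out the prediction
```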
@@ -625,10 +625,10 @@
 "\n",
 "The next line of `BiRNN`\n",
 "```python\n",
-" #Hidden layer with softmax function\n",
-" layer_6 = tf.nn.softmax(tf.add(tf.matmul(layer_5, _weights['h6']), _biases['b6']))\n",
+" #Hidden layer of logits\n",
+" layer_6 = tf.add(tf.matmul(layer_5, _weights['h6']), _biases['b6'])\n",
 "```\n",
-"Applies the weight matrix `_weights['h6']` and bias `_biases['h6']`to the output of `layer_5` creating `n_classes` dimensional vectors, then performs softmax on them.\n",
+"Applies the weight matrix `_weights['h6']` and bias `_biases['b6']` to the output of `layer_5`, creating `n_classes`-dimensional vectors, the logits.\n",
 "\n",
 "The next lines of `BiRNN`\n",
 "```python\n",
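For the shapes involved, a NumPy stand-in (hypothetical sizes, mimicking only the reshape described in the comments above rather than the notebook's exact memory layout) looks like this:

```python
import numpy as np

# Hypothetical sizes; the real values come from the notebook's configuration.
batch_size, n_steps, n_hidden_5, n_classes = 2, 5, 8, 29

layer_5 = np.zeros((n_steps * batch_size, n_hidden_5))  # flattened over time and batch
W_h6 = np.zeros((n_hidden_5, n_classes))                # stand-in for _weights['h6']
b_6 = np.zeros(n_classes)                               # stand-in for _biases['b6']

# The affine map alone yields the logits; no softmax is applied in BiRNN any more
layer_6 = layer_5 @ W_h6 + b_6
print(layer_6.shape)   # (n_steps * batch_size, n_classes)

# Reshaped as the comments describe before being handed on to CTC
layer_6 = layer_6.reshape(batch_size, n_steps, n_classes)
print(layer_6.shape)   # (batch_size, n_steps, n_classes)
```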
@@ -704,7 +704,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"For a sequence labelling task where the labels are drawn from an alphabet $A$, CTC consists of a softmax output layer, our `layer_6`, with one more unit than there are labels in `A`. The activations of the first `|A|` units are the probabilities of outputting the corresponding labels at particular times, given the input sequence and the network weights. The activation of the extra unit gives the probability of outputting a $blank$, or no label. The complete sequence of network outputs is then used to define a distribution over all possible label sequences of length up to that of the input sequence."
+"For a sequence labelling task where the labels are drawn from an alphabet $A$, CTC consists of a logits output layer, our `layer_6`, with one more unit than there are labels in `A`. The activations of the first `|A|` units correspond to the probabilities of outputting the corresponding labels at particular times, given the input sequence and the network weights. The activation of the extra unit corresponds to the probability of outputting a $blank$, or no label. The complete sequence of network outputs is then used to define a distribution over all possible label sequences of length up to that of the input sequence."
 ]
 },
 {
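To make the `|A| + 1` units concrete, here is a short sketch with a stand-in alphabet (not the notebook's actual character set); TensorFlow's CTC loss and decoders reserve the last class index for the blank:

```python
# Stand-in alphabet; the notebook defines its own character set.
alphabet = list("abcdefghijklmnopqrstuvwxyz '")

n_classes = len(alphabet) + 1   # one extra output unit for the CTC blank
blank_index = n_classes - 1     # TensorFlow's CTC treats the last class as blank

# Per time slice, layer_6 emits one logit per character plus one for blank;
# CTC turns these per-step scores into a distribution over whole label sequences.
print(n_classes, blank_index)
```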