Layer 6 output now logits as expected by CTC
parent f45c3c4fb9
commit 0ce8adecf0
@@ -31,9 +31,9 @@
 "\n",
 "$$h^{(5)} = g(W^{(5)} h^{(4)} + b^{(5)})$$\n",
 "\n",
-"where $h^{(4)} = h^{(f)} + h^{(b)}$. The output layer is a standard softmax function that yields the predicted character probabilities for each time slice $t$ and character $k$ in the alphabet:\n",
+"where $h^{(4)} = h^{(f)} + h^{(b)}$. The output layer consists of standard logits that correspond to the predicted character probabilities for each time slice $t$ and character $k$ in the alphabet:\n",
 "\n",
-"$$h^{(6)}_{t,k} = \\hat{y}_{t,k} \\equiv \\mathbb{P}(c_t = k \\mid x) = \\frac{\\exp{ \\left( (W^{(6)} h^{(5)}_t)_k + b^{(6)}_k \\right)}}{\\sum_j \\exp{\\left( (W^{(6)} h^{(5)}_t)_j + b^{(6)}_j \\right)}}$$\n",
+"$$h^{(6)}_{t,k} = \\hat{y}_{t,k} = (W^{(6)} h^{(5)}_t)_k + b^{(6)}_k$$\n",
 "\n",
 "Here $b^{(6)}_k$ denotes the $k$-th bias and $(W^{(6)} h^{(5)}_t)_k$ the $k$-th element of the matrix product.\n",
 "\n",
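The equations above replace the per-step softmax with a plain affine map, deferring the normalisation to the CTC loss. As a rough illustration, the NumPy sketch below (made-up sizes rather than the notebook's real hyperparameters, and using the code's `layer_5 · W` orientation instead of the equation's `W · h`) computes the new logits for one time slice and shows that the previous softmax probabilities are still recoverable by normalising them:

```python
import numpy as np

# Hypothetical sizes for illustration only; not the notebook's real hyperparameters.
n_hidden_5, n_classes = 4, 29
rng = np.random.default_rng(0)

h5_t = rng.standard_normal(n_hidden_5)              # h^{(5)}_t for one time slice t
W6 = rng.standard_normal((n_hidden_5, n_classes))   # plays the role of _weights['h6']
b6 = rng.standard_normal(n_classes)                 # plays the role of _biases['b6']

# New layer-6 output: raw logits, one per character k
logits = h5_t @ W6 + b6

# The old softmax formulation is just the normalised exponential of those logits
probs = np.exp(logits) / np.exp(logits).sum()
assert np.isclose(probs.sum(), 1.0)
```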
@@ -534,8 +534,8 @@
 " #Hidden layer with clipped RELU activation and dropout\n",
 " layer_5 = tf.minimum(tf.nn.relu(tf.add(tf.matmul(outputs, _weights['h5']), _biases['b5'])), relu_clip)\n",
 " layer_5 = tf.nn.dropout(layer_5, keep_prob)\n",
-" #Hidden layer with softmax function\n",
-" layer_6 = tf.nn.softmax(tf.add(tf.matmul(layer_5, _weights['h6']), _biases['b6']))\n",
+" #Hidden layer of logits\n",
+" layer_6 = tf.add(tf.matmul(layer_5, _weights['h6']), _biases['b6'])\n",
 " \n",
 " # Reshape layer_6 from a tensor of shape [n_steps*batch_size, n_hidden_6]\n",
 " # to a tensor of shape [batch_size, n_steps, n_hidden_6]\n",
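The point of this change, as the commit title says, is that CTC expects unnormalised logits and applies its own softmax internally, so `layer_6` must not be softmaxed beforehand. A small NumPy sketch (illustrative numbers only) shows what goes wrong when already-normalised probabilities are softmaxed a second time:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([4.0, 1.0, 0.0, -2.0])   # hypothetical layer_6 output for one time slice

once = softmax(logits)            # what CTC computes internally from raw logits
twice = softmax(softmax(logits))  # what it would compute if layer_6 were pre-softmaxed

print(np.round(once, 3))   # sharply peaked on the first class
print(np.round(twice, 3))  # much flatter: the double softmax washes out the prediction
```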
@@ -625,10 +625,10 @@
 "\n",
 "The next line of `BiRNN`\n",
 "```python\n",
-" #Hidden layer with softmax function\n",
-" layer_6 = tf.nn.softmax(tf.add(tf.matmul(layer_5, _weights['h6']), _biases['b6']))\n",
+" #Hidden layer of logits\n",
+" layer_6 = tf.add(tf.matmul(layer_5, _weights['h6']), _biases['b6'])\n",
 "```\n",
-"Applies the weight matrix `_weights['h6']` and bias `_biases['h6']`to the output of `layer_5` creating `n_classes` dimensional vectors, then performs softmax on them.\n",
+"Applies the weight matrix `_weights['h6']` and bias `_biases['b6']` to the output of `layer_5`, creating `n_classes`-dimensional vectors, the logits.\n",
 "\n",
 "The next lines of `BiRNN`\n",
 "```python\n",
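For the shapes involved, a NumPy stand-in (hypothetical sizes, mimicking only the reshape described in the comments above rather than the notebook's exact memory layout) looks like this:

```python
import numpy as np

# Hypothetical sizes; the real values come from the notebook's configuration.
batch_size, n_steps, n_hidden_5, n_classes = 2, 5, 8, 29

layer_5 = np.zeros((n_steps * batch_size, n_hidden_5))  # flattened over time and batch
W_h6 = np.zeros((n_hidden_5, n_classes))                # stand-in for _weights['h6']
b_6 = np.zeros(n_classes)                               # stand-in for _biases['b6']

# The affine map alone yields the logits; no softmax is applied in BiRNN any more
layer_6 = layer_5 @ W_h6 + b_6
print(layer_6.shape)   # (n_steps * batch_size, n_classes)

# Reshaped as the comments describe before being handed on to CTC
layer_6 = layer_6.reshape(batch_size, n_steps, n_classes)
print(layer_6.shape)   # (batch_size, n_steps, n_classes)
```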
@@ -704,7 +704,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"For a sequence labelling task where the labels are drawn from an alphabet $A$, CTC consists of a softmax output layer, our `layer_6`, with one more unit than there are labels in `A`. The activations of the first `|A|` units are the probabilities of outputting the corresponding labels at particular times, given the input sequence and the network weights. The activation of the extra unit gives the probability of outputting a $blank$, or no label. The complete sequence of network outputs is then used to define a distribution over all possible label sequences of length up to that of the input sequence."
+"For a sequence labelling task where the labels are drawn from an alphabet $A$, CTC consists of a logits output layer, our `layer_6`, with one more unit than there are labels in `A`. The activations of the first `|A|` units correspond to the probabilities of outputting the corresponding labels at particular times, given the input sequence and the network weights. The activation of the extra unit corresponds to the probability of outputting a $blank$, or no label. The complete sequence of network outputs is then used to define a distribution over all possible label sequences of length up to that of the input sequence."
 ]
 },
 {
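To make the `|A| + 1` units concrete, here is a short sketch with a stand-in alphabet (not the notebook's actual character set); TensorFlow's CTC loss and decoders reserve the last class index for the blank:

```python
# Stand-in alphabet; the notebook defines its own character set.
alphabet = list("abcdefghijklmnopqrstuvwxyz '")

n_classes = len(alphabet) + 1   # one extra output unit for the CTC blank
blank_index = n_classes - 1     # TensorFlow's CTC treats the last class as blank

# Per time slice, layer_6 emits one logit per character plus one for blank;
# CTC turns these per-step scores into a distribution over whole label sequences.
print(n_classes, blank_index)
```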