Fix label_smoothing in multidimensional CategoricalCrossentropy.

When label_smoothing in CategoricalCrossentropy is non-zero, the loss takes tf.shape(y_true)[1] as the number of classes. However, if the true values and predictions have more than two dimensions (for example when training a POS tagger, where each batch element is a sentence composed of words), index 1 is not the class dimension, so the wrong value is used and training does not work. This fix takes the _last_ dimension as the one containing the classes.
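
A minimal sketch of the effect, using NumPy in place of TensorFlow so it runs standalone. The helper `smooth_labels`, its `class_axis` parameter, and the batch shape (2 sentences, 3 words, 4 tags) are hypothetical and chosen only for illustration; the smoothing formula is the one in `_smooth_labels`.

```python
import numpy as np


def smooth_labels(y_true, label_smoothing, class_axis):
    """Smooth one-hot labels, reading the class count from `class_axis`."""
    num_classes = y_true.shape[class_axis]
    return y_true * (1.0 - label_smoothing) + label_smoothing / num_classes


# Hypothetical POS-tagging batch: 2 sentences, 3 words each, 4 possible tags.
tags = np.array([[0, 1, 2], [3, 3, 0]])
y_true = np.eye(4)[tags]  # one-hot labels, shape (2, 3, 4)

old = smooth_labels(y_true, 0.1, class_axis=1)   # divides by 3 (words per sentence)
new = smooth_labels(y_true, 0.1, class_axis=-1)  # divides by 4 (tags)

# Smoothed labels should still sum to 1 over the tag dimension.
old_sums = old.sum(axis=-1)  # 0.9 + 4 * (0.1 / 3) for every word, so not 1
new_sums = new.sum(axis=-1)  # 0.9 + 4 * (0.1 / 4) = 1 for every word
```

With index 1 the smoothing mass is divided by the sentence length rather than the number of tags, so the smoothed labels no longer form a probability distribution. With index -1 they do.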
2020-02-23 13:58:03 +01:00
commit 5f1ee72e97
parent 390052e2ce
1 changed file with 1 addition and 1 deletion
--- a/tensorflow/python/keras/losses.py
+++ b/tensorflow/python/keras/losses.py
@@ -1084,7 +1084,7 @@ def categorical_crossentropy(y_true,
  label_smoothing = ops.convert_to_tensor_v2(label_smoothing, dtype=K.floatx())

  def _smooth_labels():
-    num_classes = math_ops.cast(array_ops.shape(y_true)[1], y_pred.dtype)
+    num_classes = math_ops.cast(array_ops.shape(y_true)[-1], y_pred.dtype)
    return y_true * (1.0 - label_smoothing) + (label_smoothing / num_classes)

  y_true = smart_cond.smart_cond(label_smoothing,