Fix label_smoothing in multidimensional CategoricalCrossentropy.

When label_smoothing in CategoricalCrossentropy is non-zero, the loss
takes tf.shape(y_true)[1] as the number of classes. However, if the true
values and predictions have more than two dimensions (for example, when
training a POS tagger where each batch element is a sentence composed of
words), dimension 1 is not the class dimension, so the wrong value is
used as the number of classes and training does not work.

This fix takes the _last_ dimension as the one containing classes.
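
The effect can be illustrated with a minimal NumPy sketch. It is not the
TensorFlow implementation: the helper smooth_labels, its class_axis
parameter, and the example shapes are assumptions made up for this
illustration. It applies the same smoothing formula twice, once reading
the class count from dimension 1 and once from the last dimension.

```python
import numpy as np


def smooth_labels(y_true, label_smoothing, class_axis):
    """Blend one-hot targets with a uniform distribution over classes.

    Hypothetical helper: class_axis selects which dimension of y_true
    is read as the number of classes.
    """
    num_classes = y_true.shape[class_axis]
    return y_true * (1.0 - label_smoothing) + label_smoothing / num_classes


# Assumed POS-tagging batch: 2 sentences, 3 words each, 5 possible tags.
tags = np.array([[0, 1, 2], [3, 4, 0]])
y_true = np.eye(5)[tags]  # one-hot targets, shape (2, 3, 5)

# Dimension 1 is the sentence length (3), not the number of tags (5).
old = smooth_labels(y_true, 0.1, class_axis=1)
# The last dimension holds the classes.
new = smooth_labels(y_true, 0.1, class_axis=-1)

print(old.sum(axis=-1))  # per-word totals when dividing by 3
print(new.sum(axis=-1))  # per-word totals when dividing by 5
```

With the class count read from dimension 1, the smoothed targets for each
word no longer sum to 1, so they are not a valid distribution; reading it
from the last dimension restores that property.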
Milan Straka 2020-02-23 13:58:03 +01:00
parent 390052e2ce
commit 5f1ee72e97

@@ -1084,7 +1084,7 @@ def categorical_crossentropy(y_true,
   label_smoothing = ops.convert_to_tensor_v2(label_smoothing, dtype=K.floatx())
 
   def _smooth_labels():
-    num_classes = math_ops.cast(array_ops.shape(y_true)[1], y_pred.dtype)
+    num_classes = math_ops.cast(array_ops.shape(y_true)[-1], y_pred.dtype)
     return y_true * (1.0 - label_smoothing) + (label_smoothing / num_classes)
 
   y_true = smart_cond.smart_cond(label_smoothing,