diff --git a/tensorflow/lite/g3doc/performance/model_optimization.md b/tensorflow/lite/g3doc/performance/model_optimization.md
index 17b6867d692..21e85653d1c 100644
--- a/tensorflow/lite/g3doc/performance/model_optimization.md
+++ b/tensorflow/lite/g3doc/performance/model_optimization.md
@@ -125,6 +125,47 @@ the numbers here:
+
+### Full integer quantization with int16 activations and int8 weights
+
+[Quantization with int16 activations](https://www.tensorflow.org/model_optimization/guide/quantization/post_training)
+is a full integer quantization scheme with activations in int16 and weights in
+int8. This mode can improve the accuracy of the quantized model compared to
+full integer quantization with both activations and weights in int8, while
+keeping a similar model size. It is recommended when activations are sensitive
+to quantization.
+
+NOTE: Currently only non-optimized reference kernel implementations are
+available in TFLite for this quantization scheme, so by default the performance
+will be slow compared to int8 kernels. The full advantages of this mode can
+currently be accessed via specialized hardware or custom software.
+
+Below are the accuracy results for some models that benefit from this mode.
+
+<figure>
+  <table>
+    <tr>
+      <th>Model</th>
+      <th>Accuracy metric type</th>
+      <th>Accuracy (float32 activations)</th>
+      <th>Accuracy (int8 activations)</th>
+      <th>Accuracy (int16 activations)</th>
+    </tr>
+    <tr>
+      <td>Wav2letter</td>
+      <td>WER</td>
+      <td>6.7%</td>
+      <td>7.7%</td>
+      <td>7.2%</td>
+    </tr>
+    <tr>
+      <td>DeepSpeech 0.5.1 (unrolled)</td>
+      <td>CER</td>
+      <td>6.13%</td>
+      <td>43.67%</td>
+      <td>6.52%</td>
+    </tr>
+    <tr>
+      <td>YoloV3</td>
+      <td>mAP (IOU=0.5)</td>
+      <td>0.577</td>
+      <td>0.563</td>
+      <td>0.574</td>
+    </tr>
+    <tr>
+      <td>MobileNetV1</td>
+      <td>Top-1 Accuracy</td>
+      <td>0.7062</td>
+      <td>0.694</td>
+      <td>0.6936</td>
+    </tr>
+    <tr>
+      <td>MobileNetV2</td>
+      <td>Top-1 Accuracy</td>
+      <td>0.718</td>
+      <td>0.7126</td>
+      <td>0.7137</td>
+    </tr>
+    <tr>
+      <td>MobileBert</td>
+      <td>F1 (Exact match)</td>
+      <td>88.81 (81.23)</td>
+      <td>2.08 (0)</td>
+      <td>88.73 (81.15)</td>
+    </tr>
+  </table>
+  <figcaption>
+    <b>Table 2</b> Benefits of model quantization with int16 activations
+  </figcaption>
+</figure>
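+
+As a quick illustration, here is a minimal sketch of enabling this mode through
+post-training quantization with the `TFLiteConverter`, using the experimental
+int16x8 ops set. The SavedModel path, input shape, and representative dataset
+below are placeholders to adapt to your own model:
+
+```python
+import tensorflow as tf
+
+def representative_dataset():
+  # Yield a few representative inputs so the converter can calibrate
+  # activation ranges. The shape here is a placeholder.
+  for _ in range(100):
+    yield [tf.random.uniform(shape=(1, 224, 224, 3), dtype=tf.float32)]
+
+converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")
+converter.optimizations = [tf.lite.Optimize.DEFAULT]
+converter.representative_dataset = representative_dataset
+# Constrain ops to the int16 activations / int8 weights quantization scheme.
+converter.target_spec.supported_ops = [
+    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
+]
+tflite_quant_model = converter.convert()
+```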
+
 ### Pruning
 
 [Pruning](https://www.tensorflow.org/model_optimization/guide/pruning) works by