Update docs to refer to dynamic-range quantization universally.

PiperOrigin-RevId: 301440536
Change-Id: Ie54aba7649aed76d9c6e61e4e4a37cd07ffab82b
Suharsh Sivakumar 2020-03-17 13:27:04 -07:00 committed by TensorFlower Gardener
parent ac31656224
commit 6ae6b65d2f
2 changed files with 20 additions and 20 deletions


@@ -89,7 +89,7 @@ The following types of quantization are available in TensorFlow Lite:
Technique | Data requirements | Size reduction | Accuracy | Supported hardware
-------------------------------------------------------------------------------------------------------------- | -------------------------------- | -------------- | --------------------------- | ------------------
[Post-training float16 quantization](post_training_float16_quant.ipynb) | No data | Up to 50% | Insignificant accuracy loss | CPU, GPU
-[Post-training weight quantization](post_training_quant.ipynb) | No data | Up to 75% | Accuracy loss | CPU
+[Post-training dynamic range quantization](post_training_quant.ipynb) | No data | Up to 75% | Accuracy loss | CPU
[Post-training integer quantization](post_training_integer_quant.ipynb) | Unlabelled representative sample | Up to 75% | Smaller accuracy loss | CPU, EdgeTPU, Hexagon DSP
[Quantization-aware training](https://github.com/tensorflow/tensorflow/tree/r1.13/tensorflow/contrib/quantize) | Labelled training data | Up to 75% | Smallest accuracy loss | CPU, EdgeTPU, Hexagon DSP


@@ -14,13 +14,13 @@ Note: The procedures on this page require TensorFlow 1.15 or higher.
There are several post-training quantization options to choose from. Here is a
summary table of the choices and the benefits they provide:
-| Technique | Benefits | Hardware |
-| -------------------------- | ------------------------- | ------------------- |
-| Weight quantization | 4x smaller, 2-3x speedup, | CPU |
-: : accuracy : :
-| Full integer quantization | 4x smaller, 3x+ speedup | CPU, Edge TPU, etc. |
-| Float16 quantization | 2x smaller, potential GPU | CPU/GPU |
-: : acceleration : :
+| Technique | Benefits | Hardware |
+| ------------------------- | ------------------------- | ------------------- |
+| Dynamic range | 4x smaller, 2-3x speedup, | CPU |
+: quantization : accuracy : :
+| Full integer quantization | 4x smaller, 3x+ speedup | CPU, Edge TPU, etc. |
+| Float16 quantization | 2x smaller, potential GPU | CPU/GPU |
+: : acceleration : :
This decision tree can help determine which post-training quantization method is
best for your use case:
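
The three rows in the new table above correspond to different converter settings. Here is a minimal Python sketch of how each option is selected, assuming a SavedModel at `saved_model_dir` and a `representative_data_gen` generator (both placeholder names; attribute spellings can differ slightly across TF releases):

```
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)

# Dynamic range quantization: 8-bit weights, no calibration data needed.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Full integer quantization: additionally supply a representative dataset
# so the converter can calibrate activation ranges.
# converter.representative_dataset = representative_data_gen

# Float16 quantization: instead constrain weights to float16.
# converter.target_spec.supported_types = [tf.float16]

tflite_quant_model = converter.convert()
```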
@@ -34,29 +34,29 @@ However, doing so requires some model modifications to add fake quantization
nodes, whereas the post-training quantization techniques on this page use an
existing pre-trained model.
-### Weight quantization
+### Dynamic range quantization
-The simplest form of post-training quantization quantizes only the weights from
-floating point to 8-bits of precision (also called "hybrid" quantization). This
-technique is enabled as an option in the [TensorFlow Lite
-converter](../convert/):
+The simplest form of post-training quantization statically quantizes only the
+weights from floating point to 8-bits of precision. This technique is enabled as
+an option in the [TensorFlow Lite converter](../convert/):
```
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
-converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
+converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
```
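
The converter returns the quantized model as a flatbuffer in memory; a common next step (shown here with an illustrative filename) is to serialize it for deployment:

```
# Write the quantized flatbuffer to disk (the filename is illustrative).
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_quant_model)
```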
At inference, weights are converted from 8-bits of precision to floating point and
computed using floating-point kernels. This conversion is done once and cached to reduce latency.
-To further improve latency, hybrid operators dynamically quantize activations to 8-bits and
-perform computations with 8-bit weights and activations. This optimization provides latencies
-close to fully fixed-point inference. However, the outputs are still stored using
-floating point, so that the speedup with hybrid ops is less than a full fixed-point computation.
-Hybrid ops are available for the most compute-intensive operators in a network:
+To further improve latency, "dynamic-range" operators dynamically quantize
+activations based on their range to 8-bits and perform computations with 8-bit
+weights and activations. This optimization provides latencies close to fully
+fixed-point inference. However, the outputs are still stored using floating
+point, so that the speedup with dynamic-range ops is less than a full
+fixed-point computation. Dynamic-range ops are available for the most
+compute-intensive operators in a network:
* [tf.contrib.layers.fully_connected](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/fully_connected)
* [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)
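
To exercise the dynamic-range model end to end, it can be loaded with the TFLite Python interpreter. A minimal sketch, assuming the `model_quant.tflite` file written above and a model with a single float32 input:

```
import numpy as np
import tensorflow as tf

# Load the quantized model; weights are stored as 8-bit, and activations
# are quantized on the fly by dynamic-range kernels at invoke time.
interpreter = tf.lite.Interpreter(model_path="model_quant.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input matching the model's expected shape; the interface
# stays float32 even though the internal compute uses 8-bit kernels.
dummy = np.array(np.random.random_sample(input_details[0]["shape"]),
                 dtype=np.float32)
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
```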