Update docs to refer to dynamic-range quantization universally.

PiperOrigin-RevId: 301440536
Change-Id: Ie54aba7649aed76d9c6e61e4e4a37cd07ffab82b
Suharsh Sivakumar 2020-03-17 13:27:04 -07:00 committed by TensorFlower Gardener
parent ac31656224
commit 6ae6b65d2f
2 changed files with 20 additions and 20 deletions


@@ -89,7 +89,7 @@ The following types of quantization are available in TensorFlow Lite:
 
 Technique | Data requirements | Size reduction | Accuracy | Supported hardware
 -------------------------------------------------------------------------------------------------------------- | -------------------------------- | -------------- | --------------------------- | ------------------
 [Post-training float16 quantization](post_training_float16_quant.ipynb) | No data | Up to 50% | Insignificant accuracy loss | CPU, GPU
-[Post-training weight quantization](post_training_quant.ipynb) | No data | Up to 75% | Accuracy loss | CPU
+[Post-training dynamic range quantization](post_training_quant.ipynb) | No data | Up to 75% | Accuracy loss | CPU
 [Post-training integer quantization](post_training_integer_quant.ipynb) | Unlabelled representative sample | Up to 75% | Smaller accuracy loss | CPU, EdgeTPU, Hexagon DSP
 [Quantization-aware training](https://github.com/tensorflow/tensorflow/tree/r1.13/tensorflow/contrib/quantize) | Labelled training data | Up to 75% | Smallest accuracy loss | CPU, EdgeTPU, Hexagon DSP
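For context, a minimal sketch of how the float16 row in the table above is enabled through the converter, assuming the TF 2.x converter API; `saved_model_dir` is a placeholder and this snippet is illustrative, not part of the commit:

```
import tensorflow as tf

# Post-training float16 quantization: weights are stored as float16,
# roughly halving model size (the "Up to 50%" row above).
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_fp16_model = converter.convert()
```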


@@ -15,9 +15,9 @@ There are several post-training quantization options to choose from. Here is a
 summary table of the choices and the benefits they provide:
 
 | Technique                  | Benefits                  | Hardware            |
-| -------------------------- | ------------------------- | ------------------- |
-| Weight quantization        | 4x smaller, 2-3x speedup, | CPU                 |
-:                            : accuracy                  :                     :
+| ------------------------- | ------------------------- | ------------------- |
+| Dynamic range             | 4x smaller, 2-3x speedup, | CPU                 |
+: quantization              : accuracy                  :                     :
 | Full integer quantization  | 4x smaller, 3x+ speedup   | CPU, Edge TPU, etc. |
 | Float16 quantization       | 2x smaller, potential GPU | CPU/GPU             |
 :                            : acceleration              :                     :
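The "4x smaller" figure for dynamic range quantization follows from storing float32 weights as 8-bit integers. A rough way to check it, sketched under the assumption of a TF 2.x SavedModel at a placeholder `saved_model_dir`:

```
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)  # placeholder path
float_model = converter.convert()  # float32 baseline

converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range quantization
quant_model = converter.convert()

# Weights dominate most model sizes, so float32 -> int8 shrinks them roughly 4x.
print(len(float_model) / len(quant_model))
```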
@@ -34,29 +34,29 @@ However, doing so requires some model modifications to add fake quantization
 nodes, whereas the post-training quantization techniques on this page use an
 existing pre-trained model.
 
-### Weight quantization
+### Dynamic range quantization
 
-The simplest form of post-training quantization quantizes only the weights from
-floating point to 8-bits of precision (also called "hybrid" quantization). This
-technique is enabled as an option in the [TensorFlow Lite
-converter](../convert/):
+The simplest form of post-training quantization statically quantizes only the
+weights from floating point to 8-bits of precision. This technique is enabled as
+an option in the [TensorFlow Lite converter](../convert/):
 
 ```
 import tensorflow as tf
 converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
-converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
+converter.optimizations = [tf.lite.Optimize.DEFAULT]
 tflite_quant_model = converter.convert()
 ```
 
 At inference, weights are converted from 8-bits of precision to floating point and
 computed using floating-point kernels. This conversion is done once and cached to reduce latency.
 
-To further improve latency, hybrid operators dynamically quantize activations to 8-bits and
-perform computations with 8-bit weights and activations. This optimization provides latencies
-close to fully fixed-point inference. However, the outputs are still stored using
-floating point, so that the speedup with hybrid ops is less than a full fixed-point computation.
-
-Hybrid ops are available for the most compute-intensive operators in a network:
+To further improve latency, "dynamic-range" operators dynamically quantize
+activations based on their range to 8-bits and perform computations with 8-bit
+weights and activations. This optimization provides latencies close to fully
+fixed-point inference. However, the outputs are still stored using floating
+point, so that the speedup with dynamic-range ops is less than a full
+fixed-point computation. Dynamic-range ops are available for the most
+compute-intensive operators in a network:
 
 *   [tf.contrib.layers.fully_connected](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/fully_connected)
 *   [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)