Update docs to refer to dynamic-range quantization universally.

PiperOrigin-RevId: 301440536
Change-Id: Ie54aba7649aed76d9c6e61e4e4a37cd07ffab82b
Suharsh Sivakumar 2020-03-17 13:27:04 -07:00 committed by TensorFlower Gardener
parent ac31656224
commit 6ae6b65d2f
2 changed files with 20 additions and 20 deletions


@@ -89,7 +89,7 @@ The following types of quantization are available in TensorFlow Lite:
 
 Technique | Data requirements | Size reduction | Accuracy | Supported hardware
 -------------------------------------------------------------------------------------------------------------- | -------------------------------- | -------------- | --------------------------- | ------------------
 [Post-training float16 quantization](post_training_float16_quant.ipynb) | No data | Up to 50% | Insignificant accuracy loss | CPU, GPU
-[Post-training weight quantization](post_training_quant.ipynb) | No data | Up to 75% | Accuracy loss | CPU
+[Post-training dynamic range quantization](post_training_quant.ipynb) | No data | Up to 75% | Accuracy loss | CPU
 [Post-training integer quantization](post_training_integer_quant.ipynb) | Unlabelled representative sample | Up to 75% | Smaller accuracy loss | CPU, EdgeTPU, Hexagon DSP
 [Quantization-aware training](https://github.com/tensorflow/tensorflow/tree/r1.13/tensorflow/contrib/quantize) | Labelled training data | Up to 75% | Smallest accuracy loss | CPU, EdgeTPU, Hexagon DSP
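For context, a minimal sketch of how the float16 row in the table above is enabled through the converter, assuming the TF 2.x converter API; `saved_model_dir` is a placeholder and this snippet is illustrative, not part of the commit:

```
import tensorflow as tf

# Post-training float16 quantization: weights are stored as float16,
# roughly halving model size (the "Up to 50%" row above).
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_fp16_model = converter.convert()
```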


@@ -15,9 +15,9 @@ There are several post-training quantization options to choose from. Here is a
 summary table of the choices and the benefits they provide:
 
 | Technique                  | Benefits                  | Hardware            |
-| -------------------------- | ------------------------- | ------------------- |
-| Weight quantization        | 4x smaller, 2-3x speedup, | CPU                 |
-:                            : accuracy                  :                     :
+| ------------------------- | ------------------------- | ------------------- |
+| Dynamic range             | 4x smaller, 2-3x speedup, | CPU                 |
+: quantization              : accuracy                  :                     :
 | Full integer quantization  | 4x smaller, 3x+ speedup   | CPU, Edge TPU, etc. |
 | Float16 quantization       | 2x smaller, potential GPU | CPU/GPU             |
 :                            : acceleration              :                     :
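The "4x smaller" figure for dynamic range quantization follows from storing float32 weights as 8-bit integers. A rough way to check it, sketched under the assumption of a TF 2.x SavedModel at a placeholder `saved_model_dir`:

```
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)  # placeholder path
float_model = converter.convert()  # float32 baseline

converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range quantization
quant_model = converter.convert()

# Weights dominate most model sizes, so float32 -> int8 shrinks them roughly 4x.
print(len(float_model) / len(quant_model))
```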
@@ -34,29 +34,29 @@ However, doing so requires some model modifications to add fake quantization
 nodes, whereas the post-training quantization techniques on this page use an
 existing pre-trained model.
 
-### Weight quantization
+### Dynamic range quantization
 
-The simplest form of post-training quantization quantizes only the weights from
-floating point to 8-bits of precision (also called "hybrid" quantization). This
-technique is enabled as an option in the [TensorFlow Lite
-converter](../convert/):
+The simplest form of post-training quantization statically quantizes only the
+weights from floating point to 8-bits of precision. This technique is enabled as
+an option in the [TensorFlow Lite converter](../convert/):
 
 ```
 import tensorflow as tf
 converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
-converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
+converter.optimizations = [tf.lite.Optimize.DEFAULT]
 tflite_quant_model = converter.convert()
 ```
 
 At inference, weights are converted from 8-bits of precision to floating point and
 computed using floating-point kernels. This conversion is done once and cached to reduce latency.
 
-To further improve latency, hybrid operators dynamically quantize activations to 8-bits and
-perform computations with 8-bit weights and activations. This optimization provides latencies
-close to fully fixed-point inference. However, the outputs are still stored using
-floating point, so that the speedup with hybrid ops is less than a full fixed-point computation.
-
-Hybrid ops are available for the most compute-intensive operators in a network:
+To further improve latency, "dynamic-range" operators dynamically quantize
+activations based on their range to 8-bits and perform computations with 8-bit
+weights and activations. This optimization provides latencies close to fully
+fixed-point inference. However, the outputs are still stored using floating
+point, so that the speedup with dynamic-range ops is less than a full
+fixed-point computation. Dynamic-range ops are available for the most
+compute-intensive operators in a network:
 
 *   [tf.contrib.layers.fully_connected](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/fully_connected)
 *   [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)