Update TensorFlow Lite Converter Docs in TensorFlow 1.x

PiperOrigin-RevId: 328436791
Change-Id: I60a94d4e1ba26ce9d31c027aef2f8c35d063424c
Meghna Natraj 2020-08-25 17:27:18 -07:00 committed by TensorFlower Gardener
parent b3f274f6eb
commit 29fd07dc0b
4 changed files with 309 additions and 330 deletions


@ -2,175 +2,166 @@
This page shows how to use the TensorFlow Lite Converter from the command line.
_Note: If possible, use the **recommended** [Python API](python_api.md)
instead._
## Command-line tools <a name="tools"></a>
There are two approaches to running the converter in the command line.

*   `tflite_convert` (**recommended**): Starting from TensorFlow 1.9,
    `tflite_convert` is installed as part of the TensorFlow Python package.
    *   *Install*: TensorFlow using
        [pip](https://www.tensorflow.org/install/pip).
    *   *Example*: `tflite_convert --output_file=...`
*   `bazel`: Runs the latest version of the converter from a source checkout.
    *   *Install*: TensorFlow from
        [source](https://www.tensorflow.org/install/source).
    *   *Example*: `bazel run
        //third_party/tensorflow/lite/python:tflite_convert --
        --output_file=...`
*All of the following examples use `tflite_convert` for simplicity.
Alternatively, you can replace `tflite_convert` with `bazel run
//tensorflow/lite/python:tflite_convert --`.*

### Prior to TensorFlow 1.9 <a name="pre_tensorflow_1.9"></a>

The recommended approach for using the converter prior to TensorFlow 1.9 is the
[Python API](python_api.md). A command-line tool, `toco`, was available only in
TensorFlow 1.7 (run `toco --help` for additional details); there were no
command-line tools in TensorFlow 1.8.
## Usage <a name="usage"></a>
### Setup <a name="download_models"></a>
Before we begin, download the models required to run the examples in this
document:
```
echo "Download MobileNet V1"
curl https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_0.50_128_frozen.tgz \
| tar xzv -C /tmp
echo "Download Inception V1"
curl https://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz \
| tar xzv -C /tmp
```
### Basic examples <a name="basic"></a>
The following section shows examples of how to convert a basic model from each
of the supported data formats into a TensorFlow Lite model.
#### Convert a SavedModel <a name="savedmodel"></a>
```
tflite_convert \
--saved_model_dir=/tmp/saved_model \
--output_file=/tmp/foo.tflite
```
#### Convert a tf.keras model <a name="keras"></a>
```
tflite_convert \
--keras_model_file=/tmp/keras_model.h5 \
--output_file=/tmp/foo.tflite
```
#### Convert a Frozen GraphDef <a name="graphdef"></a>
```
tflite_convert \
--graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \
--output_file=/tmp/foo.tflite \
--input_arrays=input \
--output_arrays=MobilenetV1/Predictions/Reshape_1
```
The value for `input_shapes` is automatically determined whenever possible.
Frozen GraphDef models (or frozen graphs) are produced by
[freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py)
and require additional flags `--input_arrays` and `--output_arrays`, as this
information is not stored in the model format. Frozen graphs contain the
variables from checkpoint files inlined as `Const` ops.
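
If you are not sure which tensor names to pass, you can inspect the frozen graph
programmatically. The snippet below is a rough sketch rather than part of the
original workflow; it assumes a TensorFlow 1.x-style installation (with the
`tf.compat.v1` and `tf.io` namespaces available) and the MobileNet file
downloaded in the Setup step:

```python
import tensorflow as tf

# Parse the frozen GraphDef downloaded in the Setup step.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("/tmp/mobilenet_v1_0.50_128/frozen_graph.pb", "rb") as f:
  graph_def.ParseFromString(f.read())

# Placeholders are usually the model inputs; nodes that no other node consumes
# are usually the outputs.
consumed = {name.split(":")[0].lstrip("^")
            for node in graph_def.node for name in node.input}
for node in graph_def.node:
  if node.op == "Placeholder":
    print("input candidate: ", node.name)
  elif node.name not in consumed:
    print("output candidate:", node.name)
```

Tools such as [Netron](https://github.com/lutzroeder/netron) offer a graphical
alternative for the same task.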
A
[SavedModel](https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators)
has fewer required flags than a frozen graph because the converter has access to
additional data contained within the SavedModel. The values for `--input_arrays`
and `--output_arrays` are an aggregated, alphabetized list of the inputs and
outputs in the [SignatureDefs](../../serving/signature_defs.md) within the
[MetaGraphDef](https://www.tensorflow.org/saved_model#apis_to_build_and_load_a_savedmodel)
specified by `--saved_model_tag_set`. As with the GraphDef, the value for
`--input_shapes` is automatically determined whenever possible. There is
currently no support for MetaGraphDefs without a SignatureDef or for
MetaGraphDefs that use the
[`assets/` directory](https://www.tensorflow.org/guide/saved_model#structure_of_a_savedmodel_directory).
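
If you want to see exactly which inputs and outputs the SignatureDefs expose,
the following sketch (an illustrative addition, not part of the original
examples) loads a SavedModel assumed to live at `/tmp/saved_model` with the
default `serve` tag and prints its SignatureDefs:

```python
import tensorflow as tf

with tf.compat.v1.Session(graph=tf.Graph()) as sess:
  # Load the MetaGraphDef tagged "serve" and list its SignatureDefs.
  meta_graph_def = tf.compat.v1.saved_model.loader.load(
      sess, ["serve"], "/tmp/saved_model")
  for key, signature in meta_graph_def.signature_def.items():
    print("SignatureDef:", key)
    print("  inputs: ", {k: v.name for k, v in signature.inputs.items()})
    print("  outputs:", {k: v.name for k, v in signature.outputs.items()})
```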
### Advanced examples

#### Convert a quantization aware trained model into a quantized TensorFlow Lite model

If you have a quantization aware trained model, i.e. a float model with
`FakeQuant*` ops inserted at the boundaries of fused layers to record the
(min, max) ranges of tensors so that they can be quantized (the fixed-point
quantization approach described
[here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/quantize/README.md)),
then convert it into a quantized TensorFlow Lite model as shown below. The
converted model reproduces the quantization behavior that was used during
training.

```
tflite_convert \
--graph_def_file=/tmp/some_mobilenetv1_quantized_frozen_graph.pb \
--output_file=/tmp/foo.tflite \
--input_arrays=input \
--output_arrays=MobilenetV1/Predictions/Reshape_1 \
--inference_type=INT8 \
--mean_values=-0.5 \
--std_dev_values=127.7
```
### Use \"dummy-quantization\" to try out quantized inference on a float graph <a name="dummy_quant"></a>
*If you're setting `--inference_type=QUANTIZED_UINT8` then update
`--mean_values=128` and `--std_dev_values=127`*
In order to evaluate the possible benefit of generating a quantized graph, the
converter allows "dummy-quantization" on float graphs. The flags
`--default_ranges_min` and `--default_ranges_max` accept plausible values for
the min-max ranges of the values in all arrays that do not have min-max
information. "Dummy-quantization" will produce lower accuracy but will emulate
the performance of a correctly quantized model.
#### Convert a model with \"dummy-quantization\" into a quantized TensorFlow Lite model
If you have a regular float model and only want to estimate the benefit of a
quantized model, i.e, estimate the performance of the model as if it were
quantized aware trained, then perform "dummy-quantization" using the flags
`--default_ranges_min` and `--default_ranges_max`. When specified, they will be
used as default (min, max) range for all the tensors that lack (min, max) range
information. This will allow quantization to proceed and help you emulate the
performance of a quantized TensorFlow Lite model but it will have a lower
accuracy.
The example below contains a model using Relu6 activation functions. Therefore,
a reasonable guess is that most activation ranges should be contained in [0, 6].
```
curl https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_0.50_128_frozen.tgz \
| tar xzv -C /tmp
tflite_convert \
--output_file=/tmp/foo.cc \
--graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \
--inference_type=QUANTIZED_UINT8 \
--output_file=/tmp/foo.tflite \
--input_arrays=input \
--output_arrays=MobilenetV1/Predictions/Reshape_1 \
--inference_type=INT8 \
--mean_values=-0.5 \
--std_dev_values=127.7
--default_ranges_min=0 \
--default_ranges_max=6 \
--mean_values=128 \
--std_dev_values=127
```
*If you're setting `--inference_type=QUANTIZED_UINT8` then update
`--mean_values=128` and `--std_dev_values=127`.*

#### Convert a model with multiple input arrays

The flag `--input_arrays` takes in a comma-separated list of input arrays as
seen in the example below. This is useful for models or subgraphs with multiple
inputs. Note that `--input_shapes` is provided as a colon-separated list. Each
input shape corresponds to the input array at the same position in the
respective list.
```
tflite_convert \
--graph_def_file=/tmp/inception_v1_2016_08_28_frozen.pb \
--output_file=/tmp/foo.tflite \
--input_arrays=InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_2/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_3/MaxPool_0a_3x3/MaxPool,InceptionV1/InceptionV1/Mixed_3b/Branch_0/Conv2d_0a_1x1/Relu \
--input_shapes=1,28,28,96:1,28,28,16:1,28,28,192:1,28,28,64 \
--output_arrays=InceptionV1/Logits/Predictions/Reshape_1
```
#### Convert a model with multiple output arrays

The flag `--output_arrays` takes in a comma-separated list of output arrays as
seen in the example below. This is useful for models or subgraphs with multiple
outputs.
```
tflite_convert \
--graph_def_file=/tmp/inception_v1_2016_08_28_frozen.pb \
--output_file=/tmp/foo.tflite \
@ -178,50 +169,45 @@ tflite_convert \
--output_arrays=InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_2/Conv2d_0a_1x1/Relu
```
### Convert a model by specifying subgraphs

Any array in the input file can be specified as an input or output array in
order to extract subgraphs out of an input model file. The TensorFlow Lite
Converter discards the parts of the model outside of the specific subgraph. Use
[visualization](#visualization) to identify the input and output arrays that
make up the desired subgraph.
The following command shows how to extract a single fused layer out of a TensorFlow
GraphDef.
```
tflite_convert \
--graph_def_file=/tmp/inception_v1_2016_08_28_frozen.pb \
--output_file=/tmp/foo.pb \
--input_arrays=InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_2/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_3/MaxPool_0a_3x3/MaxPool,InceptionV1/InceptionV1/Mixed_3b/Branch_0/Conv2d_0a_1x1/Relu \
--input_shapes=1,28,28,96:1,28,28,16:1,28,28,192:1,28,28,64 \
--output_arrays=InceptionV1/InceptionV1/Mixed_3b/concat_v2
```
Note that the final representation in TensorFlow Lite models tends to have
coarser granularity than the very fine granularity of the TensorFlow GraphDef
representation. For example, while a fully-connected layer is typically
represented as at least four separate operations in TensorFlow GraphDef
(Reshape, MatMul, BiasAdd, Relu...), it is typically represented as a single
"fused" op (FullyConnected) in the converter's optimized representation and in
the final on-device representation. As the level of granularity gets coarser,
some intermediate arrays (say, the array between the MatMul and the BiasAdd in
the TensorFlow GraphDef) are dropped.
When specifying intermediate arrays as `--input_arrays` and `--output_arrays`,
it is desirable (and often required) to specify arrays that are meant to survive
in the final form of the model, after fusing. These are typically the outputs of
activation functions (since everything in each layer until the activation
function tends to get fused).
## Visualization <a name="visualization"></a>

The converter can export a model to the Graphviz Dot format for easy
visualization using either the `--output_format` flag or the
`--dump_graphviz_dir` flag. The subsections below outline the use cases for
each.
@ -229,21 +215,20 @@ each.
### Using `--output_format=GRAPHVIZ_DOT` <a name="using_output_format_graphviz_dot"></a>
The first way to get a Graphviz rendering is to pass `GRAPHVIZ_DOT` into
`--output_format`. This results in a plausible visualization of the model. This
reduces the requirements that exist during conversion from a TensorFlow GraphDef
to a TensorFlow Lite model. This may be useful if the conversion to TFLite is
failing.
```
tflite_convert \
--graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \
--output_file=/tmp/foo.dot \
--output_format=GRAPHVIZ_DOT \
--input_arrays=input \
--input_shape=1,128,128,3 \
--output_arrays=MobilenetV1/Predictions/Reshape_1
```
The resulting `.dot` file can be rendered into a PDF as follows:
@ -267,12 +252,10 @@ Example PDF files are viewable online in the next section.
The second way to get a Graphviz rendering is to pass the `--dump_graphviz_dir`
flag, specifying a destination directory to dump Graphviz rendering to. Unlike
the previous approach, this one retains the original output format. This
provides a visualization of the actual model resulting from a specific
conversion process.
```
tflite_convert \
--graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \
--output_file=/tmp/foo.tflite \
@ -283,14 +266,14 @@ tflite_convert \
This generates a few files in the destination directory. The two most important
files are `toco_AT_IMPORT.dot` and `toco_AFTER_TRANSFORMATIONS.dot`.
`toco_AT_IMPORT.dot` represents the original model containing only the
transformations done at import time. This tends to be a complex visualization
with limited information about each node. It is useful in situations where a
conversion command fails.
`toco_AFTER_TRANSFORMATIONS.dot` represents the model after all transformations
were applied to it, just before it is exported. Typically, this is a much
smaller model with more information about each node.
As before, these can be rendered to PDFs:
@ -316,15 +299,15 @@ Sample output files can be seen here below. Note that it is the same
<tr><td>before</td><td>after</td></tr>
</table>
### Graph "video" logging
### Video logging
When `--dump_graphviz_dir` is used, one may additionally pass
`--dump_graphviz_video`. This causes a model visualization to be dumped after
each individual model transformation, resulting in thousands of files.
Typically, one would then bisect into these files to understand when a given
change was introduced in the model.
### Legend for the Visualizations <a name="graphviz_legend"></a>
* Operators are red square boxes with the following hues of red:
* Most operators are


@ -1,42 +1,41 @@
# Converter command line reference
This page is a complete reference of the command-line flags used by the
TensorFlow Lite Converter's command-line tool.
## High-level flags
The following high level flags specify the details of the input and output
files. The flag `--output_file` is always required. Additionally, either
`--saved_model_dir`, `--keras_model_file` or `--graph_def_file` is required.
* `--output_file`. Type: string. Specifies the full path of the output file.
* `--saved_model_dir`. Type: string. Specifies the full path to the directory
containing the SavedModel.
* `--keras_model_file`. Type: string. Specifies the full path of the HDF5 file
containing the tf.keras model.
* `--graph_def_file`. Type: string. Specifies the full path of the input
GraphDef file frozen using
[freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py).
* `--output_format`. Type: string. Default: `TFLITE`. Specifies the format of
the output file. Allowed values:
* `TFLITE`: TensorFlow Lite model format.
* `GRAPHVIZ_DOT`: GraphViz `.dot` format containing a visualization of the
graph after graph transformations.
* Note that passing `GRAPHVIZ_DOT` to `--output_format` leads to loss
of TFLite specific transformations. Therefore, the resulting
visualization may not reflect the final set of graph transformations. To
get a final visualization with all graph transformations use
`--dump_graphviz_dir` instead.
The following flags specify optional parameters when using SavedModels.
* `--saved_model_tag_set`. Type: string. Default: "serve" (for more options,
refer to
[tag_constants.h](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/cc/saved_model/tag_constants.h)).
Specifies a comma-separated set of tags identifying the MetaGraphDef within
the SavedModel to analyze. All tags in the tag set must be specified.
* `--saved_model_signature_key`. Type: string. Default: "serving_default" (for
more options, refer to
[tf.compat.v1.saved_model.signature_constants](https://www.tensorflow.org/api_docs/python/tf/compat/v1/saved_model/signature_constants)).
Specifies the key identifying the SignatureDef containing inputs and
outputs.
@ -46,9 +45,9 @@ The following flags specify optional parameters when using SavedModels.
file.
* `--input_arrays`. Type: comma-separated list of strings. Specifies the list
of names of input tensors.
* `--output_arrays`. Type: comma-separated list of strings. Specifies the list
of names of output tensors.
The following flags define properties of the input tensors. Each item in the
`--input_arrays` flag should correspond to each item in the following flags
@ -56,8 +55,7 @@ based on index.
* `--input_shapes`. Type: colon-separated list of comma-separated lists of
integers. Each comma-separated list of integers gives the shape of one of
the input arrays.
* Example: `--input_shapes=1,60,80,3` for a typical vision model means a
batch size of 1, an input image height of 60, an input image width of
80, and an input image depth of 3 (representing RGB channels).
@ -65,24 +63,24 @@ based on index.
has a shape of [2, 3] and "bar" has a shape of [4, 5, 6].
* `--std_dev_values`, `--mean_values`. Type: comma-separated list of floats.
These specify the (de-)quantization parameters of the input array, when it
is quantized. This is only needed if `inference_input_type` is `INT8` or
`QUANTIZED_UINT8`.
* The meaning of `mean_values` and `std_dev_values` is as follows: each
quantized value in the quantized input array will be interpreted as a
mathematical real number (i.e. as an input activation value) according
to the following formula:
* `real_value = (quantized_value - mean_value) / std_dev_value`.
* When performing float inference (`--inference_type=FLOAT`) on a
quantized input, the quantized input would be immediately dequantized by
the inference code according to the above formula, before proceeding
with float inference.
* When performing quantized inference (`inference_type` is `INT8` or
`QUANTIZED_UINT8`), no dequantization is performed by the inference code.
However, the quantization parameters of all arrays, including those of the
input arrays as specified by `mean_value` and `std_dev_value`, determine the
fixed-point multipliers used in the quantized inference code. `mean_value`
must be an integer when performing quantized inference. (A short worked
example follows this list.)
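
As a quick sanity check on these flags, the following worked example (added here
for illustration; the numbers simply exercise the formula above) computes the
real-value range that a given `mean_values`/`std_dev_values` pair maps the
quantized domain onto:

```python
# real_value = (quantized_value - mean_value) / std_dev_value

def real_range(qmin, qmax, mean, std_dev):
  """Return the real-value interval covered by quantized values qmin..qmax."""
  return (qmin - mean) / std_dev, (qmax - mean) / std_dev

# QUANTIZED_UINT8 input (0..255) with --mean_values=128 --std_dev_values=127:
print(real_range(0, 255, 128.0, 127.0))    # approx (-1.008, 1.0)

# INT8 input (-128..127) with --mean_values=-0.5 --std_dev_values=127.7:
print(real_range(-128, 127, -0.5, 127.7))  # approx (-0.998, 0.998)
```

Both settings therefore map the quantized input onto roughly the [-1, 1] range
expected by MobileNet-style models.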
## Transformation flags
@ -92,7 +90,7 @@ have.
* `--inference_type`. Type: string. Default: `FLOAT`. Data type of all
real-number arrays in the output file except for input arrays (defined by
`--inference_input_type`). Must be `{FLOAT, INT8, QUANTIZED_UINT8}`.
This flag only impacts real-number arrays including float and quantized
arrays. This excludes all other data types including plain integer arrays
@ -101,6 +99,9 @@ have.
* If `FLOAT`, then real-number arrays will be of type float in the output
file. If they were quantized in the input file, then they get
dequantized.
* If `INT8`, then real-number arrays will be quantized as int8 in the
output file. If they were float in the input file, then they get
quantized.
* If `QUANTIZED_UINT8`, then real-number arrays will be quantized as
uint8 in the output file. If they were float in the input file, then
they get quantized.
@ -109,7 +110,8 @@ have.
array in the output file. By default the `--inference_type` is used as type
of all of the input arrays. The flag is primarily intended for generating a
floating-point graph with a quantized input array. A Dequantize operator is
added immediately after the input array. Must be `{FLOAT, INT8,
QUANTIZED_UINT8}`.
The flag is typically used for vision models taking a bitmap as input but
requiring floating-point inference. For such image models, the uint8 input


@ -1,48 +1,48 @@
# TensorFlow Lite converter
The TensorFlow Lite converter takes a TensorFlow model and generates a
TensorFlow Lite model, which is an optimized
[FlatBuffer](https://google.github.io/flatbuffers/) (identified by the `.tflite`
file extension) that can be executed by the TensorFlow Lite interpreter.
Note: This page contains documentation on the converter API for TensorFlow 1.x.
The API for TensorFlow 2.0 is available
[here](https://www.tensorflow.org/lite/convert/).
## Options
The TensorFlow Lite Converter can be used in two ways:
* [Python API](python_api.md) (**recommended**): Using the Python API makes it
easier to convert models as part of a model development pipeline and helps
mitigate [compatibility](../tf_ops_compatibility.md) issues early on.
* [Command line](cmdline_examples.md)
## Workflow
### Why use the 'FlatBuffer' format?
FlatBuffer is an efficient open-source cross-platform serialization library. It
is similar to [protocol buffers](https://developers.google.com/protocol-buffers)
used in the TensorFlow model format, with the distinction that FlatBuffers do
not need a parsing/unpacking step to a secondary representation before data can
be accessed, avoiding per-object memory allocation. The code footprint of
FlatBuffers is an order of magnitude smaller than protocol buffers.
### Convert the model
The converter supports the following input formats:
* [SavedModels](https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators)
* `tf.keras` H5 models.
* Frozen `GraphDef` models generated using
[freeze_graph.py](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py).
* `tf.Session` models (Python API only).
### Run inference

The TensorFlow Lite model is then deployed to a client device, and the
TensorFlow Lite interpreter uses the compressed model for on-device inference.
This conversion process is shown in the diagram below:
![TFLite converter workflow](../images/convert/workflow.svg)


@ -1,119 +1,41 @@
# Converter Python API guide
This page describes how to convert TensorFlow models into the TensorFlow Lite
format using the
[`tf.compat.v1.lite.TFLiteConverter`](https://www.tensorflow.org/api_docs/python/tf/compat/v1/lite/TFLiteConverter)
Python API. It provides the following class methods based on the original format
of the model:

*   `tf.compat.v1.lite.TFLiteConverter.from_keras_model_file()`: Converts a
    [Keras](https://www.tensorflow.org/guide/keras/overview) model file.
*   `tf.compat.v1.lite.TFLiteConverter.from_saved_model()`: Converts a
    [SavedModel](https://www.tensorflow.org/guide/saved_model).
*   `tf.compat.v1.lite.TFLiteConverter.from_session()`: Converts a GraphDef from
    a session.
*   `tf.compat.v1.lite.TFLiteConverter.from_frozen_graph()`: Converts a Frozen
    GraphDef from a file. If you have checkpoints, first convert them to a
    Frozen GraphDef file and then use this API as shown [here](#checkpoints).

If you're looking for information about how to run a TensorFlow Lite model,
see [TensorFlow Lite inference](../guide/inference.md).
In the following sections, we discuss [basic examples](#basic) and
[complex examples](#complex).
## Basic examples <a name="basic"></a>
The following section shows examples of how to convert a basic model from each
of the supported model formats into a TensorFlow Lite model.
### Convert a Keras model file <a name="basic_keras_file"></a>

```python
import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file("keras_model.h5")
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```
The Keras file contains both the model and the weights, and requires
[`h5py`](http://docs.h5py.org/en/latest/build.html) to be installed. A
comprehensive example including model construction is given below.
```python
import numpy as np
@ -134,61 +56,133 @@ y = np.random.random((1, 3, 3))
model.train_on_batch(x, y)
model.predict(x)
# Save tf.keras model in H5 format.
keras_file = "keras_model.h5"
tf.keras.models.save_model(model, keras_file)
# Convert to TensorFlow Lite model.
converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file(keras_file)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```
### Convert a SavedModel <a name="basic_savedmodel"></a>

The following example shows how to convert a
[SavedModel](https://www.tensorflow.org/guide/saved_model) into a TensorFlow
Lite model.

```python
import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```

For more complex SavedModels, the optional parameters that can be passed into
`TFLiteConverter.from_saved_model()` are `input_arrays`, `input_shapes`,
`output_arrays`, `tag_set` and `signature_key`. Details of each parameter are
available by running `help(tf.compat.v1.lite.TFLiteConverter)`.
### Convert a GraphDef from a session <a name="basic_graphdef_sess"></a>
The following example shows how to convert a TensorFlow GraphDef into a
TensorFlow Lite model from a `tf.Session` object.
```python
import tensorflow as tf
img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3))
var = tf.get_variable("weights", dtype=tf.float32, shape=(1, 64, 64, 3))
val = img + var
out = tf.identity(val, name="out")

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  converter = tf.compat.v1.lite.TFLiteConverter.from_session(sess, [img], [out])
  tflite_model = converter.convert()
  open("converted_model.tflite", "wb").write(tflite_model)
```
### Convert a Frozen GraphDef from file <a name="basic_graphdef_file"></a>
The example uses
[Mobilenet_1.0_224](https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz).
The function only supports GraphDefs frozen using
[freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py).
```python
import tensorflow as tf
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='/path/to/mobilenet_v1_1.0_224/frozen_graph.pb',
    # Both `.pb` and `.pbtxt` files are accepted.
    input_arrays=['input'],
    output_arrays=['MobilenetV1/Predictions/Softmax'],
    input_shapes={'input': [1, 224, 224, 3]},
)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```
#### Convert checkpoints <a name="checkpoints"></a>
1.  Convert checkpoints to a Frozen GraphDef as follows
    (*[reference](https://laid.delanover.com/how-to-freeze-a-graph-in-tensorflow/)*;
    a Python-only alternative is sketched after this list):
    *   Install [bazel](https://docs.bazel.build/versions/master/install.html).
    *   Clone the TensorFlow repository: `git clone
        https://github.com/tensorflow/tensorflow.git`
    *   Build the freeze_graph tool: `bazel build
        tensorflow/python/tools:freeze_graph`
        *   The directory from which you run this should contain a file named
            'WORKSPACE'.
        *   If you're running on Ubuntu 16.04 and face issues, update the
            command to `bazel build -c opt --copt=-msse4.1 --copt=-msse4.2
            tensorflow/python/tools:freeze_graph`.
    *   Run the freeze_graph tool: `bazel run tensorflow/python/tools:freeze_graph
        --input_graph=/path/to/graph.pbtxt --input_binary=false
        --input_checkpoint=/path/to/model.ckpt-00010
        --output_graph=/path/to/frozen_graph.pb
        --output_node_names=name1,name2,...`
        *   If you have an input `*.pb` file instead of `*.pbtxt`, replace
            `--input_graph=/path/to/graph.pbtxt --input_binary=false` with
            `--input_graph=/path/to/graph.pb`.
        *   You can find the output names by exploring the graph using
            [Netron](https://github.com/lutzroeder/netron) or the
            [summarize graph tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#inspecting-graphs).
2.  Now [convert the Frozen GraphDef file](#basic_graphdef_file) to a TensorFlow
    Lite model as shown in the example above.
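
If you prefer to stay in Python instead of building the bazel tool, the sketch
below is a rough equivalent (an illustrative addition, not part of the original
steps). It assumes TensorFlow 1.x-style APIs, that the checkpoint has a matching
`.meta` file, and that you already know the output node names; all paths and
names are placeholders:

```python
import tensorflow as tf

ckpt = "/path/to/model.ckpt-00010"        # placeholder path
output_node_names = ["name1", "name2"]    # placeholder node names

with tf.compat.v1.Session() as sess:
  # Rebuild the graph from the .meta file and restore the variables.
  saver = tf.compat.v1.train.import_meta_graph(ckpt + ".meta")
  saver.restore(sess, ckpt)
  # Fold the restored variables into the graph as Const ops.
  frozen_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(
      sess, sess.graph_def, output_node_names)

tf.io.write_graph(frozen_graph_def, "/path/to", "frozen_graph.pb", as_text=False)
```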
## Complex examples <a name="complex"></a>
For models where the default value of the attributes is not sufficient, the
attribute's values should be set before calling `convert()`. Run
`help(tf.compat.v1.lite.TFLiteConverter)` in the Python terminal for detailed
documentation on the attributes.
### Convert a quantize aware trained model <a name="complex_quant"></a>
The following example shows how to convert a quantize aware trained model into a
TensorFlow Lite model.
The example uses
[Mobilenet_1.0_224](https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz).
```python
import tensorflow as tf
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='/path/to/mobilenet_v1_1.0_224/frozen_graph.pb',
    input_arrays=['input'],
    output_arrays=['MobilenetV1/Predictions/Softmax'],
    input_shapes={'input': [1, 224, 224, 3]},
)
converter.quantized_input_stats = {'input': (0., 1.)}  # mean, std_dev (input range is [-1, 1])
converter.inference_type = tf.int8 # this is the recommended type.
# converter.inference_input_type=tf.uint8 # optional
# converter.inference_output_type=tf.uint8 # optional
tflite_model = converter.convert()
with open('mobilenet_v1_1.0_224_quantized.tflite', 'wb') as f:
  f.write(tflite_model)
```
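
As an optional sanity check (not part of the original example), you can load the
converted file with the TFLite interpreter and confirm the tensor types and
shapes it reports:

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mobilenet_v1_1.0_224_quantized.tflite")
interpreter.allocate_tensors()
print(interpreter.get_input_details())
print(interpreter.get_output_details())
```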
## Convert models from TensorFlow 1.12 <a name="pre_tensorflow_1.12"></a>
Reference the following table to convert TensorFlow models to TensorFlow Lite in
and before TensorFlow 1.12. Run `help()` to get details of each API.