Update TensorFlow Lite Converter Docs in TensorFlow 1.x

PiperOrigin-RevId: 328436791
Change-Id: I60a94d4e1ba26ce9d31c027aef2f8c35d063424c
Meghna Natraj 2020-08-25 17:27:18 -07:00 committed by TensorFlower Gardener
parent b3f274f6eb
commit 29fd07dc0b
4 changed files with 309 additions and 330 deletions


@ -2,175 +2,166 @@
This page shows how to use the TensorFlow Lite Converter from the command line.
_Note: If possible, use the **recommended** [Python API](python_api.md)
instead._
## Command-line tools <a name="tools"></a>
There are two approaches to running the converter in the command line.

*   `tflite_convert` (**recommended**): Starting from TensorFlow 1.9,
    `tflite_convert` is installed as part of the TensorFlow Python package.
    *   *Install*: TensorFlow using
        [pip](https://www.tensorflow.org/install/pip).
    *   *Example*: `tflite_convert --output_file=...`
*   `bazel`: Runs the latest version of the converter from a source checkout.
    *   *Install*: TensorFlow from
        [source](https://www.tensorflow.org/install/source).
    *   *Example*: `bazel run
        //third_party/tensorflow/lite/python:tflite_convert --
        --output_file=...`
*All of the following examples use `tflite_convert` for simplicity.
Alternatively, you can replace `tflite_convert` with `bazel run
//tensorflow/lite/python:tflite_convert --`.*

### Prior to TensorFlow 1.9 <a name="pre_tensorflow_1.9"></a>

The recommended approach for using the converter prior to TensorFlow 1.9 is the
[Python API](python_api.md). A command-line tool, `toco`, was available only in
TensorFlow 1.7 (run `toco --help` for additional details); there were no
command-line tools in TensorFlow 1.8.
## Usage <a name="usage"></a>
### Setup <a name="download_models"></a>
Before we begin, download the models required to run the examples in this
document:
```
echo "Download MobileNet V1"
curl https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_0.50_128_frozen.tgz \
| tar xzv -C /tmp
echo "Download Inception V1"
curl https://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz \
| tar xzv -C /tmp
```
### Basic examples <a name="basic"></a>
The following section shows examples of how to convert a basic model from each
of the supported data formats into a TensorFlow Lite model.
#### Convert a SavedModel <a name="savedmodel"></a>
```
tflite_convert \
--saved_model_dir=/tmp/saved_model \
--output_file=/tmp/foo.tflite
```
#### Convert a tf.keras model <a name="keras"></a>
```
tflite_convert \
--keras_model_file=/tmp/keras_model.h5 \
--output_file=/tmp/foo.tflite
```
#### Convert a Frozen GraphDef <a name="graphdef"></a>
```
tflite_convert \
--graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \
--output_file=/tmp/foo.tflite \
--input_arrays=input \
--output_arrays=MobilenetV1/Predictions/Reshape_1
```
The value for `input_shapes` is automatically determined whenever possible.
Frozen GraphDef models (or frozen graphs) are produced by
[freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py)
and require additional flags `--input_arrays` and `--output_arrays`, as this
information is not stored in the model format. Frozen graphs contain the
variables from checkpoint files inlined as `Const` ops.
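
If you are not sure which tensor names to pass, you can inspect the frozen graph
programmatically. The snippet below is a rough sketch rather than part of the
original workflow; it assumes a TensorFlow 1.x-style installation (with the
`tf.compat.v1` and `tf.io` namespaces available) and the MobileNet file
downloaded in the Setup step:

```python
import tensorflow as tf

# Parse the frozen GraphDef downloaded in the Setup step.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("/tmp/mobilenet_v1_0.50_128/frozen_graph.pb", "rb") as f:
  graph_def.ParseFromString(f.read())

# Placeholders are usually the model inputs; nodes that no other node consumes
# are usually the outputs.
consumed = {name.split(":")[0].lstrip("^")
            for node in graph_def.node for name in node.input}
for node in graph_def.node:
  if node.op == "Placeholder":
    print("input candidate: ", node.name)
  elif node.name not in consumed:
    print("output candidate:", node.name)
```

Tools such as [Netron](https://github.com/lutzroeder/netron) offer a graphical
alternative for the same task.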
A
[SavedModel](https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators)
has fewer required flags than a frozen graph because the converter has access to
additional data contained within the SavedModel. The values for `--input_arrays`
and `--output_arrays` are an aggregated, alphabetized list of the inputs and
outputs in the [SignatureDefs](../../serving/signature_defs.md) within the
[MetaGraphDef](https://www.tensorflow.org/saved_model#apis_to_build_and_load_a_savedmodel)
specified by `--saved_model_tag_set`. As with the GraphDef, the value for
`--input_shapes` is automatically determined whenever possible. There is
currently no support for MetaGraphDefs without a SignatureDef or for
MetaGraphDefs that use the
[`assets/` directory](https://www.tensorflow.org/guide/saved_model#structure_of_a_savedmodel_directory).
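
If you want to see exactly which inputs and outputs the SignatureDefs expose,
the following sketch (an illustrative addition, not part of the original
examples) loads a SavedModel assumed to live at `/tmp/saved_model` with the
default `serve` tag and prints its SignatureDefs:

```python
import tensorflow as tf

with tf.compat.v1.Session(graph=tf.Graph()) as sess:
  # Load the MetaGraphDef tagged "serve" and list its SignatureDefs.
  meta_graph_def = tf.compat.v1.saved_model.loader.load(
      sess, ["serve"], "/tmp/saved_model")
  for key, signature in meta_graph_def.signature_def.items():
    print("SignatureDef:", key)
    print("  inputs: ", {k: v.name for k, v in signature.inputs.items()})
    print("  outputs:", {k: v.name for k, v in signature.outputs.items()})
```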
### Advanced examples

#### Convert a quantization aware trained model into a quantized TensorFlow Lite model

If you have a quantization aware trained model, i.e. a float model with
`FakeQuant*` ops inserted at the boundaries of fused layers to record the
(min, max) ranges of tensors so that they can be quantized (the fixed-point
quantization approach described
[here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/quantize/README.md)),
then convert it into a quantized TensorFlow Lite model as shown below. The
converted model reproduces the quantization behavior that was used during
training.

```
tflite_convert \
--graph_def_file=/tmp/some_mobilenetv1_quantized_frozen_graph.pb \
--output_file=/tmp/foo.tflite \
--input_arrays=input \
--output_arrays=MobilenetV1/Predictions/Reshape_1 \
--inference_type=INT8 \
--mean_values=-0.5 \
--std_dev_values=127.7
```
### Use \"dummy-quantization\" to try out quantized inference on a float graph <a name="dummy_quant"></a>
*If you're setting `--inference_type=QUANTIZED_UINT8` then update
`--mean_values=128` and `--std_dev_values=127`*
In order to evaluate the possible benefit of generating a quantized graph, the
converter allows "dummy-quantization" on float graphs. The flags
`--default_ranges_min` and `--default_ranges_max` accept plausible values for
the min-max ranges of the values in all arrays that do not have min-max
information. "Dummy-quantization" will produce lower accuracy but will emulate
the performance of a correctly quantized model.
#### Convert a model with \"dummy-quantization\" into a quantized TensorFlow Lite model
If you have a regular float model and only want to estimate the benefit of a
quantized model, i.e, estimate the performance of the model as if it were
quantized aware trained, then perform "dummy-quantization" using the flags
`--default_ranges_min` and `--default_ranges_max`. When specified, they will be
used as default (min, max) range for all the tensors that lack (min, max) range
information. This will allow quantization to proceed and help you emulate the
performance of a quantized TensorFlow Lite model but it will have a lower
accuracy.
The example below contains a model using Relu6 activation functions. Therefore,
a reasonable guess is that most activation ranges should be contained in [0, 6].
```
curl https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_0.50_128_frozen.tgz \
| tar xzv -C /tmp
tflite_convert \
--output_file=/tmp/foo.cc \
--graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \
--inference_type=QUANTIZED_UINT8 \
--output_file=/tmp/foo.tflite \
--input_arrays=input \
--output_arrays=MobilenetV1/Predictions/Reshape_1 \
--inference_type=INT8 \
--mean_values=-0.5 \
--std_dev_values=127.7
--default_ranges_min=0 \
--default_ranges_max=6 \
--mean_values=128 \
--std_dev_values=127
```
*If you're setting `--inference_type=QUANTIZED_UINT8` then update
`--mean_values=128` and `--std_dev_values=127`.*

#### Convert a model with multiple input arrays

The flag `--input_arrays` takes in a comma-separated list of input arrays as
seen in the example below. This is useful for models or subgraphs with multiple
inputs. Note that `--input_shapes` is provided as a colon-separated list. Each
input shape corresponds to the input array at the same position in the
respective list.
```
tflite_convert \
--graph_def_file=/tmp/inception_v1_2016_08_28_frozen.pb \
--output_file=/tmp/foo.tflite \
--input_arrays=InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_2/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_3/MaxPool_0a_3x3/MaxPool,InceptionV1/InceptionV1/Mixed_3b/Branch_0/Conv2d_0a_1x1/Relu \
--input_shapes=1,28,28,96:1,28,28,16:1,28,28,192:1,28,28,64 \
--output_arrays=InceptionV1/Logits/Predictions/Reshape_1
```
#### Convert a model with multiple output arrays

The flag `--output_arrays` takes in a comma-separated list of output arrays as
seen in the example below. This is useful for models or subgraphs with multiple
outputs.
```
tflite_convert \
--graph_def_file=/tmp/inception_v1_2016_08_28_frozen.pb \
--output_file=/tmp/foo.tflite \
@ -178,50 +169,45 @@ tflite_convert \
--output_arrays=InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_2/Conv2d_0a_1x1/Relu
```
### Convert a model by specifying subgraphs

Any array in the input file can be specified as an input or output array in
order to extract subgraphs out of an input model file. The TensorFlow Lite
Converter discards the parts of the model outside of the specific subgraph. Use
[visualization](#visualization) to identify the input and output arrays that
make up the desired subgraph.
The following command shows how to extract a single fused layer out of a TensorFlow
GraphDef.
```
tflite_convert \
--graph_def_file=/tmp/inception_v1_2016_08_28_frozen.pb \
--output_file=/tmp/foo.pb \
--input_arrays=InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_2/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_3/MaxPool_0a_3x3/MaxPool,InceptionV1/InceptionV1/Mixed_3b/Branch_0/Conv2d_0a_1x1/Relu \
--input_shapes=1,28,28,96:1,28,28,16:1,28,28,192:1,28,28,64 \
--output_arrays=InceptionV1/InceptionV1/Mixed_3b/concat_v2
```
Note that the final representation in TensorFlow Lite models tends to have
coarser granularity than the very fine granularity of the TensorFlow GraphDef
representation. For example, while a fully-connected layer is typically
represented as at least four separate operations in TensorFlow GraphDef
(Reshape, MatMul, BiasAdd, Relu...), it is typically represented as a single
"fused" op (FullyConnected) in the converter's optimized representation and in
the final on-device representation. As the level of granularity gets coarser,
some intermediate arrays (say, the array between the MatMul and the BiasAdd in
the TensorFlow GraphDef) are dropped.
When specifying intermediate arrays as `--input_arrays` and `--output_arrays`,
it is desirable (and often required) to specify arrays that are meant to survive
in the final form of the model, after fusing. These are typically the outputs of
activation functions (since everything in each layer until the activation
function tends to get fused).
## Visualization <a name="visualization"></a>

The converter can export a model to the Graphviz Dot format for easy
visualization using either the `--output_format` flag or the
`--dump_graphviz_dir` flag. The subsections below outline the use cases for
each.
@ -229,21 +215,20 @@ each.
### Using `--output_format=GRAPHVIZ_DOT` <a name="using_output_format_graphviz_dot"></a>
The first way to get a Graphviz rendering is to pass `GRAPHVIZ_DOT` into
`--output_format`. This results in a plausible visualization of the model. This
reduces the requirements that exist during conversion from a TensorFlow GraphDef
to a TensorFlow Lite model. This may be useful if the conversion to TFLite is
failing.
```
tflite_convert \
--graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \
--output_file=/tmp/foo.dot \
--output_format=GRAPHVIZ_DOT \
--input_arrays=input \
--input_shape=1,128,128,3 \
--output_arrays=MobilenetV1/Predictions/Reshape_1
```
The resulting `.dot` file can be rendered into a PDF as follows:
@ -267,12 +252,10 @@ Example PDF files are viewable online in the next section.
The second way to get a Graphviz rendering is to pass the `--dump_graphviz_dir`
flag, specifying a destination directory to dump Graphviz rendering to. Unlike
the previous approach, this one retains the original output format. This
provides a visualization of the actual model resulting from a specific
conversion process.
```
tflite_convert \
--graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \
--output_file=/tmp/foo.tflite \
@ -283,14 +266,14 @@ tflite_convert \
This generates a few files in the destination directory. The two most important
files are `toco_AT_IMPORT.dot` and `toco_AFTER_TRANSFORMATIONS.dot`.
`toco_AT_IMPORT.dot` represents the original model containing only the
transformations done at import time. This tends to be a complex visualization
with limited information about each node. It is useful in situations where a
conversion command fails.
`toco_AFTER_TRANSFORMATIONS.dot` represents the model after all transformations
were applied to it, just before it is exported. Typically, this is a much
smaller model with more information about each node.
As before, these can be rendered to PDFs:
@ -316,15 +299,15 @@ Sample output files can be seen here below. Note that it is the same
<tr><td>before</td><td>after</td></tr>
</table>
### Graph "video" logging
### Video logging
When `--dump_graphviz_dir` is used, one may additionally pass
`--dump_graphviz_video`. This causes a model visualization to be dumped after
each individual model transformation, resulting in thousands of files.
Typically, one would then bisect into these files to understand when a given
change was introduced in the model.
### Legend for the Visualizations <a name="graphviz_legend"></a>
* Operators are red square boxes with the following hues of red:
* Most operators are


@ -1,42 +1,41 @@
# Converter command line reference
This page is a complete reference of the command-line flags used by the
TensorFlow Lite Converter's command-line tool.
## High-level flags
The following high level flags specify the details of the input and output
files. The flag `--output_file` is always required. Additionally, either
`--saved_model_dir`, `--keras_model_file` or `--graph_def_file` is required.
* `--output_file`. Type: string. Specifies the full path of the output file.
* `--saved_model_dir`. Type: string. Specifies the full path to the directory
containing the SavedModel.
* `--keras_model_file`. Type: string. Specifies the full path of the HDF5 file
containing the tf.keras model.
* `--graph_def_file`. Type: string. Specifies the full path of the input
GraphDef file frozen using
[freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py).
* `--output_format`. Type: string. Default: `TFLITE`. Specifies the format of
the output file. Allowed values:
* `TFLITE`: TensorFlow Lite model format.
* `GRAPHVIZ_DOT`: GraphViz `.dot` format containing a visualization of the
graph after graph transformations.
* Note that passing `GRAPHVIZ_DOT` to `--output_format` leads to loss
of TFLite specific transformations. Therefore, the resulting
visualization may not reflect the final set of graph transformations. To
get a final visualization with all graph transformations use
`--dump_graphviz_dir` instead.
The following flags specify optional parameters when using SavedModels.
* `--saved_model_tag_set`. Type: string. Default: "serve" (for more options,
refer to
[tag_constants.h](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/cc/saved_model/tag_constants.h)).
Specifies a comma-separated set of tags identifying the MetaGraphDef within
the SavedModel to analyze. All tags in the tag set must be specified.
* `--saved_model_signature_key`. Type: string. Default: "serving_default" (for
more options, refer to
[tf.compat.v1.saved_model.signature_constants](https://www.tensorflow.org/api_docs/python/tf/compat/v1/saved_model/signature_constants)).
Specifies the key identifying the SignatureDef containing inputs and
outputs.
@ -46,9 +45,9 @@ The following flags specify optional parameters when using SavedModels.
file.
* `--input_arrays`. Type: comma-separated list of strings. Specifies the list
of names of input tensors.
* `--output_arrays`. Type: comma-separated list of strings. Specifies the list
of names of output tensors.
The following flags define properties of the input tensors. Each item in the
`--input_arrays` flag should correspond to each item in the following flags
@ -56,8 +55,7 @@ based on index.
* `--input_shapes`. Type: colon-separated list of comma-separated lists of
integers. Each comma-separated list of integers gives the shape of one of
the input arrays.
* Example: `--input_shapes=1,60,80,3` for a typical vision model means a
batch size of 1, an input image height of 60, an input image width of
80, and an input image depth of 3 (representing RGB channels).
@ -65,24 +63,24 @@ based on index.
has a shape of [2, 3] and "bar" has a shape of [4, 5, 6].
* `--std_dev_values`, `--mean_values`. Type: comma-separated list of floats.
These specify the (de-)quantization parameters of the input array, when it
is quantized. This is only needed if `inference_input_type` is `INT8` or
`QUANTIZED_UINT8`.
* The meaning of `mean_values` and `std_dev_values` is as follows: each
quantized value in the quantized input array will be interpreted as a
mathematical real number (i.e. as an input activation value) according
to the following formula:
* `real_value = (quantized_value - mean_value) / std_dev_value`.
* When performing float inference (`--inference_type=FLOAT`) on a
quantized input, the quantized input would be immediately dequantized by
the inference code according to the above formula, before proceeding
with float inference.
* When performing quantized inference (`inference_type` is `INT8` or
`QUANTIZED_UINT8`), no dequantization is performed by the inference code.
However, the quantization parameters of all arrays, including those of the
input arrays as specified by `mean_value` and `std_dev_value`, determine the
fixed-point multipliers used in the quantized inference code. `mean_value`
must be an integer when performing quantized inference. (A short worked
example follows this list.)
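
As a quick sanity check on these flags, the following worked example (added here
for illustration; the numbers simply exercise the formula above) computes the
real-value range that a given `mean_values`/`std_dev_values` pair maps the
quantized domain onto:

```python
# real_value = (quantized_value - mean_value) / std_dev_value

def real_range(qmin, qmax, mean, std_dev):
  """Return the real-value interval covered by quantized values qmin..qmax."""
  return (qmin - mean) / std_dev, (qmax - mean) / std_dev

# QUANTIZED_UINT8 input (0..255) with --mean_values=128 --std_dev_values=127:
print(real_range(0, 255, 128.0, 127.0))    # approx (-1.008, 1.0)

# INT8 input (-128..127) with --mean_values=-0.5 --std_dev_values=127.7:
print(real_range(-128, 127, -0.5, 127.7))  # approx (-0.998, 0.998)
```

Both settings therefore map the quantized input onto roughly the [-1, 1] range
expected by MobileNet-style models.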
## Transformation flags
@ -92,7 +90,7 @@ have.
* `--inference_type`. Type: string. Default: `FLOAT`. Data type of all
real-number arrays in the output file except for input arrays (defined by
`--inference_input_type`). Must be `{FLOAT, INT8, QUANTIZED_UINT8}`.
This flag only impacts real-number arrays including float and quantized
arrays. This excludes all other data types including plain integer arrays
@ -101,6 +99,9 @@ have.
* If `FLOAT`, then real-number arrays will be of type float in the output
file. If they were quantized in the input file, then they get
dequantized.
* If `INT8`, then real-number arrays will be quantized as int8 in the
output file. If they were float in the input file, then they get
quantized.
* If `QUANTIZED_UINT8`, then real-number arrays will be quantized as
uint8 in the output file. If they were float in the input file, then
they get quantized.
@ -109,7 +110,8 @@ have.
array in the output file. By default the `--inference_type` is used as type
of all of the input arrays. The flag is primarily intended for generating a
floating-point graph with a quantized input array. A Dequantize operator is
added immediately after the input array. Must be `{FLOAT, INT8,
QUANTIZED_UINT8}`.
The flag is typically used for vision models taking a bitmap as input but
requiring floating-point inference. For such image models, the uint8 input


@ -1,48 +1,48 @@
# TensorFlow Lite converter
The TensorFlow Lite converter takes a TensorFlow model and generates a
TensorFlow Lite model, which is an optimized
[FlatBuffer](https://google.github.io/flatbuffers/) (identified by the `.tflite`
file extension) that can be executed by the TensorFlow Lite interpreter.
Note: This page contains documentation on the converter API for TensorFlow 1.x.
The API for TensorFlow 2.0 is available
[here](https://www.tensorflow.org/lite/convert/).
## Options
The TensorFlow Lite Converter can be used in two ways:
* [Python API](python_api.md) (**recommended**): Using the Python API makes it
easier to convert models as part of a model development pipeline and helps
mitigate [compatibility](../tf_ops_compatibility.md) issues early on.
* [Command line](cmdline_examples.md)
## Workflow
### Why use the 'FlatBuffer' format?
FlatBuffer is an efficient open-source cross-platform serialization library. It
is similar to [protocol buffers](https://developers.google.com/protocol-buffers)
used in the TensorFlow model format, with the distinction that FlatBuffers do
not need a parsing/unpacking step to a secondary representation before data can
be accessed, avoiding per-object memory allocation. The code footprint of
FlatBuffers is an order of magnitude smaller than protocol buffers.
### Convert the model
The converter supports the following input formats:
* [SavedModels](https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators)
* `tf.keras` H5 models.
* Frozen `GraphDef` models generated using
[freeze_graph.py](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py).
* `tf.Session` models (Python API only).
### Run inference

The TensorFlow Lite model is then deployed to a client device, and the
TensorFlow Lite interpreter uses the compressed model for on-device inference.
This conversion process is shown in the diagram below:
![TFLite converter workflow](../images/convert/workflow.svg)


@ -1,119 +1,41 @@
# Converter Python API guide
This page describes how to convert TensorFlow models into the TensorFlow Lite
format using the
[`tf.compat.v1.lite.TFLiteConverter`](https://www.tensorflow.org/api_docs/python/tf/compat/v1/lite/TFLiteConverter)
Python API. It provides the following class methods based on the original format
of the model:

*   `tf.compat.v1.lite.TFLiteConverter.from_keras_model_file()`: Converts a
    [Keras](https://www.tensorflow.org/guide/keras/overview) model file.
*   `tf.compat.v1.lite.TFLiteConverter.from_saved_model()`: Converts a
    [SavedModel](https://www.tensorflow.org/guide/saved_model).
*   `tf.compat.v1.lite.TFLiteConverter.from_session()`: Converts a GraphDef from
    a session.
*   `tf.compat.v1.lite.TFLiteConverter.from_frozen_graph()`: Converts a Frozen
    GraphDef from a file. If you have checkpoints, first convert them to a
    Frozen GraphDef file and then use this API as shown [here](#checkpoints).

If you're looking for information about how to run a TensorFlow Lite model,
see [TensorFlow Lite inference](../guide/inference.md).
In the following sections, we discuss [basic examples](#basic) and
[complex examples](#complex).
## Basic examples <a name="basic"></a>
The following section shows examples of how to convert a basic model from each
of the supported model formats into a TensorFlow Lite model.
### Convert a Keras model file <a name="basic_keras_file"></a>

```python
import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file("keras_model.h5")
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```
The Keras file contains both the model and the weights, and requires
[`h5py`](http://docs.h5py.org/en/latest/build.html) to be installed. A
comprehensive example including model construction is given below.
```python
import numpy as np
@ -134,61 +56,133 @@ y = np.random.random((1, 3, 3))
model.train_on_batch(x, y)
model.predict(x)
# Save tf.keras model in H5 format.
keras_file = "keras_model.h5"
tf.keras.models.save_model(model, keras_file)
# Convert to TensorFlow Lite model.
converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file(keras_file)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```
### Convert a SavedModel <a name="basic_savedmodel"></a>

The following example shows how to convert a
[SavedModel](https://www.tensorflow.org/guide/saved_model) into a TensorFlow
Lite model.

```python
import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```

For more complex SavedModels, the optional parameters that can be passed into
`TFLiteConverter.from_saved_model()` are `input_arrays`, `input_shapes`,
`output_arrays`, `tag_set` and `signature_key`. Details of each parameter are
available by running `help(tf.compat.v1.lite.TFLiteConverter)`.
### Convert a GraphDef from a session <a name="basic_graphdef_sess"></a>
The following example shows how to convert a TensorFlow GraphDef into a
TensorFlow Lite model from a `tf.Session` object.
```python
import tensorflow as tf
img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3))
var = tf.get_variable("weights", dtype=tf.float32, shape=(1, 64, 64, 3))
val = img + var
out = tf.identity(val, name="out")

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  converter = tf.compat.v1.lite.TFLiteConverter.from_session(sess, [img], [out])
  tflite_model = converter.convert()
  open("converted_model.tflite", "wb").write(tflite_model)
```
### Convert a Frozen GraphDef from file <a name="basic_graphdef_file"></a>
The example uses
[Mobilenet_1.0_224](https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz).
The function only supports GraphDefs frozen using
[freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py).
```python
import tensorflow as tf
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='/path/to/mobilenet_v1_1.0_224/frozen_graph.pb',
    # Both `.pb` and `.pbtxt` files are accepted.
    input_arrays=['input'],
    output_arrays=['MobilenetV1/Predictions/Softmax'],
    input_shapes={'input': [1, 224, 224, 3]},
)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```
#### Convert checkpoints <a name="checkpoints"></a>
1.  Convert checkpoints to a Frozen GraphDef as follows
    (*[reference](https://laid.delanover.com/how-to-freeze-a-graph-in-tensorflow/)*;
    a Python-only alternative is sketched after this list):
    *   Install [bazel](https://docs.bazel.build/versions/master/install.html).
    *   Clone the TensorFlow repository: `git clone
        https://github.com/tensorflow/tensorflow.git`
    *   Build the freeze_graph tool: `bazel build
        tensorflow/python/tools:freeze_graph`
        *   The directory from which you run this should contain a file named
            'WORKSPACE'.
        *   If you're running on Ubuntu 16.04 and face issues, update the
            command to `bazel build -c opt --copt=-msse4.1 --copt=-msse4.2
            tensorflow/python/tools:freeze_graph`.
    *   Run the freeze_graph tool: `bazel run tensorflow/python/tools:freeze_graph
        --input_graph=/path/to/graph.pbtxt --input_binary=false
        --input_checkpoint=/path/to/model.ckpt-00010
        --output_graph=/path/to/frozen_graph.pb
        --output_node_names=name1,name2,...`
        *   If you have an input `*.pb` file instead of `*.pbtxt`, replace
            `--input_graph=/path/to/graph.pbtxt --input_binary=false` with
            `--input_graph=/path/to/graph.pb`.
        *   You can find the output names by exploring the graph using
            [Netron](https://github.com/lutzroeder/netron) or the
            [summarize graph tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#inspecting-graphs).
2.  Now [convert the Frozen GraphDef file](#basic_graphdef_file) to a TensorFlow
    Lite model as shown in the example above.
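
If you prefer to stay in Python instead of building the bazel tool, the sketch
below is a rough equivalent (an illustrative addition, not part of the original
steps). It assumes TensorFlow 1.x-style APIs, that the checkpoint has a matching
`.meta` file, and that you already know the output node names; all paths and
names are placeholders:

```python
import tensorflow as tf

ckpt = "/path/to/model.ckpt-00010"        # placeholder path
output_node_names = ["name1", "name2"]    # placeholder node names

with tf.compat.v1.Session() as sess:
  # Rebuild the graph from the .meta file and restore the variables.
  saver = tf.compat.v1.train.import_meta_graph(ckpt + ".meta")
  saver.restore(sess, ckpt)
  # Fold the restored variables into the graph as Const ops.
  frozen_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(
      sess, sess.graph_def, output_node_names)

tf.io.write_graph(frozen_graph_def, "/path/to", "frozen_graph.pb", as_text=False)
```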
## Complex examples <a name="complex"></a>
For models where the default value of the attributes is not sufficient, the
attribute's values should be set before calling `convert()`. Run
`help(tf.compat.v1.lite.TFLiteConverter)` in the Python terminal for detailed
documentation on the attributes.
### Convert a quantize aware trained model <a name="complex_quant"></a>
The following example shows how to convert a quantize aware trained model into a
TensorFlow Lite model.
The example uses
[Mobilenet_1.0_224](https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz).
```python
import tensorflow as tf
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='/path/to/mobilenet_v1_1.0_224/frozen_graph.pb',
    input_arrays=['input'],
    output_arrays=['MobilenetV1/Predictions/Softmax'],
    input_shapes={'input': [1, 224, 224, 3]},
)
converter.quantized_input_stats = {'input': (0., 1.)}  # mean, std_dev (input range is [-1, 1])
converter.inference_type = tf.int8 # this is the recommended type.
# converter.inference_input_type=tf.uint8 # optional
# converter.inference_output_type=tf.uint8 # optional
tflite_model = converter.convert()
with open('mobilenet_v1_1.0_224_quantized.tflite', 'wb') as f:
  f.write(tflite_model)
```
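
As an optional sanity check (not part of the original example), you can load the
converted file with the TFLite interpreter and confirm the tensor types and
shapes it reports:

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mobilenet_v1_1.0_224_quantized.tflite")
interpreter.allocate_tensors()
print(interpreter.get_input_details())
print(interpreter.get_output_details())
```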
## Convert models from TensorFlow 1.12 <a name="pre_tensorflow_1.12"></a>
Reference the following table to convert TensorFlow models to TensorFlow Lite in
and before TensorFlow 1.12. Run `help()` to get details of each API.