diff --git a/tensorflow/lite/g3doc/r1/convert/cmdline_examples.md b/tensorflow/lite/g3doc/r1/convert/cmdline_examples.md index 4c001bc7c90..45824abfee5 100644 --- a/tensorflow/lite/g3doc/r1/convert/cmdline_examples.md +++ b/tensorflow/lite/g3doc/r1/convert/cmdline_examples.md @@ -2,175 +2,166 @@ This page shows how to use the TensorFlow Lite Converter in the command line. +_Note: If possible, use the **recommended** [Python API](python_api.md) +instead._ + ## Command-line tools +### Starting from TensorFlow 1.9 + There are two approaches to running the converter in the command line. -* `tflite_convert`: Starting from TensorFlow 1.9, the command-line tool - `tflite_convert` is installed as part of the Python package. All of the - examples below use `tflite_convert` for simplicity. - * Example: `tflite_convert --output_file=...` -* `bazel`: In order to run the latest version of the TensorFlow Lite Converter - either install the nightly build using - [pip](https://www.tensorflow.org/install/pip) or - [clone the TensorFlow repository](https://www.tensorflow.org/install/source) - and use `bazel`. - * Example: `bazel run +* `tflite_convert` (**recommended**): + * *Install*: TensorFlow using + [pip](https://www.tensorflow.org/install/pip). + * *Example*: `tflite_convert --output_file=...` +* `bazel`: + * *Install*: TensorFlow from + [source](https://www.tensorflow.org/install/source). + * *Example*: `bazel run //third_party/tensorflow/lite/python:tflite_convert -- --output_file=...` -### Converting models prior to TensorFlow 1.9 +*All of the following examples use `tflite_convert` for simplicity. +Alternatively, you can replace '`tflite_convert`' with '`bazel run +//tensorflow/lite/python:tflite_convert --`'* + +### Prior to TensorFlow 1.9 The recommended approach for using the converter prior to TensorFlow 1.9 is the -[Python API](python_api.md#pre_tensorflow_1.9). If a command line tool is -desired, the `toco` command line tool was available in TensorFlow 1.7. Enter -`toco --help` in Terminal for additional details on the command-line flags -available. There were no command line tools in TensorFlow 1.8. +[Python API](python_api.md). Only in TensorFlow 1.7, a command line tool `toco` +was available (run `toco --help` for additional details). -## Basic examples +## Usage -The following section shows examples of how to convert a basic float-point model -from each of the supported data formats into a TensorFlow Lite FlatBuffers. +### Setup -### Convert a TensorFlow GraphDef - -The follow example converts a basic TensorFlow GraphDef (frozen by -[freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py)) -into a TensorFlow Lite FlatBuffer to perform floating-point inference. Frozen -graphs contain the variables stored in Checkpoint files as Const ops. +Before we begin, download the models required to run the examples in this +document: ``` +echo "Download MobileNet V1" curl https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_0.50_128_frozen.tgz \ | tar xzv -C /tmp + +echo "Download Inception V1" +curl https://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz \ + | tar xzv -C /tmp +``` + +### Basic examples + +The following section shows examples of how to convert a basic model from each +of the supported data formats into a TensorFlow Lite model. 
+ +#### Convert a SavedModel + +``` +tflite_convert \ + --saved_model_dir=/tmp/saved_model \ + --output_file=/tmp/foo.tflite +``` + +#### Convert a tf.keras model + +``` +tflite_convert \ + --keras_model_file=/tmp/keras_model.h5 \ + --output_file=/tmp/foo.tflite +``` + +#### Convert a Frozen GraphDef + +``` tflite_convert \ - --output_file=/tmp/foo.tflite \ --graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \ + --output_file=/tmp/foo.tflite \ --input_arrays=input \ --output_arrays=MobilenetV1/Predictions/Reshape_1 ``` -The value for `input_shapes` is automatically determined whenever possible. +Frozen GraphDef models (or frozen graphs) are produced by +[freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py) +and require additional flags `--input_arrays` and `--output_arrays` as this +information is not stored in the model format. -### Convert a TensorFlow SavedModel +### Advanced examples -The follow example converts a basic TensorFlow SavedModel into a Tensorflow Lite -FlatBuffer to perform floating-point inference. +#### Convert a quantization aware trained model into a quantized TensorFlow Lite model + +If you have a quantization aware trained model (i.e, a model inserted with +`FakeQuant*` operations which record the (min, max) ranges of tensors in order +to quantize them), then convert it into a quantized TensorFlow Lite model as +shown below: ``` tflite_convert \ + --graph_def_file=/tmp/some_mobilenetv1_quantized_frozen_graph.pb \ --output_file=/tmp/foo.tflite \ - --saved_model_dir=/tmp/saved_model -``` - -[SavedModel](https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators) -has fewer required flags than frozen graphs due to access to additional data -contained within the SavedModel. The values for `--input_arrays` and -`--output_arrays` are an aggregated, alphabetized list of the inputs and outputs -in the [SignatureDefs](../../serving/signature_defs.md) within -the -[MetaGraphDef](https://www.tensorflow.org/saved_model#apis_to_build_and_load_a_savedmodel) -specified by `--saved_model_tag_set`. As with the GraphDef, the value for -`input_shapes` is automatically determined whenever possible. - -There is currently no support for MetaGraphDefs without a SignatureDef or for -MetaGraphDefs that use the [`assets/` -directory](https://www.tensorflow.org/guide/saved_model#structure_of_a_savedmodel_directory). - -### Convert a tf.Keras model - -The following example converts a `tf.keras` model into a TensorFlow Lite -Flatbuffer. The `tf.keras` file must contain both the model and the weights. - -``` -tflite_convert \ - --output_file=/tmp/foo.tflite \ - --keras_model_file=/tmp/keras_model.h5 -``` - -## Quantization - -### Convert a TensorFlow GraphDef for quantized inference - -The TensorFlow Lite Converter is compatible with fixed point quantization models -described -[here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/quantize/README.md). -These are float models with `FakeQuant*` ops inserted at the boundaries of fused -layers to record min-max range information. This generates a quantized inference -workload that reproduces the quantization behavior that was used during -training. - -The following command generates a quantized TensorFlow Lite FlatBuffer from a -"quantized" TensorFlow GraphDef. 
-
-```
-tflite_convert \
-  --output_file=/tmp/foo.tflite \
-  --graph_def_file=/tmp/some_quantized_graph.pb \
-  --inference_type=QUANTIZED_UINT8 \
   --input_arrays=input \
   --output_arrays=MobilenetV1/Predictions/Reshape_1 \
-  --mean_values=128 \
-  --std_dev_values=127
+  --inference_type=INT8 \
+  --mean_values=-0.5 \
+  --std_dev_values=127.7
 ```
 
-### Use \"dummy-quantization\" to try out quantized inference on a float graph
+*If you're setting `--inference_type=QUANTIZED_UINT8`, then update
+`--mean_values=128` and `--std_dev_values=127`.*
 
-In order to evaluate the possible benefit of generating a quantized graph, the
-converter allows "dummy-quantization" on float graphs. The flags
-`--default_ranges_min` and `--default_ranges_max` accept plausible values for
-the min-max ranges of the values in all arrays that do not have min-max
-information. "Dummy-quantization" will produce lower accuracy but will emulate
-the performance of a correctly quantized model.
+#### Convert a model with \"dummy-quantization\" into a quantized TensorFlow Lite model
+
+If you have a regular float model and only want to estimate the benefit of a
+quantized model, i.e., estimate the performance of the model as if it were
+quantization aware trained, then perform "dummy-quantization" using the flags
+`--default_ranges_min` and `--default_ranges_max`. When specified, they will be
+used as the default (min, max) range for all the tensors that lack (min, max)
+range information. This allows quantization to proceed and lets you emulate the
+performance of a quantized TensorFlow Lite model, but the resulting model will
+have lower accuracy.
 
 The example below contains a model using Relu6 activation functions. Therefore,
 a reasonable guess is that most activation ranges should be contained in [0, 6].
 
 ```
-curl https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_0.50_128_frozen.tgz \
-  | tar xzv -C /tmp
 tflite_convert \
-  --output_file=/tmp/foo.cc \
   --graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \
-  --inference_type=QUANTIZED_UINT8 \
+  --output_file=/tmp/foo.tflite \
   --input_arrays=input \
   --output_arrays=MobilenetV1/Predictions/Reshape_1 \
+  --inference_type=INT8 \
+  --mean_values=-0.5 \
+  --std_dev_values=127.7 \
   --default_ranges_min=0 \
-  --default_ranges_max=6 \
-  --mean_values=128 \
-  --std_dev_values=127
+  --default_ranges_max=6
 ```
 
-## Specifying input and output arrays
+*If you're setting `--inference_type=QUANTIZED_UINT8`, then update
+`--mean_values=128` and `--std_dev_values=127`.*
 
-### Multiple input arrays
+#### Convert a model with multiple input arrays
 
 The flag `input_arrays` takes in a comma-separated list of input arrays as seen
 in the example below. This is useful for models or subgraphs with multiple
-inputs.
+inputs. Note that `--input_shapes` is provided as a colon-separated list. Each
+input shape corresponds to the input array at the same position in the
+respective list.
``` -curl https://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz \ - | tar xzv -C /tmp tflite_convert \ --graph_def_file=/tmp/inception_v1_2016_08_28_frozen.pb \ --output_file=/tmp/foo.tflite \ - --input_shapes=1,28,28,96:1,28,28,16:1,28,28,192:1,28,28,64 \ --input_arrays=InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_2/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_3/MaxPool_0a_3x3/MaxPool,InceptionV1/InceptionV1/Mixed_3b/Branch_0/Conv2d_0a_1x1/Relu \ + --input_shapes=1,28,28,96:1,28,28,16:1,28,28,192:1,28,28,64 \ --output_arrays=InceptionV1/Logits/Predictions/Reshape_1 ``` -Note that `input_shapes` is provided as a colon-separated list. Each input shape -corresponds to the input array at the same position in the respective list. +#### Convert a model with multiple output arrays -### Multiple output arrays - -The flag `output_arrays` takes in a comma-separated list of output arrays as +The flag `--output_arrays` takes in a comma-separated list of output arrays as seen in the example below. This is useful for models or subgraphs with multiple outputs. ``` -curl https://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz \ - | tar xzv -C /tmp tflite_convert \ --graph_def_file=/tmp/inception_v1_2016_08_28_frozen.pb \ --output_file=/tmp/foo.tflite \ @@ -178,50 +169,45 @@ tflite_convert \ --output_arrays=InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_2/Conv2d_0a_1x1/Relu ``` -### Specifying subgraphs +### Convert a model by specifying subgraphs Any array in the input file can be specified as an input or output array in -order to extract subgraphs out of an input graph file. The TensorFlow Lite -Converter discards the parts of the graph outside of the specific subgraph. Use -[graph visualizations](#graph_visualizations) to identify the input and output -arrays that make up the desired subgraph. +order to extract subgraphs out of an input model file. The TensorFlow Lite +Converter discards the parts of the model outside of the specific subgraph. Use +[visualization](#visualization) to identify the input and output arrays that +make up the desired subgraph. The follow command shows how to extract a single fused layer out of a TensorFlow GraphDef. ``` -curl https://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz \ - | tar xzv -C /tmp tflite_convert \ --graph_def_file=/tmp/inception_v1_2016_08_28_frozen.pb \ --output_file=/tmp/foo.pb \ - --input_shapes=1,28,28,96:1,28,28,16:1,28,28,192:1,28,28,64 \ --input_arrays=InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_2/Conv2d_0a_1x1/Relu,InceptionV1/InceptionV1/Mixed_3b/Branch_3/MaxPool_0a_3x3/MaxPool,InceptionV1/InceptionV1/Mixed_3b/Branch_0/Conv2d_0a_1x1/Relu \ + --input_shapes=1,28,28,96:1,28,28,16:1,28,28,192:1,28,28,64 \ --output_arrays=InceptionV1/InceptionV1/Mixed_3b/concat_v2 ``` -Note that the final representation in TensorFlow Lite FlatBuffers tends to have +Note that the final representation in TensorFlow Lite models tends to have coarser granularity than the very fine granularity of the TensorFlow GraphDef representation. 
For example, while a fully-connected layer is typically -represented as at least four separate ops in TensorFlow GraphDef (Reshape, -MatMul, BiasAdd, Relu...), it is typically represented as a single "fused" op -(FullyConnected) in the converter's optimized representation and in the final -on-device representation. As the level of granularity gets coarser, some -intermediate arrays (say, the array between the MatMul and the BiasAdd in the -TensorFlow GraphDef) are dropped. +represented as at least four separate operations in TensorFlow GraphDef +(Reshape, MatMul, BiasAdd, Relu...), it is typically represented as a single +"fused" op (FullyConnected) in the converter's optimized representation and in +the final on-device representation. As the level of granularity gets coarser, +some intermediate arrays (say, the array between the MatMul and the BiasAdd in +the TensorFlow GraphDef) are dropped. When specifying intermediate arrays as `--input_arrays` and `--output_arrays`, it is desirable (and often required) to specify arrays that are meant to survive -in the final form of the graph, after fusing. These are typically the outputs of +in the final form of the model, after fusing. These are typically the outputs of activation functions (since everything in each layer until the activation function tends to get fused). -## Logging +## Visualization - -## Graph visualizations - -The converter can export a graph to the Graphviz Dot format for easy +The converter can export a model to the Graphviz Dot format for easy visualization using either the `--output_format` flag or the `--dump_graphviz_dir` flag. The subsections below outline the use cases for each. @@ -229,21 +215,20 @@ each. ### Using `--output_format=GRAPHVIZ_DOT` The first way to get a Graphviz rendering is to pass `GRAPHVIZ_DOT` into -`--output_format`. This results in a plausible visualization of the graph. This +`--output_format`. This results in a plausible visualization of the model. This reduces the requirements that exist during conversion from a TensorFlow GraphDef -to a TensorFlow Lite FlatBuffer. This may be useful if the conversion to TFLite -is failing. +to a TensorFlow Lite model. This may be useful if the conversion to TFLite is +failing. ``` -curl https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_0.50_128_frozen.tgz \ - | tar xzv -C /tmp tflite_convert \ --graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \ --output_file=/tmp/foo.dot \ --output_format=GRAPHVIZ_DOT \ - --input_shape=1,128,128,3 \ --input_arrays=input \ + --input_shape=1,128,128,3 \ --output_arrays=MobilenetV1/Predictions/Reshape_1 + ``` The resulting `.dot` file can be rendered into a PDF as follows: @@ -267,12 +252,10 @@ Example PDF files are viewable online in the next section. The second way to get a Graphviz rendering is to pass the `--dump_graphviz_dir` flag, specifying a destination directory to dump Graphviz rendering to. Unlike the previous approach, this one retains the original output format. This -provides a visualization of the actual graph resulting from a specific +provides a visualization of the actual model resulting from a specific conversion process. ``` -curl https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_0.50_128_frozen.tgz \ - | tar xzv -C /tmp tflite_convert \ --graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \ --output_file=/tmp/foo.tflite \ @@ -283,14 +266,14 @@ tflite_convert \ This generates a few files in the destination directory. 
The two most important files are `toco_AT_IMPORT.dot` and `/tmp/toco_AFTER_TRANSFORMATIONS.dot`. -`toco_AT_IMPORT.dot` represents the original graph containing only the +`toco_AT_IMPORT.dot` represents the original model containing only the transformations done at import time. This tends to be a complex visualization with limited information about each node. It is useful in situations where a conversion command fails. -`toco_AFTER_TRANSFORMATIONS.dot` represents the graph after all transformations +`toco_AFTER_TRANSFORMATIONS.dot` represents the model after all transformations were applied to it, just before it is exported. Typically, this is a much -smaller graph with more information about each node. +smaller model with more information about each node. As before, these can be rendered to PDFs: @@ -316,15 +299,15 @@ Sample output files can be seen here below. Note that it is the same beforeafter -### Graph "video" logging +### Video logging When `--dump_graphviz_dir` is used, one may additionally pass -`--dump_graphviz_video`. This causes a graph visualization to be dumped after -each individual graph transformation, resulting in thousands of files. +`--dump_graphviz_video`. This causes a model visualization to be dumped after +each individual model transformation, resulting in thousands of files. Typically, one would then bisect into these files to understand when a given -change was introduced in the graph. +change was introduced in the model. -### Legend for the graph visualizations +### Legend for the Visualizations * Operators are red square boxes with the following hues of red: * Most operators are diff --git a/tensorflow/lite/g3doc/r1/convert/cmdline_reference.md b/tensorflow/lite/g3doc/r1/convert/cmdline_reference.md index 8cca69d5963..826bb7afdbb 100644 --- a/tensorflow/lite/g3doc/r1/convert/cmdline_reference.md +++ b/tensorflow/lite/g3doc/r1/convert/cmdline_reference.md @@ -1,42 +1,41 @@ # Converter command line reference This page is complete reference of command-line flags used by the TensorFlow -Lite Converter's command line starting from TensorFlow 1.9 up until the most -recent build of TensorFlow. +Lite Converter's command line tool. ## High-level flags The following high level flags specify the details of the input and output files. The flag `--output_file` is always required. Additionally, either -`--graph_def_file`, `--saved_model_dir` or `--keras_model_file` is required. +`--saved_model_dir`, `--keras_model_file` or `--graph_def_file` is required. * `--output_file`. Type: string. Specifies the full path of the output file. -* `--graph_def_file`. Type: string. Specifies the full path of the input - GraphDef file frozen using - [freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py). * `--saved_model_dir`. Type: string. Specifies the full path to the directory containing the SavedModel. * `--keras_model_file`. Type: string. Specifies the full path of the HDF5 file containing the tf.keras model. +* `--graph_def_file`. Type: string. Specifies the full path of the input + GraphDef file frozen using + [freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py). * `--output_format`. Type: string. Default: `TFLITE`. Specifies the format of the output file. Allowed values: - * `TFLITE`: TensorFlow Lite FlatBuffer format. + * `TFLITE`: TensorFlow Lite model format. * `GRAPHVIZ_DOT`: GraphViz `.dot` format containing a visualization of the graph after graph transformations. 
* Note that passing `GRAPHVIZ_DOT` to `--output_format` leads to loss - of TFLite specific transformations. Therefore, the resulting - visualization may not reflect the final set of graph - transformations. To get a final visualization with all graph - transformations use `--dump_graphviz_dir` instead. + of TFLite specific transformations. To get a final visualization + with all graph transformations use `--dump_graphviz_dir` instead. The following flags specify optional parameters when using SavedModels. -* `--saved_model_tag_set`. Type: string. Default: - [kSavedModelTagServe](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/cc/saved_model/tag_constants.h). +* `--saved_model_tag_set`. Type: string. Default: "serve" (for more options, + refer to + [tag_constants.h](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/cc/saved_model/tag_constants.h)). Specifies a comma-separated set of tags identifying the MetaGraphDef within the SavedModel to analyze. All tags in the tag set must be specified. -* `--saved_model_signature_key`. Type: string. Default: - `tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY`. +* `--saved_model_signature_key`. Type: string. Default: "serving_default" (for + more options, refer to + [tf.compat.v1.saved_model.signature_constants](https://www.tensorflow.org/api_docs/python/tf/compat/v1/saved_model/signature_constants)). Specifies the key identifying the SignatureDef containing inputs and outputs. @@ -46,9 +45,9 @@ The following flags specify optional parameters when using SavedModels. file. * `--input_arrays`. Type: comma-separated list of strings. Specifies the list - of names of input activation tensors. + of names of input tensors. * `--output_arrays`. Type: comma-separated list of strings. Specifies the list - of names of output activation tensors. + of names of output tensors. The following flags define properties of the input tensors. Each item in the `--input_arrays` flag should correspond to each item in the following flags @@ -56,8 +55,7 @@ based on index. * `--input_shapes`. Type: colon-separated list of comma-separated lists of integers. Each comma-separated list of integers gives the shape of one of - the input arrays specified in - [TensorFlow convention](https://www.tensorflow.org/guide/tensors#shape). + the input arrays. * Example: `--input_shapes=1,60,80,3` for a typical vision model means a batch size of 1, an input image height of 60, an input image width of 80, and an input image depth of 3 (representing RGB channels). @@ -65,24 +63,24 @@ based on index. has a shape of [2, 3] and "bar" has a shape of [4, 5, 6]. * `--std_dev_values`, `--mean_values`. Type: comma-separated list of floats. These specify the (de-)quantization parameters of the input array, when it - is quantized. This is only needed if `inference_input_type` is + is quantized. This is only needed if `inference_input_type` is `INT8` or `QUANTIZED_UINT8`. * The meaning of `mean_values` and `std_dev_values` is as follows: each quantized value in the quantized input array will be interpreted as a mathematical real number (i.e. as an input activation value) according to the following formula: - * `real_value = (quantized_input_value - mean_value) / std_dev_value`. + * `real_value = (quantized_value - mean_value) / std_dev_value`. 
+        *   For example, with `QUANTIZED_UINT8` inference, `--mean_values=128`
+            and `--std_dev_values=127`, the quantized value 255 is interpreted
+            as the real value (255 - 128) / 127 = 1.0.
     *   When performing float inference (`--inference_type=FLOAT`) on a
         quantized input, the quantized input would be immediately dequantized by
         the inference code according to the above formula, before proceeding
         with float inference.
-    *   When performing quantized inference
-        (`--inference_type=QUANTIZED_UINT8`), no dequantization is performed by
-        the inference code. However, the quantization parameters of all arrays,
-        including those of the input arrays as specified by `mean_value` and
-        `std_dev_value`, determine the fixed-point multipliers used in the
-        quantized inference code. `mean_value` must be an integer when
-        performing quantized inference.
+    *   When performing quantized inference (`inference_type` is `INT8` or
+        `QUANTIZED_UINT8`), no dequantization is performed by the inference
+        code. However, the quantization parameters of all arrays, including
+        those of the input arrays as specified by `mean_value` and
+        `std_dev_value`, determine the fixed-point multipliers used in the
+        quantized inference code. `mean_value` must be an integer when
+        performing quantized inference.
 
 ## Transformation flags
 
@@ -92,7 +90,7 @@ have.
 
 *   `--inference_type`. Type: string. Default: `FLOAT`. Data type of all
     real-number arrays in the output file except for input arrays (defined by
-    `--inference_input_type`). Must be `{FLOAT, QUANTIZED_UINT8}`.
+    `--inference_input_type`). Must be `{FLOAT, INT8, QUANTIZED_UINT8}`.
 
     This flag only impacts real-number arrays including float and quantized
     arrays. This excludes all other data types including plain integer arrays
@@ -101,6 +99,9 @@ have.
     *   If `FLOAT`, then real-numbers arrays will be of type float in the
         output file. If they were quantized in the input file, then they get
        dequantized.
+    *   If `INT8`, then real-numbers arrays will be quantized as int8 in the
+        output file. If they were float in the input file, then they get
+        quantized.
     *   If `QUANTIZED_UINT8`, then real-numbers arrays will be quantized as
        uint8 in the output file. If they were float in the input file, then
        they get quantized.
@@ -109,7 +110,8 @@ have.
     array in the output file. By default the `--inference_type` is used as type
-    of all of the input arrays. Flag is primarily intended for generating a
-    float-point graph with a quantized input array. A Dequantized operator is
-    added immediately after the input array. Must be `{FLOAT, QUANTIZED_UINT8}`.
+    of all of the input arrays. The flag is primarily intended for generating a
+    floating-point graph with a quantized input array. A Dequantized operator is
+    added immediately after the input array. Must be `{FLOAT, INT8,
+    QUANTIZED_UINT8}`.
 
    The flag is typically used for vision models taking a bitmap as input but
    requiring floating-point inference. For such image models, the uint8 input
diff --git a/tensorflow/lite/g3doc/r1/convert/index.md b/tensorflow/lite/g3doc/r1/convert/index.md
index 4080689ce26..7a4e8c7bc95 100644
--- a/tensorflow/lite/g3doc/r1/convert/index.md
+++ b/tensorflow/lite/g3doc/r1/convert/index.md
@@ -1,48 +1,48 @@
 # TensorFlow Lite converter
 
-The TensorFlow Lite converter is used to convert TensorFlow models into an
-optimized [FlatBuffer](https://google.github.io/flatbuffers/) format, so that
-they can be used by the TensorFlow Lite interpreter.
+The TensorFlow Lite converter takes a TensorFlow model and generates a
+TensorFlow Lite model, which is an optimized
+[FlatBuffer](https://google.github.io/flatbuffers/) (identified by the `.tflite`
+file extension).
 
 Note: This page contains documentation on the converter API for TensorFlow 1.x.
 The API for TensorFlow 2.0 is available
 [here](https://www.tensorflow.org/lite/convert/).
-## FlatBuffers +## Options + +The TensorFlow Lite Converter can be used in two ways: + +* [Python API](python_api.md) (**recommended**): Using the Python API makes it + easier to convert models as part of a model development pipeline and helps + mitigate compatibility issues early on. +* [Command line](cmdline_examples.md) + +## Workflow + +### Why use the 'FlatBuffer' format? FlatBuffer is an efficient open-source cross-platform serialization library. It -is similar to -[protocol buffers](https://developers.google.com/protocol-buffers), with the -distinction that FlatBuffers do not need a parsing/unpacking step to a secondary -representation before data can be accessed, avoiding per-object memory -allocation. The code footprint of FlatBuffers is an order of magnitude smaller -than protocol buffers. +is similar to [protocol buffers](https://developers.google.com/protocol-buffers) +used in the TensorFlow model format, with the distinction that FlatBuffers do +not need a parsing/unpacking step to a secondary representation before data can +be accessed, avoiding per-object memory allocation. The code footprint of +FlatBuffers is an order of magnitude smaller than protocol buffers. -## From model training to device deployment - -The TensorFlow Lite converter generates a TensorFlow Lite -[FlatBuffer](https://google.github.io/flatbuffers/) file (`.tflite`) from a -TensorFlow model. +### Convert the model The converter supports the following input formats: * [SavedModels](https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators) -* Frozen `GraphDef`: Models generated by +* `tf.keras` H5 models. +* Frozen `GraphDef` models generated using [freeze_graph.py](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py). -* `tf.keras` HDF5 models. -* Any model taken from a `tf.Session` (Python API only). +* `tf.Session` models (Python API only). -The TensorFlow Lite `FlatBuffer` file is then deployed to a client device, and -the TensorFlow Lite interpreter uses the compressed model for on-device -inference. This conversion process is shown in the diagram below: +### Run inference + +The TensorFlow Lite model is then deployed to a client device, and the +TensorFlow Lite interpreter uses the compressed model for on-device inference. +This conversion process is shown in the diagram below: ![TFLite converter workflow](../images/convert/workflow.svg) - -## Options - -The TensorFlow Lite Converter can be used from either of these two options: - -* [Python](python_api.md) (**Preferred**): Using the Python API makes it - easier to convert models as part of a model development pipeline, and helps - mitigate [compatibility](../tf_ops_compatibility.md) issues early on. -* [Command line](cmdline_examples.md) diff --git a/tensorflow/lite/g3doc/r1/convert/python_api.md b/tensorflow/lite/g3doc/r1/convert/python_api.md index 30d65750100..08e34b53630 100644 --- a/tensorflow/lite/g3doc/r1/convert/python_api.md +++ b/tensorflow/lite/g3doc/r1/convert/python_api.md @@ -1,119 +1,41 @@ # Converter Python API guide This page describes how to convert TensorFlow models into the TensorFlow Lite -format using the TensorFlow Lite Converter Python API. +format using the +[`tf.compat.v1.lite.TFLiteConverter`](https://www.tensorflow.org/api_docs/python/tf/compat/v1/lite/TFLiteConverter) +Python API. 
It provides the following class methods based on the original format +of the model: -If you're looking for information about how to run a TensorFlow Lite model, -see [TensorFlow Lite inference](../guide/inference.md). +* `tf.compat.v1.lite.TFLiteConverter.from_keras_model_file()`: Converts a + [Keras](https://www.tensorflow.org/guide/keras/overview) model file. +* `tf.compat.v1.lite.TFLiteConverter.from_saved_model()`: Converts a + [SavedModel](https://www.tensorflow.org/guide/saved_model). +* `tf.compat.v1.lite.TFLiteConverter.from_session()`: Converts a GraphDef from + a session. +* `tf.compat.v1.lite.TFLiteConverter.from_frozen_graph()`: Converts a Frozen + GraphDef from a file. If you have checkpoints, then first convert it to a + Frozen GraphDef file and then use this API as shown [here](#checkpoints). -Note: This page describes the converter in the TensorFlow nightly release, -installed using `pip install tf-nightly`. For docs describing older versions -reference ["Converting models from TensorFlow 1.12"](#pre_tensorflow_1.12). - - -## High-level overview - -While the TensorFlow Lite Converter can be used from the command line, it is -often convenient to use in a Python script as part of the model development -pipeline. This allows you to know early that you are designing a model that can -be targeted to devices with mobile. - -## API - -The API for converting TensorFlow models to TensorFlow Lite is -`tf.lite.TFLiteConverter`, which provides class methods based on the original -format of the model. For example, `TFLiteConverter.from_session()` is available -for GraphDefs, `TFLiteConverter.from_saved_model()` is available for -SavedModels, and `TFLiteConverter.from_keras_model_file()` is available for -`tf.Keras` files. - -Example usages for simple float-point models are shown in -[Basic Examples](#basic). Examples usages for more complex models is shown in -[Complex Examples](#complex). +In the following sections, we discuss [basic examples](#basic) and +[complex examples](#complex). ## Basic examples -The following section shows examples of how to convert a basic float-point model -from each of the supported data formats into a TensorFlow Lite FlatBuffers. +The following section shows examples of how to convert a basic model from each +of the supported model formats into a TensorFlow Lite model. -### Exporting a GraphDef from tf.Session - -The following example shows how to convert a TensorFlow GraphDef into a -TensorFlow Lite FlatBuffer from a `tf.Session` object. +### Convert a Keras model file ```python import tensorflow as tf -img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3)) -var = tf.get_variable("weights", dtype=tf.float32, shape=(1, 64, 64, 3)) -val = img + var -out = tf.identity(val, name="out") - -with tf.Session() as sess: - sess.run(tf.global_variables_initializer()) - converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out]) - tflite_model = converter.convert() - open("converted_model.tflite", "wb").write(tflite_model) -``` - -### Exporting a GraphDef from file - -The following example shows how to convert a TensorFlow GraphDef into a -TensorFlow Lite FlatBuffer when the GraphDef is stored in a file. Both `.pb` and -`.pbtxt` files are accepted. - -The example uses -[Mobilenet_1.0_224](https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz). 
-The function only supports GraphDefs frozen using -[freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py). - -```python -import tensorflow as tf - -graph_def_file = "/path/to/Downloads/mobilenet_v1_1.0_224/frozen_graph.pb" -input_arrays = ["input"] -output_arrays = ["MobilenetV1/Predictions/Softmax"] - -converter = tf.lite.TFLiteConverter.from_frozen_graph( - graph_def_file, input_arrays, output_arrays) +converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file("keras_model.h5") tflite_model = converter.convert() open("converted_model.tflite", "wb").write(tflite_model) ``` -### Exporting a SavedModel - -The following example shows how to convert a SavedModel into a TensorFlow Lite -FlatBuffer. - -```python -import tensorflow as tf - -converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir) -tflite_model = converter.convert() -open("converted_model.tflite", "wb").write(tflite_model) -``` - -For more complex SavedModels, the optional parameters that can be passed into -`TFLiteConverter.from_saved_model()` are `input_arrays`, `input_shapes`, -`output_arrays`, `tag_set` and `signature_key`. Details of each parameter are -available by running `help(tf.lite.TFLiteConverter)`. - -### Exporting a tf.keras File - -The following example shows how to convert a `tf.keras` model into a TensorFlow -Lite FlatBuffer. This example requires -[`h5py`](http://docs.h5py.org/en/latest/build.html) to be installed. - -```python -import tensorflow as tf - -converter = tf.lite.TFLiteConverter.from_keras_model_file("keras_model.h5") -tflite_model = converter.convert() -open("converted_model.tflite", "wb").write(tflite_model) -``` - -The `tf.keras` file must contain both the model and the weights. A comprehensive -example including model construction can be seen below. +The Keras file contains both the model and the weights. A comprehensive example +is given below. ```python import numpy as np @@ -134,61 +56,133 @@ y = np.random.random((1, 3, 3)) model.train_on_batch(x, y) model.predict(x) -# Save tf.keras model in HDF5 format. +# Save tf.keras model in H5 format. keras_file = "keras_model.h5" tf.keras.models.save_model(model, keras_file) # Convert to TensorFlow Lite model. -converter = tf.lite.TFLiteConverter.from_keras_model_file(keras_file) +converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file(keras_file) tflite_model = converter.convert() open("converted_model.tflite", "wb").write(tflite_model) ``` -## Complex examples +### Convert a SavedModel -For models where the default value of the attributes is not sufficient, the -attribute's values should be set before calling `convert()`. In order to call -any constants use `tf.lite.constants.` as seen below with -`QUANTIZED_UINT8`. Run `help(tf.lite.TFLiteConverter)` in the Python -terminal for detailed documentation on the attributes. +The following example shows how to convert a +[SavedModel](https://www.tensorflow.org/guide/saved_model) into a TensorFlow +Lite model. -Although the examples are demonstrated on GraphDefs containing only constants. -The same logic can be applied irrespective of the input data format. +```python +import tensorflow as tf -### Exporting a quantized GraphDef +converter = tf.compat.v1.lite.TFLiteConverter.from_saved_model(saved_model_dir) +tflite_model = converter.convert() +open("converted_model.tflite", "wb").write(tflite_model) +``` -The following example shows how to convert a quantized model into a TensorFlow -Lite FlatBuffer. 
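+If you are not sure which input and output tensors a SavedModel exposes, you
+can inspect its SignatureDefs before converting. The snippet below is a minimal
+sketch rather than part of the converter API: it assumes the default `serve`
+tag set, a `serving_default` signature key, and a SavedModel located at
+`/tmp/saved_model`.
+
+```python
+import tensorflow as tf
+
+saved_model_dir = "/tmp/saved_model"  # hypothetical path; replace with your own
+
+with tf.compat.v1.Session(graph=tf.Graph()) as sess:
+  # Load the MetaGraphDef tagged "serve" and look up its default signature.
+  meta_graph = tf.compat.v1.saved_model.loader.load(
+      sess, ["serve"], saved_model_dir)
+  signature = meta_graph.signature_def["serving_default"]
+  # Print the tensor names the converter will treat as inputs and outputs.
+  print("inputs:", {k: v.name for k, v in signature.inputs.items()})
+  print("outputs:", {k: v.name for k, v in signature.outputs.items()})
+```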
+### Convert a GraphDef from a session + +The following example shows how to convert a TensorFlow GraphDef into a +TensorFlow Lite model from a `tf.Session` object. ```python import tensorflow as tf img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3)) -const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.]) -val = img + const -out = tf.fake_quant_with_min_max_args(val, min=0., max=1., name="output") +var = tf.get_variable("weights", dtype=tf.float32, shape=(1, 64, 64, 3)) +val = img + var +out = tf.identity(val, name="out") with tf.Session() as sess: - converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out]) - converter.inference_type = tf.lite.constants.QUANTIZED_UINT8 - input_arrays = converter.get_input_arrays() - converter.quantized_input_stats = {input_arrays[0] : (0., 1.)} # mean, std_dev + sess.run(tf.global_variables_initializer()) + converter = tf.compat.v1.lite.TFLiteConverter.from_session(sess, [img], [out]) tflite_model = converter.convert() open("converted_model.tflite", "wb").write(tflite_model) ``` +### Convert a Frozen GraphDef from file -## Additional instructions +The example uses +[Mobilenet_1.0_224](https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz). -### Build from source code +```python +import tensorflow as tf -In order to run the latest version of the TensorFlow Lite Converter Python API, -either install the nightly build with -[pip](https://www.tensorflow.org/install/pip) (recommended) or -[Docker](https://www.tensorflow.org/install/docker), or -[build the pip package from source](https://www.tensorflow.org/install/source). +converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph( + graph_def_file='/path/to/mobilenet_v1_1.0_224/frozen_graph.pb', + # both `.pb` and `.pbtxt` files are accepted. + input_arrays=['input'], + output_arrays=['MobilenetV1/Predictions/Softmax'], + input_shapes={'input' : [1, 224, 224,3]}, +) +tflite_model = converter.convert() +open("converted_model.tflite", "wb").write(tflite_model) +``` -### Converting models from TensorFlow 1.12 +#### Convert checkpoints + +1. Convert checkpoints to a Frozen GraphDef as follows + (*[reference](https://laid.delanover.com/how-to-freeze-a-graph-in-tensorflow/)*): + + * Install [bazel](https://docs.bazel.build/versions/master/install.html) + * Clone the TensorFlow repository: `git clone + https://github.com/tensorflow/tensorflow.git` + * Build freeze graph tool: `bazel build + tensorflow/python/tools:freeze_graph` + * The directory from which you run this should contain a file named + 'WORKSPACE'. + * If you're running on Ubuntu 16.04 OS and face issues, update the + command to `bazel build -c opt --copt=-msse4.1 --copt=-msse4.2 + tensorflow/python/tools:freeze_graph` + * Run freeze graph tool: `bazel run tensorflow/python/tools:freeze_graph + --input_graph=/path/to/graph.pbtxt --input_binary=false + --input_checkpoint=/path/to/model.ckpt-00010 + --output_graph=/path/to/frozen_graph.pb + --output_node_names=name1,name2.....` + * If you have an input `*.pb` file instead of `*.pbtxt`, then replace + `--input_graph=/path/to/graph.pbtxt --input_binary=false` with + `--input_graph=/path/to/graph.pb` + * You can find the output names by exploring the graph using + [Netron](https://github.com/lutzroeder/netron) or + [summarize graph tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#inspecting-graphs). + +2. 
Now [convert the Frozen GraphDef file](#basic_graphdef_file) to a TensorFlow
+    Lite model as shown in the example above.
+
+## Complex examples
+
+For models where the default values of the attributes are not sufficient, the
+attribute values should be set before calling `convert()`. Run
+`help(tf.compat.v1.lite.TFLiteConverter)` in the Python terminal for detailed
+documentation on the attributes.
+
+### Convert a quantization aware trained model
+
+The following example shows how to convert a quantization aware trained model
+into a TensorFlow Lite model.
+
+The example uses
+[Mobilenet_1.0_224](https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz).
+
+```python
+import tensorflow as tf
+
+converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
+    graph_def_file='/path/to/mobilenet_v1_1.0_224/frozen_graph.pb',
+    input_arrays=['input'],
+    output_arrays=['MobilenetV1/Predictions/Softmax'],
+    input_shapes={'input' : [1, 224, 224, 3]},
+)
+converter.quantized_input_stats = {'input' : (0., 1.)} # mean, std_dev (input range is [-1, 1])
+converter.inference_type = tf.int8 # this is the recommended type.
+# converter.inference_input_type=tf.uint8 # optional
+# converter.inference_output_type=tf.uint8 # optional
+tflite_model = converter.convert()
+with open('mobilenet_v1_1.0_224_quantized.tflite', 'wb') as f:
+  f.write(tflite_model)
+```
+
+## Convert models from TensorFlow 1.12
 
 Reference the following table to convert TensorFlow models to TensorFlow Lite
 in and before TensorFlow 1.12. Run `help()` to get details of each API.