TensorFlow Lite documentation

PiperOrigin-RevId: 246953700
A. Unique TensorFlower 2019-05-06 21:35:39 -07:00 committed by TensorFlower Gardener
parent 60524c4167
commit 43c7b99a10
6 changed files with 353 additions and 401 deletions


@@ -1,15 +1,23 @@
# TensorFlow Lite converter

The TensorFlow Lite converter is used to convert TensorFlow models into an
optimized [FlatBuffer](https://google.github.io/flatbuffers/) format, so that
they can be used by the TensorFlow Lite interpreter.

Note: This page contains documentation on the converter API for TensorFlow 1.x.
The API for TensorFlow 2.0 is available
[here](https://www.tensorflow.org/lite/r2/convert/).

## FlatBuffers

FlatBuffers is an efficient open-source cross-platform serialization library. It
is similar to
[protocol buffers](https://developers.google.com/protocol-buffers), with the
distinction that FlatBuffers do not need a parsing/unpacking step to a secondary
representation before data can be accessed, avoiding per-object memory
allocation. The code footprint of FlatBuffers is an order of magnitude smaller
than that of protocol buffers.

## From model training to device deployment

The TensorFlow Lite converter generates a TensorFlow Lite
@@ -20,14 +28,13 @@ The converter supports the following input formats:

* [SavedModels](https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators)
* Frozen `GraphDef`: Models generated by
  [freeze_graph.py](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py).
* `tf.keras` HDF5 models.
* Any model taken from a `tf.Session` (Python API only).

The TensorFlow Lite `FlatBuffer` file is then deployed to a client device, and
the TensorFlow Lite interpreter uses the compressed model for on-device
inference. This conversion process is shown in the diagram below:

![TFLite converter workflow](../images/convert/workflow.svg)
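
For example, a frozen `GraphDef` produced by `freeze_graph.py` can be converted
with the Python API. The snippet below is a minimal sketch using the TensorFlow
1.x `tf.lite.TFLiteConverter`; the file path and the input/output array names
are placeholders that depend on your model:

```python
import tensorflow as tf

# Placeholder path and tensor names; substitute the values for your own model.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="/tmp/frozen_graph.pb",
    input_arrays=["input"],
    output_arrays=["MobilenetV1/Predictions/Reshape_1"])
tflite_model = converter.convert()

with open("converted_model.tflite", "wb") as f:
  f.write(tflite_model)
```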


@@ -1,270 +1,286 @@
# Get started with TensorFlow Lite

TensorFlow Lite provides all the tools you need to convert and run TensorFlow
models on mobile, embedded, and IoT devices. The following guide walks through
each step of the developer workflow and provides links to further instructions.

## 1. Choose a model

<a id="1_choose_a_model"></a>

TensorFlow Lite allows you to run TensorFlow models on a wide range of devices.
A TensorFlow model is a data structure that contains the logic and knowledge of
a machine learning network trained to solve a particular problem.

There are many ways to obtain a TensorFlow model, from using pre-trained models
to training your own. To use a model with TensorFlow Lite, it must be converted
into a special format. This is explained in section 2,
[Convert the model](#2_convert_the_model_format).

Note: Not all TensorFlow models will work with TensorFlow Lite, since the
interpreter supports a limited subset of TensorFlow operations. See section 2,
[Convert the model](#2_convert_the_model_format), to learn about compatibility.

### Use a pre-trained model

The TensorFlow Lite team provides a set of pre-trained models that solve a
variety of machine learning problems. These models have been converted to work
with TensorFlow Lite and are ready to use in your applications.

The pre-trained models include:

* [Image classification](../models/image_classification/overview.md)
* [Object detection](../models/object_detection/overview.md)
* [Smart reply](../models/smart_reply/overview.md)
* [Pose estimation](../models/pose_estimation/overview.md)
* [Segmentation](../models/segmentation/overview.md)

See our full list of pre-trained models in [Models](../models).

#### Models from other sources

There are many other places you can obtain pre-trained TensorFlow models,
including [TensorFlow Hub](https://www.tensorflow.org/hub). In most cases, these
models will not be provided in the TensorFlow Lite format, and you'll have to
[convert](#2_convert_the_model_format) them before use.

### Re-train a model (transfer learning)

Transfer learning allows you to take a trained model and re-train it to perform
another task. For example, an
[image classification](../models/image_classification/overview.md) model could
be retrained to recognize new categories of image. Re-training takes less time
and requires less data than training a model from scratch.

You can use transfer learning to customize pre-trained models to your
application. Learn how to perform transfer learning in the
<a href="https://codelabs.developers.google.com/codelabs/recognize-flowers-with-tensorflow-on-android">Recognize
flowers with TensorFlow</a> codelab.

### Train a custom model

If you have designed and trained your own TensorFlow model, or you have trained
a model obtained from another source, you should convert it to the TensorFlow
Lite format before use.

## 2. Convert the model

<a id="2_convert_the_model_format"></a>

TensorFlow Lite is designed to execute models efficiently on devices. Some of
this efficiency comes from the use of a special format for storing models.
TensorFlow models must be converted into this format before they can be used by
TensorFlow Lite.

Converting models reduces their file size and introduces optimizations that do
not affect accuracy. Developers can opt to further reduce file size and increase
speed of execution in exchange for some trade-offs. You can use the TensorFlow
Lite converter to choose which optimizations to apply.

TensorFlow Lite supports a limited subset of TensorFlow operations, so not all
models can be converted. See [Ops compatibility](#ops-compatibility) for more
information.

### TensorFlow Lite converter

The [TensorFlow Lite converter](../convert) is a tool that converts trained
TensorFlow models into the TensorFlow Lite format. It can also introduce
optimizations, which are covered in section 4,
[Optimize your model](#4_optimize_your_model_optional).

The converter is available as a Python API. The following example shows a
TensorFlow `SavedModel` being converted into the TensorFlow Lite format:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```

You can [convert TensorFlow 2.0 models](../r2/convert) in a similar way.

The converter can also be used from the
[command line](../convert/cmdline_examples), but the Python API is recommended.

### Options

The converter can convert from a variety of input types.

When [converting TensorFlow 1.x models](../convert/python_api), these are:

* [SavedModel directories](https://www.tensorflow.org/alpha/guide/saved_model)
* Frozen GraphDef (models generated by
  [freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py))
* [Keras](https://keras.io) HDF5 models
* Models taken from a `tf.Session`

When [converting TensorFlow 2.x models](../r2/convert/python_api), these are:

* [SavedModel directories](https://www.tensorflow.org/alpha/guide/saved_model)
* [`tf.keras` models](https://www.tensorflow.org/alpha/guide/keras/overview)
* [Concrete functions](../r2/convert/concrete_function.md)
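
For instance, a `tf.keras` HDF5 file (one of the TensorFlow 1.x input types
listed above) can be converted with a few lines of Python. This is a minimal
sketch; the file name `keras_model.h5` is a placeholder:

```python
import tensorflow as tf

# "keras_model.h5" is a placeholder for a tf.keras model saved in HDF5 format.
converter = tf.lite.TFLiteConverter.from_keras_model_file("keras_model.h5")
tflite_model = converter.convert()

with open("converted_model.tflite", "wb") as f:
  f.write(tflite_model)
```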

The converter can be configured to apply various optimizations that can improve
performance or reduce file size. This is covered in section 4,
[Optimize your model](#4_optimize_your_model_optional).
### Ops compatibility

TensorFlow Lite currently supports a [limited subset](ops_compatibility.md) of
TensorFlow operations. The long term goal is for all TensorFlow operations to be
supported.

If the model you wish to convert contains unsupported operations, you can use
[TensorFlow Select](ops_select.md) to include operations from TensorFlow. This
will result in a larger binary being deployed to devices.

## 3. Run inference with the model

<a id="3_use_the_tensorflow_lite_model_for_inference_in_a_mobile_app"></a>

*Inference* is the process of running data through a model to obtain
predictions. It requires a model, an interpreter, and input data.

### TensorFlow Lite interpreter

The [TensorFlow Lite interpreter](inference.md) is a library that takes a model
file, executes the operations it defines on input data, and provides access to
the output.

The interpreter works across multiple platforms and provides a simple API for
running TensorFlow Lite models from Java, Swift, Objective-C, C++, and Python.

The following code shows the interpreter being invoked from Java:

```java
try (Interpreter interpreter = new Interpreter(tensorflow_lite_model_file)) {
  interpreter.run(input, output);
}
```
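
The interpreter can also be driven from Python. The following is a minimal
sketch that assumes a converted model named `converted_model.tflite` and a
single floating-point input tensor:

```python
import numpy as np
import tensorflow as tf

# Load the TFLite model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed random data matching the model's input shape (assumed to be float32).
input_data = np.random.random_sample(input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], input_data)

interpreter.invoke()

output_data = interpreter.get_tensor(output_details[0]["index"])
print(output_data)
```

The input and output details describe the tensor shapes and types the model
expects, so the same pattern works for any converted model.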
### GPU acceleration and Delegates

Some devices provide hardware acceleration for machine learning operations. For
example, most mobile phones have GPUs, which can perform floating point matrix
operations faster than a CPU.

The speed-up can be substantial. For example, a MobileNet v1 image
classification model runs 5.5x faster on a Pixel 3 phone when GPU acceleration
is used.

The TensorFlow Lite interpreter can be configured with
[Delegates](../performance/delegates.md) to make use of hardware acceleration on
different devices. The [GPU Delegate](../performance/gpu.md) allows the
interpreter to run appropriate operations on the device's GPU.

The following code shows the GPU Delegate being used from Java:

```java
GpuDelegate delegate = new GpuDelegate();
Interpreter.Options options = (new Interpreter.Options()).addDelegate(delegate);
Interpreter interpreter = new Interpreter(tensorflow_lite_model_file, options);

try {
  interpreter.run(input, output);
} finally {
  // Clean up the interpreter and the delegate when finished.
  interpreter.close();
  delegate.close();
}
```

To add support for new hardware accelerators, you can
[define your own delegate](../performance/delegates.md#how_to_add_a_delegate).

### Android and iOS

The TensorFlow Lite interpreter is easy to use from both major mobile platforms.
To get started, explore the [Android quickstart](android.md) and
[iOS quickstart](ios.md) guides.
[Example applications](https://www.tensorflow.org/lite/examples) are available
for both platforms.

To obtain the required libraries, Android developers should use the
[TensorFlow Lite AAR](android.md#use_the_tensorflow_lite_aar_from_jcenter). iOS
developers should use the
[CocoaPods for Swift or Objective-C](ios.md#add_tensorflow_lite_to_your_swift_or_objective-c_project).

### Linux

Embedded Linux is an important platform for deploying machine learning. We
provide build instructions for both [Raspberry Pi](build_rpi.md) and
[Arm64-based boards](build_arm64.md) such as Odroid C2, Pine64, and NanoPi.

### Microcontrollers

[TensorFlow Lite for Microcontrollers](../microcontrollers/overview.md) is an
experimental port of TensorFlow Lite aimed at microcontrollers and other devices
with only kilobytes of memory.

### Operations

If your model requires TensorFlow operations that are not yet implemented in
TensorFlow Lite, you can use [TensorFlow Select](ops_select.md) to use them in
your model. You'll need to build a custom version of the interpreter that
includes the TensorFlow operations.

You can use [Custom operators](ops_custom.md) to write your own operations, or
port new operations into TensorFlow Lite.

[Operator versions](ops_version.md) allows you to add new functionalities and
parameters into existing operations.

## 4. Optimize your model

<a id="4_optimize_your_model_optional"></a>

TensorFlow Lite provides tools to optimize the size and performance of your
models, often with minimal impact on accuracy. Optimized models may require
slightly more complex training, conversion, or integration.

Machine learning optimization is an evolving field, and TensorFlow Lite's
[Model Optimization Toolkit](#model-optimization-toolkit) is continually growing
as new techniques are developed.

### Performance

The goal of model optimization is to reach the ideal balance of performance,
model size, and accuracy on a given device.
[Performance best practices](../performance/best_practices.md) can help guide
you through this process.

### Quantization

By reducing the precision of values and operations within a model, quantization
can reduce both the size of a model and the time required for inference. For
many models, there is only a minimal loss of accuracy.

The TensorFlow Lite converter makes it easy to quantize TensorFlow models. The
following Python code quantizes a `SavedModel` and saves it to disk:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_quant_model)
```

To learn more about quantization, see
[Post-training quantization](../performance/post_training_quantization.md).

### Model Optimization Toolkit

The [Model Optimization Toolkit](../performance/model_optimization.md) is a set
of tools and techniques designed to make it easy for developers to optimize
their models. Many of the techniques can be applied to all TensorFlow models and
are not specific to TensorFlow Lite, but they are especially valuable when
running inference on devices with limited resources.

## Next steps

Now that you're familiar with TensorFlow Lite, explore some of the following
resources:

* If you're a mobile developer, visit [Android quickstart](android.md) or
  [iOS quickstart](ios.md).
* Explore our [pre-trained models](../models).
* Try our [example apps](https://www.tensorflow.org/lite/examples).


@@ -1,202 +1,121 @@
# TensorFlow Lite guide

TensorFlow Lite is a set of tools to help developers run TensorFlow models on
mobile, embedded, and IoT devices. It enables on-device machine learning
inference with low latency and a small binary size.

TensorFlow Lite consists of two main components:

- The [TensorFlow Lite interpreter](inference.md), which runs specially
  optimized models on many different hardware types, including mobile phones,
  embedded Linux devices, and microcontrollers.
- The [TensorFlow Lite converter](../convert/index.md), which converts
  TensorFlow models into an efficient form for use by the interpreter, and can
  introduce optimizations to improve binary size and performance.

### Machine learning at the edge

TensorFlow Lite is designed to make it easy to perform machine learning on
devices, "at the edge" of the network, instead of sending data back and forth
from a server. For developers, performing machine learning on-device can help
improve:

* *Latency:* there's no round-trip to a server
* *Privacy:* no data needs to leave the device
* *Connectivity:* an Internet connection isn't required
* *Power consumption:* network connections are power hungry

TensorFlow Lite works with a huge range of devices, from tiny microcontrollers
to powerful mobile phones.

Key Point: The TensorFlow Lite binary is smaller than 300KB when all supported
operators are linked, and less than 200KB when using only the operators needed
for supporting the common image classification models InceptionV3 and MobileNet.

## Get started

To begin working with TensorFlow Lite, visit [Get started](get_started.md).

## Key features

* *[Interpreter](inference.md) tuned for on-device ML*, supporting a set of
  core operators that are optimized for on-device applications, and with a
  small binary size.
* *Diverse platform support*, covering [Android](android.md) and [iOS](ios.md)
  devices, embedded Linux, and microcontrollers, making use of platform APIs
  for accelerated inference.
* *APIs for multiple languages*, including Java, Swift, Objective-C, C++, and
  Python.
* *High performance*, with [hardware acceleration](../performance/gpu.md) on
  supported devices, device-optimized kernels, and
  [pre-fused activations and biases](ops_compatibility.md).
* *Model optimization tools*, including
  [quantization](../performance/post_training_quantization.md), that can
  reduce the size and increase the performance of models without sacrificing
  accuracy.
* *Efficient model format*, using a [FlatBuffer](../convert/index.md) that is
  optimized for small size and portability.
* *[Pre-trained models](../models)* for common machine learning tasks that can
  be customized to your application.
* *[Samples and tutorials](https://www.tensorflow.org/examples)* that show you
  how to deploy machine learning models on supported platforms.

## Development workflow

The workflow for using TensorFlow Lite involves the following steps:

1. **Pick a model**

   Bring your own TensorFlow model, find a model online, or pick a model from
   our [Pre-trained models](../models) to drop in or retrain.

1. **Convert the model**

   If you're using a custom model, use the
   [TensorFlow Lite converter](../convert/index.md) and a few lines of Python
   to convert it to the TensorFlow Lite format (see the sketch after this
   list).

1. **Deploy to your device**

   Run your model on-device with the
   [TensorFlow Lite interpreter](inference.md), with APIs in many languages.

1. **Optimize your model**

   Use our [Model Optimization Toolkit](../performance/model_optimization.md)
   to reduce your model's size and increase its efficiency with minimal impact
   on accuracy.
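
To give a sense of the conversion step, here is a minimal sketch using the
TensorFlow 1.x Python API; the `saved_model_dir` path is a placeholder, and the
[Get started](get_started.md) guide covers the available options in more
detail:

```python
import tensorflow as tf

# Placeholder path; point this at a directory containing an exported SavedModel.
saved_model_dir = "/tmp/saved_model"

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
  f.write(tflite_model)
```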
To learn more about using TensorFlow Lite in your project, see
[Get started](get_started.md).

## Technical constraints

TensorFlow Lite plans to provide high performance on-device inference for any
TensorFlow model. However, the TensorFlow Lite interpreter currently supports a
limited subset of TensorFlow operators that have been optimized for on-device
use. This means that some models require additional steps to work with
TensorFlow Lite.

To learn which operators are available, see
[Operator compatibility](ops_compatibility.md).

If your model uses operators that are not yet supported by the TensorFlow Lite
interpreter, you can use [TensorFlow Select](ops_select.md) to include
TensorFlow operations in your TensorFlow Lite build. However, this will lead to
an increased binary size.

TensorFlow Lite does not currently support on-device training, but it is in our
[Roadmap](roadmap.md), along with other planned improvements.

## Next steps

Want to keep learning about TensorFlow Lite? Here are some next steps:

* Visit [Get started](get_started.md) to walk through the process of using
  TensorFlow Lite.
* If you're a mobile developer, visit [Android quickstart](android.md) or
  [iOS quickstart](ios.md).
* Learn about
  [TensorFlow Lite for Microcontrollers](../microcontrollers/overview.md).
* Explore our [pre-trained models](../models).
* Try our [example apps](https://www.tensorflow.org/lite/examples).


@@ -1,16 +1,15 @@
# TensorFlow Lite inference

The term *inference* refers to the process of executing a TensorFlow Lite model
on-device in order to make predictions based on input data. Inference is the
final step in using the model on-device.

Inference for TensorFlow Lite models is run through an interpreter. The
TensorFlow Lite interpreter is designed to be lean and fast. The interpreter
uses a static graph ordering and a custom (less-dynamic) memory allocator to
ensure minimal load, initialization, and execution latency.

This document outlines the various APIs for the interpreter, along with the
[supported platforms](#supported-platforms).

### Important Concepts
@@ -43,19 +42,27 @@ TensorFlow Lite inference on device typically follows the following steps.
   present it to their user.

### Supported Platforms

TensorFlow inference APIs are provided for most common mobile/embedded platforms
such as Android, iOS and Linux.

#### Android

On Android, TensorFlow Lite inference can be performed using either Java or C++
APIs. The Java APIs provide convenience and can be used directly within your
Android Activity classes. The C++ APIs offer more flexibility and speed, but may
require writing JNI wrappers to move data between Java and C++ layers.

Visit the [Android quickstart](android.md) for a tutorial and example code.

#### iOS

TensorFlow Lite provides native iOS libraries written in
[Swift](https://www.tensorflow.org/code/tensorflow/lite/experimental/swift)
and
[Objective-C](https://www.tensorflow.org/code/tensorflow/lite/experimental/objc).

Visit the [iOS quickstart](ios.md) for a tutorial and example code.

#### Linux

On Linux platforms such as [Raspberry Pi](build_rpi.md), TensorFlow Lite C++


@@ -280,5 +280,5 @@ trees in the original training data. To do this, you will need a set of training
images for each of the new labels you wish to train.

Learn how to perform transfer learning in the
<a href="https://codelabs.developers.google.com/codelabs/recognize-flowers-with-tensorflow-on-android/#0">Recognize
flowers with TensorFlow</a> codelab.


@@ -13,12 +13,15 @@ starter model and labels</a>
### Sample application

There is a TensorFlow Lite sample application that demonstrates the smart reply
model on Android.

<a class="button button-primary" href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/models/smartreply">View
Android example</a>

Read the
[GitHub page](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/models/smartreply/g3doc)
to learn how the app works.

## How it works