TensorFlow Lite documentation

PiperOrigin-RevId: 246953700
A. Unique TensorFlower 2019-05-06 21:35:39 -07:00 committed by TensorFlower Gardener
parent 60524c4167
commit 43c7b99a10
6 changed files with 353 additions and 401 deletions


@ -1,15 +1,23 @@
# TensorFlow Lite converter
TensorFlow Lite uses the optimized
[FlatBuffer](https://google.github.io/flatbuffers/) format to represent graphs.
Therefore, a TensorFlow model
([protocol buffer](https://developers.google.com/protocol-buffers/)) needs to be
converted into a `FlatBuffer` file before deploying to clients.
The TensorFlow Lite converter is used to convert TensorFlow models into an
optimized [FlatBuffer](https://google.github.io/flatbuffers/) format, so that
they can be used by the TensorFlow Lite interpreter.
Note: This page contains documentation on the converter API for TensorFlow 1.x.
The API for TensorFlow 2.0 is available
[here](https://www.tensorflow.org/lite/r2/convert/).
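As a quick illustration, converting a `SavedModel` with the TensorFlow 1.x
Python API looks roughly like the sketch below (`saved_model_dir` is a
placeholder for the directory containing your model):

```python
import tensorflow as tf

saved_model_dir = "/tmp/saved_model"  # placeholder path to your SavedModel

# Convert the SavedModel into a TensorFlow Lite FlatBuffer (TF 1.x API).
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```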
## FlatBuffers
FlatBuffer is an efficient open-source cross-platform serialization library. It
is similar to
[protocol buffers](https://developers.google.com/protocol-buffers), with the
distinction that FlatBuffers do not need a parsing/unpacking step to a secondary
representation before data can be accessed, avoiding per-object memory
allocation. The code footprint of FlatBuffers is an order of magnitude smaller
than protocol buffers.
## From model training to device deployment
The TensorFlow Lite converter generates a TensorFlow Lite
@ -20,14 +28,13 @@ The converter supports the following input formats:
* [SavedModels](https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators)
* Frozen `GraphDef`: Models generated by
[freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py).
[freeze_graph.py](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py).
* `tf.keras` HDF5 models.
* Any model taken from a `tf.Session` (Python API only).
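For example, a model held in a live `tf.Session` can be converted with
`TFLiteConverter.from_session`. The sketch below uses a toy graph in place of a
real trained model:

```python
import tensorflow as tf

# Toy graph standing in for a trained model held in a session.
img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3))
out = tf.identity(img * 2.0 + 1.0, name="out")

with tf.Session() as sess:
  converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out])
  tflite_model = converter.convert()
  open("converted_model.tflite", "wb").write(tflite_model)
```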
The TensorFlow Lite `FlatBuffer` file is then deployed to a client device
(generally a mobile or embedded device), and the TensorFlow Lite interpreter
uses the compressed model for on-device inference. This conversion process is
shown in the diagram below:
The TensorFlow Lite `FlatBuffer` file is then deployed to a client device, and
the TensorFlow Lite interpreter uses the compressed model for on-device
inference. This conversion process is shown in the diagram below:
![TFLite converter workflow](../images/convert/workflow.svg)


@ -1,270 +1,286 @@
# Get started with TensorFlow Lite
Using a TensorFlow Lite model in your mobile app requires multiple
considerations: you must choose a pre-trained or custom model, convert the model
to the TensorFlow Lite format, and finally, integrate the model in your app.
TensorFlow Lite provides all the tools you need to convert and run TensorFlow
models on mobile, embedded, and IoT devices. The following guide walks through
each step of the developer workflow and provides links to further instructions.
## 1. Choose a model
Depending on the use case, you can choose one of the popular open-sourced models,
such as *InceptionV3* or *MobileNets*, and re-train these models with a custom
data set or even build your own custom model.
<a id="1_choose_a_model"></a>
TensorFlow Lite allows you to run TensorFlow models on a wide range of devices.
A TensorFlow model is a data structure that contains the logic and knowledge of
a machine learning network trained to solve a particular problem.
There are many ways to obtain a TensorFlow model, from using pre-trained models
to training your own. To use a model with TensorFlow Lite it must be converted
into a special format. This is explained in section 2,
[Convert the model](#2_convert_the_model_format).
Note: Not all TensorFlow models will work with TensorFlow Lite, since the
interpreter supports a limited subset of TensorFlow operations. See section 2,
[Convert the model](#2_convert_the_model_format) to learn about compatibility.
### Use a pre-trained model
[MobileNets](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html)
is a family of mobile-first computer vision models for TensorFlow designed to
effectively maximize accuracy, while taking into consideration the restricted
resources for on-device or embedded applications. MobileNets are small,
low-latency, low-power models parameterized to meet the resource constraints for
a variety of uses. They can be used for classification, detection, embeddings, and
segmentation—similar to other popular large scale models, such as
[Inception](https://arxiv.org/pdf/1602.07261.pdf). Google provides 16 pre-trained
[ImageNet](http://www.image-net.org/challenges/LSVRC/) classification checkpoints
for MobileNets that can be used in mobile projects of all sizes.
The TensorFlow Lite team provides a set of pre-trained models that solve a
variety of machine learning problems. These models have been converted to work
with TensorFlow Lite and are ready to use in your applications.
[Inception-v3](https://arxiv.org/abs/1512.00567) is an image recognition model
that achieves fairly high accuracy recognizing general objects with 1000 classes,
for example, "Zebra", "Dalmatian", and "Dishwasher". The model extracts general
features from input images using a convolutional neural network and classifies
them based on those features with fully-connected and softmax layers.
The pre-trained models include:
[On Device Smart Reply](https://research.googleblog.com/2017/02/on-device-machine-intelligence.html)
is an on-device model that provides one-touch replies for incoming text messages
by suggesting contextually relevant messages. The model is built specifically for
memory constrained devices, such as watches and phones, and has been successfully
used in Smart Replies on Android Wear. Currently, this model is Android-specific.
* [Image classification](../models/image_classification/overview.md)
* [Object detection](../models/object_detection/overview.md)
* [Smart reply](../models/smart_reply/overview.md)
* [Pose estimation](../models/pose_estimation/overview.md)
* [Segmentation](../models/segmentation/overview.md)
These pre-trained models are [available for download](hosted_models.md).
See our full list of pre-trained models in [Models](../models).
### Re-train Inception-V3 or MobileNet for a custom data set
#### Models from other sources
These pre-trained models were trained on the *ImageNet* data set which contains
1000 predefined classes. If these classes are not sufficient for your use case,
the model will need to be re-trained. This technique is called
*transfer learning* and starts with a model that has already been trained on a
problem, then retrains the model on a similar problem. Deep learning from
scratch can take days, but transfer learning is fairly quick. In order to do
this, you need to generate a custom data set labeled with the relevant classes.
There are many other places you can obtain pre-trained TensorFlow models,
including [TensorFlow Hub](https://www.tensorflow.org/hub). In most cases, these
models will not be provided in the TensorFlow Lite format, and you'll have to
[convert](#2_convert_the_model_format) them before use.
The [TensorFlow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/)
codelab walks through the re-training process step-by-step. The code supports
both floating point and quantized inference.
### Re-train a model (transfer learning)
Transfer learning allows you to take a trained model and re-train it to perform
another task. For example, an
[image classification](../models/image_classification/overview.md) model could
be retrained to recognize new categories of image. Re-training takes less time
and requires less data than training a model from scratch.
You can use transfer learning to customize pre-trained models to your
application. Learn how to perform transfer learning in the
<a href="https://codelabs.developers.google.com/codelabs/recognize-flowers-with-tensorflow-on-android">Recognize
flowers with TensorFlow</a> codelab.
### Train a custom model
A developer may choose to train a custom model using TensorFlow (see the
[TensorFlow tutorials](https://www.tensorflow.org/tutorials/) for examples of building and training
models). If you have already written a model, the first step is to export it
to a `tf.GraphDef` file. This is required because some formats do not store the
model structure outside the code, and other parts of the framework need the
model structure in a standalone file. See
[Exporting the Inference Graph](https://www.tensorflow.org/tutorials/keras/save_and_restore_models#save_the_entire_model)
to create a file for your custom model.
If you have designed and trained your own TensorFlow model, or you have trained
a model obtained from another source, you should convert it to the TensorFlow
Lite format before use.
TensorFlow Lite currently supports a subset of TensorFlow operators. Refer to
the [TensorFlow Lite & TensorFlow Compatibility Guide](ops_compatibility.md)
for supported operators and their usage. This set of operators will continue to
grow in future TensorFlow Lite releases.
## 2. Convert the model
## 2. Convert the model format
<a id="2_convert_the_model_format"></a>
The [TensorFlow Lite Converter](../convert/index.md) accepts the following file
formats:
TensorFlow Lite is designed to execute models efficiently on devices. Some of
this efficiency comes from the use of a special format for storing models.
TensorFlow models must be converted into this format before they can be used by
TensorFlow Lite.
* `SavedModel` — A `GraphDef` and checkpoint with a signature that labels
input and output arguments to a model. See the documentation for converting
SavedModels using [Python](../convert/python_api.md#basic_savedmodel) or using
the [command line](../convert/cmdline_examples.md#savedmodel).
* `tf.keras` — An HDF5 file containing a model with weights and input and
output arguments generated by `tf.keras`. See the documentation for
converting HDF5 models using
[Python](../convert/python_api.md#basic_keras_file) or using the
[command line](../convert/cmdline_examples.md#keras). A Python sketch follows
this list.
* `frozen tf.GraphDef` — A `tf.GraphDef` that does not contain
variables. A `GraphDef` can be converted to a `frozen GraphDef` by taking a
checkpoint and a `GraphDef`, and converting each variable into a constant
using the value retrieved from the checkpoint. Instructions on converting a
`tf.GraphDef` to a TensorFlow Lite model are described in the next
subsection.
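For instance, a `tf.keras` HDF5 file can be converted with a few lines of
Python (a sketch; the file paths are placeholders):

```python
import tensorflow as tf

# Convert a tf.keras HDF5 model file to a TensorFlow Lite FlatBuffer.
converter = tf.lite.TFLiteConverter.from_keras_model_file("/tmp/keras_model.h5")
tflite_model = converter.convert()
open("keras_model.tflite", "wb").write(tflite_model)
```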
Converting models reduces their file size and introduces optimizations that do
not affect accuracy. Developers can opt to further reduce file size and increase
speed of execution in exchange for some trade-offs. You can use the TensorFlow
Lite converter to choose which optimizations to apply.
### Converting a tf.GraphDef
TensorFlow Lite supports a limited subset of TensorFlow operations, so not all
models can be converted. See [Ops compatibility](#ops-compatibility) for more
information.
TensorFlow models may be saved as a .pb or .pbtxt `tf.GraphDef` file. In order
to convert the `tf.GraphDef` file to TensorFlow Lite, the model must first be
frozen. This process involves several file formats including the `frozen
GraphDef`:
### TensorFlow Lite converter
* `tf.GraphDef` (.pb or .pbtxt) — A protobuf that represents the TensorFlow
training or computation graph. It contains operator, tensor, and variable
definitions.
* *checkpoint* (.ckpt) — Serialized variables from a TensorFlow graph. Since
this does not contain a graph structure, it cannot be interpreted by itself.
* *TensorFlow Lite model* (.tflite) — A serialized
[FlatBuffer](https://google.github.io/flatbuffers/) that contains TensorFlow
Lite operators and tensors for the TensorFlow Lite interpreter.
The [TensorFlow Lite converter](../convert) is a tool that converts trained
TensorFlow models into the TensorFlow Lite format. It can also introduce
optimizations, which are covered in section 4,
[Optimize your model](#4_optimize_your_model_optional).
You must have checkpoints that contain trained weights. The `tf.GraphDef` file
only contains the structure of the graph. The process of merging the checkpoint
values with the graph structure is called *freezing the graph*.
The converter is available as a Python API. The following example shows a
TensorFlow `SavedModel` being converted into the TensorFlow Lite format:
`tf.GraphDef` and checkpoint files for MobileNet models are available
[here](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md).
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```

To freeze the graph, use the following command (changing the arguments):

```
freeze_graph --input_graph=/tmp/mobilenet_v1_224.pb \
  --input_checkpoint=/tmp/checkpoints/mobilenet-10202.ckpt \
  --input_binary=true \
  --output_graph=/tmp/frozen_mobilenet_v1_224.pb \
  --output_node_names=MobileNetV1/Predictions/Reshape_1
```
Set the `input_binary` flag to `True` when reading a binary protobuf, a `.pb`
file. Set to `False` for a `.pbtxt` file.
You can [convert TensorFlow 2.0 models](../r2/convert) in a similar way.
Set `input_graph` and `input_checkpoint` to the respective filenames. The
`output_node_names` may not be obvious outside of the code that built the model.
The easiest way to find them is to visualize the graph, either with
[TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard) or
`graphviz`.
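If you prefer to stay in Python, the freezing step can also be done with
`tf.graph_util.convert_variables_to_constants`. The sketch below reuses the
MobileNet paths from the command above and assumes a matching `.meta` graph
file exists next to the checkpoint:

```python
import tensorflow as tf

with tf.Session() as sess:
  # Load the graph structure and restore the trained weights (assumed paths).
  saver = tf.train.import_meta_graph("/tmp/checkpoints/mobilenet-10202.ckpt.meta")
  saver.restore(sess, "/tmp/checkpoints/mobilenet-10202.ckpt")

  # Uncomment to list node names when looking for the output node:
  # print([n.name for n in sess.graph_def.node])

  frozen_graph_def = tf.graph_util.convert_variables_to_constants(
      sess, sess.graph_def, ["MobileNetV1/Predictions/Reshape_1"])

with tf.gfile.GFile("/tmp/frozen_mobilenet_v1_224.pb", "wb") as f:
  f.write(frozen_graph_def.SerializeToString())
```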
The converter can also be used from the
[command line](../convert/cmdline_examples), but the Python API is recommended.
The frozen `GraphDef` is now ready for conversion to the `FlatBuffer` format
(.tflite) for use on Android or iOS devices. For Android, the TensorFlow Lite
Converter tool supports both float and quantized models. To convert the frozen
`GraphDef` to the .tflite format use a command similar to the following:
### Options
```
tflite_convert \
  --output_file=/tmp/mobilenet_v1_1.0_224.tflite \
  --graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \
  --input_arrays=input \
  --output_arrays=MobilenetV1/Predictions/Reshape_1
```
The converter can convert from a variety of input types.
The
[frozen_graph.pb](https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz)
file used here is available for download. Setting the `input_arrays` and
`output_arrays` arguments is not straightforward. The easiest way to find these
values is to explore the graph using
[TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard). Reuse
the arguments for specifying the output nodes for inference in the
`freeze_graph` step.
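A roughly equivalent Python sketch, using the same arguments as the
`tflite_convert` command above, is:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="/tmp/mobilenet_v1_0.50_128/frozen_graph.pb",
    input_arrays=["input"],
    output_arrays=["MobilenetV1/Predictions/Reshape_1"])
tflite_model = converter.convert()
open("/tmp/mobilenet_v1_1.0_224.tflite", "wb").write(tflite_model)
```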
When [converting TensorFlow 1.x models](../convert/python_api), these are:
### Full converter reference
* [SavedModel directories](https://www.tensorflow.org/alpha/guide/saved_model)
* Frozen GraphDef (models generated by
[freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py))
* [Keras](https://keras.io) HDF5 models
* Models taken from a `tf.Session`
The [TensorFlow Lite Converter](../convert/index.md) can be used from
[Python](../convert/python_api.md) or from the
[command line](../convert/cmdline_examples.md). This allows you to integrate the
conversion step into the model design workflow, ensuring the model is easy to
convert to a mobile inference graph.
When [converting TensorFlow 2.x models](../r2/convert/python_api), these are:
* [SavedModel directories](https://www.tensorflow.org/alpha/guide/saved_model)
* [`tf.keras` models](https://www.tensorflow.org/alpha/guide/keras/overview)
* [Concrete functions](../r2/convert/concrete_function.md)
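For example, with the TensorFlow 2.0 alpha API, a `tf.keras` model can be
converted along these lines (a minimal sketch with a trivial model):

```python
import tensorflow as tf  # TensorFlow 2.0

# A trivial tf.keras model, used only to illustrate the conversion call.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```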
The converter can be configured to apply various optimizations that can improve
performance or reduce file size. This is covered in section 4,
[Optimize your model](#4_optimize_your_model_optional).
### Ops compatibility
Refer to the [ops compatibility guide](ops_compatibility.md) for
troubleshooting help, and if that doesn't help, please
[file an issue](https://github.com/tensorflow/tensorflow/issues).
TensorFlow Lite currently supports a [limited subset](ops_compatibility.md) of
TensorFlow operations. The long term goal is for all TensorFlow operations to be
supported.
### Graph Visualization tool
If the model you wish to convert contains unsupported operations, you can use
[TensorFlow Select](ops_select.md) to include operations from TensorFlow. This
will result in a larger binary being deployed to devices.
The [development repo](https://github.com/tensorflow/tensorflow) contains a tool
to visualize TensorFlow Lite models after conversion. To build the
[visualize.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/visualize.py)
tool:
## 3. Run inference with the model
```sh
bazel run tensorflow/lite/tools:visualize -- model.tflite model_viz.html
```
<a id="3_use_the_tensorflow_lite_model_for_inference_in_a_mobile_app"></a>
*Inference* is the process of running data through a model to obtain
predictions. It requires a model, an interpreter, and input data.
### TensorFlow Lite interpreter
The [TensorFlow Lite interpreter](inference.md) is a library that takes a model
file, executes the operations it defines on input data, and provides access to
the output.
The interpreter works across multiple platforms and provides a simple API for
running TensorFlow Lite models from Java, Swift, Objective-C, C++, and Python.
The following code shows the interpreter being invoked from Java:
```java
try (Interpreter interpreter = new Interpreter(tensorflow_lite_model_file)) {
  interpreter.run(input, output);
}
```
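For reference, a comparable sketch in Python looks like the following (the
model path is a placeholder, and a single float input tensor is assumed):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed random data shaped to match the model's (float) input tensor.
input_data = np.random.random_sample(input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

output_data = interpreter.get_tensor(output_details[0]["index"])
```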
Running the `visualize.py` tool shown above generates an interactive HTML page
listing subgraphs, operations, and a graph visualization.
### GPU acceleration and Delegates
## 3. Use the TensorFlow Lite model for inference in a mobile app
Some devices provide hardware acceleration for machine learning operations. For
example, most mobile phones have GPUs, which can perform floating point matrix
operations faster than a CPU.
After completing the prior steps, you should now have a `.tflite` model file.
The speed-up can be substantial. For example, a MobileNet v1 image
classification model runs 5.5x faster on a Pixel 3 phone when GPU acceleration
is used.
### Android
The TensorFlow Lite interpreter can be configured with
[Delegates](../performance/delegates.md) to make use of hardware acceleration on
different devices. The [GPU Delegate](../performance/gpu.md) allows the
interpreter to run appropriate operations on the device's GPU.
Since Android apps are written in Java and the core TensorFlow library is in C++,
a JNI library is provided as an interface. This is only meant for inference—it
provides the ability to load a graph, set up inputs, and run the model to
calculate outputs.
The following code shows the GPU Delegate being used from Java:
The open source Android demo app uses the JNI interface and is available
[on GitHub](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/java/demo/app).
You can also download a
[prebuilt APK](http://download.tensorflow.org/deps/tflite/TfLiteCameraDemo.apk).
See the <a href="./android.md">Android demo</a> guide for details.
```java
GpuDelegate delegate = new GpuDelegate();
Interpreter.Options options = (new Interpreter.Options()).addDelegate(delegate);
Interpreter interpreter = new Interpreter(tensorflow_lite_model_file, options);
try {
  interpreter.run(input, output);
} finally {
  delegate.close();
}
```
The <a href="./android.md">Android mobile</a> guide has instructions for
installing TensorFlow on Android and setting up `bazel` and Android Studio.
To add support for new hardware accelerators you can
[define your own delegate](../performance/delegates.md#how_to_add_a_delegate).
### iOS
### Android and iOS
To integrate a TensorFlow model in an iOS app, see the
[TensorFlow Lite for iOS](ios.md) guide and <a href="./ios.md">iOS demo</a>
guide.
The TensorFlow Lite interpreter is easy to use from both major mobile platforms.
To get started, explore the [Android quickstart](android.md) and
[iOS quickstart](ios.md) guides.
[Example applications](https://www.tensorflow.org/lite/examples) are available
for both platforms.
#### Core ML support
To obtain the required libraries, Android developers should use the
[TensorFlow Lite AAR](android.md#use_the_tensorflow_lite_aar_from_jcenter). iOS
developers should use the
[CocoaPods for Swift or Objective-C](ios.md#add_tensorflow_lite_to_your_swift_or_objective-c_project).
Core ML is a machine learning framework used in Apple products. In addition to
using TensorFlow Lite models directly in your applications, you can convert
trained TensorFlow models to the
[CoreML](https://developer.apple.com/machine-learning/) format for use on Apple
devices. To use the converter, refer to the
[TensorFlow-CoreML converter documentation](https://github.com/tf-coreml/tf-coreml).
### Linux
### ARM32 and ARM64 Linux
Embedded Linux is an important platform for deploying machine learning. We
provide build instructions for both [Raspberry Pi](build_rpi.md) and
[Arm64-based boards](build_arm64.md) such as Odroid C2, Pine64, and NanoPi.
Compile TensorFlow Lite for a Raspberry Pi by following the
[RPi build instructions](build_rpi.md). Compile TensorFlow Lite for a generic
aarch64 board such as Odroid C2, Pine64, or NanoPi by following the
[ARM64 Linux build instructions](build_arm64.md). Either build produces a static
library file (`.a`) used to build your app. There are plans for Python bindings
and a demo app.
### Microcontrollers
## 4. Optimize your model (optional)
[TensorFlow Lite for Microcontrollers](../microcontrollers/overview.md) is an
experimental port of TensorFlow Lite aimed at microcontrollers and other devices
with only kilobytes of memory.
There are two options. If you plan to run on CPU, we recommend that you quantize
your weights and activation tensors. If the hardware is available, another
option is to run on GPU for massively parallelizable workloads.
### Operations
If your model requires TensorFlow operations that are not yet implemented in
TensorFlow Lite, you can use [TensorFlow Select](ops_select.md) to use them in
your model. You'll need to build a custom version of the interpreter that
includes the TensorFlow operations.
You can use [Custom operators](ops_custom.md) to write your own operations, or
port new operations into TensorFlow Lite.
[Operator versions](ops_version.md) allows you to add new functionalities and
parameters into existing operations.
## 4. Optimize your model
<a id="4_optimize_your_model_optional"></a>
TensorFlow Lite provides tools to optimize the size and performance of your
models, often with minimal impact on accuracy. Optimized models may require
slightly more complex training, conversion, or integration.
Machine learning optimization is an evolving field, and TensorFlow Lite's
[Model Optimization Toolkit](#model-optimization-toolkit) is continually growing
as new techniques are developed.
### Performance
The goal of model optimization is to reach the ideal balance of performance,
model size, and accuracy on a given device.
[Performance best practices](../performance/best_practices.md) can help guide
you through this process.
### Quantization
Quantization compresses your model by lowering the precision of its parameters
(i.e., neural network weights) from their training-time 32-bit floating-point
representations to much smaller and more efficient 8-bit integer ones.
This executes the heaviest computations in lower precision and the most
sensitive ones in higher precision, typically resulting in little to no loss of
final accuracy for the task, yet a significant speed-up over pure floating-point
execution.
By reducing the precision of values and operations within a model, quantization
can reduce both the size of a model and the time required for inference. For many
models, there is only a minimal loss of accuracy.
The post-training quantization technique is integrated into the TensorFlow Lite
conversion tool. Getting started is easy: after building your TensorFlow model,
simply enable the `post_training_quantize` flag in the TensorFlow Lite
conversion tool. Assuming that the saved model is stored in `saved_model_dir`,
the quantized TFLite FlatBuffer can be generated as follows:
The TensorFlow Lite converter makes it easy to quantize TensorFlow models. The
following Python code quantizes a `SavedModel` and saves it to disk:
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_quant_model)
```
Read the full documentation [here](../performance/post_training_quantization.md)
and see a tutorial
[here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tutorials/post_training_quant.ipynb).
To learn more about quantization, see
[Post-training quantization](../performance/post_training_quantization.md).
### GPU
GPUs are designed to have high throughput for massively parallelizable
workloads. Thus, they are well-suited for deep neural nets, which consist of a
huge number of operators, each working on some input tensor(s) that can be
easily divided into smaller workloads and carried out in parallel, typically
resulting in lower latency.
### Model Optimization Toolkit
Another benefit with GPU inference is its power efficiency. GPUs carry out the
computations in a very efficient and optimized manner, so that they consume less
power and generate less heat than when the same task is run on CPUs.
The [Model Optimization Toolkit](../performance/model_optimization.md) is a set
of tools and techniques designed to make it easy for developers to optimize
their models. Many of the techniques can be applied to all TensorFlow models and
are not specific to TensorFlow Lite, but they are especially valuable when
running inference on devices with limited resources.
Read the tutorial [here](../performance/gpu.md) and full documentation [here](../performance/gpu_advanced.md).
## Next steps
Now that you're familiar with TensorFlow Lite, explore some of the following
resources:
* If you're a mobile developer, visit [Android quickstart](android.md) or
[iOS quickstart](ios.md).
* Explore our [pre-trained models](../models).
* Try our [example apps](https://www.tensorflow.org/lite/examples).


@ -1,202 +1,121 @@
# TensorFlow Lite guide
TensorFlow Lite is TensorFlow's lightweight solution for mobile and embedded
devices. It enables on-device machine learning inference with low latency and a
small binary size. TensorFlow Lite also supports hardware acceleration with the
[Android Neural Networks
API](https://developer.android.com/ndk/guides/neuralnetworks/index.html).
TensorFlow Lite is a set of tools to help developers run TensorFlow models on
mobile, embedded, and IoT devices. It enables on-device machine learning
inference with low latency and a small binary size.
TensorFlow Lite uses many techniques for achieving low latency such as
optimizing the kernels for mobile apps, pre-fused activations, and quantized
kernels that allow smaller and faster (fixed-point math) models.
TensorFlow Lite consists of two main components:
Most of our TensorFlow Lite documentation is [on
GitHub](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite)
for the time being.
- The [TensorFlow Lite interpreter](inference.md), which runs specially
optimized models on many different hardware types, including mobile phones,
embedded Linux devices, and microcontrollers.
- The [TensorFlow Lite converter](../convert/index.md), which converts
TensorFlow models into an efficient form for use by the interpreter, and can
introduce optimizations to improve binary size and performance.
## What does TensorFlow Lite contain?
### Machine learning at the edge
TensorFlow Lite supports a set of core operators, both quantized and
float, which have been tuned for mobile platforms. They incorporate pre-fused
activations and biases to further enhance performance and quantized
accuracy. Additionally, TensorFlow Lite also supports using custom operations in
models.
TensorFlow Lite is designed to make it easy to perform machine learning on
devices, "at the edge" of the network, instead of sending data back and forth
from a server. For developers, performing machine learning on-device can help
improve:
TensorFlow Lite defines a new model file format, based on
[FlatBuffers](https://google.github.io/flatbuffers/). FlatBuffers is an
efficient open-source cross-platform serialization library. It is similar to
[protocol buffers](https://developers.google.com/protocol-buffers/?hl=en), but
the primary difference is that FlatBuffers does not need a parsing/unpacking
step to a secondary representation before you can access data, often coupled
with per-object memory allocation. Also, the code footprint of FlatBuffers is an
order of magnitude smaller than protocol buffers.
* *Latency:* there's no round-trip to a server
* *Privacy:* no data needs to leave the device
* *Connectivity:* an Internet connection isn't required
* *Power consumption:* network connections are power hungry
TensorFlow Lite has a new mobile-optimized interpreter, which has the key goals
of keeping apps lean and fast. The interpreter uses a static graph ordering and
a custom (less-dynamic) memory allocator to ensure minimal load, initialization,
and execution latency.
TensorFlow Lite works with a huge range of devices, from tiny microcontrollers
to powerful mobile phones.
TensorFlow Lite provides an interface to leverage hardware acceleration, if
available on the device. It does so via the
[Android Neural Networks API](https://developer.android.com/ndk/guides/neuralnetworks/index.html),
available on Android 8.1 (API level 27) and higher.
Key Point: The TensorFlow Lite binary is smaller than 300KB when all supported
operators are linked, and less than 200KB when using only the operators needed
for supporting the common image classification models InceptionV3 and MobileNet.
## Why do we need a new mobile-specific library?
## Get started
Machine Learning is changing the computing paradigm, and we see an emerging
trend of new use cases on mobile and embedded devices. Consumer expectations are
also trending toward natural, human-like interactions with their devices, driven
by the camera and voice interaction models.
To begin working with TensorFlow Lite, visit [Get started](get_started.md).
There are several factors which are fueling interest in this domain:
## Key features
- Innovation at the silicon layer is enabling new possibilities for hardware
acceleration, and frameworks such as the Android Neural Networks API make it
easy to leverage these.
* *[Interpreter](inference.md) tuned for on-device ML*, supporting a set of
core operators that are optimized for on-device applications, and with a
small binary size.
* *Diverse platform support*, covering [Android](android.md) and [iOS](ios.md)
devices, embedded Linux, and microcontrollers, making use of platform APIs
for accelerated inference.
* *APIs for multiple languages* including Java, Swift, Objective-C, C++, and
Python.
* *High performance*, with [hardware acceleration](../performance/gpu.md) on
supported devices, device-optimized kernels, and
[pre-fused activations and biases](ops_compatibility.md).
* *Model optimization tools*, including
[quantization](../performance/post_training_quantization.md), that can
reduce size and increase performance of models without sacrificing accuracy.
* *Efficient model format*, using a [FlatBuffer](../convert/index.md) that is
optimized for small size and portability.
* *[Pre-trained models](../models)* for common machine learning tasks that can
be customized to your application.
* *[Samples and tutorials](https://www.tensorflow.org/examples)* that show you
how to deploy machine learning models on supported platforms.
- Recent advances in real-time computer-vision and spoken language understanding
have led to mobile-optimized benchmark models being open sourced
(e.g. MobileNets, SqueezeNet).
## Development workflow
- Widely-available smart appliances create new possibilities for
on-device intelligence.
The workflow for using TensorFlow Lite involves the following steps:
- Interest in stronger user data privacy paradigms where user data does not need
to leave the mobile device.
1. **Pick a model**
- Ability to serve offline use cases, where the device does not need to be
connected to a network.
Bring your own TensorFlow model, find a model online, or pick a model from
our [Pre-trained models](../models) to drop in or retrain.
We believe the next wave of machine learning applications will have significant
processing on mobile and embedded devices.
1. **Convert the model**
## TensorFlow Lite highlights
If you're using a custom model, use the
[TensorFlow Lite converter](../convert/index.md) and a few lines of Python
to convert it to the TensorFlow Lite format.
TensorFlow Lite provides:
1. **Deploy to your device**
- A set of core operators, both quantized and float, many of which have been
tuned for mobile platforms. These can be used to create and run custom
models. Developers can also write their own custom operators and use them in
models.
Run your model on-device with the
[TensorFlow Lite interpreter](inference.md), with APIs in many languages.
- A new [FlatBuffers](https://google.github.io/flatbuffers/)-based
model file format.
1. **Optimize your model**
- On-device interpreter with kernels optimized for faster execution on mobile.
Use our [Model Optimization Toolkit](../performance/model_optimization.md)
to reduce your model's size and increase its efficiency with minimal impact
on accuracy.
- TensorFlow converter to convert TensorFlow-trained models to the TensorFlow
Lite format.
To learn more about using TensorFlow Lite in your project, see
[Get started](get_started.md).
- Smaller in size: TensorFlow Lite is smaller than 300KB when all supported
operators are linked and less than 200KB when using only the operators needed
for supporting InceptionV3 and MobileNet.
## Technical constraints
- **Pre-tested models:**
TensorFlow Lite plans to provide high performance on-device inference for any
TensorFlow model. However, the TensorFlow Lite interpreter currently supports a
limited subset of TensorFlow operators that have been optimized for on-device
use. This means that some models require additional steps to work with
TensorFlow Lite.
All of the following models are guaranteed to work out of the box:
To learn which operators are available, see
[Operator compatibility](ops_compatibility.md).
- Inception V3, a popular model for detecting the dominant objects
present in an image.
If your model uses operators that are not yet supported by TensorFlow Lite
interpreter, you can use [TensorFlow Select](ops_select.md) to include
TensorFlow operations in your TensorFlow Lite build. However, this will lead to
an increased binary size.
- [MobileNets](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md),
a family of mobile-first computer vision models designed to effectively
maximize accuracy while being mindful of the restricted resources for an
on-device or embedded application. They are small, low-latency, low-power
models parameterized to meet the resource constraints of a variety of use
cases. They can be built upon for classification, detection, embeddings
and segmentation. MobileNet models are smaller but [lower in
accuracy](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html)
than Inception V3.
TensorFlow Lite does not currently support on-device training, but it is in our
[Roadmap](roadmap.md), along with other planned improvements.
- On Device Smart Reply, an on-device model which provides one-touch
replies for an incoming text message by suggesting contextually relevant
messages. The model was built specifically for memory constrained devices
such as watches & phones and it has been successfully used to surface
[Smart Replies on Android
Wear](https://research.googleblog.com/2017/02/on-device-machine-intelligence.html)
to all first-party and third-party apps.
## Next steps
Also see the complete list of
[TensorFlow Lite's supported models](hosted_models.md),
including the model sizes, performance numbers, and downloadable model files.
Want to keep learning about TensorFlow Lite? Here are some next steps:
- Quantized versions of the MobileNet model, which runs faster than the
non-quantized (float) version on CPU.
- New Android demo app to illustrate the use of TensorFlow Lite with a quantized
MobileNet model for object classification.
- Java and C++ API support
## Getting Started
We recommend you try out TensorFlow Lite with the pre-tested models indicated
above. If you have an existing model, you will need to test whether your model
is compatible with both the converter and the supported operator set. To test
your model, see the
[documentation on GitHub](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite).
### Retrain Inception-V3 or MobileNet for a custom data set
The pre-trained models mentioned above have been trained on the ImageNet data
set, which consists of 1000 predefined classes. If those classes are not
relevant or useful for your use case, you will need to retrain those
models. This technique is called transfer learning, which starts with a model
that has already been trained on a problem and will then be retrained on a
similar problem. Deep learning from scratch can take days, but transfer learning
can be done fairly quickly. In order to do this, you'll need to generate your
custom data set labeled with the relevant classes.
The [TensorFlow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/)
codelab walks through this process step-by-step. The retraining code supports
retraining for both floating point and quantized inference.
## TensorFlow Lite Architecture
The following diagram shows the architectural design of TensorFlow Lite:
<img src="https://www.tensorflow.org/images/tflite-architecture.jpg"
alt="TensorFlow Lite architecture diagram"
style="max-width:600px;">
Starting with a trained TensorFlow model on disk, you'll convert that model to
the TensorFlow Lite file format (`.tflite`) using the TensorFlow Lite
Converter. Then you can use that converted file in your mobile application.
Deploying the TensorFlow Lite model file uses:
- Java API: A convenience wrapper around the C++ API on Android.
- C++ API: Loads the TensorFlow Lite Model File and invokes the Interpreter. The
same library is available on both Android and iOS.
- Interpreter: Executes the model using a set of kernels. The interpreter
supports selective kernel loading; without kernels it is only 100KB, and 300KB
with all the kernels loaded. This is a significant reduction from the 1.5M
required by TensorFlow Mobile.
- On select Android devices, the Interpreter will use the Android Neural
Networks API for hardware acceleration, or default to CPU execution if none
are available.
You can also implement custom kernels using the C++ API that can be used by the
Interpreter.
## Future Work
In future releases, TensorFlow Lite will support more models and built-in
operators, contain performance improvements for both fixed point and floating
point models, improvements to the tools to enable easier developer workflows and
support for other smaller devices and more. As we continue development, we hope
that TensorFlow Lite will greatly simplify the developer experience of targeting
a model for small devices.
Future plans include using specialized machine learning hardware to get the best
possible performance for a particular model on a particular device.
## Next Steps
The TensorFlow Lite [GitHub repository](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite)
contains additional docs, code samples, and demo applications.
* Visit [Get started](get_started.md) to walk through the process of using
TensorFlow Lite.
* If you're a mobile developer, visit [Android quickstart](android.md) or
[iOS quickstart](ios.md).
* Learn about
[TensorFlow Lite for Microcontrollers](../microcontrollers/overview.md).
* Explore our [pre-trained models](../models).
* Try our [example apps](https://www.tensorflow.org/lite/examples).


@ -1,16 +1,15 @@
# TensorFlow Lite inference
[TOC]
The term *inference* refers to the process of executing a TensorFlow Lite model
on-device in order to make predictions based on input data. Inference is the
final step in using the model on-device.
## Overview
Inference for TensorFlow Lite models is run through an interpreter. The
TensorFlow Lite interpreter is designed to be lean and fast. The interpreter
uses a static graph ordering and a custom (less-dynamic) memory allocator to
ensure minimal load, initialization, and execution latency.
TensorFlow Lite inference is the process of executing a TensorFlow Lite
model on-device and extracting meaningful results from it. Inference is the
final step in using the model on-device in the
[architecture](index.md#tensorflow_lite_architecture).
Inference for TensorFlow Lite models is run through an interpreter. This
document outlines the various APIs for the interpreter, along with the
[supported platforms](#supported-platforms).
### Important Concepts
@ -43,19 +42,27 @@ TensorFlow Lite inference on device typically follows the following steps.
present it to their user.
### Supported Platforms
TensorFlow Lite inference APIs are provided for most common mobile/embedded
platforms such as Android, iOS, and Linux.
#### Android
On Android, TensorFlow Lite inference can be performed using either Java or C++
APIs. The Java APIs provide convenience and can be used directly within your
Android Activity classes. The C++ APIs offer more flexibility and speed, but may
require writing JNI wrappers to move data between Java and C++ layers.
Visit the [Android quickstart](android.md) for a tutorial and example code.
#### iOS
TensorFlow Lite provides Swift and Objective-C APIs for inference on iOS. An
example can be found [here](ios.md).
TensorFlow Lite provides native iOS libraries written in
[Swift](https://www.tensorflow.org/code/tensorflow/lite/experimental/swift)
and
[Objective-C](https://www.tensorflow.org/code/tensorflow/lite/experimental/objc).
Visit the [iOS quickstart](ios.md) for a tutorial and example code.
#### Linux
On Linux platforms such as [Raspberry Pi](build_rpi.md), TensorFlow Lite C++


@ -280,5 +280,5 @@ trees in the original training data. To do this, you will need a set of training
images for each of the new labels you wish to train.
Learn how to perform transfer learning in the
<a href="https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/">TensorFlow
for Poets</a> codelab.
<a href="https://codelabs.developers.google.com/codelabs/recognize-flowers-with-tensorflow-on-android/#0">Recognize
flowers with TensorFlow</a> codelab.


@ -13,12 +13,15 @@ starter model and labels</a>
### Sample application
We have provided a pre-built APK that demonstrates the smart reply model on
Android.
There is a TensorFlow Lite sample application that demonstrates the smart reply
model on Android.
Go to the
<a href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/models/smartreply/g3doc">GitHub
page</a> for instructions and a list of supported ops and functionalities.
<a class="button button-primary" href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/models/smartreply">View
Android example</a>
Read the
[GitHub page](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/models/smartreply/g3doc)
to learn how the app works.
## How it works