From 43c7b99a10f083b8ab3e2327b7068a5b5b9a7e96 Mon Sep 17 00:00:00 2001 From: "A. Unique TensorFlower" Date: Mon, 6 May 2019 21:35:39 -0700 Subject: [PATCH] TensorFlow Lite documentation PiperOrigin-RevId: 246953700 --- tensorflow/lite/g3doc/convert/index.md | 27 +- tensorflow/lite/g3doc/guide/get_started.md | 416 +++++++++--------- tensorflow/lite/g3doc/guide/index.md | 259 ++++------- tensorflow/lite/g3doc/guide/inference.md | 35 +- .../models/image_classification/overview.md | 4 +- .../lite/g3doc/models/smart_reply/overview.md | 13 +- 6 files changed, 353 insertions(+), 401 deletions(-) diff --git a/tensorflow/lite/g3doc/convert/index.md b/tensorflow/lite/g3doc/convert/index.md index 45802fe3fa2..f9c6d9f6cfe 100644 --- a/tensorflow/lite/g3doc/convert/index.md +++ b/tensorflow/lite/g3doc/convert/index.md @@ -1,15 +1,23 @@ # TensorFlow Lite converter -TensorFlow Lite uses the optimized -[FlatBuffer](https://google.github.io/flatbuffers/) format to represent graphs. -Therefore, a TensorFlow model -([protocol buffer](https://developers.google.com/protocol-buffers/)) needs to be -converted into a `FlatBuffer` file before deploying to clients. +The TensorFlow Lite converter is used to convert TensorFlow models into an +optimized [FlatBuffer](https://google.github.io/flatbuffers/) format, so that +they can be used by the TensorFlow Lite interpreter. Note: This page contains documentation on the converter API for TensorFlow 1.x. The API for TensorFlow 2.0 is available [here](https://www.tensorflow.org/lite/r2/convert/). +## FlatBuffers + +FlatBuffer is an efficient open-source cross-platform serialization library. It +is similar to +[protocol buffers](https://developers.google.com/protocol-buffers), with the +distinction that FlatBuffers do not need a parsing/unpacking step to a secondary +representation before data can be accessed, avoiding per-object memory +allocation. The code footprint of FlatBuffers is an order of magnitude smaller +than protocol buffers. + ## From model training to device deployment The TensorFlow Lite converter generates a TensorFlow Lite @@ -20,14 +28,13 @@ The converter supports the following input formats: * [SavedModels](https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators) * Frozen `GraphDef`: Models generated by - [freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py). + [freeze_graph.py](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py). * `tf.keras` HDF5 models. * Any model taken from a `tf.Session` (Python API only). -The TensorFlow Lite `FlatBuffer` file is then deployed to a client device -(generally a mobile or embedded device), and the TensorFlow Lite interpreter -uses the compressed model for on-device inference. This conversion process is -shown in the diagram below: +The TensorFlow Lite `FlatBuffer` file is then deployed to a client device, and +the TensorFlow Lite interpreter uses the compressed model for on-device +inference. 
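For example, a frozen `GraphDef` can be converted with a few lines of the TF 1.x Python API. The following is a minimal sketch in which the file path and the input/output tensor names are placeholders (the MobileNet names are shown purely as an illustration) and must be replaced with the values for your own model:

```python
import tensorflow as tf

# Placeholder path and tensor names; substitute the values for your own model.
graph_def_file = "/tmp/frozen_graph.pb"
input_arrays = ["input"]
output_arrays = ["MobilenetV1/Predictions/Reshape_1"]

# Build a converter from the frozen GraphDef and write out a TensorFlow Lite FlatBuffer.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file, input_arrays, output_arrays)
tflite_model = converter.convert()

with open("converted_model.tflite", "wb") as f:
  f.write(tflite_model)
```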
This conversion process is shown in the diagram below: ![TFLite converter workflow](../images/convert/workflow.svg) diff --git a/tensorflow/lite/g3doc/guide/get_started.md b/tensorflow/lite/g3doc/guide/get_started.md index 2e42c95cfad..e20dc08d0ca 100644 --- a/tensorflow/lite/g3doc/guide/get_started.md +++ b/tensorflow/lite/g3doc/guide/get_started.md @@ -1,270 +1,286 @@ # Get started with TensorFlow Lite -Using a TensorFlow Lite model in your mobile app requires multiple -considerations: you must choose a pre-trained or custom model, convert the model -to a TensorFLow Lite format, and finally, integrate the model in your app. +TensorFlow Lite provides all the tools you need to convert and run TensorFlow +models on mobile, embedded, and IoT devices. The following guide walks through +each step of the developer workflow and provides links to further instructions. ## 1. Choose a model -Depending on the use case, you can choose one of the popular open-sourced models, -such as *InceptionV3* or *MobileNets*, and re-train these models with a custom -data set or even build your own custom model. + + +TensorFlow Lite allows you to run TensorFlow models on a wide range of devices. +A TensorFlow model is a data structure that contains the logic and knowledge of +a machine learning network trained to solve a particular problem. + +There are many ways to obtain a TensorFlow model, from using pre-trained models +to training your own. To use a model with TensorFlow Lite it must be converted +into a special format. This is explained in section 2, +[Convert the model](#2_convert_the_model_format). + +Note: Not all TensorFlow models will work with TensorFlow Lite, since the +interpreter supports a limited subset of TensorFlow operations. See section 2, +[Convert the model](#2_convert_the_model_format) to learn about compatibility. ### Use a pre-trained model -[MobileNets](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html) -is a family of mobile-first computer vision models for TensorFlow designed to -effectively maximize accuracy, while taking into consideration the restricted -resources for on-device or embedded applications. MobileNets are small, -low-latency, low-power models parameterized to meet the resource constraints for -a variety of uses. They can be used for classification, detection, embeddings, and -segmentation—similar to other popular large scale models, such as -[Inception](https://arxiv.org/pdf/1602.07261.pdf). Google provides 16 pre-trained -[ImageNet](http://www.image-net.org/challenges/LSVRC/) classification checkpoints -for MobileNets that can be used in mobile projects of all sizes. +The TensorFlow Lite team provides a set of pre-trained models that solve a +variety of machine learning problems. These models have been converted to work +with TensorFlow Lite and are ready to use in your applications. -[Inception-v3](https://arxiv.org/abs/1512.00567) is an image recognition model -that achieves fairly high accuracy recognizing general objects with 1000 classes, -for example, "Zebra", "Dalmatian", and "Dishwasher". The model extracts general -features from input images using a convolutional neural network and classifies -them based on those features with fully-connected and softmax layers. +The pre-trained models include: -[On Device Smart Reply](https://research.googleblog.com/2017/02/on-device-machine-intelligence.html) -is an on-device model that provides one-touch replies for incoming text messages -by suggesting contextually relevant messages. 
The model is built specifically for -memory constrained devices, such as watches and phones, and has been successfully -used in Smart Replies on Android Wear. Currently, this model is Android-specific. +* [Image classification](../models/image_classification/overview.md) +* [Object detection](../models/object_detection/overview.md) +* [Smart reply](../models/smart_reply/overview.md) +* [Pose estimation](../models/pose_estimation/overview.md) +* [Segmentation](../models/segmentation/overview.md) -These pre-trained models are [available for download](hosted_models.md). +See our full list of pre-trained models in [Models](../models). -### Re-train Inception-V3 or MobileNet for a custom data set +#### Models from other sources -These pre-trained models were trained on the *ImageNet* data set which contains -1000 predefined classes. If these classes are not sufficient for your use case, -the model will need to be re-trained. This technique is called -*transfer learning* and starts with a model that has been already trained on a -problem, then retrains the model on a similar problem. Deep learning from -scratch can take days, but transfer learning is fairly quick. In order to do -this, you need to generate a custom data set labeled with the relevant classes. +There are many other places you can obtain pre-trained TensorFlow models, +including [TensorFlow Hub](https://www.tensorflow.org/hub). In most cases, these +models will not be provided in the TensorFlow Lite format, and you'll have to +[convert](#2_convert_the_model_format) them before use. -The [TensorFlow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/) -codelab walks through the re-training process step-by-step. The code supports -both floating point and quantized inference. +### Re-train a model (transfer learning) + +Transfer learning allows you to take a trained model and re-train it to perform +another task. For example, an +[image classification](../models/image_classification/overview.md) model could +be retrained to recognize new categories of image. Re-training takes less time +and requires less data than training a model from scratch. + +You can use transfer learning to customize pre-trained models to your +application. Learn how to perform transfer learning in the +Recognize +flowers with TensorFlow codelab. ### Train a custom model -A developer may choose to train a custom model using Tensorflow (see the -[TensorFlow tutorials](https://www.tensorflow.org/tutorials/) for examples of building and training -models). If you have already written a model, the first step is to export this -to a `tf.GraphDef` file. This is required because some formats do not store the -model structure outside the code, and we must communicate with other parts of -the framework. See -[Exporting the Inference Graph](https://www.tensorflow.org/tutorials/keras/save_and_restore_models#save_the_entire_model) -to create file for the custom model. +If you have designed and trained your own TensorFlow model, or you have trained +a model obtained from another source, you should convert it to the TensorFlow +Lite format before use. -TensorFlow Lite currently supports a subset of TensorFlow operators. Refer to -the [TensorFlow Lite & TensorFlow Compatibility Guide](ops_compatibility.md) -for supported operators and their usage. This set of operators will continue to -grow in future Tensorflow Lite releases. +## 2. Convert the model -## 2. 
Convert the model format + -The [TensorFlow Lite Converter](../convert/index.md) accepts the following file -formats: +TensorFlow Lite is designed to execute models efficiently on devices. Some of +this efficiency comes from the use of a special format for storing models. +TensorFlow models must be converted into this format before they can be used by +TensorFlow Lite. -* `SavedModel` — A `GraphDef` and checkpoint with a signature that labels - input and output arguments to a model. See the documentation for converting - SavedModels using [Python](../convert/python_api.md#basic_savedmodel) or using - the [command line](../convert/cmdline_examples.md#savedmodel). -* `tf.keras` - A HDF5 file containing a model with weights and input and - output arguments generated by `tf.Keras`. See the documentation for - converting HDF5 models using - [Python](../convert/python_api.md#basic_keras_file) or using the - [command line](../convert/cmdline_examples.md#keras). -* `frozen tf.GraphDef` — A subclass of `tf.GraphDef` that does not contain - variables. A `GraphDef` can be converted to a `frozen GraphDef` by taking a - checkpoint and a `GraphDef`, and converting each variable into a constant - using the value retrieved from the checkpoint. Instructions on converting a - `tf.GraphDef` to a TensorFlow Lite model are described in the next - subsection. +Converting models reduces their file size and introduces optimizations that do +not affect accuracy. Developers can opt to further reduce file size and increase +speed of execution in exchange for some trade-offs. You can use the TensorFlow +Lite converter to choose which optimizations to apply. -### Converting a tf.GraphDef +TensorFlow Lite supports a limited subset of TensorFlow operations, so not all +models can be converted. See [Ops compatibility](#ops-compatibility) for more +information. -TensorFlow models may be saved as a .pb or .pbtxt `tf.GraphDef` file. In order -to convert the `tf.GraphDef` file to TensorFlow Lite, the model must first be -frozen. This process involves several file formats including the `frozen -GraphDef`: +### TensorFlow Lite converter -* `tf.GraphDef` (.pb or .pbtxt) — A protobuf that represents the TensorFlow - training or computation graph. It contains operators, tensors, and variables - definitions. -* *checkpoint* (.ckpt) — Serialized variables from a TensorFlow graph. Since - this does not contain a graph structure, it cannot be interpreted by itself. -* *TensorFlow Lite model* (.tflite) — A serialized - [FlatBuffer](https://google.github.io/flatbuffers/) that contains TensorFlow - Lite operators and tensors for the TensorFlow Lite interpreter. +The [TensorFlow Lite converter](../convert) is a tool that converts trained +TensorFlow models into the TensorFlow Lite format. It can also introduce +optimizations, which are covered in section 4, +[Optimize your model](#4_optimize_your_model_optional). -You must have checkpoints that contain trained weights. The `tf.GraphDef` file -only contains the structure of the graph. The process of merging the checkpoint -values with the graph structure is called *freezing the graph*. +The converter is available as a Python API. The following example shows a +TensorFlow `SavedModel` being converted into the TensorFlow Lite format: -`tf.GraphDef` and checkpoint files for MobileNet models are available -[here](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md). 
+```python +import tensorflow as tf -To freeze the graph, use the following command (changing the arguments): - -``` -freeze_graph --input_graph=/tmp/mobilenet_v1_224.pb \ - --input_checkpoint=/tmp/checkpoints/mobilenet-10202.ckpt \ - --input_binary=true \ - --output_graph=/tmp/frozen_mobilenet_v1_224.pb \ - --output_node_names=MobileNetV1/Predictions/Reshape_1 +converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir) +tflite_model = converter.convert() +open("converted_model.tflite", "wb").write(tflite_model) ``` -Set the `input_binary` flag to `True` when reading a binary protobuf, a `.pb` -file. Set to `False` for a `.pbtxt` file. +You can [convert TensorFlow 2.0 models](../r2/convert) in a similar way. -Set `input_graph` and `input_checkpoint` to the respective filenames. The -`output_node_names` may not be obvious outside of the code that built the model. -The easiest way to find them is to visualize the graph, either with -[TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard) or -`graphviz`. +The converter can also be used from the +[command line](../convert/cmdline_examples), but the Python API is recommended. -The frozen `GraphDef` is now ready for conversion to the `FlatBuffer` format -(.tflite) for use on Android or iOS devices. For Android, the TensorFlow Lite -Converter tool supports both float and quantized models. To convert the frozen -`GraphDef` to the .tflite format use a command similar to the following: +### Options -``` -tflite_convert \ - --output_file=/tmp/mobilenet_v1_1.0_224.tflite \ - --graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \ - --input_arrays=input \ - --output_arrays=MobilenetV1/Predictions/Reshape_1 -``` +The converter can convert from a variety of input types. -The -[frozen_graph.pb](https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz) -file used here is available for download. Setting the `input_array` and -`output_array` arguments is not straightforward. The easiest way to find these -values is to explore the graph using -[TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard). Reuse -the arguments for specifying the output nodes for inference in the -`freeze_graph` step. +When [converting TensorFlow 1.x models](../convert/python_api), these are: -### Full converter reference +* [SavedModel directories](https://www.tensorflow.org/alpha/guide/saved_model) +* Frozen GraphDef (models generated by + [freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py)) +* [Keras](https://keras.io) HDF5 models +* Models taken from a `tf.Session` -The [TensorFlow Lite Converter](../convert/index.md) can be -[Python](../convert/python_api.md) or from the -[command line](../convert/cmdline_examples.md). This allows you to integrate the -conversion step into the model design workflow, ensuring the model is easy to -convert to a mobile inference graph. +When [converting TensorFlow 2.x models](../r2/convert/python_api), these are: + +* [SavedModel directories](https://www.tensorflow.org/alpha/guide/saved_model) +* [`tf.keras` models](https://www.tensorflow.org/alpha/guide/keras/overview) +* [Concrete functions](../r2/convert/concrete_function.md) + +The converter can be configured to apply various optimizations that can improve +performance or reduce file size. This is covered in section 4, +[Optimize your model](#4_optimize_your_model_optional). 
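Each of the TF 1.x input types has its own entry point on `tf.lite.TFLiteConverter` (`from_saved_model`, `from_frozen_graph`, `from_keras_model_file`, and `from_session`). As a minimal sketch, a `tf.keras` HDF5 model could be converted as follows, where `model.h5` is a hypothetical file produced by `model.save()`:

```python
import tensorflow as tf

# "model.h5" is a hypothetical HDF5 file saved earlier with model.save("model.h5").
converter = tf.lite.TFLiteConverter.from_keras_model_file("model.h5")
tflite_model = converter.convert()

with open("keras_model.tflite", "wb") as f:
  f.write(tflite_model)
```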
### Ops compatibility -Refer to the [ops compatibility guide](ops_compatibility.md) for -troubleshooting help, and if that doesn't help, please -[file an issue](https://github.com/tensorflow/tensorflow/issues). +TensorFlow Lite currently supports a [limited subset](ops_compatibility.md) of +TensorFlow operations. The long term goal is for all TensorFlow operations to be +supported. -### Graph Visualization tool +If the model you wish to convert contains unsupported operations, you can use +[TensorFlow Select](ops_select.md) to include operations from TensorFlow. This +will result in a larger binary being deployed to devices. -The [development repo](https://github.com/tensorflow/tensorflow) contains a tool -to visualize TensorFlow Lite models after conversion. To build the -[visualize.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/visualize.py) -tool: +## 3. Run inference with the model -```sh -bazel run tensorflow/lite/tools:visualize -- model.tflite model_viz.html + + +*Inference* is the process of running data through a model to obtain +predictions. It requires a model, an interpreter, and input data. + +### TensorFlow Lite interpreter + +The [TensorFlow Lite interpreter](inference.md) is a library that takes a model +file, executes the operations it defines on input data, and provides access to +the output. + +The interpreter works across multiple platforms and provides a simple API for +running TensorFlow Lite models from Java, Swift, Objective-C, C++, and Python. + +The following code shows the interpreter being invoked from Java: + +```java +try (Interpreter interpreter = new Interpreter(tensorflow_lite_model_file)) { + interpreter.run(input, output); +} ``` -This generates an interactive HTML page listing subgraphs, operations, and a -graph visualization. +### GPU acceleration and Delegates -## 3. Use the TensorFlow Lite model for inference in a mobile app +Some devices provide hardware acceleration for machine learning operations. For +example, most mobile phones have GPUs, which can perform floating point matrix +operations faster than a CPU. -After completing the prior steps, you should now have a `.tflite` model file. +The speed-up can be substantial. For example, a MobileNet v1 image +classification model runs 5.5x faster on a Pixel 3 phone when GPU acceleration +is used. -### Android +The TensorFlow Lite interpreter can be configured with +[Delegates](../performance/delegates.md) to make use of hardware acceleration on +different devices. The [GPU Delegate](../performance/gpu.md) allows the +interpreter to run appropriate operations on the device's GPU. -Since Android apps are written in Java and the core TensorFlow library is in C++, -a JNI library is provided as an interface. This is only meant for inference—it -provides the ability to load a graph, set up inputs, and run the model to -calculate outputs. +The following code shows the GPU Delegate being used from Java: -The open source Android demo app uses the JNI interface and is available -[on GitHub](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/java/demo/app). -You can also download a -[prebuilt APK](http://download.tensorflow.org/deps/tflite/TfLiteCameraDemo.apk). -See the Android demo guide for details. 
+```java +GpuDelegate delegate = new GpuDelegate(); +Interpreter.Options options = (new Interpreter.Options()).addDelegate(delegate); +Interpreter interpreter = new Interpreter(tensorflow_lite_model_file, options); +try { + interpreter.run(input, output); +} +``` -The Android mobile guide has instructions for -installing TensorFlow on Android and setting up `bazel` and Android Studio. +To add support for new hardware accelerators you can +[define your own delegate](../performance/delegates.md#how_to_add_a_delegate). -### iOS +### Android and iOS -To integrate a TensorFlow model in an iOS app, see the -[TensorFlow Lite for iOS](ios.md) guide and iOS demo -guide. +The TensorFlow Lite interpreter is easy to use from both major mobile platforms. +To get started, explore the [Android quickstart](android.md) and +[iOS quickstart](ios.md) guides. +[Example applications](https://www.tensorflow.org/lite/examples) are available +for both platforms. -#### Core ML support +To obtain the required libraries, Android developers should use the +[TensorFlow Lite AAR](android.md#use_the_tensorflow_lite_aar_from_jcenter). iOS +developers should use the +[CocoaPods for Swift or Objective-C](ios.md#add_tensorflow_lite_to_your_swift_or_objective-c_project). -Core ML is a machine learning framework used in Apple products. In addition to -using Tensorflow Lite models directly in your applications, you can convert -trained Tensorflow models to the -[CoreML](https://developer.apple.com/machine-learning/) format for use on Apple -devices. To use the converter, refer to the -[Tensorflow-CoreML converter documentation](https://github.com/tf-coreml/tf-coreml). +### Linux -### ARM32 and ARM64 Linux +Embedded Linux is an important platform for deploying machine learning. We +provide build instructions for both [Raspberry Pi](build_rpi.md) and +[Arm64-based boards](build_arm64.md) such as Odroid C2, Pine64, and NanoPi. -Compile Tensorflow Lite for a Raspberry Pi by following the -[RPi build instructions](build_rpi.md) Compile Tensorflow Lite for a generic aarch64 -board such as Odroid C2, Pine64, NanoPi, and others by following the -[ARM64 Linux build instructions](build_arm64.md) This compiles a static -library file (`.a`) used to build your app. There are plans for Python bindings -and a demo app. +### Microcontrollers -## 4. Optimize your model (optional) +[TensorFlow Lite for Microcontrollers](../microcontrollers/overview.md) is an +experimental port of TensorFlow Lite aimed at microcontrollers and other devices +with only kilobytes of memory. -There are two options. If you plan to run on CPU, we recommend that you quantize -your weights and activation tensors. If the hardware is available, another -option is to run on GPU for massively parallelizable workloads. +### Operations + +If your model requires TensorFlow operations that are not yet implemented in +TensorFlow Lite, you can use [TensorFlow Select](ops_select.md) to use them in +your model. You'll need to build a custom version of the interpreter that +includes the TensorFlow operations. + +You can use [Custom operators](ops_custom.md) to write your own operations, or +port new operations into TensorFlow Lite. + +[Operator versions](ops_version.md) allows you to add new functionalities and +parameters into existing operations. + +## 4. Optimize your model + + + +TensorFlow Lite provides tools to optimize the size and performance of your +models, often with minimal impact on accuracy. 
Optimized models may require
+slightly more complex training, conversion, or integration.
+
+Machine learning optimization is an evolving field, and TensorFlow Lite's
+[Model Optimization Toolkit](#model-optimization-toolkit) is continually growing
+as new techniques are developed.
+
+### Performance
+
+The goal of model optimization is to reach the ideal balance of performance,
+model size, and accuracy on a given device.
+[Performance best practices](../performance/best_practices.md) can help guide
+you through this process.

### Quantization

-Compress your model size by lowering the precision of the parameters (i.e.
-neural network weights) from their training-time 32-bit floating-point
-representations into much smaller and efficient 8-bit integer ones.
-This will execute the heaviest computations fast in lower precision, but the
-most sensitive ones with higher precision, thus typically resulting in little to
-no final accuracy losses for the task, yet a significant speed-up over pure
-floating-point execution.
+By reducing the precision of values and operations within a model, quantization
+can reduce both the size of a model and the time required for inference. For many
+models, there is only a minimal loss of accuracy.

-The post-training quantization technique is integrated into the TensorFlow Lite
-conversion tool. Getting started is easy: after building your TensorFlow model,
-simply enable the ‘post_training_quantize’ flag in the TensorFlow Lite
-conversion tool. Assuming that the saved model is stored in saved_model_dir, the
-quantized tflite flatbuffer can be generated in command line:
+The TensorFlow Lite converter makes it easy to quantize TensorFlow models. The
+following Python code quantizes a `SavedModel` and saves it to disk:
+
+```python
+import tensorflow as tf

-```
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
+open("converted_model.tflite", "wb").write(tflite_quant_model)
```

-Read the full documentation [here](../performance/post_training_quantization.md)
-and see a tutorial
-[here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tutorials/post_training_quant.ipynb).
+To learn more about quantization, see
+[Post-training quantization](../performance/post_training_quantization.md).

-### GPU
-Run on GPU GPUs are designed to have high throughput for massively
-parallelizable workloads. Thus, they are well-suited for deep neural nets, which
-consist of a huge number of operators, each working on some input tensor(s) that
-can be easily divided into smaller workloads and carried out in parallel,
-typically resulting in lower latency.
+### Model Optimization Toolkit

-Another benefit with GPU inference is its power efficiency. GPUs carry out the
-computations in a very efficient and optimized manner, so that they consume less
-power and generate less heat than when the same task is run on CPUs.
+The [Model Optimization Toolkit](../performance/model_optimization.md) is a set
+of tools and techniques designed to make it easy for developers to optimize
+their models. Many of the techniques can be applied to all TensorFlow models and
+are not specific to TensorFlow Lite, but they are especially valuable when
+running inference on devices with limited resources.

-Read the tutorial [here](../performance/gpu.md) and full documentation [here](../performance/gpu_advanced.md).
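After converting and optimizing a model, it is worth checking that it still loads and produces sensible output. The following minimal sketch uses the Python interpreter to run one inference on random data, assuming the `converted_model.tflite` file written by the examples above:

```python
import numpy as np
import tensorflow as tf

# Load the FlatBuffer written by the conversion examples above and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed random data shaped like the model's input, run inference, and read the output.
input_shape = input_details[0]["shape"]
input_data = np.random.random_sample(input_shape).astype(input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

output_data = interpreter.get_tensor(output_details[0]["index"])
print(output_data.shape)
```

For real input pipelines and the full interpreter API, see [TensorFlow Lite inference](inference.md).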
+## Next steps + +Now that you're familiar with TensorFlow Lite, explore some of the following +resources: + +* If you're a mobile developer, visit [Android quickstart](android.md) or + [iOS quickstart](ios.md). +* Explore our [pre-trained models](../models). +* Try our [example apps](https://www.tensorflow.org/lite/examples). diff --git a/tensorflow/lite/g3doc/guide/index.md b/tensorflow/lite/g3doc/guide/index.md index 288f7a07576..2475c7e1132 100644 --- a/tensorflow/lite/g3doc/guide/index.md +++ b/tensorflow/lite/g3doc/guide/index.md @@ -1,202 +1,121 @@ - # TensorFlow Lite guide -TensorFlow Lite is TensorFlow’s lightweight solution for mobile and embedded -devices. It enables on-device machine learning inference with low latency and a -small binary size. TensorFlow Lite also supports hardware acceleration with the -[Android Neural Networks -API](https://developer.android.com/ndk/guides/neuralnetworks/index.html). +TensorFlow Lite is a set of tools to help developers run TensorFlow models on +mobile, embedded, and IoT devices. It enables on-device machine learning +inference with low latency and a small binary size. -TensorFlow Lite uses many techniques for achieving low latency such as -optimizing the kernels for mobile apps, pre-fused activations, and quantized -kernels that allow smaller and faster (fixed-point math) models. +TensorFlow Lite consists of two main components: -Most of our TensorFlow Lite documentation is [on -GitHub](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite) -for the time being. +- The [TensorFlow Lite interpreter](inference.md), which runs specially + optimized models on many different hardware types, including mobile phones, + embedded Linux devices, and microcontrollers. +- The [TensorFlow Lite converter](../convert/index.md), which converts + TensorFlow models into an efficient form for use by the interpreter, and can + introduce optimizations to improve binary size and performance. -## What does TensorFlow Lite contain? +### Machine learning at the edge -TensorFlow Lite supports a set of core operators, both quantized and -float, which have been tuned for mobile platforms. They incorporate pre-fused -activations and biases to further enhance performance and quantized -accuracy. Additionally, TensorFlow Lite also supports using custom operations in -models. +TensorFlow Lite is designed to make it easy to perform machine learning on +devices, "at the edge" of the network, instead of sending data back and forth +from a server. For developers, performing machine learning on-device can help +improve: -TensorFlow Lite defines a new model file format, based on -[FlatBuffers](https://google.github.io/flatbuffers/). FlatBuffers is an -efficient open-source cross-platform serialization library. It is similar to -[protocol buffers](https://developers.google.com/protocol-buffers/?hl=en), but -the primary difference is that FlatBuffers does not need a parsing/unpacking -step to a secondary representation before you can access data, often coupled -with per-object memory allocation. Also, the code footprint of FlatBuffers is an -order of magnitude smaller than protocol buffers. +* *Latency:* there's no round-trip to a server +* *Privacy:* no data needs to leave the device +* *Connectivity:* an Internet connection isn't required +* *Power consumption:* network connections are power hungry -TensorFlow Lite has a new mobile-optimized interpreter, which has the key goals -of keeping apps lean and fast. 
The interpreter uses a static graph ordering and -a custom (less-dynamic) memory allocator to ensure minimal load, initialization, -and execution latency. +TensorFlow Lite works with a huge range of devices, from tiny microcontrollers +to powerful mobile phones. -TensorFlow Lite provides an interface to leverage hardware acceleration, if -available on the device. It does so via the -[Android Neural Networks API](https://developer.android.com/ndk/guides/neuralnetworks/index.html), -available on Android 8.1 (API level 27) and higher. +Key Point: The TensorFlow Lite binary is smaller than 300KB when all supported +operators are linked, and less than 200KB when using only the operators needed +for supporting the common image classification models InceptionV3 and MobileNet. -## Why do we need a new mobile-specific library? +## Get started -Machine Learning is changing the computing paradigm, and we see an emerging -trend of new use cases on mobile and embedded devices. Consumer expectations are -also trending toward natural, human-like interactions with their devices, driven -by the camera and voice interaction models. +To begin working with TensorFlow Lite, visit [Get started](get_started.md). -There are several factors which are fueling interest in this domain: +## Key features -- Innovation at the silicon layer is enabling new possibilities for hardware - acceleration, and frameworks such as the Android Neural Networks API make it - easy to leverage these. +* *[Interpreter](inference.md) tuned for on-device ML*, supporting a set of + core operators that are optimized for on-device applications, and with a + small binary size. +* *Diverse platform support*, covering [Android](android.md) and [iOS](ios.md) + devices, embedded Linux, and microcontrollers, making use of platform APIs + for accelerated inference. +* *APIs for multiple languages* including Java, Swift, Objective-C, C++, and + Python. +* *High performance*, with [hardware acceleration](../performance/gpu.md) on + supported devices, device-optimized kernels, and + [pre-fused activations and biases](ops_compatibility.md). +* *Model optimization tools*, including + [quantization](../performance/post_training_quantization.md), that can + reduce size and increase performance of models without sacrificing accuracy. +* *Efficient model format*, using a [FlatBuffer](../convert/index.md) that is + optimized for small size and portability. +* *[Pre-trained models](../models)* for common machine learning tasks that can + be customized to your application. +* *[Samples and tutorials](https://www.tensorflow.org/examples)* that show you + how to deploy machine learning models on supported platforms. -- Recent advances in real-time computer-vision and spoken language understanding - have led to mobile-optimized benchmark models being open sourced - (e.g. MobileNets, SqueezeNet). +## Development workflow -- Widely-available smart appliances create new possibilities for - on-device intelligence. +The workflow for using TensorFlow Lite involves the following steps: -- Interest in stronger user data privacy paradigms where user data does not need - to leave the mobile device. +1. **Pick a model** -- Ability to serve ‘offline’ use cases, where the device does not need to be - connected to a network. + Bring your own TensorFlow model, find a model online, or pick a model from + our [Pre-trained models](../models) to drop in or retrain. 
-We believe the next wave of machine learning applications will have significant -processing on mobile and embedded devices. +1. **Convert the model** -## TensorFlow Lite highlights + If you're using a custom model, use the + [TensorFlow Lite converter](../convert/index.md) and a few lines of Python + to convert it to the TensorFlow Lite format. -TensorFlow Lite provides: +1. **Deploy to your device** -- A set of core operators, both quantized and float, many of which have been - tuned for mobile platforms. These can be used to create and run custom - models. Developers can also write their own custom operators and use them in - models. + Run your model on-device with the + [TensorFlow Lite interpreter](inference.md), with APIs in many languages. -- A new [FlatBuffers](https://google.github.io/flatbuffers/)-based - model file format. +1. **Optimize your model** -- On-device interpreter with kernels optimized for faster execution on mobile. + Use our [Model Optimization Toolkit](../performance/model_optimization.md) + to reduce your model's size and increase its efficiency with minimal impact + on accuracy. -- TensorFlow converter to convert TensorFlow-trained models to the TensorFlow - Lite format. +To learn more about using TensorFlow Lite in your project, see +[Get started](get_started.md). -- Smaller in size: TensorFlow Lite is smaller than 300KB when all supported - operators are linked and less than 200KB when using only the operators needed - for supporting InceptionV3 and Mobilenet. +## Technical constraints -- **Pre-tested models:** +TensorFlow Lite plans to provide high performance on-device inference for any +TensorFlow model. However, the TensorFlow Lite interpreter currently supports a +limited subset of TensorFlow operators that have been optimized for on-device +use. This means that some models require additional steps to work with +TensorFlow Lite. - All of the following models are guaranteed to work out of the box: +To learn which operators are available, see +[Operator compatibility](ops_compatibility.md). - - Inception V3, a popular model for detecting the dominant objects - present in an image. +If your model uses operators that are not yet supported by TensorFlow Lite +interpreter, you can use [TensorFlow Select](ops_select.md) to include +TensorFlow operations in your TensorFlow Lite build. However, this will lead to +an increased binary size. - - [MobileNets](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md), - a family of mobile-first computer vision models designed to effectively - maximize accuracy while being mindful of the restricted resources for an - on-device or embedded application. They are small, low-latency, low-power - models parameterized to meet the resource constraints of a variety of use - cases. They can be built upon for classification, detection, embeddings - and segmentation. MobileNet models are smaller but [lower in - accuracy](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html) - than Inception V3. +TensorFlow Lite does not currently support on-device training, but it is in our +[Roadmap](roadmap.md), along with other planned improvements. - - On Device Smart Reply, an on-device model which provides one-touch - replies for an incoming text message by suggesting contextually relevant - messages. 
The model was built specifically for memory constrained devices - such as watches & phones and it has been successfully used to surface - [Smart Replies on Android - Wear](https://research.googleblog.com/2017/02/on-device-machine-intelligence.html) - to all first-party and third-party apps. +## Next steps - Also see the complete list of - [TensorFlow Lite's supported models](hosted_models.md), - including the model sizes, performance numbers, and downloadable model files. +Want to keep learning about TensorFlow Lite? Here are some next steps: -- Quantized versions of the MobileNet model, which runs faster than the - non-quantized (float) version on CPU. - -- New Android demo app to illustrate the use of TensorFlow Lite with a quantized - MobileNet model for object classification. - -- Java and C++ API support - - -## Getting Started - -We recommend you try out TensorFlow Lite with the pre-tested models indicated -above. If you have an existing model, you will need to test whether your model -is compatible with both the converter and the supported operator set. To test -your model, see the -[documentation on GitHub](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite). - -### Retrain Inception-V3 or MobileNet for a custom data set - -The pre-trained models mentioned above have been trained on the ImageNet data -set, which consists of 1000 predefined classes. If those classes are not -relevant or useful for your use case, you will need to retrain those -models. This technique is called transfer learning, which starts with a model -that has been already trained on a problem and will then be retrained on a -similar problem. Deep learning from scratch can take days, but transfer learning -can be done fairly quickly. In order to do this, you'll need to generate your -custom data set labeled with the relevant classes. - -The [TensorFlow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/) -codelab walks through this process step-by-step. The retraining code supports -retraining for both floating point and quantized inference. - -## TensorFlow Lite Architecture - -The following diagram shows the architectural design of TensorFlow Lite: - -TensorFlow Lite architecture diagram - -Starting with a trained TensorFlow model on disk, you'll convert that model to -the TensorFlow Lite file format (`.tflite`) using the TensorFlow Lite -Converter. Then you can use that converted file in your mobile application. - -Deploying the TensorFlow Lite model file uses: - -- Java API: A convenience wrapper around the C++ API on Android. - -- C++ API: Loads the TensorFlow Lite Model File and invokes the Interpreter. The - same library is available on both Android and iOS. - -- Interpreter: Executes the model using a set of kernels. The interpreter - supports selective kernel loading; without kernels it is only 100KB, and 300KB - with all the kernels loaded. This is a significant reduction from the 1.5M - required by TensorFlow Mobile. - -- On select Android devices, the Interpreter will use the Android Neural - Networks API for hardware acceleration, or default to CPU execution if none - are available. - -You can also implement custom kernels using the C++ API that can be used by the -Interpreter. 
- -## Future Work - -In future releases, TensorFlow Lite will support more models and built-in -operators, contain performance improvements for both fixed point and floating -point models, improvements to the tools to enable easier developer workflows and -support for other smaller devices and more. As we continue development, we hope -that TensorFlow Lite will greatly simplify the developer experience of targeting -a model for small devices. - -Future plans include using specialized machine learning hardware to get the best -possible performance for a particular model on a particular device. - -## Next Steps - -The TensorFlow Lite [GitHub repository](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite). -contains additional docs, code samples, and demo applications. +* Visit [Get started](get_started.md) to walk through the process of using + TensorFlow Lite. +* If you're a mobile developer, visit [Android quickstart](android.md) or + [iOS quickstart](ios.md). +* Learn about + [TensorFlow Lite for Microcontrollers](../microcontrollers/overview.md). +* Explore our [pre-trained models](../models). +* Try our [example apps](https://www.tensorflow.org/lite/examples). diff --git a/tensorflow/lite/g3doc/guide/inference.md b/tensorflow/lite/g3doc/guide/inference.md index b0107ece0b1..353a656740e 100644 --- a/tensorflow/lite/g3doc/guide/inference.md +++ b/tensorflow/lite/g3doc/guide/inference.md @@ -1,16 +1,15 @@ # TensorFlow Lite inference -[TOC] +The term *inference* refers to the process of executing a TensorFlow Lite model +on-device in order to make predictions based on input data. Inference is the +final step in using the model on-device. -## Overview +Inference for TensorFlow Lite models is run through an interpreter. The +TensorFlow Lite interpreter is designed to be lean and fast. The interpreter +uses a static graph ordering and a custom (less-dynamic) memory allocator to +ensure minimal load, initialization, and execution latency. -TensorFlow Lite inference is the process of executing a TensorFlow Lite -model on-device and extracting meaningful results from it. Inference is the -final step in using the model on-device in the -[architecture](index.md#tensorflow_lite_architecture). - -Inference for TensorFlow Lite models is run through an interpreter. This -document outlines the various APIs for the interpreter along with the +This document outlines the various APIs for the interpreter, along with the [supported platforms](#supported-platforms). ### Important Concepts @@ -43,19 +42,27 @@ TensorFlow Lite inference on device typically follows the following steps. present it to their user. ### Supported Platforms + TensorFlow inference APIs are provided for most common mobile/embedded platforms such as Android, iOS and Linux. #### Android + On Android, TensorFlow Lite inference can be performed using either Java or C++ APIs. The Java APIs provide convenience and can be used directly within your -Android Activity classes. The C++ APIs on the other hand may offer more -flexibility and speed, but may require writing JNI wrappers to move data between -Java and C++ layers. You can find an example [here](android.md). +Android Activity classes. The C++ APIs offer more flexibility and speed, but may +require writing JNI wrappers to move data between Java and C++ layers. + +Visit the [Android quickstart](android.md) for a tutorial and example code. #### iOS -TensorFlow Lite provides Swift/Objective C++ APIs for inference on iOS. An -example can be found [here](ios.md). 
+ +TensorFlow Lite provides native iOS libraries written in +[Swift](https://www.tensorflow.org/code/tensorflow/lite/experimental/swift) +and +[Objective-C](https://www.tensorflow.org/code/tensorflow/lite/experimental/objc). + +Visit the [iOS quickstart](ios.md) for a tutorial and example code. #### Linux On Linux platforms such as [Raspberry Pi](build_rpi.md), TensorFlow Lite C++ diff --git a/tensorflow/lite/g3doc/models/image_classification/overview.md b/tensorflow/lite/g3doc/models/image_classification/overview.md index 844934e467e..d4046c95cfb 100644 --- a/tensorflow/lite/g3doc/models/image_classification/overview.md +++ b/tensorflow/lite/g3doc/models/image_classification/overview.md @@ -280,5 +280,5 @@ trees in the original training data. To do this, you will need a set of training images for each of the new labels you wish to train. Learn how to perform transfer learning in the -TensorFlow -for Poets codelab. +Recognize +flowers with TensorFlow codelab. diff --git a/tensorflow/lite/g3doc/models/smart_reply/overview.md b/tensorflow/lite/g3doc/models/smart_reply/overview.md index 20c359ec9ff..b2363adcf48 100644 --- a/tensorflow/lite/g3doc/models/smart_reply/overview.md +++ b/tensorflow/lite/g3doc/models/smart_reply/overview.md @@ -13,12 +13,15 @@ starter model and labels ### Sample application -We have provided a pre-built APK that demonstrates the smart reply model on -Android. +There is a TensorFlow Lite sample application that demonstrates the smart reply +model on Android. -Go to the -GitHub -page for instructions and list of supported ops and functionalities. +View +Android example + +Read the +[GitHub page](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/models/smartreply/g3doc) +to learn how the app works. ## How it works