TensorFlow Lite documentation

PiperOrigin-RevId: 246953700
A. Unique TensorFlower 2019-05-06 21:35:39 -07:00 committed by TensorFlower Gardener
parent 60524c4167
commit 43c7b99a10
6 changed files with 353 additions and 401 deletions


@@ -1,15 +1,23 @@
# TensorFlow Lite converter

The TensorFlow Lite converter is used to convert TensorFlow models into an
optimized [FlatBuffer](https://google.github.io/flatbuffers/) format, so that
they can be used by the TensorFlow Lite interpreter.

Note: This page contains documentation on the converter API for TensorFlow 1.x.
The API for TensorFlow 2.0 is available
[here](https://www.tensorflow.org/lite/r2/convert/).

## FlatBuffers

FlatBuffers is an efficient open-source cross-platform serialization library. It
is similar to
[protocol buffers](https://developers.google.com/protocol-buffers), with the
distinction that FlatBuffers do not need a parsing/unpacking step to a secondary
representation before data can be accessed, avoiding per-object memory
allocation. The code footprint of FlatBuffers is an order of magnitude smaller
than that of protocol buffers.

## From model training to device deployment

The TensorFlow Lite converter generates a TensorFlow Lite
@@ -20,14 +28,13 @@ The converter supports the following input formats:

* [SavedModels](https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators)
* Frozen `GraphDef`: Models generated by
  [freeze_graph.py](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py).
* `tf.keras` HDF5 models.
* Any model taken from a `tf.Session` (Python API only).

The TensorFlow Lite `FlatBuffer` file is then deployed to a client device, and
the TensorFlow Lite interpreter uses the compressed model for on-device
inference. This conversion process is shown in the diagram below:

![TFLite converter workflow](../images/convert/workflow.svg)
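
For example, a frozen `GraphDef` produced by `freeze_graph.py` can be converted
with the Python API. The snippet below is a minimal sketch using the TensorFlow
1.x `tf.lite.TFLiteConverter`; the file path and the input/output array names
are placeholders that depend on your model:

```python
import tensorflow as tf

# Placeholder path and tensor names; substitute the values for your own model.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="/tmp/frozen_graph.pb",
    input_arrays=["input"],
    output_arrays=["MobilenetV1/Predictions/Reshape_1"])
tflite_model = converter.convert()

with open("converted_model.tflite", "wb") as f:
  f.write(tflite_model)
```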


@@ -1,270 +1,286 @@
# Get started with TensorFlow Lite

TensorFlow Lite provides all the tools you need to convert and run TensorFlow
models on mobile, embedded, and IoT devices. The following guide walks through
each step of the developer workflow and provides links to further instructions.

## 1. Choose a model

<a id="1_choose_a_model"></a>

TensorFlow Lite allows you to run TensorFlow models on a wide range of devices.
A TensorFlow model is a data structure that contains the logic and knowledge of
a machine learning network trained to solve a particular problem.

There are many ways to obtain a TensorFlow model, from using pre-trained models
to training your own. To use a model with TensorFlow Lite, it must be converted
into a special format. This is explained in section 2,
[Convert the model](#2_convert_the_model_format).

Note: Not all TensorFlow models will work with TensorFlow Lite, since the
interpreter supports a limited subset of TensorFlow operations. See section 2,
[Convert the model](#2_convert_the_model_format), to learn about compatibility.

### Use a pre-trained model

The TensorFlow Lite team provides a set of pre-trained models that solve a
variety of machine learning problems. These models have been converted to work
with TensorFlow Lite and are ready to use in your applications.

The pre-trained models include:

* [Image classification](../models/image_classification/overview.md)
* [Object detection](../models/object_detection/overview.md)
* [Smart reply](../models/smart_reply/overview.md)
* [Pose estimation](../models/pose_estimation/overview.md)
* [Segmentation](../models/segmentation/overview.md)

See our full list of pre-trained models in [Models](../models).

#### Models from other sources

There are many other places you can obtain pre-trained TensorFlow models,
including [TensorFlow Hub](https://www.tensorflow.org/hub). In most cases, these
models will not be provided in the TensorFlow Lite format, and you'll have to
[convert](#2_convert_the_model_format) them before use.

### Re-train a model (transfer learning)

Transfer learning allows you to take a trained model and re-train it to perform
another task. For example, an
[image classification](../models/image_classification/overview.md) model could
be retrained to recognize new categories of image. Re-training takes less time
and requires less data than training a model from scratch.

You can use transfer learning to customize pre-trained models to your
application. Learn how to perform transfer learning in the
<a href="https://codelabs.developers.google.com/codelabs/recognize-flowers-with-tensorflow-on-android">Recognize
flowers with TensorFlow</a> codelab.

### Train a custom model

If you have designed and trained your own TensorFlow model, or you have trained
a model obtained from another source, you should convert it to the TensorFlow
Lite format before use.

## 2. Convert the model

<a id="2_convert_the_model_format"></a>

TensorFlow Lite is designed to execute models efficiently on devices. Some of
this efficiency comes from the use of a special format for storing models.
TensorFlow models must be converted into this format before they can be used by
TensorFlow Lite.

Converting models reduces their file size and introduces optimizations that do
not affect accuracy. Developers can opt to further reduce file size and increase
speed of execution in exchange for some trade-offs. You can use the TensorFlow
Lite converter to choose which optimizations to apply.

TensorFlow Lite supports a limited subset of TensorFlow operations, so not all
models can be converted. See [Ops compatibility](#ops-compatibility) for more
information.

### TensorFlow Lite converter

The [TensorFlow Lite converter](../convert) is a tool that converts trained
TensorFlow models into the TensorFlow Lite format. It can also introduce
optimizations, which are covered in section 4,
[Optimize your model](#4_optimize_your_model_optional).

The converter is available as a Python API. The following example shows a
TensorFlow `SavedModel` being converted into the TensorFlow Lite format:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
```

You can [convert TensorFlow 2.0 models](../r2/convert) in a similar way.

The converter can also be used from the
[command line](../convert/cmdline_examples), but the Python API is recommended.

### Options

The converter can convert from a variety of input types.

When [converting TensorFlow 1.x models](../convert/python_api), these are:

* [SavedModel directories](https://www.tensorflow.org/alpha/guide/saved_model)
* Frozen GraphDef (models generated by
  [freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py))
* [Keras](https://keras.io) HDF5 models
* Models taken from a `tf.Session`

When [converting TensorFlow 2.x models](../r2/convert/python_api), these are:

* [SavedModel directories](https://www.tensorflow.org/alpha/guide/saved_model)
* [`tf.keras` models](https://www.tensorflow.org/alpha/guide/keras/overview)
* [Concrete functions](../r2/convert/concrete_function.md)
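
For instance, a `tf.keras` HDF5 file (one of the TensorFlow 1.x input types
listed above) can be converted with a few lines of Python. This is a minimal
sketch; the file name `keras_model.h5` is a placeholder:

```python
import tensorflow as tf

# "keras_model.h5" is a placeholder for a tf.keras model saved in HDF5 format.
converter = tf.lite.TFLiteConverter.from_keras_model_file("keras_model.h5")
tflite_model = converter.convert()

with open("converted_model.tflite", "wb") as f:
  f.write(tflite_model)
```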

The converter can be configured to apply various optimizations that can improve
performance or reduce file size. This is covered in section 4,
[Optimize your model](#4_optimize_your_model_optional).
### Ops compatibility

TensorFlow Lite currently supports a [limited subset](ops_compatibility.md) of
TensorFlow operations. The long term goal is for all TensorFlow operations to be
supported.

If the model you wish to convert contains unsupported operations, you can use
[TensorFlow Select](ops_select.md) to include operations from TensorFlow. This
will result in a larger binary being deployed to devices.

## 3. Run inference with the model

<a id="3_use_the_tensorflow_lite_model_for_inference_in_a_mobile_app"></a>

*Inference* is the process of running data through a model to obtain
predictions. It requires a model, an interpreter, and input data.

### TensorFlow Lite interpreter

The [TensorFlow Lite interpreter](inference.md) is a library that takes a model
file, executes the operations it defines on input data, and provides access to
the output.

The interpreter works across multiple platforms and provides a simple API for
running TensorFlow Lite models from Java, Swift, Objective-C, C++, and Python.

The following code shows the interpreter being invoked from Java:

```java
try (Interpreter interpreter = new Interpreter(tensorflow_lite_model_file)) {
  interpreter.run(input, output);
}
```
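
The interpreter can also be driven from Python. The following is a minimal
sketch that assumes a converted model named `converted_model.tflite` and a
single floating-point input tensor:

```python
import numpy as np
import tensorflow as tf

# Load the TFLite model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed random data matching the model's input shape (assumed to be float32).
input_data = np.random.random_sample(input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], input_data)

interpreter.invoke()

output_data = interpreter.get_tensor(output_details[0]["index"])
print(output_data)
```

The input and output details describe the tensor shapes and types the model
expects, so the same pattern works for any converted model.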
### GPU acceleration and Delegates

Some devices provide hardware acceleration for machine learning operations. For
example, most mobile phones have GPUs, which can perform floating point matrix
operations faster than a CPU.

The speed-up can be substantial. For example, a MobileNet v1 image
classification model runs 5.5x faster on a Pixel 3 phone when GPU acceleration
is used.

The TensorFlow Lite interpreter can be configured with
[Delegates](../performance/delegates.md) to make use of hardware acceleration on
different devices. The [GPU Delegate](../performance/gpu.md) allows the
interpreter to run appropriate operations on the device's GPU.

The following code shows the GPU Delegate being used from Java:

```java
GpuDelegate delegate = new GpuDelegate();
Interpreter.Options options = (new Interpreter.Options()).addDelegate(delegate);
Interpreter interpreter = new Interpreter(tensorflow_lite_model_file, options);

try {
  interpreter.run(input, output);
} finally {
  // Clean up the interpreter and the delegate when finished.
  interpreter.close();
  delegate.close();
}
```

To add support for new hardware accelerators, you can
[define your own delegate](../performance/delegates.md#how_to_add_a_delegate).

### Android and iOS

The TensorFlow Lite interpreter is easy to use from both major mobile platforms.
To get started, explore the [Android quickstart](android.md) and
[iOS quickstart](ios.md) guides.
[Example applications](https://www.tensorflow.org/lite/examples) are available
for both platforms.

To obtain the required libraries, Android developers should use the
[TensorFlow Lite AAR](android.md#use_the_tensorflow_lite_aar_from_jcenter). iOS
developers should use the
[CocoaPods for Swift or Objective-C](ios.md#add_tensorflow_lite_to_your_swift_or_objective-c_project).

### Linux

Embedded Linux is an important platform for deploying machine learning. We
provide build instructions for both [Raspberry Pi](build_rpi.md) and
[Arm64-based boards](build_arm64.md) such as Odroid C2, Pine64, and NanoPi.

### Microcontrollers

[TensorFlow Lite for Microcontrollers](../microcontrollers/overview.md) is an
experimental port of TensorFlow Lite aimed at microcontrollers and other devices
with only kilobytes of memory.

### Operations

If your model requires TensorFlow operations that are not yet implemented in
TensorFlow Lite, you can use [TensorFlow Select](ops_select.md) to use them in
your model. You'll need to build a custom version of the interpreter that
includes the TensorFlow operations.

You can use [Custom operators](ops_custom.md) to write your own operations, or
port new operations into TensorFlow Lite.

[Operator versions](ops_version.md) allows you to add new functionalities and
parameters into existing operations.

## 4. Optimize your model

<a id="4_optimize_your_model_optional"></a>

TensorFlow Lite provides tools to optimize the size and performance of your
models, often with minimal impact on accuracy. Optimized models may require
slightly more complex training, conversion, or integration.

Machine learning optimization is an evolving field, and TensorFlow Lite's
[Model Optimization Toolkit](#model-optimization-toolkit) is continually growing
as new techniques are developed.

### Performance

The goal of model optimization is to reach the ideal balance of performance,
model size, and accuracy on a given device.
[Performance best practices](../performance/best_practices.md) can help guide
you through this process.

### Quantization

By reducing the precision of values and operations within a model, quantization
can reduce both the size of a model and the time required for inference. For
many models, there is only a minimal loss of accuracy.

The TensorFlow Lite converter makes it easy to quantize TensorFlow models. The
following Python code quantizes a `SavedModel` and saves it to disk:

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_quant_model)
```

To learn more about quantization, see
[Post-training quantization](../performance/post_training_quantization.md).

### Model Optimization Toolkit

The [Model Optimization Toolkit](../performance/model_optimization.md) is a set
of tools and techniques designed to make it easy for developers to optimize
their models. Many of the techniques can be applied to all TensorFlow models and
are not specific to TensorFlow Lite, but they are especially valuable when
running inference on devices with limited resources.

## Next steps

Now that you're familiar with TensorFlow Lite, explore some of the following
resources:

* If you're a mobile developer, visit [Android quickstart](android.md) or
  [iOS quickstart](ios.md).
* Explore our [pre-trained models](../models).
* Try our [example apps](https://www.tensorflow.org/lite/examples).


@@ -1,202 +1,121 @@
# TensorFlow Lite guide

TensorFlow Lite is a set of tools to help developers run TensorFlow models on
mobile, embedded, and IoT devices. It enables on-device machine learning
inference with low latency and a small binary size.

TensorFlow Lite consists of two main components:

- The [TensorFlow Lite interpreter](inference.md), which runs specially
  optimized models on many different hardware types, including mobile phones,
  embedded Linux devices, and microcontrollers.
- The [TensorFlow Lite converter](../convert/index.md), which converts
  TensorFlow models into an efficient form for use by the interpreter, and can
  introduce optimizations to improve binary size and performance.

### Machine learning at the edge

TensorFlow Lite is designed to make it easy to perform machine learning on
devices, "at the edge" of the network, instead of sending data back and forth
from a server. For developers, performing machine learning on-device can help
improve:

* *Latency:* there's no round-trip to a server
* *Privacy:* no data needs to leave the device
* *Connectivity:* an Internet connection isn't required
* *Power consumption:* network connections are power hungry

TensorFlow Lite works with a huge range of devices, from tiny microcontrollers
to powerful mobile phones.

Key Point: The TensorFlow Lite binary is smaller than 300KB when all supported
operators are linked, and less than 200KB when using only the operators needed
for supporting the common image classification models InceptionV3 and MobileNet.

## Get started

To begin working with TensorFlow Lite, visit [Get started](get_started.md).

## Key features

* *[Interpreter](inference.md) tuned for on-device ML*, supporting a set of
  core operators that are optimized for on-device applications, and with a
  small binary size.
* *Diverse platform support*, covering [Android](android.md) and [iOS](ios.md)
  devices, embedded Linux, and microcontrollers, making use of platform APIs
  for accelerated inference.
* *APIs for multiple languages*, including Java, Swift, Objective-C, C++, and
  Python.
* *High performance*, with [hardware acceleration](../performance/gpu.md) on
  supported devices, device-optimized kernels, and
  [pre-fused activations and biases](ops_compatibility.md).
* *Model optimization tools*, including
  [quantization](../performance/post_training_quantization.md), that can
  reduce the size and increase the performance of models without sacrificing
  accuracy.
* *Efficient model format*, using a [FlatBuffer](../convert/index.md) that is
  optimized for small size and portability.
* *[Pre-trained models](../models)* for common machine learning tasks that can
  be customized to your application.
* *[Samples and tutorials](https://www.tensorflow.org/examples)* that show you
  how to deploy machine learning models on supported platforms.

## Development workflow

The workflow for using TensorFlow Lite involves the following steps:

1. **Pick a model**

   Bring your own TensorFlow model, find a model online, or pick a model from
   our [Pre-trained models](../models) to drop in or retrain.

1. **Convert the model**

   If you're using a custom model, use the
   [TensorFlow Lite converter](../convert/index.md) and a few lines of Python
   to convert it to the TensorFlow Lite format (see the sketch after this
   list).

1. **Deploy to your device**

   Run your model on-device with the
   [TensorFlow Lite interpreter](inference.md), with APIs in many languages.

1. **Optimize your model**

   Use our [Model Optimization Toolkit](../performance/model_optimization.md)
   to reduce your model's size and increase its efficiency with minimal impact
   on accuracy.
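
To give a sense of the conversion step, here is a minimal sketch using the
TensorFlow 1.x Python API; the `saved_model_dir` path is a placeholder, and the
[Get started](get_started.md) guide covers the available options in more
detail:

```python
import tensorflow as tf

# Placeholder path; point this at a directory containing an exported SavedModel.
saved_model_dir = "/tmp/saved_model"

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
  f.write(tflite_model)
```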
To learn more about using TensorFlow Lite in your project, see
[Get started](get_started.md).

## Technical constraints

TensorFlow Lite plans to provide high performance on-device inference for any
TensorFlow model. However, the TensorFlow Lite interpreter currently supports a
limited subset of TensorFlow operators that have been optimized for on-device
use. This means that some models require additional steps to work with
TensorFlow Lite.

To learn which operators are available, see
[Operator compatibility](ops_compatibility.md).

If your model uses operators that are not yet supported by the TensorFlow Lite
interpreter, you can use [TensorFlow Select](ops_select.md) to include
TensorFlow operations in your TensorFlow Lite build. However, this will lead to
an increased binary size.

TensorFlow Lite does not currently support on-device training, but it is in our
[Roadmap](roadmap.md), along with other planned improvements.

## Next steps

Want to keep learning about TensorFlow Lite? Here are some next steps:

* Visit [Get started](get_started.md) to walk through the process of using
  TensorFlow Lite.
* If you're a mobile developer, visit [Android quickstart](android.md) or
  [iOS quickstart](ios.md).
* Learn about
  [TensorFlow Lite for Microcontrollers](../microcontrollers/overview.md).
* Explore our [pre-trained models](../models).
* Try our [example apps](https://www.tensorflow.org/lite/examples).


@@ -1,16 +1,15 @@
# TensorFlow Lite inference

The term *inference* refers to the process of executing a TensorFlow Lite model
on-device in order to make predictions based on input data. Inference is the
final step in using the model on-device.

Inference for TensorFlow Lite models is run through an interpreter. The
TensorFlow Lite interpreter is designed to be lean and fast. The interpreter
uses a static graph ordering and a custom (less-dynamic) memory allocator to
ensure minimal load, initialization, and execution latency.

This document outlines the various APIs for the interpreter, along with the
[supported platforms](#supported-platforms).

### Important Concepts
@@ -43,19 +42,27 @@ TensorFlow Lite inference on device typically follows the following steps.
   present it to their user.

### Supported Platforms

TensorFlow inference APIs are provided for most common mobile/embedded platforms
such as Android, iOS and Linux.

#### Android

On Android, TensorFlow Lite inference can be performed using either Java or C++
APIs. The Java APIs provide convenience and can be used directly within your
Android Activity classes. The C++ APIs offer more flexibility and speed, but may
require writing JNI wrappers to move data between Java and C++ layers.

Visit the [Android quickstart](android.md) for a tutorial and example code.

#### iOS

TensorFlow Lite provides native iOS libraries written in
[Swift](https://www.tensorflow.org/code/tensorflow/lite/experimental/swift)
and
[Objective-C](https://www.tensorflow.org/code/tensorflow/lite/experimental/objc).

Visit the [iOS quickstart](ios.md) for a tutorial and example code.

#### Linux

On Linux platforms such as [Raspberry Pi](build_rpi.md), TensorFlow Lite C++


@@ -280,5 +280,5 @@ trees in the original training data. To do this, you will need a set of training
images for each of the new labels you wish to train.

Learn how to perform transfer learning in the
<a href="https://codelabs.developers.google.com/codelabs/recognize-flowers-with-tensorflow-on-android/#0">Recognize
flowers with TensorFlow</a> codelab.


@@ -13,12 +13,15 @@ starter model and labels</a>
### Sample application

There is a TensorFlow Lite sample application that demonstrates the smart reply
model on Android.

<a class="button button-primary" href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/models/smartreply">View
Android example</a>

Read the
[GitHub page](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/models/smartreply/g3doc)
to learn how the app works.

## How it works