Move docs for Python inference into guide/inference.md, and restructure that page to organize the load/run steps based on language.
PiperOrigin-RevId: 259778674
This commit is contained in:
parent 3198b9be2e
commit b3aafbda35
@@ -1,9 +1,12 @@
# Converter Python API guide

This page provides examples on how to use the TensorFlow Lite Converter and the
TensorFlow Lite interpreter using the Python API.
This page describes how to convert TensorFlow models into the TensorFlow Lite
format using the TensorFlow Lite Converter Python API.

Note: These docs describe the converter in the TensorFlow nightly release,
If you're looking for information about how to run a TensorFlow Lite model,
see [TensorFlow Lite inference](../guide/inference.md).

Note: This page describes the converter in the TensorFlow nightly release,
installed using `pip install tf-nightly`. For docs describing older versions,
reference ["Converting models from TensorFlow 1.12"](#pre_tensorflow_1.12).
@@ -20,13 +23,12 @@ be targeted to devices with mobile.
## API

The API for converting TensorFlow models to TensorFlow Lite is
`tf.lite.TFLiteConverter`. The API for calling the Python interpreter is
`tf.lite.Interpreter`.
`tf.lite.TFLiteConverter`, which provides class methods based on the original
format of the model. For example, `TFLiteConverter.from_session()` is available
for GraphDefs, `TFLiteConverter.from_saved_model()` is available for
SavedModels, and `TFLiteConverter.from_keras_model_file()` is available for
`tf.Keras` files.

`TFLiteConverter` provides class methods based on the original format of the
model. `TFLiteConverter.from_session()` is available for GraphDefs.
`TFLiteConverter.from_saved_model()` is available for SavedModels.
`TFLiteConverter.from_keras_model_file()` is available for `tf.Keras` files.
Example usages for simple floating-point models are shown in
[Basic Examples](#basic). Example usages for more complex models are shown in
[Complex Examples](#complex).
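
For orientation, here is a minimal, hedged sketch of the shared conversion
pattern these class methods follow; the SavedModel directory path is a
hypothetical placeholder:

```python
import tensorflow as tf

# Minimal sketch (hypothetical SavedModel path): pick the class method that
# matches your model's original format, convert, then write the FlatBuffer.
converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")
tflite_model = converter.convert()

with open("converted_model.tflite", "wb") as f:
  f.write(tflite_model)
```
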
@@ -177,65 +179,6 @@ with tf.Session() as sess:
  open("converted_model.tflite", "wb").write(tflite_model)
```

## TensorFlow Lite Python interpreter <a name="interpreter"></a>

### Using the interpreter from a model file <a name="interpreter_file"></a>

The following example shows how to use the TensorFlow Lite Python interpreter
when provided a TensorFlow Lite FlatBuffer file. The example also demonstrates
how to run inference on random input data. Run
`help(tf.lite.Interpreter)` in the Python terminal to get detailed
documentation on the interpreter.

```python
import numpy as np
import tensorflow as tf

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
```

### Using the interpreter from model data <a name="interpreter_data"></a>

The following example shows how to use the TensorFlow Lite Python interpreter
when starting with the TensorFlow Lite FlatBuffer model previously loaded. This
example shows an end-to-end use case, starting from building the TensorFlow
model.

```python
import numpy as np
import tensorflow as tf

img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3))
const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
val = img + const
out = tf.identity(val, name="out")

with tf.Session() as sess:
  converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out])
  tflite_model = converter.convert()

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
```

## Additional instructions
@@ -4,22 +4,27 @@ TensorFlow Lite provides all the tools you need to convert and run TensorFlow
models on mobile, embedded, and IoT devices. The following guide walks through
each step of the developer workflow and provides links to further instructions.

[TOC]

## 1. Choose a model

<a id="1_choose_a_model"></a>

TensorFlow Lite allows you to run TensorFlow models on a wide range of devices.
A TensorFlow model is a data structure that contains the logic and knowledge of
a machine learning network trained to solve a particular problem.

There are many ways to obtain a TensorFlow model, from using pre-trained models
to training your own. To use a model with TensorFlow Lite it must be converted
into a special format. This is explained in section 2,
[Convert the model](#2_convert_the_model_format).
to training your own.

To use a model with TensorFlow Lite, you must convert a
full TensorFlow model into the TensorFlow Lite format—you
cannot create or train a model using TensorFlow Lite. So you must start with a
regular TensorFlow model, and then
[convert the model](#2_convert_the_model_format).

Note: TensorFlow Lite supports a limited subset of TensorFlow operations, so not
all models can be converted. For details, read about the
[TensorFlow Lite operator compatibility](ops_compatibility.md).

Note: Not all TensorFlow models will work with TensorFlow Lite, since the
interpreter supports a limited subset of TensorFlow operations. See section 2,
[Convert the model](#2_convert_the_model_format) to learn about compatibility.

### Use a pre-trained model
@@ -60,35 +65,37 @@ flowers with TensorFlow</a> codelab.
### Train a custom model

If you have designed and trained your own TensorFlow model, or you have trained
a model obtained from another source, you should convert it to the TensorFlow
Lite format before use.
a model obtained from another source, you must
[convert it to the TensorFlow Lite format](#2_convert_the_model_format).

## 2. Convert the model

<a id="2_convert_the_model_format"></a>

TensorFlow Lite is designed to execute models efficiently on devices. Some of
TensorFlow Lite is designed to execute models efficiently on mobile and other
embedded devices with limited compute and memory resources. Some of
this efficiency comes from the use of a special format for storing models.
TensorFlow models must be converted into this format before they can be used by
TensorFlow Lite.

Converting models reduces their file size and introduces optimizations that do
not affect accuracy. Developers can opt to further reduce file size and increase
speed of execution in exchange for some trade-offs. You can use the TensorFlow
Lite converter to choose which optimizations to apply.
not affect accuracy. The TensorFlow Lite converter provides options
that allow you to further reduce file size and increase speed of execution, with
some trade-offs.
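
For a rough sense of what enabling such an option looks like, here is a hedged
sketch; it assumes a TensorFlow release that exposes `tf.lite.Optimize`, and the
SavedModel path is a hypothetical placeholder:

```python
import tensorflow as tf

# Hedged sketch: request the converter's default optimizations (for example,
# weight quantization). The SavedModel directory below is hypothetical.
converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
```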

Note: TensorFlow Lite supports a limited subset of TensorFlow operations, so not
all models can be converted. For details, read about the
[TensorFlow Lite operator compatibility](ops_compatibility.md).

TensorFlow Lite supports a limited subset of TensorFlow operations, so not all
models can be converted. See [Ops compatibility](#ops-compatibility) for more
information.

### TensorFlow Lite converter

The [TensorFlow Lite converter](../convert) is a tool that converts trained
TensorFlow models into the TensorFlow Lite format. It can also introduce
optimizations, which are covered in section 4,
The [TensorFlow Lite converter](../convert) is a tool available as a Python API
that converts trained TensorFlow models into the TensorFlow Lite format. It can
also introduce optimizations, which are covered in section 4,
[Optimize your model](#4_optimize_your_model_optional).

The converter is available as a Python API. The following example shows a
The following example shows a
TensorFlow `SavedModel` being converted into the TensorFlow Lite format:

```python
@@ -128,9 +135,9 @@ performance or reduce file size. This is covered in section 4,

### Ops compatibility

TensorFlow Lite currently supports a [limited subset](ops_compatibility.md) of
TensorFlow operations. The long term goal is for all TensorFlow operations to be
supported.
TensorFlow Lite currently supports a [limited subset of TensorFlow
operations](ops_compatibility.md). The long term goal is for all TensorFlow
operations to be supported.

If the model you wish to convert contains unsupported operations, you can use
[TensorFlow Select](ops_select.md) to include operations from TensorFlow. This
@@ -1,91 +1,104 @@
# TensorFlow Lite inference

The term *inference* refers to the process of executing a TensorFlow Lite model
on-device in order to make predictions based on input data. Inference is the
final step in using the model on-device.
on-device in order to make predictions based on input data. To perform an
inference with a TensorFlow Lite model, you must run it through an
*interpreter*. The TensorFlow Lite interpreter is designed to be lean and fast.
The interpreter uses a static graph ordering and a custom (less-dynamic) memory
allocator to ensure minimal load, initialization, and execution latency.

Inference for TensorFlow Lite models is run through an interpreter. The
TensorFlow Lite interpreter is designed to be lean and fast. The interpreter
uses a static graph ordering and a custom (less-dynamic) memory allocator to
ensure minimal load, initialization, and execution latency.
This page describes how to access the TensorFlow Lite interpreter and
perform an inference using C++, Java, and Python, plus links to other resources
for each [supported platform](#supported-platforms).

This document outlines the various APIs for the interpreter, along with the
[supported platforms](#supported-platforms).
[TOC]

### Important Concepts
## Important concepts

TensorFlow Lite inference on device typically involves the following steps.
TensorFlow Lite inference typically involves the following steps:

1. **Loading a Model**
1. **Loading a model**

The user loads the `.tflite` model into memory which contains the model's
You must load the `.tflite` model into memory, which contains the model's
execution graph.

1. **Transforming Data**
Input data acquired by the user generally may not match the input data format
expected by the model. For example, a user may need to resize an image or change
the image format to be used by the model.
1. **Transforming data**

1. **Running Inference**
Raw input data for the model generally does not match the input data format
expected by the model. For example, you might need to resize an image or
change the image format to be compatible with the model.

This step involves using the API to execute the model. It involves a few
steps such as building the interpreter, and allocating tensors as explained
in detail in [Running a Model](#running_a_model).
1. **Running inference**

1. **Interpreting Output**
This step involves using the TensorFlow Lite API to execute the model. It
involves a few steps such as building the interpreter, and allocating
tensors, as described in the following sections.

The user retrieves results from model inference and interprets the tensors in
a meaningful way to be used in the application.
1. **Interpreting output**

For example, a model may only return a list of probabilities. It is up to the
application developer to meaningfully map them to relevant categories and
present it to their user.
When you receive results from the model inference, you must interpret the
tensors in a meaningful way that's useful in your application.

### Supported Platforms
For example, a model might return only a list of probabilities. It's up to
you to map the probabilities to relevant categories and present it to your
end-user.
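
To make that last step concrete, here is a small, hedged sketch of mapping a
probability vector to a label; the labels and output values are made-up
stand-ins rather than output from a real model:

```python
import numpy as np

# Hypothetical labels and a made-up probability vector standing in for the
# output tensor of a classification model.
labels = ["daisy", "rose", "tulip"]
output_data = np.array([0.08, 0.85, 0.07], dtype=np.float32)

# Pick the most likely class and present it with its score.
top = int(np.argmax(output_data))
print("Prediction: %s (%.2f)" % (labels[top], output_data[top]))
```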

## Supported platforms

TensorFlow Lite inference APIs are provided for most common mobile/embedded
platforms such as Android, iOS and Linux.
such as Android, iOS and Linux, in multiple programming languages.

#### Android
In most cases, the API design reflects a preference for performance over ease of
use. TensorFlow Lite is designed for fast inference on small devices, so it
should be no surprise that the APIs try to avoid unnecessary copies at the
expense of convenience. Similarly, consistency with TensorFlow APIs was not an
explicit goal and some variance between languages is to be expected.

Across all libraries, the TensorFlow Lite API enables you to load models,
feed inputs, and retrieve inference outputs.

### Android

On Android, TensorFlow Lite inference can be performed using either Java or C++
APIs. The Java APIs provide convenience and can be used directly within your
Android Activity classes. The C++ APIs offer more flexibility and speed, but may
require writing JNI wrappers to move data between Java and C++ layers.

Visit the [Android quickstart](android.md) for a tutorial and example code.
See below for details about using C++ and Java, or
follow the [Android quickstart](android.md) for a tutorial and example code.

#### iOS
### iOS

TensorFlow Lite provides native iOS libraries written in
On iOS, TensorFlow Lite is available with native iOS libraries written in
[Swift](https://www.tensorflow.org/code/tensorflow/lite/experimental/swift)
and
[Objective-C](https://www.tensorflow.org/code/tensorflow/lite/experimental/objc).

Visit the [iOS quickstart](ios.md) for a tutorial and example code.
This page doesn't include a discussion about these languages, so you should
refer to the [iOS quickstart](ios.md) for a tutorial and example code.

#### Linux
On Linux platforms such as [Raspberry Pi](build_rpi.md), TensorFlow Lite C++
and Python APIs can be used to run inference.
### Linux

On Linux platforms (including [Raspberry Pi](build_rpi.md)), you can run
inferences using TensorFlow Lite APIs available in C++ and Python, as shown
in the following sections.

## API Guides
## Load and run a model in C++

TensorFlow Lite provides programming APIs in C++, Java and Python, with
experimental bindings for several other languages (C, Swift, Objective-C). In
most cases, the API design reflects a preference for performance over ease of
use. TensorFlow Lite is designed for fast inference on small devices so it
should be no surprise that the APIs try to avoid unnecessary copies at the
expense of convenience. Similarly, consistency with TensorFlow APIs was not an
explicit goal and some variance is to be expected.
Running a TensorFlow Lite model with C++ involves a few simple steps:

There is also a [Python API for TensorFlow Lite](../convert/python_api.md).
1. Load the model into memory as a `FlatBufferModel`.
2. Build an `Interpreter` based on an existing `FlatBufferModel`.
3. Set input tensor values. (Optionally resize input tensors if the
   predefined sizes are not desired.)
4. Invoke inference.
5. Read output tensor values.

### Loading a Model

#### C++
The `FlatBufferModel` class encapsulates a model and can be built in a couple of
slightly different ways depending on where the model is stored:
The [`FlatBufferModel`](
https://www.tensorflow.org/lite/api_docs/cc/class/tflite/flat-buffer-model.html)
class encapsulates a TensorFlow Lite model and you can
build it in a couple of different ways, depending on where the model is stored:

```c++
class FlatBufferModel {
@@ -104,72 +117,36 @@ class FlatBufferModel {
};
```

```c++
tflite::FlatBufferModel model(path_to_model);
```
Note: If TensorFlow Lite detects the presence of the [Android NNAPI](
https://developer.android.com/ndk/guides/neuralnetworks), it will
automatically try to use shared memory to store the `FlatBufferModel`.

Note that if TensorFlow Lite detects the presence of Android's NNAPI it will
automatically try to use shared memory to store the FlatBufferModel.
Now that you have the model as a `FlatBufferModel` object, you can execute it
with an [`Interpreter`](
https://www.tensorflow.org/lite/api_docs/cc/class/tflite/interpreter.html).
A single `FlatBufferModel` can be used
simultaneously by more than one `Interpreter`.

#### Java
Caution: The `FlatBufferModel` object must remain valid until
all instances of `Interpreter` using it have been destroyed.

TensorFlow Lite's Java API supports on-device inference and is provided as an
Android Studio Library that allows loading models, feeding inputs, and
retrieving inference outputs.

The `Interpreter` class drives model inference with TensorFlow Lite. In
most of the cases, this is the only class an app developer will need.

The `Interpreter` can be initialized with a model file using the constructor:

```java
public Interpreter(@NotNull File modelFile);
```

or with a `MappedByteBuffer`:

```java
public Interpreter(@NotNull MappedByteBuffer mappedByteBuffer);
```

In both cases a valid TensorFlow Lite model must be provided or an
`IllegalArgumentException` will be thrown. If a `MappedByteBuffer` is used to
initialize an Interpreter, it should remain unchanged for the whole lifetime of
the `Interpreter`.

### Running a Model {#running_a_model}

#### C++
Running a model involves a few simple steps:

* Build an `Interpreter` based on an existing `FlatBufferModel`
* Optionally resize input tensors if the predefined sizes are not desired.
* Set input tensor values
* Invoke inference
* Read output tensor values

The important parts of the public interface of the `Interpreter` are provided
below. It should be noted that:
The important parts of the `Interpreter` API are shown in the
code snippet below. It should be noted that:

* Tensors are represented by integers, in order to avoid string comparisons
  (and any fixed dependency on string libraries).
* An interpreter must not be accessed from concurrent threads.
* Memory allocation for input and output tensors must be triggered
  by calling AllocateTensors() right after resizing tensors.
  by calling `AllocateTensors()` right after resizing tensors.

In order to run the inference model in TensorFlow Lite, one has to load the
model into a `FlatBufferModel` object which then can be executed by an
`Interpreter`. The `FlatBufferModel` needs to remain valid for the whole
lifetime of the `Interpreter`, and a single `FlatBufferModel` can be
simultaneously used by more than one `Interpreter`. In concrete terms, the
`FlatBufferModel` object must be created before any `Interpreter` objects that
use it, and must be kept around until they have all been destroyed.

The simplest usage of TensorFlow Lite will look like this:
The simplest usage of TensorFlow Lite with C++ looks like this:

```c++
tflite::FlatBufferModel model(path_to_model);
// Load the model
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filename);

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);
@@ -185,9 +162,40 @@ interpreter->Invoke();
float* output = interpreter->typed_output_tensor<float>(0);
```

#### Java
For more example code, see [`minimal.cc`](
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/minimal/minimal.cc)
and [`label_image.cc`](
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/label_image/label_image.cc).

The simplest usage of the TensorFlow Lite Java API looks like this:

## Load and run a model in Java

The Java API for running an inference with TensorFlow Lite is primarily designed
for use with Android, so it's available as an Android library dependency:
`org.tensorflow:tensorflow-lite`.

In Java, you'll use the `Interpreter` class to load a model and drive model
inference. In many cases, this may be the only API you need.

You can initialize an `Interpreter` using a `.tflite` file:

```java
public Interpreter(@NotNull File modelFile);
```

Or with a `MappedByteBuffer`:

```java
public Interpreter(@NotNull MappedByteBuffer mappedByteBuffer);
```

In both cases, you must provide a valid TensorFlow Lite model or the API throws
`IllegalArgumentException`. If you use `MappedByteBuffer` to
initialize an `Interpreter`, it must remain unchanged for the whole lifetime
of the `Interpreter`.

To then run an inference with the model, simply call `Interpreter.run()`.
For example:

```java
try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
@@ -195,48 +203,44 @@ try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model))
}
```

If a model takes only one input and returns only one output, the following will
trigger an inference run:

```java
interpreter.run(input, output);
```

For models with multiple inputs, or multiple outputs, use:
The `run()` method takes only one input and returns only one output. So if your
model has multiple inputs or multiple outputs, instead use:

```java
interpreter.runForMultipleInputsOutputs(inputs, map_of_indices_to_outputs);
```

where each entry in `inputs` corresponds to an input tensor and
In this case, each entry in `inputs` corresponds to an input tensor and
`map_of_indices_to_outputs` maps indices of output tensors to the corresponding
output data. In both cases the tensor indices should correspond to the values
given to the
[TensorFlow Lite Optimized Converter](../convert/cmdline_examples.md) when the
model was created. Be aware that the order of tensors in `input` must match the
order given to the `TensorFlow Lite Optimized Converter`.
output data.

The Java API also provides convenient functions for app developers to get the
index of any model input or output using a tensor name:
In both cases, the tensor indices should correspond to the values you gave to
the [TensorFlow Lite Converter](../convert/) when you created the model.
Be aware that the order of tensors in `input` must match the
order given to the TensorFlow Lite Converter.

The `Interpreter` class also provides convenient functions for you to get the
index of any model input or output using an operation name:

```java
public int getInputIndex(String tensorName);
public int getOutputIndex(String tensorName);
public int getInputIndex(String opName);
public int getOutputIndex(String opName);
```

If `tensorName` is not a valid name in the model, an `IllegalArgumentException`
will be thrown.
If `opName` is not a valid operation in the model, it throws an
`IllegalArgumentException`.

##### Releasing Resources After Use

An `Interpreter` owns resources. To avoid a memory leak, the resources must be
released after use by:
Also beware that `Interpreter` owns resources. To avoid a memory leak, the
resources must be released after use by:

```java
interpreter.close();
```

##### Supported Data Types
For an example project with Java, see the [Android image classification sample](
https://github.com/tensorflow/examples/tree/master/lite/examples/image_classification/android).

### Supported data types (in Java)

To use TensorFlow Lite, the data types of the input and output tensors must be
one of the following primitive types:
@@ -256,7 +260,7 @@ provided as a single, flat `ByteBuffer` argument.
If other data types, including boxed types like `Integer` and `Float`, are used,
an `IllegalArgumentException` will be thrown.

##### Inputs
#### Inputs

Each input should be an array or multi-dimensional array of the supported
primitive types, or a raw `ByteBuffer` of the appropriate size. If the input is
@@ -265,12 +269,12 @@ implicitly resized to the array's dimensions at inference time. If the input is
a ByteBuffer, the caller should first manually resize the associated input
tensor (via `Interpreter.resizeInput()`) before running inference.

When using 'ByteBuffer', prefer using direct byte buffers, as this allows the
When using `ByteBuffer`, prefer using direct byte buffers, as this allows the
`Interpreter` to avoid unnecessary copies. If the `ByteBuffer` is a direct byte
buffer, its order must be `ByteOrder.nativeOrder()`. After it is used for a
model inference, it must remain unchanged until the model inference is finished.

##### Outputs
#### Outputs

Each output should be an array or multi-dimensional array of the supported
primitive types, or a ByteBuffer of the appropriate size. Note that some models
@@ -279,7 +283,75 @@ the input. There's no straightforward way of handling this with the existing
Java inference API, but planned extensions will make this possible.

## Writing Custom Operators
## Load and run a model in Python

The Python API for running an inference is provided in the `tf.lite`
module. From it, you mostly need only [`tf.lite.Interpreter`](
https://www.tensorflow.org/api_docs/python/tf/lite/Interpreter) to load
a model and run an inference.

The following example shows how to use the Python interpreter to load a
`.tflite` file and run inference with random input data:

```python
import numpy as np
import tensorflow as tf

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
```

As an alternative to loading the model as a pre-converted `.tflite` file, you
can combine your code with the [TensorFlow Lite Converter Python API](
../convert/python_api.md) (`tf.lite.TFLiteConverter`), allowing you to convert
your TensorFlow model into the TensorFlow Lite format and then run an inference:

```python
import numpy as np
import tensorflow as tf

img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3))
const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
val = img + const
out = tf.identity(val, name="out")

# Convert to TF Lite format
with tf.Session() as sess:
  converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out])
  tflite_model = converter.convert()

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# Continue to get tensors and so forth, as shown above...
```

For more Python sample code, see [`label_image.py`](
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/python/label_image.py).

Tip: Run `help(tf.lite.Interpreter)` in the Python terminal to get detailed
documentation about the interpreter.

## Write a custom operator

All TensorFlow Lite operators (both custom and builtin) are defined using a
simple pure-C interface that consists of four functions:
@@ -343,7 +415,7 @@ Note that registration is not automatic and an explicit call to
registration of builtins, custom ops will have to be collected in separate
custom libraries.

### Customizing the kernel library
### Customize the kernel library

Behind the scenes the interpreter will load a library of kernels which will be
assigned to execute each of the operators in the model. While the default
@@ -362,21 +434,19 @@ class OpResolver {
};
```

Regular usage will require the developer to use the `BuiltinOpResolver` and
write:
Regular usage requires that you use the `BuiltinOpResolver` and write:

```c++
tflite::ops::builtin::BuiltinOpResolver resolver;
```

They can then optionally register custom ops:
You can optionally register custom ops (before you pass the resolver to the
`InterpreterBuilder`):

```c++
resolver.AddOp("MY_CUSTOM_OP", Register_MY_CUSTOM_OP());
```

before the resolver is passed to the `InterpreterBuilder`.

If the set of builtin ops is deemed to be too large, a new `OpResolver` could
be code-generated based on a given subset of ops, possibly only the ones
contained in a given model. This is the equivalent of TensorFlow's selective