Move docs for Python inference into guide/inference.md, and restructure that page to organize the load/run steps based on language.
PiperOrigin-RevId: 259778674
parent 3198b9be2e
commit b3aafbda35
@ -1,9 +1,12 @@
|
|||||||
# Converter Python API guide
|
# Converter Python API guide
|
||||||
|
|
||||||
This page provides examples on how to use the TensorFlow Lite Converter and the
|
This page describes how to convert TensorFlow models into the TensorFlow Lite
|
||||||
TensorFlow Lite interpreter using the Python API.
|
format using the TensorFlow Lite Converter Python API.
|
||||||
|
|
||||||
Note: These docs describe the converter in the TensorFlow nightly release,
|
If you're looking for information about how to run a TensorFlow Lite model,
|
||||||
|
see [TensorFlow Lite inference](../guide/inference.md).
|
||||||
|
|
||||||
|
Note: This page describes the converter in the TensorFlow nightly release,
|
||||||
installed using `pip install tf-nightly`. For docs describing older versions
|
installed using `pip install tf-nightly`. For docs describing older versions
|
||||||
reference ["Converting models from TensorFlow 1.12"](#pre_tensorflow_1.12).
|
reference ["Converting models from TensorFlow 1.12"](#pre_tensorflow_1.12).
|
||||||
|
|
||||||
@ -20,13 +23,12 @@ be targeted to devices with mobile.
|
|||||||
## API
|
## API
|
||||||
|
|
||||||
The API for converting TensorFlow models to TensorFlow Lite is
|
The API for converting TensorFlow models to TensorFlow Lite is
|
||||||
`tf.lite.TFLiteConverter`. The API for calling the Python interpreter is
|
`tf.lite.TFLiteConverter`, which provides class methods based on the original
|
||||||
`tf.lite.Interpreter`.
|
format of the model. For example, `TFLiteConverter.from_session()` is available
|
||||||
|
for GraphDefs, `TFLiteConverter.from_saved_model()` is available for
|
||||||
|
SavedModels, and `TFLiteConverter.from_keras_model_file()` is available for
|
||||||
|
`tf.keras` files.
|
||||||
|
|
||||||
`TFLiteConverter` provides class methods based on the original format of the
|
|
||||||
model. `TFLiteConverter.from_session()` is available for GraphDefs.
|
|
||||||
`TFLiteConverter.from_saved_model()` is available for SavedModels.
|
|
||||||
`TFLiteConverter.from_keras_model_file()` is available for `tf.Keras` files.
|
|
||||||
Example usages for simple floating-point models are shown in
|
Example usages for simple floating-point models are shown in
|
||||||
[Basic Examples](#basic). Example usages for more complex models are shown in
|
[Basic Examples](#basic). Example usages for more complex models are shown in
|
||||||
[Complex Examples](#complex).
|
[Complex Examples](#complex).
|
||||||
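As a rough illustration of how the class methods named above map onto model formats, the following sketch picks a method from a model path. The helper names `pick_converter_method` and `convert_to_tflite`, and the path-based rules (a directory is treated as a SavedModel, a `.h5` file as a `tf.keras` file), are assumptions made for this example and are not part of the converter API. `from_session()` is left out because it needs a live `tf.Session` rather than a path.

```python
import os


def pick_converter_method(model_path):
    """Return the name of a TFLiteConverter class method for model_path.

    Assumed rules for this sketch only: a directory is treated as a
    SavedModel, and a file ending in .h5 is treated as a tf.keras file.
    """
    if os.path.isdir(model_path):
        return "from_saved_model"
    if model_path.lower().endswith(".h5"):
        return "from_keras_model_file"
    raise ValueError(
        "Cannot infer the model format from %r; GraphDefs are converted "
        "from a live session with from_session()." % model_path
    )


def convert_to_tflite(model_path, output_path="converted_model.tflite"):
    """Convert the model at model_path and write the result to output_path."""
    # Imported lazily so pick_converter_method() works without TensorFlow.
    import tensorflow as tf

    # The method names are the ones listed in the text for the tf-nightly
    # release it describes; other TensorFlow versions may expose a
    # different set of class methods.
    method = getattr(tf.lite.TFLiteConverter, pick_converter_method(model_path))
    tflite_model = method(model_path).convert()
    with open(output_path, "wb") as f:
        f.write(tflite_model)
    return output_path
```

Calling `convert_to_tflite("/path/to/saved_model_dir")` would then write `converted_model.tflite` to the current directory, assuming a TensorFlow release that provides the selected class method is installed.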
@ -177,65 +179,6 @@ with tf.Session() as sess:
|
|||||||
open("converted_model.tflite", "wb").write(tflite_model)
|
open("converted_model.tflite", "wb").write(tflite_model)
|
||||||
```
|
```
|
||||||
|
|
||||||
## TensorFlow Lite Python interpreter <a name="interpreter"></a>
|
|
||||||
|
|
||||||
### Using the interpreter from a model file <a name="interpreter_file"></a>
|
|
||||||
|
|
||||||
The following example shows how to use the TensorFlow Lite Python interpreter
|
|
||||||
when provided a TensorFlow Lite FlatBuffer file. The example also demonstrates
|
|
||||||
how to run inference on random input data. Run
|
|
||||||
`help(tf.lite.Interpreter)` in the Python terminal to get detailed
|
|
||||||
documentation on the interpreter.
|
|
||||||
|
|
||||||
```python
|
|
||||||
import numpy as np
|
|
||||||
import tensorflow as tf
|
|
||||||
|
|
||||||
# Load TFLite model and allocate tensors.
|
|
||||||
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
|
|
||||||
interpreter.allocate_tensors()
|
|
||||||
|
|
||||||
# Get input and output tensors.
|
|
||||||
input_details = interpreter.get_input_details()
|
|
||||||
output_details = interpreter.get_output_details()
|
|
||||||
|
|
||||||
# Test model on random input data.
|
|
||||||
input_shape = input_details[0]['shape']
|
|
||||||
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
|
|
||||||
interpreter.set_tensor(input_details[0]['index'], input_data)
|
|
||||||
|
|
||||||
interpreter.invoke()
|
|
||||||
|
|
||||||
# The function `get_tensor()` returns a copy of the tensor data.
|
|
||||||
# Use `tensor()` in order to get a pointer to the tensor.
|
|
||||||
output_data = interpreter.get_tensor(output_details[0]['index'])
|
|
||||||
print(output_data)
|
|
||||||
```
|
|
||||||
|
|
||||||
### Using the interpreter from model data <a name="interpreter_data"></a>
|
|
||||||
|
|
||||||
The following example shows how to use the TensorFlow Lite Python interpreter
|
|
||||||
when starting with the TensorFlow Lite Flatbuffer model previously loaded. This
|
|
||||||
example shows an end-to-end use case, starting from building the TensorFlow
|
|
||||||
model.
|
|
||||||
|
|
||||||
```python
|
|
||||||
import numpy as np
|
|
||||||
import tensorflow as tf
|
|
||||||
|
|
||||||
img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3))
|
|
||||||
const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
|
|
||||||
val = img + const
|
|
||||||
out = tf.identity(val, name="out")
|
|
||||||
|
|
||||||
with tf.Session() as sess:
|
|
||||||
converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out])
|
|
||||||
tflite_model = converter.convert()
|
|
||||||
|
|
||||||
# Load TFLite model and allocate tensors.
|
|
||||||
interpreter = tf.lite.Interpreter(model_content=tflite_model)
|
|
||||||
interpreter.allocate_tensors()
|
|
||||||
```
|
|
||||||
|
|
||||||
## Additional instructions
|
## Additional instructions
|
||||||
|
|
||||||
|
|||||||
@ -4,22 +4,27 @@ TensorFlow Lite provides all the tools you need to convert and run TensorFlow
|
|||||||
models on mobile, embedded, and IoT devices. The following guide walks through
|
models on mobile, embedded, and IoT devices. The following guide walks through
|
||||||
each step of the developer workflow and provides links to further instructions.
|
each step of the developer workflow and provides links to further instructions.
|
||||||
|
|
||||||
|
[TOC]
|
||||||
|
|
||||||
## 1. Choose a model
|
## 1. Choose a model
|
||||||
|
|
||||||
<a id="1_choose_a_model"></a>
|
<a id="1_choose_a_model"></a>
|
||||||
|
|
||||||
TensorFlow Lite allows you to run TensorFlow models on a wide range of devices.
|
|
||||||
A TensorFlow model is a data structure that contains the logic and knowledge of
|
A TensorFlow model is a data structure that contains the logic and knowledge of
|
||||||
a machine learning network trained to solve a particular problem.
|
a machine learning network trained to solve a particular problem.
|
||||||
|
|
||||||
There are many ways to obtain a TensorFlow model, from using pre-trained models
|
There are many ways to obtain a TensorFlow model, from using pre-trained models
|
||||||
to training your own. To use a model with TensorFlow Lite it must be converted
|
to training your own.
|
||||||
into a special format. This is explained in section 2,
|
|
||||||
[Convert the model](#2_convert_the_model_format).
|
To use a model with TensorFlow Lite, you must convert a
|
||||||
|
full TensorFlow model into the TensorFlow Lite format—you
|
||||||
|
cannot create or train a model using TensorFlow Lite. So you must start with a
|
||||||
|
regular TensorFlow model, and then
|
||||||
|
[convert the model](#2_convert_the_model_format).
|
||||||
|
|
||||||
|
Note: TensorFlow Lite supports a limited subset of TensorFlow operations, so not
|
||||||
|
all models can be converted. For details, read about the
|
||||||
|
[TensorFlow Lite operator compatibility](ops_compatibility.md).
|
||||||
|
|
||||||
Note: Not all TensorFlow models will work with TensorFlow Lite, since the
|
|
||||||
interpreter supports a limited subset of TensorFlow operations. See section 2,
|
|
||||||
[Convert the model](#2_convert_the_model_format) to learn about compatibility.
|
|
||||||
|
|
||||||
### Use a pre-trained model
|
### Use a pre-trained model
|
||||||
|
|
||||||
@ -60,35 +65,37 @@ flowers with TensorFlow</a> codelab.
|
|||||||
### Train a custom model
|
### Train a custom model
|
||||||
|
|
||||||
If you have designed and trained your own TensorFlow model, or you have trained
|
If you have designed and trained your own TensorFlow model, or you have trained
|
||||||
a model obtained from another source, you should convert it to the TensorFlow
|
a model obtained from another source, you must
|
||||||
Lite format before use.
|
[convert it to the TensorFlow Lite format](#2_convert_the_model_format).
|
||||||
|
|
||||||
## 2. Convert the model
|
## 2. Convert the model
|
||||||
|
|
||||||
<a id="2_convert_the_model_format"></a>
|
<a id="2_convert_the_model_format"></a>
|
||||||
|
|
||||||
TensorFlow Lite is designed to execute models efficiently on devices. Some of
|
TensorFlow Lite is designed to execute models efficiently on mobile and other
|
||||||
|
embedded devices with limited compute and memory resources. Some of
|
||||||
this efficiency comes from the use of a special format for storing models.
|
this efficiency comes from the use of a special format for storing models.
|
||||||
TensorFlow models must be converted into this format before they can be used by
|
TensorFlow models must be converted into this format before they can be used by
|
||||||
TensorFlow Lite.
|
TensorFlow Lite.
|
||||||
|
|
||||||
Converting models reduces their file size and introduces optimizations that do
|
Converting models reduces their file size and introduces optimizations that do
|
||||||
not affect accuracy. Developers can opt to further reduce file size and increase
|
not affect accuracy. The TensorFlow Lite converter provides options
|
||||||
speed of execution in exchange for some trade-offs. You can use the TensorFlow
|
that allow you to further reduce file size and increase speed of execution, with
|
||||||
Lite converter to choose which optimizations to apply.
|
some trade-offs.
|
||||||
|
|
||||||
|
Note: TensorFlow Lite supports a limited subset of TensorFlow operations, so not
|
||||||
|
all models can be converted. For details, read about the
|
||||||
|
[TensorFlow Lite operator compatibility](ops_compatibility.md).
|
||||||
|
|
||||||
TensorFlow Lite supports a limited subset of TensorFlow operations, so not all
|
|
||||||
models can be converted. See [Ops compatibility](#ops-compatibility) for more
|
|
||||||
information.
|
|
||||||
|
|
||||||
### TensorFlow Lite converter
|
### TensorFlow Lite converter
|
||||||
|
|
||||||
The [TensorFlow Lite converter](../convert) is a tool that converts trained
|
The [TensorFlow Lite converter](../convert) is a tool available as a Python API
|
||||||
TensorFlow models into the TensorFlow Lite format. It can also introduce
|
that converts trained TensorFlow models into the TensorFlow Lite format. It can
|
||||||
optimizations, which are covered in section 4,
|
also introduce optimizations, which are covered in section 4,
|
||||||
[Optimize your model](#4_optimize_your_model_optional).
|
[Optimize your model](#4_optimize_your_model_optional).
|
||||||
|
|
||||||
The converter is available as a Python API. The following example shows a
|
The following example shows a
|
||||||
TensorFlow `SavedModel` being converted into the TensorFlow Lite format:
|
TensorFlow `SavedModel` being converted into the TensorFlow Lite format:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
@ -128,9 +135,9 @@ performance or reduce file size. This is covered in section 4,
|
|||||||
|
|
||||||
### Ops compatibility
|
### Ops compatibility
|
||||||
|
|
||||||
TensorFlow Lite currently supports a [limited subset](ops_compatibility.md) of
|
TensorFlow Lite currently supports a [limited subset of TensorFlow
|
||||||
TensorFlow operations. The long term goal is for all TensorFlow operations to be
|
operations](ops_compatibility.md). The long term goal is for all TensorFlow
|
||||||
supported.
|
operations to be supported.
|
||||||
|
|
||||||
If the model you wish to convert contains unsupported operations, you can use
|
If the model you wish to convert contains unsupported operations, you can use
|
||||||
[TensorFlow Select](ops_select.md) to include operations from TensorFlow. This
|
[TensorFlow Select](ops_select.md) to include operations from TensorFlow. This
|
||||||
|
|||||||
@ -1,91 +1,104 @@
|
|||||||
# TensorFlow Lite inference
|
# TensorFlow Lite inference
|
||||||
|
|
||||||
The term *inference* refers to the process of executing a TensorFlow Lite model
|
The term *inference* refers to the process of executing a TensorFlow Lite model
|
||||||
on-device in order to make predictions based on input data. Inference is the
|
on-device in order to make predictions based on input data. To perform an
|
||||||
final step in using the model on-device.
|
inference with a TensorFlow Lite model, you must run it through an
|
||||||
|
*interpreter*. The TensorFlow Lite interpreter is designed to be lean and fast.
|
||||||
|
The interpreter uses a static graph ordering and a custom (less-dynamic) memory
|
||||||
|
allocator to ensure minimal load, initialization, and execution latency.
|
||||||
|
|
||||||
Inference for TensorFlow Lite models is run through an interpreter. The
|
This page describes how to access the TensorFlow Lite interpreter and
|
||||||
TensorFlow Lite interpreter is designed to be lean and fast. The interpreter
|
perform an inference using C++, Java, and Python, plus links to other resources
|
||||||
uses a static graph ordering and a custom (less-dynamic) memory allocator to
|
for each [supported platform](#supported-platforms).
|
||||||
ensure minimal load, initialization, and execution latency.
|
|
||||||
|
|
||||||
This document outlines the various APIs for the interpreter, along with the
|
[TOC]
|
||||||
[supported platforms](#supported-platforms).
|
|
||||||
|
|
||||||
### Important Concepts
|
## Important concepts
|
||||||
|
|
||||||
TensorFlow Lite inference on device typically follows the following steps.
|
TensorFlow Lite inference typically follows the following steps:
|
||||||
|
|
||||||
1. **Loading a Model**
|
1. **Loading a model**
|
||||||
|
|
||||||
The user loads the `.tflite` model into memory which contains the model's
|
You must load the `.tflite` model, which contains the model's
|
||||||
execution graph.
|
execution graph, into memory.
|
||||||
|
|
||||||
1. **Transforming Data**
|
1. **Transforming data**
|
||||||
Input data acquired by the user generally may not match the input data format
|
|
||||||
expected by the model. For eg., a user may need to resize an image or change
|
|
||||||
the image format to be used by the model.
|
|
||||||
|
|
||||||
1. **Running Inference**
|
Raw input data for the model generally does not match the input data format
|
||||||
|
expected by the model. For example, you might need to resize an image or
|
||||||
|
change the image format to be compatible with the model.
|
||||||
|
|
||||||
This step involves using the API to execute the model. It involves a few
|
1. **Running inference**
|
||||||
steps such as building the interpreter, and allocating tensors as explained
|
|
||||||
in detail in [Running a Model](#running_a_model).
|
|
||||||
|
|
||||||
1. **Interpreting Output**
|
This step involves using the TensorFlow Lite API to execute the model. It
|
||||||
|
involves a few steps such as building the interpreter, and allocating
|
||||||
|
tensors, as described in the following sections.
|
||||||
|
|
||||||
The user retrieves results from model inference and interprets the tensors in
|
1. **Interpreting output**
|
||||||
a meaningful way to be used in the application.
|
|
||||||
|
|
||||||
For example, a model may only return a list of probabilities. It is up to the
|
When you receive results from the model inference, you must interpret the
|
||||||
application developer to meaningully map them to relevant categories and
|
tensors in a meaningful way that's useful in your application.
|
||||||
present it to their user.
|
|
||||||
|
|
||||||
### Supported Platforms
|
For example, a model might return only a list of probabilities. It's up to
|
||||||
|
you to map the probabilities to relevant categories and present them to your
|
||||||
|
end-user.
|
||||||
|
|
||||||
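The "interpreting output" step above can be sketched in a few lines of plain Python. This is only an illustration: the function name `top_k_labels`, the label list, and the probability values are made up for the example and do not come from any TensorFlow Lite API or model.

```python
def top_k_labels(probabilities, labels, k=3):
    """Pair each probability with its label and return the k most likely."""
    if len(probabilities) != len(labels):
        raise ValueError("expected exactly one label per probability")
    ranked = sorted(
        zip(labels, probabilities), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]


# Hypothetical classifier output and label list, for illustration only.
labels = ["cat", "dog", "bird", "fish"]
probabilities = [0.10, 0.65, 0.20, 0.05]

for label, score in top_k_labels(probabilities, labels, k=2):
    print("%s: %.2f" % (label, score))
# prints:
# dog: 0.65
# bird: 0.20
```

In a real application, `probabilities` would be read from the interpreter's output tensor and `labels` would typically come from a label file shipped with the model.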
|
## Supported platforms
|
||||||
|
|
||||||
TensorFlow inference APIs are provided for most common mobile/embedded platforms
|
TensorFlow inference APIs are provided for most common mobile/embedded platforms
|
||||||
such as Android, iOS and Linux.
|
such as Android, iOS and Linux, in multiple programming languages.
|
||||||
|
|
||||||
#### Android
|
In most cases, the API design reflects a preference for performance over ease of
|
||||||
|
use. TensorFlow Lite is designed for fast inference on small devices, so it
|
||||||
|
should be no surprise that the APIs try to avoid unnecessary copies at the
|
||||||
|
expense of convenience. Similarly, consistency with TensorFlow APIs was not an
|
||||||
|
explicit goal and some variance between languages is to be expected.
|
||||||
|
|
||||||
|
Across all libraries, the TensorFlow Lite API enables you to load models,
|
||||||
|
feed inputs, and retrieve inference outputs.
|
||||||
|
|
||||||
|
### Android
|
||||||
|
|
||||||
On Android, TensorFlow Lite inference can be performed using either Java or C++
|
On Android, TensorFlow Lite inference can be performed using either Java or C++
|
||||||
APIs. The Java APIs provide convenience and can be used directly within your
|
APIs. The Java APIs provide convenience and can be used directly within your
|
||||||
Android Activity classes. The C++ APIs offer more flexibility and speed, but may
|
Android Activity classes. The C++ APIs offer more flexibility and speed, but may
|
||||||
require writing JNI wrappers to move data between Java and C++ layers.
|
require writing JNI wrappers to move data between Java and C++ layers.
|
||||||
|
|
||||||
Visit the [Android quickstart](android.md) for a tutorial and example code.
|
See below for details about using C++ and Java, or
|
||||||
|
follow the [Android quickstart](android.md) for a tutorial and example code.
|
||||||
|
|
||||||
#### iOS
|
### iOS
|
||||||
|
|
||||||
TensorFlow Lite provides native iOS libraries written in
|
On iOS, TensorFlow Lite is available with native iOS libraries written in
|
||||||
[Swift](https://www.tensorflow.org/code/tensorflow/lite/experimental/swift)
|
[Swift](https://www.tensorflow.org/code/tensorflow/lite/experimental/swift)
|
||||||
and
|
and
|
||||||
[Objective-C](https://www.tensorflow.org/code/tensorflow/lite/experimental/objc).
|
[Objective-C](https://www.tensorflow.org/code/tensorflow/lite/experimental/objc).
|
||||||
|
|
||||||
Visit the [iOS quickstart](ios.md) for a tutorial and example code.
|
This page doesn't include a discussion about these languages, so you should
|
||||||
|
refer to the [iOS quickstart](ios.md) for a tutorial and example code.
|
||||||
|
|
||||||
#### Linux
|
### Linux
|
||||||
On Linux platforms such as [Raspberry Pi](build_rpi.md), TensorFlow Lite C++
|
|
||||||
and Python APIs can be used to run inference.
|
On Linux platforms (including [Raspberry Pi](build_rpi.md)), you can run
|
||||||
|
inferences using TensorFlow Lite APIs available in C++ and Python, as shown
|
||||||
|
in the following sections.
|
||||||
|
|
||||||
|
|
||||||
## API Guides
|
## Load and run a model in C++
|
||||||
|
|
||||||
TensorFlow Lite provides programming APIs in C++, Java and Python, with
|
Running a TensorFlow Lite model with C++ involves a few simple steps:
|
||||||
experimental bindings for several other languages (C, Swift, Objective-C). In
|
|
||||||
most cases, the API design reflects a preference for performance over ease of
|
|
||||||
use. TensorFlow Lite is designed for fast inference on small devices so it
|
|
||||||
should be no surprise that the APIs try to avoid unnecessary copies at the
|
|
||||||
expense of convenience. Similarly, consistency with TensorFlow APIs was not an
|
|
||||||
explicit goal and some variance is to be expected.
|
|
||||||
|
|
||||||
There is also a [Python API for TensorFlow Lite](../convert/python_api.md).
|
1. Load the model into memory as a `FlatBufferModel`.
|
||||||
|
2. Build an `Interpreter` based on an existing `FlatBufferModel`.
|
||||||
|
3. Set input tensor values. (Optionally resize input tensors if the
|
||||||
|
predefined sizes are not desired.)
|
||||||
|
4. Invoke inference.
|
||||||
|
5. Read output tensor values.
|
||||||
|
|
||||||
### Loading a Model
|
The [`FlatBufferModel`](
|
||||||
|
https://www.tensorflow.org/lite/api_docs/cc/class/tflite/flat-buffer-model.html)
|
||||||
#### C++
|
class encapsulates a TensorFlow Lite model and you can
|
||||||
The `FlatBufferModel` class encapsulates a model and can be built in a couple of
|
build it in a couple of different ways, depending on where the model is stored:
|
||||||
slightly different ways depending on where the model is stored:
|
|
||||||
|
|
||||||
```c++
|
```c++
|
||||||
class FlatBufferModel {
|
class FlatBufferModel {
|
||||||
@ -104,72 +117,36 @@ class FlatBufferModel {
|
|||||||
};
|
};
|
||||||
```
|
```
|
||||||
|
|
||||||
```c++
|
Note: If TensorFlow Lite detects the presence of the [Android NNAPI](
|
||||||
tflite::FlatBufferModel model(path_to_model);
|
https://developer.android.com/ndk/guides/neuralnetworks), it will
|
||||||
```
|
automatically try to use shared memory to store the `FlatBufferModel`.
|
||||||
|
|
||||||
Note that if TensorFlow Lite detects the presence of Android's NNAPI it will
|
Now that you have the model as a `FlatBufferModel` object, you can execute it
|
||||||
automatically try to use shared memory to store the FlatBufferModel.
|
with an [`Interpreter`](
|
||||||
|
https://www.tensorflow.org/lite/api_docs/cc/class/tflite/interpreter.html).
|
||||||
|
A single `FlatBufferModel` can be used
|
||||||
|
simultaneously by more than one `Interpreter`.
|
||||||
|
|
||||||
#### Java
|
Caution: The `FlatBufferModel` object must remain valid until
|
||||||
|
all instances of `Interpreter` using it have been destroyed.
|
||||||
|
|
||||||
TensorFlow Lite's Java API supports on-device inference and is provided as an
|
The important parts of the `Interpreter` API are shown in the
|
||||||
Android Studio Library that allows loading models, feeding inputs, and
|
code snippet below. It should be noted that:
|
||||||
retrieving inference outputs.
|
|
||||||
|
|
||||||
The `Interpreter` class drives model inference with TensorFlow Lite. In
|
|
||||||
most of the cases, this is the only class an app developer will need.
|
|
||||||
|
|
||||||
The `Interpreter` can be initialized with a model file using the constructor:
|
|
||||||
|
|
||||||
```java
|
|
||||||
public Interpreter(@NotNull File modelFile);
|
|
||||||
```
|
|
||||||
|
|
||||||
or with a `MappedByteBuffer`:
|
|
||||||
|
|
||||||
```java
|
|
||||||
public Interpreter(@NotNull MappedByteBuffer mappedByteBuffer);
|
|
||||||
```
|
|
||||||
|
|
||||||
In both cases a valid TensorFlow Lite model must be provided or an
|
|
||||||
`IllegalArgumentException` with be thrown. If a `MappedByteBuffer` is used to
|
|
||||||
initialize an Interpreter, it should remain unchanged for the whole lifetime of
|
|
||||||
the `Interpreter`.
|
|
||||||
|
|
||||||
### Running a Model {#running_a_model}
|
|
||||||
|
|
||||||
#### C++
|
|
||||||
Running a model involves a few simple steps:
|
|
||||||
|
|
||||||
* Build an `Interpreter` based on an existing `FlatBufferModel`
|
|
||||||
* Optionally resize input tensors if the predefined sizes are not desired.
|
|
||||||
* Set input tensor values
|
|
||||||
* Invoke inference
|
|
||||||
* Read output tensor values
|
|
||||||
|
|
||||||
The important parts of public interface of the `Interpreter` are provided
|
|
||||||
below. It should be noted that:
|
|
||||||
|
|
||||||
* Tensors are represented by integers, in order to avoid string comparisons
|
* Tensors are represented by integers, in order to avoid string comparisons
|
||||||
(and any fixed dependency on string libraries).
|
(and any fixed dependency on string libraries).
|
||||||
* An interpreter must not be accessed from concurrent threads.
|
* An interpreter must not be accessed from concurrent threads.
|
||||||
* Memory allocation for input and output tensors must be triggered
|
* Memory allocation for input and output tensors must be triggered
|
||||||
by calling AllocateTensors() right after resizing tensors.
|
by calling `AllocateTensors()` right after resizing tensors.
|
||||||
|
|
||||||
In order to run the inference model in TensorFlow Lite, one has to load the
|
The simplest usage of TensorFlow Lite with C++ looks like this:
|
||||||
model into a `FlatBufferModel` object which then can be executed by an
|
|
||||||
`Interpreter`. The `FlatBufferModel` needs to remain valid for the whole
|
|
||||||
lifetime of the `Interpreter`, and a single `FlatBufferModel` can be
|
|
||||||
simultaneously used by more than one `Interpreter`. In concrete terms, the
|
|
||||||
`FlatBufferModel` object must be created before any `Interpreter` objects that
|
|
||||||
use it, and must be kept around until they have all been destroyed.
|
|
||||||
|
|
||||||
The simplest usage of TensorFlow Lite will look like this:
|
|
||||||
|
|
||||||
```c++
|
```c++
|
||||||
tflite::FlatBufferModel model(path_to_model);
|
// Load the model
|
||||||
|
std::unique_ptr<tflite::FlatBufferModel> model =
|
||||||
|
tflite::FlatBufferModel::BuildFromFile(filename);
|
||||||
|
|
||||||
|
// Build the interpreter
|
||||||
tflite::ops::builtin::BuiltinOpResolver resolver;
|
tflite::ops::builtin::BuiltinOpResolver resolver;
|
||||||
std::unique_ptr<tflite::Interpreter> interpreter;
|
std::unique_ptr<tflite::Interpreter> interpreter;
|
||||||
tflite::InterpreterBuilder(*model, resolver)(&interpreter);
|
tflite::InterpreterBuilder(*model, resolver)(&interpreter);
|
||||||
@ -185,9 +162,40 @@ interpreter->Invoke();
|
|||||||
float* output = interpreter->typed_output_tensor<float>(0);
|
float* output = interpreter->typed_output_tensor<float>(0);
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Java
|
For more example code, see [`minimal.cc`](
|
||||||
|
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/minimal/minimal.cc)
|
||||||
|
and [`label_image.cc`](
|
||||||
|
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/label_image/label_image.cc).
|
||||||
|
|
||||||
The simplest usage of Tensorflow Lite Java API looks like this:
|
|
||||||
|
## Load and run a model in Java
|
||||||
|
|
||||||
|
The Java API for running an inference with TensorFlow Lite is primarily designed
|
||||||
|
for use with Android, so it's available as an Android library dependency:
|
||||||
|
`org.tensorflow:tensorflow-lite`.
|
||||||
|
|
||||||
|
In Java, you'll use the `Interpreter` class to load a model and drive model
|
||||||
|
inference. In many cases, this may be the only API you need.
|
||||||
|
|
||||||
|
You can initialize an `Interpreter` using a `.tflite` file:
|
||||||
|
|
||||||
|
```java
|
||||||
|
public Interpreter(@NotNull File modelFile);
|
||||||
|
```
|
||||||
|
|
||||||
|
Or with a `MappedByteBuffer`:
|
||||||
|
|
||||||
|
```java
|
||||||
|
public Interpreter(@NotNull MappedByteBuffer mappedByteBuffer);
|
||||||
|
```
|
||||||
|
|
||||||
|
In both cases, you must provide a valid TensorFlow Lite model or the API throws
|
||||||
|
`IllegalArgumentException`. If you use `MappedByteBuffer` to
|
||||||
|
initialize an `Interpreter`, it must remain unchanged for the whole lifetime
|
||||||
|
of the `Interpreter`.
|
||||||
|
|
||||||
|
To then run an inference with the model, simply call `Interpreter.run()`.
|
||||||
|
For example:
|
||||||
|
|
||||||
```java
|
```java
|
||||||
try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
|
try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
|
||||||
@ -195,48 +203,44 @@ try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model))
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
If a model takes only one input and returns only one output, the following will
|
The `run()` method takes only one input and returns only one output. So if your
|
||||||
trigger an inference run:
|
model has multiple inputs or multiple outputs, instead use:
|
||||||
|
|
||||||
```java
|
|
||||||
interpreter.run(input, output);
|
|
||||||
```
|
|
||||||
|
|
||||||
For models with multiple inputs, or multiple outputs, use:
|
|
||||||
|
|
||||||
```java
|
```java
|
||||||
interpreter.runForMultipleInputsOutputs(inputs, map_of_indices_to_outputs);
|
interpreter.runForMultipleInputsOutputs(inputs, map_of_indices_to_outputs);
|
||||||
```
|
```
|
||||||
|
|
||||||
where each entry in `inputs` corresponds to an input tensor and
|
In this case, each entry in `inputs` corresponds to an input tensor and
|
||||||
`map_of_indices_to_outputs` maps indices of output tensors to the corresponding
|
`map_of_indices_to_outputs` maps indices of output tensors to the corresponding
|
||||||
output data. In both cases the tensor indices should correspond to the values
|
output data.
|
||||||
given to the
|
|
||||||
[TensorFlow Lite Optimized Converter](../convert/cmdline_examples.md) when the
|
|
||||||
model was created. Be aware that the order of tensors in `input` must match the
|
|
||||||
order given to the `TensorFlow Lite Optimized Converter`.
|
|
||||||
|
|
||||||
The Java API also provides convenient functions for app developers to get the
|
In both cases, the tensor indices should correspond to the values you gave to
|
||||||
index of any model input or output using a tensor name:
|
the [TensorFlow Lite Converter](../convert/) when you created the model.
|
||||||
|
Be aware that the order of tensors in `input` must match the
|
||||||
|
order given to the TensorFlow Lite Converter.
|
||||||
|
|
||||||
|
The `Interpreter` class also provides convenient functions for you to get the
|
||||||
|
index of any model input or output using an operation name:
|
||||||
|
|
||||||
```java
|
```java
|
||||||
public int getInputIndex(String tensorName);
|
public int getInputIndex(String opName);
|
||||||
public int getOutputIndex(String tensorName);
|
public int getOutputIndex(String opName);
|
||||||
```
|
```
|
||||||
|
|
||||||
If tensorName is not a valid name in model, an `IllegalArgumentException` will
|
If `opName` is not a valid operation in the model, it throws an
|
||||||
be thrown.
|
`IllegalArgumentException`.
|
||||||
|
|
||||||
##### Releasing Resources After Use
|
Also beware that `Interpreter` owns resources. To avoid memory leaks, the
|
||||||
|
resources must be released after use by:
|
||||||
An `Interpreter` owns resources. To avoid memory leak, the resources must be
|
|
||||||
released after use by:
|
|
||||||
|
|
||||||
```java
|
```java
|
||||||
interpreter.close();
|
interpreter.close();
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Supported Data Types
|
For an example project with Java, see the [Android image classification sample](
|
||||||
|
https://github.com/tensorflow/examples/tree/master/lite/examples/image_classification/android).
|
||||||
|
|
||||||
|
### Supported data types (in Java)
|
||||||
|
|
||||||
To use TensorFlow Lite, the data types of the input and output tensors must be
|
To use TensorFlow Lite, the data types of the input and output tensors must be
|
||||||
one of the following primitive types:
|
one of the following primitive types:
|
||||||
@ -256,7 +260,7 @@ provided as a single, flat `ByteBuffer` argument.
|
|||||||
If other data types, including boxed types like `Integer` and `Float`, are used,
|
If other data types, including boxed types like `Integer` and `Float`, are used,
|
||||||
an `IllegalArgumentException` will be thrown.
|
an `IllegalArgumentException` will be thrown.
|
||||||
|
|
||||||
##### Inputs
|
#### Inputs
|
||||||
|
|
||||||
Each input should be an array or multi-dimensional array of the supported
|
Each input should be an array or multi-dimensional array of the supported
|
||||||
primitive types, or a raw `ByteBuffer` of the appropriate size. If the input is
|
primitive types, or a raw `ByteBuffer` of the appropriate size. If the input is
|
||||||
@ -265,12 +269,12 @@ implicitly resized to the array's dimensions at inference time. If the input is
|
|||||||
a ByteBuffer, the caller should first manually resize the associated input
|
a ByteBuffer, the caller should first manually resize the associated input
|
||||||
tensor (via `Interpreter.resizeInput()`) before running inference.
|
tensor (via `Interpreter.resizeInput()`) before running inference.
|
||||||
|
|
||||||
When using 'ByteBuffer', prefer using direct byte buffers, as this allows the
|
When using `ByteBuffer`, prefer using direct byte buffers, as this allows the
|
||||||
`Interpreter` to avoid unnecessary copies. If the `ByteBuffer` is a direct byte
|
`Interpreter` to avoid unnecessary copies. If the `ByteBuffer` is a direct byte
|
||||||
buffer, its order must be `ByteOrder.nativeOrder()`. After it is used for a
|
buffer, its order must be `ByteOrder.nativeOrder()`. After it is used for a
|
||||||
model inference, it must remain unchanged until the model inference is finished.
|
model inference, it must remain unchanged until the model inference is finished.
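
The buffer requirements above can be sketched with the standard `java.nio` classes alone. The helper name `allocateFloatInput` and the 1x64x64x3 float input shape are assumptions made for illustration; size the buffer to match your own model's input tensor.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DirectInputBuffer {
    // Allocates a direct buffer large enough for numElements floats and
    // sets the byte order to the platform's native order.
    static ByteBuffer allocateFloatInput(int numElements) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(numElements * Float.BYTES);
        buffer.order(ByteOrder.nativeOrder());
        return buffer;
    }

    public static void main(String[] args) {
        // Assumed input shape: 1 x 64 x 64 x 3 float values.
        int numElements = 1 * 64 * 64 * 3;
        ByteBuffer input = allocateFloatInput(numElements);

        // Fill the buffer with placeholder data, then rewind so that a
        // reader starts from the first element.
        for (int i = 0; i < numElements; i++) {
            input.putFloat(0.5f);
        }
        input.rewind();

        System.out.println("direct: " + input.isDirect());
        System.out.println("native order: " + (input.order() == ByteOrder.nativeOrder()));
    }
}
```

Once a buffer like this is passed to the interpreter, leave its contents untouched until inference has finished.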
|
||||||
|
|
||||||
##### Outputs
|
#### Outputs
|
||||||
|
|
||||||
Each output should be an array or multi-dimensional array of the supported
|
Each output should be an array or multi-dimensional array of the supported
|
||||||
primitive types, or a ByteBuffer of the appropriate size. Note that some models
|
primitive types, or a ByteBuffer of the appropriate size. Note that some models
|
||||||
@ -279,7 +283,75 @@ the input. There's no straightforward way of handling this with the existing
|
|||||||
Java inference API, but planned extensions will make this possible.
|
Java inference API, but planned extensions will make this possible.
|
||||||
|
|
||||||
|
|
||||||
## Writing Custom Operators
|
## Load and run a model in Python
|
||||||
|
|
||||||
|
The Python API for running an inference is provided in the `tf.lite`
|
||||||
|
module. From which, you mostly need only [`tf.lite.Interpreter`](
|
||||||
|
https://www.tensorflow.org/api_docs/python/tf/lite/Interpreter) to load
|
||||||
|
a model and run an inference.
|
||||||
|
|
||||||
|
The following example shows how to use the Python interpreter to load a
|
||||||
|
`.tflite` file and run inference with random input data:
|
||||||
|
|
||||||
|
```python
|
||||||
|
import numpy as np
|
||||||
|
import tensorflow as tf
|
||||||
|
|
||||||
|
# Load TFLite model and allocate tensors.
|
||||||
|
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
|
||||||
|
interpreter.allocate_tensors()
|
||||||
|
|
||||||
|
# Get input and output tensors.
|
||||||
|
input_details = interpreter.get_input_details()
|
||||||
|
output_details = interpreter.get_output_details()
|
||||||
|
|
||||||
|
# Test model on random input data.
|
||||||
|
input_shape = input_details[0]['shape']
|
||||||
|
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
|
||||||
|
interpreter.set_tensor(input_details[0]['index'], input_data)
|
||||||
|
|
||||||
|
interpreter.invoke()
|
||||||
|
|
||||||
|
# The function `get_tensor()` returns a copy of the tensor data.
|
||||||
|
# Use `tensor()` in order to get a pointer to the tensor.
|
||||||
|
output_data = interpreter.get_tensor(output_details[0]['index'])
|
||||||
|
print(output_data)
|
||||||
|
```
|
||||||
|
|
||||||
|
Alternative to loading the model as a pre-converted `.tflite` file, you can
|
||||||
|
combine your code with the [TensorFlow Lite Converter Python API](
|
||||||
|
../convert/python_api.md) (`tf.lite.TFLiteConverter`), allowing you to convert
|
||||||
|
your TensorFlow model into the TensorFlow Lite format and then run an inference:
|
||||||
|
|
||||||
|
```python
|
||||||
|
import numpy as np
|
||||||
|
import tensorflow as tf
|
||||||
|
|
||||||
|
img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3))
|
||||||
|
const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
|
||||||
|
val = img + const
|
||||||
|
out = tf.identity(val, name="out")
|
||||||
|
|
||||||
|
# Convert to TF Lite format
|
||||||
|
with tf.Session() as sess:
|
||||||
|
  converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out])
|
||||||
|
  tflite_model = converter.convert()
|
||||||
|
|
||||||
|
# Load TFLite model and allocate tensors.
|
||||||
|
interpreter = tf.lite.Interpreter(model_content=tflite_model)
|
||||||
|
interpreter.allocate_tensors()
|
||||||
|
|
||||||
|
# Continue to get tensors and so forth, as shown above...
|
||||||
|
```
|
||||||
|
|
||||||
|
For more Python sample code, see [`label_image.py`](
|
||||||
|
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/python/label_image.py).
|
||||||
|
|
||||||
|
Tip: Run `help(tf.lite.Interpreter)` in the Python terminal to get detailed
|
||||||
|
documentation about the interpreter.
|
||||||
|
|
||||||
|
|
||||||
|
## Write a custom operator
|
||||||
|
|
||||||
All TensorFlow Lite operators (both custom and builtin) are defined using a
|
All TensorFlow Lite operators (both custom and builtin) are defined using a
|
||||||
simple pure-C interface that consists of four functions:
|
simple pure-C interface that consists of four functions:
|
||||||
@ -343,7 +415,7 @@ Note that registration is not automatic and an explicit call to
|
|||||||
registration of builtins, custom ops will have to be collected in separate
|
registration of builtins, custom ops will have to be collected in separate
|
||||||
custom libraries.
|
custom libraries.
|
||||||
|
|
||||||
### Customizing the kernel library
|
### Customize the kernel library
|
||||||
|
|
||||||
Behind the scenes the interpreter will load a library of kernels which will be
|
Behind the scenes the interpreter will load a library of kernels which will be
|
||||||
assigned to execute each of the operators in the model. While the default
|
assigned to execute each of the operators in the model. While the default
|
||||||
@ -362,21 +434,19 @@ class OpResolver {
|
|||||||
};
|
};
|
||||||
```
|
```
|
||||||
|
|
||||||
Regular usage will require the developer to use the `BuiltinOpResolver` and
|
Regular usage requires that you use the `BuiltinOpResolver` and write:
|
||||||
write:
|
|
||||||
|
|
||||||
```c++
|
```c++
|
||||||
tflite::ops::builtin::BuiltinOpResolver resolver;
|
tflite::ops::builtin::BuiltinOpResolver resolver;
|
||||||
```
|
```
|
||||||
|
|
||||||
They can then optionally register custom ops:
|
You can optionally register custom ops (before you pass the resolver to the
|
||||||
|
`InterpreterBuilder`):
|
||||||
|
|
||||||
```c++
|
```c++
|
||||||
resolver.AddOp("MY_CUSTOM_OP", Register_MY_CUSTOM_OP());
|
resolver.AddOp("MY_CUSTOM_OP", Register_MY_CUSTOM_OP());
|
||||||
```
|
```
|
||||||
|
|
||||||
before the resolver is passed to the `InterpreterBuilder`.
|
|
||||||
|
|
||||||
If the set of builtin ops is deemed to be too large, a new `OpResolver` could
|
If the set of builtin ops is deemed to be too large, a new `OpResolver` could
|
||||||
be code-generated based on a given subset of ops, possibly only the ones
|
be code-generated based on a given subset of ops, possibly only the ones
|
||||||
contained in a given model. This is the equivalent of TensorFlow's selective
|
contained in a given model. This is the equivalent of TensorFlow's selective
|
||||||
|
|||||||