Add inference instruction for iOS

PiperOrigin-RevId: 317239235
Change-Id: I55bd7e43bc286f34024ccfc27db61d28304a651d
Taehee Jeong 2020-06-18 20:48:49 -07:00 committed by TensorFlower Gardener
parent 85ad8031f6
commit 2a05589bd4


@ -7,9 +7,9 @@ inference with a TensorFlow Lite model, you must run it through an
The interpreter uses a static graph ordering and a custom (less-dynamic) memory
allocator to ensure minimal load, initialization, and execution latency.
This page describes how to access the TensorFlow Lite interpreter and perform
an inference using C++, Java, and Python, plus links to other resources for each
[supported platform](#supported-platforms).
[TOC]
@ -17,31 +17,31 @@ for each [supported platform](#supported-platforms).
TensorFlow Lite inference typically follows these steps:

1. **Loading a model**

   You must load the `.tflite` model into memory, which contains the model's
   execution graph.

1. **Transforming data**

   Raw input data for the model generally does not match the input data format
   expected by the model. For example, you might need to resize an image or
   change the image format to be compatible with the model.

1. **Running inference**

   This step involves using the TensorFlow Lite API to execute the model. It
   involves a few steps, such as building the interpreter and allocating
   tensors, as described in the following sections.

1. **Interpreting output**

   When you receive results from the model inference, you must interpret the
   tensors in a meaningful way that's useful in your application.

   For example, a model might return only a list of probabilities. It's up to
   you to map the probabilities to relevant categories and present them to your
   end user (see the sketch after this list).
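To make the last step concrete, here is a hedged C++ sketch of mapping a
probability output to a category label. It is illustrative only and not part of
the original steps: it assumes an `interpreter` that has already been run (as in
the C++ section below), a float output of per-class probabilities, and a
hypothetical `kLabels` list supplied by the application.

```c++
#include <algorithm>
#include <string>
#include <vector>

#include "tensorflow/lite/interpreter.h"

// Hypothetical label list supplied by the application, one entry per class.
const std::vector<std::string> kLabels = {"cat", "dog", "bird"};

// Picks the most likely category from the first output tensor, assuming the
// model emits one probability per class (for example, after a softmax).
std::string TopCategory(tflite::Interpreter* interpreter) {
  const float* probs = interpreter->typed_output_tensor<float>(0);
  const float* best = std::max_element(probs, probs + kLabels.size());
  return kLabels[best - probs];
}
```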
## Supported platforms
@ -54,8 +54,8 @@ should be no surprise that the APIs try to avoid unnecessary copies at the
expense of convenience. Similarly, consistency with TensorFlow APIs was not an
explicit goal and some variance between languages is to be expected.
Across all libraries, the TensorFlow Lite API enables you to load models, feed
inputs, and retrieve inference outputs.
### Android
@ -64,8 +64,8 @@ APIs. The Java APIs provide convenience and can be used directly within your
Android Activity classes. The C++ APIs offer more flexibility and speed, but may
require writing JNI wrappers to move data between Java and C++ layers.
See below for details about using C++ and Java, or follow the
[Android quickstart](android.md) for a tutorial and example code.
#### TensorFlow Lite Android wrapper code generator
@ -86,103 +86,36 @@ On iOS, TensorFlow Lite is available with native iOS libraries written in
[Swift](https://www.tensorflow.org/code/tensorflow/lite/experimental/swift)
and
[Objective-C](https://www.tensorflow.org/code/tensorflow/lite/experimental/objc).
See below for details about using Swift, Objective-C, and the C API, or follow
the [iOS quickstart](ios.md) for a tutorial and example code.
### Linux
On Linux platforms (including [Raspberry Pi](build_rpi.md)), you can run
inferences using TensorFlow Lite APIs available in C++ and Python, as shown in
the following sections.
## Running a model
Running a TensorFlow Lite model involves a few simple steps:
1. Load the model into memory.
2. Build an `Interpreter` based on an existing model.
3. Set input tensor values. (Optionally resize input tensors if the predefined
   sizes are not desired.)
4. Invoke inference.
5. Read output tensor values.

The following sections describe how these steps can be done in each language.
## Load and run a model in Java
*Platform: Android*
The Java API for running an inference with TensorFlow Lite is primarily designed
for use with Android, so it's available as an Android library dependency:
`org.tensorflow:tensorflow-lite`.
@ -203,12 +136,12 @@ public Interpreter(@NotNull MappedByteBuffer mappedByteBuffer);
```
In both cases, you must provide a valid TensorFlow Lite model or the API throws
`IllegalArgumentException`. If you use `MappedByteBuffer` to initialize an
`Interpreter`, it must remain unchanged for the whole lifetime of the
`Interpreter`.
To then run an inference with the model, simply call `Interpreter.run()`. For
example:
```java
try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
@ -228,9 +161,9 @@ In this case, each entry in `inputs` corresponds to an input tensor and
output data.
In both cases, the tensor indices should correspond to the values you gave to
the [TensorFlow Lite Converter](../convert/) when you created the model. Be
aware that the order of tensors in `input` must match the order given to the
TensorFlow Lite Converter.
The `Interpreter` class also provides convenient functions for you to get the
index of any model input or output using an operation name:
@ -250,8 +183,8 @@ resources must be released after use by:
interpreter.close();
```
For an example project with Java, see the
[Android image classification sample](https://github.com/tensorflow/examples/tree/master/lite/examples/image_classification/android).
### Supported data types (in Java)
@ -295,13 +228,231 @@ have dynamic outputs, where the shape of output tensors can vary depending on
the input. There's no straightforward way of handling this with the existing
Java inference API, but planned extensions will make this possible.
## Load and run a model in Swift
*Platform: iOS*
The
[Swift API](https://www.tensorflow.org/code/tensorflow/lite/experimental/swift)
is available via the `TensorFlowLiteSwift` pod from CocoaPods.

First, you need to import the `TensorFlowLite` module.
```swift
import TensorFlowLite
```
```swift
// Getting model path
guard
  let modelPath = Bundle.main.path(forResource: "model", ofType: "tflite")
else {
  // Error handling...
  return
}

do {
  // Initialize an interpreter with the model.
  let interpreter = try Interpreter(modelPath: modelPath)

  // Allocate memory for the model's input `Tensor`s.
  try interpreter.allocateTensors()

  let inputData = Data()  // Should be initialized with the model's input bytes.
  // input data preparation...

  // Copy the input data to the input `Tensor`.
  try interpreter.copy(inputData, toInputAt: 0)

  // Run inference by invoking the `Interpreter`.
  try interpreter.invoke()

  // Get the output `Tensor`
  let outputTensor = try interpreter.output(at: 0)

  // Copy output to `Data` to process the inference results.
  let outputSize = outputTensor.shape.dimensions.reduce(1, {x, y in x * y})
  let outputData =
      UnsafeMutableBufferPointer<Float32>.allocate(capacity: outputSize)
  outputTensor.data.copyBytes(to: outputData)
} catch {
  // Error handling...
}
```
## Load and run a model in Objective-C
*Platform: iOS*
The
[Objective-C API](https://www.tensorflow.org/code/tensorflow/lite/experimental/objc)
is available via the `TensorFlowLiteObjC` pod from CocoaPods.

First, you need to import the `TensorFlowLite` module.
```objc
@import TensorFlowLite;
```
```objc
NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"model"
                                                      ofType:@"tflite"];
NSError *error;

// Initialize an interpreter with the model.
TFLInterpreter *interpreter = [[TFLInterpreter alloc] initWithModelPath:modelPath
                                                                  error:&error];
if (error != nil) { /* Error handling... */ }

// Allocate memory for the model's input `TFLTensor`s.
[interpreter allocateTensorsWithError:&error];
if (error != nil) { /* Error handling... */ }

NSMutableData *inputData;  // Should be initialized
// input data preparation...

// Copy the input data to the input `TFLTensor`.
[interpreter copyData:inputData toInputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Run inference by invoking the `TFLInterpreter`.
[interpreter invokeWithError:&error];
if (error != nil) { /* Error handling... */ }

// Get the output `TFLTensor`
TFLTensor *outputTensor = [interpreter outputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy output to `NSData` to process the inference results.
NSData *outputData = [outputTensor dataWithError:&error];
if (error != nil) { /* Error handling... */ }
```
### Using C API in Objective-C code
Currently, the Objective-C API does not support delegates. In order to use
delegates with Objective-C code, you need to directly call the underlying
[C API](https://www.tensorflow.org/code/tensorflow/lite/c/c_api.h).
```c
#include "tensorflow/lite/c/c_api.h"
```
```c
TfLiteModel* model = TfLiteModelCreateFromFile([modelPath UTF8String]);
TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();

// Create the interpreter.
TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

// Allocate tensors and populate the input tensor data. Here, `input` and
// `output` are caller-provided float buffers sized to match the model's
// input and output tensors.
TfLiteInterpreterAllocateTensors(interpreter);
TfLiteTensor* input_tensor =
    TfLiteInterpreterGetInputTensor(interpreter, 0);
TfLiteTensorCopyFromBuffer(input_tensor, input.data(),
                           input.size() * sizeof(float));

// Execute inference.
TfLiteInterpreterInvoke(interpreter);

// Extract the output tensor data.
const TfLiteTensor* output_tensor =
    TfLiteInterpreterGetOutputTensor(interpreter, 0);
TfLiteTensorCopyToBuffer(output_tensor, output.data(),
                         output.size() * sizeof(float));

// Dispose of the model and interpreter objects.
TfLiteInterpreterDelete(interpreter);
TfLiteInterpreterOptionsDelete(options);
TfLiteModelDelete(model);
```
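Since the usual reason to drop down to the C API here is to attach a delegate,
the following is a hedged sketch of how that might look. It is not from the
original page: it assumes the Metal GPU delegate library is linked and that
`TFLGpuDelegateCreate`/`TFLGpuDelegateDelete` from
`tensorflow/lite/delegates/gpu/metal_delegate.h` are available; substitute
whichever delegate you actually use. The same calls work from C, C++, or
Objective-C.

```c++
#include "tensorflow/lite/c/c_api.h"
#include "tensorflow/lite/delegates/gpu/metal_delegate.h"  // Assumed available.

void RunWithGpuDelegate(const char* model_path) {
  // Create the delegate (here, the Metal GPU delegate with default options).
  TfLiteDelegate* delegate = TFLGpuDelegateCreate(/*options=*/nullptr);

  // Register the delegate on the options *before* creating the interpreter,
  // so the graph can be partitioned onto the delegate.
  TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
  TfLiteInterpreterOptionsAddDelegate(options, delegate);

  TfLiteModel* model = TfLiteModelCreateFromFile(model_path);
  TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

  // ...allocate tensors, copy inputs, invoke, and read outputs as shown above...

  // Clean up. The delegate must outlive the interpreter, so delete it last.
  TfLiteInterpreterDelete(interpreter);
  TfLiteInterpreterOptionsDelete(options);
  TfLiteModelDelete(model);
  TFLGpuDelegateDelete(delegate);
}
```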
## Load and run a model in C++
*Platforms: Android and Linux*
In C++, the model is stored in the
[`FlatBufferModel`](https://www.tensorflow.org/lite/api_docs/cc/class/tflite/flat-buffer-model.html)
class. It encapsulates a TensorFlow Lite model and you can build it in a couple
of different ways, depending on where the model is stored:
```c++
class FlatBufferModel {
 // Build a model based on a file. Return a nullptr in case of failure.
 static std::unique_ptr<FlatBufferModel> BuildFromFile(
     const char* filename,
     ErrorReporter* error_reporter);
 // Build a model based on a pre-loaded flatbuffer. The caller retains
 // ownership of the buffer and should keep it alive until the returned object
 // is destroyed. Return a nullptr in case of failure.
 static std::unique_ptr<FlatBufferModel> BuildFromBuffer(
     const char* buffer,
     size_t buffer_size,
     ErrorReporter* error_reporter);
};
```
Note: If TensorFlow Lite detects the presence of the
[Android NNAPI](https://developer.android.com/ndk/guides/neuralnetworks), it
will automatically try to use shared memory to store the `FlatBufferModel`.
Now that you have the model as a `FlatBufferModel` object, you can execute it
with an
[`Interpreter`](https://www.tensorflow.org/lite/api_docs/cc/class/tflite/interpreter.html).
A single `FlatBufferModel` can be used simultaneously by more than one
`Interpreter`.
Caution: The `FlatBufferModel` object must remain valid until all instances of
`Interpreter` using it have been destroyed.
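To make the lifetime requirement concrete, one hedged pattern (not from the
original guide; `TfLiteEngine` and `CreateEngine` are hypothetical names) is to
own the `FlatBufferModel` and the `Interpreter` built from it in the same
object, declaring the model first so it is destroyed last:

```c++
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

// Owns the model and the interpreter together. Members are destroyed in
// reverse declaration order, so the interpreter goes away before the model it
// depends on.
struct TfLiteEngine {
  std::unique_ptr<tflite::FlatBufferModel> model;
  std::unique_ptr<tflite::Interpreter> interpreter;
};

// Hypothetical helper that builds both from a model file path.
std::unique_ptr<TfLiteEngine> CreateEngine(const char* filename) {
  auto engine = std::make_unique<TfLiteEngine>();
  engine->model = tflite::FlatBufferModel::BuildFromFile(filename);
  if (!engine->model) return nullptr;
  tflite::ops::builtin::BuiltinOpResolver resolver;
  tflite::InterpreterBuilder(*engine->model, resolver)(&engine->interpreter);
  if (!engine->interpreter) return nullptr;
  return engine;
}
```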
The important parts of the `Interpreter` API are shown in the code snippet
below. It should be noted that:
* Tensors are represented by integers, in order to avoid string comparisons
(and any fixed dependency on string libraries).
* An interpreter must not be accessed from concurrent threads.
* Memory allocation for input and output tensors must be triggered by calling
`AllocateTensors()` right after resizing tensors.
The simplest usage of TensorFlow Lite with C++ looks like this:
```c++
// Load the model
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filename);

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);

// Resize input tensors, if desired.
interpreter->AllocateTensors();

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.

interpreter->Invoke();

float* output = interpreter->typed_output_tensor<float>(0);
```
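If the model's default input sizes do not match your data, remember the bullet
above: resize first, then call `AllocateTensors()` again before writing any
input data. A minimal hedged sketch follows; the `{1, 224, 224, 3}` shape is
just an assumed example.

```c++
// Resize the first input tensor to an assumed shape of [1, 224, 224, 3], then
// (re)allocate tensors before touching any tensor data.
interpreter->ResizeInputTensor(interpreter->inputs()[0], {1, 224, 224, 3});
if (interpreter->AllocateTensors() != kTfLiteOk) {
  // Error handling...
}

// Only now is it safe to write the input and run inference.
float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input` with 1 * 224 * 224 * 3 floats...
interpreter->Invoke();
```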
For more example code, see
[`minimal.cc`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/minimal/minimal.cc)
and
[`label_image.cc`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/label_image/label_image.cc).
## Load and run a model in Python
*Platform: Linux*

The Python API for running an inference is provided in the `tf.lite` module.
From it, you mostly need only
[`tf.lite.Interpreter`](https://www.tensorflow.org/api_docs/python/tf/lite/Interpreter)
to load a model and run an inference.
The following example shows how to use the Python interpreter to load a
`.tflite` file and run inference with random input data:
@ -358,13 +509,12 @@ interpreter.allocate_tensors()
# Continue to get tensors and so forth, as shown above...
```
For more Python sample code, see
[`label_image.py`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/python/label_image.py).
Tip: Run `help(tf.lite.Interpreter)` in the Python terminal to get detailed
documentation about the interpreter.
## Write a custom operator
All TensorFlow Lite operators (both custom and builtin) are defined using a
@ -379,10 +529,10 @@ typedef struct {
} TfLiteRegistration;
```
Refer to `context.h` for details on `TfLiteContext` and `TfLiteNode`. The former
provides error reporting facilities and access to global objects, including all
the tensors. The latter allows implementations to access their inputs and
outputs.
When the interpreter loads a model, it calls `init()` once for each node in the
graph. A given `init()` will be called more than once if the op is used multiple
@ -403,9 +553,9 @@ implementations can access their state using `node->user_data`.
Finally, each time inference runs, the interpreter traverses the graph calling
`invoke()`, and here too the state is available as `node->user_data`.
Custom ops can be implemented in exactly the same way as builtin ops, by
defining those four functions and a global registration function that usually
looks like this:
```c++
namespace tflite {
@ -461,8 +611,7 @@ You can optionally register custom ops (before you pass the resolver to the
resolver.AddOp("MY_CUSTOM_OP", Register_MY_CUSTOM_OP());
```
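To make those four functions concrete, here is a hedged sketch of what a
trivial custom op and its registration function might look like. It is
illustrative only, not the guide's own example: the namespace, the op's
copy-through behavior, and the reuse of `Register_MY_CUSTOM_OP` from the
snippet above are assumptions.

```c++
#include <cstring>

#include "tensorflow/lite/context.h"  // TfLiteContext, TfLiteNode, TfLiteRegistration

namespace tflite {
namespace custom_op {  // Hypothetical namespace for this example.

// Called once per node when the model is loaded; may allocate per-node state.
void* Init(TfLiteContext* context, const char* buffer, size_t length) {
  return nullptr;  // This trivial op keeps no per-node state.
}

void Free(TfLiteContext* context, void* buffer) {}

// Called whenever tensors are (re)allocated; validate inputs and size outputs.
TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) {
  const TfLiteTensor* input = &context->tensors[node->inputs->data[0]];
  TfLiteTensor* output = &context->tensors[node->outputs->data[0]];
  TfLiteIntArray* output_size = TfLiteIntArrayCopy(input->dims);
  return context->ResizeTensor(context, output, output_size);
}

// Called for every inference; here the op simply copies input to output.
TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) {
  const TfLiteTensor* input = &context->tensors[node->inputs->data[0]];
  TfLiteTensor* output = &context->tensors[node->outputs->data[0]];
  std::memcpy(output->data.raw, input->data.raw, input->bytes);
  return kTfLiteOk;
}

}  // namespace custom_op

// Global registration function, matching the `Register_MY_CUSTOM_OP()` call above.
TfLiteRegistration* Register_MY_CUSTOM_OP() {
  static TfLiteRegistration r = {custom_op::Init, custom_op::Free,
                                 custom_op::Prepare, custom_op::Eval};
  return &r;
}

}  // namespace tflite
```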
If the set of builtin ops is deemed to be too large, a new `OpResolver` could be
code-generated based on a given subset of ops, possibly only the ones contained
in a given model. This is the equivalent of TensorFlow's selective registration
(and a simple version of it is available in the `tools` directory).