Minor TF Lite documentation updates.

PiperOrigin-RevId: 314643633
Change-Id: Ieaa82849c35d1071d6a750b60c72ca08c47a0db7
Gregory Clark 2020-06-03 18:25:48 -07:00 committed by TensorFlower Gardener
parent 716e8a092c
commit a628c339c5
33 changed files with 328 additions and 332 deletions

View File

@ -18,7 +18,7 @@ make sure you [have TensorFlow installed](https://www.tensorflow.org/install).
You can use any compatible model, but the following MobileNet v1 model offers
a good demonstration of a model trained to recognize 1,000 different objects.
```
```sh
# Get photo
curl https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/lite/examples/label_image/testdata/grace_hopper.bmp > /tmp/grace_hopper.bmp
# Get model
@ -33,7 +33,7 @@ mv /tmp/mobilenet_v1_1.0_224/labels.txt /tmp/
Note: Use `python` instead if you're using Python 2.x.
```
```sh
python3 label_image.py \
--model_file /tmp/mobilenet_v1_1.0_224.tflite \
--label_file /tmp/labels.txt \

View File

@ -21,9 +21,9 @@ custom objects in
## Usage
The following example shows a SavedModel being converted:
The following example shows a `SavedModel` being converted:
```bash
```sh
tflite_convert \
--saved_model_dir=/tmp/mobilenet_saved_model \
--output_file=/tmp/mobilenet.tflite
@ -39,7 +39,7 @@ The inputs and outputs are specified using the following commonly used flags:
To use all of the available flags, use the following command:
```bash
```sh
tflite_convert --help
```
@ -57,7 +57,7 @@ To obtain the latest version of the TensorFlow Lite converter CLI, we recommend
installing the nightly build using
[pip](https://www.tensorflow.org/install/pip):
```bash
```sh
pip install tf-nightly
```
@ -65,7 +65,7 @@ Alternatively, you can
[clone the TensorFlow repository](https://www.tensorflow.org/install/source) and
use `bazel` to run the command:
```
```sh
bazel run //tensorflow/lite/python:tflite_convert -- \
--saved_model_dir=/tmp/mobilenet_saved_model \
--output_file=/tmp/mobilenet.tflite
@ -75,13 +75,13 @@ bazel run //tensorflow/lite/python:tflite_convert -- \
There is a behavior change in how models containing
[custom ops](https://www.tensorflow.org/lite/guide/ops_custom) (those for which
users use to set --allow\_custom\_ops before) are handled in the
users previously set `--allow_custom_ops` before) are handled in the
[new converter](https://github.com/tensorflow/tensorflow/blob/917ebfe5fc1dfacf8eedcc746b7989bafc9588ef/tensorflow/lite/python/lite.py#L81).
**Built-in TensorFlow op**
If you are converting a model with a built-in TensorFlow op that does not exist
in TensorFlow Lite, you should set --allow\_custom\_ops argument (same as
in TensorFlow Lite, you should set `--allow_custom_ops` argument (same as
before), explained [here](https://www.tensorflow.org/lite/guide/ops_custom).
**Custom op in TensorFlow**
@ -90,27 +90,27 @@ If you are converting a model with a custom TensorFlow op, it is recommended
that you write a [TensorFlow kernel](https://www.tensorflow.org/guide/create_op)
and [TensorFlow Lite kernel](https://www.tensorflow.org/lite/guide/ops_custom).
This ensures that the model is working end-to-end, from TensorFlow and
TensorFlow Lite. This also requires setting the --allow\_custom\_ops argument.
TensorFlow Lite. This also requires setting the `--allow_custom_ops` argument.
**Advanced custom op usage (not recommended)**
If the above is not possible, you can still convert a TensorFlow model
containing a custom op without a corresponding kernel. You will need to pass the
[OpDef](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/op_def.proto)
of the custom op in TensorFlow using --custom\_opdefs flag, as long as you have
of the custom op in TensorFlow using `--custom_opdefs` flag, as long as you have
the corresponding OpDef registered in the TensorFlow global op registry. This
ensures that the TensorFlow model is valid (i.e. loadable by the TensorFlow
runtime).
If the custom op is not part of the global TensorFlow op registry, then the
corresponding OpDef needs to be specified via the --custom\_opdefs flag. This is
a list of an OpDef proto in string that needs to be additionally registered.
Below is an example of an TFLiteAwesomeCustomOp with 2 inputs, 1 output, and 2
corresponding OpDef needs to be specified via the `--custom_opdefs` flag. This
is a list of an OpDef proto in string that needs to be additionally registered.
Below is an example of a TFLiteAwesomeCustomOp with 2 inputs, 1 output, and 2
attributes:
```
--custom\_opdefs="name: 'TFLiteAwesomeCustomOp' input\_arg: { name: 'InputA'
type: DT\_FLOAT } input\_arg: { name: InputB' type: DT\_FLOAT }
output\_arg: { name: 'Output' type: DT\_FLOAT } attr : { name: 'Attr1' type:
```sh
--custom_opdefs="name: 'TFLiteAwesomeCustomOp' input_arg: { name: 'InputA'
type: DT_FLOAT } input_arg: { name: 'InputB' type: DT_FLOAT }
output_arg: { name: 'Output' type: DT_FLOAT } attr : { name: 'Attr1' type:
'float'} attr : { name: 'Attr2' type: 'list(float)'}"
```

View File

@ -13,8 +13,8 @@ The API for TensorFlow 1.X is available
## New in TF 2.2
Switching to use a new converter backend by default - in the nightly builds and
TF 2.2 stable. Why we are switching?
TensorFlow Lite has switched to use a new converter backend by default - in the
nightly builds and TF 2.2 stable. Why did we switch?
* Enables conversion of new classes of models, including Mask R-CNN, Mobile
BERT, and many more
@ -46,9 +46,9 @@ In case you encounter any issues:
and
[Command Line Tool](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/convert/cmdline.md)
documentation
* Switch to the old converter by setting --experimental_new_converter=false
* Switch to the old converter by setting `--experimental_new_converter=false`
(from the [tflite_convert](https://www.tensorflow.org/lite/convert/cmdline)
command line tool) or converter.experimental_new_converter=False (from
command line tool) or `converter.experimental_new_converter=False` (from the
[Python API](https://www.tensorflow.org/api_docs/python/tf/lite/TFLiteConverter))
## Device deployment

View File

@ -20,7 +20,7 @@ set this up [here](https://www.tensorflow.org/install).
After setting up the Python programming environment, you will need to install
additional tooling:
```
```sh
pip install tflite-support
```
@ -53,31 +53,31 @@ Lite metadata:
### Examples
Note: The export directory specified has to exist before you run the script, it
Note: The export directory specified has to exist before you run the script; it
does not get created as part of the process.
You can find examples on how the metadata should be populated for different
types of models here:
#### Image Classification
#### Image classification
Download the script
[here](https://github.com/tensorflow/examples/tree/master/lite/examples/image_classification/metadata/metadata_writer_for_image_classifier.py)
and run the script like this:
```
```sh
python ./metadata_writer_for_image_classifier.py \
--model_file=./model_without_metadata/mobilenet_v1_0.75_160_quantized.tflite \
--label_file=./model_without_metadata/labels.txt \
--export_directory=model_with_metadata
```
The rest of this guide will highlight some of the key sections in the Image
Classification example to illustrate the key elements.
The rest of this guide will highlight some of the key sections in the image
classification example to illustrate the key elements.
### Deep dive into the Image Classification example
### Deep dive into the image classification example
#### Model Information
#### Model information
Metadata starts by creating a new model info:
@ -103,9 +103,9 @@ model_meta.license = ("Apache License. Version 2.0 "
#### Input / output information
This describe your model's input and output signature and it maybe used by
automatic code generators to create pre- and post- processing code. To create an
input or output information about a tensor:
This section shows you how to describe your model's input and output signature.
This metadata may be used by automatic code generators to create pre- and post-
processing code. To create input or output information about a tensor:
```python
# Creates input info.
@ -115,13 +115,13 @@ input_meta = _metadata_fb.TensorMetadataT()
output_meta = _metadata_fb.TensorMetadataT()
```
#### Image Input
#### Image input
Image is a common input type for machine learning. TensorFlow Lite metadata
supports information such as colorspace and pre-processing information such as
normalization. One thing that does not required manual input is the dimension of
the image as this is already provided by the shape of the input tensor and can
be automatically inferred.
normalization. The dimension of the image does not require manual specification
since it is already provided by the shape of the input tensor and can be
automatically inferred.
```python
input_meta.name = "image"
@ -153,7 +153,7 @@ input_meta.stats = input_stats
Label can be mapped to an output tensor via an associated file using
`TENSOR_AXIS_LABELS`.
```Python
```python
# Creates output info.
output_meta = _metadata_fb.TensorMetadataT()
output_meta.name = "probability"
@ -175,7 +175,7 @@ output_meta.associatedFiles = [label_file]
#### Put it all together
The following code pull the model information together with the input and output
The following code combines the model information with the input and output
information:
```python
@ -192,8 +192,8 @@ b.Finish(
metadata_buf = b.Output()
```
Once the data structure is ready, the writing of the metadata into the tflite
file is done via the `populate` method:
Once the data structure is ready, the metadata is written into the TFLite file
via the `populate` method:
```python
populator = _metadata.MetadataPopulator.with_model_file(model_file)
@ -204,9 +204,9 @@ populator.populate()
#### Verify the metadata
You can read back the metadata in a tflite file using the `MetadataDisplayer`:
You can read the metadata in a TFLite file using the `MetadataDisplayer`:
```Python
```python
displayer = _metadata.MetadataDisplayer.with_model_file(export_model_path)
export_json_file = os.path.join(FLAGS.export_directory,
os.path.splitext(model_basename)[0] + ".json")

View File

@ -192,7 +192,7 @@ specific wrapper code. For more information, please refer to the
The TensorFlow nightly can be installed using the following command:
```
```sh
pip install tf-nightly
```
@ -208,13 +208,13 @@ either install the nightly build with
There is a behavior change in how models containing
[custom ops](https://www.tensorflow.org/lite/guide/ops_custom) (those for which
users use to set allow\_custom\_ops before) are handled in the
users previously set `allow_custom_ops` before) are handled in the
[new converter](https://github.com/tensorflow/tensorflow/blob/917ebfe5fc1dfacf8eedcc746b7989bafc9588ef/tensorflow/lite/python/lite.py#L81).
**Built-in TensorFlow op**
If you are converting a model with a built-in TensorFlow op that does not exist
in TensorFlow Lite, you should set allow\_custom\_ops attribute (same as
in TensorFlow Lite, you should set the `allow_custom_ops` attribute (same as
before), explained [here](https://www.tensorflow.org/lite/guide/ops_custom).
**Custom op in TensorFlow**
@ -223,27 +223,27 @@ If you are converting a model with a custom TensorFlow op, it is recommended
that you write a [TensorFlow kernel](https://www.tensorflow.org/guide/create_op)
and [TensorFlow Lite kernel](https://www.tensorflow.org/lite/guide/ops_custom).
This ensures that the model is working end-to-end, from TensorFlow and
TensorFlow Lite. This also requires setting the allow\_custom\_ops attribute.
TensorFlow Lite. This also requires setting the `allow_custom_ops` attribute.
**Advanced custom op usage (not recommended)**
If the above is not possible, you can still convert a TensorFlow model
containing a custom op without a corresponding kernel. You will need to pass the
[OpDef](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/op_def.proto)
of the custom op in TensorFlow using --custom\_opdefs flag, as long as you have
of the custom op in TensorFlow using `--custom_opdefs` flag, as long as you have
the corresponding OpDef registered in the TensorFlow global op registry. This
ensures that the TensorFlow model is valid (i.e. loadable by the TensorFlow
runtime).
If the custom op is not part of the global TensorFlow op registry, then the
corresponding OpDef needs to be specified via the --custom\_opdefs flag. This is
a list of an OpDef proto in string that needs to be additionally registered.
Below is an example of an TFLiteAwesomeCustomOp with 2 inputs, 1 output, and 2
corresponding OpDef needs to be specified via the `--custom_opdefs` flag. This
is a list of an OpDef proto in string that needs to be additionally registered.
Below is an example of a TFLiteAwesomeCustomOp with 2 inputs, 1 output, and 2
attributes:
```
converter.custom\_opdefs="name: 'TFLiteAwesomeCustomOp' input\_arg: { name: 'InputA'
type: DT\_FLOAT } input\_arg: { name: InputB' type: DT\_FLOAT }
output\_arg: { name: 'Output' type: DT\_FLOAT } attr : { name: 'Attr1' type:
'float'} attr : { name: 'Attr2' type: 'list(float)'}"
```python
converter.custom_opdefs="""name: 'TFLiteAwesomeCustomOp' input_arg: { name: 'InputA'
type: DT_FLOAT } input_arg: { name: 'InputB' type: DT_FLOAT }
output_arg: { name: 'Output' type: DT_FLOAT } attr : { name: 'Attr1' type:
'float'} attr : { name: 'Attr2' type: 'list(float)'}"""
```
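For orientation, a minimal sketch of where that assignment fits in a conversion script, assuming a placeholder SavedModel path and using the `allow_custom_ops` and `custom_opdefs` attributes exactly as described above:

```python
import tensorflow as tf

# Placeholder path to a SavedModel that contains the custom op.
converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/model_with_custom_op")

# Allow ops that have no TensorFlow Lite kernel, as described above.
converter.allow_custom_ops = True

# Register the OpDef of the custom op so the TensorFlow model stays loadable.
converter.custom_opdefs = """name: 'TFLiteAwesomeCustomOp' input_arg: { name: 'InputA'
type: DT_FLOAT } input_arg: { name: 'InputB' type: DT_FLOAT }
output_arg: { name: 'Output' type: DT_FLOAT } attr : { name: 'Attr1' type:
'float'} attr : { name: 'Attr2' type: 'list(float)'}"""

tflite_model = converter.convert()
```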

View File

@ -12,7 +12,7 @@ has latency benefits, but prioritizes size reduction.
During conversion, set the `optimizations` flag to optimize for size:
```
```python
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()
@ -26,7 +26,7 @@ quantized. To do this, we need to measure the dynamic range of activations and
inputs with a representative data set. You can simply create an input data
generator and provide it to our converter.
```
```python
import tensorflow as tf
def representative_dataset_gen():
@ -40,7 +40,7 @@ converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()
```
# During training: Quantizing models for integer-only execution.
# During training: Quantizing models for integer-only execution
Quantizing models for integer-only execution produces a model with even faster
latency, smaller size, and compatibility with integer-only accelerators.
@ -52,7 +52,7 @@ compatible with 2.0 semantics is in progress.
Convert the graph:
```
```python
converter = tf.compat.v1.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
input_arrays = converter.get_input_arrays()
@ -75,5 +75,5 @@ the
`std_dev` is 255 / (float_max - float_min).
For most users, we recommend using post-training quantization. We are working on
new tools for post-training and during training quantization that we hope will
new tools for post-training and training-time quantization that we hope will
simplify generating quantized models.
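As a concrete illustration of the `std_dev` formula, a hedged sketch that mirrors the conversion snippet earlier in this file: the SavedModel path and the [-1, 1] input range are assumptions, and the derived `mean` (the quantized value that maps to float 0.0) is inferred rather than quoted from the original text.

```python
import tensorflow as tf

# Assumed: a single float input whose values lie in [-1, 1], so
#   std_dev = 255 / (float_max - float_min) = 255 / (1 - (-1)) = 127.5
#   mean    = quantized value that maps to float 0.0            = 127.5
converter = tf.compat.v1.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
input_arrays = converter.get_input_arrays()
converter.quantized_input_stats = {input_arrays[0]: (127.5, 127.5)}  # (mean, std_dev)
tflite_quant_model = converter.convert()
```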

View File

@ -191,7 +191,7 @@ build --action_env ANDROID_SDK_API_LEVEL="23"
build --action_env ANDROID_SDK_HOME="/usr/local/android/android-sdk-linux"
```
#### Build and Install
#### Build and install
Once Bazel is properly configured, you can build the TensorFlow Lite AAR from
the root checkout directory as follows:
@ -268,11 +268,13 @@ If you want to use TFLite through C++ libraries, you can build the shared
libraries:
32bit armeabi-v7a:
```
```sh
bazel build -c opt --config=android_arm //tensorflow/lite:libtensorflowlite.so
```
64bit arm64-v8a:
```
```sh
bazel build -c opt --config=android_arm64 //tensorflow/lite:libtensorflowlite.so
```

View File

@ -14,23 +14,22 @@ or
## Cross-compile for Raspberry Pi
Instruction has been tested on Ubuntu 16.04.3 64-bit PC (AMD64) and TensorFlow
devel docker image
The following instructions have been tested on Ubuntu 16.04.3 64-bit PC (AMD64)
and TensorFlow devel docker image
[tensorflow/tensorflow:nightly-devel](https://hub.docker.com/r/tensorflow/tensorflow/tags/).
To cross compile TensorFlow Lite follow the steps:
1. Clone official Raspberry Pi cross-compilation toolchain:
```bash
```sh
git clone https://github.com/raspberrypi/tools.git rpi_tools
```
2. Clone TensorFlow repository:
```bash
```sh
git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
```
**Note:** If you're using the TensorFlow Docker image, the repo is already
@ -39,7 +38,7 @@ To cross compile TensorFlow Lite follow the steps:
3. Run following script at the root of the TensorFlow repository to download
all the build dependencies:
```bash
```sh
cd tensorflow_src && ./tensorflow/lite/tools/make/download_dependencies.sh
```
@ -47,7 +46,7 @@ To cross compile TensorFlow Lite follow the steps:
4. To build ARMv7 binary for Raspberry Pi 2, 3 and 4 execute:
```bash
```sh
PATH=../rpi_tools/arm-bcm2708/arm-rpi-4.9.3-linux-gnueabihf/bin:$PATH ./tensorflow/lite/tools/make/build_rpi_lib.sh
```
@ -56,7 +55,7 @@ To cross compile TensorFlow Lite follow the steps:
5. To build ARMv6 binary for Raspberry Pi Zero execute:
```bash
```sh
PATH=../rpi_tools/arm-bcm2708/arm-rpi-4.9.3-linux-gnueabihf/bin:$PATH ./tensorflow/lite/tools/make/build_rpi_lib.sh TARGET_ARCH=armv6
```
@ -65,28 +64,27 @@ To cross compile TensorFlow Lite follow the steps:
## Compile natively on Raspberry Pi
Instruction has been tested on Raspberry Pi Zero, Raspbian GNU/Linux 10
(buster), gcc version 8.3.0 (Raspbian 8.3.0-6+rpi1):
The following instructions have been tested on Raspberry Pi Zero, Raspbian
GNU/Linux 10 (buster), gcc version 8.3.0 (Raspbian 8.3.0-6+rpi1):
To natively compile TensorFlow Lite follow the steps:
1. Log in to your Raspberry Pi and install the toolchain:
```bash
```sh
sudo apt-get install build-essential
```
2. Clone TensorFlow repository:
```bash
```sh
git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
```
3. Run following script at the root of the TensorFlow repository to download
all the build dependencies:
```bash
```sh
cd tensorflow_src && ./tensorflow/lite/tools/make/download_dependencies.sh
```
@ -94,7 +92,7 @@ To natively compile TensorFlow Lite follow the steps:
4. You should then be able to compile TensorFlow Lite with:
```bash
```sh
./tensorflow/lite/tools/make/build_rpi_lib.sh
```

View File

@ -1,7 +1,7 @@
# Generate code from TensorFlow Lite metadata
Note: TensorFlow Lite wrapper code generator is in experimental (beta) phase and
it currently only supports Android.
currently only supports Android.
For TensorFlow Lite model enhanced with [metadata](../convert/metadata.md),
developers can use the TensorFlow Lite Android wrapper code generator to create
@ -19,13 +19,13 @@ to see how the codegen tool parses each field.
You will need to install the following tooling in your terminal:
```
```sh
pip install tflite-support
```
Once completed, the code generator can be invoked using the following syntax:
```
```sh
tflite_codegen --model=./model_with_metadata/mobilenet_v1_0.75_160_quantized.tflite \
--package_name=org.tensorflow.lite.classify \
--model_class_name=MyClassifierModel \
@ -66,7 +66,7 @@ In the app module that will be consuming the generated library module:
Under the android section, add the following:
```java
```build
aaptOptions {
noCompress "tflite"
}
@ -74,14 +74,14 @@ aaptOptions {
Under the dependencies section, add the following:
```java
```build
implementation project(":classify_wrapper")
```
### Step 3: Using the model
```java
// 1. Initialize the Model
// 1. Initialize the model
MyClassifierModel myImageClassifier = null;
try {
@ -92,14 +92,14 @@ try {
if(null != myImageClassifier) {
// 2. Setting the input with a Bitmap called inputBitmap
// 2. Set the input with a Bitmap called inputBitmap
MyClassifierModel.Inputs inputs = myImageClassifier.createInputs();
inputs.loadImage(inputBitmap);
// 3. Running the model
// 3. Run the model
MyClassifierModel.Outputs outputs = myImageClassifier.run(inputs);
// 4. Retrieving the result
// 4. Retrieve the result
Map<String, Float> labeledProbability = outputs.getProbability();
}
```
@ -117,7 +117,7 @@ parameters:
* (Optional) **`numThreads`**: Number of threads used to run the model -
default is one.
For example, to use a NNAPI delegate and up to three threads, you can initiate
For example, to use a NNAPI delegate and up to three threads, you can initialize
the model like this:
```java
@ -135,7 +135,7 @@ try {
Under the app module that will use the library module, insert the following
lines under the android section:
```java
```build
aaptOptions {
noCompress "tflite"
}

View File

@ -55,7 +55,7 @@ look for the inputs and outputs in the graph. To visualize a `.pb` file, use the
[`import_pb_to_tensorboard.py`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/import_pb_to_tensorboard.py)
script like below:
```
```sh
python import_pb_to_tensorboard.py --model_dir <model path> --log_dir <log dir path>
```
@ -68,7 +68,7 @@ script in our repository.
* [Clone the TensorFlow repository](https://www.tensorflow.org/install/source)
* Run the `visualize.py` script with bazel:
```
```sh
bazel run //tensorflow/lite/tools:visualize model.tflite visualized_model.html
```
@ -101,8 +101,8 @@ random data to feed to the interpreter.
#### How do I reduce the size of my converted TensorFlow Lite model?
[Post-training quantization](../performance/post_training_quantization.md) can be
used during conversion to TensorFlow Lite to reduce the size of the model.
[Post-training quantization](../performance/post_training_quantization.md) can
be used during conversion to TensorFlow Lite to reduce the size of the model.
Post-training quantization quantizes weights to 8-bits of precision from
floating-point and dequantizes them during runtime to perform floating point
computations. However, note that this could have some accuracy implications.

View File

@ -11,14 +11,13 @@ each step of the developer workflow and provides links to further instructions.
<a id="1_choose_a_model"></a>
A TensorFlow model is a data structure that contains the logic and knowledge of
a machine learning network trained to solve a particular problem.
There are many ways to obtain a TensorFlow model, from using pre-trained models
to training your own.
a machine learning network trained to solve a particular problem. There are many
ways to obtain a TensorFlow model, from using pre-trained models to training
your own.
To use a model with TensorFlow Lite, you must convert a
full TensorFlow model into the TensorFlow Lite format—you
cannot create or train a model using TensorFlow Lite. So you must start with a
regular TensorFlow model, and then
To use a model with TensorFlow Lite, you must convert a full TensorFlow model
into the TensorFlow Lite format—you cannot create or train a model using
TensorFlow Lite. So you must start with a regular TensorFlow model, and then
[convert the model](#2_convert_the_model_format).
Note: TensorFlow Lite supports a limited subset of TensorFlow operations, so not
@ -135,9 +134,9 @@ performance or reduce file size. This is covered in section 4,
### Ops compatibility
TensorFlow Lite currently supports a [limited subset of TensorFlow
operations](ops_compatibility.md). The long term goal is for all TensorFlow
operations to be supported.
TensorFlow Lite currently supports a
[limited subset of TensorFlow operations](ops_compatibility.md). The long term
goal is for all TensorFlow operations to be supported.
If the model you wish to convert contains unsupported operations, you can use
[TensorFlow Select](ops_select.md) to include operations from TensorFlow. This
@ -215,11 +214,9 @@ Embedded Linux is an important platform for deploying machine learning. To get
started using Python to perform inference with your TensorFlow Lite models,
follow the [Python quickstart](python.md).
To instead install the C++ library, see the
build instructions for [Raspberry Pi](build_rpi.md) or
[Arm64-based boards](build_arm64.md) (for boards such as Odroid C2, Pine64, and
NanoPi).
To instead install the C++ library, see the build instructions for
[Raspberry Pi](build_rpi.md) or [Arm64-based boards](build_arm64.md) (for boards
such as Odroid C2, Pine64, and NanoPi).
### Microcontrollers

View File

@ -310,7 +310,7 @@ The following example shows how to use the Python interpreter to load a
import numpy as np
import tensorflow as tf
# Load TFLite model and allocate tensors.
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()
@ -318,7 +318,7 @@ interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Test model on random input data.
# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
@ -331,10 +331,11 @@ output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
```
Alternative to loading the model as a pre-converted `.tflite` file, you can
combine your code with the [TensorFlow Lite Converter Python API](
../convert/python_api.md) (`tf.lite.TFLiteConverter`), allowing you to convert
your TensorFlow model into the TensorFlow Lite format and then run an inference:
Alternatively to loading the model as a pre-converted `.tflite` file, you can
combine your code with the
[TensorFlow Lite Converter Python API](../convert/python_api.md)
(`tf.lite.TFLiteConverter`), allowing you to convert your TensorFlow model into
the TensorFlow Lite format and then run an inference:
```python
import numpy as np
@ -350,7 +351,7 @@ with tf.Session() as sess:
converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out])
tflite_model = converter.convert()
# Load TFLite model and allocate tensors.
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
@ -384,22 +385,22 @@ including all the tensors. The latter allows implementations to access their
inputs and outputs.
When the interpreter loads a model, it calls `init()` once for each node in the
graph. A given `init()` will be called more than once if the op is used
multiple times in the graph. For custom ops a configuration buffer will be
provided, containing a flexbuffer that maps parameter names to their values.
The buffer is empty for builtin ops because the interpreter has already parsed
the op parameters. Kernel implementation that require state should initialize
it here and transfer ownership to the caller. For each `init()` call, there
will be a corresponding call to `free()`, allowing implementations to dispose
of the buffer they might have allocated in `init()`.
graph. A given `init()` will be called more than once if the op is used multiple
times in the graph. For custom ops a configuration buffer will be provided,
containing a flexbuffer that maps parameter names to their values. The buffer is
empty for builtin ops because the interpreter has already parsed the op
parameters. Kernel implementations that require state should initialize it here
and transfer ownership to the caller. For each `init()` call, there will be a
corresponding call to `free()`, allowing implementations to dispose of the
buffer they might have allocated in `init()`.
Whenever the input tensors are resized the interpreter will go through the
Whenever the input tensors are resized, the interpreter will go through the
graph notifying implementations of the change. This gives them the chance to
resize their internal buffer, check validity of input shapes and types, and
recalculate output shapes. This is all done through `prepare()` and
implementation can access their state using `node->user_data`.
recalculate output shapes. This is all done through `prepare()`, and
implementations can access their state using `node->user_data`.
Finally, each time inference runs the interpreter traverses the graph calling
Finally, each time inference runs, the interpreter traverses the graph calling
`invoke()`, and here too the state is available as `node->user_data`.
Custom ops can be implemented in exactly the same way as builtin ops, by

View File

@ -23,7 +23,7 @@ quantized (`uint8`, `int8`) inference, but many ops do not yet for other types
like `tf.float16` and strings.
Apart from using different version of the operations, the other difference
between floating-point and quantized models lies in the way they are converted.
between floating-point and quantized models is the way they are converted.
Quantized conversion requires dynamic range information for tensors. This
requires "fake-quantization" during model training, getting range information
via a calibration data set, or doing "on-the-fly" range estimation. See
@ -32,8 +32,8 @@ via a calibration data set, or doing "on-the-fly" range estimation. See
## Data format and broadcasting
At the moment TensorFlow Lite supports only TensorFlow's "NHWC" format, and
broadcasting is only support in a limited number of ops (tf.add, tf.mul, tf.sub,
and tf.div).
broadcasting is only supported in a limited number of ops (`tf.add`, `tf.mul`,
`tf.sub`, and `tf.div`).
## Compatible operations
@ -58,8 +58,8 @@ counterparts:
* `tf.nn.softmax` —As long as tensors are 2D and axis is the last dimension.
* `tf.nn.top_k`
* `tf.one_hot`
* `tf.pad` —As long as mode and constant_values are not used.
* `tf.reduce_mean` —As long as the reduction_indices attribute is not used.
* `tf.pad` —As long as `mode` and `constant_values` are not used.
* `tf.reduce_mean` —As long as the `reduction_indices` attribute is not used.
* `tf.reshape`
* `tf.sigmoid`
* `tf.space_to_batch_nd` —As long as the input tensor is 4D (1 batch + 2
@ -67,19 +67,19 @@ counterparts:
* `tf.space_to_depth`
* `tf.split` —As long as num is not provided and `num_or_size_split` contains
number of splits as a 0D tensor.
* `tf.squeeze` —As long as axis is not provided.
* `tf.squeeze` —As long as `axis` is not provided.
* `tf.squared_difference`
* `tf.strided_slice` —As long as `ellipsis_mask and new_axis_mask` are not
* `tf.strided_slice` —As long as `ellipsis_mask` and `new_axis_mask` are not
used.
* `tf.transpose` —As long as conjugate is not used.
* `tf.transpose` —As long as `conjugate` is not used.
## Straight-forward conversions, constant-folding and fusing
A number of TensorFlow operations can be processed by TensorFlow Lite even
though they have no direct equivalent. This is the case for operations that can
be simply removed from the graph (tf.identity), replaced by tensors
(tf.placeholder), or fused into more complex operations (tf.nn.bias_add). Even
some supported operations may sometimes be removed through one of these
be simply removed from the graph (`tf.identity`), replaced by tensors
(`tf.placeholder`), or fused into more complex operations (`tf.nn.bias_add`).
Even some supported operations may sometimes be removed through one of these
processes.
Here is a non-exhaustive list of TensorFlow operations that are usually removed
@ -115,7 +115,7 @@ from the graph:
* `tf.nn.relu`
* `tf.nn.relu6`
Note: Many of those operations don't have TensorFlow Lite equivalents and the
Note: Many of those operations don't have TensorFlow Lite equivalents, and the
corresponding model will not be convertible if they can't be elided or fused.
## Unsupported operations
@ -343,10 +343,10 @@ Outputs {
**FLOOR**
```
inputs {
Inputs {
0: tensor
}
outputs: {
Outputs: {
0: result of computing element-wise floor of the input tensor
}
```

View File

@ -1,32 +1,31 @@
# Custom operators
TensorFlow Lite currently supports a subset of TensorFlow operators. It supports
the use of user-provided implementations (as known as custom implementations) if
the use of user-provided implementations (known as custom implementations) if
the model contains an operator that is not supported. Providing custom kernels
is also a way of evaluating a series of TensorFlow operations as a single fused
TensorFlow Lite operations.
is also a way of executing a series of TensorFlow operations as a single fused
TensorFlow Lite operation.
Using custom operators consists of three steps.
* Making sure the TensorFlow Graph Def or SavedModel refers to the correctly
* Make sure the TensorFlow Graph Def or SavedModel refers to the correctly
named TensorFlow Lite operator.
* Registering a custom kernel with TensorFlow Lite so that the runtime knows
how to map your operator and parameters in your graph to executable C/C++
code.
* Register a custom kernel with TensorFlow Lite so that the runtime knows how
to map your operator and parameters in your graph to executable C/C++ code.
* Testing and profiling your operator correctness and performance,
respectively. If you wish to test just your custom operator it is best to
create a model with just your custom operator and using the
* Test and profile your operator correctness and performance, respectively. If
you wish to test just your custom operator, it is best to create a model
with just your custom operator and using the
[benchmark_model](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/benchmark/benchmark_model_test.cc)
program
program.
Below we describe a complete example of defining Sin and some links to existing
conversion process involving custom operators.
Below we describe a complete example of defining `Sin` and provide some links
to existing conversion processes involving custom operators.
## Making a custom operator for Sin
Lets walk through this an example of supporting a TensorFlow operator that
Let's walk through an example of supporting a TensorFlow operator that
TensorFlow Lite does not have. Assume we are using the `Sin` operator and that
we are building a very simple model for a function `y = sin(x + offset)`, where
`offset` is trainable.
@ -45,11 +44,11 @@ optimizer = tf.train.GradientDescentOptimizer(0.001)
train = optimizer.minimize(loss)
```
If you convert this model to Tensorflow Lite format using the TensorFlow Lite
If you convert this model to TensorFlow Lite format using the TensorFlow Lite
Optimizing Converter with `--allow_custom_ops` argument, and run it with the
default interpreter, the interpreter will raise the following error messages:
```
```none
Didn't find custom op for name 'Sin'
Registration failed.
```
@ -57,8 +56,7 @@ Registration failed.
### Defining the kernel in the TensorFlow Lite runtime
All we need to do to use the op in TensorFlow Lite is define two functions
(`Prepare` and `Eval`), and construct a `TfLiteRegistration`. This code would
look something like this:
(`Prepare` and `Eval`), and construct a `TfLiteRegistration`:
```cpp
TfLiteStatus SinPrepare(TfLiteContext* context, TfLiteNode* node) {
@ -105,44 +103,45 @@ TfLiteRegistration* Register_SIN() {
}
```
When initializing the `OpResolver`, add the custom op into the resolver, this
When initializing the `OpResolver`, add the custom op into the resolver. This
will register the operator with Tensorflow Lite so that TensorFlow Lite can use
the new implementation. Note that the last two arguments in TfLiteRegistration
correspond to the `SinPrepare` and `SinEval()` functions you defined for the
custom op. If you used two functions to initialize variables used in the op and
free up space: `Init()` and `Free()`, then they would be added to the first two
arguments of TfLiteRegistration; they are set to nullptr in this example.
the new implementation. Note that the last two arguments in `TfLiteRegistration`
correspond to the `SinPrepare` and `SinEval` functions you defined for the
custom op. If you used `SinInit` and `SinFree` functions to initialize variables
used in the op and to free up space, respectively, then they would be added to
the first two arguments of `TfLiteRegistration`; those arguments are set to
`nullptr` in this example.
```cpp
tflite::ops::builtin::BuiltinOpResolver builtins;
builtins.AddCustom("Sin", Register_SIN());
```
If you want to make your custom operators in Java, you would currently need to
If you want to define your custom operators in Java, you would currently need to
build your own custom JNI layer and compile your own AAR
[in this jni code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/java/src/main/native/builtin_ops_jni.cc).
Similarly, if you wish to make these operators available in Python you can place
your registrations in the
Similarly, if you wish to make these operators available in Python, you can
place your registrations in the
[Python wrapper code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/python/interpreter_wrapper/interpreter_wrapper.cc).
Note that a similar process as above can be followed for supporting for a set of
Note that a similar process as above can be followed for supporting a set of
operations instead of a single operator. Just add as many `AddCustom` operators
as you need. In addition, `BuiltinOpResolver` also allows you to override
implementations of builtins by using the `AddBuiltin`.
## Best Practices
## Best practices
### Writing TensorFlow Lite kernels best practices
1. Optimize memory allocations and de-allocations cautiously. It is more
efficient to allocate memory in Prepare() instead of Invoke(), and allocate
memory before a loop instead of in every iteration. Use temporary tensors
data rather than mallocing yourself (see item 2). Use pointers/references
instead of copying as much as possible.
1. Optimize memory allocations and de-allocations cautiously. Allocating memory
in `Prepare` is more efficient than in `Invoke`, and allocating memory
before a loop is better than in every iteration. Use temporary tensors data
rather than mallocing yourself (see item 2). Use pointers/references instead
of copying as much as possible.
2. If a data structure will persist during the entire operation, we advise
pre-allocating the memory using temporary tensors. You may need to use
OpData struct to reference the tensor indices in other functions. See
OpData struct to reference the tensor indices in other functions. See the
example in the
[kernel for convolution](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/conv.cc).
A sample code snippet is below
@ -158,23 +157,24 @@ implementations of builtins by using the `AddBuiltin`.
```
3. If it doesn't cost too much wasted memory, prefer using a static fixed size
array (or in Resize() pre-allocated std::vector) rather than using a
dynamically allocating std::vector every iteration of execution.
array (or a pre-allocated `std::vector` in `Resize`) rather than using a
dynamically allocated `std::vector` every iteration of execution.
4. Avoid instantiating standard library container templates that don't already
exist, because they affect binary size. For example, if you need a std::map
in your operation that doesn't exist in other kernels, using a std::vector
with direct indexing mapping could work while keeping the binary size small.
See what other kernels use to gain insight (or ask).
exist, because they affect binary size. For example, if you need a
`std::map` in your operation that doesn't exist in other kernels, using a
`std::vector` with direct indexing mapping could work while keeping the
binary size small. See what other kernels use to gain insight (or ask).
5. Check the pointer to the memory returned by malloc. If this pointer is
nullptr, no operations should be performed using that pointer. If you
malloc() in a function and have an error exit, deallocate memory before you
5. Check the pointer to the memory returned by `malloc`. If this pointer is
`nullptr`, no operations should be performed using that pointer. If you
`malloc` in a function and have an error exit, deallocate memory before you
exit.
6. Use TF_LITE_ENSURE(context, condition) to check for a specific condition.
Your code must not leave memory hanging when TF_LITE_ENSURE is done, i.e.,
these should be done before any resources are allocated that will leak.
6. Use `TF_LITE_ENSURE(context, condition)` to check for a specific condition.
Your code must not leave memory hanging when `TF_LITE_ENSURE` is used, i.e.,
these macros should be used before any resources are allocated that will
leak.
### Conversion best practices
@ -187,10 +187,10 @@ instead of the builtin TensorFlow one.
#### Converting TensorFlow models to convert graphs
In TensorFlow you can use the `tf.lite.OpHint` class to encapsulate groups of
operators when you create a TensorFlow graph. This allows you then to extract a
graph def that has references to those operators. This is currently experimental
and should only be used by advanced users. There is a full example of how to use
this in the
operators when you create a TensorFlow graph. This encapsulation allows you then
to extract a graph def that has references to those operators. `tf.lite.OpHint`
is currently experimental and should only be used by advanced users. A full
example of how to use this class is in the
[OpHint code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/python/op_hint.py).
In addition, you can also use a manual graph substitution approach to rewrite
@ -198,23 +198,23 @@ Tensorflow graphs. There is an example of how this is done in single shot object
based detection models
[export script](https://github.com/tensorflow/models/blob/master/research/object_detection/export_tflite_ssd_graph.py).
### TF Graph Attributes
### TF graph attributes
When `tflite_convert` converts a TensorFlow graph into TFLite format, it makes
some assumption about custom operations that might not be correct. In this case,
some assumptions about custom operations. If the assumptions are not correct,
the generated graph may not execute.
It is possible to add additional information about your custom op output to TF
graph before it is converted. The following attributes are supported:
It is possible to add additional information about your custom op output to the
TF graph before it is converted. The following attributes are supported:
- **_output_quantized** a boolean attribute, true if the operation outputs are
quantized
- **_output_types** a list of types for output tensors
- **_output_shapes** a list of shapes for output tensors
#### Setting the Attributes
#### Setting the attributes
This is an example how the attributes can be set:
The following example demonstrates how the attributes can be set:
```python
frozen_graph_def = tf.graph_util.convert_variables_to_constants(...)
@ -231,5 +231,5 @@ tflite_model = tf.lite.toco_convert(
frozen_graph_def,...)
```
**Note:** After the attributes are set, the graph can not be executed by
Tensorflow, therefore it should be done just before the conversion.
**Note:** After the attributes are set, the graph cannot be executed by
TensorFlow. Therefore, the attributes should be set just before the conversion.
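Because the diff elides the body of that example, here is a separate, hedged sketch of how such attributes can be attached to a custom op's `NodeDef` before conversion; the op name, output type, and shape below are placeholders rather than values from the original example.

```python
import tensorflow as tf

def annotate_custom_op(frozen_graph_def):
  """Tags a hypothetical custom op with the attributes listed above."""
  for node in frozen_graph_def.node:
    if node.op == "TFLiteAwesomeCustomOp":  # placeholder op name
      node.attr["_output_quantized"].b = False
      node.attr["_output_types"].list.type.extend([tf.float32.as_datatype_enum])
      node.attr["_output_shapes"].list.shape.extend(
          [tf.TensorShape([1, 8]).as_proto()])  # placeholder shape
  return frozen_graph_def
```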

View File

@ -2,9 +2,9 @@
Caution: This feature is experimental.
The TensorFlow Lite builtin op library has grown rapidly, and will continue to
The TensorFlow Lite builtin op library has grown rapidly and will continue to
grow, but there remains a long tail of TensorFlow ops that are not yet natively
supported by TensorFlow Lite . These unsupported ops can be a point of friction
supported by TensorFlow Lite. These unsupported ops can be a point of friction
in the TensorFlow Lite model conversion process. To that end, the team has
recently been working on an experimental mechanism for reducing this friction.
@ -55,7 +55,7 @@ limitations.
The following example shows how to use this feature in the
[`TFLiteConverter`](./convert/python_api.md) Python API.
```
```python
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
@ -69,7 +69,7 @@ The following example shows how to use this feature in the
[`tflite_convert`](../convert/cmdline_examples.md) command line tool using the
command line flag `target_ops`.
```
```sh
tflite_convert \
--output_file=/tmp/foo.tflite \
--graph_def_file=/tmp/foo.pb \
@ -81,7 +81,7 @@ tflite_convert \
When building and running `tflite_convert` directly with `bazel`, please pass
`--define=tflite_convert_with_select_tf_ops=true` as an additional argument.
```
```sh
bazel run --define=tflite_convert_with_select_tf_ops=true tflite_convert -- \
--output_file=/tmp/foo.tflite \
--graph_def_file=/tmp/foo.pb \
@ -157,7 +157,7 @@ Finally, in your app's `build.gradle`, ensure you have the `mavenLocal()`
dependency and replace the standard TensorFlow Lite dependency with the one that
has support for select TensorFlow ops:
```
```build
allprojects {
repositories {
jcenter()
@ -220,7 +220,7 @@ creating the interpreter at runtime as long as the delegate is linked into the
client library. It is not necessary to explicitly install the delegate instance
as is typically required with other delegate types.
### Python pip Package
### Python pip package
Python support is actively under development.
@ -241,7 +241,7 @@ Build | Time (milliseconds)
Only built-in ops (`TFLITE_BUILTIN`) | 260.7
Using only TF ops (`SELECT_TF_OPS`) | 264.5
### Binary Size
### Binary size
The following table describes the binary size of TensorFlow Lite for each build.
These targets were built for Android using `--config=android_arm -c opt`.
@ -251,22 +251,22 @@ Build | C++ Binary Size | Android APK Size
Only built-in ops | 796 KB | 561 KB
Built-in ops + TF ops | 23.0 MB | 8.0 MB
## Known Limitations
## Known limitations
The following is a list of some of the known limitations:
* Control flow ops are not yet supported.
* The
[`post_training_quantization`](https://www.tensorflow.org/performance/post_training_quantization)
flag is currently not supported for TensorFlow ops so it will not quantize
flag is currently not supported for TensorFlow ops, so it will not quantize
weights for any TensorFlow ops. In models with both TensorFlow Lite builtin
ops and TensorFlow ops, the weights for the builtin ops will be quantized.
* Ops that require explicit initialization from resources, like HashTableV2,
* Ops that require explicit initialization from resources, like `HashTableV2`,
are not yet supported.
* Certain TensorFlow ops may not support the full set of input/output types
that are typically available on stock TensorFlow.
## Future Plans
## Future plans
The following is a list of improvements to this pipeline that are in progress:
@ -276,5 +276,5 @@ The following is a list of improvements to this pipeline that are in progress:
* *Improved usability* - The conversion process will be simplified to only
require a single pass through the converter. Additionally, pre-built Android
AAR and iOS CocoaPod binaries will be provided.
* *Improved performance* - There is work being done to ensure TensorFlow Lite
with TensorFlow ops has performance parity to TensorFlow Mobile.
* *Improved performance* - Work is being done to ensure TensorFlow Lite with
TensorFlow ops has performance parity to TensorFlow Mobile.

View File

@ -13,7 +13,7 @@ existing ops. In addition, it guarantees the following:
reads a new model that contains a new version of an op which isn't
supported, it should report the error.
## Example: Adding Dilation into Convolution
## Example: Adding dilation into convolution
The remainder of this document explains op versioning in TFLite by showing how
to add dilation parameters to the convolution operation.
@ -25,7 +25,7 @@ Knowledge of dilation is not required to understand this document. Note that:
* Old convolution kernels that don't support dilation are equivalent to
setting the dilation factors to 1.
### Change FlatBuffer Schema
### Change FlatBuffer schema
To add new parameters into an op, change the options table in
`lite/schema/schema.fbs`.
@ -66,7 +66,7 @@ table Conv2DOptions {
The file `lite/schema/schema_generated.h` should be re-generated for the new
schema.
### Change C Structures and Kernel Implementation
### Change C structures and kernel implementation
In TensorFlow Lite, the kernel implementation is decoupled from FlatBuffer
definition. The kernels read the parameter from C structures defined in
@ -103,7 +103,7 @@ typedef struct {
Please also change the kernel implementation to read the newly added parameters
from the C structures. The details are omitted here.
### Change the FlatBuffer Reading Code
### Change the FlatBuffer reading code
The logic to read FlatBuffer and produce C structure is in
`lite/core/api/flatbuffer_conversions.cc`.
@ -132,7 +132,7 @@ reads an old model file where dilation factors are missing, it will use 1 as
the default value, and the new kernel will work consistently with the old
kernel.
### Change Kernel Registration
### Change kernel registration
The MutableOpResolver (defined in `lite/op_resolver.h`) provides a few functions
to register op kernels. The minimum and maximum version are 1 by default:
@ -192,23 +192,24 @@ int GetVersion(const Operator& op) const override {
### Update the operator version map
The last step is to add the new version info into the operator version map. This
step is required because we need generate the model's minimum required runtime
version based on this version map.
step is required because we need to generate the model's minimum required
runtime version based on this version map.
To do this, you need to add a new map entry in `lite/toco/tflite/op_version.cc`.
In this example, it means you need to add the following into `op_version_map`:
In this example, you need to add the following entry into `op_version_map`:
```
{{OperatorType::kConv, 3}, "kPendingReleaseOpVersion"}
```
(`kPendingReleaseOpVersion` will be replaced with the appropriate release
version in the next stable release.)
### Delegation Implementation
### Delegation implementation
TensorFlow Lite provides a delegation API which enables delegating ops to
hardware backends. In Delegate's `Prepare` function, check if the version
is supported for every node in Delegation code.
hardware backends. In the delegate's `Prepare` function, check if the version is
supported for every node in Delegation code.
```
const int kMinVersion = 1;

View File

@ -372,5 +372,5 @@ applications on GitHub</a>
To learn how to use the library in your own project, read
[Understand the C++ library](library.md).
For information about training and convert models for deployment on
For information about training and converting models for deployment on
microcontrollers, read [Build and convert models](build_convert.md).

View File

@ -1,4 +1,4 @@
# Question and Answer
# Question and answer
Use a pre-trained model to answer questions based on the content of a given
passage.
@ -44,7 +44,7 @@ pre-processing including tokenization and post-processing steps that are
described in the BERT [paper](https://arxiv.org/abs/1810.04805) and implemented
in the sample app.
## Performance Benchmarks
## Performance benchmarks
Performance benchmark numbers are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

View File

@ -216,7 +216,7 @@ experiment with different models to find the optimal balance between
performance, accuracy, and model size. For guidance, see
<a href="#choose_a_different_model">Choose a different model</a>.
## Performance Benchmarks
## Performance benchmarks
Performance benchmark numbers are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).
@ -260,7 +260,7 @@ Performance benchmark numbers are generated with the tool
## Choose a different model
There are a large number of image classification models available on our
A large number of image classification models are available on our
<a href="../../guide/hosted_models.md">List of hosted models</a>. You should aim
to choose the optimal model for your application based on performance, accuracy
and model size. There are trade-offs between each of them.
@ -302,7 +302,7 @@ Our quantized MobileNet models size ranges from 0.5 to 3.4 Mb.
### Architecture
There are several different architectures of models available on
Several different model architectures are available on
<a href="../../guide/hosted_models.md">List of hosted models</a>, indicated by
the model's name. For example, you can choose between MobileNet, Inception, and
others.

View File

@ -180,7 +180,7 @@ edges in a similar manner.
Note: Object detection models accept input images of a specific size. This is likely to be different from the size of the raw image captured by your device's camera, and you will have to write code to crop and scale your raw image to fit the model's input size (there are examples of this in our <a href="#get_started">example applications</a>).<br /><br />The pixel values output by the model refer to the position in the cropped and scaled image, so you must scale them to fit the raw image in order to interpret them correctly.
## Performance Benchmarks
## Performance benchmarks
Performance benchmark numbers are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

View File

@ -43,7 +43,7 @@ The current implementation includes the following features:
<li>DeepLabv3+: We extend DeepLabv3 to include a simple yet effective decoder module to refine the segmentation results especially along object boundaries. Furthermore, in this encoder-decoder structure one can arbitrarily control the resolution of extracted encoder features by atrous convolution to trade-off precision and runtime.</li>
</ol>
## Performance Benchmarks
## Performance benchmarks
Performance benchmark numbers are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

View File

@ -1,4 +1,4 @@
# Text Classification
# Text classification
Use a pre-trained model to categorize a paragraph into predefined groups.
@ -44,7 +44,7 @@ Here are the steps to classify a paragraph with the model:
* This model was trained on movie reviews dataset so you may experience
reduced accuracy when classifying text of other domains.
## Performance Benchmarks
## Performance benchmarks
Performance benchmark numbers are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

View File

@ -19,7 +19,7 @@ and assumed in the `/data/local/tmp` directory.
To run the benchmark:
```
```sh
adb shell /data/local/tmp/benchmark_model \
--num_threads=4 \
--graph=/data/local/tmp/tflite_models/${GRAPH} \
@ -27,8 +27,8 @@ adb shell /data/local/tmp/benchmark_model \
--num_runs=50
```
To run with nnapi delegate, please set --use_nnapi=true. To run with gpu
delegate, please set --use_gpu=true.
To run with nnapi delegate, please set `--use_nnapi=true`. To run with gpu
delegate, please set `--use_gpu=true`.
The performance values below are measured on Android 10.

View File

@ -50,9 +50,8 @@ operator is executed. Check out our
## Optimize your model
Model optimization aims to create smaller models that are generally faster and
more energy efficient, so that they can be deployed on mobile devices. There are
multiple optimization techniques supported by TensorFlow Lite, such as
quantization.
more energy efficient, so that they can be deployed on mobile devices.
TensorFlow Lite supports multiple optimization techniques, such as quantization.
Check out our [model optimization docs](model_optimization.md) for details.
@ -78,7 +77,7 @@ If your application is not carefully designed, there can be redundant copies
when feeding the input to and reading the output from the model. Make sure to
eliminate redundant copies. If you are using higher level APIs, like Java, make
sure to carefully check the documentation for performance caveats. For example,
the Java API is a lot faster if ByteBuffers are used as
the Java API is a lot faster if `ByteBuffers` are used as
[inputs](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/java/src/main/java/org/tensorflow/lite/Interpreter.java#L175).
## Profile your application with platform specific tools

View File

@ -1,8 +1,8 @@
# Tensorflow Lite Core ML Delegate
# TensorFlow Lite Core ML delegate
TensorFlow Lite Core ML Delegate enables running TensorFlow Lite models on
[Core ML framework](https://developer.apple.com/documentation/coreml),
which results in faster model inference on iOS devices.
The TensorFlow Lite Core ML delegate enables running TensorFlow Lite models on
[Core ML framework](https://developer.apple.com/documentation/coreml), which
results in faster model inference on iOS devices.
Note: This delegate is in experimental (beta) phase.

View File

@ -1,6 +1,7 @@
# TensorFlow Lite delegates
Note: Delegate API is still experimental and is subject to change.
## What is a TensorFlow Lite delegate?
A TensorFlow Lite delegate is a way to delegate part or all of graph execution
@ -51,9 +52,9 @@ If a delegate was provided for specific operations, then TensorFlow Lite will
split the graph into multiple subgraphs where each subgraph will be handled by a
delegate.
Let's assume that there is a delegate "MyDelegate," which has a faster
implementation for Conv2D and Mean operations. The resulting main graph will be
updated to look like below.
Let's assume that a delegate, `MyDelegate`, has a faster implementation for
Conv2D and Mean operations. The resulting main graph will be updated to look
like the following.
![Graph with delegate](../images/performance/tflite_delegate_graph_2.png "Graph with delegate")
@ -74,16 +75,16 @@ _Note that the API used below is experimental and is subject to change._
Based on the previous section, to add a delegate, we need to do the following:
1. Define a kernel node that is responsible for evaluating the delegate
subgraph
subgraph.
1. Create an instance of
[TfLiteDelegate](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/c/common.h#L611),
which is responsible for registering the kernel node and claiming the nodes
that the delegate can execute
that the delegate can execute.
To see it in code, let's define a delegate and call it "MyDelegate," which can
To see it in code, let's define a delegate and call it `MyDelegate`, which can
execute Conv2D and Mean operations faster.
```
```c++
// This is where the execution of the operations or whole graph happens.
// The class below has an empty implementation just as a guideline
// on the structure.
@ -113,9 +114,9 @@ class MyDelegate {
// the subgraph in the main TfLite graph.
TfLiteRegistration GetMyDelegateNodeRegistration() {
// This is the registration for the Delegate Node that gets added to
// the TFLite graph instead of the subGraph it replaces.
// It is treated as a an OP node. But in our case
// Init will initialize the delegate
// the TFLite graph instead of the subgraph it replaces.
// It is treated as an OP node, but in our case:
// Init will initialize the delegate.
// Invoke will run the delegate graph.
// Prepare prepares the delegate.
// Free releases any resources used by the delegate.
@ -232,6 +233,4 @@ if (interpreter->ModifyGraphWithDelegate(my_delegate) !=
...
// Don't forget to delete your delegate
delete my_delegate;
```
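For completeness, if such a delegate is also packaged as an external delegate
shared library (exposing the standard external-delegate entry points), loading
it from the Python bindings might look like this sketch; the library and model
paths are hypothetical:

```python
import tensorflow as tf

# Hypothetical external delegate shared library built from MyDelegate.
my_delegate = tf.lite.experimental.load_delegate("libmy_delegate.so")

interpreter = tf.lite.Interpreter(
    model_path="/tmp/model.tflite",
    experimental_delegates=[my_delegate])
interpreter.allocate_tensors()
```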

View File

@ -18,7 +18,7 @@ Another benefit with GPU inference is its power efficiency. GPUs carry out the
computations in a very efficient and optimized manner, so that they consume less
power and generate less heat than when the same task is run on CPUs.
## Demo App Tutorials
## Demo app tutorials
The easiest way to try out the GPU delegate is to follow the tutorials below,
which walk through building our classification demo applications with GPU support.
@ -35,7 +35,7 @@ Note: This requires OpenCL or OpenGL ES (3.1 or higher).
#### Step 1. Clone the TensorFlow source code and open it in Android Studio
```
```sh
git clone https://github.com/tensorflow/tensorflow
```
@ -87,7 +87,7 @@ target 'YourProjectName'
pod 'TensorFlowLiteGpuExperimental'
```
#### Step 3. Enable the GPU Delegate
#### Step 3. Enable the GPU delegate
To enable the code that will use the GPU delegate, you will need to change
`TFLITE_USE_GPU_DELEGATE` from 0 to 1 in `CameraExampleViewController.h`.
@ -100,8 +100,7 @@ To enable the code that will use the GPU delegate, you will need to change
After following the previous step, you should be able to run the app.
#### Step 5. Release mode.
#### Step 5. Release mode
While in Step 4 you ran in debug mode, to get better performance, you should
change to a release build with the appropriate Metal settings. In
@ -111,19 +110,18 @@ Scheme...`. Select `Run`. On the `Info` tab, change `Build Configuration`, from
![setting up release](images/iosdebug.png)
Then
click the `Options` tab and change `GPU Frame Capture` to `Disabled` and
Then click the `Options` tab and change `GPU Frame Capture` to `Disabled` and
`Metal API Validation` to `Disabled`.
![setting up metal options](images/iosmetal.png)
Lastly make sure Release only builds on 64-bit architecture. Under `Project
navigator -> tflite_camera_example -> PROJECT -> tflite_camera_example -> Build
Settings` set `Build Active Architecture Only > Release` to Yes.
Lastly make sure to select Release-only builds on 64-bit architecture. Under
`Project navigator -> tflite_camera_example -> PROJECT -> tflite_camera_example
-> Build Settings` set `Build Active Architecture Only > Release` to Yes.
![setting up release options](images/iosrelease.png)
## Trying the GPU Delegate on your own model
## Trying the GPU delegate on your own model
### Android
@ -197,12 +195,12 @@ To see a full list of supported ops, please see the [advanced documentation](gpu
## Non-supported models and ops
If some of the ops are not supported by the GPU delegate, the framework will
only run a part of the graph on the GPU and the remaining part on the CPU. Due
only run a part of the graph on the GPU and the remaining part on the CPU. Due
to the high cost of CPU/GPU synchronization, a split execution mode like this
will often result in a performance slower than when the whole network is run on
the CPU alone. In this case, the user will get a warning like:
will often result in slower performance than when the whole network is run on
the CPU alone. In this case, the user will get a warning like:
```
```none
WARNING: op code #42 cannot be handled by this delegate.
```
@ -226,6 +224,6 @@ In that sense, if the camera hardware supports image frames in RGBA, feeding
that 4-channel input is significantly faster, as a memory copy (from 3-channel
RGB to 4-channel RGBX) can be avoided.
For best performance, do not hesitate to retrain your classifier with a mobile-
optimized network architecture. That is a significant part of optimization for
on-device inference.
For best performance, do not hesitate to retrain your classifier with a
mobile-optimized network architecture. That is a significant part of
optimization for on-device inference.

View File

@ -5,7 +5,7 @@ hardware accelerators. This document describes how to use the GPU backend using
the TensorFlow Lite delegate APIs on Android (requires OpenCL or OpenGL ES 3.1
and higher) and iOS (requires iOS 8 or later).
## Benefits of GPU Acceleration
## Benefits of GPU acceleration
### Speed
@ -24,13 +24,13 @@ GPUs do their computation with 16-bit or 32-bit floating point numbers and
decreased accuracy made quantization untenable for your models, running your
neural network on a GPU may eliminate this concern.
### Energy Efficiency
### Energy efficiency
Another benefit that comes with GPU inference is its power efficiency. A GPU
carries out computations in a very efficient and optimized way, consuming less
power and generating less heat than the same task run on a CPU.
## Supported Ops
## Supported ops
TensorFlow Lite on GPU supports the following ops in 16-bit and 32-bit float
precision:
@ -63,12 +63,12 @@ By default, all ops are only supported at version 1. Enabling the
[experimental quantization support](gpu_advanced.md#running-quantized-models-experimental-android-only)
allows the appropriate versions; for example, ADD v2.
## Basic Usage
## Basic usage
### Android (Java)
Run TensorFlow Lite on GPU with `TfLiteDelegate`. In Java, you can specify the
GpuDelegate through `Interpreter.Options`.
`GpuDelegate` through `Interpreter.Options`.
```java
// NEW: Prepare GPU delegate.
@ -167,7 +167,7 @@ then the developer must ensure that `Interpreter::Invoke()` is always called
from the same thread in which `Interpreter::ModifyGraphWithDelegate()` was
called.
## Advanced Usage
## Advanced usage
### Running quantized models (Experimental, Android only)

View File

@ -32,9 +32,9 @@ path are also supported, for e.g.,
[these quantized versions](https://www.tensorflow.org/lite/guide/hosted_models#quantized_models)
on our Hosted Models page.
## Hexagon Delegate Java API
## Hexagon delegate Java API
```
```java
public class HexagonDelegate implements Delegate, Closeable {
/*
@ -96,7 +96,7 @@ will need to add the Hexagon shared libs to both 32 and 64-bit lib folders.
#### Step 3. Create a delegate and initialize a TensorFlow Lite Interpreter
```
```java
import org.tensorflow.lite.experimental.HexagonDelegate;
// Create the Delegate instance.
@ -116,9 +116,9 @@ if (hexagonDelegate != null) {
}
```
## Hexagon Delegate C API
## Hexagon delegate C API
```
```c
struct TfLiteHexagonDelegateOptions {
// This corresponds to the debug level in the Hexagon SDK. 0 (default)
// means no debug.
@ -161,7 +161,7 @@ Void TfLiteHexagonInit();
Void TfLiteHexagonTearDown();
```
### Example Usage
### Example usage
#### Step 1. Edit app/build.gradle to use the nightly Hexagon delegate AAR
@ -213,7 +213,7 @@ will need to add the Hexagon shared libs to both 32 and 64-bit lib folders.
* Create a delegate, for example:
```
```c
#include "tensorflow/lite/experimental/delegates/hexagon/hexagon_delegate.h"
// Assuming shared libraries are under "/data/local/tmp/"

View File

@ -5,8 +5,8 @@ optimizations can be applied to models so that they can be run within these
constraints. In addition, some optimizations allow the use of specialized
hardware for accelerated inference.
Tensorflow Lite and the
[Tensorflow Model Optimization Toolkit](https://www.tensorflow.org/model_optimization)
TensorFlow Lite and the
[TensorFlow Model Optimization Toolkit](https://www.tensorflow.org/model_optimization)
provide tools to minimize the complexity of optimizing inference.
It's recommended that you consider model optimization during your application
@ -79,9 +79,10 @@ with TensorFlow Lite.
### Quantization
Quantization works by reducing the precision of the numbers used to represent a
model's parameters, which by default are 32-bit floating point numbers. This
results in a smaller model size and faster computation.
[Quantization](https://www.tensorflow.org/model_optimization/guide/quantization/post_training)
works by reducing the precision of the numbers used to represent a model's
parameters, which by default are 32-bit floating point numbers. This results in
a smaller model size and faster computation.
The following types of quantization are available in TensorFlow Lite:
@ -145,7 +146,7 @@ For cases where the accuracy and latency targets are not met, or hardware
accelerator support is important,
[quantization-aware training](https://www.tensorflow.org/model_optimization/guide/quantization/training){:.external}
is the better option. See additional optimization techniques under the
[Tensorflow Model Optimization Toolkit](https://www.tensorflow.org/model_optimization).
[TensorFlow Model Optimization Toolkit](https://www.tensorflow.org/model_optimization).
If you want to further reduce your model size, you can try [pruning](#pruning)
prior to quantizing your models.
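As a rough sketch of that workflow, pruning a hypothetical Keras model with the
Model Optimization Toolkit before conversion could look like the following;
`x_train` and `y_train` are placeholders for your training data:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Hypothetical small Keras model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10),
])

# Wrap the model so low-magnitude weights are pruned during training.
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model)
pruned_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Training must include the pruning step callback, e.g.:
# pruned_model.fit(x_train, y_train, epochs=2,
#                  callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the pruning wrappers before converting/quantizing the model.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)
```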

View File

@ -16,9 +16,9 @@ This page describes how to use the NNAPI delegate with the TensorFlow Lite
Interpreter in Java and Kotlin. For Android C APIs, please refer to
[Android Native Developer Kit documentation](https://developer.android.com/ndk/guides/neuralnetworks).
## Trying the NNAPI Delegate on your own model
## Trying the NNAPI delegate on your own model
### Gradle Import
### Gradle import
The NNAPI delegate is part of the TensorFlow Lite Android interpreter, release
1.14.0 or higher. You can import it to your project by adding the following to
@ -69,7 +69,7 @@ if(null != nnApiDelegate) {
}
```
## Best Practices
## Best practices
### Test performance before deploying
@ -164,6 +164,6 @@ The following models are known to be compatible with NNAPI:
NNAPI acceleration is also not supported when the model contains
dynamically-sized outputs. In this case, you will get a warning like:
```
```none
ERROR: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors.
```

View File

@ -2,9 +2,9 @@
Post-training quantization is a conversion technique that can reduce model size
while also improving CPU and hardware accelerator latency, with little
degradation in model accuracy. You can perform these techniques using an
already-trained float TensorFlow model when you convert it to TensorFlow Lite
format using the [TensorFlow Lite Converter](../convert/).
degradation in model accuracy. You can quantize an already-trained float
TensorFlow model when you convert it to TensorFlow Lite format using the
[TensorFlow Lite Converter](../convert/).
Note: The procedures on this page require TensorFlow 1.15 or higher.
@ -22,8 +22,8 @@ summary table of the choices and the benefits they provide:
| Float16 quantization | 2x smaller, potential GPU | CPU, GPU |
: : acceleration : :
This decision tree can help determine which post-training quantization method is
best for your use case:
The following decision tree can help determine which post-training quantization
method is best for your use case:
![post-training optimization options](images/optimization.jpg)
@ -47,9 +47,9 @@ To further improve latency, "dynamic-range" operators dynamically quantize
activations based on their range to 8-bits and perform computations with 8-bit
weights and activations. This optimization provides latencies close to fully
fixed-point inference. However, the outputs are still stored using floating
point, so that the speedup with dynamic-range ops is less than a full
fixed-point computation. Dynamic-range ops are available for the most
compute-intensive operators in a network:
point so that the speedup with dynamic-range ops is less than a full fixed-point
computation. Dynamic-range ops are available for the most compute-intensive
operators in a network:
* `tf.keras.layers.Dense`
* `tf.keras.layers.Conv2D`
@ -62,12 +62,12 @@ compute-intensive operators in a network:
### Full integer quantization
You can get further latency improvements, reductions in peak memory usage, and
access to integer only hardware devices or accelerators by making sure all model
math is integer quantized.
compatibility with integer only hardware devices or accelerators by making sure
all model math is integer quantized.
To do this, you need to measure the dynamic range of activations and inputs by
supplying sample input data to the converter. Refer to the
`representative_dataset_gen()` function used in the following code.
For full integer quantization, you need to measure the dynamic range of
activations and inputs by supplying sample input data to the converter. Refer to
the `representative_dataset_gen()` function used in the following code.
#### Integer with float fallback (using default float input/output)
@ -87,14 +87,14 @@ converter.representative_dataset = representative_dataset_gen</b>
tflite_quant_model = converter.convert()
</pre>
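For reference, a `representative_dataset_gen()` can be as simple as the
following sketch; `calibration_data` is a hypothetical stand-in for a few
hundred preprocessed samples shaped like the model's input:

```python
import numpy as np

# Hypothetical calibration data with the model's input shape and dtype.
calibration_data = np.random.rand(100, 224, 224, 3).astype(np.float32)

def representative_dataset_gen():
  # Yield one batch-of-one input at a time.
  for sample in calibration_data:
    yield [np.expand_dims(sample, axis=0)]
```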
Note: This won't be compatible with integer only devices (such as 8-bit
microcontrollers) and accelerators (such as the Coral Edge TPU). For convenience
during inference, the input and output still remain float in order to have the
same interface as the original float only model.
Note: This `tflite_quant_model` won't be compatible with integer only devices
(such as 8-bit microcontrollers) and accelerators (such as the Coral Edge TPU)
because the input and output still remain float in order to have the same
interface as the original float only model.
#### Integer only
*This is a common use case for
*Creating integer only models is a common use case for
[TensorFlow Lite for Microcontrollers](https://www.tensorflow.org/lite/microcontrollers)
and [Coral Edge TPUs](https://coral.ai/).*
@ -135,18 +135,18 @@ converter.target_spec.supported_types = [tf.lite.constants.FLOAT16]</b>
tflite_quant_model = converter.convert()
</pre>
The advantages of this quantization are as follows:
The advantages of float16 quantization are as follows:
* Reduce model size by up to half (since all weights are now half the original
size).
* Minimal loss in accuracy.
* Supports some delegates (e.g. the GPU delegate) can operate directly on
float16 data, which results in faster execution than float32 computations.
* It reduces model size by up to half (since all weights become half of their
original size).
* It causes minimal loss in accuracy.
* It supports some delegates (e.g. the GPU delegate) which can operate
directly on float16 data, resulting in faster execution than float32
computations.
The disadvantages of this quantization are as follows:
The disadvantages of float16 quantization are as follows:
* Not a good choice for maximum performance (a quantization to fixed point
math would be better in that case).
* It does not reduce latency as much as a quantization to fixed point math.
* By default, a float16 quantized model will "dequantize" the weight values
to float32 when run on the CPU. (Note that the GPU delegate will not perform
this dequantization, since it can operate on float16 data.)

View File

@ -27,7 +27,7 @@ values in the range `[-128, 127]`, with a zero-point in range `[-128, 127]`.
There are other exceptions for particular operations that are documented below.
Note: In the past our quantized tooling used per-tensor, asymmetric, `uint8`
Note: In the past our quantization tooling used per-tensor, asymmetric, `uint8`
quantization. New tooling, reference kernels, and optimized kernels for 8-bit
quantization will use this spec.
@ -46,19 +46,19 @@ entire tensor. Per-axis quantization means that there will be one scale and/or
specifies the dimension of the Tensor's shape that the scales and zero-points
correspond to. For example, a tensor `t` with `dims=[4, 3, 2, 1]` and
quantization params `scale=[1.0, 2.0, 3.0]`, `zero_point=[1, 2, 3]`,
`quantization_dimension=1` will be quantized across the second dimension of t:
`quantization_dimension=1` will be quantized across the second dimension of `t`:
t[:, 0, :, :] will have scale[0]=1.0, zero_point[0]=1
t[:, 1, :, :] will have scale[1]=2.0, zero_point[1]=2
t[:, 2, :, :] will have scale[2]=3.0, zero_point[2]=3
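As a NumPy sketch of that per-axis dequantization (with values chosen to match
the example above):

```python
import numpy as np

# Quantized tensor t with dims=[4, 3, 2, 1] and quantization_dimension=1.
q = np.random.randint(-128, 128, size=(4, 3, 2, 1), dtype=np.int8)
scale = np.array([1.0, 2.0, 3.0], dtype=np.float32)
zero_point = np.array([1, 2, 3], dtype=np.int32)

# Broadcast scale and zero_point along dimension 1 to recover real values:
# real = (quantized - zero_point) * scale, applied per slice t[:, i, :, :].
real = (q.astype(np.float32) - zero_point.reshape(1, 3, 1, 1)) * scale.reshape(1, 3, 1, 1)
```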
Often, the quantized_dimension is the output_channel of the weights of
Often, the `quantized_dimension` is the `output_channel` of the weights of
convolutions, but in theory it can be the dimension that corresponds to each
dot-product in the kernel implementation, allowing more quantization granularity
without performance implications. This provides large improvements in accuracy.
TFLite has per-axis support for a growing number of operations. At the time of
this document support exists for Conv2d and DepthwiseConv2d.
this document, support exists for Conv2d and DepthwiseConv2d.
## Symmetric vs asymmetric
@ -69,7 +69,7 @@ binary bit of precision. Since activations are only multiplied by constant
weights, the constant zero-point value can be optimized pretty heavily.
Weights are symmetric: forced to have zero-point equal to 0. Weight values are
multiplied by dynamic input and activation values. This means that there is a
multiplied by dynamic input and activation values. This means that there is an
unavoidable runtime cost of multiplying the zero-point of the weight with the
activation value. By enforcing that zero-point is 0 we can avoid this cost.
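To make the cost explicit, here is a sketch of the arithmetic for a single
activation value $A$ and weight value $W$, using the spec's convention
$real\_value = (int8\_value - zero\_point) \times scale$ and informal symbols
$q$, $z$, $s$ for the quantized value, zero-point, and scale:

$$A \cdot W = s_A s_W (q_A - z_A)(q_W - z_W) = s_A s_W (q_A q_W - q_A z_W - z_A q_W + z_A z_W)$$

The $q_A z_W$ term depends on the runtime activation value $q_A$, so it cannot
be folded into constants, while the $z_A q_W$ and $z_A z_W$ terms involve only
constants and can be precomputed. Forcing $z_W = 0$ removes the runtime term
entirely, which is the cost being avoided.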