Minor TF Lite documentation updates.
PiperOrigin-RevId: 314643633 Change-Id: Ieaa82849c35d1071d6a750b60c72ca08c47a0db7
This commit is contained in:
parent
716e8a092c
commit
a628c339c5
@ -18,7 +18,7 @@ make sure you [have TensorFlow installed](https://www.tensorflow.org/install).
|
||||
You can use any compatible model, but the following MobileNet v1 model offers
|
||||
a good demonstration of a model trained to recognize 1,000 different objects.
|
||||
|
||||
```
|
||||
```sh
|
||||
# Get photo
|
||||
curl https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/lite/examples/label_image/testdata/grace_hopper.bmp > /tmp/grace_hopper.bmp
|
||||
# Get model
|
||||
@ -33,7 +33,7 @@ mv /tmp/mobilenet_v1_1.0_224/labels.txt /tmp/
|
||||
|
||||
Note: Instead use `python` if you're using Python 2.x.
|
||||
|
||||
```
|
||||
```sh
|
||||
python3 label_image.py \
|
||||
--model_file /tmp/mobilenet_v1_1.0_224.tflite \
|
||||
--label_file /tmp/labels.txt \
|
||||
|
@ -21,9 +21,9 @@ custom objects in
|
||||
|
||||
## Usage
|
||||
|
||||
The following example shows a SavedModel being converted:
|
||||
The following example shows a `SavedModel` being converted:
|
||||
|
||||
```bash
|
||||
```sh
|
||||
tflite_convert \
|
||||
--saved_model_dir=/tmp/mobilenet_saved_model \
|
||||
--output_file=/tmp/mobilenet.tflite
|
||||
@ -39,7 +39,7 @@ The inputs and outputs are specified using the following commonly used flags:
|
||||
|
||||
To use all of the available flags, use the following command:
|
||||
|
||||
```bash
|
||||
```sh
|
||||
tflite_convert --help
|
||||
```
|
||||
|
||||
@ -57,7 +57,7 @@ To obtain the latest version of the TensorFlow Lite converter CLI, we recommend
|
||||
installing the nightly build using
|
||||
[pip](https://www.tensorflow.org/install/pip):
|
||||
|
||||
```bash
|
||||
```sh
|
||||
pip install tf-nightly
|
||||
```
|
||||
|
||||
@ -65,7 +65,7 @@ Alternatively, you can
|
||||
[clone the TensorFlow repository](https://www.tensorflow.org/install/source) and
|
||||
use `bazel` to run the command:
|
||||
|
||||
```
|
||||
```sh
|
||||
bazel run //tensorflow/lite/python:tflite_convert -- \
|
||||
--saved_model_dir=/tmp/mobilenet_saved_model \
|
||||
--output_file=/tmp/mobilenet.tflite
|
||||
@ -75,13 +75,13 @@ bazel run //tensorflow/lite/python:tflite_convert -- \
|
||||
|
||||
There is a behavior change in how models containing
|
||||
[custom ops](https://www.tensorflow.org/lite/guide/ops_custom) (those for which
|
||||
users use to set --allow\_custom\_ops before) are handled in the
|
||||
users previously set `--allow_custom_ops` before) are handled in the
|
||||
[new converter](https://github.com/tensorflow/tensorflow/blob/917ebfe5fc1dfacf8eedcc746b7989bafc9588ef/tensorflow/lite/python/lite.py#L81).
|
||||
|
||||
**Built-in TensorFlow op**
|
||||
|
||||
If you are converting a model with a built-in TensorFlow op that does not exist
|
||||
in TensorFlow Lite, you should set --allow\_custom\_ops argument (same as
|
||||
in TensorFlow Lite, you should set `--allow_custom_ops` argument (same as
|
||||
before), explained [here](https://www.tensorflow.org/lite/guide/ops_custom).
|
||||
|
||||
**Custom op in TensorFlow**
|
||||
@ -90,27 +90,27 @@ If you are converting a model with a custom TensorFlow op, it is recommended
|
||||
that you write a [TensorFlow kernel](https://www.tensorflow.org/guide/create_op)
|
||||
and [TensorFlow Lite kernel](https://www.tensorflow.org/lite/guide/ops_custom).
|
||||
This ensures that the model is working end-to-end, from TensorFlow and
|
||||
TensorFlow Lite. This also requires setting the --allow\_custom\_ops argument.
|
||||
TensorFlow Lite. This also requires setting the `--allow_custom_ops` argument.
|
||||
|
||||
**Advanced custom op usage (not recommended)**
|
||||
|
||||
If the above is not possible, you can still convert a TensorFlow model
|
||||
containing a custom op without a corresponding kernel. You will need to pass the
|
||||
[OpDef](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/op_def.proto)
|
||||
of the custom op in TensorFlow using --custom\_opdefs flag, as long as you have
|
||||
of the custom op in TensorFlow using `--custom_opdefs` flag, as long as you have
|
||||
the corresponding OpDef registered in the TensorFlow global op registry. This
|
||||
ensures that the TensorFlow model is valid (i.e. loadable by the TensorFlow
|
||||
runtime).
|
||||
|
||||
If the custom op is not part of the global TensorFlow op registry, then the
|
||||
corresponding OpDef needs to be specified via the --custom\_opdefs flag. This is
|
||||
a list of an OpDef proto in string that needs to be additionally registered.
|
||||
Below is an example of an TFLiteAwesomeCustomOp with 2 inputs, 1 output, and 2
|
||||
corresponding OpDef needs to be specified via the `--custom_opdefs` flag. This
|
||||
is a list of an OpDef proto in string that needs to be additionally registered.
|
||||
Below is an example of a TFLiteAwesomeCustomOp with 2 inputs, 1 output, and 2
|
||||
attributes:
|
||||
|
||||
```
|
||||
--custom\_opdefs="name: 'TFLiteAwesomeCustomOp' input\_arg: { name: 'InputA'
|
||||
type: DT\_FLOAT } input\_arg: { name: ‘InputB' type: DT\_FLOAT }
|
||||
output\_arg: { name: 'Output' type: DT\_FLOAT } attr : { name: 'Attr1' type:
|
||||
```sh
|
||||
--custom_opdefs="name: 'TFLiteAwesomeCustomOp' input_arg: { name: 'InputA'
|
||||
type: DT_FLOAT } input_arg: { name: ‘InputB' type: DT_FLOAT }
|
||||
output_arg: { name: 'Output' type: DT_FLOAT } attr : { name: 'Attr1' type:
|
||||
'float'} attr : { name: 'Attr2' type: 'list(float)'}"
|
||||
```
|
||||
|
@ -13,8 +13,8 @@ The API for TensorFlow 1.X is available
|
||||
|
||||
## New in TF 2.2
|
||||
|
||||
Switching to use a new converter backend by default - in the nightly builds and
|
||||
TF 2.2 stable. Why we are switching?
|
||||
TensorFlow Lite has switched to use a new converter backend by default - in the
|
||||
nightly builds and TF 2.2 stable. Why we did we switch?
|
||||
|
||||
* Enables conversion of new classes of models, including Mask R-CNN, Mobile
|
||||
BERT, and many more
|
||||
@ -46,9 +46,9 @@ In case you encounter any issues:
|
||||
and
|
||||
[Command Line Tool](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/convert/cmdline.md)
|
||||
documentation
|
||||
* Switch to the old converter by setting --experimental_new_converter=false
|
||||
* Switch to the old converter by setting `--experimental_new_converter=false`
|
||||
(from the [tflite_convert](https://www.tensorflow.org/lite/convert/cmdline)
|
||||
command line tool) or converter.experimental_new_converter=False (from
|
||||
command line tool) or `converter.experimental_new_converter=False` (from the
|
||||
[Python API](https://www.tensorflow.org/api_docs/python/tf/lite/TFLiteConverter))
|
||||
|
||||
## Device deployment
|
||||
|
@ -20,7 +20,7 @@ set this up [here](https://www.tensorflow.org/install).
|
||||
After setup the Python programming environment, you will need to install
|
||||
additional tooling:
|
||||
|
||||
```
|
||||
```sh
|
||||
pip install tflite-support
|
||||
```
|
||||
|
||||
@ -53,31 +53,31 @@ Lite metadata:
|
||||
|
||||
### Examples
|
||||
|
||||
Note: The export directory specified has to exist before you run the script, it
|
||||
Note: The export directory specified has to exist before you run the script; it
|
||||
does not get created as part of the process.
|
||||
|
||||
You can find examples on how the metadata should be populated for different
|
||||
types of models here:
|
||||
|
||||
#### Image Classification
|
||||
#### Image classification
|
||||
|
||||
Download the script
|
||||
[here](https://github.com/tensorflow/examples/tree/master/lite/examples/image_classification/metadata/metadata_writer_for_image_classifier.py)
|
||||
and run the script like this:
|
||||
|
||||
```
|
||||
```sh
|
||||
python ./metadata_writer_for_image_classifier.py \
|
||||
--model_file=./model_without_metadata/mobilenet_v1_0.75_160_quantized.tflite \
|
||||
--label_file=./model_without_metadata/labels.txt \
|
||||
--export_directory=model_with_metadata
|
||||
```
|
||||
|
||||
The rest of this guide will highlight some of the key sections in the Image
|
||||
Classification example to illustrate the key elements.
|
||||
The rest of this guide will highlight some of the key sections in the image
|
||||
classification example to illustrate the key elements.
|
||||
|
||||
### Deep dive into the Image Classification example
|
||||
### Deep dive into the image classification example
|
||||
|
||||
#### Model Information
|
||||
#### Model information
|
||||
|
||||
Metadata starts by creating a new model info:
|
||||
|
||||
@ -103,9 +103,9 @@ model_meta.license = ("Apache License. Version 2.0 "
|
||||
|
||||
#### Input / output information
|
||||
|
||||
This describe your model's input and output signature and it maybe used by
|
||||
automatic code generators to create pre- and post- processing code. To create an
|
||||
input or output information about a tensor:
|
||||
This section shows you how to describe your model's input and output signature.
|
||||
This metadata may be used by automatic code generators to create pre- and post-
|
||||
processing code. To create input or output information about a tensor:
|
||||
|
||||
```python
|
||||
# Creates input info.
|
||||
@ -115,13 +115,13 @@ input_meta = _metadata_fb.TensorMetadataT()
|
||||
output_meta = _metadata_fb.TensorMetadataT()
|
||||
```
|
||||
|
||||
#### Image Input
|
||||
#### Image input
|
||||
|
||||
Image is a common input type for machine learning. TensorFlow Lite metadata
|
||||
supports information such as colorspace and pre-processing information such as
|
||||
normalization. One thing that does not required manual input is the dimension of
|
||||
the image as this is already provided by the shape of the input tensor and can
|
||||
be automatically inferred.
|
||||
normalization. The dimension of the image does not require manual specification
|
||||
since it is already provided by the shape of the input tensor and can be
|
||||
automatically inferred.
|
||||
|
||||
```python
|
||||
input_meta.name = "image"
|
||||
@ -153,7 +153,7 @@ input_meta.stats = input_stats
|
||||
Label can be mapped to an output tensor via an associated file using
|
||||
`TENSOR_AXIS_LABELS`.
|
||||
|
||||
```Python
|
||||
```python
|
||||
# Creates output info.
|
||||
output_meta = _metadata_fb.TensorMetadataT()
|
||||
output_meta.name = "probability"
|
||||
@ -175,7 +175,7 @@ output_meta.associatedFiles = [label_file]
|
||||
|
||||
#### Put it all together
|
||||
|
||||
The following code pull the model information together with the input and output
|
||||
The following code combines the model information with the input and output
|
||||
information:
|
||||
|
||||
```python
|
||||
@ -192,8 +192,8 @@ b.Finish(
|
||||
metadata_buf = b.Output()
|
||||
```
|
||||
|
||||
Once the data structure is ready, the writing of the metadata into the tflite
|
||||
file is done via the `populate` method:
|
||||
Once the data structure is ready, the metadata is written into the TFLite file
|
||||
via the `populate` method:
|
||||
|
||||
```python
|
||||
populator = _metadata.MetadataPopulator.with_model_file(model_file)
|
||||
@ -204,9 +204,9 @@ populator.populate()
|
||||
|
||||
#### Verify the metadata
|
||||
|
||||
You can read back the metadata in a tflite file using the `MetadataDisplayer`:
|
||||
You can read the metadata in a TFLite file using the `MetadataDisplayer`:
|
||||
|
||||
```Python
|
||||
```python
|
||||
displayer = _metadata.MetadataDisplayer.with_model_file(export_model_path)
|
||||
export_json_file = os.path.join(FLAGS.export_directory,
|
||||
os.path.splitext(model_basename)[0] + ".json")
|
||||
|
@ -192,7 +192,7 @@ specific wrapper code. For more information, please refer to the
|
||||
|
||||
The TensorFlow nightly can be installed using the following command:
|
||||
|
||||
```
|
||||
```sh
|
||||
pip install tf-nightly
|
||||
```
|
||||
|
||||
@ -208,13 +208,13 @@ either install the nightly build with
|
||||
|
||||
There is a behavior change in how models containing
|
||||
[custom ops](https://www.tensorflow.org/lite/guide/ops_custom) (those for which
|
||||
users use to set allow\_custom\_ops before) are handled in the
|
||||
users previously set `allow_custom_ops` before) are handled in the
|
||||
[new converter](https://github.com/tensorflow/tensorflow/blob/917ebfe5fc1dfacf8eedcc746b7989bafc9588ef/tensorflow/lite/python/lite.py#L81).
|
||||
|
||||
**Built-in TensorFlow op**
|
||||
|
||||
If you are converting a model with a built-in TensorFlow op that does not exist
|
||||
in TensorFlow Lite, you should set allow\_custom\_ops attribute (same as
|
||||
in TensorFlow Lite, you should set the `allow_custom_ops` attribute (same as
|
||||
before), explained [here](https://www.tensorflow.org/lite/guide/ops_custom).
|
||||
|
||||
**Custom op in TensorFlow**
|
||||
@ -223,27 +223,27 @@ If you are converting a model with a custom TensorFlow op, it is recommended
|
||||
that you write a [TensorFlow kernel](https://www.tensorflow.org/guide/create_op)
|
||||
and [TensorFlow Lite kernel](https://www.tensorflow.org/lite/guide/ops_custom).
|
||||
This ensures that the model is working end-to-end, from TensorFlow and
|
||||
TensorFlow Lite. This also requires setting the allow\_custom\_ops attribute.
|
||||
TensorFlow Lite. This also requires setting the `allow_custom_ops` attribute.
|
||||
|
||||
**Advanced custom op usage (not recommended)**
|
||||
|
||||
If the above is not possible, you can still convert a TensorFlow model
|
||||
containing a custom op without a corresponding kernel. You will need to pass the
|
||||
[OpDef](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/op_def.proto)
|
||||
of the custom op in TensorFlow using --custom\_opdefs flag, as long as you have
|
||||
of the custom op in TensorFlow using `--custom_opdefs` flag, as long as you have
|
||||
the corresponding OpDef registered in the TensorFlow global op registry. This
|
||||
ensures that the TensorFlow model is valid (i.e. loadable by the TensorFlow
|
||||
runtime).
|
||||
|
||||
If the custom op is not part of the global TensorFlow op registry, then the
|
||||
corresponding OpDef needs to be specified via the --custom\_opdefs flag. This is
|
||||
a list of an OpDef proto in string that needs to be additionally registered.
|
||||
Below is an example of an TFLiteAwesomeCustomOp with 2 inputs, 1 output, and 2
|
||||
corresponding OpDef needs to be specified via the `--custom_opdefs` flag. This
|
||||
is a list of an OpDef proto in string that needs to be additionally registered.
|
||||
Below is an example of a TFLiteAwesomeCustomOp with 2 inputs, 1 output, and 2
|
||||
attributes:
|
||||
|
||||
```
|
||||
converter.custom\_opdefs="name: 'TFLiteAwesomeCustomOp' input\_arg: { name: 'InputA'
|
||||
type: DT\_FLOAT } input\_arg: { name: ‘InputB' type: DT\_FLOAT }
|
||||
output\_arg: { name: 'Output' type: DT\_FLOAT } attr : { name: 'Attr1' type:
|
||||
'float'} attr : { name: 'Attr2' type: 'list(float)'}"
|
||||
```python
|
||||
converter.custom_opdefs="""name: 'TFLiteAwesomeCustomOp' input_arg: { name: 'InputA'
|
||||
type: DT_FLOAT } input_arg: { name: ‘InputB' type: DT_FLOAT }
|
||||
output_arg: { name: 'Output' type: DT_FLOAT } attr : { name: 'Attr1' type:
|
||||
'float'} attr : { name: 'Attr2' type: 'list(float)'}"""
|
||||
```
|
||||
|
@ -12,7 +12,7 @@ has latency benefits, but prioritizes size reduction.
|
||||
|
||||
During conversion, set the `optimizations` flag to optimize for size:
|
||||
|
||||
```
|
||||
```python
|
||||
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
|
||||
converter.optimizations = [tf.lite.Optimize.DEFAULT]
|
||||
tflite_quant_model = converter.convert()
|
||||
@ -26,7 +26,7 @@ quantized. To do this, we need to measure the dynamic range of activations and
|
||||
inputs with a representative data set. You can simply create an input data
|
||||
generator and provide it to our converter.
|
||||
|
||||
```
|
||||
```python
|
||||
import tensorflow as tf
|
||||
|
||||
def representative_dataset_gen():
|
||||
@ -40,7 +40,7 @@ converter.representative_dataset = representative_dataset_gen
|
||||
tflite_quant_model = converter.convert()
|
||||
```
|
||||
|
||||
# During training: Quantizing models for integer-only execution.
|
||||
# During training: Quantizing models for integer-only execution
|
||||
|
||||
Quantizing models for integer-only execution gets a model with even faster
|
||||
latency, smaller size, and integer-only accelerators compatible model.
|
||||
@ -52,7 +52,7 @@ compatible with 2.0 semantics is in progress.
|
||||
|
||||
Convert the graph:
|
||||
|
||||
```
|
||||
```python
|
||||
converter = tf.compat.v1.lite.TFLiteConverter.from_saved_model(saved_model_dir)
|
||||
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
|
||||
input_arrays = converter.get_input_arrays()
|
||||
@ -75,5 +75,5 @@ the
|
||||
`std_dev` is 255 / (float_max - float_min).
|
||||
|
||||
For most users, we recommend using post-training quantization. We are working on
|
||||
new tools for post-training and during training quantization that we hope will
|
||||
new tools for post-training and training-time quantization that we hope will
|
||||
simplify generating quantized models.
|
||||
|
@ -191,7 +191,7 @@ build --action_env ANDROID_SDK_API_LEVEL="23"
|
||||
build --action_env ANDROID_SDK_HOME="/usr/local/android/android-sdk-linux"
|
||||
```
|
||||
|
||||
#### Build and Install
|
||||
#### Build and install
|
||||
|
||||
Once Bazel is properly configured, you can build the TensorFlow Lite AAR from
|
||||
the root checkout directory as follows:
|
||||
@ -268,11 +268,13 @@ If you want to use TFLite through C++ libraries, you can build the shared
|
||||
libraries:
|
||||
|
||||
32bit armeabi-v7a:
|
||||
```
|
||||
|
||||
```sh
|
||||
bazel build -c opt --config=android_arm //tensorflow/lite:libtensorflowlite.so
|
||||
```
|
||||
|
||||
64bit arm64-v8a:
|
||||
```
|
||||
|
||||
```sh
|
||||
bazel build -c opt --config=android_arm64 //tensorflow/lite:libtensorflowlite.so
|
||||
```
|
||||
|
@ -14,23 +14,22 @@ or
|
||||
|
||||
## Cross-compile for Raspberry Pi
|
||||
|
||||
Instruction has been tested on Ubuntu 16.04.3 64-bit PC (AMD64) and TensorFlow
|
||||
devel docker image
|
||||
The following instructions have been tested on Ubuntu 16.04.3 64-bit PC (AMD64)
|
||||
and TensorFlow devel docker image
|
||||
[tensorflow/tensorflow:nightly-devel](https://hub.docker.com/r/tensorflow/tensorflow/tags/).
|
||||
|
||||
To cross compile TensorFlow Lite follow the steps:
|
||||
|
||||
1. Clone official Raspberry Pi cross-compilation toolchain:
|
||||
|
||||
```bash
|
||||
```sh
|
||||
git clone https://github.com/raspberrypi/tools.git rpi_tools
|
||||
```
|
||||
|
||||
2. Clone TensorFlow repository:
|
||||
|
||||
```bash
|
||||
```sh
|
||||
git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
|
||||
|
||||
```
|
||||
|
||||
**Note:** If you're using the TensorFlow Docker image, the repo is already
|
||||
@ -39,7 +38,7 @@ To cross compile TensorFlow Lite follow the steps:
|
||||
3. Run following script at the root of the TensorFlow repository to download
|
||||
all the build dependencies:
|
||||
|
||||
```bash
|
||||
```sh
|
||||
cd tensorflow_src && ./tensorflow/lite/tools/make/download_dependencies.sh
|
||||
```
|
||||
|
||||
@ -47,7 +46,7 @@ To cross compile TensorFlow Lite follow the steps:
|
||||
|
||||
4. To build ARMv7 binary for Raspberry Pi 2, 3 and 4 execute:
|
||||
|
||||
```bash
|
||||
```sh
|
||||
PATH=../rpi_tools/arm-bcm2708/arm-rpi-4.9.3-linux-gnueabihf/bin:$PATH ./tensorflow/lite/tools/make/build_rpi_lib.sh
|
||||
```
|
||||
|
||||
@ -56,7 +55,7 @@ To cross compile TensorFlow Lite follow the steps:
|
||||
|
||||
5. To build ARMv6 binary for Raspberry Pi Zero execute:
|
||||
|
||||
```bash
|
||||
```sh
|
||||
PATH=../rpi_tools/arm-bcm2708/arm-rpi-4.9.3-linux-gnueabihf/bin:$PATH ./tensorflow/lite/tools/make/build_rpi_lib.sh TARGET_ARCH=armv6
|
||||
```
|
||||
|
||||
@ -65,28 +64,27 @@ To cross compile TensorFlow Lite follow the steps:
|
||||
|
||||
## Compile natively on Raspberry Pi
|
||||
|
||||
Instruction has been tested on Raspberry Pi Zero, Raspbian GNU/Linux 10
|
||||
(buster), gcc version 8.3.0 (Raspbian 8.3.0-6+rpi1):
|
||||
The following instructions have been tested on Raspberry Pi Zero, Raspbian
|
||||
GNU/Linux 10 (buster), gcc version 8.3.0 (Raspbian 8.3.0-6+rpi1):
|
||||
|
||||
To natively compile TensorFlow Lite follow the steps:
|
||||
|
||||
1. Log in to your Raspberry Pi and install the toolchain:
|
||||
|
||||
```bash
|
||||
```sh
|
||||
sudo apt-get install build-essential
|
||||
```
|
||||
|
||||
2. Clone TensorFlow repository:
|
||||
|
||||
```bash
|
||||
```sh
|
||||
git clone https://github.com/tensorflow/tensorflow.git tensorflow_src
|
||||
|
||||
```
|
||||
|
||||
3. Run following script at the root of the TensorFlow repository to download
|
||||
all the build dependencies:
|
||||
|
||||
```bash
|
||||
```sh
|
||||
cd tensorflow_src && ./tensorflow/lite/tools/make/download_dependencies.sh
|
||||
```
|
||||
|
||||
@ -94,7 +92,7 @@ To natively compile TensorFlow Lite follow the steps:
|
||||
|
||||
4. You should then be able to compile TensorFlow Lite with:
|
||||
|
||||
```bash
|
||||
```sh
|
||||
./tensorflow/lite/tools/make/build_rpi_lib.sh
|
||||
```
|
||||
|
||||
|
@ -1,7 +1,7 @@
|
||||
# Generate code from TensorFlow Lite metadata
|
||||
|
||||
Note: TensorFlow Lite wrapper code generator is in experimental (beta) phase and
|
||||
it currently only supports Android.
|
||||
currently only supports Android.
|
||||
|
||||
For TensorFlow Lite model enhanced with [metadata](../convert/metadata.md),
|
||||
developers can use the TensorFlow Lite Android wrapper code generator to create
|
||||
@ -19,13 +19,13 @@ to see how the codegen tool parses each field.
|
||||
|
||||
You will need to install the following tooling in your terminal:
|
||||
|
||||
```
|
||||
```sh
|
||||
pip install tflite-support
|
||||
```
|
||||
|
||||
Once completed, the code generator can be used using the following syntax:
|
||||
|
||||
```
|
||||
```sh
|
||||
tflite_codegen --model=./model_with_metadata/mobilenet_v1_0.75_160_quantized.tflite \
|
||||
--package_name=org.tensorflow.lite.classify \
|
||||
--model_class_name=MyClassifierModel \
|
||||
@ -66,7 +66,7 @@ In the app module that will be consuming the generated library module:
|
||||
|
||||
Under the android section, add the following:
|
||||
|
||||
```java
|
||||
```build
|
||||
aaptOptions {
|
||||
noCompress "tflite"
|
||||
}
|
||||
@ -74,14 +74,14 @@ aaptOptions {
|
||||
|
||||
Under the dependencies section, add the following:
|
||||
|
||||
```java
|
||||
```build
|
||||
implementation project(":classify_wrapper")
|
||||
```
|
||||
|
||||
### Step 3: Using the model
|
||||
|
||||
```java
|
||||
// 1. Initialize the Model
|
||||
// 1. Initialize the model
|
||||
MyClassifierModel myImageClassifier = null;
|
||||
|
||||
try {
|
||||
@ -92,14 +92,14 @@ try {
|
||||
|
||||
if(null != myImageClassifier) {
|
||||
|
||||
// 2. Setting the input with a Bitmap called inputBitmap
|
||||
// 2. Set the input with a Bitmap called inputBitmap
|
||||
MyClassifierModel.Inputs inputs = myImageClassifier.createInputs();
|
||||
inputs.loadImage(inputBitmap));
|
||||
|
||||
// 3. Running the model
|
||||
// 3. Run the model
|
||||
MyClassifierModel.Outputs outputs = myImageClassifier.run(inputs);
|
||||
|
||||
// 4. Retrieving the result
|
||||
// 4. Retrieve the result
|
||||
Map<String, Float> labeledProbability = outputs.getProbability();
|
||||
}
|
||||
```
|
||||
@ -117,7 +117,7 @@ parameters:
|
||||
* (Optional) **`numThreads`**: Number of threads used to run the model -
|
||||
default is one.
|
||||
|
||||
For example, to use a NNAPI delegate and up to three threads, you can initiate
|
||||
For example, to use a NNAPI delegate and up to three threads, you can initialize
|
||||
the model like this:
|
||||
|
||||
```java
|
||||
@ -135,7 +135,7 @@ try {
|
||||
Under the app module that will uses the library module, insert the following
|
||||
lines under the android section:
|
||||
|
||||
```java
|
||||
```build
|
||||
aaptOptions {
|
||||
noCompress "tflite"
|
||||
}
|
||||
|
@ -55,7 +55,7 @@ look for the inputs and outputs in the graph. To visualize a `.pb` file, use the
|
||||
[`import_pb_to_tensorboard.py`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/import_pb_to_tensorboard.py)
|
||||
script like below:
|
||||
|
||||
```
|
||||
```sh
|
||||
python import_pb_to_tensorboard.py --model_dir <model path> --log_dir <log dir path>
|
||||
```
|
||||
|
||||
@ -68,7 +68,7 @@ script in our repository.
|
||||
* [Clone the TensorFlow repository](https://www.tensorflow.org/install/source)
|
||||
* Run the `visualize.py` script with bazel:
|
||||
|
||||
```
|
||||
```sh
|
||||
bazel run //tensorflow/lite/tools:visualize model.tflite visualized_model.html
|
||||
```
|
||||
|
||||
@ -101,8 +101,8 @@ random data to feed to the interpreter.
|
||||
|
||||
#### How do I reduce the size of my converted TensorFlow Lite model?
|
||||
|
||||
[Post-training quantization](../performance/post_training_quantization.md) can be
|
||||
used during conversion to TensorFlow Lite to reduce the size of the model.
|
||||
[Post-training quantization](../performance/post_training_quantization.md) can
|
||||
be used during conversion to TensorFlow Lite to reduce the size of the model.
|
||||
Post-training quantization quantizes weights to 8-bits of precision from
|
||||
floating-point and dequantizes them during runtime to perform floating point
|
||||
computations. However, note that this could have some accuracy implications.
|
||||
|
@ -11,14 +11,13 @@ each step of the developer workflow and provides links to further instructions.
|
||||
<a id="1_choose_a_model"></a>
|
||||
|
||||
A TensorFlow model is a data structure that contains the logic and knowledge of
|
||||
a machine learning network trained to solve a particular problem.
|
||||
There are many ways to obtain a TensorFlow model, from using pre-trained models
|
||||
to training your own.
|
||||
a machine learning network trained to solve a particular problem. There are many
|
||||
ways to obtain a TensorFlow model, from using pre-trained models to training
|
||||
your own.
|
||||
|
||||
To use a model with TensorFlow Lite, you must convert a
|
||||
full TensorFlow model into the TensorFlow Lite format—you
|
||||
cannot create or train a model using TensorFlow Lite. So you must start with a
|
||||
regular TensorFlow model, and then
|
||||
To use a model with TensorFlow Lite, you must convert a full TensorFlow model
|
||||
into the TensorFlow Lite format—you cannot create or train a model using
|
||||
TensorFlow Lite. So you must start with a regular TensorFlow model, and then
|
||||
[convert the model](#2_convert_the_model_format).
|
||||
|
||||
Note: TensorFlow Lite supports a limited subset of TensorFlow operations, so not
|
||||
@ -135,9 +134,9 @@ performance or reduce file size. This is covered in section 4,
|
||||
|
||||
### Ops compatibility
|
||||
|
||||
TensorFlow Lite currently supports a [limited subset of TensorFlow
|
||||
operations](ops_compatibility.md). The long term goal is for all TensorFlow
|
||||
operations to be supported.
|
||||
TensorFlow Lite currently supports a
|
||||
[limited subset of TensorFlow operations](ops_compatibility.md). The long term
|
||||
goal is for all TensorFlow operations to be supported.
|
||||
|
||||
If the model you wish to convert contains unsupported operations, you can use
|
||||
[TensorFlow Select](ops_select.md) to include operations from TensorFlow. This
|
||||
@ -215,11 +214,9 @@ Embedded Linux is an important platform for deploying machine learning. To get
|
||||
started using Python to perform inference with your TensorFlow Lite models,
|
||||
follow the [Python quickstart](python.md).
|
||||
|
||||
To instead install the C++ library, see the
|
||||
build instructions for [Raspberry Pi](build_rpi.md) or
|
||||
[Arm64-based boards](build_arm64.md) (for boards such as Odroid C2, Pine64, and
|
||||
NanoPi).
|
||||
|
||||
To instead install the C++ library, see the build instructions for
|
||||
[Raspberry Pi](build_rpi.md) or [Arm64-based boards](build_arm64.md) (for boards
|
||||
such as Odroid C2, Pine64, and NanoPi).
|
||||
|
||||
### Microcontrollers
|
||||
|
||||
|
@ -310,7 +310,7 @@ The following example shows how to use the Python interpreter to load a
|
||||
import numpy as np
|
||||
import tensorflow as tf
|
||||
|
||||
# Load TFLite model and allocate tensors.
|
||||
# Load the TFLite model and allocate tensors.
|
||||
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
|
||||
interpreter.allocate_tensors()
|
||||
|
||||
@ -318,7 +318,7 @@ interpreter.allocate_tensors()
|
||||
input_details = interpreter.get_input_details()
|
||||
output_details = interpreter.get_output_details()
|
||||
|
||||
# Test model on random input data.
|
||||
# Test the model on random input data.
|
||||
input_shape = input_details[0]['shape']
|
||||
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
|
||||
interpreter.set_tensor(input_details[0]['index'], input_data)
|
||||
@ -331,10 +331,11 @@ output_data = interpreter.get_tensor(output_details[0]['index'])
|
||||
print(output_data)
|
||||
```
|
||||
|
||||
Alternative to loading the model as a pre-converted `.tflite` file, you can
|
||||
combine your code with the [TensorFlow Lite Converter Python API](
|
||||
../convert/python_api.md) (`tf.lite.TFLiteConverter`), allowing you to convert
|
||||
your TensorFlow model into the TensorFlow Lite format and then run an inference:
|
||||
Alternatively to loading the model as a pre-converted `.tflite` file, you can
|
||||
combine your code with the
|
||||
[TensorFlow Lite Converter Python API](../convert/python_api.md)
|
||||
(`tf.lite.TFLiteConverter`), allowing you to convert your TensorFlow model into
|
||||
the TensorFlow Lite format and then run an inference:
|
||||
|
||||
```python
|
||||
import numpy as np
|
||||
@ -350,7 +351,7 @@ with tf.Session() as sess:
|
||||
converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out])
|
||||
tflite_model = converter.convert()
|
||||
|
||||
# Load TFLite model and allocate tensors.
|
||||
# Load the TFLite model and allocate tensors.
|
||||
interpreter = tf.lite.Interpreter(model_content=tflite_model)
|
||||
interpreter.allocate_tensors()
|
||||
|
||||
@ -384,22 +385,22 @@ including all the tensors. The latter allows implementations to access their
|
||||
inputs and outputs.
|
||||
|
||||
When the interpreter loads a model, it calls `init()` once for each node in the
|
||||
graph. A given `init()` will be called more than once if the op is used
|
||||
multiple times in the graph. For custom ops a configuration buffer will be
|
||||
provided, containing a flexbuffer that maps parameter names to their values.
|
||||
The buffer is empty for builtin ops because the interpreter has already parsed
|
||||
the op parameters. Kernel implementation that require state should initialize
|
||||
it here and transfer ownership to the caller. For each `init()` call, there
|
||||
will be a corresponding call to `free()`, allowing implementations to dispose
|
||||
of the buffer they might have allocated in `init()`.
|
||||
graph. A given `init()` will be called more than once if the op is used multiple
|
||||
times in the graph. For custom ops a configuration buffer will be provided,
|
||||
containing a flexbuffer that maps parameter names to their values. The buffer is
|
||||
empty for builtin ops because the interpreter has already parsed the op
|
||||
parameters. Kernel implementations that require state should initialize it here
|
||||
and transfer ownership to the caller. For each `init()` call, there will be a
|
||||
corresponding call to `free()`, allowing implementations to dispose of the
|
||||
buffer they might have allocated in `init()`.
|
||||
|
||||
Whenever the input tensors are resized the interpreter will go through the
|
||||
Whenever the input tensors are resized, the interpreter will go through the
|
||||
graph notifying implementations of the change. This gives them the chance to
|
||||
resize their internal buffer, check validity of input shapes and types, and
|
||||
recalculate output shapes. This is all done through `prepare()` and
|
||||
implementation can access their state using `node->user_data`.
|
||||
recalculate output shapes. This is all done through `prepare()`, and
|
||||
implementations can access their state using `node->user_data`.
|
||||
|
||||
Finally, each time inference runs the interpreter traverses the graph calling
|
||||
Finally, each time inference runs, the interpreter traverses the graph calling
|
||||
`invoke()`, and here too the state is available as `node->user_data`.
|
||||
|
||||
Custom ops can be implemented in exactly the same way as builtin ops, by
|
||||
|
@ -23,7 +23,7 @@ quantized (`uint8`, `int8`) inference, but many ops do not yet for other types
|
||||
like `tf.float16` and strings.
|
||||
|
||||
Apart from using different version of the operations, the other difference
|
||||
between floating-point and quantized models lies in the way they are converted.
|
||||
between floating-point and quantized models is the way they are converted.
|
||||
Quantized conversion requires dynamic range information for tensors. This
|
||||
requires "fake-quantization" during model training, getting range information
|
||||
via a calibration data set, or doing "on-the-fly" range estimation. See
|
||||
@ -32,8 +32,8 @@ via a calibration data set, or doing "on-the-fly" range estimation. See
|
||||
## Data format and broadcasting
|
||||
|
||||
At the moment TensorFlow Lite supports only TensorFlow's "NHWC" format, and
|
||||
broadcasting is only support in a limited number of ops (tf.add, tf.mul, tf.sub,
|
||||
and tf.div).
|
||||
broadcasting is only support in a limited number of ops (`tf.add`, `tf.mul`,
|
||||
`tf.sub`, and `tf.div`).
|
||||
|
||||
## Compatible operations
|
||||
|
||||
@ -58,8 +58,8 @@ counterparts:
|
||||
* `tf.nn.softmax` —As long as tensors are 2D and axis is the last dimension.
|
||||
* `tf.nn.top_k`
|
||||
* `tf.one_hot`
|
||||
* `tf.pad` —As long as mode and constant_values are not used.
|
||||
* `tf.reduce_mean` —As long as the reduction_indices attribute is not used.
|
||||
* `tf.pad` —As long as `mode` and `constant_values` are not used.
|
||||
* `tf.reduce_mean` —As long as the `reduction_indices` attribute is not used.
|
||||
* `tf.reshape`
|
||||
* `tf.sigmoid`
|
||||
* `tf.space_to_batch_nd` —As long as the input tensor is 4D (1 batch + 2
|
||||
@ -67,19 +67,19 @@ counterparts:
|
||||
* `tf.space_to_depth`
|
||||
* `tf.split` —As long as num is not provided and `num_or_size_split` contains
|
||||
number of splits as a 0D tensor.
|
||||
* `tf.squeeze` —As long as axis is not provided.
|
||||
* `tf.squeeze` —As long as `axis` is not provided.
|
||||
* `tf.squared_difference`
|
||||
* `tf.strided_slice` —As long as `ellipsis_mask and new_axis_mask` are not
|
||||
* `tf.strided_slice` —As long as `ellipsis_mask` and `new_axis_mask` are not
|
||||
used.
|
||||
* `tf.transpose` —As long as conjugate is not used.
|
||||
* `tf.transpose` —As long as `conjugate` is not used.
|
||||
|
||||
## Straight-forward conversions, constant-folding and fusing
|
||||
|
||||
A number of TensorFlow operations can be processed by TensorFlow Lite even
|
||||
though they have no direct equivalent. This is the case for operations that can
|
||||
be simply removed from the graph (tf.identity), replaced by tensors
|
||||
(tf.placeholder), or fused into more complex operations (tf.nn.bias_add). Even
|
||||
some supported operations may sometimes be removed through one of these
|
||||
be simply removed from the graph (`tf.identity`), replaced by tensors
|
||||
(`tf.placeholder`), or fused into more complex operations (`tf.nn.bias_add`).
|
||||
Even some supported operations may sometimes be removed through one of these
|
||||
processes.
|
||||
|
||||
Here is a non-exhaustive list of TensorFlow operations that are usually removed
|
||||
@ -115,7 +115,7 @@ from the graph:
|
||||
* `tf.nn.relu`
|
||||
* `tf.nn.relu6`
|
||||
|
||||
Note: Many of those operations don't have TensorFlow Lite equivalents and the
|
||||
Note: Many of those operations don't have TensorFlow Lite equivalents, and the
|
||||
corresponding model will not be convertible if they can't be elided or fused.
|
||||
|
||||
## Unsupported operations
|
||||
@ -343,10 +343,10 @@ Outputs {
|
||||
**FLOOR**
|
||||
|
||||
```
|
||||
inputs {
|
||||
Inputs {
|
||||
0: tensor
|
||||
}
|
||||
outputs: {
|
||||
Outputs: {
|
||||
0: result of computing element-wise floor of the input tensor
|
||||
}
|
||||
```
|
||||
|
@ -1,32 +1,31 @@
|
||||
# Custom operators
|
||||
|
||||
TensorFlow Lite currently supports a subset of TensorFlow operators. It supports
|
||||
the use of user-provided implementations (as known as custom implementations) if
|
||||
the use of user-provided implementations (known as custom implementations) if
|
||||
the model contains an operator that is not supported. Providing custom kernels
|
||||
is also a way of evaluating a series of TensorFlow operations as a single fused
|
||||
TensorFlow Lite operations.
|
||||
is also a way of executing a series of TensorFlow operations as a single fused
|
||||
TensorFlow Lite operation.
|
||||
|
||||
Using custom operators consists of three steps.
|
||||
|
||||
* Making sure the TensorFlow Graph Def or SavedModel refers to the correctly
|
||||
* Make sure the TensorFlow Graph Def or SavedModel refers to the correctly
|
||||
named TensorFlow Lite operator.
|
||||
|
||||
* Registering a custom kernel with TensorFlow Lite so that the runtime knows
|
||||
how to map your operator and parameters in your graph to executable C/C++
|
||||
code.
|
||||
* Register a custom kernel with TensorFlow Lite so that the runtime knows how
|
||||
to map your operator and parameters in your graph to executable C/C++ code.
|
||||
|
||||
* Testing and profiling your operator correctness and performance,
|
||||
respectively. If you wish to test just your custom operator it is best to
|
||||
create a model with just your custom operator and using the
|
||||
* Test and profile your operator correctness and performance, respectively. If
|
||||
you wish to test just your custom operator, it is best to create a model
|
||||
with just your custom operator and using the
|
||||
[benchmark_model](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/benchmark/benchmark_model_test.cc)
|
||||
program
|
||||
program.
|
||||
|
||||
Below we describe a complete example of defining Sin and some links to existing
|
||||
conversion process involving custom operators.
|
||||
Below we describe a complete example of defining `Sin` and some links to
|
||||
existing conversion process involving custom operators.
|
||||
|
||||
## Making a custom operator for Sin
|
||||
|
||||
Let’s walk through this an example of supporting a TensorFlow operator that
|
||||
Let’s walk through an example of supporting a TensorFlow operator that
|
||||
TensorFlow Lite does not have. Assume we are using the `Sin` operator and that
|
||||
we are building a very simple model for a function `y = sin(x + offset)`, where
|
||||
`offset` is trainable.
|
||||
@ -45,11 +44,11 @@ optimizer = tf.train.GradientDescentOptimizer(0.001)
|
||||
train = optimizer.minimize(loss)
|
||||
```
|
||||
|
||||
If you convert this model to Tensorflow Lite format using the TensorFlow Lite
|
||||
If you convert this model to TensorFlow Lite format using the TensorFlow Lite
|
||||
Optimizing Converter with `--allow_custom_ops` argument, and run it with the
|
||||
default interpreter, the interpreter will raise the following error messages:
|
||||
|
||||
```
|
||||
```none
|
||||
Didn't find custom op for name 'Sin'
|
||||
Registration failed.
|
||||
```
|
||||
@ -57,8 +56,7 @@ Registration failed.
|
||||
### Defining the kernel in the TensorFlow Lite runtime
|
||||
|
||||
All we need to do to use the op in TensorFlow Lite is define two functions
|
||||
(`Prepare` and `Eval`), and construct a `TfLiteRegistration`. This code would
|
||||
look something like this:
|
||||
(`Prepare` and `Eval`), and construct a `TfLiteRegistration`:
|
||||
|
||||
```cpp
|
||||
TfLiteStatus SinPrepare(TfLiteContext* context, TfLiteNode* node) {
|
||||
@ -105,44 +103,45 @@ TfLiteRegistration* Register_SIN() {
|
||||
}
|
||||
```
|
||||
|
||||
When initializing the `OpResolver`, add the custom op into the resolver, this
|
||||
When initializing the `OpResolver`, add the custom op into the resolver. This
|
||||
will register the operator with Tensorflow Lite so that TensorFlow Lite can use
|
||||
the new implementation. Note that the last two arguments in TfLiteRegistration
|
||||
correspond to the `SinPrepare` and `SinEval()` functions you defined for the
|
||||
custom op. If you used two functions to initialize variables used in the op and
|
||||
free up space: `Init()` and `Free()`, then they would be added to the first two
|
||||
arguments of TfLiteRegistration; they are set to nullptr in this example.
|
||||
the new implementation. Note that the last two arguments in `TfLiteRegistration`
|
||||
correspond to the `SinPrepare` and `SinEval` functions you defined for the
|
||||
custom op. If you used `SinInit` and `SinFree` functions to initialize variables
|
||||
used in the op and to free up space, respectively, then they would be added to
|
||||
the first two arguments of `TfLiteRegistration`; those arguments are set to
|
||||
`nullptr` in this example.
|
||||
|
||||
```cpp
|
||||
tflite::ops::builtin::BuiltinOpResolver builtins;
|
||||
builtins.AddCustom("Sin", Register_SIN());
|
||||
```
|
||||
|
||||
If you want to make your custom operators in Java, you would currently need to
|
||||
If you want to define your custom operators in Java, you would currently need to
|
||||
build your own custom JNI layer and compile your own AAR
|
||||
[in this jni code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/java/src/main/native/builtin_ops_jni.cc).
|
||||
Similarly, if you wish to make these operators available in Python you can place
|
||||
your registrations in the
|
||||
Similarly, if you wish to define these operators available in Python you can
|
||||
place your registrations in the
|
||||
[Python wrapper code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/python/interpreter_wrapper/interpreter_wrapper.cc).
|
||||
|
||||
Note that a similar process as above can be followed for supporting for a set of
|
||||
Note that a similar process as above can be followed for supporting a set of
|
||||
operations instead of a single operator. Just add as many `AddCustom` operators
|
||||
as you need. In addition, `BuiltinOpResolver` also allows you to override
|
||||
implementations of builtins by using the `AddBuiltin`.
|
||||
|
||||
## Best Practices
|
||||
## Best practices
|
||||
|
||||
### Writing TensorFlow Lite kernels best practices
|
||||
|
||||
1. Optimize memory allocations and de-allocations cautiously. It is more
|
||||
efficient to allocate memory in Prepare() instead of Invoke(), and allocate
|
||||
memory before a loop instead of in every iteration. Use temporary tensors
|
||||
data rather than mallocing yourself (see item 2). Use pointers/references
|
||||
instead of copying as much as possible.
|
||||
1. Optimize memory allocations and de-allocations cautiously. Allocating memory
|
||||
in `Prepare` is more efficient than in `Invoke`, and allocating memory
|
||||
before a loop is better than in every iteration. Use temporary tensors data
|
||||
rather than mallocing yourself (see item 2). Use pointers/references instead
|
||||
of copying as much as possible.
|
||||
|
||||
2. If a data structure will persist during the entire operation, we advise
|
||||
pre-allocating the memory using temporary tensors. You may need to use
|
||||
OpData struct to reference the tensor indices in other functions. See
|
||||
OpData struct to reference the tensor indices in other functions. See the
|
||||
example in the
|
||||
[kernel for convolution](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/conv.cc).
|
||||
A sample code snippet is below
|
||||
@ -158,23 +157,24 @@ implementations of builtins by using the `AddBuiltin`.
|
||||
```
|
||||
|
||||
3. If it doesn't cost too much wasted memory, prefer using a static fixed size
|
||||
array (or in Resize() pre-allocated std::vector) rather than using a
|
||||
dynamically allocating std::vector every iteration of execution.
|
||||
array (or a pre-allocated `std::vector` in `Resize`) rather than using a
|
||||
dynamically allocated `std::vector` every iteration of execution.
|
||||
|
||||
4. Avoid instantiating standard library container templates that don't already
|
||||
exist, because they affect binary size. For example, if you need a std::map
|
||||
in your operation that doesn't exist in other kernels, using a std::vector
|
||||
with direct indexing mapping could work while keeping the binary size small.
|
||||
See what other kernels use to gain insight (or ask).
|
||||
exist, because they affect binary size. For example, if you need a
|
||||
`std::map` in your operation that doesn't exist in other kernels, using a
|
||||
`std::vector` with direct indexing mapping could work while keeping the
|
||||
binary size small. See what other kernels use to gain insight (or ask).
|
||||
|
||||
5. Check the pointer to the memory returned by malloc. If this pointer is
|
||||
nullptr, no operations should be performed using that pointer. If you
|
||||
malloc() in a function and have an error exit, deallocate memory before you
|
||||
5. Check the pointer to the memory returned by `malloc`. If this pointer is
|
||||
`nullptr`, no operations should be performed using that pointer. If you
|
||||
`malloc` in a function and have an error exit, deallocate memory before you
|
||||
exit.
|
||||
|
||||
6. Use TF_LITE_ENSURE(context, condition) to check for a specific condition.
|
||||
Your code must not leave memory hanging when TF_LITE_ENSURE is done, i.e.,
|
||||
these should be done before any resources are allocated that will leak.
|
||||
6. Use `TF_LITE_ENSURE(context, condition)` to check for a specific condition.
|
||||
Your code must not leave memory hanging when `TF_LITE_ENSURE` is used, i.e.,
|
||||
these macros should be used before any resources are allocated that will
|
||||
leak.
|
||||
|
||||
### Conversion best practices
|
||||
|
||||
@ -187,10 +187,10 @@ instead of the builtin TensorFlow one.
|
||||
#### Converting TensorFlow models to convert graphs
|
||||
|
||||
In TensorFlow you can use the `tf.lite.OpHint` class to encapsulate groups of
|
||||
operators when you create a TensorFlow graph. This allows you then to extract a
|
||||
graph def that has references to those operators. This is currently experimental
|
||||
and should only be used by advanced users. There is a full example of how to use
|
||||
this in the
|
||||
operators when you create a TensorFlow graph. This encapsulation allows you then
|
||||
to extract a graph def that has references to those operators. `tf.lite.OpHint`
|
||||
is currently experimental and should only be used by advanced users. A full
|
||||
example of how to use this class is in the
|
||||
[OpHint code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/python/op_hint.py).
|
||||
|
||||
In addition, you can also use a manual graph substitution approach to rewrite
|
||||
@ -198,23 +198,23 @@ Tensorflow graphs. There is an example of how this is done in single shot object
|
||||
based detection models
|
||||
[export script](https://github.com/tensorflow/models/blob/master/research/object_detection/export_tflite_ssd_graph.py).
|
||||
|
||||
### TF Graph Attributes
|
||||
### TF graph attributes
|
||||
|
||||
When `tflite_convert` converts a TensorFlow graph into TFLite format, it makes
|
||||
some assumption about custom operations that might not be correct. In this case,
|
||||
some assumptions about custom operations. If the assumptions are not correct,
|
||||
the generated graph may not execute.
|
||||
|
||||
It is possible to add additional information about your custom op output to TF
|
||||
graph before it is converted. The following attributes are supported:
|
||||
It is possible to add additional information about your custom op output to the
|
||||
TF graph before it is converted. The following attributes are supported:
|
||||
|
||||
- **_output_quantized** a boolean attribute, true if the operation outputs are
|
||||
quantized
|
||||
- **_output_types** a list of types for output tensors
|
||||
- **_output_shapes** a list of shapes for output tensors
|
||||
|
||||
#### Setting the Attributes
|
||||
#### Setting the attributes
|
||||
|
||||
This is an example how the attributes can be set:
|
||||
The following example demonstrates how the attributes can be set:
|
||||
|
||||
```python
|
||||
frozen_graph_def = tf.graph_util.convert_variables_to_constants(...)
|
||||
@ -231,5 +231,5 @@ tflite_model = tf.lite.toco_convert(
|
||||
frozen_graph_def,...)
|
||||
```
|
||||
|
||||
**Note:** After the attributes are set, the graph can not be executed by
|
||||
Tensorflow, therefore it should be done just before the conversion.
|
||||
**Note:** After the attributes are set, the graph cannot be executed by
|
||||
TensorFlow. Therefore, the attributes should be set just before the conversion.
|
||||
|
@ -2,9 +2,9 @@
|
||||
|
||||
Caution: This feature is experimental.
|
||||
|
||||
The TensorFlow Lite builtin op library has grown rapidly, and will continue to
|
||||
The TensorFlow Lite builtin op library has grown rapidly and will continue to
|
||||
grow, but there remains a long tail of TensorFlow ops that are not yet natively
|
||||
supported by TensorFlow Lite . These unsupported ops can be a point of friction
|
||||
supported by TensorFlow Lite. These unsupported ops can be a point of friction
|
||||
in the TensorFlow Lite model conversion process. To that end, the team has
|
||||
recently been working on an experimental mechanism for reducing this friction.
|
||||
|
||||
@ -55,7 +55,7 @@ limitations.
|
||||
The following example shows how to use this feature in the
|
||||
[`TFLiteConverter`](./convert/python_api.md) Python API.
|
||||
|
||||
```
|
||||
```python
|
||||
import tensorflow as tf
|
||||
|
||||
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
|
||||
@ -69,7 +69,7 @@ The following example shows how to use this feature in the
|
||||
[`tflite_convert`](../convert/cmdline_examples.md) command line tool using the
|
||||
command line flag `target_ops`.
|
||||
|
||||
```
|
||||
```sh
|
||||
tflite_convert \
|
||||
--output_file=/tmp/foo.tflite \
|
||||
--graph_def_file=/tmp/foo.pb \
|
||||
@ -81,7 +81,7 @@ tflite_convert \
|
||||
When building and running `tflite_convert` directly with `bazel`, please pass
|
||||
`--define=tflite_convert_with_select_tf_ops=true` as an additional argument.
|
||||
|
||||
```
|
||||
```sh
|
||||
bazel run --define=tflite_convert_with_select_tf_ops=true tflite_convert -- \
|
||||
--output_file=/tmp/foo.tflite \
|
||||
--graph_def_file=/tmp/foo.pb \
|
||||
@ -157,7 +157,7 @@ Finally, in your app's `build.gradle`, ensure you have the `mavenLocal()`
|
||||
dependency and replace the standard TensorFlow Lite dependency with the one that
|
||||
has support for select TensorFlow ops:
|
||||
|
||||
```
|
||||
```build
|
||||
allprojects {
|
||||
repositories {
|
||||
jcenter()
|
||||
@ -220,7 +220,7 @@ creating the interpreter at runtime as long as the delegate is linked into the
|
||||
client library. It is not necessary to explicitly install the delegate instance
|
||||
as is typically required with other delegate types.
|
||||
|
||||
### Python pip Package
|
||||
### Python pip package
|
||||
|
||||
Python support is actively under development.
|
||||
|
||||
@ -241,7 +241,7 @@ Build | Time (milliseconds)
|
||||
Only built-in ops (`TFLITE_BUILTIN`) | 260.7
|
||||
Using only TF ops (`SELECT_TF_OPS`) | 264.5
|
||||
|
||||
### Binary Size
|
||||
### Binary size
|
||||
|
||||
The following table describes the binary size of TensorFlow Lite for each build.
|
||||
These targets were built for Android using `--config=android_arm -c opt`.
|
||||
@ -251,22 +251,22 @@ Build | C++ Binary Size | Android APK Size
|
||||
Only built-in ops | 796 KB | 561 KB
|
||||
Built-in ops + TF ops | 23.0 MB | 8.0 MB
|
||||
|
||||
## Known Limitations
|
||||
## Known limitations
|
||||
|
||||
The following is a list of some of the known limitations:
|
||||
|
||||
* Control flow ops are not yet supported.
|
||||
* The
|
||||
[`post_training_quantization`](https://www.tensorflow.org/performance/post_training_quantization)
|
||||
flag is currently not supported for TensorFlow ops so it will not quantize
|
||||
flag is currently not supported for TensorFlow ops, so it will not quantize
|
||||
weights for any TensorFlow ops. In models with both TensorFlow Lite builtin
|
||||
ops and TensorFlow ops, the weights for the builtin ops will be quantized.
|
||||
* Ops that require explicit initialization from resources, like HashTableV2,
|
||||
* Ops that require explicit initialization from resources, like `HashTableV2`,
|
||||
are not yet supported.
|
||||
* Certain TensorFlow ops may not support the full set of input/output types
|
||||
that are typically available on stock TensorFlow.
|
||||
|
||||
## Future Plans
|
||||
## Future plans
|
||||
|
||||
The following is a list of improvements to this pipeline that are in progress:
|
||||
|
||||
@ -276,5 +276,5 @@ The following is a list of improvements to this pipeline that are in progress:
|
||||
* *Improved usability* - The conversion process will be simplified to only
|
||||
require a single pass through the converter. Additionally, pre-built Android
|
||||
AAR and iOS CocoaPod binaries will be provided.
|
||||
* *Improved performance* - There is work being done to ensure TensorFlow Lite
|
||||
with TensorFlow ops has performance parity to TensorFlow Mobile.
|
||||
* *Improved performance* - Work is being done to ensure TensorFlow Lite with
|
||||
TensorFlow ops has performance parity to TensorFlow Mobile.
|
||||
|
@ -13,7 +13,7 @@ existing ops. In addition, it guarantees the following:
|
||||
reads a new model that contains a new version of an op which isn't
|
||||
supported, it should report the error.
|
||||
|
||||
## Example: Adding Dilation into Convolution
|
||||
## Example: Adding dilation into convolution
|
||||
|
||||
The remainder of this document explains op versioning in TFLite by showing how
|
||||
to add dilation parameters to the convolution operation.
|
||||
@ -25,7 +25,7 @@ Knowledge of dilation is not required to understand this document. Note that:
|
||||
* Old convolution kernels that don't support dilation are equivalent to
|
||||
setting the dilation factors to 1.
|
||||
|
||||
### Change FlatBuffer Schema
|
||||
### Change FlatBuffer schema
|
||||
|
||||
To add new parameters into an op, change the options table in
|
||||
`lite/schema/schema.fbs`.
|
||||
@ -66,7 +66,7 @@ table Conv2DOptions {
|
||||
The file `lite/schema/schema_generated.h` should be re-generated for the new
|
||||
schema.
|
||||
|
||||
### Change C Structures and Kernel Implementation
|
||||
### Change C structures and kernel implementation
|
||||
|
||||
In TensorFlow Lite, the kernel implementation is decoupled from FlatBuffer
|
||||
definition. The kernels read the parameter from C structures defined in
|
||||
@ -103,7 +103,7 @@ typedef struct {
|
||||
Please also change the kernel implementation to read the newly added parameters
|
||||
from the C structures. The details are omitted here.
|
||||
|
||||
### Change the FlatBuffer Reading Code
|
||||
### Change the FlatBuffer reading code
|
||||
|
||||
The logic to read FlatBuffer and produce C structure is in
|
||||
`lite/core/api/flatbuffer_conversions.cc`.
|
||||
@ -132,7 +132,7 @@ reads an old model file where dilation factors are missing, it will use 1 as
|
||||
the default value, and the new kernel will work consistently with the old
|
||||
kernel.
|
||||
|
||||
### Change Kernel Registration
|
||||
### Change kernel registration
|
||||
|
||||
The MutableOpResolver (defined in `lite/op_resolver.h`) provides a few functions
|
||||
to register op kernels. The minimum and maximum version are 1 by default:

@ -192,23 +192,24 @@ int GetVersion(const Operator& op) const override {
### Update the operator version map

The last step is to add the new version info into the operator version map. This
step is required because we need generate the model's minimum required runtime
version based on this version map.
step is required because we need to generate the model's minimum required
runtime version based on this version map.

To do this, you need to add a new map entry in `lite/toco/tflite/op_version.cc`.

In this example, it means you need to add the following into `op_version_map`:
In this example, you need to add the following entry into `op_version_map`:

```
{{OperatorType::kConv, 3}, "kPendingReleaseOpVersion"}
```
(`kPendingReleaseOpVersion` will be replaced with the appropriate release
version in the next stable release.)

### Delegation Implementation
### Delegation implementation

TensorFlow Lite provides a delegation API which enables delegating ops to
hardware backends. In Delegate's `Prepare` function, check if the version
is supported for every node in Delegation code.
hardware backends. In the delegate's `Prepare` function, check if the version is
supported for every node in the delegation code.

```
const int kMinVersion = 1;

@ -372,5 +372,5 @@ applications on GitHub</a>
To learn how to use the library in your own project, read
[Understand the C++ library](library.md).

For information about training and convert models for deployment on
For information about training and converting models for deployment on
microcontrollers, read [Build and convert models](build_convert.md).

@ -1,4 +1,4 @@
# Question and Answer
# Question and answer

Use a pre-trained model to answer questions based on the content of a given
passage.
@ -44,7 +44,7 @@ pre-processing including tokenization and post-processing steps that are
described in the BERT [paper](https://arxiv.org/abs/1810.04805) and implemented
in the sample app.

## Performance Benchmarks
## Performance benchmarks

Performance benchmark numbers are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

@ -216,7 +216,7 @@ experiment with different models to find the optimal balance between
performance, accuracy, and model size. For guidance, see
<a href="#choose_a_different_model">Choose a different model</a>.

## Performance Benchmarks
## Performance benchmarks

Performance benchmark numbers are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).
@ -260,7 +260,7 @@ Performance benchmark numbers are generated with the tool

## Choose a different model

There are a large number of image classification models available on our
A large number of image classification models are available on our
<a href="../../guide/hosted_models.md">List of hosted models</a>. You should aim
to choose the optimal model for your application based on performance, accuracy
and model size. There are trade-offs between each of them.
@ -302,7 +302,7 @@ Our quantized MobileNet models’ size ranges from 0.5 to 3.4 Mb.

### Architecture

There are several different architectures of models available on
Several different model architectures are available on
<a href="../../guide/hosted_models.md">List of hosted models</a>, indicated by
the model’s name. For example, you can choose between MobileNet, Inception, and
others.

@ -180,7 +180,7 @@ edges in a similar manner.

Note: Object detection models accept input images of a specific size. This is likely to be different from the size of the raw image captured by your device’s camera, and you will have to write code to crop and scale your raw image to fit the model’s input size (there are examples of this in our <a href="#get_started">example applications</a>).<br /><br />The pixel values output by the model refer to the position in the cropped and scaled image, so you must scale them to fit the raw image in order to interpret them correctly.

## Performance Benchmarks
## Performance benchmarks

Performance benchmark numbers are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

@ -43,7 +43,7 @@ The current implementation includes the following features:
<li>DeepLabv3+: We extend DeepLabv3 to include a simple yet effective decoder module to refine the segmentation results especially along object boundaries. Furthermore, in this encoder-decoder structure one can arbitrarily control the resolution of extracted encoder features by atrous convolution to trade-off precision and runtime.</li>
</ol>

## Performance Benchmarks
## Performance benchmarks

Performance benchmark numbers are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

@ -1,4 +1,4 @@
# Text Classification
# Text classification

Use a pre-trained model to categorize a paragraph into predefined groups.

@ -44,7 +44,7 @@ Here are the steps to classify a paragraph with the model:
* This model was trained on a movie reviews dataset, so you may experience
reduced accuracy when classifying text of other domains.

## Performance Benchmarks
## Performance benchmarks

Performance benchmark numbers are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

@ -19,7 +19,7 @@ and assumed in the `/data/local/tmp` directory.

To run the benchmark:

```
```sh
adb shell /data/local/tmp/benchmark_model \
--num_threads=4 \
--graph=/data/local/tmp/tflite_models/${GRAPH} \
@ -27,8 +27,8 @@ adb shell /data/local/tmp/benchmark_model \
--num_runs=50
```

To run with nnapi delegate, please set --use_nnapi=true. To run with gpu
delegate, please set --use_gpu=true.
To run with the NNAPI delegate, set `--use_nnapi=true`. To run with the GPU
delegate, set `--use_gpu=true`.

The performance values below are measured on Android 10.


@ -50,9 +50,8 @@ operator is executed. Check out our
## Optimize your model

Model optimization aims to create smaller models that are generally faster and
more energy efficient, so that they can be deployed on mobile devices. There are
multiple optimization techniques supported by TensorFlow Lite, such as
quantization.
more energy efficient, so that they can be deployed on mobile devices.
TensorFlow Lite supports multiple optimization techniques, such as quantization.

Check out our [model optimization docs](model_optimization.md) for details.

@ -78,7 +77,7 @@ If your application is not carefully designed, there can be redundant copies
when feeding the input to and reading the output from the model. Make sure to
eliminate redundant copies. If you are using higher level APIs, like Java, make
sure to carefully check the documentation for performance caveats. For example,
the Java API is a lot faster if ByteBuffers are used as
the Java API is a lot faster if `ByteBuffers` are used as
[inputs](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/java/src/main/java/org/tensorflow/lite/Interpreter.java#L175).

## Profile your application with platform specific tools

@ -1,8 +1,8 @@
# Tensorflow Lite Core ML Delegate
# TensorFlow Lite Core ML delegate

TensorFlow Lite Core ML Delegate enables running TensorFlow Lite models on
[Core ML framework](https://developer.apple.com/documentation/coreml),
which results in faster model inference on iOS devices.
The TensorFlow Lite Core ML delegate enables running TensorFlow Lite models on
[Core ML framework](https://developer.apple.com/documentation/coreml), which
results in faster model inference on iOS devices.

Note: This delegate is in experimental (beta) phase.

@ -1,6 +1,7 @@
# TensorFlow Lite delegates

Note: Delegate API is still experimental and is subject to change.

## What is a TensorFlow Lite delegate?

A TensorFlow Lite delegate is a way to delegate part or all of graph execution
@ -51,9 +52,9 @@ If a delegate was provided for specific operations, then TensorFlow Lite will
split the graph into multiple subgraphs where each subgraph will be handled by a
delegate.

Let's assume that there is a delegate "MyDelegate," which has a faster
implementation for Conv2D and Mean operations. The resulting main graph will be
updated to look like below.
Let's assume that a delegate, `MyDelegate`, has a faster implementation for
Conv2D and Mean operations. The resulting main graph will be updated to look
like below.

![Main graph with delegate](../images/performance/tflite_delegate.png "Main Graph")

@ -74,16 +75,16 @@ _Note that the API used below is experimental and is subject to change._
Based on the previous section, to add a delegate, we need to do the following:

1. Define a kernel node that is responsible for evaluating the delegate
subgraph
subgraph.
1. Create an instance of
[TfLiteDelegate](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/c/common.h#L611),
which is responsible for registering the kernel node and claiming the nodes
that the delegate can execute
that the delegate can execute.

To see it in code, let's define a delegate and call it "MyDelegate," which can
To see it in code, let's define a delegate and call it `MyDelegate`, which can
execute Conv2D and Mean operations faster.

```
```c++
// This is where the execution of the operations or whole graph happens.
// The class below has an empty implementation just as a guideline
// on the structure.
@ -113,9 +114,9 @@ class MyDelegate {
// the subgraph in the main TfLite graph.
TfLiteRegistration GetMyDelegateNodeRegistration() {
// This is the registration for the Delegate Node that gets added to
// the TFLite graph instead of the subGraph it replaces.
// It is treated as a an OP node. But in our case
// Init will initialize the delegate
// the TFLite graph instead of the subgraph it replaces.
// It is treated as an OP node. But in our case
// Init will initialize the delegate.
// Invoke will run the delegate graph.
// Prepare for preparing the delegate.
// Free for any cleaning needed by the delegate.
@ -232,6 +233,4 @@ if (interpreter->ModifyGraphWithDelegate(my_delegate) !=
...
// Don't forget to delete your delegate
delete my_delegate;


```

@ -18,7 +18,7 @@ Another benefit with GPU inference is its power efficiency. GPUs carry out the
computations in a very efficient and optimized manner, so that they consume less
power and generate less heat than when the same task is run on CPUs.

## Demo App Tutorials
## Demo app tutorials

The easiest way to try out the GPU delegate is to follow the below tutorials,
which go through building our classification demo applications with GPU support.
@ -35,7 +35,7 @@ Note: This requires OpenCL or OpenGL ES (3.1 or higher).

#### Step 1. Clone the TensorFlow source code and open it in Android Studio

```
```sh
git clone https://github.com/tensorflow/tensorflow
```

@ -87,7 +87,7 @@ target 'YourProjectName'
pod 'TensorFlowLiteGpuExperimental'
```

#### Step 3. Enable the GPU Delegate
#### Step 3. Enable the GPU delegate

To enable the code that will use the GPU delegate, you will need to change
`TFLITE_USE_GPU_DELEGATE` from 0 to 1 in `CameraExampleViewController.h`.
@ -100,8 +100,7 @@ To enable the code that will use the GPU delegate, you will need to change

After following the previous step, you should be able to run the app.


#### Step 5. Release mode.
#### Step 5. Release mode

While in Step 4 you ran in debug mode, to get better performance, you should
change to a release build with the appropriate optimal Metal settings. In
@ -111,19 +110,18 @@ Scheme...`. Select `Run`. On the `Info` tab, change `Build Configuration`, from

![setting debug mode](images/iosdebug.png)

Then
click the `Options` tab and change `GPU Frame Capture` to `Disabled` and
Then click the `Options` tab and change `GPU Frame Capture` to `Disabled` and
`Metal API Validation` to `Disabled`.

![setting metal options](images/iosmetal.png)

Lastly make sure Release only builds on 64-bit architecture. Under `Project
navigator -> tflite_camera_example -> PROJECT -> tflite_camera_example -> Build
Settings` set `Build Active Architecture Only > Release` to Yes.
Lastly make sure to select Release-only builds on 64-bit architecture. Under
`Project navigator -> tflite_camera_example -> PROJECT -> tflite_camera_example
-> Build Settings` set `Build Active Architecture Only > Release` to Yes.

![setting build architecture](images/iosrelease.png)

## Trying the GPU Delegate on your own model
## Trying the GPU delegate on your own model

### Android

@ -197,12 +195,12 @@ To see a full list of supported ops, please see the [advanced documentation](gpu
## Non-supported models and ops

If some of the ops are not supported by the GPU delegate, the framework will
only run a part of the graph on the GPU and the remaining part on the CPU. Due
only run a part of the graph on the GPU and the remaining part on the CPU. Due
to the high cost of CPU/GPU synchronization, a split execution mode like this
will often result in a performance slower than when the whole network is run on
the CPU alone. In this case, the user will get a warning like:
will often result in slower performance than when the whole network is run on
the CPU alone. In this case, the user will get a warning like:

```
```none
WARNING: op code #42 cannot be handled by this delegate.
```

@ -226,6 +224,6 @@ In that sense, if the camera hardware supports image frames in RGBA, feeding
that 4-channel input is significantly faster as a memory copy (from 3-channel
RGB to 4-channel RGBX) can be avoided.

For best performance, do not hesitate to retrain your classifier with a mobile-
optimized network architecture. That is a significant part of optimization for
on-device inference.
For best performance, do not hesitate to retrain your classifier with a
mobile-optimized network architecture. That is a significant part of
optimization for on-device inference.

@ -5,7 +5,7 @@ hardware accelerators. This document describes how to use the GPU backend using
the TensorFlow Lite delegate APIs on Android (requires OpenCL or OpenGL ES 3.1
and higher) and iOS (requires iOS 8 or later).

## Benefits of GPU Acceleration
## Benefits of GPU acceleration

### Speed

@ -24,13 +24,13 @@ GPUs do their computation with 16-bit or 32-bit floating point numbers and
decreased accuracy made quantization untenable for your models, running your
neural network on a GPU may eliminate this concern.

### Energy Efficiency
### Energy efficiency

Another benefit that comes with GPU inference is its power efficiency. A GPU
carries out computations in a very efficient and optimized way, consuming less
power and generating less heat than the same task run on a CPU.

## Supported Ops
## Supported ops

TensorFlow Lite on GPU supports the following ops in 16-bit and 32-bit float
precision:
@ -63,12 +63,12 @@ By default, all ops are only supported at version 1. Enabling the
[experimental quantization support](gpu_advanced.md#running-quantized-models-experimental-android-only)
allows the appropriate versions; for example, ADD v2.

## Basic Usage
## Basic usage

### Android (Java)

Run TensorFlow Lite on GPU with `TfLiteDelegate`. In Java, you can specify the
GpuDelegate through `Interpreter.Options`.
`GpuDelegate` through `Interpreter.Options`.

```java
// NEW: Prepare GPU delegate.
@ -167,7 +167,7 @@ then the developer must ensure that `Interpreter::Invoke()` is always called
from the same thread in which `Interpreter::ModifyGraphWithDelegate()` was
called.

## Advanced Usage
## Advanced usage

### Running quantized models (Experimental, Android only)

@ -32,9 +32,9 @@ path are also supported, for e.g.,
[these quantized versions](https://www.tensorflow.org/lite/guide/hosted_models#quantized_models)
on our Hosted Models page.

## Hexagon Delegate Java API
## Hexagon delegate Java API

```
```java
public class HexagonDelegate implements Delegate, Closeable {

/*
@ -96,7 +96,7 @@ will need to add the Hexagon shared libs to both 32 and 64-bit lib folders.

#### Step 3. Create a delegate and initialize a TensorFlow Lite Interpreter

```
```java
import org.tensorflow.lite.experimental.HexagonDelegate;

// Create the Delegate instance.
@ -116,9 +116,9 @@ if (hexagonDelegate != null) {
}
```

## Hexagon Delegate C API
## Hexagon delegate C API

```
```c
struct TfLiteHexagonDelegateOptions {
// This corresponds to the debug level in the Hexagon SDK. 0 (default)
// means no debug.
@ -161,7 +161,7 @@ Void TfLiteHexagonInit();
Void TfLiteHexagonTearDown();
```

### Example Usage
### Example usage

#### Step 1. Edit app/build.gradle to use the nightly Hexagon delegate AAR

@ -213,7 +213,7 @@ will need to add the Hexagon shared libs to both 32 and 64-bit lib folders.

* Create a delegate, example:

```
```c
#include "tensorflow/lite/experimental/delegates/hexagon/hexagon_delegate.h"

// Assuming shared libraries are under "/data/local/tmp/"

@ -5,8 +5,8 @@ optimizations can be applied to models so that they can be run within these
constraints. In addition, some optimizations allow the use of specialized
hardware for accelerated inference.

Tensorflow Lite and the
[Tensorflow Model Optimization Toolkit](https://www.tensorflow.org/model_optimization)
TensorFlow Lite and the
[TensorFlow Model Optimization Toolkit](https://www.tensorflow.org/model_optimization)
provide tools to minimize the complexity of optimizing inference.

It's recommended that you consider model optimization during your application
@ -79,9 +79,10 @@ with TensorFlow Lite.

### Quantization

Quantization works by reducing the precision of the numbers used to represent a
model's parameters, which by default are 32-bit floating point numbers. This
results in a smaller model size and faster computation.
[Quantization](https://www.tensorflow.org/model_optimization/guide/quantization/post_training)
works by reducing the precision of the numbers used to represent a model's
parameters, which by default are 32-bit floating point numbers. This results in
a smaller model size and faster computation.

The following types of quantization are available in TensorFlow Lite:

@ -145,7 +146,7 @@ For cases where the accuracy and latency targets are not met, or hardware
accelerator support is important,
[quantization-aware training](https://www.tensorflow.org/model_optimization/guide/quantization/training){:.external}
is the better option. See additional optimization techniques under the
[Tensorflow Model Optimization Toolkit](https://www.tensorflow.org/model_optimization).
[TensorFlow Model Optimization Toolkit](https://www.tensorflow.org/model_optimization).

If you want to further reduce your model size, you can try [pruning](#pruning)
prior to quantizing your models.

@ -16,9 +16,9 @@ This page describes how to use the NNAPI delegate with the TensorFlow Lite
Interpreter in Java and Kotlin. For Android C APIs, please refer to
[Android Native Developer Kit documentation](https://developer.android.com/ndk/guides/neuralnetworks).

## Trying the NNAPI Delegate on your own model
## Trying the NNAPI delegate on your own model

### Gradle Import
### Gradle import

The NNAPI delegate is part of the TensorFlow Lite Android interpreter, release
1.14.0 or higher. You can import it to your project by adding the following to
@ -69,7 +69,7 @@ if(null != nnApiDelegate) {
}
```

## Best Practices
## Best practices

### Test performance before deploying

@ -164,6 +164,6 @@ The following models are known to be compatible with NNAPI:
NNAPI acceleration is also not supported when the model contains
dynamically-sized outputs. In this case, you will get a warning like:

```
```none
ERROR: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors.
```

@ -2,9 +2,9 @@

Post-training quantization is a conversion technique that can reduce model size
while also improving CPU and hardware accelerator latency, with little
degradation in model accuracy. You can perform these techniques using an
already-trained float TensorFlow model when you convert it to TensorFlow Lite
format using the [TensorFlow Lite Converter](../convert/).
degradation in model accuracy. You can quantize an already-trained float
TensorFlow model when you convert it to TensorFlow Lite format using the
[TensorFlow Lite Converter](../convert/).

Note: The procedures on this page require TensorFlow 1.15 or higher.

@ -22,8 +22,8 @@ summary table of the choices and the benefits they provide:
| Float16 quantization | 2x smaller, potential GPU | CPU, GPU |
: : acceleration : :

This decision tree can help determine which post-training quantization method is
best for your use case:
The following decision tree can help determine which post-training quantization
method is best for your use case:

![post-training optimization options](images/optimization.jpg)

@ -47,9 +47,9 @@ To further improve latency, "dynamic-range" operators dynamically quantize
activations based on their range to 8-bits and perform computations with 8-bit
weights and activations. This optimization provides latencies close to fully
fixed-point inference. However, the outputs are still stored using floating
point, so that the speedup with dynamic-range ops is less than a full
fixed-point computation. Dynamic-range ops are available for the most
compute-intensive operators in a network:
point so that the speedup with dynamic-range ops is less than a full fixed-point
computation. Dynamic-range ops are available for the most compute-intensive
operators in a network:

* `tf.keras.layers.Dense`
* `tf.keras.layers.Conv2D`
@ -62,12 +62,12 @@ compute-intensive operators in a network:
### Full integer quantization

You can get further latency improvements, reductions in peak memory usage, and
access to integer only hardware devices or accelerators by making sure all model
math is integer quantized.
compatibility with integer only hardware devices or accelerators by making sure
all model math is integer quantized.

To do this, you need to measure the dynamic range of activations and inputs by
supplying sample input data to the converter. Refer to the
`representative_dataset_gen()` function used in the following code.
For full integer quantization, you need to measure the dynamic range of
activations and inputs by supplying sample input data to the converter. Refer to
the `representative_dataset_gen()` function used in the following code.

#### Integer with float fallback (using default float input/output)

@ -87,14 +87,14 @@ converter.representative_dataset = representative_dataset_gen</b>
tflite_quant_model = converter.convert()
</pre>

Note: This won't be compatible with integer only devices (such as 8-bit
microcontrollers) and accelerators (such as the Coral Edge TPU). For convenience
during inference, the input and output still remain float in order to have the
same interface as the original float only model.
Note: This `tflite_quant_model` won't be compatible with integer only devices
(such as 8-bit microcontrollers) and accelerators (such as the Coral Edge TPU)
because the input and output still remain float in order to have the same
interface as the original float only model.

#### Integer only

*This is a common use case for
*Creating integer only models is a common use case for
[TensorFlow Lite for Microcontrollers](https://www.tensorflow.org/lite/microcontrollers)
and [Coral Edge TPUs](https://coral.ai/).*

@ -135,18 +135,18 @@ converter.target_spec.supported_types = [tf.lite.constants.FLOAT16]</b>
tflite_quant_model = converter.convert()
</pre>

The advantages of this quantization are as follows:
The advantages of float16 quantization are as follows:

* Reduce model size by up to half (since all weights are now half the original
size).
* Minimal loss in accuracy.
* Supports some delegates (e.g. the GPU delegate) can operate directly on
float16 data, which results in faster execution than float32 computations.
* It reduces model size by up to half (since all weights become half of their
original size).
* It causes minimal loss in accuracy.
* It supports some delegates (e.g. the GPU delegate) which can operate
directly on float16 data, resulting in faster execution than float32
computations.

The disadvantages of this quantization are as follows:
The disadvantages of float16 quantization are as follows:

* Not a good choice for maximum performance (a quantization to fixed point
math would be better in that case).
* It does not reduce latency as much as a quantization to fixed point math.
* By default, a float16 quantized model will "dequantize" the weights values
to float32 when run on the CPU. (Note that the GPU delegate will not perform
this dequantization, since it can operate on float16 data.)

@ -27,7 +27,7 @@ values in the range `[-128, 127]`, with a zero-point in range `[-128, 127]`.

There are other exceptions for particular operations that are documented below.

Note: In the past our quantized tooling used per-tensor, asymmetric, `uint8`
Note: In the past our quantization tooling used per-tensor, asymmetric, `uint8`
quantization. New tooling, reference kernels, and optimized kernels for 8-bit
quantization will use this spec.

@ -46,19 +46,19 @@ entire tensor. Per-axis quantization means that there will be one scale and/or
specifies the dimension of the Tensor's shape that the scales and zero-points
correspond to. For example, a tensor `t`, with `dims=[4, 3, 2, 1]` with
quantization params: `scale=[1.0, 2.0, 3.0]`, `zero_point=[1, 2, 3]`,
`quantization_dimension=1` will be quantized across the second dimension of t:
`quantization_dimension=1` will be quantized across the second dimension of `t`:

t[:, 0, :, :] will have scale[0]=1.0, zero_point[0]=1
t[:, 1, :, :] will have scale[1]=2.0, zero_point[1]=2
t[:, 2, :, :] will have scale[2]=3.0, zero_point[2]=3

Often, the quantized_dimension is the output_channel of the weights of
Often, the `quantized_dimension` is the `output_channel` of the weights of
convolutions, but in theory it can be the dimension that corresponds to each
dot-product in the kernel implementation, allowing more quantization granularity
without performance implications. This can give large improvements in accuracy.

TFLite has per-axis support for a growing number of operations. At the time of
this document support exists for Conv2d and DepthwiseConv2d.
this document, support exists for Conv2d and DepthwiseConv2d.
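
For readers working against the C API, the per-axis fields map onto
`TfLiteAffineQuantization` roughly as in the following sketch (the struct and
helper names are taken from `lite/c/common.h`; the allocation pattern is
illustrative only):

```c++
#include <cstdlib>

#include "tensorflow/lite/c/common.h"

// Sketch only: per-axis parameters for the tensor `t` described above
// (scale = [1.0, 2.0, 3.0], zero_point = [1, 2, 3], quantized_dimension = 1).
TfLiteAffineQuantization* MakeExampleQuantization() {
  auto* q = static_cast<TfLiteAffineQuantization*>(
      std::malloc(sizeof(TfLiteAffineQuantization)));
  q->scale = TfLiteFloatArrayCreate(3);
  q->zero_point = TfLiteIntArrayCreate(3);
  const float scales[] = {1.0f, 2.0f, 3.0f};
  const int zero_points[] = {1, 2, 3};
  for (int i = 0; i < 3; ++i) {
    q->scale->data[i] = scales[i];
    q->zero_point->data[i] = zero_points[i];
  }
  q->quantized_dimension = 1;  // The second dimension of t.
  return q;
}
```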

## Symmetric vs asymmetric

@ -69,7 +69,7 @@ binary bit of precision. Since activations are only multiplied by constant
weights, the constant zero-point value can be optimized pretty heavily.

Weights are symmetric: forced to have zero-point equal to 0. Weight values are
multiplied by dynamic input and activation values. This means that there is a
multiplied by dynamic input and activation values. This means that there is an
unavoidable runtime cost of multiplying the zero-point of the weight with the
activation value. By enforcing that zero-point is 0 we can avoid this cost.
