Document TFLite delegate parameters supported by the TFLite delegate registrar in lite/tools/delegates, and update the README of each affected tool (the evaluation tools and the benchmark tool) to mention these supported parameters.
PiperOrigin-RevId: 307791203
Change-Id: If23c11cb9c80e0037c38f070e0ad23fe6cff690e
@@ -34,32 +34,6 @@ and the following optional parameters:
 * `run_delay`: `float` (default=-1.0) \
   The delay in seconds between subsequent benchmark runs. Non-positive values
   mean use no delay.
-* `use_xnnpack`: `bool` (default=false) \
-  Whether to use the XNNPack delegate.
-* `use_hexagon`: `bool` (default=false) \
-  Whether to use the Hexagon delegate. Not all devices may support the Hexagon
-  delegate, refer to the TensorFlow Lite documentation for more information
-  about which devices/chipsets are supported and about how to get the required
-  libraries. To use the Hexagon delegate also build the
-  hexagon_nn:libhexagon_interface.so target and copy the library to the
-  device. All libraries should be copied to /data/local/tmp on the device.
-* `use_nnapi`: `bool` (default=false) \
-  Whether to use
-  [Android NNAPI](https://developer.android.com/ndk/guides/neuralnetworks/).
-  This API is available on recent Android devices. Note that some Android P
-  devices will fail to use NNAPI for models in `/data/local/tmp/` and this
-  benchmark tool will not correctly use NNAPI. When on Android Q+, will also
-  print the names of NNAPI accelerators accessible through the
-  `nnapi_accelerator_name` flag.
-* `nnapi_accelerator_name`: `str` (default="") \
-  The name of the NNAPI accelerator to use (requires Android Q+). If left
-  blank, NNAPI will automatically select which of the available accelerators
-  to use.
-* `nnapi_execution_preference`: `string` (default="") \
-  Which
-  [NNAPI execution preference](https://developer.android.com/ndk/reference/group/neural-networks.html#group___neural_networks_1gga034380829226e2d980b2a7e63c992f18af727c25f1e2d8dcc693c477aef4ea5f5)
-  to use when executing using NNAPI. Should be one of the following:
-  fast_single_answer, sustained_speed, low_power, undefined.
 * `use_legacy_nnapi`: `bool` (default=false) \
   Whether to use the legacy
   [Android NNAPI](https://developer.android.com/ndk/guides/neuralnetworks/)
@@ -67,39 +41,6 @@ and the following optional parameters:
   This is available on recent Android devices. Note that some Android P
   devices will fail to use NNAPI for models in `/data/local/tmp/` and this
   benchmark tool will not correctly use NNAPI.
-* `max_delegated_partitions`: `int` (default=0, i.e. no limit) \
-  The maximum number of partitions that will be delegated. \
-  Currently supported by the Hexagon delegate or the NNAPI delegate but won't
-  work if `use_legacy_nnapi` has been selected.
-* `min_nodes_per_partition`: `int` (default=0, i.e. default choice implemented
-  by each delegate) \
-  The minimal number of TFLite graph nodes of a partition that needs to be
-  reached to be delegated. A negative value or 0 means to use the default
-  choice of each delegate. \
-  This option is currently only supported by the Hexagon delegate.
-* `disable_nnapi_cpu`: `bool` (default=false) \
-  Excludes the
-  [NNAPI CPU reference implementation](https://developer.android.com/ndk/guides/neuralnetworks#device-assignment)
-  from the possible devices to be used by NNAPI to execute the model. This
-  option is ignored if `nnapi_accelerator_name` is specified.
-* `use_gpu`: `bool` (default=false) \
-  Whether to use the
-  [GPU accelerator delegate](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/delegates/gpu).
-  This option is currently only available on Android and iOS devices.
-* `gpu_precision_loss_allowed`: `bool` (default=true) \
-  Whether to allow the GPU delegate to carry out computation with some
-  precision loss (i.e. processing in FP16) or not. If allowed, the performance
-  will increase.
-* `gpu_experimental_enable_quant`: `bool` (default=true) \
-  Whether to allow the GPU delegate to run a quantized model or not. This
-  option is currently only available on Android.
-* `gpu_wait_type`: `str` (default="") \
-  Which GPU wait_type option to use, when using GPU delegate on iOS. Should be
-  one of the following: passive, active, do_not_wait, aggressive. When left
-  blank, passive mode is used by default.
-* `use_coreml`: `bool` (default=false) \
-  Whether to use the [Core ML delegate](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/delegates/coreml).
-  This option is only available in iOS.
 * `enable_op_profiling`: `bool` (default=false) \
   Whether to enable per-operator profiling measurement.
 * `enable_platform_tracing`: `bool` (default=false) \
@@ -107,16 +48,50 @@ and the following optional parameters:
   'enable_op_profiling'. Note that the platform-wide tracing might not work if
   the tool runs as a commandline native binary. For example, on Android, the
   ATrace-based tracing only works when the tool is launched as an APK.
-* `hexagon_profiling`: `bool` (default=false) \
-  Whether to profile ops running on hexagon. Needs to be combined with
-  `enable_op_profiling`. When this is set to true the profile of ops on
-  hexagon DSP will be added to the profile table. Note that, the reported data
-  on hexagon is in cycles, not in ms like on cpu.
 * `external_delegate_path`: `string` (default="") \
   Path to the external delegate library to use.
 * `external_delegate_options`: `string` (default="") \
   A list of options to be passed to the external delegate library. Options
   should be in the format of `option1:value1;option2:value2;optionN:valueN`
+
+### TFLite delegate parameters
+The tool supports all runtime/delegate parameters introduced by
+[the delegate registrar](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/delegates).
+The following simply lists their names, with additional notes where applicable:
+
+#### Common parameters
+* `max_delegated_partitions`: `int` (default=0) \
+  Note that when `use_legacy_nnapi` is selected, this parameter won't work.
+* `min_nodes_per_partition`: `int` (default=0)
+
+#### GPU delegate provider
+* `use_gpu`: `bool` (default=false)
+* `gpu_precision_loss_allowed`: `bool` (default=true)
+* `gpu_experimental_enable_quant`: `bool` (default=true)
+* `gpu_backend`: `string` (default="")
+* `gpu_wait_type`: `string` (default="")
+
+#### NNAPI delegate provider
+* `use_nnapi`: `bool` (default=false) \
+  Note that some Android P devices will fail to use NNAPI for models in
+  `/data/local/tmp/` and this benchmark tool will not correctly use NNAPI.
+* `nnapi_accelerator_name`: `string` (default="")
+* `disable_nnapi_cpu`: `bool` (default=false)
+
+#### Hexagon delegate provider
+* `use_hexagon`: `bool` (default=false)
+* `hexagon_profiling`: `bool` (default=false) \
+  Note that enabling this option will not produce profiling results unless
+  `enable_op_profiling` is also turned on. When both parameters are set to
+  true, the profile of ops on the Hexagon DSP will be added to the profile
+  table. Note that the reported data on Hexagon is in cycles, not in ms as on
+  the CPU.
+
+#### XNNPACK delegate provider
+* `use_xnnpack`: `bool` (default=false)
+
+#### CoreML delegate provider
+* `use_coreml`: `bool` (default=false)
+
+#### External delegate provider
+* `external_delegate_path`: `string` (default="")
+* `external_delegate_options`: `string` (default="")
+
 ## To build/install/run
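For illustration, the delegate parameters listed above are ordinary command-line flags of the benchmark tool; a minimal sketch follows (the model path and flag values are made up):

```
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/mobilenet_v1_1.0_224.tflite \
  --use_xnnpack=true \
  --num_threads=4 \
  --enable_op_profiling=true
```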
tensorflow/lite/tools/delegates/README.md (new file, 104 lines)
@@ -0,0 +1,104 @@
+# TFLite Delegate Utilities for Tooling
+
+## TFLite Delegate Registrar
+[A TFLite delegate registrar](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/delegates/delegate_provider.h)
+is provided here. The registrar keeps a list of TFLite delegate providers, each
+of which defines a list of parameters that can be initialized from command-line
+arguments and creates a TFLite delegate instance based on those parameters.
+This delegate registrar has been used in TFLite evaluation tools and the
+benchmark model tool.
+
+A particular TFLite delegate provider can be used by linking the corresponding
+library, e.g. adding it to the `deps` of a BUILD rule. Note that each delegate
+provider library has been configured with `alwayslink=1` in the BUILD rule so
+that it will be linked to any binary that directly or indirectly depends on it.
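As a concrete sketch of this linking behavior: the benchmark tool's BUILD rule already lists the providers in its `deps`, so simply building it pulls every registered provider into the binary. The standard Android build command from the benchmark README applies:

```
bazel build -c opt --config=android_arm64 \
  tensorflow/lite/tools/benchmark:benchmark_model
```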
+
+The following lists all implemented TFLite delegate providers and the
+parameters that each supports for creating a particular TFLite delegate.
+
+### Common parameters
+* `num_threads`: `int` (default=1) \
+  The number of threads to use for running the inference on CPU.
+* `max_delegated_partitions`: `int` (default=0, i.e. no limit) \
+  The maximum number of partitions that will be delegated. \
+  Currently supported by the GPU, Hexagon, CoreML and NNAPI delegates.
+* `min_nodes_per_partition`: `int` (default=delegate's own choice) \
+  The minimal number of TFLite graph nodes of a partition that needs to be
+  reached to be delegated. A negative value or 0 means to use the default
+  choice of each delegate. \
+  This option is currently supported by the Hexagon and CoreML delegates.
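For example, a hypothetical benchmark run that caps delegation (the values below are arbitrary, not recommendations):

```
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/model.tflite \
  --use_hexagon=true \
  --max_delegated_partitions=2 \
  --min_nodes_per_partition=3
```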
+
+### GPU delegate provider
+* `use_gpu`: `bool` (default=false) \
+  Whether to use the
+  [GPU accelerator delegate](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/delegates/gpu).
+  This option is currently only available on Android and iOS devices.
+* `gpu_precision_loss_allowed`: `bool` (default=true) \
+  Whether to allow the GPU delegate to carry out computation with some
+  precision loss (i.e. processing in FP16) or not. If allowed, the performance
+  will increase.
+* `gpu_experimental_enable_quant`: `bool` (default=true) \
+  Whether to allow the GPU delegate to run a quantized model or not. \
+  This option is currently only available on Android.
+* `gpu_backend`: `string` (default="") \
+  Force the GPU delegate to use a particular backend for execution, and fail
+  if unsuccessful. Should be one of: cl, gl. By default, the GPU delegate will
+  try OpenCL first and then OpenGL if the former fails. \
+  Note that this option is only available on Android.
+* `gpu_wait_type`: `string` (default="") \
+  Which GPU wait_type option to use, when using GPU delegate on iOS. Should be
+  one of the following: passive, active, do_not_wait, aggressive. When left
+  blank, passive mode is used by default.
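A sketch of the GPU options in use on Android (model path illustrative):

```
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/model.tflite \
  --use_gpu=true \
  --gpu_backend=cl \
  --gpu_precision_loss_allowed=true
```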
+
+### NNAPI delegate provider
+* `use_nnapi`: `bool` (default=false) \
+  Whether to use
+  [Android NNAPI](https://developer.android.com/ndk/guides/neuralnetworks/).
+  This API is available on recent Android devices. When on Android Q+, it will
+  also print the names of NNAPI accelerators accessible through the
+  `nnapi_accelerator_name` flag.
+* `nnapi_accelerator_name`: `string` (default="") \
+  The name of the NNAPI accelerator to use (requires Android Q+). If left
+  blank, NNAPI will automatically select which of the available accelerators
+  to use.
+* `nnapi_execution_preference`: `string` (default="") \
+  Which
+  [NNAPI execution preference](https://developer.android.com/ndk/reference/group/neural-networks.html#group___neural_networks_1gga034380829226e2d980b2a7e63c992f18af727c25f1e2d8dcc693c477aef4ea5f5)
+  to use when executing using NNAPI. Should be one of the following:
+  fast_single_answer, sustained_speed, low_power, undefined.
+* `disable_nnapi_cpu`: `bool` (default=false) \
+  Excludes the
+  [NNAPI CPU reference implementation](https://developer.android.com/ndk/guides/neuralnetworks#device-assignment)
+  from the possible devices to be used by NNAPI to execute the model. This
+  option is ignored if `nnapi_accelerator_name` is specified.
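A sketch of the NNAPI options together (the accelerator name below is a made-up example; real names are device-specific and are printed by the tool on Android Q+):

```
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/model.tflite \
  --use_nnapi=true \
  --nnapi_execution_preference=sustained_speed \
  --nnapi_accelerator_name=example-accelerator
```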
+
+### Hexagon delegate provider
+* `use_hexagon`: `bool` (default=false) \
+  Whether to use the Hexagon delegate. Not all devices may support the Hexagon
+  delegate; refer to the
+  [TensorFlow Lite documentation](https://www.tensorflow.org/lite/performance/hexagon_delegate)
+  for more information about which devices/chipsets are supported and about
+  how to get the required libraries. To use the Hexagon delegate, also build
+  the hexagon_nn:libhexagon_interface.so target and copy the library to the
+  device. All libraries should be copied to /data/local/tmp on the device.
+* `hexagon_profiling`: `bool` (default=false) \
+  Whether to profile ops running on Hexagon.
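A sketch of the build-and-copy step described above. The Bazel package containing the hexagon_nn target varies across versions, so the paths below are an assumption; see the linked guide for the exact target:

```
bazel build -c opt --config=android_arm64 \
  tensorflow/lite/delegates/hexagon/hexagon_nn:libhexagon_interface.so
adb push \
  bazel-bin/tensorflow/lite/delegates/hexagon/hexagon_nn/libhexagon_interface.so \
  /data/local/tmp
```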
+
+### XNNPACK delegate provider
+* `use_xnnpack`: `bool` (default=false) \
+  Whether to use the XNNPack delegate.
+
+### CoreML delegate provider
+* `use_coreml`: `bool` (default=false) \
+  Whether to use the
+  [Core ML delegate](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/delegates/coreml).
+  This option is only available on iOS.
+
+### External delegate provider
+* `external_delegate_path`: `string` (default="") \
+  Path to the external delegate library to use.
+* `external_delegate_options`: `string` (default="") \
+  A list of options to be passed to the external delegate library. Options
+  should be in the format of `option1:value1;option2:value2;optionN:valueN`
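A sketch of loading a hypothetical external delegate library, passing two made-up options in the documented format:

```
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/model.tflite \
  --external_delegate_path=/data/local/tmp/libsample_external_delegate.so \
  --external_delegate_options="option1:value1;option2:value2"
```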
@@ -83,9 +83,11 @@ The following optional parameters can be used to modify the inference runtime:
   assumes that `libhexagon_interface.so` and Qualcomm libraries lie in
   `/data/local/tmp`.
 
-This script also supports all applicable runtime/delegate arguments supported on
-the `benchmark_model` tool. If there is any conflict (for example, `num_threads`
-in `benchmark_model` vs `num_interpreter_threads` here), the parameters of this
+This script also supports runtime/delegate arguments introduced by the
+[delegate registrar](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/delegates).
+If there is any conflict (for example, `num_threads` vs
+`num_interpreter_threads` here), the parameters of this
 script are given precedence.
 
 ### Debug Mode
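To illustrate the precedence rule, a sketch with a placeholder evaluation binary; the flag names other than `num_interpreter_threads` come from the delegate registrar, and `--model_file` is assumed to be the script's model flag:

```
# <eval_binary> stands for the evaluation tool built from the relevant
# directory; on conflict, its own --num_interpreter_threads wins.
adb shell /data/local/tmp/<eval_binary> \
  --model_file=/data/local/tmp/model.tflite \
  --num_interpreter_threads=2 \
  --use_xnnpack=true
```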
@@ -91,9 +91,11 @@ The following optional parameters can be used to modify the inference runtime:
   assumes that `libhexagon_interface.so` and Qualcomm libraries lie in
   `/data/local/tmp`.
 
-This script also supports all applicable runtime/delegate arguments supported on
-the `benchmark_model` tool. If there is any conflict (for example, `num_threads`
-in `benchmark_model` vs `num_interpreter_threads` here), the parameters of this
+This script also supports runtime/delegate arguments introduced by the
+[delegate registrar](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/delegates).
+If there is any conflict (for example, `num_threads` vs
+`num_interpreter_threads` here), the parameters of this
 script are given precedence.
 
 ## Downloading ILSVRC
@@ -64,9 +64,11 @@ and the following optional parameters:
   The final metrics are dumped into `output_file_path` as a serialized
   instance of `tflite::evaluation::EvaluationStageMetrics`
 
-This script also supports all applicable runtime/delegate arguments supported on
-the `benchmark_model` tool. If there is any conflict (for example, `num_threads`
-in `benchmark_model` vs `num_interpreter_threads` here), the parameters of this
+This script also supports runtime/delegate arguments introduced by the
+[delegate registrar](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/delegates).
+If there is any conflict (for example, `num_threads` vs
+`num_interpreter_threads` here), the parameters of this
 script are given precedence.
 
 ## Running the binary on Android