# Inference Diff tool
NOTE: This is an experimental tool to analyze TensorFlow Lite behavior on delegates.
For a given model, this binary compares TensorFlow Lite execution (in terms of latency & output-value deviation) in two settings:
*   Single-threaded CPU Inference
*   User-defined Inference
To do so, the tool generates random Gaussian data and passes it through two TFLite Interpreters: one running single-threaded CPU kernels, the other parameterized by the user's arguments.
It measures the latency of both, as well as the absolute difference between the output tensors from each Interpreter, on a per-element basis.
The final output (logged to stdout) typically looks like this:
```
Num evaluation runs: 50
Reference run latency: avg=84364.2(us), std_dev=12525(us)
Test run latency: avg=7281.64(us), std_dev=2089(us)
OutputDiff[0]: avg_error=1.96277e-05, std_dev=6.95767e-06
```
There is one instance of `OutputDiff` for each output tensor in the model, and
the statistics in `OutputDiff[i]` correspond to the absolute difference in raw
values across all elements for the ith output.
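To make the methodology concrete, here is a minimal Python sketch of the same comparison using the TFLite Python API. The binary itself is a C++ tool; the model path, thread count, and float input type below are assumptions for illustration, and a single invocation stands in for the tool's repeated runs:

```python
import numpy as np
import tensorflow as tf

MODEL_PATH = "mobilenet_v1_1.0_224.tflite"  # hypothetical model path

# Reference setting: single-threaded CPU. Test setting: here, 4 CPU threads
# stand in for the user-defined configuration.
ref = tf.lite.Interpreter(model_path=MODEL_PATH, num_threads=1)
test = tf.lite.Interpreter(model_path=MODEL_PATH, num_threads=4)
for interp in (ref, test):
    interp.allocate_tensors()

# Random Gaussian input, as the tool generates (assumes a float input tensor).
input_detail = ref.get_input_details()[0]
data = np.random.randn(*input_detail["shape"]).astype(np.float32)

for interp in (ref, test):
    interp.set_tensor(interp.get_input_details()[0]["index"], data)
    interp.invoke()

# Per-element absolute difference between corresponding output tensors.
for i, out in enumerate(ref.get_output_details()):
    diff = np.abs(ref.get_tensor(out["index"]).astype(np.float64) -
                  test.get_tensor(out["index"]).astype(np.float64))
    print(f"OutputDiff[{i}]: avg_error={diff.mean()}, std_dev={diff.std()}")
```

The real tool repeats this over `num_runs` runs and reports latency statistics for both Interpreters alongside the per-element diffs.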
## Parameters
(In this section, 'test Interpreter' refers to the User-defined Inference setting mentioned above; the reference setting is always single-threaded CPU.)
The binary takes the following parameters:

*   `model_file`: `string` \
    Path to the TFLite model file.

and the following optional parameters:

*   `num_runs`: `int` (default=50) \
    How many runs to perform to compare execution in the reference and test
    settings. The binary performs 3 invocations per 'run' to get more accurate
    latency numbers.

*   `num_interpreter_threads`: `int` (default=1) \
    This modifies the number of threads used by the test Interpreter for
    inference.

*   `delegate`: `string` \
    If provided, tries to use the specified delegate on the test Interpreter.
    Valid values: "nnapi", "gpu", "hexagon". NOTE: Please refer to the Hexagon
    delegate documentation for instructions on how to set it up for the Hexagon
    delegate. The tool assumes that `libhexagon_interface.so` and the Qualcomm
    libraries lie in `/data/local/tmp`.

*   `output_file_path`: `string` \
    The final metrics are dumped into `output_file_path` as a serialized
    instance of `tflite::evaluation::EvaluationStageMetrics` (see the parsing
    sketch at the end of this section).
This script also supports runtime/delegate arguments introduced by the delegate
registrar. If there is any conflict (for example, `num_threads` vs
`num_interpreter_threads` here), the parameters of this script take precedence.
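As a rough sketch, the dumped metrics can be read back in Python with standard protobuf parsing, assuming the evaluation protos under `tensorflow/lite/tools/evaluation/proto` have been compiled to Python modules; the module path and file name below are assumptions:

```python
# ASSUMPTION: evaluation_config_pb2 has been generated from the protos in
# tensorflow/lite/tools/evaluation/proto in your TensorFlow checkout.
from tensorflow.lite.tools.evaluation.proto import evaluation_config_pb2

# Example path; use whatever was passed to --output_file_path.
with open("inference_diff.txt", "rb") as f:
    metrics = evaluation_config_pb2.EvaluationStageMetrics()
    metrics.ParseFromString(f.read())

print(metrics)  # dumps num_runs and the latency/diff metrics in text form
```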
## Running the binary on Android
(1) Build using the following command:
```
bazel build -c opt \
  --config=android_arm64 \
  //tensorflow/lite/tools/evaluation/tasks/inference_diff:run_eval
```
(2) Connect your phone. Push the binary to your phone with adb push (make the directory if required):
```
adb push bazel-bin/tensorflow/lite/tools/evaluation/tasks/inference_diff/run_eval /data/local/tmp
```
(3) Push the TFLite model that you need to test. For example:
```
adb push mobilenet_v1_1.0_224.tflite /data/local/tmp
```
(4) Run the binary.

```
adb shell /data/local/tmp/run_eval \
  --model_file=/data/local/tmp/mobilenet_v1_1.0_224.tflite \
  --delegate=gpu \
  --output_file_path=/data/local/tmp/inference_diff.txt
```
(5) Pull the results.
```
adb pull /data/local/tmp/inference_diff.txt ~/accuracy_tool
```