Inference Diff tool

NOTE: This is an experimental tool to analyze TensorFlow Lite behavior on delegates.

For a given model, this binary compares TensorFlow Lite execution (in terms of latency & output-value deviation) in two settings:

  • Single-threaded CPU Inference
  • User-defined Inference

To do so, the tool generates random Gaussian data and passes it through two TFLite Interpreters: one running single-threaded CPU kernels, and the other parameterized by the user's arguments.

It measures the latency of both, as well as the absolute difference between the output tensors from each Interpreter, on a per-element basis.

The final output (logged to stdout) typically looks like this:

Num evaluation runs: 50
Reference run latency: avg=84364.2(us), std_dev=12525(us)
Test run latency: avg=7281.64(us), std_dev=2089(us)
OutputDiff[0]: avg_error=1.96277e-05, std_dev=6.95767e-06

There is one instance of OutputDiff for each output tensor in the model, and the statistics in OutputDiff[i] correspond to the absolute difference in raw values across all elements for the ith output.
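
To illustrate the comparison, here is a minimal Python sketch of the same idea using the TensorFlow Lite Python API. This is not how the binary is implemented (the binary is C++, and its test Interpreter is driven by the user's arguments); the 4-thread test interpreter, the model filename, and the float32 input below are assumptions for illustration:

import numpy as np
import tensorflow as tf

MODEL = "mobilenet_v1_1.0_224.tflite"  # assumed model file

# Reference setting: single-threaded CPU. Test setting: approximated here
# with 4 CPU threads (a stand-in for the user's arguments).
ref = tf.lite.Interpreter(model_path=MODEL, num_threads=1)
test = tf.lite.Interpreter(model_path=MODEL, num_threads=4)
for interp in (ref, test):
    interp.allocate_tensors()

# Random Gaussian input, as the tool generates (float32 input assumed).
shape = ref.get_input_details()[0]["shape"]
data = np.random.normal(size=shape).astype(np.float32)

outputs = []
for interp in (ref, test):
    interp.set_tensor(interp.get_input_details()[0]["index"], data)
    interp.invoke()
    outputs.append(interp.get_tensor(interp.get_output_details()[0]["index"]))

# Per-element absolute difference for the first output, as in OutputDiff[0].
diff = np.abs(outputs[0] - outputs[1])
print("OutputDiff[0]: avg_error=%g, std_dev=%g" % (diff.mean(), diff.std()))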

Parameters

(In this section, 'test Interpreter' refers to the User-defined Inference setting mentioned above. The reference setting is always single-threaded CPU.)

The binary takes the following parameters:

  • model_file : string
    Path to the TFLite model file.

and the following optional parameters:

  • num_runs: int
    How many runs to perform when comparing execution in the reference and test settings. Default: 50. The binary performs 3 invocations per 'run' to get more accurate latency numbers (see the sketch after this parameter list).

  • num_interpreter_threads: int (default=1)
    This modifies the number of threads used by the test Interpreter for inference.

  • delegate: string
    If provided, tries to use the specified delegate on the test Interpreter. Valid values: "nnapi", "gpu", "hexagon".

    NOTE: For the Hexagon delegate, please refer to the Hexagon delegate documentation for setup instructions. The tool assumes that libhexagon_interface.so and the Qualcomm libraries are present in /data/local/tmp.

  • output_file_path: string
    The final metrics are dumped into output_file_path as a serialized instance of tflite::evaluation::EvaluationStageMetrics.
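
For intuition on the num_runs accounting described above, here is a rough Python sketch of one plausible measurement loop. How the actual binary aggregates the 3 invocations within a run is not specified here, so the per-run averaging below is an assumption:

import time
import numpy as np

def measure_latency(interpreter, num_runs=50, invocations_per_run=3):
    """Returns (avg, std_dev) of per-run latency in microseconds."""
    per_run_us = []
    for _ in range(num_runs):
        start = time.perf_counter()
        for _ in range(invocations_per_run):
            interpreter.invoke()
        elapsed = time.perf_counter() - start
        # Assumption: the invocations within a run are averaged into a
        # single latency sample.
        per_run_us.append(elapsed / invocations_per_run * 1e6)
    return np.mean(per_run_us), np.std(per_run_us)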

This binary also supports runtime/delegate arguments introduced by the delegate registrar. If there is any conflict (for example, num_threads vs num_interpreter_threads here), the parameters of this binary take precedence.

Running the binary on Android

(1) Build using the following command:

bazel build -c opt \
  --config=android_arm64 \
  //tensorflow/lite/tools/evaluation/tasks/inference_diff:run_eval

(2) Connect your phone and push the binary to it with adb push (create the destination directory first if required):

adb push bazel-bin/tensorflow/lite/tools/evaluation/tasks/inference_diff/run_eval /data/local/tmp

(3) Push the TFLite model that you need to test. For example:

adb push mobilenet_v1_1.0_224.tflite /data/local/tmp

(4) Run the binary.

adb shell /data/local/tmp/run_eval \
  --model_file=/data/local/tmp/mobilenet_v1_1.0_224.tflite \
  --output_file_path=/data/local/tmp/inference_diff.txt \
  --delegate=gpu

(5) Pull the results.

adb pull /data/local/tmp/inference_diff.txt ~/accuracy_tool
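
The pulled file contains the serialized tflite::evaluation::EvaluationStageMetrics message mentioned above. Here is a minimal sketch for inspecting it on the host, assuming Python bindings generated with protoc from tensorflow/lite/tools/evaluation/proto/evaluation_config.proto (the import path below follows protoc's output convention but is otherwise an assumption):

from tensorflow.lite.tools.evaluation.proto import evaluation_config_pb2

metrics = evaluation_config_pb2.EvaluationStageMetrics()
with open("inference_diff.txt", "rb") as f:  # path to the pulled file
    metrics.ParseFromString(f.read())
# Prints a human-readable text dump of the latency and OutputDiff metrics.
print(metrics)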