XNNPACK backend for TensorFlow Lite

XNNPACK is a highly optimized library of floating-point neural network inference operators for ARM, x86, and WebAssembly architectures in Android, iOS, Windows, Linux, macOS, and Emscripten environments. This document describes how to use the XNNPACK library as an inference engine for TensorFlow Lite.

Using XNNPACK engine with TensorFlow Lite interpreter

XNNPACK integrates with the TensorFlow Lite interpreter through the delegation mechanism. There are three ways to enable the XNNPACK engine in TensorFlow Lite.

Enable XNNPACK via Bazel build flags (recommended)

When building TensorFlow Lite with Bazel, add --define tflite_with_xnnpack=true, and the TensorFlow Lite interpreter will use the XNNPACK engine by default.

The exact command depends on the target platform, e.g. for Android AAR you'd use

bazel build -c opt --fat_apk_cpu=x86,x86_64,arm64-v8a,armeabi-v7a \
  --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
  --define tflite_with_xnnpack=true \
  //tensorflow/lite/java:tensorflow-lite

Enable XNNPACK via additional dependency

Another way to enable XNNPACK is to build and link the //tensorflow/lite:tflite_with_xnnpack target into your application alongside the TensorFlow Lite framework.

This method works on platforms that support POSIX-style weak symbols (Android, iOS, Linux, macOS, but NOT Windows).

Enable XNNPACK via low-level delegate API (not recommended)

While it is possible to use the low-level delegate API to enable XNNPACK, this method is NOT RECOMMENDED unless you need to use TensorFlow Lite both with and without XNNPACK (e.g. for benchmarking).

With the low-level delegate API, users create an XNNPACK delegate with the TfLiteXNNPackDelegateCreate function, then call Interpreter::ModifyGraphWithDelegate to delegate the supported parts of the model to the XNNPACK delegate. Users must destroy the delegate with TfLiteXNNPackDelegateDelete after releasing the TensorFlow Lite interpreter. The snippet below illustrates the typical usage:

// Build the interpreter
std::unique_ptr<tflite::Interpreter> interpreter;
...

// IMPORTANT: initialize options with TfLiteXNNPackDelegateOptionsDefault() for
// API-compatibility with future extensions of the TfLiteXNNPackDelegateOptions
// structure.
TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
xnnpack_options.num_threads = num_threads;

TfLiteDelegate* xnnpack_delegate =
    TfLiteXNNPackDelegateCreate(&xnnpack_options);
if (interpreter->ModifyGraphWithDelegate(xnnpack_delegate) != kTfLiteOk) {
  // Report error and fall back to another delegate, or the default backend
}

...

// Run inference using XNNPACK
interpreter->Invoke();

...

// IMPORTANT: release the interpreter before destroying the delegate
interpreter.reset();
TfLiteXNNPackDelegateDelete(xnnpack_delegate);
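
For completeness, the sketch below shows one way the interpreter elided above ("Build the interpreter") might be constructed from a model file and combined with the delegate. It is a minimal illustration, not part of the original snippet: the model path model.tflite, the thread count, and the input/output handling are assumptions made for the example.

#include <memory>

#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Load the model ("model.tflite" is a placeholder path).
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (model == nullptr) return 1;

  // Build the interpreter with the built-in operator resolver.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  if (tflite::InterpreterBuilder(*model, resolver)(&interpreter) != kTfLiteOk) {
    return 1;
  }

  // Create the XNNPACK delegate and apply it to the graph.
  TfLiteXNNPackDelegateOptions xnnpack_options =
      TfLiteXNNPackDelegateOptionsDefault();
  xnnpack_options.num_threads = 4;  // illustrative thread count
  TfLiteDelegate* xnnpack_delegate =
      TfLiteXNNPackDelegateCreate(&xnnpack_options);
  if (interpreter->ModifyGraphWithDelegate(xnnpack_delegate) != kTfLiteOk) {
    // Report the error and fall back to another delegate, or the default backend.
  }

  // Allocate tensors, fill the first input, and run inference.
  if (interpreter->AllocateTensors() != kTfLiteOk) return 1;
  float* input = interpreter->typed_input_tensor<float>(0);
  // ... populate `input` with model-specific data ...
  if (interpreter->Invoke() != kTfLiteOk) return 1;
  const float* output = interpreter->typed_output_tensor<float>(0);
  (void)output;  // ... consume the output ...

  // IMPORTANT: release the interpreter before destroying the delegate.
  interpreter.reset();
  TfLiteXNNPackDelegateDelete(xnnpack_delegate);
  return 0;
}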

Limitations and supported operators

The XNNPACK delegate is a work in progress and currently supports a limited set of operators. Unsupported operators fall back to the default TensorFlow Lite implementations, so models that mix supported and unsupported operators can still benefit from the XNNPACK delegate.

Below is the list of currently supported operators and their limitations:

ABS

  • Inputs and outputs must be in 32-bit floating-point format.

ADD

  • Inputs and outputs must be in 32-bit floating-point format.
  • Only addition with two inputs is supported.
  • Fused NONE, RELU, RELU_N1_TO_1, and RELU6 activations are supported, but fused TANH and SIGN_BIT activations are not.

AVERAGE_POOL_2D

  • Inputs and outputs must be in 32-bit floating-point format.
  • 1x1 pooling is not supported.
  • Fused NONE, RELU, RELU_N1_TO_1, and RELU6 activations are supported, but fused TANH and SIGN_BIT activations are not.

CEIL

  • Inputs and outputs must be in 32-bit floating-point format.

CONV_2D

  • Inputs and outputs must be in 32-bit floating-point format.
  • Bias is mandatory.
  • Both filter and bias must be static (use kTfLiteMmapRo allocation type).
  • Fused NONE, RELU, RELU_N1_TO_1, and RELU6 activations are supported, but fused TANH and SIGN_BIT activations are not.

DEPTHWISE_CONV_2D

  • Inputs and outputs must be in 32-bit floating-point format.
  • Bias is mandatory.
  • Both filter and bias must be static (use kTfLiteMmapRo allocation type).
  • Fused NONE, RELU, RELU_N1_TO_1, and RELU6 activations are supported, but fused TANH and SIGN_BIT activations are not.

DIV

  • Inputs and outputs must be in 32-bit floating-point format.
  • Fused NONE, RELU, RELU_N1_TO_1, and RELU6 activations are supported, but fused TANH and SIGN_BIT activations are not.

FULLY_CONNECTED

  • Inputs and outputs must be in 32-bit floating-point format.
  • Bias is mandatory.
  • Both filter and bias must be static (use kTfLiteMmapRo allocation type).
  • Fused NONE, RELU, RELU_N1_TO_1, and RELU6 activations are supported, but fused TANH and SIGN_BIT activations are not.

FLOOR

  • Inputs and outputs must be in 32-bit floating-point format.

HARD_SWISH

  • Inputs and outputs must be in 32-bit floating-point format.

LEAKY_RELU

  • Inputs and outputs must be in 32-bit floating-point format.

LOGISTIC

  • Inputs and outputs must be in 32-bit floating-point format.

MAX_POOL_2D

  • Inputs and outputs must be in 32-bit floating-point format.
  • 1x1 pooling is not supported.
  • Fused NONE, RELU, RELU_N1_TO_1, and RELU6 activations are supported, but fused TANH and SIGN_BIT activations are not.

MAXIMUM

  • Inputs and outputs must be in 32-bit floating-point format.

MEAN

  • The first input and the output must be 4D tensors in 32-bit floating-point format.
  • The second input (the input with the axes specification) must be static (use kTfLiteMmapRo allocation type).
  • Only [1, 2] or [2, 1] axes specification (i.e. reduction across spatial dimensions) is supported.
  • Only keep_dims = True parameter value is supported.

MINIMUM

  • Inputs and outputs must be in 32-bit floating-point format.

MUL

  • Inputs and outputs must be in 32-bit floating-point format.
  • Fused NONE, RELU, RELU_N1_TO_1, and RELU6 activations are supported, but fused TANH and SIGN_BIT activations are not.

NEG

  • Inputs and outputs must be in 32-bit floating-point format.

PAD

  • The first input and the output must be in 32-bit floating-point format.
  • The second input (the input with the padding specification) must be static (use kTfLiteMmapRo allocation type).
  • The padding amounts must be non-negative.

PRELU

  • Inputs and outputs must be in 32-bit floating-point format.
  • Slope must be static (use kTfLiteMmapRo allocation type).
  • Slope must be either a 1D tensor, or have all its non-channel dimensions equal 1.

RELU

  • Inputs and outputs must be in 32-bit floating-point format.

RELU6

  • Inputs and outputs must be in 32-bit floating-point format.

RELU_N1_TO_1

  • Inputs and outputs must be in 32-bit floating-point format.

ROUND

  • Inputs and outputs must be in 32-bit floating-point format.

SOFTMAX

  • Inputs and outputs must be in 32-bit floating-point format.
  • Only beta = 1.0 is supported.

SQUARE

  • Inputs and outputs must be in 32-bit floating-point format.

SQUARED_DIFFERENCE

  • Inputs and outputs must be in 32-bit floating-point format.

SUB

  • Inputs and outputs must be in 32-bit floating-point format.
  • Fused NONE, RELU, RELU_N1_TO_1, and RELU6 activations are supported, but fused TANH and SIGN_BIT activations are not.

Sparse Inference (experimental)

The XNNPACK backend supports sparse inference for CNN models described in the Fast Sparse ConvNets paper. This functionality must be enabled at build time via the --define xnn_enable_sparse=true Bazel flag. Sparse inference is restricted to subgraphs with the following operators:

  • Sparse subgraph must start with a 3x3 stride-2 CONV_2D operator with padding 1 on each side, no dilation, and 3 input channels.
  • Sparse subgraph must end with a MEAN operator that does reduction across spatial axes.
  • Sparse subgraph may contain the following operators:
    • CONV_2D with 1x1 kernel and no padding. It is important to have high sparsity (at least 70%) in the filter of this operator to get speedup over dense inference.
    • DEPTHWISE_CONV_2D with 3x3 kernel, stride 1, no dilation, and padding 1 on each side.
    • DEPTHWISE_CONV_2D with 3x3 kernel, stride 2, no dilation, and padding 1 on each side.
    • DEPTHWISE_CONV_2D with 5x5 kernel, stride 1, no dilation, and padding 2 on each side.
    • DEPTHWISE_CONV_2D with 5x5 kernel, stride 2, no dilation, and padding 2 on each side.
    • ADD and MUL operators where both inputs are 4D tensors. If one of the inputs to ADD or MUL is a constant tensor, it must be representable as either a scalar, or a 1D vector.
    • Unary elementwise operators ABS, CEIL, FLOOR, HARD_SWISH, LEAKY_RELU, LOGISTIC, NEG, RELU, RELU6, RELU_N1_TO_1, ROUND, and SQUARE.

Pre-trained Fast Sparse ConvNets models provide examples that satisfy these constraints.

In addition to acceleration, sparse models benefit from compression, as only the non-zero values are stored in the TensorFlow Lite file format.

Other limitations

  • Dynamically allocated (with kTfLiteDynamic allocation type) inputs and outputs are not supported.
  • Resizing model inputs (via Interpreter::ResizeInputTensor) is supported, but causes a complete reinitialization of the delegate instance, which has considerable overhead (see the sketch below).
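
As an illustration of the last point, below is a minimal sketch of resizing a model input with an XNNPACK-delegated interpreter; the input index and the new shape {1, 256, 256, 3} are assumptions made for the example. Each resize followed by AllocateTensors re-initializes the delegate, so resizing should be done rarely (e.g. once per distinct input shape) rather than before every inference.

// Assumes `interpreter` was built and delegated to XNNPACK as shown earlier.
// The new shape {1, 256, 256, 3} is illustrative.
if (interpreter->ResizeInputTensor(interpreter->inputs()[0],
                                   {1, 256, 256, 3}) != kTfLiteOk) {
  // Handle the error.
}
// Re-allocating tensors after the resize triggers the (expensive) delegate
// re-initialization described above.
if (interpreter->AllocateTensors() != kTfLiteOk) {
  // Handle the error.
}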