STT-tensorflow/tensorflow/lite/experimental/quantization_debugger
Taehee Jeong ea2477d56c Fix quantization debugger's tf_export
This should be usable with TF v2, too.

PiperOrigin-RevId: 355786139
Change-Id: I0725913ab3b410c23588cc59b5847f351a0eeb5a
2021-02-04 22:39:50 -08:00

TensorFlow Lite Quantization Debugger

[TOC]

Overview

Debugging a quantized model usually means writing tedious, one-off custom code in order to:

  1. Verify that the quantized model works as expected (spot errors, check accuracy, etc.).
  2. Compare the quantized model and the original float model.

This is now feasible using the TensorFlow Lite Quantization Debugger, as shown below.

Note: Currently, this workflow is supported only for full integer (int8) quantization. The debug model produced by this workflow should be used for debugging only, not for inference.

Analysis with quantized model only

Produce a debug model

Modify the TFLite full integer (int8) quantization steps as shown below to produce a debug model (for debugging only, not for inference).

How does this work?

With the help of the MLIR quantizer's debug mode, the debug model contains both the original float operators (ops) and the quantized ops. In addition, NumericVerify ops are inserted to compare the outputs of each float op and its quantized counterpart and to collect statistics. Each NumericVerify op is named in the format NumericVerify/{original tensor name}:{original tensor id}.

import tensorflow as tf
# for mlir_quantize
from tensorflow.lite.python import convert

# `converter` (a tf.lite.TFLiteConverter) and `calibration_gen` (a
# representative dataset generator) are assumed to be defined already.
# Set full-integer quantization parameters as usual.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = calibration_gen

# Create a TFLite model with the new quantizer and numeric verify ops. Rather
# than only calling convert(), calibrate the model first, then call
# `mlir_quantize` to run the actual quantization with `enable_numeric_verify`
# set to `True`.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter._experimental_calibrate_only = True
calibrated = converter.convert()
quant_debug_model = convert.mlir_quantize(
    calibrated, enable_numeric_verify=True)
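The NumericVerify naming format described above can be split back into its parts when post-processing results. The helper below is a minimal sketch; `parse_numeric_verify_name` is a hypothetical name and not part of the debugger API, and it assumes the tensor id is the integer after the last colon.

```python
def parse_numeric_verify_name(op_name):
    """Split 'NumericVerify/{tensor name}:{tensor id}' into (name, id).

    Hypothetical helper for illustration; not part of the debugger API.
    """
    prefix = "NumericVerify/"
    if not op_name.startswith(prefix):
        raise ValueError(f"not a NumericVerify op name: {op_name!r}")
    # Tensor names may themselves contain '/' and ';', so split only on the
    # last ':' to separate the trailing tensor id.
    tensor_name, sep, tensor_id = op_name[len(prefix):].rpartition(":")
    if not sep or not tensor_id.isdigit():
        raise ValueError(f"missing tensor id in: {op_name!r}")
    return tensor_name, int(tensor_id)


name, tensor_id = parse_numeric_verify_name("NumericVerify/Identity2:85")
```

Splitting on the last colon keeps the full original tensor name intact even when it is a compound name joined by semicolons.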

Run debugger with debug model

Initialize debugger with the debug model. This can be done in two ways.

from tensorflow.lite.experimental.quantization_debugger import debugger

# `debug_dataset` accepts the same type as `converter.representative_dataset`.
quant_debugger = debugger.QuantizationDebugger(
    quant_debug_model_content=quant_debug_model,
    debug_dataset=data_gen)

# OR

quant_debugger = debugger.QuantizationDebugger(
    quant_debug_model_path='/path/to/debug_model.tflite',
    debug_dataset=data_gen)

quant_debugger.run()
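The two initializers differ only in where the model comes from. Assuming `quant_debug_model` is the `bytes` object returned by the conversion step, a small helper such as the hypothetical `save_model` below could write it to disk so that the path-based form can be used in a later session.

```python
import pathlib


def save_model(model_content: bytes, path: str) -> str:
    """Write serialized model bytes to `path` and return the path.

    Hypothetical helper for illustration; not part of the debugger API.
    """
    target = pathlib.Path(path)
    # Create the parent directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(model_content)
    return str(target)
```

The returned path can then be passed as `quant_debug_model_path`.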

Inspect statistics

When you call quant_debugger.run(), quant_debugger.layer_statistics is filled with aggregated statistics for each NumericVerify op. Some metrics (e.g. stddev, mean square error) are calculated by default.
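To make the default metric names concrete, here is a pure-Python sketch of what they conventionally mean for a flat list of element-wise differences between the float and quantized outputs. It assumes the standard textbook definitions (population standard deviation, mean of squared differences); the debugger's own implementation and sign convention may differ, and `summarize_diffs` is a hypothetical name.

```python
import math


def summarize_diffs(diffs):
    """Compute common error statistics for a flat sequence of differences."""
    n = len(diffs)
    mean_error = sum(diffs) / n
    mean_square_error = sum(d * d for d in diffs) / n
    # Population variance: average squared deviation from the mean error.
    variance = sum((d - mean_error) ** 2 for d in diffs) / n
    return {
        "num_elements": n,
        "mean_error": mean_error,
        "max_abs_error": max(abs(d) for d in diffs),
        "mean_square_error": mean_square_error,
        "stddev": math.sqrt(variance),
    }


stats = summarize_diffs([0.5, -0.5, 1.0, -1.0])
```

A mean error near zero combined with a large stddev suggests noisy but unbiased quantization error, while a mean error far from zero points to a systematic offset.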

Example output

# Each value in `quant_debugger.layer_statistics` is a defaultdict; convert it
# to a dict for readable output.
import pprint
for layer_name, metrics in quant_debugger.layer_statistics.items():
  print(layer_name)
  pprint.pprint(dict(metrics))
# ...

NumericVerify/sequential/dense/MatMul;sequential/dense/BiasAdd3:77
{'max_abs_error': 0.05089309,
 'mean_error': -0.00017149668,
 'mean_square_error': 0.00040816222,
 'num_elements': 256.0,
 'stddev': 0.02009948}
NumericVerify/sequential/dense_1/MatMul;sequential/dense_1/BiasAdd3:81
{'max_abs_error': 0.09744112,
 'mean_error': 0.0048679365,
 'mean_square_error': 0.0036721828,
 'num_elements': 10.0,
 'stddev': 0.055745363}
NumericVerify/Identity2:85
{'max_abs_error': 0.0036417267,
 'mean_error': -0.00068773015,
 'mean_square_error': 3.439951e-06,
 'num_elements': 10.0,
 'stddev': 0.0016223773}

# ...
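With many layers, it is often more useful to sort the statistics than to read them in order. The sketch below ranks layers by a chosen metric; it assumes only that the statistics are a mapping from layer name to a mapping of metric values, as in the output above. `rank_layers` is a hypothetical helper, and the sample values are copied from the example output.

```python
def rank_layers(layer_statistics, metric="max_abs_error", top_n=None):
    """Return (layer_name, value) pairs sorted by `metric`, largest first."""
    ranked = sorted(
        layer_statistics.items(),
        key=lambda item: item[1][metric],
        reverse=True,
    )
    # Slicing with top_n=None keeps every layer.
    return [(name, metrics[metric]) for name, metrics in ranked[:top_n]]


# Sample values taken from the example output above.
sample_statistics = {
    "NumericVerify/sequential/dense/MatMul;sequential/dense/BiasAdd3:77": {
        "max_abs_error": 0.05089309,
        "stddev": 0.02009948,
    },
    "NumericVerify/sequential/dense_1/MatMul;sequential/dense_1/BiasAdd3:81": {
        "max_abs_error": 0.09744112,
        "stddev": 0.055745363,
    },
    "NumericVerify/Identity2:85": {
        "max_abs_error": 0.0036417267,
        "stddev": 0.0016223773,
    },
}

worst = rank_layers(sample_statistics, metric="max_abs_error", top_n=2)
```

In a real session, `quant_debugger.layer_statistics` would be passed in place of `sample_statistics`.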

Adding custom metrics

More metrics can be added by passing QuantizationDebugOptions to the initializer. For example, to add mean absolute error, use the following snippet.

import numpy as np

debug_options = debugger.QuantizationDebugOptions(
    layer_debug_metrics={
        'mean_abs_error': lambda diffs: np.mean(np.abs(diffs))
    })

quant_debugger = debugger.QuantizationDebugger(
    quant_debug_model_content=quant_debug_model,
    debug_dataset=data_gen,
    debug_options=debug_options
)
quant_debugger.run()

Now quant_debugger.layer_statistics includes the mean absolute error for each layer.

Analysis with float and quantized models

In addition to single-model analysis, the outputs of the original float model and the quantized model can be compared when both models are given. To do this, provide a float model and the metrics used to compare outputs. For classification models the metric can be as simple as argmax agreement, but more complex models such as detection need more elaborate comparison logic.

# Each function in model_debug_metrics receives all output tensors from the
# float and quantized models, and returns a single metric value.
debug_options = debugger.QuantizationDebugOptions(
    model_debug_metrics={
        'argmax_accuracy': lambda f, q: np.argmax(f[0]) == np.argmax(q[0])
    })

float_model = converter.convert()  # converted without any optimizations.

quant_debugger = debugger.QuantizationDebugger(
    quant_debug_model_content=quant_debug_model,
    float_model_content=float_model,  # can pass `float_model_path` instead.
    debug_dataset=data_gen,
    debug_options=debug_options
)
quant_debugger.run()

The result is a single number per metric, so it's easier to inspect.

>>> quant_debugger.model_statistics
{'argmax_accuracy': 0.89}
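Exact argmax agreement can be too strict when two classes have nearly equal scores. As one possible alternative, the sketch below checks whether the float model's top class appears among the quantized model's top-k classes. It assumes the first output of each model is a flat (1-D) score vector, so outputs with a batch dimension would need flattening first; `top_k_indices` and `top_k_agreement` are hypothetical names.

```python
def top_k_indices(scores, k):
    """Indices of the k largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def top_k_agreement(float_outputs, quant_outputs, k=3):
    """Return 1.0 if the float top-1 class is in the quantized top-k, else 0.0.

    Both arguments are sequences of model outputs; only the first output of
    each is used, and it is assumed to be a flat score vector.
    """
    float_top1 = top_k_indices(float_outputs[0], 1)[0]
    return float(float_top1 in top_k_indices(quant_outputs[0], k))


float_out = [[0.1, 0.7, 0.2]]
quant_out = [[0.5, 0.1, 0.4]]
strict = top_k_agreement(float_out, quant_out, k=2)
relaxed = top_k_agreement(float_out, quant_out, k=3)
```

Because it takes the float and quantized outputs and returns a single value, a function of this shape could be passed in `model_debug_metrics` in the same way as the argmax lambda above.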

Advanced usage: Export stats to csv, and import to pandas

The quant_debugger.layer_statistics_dump function accepts a file-like object and exports the layer statistics as CSV. The file can then be imported into other tools, such as pandas, for further processing. The exported data also includes the name of the op, the originating tensor ID, and the quantization parameters (scales and zero points) of each quantized layer.

Note: scales and zero points are lists, and pandas imports them as text by default. They must be parsed into lists before further processing.

import pandas as pd
import yaml  # used to parse lists

with open('/path/to/stats.csv', 'w') as f:
  quant_debugger.layer_statistics_dump(f)

data = pd.read_csv('/path/to/stats.csv', converters={
      'scales': yaml.safe_load, 'zero_points': yaml.safe_load})
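If pandas and PyYAML are not available, the same list parsing can be done with the standard library. This is a sketch under assumptions: only the `scales` and `zero_points` column names come from the text above, while the other columns and all values in the sample are made up for illustration, and `read_layer_stats` is a hypothetical helper.

```python
import ast
import csv
import io


def read_layer_stats(csv_file, list_columns=("scales", "zero_points")):
    """Read CSV rows as dicts, parsing list-valued columns from text."""
    rows = []
    for row in csv.DictReader(csv_file):
        for column in list_columns:
            if row.get(column):
                # literal_eval safely turns text like "[0.05, 0.1]" into a list.
                row[column] = ast.literal_eval(row[column])
        rows.append(row)
    return rows


# Made-up sample standing in for a file written by layer_statistics_dump.
sample_csv = io.StringIO(
    "op_name,tensor_idx,scales,zero_points,max_abs_error\n"
    'CONV_2D,77,"[0.05, 0.1]","[0, -3]",0.0509\n'
)
rows = read_layer_stats(sample_csv)
```

All other columns stay as strings here, so numeric metrics would still need an explicit `float()` conversion before comparison.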