Frequently Asked Questions
If you don't find an answer to your question here, please look through our detailed documentation for the topic or file a GitHub issue.
Model Conversion
What formats are supported for conversion from TensorFlow to TensorFlow Lite?
The TensorFlow Lite converter supports the following formats:
- SavedModels: TFLiteConverter.from_saved_model
- Frozen GraphDefs generated by freeze_graph.py: TFLiteConverter.from_frozen_graph
- tf.keras HDF5 models: TFLiteConverter.from_keras_model_file
- tf.Session: TFLiteConverter.from_session
The recommended approach is to integrate the Python converter into your model pipeline in order to detect compatibility issues early on.
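For reference, conversion of a SavedModel is only a few lines of Python. This is a minimal sketch, assuming a SavedModel exported to a placeholder directory ./saved_model and the tf.lite.TFLiteConverter API:

```python
import tensorflow as tf

# A minimal sketch: convert a SavedModel to a TensorFlow Lite flatbuffer.
# "./saved_model" is a placeholder path to an exported SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("./saved_model")
tflite_model = converter.convert()

# Write the converted model to disk.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```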
Why doesn't my model convert?
Since TensorFlow Lite supports a smaller set of operations than TensorFlow, some inference models may fail to convert. For unimplemented operations, take a look at the question on missing operators. Unsupported operators include embeddings and LSTM/RNNs. For models with LSTM/RNNs, you can also try the experimental OpHint API to convert. Models with control flow ops (Switch, Merge, etc.) are not convertible at the moment, but we are working on adding support for control flow in TensorFlow Lite; please see the related GitHub issues.
For conversion issues not related to missing operations or control flow ops, search our GitHub issues or file a new one.
How do I determine the inputs/outputs for a GraphDef protocol buffer?
The easiest way to inspect a graph from a .pb file is to use Netron, an open-source viewer for machine learning models.
If Netron cannot open the graph, you can try the summarize_graph tool.
If the summarize_graph tool yields an error, you can visualize the GraphDef with TensorBoard and look for the inputs and outputs in the graph. To visualize a .pb file, use the import_pb_to_tensorboard.py script like below:
python import_pb_to_tensorboard.py --model_dir <model path> --log_dir <log dir path>
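You can also inspect a GraphDef programmatically. The following is a rough sketch, assuming a frozen GraphDef at a placeholder path model.pb and a TensorFlow installation that exposes the tf.compat.v1 API; it prints candidate input and output nodes:

```python
import tensorflow as tf

# A rough sketch: load a frozen GraphDef and list candidate inputs/outputs.
# "model.pb" is a placeholder path.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Placeholder nodes are usually the graph inputs.
inputs = [n.name for n in graph_def.node if n.op == "Placeholder"]

# Nodes that no other node consumes are candidate outputs.
consumed = {inp.split(":")[0].lstrip("^")
            for n in graph_def.node for inp in n.input}
outputs = [n.name for n in graph_def.node if n.name not in consumed]

print("Possible inputs:", inputs)
print("Possible outputs:", outputs)
```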
How do I inspect a .tflite file?
Netron is the easiest way to visualize a TensorFlow Lite model.
If Netron cannot open your TensorFlow Lite model, you can try the visualize.py script in our repository.
- Clone the TensorFlow repository
- Run the visualize.py script with bazel:
bazel run //tensorflow/lite/tools:visualize model.tflite visualized_model.html
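Alternatively, you can inspect the input and output tensors of a .tflite file directly from Python. This is a minimal sketch, assuming the model is saved at a placeholder path model.tflite:

```python
import tensorflow as tf

# A minimal sketch: load a TensorFlow Lite model and print its I/O tensors.
# "model.tflite" is a placeholder path.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

print("Inputs:", interpreter.get_input_details())
print("Outputs:", interpreter.get_output_details())
```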
Models & Operations
Why are some operations not implemented in TensorFlow Lite?
In order to keep TensorFlow Lite lightweight, only a subset of TensorFlow operations is supported by the converter. The Compatibility Guide provides a list of operations currently supported by TensorFlow Lite.
If you don’t see a specific operation (or an equivalent) listed, it's likely that it has not been prioritized. The team tracks requests for new operations on GitHub issue #21526. Leave a comment if your request hasn’t already been mentioned.
In the meantime, you could try implementing a custom operator or using a different model that only contains supported operators. If binary size is not a constraint, try using TensorFlow Lite with select TensorFlow ops.
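Enabling select TensorFlow ops is roughly a one-line change on the converter. This is a hedged sketch, assuming a SavedModel at a placeholder path ./saved_model and a TensorFlow version that exposes tf.lite.OpsSet.SELECT_TF_OPS:

```python
import tensorflow as tf

# A rough sketch: let the converter fall back to select TensorFlow ops for
# operators that have no TensorFlow Lite builtin equivalent.
# "./saved_model" is a placeholder path.
converter = tf.lite.TFLiteConverter.from_saved_model("./saved_model")
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # use TFLite builtins where possible
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TensorFlow ops otherwise
]
tflite_model = converter.convert()
```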
How do I test that a TensorFlow Lite model behaves the same as the original TensorFlow model?
The best way to test the behavior of a TensorFlow Lite model is to use our API with test data and compare the outputs to TensorFlow for the same inputs. Take a look at our Python Interpreter example that generates random data to feed to the interpreter.
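As a hedged sketch of such a comparison, assume you already have a tf.keras model object named model and its converted flatbuffer tflite_model, and that you are running an eager-mode (TF 2.x style) TensorFlow:

```python
import numpy as np
import tensorflow as tf

# A rough sketch: feed the same random input to the original TensorFlow model
# and to the TensorFlow Lite interpreter, then compare the outputs.
# `model` (a tf.keras model) and `tflite_model` (the converted flatbuffer)
# are assumed to exist already.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Random data with the shape and dtype the interpreter expects.
test_input = np.random.random_sample(input_details["shape"]).astype(
    input_details["dtype"])

interpreter.set_tensor(input_details["index"], test_input)
interpreter.invoke()
tflite_output = interpreter.get_tensor(output_details["index"])

tf_output = model(test_input).numpy()  # original TensorFlow result
print("Max absolute difference:", np.abs(tf_output - tflite_output).max())
```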
Optimization
How do I reduce the size of my converted TensorFlow Lite model?
Post-training quantization can be used during conversion to TensorFlow Lite to reduce the size of the model. Post-training quantization quantizes weights from floating point to 8 bits of precision and dequantizes them at runtime to perform floating-point computations. However, note that this can have accuracy implications.
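A minimal sketch of enabling post-training weight quantization, assuming a SavedModel at a placeholder path ./saved_model and a tf.lite API that exposes tf.lite.Optimize.DEFAULT:

```python
import tensorflow as tf

# A minimal sketch: enable post-training quantization during conversion.
# "./saved_model" is a placeholder path.
converter = tf.lite.TFLiteConverter.from_saved_model("./saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize weights
tflite_quantized_model = converter.convert()
```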
If retraining the model is an option, consider Quantization-aware training. However, note that quantization-aware training is only available for a subset of convolutional neural network architectures.
For a deeper understanding of different optimization methods, look at Model optimization.
How do I optimize TensorFlow Lite performance for my machine learning task?
The high-level process to optimize TensorFlow Lite performance looks something like this:
- Make sure that you have the right model for the task. For image classification, check out our list of hosted models.
- Tweak the number of threads. Many TensorFlow Lite operators support multi-threaded kernels. You can use SetNumThreads() in the C++ API to do this. However, increasing the number of threads can make performance more variable depending on the environment (see the Python sketch after this list).
- Use hardware accelerators. TensorFlow Lite supports model acceleration for specific hardware using delegates. For example, to use Android’s Neural Networks API, call UseNNAPI on the interpreter, or take a look at our GPU delegate tutorial.
- (Advanced) Profile the model. The TensorFlow Lite benchmarking tool has a built-in profiler that can show per-operator statistics. If you know how to optimize an operator’s performance for your specific platform, you can implement a custom operator.
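From Python, newer TensorFlow releases also expose a num_threads argument on tf.lite.Interpreter. This is a hedged sketch, assuming such a release and a model at a placeholder path model.tflite:

```python
import tensorflow as tf

# A hedged sketch: create a multi-threaded TensorFlow Lite interpreter.
# Requires a TensorFlow release where tf.lite.Interpreter accepts num_threads;
# "model.tflite" is a placeholder path.
interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=4)
interpreter.allocate_tensors()
# Set input tensors with interpreter.set_tensor(...) before calling invoke().
```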
For a more in-depth discussion on how to optimize performance, take a look at Best Practices.