Update detection documentation with latest models & instructions

PiperOrigin-RevId: 336694589
Change-Id: I6932c3a3f91eb184acd7e51ccbfe80c74b077c7d
Sachin Joglekar 2020-10-12 10:38:44 -07:00 committed by TensorFlower Gardener
parent b81b896808
commit d855fc792a


<img src="../images/detection.png" class="attempt-right"> <img src="../images/detection.png" class="attempt-right">
Detect multiple objects within an image, with bounding boxes. Recognize 90 Given an image or a video stream, an object detection model can identify which
different classes of objects. of a known set of objects might be present and provide information about their
positions within the image.
For example, this screenshot of the <a href="#get_started">example
application</a> shows how two objects have been recognized and their positions
annotated:
<img src="images/android_apple_banana.png" alt="Screenshot of Android example" width="30%">
## Get started

If you are new to TensorFlow Lite and are working with Android or iOS, download
the following example applications to get started.

<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android">Android
example</a>
<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/ios">iOS
example</a>

If you are using a platform other than Android or iOS, or if you are already
familiar with the
<a href="https://www.tensorflow.org/api_docs/python/tf/lite">TensorFlow Lite
APIs</a>, you can download the starter object detection model and the
accompanying labels.

<a class="button button-primary" href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">Download
starter model with Metadata</a>

For more information about Metadata and associated fields (e.g. `labels.txt`) see
<a href="https://www.tensorflow.org/lite/convert/metadata#read_the_metadata_from_models">Read
the metadata from models</a>.

If you want to train a custom detection model for your own task, see
<a href="#model-customization">Model customization</a>.

For the following use cases, you should use a different type of model:

<ul>
  <li>Predicting which single label the image most likely represents (see <a href="../image_classification/overview.md">image classification</a>)</li>
  <li>Predicting the composition of an image, for example subject versus background (see <a href="../segmentation/overview.md">segmentation</a>)</li>
</ul>

## Model description

This section describes the signature for
[Single-Shot Detector](https://arxiv.org/abs/1512.02325) models converted to
TensorFlow Lite from the
[TensorFlow Object Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/).

An object detection model is trained to detect the presence and location of
multiple classes of objects. For example, a model might be trained with images
that contain various pieces of fruit, along with a _label_ that specifies the
class of fruit they represent (e.g. an apple, a banana, or a strawberry), and
data specifying where each object appears in the image.

When an image is subsequently provided to the model, it will output a list of
the objects it detects, the location of a bounding box that contains each
object, and a score that indicates the confidence that detection was correct.

### Input Signature

The model takes an image as input.

Let's assume the expected image is 300x300 pixels, with three channels (red,
blue, and green) per pixel. This should be fed to the model as a flattened
buffer of 270,000 byte values (300x300x3). If the model is
<a href="../../performance/post_training_quantization.md">quantized</a>, each
value should be a single byte representing a value between 0 and 255.
You can take a look at our
[example app code](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android)
to understand how to do this pre-processing on Android.
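
Outside of Android, the same pre-processing can be sketched with the Python
TensorFlow Lite interpreter. This is a minimal illustration rather than part of
the starter model package: the file names and the use of Pillow for image
loading are assumptions made for the example.

```python
import numpy as np
from PIL import Image
import tensorflow as tf

# Load the detection model (the path is a placeholder for this example).
interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Resize the raw image to the 300x300 RGB input the model expects.
image = Image.open("fruit.jpg").convert("RGB").resize((300, 300))

# A quantized model takes uint8 values in [0, 255], shaped [1, 300, 300, 3].
input_data = np.expand_dims(np.asarray(image, dtype=np.uint8), axis=0)

interpreter.set_tensor(input_details["index"], input_data)
interpreter.invoke()
```
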
### Output Signature
The model outputs four arrays, mapped to the indices 0-3. Arrays 0, 1, and 2
describe `N` detected objects, with one element in each array corresponding to
each object.
<table>
<thead>
<tr>
<th>Index</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Locations</td>
<td>Multidimensional array of [N][4] floating point values between 0 and 1, the inner arrays representing bounding boxes in the form [top, left, bottom, right]</td>
</tr>
<tr>
<td>1</td>
<td>Classes</td>
<td>Array of N integers (output as floating point values) each indicating the index of a class label from the labels file</td>
</tr>
<tr>
<td>2</td>
<td>Scores</td>
<td>Array of N floating point values between 0 and 1 representing probability that a class was detected</td>
</tr>
<tr>
<td>3</td>
<td>Number of detections</td>
<td>Integer value of N</td>
</tr>
</tbody>
</table>
NOTE: The number of results (`N` above, which is 10 for the starter model) is a
parameter set while exporting the detection model to TensorFlow Lite. See
<a href="#model-customization">Model customization</a> for more details.
For example, imagine a model has been trained to detect apples, bananas, and
strawberries. When provided an image, it will output a set number of detection
results - in this example, 5.
<table style="width: 60%;"> <table style="width: 60%;">
<thead> <thead>
@ -95,7 +159,7 @@ this example, 5.
</tbody> </tbody>
</table> </table>
#### Confidence score

To interpret these results, we can look at the score and the location for each
detected object. The score is a number between 0 and 1 that indicates confidence
that the object was genuinely detected. The closer the number is to 1, the more
confident the model is.

Depending on your application, you can decide a cut-off threshold below which
you will discard detection results. For the current example, a sensible cut-off
is a score of 0.5 (meaning a 50% probability that the detection is valid). In
that case, the last two objects in the array would be ignored because those
confidence scores are below 0.5:

<table style="width: 60%;"> <table style="width: 60%;">
<thead> <thead>
@ -158,11 +222,11 @@ positive.
<img src="images/false_positive.png" alt="Screenshot of Android example showing a false positive" width="30%"> <img src="images/false_positive.png" alt="Screenshot of Android example showing a false positive" width="30%">
#### Location

For each detected object, the model will return an array of four numbers
representing a bounding rectangle that surrounds its position. For the starter
model provided, the numbers are ordered as follows:

<table style="width: 50%; margin: 0 auto;"> <table style="width: 50%; margin: 0 auto;">
<tbody> <tbody>
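
Since these values are normalized to [0, 1], mapping a box back onto the
original image is just a matter of scaling by the image's dimensions. A small
sketch, where `image_width` and `image_height` are placeholders for your raw
image size:

```python
def to_pixels(box, image_width, image_height):
    """Convert a normalized [top, left, bottom, right] box to pixel coordinates."""
    top, left, bottom, right = box
    return (
        int(top * image_height),
        int(left * image_width),
        int(bottom * image_height),
        int(right * image_width),
    )

# Example: scale the first detection kept by the threshold filter above
# onto a 640x480 camera frame.
top_px, left_px, bottom_px, right_px = to_pixels(kept[0][2], 640, 480)
```
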
Note: Object detection models accept input images of a specific size. This is
likely to be different from the size of the raw image captured by your device's
camera, and you will have to write code to crop and scale your raw image to fit
the model's input size.

## Performance benchmarks

Performance benchmark numbers for our
<a class="button button-primary" href="https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip">starter
model</a> are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

<table>

\*\* 2 threads used on iPhone for the best performance result.

## Model Customization

### Pre-trained models

Mobile-optimized detection models with a variety of latency and precision
characteristics can be found in the
[Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#mobile-models).
Each one of them follows the input and output signatures described in the
preceding sections.

Most of the download zips contain a `model.tflite` file. If there isn't one, a
TensorFlow Lite flatbuffer can be generated using
[these instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md).
SSD models from the
[TF2 Object Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md)
can also be converted to TensorFlow Lite using the instructions
[here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).
It is important to note that detection models cannot be converted directly using
the [TensorFlow Lite Converter](https://www.tensorflow.org/lite/convert), since
they require an intermediate step of generating a mobile-friendly source model.
The scripts linked above perform this step.
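
For the TF2 path, the flow is roughly: run the export script linked above to
produce an intermediate, mobile-friendly SavedModel, then convert it with the
standard converter. A minimal sketch with a placeholder directory name:

```python
import tensorflow as tf

# The export script writes a mobile-friendly SavedModel; convert that output,
# not the original training checkpoint ("exported_model" is a placeholder).
converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/saved_model")
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```
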
Both the
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md)
and
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md)
exporting scripts have parameters that can enable a larger number of output
objects or slower, more accurate post-processing. Please use `--help` with the
scripts to see an exhaustive list of supported arguments.

> Currently, on-device inference is only optimized with SSD models. Better
> support for other architectures like CenterNet and EfficientDet is being
> investigated.

### How to choose a model to customize?

Each model comes with its own precision (quantified by mAP value) and latency
characteristics. You should choose a model that works best for your use case
and intended hardware. For example, the
[Edge TPU](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#pixel4-edge-tpu-models)
models are ideal for inference on Google's Edge TPU on Pixel 4.

You can use our
[benchmark tool](https://www.tensorflow.org/lite/performance/measurement) to
evaluate models and choose the most efficient option available.
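
For a quick latency estimate while prototyping, you can also time the
interpreter directly from Python; the benchmark tool linked above remains the
more rigorous option. The model path, thread count, and run counts below are
arbitrary example values.

```python
import time

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="candidate_model.tflite", num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
dummy_input = np.zeros(input_details["shape"], dtype=input_details["dtype"])

# Warm up, then average the latency over repeated invocations.
for _ in range(5):
    interpreter.set_tensor(input_details["index"], dummy_input)
    interpreter.invoke()

runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(input_details["index"], dummy_input)
    interpreter.invoke()
print(f"Average latency: {(time.perf_counter() - start) / runs * 1000:.1f} ms")
```
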
## Fine-tuning models on custom data

The pre-trained models we provide are trained to detect 90 classes of objects.
For a full list of classes, see the labels file in the
<a href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">model
metadata</a>.

You can use a technique known as transfer learning to re-train a model to
recognize classes not in the original set. For example, you could re-train the
model to detect multiple types of vegetable, despite there only being one
vegetable in the original training data. To do this, you will need a set of
training images for each of the new labels you wish to train. Please see our
[Few-shot detection Colab](https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tflite.ipynb)
as an example of fine-tuning a pre-trained model with few examples.

For fine-tuning with larger datasets, take a look at these guides for
training your own models with the TensorFlow Object Detection API:
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_training_and_evaluation.md),
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md).

Once trained, these models can be converted to a TFLite-friendly format with the
instructions here:
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md),
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).