Update detection documentation with latest models & instructions
PiperOrigin-RevId: 336694589 Change-Id: I6932c3a3f91eb184acd7e51ccbfe80c74b077c7d
parent b81b896808 · commit d855fc792a
@@ -2,45 +2,55 @@

<img src="../images/detection.png" class="attempt-right">

Detect multiple objects within an image, with bounding boxes. Recognize 90
different classes of objects.

Given an image or a video stream, an object detection model can identify which
of a known set of objects might be present and provide information about their
positions within the image.

For example, this screenshot of the <a href="#get_started">example
application</a> shows how two objects have been recognized and their positions
annotated:

<img src="images/android_apple_banana.png" alt="Screenshot of Android example" width="30%">
## Get started

If you are new to TensorFlow Lite and are working with Android or iOS, we
recommend exploring the following example applications that can help you get
started.

If you are new to TensorFlow Lite and are working with Android or iOS, download
the following example applications to get started.

<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android">Android
example</a>
<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/ios">iOS
example</a>

If you are using a platform other than Android or iOS, or you are already
familiar with the <a href="https://www.tensorflow.org/api_docs/python/tf/lite">TensorFlow Lite APIs</a>, you can
download our starter object detection model and the accompanying labels.

If you are using a platform other than Android or iOS, or if you are already
familiar with the
<a href="https://www.tensorflow.org/api_docs/python/tf/lite">TensorFlow Lite
APIs</a>, you can download the starter object detection model and the
accompanying labels.

<a class="button button-primary" href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">Download
starter model with Metadata</a>

For more information about the starter model, see
<a href="#starter_model">Starter model</a>.

For more information about Metadata and associated fields (e.g. `labels.txt`), see
<a href="https://www.tensorflow.org/lite/convert/metadata#read_the_metadata_from_models">Read
the metadata from models</a>.
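
As a rough sketch of reading that metadata in Python (the `tflite-support` pip package and the `detect.tflite` file name here are assumptions for illustration, not part of the starter download instructions):

```python
# Sketch: list the files (such as the label map) packed into a model's metadata.
from tflite_support import metadata

displayer = metadata.MetadataDisplayer.with_model_file("detect.tflite")  # placeholder path
print(displayer.get_metadata_json())                # full metadata as JSON
print(displayer.get_packed_associated_file_list())  # associated files, e.g. the labels file
```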
## What is object detection?

If you want to train a custom detection model for your own task, see
<a href="#model-customization">Model customization</a>.

Given an image or a video stream, an object detection model can identify which
of a known set of objects might be present and provide information about their
positions within the image.

For the following use cases, you should use a different type of model:

For example, this screenshot of our <a href="#get_started">example
application</a> shows how two objects have been recognized and their positions
annotated:

<ul>
<li>Predicting which single label the image most likely represents (see <a href="../image_classification/overview.md">image classification</a>)</li>
<li>Predicting the composition of an image, for example subject versus background (see <a href="../segmentation/overview.md">segmentation</a>)</li>
</ul>

<img src="images/android_apple_banana.png" alt="Screenshot of Android example" width="30%">

## Model description

This section describes the signature for
[Single-Shot Detector](https://arxiv.org/abs/1512.02325) models converted to
TensorFlow Lite from the
[TensorFlow Object Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/).

An object detection model is trained to detect the presence and location of
multiple classes of objects. For example, a model might be trained with images
@@ -48,15 +58,69 @@
that contain various pieces of fruit, along with a _label_ that specifies the
class of fruit they represent (e.g. an apple, a banana, or a strawberry), and
data specifying where each object appears in the image.

When we subsequently provide an image to the model, it will output a list of the
objects it detects, the location of a bounding box that contains each object,
and a score that indicates the confidence that detection was correct.

When an image is subsequently provided to the model, it will output a list of
the objects it detects, the location of a bounding box that contains each
object, and a score that indicates the confidence that detection was correct.

### Model output

### Input Signature

Imagine a model has been trained to detect apples, bananas, and strawberries.
When we pass it an image, it will output a set number of detection results - in
this example, 5.

The model takes an image as input.

Let's assume the expected image is 300x300 pixels, with three channels (red,
blue, and green) per pixel. This should be fed to the model as a flattened
buffer of 270,000 byte values (300x300x3). If the model is
<a href="../../performance/post_training_quantization.md">quantized</a>, each
value should be a single byte representing a value between 0 and 255.

You can take a look at our
[example app code](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android)
to understand how to do this pre-processing on Android.
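
If you are working in Python rather than Android, the same pre-processing can be sketched with the TensorFlow Lite `Interpreter` API. This is a minimal sketch, not part of the example app; the `detect.tflite` and `image.jpg` paths are placeholders, and the 300x300 uint8 input shape is the one assumed above.

```python
# Sketch: prepare a 300x300 RGB image and run a quantized SSD detection model.
import numpy as np
from PIL import Image
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="detect.tflite")  # placeholder path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()

# Resize a copy of the image to the size the model expects, keeping uint8 values in 0-255.
height, width = input_details[0]["shape"][1:3]
image = Image.open("image.jpg").convert("RGB")   # placeholder path
resized = image.resize((width, height))          # 300x300 for the starter model
input_data = np.expand_dims(np.asarray(resized, dtype=np.uint8), axis=0)  # [1, 300, 300, 3]

interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
```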
### Output Signature

The model outputs four arrays, mapped to the indices 0-3. Arrays 0, 1, and 2
describe `N` detected objects, with one element in each array corresponding to
each object.

<table>
<thead>
<tr>
<th>Index</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Locations</td>
<td>Multidimensional array of [N][4] floating point values between 0 and 1, the inner arrays representing bounding boxes in the form [top, left, bottom, right]</td>
</tr>
<tr>
<td>1</td>
<td>Classes</td>
<td>Array of N integers (output as floating point values) each indicating the index of a class label from the labels file</td>
</tr>
<tr>
<td>2</td>
<td>Scores</td>
<td>Array of N floating point values between 0 and 1 representing probability that a class was detected</td>
</tr>
<tr>
<td>3</td>
<td>Number of detections</td>
<td>Integer value of N</td>
</tr>
</tbody>
</table>

NOTE: The number of results (`N`, which is 10 for the starter model) is a parameter set while
exporting the detection model to TensorFlow Lite. See
<a href="#model-customization">Model customization</a> for more details.
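
Continuing the Python sketch from the input section, the four output arrays can be read back from the interpreter and filtered with a score threshold. This assumes the output tensors follow the index order in the table above and that `interpreter` and `image` come from the earlier sketch.

```python
# Sketch: read the four output arrays and keep detections scoring above 0.5.
output_details = interpreter.get_output_details()
boxes = interpreter.get_tensor(output_details[0]["index"])[0]    # [N, 4] boxes as [top, left, bottom, right], 0..1
classes = interpreter.get_tensor(output_details[1]["index"])[0]  # [N] label indices (stored as floats)
scores = interpreter.get_tensor(output_details[2]["index"])[0]   # [N] confidence scores, 0..1
count = int(interpreter.get_tensor(output_details[3]["index"])[0])

for i in range(count):
    if scores[i] < 0.5:  # cut-off threshold, as discussed under "Confidence score"
        continue
    top, left, bottom, right = boxes[i]
    # Convert normalized coordinates to pixel coordinates of the original image.
    box_px = (int(top * image.height), int(left * image.width),
              int(bottom * image.height), int(right * image.width))
    print(int(classes[i]), float(scores[i]), box_px)
```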
For example, imagine a model has been trained to detect apples, bananas, and
strawberries. When provided an image, it will output a set number of detection
results - in this example, 5.

<table style="width: 60%;">
<thead>
@@ -95,7 +159,7 @@ this example, 5.
</tbody>
</table>

### Confidence score

#### Confidence score

To interpret these results, we can look at the score and the location for each
detected object. The score is a number between 0 and 1 that indicates confidence
@@ -103,10 +167,10 @@
that the object was genuinely detected. The closer the number is to 1, the more
confident the model is.

Depending on your application, you can decide a cut-off threshold below which
you will discard detection results. For our example, we might decide a sensible
cut-off is a score of 0.5 (meaning a 50% probability that the detection is
valid). In that case, we would ignore the last two objects in the array, because
those confidence scores are below 0.5:

you will discard detection results. For the current example, a sensible cut-off
is a score of 0.5 (meaning a 50% probability that the detection is valid). In
that case, the last two objects in the array would be ignored because those
confidence scores are below 0.5:

<table style="width: 60%;">
<thead>
@@ -158,11 +222,11 @@ positive.

<img src="images/false_positive.png" alt="Screenshot of Android example showing a false positive" width="30%">

### Location

#### Location

For each detected object, the model will return an array of four numbers
representing a bounding rectangle that surrounds its position. For the starter
model we provide, the numbers are ordered as follows:
model provided, the numbers are ordered as follows:

<table style="width: 50%; margin: 0 auto;">
<tbody>
@@ -186,7 +250,9 @@ Note: Object detection models accept input images of a specific size. This is li
## Performance benchmarks

Performance benchmark numbers are generated with the tool
Performance benchmark numbers for our
<a class="button button-primary" href="https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip">starter
model</a> are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

<table>
@@ -226,79 +292,53 @@ Performance benchmark numbers are generated with the tool

\*\* 2 threads used on iPhone for the best performance result.

## Starter model

## Model Customization

We recommend starting with this pre-trained quantized COCO SSD MobileNet v1
model.

### Pre-trained models

<a class="button button-primary" href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">Download
starter model and labels</a>

Mobile-optimized detection models with a variety of latency and precision
characteristics can be found in the
[Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#mobile-models).
Each one of them follows the input and output signatures described in the
following sections.

### Uses and limitations

Most of the download zips contain a `model.tflite` file. If there isn't one, a
TensorFlow Lite flatbuffer can be generated using
[these instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md).
SSD models from the
[TF2 Object Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md)
can also be converted to TensorFlow Lite using the instructions
[here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).
It is important to note that detection models cannot be converted directly using
the [TensorFlow Lite Converter](https://www.tensorflow.org/lite/convert), since
they require an intermediate step of generating a mobile-friendly source model.
The scripts linked above perform this step.
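
As a rough sketch of the second half of that two-step flow for a TF2 SSD model (the export step itself is done with the Object Detection API script linked above; the `exported_model/saved_model` and `model.tflite` paths below are placeholders):

```python
# Sketch: convert a mobile-friendly SavedModel (produced by the Object Detection
# API's TFLite export script) into a TensorFlow Lite flatbuffer.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional post-training quantization
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```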
The object detection model we provide can identify and locate up to 10 objects
in an image. It is trained to recognize 90 classes of objects. For a full list
of classes, see the labels file embedded in the model with
<a href="https://www.tensorflow.org/lite/convert/metadata#visualize_the_metadata">metadata
visualization</a>.

Both the
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md)
and
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md)
exporting scripts have parameters that can enable a larger number of output
objects or slower, more-accurate post processing. Please use `--help` with the
scripts to see an exhaustive list of supported arguments.

If you want to train a model to recognize new classes, see
<a href="#customize_model">Customize model</a>.

> Currently, on-device inference is only optimized with SSD models. Better
> support for other architectures like CenterNet and EfficientDet is being
> investigated.

For the following use cases, you should use a different type of model:

### How to choose a model to customize?

<ul>
<li>Predicting which single label the image most likely represents (see <a href="../image_classification/overview.md">image classification</a>)</li>
<li>Predicting the composition of an image, for example subject versus background (see <a href="../segmentation/overview.md">segmentation</a>)</li>
</ul>

Each model comes with its own precision (quantified by mAP value) and latency
characteristics. You should choose a model that works best for your use case
and intended hardware. For example, the
[Edge TPU](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#pixel4-edge-tpu-models)
models are ideal for inference on Google's Edge TPU on Pixel 4.
### Input

You can use our
[benchmark tool](https://www.tensorflow.org/lite/performance/measurement) to
evaluate models and choose the most efficient option available.

The model takes an image as input. The expected image is 300x300 pixels, with
three channels (red, blue, and green) per pixel. This should be fed to the model
as a flattened buffer of 270,000 byte values (300x300x3). Since the model is
<a href="../../performance/post_training_quantization.md">quantized</a>, each
value should be a single byte representing a value between 0 and 255.

### Output

The model outputs four arrays, mapped to the indices 0-3. Arrays 0, 1, and 2
describe 10 detected objects, with one element in each array corresponding to
each object. There will always be 10 objects detected.

<table>
<thead>
<tr>
<th>Index</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Locations</td>
<td>Multidimensional array of [10][4] floating point values between 0 and 1, the inner arrays representing bounding boxes in the form [top, left, bottom, right]</td>
</tr>
<tr>
<td>1</td>
<td>Classes</td>
<td>Array of 10 integers (output as floating point values) each indicating the index of a class label from the labels file</td>
</tr>
<tr>
<td>2</td>
<td>Scores</td>
<td>Array of 10 floating point values between 0 and 1 representing probability that a class was detected</td>
</tr>
<tr>
<td>3</td>
<td>Number of detections</td>
<td>Array of length 1 containing a floating point value expressing the total number of detection results</td>
</tr>
</tbody>
</table>
## Customize model

## Fine-tuning models on custom data

The pre-trained models we provide are trained to detect 90 classes of objects.
For a full list of classes, see the labels file in the
@@ -309,8 +349,15 @@
You can use a technique known as transfer learning to re-train a model to
recognize classes not in the original set. For example, you could re-train the
model to detect multiple types of vegetable, despite there only being one
vegetable in the original training data. To do this, you will need a set of
training images for each of the new labels you wish to train.
training images for each of the new labels you wish to train. Please see our
[Few-shot detection Colab](https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tflite.ipynb)
as an example of fine-tuning a pre-trained model with few examples.

Learn how to perform transfer learning in
<a href="https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193">Training
and serving a real-time mobile object detector in 30 minutes</a>.
For fine-tuning with larger datasets, take a look at these guides for
training your own models with the TensorFlow Object Detection API:
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_training_and_evaluation.md),
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md).
Once trained, they can be converted to a TFLite-friendly format with the
instructions here:
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md),
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).