Update detection documentation with latest models & instructions

PiperOrigin-RevId: 336694589 Change-Id: I6932c3a3f91eb184acd7e51ccbfe80c74b077c7d
2020-10-12 10:38:44 -07:00 · 2020-10-12 10:38:44 -07:00 · d855fc792a
commit d855fc792a
parent b81b896808
1 changed files with 149 additions and 102 deletions
--- a/tensorflow/lite/g3doc/models/object_detection/overview.md
+++ b/tensorflow/lite/g3doc/models/object_detection/overview.md
@ -2,45 +2,55 @@

 <img src="../images/detection.png" class="attempt-right">

-Detect multiple objects within an image, with bounding boxes. Recognize 90
-different classes of objects.
+Given an image or a video stream, an object detection model can identify which
+of a known set of objects might be present and provide information about their
+positions within the image.
+
+For example, this screenshot of the <a href="#get_started">example
+application</a> shows how two objects have been recognized and their positions
+annotated:
+
+<img src="images/android_apple_banana.png" alt="Screenshot of Android example" width="30%">

 ## Get started

-If you are new to TensorFlow Lite and are working with Android or iOS, we
-recommend exploring the following example applications that can help you get
-started.
+If you are new to TensorFlow Lite and are working with Android or iOS, download
+the following example applications to get started.

 <a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android">Android
 example</a>
 <a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/ios">iOS
 example</a>

-If you are using a platform other than Android or iOS, or you are already
-familiar with the <a href="https://www.tensorflow.org/api_docs/python/tf/lite">TensorFlow Lite APIs</a>, you can
-download our starter object detection model and the accompanying labels.
+If you are using a platform other than Android or iOS, or if you are already
+familiar with the
+<a href="https://www.tensorflow.org/api_docs/python/tf/lite">TensorFlow Lite
+APIs</a>, you can download the starter object detection model and the
+accompanying labels.

 <a class="button button-primary" href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">Download
 starter model with Metadata</a>

-For more information about the starter model, see
-<a href="#starter_model">Starter model</a>.
-
 For more information about Metadata and associated fields (eg: `labels.txt`) see
 <a href="https://www.tensorflow.org/lite/convert/metadata#read_the_metadata_from_models">Read
 the metadata from models</a>

-## What is object detection?
+If you want to train a custom detection model for your own task, see
+<a href="#model-customization">Model customization</a>.

-Given an image or a video stream, an object detection model can identify which
-of a known set of objects might be present and provide information about their
-positions within the image.
+For the following use cases, you should use a different type of model:

-For example, this screenshot of our <a href="#get_started">example
-application</a> shows how two objects have been recognized and their positions
-annotated:
+<ul>
+  <li>Predicting which single label the image most likely represents (see <a href="../image_classification/overview.md">image classification</a>)</li>
+  <li>Predicting the composition of an image, for example subject versus background (see <a href="../segmentation/overview.md">segmentation</a>)</li>
+</ul>

-<img src="images/android_apple_banana.png" alt="Screenshot of Android example" width="30%">
+## Model description
+
+This section describes the signature for
+[Single-Shot Detector](https://arxiv.org/abs/1512.02325) models converted to
+TensorFlow Lite from the
+[TensorFlow Object Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/).

 An object detection model is trained to detect the presence and location of
 multiple classes of objects. For example, a model might be trained with images
@ -48,15 +58,69 @@ that contain various pieces of fruit, along with a _label_ that specifies the
 class of fruit they represent (e.g. an apple, a banana, or a strawberry), and
 data specifying where each object appears in the image.

-When we subsequently provide an image to the model, it will output a list of the
-objects it detects, the location of a bounding box that contains each object,
-and a score that indicates the confidence that detection was correct.
+When an image is subsequently provided to the model, it will output a list of
+the objects it detects, the location of a bounding box that contains each
+object, and a score that indicates the confidence that detection was correct.

-### Model output
+### Input signature

-Imagine a model has been trained to detect apples, bananas, and strawberries.
-When we pass it an image, it will output a set number of detection results - in
-this example, 5.
+The model takes an image as input.
+
+Let's assume the expected image is 300x300 pixels, with three channels (red,
+green, and blue) per pixel. This should be fed to the model as a flattened
+buffer of 270,000 byte values (300x300x3). If the model is
+<a href="../../performance/post_training_quantization.md">quantized</a>, each
+value should be a single byte representing a value between 0 and 255.
+
+You can take a look at our
+[example app code](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android)
+to understand how to do this pre-processing on Android.
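
As a rough illustration of the buffer layout described in the added Input Signature text, the pure-Python sketch below resizes an RGB image to 300x300 and flattens it into 270,000 bytes. The function name `to_input_buffer`, its parameters, and the use of nearest-neighbour resizing are assumptions made for this example and are not taken from the example app; a real application would more likely use an image library and read the expected size from the model.

```python
def to_input_buffer(pixels, width, height, size=300):
    """Resize an RGB image to size x size and flatten it to bytes.

    pixels: flat, row-major sequence of (r, g, b) tuples, each value 0..255
    width, height: dimensions of the source image
    size: edge length the model expects (300 is the value assumed in the text)
    """
    out = bytearray(size * size * 3)
    i = 0
    for y in range(size):
        # Nearest-neighbour sampling: map each target row to a source row.
        src_y = y * height // size
        for x in range(size):
            src_x = x * width // size
            r, g, b = pixels[src_y * width + src_x]
            out[i:i + 3] = bytes((r, g, b))
            i += 3
    return bytes(out)


# A tiny 2x2 source image, used only to show the resulting buffer size.
tiny = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (10, 20, 30)]
buffer = to_input_buffer(tiny, width=2, height=2)
print(len(buffer))  # 300 * 300 * 3 bytes
```

Each byte is already in the 0 to 255 range that a quantized model expects, so no further scaling is applied in this sketch.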
+
+### Output signature
+
+The model outputs four arrays, mapped to the indices 0-3. Arrays 0, 1, and 2
+describe `N` detected objects, with one element in each array corresponding to
+each object.
+
+<table>
+  <thead>
+    <tr>
+      <th>Index</th>
+      <th>Name</th>
+      <th>Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>0</td>
+      <td>Locations</td>
+      <td>Multidimensional array of [N][4] floating point values between 0 and 1, the inner arrays representing bounding boxes in the form [top, left, bottom, right]</td>
+    </tr>
+    <tr>
+      <td>1</td>
+      <td>Classes</td>
+      <td>Array of N integers (output as floating point values) each indicating the index of a class label from the labels file</td>
+    </tr>
+    <tr>
+      <td>2</td>
+      <td>Scores</td>
+      <td>Array of N floating point values between 0 and 1 representing probability that a class was detected</td>
+    </tr>
+    <tr>
+      <td>3</td>
+      <td>Number of detections</td>
+      <td>Integer value of N</td>
+    </tr>
+  </tbody>
+</table>
+
+NOTE: The number of results, `N` (10 for the starter model), is a parameter set
+while exporting the detection model to TensorFlow Lite. See
+<a href="#model-customization">Model customization</a> for more details.
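
To make the four-array layout in the added table more concrete, here is a minimal Python sketch that combines the arrays into one record per detected object and scales a normalized box to pixels. The function names, the dictionary layout, and the assumption that the class index maps directly onto a `labels` list are all illustrative; real label files may need an offset, and the arrays are assumed to have had any batch dimension removed already.

```python
def parse_detections(locations, classes, scores, count, labels):
    """Combine the four output arrays into one record per detected object.

    locations: N boxes as [top, left, bottom, right], each value in 0..1
    classes:   N class indices, delivered as floating point values
    scores:    N confidence values in 0..1
    count:     number of detections N (may arrive as a float)
    labels:    list of class names indexed by class index (assumed layout)
    """
    results = []
    for i in range(int(count)):
        class_index = int(classes[i])
        results.append({
            "label": labels[class_index],
            "score": float(scores[i]),
            "box": tuple(locations[i]),
        })
    return results


def box_to_pixels(box, image_width, image_height):
    """Scale a normalized [top, left, bottom, right] box to pixel coordinates."""
    top, left, bottom, right = box
    return (round(top * image_height), round(left * image_width),
            round(bottom * image_height), round(right * image_width))
```

Keeping the box normalized in the record and converting only at drawing time means the same result can be rendered onto the original camera frame rather than the resized model input.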
+
+For example, imagine a model has been trained to detect apples, bananas, and
+strawberries. When provided an image, it will output a set number of detection
+results - in this example, 5.

 <table style="width: 60%;">
  <thead>
@ -95,7 +159,7 @@ this example, 5.
  </tbody>
 </table>

-### Confidence score
+#### Confidence score

 To interpret these results, we can look at the score and the location for each
 detected object. The score is a number between 0 and 1 that indicates confidence
@ -103,10 +167,10 @@ that the object was genuinely detected. The closer the number is to 1, the more
 confident the model is.

 Depending on your application, you can decide a cut-off threshold below which
-you will discard detection results. For our example, we might decide a sensible
-cut-off is a score of 0.5 (meaning a 50% probability that the detection is
-valid). In that case, we would ignore the last two objects in the array, because
-those confidence scores are below 0.5:
+you will discard detection results. For the current example, a sensible cut-off
+is a score of 0.5 (meaning a 50% probability that the detection is valid). In
+that case, the last two objects in the array would be ignored because those
+confidence scores are below 0.5:

 <table style="width: 60%;">
  <thead>
@ -158,11 +222,11 @@ positive.

 <img src="images/false_positive.png" alt="Screenshot of Android example showing a false positive" width="30%">
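
The cut-off described in this section can be sketched in a few lines of Python. The record layout and the scores below are illustrative values invented for this example, not output from the starter model; the 0.5 default mirrors the threshold discussed in the text.

```python
def filter_by_score(detections, threshold=0.5):
    """Keep only detections whose score is at or above the threshold."""
    return [d for d in detections if d["score"] >= threshold]


# Illustrative results only: five detections, two of them below the cut-off.
sample = [
    {"label": "apple", "score": 0.9},
    {"label": "banana", "score": 0.8},
    {"label": "strawberry", "score": 0.7},
    {"label": "banana", "score": 0.3},
    {"label": "apple", "score": 0.1},
]

kept = filter_by_score(sample)
print([d["label"] for d in kept])
```

Raising the threshold trades missed objects for fewer false positives, so the value is usually tuned per application rather than fixed.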

-### Location
+#### Location

 For each detected object, the model will return an array of four numbers
 representing a bounding rectangle that surrounds its position. For the starter
-model we provide, the numbers are ordered as follows:
+model provided, the numbers are ordered as follows:

 <table style="width: 50%; margin: 0 auto;">
  <tbody>
@ -186,7 +250,9 @@ Note: Object detection models accept input images of a specific size. This is li

 ## Performance benchmarks

-Performance benchmark numbers are generated with the tool
+Performance benchmark numbers for our
+<a class="button button-primary" href="https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip">starter
+model</a> are generated with the tool
 [described here](https://www.tensorflow.org/lite/performance/benchmarks).

 <table>
@ -226,79 +292,53 @@ Performance benchmark numbers are generated with the tool

 \*\* 2 threads used on iPhone for the best performance result.

-## Starter model
+## Model customization

-We recommend starting with this pre-trained quantized COCO SSD MobileNet v1
-model.
+### Pre-trained models

-<a class="button button-primary" href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">Download
-starter model and labels</a>
+Mobile-optimized detection models with a variety of latency and precision
+characteristics can be found in the
+[Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#mobile-models).
+Each one of them follows the input and output signatures described in the
+Model description section above.

-### Uses and limitations
+Most of the download zips contain a `model.tflite` file. If there isn't one, a
+TensorFlow Lite flatbuffer can be generated using
+[these instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md).
+SSD models from the
+[TF2 Object Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md)
+can also be converted to TensorFlow Lite using the instructions
+[here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).
+It is important to note that detection models cannot be converted directly using
+the [TensorFlow Lite Converter](https://www.tensorflow.org/lite/convert), since
+they require an intermediate step of generating a mobile-friendly source model.
+The scripts linked above perform this step.

-The object detection model we provide can identify and locate up to 10 objects
-in an image. It is trained to recognize 90 classes of objects. For a full list
-of classes, see the labels file embedded in the model with
-<a href="https://www.tensorflow.org/lite/convert/metadata#visualize_the_metadata">metadata
-visualiztion</a>.
+Both the
+[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md)
+and
+[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md)
+exporting scripts have parameters that can enable a larger number of output
+objects or slower, more accurate post-processing. Use `--help` with the
+scripts to see an exhaustive list of supported arguments.

-If you want to train a model to recognize new classes, see
-<a href="#customize_model">Customize model</a>.
+> Currently, on-device inference is only optimized with SSD models. Better
+> support for other architectures like CenterNet and EfficientDet is being
+> investigated.

-For the following use cases, you should use a different type of model:
+### How to choose a model to customize?

-<ul>
-  <li>Predicting which single label the image most likely represents (see <a href="../image_classification/overview.md">image classification</a>)</li>
-  <li>Predicting the composition of an image, for example subject versus background (see <a href="../segmentation/overview.md">segmentation</a>)</li>
-</ul>
+Each model comes with its own precision (quantified by mAP value) and latency
+characteristics. You should choose a model that works best for your use case
+and intended hardware. For example, the
+[Edge TPU](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#pixel4-edge-tpu-models)
+models are ideal for inference on Google's Edge TPU on Pixel 4.

-### Input
+You can use our
+[benchmark tool](https://www.tensorflow.org/lite/performance/measurement) to
+evaluate models and choose the most efficient option available.

-The model takes an image as input. The expected image is 300x300 pixels, with
-three channels (red, blue, and green) per pixel. This should be fed to the model
-as a flattened buffer of 270,000 byte values (300x300x3). Since the model is
-<a href="../../performance/post_training_quantization.md">quantized</a>, each
-value should be a single byte representing a value between 0 and 255.
-
-### Output
-
-The model outputs four arrays, mapped to the indices 0-4. Arrays 0, 1, and 2
-describe 10 detected objects, with one element in each array corresponding to
-each object. There will always be 10 objects detected.
-
-<table>
-  <thead>
-    <tr>
-      <th>Index</th>
-      <th>Name</th>
-      <th>Description</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td>0</td>
-      <td>Locations</td>
-      <td>Multidimensional array of [10][4] floating point values between 0 and 1, the inner arrays representing bounding boxes in the form [top, left, bottom, right]</td>
-    </tr>
-    <tr>
-      <td>1</td>
-      <td>Classes</td>
-      <td>Array of 10 integers (output as floating point values) each indicating the index of a class label from the labels file</td>
-    </tr>
-    <tr>
-      <td>2</td>
-      <td>Scores</td>
-      <td>Array of 10 floating point values between 0 and 1 representing probability that a class was detected</td>
-    </tr>
-    <tr>
-      <td>3</td>
-      <td>Number and detections</td>
-      <td>Array of length 1 containing a floating point value expressing the total number of detection results</td>
-    </tr>
-  </tbody>
-</table>
-
-## Customize model
+## Fine-tuning models on custom data

 The pre-trained models we provide are trained to detect 90 classes of objects.
 For a full list of classes, see the labels file in the
@ -309,8 +349,15 @@ You can use a technique known as transfer learning to re-train a model to
 recognize classes not in the original set. For example, you could re-train the
 model to detect multiple types of vegetable, despite there only being one
 vegetable in the original training data. To do this, you will need a set of
-training images for each of the new labels you wish to train.
+training images for each of the new labels you wish to train. Please see our
+[Few-shot detection Colab](https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tflite.ipynb)
+as an example of fine-tuning a pre-trained model with few examples.

-Learn how to perform transfer learning in
-<a href="https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193">Training
-and serving a real-time mobile object detector in 30 minutes</a>.
+For fine-tuning with larger datasets, take a look at these guides for
+training your own models with the TensorFlow Object Detection API:
+[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_training_and_evaluation.md),
+[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md).
+Once trained, they can be converted to a TFLite-friendly format with the
+instructions here:
+[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md),
+[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).