Update detection documentation with latest models & instructions

PiperOrigin-RevId: 336694589
Change-Id: I6932c3a3f91eb184acd7e51ccbfe80c74b077c7d
Sachin Joglekar 2020-10-12 10:38:44 -07:00 committed by TensorFlower Gardener
parent b81b896808
commit d855fc792a


<img src="../images/detection.png" class="attempt-right"> <img src="../images/detection.png" class="attempt-right">
Detect multiple objects within an image, with bounding boxes. Recognize 90 Given an image or a video stream, an object detection model can identify which
different classes of objects. of a known set of objects might be present and provide information about their
positions within the image.
For example, this screenshot of the <a href="#get_started">example
application</a> shows how two objects have been recognized and their positions
annotated:
<img src="images/android_apple_banana.png" alt="Screenshot of Android example" width="30%">
## Get started

If you are new to TensorFlow Lite and are working with Android or iOS, download
the following example applications to get started.

<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android">Android
example</a>
<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/ios">iOS
example</a>

If you are using a platform other than Android or iOS, or if you are already
familiar with the
<a href="https://www.tensorflow.org/api_docs/python/tf/lite">TensorFlow Lite
APIs</a>, you can download the starter object detection model and the
accompanying labels.

<a class="button button-primary" href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">Download
starter model with Metadata</a>

For more information about Metadata and associated fields (e.g. `labels.txt`) see
<a href="https://www.tensorflow.org/lite/convert/metadata#read_the_metadata_from_models">Read
the metadata from models</a>.

If you want to train a custom detection model for your own task, see
<a href="#model-customization">Model customization</a>.

For the following use cases, you should use a different type of model:

<ul>
  <li>Predicting which single label the image most likely represents (see <a href="../image_classification/overview.md">image classification</a>)</li>
  <li>Predicting the composition of an image, for example subject versus background (see <a href="../segmentation/overview.md">segmentation</a>)</li>
</ul>

## Model description

This section describes the signature for
[Single-Shot Detector](https://arxiv.org/abs/1512.02325) models converted to
TensorFlow Lite from the
[TensorFlow Object Detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/).

An object detection model is trained to detect the presence and location of
multiple classes of objects. For example, a model might be trained with images
that contain various pieces of fruit, along with a _label_ that specifies the
class of fruit they represent (e.g. an apple, a banana, or a strawberry), and
data specifying where each object appears in the image.

When an image is subsequently provided to the model, it will output a list of
the objects it detects, the location of a bounding box that contains each
object, and a score that indicates the confidence that detection was correct.

### Input Signature

The model takes an image as input.

Let's assume the expected image is 300x300 pixels, with three channels (red,
blue, and green) per pixel. This should be fed to the model as a flattened
buffer of 270,000 byte values (300x300x3). If the model is
<a href="../../performance/post_training_quantization.md">quantized</a>, each
value should be a single byte representing a value between 0 and 255.
You can take a look at our
[example app code](https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android)
to understand how to do this pre-processing on Android.
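
Outside of Android, the same pre-processing can be sketched with the Python
TensorFlow Lite interpreter. This is a minimal illustration rather than part of
the starter model package: the file names and the use of Pillow for image
loading are assumptions made for the example.

```python
import numpy as np
from PIL import Image
import tensorflow as tf

# Load the detection model (the path is a placeholder for this example).
interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Resize the raw image to the 300x300 RGB input the model expects.
image = Image.open("fruit.jpg").convert("RGB").resize((300, 300))

# A quantized model takes uint8 values in [0, 255], shaped [1, 300, 300, 3].
input_data = np.expand_dims(np.asarray(image, dtype=np.uint8), axis=0)

interpreter.set_tensor(input_details["index"], input_data)
interpreter.invoke()
```
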
### Output Signature
The model outputs four arrays, mapped to the indices 0-3. Arrays 0, 1, and 2
describe `N` detected objects, with one element in each array corresponding to
each object.
<table>
<thead>
<tr>
<th>Index</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Locations</td>
<td>Multidimensional array of [N][4] floating point values between 0 and 1, the inner arrays representing bounding boxes in the form [top, left, bottom, right]</td>
</tr>
<tr>
<td>1</td>
<td>Classes</td>
<td>Array of N integers (output as floating point values) each indicating the index of a class label from the labels file</td>
</tr>
<tr>
<td>2</td>
<td>Scores</td>
<td>Array of N floating point values between 0 and 1 representing probability that a class was detected</td>
</tr>
<tr>
<td>3</td>
<td>Number of detections</td>
<td>Integer value of N</td>
</tr>
</tbody>
</table>
NOTE: The number of results (`N` above, which is 10 for the starter model) is a
parameter set while exporting the detection model to TensorFlow Lite. See
<a href="#model-customization">Model customization</a> for more details.
For example, imagine a model has been trained to detect apples, bananas, and
strawberries. When provided an image, it will output a set number of detection
results - in this example, 5.
<table style="width: 60%;"> <table style="width: 60%;">
<thead> <thead>
@ -95,7 +159,7 @@ this example, 5.
</tbody> </tbody>
</table> </table>
#### Confidence score

To interpret these results, we can look at the score and the location for each
detected object. The score is a number between 0 and 1 that indicates confidence
that the object was genuinely detected. The closer the number is to 1, the more
confident the model is.

Depending on your application, you can decide a cut-off threshold below which
you will discard detection results. For the current example, a sensible cut-off
is a score of 0.5 (meaning a 50% probability that the detection is valid). In
that case, the last two objects in the array would be ignored because those
confidence scores are below 0.5:

<table style="width: 60%;"> <table style="width: 60%;">
<thead> <thead>
@ -158,11 +222,11 @@ positive.
<img src="images/false_positive.png" alt="Screenshot of Android example showing a false positive" width="30%"> <img src="images/false_positive.png" alt="Screenshot of Android example showing a false positive" width="30%">
#### Location

For each detected object, the model will return an array of four numbers
representing a bounding rectangle that surrounds its position. For the starter
model provided, the numbers are ordered as follows:

<table style="width: 50%; margin: 0 auto;"> <table style="width: 50%; margin: 0 auto;">
<tbody> <tbody>
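
Since these values are normalized to [0, 1], mapping a box back onto the
original image is just a matter of scaling by the image's dimensions. A small
sketch, where `image_width` and `image_height` are placeholders for your raw
image size:

```python
def to_pixels(box, image_width, image_height):
    """Convert a normalized [top, left, bottom, right] box to pixel coordinates."""
    top, left, bottom, right = box
    return (
        int(top * image_height),
        int(left * image_width),
        int(bottom * image_height),
        int(right * image_width),
    )

# Example: scale the first detection kept by the threshold filter above
# onto a 640x480 camera frame.
top_px, left_px, bottom_px, right_px = to_pixels(kept[0][2], 640, 480)
```
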
Note: Object detection models accept input images of a specific size. This is
likely to be different from the size of the raw image captured by your device's
camera, and you will have to write code to crop and scale your raw image to fit
the model's input size.

## Performance benchmarks

Performance benchmark numbers for our
<a class="button button-primary" href="https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip">starter
model</a> are generated with the tool
[described here](https://www.tensorflow.org/lite/performance/benchmarks).

<table>

\*\* 2 threads used on iPhone for the best performance result.

## Model Customization

### Pre-trained models

Mobile-optimized detection models with a variety of latency and precision
characteristics can be found in the
[Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#mobile-models).
Each one of them follows the input and output signatures described in the
preceding sections.

Most of the download zips contain a `model.tflite` file. If there isn't one, a
TensorFlow Lite flatbuffer can be generated using
[these instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md).
SSD models from the
[TF2 Object Detection Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md)
can also be converted to TensorFlow Lite using the instructions
[here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).
It is important to note that detection models cannot be converted directly using
the [TensorFlow Lite Converter](https://www.tensorflow.org/lite/convert), since
they require an intermediate step of generating a mobile-friendly source model.
The scripts linked above perform this step.
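
For the TF2 path, the flow is roughly: run the export script linked above to
produce an intermediate, mobile-friendly SavedModel, then convert it with the
standard converter. A minimal sketch with a placeholder directory name:

```python
import tensorflow as tf

# The export script writes a mobile-friendly SavedModel; convert that output,
# not the original training checkpoint ("exported_model" is a placeholder).
converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/saved_model")
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```
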
Both the
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md)
and
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md)
exporting scripts have parameters that can enable a larger number of output
objects or slower, more accurate post-processing. Please use `--help` with the
scripts to see an exhaustive list of supported arguments.

> Currently, on-device inference is only optimized with SSD models. Better
> support for other architectures like CenterNet and EfficientDet is being
> investigated.

### How to choose a model to customize?

Each model comes with its own precision (quantified by mAP value) and latency
characteristics. You should choose a model that works best for your use case
and intended hardware. For example, the
[Edge TPU](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md#pixel4-edge-tpu-models)
models are ideal for inference on Google's Edge TPU on Pixel 4.

You can use our
[benchmark tool](https://www.tensorflow.org/lite/performance/measurement) to
evaluate models and choose the most efficient option available.
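
For a quick latency estimate while prototyping, you can also time the
interpreter directly from Python; the benchmark tool linked above remains the
more rigorous option. The model path, thread count, and run counts below are
arbitrary example values.

```python
import time

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="candidate_model.tflite", num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
dummy_input = np.zeros(input_details["shape"], dtype=input_details["dtype"])

# Warm up, then average the latency over repeated invocations.
for _ in range(5):
    interpreter.set_tensor(input_details["index"], dummy_input)
    interpreter.invoke()

runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(input_details["index"], dummy_input)
    interpreter.invoke()
print(f"Average latency: {(time.perf_counter() - start) / runs * 1000:.1f} ms")
```
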
## Fine-tuning models on custom data

The pre-trained models we provide are trained to detect 90 classes of objects.
For a full list of classes, see the labels file in the
<a href="https://tfhub.dev/tensorflow/lite-model/ssd_mobilenet_v1/1/metadata/1?lite-format=tflite">model
metadata</a>.

You can use a technique known as transfer learning to re-train a model to
recognize classes not in the original set. For example, you could re-train the
model to detect multiple types of vegetable, despite there only being one
vegetable in the original training data. To do this, you will need a set of
training images for each of the new labels you wish to train. Please see our
[Few-shot detection Colab](https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tflite.ipynb)
as an example of fine-tuning a pre-trained model with few examples.

For fine-tuning with larger datasets, take a look at these guides for
training your own models with the TensorFlow Object Detection API:
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_training_and_evaluation.md),
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md).

Once trained, these models can be converted to a TFLite-friendly format with the
instructions here:
[TF1](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tensorflowlite.md),
[TF2](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md).