diff --git a/tensorflow/lite/g3doc/_book.yaml b/tensorflow/lite/g3doc/_book.yaml index 7eaf64c9a4a..ac9d5349642 100644 --- a/tensorflow/lite/g3doc/_book.yaml +++ b/tensorflow/lite/g3doc/_book.yaml @@ -79,40 +79,36 @@ upper_tabs: - title: Optimizing for mobile path: /lite/tfmobile/optimizing - # - name: Models - # contents: - # - title: Overview - # path: /lite/models/ - # - title: Hosted models - # path: /lite/models/hosted - # - title: Image classification - # section: - # - title: Overview - # path: /lite/models/image_classification/overview - # - title: Android - # path: /lite/models/image_classification/android - # - title: iOS - # path: /lite/models/image_classification/ios - # - title: Object detection - # section: - # - title: Overview - # path: /lite/models/object_detection/overview - # - title: Speech recognition - # section: - # - title: Overview - # path: /lite/models/speech_recognition/overview - # - title: Pose estimation - # section: - # - title: Overview - # path: /lite/models/pose_estimation/overview - # - title: Segmentation - # section: - # - title: Overview - # path: /lite/models/segmentation/overview - # - title: Smart reply - # section: - # - title: Overview - # path: /lite/models/smart_reply/overview + - name: Models + contents: + - title: Overview + path: /lite/models/ + - title: Hosted models + path: /lite/models/hosted + - title: Image classification + section: + - title: Overview + path: /lite/models/image_classification/overview + - title: Android + path: /lite/models/image_classification/android + - title: iOS + path: /lite/models/image_classification/ios + - title: Object detection + section: + - title: Overview + path: /lite/models/object_detection/overview + - title: Pose estimation + section: + - title: Overview + path: /lite/models/pose_estimation/overview + - title: Segmentation + section: + - title: Overview + path: /lite/models/segmentation/overview + - title: Smart reply + section: + - title: Overview + path: /lite/models/smart_reply/overview - name: API skip_translation: true diff --git a/tensorflow/lite/g3doc/models/hosted.md b/tensorflow/lite/g3doc/models/hosted.md index 84421e1fc4b..bc4b90824f0 100644 --- a/tensorflow/lite/g3doc/models/hosted.md +++ b/tensorflow/lite/g3doc/models/hosted.md @@ -1,63 +1,27 @@ # Hosted models -# AutoML mobile image classification models (Float Models) +The following is an incomplete list of pre-trained models optimized to work with +TensorFlow Lite. 
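For example, any archive from the tables that follow can be downloaded and run
with the TensorFlow Lite Python interpreter along these lines. This is only a
minimal sketch; the archive URL and the extracted file name below are
placeholders for whichever model you choose.

```python
import tarfile
import urllib.request

import numpy as np
import tensorflow as tf

# Placeholder archive: substitute the model you want to evaluate.
URL = ("http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/"
       "mobilenet_v1_0.25_128_quant.tgz")

archive_path, _ = urllib.request.urlretrieve(URL)
with tarfile.open(archive_path) as archive:
    archive.extractall("model")

# The extracted .tflite file name is assumed to match the archive name.
interpreter = tf.lite.Interpreter(
    model_path="model/mobilenet_v1_0.25_128_quant.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Feed dummy data shaped and typed to match the model's input tensor.
dummy_input = np.zeros(input_details["shape"], dtype=input_details["dtype"])
interpreter.set_tensor(input_details["index"], dummy_input)
interpreter.invoke()

predictions = interpreter.get_tensor(output_details["index"])
print(predictions.shape)
```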
-Model Name | Paper_Model_Files | Model_Size | Top-1 Accuracy | Top-5 Accuracy | TF Lite Performance^ -------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | ---------: | -------------: | -------------: | ---------------------: -MnasNet_0.50_224| [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_0.5_224_09_07_2018.tgz) | 8.5 Mb | 68.03% | 87.79% | 37 ms -MnasNet_0.75_224| [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_0.75_224_09_07_2018.tgz) | 12 Mb | 71.72% | 90.17% | 61 ms -MnasNet_1.0_96| [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.0_96_09_07_2018.tgz) | 17 Mb | 62.33% | 83.98% | 23 ms -MnasNet_1.0_128| [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.0_128_09_07_2018.tgz) | 17 Mb | 67.32% | 87.70% | 34 ms -MnasNet_1.0_160| [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.0_160_09_07_2018.tgz) | 17 Mb | 70.63% | 89.58% | 51 ms -MnasNet_1.0_192| [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.0_192_09_07_2018.tgz) | 17 Mb | 72.56% | 90.76% | 70 ms -MnasNet_1.0_224| [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.0_224_09_07_2018.tgz) | 17 Mb | 74.08% | 91.75% | 93 ms -MnasNet_1.3_224| [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.3_224_09_07_2018.tgz) | 24 Mb | 75.24% | 92.55% | 152 ms +To get started choosing a model, visit Models. +Note: The best model for a given application depends on your requirements. For +example, some applications might benefit from higher accuracy, while others +require a small model size. You should test your application with a variety of +models to find the optimal balance between size, performance, and accuracy. -^ Performance numbers are generated on Pixel-1 using single thread large BIG core. +## Image classification +For more information about image classification, see +Image classification. 
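The Top-1 and Top-5 accuracy columns below measure how often the correct label
is, respectively, the single most probable label or among the five most
probable labels in the model's output. As an illustrative sketch, the top
labels can be read from a classification model's output like this
(`probabilities` is the interpreter's output array and `labels` comes from the
bundled labels file; quantized models emit `uint8` scores, but they can be
ranked the same way):

```python
import numpy as np

def top_k_labels(probabilities, labels, k=5):
    # Return the k most probable (label, score) pairs from a model's output.
    probabilities = np.asarray(probabilities)
    top_indices = probabilities.argsort()[::-1][:k]
    return [(labels[i], float(probabilities[i])) for i in top_indices]

# Hypothetical three-class model output, as in the image classification overview.
animal_labels = ["rabbit", "hamster", "dog"]
print(top_k_labels([0.07, 0.02, 0.91], animal_labels, k=3))
# [('dog', 0.91), ('rabbit', 0.07), ('hamster', 0.02)]
```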
-## Image classification (Float Models) +### Quantized models -Model Name | Paper_Model_Files^ | Model_Size | Top-1 Accuracy | Top-5 Accuracy | TF Lite Performance^^ | Tensorflow Performance ---------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | ---------: | -------------: | -------------: | --------------------: | ---------------------: -DenseNet | [paper](https://arxiv.org/abs/1608.06993), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/densenet_2018_04_27.tgz) | 43.6 Mb | 64.2% | 85.6% | 894 ms | 1262 ms -SqueezeNet | [paper](https://arxiv.org/abs/1602.07360), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/squeezenet_2018_04_27.tgz) | 5.0 Mb | 49.0% | 72.9% | 224 ms | 255 ms -NASNet mobile | [paper](https://arxiv.org/abs/1707.07012), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/nasnet_mobile_2018_04_27.tgz) | 21.4 Mb | 73.9% | 91.5% | 261 ms | 389 ms -NASNet large | [paper](https://arxiv.org/abs/1707.07012), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/nasnet_large_2018_04_27.tgz) | 355.3 Mb | 82.6% | 96.1% | 6697 ms | 7940 ms -ResNet_V2_101 | [paper](https://arxiv.org/abs/1603.05027), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite_11_05_08/resnet_v2_101.tgz) | 178.3 Mb | 76.8% | 93.6% | 1880 ms | 1970 ms -Inception_V3 | [paper](http://arxiv.org/abs/1512.00567), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/inception_v3_2018_04_27.tgz) | 95.3 Mb | 77.9% | 93.8% | 1433 ms | 1522 ms -Inception_V4 | [paper](http://arxiv.org/abs/1602.07261), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/inception_v4_2018_04_27.tgz) | 170.7 Mb | 80.1% | 95.1% | 2986 ms | 3139 ms -Inception_ResNet_V2 | [paper](https://arxiv.org/abs/1602.07261), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/inception_resnet_v2_2018_04_27.tgz) | 121.0 Mb | 77.5% | 94.0% | 2731 ms | 2926 ms -Mobilenet_V1_0.25_128 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.25_128.tgz) | 1.9 Mb | 41.4% | 66.2% | 6.2 ms | 13.0 ms -Mobilenet_V1_0.25_160 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.25_160.tgz) | 1.9 Mb | 45.4% | 70.2% | 8.6 ms | 19.5 ms -Mobilenet_V1_0.25_192 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.25_192.tgz) | 1.9 Mb | 47.1% | 72.0% | 12.1 ms | 27.8 ms -Mobilenet_V1_0.25_224 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.25_224.tgz) | 1.9 Mb | 49.7% | 74.1% | 16.2 ms | 37.3 ms -Mobilenet_V1_0.50_128 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.5_128.tgz) | 5.3 Mb | 56.2% | 79.3% | 18.1 ms | 29.9 ms -Mobilenet_V1_0.50_160 | 
[paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.5_160.tgz) | 5.3 Mb | 59.0% | 81.8% | 26.8 ms | 45.9 ms -Mobilenet_V1_0.50_192 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.5_192.tgz) | 5.3 Mb | 61.7% | 83.5% | 35.6 ms | 65.3 ms -Mobilenet_V1_0.50_224 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.5_224.tgz) | 5.3 Mb | 63.2% | 84.9% | 47.6 ms | 164.2 ms -Mobilenet_V1_0.75_128 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.75_128.tgz) | 10.3 Mb | 62.0% | 83.8% | 34.6 ms | 48.7 ms -Mobilenet_V1_0.75_160 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.75_160.tgz) | 10.3 Mb | 65.2% | 85.9% | 51.3 ms | 75.2 ms -Mobilenet_V1_0.75_192 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.75_192.tgz) | 10.3 Mb | 67.1% | 87.2% | 71.7 ms | 107.0 ms -Mobilenet_V1_0.75_224 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.75_224.tgz) | 10.3 Mb | 68.3% | 88.1% | 95.7 ms | 143.4 ms -Mobilenet_V1_1.0_128 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_128.tgz) | 16.9 Mb | 65.2% | 85.7% | 57.4 ms | 76.8 ms -Mobilenet_V1_1.0_160 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_160.tgz) | 16.9 Mb | 68.0% | 87.7% | 86.0 ms | 117.7 ms -Mobilenet_V1_1.0_192 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_192.tgz) | 16.9 Mb | 69.9% | 89.1% | 118.6 ms | 167.3 ms -Mobilenet_V1_1.0_224 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_224.tgz) | 16.9 Mb | 71.0% | 89.9% | 160.1 ms | 224.3 ms -Mobilenet_V2_1.0_224 | [paper](https://arxiv.org/pdf/1801.04381.pdf), [tflite&pb](http://download.tensorflow.org/models/tflite_11_05_08/mobilenet_v2_1.0_224.tgz) | 14.0 Mb | 71.8% | 90.6% | 117 ms | +Quantized image +classification models offer the smallest model size and fastest performance, at +the expense of accuracy. -^ The model files include both TF Lite FlatBuffer and Tensorflow frozen Graph. - -^^ The performance numbers are generated in the benchmark on Pixel-2 using -single thread large core. - -^^ Accuracy numbers were computed using the -[TFLite accuracy tool](../tools/accuracy/ilsvrc) . 
- -## Image classification (Quantized Models) - -Model Name | Paper_Model_Files | Model_Size | Top-1 Accuracy | Top-5 Accuracy | TF Lite Performance +Model name | Paper and model | Model size | Top-1 accuracy | Top-5 accuracy | TF Lite performance --------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------: | ---------: | -------------: | -------------: | ------------------: Mobilenet_V1_0.25_128_quant | [paper](https://arxiv.org/pdf/1712.05877.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_128_quant.tgz) | 0.5 Mb | 39.5% | 64.4% | 3.7 ms Mobilenet_V1_0.25_160_quant | [paper](https://arxiv.org/pdf/1712.05877.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_160_quant.tgz) | 0.5 Mb | 42.8% | 68.1% | 5.5 ms @@ -81,9 +45,104 @@ Inception_V2_quant | [paper](https://arxiv.org/abs/1512.00567), [tflite Inception_V3_quant | [paper](https://arxiv.org/abs/1806.08342),[tflite&pb](http://download.tensorflow.org/models/tflite_11_05_08/inception_v3_quant.tgz) | 23 Mb | 77.5% | 93.7% | 637 ms Inception_V4_quant | [paper](https://arxiv.org/abs/1602.07261), [tflite&pb](http://download.tensorflow.org/models/inception_v4_299_quant_20181026.tgz) | 41 Mb | 79.5% | 93.9% | 1250.8 ms -## Other models +Note: The model files include both TF Lite FlatBuffer and Tensorflow frozen +Graph. -Model | TF Lite FlatBuffer ------------------------ | :----------------: -[reference](https://research.googleblog.com/2017/11/on-device-conversational-modeling-with.html), -[tflite](https://storage.googleapis.com/download.tensorflow.org/models/smartreply_1.0_2017_11_01.zip) +Note: Performance numbers were benchmarked on Pixel-2 using single thread large +core. Accuracy numbers were computed using the +[TFLite accuracy tool](../tools/accuracy/ilsvrc.md). + +### Floating point models + +Floating point models offer the best accuracy, at the expense of model size and +performance. GPU acceleration requires the +use of floating point models. 
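Whether a given model file is quantized or floating point can be checked from
its input tensor type: quantized models generally take `uint8` input, while
floating point models take `float32`. A minimal sketch using the Python
interpreter (the file names are placeholders):

```python
import numpy as np
import tensorflow as tf

def describe_model(model_path):
    # Report whether a TFLite model expects quantized (uint8) or float32 input.
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()[0]
    kind = "quantized" if input_details["dtype"] == np.uint8 else "floating point"
    print(model_path, kind, input_details["shape"])

# Placeholder paths to an extracted quantized model and floating point model.
describe_model("mobilenet_v1_0.25_128_quant.tflite")
describe_model("mobilenet_v2_1.0_224.tflite")
```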
+ +Model name | Paper and model | Model size | Top-1 accuracy | Top-5 accuracy | TF Lite performance | Tensorflow performance +--------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | ---------: | -------------: | -------------: | ------------------: | ---------------------: +DenseNet | [paper](https://arxiv.org/abs/1608.06993), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/densenet_2018_04_27.tgz) | 43.6 Mb | 64.2% | 85.6% | 894 ms | 1262 ms +SqueezeNet | [paper](https://arxiv.org/abs/1602.07360), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/squeezenet_2018_04_27.tgz) | 5.0 Mb | 49.0% | 72.9% | 224 ms | 255 ms +NASNet mobile | [paper](https://arxiv.org/abs/1707.07012), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/nasnet_mobile_2018_04_27.tgz) | 21.4 Mb | 73.9% | 91.5% | 261 ms | 389 ms +NASNet large | [paper](https://arxiv.org/abs/1707.07012), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/nasnet_large_2018_04_27.tgz) | 355.3 Mb | 82.6% | 96.1% | 6697 ms | 7940 ms +ResNet_V2_101 | [paper](https://arxiv.org/abs/1603.05027), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite_11_05_08/resnet_v2_101.tgz) | 178.3 Mb | 76.8% | 93.6% | 1880 ms | 1970 ms +Inception_V3 | [paper](http://arxiv.org/abs/1512.00567), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/inception_v3_2018_04_27.tgz) | 95.3 Mb | 77.9% | 93.8% | 1433 ms | 1522 ms +Inception_V4 | [paper](http://arxiv.org/abs/1602.07261), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/inception_v4_2018_04_27.tgz) | 170.7 Mb | 80.1% | 95.1% | 2986 ms | 3139 ms +Inception_ResNet_V2 | [paper](https://arxiv.org/abs/1602.07261), [tflite&pb](https://storage.googleapis.com/download.tensorflow.org/models/tflite/model_zoo/upload_20180427/inception_resnet_v2_2018_04_27.tgz) | 121.0 Mb | 77.5% | 94.0% | 2731 ms | 2926 ms +Mobilenet_V1_0.25_128 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.25_128.tgz) | 1.9 Mb | 41.4% | 66.2% | 6.2 ms | 13.0 ms +Mobilenet_V1_0.25_160 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.25_160.tgz) | 1.9 Mb | 45.4% | 70.2% | 8.6 ms | 19.5 ms +Mobilenet_V1_0.25_192 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.25_192.tgz) | 1.9 Mb | 47.1% | 72.0% | 12.1 ms | 27.8 ms +Mobilenet_V1_0.25_224 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.25_224.tgz) | 1.9 Mb | 49.7% | 74.1% | 16.2 ms | 37.3 ms +Mobilenet_V1_0.50_128 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.5_128.tgz) | 5.3 Mb | 56.2% | 79.3% | 18.1 ms | 29.9 ms +Mobilenet_V1_0.50_160 | [paper](https://arxiv.org/pdf/1704.04861.pdf), 
[tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.5_160.tgz) | 5.3 Mb | 59.0% | 81.8% | 26.8 ms | 45.9 ms +Mobilenet_V1_0.50_192 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.5_192.tgz) | 5.3 Mb | 61.7% | 83.5% | 35.6 ms | 65.3 ms +Mobilenet_V1_0.50_224 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.5_224.tgz) | 5.3 Mb | 63.2% | 84.9% | 47.6 ms | 164.2 ms +Mobilenet_V1_0.75_128 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.75_128.tgz) | 10.3 Mb | 62.0% | 83.8% | 34.6 ms | 48.7 ms +Mobilenet_V1_0.75_160 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.75_160.tgz) | 10.3 Mb | 65.2% | 85.9% | 51.3 ms | 75.2 ms +Mobilenet_V1_0.75_192 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.75_192.tgz) | 10.3 Mb | 67.1% | 87.2% | 71.7 ms | 107.0 ms +Mobilenet_V1_0.75_224 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.75_224.tgz) | 10.3 Mb | 68.3% | 88.1% | 95.7 ms | 143.4 ms +Mobilenet_V1_1.0_128 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_128.tgz) | 16.9 Mb | 65.2% | 85.7% | 57.4 ms | 76.8 ms +Mobilenet_V1_1.0_160 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_160.tgz) | 16.9 Mb | 68.0% | 87.7% | 86.0 ms | 117.7 ms +Mobilenet_V1_1.0_192 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_192.tgz) | 16.9 Mb | 69.9% | 89.1% | 118.6 ms | 167.3 ms +Mobilenet_V1_1.0_224 | [paper](https://arxiv.org/pdf/1704.04861.pdf), [tflite&pb](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_224.tgz) | 16.9 Mb | 71.0% | 89.9% | 160.1 ms | 224.3 ms +Mobilenet_V2_1.0_224 | [paper](https://arxiv.org/pdf/1801.04381.pdf), [tflite&pb](http://download.tensorflow.org/models/tflite_11_05_08/mobilenet_v2_1.0_224.tgz) | 14.0 Mb | 71.8% | 90.6% | 117 ms | + +### AutoML mobile models + +The following image classification models were created using +Cloud AutoML. 
+ +Model Name | Paper and model | Model size | Top-1 accuracy | Top-5 accuracy | TF Lite performance +---------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------: | ---------: | -------------: | -------------: | ------------------: +MnasNet_0.50_224 | [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_0.5_224_09_07_2018.tgz) | 8.5 Mb | 68.03% | 87.79% | 37 ms +MnasNet_0.75_224 | [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_0.75_224_09_07_2018.tgz) | 12 Mb | 71.72% | 90.17% | 61 ms +MnasNet_1.0_96 | [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.0_96_09_07_2018.tgz) | 17 Mb | 62.33% | 83.98% | 23 ms +MnasNet_1.0_128 | [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.0_128_09_07_2018.tgz) | 17 Mb | 67.32% | 87.70% | 34 ms +MnasNet_1.0_160 | [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.0_160_09_07_2018.tgz) | 17 Mb | 70.63% | 89.58% | 51 ms +MnasNet_1.0_192 | [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.0_192_09_07_2018.tgz) | 17 Mb | 72.56% | 90.76% | 70 ms +MnasNet_1.0_224 | [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.0_224_09_07_2018.tgz) | 17 Mb | 74.08% | 91.75% | 93 ms +MnasNet_1.3_224 | [paper](https://arxiv.org/abs/1807.11626), [tflite&pb](https://storage.cloud.google.com/download.tensorflow.org/models/tflite/mnasnet_1.3_224_09_07_2018.tgz) | 24 Mb | 75.24% | 92.55% | 152 ms + +Note: Performance numbers were benchmarked on Pixel-1 using single thread large +BIG core. + +## Object detection + +For more information about object detection, see +Object detection. + +The object detection model we currently host is +**coco_ssd_mobilenet_v1_1.0_quant_2018_06_29**. + +Download +model and labels + +## Pose estimation + +For more information about pose estimation, see +Pose estimation. + +The pose estimation model we currently host is +**multi_person_mobilenet_v1_075_float**. + +Download +model + +## Image segmentation + +For more information about image segmentation, see +Segmentation. + +The image segmentation model we currently host is **deeplabv3_257_mv_gpu**. + +Download +model + +## Smart reply + +For more information about smart reply, see +Smart reply. + +The smart reply model we currently host is **smartreply_1.0_2017_11_01**. 
+ +Download +model diff --git a/tensorflow/lite/g3doc/models/image_classification/images/android_banana.png b/tensorflow/lite/g3doc/models/image_classification/images/android_banana.png new file mode 100644 index 00000000000..a25dffe3a07 Binary files /dev/null and b/tensorflow/lite/g3doc/models/image_classification/images/android_banana.png differ diff --git a/tensorflow/lite/g3doc/models/image_classification/overview.md b/tensorflow/lite/g3doc/models/image_classification/overview.md index 3ffaf33d359..1023236dcc8 100644 --- a/tensorflow/lite/g3doc/models/image_classification/overview.md +++ b/tensorflow/lite/g3doc/models/image_classification/overview.md @@ -1,34 +1,74 @@ # Image classification + -Use a pre-trained and optimized model to identify hundreds of classes of objects, including people, activities, animals, plants, and places. +Use a pre-trained and optimized model to identify hundreds of classes of +objects, including people, activities, animals, plants, and places. ## Get started -If you are unfamiliar with the concept of image classification, you should start by reading What is image classification? +If you are unfamiliar with the concept of image classification, you should start +by reading What is image +classification? -If you understand image classification, you’re new to TensorFlow Lite, and you’re working with Android or iOS, we recommend following the corresponding tutorial that will walk you through our sample code. +If you understand image classification, you’re new to TensorFlow Lite, and +you’re working with Android or iOS, we recommend following the corresponding +tutorial that will walk you through our sample code. Android iOS -If you are using a platform other than Android or iOS, or you are already familiar with the TensorFlow Lite APIs, you can download our starter image classification model and the accompanying labels. +We also provide example applications you can +use to get started. -Once you have the starter model running on your target device, you can experiment with different models to find the optimal balance between performance, accuracy, and model size. For guidance, see Choose a different model. +If you are using a platform other than Android or iOS, or you are already +familiar with the TensorFlow Lite APIs, you can +download our starter image classification model and the accompanying labels. +Download +starter model and labels -If you are using a platform other than Android or iOS, or you are already familiar with the TensorFlow Lite APIs, you can download our starter image classification model and the accompanying labels. +Once you have the starter model running on your target device, you can +experiment with different models to find the optimal balance between +performance, accuracy, and model size. For guidance, see +Choose a different model. -Download starter model and labels +If you are using a platform other than Android or iOS, or you are already +familiar with the TensorFlow Lite APIs, you can +download our starter image classification model and the accompanying labels. + +Download +starter model and labels + +### Example applications + +We have example applications for image classification for both Android and iOS. + +Android +example +iOS +example + +The following screenshot shows the Android image classification example: + +Screenshot of Android example ## What is image classification? -A common use of machine learning is to identify what an image represents. 
For example, we might want to know what type of animal appears in the following photograph. + +A common use of machine learning is to identify what an image represents. For +example, we might want to know what type of animal appears in the following +photograph. dog -The task of predicting what an image represents is called image classification. An image classification model is trained to recognize various classes of images. For example, a model might be trained to recognize photos representing three different types of animals: rabbits, hamsters, and dogs. +The task of predicting what an image represents is called _image +classification_. An image classification model is trained to recognize various +classes of images. For example, a model might be trained to recognize photos +representing three different types of animals: rabbits, hamsters, and dogs. -When we subsequently provide a new image as input to the model, it will output the probabilities of the image representing each of the types of animal it was trained on. An example output might be as follows: +When we subsequently provide a new image as input to the model, it will output +the probabilities of the image representing each of the types of animal it was +trained on. An example output might be as follows: @@ -53,49 +93,40 @@ When we subsequently provide a new image as input to the model, it will output t
-Based on the output, we can see that the classification model has predicted that the image has a high probability of representing a dog. +Based on the output, we can see that the classification model has predicted that +the image has a high probability of representing a dog. -Note: Image classification can only tell you the probability that an image represents one or more of the classes that the model was trained on. It cannot tell you the position or identity of objects within the image. If you need to identify objects and their positions within images, you should use an object detection model. +Note: Image classification can only tell you the probability that an image +represents one or more of the classes that the model was trained on. It cannot +tell you the position or identity of objects within the image. If you need to +identify objects and their positions within images, you should use an +object detection model. ### Training, labels, and inference -During training, an image classification model is fed images and their associated labels. Each label is the name of a distinct concept, or class, that the model will learn to recognize. Here are some examples of labels and training data for our hypothetical model that classifies animal photos: +During training, an image classification model is fed images and their +associated _labels_. Each label is the name of a distinct concept, or class, +that the model will learn to recognize. - - - - - - - - - - - - - - - - - - - - - -
Label | Training data
rabbit | [three different images of rabbits]
hamster | [three different images of hamsters]
dog | [three different images of dogs]
+Given sufficient training data (often hundreds or thousands of images per +label), an image classification model can learn to predict whether new images +belong to any of the classes it has been trained on. This process of prediction +is called _inference_. -Given sufficient training data (often hundreds or thousands of images per label), an image classification model can learn to predict whether new images belong to any of the classes it has been trained on. This process of prediction is called inference. - -To perform inference, an image is passed as input to a model. The model will then output an array of probabilities between 0 and 1. With our example model, this process might look like the following: +To perform inference, an image is passed as input to a model. The model will +then output an array of probabilities between 0 and 1. With our example model, +this process might look like the following: - - + +
dog | [0.07, 0.02, 0.91]
-Each number in the output corresponds to a label in our training data. Associating our output with the three labels the model was trained on, we can see the model has predicted a high probability that the image represents a dog. +Each number in the output corresponds to a label in our training data. +Associating our output with the three labels the model was trained on, we can +see the model has predicted a high probability that the image represents a dog. @@ -120,11 +151,18 @@ Each number in the output corresponds to a label in our training data. Associati
-You might notice that the sum of all the probabilities (for rabbit, hamster, and dog) is equal to 1. This is a common type of output for models with multiple classes (see Softmax for more information). +You might notice that the sum of all the probabilities (for rabbit, hamster, and +dog) is equal to 1. This is a common type of output for models with multiple +classes (see +Softmax +for more information). ### Ambiguous results -Since the probabilities will always sum to 1, if the image is not confidently recognized as belonging to any of the classes the model was trained on you may see the probability distributed throughout the labels without any one value being significantly larger. +Since the probabilities will always sum to 1, if the image is not confidently +recognized as belonging to any of the classes the model was trained on you may +see the probability distributed throughout the labels without any one value +being significantly larger. For example, the following might indicate an ambiguous result: @@ -153,9 +191,15 @@ For example, the following might indicate an ambiguous result: ### Uses and limitations -The image classification models that we provide are useful for single-label classification, which means predicting which single label the image is most likely to represent. They are trained to recognize 1000 classes of image. For a full list of classes, see the labels file. +The image classification models that we provide are useful for single-label +classification, which means predicting which single label the image is most +likely to represent. They are trained to recognize 1000 classes of image. For a +full list of classes, see the labels file in the +model +zip. -If you want to train a model to recognize new classes, see Customize model. +If you want to train a model to recognize new classes, see +Customize model. For the following use cases, you should use a different type of model: @@ -164,48 +208,78 @@ For the following use cases, you should use a different type of model:
  • Predicting the composition of an image, for example subject versus background (see segmentation)
  • -Once you have the starter model running on your target device, you can experiment with different models to find the optimal balance between performance, accuracy, and model size. For guidance, see Choose a different model. +Once you have the starter model running on your target device, you can +experiment with different models to find the optimal balance between +performance, accuracy, and model size. For guidance, see +Choose a different model. ## Choose a different model -There are a large number of image classification models available on our List of hosted models. You should aim to choose the optimal model for your application based on performance, accuracy and model size. There are trade-offs between each of them. +There are a large number of image classification models available on our +List of hosted models. You should aim to choose the +optimal model for your application based on performance, accuracy and model +size. There are trade-offs between each of them. ### Performance -We measure performance in terms of the amount of time it takes for a model to run inference on a given piece of hardware. The less time, the faster the model. +We measure performance in terms of the amount of time it takes for a model to +run inference on a given piece of hardware. The less time, the faster the model. -The performance you require depends on your application. Performance can be important for applications like real-time video, where it may be important to analyze each frame in the time before the next frame is drawn (e.g. inference must be faster than 33ms to perform real-time inference on a 30fps video stream). +The performance you require depends on your application. Performance can be +important for applications like real-time video, where it may be important to +analyze each frame in the time before the next frame is drawn (e.g. inference +must be faster than 33ms to perform real-time inference on a 30fps video +stream). Our quantized Mobilenet models’ performance ranges from 3.7ms to 80.3 ms. ### Accuracy -We measure accuracy in terms of how often the model correctly classifies an image. For example, a model with a stated accuracy of 60% can be expected to classify an image correctly an average of 60% of the time. -Our List of hosted models provides Top-1 and Top-5 accuracy statistics. Top-1 refers to how often the correct label appears as the label with the highest probability in the model’s output. Top-5 refers to how often the correct label appears in the top 5 highest probabilities in the model’s output. +We measure accuracy in terms of how often the model correctly classifies an +image. For example, a model with a stated accuracy of 60% can be expected to +classify an image correctly an average of 60% of the time. + +Our List of hosted models provides Top-1 and Top-5 +accuracy statistics. Top-1 refers to how often the correct label appears as the +label with the highest probability in the model’s output. Top-5 refers to how +often the correct label appears in the top 5 highest probabilities in the +model’s output. Our quantized Mobilenet models’ Top-5 accuracy ranges from 64.4 to 89.9%. ### Size -The size of a model on-disk varies with its performance and accuracy. Size may be important for mobile development (where it might impact app download sizes) or when working with hardware (where available storage might be limited). + +The size of a model on-disk varies with its performance and accuracy. 
Size may +be important for mobile development (where it might impact app download sizes) +or when working with hardware (where available storage might be limited). Our quantized Mobilenet models’ size ranges from 0.5 to 3.4 Mb. ### Architecture -There are several different architectures of models available on List of hosted models, indicated by the model’s name. For example, you can choose between Mobilenet, Inception, and others. -The architecture of a model impacts its performance, accuracy, and size. All of our hosted models are trained on the same data, meaning you can use the provided statistics to compare them and choose which is optimal for your application. +There are several different architectures of models available on +List of hosted models, indicated by the model’s name. +For example, you can choose between Mobilenet, Inception, and others. -Note: The image classification models we provide accept varying sizes of input. For some models, this is indicated in the filename. For example, the Mobilenet_V1_1.0_224 model accepts an input of 224x224 pixels.

    All of the models require three color channels per pixel (red, green, and blue). Quantized models require 1 byte per channel, and float models require 4 bytes per channel.

    Our Android and iOS code samples demonstrate how to process full-sized camera images into the required format for each model. +The architecture of a model impacts its performance, accuracy, and size. All of +our hosted models are trained on the same data, meaning you can use the provided +statistics to compare them and choose which is optimal for your application. + +Note: The image classification models we provide accept varying sizes of input. For some models, this is indicated in the filename. For example, the Mobilenet_V1_1.0_224 model accepts an input of 224x224 pixels.

    All of the models require three color channels per pixel (red, green, and blue). Quantized models require 1 byte per channel, and float models require 4 bytes per channel.

    Our Android and iOS code samples demonstrate how to process full-sized camera images into the required format for each model. ## Customize model -The pre-trained models we provide are trained to recognize 1000 classes of image. For a full list of classes, see the labels file. -You can use a technique known as transfer learning to re-train a model to recognize classes not in the original set. For example, you could re-train the model to distinguish between different species of tree, despite there being no trees in the original training data. To do this, you will need a set of training images for each of the new labels you wish to train. +The pre-trained models we provide are trained to recognize 1000 classes of +image. For a full list of classes, see the labels file in the +model +zip. -Learn how to perform transfer learning in the TensorFlow for Poets codelab. +You can use a technique known as _transfer learning_ to re-train a model to +recognize classes not in the original set. For example, you could re-train the +model to distinguish between different species of tree, despite there being no +trees in the original training data. To do this, you will need a set of training +images for each of the new labels you wish to train. -## Read more about this - +Learn how to perform transfer learning in the +TensorFlow +for Poets codelab. diff --git a/tensorflow/lite/g3doc/models/object_detection/images/android_apple_banana.png b/tensorflow/lite/g3doc/models/object_detection/images/android_apple_banana.png new file mode 100644 index 00000000000..f7a9fe5af89 Binary files /dev/null and b/tensorflow/lite/g3doc/models/object_detection/images/android_apple_banana.png differ diff --git a/tensorflow/lite/g3doc/models/object_detection/images/false_positive.png b/tensorflow/lite/g3doc/models/object_detection/images/false_positive.png new file mode 100644 index 00000000000..39d2103a3a8 Binary files /dev/null and b/tensorflow/lite/g3doc/models/object_detection/images/false_positive.png differ diff --git a/tensorflow/lite/g3doc/models/object_detection/overview.md b/tensorflow/lite/g3doc/models/object_detection/overview.md index 4f62d017bfd..a0295d02984 100644 --- a/tensorflow/lite/g3doc/models/object_detection/overview.md +++ b/tensorflow/lite/g3doc/models/object_detection/overview.md @@ -1,30 +1,59 @@ # Object detection + -Detect multiple objects with bounding boxes. Yes, dogs and cats too. +Detect multiple objects within an image, with bounding boxes. Recognize 80 +different classes of objects. -Download starter model and labels +## Get started -## Tutorials (coming soon) -iOS -Android +If you are new to TensorFlow Lite and are working with Android or iOS, we +recommend exploring the following example applications that can help you get +started. + +Android +example +iOS +example + +If you are using a platform other than Android or iOS, or you are already +familiar with the TensorFlow Lite APIs, you can +download our starter object detection model and the accompanying labels. + +Download +starter model and labels + +For more information about the starter model, see +Starter model. ## What is object detection? -Given an image or a video stream, an object detection model can identify which of a known set of objects might be present and provide information about their positions within the image. 
- -For example, this screenshot of our object detection sample app shows how several objects have been recognized and their positions annotated: +Given an image or a video stream, an object detection model can identify which +of a known set of objects might be present and provide information about their +positions within the image. +For example, this screenshot of our example +application shows how two objects have been recognized and their positions +annotated: - -TODO: Insert image +Screenshot of Android example -An object detection model is trained to detect the presence and location of multiple classes of objects. For example, a model might be trained with images that contain various pieces of computer hardware, along with a label that specifies the class of hardware they represent (e.g. a laptop, a keyboard, or a monitor), and data specifying where each object appears in the image. +An object detection model is trained to detect the presence and location of +multiple classes of objects. For example, a model might be trained with images +that contain various pieces of fruit, along with a _label_ that specifies the +class of fruit they represent (e.g. an apple, a banana, or a strawberry), and +data specifying where each object appears in the image. -When we subsequently provide an image to the model, it will output a list of the objects it detects, the location of a bounding box that contains each object, and a score that indicates the confidence that detection was correct. +When we subsequently provide an image to the model, it will output a list of the +objects it detects, the location of a bounding box that contains each object, +and a score that indicates the confidence that detection was correct. ### Model output +Imagine a model has been trained to detect apples, bananas, and strawberries. +When we pass it an image, it will output a set number of detection results - in +this example, 5. + @@ -35,27 +64,27 @@ When we subsequently provide an image to the model, it will output a list of the - + - + - + - + - + @@ -64,9 +93,16 @@ When we subsequently provide an image to the model, it will output a list of the ### Confidence score -To interpret these results, we can look at the score and the location for each detected object. The score is a number between 0 and 1 that indicates confidence that the object was genuinely detected. The closer the number is to 1, the more confident the model is. +To interpret these results, we can look at the score and the location for each +detected object. The score is a number between 0 and 1 that indicates confidence +that the object was genuinely detected. The closer the number is to 1, the more +confident the model is. -Depending on your application, you can decide a cut-off threshold below which you will discard detection results. For our example, we might decide a sensible cut-off is a score of 0.5 (meaning a 50% probability that the detection is valid). In that case, we would ignore the last two objects in the array, because those confidence scores are below 0.5: +Depending on your application, you can decide a cut-off threshold below which +you will discard detection results. For our example, we might decide a sensible +cut-off is a score of 0.5 (meaning a 50% probability that the detection is +valid). In that case, we would ignore the last two objects in the array, because +those confidence scores are below 0.5:
Laptop → Apple | 0.92 | [18, 21, 57, 63]
Keyboard → Banana | 0.88 | [100, 30, 180, 150]
Monitor → Strawberry | 0.87 | [7, 82, 89, 163]
Keyboard → Banana | 0.23 | [42, 66, 57, 83]
Monitor → Apple | 0.11 | [6, 42, 31, 58]
    @@ -78,41 +114,51 @@ Depending on your application, you can decide a cut-off threshold below which yo - + - + - + - + - +
Laptop → Apple | 0.92 | [18, 21, 57, 63]
Keyboard → Banana | 0.88 | [100, 30, 180, 150]
Monitor → Strawberry | 0.87 | [7, 82, 89, 163]
Keyboard → Banana | 0.23 | [42, 66, 57, 83]
Monitor → Apple | 0.11 | [6, 42, 31, 58]
    -The cut-off you use should be based on whether you are more comfortable with false positives (objects that are wrongly identified, or areas of the image that are erroneously identified as objects when they are not), or false negatives (genuine objects that are missed because their confidence was low). +The cut-off you use should be based on whether you are more comfortable with +false positives (objects that are wrongly identified, or areas of the image that +are erroneously identified as objects when they are not), or false negatives +(genuine objects that are missed because their confidence was low). - -TODO: Insert screenshot showing both +For example, in the following image, a pear (which is not an object that the +model was trained to detect) was misidentified as a "person". This is an example +of a false positive that could be ignored by selecting an appropriate cut-off. +In this case, a cut-off of 0.6 (or 60%) would comfortably exclude the false +positive. + +Screenshot of Android example showing a false positive ### Location -For each detected object, the model will return an array of four numbers representing a bounding rectangle that surrounds its position. The numbers are ordered as follows: +For each detected object, the model will return an array of four numbers +representing a bounding rectangle that surrounds its position. For the starter +model we provide, the numbers are ordered as follows: @@ -127,49 +173,52 @@ For each detected object, the model will return an array of four numbers represe
    -The top value represents the distance of the rectangle’s top edge from the top of the image, in pixels. The left value represents the left edge’s distance from the left of the input image. The other values represent the bottom and right edges in a similar manner. +The top value represents the distance of the rectangle’s top edge from the top +of the image, in pixels. The left value represents the left edge’s distance from +the left of the input image. The other values represent the bottom and right +edges in a similar manner. - -Note: Object detection models accept input images of a specific size. This is likely to be different from the size of the raw image captured by your device’s camera, and you will have to write code to crop and scale your raw image to fit the model’s input size (there are examples of this in our sample code).

    The pixel values output by the model refer to the position in the cropped and scaled image, so you must scale them to fit the raw image in order to interpret them correctly. +Note: Object detection models accept input images of a specific size. This is likely to be different from the size of the raw image captured by your device’s camera, and you will have to write code to crop and scale your raw image to fit the model’s input size (there are examples of this in our example applications).

    The pixel values output by the model refer to the position in the cropped and scaled image, so you must scale them to fit the raw image in order to interpret them correctly. +## Starter model + +We recommend starting with this pre-trained quantized COCO SSD MobileNet v1 +model. + +Download +starter model and labels ### Uses and limitations - -The object detection model we provide can identify and locate up to 10 objects in an image. It is trained to recognize 80 classes of object. For a full list of classes, see the labels file in the model zip. +The object detection model we provide can identify and locate up to 10 objects +in an image. It is trained to recognize 80 classes of object. For a full list of +classes, see the labels file in the +model +zip. -If you want to train a model to recognize new classes, see Customize model. +If you want to train a model to recognize new classes, see +Customize model. For the following use cases, you should use a different type of model: -Get started -If you are new to TensorFlow Lite and are working with Android or iOS, we recommend following the corresponding tutorial that will walk you through our sample code. - - -iOS -Android - -If you are using a platform other than Android or iOS, or you are already familiar with the TensorFlow Lite APIs, you can download our starter object detection model and the accompanying labels. - -Download starter model and labels - -The model will return 10 detection results... - -## Starter model -We recommend starting to implement object detection using the quantized COCO SSD MobileNet v1 model, available with labels from this download link: - -Download starter model and labels - ### Input -The model takes an image as input. The expected image is 300x300 pixels, with three channels (red, blue, and green) per pixel. This should be fed to the model as a flattened buffer of 270,000 byte values (300x300x3). Since the model is quantized, each value should be a single byte representing a value between 0 and 255. + +The model takes an image as input. The expected image is 300x300 pixels, with +three channels (red, blue, and green) per pixel. This should be fed to the model +as a flattened buffer of 270,000 byte values (300x300x3). Since the model is +quantized, each +value should be a single byte representing a value between 0 and 255. ### Output -The model outputs four arrays, mapped to the indices 0-4. Arrays 0, 1, and 2 describe 10 detected objects, with one element in each array corresponding to each object. There will always be 10 objects detected. + +The model outputs four arrays, mapped to the indices 0-4. Arrays 0, 1, and 2 +describe 10 detected objects, with one element in each array corresponding to +each object. There will always be 10 objects detected. @@ -205,16 +254,17 @@ The model outputs four arrays, mapped to the indices 0-4. Arrays 0, 1, and 2 des ## Customize model - -The pre-trained models we provide are trained to detect 80 classes of object. For a full list of classes, see the labels file in the model zip. +The pre-trained models we provide are trained to detect 80 classes of object. +For a full list of classes, see the labels file in the +model +zip. -You can use a technique known as transfer learning to re-train a model to recognize classes not in the original set. For example, you could re-train the model to detect multiple types of vegetable, despite there only being one vegetable in the original training data. 
To do this, you will need a set of training images for each of the new labels you wish to train. +You can use a technique known as transfer learning to re-train a model to +recognize classes not in the original set. For example, you could re-train the +model to detect multiple types of vegetable, despite there only being one +vegetable in the original training data. To do this, you will need a set of +training images for each of the new labels you wish to train. -Learn how to perform transfer learning in the Training and serving a real-time mobile object detector in 30 minutes blog post. - - -Read more about this - +Learn how to perform transfer learning in +Training +and serving a real-time mobile object detector in 30 minutes. diff --git a/tensorflow/lite/g3doc/models/pose_estimation/overview.md b/tensorflow/lite/g3doc/models/pose_estimation/overview.md index f19a5a10edb..981a2553f70 100644 --- a/tensorflow/lite/g3doc/models/pose_estimation/overview.md +++ b/tensorflow/lite/g3doc/models/pose_estimation/overview.md @@ -1,20 +1,31 @@ # Pose estimation + -PoseNet is a vision model that can be used to estimate the pose of a person in an image/video by estimating where key body joints are. +## Get started -Download starter model +_PoseNet_ is a vision model that can be used to estimate the pose of a person in +an image or video by estimating where key body joints are. -## Tutorials (coming soon) -iOS -Android +Download +starter model + +Android and iOS end-to-end tutorials are coming soon. In the meantime, if you +want to experiment this on a web browser, check out the +TensorFlow.js +GitHub repository. ## How it works -Pose estimation refers to computer vision techniques that detect human figures in images and videos, so that one could determine, for example, where someone’s elbow shows up in an image. -To be clear, this technology is not recognizing who is in an image — there is no personal identifiable information associated to pose detection. The algorithm is simply estimating where key body joints are. +Pose estimation refers to computer vision techniques that detect human figures +in images and videos, so that one could determine, for example, where someone’s +elbow shows up in an image. -The key points detected are indexed by part id with a confidence score between 0.0 and 1.0; 1.0 being the highest. +To be clear, this technology is not recognizing who is in an image. The +algorithm is simply estimating where key body joints are. + +The key points detected are indexed by "Part ID", with a confidence score +between 0.0 and 1.0, 1.0 being the highest.
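Once the model output has been decoded into keypoints, an application will
typically discard any part whose confidence score falls below a chosen
threshold. A minimal sketch, assuming the decoded keypoints are available as
(part, score, position) tuples:

```python
# Hypothetical decoded PoseNet output: (part name, confidence score, (x, y)).
keypoints = [
    ("nose", 0.98, (120, 85)),
    ("leftEye", 0.95, (131, 77)),
    ("leftElbow", 0.42, (190, 210)),
]

MIN_CONFIDENCE = 0.5  # Tune this threshold for your application.

confident_keypoints = [kp for kp in keypoints if kp[1] >= MIN_CONFIDENCE]
for part, score, position in confident_keypoints:
    print(part, score, position)
```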
    @@ -96,33 +107,47 @@ The key points detected are indexed by part id with a confidence score between 0
    ## Example output - - -## Get started -Android and iOS end-to-end tutorials are coming soon. In the meantime, if you want to experiment this on a web browser, check out the TensorFlow.js GitHub repository. +Animation showing pose estimation ## How it performs -Performance varies based on your device and output stride (heatmaps and offset vectors). The PoseNet model is image size invariant, which means it can predict pose positions in the same scale as the original image regardless of whether the image is downscaled. This means PoseNet can be configured to have a higher accuracy at the expense of performance. -The output stride determines how much we’re scaling down the output relative to the input image size. It affects the size of the layers and the model outputs. The higher the output stride, the smaller the resolution of layers in the network and the outputs, and correspondingly their accuracy. In this implementation, the output stride can have values of 8, 16, or 32. In other words, an output stride of 32 will result in the fastest performance but lowest accuracy, while 8 will result in the highest accuracy but slowest performance. We recommend starting with 16. +Performance varies based on your device and output stride (heatmaps and offset +vectors). The PoseNet model is image size invariant, which means it can predict +pose positions in the same scale as the original image regardless of whether the +image is downscaled. This means PoseNet can be configured to have a higher +accuracy at the expense of performance. - -The output stride determines how much we’re scaling down the output relative to the input image size. A higher output stride is faster but results in lower accuracy. +The output stride determines how much we’re scaling down the output relative to +the input image size. It affects the size of the layers and the model outputs. +The higher the output stride, the smaller the resolution of layers in the +network and the outputs, and correspondingly their accuracy. In this +implementation, the output stride can have values of 8, 16, or 32. In other +words, an output stride of 32 will result in the fastest performance but lowest +accuracy, while 8 will result in the highest accuracy but slowest performance. +We recommend starting with 16. + +The following image shows how the output stride determines how much we’re +scaling down the output relative to the input image size. A higher output stride +is faster but results in lower accuracy. + +Output stride and heatmap resolution + +## Read more about pose estimation -## Read more about this -## Users +### Use cases + diff --git a/tensorflow/lite/g3doc/models/segmentation/overview.md b/tensorflow/lite/g3doc/models/segmentation/overview.md index a1f1cf1aa69..0bd268ada1f 100644 --- a/tensorflow/lite/g3doc/models/segmentation/overview.md +++ b/tensorflow/lite/g3doc/models/segmentation/overview.md @@ -1,18 +1,26 @@ -# Segmentation (GPU) +# Segmentation + -DeepLab is a state-of-art deep learning model for semantic image segmentation, where the goal is to assign semantic labels (e.g., person, dog, cat and so on) to every pixel in the input image. +## Get started -Download starter model +_DeepLab_ is a state-of-art deep learning model for semantic image segmentation, +where the goal is to assign semantic labels (e.g. person, dog, cat) to every +pixel in the input image. -## Tutorials (coming soon) -iOS -Android +Download +starter model ## How it works -It all started with classification where the model predicts an entire input. 
With advances in data, hardware, and software, object detection can infer objects with spatial location. Semantic segmentation offers the highest level of granularity with labels at a pixel level. -Current implementation includes the following features: +Semantic image segmentation predicts whether each pixel of an image is +associated with a certain class. This is in contrast to +object detection, which detects +objects in rectangular regions, and +image classification, which +classifies the overall image. + +The current implementation includes the following features:
    1. DeepLabv1: We use atrous convolution to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks.
2. DeepLabv2: We use atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales with filters at multiple sampling rates and effective fields-of-view.
    3. @@ -21,12 +29,15 @@ Current implementation includes the following features:
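The starter segmentation model produces a per-pixel score for each class, and
taking the arg-max over the class dimension yields the mask shown in the
example output below. A rough sketch, assuming the extracted starter model file
and an output tensor laid out as [1, height, width, num_classes]:

```python
import numpy as np
import tensorflow as tf

# Placeholder path to the extracted starter model.
interpreter = tf.lite.Interpreter(model_path="deeplabv3_257_mv_gpu.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Feed a dummy image shaped and typed to match the model's input tensor.
dummy_image = np.zeros(input_details["shape"], dtype=input_details["dtype"])
interpreter.set_tensor(input_details["index"], dummy_image)
interpreter.invoke()

# Assumed output layout: [1, height, width, num_classes] class scores.
scores = interpreter.get_tensor(output_details["index"])
mask = np.argmax(scores[0], axis=-1)  # Per-pixel class labels.
print(mask.shape)
```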
    ## Example output -The model will create a mask over the target objects with high accuracy. - -## Read more about this +The model will create a mask over the target objects with high accuracy. + +Animation showing image segmentation + +## Read more about segmentation + diff --git a/tensorflow/lite/g3doc/models/smart_reply/overview.md b/tensorflow/lite/g3doc/models/smart_reply/overview.md index c35a5f26425..20c359ec9ff 100644 --- a/tensorflow/lite/g3doc/models/smart_reply/overview.md +++ b/tensorflow/lite/g3doc/models/smart_reply/overview.md @@ -1,37 +1,49 @@ # Smart reply + -Smart replies are contextually relevant, one-touch responses that help the user to reply to an incoming text message (or email) efficiently and effortlessly. +## Get started -Download starter model and labels +Our smart reply model generates reply suggestions based on chat messages. The +suggestions are intended to be contextually relevant, one-touch responses that +help the user to easily reply to an incoming message. -## Tutorials (coming soon) -iOS -Android +Download +starter model and labels + +### Sample application + +We have provided a pre-built APK that demonstrates the smart reply model on +Android. + +Go to the +GitHub +page for instructions and list of supported ops and functionalities. ## How it works -The model generates reply suggestions to input conversational chat messages with an efficient inference that can be easily be plugged in to your chat application to power on-device conversational intelligence. + +The model generates reply suggestions to conversational chat messages. The on-device model comes with several benefits. It is: ## Example output - -## How to use this model? -We have provided a pre-built demo APK that you can download, install, and test on your phone. Go to the GitHub page for instructions and list of support ops and functionalities. +Animation showing smart reply ## Read more about this + ## Users +