Added documentation for TFLite text_classification and bert_qa models

PiperOrigin-RevId: 276590063
Change-Id: Id29cf4d7be49404352455826cca2c06b6d92dafb
Khanh LeViet, 2019-10-24 16:26:08 -07:00 (committed by TensorFlower Gardener)
commit eb0d31e7b7, parent b74e8fb4a5
4 changed files with 148 additions and 0 deletions

[Binary file added (not shown): images/screenshot.gif, 625 KiB]

@@ -0,0 +1,83 @@
# Question and Answer
Use a pre-trained model to answer questions based on the content of a given
passage.
## Get started
<img src="images/screenshot.gif" class="attempt-right" style="max-width: 300px">
If you are new to TensorFlow Lite and are working with Android, we recommend
exploring the following example application, which can help you get started.
<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/bert_qa/android">Android
example</a>
If you are using a platform other than Android, or you are already familiar with
the [TensorFlow Lite APIs](https://www.tensorflow.org/api_docs/python/tf/lite),
you can download our starter question and answer model.
<a class="button button-primary" href="https://storage.googleapis.com/download.tensorflow.org/models/tflite/bert_qa/mobilebert_qa_vocab.zip">Download
starter model and vocab</a>
## How it works
The model can be used to build a system that answers users' questions in
natural language. It was created using a pre-trained BERT model fine-tuned on
the SQuAD 1.1 dataset.
[BERT](https://github.com/google-research/bert), or Bidirectional Encoder
Representations from Transformers, is a method of pre-training language
representations which obtains state-of-the-art results on a wide array of
Natural Language Processing tasks.
This app uses a compressed version of BERT, MobileBERT, that runs 4x faster and
is 4x smaller.
[SQuAD](https://rajpurkar.github.io/SQuAD-explorer/), or Stanford Question
Answering Dataset, is a reading comprehension dataset consisting of articles
from Wikipedia and a set of question-answer pairs for each article.
The model takes a passage and a question as input, then returns the segment of
the passage that most likely answers the question. It requires non-trivial
pre-processing (including tokenization) and post-processing steps, which are
described in the BERT [paper](https://arxiv.org/abs/1810.04805) and implemented
in the sample app.
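On a platform with a Python runtime, inference with the starter model looks
roughly like the sketch below. This is a minimal sketch under assumptions, not
the sample app's implementation: the model filename is hypothetical, the
input/output tensor order should be verified with
`get_input_details()`/`get_output_details()`, and the WordPiece tokenization of
the question and passage is elided.

```python
# Minimal sketch of BERT QA inference with the TFLite Python interpreter.
# Assumptions: three int32 input tensors (token ids, attention mask, segment
# ids) and two output tensors (start/end logits), which is typical of
# BERT-style QA models. Verify the actual tensor order on your model.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mobilebert_qa.tflite")  # hypothetical filename
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

seq_len = input_details[0]["shape"][1]
# These arrays must come from WordPiece-tokenizing the question and passage
# with the vocab file bundled with the model (elided here).
input_ids = np.zeros((1, seq_len), dtype=np.int32)
input_mask = np.zeros((1, seq_len), dtype=np.int32)
segment_ids = np.zeros((1, seq_len), dtype=np.int32)

for detail, tensor in zip(input_details, (input_ids, input_mask, segment_ids)):
    interpreter.set_tensor(detail["index"], tensor)
interpreter.invoke()

# The model emits per-token logits; the answer is the passage span that
# maximizes start_logit + end_logit.
start_logits = interpreter.get_tensor(output_details[0]["index"])[0]
end_logits = interpreter.get_tensor(output_details[1]["index"])[0]
start, end = int(np.argmax(start_logits)), int(np.argmax(end_logits))
print("Answer token span:", start, end)
```

In practice, the predicted token span still has to be mapped back to the
original passage text, and candidate spans where the end precedes the start
are discarded during post-processing.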
## Example output
### Passage (Input)
> Google LLC is an American multinational technology company that specializes in
> Internet-related services and products, which include online advertising
> technologies, search engine, cloud computing, software, and hardware. It is
> considered one of the Big Four technology companies, alongside Amazon, Apple,
> and Facebook.
>
> Google was founded in September 1998 by Larry Page and Sergey Brin while they
> were Ph.D. students at Stanford University in California. Together they own
> about 14 percent of its shares and control 56 percent of the stockholder
> voting power through supervoting stock. They incorporated Google as a
> California privately held company on September 4, 1998, in California. Google
> was then reincorporated in Delaware on October 22, 2002. An initial public
> offering (IPO) took place on August 19, 2004, and Google moved to its
> headquarters in Mountain View, California, nicknamed the Googleplex. In August
> 2015, Google announced plans to reorganize its various interests as a
> conglomerate called Alphabet Inc. Google is Alphabet's leading subsidiary and
> will continue to be the umbrella company for Alphabet's Internet interests.
> Sundar Pichai was appointed CEO of Google, replacing Larry Page who became the
> CEO of Alphabet.
### Question (Input)
> Who is the CEO of Google?
### Answer (Output)
> Sundar Pichai
## Read more about BERT
* Academic paper: [BERT: Pre-training of Deep Bidirectional Transformers for
Language Understanding](https://arxiv.org/abs/1810.04805)
* [Open-source implementation of BERT](https://github.com/google-research/bert)

[Binary file added (not shown): images/screenshot.png, 305 KiB]

@@ -0,0 +1,65 @@
# Text Classification
Use a pre-trained model to categorize a paragraph into predefined groups.
## Get started
<img src="images/screenshot.png" class="attempt-right" style="max-width: 300px">
If you are new to TensorFlow Lite and are working with Android, we recommend
exploring the following example application, which can help you get started.
<a class="button button-primary" href="https://github.com/tensorflow/examples/tree/master/lite/examples/text_classification/android">Android
example</a>
If you are using a platform other than Android, or you are already familiar with
the TensorFlow Lite APIs, you can download our starter text classification
model.
<a class="button button-primary" href="https://storage.googleapis.com/download.tensorflow.org/models/tflite/text_classification/text_classification.tflite">Download
starter model</a>
## How it works
Text classification categorizes a paragraph into predefined groups based on its
content.
This pretrained model predicts if a paragraph's sentiment is positive or
negative. It was trained on
[Large Movie Review Dataset v1.0](http://ai.stanford.edu/~amaas/data/sentiment/)
from Maas et al., which consists of IMDB movie reviews labeled as either positive
or negative.
Here are the steps to classify a paragraph with the model (see the sketch after
this list):
1. Tokenize the paragraph and convert it to a list of word ids using a
predefined vocabulary.
1. Feed the list to the TensorFlow Lite model.
1. Get the probability of the paragraph being positive or negative from the
model outputs.
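The three steps map to the TensorFlow Lite Python API roughly as follows. This
is a minimal sketch, not the Android sample's code: the regex tokenizer and the
placeholder vocabulary are illustrative assumptions, and you should check them
against the vocab file and input shape that ship with the starter model.

```python
# A minimal sketch of the three steps above with the TFLite Python
# interpreter. The tokenizer and placeholder vocabulary are assumptions
# for illustration; the real vocab file ships with the starter model.
import re
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="text_classification.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
seq_len = inp["shape"][1]  # fixed input length expected by the model

vocab = {"<PAD>": 0, "<START>": 1, "<UNK>": 2}  # placeholder; load the real file

def tokenize(text):
    """Step 1: split into words, map each to an id, pad to seq_len."""
    words = re.findall(r"[a-z']+", text.lower())
    ids = [vocab.get(w, vocab["<UNK>"]) for w in words][:seq_len]
    return ids + [vocab["<PAD>"]] * (seq_len - len(ids))

# Step 2: feed the id list to the TensorFlow Lite model.
ids = np.array([tokenize("This is the best movie I've seen.")], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], ids)
interpreter.invoke()

# Step 3: the output holds the [negative, positive] probabilities,
# matching the (0)/(1) columns in the example table below.
negative, positive = interpreter.get_tensor(out["index"])[0]
print(f"negative={negative:.1%}, positive={positive:.1%}")
```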
### Note
* Only English is supported.
*   This model was trained on a movie reviews dataset, so you may experience
    reduced accuracy when classifying text from other domains.
## Example output
| Text                                                                      | Negative (0) | Positive (1) |
| ------------------------------------------------------------------------- | ------------ | ------------ |
| This is the best movie I've seen in recent years. Strongly recommend it!   | 25.3%        | 74.7%        |
| What a waste of my time.                                                   | 72.5%        | 27.5%        |
## Use your own training dataset
Follow this
[tutorial](https://github.com/tensorflow/examples/tree/master/lite/examples/model_customization/demo/image_classification.ipynb)
to apply the same technique used here to train a text classification model using
your own datasets. With the right dataset, you can create a model for use cases
such as document categorization or toxic comment detection.
## Read more about text classification
* [Word embeddings and tutorial to train this model](https://www.tensorflow.org/tutorials/text/word_embeddings)