Update estimators section for getting started.

PiperOrigin-RevId: 204071680
Billy Lamberta 2018-07-10 23:00:44 -07:00 committed by TensorFlower Gardener
parent 249f124c1c
commit edc511fd69
9 changed files with 34 additions and 726 deletions

@ -561,9 +561,9 @@ For more examples on feature columns, view the following:
* The @{$low_level_intro#feature_columns$Low Level Introduction} demonstrates how
to experiment directly with `feature_columns` using TensorFlow's low level APIs.
* The @{$wide$wide} and @{$wide_and_deep$Wide & Deep} Tutorials solve a
binary classification problem using `feature_columns` on a variety of input
data types.
* The [Estimator wide and deep learning tutorial](https://github.com/tensorflow/models/tree/master/official/wide_deep)
solves a binary classification problem using `feature_columns` on a variety of
input data types.
To learn more about embeddings, see the following:

@ -248,7 +248,8 @@ The images below show the CIFAR-10 model with tensor shape information:
Often it is useful to collect runtime metadata for a run, such as total memory
usage, total compute time, and tensor shapes for nodes. The code example below
is a snippet from the train and test section of a modification of the
@{$layers$simple MNIST tutorial}, in which we have recorded summaries and
[Estimators MNIST tutorial](../tutorials/estimators/cnn.md), in which we have
recorded summaries and
runtime statistics. See the
@{$summaries_and_tensorboard#serializing-the-data$Summaries Tutorial}
for details on how to record summaries.

@ -170,15 +170,16 @@ landing_page:
<div class="devsite-landing-row-item-description-content">
<p>
Estimators can train large models on multiple machines in a
production environment. Read the
<a href="/guide/estimators">Estimators guide</a> for details.
production environment. TensorFlow provides a collection of
pre-made Estimators to implement common ML algorithms. See the
<a href="/guide/estimators">Estimators guide</a>.
</p>
<ol style="padding-left: 20px;">
<li><a href="/tutorials/images/layers">Build a Convolutional Neural Network using Estimators</a></li>
<li><a href="/guide/premade_estimators">Premade Estimators guide</a></li>
<li><a href="https://github.com/tensorflow/models/tree/master/official/wide_deep" class="external">Wide and deep learning with Estimators</a></li>
<li><a href="https://github.com/tensorflow/models/tree/master/official/boosted_trees" class="external">Boosted trees</a></li>
<li><a href="/hub/tutorials/text_classification_with_tf_hub">How to build a simple text classifier with TF-Hub</a></li>
<li><a href="https://github.com/tensorflow/models/tree/master/official/boosted_trees">Classifying Higgs boson processes</a></li>
<li><a href="/tutorials/representation/wide_and_deep">Wide and deep learning using Estimators</a></li>
<li><a href="/tutorials/representation/linear">Large-scale linear models</a></li>
<li><a href="/tutorials/estimators/cnn">Build a Convolutional Neural Network using Estimators</a></li>
</ol>
</div>
<div class="devsite-landing-row-item-buttons">

@ -37,15 +37,27 @@ toc:
status: external
- title: "Custom training: walkthrough"
path: /tutorials/eager/custom_training_walkthrough
- title: Neural machine translation
- title: Translation with attention
path: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/eager/python/examples/nmt_with_attention/nmt_with_attention.ipynb
status: external
- title: ML at production scale
style: accordion
section:
- title: Wide and deep learning
path: https://github.com/tensorflow/models/tree/master/official/wide_deep
status: external
- title: Boosted trees
path: https://github.com/tensorflow/models/tree/master/official/boosted_trees
status: external
- title: Text classifier with TF-Hub
path: /hub/tutorials/text_classification_with_tf_hub
- title: Build a CNN using Estimators
path: /tutorials/estimators/cnn
- title: Images
style: accordion
section:
- title: Build a CNN using Estimators
path: /tutorials/images/layers
- title: Image recognition
path: /tutorials/images/image_recognition
- title: Image retraining
@ -69,10 +81,6 @@ toc:
- title: Data representation
style: accordion
section:
- title: Linear models
path: /tutorials/representation/wide
- title: Wide and deep learning
path: /tutorials/representation/wide_and_deep
- title: Vector representations of words
path: /tutorials/representation/word2vec
- title: Kernel methods

@ -449,7 +449,7 @@ covering them.
To find out more about implementing convolutional neural networks, you can jump
to the TensorFlow @{$deep_cnn$deep convolutional networks tutorial},
or start a bit more gently with our @{$layers$MNIST starter tutorial}.
or start a bit more gently with our [Estimator MNIST tutorial](../estimators/cnn.md).
Finally, if you want to get up to speed on research in this area, you can
read the papers referenced in this tutorial.

@ -11,8 +11,9 @@ those tools. It explains:
deep learning to get the advantages of both.
Read this overview to decide whether the Estimator's linear model tools might
be useful to you. Then do the @{$wide$Linear Models tutorial} to
give it a try. This overview uses code samples from the tutorial, but the
be useful to you. Then work through the
[Estimator wide and deep learning tutorial](https://github.com/tensorflow/models/tree/master/official/wide_deep)
to give it a try. This overview uses code samples from the tutorial, but the
tutorial walks through the code in greater detail.
To understand this overview it will help to have some familiarity
@ -176,7 +177,7 @@ the name of a `FeatureColumn`. Each key's value is a tensor containing the
values of that feature for all data instances. See
@{$premade_estimators#input_fn} for a
more comprehensive look at input functions, and `input_fn` in the
[linear models tutorial code](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py)
[wide and deep learning tutorial](https://github.com/tensorflow/models/tree/master/official/wide_deep)
for an example implementation of an input function.
The input function is passed to the `train()` and `evaluate()` calls that
@ -234,4 +235,5 @@ e = tf.estimator.DNNLinearCombinedClassifier(
dnn_feature_columns=deep_columns,
dnn_hidden_units=[100, 50])
```
For more information, see the @{$wide_and_deep$Wide and Deep Learning tutorial}.
For more information, see the
[wide and deep learning tutorial](https://github.com/tensorflow/models/tree/master/official/wide_deep).

@ -1,461 +0,0 @@
# TensorFlow Linear Model Tutorial
In this tutorial, we will use the tf.estimator API in TensorFlow to solve a
binary classification problem: Given census data about a person such as age,
education, marital status, and occupation (the features), we will try to predict
whether or not the person earns more than 50,000 dollars a year (the target
label). We will train a **logistic regression** model, and given an individual's
information our model will output a number between 0 and 1, which can be
interpreted as the probability that the individual has an annual income of over
50,000 dollars.
## Setup
To try the code for this tutorial:
1. @{$install$Install TensorFlow} if you haven't already.
2. Download [the tutorial code](https://github.com/tensorflow/models/tree/master/official/wide_deep/).
3. Execute the data download script we provide to you:
$ python data_download.py
4. Execute the tutorial code with the following command to train the linear
model described in this tutorial:
$ python wide_deep.py --model_type=wide
Read on to find out how this code builds its linear model.
## Reading The Census Data
The dataset we'll be using is the
[Census Income Dataset](https://archive.ics.uci.edu/ml/datasets/Census+Income).
We have provided
[data_download.py](https://github.com/tensorflow/models/tree/master/official/wide_deep/data_download.py)
which downloads the data and performs some additional cleanup.
Since the task is a binary classification problem, we'll construct a label
column named "label" whose value is 1 if the income is over 50K, and 0
otherwise. For reference, see `input_fn` in
[wide_deep.py](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py).
Next, let's take a look at the dataframe and see which columns we can use to
predict the target label. The columns can be grouped into two types—categorical
and continuous columns:
* A column is called **categorical** if its value can only be one of the
categories in a finite set. For example, the relationship status of a person
(wife, husband, unmarried, etc.) or the education level (high school,
college, etc.) are categorical columns.
* A column is called **continuous** if its value can be any numerical value in
a continuous range. For example, the capital gain of a person (e.g. $14,084)
is a continuous column.
Here's a list of columns available in the Census Income dataset:
| Column Name    | Type        | Description |
| -------------- | ----------- | ----------- |
| age            | Continuous  | The age of the individual. |
| workclass      | Categorical | The type of employer the individual has (government, military, private, etc.). |
| fnlwgt         | Continuous  | The number of people the census takers believe that observation represents (sample weight). Final weight will not be used. |
| education      | Categorical | The highest level of education achieved for that individual. |
| education_num  | Continuous  | The highest level of education in numerical form. |
| marital_status | Categorical | Marital status of the individual. |
| occupation     | Categorical | The occupation of the individual. |
| relationship   | Categorical | Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried. |
| race           | Categorical | Amer-Indian-Eskimo, Asian-Pac-Islander, Black, White, Other. |
| gender         | Categorical | Female, Male. |
| capital_gain   | Continuous  | Capital gains recorded. |
| capital_loss   | Continuous  | Capital losses recorded. |
| hours_per_week | Continuous  | Hours worked per week. |
| native_country | Categorical | Country of origin of the individual. |
| income_bracket | Categorical | ">50K" or "<=50K", meaning whether the person makes more than $50,000 annually. |
## Converting Data into Tensors
When building a tf.estimator model, the input data is specified by means of an
Input Builder function. This builder function will not be called until it is
later passed to tf.estimator.Estimator methods such as `train` and `evaluate`.
The purpose of this function is to construct the input data, which is
represented in the form of @{tf.Tensor}s or @{tf.SparseTensor}s.
In more detail, the input builder function returns the following as a pair:
1. `features`: A dict from feature column names to `Tensors` or
`SparseTensors`.
2. `labels`: A `Tensor` containing the label column.
The keys of the `features` will be used to construct columns in the next
section. Because we want to call the `train` and `evaluate` methods with
different data, we define a method that returns an input function based on the
given data. Note that the returned input function will be called while
constructing the TensorFlow graph, not while running the graph. What it is
returning is a representation of the input data as the fundamental unit of
TensorFlow computations, a `Tensor` (or `SparseTensor`).
Each continuous column in the train or test data will be converted into a
`Tensor`, which in general is a good format to represent dense data. For
categorical data, we must represent the data as a `SparseTensor`. This data
format is good for representing sparse data. Our `input_fn` uses the `tf.data`
API, which makes it easy to apply transformations to our dataset:
```python
def input_fn(data_file, num_epochs, shuffle, batch_size):
  """Generate an input function for the Estimator."""
  assert tf.gfile.Exists(data_file), (
      '%s not found. Please make sure you have either run data_download.py or '
      'set both arguments --train_data and --test_data.' % data_file)

  def parse_csv(value):
    print('Parsing', data_file)
    columns = tf.decode_csv(value, record_defaults=_CSV_COLUMN_DEFAULTS)
    features = dict(zip(_CSV_COLUMNS, columns))
    labels = features.pop('income_bracket')
    return features, tf.equal(labels, '>50K')

  # Extract lines from input files using the Dataset API.
  dataset = tf.data.TextLineDataset(data_file)

  if shuffle:
    dataset = dataset.shuffle(buffer_size=_SHUFFLE_BUFFER)

  dataset = dataset.map(parse_csv, num_parallel_calls=5)

  # We call repeat after shuffling, rather than before, to prevent separate
  # epochs from blending together.
  dataset = dataset.repeat(num_epochs)
  dataset = dataset.batch(batch_size)

  iterator = dataset.make_one_shot_iterator()
  features, labels = iterator.get_next()
  return features, labels
```
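To make the shape of the `(features, labels)` pair concrete, the parsing step
can be sketched without TensorFlow. This is only an illustration: the
`COLUMNS` list is a hypothetical four-column subset of the census data and
`parse_line` is a made-up helper, not part of the tutorial code.

```python
import csv

# Hypothetical subset of the census columns, for illustration only.
COLUMNS = ['age', 'education', 'hours_per_week', 'income_bracket']


def parse_line(line):
    """Split one CSV line into a features dict and a boolean label."""
    values = next(csv.reader([line], skipinitialspace=True))
    features = dict(zip(COLUMNS, values))
    label = features.pop('income_bracket')
    # Continuous columns become numbers; categorical ones stay strings.
    features['age'] = float(features['age'])
    features['hours_per_week'] = float(features['hours_per_week'])
    return features, label == '>50K'


features, label = parse_line('39, Bachelors, 40, >50K')
print(features)  # {'age': 39.0, 'education': 'Bachelors', 'hours_per_week': 40.0}
print(label)     # True
```

As in `parse_csv` above, the label column is removed from the features and
turned into a boolean that is true when the income is over 50K.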
## Selecting and Engineering Features for the Model
Selecting and crafting the right set of feature columns is key to learning an
effective model. A **feature column** can be either one of the raw columns in
the original dataframe (let's call them **base feature columns**), or any new
columns created based on some transformations defined over one or multiple base
columns (let's call them **derived feature columns**). Basically, "feature
column" is an abstract concept of any raw or derived variable that can be used
to predict the target label.
### Base Categorical Feature Columns
To define a feature column for a categorical feature, we can create a
`CategoricalColumn` using the tf.feature_column API. If you know the set of all
possible feature values of a column and there are only a few of them, you can
use `categorical_column_with_vocabulary_list`. Each key in the list will get
assigned an auto-incremental ID starting from 0. For example, for the
`relationship` column we can assign the feature string "Husband" to an integer
ID of 0 and "Not-in-family" to 1, etc., by doing:
```python
relationship = tf.feature_column.categorical_column_with_vocabulary_list(
'relationship', [
'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried',
'Other-relative'])
```
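Conceptually, the vocabulary list acts like a lookup table from string to
integer ID. A minimal plain-Python sketch of that idea follows; the `lookup`
helper and its `-1` fallback for unknown values are choices made for this
illustration, not a description of the library's internals.

```python
vocabulary = [
    'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried',
    'Other-relative']

# Each entry gets an auto-incremented ID starting from 0.
vocab_to_id = {value: index for index, value in enumerate(vocabulary)}


def lookup(value, default=-1):
    """Return the ID for a known value, or `default` for an unknown one."""
    return vocab_to_id.get(value, default)


print(lookup('Husband'))        # 0
print(lookup('Not-in-family'))  # 1
print(lookup('Sibling'))        # -1 (not in the vocabulary)
```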
What if we don't know the set of possible values in advance? Not a problem. We
can use `categorical_column_with_hash_bucket` instead:
```python
occupation = tf.feature_column.categorical_column_with_hash_bucket(
'occupation', hash_bucket_size=1000)
```
Each possible value in the feature column `occupation` is hashed to an integer
ID as it is encountered in training. The table below shows an example:
ID | Feature
--- | -------------
... |
9 | `"Machine-op-inspct"`
... |
103 | `"Farming-fishing"`
... |
375 | `"Protective-serv"`
... |
Whichever way we choose to define a `CategoricalColumn`, each feature string
is mapped to an integer ID, either by looking it up in a fixed mapping or by
hashing. Hash collisions are possible, but they may not significantly impact
model quality. Under the hood, the `LinearClassifier` estimator is responsible
for managing the mapping and creating a `tf.Variable` to store the model
parameters (also known as model weights) for each feature ID. The model
parameters will be learned through the model training process we'll go through
later.
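The hashing idea can be sketched in a few lines of plain Python. This is an
illustration under stated assumptions: it uses MD5 from `hashlib` as a stable
hash, which is not the hash function TensorFlow uses, so the IDs it produces
will not match the example table above. `hash_bucket` is a hypothetical helper.

```python
import hashlib


def hash_bucket(value, hash_bucket_size):
    """Map a string to a stable integer ID in [0, hash_bucket_size)."""
    digest = hashlib.md5(value.encode('utf-8')).hexdigest()
    return int(digest, 16) % hash_bucket_size


for occupation in ('Machine-op-inspct', 'Farming-fishing', 'Protective-serv'):
    print(occupation, hash_bucket(occupation, 1000))

# With fewer buckets than distinct values, collisions are unavoidable:
# three values hashed into two buckets must share at least one ID.
ids = {hash_bucket(v, 2) for v in ('a', 'b', 'c')}
print(len(ids) <= 2)  # True
```

No vocabulary needs to be known in advance; the price is that two different
strings can land in the same bucket and share a weight.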
We'll use the same approach to define the other categorical features:
```python
education = tf.feature_column.categorical_column_with_vocabulary_list(
'education', [
'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',
'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',
'5th-6th', '10th', '1st-4th', 'Preschool', '12th'])
marital_status = tf.feature_column.categorical_column_with_vocabulary_list(
'marital_status', [
'Married-civ-spouse', 'Divorced', 'Married-spouse-absent',
'Never-married', 'Separated', 'Married-AF-spouse', 'Widowed'])
relationship = tf.feature_column.categorical_column_with_vocabulary_list(
'relationship', [
'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried',
'Other-relative'])
workclass = tf.feature_column.categorical_column_with_vocabulary_list(
'workclass', [
'Self-emp-not-inc', 'Private', 'State-gov', 'Federal-gov',
'Local-gov', '?', 'Self-emp-inc', 'Without-pay', 'Never-worked'])
# To show an example of hashing:
occupation = tf.feature_column.categorical_column_with_hash_bucket(
'occupation', hash_bucket_size=1000)
```
### Base Continuous Feature Columns
Similarly, we can define a `NumericColumn` for each continuous feature column
that we want to use in the model:
```python
age = tf.feature_column.numeric_column('age')
education_num = tf.feature_column.numeric_column('education_num')
capital_gain = tf.feature_column.numeric_column('capital_gain')
capital_loss = tf.feature_column.numeric_column('capital_loss')
hours_per_week = tf.feature_column.numeric_column('hours_per_week')
```
### Making Continuous Features Categorical through Bucketization
Sometimes the relationship between a continuous feature and the label is not
linear. As a hypothetical example, a person's income may grow with age in the
early stage of one's career, then the growth may slow at some point, and finally
the income decreases after retirement. In this scenario, using the raw `age` as
a real-valued feature column might not be a good choice because the model can
only learn one of the three cases:
1. Income always increases at some rate as age grows (positive correlation),
1. Income always decreases at some rate as age grows (negative correlation), or
1. Income stays the same no matter at what age (no correlation)
If we want to learn the fine-grained correlation between income and each age
group separately, we can leverage **bucketization**. Bucketization is a process
of dividing the entire range of a continuous feature into a set of consecutive
bins/buckets, and then converting the original numerical feature into a bucket
ID (as a categorical feature) depending on which bucket that value falls into.
So, we can define a `bucketized_column` over `age` as:
```python
age_buckets = tf.feature_column.bucketized_column(
age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
```
where the `boundaries` is a list of bucket boundaries. In this case, there are
10 boundaries, resulting in 11 age group buckets (from age 17 and below, 18-24,
25-29, ..., to 65 and over).
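The mapping from an age to its bucket ID can be sketched with the standard
library's `bisect` module. This illustrates the bucketing rule only; it is not
the code that `bucketized_column` runs internally, and `bucket_id` is a
hypothetical helper name.

```python
import bisect

boundaries = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]


def bucket_id(age):
    """Return the index of the bucket that `age` falls into.

    A value equal to a boundary belongs to the bucket that starts there,
    so 18 falls into the 18-24 bucket rather than "17 and below".
    """
    return bisect.bisect_right(boundaries, age)


for age in (17, 18, 24, 25, 64, 65):
    print(age, bucket_id(age))
# 17 0
# 18 1
# 24 1
# 25 2
# 64 9
# 65 10

print(len(boundaries) + 1)  # 11 buckets in total
```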
### Intersecting Multiple Columns with CrossedColumn
Using each base feature column separately may not be enough to explain the data.
For example, the correlation between education and the label (earning > 50,000
dollars) may be different for different occupations. Therefore, if we only learn
a single model weight for `education="Bachelors"` and `education="Masters"`, we
won't be able to capture every single education-occupation combination (e.g.
distinguishing between `education="Bachelors" AND occupation="Exec-managerial"`
and `education="Bachelors" AND occupation="Craft-repair"`). To learn the
differences between different feature combinations, we can add **crossed feature
columns** to the model.
```python
education_x_occupation = tf.feature_column.crossed_column(
['education', 'occupation'], hash_bucket_size=1000)
```
We can also create a `CrossedColumn` over more than two columns. Each
constituent column can be either a base feature column that is categorical
(`CategoricalColumn`), a bucketized real-valued feature column
(`BucketizedColumn`), or even another `CrossedColumn`. Here's an example:
```python
age_buckets_x_education_x_occupation = tf.feature_column.crossed_column(
[age_buckets, 'education', 'occupation'], hash_bucket_size=1000)
```
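One way to picture a feature cross is as a single ID derived from the
combination of values. The sketch below joins the values into one key and
hashes it into a fixed number of buckets. It is an assumption-laden
illustration: the `_X_` separator, the MD5 hash, and the `crossed_bucket`
helper are all made up here and do not reproduce TensorFlow's hashing.

```python
import hashlib


def crossed_bucket(values, hash_bucket_size):
    """Hash a combination of feature values into one bucket ID."""
    key = '_X_'.join(values)  # hypothetical separator for this sketch
    digest = hashlib.md5(key.encode('utf-8')).hexdigest()
    return int(digest, 16) % hash_bucket_size


# Each education-occupation combination gets its own ID (up to collisions),
# so the model can learn a separate weight for each combination.
print(crossed_bucket(['Bachelors', 'Exec-managerial'], 1000))
print(crossed_bucket(['Bachelors', 'Craft-repair'], 1000))
```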
## Defining The Logistic Regression Model
After processing the input data and defining all the feature columns, we're now
ready to put them all together and build a Logistic Regression model. In the
previous section we've seen several types of base and derived feature columns,
including:
* `CategoricalColumn`
* `NumericColumn`
* `BucketizedColumn`
* `CrossedColumn`
All of these are subclasses of the abstract `FeatureColumn` class, and can be
added to the `feature_columns` field of a model:
```python
base_columns = [
education, marital_status, relationship, workclass, occupation,
age_buckets,
]
crossed_columns = [
tf.feature_column.crossed_column(
['education', 'occupation'], hash_bucket_size=1000),
tf.feature_column.crossed_column(
[age_buckets, 'education', 'occupation'], hash_bucket_size=1000),
]
model_dir = tempfile.mkdtemp()
model = tf.estimator.LinearClassifier(
model_dir=model_dir, feature_columns=base_columns + crossed_columns)
```
The model also automatically learns a bias term, which controls the prediction
one would make without observing any features (see the section "How Logistic
Regression Works" for more explanations). The learned model files will be stored
in `model_dir`.
## Training and Evaluating Our Model
After adding all the features to the model, now let's look at how to actually
train the model. Training a model is just a single command using the
tf.estimator API:
```python
model.train(input_fn=lambda: input_fn(train_data, num_epochs, True, batch_size))
```
After the model is trained, we can evaluate how good our model is at predicting
the labels of the holdout data:
```python
results = model.evaluate(input_fn=lambda: input_fn(
    test_data, 1, False, batch_size))
for key in sorted(results):
  print('%s: %s' % (key, results[key]))
```
The first line of the final output should be something like
`accuracy: 0.83557522`, which means the accuracy is 83.6%. Feel free to try more
features and transformations and see if you can do even better!
After the model is evaluated, we can use it to predict whether an individual
has an annual income of over 50,000 dollars, given that individual's
information.
```python
pred_iter = model.predict(input_fn=lambda: input_fn(test_data, 1, False, 1))
for pred in pred_iter:
  print(pred['classes'])
```
The prediction output looks like `[b'1']` or `[b'0']`, indicating whether or
not the corresponding individual has an annual income of over 50,000 dollars.
If you'd like to see a working end-to-end example, you can download our
[example code](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py)
and set the `model_type` flag to `wide`.
## Adding Regularization to Prevent Overfitting
Regularization is a technique used to avoid **overfitting**. Overfitting happens
when your model does well on the data it is trained on, but worse on test data
that the model has not seen before, such as live traffic. Overfitting generally
occurs when a model is excessively complex, such as having too many parameters
relative to the number of training examples. Regularization lets you control
your model's complexity and makes the model more generalizable to unseen data.
In the Linear Model library, you can add L1 and L2 regularizations to the model
as:
```python
model = tf.estimator.LinearClassifier(
    model_dir=model_dir, feature_columns=base_columns + crossed_columns,
    optimizer=tf.train.FtrlOptimizer(
        learning_rate=0.1,
        l1_regularization_strength=1.0,
        l2_regularization_strength=1.0))
```
One important difference between L1 and L2 regularization is that L1
regularization tends to make model weights stay at zero, creating sparser
models, whereas L2 regularization also tries to make the model weights closer to
zero but not necessarily zero. Therefore, if you increase the strength of L1
regularization, you will have a smaller model size because many of the model
weights will be zero. This is often desirable when the feature space is very
large but sparse, and when there are resource constraints that prevent you from
serving a model that is too large.
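The difference can be seen in a small plain-Python sketch. It applies one
textbook shrinkage step for each penalty (soft-thresholding for L1,
proportional shrinkage for L2). This is an illustration only, not the update
that `FtrlOptimizer` performs, and the helper names and the strength of `0.5`
are made up for the example.

```python
def l1_shrink(weight, strength):
    """Soft-thresholding: weights within `strength` of zero become exactly 0."""
    if weight > strength:
        return weight - strength
    if weight < -strength:
        return weight + strength
    return 0.0


def l2_shrink(weight, strength):
    """Proportional shrinkage: weights move toward zero but stay nonzero."""
    return weight / (1.0 + strength)


weights = [2.0, 0.3, -0.1, -1.5]
l1_weights = [l1_shrink(w, 0.5) for w in weights]
l2_weights = [l2_shrink(w, 0.5) for w in weights]

print(l1_weights)  # [1.5, 0.0, 0.0, -1.0]
print(l2_weights)  # every weight is smaller in magnitude, none is zero
```

The two small weights are zeroed by the L1 step, which is why stronger L1
regularization yields a sparser, smaller model.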
In practice, you should try various combinations of L1 and L2 regularization
strengths and pick the ones that best control overfitting while giving you a
desirable model size.
## How Logistic Regression Works
Finally, let's take a minute to talk about what the Logistic Regression model
actually looks like in case you're not already familiar with it. We'll denote
the label as \\(Y\\), and the set of observed features as a feature vector
\\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). We define \\(Y=1\\) if an individual
earned > 50,000 dollars and \\(Y=0\\) otherwise. In Logistic Regression, the
probability of the label being positive (\\(Y=1\\)) given the features
\\(\mathbf{x}\\) is given as:
$$ P(Y=1|\mathbf{x}) = \frac{1}{1+\exp(-(\mathbf{w}^T\mathbf{x}+b))}$$
where \\(\mathbf{w}=[w_1, w_2, ..., w_d]\\) are the model weights for the
features \\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). \\(b\\) is a constant that is
often called the **bias** of the model. The equation consists of two parts, a
linear model and a logistic function:
* **Linear Model**: First, we can see that \\(\mathbf{w}^T\mathbf{x}+b = b +
w_1x_1 + ... +w_dx_d\\) is a linear model where the output is a linear
function of the input features \\(\mathbf{x}\\). The bias \\(b\\) is the
prediction one would make without observing any features. The model weight
\\(w_i\\) reflects how the feature \\(x_i\\) is correlated with the positive
label. If \\(x_i\\) is positively correlated with the positive label, the
weight \\(w_i\\) increases, and the probability \\(P(Y=1|\mathbf{x})\\) will
be closer to 1. On the other hand, if \\(x_i\\) is negatively correlated
with the positive label, then the weight \\(w_i\\) decreases and the
probability \\(P(Y=1|\mathbf{x})\\) will be closer to 0.
* **Logistic Function**: Second, we can see that there's a logistic function
(also known as the sigmoid function) \\(S(t) = 1/(1+\exp(-t))\\) being
applied to the linear model. The logistic function converts the output of
the linear model \\(\mathbf{w}^T\mathbf{x}+b\\) from any real number into
the range \\((0, 1)\\), which can be interpreted as a probability.
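The two parts can be written out directly in plain Python. This is a minimal
sketch of the formula above, not the Estimator's implementation; the function
names and the example weights are made up for illustration.

```python
import math


def sigmoid(t):
    """Logistic function S(t) = 1 / (1 + exp(-t)), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def predict_probability(weights, bias, features):
    """Return P(Y=1 | x) = S(w^T x + b)."""
    linear = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(linear)


# With no features observed (all zeros), the prediction depends on the bias
# alone; a bias of 0 gives a probability of exactly 0.5.
print(predict_probability([1.0, -2.0], 0.0, [0.0, 0.0]))  # 0.5

# A positive weight on an active feature pushes the probability above 0.5.
print(predict_probability([1.0, -2.0], 0.0, [1.0, 0.0]) > 0.5)  # True
```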
Model training is an optimization problem: The goal is to find a set of model
weights (i.e. model parameters) to minimize a **loss function** defined over the
training data, such as logistic loss for Logistic Regression models. The loss
function measures the discrepancy between the ground-truth label and the model's
prediction. If the prediction is very close to the ground-truth label, the loss
value will be low; if the prediction is very far from the label, the loss
value will be high.
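For a single example, the logistic loss can be sketched as follows. This is
the standard log-loss formula written in plain Python for illustration; the
`logistic_loss` name is hypothetical, and the predicted probability is assumed
to lie strictly between 0 and 1.

```python
import math


def logistic_loss(label, probability):
    """Log loss for one example; `label` is 0 or 1, 0 < probability < 1."""
    return -(label * math.log(probability)
             + (1 - label) * math.log(1 - probability))


# A confident, correct prediction has a low loss...
print(logistic_loss(1, 0.9))
# ...while a confident, wrong prediction has a high loss.
print(logistic_loss(1, 0.1))
```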
## Learn Deeper
If you're interested in learning more, check out our
@{$wide_and_deep$Wide & Deep Learning Tutorial} where we'll show you how to
combine the strengths of linear models and deep neural networks by jointly
training them using the tf.estimator API.

@ -1,243 +0,0 @@
# TensorFlow Wide & Deep Learning Tutorial
In the previous @{$wide$TensorFlow Linear Model Tutorial}, we trained a logistic
regression model to predict the probability that the individual has an annual
income of over 50,000 dollars using the
[Census Income Dataset](https://archive.ics.uci.edu/ml/datasets/Census+Income).
TensorFlow is great for training deep neural networks too, and you might be
wondering which one to choose. Why not both? Would it be possible to combine
the strengths of both in one model?
In this tutorial, we'll introduce how to use the tf.estimator API to jointly
train a wide linear model and a deep feed-forward neural network. This approach
combines the strengths of memorization and generalization. It's useful for
generic large-scale regression and classification problems with sparse input
features (e.g., categorical features with a large number of possible feature
values). If you're interested in learning more about how Wide & Deep Learning
works, please check out our [research paper](https://arxiv.org/abs/1606.07792).
![Wide & Deep Spectrum of Models](https://www.tensorflow.org/images/wide_n_deep.svg "Wide & Deep")
The figure above shows a comparison of a wide model (logistic regression with
sparse features and transformations), a deep model (feed-forward neural network
with an embedding layer and several hidden layers), and a Wide & Deep model
(joint training of both). At a high level, there are only 3 steps to configure a
wide, deep, or Wide & Deep model using the tf.estimator API:
1. Select features for the wide part: Choose the sparse base columns and
crossed columns you want to use.
1. Select features for the deep part: Choose the continuous columns, the
embedding dimension for each categorical column, and the hidden layer sizes.
1. Put them all together in a Wide & Deep model
(`DNNLinearCombinedClassifier`).
And that's it! Let's go through a simple example.
## Setup
To try the code for this tutorial:
1. @{$install$Install TensorFlow} if you haven't already.
2. Download [the tutorial code](https://github.com/tensorflow/models/tree/master/official/wide_deep/).
3. Execute the data download script we provide to you:
$ python data_download.py
4. Execute the tutorial code with the following command to train the wide and
deep model described in this tutorial:
$ python wide_deep.py
Read on to find out how this code builds its model.
## Define Base Feature Columns
First, let's define the base categorical and continuous feature columns that
we'll use. These base columns will be the building blocks used by both the wide
part and the deep part of the model.
```python
import tensorflow as tf
# Continuous columns
age = tf.feature_column.numeric_column('age')
education_num = tf.feature_column.numeric_column('education_num')
capital_gain = tf.feature_column.numeric_column('capital_gain')
capital_loss = tf.feature_column.numeric_column('capital_loss')
hours_per_week = tf.feature_column.numeric_column('hours_per_week')
education = tf.feature_column.categorical_column_with_vocabulary_list(
'education', [
'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',
'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',
'5th-6th', '10th', '1st-4th', 'Preschool', '12th'])
marital_status = tf.feature_column.categorical_column_with_vocabulary_list(
'marital_status', [
'Married-civ-spouse', 'Divorced', 'Married-spouse-absent',
'Never-married', 'Separated', 'Married-AF-spouse', 'Widowed'])
relationship = tf.feature_column.categorical_column_with_vocabulary_list(
'relationship', [
'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried',
'Other-relative'])
workclass = tf.feature_column.categorical_column_with_vocabulary_list(
'workclass', [
'Self-emp-not-inc', 'Private', 'State-gov', 'Federal-gov',
'Local-gov', '?', 'Self-emp-inc', 'Without-pay', 'Never-worked'])
# To show an example of hashing:
occupation = tf.feature_column.categorical_column_with_hash_bucket(
'occupation', hash_bucket_size=1000)
# Transformations.
age_buckets = tf.feature_column.bucketized_column(
age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
```
## The Wide Model: Linear Model with Crossed Feature Columns
The wide model is a linear model with a wide set of sparse and crossed feature
columns:
```python
base_columns = [
education, marital_status, relationship, workclass, occupation,
age_buckets,
]
crossed_columns = [
tf.feature_column.crossed_column(
['education', 'occupation'], hash_bucket_size=1000),
tf.feature_column.crossed_column(
[age_buckets, 'education', 'occupation'], hash_bucket_size=1000),
]
```
You can also see the @{$wide$TensorFlow Linear Model Tutorial} for more details.
Wide models with crossed feature columns can memorize sparse interactions
between features effectively. That being said, one limitation of crossed feature
columns is that they do not generalize to feature combinations that have not
appeared in the training data. Let's add a deep model with embeddings to fix
that.
## The Deep Model: Neural Network with Embeddings
The deep model is a feed-forward neural network, as shown in the previous
figure. Each sparse, high-dimensional categorical feature is first converted
into a low-dimensional, dense, real-valued vector, often referred to as an
embedding vector. These low-dimensional dense embedding vectors are
concatenated with the continuous features, and then fed into the hidden layers
of a neural network in the forward pass. The embedding values are initialized
randomly, and are trained along with all other model parameters to minimize the
training loss. If you're interested in learning more about embeddings, check out
the TensorFlow tutorial on @{$word2vec$Vector Representations of Words} or
[Word embedding](https://en.wikipedia.org/wiki/Word_embedding) on Wikipedia.
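
The lookup-and-concatenate step described above can be sketched in plain
Python. This is only an illustration under stated assumptions: the vocabulary,
the embedding dimension of 4, the initialization range, and the `deep_input`
helper are all hypothetical, and in a real model the table entries would be
updated by training.

```python
import random

random.seed(0)

# Hypothetical vocabulary and embedding size, chosen for illustration.
vocabulary = ['Tech-support', 'Sales', 'Exec-managerial']
embedding_dim = 4

# One randomly initialized dense vector per categorical value.
embedding_table = {
    value: [random.uniform(-0.1, 0.1) for _ in range(embedding_dim)]
    for value in vocabulary
}


def deep_input(occupation, continuous_features):
  """Concatenates the embedding vector with the continuous features."""
  return embedding_table[occupation] + list(continuous_features)


# Two continuous features (for example, age and hours per week).
x = deep_input('Sales', [39.0, 40.0])
print(len(x))
```

The resulting vector has `embedding_dim` entries from the embedding followed by
the continuous features, and it is this vector that would be fed to the first
hidden layer.
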
Another way to represent categorical columns to feed into a neural network is
via a one-hot or multi-hot representation. This is often appropriate for
categorical columns with only a few possible values. As an example of a one-hot
representation, for the relationship column, `"Husband"` can be represented as
[1, 0, 0, 0, 0, 0], and `"Not-in-family"` as [0, 1, 0, 0, 0, 0], etc. This is a
fixed representation, whereas embeddings are more flexible and calculated at
training time.
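
As a small sketch of these fixed representations, the helpers below build
one-hot and multi-hot vectors for the `relationship` vocabulary used earlier.
The function names `one_hot` and `multi_hot` are hypothetical and are not part
of the TensorFlow API.

```python
RELATIONSHIP_VOCAB = [
    'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried',
    'Other-relative']


def multi_hot(values, vocabulary):
  """Returns a vector with a 1 for every vocabulary entry present in values."""
  present = set(values)
  return [1 if entry in present else 0 for entry in vocabulary]


def one_hot(value, vocabulary):
  """Returns a vector with a single 1 at the position of value."""
  return multi_hot([value], vocabulary)


print(one_hot('Husband', RELATIONSHIP_VOCAB))        # [1, 0, 0, 0, 0, 0]
print(one_hot('Not-in-family', RELATIONSHIP_VOCAB))  # [0, 1, 0, 0, 0, 0]
```
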
We'll configure the embeddings for the categorical columns using
`embedding_column`, and concatenate them with the continuous columns.
We also use `indicator_column` to create multi-hot representations of some
categorical columns.
```python
deep_columns = [
    age,
    education_num,
    capital_gain,
    capital_loss,
    hours_per_week,
    tf.feature_column.indicator_column(workclass),
    tf.feature_column.indicator_column(education),
    tf.feature_column.indicator_column(marital_status),
    tf.feature_column.indicator_column(relationship),
    # To show an example of embedding
    tf.feature_column.embedding_column(occupation, dimension=8),
]
```
The higher the `dimension` of the embedding is, the more degrees of freedom the
model will have to learn the representations of the features. For simplicity, we
set the dimension to 8 for all feature columns here. Empirically, a more
informed decision for the number of dimensions is to start with a value on the
order of \\(\log_2(n)\\) or \\(k\sqrt[4]n\\), where \\(n\\) is the number of
unique values in a feature column and \\(k\\) is a small constant (usually
smaller than 10).
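
The two rules of thumb can be evaluated with a few lines of Python. This is a
rough sketch: rounding to the nearest integer and the default of `k=2` are
assumptions made for illustration, and the function names are hypothetical.

```python
import math


def embedding_dim_log2(n):
  """Suggests a dimension on the order of log2(n)."""
  return max(1, int(round(math.log2(n))))


def embedding_dim_fourth_root(n, k=2):
  """Suggests a dimension on the order of k * n**(1/4)."""
  return max(1, int(round(k * n ** 0.25)))


# For a column with 1000 distinct values:
print(embedding_dim_log2(1000))         # 10
print(embedding_dim_fourth_root(1000))  # 11
```

Both heuristics only give a starting point; the final value is usually chosen
by comparing evaluation metrics.
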
Through dense embeddings, deep models can generalize better and make predictions
on feature pairs that were previously unseen in the training data. However, it
is difficult to learn effective low-dimensional representations for feature
columns when the underlying interaction matrix between two feature columns is
sparse and high-rank. In such cases, the interaction between most feature pairs
should be zero except a few, but dense embeddings will lead to nonzero
predictions for all feature pairs, and thus can over-generalize. On the other
hand, linear models with crossed features can memorize these “exception rules”
effectively with fewer model parameters.
Now, let's see how to jointly train wide and deep models and allow them to
complement each other's strengths and weaknesses.
## Combining Wide and Deep Models into One
The wide and deep models are combined by summing their final output log odds to
form the prediction, which is then fed to a logistic loss function. The graph
definition and variable allocation are handled for you under the hood, so you
only need to create a `DNNLinearCombinedClassifier`:
```python
model = tf.estimator.DNNLinearCombinedClassifier(
    model_dir='/tmp/census_model',
    linear_feature_columns=base_columns + crossed_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])
```
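
The arithmetic behind the combination described in this section can be sketched
without TensorFlow. The logit values below are made up, the optional `bias`
argument is an assumption, and the function names are hypothetical. The
estimator performs the equivalent computation internally.

```python
import math


def sigmoid(z):
  return 1.0 / (1.0 + math.exp(-z))


def combined_probability(wide_logit, deep_logit, bias=0.0):
  """Sums the log odds of both parts and converts them to a probability."""
  return sigmoid(wide_logit + deep_logit + bias)


def logistic_loss(label, probability):
  """Cross-entropy loss for a binary label in {0, 1}."""
  return -(label * math.log(probability) +
           (1 - label) * math.log(1.0 - probability))


# Hypothetical outputs of the wide and deep parts for one example.
p = combined_probability(wide_logit=0.8, deep_logit=-0.3)
loss = logistic_loss(1, p)
print(p, loss)
```

Because the loss is computed on the summed log odds, its gradient flows into
both parts at once, which is what training them jointly means here.
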
## Training and Evaluating the Model
Before we train the model, let's read in the Census dataset as we did in the
@{$wide$TensorFlow Linear Model tutorial}. See `data_download.py` as well as
`input_fn` within
[`wide_deep.py`](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py).
After reading in the data, you can train and evaluate the model:
```python
# Train and evaluate the model every `FLAGS.epochs_per_eval` epochs.
for n in range(FLAGS.train_epochs // FLAGS.epochs_per_eval):
  model.train(input_fn=lambda: input_fn(
      FLAGS.train_data, FLAGS.epochs_per_eval, True, FLAGS.batch_size))
  results = model.evaluate(input_fn=lambda: input_fn(
      FLAGS.test_data, 1, False, FLAGS.batch_size))

  # Display evaluation metrics
  print('Results at epoch', (n + 1) * FLAGS.epochs_per_eval)
  print('-' * 30)

  for key in sorted(results):
    print('%s: %s' % (key, results[key]))
```
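
To see how the loop above spaces out its evaluations, the sketch below computes
the epochs at which results are printed. The values 40 and 2 are illustrative
assumptions, not necessarily the defaults of the example script, and
`eval_schedule` is a hypothetical helper.

```python
def eval_schedule(train_epochs, epochs_per_eval):
  """Returns the epoch counts at which the loop evaluates the model."""
  cycles = train_epochs // epochs_per_eval
  return [(n + 1) * epochs_per_eval for n in range(cycles)]


schedule = eval_schedule(train_epochs=40, epochs_per_eval=2)
print(len(schedule), schedule[0], schedule[-1])  # 20 2 40
```

Because the loop uses floor division, any remainder is dropped: with
`train_epochs=5` and `epochs_per_eval=2`, the model trains for only 4 epochs.
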
The final output accuracy should be somewhere around 85.5%. If you'd like to
see a working end-to-end example, you can download our
[example code](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py).
Note that this tutorial is just a quick example on a small dataset to get you
familiar with the API. Wide & Deep Learning will be even more powerful if you
try it on a large dataset with many sparse feature columns that have a large
number of possible feature values. Again, feel free to take a look at our
[research paper](https://arxiv.org/abs/1606.07792) for more ideas about how to
apply Wide & Deep Learning in real-world large-scale machine learning problems.