Update estimators section for getting started.
PiperOrigin-RevId: 204071680
tensorflow/docs_src

@@ -561,9 +561,9 @@ For more examples on feature columns, view the following:

 * The @{$low_level_intro#feature_columns$Low Level Introduction} demonstrates how
   to experiment directly with `feature_columns` using TensorFlow's low level APIs.
-* The @{$wide$wide} and @{$wide_and_deep$Wide & Deep} Tutorials solve a
-  binary classification problem using `feature_columns` on a variety of input
-  data types.
+* The [Estimator wide and deep learning tutorial](https://github.com/tensorflow/models/tree/master/official/wide_deep)
+  solves a binary classification problem using `feature_columns` on a variety of
+  input data types.

 To learn more about embeddings, see the following:

@@ -248,7 +248,8 @@ The images below show the CIFAR-10 model with tensor shape information:
 Often it is useful to collect runtime metadata for a run, such as total memory
 usage, total compute time, and tensor shapes for nodes. The code example below
 is a snippet from the train and test section of a modification of the
-@{$layers$simple MNIST tutorial}, in which we have recorded summaries and
+[Estimators MNIST tutorial](../tutorials/estimators/cnn.md), in which we have
+recorded summaries and
 runtime statistics. See the
 @{$summaries_and_tensorboard#serializing-the-data$Summaries Tutorial}
 for details on how to record summaries.
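
The runtime-statistics snippet this hunk refers to is not shown here. For readers following along, a minimal TF 1.x sketch of collecting run metadata might look roughly like the following (the names `merged`, `train_step`, `sess`, `writer`, and the step counter `i` are assumed from the surrounding MNIST example, not defined in this hunk):

```python
import tensorflow as tf

# Assumed context: `merged` (merged summaries), `train_step` (training op),
# `sess` (a tf.Session), `writer` (a tf.summary.FileWriter), and step `i`.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
summary, _ = sess.run([merged, train_step],
                      options=run_options,
                      run_metadata=run_metadata)
# Attach the collected memory/compute statistics to this step in TensorBoard.
writer.add_run_metadata(run_metadata, 'step%d' % i)
writer.add_summary(summary, i)
```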

@@ -170,15 +170,16 @@ landing_page:
         <div class="devsite-landing-row-item-description-content">
           <p>
             Estimators can train large models on multiple machines in a
-            production environment. Read the
-            <a href="/guide/estimators">Estimators guide</a> for details.
+            production environment. TensorFlow provides a collection of
+            pre-made Estimators to implement common ML algorithms. See the
+            <a href="/guide/estimators">Estimators guide</a>.
           </p>
           <ol style="padding-left: 20px;">
-            <li><a href="/tutorials/images/layers">Build a Convolutional Neural Network using Estimators</a></li>
+            <li><a href="/guide/premade_estimators">Premade Estimators guide</a></li>
+            <li><a href="https://github.com/tensorflow/models/tree/master/official/wide_deep" class="external">Wide and deep learning with Estimators</a></li>
+            <li><a href="https://github.com/tensorflow/models/tree/master/official/boosted_trees" class="external">Boosted trees</a></li>
             <li><a href="/hub/tutorials/text_classification_with_tf_hub">How to build a simple text classifier with TF-Hub</a></li>
-            <li><a href="https://github.com/tensorflow/models/tree/master/official/boosted_trees">Classifying Higgs boson processes</a></li>
-            <li><a href="/tutorials/representation/wide_and_deep">Wide and deep learning using Estimators</a></li>
-            <li><a href="/tutorials/representation/linear">Large-scale linear models</a></li>
+            <li><a href="/tutorials/estimators/cnn">Build a Convolutional Neural Network using Estimators</a></li>
           </ol>
         </div>
         <div class="devsite-landing-row-item-buttons">

@@ -37,15 +37,27 @@ toc:
     status: external
   - title: "Custom training: walkthrough"
     path: /tutorials/eager/custom_training_walkthrough
-  - title: Neural machine translation
+  - title: Translation with attention
    path: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/eager/python/examples/nmt_with_attention/nmt_with_attention.ipynb
    status: external

+- title: ML at production scale
+  style: accordion
+  section:
+  - title: Wide and deep learning
+    path: https://github.com/tensorflow/models/tree/master/official/wide_deep
+    status: external
+  - title: Boosted trees
+    path: https://github.com/tensorflow/models/tree/master/official/boosted_trees
+    status: external
+  - title: Text classifier with TF-Hub
+    path: /hub/tutorials/text_classification_with_tf_hub
+  - title: Build a CNN using Estimators
+    path: /tutorials/estimators/cnn
+
 - title: Images
   style: accordion
   section:
-  - title: Build a CNN using Estimators
-    path: /tutorials/images/layers
   - title: Image recognition
     path: /tutorials/images/image_recognition
   - title: Image retraining

@@ -69,10 +81,6 @@ toc:
 - title: Data representation
   style: accordion
   section:
-  - title: Linear models
-    path: /tutorials/representation/wide
-  - title: Wide and deep learning
-    path: /tutorials/representation/wide_and_deep
   - title: Vector representations of words
     path: /tutorials/representation/word2vec
   - title: Kernel methods

@@ -449,7 +449,7 @@ covering them.

 To find out more about implementing convolutional neural networks, you can jump
 to the TensorFlow @{$deep_cnn$deep convolutional networks tutorial},
-or start a bit more gently with our @{$layers$MNIST starter tutorial}.
+or start a bit more gently with our [Estimator MNIST tutorial](../estimators/cnn.md).
 Finally, if you want to get up to speed on research in this area, you can
 read the recent work of all the papers referenced in this tutorial.

@@ -11,8 +11,9 @@ those tools. It explains:
    deep learning to get the advantages of both.

 Read this overview to decide whether the Estimator's linear model tools might
-be useful to you. Then do the @{$wide$Linear Models tutorial} to
-give it a try. This overview uses code samples from the tutorial, but the
+be useful to you. Then work through the
+[Estimator wide and deep learning tutorial](https://github.com/tensorflow/models/tree/master/official/wide_deep)
+to give it a try. This overview uses code samples from the tutorial, but the
 tutorial walks through the code in greater detail.

 To understand this overview it will help to have some familiarity

@@ -176,7 +177,7 @@ the name of a `FeatureColumn`. Each key's value is a tensor containing the
 values of that feature for all data instances. See
 @{$premade_estimators#input_fn} for a
 more comprehensive look at input functions, and `input_fn` in the
-[linear models tutorial code](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py)
+[wide and deep learning tutorial](https://github.com/tensorflow/models/tree/master/official/wide_deep)
 for an example implementation of an input function.

 The input function is passed to the `train()` and `evaluate()` calls that

@@ -234,4 +235,5 @@ e = tf.estimator.DNNLinearCombinedClassifier(
     dnn_feature_columns=deep_columns,
     dnn_hidden_units=[100, 50])
 ```
-For more information, see the @{$wide_and_deep$Wide and Deep Learning tutorial}.
+For more information, see the
+[wide and deep learning tutorial](https://github.com/tensorflow/models/tree/master/official/wide_deep).

@@ -1,461 +0,0 @@

# TensorFlow Linear Model Tutorial

In this tutorial, we will use the tf.estimator API in TensorFlow to solve a binary classification problem: given census data about a person such as age, education, marital status, and occupation (the features), we will try to predict whether or not the person earns more than 50,000 dollars a year (the target label). We will train a **logistic regression** model that, given an individual's information, outputs a number between 0 and 1, which can be interpreted as the probability that the individual has an annual income of over 50,000 dollars.

## Setup

To try the code for this tutorial:

1. @{$install$Install TensorFlow} if you haven't already.

2. Download [the tutorial code](https://github.com/tensorflow/models/tree/master/official/wide_deep/).

3. Execute the data download script we provide to you:

        $ python data_download.py

4. Execute the tutorial code with the following command to train the linear
   model described in this tutorial:

        $ python wide_deep.py --model_type=wide

Read on to find out how this code builds its linear model.

## Reading The Census Data

The dataset we'll be using is the [Census Income Dataset](https://archive.ics.uci.edu/ml/datasets/Census+Income). We have provided [data_download.py](https://github.com/tensorflow/models/tree/master/official/wide_deep/data_download.py), which downloads the data and performs some additional cleanup.

Since the task is a binary classification problem, we'll construct a label column named "label" whose value is 1 if the income is over 50K, and 0 otherwise. For reference, see `input_fn` in [wide_deep.py](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py).

Next, let's take a look at the dataframe and see which columns we can use to predict the target label. The columns can be grouped into two types, categorical and continuous:

* A column is called **categorical** if its value can only be one of the categories in a finite set. For example, the relationship status of a person (wife, husband, unmarried, etc.) or the education level (high school, college, etc.) are categorical columns.
* A column is called **continuous** if its value can be any numerical value in a continuous range. For example, the capital gain of a person (e.g. $14,084) is a continuous column.

Here's a list of columns available in the Census Income dataset:

| Column Name     | Type        | Description                                                        |
| --------------- | ----------- | ------------------------------------------------------------------ |
| age             | Continuous  | The age of the individual.                                          |
| workclass       | Categorical | The type of employer the individual has (government, military, private, etc.). |
| fnlwgt          | Continuous  | The number of people the census takers believe that observation represents (sample weight). Final weight will not be used. |
| education       | Categorical | The highest level of education achieved for that individual.       |
| education_num   | Continuous  | The highest level of education in numerical form.                  |
| marital_status  | Categorical | Marital status of the individual.                                  |
| occupation      | Categorical | The occupation of the individual.                                  |
| relationship    | Categorical | Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried. |
| race            | Categorical | Amer-Indian-Eskimo, Asian-Pac-Islander, Black, White, Other.       |
| gender          | Categorical | Female, Male.                                                      |
| capital_gain    | Continuous  | Capital gains recorded.                                            |
| capital_loss    | Continuous  | Capital losses recorded.                                           |
| hours_per_week  | Continuous  | Hours worked per week.                                             |
| native_country  | Categorical | Country of origin of the individual.                               |
| income_bracket  | Categorical | ">50K" or "<=50K", meaning whether the person makes more than $50,000 annually. |

## Converting Data into Tensors

When building a tf.estimator model, the input data is specified by means of an input builder function. This builder function will not be called until it is later passed to tf.estimator.Estimator methods such as `train` and `evaluate`. The purpose of this function is to construct the input data, which is represented in the form of @{tf.Tensor}s or @{tf.SparseTensor}s. In more detail, the input builder function returns the following as a pair:

1. `features`: A dict from feature column names to `Tensors` or `SparseTensors`.
2. `labels`: A `Tensor` containing the label column.

The keys of the `features` will be used to construct columns in the next section. Because we want to call the `train` and `evaluate` methods with different data, we define a method that returns an input function based on the given data. Note that the returned input function will be called while constructing the TensorFlow graph, not while running the graph. What it returns is a representation of the input data as the fundamental unit of TensorFlow computations, a `Tensor` (or `SparseTensor`).

Each continuous column in the train or test data will be converted into a `Tensor`, which in general is a good format to represent dense data. For categorical data, we must represent the data as a `SparseTensor`. This data format is good for representing sparse data. Our `input_fn` uses the `tf.data` API, which makes it easy to apply transformations to our dataset:

```python
def input_fn(data_file, num_epochs, shuffle, batch_size):
  """Generate an input function for the Estimator."""
  assert tf.gfile.Exists(data_file), (
      '%s not found. Please make sure you have either run data_download.py or '
      'set both arguments --train_data and --test_data.' % data_file)

  def parse_csv(value):
    print('Parsing', data_file)
    columns = tf.decode_csv(value, record_defaults=_CSV_COLUMN_DEFAULTS)
    features = dict(zip(_CSV_COLUMNS, columns))
    labels = features.pop('income_bracket')
    return features, tf.equal(labels, '>50K')

  # Extract lines from input files using the Dataset API.
  dataset = tf.data.TextLineDataset(data_file)

  if shuffle:
    dataset = dataset.shuffle(buffer_size=_SHUFFLE_BUFFER)

  dataset = dataset.map(parse_csv, num_parallel_calls=5)

  # We call repeat after shuffling, rather than before, to prevent separate
  # epochs from blending together.
  dataset = dataset.repeat(num_epochs)
  dataset = dataset.batch(batch_size)

  iterator = dataset.make_one_shot_iterator()
  features, labels = iterator.get_next()
  return features, labels
```

## Selecting and Engineering Features for the Model

Selecting and crafting the right set of feature columns is key to learning an effective model. A **feature column** can be either one of the raw columns in the original dataframe (let's call them **base feature columns**), or any new columns created based on some transformations defined over one or multiple base columns (let's call them **derived feature columns**). Basically, "feature column" is an abstract concept of any raw or derived variable that can be used to predict the target label.

### Base Categorical Feature Columns

To define a feature column for a categorical feature, we can create a `CategoricalColumn` using the tf.feature_column API. If you know the set of all possible feature values of a column and there are only a few of them, you can use `categorical_column_with_vocabulary_list`. Each key in the list will get assigned an auto-incremental ID starting from 0. For example, for the `relationship` column we can assign the feature string "Husband" to an integer ID of 0 and "Not-in-family" to 1, etc., by doing:

```python
relationship = tf.feature_column.categorical_column_with_vocabulary_list(
    'relationship', [
        'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried',
        'Other-relative'])
```

What if we don't know the set of possible values in advance? Not a problem. We can use `categorical_column_with_hash_bucket` instead:

```python
occupation = tf.feature_column.categorical_column_with_hash_bucket(
    'occupation', hash_bucket_size=1000)
```

What will happen is that each possible value in the feature column `occupation` will be hashed to an integer ID as it is encountered in training. See an example illustration below:

ID  | Feature
--- | -------------
... |
9   | `"Machine-op-inspct"`
... |
103 | `"Farming-fishing"`
... |
375 | `"Protective-serv"`
... |

No matter which way we choose to define a `SparseColumn`, each feature string will be mapped into an integer ID by looking up a fixed mapping or by hashing. Note that hashing collisions are possible, but may not significantly impact the model quality. Under the hood, the `LinearModel` class is responsible for managing the mapping and creating a `tf.Variable` to store the model parameters (also known as model weights) for each feature ID. The model parameters will be learned through the model training process we'll go through later.

We'll use the same trick to define the other categorical features:

```python
education = tf.feature_column.categorical_column_with_vocabulary_list(
    'education', [
        'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',
        'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',
        '5th-6th', '10th', '1st-4th', 'Preschool', '12th'])

marital_status = tf.feature_column.categorical_column_with_vocabulary_list(
    'marital_status', [
        'Married-civ-spouse', 'Divorced', 'Married-spouse-absent',
        'Never-married', 'Separated', 'Married-AF-spouse', 'Widowed'])

relationship = tf.feature_column.categorical_column_with_vocabulary_list(
    'relationship', [
        'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried',
        'Other-relative'])

workclass = tf.feature_column.categorical_column_with_vocabulary_list(
    'workclass', [
        'Self-emp-not-inc', 'Private', 'State-gov', 'Federal-gov',
        'Local-gov', '?', 'Self-emp-inc', 'Without-pay', 'Never-worked'])

# To show an example of hashing:
occupation = tf.feature_column.categorical_column_with_hash_bucket(
    'occupation', hash_bucket_size=1000)
```

### Base Continuous Feature Columns

Similarly, we can define a `NumericColumn` for each continuous feature column that we want to use in the model:

```python
age = tf.feature_column.numeric_column('age')
education_num = tf.feature_column.numeric_column('education_num')
capital_gain = tf.feature_column.numeric_column('capital_gain')
capital_loss = tf.feature_column.numeric_column('capital_loss')
hours_per_week = tf.feature_column.numeric_column('hours_per_week')
```

### Making Continuous Features Categorical through Bucketization

Sometimes the relationship between a continuous feature and the label is not linear. As a hypothetical example, a person's income may grow with age in the early stage of one's career, then the growth may slow at some point, and finally the income decreases after retirement. In this scenario, using the raw `age` as a real-valued feature column might not be a good choice because the model can only learn one of three cases:

1. Income always increases at some rate as age grows (positive correlation),
1. Income always decreases at some rate as age grows (negative correlation), or
1. Income stays the same no matter what the age is (no correlation).

If we want to learn the fine-grained correlation between income and each age group separately, we can leverage **bucketization**. Bucketization is a process of dividing the entire range of a continuous feature into a set of consecutive bins/buckets, and then converting the original numerical feature into a bucket ID (as a categorical feature) depending on which bucket that value falls into. So, we can define a `bucketized_column` over `age` as:

```python
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
```

where `boundaries` is a list of bucket boundaries. In this case, there are 10 boundaries, resulting in 11 age group buckets (from age 17 and below, 18-24, 25-29, ..., to 65 and over).
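
As a quick sanity check on how raw ages map to bucket IDs, here is an illustrative snippet. It uses NumPy's `digitize` to mimic the bucketing with the same boundaries; it is not TensorFlow's own implementation, just a way to see the arithmetic:

```python
import numpy as np

boundaries = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]
ages = [15, 22, 39, 67]
# np.digitize returns the bucket index each age falls into:
# ages below 18 map to bucket 0, ages 65 and over map to bucket 10.
print(np.digitize(ages, boundaries))  # -> [ 0  1  4 10]
```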

### Intersecting Multiple Columns with CrossedColumn

Using each base feature column separately may not be enough to explain the data. For example, the correlation between education and the label (earning > 50,000 dollars) may be different for different occupations. Therefore, if we only learn a single model weight for `education="Bachelors"` and `education="Masters"`, we won't be able to capture every single education-occupation combination (e.g. distinguishing between `education="Bachelors" AND occupation="Exec-managerial"` and `education="Bachelors" AND occupation="Craft-repair"`). To learn the differences between different feature combinations, we can add **crossed feature columns** to the model.

```python
education_x_occupation = tf.feature_column.crossed_column(
    ['education', 'occupation'], hash_bucket_size=1000)
```

We can also create a `CrossedColumn` over more than two columns. Each constituent column can be either a base feature column that is categorical (`SparseColumn`), a bucketized real-valued feature column (`BucketizedColumn`), or even another `CrossedColumn`. Here's an example:

```python
age_buckets_x_education_x_occupation = tf.feature_column.crossed_column(
    [age_buckets, 'education', 'occupation'], hash_bucket_size=1000)
```

## Defining The Logistic Regression Model

After processing the input data and defining all the feature columns, we're now ready to put them all together and build a logistic regression model. In the previous section we've seen several types of base and derived feature columns, including:

* `CategoricalColumn`
* `NumericColumn`
* `BucketizedColumn`
* `CrossedColumn`

All of these are subclasses of the abstract `FeatureColumn` class, and can be added to the `feature_columns` field of a model:

```python
base_columns = [
    education, marital_status, relationship, workclass, occupation,
    age_buckets,
]
crossed_columns = [
    tf.feature_column.crossed_column(
        ['education', 'occupation'], hash_bucket_size=1000),
    tf.feature_column.crossed_column(
        [age_buckets, 'education', 'occupation'], hash_bucket_size=1000),
]

model_dir = tempfile.mkdtemp()
model = tf.estimator.LinearClassifier(
    model_dir=model_dir, feature_columns=base_columns + crossed_columns)
```

The model also automatically learns a bias term, which controls the prediction one would make without observing any features (see the section "How Logistic Regression Works" for more explanation). The learned model files will be stored in `model_dir`.

## Training and Evaluating Our Model

After adding all the features to the model, let's now look at how to actually train it. Training a model is just a single command using the tf.estimator API:

```python
model.train(input_fn=lambda: input_fn(train_data, num_epochs, True, batch_size))
```

After the model is trained, we can evaluate how good it is at predicting the labels of the holdout data:

```python
results = model.evaluate(input_fn=lambda: input_fn(
    test_data, 1, False, batch_size))
for key in sorted(results):
  print('%s: %s' % (key, results[key]))
```

The first line of the final output should be something like `accuracy: 0.83557522`, which means the accuracy is 83.6%. Feel free to try more features and transformations and see if you can do even better!

After the model is evaluated, we can use it to predict whether an individual has an annual income of over 50,000 dollars, given that individual's information:

```python
pred_iter = model.predict(input_fn=lambda: input_fn(FLAGS.test_data, 1, False, 1))
for pred in pred_iter:
  print(pred['classes'])
```

The prediction output looks like `[b'1']` or `[b'0']`, indicating whether or not the corresponding individual has an annual income of over 50,000 dollars.
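
If you would rather see a score than a hard class, the prediction dictionaries yielded by the canned `LinearClassifier` also carry probability-style outputs. A minimal sketch, assuming the standard prediction keys exposed by tf.estimator canned classifiers:

```python
pred_iter = model.predict(input_fn=lambda: input_fn(FLAGS.test_data, 1, False, 1))
for pred in pred_iter:
  # 'probabilities' holds [P(income <= 50K), P(income > 50K)] for this example.
  print(pred['classes'], pred['probabilities'])
```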

If you'd like to see a working end-to-end example, you can download our [example code](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py) and set the `model_type` flag to `wide`.

## Adding Regularization to Prevent Overfitting

Regularization is a technique used to avoid **overfitting**. Overfitting happens when your model does well on the data it is trained on, but worse on test data that the model has not seen before, such as live traffic. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observed training examples. Regularization allows you to control your model's complexity and makes the model more generalizable to unseen data.

In the linear model library, you can add L1 and L2 regularization to the model as:

```python
model = tf.estimator.LinearClassifier(
    model_dir=model_dir, feature_columns=base_columns + crossed_columns,
    optimizer=tf.train.FtrlOptimizer(
        learning_rate=0.1,
        l1_regularization_strength=1.0,
        l2_regularization_strength=1.0))
```

One important difference between L1 and L2 regularization is that L1 regularization tends to make model weights stay at zero, creating sparser models, whereas L2 regularization also tries to make the model weights closer to zero but not necessarily zero. Therefore, if you increase the strength of L1 regularization, you will have a smaller model size because many of the model weights will be zero. This is often desirable when the feature space is very large but sparse, and when there are resource constraints that prevent you from serving a model that is too large.

In practice, you should try various combinations of L1 and L2 regularization strengths and find the parameters that best control overfitting and give you a desirable model size.
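
One way to run that search, sketched here as a simple, illustrative grid over a few strengths (the strength values are arbitrary, and `input_fn`, the column lists, and the data/flags are assumed from earlier in this tutorial):

```python
# Illustrative sketch: small grid search over L1/L2 regularization strengths.
# Assumes input_fn, base_columns, crossed_columns, train_data, test_data,
# num_epochs, and batch_size are defined as earlier in this tutorial.
for l1 in [0.0, 1.0, 10.0]:
  for l2 in [0.0, 1.0, 10.0]:
    candidate = tf.estimator.LinearClassifier(
        feature_columns=base_columns + crossed_columns,
        optimizer=tf.train.FtrlOptimizer(
            learning_rate=0.1,
            l1_regularization_strength=l1,
            l2_regularization_strength=l2))
    candidate.train(
        input_fn=lambda: input_fn(train_data, num_epochs, True, batch_size))
    results = candidate.evaluate(
        input_fn=lambda: input_fn(test_data, 1, False, batch_size))
    print('l1=%s l2=%s accuracy=%s' % (l1, l2, results['accuracy']))
```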

## How Logistic Regression Works

Finally, let's take a minute to talk about what the logistic regression model actually looks like, in case you're not already familiar with it. We'll denote the label as \\(Y\\), and the set of observed features as a feature vector \\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). We define \\(Y=1\\) if an individual earned > 50,000 dollars and \\(Y=0\\) otherwise. In logistic regression, the probability of the label being positive (\\(Y=1\\)) given the features \\(\mathbf{x}\\) is given as:

$$ P(Y=1|\mathbf{x}) = \frac{1}{1+\exp(-(\mathbf{w}^T\mathbf{x}+b))}$$

where \\(\mathbf{w}=[w_1, w_2, ..., w_d]\\) are the model weights for the features \\(\mathbf{x}=[x_1, x_2, ..., x_d]\\), and \\(b\\) is a constant that is often called the **bias** of the model. The equation consists of two parts, a linear model and a logistic function:

*   **Linear Model**: First, we can see that \\(\mathbf{w}^T\mathbf{x}+b = b + w_1x_1 + ... + w_dx_d\\) is a linear model where the output is a linear function of the input features \\(\mathbf{x}\\). The bias \\(b\\) is the prediction one would make without observing any features. The model weight \\(w_i\\) reflects how the feature \\(x_i\\) is correlated with the positive label. If \\(x_i\\) is positively correlated with the positive label, the weight \\(w_i\\) increases, and the probability \\(P(Y=1|\mathbf{x})\\) will be closer to 1. On the other hand, if \\(x_i\\) is negatively correlated with the positive label, then the weight \\(w_i\\) decreases and the probability \\(P(Y=1|\mathbf{x})\\) will be closer to 0.

*   **Logistic Function**: Second, we can see that there's a logistic function (also known as the sigmoid function) \\(S(t) = 1/(1+\exp(-t))\\) being applied to the linear model. The logistic function converts the output of the linear model \\(\mathbf{w}^T\mathbf{x}+b\\) from any real number into the range \\([0, 1]\\), which can be interpreted as a probability.

Model training is an optimization problem: the goal is to find a set of model weights (i.e. model parameters) that minimizes a **loss function** defined over the training data, such as the logistic loss for logistic regression models. The loss function measures the discrepancy between the ground-truth label and the model's prediction. If the prediction is very close to the ground-truth label, the loss value will be low; if the prediction is very far from the label, the loss value will be high.
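
To make the two parts concrete, here is a tiny worked example with made-up numbers (pure NumPy, independent of the Estimator code above; the weights and features are purely illustrative):

```python
import numpy as np

# Made-up weights, bias, and a single feature vector for illustration.
w = np.array([0.4, -0.2, 0.1])
x = np.array([1.0, 2.0, 3.0])
b = -0.1

logit = np.dot(w, x) + b          # linear part: 0.4 - 0.4 + 0.3 - 0.1 = 0.2
p = 1.0 / (1.0 + np.exp(-logit))  # logistic part: P(Y=1|x) is about 0.55
loss = -np.log(p)                 # logistic loss if the true label is Y=1
print(logit, p, loss)
```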

## Learn Deeper

If you're interested in learning more, check out our @{$wide_and_deep$Wide & Deep Learning Tutorial} where we'll show you how to combine the strengths of linear models and deep neural networks by jointly training them using the tf.estimator API.

@@ -1,243 +0,0 @@

# TensorFlow Wide & Deep Learning Tutorial

In the previous @{$wide$TensorFlow Linear Model Tutorial}, we trained a logistic regression model to predict the probability that an individual has an annual income of over 50,000 dollars using the [Census Income Dataset](https://archive.ics.uci.edu/ml/datasets/Census+Income). TensorFlow is great for training deep neural networks too, and you might be wondering which one you should choose. Well, why not both? Would it be possible to combine the strengths of both in one model?

In this tutorial, we'll introduce how to use the tf.estimator API to jointly train a wide linear model and a deep feed-forward neural network. This approach combines the strengths of memorization and generalization. It's useful for generic large-scale regression and classification problems with sparse input features (e.g., categorical features with a large number of possible feature values). If you're interested in learning more about how Wide & Deep Learning works, please check out our [research paper](https://arxiv.org/abs/1606.07792).

![Wide & Deep Spectrum of Models](https://www.tensorflow.org/images/wide_n_deep.svg "Wide & Deep")

The figure above shows a comparison of a wide model (logistic regression with sparse features and transformations), a deep model (feed-forward neural network with an embedding layer and several hidden layers), and a Wide & Deep model (joint training of both). At a high level, there are only 3 steps to configure a wide, deep, or Wide & Deep model using the tf.estimator API:

1. Select features for the wide part: choose the sparse base columns and crossed columns you want to use.
1. Select features for the deep part: choose the continuous columns, the embedding dimension for each categorical column, and the hidden layer sizes.
1. Put them all together in a Wide & Deep model (`DNNLinearCombinedClassifier`).

And that's it! Let's go through a simple example.

## Setup

To try the code for this tutorial:

1. @{$install$Install TensorFlow} if you haven't already.

2. Download [the tutorial code](https://github.com/tensorflow/models/tree/master/official/wide_deep/).

3. Execute the data download script we provide to you:

        $ python data_download.py

4. Execute the tutorial code with the following command to train the wide and
   deep model described in this tutorial:

        $ python wide_deep.py

Read on to find out how this code builds its model.

## Define Base Feature Columns

First, let's define the base categorical and continuous feature columns that we'll use. These base columns will be the building blocks used by both the wide part and the deep part of the model.

```python
import tensorflow as tf

# Continuous columns
age = tf.feature_column.numeric_column('age')
education_num = tf.feature_column.numeric_column('education_num')
capital_gain = tf.feature_column.numeric_column('capital_gain')
capital_loss = tf.feature_column.numeric_column('capital_loss')
hours_per_week = tf.feature_column.numeric_column('hours_per_week')

education = tf.feature_column.categorical_column_with_vocabulary_list(
    'education', [
        'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',
        'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',
        '5th-6th', '10th', '1st-4th', 'Preschool', '12th'])

marital_status = tf.feature_column.categorical_column_with_vocabulary_list(
    'marital_status', [
        'Married-civ-spouse', 'Divorced', 'Married-spouse-absent',
        'Never-married', 'Separated', 'Married-AF-spouse', 'Widowed'])

relationship = tf.feature_column.categorical_column_with_vocabulary_list(
    'relationship', [
        'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried',
        'Other-relative'])

workclass = tf.feature_column.categorical_column_with_vocabulary_list(
    'workclass', [
        'Self-emp-not-inc', 'Private', 'State-gov', 'Federal-gov',
        'Local-gov', '?', 'Self-emp-inc', 'Without-pay', 'Never-worked'])

# To show an example of hashing:
occupation = tf.feature_column.categorical_column_with_hash_bucket(
    'occupation', hash_bucket_size=1000)

# Transformations.
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
```

## The Wide Model: Linear Model with Crossed Feature Columns

The wide model is a linear model with a wide set of sparse and crossed feature columns:

```python
base_columns = [
    education, marital_status, relationship, workclass, occupation,
    age_buckets,
]

crossed_columns = [
    tf.feature_column.crossed_column(
        ['education', 'occupation'], hash_bucket_size=1000),
    tf.feature_column.crossed_column(
        [age_buckets, 'education', 'occupation'], hash_bucket_size=1000),
]
```

You can also see the @{$wide$TensorFlow Linear Model Tutorial} for more details.

Wide models with crossed feature columns can memorize sparse interactions between features effectively. That being said, one limitation of crossed feature columns is that they do not generalize to feature combinations that have not appeared in the training data. Let's add a deep model with embeddings to fix that.

## The Deep Model: Neural Network with Embeddings

The deep model is a feed-forward neural network, as shown in the previous figure. Each of the sparse, high-dimensional categorical features is first converted into a low-dimensional and dense real-valued vector, often referred to as an embedding vector. These low-dimensional dense embedding vectors are concatenated with the continuous features, and then fed into the hidden layers of a neural network in the forward pass. The embedding values are initialized randomly, and are trained along with all other model parameters to minimize the training loss. If you're interested in learning more about embeddings, check out the TensorFlow tutorial on @{$word2vec$Vector Representations of Words} or [Word embedding](https://en.wikipedia.org/wiki/Word_embedding) on Wikipedia.

Another way to represent categorical columns to feed into a neural network is via a one-hot or multi-hot representation. This is often appropriate for categorical columns with only a few possible values. As an example of a one-hot representation, for the relationship column, `"Husband"` can be represented as [1, 0, 0, 0, 0, 0], and `"Not-in-family"` as [0, 1, 0, 0, 0, 0], etc. This is a fixed representation, whereas embeddings are more flexible and calculated at training time.

We'll configure the embeddings for the categorical columns using `embedding_column`, and concatenate them with the continuous columns. We also use `indicator_column` to create multi-hot representations of some categorical columns.

```python
deep_columns = [
    age,
    education_num,
    capital_gain,
    capital_loss,
    hours_per_week,
    tf.feature_column.indicator_column(workclass),
    tf.feature_column.indicator_column(education),
    tf.feature_column.indicator_column(marital_status),
    tf.feature_column.indicator_column(relationship),
    # To show an example of embedding
    tf.feature_column.embedding_column(occupation, dimension=8),
]
```

The higher the `dimension` of the embedding is, the more degrees of freedom the model will have to learn the representations of the features. For simplicity, we set the dimension to 8 for all feature columns here. Empirically, a more informed decision for the number of dimensions is to start with a value on the order of \\(\log_2(n)\\) or \\(k\sqrt[4]{n}\\), where \\(n\\) is the number of unique features in a feature column and \\(k\\) is a small constant (usually smaller than 10).
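
As a rough illustration of that rule of thumb (the numbers below are hypothetical, not tuned values):

```python
import math

n = 1000  # hypothetical number of unique values in a categorical column
k = 5     # a small constant, per the heuristic above

print(math.log2(n))   # about 10, suggesting an embedding dimension around 10
print(k * n ** 0.25)  # about 28, suggesting an embedding dimension around 28
```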

Through dense embeddings, deep models can generalize better and make predictions on feature pairs that were previously unseen in the training data. However, it is difficult to learn effective low-dimensional representations for feature columns when the underlying interaction matrix between two feature columns is sparse and high-rank. In such cases, the interaction between most feature pairs should be zero except for a few, but dense embeddings will lead to nonzero predictions for all feature pairs, and thus can over-generalize. On the other hand, linear models with crossed features can memorize these “exception rules” effectively with fewer model parameters.

Now, let's see how to jointly train wide and deep models and allow them to complement each other's strengths and weaknesses.

## Combining Wide and Deep Models into One

The wide model and the deep model are combined by summing up their final output log odds as the prediction, then feeding the prediction to a logistic loss function. All the graph definition and variable allocations have already been handled for you under the hood, so you simply need to create a `DNNLinearCombinedClassifier`:

```python
model = tf.estimator.DNNLinearCombinedClassifier(
    model_dir='/tmp/census_model',
    linear_feature_columns=base_columns + crossed_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])
```

## Training and Evaluating The Model

Before we train the model, let's read in the Census dataset as we did in the @{$wide$TensorFlow Linear Model tutorial}. See `data_download.py` as well as `input_fn` within [`wide_deep.py`](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py).

After reading in the data, you can train and evaluate the model:

```python
# Train and evaluate the model every `FLAGS.epochs_per_eval` epochs.
for n in range(FLAGS.train_epochs // FLAGS.epochs_per_eval):
  model.train(input_fn=lambda: input_fn(
      FLAGS.train_data, FLAGS.epochs_per_eval, True, FLAGS.batch_size))

  results = model.evaluate(input_fn=lambda: input_fn(
      FLAGS.test_data, 1, False, FLAGS.batch_size))

  # Display evaluation metrics
  print('Results at epoch', (n + 1) * FLAGS.epochs_per_eval)
  print('-' * 30)

  for key in sorted(results):
    print('%s: %s' % (key, results[key]))
```

The final output accuracy should be somewhere around 85.5%. If you'd like to see a working end-to-end example, you can download our [example code](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py).

Note that this tutorial is just a quick example on a small dataset to get you familiar with the API. Wide & Deep Learning will be even more powerful if you try it on a large dataset with many sparse feature columns that have a large number of possible feature values. Again, feel free to take a look at our [research paper](https://arxiv.org/abs/1606.07792) for more ideas about how to apply Wide & Deep Learning in real-world large-scale machine learning problems.