From edc511fd696053f2c34f49003fe2eeb05fb7d7de Mon Sep 17 00:00:00 2001
From: Billy Lamberta
Date: Tue, 10 Jul 2018 23:00:44 -0700
Subject: [PATCH] Update estimators section for getting started.

PiperOrigin-RevId: 204071680
---
 tensorflow/docs_src/guide/feature_columns.md  |   6 +-
 tensorflow/docs_src/guide/graph_viz.md        |   3 +-
 tensorflow/docs_src/tutorials/_index.yaml     |  13 +-
 tensorflow/docs_src/tutorials/_toc.yaml       |  22 +-
 .../{images/layers.md => estimators/cnn.md}   |   0
 .../tutorials/images/image_recognition.md     |   2 +-
 .../tutorials/representation/linear.md        |  10 +-
 .../docs_src/tutorials/representation/wide.md | 461 ------------------
 .../tutorials/representation/wide_and_deep.md | 243 ---------
 9 files changed, 34 insertions(+), 726 deletions(-)
 rename tensorflow/docs_src/tutorials/{images/layers.md => estimators/cnn.md} (100%)
 delete mode 100644 tensorflow/docs_src/tutorials/representation/wide.md
 delete mode 100644 tensorflow/docs_src/tutorials/representation/wide_and_deep.md

diff --git a/tensorflow/docs_src/guide/feature_columns.md b/tensorflow/docs_src/guide/feature_columns.md
index 1013ec910c1..41080e050b3 100644
--- a/tensorflow/docs_src/guide/feature_columns.md
+++ b/tensorflow/docs_src/guide/feature_columns.md
@@ -561,9 +561,9 @@ For more examples on feature columns, view the following:
 
 * The @{$low_level_intro#feature_columns$Low Level Introduction} demonstrates
   how to experiment directly with `feature_columns` using TensorFlow's low
   level APIs.
-* The @{$wide$wide} and @{$wide_and_deep$Wide & Deep} Tutorials solve a
-  binary classification problem using `feature_columns` on a variety of input
-  data types.
+* The [Estimator wide and deep learning tutorial](https://github.com/tensorflow/models/tree/master/official/wide_deep)
+  solves a binary classification problem using `feature_columns` on a variety of
+  input data types.
 
 To learn more about embeddings, see the following:
 
diff --git a/tensorflow/docs_src/guide/graph_viz.md b/tensorflow/docs_src/guide/graph_viz.md
index f581ae56dae..a8876da5a5b 100644
--- a/tensorflow/docs_src/guide/graph_viz.md
+++ b/tensorflow/docs_src/guide/graph_viz.md
@@ -248,7 +248,8 @@ The images below show the CIFAR-10 model with tensor shape information:
 Often it is useful to collect runtime metadata for a run, such as total memory
 usage, total compute time, and tensor shapes for nodes. The code example below
 is a snippet from the train and test section of a modification of the
-@{$layers$simple MNIST tutorial}, in which we have recorded summaries and
+[Estimators MNIST tutorial](../tutorials/estimators/cnn.md), in which we have
+recorded summaries and
 runtime statistics. See the
 @{$summaries_and_tensorboard#serializing-the-data$Summaries Tutorial}
 for details on how to record summaries.
 
diff --git a/tensorflow/docs_src/tutorials/_index.yaml b/tensorflow/docs_src/tutorials/_index.yaml
index 6fc8155669b..07d561b8a2d 100644
--- a/tensorflow/docs_src/tutorials/_index.yaml
+++ b/tensorflow/docs_src/tutorials/_index.yaml
@@ -170,15 +170,16 @@ landing_page:

           Estimators can train large models on multiple machines in a
-          production environment. Read the
-          Estimators guide for details.
+          production environment. TensorFlow provides a collection of
+          pre-made Estimators to implement common ML algorithms. See the
+          Estimators guide.

-          Build a Convolutional Neural Network using Estimators
+          Premade Estimators guide
+          Wide and deep learning with Estimators
+          Boosted trees
           How to build a simple text classifier with TF-Hub
-          Classifying Higgs boson processes
-          Wide and deep learning using Estimators
-          Large-scale linear models
+          Build a Convolutional Neural Network using Estimators
diff --git a/tensorflow/docs_src/tutorials/_toc.yaml b/tensorflow/docs_src/tutorials/_toc.yaml
index fa857ffec66..4db97e35fc5 100644
--- a/tensorflow/docs_src/tutorials/_toc.yaml
+++ b/tensorflow/docs_src/tutorials/_toc.yaml
@@ -37,15 +37,27 @@ toc:
     status: external
   - title: "Custom training: walkthrough"
     path: /tutorials/eager/custom_training_walkthrough
-  - title: Neural machine translation
+  - title: Translation with attention
    path: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/eager/python/examples/nmt_with_attention/nmt_with_attention.ipynb
    status: external

+- title: ML at production scale
+  style: accordion
+  section:
+  - title: Wide and deep learning
+    path: https://github.com/tensorflow/models/tree/master/official/wide_deep
+    status: external
+  - title: Boosted trees
+    path: https://github.com/tensorflow/models/tree/master/official/boosted_trees
+    status: external
+  - title: Text classifier with TF-Hub
+    path: /hub/tutorials/text_classification_with_tf_hub
+  - title: Build a CNN using Estimators
+    path: /tutorials/estimators/cnn
+
 - title: Images
   style: accordion
   section:
-  - title: Build a CNN using Estimators
-    path: /tutorials/images/layers
   - title: Image recognition
     path: /tutorials/images/image_recognition
   - title: Image retraining
@@ -69,10 +81,6 @@ toc:
 - title: Data representation
   style: accordion
   section:
-  - title: Linear models
-    path: /tutorials/representation/wide
-  - title: Wide and deep learning
-    path: /tutorials/representation/wide_and_deep
   - title: Vector representations of words
     path: /tutorials/representation/word2vec
   - title: Kernel methods
diff --git a/tensorflow/docs_src/tutorials/images/layers.md b/tensorflow/docs_src/tutorials/estimators/cnn.md
similarity index 100%
rename from tensorflow/docs_src/tutorials/images/layers.md
rename to tensorflow/docs_src/tutorials/estimators/cnn.md
diff --git a/tensorflow/docs_src/tutorials/images/image_recognition.md b/tensorflow/docs_src/tutorials/images/image_recognition.md
index 432d470d0cd..d545de73df5 100644
--- a/tensorflow/docs_src/tutorials/images/image_recognition.md
+++ b/tensorflow/docs_src/tutorials/images/image_recognition.md
@@ -449,7 +449,7 @@ covering them.
 
 To find out more about implementing convolutional neural networks, you can jump
 to the TensorFlow @{$deep_cnn$deep convolutional networks tutorial},
-or start a bit more gently with our @{$layers$MNIST starter tutorial}.
+or start a bit more gently with our [Estimator MNIST tutorial](../estimators/cnn.md).
 Finally, if you want to get up to speed on research in this area, you can
 read the recent papers referenced in this tutorial.
diff --git a/tensorflow/docs_src/tutorials/representation/linear.md b/tensorflow/docs_src/tutorials/representation/linear.md
index 3f247ade266..1b418cf065a 100644
--- a/tensorflow/docs_src/tutorials/representation/linear.md
+++ b/tensorflow/docs_src/tutorials/representation/linear.md
@@ -11,8 +11,9 @@ those tools. It explains:
 deep learning to get the advantages of both.
 
 Read this overview to decide whether the Estimator's linear model tools might
-be useful to you. Then do the @{$wide$Linear Models tutorial} to
-give it a try. This overview uses code samples from the tutorial, but the
+be useful to you. Then work through the
+[Estimator wide and deep learning tutorial](https://github.com/tensorflow/models/tree/master/official/wide_deep)
+to give it a try. This overview uses code samples from the tutorial, but the
 tutorial walks through the code in greater detail.
 
 To understand this overview it will help to have some familiarity
@@ -176,7 +177,7 @@ the name of a `FeatureColumn`. Each key's value is a tensor containing the
 values of that feature for all data instances. See
 @{$premade_estimators#input_fn} for a more comprehensive look at input
 functions, and `input_fn` in the
-[linear models tutorial code](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py)
+[wide and deep learning tutorial](https://github.com/tensorflow/models/tree/master/official/wide_deep)
 for an example implementation of an input function.
 
 The input function is passed to the `train()` and `evaluate()` calls that
@@ -234,4 +235,5 @@ e = tf.estimator.DNNLinearCombinedClassifier(
     dnn_feature_columns=deep_columns,
     dnn_hidden_units=[100, 50])
 ```
-For more information, see the @{$wide_and_deep$Wide and Deep Learning tutorial}.
+For more information, see the
+[wide and deep learning tutorial](https://github.com/tensorflow/models/tree/master/official/wide_deep).
diff --git a/tensorflow/docs_src/tutorials/representation/wide.md b/tensorflow/docs_src/tutorials/representation/wide.md
deleted file mode 100644
index 27ce75a30dd..00000000000
--- a/tensorflow/docs_src/tutorials/representation/wide.md
+++ /dev/null
@@ -1,461 +0,0 @@
-# TensorFlow Linear Model Tutorial
-
-In this tutorial, we will use the tf.estimator API in TensorFlow to solve a
-binary classification problem: Given census data about a person such as age,
-education, marital status, and occupation (the features), we will try to predict
-whether or not the person earns more than 50,000 dollars a year (the target
-label). We will train a **logistic regression** model, and, given an individual's
-information, our model will output a number between 0 and 1, which can be
-interpreted as the probability that the individual has an annual income of over
-50,000 dollars.
-
-## Setup
-
-To try the code for this tutorial:
-
-1. @{$install$Install TensorFlow} if you haven't already.
-
-2. Download [the tutorial code](https://github.com/tensorflow/models/tree/master/official/wide_deep/).
-
-3. Execute the data download script we provide to you:
-
-       $ python data_download.py
-
-4. Execute the tutorial code with the following command to train the linear
-model described in this tutorial:
-
-       $ python wide_deep.py --model_type=wide
-
-Read on to find out how this code builds its linear model.
-
-## Reading The Census Data
-
-The dataset we'll be using is the
-[Census Income Dataset](https://archive.ics.uci.edu/ml/datasets/Census+Income).
-We have provided
-[data_download.py](https://github.com/tensorflow/models/tree/master/official/wide_deep/data_download.py)
-which downloads the data and performs some additional cleanup.
-
-Since the task is a binary classification problem, we'll construct a label
-column named "label" whose value is 1 if the income is over 50K, and 0
-otherwise. For reference, see `input_fn` in
-[wide_deep.py](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py).
-
-Next, let's take a look at the dataframe and see which columns we can use to
-predict the target label. The columns can be grouped into two types—categorical
-and continuous columns:
-
-* A column is called **categorical** if its value can only be one of the
-  categories in a finite set. For example, the relationship status of a person
-  (wife, husband, unmarried, etc.) or the education level (high school,
-  college, etc.) are categorical columns.
-* A column is called **continuous** if its value can be any numerical value in
-  a continuous range. For example, the capital gain of a person (e.g. $14,084)
-  is a continuous column.
-
-Here's a list of columns available in the Census Income dataset:
-
-| Column Name    | Type        | Description |
-| -------------- | ----------- | ----------- |
-| age            | Continuous  | The age of the individual. |
-| workclass      | Categorical | The type of employer the individual has (government, military, private, etc.). |
-| fnlwgt         | Continuous  | The number of people the census takers believe that observation represents (sample weight). This final weight is not used. |
-| education      | Categorical | The highest level of education achieved by that individual. |
-| education_num  | Continuous  | The highest level of education in numerical form. |
-| marital_status | Categorical | Marital status of the individual. |
-| occupation     | Categorical | The occupation of the individual. |
-| relationship   | Categorical | Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried. |
-| race           | Categorical | Amer-Indian-Eskimo, Asian-Pac-Islander, Black, White, Other. |
-| gender         | Categorical | Female, Male. |
-| capital_gain   | Continuous  | Capital gains recorded. |
-| capital_loss   | Continuous  | Capital losses recorded. |
-| hours_per_week | Continuous  | Hours worked per week. |
-| native_country | Categorical | Country of origin of the individual. |
-| income_bracket | Categorical | ">50K" or "<=50K", meaning whether the person makes more than $50,000 annually. |
-
-## Converting Data into Tensors
-
-When building a tf.estimator model, the input data is specified by means of an
-Input Builder function. This builder function will not be called until it is
-later passed to tf.estimator.Estimator methods such as `train` and `evaluate`.
-The purpose of this function is to construct the input data, which is
-represented in the form of @{tf.Tensor}s or @{tf.SparseTensor}s.
-In more detail, the input builder function returns the following as a pair:
-
-1. `features`: A dict from feature column names to `Tensors` or
-   `SparseTensors`.
-2. `labels`: A `Tensor` containing the label column.
-
-The keys of the `features` will be used to construct columns in the next
-section. Because we want to call the `train` and `evaluate` methods with
-different data, we define a method that returns an input function based on the
-given data. Note that the returned input function will be called while
-constructing the TensorFlow graph, not while running the graph. What it
-returns is a representation of the input data as the fundamental unit of
-TensorFlow computations, a `Tensor` (or `SparseTensor`).
-
-Each continuous column in the train or test data will be converted into a
-`Tensor`, which in general is a good format to represent dense data. For
-categorical data, we must represent the data as a `SparseTensor`. This data
-format is good for representing sparse data. Our `input_fn` uses the `tf.data`
-API, which makes it easy to apply transformations to our dataset:
-
-```python
-def input_fn(data_file, num_epochs, shuffle, batch_size):
-  """Generate an input function for the Estimator."""
-  assert tf.gfile.Exists(data_file), (
-      '%s not found. Please make sure you have either run data_download.py or '
-      'set both arguments --train_data and --test_data.' % data_file)
-
-  def parse_csv(value):
-    print('Parsing', data_file)
-    columns = tf.decode_csv(value, record_defaults=_CSV_COLUMN_DEFAULTS)
-    features = dict(zip(_CSV_COLUMNS, columns))
-    labels = features.pop('income_bracket')
-    return features, tf.equal(labels, '>50K')
-
-  # Extract lines from input files using the Dataset API.
-  dataset = tf.data.TextLineDataset(data_file)
-
-  if shuffle:
-    dataset = dataset.shuffle(buffer_size=_SHUFFLE_BUFFER)
-
-  dataset = dataset.map(parse_csv, num_parallel_calls=5)
-
-  # We call repeat after shuffling, rather than before, to prevent separate
-  # epochs from blending together.
-  dataset = dataset.repeat(num_epochs)
-  dataset = dataset.batch(batch_size)
-
-  iterator = dataset.make_one_shot_iterator()
-  features, labels = iterator.get_next()
-  return features, labels
-```
-
-## Selecting and Engineering Features for the Model
-
-Selecting and crafting the right set of feature columns is key to learning an
-effective model. A **feature column** can be either one of the raw columns in
-the original dataframe (let's call them **base feature columns**), or any new
-columns created based on some transformations defined over one or multiple base
-columns (let's call them **derived feature columns**). Basically, a "feature
-column" is an abstract concept of any raw or derived variable that can be used
-to predict the target label.
-
-### Base Categorical Feature Columns
-
-To define a feature column for a categorical feature, we can create a
-`CategoricalColumn` using the tf.feature_column API. If you know the set of all
-possible feature values of a column and there are only a few of them, you can
-use `categorical_column_with_vocabulary_list`. Each key in the list will get
-assigned an auto-incremented ID starting from 0. For example, for the
-`relationship` column we can assign the feature string "Husband" to an integer
-ID of 0 and "Not-in-family" to 1, etc., by doing:
-
-```python
-relationship = tf.feature_column.categorical_column_with_vocabulary_list(
-    'relationship', [
-        'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried',
-        'Other-relative'])
-```
-
-What if we don't know the set of possible values in advance? Not a problem. We
-can use `categorical_column_with_hash_bucket` instead:
-
-```python
-occupation = tf.feature_column.categorical_column_with_hash_bucket(
-    'occupation', hash_bucket_size=1000)
-```
-
-What will happen is that each possible value in the feature column `occupation`
-will be hashed to an integer ID as it is encountered in training. See an example
-illustration below:
-
-ID  | Feature
---- | -------------
-... |
-9   | `"Machine-op-inspct"`
-... |
-103 | `"Farming-fishing"`
-... |
-375 | `"Protective-serv"`
-... |
-
-No matter which way we choose to define a `CategoricalColumn`, each feature
-string will be mapped into an integer ID by looking up a fixed mapping or by
-hashing. Note that hashing collisions are possible, but may not significantly
-impact the model quality. Under the hood, the `LinearModel` class is responsible
-for managing the mapping and creating `tf.Variable` to store the model
-parameters (also known as model weights) for each feature ID. The model
-parameters will be learned through the model training process we'll go through
-later.
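As an aside, here is a minimal standalone sketch of this hashing behavior, using `tf.string_to_hash_bucket_fast`, the same style of hashing op that backs `categorical_column_with_hash_bucket` (the bucket IDs in the table above are illustrative, not the actual hash values):

```python
import tensorflow as tf

# Hash a few occupation strings into 1,000 buckets, mirroring
# categorical_column_with_hash_bucket('occupation', hash_bucket_size=1000).
# The string-to-ID mapping is deterministic across runs, but distinct
# strings may collide in the same bucket.
occupations = tf.constant(['Machine-op-inspct', 'Farming-fishing',
                           'Protective-serv'])
bucket_ids = tf.string_to_hash_bucket_fast(occupations, num_buckets=1000)

with tf.Session() as sess:
  print(sess.run(bucket_ids))  # Three integer IDs in [0, 1000).
```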
-
-We'll use a similar trick to define the other categorical features:
-
-```python
-education = tf.feature_column.categorical_column_with_vocabulary_list(
-    'education', [
-        'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',
-        'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',
-        '5th-6th', '10th', '1st-4th', 'Preschool', '12th'])
-
-marital_status = tf.feature_column.categorical_column_with_vocabulary_list(
-    'marital_status', [
-        'Married-civ-spouse', 'Divorced', 'Married-spouse-absent',
-        'Never-married', 'Separated', 'Married-AF-spouse', 'Widowed'])
-
-relationship = tf.feature_column.categorical_column_with_vocabulary_list(
-    'relationship', [
-        'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried',
-        'Other-relative'])
-
-workclass = tf.feature_column.categorical_column_with_vocabulary_list(
-    'workclass', [
-        'Self-emp-not-inc', 'Private', 'State-gov', 'Federal-gov',
-        'Local-gov', '?', 'Self-emp-inc', 'Without-pay', 'Never-worked'])
-
-# To show an example of hashing:
-occupation = tf.feature_column.categorical_column_with_hash_bucket(
-    'occupation', hash_bucket_size=1000)
-```
-
-### Base Continuous Feature Columns
-
-Similarly, we can define a `NumericColumn` for each continuous feature column
-that we want to use in the model:
-
-```python
-age = tf.feature_column.numeric_column('age')
-education_num = tf.feature_column.numeric_column('education_num')
-capital_gain = tf.feature_column.numeric_column('capital_gain')
-capital_loss = tf.feature_column.numeric_column('capital_loss')
-hours_per_week = tf.feature_column.numeric_column('hours_per_week')
-```
-
-### Making Continuous Features Categorical through Bucketization
-
-Sometimes the relationship between a continuous feature and the label is not
-linear. As a hypothetical example, a person's income may grow with age in the
-early stage of one's career, then the growth may slow at some point, and finally
-the income decreases after retirement. In this scenario, using the raw `age` as
-a real-valued feature column might not be a good choice because the model can
-only learn one of the three cases:
-
-1. Income always increases at some rate as age grows (positive correlation),
-1. Income always decreases at some rate as age grows (negative correlation), or
-1. Income stays the same no matter the age (no correlation).
-
-If we want to learn the fine-grained correlation between income and each age
-group separately, we can leverage **bucketization**. Bucketization is a process
-of dividing the entire range of a continuous feature into a set of consecutive
-bins/buckets, and then converting the original numerical feature into a bucket
-ID (as a categorical feature) depending on which bucket that value falls into.
-So, we can define a `bucketized_column` over `age` as:
-
-```python
-age_buckets = tf.feature_column.bucketized_column(
-    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
-```
-
-where `boundaries` is a list of bucket boundaries. In this case, there are
-10 boundaries, resulting in 11 age group buckets (from age 17 and below, 18-24,
-25-29, ..., to 65 and over).
-
-### Intersecting Multiple Columns with CrossedColumn
-
-Using each base feature column separately may not be enough to explain the data.
-For example, the correlation between education and the label (earning > 50,000
-dollars) may be different for different occupations.
Therefore, if we only learn
-a single model weight for `education="Bachelors"` and `education="Masters"`, we
-won't be able to capture every single education-occupation combination (e.g.
-distinguishing between `education="Bachelors" AND occupation="Exec-managerial"`
-and `education="Bachelors" AND occupation="Craft-repair"`). To learn the
-differences between different feature combinations, we can add **crossed feature
-columns** to the model.
-
-```python
-education_x_occupation = tf.feature_column.crossed_column(
-    ['education', 'occupation'], hash_bucket_size=1000)
-```
-
-We can also create a `CrossedColumn` over more than two columns. Each
-constituent column can be either a base feature column that is categorical
-(`SparseColumn`), a bucketized real-valued feature column (`BucketizedColumn`),
-or even another `CrossedColumn`. Here's an example:
-
-```python
-age_buckets_x_education_x_occupation = tf.feature_column.crossed_column(
-    [age_buckets, 'education', 'occupation'], hash_bucket_size=1000)
-```
-
-## Defining The Logistic Regression Model
-
-After processing the input data and defining all the feature columns, we're now
-ready to put them all together and build a Logistic Regression model. In the
-previous section we've seen several types of base and derived feature columns,
-including:
-
-* `CategoricalColumn`
-* `NumericColumn`
-* `BucketizedColumn`
-* `CrossedColumn`
-
-All of these are subclasses of the abstract `FeatureColumn` class, and can be
-added to the `feature_columns` field of a model:
-
-```python
-base_columns = [
-    education, marital_status, relationship, workclass, occupation,
-    age_buckets,
-]
-crossed_columns = [
-    tf.feature_column.crossed_column(
-        ['education', 'occupation'], hash_bucket_size=1000),
-    tf.feature_column.crossed_column(
-        [age_buckets, 'education', 'occupation'], hash_bucket_size=1000),
-]
-
-model_dir = tempfile.mkdtemp()
-model = tf.estimator.LinearClassifier(
-    model_dir=model_dir, feature_columns=base_columns + crossed_columns)
-```
-
-The model also automatically learns a bias term, which controls the prediction
-one would make without observing any features (see the section "How Logistic
-Regression Works" for more explanations). The learned model files will be stored
-in `model_dir`.
-
-## Training and Evaluating Our Model
-
-After adding all the features to the model, now let's look at how to actually
-train the model. Training a model is just a single command using the
-tf.estimator API:
-
-```python
-model.train(input_fn=lambda: input_fn(train_data, num_epochs, True, batch_size))
-```
-
-After the model is trained, we can evaluate how good our model is at predicting
-the labels of the holdout data:
-
-```python
-results = model.evaluate(input_fn=lambda: input_fn(
-    test_data, 1, False, batch_size))
-for key in sorted(results):
-  print('%s: %s' % (key, results[key]))
-```
-
-The first line of the final output should be something like
-`accuracy: 0.83557522`, which means the accuracy is 83.6%. Feel free to try more
-features and transformations and see if you can do even better!
-
-After the model is evaluated, we can use it to predict whether an individual
-has an annual income of over 50,000 dollars given that individual's
-information:
-
-```python
-pred_iter = model.predict(input_fn=lambda: input_fn(FLAGS.test_data, 1, False, 1))
-for pred in pred_iter:
-  print(pred['classes'])
-```
-
-The model prediction output will look like `[b'1']` or `[b'0']`, indicating
-whether the corresponding individual has an annual income of over 50,000
-dollars or not.
-
-If you'd like to see a working end-to-end example, you can download our
-[example code](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py)
-and set the `model_type` flag to `wide`.
-
-## Adding Regularization to Prevent Overfitting
-
-Regularization is a technique used to avoid **overfitting**. Overfitting happens
-when your model does well on the data it is trained on, but worse on test data
-that the model has not seen before, such as live traffic. Overfitting generally
-occurs when a model is excessively complex, such as having too many parameters
-relative to the amount of observed training data. Regularization allows you
-to control your model's complexity and makes the model more generalizable to
-unseen data.
-
-In the Linear Model library, you can add L1 and L2 regularizations to the model
-as:
-
-```
-model = tf.estimator.LinearClassifier(
-    model_dir=model_dir, feature_columns=base_columns + crossed_columns,
-    optimizer=tf.train.FtrlOptimizer(
-        learning_rate=0.1,
-        l1_regularization_strength=1.0,
-        l2_regularization_strength=1.0))
-```
-
-One important difference between L1 and L2 regularization is that L1
-regularization tends to make model weights stay at zero, creating sparser
-models, whereas L2 regularization also tries to make the model weights closer to
-zero but not necessarily zero. Therefore, if you increase the strength of L1
-regularization, you will have a smaller model size because many of the model
-weights will be zero. This is often desirable when the feature space is very
-large but sparse, and when there are resource constraints that prevent you from
-serving a model that is too large.
-
-In practice, you should try various combinations of L1 and L2 regularization
-strengths to find the parameters that best control overfitting and give you a
-desirable model size.
-
-## How Logistic Regression Works
-
-Finally, let's take a minute to talk about what the Logistic Regression model
-actually looks like in case you're not already familiar with it. We'll denote
-the label as \\(Y\\), and the set of observed features as a feature vector
-\\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). We define \\(Y=1\\) if an individual
-earned > 50,000 dollars and \\(Y=0\\) otherwise. In Logistic Regression, the
-probability of the label being positive (\\(Y=1\\)) given the features
-\\(\mathbf{x}\\) is given as:
-
-$$ P(Y=1|\mathbf{x}) = \frac{1}{1+\exp(-(\mathbf{w}^T\mathbf{x}+b))}$$
-
-where \\(\mathbf{w}=[w_1, w_2, ..., w_d]\\) are the model weights for the
-features \\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). \\(b\\) is a constant that is
-often called the **bias** of the model. The equation consists of two parts—a
-linear model and a logistic function:
-
-* **Linear Model**: First, we can see that \\(\mathbf{w}^T\mathbf{x}+b = b +
-  w_1x_1 + ... + w_dx_d\\) is a linear model where the output is a linear
-  function of the input features \\(\mathbf{x}\\). The bias \\(b\\) is the
-  prediction one would make without observing any features. The model weight
-  \\(w_i\\) reflects how the feature \\(x_i\\) is correlated with the positive
-  label.
-  If \\(x_i\\) is positively correlated with the positive label, the weight
-  \\(w_i\\) increases, and the probability \\(P(Y=1|\mathbf{x})\\) will be
-  closer to 1. On the other hand, if \\(x_i\\) is negatively correlated with
-  the positive label, then the weight \\(w_i\\) decreases and the probability
-  \\(P(Y=1|\mathbf{x})\\) will be closer to 0.
-
-* **Logistic Function**: Second, we can see that there's a logistic function
-  (also known as the sigmoid function) \\(S(t) = 1/(1+\exp(-t))\\) being
-  applied to the linear model. The logistic function is used to convert the
-  output of the linear model \\(\mathbf{w}^T\mathbf{x}+b\\) from any real
-  number into the range of \\([0, 1]\\), which can be interpreted as a
-  probability.
-
-Model training is an optimization problem: The goal is to find a set of model
-weights (i.e. model parameters) to minimize a **loss function** defined over the
-training data, such as logistic loss for Logistic Regression models. The loss
-function measures the discrepancy between the ground-truth label and the model's
-prediction. If the prediction is very close to the ground-truth label, the loss
-value will be low; if the prediction is very far from the label, the loss
-value will be high.
-
-## Learn Deeper
-
-If you're interested in learning more, check out our
-@{$wide_and_deep$Wide & Deep Learning Tutorial} where we'll show you how to
-combine the strengths of linear models and deep neural networks by jointly
-training them using the tf.estimator API.
diff --git a/tensorflow/docs_src/tutorials/representation/wide_and_deep.md b/tensorflow/docs_src/tutorials/representation/wide_and_deep.md
deleted file mode 100644
index 44677a810bc..00000000000
--- a/tensorflow/docs_src/tutorials/representation/wide_and_deep.md
+++ /dev/null
@@ -1,243 +0,0 @@
-# TensorFlow Wide & Deep Learning Tutorial
-
-In the previous @{$wide$TensorFlow Linear Model Tutorial}, we trained a logistic
-regression model to predict the probability that the individual has an annual
-income of over 50,000 dollars using the
-[Census Income Dataset](https://archive.ics.uci.edu/ml/datasets/Census+Income).
-TensorFlow is great for training deep neural networks too, and you might be
-wondering which one you should choose—well, why not both? Would it be possible
-to combine the strengths of both in one model?
-
-In this tutorial, we'll introduce how to use the tf.estimator API to jointly
-train a wide linear model and a deep feed-forward neural network. This approach
-combines the strengths of memorization and generalization. It's useful for
-generic large-scale regression and classification problems with sparse input
-features (e.g., categorical features with a large number of possible feature
-values). If you're interested in learning more about how Wide & Deep Learning
-works, please check out our [research paper](https://arxiv.org/abs/1606.07792).
-
-![Wide & Deep Spectrum of Models](https://www.tensorflow.org/images/wide_n_deep.svg "Wide & Deep")
-
-The figure above shows a comparison of a wide model (logistic regression with
-sparse features and transformations), a deep model (feed-forward neural network
-with an embedding layer and several hidden layers), and a Wide & Deep model
-(joint training of both). At a high level, there are only 3 steps to configure a
-wide, deep, or Wide & Deep model using the tf.estimator API:
-
-1. Select features for the wide part: Choose the sparse base columns and
-   crossed columns you want to use.
-1.
Select features for the deep part: Choose the continuous columns, the - embedding dimension for each categorical column, and the hidden layer sizes. -1. Put them all together in a Wide & Deep model - (`DNNLinearCombinedClassifier`). - -And that's it! Let's go through a simple example. - -## Setup - -To try the code for this tutorial: - -1. @{$install$Install TensorFlow} if you haven't already. - -2. Download [the tutorial code](https://github.com/tensorflow/models/tree/master/official/wide_deep/). - -3. Execute the data download script we provide to you: - - $ python data_download.py - -4. Execute the tutorial code with the following command to train the wide and -deep model described in this tutorial: - - $ python wide_deep.py - -Read on to find out how this code builds its model. - - -## Define Base Feature Columns - -First, let's define the base categorical and continuous feature columns that -we'll use. These base columns will be the building blocks used by both the wide -part and the deep part of the model. - -```python -import tensorflow as tf - -# Continuous columns -age = tf.feature_column.numeric_column('age') -education_num = tf.feature_column.numeric_column('education_num') -capital_gain = tf.feature_column.numeric_column('capital_gain') -capital_loss = tf.feature_column.numeric_column('capital_loss') -hours_per_week = tf.feature_column.numeric_column('hours_per_week') - -education = tf.feature_column.categorical_column_with_vocabulary_list( - 'education', [ - 'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college', - 'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school', - '5th-6th', '10th', '1st-4th', 'Preschool', '12th']) - -marital_status = tf.feature_column.categorical_column_with_vocabulary_list( - 'marital_status', [ - 'Married-civ-spouse', 'Divorced', 'Married-spouse-absent', - 'Never-married', 'Separated', 'Married-AF-spouse', 'Widowed']) - -relationship = tf.feature_column.categorical_column_with_vocabulary_list( - 'relationship', [ - 'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried', - 'Other-relative']) - -workclass = tf.feature_column.categorical_column_with_vocabulary_list( - 'workclass', [ - 'Self-emp-not-inc', 'Private', 'State-gov', 'Federal-gov', - 'Local-gov', '?', 'Self-emp-inc', 'Without-pay', 'Never-worked']) - -# To show an example of hashing: -occupation = tf.feature_column.categorical_column_with_hash_bucket( - 'occupation', hash_bucket_size=1000) - -# Transformations. -age_buckets = tf.feature_column.bucketized_column( - age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65]) -``` - -## The Wide Model: Linear Model with Crossed Feature Columns - -The wide model is a linear model with a wide set of sparse and crossed feature -columns: - -```python -base_columns = [ - education, marital_status, relationship, workclass, occupation, - age_buckets, -] - -crossed_columns = [ - tf.feature_column.crossed_column( - ['education', 'occupation'], hash_bucket_size=1000), - tf.feature_column.crossed_column( - [age_buckets, 'education', 'occupation'], hash_bucket_size=1000), -] -``` - -You can also see the @{$wide$TensorFlow Linear Model Tutorial} for more details. - -Wide models with crossed feature columns can memorize sparse interactions -between features effectively. That being said, one limitation of crossed feature -columns is that they do not generalize to feature combinations that have not -appeared in the training data. Let's add a deep model with embeddings to fix -that. 
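To make the memorization point concrete before moving on, here is a conceptual sketch, illustrative only: the real `crossed_column` uses its own fingerprint-based hashing rather than string joining, but a cross behaves roughly like a single categorical feature keyed by the combined values, so each observed combination gets its own weight and an unseen combination maps to an untrained bucket.

```python
import tensorflow as tf

# Conceptual sketch only: model a feature cross as "join the values, then
# hash". Each observed (education, occupation) pair lands in its own bucket
# and learns its own weight; a pair never seen in training has no useful
# weight of its own, which is why crosses memorize rather than generalize.
education = tf.constant(['Bachelors', 'Bachelors'])
occupation = tf.constant(['Exec-managerial', 'Craft-repair'])
combined = tf.string_join([education, occupation], separator='_X_')
cross_ids = tf.string_to_hash_bucket_fast(combined, num_buckets=1000)

with tf.Session() as sess:
  print(sess.run(cross_ids))  # One bucket ID per observed combination.
```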
-
-## The Deep Model: Neural Network with Embeddings
-
-The deep model is a feed-forward neural network, as shown in the previous
-figure. Each of the sparse, high-dimensional categorical features is first
-converted into a low-dimensional and dense real-valued vector, often referred to
-as an embedding vector. These low-dimensional dense embedding vectors are
-concatenated with the continuous features, and then fed into the hidden layers
-of a neural network in the forward pass. The embedding values are initialized
-randomly, and are trained along with all other model parameters to minimize the
-training loss. If you're interested in learning more about embeddings, check out
-the TensorFlow tutorial on @{$word2vec$Vector Representations of Words} or
-[Word embedding](https://en.wikipedia.org/wiki/Word_embedding) on Wikipedia.
-
-Another way to represent categorical columns to feed into a neural network is
-via a one-hot or multi-hot representation. This is often appropriate for
-categorical columns with only a few possible values. As an example of a one-hot
-representation, for the relationship column, `"Husband"` can be represented as
-[1, 0, 0, 0, 0, 0], and `"Not-in-family"` as [0, 1, 0, 0, 0, 0], etc. This is a
-fixed representation, whereas embeddings are more flexible and calculated at
-training time.
-
-We'll configure the embeddings for the categorical columns using
-`embedding_column`, and concatenate them with the continuous columns.
-We also use `indicator_column` to create multi-hot representations of some
-categorical columns.
-
-```python
-deep_columns = [
-    age,
-    education_num,
-    capital_gain,
-    capital_loss,
-    hours_per_week,
-    tf.feature_column.indicator_column(workclass),
-    tf.feature_column.indicator_column(education),
-    tf.feature_column.indicator_column(marital_status),
-    tf.feature_column.indicator_column(relationship),
-    # To show an example of embedding
-    tf.feature_column.embedding_column(occupation, dimension=8),
-]
-```
-
-The higher the `dimension` of the embedding is, the more degrees of freedom the
-model will have to learn the representations of the features. For simplicity, we
-set the dimension to 8 for all feature columns here. Empirically, a more
-informed decision for the number of dimensions is to start with a value on the
-order of \\(\log_2(n)\\) or \\(k\sqrt[4]{n}\\), where \\(n\\) is the number of
-unique features in a feature column and \\(k\\) is a small constant (usually
-smaller than 10).
-
-Through dense embeddings, deep models can generalize better and make predictions
-on feature pairs that were previously unseen in the training data. However, it
-is difficult to learn effective low-dimensional representations for feature
-columns when the underlying interaction matrix between two feature columns is
-sparse and high-rank. In such cases, the interaction between most feature pairs
-should be zero except for a few, but dense embeddings will lead to nonzero
-predictions for all feature pairs, and thus can over-generalize. On the other
-hand, linear models with crossed features can memorize these "exception rules"
-effectively with fewer model parameters.
-
-Now, let's see how to jointly train wide and deep models and allow them to
-complement each other's strengths and weaknesses.
-
-## Combining Wide and Deep Models into One
-
-The wide models and deep models are combined by summing up their final output
-log odds as the prediction, then feeding the prediction to a logistic loss
-function.
All the graph definition and variable allocations have already been -handled for you under the hood, so you simply need to create a -`DNNLinearCombinedClassifier`: - -```python -model = tf.estimator.DNNLinearCombinedClassifier( - model_dir='/tmp/census_model', - linear_feature_columns=base_columns + crossed_columns, - dnn_feature_columns=deep_columns, - dnn_hidden_units=[100, 50]) -``` - -## Training and Evaluating The Model - -Before we train the model, let's read in the Census dataset as we did in the -@{$wide$TensorFlow Linear Model tutorial}. See `data_download.py` as well as -`input_fn` within -[`wide_deep.py`](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py). - -After reading in the data, you can train and evaluate the model: - -```python -# Train and evaluate the model every `FLAGS.epochs_per_eval` epochs. -for n in range(FLAGS.train_epochs // FLAGS.epochs_per_eval): - model.train(input_fn=lambda: input_fn( - FLAGS.train_data, FLAGS.epochs_per_eval, True, FLAGS.batch_size)) - - results = model.evaluate(input_fn=lambda: input_fn( - FLAGS.test_data, 1, False, FLAGS.batch_size)) - - # Display evaluation metrics - print('Results at epoch', (n + 1) * FLAGS.epochs_per_eval) - print('-' * 30) - - for key in sorted(results): - print('%s: %s' % (key, results[key])) -``` - -The final output accuracy should be somewhere around 85.5%. If you'd like to -see a working end-to-end example, you can download our -[example code](https://github.com/tensorflow/models/tree/master/official/wide_deep/wide_deep.py). - -Note that this tutorial is just a quick example on a small dataset to get you -familiar with the API. Wide & Deep Learning will be even more powerful if you -try it on a large dataset with many sparse feature columns that have a large -number of possible feature values. Again, feel free to take a look at our -[research paper](https://arxiv.org/abs/1606.07792) for more ideas about how to -apply Wide & Deep Learning in real-world large-scale machine learning problems.