We've moved the docs from g3doc/ to docs_src/ files, which get additional processing before being published.
Change: 148682342
This commit is contained in:
A. Unique TensorFlower 2017-02-27 12:50:37 -08:00 committed by TensorFlower Gardener
parent e45decb6f5
commit 1c707ac780
1346 changed files with 7730 additions and 200864 deletions
tensorflow/docs_src
__init__.py
about
api_guides
community
deploy
extend
extras
get_started
install
performance
programmers_guide

View File

@ -0,0 +1,56 @@
# TensorFlow White Papers
This document identifies white papers about TensorFlow.
### Large-Scale Machine Learning on Heterogeneous Distributed Systems
[Access this white paper.](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf)
**Abstract:** TensorFlow is an interface for expressing machine learning
algorithms, and an implementation for executing such algorithms.
A computation expressed using TensorFlow can be
executed with little or no change on a wide variety of heterogeneous
systems, ranging from mobile devices such as phones
and tablets up to large-scale distributed systems of hundreds
of machines and thousands of computational devices such as
GPU cards. The system is flexible and can be used to express
a wide variety of algorithms, including training and inference
algorithms for deep neural network models, and it has been
used for conducting research and for deploying machine learning
systems into production across more than a dozen areas of
computer science and other fields, including speech recognition,
computer vision, robotics, information retrieval, natural
language processing, geographic information extraction, and
computational drug discovery. This paper describes the TensorFlow
interface and an implementation of that interface that
we have built at Google. The TensorFlow API and a reference
implementation were released as an open-source package under
the Apache 2.0 license in November, 2015 and are available at
www.tensorflow.org.
### TensorFlow: A System for Large-Scale Machine Learning
[Access this white paper.](https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf)
**Abstract:** TensorFlow is a machine learning system that operates at
large scale and in heterogeneous environments. TensorFlow
uses dataflow graphs to represent computation,
shared state, and the operations that mutate that state. It
maps the nodes of a dataflow graph across many machines
in a cluster, and within a machine across multiple computational
devices, including multicore CPUs, general-purpose
GPUs, and custom-designed ASICs known as
Tensor Processing Units (TPUs). This architecture gives
flexibility to the application developer: whereas in previous
“parameter server” designs the management of shared
state is built into the system, TensorFlow enables developers
to experiment with novel optimizations and training algorithms.
TensorFlow supports a variety of applications,
with a focus on training and inference on deep neural networks.
Several Google services use TensorFlow in production,
we have released it as an open-source project, and
it has become widely used for machine learning research.
In this paper, we describe the TensorFlow dataflow model
and demonstrate the compelling performance that TensorFlow
achieves for several real-world applications.

View File

@ -12,7 +12,7 @@ The original white paper introducing TensorFlow can be found here:
* [TensorFlow: Large-scale machine learning on heterogeneous systems](http://download.tensorflow.org/paper/whitepaper2015.pdf)
A white paper about
[contrib.learn](https://www.tensorflow.org/tutorials/tflearn/) is also
@{$tflearn$contrib.learn} is also
available:
* [TF.Learn: TensorFlow's High-level Module for Distributed Machine Learning](https://arxiv.org/abs/1612.04251)
@ -21,7 +21,7 @@ available:
If you use TensorFlow in your research and would like to cite the TensorFlow
system, we suggest you cite the paper above.
You can use this [BibTeX entry](bib.md). As the project progresses, we
You can use this @{$bib$BibTeX entry}. As the project progresses, we
may update the suggested citation with new papers.
Please only use the TensorFlow name and marks when accurately referencing this
@ -36,7 +36,7 @@ TensorFlow enables researchers to build machine learning models. We collect such
models in our [Zoo](https://github.com/tensorflow/models). If you have built a
model with TensorFlow, you may consider publishing it there.
We keep a list of projects that use TensorFlow [here](uses.md). If you made
We keep a list of projects that use TensorFlow @{$uses$here}. If you made
something amazing with TensorFlow, we'd like to hear about it!
## Community
@ -66,14 +66,14 @@ https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md).
For help and support, technical or algorithmic questions, please submit
your questions to Stack Overflow:
<https://stackoverflow.com/questions/tagged/tensorflow>.
You may also find answers in our [FAQ](faq.md), our [glossary](glossary.md), or
in the [shapes, sizes and types guide](dims_types.md). Please do not use the
You may also find answers in our @{$faq$FAQ}, or
in the @{$dims_types$shapes, sizes and types guide}. Please do not use the
mailing list or issue tracker for support.
### Discussions
For general discussions, please join the [TensorFlow discuss mailing list](
https://groups.google.com/a/tensorflow.org/d/forum/discuss).
For general discussions, please join the
[TensorFlow discuss mailing list](https://groups.google.com/a/tensorflow.org/d/forum/discuss).
This list is intended for general discussions about TensorFlow development and
directions, not as a help forum. Instead, direct your questions to
[Stack Overflow](https://stackoverflow.com/questions/tagged/tensorflow), and
@ -91,10 +91,4 @@ tracker for that. Instead, direct your questions to
## Versioning
TensorFlow uses [Semantic Versioning 2.0](http://semver.org). For details on
the versioning of our public API and binary compatibility, see the [versioning
document](versions.md). Additional details for developers are in [TensorFlow
Data Versioning](data_versions.md).
## Roadmap
A roadmap containing what we're working on at the moment is [here](roadmap.md).
the versioning of our public API and binary compatibility, see the @{$versions$versioning document}. Additional details for developers are in @{$data_versions$TensorFlow Data Versioning}.

View File

@ -0,0 +1,3 @@
index.md
bib.md
uses.md

View File

@ -0,0 +1,282 @@
# C++ API
[TOC]
TensorFlow's C++ API provides mechanisms for constructing and executing a data
flow graph. The API is designed to be simple and concise: graph operations are
clearly expressed using a "functional" construction style, including easy
specification of names, device placement, etc., and the resulting graph can be
efficiently run and the desired outputs fetched in a few lines of code. This
guide explains the basic concepts and data structures needed to get started with
TensorFlow graph construction and execution in C++.
## The Basics
Let's start with a simple example that illustrates graph construction and
execution using the C++ API.
```c++
// tensorflow/cc/example/example.cc

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;
  Scope root = Scope::NewRootScope();
  // Matrix A = [3 2; -1 0]
  auto A = Const(root, {{3.f, 2.f}, {-1.f, 0.f}});
  // Vector b = [3 5]
  auto b = Const(root, {{3.f, 5.f}});
  // v = Ab^T
  auto v = MatMul(root.WithOpName("v"), A, b, MatMul::TransposeB(true));

  std::vector<Tensor> outputs;
  ClientSession session(root);
  // Run and fetch v
  TF_CHECK_OK(session.Run({v}, &outputs));
  // Expect outputs[0] == [19; -3]
  LOG(INFO) << outputs[0].matrix<float>();
  return 0;
}
```
Place this example code in the file `tensorflow/cc/example/example.cc` inside a
clone of the TensorFlow
[GitHub repository](http://www.github.com/tensorflow/tensorflow). Also place a
`BUILD` file in the same directory with the following contents:
```python
cc_binary(
    name = "example",
    srcs = ["example.cc"],
    deps = [
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow",
    ],
)
```
You should be able to build and run the example using the following command:
```shell
bazel run -c opt //tensorflow/cc/example:example
```
This example shows some of the important features of the C++ API such as the
following:
* Constructing tensor constants from C++ nested initializer lists
* Constructing and naming of TensorFlow operations
* Specifying optional attributes to operation constructors
* Executing and fetching the tensor values from the TensorFlow session.
We will delve into the details of each below.
## Graph Construction
### Scope
@{tensorflow::Scope} is the main data structure that holds the current state
of graph construction. A `Scope` acts as a handle to the graph being
constructed, as well as storing TensorFlow operation properties. The `Scope`
object is the first argument to operation constructors, and operations that use
a given `Scope` as their first argument inherit that `Scope`'s properties, such
as a common name prefix. Multiple `Scope` objects can refer to the same graph, as
explained further below.
Create a new `Scope` object by calling `Scope::NewRootScope`. This creates
some resources such as a graph to which operations are added. It also creates a
@{tensorflow::Status} object which will be used to indicate errors encountered
when constructing operations. The `Scope` class has value semantics; thus, a
`Scope` object can be freely copied and passed around.
The `Scope` object returned by `Scope::NewRootScope` is referred
to as the root scope. "Child" scopes can be constructed from the root scope by
calling various member functions of the `Scope` class, thus forming a hierarchy
of scopes. A child scope inherits all of the properties of the parent scope and
typically has one property added or changed. For instance, `NewSubScope(name)`
appends `name` to the prefix of names for operations created using the returned
`Scope` object.
Here are some of the properties controlled by a `Scope` object:
* Operation names
* Set of control dependencies for an operation
* Device placement for an operation
* Kernel attribute for an operation
Please refer to @{tensorflow::Scope} for the complete list of member functions
that let you create child scopes with new properties.
### Operation Constructors
You can create graph operations with operation constructors, one C++ class per
TensorFlow operation. Unlike the Python API which uses snake-case to name the
operation constructors, the C++ API uses camel-case to conform to C++ coding
style. For instance, the `MatMul` operation has a C++ class with the same name.
Using this class-per-operation method, it is possible, though not recommended,
to construct an operation as follows:
```c++
// Not recommended
MatMul m(scope, a, b);
```
However, we recommend the following "functional" style for constructing
operations:
```c++
// Recommended
auto m = MatMul(scope, a, b);
```
The first parameter for all operation constructors is always a `Scope` object.
Tensor inputs and mandatory attributes form the rest of the arguments.
Operations with optional attributes accept them through the constructor's last
optional parameter, a `struct` of type `[operation]::Attrs` that contains a
data member for each optional attribute. You can construct such `Attrs` in
multiple ways:
* You can specify a single optional attribute by constructing an `Attrs` object
using the `static` functions provided in the C++ class for the operation. For
example:
```c++
auto m = MatMul(scope, a, b, MatMul::TransposeA(true));
```
* You can specify multiple optional attributes by chaining together functions
available in the `Attrs` struct. For example:
```c++
auto m = MatMul(scope, a, b, MatMul::TransposeA(true).TransposeB(true));
// Or, alternatively
auto m = MatMul(scope, a, b, MatMul::Attrs().TransposeA(true).TransposeB(true));
```
The arguments and return values of operations are handled in different ways
depending on their type:
* For operations that return single tensors, the object returned by
the operation object can be passed directly to other operation
constructors. For example:
```c++
auto m = MatMul(scope, x, W);
auto sum = Add(scope, m, bias);
```
* For operations producing multiple outputs, the object returned by the
operation constructor has a member for each of the outputs. The names of those
members are identical to the names present in the `OpDef` for the
operation. For example:
```c++
auto u = Unique(scope, a);
// u.y has the unique values and u.idx has the unique indices
auto m = Add(scope, u.y, b);
```
* Operations producing a list-typed output return an object that can
be indexed using the `[]` operator. That object can also be directly passed to
other constructors that expect list-typed inputs. For example:
```c++
auto s = Split(scope, 0, a, 2);
// Access elements of the returned list.
auto b = Add(scope, s[0], s[1]);
// Pass the list as a whole to other constructors.
auto c = Concat(scope, s, 0);
```
### Constants
You may pass many different types of C++ values directly to tensor
constants. You may explicitly create a tensor constant by calling the
@{tensorflow::ops::Const} function from various kinds of C++ values. For
example:
* Scalars
```c++
auto f = Const(scope, 42.0f);
auto s = Const(scope, "hello world!");
```
* Nested initializer lists
```c++
// 2x2 matrix
auto c1 = Const(scope, {{1, 2}, {2, 4}});
// 1x3x1 tensor
auto c2 = Const(scope, {{{1}, {2}, {3}}});
// 1x2x0 tensor
auto c3 = Const(scope, {{{}, {}}});
```
* Shapes explicitly specified
```c++
// 2x2 matrix with all elements = 10
auto c1 = Const(scope, 10, /* shape */ {2, 2});
// 1x3x2x1 tensor
auto c2 = Const(scope, {1, 2, 3, 4, 5, 6}, /* shape */ {1, 3, 2, 1});
```
You may directly pass constants to other operation constructors, either by
explicitly constructing one using the `Const` function, or implicitly as any of
the above types of C++ values. For example:
```c++
// [1 1] * [41; 1]
auto x = MatMul(scope, {{1, 1}}, {{41}, {1}});
// [1 2 3 4] + 10
auto y = Add(scope, {1, 2, 3, 4}, 10);
```
## Graph Execution
When executing a graph, you will need a session. The C++ API provides a
@{tensorflow::ClientSession} class that will execute ops created by the
operation constructors. TensorFlow will automatically determine which parts of
the graph need to be executed, and what values need feeding. For example:
```c++
Scope root = Scope::NewRootScope();
auto c = Const(root, {{1, 1}});
auto m = MatMul(root, c, {{42}, {1}});
ClientSession session(root);
std::vector<Tensor> outputs;
session.Run({m}, &outputs);
// outputs[0] == {42}
```
Similarly, the object returned by the operation constructor can be used as the
argument to specify a value being fed when executing the graph. Furthermore, the
value to feed can be specified with the different kinds of C++ values used to
specify tensor constants. For example:
```c++
Scope root = Scope::NewRootScope();
auto a = Placeholder(root, DT_INT32);
// [3 3; 3 3]
auto b = Const(root, 3, {2, 2});
auto c = Add(root, a, b);
ClientSession session(root);
std::vector<Tensor> outputs;
// Feed a <- [1 2; 3 4]
session.Run({{a, {{1, 2}, {3, 4}}}}, {c}, &outputs);
// outputs[0] == [4 5; 6 7]
```
Please see the @{tensorflow::Tensor} documentation for more information on how
to use the execution output.

View File

@ -0,0 +1,91 @@
toc:
- title: Python API Guides
  section:
  - title: "Asserts and boolean checks"
    path: /TARGET_DOC_ROOT/api_guides/python/check_ops
  - title: "Building Graphs"
    path: /TARGET_DOC_ROOT/api_guides/python/framework
  - title: "Constants, Sequences, and Random Values"
    path: /TARGET_DOC_ROOT/api_guides/python/constant_op
  - title: "Control Flow"
    path: /TARGET_DOC_ROOT/api_guides/python/control_flow_ops
  - title: "Data IO (Python functions)"
    path: /TARGET_DOC_ROOT/api_guides/python/python_io
  - title: "Higher Order Functions"
    path: /TARGET_DOC_ROOT/api_guides/python/functional_ops
  - title: "Histograms"
    path: /TARGET_DOC_ROOT/api_guides/python/histogram_ops
  - title: "Images"
    path: /TARGET_DOC_ROOT/api_guides/python/image
  - title: "Inputs and Readers"
    path: /TARGET_DOC_ROOT/api_guides/python/io_ops
  - title: "Math"
    path: /TARGET_DOC_ROOT/api_guides/python/math_ops
  - title: "Neural Network"
    path: /TARGET_DOC_ROOT/api_guides/python/nn
  - title: "Running Graphs"
    path: /TARGET_DOC_ROOT/api_guides/python/client
  - title: "Sparse Tensors"
    path: /TARGET_DOC_ROOT/api_guides/python/sparse_ops
  - title: "Strings"
    path: /TARGET_DOC_ROOT/api_guides/python/string_ops
  - title: "Summary Operations"
    path: /TARGET_DOC_ROOT/api_guides/python/summary
  - title: "TensorFlow Debugger"
    path: /TARGET_DOC_ROOT/api_guides/python/tfdbg
  - title: "Tensor Handle Operations"
    path: /TARGET_DOC_ROOT/api_guides/python/session_ops
  - title: "Tensor Transformations"
    path: /TARGET_DOC_ROOT/api_guides/python/array_ops
  - title: "Testing"
    path: /TARGET_DOC_ROOT/api_guides/python/test
  - title: "Training"
    path: /TARGET_DOC_ROOT/api_guides/python/train
  - title: "Variables"
    path: /TARGET_DOC_ROOT/api_guides/python/state_ops
  - title: "Wraps python functions"
    path: /TARGET_DOC_ROOT/api_guides/python/script_ops
  - title: "BayesFlow Entropy (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.bayesflow.entropy
  - title: "BayesFlow Monte Carlo (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.bayesflow.monte_carlo
  - title: "BayesFlow Stochastic Graph (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.bayesflow.stochastic_graph
  - title: "BayesFlow Stochastic Tensors (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.bayesflow.stochastic_tensor
  - title: "BayesFlow Variational Inference (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.bayesflow.variational_inference
  - title: "Copying Graph Elements (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.copy_graph
  - title: "CRF (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.crf
  - title: "FFmpeg (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.ffmpeg
  - title: "Framework (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.framework
  - title: "Graph Editor (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.graph_editor
  - title: "Integrate (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.integrate
  - title: "Layers (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.layers
  - title: "Learn (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.learn
  - title: "Linear Algebra (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.linalg
  - title: "Losses (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.losses
  - title: "Metrics (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.metrics
  - title: "Optimization (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.opt
  - title: "Random variable transformations (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.distributions.bijector
  - title: "RNN and Cells (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.rnn
  - title: "Statistical Distributions (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.distributions
  - title: "Training (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.training
  - title: "Utilities (contrib)"
    path: /TARGET_DOC_ROOT/api_guides/python/contrib.util

View File

@ -0,0 +1,87 @@
# Tensor Transformations
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
## Casting
TensorFlow provides several operations that you can use to cast tensor data
types in your graph.
* @{tf.string_to_number}
* @{tf.to_double}
* @{tf.to_float}
* @{tf.to_bfloat16}
* @{tf.to_int32}
* @{tf.to_int64}
* @{tf.cast}
* @{tf.bitcast}
* @{tf.saturate_cast}
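A minimal sketch of a few of the casting ops above (the resulting values are
noted in the comments):

```python
import tensorflow as tf

x = tf.constant([1.8, 2.2])
i = tf.cast(x, tf.int32)                      # [1, 2]; fractional part dropped
f = tf.to_float(tf.constant([1, 2]))          # int32 -> float32
n = tf.string_to_number(tf.constant("3.14"))  # string -> float32 scalar
```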
## Shapes and Shaping
TensorFlow provides several operations that you can use to determine the shape
of a tensor and change the shape of a tensor.
* @{tf.broadcast_dynamic_shape}
* @{tf.broadcast_static_shape}
* @{tf.shape}
* @{tf.shape_n}
* @{tf.size}
* @{tf.rank}
* @{tf.reshape}
* @{tf.squeeze}
* @{tf.expand_dims}
* @{tf.meshgrid}
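The following minimal sketch exercises a few of the shape ops above; the
expected shapes are noted in the comments:

```python
import tensorflow as tf

t = tf.constant([[1, 2, 3], [4, 5, 6]])
shape = tf.shape(t)        # a 1-D int32 tensor holding [2, 3]
r = tf.reshape(t, [3, 2])  # same six elements, now shape [3, 2]
e = tf.expand_dims(t, 0)   # insert a new axis: shape [1, 2, 3]
s = tf.squeeze(e)          # drop size-1 axes: back to shape [2, 3]
```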
## Slicing and Joining
TensorFlow provides several operations to slice or extract parts of a tensor,
or join multiple tensors together.
* @{tf.slice}
* @{tf.strided_slice}
* @{tf.split}
* @{tf.tile}
* @{tf.pad}
* @{tf.concat}
* @{tf.stack}
* @{tf.parallel_stack}
* @{tf.unstack}
* @{tf.reverse_sequence}
* @{tf.reverse}
* @{tf.reverse_v2}
* @{tf.transpose}
* @{tf.extract_image_patches}
* @{tf.space_to_batch_nd}
* @{tf.space_to_batch}
* @{tf.required_space_to_batch_paddings}
* @{tf.batch_to_space_nd}
* @{tf.batch_to_space}
* @{tf.space_to_depth}
* @{tf.depth_to_space}
* @{tf.gather}
* @{tf.gather_nd}
* @{tf.unique_with_counts}
* @{tf.scatter_nd}
* @{tf.dynamic_partition}
* @{tf.dynamic_stitch}
* @{tf.boolean_mask}
* @{tf.one_hot}
* @{tf.sequence_mask}
* @{tf.dequantize}
* @{tf.quantize_v2}
* @{tf.quantized_concat}
* @{tf.setdiff1d}
## Fake quantization
Operations used to help train for better quantization accuracy.
* @{tf.fake_quant_with_min_max_args}
* @{tf.fake_quant_with_min_max_args_gradient}
* @{tf.fake_quant_with_min_max_vars}
* @{tf.fake_quant_with_min_max_vars_gradient}
* @{tf.fake_quant_with_min_max_vars_per_channel}
* @{tf.fake_quant_with_min_max_vars_per_channel_gradient}

View File

@ -0,0 +1,19 @@
# Asserts and boolean checks
* @{tf.assert_negative}
* @{tf.assert_positive}
* @{tf.assert_proper_iterable}
* @{tf.assert_non_negative}
* @{tf.assert_non_positive}
* @{tf.assert_equal}
* @{tf.assert_integer}
* @{tf.assert_less}
* @{tf.assert_less_equal}
* @{tf.assert_greater}
* @{tf.assert_greater_equal}
* @{tf.assert_rank}
* @{tf.assert_rank_at_least}
* @{tf.assert_type}
* @{tf.is_non_decreasing}
* @{tf.is_numeric_tensor}
* @{tf.is_strictly_increasing}
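Assert ops only take effect when something runs them, so the usual pattern is
to attach them as control dependencies of the computation they guard. A
minimal sketch:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32)
# The assert must execute before y; wire it in as a control dependency.
with tf.control_dependencies([tf.assert_positive(x)]):
  y = tf.log(x)  # fails at run time with InvalidArgumentError if x <= 0
```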

View File

@ -0,0 +1,36 @@
# Running Graphs
[TOC]
This library contains classes for launching graphs and executing operations.
The @{$get_started} guide has
examples of how a graph is launched in a @{tf.Session}.
## Session management
* @{tf.Session}
* @{tf.InteractiveSession}
* @{tf.get_default_session}
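A minimal sketch of launching a graph in a `tf.Session`:

```python
import tensorflow as tf

a = tf.constant(2)
b = tf.constant(3)
c = tf.add(a, b)

# Launch the graph, fetch c, and release resources when done.
with tf.Session() as sess:
  print(sess.run(c))  # 5
```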
## Error classes and convenience functions
* @{tf.OpError}
* @{tf.errors.CancelledError}
* @{tf.errors.UnknownError}
* @{tf.errors.InvalidArgumentError}
* @{tf.errors.DeadlineExceededError}
* @{tf.errors.NotFoundError}
* @{tf.errors.AlreadyExistsError}
* @{tf.errors.PermissionDeniedError}
* @{tf.errors.UnauthenticatedError}
* @{tf.errors.ResourceExhaustedError}
* @{tf.errors.FailedPreconditionError}
* @{tf.errors.AbortedError}
* @{tf.errors.OutOfRangeError}
* @{tf.errors.UnimplementedError}
* @{tf.errors.InternalError}
* @{tf.errors.UnavailableError}
* @{tf.errors.DataLossError}
* @{tf.errors.exception_type_from_error_code}
* @{tf.errors.error_code_from_exception_type}
* @{tf.errors.raise_exception_on_not_ok_status}

View File

@ -0,0 +1,87 @@
# Constants, Sequences, and Random Values
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
## Constant Value Tensors
TensorFlow provides several operations that you can use to generate constants.
* @{tf.zeros}
* @{tf.zeros_like}
* @{tf.ones}
* @{tf.ones_like}
* @{tf.fill}
* @{tf.constant}
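A minimal sketch of the constant-value ops above (resulting values noted in
the comments):

```python
import tensorflow as tf

z = tf.zeros([2, 3])        # 2x3 tensor of 0.0
o = tf.ones_like(z)         # ones with the same shape and dtype as z
f = tf.fill([2, 3], 9.0)    # 2x3 tensor with every element set to 9.0
c = tf.constant([1, 2, 3])  # constant built from a Python list
```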
## Sequences
* @{tf.linspace}
* @{tf.range}
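For example:

```python
import tensorflow as tf

lin = tf.linspace(0.0, 1.0, 5)  # [0.0, 0.25, 0.5, 0.75, 1.0]
rng = tf.range(0, 10, 2)        # [0, 2, 4, 6, 8]
```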
## Random Tensors
TensorFlow has several ops that create random tensors with different
distributions. The random ops are stateful, and create new random values each
time they are evaluated.
The `seed` keyword argument in these functions acts in conjunction with
the graph-level random seed. Changing either the graph-level seed using
@{tf.set_random_seed} or the op-level seed will change the underlying seed of
these operations. Setting neither the graph-level nor the op-level seed results
in a random seed for all operations.
See @{tf.set_random_seed}
for details on the interaction between operation-level and graph-level random
seeds.
### Examples:
```python
# Create a tensor of shape [2, 3] consisting of random normal values, with mean
# -1 and standard deviation 4.
norm = tf.random_normal([2, 3], mean=-1, stddev=4)
# Shuffle the first dimension of a tensor
c = tf.constant([[1, 2], [3, 4], [5, 6]])
shuff = tf.random_shuffle(c)
# Each time we run these ops, different results are generated
sess = tf.Session()
print(sess.run(norm))
print(sess.run(norm))
# Set an op-level seed to generate repeatable sequences across sessions.
norm = tf.random_normal([2, 3], seed=1234)
sess = tf.Session()
print(sess.run(norm))
print(sess.run(norm))
sess = tf.Session()
print(sess.run(norm))
print(sess.run(norm))
```
Another common use of random values is the initialization of variables. Also see
the @{$variables$Variables How To}.
```python
# Use random uniform values in [0, 1) as the initializer for a variable of shape
# [2, 3]. The default type is float32.
var = tf.Variable(tf.random_uniform([2, 3]), name="var")
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
print(sess.run(var))
```
* @{tf.random_normal}
* @{tf.truncated_normal}
* @{tf.random_uniform}
* @{tf.random_shuffle}
* @{tf.random_crop}
* @{tf.multinomial}
* @{tf.random_gamma}
* @{tf.set_random_seed}

View File

@ -0,0 +1,47 @@
# BayesFlow Entropy (contrib)
[TOC]
Entropy Ops.
## Background
Common Shannon entropy, the Evidence Lower BOund (ELBO), KL divergence, and more
all have information theoretic use and interpretations. They are also often
used in variational inference. This library brings together `Ops` for
estimating them, e.g. using Monte Carlo expectations.
## Examples
Example of fitting a variational posterior with the ELBO.
```python
# We start by assuming knowledge of the log of a joint density p(z, x) over
# latent variable z and fixed measurement x. Since x is fixed, the Python
# function does not take x as an argument.
def log_joint(z):
  theta = tf.Variable(0.)  # Trainable variable that helps define log_joint.
  ...

# Next, define a Normal distribution with trainable parameters.
q = distributions.Normal(mu=tf.Variable(0.), sigma=tf.Variable(1.))

# Now, define a loss function (negative ELBO) that, when minimized, will adjust
# mu, sigma, and theta, increasing the ELBO, which we hope will both reduce the
# KL divergence between q(z) and p(z | x), and increase p(x). Note that we
# cannot guarantee both, but in general we expect both to happen.
elbo = entropy.elbo_ratio(log_joint, q, n=10)
loss = -elbo

# Minimize the loss.
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
tf.global_variables_initializer().run()
for step in range(100):
  train_op.run()
```
## Ops
* @{tf.contrib.bayesflow.entropy.elbo_ratio}
* @{tf.contrib.bayesflow.entropy.entropy_shannon}
* @{tf.contrib.bayesflow.entropy.renyi_ratio}
* @{tf.contrib.bayesflow.entropy.renyi_alpha}

View File

@ -0,0 +1,54 @@
# BayesFlow Monte Carlo (contrib)
[TOC]
Monte Carlo integration and helpers.
## Background
Monte Carlo integration refers to the practice of estimating an expectation with
a sample mean. For example, given random variable `Z in R^k` with density `p`,
the expectation of function `f` can be approximated like:
```
E_p[f(Z)] = \int f(z) p(z) dz
~ S_n
:= n^{-1} \sum_{i=1}^n f(z_i), z_i iid samples from p.
```
If `E_p[|f(Z)|] < infinity`, then `S_n --> E_p[f(Z)]` by the strong law of large
numbers. If `E_p[f(Z)^2] < infinity`, then `S_n` is asymptotically normal with
variance `Var[f(Z)] / n`.
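In plain NumPy (independent of the ops listed below), a minimal sketch of the
sample-mean estimator `S_n` for `f(z) = z**2` under a standard normal `p`,
whose true expectation is 1:

```python
import numpy as np

np.random.seed(0)
z = np.random.randn(100000)  # iid samples from p = N(0, 1)
s_n = np.mean(z ** 2)        # sample-mean estimate of E_p[Z^2] = 1
print(s_n)                   # close to 1.0 by the strong law of large numbers
```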
Practitioners of Bayesian statistics often find themselves wanting to estimate
`E_p[f(Z)]` when the distribution `p` is known only up to a constant. For
example, the joint distribution `p(z, x)` may be known, but the evidence
`p(x) = \int p(z, x) dz` may be intractable. In that case, a parameterized
distribution family `q_lambda(z)` may be chosen, and the optimal `lambda` is the
one minimizing the KL divergence between `q_lambda(z)` and
`p(z | x)`. We only know `p(z, x)`, but that is sufficient to find `lambda`.
## Log-space evaluation and subtracting the maximum
Care must be taken when the random variable lives in a high dimensional space.
For example, the naive importance sample estimate `E_q[f(Z) p(Z) / q(Z)]`
involves the ratio of two terms `p(Z) / q(Z)`, each of which must have tails
dropping off faster than `O(|z|^{-(k + 1)})` in order to have finite integral.
This ratio would often be zero or infinity up to numerical precision.
For that reason, we write
```
Log E_q[ f(Z) p(Z) / q(Z) ]
= Log E_q[ exp{Log[f(Z)] + Log[p(Z)] - Log[q(Z)] - C} ] + C, where
C := Max[ Log[f(Z)] + Log[p(Z)] - Log[q(Z)] ].
```
The maximum value of the exponentiated term will be 0.0, and the expectation
can be evaluated in a stable manner.
## Ops
* @{tf.contrib.bayesflow.monte_carlo.expectation}
* @{tf.contrib.bayesflow.monte_carlo.expectation_importance_sampler}
* @{tf.contrib.bayesflow.monte_carlo.expectation_importance_sampler_logspace}

View File

@ -0,0 +1,8 @@
# BayesFlow Stochastic Graph (contrib)
[TOC]
Classes and helper functions for Stochastic Computation Graphs.
## Stochastic Computation Graph Helper Functions
* @{tf.contrib.bayesflow.stochastic_graph.surrogate_loss}

View File

@ -0,0 +1,24 @@
# BayesFlow Stochastic Tensors (contrib)
[TOC]
Classes and helper functions for creating Stochastic Tensors.
`StochasticTensor` objects wrap `Distribution` objects. Their
values may be samples from the underlying distribution, or the distribution
mean (as governed by `value_type`). These objects provide a `loss`
method for use when sampling from a non-reparameterized distribution.
The `loss` method is used in conjunction with `stochastic_graph.surrogate_loss`
to produce a single differentiable loss in stochastic graphs having
both continuous and discrete stochastic nodes.
## Stochastic Tensor Classes
* @{tf.contrib.bayesflow.stochastic_tensor.BaseStochasticTensor}
* @{tf.contrib.bayesflow.stochastic_tensor.StochasticTensor}
## Stochastic Tensor Value Types
* @{tf.contrib.bayesflow.stochastic_tensor.MeanValue}
* @{tf.contrib.bayesflow.stochastic_tensor.SampleValue}
* @{tf.contrib.bayesflow.stochastic_tensor.value_type}
* @{tf.contrib.bayesflow.stochastic_tensor.get_current_value_type}

View File

@ -0,0 +1,11 @@
# BayesFlow Variational Inference (contrib)
[TOC]
Variational inference.
## Ops
* @{tf.contrib.bayesflow.variational_inference.elbo}
* @{tf.contrib.bayesflow.variational_inference.elbo_with_log_joint}
* @{tf.contrib.bayesflow.variational_inference.ELBOForms}
* @{tf.contrib.bayesflow.variational_inference.register_prior}

View File

@ -0,0 +1,4 @@
# Copying Graph Elements (contrib)
[TOC]
Functions for copying elements from one graph to another.

View File

@ -0,0 +1,11 @@
# CRF (contrib)
Linear-chain CRF layer.
* @{tf.contrib.crf.crf_sequence_score}
* @{tf.contrib.crf.crf_log_norm}
* @{tf.contrib.crf.crf_log_likelihood}
* @{tf.contrib.crf.crf_unary_score}
* @{tf.contrib.crf.crf_binary_score}
* @{tf.contrib.crf.CrfForwardRnnCell}
* @{tf.contrib.crf.viterbi_decode}
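A minimal training-loss sketch, assuming the contrib signature of this era
(`crf_log_likelihood(inputs, tag_indices, sequence_lengths)` returning the
per-example log-likelihood and the transition parameters); the shapes below
are hypothetical:

```python
import tensorflow as tf

# Hypothetical shapes: a batch of 8 sequences, 10 steps, 5 tags.
unary_scores = tf.placeholder(tf.float32, [8, 10, 5])
tags = tf.placeholder(tf.int32, [8, 10])
lengths = tf.placeholder(tf.int32, [8])

log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    unary_scores, tags, lengths)
loss = tf.reduce_mean(-log_likelihood)  # minimize the negative log-likelihood
```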

View File

@ -0,0 +1,33 @@
# Random variable transformations (contrib)
[TOC]
Bijector Ops.
An API for invertible, differentiable transformations of random variables.
## Background
Differentiable, bijective transformations of continuous random variables alter
the calculations made in the cumulative/probability distribution functions and
sample function. This module provides a standard interface for making these
manipulations.
For more details and examples, see the `Bijector` docstring.
To apply a `Bijector`, use `distributions.TransformedDistribution`.
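For instance, a log-normal distribution can be built by pushing a `Normal`
through the `Exp` bijector. This minimal sketch assumes the contrib parameter
names of this era (`mu`, `sigma`):

```python
import tensorflow as tf

ds = tf.contrib.distributions

log_normal = ds.TransformedDistribution(
    distribution=ds.Normal(mu=0., sigma=1.),
    bijector=ds.bijector.Exp())
y = log_normal.sample(5)  # samples are exp(x) for x ~ N(0, 1)
```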
## Bijectors
* @{tf.contrib.distributions.bijector.Affine}
* @{tf.contrib.distributions.bijector.AffineLinearOperator}
* @{tf.contrib.distributions.bijector.Bijector}
* @{tf.contrib.distributions.bijector.Chain}
* @{tf.contrib.distributions.bijector.CholeskyOuterProduct}
* @{tf.contrib.distributions.bijector.Exp}
* @{tf.contrib.distributions.bijector.Identity}
* @{tf.contrib.distributions.bijector.Inline}
* @{tf.contrib.distributions.bijector.Invert}
* @{tf.contrib.distributions.bijector.PowerTransform}
* @{tf.contrib.distributions.bijector.SigmoidCentered}
* @{tf.contrib.distributions.bijector.SoftmaxCentered}
* @{tf.contrib.distributions.bijector.Softplus}

View File

@ -0,0 +1,84 @@
# Statistical Distributions (contrib)
[TOC]
Classes representing statistical distributions and ops for working with them.
## Classes for statistical distributions
Classes that represent batches of statistical distributions. Each class is
initialized with parameters that define the distributions.
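For example, a minimal sketch of constructing a batch of distributions and
querying it (the parameter names `mu` and `sigma` follow the contrib API of
this era):

```python
import tensorflow as tf

ds = tf.contrib.distributions

# A batch of two scalar normals, N(0, 1) and N(1, 2).
dist = ds.Normal(mu=[0., 1.], sigma=[1., 2.])
samples = dist.sample(5)             # shape [5, 2]
log_probs = dist.log_prob([0., 0.])  # log-density of 0 under each
```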
## Base classes
* @{tf.contrib.distributions.ReparameterizationType}
* @{tf.contrib.distributions.Distribution}
## Univariate (scalar) distributions
* @{tf.contrib.distributions.Binomial}
* @{tf.contrib.distributions.Bernoulli}
* @{tf.contrib.distributions.BernoulliWithSigmoidProbs}
* @{tf.contrib.distributions.Beta}
* @{tf.contrib.distributions.Categorical}
* @{tf.contrib.distributions.Chi2}
* @{tf.contrib.distributions.Chi2WithAbsDf}
* @{tf.contrib.distributions.Exponential}
* @{tf.contrib.distributions.Gamma}
* @{tf.contrib.distributions.InverseGamma}
* @{tf.contrib.distributions.Laplace}
* @{tf.contrib.distributions.LaplaceWithSoftplusScale}
* @{tf.contrib.distributions.Normal}
* @{tf.contrib.distributions.NormalWithSoftplusScale}
* @{tf.contrib.distributions.Poisson}
* @{tf.contrib.distributions.StudentT}
* @{tf.contrib.distributions.StudentTWithAbsDfSoftplusScale}
* @{tf.contrib.distributions.Uniform}
## Multivariate distributions
### Multivariate normal
* @{tf.contrib.distributions.MultivariateNormalDiag}
* @{tf.contrib.distributions.MultivariateNormalTriL}
* @{tf.contrib.distributions.MultivariateNormalDiagPlusLowRank}
* @{tf.contrib.distributions.MultivariateNormalDiagWithSoftplusScale}
### Other multivariate distributions
* @{tf.contrib.distributions.Dirichlet}
* @{tf.contrib.distributions.DirichletMultinomial}
* @{tf.contrib.distributions.Multinomial}
* @{tf.contrib.distributions.WishartCholesky}
* @{tf.contrib.distributions.WishartFull}
### Multivariate Utilities
* @{tf.contrib.distributions.matrix_diag_transform}
## Transformed distributions
* @{tf.contrib.distributions.TransformedDistribution}
* @{tf.contrib.distributions.QuantizedDistribution}
## Mixture Models
* @{tf.contrib.distributions.Mixture}
## Posterior inference with conjugate priors
Functions that transform conjugate prior/likelihood pairs to distributions
representing the posterior or posterior predictive.
## Normal likelihood with conjugate prior
* @{tf.contrib.distributions.normal_conjugates_known_scale_posterior}
* @{tf.contrib.distributions.normal_conjugates_known_scale_predictive}
## Kullback-Leibler Divergence
* @{tf.contrib.distributions.kl}
* @{tf.contrib.distributions.RegisterKL}
## Utilities
* @{tf.contrib.distributions.softplus_inverse}

View File

@ -0,0 +1,23 @@
# FFmpeg (contrib)
[TOC]
## Encoding and decoding audio using FFmpeg
TensorFlow provides Ops to decode and encode audio files using the
[FFmpeg](https://www.ffmpeg.org/) library. FFmpeg must be
locally [installed](https://ffmpeg.org/download.html) for these Ops to succeed.
Example:
```python
import tensorflow as tf
from tensorflow.contrib import ffmpeg

audio_binary = tf.read_file('song.mp3')
waveform = ffmpeg.decode_audio(
    audio_binary, file_format='mp3', samples_per_second=44100,
    channel_count=2)
uncompressed_binary = ffmpeg.encode_audio(
    waveform, file_format='wav', samples_per_second=44100)
```
* @{tf.contrib.ffmpeg.decode_audio}
* @{tf.contrib.ffmpeg.encode_audio}

View File

@ -0,0 +1,61 @@
# Framework (contrib)
[TOC]
Framework utilities.
* @{tf.contrib.framework.assert_same_float_dtype}
* @{tf.contrib.framework.assert_scalar}
* @{tf.contrib.framework.assert_scalar_int}
* @{tf.convert_to_tensor_or_sparse_tensor}
* @{tf.contrib.framework.get_graph_from_inputs}
* @{tf.is_numeric_tensor}
* @{tf.is_non_decreasing}
* @{tf.is_strictly_increasing}
* @{tf.contrib.framework.is_tensor}
* @{tf.contrib.framework.reduce_sum_n}
* @{tf.contrib.framework.remove_squeezable_dimensions}
* @{tf.contrib.framework.with_shape}
* @{tf.contrib.framework.with_same_shape}
## Deprecation
* @{tf.contrib.framework.deprecated}
* @{tf.contrib.framework.deprecated_args}
* @{tf.contrib.framework.deprecated_arg_values}
## Arg_Scope
* @{tf.contrib.framework.arg_scope}
* @{tf.contrib.framework.add_arg_scope}
* @{tf.contrib.framework.has_arg_scope}
* @{tf.contrib.framework.arg_scoped_arguments}
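`arg_scope` lets a group of layer calls share default arguments. A minimal
sketch (the layer choice and argument values here are illustrative):

```python
import tensorflow as tf

layers = tf.contrib.layers

inputs = tf.placeholder(tf.float32, [None, 28, 28, 3])
# Every conv2d inside the scope inherits these defaults unless it
# overrides them explicitly.
with tf.contrib.framework.arg_scope(
    [layers.conv2d], padding='SAME', activation_fn=tf.nn.relu):
  net = layers.conv2d(inputs, 32, [3, 3])
  net = layers.conv2d(net, 64, [3, 3], padding='VALID')  # explicit override
```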
## Variables
* @{tf.contrib.framework.add_model_variable}
* @{tf.train.assert_global_step}
* @{tf.contrib.framework.assert_or_get_global_step}
* @{tf.contrib.framework.assign_from_checkpoint}
* @{tf.contrib.framework.assign_from_checkpoint_fn}
* @{tf.contrib.framework.assign_from_values}
* @{tf.contrib.framework.assign_from_values_fn}
* @{tf.contrib.framework.create_global_step}
* @{tf.contrib.framework.filter_variables}
* @{tf.train.get_global_step}
* @{tf.contrib.framework.get_or_create_global_step}
* @{tf.contrib.framework.get_local_variables}
* @{tf.contrib.framework.get_model_variables}
* @{tf.contrib.framework.get_unique_variable}
* @{tf.contrib.framework.get_variables_by_name}
* @{tf.contrib.framework.get_variables_by_suffix}
* @{tf.contrib.framework.get_variables_to_restore}
* @{tf.contrib.framework.get_variables}
* @{tf.contrib.framework.local_variable}
* @{tf.contrib.framework.model_variable}
* @{tf.contrib.framework.variable}
* @{tf.contrib.framework.VariableDeviceChooser}
* @{tf.contrib.framework.zero_initializer}
## Checkpoint utilities
* @{tf.contrib.framework.load_checkpoint}
* @{tf.contrib.framework.list_variables}
* @{tf.contrib.framework.load_variable}
* @{tf.contrib.framework.init_from_checkpoint}

View File

@ -0,0 +1,182 @@
# Graph Editor (contrib)
[TOC]
TensorFlow Graph Editor.
The TensorFlow Graph Editor library allows for modification of an existing
`tf.Graph` instance in-place.
The author's github username is [purpledog](https://github.com/purpledog).
## Library overview
Appending new nodes is the only graph editing operation allowed by the
TensorFlow core library. The Graph Editor library is an attempt to allow for
other kinds of editing operations, namely, *rerouting* and *transforming*.
* *rerouting* is a local operation that re-plugs existing tensors (the edges
of the graph). Operations (the nodes) are not modified by this operation. For
example, rerouting can be used to insert an operation that adds noise in place
of an existing tensor.
* *transforming* is a global operation that turns one graph into
another. By default, a transformation is a simple copy, but it can be
customized to achieve other goals. For instance, a graph can be transformed
into another one in which noise is added after all the operations of a
specific type.
**Important: modifying a graph in-place with the Graph Editor must be done
`offline`, that is, without any active sessions.**
Of course new operations can be appended online but Graph Editor specific
operations like rerouting and transforming can currently only be done offline.
Here is an example of what you **cannot** do:
* Build a graph.
* Create a session and run the graph.
* Modify the graph with the Graph Editor.
* Re-run the graph with the `same` previously created session.
To edit an already running graph, follow these steps:
* Build a graph.
* Create a session and run the graph.
* Save the graph state and terminate the session.
* Modify the graph with the Graph Editor.
* Create a new session and restore the graph state.
* Re-run the graph with the newly created session.
Note that this procedure is very costly because a new session must be created
after any modifications. Among other things, it takes time because the entire
graph state must be saved and restored again.
## Sub-graph
Most of the functions in the Graph Editor library operate on *sub-graphs*.
More precisely, they take as input arguments instances of the SubGraphView class
(or anything which can be converted to it). Doing so allows the same function
to transparently operate on single operations as well as sub-graphs of any size.
A subgraph can be created in several ways:
* using a list of ops:
```python
from tensorflow.contrib import graph_editor as ge
my_sgv = ge.sgv(ops)
```
* from a name scope:
```python
my_sgv = ge.sgv_scope("foo/bar", graph=tf.get_default_graph())
```
* using regular expression:
```python
my_sgv = ge.sgv("foo/.*/.*read$", graph=tf.get_default_graph())
```
Note that the Graph Editor is meant to manipulate several graphs at the same
time, typically during a transform or copy operation. For that reason,
to avoid any confusion, the default graph is never used and the graph on
which to operate must always be given explicitly. This is why
*`graph=tf.get_default_graph()`* is used in the code snippets above.
## Modules overview
* util: utility functions.
* select: various selection methods of TensorFlow tensors and operations.
* match: TensorFlow graph matching. Think of this as regular expressions for
graphs (but not quite yet).
* reroute: various ways of rerouting tensors to different consuming ops like
*swap* or *reroute_a2b*.
* subgraph: the SubGraphView class, which enables subgraph manipulations in a
TensorFlow `tf.Graph`.
* edit: various editing functions operating on subgraphs like *detach*,
*connect* or *bypass*.
* transform: the Transformer class, which enables transforming
(or simply copying) a subgraph into another one.
## Module: util
* @{tf.contrib.graph_editor.make_list_of_op}
* @{tf.contrib.graph_editor.get_tensors}
* @{tf.contrib.graph_editor.make_list_of_t}
* @{tf.contrib.graph_editor.get_generating_ops}
* @{tf.contrib.graph_editor.get_consuming_ops}
* @{tf.contrib.graph_editor.ControlOutputs}
* @{tf.contrib.graph_editor.placeholder_name}
* @{tf.contrib.graph_editor.make_placeholder_from_tensor}
* @{tf.contrib.graph_editor.make_placeholder_from_dtype_and_shape}
## Module: select
* @{tf.contrib.graph_editor.filter_ts}
* @{tf.contrib.graph_editor.filter_ts_from_regex}
* @{tf.contrib.graph_editor.filter_ops}
* @{tf.contrib.graph_editor.filter_ops_from_regex}
* @{tf.contrib.graph_editor.get_name_scope_ops}
* @{tf.contrib.graph_editor.check_cios}
* @{tf.contrib.graph_editor.get_ops_ios}
* @{tf.contrib.graph_editor.compute_boundary_ts}
* @{tf.contrib.graph_editor.get_within_boundary_ops}
* @{tf.contrib.graph_editor.get_forward_walk_ops}
* @{tf.contrib.graph_editor.get_backward_walk_ops}
* @{tf.contrib.graph_editor.get_walks_intersection_ops}
* @{tf.contrib.graph_editor.get_walks_union_ops}
* @{tf.contrib.graph_editor.select_ops}
* @{tf.contrib.graph_editor.select_ts}
* @{tf.contrib.graph_editor.select_ops_and_ts}
## Module: subgraph
* @{tf.contrib.graph_editor.SubGraphView}
* @{tf.contrib.graph_editor.make_view}
* @{tf.contrib.graph_editor.make_view_from_scope}
## Module: reroute
* @{tf.contrib.graph_editor.reroute.swap_ts}
* @{tf.contrib.graph_editor.reroute.reroute_ts}
* @{tf.contrib.graph_editor.reroute.swap_inputs}
* @{tf.contrib.graph_editor.reroute.reroute_inputs}
* @{tf.contrib.graph_editor.reroute.swap_outputs}
* @{tf.contrib.graph_editor.reroute.reroute_outputs}
* @{tf.contrib.graph_editor.reroute.swap_ios}
* @{tf.contrib.graph_editor.reroute.reroute_ios}
* @{tf.contrib.graph_editor.reroute.remove_control_inputs}
* @{tf.contrib.graph_editor.reroute.add_control_inputs}
## Module: edit
* @{tf.contrib.graph_editor.detach_control_inputs}
* @{tf.contrib.graph_editor.detach_control_outputs}
* @{tf.contrib.graph_editor.detach_inputs}
* @{tf.contrib.graph_editor.detach_outputs}
* @{tf.contrib.graph_editor.detach}
* @{tf.contrib.graph_editor.connect}
* @{tf.contrib.graph_editor.bypass}
## Module: transform
* @{tf.contrib.graph_editor.replace_t_with_placeholder_handler}
* @{tf.contrib.graph_editor.keep_t_if_possible_handler}
* @{tf.contrib.graph_editor.assign_renamed_collections_handler}
* @{tf.contrib.graph_editor.transform_op_if_inside_handler}
* @{tf.contrib.graph_editor.copy_op_handler}
* @{tf.contrib.graph_editor.Transformer}
* @{tf.contrib.graph_editor.copy}
* @{tf.contrib.graph_editor.copy_with_input_replacements}
* @{tf.contrib.graph_editor.graph_replace}
## Module: match
* @{tf.contrib.graph_editor.op_type}
* @{tf.contrib.graph_editor.OpMatcher}
## Useful aliases
* @{tf.contrib.graph_editor.ph}
* @{tf.contrib.graph_editor.sgv}
* @{tf.contrib.graph_editor.sgv_scope}

View File

@ -0,0 +1,41 @@
# Integrate (contrib)
[TOC]
Integration and ODE solvers for TensorFlow.
## Example: Lorenz attractor
We can use `odeint` to solve the
[Lorenz system](https://en.wikipedia.org/wiki/Lorenz_system) of ordinary
differential equations, a prototypical example of chaotic dynamics:
```python
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

rho = 28.0
sigma = 10.0
beta = 8.0/3.0

def lorenz_equation(state, t):
  x, y, z = tf.unstack(state)
  dx = sigma * (y - x)
  dy = x * (rho - z) - y
  dz = x * y - beta * z
  return tf.stack([dx, dy, dz])

init_state = tf.constant([0, 2, 20], dtype=tf.float64)
t = np.linspace(0, 50, num=5000)
tensor_state, tensor_info = tf.contrib.integrate.odeint(
    lorenz_equation, init_state, t, full_output=True)

sess = tf.Session()
state, info = sess.run([tensor_state, tensor_info])
x, y, z = state.T
plt.plot(x, z)
```
<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../images/lorenz_attractor.png" alt>
</div>
## Ops
* @{tf.contrib.integrate.odeint}

View File

@ -0,0 +1,109 @@
# Layers (contrib)
[TOC]
Ops for building neural network layers, regularizers, summaries, etc.
## Higher level ops for building neural network layers
This package provides several ops that take care of creating variables that are
used internally in a consistent way and provide the building blocks for many
common machine learning algorithms.
* @{tf.contrib.layers.avg_pool2d}
* @{tf.contrib.layers.batch_norm}
* @{tf.contrib.layers.convolution2d}
* @{tf.contrib.layers.conv2d_in_plane}
* @{tf.contrib.layers.convolution2d_in_plane}
* @{tf.nn.conv2d_transpose}
* @{tf.contrib.layers.convolution2d_transpose}
* @{tf.nn.dropout}
* @{tf.contrib.layers.flatten}
* @{tf.contrib.layers.fully_connected}
* @{tf.contrib.layers.layer_norm}
* @{tf.contrib.layers.linear}
* @{tf.contrib.layers.max_pool2d}
* @{tf.contrib.layers.one_hot_encoding}
* @{tf.nn.relu}
* @{tf.nn.relu6}
* @{tf.contrib.layers.repeat}
* @{tf.contrib.layers.safe_embedding_lookup_sparse}
* @{tf.nn.separable_conv2d}
* @{tf.contrib.layers.separable_convolution2d}
* @{tf.nn.softmax}
* @{tf.stack}
* @{tf.contrib.layers.unit_norm}
* @{tf.contrib.layers.embed_sequence}
Aliases for `fully_connected` which set a default activation function are
available: `relu`, `relu6`, and `linear`.
The `stack` operation is also available; it builds a stack of layers by
applying a layer repeatedly.
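As an illustration, a minimal two-layer sketch using `fully_connected`, which
creates and tracks its own weight and bias variables (`relu` is its default
activation):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
hidden = tf.contrib.layers.fully_connected(x, 256)  # relu by default
logits = tf.contrib.layers.fully_connected(hidden, 10, activation_fn=None)
```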
## Regularizers
Regularization can help prevent overfitting. These have the signature
`fn(weights)`. The loss is typically added to
`tf.GraphKeys.REGULARIZATION_LOSSES`.
* @{tf.contrib.layers.apply_regularization}
* @{tf.contrib.layers.l1_regularizer}
* @{tf.contrib.layers.l2_regularizer}
* @{tf.contrib.layers.sum_regularizer}
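A minimal sketch of pairing a regularizer with `apply_regularization` (the
scale value here is illustrative):

```python
import tensorflow as tf

weights = tf.Variable(tf.truncated_normal([784, 256]))
l2 = tf.contrib.layers.l2_regularizer(scale=0.01)
# Sums the penalty over the listed weights and adds it to the
# GraphKeys.REGULARIZATION_LOSSES collection.
penalty = tf.contrib.layers.apply_regularization(l2, [weights])
```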
## Initializers
Initializers are used to initialize variables with sensible values given their
size, data type, and purpose.
* @{tf.contrib.layers.xavier_initializer}
* @{tf.contrib.layers.xavier_initializer_conv2d}
* @{tf.contrib.layers.variance_scaling_initializer}
## Optimization
Optimize weights given a loss.
* @{tf.contrib.layers.optimize_loss}
## Summaries
Helper functions to summarize specific variables or ops.
* @{tf.contrib.layers.summarize_activation}
* @{tf.contrib.layers.summarize_tensor}
* @{tf.contrib.layers.summarize_tensors}
* @{tf.contrib.layers.summarize_collection}
The layers module defines convenience functions `summarize_variables`,
`summarize_weights` and `summarize_biases`, which set the `collection` argument
of `summarize_collection` to `VARIABLES`, `WEIGHTS` and `BIASES`, respectively.
* @{tf.contrib.layers.summarize_activations}
## Feature columns
Feature columns provide a mechanism to map data to a model.
* @{tf.contrib.layers.bucketized_column}
* @{tf.contrib.layers.check_feature_columns}
* @{tf.contrib.layers.create_feature_spec_for_parsing}
* @{tf.contrib.layers.crossed_column}
* @{tf.contrib.layers.embedding_column}
* @{tf.contrib.layers.scattered_embedding_column}
* @{tf.contrib.layers.input_from_feature_columns}
* @{tf.contrib.layers.joint_weighted_sum_from_feature_columns}
* @{tf.contrib.layers.make_place_holder_tensors_for_base_features}
* @{tf.contrib.layers.multi_class_target}
* @{tf.contrib.layers.one_hot_column}
* @{tf.contrib.layers.parse_feature_columns_from_examples}
* @{tf.contrib.layers.parse_feature_columns_from_sequence_examples}
* @{tf.contrib.layers.real_valued_column}
* @{tf.contrib.layers.shared_embedding_columns}
* @{tf.contrib.layers.sparse_column_with_hash_bucket}
* @{tf.contrib.layers.sparse_column_with_integerized_feature}
* @{tf.contrib.layers.sparse_column_with_keys}
* @{tf.contrib.layers.weighted_sparse_column}
* @{tf.contrib.layers.weighted_sum_from_feature_columns}
* @{tf.contrib.layers.infer_real_valued_columns}
* @{tf.contrib.layers.sequence_input_from_feature_columns}

View File

@ -0,0 +1,62 @@
# Learn (contrib)
[TOC]
High level API for learning with TensorFlow.
## Estimators
Train and evaluate TensorFlow models.
* @{tf.contrib.learn.BaseEstimator}
* @{tf.contrib.learn.Estimator}
* @{tf.contrib.learn.Trainable}
* @{tf.contrib.learn.Evaluable}
* @{tf.contrib.learn.KMeansClustering}
* @{tf.contrib.learn.ModeKeys}
* @{tf.contrib.learn.ModelFnOps}
* @{tf.contrib.learn.MetricSpec}
* @{tf.contrib.learn.PredictionKey}
* @{tf.contrib.learn.DNNClassifier}
* @{tf.contrib.learn.DNNRegressor}
* @{tf.contrib.learn.DNNLinearCombinedRegressor}
* @{tf.contrib.learn.DNNLinearCombinedClassifier}
* @{tf.contrib.learn.LinearClassifier}
* @{tf.contrib.learn.LinearRegressor}
* @{tf.contrib.learn.LogisticRegressor}
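A minimal `DNNClassifier` sketch on synthetic data, assuming the
`fit(x=..., y=..., steps=...)` interface of this era:

```python
import numpy as np
import tensorflow as tf

# Synthetic data: 100 examples, 4 real-valued features, 3 classes.
x = np.random.rand(100, 4).astype(np.float32)
y = np.random.randint(3, size=100)

feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(x)
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns, hidden_units=[10, 20, 10], n_classes=3)
classifier.fit(x=x, y=y, steps=100)
```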
## Distributed training utilities
* @{tf.contrib.learn.Experiment}
* @{tf.contrib.learn.ExportStrategy}
* @{tf.contrib.learn.TaskType}
## Graph actions
Perform various training, evaluation, and inference actions on a graph.
* @{tf.train.NanLossDuringTrainingError}
* @{tf.contrib.learn.RunConfig}
* @{tf.contrib.learn.evaluate}
* @{tf.contrib.learn.infer}
* @{tf.contrib.learn.run_feeds}
* @{tf.contrib.learn.run_n}
* @{tf.contrib.learn.train}
## Input processing
Queue and read batched input data.
* @{tf.contrib.learn.extract_dask_data}
* @{tf.contrib.learn.extract_dask_labels}
* @{tf.contrib.learn.extract_pandas_data}
* @{tf.contrib.learn.extract_pandas_labels}
* @{tf.contrib.learn.extract_pandas_matrix}
* @{tf.contrib.learn.infer_real_valued_columns_from_input}
* @{tf.contrib.learn.infer_real_valued_columns_from_input_fn}
* @{tf.contrib.learn.read_batch_examples}
* @{tf.contrib.learn.read_batch_features}
* @{tf.contrib.learn.read_batch_record_features}
## Export utilities
* @{tf.contrib.learn.build_parsing_serving_input_fn}
* @{tf.contrib.learn.ProblemType}

View File

@ -0,0 +1,30 @@
# Linear Algebra (contrib)
[TOC]
Linear algebra libraries for TensorFlow.
## `LinearOperator`
Subclasses of `LinearOperator` provide access to common methods on a
(batch) matrix, without the need to materialize the matrix. This allows:
* Matrix-free computations
* Different operators to take advantage of special structure, while providing a
consistent API to users.
### Base class
* @{tf.contrib.linalg.LinearOperator}
### Individual operators
* @{tf.contrib.linalg.LinearOperatorDiag}
* @{tf.contrib.linalg.LinearOperatorIdentity}
* @{tf.contrib.linalg.LinearOperatorScaledIdentity}
* @{tf.contrib.linalg.LinearOperatorMatrix}
* @{tf.contrib.linalg.LinearOperatorTriL}
* @{tf.contrib.linalg.LinearOperatorUDVHUpdate}
### Transformations and Combinations of operators
* @{tf.contrib.linalg.LinearOperatorComposition}
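A minimal sketch using the diagonal operator listed above; the method names
here (`to_dense`, `determinant`) are assumptions based on the contrib API of
this era:

```python
import tensorflow as tf

# A 2x2 diagonal operator; the dense matrix is only materialized on request.
operator = tf.contrib.linalg.LinearOperatorDiag([1., 2.])
dense = operator.to_dense()   # [[1., 0.], [0., 2.]]
det = operator.determinant()  # 2.0
```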

View File

@ -0,0 +1,126 @@
# Losses (contrib)
## Loss operations for use in neural networks.
Note: By default all the losses are collected into the `GraphKeys.LOSSES`
collection.
All of the loss functions take a pair of predictions and ground-truth labels,
from which the loss is computed. It is assumed that the shape of both of these
tensors is `[batch_size, d1, ..., dN]`, where `batch_size` is the number
of samples in the batch and `d1` ... `dN` are the remaining dimensions.
It is common, when training with multiple loss functions, to adjust the relative
strengths of individual losses. This is performed by rescaling the losses via
a `weight` parameter passed to the loss functions. For example, if we were
training with both log_loss and sum_of_squares_loss, and we wished that the
log_loss penalty be twice as severe as the sum_of_squares_loss, we would
implement this as:
```python
# Explicitly set the weight.
tf.contrib.losses.log_loss(predictions, labels, weight=2.0)

# Uses the default weight of 1.0.
tf.contrib.losses.sum_of_squares(predictions, labels)

# All the losses are collected into the `GraphKeys.LOSSES` collection.
losses = tf.get_collection(tf.GraphKeys.LOSSES)
```
While specifying a scalar loss rescales the loss over the entire batch,
we sometimes want to rescale the loss per batch sample. For example, if we have
certain examples that matter more to us to get right, we might want to have
a higher loss for those samples than for others whose mistakes matter less. In
this case, we can provide a weight vector of length `batch_size` which results
in the loss for each sample in the batch being scaled by the corresponding
weight element. For example, consider the case of a classification problem
where we want to maximize our accuracy, but we are especially interested in
obtaining high accuracy for a specific class:
```python
inputs, labels = LoadData(batch_size=3)
logits = MyModelPredictions(inputs)
# Ensures that the loss for examples whose ground truth class is `3` is 5x
# higher than the loss for all other examples.
weight = tf.multiply(4, tf.cast(tf.equal(labels, 3), tf.float32)) + 1
onehot_labels = tf.one_hot(labels, num_classes=5)
tf.contrib.losses.softmax_cross_entropy(logits, onehot_labels, weight=weight)
```
Finally, in certain cases, we may want to specify a different loss for every
single measurable value. For example, if we are performing per-pixel depth
prediction, or per-pixel denoising, a single batch sample has P values, where P
is the number of pixels in the image. For many losses, the number of measurable
values matches the number of elements in the predictions and labels tensors.
For others, such as `softmax_cross_entropy` and `cosine_distance`, the
loss function reduces the dimensions of the inputs to produce a tensor of
losses for each measurable value. For example, `softmax_cross_entropy` takes as
input predictions and labels of dimension `[batch_size, num_classes]`, but the
number of measurable values is `[batch_size]`. Consequently, when passing a
weight tensor to specify a different loss for every measurable value, the
dimension of the tensor will depend on the loss being used.
For a concrete example, consider the case of per-pixel depth prediction where
certain ground truth depth values are missing (due to sensor noise in the
capture process). In this case, we want to assign zero weight to losses for
these predictions.
```python
# 'depths' that are missing have a value of 0:
images, depths = LoadData(...)
predictions = MyModelPredictions(images)
weight = tf.cast(tf.greater(depths, 0), tf.float32)
loss = tf.contrib.losses.sum_of_squares(predictions, depths, weight)
```
Note that when using weights for the losses, the final average is computed
by rescaling the losses by the weights and then dividing by the total number of
non-zero samples. For an arbitrary set of weights, this may not necessarily
produce a weighted average. Instead, it simply and transparently rescales the
per-element losses before averaging over the number of observations. For
example, if the losses computed by the loss function are `[4, 1, 2, 3]` and the
weights are `[1, 0.5, 3, 9]`, then the average loss is:
```python
(4*1 + 1*0.5 + 2*3 + 3*9) / 4
```
However, with a single loss function and an arbitrary set of weights, one can
still easily create a loss function such that the resulting loss is a
weighted average over the individual prediction errors:
```python
images, labels = LoadData(...)
predictions = MyModelPredictions(images)

weight = MyComplicatedWeightingFunction(labels)
weight = tf.div(weight, tf.size(weight))
loss = tf.contrib.losses.sum_of_squares(predictions, labels, weight)
```
* @{tf.contrib.losses.absolute_difference}
* @{tf.contrib.losses.add_loss}
* @{tf.contrib.losses.hinge_loss}
* @{tf.contrib.losses.compute_weighted_loss}
* @{tf.contrib.losses.cosine_distance}
* @{tf.contrib.losses.get_losses}
* @{tf.contrib.losses.get_regularization_losses}
* @{tf.contrib.losses.get_total_loss}
* @{tf.contrib.losses.log_loss}
* @{tf.contrib.losses.mean_pairwise_squared_error}
* @{tf.contrib.losses.mean_squared_error}
* @{tf.contrib.losses.sigmoid_cross_entropy}
* @{tf.contrib.losses.softmax_cross_entropy}
* @{tf.contrib.losses.sparse_softmax_cross_entropy}
The following are deprecated in favor of `mean_pairwise_squared_error` and
`mean_squared_error`.
@{tf.contrib.losses.sum_of_pairwise_squares}
@{tf.contrib.losses.sum_of_squares}

View File

@ -0,0 +1,133 @@
# Metrics (contrib)
[TOC]
## Ops for evaluation metrics and summary statistics
### API
This module provides functions for computing streaming metrics: metrics computed
on dynamically valued `Tensors`. Each metric declaration returns two ops: a
"value_tensor", an idempotent operation that returns the current value of the
metric, and an "update_op", an operation that accumulates the information
from the current value of the `Tensors` being measured and also returns the
value of the "value_tensor".
To use any of these metrics, one need only declare the metric, call `update_op`
repeatedly to accumulate data over the desired number of `Tensor` values (often
each one is a single batch) and finally evaluate the value_tensor. For example,
to use the `streaming_mean`:
```python
values = ...
mean_value, update_op = tf.contrib.metrics.streaming_mean(values)
sess.run(tf.local_variables_initializer())

for i in range(number_of_batches):
  print('Mean after batch %d: %f' % (i, update_op.eval()))
print('Final Mean: %f' % mean_value.eval())
```
Each metric function adds nodes to the graph that hold the state necessary to
compute the value of the metric as well as a set of operations that actually
perform the computation. Every metric evaluation is composed of three steps:
* Initialization: initializing the metric state.
* Aggregation: updating the values of the metric state.
* Finalization: computing the final metric value.
In the above example, calling `streaming_mean` creates a pair of state variables
that will contain (1) the running sum and (2) the count of the number of samples
in the sum. Because the streaming metrics use local variables,
the Initialization stage is performed by running the op returned
by `tf.local_variables_initializer()`. It sets the sum and count variables to
zero.
Next, Aggregation is performed by examining the current state of `values`
and incrementing the state variables appropriately. This step is executed by
running the `update_op` returned by the metric.
Finally, finalization is performed by evaluating the "value_tensor".
In practice, we commonly want to evaluate across many batches and multiple
metrics. To do so, we need only run the metric computation operations multiple
times:
```python
labels = ...
predictions = ...
accuracy, update_op_acc = tf.contrib.metrics.streaming_accuracy(
    labels, predictions)
error, update_op_error = tf.contrib.metrics.streaming_mean_absolute_error(
    labels, predictions)

sess.run(tf.local_variables_initializer())
for batch in range(num_batches):
  sess.run([update_op_acc, update_op_error])
accuracy, mean_absolute_error = sess.run([accuracy, error])
```
Note that when evaluating the same metric multiple times on different inputs,
one must specify the scope of each metric to avoid accumulating the results
together:
```python
labels = ...
predictions0 = ...
predictions1 = ...
accuracy0, update_op0 = tf.contrib.metrics.streaming_accuracy(
    labels, predictions0, name='preds0')
accuracy1, update_op1 = tf.contrib.metrics.streaming_accuracy(
    labels, predictions1, name='preds1')
```
Certain metrics, such as streaming_mean or streaming_accuracy, can be weighted
via a `weights` argument. The `weights` tensor must be the same size as the
labels and predictions tensors and results in a weighted average of the metric.
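For example, a weighted streaming mean might be declared as follows (a minimal
sketch; `values` and `weights` stand for tensors of the same shape):

```python
values = ...
weights = ...
# Each element of `values` contributes to the running mean in proportion
# to the corresponding element of `weights`.
mean_value, update_op = tf.contrib.metrics.streaming_mean(values, weights=weights)
```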
## Metric `Ops`
* @{tf.contrib.metrics.streaming_accuracy}
* @{tf.contrib.metrics.streaming_mean}
* @{tf.contrib.metrics.streaming_recall}
* @{tf.contrib.metrics.streaming_recall_at_thresholds}
* @{tf.contrib.metrics.streaming_precision}
* @{tf.contrib.metrics.streaming_precision_at_thresholds}
* @{tf.contrib.metrics.streaming_auc}
* @{tf.contrib.metrics.streaming_recall_at_k}
* @{tf.contrib.metrics.streaming_mean_absolute_error}
* @{tf.contrib.metrics.streaming_mean_iou}
* @{tf.contrib.metrics.streaming_mean_relative_error}
* @{tf.contrib.metrics.streaming_mean_squared_error}
* @{tf.contrib.metrics.streaming_mean_tensor}
* @{tf.contrib.metrics.streaming_root_mean_squared_error}
* @{tf.contrib.metrics.streaming_covariance}
* @{tf.contrib.metrics.streaming_pearson_correlation}
* @{tf.contrib.metrics.streaming_mean_cosine_distance}
* @{tf.contrib.metrics.streaming_percentage_less}
* @{tf.contrib.metrics.streaming_sensitivity_at_specificity}
* @{tf.contrib.metrics.streaming_sparse_average_precision_at_k}
* @{tf.contrib.metrics.streaming_sparse_precision_at_k}
* @{tf.contrib.metrics.streaming_sparse_precision_at_top_k}
* @{tf.contrib.metrics.streaming_sparse_recall_at_k}
* @{tf.contrib.metrics.streaming_specificity_at_sensitivity}
* @{tf.contrib.metrics.streaming_concat}
* @{tf.contrib.metrics.streaming_false_negatives}
* @{tf.contrib.metrics.streaming_false_negatives_at_thresholds}
* @{tf.contrib.metrics.streaming_false_positives}
* @{tf.contrib.metrics.streaming_false_positives_at_thresholds}
* @{tf.contrib.metrics.streaming_true_negatives}
* @{tf.contrib.metrics.streaming_true_negatives_at_thresholds}
* @{tf.contrib.metrics.streaming_true_positives}
* @{tf.contrib.metrics.streaming_true_positives_at_thresholds}
* @{tf.contrib.metrics.auc_using_histogram}
* @{tf.contrib.metrics.accuracy}
* @{tf.contrib.metrics.aggregate_metrics}
* @{tf.contrib.metrics.aggregate_metric_map}
* @{tf.contrib.metrics.confusion_matrix}
## Set `Ops`
* @{tf.contrib.metrics.set_difference}
* @{tf.contrib.metrics.set_intersection}
* @{tf.contrib.metrics.set_size}
* @{tf.contrib.metrics.set_union}

View File

@ -0,0 +1,4 @@
# Optimization (contrib)
[TOC]
The `opt` module contains optimization routines.

View File

@ -0,0 +1,61 @@
# RNN and Cells (contrib)
[TOC]
Module for constructing RNN Cells and additional RNN operations.
## Base interface for all RNN Cells
* @{tf.contrib.rnn.RNNCell}
## Core RNN Cells for use with TensorFlow's core RNN methods
* @{tf.contrib.rnn.BasicRNNCell}
* @{tf.contrib.rnn.BasicLSTMCell}
* @{tf.contrib.rnn.GRUCell}
* @{tf.contrib.rnn.LSTMCell}
* @{tf.contrib.rnn.LayerNormBasicLSTMCell}
## Classes storing split `RNNCell` state
* @{tf.contrib.rnn.LSTMStateTuple}
## Core RNN Cell wrappers (RNNCells that wrap other RNNCells)
* @{tf.contrib.rnn.MultiRNNCell}
* @{tf.contrib.rnn.LSTMBlockWrapper}
* @{tf.contrib.rnn.DropoutWrapper}
* @{tf.contrib.rnn.EmbeddingWrapper}
* @{tf.contrib.rnn.InputProjectionWrapper}
* @{tf.contrib.rnn.OutputProjectionWrapper}
* @{tf.contrib.rnn.DeviceWrapper}
* @{tf.contrib.rnn.ResidualWrapper}
### Block RNNCells
* @{tf.contrib.rnn.LSTMBlockCell}
* @{tf.contrib.rnn.GRUBlockCell}
### Fused RNNCells
* @{tf.contrib.rnn.FusedRNNCell}
* @{tf.contrib.rnn.FusedRNNCellAdaptor}
* @{tf.contrib.rnn.TimeReversedFusedRNN}
* @{tf.contrib.rnn.LSTMBlockFusedCell}
### LSTM-like cells
* @{tf.contrib.rnn.CoupledInputForgetGateLSTMCell}
* @{tf.contrib.rnn.TimeFreqLSTMCell}
* @{tf.contrib.rnn.GridLSTMCell}
### RNNCell wrappers
* @{tf.contrib.rnn.AttentionCellWrapper}
* @{tf.contrib.rnn.CompiledWrapper}
## Recurrent Neural Networks
TensorFlow provides a number of methods for constructing Recurrent Neural
Networks.
* @{tf.contrib.rnn.static_rnn}
* @{tf.contrib.rnn.static_state_saving_rnn}
* @{tf.contrib.rnn.static_bidirectional_rnn}
* @{tf.contrib.rnn.stack_bidirectional_dynamic_rnn}
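As an illustration, the sketch below stacks two `BasicLSTMCell`s and unrolls
them over a sequence with `static_rnn`; `inputs` is assumed to be a Python list
of 2-D `[batch_size, input_size]` tensors.

```python
cells = [tf.contrib.rnn.BasicLSTMCell(num_units=128) for _ in range(2)]
stacked_cell = tf.contrib.rnn.MultiRNNCell(cells)
# `outputs` is a list with one [batch_size, 128] tensor per time step.
outputs, final_state = tf.contrib.rnn.static_rnn(
    stacked_cell, inputs, dtype=tf.float32)
```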

View File

@ -0,0 +1,50 @@
# Training (contrib)
[TOC]
Training and input utilities.
## Splitting sequence inputs into minibatches with state saving
Use @{tf.contrib.training.SequenceQueueingStateSaver} or
its wrapper @{tf.contrib.training.batch_sequences_with_states} if
you have input data with a dynamic primary time / frame count axis which
you'd like to convert into fixed size segments during minibatching, and would
like to store state in the forward direction across segments of an example.
* @{tf.contrib.training.batch_sequences_with_states}
* @{tf.contrib.training.NextQueuedSequenceBatch}
* @{tf.contrib.training.SequenceQueueingStateSaver}
## Online data resampling
To resample data with replacement on a per-example basis, use
@{tf.contrib.training.rejection_sample} or
@{tf.contrib.training.resample_at_rate}. For `rejection_sample`, provide
a boolean `Tensor` describing whether to accept or reject each example; the
resulting batch sizes are always the same. For `resample_at_rate`, provide the
desired rate for each example; the resulting batch sizes may vary. If you wish to specify relative
rates, rather than absolute ones, use @{tf.contrib.training.weighted_resample}
(which also returns the actual resampling rate used for each output example).
Use @{tf.contrib.training.stratified_sample} to resample without replacement
from the data to achieve a desired mix of class proportions that the TensorFlow
graph sees. For instance, if you have a binary classification dataset that is
99.9% class 1, a common approach is to resample from the data so that the data
is more balanced.
* @{tf.contrib.training.rejection_sample}
* @{tf.contrib.training.resample_at_rate}
* @{tf.contrib.training.stratified_sample}
* @{tf.contrib.training.weighted_resample}
## Bucketing
Use @{tf.contrib.training.bucket} or
@{tf.contrib.training.bucket_by_sequence_length} to stratify
minibatches into groups ("buckets"). Use `bucket_by_sequence_length`
with the argument `dynamic_pad=True` to receive minibatches of similarly
sized sequences for efficient training via `dynamic_rnn`.
* @{tf.contrib.training.bucket}
* @{tf.contrib.training.bucket_by_sequence_length}

View File

@ -0,0 +1,12 @@
# Utilities (contrib)
[TOC]
Utilities for dealing with Tensors.
## Miscellaneous Utility Functions
* @{tf.contrib.util.constant_value}
* @{tf.contrib.util.make_tensor_proto}
* @{tf.contrib.util.make_ndarray}
* @{tf.contrib.util.ops_used_by_graph_def}
* @{tf.contrib.util.stripped_op_list_for_graph}

View File

@ -0,0 +1,57 @@
# Control Flow
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
## Control Flow Operations
TensorFlow provides several operations and classes that you can use to control
the execution of operations and add conditional dependencies to your graph.
* @{tf.identity}
* @{tf.tuple}
* @{tf.group}
* @{tf.no_op}
* @{tf.count_up_to}
* @{tf.cond}
* @{tf.case}
* @{tf.while_loop}
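For example, `tf.cond` evaluates one of two branches based on a boolean
predicate, and `tf.while_loop` repeats a body while a condition holds:

```python
x = tf.constant(2.0)
y = tf.constant(5.0)
# Computes x + y when x < y, and x * x otherwise.
z = tf.cond(tf.less(x, y), lambda: tf.add(x, y), lambda: tf.square(x))

# Doubles `i` until it reaches at least 10.
i = tf.constant(1)
r = tf.while_loop(lambda i: tf.less(i, 10), lambda i: tf.multiply(i, 2), [i])
```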
## Logical Operators
TensorFlow provides several operations that you can use to add logical operators
to your graph.
* @{tf.logical_and}
* @{tf.logical_not}
* @{tf.logical_or}
* @{tf.logical_xor}
## Comparison Operators
TensorFlow provides several operations that you can use to add comparison
operators to your graph.
* @{tf.equal}
* @{tf.not_equal}
* @{tf.less}
* @{tf.less_equal}
* @{tf.greater}
* @{tf.greater_equal}
* @{tf.where}
## Debugging Operations
TensorFlow provides several operations that you can use to validate values and
debug your graph.
* @{tf.is_finite}
* @{tf.is_inf}
* @{tf.is_nan}
* @{tf.verify_tensor_all_finite}
* @{tf.check_numerics}
* @{tf.add_check_numerics_ops}
* @{tf.Assert}
* @{tf.Print}
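For example, assertions and numeric checks can be attached to a graph via
control dependencies (a small sketch):

```python
x = tf.placeholder(tf.float32)
# Abort the step (printing `x`) if any entry is not positive.
assert_op = tf.Assert(tf.reduce_all(tf.greater(x, 0.0)), [x])
with tf.control_dependencies([assert_op]):
  y = tf.log(x)
# Additionally fail fast if `y` ever contains a NaN or Inf value.
y = tf.check_numerics(y, "y contains NaN or Inf")
```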

View File

@ -0,0 +1,51 @@
# Building Graphs
[TOC]
Classes and functions for building TensorFlow graphs.
## Core graph data structures
* @{tf.Graph}
* @{tf.Operation}
* @{tf.Tensor}
## Tensor types
* @{tf.DType}
* @{tf.as_dtype}
## Utility functions
* @{tf.device}
* @{tf.container}
* @{tf.name_scope}
* @{tf.control_dependencies}
* @{tf.convert_to_tensor}
* @{tf.convert_to_tensor_or_indexed_slices}
* @{tf.convert_to_tensor_or_sparse_tensor}
* @{tf.get_default_graph}
* @{tf.reset_default_graph}
* @{tf.import_graph_def}
* @{tf.load_file_system_library}
* @{tf.load_op_library}
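The sketch below shows a few of these utilities together: building ops in an
explicit graph, grouping related ops under a name scope, and pinning an op to a
device:

```python
g = tf.Graph()
with g.as_default():
  with tf.name_scope("inputs"):
    a = tf.constant(1.0, name="a")   # op name: "inputs/a"
    b = tf.constant(2.0, name="b")   # op name: "inputs/b"
  with tf.device("/cpu:0"):
    c = tf.add(a, b)                 # placed on the CPU
```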
## Graph collections
* @{tf.add_to_collection}
* @{tf.get_collection}
* @{tf.get_collection_ref}
* @{tf.GraphKeys}
## Defining new operations
* @{tf.RegisterGradient}
* @{tf.NotDifferentiable}
* @{tf.NoGradient}
* @{tf.TensorShape}
* @{tf.Dimension}
* @{tf.op_scope}
* @{tf.get_seed}
## For libraries building on TensorFlow
* @{tf.register_tensor_conversion_function}

View File

@ -0,0 +1,18 @@
# Higher Order Functions
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
Functional operations.
## Higher Order Operators
TensorFlow provides several higher order operators to simplify the common
map-reduce programming patterns.
* @{tf.map_fn}
* @{tf.foldl}
* @{tf.foldr}
* @{tf.scan}
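For example:

```python
elems = tf.constant([1, 2, 3, 4, 5, 6])
squares = tf.map_fn(lambda x: x * x, elems)   # [1, 4, 9, 16, 25, 36]
total = tf.foldl(lambda a, x: a + x, elems)   # 21
running = tf.scan(lambda a, x: a + x, elems)  # [1, 3, 6, 10, 15, 21]
```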

View File

@ -0,0 +1,6 @@
# Histograms
[TOC]
## Histograms
* @{tf.histogram_fixed_width}
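For example, the following counts how many values fall into each of five
equal-width bins over the range `[0.0, 5.0]` (values outside the range are
clamped into the first or last bin):

```python
values = tf.constant([0.0, 1.1, 2.5, 4.9, 5.0])
hist = tf.histogram_fixed_width(values, value_range=[0.0, 5.0], nbins=5)
# ==> [1, 1, 1, 0, 2]
```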

View File

@ -0,0 +1,143 @@
# Images
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
## Encoding and Decoding
TensorFlow provides Ops to decode and encode JPEG and PNG formats. Encoded
images are represented by scalar string Tensors, decoded images by 3-D uint8
tensors of shape `[height, width, channels]`. (PNG also supports uint16.)
The encode and decode Ops apply to one image at a time. Their input and output
are all of variable size. If you need fixed size images, pass the output of
the decode Ops to one of the cropping and resizing Ops.
Note: The PNG encode and decode Ops support RGBA, but the conversion Ops
presently support only RGB, HSV, and grayscale. For now, the alpha channel has
to be stripped from the image and re-attached later using slicing ops.
* @{tf.image.decode_gif}
* @{tf.image.decode_jpeg}
* @{tf.image.encode_jpeg}
* @{tf.image.decode_png}
* @{tf.image.encode_png}
* @{tf.image.decode_image}
## Resizing
The resizing Ops accept input images as tensors of several types. They always
output resized images as float32 tensors.
The convenience function @{tf.image.resize_images} supports both 4-D
and 3-D tensors as input and output. 4-D tensors are for batches of images,
3-D tensors for individual images.
Other resizing Ops only support 4-D batches of images as input:
@{tf.image.resize_area}, @{tf.image.resize_bicubic},
@{tf.image.resize_bilinear},
@{tf.image.resize_nearest_neighbor}.
Example:
```python
# Decode a JPEG image and resize it to 299 by 299 using the default method.
image = tf.image.decode_jpeg(...)
resized_image = tf.image.resize_images(image, [299, 299])
```
* @{tf.image.resize_images}
* @{tf.image.resize_area}
* @{tf.image.resize_bicubic}
* @{tf.image.resize_bilinear}
* @{tf.image.resize_nearest_neighbor}
## Cropping
* @{tf.image.resize_image_with_crop_or_pad}
* @{tf.image.central_crop}
* @{tf.image.pad_to_bounding_box}
* @{tf.image.crop_to_bounding_box}
* @{tf.image.extract_glimpse}
* @{tf.image.crop_and_resize}
## Flipping, Rotating and Transposing
* @{tf.image.flip_up_down}
* @{tf.image.random_flip_up_down}
* @{tf.image.flip_left_right}
* @{tf.image.random_flip_left_right}
* @{tf.image.transpose_image}
* @{tf.image.rot90}
## Converting Between Colorspaces
Image ops work either on individual images or on batches of images, depending on
the shape of their input Tensor.
If 3-D, the shape is `[height, width, channels]`, and the Tensor represents one
image. If 4-D, the shape is `[batch_size, height, width, channels]`, and the
Tensor represents `batch_size` images.
Currently, `channels` can usefully be 1, 2, 3, or 4. Single-channel images are
grayscale, and images with 3 channels are encoded as either RGB or HSV. Images
with 2 or 4 channels include an alpha channel, which has to be stripped from the
image before passing the image to most image processing functions (and can be
re-attached later).
Internally, images are stored as either one `float32` per channel per pixel
(implicitly, values are assumed to lie in `[0,1)`) or one `uint8` per channel
per pixel (values are assumed to lie in `[0,255]`).
TensorFlow can convert between images in RGB or HSV. The conversion functions
work only on float images, so you need to convert images in other formats using
@{tf.image.convert_image_dtype}.
Example:
```python
# Decode an image and convert it to HSV.
rgb_image = tf.image.decode_png(..., channels=3)
rgb_image_float = tf.image.convert_image_dtype(rgb_image, tf.float32)
hsv_image = tf.image.rgb_to_hsv(rgb_image_float)
```
* @{tf.image.rgb_to_grayscale}
* @{tf.image.grayscale_to_rgb}
* @{tf.image.hsv_to_rgb}
* @{tf.image.rgb_to_hsv}
* @{tf.image.convert_image_dtype}
## Image Adjustments
TensorFlow provides functions to adjust images in various ways: brightness,
contrast, hue, and saturation. Each adjustment can be done with predefined
parameters or with random parameters picked from predefined intervals. Random
adjustments are often useful to expand a training set and reduce overfitting.
If several adjustments are chained, it is advisable to minimize the number of
redundant conversions by first converting the images to the most natural data
type and representation (RGB or HSV).
* @{tf.image.adjust_brightness}
* @{tf.image.random_brightness}
* @{tf.image.adjust_contrast}
* @{tf.image.random_contrast}
* @{tf.image.adjust_hue}
* @{tf.image.random_hue}
* @{tf.image.adjust_gamma}
* @{tf.image.adjust_saturation}
* @{tf.image.random_saturation}
* @{tf.image.per_image_standardization}
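For example, a simple random-augmentation pipeline for training might look like
the following sketch (`image` stands for a decoded 3-D image tensor; the delta
and contrast bounds are illustrative):

```python
image = tf.image.convert_image_dtype(image, tf.float32)
image = tf.image.random_flip_left_right(image)
image = tf.image.random_brightness(image, max_delta=0.2)
image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
```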
## Working with Bounding Boxes
* @{tf.image.draw_bounding_boxes}
* @{tf.image.non_max_suppression}
* @{tf.image.sample_distorted_bounding_box}
## Denoising
* @{tf.image.total_variation}

View File

@ -0,0 +1,46 @@
# Python API Guides
* [Asserts and boolean checks](check_ops.md)
* [Building Graphs](framework.md)
* [Constants, Sequences, and Random Values](constant_op.md)
* [Control Flow](control_flow_ops.md)
* [Data IO (Python functions)](python_io.md)
* [Higher Order Functions](functional_ops.md)
* [Histograms](histogram_ops.md)
* [Images](image.md)
* [Inputs and Readers](io_ops.md)
* [Math](math_ops.md)
* [Neural Network](nn.md)
* [Running Graphs](client.md)
* [Sparse Tensors](sparse_ops.md)
* [Strings](string_ops.md)
* [Summary Operations](summary.md)
* [TensorFlow Debugger](tfdbg.md)
* [Tensor Handle Operations](session_ops.md)
* [Tensor Transformations](array_ops.md)
* [Testing](test.md)
* [Training](train.md)
* [Variables](state_ops.md)
* [Wraps python functions](script_ops.md)
* [BayesFlow Entropy (contrib)](contrib.bayesflow.entropy.md)
* [BayesFlow Monte Carlo (contrib)](contrib.bayesflow.monte_carlo.md)
* [BayesFlow Stochastic Graph (contrib)](contrib.bayesflow.stochastic_graph.md)
* [BayesFlow Stochastic Tensors (contrib)](contrib.bayesflow.stochastic_tensor.md)
* [BayesFlow Variational Inference (contrib)](contrib.bayesflow.variational_inference.md)
* [Copying Graph Elements (contrib)](contrib.copy_graph.md)
* [CRF (contrib)](contrib.crf.md)
* [FFmpeg (contrib)](contrib.ffmpeg.md)
* [Framework (contrib)](contrib.framework.md)
* [Graph Editor (contrib)](contrib.graph_editor.md)
* [Integrate (contrib)](contrib.integrate.md)
* [Layers (contrib)](contrib.layers.md)
* [Learn (contrib)](contrib.learn.md)
* [Linear Algebra (contrib)](contrib.linalg.md)
* [Losses (contrib)](contrib.losses.md)
* [Metrics (contrib)](contrib.metrics.md)
* [Optimization (contrib)](contrib.opt.md)
* [Random variable transformations (contrib)](contrib.distributions.bijector.md)
* [RNN and Cells (contrib)](contrib.rnn.md)
* [Statistical Distributions (contrib)](contrib.distributions.md)
* [Training (contrib)](contrib.training.md)
* [Utilities (contrib)](contrib.util.md)

View File

@ -0,0 +1,130 @@
# Inputs and Readers
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
## Placeholders
TensorFlow provides a placeholder operation that must be fed with data
on execution. For more info, see the section on @{$reading_data#feeding$Feeding data}.
* @{tf.placeholder}
* @{tf.placeholder_with_default}
For feeding `SparseTensor`s, which are a composite type, there is a
convenience function:
* @{tf.sparse_placeholder}
## Readers
TensorFlow provides a set of Reader classes for reading data formats.
For more information on inputs and readers, see @{$reading_data$Reading data}.
* @{tf.ReaderBase}
* @{tf.TextLineReader}
* @{tf.WholeFileReader}
* @{tf.IdentityReader}
* @{tf.TFRecordReader}
* @{tf.FixedLengthRecordReader}
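For example, a `TextLineReader` paired with a filename queue reads text files
one line at a time:

```python
filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])
reader = tf.TextLineReader()
# Each call to `read` returns one (key, line) pair from the queued files.
key, value = reader.read(filename_queue)
```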
## Converting
TensorFlow provides several operations that you can use to convert various data
formats into tensors.
* @{tf.decode_csv}
* @{tf.decode_raw}
- - -
### Example protocol buffer
TensorFlow's @{$reading_data#standard-tensorflow-format$recommended format for training examples}
is serialized `Example` protocol buffers, [described
here](https://www.tensorflow.org/code/tensorflow/core/example/example.proto).
They contain `Features`, [described
here](https://www.tensorflow.org/code/tensorflow/core/example/feature.proto).
* @{tf.VarLenFeature}
* @{tf.FixedLenFeature}
* @{tf.FixedLenSequenceFeature}
* @{tf.SparseFeature}
* @{tf.parse_example}
* @{tf.parse_single_example}
* @{tf.parse_tensor}
* @{tf.decode_json_example}
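For example, a single serialized `Example` proto can be parsed as follows (the
feature names and types here are illustrative):

```python
features = tf.parse_single_example(
    serialized_example,
    features={
        "label": tf.FixedLenFeature([], tf.int64),
        "image_raw": tf.FixedLenFeature([], tf.string),
    })
image = tf.decode_raw(features["image_raw"], tf.uint8)
label = features["label"]
```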
## Queues
TensorFlow provides several implementations of 'Queues', which are
structures within the TensorFlow computation graph for staging pipelines
of tensors together. The following describes the basic Queue interface
and some implementations. To see an example use, see @{$threading_and_queues$Threading and Queues}.
* @{tf.QueueBase}
* @{tf.FIFOQueue}
* @{tf.PaddingFIFOQueue}
* @{tf.RandomShuffleQueue}
* @{tf.PriorityQueue}
## Conditional Accumulators
* @{tf.ConditionalAccumulatorBase}
* @{tf.ConditionalAccumulator}
* @{tf.SparseConditionalAccumulator}
## Dealing with the filesystem
* @{tf.matching_files}
* @{tf.read_file}
* @{tf.write_file}
## Input pipeline
TensorFlow functions for setting up an input-prefetching pipeline.
Please see the @{$reading_data$reading data how-to}
for context.
### Beginning of an input pipeline
The "producer" functions add a queue to the graph and a corresponding
`QueueRunner` for running the subgraph that fills that queue.
* @{tf.train.match_filenames_once}
* @{tf.train.limit_epochs}
* @{tf.train.input_producer}
* @{tf.train.range_input_producer}
* @{tf.train.slice_input_producer}
* @{tf.train.string_input_producer}
### Batching at the end of an input pipeline
These functions add a queue to the graph to assemble a batch of
examples, with possible shuffling. They also add a `QueueRunner` for
running the subgraph that fills that queue.
Use @{tf.train.batch} or @{tf.train.batch_join} for batching
examples that have already been well shuffled. Use
@{tf.train.shuffle_batch} or
@{tf.train.shuffle_batch_join} for examples that would
benefit from additional shuffling.
Use @{tf.train.batch} or @{tf.train.shuffle_batch} if you want a
single thread producing examples to batch, or if you have a
single subgraph producing examples but you want to run it in *N* threads
(where you increase *N* until it can keep the queue full). Use
@{tf.train.batch_join} or @{tf.train.shuffle_batch_join}
if you have *N* different subgraphs producing examples to batch and you
want them run by *N* threads. Use `maybe_*` to enqueue conditionally.
* @{tf.train.batch}
* @{tf.train.maybe_batch}
* @{tf.train.batch_join}
* @{tf.train.maybe_batch_join}
* @{tf.train.shuffle_batch}
* @{tf.train.maybe_shuffle_batch}
* @{tf.train.shuffle_batch_join}
* @{tf.train.maybe_shuffle_batch_join}
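For example, assembling shuffled batches of 32 `(example, label)` pairs might
look like this (the capacity values are illustrative):

```python
example_batch, label_batch = tf.train.shuffle_batch(
    [example, label], batch_size=32,
    capacity=2000, min_after_dequeue=1000)
```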

View File

@ -0,0 +1,204 @@
# Math
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
Note: Elementwise binary operations in TensorFlow follow [numpy-style
broadcasting](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
## Arithmetic Operators
TensorFlow provides several operations that you can use to add basic arithmetic
operators to your graph.
* @{tf.add}
* @{tf.subtract}
* @{tf.multiply}
* @{tf.scalar_mul}
* @{tf.div}
* @{tf.divide}
* @{tf.truediv}
* @{tf.floordiv}
* @{tf.realdiv}
* @{tf.truncatediv}
* @{tf.floor_div}
* @{tf.truncatemod}
* @{tf.floormod}
* @{tf.mod}
* @{tf.cross}
## Basic Math Functions
TensorFlow provides several operations that you can use to add basic
mathematical functions to your graph.
* @{tf.add_n}
* @{tf.abs}
* @{tf.negative}
* @{tf.sign}
* @{tf.reciprocal}
* @{tf.square}
* @{tf.round}
* @{tf.sqrt}
* @{tf.rsqrt}
* @{tf.pow}
* @{tf.exp}
* @{tf.expm1}
* @{tf.log}
* @{tf.log1p}
* @{tf.ceil}
* @{tf.floor}
* @{tf.maximum}
* @{tf.minimum}
* @{tf.cos}
* @{tf.sin}
* @{tf.lbeta}
* @{tf.tan}
* @{tf.acos}
* @{tf.asin}
* @{tf.atan}
* @{tf.lgamma}
* @{tf.digamma}
* @{tf.erf}
* @{tf.erfc}
* @{tf.squared_difference}
* @{tf.igamma}
* @{tf.igammac}
* @{tf.zeta}
* @{tf.polygamma}
* @{tf.betainc}
* @{tf.rint}
## Matrix Math Functions
TensorFlow provides several operations that you can use to add linear algebra
functions on matrices to your graph.
* @{tf.diag}
* @{tf.diag_part}
* @{tf.trace}
* @{tf.transpose}
* @{tf.eye}
* @{tf.matrix_diag}
* @{tf.matrix_diag_part}
* @{tf.matrix_band_part}
* @{tf.matrix_set_diag}
* @{tf.matrix_transpose}
* @{tf.matmul}
* @{tf.norm}
* @{tf.matrix_determinant}
* @{tf.matrix_inverse}
* @{tf.cholesky}
* @{tf.cholesky_solve}
* @{tf.matrix_solve}
* @{tf.matrix_triangular_solve}
* @{tf.matrix_solve_ls}
* @{tf.qr}
* @{tf.self_adjoint_eig}
* @{tf.self_adjoint_eigvals}
* @{tf.svd}
## Tensor Math Function
TensorFlow provides operations that you can use to add tensor functions to your
graph.
* @{tf.tensordot}
## Complex Number Functions
TensorFlow provides several operations that you can use to add complex number
functions to your graph.
* @{tf.complex}
* @{tf.conj}
* @{tf.imag}
* @{tf.real}
## Fourier Transform Functions
TensorFlow provides several operations that you can use to add discrete
Fourier transform functions to your graph.
* @{tf.fft}
* @{tf.ifft}
* @{tf.fft2d}
* @{tf.ifft2d}
* @{tf.fft3d}
* @{tf.ifft3d}
## Reduction
TensorFlow provides several operations that you can use to perform
common math computations that reduce various dimensions of a tensor.
* @{tf.reduce_sum}
* @{tf.reduce_prod}
* @{tf.reduce_min}
* @{tf.reduce_max}
* @{tf.reduce_mean}
* @{tf.reduce_all}
* @{tf.reduce_any}
* @{tf.reduce_logsumexp}
* @{tf.count_nonzero}
* @{tf.accumulate_n}
* @{tf.einsum}
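For example:

```python
x = tf.constant([[1, 1, 1], [1, 1, 1]])
tf.reduce_sum(x)                      # ==> 6
tf.reduce_sum(x, 0)                   # ==> [2, 2, 2]
tf.reduce_sum(x, 1)                   # ==> [3, 3]
tf.reduce_sum(x, 1, keep_dims=True)   # ==> [[3], [3]]
```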
## Scan
TensorFlow provides several operations that you can use to perform scans
(running totals) across one axis of a tensor.
* @{tf.cumsum}
* @{tf.cumprod}
## Segmentation
TensorFlow provides several operations that you can use to perform common
math computations on tensor segments.
Here a segmentation is a partitioning of a tensor along
the first dimension, i.e. it defines a mapping from the first dimension onto
`segment_ids`. The `segment_ids` tensor should be the size of
the first dimension, `d0`, with consecutive IDs in the range `0` to `k`,
where `k<d0`.
In particular, a segmentation of a matrix tensor is a mapping of rows to
segments.
For example:
```python
c = tf.constant([[1,2,3,4], [-1,-2,-3,-4], [5,6,7,8]])
tf.segment_sum(c, tf.constant([0, 0, 1]))
==> [[0 0 0 0]
     [5 6 7 8]]
```
* @{tf.segment_sum}
* @{tf.segment_prod}
* @{tf.segment_min}
* @{tf.segment_max}
* @{tf.segment_mean}
* @{tf.unsorted_segment_sum}
* @{tf.sparse_segment_sum}
* @{tf.sparse_segment_mean}
* @{tf.sparse_segment_sqrt_n}
## Sequence Comparison and Indexing
TensorFlow provides several operations that you can use to add sequence
comparison and index extraction to your graph. You can use these operations to
determine sequence differences and determine the indexes of specific values in
a tensor.
* @{tf.argmin}
* @{tf.argmax}
* @{tf.setdiff1d}
* @{tf.where}
* @{tf.unique}
* @{tf.edit_distance}
* @{tf.invert_permutation}

View File

@ -0,0 +1,290 @@
# Neural Network
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
## Activation Functions
The activation ops provide different types of nonlinearities for use in neural
networks. These include smooth nonlinearities (`sigmoid`, `tanh`, `elu`,
`softplus`, and `softsign`), continuous but not everywhere differentiable
functions (`relu`, `relu6`, `crelu` and `relu_x`), and random regularization
(`dropout`).
All activation ops apply componentwise, and produce a tensor of the same
shape as the input tensor.
* @{tf.nn.relu}
* @{tf.nn.relu6}
* @{tf.nn.crelu}
* @{tf.nn.elu}
* @{tf.nn.softplus}
* @{tf.nn.softsign}
* @{tf.nn.dropout}
* @{tf.nn.bias_add}
* @{tf.sigmoid}
* @{tf.tanh}
## Convolution
The convolution ops sweep a 2-D filter over a batch of images, applying the
filter to each window of each image of the appropriate size. The different
ops trade off between generic vs. specific filters:
* `conv2d`: Arbitrary filters that can mix channels together.
* `depthwise_conv2d`: Filters that operate on each channel independently.
* `separable_conv2d`: A depthwise spatial filter followed by a pointwise filter.
Note that although these ops are called "convolution", they are strictly
speaking "cross-correlation" since the filter is combined with an input window
without reversing the filter. For details, see [the properties of
cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation#Properties).
The filter is applied to image patches of the same size as the filter and
strided according to the `strides` argument. `strides = [1, 1, 1, 1]` applies
the filter to a patch at every offset, `strides = [1, 2, 2, 1]` applies the
filter to every other image patch in each dimension, etc.
Ignoring channels for the moment, assume that the 4-D `input` has shape
`[batch, in_height, in_width, ...]` and the 4-D `filter` has shape
`[filter_height, filter_width, ...]`. The spatial semantics of the
convolution ops are then as follows: first, according to the padding scheme chosen
as `'SAME'` or `'VALID'`, the output size and the padding pixels are computed.
For the `'SAME'` padding, the output height and width are computed as:

    out_height = ceil(float(in_height) / float(strides[1]))
    out_width  = ceil(float(in_width) / float(strides[2]))
and the padding on the top and left are computed as:

    pad_along_height = max((out_height - 1) * strides[1] +
                           filter_height - in_height, 0)
    pad_along_width = max((out_width - 1) * strides[2] +
                          filter_width - in_width, 0)
    pad_top = pad_along_height // 2
    pad_bottom = pad_along_height - pad_top
    pad_left = pad_along_width // 2
    pad_right = pad_along_width - pad_left
Note that the division by 2 means that there might be cases when the padding on
both sides (top vs bottom, right vs left) are off by one. In this case, the
bottom and right sides always get the one additional padded pixel. For example,
when `pad_along_height` is 5, we pad 2 pixels at the top and 3 pixels at the
bottom. Note that this is different from existing libraries such as cuDNN and
Caffe, which explicitly specify the number of padded pixels and always pad the
same number of pixels on both sides.
For the `'VALID'` padding, the output height and width are computed as:

    out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
    out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))
and the padding values are always zero. The output is then computed as

    output[b, i, j, :] =
        sum_{di, dj} input[b, strides[1] * i + di - pad_top,
                              strides[2] * j + dj - pad_left, ...] *
                     filter[di, dj, ...]
where any values outside the original input image region are considered zero
(i.e., we pad zero values around the border of the image).
Since `input` is 4-D, each `input[b, i, j, :]` is a vector. For `conv2d`, these
vectors are multiplied by the `filter[di, dj, :, :]` matrices to produce new
vectors. For `depthwise_conv2d`, each scalar component `input[b, i, j, k]`
is multiplied by a vector `filter[di, dj, k]`, and all the vectors are
concatenated.
* @{tf.nn.convolution}
* @{tf.nn.conv2d}
* @{tf.nn.depthwise_conv2d}
* @{tf.nn.depthwise_conv2d_native}
* @{tf.nn.separable_conv2d}
* @{tf.nn.atrous_conv2d}
* @{tf.nn.atrous_conv2d_transpose}
* @{tf.nn.conv2d_transpose}
* @{tf.nn.conv1d}
* @{tf.nn.conv3d}
* @{tf.nn.conv3d_transpose}
* @{tf.nn.conv2d_backprop_filter}
* @{tf.nn.conv2d_backprop_input}
* @{tf.nn.conv3d_backprop_filter_v2}
* @{tf.nn.depthwise_conv2d_native_backprop_filter}
* @{tf.nn.depthwise_conv2d_native_backprop_input}
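For example, applying a 5x5 convolution with 32 output channels to a batch of
RGB images (the shapes here are illustrative):

```python
images = tf.placeholder(tf.float32, [None, 28, 28, 3])
filters = tf.get_variable("filters", shape=[5, 5, 3, 32])
# Output shape: [batch, 28, 28, 32] with 'SAME' padding and unit strides.
conv = tf.nn.conv2d(images, filters, strides=[1, 1, 1, 1], padding="SAME")
```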
## Pooling
The pooling ops sweep a rectangular window over the input tensor, computing a
reduction operation for each window (average, max, or max with argmax). Each
pooling op uses rectangular windows of size `ksize` separated by offset
`strides`. For example, if `strides` is all ones every window is used, if
`strides` is all twos every other window is used in each dimension, etc.
In detail, the output is:

    output[i] = reduce(value[strides * i:strides * i + ksize])
where the indices also take into consideration the padding values. Please refer
to the `Convolution` section for details about the padding calculation.
* @{tf.nn.avg_pool}
* @{tf.nn.max_pool}
* @{tf.nn.max_pool_with_argmax}
* @{tf.nn.avg_pool3d}
* @{tf.nn.max_pool3d}
* @{tf.nn.fractional_avg_pool}
* @{tf.nn.fractional_max_pool}
* @{tf.nn.pool}
## Morphological filtering
Morphological operators are non-linear filters used in image processing.
[Greyscale morphological dilation
](https://en.wikipedia.org/wiki/Dilation_(morphology))
is the max-sum counterpart of standard sum-product convolution:

    output[b, y, x, c] =
        max_{dy, dx} input[b,
                           strides[1] * y + rates[1] * dy,
                           strides[2] * x + rates[2] * dx,
                           c] +
                     filter[dy, dx, c]
The `filter` is usually called the structuring function. Max-pooling is a special
case of greyscale morphological dilation when the filter assumes all-zero
values (a.k.a. flat structuring function).
[Greyscale morphological erosion
](https://en.wikipedia.org/wiki/Erosion_(morphology))
is the min-sum counterpart of standard sum-product convolution:

    output[b, y, x, c] =
        min_{dy, dx} input[b,
                           strides[1] * y - rates[1] * dy,
                           strides[2] * x - rates[2] * dx,
                           c] -
                     filter[dy, dx, c]
Dilation and erosion are dual to each other. The dilation of the input signal
`f` by the structuring signal `g` is equal to the negation of the erosion of
`-f` by the reflected `g`, and vice versa.
Striding and padding are carried out in exactly the same way as in standard
convolution. Please refer to the `Convolution` section for details.
* @{tf.nn.dilation2d}
* @{tf.nn.erosion2d}
* @{tf.nn.with_space_to_batch}
## Normalization
Normalization is useful to prevent neurons from saturating when inputs may
have varying scale, and to aid generalization.
* @{tf.nn.l2_normalize}
* @{tf.nn.local_response_normalization}
* @{tf.nn.sufficient_statistics}
* @{tf.nn.normalize_moments}
* @{tf.nn.moments}
* @{tf.nn.weighted_moments}
* @{tf.nn.fused_batch_norm}
* @{tf.nn.batch_normalization}
* @{tf.nn.batch_norm_with_global_normalization}
## Losses
The loss ops measure error between two tensors, or between a tensor and zero.
These can be used for measuring accuracy of a network in a regression task
or for regularization purposes (weight decay).
* @{tf.nn.l2_loss}
* @{tf.nn.log_poisson_loss}
## Classification
TensorFlow provides several operations that help you perform classification.
* @{tf.nn.sigmoid_cross_entropy_with_logits}
* @{tf.nn.softmax}
* @{tf.nn.log_softmax}
* @{tf.nn.softmax_cross_entropy_with_logits}
* @{tf.nn.sparse_softmax_cross_entropy_with_logits}
* @{tf.nn.weighted_cross_entropy_with_logits}
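For example, the average softmax cross-entropy over a batch can be computed as
follows (`logits` and `onehot_labels` stand for `[batch_size, num_classes]`
tensors; note that `labels` and `logits` must be passed by keyword):

```python
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
    labels=onehot_labels, logits=logits)
loss = tf.reduce_mean(cross_entropy)
```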
## Embeddings
TensorFlow provides library support for looking up values in embedding
tensors.
* @{tf.nn.embedding_lookup}
* @{tf.nn.embedding_lookup_sparse}
## Recurrent Neural Networks
TensorFlow provides a number of methods for constructing Recurrent
Neural Networks. Most accept an `RNNCell`-subclassed object
(see the documentation for `tf.contrib.rnn`).
* @{tf.nn.dynamic_rnn}
* @{tf.nn.bidirectional_dynamic_rnn}
* @{tf.nn.raw_rnn}
## Connectionist Temporal Classification (CTC)
* @{tf.nn.ctc_loss}
* @{tf.nn.ctc_greedy_decoder}
* @{tf.nn.ctc_beam_search_decoder}
## Evaluation
The evaluation ops are useful for measuring the performance of a network.
They are typically used at evaluation time.
* @{tf.nn.top_k}
* @{tf.nn.in_top_k}
## Candidate Sampling
Do you want to train a multiclass or multilabel model with thousands
or millions of output classes (for example, a language model with a
large vocabulary)? Training with a full Softmax is slow in this case,
since all of the classes are evaluated for every training example.
Candidate Sampling training algorithms can speed up your step times by
only considering a small randomly-chosen subset of contrastive classes
(called candidates) for each batch of training examples.
See our
[Candidate Sampling Algorithms
Reference](https://www.tensorflow.org/extras/candidate_sampling.pdf).
### Sampled Loss Functions
TensorFlow provides the following sampled loss functions for faster training.
* @{tf.nn.nce_loss}
* @{tf.nn.sampled_softmax_loss}
### Candidate Samplers
TensorFlow provides the following samplers for randomly sampling candidate
classes when using one of the sampled loss functions above.
* @{tf.nn.uniform_candidate_sampler}
* @{tf.nn.log_uniform_candidate_sampler}
* @{tf.nn.learned_unigram_candidate_sampler}
* @{tf.nn.fixed_unigram_candidate_sampler}
### Miscellaneous candidate sampling utilities
* @{tf.nn.compute_accidental_hits}
### Quantization ops
* @{tf.nn.quantized_conv2d}
* @{tf.nn.quantized_relu_x}
* @{tf.nn.quantized_max_pool}
* @{tf.nn.quantized_avg_pool}

View File

@ -0,0 +1,29 @@
# Data IO (Python functions)
[TOC]
A TFRecords file represents a sequence of (binary) strings. The format is not
random access, so it is suitable for streaming large amounts of data but not
suitable if fast sharding or other non-sequential access is desired.
* @{tf.python_io.TFRecordWriter}
* @{tf.python_io.tf_record_iterator}
* @{tf.python_io.TFRecordCompressionType}
* @{tf.python_io.TFRecordOptions}
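For example, the sketch below writes a sequence of serialized `tf.train.Example`
protos and then iterates over them (`examples` is an assumed iterable of
protos):

```python
writer = tf.python_io.TFRecordWriter("/tmp/data.tfrecords")
for example in examples:
  writer.write(example.SerializeToString())
writer.close()

# Read the records back as raw binary strings.
for record in tf.python_io.tf_record_iterator("/tmp/data.tfrecords"):
  ...
```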
- - -
## TFRecords Format Details
A TFRecords file contains a sequence of strings with CRC hashes. Each record
has the format:

    uint64 length
    uint32 masked_crc32_of_length
    byte   data[length]
    uint32 masked_crc32_of_data
and the records are concatenated together to produce the file. The CRC32s
are [described here](https://en.wikipedia.org/wiki/Cyclic_redundancy_check),
and the mask of a CRC is:

    masked_crc = ((crc >> 15) | (crc << 17)) + 0xa282ead8ul

View File

@ -0,0 +1,13 @@
# Wraps python functions
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
## Script Language Operators
TensorFlow allows you to wrap python/numpy functions as
TensorFlow operators.
* @{tf.py_func}
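For example:

```python
import numpy as np

def my_func(x):
  # Arbitrary python/numpy code; runs outside the TensorFlow graph.
  return np.sinh(x)

inp = tf.placeholder(tf.float32)
y = tf.py_func(my_func, [inp], tf.float32)
```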

View File

@ -0,0 +1,15 @@
# Tensor Handle Operations
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
## Tensor Handle Operations
TensorFlow provides several operators that allow the user to keep tensors
"in-place" across run calls.
* @{tf.get_session_handle}
* @{tf.get_session_tensor}
* @{tf.delete_session_tensor}

View File

@ -0,0 +1,45 @@
# Sparse Tensors
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
## Sparse Tensor Representation
TensorFlow supports a `SparseTensor` representation for data that is sparse
in multiple dimensions. Contrast this representation with `IndexedSlices`,
which is efficient for representing tensors that are sparse in their first
dimension, and dense along all other dimensions.
* @{tf.SparseTensor}
* @{tf.SparseTensorValue}
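For example, a 3x4 tensor with two nonzero values can be represented as follows
(converting back with `tf.sparse_tensor_to_dense`, listed under Conversion
below):

```python
st = tf.SparseTensor(indices=[[0, 0], [1, 2]],
                     values=[1.0, 2.0],
                     dense_shape=[3, 4])
dense = tf.sparse_tensor_to_dense(st)
# ==> [[1.0, 0.0, 0.0, 0.0],
#      [0.0, 0.0, 2.0, 0.0],
#      [0.0, 0.0, 0.0, 0.0]]
```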
## Conversion
* @{tf.sparse_to_dense}
* @{tf.sparse_tensor_to_dense}
* @{tf.sparse_to_indicator}
* @{tf.sparse_merge}
## Manipulation
* @{tf.sparse_concat}
* @{tf.sparse_reorder}
* @{tf.sparse_reshape}
* @{tf.sparse_split}
* @{tf.sparse_retain}
* @{tf.sparse_reset_shape}
* @{tf.sparse_fill_empty_rows}
* @{tf.sparse_transpose}
## Reduction
* @{tf.sparse_reduce_sum}
* @{tf.sparse_reduce_sum_sparse}
## Math Operations
* @{tf.sparse_add}
* @{tf.sparse_softmax}
* @{tf.sparse_tensor_dense_matmul}
* @{tf.sparse_maximum}
* @{tf.sparse_minimum}

View File

@ -0,0 +1,108 @@
# Variables
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
## Variables
* @{tf.Variable}
## Variable helper functions
TensorFlow provides a set of functions to help manage the set of variables
collected in the graph.
* @{tf.global_variables}
* @{tf.local_variables}
* @{tf.model_variables}
* @{tf.trainable_variables}
* @{tf.moving_average_variables}
* @{tf.global_variables_initializer}
* @{tf.local_variables_initializer}
* @{tf.variables_initializer}
* @{tf.is_variable_initialized}
* @{tf.report_uninitialized_variables}
* @{tf.assert_variables_initialized}
* @{tf.assign}
* @{tf.assign_add}
* @{tf.assign_sub}
## Saving and Restoring Variables
* @{tf.train.Saver}
* @{tf.train.latest_checkpoint}
* @{tf.train.get_checkpoint_state}
* @{tf.train.update_checkpoint_state}
## Sharing Variables
TensorFlow provides several classes and operations that you can use to
create variables contingent on certain conditions.
* @{tf.get_variable}
* @{tf.get_local_variable}
* @{tf.VariableScope}
* @{tf.variable_scope}
* @{tf.variable_op_scope}
* @{tf.get_variable_scope}
* @{tf.make_template}
* @{tf.no_regularizer}
* @{tf.constant_initializer}
* @{tf.random_normal_initializer}
* @{tf.truncated_normal_initializer}
* @{tf.random_uniform_initializer}
* @{tf.uniform_unit_scaling_initializer}
* @{tf.zeros_initializer}
* @{tf.ones_initializer}
* @{tf.orthogonal_initializer}
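For example, `tf.variable_scope` and `tf.get_variable` cooperate to create a
variable once and retrieve it later:

```python
with tf.variable_scope("model"):
  v = tf.get_variable("v", shape=[1], initializer=tf.zeros_initializer())
with tf.variable_scope("model", reuse=True):
  v1 = tf.get_variable("v")  # returns the variable created above
assert v is v1
```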
## Variable Partitioners for Sharding
* @{tf.fixed_size_partitioner}
* @{tf.variable_axis_size_partitioner}
* @{tf.min_max_variable_partitioner}
## Sparse Variable Updates
The sparse update ops modify a subset of the entries in a dense `Variable`,
either overwriting the entries or adding / subtracting a delta. These are
useful for training embedding models and similar lookup-based networks, since
only a small subset of embedding vectors change in any given step.
Since a sparse update of a large tensor may be generated automatically during
gradient computation (as in the gradient of
@{tf.gather}),
an @{tf.IndexedSlices} class is provided that encapsulates a set
of sparse indices and values. `IndexedSlices` objects are detected and handled
automatically by the optimizers in most cases.
* @{tf.scatter_update}
* @{tf.scatter_add}
* @{tf.scatter_sub}
* @{tf.scatter_mul}
* @{tf.scatter_div}
* @{tf.scatter_nd_update}
* @{tf.scatter_nd_add}
* @{tf.scatter_nd_sub}
* @{tf.sparse_mask}
* @{tf.IndexedSlices}
### Read-only Lookup Tables
* @{tf.initialize_all_tables}
* @{tf.tables_initializer}
## Exporting and Importing Meta Graphs
* @{tf.train.export_meta_graph}
* @{tf.train.import_meta_graph}
## Deprecated functions (removed after 2017-03-02). Please don't use them.
* @{tf.all_variables}
* @{tf.initialize_all_variables}
* @{tf.initialize_local_variables}
* @{tf.initialize_variables}

View File

@ -0,0 +1,34 @@
# Strings
Note: Functions taking `Tensor` arguments can also take anything accepted by
@{tf.convert_to_tensor}.
[TOC]
## Hashing
String hashing ops take a string input tensor and map each element to an
integer.
* @{tf.string_to_hash_bucket_fast}
* @{tf.string_to_hash_bucket_strong}
* @{tf.string_to_hash_bucket}
## Joining
String joining ops concatenate elements of input string tensors to produce a new
string tensor.
* @{tf.reduce_join}
* @{tf.string_join}
## Splitting
* @{tf.string_split}
* @{tf.substr}
## Conversion
* @{tf.as_string}
* @{tf.encode_base64}
* @{tf.decode_base64}

View File

@ -0,0 +1,23 @@
# Summary Operations
[TOC]
Summaries provide a way to export condensed information about a model, which is
then accessible in tools such as @{$summaries_and_tensorboard$TensorBoard}.
## Generation of Summaries
### Class for writing Summaries
* @{tf.summary.FileWriter}
* @{tf.summary.FileWriterCache}
### Summary Ops
* @{tf.summary.tensor_summary}
* @{tf.summary.scalar}
* @{tf.summary.histogram}
* @{tf.summary.audio}
* @{tf.summary.image}
* @{tf.summary.merge}
* @{tf.summary.merge_all}
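A typical pattern declares summary ops next to the model, merges them, and
periodically writes the merged results (a sketch; `loss`, `train_op`, `sess`,
and `step` are assumed to exist):

```python
tf.summary.scalar("loss", loss)
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter("/tmp/logdir", sess.graph)

summary, _ = sess.run([merged, train_op])
writer.add_summary(summary, global_step=step)
```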
## Utilities
* @{tf.summary.get_summary_description}

View File

@ -0,0 +1,44 @@
# Testing
[TOC]
## Unit tests
TensorFlow provides a convenience class inheriting from `unittest.TestCase`
which adds methods relevant to TensorFlow tests. Here is an example:
```python
import tensorflow as tf

class SquareTest(tf.test.TestCase):

  def testSquare(self):
    with self.test_session():
      x = tf.square([2, 3])
      self.assertAllEqual(x.eval(), [4, 9])

if __name__ == '__main__':
  tf.test.main()
```
`tf.test.TestCase` inherits from `unittest.TestCase` but adds a few additional
methods. We will document these methods soon.
* @{tf.test.main}
* @{tf.test.TestCase}
* @{tf.test.test_src_dir_path}
## Utilities
* @{tf.test.assert_equal_graph_def}
* @{tf.test.get_temp_dir}
* @{tf.test.is_built_with_cuda}
* @{tf.test.is_gpu_available}
* @{tf.test.gpu_device_name}
## Gradient checking
@{tf.test.compute_gradient} and @{tf.test.compute_gradient_error} perform
numerical differentiation of graphs for comparison against registered analytic
gradients.

View File

@ -0,0 +1,50 @@
# TensorFlow Debugger
[TOC]
Public Python API of TensorFlow Debugger (tfdbg).
## Functions for adding debug watches
These functions help you modify `RunOptions` to specify which `Tensor`s are to
be watched when the TensorFlow graph is executed at runtime.
* @{tfdbg.add_debug_tensor_watch}
* @{tfdbg.watch_graph}
* @{tfdbg.watch_graph_with_blacklists}
## Classes for debug-dump data and directories
These classes allow you to load and inspect tensor values dumped from
TensorFlow graphs during runtime.
* @{tfdbg.DebugTensorDatum}
* @{tfdbg.DebugDumpDir}
## Functions for loading debug-dump data
* @{tfdbg.load_tensor_from_event_file}
## Tensor-value predicates
Built-in tensor-filter predicates to support conditional breakpoint between
runs. See `DebugDumpDir.find()` for more details.
* @{tfdbg.has_inf_or_nan}
## Session wrapper class and `SessionRunHook` implementations
These classes allow you to
* wrap around TensorFlow `Session` objects to debug plain TensorFlow models
(see `DumpingDebugWrapperSession` and `LocalCLIDebugWrapperSession`), or
* generate `SessionRunHook` objects to debug `tf.contrib.learn` models (see
`DumpingDebugHook` and `LocalCLIDebugHook`).
* @{tfdbg.DumpingDebugHook}
* @{tfdbg.DumpingDebugWrapperSession}
* @{tfdbg.LocalCLIDebugHook}
* @{tfdbg.LocalCLIDebugWrapperSession}

View File

@ -0,0 +1,133 @@
# Training
[TOC]
@{tf.train} provides a set of classes and functions that help train models.
## Optimizers
The Optimizer base class provides methods to compute gradients for a loss and
apply gradients to variables. A collection of subclasses implement classic
optimization algorithms such as GradientDescent and Adagrad.
You never instantiate the Optimizer class itself, but instead instantiate one
of the subclasses.
* @{tf.train.Optimizer}
* @{tf.train.GradientDescentOptimizer}
* @{tf.train.AdadeltaOptimizer}
* @{tf.train.AdagradOptimizer}
* @{tf.train.AdagradDAOptimizer}
* @{tf.train.MomentumOptimizer}
* @{tf.train.AdamOptimizer}
* @{tf.train.FtrlOptimizer}
* @{tf.train.ProximalGradientDescentOptimizer}
* @{tf.train.ProximalAdagradOptimizer}
* @{tf.train.RMSPropOptimizer}
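For example (a minimal sketch; `loss` and `global_step` are assumed to exist):

```python
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
# Computes gradients of `loss` and applies them, incrementing `global_step`.
train_op = optimizer.minimize(loss, global_step=global_step)
```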
## Gradient Computation
TensorFlow provides functions to compute the derivatives for a given
TensorFlow computation graph, adding operations to the graph. The
optimizer classes automatically compute derivatives on your graph, but
creators of new Optimizers or expert users can call the lower-level
functions below.
* @{tf.gradients}
* @{tf.AggregationMethod}
* @{tf.stop_gradient}
* @{tf.hessians}
## Gradient Clipping
TensorFlow provides several operations that you can use to add clipping
functions to your graph. You can use these functions to perform general data
clipping, but they're particularly useful for handling exploding or vanishing
gradients.
* @{tf.clip_by_value}
* @{tf.clip_by_norm}
* @{tf.clip_by_average_norm}
* @{tf.clip_by_global_norm}
* @{tf.global_norm}
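For example, clipping each gradient to a maximum norm before applying it (a
sketch; the threshold of 5.0 is illustrative):

```python
grads_and_vars = optimizer.compute_gradients(loss)
clipped = [(tf.clip_by_norm(g, 5.0), v)
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)
```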
## Decaying the learning rate
* @{tf.train.exponential_decay}
* @{tf.train.inverse_time_decay}
* @{tf.train.natural_exp_decay}
* @{tf.train.piecewise_constant}
* @{tf.train.polynomial_decay}
## Moving Averages
Some training algorithms, such as GradientDescent and Momentum, often benefit
from maintaining a moving average of variables during optimization. Using the
moving averages for evaluations often improves results significantly.
* @{tf.train.ExponentialMovingAverage}
## Coordinator and QueueRunner
See @{$threading_and_queues$Threading and Queues}
for how to use threads and queues. For documentation on the Queue API,
see @{$python/io_ops#queues$Queues}.
* @{tf.train.Coordinator}
* @{tf.train.QueueRunner}
* @{tf.train.LooperThread}
* @{tf.train.add_queue_runner}
* @{tf.train.start_queue_runners}
## Distributed execution
See @{$distributed$Distributed TensorFlow} for
more information about how to configure a distributed TensorFlow program.
* @{tf.train.Server}
* @{tf.train.Supervisor}
* @{tf.train.SessionManager}
* @{tf.train.ClusterSpec}
* @{tf.train.replica_device_setter}
* @{tf.train.MonitoredTrainingSession}
* @{tf.train.MonitoredSession}
* @{tf.train.SingularMonitoredSession}
* @{tf.train.Scaffold}
* @{tf.train.SessionCreator}
* @{tf.train.ChiefSessionCreator}
* @{tf.train.WorkerSessionCreator}
## Reading Summaries from Event Files
See @{$summaries_and_tensorboard$Summaries and TensorBoard} for an
overview of summaries, event files, and visualization in TensorBoard.
* @{tf.train.summary_iterator}
## Training Hooks
Hooks are tools that run during training or evaluation of the model.
* @{tf.train.SessionRunHook}
* @{tf.train.SessionRunArgs}
* @{tf.train.SessionRunContext}
* @{tf.train.SessionRunValues}
* @{tf.train.LoggingTensorHook}
* @{tf.train.StopAtStepHook}
* @{tf.train.CheckpointSaverHook}
* @{tf.train.NewCheckpointReader}
* @{tf.train.StepCounterHook}
* @{tf.train.NanLossDuringTrainingError}
* @{tf.train.NanTensorHook}
* @{tf.train.SummarySaverHook}
* @{tf.train.GlobalStepWaiterHook}
* @{tf.train.FinalOpsHook}
* @{tf.train.FeedFnHook}
## Training Utilities
* @{tf.train.global_step}
* @{tf.train.basic_train_loop}
* @{tf.train.get_global_step}
* @{tf.train.assert_global_step}
* @{tf.train.write_graph}

View File

@ -138,7 +138,7 @@ a 3-D tensor with shape `[6, 8, 6]`.
### Links
To link to something else in the `g3docs` tree, use a relative path, like
`[tf.parse_example](../api_docs/python/ops.md#parse_example)`
`@{tf.parse_example}`
Do not use absolute paths for internal links, as this will break the website
generator.
@ -320,7 +320,7 @@ Here's an example from the module docstring in `image_ops.py`:
TensorFlow can convert between images in RGB or HSV. The conversion
functions work only on `float` images, so you need to convert images in
other formats using [`convert_image_dtype`](#convert-image-dtype).
other formats using @{tf.image.convert_image_dtype}.
Example:

View File

@ -0,0 +1,2 @@
documentation.md
style_guide.md

View File

@ -106,7 +106,7 @@ creates a part of the graph and returns output tensors.
* Operations should contain an extensive Python comment with Args and Returns
declarations that explain both the type and meaning of each value. Possible
shapes, dtypes, or ranks should be specified in the description.
[See documentation details](documentation/index.md)
@{$documentation$See documentation details}
* For increased usability include an example of usage with inputs / outputs
of the op in Example section.

View File

@ -2,7 +2,7 @@
This document shows how to create a cluster of TensorFlow servers, and how to
distribute a computation graph across that cluster. We assume that you are
familiar with the [basic concepts](../../get_started/basic_usage.md) of
familiar with the @{$get_started$basic concepts} of
writing TensorFlow programs.
## Hello distributed TensorFlow!
@ -21,7 +21,7 @@ $ python
```
The
[`tf.train.Server.create_local_server()`](../../api_docs/python/train.md#Server.create_local_server)
@{tf.train.Server.create_local_server}
method creates a single-process cluster, with an in-process server.
## Create a cluster
@ -49,7 +49,7 @@ the following:
The cluster specification dictionary maps job names to lists of network
addresses. Pass this dictionary to
the [`tf.train.ClusterSpec`](../../api_docs/python/train.md#ClusterSpec)
the @{tf.train.ClusterSpec}
constructor. For example:
<table>
@ -78,10 +78,10 @@ tf.train.ClusterSpec({
### Create a `tf.train.Server` instance in each task
A [`tf.train.Server`](../../api_docs/python/train.md#Server) object contains a
A @{tf.train.Server} object contains a
set of local devices, a set of connections to other tasks in its
`tf.train.ClusterSpec`, and a
["session target"](../../api_docs/python/client.md#Session) that can use these
@{tf.Session} that can use these
to perform a distributed computation. Each server is a member of a specific
named job and has a task index within that job. A server can communicate with
any other server in the cluster.
@ -111,7 +111,7 @@ which you'd like to see support, please raise a
## Specifying distributed devices in your model
To place operations on a particular process, you can use the same
[`tf.device()`](../../api_docs/python/framework.md#device)
@{tf.device}
function that is used to specify whether ops run on the CPU or GPU. For example:
```python
@ -159,7 +159,7 @@ simplify the work of specifying a replicated model. Possible approaches include:
for each `/job:worker` task, typically in the same process as the worker
task. Each client builds a similar graph containing the parameters (pinned to
`/job:ps` as before using
[`tf.train.replica_device_setter()`](../../api_docs/python/train.md#replica_device_setter)
@{tf.train.replica_device_setter}
to map them deterministically to the same tasks); and a single copy of the
compute-intensive part of the model, pinned to the local task in
`/job:worker`.
@ -174,7 +174,7 @@ simplify the work of specifying a replicated model. Possible approaches include:
gradient averaging as in the
[CIFAR-10 multi-GPU trainer](https://www.tensorflow.org/code/tensorflow_models/tutorials/image/cifar10/cifar10_multi_gpu_train.py)),
and between-graph replication (e.g. using the
[`tf.train.SyncReplicasOptimizer`](../../api_docs/python/train.md#SyncReplicasOptimizer)).
@{tf.train.SyncReplicasOptimizer}).
### Putting it all together: example trainer program
@ -312,7 +312,7 @@ A TensorFlow cluster comprises one or more "jobs", each divided into lists of
one or more "tasks". A cluster is typically dedicated to a particular high-level
objective, such as training a neural network, using many machines in parallel. A
cluster is defined by
a [`tf.train.ClusterSpec`](../../api_docs/python/train.md#ClusterSpec) object.
a @{tf.train.ClusterSpec} object.
**Job**
@ -338,7 +338,7 @@ to a single process. A task belongs to a particular "job" and is identified by
its index within that job's list of tasks.
**TensorFlow server** A process running
a [`tf.train.Server`](../../api_docs/python/train.md#Server) instance, which is
a @{tf.train.Server} instance, which is
a member of a cluster, and exports a "master service" and "worker service".
**Worker service**

View File

@ -6,7 +6,7 @@ at the moment.
## HDFS
We assume that you are familiar with [reading data](../reading_data/index.md).
We assume that you are familiar with @{$reading_data$reading data}.
To use HDFS with TensorFlow, change the file paths you use to read and write
data to an HDFS path. For example:
@ -61,5 +61,5 @@ be set:
export KERB_TICKET_CACHE_PATH=/tmp/krb5cc_10002
```
If you are running [Distributed TensorFlow](../distributed/index.md), then all
If you are running @{$distributed$Distributed TensorFlow}, then all
workers must have the environment variables set and Hadoop installed.

View File

@ -0,0 +1,3 @@
hadoop.md
distributed.md
tfserve.md

View File

@ -0,0 +1,256 @@
# Adding a Custom Filesystem Plugin
## Background
The TensorFlow framework is often used in multi-process and
multi-machine environments, such as Google data centers, Google Cloud
Machine Learning, Amazon Web Services (AWS), and on-site distributed clusters.
In order to both share and save certain types of state produced by TensorFlow,
the framework assumes the existence of a reliable, shared filesystem. This
shared filesystem has numerous uses, for example:
* Checkpoints of state are often saved to a distributed filesystem for
reliability and fault-tolerance.
* Training processes communicate with TensorBoard by writing event files
to a directory, which TensorBoard watches. A shared filesystem allows this
communication to work even when TensorBoard runs in a different process or
machine.
There are many different implementations of shared or distributed filesystems in
the real world, so TensorFlow gives users the ability to implement a custom
`FileSystem` plugin that can be registered with the TensorFlow runtime.
When the TensorFlow runtime attempts to write to a file through the `FileSystem`
interface, it uses a portion of the pathname to dynamically select the
implementation that should be used for filesystem operations. Thus, adding
support for your custom filesystem requires implementing a `FileSystem`
interface, building a shared object containing that implementation, and loading
that object at runtime in whichever process needs to write to that filesystem.
Note that TensorFlow already includes many filesystem implementations, such as:
* A standard POSIX filesystem
Note: NFS filesystems often mount as a POSIX interface, and so standard
TensorFlow can work on top of NFS-mounted remote filesystems.
* HDFS - the Hadoop File System
* GCS - Google Cloud Storage filesystem
* A "memory-mapped-file" filesystem
The rest of this guide describes how to implement a custom filesystem.
## Implementing a custom filesystem plugin
To implement a custom filesystem plugin, you must do the following:
* Implement subclasses of `RandomAccessFile`, `WritableFile`,
`AppendableFile`, and `ReadOnlyMemoryRegion`.
* Implement the `FileSystem` interface as a subclass.
* Register the `FileSystem` implementation with an appropriate prefix pattern.
* Load the filesystem plugin in a process that wants to write to that
filesystem.
### The FileSystem interface
The `FileSystem` interface is an abstract C++ interface defined in
[file_system.h](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/file_system.h).
An implementation of the `FileSystem` interface should implement all the
relevant methods defined by the interface. Implementing the interface requires
defining factory operations that create `RandomAccessFile` and `WritableFile`
objects, and implementing standard filesystem operations such as `FileExists`,
`IsDirectory`, `GetMatchingPaths`, `DeleteFile`, and so on. An implementation of
these methods will often translate the function's input arguments into a call to
an already-existing library function that implements the equivalent
functionality in your custom filesystem.
For example, the `PosixFileSystem` implementation implements `DeleteFile` using
the POSIX `unlink()` function; `CreateDir` simply calls `mkdir()`; and
`GetFileSize` calls `stat()` on the file and returns the file size reported by
the `stat` structure. Similarly, for the `HDFSFileSystem`
implementation, these calls simply delegate to the `libHDFS` implementation of
similar functionality, such as `hdfsDelete` for
[DeleteFile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/hadoop/hadoop_file_system.cc#L386).
We suggest looking through these code examples to get an idea of how different
filesystem implementations call their existing libraries. Examples include:
* [POSIX
plugin](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/posix/posix_file_system.h)
* [HDFS
plugin](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/hadoop/hadoop_file_system.h)
* [GCS
plugin](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/cloud/gcs_file_system.h)
#### The File interfaces
Beyond operations that allow you to query and manipulate files and directories
in a filesystem, the `FileSystem` interface requires you to implement factories
that return implementations of abstract objects such as the
[RandomAccessFile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/file_system.h#L223),
and the `WritableFile`, so that TensorFlow code can read and write files in that
`FileSystem` implementation.
To implement a `RandomAccessFile`, you must implement a single method called
`Read()`, whose implementation must provide a way to read from an offset within
a named file.
For example, below is the implementation of `RandomAccessFile` for the POSIX
filesystem, which uses the `pread()` random-access POSIX function to implement
read. Notice that the particular implementation must know how to retry or
propagate errors from the underlying filesystem.
```C++
class PosixRandomAccessFile : public RandomAccessFile {
public:
PosixRandomAccessFile(const string& fname, int fd)
: filename_(fname), fd_(fd) {}
~PosixRandomAccessFile() override { close(fd_); }
Status Read(uint64 offset, size_t n, StringPiece* result,
char* scratch) const override {
Status s;
char* dst = scratch;
while (n > 0 && s.ok()) {
ssize_t r = pread(fd_, dst, n, static_cast<off_t>(offset));
if (r > 0) {
dst += r;
n -= r;
offset += r;
} else if (r == 0) {
s = Status(error::OUT_OF_RANGE, "Read less bytes than requested");
} else if (errno == EINTR || errno == EAGAIN) {
// Retry
} else {
s = IOError(filename_, errno);
}
}
*result = StringPiece(scratch, dst - scratch);
return s;
}
private:
string filename_;
int fd_;
};
```
To implement the `WritableFile` sequential-writing abstraction, one must
implement a few methods, such as `Append()`, `Flush()`, `Sync()`, and
`Close()`.
For example, below is the implementation of `WritableFile` for the POSIX
filesystem, which takes a `FILE` object in its constructor and uses standard
POSIX functions on that object to implement the interface.
```C++
class PosixWritableFile : public WritableFile {
public:
PosixWritableFile(const string& fname, FILE* f)
: filename_(fname), file_(f) {}
~PosixWritableFile() override {
if (file_ != NULL) {
fclose(file_);
}
}
Status Append(const StringPiece& data) override {
size_t r = fwrite(data.data(), 1, data.size(), file_);
if (r != data.size()) {
return IOError(filename_, errno);
}
return Status::OK();
}
Status Close() override {
Status result;
if (fclose(file_) != 0) {
result = IOError(filename_, errno);
}
file_ = NULL;
return result;
}
Status Flush() override {
if (fflush(file_) != 0) {
return IOError(filename_, errno);
}
return Status::OK();
}
Status Sync() override {
Status s;
if (fflush(file_) != 0) {
s = IOError(filename_, errno);
}
return s;
}
private:
string filename_;
FILE* file_;
};
```
For more details, please see the documentation of those interfaces, and look at
example implementations for inspiration.
### Registering and loading the filesystem
Once you have implemented the `FileSystem` implementation for your custom
filesystem, you need to register it under a "scheme" so that paths prefixed with
that scheme are directed to your implementation. To do this, you call
`REGISTER_FILE_SYSTEM`:
```
REGISTER_FILE_SYSTEM("foobar", FooBarFileSystem);
```
When TensorFlow tries to operate on a file whose path starts with `foobar://`,
it will use the `FooBarFileSystem` implementation.
```C++
string filename = "foobar://path/to/file.txt";
std::unique_ptr<WritableFile> file;
// Calls FooBarFileSystem::NewWritableFile to return
// a WritableFile class, which happens to be the FooBarFileSystem's
// WritableFile implementation.
TF_RETURN_IF_ERROR(env->NewWritableFile(filename, &file));
```
Next, you must build a shared object containing this implementation. An example
of doing so using bazel's `cc_binary` rule can be found
[here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/BUILD#L244),
but you may use any build system to do so. See the section on @{$adding_an_op#build-the-op-library$building the op library} for similar
instructions.
The result of building this target is a `.so` shared object file.
Lastly, you must dynamically load this implementation in the process. In Python,
you can call the `tf.load_file_system_library(file_system_library)` function,
passing the path to the shared object. Calling this in your client program loads
the shared object in the process, thus registering your implementation as
available for any file operations going through the `FileSystem` interface. You
can see
[test_file_system.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/framework/file_system_test.py)
for an example.
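For example, a sketch of loading the plugin built above (the `.so` path is a placeholder for your build output):

```python
import tensorflow as tf

# Load the shared object; this registers the "foobar" scheme with the runtime.
tf.load_file_system_library("/path/to/libfoobar_filesystem.so")
# From here on, paths such as "foobar://..." resolve to FooBarFileSystem.
```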
## What goes through this interface?
Almost all core C++ file operations within TensorFlow use the `FileSystem`
interface, such as the `CheckpointWriter`, the `EventsWriter`, and many other
utilities. This means that providing a `FileSystem` implementation allows most of
your TensorFlow programs to write to your shared filesystem.
In Python, the `gfile` and `file_io` classes bind underneath to the `FileSystem`
implementation via SWIG, which means that once you have loaded this filesystem
library, you can do:
```
with gfile.Open("foobar://path/to/file.txt") as w:
w.write("hi")
```
When you do this, a file containing "hi" will appear at "/path/to/file.txt" on
your shared filesystem.
@ -1,44 +1,71 @@
# Adding a New Op
If you'd like to create an op that isn't covered by the existing TensorFlow
library, we recommend that you first try writing the op in Python as
a composition of existing Python ops or functions. If that isn't possible, you
can create a custom C++ op. There are several reasons why you might want to
create a custom C++ op:
* It's not easy or possible to express your operation as a composition of
existing ops.
* It's not efficient to express your operation as a composition of existing
primitives.
* You want to hand-fuse a composition of primitives that a future compiler
would have difficulty fusing.
For example, imagine you want to implement something like "median pooling",
similar to the "MaxPool" operator, but computing medians over sliding windows
instead of maximum values. Doing this using a composition of operations may be
possible (e.g., using ExtractImagePatches and TopK), but may not be as
performance- or memory-efficient as a native operation where you can do
something more clever in a single, fused operation. As always, it is typically
first worth trying to express what you want using operator composition, only
choosing to add a new operation if that proves to be difficult or inefficient.
To incorporate your custom op you'll need to:
1. Register the new op in a C++ file. Op registration defines an interface
(specification) for the op's functionality, which is independent of the
op's implementation. For example, op registration defines the op's name and
the op's inputs and outputs. It also defines the shape function
that is used for tensor shape inference.
2. Implement the op in C++. The implementation of an op is known
as a kernel, and it is the concrete implementation of the specification you
registered in Step 1. There can be multiple kernels for different input /
output types or architectures (for example, CPUs, GPUs).
3. Create a Python wrapper (optional). This wrapper is the public API that's
used to create the op in Python. A default wrapper is generated from the
op registration, which can be used directly or added to.
4. Write a function to compute gradients for the op (optional).
5. Test the op. We usually do this in Python for convenience, but you can also
test the op in C++. If you define gradients, you can verify them with the
Python @{tf.test.compute_gradient_error$gradient checker}.
See
[`relu_op_test.py`](https://www.tensorflow.org/code/tensorflow/python/kernel_tests/relu_op_test.py) as
an example that tests the forward functions of Relu-like operators and
their gradients.
PREREQUISITES:
* Some familiarity with C++.
* Must have installed the
[TensorFlow binary](../../get_started/os_setup.md#pip-installation), or must
have
[downloaded TensorFlow source](../../get_started/os_setup.md#installing-from-sources),
and be able to build it.
If you'd like to incorporate an operation that isn't covered by the existing
library, you can create a custom Op. To incorporate your custom Op, you'll need
to:
* Register the new Op in a C++ file. The Op registration is independent of the
implementation, and describes the semantics of how the Op is invoked. For
example, it defines the Op name, and specifies its inputs and outputs.
It also defines the shape function that is used for tensor shape inference.
* Implement the Op in C++. This implementation is called a "kernel", and there
can be multiple kernels for different architectures (e.g. CPUs, GPUs) or
input / output types.
* Optionally, create a Python wrapper. This wrapper is the public API to create
the Op. A default wrapper is generated from the Op registration, which can be
used directly or added to.
* Optionally, write a function to compute gradients for the Op.
* Test the Op, typically in Python. If you define gradients, you can verify them with the Python [`GradientChecker`](https://www.tensorflow.org/code/tensorflow/python/kernel_tests/gradient_checker.py).
* Some familiarity with C++.
* Must have installed the
@{$install$TensorFlow binary}, or must have
@{$install_sources$downloaded TensorFlow source},
and be able to build it.
[TOC]
## Define the Op's interface
## Define the op's interface
You define the interface of an Op by registering it with the TensorFlow system.
In the registration, you specify the name of your Op, its inputs (types and
You define the interface of an op by registering it with the TensorFlow system.
In the registration, you specify the name of your op, its inputs (types and
names) and outputs (types and names), as well as docstrings and
any [attrs](#attrs) the Op might require.
any [attrs](#attrs) the op might require.
To see how this works, suppose you'd like to create an Op that takes a tensor of
To see how this works, suppose you'd like to create an op that takes a tensor of
`int32`s and outputs a copy of the tensor, with all but the first element set to
zero. Create file `tensorflow/core/user_ops/zero_out.cc` and
add a call to the `REGISTER_OP` macro that defines the interface for such an Op:
zero. To do this, create a file named `zero_out.cc`. Then add a call to the
`REGISTER_OP` macro that defines the interface for your op:
```c++
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/shape_inference.h"

using namespace tensorflow;

REGISTER_OP("ZeroOut")
    .Input("to_zero: int32")
    .Output("zeroed: int32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
      c->set_output(0, c->input(0));
      return Status::OK();
    });
```
This `ZeroOut` Op takes one tensor `to_zero` of 32-bit integers as input, and
outputs a tensor `zeroed` of 32-bit integers of the same shape as the input.
For example, if the input is a Tensor of shape [10, 20], then this shape
function specifies that the output shape is also [10, 20].
This `ZeroOut` op takes one tensor `to_zero` of 32-bit integers as input, and
outputs a tensor `zeroed` of 32-bit integers. The op also uses a shape function
to ensure that the output tensor is the same shape as the input tensor. For
example, if the input is a tensor of shape [10, 20], then this shape function
specifies that the output shape is also [10, 20].
> A note on naming: The name of the Op should be unique and CamelCase. Names
> starting with an underscore (`_`) are reserved for internal use.
## Implement the kernel for the Op
> A note on naming: The op name must be in CamelCase and it must be unique
> among all other ops that are registered in the binary.
After you define the interface, provide one or more implementations of the Op.
## Implement the kernel for the op
After you define the interface, provide one or more implementations of the op.
To create one of these kernels, create a class that extends `OpKernel` and
overrides the `Compute` method. The `Compute` method provides one `context`
argument of type `OpKernelContext*`, from which you can access useful things
like the input and output tensors.
> Important note: Instances of your OpKernel may be accessed concurrently. Your
> `Compute` method must be thread-safe. Guard any access to class members with a
> mutex (Or better yet, don't share state via class members! Consider using a
> [`ResourceMgr`](https://www.tensorflow.org/code/tensorflow/core/framework/resource_mgr.h)
> to keep track of Op state).
Add your kernel to the file you created above. The kernel might look something
like this:
@ -123,12 +146,18 @@ To do this for the `ZeroOut` op, add the following to `zero_out.cc`:
```c++
REGISTER_KERNEL_BUILDER(Name("ZeroOut").Device(DEVICE_CPU), ZeroOutOp);
```
## Building the Op library
### With TensorFlow binary installation
> Important: Instances of your OpKernel may be accessed concurrently.
> Your `Compute` method must be thread-safe. Guard any access to class
> members with a mutex. Or better yet, don't share state via class members!
> Consider using a [`ResourceMgr`](https://www.tensorflow.org/code/tensorflow/core/framework/resource_mgr.h)
> to keep track of op state.
## Build the op library
### Compile the op using your system compiler (TensorFlow binary installation)
You should be able to compile `zero_out.cc` with a `C++` compiler such as `g++`
or `clang` available on your system. The binary PIP package installs the header
files and the library that you need to compile your Op in locations that are
files and the library that you need to compile your op in locations that are
system specific. However, the TensorFlow Python library provides the
`get_include` function to get the header directory.
Here is the output of this function on an Ubuntu machine.
@ -142,7 +171,7 @@ $ python
```
Assuming you have `g++` installed, here is the sequence of commands you can use
to compile your Op into a dynamic library.
to compile your op into a dynamic library.
```bash
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
@ -151,18 +180,19 @@ g++ -std=c++11 -shared zero_out.cc -o zero_out.so -fPIC -I $TF_INC -O2
```
On Mac OS X, the additional flag "-undefined dynamic_lookup" is required when
building the .so file.
building the `.so` file.
> Note on gcc version 5: gcc5 uses the new C++
[ABI](https://gcc.gnu.org/gcc-5/changes.html#libstdcxx). The binary pip packages
available on the TensorFlow website are built with gcc4 that uses the older ABI.
If you compile your op library with gcc5, add `-D_GLIBCXX_USE_CXX11_ABI=0` to
the command line to make the library compatible with the older abi.
> Note on gcc version 5: gcc5 uses the new C++
> [ABI](https://gcc.gnu.org/gcc-5/changes.html#libstdcxx). The binary pip
> packages available on the TensorFlow website are built with gcc4 that uses
> the older ABI. If you compile your op library with gcc5, add
> `-D_GLIBCXX_USE_CXX11_ABI=0` to the command line to make the library
> compatible with the older ABI.
### With TensorFlow source installation
### Compile the op using bazel (TensorFlow source installation)
If you have TensorFlow sources installed, you can make use of TensorFlow's build
system to compile your Op. Place a BUILD file with following Bazel build rule in
system to compile your op. Place a BUILD file with the following Bazel build rule in
the [`tensorflow/core/user_ops`][user_ops] directory.
```python
@ -180,20 +210,20 @@ Run the following command to build `zero_out.so`.
$ bazel build --config opt //tensorflow/core/user_ops:zero_out.so
```
> Note:
Although you can create a shared library (a `.so` file) with the standard
`cc_library` rule, we strongly recommend that you use the `tf_custom_op_library`
macro. It adds some required dependencies, and performs checks to ensure that
the shared library is compatible with TensorFlow's plugin loading mechanism.
> Note: Although you can create a shared library (a `.so` file) with the
> standard `cc_library` rule, we strongly recommend that you use the
> `tf_custom_op_library` macro. It adds some required dependencies, and
> performs checks to ensure that the shared library is compatible with
> TensorFlow's plugin loading mechanism.
## Using the Op in Python
## Use the op in Python
TensorFlow Python API provides the
[load_op_library](../../api_docs/python/framework#load_op_library) function to
load the dynamic library and register the Op with the TensorFlow
framework. `load_op_library` returns a Python module, that contains the Python
wrappers for the Op. Thus, once you have built the op, you can do the following
to run it from Python :
@{tf.load_op_library} function to
load the dynamic library and register the op with the TensorFlow
framework. `load_op_library` returns a Python module that contains the Python
wrappers for the op and the kernel. Thus, once you have built the op, you can
do the following to run it from Python:
```python
import tensorflow as tf
@ -202,18 +232,16 @@ with tf.Session(''):
zero_out_module.zero_out([[1, 2], [3, 4]]).eval()
# Prints
array([[1, 0],
[0, 0]], dtype=int32)
array([[1, 0], [0, 0]], dtype=int32)
```
> Note: The generated function will be given a snake\_case name (to comply with
> [PEP8](https://www.python.org/dev/peps/pep-0008/)). So if your op is named
> `ZeroOut` in the C++ files, the python function will be called `zero_out`.
Keep in mind, the generated function will be given a snake\_case name (to comply
with [PEP8](https://www.python.org/dev/peps/pep-0008/)). So, if your op is
named `ZeroOut` in the C++ files, the python function will be called `zero_out`.
To make the Op available as a regular function `import`-able from a Python
To make the op available as a regular function `import`-able from a Python
module, it may be useful to have the `load_op_library` call in a Python source
file as follows (see
[zero_out_op_1.py](https://www.tensorflow.org/code/tensorflow/g3doc/how_tos/adding_an_op/zero_out_op_1.py))
file as follows (see [zero_out_op_1.py](https://www.tensorflow.org/code/tensorflow/g3doc/how_tos/adding_an_op/zero_out_op_1.py))
:
```python
@ -223,11 +251,11 @@ _zero_out_module = tf.load_op_library('zero_out_op_kernel_1.so')
zero_out = _zero_out_module.zero_out
```
## Verify it works
## Verify that the op works
A good way to verify that you've successfully implemented your Op is to write a
A good way to verify that you've successfully implemented your op is to write a
test for it. Create the file
`tensorflow/python/kernel_tests/zero_out_op_test.py` with the contents:
`zero_out_op_test.py` with the contents:
```python
import tensorflow as tf
@ -243,26 +271,33 @@ if __name__ == "__main__":
tf.test.main()
```
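Filled out, such a test might look like the following sketch (it assumes the op library was built as `zero_out.so` in the current directory):

```python
import tensorflow as tf

class ZeroOutTest(tf.test.TestCase):
  def testZeroOut(self):
    # Load the custom op library and run the op on a small vector.
    zero_out_module = tf.load_op_library('zero_out.so')
    with self.test_session():
      result = zero_out_module.zero_out([5, 4, 3, 2, 1])
      self.assertAllEqual(result.eval(), [5, 0, 0, 0, 0])

if __name__ == "__main__":
  tf.test.main()
```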
Add a 'zero_out_op_test' target to `tensorflow/python/kernel_tests/BUILD` among the other CPU-only test targets:
```
tf_py_test(
name = "zero_out_op_test",
size = "small",
srcs = ["zero_out_op_test.py"],
additional_deps = ["//tensorflow:tensorflow_py"],
)
```
Then run your test:
Then run your test (assuming you have TensorFlow installed):
```sh
$ bazel test //tensorflow/python/kernel_tests:zero_out_op_test
$ python zero_out_op_test.py
```
## Validation
## Building advanced features into your op
The example above assumed that the Op applied to a tensor of any shape. What
Now that you know how to build a basic (and somewhat restricted) op and
implementation, we'll look at some of the more complicated things you will
typically need to build into your op. This includes:
* [Conditional checks and validation](#validate)
* Op registration
* [Attrs](#attrs)
* [Attr types](#attr-types)
* [Polymorphism](#polymorphism)
* [Inputs and outputs](#inputs-outputs)
* [Backwards compatibility](#backward-compat)
* [GPU support](#gpu-support)
* [Compiling the kernel for the GPU device](#compiling-kernel)
* [Implement the gradient in Python](#implement-gradient)
* [Shape functions in C++](#shape-functions)
### Conditional checks and validation {#validate}
The example above assumed that the op applied to a tensor of any shape. What
if it only applied to vectors? That means adding a check to the above OpKernel
implementation.
@ -299,20 +334,22 @@ function is an error, and if so return it, use
[`OP_REQUIRES_OK`][validation-macros]. Both of these macros return from the
function on error.
## Op registration
### Op registration
### Attrs
#### Attrs {#attrs}
Ops can have attrs, whose values are set when the Op is added to a graph. These
are used to configure the Op, and their values can be accessed both within the
kernel implementation and in the types of inputs and outputs in the Op
Ops can have attrs, whose values are set when the op is added to a graph. These
are used to configure the op, and their values can be accessed both within the
kernel implementation and in the types of inputs and outputs in the op
registration. Prefer using an input instead of an attr when possible, since
inputs are more flexible. They can change every step, be set using a feed, etc.
Attrs are used for things that can't be done with inputs: any configuration
that affects the signature (number or type of inputs or outputs) or that
can't change from step-to-step.
inputs are more flexible. This is because attrs are constants and must be
defined at graph construction time. In contrast, inputs are Tensors whose
values can be dynamic; that is, inputs can change every step, be set using a
feed, etc. Attrs are used for things that can't be done with inputs: any
configuration that affects the signature (number or type of inputs or outputs)
or that can't change from step-to-step.
You define an attr when you register the Op, by specifying its name and type
You define an attr when you register the op, by specifying its name and type
using the `Attr` method, which expects a spec of the form:
```
<name>: <attr-type-expr>
```
@ -323,8 +360,8 @@ where `<name>` begins with a letter and can be composed of alphanumeric
characters and underscores, and `<attr-type-expr>` is a type expression of the
form [described below](#attr-types).
For example, if you'd like the `ZeroOut` Op to preserve a user-specified index,
instead of only the 0th element, you can register the Op like so:
For example, if you'd like the `ZeroOut` op to preserve a user-specified index,
instead of only the 0th element, you can register the op like so:
<code class="lang-c++"><pre>
REGISTER\_OP("ZeroOut")
<b>.Attr("preserve\_index: int")</b>
@ -332,6 +369,9 @@ REGISTER\_OP("ZeroOut")
.Output("zeroed: int32");
</pre></code>
(Note that the set of [attribute types](#attr-types) is different from the
@{$dims_types$tensor types} used for inputs and outputs.)
Your kernel can then access this attr in its constructor via the `context`
parameter:
<code class="lang-c++"><pre>
@ -358,7 +398,9 @@ which can then be used in the `Compute` method:
<code class="lang-c++"><pre>
void Compute(OpKernelContext\* context) override {
// ...
<br/> <b>// Check that preserve\_index is in range
<br/>
<b>// We're using saved attr to validate potentially dynamic input
// So we check that preserve\_index is in range
OP\_REQUIRES(context, preserve\_index_ &lt; input.dimension(0),
errors::InvalidArgument("preserve\_index out of range"));<br/>
</b>// Set all the elements of the output tensor to 0
@ -371,18 +413,7 @@ which can then be used in the `Compute` method:
}
</pre></code>
> To preserve [backwards compatibility](#backwards-compatibility), you should
> specify a [default value](#default-values-constraints) when adding an attr to
> an existing op:
>
> <code class="lang-c++"><pre>
> REGISTER\_OP("ZeroOut")
> <b>.Attr("preserve\_index: int = 0")</b>
> .Input("to\_zero: int32")
> .Output("zeroed: int32");
> </pre></code>
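On the Python side, the generated wrapper then exposes the attr as a keyword argument, so a call might look like this sketch (assuming the op library was built and loaded as above):

```python
# `preserve_index` maps to the attr declared in the op registration.
result = zero_out_module.zero_out([5, 4, 3, 2, 1], preserve_index=2)
```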
### Attr types
#### Attr types {#attr-types}
The following types are supported in an attr:
@ -398,7 +429,7 @@ The following types are supported in an attr:
See also: [`op_def_builder.cc:FinalizeAttr`][FinalizeAttr] for a definitive list.
#### Default values & constraints
##### Default values & constraints
Attrs may have default values, and some types of attrs can have constraints. To
define an attr with constraints, you can use the following `<attr-type-expr>`s:
@ -414,7 +445,7 @@ define an attr with constraints, you can use the following `<attr-type-expr>`s:
* `{<type1>, <type2>}`: The value is of type `type`, and must be one of
`<type1>` or `<type2>`, where `<type1>` and `<type2>` are supported
[tensor types](../../resources/dims_types.md#data-types). You don't specify
@{$dims_types#data-types$tensor types}. You don't specify
that the type of the attr is `type`. This is implied when you have a list of
types in `{...}`. For example, in this case the attr `t` is a type that must
be an `int32`, a `float`, or a `bool`:
@ -450,7 +481,7 @@ define an attr with constraints, you can use the following `<attr-type-expr>`s:
* `int >= <n>`: The value must be an int whose value is greater than or equal to
`<n>`, where `<n>` is a natural number.
For example, the following Op registration specifies that the attr `a` must
For example, the following op registration specifies that the attr `a` must
have a value that is at least `2`:
```c++
@ -461,7 +492,7 @@ define an attr with constraints, you can use the following `<attr-type-expr>`s:
* `list(<type>) >= <n>`: A list of type `<type>` whose length is greater than
or equal to `<n>`.
For example, the following Op registration specifies that the attr `a` is a
For example, the following op registration specifies that the attr `a` is a
list of types (either `int32` or `float`), and that there must be at least 3
of them:
@ -496,19 +527,18 @@ REGISTER_OP("AttrDefaultExampleForAllTypes")
.Attr("l_int: list(int) = [2, 3, 5, 7]");
```
Note in particular that the values of type `type` use [the `DT_*` names
for the types](../../resources/dims_types.md#data-types).
Note in particular that the values of type `type` use @{$dims_types#data-types$the `DT_*` names for the types}.
### Polymorphism
#### Type Polymorphism
#### Polymorphism {#polymorphism}
##### Type Polymorphism
For ops that can take different types as input or produce different output
types, you can specify [an attr](#attrs) in
[an input or output type](#inputs-and-outputs) in the Op registration. Typically
[an input or output type](#inputs-and-outputs) in the op registration. Typically
you would then register an `OpKernel` for each supported type.
For instance, if you'd like the `ZeroOut` Op to work on `float`s
in addition to `int32`s, your Op registration might look like:
For instance, if you'd like the `ZeroOut` op to work on `float`s
in addition to `int32`s, your op registration might look like:
<code class="lang-c++"><pre>
REGISTER\_OP("ZeroOut")
<b>.Attr("T: {float, int32}")</b>
@ -516,7 +546,7 @@ REGISTER\_OP("ZeroOut")
.Output("zeroed: <b>T</b>");
</pre></code>
Your Op registration now specifies that the input's type must be `float`, or
Your op registration now specifies that the input's type must be `float`, or
`int32`, and that its output will be the same type, since both have type `T`.
> <a id="naming"></a>A note on naming: Inputs, outputs, and attrs generally should be
@ -602,7 +632,7 @@ class ZeroOut<b>Float</b>Op : public OpKernel {
}
};<br/><b>
// Note that TypeConstraint&lt;int32&gt;("T") means that attr "T" (defined
// in the Op registration above) must be "int32" to use this template
// in the op registration above) must be "int32" to use this template
// instantiation.</b>
REGISTER\_KERNEL\_BUILDER(
Name("ZeroOut")
@ -662,7 +692,7 @@ class ZeroOutOp : public OpKernel {
}
};<br/>
// Note that TypeConstraint&lt;int32&gt;("T") means that attr "T" (defined
// in the Op registration above) must be "int32" to use this template
// in the op registration above) must be "int32" to use this template
// instantiation.</b>
REGISTER\_KERNEL\_BUILDER(
Name("ZeroOut")
@ -725,7 +755,7 @@ TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNEL);
#undef REGISTER_KERNEL
```
#### List Inputs and Outputs
##### List Inputs and Outputs
In addition to being able to accept or produce different types, ops can consume
or produce a variable number of tensors.
@ -743,7 +773,7 @@ REGISTER_OP("PolymorphicListExample")
```
You can also place restrictions on what types can be specified in the list. In
this next case, the input is a list of `float` and `double` tensors. The Op
this next case, the input is a list of `float` and `double` tensors. The op
accepts, for example, input types `(float, double, float)` and in that case the
output type would also be `(float, double, float)`.
@ -800,9 +830,9 @@ REGISTER_OP("MinimumLengthPolymorphicListExample")
.Output("out: T");
```
### Inputs and Outputs
#### Inputs and Outputs {#inputs-outputs}
To summarize the above, an Op registration can have multiple inputs and outputs:
To summarize the above, an op registration can have multiple inputs and outputs:
```c++
REGISTER_OP("MultipleInsAndOuts")
@ -826,7 +856,7 @@ expressions:
`string`). This specifies a single tensor of the given type.
See
[the list of supported Tensor types](../../resources/dims_types.md#data-types).
@{$dims_types#data-types$the list of supported Tensor types}.
```c++
REGISTER_OP("BuiltInTypesExample")
@ -869,9 +899,9 @@ expressions:
* For a sequence of tensors with the same type: `<number> * <type>`, where
`<number>` is the name of an [Attr](#attrs) with type `int`. The `<type>` can
either be
[a specific type like `int32` or `float`](../../resources/dims_types.md#data-types),
@{$dims_types#data-types$a specific type like `int32` or `float`},
or the name of an attr with type `type`. As an example of the first, this
Op accepts a list of `int32` tensors:
op accepts a list of `int32` tensors:
```c++
REGISTER_OP("Int32SequenceExample")
@ -879,7 +909,7 @@ expressions:
.Input("in: NumTensors * int32")
```
Whereas this Op accepts a list of tensors of any type, as long as they are all
Whereas this op accepts a list of tensors of any type, as long as they are all
the same:
```c++
@ -901,17 +931,22 @@ expressions:
For more details, see
[`tensorflow/core/framework/op_def_builder.h`][op_def_builder].
### Backwards compatibility
#### Backwards compatibility {#backward-compat}
In general, changes to specifications must be backwards-compatible: changing the
specification of an Op must not break prior serialized `GraphDef` protocol
buffers constructed from older specifications. The details of `GraphDef`
compatibility are [described here](../../resources/versions.md#graphs).
Let's assume you have written a nice, custom op and shared it with others, so
you have happy customers using your operation. However, you'd like to make
changes to the op in some way.
In general, changes to existing, checked-in specifications must be
backwards-compatible: changing the specification of an op must not break prior
serialized `GraphDef` protocol buffers constructed from older specifications.
The details of `GraphDef` compatibility are
@{$version_semantics#graphs$described here}.
There are several ways to preserve backwards-compatibility.
1. Any new attrs added to an operation must have default values defined, and
with that default value the Op must have the original behavior. To change an
with that default value the op must have the original behavior. To change an
operation from not polymorphic to polymorphic, you *must* give a default
value to the new type attr to preserve the original signature by default. For
example, if your operation was:
@ -941,11 +976,11 @@ There are several ways to preserve backwards-compatibility.
4. You can add a new list input / output, if it defaults to empty.
5. Namespace any new Ops you create, by prefixing the Op names with something
unique to your project. This avoids having your Op colliding with any Ops
5. Namespace any new ops you create, by prefixing the op names with something
unique to your project. This avoids having your op colliding with any ops
that might be included in future versions of TensorFlow.
6. Plan ahead! Try to anticipate future uses for the Op. Some signature changes
6. Plan ahead! Try to anticipate future uses for the op. Some signature changes
can't be done in a compatible way (for example, making a list of the same
type into a list of varying types).
@ -960,9 +995,9 @@ callers. The Python API may be kept compatible by careful changes in a
hand-written Python wrapper, by keeping the old signature except possibly adding
new optional arguments to the end. Generally incompatible changes may only be
made when TensorFlow's changes major versions, and must conform to the
[`GraphDef` version semantics](../../resources/versions.md#graphs).
@{$version_semantics#graphs$`GraphDef` version semantics}.
## GPU Support
### GPU Support {#gpu-support}
You can implement different OpKernels and register one for CPU and another for
GPU, just like you can [register kernels for different types](#polymorphism).
@ -971,12 +1006,16 @@ There are several examples of kernels with GPU support in
Notice some kernels have a CPU version in a `.cc` file, a GPU version in a file
ending in `_gpu.cu.cc`, and some code shared in common in a `.h` file.
For example, the [`pad` op](../../api_docs/python/array_ops.md#pad) has
For example, the @{tf.pad} op has
everything but the GPU kernel in [`tensorflow/core/kernels/pad_op.cc`][pad_op].
The GPU kernel is in
[`tensorflow/core/kernels/pad_op_gpu.cu.cc`](https://www.tensorflow.org/code/tensorflow/core/kernels/pad_op_gpu.cu.cc),
and the shared code is a templated class defined in
[`tensorflow/core/kernels/pad_op.h`](https://www.tensorflow.org/code/tensorflow/core/kernels/pad_op.h).
We organize the code this way for two reasons: it allows you to share common
code among the CPU and GPU implementations, and it puts the GPU implementation
into a separate file so that it can be compiled only by the GPU compiler.
One thing to note: even when the GPU kernel version of `pad` is used, it still
needs its `"paddings"` input in CPU memory. To mark that inputs or outputs are
kept on the CPU, add a `HostMemory()` call to the kernel registration, e.g.:
@ -990,7 +1029,7 @@ kept on the CPU, add a `HostMemory()` call to the kernel registration, e.g.:
PadOp<GPUDevice, T>)
```
### Compiling the kernel for the GPU device
#### Compiling the kernel for the GPU device {#compiling-kernel}
Look at
[cuda_op_kernel.cu.cc](https://www.tensorflow.org/code/tensorflow/g3doc/how_tos/adding_an_op/cuda_op_kernel.cu.cc)
@ -1021,12 +1060,12 @@ you'll need to specify the path explicitly in the second (g++) command above.
For example, add `-L /usr/local/cuda-8.0/lib64/` if your CUDA is installed in
`/usr/local/cuda-8.0`.
## Implement the gradient in Python
### Implement the gradient in Python {#implement-gradient}
Given a graph of ops, TensorFlow uses automatic differentiation
(backpropagation) to add new ops representing gradients with respect to the
existing ops (see
[Gradient Computation](../../api_docs/python/train.md#gradient-computation)).
@{$python/train#gradient_computation$Gradient Computation}).
To make automatic differentiation work for new ops, you must register a gradient
function which computes gradients with respect to the ops' inputs given
gradients with respect to the ops' outputs.
@ -1070,16 +1109,16 @@ def _zero_out_grad(op, grad):
```
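For `ZeroOut`, a gradient function along these lines could be registered. This is a sketch: since only the first element of the input survives into the output, only that element receives a nonzero gradient:

```python
import tensorflow as tf
from tensorflow.python.framework import ops

@ops.RegisterGradient("ZeroOut")
def _zero_out_grad(op, grad):
  """A sketch of the gradient for `zero_out`."""
  to_zero = op.inputs[0]
  shape = tf.shape(to_zero)
  index = tf.zeros_like(shape)  # Index [0, ..., 0] of the preserved element.
  first_grad = tf.reshape(grad, [-1])[0]
  # Scatter the incoming gradient back to the first input position.
  to_zero_grad = tf.sparse_to_dense([index], shape, first_grad, 0)
  return [to_zero_grad]  # One gradient Tensor per input of the op.
```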
Details about registering gradient functions with
[`ops.RegisterGradient`](../../api_docs/python/framework.md#RegisterGradient):
@{tf.RegisterGradient}:
* For an op with one output, the gradient function will take an
[`Operation`](../../api_docs/python/framework.md#Operation) `op` and a
[`Tensor`](../../api_docs/python/framework.md#Tensor) `grad` and build new ops
@{tf.Operation} `op` and a
@{tf.Tensor} `grad` and build new ops
out of the tensors
[`op.inputs[i]`](../../api_docs/python/framework.md#Operation.inputs),
[`op.outputs[i]`](../../api_docs/python/framework.md#Operation.outputs), and `grad`. Information
about any attrs can be found via
[`op.get_attr`](../../api_docs/python/framework.md#Operation.get_attr).
@{tf.Operation.get_attr}.
* If the op has multiple outputs, the gradient function will take `op` and
`grads`, where `grads` is a list of gradients with respect to each output.
@ -1091,14 +1130,17 @@ Details about registering gradient functions with
`None`. For example, for an op taking a floating point tensor `x` and an
integer index `i`, the gradient function would `return [x_grad, None]`.
* If there is no meaningful gradient for the op at all, use
`ops.NotDifferentiable("OpName")` to disable automatic differentiation.
* If there is no meaningful gradient for the op at all, you often will not have
to register any gradient, and as long as the op's gradient is never needed,
you will be fine. In some cases, an op has no well-defined gradient but can
be involved in the computation of the gradient. Here you can use
`ops.NotDifferentiable` to automatically propagate zeros backwards.
Note that at the time the gradient function is called, only the data flow graph
of ops is available, not the tensor data itself. Thus, all computation must be
performed using other TensorFlow ops, to be run at graph execution time.
## Shape functions in C++
### Shape functions in C++ {#shape-functions}
The TensorFlow API has a feature called "shape inference" that provides
information about the shapes of tensors without having to execute the
@ -0,0 +1,218 @@
# TensorFlow Architecture
We designed TensorFlow for large-scale distributed training and inference, but
it is also flexible enough to support experimentation with new machine
learning models and system-level optimizations.
This document describes the system architecture that makes possible this
combination of scale and flexibility. It assumes that you have basic familiarity
with TensorFlow programming concepts such as the computation graph, operations,
and sessions. See @{$get_started$Getting Started}
for an introduction to these topics. Some familiarity
with @{$distributed$distributed TensorFlow}
will also be helpful.
This document is for developers who want to extend TensorFlow in some way not
supported by current APIs, hardware engineers who want to optimize for
TensorFlow, implementers of machine learning systems working on scaling and
distribution, or anyone who wants to look under TensorFlow's hood. After
reading it you should understand TensorFlow architecture well enough to read
and modify the core TensorFlow code.
## Overview
The TensorFlow runtime is a cross-platform library. Figure 1 illustrates its
general architecture. A C API separates user level code in different languages
from the core runtime.
![TensorFlow Layers](../images/layers.png){: width="300"}
**Figure 1**
This document focuses on the following layers:
* **Client**:
* Defines the computation as a dataflow graph.
* Initiates graph execution using a [**session**](
https://www.tensorflow.org/code/tensorflow/python/client/session.py).
* **Distributed Master**
* Prunes a specific subgraph from the graph, as defined by the arguments
to Session.run().
* Partitions the subgraph into multiple pieces that run in different
processes and devices.
* Distributes the graph pieces to worker services.
* Initiates graph piece execution by worker services.
* **Worker Services** (one for each task)
* Schedule the execution of graph operations using kernel implementations
appropriate to the available hardware (CPUs, GPUs, etc).
* Send and receive operation results to and from other worker services.
* **Kernel Implementations**
* Perform the computation for individual graph operations.
Figure 2 illustrates the interaction of these components. "/job:worker/task:0" and
"/job:ps/task:0" are both tasks with worker services. "PS" stands for "parameter
server": a task responsible for storing and updating the model's parameters.
Other tasks send updates to these parameters as they work on optimizing the
parameters. This particular division of labor between tasks is not required, but
it is common for distributed training.
![TensorFlow Architecture Diagram](../images/diag1.svg){: width="500"}
**Figure 2**
Note that the Distributed Master and Worker Service only exist in
distributed TensorFlow. The single-process version of TensorFlow includes a
special Session implementation that does everything the distributed master does
but only communicates with devices in the local process.
The following sections describe the core TensorFlow layers in greater detail and
step through the processing of an example graph.
## Client
Users write the client TensorFlow program that builds the computation graph.
This program can either directly compose individual operations or use a
convenience library like the Estimators API to compose neural network layers and
other higher-level abstractions. TensorFlow supports multiple client
languages, and we have prioritized Python and C++, because our internal users
are most familiar with these languages. As features become more established,
we typically port them to C++, so that users can access an optimized
implementation from all client languages. Most of the training libraries are
still Python-only, but C++ does have support for efficient inference.
The client creates a session, which sends the graph definition to the
distributed master as a @{tf.GraphDef}
protocol buffer. When the client evaluates a node or nodes in the
graph, the evaluation triggers a call to the distributed master to initiate
computation.
In Figure 3, the client has built a graph that applies weights (w) to a
feature vector (x), adds a bias term (b) and saves the result in a variable
(s).
![TensorFlow Architecture Diagram: Client](../images/graph_client.svg){: width="700"}
**Figure 3**
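As a concrete sketch of such a client program (the shapes and initial values here are illustrative):

```python
import tensorflow as tf

# Build the graph: apply weights (w) to features (x), add a bias (b),
# and save the result into a variable (s).
x = tf.placeholder(tf.float32, shape=[1, 2])
w = tf.Variable(tf.random_normal([2, 1]))
b = tf.Variable(tf.zeros([1]))
s = tf.Variable(tf.zeros([1, 1]))
save_result = tf.assign(s, tf.matmul(x, w) + b)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  sess.run(save_result, feed_dict={x: [[1.0, 2.0]]})  # Triggers the master.
```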
### Code
* @{tf.Session}
## Distributed master
The distributed master:
* prunes the graph to obtain the subgraph required to evaluate the nodes
requested by the client,
* partitions the graph to obtain graph pieces for
each participating device, and
* caches these pieces so that they may be re-used in subsequent steps.
Since the master sees the overall computation for
a step, it applies standard optimizations such as common subexpression
elimination and constant folding. It then coordinates execution of the
optimized subgraphs across a set of tasks.
![TensorFlow Architecture Diagram: Master](../images/graph_master_cln.svg){: width="700"}
**Figure 4**
Figure 5 shows a possible partition of our example graph. The distributed
master has grouped the model parameters in order to place them together on the
parameter server.
![Partitioned Graph](../images/graph_split1.svg){: width="700"}
**Figure 5**
Where graph edges are cut by the partition, the distributed master inserts
send and receive nodes to pass information between the distributed tasks
(Figure 6).
![Partitioned Graph](../images/graph_split2.svg){: width="700"}
**Figure 6**
The distributed master then ships the graph pieces to the distributed tasks.
![Partitioned Graph](../images/graph_workers_cln.svg){: width="700"}
**Figure 7**
### Code
* [MasterService API definition](https://www.tensorflow.org/code/tensorflow/core/protobuf/master_service.proto)
* [Master interface](https://www.tensorflow.org/code/tensorflow/core/distributed_runtime/master_interface.h)
## Worker Service
The worker service in each task:
* handles requests from the master,
* schedules the execution of the kernels for the operations that comprise a
local subgraph, and
* mediates direct communication between tasks.
We optimize the worker service for running large graphs with low overhead. Our
current implementation can execute tens of thousands of subgraphs per second,
which enables a large number of replicas to make rapid, fine-grained training
steps. The worker service dispatches kernels to local devices and runs kernels
in parallel when possible, for example by using multiple CPU cores or GPU
streams.
We specialize Send and Recv operations for each pair of source and destination
device types:
* Transfers between local CPU and GPU devices use the
`cudaMemcpyAsync()` API to overlap computation and data transfer.
* Transfers between two local GPUs use peer-to-peer DMA, to avoid an expensive
copy via the host CPU.
For transfers between tasks, TensorFlow uses multiple protocols, including:
* gRPC over TCP.
* RDMA over Converged Ethernet.
We also have preliminary support for NVIDIA's NCCL library for multi-GPU
communication (see [`tf.contrib.nccl`](
https://www.tensorflow.org/code/tensorflow/contrib/nccl/python/ops/nccl_ops.py)).
![Partitioned Graph](../images/graph_send_recv.svg){: width="700"}
**Figure 8**
### Code
* [WorkerService API definition](https://www.tensorflow.org/code/tensorflow/core/protobuf/worker_service.proto)
* [Worker interface](https://www.tensorflow.org/code/tensorflow/core/distributed_runtime/worker_interface.h)
* [Remote rendezvous (for Send and Recv implementations)](https://www.tensorflow.org/code/tensorflow/core/distributed_runtime/rpc/rpc_rendezvous_mgr.h)
## Kernel Implementations
The runtime contains over 200 standard operations, including mathematical, array
manipulation, control flow, and state management operations. Each of these
operations can have kernel implementations optimized for a variety of devices.
Many of the operation kernels are implemented using Eigen::Tensor, which uses
C++ templates to generate efficient parallel code for multicore CPUs and GPUs;
however, we liberally use libraries like cuDNN where a more efficient kernel
implementation is possible. We have also implemented
@{$quantization$quantization}, which enables
faster inference in environments such as mobile devices and high-throughput
datacenter applications, and use the
[gemmlowp](https://github.com/google/gemmlowp) low-precision matrix library to
accelerate quantized computation.
If it is difficult or inefficient to represent a subcomputation as a composition
of operations, users can register additional kernels that provide an efficient
implementation written in C++. For example, we recommend registering your own
fused kernels for some performance critical operations, such as the ReLU and
Sigmoid activation functions and their corresponding gradients. The @{$xla$XLA Compiler} has an
experimental implementation of automatic kernel fusion.
### Code
* [`OpKernel` interface](https://www.tensorflow.org/code/tensorflow/core/framework/op_kernel.h)
@ -2,17 +2,17 @@
The tf.contrib.learn framework makes it easy to construct and train machine
learning models via its high-level
[Estimator](../../api_docs/python/contrib.learn.md#estimators) API. `Estimator`
@{$python/contrib.learn#estimators$Estimator} API. `Estimator`
offers classes you can instantiate to quickly configure common model types such
as regressors and classifiers:
* [`LinearClassifier`](../../api_docs/python/contrib.learn.md#LinearClassifier):
* @{tf.contrib.learn.LinearClassifier}:
Constructs a linear classification model.
* [`LinearRegressor`](../../api_docs/python/contrib.learn.md#LinearRegressor):
* @{tf.contrib.learn.LinearRegressor}:
Constructs a linear regression model.
* [`DNNClassifier`](../../api_docs/python/contrib.learn.md#DNNClassifier):
* @{tf.contrib.learn.DNNClassifier}:
Constructs a neural network classification model.
* [`DNNRegressor`](../../api_docs/python/contrib.learn.md#DNNRegressor):
* @{tf.contrib.learn.DNNRegressor}:
Constructs a neural network regression model.
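For instance, a canned estimator can be configured in just a couple of lines, as in this sketch (the feature column and hidden-unit sizes are illustrative):

```python
import tensorflow as tf

# Seven numeric abalone features feeding a two-layer network.
feature_columns = [tf.contrib.layers.real_valued_column("x", dimension=7)]
estimator = tf.contrib.learn.DNNRegressor(
    feature_columns=feature_columns, hidden_units=[10, 10])
```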
But what if none of `tf.contrib.learn`'s predefined model types meets your
@ -40,9 +40,9 @@ This tutorial assumes you already know tf.contrib.learn API basics, such as
feature columns and `fit()` operations. If you've never used tf.contrib.learn
before, or need a refresher, you should first review the following tutorials:
* [tf.contrib.learn Quickstart](../tflearn/index.md): Quick introduction to
* @{$tflearn$tf.contrib.learn Quickstart}: Quick introduction to
training a neural network using tf.contrib.learn.
* [TensorFlow Linear Model Tutorial](../wide/index.md): Introduction to
* @{$wide$TensorFlow Linear Model Tutorial}: Introduction to
feature columns, and an overview on building a linear classifier in
tf.contrib.learn.
@ -55,8 +55,8 @@ viewing the shell under a microscope, it's desirable to find other measurements
that can predict age.
The [Abalone Data Set](https://archive.ics.uci.edu/ml/datasets/Abalone) contains
the following [feature
data](https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.names)
the following
[feature data](https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.names)
for abalone:
| Feature | Description |
@ -72,7 +72,7 @@ for abalone:
The label to predict is the number of rings, as a proxy for abalone age.
![Abalone shell](../../images/abalone_shell.jpg) **[“Abalone
![Abalone shell](../images/abalone_shell.jpg) **[“Abalone
shell”](https://www.flickr.com/photos/thenickster/16641048623/) (by [Nicki Dugan
Pogue](https://www.flickr.com/photos/thenickster/), CC BY-SA 2.0)**
@ -288,8 +288,7 @@ The `model_fn` must accept three arguments:
* `targets`: A `Tensor` containing the labels passed to the model via `fit()`,
`evaluate()`, or `predict()`. Will be empty for `predict()` calls, as these
are the values the model will infer.
* `mode`: One of the following
[`ModeKeys`](../../api_docs/python/contrib.learn.md#ModeKeys) string values
* `mode`: One of the following @{tf.contrib.learn.ModeKeys} string values
indicating the context in which the model_fn was invoked:
* `tf.contrib.learn.ModeKeys.TRAIN` The `model_fn` was invoked in training
mode—e.g., via a `fit()` call.
@ -311,8 +310,7 @@ sections that follow):
* Defining the training operation that specifies the `optimizer` algorithm to
minimize the loss values calculated by the loss function.
The `model_fn` must return a
[`ModelFnOps`](https://www.tensorflow.org/code/tensorflow/contrib/learn/python/learn/estimators/model_fn.py)
The `model_fn` must return a @{tf.contrib.learn.ModelFnOps}
object, which contains the following values:
* `mode` (required). The mode in which the model was run. Typically, you will
@ -330,9 +328,10 @@ object, which contains the following values:
returned by `predict()`, so you can construct it in the format in which
you'd like to consume it.
In `EVAL` mode, the dict is used by [metric
functions](../../api_docs/python/contrib.metrics.md#metric-ops) to compute
metrics.
In `EVAL` mode, the dict is used by
@{$python/contrib.metrics#Metric_Ops_$metric functions}
to compute metrics.
* `loss` (required in `EVAL` and `TRAIN` mode). A `Tensor` containing a scalar
loss value: the output of the model's loss function (discussed in more depth
@ -346,8 +345,7 @@ object, which contains the following values:
* `eval_metric_ops` (optional). A dict of name/value pairs specifying the
metrics that will be calculated when the model runs in `EVAL` mode. The name
is a label of your choice for the metric, and the value is the result of
your metric calculation. The
[`tf.metrics`](https://www.tensorflow.org/code/tensorflow/python/ops/metrics_impl.py)
your metric calculation. The @{tf.metrics}
module provides predefined functions for a variety of common metrics. The
following `eval_metric_ops` contains an `"accuracy"` metric calculated using
`tf.metrics.accuracy`:
@ -373,11 +371,10 @@ will accept the feature data that is passed to the `model_fn` in the `features`
argument. If `features` contains an n-dimensional `Tensor` with all your feature
data (which is the case if `x` and `y` `Dataset`s are passed to `fit()`,
`evaluate()`, and `predict()` directly), then it can serve as the input layer.
If `features` contains a dict of [feature
columns](../linear/overview.md#feature-columns-and-transformations) passed to
If `features` contains a dict of @{$linear#feature-columns-and-transformations$feature columns} passed to
the model via an input function, you can convert it to an input-layer `Tensor`
with the `input_from_feature_columns()` function in
[tf.contrib.layers](../../api_docs/python/contrib.layers.md#layers-contrib).
with the @{tf.contrib.layers.input_from_feature_columns} function in
@{tf.contrib.layers}.
```python
input_layer = tf.contrib.layers.input_from_feature_columns(
@ -403,7 +400,7 @@ fully connected layers:
* `relu(inputs, num_outputs)`. Create a layer of `num_outputs` nodes fully
connected to the previous layer `inputs` with a [ReLU activation
function](https://en.wikipedia.org/wiki/Rectifier_\(neural_networks\))
([tf.nn.relu](../../api_docs/python/nn.md#relu)):
(@{tf.nn.relu}):
```python
hidden_layer = tf.contrib.layers.relu(inputs=input_layer, num_outputs=10)
@ -411,7 +408,7 @@ fully connected layers:
* `relu6(inputs, num_outputs)`. Create a layer of `num_outputs` nodes fully
connected to the previous layer `hidden_layer` with a ReLU 6 activation
function ([tf.nn.relu6](../../api_docs/python/nn.md#relu6)):
function (@{tf.nn.relu6}):
```python
second_hidden_layer = tf.contrib.layers.relu6(inputs=hidden_layer, num_outputs=20)
@ -427,8 +424,7 @@ fully connected layers:
All these functions are
[partials](https://docs.python.org/2/library/functools.html#functools.partial)
of the more general
[`fully_connected()`](../../api_docs/python/contrib.layers.md#fully_connected)
of the more general @{tf.contrib.layers.fully_connected}
function, which can be used to add fully connected layers with other activation
functions, e.g.:
@ -440,9 +436,8 @@ output_layer = tf.contrib.layers.fully_connected(inputs=second_hidden_layer,
The above code creates the neural network layer `output_layer`, which is fully
connected to `second_hidden_layer` with a sigmoid activation function
([`tf.sigmoid`](../../api_docs/python/nn.md#sigmoid)). For a list of predefined
activation functions available in TensorFlow, see the [API
docs](../../api_docs/python/nn.md#activation-functions).
(@{tf.sigmoid}). For a list of predefined
activation functions available in TensorFlow, see the @{$python/nn#activation_functions$API docs}.
Putting it all together, the following code constructs a full neural network for
the abalone predictor, and captures its predictions:
@ -472,7 +467,7 @@ Here, because you'll be passing the abalone `Datasets` directly to `fit()`,
`features` `Tensor` passed to the `model_fn`. The network contains two hidden
layers, each with 10 nodes and a ReLU activation function. The output layer
contains no activation function, and is
[reshaped](../../api_docs/python/array_ops.md#reshape) to a one-dimensional
reshaped via @{tf.reshape} to a one-dimensional
tensor to capture the model's predictions, which are stored in
`predictions_dict`.
@ -480,8 +475,7 @@ tensor to capture the model's predictions, which are stored in
The `ModelFnOps` returned by the `model_fn` must contain `loss`: a `Tensor`
representing the loss value, which quantifies how well the model's predictions
reflect the target values during training and evaluation runs. The @{tf.losses}
module provides convenience functions for calculating loss using a variety of
metrics, including:
@ -522,7 +516,7 @@ using `mean_squared_error()` (in bold):
loss = tf.losses.mean_squared_error(targets, predictions)</strong>
...</code></pre>
See the @{$python/contrib.losses$API guide} for a
full list of loss functions and more details on supported arguments and usage.
Supplementary metrics for evaluation can be added to an `eval_metric_ops` dict.
@ -549,10 +543,10 @@ required arguments:
* `loss`. The loss value calculated by the `model_fn` (see [Defining Loss for
the Model](#defining-loss)).
* `global_step`. An integer
@{tf.Variable} representing the
step counter to increment for each model training run. Can easily be
created/incremented in TensorFlow via the
@{tf.train.get_global_step}
function.
* `learning_rate`. The [learning
rate](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Background)
@ -563,34 +557,34 @@ required arguments:
algorithm predefined in `tf.contrib.layers.optimizers`:
* `SGD`. Implementation of [gradient
descent](https://en.wikipedia.org/wiki/Gradient_descent)
(@{tf.train.GradientDescentOptimizer})
* `Adagrad`. Implementation of the [AdaGrad optimization
algorithm](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
(@{tf.train.AdagradOptimizer})
* `Adam`. Implementation of the [Adam optimization
algorithm](http://arxiv.org/pdf/1412.6980.pdf)
(@{tf.train.AdamOptimizer})
* `Ftrl`. Implementation of the
[FTRL-Proximal](https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf)
("Follow The (Proximally) Regularized Leader") algorithm
(@{tf.train.FtrlOptimizer})
* `Momentum`. Implementation of stochastic gradient descent with
[momentum](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum)
(@{tf.train.MomentumOptimizer})
* `RMSProp`. Implementation of the
[RMSprop](http://sebastianruder.com/optimizing-gradient-descent/index.html#rmsprop)
algorithm
(@{tf.train.RMSPropOptimizer})
NOTE: The `optimize_loss` function supports additional optional arguments to
further configure the optimizer, such as for implementing decay. See the
@{tf.contrib.layers.optimize_loss$API docs} for more info.
The following code defines a training op for the abalone `model_fn` using the
loss value calculated in [Defining Loss for the Model](#defining-loss), the
learning rate passed to the function in `params`, and the SGD optimizer. For
`global_step`, the convenience function
@{tf.train.get_global_step}
in tf.contrib.framework takes care of generating an integer variable:
```python
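# A hedged reconstruction of the elided training op, not the verbatim source;
# it assumes the `loss` tensor and params["learning_rate"] described above.
train_op = tf.contrib.layers.optimize_loss(
    loss=loss,
    global_step=tf.contrib.framework.get_global_step(),
    learning_rate=params["learning_rate"],
    optimizer="SGD")
```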
@ -713,9 +707,9 @@ Prediction 7: 11.1289
Congrats! You've successfully built a tf.contrib.learn `Estimator` from scratch.
For additional reference materials on building `Estimator`s, see the following
sections of the API guides:
* @{$python/contrib.learn#Estimators$Estimators}
* @{$python/contrib.layers$Layers}
* @{$python/contrib.losses$Losses}
* @{$python/contrib.layers#optimization$Optimization}

View File

@ -125,8 +125,7 @@ The `OpDef` specifies the following:
instead of CamelCase for the op's function name.
- A list of inputs and outputs. The types for these may be polymorphic by
referencing attributes, as described in the inputs and outputs section of
@{$adding_an_op$Adding an op}.
- A list of attributes, along with their default values (if any). Note that
some of these will be inferred (if they are determined by an input), some
will be optional (if they have a default), and some will be required (no

View File

@ -0,0 +1,7 @@
architecture.md
adding_an_op.md
add_filesys.md
language_bindings.md
new_data_formats.md
estimators.md
tool_developers/index.md

View File

@ -4,7 +4,7 @@ PREREQUISITES:
* Some familiarity with C++.
* Must have
@{$install_sources$downloaded TensorFlow source}, and be
able to build it.
We divide the task of supporting a file format into two pieces:
@ -16,9 +16,9 @@ We divide the task of supporting a file format into two pieces:
For example, to read a
[CSV file](https://en.wikipedia.org/wiki/Comma-separated_values), we use
@{tf.TextLineReader$a Reader for text files}
followed by
@{tf.decode_csv$an Op that parses CSV data from a line of text}.
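
A minimal sketch of that pattern (hypothetical names: a `filename_queue` of CSV
files, each line holding two float columns):

```python
# Read one line per call from the queued files, then parse its columns.
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
col1, col2 = tf.decode_csv(value, record_defaults=[[0.0], [0.0]])
```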
[TOC]
@ -27,11 +27,11 @@ followed by
A `Reader` is something that reads records from a file. There are some examples
of Reader Ops already built into TensorFlow:
* @{tf.TFRecordReader}
([source in `kernels/tf_record_reader_op.cc`](https://www.tensorflow.org/code/tensorflow/core/kernels/tf_record_reader_op.cc))
* @{tf.FixedLengthRecordReader}
([source in `kernels/fixed_length_record_reader_op.cc`](https://www.tensorflow.org/code/tensorflow/core/kernels/fixed_length_record_reader_op.cc))
* @{tf.TextLineReader}
([source in `kernels/text_line_reader_op.cc`](https://www.tensorflow.org/code/tensorflow/core/kernels/text_line_reader_op.cc))
You can see these all expose the same interface; the only differences
@ -44,15 +44,15 @@ two scalar tensors: a string key and a string value.
To create a new reader called `SomeReader`, you will need to:
1. In C++, define a subclass of
[`tensorflow::ReaderBase`](https://www.tensorflow.org/code/tensorflow/core/framework/reader_base.h)
called `SomeReader`.
2. In C++, register a new reader op and kernel with the name `"SomeReader"`.
3. In Python, define a subclass of @{tf.ReaderBase} called `SomeReader`.
You can put all the C++ code in a file in
`tensorflow/core/user_ops/some_reader_op.cc`. The code to read a file will live
in a descendant of the C++ `ReaderBase` class, which is defined in
[`tensorflow/core/framework/reader_base.h`](https://www.tensorflow.org/code/tensorflow/core/framework/reader_base.h).
You will need to implement the following methods:
* `OnWorkStartedLocked`: open the next file
@ -87,7 +87,7 @@ helper functions from
without modifying any arguments.
Next you will create the actual Reader op. It will help if you are familiar
with @{$adding_an_op$the adding an op how-to}. The main steps
are:
* Registering the op.
@ -167,8 +167,7 @@ REGISTER_KERNEL_BUILDER(Name("TextLineReader").Device(DEVICE_CPU),
```
The last step is to add the Python wrapper. You can either do this by
@{$adding_an_op#building_the_op_library$compiling a dynamic library}
or, if you are building TensorFlow from source, adding to `user_ops.py`.
For the latter, you will import `tensorflow.python.ops.io_ops` in
[`tensorflow/python/user_ops/user_ops.py`](https://www.tensorflow.org/code/tensorflow/python/user_ops/user_ops.py)
@ -195,28 +194,28 @@ You can see some examples in
## Writing an Op for a record format
Generally this is an ordinary op that takes a scalar string record as input, and
so follow @{$adding_an_op$the instructions to add an Op}.
You may optionally take a scalar string key as input, and include that in error
messages reporting improperly formatted data. That way users can more easily
track down where the bad data came from.
Examples of Ops useful for decoding records:
* @{tf.parse_single_example}
  (and @{tf.parse_example})
* @{tf.decode_csv}
* @{tf.decode_raw}
Note that it can be useful to use multiple Ops to decode a particular record
format. For example, you may have an image saved as a string in
[a `tf.train.Example` protocol buffer](https://www.tensorflow.org/code/tensorflow/core/example/example.proto).
Depending on the format of that image, you might take the corresponding output
from a
@{tf.parse_single_example}
op and call @{tf.image.decode_jpeg},
@{tf.image.decode_png}, or
@{tf.decode_raw}. It is common to
take the output of `tf.decode_raw` and use
@{tf.slice} and
@{tf.reshape} to extract pieces.
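
A hedged sketch of that slicing pattern (hypothetical record layout: one label
byte followed by a 28x28 image, with `value` coming from a Reader):

```python
# Reinterpret the record string as bytes, then slice out the label and image.
record_bytes = tf.decode_raw(value, tf.uint8)
label = tf.slice(record_bytes, [0], [1])
image = tf.reshape(tf.slice(record_bytes, [1], [28 * 28]), [28, 28])
```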

View File

@ -12,7 +12,7 @@ checkpoint file. Although it's most useful for embeddings, it will load any 2D
tensor, including your training weights.
To learn more about embeddings and how to train them, see the
@{$word2vec$Vector Representations of Words} tutorial.
If you are interested in embeddings of images, check out
[this article](http://colah.github.io/posts/2014-10-Visualizing-MNIST/) for
interesting visualizations of MNIST images. On the other hand, if you are
@ -21,7 +21,7 @@ interested in word embeddings,
gives a good introduction.
<video autoplay loop style="max-width: 100%;">
<source src="../../images/embedding-mnist.mp4" type="video/mp4">
<source src="../images/embedding-mnist.mp4" type="video/mp4">
Sorry, your browser doesn't support HTML5 video in MP4 format.
</video>
@ -46,7 +46,7 @@ in the same directory as your checkpoint file.
For in depth information on how to run TensorBoard and make sure you are
logging all the necessary information,
see @{$summaries_and_tensorboard$TensorBoard: Visualizing Learning}.
To visualize your embeddings, there are 3 things you need to do:
@ -101,7 +101,7 @@ embedding.tensor_name = embedding_var.name
embedding.metadata_path = os.path.join(LOG_DIR, 'metadata.tsv')
# Use the same LOG_DIR where you stored your checkpoint.
summary_writer = tf.summary.FileWriter(LOG_DIR)
# The next line writes a projector_config.pbtxt in the LOG_DIR. TensorBoard will
# read this file during startup.
@ -173,7 +173,7 @@ last data point in the bottom right:
Note in the example above that the last row doesn't have to be filled. For a
concrete example of a sprite, see
[this sprite image](../images/mnist_10k_sprite.png) of 10,000 MNIST digits
(100x100).
Note: We currently support sprites up to 8192px X 8192px.
@ -247,7 +247,7 @@ further analysis on their own with the "Isolate Points" button in the Inspector
pane on the right hand side.
![Selection of nearest neighbors](../images/embedding-nearest-points.png "Selection of nearest neighbors")
*Selection of the nearest neighbors of “important” in a word embedding dataset.*
The combination of filtering with custom projection can be powerful. Below, we filtered
@ -260,10 +260,10 @@ You can see that on the right side we have “ideas”, “science”, “perspe
<table width="100%;">
<tr>
<td style="width: 30%;">
<img src="../../images/embedding-custom-controls.png" alt="Custom controls panel" title="Custom controls panel" />
<img src="../images/embedding-custom-controls.png" alt="Custom controls panel" title="Custom controls panel" />
</td>
<td style="width: 70%;">
<img src="../../images/embedding-custom-projection.png" alt="Custom projection" title="Custom projection" />
<img src="../images/embedding-custom-projection.png" alt="Custom projection" title="Custom projection" />
</td>
</tr>
<tr>
@ -284,4 +284,4 @@ projection) as a small file. The Projector can then be pointed to a set of one
or more of these files, producing the panel below. Other users can then walk
through a sequence of bookmarks.
<img src="../../images/embedding-bookmark.png" alt="Bookmark panel" style="width:300px;">
<img src="../images/embedding-bookmark.png" alt="Bookmark panel" style="width:300px;">

View File

@ -0,0 +1,447 @@
# Getting Started With TensorFlow
This guide gets you started programming in TensorFlow. Before using this guide,
@{$install$install TensorFlow}. To get the most out of
this guide, you should know the following:
* How to program in Python.
* At least a little bit about arrays.
* Ideally, something about machine learning. However, if you know little or
nothing about machine learning, then this is still the first guide you
should read.
TensorFlow provides multiple APIs. The lowest level API--TensorFlow Core--
provides you with complete programming control. We recommend TensorFlow Core for
machine learning researchers and others who require fine levels of control over
their models. The higher level APIs are built on top of TensorFlow Core. These
higher level APIs are typically easier to learn and use than TensorFlow Core. In
addition, the higher level APIs make repetitive tasks easier and more consistent
between different users. A high-level API like tf.contrib.learn helps you manage
data sets, estimators, training and inference. Note that a few of the high-level
TensorFlow APIs--those whose method names contain `contrib`--are still in
development. It is possible that some `contrib` methods will change or become
obsolete in subsequent TensorFlow releases.
This guide begins with a tutorial on TensorFlow Core. Later, we
demonstrate how to implement the same model in tf.contrib.learn. Knowing
TensorFlow Core principles will give you a great mental model of how things are
working internally when you use the more compact higher level API.
# Tensors
The central unit of data in TensorFlow is the **tensor**. A tensor consists of a
set of primitive values shaped into an array of any number of dimensions. A
tensor's **rank** is its number of dimensions. Here are some examples of
tensors:
```python
3 # a rank 0 tensor; this is a scalar with shape []
[1., 2., 3.] # a rank 1 tensor; this is a vector with shape [3]
[[1., 2., 3.], [4., 5., 6.]] # a rank 2 tensor; a matrix with shape [2, 3]
[[[1., 2., 3.]], [[7., 8., 9.]]] # a rank 3 tensor with shape [2, 1, 3]
```
## TensorFlow Core tutorial
### Importing TensorFlow
The canonical import statement for TensorFlow programs is as follows:
```python
import tensorflow as tf
```
This gives Python access to all of TensorFlow's classes, methods, and symbols.
Most of the documentation assumes you have already done this.
### The Computational Graph
You might think of TensorFlow Core programs as consisting of two discrete
sections:
1. Building the computational graph.
2. Running the computational graph.
A **computational graph** is a series of TensorFlow operations arranged into a
graph of nodes.
Let's build a simple computational graph. Each node takes zero
or more tensors as inputs and produces a tensor as an output. One type of node
is a constant. Like all TensorFlow constants, it takes no inputs, and it outputs
a value it stores internally. We can create two floating point Tensors `node1`
and `node2` as follows:
```python
node1 = tf.constant(3.0, tf.float32)
node2 = tf.constant(4.0) # also tf.float32 implicitly
print(node1, node2)
```
The final print statement produces
```
Tensor("Const:0", shape=(), dtype=float32) Tensor("Const_1:0", shape=(), dtype=float32)
```
Notice that printing the nodes does not output the values `3.0` and `4.0` as you
might expect. Instead, they are nodes that, when evaluated, would produce 3.0
and 4.0, respectively. To actually evaluate the nodes, we must run the
computational graph within a **session**. A session encapsulates the control and
state of the TensorFlow runtime.
The following code creates a `Session` object and then invokes its `run` method
to run enough of the computational graph to evaluate `node1` and `node2`. By
running the computational graph in a session as follows:
```python
sess = tf.Session()
print(sess.run([node1, node2]))
```
we see the expected values of 3.0 and 4.0:
```
[3.0, 4.0]
```
We can build more complicated computations by combining `Tensor` nodes with
operations (operations are also nodes). For example, we can add our two
constant nodes and produce a new graph as follows:
```python
node3 = tf.add(node1, node2)
print("node3: ", node3)
print("sess.run(node3): ",sess.run(node3))
```
The last two print statements produce
```
node3: Tensor("Add_2:0", shape=(), dtype=float32)
sess.run(node3): 7.0
```
TensorFlow provides a utility called TensorBoard that can display a picture of
the computational graph. Here is a screenshot showing how TensorBoard
visualizes the graph:
![TensorBoard screenshot](../images/getting_started_add.png)
As it stands, this graph is not especially interesting because it always
produces a constant result. A graph can be parameterized to accept external
inputs, known as **placeholders**. A **placeholder** is a promise to provide a
value later.
```python
a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
adder_node = a + b # + provides a shortcut for tf.add(a, b)
```
The preceding three lines are a bit like a function or a lambda in which we
define two input parameters (a and b) and then an operation on them. We can
evaluate this graph with multiple inputs by using the feed_dict parameter to
specify Tensors that provide concrete values to these placeholders:
```python
print(sess.run(adder_node, {a: 3, b:4.5}))
print(sess.run(adder_node, {a: [1,3], b: [2, 4]}))
```
resulting in the output
```
7.5
[ 3. 7.]
```
In TensorBoard, the graph looks like this:
![TensorBoard screenshot](../images/getting_started_adder.png)
We can make the computational graph more complex by adding another operation.
For example,
```python
add_and_triple = adder_node * 3.
print(sess.run(add_and_triple, {a: 3, b:4.5}))
```
produces the output
```
22.5
```
The preceding computational graph would look as follows in TensorBoard:
![TensorBoard screenshot](../images/getting_started_triple.png)
In machine learning we will typically want a model that can take arbitrary
inputs, such as the one above. To make the model trainable, we need to be able
to modify the graph to get new outputs with the same input. **Variables** allow
us to add trainable parameters to a graph. They are constructed with a type and
initial value:
```python
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W * x + b
```
Constants are initialized when you call `tf.constant`, and their value can never
change. By contrast, variables are not initialized when you call `tf.Variable`.
To initialize all the variables in a TensorFlow program, you must explicitly
call a special operation as follows:
```python
init = tf.global_variables_initializer()
sess.run(init)
```
It is important to realize `init` is a handle to the TensorFlow sub-graph that
initializes all the global variables. Until we call `sess.run`, the variables
are uninitialized.
Since `x` is a placeholder, we can evaluate `linear_model` for several values of
`x` simultaneously as follows:
```python
print(sess.run(linear_model, {x:[1,2,3,4]}))
```
to produce the output
```
[ 0. 0.30000001 0.60000002 0.90000004]
```
We've created a model, but we don't know how good it is yet. To evaluate the
model on training data, we need a `y` placeholder to provide the desired values,
and we need to write a loss function.
A loss function measures how far apart the
current model is from the provided data. We'll use a standard loss model for
linear regression, which sums the squares of the deltas between the current
model and the provided data. `linear_model - y` creates a vector where each
element is the corresponding example's error delta. We call `tf.square` to
square that error. Then, we sum all the squared errors to create a single scalar
that abstracts the error of all examples using `tf.reduce_sum`:
```python
y = tf.placeholder(tf.float32)
squared_deltas = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_deltas)
print(sess.run(loss, {x:[1,2,3,4], y:[0,-1,-2,-3]}))
```
producing the loss value
```
23.66
```
We could improve this manually by reassigning the values of `W` and `b` to the
perfect values of -1 and 1. A variable is initialized to the value provided to
`tf.Variable` but can be changed using operations like `tf.assign`. For example,
`W=-1` and `b=1` are the optimal parameters for our model. We can change `W` and
`b` accordingly:
```python
fixW = tf.assign(W, [-1.])
fixb = tf.assign(b, [1.])
sess.run([fixW, fixb])
print(sess.run(loss, {x:[1,2,3,4], y:[0,-1,-2,-3]}))
```
The final print shows the loss now is zero.
```
0.0
```
We guessed the "perfect" values of `W` and `b`, but the whole point of machine
learning is to find the correct model parameters automatically. We will show
how to accomplish this in the next section.
## tf.train API
A complete discussion of machine learning is out of the scope of this tutorial.
However, TensorFlow provides **optimizers** that slowly change each variable in
order to minimize the loss function. The simplest optimizer is **gradient
descent**. It modifies each variable according to the magnitude of the
derivative of loss with respect to that variable. In general, computing symbolic
derivatives manually is tedious and error-prone. Consequently, TensorFlow can
automatically produce derivatives given only a description of the model using
the function `tf.gradients`. For simplicity, optimizers typically do this
for you. For example,
```python
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
```
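
As an aside, `tf.gradients` can also be called directly. A minimal sketch,
assuming the `loss`, `W`, and `b` defined above (not part of the original
example):

```python
# Symbolic gradients of the loss with respect to the model parameters.
grad_W, grad_b = tf.gradients(loss, [W, b])
```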
```python
sess.run(init) # reset values to incorrect defaults.
for i in range(1000):
  sess.run(train, {x:[1,2,3,4], y:[0,-1,-2,-3]})

print(sess.run([W, b]))
```
results in the final model parameters:
```
[array([-0.9999969], dtype=float32), array([ 0.99999082],
dtype=float32)]
```
Now we have done actual machine learning! Although doing this simple linear
regression doesn't require much TensorFlow core code, more complicated models
and methods to feed data into your model necessitate more code. Thus TensorFlow
provides higher level abstractions for common patterns, structures, and
functionality. We will learn how to use some of these abstractions in the
next section.
### Complete program
The completed trainable linear regression model is shown here:
```python
import numpy as np
import tensorflow as tf
# Model parameters
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
# Model input and output
x = tf.placeholder(tf.float32)
linear_model = W * x + b
y = tf.placeholder(tf.float32)
# loss
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# training data
x_train = [1,2,3,4]
y_train = [0,-1,-2,-3]
# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(1000):
  sess.run(train, {x:x_train, y:y_train})
# evaluate training accuracy
curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x:x_train, y:y_train})
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))
```
When run, it produces
```
W: [-0.9999969] b: [ 0.99999082] loss: 5.69997e-11
```
This more complicated program can still be visualized in TensorBoard
![TensorBoard final model visualization](../images/getting_started_final.png)
## `tf.contrib.learn`
`tf.contrib.learn` is a high-level TensorFlow library that simplifies the
mechanics of machine learning, including the following:
* running training loops
* running evaluation loops
* managing data sets
* managing feeding
tf.contrib.learn defines many common models.
### Basic usage
Notice how much simpler the linear regression program becomes with
`tf.contrib.learn`:
```python
import tensorflow as tf
# NumPy is often used to load, manipulate and preprocess data.
import numpy as np
# Declare list of features. We only have one real-valued feature. There are many
# other types of columns that are more complicated and useful.
features = [tf.contrib.layers.real_valued_column("x", dimension=1)]
# An estimator is the front end to invoke training (fitting) and evaluation
# (inference). There are many predefined types like linear regression,
# logistic regression, linear classification, logistic classification, and
# many neural network classifiers and regressors. The following code
# provides an estimator that does linear regression.
estimator = tf.contrib.learn.LinearRegressor(feature_columns=features)
# TensorFlow provides many helper methods to read and set up data sets.
# Here we use `numpy_input_fn`. We have to tell the function how many batches
# of data (num_epochs) we want and how big each batch should be.
x = np.array([1., 2., 3., 4.])
y = np.array([0., -1., -2., -3.])
input_fn = tf.contrib.learn.io.numpy_input_fn({"x":x}, y, batch_size=4,
num_epochs=1000)
# We can invoke 1000 training steps by invoking the `fit` method and passing the
# training data set.
estimator.fit(input_fn=input_fn, steps=1000)
# Here we evaluate how well our model did. In a real example, we would want
# to use a separate validation and testing data set to avoid overfitting.
estimator.evaluate(input_fn=input_fn)
```
When run, it produces
```
{'global_step': 1000, 'loss': 1.9650059e-11}
```
### A custom model
`tf.contrib.learn` does not lock you into its predefined models. Suppose we
wanted to create a custom model that is not built into TensorFlow. We can still
retain the high level abstraction of data set, feeding, training, etc. of
`tf.contrib.learn`. For illustration, we will show how to implement our own
equivalent model to `LinearRegressor` using our knowledge of the lower level
TensorFlow API.
To define a custom model that works with `tf.contrib.learn`, we need to use
`tf.contrib.learn.Estimator`. `tf.contrib.learn.LinearRegressor` is actually
a sub-class of `tf.contrib.learn.Estimator`. Instead of sub-classing
`Estimator`, we simply provide `Estimator` a function `model_fn` that tells
`tf.contrib.learn` how it can evaluate predictions, training steps, and
loss. The code is as follows:
```python
import numpy as np
import tensorflow as tf
# Declare list of features, we only have one real-valued feature
def model(features, labels, mode):
  # Build a linear model and predict values
  W = tf.get_variable("W", [1], dtype=tf.float64)
  b = tf.get_variable("b", [1], dtype=tf.float64)
  y = W*features['x'] + b
  # Loss sub-graph
  loss = tf.reduce_sum(tf.square(y - labels))
  # Training sub-graph
  global_step = tf.train.get_global_step()
  optimizer = tf.train.GradientDescentOptimizer(0.01)
  train = tf.group(optimizer.minimize(loss),
                   tf.assign_add(global_step, 1))
  # ModelFnOps connects subgraphs we built to the
  # appropriate functionality.
  return tf.contrib.learn.ModelFnOps(
      mode=mode, predictions=y,
      loss=loss,
      train_op=train)
estimator = tf.contrib.learn.Estimator(model_fn=model)
# define our data set
x = np.array([1., 2., 3., 4.])
y = np.array([0., -1., -2., -3.])
input_fn = tf.contrib.learn.io.numpy_input_fn({"x": x}, y, 4, num_epochs=1000)
# train
estimator.fit(input_fn=input_fn, steps=1000)
# evaluate our model
print(estimator.evaluate(input_fn=input_fn, steps=10))
```
When run, it produces
```python
{'loss': 5.9819476e-11, 'global_step': 1000}
```
Notice how the contents of the custom `model()` function are very similar
to our manual model training loop from the lower level API.
## Next steps
Now you have a working knowledge of the basics of TensorFlow. We have several
more tutorials that you can look at to learn more. If you are a beginner in
machine learning, see @{$beginners$MNIST for beginners},
otherwise see @{$pros$Deep MNIST for experts}.

View File

@ -2,10 +2,10 @@
TensorFlow computation graphs are powerful but complicated. The graph visualization can help you understand and debug them. Here's an example of the visualization at work.
![Visualization of a TensorFlow graph](../images/graph_vis_animation.gif "Visualization of a TensorFlow graph")
*Visualization of a TensorFlow graph.*
To see your own graph, run TensorBoard pointing it to the log directory of the job, click on the graph tab on the top pane and select the appropriate run using the menu at the upper left corner. For in depth information on how to run TensorBoard and make sure you are logging all the necessary information, see @{$summaries_and_tensorboard$TensorBoard: Visualizing Learning}.
## Name scoping and nodes
@ -15,7 +15,7 @@ variable names can be scoped and the visualization uses this information to
define a hierarchy on the nodes in the graph. By default, only the top of this
hierarchy is shown. Here is an example that defines three operations under the
`hidden` name scope using
@{tf.name_scope}:
```python
import tensorflow as tf
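
# A hedged reconstruction of the elided example: three ops under one scope.
with tf.name_scope('hidden') as scope:
  a = tf.constant(5, name='alpha')
  W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0), name='weights')
  b = tf.Variable(tf.zeros([1]), name='biases')
```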
@ -43,10 +43,10 @@ expanded states.
<table width="100%;">
<tr>
<td style="width: 50%;">
<img src="../../images/pool1_collapsed.png" alt="Unexpanded name scope" title="Unexpanded name scope" />
<img src="../images/pool1_collapsed.png" alt="Unexpanded name scope" title="Unexpanded name scope" />
</td>
<td style="width: 50%;">
<img src="../../images/pool1_expanded.png" alt="Expanded name scope" title="Expanded name scope" />
<img src="../images/pool1_expanded.png" alt="Expanded name scope" title="Expanded name scope" />
</td>
</tr>
<tr>
@ -87,10 +87,10 @@ and the auxiliary area.
<table width="100%;">
<tr>
<td style="width: 50%;">
<img src="../../images/conv_1.png" alt="conv_1 is part of the main graph" title="conv_1 is part of the main graph" />
<img src="../images/conv_1.png" alt="conv_1 is part of the main graph" title="conv_1 is part of the main graph" />
</td>
<td style="width: 50%;">
<img src="../../images/save.png" alt="save is extracted as auxiliary node" title="save is extracted as auxiliary node" />
<img src="../images/save.png" alt="save is extracted as auxiliary node" title="save is extracted as auxiliary node" />
</td>
</tr>
<tr>
@ -114,10 +114,10 @@ specific set of nodes.
<table width="100%;">
<tr>
<td style="width: 50%;">
<img src="../../images/series.png" alt="Sequence of nodes" title="Sequence of nodes" />
<img src="../images/series.png" alt="Sequence of nodes" title="Sequence of nodes" />
</td>
<td style="width: 50%;">
<img src="../../images/series_expanded.png" alt="Expanded sequence of nodes" title="Expanded sequence of nodes" />
<img src="../images/series_expanded.png" alt="Expanded sequence of nodes" title="Expanded sequence of nodes" />
</td>
</tr>
<tr>
@ -135,15 +135,15 @@ for constants and summary nodes. To summarize, here's a table of node symbols:
Symbol | Meaning
--- | ---
![Name scope](../images/namespace_node.png "Name scope") | *High-level* node representing a name scope. Double-click to expand a high-level node.
![Sequence of unconnected nodes](../images/horizontal_stack.png "Sequence of unconnected nodes") | Sequence of numbered nodes that are not connected to each other.
![Sequence of connected nodes](../images/vertical_stack.png "Sequence of connected nodes") | Sequence of numbered nodes that are connected to each other.
![Operation node](../images/op_node.png "Operation node") | An individual operation node.
![Constant node](../images/constant.png "Constant node") | A constant.
![Summary node](../images/summary.png "Summary node") | A summary node.
![Data flow edge](../images/dataflow_edge.png "Data flow edge") | Edge showing the data flow between operations.
![Control dependency edge](../images/control_edge.png "Control dependency edge") | Edge showing the control dependency between operations.
![Reference edge](../images/reference_edge.png "Reference edge") | A reference edge showing that the outgoing operation node can mutate the incoming tensor.
## Interaction {#interaction}
@ -161,10 +161,10 @@ right corner of the visualization.
<table width="100%;">
<tr>
<td style="width: 50%;">
<img src="../../images/infocard.png" alt="Info card of a name scope" title="Info card of a name scope" />
<img src="../images/infocard.png" alt="Info card of a name scope" title="Info card of a name scope" />
</td>
<td style="width: 50%;">
<img src="../../images/infocard_op.png" alt="Info card of operation node" title="Info card of operation node" />
<img src="../images/infocard_op.png" alt="Info card of operation node" title="Info card of operation node" />
</td>
</tr>
<tr>
@ -207,10 +207,10 @@ The images below give an illustration for a piece of a real-life graph.
<table width="100%;">
<tr>
<td style="width: 50%;">
<img src="../../images/colorby_structure.png" alt="Color by structure" title="Color by structure" />
<img src="../images/colorby_structure.png" alt="Color by structure" title="Color by structure" />
</td>
<td style="width: 50%;">
<img src="../../images/colorby_device.png" alt="Color by device" title="Color by device" />
<img src="../images/colorby_device.png" alt="Color by device" title="Color by device" />
</td>
</tr>
<tr>
@ -228,12 +228,12 @@ The images below give an illustration for a piece of a real-life graph.
When the serialized `GraphDef` includes tensor shapes, the graph visualizer
labels edges with tensor dimensions, and edge thickness reflects total tensor
size. To include tensor shapes in the `GraphDef` pass the actual graph object
(as in `sess.graph`) to the `FileWriter` when serializing the graph.
The images below show the CIFAR-10 model with tensor shape information:
<table width="100%;">
<tr>
<td style="width: 100%;">
<img src="../../images/tensor_shapes.png" alt="CIFAR-10 model with tensor shape information" title="CIFAR-10 model with tensor shape information" />
<img src="../images/tensor_shapes.png" alt="CIFAR-10 model with tensor shape information" title="CIFAR-10 model with tensor shape information" />
</td>
</tr>
<tr>
@ -248,8 +248,8 @@ The images below show the CIFAR-10 model with tensor shape information:
Often it is useful to collect runtime metadata for a run, such as total memory
usage, total compute time, and tensor shapes for nodes. The code example below
is a snippet from the train and test section of a modification of the
@{$beginners$simple MNIST tutorial},
in which we have recorded summaries and runtime statistics. See the @{$summaries_and_tensorboard#serializing-the-data$Summaries Tutorial}
for details on how to record summaries.
Full source is [here](https://www.tensorflow.org/code/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py).
@ -303,13 +303,13 @@ tensor output sizes.
<table width="100%;">
<tr style="height: 380px">
<td>
<img src="../../images/colorby_compute_time.png" alt="Color by compute time" title="Color by compute time"/>
<img src="../images/colorby_compute_time.png" alt="Color by compute time" title="Color by compute time"/>
</td>
<td>
<img src="../../images/run_metadata_graph.png" alt="Run metadata graph" title="Run metadata graph" />
<img src="../images/run_metadata_graph.png" alt="Run metadata graph" title="Run metadata graph" />
</td>
<td>
<img src="../../images/run_metadata_infocard.png" alt="Run metadata info card" title="Run metadata info card" />
<img src="../images/run_metadata_infocard.png" alt="Run metadata info card" title="Run metadata info card" />
</td>
</tr>
</table>

View File

@ -10,8 +10,7 @@ median house values.
When training a neural network using tf.contrib.learn, it's possible to pass
your feature and target data directly into your `fit`, `evaluate`, or `predict`
operations. Here's an example taken from the @{$tflearn$tf.contrib.learn quickstart tutorial}:
```py
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
@ -105,7 +104,7 @@ This corresponds to the following dense tensor:
```
For more on `SparseTensor`, see @{tf.SparseTensor}.
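
For instance, a hedged sketch of constructing one (hypothetical values):

```python
# A 3x5 sparse tensor with non-zero entries at [0, 1] and [2, 4].
sparse = tf.SparseTensor(indices=[[0, 1], [2, 4]],
                         values=[6.0, 0.5],
                         dense_shape=[3, 5])
```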
### Passing input_fn Data to Your Model
@ -214,8 +213,7 @@ here](https://www.tensorflow.org/code/tensorflow/examples/tutorials/input_fn/bos
### Importing the Housing Data
To start, set up your imports (including `pandas` and `tensorflow`) and @{$monitors#enabling-logging-with-tensorflow$set logging verbosity} to
`INFO` for more detailed log output:
```python
@ -233,8 +231,8 @@ tf.logging.set_verbosity(tf.logging.INFO)
Define the column names for the data set in `COLUMNS`. To distinguish features
from the label, also define `FEATURES` and `LABEL`. Then read the three CSVs
([train](http://download.tensorflow.org/data/boston_train.csv),
[test](http://download.tensorflow.org/data/boston_test.csv), and
[predict](http://download.tensorflow.org/data/boston_predict.csv)) into _pandas_
`DataFrame`s:
@ -266,9 +264,9 @@ feature_cols = [tf.contrib.layers.real_valued_column(k)
```
NOTE: For a more in-depth overview of feature columns, see
@{$linear#feature-columns-and-transformations$this introduction},
and for an example that illustrates how to define `FeatureColumns` for
categorical data, see the @{$wide$Linear Model Tutorial}.
Now, instantiate a `DNNRegressor` for the neural network regression model.
You'll need to provide two arguments here: `hidden_units`, a hyperparameter
@ -376,16 +374,16 @@ This tutorial focused on creating an `input_fn` for a neural network regressor.
To learn more about using `input_fn`s for other types of models, check out the
following resources:
* @{$linear$Large-scale Linear Models with TensorFlow}: This
introduction to linear models in TensorFlow provides a high-level overview
of feature columns and techniques for transforming input data.
* @{$wide$TensorFlow Linear Model Tutorial}: This tutorial covers
creating `FeatureColumn`s and an `input_fn` for a linear classification
model that predicts income range based on census data.
* [TensorFlow Wide & Deep Learning Tutorial](../wide_and_deep/index.md): Building on
the [Linear Model Tutorial](../wide/index.md), this tutorial covers
* @{$wide_and_deep$TensorFlow Wide & Deep Learning Tutorial}: Building on
the @{$wide$Linear Model Tutorial}, this tutorial covers
`FeatureColumn` and `input_fn` creation for a "wide and deep" model that
combines a linear model and a neural network using
`DNNLinearCombinedClassifier`.

View File

@ -0,0 +1,10 @@
get_started.md
mnist/beginners.md
mnist/pros.md
mnist/mechanics.md
tflearn.md
input_fn.md
summaries_and_tensorboard.md
embedding_viz.md
graph_viz.md
monitors.md

View File

@ -3,8 +3,8 @@
*This tutorial is intended for readers who are new to both machine learning and
TensorFlow. If you already know what MNIST is, and what softmax (multinomial
logistic) regression is, you might prefer this
@{$pros$faster paced tutorial}. Be sure to
@{$install$install TensorFlow} before starting either
tutorial.*
When one learns how to program, there's a tradition that the first thing you do
@ -15,7 +15,7 @@ MNIST is a simple computer vision dataset. It consists of images of handwritten
digits like these:
<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../../images/MNIST.png">
<img style="width:100%" src="../../images/MNIST.png">
</div>
It also includes labels for each image, telling us which digit it is. For
@ -88,7 +88,7 @@ Each image is 28 pixels by 28 pixels. We can interpret this as a big array of
numbers:
<div style="width:50%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../../images/MNIST-Matrix.png">
<img style="width:100%" src="../../images/MNIST-Matrix.png">
</div>
We can flatten this array into a vector of 28x28 = 784 numbers. It doesn't
@ -110,7 +110,7 @@ Each entry in the tensor is a pixel intensity between 0 and 1, for a particular
pixel in a particular image.
<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../../images/mnist-train-xs.png">
<img style="width:100%" src="../../images/mnist-train-xs.png">
</div>
Each image in MNIST has a corresponding label, a number between 0 and 9
@ -124,7 +124,7 @@ vector which is 1 in the \\(n\\)th dimension. For example, 3 would be
`[55000, 10]` array of floats.
<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../../images/mnist-train-ys.png">
<img style="width:100%" src="../../images/mnist-train-ys.png">
</div>
We're now ready to actually make our model!
@ -157,7 +157,7 @@ classes. Red represents negative weights, while blue represents positive
weights.
<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../../images/softmax-weights.png">
<img style="width:100%" src="../../images/softmax-weights.png">
</div>
We also add some extra evidence called a bias. Basically, we want to be able
@ -202,13 +202,13 @@ although with a lot more \\(x\\)s. For each output, we compute a weighted sum of
the \\(x\\)s, add a bias, and then apply softmax.
<div style="width:55%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../../images/softmax-regression-scalargraph.png">
<img style="width:100%" src="../../images/softmax-regression-scalargraph.png">
</div>
If we write that out as equations, we get:
<div style="width:52%; margin-left:25%; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../../images/softmax-regression-scalarequation.png">
<img style="width:100%" src="../../images/softmax-regression-scalarequation.png">
</div>
We can "vectorize" this procedure, turning it into a matrix multiplication
@ -216,7 +216,7 @@ and vector addition. This is helpful for computational efficiency. (It's also
a useful way to think.)
<div style="width:50%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../../images/softmax-regression-vectorequation.png">
<img style="width:100%" src="../../images/softmax-regression-vectorequation.png">
</div>
More compactly, we can just write:
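
$$y = \text{softmax}(Wx + b)$$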
@ -368,7 +368,7 @@ In this case, we ask TensorFlow to minimize `cross_entropy` using the
with a learning rate of 0.5. Gradient descent is a simple procedure, where
TensorFlow simply shifts each variable a little bit in the direction that
reduces the cost. But TensorFlow also provides
@{$python/train#optimizers$many other optimization algorithms}:
using one is as simple as tweaking one line.
What TensorFlow actually does here, behind the scenes, is to add new operations
@ -449,5 +449,5 @@ this
What matters is that we learned from this model. Still, if you're feeling a bit
down about these results, check out
@{$pros$the next tutorial} where we do a lot
better, and learn how to build more sophisticated models using TensorFlow!

View File

@ -10,7 +10,8 @@ TensorFlow.
These tutorials are not intended for teaching Machine Learning in general.
Please ensure you have followed the instructions to
@{$install$install TensorFlow}.
## Tutorial Files
@ -33,7 +34,7 @@ MNIST is a classic problem in machine learning. The problem is to look at
greyscale 28x28 pixel images of handwritten digits and determine which digit
the image represents, for all the digits from zero to nine.
![MNIST Digits](../../images/mnist_digits.png "MNIST Digits")
For more information, refer to [Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/)
or [Chris Olah's visualizations of MNIST](http://colah.github.io/posts/2014-10-Visualizing-MNIST/).
@ -60,7 +61,7 @@ Dataset | Purpose
### Inputs and Placeholders
The `placeholder_inputs()` function creates two @{tf.placeholder}
ops that define the shape of the inputs, including the `batch_size`, to the
rest of the graph and into which the actual training examples will be fed.
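
A hedged sketch of those two placeholders (assuming `IMAGE_PIXELS` is the
flattened image size, e.g. 28 * 28):

```python
images_placeholder = tf.placeholder(tf.float32, shape=(batch_size, IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size,))
```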
@ -89,7 +90,7 @@ loss.
and apply gradients.
<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../../images/mnist_subgraph.png">
<img style="width:100%" src="../../images/mnist_subgraph.png">
</div>
### Inference
@ -101,7 +102,7 @@ It takes the images placeholder as input and builds on top
of it a pair of fully connected layers with [ReLU](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) activation followed by a ten
node linear layer specifying the output logits.
Each layer is created beneath a unique @{tf.name_scope}
that acts as a prefix to the items created within that scope.
```python
@ -109,7 +110,7 @@ with tf.name_scope('hidden1'):
```
Within the defined scope, the weights and biases to be used by each of these
layers are generated into @{tf.Variable}
instances, with their desired shapes:
```python
@ -127,7 +128,7 @@ name given to the weights variable would be "`hidden1/weights`".
Each variable is given initializer ops as part of their construction.
In this most common case, the weights are initialized with the
@{tf.truncated_normal}
and given their shape of a 2-D tensor with
the first dim representing the number of units in the layer from which the
weights connect and the second dim representing the number of
@ -137,12 +138,12 @@ weights are connecting the image inputs to the hidden1 layer. The
`tf.truncated_normal` initializer generates a random distribution with a given
mean and standard deviation.
Then the biases are initialized with @{tf.zeros}
to ensure they start with all zero values, and their shape is simply the number
of units in the layer to which they connect.
The graph's three primary ops -- two @{tf.nn.relu}
ops wrapping @{tf.matmul}
for the hidden layers and one extra `tf.matmul` for the logits -- are then
created, each in turn, with separate `tf.Variable` instances connected to each
of the input placeholders or the output tensors of the previous layer.
@ -166,7 +167,7 @@ Finally, the `logits` tensor that will contain the output is returned.
The `loss()` function further builds the graph by adding the required loss
ops.
First, the values from the `labels_placeholder` are converted to 64-bit integers. Then, a @{tf.nn.sparse_softmax_cross_entropy_with_logits} op is added to automatically produce 1-hot labels from the `labels_placeholder` and compare the output logits from the `inference()` function with those 1-hot labels.
```python
labels = tf.to_int64(labels)
@ -174,7 +175,7 @@ cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=labels, logits=logits, name='xentropy')
```
It then uses @{tf.reduce_mean}
to average the cross entropy values across the batch dimension (the first
dimension) as the total loss.
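
A hedged sketch of that step, assuming the `cross_entropy` op from above:

```python
loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
```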
@ -195,16 +196,16 @@ The `training()` function adds the operations needed to minimize the loss via
[Gradient Descent](https://en.wikipedia.org/wiki/Gradient_descent).
Firstly, it takes the loss tensor from the `loss()` function and hands it to a
@{tf.summary.scalar},
an op for generating summary values into the events file when used with a
@{tf.summary.FileWriter} (see below). In this case, it will emit the snapshot value of
the loss every time the summaries are written out.
```python
tf.summary.scalar('loss', loss)
```
Next, we instantiate a @{tf.train.GradientDescentOptimizer}
responsible for applying gradients with the requested learning rate.
```python
@ -212,7 +213,7 @@ optimizer = tf.train.GradientDescentOptimizer(learning_rate)
```
We then generate a single variable to contain a counter for the global
training step and the @{tf.train.Optimizer.minimize}
op is used to both update the trainable weights in the system and increment the
global step. This op is, by convention, known as the `train_op` and is what must
be run by a TensorFlow session in order to induce one full step of training
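
A hedged sketch of that pattern (assumed reconstruction, not the verbatim
source):

```python
# Create the step counter and the train_op that increments it on each step.
global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)
```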
@ -232,7 +233,7 @@ controlled by the user code in `fully_connected_feed.py`.
At the top of the `run_training()` function is a python `with` command that
indicates all of the built ops are to be associated with the default
global @{tf.Graph}
instance.
```python
@ -248,7 +249,7 @@ this simple tutorial.
### The Session
Once all of the build preparation has been completed and all of the necessary
ops generated, a @{tf.Session}
is created for running the graph.
```python
@ -265,7 +266,7 @@ The empty parameter to session indicates that this code will attach to
(or create if not yet created) the default local session.
Immediately after creating the session, all of the `tf.Variable`
instances are initialized by calling @{tf.Session.run}
on their initialization op.
```python
@ -273,10 +274,10 @@ init = tf.global_variables_initializer()
sess.run(init)
```
The @{tf.Session.run}
method will run the complete subset of the graph that
corresponds to the op(s) passed as parameters. In this first call, the `init`
op is a @{tf.group}
that contains only the initializers for the variables. None of the rest of the
graph is run here; that happens in the training loop below.
@ -355,20 +356,20 @@ if step % 100 == 0:
#### Visualize the Status
In order to emit the events files used by @{$summaries_and_tensorboard$TensorBoard},
all of the summaries (in this case, only one) are collected into a single Tensor
during the graph building phase.
```python
summary = tf.summary.merge_all()
```
And then after the session is created, a @{tf.summary.FileWriter}
may be instantiated to write the events files, which
contain both the graph itself and the values of the summaries.
```python
summary_writer = tf.summary.FileWriter(FLAGS.train_dir, sess.graph)
```
Lastly, the events file will be updated with new summary values every time the
@ -383,21 +384,21 @@ summary_writer.add_summary(summary_str, step)
When the events files are written, TensorBoard may be run against the training
folder to display the values from the summaries.
![MNIST TensorBoard](../../images/mnist_tensorboard.png "MNIST TensorBoard")
**NOTE**: For more info about how to build and run TensorBoard, please see the accompanying tutorial @{$summaries_and_tensorboard$TensorBoard: Visualizing Learning}.
#### Save a Checkpoint
In order to emit a checkpoint file that may be used to later restore a model
for further training or evaluation, we instantiate a
@{tf.train.Saver}.
```python
saver = tf.train.Saver()
```
In the training loop, the @{tf.train.Saver.save}
method will periodically be called to write a checkpoint file to the training
directory with the current values of all the trainable variables.
@ -406,7 +407,7 @@ saver.save(sess, FLAGS.train_dir, global_step=step)
```
At some later point in the future, training might be resumed by using the
@{tf.train.Saver.restore}
method to reload the model parameters.
```python
@ -455,7 +456,7 @@ logits/labels parameters as the `loss()` function.
eval_correct = mnist.evaluation(logits, labels_placeholder)
```
The `evaluation()` function simply generates a @{tf.nn.in_top_k}
op that can automatically score each model output as correct if the true label
can be found in the K most-likely predictions. In this case, we set the value
of K to 1 to only consider a prediction correct if it is for the true label.
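As a minimal sketch of such an op (assuming the `logits` and `labels` tensors discussed above; the reduction to a count paraphrases the tutorial rather than quoting it):

```python
# True for each example whose true label is among the top K=1 predictions.
correct = tf.nn.in_top_k(logits, labels, 1)

# Count how many predictions in the batch were correct.
eval_correct = tf.reduce_sum(tf.cast(correct, tf.int32))
```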

View File

@ -8,8 +8,8 @@ TensorFlow model while constructing a deep convolutional MNIST classifier.
*This introduction assumes familiarity with neural networks and the MNIST
dataset. If you don't have
a background with them, check out the
@{$beginners$introduction for beginners}. Be sure to
@{$install$install TensorFlow} before starting.*
## About this tutorial
@ -63,12 +63,12 @@ programs is to first create a graph and then launch it in a session.
Here we instead use the convenient `InteractiveSession` class, which makes
TensorFlow more flexible about how you structure your code. It allows you to
interleave operations which build a
@{$get_started#the_computational_graph$computation graph}
with ones that run the graph. This is particularly convenient when working in
interactive contexts like IPython. If you are not using an
`InteractiveSession`, then you should build the entire computation graph before
starting a session and
@{$get_started#the_computational_graph$launching the graph}.
```python
import tensorflow as tf
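# (Sketch continuation, not the tutorial's verbatim code: InteractiveSession
# installs itself as the default session, so ops built afterwards can call
# .run() and .eval() directly.)
sess = tf.InteractiveSession()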
@ -93,11 +93,8 @@ similar to that used in Theano or Torch.
The role of the Python code is therefore to build this external computation
graph, and to dictate which parts of the computation graph should be run. See
the @{$get_started#the_computational_graph$Computation Graph}
section of @{$get_started} for more detail.
## Build a Softmax Regression Model
@ -187,7 +184,7 @@ Now that we have defined our model and training loss function, it is
straightforward to train using TensorFlow. Because TensorFlow knows the entire
computation graph, it can use automatic differentiation to find the gradients of
the loss with respect to each of the variables. TensorFlow has a variety of
@{$python/train#optimizers$built-in optimization algorithms}.
For this example, we will use steepest gradient descent, with a step length of
0.5, to descend the cross entropy.
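As a one-line sketch (assuming `cross_entropy` is the loss tensor defined earlier in this tutorial):

```python
# Steepest gradient descent with step length 0.5 on the cross-entropy loss.
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
```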
@ -357,7 +354,8 @@ We create a `placeholder` for the probability that a neuron's output is kept
during dropout. This allows us to turn dropout on during training, and turn it
off during testing.
TensorFlow's `tf.nn.dropout` op automatically handles scaling neuron outputs in
addition to masking them, so dropout just works without any additional
scaling.<sup id="a1">[1](#f1)</sup>
```python
keep_prob = tf.placeholder(tf.float32)
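# (Sketch continuation: h_fc1 stands in for the output of the preceding
# fully connected layer and is an assumption here, not the tutorial's
# verbatim code.)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)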

View File

@ -4,14 +4,14 @@ When training a model, its often valuable to track and evaluate progress in
real time. In this tutorial, youll learn how to use TensorFlows logging
capabilities and the `Monitor` API to audit the in-progress training of a neural
network classifier for categorizing irises. This tutorial builds on the code
developed in @{$tflearn$tf.contrib.learn Quickstart} so if you
haven't yet completed that tutorial, you may want to explore it first,
especially if you're looking for an intro/refresher on tf.contrib.learn basics.
## Setup {#setup}
For this tutorial, you'll be building upon the following code from
@{$tflearn$tf.contrib.learn Quickstart}:
```python
from __future__ import absolute_import
@ -65,7 +65,7 @@ if __name__ == "__main__":
Copy the above code into a file, and download the corresponding
[training](http://download.tensorflow.org/data/iris_training.csv) and
[test](http://download.tensorflow.org/data/iris_test.csv) data sets to the same
directory.
In the following sections, you'll progressively make updates to the above code
@ -75,7 +75,7 @@ here](https://www.tensorflow.org/code/tensorflow/examples/tutorials/monitors/iri
## Overview
The @{$tflearn$tf.contrib.learn Quickstart tutorial} walked through
how to implement a neural net classifier to categorize iris examples into one of
three species.
@ -98,7 +98,7 @@ One way to address this problem would be to split model training into multiple
`fit` calls with smaller numbers of steps in order to evaluate accuracy more
progressively. However, this is not recommended practice, as it greatly slows
down model training. Fortunately, tf.contrib.learn offers another solution: a
@{tf.contrib.learn.monitors$Monitor API} designed to help
you log metrics and evaluate your model while training is in progress. In the
following sections, you'll learn how to enable logging in TensorFlow, set up a
ValidationMonitor to do streaming evaluations, and visualize your metrics using
@ -149,7 +149,7 @@ Monitor | Description
------------------- | -----------
`CaptureVariable` | Saves a specified variable's values into a collection at every _n_ steps of training
`PrintTensor` | Logs a specified tensor's values at every _n_ steps of training
`SummarySaver` | Saves @{tf.Summary} [protocol buffers](https://developers.google.com/protocol-buffers/) for a given tensor using a @{tf.summary.FileWriter} at every _n_ steps of training
`ValidationMonitor` | Logs a specified set of evaluation metrics at every _n_ steps of training, and, if desired, implements early stopping under certain conditions
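As a hedged sketch of the last row in action (assuming the `training_set`, `test_set`, and `classifier` objects from the Setup code; the 50-step interval is illustrative):

```python
# Evaluate against the test set every 50 training steps.
validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
    test_set.data, test_set.target, every_n_steps=50)

# Monitors are attached at fit() time.
classifier.fit(x=training_set.data,
               y=training_set.target,
               steps=2000,
               monitors=[validation_monitor])
```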
### Evaluating Every *N* Steps
@ -173,7 +173,7 @@ Place this code right before the line instantiating the `classifier`.
`ValidationMonitor`s rely on saved checkpoints to perform evaluation operations,
so you'll want to modify instantiation of the `classifier` to add a
@{tf.contrib.learn.RunConfig} that includes
`save_checkpoints_secs`, which specifies how many seconds should elapse between
checkpoint saves during training. Because the iris data set is quite small, and
thus trains quickly, it makes sense to set `save_checkpoints_secs` to 1
(saving a checkpoint every second).
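A sketch of that change, assuming the `feature_columns` defined in the Setup code:

```python
# Save a checkpoint every second so the ValidationMonitor always has a
# fresh checkpoint to evaluate against.
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 20, 10],
    n_classes=3,
    model_dir="/tmp/iris_model",
    config=tf.contrib.learn.RunConfig(save_checkpoints_secs=1))
```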
@ -234,11 +234,10 @@ object.
The `MetricSpec` constructor accepts four parameters:
* `metric_fn`. The function that calculates and returns the value of a metric.
This can be a predefined function available in the
@{tf.contrib.metrics} module, such as
@{tf.contrib.metrics.streaming_precision} or
@{tf.contrib.metrics.streaming_recall}.
Alternatively, you can define your own custom metric function, which must
take `predictions` and `labels` tensors as arguments (a `weights` argument
@ -254,10 +253,10 @@ The `MetricSpec` constructor accepts four parameters:
by the model. This argument may be omitted if the model returns either a
single tensor or a dict with a single entry. For a `DNNClassifier` model,
class predictions will be returned in a tensor with the key
@{tf.contrib.learn.PredictionKey.CLASSES}.
* `label_key`. The key of the tensor containing the labels returned by the
model, as specified by the model's @{$input_fn$`input_fn`}. As
with `prediction_key`, this argument may be omitted if the `input_fn`
returns either a single tensor or a dict with a single entry. In the iris
example in this tutorial, the `DNNClassifier` does not have an `input_fn`
@ -265,20 +264,17 @@ The `MetricSpec` constructor accepts four parameters:
a `label_key`.
* `weights_key`. *Optional*. The key of the tensor (returned by the
@{$input_fn$`input_fn`}) containing weights inputs for the
`metric_fn`.
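Before the full dict shown next, here is a hedged sketch of a single `MetricSpec` (the `"classes"` prediction key follows the `DNNClassifier` behavior described above):

```python
from tensorflow.contrib import metrics as metrics_lib
from tensorflow.contrib.learn import MetricSpec

# Accuracy computed from the model's class predictions.
accuracy_spec = MetricSpec(metric_fn=metrics_lib.streaming_accuracy,
                           prediction_key="classes")
```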
The following code creates a `validation_metrics` dict that defines three
metrics to log during model evaluation:
* `"accuracy"`, using
[`streaming_accuracy`](https://www.tensorflow.org/code/tensorflow/contrib/metrics/python/ops/metric_ops.py)
* `"accuracy"`, using @{tf.contrib.metrics.streaming_accuracy}
as the `metric_fn`
* `"precision"`, using
[`streaming_precision`](https://www.tensorflow.org/code/tensorflow/contrib/metrics/python/ops/metric_ops.py)
* `"precision"`, using @{tf.contrib.metrics.streaming_precision}
as the `metric_fn`
* `"recall"`, using
[`streaming_recall`](https://www.tensorflow.org/code/tensorflow/contrib/metrics/python/ops/metric_ops.py)
* `"recall"`, using @{tf.contrib.metrics.streaming_recall}
as the `metric_fn`
```python
@ -329,8 +325,8 @@ INFO:tensorflow:Validation (step 1500): recall = 1.0, loss = 0.0617403, global_s
Note that in the above log output, by step 600, the model has already achieved
precision and recall rates of 1.0. This raises the question as to whether model
training could benefit from
[early stopping](https://en.wikipedia.org/wiki/Early_stopping).
In addition to logging eval metrics, `ValidationMonitor`s make it easy to
implement early stopping when specified conditions are met, via three params:
@ -408,9 +404,6 @@ Then navigate to `http://0.0.0.0:`*`<port_number>`* in your browser, where
If you click on the accuracy field, you'll see an image like the following,
which shows accuracy plotted against step count:
![Accuracy over step count in TensorBoard](../images/validation_monitor_tensorboard_accuracy.png "Accuracy over step count in TensorBoard")
For more on using TensorBoard, see @{$summaries_and_tensorboard$TensorBoard: Visualizing Learning} and @{$graph_viz$TensorBoard: Graph Visualization}.

View File

@ -8,7 +8,7 @@ your TensorFlow graph, plot quantitative metrics about the execution of your
graph, and show additional data like images that pass through it. When
TensorBoard is fully configured, it looks like this:
![MNIST TensorBoard](../images/mnist_tensorboard.png "MNIST TensorBoard")
This tutorial is intended to get you started with simple TensorBoard usage.
@ -24,12 +24,12 @@ lifecycle for summary data within TensorBoard.
First, create the TensorFlow graph that you'd like to collect summary
data from, and decide which nodes you would like to annotate with
@{$python/summary$summary operations}.
For example, suppose you are training a convolutional neural network for
recognizing MNIST digits. You'd like to record how the learning rate
varies over time, and how the objective function is changing. Collect these by
attaching @{tf.summary.scalar} ops
to the nodes that output the learning rate and loss respectively. Then, give
each `tf.summary.scalar` op a meaningful `tag`, like `'learning rate'` or `'loss
function'`.
@ -37,24 +37,24 @@ function'`.
Perhaps you'd also like to visualize the distributions of activations coming
off a particular layer, or the distribution of gradients or weights. Collect
this data by attaching
@{tf.summary.histogram} ops to
the gradient outputs and to the variable that holds your weights, respectively.
For details on all of the summary operations available, check out the docs on
@{$python/summary$summary operations}.
Operations in TensorFlow don't do anything until you run them, or an op that
depends on their output. And the summary nodes that we've just created are
peripheral to your graph: none of the ops you are currently running depend on
them. So, to generate summaries, we need to run all of these summary nodes.
Managing them by hand would be tedious, so use
@{tf.summary.merge_all}
to combine them into a single op that generates all the summary data.
Then, you can just run the merged summary op, which will generate a serialized
`Summary` protobuf object with all of your summary data at a given step.
Finally, to write this summary data to disk, pass the summary protobuf to a
@{tf.summary.FileWriter}.
The `FileWriter` takes a logdir in its constructor - this logdir is quite
important, it's the directory where all of the events will be written out.
@ -62,7 +62,7 @@ Also, the `FileWriter` can optionally take a `Graph` in its constructor.
If it receives a `Graph` object, then TensorBoard will visualize your graph
along with tensor shape information. This will give you a much better sense of
what flows through the graph: see
@{$graph_viz#tensor-shape-information$Tensor shape information}.
Now that you've modified your graph and have a `FileWriter`, you're ready to
start running your network! If you want, you could run the merged summary op
@ -71,7 +71,7 @@ data than you need, though. Instead, consider running the merged summary op
every `n` steps.
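Before the full example referenced below, here is a minimal sketch of that every-`n`-steps pattern (names such as `train_step` and `feed_dict` are assumptions standing in for your own training op and inputs):

```python
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('/tmp/mnist_logs', sess.graph)

for step in range(1000):
    if step % 10 == 0:
        # Record summary data alongside a training step.
        summary, _ = sess.run([merged, train_step], feed_dict=feed_dict)
        writer.add_summary(summary, step)
    else:
        sess.run(train_step, feed_dict=feed_dict)
```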
The code example below is a modification of the
@{$beginners$simple MNIST tutorial},
in which we have added some summary ops, and run them every ten steps. If you
run this and then launch `tensorboard --logdir=/tmp/mnist_logs`, you'll be able
to visualize statistics, such as how the weights or accuracy varied during
training.
@ -209,7 +209,7 @@ When looking at TensorBoard, you will see the navigation tabs in the top right
corner. Each tab represents a set of serialized data that can be visualized.
For in depth information on how to use the *graph* tab to visualize your graph,
see @{$graph_viz$TensorBoard: Graph Visualization}.
For more usage information on TensorBoard in general, see the [TensorBoard
README](https://www.tensorflow.org/code/tensorflow/tensorboard/README.md).

View File

@ -2,23 +2,21 @@
TensorFlow's high-level machine learning API (tf.contrib.learn) makes it easy to
configure, train, and evaluate a variety of machine learning models. In this
tutorial, you'll use tf.contrib.learn to construct a
[neural network](https://en.wikipedia.org/wiki/Artificial_neural_network)
classifier and train it on the
[Iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set) to
predict flower species based on sepal/petal geometry. You'll write code to
perform the following five steps:
1. Load CSVs containing Iris training/test data into a TensorFlow `Dataset`
2. Construct a @{tf.contrib.learn.DNNClassifier$neural network classifier}
3. Fit the model using the training data
4. Evaluate the accuracy of the model
5. Classify new samples
NOTE: Remember to @{$install$install TensorFlow on your machine}
before getting started with this tutorial.
## Complete Neural Network Source Code
@ -80,8 +78,7 @@ The [Iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set) contains
150 rows of data, comprising 50 samples from each of three related Iris species:
*Iris setosa*, *Iris virginica*, and *Iris versicolor*.
![Petal geometry compared for three iris species: Iris setosa, Iris virginica, and Iris versicolor](../images/iris_three_species.jpg) **From left to right,
[*Iris setosa*](https://commons.wikimedia.org/w/index.php?curid=170298) (by
[Radomil](https://commons.wikimedia.org/wiki/User:Radomil), CC BY-SA 3.0),
[*Iris versicolor*](https://commons.wikimedia.org/w/index.php?curid=248095) (by
@ -137,12 +134,13 @@ method in `learn.datasets.base`. The `load_csv_with_header()` method takes three
required arguments:
* `filename`, which takes the filepath to the CSV file
* `target_dtype`, which takes the
[`numpy` datatype](http://docs.scipy.org/doc/numpy/user/basics.types.html)
of the dataset's target value.
* `features_dtype`, which takes the
[`numpy` datatype](http://docs.scipy.org/doc/numpy/user/basics.types.html)
of the dataset's feature values.
Here, the target (the value you're training the model to predict) is flower
species, which is an integer from 0&ndash;2, so the appropriate `numpy` datatype
@ -164,27 +162,28 @@ test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
features_dtype=np.float32)
```
`Dataset`s in tf.contrib.learn are
[named tuples](https://docs.python.org/2/library/collections.html#collections.namedtuple);
you can access feature data and target values via the `data` and `target`
fields. Here, `training_set.data` and `training_set.target` contain the feature
data and target values for the training set, respectively, and `test_set.data`
and `test_set.target` contain feature data and target values for the test set.
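A small sketch of that access pattern (the shapes in the comments are assumptions about the Iris training split, not output from this tutorial):

```python
# Feature matrix and integer labels come straight off the named tuple.
print(training_set.data.shape)   # e.g. (120, 4): 120 rows, 4 features
print(training_set.target[:5])   # first five species labels, each 0-2
```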
Later on, in ["Fit the DNNClassifier to the Iris Training
Data,"](#fit-dnnclassifier) you'll use `training_set.data` and
`training_set.target` to train your model, and in ["Evaluate Model
Accuracy,"](#evaluate-accuracy) you'll use `test_set.data` and
Later on, in
["Fit the DNNClassifier to the Iris Training Data,"](#fit-dnnclassifier)
you'll use `training_set.data` and
`training_set.target` to train your model, and in
["Evaluate Model Accuracy,"](#evaluate-accuracy) you'll use `test_set.data` and
`test_set.target`. But first, you'll construct your model in the next section.
## Construct a Deep Neural Network Classifier
tf.contrib.learn offers a variety of predefined models, called
@{$python/contrib.learn#estimators$`Estimator`s}, which you can
use "out of the box" to run training and evaluation operations on your data.
Here, you'll configure a Deep Neural Network Classifier model to fit the Iris
data. Using tf.contrib.learn, you can instantiate your
@{tf.contrib.learn.DNNClassifier} with
just a couple lines of code:
```python
@ -208,22 +207,20 @@ must be set to `4` to hold all the data.
Then, the code creates a `DNNClassifier` model using the following arguments:
* `feature_columns=feature_columns`. The set of feature columns defined above.
* `hidden_units=[10, 20, 10]`. Three
[hidden layers](http://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw),
containing 10, 20, and 10 neurons, respectively.
* `n_classes=3`. Three target classes, representing the three Iris species.
* `model_dir=/tmp/iris_model`. The directory in which TensorFlow will save
checkpoint data during model training. For more on logging and monitoring
with TensorFlow, see @{$monitors$Logging and Monitoring Basics with tf.contrib.learn}.
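Assembled, those arguments yield a sketch like the following (the `real_valued_column` call mirrors the four-dimension feature column described above):

```python
# Four real-valued features per example (sepal/petal measurements).
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]

classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20, 10],
                                            n_classes=3,
                                            model_dir="/tmp/iris_model")
```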
## Fit the DNNClassifier to the Iris Training Data {#fit-dnnclassifier}
Now that you've configured your DNN `classifier` model, you can fit it to the
Iris training data using the @{tf.contrib.learn.BaseEstimator.fit$`fit`}
method. Pass as arguments your feature data (`training_set.data`), target
values (`training_set.target`), and the number of steps to train (here, 2000):
```python
# Fit model
@ -240,17 +237,16 @@ classifier.fit(x=training_set.data, y=training_set.target, steps=1000)
```
However, if you're looking to track the model while it trains, you'll likely
want to instead use a TensorFlow @{tf.contrib.learn.monitors$`monitor`}
to perform logging operations. See the tutorial
@{$monitors$&ldquo;Logging and Monitoring Basics with tf.contrib.learn&rdquo;}
for more on this topic.
## Evaluate Model Accuracy {#evaluate-accuracy}
You've fit your `DNNClassifier` model on the Iris training data; now, you can
check its accuracy on the Iris test data using the
@{tf.contrib.learn.BaseEstimator.evaluate$`evaluate`}
method. Like `fit`, `evaluate` takes feature data and target values as
arguments, and returns a `dict` with the evaluation results. The following code
passes the Iris test data&mdash;`test_set.data` and `test_set.target`&mdash;to
@ -301,18 +297,17 @@ second sample is *Iris virginica*.
## Additional Resources
* For further reference materials on tf.contrib.learn, see the official @{$python/contrib.learn$API docs}.
* To learn more about using tf.contrib.learn to create linear models, see
@{$linear$Large-scale Linear Models with TensorFlow}.
* To build your own Estimator using tf.contrib.learn APIs, check out
[Building Machine Learning Estimator in TensorFlow](http://terrytangyuan.github.io/2016/07/08/understand-and-build-tensorflow-estimator/).
* To experiment with neural network modeling and visualization in the browser,
check out [Deep Playground](http://playground.tensorflow.org/).
* For more advanced tutorials on neural networks, see
@{$deep_cnn$Convolutional Neural Networks} and
@{$recurrent$Recurrent Neural Networks}.

View File

@ -0,0 +1,10 @@
# Installing TensorFlow
We have installation instructions for the following platforms:
* [Linux](install_linux.md)
* [Mac OS X](install_mac.md)
* [Windows](install_windows.md)
* [From source](install_sources.md)
We also have help for [migrating from previous versions of TensorFlow to v1.0](migration.md).

View File

@ -0,0 +1,728 @@
# Installing TensorFlow on Ubuntu
This guide explains how to install TensorFlow on Ubuntu. These instructions
might also work on other Linux variants, but we have only tested (and we
only support) these instructions on Ubuntu 14.04 or higher.
## Determine which TensorFlow to install
You must choose one of the following types of TensorFlow to install:
* **TensorFlow with CPU support only**. If your system does not have an
  NVIDIA® GPU, you must install this version. Note that this version of
  TensorFlow is typically much easier to install (usually in 5 or 10
  minutes), so even if you have an NVIDIA GPU, we recommend installing
  this version first.
* **TensorFlow with GPU support**. TensorFlow programs typically run
significantly faster on a GPU than on a CPU. Therefore, if your
system has an NVIDIA® GPU meeting the prerequisites shown below and you
need to run performance-critical applications, you should ultimately
install this version.
<a name="NVIDIARequirements"></a>
### NVIDIA requirements to run TensorFlow with GPU support
If you are installing TensorFlow with GPU support using one of the
mechanisms described in this guide, then the following NVIDIA software
must be installed on your system:
* CUDA® Toolkit 8.0. For details, see
[NVIDIA's documentation](http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#axzz4VZnqTJ2A).
Ensure that you append the relevant CUDA pathnames to the
`LD_LIBRARY_PATH` environment variable as described in the
NVIDIA documentation.
* The NVIDIA drivers associated with CUDA Toolkit 8.0.
* cuDNN v5.1. For details, see
[NVIDIA's documentation](https://developer.nvidia.com/cudnn).
Ensure that you create the `CUDA_HOME` environment variable as
described in the NVIDIA documentation.
* GPU card with CUDA Compute Capability 3.0 or higher. See
[NVIDIA documentation](https://developer.nvidia.com/cuda-gpus) for
a list of supported GPU cards.
* The libcupti-dev library, which is the NVIDIA CUDA Profile Tools Interface.
This library provides advanced profiling support. To install this library,
issue the following command:
<pre>
$ <b>sudo apt-get install libcupti-dev</b>
</pre>
If you have an earlier version of the preceding packages, please upgrade to
the specified versions. If upgrading is not possible, then you may still run
TensorFlow with GPU support, but only if you do the following:
* Install TensorFlow from sources as documented in
*Installing TensorFlow from Sources*.
* Install or upgrade to at least the following NVIDIA versions:
* CUDA toolkit 7.0 or greater
* cuDNN v3 or greater
* GPU card with CUDA Compute Capability 3.0 or higher.
## Determine how to install TensorFlow
You must pick the mechanism by which you install TensorFlow. The
supported choices are as follows:
* [virtualenv](#InstallingVirtualenv)
* ["native" pip](#InstallingNativePip)
* [Docker](#InstallingDocker)
* [Anaconda](#InstallingAnaconda)
**We recommend the virtualenv installation.**
[Virtualenv](https://virtualenv.pypa.io/en/stable/)
is a virtual Python environment isolated from other Python development,
incapable of interfering with or being affected by other Python programs
on the same machine. During the virtualenv installation process,
you will install not only TensorFlow but also all the packages that
TensorFlow requires. (This is actually pretty easy.)
To start working with TensorFlow, you simply need to "activate" the
virtual environment. All in all, virtualenv provides a safe and
reliable mechanism for installing and running TensorFlow.
Native pip installs TensorFlow directly on your system without going
through any container system. **We recommend the native pip install for
system administrators aiming to make TensorFlow available to everyone on a
multi-user system.** Since a native pip installation is not walled-off in
a separate container, the pip installation might interfere with other
Python-based installations on your system. However, if you understand pip
and your Python environment, a native pip installation often entails only
a single command.
Docker completely isolates the TensorFlow installation
from pre-existing packages on your machine. The Docker container contains
TensorFlow and all its dependencies. Note that the Docker image can be quite
large (hundreds of MBs). You might choose the Docker installation if you are
incorporating TensorFlow into a larger application architecture that already
uses Docker.
In Anaconda, you may use conda to create a virtual environment.
However, within Anaconda, we recommend installing TensorFlow with the
`pip install` command, not with the `conda install` command.
**NOTE:** The conda package is community supported, not officially supported.
That is, the TensorFlow team neither tests nor maintains the conda package.
Use that package at your own risk.
<a name="InstallingVirtualenv"></a>
## Installing with virtualenv
Take the following steps to install TensorFlow with Virtualenv:
1. Install pip and virtualenv by issuing the following command:
<pre>$ <b>sudo apt-get install python-pip python-dev python-virtualenv</b> </pre>
2. Create a virtualenv environment by issuing the following command:
<pre>$ <b>virtualenv --system-site-packages</b> <i>targetDirectory</i> </pre>
The <tt><em>targetDirectory</em></tt> specifies the top of the
virtualenv tree. Our instructions assume that <i>targetDirectory</i>
is `~/tensorflow`, but you may choose any directory.
3. Activate the virtualenv environment by issuing one of the following
commands:
<pre> $ <b>source ~/tensorflow/bin/activate</b> # If using bash, sh, ksh, or zsh
$ <b>source ~/tensorflow/bin/activate.csh</b> # If using csh </pre>
The preceding <tt>source</tt> command should change your prompt
to the following:
<pre> (tensorflow)$ </pre>
4. Issue one of the following commands to install TensorFlow in the active
virtualenv environment:
<pre> (tensorflow)$ <b>pip install --upgrade tensorflow</b> # for Python 2.7
(tensorflow)$ <b>pip3 install --upgrade tensorflow</b> # for Python 3.n
(tensorflow)$ <b>pip install --upgrade tensorflow-gpu</b> # for Python 2.7 and GPU
(tensorflow)$ <b>pip3 install --upgrade tensorflow-gpu</b> # for Python 3.n and GPU</pre>
If the preceding command succeeds, skip Step 5. If the preceding
command fails, perform Step 5.
5. (Optional) If Step 4 failed (typically because you invoked a pip version
lower than 8.1), install TensorFlow in the active virtualenv environment
by issuing a command of the following format:
<pre> (tensorflow)$ <b>pip install --upgrade</b> <i>TF_PYTHON_URL</i> # Python 2.7
(tensorflow)$ <b>pip3 install --upgrade</b> <i>TF_PYTHON_URL</i> # Python 3.N </pre>
where <code><em>TF_PYTHON_URL</em></code> identifies the URL of the
TensorFlow Python package. The appropriate value of
<code><em>TF_PYTHON_URL</em></code> depends on the operating system,
Python version, and GPU support. Find the appropriate value for
<code><em>TF_PYTHON_URL</em></code> for your system
[here](#TF_PYTHON_URL). For example, if you are installing TensorFlow
for Linux, Python version 3.4, and CPU-only support, issue the following
command to install TensorFlow in the active virtualenv environment:
<pre>(tensorflow)$ <b>pip3 install --upgrade \
https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0-py3-none-any.whl</b></pre>
If you encounter installation problems, see
[Common Installation Problems](#CommonInstallationProblems).
### Next Steps
After installing TensorFlow,
[validate the installation](#ValidateYourInstallation).
Note that you must activate the virtualenv environment each time you
use TensorFlow. If the virtualenv environment is not currently active,
invoke one of the following commands:
<pre>$ <b>source ~/tensorflow/bin/activate</b> # bash
$ <b>source ~/tensorflow/bin/activate.csh</b> # csh </pre>
When the virtualenv environment is active, you may run
TensorFlow programs from this shell. Your prompt will become
the following to indicate that your tensorflow environment is active:
<pre>(tensorflow)$ </pre>
When you are done using TensorFlow, you may deactivate the
environment by invoking the `deactivate` function as follows:
<pre>(tensorflow)$ <b>deactivate</b> </pre>
The prompt will revert back to your default prompt (as defined by the
`PS1` environment variable).
### Uninstalling TensorFlow
To uninstall TensorFlow, simply remove the tree you created.
For example:
<pre>$ <b>rm -r</b> <i>targetDirectory</i> </pre>
<a name="InstallingNativePip"></a>
## Installing with native pip
You may install TensorFlow through pip, choosing between a simple
installation procedure or a more complex one.
**Note:** The
[REQUIRED_PACKAGES section of setup.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/pip_package/setup.py)
lists the TensorFlow packages that pip will install or upgrade.
### Prerequisite: Python and Pip
Python is automatically installed on Ubuntu.
Take a moment to confirm (by issuing a `python -V`
command) that one of the following Python versions is already installed
on your system:
* Python 2.7
* Python 3.3+
The pip or pip3 package manager is *usually* installed on Ubuntu.
Take a moment to confirm (by issuing a `pip -V` or `pip3 -V` command)
that pip or pip3 is installed. We strongly recommend version 8.1 or higher
of pip or pip3.
If Version 8.1 or later is not installed, issue the following command, which
will either install the latest pip version or upgrade to it:
<pre>
$ <b>sudo apt-get install python-pip python-dev</b>
</pre>
### Install TensorFlow
Assuming the prerequisite software is installed on your Linux machine,
take the following steps:
1. Ensure proper protobuf dependencies by issuing one of the following
commands:
<pre>$ <b>sudo pip uninstall tensorflow</b> # for Python 2.7
$ <b>sudo pip3 uninstall tensorflow</b> # for Python 3.n</pre>
2. Install TensorFlow by invoking **one** of the following commands:
<pre>$ <b>pip install tensorflow</b> # Python 2.7; CPU support (no GPU support)
$ <b>pip3 install tensorflow</b> # Python 3.n; CPU support (no GPU support)
$ <b>pip install tensorflow-gpu</b> # Python 2.7; GPU support
$ <b>pip3 install tensorflow-gpu</b> # Python 3.n; GPU support </pre>
If the preceding command runs to completion, you should now
[validate your installation](#ValidateYourInstallation).
3. (Optional.) If Step 2 failed, install the latest version of TensorFlow
by issuing a command of the following format:
<pre>$ <b>sudo pip install --upgrade</b> <i>TF_BINARY_URL</i> # Python 2.7
$ <b>sudo pip3 install --upgrade</b> <i>TF_BINARY_URL</i> # Python 3.N </pre>
where <i>TF_BINARY_URL</i> identifies the URL of the TensorFlow Python
package. The appropriate value of <i>TF_BINARY_URL</i> depends on the
operating system, Python version, and GPU support. Find the appropriate
value of <i>TF_BINARY_URL</i> for your system [here](#TF_PYTHON_URL).
For example, if you are installing TensorFlow for Linux,
Python version 3.4, and CPU-only support, the command to install
TensorFlow is as follows:
<pre>
$ <b>sudo pip3 install --upgrade \
https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0-py3-none-any.whl</b>
</pre>
If this step fails, see
[Common Installation Problems](#CommonInstallationProblems).
### Next Steps
After installing TensorFlow, [validate your installation](#ValidateYourInstallation).
### Uninstalling TensorFlow
To uninstall TensorFlow, issue one of following commands:
<pre>
$ <b>sudo pip uninstall tensorflow</b> # for Python 2.7
$ <b>sudo pip3 uninstall tensorflow</b> # for Python 3.n
</pre>
<a name="InstallingDocker"></a>
## Installing with Docker
Take the following steps to install TensorFlow through Docker:
1. Install Docker on your machine as described in the
[Docker documentation](http://docs.docker.com/engine/installation/).
2. Optionally, create a Docker group to allow launching containers
without sudo as described in the
[Docker documentation](https://docs.docker.com/engine/installation/linux/ubuntulinux/#/create-a-docker-group).
(If you don't do this step, you'll have to use sudo each time
you invoke Docker.)
3. To install a version of TensorFlow that supports GPUs, you must first
install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker), which
is stored in github.
4. Launch a Docker container that contains one of the
[TensorFlow binary images](https://hub.docker.com/r/tensorflow/tensorflow/tags/).
The remainder of this section explains how to launch a Docker container.
### CPU-only
To launch a Docker container with CPU-only support (that is, without
GPU support), enter a command of the following format:
<pre>
$ <b>docker run -it</b> <i>-p hostPort:containerPort TensorFlowCPUImage</i>
</pre>
where:
* <tt><i>-p hostPort:containerPort</i></tt> is optional.
If you plan to run TensorFlow programs from the shell, omit this option.
If you plan to run TensorFlow programs as Jupyter notebooks, set both
<tt><i>hostPort</i></tt> and <tt><i>containerPort</i></tt>
to <tt>8888</tt>. If you'd like to run TensorBoard inside the container,
add a second `-p` flag, setting both <i>hostPort</i> and <i>containerPort</i>
to 6006.
* <tt><i>TensorFlowCPUImage</i></tt> is required. It identifies the Docker
container. Specify one of the following values:
* <tt>gcr.io/tensorflow/tensorflow</tt>, which is the TensorFlow CPU binary image.
* <tt>gcr.io/tensorflow/tensorflow:latest-devel</tt>, which is the latest
TensorFlow CPU Binary image plus source code.
* <tt>gcr.io/tensorflow/tensorflow:<i>version</i></tt>, which is the
specified version (for example, 1.0.0) of the TensorFlow CPU binary image.
* <tt>gcr.io/tensorflow/tensorflow:<i>version</i>-devel</tt>, which is
the specified version (for example, 1.0.0) of the TensorFlow CPU
binary image plus source code.
<tt>gcr.io</tt> is the Google Container Registry. Note that some
TensorFlow images are also available at
[dockerhub](https://hub.docker.com/r/tensorflow/tensorflow/).
For example, the following command launches the latest TensorFlow CPU binary image
in a Docker container from which you can run TensorFlow programs in a shell:
<pre>
$ <b>docker run -it gcr.io/tensorflow/tensorflow bash</b>
</pre>
The following command also launches the latest TensorFlow CPU binary image in a
Docker container. However, in this Docker container, you can run TensorFlow
programs in a Jupyter notebook:
<pre>
$ <b>docker run -it -p 8888:8888 gcr.io/tensorflow/tensorflow</b>
</pre>
Docker will download the TensorFlow binary image the first time you launch it.
### GPU support
Prior to installing TensorFlow with GPU support, ensure that your system meets all
[NVIDIA software requirements](#NVIDIARequirements). To launch a Docker container
with NVIDIA GPU support, enter a command of the following format:
<pre>
$ <b>nvidia-docker run -it</b> <i>-p hostPort:containerPort TensorFlowGPUImage</i>
</pre>
where:
* <tt><i>-p hostPort:containerPort</i></tt> is optional. If you plan
to run TensorFlow programs from the shell, omit this option. If you plan
to run TensorFlow programs as Jupyter notebooks, set both
<tt><i>hostPort</i></tt> and <code><em>containerPort</em></code> to `8888`.
* <i>TensorFlowGPUImage</i> specifies the Docker container. You must
specify one of the following values:
* <tt>gcr.io/tensorflow/tensorflow:latest-gpu</tt>, which is the latest
TensorFlow GPU binary image.
* <tt>gcr.io/tensorflow/tensorflow:latest-devel-gpu</tt>, which is
the latest TensorFlow GPU Binary image plus source code.
* <tt>gcr.io/tensorflow/tensorflow:<i>version</i>-gpu</tt>, which is the
specified version (for example, 0.12.1) of the TensorFlow GPU
binary image.
* <tt>gcr.io/tensorflow/tensorflow:<i>version</i>-devel-gpu</tt>, which is
the specified version (for example, 0.12.1) of the TensorFlow GPU
binary image plus source code.
We recommend installing one of the `latest` versions. For example, the
following command launches the latest TensorFlow GPU binary image in a
Docker container from which you can run TensorFlow programs in a shell:
<pre>
$ <b>nvidia-docker run -it gcr.io/tensorflow/tensorflow:latest-gpu bash</b>
</pre>
The following command also launches the latest TensorFlow GPU binary image
in a Docker container. In this Docker container, you can run TensorFlow
programs in a Jupyter notebook:
<pre>
$ <b>nvidia-docker run -it -p 8888:8888 gcr.io/tensorflow/tensorflow:latest-gpu</b>
</pre>
The following command installs an older TensorFlow version (0.12.1):
<pre>
$ <b>nvidia-docker run -it -p 8888:8888 gcr.io/tensorflow/tensorflow:0.12.1-gpu</b>
</pre>
Docker will download the TensorFlow binary image the first time you launch it.
For more details see the
[TensorFlow docker readme](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker).
### Next Steps
You should now
[validate your installation](#ValidateYourInstallation).
<a name="InstallingAnaconda"></a>
## Installing with Anaconda
Take the following steps to install TensorFlow in an Anaconda environment:
1. Follow the instructions on the
[Anaconda download site](https://www.continuum.io/downloads)
to download and install Anaconda.
2. Create a conda environment named <tt>tensorflow</tt> to run a version
of Python by invoking the following command:
<pre>$ <b>conda create -n tensorflow</b></pre>
3. Activate the conda environment by issuing the following command:
<pre> $ <b>source activate tensorflow</b>
(tensorflow)$ # Your prompt should change </pre>
4. Issue a command of the following format to install
TensorFlow inside your conda environment:
<pre>(tensorflow)$ <b>pip install --ignore-installed --upgrade</b> <i>TF_PYTHON_URL</i> # Python 2.7</pre>
where <tt><i>TF_PYTHON_URL</i></tt> is the
[URL of the TensorFlow Python package](#TF_PYTHON_URL). For example,
the following command installs the CPU-only version of TensorFlow for
Python 3.4:
<pre>
(tensorflow)$ <b>pip install --ignore-installed --upgrade \
https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0-cp34-cp34m-linux_x86_64.whl</b>
</pre>
<a name="ValidateYourInstallation"></a>
## Validate your installation
To validate your TensorFlow installation, do the following:
1. Ensure that your environment is prepared to run TensorFlow programs.
2. Run a short TensorFlow program.
### Prepare your environment
If you installed through native pip, virtualenv, or Anaconda, then
do the following:
1. Start a terminal.
2. If you installed with virtualenv or Anaconda, activate your environment.
3. If you installed from TensorFlow source code, navigate to any
directory *except* one containing TensorFlow source code.
If you installed through Docker, start a Docker container
from which you can run bash. For example:
<pre>
$ <b>docker run -it gcr.io/tensorflow/tensorflow bash</b>
</pre>
### Run a short TensorFlow program
Invoke python from your shell as follows:
<pre>
$ <b>python</b>
</pre>
Then, enter the following short program inside the python interactive shell:
<pre>
>>> <b>import tensorflow as tf</b>
>>> <b>hello = tf.constant('Hello, TensorFlow!')</b>
>>> <b>sess = tf.Session()</b>
>>> <b>print(sess.run(hello))</b>
</pre>
If the system outputs the following, then you are ready to begin
running TensorFlow programs:
<pre>Hello, TensorFlow!</pre>
If you are new to TensorFlow, see @{$get_started}.
If the system outputs an error message instead of a greeting, see
[Common installation problems](#CommonInstallationProblems).
<a name="CommonInstallationProblems"></a>
## Common installation problems
We are relying on Stack Overflow to document TensorFlow installation problems
and their remedies. The following table contains links to Stack Overflow
answers for some common installation problems.
If you encounter an error message or other
installation problem not listed in the following table, search for it
on Stack Overflow. If Stack Overflow doesn't show the error message,
ask a new question about it on Stack Overflow and specify
the `tensorflow` tag.
<table>
<tr> <th>Stack Overflow Link</th> <th>Error Message</th> </tr>
<tr>
<td><a href="https://stackoverflow.com/q/36159194">36159194</a></td>
<td><pre>ImportError: libcudart.so.<i>Version</i>: cannot open shared object file:
No such file or directory</pre></td>
</tr>
<tr>
<td><a href="https://stackoverflow.com/q/41991101">41991101</a></td>
<td><pre>ImportError: libcudnn.<i>Version</i>: cannot open shared object file:
No such file or directory</pre></td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/q/36371137">36371137</a> and
<a href="#Protobuf31">here</a></td>
<td><pre>libprotobuf ERROR google/protobuf/src/google/protobuf/io/coded_stream.cc:207] A
protocol message was rejected because it was too big (more than 67108864 bytes).
To increase the limit (or to disable these warnings), see
CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.</pre></td>
</tr>
<tr>
<td><a href="https://stackoverflow.com/q/35252888">35252888</a></td>
<td><pre>Error importing tensorflow. Unless you are using bazel, you should
not try to import tensorflow from its source directory; please exit the
tensorflow source tree, and relaunch your python interpreter from
there.</pre></td>
</tr>
<tr>
<td><a href="https://stackoverflow.com/q/33623453">33623453</a></td>
<td><pre>IOError: [Errno 2] No such file or directory:
  '/tmp/pip-o6Tpui-build/setup.py'</pre></td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/q/42006320">42006320</a></td>
<td><pre>ImportError: Traceback (most recent call last):
File ".../tensorflow/core/framework/graph_pb2.py", line 6, in <module>
from google.protobuf import descriptor as _descriptor
ImportError: cannot import name 'descriptor'</pre>
</td>
</tr>
<tr>
<td><a href="https://stackoverflow.com/questions/35190574">35190574</a> </td>
<td><pre>SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
failed</pre></td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/q/42009190">42009190</a></td>
<td><pre>
Installing collected packages: setuptools, protobuf, wheel, numpy, tensorflow
Found existing installation: setuptools 1.1.6
Uninstalling setuptools-1.1.6:
Exception:
...
[Errno 1] Operation not permitted:
'/tmp/pip-a1DXRT-uninstall/.../lib/python/_markerlib' </pre></td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/questions/36933958">36933958</a></td>
<td><pre>
...
Installing collected packages: setuptools, protobuf, wheel, numpy, tensorflow
Found existing installation: setuptools 1.1.6
Uninstalling setuptools-1.1.6:
Exception:
...
[Errno 1] Operation not permitted:
'/tmp/pip-a1DXRT-uninstall/System/Library/Frameworks/Python.framework/
Versions/2.7/Extras/lib/python/_markerlib'</pre>
</td>
</tr>
</table>
<a name="TF_PYTHON_URL"></a>
## The URL of the TensorFlow Python package
A few installation mechanisms require the URL of the TensorFlow Python package.
The value you specify depends on three factors:
* operating system
* Python version
* CPU only vs. GPU support
This section documents the relevant values for Linux installations.
### Python 2.7
CPU only:
<pre>
https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0-cp27-none-linux_x86_64.whl
</pre>
GPU support:
<pre>
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.0-cp27-none-linux_x86_64.whl
</pre>
Note that GPU support requires the NVIDIA hardware and software described in
[NVIDIA requirements to run TensorFlow with GPU support](#NVIDIARequirements).
### Python 3.4
CPU only:
<pre>
https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0-cp34-cp34m-linux_x86_64.whl
</pre>
GPU support:
<pre>
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.0-cp34-cp34m-linux_x86_64.whl
</pre>
Note that GPU support requires the NVIDIA hardware and software described in
[NVIDIA requirements to run TensorFlow with GPU support](#NVIDIARequirements).
### Python 3.5
CPU only:
<pre>
https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0-cp35-cp35m-linux_x86_64.whl
</pre>
GPU support:
<pre>
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.0-cp35-cp35m-linux_x86_64.whl
</pre>
Note that GPU support requires the NVIDIA hardware and software described in
[NVIDIA requirements to run TensorFlow with GPU support](#NVIDIARequirements).
<a name="Protobuf31"></a>
### Protobuf pip package 3.1
You can skip this section unless you are seeing problems related
to the protobuf pip package.
**NOTE:** If your TensorFlow programs are running slowly, you might
have a problem related to the protobuf pip package.
The TensorFlow pip package depends on protobuf pip package version 3.1. The
protobuf pip package downloaded from PyPI (when invoking
<tt>pip install protobuf</tt>) is a Python-only library containing
Python implementations of proto serialization/deserialization that can run
**10x-50x slower** than the C++ implementation. Protobuf also supports a
binary extension for the Python package that contains fast
C++ based proto parsing. This extension is not available in the
standard Python-only pip package. We have created a custom binary
pip package for protobuf that contains the binary extension. To install
the custom binary protobuf pip package, invoke one of the following commands:
* for Python 2.7:
<pre>
$ <b>pip install --upgrade \
https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.1.0-cp27-none-linux_x86_64.whl</b>
</pre>
* for Python 3.5:
<pre>
$ <b>pip3 install --upgrade \
https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.1.0-cp35-none-linux_x86_64.whl</b>
</pre>
Installing this protobuf package will overwrite the existing protobuf package.
Note that the binary pip package already has support for protobufs
larger than 64MB, which should fix errors such as these:
<pre>[libprotobuf ERROR google/protobuf/src/google/protobuf/io/coded_stream.cc:207]
A protocol message was rejected because it was too big (more than 67108864 bytes).
To increase the limit (or to disable these warnings), see
CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.</pre>

View File

@ -0,0 +1,699 @@
# Installing TensorFlow on Mac OS X
This guide explains how to install TensorFlow on Mac OS X.
## Determine which TensorFlow to install
You must choose the type of TensorFlow to install. Your choices are as follows:
* **TensorFlow with CPU support only**. If your system does not have an
NVIDIA CUDA® GPU, you should install this version. Note that TensorFlow
with CPU support is typically easier to install than TensorFlow with
GPU support. Therefore, even if you have an NVIDIA CUDA GPU, we recommend
installing this version first as a diagnostic step just in case you run
into problems installing TensorFlow with GPU support.
* **TensorFlow with GPU support**. TensorFlow programs typically run
significantly faster on a GPU than on a CPU. Therefore, if your system has
an NVIDIA CUDA GPU meeting the prerequisites shown below and you need
to run performance-critical applications, you should ultimately
install this version.
### Requirements to run TensorFlow with GPU support
If you are installing TensorFlow with GPU support using one of the mechanisms
described in this guide, then the following NVIDIA software must be
installed on your system:
* CUDA Toolkit 8.0. For details, see
[NVIDIA's documentation](http://docs.nvidia.com/cuda/cuda-installation-guide-mac-os-x).
Ensure that you append the relevant CUDA pathnames to the
`LD_LIBRARY_PATH` environment variable as described in the
NVIDIA documentation.
* The NVIDIA drivers associated with CUDA Toolkit 8.0.
* cuDNN v5.1. For details, see
[NVIDIA's documentation](https://developer.nvidia.com/cudnn).
Ensure that you create the `CUDA_HOME` environment variable as described in
the NVIDIA documentation.
* GPU card with CUDA Compute Capability 3.0 or higher. See
[NVIDIA documentation](https://developer.nvidia.com/cuda-gpus)
for a list of supported GPU cards.
If you have an earlier version of the preceding packages, please upgrade to
the specified versions. If upgrading is not possible, you may still run
TensorFlow with GPU support, but only if you do both of the following:
* Install TensorFlow from sources (as described in
[Installing TensorFlow from Sources](install_sources.md)).
* Install or upgrade to at least the following NVIDIA versions:
* CUDA toolkit 7.0 or greater
* cuDNN v3 or greater
* GPU card with CUDA Compute Capability 3.0 or higher.
## Determine how to install TensorFlow
You must pick the mechanism by which you install TensorFlow. The supported choices are as follows:
* virtualenv
* "native" pip
* Docker
* Anaconda
* installing from sources, which is for experts and is documented in
a separate guide.
**We recommend the virtualenv installation.**
[Virtualenv](https://virtualenv.pypa.io/en/stable/)
is a virtual Python environment isolated from other Python development,
incapable of interfering with or being affected by other Python programs
on the same machine. During the virtualenv installation process,
you will install not only TensorFlow but also all the packages that
TensorFlow requires. (This is actually pretty easy.)
To start working with TensorFlow, you simply need to "activate" the
virtual environment. All in all, virtualenv provides a safe and
reliable mechanism for installing and running TensorFlow.
Native pip installs TensorFlow directly on your system without going through
any container or virtual environment system. Since a native pip installation
is not walled-off, the pip installation might interfere with or be influenced
by other Python-based installations on your system. Furthermore, you might need
to disable System Integrity Protection (SIP) in order to install through native
pip. However, if you understand SIP, pip, and your Python environment, a
native pip installation is relatively easy to perform.
[Docker](http://docker.com/) completely isolates the TensorFlow installation
from pre-existing packages on your machine. The Docker container contains
TensorFlow and all its dependencies. Note that the Docker image can be quite
large (hundreds of MBs). You might choose the Docker installation if you are
incorporating TensorFlow into a larger application architecture that
already uses Docker.
Important: Docker currently does not support TensorFlow with GPU support
on Mac OS; that is, on Mac OS, Docker only supports TensorFlow with
CPU support.
In Anaconda, you may use conda to create a virtual environment.
However, within Anaconda, we recommend installing TensorFlow with the
`pip install` command, not with the `conda install` command.
**NOTE:** The conda package is community supported, not officially supported.
That is, the TensorFlow team neither tests nor maintains the conda package.
Use that package at your own risk.
## Installing with virtualenv
Take the following steps to install TensorFlow with Virtualenv:
1. Start a terminal (a shell). You'll perform all subsequent steps
in this shell.
2. Install pip and virtualenv by issuing the following command:
<pre>
$ <b>sudo easy_install pip</b>
$ <b>sudo pip install --upgrade virtualenv</b>
</pre>
3. Create a virtualenv environment by issuing a command of the
following format:
<pre>
$ <b>virtualenv --system-site-packages <i>targetDirectory</i></b>
</pre>
The <i>targetDirectory</i> identifies the top of the virtualenv tree.
Our instructions assume that <i>targetDirectory</i>
is `~/tensorflow`, but you may choose any directory.
4. Activate the virtualenv environment by issuing one of the
following commands:
<pre>
$ <b>source ~/tensorflow/bin/activate</b> # If using bash, sh, ksh, or zsh
$ <b>source ~/tensorflow/bin/activate.csh</b> # If using csh or tcsh
</pre>
The preceding `source` command should change your prompt to the following:
<pre>
(tensorflow)$
</pre>
5. If pip version 8.1 or later is installed on your system, issue one of
the following commands to install TensorFlow and all the packages that
TensorFlow requires into the active Virtualenv environment:
<pre>
$ <b>pip install --upgrade tensorflow</b> # for Python 2.7
$ <b>pip3 install --upgrade tensorflow</b> # for Python 3.n
$ <b>pip install --upgrade tensorflow-gpu</b> # for Python 2.7 and GPU
$ <b>pip3 install --upgrade tensorflow-gpu</b> # for Python 3.n and GPU
</pre>
If the preceding command succeeds, skip Step 6. If it fails,
perform Step 6.
6. Optional. If Step 5 failed (typically because you invoked a pip version
lower than 8.1), install TensorFlow in the active
virtualenv environment by issuing a command of the following format:
<pre>
$ <b>pip install --upgrade</b> <i>TF_BINARY_URL</i> # Python 2.7
$ <b>pip3 install --upgrade</b> <i>TF_BINARY_URL</i> # Python 3.n
</pre>
where <i>TF_BINARY_URL</i> identifies the URL
of the TensorFlow Python package. The appropriate value of
<i>TF_BINARY_URL</i> depends on the operating system,
Python version, and GPU support. Find the appropriate value for
<i>TF_BINARY_URL</i> for your system
[here](#TF_PYTHON_URL).
For example, if you are installing TensorFlow for Mac OS X,
Python version 3.4, and CPU-only support, the command to install
TensorFlow in the active Virtualenv is as follows:
<pre>
$ <b>pip3 install --upgrade \
https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.0.0-py3-none-any.whl</b>
</pre>
If you encounter installation problems, see
[Common Installation Problems](#CommonInstallationProblems).
### Next Steps
After installing TensorFlow,
[validate your installation](#ValidateInstallation)
to confirm that the installation worked properly.
Note that you must activate the virtualenv environment each time you
use TensorFlow in a new shell. If the virtualenv environment is not
currently active (that is, if the prompt is not `(tensorflow)`), invoke
one of the following commands:
<pre>
$ <b>source ~/tensorflow/bin/activate</b> # bash, sh, ksh, or zsh
$ <b>source ~/tensorflow/bin/activate.csh</b> # csh or tcsh
</pre>
Your prompt will change to the following, indicating that your
tensorflow environment is active:
<pre>
(tensorflow)$
</pre>
When the virtualenv environment is active, you may run
TensorFlow programs from this shell.
When you are done using TensorFlow, you may deactivate the
environment by issuing the following command:
<pre>
(tensorflow)$ <b>deactivate</b>
</pre>
The prompt will revert to your default prompt (as defined by `PS1`).
### Uninstalling TensorFlow
If you want to uninstall TensorFlow, simply remove the tree you created. For example:
<pre>
$ <b>rm -r ~/tensorflow</b>
</pre>
## Installing with native pip
We have uploaded the TensorFlow binaries to PyPI.
Therefore, you can install TensorFlow through pip.
The
[REQUIRED_PACKAGES section of setup.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/pip_package/setup.py)
lists the packages that pip will install or upgrade.
### Prerequisite: Python
In order to install TensorFlow, your system must contain one of the following Python versions:
* Python 2.7
* Python 3.3+
If your system does not already have one of the preceding Python versions,
[install](https://wiki.python.org/moin/BeginnersGuide/Download) it now.
When installing Python, you might need to disable
System Integrity Protection (SIP) to permit any entity other than
the Mac App Store to install software.
### Prerequisite: pip
[Pip](https://en.wikipedia.org/wiki/Pip_(package_manager)) installs
and manages software packages written in Python. If you intend to install
with native pip, then one of the following flavors of pip must be
installed on your system:
* `pip`, for Python 2.7
* `pip3`, for Python 3.n.
`pip` or `pip3` was probably installed on your system when you
installed Python. To determine whether pip or pip3 is actually
installed on your system, issue one of the following commands:
<pre>$ <b>pip -V</b> # for Python 2.7
$ <b>pip3 -V</b> # for Python 3.n </pre>
We strongly recommend pip or pip3 version 8.1 or higher in order
to install TensorFlow. If pip or pip3 8.1 or later is not
installed, issue the following commands to install or upgrade:
<pre>$ <b>sudo easy_install --upgrade pip</b>
$ <b>sudo easy_install --upgrade six</b> </pre>
### Install TensorFlow
Assuming the prerequisite software is installed on your Mac,
take the following steps:
1. Uninstall any existing TensorFlow installation to ensure a clean set of
protobuf dependencies by issuing one of the following commands:
<pre>$ <b>sudo pip uninstall tensorflow</b> # for Python 2.7
$ <b>sudo pip3 uninstall tensorflow</b> # for Python 3.n</pre>
2. Install TensorFlow by invoking **one** of the following commands:
<pre>$ <b>pip install tensorflow</b> # Python 2.7; CPU support (no GPU support)
$ <b>pip3 install tensorflow</b> # Python 3.n; CPU support (no GPU support)
$ <b>pip install tensorflow-gpu</b> # Python 2.7; GPU support
$ <b>pip3 install tensorflow-gpu</b> # Python 3.n; GPU support </pre>
If the preceding command runs to completion, you should now
[validate your installation](#ValidateInstallation).
3. (Optional.) If Step 2 failed, install the latest version of TensorFlow
by issuing a command of the following format:
<pre>$ <b>sudo pip install --upgrade</b> <i>TF_BINARY_URL</i> # Python 2.7
$ <b>sudo pip3 install --upgrade</b> <i>TF_BINARY_URL</i> # Python 3.n </pre>
where <i>TF_BINARY_URL</i> identifies the URL of the TensorFlow Python
package. The appropriate value of <i>TF_BINARY_URL</i> depends on the
operating system, Python version, and GPU support. Find the appropriate
value for <i>TF_BINARY_URL</i> [here](#TF_PYTHON_URL). For example, if
you are installing TensorFlow for Mac OS, Python version 3.4, and CPU-only
support, issue the following command:
<pre>
$ <b>sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.0.0-py3-none-any.whl</b>
</pre>
If the preceding command fails, see
[Common installation problems](#CommonInstallationProblems).
### Next Steps
After installing TensorFlow,
[validate your installation](#ValidateInstallation)
to confirm that the installation worked properly.
### Uninstalling TensorFlow
To uninstall TensorFlow, issue one of following commands:
<pre>$ <b>pip uninstall tensorflow</b>
$ <b>pip3 uninstall tensorflow</b> </pre>
## Installing with Docker
Follow these steps to install TensorFlow through Docker.
1. Install Docker on your machine as described in the
[Docker documentation](https://docs.docker.com/engine/installation/#/on-macos-and-windows).
2. Launch a Docker container that contains one of the TensorFlow
binary images.
The remainder of this section explains how to launch a Docker container.
**Note**: You can only launch a Docker container with CPU support.
(You cannot launch a Docker container with GPU support.)
To launch a Docker container that holds the TensorFlow binary image,
enter a command of the following format:
<pre>
$ <b>docker run -it <i>-p hostPort:containerPort</i> TensorFlowImage</b>
</pre>
where:
* <i>-p hostPort:containerPort</i> is optional. If you'd like to run
TensorFlow programs from the shell, omit this option. If you'd like
to run TensorFlow programs from Jupyter notebook, set both
<i>hostPort</i> and <i>containerPort</i> to <code>8888</code>.
If you'd like to run TensorBoard inside the container, add
a second `-p` flag, setting both <i>hostPort</i> and <i>containerPort</i>
to 6006.
* <i>TensorFlowImage</i> is required. It identifies the Docker container.
You must specify one of the following values:
* <code>gcr.io/tensorflow/tensorflow</code>: TensorFlow binary image.
* <code>gcr.io/tensorflow/tensorflow:latest-devel</code>: TensorFlow
binary image plus source code.
<code>gcr.io</code> is the Google Container Registry. Note that some
TensorFlow images are also available at
[dockerhub](https://hub.docker.com/r/tensorflow/tensorflow/).
For example, the following command launches a TensorFlow CPU binary image
in a Docker container from which you can run TensorFlow programs in a shell:
<pre>$ <b>docker run -it gcr.io/tensorflow/tensorflow bash</b></pre>
The following command also launches a TensorFlow CPU binary image in a
Docker container. However, in this Docker container, you can run
TensorFlow programs in a Jupyter notebook:
<pre>$ <b>docker run -it -p 8888:8888 gcr.io/tensorflow/tensorflow</b></pre>
Docker will download the TensorFlow binary image the first time you launch it.
### Next Steps
You should now
[validate your installation](#ValidateInstallation).
## Installing with Anaconda
**The Anaconda installation is community supported, not officially supported.**
Take the following steps to install TensorFlow in an Anaconda environment:
1. Follow the instructions on the
[Anaconda download site](https://www.continuum.io/downloads)
to download and install Anaconda.
2. Create a conda environment named `tensorflow`
by invoking the following command:
<pre>$ <b>conda create -n tensorflow</b></pre>
3. Activate the conda environment by issuing the following command:
<pre>$ <b>source activate tensorflow</b>
(tensorflow)$ # Your prompt should change</pre>
4. Issue a command of the following format to install
TensorFlow inside your conda environment:
<pre>(tensorflow)$ <b>pip install --ignore-installed --upgrade</b> <i>TF_PYTHON_URL</i></pre>
where `TF_PYTHON_URL` is the
[URL of the TensorFlow Python package](#TF_PYTHON_URL).
For example, the following command installs the CPU-only version of
TensorFlow for Python 3.4:
<pre>
(tensorflow)$ <b>pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.0.0-py3-none-any.whl</b>
</pre>
<a name="ValidateInstallation"></a>
## Validate your installation
To validate your TensorFlow installation, do the following:
1. Ensure that your environment is prepared to run TensorFlow programs.
2. Run a short TensorFlow program.
### Prepare your environment
If you installed with native pip, virtualenv, or Anaconda, then
do the following:
1. Start a terminal.
2. If you installed with virtualenv or Anaconda, activate your virtual environment.
3. If you installed from TensorFlow source code, navigate to any
directory *except* one containing TensorFlow source code.
If you installed through Docker, start a Docker container that runs bash.
For example:
<pre>$ <b>docker run -it gcr.io/tensorflow/tensorflow bash</b></pre>
### Run a short TensorFlow program
Invoke python from your shell as follows:
<pre>$ <b>python</b></pre>
Enter the following short program inside the python interactive shell:
<pre>>>> <b>import tensorflow as tf</b>
>>> <b>hello = tf.constant('Hello, TensorFlow!')</b>
>>> <b>sess = tf.Session()</b>
>>> <b>print(sess.run(hello))</b></pre>
If the system outputs the following, then you are ready to begin
writing TensorFlow programs:
<pre>Hello, TensorFlow!</pre>
If you are new to TensorFlow, see
@{$get_started$Getting Started with TensorFlow}.
If the system outputs an error message instead of a greeting, see
[Common installation problems](#CommonInstallationProblems).
<a name="CommonInstallationProblems"></a>
## Common installation problems
We are relying on Stack Overflow to document TensorFlow installation problems
and their remedies. The following table contains links to Stack Overflow
answers for some common installation problems.
If you encounter an error message or other
installation problem not listed in the following table, search for it
on Stack Overflow. If Stack Overflow doesn't show the error message,
ask a new question about it on Stack Overflow and specify
the `tensorflow` tag.
<table>
<tr> <th>Stack Overflow Link</th> <th>Error Message</th> </tr>
<tr>
<td><a href="https://stackoverflow.com/q/36159194">36159194</a></td>
<td><pre>ImportError: libcudart.so.<i>Version</i>: cannot open shared object file:
No such file or directory</pre></td>
</tr>
<tr>
<td><a href="https://stackoverflow.com/q/41991101">41991101</a></td>
<td><pre>ImportError: libcudnn.<i>Version</i>: cannot open shared object file:
No such file or directory</pre></td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/q/42006320">42006320</a></td>
<td><pre>ImportError: Traceback (most recent call last):
File ".../tensorflow/core/framework/graph_pb2.py", line 6, in <module>
from google.protobuf import descriptor as _descriptor
ImportError: cannot import name 'descriptor'</pre>
</td>
</tr>
<tr>
<td><a href="https://stackoverflow.com/q/33623453">33623453</a></td>
<td><pre>IOError: [Errno 2] No such file or directory:
'/tmp/pip-o6Tpui-build/setup.py'</pre></td>
</tr>
<tr>
<td><a href="https://stackoverflow.com/questions/35190574">35190574</a> </td>
<td><pre>SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
failed</pre></td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/q/42009190">42009190</a></td>
<td><pre>
Installing collected packages: setuptools, protobuf, wheel, numpy, tensorflow
Found existing installation: setuptools 1.1.6
Uninstalling setuptools-1.1.6:
Exception:
...
[Errno 1] Operation not permitted:
'/tmp/pip-a1DXRT-uninstall/.../lib/python/_markerlib' </pre></td>
</tr>
<tr>
<td><a href="https://stackoverflow.com/q/33622019">33622019</a></td>
<td><pre>ImportError: No module named copyreg</pre></td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/q/37810228">37810228</a></td>
<td>During a `pip install` operation, the system returns:
<pre>OSError: [Errno 1] Operation not permitted</pre>
</td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/q/33622842">33622842</a></td>
<td>An `import tensorflow` statement triggers an error such as the
following:<pre>Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/tensorflow/__init__.py",
line 4, in <module>
from tensorflow.python import *
...
File "/usr/local/lib/python2.7/site-packages/tensorflow/core/framework/tensor_shape_pb2.py",
line 22, in <module>
serialized_pb=_b('\n,tensorflow/core/framework/tensor_shape.proto\x12\ntensorflow\"d\n\x10TensorShapeProto\x12-\n\x03\x64im\x18\x02
\x03(\x0b\x32
.tensorflow.TensorShapeProto.Dim\x1a!\n\x03\x44im\x12\x0c\n\x04size\x18\x01
\x01(\x03\x12\x0c\n\x04name\x18\x02 \x01(\tb\x06proto3')
TypeError: __init__() got an unexpected keyword argument 'syntax'</pre>
</td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/q/42073336">42073336</a></td>
<td>An `import tensorflow` statement triggers the following error:
<pre>
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.dylib locally
"import tensorflow" terminated by signal SIGSEGV (Address boundary error)
</pre></td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/q/42075397">42075397</a></td>
<td>A `pip install` command triggers the following error:
<pre>...<lots of warnings and errors>
You have not agreed to the Xcode license agreements, please run
'xcodebuild -license' (for user-level acceptance) or
'sudo xcodebuild -license' (for system-wide acceptance) from within a
Terminal window to review and agree to the Xcode license agreements.
...<more stack trace output>
File "numpy/core/setup.py", line 653, in get_mathlib_info
raise RuntimeError("Broken toolchain: cannot link a simple C program")
RuntimeError: Broken toolchain: cannot link a simple C program</pre>
</td>
</tr>
</table>
<a name="TF_PYTHON_URL"></a>
## The URL of the TensorFlow Python package
A few installation mechanisms require the URL of the TensorFlow Python package.
The value you specify depends on three factors:
* operating system
* Python version
* CPU only vs. GPU support
This section documents the relevant values for Mac OS installations.
### Python 2.7
CPU only:
<pre>
https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.0.0-py2-none-any.whl
</pre>
GPU support:
<pre>
https://storage.googleapis.com/tensorflow/mac/gpu/tensorflow_gpu-1.0.0-py2-none-any.whl
</pre>
Requires CUDA toolkit 8.0 and cuDNN v5. For other versions, see
[Installing TensorFlow from Sources](install_sources.md).
### Python 3.4 or 3.5
CPU only:
<pre>
https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.0.0-py3-none-any.whl
</pre>
GPU support:
<pre>
https://storage.googleapis.com/tensorflow/mac/gpu/tensorflow_gpu-1.0.0-py3-none-any.whl
</pre>
Requires CUDA toolkit 8.0 and cuDNN v5. For other versions, see
[Installing TensorFlow from Sources](install_sources.md).
<a name="Protobuf31"></a>
### Protobuf pip package 3.1
You can skip this section unless you are seeing problems related
to the protobuf pip package.
**NOTE:** If your TensorFlow programs are running slowly, you might
have a problem related to the protobuf pip package.
The TensorFlow pip package depends on protobuf pip package version 3.1. The
protobuf pip package downloaded from PyPI (when invoking
<tt>pip install protobuf</tt>) is a Python-only library containing
Python implementations of proto serialization/deserialization that can run
**10x-50x slower** than the C++ implementation. Protobuf also supports a
binary extension for the Python package that contains fast
C++ based proto parsing. This extension is not available in the
standard Python-only pip package. We have created a custom binary
pip package for protobuf that contains the binary extension. To install
the custom binary protobuf pip package, invoke one of the following commands:
* for Python 2.7:
<pre>
$ <b>pip install --upgrade \
https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.1.0-cp27-none-linux_x86_64.whl</b>
</pre>
* for Python 3.5:
<pre>
$ <b>pip3 install --upgrade \
https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.1.0-cp35-none-linux_x86_64.whl</b>
</pre>
Installing this protobuf package will overwrite the existing protobuf package.
Note that the binary pip package already has support for protobufs
larger than 64MB, which should fix errors such as these:
<pre>[libprotobuf ERROR google/protobuf/src/google/protobuf/io/coded_stream.cc:207]
A protocol message was rejected because it was too big (more than 67108864 bytes).
To increase the limit (or to disable these warnings), see
CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.</pre>
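After installing, you can check which protobuf implementation is active.
The following is a quick diagnostic sketch; note that `api_implementation`
is an internal protobuf module, so treat this as a best-effort check rather
than a supported API:

```python
# Best-effort check of the active protobuf backend. The
# api_implementation module is internal to protobuf, so this is a
# diagnostic sketch, not a supported API.
from google.protobuf.internal import api_implementation

# Prints 'cpp' when the fast C++ binary extension is active,
# or 'python' for the pure-Python (slower) implementation.
print(api_implementation.Type())
```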

View File

@ -0,0 +1,396 @@
# Installing TensorFlow from Sources
This guide explains how to build TensorFlow sources into a TensorFlow
binary and how to install that TensorFlow binary. Note that we provide
well-tested, pre-built TensorFlow binaries for Linux, Mac, and Windows
systems. In addition, there are pre-built TensorFlow
[docker images](https://hub.docker.com/r/tensorflow/tensorflow/).
So, don't build a TensorFlow binary yourself unless you are very
comfortable building complex packages from source and dealing with
the inevitable aftermath should things not go exactly as documented.
If the last paragraph didn't scare you off, welcome. This guide explains
how to build TensorFlow on the following operating systems:
* Ubuntu
* Mac OS X
We don't officially support building TensorFlow on Windows; however, you may try
to build TensorFlow on Windows if you don't mind using the highly experimental
[Bazel on Windows](https://bazel.build/versions/master/docs/windows.html)
or
[TensorFlow CMake build](https://github.com/tensorflow/tensorflow/tree/r0.12/tensorflow/contrib/cmake).
## Determine which TensorFlow to install
You must choose one of the following types of TensorFlow to build and
install:
* **TensorFlow with CPU support only**. If your system does not have an
NVIDIA® GPU, build and install this version. Note that this version of
TensorFlow is typically easier to build and install, so even if you
have an NVIDIA GPU, we recommend building and installing this version
first.
* **TensorFlow with GPU support**. TensorFlow programs typically run
significantly faster on a GPU than on a CPU. Therefore, if your system
has an NVIDIA GPU and you need to run performance-critical applications,
you should ultimately build and install this version.
Beyond the NVIDIA GPU itself, your system must also fulfill the NVIDIA
software requirements described in one of the following documents:
* @{$install_linux#NVIDIARequirements$Installing TensorFlow on Ubuntu}
* @{$install_mac#NVIDIARequirements$Installing TensorFlow on Mac OS}
## Clone the TensorFlow repository
Start the process of building TensorFlow by cloning a TensorFlow
repository.
To clone **the latest** TensorFlow repository, issue the following command:
<pre>$ <b>git clone https://github.com/tensorflow/tensorflow</b> </pre>
The preceding <code>git clone</code> command creates a subdirectory
named `tensorflow`. After cloning, you may optionally build a
**specific branch** (such as a release branch) by invoking the
following commands:
<pre>
$ <b>cd tensorflow</b>
$ <b>git checkout</b> <i>Branch</i> # where <i>Branch</i> is the desired branch
</pre>
For example, to work with the `r1.0` release instead of the master release,
issue the following command:
<pre>$ <b>git checkout r1.0</b></pre>
Next, you must prepare your environment for
[Linux](#PrepareLinux)
or
[Mac OS](#PrepareMac).
<a name="PrepareLinux"></a>
## Prepare environment for Linux
Before building TensorFlow on Linux, install the following build
tools on your system:
* bazel
* TensorFlow Python dependencies
* optionally, NVIDIA packages to support TensorFlow for GPU.
### Install Bazel
If bazel is not installed on your system, install it now by following
[these directions](https://bazel.build/versions/master/docs/install.html).
### Install TensorFlow Python dependencies
To install TensorFlow, you must install the following packages:
* `numpy`, which is a numerical processing package that TensorFlow requires.
* `dev`, the Python development headers, which enable adding extensions to Python.
* `pip`, which enables you to install and manage certain Python packages.
* `wheel`, which enables you to manage Python compressed packages in
the wheel (.whl) format.
To install these packages for Python 2.7, issue the following command:
<pre>
$ <b>sudo apt-get install python-numpy python-dev python-pip python-wheel</b>
</pre>
To install these packages for Python 3.n, issue the following command:
<pre>
$ <b>sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel</b>
</pre>
### Optional: install TensorFlow for GPU prerequisites
If you are building TensorFlow without GPU support, skip this section.
The following NVIDIA <i>hardware</i> must be installed on your system:
* GPU card with CUDA Compute Capability 3.0 or higher. See
[NVIDIA documentation](https://developer.nvidia.com/cuda-gpus)
for a list of supported GPU cards.
The following NVIDIA <i>software</i> must be installed on your system:
* NVIDIA's Cuda Toolkit (>= 7.0). We recommend version 8.0.
For details, see
[NVIDIA's documentation](http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#axzz4VZnqTJ2A).
Ensure that you append the relevant Cuda pathnames to the
`LD_LIBRARY_PATH` environment variable as described in the
NVIDIA documentation.
* The NVIDIA drivers associated with NVIDIA's Cuda Toolkit.
* cuDNN (>= v3). We recommend version 5.1. For details, see
[NVIDIA's documentation](https://developer.nvidia.com/cudnn),
particularly the description of appending the appropriate pathname
to your `LD_LIBRARY_PATH` environment variable.
Finally, you must also install `libcupti-dev` by invoking the following
command:
<pre> $ <b>sudo apt-get install libcupti-dev</b> </pre>
### Next
After preparing the environment, you must now
[configure the installation](#ConfigureInstallation).
<a name="PrepareMac"></a>
## Prepare environment for Mac OS
Before building TensorFlow, you must install the following on your system:
* bazel
* TensorFlow Python dependencies.
* optionally, NVIDIA packages to support TensorFlow for GPU.
### Install bazel
If bazel is not installed on your system, install it now by following
[these directions](https://bazel.build/versions/master/docs/install.html#mac-os-x).
### Install python dependencies
To install TensorFlow, you must install the following packages:
* six
* numpy, which is a numerical processing package that TensorFlow requires.
* wheel, which enables you to manage Python compressed packages
in the wheel (.whl) format.
You may install the Python dependencies using pip. If you don't have pip
on your machine, we recommend using homebrew to install Python and pip as
[documented here](http://docs.python-guide.org/en/latest/starting/install/osx/).
If you follow these instructions, you will not need to disable SIP.
After installing pip, invoke the following commands:
<pre> $ <b>sudo pip install six numpy wheel</b> </pre>
### Optional: install TensorFlow for GPU prerequisites
If you do not have brew installed, install it by following
[these instructions](http://brew.sh/).
After installing brew, install GNU coreutils by issuing the following command:
<pre>$ <b>brew install coreutils</b></pre>
If you want to compile TensorFlow and have Xcode 7.3 and CUDA 7.5 installed,
note that Xcode 7.3 is not yet compatible with CUDA 7.5. To remedy this
problem, do either of the following:
* Upgrade to CUDA 8.0.
* Download Xcode 7.2 and select it as your default by issuing the following
command:
<pre> $ <b>sudo xcode-select -s /Applications/Xcode-7.2/Xcode.app</b></pre>
**NOTE:** Your system must fulfill the NVIDIA software requirements described
in one of the following documents:
* @{$install_linux#NVIDIARequirements$Installing TensorFlow on Linux}
* @{$install_mac#NVIDIARequirements$Installing TensorFlow on Mac OS}
<a name="ConfigureInstallation"></a>
## Configure the installation
The root of the source tree contains a bash script named
<code>configure</code>. This script asks you to identify the pathnames of all
relevant TensorFlow dependencies and to specify other build configuration options
such as compiler flags. You must run this script *prior* to
creating the pip package and installing TensorFlow.
If you wish to build TensorFlow with GPU, `configure` will ask
you to specify the version numbers of Cuda and cuDNN. If several
versions of Cuda or cuDNN are installed on your system, explicitly select
the desired version instead of relying on the system default.
Here is an example execution of the `configure` script. Note that your
own input will likely differ from our sample input:
<pre>
$ <b>cd tensorflow</b> # cd to the top-level directory created
$ <b>./configure</b>
Please specify the location of python. [Default is /usr/bin/python]: <b>/usr/bin/python2.7</b>
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA JIT support will be enabled for TensorFlow
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] N
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] Y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: <b>8.0</b>
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to use system default]: <b>5</b>
Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: <b>3.0</b>
Setting up Cuda include
Setting up Cuda lib
Setting up Cuda bin
Setting up Cuda nvvm
Setting up CUPTI include
Setting up CUPTI lib64
Configuration finished
</pre>
If you told `configure` to build for GPU support, then `configure`
will create a canonical set of symbolic links to the Cuda libraries
on your system. Therefore, every time you change the Cuda library paths,
you must rerun the `configure` script before re-invoking
the <code>bazel build</code> command.
Note the following:
* Although it is possible to build both Cuda and non-Cuda configs
under the same source tree, we recommend running `bazel clean` when
switching between these two configurations in the same source tree.
* If you don't run the `configure` script *before* running the
`bazel build` command, the `bazel build` command will fail.
## Build the pip package
To build a pip package for TensorFlow with CPU-only support,
invoke the following command:
<pre>
$ <b>bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package</b>
</pre>
To build a pip package for TensorFlow with GPU support,
invoke the following command:
<pre>$ <b>bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package</b> </pre>
<b>Tip:</b> By default, building TensorFlow from sources consumes
a lot of RAM. If RAM is an issue on your system, you may limit RAM usage
by specifying <code>--local_resources 2048,.5,1.0</code> while
invoking `bazel`.
The <code>bazel build</code> command builds a script named
`build_pip_package`. Running this script as follows will build
a `.whl` file within the `/tmp/tensorflow_pkg` directory:
<pre>
$ <b>bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg</b>
</pre>
## Install the pip package
Invoke `pip install` to install that pip package.
The filename of the `.whl` file depends on your platform.
For example, the following command will install the pip package
for TensorFlow 1.0.0 on Linux:
<pre>
$ <b>sudo pip install /tmp/tensorflow_pkg/tensorflow-1.0.0-py2-none-any.whl</b>
</pre>
<a name="#ValidateYourInstallation"></a>
## Validate your installation
Validate your TensorFlow installation by doing the following:
1. Start a terminal.
2. Change directory (`cd`) to any directory on your system other than
the `tensorflow` subdirectory from which you invoked the `configure`
command.
3. Invoke python:
<pre> $ <b>python</b></pre>
4. Enter the following short program inside the python interactive shell:
<pre>>>> <b>import tensorflow as tf</b>
>>> <b>hello = tf.constant('Hello, TensorFlow!')</b>
>>> <b>sess = tf.Session()</b>
>>> <b>print(sess.run(hello))</b></pre>
If the Python program outputs the following, then the installation
is successful and you can begin writing TensorFlow programs. (If you
are new to TensorFlow, see *Getting Started with TensorFlow*):
<pre>Hello, TensorFlow!</pre>
If the system generates an error message instead of a greeting, see
the next section.
## Common installation problems
The installation problems you encounter typically depend on the
operating system. See the "Common installation problems" section
of one of the following guides:
* @{$install_linux#CommonInstallationProblems$Installing TensorFlow on Linux}
* @{$install_mac#CommonInstallationProblems$Installing TensorFlow on Mac OS}
Beyond the errors documented in those two guides, the following table
notes additional errors specific to building TensorFlow. Note that we
are relying on Stack Overflow as the repository for build and installation
problems. If you encounter an error message not listed in the preceding
two guides or in the following table, search for it on Stack Overflow. If
Stack Overflow doesn't show the error message, ask a new question on
Stack Overflow and specify the `tensorflow` tag.
<table>
<tr> <th>Stack Overflow Link</th> <th>Error Message</th> </tr>
<tr>
<td><a href="http://stackoverflow.com/q/42013316">42013316</a></td>
<td><pre>ImportError: libcudart.so.8.0: cannot open shared object file:
No such file or directory</pre></td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/q/42013316">42013316</a></td>
<td><pre>ImportError: libcudnn.5: cannot open shared object file:
No such file or directory</pre></td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/q/35953210">35953210</a></td>
<td>Invoking `python` or `ipython` generates the following error:
<pre>ImportError: cannot import name pywrap_tensorflow</pre></td>
</tr>
</table>

View File

@ -0,0 +1,197 @@
# Installing TensorFlow on Windows
This guide explains how to install TensorFlow on Windows.
## Determine which TensorFlow to install
You must choose one of the following types of TensorFlow to install:
* **TensorFlow with CPU support only**. If your system does not have an
NVIDIA® GPU, you must install this version. Note that this version of
TensorFlow is usually much easier to install (typically
in 5 or 10 minutes), so even if you have an NVIDIA GPU, we recommend
installing this version first.
* **TensorFlow with GPU support**. TensorFlow programs typically run
significantly faster on a GPU than on a CPU. Therefore, if your
system has an NVIDIA® GPU meeting the prerequisites shown below
and you need to run performance-critical applications, you should
ultimately install this version.
### Requirements to run TensorFlow with GPU support
If you are installing TensorFlow with GPU support using one of the mechanisms
described in this guide, then the following NVIDIA software must be
installed on your system:
* CUDA® Toolkit 8.0. For details, see
[NVIDIA's
documentation](http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/).
Ensure that you append the relevant Cuda pathnames to the `%PATH%`
environment variable as described in the NVIDIA documentation.
* The NVIDIA drivers associated with CUDA Toolkit 8.0.
* cuDNN v5.1. For details, see
[NVIDIA's documentation](https://developer.nvidia.com/cudnn).
Note that cuDNN is typically installed in a different location from the
other CUDA DLLs. Ensure that you add the directory where you installed
the cuDNN DLL to your `%PATH%` environment variable.
* GPU card with CUDA Compute Capability 3.0 or higher. See
[NVIDIA documentation](https://developer.nvidia.com/cuda-gpus) for a
list of supported GPU cards.
If you have an earlier version of the preceding packages, please
upgrade to the specified versions.
## Determine how to install TensorFlow
You must pick the mechanism by which you install TensorFlow. The
supported choices are as follows:
* "native" pip
* Anaconda
Native pip installs TensorFlow directly on your system without going
through a virtual environment. Since a native pip installation is not
walled-off in a separate container, the pip installation might interfere
with other Python-based installations on your system. However, if you
understand pip and your Python environment, a native pip installation
often entails only a single command! Furthermore, if you install with
native pip, users can run TensorFlow programs from any directory on
the system.
In Anaconda, you may use conda to create a virtual environment.
However, within Anaconda, we recommend installing TensorFlow with the
`pip install` command, not with the `conda install` command.
**NOTE:** The conda package is community supported, not officially supported.
That is, the TensorFlow team neither tests nor maintains this conda package.
Use that package at your own risk.
## Installing with native pip
If the following version of Python is not installed on your machine,
install it now:
* [Python 3.5.x from python.org](https://www.python.org/downloads/release/python-352/)
TensorFlow only supports version 3.5.x of Python on Windows.
Note that Python 3.5.x comes with the pip3 package manager, which is the
program you'll use to install TensorFlow.
To install TensorFlow, start a terminal. Then issue the appropriate
<tt>pip3 install</tt> command in that terminal. To install the CPU-only
version of TensorFlow, enter the following command:
<pre>C:\> <b>pip3 install --upgrade tensorflow</b></pre>
To install the GPU version of TensorFlow, enter the following command:
<pre>C:\> <b>pip3 install --upgrade tensorflow-gpu</b></pre>
## Installing with Anaconda
**The Anaconda installation is community supported, not officially supported.**
Take the following steps to install TensorFlow in an Anaconda environment:
1. Follow the instructions on the
[Anaconda download site](https://www.continuum.io/downloads)
to download and install Anaconda.
2. Create a conda environment named <tt>tensorflow</tt>
by invoking the following command:
<pre>C:\> <b>conda create -n tensorflow</b> </pre>
3. Activate the conda environment by issuing the following command:
<pre>C:\> <b>activate tensorflow</b>
(tensorflow)C:\> # Your prompt should change </pre>
4. Issue the appropriate command to install TensorFlow inside your conda
environment. To install the CPU-only version of TensorFlow, enter the
following command:
<pre>(tensorflow)C:\> <b>pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-1.0.0-cp35-cp35m-win_amd64.whl</b> </pre>
To install the GPU version of TensorFlow, enter the following command
(on a single line):
<pre>(tensorflow)C:\> <b>pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.0.0-cp35-cp35m-win_amd64.whl</b> </pre>
<a name="#ValidateYourInstallation"></a>
## Validate your installation
Validate your TensorFlow installation by doing the following:
1. Start a terminal.
2. If you installed through Anaconda, activate your Anaconda environment.
3. Inside that terminal, invoke python:
<pre>C:\> <b>python</b> </pre>
4. Enter the following short program inside the python interactive shell:
<pre>>>> <b>import tensorflow as tf</b>
>>> <b>hello = tf.constant('Hello, TensorFlow!')</b>
>>> <b>sess = tf.Session()</b>
>>> <b>print(sess.run(hello))</b>
</pre>
If the Python program outputs the following, then the installation
is successful and you can begin writing TensorFlow programs. (If you
are new to TensorFlow, see
@{$get_started$Getting Started with TensorFlow}.)
<pre>Hello, TensorFlow!</pre>
If the system generates an error message instead of a greeting,
see the next section.
## Common installation problems
We are relying on Stack Overflow to document TensorFlow installation problems
and their remedies. The following table contains links to Stack Overflow
answers for some common installation problems.
If you encounter an error message or other
installation problem not listed in the following table, search for it
on Stack Overflow. If Stack Overflow doesn't show the error message,
ask a new question about it on Stack Overflow and specify
the `tensorflow` tag.
<table>
<tr> <th>Stack Overflow Link</th> <th>Error Message</th> </tr>
<tr>
<td><a href="https://stackoverflow.com/q/41007279">41007279</a></td>
<td>
<pre>[...\stream_executor\dso_loader.cc] Couldn't open CUDA library nvcuda.dll</pre>
</td>
</tr>
<tr>
<td><a href="https://stackoverflow.com/q/41007279">41007279</a></td>
<td>
<pre>[...\stream_executor\cuda\cuda_dnn.cc] Unable to load cuDNN DSO</pre>
</td>
</tr>
<tr>
<td><a href="http://stackoverflow.com/q/42006320">42006320</a></td>
<td><pre>ImportError: Traceback (most recent call last):
File "...\tensorflow\core\framework\graph_pb2.py", line 6, in <module>
from google.protobuf import descriptor as _descriptor
ImportError: cannot import name 'descriptor'</pre>
</td>
</tr>
<tr>
<td><a href="https://stackoverflow.com/q/42011070">42011070</a></td>
<td><pre>No module named "pywrap_tensorflow"</pre></td>
</tr>
</table>

View File

@ -0,0 +1,5 @@
install_linux.md
install_mac.md
install_windows.md
install_sources.md
migration.md

View File

@ -0,0 +1,337 @@
# Transitioning to TensorFlow 1.0
The APIs in TensorFlow 1.0 have changed in ways that are not all backwards
compatible. That is, TensorFlow programs that worked on TensorFlow 0.n won't
necessarily work on TensorFlow 1.0. We have made these API changes to ensure an
internally-consistent API, and do not plan to make backwards-breaking changes
throughout the 1.N lifecycle.
This guide walks you through the major changes in the API and how to
automatically upgrade your programs for TensorFlow 1.0. This guide not
only steps you through the changes but also explains why we've made them.
## How to upgrade
If you would like to automatically port your code to 1.0, you can try our
`tf_upgrade.py` script. While this script handles many cases, manual changes
are sometimes necessary.
Get this script from our
[GitHub tree](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/compatibility).
To convert a single 0.n TensorFlow source file to 1.0, enter a
command of the following format:
<pre>
$ <b>python tf_upgrade.py --infile</b> <i>InputFile</i> <b>--outfile</b> <i>OutputFile</i>
</pre>
For example, the following command converts a 0.n TensorFlow
program named `test.py` to a 1.0 TensorFlow program named `test_1.0.py`:
<pre>
$ <b>python tf_upgrade.py --infile test.py --outfile test_1.0.py</b>
</pre>
The `tf_upgrade.py` script also generates a file named `report.txt`, which
details all the changes it performed and makes additional suggestions about
changes you might need to make manually.
To upgrade a whole directory of 0.n TensorFlow programs to 1.0,
enter a command having the following format:
<pre>
$ <b>python tf_upgrade.py --intree</b> <i>InputDir</i> <b>--outtree</b> <i>OutputDir</i>
</pre>
For example, the following command converts all the 0.n TensorFlow programs
in the `/home/user/cool` directory, creating their 1.0 equivalents in
the `/home/user/cool_1.0` directory:
<pre>
$ <b>python tf_upgrade.py --intree /home/user/cool --outtree /home/user/cool_1.0</b>
</pre>
### Limitations
There are a few things to watch out for. Specifically:
* You must manually fix any instances of `tf.reverse()`.
The `tf_upgrade.py` script will warn you about `tf.reverse()` in
stdout and in the `report.txt` file.
* For functions with reordered arguments, `tf_upgrade.py` tries to reformat
your code minimally, so it cannot automatically change the actual argument order.
Instead, `tf_upgrade.py` makes your function invocations order-independent
by introducing keyword arguments.
* Constructions like `tf.get_variable_scope().reuse_variables()`
will likely not work. We recommend deleting those lines and replacing
them with lines such as the following:
<pre>
with tf.variable_scope(tf.get_variable_scope(), reuse=True):
...
</pre>
* Analogously to `tf.pack` and `tf.unpack`, we've renamed
`TensorArray.pack` and `TensorArray.unpack` to
`TensorArray.stack` and `TensorArray.unstack`. However, `TensorArray.pack`
and `TensorArray.unpack` cannot be detected lexically since they are
indirectly related to the `tf` namespace, e.g.
`foo = tf.TensorArray(); foo.unpack()`; see the sketch below.
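A minimal sketch of the new `TensorArray` method names; the values are
arbitrary placeholders:

```python
import tensorflow as tf

ta = tf.TensorArray(tf.float32, size=2)
ta = ta.write(0, tf.constant(1.0))
ta = ta.write(1, tf.constant(2.0))
stacked = ta.stack()                      # was ta.pack()

tb = tf.TensorArray(tf.float32, size=2)
tb = tb.unstack(tf.constant([3.0, 4.0]))  # was tb.unpack(...)

with tf.Session() as sess:
    print(sess.run(stacked))              # [ 1.  2.]
```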
## Upgrading your code manually
Instead of running `tf_upgrade.py`, you may manually upgrade your code.
The remainder of this document provides a comprehensive list of
all backward incompatible changes made in TensorFlow 1.0.
### Variables
Variable functions have been made more consistent and less confusing.
* `tf.VARIABLES`
* should be renamed to `tf.GLOBAL_VARIABLES`
* `tf.all_variables`
* should be renamed to `tf.global_variables`
* `tf.initialize_all_variables`
* should be renamed to `tf.global_variables_initializer`
* `tf.initialize_local_variables`
* should be renamed to `tf.local_variables_initializer`
* `tf.initialize_variables`
* should be renamed to `tf.variables_initializer`
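For example, a minimal before/after sketch of the initializer rename; the
variable itself is a placeholder:

```python
import tensorflow as tf

v = tf.Variable(1.0)

# TensorFlow 0.n:
#   init = tf.initialize_all_variables()
# TensorFlow 1.0:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    print(sess.run(v))  # 1.0
```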
### Summary functions
Summary functions have been consolidated under the `tf.summary` namespace.
* `tf.audio_summary`
* should be renamed to `tf.summary.audio`
* `tf.contrib.deprecated.histogram_summary`
* should be renamed to `tf.summary.histogram`
* `tf.contrib.deprecated.scalar_summary`
* should be renamed to `tf.summary.scalar`
* `tf.histogram_summary`
* should be renamed to `tf.summary.histogram`
* `tf.image_summary`
* should be renamed to `tf.summary.image`
* `tf.merge_all_summaries`
* should be renamed to `tf.summary.merge_all`
* `tf.merge_summary`
* should be renamed to `tf.summary.merge`
* `tf.scalar_summary`
* should be renamed to `tf.summary.scalar`
* `tf.train.SummaryWriter`
* should be renamed to `tf.summary.FileWriter`
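For example, a minimal sketch of the consolidated `tf.summary` API; the log
directory and scalar value are placeholders:

```python
import tensorflow as tf

loss = tf.constant(0.25)
tf.summary.scalar('loss', loss)              # was tf.scalar_summary('loss', loss)
merged = tf.summary.merge_all()              # was tf.merge_all_summaries()
writer = tf.summary.FileWriter('/tmp/logs')  # was tf.train.SummaryWriter(...)

with tf.Session() as sess:
    writer.add_summary(sess.run(merged), global_step=0)
    writer.close()
```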
### Numeric differences
Integer division and `tf.floordiv` now use flooring semantics. This is to
make the results of `tf.divide` and `tf.mod` consistent with `np.divide` and
`np.mod`, respectively. In addition, we have changed the rounding algorithm
used by `tf.round` to match NumPy.
* `tf.div`
* The semantics of `tf.divide` have been changed to match Python
semantics completely. That is, `/` in Python 3 (and in Python 2 with future
division enabled) always produces floating point numbers, while `//` produces
floored division. Note that `tf.div` itself now produces floored integer division.
To force C-style truncation semantics, you must use `tf.truncatediv`.
* Consider changing your code to use `tf.divide`, which follows Python semantics for promotion.
* `tf.mod`
* The semantics of `tf.mod` have been changed to match Python semantics. In
particular, flooring semantics are used for integers. If you wish to have
C-style truncation mod (remainders), you can use `tf.truncatemod`.
The old and new behavior of division can be summarized with this table:
| Expr | TF 0.11 (py2) | TF 0.11 (py3) | TF 1.0 (py2) | TF 1.0 (py3) |
|---------------------|---------------|---------------|--------------|--------------|
| tf.div(3,4) | 0 | 0 | 0 | 0 |
| tf.div(-3,4) | 0 | 0 | -1 | -1 |
| tf.mod(-3,4) | -3 | -3 | 1 | 1 |
| -3/4 | 0 | -0.75 | -1 | -0.75 |
| tf.divide(-3,4)     | N/A           | N/A           | -1           | -0.75        |
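For instance, a quick check of the new semantics (a minimal sketch, run
under Python 3; the expected values are noted in comments):

```python
import tensorflow as tf

with tf.Session() as sess:
    print(sess.run(tf.div(-3, 4)))          # -1: flooring integer division
    print(sess.run(tf.truncatediv(-3, 4)))  # 0: C-style truncation
    print(sess.run(tf.divide(-3, 4)))       # -0.75: Python 3 `/` semantics
    print(sess.run(tf.mod(-3, 4)))          # 1: flooring mod
    print(sess.run(tf.truncatemod(-3, 4)))  # -3: C-style remainder
```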
The old and new behavior of rounding can be summarized with this table:
| Input | Python | NumPy | C++ round() | TensorFlow 0.11(floor(x+.5)) | TensorFlow 1.0 |
|-------|--------|-------|-------------|------------------------------|----------------|
| -3.5 | -4 | -4 | -4 | -3 | -4 |
| -2.5 | -2 | -2 | -3 | -2 | -2 |
| -1.5 | -2 | -2 | -2 | -1 | -2 |
| -0.5 | 0 | 0 | -1 | 0 | 0 |
| 0.5 | 0 | 0 | 1 | 1 | 0 |
| 1.5 | 2 | 2 | 2 | 2 | 2 |
| 2.5 | 2 | 2 | 3 | 3 | 2 |
| 3.5 | 4 | 4 | 4 | 4 | 4 |
### NumPy matching names
Many functions have been renamed to match NumPy. This was done to make the
transition between NumPy and TensorFlow as easy as possible. There are still
numerous cases where functions do not match, so this is far from a hard and
fast rule, but we have removed several commonly noticed inconsistencies.
* `tf.inv`
* should be renamed to `tf.reciprocal`
* This was done to avoid confusion with NumPy's matrix inverse `np.inv`
* `tf.list_diff`
* should be renamed to `tf.setdiff1d`
* `tf.listdiff`
* should be renamed to `tf.setdiff1d`
* `tf.mul`
* should be renamed to `tf.multiply`
* `tf.neg`
* should be renamed to `tf.negative`
* `tf.select`
* should be renamed to `tf.where`
* `tf.where` now takes 3 arguments or 1 argument, just like `np.where`
* `tf.sub`
* should be renamed to `tf.subtract`
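A minimal sketch of the renamed operations in use; the tensor values are
placeholders:

```python
import tensorflow as tf

x = tf.constant([1.0, -2.0])
y = tf.multiply(x, 3.0)      # was tf.mul
z = tf.negative(x)           # was tf.neg
r = tf.reciprocal(x)         # was tf.inv
w = tf.where(x > 0.0, x, z)  # was tf.select
d = tf.subtract(x, y)        # was tf.sub
```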
### NumPy matching arguments
Arguments for certain TensorFlow 1.0 methods now match arguments in certain
NumPy methods. To achieve this, TensorFlow 1.0 has changed keyword arguments
and reordered some arguments. Notably, TensorFlow 1.0 now uses `axis` rather
than `dimension`. TensorFlow 1.0 aims to keep the tensor argument first on
operations that modify Tensors (see the `tf.concat` change).
* `tf.argmax`
* keyword argument `dimension` should be renamed to `axis`
* `tf.argmin`
* keyword argument `dimension` should be renamed to `axis`
* `tf.concat`
* keyword argument `concat_dim` should be renamed to `axis`
* arguments have been reordered to `tf.concat(values, axis, name='concat')`.
* `tf.count_nonzero`
* keyword argument `reduction_indices` should be renamed to `axis`
* `tf.expand_dims`
* keyword argument `dim` should be renamed to `axis`
* `tf.reduce_all`
* keyword argument `reduction_indices` should be renamed to `axis`
* `tf.reduce_any`
* keyword argument `reduction_indices` should be renamed to `axis`
* `tf.reduce_join`
* keyword argument `reduction_indices` should be renamed to `axis`
* `tf.reduce_logsumexp`
* keyword argument `reduction_indices` should be renamed to `axis`
* `tf.reduce_max`
* keyword argument `reduction_indices` should be renamed to `axis`
* `tf.reduce_mean`
* keyword argument `reduction_indices` should be renamed to `axis`
* `tf.reduce_min`
* keyword argument `reduction_indices` should be renamed to `axis`
* `tf.reduce_prod`
* keyword argument `reduction_indices` should be renamed to `axis`
* `tf.reduce_sum`
* keyword argument `reduction_indices` should be renamed to `axis`
* `tf.reverse`
* `tf.reverse` used to take a 1D `bool` tensor to control which dimensions were reversed. Now we use a Tensor of axis indices.
* For example `tf.reverse(a, [True, False, True])` now must be `tf.reverse(a, [0, 2])`
* `tf.reverse_sequence`
* keyword argument `batch_dim` should be renamed to `batch_axis`
* keyword argument `seq_dim` should be renamed to `seq_axis`
* `tf.sparse_concat`
* keyword argument `concat_dim` should be renamed to `axis`
* `tf.sparse_reduce_sum`
* keyword argument `reduction_axes` should be renamed to `axis`
* `tf.sparse_reduce_sum_sparse`
* keyword argument `reduction_axes` should be renamed to `axis`
* `tf.sparse_split`
* keyword argument `split_dim` should be renamed to `axis`
* arguments have been reordered to `tf.sparse_split(keyword_required=KeywordRequired(), sp_input=None, num_split=None, axis=None, name=None, split_dim=None)`.
* `tf.split`
* keyword argument `split_dim` should be renamed to `axis`
* keyword argument `num_split` should be renamed to `num_or_size_splits`
* arguments have been reordered to `tf.split(value, num_or_size_splits, axis=0, num=None, name='split')`.
* `tf.squeeze`
* keyword argument `squeeze_dims` should be renamed to `axis`
* `tf.svd`
* arguments have been reordered to `tf.svd(tensor, full_matrices=False, compute_uv=True, name=None)`.
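A minimal sketch of the `axis` keyword and the reordered `tf.concat` and
`tf.split` signatures; the matrix is a placeholder:

```python
import tensorflow as tf

m = tf.constant([[1, 2], [3, 4]])

s = tf.reduce_sum(m, axis=1)    # was reduction_indices=1
a = tf.argmax(m, axis=0)        # was dimension=0
c = tf.concat([m, m], axis=0)   # was tf.concat(concat_dim, values)
parts = tf.split(m, 2, axis=1)  # was tf.split(split_dim, num_split, value)
```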
### Simplified math variants
Batched versions of math operations have been removed. Now the functionality is
contained in the non-batched versions. Similarly, `tf.complex_abs` has had its
functionality moved to `tf.abs`.
* `tf.batch_band_part`
* should be renamed to `tf.band_part`
* `tf.batch_cholesky`
* should be renamed to `tf.cholesky`
* `tf.batch_cholesky_solve`
* should be renamed to `tf.cholesky_solve`
* `tf.batch_fft`
* should be renamed to `tf.fft`
* `tf.batch_fft3d`
* should be renamed to `tf.fft3d`
* `tf.batch_ifft`
* should be renamed to `tf.ifft`
* `tf.batch_ifft2d`
* should be renamed to `tf.ifft2d`
* `tf.batch_ifft3d`
* should be renamed to `tf.ifft3d`
* `tf.batch_matmul`
* should be renamed to `tf.matmul`
* `tf.batch_matrix_determinant`
* should be renamed to `tf.matrix_determinant`
* `tf.batch_matrix_diag`
* should be renamed to `tf.matrix_diag`
* `tf.batch_matrix_inverse`
* should be renamed to `tf.matrix_inverse`
* `tf.batch_matrix_solve`
* should be renamed to `tf.matrix_solve`
* `tf.batch_matrix_solve_ls`
* should be renamed to `tf.matrix_solve_ls`
* `tf.batch_matrix_transpose`
* should be renamed to `tf.matrix_transpose`
* `tf.batch_matrix_triangular_solve`
* should be renamed to `tf.matrix_triangular_solve`
* `tf.batch_self_adjoint_eig`
* should be renamed to `tf.self_adjoint_eig`
* `tf.batch_self_adjoint_eigvals`
* should be renamed to `tf.self_adjoint_eigvals`
* `tf.batch_set_diag`
* should be renamed to `tf.set_diag`
* `tf.batch_svd`
* should be renamed to `tf.svd`
* `tf.complex_abs`
* should be renamed to `tf.abs`
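As a sketch, the non-batched names now accept batched inputs directly; the
shapes and values are placeholders:

```python
import tensorflow as tf

a = tf.random_normal([8, 3, 3])     # a batch of eight 3x3 matrices

inv = tf.matrix_inverse(a)          # was tf.batch_matrix_inverse
prod = tf.matmul(a, inv)            # was tf.batch_matmul
det = tf.matrix_determinant(a)      # was tf.batch_matrix_determinant
mag = tf.abs(tf.complex(3.0, 4.0))  # was tf.complex_abs; evaluates to 5.0
```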
### Misc Changes
Several other changes have been made, including the following:
* `tf.image.per_image_whitening`
* should be renamed to `tf.image.per_image_standardization`
* `tf.nn.sigmoid_cross_entropy_with_logits`
* arguments have been reordered to `tf.nn.sigmoid_cross_entropy_with_logits(_sentinel=None, labels=None, logits=None, name=None)`.
* `tf.nn.softmax_cross_entropy_with_logits`
* arguments have been reordered to `tf.nn.softmax_cross_entropy_with_logits(_sentinel=None, labels=None, logits=None, dim=-1, name=None)`.
* `tf.nn.sparse_softmax_cross_entropy_with_logits`
* arguments have been reordered to `tf.nn.sparse_softmax_cross_entropy_with_logits(_sentinel=None, labels=None, logits=None, name=None)`.
* `tf.ones_initializer`
* should be changed to a function call i.e. `tf.ones_initializer()`
* `tf.pack`
* should be renamed to `tf.stack`
* `tf.round`
* The semantics of `tf.round` now match Banker's rounding.
* `tf.unpack`
* should be renamed to `tf.unstack`
* `tf.zeros_initializer`
* should be changed to a function call i.e. `tf.zeros_initializer()`
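A minimal sketch combining several of these changes; the variable name and
tensor values are placeholders:

```python
import tensorflow as tf

# Initializers are now constructed by calling them:
v = tf.get_variable('v', shape=[2], initializer=tf.zeros_initializer())

s = tf.stack([tf.constant(1), tf.constant(2)])  # was tf.pack
u = tf.unstack(s)                               # was tf.unpack

labels = tf.constant([[0.0, 1.0]])
logits = tf.constant([[2.0, 0.5]])
# The cross-entropy functions now require keyword arguments:
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
```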

View File

@ -1,8 +1,9 @@
### XLA
performance_guide.md
xla/index.md
xla/jit.md
xla/tfcompile.md
xla/broadcasting.md
xla/developing_new_backend.md
xla/jit.md
xla/operation_semantics.md
xla/shapes.md
xla/broadcasting.md
xla/tfcompile.md
quantization.md

View File

@ -0,0 +1,143 @@
# Performance
This guide contains a collection of best practices for optimizing your
TensorFlow code. The best practices apply to both new and experienced
TensorFlow users.
## Best Practices
While the best optimizations vary across model types, the topics below cover
best practices for getting the most performance from
TensorFlow. Although these suggestions focus on image-based models, we will
regularly add tips for all kinds of models. The following list highlights key
best practices:
* Build and install from source
* Utilize queues for reading data
* Preprocessing on the CPU
* Use `NCHW` image data format
* Place shared parameters on the GPU
* Use fused batch norm
The following sections detail the preceding suggestions.
### Build and install from source
To install the most optimized version of TensorFlow, build and install
TensorFlow from source by following [Installing TensorFlow from Source](../install/install_sources).
Building from source with compiler optimizations for the target hardware and
ensuring the latest CUDA platform and cuDNN libraries are installed results in
the highest performing installs.
For the most stable experience, build from the [latest release](https://github.com/tensorflow/tensorflow/releases)
branch. To get the latest performance changes and accept some stability risk,
build from [master](https://github.com/tensorflow/tensorflow).
If there is a need to build TensorFlow on a platform that has different hardware
than the target, then cross-compile with the highest optimizations for the target
platform. The following command is an example of telling `bazel` to compile for
a specific platform:
```shell
# This command optimizes for Intel's Broadwell processor
bazel build -c opt --copt=-march="broadwell" --config=cuda //tensorflow/tools/pip_package:build_pip_package
```
#### Environment, build, and install tips
* Compile with the highest level of compute the [GPU
supports](http://developer.nvidia.com/cuda-gpus), e.g. P100: 6.0, Titan X
(Pascal): 6.1, Titan X (Maxwell): 5.2, and K80: 3.7.
* Install the latest CUDA platform and cuDNN libraries.
* Make sure to use a version of gcc that supports all of the optimizations of
the target CPU. The recommended minimum gcc version is 4.8.3.
* TensorFlow checks on startup whether it has been compiled with the
optimizations available on the CPU. If the optimizations are not included,
TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA instructions not
included.
### Utilize queues for reading data
One common cause of poor performance is underutilizing GPUs, or essentially
"starving" them of data by not setting up an efficient pipeline. Make sure to
set up an input pipeline to utilize queues and stream data effectively. Review
the @{$reading_data#reading_from_files$Reading Data guide} for implementation
details. One way to identify a "starved" GPU is to generate and review
timelines. A detailed tutorial for timelines does not exist, but a quick example
of generating a timeline exists as part of the @{$jit$XLA JIT} tutorial. Another
simple way to check if a GPU is underutilized is to run `watch nvidia-smi`, and
if GPU utilization is not approaching 100% then the GPU is not getting data fast
enough.
Except in special circumstances or in example code, do not feed data into the
session from Python variables, e.g. via a feed `dictionary`.
```python
# This will result in poor performance.
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
```
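Instead, read input through a queue-based pipeline. The following is a minimal
sketch, assuming TFRecord files with an `image` bytes feature and a `label`
int64 feature (the file and feature names are illustrative):

```python
import tensorflow as tf

# Build a queue of input files and read serialized examples from it.
filename_queue = tf.train.string_input_producer(["train-00000.tfrecord"])
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)

# Parse and decode a single example.
features = tf.parse_single_example(
    serialized_example,
    features={
        "image": tf.FixedLenFeature([], tf.string),
        "label": tf.FixedLenFeature([], tf.int64),
    })
image = tf.decode_raw(features["image"], tf.uint8)
image = tf.reshape(image, [28, 28, 1])

# Batch examples on background threads so the GPU is never starved.
# Remember to call tf.train.start_queue_runners() before the training loop.
images, labels = tf.train.shuffle_batch(
    [image, features["label"]], batch_size=128,
    capacity=10000, min_after_dequeue=1000, num_threads=4)
```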
### Preprocessing on the CPU
Placing preprocessing operations on the CPU can significantly improve
performance. When preprocessing occurs on the GPU the flow of data is
CPU -> GPU (preprocessing) -> CPU -> GPU (training). The data is bounced back
and forth between the CPU and GPU. When preprocessing is placed on the CPU,
the data flow is CPU (preprocessing) -> GPU (training). Another benefit is
preprocessing on the CPU frees GPU time to focus on training.
Placing preprocessing on the CPU can result in a 6X+ increase in samples/sec
processed, which could lead to training in 1/6th of the time. To ensure
preprocessing is on the CPU, wrap the preprocessing operations as shown below:
```python
with tf.device('/cpu:0'):
# function to get and process images or data.
distorted_inputs = load_and_distort_images()
```
### Use large files
Under some circumstances, both the CPU and GPU can be starved for data by the
I/O system. If you are using many small files to form your input data set, you
may be limited by the speed of your filesystem. If your training loop runs
faster when using SSDs vs HDDs for storing your input data, you could be
I/O bottlenecked.
If this is the case, you should pre-process your input data, creating a few
large TFRecord files.
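As a sketch of that conversion, assuming a hypothetical `load_small_files()`
helper that yields raw image bytes and integer labels:

```python
import tensorflow as tf

# Write many small examples into a single large TFRecord file.
writer = tf.python_io.TFRecordWriter("train-00000.tfrecord")
for image_bytes, label in load_small_files():  # hypothetical loader
  example = tf.train.Example(features=tf.train.Features(feature={
      "image": tf.train.Feature(
          bytes_list=tf.train.BytesList(value=[image_bytes])),
      "label": tf.train.Feature(
          int64_list=tf.train.Int64List(value=[label])),
  }))
  writer.write(example.SerializeToString())
writer.close()
```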
### Use NCHW image data format
Image data format refers to the representation of batches of images. TensorFlow
supports `NHWC` (TensorFlow default) and `NCHW` (cuDNN default). N refers to the
number of images in a batch, H refers to the number of pixels in the vertical
dimension, W refers to the number of pixels in the horizontal dimension, and C
refers to the channels (e.g. 1 for black and white, 3 for RGB). Although
cuDNN can operate on both formats, it is faster to operate in its default
format.
The best practice is to build models that work with both `NCHW` and `NHWC` as it
is common to train using `NCHW` on GPU, and then do inference with `NHWC` on CPU.
The very brief history of these two formats is that TensorFlow started by using
`NHWC` because it was a little faster on CPUs. Then the TensorFlow team
discovered that `NCHW` performs better when using the NVIDIA cuDNN library. The
current recommendation is that users support both formats in their models. In
the long term, we plan to rewrite graphs to make switching between the formats
transparent.
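One way to support both layouts is to parameterize the model on `data_format`
and transpose at the boundary; a minimal sketch:

```python
import tensorflow as tf

def to_data_format(images_nhwc, data_format):
  """Converts an NHWC batch to the requested layout (sketch)."""
  if data_format == "NCHW":
    # Permute [batch, height, width, channels] -> [batch, channels, height, width].
    return tf.transpose(images_nhwc, [0, 3, 1, 2])
  return images_nhwc
```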
### Use fused batch norm
When using batch norm (@{tf.contrib.layers.batch_norm}), set the argument
`fused=True`:
```python
bn = tf.contrib.layers.batch_norm(
    input_layer, fused=True, data_format='NCHW',
    scope=scope, **kwargs)
```
The non-fused batch norm performs its computation using several individual ops. Fused
batch norm combines the individual operations into a single kernel, which runs
faster.

View File

@ -143,13 +143,13 @@ conversion functions before and after to move the data between float and
eight-bit. Below is an example of what they look like. First here's the original
Relu operation, with float inputs and outputs:
![Relu Diagram](https://www.tensorflow.org/images/quantization0.png)
![Relu Diagram](https://www.tensorflow.org/../images/quantization0.png)
Then, this is the equivalent converted subgraph, still with float inputs and
outputs, but with internal conversions so the calculations are done in eight
bit.
![Converted Diagram](https://www.tensorflow.org/images/quantization1.png)
![Converted Diagram](https://www.tensorflow.org/../images/quantization1.png)
The min and max operations actually look at the values in the input float
tensor, and then feed them into the Dequantize operation that converts the
@ -162,7 +162,7 @@ operations that all have float equivalents, then there will be a lot of adjacent
Dequantize/Quantize ops. This stage spots that pattern, recognizes that they
cancel each other out, and removes them, like this:
![Stripping Diagram](https://www.tensorflow.org/images/quantization2.png)
![Stripping Diagram](https://www.tensorflow.org/../images/quantization2.png)
Applied on a large scale to models where all of the operations have quantized
equivalents, this gives a graph where all of the tensor calculations are done in

View File

@ -10,8 +10,7 @@ XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear
algebra that optimizes TensorFlow computations. The results are improvements in
speed, memory usage, and portability on server and mobile platforms. Initially,
most users will not see large benefits from XLA, but are welcome to experiment
by using XLA via [just-in-time (JIT) compilation](jit.md) or [ahead-of-time (AOT)
compilation](tfcompile.md). Developers targeting new hardware accelerators are
by using XLA via @{$jit$just-in-time (JIT) compilation} or @{$tfcompile$ahead-of-time (AOT) compilation}. Developers targeting new hardware accelerators are
especially encouraged to try out XLA.
The XLA framework is experimental and in active development. In particular,
@ -51,14 +50,13 @@ We had several objectives for XLA to work with TensorFlow:
The input language to XLA is called "HLO IR", or just HLO (High Level
Optimizer). The semantics of HLO are described on the
[Operation Semantics](operation_semantics.md) page. It
@{$operation_semantics$Operation Semantics} page. It
is most convenient to think of HLO as a [compiler
IR](https://en.wikipedia.org/wiki/Intermediate_representation).
XLA takes graphs ("computations") defined in HLO and compiles them into machine
instructions for various architectures. XLA is modular in the sense that it is
easy to slot in an alternative backend to [target some novel HW
architecture](developing_new_backend.md). The CPU backend for x64 and ARM64 as
easy to slot in an alternative backend to @{$developing_new_backend$target some novel HW architecture}. The CPU backend for x64 and ARM64 as
well as the NVIDIA GPU backend are in the TensorFlow source tree.
The following diagram shows the compilation process in XLA:
@ -91,5 +89,5 @@ CPU backend supports multiple CPU ISAs.
## Supported Platforms
XLA currently supports [JIT compilation](jit.md) on x86-64 and NVIDIA GPUs; and
[AOT compilation](tfcompile.md) for x86-64 and ARM.
XLA currently supports @{$jit$JIT compilation} on x86-64 and NVIDIA GPUs; and
@{$tfcompile$AOT compilation} for x86-64 and ARM.

View File

@ -35,8 +35,7 @@ placed on the same device.
Turning on JIT compilation at the session level will result in all possible
operators being greedily compiled into XLA computations. Each XLA computation
will be compiled into one or more kernels for the underlying device. (This does
not mean everything will be fused into a single CUDA kernel, for example.)
will be compiled into one or more kernels for the underlying device.
Subject to a few constraints, if there are two adjacent operators in the graph
that both have XLA implementations, then they will be compiled into a single XLA
@ -54,6 +53,11 @@ config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON
sess = tf.Session(config=config)
```
> Note: Turning on JIT at the session level will not result in operations being
> compiled for the CPU. JIT compilation for CPU operations must be done via
> the manual method documented below. This decision was made due to the CPU
> backend being single-threaded.
#### Manual
JIT compilation can also be turned on manually for one or more operators. This
@ -93,8 +97,13 @@ it expensive to mix XLA and TensorFlow operators in the same graph.
## Tutorial
This tutorial covers training a simple version of MNIST softmax with JIT turned
on. While the tutorial was created using CPU only, the steps are the same and
the artifacts are similar to running on GPU.
on. Currently JIT at the session level, which is what is used for the tutorial,
only supports GPU.
Before starting the tutorial, verify that the `LD_LIBRARY_PATH` environment variable or
ldconfig contains `$CUDA_ROOT/extras/CUPTI/lib64`, which contains libraries for
the CUDA Profiling Tools Interface [(CUPTI)](http://docs.nvidia.com/cuda/cupti/index.html).
TensorFlow uses CUPTI to pull tracing information from the GPU.
### Step #1: Prepare sample script
@ -107,7 +116,7 @@ into a folder outside of the TensorFlow source tree.
Execute the python script to train the model without XLA.
```shell
python mnist_softmax_xla.py --xla=false
python mnist_softmax_xla.py --xla=''
```
Using the Chrome Trace Event Profiler (browse to chrome://tracing),
@ -115,7 +124,7 @@ open the timeline file created when the script finishes: `timeline.ctf.json`.
The rendered timeline should look similar to the picture below with multiple
green boxes labeled `MatMul`, possibly across multiple CPUs.
<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../images/jit_timeline_cpu.png">
<img style="width:100%" src="../../images/jit_timeline_gpu.png">
</div>
### Step #3 Run with XLA
@ -130,7 +139,7 @@ TF_XLA_FLAGS=--xla_generate_hlo_graph=.* python mnist_softmax_xla.py
Open the timeline file created (`timeline.ctf.json`). The rendered timeline
should look similar to the picture below with one long bar labeled `_XlaLaunch`.
<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../images/jit_timeline_cpu_xla.png">
<img style="width:100%" src="../../images/jit_timeline_gpu_xla.png">
</div>
To understand what is happening in `_XlaLaunch`, look at the console output for
@ -142,15 +151,19 @@ pipeline start, before inline]: /tmp/hlo_graph_0.dot
```
The debug statements point to the location of `hlo_graph_xx.dot` files that
contain info about the graph created by XLA at runtime. To render the .dot file
into a png, install [GraphViz](http://www.graphviz.org/Download..php) and run:
The console statements point to the location of `hlo_graph_xx.dot` files that
contain information about the graph created by XLA. The process that XLA takes
to fuse Ops is visible by starting at `hlo_graph_0.dot` and viewing each diagram
in succession.
To render the .dot file into a png, install
[GraphViz](http://www.graphviz.org/Download..php) and run:
```shell
dot -Tpng hlo_graph_0.dot -o hlo_graph_0.png
dot -Tpng hlo_graph_80.dot -o hlo_graph_80.png
```
The result will look like the following:
<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
<img style="width:100%" src="../../images/jit_cpu_xla_graph.png">
<img style="width:100%" src="../../images/jit_gpu_xla_graph.png">
</div>

View File

@ -65,7 +65,7 @@ The arity and types of the `args` must match the parameters of the
See also
[`ComputationBuilder::Collapse`](https://www.tensorflow.org/code/tensorflow/compiler/xla/client/computation_builder.h)
and the [`Reshape`](#reshape) operation.
and the @{tf.reshape} operation.
Collapses dimensions of an array into one dimension.
@ -87,7 +87,7 @@ same position in the dimension sequence as those they replace, with the new
dimension size equal to the product of original dimension sizes. The lowest
dimension number in `dimensions` is the slowest varying dimension (most major)
in the loop nest which collapses these dimensions, and the highest dimension
number is fastest varying (most minor). See the [`Reshape`](#reshape) operator
number is fastest varying (most minor). See the @{tf.reshape} operator
if more general collapse ordering is needed.
For example, let v be an array of 24 elements:
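In TensorFlow terms, a rough analogue of collapsing the two minor dimensions of
such an array (a sketch using @{tf.reshape}, not HLO syntax) is:

```python
import tensorflow as tf

v = tf.reshape(tf.range(24), [4, 2, 3])  # 24 elements arranged as [4, 2, 3]
collapsed = tf.reshape(v, [4, 6])        # dims 1 and 2 collapsed into one
```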
@ -293,8 +293,7 @@ array. The holes are filled with a no-op value, which for convolution means
zeroes.
Dilation of the rhs is also called atrous convolution. For more details, see the
[TensorFlow documentation on atrous convolution](https://www.tensorflow.org/
versions/r0.11/api_docs/python/nn.html#atrous_conv2d). Dilation of the lhs is
@{tf.nn.atrous_conv2d}. Dilation of the lhs is
also called deconvolution.
The output shape has these dimensions, in this order:
@ -474,7 +473,7 @@ Arguments | Type | Semantics
`rhs` | `ComputationDataHandle` | right-hand-side operand: array of type T
The arguments' shapes have to be either similar or compatible. See the
[broadcasting](broadcasting.md) documentation about what it means for shapes to
@{$broadcasting$broadcasting} documentation about what it means for shapes to
be compatible. The result of an operation has a shape which is the result of
broadcasting the two input arrays. In this variant, operations between arrays of
different ranks are *not* supported, unless one of the operands is a scalar.
@ -498,7 +497,7 @@ the dimensions of the higher-rank shape. The unmapped dimensions of the expanded
shape are filled with dimensions of size one. Degenerate-dimension broadcasting
then broadcasts the shapes along these degenerate dimension to equalize the
shapes of both operands. The semantics are described in detail on the
[broadcasting page](broadcasting.md).
@{$broadcasting$broadcasting page}.
## Element-wise comparison operations
@ -521,7 +520,7 @@ Arguments | Type | Semantics
`rhs` | `ComputationDataHandle` | right-hand-side operand: array of type T
The arguments' shapes have to be either similar or compatible. See the
[broadcasting](broadcasting.md) documentation about what it means for shapes to
@{$broadcasting$broadcasting} documentation about what it means for shapes to
be compatible. The result of an operation has a shape which is the result of
broadcasting the two input arrays with the element type `PRED`. In this variant,
operations between arrays of different ranks are *not* supported, unless one of
@ -538,7 +537,7 @@ matrix to a vector).
The additional `broadcast_dimensions` operand is a slice of integers specifying
the dimensions to use for broadcasting the operands. The semantics are described
in detail on the [broadcasting page](broadcasting.md).
in detail on the @{$broadcasting$broadcasting page}.
## Element-wise unary functions
@ -598,7 +597,7 @@ let t: (f32[10], s32) = tuple(v, s);
let element_1: s32 = gettupleelement(t, 1); // Inferred shape matches s32.
```
See also [`Tuple`](#tuple).
See also @{tf.tuple}.
## Infeed
@ -1381,7 +1380,7 @@ Arguments | Type | Semantics
## Transpose
See also the [`Reshape`](#reshape) operation.
See also the @{tf.reshape} operation.
<b>`Transpose(operand)`</b>

View File

@ -17,23 +17,21 @@ kernels that are actually used in the computation.
The compiler is built on top of the XLA framework. The code bridging TensorFlow
to the XLA framework resides under
[tensorflow/compiler](https://www.tensorflow.org/code/tensorflow/compiler/),
which also includes support for [just-in-time (JIT) compilation](jit.md) of
which also includes support for @{$jit$just-in-time (JIT) compilation} of
TensorFlow graphs.
## What does tfcompile do?
`tfcompile` takes a subgraph, identified by the TensorFlow concepts of
[`feeds`](../../get_started/basic_usage.md#feeds) and
[`fetches`](../../get_started/basic_usage.md#fetches), and generates a function
that implements that subgraph. The feeds are the input arguments for the
function, and the fetches are the output arguments for the function. All inputs
must be fully specified by the feeds; the resulting pruned subgraph cannot
contain Placeholder or Variable nodes. It is common to specify all Placeholders
and Variables as feeds, which ensures the resulting subgraph no longer contains
these nodes. The generated function is packaged as a `cc_library`, with a header
file exporting the function signature, and an object file containing the
implementation. The user writes code to invoke the generated function as
appropriate.
feeds and fetches, and generates a function that implements that subgraph.
The `feeds` are the input arguments for the function, and the `fetches` are the
output arguments for the function. All inputs must be fully specified by the
feeds; the resulting pruned subgraph cannot contain Placeholder or Variable
nodes. It is common to specify all Placeholders and Variables as feeds, which
ensures the resulting subgraph no longer contains these nodes. The generated
function is packaged as a `cc_library`, with a header file exporting the
function signature, and an object file containing the implementation. The user
writes code to invoke the generated function as appropriate.
## Using tfcompile
@ -47,11 +45,9 @@ This section details high level steps for generating an executable binary with
### Step 1: Configure the subgraph to compile
Identify the [`feeds`](../../get_started/basic_usage.md#feeds) and
[`fetches`](../../get_started/basic_usage.md#fetches) of the graph, which
correspond to the input and output arguments for the generated function. Then
configure the feeds and fetches in a
[`tensorflow.tfcompile.Config`](https://www.tensorflow.org/code/tensorflow/compiler/aot/tfcompile.proto)
Identify the feeds and fetches that correspond to the input and output
arguments for the generated function. Then configure the `feeds` and `fetches`
in a [`tensorflow.tfcompile.Config`](https://www.tensorflow.org/code/tensorflow/compiler/aot/tfcompile.proto)
proto.
```textproto
@ -120,7 +116,7 @@ tf_library(
> [make_test_graphs.py](https://www.tensorflow.org/code/tensorflow/compiler/aot/tests/make_test_graphs.py)
> and specify the output location with the `--out_dir` flag.
Typical graphs contain [`Variables`](../../api_docs/python/state_ops.md)
Typical graphs contain @{$python/state_ops$`Variables`}
representing the weights that are learned via training, but `tfcompile` cannot
compile a subgraph that contains `Variables`. The
[freeze_graph.py](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py)

View File

@ -1,7 +1,7 @@
# TensorFlow Data Versioning: GraphDefs and Checkpoints
As described in
[Compatibility for Graphs and Checkpoints](versions.md#compatibility-for-graphs-and-checkpoints),
@{$version_semantics#compatibility-for-graphs-and-checkpoints$Compatibility for Graphs and Checkpoints},
TensorFlow marks each kind of data with version information in order to maintain
backward compatibility. This document provides additional details about the
versioning mechanism, and how to use it to safely change data formats.

View File

@ -24,7 +24,7 @@ This code trains a simple NN for MNIST digit image recognition. Notice that the
accuracy increases slightly after the first training step, but then gets stuck
at a low (near-chance) level:
![debug_mnist training fails](../../images/tfdbg_screenshot_mnist_symptom.png)
![debug_mnist training fails](../images/tfdbg_screenshot_mnist_symptom.png)
Scratching your head, you suspect that certain nodes in the training graph
generated bad numeric values such as `inf`s and `nan`s. The computation-graph
@ -58,11 +58,11 @@ state.
the diagnosis of issues.
In this example, we are registering a tensor filter called
[`has_inf_or_nan`](../../../g3doc/api_docs/python/tf_debug.md#has_inf_or_nan),
@{tfdbg.has_inf_or_nan},
which simply determines if there are any `nan` or `inf` values in any
intermediate tensor of the graph. (This filter is a common enough use case that
we ship it with the
[`debug_data`](../../../g3doc/api_docs/python/tf_debug.md#classes-for-debug-dump-data-and-directories)
@{$python/tfdbg#Classes_for_debug_dump_data_and_directories$`debug_data`}
module.)
```python
def has_inf_or_nan(datum, tensor):
  _ = datum  # Datum metadata is unused here (sketch of the shipped filter).
  return np.isinf(tensor).any() or np.isnan(tensor).any()
```
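Once the filter is defined, it can be registered on a wrapped session; a
minimal sketch using the `tf_debug` wrapper:

```python
from tensorflow.python import debug as tf_debug

# Assumes an existing tf.Session named `sess`.
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
```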
TIP: You can also write your own custom filters. See
the [API documentation](../../../g3doc/api_docs/python/tf_debug.md#DebugDumpDir.find)
the @{tfdbg.DebugDumpDir.find$API documentation}
of `DebugDumpDir.find()` for additional information.
## Debugging Model Training with tfdbg
@ -87,7 +87,7 @@ The debug wrapper session will prompt you when it is about to execute the first
`run()` call, with information regarding the fetched tensor and feed
dictionaries displayed on the screen.
![tfdbg run-start UI](../../images/tfdbg_screenshot_run_start.png)
![tfdbg run-start UI](../images/tfdbg_screenshot_run_start.png)
This is what we refer to as the *run-start UI*. If the screen size is
too small to display the content of the message in its entirety, you can resize
@ -106,7 +106,7 @@ intermedate tensors from the run. (These tensors can also be obtained by
running the command `lt` after you executed `run`.) This is called the
**run-end UI**:
![tfdbg run-end UI: accuracy](../../images/tfdbg_screenshot_run_end_accuracy.png)
![tfdbg run-end UI: accuracy](../images/tfdbg_screenshot_run_end_accuracy.png)
### tfdbg CLI Frequently-Used Commands
@ -174,7 +174,7 @@ screen with a red-colored title line indicating **tfdbg** stopped immediately
after a `run()` call generated intermediate tensors that passed the specified
filter `has_inf_or_nan`:
![tfdbg run-end UI: infs and nans](../../images/tfdbg_screenshot_run_end_inf_nan.png)
![tfdbg run-end UI: infs and nans](../images/tfdbg_screenshot_run_end_inf_nan.png)
As the screen display indicates, the `has_inf_or_nan` filter is first passed
during the fourth `run()` call: an [Adam optimizer](https://arxiv.org/abs/1412.6980)
@ -213,7 +213,7 @@ item on the top or entering the equivalent command:
tfdbg> ni cross_entropy/Log
```
![tfdbg run-end UI: infs and nans](../../images/tfdbg_screenshot_run_end_node_info.png)
![tfdbg run-end UI: infs and nans](../images/tfdbg_screenshot_run_end_node_info.png)
You can see that this node has the op type `Log`
and that its input is the node `softmax/Softmax`. Run the following command to
@ -245,7 +245,7 @@ From the traceback, you can see that the op is constructed at line 109 of
diff = y_ * tf.log(y)
```
Apply a value clipping on the input to [`tf.log`](../../../g3doc/api_docs/python/math_ops.md#log)
Apply a value clipping on the input to @{tf.log}
to resolve this problem:
```python
@ -265,9 +265,9 @@ stuck. Success!
## Debugging tf-learn Estimators
For documentation on **tfdbg** to debug
[tf.contrib.learn](https://tensorflow.org/tutorials/tflearn/index.html)
@{$tflearn$tf.contrib.learn}
`Estimator`s and `Experiment`s, please see
[How to Use TensorFlow Debugger (tfdbg) with tf.contrib.learn](tfdbg-tflearn.md).
@{$tfdbg-tflearn$How to Use TensorFlow Debugger (tfdbg) with tf.contrib.learn}.
## Offline Debugging of Remotely-Running Sessions
@ -276,8 +276,7 @@ have terminal access to. To perform model debugging in such cases, you can use
the `offline_analyzer` of `tfdbg`. It operates on dumped data directories.
If the process you are running is written in Python, you can
configure the `RunOptions` proto that you call your `Session.run()` method
with, by using the method
[`debug_utils.watch_graph()`](../../../g3doc/api_docs/python/tf_debug.md#watch_graph).
with, by using the method @{tfdbg.watch_graph}.
This will cause the intermediate tensors and runtime graphs to be dumped to a
shared storage location of your choice when the `Session.run()` call occurs.
For example:
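A minimal sketch of that call pattern, with an illustrative dump path:

```python
from tensorflow.python import debug as tf_debug

run_options = tf.RunOptions()
tf_debug.watch_graph(
    run_options,
    sess.graph,
    debug_urls=["file:///shared/storage/location/tfdbg_dumps_1"])

# `fetches` and `feeds` are whatever your training step uses; tensors watched
# during this call are dumped to the shared storage location above.
sess.run(fetches, feed_dict=feeds, options=run_options)
```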
@ -321,7 +320,7 @@ sess = tf_debug.DumpingDebugWrapperSession(
`watch_fn=my_watch_fn` is a `Callable` that allows you to configure what
`Tensor`s to watch on different `Session.run()` calls, as a function of the
`fetches` and `feed_dict` to the `run()` call and other states. See
[the API doc of DumpingDebugWrapperSession](../../api_docs/python/tf_debug.md#DumpingDebugWrapperSession.__init__)
@{tfdbg.DumpingDebugWrapperSession.__init__$the API doc of DumpingDebugWrapperSession}
for more details.
If your model code is written in C++ or other languages, you can also

View File

@ -43,7 +43,7 @@ Rank | Shape | Dimension number | Example
n | [D0, D1, ... Dn-1] | n-D | A tensor with shape [D0, D1, ... Dn-1].
Shapes can be represented via Python lists / tuples of ints, or with the
[`TensorShape` class](../api_docs/python/framework.md#TensorShape).
@{tf.TensorShape}.
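For instance, these are equivalent ways to describe the same shape (a minimal
sketch):

```python
import tensorflow as tf

# Equivalent ways to specify a [2, 3] shape:
shape_list = [2, 3]
shape_tuple = (2, 3)
shape_obj = tf.TensorShape([2, 3])

x = tf.placeholder(tf.float32, shape=shape_obj)
```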
## Data types

Some files were not shown because too many files have changed in this diff.