Compare commits

Comparing rei/fork-r ... r0.8 (21 commits)

9b69ec3960
4b7bc3174e
dc19800ee1
ac3c683651
a074dca846
f7ec1ed5fc
44a6b91ce8
d118d1d31c
846e0121e7
93625a8a79
6408038938
1933eb61ac
9fd4a1dd9e
7a4e0bf9aa
0e61baf4ea
50cb176ba9
9e1b37e4ce
fb301a9848
35cd6a3011
09d7d91ef0
6bda8aa907
@@ -31,9 +31,9 @@ and discussion.**

 People who are a little bit adventurous can also try our nightly binaries:

-* Linux CPU only: [Python 2](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl) ([build history](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/)) / [Python 3](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0rc0-cp34-cp34m-linux_x86_64.whl) ([build history](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/))
-* Linux GPU: [Python 2](http://ci.tensorflow.org/view/Nightly/job/nigntly-matrix-linux-gpu/TF_BUILD_CONTAINER_TYPE=GPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-working/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl) ([build history](http://ci.tensorflow.org/view/Nightly/job/nigntly-matrix-linux-gpu/TF_BUILD_CONTAINER_TYPE=GPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-working/)) / [Python 3](http://ci.tensorflow.org/view/Nightly/job/nigntly-matrix-linux-gpu/TF_BUILD_CONTAINER_TYPE=GPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-working/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0rc0-cp34-cp34m-linux_x86_64.whl) ([build history](http://ci.tensorflow.org/view/Nightly/job/nigntly-matrix-linux-gpu/TF_BUILD_CONTAINER_TYPE=GPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-working/))
-* Mac CPU only: [Python 2](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0rc0-py2-none-any.whl) ([build history](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/)) / [Python 3](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0rc0-py3-none-any.whl) ([build history](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/))
+* Linux CPU only: [Python 2](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0-cp27-none-linux_x86_64.whl) ([build history](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/)) / [Python 3](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0-cp34-cp34m-linux_x86_64.whl) ([build history](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/))
+* Linux GPU: [Python 2](http://ci.tensorflow.org/view/Nightly/job/nigntly-matrix-linux-gpu/TF_BUILD_CONTAINER_TYPE=GPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-working/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0-cp27-none-linux_x86_64.whl) ([build history](http://ci.tensorflow.org/view/Nightly/job/nigntly-matrix-linux-gpu/TF_BUILD_CONTAINER_TYPE=GPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-working/)) / [Python 3](http://ci.tensorflow.org/view/Nightly/job/nigntly-matrix-linux-gpu/TF_BUILD_CONTAINER_TYPE=GPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-working/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0-cp34-cp34m-linux_x86_64.whl) ([build history](http://ci.tensorflow.org/view/Nightly/job/nigntly-matrix-linux-gpu/TF_BUILD_CONTAINER_TYPE=GPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-working/))
+* Mac CPU only: [Python 2](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0-py2-none-any.whl) ([build history](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/)) / [Python 3](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0-py3-none-any.whl) ([build history](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/))
 * [Android](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-android/TF_BUILD_CONTAINER_TYPE=ANDROID,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=NO_PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=android-slave/lastSuccessfulBuild/artifact/bazel-out/local_linux/bin/tensorflow/examples/android/tensorflow_demo.apk) ([build history](http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-android/TF_BUILD_CONTAINER_TYPE=ANDROID,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=NO_PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=android-slave/))

 #### *Try your first TensorFlow program*
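For readers who want to see what such a first program looks like, here is a minimal sketch using the 0.8-era graph/session Python API (modern TensorFlow versions execute eagerly and no longer use `tf.Session`):

```python
# A minimal "first TensorFlow program", assuming the 0.8-era session API.
import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
a = tf.constant(10)
b = tf.constant(32)

with tf.Session() as sess:
    print(sess.run(hello))  # Hello, TensorFlow!
    print(sess.run(a + b))  # 42
```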
@@ -35,7 +35,7 @@

 This release contains contributions from many people at Google, as well as:

-Abhinav Upadhyay, Aggelos Avgerinos, Alan Wu, Alexander G. de G. Matthews, Aleksandr Yahnev, @amchercashin, Andy Kitchen, Aurelien Geron, Awni Hannun, @BanditCat, Bas Veeling, Cameron Chen, @cg31, Cheng-Lung Sung, Christopher Bonnett, Dan Becker, Dan Van Boxel, Daniel Golden, Danijar Hafner, Danny Goodman, Dave Decker, David Dao, David Kretch, Dongjoon Hyun, Dustin Dorroh, @e-lin, Eurico Doirado, Erik Erwitt, Fabrizio Milo, @gaohuazuo, Iblis Lin, Igor Babuschkin, Isaac Hodes, Isaac Turner, Iván Vallés, J Yegerlehner, Jack Zhang, James Wexler, Jan Zikes, Jay Young, Jeff Hodges, @jmtatsch, Johnny Lim, Jonas Meinertz Hansen, Kanit Wongsuphasawat, Kashif Rasul, Ken Shirriff, Kenneth Mitchner, Kenta Yonekura, Konrad Magnusson, Konstantin Lopuhin, @lahwran, @lekaha, @liyongsea, Lucas Adams, @makseq, Mandeep Singh, @manipopopo, Mark Amery, Memo Akten, Michael Heilman, Michael Peteuil, Nathan Daly, Nicolas Fauchereau, @ninotoshi, Olav Nymoen, @panmari, @papelita1234, Pedro Lopes, Pranav Sailesh Mani, RJ Ryan, Rob Culliton, Robert DiPietro, @ronrest, Sam Abrahams, Sarath Shekkizhar, Scott Graham, Sebastian Raschka, Sung Kim, Surya Bhupatiraju, Syed Ahmed, Till Hoffmann, @timsl, @urimend, @vesnica, Vlad Frolov, Vlad Zagorodniy, Wei-Ting Kuo, Wenjian Huang, William Dmitri Breaden Madden, Wladimir Schmidt, Yuwen Yan, Yuxin Wu, Yuya Kusakabe, @zhongzyd, @znah.
+Abhinav Upadhyay, Aggelos Avgerinos, Alan Wu, Alexander G. de G. Matthews, Aleksandr Yahnev, @amchercashin, Andy Kitchen, Aurelien Geron, Awni Hannun, @BanditCat, Bas Veeling, Cameron Chen, @cg31, Cheng-Lung Sung, Christopher Bonnett, Dan Becker, Dan Van Boxel, Daniel Golden, Danijar Hafner, Danny Goodman, Dave Decker, David Dao, David Kretch, Dongjoon Hyun, Dustin Dorroh, @e-lin, Eurico Doirado, Erik Erwitt, Fabrizio Milo, @gaohuazuo, Iblis Lin, Igor Babuschkin, Isaac Hodes, Isaac Turner, Iván Vallés, J Yegerlehner, Jack Zhang, Jan Zikes, Jay Young, Jeff Hodges, @jmtatsch, Johnny Lim, Jonas Meinertz Hansen, Kanit Wongsuphasawat, Kashif Rasul, Ken Shirriff, Kenneth Mitchner, Kenta Yonekura, Konrad Magnusson, Konstantin Lopuhin, @lahwran, @lekaha, @liyongsea, Lucas Adams, @makseq, Mandeep Singh, @manipopopo, Mark Amery, Memo Akten, Michael Heilman, Michael Peteuil, Nathan Daly, Nicolas Fauchereau, @ninotoshi, Olav Nymoen, @panmari, @papelita1234, Pedro Lopes, Pranav Sailesh Mani, RJ Ryan, Rob Culliton, Robert DiPietro, @ronrest, Sam Abrahams, Sarath Shekkizhar, Scott Graham, Sebastian Raschka, Sung Kim, Surya Bhupatiraju, Syed Ahmed, Till Hoffmann, @timsl, @urimend, @vesnica, Vlad Frolov, Vlad Zagorodniy, Wei-Ting Kuo, Wenjian Huang, William Dmitri Breaden Madden, Wladimir Schmidt, Yuan Tang, Yuwen Yan, Yuxin Wu, Yuya Kusakabe, @zhongzyd, @znah.

 We are also grateful to all who filed issues or helped resolve them, asked and
 answered questions, and were part of inspiring discussions.
@@ -18,6 +18,7 @@
 // only op that involves tree traversal, and is constructed so that it can
 // be run in parallel on separate batches of data.
 #include <unordered_map>
 #include <vector>

 #include "tensorflow/contrib/tensor_forest/core/ops/tree_utils.h"

@@ -25,10 +26,12 @@
 #include "tensorflow/core/framework/op_kernel.h"

 #include "tensorflow/core/lib/gtl/map_util.h"
 #include "tensorflow/core/util/work_sharder.h"

 namespace tensorflow {

 using std::get;
 using std::make_pair;
 using std::make_tuple;
 using std::pair;
 using std::tuple;
@@ -42,6 +45,71 @@ using tensorforest::DecideNode;
 using tensorforest::Initialize;
 using tensorforest::IsAllInitialized;

+// A data structure to store the results of parallel tree traversal.
+struct InputDataResult {
+  // A list of each node that was visited.
+  std::vector<int32> node_indices;
+  // The accumulator of the leaf that a data point ended up at, or -1 if none.
+  int32 leaf_accumulator;
+  // The left-branch taken candidate splits.
+  std::vector<int32> split_adds;
+  // If the candidate splits for the leaf that a data point arrived at
+  // were initialized or not, which determines if we add this to total
+  // pcw counts or not.
+  bool splits_initialized;
+};
+
+void Evaluate(const Tensor& input_data, const Tensor& input_labels,
+              const Tensor& tree_tensor, const Tensor& tree_thresholds,
+              const Tensor& node_to_accumulator,
+              const Tensor& candidate_split_features,
+              const Tensor& candidate_split_thresholds,
+              InputDataResult* results, int64 start, int64 end) {
+  const auto tree = tree_tensor.tensor<int32, 2>();
+  const auto thresholds = tree_thresholds.unaligned_flat<float>();
+  const auto node_map = node_to_accumulator.unaligned_flat<int32>();
+  const auto split_features = candidate_split_features.tensor<int32, 2>();
+  const auto split_thresholds = candidate_split_thresholds.tensor<float, 2>();
+
+  const int32 num_splits = candidate_split_features.shape().dim_size(1);
+
+  for (int i = start; i < end; ++i) {
+    const Tensor point = input_data.Slice(i, i + 1);
+    int node_index = 0;
+    results[i].splits_initialized = false;
+    while (true) {
+      results[i].node_indices.push_back(node_index);
+      int32 left_child = tree(node_index, CHILDREN_INDEX);
+      if (left_child == LEAF_NODE) {
+        const int32 accumulator = node_map(node_index);
+        results[i].leaf_accumulator = accumulator;
+        // If the leaf is not fertile or is not yet initialized, we don't
+        // count it in the candidate/total split per-class-weights because
+        // it won't have any candidate splits yet.
+        if (accumulator >= 0 &&
+            IsAllInitialized(candidate_split_features.Slice(
+                accumulator, accumulator + 1))) {
+          results[i].splits_initialized = true;
+          for (int split = 0; split < num_splits; split++) {
+            if (!DecideNode(point, split_features(accumulator, split),
+                            split_thresholds(accumulator, split))) {
+              results[i].split_adds.push_back(split);
+            }
+          }
+        }
+        break;
+      } else if (left_child == FREE_NODE) {
+        LOG(ERROR) << "Reached a free node, not good.";
+        results[i].node_indices.push_back(FREE_NODE);
+        break;
+      }
+      node_index =
+          left_child + DecideNode(point, tree(node_index, FEATURE_INDEX),
+                                  thresholds(node_index));
+    }
+  }
+}
+
 REGISTER_OP("CountExtremelyRandomStats")
     .Attr("num_classes: int32")
     .Input("input_data: float")
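The traversal logic above can be read as: at each internal node, `DecideNode` compares one feature of the data point against the node's threshold, and the walk continues at `left_child + decision` (the two children are stored at adjacent indices). A hedged Python sketch of the same walk, with a hypothetical array layout mirroring the `tree` tensor documented below:

```python
# Hypothetical sketch of the tree walk in Evaluate(), not the actual kernel.
# tree[i] = (left_child_index, feature_index); left_child == -1 marks a leaf,
# and the right child lives at left_child + 1.
LEAF_NODE = -1

def walk_tree(tree, thresholds, point):
    """Return the list of node indices visited for one data point."""
    visited = []
    node = 0
    while True:
        visited.append(node)
        left_child, feature = tree[node]
        if left_child == LEAF_NODE:
            return visited
        # DecideNode: 0 -> take the left branch, 1 -> take the right branch.
        decision = 1 if point[feature] > thresholds[node] else 0
        node = left_child + decision

# Example: root splits on feature 0 at threshold 0.0; nodes 1 and 2 are leaves.
tree = [(1, 0), (LEAF_NODE, 0), (LEAF_NODE, 0)]
thresholds = [0.0, 0.0, 0.0]
print(walk_tree(tree, thresholds, point=[-1.0]))  # [0, 1]
print(walk_tree(tree, thresholds, point=[2.0]))   # [0, 2]
```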
@@ -79,9 +147,9 @@ REGISTER_OP("CountExtremelyRandomStats")
   gives the j-th feature of the i-th input.
 input_labels: The training batch's labels; `input_labels[i]` is the class
   of the i-th input.
-tree:= A 2-d int32 tensor. `tree[0][i]` gives the index of the left child
-  of the i-th node, `tree[0][i] + 1` gives the index of the right child of
-  the i-th node, and `tree[1][i]` gives the index of the feature used to
+tree:= A 2-d int32 tensor. `tree[i][0]` gives the index of the left child
+  of the i-th node, `tree[i][0] + 1` gives the index of the right child of
+  the i-th node, and `tree[i][1]` gives the index of the feature used to
   split the i-th node.
 tree_thresholds: `tree_thresholds[i]` is the value used to split the i-th
   node.
@@ -176,7 +244,31 @@ class CountExtremelyRandomStats : public OpKernel {
                 "candidate_split_features and candidate_split_thresholds should be "
                 "the same shape."));

     const int32 num_splits = candidate_split_features.shape().dim_size(1);

+    // Evaluate input data in parallel.
+    const int64 num_data = input_data.shape().dim_size(0);
+    std::unique_ptr<InputDataResult[]> results(new InputDataResult[num_data]);
+    auto worker_threads = context->device()->tensorflow_cpu_worker_threads();
+    int num_threads = worker_threads->num_threads;
+    if (num_threads <= 1) {
+      Evaluate(input_data, input_labels, tree_tensor, tree_thresholds,
+               node_to_accumulator, candidate_split_features,
+               candidate_split_thresholds, results.get(), 0, num_data);
+    } else {
+      auto work = [&input_data, &input_labels, &tree_tensor, &tree_thresholds,
+                   &node_to_accumulator, &candidate_split_features,
+                   &candidate_split_thresholds, &num_data,
+                   &results](int64 start, int64 end) {
+        CHECK(start <= end);
+        CHECK(end <= num_data);
+        Evaluate(input_data, input_labels, tree_tensor, tree_thresholds,
+                 node_to_accumulator, candidate_split_features,
+                 candidate_split_thresholds, results.get(), start, end);
+      };
+      Shard(num_threads, worker_threads->workers, num_data, 100, work);
+    }
+
     // Set output tensors.
+    const auto labels = input_labels.unaligned_flat<int32>();

     // node pcw delta
     Tensor* output_node_pcw_delta = nullptr;
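`Shard()` divides the half-open range `[0, num_data)` among the CPU worker threads, with a per-unit cost hint (100 here) guiding how finely to split. A rough Python analogue of that fan-out, assuming a plain thread pool rather than TensorFlow's worker-thread scheduler:

```python
# Rough Python analogue of Shard(): split [0, total) into contiguous
# chunks and evaluate them concurrently. Not the actual TF scheduler.
from concurrent.futures import ThreadPoolExecutor

def shard(num_threads, total, work):
    """Call work(start, end) over contiguous chunks covering range(total)."""
    if num_threads <= 1 or total <= 1:
        work(0, total)
        return
    chunk = (total + num_threads - 1) // num_threads
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for start in range(0, total, chunk):
            pool.submit(work, start, min(start + chunk, total))

results = [None] * 8
shard(4, 8, lambda s, e: results.__setitem__(slice(s, e), range(s, e)))
print(results)  # [0, 1, 2, 3, 4, 5, 6, 7]
```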
@@ -196,58 +288,28 @@ class CountExtremelyRandomStats : public OpKernel {
                                &output_leaves));
     auto out_leaves = output_leaves->unaligned_flat<int32>();

-    const auto tree = tree_tensor.tensor<int32, 2>();
-    const auto thresholds = tree_thresholds.unaligned_flat<float>();
-    const auto labels = input_labels.unaligned_flat<int32>();
-    const auto node_map = node_to_accumulator.unaligned_flat<int32>();
-    const auto split_features = candidate_split_features.tensor<int32, 2>();
-    const auto split_thresholds = candidate_split_thresholds.tensor<float, 2>();
-
-    const int32 num_data = input_data.shape().dim_size(0);
-
     // <accumulator, class> -> count delta
     std::unordered_map<pair<int32, int32>, int32, PairIntHash> total_delta;
     // <accumulator, split, class> -> count delta
     std::unordered_map<tuple<int32, int32, int32>,
                        int32, TupleIntHash> split_delta;
-    for (int i = 0; i < num_data; i++) {
-      const Tensor point = input_data.Slice(i, i+1);
-      int node_index = 0;
-      while (true) {
-        const int32 label = labels(i);
-        ++out_node(node_index, label);
-        int32 left_child = tree(node_index, CHILDREN_INDEX);
-        if (left_child == LEAF_NODE) {
-          out_leaves(i) = node_index;
-          const int32 accumulator = node_map(node_index);
-          // If the leaf is not fertile or is not yet initialized, we don't
-          // count it in the candidate/total split per-class-weights because
-          // it won't have any candidate splits yet.
-          if (accumulator >= 0 &&
-              IsAllInitialized(
-                  candidate_split_features.Slice(accumulator,
-                                                 accumulator + 1))) {
-            ++total_delta[std::make_pair(accumulator, label)];
-            for (int split = 0; split < num_splits; split++) {
-              if (!DecideNode(point, split_features(accumulator, split),
-                              split_thresholds(accumulator, split))) {
-                ++split_delta[make_tuple(accumulator, split, label)];
-              }
-            }
-          }
-          break;
-        } else if (left_child == FREE_NODE) {
-          LOG(ERROR) << "Reached a free node, not good.";
-          out_leaves(i) = FREE_NODE;
-          break;
-        }
-        node_index = left_child +
-            DecideNode(point, tree(node_index, FEATURE_INDEX),
-                       thresholds(node_index));
-      }
-    }
+    for (int32 i = 0; i < num_data; ++i) {
+      const int32 label = labels(i);
+      const int32 accumulator = results[i].leaf_accumulator;
+      for (const int32 node : results[i].node_indices) {
+        ++out_node(node, label);
+      }
+      out_leaves(i) = results[i].node_indices.back();
+      if (accumulator >= 0 && results[i].splits_initialized) {
+        ++total_delta[make_pair(accumulator, label)];
+        for (const int32 split : results[i].split_adds) {
+          ++split_delta[make_tuple(accumulator, split, label)];
+        }
+      }
+    }

     // candidate splits pcw indices
     Tensor* output_candidate_pcw_indices = nullptr;
     TensorShape candidate_pcw_shape;
     candidate_pcw_shape.AddDim(split_delta.size());
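The rewritten loop only aggregates the precomputed per-point results, which is why the traversal itself can run in parallel while the counting stays single-threaded. A hedged Python sketch of that aggregation step, with dict-based results standing in for the `InputDataResult` struct:

```python
# Hypothetical sketch of the delta aggregation over per-point results,
# mirroring total_delta / split_delta in the kernel above.
from collections import Counter

def aggregate(results, labels):
    node_counts = Counter()   # (node, class) -> count
    total_delta = Counter()   # (accumulator, class) -> count
    split_delta = Counter()   # (accumulator, split, class) -> count
    leaves = []
    for res, label in zip(results, labels):
        for node in res['node_indices']:
            node_counts[(node, label)] += 1
        leaves.append(res['node_indices'][-1])
        if res['leaf_accumulator'] >= 0 and res['splits_initialized']:
            total_delta[(res['leaf_accumulator'], label)] += 1
            for split in res['split_adds']:
                split_delta[(res['leaf_accumulator'], split, label)] += 1
    return node_counts, total_delta, split_delta, leaves
```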
@@ -94,7 +94,7 @@ class SampleInputs : public OpKernel {
         "split_sampling_random_seed", &split_sampling_random_seed_));
     // Set up the random number generator.
     if (split_sampling_random_seed_ == 0) {
-      uint64 time_seed = static_cast<uint64>(std::time(NULL));
+      uint64 time_seed = static_cast<uint64>(std::clock());
       single_rand_ = std::unique_ptr<random::PhiloxRandom>(
           new random::PhiloxRandom(time_seed));
     } else {
@@ -44,9 +44,9 @@ REGISTER_OP("TreePredictions")

 input_data: The training batch's features as a 2-d tensor; `input_data[i][j]`
   gives the j-th feature of the i-th input.
-tree:= A 2-d int32 tensor. `tree[0][i]` gives the index of the left child
-  of the i-th node, `tree[0][i] + 1` gives the index of the right child of
-  the i-th node, and `tree[1][i]` gives the index of the feature used to
+tree:= A 2-d int32 tensor. `tree[i][0]` gives the index of the left child
+  of the i-th node, `tree[i][0] + 1` gives the index of the right child of
+  the i-th node, and `tree[i][1]` gives the index of the feature used to
   split the i-th node.
 tree_thresholds: `tree_thresholds[i]` is the value used to split the i-th
   node.
@@ -17,7 +17,7 @@ from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function

-import tensorflow  # pylint: disable=unused-import
+import tensorflow as tf

 from tensorflow.contrib.tensor_forest.python.ops import training_ops
@@ -47,6 +47,29 @@ class CountExtremelyRandomStatsTest(test_util.TensorFlowTestCase):
           self.tree_thresholds, self.node_map,
           self.split_features, self.split_thresholds, num_classes=4))

       self.assertAllEqual(
           [[1., 1., 1., 1.], [1., 1., 0., 0.], [0., 0., 1., 1.]],
           pcw_node.eval())
       self.assertAllEqual([[0, 0, 0]], pcw_splits_indices.eval())
       self.assertAllEqual([1.], pcw_splits_delta.eval())
       self.assertAllEqual([[0, 1], [0, 0]], pcw_totals_indices.eval())
       self.assertAllEqual([1., 1.], pcw_totals_delta.eval())
       self.assertAllEqual([1, 1, 2, 2], leaves.eval())

+  def testThreaded(self):
+    with self.test_session(
+        config=tf.ConfigProto(intra_op_parallelism_threads=2)):
+      (pcw_node, pcw_splits_indices, pcw_splits_delta, pcw_totals_indices,
+       pcw_totals_delta,
+       leaves) = (self.ops.count_extremely_random_stats(self.input_data,
+                                                        self.input_labels,
+                                                        self.tree,
+                                                        self.tree_thresholds,
+                                                        self.node_map,
+                                                        self.split_features,
+                                                        self.split_thresholds,
+                                                        num_classes=4))
+
+      self.assertAllEqual([[1., 1., 1., 1.], [1., 1., 0., 0.],
+                           [0., 0., 1., 1.]],
+                          pcw_node.eval())
@@ -49,12 +49,13 @@ def TreePredictions(op):
 # there's not yet any guarantee that the shared object exists.
 # In which case, "import tensorflow" will always crash, even for users that
 # never use contrib.
-def Load():
+def Load(library_base_dir=''):
   """Load the inference ops library and return the loaded module."""
   with _ops_lock:
     global _inference_ops
     if not _inference_ops:
-      data_files_path = tf.resource_loader.get_data_files_path()
+      data_files_path = os.path.join(library_base_dir,
+                                     tf.resource_loader.get_data_files_path())
       tf.logging.info('data path: %s', data_files_path)
       _inference_ops = tf.load_op_library(os.path.join(
           data_files_path, INFERENCE_OPS_FILE))
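Call sites can now point the loader at a relocated build tree. A hedged usage sketch; note the default empty base dir preserves the old behavior, since `os.path.join('', p) == p` (the example base-dir path is illustrative only):

```python
# Hypothetical usage of the new library_base_dir parameter.
from tensorflow.contrib.tensor_forest.python.ops import inference_ops

# Default: load the .so from the package's own data files path, as before.
ops = inference_ops.Load()

# Relocated build: resolve the .so relative to another root (path made up).
ops = inference_ops.Load(library_base_dir='/opt/tf_builds/r0.8')
```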
@@ -25,6 +25,7 @@ import tensorflow as tf
 from tensorflow.python.framework import ops
 from tensorflow.python.framework import tensor_shape


 TRAINING_OPS_FILE = '_training_ops.so'

 _training_ops = None

@@ -96,12 +97,13 @@ def _UpdateFertileSlotsShape(unused_op):
 # there's not yet any guarantee that the shared object exists.
 # In which case, "import tensorflow" will always crash, even for users that
 # never use contrib.
-def Load():
+def Load(library_base_dir=''):
   """Load training ops library and return the loaded module."""
   with _ops_lock:
     global _training_ops
     if not _training_ops:
-      data_files_path = tf.resource_loader.get_data_files_path()
+      data_files_path = os.path.join(library_base_dir,
+                                     tf.resource_loader.get_data_files_path())
       tf.logging.info('data path: %s', data_files_path)
       _training_ops = tf.load_op_library(os.path.join(
           data_files_path, TRAINING_OPS_FILE))
@@ -25,19 +25,6 @@ from tensorflow.contrib.tensor_forest.python.ops import inference_ops
 from tensorflow.contrib.tensor_forest.python.ops import training_ops


-flags = tf.app.flags
-FLAGS = flags.FLAGS
-
-
-# Default parameter values. These are all only used if the corresponding
-# parameter is not specified when constructing the ForestHParams.
-flags.DEFINE_integer('num_trees', 100, 'Number of trees in forest')
-flags.DEFINE_integer('max_nodes', 10000, 'Maxmimum number of tree nodes.')
-flags.DEFINE_float(
-    'samples_to_decide', 25.0,
-    'Only decide on a split, or only fully use a leaf, after this many '
-    'training samples have been seen.')
-
 # If tree[i][0] equals this value, then i is a leaf node.
 LEAF_NODE = -1
@@ -57,7 +44,20 @@ LEAF_NODE = -1
 class ForestHParams(object):
   """A base class for holding hyperparameters and calculating good defaults."""

-  def __init__(self, **kwargs):
+  def __init__(self, num_trees=100, max_nodes=10000, bagging_fraction=1.0,
+               samples_to_decide=25, max_depth=0, num_splits_to_consider=0,
+               max_fertile_nodes=0, split_after_samples=0,
+               valid_leaf_threshold=0, **kwargs):
+    self.num_trees = num_trees
+    self.max_nodes = max_nodes
+    self.bagging_fraction = bagging_fraction
+    self.samples_to_decide = samples_to_decide
+    self.max_depth = max_depth
+    self.num_splits_to_consider = num_splits_to_consider
+    self.max_fertile_nodes = max_fertile_nodes
+    self.split_after_samples = split_after_samples
+    self.valid_leaf_threshold = valid_leaf_threshold
+
     for name, value in kwargs.items():
       setattr(self, name, value)
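The explicit keyword arguments replace the old flag-driven defaults, so construction is self-documenting. Usage mirroring the tests later in this diff (the derived values come from the `fill()` logic in the next hunk):

```python
# Construct hparams with explicit arguments, then derive the rest via fill().
from tensorflow.contrib.tensor_forest.python import tensor_forest

hparams = tensor_forest.ForestHParams(
    num_classes=2, num_trees=100, max_nodes=1000, num_features=60).fill()
print(hparams.max_depth)               # 20  == 2 * ceil(log2(1000))
print(hparams.num_splits_to_consider)  # 10  since sqrt(60) < 10
print(hparams.max_fertile_nodes)       # 500 == max_nodes / 2 leaves cap
```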
@@ -69,19 +69,21 @@ class ForestHParams(object):
     # Fail fast if num_classes isn't set.
     _ = getattr(self, 'num_classes')

-    self.num_trees = getattr(self, 'num_trees', FLAGS.num_trees)
-    self.max_nodes = getattr(self, 'max_nodes', FLAGS.max_nodes)
+    self.training_library_base_dir = getattr(
+        self, 'training_library_base_dir', '')
+    self.inference_library_base_dir = getattr(
+        self, 'inference_library_base_dir', '')

     # Allow each tree to be unbalanced by up to a factor of 2.
-    self.max_depth = getattr(self, 'max_depth',
-                             int(2 * math.ceil(math.log(self.max_nodes, 2))))
+    self.max_depth = (self.max_depth or
+                      int(2 * math.ceil(math.log(self.max_nodes, 2))))

     # The Random Forest literature recommends sqrt(# features) for
     # classification problems, and p/3 for regression problems.
     # TODO(thomaswc): Consider capping this for large number of features.
-    if not getattr(self, 'num_splits_to_consider', None):
-      self.num_splits_to_consider = max(10, int(
-          math.ceil(math.sqrt(self.num_features))))
+    self.num_splits_to_consider = (
+        self.num_splits_to_consider or
+        max(10, int(math.ceil(math.sqrt(self.num_features)))))

     # max_fertile_nodes doesn't effect performance, only training speed.
     # We therefore set it primarily based upon space considerations.
@@ -91,22 +93,19 @@ class ForestHParams(object):
     num_fertile = int(math.ceil(self.max_nodes / self.num_splits_to_consider))
     # But always use at least 1000 accumulate slots.
     num_fertile = max(num_fertile, 1000)
-    self.max_fertile_nodes = getattr(self, 'max_fertile_nodes', num_fertile)
+    self.max_fertile_nodes = self.max_fertile_nodes or num_fertile
     # But it also never needs to be larger than the number of leaves,
     # which is max_nodes / 2.
-    self.max_fertile_nodes = min(self.max_nodes,
-                                 int(math.ceil(self.max_fertile_nodes / 2.0)))
+    self.max_fertile_nodes = min(self.max_fertile_nodes,
+                                 int(math.ceil(self.max_nodes / 2.0)))

     # split_after_samples and valid_leaf_threshold should be about the same.
     # Therefore, if either is set, use it to set the other. Otherwise, fall
-    # back on FLAGS.samples_to_decide.
-    samples_to_decide = (
-        getattr(self, 'split_after_samples',
-                getattr(self, 'valid_leaf_threshold', FLAGS.samples_to_decide)))
-    self.split_after_samples = getattr(self, 'split_after_samples',
-                                       samples_to_decide)
-    self.valid_leaf_threshold = getattr(self, 'valid_leaf_threshold',
-                                        samples_to_decide)
+    # back on samples_to_decide.
+    samples_to_decide = self.split_after_samples or self.samples_to_decide
+
+    self.split_after_samples = self.split_after_samples or samples_to_decide
+    self.valid_leaf_threshold = self.valid_leaf_threshold or samples_to_decide

     # We have num_splits_to_consider slots to fill, and we want to spend
     # approximately split_after_samples samples initializing them.
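The unbalanced-by-a-factor-of-2 depth bound works out as follows; a quick check of the arithmetic used in `fill()`, matching the expectations in the tests further down this diff:

```python
import math

def default_max_depth(max_nodes):
    # A balanced tree with max_nodes nodes has depth ~log2(max_nodes);
    # allow a 2x unbalance factor on top of that.
    return int(2 * math.ceil(math.log(max_nodes, 2)))

print(default_max_depth(1000))     # 20  (ceil(log2(1000)) == 10)
print(default_max_depth(1000000))  # 40  (ceil(log2(1e6))  == 20)
```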
@@ -184,23 +183,6 @@ class TreeStats(object):
     self.num_leaves = num_leaves


-def get_tree_stats(variables, unused_params, session):
-  num_nodes = variables.end_of_tree.eval(session=session) - 1
-  num_leaves = tf.where(
-      tf.equal(tf.squeeze(tf.slice(variables.tree, [0, 0], [-1, 1])),
-               LEAF_NODE)).eval(session=session).shape[0]
-  return TreeStats(num_nodes, num_leaves)
-
-
-def get_forest_stats(variables, params, session):
-
-  tree_stats = []
-  for i in range(params.num_trees):
-    tree_stats.append(get_tree_stats(variables[i], params, session))
-
-  return ForestStats(tree_stats, params)
-
-
 class ForestTrainingVariables(object):
   """A container for a forests training data, consisting of multiple trees.
@@ -212,9 +194,11 @@ class ForestTrainingVariables(object):
     ... forest_variables.tree ...
   """

-  def __init__(self, params):
-    self.variables = [TreeTrainingVariables(params)
-                      for _ in range(params.num_trees)]
+  def __init__(self, params, device_assigner):
+    self.variables = []
+    for i in range(params.num_trees):
+      with tf.device(device_assigner.get_device(i)):
+        self.variables.append(TreeTrainingVariables(params))

   def __setitem__(self, t, val):
     self.variables[t] = val
@@ -223,15 +207,41 @@ class ForestTrainingVariables(object):
     return self.variables[t]


+class RandomForestDeviceAssigner(object):
+  """A device assigner that uses the default device.
+
+  Write subclasses that implement get_device for control over how trees
+  get assigned to devices. This assumes that whole trees are assigned
+  to a device.
+  """
+
+  def __init__(self):
+    self.cached = None
+
+  def get_device(self, unused_tree_num):
+    if not self.cached:
+      dummy = tf.constant(0)
+      self.cached = dummy.device
+
+    return self.cached
+
+
 class RandomForestGraphs(object):
   """Builds TF graphs for random forest training and inference."""

-  def __init__(self, params):
+  def __init__(self, params, device_assigner=None, variables=None):
     self.params = params
-    self.variables = ForestTrainingVariables(self.params)
-    self.trees = [RandomTreeGraphs(self.variables[i], self.params,
-                                   training_ops.Load(), inference_ops.Load())
-                  for i in range(self.params.num_trees)]
+    self.device_assigner = device_assigner or RandomForestDeviceAssigner()
+    tf.logging.info('Constructing forest with params = ')
+    tf.logging.info(self.params.__dict__)
+    self.variables = variables or ForestTrainingVariables(
+        self.params, device_assigner=self.device_assigner)
+    self.trees = [
+        RandomTreeGraphs(
+            self.variables[i], self.params,
+            training_ops.Load(self.params.training_library_base_dir),
+            inference_ops.Load(self.params.inference_library_base_dir))
+        for i in range(self.params.num_trees)]

   def training_graph(self, input_data, input_labels):
     """Constructs a TF graph for training a random forest.
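Subclassing the assigner is the intended extension point for placing whole trees on specific devices. A hedged sketch of a round-robin variant (the class name and device strings are illustrative, not part of this diff):

```python
# Hypothetical round-robin assigner: spread whole trees across N devices.
class RoundRobinDeviceAssigner(object):
  """Assigns tree i to device i modulo the number of devices."""

  def __init__(self, devices):
    self.devices = devices  # e.g. ['/cpu:0', '/cpu:1']

  def get_device(self, tree_num):
    return self.devices[tree_num % len(self.devices)]

# graph_builder = tensor_forest.RandomForestGraphs(
#     params, device_assigner=RoundRobinDeviceAssigner(['/cpu:0', '/cpu:1']))
```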
@@ -246,12 +256,26 @@ class RandomForestGraphs(object):
     """
     tree_graphs = []
     for i in range(self.params.num_trees):
-      tf.logging.info('Constructing tree %d', i)
-      seed = self.params.base_random_seed
-      if seed != 0:
-        seed += i
-      tree_graphs.append(self.trees[i].training_graph(
-          input_data, input_labels, seed))
+      with tf.device(self.device_assigner.get_device(i)):
+        seed = self.params.base_random_seed
+        if seed != 0:
+          seed += i
+        # If using bagging, randomly select some of the input.
+        tree_data = input_data
+        tree_labels = input_labels
+        if self.params.bagging_fraction < 1.0:
+          # TODO(thomaswc): This does sampling without replacment. Consider
+          # also allowing sampling with replacement as an option.
+          batch_size = tf.slice(tf.shape(input_data), [0], [1])
+          r = tf.random_uniform(batch_size, seed=seed)
+          mask = tf.less(r, tf.ones_like(r) * self.params.bagging_fraction)
+          gather_indices = tf.squeeze(tf.where(mask), squeeze_dims=[1])
+          # TODO(thomaswc): Calculate out-of-bag data and labels, and store
+          # them for use in calculating statistics later.
+          tree_data = tf.gather(input_data, gather_indices)
+          tree_labels = tf.gather(input_labels, gather_indices)
+        tree_graphs.append(
+            self.trees[i].training_graph(tree_data, tree_labels, seed))
     return tf.group(*tree_graphs)
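The bagging mask keeps each row independently with probability `bagging_fraction`, so each tree trains on a random subset of the batch sampled without replacement. A standalone sketch of the same selection in graph form, using the 0.8-era API names exactly as they appear above:

```python
# Sketch of the bagging row-selection above, assuming the 0.8-era TF API.
import tensorflow as tf

def bag_rows(input_data, input_labels, bagging_fraction, seed):
    batch_size = tf.slice(tf.shape(input_data), [0], [1])
    r = tf.random_uniform(batch_size, seed=seed)  # one uniform draw per row
    mask = tf.less(r, tf.ones_like(r) * bagging_fraction)
    gather_indices = tf.squeeze(tf.where(mask), squeeze_dims=[1])
    return (tf.gather(input_data, gather_indices),
            tf.gather(input_labels, gather_indices))
```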
   def inference_graph(self, input_data):

@@ -265,9 +289,23 @@ class RandomForestGraphs(object):
     """
     probabilities = []
     for i in range(self.params.num_trees):
-      probabilities.append(self.trees[i].inference_graph(input_data))
-    all_predict = tf.pack(probabilities)
-    return tf.reduce_sum(all_predict, 0) / self.params.num_trees
+      with tf.device(self.device_assigner.get_device(i)):
+        probabilities.append(self.trees[i].inference_graph(input_data))
+    with tf.device(self.device_assigner.get_device(0)):
+      all_predict = tf.pack(probabilities)
+      return tf.reduce_sum(all_predict, 0) / self.params.num_trees
+
+  def average_size(self):
+    """Constructs a TF graph for evaluating the average size of a forest.
+
+    Returns:
+      The average number of nodes over the trees.
+    """
+    sizes = []
+    for i in range(self.params.num_trees):
+      with tf.device(self.device_assigner.get_device(i)):
+        sizes.append(self.trees[i].size())
+    return tf.reduce_mean(tf.pack(sizes))

   def average_impurity(self):
     """Constructs a TF graph for evaluating the leaf impurity of a forest.
@@ -277,9 +315,17 @@ class RandomForestGraphs(object):
     """
     impurities = []
     for i in range(self.params.num_trees):
-      impurities.append(self.trees[i].average_impurity(self.variables[i]))
+      with tf.device(self.device_assigner.get_device(i)):
+        impurities.append(self.trees[i].average_impurity())
     return tf.reduce_mean(tf.pack(impurities))

+  def get_stats(self, session):
+    tree_stats = []
+    for i in range(self.params.num_trees):
+      with tf.device(self.device_assigner.get_device(i)):
+        tree_stats.append(self.trees[i].get_stats(session))
+    return ForestStats(tree_stats, self.params)
+

 class RandomTreeGraphs(object):
   """Builds TF graphs for random tree training and inference."""
@@ -394,6 +440,7 @@ class RandomTreeGraphs(object):
     with tf.control_dependencies([node_update_op]):
       def f1():
         return self.variables.non_fertile_leaf_scores

       def f2():
         counts = tf.gather(self.variables.node_per_class_weights,
                            self.variables.non_fertile_leaves)
@@ -535,3 +582,18 @@ class RandomTreeGraphs(object):
     counts = tf.gather(self.variables.node_per_class_weights, leaves)
     impurity = self._weighted_gini(counts)
     return tf.reduce_sum(impurity) / tf.reduce_sum(counts + 1.0)
+
+  def size(self):
+    """Constructs a TF graph for evaluating the current number of nodes.
+
+    Returns:
+      The current number of nodes in the tree.
+    """
+    return self.variables.end_of_tree - 1
+
+  def get_stats(self, session):
+    num_nodes = self.variables.end_of_tree.eval(session=session) - 1
+    num_leaves = tf.where(
+        tf.equal(tf.squeeze(tf.slice(self.variables.tree, [0, 0], [-1, 1])),
+                 LEAF_NODE)).eval(session=session).shape[0]
+    return TreeStats(num_nodes, num_leaves)
@@ -27,6 +27,37 @@ from tensorflow.python.platform import googletest

 class TensorForestTest(test_util.TensorFlowTestCase):

+  def testForestHParams(self):
+    hparams = tensor_forest.ForestHParams(
+        num_classes=2, num_trees=100, max_nodes=1000,
+        num_features=60).fill()
+    self.assertEquals(2, hparams.num_classes)
+    # 2 * ceil(log_2(1000)) = 20
+    self.assertEquals(20, hparams.max_depth)
+    # sqrt(num_features) < 10, so num_splits_to_consider should be 10.
+    self.assertEquals(10, hparams.num_splits_to_consider)
+    # Don't have more fertile nodes than max # leaves, which is 500.
+    self.assertEquals(500, hparams.max_fertile_nodes)
+    # We didn't set either of these, so they should be equal
+    self.assertEquals(hparams.split_after_samples,
+                      hparams.valid_leaf_threshold)
+    # split_after_samples is larger than 10
+    self.assertEquals(1, hparams.split_initializations_per_input)
+    self.assertEquals(0, hparams.base_random_seed)
+
+  def testForestHParamsBigTree(self):
+    hparams = tensor_forest.ForestHParams(
+        num_classes=2, num_trees=100, max_nodes=1000000,
+        split_after_samples=25,
+        num_features=1000).fill()
+    self.assertEquals(40, hparams.max_depth)
+    # sqrt(1000) = 31.63...
+    self.assertEquals(32, hparams.num_splits_to_consider)
+    # 1000000 / 32 = 31250
+    self.assertEquals(31250, hparams.max_fertile_nodes)
+    # floor(31.63 / 25) = 1
+    self.assertEquals(1, hparams.split_initializations_per_input)
+
   def testTrainingConstruction(self):
     input_data = [[-1., 0.], [-1., 2.],  # node 1
                   [1., 0.], [1., -2.]]  # node 2
@@ -50,6 +81,14 @@ class TensorForestTest(test_util.TensorFlowTestCase):
     graph = graph_builder.inference_graph(input_data)
     self.assertTrue(isinstance(graph, tf.Tensor))

+  def testImpurityConstruction(self):
+    params = tensor_forest.ForestHParams(
+        num_classes=4, num_features=2, num_trees=10, max_nodes=1000).fill()
+
+    graph_builder = tensor_forest.RandomForestGraphs(params)
+    graph = graph_builder.average_impurity()
+    self.assertTrue(isinstance(graph, tf.Tensor))
+

 if __name__ == '__main__':
   googletest.main()
@@ -143,7 +143,6 @@ cc_library(
         "lib/core/bits.h",
         "lib/core/casts.h",
         "lib/core/coding.h",
-        "lib/core/command_line_flags.h",  # TODO(vrv): Delete.
         "lib/core/errors.h",
         "lib/core/notification.h",
         "lib/core/status.h",
@@ -35,6 +35,7 @@ limitations under the License.
 #include "tensorflow/core/framework/types.h"
 #include "tensorflow/core/lib/core/coding.h"
 #include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
 #include "tensorflow/core/lib/gtl/stl_util.h"
 #include "tensorflow/core/lib/strings/str_util.h"
 #include "tensorflow/core/lib/strings/strcat.h"
@@ -713,4 +714,36 @@ void Tensor::FillDescription(TensorDescription* description) const {
   }
 }

+gtl::InlinedVector<int64, 5> Tensor::ComputeFlatInnerDims(
+    int64 num_out_dims) const {
+  gtl::InlinedVector<int64, 5> out_dims(num_out_dims, 0);
+  const int64 num_elements = NumElements();
+  if (num_elements != 0) {
+    int64 prod_out_dims = 1;
+    for (int64 out_dim = num_out_dims - 1; out_dim > 0; --out_dim) {
+      const int64 in_dim = out_dim + (dims() - num_out_dims);
+      out_dims[out_dim] =
+          (in_dim >= dims() || in_dim < 0) ? 1 : dim_size(in_dim);
+      prod_out_dims *= out_dims[out_dim];
+    }
+    out_dims[0] = num_elements / prod_out_dims;
+  }
+  return out_dims;
+}
+
+gtl::InlinedVector<int64, 5> Tensor::ComputeFlatOuterDims(
+    int64 num_out_dims) const {
+  gtl::InlinedVector<int64, 5> out_dims(num_out_dims, 0);
+  const int64 num_elements = NumElements();
+  if (num_elements != 0) {
+    int64 prod_out_dims = 1;
+    for (int64 out_dim = 0; out_dim < num_out_dims - 1; ++out_dim) {
+      out_dims[out_dim] = out_dim >= dims() ? 1 : dim_size(out_dim);
+      prod_out_dims *= out_dims[out_dim];
+    }
+    out_dims[num_out_dims - 1] = num_elements / prod_out_dims;
+  }
+  return out_dims;
+}
+
 }  // namespace tensorflow
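The collapse rule is: keep the last `NDIMS - 1` input dimensions (padding with 1s when the tensor has fewer dims than requested) and fold everything remaining into the first output dimension. A Python transcription of the same arithmetic, checked against the shapes asserted in the test hunk further down:

```python
# Python transcription of the ComputeFlatInnerDims arithmetic above.
from functools import reduce
from operator import mul

def compute_flat_inner_dims(shape, num_out_dims):
    num_elements = reduce(mul, shape, 1)
    out = [0] * num_out_dims
    if num_elements != 0:
        prod = 1
        for out_dim in range(num_out_dims - 1, 0, -1):
            in_dim = out_dim + (len(shape) - num_out_dims)
            out[out_dim] = shape[in_dim] if 0 <= in_dim < len(shape) else 1
            prod *= out[out_dim]
        out[0] = num_elements // prod
    return out

print(compute_flat_inner_dims([2, 3, 4, 5], 3))  # [6, 4, 5]
print(compute_flat_inner_dims([2, 3, 4, 5], 5))  # [1, 2, 3, 4, 5]
```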
@@ -28,6 +28,7 @@ limitations under the License.
 #include "tensorflow/core/lib/core/refcount.h"
 #include "tensorflow/core/lib/core/status.h"
 #include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
 #include "tensorflow/core/platform/logging.h"
 #include "tensorflow/core/platform/macros.h"
 #include "tensorflow/core/platform/types.h"
@@ -243,40 +244,28 @@ class Tensor {
   ///
   /// ```
   template <typename T>
-  typename TTypes<T>::Flat flat();
+  typename TTypes<T>::Flat flat() {
+    return shaped<T, 1>({NumElements()});
+  }

   template <typename T>
   typename TTypes<T>::UnalignedFlat unaligned_flat() {
     return unaligned_shaped<T, 1>({NumElements()});
   }

-  /// Returns the data as an Eigen::Tensor with 2 dimensions, collapsing all
-  /// Tensor dimensions but the last one into the first dimension of the result.
-  template <typename T>
-  typename TTypes<T>::Matrix flat_inner_dims() {
-    int64 last_size = dims() > 0 ? dim_size(dims() - 1) : 1;
-    if (last_size == 0) {
-      DCHECK_EQ(NumElements(), 0);
-      // Return something empty, avoiding divide by 0
-      return shaped<T, 2>({0, 0});
-    } else {
-      return shaped<T, 2>({NumElements() / last_size, last_size});
-    }
-  }
+  /// Returns the data as an Eigen::Tensor with NDIMS dimensions, collapsing all
+  /// Tensor dimensions but the last NDIMS-1 into the first dimension of the
+  /// result. If NDIMS > dims() then leading dimensions of size 1 will be
+  /// added to make the output rank NDIMS.
+  template <typename T, size_t NDIMS = 2>
+  typename TTypes<T, NDIMS>::Tensor flat_inner_dims();

-  /// Returns the data as an Eigen::Tensor with 2 dimensions, collapsing all
-  /// Tensor dimensions but the first one into the last dimension of the result.
-  template <typename T>
-  typename TTypes<T>::Matrix flat_outer_dims() {
-    int64 first_size = dims() > 0 ? dim_size(0) : 1;
-    if (first_size == 0) {
-      DCHECK_EQ(NumElements(), 0);
-      // Return something empty, avoiding divide by 0
-      return shaped<T, 2>({0, 0});
-    } else {
-      return shaped<T, 2>({first_size, NumElements() / first_size});
-    }
-  }
+  /// Returns the data as an Eigen::Tensor with NDIMS dimensions, collapsing all
+  /// Tensor dimensions but the first NDIMS-1 into the last dimension of the
+  /// result. If NDIMS > dims() then trailing dimensions of size 1 will be
+  /// added to make the output rank NDIMS.
+  template <typename T, size_t NDIMS = 2>
+  typename TTypes<T, NDIMS>::Tensor flat_outer_dims();

   template <typename T, size_t NDIMS>
   typename TTypes<T, NDIMS>::Tensor shaped(gtl::ArraySlice<int64> new_sizes);
@@ -308,31 +297,19 @@ class Tensor {
   typename TTypes<T, NDIMS>::ConstTensor tensor() const;

   template <typename T>
-  typename TTypes<T>::ConstFlat flat() const;
+  typename TTypes<T>::ConstFlat flat() const {
+    return shaped<T, 1>({NumElements()});
+  }

   template <typename T>
   typename TTypes<T>::UnalignedConstFlat unaligned_flat() const {
     return unaligned_shaped<T, 1>({NumElements()});
   }

-  template <typename T>
-  typename TTypes<T>::ConstMatrix flat_inner_dims() const {
-    int64 last_size = dims() > 0 ? dim_size(dims() - 1) : 1;
-    if (last_size == 0) {
-      DCHECK_EQ(NumElements(), 0);
-      // Return something empty, avoiding divide by 0
-      return shaped<T, 2>({0, 0});
-    } else {
-      return shaped<T, 2>({NumElements() / last_size, last_size});
-    }
-  }
-
-  template <typename T>
-  typename TTypes<T>::ConstMatrix flat_outer_dims() const;
-
   template <typename T, size_t NDIMS>
   typename TTypes<T, NDIMS>::ConstTensor shaped(
       gtl::ArraySlice<int64> new_sizes) const;

   template <typename T, size_t NDIMS>
   typename TTypes<T, NDIMS>::UnalignedConstTensor unaligned_shaped(
       gtl::ArraySlice<int64> new_sizes) const;
@@ -340,6 +317,12 @@ class Tensor {
   template <typename T>
   typename TTypes<T>::ConstScalar scalar() const;

+  template <typename T, size_t NDIMS = 2>
+  typename TTypes<T, NDIMS>::ConstTensor flat_inner_dims() const;
+
+  template <typename T, size_t NDIMS = 2>
+  typename TTypes<T, NDIMS>::ConstTensor flat_outer_dims() const;
+
   /// Render the first `max_entries` values in `*this` into a string.
   string SummarizeValue(int64 max_entries) const;

@@ -378,6 +361,8 @@ class Tensor {
   void FillDimsAndValidateCompatibleShape(
       gtl::ArraySlice<int64> new_sizes,
       Eigen::array<Eigen::DenseIndex, NDIMS>* dims) const;
+  gtl::InlinedVector<int64, 5> ComputeFlatInnerDims(int64 num_out_dims) const;
+  gtl::InlinedVector<int64, 5> ComputeFlatOuterDims(int64 num_out_dims) const;

   TensorShape shape_;
   TensorBuffer* buf_;
@@ -534,26 +519,24 @@ typename TTypes<T>::ConstScalar Tensor::scalar() const {
   return typename TTypes<T>::ConstScalar(base<T>());
 }

-template <typename T>
-typename TTypes<T>::Flat Tensor::flat() {
-  return shaped<T, 1>({NumElements()});
+template <typename T, size_t NDIMS>
+typename TTypes<T, NDIMS>::Tensor Tensor::flat_inner_dims() {
+  return shaped<T, NDIMS>(ComputeFlatInnerDims(NDIMS));
 }

-template <typename T>
-typename TTypes<T>::ConstFlat Tensor::flat() const {
-  return shaped<T, 1>({NumElements()});
+template <typename T, size_t NDIMS>
+typename TTypes<T, NDIMS>::Tensor Tensor::flat_outer_dims() {
+  return shaped<T, NDIMS>(ComputeFlatOuterDims(NDIMS));
 }

-template <typename T>
-typename TTypes<T>::ConstMatrix Tensor::flat_outer_dims() const {
-  int64 first_size = dims() > 0 ? dim_size(0) : 1;
-  if (first_size == 0) {
-    DCHECK_EQ(NumElements(), 0);
-    // Return something empty, avoiding divide by 0
-    return shaped<T, 2>({0, 0});
-  } else {
-    return shaped<T, 2>({first_size, NumElements() / first_size});
-  }
+template <typename T, size_t NDIMS>
+typename TTypes<T, NDIMS>::ConstTensor Tensor::flat_inner_dims() const {
+  return shaped<T, NDIMS>(ComputeFlatInnerDims(NDIMS));
+}
+
+template <typename T, size_t NDIMS>
+typename TTypes<T, NDIMS>::ConstTensor Tensor::flat_outer_dims() const {
+  return shaped<T, NDIMS>(ComputeFlatOuterDims(NDIMS));
 }

 }  // namespace tensorflow
@@ -224,6 +224,49 @@ TEST(Tensor_Float, Reshape) {
     EXPECT_EQ(flat_inner_dims(0, 0), 0.01f);
     EXPECT_EQ(flat_inner_dims(23, 4), 0.02f);
   }
+  {
+    auto flat_outer_dims = t.flat_outer_dims<float>();
+    EXPECT_EQ(2, flat_outer_dims.dimension(0));
+    EXPECT_EQ(60, flat_outer_dims.dimension(1));
+    EXPECT_EQ(flat_outer_dims(0, 0), 0.01f);
+    EXPECT_EQ(flat_outer_dims(1, 59), 0.02f);
+  }
+  {
+    auto flat_inner_dims = t.flat_inner_dims<float, 3>();
+    EXPECT_EQ(6, flat_inner_dims.dimension(0));
+    EXPECT_EQ(4, flat_inner_dims.dimension(1));
+    EXPECT_EQ(5, flat_inner_dims.dimension(2));
+    EXPECT_EQ(flat_inner_dims(0, 0, 0), 0.01f);
+    EXPECT_EQ(flat_inner_dims(5, 3, 4), 0.02f);
+  }
+  {
+    auto flat_outer_dims = t.flat_outer_dims<float, 3>();
+    EXPECT_EQ(2, flat_outer_dims.dimension(0));
+    EXPECT_EQ(3, flat_outer_dims.dimension(1));
+    EXPECT_EQ(20, flat_outer_dims.dimension(2));
+    EXPECT_EQ(flat_outer_dims(0, 0, 0), 0.01f);
+    EXPECT_EQ(flat_outer_dims(1, 2, 19), 0.02f);
+  }
+  {
+    auto flat_inner_dims = t.flat_inner_dims<float, 5>();
+    EXPECT_EQ(1, flat_inner_dims.dimension(0));
+    EXPECT_EQ(2, flat_inner_dims.dimension(1));
+    EXPECT_EQ(3, flat_inner_dims.dimension(2));
+    EXPECT_EQ(4, flat_inner_dims.dimension(3));
+    EXPECT_EQ(5, flat_inner_dims.dimension(4));
+    EXPECT_EQ(flat_inner_dims(0, 0, 0, 0, 0), 0.01f);
+    EXPECT_EQ(flat_inner_dims(0, 1, 2, 3, 4), 0.02f);
+  }
+  {
+    auto flat_outer_dims = t.flat_outer_dims<float, 5>();
+    EXPECT_EQ(2, flat_outer_dims.dimension(0));
+    EXPECT_EQ(3, flat_outer_dims.dimension(1));
+    EXPECT_EQ(4, flat_outer_dims.dimension(2));
+    EXPECT_EQ(5, flat_outer_dims.dimension(3));
+    EXPECT_EQ(1, flat_outer_dims.dimension(4));
+    EXPECT_EQ(flat_outer_dims(0, 0, 0, 0, 0), 0.01f);
+    EXPECT_EQ(flat_outer_dims(1, 2, 3, 4, 0), 0.02f);
+  }
 }

 TEST(Tensor_Scalar, Basics) {
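`flat_inner_dims` and `flat_outer_dims` behave like a reshape with one inferred dimension. A quick NumPy check of the shapes asserted above for a 2x3x4x5 tensor (NumPy used purely for illustration):

```python
import numpy as np

t = np.zeros((2, 3, 4, 5))
print(t.reshape(-1, 4, 5).shape)       # (6, 4, 5)  ~ flat_inner_dims<float, 3>
print(t.reshape(2, 3, -1).shape)       # (2, 3, 20) ~ flat_outer_dims<float, 3>
print(t.reshape(1, 2, 3, 4, 5).shape)  # size-1 dims pad when NDIMS > dims()
```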
@@ -305,6 +305,23 @@ tf_kernel_libraries(
     ],
 )

+tf_cc_test(
+    name = "batch_norm_op_test",
+    size = "small",
+    deps = [
+        ":batch_norm_op",
+        ":ops_testutil",
+        ":ops_util",
+        "//tensorflow/core:core_cpu",
+        "//tensorflow/core:framework",
+        "//tensorflow/core:lib",
+        "//tensorflow/core:protos_all_cc",
+        "//tensorflow/core:test",
+        "//tensorflow/core:test_main",
+        "//tensorflow/core:testlib",
+    ],
+)
+
 tf_cc_test(
     name = "concat_op_test",
     size = "small",
tensorflow/core/kernels/batch_norm_op_test.cc (new file, 62 lines)
@@ -0,0 +1,62 @@
+/* Copyright 2015 Google Inc. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#include <vector>
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/platform/test.h"
+
+namespace tensorflow {
+
+class BatchNormOpTest : public OpsTestBase {};
+
+TEST_F(BatchNormOpTest, Simple) {
+  TF_EXPECT_OK(
+      NodeDefBuilder("batch_norm_op", "BatchNormWithGlobalNormalization")
+          .Input(FakeInput(DT_FLOAT))
+          .Input(FakeInput(DT_FLOAT))
+          .Input(FakeInput(DT_FLOAT))
+          .Input(FakeInput(DT_FLOAT))
+          .Input(FakeInput(DT_FLOAT))
+          .Attr("scale_after_normalization", false)
+          .Attr("variance_epsilon", 0.001)
+          .Finalize(node_def()));
+  TF_EXPECT_OK(InitOpWithGraphVersion(8));
+  AddInputFromArray<float>(TensorShape({1, 1, 6, 2}),
+                           {1, 4, 2, 5, 3, 6, -1, -4, -2, -5, -3, -6});
+  AddInputFromArray<float>(TensorShape({2}), {10, 20});
+  AddInputFromArray<float>(TensorShape({2}), {0.25, 0.5});
+  AddInputFromArray<float>(TensorShape({2}), {0.1, 0.6});
+  AddInputFromArray<float>(TensorShape({2}), {0.0, 0.0});
+  TF_ASSERT_OK(RunOpKernel());
+
+  Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 1, 6, 2}));
+  test::FillValues<float>(
+      &expected, {-17.86, -22.00, -15.87, -20.59, -13.87, -19.18, -21.86,
+                  -33.31, -23.85, -34.72, -25.85, -36.13});
+  test::ExpectTensorNear<float>(expected, *GetOutput(0), 0.01);
+}
+
+}  // namespace tensorflow
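With `scale_after_normalization=false`, this op computes `y = (x - mean) / sqrt(var + variance_epsilon) + beta` per channel, with gamma ignored. A NumPy spot-check of the first expected row, assuming that formula:

```python
import numpy as np

x = np.array([1, 4, 2, 5, 3, 6, -1, -4, -2, -5, -3, -6],
             dtype=np.float32).reshape(1, 1, 6, 2)
mean = np.array([10, 20], dtype=np.float32)
var = np.array([0.25, 0.5], dtype=np.float32)
beta = np.array([0.1, 0.6], dtype=np.float32)
eps = 0.001

y = (x - mean) / np.sqrt(var + eps) + beta  # broadcast over the channel axis
print(np.round(y[0, 0, 0], 2))  # [-17.86 -22.  ], matching the expected tensor
```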
@@ -94,10 +94,13 @@ class OpsTestBase : public ::testing::Test {
   // and output types as output.
   //
   // Returns the status of initialization.
-  Status InitOp() {
+  Status InitOp() { return InitOpWithGraphVersion(TF_GRAPH_DEF_VERSION); }
+
+  // Only use this directly if you have a deprecated op that you need to test.
+  Status InitOpWithGraphVersion(int graph_def_version) {
     Status status;
     kernel_ = CreateOpKernel(device_type_, device_.get(), allocator(),
-                             node_def_, TF_GRAPH_DEF_VERSION, &status);
+                             node_def_, graph_def_version, &status);
     if (kernel_ != nullptr) input_types_ = kernel_->input_types();
     return status;
   }
@@ -66,7 +66,7 @@ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T MaybeConj(T v) {
 #define MAYBE_CONJ(T)                                          \
   template <>                                                  \
   EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T MaybeConj<T>(T v) {  \
-    return std::conj(v);                                       \
+    return Eigen::numext::conj(v);                             \
   }
 #endif
@@ -1,121 +0,0 @@
-/* Copyright 2015 Google Inc. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-==============================================================================*/
-
-#include "tensorflow/core/lib/core/command_line_flags.h"
-
-#include "tensorflow/core/lib/strings/str_util.h"
-#include "tensorflow/core/lib/strings/strcat.h"
-#include "tensorflow/core/lib/strings/stringprintf.h"
-
-namespace tensorflow {
-namespace {
-
-// Templated function to convert a string to target values.
-// Return true if the conversion is successful. Otherwise, return false.
-template <typename T>
-bool StringToValue(const string& content, T* value);
-
-template <>
-bool StringToValue<int32>(const string& content, int32* value) {
-  return strings::safe_strto32(content, value);
-}
-
-template <>
-bool StringToValue<string>(const string& content, string* value) {
-  *value = content;
-  return true;
-}
-
-// Parse a single argument by linearly searching through the command table.
-// The input format is: --argument=value.
-// Return OK if the argument is used. It store the extracted value into the
-// matching flag.
-// Return NOT_FOUND if the argument is not recognized.
-// Return INVALID_ARGUMENT if the command is recognized, but fails to extract
-// its value.
-template <typename T>
-Status ParseArgument(const string& argument) {
-  for (auto& command :
-       internal::CommandLineFlagRegistry<T>::Instance()->commands) {
-    string prefix = strings::StrCat("--", command.name, "=");
-    if (tensorflow::StringPiece(argument).starts_with(prefix)) {
-      string content = argument.substr(prefix.length());
-      if (StringToValue<T>(content, command.value)) {
-        return Status::OK();
-      }
-      return Status(error::INVALID_ARGUMENT,
-                    strings::StrCat("Cannot parse integer in: ", argument));
-    }
-  }
-  return Status(error::NOT_FOUND,
-                strings::StrCat("Unknown command: ", argument));
-}
-
-// A specialization for booleans. The input format is:
-//   "--argument" or "--noargument".
-// Parse a single argument by linearly searching through the command table.
-// Return OK if the argument is used. The value is stored in the matching flag.
-// Return NOT_FOUND if the argument is not recognized.
-template <>
-Status ParseArgument<bool>(const string& argument) {
-  for (auto& command :
-       internal::CommandLineFlagRegistry<bool>::Instance()->commands) {
-    if (argument == strings::StrCat("--", command.name)) {
-      *command.value = true;
-      return Status::OK();
-    } else if (argument == strings::StrCat("--no", command.name)) {
-      *command.value = false;
-      return Status::OK();
-    }
-  }
-  return Status(error::NOT_FOUND,
-                strings::StrCat("Unknown command: ", argument));
-}
-
-}  // namespace
-
-Status ParseCommandLineFlags(int* argc, char* argv[]) {
-  int unused_argc = 1;
-  for (int index = 1; index < *argc; ++index) {
-    Status s;
-    // Search bool commands.
-    s = ParseArgument<bool>(argv[index]);
-    if (s.ok()) {
-      continue;
-    }
-    if (s.code() != error::NOT_FOUND) {
-      return s;
-    }
-    // Search int32 commands.
-    s = ParseArgument<int32>(argv[index]);
-    if (s.ok()) {
-      continue;
-    }
-    // Search string commands.
-    s = ParseArgument<string>(argv[index]);
-    if (s.ok()) {
-      continue;
-    }
-    if (s.code() != error::NOT_FOUND) {
-      return s;
-    }
-    // Pointer swap the unused argument to the front.
-    std::swap(argv[unused_argc++], argv[index]);
-  }
-  *argc = unused_argc;
-  return Status::OK();
-}
-
-}  // namespace tensorflow
@ -1,80 +0,0 @@
/* Copyright 2015 Google Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/

#ifndef TENSORFLOW_LIB_CORE_COMMAND_LINE_FLAGS_H_
#define TENSORFLOW_LIB_CORE_COMMAND_LINE_FLAGS_H_

#include <vector>
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/platform/macros.h"
#include "tensorflow/core/platform/types.h"

namespace tensorflow {
namespace internal {

template <typename T>
struct CommandLineFlagRegistry {
  static CommandLineFlagRegistry* Instance() {
    static CommandLineFlagRegistry instance_;
    return &instance_;
  }
  struct Command {
    string name;
    T* value;
    string text;
  };
  std::vector<Command> commands;

 private:
  CommandLineFlagRegistry() {}
  TF_DISALLOW_COPY_AND_ASSIGN(CommandLineFlagRegistry);
};

template <typename T>
struct CommandLineFlagRegister {
  CommandLineFlagRegister(const string& name, T* val, const string& text) {
    CommandLineFlagRegistry<T>::Instance()->commands.push_back(
        {name, val, text});
  }
};

#define TF_DEFINE_variable(type, name, default_value, text) \
  type FLAGS_##name = default_value; \
  namespace TF_flags_internal { \
  tensorflow::internal::CommandLineFlagRegister<type> \
      TF_flags_internal_var_##name(#name, &FLAGS_##name, text); \
  }  // namespace TF_flags_internal

}  // namespace internal

#define TF_DEFINE_int32(name, default_value, text) \
  TF_DEFINE_variable(tensorflow::int32, name, default_value, text);

#define TF_DEFINE_bool(name, default_value, text) \
  TF_DEFINE_variable(bool, name, default_value, text);

#define TF_DEFINE_string(name, default_value, text) \
  TF_DEFINE_variable(string, name, default_value, text);

// Parse argv[1]..argv[*argc-1] to options. Remove used arguments from the argv.
// Returned the number of unused arguments in *argc.
// Return error Status if the parsing encounters errors.
// TODO(opensource): switch to a command line argument parser that can be
// shared with other tests.
Status ParseCommandLineFlags(int* argc, char* argv[]);

}  // namespace tensorflow

#endif  // TENSORFLOW_LIB_CORE_COMMAND_LINE_FLAGS_H_
@ -20,7 +20,7 @@ limitations under the License.

#define TF_MAJOR_VERSION 0
#define TF_MINOR_VERSION 8
#define TF_PATCH_VERSION 0rc0
#define TF_PATCH_VERSION 0

// TF_VERSION_SUFFIX is non-empty for pre-releases (e.g. "-alpha", "-alpha.1",
// "-beta", "-rc", "-rc.1")
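This hunk is the release bump itself: the patch level goes from `0rc0` to `0`, so the composed version string becomes `0.8.0`. After installing one of the wheels above, a quick sanity check from Python (a minimal sketch; the attribute name `tf.__version__` is an assumption for builds of this era, some of which expose it as `tf.VERSION`):

```python
import tensorflow as tf

# The version string is assembled from TF_MAJOR_VERSION, TF_MINOR_VERSION,
# TF_PATCH_VERSION, and TF_VERSION_SUFFIX in core/public/version.h.
print(tf.__version__)  # Expected: 0.8.0 after this change (assumed attribute)
```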
@ -120,9 +120,12 @@ variable to its initial value.
##### Args:


*  <b>`initial_value`</b>: A `Tensor`, or Python object convertible to a `Tensor`.
     The initial value for the Variable. Must have a shape specified unless
     `validate_shape` is set to False.
*  <b>`initial_value`</b>: A `Tensor`, or Python object convertible to a `Tensor`,
     which is the initial value for the Variable. The initial value must have
     a shape specified unless `validate_shape` is set to False. Can also be a
     callable with no argument that returns the initial value when called. In
     that case, `dtype` must be specified. (Note that initializer functions
     from init_ops.py must first be bound to a shape before being used here.)
*  <b>`trainable`</b>: If `True`, the default, also adds the variable to the graph
     collection `GraphKeys.TRAINABLE_VARIABLES`. This collection is used as
     the default list of variables to use by the `Optimizer` classes.
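The callable form of `initial_value` documented above defers building the initializer ops until the variable op itself exists (the `Variable` constructor change later in this diff shows the mechanics). A minimal sketch of the two accepted forms; the variable names are illustrative:

```python
import tensorflow as tf

# Plain Tensor initial value: shape and dtype are inferred from the Tensor.
w = tf.Variable(tf.zeros([2, 3]), name="w")

# Callable initial value: evaluated only when the variable op is built,
# so `dtype` must be given explicitly (a ValueError is raised otherwise).
v = tf.Variable(lambda: tf.constant([[1.0, 2.0, 3.0]]),
                dtype=tf.float32, name="v")
```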
@ -1955,6 +1955,7 @@ on the parameters to the constructor and may include:
##### Raises:


*  <b>`RuntimeError`</b>: If called with a non-chief Supervisor.
*  <b>`ValueError`</b>: If no `logdir` was passed to the constructor, as the
     services need a log directory.

@ -2182,6 +2183,7 @@ on the parameters to the constructor and may include:
##### Raises:


*  <b>`RuntimeError`</b>: If called with a non-chief Supervisor.
*  <b>`ValueError`</b>: If no `logdir` was passed to the constructor, as the
     services need a log directory.

@ -2409,7 +2411,7 @@ Start threads for `QueueRunners`.

#### `tf.train.Supervisor.summary_op` {#Supervisor.summary_op}

Return the Summary Tensor used by the supervisor.
Return the Summary Tensor used by the chief supervisor.

##### Returns:

@ -2420,7 +2422,7 @@ Return the Summary Tensor used by the supervisor.

#### `tf.train.Supervisor.summary_writer` {#Supervisor.summary_writer}

Return the SummaryWriter used by the supervisor.
Return the SummaryWriter used by the chief supervisor.

##### Returns:

@ -7,8 +7,10 @@ github source.

The TensorFlow Python API supports Python 2.7 and Python 3.3+.

The GPU version (Linux only) requires the Cuda Toolkit >= 7.0 and cuDNN >=
v2. Please see [Cuda installation](#optional-install-cuda-gpus-on-linux)
The GPU version (Linux only) works best with Cuda Toolkit 7.5 and
cuDNN v4. Other versions are supported (Cuda toolkit >= 7.0 and
cuDNN 6.5(v2), 7.0(v3), v5) only when installing from sources.
Please see [Cuda installation](#optional-install-cuda-gpus-on-linux)
for details.

## Overview

@ -20,10 +22,13 @@ We support different ways to install TensorFlow:
  Python programs on your machine.
* [Virtualenv install](#virtualenv-installation): Install TensorFlow in its own
  directory, not impacting any existing Python programs on your machine.
* [Anaconda install](#anaconda-environment-installation): Install TensorFlow in its own
  environment for those running the Anaconda Python distribution. Does not
  impact existing Python programs on your machine.
* [Docker install](#docker-installation): Run TensorFlow in a Docker container
  isolated from all other programs on your machine.

If you are familiar with Pip, Virtualenv, or Docker, please feel free to adapt
If you are familiar with Pip, Virtualenv, Anaconda, or Docker, please feel free to adapt
the instructions to your particular needs. The names of the pip and Docker
images are listed in the corresponding installation sections.

@ -53,28 +58,30 @@ Install TensorFlow:

```bash
# Ubuntu/Linux 64-bit, CPU only:
$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl
$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl

# Ubuntu/Linux 64-bit, GPU enabled:
$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl
# Ubuntu/Linux 64-bit, GPU enabled. Requires CUDA toolkit 7.5 and CuDNN v4. For
# other versions, see "Install from sources" below.
$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl

# Mac OS X, CPU only:
$ sudo easy_install --upgrade six
$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.8.0rc0-py2-none-any.whl
$ sudo pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.8.0-py2-none-any.whl
```

For python3:

```bash
# Ubuntu/Linux 64-bit, CPU only:
$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0rc0-cp34-cp34m-linux_x86_64.whl
$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0-cp34-cp34m-linux_x86_64.whl

# Ubuntu/Linux 64-bit, GPU enabled:
$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0rc0-cp34-cp34m-linux_x86_64.whl
# Ubuntu/Linux 64-bit, GPU enabled. Requires CUDA toolkit 7.5 and CuDNN v4. For
# other versions, see "Install from sources" below.
$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp34-cp34m-linux_x86_64.whl

# Mac OS X, CPU only:
$ sudo easy_install --upgrade six
$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.8.0rc0-py3-none-any.whl
$ sudo pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.8.0-py3-none-any.whl
```

NOTE: If you are upgrading from a previous installation of TensorFlow < 0.7.1,
@ -126,13 +133,14 @@ $ source ~/tensorflow/bin/activate.csh # If using csh
(tensorflow)$ # Your prompt should change

# Ubuntu/Linux 64-bit, CPU only:
(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl
(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl

# Ubuntu/Linux 64-bit, GPU enabled:
(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl
# Ubuntu/Linux 64-bit, GPU enabled. Requires CUDA toolkit 7.5 and CuDNN v4. For
# other versions, see "Install from sources" below.
(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl

# Mac OS X, CPU only:
(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.8.0rc0-py2-none-any.whl
(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.8.0-py2-none-any.whl
```

and again for python3:

@ -143,13 +151,14 @@ $ source ~/tensorflow/bin/activate.csh # If using csh
(tensorflow)$ # Your prompt should change

# Ubuntu/Linux 64-bit, CPU only:
(tensorflow)$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0rc0-cp34-cp34m-linux_x86_64.whl
(tensorflow)$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0-cp34-cp34m-linux_x86_64.whl

# Ubuntu/Linux 64-bit, GPU enabled:
(tensorflow)$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0rc0-cp34-cp34m-linux_x86_64.whl
# Ubuntu/Linux 64-bit, GPU enabled. Requires CUDA toolkit 7.5 and CuDNN v4. For
# other versions, see "Install from sources" below.
(tensorflow)$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp34-cp34m-linux_x86_64.whl

# Mac OS X, CPU only:
(tensorflow)$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.8.0rc0-py3-none-any.whl
(tensorflow)$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.8.0-py3-none-any.whl
```

With the Virtualenv environment activated, you can now
@ -175,6 +184,95 @@ $ source ~/tensorflow/bin/activate.csh # If using csh.
(tensorflow)$ deactivate
```

## Anaconda environment installation

[Anaconda](https://www.continuum.io/why-anaconda) is a Python distribution that
includes a large number of standard numeric and scientific computing packages.
Anaconda uses a package manager called "conda" that has its own
[environment system](http://conda.pydata.org/docs/using/envs.html) similar to Virtualenv.

As with Virtualenv, conda environments keep the dependencies required by
different Python projects in separate places. The Anaconda environment
installation of TensorFlow will not override pre-existing versions of the Python
packages needed by TensorFlow.

* Install Anaconda.
* Create a conda environment.
* Activate the conda environment and install TensorFlow in it.
* After the install you will activate the conda environment each time you
  want to use TensorFlow.

Install Anaconda:

Follow the instructions on the [Anaconda download site](https://www.continuum.io/downloads).

Create a conda environment called `tensorflow`:

```bash
# Python 2.7
$ conda create -n tensorflow python=2.7

# Python 3.5
$ conda create -n tensorflow python=3.5
```

Activate the environment and use pip to install TensorFlow inside it.
Use the `--ignore-installed` flag to prevent errors about `easy_install`.

```bash
$ source activate tensorflow
(tensorflow)$ # Your prompt should change

# Ubuntu/Linux 64-bit, CPU only:
(tensorflow)$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl

# Ubuntu/Linux 64-bit, GPU enabled. Requires CUDA toolkit 7.5 and CuDNN v4. For
# other versions, see "Install from sources" below.
(tensorflow)$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl

# Mac OS X, CPU only:
(tensorflow)$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.8.0-py2-none-any.whl
```

and again for Python 3:

```bash
$ source activate tensorflow
(tensorflow)$ # Your prompt should change

# Ubuntu/Linux 64-bit, CPU only:
(tensorflow)$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0-cp34-cp34m-linux_x86_64.whl

# Ubuntu/Linux 64-bit, GPU enabled. Requires CUDA toolkit 7.5 and CuDNN v4. For
# other versions, see "Install from sources" below.
(tensorflow)$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp34-cp34m-linux_x86_64.whl

# Mac OS X, CPU only:
(tensorflow)$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.8.0-py3-none-any.whl
```

With the conda environment activated, you can now
[test your installation](#test-the-tensorflow-installation).

When you are done using TensorFlow, deactivate the environment.

```bash
(tensorflow)$ source deactivate

$ # Your prompt should change back
```

To use TensorFlow later you will have to activate the conda environment again:

```bash
$ source activate tensorflow
(tensorflow)$ # Your prompt should change.
# Run Python programs that use TensorFlow.
...
# When you are done using TensorFlow, deactivate the environment.
(tensorflow)$ source deactivate
```

## Docker installation

[Docker](http://docker.com/) is a system to build self-contained versions of a
@ -191,7 +289,7 @@ code.
* `gcr.io/tensorflow/tensorflow:latest-devel-gpu`: GPU Binary image plus source
code.

We also have tags with `latest` replaced by a released version (e.g., `0.8.0rc0-gpu`).
We also have tags with `latest` replaced by a released version (e.g., `0.8.0-gpu`).

With Docker the installation is as follows:

@ -229,7 +327,7 @@ You can now [test your installation](#test-the-tensorflow-installation) within t
### (Optional, Linux) Enable GPU Support

If you installed the GPU version of TensorFlow, you must also install the Cuda
Toolkit 7.0 and cuDNN v2. Please see [Cuda installation](#optional-install-cuda-gpus-on-linux).
Toolkit 7.5 and cuDNN v4. Please see [Cuda installation](#optional-install-cuda-gpus-on-linux).

You also need to set the `LD_LIBRARY_PATH` and `CUDA_HOME` environment
variables. Consider adding the commands below to your `~/.bash_profile`. These
@ -370,20 +468,25 @@ Supported cards include but are not limited to:

https://developer.nvidia.com/cuda-downloads

Install version 7.5 if using our binary releases.

Install the toolkit into e.g. `/usr/local/cuda`

##### Download and install cuDNN

https://developer.nvidia.com/cudnn

Download cuDNN v4 (v5 is currently a release candidate and is only supported when
installing TensorFlow from sources).

Uncompress and copy the cuDNN files into the toolkit directory. Assuming the
toolkit is installed in `/usr/local/cuda`, run the following commands (edited
to reflect the cuDNN version you downloaded):

``` bash
tar xvzf cudnn-6.5-linux-x64-v2.tgz
sudo cp cudnn-6.5-linux-x64-v2/cudnn.h /usr/local/cuda/include
sudo cp cudnn-6.5-linux-x64-v2/libcudnn* /usr/local/cuda/lib64
tar xvzf cudnn-7.5-linux-x64-v4.tgz
sudo cp cudnn-7.5-linux-x64-v4/cudnn.h /usr/local/cuda/include
sudo cp cudnn-7.5-linux-x64-v4/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
```

@ -517,7 +620,7 @@ $ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_pack
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

# The name of the .whl file will depend on your platform.
$ pip install /tmp/tensorflow_pkg/tensorflow-0.8.0rc0-py2-none-linux_x86_64.whl
$ pip install /tmp/tensorflow_pkg/tensorflow-0.8.0-py2-none-linux_x86_64.whl
```

## Setting up TensorFlow for Development

@ -8,7 +8,7 @@ writing TensorFlow programs.
## Hello distributed TensorFlow!

This tutorial assumes that you are using a TensorFlow nightly build. You
can test your installation by starting a local server as follows:
can test your installation by starting and using a local server as follows:

```shell
# Start a TensorFlow server as a single-process "cluster".
@ -16,29 +16,34 @@ $ python
>>> import tensorflow as tf
>>> c = tf.constant("Hello, distributed TensorFlow!")
>>> server = tf.train.Server.create_local_server()
>>> sess = tf.Session(server.target)
>>> sess = tf.Session(server.target)  # Create a session on the server.
>>> sess.run(c)
'Hello, distributed TensorFlow!'
```

The
[`tf.train.Server.create_local_server()`](../../api_docs/train.md#Server.create_local_server)
method creates a single-process cluster.
method creates a single-process cluster, with an in-process server.

## Create a cluster

Most clusters have multiple tasks, divided into one or more jobs. To create a
cluster with multiple processes or machines:
A TensorFlow "cluster" is a set of "tasks" that participate in the distributed
execution of a TensorFlow graph. Each task is associated with a TensorFlow
"server", which contains a "master" that can be used to create sessions, and a
"worker" that executes operations in the graph. A cluster can also be divided
into one or more "jobs", where each job contains one or more tasks.

1. **For each process or machine** in the cluster, run a TensorFlow program to
   do the following:
To create a cluster, you start one TensorFlow server per task in the cluster.
Each task typically runs on a different machine, but you can run multiple tasks
on the same machine (e.g. to control different GPU devices). In each task, do
the following:

1. **Create a `tf.train.ClusterSpec`**, which describes all of the tasks
   in the cluster. This should be the same in each process.
1. **Create a `tf.train.ClusterSpec`** that describes all of the tasks
   in the cluster. This should be the same for each task.

1. **Create a `tf.train.Server`**, passing the `tf.train.ClusterSpec` to
   the constructor, and identifying the local process with a job name
   and task index.
2. **Create a `tf.train.Server`**, passing the `tf.train.ClusterSpec` to
   the constructor, and identifying the local task with a job name
   and task index.


### Create a `tf.train.ClusterSpec` to describe the cluster
@ -71,28 +76,29 @@ tf.train.ClusterSpec({
</tr>
</table>

### Create a `tf.train.Server` instance in each process
### Create a `tf.train.Server` instance in each task

A [`tf.train.Server`](../../api_docs/python/train.md#Server) object contains a
set of local devices, and a
[`tf.Session`](../../api_docs/python/client.md#Session) target that can
participate in a distributed computation. Each server belongs to a particular
cluster (specified by a `tf.train.ClusterSpec`), and corresponds to a particular
task in a named job. The server can communicate with any other server in the
same cluster.
set of local devices, a set of connections to other tasks in its
`tf.train.ClusterSpec`, and a
["session target"](../../api_docs/python/client.md#Session) that can use these
to perform a distributed computation. Each server is a member of a specific
named job and has a task index within that job. A server can communicate with
any other server in the cluster.

For example, to define and instantiate servers running on `localhost:2222` and
`localhost:2223`, run the following snippets in different processes:
For example, to launch a cluster with two servers running on `localhost:2222`
and `localhost:2223`, run the following snippets in two different processes on
the local machine:

```python
# In task 0:
cluster = tf.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
server = tf.GrpcServer(cluster, job_name="local", task_index=0)
cluster = tf.train.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
server = tf.train.Server(cluster, job_name="local", task_index=0)
```
```python
# In task 1:
cluster = tf.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
server = tf.GrpcServer(cluster, job_name="local", task_index=1)
cluster = tf.train.ClusterSpec({"local": ["localhost:2222", "localhost:2223"]})
server = tf.train.Server(cluster, job_name="local", task_index=1)
```

**Note:** Manually specifying these cluster specifications can be tedious,
@ -137,45 +143,44 @@ applying gradients).

## Replicated training

A common training configuration ("data parallel training") involves multiple
tasks in a `worker` job training the same model, using shared parameters hosted
in one or more tasks in a `ps` job. Each task will typically run on a
different machine. There are many ways to specify this structure in TensorFlow,
and we are building libraries that will simplify the work of specifying a
replicated model. Possible approaches include:
A common training configuration, called "data parallelism," involves multiple
tasks in a `worker` job training the same model on different mini-batches of
data, updating shared parameters hosted in one or more tasks in a `ps`
job. All tasks typically run on different machines. There are many ways to
specify this structure in TensorFlow, and we are building libraries that will
simplify the work of specifying a replicated model. Possible approaches include:

* Building a single graph containing one set of parameters (in `tf.Variable`
  nodes pinned to `/job:ps`), and multiple copies of the "model" pinned to
  different tasks in `/job:worker`. Each copy of the model can have a different
  `train_op`, and one or more client threads can call `sess.run(train_ops[i])`
  for each worker `i`. This implements *asynchronous* training.
* **In-graph replication.** In this approach, the client builds a single
  `tf.Graph` that contains one set of parameters (in `tf.Variable` nodes pinned
  to `/job:ps`); and multiple copies of the compute-intensive part of the model,
  each pinned to a different task in `/job:worker`.

* **Between-graph replication.** In this approach, there is a separate client
  for each `/job:worker` task, typically in the same process as the worker
  task. Each client builds a similar graph containing the parameters (pinned to
  `/job:ps` as before using
  [`tf.train.replica_device_setter()`](../../api_docs/train.md#replica_device_setter)
  to map them deterministically to the same tasks; see the sketch after this
  list); and a single copy of the compute-intensive part of the model, pinned
  to the local task in `/job:worker`.

This approach uses a single `tf.Session` whose target is one of the workers in
the cluster.
* **Asynchronous training.** In this approach, each replica of the graph has an
  independent training loop that executes without coordination. It is compatible
  with both forms of replication above.

* As above, but where the gradients from all workers are averaged. See the
  [CIFAR-10 multi-GPU trainer](https://www.tensorflow.org/code/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py)
  for an example of this form of replication. This implements *synchronous*
  training.

* The "distributed trainer" approach uses multiple graphs—one per
  worker—where each graph contains one set of parameters (pinned to
  `/job:ps`) and one copy of the model (pinned to a particular
  `/job:worker/task:i`). The "container" mechanism is used to share variables
  between different graphs: when each variable is constructed, the optional
  `container` argument is specified with the same value in each copy of the
  graph. For large models, this can be more efficient, because the overall graph
  is smaller.

This approach uses multiple `tf.Session` objects: one per worker process,
where the `target` of each is the address of a different worker. The
`tf.Session` objects can all be created in a single Python client, or you can
use multiple Python clients to better distribute the trainer load.
* **Synchronous training.** In this approach, all of the replicas read the same
  values for the current parameters, compute gradients in parallel, and then
  apply them together. It is compatible with in-graph replication (e.g. using
  gradient averaging as in the
  [CIFAR-10 multi-GPU trainer](https://www.tensorflow.org/code/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py)),
  and between-graph replication (e.g. using the
  `tf.train.SyncReplicasOptimizer`).

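As a sketch of the between-graph approach referenced in the list above (the job names and addresses are illustrative, following the trainer skeleton that comes next): `tf.train.replica_device_setter()` pins variables to `/job:ps` tasks while leaving everything else on the local worker.

```python
import tensorflow as tf

cluster = tf.train.ClusterSpec({"ps": ["ps0:2222"],
                                "worker": ["worker0:2222", "worker1:2222"]})

# Variables created under this device function are assigned to /job:ps tasks;
# all other ops stay on the local worker task.
with tf.device(tf.train.replica_device_setter(
    worker_device="/job:worker/task:0", cluster=cluster)):
  weights = tf.Variable(tf.zeros([100, 10]), name="weights")  # on /job:ps
  biases = tf.Variable(tf.zeros([10]), name="biases")         # on /job:ps
```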
### Putting it all together: example trainer program

The following code shows the skeleton of a distributed trainer program. It
includes the code for the parameter server and worker processes.
The following code shows the skeleton of a distributed trainer program,
implementing **between-graph replication** and **asynchronous training**. It
includes the code for the parameter server and worker tasks.

```python
import tensorflow as tf
@ -197,10 +202,13 @@ def main(_):
  ps_hosts = FLAGS.ps_hosts.split(",")
  worker_hosts = FLAGS.worker_hosts.split(",")

  # Create a cluster from the parameter server and worker hosts.
  cluster = tf.train.ClusterSpec({"ps": ps_hosts, "worker": worker_hosts})

  # Create and start a server for the local task.
  server = tf.train.Server(cluster,
                           job_name=FLAGS.job_name,
                           task_index=task_index)
                           task_index=FLAGS.task_index)

  if FLAGS.job_name == "ps":
    server.join()
@ -290,10 +298,10 @@ $ python trainer.py \
  </dd>
  <dt>Cluster</dt>
  <dd>
    A TensorFlow cluster comprises one or more TensorFlow servers, divided into
    a set of named jobs, which in turn comprise lists of tasks. A cluster is
    typically dedicated to a particular high-level objective, such as training a
    neural network, using many machines in parallel.
    A TensorFlow cluster comprises one or more "jobs", each divided into lists
    of one or more "tasks". A cluster is typically dedicated to a particular
    high-level objective, such as training a neural network, using many machines
    in parallel. A cluster is defined by a `tf.train.ClusterSpec` object.
  </dd>
  <dt>Job</dt>
  <dd>
@ -301,20 +309,22 @@ $ python trainer.py \
    purpose. For example, a job named `ps` (for "parameter server") typically
    hosts nodes that store and update variables; while a job named `worker`
    typically hosts stateless nodes that perform compute-intensive tasks.
    The tasks in a job typically run on different machines.
    The tasks in a job typically run on different machines. The set of job roles
    is flexible: for example, a `worker` may maintain some state.
  </dd>
  <dt>Master service</dt>
  <dd>
    An RPC service that provides remote access to a set of distributed
    devices. The master service implements the <code>tensorflow::Session</code>
    interface, and is responsible for coordinating work across one or more
    "worker services".
    An RPC service that provides remote access to a set of distributed devices,
    and acts as a session target. The master service implements the
    <code>tensorflow::Session</code> interface, and is responsible for
    coordinating work across one or more "worker services". All TensorFlow
    servers implement the master service.
  </dd>
  <dt>Task</dt>
  <dd>
    A task typically corresponds to a single TensorFlow server process,
    belonging to a particular "job" and with a particular index within that
    job's list of tasks.
    A task corresponds to a specific TensorFlow server, and typically
    corresponds to a single process. A task belongs to a particular "job" and is
    identified by its index within that job's list of tasks.
  </dd>
  <dt>TensorFlow server</dt>
  <dd>
@ -326,6 +336,7 @@ $ python trainer.py \
    An RPC service that executes parts of a TensorFlow graph using its local
    devices. A worker service implements <a href=
    "https://www.tensorflow.org/code/tensorflow/core/protobuf/worker_service.proto"
    ><code>worker_service.proto</code></a>.
    ><code>worker_service.proto</code></a>. All TensorFlow servers implement the
    worker service.
  </dd>
</dl>

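The job and task names defined in this glossary appear directly in TensorFlow device strings, which is how a client pins ops to particular tasks. A small illustrative sketch (the job names follow the `ps`/`worker` convention above but are not fixed by the API):

```python
import tensorflow as tf

# Pin a variable to the first task of the "ps" job...
with tf.device("/job:ps/task:0"):
  weights = tf.Variable(tf.zeros([10]), name="weights")

# ...and a compute op to a task of the "worker" job.
with tf.device("/job:worker/task:1"):
  doubled = weights * 2.0
```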
@ -114,4 +114,4 @@ Building on the Inception recognition model, we will release a TensorFlow
version of the [Deep Dream](https://github.com/google/deepdream) neural network
visual hallucination software.

[View Tutorial](https://www.tensorflow.org/code/tensorflow/examples/tutorials/deepdream/deepdream.ipynb)
[View Tutorial](https://www.tensorflow.org/code/tensorflow/examples/tutorials/deepdream/README.md)

@ -19,6 +19,7 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import functools
import numpy as np
import tensorflow as tf

@ -208,5 +209,59 @@ class RNNCellTest(tf.test.TestCase):
                           0.13248, 0.13248]])


class SlimRNNCellTest(tf.test.TestCase):

  def testBasicRNNCell(self):
    with self.test_session() as sess:
      with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
        x = tf.zeros([1, 2])
        m = tf.zeros([1, 2])
        my_cell = functools.partial(basic_rnn_cell, num_units=2)
        g, _ = tf.nn.rnn_cell.SlimRNNCell(my_cell)(x, m)
        sess.run([tf.initialize_all_variables()])
        res = sess.run([g], {x.name: np.array([[1., 1.]]),
                             m.name: np.array([[0.1, 0.1]])})
        self.assertEqual(res[0].shape, (1, 2))

  def testBasicRNNCellMatch(self):
    batch_size = 32
    input_size = 100
    num_units = 10
    with self.test_session() as sess:
      with tf.variable_scope("root", initializer=tf.constant_initializer(0.5)):
        inputs = tf.random_uniform((batch_size, input_size))
        _, initial_state = basic_rnn_cell(inputs, None, num_units)
        my_cell = functools.partial(basic_rnn_cell, num_units=num_units)
        slim_cell = tf.nn.rnn_cell.SlimRNNCell(my_cell)
        slim_outputs, slim_state = slim_cell(inputs, initial_state)
        rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_units)
        outputs, state = rnn_cell(inputs, initial_state)
        self.assertEqual(slim_outputs.get_shape(), outputs.get_shape())
        self.assertEqual(slim_state.get_shape(), state.get_shape())
        sess.run([tf.initialize_all_variables()])
        res = sess.run([slim_outputs, slim_state, outputs, state])
        self.assertAllClose(res[0], res[2])
        self.assertAllClose(res[1], res[3])


def basic_rnn_cell(inputs, state, num_units, scope=None):
  if state is None:
    if inputs is not None:
      batch_size = inputs.get_shape()[0]
      dtype = inputs.dtype
    else:
      batch_size = 0
      dtype = tf.float32
    init_output = tf.zeros(tf.pack([batch_size, num_units]), dtype=dtype)
    init_state = tf.zeros(tf.pack([batch_size, num_units]), dtype=dtype)
    init_output.set_shape([batch_size, num_units])
    init_state.set_shape([batch_size, num_units])
    return init_output, init_state
  else:
    with tf.variable_op_scope([inputs, state], scope, "BasicRNNCell"):
      output = tf.tanh(tf.nn.rnn_cell.linear([inputs, state],
                                             num_units, True))
      return output, output

if __name__ == "__main__":
  tf.test.main()

@ -302,6 +302,53 @@ class VariablesTestCase(tf.test.TestCase):
      self.assertEqual(var.op.device, init_op.device)
      sess.run(init_op)

  def testInitializerFunction(self):
    value = [[-42], [133.7]]
    shape = [2, 1]
    with self.test_session():
      initializer = lambda: tf.constant(value)
      with self.assertRaises(ValueError):
        # Checks that dtype must be specified.
        tf.Variable(initializer)

      v1 = tf.Variable(initializer, dtype=tf.float32)
      self.assertEqual(shape, v1.get_shape())
      self.assertAllClose(value, v1.initial_value.eval())
      with self.assertRaises(tf.errors.FailedPreconditionError):
        v1.eval()

      v2 = tf.Variable(tf.neg(v1.initialized_value()), dtype=tf.float32)
      self.assertEqual(v1.get_shape(), v2.get_shape())
      self.assertAllClose(np.negative(value), v2.initial_value.eval())

      # Once v2.initial_value.eval() has been called, v1 has effectively been
      # initialized.
      self.assertAllClose(value, v1.eval())

      with self.assertRaises(tf.errors.FailedPreconditionError):
        v2.eval()
      tf.initialize_all_variables().run()
      self.assertAllClose(np.negative(value), v2.eval())

  def testInitializerFunctionDevicePlacement(self):
    with self.test_session():
      initializer = lambda: tf.constant(42.0)
      with tf.device("/cpu:100"):
        v1 = tf.Variable(initializer, dtype=tf.float32, name="v1")
        expected_device = "/device:CPU:100"
        expected_group_v1 = [b"loc:@v1"]
        self.assertEqual(expected_device, v1.op.device)
        self.assertEqual(expected_group_v1, v1.op.colocation_groups())
        for i in v1.initializer.inputs:
          self.assertEqual(expected_device, i.op.device)
          self.assertEqual(expected_group_v1, i.op.colocation_groups())

      v2 = tf.Variable(initializer, dtype=tf.float32, name="v2")
      expected_group_v2 = [b"loc:@v2"]
      self.assertEqual(expected_group_v2, v2.op.colocation_groups())
      for i in v2.initializer.inputs:
        self.assertEqual(expected_group_v2, i.op.colocation_groups())


class IsInitializedTest(tf.test.TestCase):

@ -167,19 +167,22 @@ def create_partitioned_variables(
    slice_offset[slice_dim] += var_shape[slice_dim]

    if callable(initializer):
      init_val = initializer(var_shape, dtype=dtype)
      init_val = ops.convert_to_tensor(init_val, dtype=dtype)
      init = initializer
      init_shape = var_shape
    elif isinstance(initializer, ops.Tensor):
      init_val = array_ops.slice(initializer, var_offset, var_shape)
      init = array_ops.slice(initializer, var_offset, var_shape)
      # Use the dtype of the given tensor.
      dtype = init_val.dtype.base_dtype
      dtype = init.dtype.base_dtype
      init_shape = None
    else:
      init_val = ops.convert_to_tensor(initializer, dtype=dtype)
      init_val = array_ops.slice(init_val, var_offset, var_shape)
      init = ops.convert_to_tensor(initializer, dtype=dtype)
      init = array_ops.slice(init, var_offset, var_shape)
      init_shape = None

    var = variable_scope.get_variable(name="part_%d" % i,
                                      shape=init_shape,
                                      dtype=dtype,
                                      initializer=init,
                                      initializer=init_val,
                                      trainable=trainable,
                                      collections=collections)

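The change above passes a callable initializer through to `get_variable` untouched, so it is evaluated lazily with the slice's shape, while Tensor initializers are still sliced eagerly. A hedged sketch of the two initializer forms as `tf.get_variable` accepts them (assuming the 0.8-era behavior this hunk relies on; the names are illustrative):

```python
import tensorflow as tf

# Callable initializer: invoked later with the variable's shape and dtype.
a = tf.get_variable("a", shape=[4, 2],
                    initializer=tf.random_uniform_initializer(-1.0, 1.0))

# Tensor initializer: the value fixes the shape, so none is passed here.
b = tf.get_variable("b", initializer=tf.constant([[1.0, 2.0]]))
```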
@ -661,6 +661,42 @@ class MultiRNNCell(RNNCell):
    return cur_inp, array_ops.concat(1, new_states)


class SlimRNNCell(RNNCell):
  """A simple wrapper for slim.rnn_cells."""

  def __init__(self, cell_fn):
    """Create a SlimRNNCell from a cell_fn.

    Args:
      cell_fn: a function which takes (inputs, state, scope) and produces the
        outputs and the new_state. Additionally when called with inputs=None
        and state=None it should return (initial_outputs, initial_state).

    Raises:
      TypeError: if cell_fn is not callable.
      ValueError: if cell_fn cannot produce a valid initial state.
    """
    if not callable(cell_fn):
      raise TypeError("cell_fn %s needs to be callable" % cell_fn)
    self._cell_fn = cell_fn
    self._cell_name = cell_fn.func.__name__
    _, init_state = self._cell_fn(None, None)
    state_shape = init_state.get_shape()
    self._state_size = state_shape.with_rank(2)[1].value
    if self._state_size is None:
      raise ValueError("Initial state created by %s has invalid shape %s" %
                       (self._cell_name, state_shape))

  @property
  def state_size(self):
    return self._state_size

  def __call__(self, inputs, state, scope=None):
    scope = scope or self._cell_name
    output, state = self._cell_fn(inputs, state, scope=scope)
    return output, state


def linear(args, output_size, bias, bias_start=0.0, scope=None):
  """Linear map: sum_i(args[i] * W[i]), where W[i] is a variable.

@ -144,14 +144,19 @@ class _VariableStore(object):
    with ops.control_dependencies(None):
      if initializing_from_value:
        init_val = initializer
        variable_dtype = None
      else:
        with ops.name_scope(name + "/Initializer/"):
          init_val = initializer(shape.as_list(), dtype=dtype)
        init_val = lambda: initializer(shape.as_list(), dtype=dtype)
        variable_dtype = dtype.base_dtype

      # Create the variable.
      v = variables.Variable(init_val, name=name, trainable=trainable,
      v = variables.Variable(initial_value=init_val,
                             name=name,
                             trainable=trainable,
                             collections=collections,
                             caching_device=caching_device)
                             caching_device=caching_device,
                             dtype=variable_dtype)

      self._vars[name] = v
      logging.info("Created variable %s with shape %s and init %s", v.name,
                   format(shape), initializer)

@ -156,9 +156,12 @@ class Variable(object):
    variable to its initial value.

    Args:
      initial_value: A `Tensor`, or Python object convertible to a `Tensor`.
        The initial value for the Variable. Must have a shape specified unless
        `validate_shape` is set to False.
      initial_value: A `Tensor`, or Python object convertible to a `Tensor`,
        which is the initial value for the Variable. The initial value must have
        a shape specified unless `validate_shape` is set to False. Can also be a
        callable with no argument that returns the initial value when called. In
        that case, `dtype` must be specified. (Note that initializer functions
        from init_ops.py must first be bound to a shape before being used here.)
      trainable: If `True`, the default, also adds the variable to the graph
        collection `GraphKeys.TRAINABLE_VARIABLES`. This collection is used as
        the default list of variables to use by the `Optimizer` classes.
@ -211,9 +214,12 @@ class Variable(object):
    """Creates a new variable from arguments.

    Args:
      initial_value: A `Tensor`, or Python object convertible to a `Tensor`.
        The initial value for the Variable. Must have a shape specified unless
        `validate_shape` is set to False.
      initial_value: A `Tensor`, or Python object convertible to a `Tensor`,
        which is the initial value for the Variable. The initial value must have
        a shape specified unless `validate_shape` is set to False. Can also be a
        callable with no argument that returns the initial value when called. In
        that case, `dtype` must be specified. (Note that initializer functions
        from init_ops.py must first be bound to a shape before being used here.)
      trainable: If `True`, the default, also adds the variable to the graph
        collection `GraphKeys.TRAINABLE_VARIABLES`. This collection is used as
        the default list of variables to use by the `Optimizer` classes.
@ -240,25 +246,62 @@ class Variable(object):
    """
    if initial_value is None:
      raise ValueError("initial_value must be specified.")
    init_from_fn = callable(initial_value)
    if init_from_fn and dtype is None:
      raise ValueError(
          "dtype must also be specified when initial_value is callable.")

    if collections is None:
      collections = [ops.GraphKeys.VARIABLES]
    if trainable and ops.GraphKeys.TRAINABLE_VARIABLES not in collections:
      collections = list(collections) + [ops.GraphKeys.TRAINABLE_VARIABLES]
    with ops.control_dependencies(None):
      with ops.op_scope([initial_value], name, "Variable") as name:
        self._initial_value = ops.convert_to_tensor(initial_value,
                                                    name="initial_value",
                                                    dtype=dtype)
        initial_value_shape = self._initial_value.get_shape()
        if validate_shape and not initial_value_shape.is_fully_defined():
          raise ValueError("initial_value must have a shape specified: %s"
                           % self._initial_value)
        shape_to_set = initial_value_shape if validate_shape else []
      with ops.op_scope(
          [] if init_from_fn else [initial_value], name, "Variable") as name:

        self._variable = state_ops.variable_op(
            shape_to_set, self._initial_value.dtype.base_dtype,
            set_shape=validate_shape, name=name)
        # Get the initial value from a callable function. The real shape of the
        # variable will be set later, since under the init_from_fn case, the
        # shape won't be known until after the function is invoked.
        if init_from_fn:
          self._variable = state_ops.variable_op(
              [],
              dtype.base_dtype,
              set_shape=False,
              name=name)
          with ops.colocate_with(self._variable.op):
            with ops.name_scope("Initializer"):
              # Colocate the tensors created by the initial_value() function
              # with the variable itself.
              self._initial_value = ops.convert_to_tensor(initial_value(),
                                                          name="initial_value",
                                                          dtype=dtype)

        # Or get the initial value from a Tensor or Python object.
        else:
          self._initial_value = ops.convert_to_tensor(initial_value,
                                                      name="initial_value",
                                                      dtype=dtype)
          # In this case, the variable op can't be created until after the
          # initial_value has been converted to a Tensor with a known type.
          self._variable = state_ops.variable_op(
              [],
              self._initial_value.dtype.base_dtype,
              set_shape=False,
              name=name)

        # Manually overrides the variable's shape with the initial value's.
        if validate_shape:
          initial_value_shape = self._initial_value.get_shape()
          if not initial_value_shape.is_fully_defined():
            raise ValueError("initial_value must have a shape specified: %s"
                             % self._initial_value)
          self._variable.set_shape(initial_value_shape)
          # TODO(b/28152992): Remove the below hack modifying the node_def shape
          # directly once set_shape() handles it.
          self._variable.op.node_def.attr["shape"].shape.CopyFrom(
              initial_value_shape.as_proto())

        # Assigns initial value.
        with ops.colocate_with(self._variable.op):
          self._initializer_op = state_ops.assign(
              self._variable, self._initial_value,

@ -79,3 +79,7 @@ def get_path_to_datafile(path):
  """
  data_files_path = os.path.dirname(inspect.getfile(sys._getframe(1)))
  return os.path.join(data_files_path, path)

def readahead_file_path(path, unused_readahead=None):
  """Readahead files not implemented; simply returns given path."""
  return path

@ -22,6 +22,7 @@ from tensorflow.core.util import event_pb2
from tensorflow.python import pywrap_tensorflow
from tensorflow.python.platform import app
from tensorflow.python.platform import logging
from tensorflow.python.platform import resource_loader
from tensorflow.python.util import compat


@ -31,6 +32,7 @@ class EventFileLoader(object):
  def __init__(self, file_path):
    if file_path is None:
      raise ValueError('A file path is required')
    file_path = resource_loader.readahead_file_path(file_path)
    logging.debug('Opening a record reader pointing at %s', file_path)
    self._reader = pywrap_tensorflow.PyRecordReader_New(
        compat.as_bytes(file_path), 0)

@ -238,6 +238,7 @@ class ClusterSpec(object):
    elif isinstance(cluster, ClusterSpec):
      self._cluster_def = tensorflow_server_pb2.ClusterDef()
      self._cluster_def.MergeFrom(cluster.as_cluster_def())
      self._cluster_spec = {}
      for job_def in self._cluster_def.job:
        self._cluster_spec[job_def.name] = [t for t in job_def.tasks.values()]
    else:
@ -306,4 +307,3 @@ class ClusterSpec(object):
        raise TypeError(
            "Task address %r must be bytes or unicode" % task_address)
      job_def.tasks[i] = task_address

@ -146,6 +146,29 @@ class ServerDefTest(tf.test.TestCase):
    cluster_spec = tf.train.ClusterSpec(cluster_def)
    self.assertProtoEquals(cluster_def, cluster_spec.as_cluster_def())

  def testClusterSpec(self):
    cluster_spec = tf.train.ClusterSpec(
        {"ps": ["ps0:2222", "ps1:2222"],
         "worker": ["worker0:2222", "worker1:2222", "worker2:2222"]})

    expected_proto = """
    job { name: 'ps' tasks { key: 0 value: 'ps0:2222' }
                     tasks { key: 1 value: 'ps1:2222' } }
    job { name: 'worker' tasks { key: 0 value: 'worker0:2222' }
                         tasks { key: 1 value: 'worker1:2222' }
                         tasks { key: 2 value: 'worker2:2222' } }
    """

    self.assertProtoEquals(expected_proto, cluster_spec.as_cluster_def())
    self.assertProtoEquals(
        expected_proto, tf.train.ClusterSpec(cluster_spec).as_cluster_def())
    self.assertProtoEquals(
        expected_proto,
        tf.train.ClusterSpec(cluster_spec.as_cluster_def()).as_cluster_def())
    self.assertProtoEquals(
        expected_proto,
        tf.train.ClusterSpec(cluster_spec.as_dict()).as_cluster_def())


if __name__ == "__main__":
  tf.test.main()

@ -326,19 +326,29 @@ class Supervisor(object):
    self._init_global_step(global_step=global_step)
    self._graph = graph
    self._is_chief = is_chief
    self._logdir = logdir
    self._save_summaries_secs = save_summaries_secs
    self._save_model_secs = save_model_secs
    self._recovery_wait_secs = recovery_wait_secs
    self._coord = coordinator.Coordinator()
    if logdir:
    self._started_threads = []
    self._recovery_wait_secs = recovery_wait_secs

    # Only chief supervisors write event files, so only chief supervisors
    # should have event-writing properties. Set to None for non-chiefs.
    if self._is_chief:
      self._logdir = logdir
      self._save_summaries_secs = save_summaries_secs
      self._save_model_secs = save_model_secs
    else:
      self._logdir = None
      self._save_summaries_secs = None
      self._save_model_secs = None

    if self._is_chief and self._logdir:
      self._save_path = os.path.join(self._logdir, checkpoint_basename)
      self._summary_writer = summary_io.SummaryWriter(self._logdir)
    else:
      self._save_path = None
      self._summary_writer = None

    self._init_session_manager(session_manager=session_manager)
    self._started_threads = []
    self._verify_setup()
    # The graph is not allowed to change anymore.
    graph.finalize()
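To make the chief/non-chief split concrete: in a replicated job exactly one task (typically worker task 0) is constructed as the chief, and under this change only that task ends up with a logdir, summary writer, and checkpoint save path. A hedged usage sketch; the task index and logdir are illustrative:

```python
import tensorflow as tf

task_index = 0              # illustrative; usually from a --task_index flag
is_chief = (task_index == 0)

global_step = tf.Variable(0, name="global_step", trainable=False)

# Non-chief supervisors get None for logdir, summary writer, and save path.
sv = tf.train.Supervisor(is_chief=is_chief,
                         logdir="/tmp/train_logs",  # illustrative path
                         summary_op=None)
```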
@ -520,7 +530,7 @@ class Supervisor(object):

  @property
  def summary_writer(self):
    """Return the SummaryWriter used by the supervisor.
    """Return the SummaryWriter used by the chief supervisor.

    Returns:
      A SummaryWriter.
@ -529,7 +539,7 @@ class Supervisor(object):

  @property
  def summary_op(self):
    """Return the Summary Tensor used by the supervisor.
    """Return the Summary Tensor used by the chief supervisor.

    Returns:
      A string Tensor for the summary or `None`.
@ -583,8 +593,7 @@ class Supervisor(object):

  def _write_graph(self):
    """Writes graph_def to `logdir` and adds it to summary if applicable."""
    if not self._is_chief:
      return
    assert self._is_chief
    if self._logdir:
      training_util.write_graph(self._graph.as_graph_def(),
                                self._logdir, "graph.pbtxt")
@ -610,11 +619,13 @@ class Supervisor(object):
      sv.coord.Join(<list of threads>)

    Raises:
      RuntimeError: If called with a non-chief Supervisor.
      ValueError: If no `logdir` was passed to the constructor, as the
        services need a log directory.
    """
    if not self._is_chief:
      return
      raise RuntimeError("Only chief supervisor can start standard services. "
                         "Because only chief supervisors can write events.")
    if not self._logdir:
      logging.warning("Standard services need a 'logdir' "
                      "passed to the SessionManager")
@ -812,14 +823,18 @@ class Supervisor(object):
      TypeError: if 'summary' is not a Summary proto or a string.
      RuntimeError: if the Supervisor was created without a `logdir`.
    """
    if not self._logdir:
      raise RuntimeError("summary_computed() requires a logdir")
    if not self._summary_writer:
      raise RuntimeError("Writing a summary requires a summary writer.")
    if global_step is None and self.global_step is not None:
      global_step = training_util.global_step(sess, self.global_step)
    if self._summary_writer:
      self._summary_writer.add_summary(summary, global_step)
    self._summary_writer.add_summary(summary, global_step)

  def _default_global_step_tensor(self):
    """Returns the global_step from the default graph.

    Returns:
      The global step `Tensor` or `None`.
    """
    try:
      gs = ops.get_default_graph().get_tensor_by_name("global_step:0")
      if gs.dtype.base_dtype in [dtypes.int32, dtypes.int64]:

@@ -73,12 +73,11 @@ class SupervisorTest(tf.test.TestCase):
      sess.close()
      sv.stop()

-  def testSummary(self):
+  def testChiefCanWriteEvents(self):
    logdir = self._TestDir("basics")
    with tf.Graph().as_default():
-      const = tf.constant([1.0, 2.0, 3.0])
-      summ = tf.scalar_summary(["c1", "c2", "c3"], const)
-      sv = tf.train.Supervisor(logdir=logdir, summary_op=None)
+      summ = tf.scalar_summary(["c1", "c2", "c3"], tf.constant([1.0, 2.0, 3.0]))
+      sv = tf.train.Supervisor(is_chief=True, logdir=logdir, summary_op=None)
      sess = sv.prepare_or_wait_for_session("")
      sv.summary_computed(sess, sess.run(summ))
      sess.close()
@@ -113,13 +112,31 @@ class SupervisorTest(tf.test.TestCase):
      # We should be done.
      self.assertRaises(StopIteration, lambda: next(rr))

+  def testNonChiefCannotWriteEvents(self):
+
+    def _summary_computed():
+      with tf.Graph().as_default():
+        sv = tf.train.Supervisor(is_chief=False)
+        sess = sv.prepare_or_wait_for_session("")
+        summ = tf.scalar_summary(["c1", "c2"], tf.constant([1.0, 2.0]))
+        sv.summary_computed(sess, sess.run(summ))
+
+    def _start_standard_services():
+      with tf.Graph().as_default():
+        sv = tf.train.Supervisor(is_chief=False)
+        sess = sv.prepare_or_wait_for_session("")
+        sv.start_standard_services(sess)
+
+    self.assertRaises(RuntimeError, _summary_computed)
+    self.assertRaises(RuntimeError, _start_standard_services)

  def testNoLogdirButWantSummary(self):
    with tf.Graph().as_default():
      const = tf.constant([1.0, 2.0, 3.0])
      summ = tf.scalar_summary(["c1", "c2", "c3"], const)
      sv = tf.train.Supervisor(logdir="", summary_op=None)
      sess = sv.prepare_or_wait_for_session("")
-      with self.assertRaisesRegexp(RuntimeError, "requires a logdir"):
+      with self.assertRaisesRegexp(RuntimeError, "requires a summary writer"):
        sv.summary_computed(sess, sess.run(summ))

  def testNoLogdirSucceeds(self):
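For reference, the chief/non-chief split these tests pin down can be sketched as follows (a minimal sketch against the 0.8-era API shown in this diff; the logdir path is illustrative, and the empty master string selects an in-process session, as in the tests above):

```python
import tensorflow as tf

# Chief task: owns the summary writer, so it may write events.
with tf.Graph().as_default():
  sv = tf.train.Supervisor(is_chief=True, logdir="/tmp/train_logs",
                           summary_op=None)
  sess = sv.prepare_or_wait_for_session("")
  sv.start_standard_services(sess)  # OK on the chief.
  sess.close()
  sv.stop()

# Non-chief task: event-writing entry points now raise RuntimeError.
with tf.Graph().as_default():
  sv = tf.train.Supervisor(is_chief=False)
  sess = sv.prepare_or_wait_for_session("")
  # sv.start_standard_services(sess)   # raises RuntimeError
  # sv.summary_computed(sess, summ)    # raises RuntimeError
  sess.close()
```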
@@ -12,9 +12,10 @@ filegroup(
    srcs = [
        "dist/index.html",
        "dist/tf-tensorboard.html",
-       "//tensorflow/tensorboard/bower:bower",
        "TAG",
-    ] + glob(["lib/**/*"]),
+       "//tensorflow/tensorboard/bower:bower",
+       "//tensorflow/tensorboard/lib:all_files",
+    ],
)

py_binary(
@@ -5,7 +5,7 @@ TensorFlow runs and graphs. TensorBoard currently supports four visualizations:
scalars, images, histograms, and the graph.

You can play with an interactive demo TensorBoard at
-[tensorflow.org/tensorboard/cifar.html](https://www.tensorflow.org/tensorboard/cifar.html).
+[tensorflow.org/tensorboard](https://www.tensorflow.org/tensorboard).

This README gives an overview of key concepts in TensorBoard, as well as how to
interpret the visualizations TensorBoard provides. For an in-depth example of
tensorflow/tools/benchmark/BUILD (new file, 66 lines)
@@ -0,0 +1,66 @@
# Description:
#   Benchmark utility that can run on desktop and Android.

package(default_visibility = ["//visibility:public"])

licenses(["notice"])  # Apache 2.0

load("//tensorflow:tensorflow.bzl", "tf_copts")

exports_files(["LICENSE"])

cc_library(
    name = "benchmark_model_lib",
    srcs = [
        "benchmark_model.cc",
    ],
    copts = tf_copts(),
    visibility = ["//visibility:public"],
    deps = select({
        "//tensorflow:android": [
            "//tensorflow/core:android_tensorflow_lib",
        ],
        "//conditions:default": [
            "//tensorflow/core:core_cpu",
            "//tensorflow/core:lib",
            "//tensorflow/core:framework",
            "//tensorflow/core:framework_internal",
            "//tensorflow/core:protos_all_cc",
            "//tensorflow/core:tensorflow",
        ],
    }),
)

# This binary may be built for either desktop or Android.
# A typical Android build command will look like the following:
# bazel build -c opt tensorflow/tools/benchmark:benchmark_model \
#   --crosstool_top=//external:android/crosstool \
#   --cpu=armeabi-v7a \
#   --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
#
# NOTE: currently '-pthread' must be removed from the LINK_OPTS variable
# in google/protobuf/BUILD to successfully build for Android. This is temporary
# pending an update of the version of the protobuf library that TensorFlow
# uses.
cc_binary(
    name = "benchmark_model",
    copts = tf_copts(),
    linkopts = select({
        "//tensorflow:android": [
            "-pie",
            "-s",
            "-landroid",
            "-ljnigraphics",
            "-llog",
            "-lm",
            "-z defs",
            "-s",
            "-Wl,--icf=all",  # Identical Code Folding
            "-Wl,--exclude-libs,ALL",  # Exclude syms in all libs from auto export
        ],
        "//conditions:default": [],
    }),
    linkstatic = 1,
    visibility = ["//visibility:public"],
    deps = [":benchmark_model_lib"],
)
tensorflow/tools/benchmark/README.md (new file, 57 lines)
@@ -0,0 +1,57 @@
# TensorFlow Model Benchmark Tool

## Description

A simple C++ binary to benchmark a compute graph and its individual operators,
both on desktop machines and on Android.

## To build/install/run

### On Android:

(1) Build for your specific platform, e.g.:
```bash
$ bazel build -c opt \
  --crosstool_top=//external:android/crosstool \
  --cpu=armeabi-v7a \
  --host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
  tensorflow/tools/benchmark:benchmark_model
```

(2) Connect your phone. Push the binary to your phone with adb push
(make the directory if required):
```bash
$ adb push bazel-bin/tensorflow/tools/benchmark/benchmark_model /data/local/tmp
```

(3) Push the compute graph that you need to test. For example:
```bash
$ adb push tensorflow_inception_graph.pb /data/local/tmp
```

(4) Run the benchmark. For example:
```bash
$ adb shell "/data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/tensorflow_inception_graph.pb \
  --input_layer=input:0 \
  --input_layer_shape=1,224,224,3 \
  --input_layer_type=float \
  --output_layer=output:0"
```

### On desktop:
(1) Build the binary:
```bash
$ bazel build -c opt tensorflow/tools/benchmark:benchmark_model
```

(2) Run on your compute graph, similar to the Android case but without the need
for adb shell. For example:
```bash
$ bazel-bin/tensorflow/tools/benchmark/benchmark_model \
  --graph=tensorflow_inception_graph.pb \
  --input_layer="input:0" \
  --input_layer_shape="1,224,224,3" \
  --input_layer_type="float" \
  --output_layer="output:0"
```

The Inception graph used as an example here may be downloaded from
https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip
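Beyond the flags shown above, `benchmark_model.cc` (below) also defines `--num_runs` (default 50), `--run_delay` (seconds to sleep between runs; negative values disable the delay, default -1.0) and `--num_threads` (intra-op parallelism; -1 keeps TensorFlow's default). For example, to average over 100 runs with a one-second pause between them:

```bash
$ bazel-bin/tensorflow/tools/benchmark/benchmark_model \
  --graph=tensorflow_inception_graph.pb \
  --num_runs=100 \
  --run_delay=1.0
```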
tensorflow/tools/benchmark/benchmark_model.cc (new file, 225 lines)
@@ -0,0 +1,225 @@
/* Copyright 2016 Google Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/

// A C++ binary to benchmark a compute graph and its individual operators,
// both on desktop machines and on Android.
//
// See README.md for usage instructions.

#include <cstdlib>
#include <time.h>  // for nanosleep() and timespec
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/graph/algorithm.h"
#include "tensorflow/core/graph/graph.h"
#include "tensorflow/core/graph/graph_constructor.h"
#include "tensorflow/core/lib/strings/str_util.h"
#include "tensorflow/core/lib/strings/strcat.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/platform/init_main.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/util/command_line_flags.h"
#include "tensorflow/core/util/stat_summarizer.h"

namespace tensorflow {

// Global variable that holds the TensorFlow session.
static std::unique_ptr<tensorflow::Session> session;

static StatSummarizer g_stats;
struct Flags {
  string graph = "/data/local/tmp/tensorflow_inception_graph.pb";
  string input_layer = "input:0";
  string input_layer_shape = "1,224,224,3";
  string input_layer_type = "float";
  string output_layer = "output:0";
  int num_runs = 50;
  string run_delay = "-1.0";
  int num_threads = -1;
};

static Flags* flags;  // Filled in by main()
static bool InitializeBenchmark() {
  g_stats.Reset();

  LOG(INFO) << "Loading Tensorflow.";

  tensorflow::SessionOptions options;
  tensorflow::ConfigProto& config = options.config;
  if (flags->num_threads > 0) {
    config.set_intra_op_parallelism_threads(flags->num_threads);
  }
  LOG(INFO) << "Got config, " << config.device_count_size() << " devices";

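  // Create the session and load the GraphDef specified by --graph.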
  session.reset(tensorflow::NewSession(options));
  tensorflow::GraphDef tensorflow_graph;
  Status s = ReadBinaryProto(Env::Default(), flags->graph, &tensorflow_graph);
  if (!s.ok()) {
    LOG(ERROR) << "Could not create Tensorflow Graph: " << s;
    return false;
  }

  s = session->Create(tensorflow_graph);
  if (!s.ok()) {
    LOG(ERROR) << "Could not create Tensorflow Session: " << s;
    return false;
  }

  // Clear the proto to save memory space.
  tensorflow_graph.Clear();
  return true;
}

static bool RunBenchmark() {
  DataType input_data_type;
  CHECK(DataTypeFromString(flags->input_layer_type, &input_data_type))
      << flags->input_layer_type << " was an invalid type";

  std::vector<int32> sizes;
  CHECK(str_util::SplitAndParseAsInts(flags->input_layer_shape, ',', &sizes))
      << "Incorrect size string specified: " << flags->input_layer_shape;
  TensorShape input_shape;
  for (int i = 0; i < sizes.size(); ++i) {
    input_shape.AddDim(sizes[i]);
  }

  Tensor input_tensor(input_data_type, input_shape);

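  // Fill the input tensor with zeros of the requested element type.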
  switch (input_data_type) {
    case DT_INT32: {
      auto int_tensor = input_tensor.flat<int32>();
      int_tensor = int_tensor.constant(0.0);
      break;
    }
    case DT_FLOAT: {
      auto float_tensor = input_tensor.flat<float>();
      float_tensor = float_tensor.constant(0.0);
      break;
    }
    case DT_QUINT8: {
      auto int_tensor = input_tensor.flat<quint8>();
      int_tensor = int_tensor.constant(0.0);
      break;
    }
    default:
      LOG(FATAL) << "Unsupported input type: " << flags->input_layer_type;
  }

  std::vector<std::pair<string, tensorflow::Tensor> > input_tensors(
      {{flags->input_layer, input_tensor}});

  std::vector<tensorflow::Tensor> output_tensors;
  std::vector<string> output_names({flags->output_layer});

  tensorflow::Status s;

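  // Request full tracing so that per-node timing stats are recorded in the
  // RunMetadata returned by Session::Run.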
  RunOptions run_options;
  run_options.set_trace_level(RunOptions::FULL_TRACE);
  RunMetadata run_metadata;

  s = session->Run(run_options, input_tensors, output_names, {},
                   &output_tensors, &run_metadata);

  assert(run_metadata.has_step_stats());

  const StepStats& stats = run_metadata.step_stats();

  g_stats.ProcessStepStats(stats);

  if (!s.ok()) {
    LOG(ERROR) << "Error during inference: " << s;
    return false;
  }
  return true;
}

}  // namespace tensorflow

int main(int argc, char** argv) {
  tensorflow::flags = new tensorflow::Flags();

  const bool parse_result = tensorflow::ParseFlags(
      &argc, argv,
      {
          tensorflow::Flag("graph", &tensorflow::flags->graph),
          tensorflow::Flag("input_layer", &tensorflow::flags->input_layer),
          tensorflow::Flag("input_layer_shape",
                           &tensorflow::flags->input_layer_shape),
          tensorflow::Flag("input_layer_type",
                           &tensorflow::flags->input_layer_type),
          tensorflow::Flag("output_layer", &tensorflow::flags->output_layer),
          tensorflow::Flag("num_runs", &tensorflow::flags->num_runs),
          tensorflow::Flag("run_delay", &tensorflow::flags->run_delay),
          tensorflow::Flag("num_threads", &tensorflow::flags->num_threads),
      });

  if (!parse_result) {
    LOG(ERROR) << "Error parsing command-line flags.";
    return -1;
  }

  ::tensorflow::port::InitMain(argv[0], &argc, &argv);
  if (argc > 1) {
    LOG(ERROR) << "Unknown argument " << argv[1];
    return -1;
  }

  LOG(INFO) << "Graph: [" << tensorflow::flags->graph << "]";
  LOG(INFO) << "Input layer: [" << tensorflow::flags->input_layer << "]";
  LOG(INFO) << "Input shape: [" << tensorflow::flags->input_layer_shape << "]";
  LOG(INFO) << "Input type: [" << tensorflow::flags->input_layer_type << "]";
  LOG(INFO) << "Output layer: [" << tensorflow::flags->output_layer << "]";
  LOG(INFO) << "Num runs: [" << tensorflow::flags->num_runs << "]";
  LOG(INFO) << "Inter-run delay (seconds): [" << tensorflow::flags->run_delay
            << "]";
  LOG(INFO) << "Num threads: [" << tensorflow::flags->num_threads << "]";

  if (!tensorflow::InitializeBenchmark()) {
    return -1;
  }

  // Convert the run_delay string into a timespec.
  const double sleep_seconds =
      std::strtod(tensorflow::flags->run_delay.c_str(), nullptr);
  timespec req;
  req.tv_sec = static_cast<time_t>(sleep_seconds);
  req.tv_nsec = (sleep_seconds - req.tv_sec) * 1000000000;

  LOG(INFO) << "Running benchmark";
  for (int i = 0; i < tensorflow::flags->num_runs; ++i) {
    if (!tensorflow::RunBenchmark()) {
      LOG(INFO) << "Failed on run " << i;
      return -1;
    }

    // If requested, sleep between runs for an arbitrary amount of time.
    // This can be helpful to determine the effect of mobile processor
    // scaling and thermal throttling.
    if (sleep_seconds > 0.0) {
      nanosleep(&req, nullptr);
    }
  }

  tensorflow::g_stats.PrintStepStats();
  return 0;
}

@@ -139,6 +139,16 @@ else
  # Assume: PYTHON_BIN_PATH is exported by the script above
fi

+# Obtain the path to head/ghead binary (for log file printing)
+HEAD_BIN="ghead"
+if [[ -z $(which "${HEAD_BIN}") ]]; then
+  # This is not Mac (which uses coreutils/ghead), use head.
+  HEAD_BIN="head"
+  if [[ -z $(which "${HEAD_BIN}") ]]; then
+    die "Unable to obtain path to head or ghead"
+  fi
+fi
+
if [[ -z "${PYTHON_BIN_PATH}" ]]; then
  die "PYTHON_BIN_PATH was not provided. If this is not virtualenv, "\
"did you run configure?"
@@ -371,7 +381,7 @@ while true; do

    echo " Log @: ${TEST_LOGS[K]}"
    echo "============== BEGINS failure log content =============="
-    head --lines=-1 "${TEST_LOGS[K]}"
+    "${HEAD_BIN}" --lines=-1 "${TEST_LOGS[K]}"
    echo "============== ENDS failure log content =============="
    echo ""
  fi
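(Note: `--lines=-1` is a GNU `head` extension that prints everything except the last line of the file; the BSD `head` shipped with macOS lacks it, which is why the script resolves coreutils' `ghead` first and falls back to the system `head`, which is GNU on Linux.)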
@@ -21,7 +21,7 @@ RUN /var/gcloud/google-cloud-sdk/bin/gcloud components install kubectl
# Install nightly TensorFlow pip
# TODO(cais): Should we build it locally instead?
RUN pip install \
-    http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl
+    http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0-cp27-none-linux_x86_64.whl

# Copy test files
COPY scripts /var/tf-dist-test/scripts
@@ -36,7 +36,7 @@ RUN curl -O https://bootstrap.pypa.io/get-pip.py && \

# Install TensorFlow CPU version from nightly build
RUN pip --no-cache-dir install \
-    http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl
+    http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0-cp27-none-linux_x86_64.whl

# Copy files, including the GRPC server binary at
# server/grpc_tensorflow_server.py
@@ -38,7 +38,7 @@ RUN curl -O https://bootstrap.pypa.io/get-pip.py && \

# Install TensorFlow CPU version.
RUN pip --no-cache-dir install \
-    http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl
+    http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0-cp27-none-linux_x86_64.whl

# Copy files, including the GRPC server binary at
# server/grpc_tensorflow_server.py
@@ -29,7 +29,7 @@ RUN pip --no-cache-dir install \
    python -m ipykernel.kernelspec

# Install TensorFlow CPU version.
-ENV TENSORFLOW_VERSION 0.8.0rc0
+ENV TENSORFLOW_VERSION 0.8.0
RUN pip --no-cache-dir install \
    http://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl

@@ -29,7 +29,7 @@ RUN pip --no-cache-dir install \
    python -m ipykernel.kernelspec

# Install TensorFlow GPU version.
-ENV TENSORFLOW_VERSION 0.8.0rc0
+ENV TENSORFLOW_VERSION 0.8.0
RUN pip --no-cache-dir install \
    http://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-${TENSORFLOW_VERSION}-cp27-none-linux_x86_64.whl

@@ -27,7 +27,7 @@ from setuptools import find_packages, setup, Command, Extension
from setuptools.command.install import install as InstallCommandBase
from setuptools.dist import Distribution

-_VERSION = '0.8.0rc0'
+_VERSION = '0.8.0'

numpy_version = "1.8.2"
if platform.system() == "Darwin":

@@ -183,6 +183,7 @@ setup(
        'tensorboard/dist/index.html',
        'tensorboard/dist/tf-tensorboard.html',
+       'tensorboard/lib/css/global.css',
        'tensorboard/TAG',
    ] + matches,
    },
    zip_safe=False,