Automated g4 rollback of changelist 190835392

PiperOrigin-RevId: 190858242
2018-03-28 16:52:39 -07:00 · 108178da2a
commit 108178da2a
parent 390e19ab99
116 changed files with 556 additions and 1703 deletions
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -1,63 +1,3 @@
-# Release 1.7.0
-
-## Major Features And Improvements
-* Eager mode is moving out of contrib, try `tf.enable_eager_execution()`.
-* Graph rewrites emulating fixed-point quantization compatible with TensorFlow Lite, supported by new `tf.contrib.quantize` package.
-* Easily customize gradient computation with `tf.custom_gradient`.
-* [TensorBoard Debugger Plugin](https://github.com/tensorflow/tensorboard/blob/master/tensorboard/plugins/debugger/README.md), the graphical user interface (GUI) of TensorFlow Debugger (tfdbg), is now in alpha.
-* Experimental support for reading a sqlite database as a `Dataset` with new `tf.contrib.data.SqlDataset`.
-* Distributed Mutex / CriticalSection added to `tf.contrib.framework.CriticalSection`.
-* Better text processing with `tf.regex_replace`.
-* Easy, efficient sequence input with `tf.contrib.data.bucket_by_sequence_length`
-
-## Bug Fixes and Other Changes
-* Accelerated Linear Algebra (XLA):
-  * Add `MaxPoolGradGrad` support for XLA
-  * CSE pass from Tensorflow is now disabled in XLA.
-* `tf.data`:
-  * `tf.data.Dataset`
-    * Add support for building C++ Dataset op kernels as external libraries, using the `tf.load_op_library()` mechanism.
-    * `Dataset.list_files()` now shuffles its output by default.
-    * `Dataset.shuffle(..., seed=tf.constant(0, dtype=tf.int64))` now yields the same sequence of elements as `Dataset.shuffle(..., seed=0)`.
-  * Add `num_parallel_reads` argument to `tf.data.TFRecordDataset`.
-* `tf.contrib`:
-  * `tf.contrib.bayesflow.halton_sequence` now supports randomization.
-  * Add support for scalars in `tf.contrib.all_reduce`.
-  * Add `effective_sample_size` to `tf.contrib.bayesflow.mcmc_diagnostics`.
-  * Add `potential_scale_reduction` to `tf.contrib.bayesflow.mcmc_diagnostics`.
-  * Add `BatchNormalization`, `Kumaraswamy` bijectors.
-  * Deprecate `tf.contrib.learn`. Please check contrib/learn/README.md for instructions on how to convert existing code.
-  * `tf.contrib.data`
-    * Remove deprecated `tf.contrib.data.Dataset`, `tf.contrib.data.Iterator`, `tf.contrib.data.FixedLengthRecordDataset`, `tf.contrib.data.TextLineDataset`, and `tf.contrib.data.TFRecordDataset` classes.
-    * Added `bucket_by_sequence_length`, `sliding_window_batch`, and `make_batched_features_dataset`
-  * Remove unmaintained `tf.contrib.ndlstm`. You can find it externally at https://github.com/tmbarchive/tfndlstm.
-  * Moved most of `tf.contrib.bayesflow` to its own repo: `tfp`
-* Other:
-  * tf.py_func now reports the full stack trace if an exception occurs.
-  * Integrate `TPUClusterResolver` with GKE's integration for Cloud TPUs.
-  * Add a library for statistical testing of samplers.
-  * Add Helpers to stream data from the GCE VM to a Cloud TPU.
-  * Integrate ClusterResolvers with TPUEstimator.
-  * Unify metropolis_hastings interface with HMC kernel.
-  * Move LIBXSMM convolutions to a separate --define flag so that they are disabled by default.
-  * Fix `MomentumOptimizer` lambda.
-  * Reduce `tfp.layers` boilerplate via programmable docstrings.
-  * Add `auc_with_confidence_intervals`, a method for computing the AUC and confidence interval with linearithmic time complexity.
-  * `regression_head` now accepts customized link function, to satisfy the usage that user can define their own link function if the `array_ops.identity` does not meet the requirement.
-  * Fix `initialized_value` and `initial_value` behaviors for `ResourceVariables` created from `VariableDef` protos.
-  * Add TensorSpec to represent the specification of Tensors.
-  * Constant folding pass is now deterministic.
-  * Support `float16` `dtype` in `tf.linalg.*`.
-  * Add `tf.estimator.export.TensorServingInputReceiver` that allows `tf.estimator.Estimator.export_savedmodel` to pass raw tensors to model functions.
-
-## Thanks to our Contributors
-
-This release contains contributions from many people at Google, as well as:
-
-4d55397500, Abe, Alistair Low, Andy Kernahan, Appledore, Ben, Ben Barsdell, Boris Pfahringer, Brad Wannow, Brett Koonce, Carl Thomé, cclauss, Chengzhi Chen, Chris Drake, Christopher Yeh, Clayne Robison, Codrut Grosu, Daniel Trebbien, Danny Goodman, David Goodwin, David Norman, Deron Eriksson, Donggeon Lim, Donny Viszneki, DosLin, DylanDmitri, Francisco Guerrero, Fred Reiss, gdh1995, Giuseppe, Glenn Weidner, gracehoney, Guozhong Zhuang, Haichen "Hc" Li, Harald Husum, harumitsu.nobuta, Henry Spivey, hsm207, Jekyll Song, Jerome, Jiongyan Zhang, jjsjann123, John Sungjin Park, Johnson145, JoshVarty, Julian Wolff, Jun Wang, June-One, Kamil Sindi, Kb Sriram, Kdavis-Mozilla, Kenji, lazypanda1, Liang-Chi Hsieh, Loo Rong Jie, Mahesh Bhosale, MandarJKulkarni, ManHyuk, Marcus Ong, Marshal Hayes, Martin Pool, matthieudelaro, mdfaijul, mholzel, Michael Zhou, Ming Li, Minmin Sun, Myungjoo Ham, MyungsungKwak, Naman Kamra, Peng Yu, Penghao Cen, Phil, Raghuraman-K, resec, Rohin Mohanadas, Sandeep N Gupta, Scott Tseng, seaotterman, Seo Sanghyeon, Sergei Lebedev, Ted Chang, terrytangyuan, Tim H, tkunic, Tod, vihanjain, Yan Facai (颜发才), Yin Li, Yong Tang, Yukun Chen, Yusuke Yamada
-
-
-
 # Release 1.6.0

 ## Breaking Changes
--- a/configure.py
+++ b/configure.py
@@ -1414,7 +1414,7 @@ def main():
  set_build_var(environ_cp, 'TF_NEED_S3', 'Amazon S3 File System',
                'with_s3_support', True, 's3')
  set_build_var(environ_cp, 'TF_NEED_KAFKA', 'Apache Kafka Platform',
-                'with_kafka_support', True, 'kafka')
+                'with_kafka_support', False, 'kafka')
  set_build_var(environ_cp, 'TF_ENABLE_XLA', 'XLA JIT', 'with_xla_support',
                False, 'xla')
  set_build_var(environ_cp, 'TF_NEED_GDR', 'GDR', 'with_gdr_support',
--- a/tensorflow/BUILD
+++ b/tensorflow/BUILD
@@ -240,13 +240,6 @@ config_setting(
    visibility = ["//visibility:public"],
 )

-config_setting(
-    name = "with_kafka_support_windows_override",
-    define_values = {"with_kafka_support": "true"},
-    values = {"cpu": "x64_windows"},
-    visibility = ["//visibility:public"],
-)
-
 config_setting(
    name = "with_gcp_support_android_override",
    define_values = {"with_gcp_support": "true"},
--- a/tensorflow/contrib/BUILD
+++ b/tensorflow/contrib/BUILD
@@ -51,6 +51,7 @@ py_library(
        "//tensorflow/contrib/image:single_image_random_dot_stereograms_py",
        "//tensorflow/contrib/input_pipeline:input_pipeline_py",
        "//tensorflow/contrib/integrate:integrate_py",
+        "//tensorflow/contrib/kafka",
        "//tensorflow/contrib/keras",
        "//tensorflow/contrib/kernel_methods",
        "//tensorflow/contrib/kfac",
@@ -109,13 +110,7 @@ py_library(
        "//tensorflow/python:util",
    ] + if_mpi(["//tensorflow/contrib/mpi_collectives:mpi_collectives_py"]) + if_tensorrt([
        "//tensorflow/contrib/tensorrt:init_py",
-    ]) + select({
-        "//tensorflow:with_kafka_support_windows_override": [],
-        "//tensorflow:with_kafka_support": [
-            "//tensorflow/contrib/kafka",
-        ],
-        "//conditions:default": [],
-    }),
+    ]),
 )

 cc_library(
@@ -125,6 +120,7 @@ cc_library(
        "//tensorflow/contrib/boosted_trees:boosted_trees_kernels",
        "//tensorflow/contrib/coder:all_kernels",
        "//tensorflow/contrib/data/kernels:dataset_kernels",
+        "//tensorflow/contrib/kafka:dataset_kernels",
        "//tensorflow/contrib/factorization/kernels:all_kernels",
        "//tensorflow/contrib/input_pipeline:input_pipeline_ops_kernels",
        "//tensorflow/contrib/layers:sparse_feature_cross_op_kernel",
@@ -137,13 +133,7 @@ cc_library(
        "//tensorflow/contrib/text:all_kernels",
    ] + if_mpi(["//tensorflow/contrib/mpi_collectives:mpi_collectives_py"]) + if_cuda([
        "//tensorflow/contrib/nccl:nccl_kernels",
-    ]) + select({
-        "//tensorflow:with_kafka_support_windows_override": [],
-        "//tensorflow:with_kafka_support": [
-            "//tensorflow/contrib/kafka:dataset_kernels",
-        ],
-        "//conditions:default": [],
-    }),
+    ]),
 )

 cc_library(
@@ -156,6 +146,7 @@ cc_library(
        "//tensorflow/contrib/factorization:all_ops",
        "//tensorflow/contrib/framework:all_ops",
        "//tensorflow/contrib/input_pipeline:input_pipeline_ops_op_lib",
+        "//tensorflow/contrib/kafka:dataset_ops_op_lib",
        "//tensorflow/contrib/layers:sparse_feature_cross_op_op_lib",
        "//tensorflow/contrib/nccl:nccl_ops_op_lib",
        "//tensorflow/contrib/nearest_neighbor:nearest_neighbor_ops_op_lib",
@@ -166,13 +157,7 @@ cc_library(
        "//tensorflow/contrib/tensor_forest:tensor_forest_ops_op_lib",
        "//tensorflow/contrib/text:all_ops",
        "//tensorflow/contrib/tpu:all_ops",
-    ] + select({
-        "//tensorflow:with_kafka_support_windows_override": [],
-        "//tensorflow:with_kafka_support": [
-            "//tensorflow/contrib/kafka:dataset_ops_op_lib",
-        ],
-        "//conditions:default": [],
-    }),
+    ],
 )

 filegroup(
--- a/tensorflow/contrib/boosted_trees/kernels/quantile_ops.cc
+++ b/tensorflow/contrib/boosted_trees/kernels/quantile_ops.cc
@@ -253,7 +253,7 @@ class CreateQuantileAccumulatorOp : public OpKernel {
 private:
  float epsilon_;
  int32 num_quantiles_;
-  // An upper bound on the number of entries that the summaries might have
+  // An upperbound on the number of enteries that the summaries might have
  // for a feature.
  int64 max_elements_;
  bool generate_quantiles_;
--- a/tensorflow/contrib/boosted_trees/lib/utils/batch_features.cc
+++ b/tensorflow/contrib/boosted_trees/lib/utils/batch_features.cc
@@ -54,7 +54,7 @@ Status BatchFeatures::Initialize(
    TF_CHECK_AND_RETURN_IF_ERROR(
        dense_float_feature.dim_size(1) == 1,
        errors::InvalidArgument(
-            "Dense float features may not be multivalent: dim_size(1) = ",
+            "Dense float features may not be multi-valent: dim_size(1) = ",
            dense_float_feature.dim_size(1)));
    dense_float_feature_columns_.emplace_back(dense_float_feature);
  }
--- a/tensorflow/contrib/boosted_trees/lib/utils/batch_features_test.cc
+++ b/tensorflow/contrib/boosted_trees/lib/utils/batch_features_test.cc
@@ -59,7 +59,7 @@ TEST_F(BatchFeaturesTest, DenseFloatFeatures_Multivalent) {
  BatchFeatures batch_features(1);
  auto dense_vec = AsTensor<float>({3.0f, 7.0f}, {1, 2});
  auto expected_error = InvalidArgument(
-      "Dense float features may not be multivalent: dim_size(1) = 2");
+      "Dense float features may not be multi-valent: dim_size(1) = 2");
  EXPECT_EQ(expected_error,
            batch_features.Initialize({dense_vec}, {}, {}, {}, {}, {}, {}));
 }
--- a/tensorflow/contrib/boosted_trees/lib/utils/dropout_utils.cc
+++ b/tensorflow/contrib/boosted_trees/lib/utils/dropout_utils.cc
@@ -54,7 +54,7 @@ Status DropoutUtils::DropOutTrees(
  if (probability_of_skipping_dropout < 0 ||
      probability_of_skipping_dropout > 1) {
    return errors::InvalidArgument(
-        "Probability of skipping dropout must be in [0,1] range");
+        "Probability of skiping dropout must be in [0,1] range");
  }
  const auto num_trees = weights.size();

--- a/tensorflow/contrib/boosted_trees/lib/utils/dropout_utils.h
+++ b/tensorflow/contrib/boosted_trees/lib/utils/dropout_utils.h
@@ -66,7 +66,7 @@ class DropoutUtils {
      // Current weights and num_updates will be updated as a result of this
      // func
      std::vector<float>* current_weights,
-      // How many weight assignments have been done for each tree already.
+      // How many weight assignements have been done for each tree already.
      std::vector<int32>* num_updates);
 };

--- a/tensorflow/contrib/boosted_trees/lib/utils/sparse_column_iterable_test.cc
+++ b/tensorflow/contrib/boosted_trees/lib/utils/sparse_column_iterable_test.cc
@@ -34,7 +34,7 @@ TEST_F(SparseColumnIterableTest, Empty) {
 }

 TEST_F(SparseColumnIterableTest, Iterate) {
-  // 8 examples having 7 sparse features with the 3rd and 7th multivalent.
+  // 8 examples having 7 sparse features with the 3rd and 7th multi-valent.
  // This can be visualized like the following:
  // Instance | Sparse |
  // 0        |  x     |
--- a/tensorflow/contrib/boosted_trees/proto/tree_config.proto
+++ b/tensorflow/contrib/boosted_trees/proto/tree_config.proto
@@ -53,7 +53,7 @@ message DenseFloatBinarySplit {
  // Float feature column and split threshold describing
  // the rule feature <= threshold.
  int32 feature_column = 1;
-  // If feature column is multivalent, this holds the index of the dimension
+  // If feature column is multivalent, this holds the index of the dimensiong
  // for the split. Defaults to 0.
  int32 dimension_id = 5;
  float threshold = 2;
--- a/tensorflow/contrib/boosted_trees/python/kernel_tests/prediction_ops_test.py
+++ b/tensorflow/contrib/boosted_trees/python/kernel_tests/prediction_ops_test.py
@@ -120,8 +120,8 @@ class PredictionOpsTest(test_util.TensorFlowTestCase):
    """Sets up the prediction tests.

    Create a batch of two examples having one dense float, two sparse float
-    single valued, one sparse float multidimensional and one sparse int
-    features.  The data looks like the following:
+    single valued, one sparse float multidimensionl and one sparse int features.
+    The data looks like the following:
    | Instance | Dense0 | SparseF0 | SparseF1 | SparseI0 | SparseM
    | 0        |  7     |    -3    |          |    9,1   | __, 5.0
    | 1        | -2     |          | 4        |          |  3, ___
@@ -810,7 +810,7 @@ class PredictionOpsTest(test_util.TensorFlowTestCase):
    # building. This tree should never be dropped.
    num_trees = 10
    with self.test_session():
-      # Empty tree ensemble.
+      # Empty tree ensenble.
      tree_ensemble_config = tree_config_pb2.DecisionTreeEnsembleConfig()
      # Add 10 trees with some weights.
      for i in range(0, num_trees):
@@ -951,7 +951,7 @@ class PredictionOpsTest(test_util.TensorFlowTestCase):

  def testDropOutZeroProb(self):
    with self.test_session():
-      # Empty tree ensemble.
+      # Empty tree ensenble.
      tree_ensemble_config = tree_config_pb2.DecisionTreeEnsembleConfig()
      # Add 1000 trees with some weights.
      for i in range(0, 999):
@@ -994,7 +994,7 @@ class PredictionOpsTest(test_util.TensorFlowTestCase):

  def testAveragingAllTrees(self):
    with self.test_session():
-      # Empty tree ensemble.
+      # Empty tree ensenble.
      tree_ensemble_config = tree_config_pb2.DecisionTreeEnsembleConfig()
      adjusted_tree_ensemble_config = (
          tree_config_pb2.DecisionTreeEnsembleConfig())
--- a/tensorflow/contrib/boosted_trees/python/kernel_tests/quantile_ops_test.py
+++ b/tensorflow/contrib/boosted_trees/python/kernel_tests/quantile_ops_test.py
@@ -482,7 +482,7 @@ class QuantilesOpTest(test_util.TensorFlowTestCase):
    """Sets up the quantile op tests.

    Create a batch of 4 examples having 2 dense and 4 sparse features.
-    Fourth sparse feature is multivalent (3 dimensional)
+    Forth sparse feature is multivalent (3 dimensional)
    The data looks like this
    | Instance | Dense 0 | Dense 1 | Sparse 0 | Sparse 1 |Sparse 2| SparseM
    | 0        |   -0.1  |  -1     |   -2     |   0.1    |        |_ ,1,_
--- a/tensorflow/contrib/boosted_trees/python/ops/quantile_ops.py
+++ b/tensorflow/contrib/boosted_trees/python/ops/quantile_ops.py
@@ -184,7 +184,7 @@ class QuantileAccumulator(saver.BaseSaverBuilder.SaveableObject):
    """Finalizes quantile summary stream and resets it for next iteration.

    Args:
-      stamp_token: Expected current token.
+      stamp_token: Exepcted current token.
      next_stamp_token: Next value for the token.
    Returns:
      A list of quantiles or approximate boundaries.
--- a/tensorflow/contrib/cmake/tf_tests.cmake
+++ b/tensorflow/contrib/cmake/tf_tests.cmake
@@ -210,9 +210,6 @@ if (tensorflow_BUILD_PYTHON_TESTS)
    "${tensorflow_source_dir}/tensorflow/contrib/learn/python/learn/learn_io/graph_io_test.py"
    # Test is flaky on Windows GPU builds (b/38283730).
    "${tensorflow_source_dir}/tensorflow/contrib/factorization/python/ops/gmm_test.py"
-    # Disable following manual tag in BUILD.
-    "${tensorflow_source_dir}/tensorflow/python/keras/_impl/keras/layers/convolutional_test.py"
-
  )
  if (WIN32)
    set(tf_test_src_py_exclude
--- a/tensorflow/contrib/data/python/kernel_tests/batch_dataset_op_test.py
+++ b/tensorflow/contrib/data/python/kernel_tests/batch_dataset_op_test.py
@@ -413,20 +413,6 @@ class BatchDatasetTest(test.TestCase):
  def testMapAndBatchPartialBatchDropRemainder(self):
    return self._testMapAndBatchPartialBatchHelper(drop_remainder=True)

-  def testMapAndBatchYieldsPartialBatch(self):
-    iterator = (dataset_ops.Dataset.range(10)
-                .apply(batching.map_and_batch(
-                    lambda x: array_ops.reshape(x * x, [1]), 4))
-                .make_one_shot_iterator())
-    self.assertEqual([None, 1], iterator.output_shapes.as_list())
-    next_element = iterator.get_next()
-    with self.test_session() as sess:
-      self.assertAllEqual([[0], [1], [4], [9]], sess.run(next_element))
-      self.assertAllEqual([[16], [25], [36], [49]], sess.run(next_element))
-      self.assertAllEqual([[64], [81]], sess.run(next_element))
-      with self.assertRaises(errors.OutOfRangeError):
-        sess.run(next_element)
-
  def testMapAndBatchSparse(self):

    def _sparse(i):
--- a/tensorflow/contrib/eager/python/BUILD
+++ b/tensorflow/contrib/eager/python/BUILD
@@ -270,11 +270,7 @@ cuda_py_test(
        "//tensorflow/python/eager:test",
        "//tensorflow/python/keras",
    ],
-    tags = [
-        "no_oss",  # b/74395663
-        "no_windows",  # TODO: needs investigation on Windows
-        "notsan",
-    ],
+    tags = ["notsan"],
 )

 filegroup(
--- a/tensorflow/contrib/eager/python/examples/spinn/spinn_test.py
+++ b/tensorflow/contrib/eager/python/examples/spinn/spinn_test.py
@@ -418,6 +418,7 @@ class SpinnTest(test_util.TensorFlowTestCase):
                    if event.summary.value
                    and event.summary.value[0].tag == "train/loss"]
    self.assertEqual(config.epochs, len(train_losses))
+    self.assertLess(train_losses[-1], train_losses[0])

    # 5. Verify that checkpoints exist and contains all the expected variables.
    self.assertTrue(glob.glob(os.path.join(config.logdir, "ckpt*")))
--- a/tensorflow/contrib/estimator/python/estimator/replicate_model_fn.py
+++ b/tensorflow/contrib/estimator/python/estimator/replicate_model_fn.py
@@ -136,7 +136,7 @@ def replicate_model_fn(model_fn,
      the train_op argument of `EstimatorSpec`.
    loss_reduction: controls whether losses are summed or averaged.
    devices: Optional list of devices to replicate the model across.  This
-      argument can be used to replicate only on the subset of available GPUs.
+      argument can be used to replice only on the subset of available GPUs.
      If `None`, then all available GPUs are going to be used for replication.
      If no GPUs are available, then the model is going to be placed on the CPU.

--- a/tensorflow/contrib/factorization/kernels/clustering_ops.cc
+++ b/tensorflow/contrib/factorization/kernels/clustering_ops.cc
@@ -353,7 +353,7 @@ class NearestNeighborsOp : public OpKernel {
    auto worker_threads = *(context->device()->tensorflow_cpu_worker_threads());
    const int64 num_threads = worker_threads.num_threads;
    // This kernel might be configured to use fewer than the total number of
-    // available CPUs on the host machine. To avoid destructive interference
+    // available CPUs on the host machine. To avoid descructive interference
    // with other jobs running on the host machine, we must only use a fraction
    // of total available L3 cache. Unfortunately, we cannot query the host
    // machine to get the number of physical CPUs. So, we use a fixed per-CPU
--- a/tensorflow/contrib/factorization/python/ops/factorization_ops.py
+++ b/tensorflow/contrib/factorization/python/ops/factorization_ops.py
@@ -106,7 +106,7 @@ class WALSModel(object):
      # the prep_gramian_op for row(column) can be run.
      worker_init_op = model.worker_init

-      # To be run once per integration sweep before the row(column) update
+      # To be run once per interation sweep before the row(column) update
      # initialize ops can be run. Note that in the distributed training
      # situations, this should only be run by the chief trainer. All other
      # trainers need to block until this is done.
@@ -118,9 +118,9 @@ class WALSModel(object):
      init_row_update_op = model.initialize_row_update_op
      init_col_update_op = model.initialize_col_update_op

-      # Ops to update row(column). This can either take the entire sparse
-      # tensor or slices of sparse tensor. For distributed trainer, each
-      # trainer handles just part of the matrix.
+      # Ops to upate row(column). This can either take the entire sparse tensor
+      # or slices of sparse tensor. For distributed trainer, each trainer
+      # handles just part of the matrix.
      _, row_update_op, unreg_row_loss, row_reg, _ = model.update_row_factors(
           sp_input=matrix_slices_from_queue_for_worker_shard)
      row_loss = unreg_row_loss + row_reg
@@ -220,7 +220,7 @@ class WALSModel(object):
        in the form of [[w_0, w_1, ...], [w_k, ... ], [...]], with the number of
        inner lists matching the number of row factor shards and the elements in
        each inner list are the weights for the rows of the corresponding row
-        factor shard. In this case,  w_ij = unobserved_weight +
+        factor shard. In this case,  w_ij = unonbserved_weight +
                                            row_weights[i] * col_weights[j].
        - If this is a single non-negative real number, this value is used for
        all row weights and w_ij = unobserved_weight + row_weights *
@@ -435,7 +435,7 @@ class WALSModel(object):
      gramian: Variable storing the gramian calculated from the factors.

    Returns:
-      A op that updates the gramian with the calculated value from the factors.
+      A op that updates the gramian with the calcuated value from the factors.
    """
    partial_gramians = []
    for f in factors:
@@ -564,7 +564,7 @@ class WALSModel(object):

    Note that specifically this initializes the cache of the row and column
    weights on workers when `use_factors_weights_cache` is True. In this case,
-    if these weights are being calculated and reset after the object is created,
+    if these weights are being calcualted and reset after the object is created,
    it is important to ensure this ops is run afterwards so the cache reflects
    the correct values.
    """
--- a/tensorflow/contrib/factorization/python/ops/factorization_ops_test.py
+++ b/tensorflow/contrib/factorization/python/ops/factorization_ops_test.py
@@ -210,7 +210,7 @@ class WalsModelTest(test.TestCase):

      # Test row projection.
      # Using the specified projection weights for the 2 row feature vectors.
-      # This is expected to reproduce the same row factors in the model as the
+      # This is expected to reprodue the same row factors in the model as the
      # weights and feature vectors are identical to that used in model
      # training.
      projected_rows = wals_model.project_row_factors(
@@ -283,8 +283,8 @@ class WalsModelTest(test.TestCase):

      # Test column projection.
      # Using the specified projection weights for the 3 column feature vectors.
-      # This is expected to reproduce the same column factors in the model as
-      # the weights and feature vectors are identical to that used in model
+      # This is expected to reprodue the same column factors in the model as the
+      # weights and feature vectors are identical to that used in model
      # training.
      projected_cols = wals_model.project_col_factors(
          sp_input=sp_feeder,
@@ -385,7 +385,7 @@ class WalsModelTest(test.TestCase):

      # Test row projection.
      # Using the specified projection weights for the 2 row feature vectors.
-      # This is expected to reproduce the same row factors in the model as the
+      # This is expected to reprodue the same row factors in the model as the
      # weights and feature vectors are identical to that used in model
      # training.
      projected_rows = wals_model.project_row_factors(
@@ -462,8 +462,8 @@ class WalsModelTest(test.TestCase):

      # Test column projection.
      # Using the specified projection weights for the 2 column feature vectors.
-      # This is expected to reproduce the same column factors in the model as
-      # the weights and feature vectors are identical to that used in model
+      # This is expected to reprodue the same column factors in the model as the
+      # weights and feature vectors are identical to that used in model
      # training.
      projected_cols = wals_model.project_col_factors(
          sp_input=sp_feeder,
--- a/tensorflow/contrib/factorization/python/ops/gmm_ops.py
+++ b/tensorflow/contrib/factorization/python/ops/gmm_ops.py
@@ -280,7 +280,7 @@ class GmmAlgorithm(object):
    self._define_score_samples()

  def _define_full_covariance_probs(self, shard_id, shard):
-    """Defines the full covariance probabilities per example in a class.
+    """Defines the full covariance probabilties per example in a class.

    Updates a matrix with dimension num_examples X num_classes.

@@ -344,7 +344,7 @@ class GmmAlgorithm(object):
  def _define_prior_log_prob_operation(self, shard_id):
    """Computes the prior probability of all samples.

-    Updates a vector where each item is the prior probability of an
+    Updates a vector where each item is the prior probabibility of an
    input example.

    Args:
--- a/tensorflow/contrib/factorization/python/ops/gmm_test.py
+++ b/tensorflow/contrib/factorization/python/ops/gmm_test.py
@@ -210,7 +210,7 @@ class GMMTestQueues(test.TestCase):
    return _fn

  # This test makes sure that there are no deadlocks when using a QueueRunner.
-  # Note that since cluster initialization is dependent on inputs, if input
+  # Note that since cluster initialization is dependendent on inputs, if input
  # is generated using a QueueRunner, one has to make sure that these runners
  # are started before the initialization.
  def test_queues(self):
--- a/tensorflow/contrib/factorization/python/ops/kmeans_test.py
+++ b/tensorflow/contrib/factorization/python/ops/kmeans_test.py
@@ -413,7 +413,7 @@ class KMeansCosineDistanceTest(KMeansTestBase):
    self.assertAllClose(score, self.true_score, atol=1e-2)

  def test_predict_kmeans_plus_plus(self):
-    # Most points are concentrated near one center. KMeans++ is likely to find
+    # Most points are concetrated near one center. KMeans++ is likely to find
    # the less populated centers.
    points = np.array(
        [[2.5, 3.5], [2.5, 3.5], [-2, 3], [-2, 3], [-3, -3], [-3.1, -3.2],
@@ -604,7 +604,7 @@ class KMeansTestQueues(test.TestCase):
    return _fn

  # This test makes sure that there are no deadlocks when using a QueueRunner.
-  # Note that since cluster initialization is dependent on inputs, if input
+  # Note that since cluster initialization is dependendent on inputs, if input
  # is generated using a QueueRunner, one has to make sure that these runners
  # are started before the initialization.
  def test_queues(self):
--- a/tensorflow/contrib/factorization/python/ops/wals.py
+++ b/tensorflow/contrib/factorization/python/ops/wals.py
@@ -235,7 +235,7 @@ def _wals_factorization_model_function(features, labels, mode, params):
        num_items: An integer, the total number of items of this axis.
        update_fn: A function that takes one argument (`sp_input`), and that
        returns a tuple of
-          * new_factors: A float Tensor of the factor values after update.
+          * new_factors: A flot Tensor of the factor values after update.
          * update_op: a TensorFlow op which updates the factors.
          * loss: A float Tensor, the unregularized loss.
          * reg_loss: A float Tensor, the regularization loss.
--- a/tensorflow/contrib/learn/BUILD
+++ b/tensorflow/contrib/learn/BUILD
@@ -226,7 +226,6 @@ py_test(
    size = "small",
    srcs = ["python/learn/monitors_test.py"],
    srcs_version = "PY2AND3",
-    tags = ["no_pip_gpu"],  # b/74437598
    deps = [
        ":learn",
        "//tensorflow/contrib/framework:framework_py",
--- a/tensorflow/contrib/learn/python/learn/estimators/linear.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/linear.py
@@ -243,8 +243,8 @@ def sdca_model_fn(features, labels, mode, params):

  parent_scope = "linear"

-  with variable_scope.variable_scope(
-      values=features.values(), name_or_scope=parent_scope) as scope:
+  with variable_scope.variable_op_scope(
+      features.values(), parent_scope) as scope:
    features = features.copy()
    features.update(layers.transform_features(features, feature_columns))
    logits, columns_to_variables, bias = (
--- a/tensorflow/contrib/linear_optimizer/python/sdca_estimator.py
+++ b/tensorflow/contrib/linear_optimizer/python/sdca_estimator.py
@@ -140,8 +140,8 @@ def sdca_model_fn(features, labels, mode, params, config=None):

  parent_scope = "linear"

-  with variable_scope.variable_scope(
-      values=features.values(), name_or_scope=parent_scope) as scope:
+  with variable_scope.variable_op_scope(features.values(),
+                                        parent_scope) as scope:
    features = features.copy()
    features.update(layers.transform_features(features, feature_columns))
    logits, columns_to_variables, bias = (
--- a/tensorflow/contrib/lite/README.md
+++ b/tensorflow/contrib/lite/README.md
@@ -126,9 +126,6 @@ The above pre-trained models have been trained on the ImageNet data set, which c

 The [TensorFlow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/) codelab walks through this process step-by-step. The retraining code supports retraining for both floating point and quantized inference.

-# Getting started with RaspberryPi
-
-Using RaspberryPi can be accomplished by following the [Makefile instructions](g3doc/rpi.md). That will give a you a static library (.a) that you can build your app against. Python bindings will be coming soon as well as a demo app.

 ### Train a custom model
 A developer may choose to train a custom model using Tensorflow. TensorFlow documentation has [several tutorials](https://www.tensorflow.org/tutorials/) for building and training models. If the user has written a model using TensorFlow's Slim Framework the first step is to export this to a GraphDef file. This is necessary because Slim does not store the model structure outside the code, so to communicate with other parts of the framework it needs to be exported. Documentation for the export can be found [here](https://github.com/tensorflow/models/tree/master/research/slim#Export). The output of this step will be a .pb file for the custom model.
--- a/tensorflow/contrib/lite/builtin_ops.h
+++ b/tensorflow/contrib/lite/builtin_ops.h
@@ -79,7 +79,6 @@ typedef enum {
  kTfLiteBuiltinBidirectionalSequenceLstm = 52,
  kTfLiteBuiltinCast = 53,
  kTfLiteBuiltinPrelu = 54,
-  kTfLiteBuiltinMaximum = 55,
 } TfLiteBuiltinOperator;

 #ifdef __cplusplus
--- a/tensorflow/contrib/lite/g3doc/models.md
+++ b/tensorflow/contrib/lite/g3doc/models.md
@@ -1,4 +1,4 @@
-# List of Hosted Models
+#List of Hosted Models

 *   [Inception V3 2015](https://storage.googleapis.com/download.tensorflow.org/models/tflite/inception_v3_2015_2017_11_10.zip)
 *   [Inception V3 Slim 2016](https://storage.googleapis.com/download.tensorflow.org/models/tflite/inception_v3_slim_2016_android_2017_11_10.zip)
--- a/tensorflow/contrib/lite/kernels/BUILD
+++ b/tensorflow/contrib/lite/kernels/BUILD
@ -156,7 +156,6 @@ cc_library(
        "local_response_norm.cc",
        "lsh_projection.cc",
        "lstm.cc",
-        "maximum.cc",
        "mean.cc",
        "mfcc.cc",
        "mul.cc",
@ -537,18 +536,6 @@ tf_cc_test(
    ],
 )

-tf_cc_test(
-    name = "maximum_test",
-    size = "small",
-    srcs = ["maximum_test.cc"],
-    deps = [
-        ":builtin_ops",
-        "//tensorflow/contrib/lite:framework",
-        "//tensorflow/contrib/lite/kernels:test_util",
-        "@com_google_googletest//:gtest",
-    ],
-)
-
 tf_cc_test(
    name = "mean_test",
    size = "small",
--- a/tensorflow/contrib/lite/kernels/internal/reference/reference_ops.h
+++ b/tensorflow/contrib/lite/kernels/internal/reference/reference_ops.h
@ -404,7 +404,6 @@ inline void DepthToSpace(const T* input_data, const Dims<4>& input_dims,
          const int in_d =
              out_d + ((out_h % block_size) * block_size + out_w % block_size) *
                          output_depth;
-
          const int in_w = out_w / block_size;
          const int in_h = out_h / block_size;
          const int in_b = out_b;
@ -3364,30 +3363,6 @@ void TensorFlowMaximum(const T* input1_data, const Dims<4>& input1_dims,
  }
 }

-template <typename T>
-void TensorFlowMaximum(const T* input1_data, const Dims<4>& input1_dims,
-                       const T* input2_data, const Dims<4>& input2_dims,
-                       T* output_data, const Dims<4>& output_dims) {
-  NdArrayDesc<4> desc1;
-  NdArrayDesc<4> desc2;
-  NdArrayDescsForElementwiseBroadcast(input1_dims, input2_dims, &desc1, &desc2);
-
-  for (int b = 0; b < ArraySize(output_dims, 3); ++b) {
-    for (int y = 0; y < ArraySize(output_dims, 2); ++y) {
-      for (int x = 0; x < ArraySize(output_dims, 1); ++x) {
-        for (int c = 0; c < ArraySize(output_dims, 0); ++c) {
-          auto out_idx = Offset(output_dims, c, x, y, b);
-          auto in1_idx = SubscriptToIndex(desc1, c, x, y, b);
-          auto in2_idx = SubscriptToIndex(desc2, c, x, y, b);
-          auto in1_val = input1_data[in1_idx];
-          auto in2_val = input2_data[in2_idx];
-          output_data[out_idx] = in1_val > in2_val ? in1_val : in2_val;
-        }
-      }
-    }
-  }
-}
-
 template <typename T1, typename T2, typename T3>
 void ArgMax(const T3* axis, const T1* input_data, const Dims<4>& input_dims,
            T2* output_data, const Dims<4>& output_dims) {
--- a/tensorflow/contrib/lite/kernels/maximum.cc
+++ b/tensorflow/contrib/lite/kernels/maximum.cc
@ -1,106 +0,0 @@
-/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-==============================================================================*/
-#include <string.h>
-#include <vector>
-#include "tensorflow/contrib/lite/builtin_op_data.h"
-#include "tensorflow/contrib/lite/context.h"
-#include "tensorflow/contrib/lite/kernels/internal/reference/reference_ops.h"
-#include "tensorflow/contrib/lite/kernels/internal/tensor.h"
-#include "tensorflow/contrib/lite/kernels/kernel_util.h"
-#include "tensorflow/contrib/lite/kernels/op_macros.h"
-
-namespace tflite {
-namespace ops {
-namespace builtin {
-namespace maximum {
-
-// This file has a reference implemenation of TFMaximum.
-enum KernelType {
-  kReference,
-};
-
-constexpr int kInputTensor1 = 0;
-constexpr int kInputTensor2 = 1;
-constexpr int kOutputTensor = 0;
-
-struct MaximumContext {
-  MaximumContext(TfLiteContext* context, TfLiteNode* node) {
-    input1 = GetInput(context, node, kInputTensor1);
-    input2 = GetInput(context, node, kInputTensor2);
-    output = GetOutput(context, node, kOutputTensor);
-  }
-  TfLiteTensor* input1;
-  TfLiteTensor* input2;
-  TfLiteTensor* output;
-};
-
-TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) {
-  TF_LITE_ENSURE_EQ(context, NumInputs(node), 2);
-  TF_LITE_ENSURE_EQ(context, NumOutputs(node), 1);
-
-  MaximumContext op_context(context, node);
-  TF_LITE_ENSURE_EQ(context, op_context.input1->type, op_context.input2->type);
-  TfLiteIntArray* output_dims = TfLiteIntArrayCopy(op_context.input2->dims);
-  op_context.output->type = op_context.input2->type;
-  return context->ResizeTensor(context, op_context.output, output_dims);
-}
-
-template <KernelType kernel_type>
-TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) {
-  MaximumContext op_context(context, node);
-
-#define TF_LITE_MAXIMUM(kernel_type, data_type)    \
-  kernel_type::TensorFlowMaximum<data_type>(       \
-      GetTensorData<data_type>(op_context.input1), \
-      GetTensorDims(op_context.input1),            \
-      GetTensorData<data_type>(op_context.input2), \
-      GetTensorDims(op_context.input2),            \
-      GetTensorData<data_type>(op_context.output), \
-      GetTensorDims(op_context.output))
-
-  if (kernel_type == kReference) {
-    switch (op_context.output->type) {
-      case kTfLiteFloat32:
-        TF_LITE_MAXIMUM(reference_ops, float);
-        break;
-      default:
-        context->ReportError(context,
-                             "Type %d is currently not supported by Maximum.",
-                             op_context.output->type);
-        return kTfLiteError;
-    }
-  } else {
-    context->ReportError(context,
-                         "Type %d is currently not supported by Maximum.",
-                         op_context.output->type);
-    return kTfLiteError;
-  }
-#undef TF_LITE_MAXIMUM
-  return kTfLiteOk;
-}
-
-}  // namespace maximum
-
-TfLiteRegistration* Register_MAXIMUM_REF() {
-  static TfLiteRegistration r = {nullptr, nullptr, maximum::Prepare,
-                                 maximum::Eval<maximum::kReference>};
-  return &r;
-}
-
-TfLiteRegistration* Register_MAXIMUM() { return Register_MAXIMUM_REF(); }
-
-}  // namespace builtin
-}  // namespace ops
-}  // namespace tflite
--- a/tensorflow/contrib/lite/kernels/maximum_test.cc
+++ b/tensorflow/contrib/lite/kernels/maximum_test.cc
@ -1,81 +0,0 @@
-/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-==============================================================================*/
-#include <gtest/gtest.h>
-#include "tensorflow/contrib/lite/interpreter.h"
-#include "tensorflow/contrib/lite/kernels/register.h"
-#include "tensorflow/contrib/lite/kernels/test_util.h"
-#include "tensorflow/contrib/lite/model.h"
-
-namespace tflite {
-namespace {
-
-using ::testing::ElementsAreArray;
-
-class MaximumOpModel : public SingleOpModel {
- public:
-  MaximumOpModel(const TensorData& input1, const TensorData& input2,
-                 const TensorType& output) {
-    input1_ = AddInput(input1);
-    input2_ = AddInput(input2);
-    output_ = AddOutput(output);
-    SetBuiltinOp(BuiltinOperator_MAXIMUM, BuiltinOptions_MaximumOptions,
-                 CreateMaximumOptions(builder_).Union());
-    BuildInterpreter({GetShape(input1_), GetShape(input2_)});
-  }
-
-  template <class T>
-  void SetInput1(std::initializer_list<T> data) {
-    PopulateTensor(input1_, data);
-  }
-
-  template <class T>
-  void SetInput2(std::initializer_list<T> data) {
-    PopulateTensor(input2_, data);
-  }
-
-  template <class T>
-  std::vector<T> GetOutput() {
-    return ExtractVector<T>(output_);
-  }
-  std::vector<int> GetOutputShape() { return GetTensorShape(output_); }
-
- protected:
-  int input1_;
-  int input2_;
-  int output_;
-};
-
-TEST(MaximumOpTest, FloatTest) {
-  std::initializer_list<float> data1 = {1.0, 0.0, -1.0, 11.0, -2.0, -1.44};
-  std::initializer_list<float> data2 = {-1.0, 0.0, 1.0, 12.0, -3.0, -1.43};
-  MaximumOpModel m({TensorType_FLOAT32, {3, 1, 2}},
-                   {TensorType_FLOAT32, {3, 1, 2}}, TensorType_FLOAT32);
-  m.SetInput1<float>(data1);
-  m.SetInput2<float>(data2);
-  m.Invoke();
-  EXPECT_THAT(m.GetOutputShape(), ElementsAreArray({3, 1, 2}));
-  EXPECT_THAT(
-      m.GetOutput<float>(),
-      ElementsAreArray(ArrayFloatNear({1.0, 0.0, 1.0, 12.0, -2.0, -1.43})));
-}
-
-}  // namespace
-}  // namespace tflite
-
-int main(int argc, char** argv) {
-  ::tflite::LogToStderr();
-  ::testing::InitGoogleTest(&argc, argv);
-  return RUN_ALL_TESTS();
-}
--- a/tensorflow/contrib/lite/kernels/register.cc
+++ b/tensorflow/contrib/lite/kernels/register.cc
@ -76,7 +76,6 @@ TfLiteRegistration* Register_LOG_SOFTMAX();
 TfLiteRegistration* Register_CAST();
 TfLiteRegistration* Register_DEQUANTIZE();
 TfLiteRegistration* Register_PRELU();
-TfLiteRegistration* Register_MAXIMUM();

 BuiltinOpResolver::BuiltinOpResolver() {
  AddBuiltin(BuiltinOperator_RELU, Register_RELU());
@ -134,7 +133,6 @@ BuiltinOpResolver::BuiltinOpResolver() {
  AddBuiltin(BuiltinOperator_CAST, Register_CAST());
  AddBuiltin(BuiltinOperator_DEQUANTIZE, Register_DEQUANTIZE());
  AddBuiltin(BuiltinOperator_PRELU, Register_PRELU());
-  AddBuiltin(BuiltinOperator_MAXIMUM, Register_MAXIMUM());

  // TODO(andrewharp, ahentz): Move these somewhere more appropriate so that
  // custom ops aren't always included by default.
--- a/tensorflow/contrib/lite/model.cc
+++ b/tensorflow/contrib/lite/model.cc
@ -597,9 +597,6 @@ void* ParseOpData(const Operator* op, BuiltinOperator op_type,
      builtin_data = reinterpret_cast<void*>(params);
      break;
    }
-    case BuiltinOperator_MAXIMUM: {
-      break;
-    }
    case BuiltinOperator_DELEGATE: {
      // TODO(ycling): Revisit when supporting saving delegated models.
      error_reporter->Report("DELEGATE op shouldn't exist in model.");
--- a/tensorflow/contrib/lite/nnapi_delegate.cc
+++ b/tensorflow/contrib/lite/nnapi_delegate.cc
@ -350,7 +350,6 @@ void AddOpsAndParams(tflite::Interpreter* interpreter,
      case tflite::BuiltinOperator_DELEGATE:
      case tflite::BuiltinOperator_CAST:
      case tflite::BuiltinOperator_PRELU:
-      case tflite::BuiltinOperator_MAXIMUM:
        FATAL("Op code %d is currently not delegated to NNAPI", builtin);
        nn_op_type = -1;  // set to invalid
        break;
--- a/tensorflow/contrib/lite/python/lite.py
+++ b/tensorflow/contrib/lite/python/lite.py
@ -25,9 +25,9 @@ EXPERIMENTAL: APIs here are unstable and likely to change without notice.
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
-import os as _os
-import subprocess as _subprocess
-import tempfile as _tempfile
+import os
+import subprocess
+import tempfile

 # pylint: disable=unused-import
 from tensorflow.contrib.lite.python.op_hint import convert_op_hints_to_stubs
@ -74,7 +74,7 @@ else:
  _toco_from_proto_bin = _resource_loader.get_path_to_datafile(
      "../toco/python/toco_from_protos")

-if _toco_from_proto_bin and not _os.path.exists(_toco_from_proto_bin):
+if _toco_from_proto_bin and not os.path.exists(_toco_from_proto_bin):
  _toco_from_proto_bin = "toco_from_protos"


@ -102,10 +102,10 @@ def toco_convert_protos(model_flags_str, toco_flags_str, input_data_str):
    return _toco_python.TocoConvert(
        model_flags_str, toco_flags_str, input_data_str)

-  with _tempfile.NamedTemporaryFile() as fp_toco, \
-           _tempfile.NamedTemporaryFile() as fp_model, \
-           _tempfile.NamedTemporaryFile() as fp_input, \
-           _tempfile.NamedTemporaryFile() as fp_output:
+  with tempfile.NamedTemporaryFile() as fp_toco, \
+           tempfile.NamedTemporaryFile() as fp_model, \
+           tempfile.NamedTemporaryFile() as fp_input, \
+           tempfile.NamedTemporaryFile() as fp_output:
    fp_model.write(model_flags_str)
    fp_toco.write(toco_flags_str)
    fp_input.write(input_data_str)
@ -118,11 +118,11 @@ def toco_convert_protos(model_flags_str, toco_flags_str, input_data_str):
        fp_output.name
    ]
    cmdline = " ".join(cmd)
-    proc = _subprocess.Popen(
+    proc = subprocess.Popen(
        cmdline,
        shell=True,
-        stdout=_subprocess.PIPE,
-        stderr=_subprocess.STDOUT,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.STDOUT,
        close_fds=True)
    stdout, stderr = proc.communicate()
    exitcode = proc.returncode
--- a/tensorflow/contrib/lite/schema/schema.fbs
+++ b/tensorflow/contrib/lite/schema/schema.fbs
@ -131,7 +131,6 @@ enum BuiltinOperator : byte {
  BIDIRECTIONAL_SEQUENCE_LSTM = 52,
  CAST = 53,
  PRELU = 54,
-  MAXIMUM = 55,
 }

 // Options for the builtin operators.
@ -174,7 +173,6 @@ union BuiltinOptions {
  LogSoftmaxOptions,
  CastOptions,
  DequantizeOptions,
-  MaximumOptions,
 }

 enum Padding : byte { SAME, VALID }
@ -386,9 +384,6 @@ table CastOptions {
 table DequantizeOptions {
 }

-table MaximumOptions {
-}
-
 // An OperatorCode can be an enum value (BuiltinOperator) if the operator is a
 // builtin, or a string if the operator is custom.
 table OperatorCode {
--- a/tensorflow/contrib/lite/schema/schema_generated.h
+++ b/tensorflow/contrib/lite/schema/schema_generated.h
@ -145,9 +145,6 @@ struct CastOptionsT;
 struct DequantizeOptions;
 struct DequantizeOptionsT;

-struct MaximumOptions;
-struct MaximumOptionsT;
-
 struct OperatorCode;
 struct OperatorCodeT;

@ -258,12 +255,11 @@ enum BuiltinOperator {
  BuiltinOperator_BIDIRECTIONAL_SEQUENCE_LSTM = 52,
  BuiltinOperator_CAST = 53,
  BuiltinOperator_PRELU = 54,
-  BuiltinOperator_MAXIMUM = 55,
  BuiltinOperator_MIN = BuiltinOperator_ADD,
-  BuiltinOperator_MAX = BuiltinOperator_MAXIMUM
+  BuiltinOperator_MAX = BuiltinOperator_PRELU
 };

-inline BuiltinOperator (&EnumValuesBuiltinOperator())[54] {
+inline BuiltinOperator (&EnumValuesBuiltinOperator())[53] {
  static BuiltinOperator values[] = {
    BuiltinOperator_ADD,
    BuiltinOperator_AVERAGE_POOL_2D,
@ -317,8 +313,7 @@ inline BuiltinOperator (&EnumValuesBuiltinOperator())[54] {
    BuiltinOperator_DELEGATE,
    BuiltinOperator_BIDIRECTIONAL_SEQUENCE_LSTM,
    BuiltinOperator_CAST,
-    BuiltinOperator_PRELU,
-    BuiltinOperator_MAXIMUM
+    BuiltinOperator_PRELU
  };
  return values;
 }
@ -380,7 +375,6 @@ inline const char **EnumNamesBuiltinOperator() {
    "BIDIRECTIONAL_SEQUENCE_LSTM",
    "CAST",
    "PRELU",
-    "MAXIMUM",
    nullptr
  };
  return names;
@ -431,12 +425,11 @@ enum BuiltinOptions {
  BuiltinOptions_LogSoftmaxOptions = 36,
  BuiltinOptions_CastOptions = 37,
  BuiltinOptions_DequantizeOptions = 38,
-  BuiltinOptions_MaximumOptions = 39,
  BuiltinOptions_MIN = BuiltinOptions_NONE,
-  BuiltinOptions_MAX = BuiltinOptions_MaximumOptions
+  BuiltinOptions_MAX = BuiltinOptions_DequantizeOptions
 };

-inline BuiltinOptions (&EnumValuesBuiltinOptions())[40] {
+inline BuiltinOptions (&EnumValuesBuiltinOptions())[39] {
  static BuiltinOptions values[] = {
    BuiltinOptions_NONE,
    BuiltinOptions_Conv2DOptions,
@ -476,8 +469,7 @@ inline BuiltinOptions (&EnumValuesBuiltinOptions())[40] {
    BuiltinOptions_SplitOptions,
    BuiltinOptions_LogSoftmaxOptions,
    BuiltinOptions_CastOptions,
-    BuiltinOptions_DequantizeOptions,
-    BuiltinOptions_MaximumOptions
+    BuiltinOptions_DequantizeOptions
  };
  return values;
 }
@ -523,7 +515,6 @@ inline const char **EnumNamesBuiltinOptions() {
    "LogSoftmaxOptions",
    "CastOptions",
    "DequantizeOptions",
-    "MaximumOptions",
    nullptr
  };
  return names;
@ -690,10 +681,6 @@ template<> struct BuiltinOptionsTraits<DequantizeOptions> {
  static const BuiltinOptions enum_value = BuiltinOptions_DequantizeOptions;
 };

-template<> struct BuiltinOptionsTraits<MaximumOptions> {
-  static const BuiltinOptions enum_value = BuiltinOptions_MaximumOptions;
-};
-
 struct BuiltinOptionsUnion {
  BuiltinOptions type;
  void *value;
@ -1029,14 +1016,6 @@ struct BuiltinOptionsUnion {
    return type == BuiltinOptions_DequantizeOptions ?
      reinterpret_cast<const DequantizeOptionsT *>(value) : nullptr;
  }
-  MaximumOptionsT *AsMaximumOptions() {
-    return type == BuiltinOptions_MaximumOptions ?
-      reinterpret_cast<MaximumOptionsT *>(value) : nullptr;
-  }
-  const MaximumOptionsT *AsMaximumOptions() const {
-    return type == BuiltinOptions_MaximumOptions ?
-      reinterpret_cast<const MaximumOptionsT *>(value) : nullptr;
-  }
 };

 bool VerifyBuiltinOptions(flatbuffers::Verifier &verifier, const void *obj, BuiltinOptions type);
@ -3780,46 +3759,6 @@ inline flatbuffers::Offset<DequantizeOptions> CreateDequantizeOptions(

 flatbuffers::Offset<DequantizeOptions> CreateDequantizeOptions(flatbuffers::FlatBufferBuilder &_fbb, const DequantizeOptionsT *_o, const flatbuffers::rehasher_function_t *_rehasher = nullptr);

-struct MaximumOptionsT : public flatbuffers::NativeTable {
-  typedef MaximumOptions TableType;
-  MaximumOptionsT() {
-  }
-};
-
-struct MaximumOptions FLATBUFFERS_FINAL_CLASS : private flatbuffers::Table {
-  typedef MaximumOptionsT NativeTableType;
-  bool Verify(flatbuffers::Verifier &verifier) const {
-    return VerifyTableStart(verifier) &&
-           verifier.EndTable();
-  }
-  MaximumOptionsT *UnPack(const flatbuffers::resolver_function_t *_resolver = nullptr) const;
-  void UnPackTo(MaximumOptionsT *_o, const flatbuffers::resolver_function_t *_resolver = nullptr) const;
-  static flatbuffers::Offset<MaximumOptions> Pack(flatbuffers::FlatBufferBuilder &_fbb, const MaximumOptionsT* _o, const flatbuffers::rehasher_function_t *_rehasher = nullptr);
-};
-
-struct MaximumOptionsBuilder {
-  flatbuffers::FlatBufferBuilder &fbb_;
-  flatbuffers::uoffset_t start_;
-  explicit MaximumOptionsBuilder(flatbuffers::FlatBufferBuilder &_fbb)
-        : fbb_(_fbb) {
-    start_ = fbb_.StartTable();
-  }
-  MaximumOptionsBuilder &operator=(const MaximumOptionsBuilder &);
-  flatbuffers::Offset<MaximumOptions> Finish() {
-    const auto end = fbb_.EndTable(start_);
-    auto o = flatbuffers::Offset<MaximumOptions>(end);
-    return o;
-  }
-};
-
-inline flatbuffers::Offset<MaximumOptions> CreateMaximumOptions(
-    flatbuffers::FlatBufferBuilder &_fbb) {
-  MaximumOptionsBuilder builder_(_fbb);
-  return builder_.Finish();
-}
-
-flatbuffers::Offset<MaximumOptions> CreateMaximumOptions(flatbuffers::FlatBufferBuilder &_fbb, const MaximumOptionsT *_o, const flatbuffers::rehasher_function_t *_rehasher = nullptr);
-
 struct OperatorCodeT : public flatbuffers::NativeTable {
  typedef OperatorCode TableType;
  BuiltinOperator builtin_code;
@ -4051,9 +3990,6 @@ struct Operator FLATBUFFERS_FINAL_CLASS : private flatbuffers::Table {
  const DequantizeOptions *builtin_options_as_DequantizeOptions() const {
    return builtin_options_type() == BuiltinOptions_DequantizeOptions ? static_cast<const DequantizeOptions *>(builtin_options()) : nullptr;
  }
-  const MaximumOptions *builtin_options_as_MaximumOptions() const {
-    return builtin_options_type() == BuiltinOptions_MaximumOptions ? static_cast<const MaximumOptions *>(builtin_options()) : nullptr;
-  }
  const flatbuffers::Vector<uint8_t> *custom_options() const {
    return GetPointer<const flatbuffers::Vector<uint8_t> *>(VT_CUSTOM_OPTIONS);
  }
@ -4232,10 +4168,6 @@ template<> inline const DequantizeOptions *Operator::builtin_options_as<Dequanti
  return builtin_options_as_DequantizeOptions();
 }

-template<> inline const MaximumOptions *Operator::builtin_options_as<MaximumOptions>() const {
-  return builtin_options_as_MaximumOptions();
-}
-
 struct OperatorBuilder {
  flatbuffers::FlatBufferBuilder &fbb_;
  flatbuffers::uoffset_t start_;
@ -5764,29 +5696,6 @@ inline flatbuffers::Offset<DequantizeOptions> CreateDequantizeOptions(flatbuffer
      _fbb);
 }

-inline MaximumOptionsT *MaximumOptions::UnPack(const flatbuffers::resolver_function_t *_resolver) const {
-  auto _o = new MaximumOptionsT();
-  UnPackTo(_o, _resolver);
-  return _o;
-}
-
-inline void MaximumOptions::UnPackTo(MaximumOptionsT *_o, const flatbuffers::resolver_function_t *_resolver) const {
-  (void)_o;
-  (void)_resolver;
-}
-
-inline flatbuffers::Offset<MaximumOptions> MaximumOptions::Pack(flatbuffers::FlatBufferBuilder &_fbb, const MaximumOptionsT* _o, const flatbuffers::rehasher_function_t *_rehasher) {
-  return CreateMaximumOptions(_fbb, _o, _rehasher);
-}
-
-inline flatbuffers::Offset<MaximumOptions> CreateMaximumOptions(flatbuffers::FlatBufferBuilder &_fbb, const MaximumOptionsT *_o, const flatbuffers::rehasher_function_t *_rehasher) {
-  (void)_rehasher;
-  (void)_o;
-  struct _VectorArgs { flatbuffers::FlatBufferBuilder *__fbb; const MaximumOptionsT* __o; const flatbuffers::rehasher_function_t *__rehasher; } _va = { &_fbb, _o, _rehasher}; (void)_va;
-  return tflite::CreateMaximumOptions(
-      _fbb);
-}
-
 inline OperatorCodeT *OperatorCode::UnPack(const flatbuffers::resolver_function_t *_resolver) const {
  auto _o = new OperatorCodeT();
  UnPackTo(_o, _resolver);
@ -6119,10 +6028,6 @@ inline bool VerifyBuiltinOptions(flatbuffers::Verifier &verifier, const void *ob
      auto ptr = reinterpret_cast<const DequantizeOptions *>(obj);
      return verifier.VerifyTable(ptr);
    }
-    case BuiltinOptions_MaximumOptions: {
-      auto ptr = reinterpret_cast<const MaximumOptions *>(obj);
-      return verifier.VerifyTable(ptr);
-    }
    default: return false;
  }
 }
@ -6293,10 +6198,6 @@ inline void *BuiltinOptionsUnion::UnPack(const void *obj, BuiltinOptions type, c
      auto ptr = reinterpret_cast<const DequantizeOptions *>(obj);
      return ptr->UnPack(resolver);
    }
-    case BuiltinOptions_MaximumOptions: {
-      auto ptr = reinterpret_cast<const MaximumOptions *>(obj);
-      return ptr->UnPack(resolver);
-    }
    default: return nullptr;
  }
 }
@ -6455,10 +6356,6 @@ inline flatbuffers::Offset<void> BuiltinOptionsUnion::Pack(flatbuffers::FlatBuff
      auto ptr = reinterpret_cast<const DequantizeOptionsT *>(value);
      return CreateDequantizeOptions(_fbb, ptr, _rehasher).Union();
    }
-    case BuiltinOptions_MaximumOptions: {
-      auto ptr = reinterpret_cast<const MaximumOptionsT *>(value);
-      return CreateMaximumOptions(_fbb, ptr, _rehasher).Union();
-    }
    default: return 0;
  }
 }
@ -6617,10 +6514,6 @@ inline BuiltinOptionsUnion::BuiltinOptionsUnion(const BuiltinOptionsUnion &u) FL
      value = new DequantizeOptionsT(*reinterpret_cast<DequantizeOptionsT *>(u.value));
      break;
    }
-    case BuiltinOptions_MaximumOptions: {
-      value = new MaximumOptionsT(*reinterpret_cast<MaximumOptionsT *>(u.value));
-      break;
-    }
    default:
      break;
  }
@ -6818,11 +6711,6 @@ inline void BuiltinOptionsUnion::Reset() {
      delete ptr;
      break;
    }
-    case BuiltinOptions_MaximumOptions: {
-      auto ptr = reinterpret_cast<MaximumOptionsT *>(value);
-      delete ptr;
-      break;
-    }
    default: break;
  }
  value = nullptr;
--- a/tensorflow/contrib/lite/testing/BUILD
+++ b/tensorflow/contrib/lite/testing/BUILD
@ -36,7 +36,6 @@ gen_zipped_test_files(
        "local_response_norm.zip",
        "log_softmax.zip",
        "max_pool.zip",
-        "maximum.zip",
        "mean.zip",
        "mul.zip",
        "pad.zip",
--- a/tensorflow/contrib/lite/testing/generate_examples.py
+++ b/tensorflow/contrib/lite/testing/generate_examples.py
@ -862,41 +862,6 @@ def make_log_softmax_tests(zip_path):
  make_zip_of_tests(zip_path, test_parameters, build_graph, build_inputs)


-def make_maximum_tests(zip_path):
-  """Make a set of tests to do maximum."""
-
-  test_parameters = [{
-      "input_dtype": [tf.float32],
-      "input_shape_1": [[3], [1, 100], [4, 2, 3], [5, 224, 224, 3]],
-      "input_shape_2": [[3], [1, 100], [4, 2, 3], [5, 224, 224, 3]],
-  }]
-
-  def build_graph(parameters):
-    """Build the maximum op testing graph."""
-    input_tensor_1 = tf.placeholder(
-        dtype=parameters["input_dtype"],
-        name="input_1",
-        shape=parameters["input_shape_1"])
-    input_tensor_2 = tf.placeholder(
-        dtype=parameters["input_dtype"],
-        name="input_2",
-        shape=parameters["input_shape_2"])
-
-    out = tf.maximum(input_tensor_1, input_tensor_2)
-    return [input_tensor_1, input_tensor_2], [out]
-
-  def build_inputs(parameters, sess, inputs, outputs):
-    values = [
-        create_tensor_data(parameters["input_dtype"],
-                           parameters["input_shape_1"]),
-        create_tensor_data(parameters["input_dtype"],
-                           parameters["input_shape_2"])
-    ]
-    return values, sess.run(outputs, feed_dict=dict(zip(inputs, values)))
-
-  make_zip_of_tests(zip_path, test_parameters, build_graph, build_inputs)
-
-
 def make_binary_op_tests_func(binary_operator):
  """Return a function that does a test on a binary operator."""
  return lambda zip_path: make_binary_op_tests(zip_path, binary_operator)
@ -2012,7 +1977,6 @@ def main(unused_args):
        "exp.zip": make_exp_tests,
        "log_softmax.zip": make_log_softmax_tests,
        "lstm.zip": make_lstm_tests,
-        "maximum.zip": make_maximum_tests,
    }
    out = FLAGS.zip_to_output
    bin_path = FLAGS.toco
--- a/tensorflow/contrib/lite/testing/generated_examples_zip_test.cc
+++ b/tensorflow/contrib/lite/testing/generated_examples_zip_test.cc
@ -253,7 +253,6 @@ INSTANTIATE_TESTS(l2_pool)
 INSTANTIATE_TESTS(l2norm)
 INSTANTIATE_TESTS(local_response_norm)
 INSTANTIATE_TESTS(log_softmax)
-INSTANTIATE_TESTS(maximum)
 INSTANTIATE_TESTS(max_pool)
 INSTANTIATE_TESTS(mean)
 INSTANTIATE_TESTS(mul)
--- a/tensorflow/contrib/lite/toco/tflite/operator.cc
+++ b/tensorflow/contrib/lite/toco/tflite/operator.cc
@ -863,8 +863,6 @@ std::vector<std::unique_ptr<BaseOperator>> BuildOperatorList() {
  ops.emplace_back(new SimpleOperator<ExpOperator>("EXP", OperatorType::kExp));
  ops.emplace_back(new SimpleOperator<LogSoftmaxOperator>(
      "LOG_SOFTMAX", OperatorType::kLogSoftmax));
-  ops.emplace_back(new SimpleOperator<TensorFlowMaximumOperator>(
-      "MAXIMUM", OperatorType::kTensorFlowMaximum));

  return ops;
 }
--- a/tensorflow/contrib/lite/toco/tflite/operator_test.cc
+++ b/tensorflow/contrib/lite/toco/tflite/operator_test.cc
@ -109,8 +109,6 @@ TEST_F(OperatorTest, SimpleOperators) {
  CheckSimpleOperator<ExpOperator>("EXP", OperatorType::kExp);
  CheckSimpleOperator<LogSoftmaxOperator>("LOG_SOFTMAX",
                                          OperatorType::kLogSoftmax);
-  CheckSimpleOperator<TensorFlowMaximumOperator>(
-      "MAXIMUM", OperatorType::kTensorFlowMaximum);
 }

 TEST_F(OperatorTest, BuiltinAdd) {
--- a/tensorflow/contrib/lookup/lookup_ops.py
+++ b/tensorflow/contrib/lookup/lookup_ops.py
@ -494,7 +494,7 @@ class MutableDenseHashTable(LookupInterface):
                                                  value_dtype=tf.int64,
                                                  default_value=-1,
                                                  empty_key=0)
-  sess.run(table.insert(keys, values))
+  table.insert(keys, values)
  out = table.lookup(query_keys)
  print(out.eval())
  ```
--- a/tensorflow/contrib/makefile/download_dependencies.sh
+++ b/tensorflow/contrib/makefile/download_dependencies.sh
@ -34,7 +34,7 @@ PROTOBUF_URL="$(grep -o 'https://mirror.bazel.build/github.com/google/protobuf/.
 RE2_URL="$(grep -o 'https://mirror.bazel.build/github.com/google/re2/.*tar\.gz' "${BZL_FILE_PATH}" | head -n1)"
 FFT2D_URL="$(grep -o 'http.*fft\.tgz' "${BZL_FILE_PATH}" | grep -v mirror.bazel | head -n1)"
 ABSL_URL="$(grep -o 'https://github.com/abseil/abseil-cpp/.*tar.gz' "${BZL_FILE_PATH}" | head -n1)"
-CUB_URL="$(grep -o 'https.*cub/archive.*zip' "${BZL_FILE_PATH}" | grep -v mirror.bazel | head -n1)"
+CUB_URL="$(grep -o 'https.*cub/archive.*zip' "${BZL_FILE_PATH}" | grep -v bazel-mirror | head -n1)"

 # TODO(petewarden): Some new code in Eigen triggers a clang bug with iOS arm64,
 #                   so work around it by patching the source.
--- a/tensorflow/contrib/makefile/tf_op_files.txt
+++ b/tensorflow/contrib/makefile/tf_op_files.txt
@ -258,7 +258,6 @@ tensorflow/core/kernels/requantize.cc
 tensorflow/core/kernels/remote_fused_graph_execute_op.cc
 tensorflow/core/kernels/remote_fused_graph_execute_utils.cc
 tensorflow/core/kernels/batch_matmul_op_real.cc
-tensorflow/core/kernels/random_op.cc
 tensorflow/core/ops/training_ops.cc
 tensorflow/core/ops/string_ops.cc
 tensorflow/core/ops/state_ops.cc
--- a/tensorflow/contrib/seq2seq/kernels/beam_search_ops.cc
+++ b/tensorflow/contrib/seq2seq/kernels/beam_search_ops.cc
@ -74,7 +74,7 @@ class GatherTreeOp : public OpKernel {
        ctx,
        step_ids_shape.dim_size(1) == max_sequence_lengths.shape().dim_size(0),
        errors::InvalidArgument("batch size dimensions step_ids.shape[1] and "
-                                "max_sequence_lengths.shape[0] must match.  "
+                                "max_seqeuence_lengths.shape[0] must match.  "
                                "but shapes are: ",
                                step_ids_shape.DebugString(), " and ",
                                max_sequence_lengths.shape().DebugString()));
--- a/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py
+++ b/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py
@ -736,7 +736,7 @@ class _BaseMonotonicAttentionMechanism(_BaseAttentionMechanism):
  """Base attention mechanism for monotonic attention.

  Simply overrides the initial_alignments function to provide a dirac
-  distribution, which is needed in order for the monotonic attention
+  distribution,which is needed in order for the monotonic attention
  distributions to have the correct behavior.
  """

@ -763,7 +763,7 @@ class _BaseMonotonicAttentionMechanism(_BaseAttentionMechanism):
 class BahdanauMonotonicAttention(_BaseMonotonicAttentionMechanism):
  """Monotonic attention mechanism with Bahadanau-style energy function.

-  This type of attention enforces a monotonic constraint on the attention
+  This type of attention encorces a monotonic constraint on the attention
  distributions; that is once the model attends to a given point in the memory
  it can't attend to any prior points at subsequence output timesteps.  It
  achieves this by using the _monotonic_probability_fn instead of softmax to
@ -867,7 +867,7 @@ class BahdanauMonotonicAttention(_BaseMonotonicAttentionMechanism):
 class LuongMonotonicAttention(_BaseMonotonicAttentionMechanism):
  """Monotonic attention mechanism with Luong-style energy function.

-  This type of attention enforces a monotonic constraint on the attention
+  This type of attention encorces a monotonic constraint on the attention
  distributions; that is once the model attends to a given point in the memory
  it can't attend to any prior points at subsequence output timesteps.  It
  achieves this by using the _monotonic_probability_fn instead of softmax to
@ -1133,7 +1133,7 @@ class AttentionWrapper(rnn_cell_impl.RNNCell):
      output_attention: Python bool.  If `True` (default), the output at each
        time step is the attention value.  This is the behavior of Luong-style
        attention mechanisms.  If `False`, the output at each time step is
-        the output of `cell`.  This is the behavior of Bhadanau-style
+        the output of `cell`.  This is the beahvior of Bhadanau-style
        attention mechanisms.  In both cases, the `attention` tensor is
        propagated to the next time step via the state and is used there.
        This flag only controls whether the attention mechanism is propagated
--- a/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py
+++ b/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py
@ -821,9 +821,9 @@ def _get_scores(log_probs, sequence_lengths, length_penalty_weight):
  Returns:
    The scores normalized by the length_penalty.
  """
-  length_penalty_ = _length_penalty(
+  length_penality_ = _length_penalty(
      sequence_lengths=sequence_lengths, penalty_factor=length_penalty_weight)
-  return log_probs / length_penalty_
+  return log_probs / length_penality_


 def _length_penalty(sequence_lengths, penalty_factor):
@ -860,7 +860,7 @@ def _mask_probs(probs, eos_token, finished):
  unfinished beams remain unchanged.

  Args:
-    probs: Log probabilities of shape `[batch_size, beam_width, vocab_size]`
+    probs: Log probabiltiies of shape `[batch_size, beam_width, vocab_size]`
    eos_token: An int32 id corresponding to the EOS token to allocate
      probability to.
    finished: A boolean tensor of shape `[batch_size, beam_width]` that
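
Reviewer note on the `_get_scores` hunk above: the rename only touches a local variable, so the computed value is unchanged: each beam's log-probability is divided by a length penalty. Below is a minimal sketch of that normalization in plain Python. The body of `_length_penalty` is not shown in this diff, so the GNMT-style formula `((5 + length) / 6) ** penalty_factor` is an assumption, and the scalar-per-beam inputs are a simplification of the real tensor shapes.

```python
def length_penalty(sequence_length, penalty_factor):
    """Assumed GNMT-style penalty: ((5 + length) / 6) ** penalty_factor."""
    if penalty_factor == 0.0:
        return 1.0  # a zero weight disables length normalization
    return ((5.0 + sequence_length) / 6.0) ** penalty_factor


def get_scores(log_probs, sequence_lengths, length_penalty_weight):
    """Divide each beam's log-probability by its length penalty."""
    return [
        lp / length_penalty(n, length_penalty_weight)
        for lp, n in zip(log_probs, sequence_lengths)
    ]


# Two beams with the same raw log-probability but different lengths.
scores = get_scores([-6.0, -6.0], [1, 7], 1.0)
```

Under this assumed penalty, the longer beam is divided by a larger value and so ends up with the less negative score.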
--- a/tensorflow/contrib/slim/python/slim/data/parallel_reader.py
+++ b/tensorflow/contrib/slim/python/slim/data/parallel_reader.py
@ -115,8 +115,8 @@ class ParallelReader(io_ops.ReaderBase):
    reader needs to start reading from a new file since it has finished with
    the previous file).

-    A queue runner for enqueuing in the `common_queue` is automatically added
-    to the TF QueueRunners collection.
+    A queue runner for enqueing in the `common_queue` is automatically added to
+    the TF QueueRunners collection.

    Args:
      queue: A Queue or a mutable string Tensor representing a handle
--- a/tensorflow/contrib/slim/python/slim/data/prefetch_queue.py
+++ b/tensorflow/contrib/slim/python/slim/data/prefetch_queue.py
@ -36,9 +36,9 @@ def prefetch_queue(tensors,
                   dynamic_pad=False,
                   shared_name=None,
                   name=None):
-  """Creates a queue to prefetch tensors from `tensors`.
+  """Creates a queue to prefetech tensors from `tensors`.

-  A queue runner for enqueuing tensors into the prefetch_queue is automatically
+  A queue runner for enqueing tensors into the prefetch_queue is automatically
  added to the TF QueueRunners collection.

  Example:
--- a/tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py
+++ b/tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py
@ -124,7 +124,7 @@ class BoundingBox(ItemHandler):
    super(BoundingBox, self).__init__(self._full_keys)

  def tensors_to_item(self, keys_to_tensors):
-    """Maps the given dictionary of tensors to a concatenated list of bboxes.
+    """Maps the given dictionary of tensors to a contatenated list of bboxes.

    Args:
      keys_to_tensors: a mapping of TF-Example keys to parsed tensors.
--- a/tensorflow/contrib/tensorrt/README.md
+++ b/tensorflow/contrib/tensorrt/README.md
@ -1,15 +1,15 @@
-# Using TensorRT in TensorFlow
-
+Using TensorRT in TensorFlow
+============================

 This module provides necessary bindings and introduces TRT_engine_op
 operator that wraps a subgraph in TensorRT. This is still a work in progress
 but should be useable with most common graphs.

-## Compilation
-
+Compilation
+-----------

 In order to compile the module, you need to have a local TensorRT
-installation ( libnvinfer.so and respective include files ). During the
+installation (libnvinfer.so and respective include files). During the
 configuration step, TensorRT should be enabled and installation path
 should be set. If installed through package managers (deb,rpm),
 configure script should find the necessary components from the system
@ -22,38 +22,4 @@ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/
 ```

 After the installation of tensorflow package, TensorRT transformation
-will be available. An example use can be found in test/test_tftrt.py script
-
-## Installing TensorRT 3.0.4
-
-In order to make use of TensorRT integration, you will need a local installation of TensorRT 3.0.4 from the [NVIDIA Developer website](https://developer.nvidia.com/tensorrt). Due to compiler compatibility, you will need to download and install the TensorRT 3.0.4 tarball for _Ubuntu 14.04_, i.e., **_TensorRT-3.0.4.Ubuntu-14.04.5.x86_64.cuda-9.0.cudnn7.0-tar.gz_**, even if you are using Ubuntu 16.04 or later.
-
-### Preparing TensorRT installation
-
-Once you have downloaded TensorRT-3.0.4.Ubuntu-14.04.5.x86_64.cuda-9.0.cudnn7.0-tar.gz, you will need to unpack it to an installation directory, which will be referred to as <install_dir>. Please replace <install_dir> with the full path of actual installation directory you choose in commands below.
-
-```shell
-cd <install_dir> && tar -zxf /path/to/TensorRT-3.0.4.Ubuntu-14.04.5.x86_64.cuda-9.0.cudnn7.0-tar.gz
-```
-
-After unpacking the binaries, you have several options to use them:
-
-#### To run TensorFlow as a user without superuser privileges
-
-For a regular user without any sudo rights, you should add TensorRT to your `$LD_LIBRARY_PATH`:
-
-  ```shell
-   export LD_LIBRARY_PATH=<install_dir>/TensorRT-3.0.4/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
-  ```
-
-Then you are ready to use TensorFlow-TensorRT integration. `$LD_LIBRARY_PATH` must contain the path to TensorRT installation for TensorFlow-TensorRT integration to work. If you are using a VirtualEnv-like setup, you can add the command above to your `bin/activate` script or to your `.bashrc` script.
-
-#### To run TensorFlow as a superuser
-
- When running as a superuser, such as in a container or via sudo, the `$LD_LIBRARY_PATH` approach above may not work. The following is preferred when the user has superuser privileges:
-
-  ```shell
-  echo "<install_dir>/TensorRT-3.0.4/lib" | sudo tee /etc/ld.so.conf.d/tensorrt304.conf && sudo ldconfig
-  ```
-
-  Please ensure that any existing deb package installation of TensorRT is removed before following these instructions to avoid package conflicts.
+will be available. An example use can be found in test/test_tftrt.py directory
--- a/tensorflow/contrib/tensorrt/convert/convert_graph.cc
+++ b/tensorflow/contrib/tensorrt/convert/convert_graph.cc
@ -49,13 +49,12 @@ namespace tensorrt {
 namespace convert {
 namespace {

-bool IsTensorRTCandidate(const tensorflow::Node* node) {
+bool IsTensorRTCandidate(const tensorflow::NodeDef& node_def) {
  // LINT.IfChange
  // TODO(jie): Segmentation shouldn't associated with op name.
  //            Split it into a registration for each kernel.
  static const std::set<string> candidate_ops = {
      "Identity",
-      "Snapshot",
      "Const",
      "Conv2D",
      "MaxPool",
@ -75,7 +74,7 @@ bool IsTensorRTCandidate(const tensorflow::Node* node) {
      // TODO(ben,jie): ...
  };
  // LINT.ThenChange(//tensorflow/contrib/tensorrt/convert/convert_nodes.h)
-  return candidate_ops.count(node->type_string());
+  return candidate_ops.count(node_def.op());
 }

 void GetSubGraphIncomingEdges(const tensorflow::Graph& graph,
@ -85,10 +84,10 @@ void GetSubGraphIncomingEdges(const tensorflow::Graph& graph,
    const tensorflow::Node* node = graph.FindNodeId(node_id);
    for (const tensorflow::Edge* edge : node->in_edges()) {
      if (!subgraph_node_ids.count(edge->src()->id()) &&
-          !edge->src()->IsSource() && !edge->IsControlEdge()) {
+          !edge->src()->IsSource()) {
        incoming_edges->insert(edge);
      } else {
-        VLOG(2) << node->name() << " -> " << edge->src()->name() << " N, ";
+        VLOG(2) << edge->src()->name() << " N, ";
      }
    }
  }
@ -101,11 +100,11 @@ void GetSubGraphOutgoingEdges(const tensorflow::Graph& graph,
    const tensorflow::Node* node = graph.FindNodeId(node_id);
    for (const tensorflow::Edge* edge : node->out_edges()) {
      if (!subgraph_node_ids.count(edge->dst()->id()) &&
-          !edge->dst()->IsSink() && !edge->IsControlEdge()) {
-        VLOG(2) << node->name() << " -> " << edge->dst()->name() << " Y, ";
+          !edge->dst()->IsSink()) {
+        VLOG(2) << edge->dst()->name() << " Y, ";
        outgoing_edges->insert(edge);
      } else {
-        VLOG(2) << node->name() << " -> " << edge->dst()->name() << " N, ";
+        VLOG(2) << edge->dst()->name() << " N, ";
      }
    }
  }
@ -410,9 +409,8 @@ tensorflow::Status ConvertGraphDefToTensorRT(
      tensorflow::Status status = ConvertSubGraphToTensorRT(&p);
      if (status != tensorflow::Status::OK()) {
        LOG(WARNING) << "subgraph conversion error for subgraph_index:" << count
-                     << " due to: \"" << status.ToString()
-                     << "\" SKIPPING......( " << subgraph_node_names.size()
-                     << " nodes)";
+                     << " due to: \n"
+                     << status.ToString() << " SKIPPING......";
      }
      count++;
    }
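
Reviewer note on the `GetSubGraphIncomingEdges` hunk above: the rollback drops the `!edge->IsControlEdge()` condition, so control edges that cross the subgraph boundary are collected again. The following sketch models that filter in Python to make the behavioral difference concrete. `Edge`, `SOURCE`, and `incoming_edges` are hypothetical stand-ins for illustration only, not the TensorFlow graph API.

```python
from collections import namedtuple

# Hypothetical stand-in for tensorflow::Edge; not the real API.
Edge = namedtuple("Edge", ["src", "dst", "is_control"])

SOURCE = "_SOURCE"  # placeholder name for the graph's source node


def incoming_edges(edges, subgraph, skip_control_edges):
    """Return edges that enter `subgraph` from a node outside it."""
    result = []
    for e in edges:
        if e.dst not in subgraph:
            continue  # only edges arriving at a subgraph node matter
        if e.src in subgraph or e.src == SOURCE:
            continue  # internal edges and source-node edges are skipped
        if skip_control_edges and e.is_control:
            continue  # the condition removed by this rollback
        result.append(e)
    return result


edges = [
    Edge("a", "b", False),      # data edge crossing the boundary
    Edge("c", "b", True),       # control edge crossing the boundary
    Edge(SOURCE, "b", False),   # edge from the source node
    Edge("b", "d", False),      # edge internal to the subgraph
]
before_rollback = incoming_edges(edges, {"b", "d"}, skip_control_edges=True)
after_rollback = incoming_edges(edges, {"b", "d"}, skip_control_edges=False)
```

With `skip_control_edges=False`, which corresponds to the code after this commit, the control edge from `c` is treated as a subgraph input alongside the data edge from `a`.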
--- a/tensorflow/contrib/tensorrt/convert/convert_nodes.cc
+++ b/tensorflow/contrib/tensorrt/convert/convert_nodes.cc
@ -53,8 +53,8 @@ limitations under the License.
 namespace tensorflow {
 namespace tensorrt {
 namespace convert {
-using ::tensorflow::strings::StrAppend;
 using ::tensorflow::strings::StrCat;
+
 namespace {

 inline tensorflow::Status ConvertDType(tensorflow::DataType tf_dtype,
@ -430,8 +430,9 @@ class Converter {
  tensorflow::tensorrt::TRTWeightStore* weight_store_;
  bool fp16_;
  void register_op_converters();
-  tensorflow::Status get_inputs(const tensorflow::NodeDef& node_def,
-                                std::vector<TRT_TensorOrWeights>* inputs) {
+  std::vector<TRT_TensorOrWeights> get_inputs(
+      const tensorflow::NodeDef& node_def) {
+    std::vector<TRT_TensorOrWeights> inputs;
    for (auto const& input_name : node_def.input()) {
      /*************************************************************************
       * TODO(jie) handle case 1) here
@ -452,17 +453,13 @@ class Converter {

      VLOG(2) << "retrieve input: " << name;
      if (trt_tensors_.count(name)) {
-        inputs->push_back(trt_tensors_.at(name));
+        inputs.push_back(trt_tensors_.at(name));
      } else {
-        string str("Node ");
-        StrAppend(&str, node_def.name(), " should have an input named '", name,
-                  "' but it is not available");
-        LOG(WARNING) << "input: " << name << " not available for node at "
-                     << node_def.name();
-        return tensorflow::errors::InvalidArgument(str);
+        LOG(FATAL) << "input: " << name << " not available for node at, "
+                   << node_def.name();
      }
    }
-    return tensorflow::Status::OK();
+    return inputs;
  }

 public:
@ -486,8 +483,7 @@ class Converter {
  }

  tensorflow::Status convert_node(const tensorflow::NodeDef& node_def) {
-    std::vector<TRT_TensorOrWeights> inputs;
-    TF_RETURN_IF_ERROR(this->get_inputs(node_def, &inputs));
+    std::vector<TRT_TensorOrWeights> inputs = this->get_inputs(node_def);
    string op = node_def.op();
    if (!op_registry_.count(op)) {
      return tensorflow::errors::Unimplemented(
@ -552,19 +548,6 @@ class Converter {
  }
 };

-TRT_ShapedWeights ConvertFP32ToFP16(Converter& ctx,
-                                    const TRT_ShapedWeights& weights_src) {
-  auto dtype_new = tensorflow::DataType::DT_HALF;
-  TRT_ShapedWeights weights =
-      ctx.get_temp_weights(dtype_new, weights_src.shape_);
-  const float* src = static_cast<const float*>(weights_src.GetValues());
-  Eigen::half* dst = const_cast<Eigen::half*>(
-      static_cast<Eigen::half const*>(weights.GetValues()));
-  for (int64_t i = 0; i < weights_src.count(); i++) {
-    dst[i] = Eigen::half_impl::float_to_half_rtne(src[i]);
-  }
-  return weights;
-}
 // ****************************************************************************
 // Constant folding functions
 // TODO(jie): once optimizer kicks in, we should have done constant folding
@ -892,7 +875,7 @@ tensorflow::Status BinaryTensorOpWeight(

  // Check type consistency
  nvinfer1::DataType ttype;
-  TF_RETURN_IF_ERROR(ConvertDType(weights.type_, &ttype));
+  TF_CHECK_OK(ConvertDType(weights.type_, &ttype));

  // Check scale mode
  auto dims_w = weights.shape_;
@ -974,10 +957,6 @@ tensorflow::Status BinaryTensorOpWeight(
    }
  }

-  if (ctx.isFP16()) {
-    weights = ConvertFP32ToFP16(ctx, weights);
-  }
-
  // prepare weights
  TRT_ShapedWeights shift_weights(weights.type_);
  TRT_ShapedWeights scale_weights(weights.type_);
@ -1019,7 +998,9 @@ enum class ConvolutionType { DEFAULT, DEPTHWISE_CONV };
 tensorflow::Status ConvertConv2DHelper(
    Converter& ctx, const tensorflow::NodeDef& node_def,
    const std::vector<TRT_TensorOrWeights>& inputs,
-    std::vector<TRT_TensorOrWeights>* outputs, int group) {
+    std::vector<TRT_TensorOrWeights>* outputs,
+    int group  // group ==0 specifies depthwise conv
+) {
  const nvinfer1::ITensor* tensor = inputs.at(0).tensor();

  TFAttrs attrs(node_def);
@ -1044,10 +1025,6 @@ tensorflow::Status ConvertConv2DHelper(
  VLOG(2) << "groups count: " << num_groups;

  TRT_ShapedWeights weights_rsck = inputs.at(1).weights();
-  if (ctx.isFP16()) {
-    weights_rsck = ConvertFP32ToFP16(ctx, inputs.at(1).weights());
-  }
-
  TRT_ShapedWeights weights = ctx.get_temp_weights_like(weights_rsck);
  ReorderRSCKToKCRS(weights_rsck, &weights, num_groups);
  TRT_ShapedWeights biases(weights.type_);
@ -1157,9 +1134,9 @@ tensorflow::Status BinaryTensorOpTensor(
  CHECK_EQ_TYPE(tensor_r->getType(), dtype);
  auto op_pair = ops.find(node_def.op());
  if (op_pair == ops.end())
-    return tensorflow::errors::Unimplemented(
-        "binary op: " + node_def.op() +
-        " not supported at: " + node_def.name());
+    return tensorflow::errors::Unimplemented("binary op: " + node_def.op() +
+                                             " not supported at: " +
+                                             node_def.name());

  nvinfer1::IElementWiseLayer* layer = ctx.network()->addElementWise(
      *const_cast<nvinfer1::ITensor*>(tensor_l),
@ -1318,11 +1295,8 @@ tensorflow::Status ConvertScale(Converter& ctx,
  // Implement tensor binaryOp weight [channel wise] for now;
  const nvinfer1::ITensor* tensor = inputs.at(0).tensor();

+  // TODO(jie): handle NHWC/NCHW transpose;
  TRT_ShapedWeights weights = inputs.at(1).weights();
-  if (ctx.isFP16()) {
-    weights = ConvertFP32ToFP16(ctx, inputs.at(1).weights());
-  }
-
  TRT_ShapedWeights empty_weights(weights.type_);

  TFAttrs attrs(node_def);
@ -1402,11 +1376,8 @@ tensorflow::Status ConvertConst(Converter& ctx,
          scalar_shape.d[0] = weights_tensor.float_val_size();
          scalar_shape.type[0] = nvinfer1::DimensionType::kSPATIAL;
        } else {
-          LOG(WARNING) << "Broadcast on weights only supports kCHANNEL and"
-                       << " kUNIFORM, at: " << node_def.name();
-          string err_str("Broadcast method is not supported for '");
-          StrAppend(&err_str, node_def.name(), "' of type ", node_def.op());
-          return tensorflow::errors::InvalidArgument(err_str);
+          LOG(FATAL) << "Broadcast on weights only supports kCHANNEL and"
+                     << " kUNIFORM, at: " << node_def.name();
        }
      }
    } else {
@ -1420,16 +1391,33 @@ tensorflow::Status ConvertConst(Converter& ctx,
        scalar_shape.type[i] = nvinfer1::DimensionType::kSPATIAL;
      }
    }
-    size_t len_data = tensorflow::DataTypeSize(dtype);
-    for (int i = 0; i < scalar_shape.nbDims; i++) len_data *= scalar_shape.d[i];
-    ctx.weight_store()->store_.push_back(std::vector<uint8_t>(len_data));
-    void* dst = static_cast<void*>(&(ctx.weight_store()->store_.back()[0]));
-    std::vector<float> tensor_data(
-        weights_tensor.float_val().begin(),
-        weights_tensor.float_val()
-            .end());  //  make a local copy first to flatten
-    memcpy(dst, tensor_data.data(), len_data);  // store into weight store
-    weights = TRT_ShapedWeights(dtype, dst, scalar_shape);
+    if (ctx.isFP16()) {
+      auto dtype_new = tensorflow::DataType::DT_HALF;
+      size_t len_data = tensorflow::DataTypeSize(dtype_new);
+      for (int i = 0; i < scalar_shape.nbDims; i++)
+        len_data *= scalar_shape.d[i];
+      ctx.weight_store()->store_.push_back(std::vector<uint8_t>(len_data));
+      void* dst = static_cast<void*>(&(ctx.weight_store()->store_.back()[0]));
+      tensorflow::Tensor temp_tensor(tensorflow::DT_HALF, tensor.shape());
+      auto half_tensor = temp_tensor.flat<Eigen::half>();
+      Eigen::DefaultDevice defd;
+      half_tensor.device(defd) =
+          tensor.flat<float>().template cast<Eigen::half>();
+      memcpy(dst, half_tensor.data(), len_data);  // store into weight store
+      weights = TRT_ShapedWeights(dtype_new, dst, scalar_shape);
+    } else {
+      size_t len_data = tensorflow::DataTypeSize(dtype);
+      for (int i = 0; i < scalar_shape.nbDims; i++)
+        len_data *= scalar_shape.d[i];
+      ctx.weight_store()->store_.push_back(std::vector<uint8_t>(len_data));
+      void* dst = static_cast<void*>(&(ctx.weight_store()->store_.back()[0]));
+      std::vector<float> tensor_data(
+          weights_tensor.float_val().begin(),
+          weights_tensor.float_val()
+              .end());  //  make a local copy first to flatten
+      memcpy(dst, tensor_data.data(), len_data);  // store into weight store
+      weights = TRT_ShapedWeights(dtype, dst, scalar_shape);
+    }
  } else if (!weights_tensor.int_val().empty()) {
    VLOG(2) << "int!!!" << node_def.name();
    nvinfer1::Dims scalar_shape;
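
Reviewer note on the FP16 branch of `ConvertConst` above: the restored code sizes the weight buffer as `DataTypeSize(DT_HALF)` times the element count, casts the float values to half precision, and copies the result into the weight store. A small Python sketch of the same size-and-cast bookkeeping follows, using the standard `struct` module's `e` (IEEE 754 binary16) format. The helper names are illustrative assumptions and nothing here calls TensorFlow or Eigen.

```python
import struct


def floats_to_half_bytes(values):
    """Pack floats as little-endian IEEE 754 binary16 (2 bytes per value)."""
    return struct.pack("<%de" % len(values), *values)


def half_bytes_to_floats(buf):
    """Unpack little-endian binary16 bytes back into Python floats."""
    return list(struct.unpack("<%de" % (len(buf) // 2), buf))


weights = [1.0, 0.5, 0.1]
stored = floats_to_half_bytes(weights)     # 3 values * 2 bytes each
restored = half_bytes_to_floats(stored)    # 0.1 is no longer exact
```

Values such as 1.0 and 0.5 survive the round trip exactly, while 0.1 picks up a small rounding error. That loss of precision is why the choice of conversion point matters for the constants handled here.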
@ -1444,11 +1432,8 @@ tensorflow::Status ConvertConst(Converter& ctx,
          scalar_shape.d[0] = weights_tensor.int_val_size();
          scalar_shape.type[0] = nvinfer1::DimensionType::kSPATIAL;
        } else {
-          LOG(WARNING) << "Broadcast on weights only supports kCHANNEL and"
-                       << " kUNIFORM, at: " << node_def.name();
-          string err_str("Broadcast method is not supported for '");
-          StrAppend(&err_str, node_def.name(), "' of type ", node_def.op());
-          return tensorflow::errors::InvalidArgument(err_str);
+          LOG(FATAL) << "Broadcast on weights only supports kCHANNEL and"
+                     << " kUNIFORM, at: " << node_def.name();
        }
      }
    } else {
@ -1462,23 +1447,62 @@ tensorflow::Status ConvertConst(Converter& ctx,
        scalar_shape.type[i] = nvinfer1::DimensionType::kSPATIAL;
      }
    }
-    //  we should not have converted //if (ctx.isFP16()) {
-    size_t len_data = tensorflow::DataTypeSize(dtype);
-    for (int i = 0; i < scalar_shape.nbDims; i++) len_data *= scalar_shape.d[i];
-    size_t len_tensor = weights_tensor.int_val_size() * sizeof(int32);
-    len_data = std::max(len_data, len_tensor);
-    ctx.weight_store()->store_.push_back(std::vector<uint8_t>(len_data));
-    void* dst = static_cast<void*>(&(ctx.weight_store()->store_.back()[0]));
-    std::vector<int32> tensor_data(
-        weights_tensor.int_val().begin(),
-        weights_tensor.int_val().end());  //  make a local copy first to flatten
-                                          //  doesn't have to be contigous
-    memcpy(dst, tensor_data.data(), len_tensor);  // store into weight store
-    weights = TRT_ShapedWeights(dtype, dst, scalar_shape);
+    if (ctx.isFP16()) {
+      auto dtype_new = tensorflow::DataType::DT_HALF;
+      size_t len_data = tensorflow::DataTypeSize(dtype_new);
+      for (int i = 0; i < scalar_shape.nbDims; i++)
+        len_data *= scalar_shape.d[i];
+      ctx.weight_store()->store_.push_back(std::vector<uint8_t>(len_data));
+      void* dst = static_cast<void*>(&(ctx.weight_store()->store_.back()[0]));
+      tensorflow::Tensor temp_tensor(tensorflow::DT_HALF, tensor.shape());
+      TTypes<Eigen::half>::Flat half_tensor = temp_tensor.flat<Eigen::half>();
+      Eigen::DefaultDevice defd;
+      switch (dtype) {
+        case (tensorflow::DT_INT32): {
+          half_tensor.device(defd) =
+              tensor.flat<int32>().template cast<Eigen::half>();
+          break;
+        }
+        case (tensorflow::DT_INT16): {
+          half_tensor.device(defd) =
+              tensor.flat<int16>().template cast<Eigen::half>();
+          break;
+        }
+        case (tensorflow::DT_INT8): {
+          half_tensor.device(defd) =
+              tensor.flat<int8>().template cast<Eigen::half>();
+          break;
+        }
+        case (tensorflow::DT_UINT8): {
+          half_tensor.device(defd) =
+              tensor.flat<uint8>().template cast<Eigen::half>();
+          break;
+        }
+        default:
+          return tensorflow::errors::InvalidArgument(
+              "Datatype " + tensorflow::DataTypeString(dtype) +
+              " for FP16 conversion");
+          break;
+      };
+      memcpy(dst, half_tensor.data(), len_data);  // store into weight store
+      weights = TRT_ShapedWeights(dtype_new, dst, scalar_shape);
+    } else {
+      size_t len_data = tensorflow::DataTypeSize(dtype);
+      for (int i = 0; i < scalar_shape.nbDims; i++)
+        len_data *= scalar_shape.d[i];
+      size_t len_tensor = weights_tensor.int_val_size() * sizeof(int32);
+      len_data = std::max(len_data, len_tensor);
+      ctx.weight_store()->store_.push_back(std::vector<uint8_t>(len_data));
+      void* dst = static_cast<void*>(&(ctx.weight_store()->store_.back()[0]));
+      std::vector<int32> tensor_data(
+          weights_tensor.int_val().begin(),
+          weights_tensor.int_val()
+              .end());  //  make a local copy first to flatten
+                        //  doesn't have to be contiguous
+      memcpy(dst, tensor_data.data(), len_tensor);  // store into weight store
+      weights = TRT_ShapedWeights(dtype, dst, scalar_shape);
+    }
  } else if (!weights_tensor.tensor_content().empty()) {
-    //  obsolete method.
-    //  After optimization path, we do not see weights in this format.
-    //  fp16 conversion technically should be needed here.
    VLOG(2) << "TENSOR!!!" << node_def.name();
    const auto& content = weights_tensor.tensor_content();

@ -1760,6 +1784,8 @@ tensorflow::Status ConvertConcat(Converter& ctx,
  TRT_ShapedWeights axis = inputs.at(input_size).weights();

  TFAttrs attrs(node_def);
+  // auto attr_size = attrs.at("N")->i();
+  // auto data_type = attrs.get<nvinfer1::DataType>("T");
  auto index_type = attrs.get<tensorflow::DataType>("Tidx");

  // TODO(jie): handle data type
@ -1849,103 +1875,71 @@ tensorflow::Status ConvertFusedBatchNorm(
        "only is_training=false is supported, at " + node_def.name());
  }
  nvinfer1::ITensor const* tensor = inputs.at(0).tensor();
-
-  //  Check parameter types
-  auto parameter_type = inputs.at(1).weights().type_;
-  if ((parameter_type != tensorflow::DataType::DT_FLOAT) &&
-      (parameter_type != tensorflow::DataType::DT_HALF)) {
-    return tensorflow::errors::Unimplemented(
-        "only float32 or float16 weight data type is supported, for node " +
-        node_def.name() + " got " + tensorflow::DataTypeString(parameter_type));
-  }
-  for (int i = 1; i < 5; i++) {
-    if (inputs.at(i).weights().type_ != parameter_type) {
-      return tensorflow::errors::Unimplemented(
-          "Inconsistent parameter type for batchnormis not supported, at: " +
-          node_def.name());
-    }
-  }
-
-  TRT_ShapedWeights dummy_power_weights(parameter_type);
-  size_t nweight = 0;
-  for (int i = 1; i < 5; i++) {
-    nweight = std::max(nweight, (size_t)inputs.at(i).weights().count());
-  }
-  TRT_ShapedWeights* ptr_shape_weights = nullptr;
-  for (int i = 1; i < 5; i++) {
-    if (inputs.at(i).weights().count() == nweight) {
-      ptr_shape_weights =
-          const_cast<TRT_ShapedWeights*>(&(inputs.at(i).weights()));
-    } else if (inputs.at(i).weights().count() != 1) {
-      return tensorflow::errors::InvalidArgument(
-          "Inconsistent batchnorm parameter count, at: " + node_def.name());
-    }
-  }
-  //  We could technically have two weights with different shape.
-  //  that requires two addScale op, arguably less performant
+  TRT_ShapedWeights scale_weights = inputs.at(1).weights();
+  TRT_ShapedWeights offset_weights = inputs.at(2).weights();
+  TRT_ShapedWeights mean_weights = inputs.at(3).weights();
+  TRT_ShapedWeights variance_weights = inputs.at(4).weights();
+  TRT_ShapedWeights dummy_power_weights(scale_weights.type_);
  TRT_ShapedWeights combined_scale_weights =
-      ctx.get_temp_weights_like(*ptr_shape_weights);
+      ctx.get_temp_weights_like(scale_weights);
  TRT_ShapedWeights combined_offset_weights =
-      ctx.get_temp_weights_like(*ptr_shape_weights);
-
-  const Eigen::half* cast_vals_array[4];
-  const float* vals_array[4];
-  for (int j = 0; j < 4; j++) {
-    cast_vals_array[j] =
-        static_cast<Eigen::half const*>(inputs.at(j + 1).weights().GetValues());
-    vals_array[j] =
-        static_cast<float const*>(inputs.at(j + 1).weights().GetValues());
-  }
-  Eigen::half* cast_combined_scale_vals = const_cast<Eigen::half*>(
-      static_cast<Eigen::half const*>(combined_scale_weights.GetValues()));
-  Eigen::half* cast_combined_offset_vals = const_cast<Eigen::half*>(
-      static_cast<Eigen::half const*>(combined_offset_weights.GetValues()));
-  float* combined_scale_vals = const_cast<float*>(
-      static_cast<float const*>(combined_scale_weights.GetValues()));
-  float* combined_offset_vals = const_cast<float*>(
-      static_cast<float const*>(combined_offset_weights.GetValues()));
-
-  for (size_t i = 0; i < nweight; ++i) {
-    float batchnorm_data[4];
-    for (int j = 0; j < 4; j++) {
-      if (inputs.at(j + 1).weights().count() != 1) {
-        if (parameter_type == tensorflow::DT_FLOAT) {
-          batchnorm_data[j] = vals_array[j][i];
-        } else if (parameter_type == tensorflow::DT_HALF) {
-          batchnorm_data[j] =
-              Eigen::half_impl::half_to_float(cast_vals_array[j][i]);
-        }
-      } else {
-        if (parameter_type == tensorflow::DT_FLOAT) {
-          batchnorm_data[j] = vals_array[j][0];
-        } else if (parameter_type == tensorflow::DT_HALF) {
-          batchnorm_data[j] =
-              Eigen::half_impl::half_to_float(cast_vals_array[j][0]);
-        }
+      ctx.get_temp_weights_like(offset_weights);
+  size_t nweight = scale_weights.count();
+  if ((scale_weights.type_ == offset_weights.type_) &&
+      (mean_weights.type_ == variance_weights.type_) &&
+      (scale_weights.type_ == variance_weights.type_)) {
+    if ((scale_weights.type_ != tensorflow::DataType::DT_FLOAT) &&
+        (scale_weights.type_ != tensorflow::DataType::DT_HALF)) {
+      return tensorflow::errors::Unimplemented(
+          "only float32 or float16 weight data type is supported, for node " +
+          node_def.name() + " got " +
+          tensorflow::DataTypeString(scale_weights.type_));
+    }
+    if (scale_weights.type_ == tensorflow::DT_FLOAT) {
+      for (size_t i = 0; i < nweight; ++i) {
+        float scale = (static_cast<float const*>(scale_weights.GetValues()))[i];
+        float offset =
+            (static_cast<float const*>(offset_weights.GetValues()))[i];
+        float mean = (static_cast<float const*>(mean_weights.GetValues()))[i];
+        float variance =
+            (static_cast<float const*>(variance_weights.GetValues()))[i];
+        float& combined_scale_ref = const_cast<float*>(
+            static_cast<float const*>(combined_scale_weights.GetValues()))[i];
+        float& combined_offset_ref = const_cast<float*>(
+            static_cast<float const*>(combined_offset_weights.GetValues()))[i];
+        combined_scale_ref = scale / sqrtf(variance + epsilon);
+        combined_offset_ref = offset - mean * combined_scale_ref;
+      }
+    } else {
+      const Eigen::half* scale_vals =
+          (static_cast<Eigen::half const*>(scale_weights.GetValues()));
+      const Eigen::half* off_vals =
+          (static_cast<Eigen::half const*>(offset_weights.GetValues()));
+      const Eigen::half* mean_vals =
+          (static_cast<Eigen::half const*>(mean_weights.GetValues()));
+      const Eigen::half* variance_vals =
+          (static_cast<Eigen::half const*>(variance_weights.GetValues()));
+      Eigen::half* comb_scale_vals = const_cast<Eigen::half*>(
+          static_cast<Eigen::half const*>(combined_scale_weights.GetValues()));
+      Eigen::half* comb_off_vals = const_cast<Eigen::half*>(
+          static_cast<Eigen::half const*>(combined_offset_weights.GetValues()));
+      for (size_t i = 0; i < nweight; ++i) {
+        float scale(scale_vals[i]);
+        float offset(off_vals[i]);
+        float mean(mean_vals[i]);
+        float variance(variance_vals[i]);
+        float combined_scale_ref = scale / sqrtf(variance + epsilon);
+        comb_scale_vals[i] = Eigen::half(combined_scale_ref);
+        float combined_offset_ref = offset - mean * combined_scale_ref;
+        comb_off_vals[i] = Eigen::half(combined_offset_ref);
      }
    }
-    float scale = batchnorm_data[0];
-    float offset = batchnorm_data[1];
-    float mean = batchnorm_data[2];
-    float variance = batchnorm_data[3];
-    float combined_scale_val = scale / sqrtf(variance + epsilon);
-    float combined_offset_val = offset - mean * combined_scale_val;
-    if (parameter_type == tensorflow::DT_FLOAT) {
-      combined_scale_vals[i] = combined_scale_val;
-      combined_offset_vals[i] = combined_offset_val;
-    } else if (parameter_type == tensorflow::DT_HALF) {
-      cast_combined_scale_vals[i] = Eigen::half(combined_scale_val);
-      cast_combined_offset_vals[i] = Eigen::half(combined_offset_val);
-    }
  }
-
-  nvinfer1::ScaleMode mode = nweight == 1 ? nvinfer1::ScaleMode::kUNIFORM
-                                          : nvinfer1::ScaleMode::kCHANNEL;
-  nvinfer1::IScaleLayer* layer =
-      ctx.network()->addScale(*const_cast<nvinfer1::ITensor*>(tensor), mode,
-                              combined_offset_weights.GetWeightsForTRT(),
-                              combined_scale_weights.GetWeightsForTRT(),
-                              dummy_power_weights.GetWeightsForTRT());
+  nvinfer1::IScaleLayer* layer = ctx.network()->addScale(
+      *const_cast<nvinfer1::ITensor*>(tensor), nvinfer1::ScaleMode::kCHANNEL,
+      combined_offset_weights.GetWeightsForTRT(),
+      combined_scale_weights.GetWeightsForTRT(),
+      dummy_power_weights.GetWeightsForTRT());
  nvinfer1::ITensor* output_tensor = layer->getOutput(0);
  outputs->push_back(TRT_TensorOrWeights(output_tensor));
  return tensorflow::Status::OK();
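
Reviewer note on the `ConvertFusedBatchNorm` hunk above: both the removed and the restored code fold the four batch-norm parameters into a single per-channel scale and offset, using `combined_scale = scale / sqrt(variance + epsilon)` and `combined_offset = offset - mean * combined_scale`. The sketch below restates that arithmetic in plain Python so the folding can be checked by hand. The function name and list-based inputs are assumptions made for illustration.

```python
import math


def fold_batch_norm(scale, offset, mean, variance, epsilon):
    """Fold per-channel batch-norm parameters into (scale, offset) lists."""
    combined_scale = [
        s / math.sqrt(v + epsilon) for s, v in zip(scale, variance)
    ]
    combined_offset = [
        o - m * cs for o, m, cs in zip(offset, mean, combined_scale)
    ]
    return combined_scale, combined_offset


def batch_norm(x, scale, offset, mean, variance, epsilon):
    """Reference inference-mode batch norm for a single channel value."""
    return scale * (x - mean) / math.sqrt(variance + epsilon) + offset


cs, co = fold_batch_norm([2.0], [1.0], [3.0], [4.0], 0.0)
folded = 5.0 * cs[0] + co[0]                       # scale layer applied to x = 5
reference = batch_norm(5.0, 2.0, 1.0, 3.0, 4.0, 0.0)
```

The folded form gives the same result as the reference formula, which is what allows a single scale layer to replace the batch-norm op. Note that the restored code takes `nweight` from the scale weights alone and always uses `kCHANNEL` mode, whereas the removed code also handled single-element parameters via `kUNIFORM`.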
@ -2056,7 +2050,6 @@ void Converter::register_op_converters() {
  op_registry_["Const"] = ConvertConst;
  // TODO(ben,jie): this is a temp hack.
  op_registry_["Identity"] = ConvertIdentity;  // Identity should be removed
-  op_registry_["Snapshot"] = ConvertIdentity;  // Snapshot should be removed

  // resnet_50_v1 slim implementation
  op_registry_["Add"] = ConvertBinary;
@ -2150,11 +2143,8 @@ tensorflow::Status ConvertCalibrationNodeToEngineNode(
  calib_res->thr_->join();
  delete calib_res->thr_;
  if (!calib_res->engine_) {
-    LOG(ERROR) << "Calibration failed!, engine does not exist. Did you run "
+    LOG(FATAL) << "Calibration failed!, engine is nullptr. Did you run "
                  "calibration graph?";
-    return tensorflow::errors::FailedPrecondition(
-        "Calibration graph needs to be executed on"
-        " calibration data before convertsion to inference graph");
  }
  auto weight_rmgr = trt_rm->getManager("WeightStore");
  TF_CHECK_OK(weight_rmgr->Delete<tensorflow::tensorrt::TRTWeightStore>(
@ -2191,7 +2181,7 @@ tensorflow::Status ConvertCalibrationNodeToEngineNode(
    return status;
  }
  auto trt_engine_node = graph.AddNode(engine_node, &status);
-  TF_RETURN_IF_ERROR(status);
+  TF_CHECK_OK(status);
  for (size_t i = 0; i < out_edges.size(); i++) {
    VLOG(1) << "Connecting trt_engine_node output " << i << " with "
            << out_edges.at(i)->dst()->name() << " port "
@ -2289,12 +2279,6 @@ tensorflow::Status InjectCalibrationNode(tensorrt::convert::SubGraphParams& s) {
    input_dtypes.push_back(tf_dtype);

    nvinfer1::DataType dtype(nvinfer1::DataType::kFLOAT);
-    auto type_status = ConvertDType(tf_dtype, &dtype);
-    if (type_status != tensorflow::Status::OK()) {
-      LOG(WARNING) << "Data type conversion for input '" << node_name
-                   << "' failed";
-      return type_status;
-    }
    TF_CHECK_OK(ConvertDType(tf_dtype, &dtype));

    VLOG(2) << "accessing output index of: " << output_idx
@ -2362,8 +2346,8 @@ tensorflow::Status InjectCalibrationNode(tensorrt::convert::SubGraphParams& s) {
    output_names.push_back(tensor_name);
    auto tensor_or_weights = converter.get_tensor(tensor_name);
    if (!tensor_or_weights.is_tensor()) {
-      return tensorflow::errors::InvalidArgument("Output node'" + tensor_name +
-                                                 "' is weights not tensor");
+      return tensorflow::errors::InvalidArgument(
+          "Output node is weights not tensor");
    }
    nvinfer1::ITensor* tensor = tensor_or_weights.tensor();
    if (!tensor) {
@ -2520,11 +2504,7 @@ tensorflow::Status ConvertSubGraphToTensorRTNodeDef(
    input_dtypes.push_back(tf_dtype);

    nvinfer1::DataType dtype(nvinfer1::DataType::kFLOAT);
-    auto type_status = ConvertDType(tf_dtype, &dtype);
-    if (type_status != tensorflow::Status::OK()) {
-      LOG(WARNING) << "Type conversion failed for " << node_name;
-      return type_status;
-    }
+    TF_CHECK_OK(ConvertDType(tf_dtype, &dtype));

    VLOG(2) << "Accessing output index of: " << output_idx
            << ", at node: " << node_name
@ -2535,12 +2515,8 @@ tensorflow::Status ConvertSubGraphToTensorRTNodeDef(

    // TODO(jie): TRT 3.x only support 4 dimensional input tensor.
    //            update the code once TRT 4.0 comes out.
-    if (op_info.shape().dim_size() != 4) {
-      string err_str = "Require 4 dimensional input.";
-      StrAppend(&err_str, " Got ", op_info.shape().dim_size(), " ",
-                shape_inference_node_name);
-      return tensorflow::errors::Unimplemented(err_str);
-    }
+    if (op_info.shape().dim_size() != 4)
+      return tensorflow::errors::Unimplemented("require 4 dimensional input");

    for (int i = 1; i < op_info.shape().dim_size(); i++) {
      VLOG(2) << "dimension: " << i
@ -2601,8 +2577,8 @@ tensorflow::Status ConvertSubGraphToTensorRTNodeDef(
    output_names.push_back(tensor_name);
    auto tensor_or_weights = converter.get_tensor(tensor_name);
    if (!tensor_or_weights.is_tensor()) {
-      return tensorflow::errors::InvalidArgument("Output node '" + tensor_name +
-                                                 "' is weights not tensor");
+      return tensorflow::errors::InvalidArgument(
+          "Output node is weights not tensor");
    }
    nvinfer1::ITensor* tensor = tensor_or_weights.tensor();
    if (!tensor) {
@ -2646,8 +2622,7 @@ tensorflow::Status ConvertSubGraphToTensorRTNodeDef(
  }
  TF_RETURN_IF_ERROR(weight_rmgr->Delete<tensorflow::tensorrt::TRTWeightStore>(
      engine_name, engine_name));
-  LOG(INFO) << "finished engine " << engine_name << " containing "
-            << s.subgraph_node_ids.size() << " nodes";
+  LOG(INFO) << "finished engine " << engine_name;

  // Build the TRT op
  tensorflow::NodeDefBuilder op_builder(engine_name, "TRTEngineOp");
--- a/tensorflow/contrib/tensorrt/segment/segment.cc
+++ b/tensorflow/contrib/tensorrt/segment/segment.cc
@ -80,20 +80,13 @@ void ContractEdge(tensorflow::Edge* edge, tensorflow::Graph* graph,
  std::vector<const tensorflow::Edge*> in_edges(dst->in_edges().begin(),
                                                dst->in_edges().end());
  for (const tensorflow::Edge* in_edge : in_edges) {
-    if (in_edge->IsControlEdge()) {
-      if (in_edge->src() != src) {
-        tensorflow::Edge* e = const_cast<tensorflow::Edge*>(in_edge);
-        graph->AddControlEdge(e->src(), src);
-      }
-    } else {
-      if (in_edge->src() != src) {
-        tensorflow::Edge* e = const_cast<tensorflow::Edge*>(in_edge);
-        if (e->src() == graph->source_node()) {
-          graph->AddEdge(e->src(), e->src_output(), src,
-                         tensorflow::Graph::kControlSlot);
-        } else {
-          graph->AddEdge(e->src(), e->src_output(), src, 0 /* input index */);
-        }
+    if (in_edge->src() != src) {
+      tensorflow::Edge* e = const_cast<tensorflow::Edge*>(in_edge);
+      if (e->src() == graph->source_node()) {
+        graph->AddEdge(e->src(), e->src_output(), src,
+                       tensorflow::Graph::kControlSlot);
+      } else {
+        graph->AddEdge(e->src(), e->src_output(), src, 0 /* input index */);
      }
    }
  }
@ -101,19 +94,12 @@ void ContractEdge(tensorflow::Edge* edge, tensorflow::Graph* graph,
  std::vector<const tensorflow::Edge*> out_edges(dst->out_edges().begin(),
                                                 dst->out_edges().end());
  for (const tensorflow::Edge* out_edge : out_edges) {
-    if (out_edge->IsControlEdge()) {
-      tensorflow::Edge* e = const_cast<tensorflow::Edge*>(out_edge);
-      graph->AddControlEdge(src, e->dst());
+    tensorflow::Edge* e = const_cast<tensorflow::Edge*>(out_edge);
+    if (e->dst() == graph->sink_node()) {
+      graph->AddEdge(src, tensorflow::Graph::kControlSlot, e->dst(),
+                     e->dst_input());
    } else {
-      tensorflow::Edge* e = const_cast<tensorflow::Edge*>(out_edge);
-      if (e->dst() == graph->sink_node()) {
-        VLOG(1) << " edge to sink node " << src->name() << " -> "
-                << e->dst()->name();
-        graph->AddEdge(src, tensorflow::Graph::kControlSlot, e->dst(),
-                       e->dst_input());
-      } else {
-        graph->AddEdge(src, 0 /* output index */, e->dst(), e->dst_input());
-      }
+      graph->AddEdge(src, 0 /* output index */, e->dst(), e->dst_input());
    }
  }

@ -132,7 +118,7 @@ void ContractEdge(tensorflow::Edge* edge, tensorflow::Graph* graph,

 tensorflow::Status SegmentGraph(
    const tensorflow::GraphDef& gdef,
-    const std::function<bool(const tensorflow::Node*)>& candidate_fn,
+    const std::function<bool(const tensorflow::NodeDef&)>& candidate_fn,
    const SegmentOptions& options, SegmentNodesVector* segments) {
  // Create a Graph representation of the GraphDef.
  tensorflow::FunctionLibraryDefinition flib(tensorflow::OpRegistry::Global(),
@ -150,7 +136,7 @@ tensorflow::Status SegmentGraph(
  for (int i = 0; i < graph.num_node_ids(); ++i) {
    tensorflow::Node* node = graph.FindNodeId(i);
    if (options.exclude_node_list.count(node->name()) != 0 ||
-        !candidate_fn(node)) {
+        !candidate_fn(node->def())) {
      node = nullptr;
    }
    node_segments.emplace_back(node);
@ -169,7 +155,7 @@ tensorflow::Status SegmentGraph(

  for (const tensorflow::Node* node : order) {
    // All output nodes of 'node' have been visited...
-    VLOG(2) << "Trying node " << node->name() << " id=" << node->id();
+    VLOG(2) << "Trying node " << node->name();

    // 'node' must be a TRT candidate...
    if (node_segments[node->id()].Value() == nullptr) {
@ -183,12 +169,8 @@ tensorflow::Status SegmentGraph(
    while (true) {
      std::set<const tensorflow::Edge*> contract_edges;
      for (const tensorflow::Edge* out_edge : node->out_edges()) {
-        VLOG(2) << "... out node " << out_edge->dst()->name() << " ( "
-                << out_edge->dst()->id() << " <- " << node->id() << " )";
-        if (out_edge->IsControlEdge()) {
-          VLOG(2) << "... ... Control Edge, Skipping";
-          continue;
-        }
+        VLOG(2) << "... out node " << out_edge->dst()->name();
+
        // Out node must be TRT candidate...
        if (node_segments[out_edge->dst()->id()].Value() == nullptr) {
          VLOG(2) << "... ... not a TRT candidate";
@ -214,8 +196,7 @@ tensorflow::Status SegmentGraph(
        const tensorflow::Node* src = contract_edge->src();
        const tensorflow::Node* dst = contract_edge->dst();

-        VLOG(2) << "Merge " << src->name() << " <- " << dst->name() << " ("
-                << src->id() << " <- " << dst->id();
+        VLOG(2) << "Merge " << src->name() << " <- " << dst->name();
        node_segments[src->id()].Merge(&node_segments[dst->id()]);

        // Contracting the edge leaves disconnected graph edges.
--- a/tensorflow/contrib/tensorrt/segment/segment.h
+++ b/tensorflow/contrib/tensorrt/segment/segment.h
@ -20,12 +20,10 @@ limitations under the License.
 #include <vector>

 #include "tensorflow/core/framework/graph.pb.h"
-#include "tensorflow/core/graph/graph.h"
 #include "tensorflow/core/lib/core/status.h"
 #include "tensorflow/core/platform/types.h"

 namespace tensorflow {
-
 namespace tensorrt {
 namespace segment {

@ -48,7 +46,7 @@ struct SegmentOptions {
 // @return the status.
 tensorflow::Status SegmentGraph(
    const tensorflow::GraphDef& gdef,
-    const std::function<bool(const tensorflow::Node*)>& candidate_fn,
+    const std::function<bool(const tensorflow::NodeDef&)>& candidate_fn,
    const SegmentOptions& options, SegmentNodesVector* segments);

 }  // namespace segment
--- a/tensorflow/contrib/tensorrt/segment/segment_test.cc
+++ b/tensorflow/contrib/tensorrt/segment/segment_test.cc
@ -35,7 +35,7 @@ class SegmentTest : public ::testing::Test {
  TF_Operation* Add(TF_Operation* l, TF_Operation* r, TF_Graph* graph,
                    TF_Status* s, const char* name);

-  std::function<bool(const Node*)> MakeCandidateFn(
+  std::function<bool(const NodeDef&)> MakeCandidateFn(
      const std::set<string>& node_names);

 protected:
@ -60,10 +60,10 @@ bool SegmentTest::GetGraphDef(TF_Graph* graph,
  return ret;
 }

-std::function<bool(const Node*)> SegmentTest::MakeCandidateFn(
+std::function<bool(const NodeDef&)> SegmentTest::MakeCandidateFn(
    const std::set<string>& node_names) {
-  return [node_names](const Node* node) -> bool {
-    return node_names.find(node->name()) != node_names.end();
+  return [node_names](const NodeDef& node) -> bool {
+    return node_names.find(node.name()) != node_names.end();
  };
 }

--- a/tensorflow/contrib/timeseries/python/timeseries/ar_model.py
+++ b/tensorflow/contrib/timeseries/python/timeseries/ar_model.py
@ -70,7 +70,7 @@ class ARModel(model.TimeSeriesModel):
      input_window_size: Number of past time steps of data to look at when doing
        the regression.
      output_window_size: Number of future time steps to predict. Note that
-        setting it to > 1 empirically seems to give a better fit.
+        setting it to > 1 empiricaly seems to give a better fit.
      num_features: number of input features per time step.
      num_time_buckets: Number of buckets into which to divide (time %
        periodicity) for generating time based features.
--- a/tensorflow/contrib/timeseries/python/timeseries/math_utils.py
+++ b/tensorflow/contrib/timeseries/python/timeseries/math_utils.py
@ -185,7 +185,7 @@ def batch_matrix_pow(matrices, powers):
                    { matmul(A, power(matmul(A, A), (p - 1) / 2)) for odd p
      power(A, 0) = I

-    The power(A, 0) = I case is handled by starting with accumulator set to the
+    The power(A, 0) = I case is handeled by starting with accumulator set to the
    identity matrix; matrices with zero residual powers are passed through
    unchanged.

--- a/tensorflow/contrib/timeseries/python/timeseries/state_space_models/varma.py
+++ b/tensorflow/contrib/timeseries/python/timeseries/state_space_models/varma.py
@ -107,7 +107,7 @@ class VARMA(state_space_model.StateSpaceModel):

    Returns:
      the state transition matrix. It has shape
-        [self.state_dimension, self.state_dimension].
+        [self.state_dimendion, self.state_dimension].
    """
    # Pad any unused AR blocks with zeros. The extra state is necessary if
    # ma_order >= ar_order.
@ -127,7 +127,7 @@ class VARMA(state_space_model.StateSpaceModel):

    Returns:
      the state noise transform matrix. It has shape
-        [self.state_dimension, self.num_features].
+        [self.state_dimendion, self.num_features].
    """
    # Noise is broadcast, through the moving average coefficients, to
    # un-observed parts of the latent state.
--- a/tensorflow/core/api_def/base_api/api_def_MatrixSolveLs.pbtxt
+++ b/tensorflow/core/api_def/base_api/api_def_MatrixSolveLs.pbtxt
@ -49,14 +49,14 @@ in the batch:
 If `fast` is `True`, then the solution is computed by solving the normal
 equations using Cholesky decomposition. Specifically, if \\(m \ge n\\) then
 \\(X = (A^H A + \lambda I)^{-1} A^H B\\), which solves the least-squares
-problem \\(X = \mathrm{argmin}_{Z \in \Re^{n \times k} } ||A Z - B||_F^2 + \lambda ||Z||_F^2\\). 
-If \\(m \lt n\\) then `output` is computed as
+problem \\(X = \mathrm{argmin}_{Z \in \Re^{n \times k} } ||A Z - B||_F^2 +
+\lambda ||Z||_F^2\\). If \\(m \lt n\\) then `output` is computed as
 \\(X = A^H (A A^H + \lambda I)^{-1} B\\), which (for \\(\lambda = 0\\)) is the
 minimum-norm solution to the under-determined linear system, i.e.
 \\(X = \mathrm{argmin}_{Z \in \mathbb{C}^{n \times k} } ||Z||_F^2 \\),
 subject to \\(A Z = B\\). Notice that the fast path is only numerically stable
 when \\(A\\) is numerically full rank and has a condition number
-\\(\mathrm{cond}(A) \lt \frac{1}{\sqrt{\epsilon_{mach} } }\\) or \\(\lambda\\) is
+\\(\mathrm{cond}(A) \lt \frac{1}{\sqrt{\epsilon_{mach} } }\\) or\\(\lambda\\) is
 sufficiently large.

 If `fast` is `False` an algorithm based on the numerically robust complete
--- a/tensorflow/core/common_runtime/mkl_cpu_allocator.cc
+++ b/tensorflow/core/common_runtime/mkl_cpu_allocator.cc
@ -19,6 +19,9 @@ limitations under the License.

 namespace tensorflow {

+constexpr const char* MklCPUAllocator::kMaxLimitStr;
+constexpr const size_t MklCPUAllocator::kDefaultMaxLimit;
+
 }  // namespace tensorflow

 #endif  // INTEL_MKL
--- a/tensorflow/core/framework/common_shape_fns.cc
+++ b/tensorflow/core/framework/common_shape_fns.cc
@ -1210,7 +1210,7 @@ Status ConcatV2Shape(InferenceContext* c) {
                           c->num_inputs() - 1 /* dim_index */);
 }

-Status BroadcastBinaryOpOutputShapeFn(InferenceContext* c, int output_index) {
+Status BroadcastBinaryOpShapeFn(InferenceContext* c) {
  ShapeHandle shape_x = c->input(0);
  ShapeHandle shape_y = c->input(1);
  if (!c->RankKnown(shape_x) || !c->RankKnown(shape_y)) {
@ -1272,7 +1272,7 @@ Status BroadcastBinaryOpOutputShapeFn(InferenceContext* c, int output_index) {
    }
  }

-  c->set_output(output_index, c->MakeShape(dims));
+  c->set_output(0, c->MakeShape(dims));
  return Status::OK();
 }

--- a/tensorflow/core/framework/common_shape_fns.h
+++ b/tensorflow/core/framework/common_shape_fns.h
@ -265,15 +265,9 @@ Status ConcatShape(shape_inference::InferenceContext* c,
 // Shape function for concat operations.
 Status ConcatV2Shape(shape_inference::InferenceContext* c);

-// Shape function for binary operators that broadcast their inputs
-// and with output to output_index.
-Status BroadcastBinaryOpOutputShapeFn(InferenceContext* c, int output_index);
-
 // Shape function for binary operators that broadcast their inputs.
 // Tested by ops/math_ops_test.cc.
-inline Status BroadcastBinaryOpShapeFn(InferenceContext* c) {
-  return BroadcastBinaryOpOutputShapeFn(c, 0);
-}
+Status BroadcastBinaryOpShapeFn(InferenceContext* c);

 // Shape function for random operations.
 Status RandomShape(shape_inference::InferenceContext* c);
--- a/tensorflow/core/framework/shape_inference.h
+++ b/tensorflow/core/framework/shape_inference.h
@ -317,7 +317,6 @@ class InferenceContext {
    input_tensors_as_shapes_ = input_tensors_as_shapes;
  }

-  ShapeHandle output(int64 idx) const { return outputs_[idx]; }
  void set_output(int idx, ShapeHandle shape) { outputs_[idx] = shape; }
  Status set_output(StringPiece output_name,
                    const std::vector<ShapeHandle>& shapes);
--- a/tensorflow/core/kernels/mkl_fused_batch_norm_op.cc
+++ b/tensorflow/core/kernels/mkl_fused_batch_norm_op.cc
@ -933,7 +933,7 @@ class MklFusedBatchNormOp : public OpKernel {
  bool is_training_;
  T* mean_values_;
  T* variance_values_;
-  int depth_;  // batch normalization is done for per channel.
+  size_t depth_;  // batch normalization is done for per channel.

  void ExtractParams(OpKernelContext* context) {
    const Tensor& input = MklGetInput(context, 0);
--- a/tensorflow/core/kernels/segment_reduction_ops.h
+++ b/tensorflow/core/kernels/segment_reduction_ops.h
@ -23,13 +23,6 @@ limitations under the License.
 // non-GPU targets. This only breaks in clang, because it's more strict for
 // template code and CudaAtomicMax is used in template context.

-// This file requires the following include because it uses CudaAtomicMax:
-// #include "tensorflow/core/util/cuda_kernel_helper.h"
-
-// Unfortunately we can't add the #include, since it breaks compilation for
-// non-GPU targets. This only breaks in clang, because it's more strict for
-// template code and CudaAtomicMax is used in template context.
-
 #include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
 #include "tensorflow/core/framework/tensor.h"
 #include "tensorflow/core/framework/tensor_shape.h"
--- a/tensorflow/core/kernels/snapshot_op.cc
+++ b/tensorflow/core/kernels/snapshot_op.cc
@ -22,26 +22,6 @@ limitations under the License.

 namespace tensorflow {
 typedef Eigen::ThreadPoolDevice CPUDevice;
-typedef Eigen::GpuDevice GPUDevice;
-
-template <typename Device, typename Scalar>
-class SnapshotOp : public OpKernel {
- public:
-  explicit SnapshotOp(OpKernelConstruction* context) : OpKernel(context) {}
-
-  void Compute(OpKernelContext* context) override {
-    const Tensor& input = context->input(0);
-    Tensor* output = nullptr;
-    // Try to use buffer forwarding to avoid an explicit copy.
-    OP_REQUIRES_OK(context, context->forward_input_or_allocate_output(
-                                {0}, 0, input.shape(), &output));
-    if (!output->SharesBufferWith(input)) {
-      functor::Snapshot<Device, Scalar> functor;
-      functor(context->eigen_device<Device>(), input.flat<Scalar>(),
-              output->flat<Scalar>());
-    }
-  }
-};

 #define REGISTER_KERNEL(TYPE)                                        \
  REGISTER_KERNEL_BUILDER(                                           \
@ -51,16 +31,6 @@ class SnapshotOp : public OpKernel {
 TF_CALL_POD_TYPES(REGISTER_KERNEL);
 #undef REGISTER_KERNEL

-#if GOOGLE_CUDA
-#define REGISTER_KERNEL(TYPE)                                        \
-  REGISTER_KERNEL_BUILDER(                                           \
-      Name("Snapshot").Device(DEVICE_GPU).TypeConstraint<TYPE>("T"), \
-      SnapshotOp<GPUDevice, TYPE>);
-
-TF_CALL_POD_TYPES(REGISTER_KERNEL);
-#undef REGISTER_KERNEL
-#endif
-
 #if TENSORFLOW_USE_SYCL
 typedef Eigen::SyclDevice SyclDevice;
 #define REGISTER_SYCL_KERNEL(TYPE)                                    \
--- a/tensorflow/core/kernels/snapshot_op.h
+++ b/tensorflow/core/kernels/snapshot_op.h
@ -26,19 +26,29 @@ limitations under the License.
 #include "tensorflow/core/framework/op_kernel.h"

 namespace tensorflow {
-namespace functor {

-// Functor used by SnapshotOp.
 template <typename Device, typename Scalar>
-struct Snapshot {
-  void operator()(const Device& device,
-                  typename TTypes<Scalar>::ConstTensor input,
-                  typename TTypes<Scalar>::Tensor output) {
-    device.memcpy(output.data(), input.data(), input.size() * sizeof(Scalar));
+class SnapshotOp : public OpKernel {
+ public:
+  explicit SnapshotOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+  void Compute(OpKernelContext* context) override {
+    const Tensor& input = context->input(0);
+    Tensor* output = nullptr;
+    // Try to use buffer forwarding to avoid an explicit copy.
+    OP_REQUIRES_OK(context, context->forward_input_or_allocate_output(
+                                {0}, 0, input.shape(), &output));
+    if (!output->SharesBufferWith(input)) {
+      // We had to allocate a new buffer since the refcount on the input was
+      // greater than 1. Copy the input to the new buffer.
+      const Device& device = context->eigen_device<Device>();
+      device.memcpy(output->template flat<Scalar>().data(),
+                    input.template flat<Scalar>().data(),
+                    input.NumElements() * sizeof(Scalar));
+    }
  }
 };

-}  // namespace functor
 }  // namespace tensorflow

 #endif  // TENSORFLOW_KERNELS_SNAPSHOT_OP_H_
--- a/tensorflow/core/kernels/snapshot_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/snapshot_op_gpu.cu.cc
@ -24,10 +24,13 @@ limitations under the License.
 namespace tensorflow {
 typedef Eigen::GpuDevice GPUDevice;

-// Definition of the GPU implementations declared in softsign_op.cc.
-#define DEFINE_GPU_KERNELS(T) template struct functor::Snapshot<GPUDevice, T>;
+#define REGISTER_KERNEL(TYPE)                                        \
+  REGISTER_KERNEL_BUILDER(                                           \
+      Name("Snapshot").Device(DEVICE_GPU).TypeConstraint<TYPE>("T"), \
+      SnapshotOp<GPUDevice, TYPE>);

-TF_CALL_POD_TYPES(DEFINE_GPU_KERNELS);
+TF_CALL_POD_TYPES(REGISTER_KERNEL);
+#undef REGISTER_KERNEL

 }  // namespace tensorflow

--- a/tensorflow/core/kernels/xent_op.cc
+++ b/tensorflow/core/kernels/xent_op.cc
@ -17,14 +17,12 @@ limitations under the License.

 #define EIGEN_USE_THREADS

+#include "tensorflow/core/kernels/xent_op.h"
 #include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
-
 #include "tensorflow/core/framework/op_kernel.h"
 #include "tensorflow/core/framework/register_types.h"
 #include "tensorflow/core/framework/tensor.h"
 #include "tensorflow/core/framework/tensor_shape.h"
-#include "tensorflow/core/kernels/xent_op.h"
-#include "tensorflow/core/util/bcast.h"

 namespace tensorflow {

@ -43,56 +41,37 @@ class SoftmaxXentWithLogitsOp : public OpKernel {
  void Compute(OpKernelContext* context) override {
    const Tensor& logits_in = context->input(0);
    const Tensor& labels_in = context->input(1);
-
-    TensorShape shape_in = logits_in.shape();
-
-    BCast bcast(BCast::FromShape(logits_in.shape()),
-                BCast::FromShape(labels_in.shape()));
-    if (!logits_in.IsSameSize(labels_in)) {
-      OP_REQUIRES(context, bcast.IsValid(),
-                  errors::InvalidArgument(
-                      "logits and labels must be broadcastable: logits_size=",
-                      logits_in.shape().DebugString(),
-                      " labels_size=", labels_in.shape().DebugString()));
-      shape_in = BCast::ToShape(bcast.output_shape());
-    }
-    OP_REQUIRES(context, TensorShapeUtils::IsMatrix(shape_in),
-                errors::InvalidArgument("logits and labels must be beither "
-                                        "2-dimensional, or roadcasted to "
-                                        "2-dimensional"));
+    OP_REQUIRES(context, logits_in.IsSameSize(labels_in),
+                errors::InvalidArgument(
+                    "logits and labels must be same size: logits_size=",
+                    logits_in.shape().DebugString(),
+                    " labels_size=", labels_in.shape().DebugString()));
+    OP_REQUIRES(context, TensorShapeUtils::IsMatrix(logits_in.shape()),
+                errors::InvalidArgument("logits must be 2-dimensional"));
+    // As we already tested that both inputs have the same shape no need to
+    // check that "labels" is a matrix too.

    // loss is 1-D (one per example), and size is batch_size.

    Tensor scratch;
    OP_REQUIRES_OK(
        context, context->allocate_temp(DataTypeToEnum<T>::value,
-                                        TensorShape({shape_in.dim_size(0), 1}),
+                                        TensorShape({logits_in.dim_size(0), 1}),
                                        &scratch));

    Tensor* loss_out = nullptr;
    OP_REQUIRES_OK(context,
                   context->allocate_output(
-                       0, TensorShape({shape_in.dim_size(0)}), &loss_out));
+                       0, TensorShape({logits_in.dim_size(0)}), &loss_out));
    Tensor* back_out = nullptr;
    // Try to reuse the logits_in buffer for the backprop output.
    OP_REQUIRES_OK(context, context->forward_input_or_allocate_output(
-                                {0}, 1, shape_in, &back_out));
-    if (shape_in.dim_size(0) > 0) {
+                                {0}, 1, logits_in.shape(), &back_out));
+    if (logits_in.dim_size(0) > 0) {
      functor::XentFunctor<Device, T> functor;
-      if (logits_in.IsSameSize(labels_in)) {
-        functor(context->eigen_device<Device>(), shape_in.AsEigenDSizes<2>(),
-                Eigen::array<Eigen::DenseIndex, 2>{1, 1},
-                Eigen::array<Eigen::DenseIndex, 2>{1, 1}, logits_in.matrix<T>(),
-                labels_in.matrix<T>(), scratch.matrix<T>(), loss_out->vec<T>(),
-                back_out->matrix<T>());
-      } else {
-        functor(context->eigen_device<Device>(), shape_in.AsEigenDSizes<2>(),
-                BCast::ToIndexArray<2>(bcast.x_bcast()),
-                BCast::ToIndexArray<2>(bcast.y_bcast()),
-                logits_in.template shaped<T, 2>(bcast.x_reshape()),
-                labels_in.template shaped<T, 2>(bcast.y_reshape()),
-                scratch.matrix<T>(), loss_out->vec<T>(), back_out->matrix<T>());
-      }
+      functor(context->eigen_device<Device>(), logits_in.matrix<T>(),
+              labels_in.matrix<T>(), scratch.matrix<T>(), loss_out->vec<T>(),
+              back_out->matrix<T>());
    }
  }
 };
@ -102,17 +81,13 @@ class SoftmaxXentWithLogitsOp : public OpKernel {
 namespace functor {
 template <typename Device, typename T>
 struct XentFunctorBase {
-  void operator()(const Device& d,
-                  const Eigen::DSizes<Eigen::DenseIndex, 2>& shape,
-                  const Eigen::array<Eigen::DenseIndex, 2>& logits_bcast,
-                  const Eigen::array<Eigen::DenseIndex, 2>& labels_bcast,
-                  typename TTypes<T>::ConstMatrix logits,
+  void operator()(const Device& d, typename TTypes<T>::ConstMatrix logits,
                  typename TTypes<T>::ConstMatrix labels,
                  typename TTypes<T>::Matrix scratch,
                  typename TTypes<T>::Vec loss,
                  typename TTypes<T>::Matrix backprop) {
-    XentEigenImpl<Device, T>::Compute(d, shape, logits_bcast, labels_bcast,
-                                      logits, labels, scratch, loss, backprop);
+    XentEigenImpl<Device, T>::Compute(d, logits, labels, scratch, loss,
+                                      backprop);
  }
 };

--- a/tensorflow/core/kernels/xent_op.h
+++ b/tensorflow/core/kernels/xent_op.h
@ -18,7 +18,6 @@ limitations under the License.
 // Functor definition for XentOp, must be compilable by nvcc.

 #include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
-
 #include "tensorflow/core/framework/tensor_types.h"

 namespace tensorflow {
@ -34,11 +33,7 @@ struct XentFunctor {
  // scratch: temporary tensor, dims: batch_size, 1
  // loss: output tensor for the loss, dims: batch_size.
  // backprop: output tensor for the backprop, dims: batch_size, num_classes.
-  void operator()(const Device &d,
-                  const Eigen::DSizes<Eigen::DenseIndex, 2> &shape,
-                  const Eigen::array<Eigen::DenseIndex, 2> &logits_bcast,
-                  const Eigen::array<Eigen::DenseIndex, 2> &labels_bcast,
-                  typename TTypes<T>::ConstMatrix logits,
+  void operator()(const Device& d, typename TTypes<T>::ConstMatrix logits,
                  typename TTypes<T>::ConstMatrix labels,
                  typename TTypes<T>::Matrix scratch,
                  typename TTypes<T>::Vec loss,
@ -50,11 +45,7 @@ struct XentFunctor {
 // specializations for both device types.
 template <typename Device, typename T>
 struct XentEigenImpl {
-  static void Compute(const Device &d,
-                      const Eigen::DSizes<Eigen::DenseIndex, 2> &shape,
-                      const Eigen::array<Eigen::DenseIndex, 2> &logits_bcast,
-                      const Eigen::array<Eigen::DenseIndex, 2> &labels_bcast,
-                      typename TTypes<T>::ConstMatrix logits,
+  static void Compute(const Device& d, typename TTypes<T>::ConstMatrix logits,
                      typename TTypes<T>::ConstMatrix labels,
                      typename TTypes<T>::Matrix scratch,
                      typename TTypes<T>::Vec loss,
@ -66,8 +57,8 @@ struct XentEigenImpl {
    const int kBatchDim = 0;
    const int kClassDim = 1;

-    const int batch_size = shape[kBatchDim];
-    const int num_classes = shape[kClassDim];
+    const int batch_size = logits.dimension(kBatchDim);
+    const int num_classes = logits.dimension(kClassDim);

 // These arrays are used to reduce along the class dimension, and broadcast
 // the resulting value to all classes.
@ -93,12 +84,10 @@ struct XentEigenImpl {
 #endif

    // max_logits along classes.
-    scratch.reshape(batch_only).device(d) =
-        logits.broadcast(logits_bcast).maximum(along_class);
+    scratch.reshape(batch_only).device(d) = logits.maximum(along_class);

    // logits - max_logits.
-    backprop.device(d) =
-        logits.broadcast(logits_bcast) - scratch.broadcast(one_by_class);
+    backprop.device(d) = logits - scratch.broadcast(one_by_class);

    // sum(exp(logits - max_logits)) along classes.
    scratch.reshape(batch_only).device(d) = backprop.exp().sum(along_class);
@ -110,15 +99,15 @@ struct XentEigenImpl {
    //  sum(-labels *
    //     ((logits - max_logits) - log(sum(exp(logits - max_logits)))))
    //  along classes
-    loss.device(d) = (labels.broadcast(labels_bcast) *
-                      (scratch.log().eval().broadcast(one_by_class) - backprop))
-                         .eval()
-                         .sum(along_class);
+    loss.device(d) =
+        (labels * (scratch.log().eval().broadcast(one_by_class) - backprop))
+            .eval()
+            .sum(along_class);

    // backprop: prob - labels, where
    //   prob = exp(logits - max_logits) / sum(exp(logits - max_logits))
-    backprop.device(d) = (backprop.exp() / scratch.broadcast(one_by_class)) -
-                         labels.broadcast(labels_bcast);
+    backprop.device(d) =
+        (backprop.exp() / scratch.broadcast(one_by_class)) - labels;
  }
 };

--- a/tensorflow/core/kernels/xent_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/xent_op_gpu.cu.cc
@ -31,17 +31,12 @@ typedef Eigen::GpuDevice GPUDevice;
 namespace functor {
 template <typename T>
 struct XentFunctor<GPUDevice, T> {
-  void operator()(const GPUDevice &d,
-                  const Eigen::DSizes<Eigen::DenseIndex, 2> &shape,
-                  const Eigen::array<Eigen::DenseIndex, 2> &logits_bcast,
-                  const Eigen::array<Eigen::DenseIndex, 2> &labels_bcast,
-                  typename TTypes<T>::ConstMatrix logits,
+  void operator()(const GPUDevice& d, typename TTypes<T>::ConstMatrix logits,
                  typename TTypes<T>::ConstMatrix labels,
                  typename TTypes<T>::Matrix scratch,
                  typename TTypes<T>::Vec loss,
                  typename TTypes<T>::Matrix backprop) {
-    XentEigenImpl<GPUDevice, T>::Compute(d, shape, logits_bcast, labels_bcast,
-                                         logits, labels, scratch, loss,
+    XentEigenImpl<GPUDevice, T>::Compute(d, logits, labels, scratch, loss,
                                         backprop);
  }
 };
--- a/tensorflow/core/ops/array_ops.cc
+++ b/tensorflow/core/ops/array_ops.cc
@ -794,35 +794,11 @@ REGISTER_OP("ReverseV2")
      ShapeHandle input = c->input(0);
      ShapeHandle axis;
      TF_RETURN_IF_ERROR(c->WithRank(c->input(1), 1, &axis));
+      // TODO(aselle): if input(0)'s dimension is known we could validate axis
      if (c->Rank(input) > 8) {
        return errors::InvalidArgument(
            "reverse does not work on tensors with more than 8 dimensions");
      }
-      const Tensor* axis_tensor = c->input_tensor(1);
-      if (axis_tensor != nullptr && c->RankKnown(input)) {
-        int32 rank = c->Rank(input);
-        std::vector<int64> axis_value;
-        if (axis_tensor->dtype() == DT_INT32) {
-          axis_value = AsInt64<int32>(axis_tensor, axis_tensor->NumElements());
-        } else {
-          axis_value = AsInt64<int64>(axis_tensor, axis_tensor->NumElements());
-        }
-        std::vector<bool> axes_dense(c->Rank(input), false);
-        for (int i = 0; i < axis_value.size(); i++) {
-          int64 canonical_axis =
-              axis_value[i] < 0 ? rank + axis_value[i] : axis_value[i];
-          if (canonical_axis < 0 || canonical_axis >= rank) {
-            return errors::InvalidArgument("'axis'[", i, "] = ", axis_value[i],
-                                           " is out of valid range [", 0, ", ",
-                                           rank - 1);
-          }
-          if (axes_dense[canonical_axis]) {
-            return errors::InvalidArgument("axis ", canonical_axis,
-                                           " specified more than once.");
-          }
-          axes_dense[canonical_axis] = true;
-        }
-      }
      c->set_output(0, input);
      return Status::OK();
    });
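
Reviewer note: the axis validation this rollback removes from the `ReverseV2` shape function amounts to canonicalizing negative axes, range-checking them against the rank, and rejecting duplicates. A minimal Python sketch of that logic follows. The function name `validate_reverse_axes` is hypothetical, and the error strings only approximate the ones in the removed C++ lines.

```python
def validate_reverse_axes(axes, rank):
    """Canonicalize `axes` against `rank`; raise ValueError if invalid.

    Hypothetical helper mirroring the removed shape-function check.
    """
    seen = [False] * rank
    canonical = []
    for i, axis in enumerate(axes):
        # Negative axes count from the end, as in the removed C++ code.
        canonical_axis = axis + rank if axis < 0 else axis
        if canonical_axis < 0 or canonical_axis >= rank:
            raise ValueError(
                "'axis'[%d] = %d is out of valid range [0, %d]"
                % (i, axis, rank - 1))
        if seen[canonical_axis]:
            raise ValueError(
                "axis %d specified more than once." % canonical_axis)
        seen[canonical_axis] = True
        canonical.append(canonical_axis)
    return canonical
```

With this check gone from shape inference, the same conditions are only reported when the op runs, which is why `testInvalid` in `array_ops_test.py` goes back to calling `.eval()` on constant axes.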
--- a/tensorflow/core/ops/nn_ops.cc
+++ b/tensorflow/core/ops/nn_ops.cc
@ -1062,27 +1062,12 @@ REGISTER_OP("SoftmaxCrossEntropyWithLogits")
    .Attr("T: {half, bfloat16, float, double}")
    .SetShapeFn([](InferenceContext* c) {
      ShapeHandle input;
-      if (c->WithRank(c->input(0), 2, &input) == Status::OK() &&
-          c->Merge(input, c->input(1), &input) == Status::OK()) {
-        DimensionHandle batch_size = c->Dim(input, 0);
-        c->set_output(0, c->Vector(batch_size));
-        c->set_output(1, input);
-        return Status::OK();
-      }
-      TF_RETURN_IF_ERROR(BroadcastBinaryOpOutputShapeFn(c, 1));
+      TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 2, &input));
+      TF_RETURN_IF_ERROR(c->Merge(input, c->input(1), &input));

-      if (!c->RankKnown(c->output(1))) {
-        return errors::InvalidArgument(
-            "Shape must be broadcasted with rank 2, but is rank is unknown.");
-      }
-
-      if (c->Rank(c->output(1)) != 2) {
-        return errors::InvalidArgument(
-            "Shape must be broadcasted with rank 2, but is rank ",
-            c->Rank(c->output(1)));
-      }
-      DimensionHandle batch_size = c->Dim(c->output(1), 0);
+      DimensionHandle batch_size = c->Dim(input, 0);
      c->set_output(0, c->Vector(batch_size));
+      c->set_output(1, input);
      return Status::OK();
    });

--- a/tensorflow/core/ops/nn_ops_test.cc
+++ b/tensorflow/core/ops/nn_ops_test.cc
@ -410,18 +410,10 @@ TEST(NNOpsTest, SoftmaxCrossEntropyWithLogits_ShapeFn) {
  INFER_OK(op, "[1,?];[?,2]", "[d0_0];[d0_0,d0_1|d1_1]");
  INFER_OK(op, "[?,2];[1,2]", "[d1_0];in1");

-  INFER_ERROR("Shape must be broadcasted with rank 2", op, "[1,2,3];?");
-  INFER_ERROR("Shape must be broadcasted with rank 2", op, "?;[1,2,3]");
-
-  // Broadcast example
-  // [1,4] and [2,4] are broadcasted to [2,4]
-  INFER_OK(op, "[1,4];[2,4]", "[d1_0];[d1_0,d0_1|d1_1]");
-  // [2,4] and [2,1] are broadcasted to [2,4]
-  INFER_OK(op, "[2,4];[2,1]", "[d0_0];[d0_0|d1_0,d0_1]");
-  // [1,?] and [2,4] are broadcasted to [2,4]
-  INFER_OK(op, "[1,?];[2,4]", "[d1_0];[d1_0,d0_1|d1_1]");
-  // [2,4] and [?,1] are broadcasted to [2,4]
-  INFER_OK(op, "[2,4];[?,1]", "[d0_0];[d0_0|d1_0,d0_1]");
+  INFER_ERROR("Dimension 0 in both shapes must be equal, but are 1 and 2", op,
+              "[1,?];[2,?]");
+  INFER_ERROR("Shape must be rank 2 but is rank 3", op, "[1,2,3];?");
+  INFER_ERROR("Shapes must be equal rank, but are 2 and 3", op, "?;[1,2,3]");
 }

 TEST(NNOpsTest, SparseSoftmaxCrossEntropyWithLogits_ShapeFn) {
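
Reviewer note: the restored shape function for `SoftmaxCrossEntropyWithLogits` is `WithRank(input(0), 2)` followed by `Merge` with `input(1)`, so broadcasting between logits and labels is rejected again. The Python sketch below models those two steps under stated assumptions: shapes are lists with `None` for an unknown dimension, an unknown-rank shape is `None`, and the helper names (`with_rank`, `merge`, `xent_output_shapes`) are hypothetical. The error strings are written to contain the substrings the updated `INFER_ERROR` expectations look for; the real messages may carry more detail.

```python
def with_rank(shape, rank):
    """Require `shape` to have `rank`; unknown rank becomes all-unknown dims."""
    if shape is None:
        return [None] * rank
    if len(shape) != rank:
        raise ValueError(
            "Shape must be rank %d but is rank %d" % (rank, len(shape)))
    return list(shape)


def merge(a, b):
    """Merge two shapes dimension by dimension; None means unknown."""
    if b is None:
        return list(a)
    if a is None:
        return list(b)
    if len(a) != len(b):
        raise ValueError(
            "Shapes must be equal rank, but are %d and %d" % (len(a), len(b)))
    merged = []
    for i, (da, db) in enumerate(zip(a, b)):
        if da is None:
            merged.append(db)
        elif db is None or da == db:
            merged.append(da)
        else:
            raise ValueError(
                "Dimension %d in both shapes must be equal, but are %d and %d"
                % (i, da, db))
    return merged


def xent_output_shapes(logits_shape, labels_shape):
    """Return (loss_shape, backprop_shape) as the restored shape fn would."""
    merged = merge(with_rank(logits_shape, 2), labels_shape)
    return [merged[0]], merged
```

For example, `xent_output_shapes([1, None], [None, 2])` yields a loss shape of `[1]` and a backprop shape of `[1, 2]`, while `[1, None]` against `[2, None]` fails on dimension 0 instead of broadcasting.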
--- a/tensorflow/core/public/version.h
+++ b/tensorflow/core/public/version.h
@ -19,12 +19,12 @@ limitations under the License.
 // TensorFlow uses semantic versioning, see http://semver.org/.

 #define TF_MAJOR_VERSION 1
-#define TF_MINOR_VERSION 7
+#define TF_MINOR_VERSION 6
 #define TF_PATCH_VERSION 0

 // TF_VERSION_SUFFIX is non-empty for pre-releases (e.g. "-alpha", "-alpha.1",
 // "-beta", "-rc", "-rc.1")
-#define TF_VERSION_SUFFIX "-rc1"
+#define TF_VERSION_SUFFIX ""

 #define TF_STR_HELPER(x) #x
 #define TF_STR(x) TF_STR_HELPER(x)
--- a/tensorflow/docs_src/api_guides/python/contrib.bayesflow.monte_carlo.md
+++ b/tensorflow/docs_src/api_guides/python/contrib.bayesflow.monte_carlo.md
@ -6,42 +6,42 @@ Monte Carlo integration and helpers.
 ## Background

 Monte Carlo integration refers to the practice of estimating an expectation with
-a sample mean.  For example, given random variable `Z in \\(R^k\\)` with density `p`,
+a sample mean.  For example, given random variable `Z in R^k` with density `p`,
 the expectation of function `f` can be approximated like:

 ```
-$$E_p[f(Z)] = \int f(z) p(z) dz$$
-$$          ~ S_n
-          := n^{-1} \sum_{i=1}^n f(z_i),  z_i\ iid\ samples\ from\ p.$$
+E_p[f(Z)] = \int f(z) p(z) dz
+          ~ S_n
+          := n^{-1} \sum_{i=1}^n f(z_i),  z_i iid samples from p.
 ```

-If `\\(E_p[|f(Z)|] < infinity\\)`, then `\\(S_n\\) --> \\(E_p[f(Z)]\\)` by the strong law of large
-numbers.  If `\\(E_p[f(Z)^2] < infinity\\)`, then `\\(S_n\\)` is asymptotically normal with
-variance `\\(Var[f(Z)] / n\\)`.
+If `E_p[|f(Z)|] < infinity`, then `S_n --> E_p[f(Z)]` by the strong law of large
+numbers.  If `E_p[f(Z)^2] < infinity`, then `S_n` is asymptotically normal with
+variance `Var[f(Z)] / n`.

 Practitioners of Bayesian statistics often find themselves wanting to estimate
-`\\(E_p[f(Z)]\\)` when the distribution `p` is known only up to a constant.  For
+`E_p[f(Z)]` when the distribution `p` is known only up to a constant.  For
 example, the joint distribution `p(z, x)` may be known, but the evidence
-`\\(p(x) = \int p(z, x) dz\\)` may be intractable.  In that case, a parameterized
-distribution family `\\(q_\lambda(z)\\)` may be chosen, and the optimal `\\(\lambda\\)` is the
-one minimizing the KL divergence between `\\(q_\lambda(z)\\)` and
-`\\(p(z | x)\\)`.  We only know `p(z, x)`, but that is sufficient to find `\\(\lambda\\)`.
+`p(x) = \int p(z, x) dz` may be intractable.  In that case, a parameterized
+distribution family `q_lambda(z)` may be chosen, and the optimal `lambda` is the
+one minimizing the KL divergence between `q_lambda(z)` and
+`p(z | x)`.  We only know `p(z, x)`, but that is sufficient to find `lambda`.


 ## Log-space evaluation and subtracting the maximum

 Care must be taken when the random variable lives in a high dimensional space.
-For example, the naive importance sample estimate `\\(E_q[f(Z) p(Z) / q(Z)]\\)`
-involves the ratio of two terms `\\(p(Z) / q(Z)\\)`, each of which must have tails
-dropping off faster than `\\(O(|z|^{-(k + 1)})\\)` in order to have finite integral.
+For example, the naive importance sample estimate `E_q[f(Z) p(Z) / q(Z)]`
+involves the ratio of two terms `p(Z) / q(Z)`, each of which must have tails
+dropping off faster than `O(|z|^{-(k + 1)})` in order to have finite integral.
 This ratio would often be zero or infinity up to numerical precision.

 For that reason, we write

 ```
-$$Log E_q[ f(Z) p(Z) / q(Z) ]$$
-$$   = Log E_q[ \exp\{Log[f(Z)] + Log[p(Z)] - Log[q(Z)] - C\} ] + C,$$  where
-$$C := Max[ Log[f(Z)] + Log[p(Z)] - Log[q(Z)] ].$$
+Log E_q[ f(Z) p(Z) / q(Z) ]
+   = Log E_q[ exp{Log[f(Z)] + Log[p(Z)] - Log[q(Z)] - C} ] + C,  where
+C := Max[ Log[f(Z)] + Log[p(Z)] - Log[q(Z)] ].
 ```

 The maximum value of the exponentiated term will be 0.0, and the expectation
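
Reviewer note: the "subtract the maximum" identity in the hunk above can be illustrated with a short Python sketch. It assumes the per-sample values `Log[f(z_i)] + Log[p(z_i)] - Log[q(z_i)]` have already been computed and are passed in as a list; the function name `log_mean_exp` is hypothetical and is not the `tf.contrib.bayesflow` API.

```python
import math


def log_mean_exp(log_values):
    """Compute log(mean(exp(v))) without overflow or underflow.

    `log_values` holds per-sample log-space terms. Subtracting
    C = max(log_values) makes the largest exponent exactly 0.
    """
    c = max(log_values)
    mean_exp = sum(math.exp(v - c) for v in log_values) / len(log_values)
    return math.log(mean_exp) + c


# Exponentiating these directly would overflow (math.exp(1000.0) raises
# OverflowError) or underflow to zero, but the shifted form stays finite.
large = log_mean_exp([1000.0, 1000.0])
small = log_mean_exp([-1000.0, -1000.0])
```

After the shift, at least one term in the sum equals `exp(0) = 1`, so the mean is never zero and the final `log` is always defined.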
--- a/tensorflow/docs_src/api_guides/python/contrib.losses.md
+++ b/tensorflow/docs_src/api_guides/python/contrib.losses.md
@ -107,19 +107,19 @@ weighted average over the individual prediction errors:
  loss = tf.contrib.losses.mean_squared_error(predictions, depths, weight)
 ```

-* @{tf.contrib.losses.absolute_difference}
-* @{tf.contrib.losses.add_loss}
-* @{tf.contrib.losses.hinge_loss}
-* @{tf.contrib.losses.compute_weighted_loss}
-* @{tf.contrib.losses.cosine_distance}
-* @{tf.contrib.losses.get_losses}
-* @{tf.contrib.losses.get_regularization_losses}
-* @{tf.contrib.losses.get_total_loss}
-* @{tf.contrib.losses.log_loss}
-* @{tf.contrib.losses.mean_pairwise_squared_error}
-* @{tf.contrib.losses.mean_squared_error}
-* @{tf.contrib.losses.sigmoid_cross_entropy}
-* @{tf.contrib.losses.softmax_cross_entropy}
-* @{tf.contrib.losses.sparse_softmax_cross_entropy}
+@{tf.contrib.losses.absolute_difference}
+@{tf.contrib.losses.add_loss}
+@{tf.contrib.losses.hinge_loss}
+@{tf.contrib.losses.compute_weighted_loss}
+@{tf.contrib.losses.cosine_distance}
+@{tf.contrib.losses.get_losses}
+@{tf.contrib.losses.get_regularization_losses}
+@{tf.contrib.losses.get_total_loss}
+@{tf.contrib.losses.log_loss}
+@{tf.contrib.losses.mean_pairwise_squared_error}
+@{tf.contrib.losses.mean_squared_error}
+@{tf.contrib.losses.sigmoid_cross_entropy}
+@{tf.contrib.losses.softmax_cross_entropy}
+@{tf.contrib.losses.sparse_softmax_cross_entropy}


--- a/tensorflow/docs_src/community/documentation.md
+++ b/tensorflow/docs_src/community/documentation.md
@ -477,29 +477,31 @@ should use Markdown in the docstring.

 Here's a simple example:

-    def foo(x, y, name="bar"):
-      """Computes foo.
+```python
+def foo(x, y, name="bar"):
+  """Computes foo.

-      Given two 1-D tensors `x` and `y`, this operation computes the foo.
+  Given two 1-D tensors `x` and `y`, this operation computes the foo.

-      Example:
+  Example:

-      ```
-      # x is [1, 1]
-      # y is [2, 2]
-      tf.foo(x, y) ==> [3, 3]
-      ```
-      Args:
-        x: A `Tensor` of type `int32`.
-        y: A `Tensor` of type `int32`.
-        name: A name for the operation (optional).
+  ```
+  # x is [1, 1]
+  # y is [2, 2]
+  tf.foo(x, y) ==> [3, 3]
+  ```
+  Args:
+    x: A `Tensor` of type `int32`.
+    y: A `Tensor` of type `int32`.
+    name: A name for the operation (optional).

-      Returns:
-        A `Tensor` of type `int32` that is the foo of `x` and `y`.
+  Returns:
+    A `Tensor` of type `int32` that is the foo of `x` and `y`.

-      Raises:
-        ValueError: If `x` or `y` are not of type `int32`.
-      """
+  Raises:
+    ValueError: If `x` or `y` are not of type `int32`.
+  """
+```

 ## Description of the docstring sections

--- a/tensorflow/docs_src/install/install_c.md
+++ b/tensorflow/docs_src/install/install_c.md
@ -38,7 +38,7 @@ enable TensorFlow for C:
         OS="linux" # Change to "darwin" for macOS
         TARGET_DIRECTORY="/usr/local"
         curl -L \
-           "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-${TF_TYPE}-${OS}-x86_64-1.7.0-rc1.tar.gz" |
+           "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-${TF_TYPE}-${OS}-x86_64-1.6.0.tar.gz" |
           sudo tar -C $TARGET_DIRECTORY -xz

     The `tar` command extracts the TensorFlow C library into the `lib`
--- a/tensorflow/docs_src/install/install_go.md
+++ b/tensorflow/docs_src/install/install_go.md
@ -38,7 +38,7 @@ steps to install this library and enable TensorFlow for Go:
         TF_TYPE="cpu" # Change to "gpu" for GPU support
         TARGET_DIRECTORY='/usr/local'
         curl -L \
-           "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-${TF_TYPE}-$(go env GOOS)-x86_64-1.7.0-rc1.tar.gz" |
+           "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-${TF_TYPE}-$(go env GOOS)-x86_64-1.6.0.tar.gz" |
         sudo tar -C $TARGET_DIRECTORY -xz

     The `tar` command extracts the TensorFlow C library into the `lib`
--- a/tensorflow/docs_src/install/install_java.md
+++ b/tensorflow/docs_src/install/install_java.md
@ -36,7 +36,7 @@ following to the project's `pom.xml` to use the TensorFlow Java APIs:
 <dependency>
  <groupId>org.tensorflow</groupId>
  <artifactId>tensorflow</artifactId>
-  <version>1.7.0-rc1</version>
+  <version>1.6.0</version>
 </dependency>
 ```

@ -65,7 +65,7 @@ As an example, these steps will create a Maven project that uses TensorFlow:
               <dependency>
                 <groupId>org.tensorflow</groupId>
                 <artifactId>tensorflow</artifactId>
-                 <version>1.7.0-rc1</version>
+                 <version>1.6.0</version>
               </dependency>
             </dependencies>
         </project>
@ -123,12 +123,12 @@ instead:
 <dependency>
  <groupId>org.tensorflow</groupId>
  <artifactId>libtensorflow</artifactId>
-  <version>1.7.0-rc1</version>
+  <version>1.6.0</version>
 </dependency>
 <dependency>
  <groupId>org.tensorflow</groupId>
  <artifactId>libtensorflow_jni_gpu</artifactId>
-  <version>1.7.0-rc1</version>
+  <version>1.6.0</version>
 </dependency>
 ```

@ -147,7 +147,7 @@ refer to the simpler instructions above instead.
 Take the following steps to install TensorFlow for Java on Linux or macOS:

  1. Download
-     [libtensorflow.jar](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.7.0-rc1.jar),
+     [libtensorflow.jar](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.6.0.jar),
     which is the TensorFlow Java Archive (JAR).

  2. Decide whether you will run TensorFlow for Java on CPU(s) only or with
@ -166,7 +166,7 @@ Take the following steps to install TensorFlow for Java on Linux or macOS:
         OS=$(uname -s | tr '[:upper:]' '[:lower:]')
         mkdir -p ./jni
         curl -L \
-           "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-${TF_TYPE}-${OS}-x86_64-1.7.0-rc1.tar.gz" |
+           "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-${TF_TYPE}-${OS}-x86_64-1.6.0.tar.gz" |
           tar -xz -C ./jni

 ### Install on Windows
@ -174,10 +174,10 @@ Take the following steps to install TensorFlow for Java on Linux or macOS:
 Take the following steps to install TensorFlow for Java on Windows:

  1. Download
-     [libtensorflow.jar](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.7.0-rc1.jar),
+     [libtensorflow.jar](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.6.0.jar),
     which is the TensorFlow Java Archive (JAR).
  2. Download the following Java Native Interface (JNI) file appropriate for
-     [TensorFlow for Java on Windows](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-cpu-windows-x86_64-1.7.0-rc1.zip).
+     [TensorFlow for Java on Windows](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-cpu-windows-x86_64-1.6.0.zip).
  3. Extract this .zip file.


@ -225,7 +225,7 @@ must be part of your `classpath`. For example, you can include the
 downloaded `.jar` in your `classpath` by using the `-cp` compilation flag
 as follows:

-<pre><b>javac -cp libtensorflow-1.7.0-rc1.jar HelloTF.java</b></pre>
+<pre><b>javac -cp libtensorflow-1.6.0.jar HelloTF.java</b></pre>


 ### Running
@ -239,11 +239,11 @@ two files are available to the JVM:
 For example, the following command line executes the `HelloTF` program on Linux
 and macOS X:

-<pre><b>java -cp libtensorflow-1.7.0-rc1.jar:. -Djava.library.path=./jni HelloTF</b></pre>
+<pre><b>java -cp libtensorflow-1.6.0.jar:. -Djava.library.path=./jni HelloTF</b></pre>

 And the following command line executes the `HelloTF` program on Windows:

-<pre><b>java -cp libtensorflow-1.7.0-rc1.jar;. -Djava.library.path=jni HelloTF</b></pre>
+<pre><b>java -cp libtensorflow-1.6.0.jar;. -Djava.library.path=jni HelloTF</b></pre>

 If the program prints <tt>Hello from <i>version</i></tt>, you've successfully
 installed TensorFlow for Java and are ready to use the API.  If the program
--- a/tensorflow/docs_src/install/install_linux.md
+++ b/tensorflow/docs_src/install/install_linux.md
@ -165,7 +165,7 @@ Take the following steps to install TensorFlow with Virtualenv:
     Virtualenv environment:

     <pre>(tensorflow)$ <b>pip3 install --upgrade \
-     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.7.0rc1-cp34-cp34m-linux_x86_64.whl</b></pre>
+     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.6.0-cp34-cp34m-linux_x86_64.whl</b></pre>

 If you encounter installation problems, see
 [Common Installation Problems](#common_installation_problems).
@ -270,7 +270,7 @@ take the following steps:

     <pre>
     $ <b>sudo pip3 install --upgrade \
-     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.7.0rc1-cp34-cp34m-linux_x86_64.whl</b>
+     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.6.0-cp34-cp34m-linux_x86_64.whl</b>
     </pre>

     If this step fails, see
@ -456,7 +456,7 @@ Take the following steps to install TensorFlow in an Anaconda environment:

     <pre>
     (tensorflow)$ <b>pip install --ignore-installed --upgrade \
-     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.7.0rc1-cp34-cp34m-linux_x86_64.whl</b></pre>
+     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.6.0-cp34-cp34m-linux_x86_64.whl</b></pre>

 <a name="ValidateYourInstallation"></a>
 ## Validate your installation
@ -630,14 +630,14 @@ This section documents the relevant values for Linux installations.
 CPU only:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.7.0rc1-cp27-none-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.6.0-cp27-none-linux_x86_64.whl
 </pre>


 GPU support:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.7.0rc1-cp27-none-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.6.0-cp27-none-linux_x86_64.whl
 </pre>

 Note that GPU support requires the NVIDIA hardware and software described in
@ -649,14 +649,14 @@ Note that GPU support requires the NVIDIA hardware and software described in
 CPU only:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.7.0rc1-cp34-cp34m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.6.0-cp34-cp34m-linux_x86_64.whl
 </pre>


 GPU support:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.7.0rc1-cp34-cp34m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.6.0-cp34-cp34m-linux_x86_64.whl
 </pre>

 Note that GPU support requires the NVIDIA hardware and software described in
@ -668,14 +668,14 @@ Note that GPU support requires the NVIDIA hardware and software described in
 CPU only:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.7.0rc1-cp35-cp35m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.6.0-cp35-cp35m-linux_x86_64.whl
 </pre>


 GPU support:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.7.0rc1-cp35-cp35m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.6.0-cp35-cp35m-linux_x86_64.whl
 </pre>


@ -687,14 +687,14 @@ Note that GPU support requires the NVIDIA hardware and software described in
 CPU only:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.7.0rc1-cp36-cp36m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.6.0-cp36-cp36m-linux_x86_64.whl
 </pre>


 GPU support:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.7.0rc1-cp36-cp36m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.6.0-cp36-cp36m-linux_x86_64.whl
 </pre>


--- a/tensorflow/docs_src/install/install_mac.md
+++ b/tensorflow/docs_src/install/install_mac.md
@ -118,8 +118,8 @@ Take the following steps to install TensorFlow with Virtualenv:
     Python 2.7, the command to install
     TensorFlow in the active Virtualenv is as follows:

-     <pre> $ <b>pip3 install --upgrade \
-     https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.7.0rc1-py3-none-any.whl</b></pre>
+     <pre> $ <b>pip install --upgrade \
+     https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.6.0-py2-none-any.whl</b></pre>

 If you encounter installation problems, see
 [Common Installation Problems](#common-installation-problems).
@ -241,8 +241,8 @@ take the following steps:
     you are installing TensorFlow for macOS and Python 2.7
     issue the following command:

-     <pre> $ <b>sudo pip3 install --upgrade \
-     https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.7.0rc1-py3-none-any.whl</b> </pre>
+     <pre> $ <b>sudo pip install --upgrade \
+     https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.6.0-py2-none-any.whl</b> </pre>

     If the preceding command fails, see
     [installation problems](#common-installation-problems).
@ -350,7 +350,7 @@ Take the following steps to install TensorFlow in an Anaconda environment:
     TensorFlow for Python 2.7:

     <pre> (<i>targetDirectory</i>)$ <b>pip install --ignore-installed --upgrade \
-     https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.7.0rc1-py2-none-any.whl</b></pre>
+     https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.6.0-py2-none-any.whl</b></pre>


 <a name="ValidateYourInstallation"></a>
@ -524,7 +524,7 @@ The value you specify depends on your Python version.


 <pre>
-https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.7.0rc1-py2-none-any.whl
+https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.6.0-py2-none-any.whl
 </pre>


@ -532,5 +532,5 @@ https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.7.0rc1-py2-none-a


 <pre>
-https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.7.0rc1-py3-none-any.whl
+https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.6.0-py3-none-any.whl
 </pre>
--- a/tensorflow/docs_src/install/install_sources.md
+++ b/tensorflow/docs_src/install/install_sources.md
@ -350,10 +350,10 @@ Invoke `pip install` to install that pip package.
 The filename of the `.whl` file depends on your platform.
 For example, the following command will install the pip package

-for TensorFlow 1.7.0rc1 on Linux:
+for TensorFlow 1.6.0 on Linux:

 <pre>
-$ <b>sudo pip install /tmp/tensorflow_pkg/tensorflow-1.7.0rc1-py2-none-any.whl</b>
+$ <b>sudo pip install /tmp/tensorflow_pkg/tensorflow-1.6.0-py2-none-any.whl</b>
 </pre>

 ## Validate your installation
@ -450,8 +450,6 @@ Stack Overflow and specify the `tensorflow` tag.
 **Linux**
 <table>
 <tr><th>Version:</th><th>CPU/GPU:</th><th>Python Version:</th><th>Compiler:</th><th>Build Tools:</th><th>cuDNN:</th><th>CUDA:</th></tr>
-<tr><td>tensorflow-1.7.0rc1</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>GCC 4.8</td><td>Bazel 0.10.0</td><td>N/A</td><td>N/A</td></tr>
-<tr><td>tensorflow_gpu-1.7.0rc1</td><td>GPU</td><td>2.7, 3.3-3.6</td><td>GCC 4.8</td><td>Bazel 0.9.0</td><td>7</td><td>9</td></tr>
 <tr><td>tensorflow-1.6.0</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>GCC 4.8</td><td>Bazel 0.9.0</td><td>N/A</td><td>N/A</td></tr>
 <tr><td>tensorflow_gpu-1.6.0</td><td>GPU</td><td>2.7, 3.3-3.6</td><td>GCC 4.8</td><td>Bazel 0.9.0</td><td>7</td><td>9</td></tr>
 <tr><td>tensorflow-1.5.0</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>GCC 4.8</td><td>Bazel 0.8.0</td><td>N/A</td><td>N/A</td></tr>
@ -471,7 +469,6 @@ Stack Overflow and specify the `tensorflow` tag.
 **Mac**
 <table>
 <tr><th>Version:</th><th>CPU/GPU:</th><th>Python Version:</th><th>Compiler:</th><th>Build Tools:</th><th>cuDNN:</th><th>CUDA:</th></tr>
-<tr><td>tensorflow-1.7.0rc1</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>Clang from xcode</td><td>Bazel 0.10.1</td><td>N/A</td><td>N/A</td></tr>
 <tr><td>tensorflow-1.6.0</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>Clang from xcode</td><td>Bazel 0.8.1</td><td>N/A</td><td>N/A</td></tr>
 <tr><td>tensorflow-1.5.0</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>Clang from xcode</td><td>Bazel 0.8.1</td><td>N/A</td><td>N/A</td></tr>
 <tr><td>tensorflow-1.4.0</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>Clang from xcode</td><td>Bazel 0.5.4</td><td>N/A</td><td>N/A</td></tr>
@ -486,8 +483,6 @@ Stack Overflow and specify the `tensorflow` tag.
 **Windows**
 <table>
 <tr><th>Version:</th><th>CPU/GPU:</th><th>Python Version:</th><th>Compiler:</th><th>Build Tools:</th><th>cuDNN:</th><th>CUDA:</th></tr>
-<tr><td>tensorflow-1.7.0rc1</td><td>CPU</td><td>3.5-3.6</td><td>MSVC 2015 update 3</td><td>Cmake v3.6.3</td><td>N/A</td><td>N/A</td></tr>
-<tr><td>tensorflow_gpu-1.7.0rc1</td><td>GPU</td><td>3.5-3.6</td><td>MSVC 2015 update 3</td><td>Cmake v3.6.3</td><td>7</td><td>9</td></tr>
 <tr><td>tensorflow-1.6.0</td><td>CPU</td><td>3.5-3.6</td><td>MSVC 2015 update 3</td><td>Cmake v3.6.3</td><td>N/A</td><td>N/A</td></tr>
 <tr><td>tensorflow_gpu-1.6.0</td><td>GPU</td><td>3.5-3.6</td><td>MSVC 2015 update 3</td><td>Cmake v3.6.3</td><td>7</td><td>9</td></tr>
 <tr><td>tensorflow-1.5.0</td><td>CPU</td><td>3.5-3.6</td><td>MSVC 2015 update 3</td><td>Cmake v3.6.3</td><td>N/A</td><td>N/A</td></tr>
--- a/tensorflow/docs_src/mobile/optimizing.md
+++ b/tensorflow/docs_src/mobile/optimizing.md
@ -233,8 +233,6 @@ order by how long they took. From left to right, the columns are:
 - The cumulative total time of this and the previous ops in the table. This is
  handy for understanding what the distribution of work is across the layers, to
  see if just a few of the nodes are taking up most of the time.
-  
- The amount of memory consumed by outputs of this type of op.

 - Name of the node.

--- a/tensorflow/docs_src/mobile/prepare_models.md
+++ b/tensorflow/docs_src/mobile/prepare_models.md
@ -60,7 +60,7 @@ and serialized as protocol buffers:
  the `NodeDef`, so if all the `Variable` weights are converted to `Const` nodes,
  then we only need a single `GraphDef` file to hold the model architecture and
  the weights. Freezing the graph handles the process of loading the
-  checkpoints, and then converts all Variables to Consts. You can then load the
+  checkpoints, and then converts all Consts to Variables. You can then load the
  resulting file in a single call, without having to restore variable values
  from checkpoints. One thing to watch out for with `GraphDef` files is that
  sometimes they’re stored in text format for easy inspection. These versions
--- a/tensorflow/python/BUILD
+++ b/tensorflow/python/BUILD
@ -1065,7 +1065,7 @@ py_test(

 py_test(
    name = "framework_importer_test",
-    size = "large",
+    size = "medium",
    srcs = ["framework/importer_test.py"],
    main = "framework/importer_test.py",
    srcs_version = "PY2AND3",
--- a/tensorflow/python/kernel_tests/array_ops_test.py
+++ b/tensorflow/python/kernel_tests/array_ops_test.py
@ -315,39 +315,21 @@ class ReverseV2Test(test_util.TensorFlowTestCase):
            self.assertAllEqual(x_tf_4, np.asarray(x_np)[:, ::-1])
            self.assertAllEqual(x_tf_5, np.asarray(x_np)[::-1, ::-1])

-  # This test covers the axis validation in the shape function
-  # (no eval())
-  def testInvalidAxis(self):
-    x_np = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
-    with self.assertRaisesRegexp(ValueError,
-                                 "is out of valid range"):
-      array_ops.reverse_v2(x_np, [-30])
-    with self.assertRaisesRegexp(ValueError,
-                                 "is out of valid range"):
-      array_ops.reverse_v2(x_np, [2])
-    with self.assertRaisesRegexp(ValueError,
-                                 "axis 0 specified more than once"):
-      array_ops.reverse_v2(x_np, [0, -2])
-
  # This is the version of reverse that uses axis indices rather than
  # bool tensors
  # TODO(b/32254538): Change this test to use array_ops.reverse
-  #
-  # Note: this test passes placeholder as constant axis is validated
-  # in shape function (see testInvalidAxis)
  def testInvalid(self):
    x_np = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
-    axis = array_ops.placeholder(dtypes.int32)
    with self.test_session():
      with self.assertRaisesRegexp(errors_impl.InvalidArgumentError,
                                   "is out of valid range"):
-        array_ops.reverse_v2(x_np, axis).eval(feed_dict={axis: [-30]})
+        array_ops.reverse_v2(x_np, [-30]).eval()
      with self.assertRaisesRegexp(errors_impl.InvalidArgumentError,
                                   "is out of valid range"):
-        array_ops.reverse_v2(x_np, axis).eval(feed_dict={axis: [2]})
+        array_ops.reverse_v2(x_np, [2]).eval()
      with self.assertRaisesRegexp(errors_impl.InvalidArgumentError,
                                   "axis 0 specified more than once"):
-        array_ops.reverse_v2(x_np, axis).eval(feed_dict={axis: [0, -2]})
+        array_ops.reverse_v2(x_np, [0, -2]).eval()

  def testReverse1DimAuto(self):
    for dtype in [
@ -908,7 +890,7 @@ class StridedSliceAssignChecker(object):
        var = resource_variable_ops.ResourceVariable(self.x)
      else:
        var = variables.Variable(self.x)
-      sess.run(variables.variables_initializer([var]))
+      sess.run(variables.initialize_variables([var]))
      val = sess.run(var[index].assign(value))
      # val_copy is used to check that tf.assign works equivalently to the
      # assign method above.
--- a/tensorflow/python/kernel_tests/testdata/BUILD
+++ b/tensorflow/python/kernel_tests/testdata/BUILD
@ -1,7 +1,7 @@
 # Data files for kernel tests.

 package(
-    default_visibility = ["//visibility:public"],
+    default_visibility = ["//tensorflow:internal"],
 )

 licenses(["notice"])  # Apache 2.0
--- a/tensorflow/python/kernel_tests/xent_op_test.py
+++ b/tensorflow/python/kernel_tests/xent_op_test.py
@ -18,16 +18,10 @@ from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function

-import itertools
-import sys
-
 import numpy as np

-from tensorflow.python.client import session
 from tensorflow.python.framework import constant_op
 from tensorflow.python.framework import dtypes
-from tensorflow.python.framework import ops
-from tensorflow.python.ops import array_ops
 from tensorflow.python.ops import gen_nn_ops
 from tensorflow.python.ops import gradient_checker
 from tensorflow.python.ops import gradients_impl
@ -94,7 +88,7 @@ class XentTest(test.TestCase):
                                                    4.]]]).astype(dtype)
      np_labels = np.array([[[0., 0., 0., 1.]], [[0., .5, .5,
                                                  0.]]]).astype(dtype)
-      self.assertRaisesRegexp(ValueError, "rank 2, but is rank 3",
+      self.assertRaisesRegexp(ValueError, "must be rank 2",
                              gen_nn_ops.softmax_cross_entropy_with_logits,
                              np_features, np_labels)

@ -134,24 +128,6 @@ class XentTest(test.TestCase):
    self.assertAllClose(
        np.array([1.3862, 1.9401]), np_loss, rtol=1.e-3, atol=1.e-3)

-  def testShapeBroadcast(self):
-    np_f = np.array([[1., 2., 3., 4.],
-                     [1., 2., 3., 4.]]).astype(np.float32)
-    np_l = np.array([[0., 0., 0., 1.],
-                     [0., .5, .5, 0.]]).astype(np.float32)
-    np_loss, np_backprop = self._npXent(np_f, np_l)
-    tf_f = constant_op.constant(
-        np.array([[1., 2., 3., 4.]]).astype(np.float32))
-    tf_l = constant_op.constant(
-        np.array([[0., 0., 0., 1.], [0., .5, .5, 0.]]).astype(np.float32))
-    for use_gpu in [False, True]:
-      with self.test_session(use_gpu=use_gpu) as sess:
-        loss, backprop = gen_nn_ops.softmax_cross_entropy_with_logits(
-            tf_f, tf_l)
-        tf_loss, tf_backprop = sess.run([loss, backprop])
-      self.assertAllCloseAccordingToType(np_loss, tf_loss)
-      self.assertAllCloseAccordingToType(np_backprop, tf_backprop)
-
  def testShapeMismatch(self):
    with self.test_session():
      with self.assertRaises(ValueError):
@ -284,60 +260,5 @@ class XentTest(test.TestCase):
    self.assertAllEqual(np_loss, tf_loss)


-class XentBenchmark(test.Benchmark):
-
-  def benchmarkZeroDimension(self):
-    for (m, n, p, use_gpu) in itertools.product(
-        [128],
-        [10, 100, 1000, 10000, 100000],
-        [0.001, 0.01, 0.5, 0.99, 1.0],
-        [False]):
-      k = int(p * n)
-      if k == 0:
-        continue
-      name = "zero_dimension_m_%d_n_%d_k_%g_use_gpu_%s" % (m, n, k, use_gpu)
-      device = "/%s:0" % ("gpu" if use_gpu else "cpu")
-      with ops.Graph().as_default():
-        with ops.device(device):
-          labels = array_ops.zeros([0, 2, 4], dtype=dtypes.float32)
-          logits = array_ops.zeros([0, 2, 4], dtype=dtypes.float32)
-          op = nn_ops.softmax_cross_entropy_with_logits(
-              labels=labels, logits=logits)
-        with session.Session() as sess:
-          r = self.run_op_benchmark(sess, op, min_iters=100, name=name)
-          gb_processed_input = m * n / 1.0e9
-          throughput = gb_processed_input / r["wall_time"]
-          print("Benchmark: %s \t wall_time: %0.03g s \t "
-                "Throughput: %0.03g GB/s" % (name, r["wall_time"], throughput))
-          sys.stdout.flush()
-
-  def benchmarkSingleClass(self):
-    for (m, n, p, use_gpu) in itertools.product(
-        [128],
-        [10, 100, 1000, 10000, 100000],
-        [0.001, 0.01, 0.5, 0.99, 1.0],
-        [False]):
-      k = int(p * n)
-      if k == 0:
-        continue
-      name = "single_class_m_%d_n_%d_k_%g_use_gpu_%s" % (m, n, k, use_gpu)
-      device = "/%s:0" % ("gpu" if use_gpu else "cpu")
-      with ops.Graph().as_default():
-        with ops.device(device):
-          labels = constant_op.constant([[1.], [-1.], [0.]],
-                                        dtype=dtypes.float32)
-          logits = constant_op.constant([[-1.], [0.], [1.]],
-                                        dtype=dtypes.float32)
-          op = nn_ops.softmax_cross_entropy_with_logits(
-              labels=labels, logits=logits)
-        with session.Session() as sess:
-          r = self.run_op_benchmark(sess, op, min_iters=100, name=name)
-          gb_processed_input = m * n / 1.0e9
-          throughput = gb_processed_input / r["wall_time"]
-          print("Benchmark: %s \t wall_time: %0.03g s \t "
-                "Throughput: %0.03g GB/s" % (name, r["wall_time"], throughput))
-          sys.stdout.flush()
-
-
 if __name__ == "__main__":
  test.main()
--- a/tensorflow/python/layers/convolutional.py
+++ b/tensorflow/python/layers/convolutional.py
@@ -180,8 +180,6 @@ class _Conv(base.Layer):
          # bias_add when computing gradients. To use bias_add, we collapse Z
          # and Y into a single dimension to obtain a 4D input tensor.
          outputs_shape = outputs.shape.as_list()
-          if outputs_shape[0] is None:
-            outputs_shape[0] = -1
          outputs_4d = array_ops.reshape(outputs,
                                         [outputs_shape[0], outputs_shape[1],
                                          outputs_shape[2] * outputs_shape[3],
--- a/tensorflow/python/layers/convolutional_test.py
+++ b/tensorflow/python/layers/convolutional_test.py
@@ -325,12 +325,6 @@ class ConvTest(test.TestCase):
    self.assertEqual(conv3d.kernel_constraint, k_constraint)
    self.assertEqual(conv3d.bias_constraint, b_constraint)

-  def testConv3DChannelsFirst(self):
-    # Test case for GitHub issue 15655
-    images = array_ops.placeholder(
-        dtype=dtypes.float32, shape=[None, 1, 32, 32, 32])
-    conv_layers.conv3d(images, 32, 9, data_format='channels_first')
-

 @test_util.with_c_api
 class SeparableConv1DTest(test.TestCase):
--- a/tensorflow/python/ops/linalg_ops.py
+++ b/tensorflow/python/ops/linalg_ops.py
@@ -429,7 +429,7 @@ def svd(tensor, full_matrices=False, compute_uv=True, name=None):
  u, s, v_adj = np.linalg.svd(a, full_matrices=False)
  np_a_approx = np.dot(u, np.dot(np.diag(s), v_adj))
  # tf_a_approx and np_a_approx should be numerically close.
-  ```
+  ````
  @end_compatibility
  """
  s, u, v = gen_linalg_ops.svd(