Merge changes from GitHub.

Change: 155209832
2017-05-05 09:09:05 -08:00 · 2017-05-05 09:09:05 -08:00 · 692fad20f9
commit 692fad20f9
parent b329dd821e
101 changed files with 2169 additions and 1377 deletions
--- a/ISSUE_TEMPLATE.md
+++ b/ISSUE_TEMPLATE.md
@@ -1,33 +1,36 @@
-Please go to Stack Overflow for help and support. http://stackoverflow.com/questions/tagged/tensorflow
+Please go to Stack Overflow for help and support:
+
+http://stackoverflow.com/questions/tagged/tensorflow
+
 If you open a GitHub issue, here is our policy:

-1. It must be a bug or feature request.
+1. It must be a bug or a feature request.
 2. The form below must be filled out.

-**Here's why we have that policy**: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g. fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.
+**Here's why we have that policy**: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.

 ------------------------

+### System information
+- **Have I written custom code (as opposed to using a stock example script provided in TensorFlow)**:
+- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**:
+- **TensorFlow installed from (source or binary)**:
+- **TensorFlow version (use command below)**:
+- **Bazel version (if compiling from source)**:
+- **CUDA/cuDNN version**:
+- **GPU model and memory**:
+- **Exact command to reproduce**:
+
+You can collect some of this information using our environment capture script:
+
+https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh
+
+You can obtain the TensorFlow version with
+
+python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
+
+### Describe the problem
 Describe the problem clearly here. Be sure to convey here why it's a bug in TensorFlow or a feature request.

-### System Information
- *Have I written custom code (as opposed to using a stock example script provided in TensorFlow)?*:
- *OS Platform and Distribution (i.e. Linux Ubuntu 16.0)*:
- *TensorFlow installed from (source or binary)?*:
- *TensorFlow version* (use command below):
- *Bazel version (if compiling from source)*:
- *CUDA/cuDNN version*:
- *GPU Model and Memory*:
- *Exact command to reproduce*:
-
-You can collect some of this information using our environment capture script https://github.com/tensorflow/tensorflow/blob/master/tools/
-You can collect the TensorFlow version with
-```sh
-python -c "import tensorflow as tf; print (tf.GIT_VERSION, tf.VERSION)"
-```
-
-
-### Describe the problem clearly
-
-### Source Code / Logs
-Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full-traceback. Large logs and files should be attached. Try to reproducible test-case code the bare-minimum necessary to generate the problem
+### Source code / logs
+Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.
--- a/README.md
+++ b/README.md
@@ -26,7 +26,7 @@ guidelines](CONTRIBUTING.md).**

 **We use [GitHub issues](https://github.com/tensorflow/tensorflow/issues) for
 tracking requests and bugs, but please see
-[Community](tensorflow/docs_src/about/index.md#community) for general questions
+[Community](https://www.tensorflow.org/community/) for general questions
 and discussion.**

 ## Installation
@@ -34,13 +34,12 @@ and discussion.**

 People who are a little more adventurous can also try our nightly binaries:

-
-* Linux CPU-only: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0rc2-cp27-none-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave)) / [Python 3.4](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0rc2-cp34-cp34m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/)) / [Python 3.5](https://ci.tensorflow.org/view/Nightly/job/nightly-python35-linux-cpu/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0rc2-cp35-cp35m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-python35-linux-cpu/))
-* Linux GPU: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0rc2-cp27-none-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-linux/)) / [Python 3.4](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0rc2-cp34-cp34m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/)) / [Python 3.5](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3.5,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0rc2-cp35-cp35m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3.5,label=gpu-linux/))
-* Mac CPU-only: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0rc2-py2-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/)) / [Python 3](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0rc2-py3-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/))
-* Mac GPU: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0rc2-py2-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/)) / [Python 3](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0rc2-py3-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-mac/))
-* Windows CPU-only: [Python 3.5 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=cpu,OS=windows/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow-1.1.0rc2-cp35-cp35m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=cpu,OS=windows/))
-* Windows GPU: [Python 3.5 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=gpu,OS=windows/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow_gpu-1.1.0rc2-cp35-cp35m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=gpu,OS=windows/))
+* Linux CPU-only: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0-cp27-none-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave)) / [Python 3.4](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/)) / [Python 3.5](https://ci.tensorflow.org/view/Nightly/job/nightly-python35-linux-cpu/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0-cp35-cp35m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-python35-linux-cpu/))
+* Linux GPU: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0-cp27-none-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-linux/)) / [Python 3.4](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0-cp34-cp34m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/)) / [Python 3.5](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3.5,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0-cp35-cp35m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3.5,label=gpu-linux/))
+* Mac CPU-only: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0-py2-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/)) / [Python 3](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.1.0-py3-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/))
+* Mac GPU: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0-py2-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/)) / [Python 3](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.1.0-py3-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-mac/))
+* Windows CPU-only: [Python 3.5 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=cpu,OS=windows/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow-1.1.0-cp35-cp35m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=cpu,OS=windows/))
+* Windows GPU: [Python 3.5 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=gpu,OS=windows/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/DEVICE=gpu,OS=windows/))
 * Android: [demo APK](https://ci.tensorflow.org/view/Nightly/job/nightly-android/lastSuccessfulBuild/artifact/out/tensorflow_demo.apk), [native libs](http://ci.tensorflow.org/view/Nightly/job/nightly-android/lastSuccessfulBuild/artifact/out/native/)
 ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-android/))

--- a/RELEASE.md
+++ b/RELEASE.md
@@ -1,6 +1,7 @@
 # Changes since the last release

 ## Major Features and Improvements
+* Added `tf.layers.conv3d_transpose` layer for spatio temporal deconvolution.
 * Added `tf.Session.make_callable()`, which provides a lower overhead means of running a similar step multiple times.
 * Added ibverbs-based RDMA support to contrib (courtesy @junshi15 from Yahoo).
 * `RNNCell` objects now subclass `tf.layers._Layer`.  The strictness described
--- a/70
+++ b/70
@@ -35,12 +35,9 @@ function is_windows() {
  fi
 }

-function sed_hyphen_i() {
-  if is_macos; then
-    sed -i '' "$@"
-  else
-    sed -i "$@"
-  fi
+function sed_in_place() {
+  sed -e $1 $2 > "$2.bak"
+  mv "$2.bak" $2
 }

 function write_to_bazelrc() {
@@ -170,7 +167,7 @@ function setup_python {
 rm -f .tf_configure.bazelrc
 touch .tf_configure.bazelrc
 touch .bazelrc
-sed_hyphen_i "/tf_configure/d" .bazelrc
+sed_in_place "/tf_configure/d" .bazelrc
 echo "import %workspace%/.tf_configure.bazelrc" >> .bazelrc

 # Delete any leftover BUILD files from the Makefile build, which would interfere
@@ -409,31 +406,6 @@ done
 export TF_CUDA_CLANG
 write_action_env_to_bazelrc "TF_CUDA_CLANG" "$TF_CUDA_CLANG"

-# Set up which gcc nvcc should use as the host compiler
-# No need to set this on Windows
-while [[ "$TF_CUDA_CLANG" != "1" ]] && ! is_windows && true; do
-  fromuser=""
-  if [ -z "$GCC_HOST_COMPILER_PATH" ]; then
-    default_gcc_host_compiler_path=$(which gcc || true)
-    read -p "Please specify which gcc should be used by nvcc as the host compiler. [Default is $default_gcc_host_compiler_path]: " GCC_HOST_COMPILER_PATH
-    fromuser="1"
-    if [ -z "$GCC_HOST_COMPILER_PATH" ]; then
-      GCC_HOST_COMPILER_PATH="$default_gcc_host_compiler_path"
-    fi
-  fi
-  if [ -e "$GCC_HOST_COMPILER_PATH" ]; then
-    export GCC_HOST_COMPILER_PATH
-    write_action_env_to_bazelrc "GCC_HOST_COMPILER_PATH" "$GCC_HOST_COMPILER_PATH"
-    break
-  fi
-  echo "Invalid gcc path. ${GCC_HOST_COMPILER_PATH} cannot be found" 1>&2
-  if [ -z "$fromuser" ]; then
-    exit 1
-  fi
-  GCC_HOST_COMPILER_PATH=""
-  # Retry
-done
-
 # Set up which clang we should use as the cuda / host compiler.
 while [[ "$TF_CUDA_CLANG" == "1" ]] && true; do
  fromuser=""
@@ -474,6 +446,11 @@ while true; do
      else
        default_cuda_path="$(cygpath -m "$CUDA_PATH")"
      fi
+    elif is_linux; then
+      # If the default doesn't exist, try an alternative default.
+      if [ ! -d $default_cuda_path ] && [ -d /opt/cuda ]; then
+        default_cuda_path=/opt/cuda
+      fi
    fi
    read -p "Please specify the location where CUDA $TF_CUDA_VERSION toolkit is installed. Refer to README.md for more details. [Default is $default_cuda_path]: " CUDA_TOOLKIT_PATH
    fromuser="1"
@@ -513,6 +490,35 @@ while true; do
  CUDA_TOOLKIT_PATH=""
 done

+# Set up which gcc nvcc should use as the host compiler
+# No need to set this on Windows
+while [[ "$TF_CUDA_CLANG" != "1" ]] && ! is_windows && true; do
+  fromuser=""
+  if [ -z "$GCC_HOST_COMPILER_PATH" ]; then
+    default_gcc_host_compiler_path=$(which gcc || true)
+    cuda_bin_symlink="$CUDA_TOOLKIT_PATH/bin/gcc"
+    if [ -L "$cuda_bin_symlink" ]; then
+      default_gcc_host_compiler_path=$(readlink $cuda_bin_symlink)
+    fi
+    read -p "Please specify which gcc should be used by nvcc as the host compiler. [Default is $default_gcc_host_compiler_path]: " GCC_HOST_COMPILER_PATH
+    fromuser="1"
+    if [ -z "$GCC_HOST_COMPILER_PATH" ]; then
+      GCC_HOST_COMPILER_PATH="$default_gcc_host_compiler_path"
+    fi
+  fi
+  if [ -e "$GCC_HOST_COMPILER_PATH" ]; then
+    export GCC_HOST_COMPILER_PATH
+    write_action_env_to_bazelrc "GCC_HOST_COMPILER_PATH" "$GCC_HOST_COMPILER_PATH"
+    break
+  fi
+  echo "Invalid gcc path. ${GCC_HOST_COMPILER_PATH} cannot be found" 1>&2
+  if [ -z "$fromuser" ]; then
+    exit 1
+  fi
+  GCC_HOST_COMPILER_PATH=""
+  # Retry
+done
+
 # Find out where the cuDNN library is installed
 while true; do
  # Configure the cuDNN version to use.
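Note on the `sed_in_place` hunk above: it swaps `sed -i` for a write-to-temp-file-then-`mv` pattern, presumably because the `-i` argument syntax differs between GNU and BSD/macOS sed. A minimal sketch of the same idea with quoted arguments follows. The function name `sed_in_place_quoted`, the `&&` guard, and the demo file are assumptions for illustration and are not part of this commit.

```shell
#!/usr/bin/env bash
# Hypothetical variant of sed_in_place from the hunk above. The name,
# the quoting, and the && guard are assumptions, not part of the commit.
sed_in_place_quoted() {
  local script="$1" file="$2"
  # Write the filtered output to a sibling file, then replace the original
  # only if sed succeeded. This avoids `sed -i` entirely.
  sed -e "$script" "$file" > "$file.bak" && mv "$file.bak" "$file"
}

# Usage: drop every line mentioning tf_configure from a demo file whose
# path contains a space (the case the unquoted form would mishandle).
demo_dir="$(mktemp -d)"
demo_file="$demo_dir/my bazelrc"
printf 'build --copt=-O2\nimport %%workspace%%/.tf_configure.bazelrc\n' > "$demo_file"
sed_in_place_quoted "/tf_configure/d" "$demo_file"
cat "$demo_file"
```

Quoting `"$1"` and `"$2"` keeps a sed script or path containing spaces or glob characters as a single argument. The call site in the diff passes `"/tf_configure/d"` and `.bazelrc`, so the unquoted form behaves the same there.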
--- a/tensorflow/BUILD
+++ b/tensorflow/BUILD
@@ -255,6 +255,7 @@ filegroup(
        "//tensorflow/contrib/seq2seq:all_files",
        "//tensorflow/contrib/session_bundle:all_files",
        "//tensorflow/contrib/session_bundle/example:all_files",
+        "//tensorflow/contrib/signal:all_files",
        "//tensorflow/contrib/slim:all_files",
        "//tensorflow/contrib/slim/python/slim/data:all_files",
        "//tensorflow/contrib/slim/python/slim/nets:all_files",
--- a/tensorflow/contrib/batching/BUILD
+++ b/tensorflow/contrib/batching/BUILD
@@ -181,6 +181,7 @@ py_test(
    size = "small",
    srcs = ["python/ops/batch_ops_test.py"],
    srcs_version = "PY2AND3",
+    tags = ["nomac"],
    deps = [
        ":batch_py",
        "//tensorflow/python:framework_test_lib",
--- a/tensorflow/contrib/factorization/python/ops/gmm_ops.py
+++ b/tensorflow/contrib/factorization/python/ops/gmm_ops.py
@@ -85,7 +85,7 @@ def _init_clusters_random(data, num_clusters, random_seed):
        maxval=math_ops.cast(num_data, dtypes.int64),
        seed=random_seed,
        dtype=dtypes.int64)
-  indices = math_ops.cast(indices, dtypes.int32) % num_data
+  indices %= math_ops.cast(num_data, dtypes.int64)
  clusters_init = embedding_lookup(data, indices, partition_strategy='div')
  return clusters_init

--- a/tensorflow/contrib/grid_rnn/python/kernel_tests/grid_rnn_test.py
+++ b/tensorflow/contrib/grid_rnn/python/kernel_tests/grid_rnn_test.py
@@ -34,180 +34,228 @@ from tensorflow.python.platform import test
 class GridRNNCellTest(test.TestCase):

  def testGrid2BasicLSTMCell(self):
-    with self.test_session() as sess:
+    with self.test_session(use_gpu=False) as sess:
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.2)) as root_scope:
        x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 8])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),
+             (array_ops.zeros([1, 2]), array_ops.zeros([1, 2])))
        cell = grid_rnn_cell.Grid2BasicLSTMCell(2)
-        self.assertEqual(cell.state_size, 8)
+        self.assertEqual(cell.state_size, ((2, 2), (2, 2)))

        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 8))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
+        self.assertEqual(s[1].c.get_shape(), (1, 2))
+        self.assertEqual(s[1].h.get_shape(), (1, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
-            x: np.array([[1., 1., 1.]]),
-            m: np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]])
+        res_g, res_s = sess.run([g, s], {
+            x:
+                np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),
+                (np.array([[0.5, 0.6]]), np.array([[0.7, 0.8]])))
        })
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 8))
-        self.assertAllClose(res[0], [[0.36617181, 0.36617181]])
-        self.assertAllClose(res[1], [[0.71053141, 0.71053141, 0.36617181,
-                                      0.36617181, 0.72320831, 0.80555487,
-                                      0.39102408, 0.42150158]])
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
+        self.assertEqual(res_s[1].c.shape, (1, 2))
+        self.assertEqual(res_s[1].h.shape, (1, 2))
+
+        self.assertAllClose(res_g, ([[0.36617181, 0.36617181]],))
+        self.assertAllClose(
+            res_s, (([[0.71053141, 0.71053141]], [[0.36617181, 0.36617181]]),
+                    ([[0.72320831, 0.80555487]], [[0.39102408, 0.42150158]])))

        # emulate a loop through the input sequence,
        # where we call cell() multiple times
        root_scope.reuse_variables()
        g2, s2 = cell(x, m)
-        self.assertEqual(g2.get_shape(), (1, 2))
-        self.assertEqual(s2.get_shape(), (1, 8))
+        self.assertEqual(g2[0].get_shape(), (1, 2))
+        self.assertEqual(s2[0].c.get_shape(), (1, 2))
+        self.assertEqual(s2[0].h.get_shape(), (1, 2))
+        self.assertEqual(s2[1].c.get_shape(), (1, 2))
+        self.assertEqual(s2[1].h.get_shape(), (1, 2))

-        res = sess.run([g2, s2], {x: np.array([[2., 2., 2.]]), m: res[1]})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 8))
-        self.assertAllClose(res[0], [[0.58847463, 0.58847463]])
-        self.assertAllClose(res[1], [[1.40469193, 1.40469193, 0.58847463,
-                                      0.58847463, 0.97726452, 1.04626071,
-                                      0.4927212, 0.51137757]])
+        res_g2, res_s2 = sess.run([g2, s2],
+                                  {x: np.array([[2., 2., 2.]]),
+                                   m: res_s})
+        self.assertEqual(res_g2[0].shape, (1, 2))
+        self.assertEqual(res_s2[0].c.shape, (1, 2))
+        self.assertEqual(res_s2[0].h.shape, (1, 2))
+        self.assertEqual(res_s2[1].c.shape, (1, 2))
+        self.assertEqual(res_s2[1].h.shape, (1, 2))
+        self.assertAllClose(res_g2[0], [[0.58847463, 0.58847463]])
+        self.assertAllClose(
+            res_s2, (([[1.40469193, 1.40469193]], [[0.58847463, 0.58847463]]),
+                     ([[0.97726452, 1.04626071]], [[0.4927212, 0.51137757]])))

  def testGrid2BasicLSTMCellTied(self):
-    with self.test_session() as sess:
+    with self.test_session(use_gpu=False) as sess:
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.2)):
        x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 8])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),
+             (array_ops.zeros([1, 2]), array_ops.zeros([1, 2])))
        cell = grid_rnn_cell.Grid2BasicLSTMCell(2, tied=True)
-        self.assertEqual(cell.state_size, 8)
+        self.assertEqual(cell.state_size, ((2, 2), (2, 2)))

        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 8))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
+        self.assertEqual(s[1].c.get_shape(), (1, 2))
+        self.assertEqual(s[1].h.get_shape(), (1, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
-            x: np.array([[1., 1., 1.]]),
-            m: np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]])
+        res_g, res_s = sess.run([g, s], {
+            x:
+                np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),
+                (np.array([[0.5, 0.6]]), np.array([[0.7, 0.8]])))
        })
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 8))
-        self.assertAllClose(res[0], [[0.36617181, 0.36617181]])
-        self.assertAllClose(res[1], [[0.71053141, 0.71053141, 0.36617181,
-                                      0.36617181, 0.72320831, 0.80555487,
-                                      0.39102408, 0.42150158]])
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
+        self.assertEqual(res_s[1].c.shape, (1, 2))
+        self.assertEqual(res_s[1].h.shape, (1, 2))

-        res = sess.run([g, s], {x: np.array([[1., 1., 1.]]), m: res[1]})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 8))
-        self.assertAllClose(res[0], [[0.36703536, 0.36703536]])
-        self.assertAllClose(res[1], [[0.71200621, 0.71200621, 0.36703536,
-                                      0.36703536, 0.80941606, 0.87550586,
-                                      0.40108523, 0.42199609]])
+        self.assertAllClose(res_g[0], [[0.36617181, 0.36617181]])
+        self.assertAllClose(
+            res_s, (([[0.71053141, 0.71053141]], [[0.36617181, 0.36617181]]),
+                    ([[0.72320831, 0.80555487]], [[0.39102408, 0.42150158]])))
+
+        res_g, res_s = sess.run([g, s], {x: np.array([[1., 1., 1.]]), m: res_s})
+        self.assertEqual(res_g[0].shape, (1, 2))
+
+        self.assertAllClose(res_g[0], [[0.36703536, 0.36703536]])
+        self.assertAllClose(
+            res_s, (([[0.71200621, 0.71200621]], [[0.36703536, 0.36703536]]),
+                    ([[0.80941606, 0.87550586]], [[0.40108523, 0.42199609]])))

  def testGrid2BasicLSTMCellWithRelu(self):
-    with self.test_session() as sess:
+    with self.test_session(use_gpu=False) as sess:
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.2)):
        x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 4])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),)
        cell = grid_rnn_cell.Grid2BasicLSTMCell(
            2, tied=False, non_recurrent_fn=nn_ops.relu)
-        self.assertEqual(cell.state_size, 4)
+        self.assertEqual(cell.state_size, ((2, 2),))

        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 4))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run(
-            [g, s],
-            {x: np.array([[1., 1., 1.]]),
-             m: np.array([[0.1, 0.2, 0.3, 0.4]])})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 4))
-        self.assertAllClose(res[0], [[0.31667367, 0.31667367]])
-        self.assertAllClose(res[1], [[0.29530135, 0.37520045, 0.17044567,
-                                      0.21292259]])
+        res_g, res_s = sess.run([g, s], {
+            x: np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),)
+        })
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertAllClose(res_g[0], [[0.31667367, 0.31667367]])
+        self.assertAllClose(res_s, (([[0.29530135, 0.37520045]],
+                                     [[0.17044567, 0.21292259]]),))

  """LSTMCell
  """

  def testGrid2LSTMCell(self):
-    with self.test_session() as sess:
+    with self.test_session(use_gpu=False) as sess:
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.5)):
        x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 8])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),
+             (array_ops.zeros([1, 2]), array_ops.zeros([1, 2])))
        cell = grid_rnn_cell.Grid2LSTMCell(2, use_peepholes=True)
-        self.assertEqual(cell.state_size, 8)
+        self.assertEqual(cell.state_size, ((2, 2), (2, 2)))

        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 8))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
+        self.assertEqual(s[1].c.get_shape(), (1, 2))
+        self.assertEqual(s[1].h.get_shape(), (1, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
-            x: np.array([[1., 1., 1.]]),
-            m: np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]])
+        res_g, res_s = sess.run([g, s], {
+            x:
+                np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),
+                (np.array([[0.5, 0.6]]), np.array([[0.7, 0.8]])))
        })
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 8))
-        self.assertAllClose(res[0], [[0.95686918, 0.95686918]])
-        self.assertAllClose(res[1], [[2.41515064, 2.41515064, 0.95686918,
-                                      0.95686918, 1.38917875, 1.49043763,
-                                      0.83884692, 0.86036491]])
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
+        self.assertEqual(res_s[1].c.shape, (1, 2))
+        self.assertEqual(res_s[1].h.shape, (1, 2))
+
+        self.assertAllClose(res_g[0], [[0.95686918, 0.95686918]])
+        self.assertAllClose(
+            res_s, (([[2.41515064, 2.41515064]], [[0.95686918, 0.95686918]]),
+                    ([[1.38917875, 1.49043763]], [[0.83884692, 0.86036491]])))

  def testGrid2LSTMCellTied(self):
-    with self.test_session() as sess:
+    with self.test_session(use_gpu=False) as sess:
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.5)):
        x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 8])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),
+             (array_ops.zeros([1, 2]), array_ops.zeros([1, 2])))
        cell = grid_rnn_cell.Grid2LSTMCell(2, tied=True, use_peepholes=True)
-        self.assertEqual(cell.state_size, 8)
+        self.assertEqual(cell.state_size, ((2, 2), (2, 2)))

        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 8))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
+        self.assertEqual(s[1].c.get_shape(), (1, 2))
+        self.assertEqual(s[1].h.get_shape(), (1, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
-            x: np.array([[1., 1., 1.]]),
-            m: np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]])
+        res_g, res_s = sess.run([g, s], {
+            x:
+                np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),
+                (np.array([[0.5, 0.6]]), np.array([[0.7, 0.8]])))
        })
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 8))
-        self.assertAllClose(res[0], [[0.95686918, 0.95686918]])
-        self.assertAllClose(res[1], [[2.41515064, 2.41515064, 0.95686918,
-                                      0.95686918, 1.38917875, 1.49043763,
-                                      0.83884692, 0.86036491]])
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
+        self.assertEqual(res_s[1].c.shape, (1, 2))
+        self.assertEqual(res_s[1].h.shape, (1, 2))
+
+        self.assertAllClose(res_g[0], [[0.95686918, 0.95686918]])
+        self.assertAllClose(
+            res_s, (([[2.41515064, 2.41515064]], [[0.95686918, 0.95686918]]),
+                    ([[1.38917875, 1.49043763]], [[0.83884692, 0.86036491]])))

  def testGrid2LSTMCellWithRelu(self):
    with self.test_session() as sess:
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.5)):
        x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 4])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),)
        cell = grid_rnn_cell.Grid2LSTMCell(
            2, use_peepholes=True, non_recurrent_fn=nn_ops.relu)
-        self.assertEqual(cell.state_size, 4)
+        self.assertEqual(cell.state_size, ((2, 2),))

        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 4))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run(
-            [g, s],
-            {x: np.array([[1., 1., 1.]]),
-             m: np.array([[0.1, 0.2, 0.3, 0.4]])})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 4))
-        self.assertAllClose(res[0], [[2.1831727, 2.1831727]])
-        self.assertAllClose(res[1], [[0.92270052, 1.02325559, 0.66159075,
-                                      0.70475441]])
+        res_g, res_s = sess.run([g, s], {
+            x: np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),)
+        })
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertAllClose(res_g[0], [[2.1831727, 2.1831727]])
+        self.assertAllClose(res_s, (([[0.92270052, 1.02325559]],
+                                     [[0.66159075, 0.70475441]]),))

  """RNNCell
  """
@ -217,74 +265,84 @@ class GridRNNCellTest(test.TestCase):
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.5)):
        x = array_ops.zeros([2, 2])
-        m = array_ops.zeros([2, 4])
+        m = (array_ops.zeros([2, 2]), array_ops.zeros([2, 2]))
        cell = grid_rnn_cell.Grid2BasicRNNCell(2)
-        self.assertEqual(cell.state_size, 4)
+        self.assertEqual(cell.state_size, (2, 2))

        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (2, 2))
-        self.assertEqual(s.get_shape(), (2, 4))
+        self.assertEqual(g[0].get_shape(), (2, 2))
+        self.assertEqual(s[0].get_shape(), (2, 2))
+        self.assertEqual(s[1].get_shape(), (2, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
-            x: np.array([[1., 1.], [2., 2.]]),
-            m: np.array([[0.1, 0.1, 0.1, 0.1], [0.2, 0.2, 0.2, 0.2]])
+        res_g, res_s = sess.run([g, s], {
+            x:
+                np.array([[1., 1.], [2., 2.]]),
+            m: (np.array([[0.1, 0.1], [0.2, 0.2]]), np.array([[0.1, 0.1],
+                                                              [0.2, 0.2]]))
        })
-        self.assertEqual(res[0].shape, (2, 2))
-        self.assertEqual(res[1].shape, (2, 4))
-        self.assertAllClose(res[0], [[0.94685763, 0.94685763],
-                                     [0.99480951, 0.99480951]])
-        self.assertAllClose(res[1],
-                            [[0.94685763, 0.94685763, 0.80049908, 0.80049908],
-                             [0.99480951, 0.99480951, 0.97574311, 0.97574311]])
+        self.assertEqual(res_g[0].shape, (2, 2))
+        self.assertEqual(res_s[0].shape, (2, 2))
+        self.assertEqual(res_s[1].shape, (2, 2))
+
+        self.assertAllClose(res_g, ([[0.94685763, 0.94685763],
+                                     [0.99480951, 0.99480951]],))
+        self.assertAllClose(
+            res_s, ([[0.94685763, 0.94685763], [0.99480951, 0.99480951]],
+                    [[0.80049908, 0.80049908], [0.97574311, 0.97574311]]))

  def testGrid2BasicRNNCellTied(self):
    with self.test_session() as sess:
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.5)):
        x = array_ops.zeros([2, 2])
-        m = array_ops.zeros([2, 4])
+        m = (array_ops.zeros([2, 2]), array_ops.zeros([2, 2]))
        cell = grid_rnn_cell.Grid2BasicRNNCell(2, tied=True)
-        self.assertEqual(cell.state_size, 4)
+        self.assertEqual(cell.state_size, (2, 2))

        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (2, 2))
-        self.assertEqual(s.get_shape(), (2, 4))
+        self.assertEqual(g[0].get_shape(), (2, 2))
+        self.assertEqual(s[0].get_shape(), (2, 2))
+        self.assertEqual(s[1].get_shape(), (2, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
-            x: np.array([[1., 1.], [2., 2.]]),
-            m: np.array([[0.1, 0.1, 0.1, 0.1], [0.2, 0.2, 0.2, 0.2]])
+        res_g, res_s = sess.run([g, s], {
+            x:
+                np.array([[1., 1.], [2., 2.]]),
+            m: (np.array([[0.1, 0.1], [0.2, 0.2]]), np.array([[0.1, 0.1],
+                                                              [0.2, 0.2]]))
        })
-        self.assertEqual(res[0].shape, (2, 2))
-        self.assertEqual(res[1].shape, (2, 4))
-        self.assertAllClose(res[0], [[0.94685763, 0.94685763],
-                                     [0.99480951, 0.99480951]])
-        self.assertAllClose(res[1],
-                            [[0.94685763, 0.94685763, 0.80049908, 0.80049908],
-                             [0.99480951, 0.99480951, 0.97574311, 0.97574311]])
+        self.assertEqual(res_g[0].shape, (2, 2))
+        self.assertEqual(res_s[0].shape, (2, 2))
+        self.assertEqual(res_s[1].shape, (2, 2))
+
+        self.assertAllClose(res_g, ([[0.94685763, 0.94685763],
+                                     [0.99480951, 0.99480951]],))
+        self.assertAllClose(
+            res_s, ([[0.94685763, 0.94685763], [0.99480951, 0.99480951]],
+                    [[0.80049908, 0.80049908], [0.97574311, 0.97574311]]))

  def testGrid2BasicRNNCellWithRelu(self):
    with self.test_session() as sess:
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.5)):
        x = array_ops.zeros([1, 2])
-        m = array_ops.zeros([1, 2])
+        m = (array_ops.zeros([1, 2]),)
        cell = grid_rnn_cell.Grid2BasicRNNCell(2, non_recurrent_fn=nn_ops.relu)
-        self.assertEqual(cell.state_size, 2)
+        self.assertEqual(cell.state_size, (2,))

        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 2))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].get_shape(), (1, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s],
-                       {x: np.array([[1., 1.]]),
-                        m: np.array([[0.1, 0.1]])})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 2))
-        self.assertAllClose(res[0], [[1.80049896, 1.80049896]])
-        self.assertAllClose(res[1], [[0.80049896, 0.80049896]])
+        res_g, res_s = sess.run(
+            [g, s], {x: np.array([[1., 1.]]),
+                     m: np.array([[0.1, 0.1]])})
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].shape, (1, 2))
+        self.assertAllClose(res_g, ([[1.80049896, 1.80049896]],))
+        self.assertAllClose(res_s, ([[0.80049896, 0.80049896]],))

  """1-LSTM
  """
@ -294,51 +352,59 @@ class GridRNNCellTest(test.TestCase):
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.5)) as root_scope:
        x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 4])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),)
        cell = grid_rnn_cell.Grid1LSTMCell(2, use_peepholes=True)
-        self.assertEqual(cell.state_size, 4)
+        self.assertEqual(cell.state_size, ((2, 2),))

        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 4))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run(
-            [g, s],
-            {x: np.array([[1., 1., 1.]]),
-             m: np.array([[0.1, 0.2, 0.3, 0.4]])})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 4))
-        self.assertAllClose(res[0], [[0.91287315, 0.91287315]])
-        self.assertAllClose(res[1],
-                            [[2.26285243, 2.26285243, 0.91287315, 0.91287315]])
+        res_g, res_s = sess.run([g, s], {
+            x: np.array([[1., 1., 1.]]),
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),)
+        })
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
+
+        self.assertAllClose(res_g, ([[0.91287315, 0.91287315]],))
+        self.assertAllClose(res_s, (([[2.26285243, 2.26285243]],
+                                     [[0.91287315, 0.91287315]]),))

        root_scope.reuse_variables()

        x2 = array_ops.zeros([0, 0])
        g2, s2 = cell(x2, m)
-        self.assertEqual(g2.get_shape(), (1, 2))
-        self.assertEqual(s2.get_shape(), (1, 4))
+        self.assertEqual(g2[0].get_shape(), (1, 2))
+        self.assertEqual(s2[0].c.get_shape(), (1, 2))
+        self.assertEqual(s2[0].h.get_shape(), (1, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run([g2, s2], {m: res[1]})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 4))
-        self.assertAllClose(res[0], [[0.9032144, 0.9032144]])
-        self.assertAllClose(res[1],
-                            [[2.79966092, 2.79966092, 0.9032144, 0.9032144]])
+        res_g2, res_s2 = sess.run([g2, s2], {m: res_s})
+        self.assertEqual(res_g2[0].shape, (1, 2))
+        self.assertEqual(res_s2[0].c.shape, (1, 2))
+        self.assertEqual(res_s2[0].h.shape, (1, 2))
+
+        self.assertAllClose(res_g2, ([[0.9032144, 0.9032144]],))
+        self.assertAllClose(res_s2, (([[2.79966092, 2.79966092]],
+                                      [[0.9032144, 0.9032144]]),))

        g3, s3 = cell(x2, m)
-        self.assertEqual(g3.get_shape(), (1, 2))
-        self.assertEqual(s3.get_shape(), (1, 4))
+        self.assertEqual(g3[0].get_shape(), (1, 2))
+        self.assertEqual(s3[0].c.get_shape(), (1, 2))
+        self.assertEqual(s3[0].h.get_shape(), (1, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run([g3, s3], {m: res[1]})
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 4))
-        self.assertAllClose(res[0], [[0.92727238, 0.92727238]])
-        self.assertAllClose(res[1],
-                            [[3.3529923, 3.3529923, 0.92727238, 0.92727238]])
+        res_g3, res_s3 = sess.run([g3, s3], {m: res_s2})
+        self.assertEqual(res_g3[0].shape, (1, 2))
+        self.assertEqual(res_s3[0].c.shape, (1, 2))
+        self.assertEqual(res_s3[0].h.shape, (1, 2))
+        self.assertAllClose(res_g3, ([[0.92727238, 0.92727238]],))
+        self.assertAllClose(res_s3, (([[3.3529923, 3.3529923]],
+                                      [[0.92727238, 0.92727238]]),))

  """3-LSTM
  """
@ -348,32 +414,42 @@ class GridRNNCellTest(test.TestCase):
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.5)):
        x = array_ops.zeros([1, 3])
-        m = array_ops.zeros([1, 12])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),
+             (array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),
+             (array_ops.zeros([1, 2]), array_ops.zeros([1, 2])))
        cell = grid_rnn_cell.Grid3LSTMCell(2, use_peepholes=True)
-        self.assertEqual(cell.state_size, 12)
+        self.assertEqual(cell.state_size, ((2, 2), (2, 2), (2, 2)))

        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (1, 2))
-        self.assertEqual(s.get_shape(), (1, 12))
+        self.assertEqual(g[0].get_shape(), (1, 2))
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))
+        self.assertEqual(s[1].c.get_shape(), (1, 2))
+        self.assertEqual(s[1].h.get_shape(), (1, 2))
+        self.assertEqual(s[2].c.get_shape(), (1, 2))
+        self.assertEqual(s[2].h.get_shape(), (1, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {
+        res_g, res_s = sess.run([g, s], {
            x:
                np.array([[1., 1., 1.]]),
-            m:
-                np.array([[
-                    0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, -0.1, -0.2, -0.3,
-                    -0.4
-                ]])
+            m: ((np.array([[0.1, 0.2]]), np.array([[0.3, 0.4]])),
+                (np.array([[0.5, 0.6]]), np.array([[0.7, 0.8]])), (np.array(
+                    [[-0.1, -0.2]]), np.array([[-0.3, -0.4]])))
        })
-        self.assertEqual(res[0].shape, (1, 2))
-        self.assertEqual(res[1].shape, (1, 12))
+        self.assertEqual(res_g[0].shape, (1, 2))
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))
+        self.assertEqual(res_s[1].c.shape, (1, 2))
+        self.assertEqual(res_s[1].h.shape, (1, 2))
+        self.assertEqual(res_s[2].c.shape, (1, 2))
+        self.assertEqual(res_s[2].h.shape, (1, 2))

-        self.assertAllClose(res[0], [[0.96892911, 0.96892911]])
-        self.assertAllClose(res[1], [[2.45227885, 2.45227885, 0.96892911,
-                                      0.96892911, 1.33592629, 1.4373529,
-                                      0.80867189, 0.83247656, 0.7317788,
-                                      0.63205892, 0.56548983, 0.50446129]])
+        self.assertAllClose(res_g, ([[0.96892911, 0.96892911]],))
+        self.assertAllClose(
+            res_s, (([[2.45227885, 2.45227885]], [[0.96892911, 0.96892911]]),
+                    ([[1.33592629, 1.4373529]], [[0.80867189, 0.83247656]]),
+                    ([[0.7317788, 0.63205892]], [[0.56548983, 0.50446129]])))

  """Edge cases
  """
@ -383,7 +459,7 @@ class GridRNNCellTest(test.TestCase):
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.5)):
        x = array_ops.zeros([3, 2])
-        m = array_ops.zeros([0, 0])
+        m = ()

        # this is equivalent to relu
        cell = grid_rnn_cell.GridRNNCell(
@ -394,21 +470,22 @@ class GridRNNCellTest(test.TestCase):
            non_recurrent_dims=0,
            non_recurrent_fn=nn_ops.relu)
        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (3, 2))
-        self.assertEqual(s.get_shape(), (0, 0))
+        self.assertEqual(g[0].get_shape(), (3, 2))
+        self.assertEqual(s, ())

        sess.run([variables.global_variables_initializer()])
-        res = sess.run([g, s], {x: np.array([[1., -1.], [-2, 1], [2, -1]])})
-        self.assertEqual(res[0].shape, (3, 2))
-        self.assertEqual(res[1].shape, (0, 0))
-        self.assertAllClose(res[0], [[0, 0], [0, 0], [0.5, 0.5]])
+        res_g, res_s = sess.run([g, s],
+                                {x: np.array([[1., -1.], [-2, 1], [2, -1]])})
+        self.assertEqual(res_g[0].shape, (3, 2))
+        self.assertEqual(res_s, ())
+        self.assertAllClose(res_g, ([[0, 0], [0, 0], [0.5, 0.5]],))

  def testGridRNNEdgeCasesNoOutput(self):
    with self.test_session() as sess:
      with variable_scope.variable_scope(
          'root', initializer=init_ops.constant_initializer(0.5)):
        x = array_ops.zeros([1, 2])
-        m = array_ops.zeros([1, 4])
+        m = ((array_ops.zeros([1, 2]), array_ops.zeros([1, 2])),)

        # This cell produces no output
        cell = grid_rnn_cell.GridRNNCell(
@ -419,16 +496,18 @@ class GridRNNCellTest(test.TestCase):
            non_recurrent_dims=0,
            non_recurrent_fn=nn_ops.relu)
        g, s = cell(x, m)
-        self.assertEqual(g.get_shape(), (0, 0))
-        self.assertEqual(s.get_shape(), (1, 4))
+        self.assertEqual(g, ())
+        self.assertEqual(s[0].c.get_shape(), (1, 2))
+        self.assertEqual(s[0].h.get_shape(), (1, 2))

        sess.run([variables.global_variables_initializer()])
-        res = sess.run(
-            [g, s],
-            {x: np.array([[1., 1.]]),
-             m: np.array([[0.1, 0.1, 0.1, 0.1]])})
-        self.assertEqual(res[0].shape, (0, 0))
-        self.assertEqual(res[1].shape, (1, 4))
+        res_g, res_s = sess.run([g, s], {
+            x: np.array([[1., 1.]]),
+            m: ((np.array([[0.1, 0.1]]), np.array([[0.1, 0.1]])),)
+        })
+        self.assertEqual(res_g, ())
+        self.assertEqual(res_s[0].c.shape, (1, 2))
+        self.assertEqual(res_s[0].h.shape, (1, 2))

  """Test with tf.nn.rnn
  """
@ -451,20 +530,29 @@ class GridRNNCellTest(test.TestCase):
      outputs, state = core_rnn.static_rnn(cell, inputs, dtype=dtypes.float32)

    self.assertEqual(len(outputs), len(inputs))
-    self.assertEqual(state.get_shape(), (batch_size, 8))
+    self.assertEqual(state[0].c.get_shape(), (batch_size, 2))
+    self.assertEqual(state[0].h.get_shape(), (batch_size, 2))
+    self.assertEqual(state[1].c.get_shape(), (batch_size, 2))
+    self.assertEqual(state[1].h.get_shape(), (batch_size, 2))

    for out, inp in zip(outputs, inputs):
-      self.assertEqual(out.get_shape()[0], inp.get_shape()[0])
-      self.assertEqual(out.get_shape()[1], num_units)
-      self.assertEqual(out.dtype, inp.dtype)
+      self.assertEqual(len(out), 1)
+      self.assertEqual(out[0].get_shape()[0], inp.get_shape()[0])
+      self.assertEqual(out[0].get_shape()[1], num_units)
+      self.assertEqual(out[0].dtype, inp.dtype)

    with self.test_session() as sess:
      sess.run(variables.global_variables_initializer())

      input_value = np.ones((batch_size, input_size))
      values = sess.run(outputs + [state], feed_dict={inputs[0]: input_value})
-      for v in values:
-        self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[:-1]:
+        for v in tp:
+          self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[-1]:
+        for st in tp:
+          for v in st:
+            self.assertTrue(np.all(np.isfinite(v)))

  def testGrid2LSTMCellReLUWithRNN(self):
    batch_size = 3
@ -478,27 +566,33 @@ class GridRNNCellTest(test.TestCase):
          num_units=num_units, non_recurrent_fn=nn_ops.relu)

      inputs = max_length * [
-          array_ops.placeholder(
-              dtypes.float32, shape=(batch_size, input_size))
+          array_ops.placeholder(dtypes.float32, shape=(batch_size, input_size))
      ]

      outputs, state = core_rnn.static_rnn(cell, inputs, dtype=dtypes.float32)

    self.assertEqual(len(outputs), len(inputs))
-    self.assertEqual(state.get_shape(), (batch_size, 4))
+    self.assertEqual(state[0].c.get_shape(), (batch_size, 2))
+    self.assertEqual(state[0].h.get_shape(), (batch_size, 2))

    for out, inp in zip(outputs, inputs):
-      self.assertEqual(out.get_shape()[0], inp.get_shape()[0])
-      self.assertEqual(out.get_shape()[1], num_units)
-      self.assertEqual(out.dtype, inp.dtype)
+      self.assertEqual(len(out), 1)
+      self.assertEqual(out[0].get_shape()[0], inp.get_shape()[0])
+      self.assertEqual(out[0].get_shape()[1], num_units)
+      self.assertEqual(out[0].dtype, inp.dtype)

    with self.test_session() as sess:
      sess.run(variables.global_variables_initializer())

      input_value = np.ones((batch_size, input_size))
      values = sess.run(outputs + [state], feed_dict={inputs[0]: input_value})
-      for v in values:
-        self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[:-1]:
+        for v in tp:
+          self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[-1]:
+        for st in tp:
+          for v in st:
+            self.assertTrue(np.all(np.isfinite(v)))

  def testGrid3LSTMCellReLUWithRNN(self):
    batch_size = 3
@ -512,27 +606,35 @@ class GridRNNCellTest(test.TestCase):
          num_units=num_units, non_recurrent_fn=nn_ops.relu)

      inputs = max_length * [
-          array_ops.placeholder(
-              dtypes.float32, shape=(batch_size, input_size))
+          array_ops.placeholder(dtypes.float32, shape=(batch_size, input_size))
      ]

      outputs, state = core_rnn.static_rnn(cell, inputs, dtype=dtypes.float32)

    self.assertEqual(len(outputs), len(inputs))
-    self.assertEqual(state.get_shape(), (batch_size, 8))
+    self.assertEqual(state[0].c.get_shape(), (batch_size, 2))
+    self.assertEqual(state[0].h.get_shape(), (batch_size, 2))
+    self.assertEqual(state[1].c.get_shape(), (batch_size, 2))
+    self.assertEqual(state[1].h.get_shape(), (batch_size, 2))

    for out, inp in zip(outputs, inputs):
-      self.assertEqual(out.get_shape()[0], inp.get_shape()[0])
-      self.assertEqual(out.get_shape()[1], num_units)
-      self.assertEqual(out.dtype, inp.dtype)
+      self.assertEqual(len(out), 1)
+      self.assertEqual(out[0].get_shape()[0], inp.get_shape()[0])
+      self.assertEqual(out[0].get_shape()[1], num_units)
+      self.assertEqual(out[0].dtype, inp.dtype)

    with self.test_session() as sess:
      sess.run(variables.global_variables_initializer())

      input_value = np.ones((batch_size, input_size))
      values = sess.run(outputs + [state], feed_dict={inputs[0]: input_value})
-      for v in values:
-        self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[:-1]:
+        for v in tp:
+          self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[-1]:
+        for st in tp:
+          for v in st:
+            self.assertTrue(np.all(np.isfinite(v)))

  def testGrid1LSTMCellWithRNN(self):
    batch_size = 3
@ -553,20 +655,91 @@ class GridRNNCellTest(test.TestCase):
      outputs, state = core_rnn.static_rnn(cell, inputs, dtype=dtypes.float32)

    self.assertEqual(len(outputs), len(inputs))
-    self.assertEqual(state.get_shape(), (batch_size, 4))
+    self.assertEqual(state[0].c.get_shape(), (batch_size, 2))
+    self.assertEqual(state[0].h.get_shape(), (batch_size, 2))

    for out, inp in zip(outputs, inputs):
-      self.assertEqual(out.get_shape(), (3, num_units))
-      self.assertEqual(out.dtype, inp.dtype)
+      self.assertEqual(len(out), 1)
+      self.assertEqual(out[0].get_shape(), (3, num_units))
+      self.assertEqual(out[0].dtype, inp.dtype)

    with self.test_session() as sess:
      sess.run(variables.global_variables_initializer())

      input_value = np.ones((batch_size, input_size))
      values = sess.run(outputs + [state], feed_dict={inputs[0]: input_value})
-      for v in values:
-        self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[:-1]:
+        for v in tp:
+          self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[-1]:
+        for st in tp:
+          for v in st:
+            self.assertTrue(np.all(np.isfinite(v)))

+  def testGrid2LSTMCellWithRNNAndDynamicBatchSize(self):
+    """Test for #4296."""
+    input_size = 5
+    max_length = 6  # unrolled up to this length
+    num_units = 2
+
+    with variable_scope.variable_scope(
+        'root', initializer=init_ops.constant_initializer(0.5)):
+      cell = grid_rnn_cell.Grid2LSTMCell(num_units=num_units)
+
+      inputs = max_length * [
+          array_ops.placeholder(dtypes.float32, shape=(None, input_size))
+      ]
+
+      outputs, state = core_rnn.static_rnn(cell, inputs, dtype=dtypes.float32)
+
+    self.assertEqual(len(outputs), len(inputs))
+
+    for out, inp in zip(outputs, inputs):
+      self.assertEqual(len(out), 1)
+      self.assertTrue(out[0].get_shape()[0].value is None)
+      self.assertEqual(out[0].get_shape()[1], num_units)
+      self.assertEqual(out[0].dtype, inp.dtype)
+
+    with self.test_session() as sess:
+      sess.run(variables.global_variables_initializer())
+
+      input_value = np.ones((3, input_size))
+      values = sess.run(outputs + [state], feed_dict={inputs[0]: input_value})
+      for tp in values[:-1]:
+        for v in tp:
+          self.assertTrue(np.all(np.isfinite(v)))
+      for tp in values[-1]:
+        for st in tp:
+          for v in st:
+            self.assertTrue(np.all(np.isfinite(v)))
+
+  def testGrid2LSTMCellLegacy(self):
+    """Test for legacy case (when state_is_tuple=False)."""
+    with self.test_session() as sess:
+      with variable_scope.variable_scope(
+          'root', initializer=init_ops.constant_initializer(0.5)):
+        x = array_ops.zeros([1, 3])
+        m = array_ops.zeros([1, 8])
+        cell = grid_rnn_cell.Grid2LSTMCell(
+            2, use_peepholes=True, state_is_tuple=False, output_is_tuple=False)
+        self.assertEqual(cell.state_size, 8)
+
+        g, s = cell(x, m)
+        self.assertEqual(g.get_shape(), (1, 2))
+        self.assertEqual(s.get_shape(), (1, 8))
+
+        sess.run([variables.global_variables_initializer()])
+        res = sess.run([g, s], {
+            x: np.array([[1., 1., 1.]]),
+            m: np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]])
+        })
+        self.assertEqual(res[0].shape, (1, 2))
+        self.assertEqual(res[1].shape, (1, 8))
+        self.assertAllClose(res[0], [[0.95686918, 0.95686918]])
+        self.assertAllClose(res[1], [[
+            2.41515064, 2.41515064, 0.95686918, 0.95686918, 1.38917875,
+            1.49043763, 0.83884692, 0.86036491
+        ]])

 if __name__ == '__main__':
  test.main()
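
Note on the state layout exercised by the tests above: the legacy test feeds a single `(1, 8)` state and the tuple-based tests feed `((c, h), (c, h))`, yet both produce the same numbers. That suggests the concatenated state is laid out per recurrent dimension as `c` followed by `h`, each `num_units` wide. The sketch below illustrates that mapping with NumPy only. The helper `split_concatenated_state` is hypothetical and is not part of `grid_rnn_cell`; the layout is an assumption inferred from the test values, not from the implementation.

```python
import numpy as np


def split_concatenated_state(state, num_units, num_recurrent_dims):
    """Split a legacy concatenated LSTM state into nested (c, h) pairs.

    Hypothetical helper for illustration only. It assumes each recurrent
    dimension contributes a cell state `c` followed by an output `h`,
    each `num_units` columns wide.
    """
    state = np.asarray(state)
    expected_cols = 2 * num_units * num_recurrent_dims
    if state.ndim != 2 or state.shape[1] != expected_cols:
        raise ValueError(
            'expected a 2-D state with %d columns, got shape %s'
            % (expected_cols, state.shape))

    pairs = []
    for dim in range(num_recurrent_dims):
        start = 2 * num_units * dim
        c = state[:, start:start + num_units]
        h = state[:, start + num_units:start + 2 * num_units]
        pairs.append((c, h))
    return tuple(pairs)


# Same values as the legacy feed in testGrid2LSTMCellLegacy.
legacy = np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]])
nested = split_concatenated_state(legacy, num_units=2, num_recurrent_dims=2)
```

Under these assumptions, `nested` has the same structure as the `m` feed in `testGrid2LSTMCellTied`, which may help when porting callers from `state_is_tuple=False` to the tuple form.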
--- a/tensorflow/contrib/grid_rnn/python/ops/grid_rnn_cell.py
+++ b/tensorflow/contrib/grid_rnn/python/ops/grid_rnn_cell.py
@ -25,6 +25,8 @@ from tensorflow.python.ops import array_ops
 from tensorflow.python.ops import math_ops
 from tensorflow.python.ops import nn
 from tensorflow.python.ops import variable_scope as vs
+
+from tensorflow.python.platform import tf_logging as logging
 from tensorflow.contrib import layers
 from tensorflow.contrib import rnn

@ -53,7 +55,9 @@ class GridRNNCell(rnn.RNNCell):
               non_recurrent_dims=None,
               tied=False,
               cell_fn=None,
-               non_recurrent_fn=None):
+               non_recurrent_fn=None,
+               state_is_tuple=True,
+               output_is_tuple=True):
    """Initialize the parameters of a Grid RNN cell

    Args:
@ -68,26 +72,47 @@ class GridRNNCell(rnn.RNNCell):
      non_recurrent_dims: int or list, List of dimensions that are not
        recurrent.
              The transfer function for non-recurrent dimensions is specified
-                via `non_recurrent_fn`,
-              which is default to be `tensorflow.nn.relu`.
+                via `non_recurrent_fn`, which
+                defaults to `tensorflow.nn.relu`.
      tied: bool, Whether to share the weights among the dimensions of this
        GridRNN cell.
              If there are non-recurrent dimensions in the grid, weights are
-                shared between each
-              group of recurrent and non-recurrent dimensions.
-      cell_fn: function, a function which returns the recurrent cell object. Has
-        to be in the following signature:
-              def cell_func(num_units, input_size):
+                shared between each group of recurrent and non-recurrent
+                dimensions.
+      cell_fn: function, a function which returns the recurrent cell object.
+        Has to be in the following signature:
+              ```
+              def cell_func(num_units):
                # ...
-
+              ```
              and returns an object of type `RNNCell`. If None, LSTMCell with
                default parameters will be used.
+        Note that if you use a custom RNNCell (with `cell_fn`), it is your
+        responsibility to make sure the inner cell uses `state_is_tuple=True`.
+
      non_recurrent_fn: a tensorflow Op that will be the transfer function of
        the non-recurrent dimensions
+      state_is_tuple: If True, accepted and returned states are tuples of the
+        states of the recurrent dimensions. If False, they are concatenated
+        along the column axis. The latter behavior will soon be deprecated.
+
+        Note that if you use a custom RNNCell (with `cell_fn`), it is your
+        responsibility to make sure the inner cell uses `state_is_tuple=True`.
+
+      output_is_tuple: If True, the output is a tuple of the outputs of the
+        recurrent dimensions. If False, they are concatenated along the
+        column axis. The latter behavior will soon be deprecated.

    Raises:
      TypeError: if cell_fn does not return an RNNCell instance.
    """
+    if not state_is_tuple:
+      logging.warning('%s: Using a concatenated state is slower and will '
+                      'soon be deprecated.  Use state_is_tuple=True.', self)
+    if not output_is_tuple:
+      logging.warning('%s: Using a concatenated output is slower and will '
+                      'soon be deprecated.  Use output_is_tuple=True.', self)
+
    if num_dims < 1:
      raise ValueError('dims must be >= 1: {}'.format(num_dims))

@ -96,37 +121,41 @@ class GridRNNCell(rnn.RNNCell):
                                     non_recurrent_fn or nn.relu, tied,
                                     num_units)

-    cell_input_size = (self._config.num_dims - 1) * num_units
+    self._state_is_tuple = state_is_tuple
+    self._output_is_tuple = output_is_tuple
+
    if cell_fn is None:
      my_cell_fn = functools.partial(
-          rnn.LSTMCell,
-          num_units=num_units, input_size=cell_input_size,
-          state_is_tuple=False)
+          rnn.LSTMCell, num_units=num_units, state_is_tuple=state_is_tuple)
    else:
-      my_cell_fn = lambda: cell_fn(num_units, cell_input_size)
+      my_cell_fn = lambda: cell_fn(num_units)
    if tied:
      self._cells = [my_cell_fn()] * num_dims
    else:
      self._cells = [my_cell_fn() for _ in range(num_dims)]
    if not isinstance(self._cells[0], rnn.RNNCell):
-      raise TypeError(
-          'cell_fn must return an RNNCell instance, saw: %s'
-          % type(self._cells[0]))
+      raise TypeError('cell_fn must return an RNNCell instance, saw: %s' %
+                      type(self._cells[0]))

-  @property
-  def input_size(self):
-    # temporarily using num_units as the input_size of each dimension.
-    # The actual input size only determined when this cell get invoked,
-    # so this information can be considered unreliable.
-    return self._config.num_units * len(self._config.inputs)
+    if self._output_is_tuple:
+      self._output_size = tuple(self._cells[0].output_size
+                                for _ in self._config.outputs)
+    else:
+      self._output_size = self._cells[0].output_size * len(self._config.outputs)
+
+    if self._state_is_tuple:
+      self._state_size = tuple(self._cells[0].state_size
+                               for _ in self._config.recurrents)
+    else:
+      self._state_size = self._cell_state_size() * len(self._config.recurrents)

  @property
  def output_size(self):
-    return self._cells[0].output_size * len(self._config.outputs)
+    return self._output_size

  @property
  def state_size(self):
-    return self._cells[0].state_size * len(self._config.recurrents)
+    return self._state_size

  def __call__(self, inputs, state, scope=None):
    """Run one step of GridRNN.
@ -145,76 +174,148 @@ class GridRNNCell(rnn.RNNCell):
      - A 2D, batch x state_size, Tensor representing the new state of the cell
        after reading "inputs" when previous state was "state".
    """
-    state_sz = state.get_shape().as_list()[1]
-    if self.state_size != state_sz:
-      raise ValueError(
-          'Actual state size not same as specified: {} vs {}.'.format(
-              state_sz, self.state_size))
-
    conf = self._config
-    dtype = inputs.dtype if inputs is not None else state.dtype
+    dtype = inputs.dtype

-    # c_prev is `m`, and m_prev is `h` in the paper.
-    # Keep c and m here for consistency with the codebase
-    c_prev = [None] * self._config.num_dims
-    m_prev = [None] * self._config.num_dims
-    cell_output_size = self._cells[0].state_size - conf.num_units
-
-    # for LSTM   : state = memory cell + output, hence cell_output_size > 0
-    # for GRU/RNN: state = output (whose size is equal to _num_units),
-    #              hence cell_output_size = 0
-    for recurrent_dim, start_idx in zip(self._config.recurrents, range(
-        0, self.state_size, self._cells[0].state_size)):
-      if cell_output_size > 0:
-        c_prev[recurrent_dim] = array_ops.slice(state, [0, start_idx],
-                                                [-1, conf.num_units])
-        m_prev[recurrent_dim] = array_ops.slice(
-            state, [0, start_idx + conf.num_units], [-1, cell_output_size])
-      else:
-        m_prev[recurrent_dim] = array_ops.slice(state, [0, start_idx],
-                                                [-1, conf.num_units])
+    c_prev, m_prev, cell_output_size = self._extract_states(state)

    new_output = [None] * conf.num_dims
    new_state = [None] * conf.num_dims

    with vs.variable_scope(scope or type(self).__name__):  # GridRNNCell
+      # project input, populate c_prev and m_prev
+      self._project_input(inputs, c_prev, m_prev, cell_output_size > 0)

-      # project input
-      if inputs is not None and sum(inputs.get_shape().as_list()) > 0 and len(
-          conf.inputs) > 0:
-        input_splits = array_ops.split(
-            value=inputs, num_or_size_splits=len(conf.inputs), axis=1)
-        input_sz = input_splits[0].get_shape().as_list()[1]
-
-        for i, j in enumerate(conf.inputs):
-          input_project_m = vs.get_variable(
-              'project_m_{}'.format(j), [input_sz, conf.num_units], dtype=dtype)
-          m_prev[j] = math_ops.matmul(input_splits[i], input_project_m)
-
-          if cell_output_size > 0:
-            input_project_c = vs.get_variable(
-                'project_c_{}'.format(j), [input_sz, conf.num_units],
-                dtype=dtype)
-            c_prev[j] = math_ops.matmul(input_splits[i], input_project_c)
-
+      # propagate along dimensions, first for non-priority dimensions
+      # then priority dimensions
      _propagate(conf.non_priority, conf, self._cells, c_prev, m_prev,
                 new_output, new_state, True)
      _propagate(conf.priority, conf, self._cells,
                 c_prev, m_prev, new_output, new_state, False)

+      # collect outputs and states
      output_tensors = [new_output[i] for i in self._config.outputs]
-      output = array_ops.zeros(
-          [0, 0], dtype) if len(output_tensors) == 0 else array_ops.concat(
-              output_tensors, 1)
+      if self._output_is_tuple:
+        output = tuple(output_tensors)
+      else:
+        if output_tensors:
+          output = array_ops.concat(output_tensors, 1)
+        else:
+          output = array_ops.zeros([0, 0], dtype)

-      state_tensors = [new_state[i] for i in self._config.recurrents]
-      states = array_ops.zeros(
-          [0, 0],
-          dtype) if len(state_tensors) == 0 else array_ops.concat(state_tensors,
-                                                                  1)
+      if self._state_is_tuple:
+        states = tuple(new_state[i] for i in self._config.recurrents)
+      else:
+        # concat each state first, then flatten the whole thing
+        state_tensors = [
+            x for i in self._config.recurrents for x in new_state[i]
+        ]
+        if state_tensors:
+          states = array_ops.concat(state_tensors, 1)
+        else:
+          states = array_ops.zeros([0, 0], dtype)

    return output, states

+  def _extract_states(self, state):
+    """Extract the cell and previous output tensors from the given state.
+
+    Args:
+      state: The RNN state.
+
+    Returns:
+      Tuple of the cell value, previous output, and cell_output_size.
+
+    Raises:
+      ValueError: If len(self._config.recurrents) != len(state).
+    """
+    conf = self._config
+
+    # c_prev is `m` (cell value), and
+    # m_prev is `h` (previous output) in the paper.
+    # Keeping c and m here for consistency with the codebase
+    c_prev = [None] * conf.num_dims
+    m_prev = [None] * conf.num_dims
+
+    # for LSTM   : state = memory cell + output, hence cell_output_size > 0
+    # for GRU/RNN: state = output (whose size is equal to _num_units),
+    #              hence cell_output_size = 0
+    total_cell_state_size = self._cell_state_size()
+    cell_output_size = total_cell_state_size - conf.num_units
+
+    if self._state_is_tuple:
+      if len(conf.recurrents) != len(state):
+        raise ValueError('Expected state as a tuple of {} '
+                         'elements'.format(len(conf.recurrents)))
+
+      for recurrent_dim, recurrent_state in zip(conf.recurrents, state):
+        if cell_output_size > 0:
+          c_prev[recurrent_dim], m_prev[recurrent_dim] = recurrent_state
+        else:
+          m_prev[recurrent_dim] = recurrent_state
+    else:
+      for recurrent_dim, start_idx in zip(conf.recurrents,
+                                          range(0, self.state_size,
+                                                total_cell_state_size)):
+        if cell_output_size > 0:
+          c_prev[recurrent_dim] = array_ops.slice(state, [0, start_idx],
+                                                  [-1, conf.num_units])
+          m_prev[recurrent_dim] = array_ops.slice(
+              state, [0, start_idx + conf.num_units], [-1, cell_output_size])
+        else:
+          m_prev[recurrent_dim] = array_ops.slice(state, [0, start_idx],
+                                                  [-1, conf.num_units])
+    return c_prev, m_prev, cell_output_size
+
+  def _project_input(self, inputs, c_prev, m_prev, with_c):
+    """Fills in c_prev and m_prev with projected input, for input dimensions.
+
+    Args:
+      inputs: inputs tensor
+      c_prev: cell value
+      m_prev: previous output
+      with_c: boolean; whether to include project_c.
+
+    Raises:
+      ValueError: if len(self._config.inputs) != len(inputs)
+    """
+    conf = self._config
+
+    if (inputs is not None and inputs.get_shape().with_rank(2)[1].value > 0 and
+        conf.inputs):
+      if isinstance(inputs, tuple):
+        if len(conf.inputs) != len(inputs):
+          raise ValueError('Expect inputs as a tuple of {} '
+                           'tensors'.format(len(conf.inputs)))
+        input_splits = inputs
+      else:
+        input_splits = array_ops.split(
+            value=inputs, num_or_size_splits=len(conf.inputs), axis=1)
+      input_sz = input_splits[0].get_shape().with_rank(2)[1].value
+
+      for i, j in enumerate(conf.inputs):
+        input_project_m = vs.get_variable(
+            'project_m_{}'.format(j), [input_sz, conf.num_units],
+            dtype=inputs.dtype)
+        m_prev[j] = math_ops.matmul(input_splits[i], input_project_m)
+
+        if with_c:
+          input_project_c = vs.get_variable(
+              'project_c_{}'.format(j), [input_sz, conf.num_units],
+              dtype=inputs.dtype)
+          c_prev[j] = math_ops.matmul(input_splits[i], input_project_c)
+
+  def _cell_state_size(self):
+    """Total size of the state of the inner cell used in this grid.
+
+    Returns:
+      Total size of the state of the inner cell.
+    """
+    state_sizes = self._cells[0].state_size
+    if isinstance(state_sizes, tuple):
+      return sum(state_sizes)
+    return state_sizes
+

 """Specialized cells, for convenience
 """
@ -223,11 +324,17 @@ class GridRNNCell(rnn.RNNCell):
 class Grid1BasicRNNCell(GridRNNCell):
  """1D BasicRNN cell"""

-  def __init__(self, num_units):
+  def __init__(self, num_units, state_is_tuple=True, output_is_tuple=True):
    super(Grid1BasicRNNCell, self).__init__(
-        num_units=num_units, num_dims=1,
-        input_dims=0, output_dims=0, priority_dims=0, tied=False,
-        cell_fn=lambda n, i: rnn.BasicRNNCell(num_units=n, input_size=i))
+        num_units=num_units,
+        num_dims=1,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=False,
+        cell_fn=lambda n: rnn.BasicRNNCell(num_units=n),
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)


 class Grid2BasicRNNCell(GridRNNCell):
@ -240,71 +347,112 @@ class Grid2BasicRNNCell(GridRNNCell):
  specified.
  """

-  def __init__(self, num_units, tied=False, non_recurrent_fn=None):
+  def __init__(self,
+               num_units,
+               tied=False,
+               non_recurrent_fn=None,
+               state_is_tuple=True,
+               output_is_tuple=True):
    super(Grid2BasicRNNCell, self).__init__(
-        num_units=num_units, num_dims=2,
-        input_dims=0, output_dims=0, priority_dims=0, tied=tied,
+        num_units=num_units,
+        num_dims=2,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=tied,
        non_recurrent_dims=None if non_recurrent_fn is None else 0,
-        cell_fn=lambda n, i: rnn.BasicRNNCell(num_units=n, input_size=i),
-        non_recurrent_fn=non_recurrent_fn)
+        cell_fn=lambda n: rnn.BasicRNNCell(num_units=n),
+        non_recurrent_fn=non_recurrent_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)


 class Grid1BasicLSTMCell(GridRNNCell):
-  """1D BasicLSTM cell"""
+  """1D BasicLSTM cell."""

-  def __init__(self, num_units, forget_bias=1):
+  def __init__(self,
+               num_units,
+               forget_bias=1,
+               state_is_tuple=True,
+               output_is_tuple=True):
+    def cell_fn(n):
+      return rnn.BasicLSTMCell(num_units=n, forget_bias=forget_bias)
    super(Grid1BasicLSTMCell, self).__init__(
-        num_units=num_units, num_dims=1,
-        input_dims=0, output_dims=0, priority_dims=0, tied=False,
-        cell_fn=lambda n, i: rnn.BasicLSTMCell(
-            num_units=n,
-            forget_bias=forget_bias, input_size=i,
-            state_is_tuple=False))
+        num_units=num_units,
+        num_dims=1,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=False,
+        cell_fn=cell_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)


 class Grid2BasicLSTMCell(GridRNNCell):
-  """2D BasicLSTM cell
+  """2D BasicLSTM cell.

-    This creates a 2D cell which receives input and gives output in the first
-    dimension.
+  This creates a 2D cell which receives input and gives output in the first
+  dimension.

-    The first dimension can optionally be non-recurrent if `non_recurrent_fn` is
-    specified.
+  The first dimension can optionally be non-recurrent if `non_recurrent_fn` is
+  specified.
  """

  def __init__(self,
               num_units,
               tied=False,
               non_recurrent_fn=None,
-               forget_bias=1):
+               forget_bias=1,
+               state_is_tuple=True,
+               output_is_tuple=True):
+    def cell_fn(n):
+      return rnn.BasicLSTMCell(num_units=n, forget_bias=forget_bias)
    super(Grid2BasicLSTMCell, self).__init__(
-        num_units=num_units, num_dims=2,
-        input_dims=0, output_dims=0, priority_dims=0, tied=tied,
+        num_units=num_units,
+        num_dims=2,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=tied,
        non_recurrent_dims=None if non_recurrent_fn is None else 0,
-        cell_fn=lambda n, i: rnn.BasicLSTMCell(
-            num_units=n, forget_bias=forget_bias, input_size=i,
-            state_is_tuple=False),
-        non_recurrent_fn=non_recurrent_fn)
+        cell_fn=cell_fn,
+        non_recurrent_fn=non_recurrent_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)


 class Grid1LSTMCell(GridRNNCell):
-  """1D LSTM cell
+  """1D LSTM cell.

-    This is different from Grid1BasicLSTMCell because it gives options to
-    specify the forget bias and enabling peepholes
+  This is different from Grid1BasicLSTMCell because it gives options to
+  specify the forget bias and enable peepholes.
  """

-  def __init__(self, num_units, use_peepholes=False, forget_bias=1.0):
+  def __init__(self,
+               num_units,
+               use_peepholes=False,
+               forget_bias=1.0,
+               state_is_tuple=True,
+               output_is_tuple=True):
+
+    def cell_fn(n):
+      return rnn.LSTMCell(
+          num_units=n, forget_bias=forget_bias, use_peepholes=use_peepholes)
+
    super(Grid1LSTMCell, self).__init__(
-        num_units=num_units, num_dims=1,
-        input_dims=0, output_dims=0, priority_dims=0,
-        cell_fn=lambda n, i: rnn.LSTMCell(
-            num_units=n, input_size=i, use_peepholes=use_peepholes,
-            forget_bias=forget_bias, state_is_tuple=False))
+        num_units=num_units,
+        num_dims=1,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        cell_fn=cell_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)


 class Grid2LSTMCell(GridRNNCell):
-  """2D LSTM cell
+  """2D LSTM cell.

    This creates a 2D cell which receives input and gives output in the first
    dimension.
@ -317,19 +465,30 @@ class Grid2LSTMCell(GridRNNCell):
               tied=False,
               non_recurrent_fn=None,
               use_peepholes=False,
-               forget_bias=1.0):
+               forget_bias=1.0,
+               state_is_tuple=True,
+               output_is_tuple=True):
+
+    def cell_fn(n):
+      return rnn.LSTMCell(
+          num_units=n, forget_bias=forget_bias, use_peepholes=use_peepholes)
+
    super(Grid2LSTMCell, self).__init__(
-        num_units=num_units, num_dims=2,
-        input_dims=0, output_dims=0, priority_dims=0, tied=tied,
+        num_units=num_units,
+        num_dims=2,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=tied,
        non_recurrent_dims=None if non_recurrent_fn is None else 0,
-        cell_fn=lambda n, i: rnn.LSTMCell(
-            num_units=n, input_size=i, forget_bias=forget_bias,
-            use_peepholes=use_peepholes, state_is_tuple=False),
-        non_recurrent_fn=non_recurrent_fn)
+        cell_fn=cell_fn,
+        non_recurrent_fn=non_recurrent_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)


 class Grid3LSTMCell(GridRNNCell):
-  """3D BasicLSTM cell
+  """3D LSTM cell.

    This creates a 2D cell which receives input and gives output in the first
    dimension.
@ -343,19 +502,30 @@ class Grid3LSTMCell(GridRNNCell):
               tied=False,
               non_recurrent_fn=None,
               use_peepholes=False,
-               forget_bias=1.0):
+               forget_bias=1.0,
+               state_is_tuple=True,
+               output_is_tuple=True):
+
+    def cell_fn(n):
+      return rnn.LSTMCell(
+          num_units=n, forget_bias=forget_bias, use_peepholes=use_peepholes)
+
    super(Grid3LSTMCell, self).__init__(
-        num_units=num_units, num_dims=3,
-        input_dims=0, output_dims=0, priority_dims=0, tied=tied,
+        num_units=num_units,
+        num_dims=3,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=tied,
        non_recurrent_dims=None if non_recurrent_fn is None else 0,
-        cell_fn=lambda n, i: rnn.LSTMCell(
-            num_units=n, input_size=i, forget_bias=forget_bias,
-            use_peepholes=use_peepholes, state_is_tuple=False),
-        non_recurrent_fn=non_recurrent_fn)
+        cell_fn=cell_fn,
+        non_recurrent_fn=non_recurrent_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)


 class Grid2GRUCell(GridRNNCell):
-  """2D LSTM cell
+  """2D GRU cell.

    This creates a 2D cell which receives input and gives output in the first
    dimension.
@ -363,21 +533,31 @@ class Grid2GRUCell(GridRNNCell):
    specified.
  """

-  def __init__(self, num_units, tied=False, non_recurrent_fn=None):
+  def __init__(self,
+               num_units,
+               tied=False,
+               non_recurrent_fn=None,
+               state_is_tuple=True,
+               output_is_tuple=True):
    super(Grid2GRUCell, self).__init__(
-        num_units=num_units, num_dims=2,
-        input_dims=0, output_dims=0, priority_dims=0, tied=tied,
+        num_units=num_units,
+        num_dims=2,
+        input_dims=0,
+        output_dims=0,
+        priority_dims=0,
+        tied=tied,
        non_recurrent_dims=None if non_recurrent_fn is None else 0,
-        cell_fn=lambda n, i: rnn.GRUCell(num_units=n, input_size=i),
-        non_recurrent_fn=non_recurrent_fn)
+        cell_fn=lambda n: rnn.GRUCell(num_units=n),
+        non_recurrent_fn=non_recurrent_fn,
+        state_is_tuple=state_is_tuple,
+        output_is_tuple=output_is_tuple)


-"""Helpers
-"""
+# Helpers

-_GridRNNDimension = namedtuple(
-    '_GridRNNDimension',
-    ['idx', 'is_input', 'is_output', 'is_priority', 'non_recurrent_fn'])
+_GridRNNDimension = namedtuple('_GridRNNDimension', [
+    'idx', 'is_input', 'is_output', 'is_priority', 'non_recurrent_fn'
+])

 _GridRNNConfig = namedtuple('_GridRNNConfig',
                            ['num_dims', 'dims', 'inputs', 'outputs',
@ -387,7 +567,6 @@ _GridRNNConfig = namedtuple('_GridRNNConfig',

 def _parse_rnn_config(num_dims, ls_input_dims, ls_output_dims, ls_priority_dims,
                      ls_non_recurrent_dims, non_recurrent_fn, tied, num_units):
-
  def check_dim_list(ls):
    if ls is None:
      ls = []
@ -412,8 +591,8 @@ def _parse_rnn_config(num_dims, ls_input_dims, ls_output_dims, ls_priority_dims,
            is_input=(i in input_dims),
            is_output=(i in output_dims),
            is_priority=(i in priority_dims),
-            non_recurrent_fn=non_recurrent_fn if i in non_recurrent_dims else
-            None))
+            non_recurrent_fn=non_recurrent_fn
+            if i in non_recurrent_dims else None))
  return _GridRNNConfig(
      num_dims=num_dims,
      dims=rnn_dims,
@ -440,34 +619,40 @@ def _propagate(dim_indices, conf, cells, c_prev, m_prev, new_output, new_state,
  if conf.num_dims > 1:
    ls_cell_inputs = [None] * (conf.num_dims - 1)
    for d in conf.dims[:-1]:
-      ls_cell_inputs[d.idx] = new_output[d.idx] if new_output[
-          d.idx] is not None else m_prev[d.idx]
+      if new_output[d.idx] is None:
+        ls_cell_inputs[d.idx] = m_prev[d.idx]
+      else:
+        ls_cell_inputs[d.idx] = new_output[d.idx]
    cell_inputs = array_ops.concat(ls_cell_inputs, 1)
  else:
    cell_inputs = array_ops.zeros([m_prev[0].get_shape().as_list()[0], 0],
                                  m_prev[0].dtype)

-  last_dim_output = new_output[-1] if new_output[-1] is not None else m_prev[-1]
+  last_dim_output = (new_output[-1]
+                     if new_output[-1] is not None else m_prev[-1])

  for i in dim_indices:
    d = conf.dims[i]
    if d.non_recurrent_fn:
-      linear_args = array_ops.concat(
-          [cell_inputs, last_dim_output],
-          1) if conf.num_dims > 1 else last_dim_output
+      if conf.num_dims > 1:
+        linear_args = array_ops.concat([cell_inputs, last_dim_output], 1)
+      else:
+        linear_args = last_dim_output
      with vs.variable_scope('non_recurrent' if conf.tied else
                             'non_recurrent/cell_{}'.format(i)):
        if conf.tied and not (first_call and i == dim_indices[0]):
          vs.get_variable_scope().reuse_variables()
-        new_output[d.idx] = layers.legacy_fully_connected(
+
+        new_output[d.idx] = layers.fully_connected(
            linear_args,
-            num_output_units=conf.num_units,
+            num_outputs=conf.num_units,
            activation_fn=d.non_recurrent_fn,
-            weight_init=vs.get_variable_scope().initializer or
-            layers.initializers.xavier_initializer)
+            weights_initializer=(vs.get_variable_scope().initializer or
+                                 layers.initializers.xavier_initializer),
+            weights_regularizer=vs.get_variable_scope().regularizer)
    else:
      if c_prev[i] is not None:
-        cell_state = array_ops.concat([c_prev[i], last_dim_output], 1)
+        cell_state = (c_prev[i], last_dim_output)
      else:
        # for GRU/RNN, the state is just the previous output
        cell_state = last_dim_output
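
Review note on the `grid_rnn_cell.py` hunks: the new `state_is_tuple` flag changes what `state_size` looks like. The sketch below is a plain-Python illustration of the two branches in `__init__` and of `_cell_state_size`. It is not TensorFlow code; the function names `cell_state_size` and `grid_state_size` and the example widths are assumptions made for illustration only.

```python
# Illustrative stand-ins only; nothing here calls TensorFlow.


def cell_state_size(state_size):
    """Total width of one inner cell's state (mirrors _cell_state_size)."""
    if isinstance(state_size, tuple):
        # LSTM-like cells report a (c, h) pair of widths.
        return sum(state_size)
    # GRU/basic RNN cells report a single integer width.
    return state_size


def grid_state_size(inner_state_size, num_recurrent_dims, state_is_tuple):
    """State size of the grid cell, following the two branches in __init__."""
    if state_is_tuple:
        # One entry per recurrent dimension, keeping the inner structure.
        return tuple(inner_state_size for _ in range(num_recurrent_dims))
    # Concatenated layout: a single integer width.
    return cell_state_size(inner_state_size) * num_recurrent_dims


lstm_state = (8, 8)  # hypothetical (c, h) widths for an 8-unit LSTM-like cell
gru_state = 8        # hypothetical width for an 8-unit GRU-like cell

print(grid_state_size(lstm_state, 2, True))
print(grid_state_size(lstm_state, 2, False))
print(grid_state_size(gru_state, 2, False))
```

In the tuple layout each recurrent dimension keeps its own `(c, h)` pair, which is why `_extract_states` can unpack `recurrent_state` directly instead of slicing a concatenated tensor.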
--- a/tensorflow/contrib/image/python/kernel_tests/image_ops_test.py
+++ b/tensorflow/contrib/image/python/kernel_tests/image_ops_test.py
@ -25,6 +25,7 @@ from tensorflow.python.framework import constant_op
 from tensorflow.python.framework import dtypes
 from tensorflow.python.framework import test_util
 from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import gradient_checker
 from tensorflow.python.ops import math_ops
 from tensorflow.python.platform import googletest

@ -110,6 +111,30 @@ class ImageOpsTest(test_util.TensorFlowTestCase):
                             [0, 1, 0, 1],
                             [0, 1, 1, 1]])

+  def _test_grad(self, shape_to_test):
+    with self.test_session():
+      test_image_shape = shape_to_test
+      test_image = np.random.randn(*test_image_shape)
+      test_image_tensor = constant_op.constant(
+          test_image, shape=test_image_shape)
+      test_transform = image_ops.angles_to_projective_transforms(
+          np.pi / 2, 4, 4)
+
+      output_shape = test_image_shape
+      output = image_ops.transform(test_image_tensor, test_transform)
+      left_err = gradient_checker.compute_gradient_error(
+          test_image_tensor,
+          test_image_shape,
+          output,
+          output_shape,
+          x_init_value=test_image)
+      self.assertLess(left_err, 1e-10)
+
+  def test_grad(self):
+    self._test_grad([16, 16])
+    self._test_grad([4, 12, 12])
+    self._test_grad([3, 4, 12, 12])
+

 if __name__ == "__main__":
  googletest.main()
--- a/tensorflow/contrib/image/python/ops/image_ops.py
+++ b/tensorflow/contrib/image/python/ops/image_ops.py
@ -24,6 +24,7 @@ from tensorflow.python.framework import constant_op
 from tensorflow.python.framework import dtypes
 from tensorflow.python.framework import ops
 from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import linalg_ops
 from tensorflow.python.ops import math_ops
 from tensorflow.python.platform import resource_loader

@ -214,4 +215,41 @@ def _transform_matrices_to_flat(transform_matrices):
  return transforms[:, :8]


-ops.NotDifferentiable("ImageProjectiveTransform")
+@ops.RegisterGradient("ImageProjectiveTransform")
+def _image_projective_transform_grad(op, grad):
+  """Computes the gradient for ImageProjectiveTransform."""
+  images = op.inputs[0]
+  transforms = op.inputs[1]
+
+  image_or_images = ops.convert_to_tensor(images, name="images")
+  transform_or_transforms = ops.convert_to_tensor(
+      transforms, name="transforms", dtype=dtypes.float32)
+
+  if image_or_images.dtype.base_dtype not in _IMAGE_DTYPES:
+    raise TypeError("Invalid dtype %s." % image_or_images.dtype)
+  if len(image_or_images.get_shape()) == 2:
+    images = image_or_images[None, :, :, None]
+  elif len(image_or_images.get_shape()) == 3:
+    images = image_or_images[None, :, :, :]
+  elif len(image_or_images.get_shape()) == 4:
+    images = image_or_images
+  else:
+    raise TypeError("Images should have rank between 2 and 4")
+  if len(transform_or_transforms.get_shape()) == 1:
+    transforms = transform_or_transforms[None]
+  elif len(transform_or_transforms.get_shape()) == 2:
+    transforms = transform_or_transforms
+  else:
+    raise TypeError("Transforms should have rank 1 or 2.")
+
+  # Invert transformations
+  transforms = _flat_transforms_to_matrices(transforms=transforms)
+  inverse = linalg_ops.matrix_inverse(transforms)
+  transforms = _transform_matrices_to_flat(inverse)
+  output = gen_image_ops.image_projective_transform(grad, transforms)
+  if len(image_or_images.get_shape()) == 2:
+    return [output[0, :, :, 0], None]
+  elif len(image_or_images.get_shape()) == 3:
+    return [output[0, :, :, :], None]
+  else:
+    return [output, None]
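
Review note on the gradient above: it converts the flat 8-parameter transforms to 3x3 matrices, inverts them, flattens them again, and applies the result to `grad`. The NumPy sketch below only illustrates that flatten/invert round trip for a single transform. The helper names are hypothetical, and the normalization step (dividing by the last matrix entry) is an assumption about what `_transform_matrices_to_flat` does, since only its final line is visible in this diff.

```python
import numpy as np


def flat_to_matrix(flat):
    """Expand 8 projective parameters into a 3x3 matrix with last entry 1."""
    return np.append(np.asarray(flat, dtype=np.float64), 1.0).reshape(3, 3)


def matrix_to_flat(matrix):
    """Scale so the last entry is 1, then keep the first 8 parameters."""
    flat = np.asarray(matrix, dtype=np.float64).reshape(9)
    return (flat / flat[8])[:8]


def invert_flat_transform(flat):
    """Invert a single flat projective transform."""
    return matrix_to_flat(np.linalg.inv(flat_to_matrix(flat)))


# A pure translation by (2, 3), written in the flat 8-parameter form.
shift = [1.0, 0.0, 2.0, 0.0, 1.0, 3.0, 0.0, 0.0]
inverse = invert_flat_transform(shift)
print(inverse)
```

Inverting twice should recover the original parameters up to floating-point error, which makes a convenient sanity check for the two helpers.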
--- a/tensorflow/contrib/ios_examples/camera/CameraExampleViewController.mm
+++ b/tensorflow/contrib/ios_examples/camera/CameraExampleViewController.mm
@ -323,10 +323,10 @@ didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
      auto predictions = output->flat<float>();

      NSMutableDictionary *newValues = [NSMutableDictionary dictionary];
-      for (int index = 0; index < predictions.size(); index += 1) {
+      for (int index = 0; index < predictions.size(); ++index) {
        const float predictionValue = predictions(index);
        if (predictionValue > 0.05f) {
-          std::string label = labels[index % predictions.size()];
+          std::string label = labels[index];
          NSString *labelObject = [NSString stringWithUTF8String:label.c_str()];
          NSNumber *valueObject = [NSNumber numberWithFloat:predictionValue];
          [newValues setObject:valueObject forKey:labelObject];
--- a/tensorflow/contrib/keras/python/keras/initializers_test.py
+++ b/tensorflow/contrib/keras/python/keras/initializers_test.py
@ -120,7 +120,7 @@ class KerasInitializersTest(test.TestCase):
                   target_mean=0., target_std=None, target_max=2 * scale)

  def test_orthogonal(self):
-    tensor_shape = (10, 10)
+    tensor_shape = (20, 20)
    with self.test_session():
      self._runner(keras.initializers.orthogonal(seed=123), tensor_shape,
                   target_mean=0.)
--- a/tensorflow/contrib/layers/python/layers/encoders.py
+++ b/tensorflow/contrib/layers/python/layers/encoders.py
@ -121,7 +121,7 @@ def embed_sequence(ids,
    `Tensor` of `[batch_size, doc_length, embed_dim]` with embedded sequences.

  Raises:
-    ValueError: if `embed_dim` or `vocab_size` are not specified when not
+    ValueError: if `embed_dim` or `vocab_size` are not specified when
      `reuse` is `None` or `False`.
  """
  if not (reuse or (vocab_size and embed_dim)):
--- a/tensorflow/contrib/layers/python/layers/initializers.py
+++ b/tensorflow/contrib/layers/python/layers/initializers.py
@ -34,9 +34,10 @@ def xavier_initializer(uniform=True, seed=None, dtype=dtypes.float32):
  This function implements the weight initialization from:

  Xavier Glorot and Yoshua Bengio (2010):
-           Understanding the difficulty of training deep feedforward neural
+           [Understanding the difficulty of training deep feedforward neural
           networks. International conference on artificial intelligence and
-           statistics.
+           statistics.](
+           http://www.jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf)

  This initializer is designed to keep the scale of the gradients roughly the
  same in all layers. In uniform distribution this ends up being the range:
--- a/tensorflow/contrib/learn/init.py
+++ b/tensorflow/contrib/learn/init.py
@ -38,6 +38,7 @@ See the @{$python/contrib.learn} guide.
@@LinearEstimator
@@LinearRegressor
@@LogisticRegressor
+@@StateSavingRnnEstimator
@@SVM
@@SKCompat

--- a/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py
@ -35,11 +35,11 @@ from tensorflow.contrib.learn.python.learn.estimators import prediction_key
 from tensorflow.contrib.learn.python.learn.utils import export
 from tensorflow.python.framework import ops
 from tensorflow.python.ops import control_flow_ops
-from tensorflow.python.ops import logging_ops
 from tensorflow.python.ops import nn
 from tensorflow.python.ops import partitioned_variables
 from tensorflow.python.ops import state_ops
 from tensorflow.python.ops import variable_scope
+from tensorflow.python.summary import summary
 from tensorflow.python.training import sync_replicas_optimizer
 from tensorflow.python.training import training_util

@ -99,10 +99,14 @@ def _linear_learning_rate(num_linear_feature_columns):
  return min(_LINEAR_LEARNING_RATE, default_learning_rate)


+def _add_hidden_layer_summary(value, tag):
+  summary.scalar("%s/fraction_of_zero_values" % tag, nn.zero_fraction(value))
+  summary.histogram("%s/activation" % tag, value)
+
+
 def _add_layer_summary(value, tag):
-  logging_ops.scalar_summary("%s/fraction_of_zero_values" % tag,
-                             nn.zero_fraction(value))
-  logging_ops.histogram_summary("%s/activation" % tag, value)
+  summary.scalar("%s/fraction_of_zero_values" % tag, nn.zero_fraction(value))
+  summary.histogram("%s/activation" % tag, value)


 def _get_embedding_variable(column, collection_key, input_layer_scope):
--- a/tensorflow/contrib/learn/python/learn/estimators/dynamic_rnn_estimator.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/dynamic_rnn_estimator.py
@ -19,7 +19,6 @@ from __future__ import division
 from __future__ import print_function

 from tensorflow.contrib import layers
-from tensorflow.contrib.framework.python.framework import deprecated
 from tensorflow.contrib.layers.python.layers import optimizers
 from tensorflow.contrib.learn.python.learn.estimators import constants
 from tensorflow.contrib.learn.python.learn.estimators import estimator
@ -540,20 +539,6 @@ def _get_dynamic_rnn_model_fn(
  return _dynamic_rnn_model_fn


-def _get_dropout_and_num_units(num_units,
-                               num_rnn_layers,
-                               input_keep_probability,
-                               output_keep_probability):
-  """Helper function for deprecated factory functions."""
-  dropout_keep_probabilities = None
-  num_units = [num_units for _ in range(num_rnn_layers)]
-  if input_keep_probability or output_keep_probability:
-    dropout_keep_probabilities = ([input_keep_probability]
-                                  + [1.0] * (num_rnn_layers - 1)
-                                  + [output_keep_probability])
-  return dropout_keep_probabilities, num_units
-
-
 class DynamicRnnEstimator(estimator.Estimator):

  def __init__(self,
@ -704,339 +689,3 @@ class DynamicRnnEstimator(estimator.Estimator):
        model_dir=model_dir,
        config=config,
        feature_engineering_fn=feature_engineering_fn)
-
-
-@deprecated('2017-04-01',
-            'multi_value_rnn_regressor is deprecated. '
-            'Please construct a DynamicRnnEstimator directly.')
-def multi_value_rnn_regressor(num_units,
-                              sequence_feature_columns,
-                              context_feature_columns=None,
-                              cell_type='basic_rnn',
-                              num_rnn_layers=1,
-                              optimizer_type='SGD',
-                              learning_rate=0.1,
-                              momentum=None,
-                              gradient_clipping_norm=5.0,
-                              input_keep_probability=None,
-                              output_keep_probability=None,
-                              model_dir=None,
-                              config=None,
-                              feature_engineering_fn=None):
-  """Creates a `DynamicRnnEstimator` for multi-value regression.
-
-  Returns an `Estimator` that given input sequences, processes them in a dynamic
-  recurrent network and outputs a sequence of continuous values.
-
-  Args:
-    num_units: The size of the RNN cells.
-    sequence_feature_columns: An iterable containing all the feature columns
-      describing sequence features. All items in the set should be instances
-      of classes derived from `FeatureColumn`.
-    context_feature_columns: An iterable containing all the feature columns
-      describing context features, i.e., features that apply accross all time
-      steps. All items in the set should be instances of classes derived from
-      `FeatureColumn`.
-    cell_type: A subclass of `RNNCell` or one of 'basic_rnn,' 'lstm' or 'gru'.
-    num_rnn_layers: Number of RNN layers. Leave this at its default value 1
-      if passing a `cell_type` that is already a MultiRNNCell.
-    optimizer_type: The type of optimizer to use. Either a subclass of
-      `Optimizer`, an instance of an `Optimizer`, a callback that returns an
-      optimizer, or a string. Strings must be one of 'Adagrad', 'Adam',
-      'Ftrl', 'Momentum', 'RMSProp' or 'SGD. See `layers.optimize_loss` for
-      more details.
-    learning_rate: Learning rate. This argument has no effect if `optimizer`
-      is an instance of an `Optimizer`.
-    momentum: Momentum value. Only used if `optimizer_type` is 'Momentum'.
-    gradient_clipping_norm: Parameter used for gradient clipping. If `None`,
-      then no clipping is performed.
-    input_keep_probability: Probability to keep inputs to `cell`. If `None`,
-      no dropout is applied.
-    output_keep_probability: Probability to keep outputs of `cell`. If `None`,
-      no dropout is applied.
-    model_dir: The directory in which to save and restore the model graph,
-      parameters, etc.
-    config: A `RunConfig` instance.
-    feature_engineering_fn: Takes features and labels which are the output of
-      `input_fn` and returns features and labels which will be fed into
-      `model_fn`. Please check `model_fn` for a definition of features and
-      labels.
-  Returns:
-    An initialized `Estimator`.
-  """
-  dropout_keep_probabilities, num_units = _get_dropout_and_num_units(
-      num_units,
-      num_rnn_layers,
-      input_keep_probability,
-      output_keep_probability)
-  return DynamicRnnEstimator(
-      problem_type=constants.ProblemType.LINEAR_REGRESSION,
-      prediction_type=rnn_common.PredictionType.MULTIPLE_VALUE,
-      sequence_feature_columns=sequence_feature_columns,
-      context_feature_columns=context_feature_columns,
-      num_units=num_units,
-      cell_type=cell_type,
-      optimizer=optimizer_type,
-      learning_rate=learning_rate,
-      momentum=momentum,
-      gradient_clipping_norm=gradient_clipping_norm,
-      dropout_keep_probabilities=dropout_keep_probabilities,
-      model_dir=model_dir,
-      feature_engineering_fn=feature_engineering_fn,
-      config=config)
-
-
-@deprecated('2017-04-01',
-            'multi_value_rnn_classifier is deprecated. '
-            'Please construct a DynamicRNNEstimator directly.')
-def multi_value_rnn_classifier(num_classes,
-                               num_units,
-                               sequence_feature_columns,
-                               context_feature_columns=None,
-                               cell_type='basic_rnn',
-                               num_rnn_layers=1,
-                               optimizer_type='SGD',
-                               learning_rate=0.1,
-                               predict_probabilities=False,
-                               momentum=None,
-                               gradient_clipping_norm=5.0,
-                               input_keep_probability=None,
-                               output_keep_probability=None,
-                               model_dir=None,
-                               config=None,
-                               feature_engineering_fn=None):
-  """Creates a `DynamicRNNEstimator` for multi-value classification.
-
-  Returns an `Estimator` that given input sequences, processes them in a dynamic
-  recurrent network and outputs a sequence of classifications, along with
-  (optionally) a probability distribution over classes.
-
-  Args:
-    num_classes: The number of classes for categorization.
-    num_units: The size of the RNN cells.
-    sequence_feature_columns: An iterable containing all the feature columns
-      describing sequence features. All items in the set should be instances
-      of classes derived from `FeatureColumn`.
-    context_feature_columns: An iterable containing all the feature columns
-      describing context features, i.e., features that apply accross all time
-      steps. All items in the set should be instances of classes derived from
-      `FeatureColumn`.
-    cell_type: A subclass of `RNNCell` or one of 'basic_rnn,' 'lstm' or 'gru'.
-    num_rnn_layers: Number of RNN layers. Leave this at its default value 1
-      if passing a `cell_type` that is already a MultiRNNCell.
-    optimizer_type: The type of optimizer to use. Either a subclass of
-      `Optimizer`, an instance of an `Optimizer`, a callback that returns an
-      optimizer, or a string. Strings must be one of 'Adagrad', 'Adam',
-      'Ftrl', 'Momentum', 'RMSProp' or 'SGD. See `layers.optimize_loss` for
-      more details.
-    learning_rate: Learning rate. This argument has no effect if `optimizer`
-      is an instance of an `Optimizer`.
-    predict_probabilities: A boolean indicating whether to predict probabilities
-      for all classes.
-    momentum: Momentum value. Only used if `optimizer_type` is 'Momentum'.
-    gradient_clipping_norm: Parameter used for gradient clipping. If `None`,
-      then no clipping is performed.
-    input_keep_probability: Probability to keep inputs to `cell`. If `None`,
-      no dropout is applied.
-    output_keep_probability: Probability to keep outputs of `cell`. If `None`,
-      no dropout is applied.
-    model_dir: The directory in which to save and restore the model graph,
-      parameters, etc.
-    config: A `RunConfig` instance.
-    feature_engineering_fn: Takes features and labels which are the output of
-      `input_fn` and returns features and labels which will be fed into
-      `model_fn`. Please check `model_fn` for a definition of features and
-      labels.
-  Returns:
-    An initialized `Estimator`.
-  """
-  dropout_keep_probabilities, num_units = _get_dropout_and_num_units(
-      num_units,
-      num_rnn_layers,
-      input_keep_probability,
-      output_keep_probability)
-  return DynamicRnnEstimator(
-      problem_type=constants.ProblemType.CLASSIFICATION,
-      prediction_type=rnn_common.PredictionType.MULTIPLE_VALUE,
-      num_classes=num_classes,
-      sequence_feature_columns=sequence_feature_columns,
-      context_feature_columns=context_feature_columns,
-      num_units=num_units,
-      cell_type=cell_type,
-      optimizer=optimizer_type,
-      learning_rate=learning_rate,
-      predict_probabilities=predict_probabilities,
-      momentum=momentum,
-      gradient_clipping_norm=gradient_clipping_norm,
-      dropout_keep_probabilities=dropout_keep_probabilities,
-      model_dir=model_dir,
-      feature_engineering_fn=feature_engineering_fn,
-      config=config)
-
-
-@deprecated('2017-04-01',
-            'single_value_rnn_regressor is deprecated. '
-            'Please construct a DynamicRnnEstimator directly.')
-def single_value_rnn_regressor(num_units,
-                               sequence_feature_columns,
-                               context_feature_columns=None,
-                               cell_type='basic_rnn',
-                               num_rnn_layers=1,
-                               optimizer_type='SGD',
-                               learning_rate=0.1,
-                               momentum=None,
-                               gradient_clipping_norm=5.0,
-                               input_keep_probability=None,
-                               output_keep_probability=None,
-                               model_dir=None,
-                               config=None,
-                               feature_engineering_fn=None):
-  """Creates a `DynamicRnnEstimator` for single-value regression.
-
-  Returns an `Estimator` that given input sequences, processes them in a dynamic
-  recurrent network and outputs a single continuous values.
-
-  Args:
-    num_units: The size of the RNN cells.
-    sequence_feature_columns: An iterable containing all the feature columns
-      describing sequence features. All items in the set should be instances
-      of classes derived from `FeatureColumn`.
-    context_feature_columns: An iterable containing all the feature columns
-      describing context features, i.e., features that apply accross all time
-      steps. All items in the set should be instances of classes derived from
-      `FeatureColumn`.
-    cell_type: A subclass of `RNNCell` or one of 'basic_rnn,' 'lstm' or 'gru'.
-    num_rnn_layers: Number of RNN layers. Leave this at its default value 1
-      if passing a `cell_type` that is already a MultiRNNCell.
-    optimizer_type: The type of optimizer to use. Either a subclass of
-      `Optimizer`, an instance of an `Optimizer`, a callback that returns an
-      optimizer, or a string. Strings must be one of 'Adagrad', 'Adam',
-      'Ftrl', 'Momentum', 'RMSProp' or 'SGD. See `layers.optimize_loss` for
-      more details.
-    learning_rate: Learning rate. This argument has no effect if `optimizer`
-      is an instance of an `Optimizer`.
-    momentum: Momentum value. Only used if `optimizer_type` is 'Momentum'.
-    gradient_clipping_norm: Parameter used for gradient clipping. If `None`,
-      then no clipping is performed.
-    input_keep_probability: Probability to keep inputs to `cell`. If `None`,
-      no dropout is applied.
-    output_keep_probability: Probability to keep outputs of `cell`. If `None`,
-      no dropout is applied.
-    model_dir: The directory in which to save and restore the model graph,
-      parameters, etc.
-    config: A `RunConfig` instance.
-    feature_engineering_fn: Takes features and labels which are the output of
-      `input_fn` and returns features and labels which will be fed into
-      `model_fn`. Please check `model_fn` for a definition of features and
-      labels.
-  Returns:
-    An initialized `Estimator`.
-  """
-  dropout_keep_probabilities, num_units = _get_dropout_and_num_units(
-      num_units,
-      num_rnn_layers,
-      input_keep_probability,
-      output_keep_probability)
-  return DynamicRnnEstimator(
-      problem_type=constants.ProblemType.LINEAR_REGRESSION,
-      prediction_type=rnn_common.PredictionType.SINGLE_VALUE,
-      sequence_feature_columns=sequence_feature_columns,
-      context_feature_columns=context_feature_columns,
-      num_units=num_units,
-      cell_type=cell_type,
-      optimizer=optimizer_type,
-      learning_rate=learning_rate,
-      momentum=momentum,
-      gradient_clipping_norm=gradient_clipping_norm,
-      dropout_keep_probabilities=dropout_keep_probabilities,
-      model_dir=model_dir,
-      feature_engineering_fn=feature_engineering_fn,
-      config=config)
-
-
-@deprecated('2017-04-01',
-            'single_value_rnn_classifier is deprecated. '
-            'Please construct a DynamicRnnEstimator directly.')
-def single_value_rnn_classifier(num_classes,
-                                num_units,
-                                sequence_feature_columns,
-                                context_feature_columns=None,
-                                cell_type='basic_rnn',
-                                num_rnn_layers=1,
-                                optimizer_type='SGD',
-                                learning_rate=0.1,
-                                predict_probabilities=False,
-                                momentum=None,
-                                gradient_clipping_norm=5.0,
-                                input_keep_probability=None,
-                                output_keep_probability=None,
-                                model_dir=None,
-                                config=None,
-                                feature_engineering_fn=None):
-  """Creates a `DynamicRnnEstimator` for single-value classification.
-
-  Returns an `Estimator` that given input sequences, processes them in a dynamic
-  recurrent network and outputs a single classifications, along with
-  (optionally) a probability distribution over classes.
-
-  Args:
-    num_classes: The number of classes for categorization.
-    num_units: The size of the RNN cells.
-    sequence_feature_columns: An iterable containing all the feature columns
-      describing sequence features. All items in the set should be instances
-      of classes derived from `FeatureColumn`.
-    context_feature_columns: An iterable containing all the feature columns
-      describing context features, i.e., features that apply accross all time
-      steps. All items in the set should be instances of classes derived from
-      `FeatureColumn`.
-    cell_type: A subclass of `RNNCell` or one of 'basic_rnn,' 'lstm' or 'gru'.
-    num_rnn_layers: Number of RNN layers. Leave this at its default value 1
-      if passing a `cell_type` that is already a MultiRNNCell.
-    optimizer_type: The type of optimizer to use. Either a subclass of
-      `Optimizer`, an instance of an `Optimizer`, a callback that returns an
-      optimizer, or a string. Strings must be one of 'Adagrad', 'Adam',
-      'Ftrl', 'Momentum', 'RMSProp' or 'SGD. See `layers.optimize_loss` for
-      more details.
-    learning_rate: Learning rate. This argument has no effect if `optimizer`
-      is an instance of an `Optimizer`.
-    predict_probabilities: A boolean indicating whether to predict probabilities
-      for all classes.
-    momentum: Momentum value. Only used if `optimizer_type` is 'Momentum'.
-    gradient_clipping_norm: Parameter used for gradient clipping. If `None`,
-      then no clipping is performed.
-    input_keep_probability: Probability to keep inputs to `cell`. If `None`,
-      no dropout is applied.
-    output_keep_probability: Probability to keep outputs of `cell`. If `None`,
-      no dropout is applied.
-    model_dir: The directory in which to save and restore the model graph,
-      parameters, etc.
-    config: A `RunConfig` instance.
-    feature_engineering_fn: Takes features and labels which are the output of
-      `input_fn` and returns features and labels which will be fed into
-      `model_fn`. Please check `model_fn` for a definition of features and
-      labels.
-  Returns:
-    An initialized `Estimator`.
-  """
-  dropout_keep_probabilities, num_units = _get_dropout_and_num_units(
-      num_units,
-      num_rnn_layers,
-      input_keep_probability,
-      output_keep_probability)
-  return DynamicRnnEstimator(
-      problem_type=constants.ProblemType.CLASSIFICATION,
-      prediction_type=rnn_common.PredictionType.SINGLE_VALUE,
-      num_classes=num_classes,
-      sequence_feature_columns=sequence_feature_columns,
-      context_feature_columns=context_feature_columns,
-      num_units=num_units,
-      cell_type=cell_type,
-      optimizer=optimizer_type,
-      learning_rate=learning_rate,
-      predict_probabilities=predict_probabilities,
-      momentum=momentum,
-      gradient_clipping_norm=gradient_clipping_norm,
-      dropout_keep_probabilities=dropout_keep_probabilities,
-      model_dir=model_dir,
-      feature_engineering_fn=feature_engineering_fn,
-      config=config)
--- a/tensorflow/contrib/learn/python/learn/estimators/dynamic_rnn_estimator_test.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/dynamic_rnn_estimator_test.py
@@ -410,56 +410,6 @@ class DynamicRnnEstimatorTest(test.TestCase):
      state_piece = prediction_dict[dynamic_rnn_estimator._get_state_name(i)]
      self.assertListEqual(list(state_piece.shape), [batch_size, state_size])

-  def testLegacyConstructor(self):
-    """Exercise legacy constructor function."""
-    num_units = 16
-    num_layers = 6
-    output_keep_prob = 0.9
-    input_keep_prob = 0.7
-    batch_size = 11
-    learning_rate = 0.1
-    train_sequence_length = 21
-    train_steps = 121
-
-    def get_input_fn(batch_size, sequence_length, state_dict, starting_step=0):
-
-      def input_fn():
-        sequence = constant_op.constant(
-            [[(starting_step + i + j) % 2 for j in range(sequence_length + 1)]
-             for i in range(batch_size)],
-            dtype=dtypes.int32)
-        labels = array_ops.slice(sequence, [0, 0],
-                                 [batch_size, sequence_length])
-        inputs = array_ops.expand_dims(
-            math_ops.to_float(
-                array_ops.slice(sequence, [0, 1], [batch_size, sequence_length
-                                                  ])), 2)
-        input_dict = state_dict
-        input_dict['inputs'] = inputs
-        return input_dict, labels
-
-      return input_fn
-
-    seq_columns = [feature_column.real_valued_column('inputs', dimension=1)]
-    config = run_config.RunConfig(tf_random_seed=21212)
-
-    model_dir = tempfile.mkdtemp()
-    sequence_estimator = dynamic_rnn_estimator.multi_value_rnn_classifier(
-        num_classes=2,
-        num_units=num_units,
-        num_rnn_layers=num_layers,
-        input_keep_probability=input_keep_prob,
-        output_keep_probability=output_keep_prob,
-        sequence_feature_columns=seq_columns,
-        learning_rate=learning_rate,
-        config=config,
-        model_dir=model_dir)
-
-    train_input_fn = get_input_fn(
-        batch_size, train_sequence_length, state_dict={})
-
-    sequence_estimator.fit(input_fn=train_input_fn, steps=train_steps)
-
  def testMultipleRuns(self):
    """Tests resuming training by feeding state."""
    cell_sizes = [4, 7]
--- a/tensorflow/contrib/learn/python/learn/estimators/kmeans.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/kmeans.py
@@ -27,11 +27,11 @@ from tensorflow.contrib.learn.python.learn.estimators import estimator
 from tensorflow.contrib.learn.python.learn.estimators.model_fn import ModelFnOps
 from tensorflow.python.framework import ops
 from tensorflow.python.ops import array_ops
-from tensorflow.python.ops import logging_ops
 from tensorflow.python.ops import math_ops
 from tensorflow.python.ops import state_ops
 from tensorflow.python.ops.control_flow_ops import with_dependencies
 from tensorflow.python.platform import tf_logging as logging
+from tensorflow.python.summary import summary
 from tensorflow.python.training import session_run_hook
 from tensorflow.python.training.session_run_hook import SessionRunArgs

@@ -118,7 +118,7 @@ def _kmeans_clustering_model_fn(features, labels, mode, params, config):
           'kmeans_plus_plus_num_retries')).training_graph()
  incr_step = state_ops.assign_add(variables.get_global_step(), 1)
  loss = math_ops.reduce_sum(losses, name=KMeansClustering.LOSS_OP_NAME)
-  logging_ops.scalar_summary('loss/raw', loss)
+  summary.scalar('loss/raw', loss)
  training_op = with_dependencies([training_op, incr_step], loss)
  predictions = {
      KMeansClustering.ALL_SCORES: all_scores[0],
@@ -257,4 +257,3 @@ class KMeansClustering(estimator.Estimator):
  def clusters(self):
    """Returns cluster centers."""
    return super(KMeansClustering, self).get_variable_value(self.CLUSTERS)
-
--- a/tensorflow/contrib/learn/python/learn/estimators/state_saving_rnn_estimator.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/state_saving_rnn_estimator.py
@@ -20,7 +20,6 @@ from __future__ import print_function

 from tensorflow.contrib import layers
 from tensorflow.contrib import rnn as rnn_cell
-from tensorflow.contrib.framework.python.framework import deprecated
 from tensorflow.contrib.layers.python.layers import feature_column_ops
 from tensorflow.contrib.layers.python.layers import optimizers
 from tensorflow.contrib.learn.python.learn.estimators import constants
@@ -652,180 +651,3 @@ class StateSavingRnnEstimator(estimator.Estimator):
        model_dir=model_dir,
        config=config,
        feature_engineering_fn=feature_engineering_fn)
-
-
-@deprecated('2017-04-01', 'multi_value_rnn_regressor is deprecated. '
-            'Please construct a StateSavingRnnEstimator directly.')
-def multi_value_rnn_regressor(num_units,
-                              num_unroll,
-                              batch_size,
-                              sequence_feature_columns,
-                              context_feature_columns=None,
-                              num_rnn_layers=1,
-                              optimizer_type='SGD',
-                              learning_rate=0.1,
-                              momentum=None,
-                              gradient_clipping_norm=5.0,
-                              dropout_keep_probabilities=None,
-                              model_dir=None,
-                              config=None,
-                              feature_engineering_fn=None,
-                              num_threads=3,
-                              queue_capacity=1000,
-                              seed=None):
-  """Creates a RNN `Estimator` that predicts sequences of values.
-
-  Args:
-    num_units: The size of the RNN cells.
-    num_unroll: Python integer, how many time steps to unroll at a time.
-      The input sequences of length `k` are then split into `k / num_unroll`
-      many segments.
-    batch_size: Python integer, the size of the minibatch.
-    sequence_feature_columns: An iterable containing all the feature columns
-      describing sequence features. All items in the set should be instances
-      of classes derived from `FeatureColumn`.
-    context_feature_columns: An iterable containing all the feature columns
-      describing context features, i.e., features that apply accross all time
-      steps. All items in the set should be instances of classes derived from
-      `FeatureColumn`.
-    num_rnn_layers: Number of RNN layers.
-    optimizer_type: The type of optimizer to use. Either a subclass of
-      `Optimizer`, an instance of an `Optimizer` or a string. Strings must be
-      one of 'Adagrad', 'Momentum' or 'SGD'.
-    learning_rate: Learning rate. This argument has no effect if `optimizer`
-      is an instance of an `Optimizer`.
-    momentum: Momentum value. Only used if `optimizer_type` is 'Momentum'.
-    gradient_clipping_norm: Parameter used for gradient clipping. If `None`,
-      then no clipping is performed.
-    dropout_keep_probabilities: a list of dropout keep probabilities or `None`.
-        If given a list, it must have length `num_rnn_layers + 1`.
-    model_dir: The directory in which to save and restore the model graph,
-      parameters, etc.
-    config: A `RunConfig` instance.
-    feature_engineering_fn: Takes features and labels which are the output of
-      `input_fn` and returns features and labels which will be fed into
-      `model_fn`. Please check `model_fn` for a definition of features and
-      labels.
-    num_threads: The Python integer number of threads enqueuing input examples
-      into a queue. Defaults to 3.
-    queue_capacity: The max capacity of the queue in number of examples.
-      Needs to be at least `batch_size`. Defaults to 1000. When iterating
-      over the same input example multiple times reusing their keys the
-      `queue_capacity` must be smaller than the number of examples.
-    seed: Fixes the random seed used for generating input keys by the SQSS.
-  Returns:
-    An initialized `Estimator`.
-  """
-  num_units = [num_units for _ in range(num_rnn_layers)]
-  return StateSavingRnnEstimator(
-      constants.ProblemType.LINEAR_REGRESSION,
-      num_unroll,
-      batch_size,
-      sequence_feature_columns,
-      context_feature_columns=context_feature_columns,
-      num_classes=None,
-      num_units=num_units,
-      cell_type='lstm',
-      optimizer_type=optimizer_type,
-      learning_rate=learning_rate,
-      predict_probabilities=False,
-      momentum=momentum,
-      gradient_clipping_norm=gradient_clipping_norm,
-      dropout_keep_probabilities=dropout_keep_probabilities,
-      model_dir=model_dir,
-      config=config,
-      feature_engineering_fn=feature_engineering_fn,
-      num_threads=num_threads,
-      queue_capacity=queue_capacity,
-      seed=seed)
-
-
-@deprecated('2017-04-01', 'multi_value_rnn_classifier is deprecated. '
-            'Please construct a StateSavingRnnEstimator directly.')
-def multi_value_rnn_classifier(num_classes,
-                               num_units,
-                               num_unroll,
-                               batch_size,
-                               sequence_feature_columns,
-                               context_feature_columns=None,
-                               num_rnn_layers=1,
-                               optimizer_type='SGD',
-                               learning_rate=0.1,
-                               predict_probabilities=False,
-                               momentum=None,
-                               gradient_clipping_norm=5.0,
-                               dropout_keep_probabilities=None,
-                               model_dir=None,
-                               config=None,
-                               feature_engineering_fn=None,
-                               num_threads=3,
-                               queue_capacity=1000,
-                               seed=None):
-  """Creates a RNN `Estimator` that predicts sequences of labels.
-
-  Args:
-    num_classes: The number of classes for categorization.
-    num_units: The size of the RNN cells.
-    num_unroll: Python integer, how many time steps to unroll at a time.
-      The input sequences of length `k` are then split into `k / num_unroll`
-      many segments.
-    batch_size: Python integer, the size of the minibatch.
-    sequence_feature_columns: An iterable containing all the feature columns
-      describing sequence features. All items in the set should be instances
-      of classes derived from `FeatureColumn`.
-    context_feature_columns: An iterable containing all the feature columns
-      describing context features, i.e., features that apply accross all time
-      steps. All items in the set should be instances of classes derived from
-      `FeatureColumn`.
-    num_rnn_layers: Number of RNN layers.
-    optimizer_type: The type of optimizer to use. Either a subclass of
-      `Optimizer`, an instance of an `Optimizer` or a string. Strings must be
-      one of 'Adagrad', 'Momentum' or 'SGD'.
-    learning_rate: Learning rate. This argument has no effect if `optimizer`
-      is an instance of an `Optimizer`.
-    predict_probabilities: A boolean indicating whether to predict probabilities
-      for all classes.
-    momentum: Momentum value. Only used if `optimizer_type` is 'Momentum'.
-    gradient_clipping_norm: Parameter used for gradient clipping. If `None`,
-      then no clipping is performed.
-    dropout_keep_probabilities: a list of dropout keep probabilities or `None`.
-        If given a list, it must have length `num_rnn_layers + 1`.
-    model_dir: The directory in which to save and restore the model graph,
-      parameters, etc.
-    config: A `RunConfig` instance.
-    feature_engineering_fn: Takes features and labels which are the output of
-      `input_fn` and returns features and labels which will be fed into
-      `model_fn`. Please check `model_fn` for a definition of features and
-      labels.
-    num_threads: The Python integer number of threads enqueuing input examples
-      into a queue. Defaults to 3.
-    queue_capacity: The max capacity of the queue in number of examples.
-      Needs to be at least `batch_size`. Defaults to 1000. When iterating
-      over the same input example multiple times reusing their keys the
-      `queue_capacity` must be smaller than the number of examples.
-    seed: Fixes the random seed used for generating input keys by the SQSS.
-  Returns:
-    An initialized `Estimator`.
-  """
-  num_units = [num_units for _ in range(num_rnn_layers)]
-  return StateSavingRnnEstimator(
-      constants.ProblemType.CLASSIFICATION,
-      num_unroll,
-      batch_size,
-      sequence_feature_columns,
-      context_feature_columns=context_feature_columns,
-      num_classes=num_classes,
-      num_units=num_units,
-      cell_type='lstm',
-      optimizer_type=optimizer_type,
-      learning_rate=learning_rate,
-      predict_probabilities=predict_probabilities,
-      momentum=momentum,
-      gradient_clipping_norm=gradient_clipping_norm,
-      dropout_keep_probabilities=dropout_keep_probabilities,
-      model_dir=model_dir,
-      config=config,
-      feature_engineering_fn=feature_engineering_fn,
-      num_threads=num_threads,
-      queue_capacity=queue_capacity,
-      seed=seed)
--- a/tensorflow/contrib/learn/python/learn/estimators/state_saving_rnn_estimator_test.py
+++ b/tensorflow/contrib/learn/python/learn/estimators/state_saving_rnn_estimator_test.py
@@ -455,56 +455,6 @@ class LegacyConstructorTest(test.TestCase):
      return {'inputs': inputs}, labels
    return input_fn

-  def testClassifierConstructor(self):
-    batch_size = 16
-    num_classes = 2
-    num_unroll = 32
-    sequence_length = 32
-    num_units = 4
-    learning_rate = 0.5
-    steps = 100
-    input_fn = self._get_input_fn(sequence_length,
-                                  seed=1234)
-    model_dir = tempfile.mkdtemp()
-    seq_columns = [
-        feature_column.real_valued_column(
-            'inputs', dimension=num_units)
-    ]
-    estimator = ssre.multi_value_rnn_classifier(num_classes,
-                                                num_units,
-                                                num_unroll,
-                                                batch_size,
-                                                seq_columns,
-                                                learning_rate=learning_rate,
-                                                model_dir=model_dir,
-                                                queue_capacity=batch_size+2,
-                                                seed=1234)
-    estimator.fit(input_fn=input_fn, steps=steps)
-
-  def testRegressorConstructor(self):
-    batch_size = 16
-    num_unroll = 32
-    sequence_length = 32
-    num_units = 4
-    learning_rate = 0.5
-    steps = 100
-    input_fn = self._get_input_fn(sequence_length,
-                                  seed=4321)
-    model_dir = tempfile.mkdtemp()
-    seq_columns = [
-        feature_column.real_valued_column(
-            'inputs', dimension=num_units)
-    ]
-    estimator = ssre.multi_value_rnn_regressor(num_units,
-                                               num_unroll,
-                                               batch_size,
-                                               seq_columns,
-                                               learning_rate=learning_rate,
-                                               model_dir=model_dir,
-                                               queue_capacity=batch_size+2,
-                                               seed=1234)
-    estimator.fit(input_fn=input_fn, steps=steps)
-

 # TODO(jtbates): move all tests below to a benchmark test.
 class StateSavingRNNEstimatorLearningTest(test.TestCase):
--- a/tensorflow/contrib/rnn/__init__.py
+++ b/tensorflow/contrib/rnn/__init__.py
@ -42,6 +42,8 @@ See @{$python/contrib.rnn} guide.
@@GridLSTMCell
@@BidirectionalGridLSTMCell
@@NASCell
+@@UGRNNCell
+@@IntersectionRNNCell
@@PhasedLSTMCell
@@HighwayWrapper

--- a/tensorflow/contrib/signal/BUILD
+++ b/tensorflow/contrib/signal/BUILD
@ -0,0 +1,46 @@
+package(default_visibility = ["//tensorflow:__subpackages__"])
+
+licenses(["notice"])  # Apache 2.0
+
+exports_files(["LICENSE"])
+
+load("//tensorflow:tensorflow.bzl", "cuda_py_tests")
+
+py_library(
+    name = "signal_py",
+    srcs = ["__init__.py"] + glob(["python/ops/*.py"]),
+    srcs_version = "PY2AND3",
+    deps = [
+        "//tensorflow/python:array_ops",
+        "//tensorflow/python:framework",
+        "//tensorflow/python:math_ops",
+    ],
+)
+
+cuda_py_tests(
+    name = "shape_ops_test",
+    size = "small",
+    srcs = ["python/kernel_tests/shape_ops_test.py"],
+    additional_deps = [
+        ":signal_py",
+        "//third_party/py/numpy",
+        "//tensorflow/python:array_ops",
+        "//tensorflow/python:client_testlib",
+        "//tensorflow/python:framework",
+        "//tensorflow/python:framework_for_generated_wrappers",
+        "//tensorflow/python:framework_test_lib",
+        "//tensorflow/python:platform_test",
+    ],
+)
+
+filegroup(
+    name = "all_files",
+    srcs = glob(
+        ["**/*"],
+        exclude = [
+            "**/METADATA",
+            "**/OWNERS",
+        ],
+    ),
+    visibility = ["//tensorflow:__subpackages__"],
+)
--- a/tensorflow/contrib/signal/__init__.py
+++ b/tensorflow/contrib/signal/__init__.py
@ -0,0 +1,27 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""##Signal ops.
+
+@@frames
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.contrib.signal.python.ops.shape_ops import frames
+
+from tensorflow.python.util.all_util import remove_undocumented
+remove_undocumented(__name__)
--- a/tensorflow/contrib/signal/python/__init__.py
+++ b/tensorflow/contrib/signal/python/__init__.py
@ -0,0 +1,19 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Signal ops."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
--- a/tensorflow/contrib/signal/python/kernel_tests/shape_ops_test.py
+++ b/tensorflow/contrib/signal/python/kernel_tests/shape_ops_test.py
@ -0,0 +1,68 @@
+# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for shape_ops."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+
+from tensorflow.contrib.signal.python.ops import shape_ops
+from tensorflow.python.framework import constant_op
+from tensorflow.python.framework import dtypes
+from tensorflow.python.ops import array_ops
+from tensorflow.python.platform import test
+
+
+class FramesTest(test.TestCase):
+
+  def test_mapping_of_indices_without_padding(self):
+    with self.test_session():
+      tensor = constant_op.constant(np.arange(9152), dtypes.int32)
+      tensor = array_ops.expand_dims(tensor, 0)
+
+      result = shape_ops.frames(tensor, 512, 180)
+      result = result.eval()
+
+      expected = np.tile(np.arange(512), (49, 1))
+      expected += np.tile(np.arange(49) * 180, (512, 1)).T
+
+      expected = np.expand_dims(expected, axis=0)
+      expected = np.array(expected, dtype=np.int32)
+
+      self.assertAllEqual(expected, result)
+
+  def test_mapping_of_indices_with_padding(self):
+    with self.test_session():
+      tensor = constant_op.constant(np.arange(10000), dtypes.int32)
+      tensor = array_ops.expand_dims(tensor, 0)
+
+      result = shape_ops.frames(tensor, 512, 192)
+      result = result.eval()
+
+      expected = np.tile(np.arange(512), (51, 1))
+      expected += np.tile(np.arange(51) * 192, (512, 1)).T
+
+      expected[expected >= 10000] = 0
+
+      expected = np.expand_dims(expected, axis=0)
+      expected = np.array(expected, dtype=np.int32)
+
+      self.assertAllEqual(expected, result)
+
+
+if __name__ == "__main__":
+  test.main()
--- a/tensorflow/contrib/signal/python/ops/__init__.py
+++ b/tensorflow/contrib/signal/python/ops/__init__.py
@ -0,0 +1,19 @@
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Signal ops."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
--- a/tensorflow/contrib/signal/python/ops/shape_ops.py
+++ b/tensorflow/contrib/signal/python/ops/shape_ops.py
@ -0,0 +1,87 @@
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""General shape ops for frames."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.python.framework import dtypes
+from tensorflow.python.framework import ops
+
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import math_ops
+
+
+def frames(signal, frame_length, frame_step, name=None):
+  """Frame a signal into overlapping frames.
+
+  May be used in front of spectral functions.
+
+  For example:
+
+  ```python
+  pcm = tf.placeholder(tf.float32, [None, 9152])
+  frames = tf.contrib.signal.frames(pcm, 512, 180)
+  magspec = tf.abs(tf.spectral.rfft(frames, [512]))
+  image = tf.expand_dims(magspec, 3)
+  ```
+
+  Args:
+    signal: A `Tensor` of shape `[batch_size, signal_length]`.
+    frame_length: An `int32` or `int64` `Tensor`. The length of each frame.
+    frame_step: An `int32` or `int64` `Tensor`. The step between frames.
+    name: A name for the operation (optional).
+
+  Returns:
+    A `Tensor` of frames with shape `[batch_size, num_frames, frame_length]`.
+
+  Raises:
+    ValueError: if signal does not have rank 2.
+  """
+  with ops.name_scope(name, "frames", [signal, frame_length, frame_step]):
+    signal = ops.convert_to_tensor(signal, name="signal")
+    frame_length = ops.convert_to_tensor(frame_length, name="frame_length")
+    frame_step = ops.convert_to_tensor(frame_step, name="frame_step")
+
+    signal_rank = signal.shape.ndims
+
+    if signal_rank != 2:
+      raise ValueError("expected signal to have rank 2 but was %s" % signal_rank)
+
+    signal_length = array_ops.shape(signal)[1]
+
+    num_frames = math_ops.ceil((signal_length - frame_length) / frame_step)
+    num_frames = 1 + math_ops.cast(num_frames, dtypes.int32)
+
+    pad_length = (num_frames - 1) * frame_step + frame_length
+    pad_signal = array_ops.pad(signal, [[0, 0], [0,
+                                                 pad_length - signal_length]])
+
+    indices_frame = array_ops.expand_dims(math_ops.range(frame_length), 0)
+    indices_frames = array_ops.tile(indices_frame, [num_frames, 1])
+
+    indices_step = array_ops.expand_dims(
+        math_ops.range(num_frames) * frame_step, 1)
+    indices_steps = array_ops.tile(indices_step, [1, frame_length])
+
+    indices = indices_frames + indices_steps
+
+    # TODO(androbin): remove `transpose` when `gather` gets `axis` support
+    pad_signal = array_ops.transpose(pad_signal)
+    signal_frames = array_ops.gather(pad_signal, indices)
+    signal_frames = array_ops.transpose(signal_frames, perm=[2, 0, 1])
+
+    return signal_frames
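
The index arithmetic in `frames` can be checked outside a TensorFlow graph. The following is a minimal NumPy sketch of the same pad-then-gather idea, not the TensorFlow implementation itself; the function name `frame_signal` is hypothetical, and it assumes `signal_length >= frame_length`.

```python
import math

import numpy as np


def frame_signal(signal, frame_length, frame_step):
  """NumPy sketch: frame a [batch, signal_length] array, zero-padding the tail."""
  signal = np.asarray(signal)
  if signal.ndim != 2:
    raise ValueError("expected signal to have rank 2 but was %d" % signal.ndim)
  signal_length = signal.shape[1]

  # Enough frames to cover the whole signal; the last one may need padding.
  num_frames = 1 + int(math.ceil((signal_length - frame_length) / frame_step))
  pad_length = (num_frames - 1) * frame_step + frame_length
  padded = np.pad(signal, [(0, 0), (0, pad_length - signal_length)],
                  mode="constant")

  # indices[i, j] = i * frame_step + j, i.e. sample j of frame i.
  indices = (np.arange(num_frames)[:, None] * frame_step +
             np.arange(frame_length)[None, :])
  return padded[:, indices]  # shape [batch, num_frames, frame_length]
```

With the values used in the tests above, a 9152-sample signal with `frame_length=512` and `frame_step=180` needs no padding and yields 1 + ceil(8640 / 180) = 49 frames, while a 10000-sample signal with `frame_step=192` yields 1 + ceil(9488 / 192) = 51 frames, the last of which is padded with 112 zeros.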
--- a/tensorflow/contrib/slim/README.md
+++ b/tensorflow/contrib/slim/README.md
@ -447,7 +447,7 @@ vgg = tf.contrib.slim.nets.vgg
 images, labels = ...

 # Create the model.
-predictions = vgg.vgg_16(images)
+predictions, _ = vgg.vgg_16(images)

 # Define the loss functions and get the total loss.
 loss = slim.losses.softmax_cross_entropy(predictions, labels)
--- a/tensorflow/contrib/slim/python/slim/data/README.md
+++ b/tensorflow/contrib/slim/python/slim/data/README.md
@ -71,27 +71,27 @@ for item in data_decoder.list_items():
  print(item)
 ```

-## Example: TFExampleDataDecoder
+## Example: TFExampleDecoder

 The
-[tfexample_data_decoder.py](https://www.tensorflow.org/code/tensorflow/contrib/slim/python/slim/data/tfexample_data_decoder.py)
+[tfexample_decoder.py](https://www.tensorflow.org/code/tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py)
 is a data decoder which decodes serialized `TFExample` protocol buffers. A
 `TFExample` protocol buffer is a map from keys (strings) to either a
 `tf.FixedLenFeature` or `tf.VarLenFeature`. Consequently, to decode a
 `TFExample`, one must provide a mapping from one or more `TFExample` fields
-to each of the `items` that the `tfexample_data_decoder` can provide. For
+to each of the `items` that the `tfexample_decoder` can provide. For
 example, a dataset of `TFExamples` might store images in various formats and
 each `TFExample` might contain an `encoding` key and a `format` key which can
 be used to decode the image using the appropriate decoder (jpg, png, etc).

-To make this possible, the `tfexample_data_decoder` is constructed by specifying
+To make this possible, the `tfexample_decoder` is constructed by specifying
 the a map of `TFExample` keys to either `tf.FixedLenFeature` or
 `tf.VarLenFeature` as well as a set of `ItemHandlers`. An `ItemHandler`
 provides a mapping from `TFExample` keys to the item being provided. Because a
-`tfexample_data_decoder` might return multiple `items`, one often constructs a
-`tfexample_data_decoder` using multiple `ItemHandlers`.
+`tfexample_decoder` might return multiple `items`, one often constructs a
+`tfexample_decoder` using multiple `ItemHandlers`.

-`tfexample_data_decoder` provides some predefined `ItemHandlers` which take care
+`tfexample_decoder` provides some predefined `ItemHandlers` which take care
 of the common cases of mapping `TFExamples` to images, `Tensors` and
 `SparseTensors`. For example, the following specification might be
 used to decode a dataset of images:
--- a/tensorflow/contrib/slim/python/slim/nets/resnet_v2.py
+++ b/tensorflow/contrib/slim/python/slim/nets/resnet_v2.py
@ -64,6 +64,8 @@ from tensorflow.python.ops import math_ops
 from tensorflow.python.ops import nn_ops
 from tensorflow.python.ops import variable_scope

+resnet_arg_scope = resnet_utils.resnet_arg_scope
+

@add_arg_scope
 def bottleneck(inputs,
--- a/tensorflow/contrib/testing/python/framework/fake_summary_writer.py
+++ b/tensorflow/contrib/testing/python/framework/fake_summary_writer.py
@ -127,3 +127,6 @@ class FakeSummaryWriter(object):

  def reopen(self):
    pass
+
+  def close(self):
+    pass
--- a/tensorflow/contrib/verbs/README.md
+++ b/tensorflow/contrib/verbs/README.md
@ -1,54 +1,54 @@
-## How to compile and use Rdma-enabled tensorflow
-1. Follow the regular TF compilation instructions. During configure step, if you want ibverbs based Rdma support, answer yes to this question:
+## How to compile and use RDMA-enabled TensorFlow
+1. Follow the regular TF compilation instructions. During the configure step, if you want ibverbs-based RDMA support, answer yes to this question:

    ```Do you wish to build TensorFlow with VERBS-RDMA support [y/N]```

-2. To turn on Rdma connection, add the protocol "grpc+verbs" in server definition:
+2. To turn on the RDMA connection, specify the protocol "grpc+verbs" in the server definition:

    ```server = tf.train.Server(cluster, job_name="local", task_index=0, protocol='grpc+verbs') # default protocol is 'grpc'```

 ## Overview
-The design is based on Tensorflow r1.0. An Rdma path is added between servers for tensor transfer (weights, gradients, etc). The existing GRPC path remains and is responsible for "administrative" tasks, such as setting up the Rdma path, exchanging computation graphs, etc.
+The design is based on TensorFlow r1.0. An RDMA path is added between servers for tensor transfer (weights, gradients, etc). The existing GRPC path remains and is responsible for "administrative" tasks, such as setting up the RDMA path, exchanging computation graphs, etc.

-During the server setup, an Rdma manager is created to manage low-level Rdma components such as Rdma channel and Rdma adapter, an Rdma rendezvous manager is created to oversee send/recv operations between servers. Following the distributed Tensorflow design philosophy, the send operation is passive, i.e. merely placing a tensor in the local out-going table. It is the receive operation that actually initiates the tensor transfer.
+During server setup, an RDMA manager is created to manage low-level RDMA components such as the RDMA channel and RDMA adapter, and an RDMA rendezvous manager is created to oversee send/recv operations between servers. Following the distributed TensorFlow design philosophy, the send operation is passive, i.e., it merely places a tensor in the local outgoing table. It is the receive operation that actually initiates the tensor transfer.

-Tensorflow dynamically allocates memory for tensors that are to be sent or received. This causes difficulty for Rdma operations where pinned memory is required. Two remedies are possible, either the memory is pinned, transfer, then unpinned for each and every tensor to be transferred, or a buffer is pre-allocated and pinned for each tensor. The former incurs significant operation overhead since pinning and unpinning memory for each dynamically generated tensor is slow. The latter incurs large memory overhead and extra copying from the tensor to its pinned buffer, but may still be faster than the former. The second approach is adopted in this design. Each Rdma channel, representing a Rdma connection to a peer, contains a table of pinned buffers for all the seen tensors that requires transfer. It is assumed that the tensor size rarely changes across different steps. So only one buffer is created for the same tensor across all the steps. In the rare case when the tensor size does increases, the old buffer is discarded and new buffer of larger size is created and pinned.
+TensorFlow dynamically allocates memory for tensors that are to be sent or received. This causes difficulty for RDMA operations, which require pinned memory. Two remedies are possible: either the memory is pinned, transferred, and then unpinned for each and every tensor to be transferred, or a buffer is pre-allocated and pinned for each tensor. The former incurs significant operation overhead since pinning and unpinning memory for each dynamically generated tensor is slow. The latter incurs large memory overhead and extra copying from the tensor to its pinned buffer, but may still be faster than the former. The second approach is adopted in this design. Each RDMA channel, representing an RDMA connection to a peer, contains a table of pinned buffers for all the seen tensors that require transfer. It is assumed that the tensor size rarely changes across different steps, so only one buffer is created for the same tensor across all the steps. In the rare case when the tensor size does increase, the old buffer is discarded and a new buffer of larger size is created and pinned.

-When a tensor is prepared fro transfer, it is first converted to TensorProto, then the proto is serialized to byte array and copied to the pinned buffer. The content of the buffer is transferred to the remote node via Rdma write. On the remote side, the process is reversed. This is illustrated in the diagram below. The conversion of TensorProto is introduced to simplify transfer of string-tensors. Also since the TensorProto lives in host memory, even if the origin tensor lives in the device, the pinned buffers are all allocated in the host memory.
-![Tensorflow Rdma path](./design_diagram.png)
+When a tensor is prepared for transfer, it is first converted to TensorProto, then the proto is serialized to a byte array and copied to the pinned buffer. The content of the buffer is transferred to the remote node via RDMA write. On the remote side, the process is reversed. This is illustrated in the diagram below. The conversion to TensorProto is introduced to simplify the transfer of string tensors. Also, since the TensorProto lives in host memory, the pinned buffers are all allocated in host memory, even if the original tensor lives on the device.
+![TensorFlow RDMA path](./design_diagram.png)

 The following improvements can be made in the future. First, conversion to TensorProto and serialization can be avoided for numeric (float/int) tensors since their internal buffer can be access directly as byte array. Second, the pinned buffer may be allocated on device if the tensor is located in the device. This avoids extra device-to-host copy at the expense of extra device memory consumption.
 ## Design details

-### Rdma components
+### RDMA components

-* **Rdma adapter:** The base for Rdma communications. It may contain multiple channels and buffers.  It is responsible for handling various incoming Rdma messages.
-* **Rdma channel:** Responsible for Rdma connection to a particular node. It manages multiple buffers. A channel has a callback table which stores all the callbacks for the requested tensors.
-* **Rdma buffer:** Responsible for sending or receiving data. It has a fixed size memory to store the data. It has a queue to store the pending jobs. There are three types of buffers, message buffer, ACK buffer and tensor buffer. A channel has two message buffers, two ack buffers and many tensor buffers.
-* **Rdma manager:** Manages the adapter and channels, including channel creation, channel setup via GRPC service, channel lookup, etc.
-* **Rdma rendezvous manager:** manages multiple rdma rendezvous. 
-* **Rdma rendezvous:** a derived class of BaseRemoteRendezvous. This class is the back end for "send" and "recv" ops. When the sendrecv_op wants to send or receive a tensor, it calls the rendezvous' "send" and "recv" functions respectively. Rendezvous are identified by "step_id", a random number, so that tensors for different iterations don't get mixed up.
+* **RDMA adapter:** The base for RDMA communications. It may contain multiple channels and buffers. It is responsible for handling various incoming RDMA messages.
+* **RDMA channel:** Responsible for the RDMA connection to a particular node. It manages multiple buffers. A channel has a callback table which stores all the callbacks for the requested tensors.
+* **RDMA buffer:** Responsible for sending or receiving data. It has a fixed-size memory region to store the data and a queue to store the pending jobs. There are three types of buffers: message buffers, ack buffers, and tensor buffers. A channel has two message buffers, two ack buffers, and many tensor buffers.
+* **RDMA manager:** Manages the adapter and channels, including channel creation, channel setup via GRPC service, channel lookup, etc.
+* **RDMA rendezvous manager:** Manages multiple RDMA rendezvous.
+* **RDMA rendezvous:** A derived class of BaseRemoteRendezvous. This class is the back end for "send" and "recv" ops. When the sendrecv_op wants to send or receive a tensor, it calls the rendezvous' "send" and "recv" functions respectively. Rendezvous are identified by "step_id", a random number, so that tensors for different iterations don't get mixed up.

 ### The SEND operation

-In tensorflow, when rendezvous sends a tensor, it merely puts a tensor in a local table in the corresponding rendezvous. If the tensor has been requested, a callback exists in the table. "send" will activate the callback, which tries to send the tensor across the node.
+In TensorFlow, when a rendezvous sends a tensor, it merely places the tensor in the rendezvous' local table. If the tensor has already been requested, a callback exists in the table; "send" activates the callback, which tries to send the tensor to the requesting node.


 ### The RECV operation

-When a tensor is requested, rendezvous' recv function is called. The function first places a callback in the channel's callback table, which will be activated once the tensor is sent from the source. In the next step, a message is sent to notify the source of the requested tensor. Once the source receives the message, it will check locally for the tensor, if not found, a callback is placed in the table, otherwise, the tensor id will be placed at corresponding Rdma buffer's job queue for future transmission. When a tensor is scheduled to be transmitted, the Rdma buffer needs to have the memory allocated and initialized (registered with the remote buffer info). If the memory is not ready, the transmission is deferred, a message is sent to the destination to establish the memory first. The other case a transimssion can be deferred is when the buffer is still being used by an on-going transmission.
+When a tensor is requested, the rendezvous' recv function is called. The function first places a callback in the channel's callback table, which will be activated once the tensor is sent from the source. In the next step, a message is sent to notify the source of the requested tensor. Once the source receives the message, it checks locally for the tensor: if the tensor is not found, a callback is placed in the table; otherwise, the tensor id is placed in the corresponding RDMA buffer's job queue for future transmission. When a tensor is scheduled to be transmitted, the RDMA buffer needs to have its memory allocated and initialized (registered with the remote buffer info). If the memory is not ready, the transmission is deferred and a message is sent to the destination to establish the memory first. The other case in which a transmission can be deferred is when the buffer is still being used by an ongoing transmission.

-### Three types of Rdma buffers
+### Three types of RDMA buffers

 * **Message buffer:** responsible for sending message only.
 * **Ack buffer:** once a message is sent, the recipient needs to send an ack via the ack buffer to free up the message buffer. An ack buffer is exclusively for its coupled message buffer.
 * **Tensor buffer:** responsible for sending tensors. The recipient needs to send back a message to free up the sending buffer.

-### Rdma packet format
+### RDMA packet format

 |type|name_size|name|step_id|buffer_size|remote_addr|rkey|is_dead|data_type|tensor_shape|tensor_bytes|tensor_buffer|

-### Six types of Rdma messages
+### Six types of RDMA messages
 * RDMA_MESSAGE_ACK
 * RDMA_MESSAGE_BUFFER_IDLE
 * RDMA_MESSAGE_BUFFER_REQUEST
@ -56,7 +56,7 @@ When a tensor is requested, rendezvous' recv function is called. The function fi
 * RDMA_MESSAGE_TENSOR_REQUEST
 * RDMA_MESSAGE_TENSOR_WRITE

-### Actions upon receiving Rdma messages
+### Actions upon receiving RDMA messages
 * RDMA_MESSAGE_ACK
  * sender: mark local ack buffer idle.
  * receiver: mark remote message buffer idle, send next item.
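
Two ideas from the README above, the passive send with a callback table and the reuse of one pre-allocated buffer per tensor, can be modelled in a few lines of plain Python. This is only an illustrative sketch: `LocalRendezvousTable` and `PinnedBufferTable` are hypothetical names, a `bytearray` stands in for pinned memory, and no RDMA or TensorFlow API is involved.

```python
class LocalRendezvousTable(object):
  """Toy model of the passive-send / active-recv handshake (hypothetical)."""

  def __init__(self):
    self._tensors = {}    # key -> tensor sent before anyone requested it
    self._callbacks = {}  # key -> callback left behind by a pending request

  def send(self, key, tensor):
    callback = self._callbacks.pop(key, None)
    if callback is None:
      self._tensors[key] = tensor  # passive: just park the tensor
    else:
      callback(tensor)             # already requested: hand it over now

  def request(self, key, callback):
    if key in self._tensors:
      callback(self._tensors.pop(key))
    else:
      self._callbacks[key] = callback


class PinnedBufferTable(object):
  """Toy model of one reusable buffer per tensor name (hypothetical)."""

  def __init__(self):
    self._buffers = {}
    self.allocations = 0

  def buffer_for(self, name, num_bytes):
    buf = self._buffers.get(name)
    if buf is None or len(buf) < num_bytes:
      # First sighting, or the tensor grew: replace the old buffer.
      buf = bytearray(num_bytes)
      self._buffers[name] = buf
      self.allocations += 1
    return buf

  def stage(self, name, payload):
    """Copies the serialized payload into the tensor's buffer."""
    buf = self.buffer_for(name, len(payload))
    buf[:len(payload)] = payload
    return buf
```

In this model a tensor of unchanged or smaller size reuses its buffer across steps, and only a size increase triggers a new allocation, which mirrors the trade-off described in the overview: more memory held, but no per-transfer pin and unpin.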
--- a/tensorflow/contrib/verbs/grpc_verbs_service.cc
+++ b/tensorflow/contrib/verbs/grpc_verbs_service.cc
@ -117,6 +117,8 @@ Status GrpcVerbsService::GetRemoteAddressSync(
  ra.lid = request->channel().lid();
  ra.qpn = request->channel().qpn();
  ra.psn = request->channel().psn();
+  ra.snp = request->channel().snp();
+  ra.iid = request->channel().iid();
  rc->SetRemoteAddress(ra, false);
  rc->Connect();
  int i = 0;
@ -146,6 +148,8 @@ Status GrpcVerbsService::GetRemoteAddressSync(
  channel_info->set_lid(rc->self().lid);
  channel_info->set_qpn(rc->self().qpn);
  channel_info->set_psn(rc->self().psn);
+  channel_info->set_snp(rc->self().snp);
+  channel_info->set_iid(rc->self().iid);
  for (int i = 0; i < RdmaChannel::kNumMessageBuffers; i++) {
    MemoryRegion* mr = response->add_mr();
    mr->set_remote_addr(reinterpret_cast<uint64>(mb[i]->buffer()));
--- a/tensorflow/contrib/verbs/rdma.cc
+++ b/tensorflow/contrib/verbs/rdma.cc
@ -271,6 +271,11 @@ RdmaChannel::RdmaChannel(const RdmaAdapter* adapter, const string local_name,
    self_.lid = attr.lid;
    self_.qpn = qp_->qp_num;
    self_.psn = static_cast<uint32_t>(random::New64()) & 0xffffff;
+    union ibv_gid gid;
+    CHECK(!ibv_query_gid(adapter_->context_, (uint8_t)1, 0, &gid))
+        << "Query gid";
+    self_.snp = gid.global.subnet_prefix;
+    self_.iid = gid.global.interface_id;
  }

  // create message and ack buffers, then initialize the tables.
@ -320,11 +325,15 @@ void RdmaChannel::SetRemoteAddress(const RdmaAddress& ra, bool override) {
    remote_.lid = ra.lid;
    remote_.qpn = ra.qpn;
    remote_.psn = ra.psn;
+    remote_.snp = ra.snp;
+    remote_.iid = ra.iid;
    remote_set_ = true;
  } else {
    CHECK(remote_.lid == ra.lid);
    CHECK(remote_.qpn == ra.qpn);
    CHECK(remote_.psn == ra.psn);
+    CHECK(remote_.snp == ra.snp);
+    CHECK(remote_.iid == ra.iid);
  }
 }

@ -472,7 +481,11 @@ void RdmaChannel::Connect(const RdmaAddress& remoteAddr) {
    attr.rq_psn = remoteAddr.psn;
    attr.max_dest_rd_atomic = 1;
    attr.min_rnr_timer = 12;
-    attr.ah_attr.is_global = 0;
+    attr.ah_attr.is_global = 1;
+    attr.ah_attr.grh.dgid.global.subnet_prefix = remoteAddr.snp;
+    attr.ah_attr.grh.dgid.global.interface_id = remoteAddr.iid;
+    attr.ah_attr.grh.flow_label = 0;
+    attr.ah_attr.grh.hop_limit = 255;
    attr.ah_attr.dlid = remoteAddr.lid;
    attr.ah_attr.sl = 0;
    attr.ah_attr.src_path_bits = 0;
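
The `snp` and `iid` fields added in this change carry the two 64-bit halves of the port's 128-bit GID (`subnet_prefix` and `interface_id`). As a rough illustration of that layout only, the sketch below splits a raw 16-byte GID in Python; the helper name `split_gid` and the sample address are made up for this example, and it interprets both halves as big-endian integers, whereas the C++ code copies the fields as stored without converting them.

```python
import struct


def split_gid(raw_gid):
  """Splits a 16-byte GID into (subnet_prefix, interface_id) integers."""
  if len(raw_gid) != 16:
    raise ValueError("expected 16 bytes but got %d" % len(raw_gid))
  # Two unsigned 64-bit fields, read here in network (big-endian) order.
  return struct.unpack(">QQ", raw_gid)


# Hypothetical link-local GID, fe80::2:c903:ab:cdef.
subnet_prefix, interface_id = split_gid(
    bytes(bytearray.fromhex("fe800000000000000002c90300abcdef")))
```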
--- a/tensorflow/contrib/verbs/rdma.h
+++ b/tensorflow/contrib/verbs/rdma.h
@ -40,6 +40,8 @@ struct RdmaAddress {
  uint32_t lid;
  uint32_t qpn;
  uint32_t psn;
+  uint64_t snp;
+  uint64_t iid;
 };
 // structure to save information for remote memory regions.
 struct RemoteMR {
--- a/tensorflow/contrib/verbs/rdma_mgr.cc
+++ b/tensorflow/contrib/verbs/rdma_mgr.cc
@ -69,6 +69,8 @@ void RdmaMgr::SetupChannels() {
    channel_info->set_lid(rc->self_.lid);
    channel_info->set_qpn(rc->self_.qpn);
    channel_info->set_psn(rc->self_.psn);
+    channel_info->set_snp(rc->self_.snp);
+    channel_info->set_iid(rc->self_.iid);
    for (int i = 0; i < RdmaChannel::kNumMessageBuffers; i++) {
      MemoryRegion* mr = req.add_mr();
      mr->set_remote_addr(
@ -85,6 +87,8 @@ void RdmaMgr::SetupChannels() {
      ra.lid = resp.channel().lid();
      ra.qpn = resp.channel().qpn();
      ra.psn = resp.channel().psn();
+      ra.snp = resp.channel().snp();
+      ra.iid = resp.channel().iid();
      rc->SetRemoteAddress(ra, false);
      rc->Connect();
      int i = 0;
--- a/tensorflow/contrib/verbs/verbs_service.proto
+++ b/tensorflow/contrib/verbs/verbs_service.proto
@ -30,6 +30,8 @@ message Channel {
  int32 lid = 1;
  int32 qpn = 2;
  int32 psn = 3;
+  uint64 snp = 4;
+  uint64 iid = 5;
 }

 message MemoryRegion {
--- a/tensorflow/core/distributed_runtime/rpc/grpc_call.h
+++ b/tensorflow/core/distributed_runtime/rpc/grpc_call.h
@ -16,6 +16,7 @@ limitations under the License.
 #ifndef THIRD_PARTY_TENSORFLOW_CORE_DISTRIBUTED_RUNTIME_RPC_GRPC_CALL_H_
 #define THIRD_PARTY_TENSORFLOW_CORE_DISTRIBUTED_RUNTIME_RPC_GRPC_CALL_H_

+#include "tensorflow/core/lib/core/refcount.h"
 #include "tensorflow/core/platform/macros.h"

 #include "grpc++/grpc++.h"
--- a/tensorflow/core/distributed_runtime/rpc/grpc_tensor_coding.cc
+++ b/tensorflow/core/distributed_runtime/rpc/grpc_tensor_coding.cc
@ -16,6 +16,7 @@ limitations under the License.
 #include "tensorflow/core/distributed_runtime/rpc/grpc_tensor_coding.h"
 #include "grpc++/support/byte_buffer.h"
 #include "grpc++/support/slice.h"
+#include "tensorflow/core/common_runtime/dma_helper.h"
 #include "tensorflow/core/framework/tensor.h"
 #include "tensorflow/core/framework/tensor_reference.h"
 #include "tensorflow/core/lib/gtl/inlined_vector.h"
@ -27,10 +28,9 @@ namespace tensorflow {
 namespace grpc {

 static void do_nothing(void* raw) {}
-static void unref_tensorreference(void* raw) {
-  TensorReference* ref = static_cast<TensorReference*>(raw);
-  ref->Unref();
-  delete ref;
+static void unref_tensorbuffer(void* raw) {
+  TensorBuffer* buf = static_cast<TensorBuffer*>(raw);
+  buf->Unref();
 }

 void EncodeRecvTensorResponseToByteBuffer(const RecvTensorResponse& proto,
@ -219,8 +219,8 @@ void EncodeTensorToByteBuffer(bool is_dead, const Tensor& val,

    if (tensor_data_is_large) {
      // Encode the actual tensor data by pointing to the backing store,
-      // and add a special zero-length slice that is really a TensorReference
-      // object that we will destroy when we are done.
+      // and add a special zero-length slice that is really a TensorBuffer
+      // reference that we will unref when we are done.
      //
      // TODO(jeff): Note that this approach relies on the fact that
      // slices are destroyed in the order in which they are added to
@ -241,17 +241,15 @@ void EncodeTensorToByteBuffer(bool is_dead, const Tensor& val,

      // (E) Encode tensor data, but by sharing backing store

-      // TODO(jeff,sanjay): It'd be nice to avoid this TensorReference
-      // allocation, and instead get our hands on the underlying
-      // TensorBuffer object and just directly ref it here and unref
-      // it in unref_tensorreference.
-      TensorReference* ref = new TensorReference(val);
+      const TensorBuffer* buf = DMAHelper::buffer(&val);
+      buf->Ref();
      gpr_slice s1 = gpr_slice_new(
          const_cast<void*>(static_cast<const void*>(tdata.data())),
          tdata.size(), do_nothing);
      slices[1] = ::grpc::Slice(s1, ::grpc::Slice::STEAL_REF);

-      gpr_slice s2 = gpr_slice_new(ref, 0, unref_tensorreference);
+      gpr_slice s2 =
+          gpr_slice_new(const_cast<TensorBuffer*>(buf), 0, unref_tensorbuffer);
      slices[2] = ::grpc::Slice(s2, ::grpc::Slice::STEAL_REF);
      num_slices += 2;
    }
--- a/tensorflow/core/kernels/cwise_op_atan2.cc
+++ b/tensorflow/core/kernels/cwise_op_atan2.cc
@ -0,0 +1,23 @@
+/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER2(BinaryOp, CPU, "Atan2", functor::atan2, float, double);
+#if GOOGLE_CUDA
+REGISTER2(BinaryOp, GPU, "Atan2", functor::atan2, float, double);
+#endif
+}  // namespace tensorflow
--- a/tensorflow/core/kernels/cwise_op_gpu_atan2.cu.cc
+++ b/tensorflow/core/kernels/cwise_op_gpu_atan2.cu.cc
@ -0,0 +1,26 @@
+/* Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY2(atan2, float, double);
+}  // namespace functor
+}  // namespace tensorflow
+
+#endif  // GOOGLE_CUDA
--- a/tensorflow/core/kernels/cwise_ops.h
+++ b/tensorflow/core/kernels/cwise_ops.h
@ -658,6 +658,22 @@ struct zeta : base<T, Eigen::internal::scalar_zeta_op<T>> {};
 template <typename T>
 struct polygamma : base<T, Eigen::internal::scalar_polygamma_op<T>> {};

+template <typename Scalar>
+struct scalar_atan2_op {
+  EIGEN_EMPTY_STRUCT_CTOR(scalar_atan2_op)
+  EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar
+  operator()(const Scalar& y, const Scalar& x) const {
+#if GOOGLE_CUDA
+    return ::atan2(y, x);
+#else
+    return std::atan2(y, x);
+#endif
+  }
+};
+
+template <typename T>
+struct atan2 : base<T, scalar_atan2_op<T>> {};
+
 template <typename T>
 struct squared_difference
    : base<T, Eigen::internal::scalar_compose_op<
--- a/tensorflow/core/kernels/linalg_ops_common.cc
+++ b/tensorflow/core/kernels/linalg_ops_common.cc
@ -15,6 +15,8 @@ limitations under the License.

 #include "tensorflow/core/kernels/linalg_ops_common.h"

+#include <utility>
+
 #include "third_party/eigen3/Eigen/Core"
 #include "tensorflow/core/framework/device_base.h"
 #include "tensorflow/core/framework/kernel_def_builder.h"
@ -153,10 +155,9 @@ void LinearAlgebraOp<Scalar>::AnalyzeInputs(OpKernelContext* context,
    const int col_dimension = input_rank - 1;
    const int64 num_rows = in.dim_size(row_dimension);
    const int64 num_cols = in.dim_size(col_dimension);
-    // TODO(rmlarsen): Use emplace_back when it is added to InlinedVector. Same
-    // in several places below.
-    input_matrix_shapes->push_back(TensorShape({num_rows, num_cols}));
-    inputs->push_back(&in);
+    input_matrix_shapes->emplace_back(
+        std::initializer_list<int64>({num_rows, num_cols}));
+    inputs->emplace_back(&in);
  }
  // Have the derived class validate that the inputs are as expected.
  ValidateInputMatrixShapes(context, *input_matrix_shapes);
@ -198,9 +199,7 @@ void LinearAlgebraOp<Scalar>::PrepareOutputs(
      // concatenated with the output_matrix_shape (if the output is not
      // scalar).
      output_tensor_shape = batch_shape;
-      for (int dim = 0; dim < output_matrix_shape.dims(); ++dim) {
-        output_tensor_shape.AddDim(output_matrix_shape.dim_size(dim));
-      }
+      output_tensor_shape.AppendShape(output_matrix_shape);
    }
    Tensor* out = nullptr;
    // See if there is an input buffer we can reuse for this output.
@ -219,7 +218,7 @@ void LinearAlgebraOp<Scalar>::PrepareOutputs(
      OP_REQUIRES_OK(context, context->allocate_output(
                                  output_idx, output_tensor_shape, &out));
    }
-    outputs->push_back(out);
+    outputs->emplace_back(out);
  }
 }

@ -232,11 +231,10 @@ void LinearAlgebraOp<Scalar>::ComputeTensorSlice(
  for (size_t i = 0; i < inputs.size(); ++i) {
    // TODO(kalakris): Handle alignment if possible. Eigen::Map is
    // unaligned by default.
-    matrix_inputs.push_back(
-        ConstMatrixMap(inputs[i]->flat<Scalar>().data() +
-                           matrix_index * input_matrix_shapes[i].num_elements(),
-                       input_matrix_shapes[i].dim_size(0),
-                       input_matrix_shapes[i].dim_size(1)));
+    matrix_inputs.emplace_back(
+        inputs[i]->flat<Scalar>().data() +
+            matrix_index * input_matrix_shapes[i].num_elements(),
+        input_matrix_shapes[i].dim_size(0), input_matrix_shapes[i].dim_size(1));
  }

  MatrixMaps matrix_outputs;
@ -248,10 +246,10 @@ void LinearAlgebraOp<Scalar>::ComputeTensorSlice(
    int num_output_cols = output_matrix_shapes[i].dims() == 2
                              ? output_matrix_shapes[i].dim_size(1)
                              : 1;
-    matrix_outputs.push_back(
-        MatrixMap(outputs[i]->flat<Scalar>().data() +
-                      matrix_index * output_matrix_shapes[i].num_elements(),
-                  num_output_rows, num_output_cols));
+    matrix_outputs.emplace_back(
+        outputs[i]->flat<Scalar>().data() +
+            matrix_index * output_matrix_shapes[i].num_elements(),
+        num_output_rows, num_output_cols);
  }
  ComputeMatrix(context, matrix_inputs, &matrix_outputs);
 }
--- a/tensorflow/core/kernels/linalg_ops_common.h
+++ b/tensorflow/core/kernels/linalg_ops_common.h
@ -21,10 +21,7 @@ limitations under the License.
 // computations across different threads if necessary.
 #include <algorithm>

-#define EIGEN_USE_THREADS
-
 #include "third_party/eigen3/Eigen/Core"
-#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
 #include "tensorflow/core/framework/kernel_def_builder.h"
 #include "tensorflow/core/framework/op_kernel.h"
 #include "tensorflow/core/framework/tensor.h"
--- a/tensorflow/core/ops/math_ops.cc
+++ b/tensorflow/core/ops/math_ops.cc
@ -731,6 +731,21 @@ The polygamma function is defined as:
 where \\(\psi(x)\\) is the digamma function.
 )doc");

+REGISTER_OP("Atan2")
+    .Input("y: T")
+    .Input("x: T")
+    .Output("z: T")
+    .Attr("T: {float, double}")
+    .SetShapeFn(shape_inference::BroadcastBinaryOpShapeFn)
+    .Doc(R"doc(
+Computes arctangent of `y/x` element-wise, respecting signs of the arguments.
+This is the angle \( \theta \in [-\pi, \pi] \) such that
+\[ x = r \cos(\theta) \]
+and
+\[ y = r \sin(\theta) \]
+where \(r = \sqrt{x^2 + y^2} \).
+)doc");
+
 REGISTER_OP("Betainc")
    .Input("a: T")
    .Input("b: T")
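The `Atan2` docstring above defines the result as the angle whose cosine and sine reproduce `x` and `y`. A minimal sketch of that definition follows, using Python's standard `math.atan2` as a stand-in rather than the TensorFlow op itself; the helper name `polar_angle` is hypothetical.

```python
import math


def polar_angle(y, x):
    """Return theta in [-pi, pi] such that x = r*cos(theta), y = r*sin(theta)."""
    return math.atan2(y, x)


# A point in the third quadrant: both coordinates are negative.
y, x = -1.0, -1.0
theta = polar_angle(y, x)
r = math.hypot(x, y)

# Reconstruct the point from (r, theta) to check the definition.
x_back = r * math.cos(theta)
y_back = r * math.sin(theta)

# Plain atan(y / x) loses the signs and lands in the first quadrant instead.
naive = math.atan(y / x)
```

The two-argument form is what lets the op respect the signs of both inputs; dividing first discards the quadrant.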
--- a/tensorflow/docs_src/community/documentation.md
+++ b/tensorflow/docs_src/community/documentation.md
@ -4,12 +4,24 @@ We welcome contributions to the Tensorflow documentation from the community.
 This document explains how you can contribute to that documentation. In
 particular, this document explains the following:

- Where the documentation is located.
- How to make conformant edits.
- How to build and test your documentation changes before you submit them.
+* Where the documentation is located.
+* How to make conformant edits.
+* How to build and test your documentation changes before you submit them.

-You can view Tensorflow documentation on tensorflow.org, and you can view and
-edit the raw files on Github.
+You can view TensorFlow documentation on https://www.tensorflow.org, and you
+can view and edit the raw files on GitHub. We're publishing our docs on GitHub
+so everybody can contribute. Whatever gets checked in to tensorflow/docs_src
+will be published soon after on https://www.tensorflow.org.
+
+Republishing TensorFlow documentation in different forms is absolutely allowed,
+but we are unlikely to accept other documentation formats (or the tooling to
+generate them) into our repository. If you do choose to republish our
+documentation in another form, please be sure to include:
+
+* The version of the API this represents (for example, r1.0 or master)
+* The commit or version from which the documentation was generated
+* Where to get the latest documentation (that is, https://www.tensorflow.org)
+* The Apache 2.0 license.

 ## A Note on Versions

--- a/tensorflow/docs_src/get_started/input_fn.md
+++ b/tensorflow/docs_src/get_started/input_fn.md
@ -12,7 +12,7 @@ When training a neural network using tf.contrib.learn, it's possible to pass
 your feature and target data directly into your `fit`, `evaluate`, or `predict`
 operations. Here's an example taken from the @{$tflearn$tf.contrib.learn quickstart tutorial}:

-```py
+```python
 training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING, target_dtype=np.int, features_dtype=np.float32)
 test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
--- a/tensorflow/docs_src/install/install_linux.md
+++ b/tensorflow/docs_src/install/install_linux.md
@ -166,7 +166,7 @@ Take the following steps to install TensorFlow with Virtualenv:
     virtualenv environment:

     <pre>(tensorflow)$ <b>pip3 install --upgrade \
-     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp34-cp34m-linux_x86_64.whl</b></pre>
+     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl</b></pre>

 If you encounter installation problems, see
 [Common Installation Problems](#common_installation_problems).
@ -271,7 +271,7 @@ take the following steps:

     <pre>
     $ <b>sudo pip3 install --upgrade \
-     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp34-cp34m-linux_x86_64.whl</b>
+     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl</b>
     </pre>

     If this step fails, see
@ -458,7 +458,7 @@ Take the following steps to install TensorFlow in an Anaconda environment:

     <pre>
     (tensorflow)$ <b>pip install --ignore-installed --upgrade \
-     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp34-cp34m-linux_x86_64.whl</b></pre>
+     https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl</b></pre>


 <a name="ValidateYourInstallation"></a>
@ -626,14 +626,14 @@ This section documents the relevant values for Linux installations.
 CPU only:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp27-none-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp27-none-linux_x86_64.whl
 </pre>


 GPU support:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0rc2-cp27-none-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp27-none-linux_x86_64.whl
 </pre>

 Note that GPU support requires the NVIDIA hardware and software described in
@ -645,14 +645,14 @@ Note that GPU support requires the NVIDIA hardware and software described in
 CPU only:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp34-cp34m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp34-cp34m-linux_x86_64.whl
 </pre>


 GPU support:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0rc2-cp34-cp34m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp34-cp34m-linux_x86_64.whl
 </pre>

 Note that GPU support requires the NVIDIA hardware and software described in
@ -664,14 +664,14 @@ Note that GPU support requires the NVIDIA hardware and software described in
 CPU only:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp35-cp35m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp35-cp35m-linux_x86_64.whl
 </pre>


 GPU support:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0rc2-cp35-cp35m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-linux_x86_64.whl
 </pre>


@ -683,14 +683,14 @@ Note that GPU support requires the NVIDIA hardware and software described in
 CPU only:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0rc2-cp36-cp36m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp36-cp36m-linux_x86_64.whl
 </pre>


 GPU support:

 <pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0rc2-cp36-cp36m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp36-cp36m-linux_x86_64.whl
 </pre>


--- a/tensorflow/docs_src/install/install_mac.md
+++ b/tensorflow/docs_src/install/install_mac.md
@ -112,7 +112,7 @@ Take the following steps to install TensorFlow with Virtualenv:
     https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.1.0rc2-py2-none-any.whl</b></pre>

 If you encounter installation problems, see
-[Common Installation Problems](#CommonInstallationProblems).
+[Common Installation Problems](#common-installation-problems).


 ### Next Steps
@ -233,7 +233,7 @@ take the following steps:
     https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.1.0rc2-py2-none-any.whl</b> </pre>

     If the preceding command fails, see
-     [Common installation problems](#CommonInstallationProblems).
+     [installation problems](#common-installation-problems).



--- a/tensorflow/docs_src/install/install_sources.md
+++ b/tensorflow/docs_src/install/install_sources.md
@ -320,10 +320,10 @@ Invoke `pip install` to install that pip package.
 The filename of the `.whl` file depends on your platform.
 For example, the following command will install the pip package

-for TensorFlow 1.1.0rc2 on Linux:
+for TensorFlow 1.1.0 on Linux:

 <pre>
-$ <b>sudo pip install /tmp/tensorflow_pkg/tensorflow-1.1.0rc2-py2-none-any.whl</b>
+$ <b>sudo pip install /tmp/tensorflow_pkg/tensorflow-1.1.0-py2-none-any.whl</b>
 </pre>

 ## Validate your installation
--- a/tensorflow/docs_src/install/install_windows.md
+++ b/tensorflow/docs_src/install/install_windows.md
@ -103,7 +103,7 @@ Take the following steps to install TensorFlow in an Anaconda environment:
  2. Create a conda environment named <tt>tensorflow</tt>
     by invoking the following command:

-     <pre>C:\> <b>conda create -n tensorflow</b> </pre>
+     <pre>C:\> <b>conda create -n tensorflow python=3.5</b> </pre>

  3. Activate the conda environment by issuing the following command:

@ -114,12 +114,12 @@ Take the following steps to install TensorFlow in an Anaconda environment:
     environment. To install the CPU-only version of TensorFlow, enter the
     following command:

-     <pre>(tensorflow)C:\> <b>pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-1.1.0rc2-cp35-cp35m-win_amd64.whl</b> </pre>
+     <pre>(tensorflow)C:\> <b>pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-1.1.0-cp35-cp35m-win_amd64.whl</b> </pre>

     To install the GPU version of TensorFlow, enter the following command
     (on a single line):

-     <pre>(tensorflow)C:\> <b>pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0rc2-cp35-cp35m-win_amd64.whl</b> </pre>
+     <pre>(tensorflow)C:\> <b>pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl</b> </pre>

 ## Validate your installation

@ -193,5 +193,20 @@ ImportError: cannot import name 'descriptor'</pre>
  <td><pre>No module named "pywrap_tensorflow"</pre></td>
 </tr>

+<tr>
+  <td><a href="https://stackoverflow.com/q/42217532">42217532</a></td>
+  <td>
+  <pre>OpKernel ('op: "BestSplits" device_type: "CPU"') for unknown op: BestSplits</pre>
+  </td>
+</tr>
+
+<tr>
+  <td><a href="https://stackoverflow.com/q/43134753">43134753</a></td>
+  <td>
+  <pre>The TensorFlow library wasn't compiled to use SSE instructions</pre>
+  </td>
+</tr>
+
+
 </table>

--- a/tensorflow/examples/image_retraining/retrain.py
+++ b/tensorflow/examples/image_retraining/retrain.py
@ -369,9 +369,12 @@ def create_bottleneck_file(bottleneck_path, image_lists, label_name, index,
  if not gfile.Exists(image_path):
    tf.logging.fatal('File does not exist %s', image_path)
  image_data = gfile.FastGFile(image_path, 'rb').read()
-  bottleneck_values = run_bottleneck_on_image(sess, image_data,
-                                              jpeg_data_tensor,
-                                              bottleneck_tensor)
+  try:
+    bottleneck_values = run_bottleneck_on_image(
+        sess, image_data, jpeg_data_tensor, bottleneck_tensor)
+  except Exception:
+    raise RuntimeError('Error during processing file %s' % image_path)
+
  bottleneck_string = ','.join(str(x) for x in bottleneck_values)
  with open(bottleneck_path, 'w') as bottleneck_file:
    bottleneck_file.write(bottleneck_string)
--- a/tensorflow/python/estimator/estimator_test.py
+++ b/tensorflow/python/estimator/estimator_test.py
@ -54,6 +54,7 @@ from tensorflow.python.platform import test
 from tensorflow.python.platform import tf_logging as logging
 from tensorflow.python.saved_model import loader
 from tensorflow.python.saved_model import tag_constants
+from tensorflow.python.summary.writer import writer_cache
 from tensorflow.python.training import checkpoint_state_pb2
 from tensorflow.python.training import saver
 from tensorflow.python.training import saver_test_utils
@ -436,8 +437,12 @@ class EstimatorTrainTest(test.TestCase):
        model_fn=model_fn_global_step_incrementer)
    est1.train(dummy_input_fn, steps=5)

+    # We have to clear the cache before we can rename the directory,
+    # otherwise open file handles will prevent the delete on Windows.
+    writer_cache.FileWriterCache.clear()
    model_dir2 = os.path.join(tmpdir, 'model_dir2')
    os.renames(model_dir1, model_dir2)
+
    est2 = estimator.Estimator(
        model_dir=model_dir2,
        model_fn=model_fn_global_step_incrementer)
--- a/tensorflow/python/framework/ops.py
+++ b/tensorflow/python/framework/ops.py
@ -2705,11 +2705,11 @@ class Graph(object):
    Args:
      name: The key for the collection. For example, the `GraphKeys` class
        contains many standard names for collections.
-      scope: (Optional.) If supplied, the resulting list is filtered to include
-        only items whose `name` attribute matches using `re.match`. Items
-        without a `name` attribute are never returned if a scope is supplied and
-        the choice or `re.match` means that a `scope` without special tokens
-        filters by prefix.
+      scope: (Optional.) A string. If supplied, the resulting list is filtered
+        to include only items whose `name` attribute matches `scope` using
+        `re.match`. Items without a `name` attribute are never returned if a
+        scope is supplied. The choice of `re.match` means that a `scope` without
+        special tokens filters by prefix.

    Returns:
      The list of values in the collection with the given `name`, or
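The revised `scope` description above says that `re.match` makes a scope without special tokens behave as a prefix filter. A minimal sketch of that behavior, assuming plain strings in place of graph items with a `name` attribute; `filter_by_scope` and the sample names are hypothetical, not part of `Graph.get_collection`.

```python
import re


def filter_by_scope(names, scope):
    """Keep only names that `scope` matches at the start, as re.match does."""
    return [n for n in names if re.match(scope, n)]


names = ["layer1/weights", "layer1/bias", "layer10/weights", "other/layer1/w"]

# re.match anchors at the beginning only, so "layer1" also matches "layer10/...".
prefix_hits = filter_by_scope(names, "layer1")

# Adding the trailing slash narrows the filter to the exact scope.
exact_scope_hits = filter_by_scope(names, "layer1/")
```

Because the match is anchored only at the start, `"other/layer1/w"` is excluded in both cases even though it contains the scope string.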
--- a/tensorflow/python/kernel_tests/cwise_ops_test.py
+++ b/tensorflow/python/kernel_tests/cwise_ops_test.py
@ -615,6 +615,13 @@ class BinaryOpTest(test.TestCase):
    self._compareBoth(x, y, np.multiply, _MUL)
    self._compareBoth(x, y + 0.1, np.true_divide, _TRUEDIV)
    self._compareBoth(x, y + 0.1, np.floor_divide, _FLOORDIV)
+    self._compareBoth(x, y, np.arctan2, math_ops.atan2)
+    x1 = np.random.randn(5, 6).astype(np.float32)
+    x2 = np.random.randn(5, 6).astype(np.float32)
+    # Remove tiny values--atan2 gradients are flaky near the origin.
+    x1[np.abs(x1) < 0.05] = 0.05 * np.sign(x1[np.abs(x1) < 0.05])
+    x2[np.abs(x2) < 0.05] = 0.05 * np.sign(x2[np.abs(x2) < 0.05])
+    self._compareBoth(x1, x2, np.arctan2, math_ops.atan2)
    try:
      from scipy import special  # pylint: disable=g-import-not-at-top
      a_pos_small = np.linspace(0.1, 2, 15).reshape(1, 3, 5).astype(np.float32)
@ -672,6 +679,13 @@ class BinaryOpTest(test.TestCase):
    self._compareBoth(x, y, np.multiply, _MUL)
    self._compareBoth(x, y + 0.1, np.true_divide, _TRUEDIV)
    self._compareBoth(x, y + 0.1, np.floor_divide, _FLOORDIV)
+    self._compareBoth(x, y, np.arctan2, math_ops.atan2)
+    x1 = np.random.randn(7, 4).astype(np.float64)
+    x2 = np.random.randn(7, 4).astype(np.float64)
+    # Remove tiny values--atan2 gradients are flaky near the origin.
+    x1[np.abs(x1) < 0.5] = 0.5 * np.sign(x1[np.abs(x1) < 0.5])
+    x2[np.abs(x2) < 0.5] = 0.5 * np.sign(x2[np.abs(x2) < 0.5])
+    self._compareBoth(x1, x2, np.arctan2, math_ops.atan2)
    try:
      from scipy import special  # pylint: disable=g-import-not-at-top
      a_pos_small = np.linspace(0.1, 2, 15).reshape(1, 3, 5).astype(np.float32)
@ -1090,6 +1104,19 @@ class BinaryOpTest(test.TestCase):
          error = gradient_checker.compute_gradient_error(y, [], z, [])
          self.assertLess(error, 2e-4)

+  def testAtan2SpecialValues(self):
+    x1l, x2l = zip((+0.0, +0.0), (+0.0, -0.0), (-0.0, +0.0), (-0.0, -0.0),
+                   (1.2345, float("inf")), (1.2345, -float("inf")),
+                   (-4.321, float("inf")), (-4.125, -float("inf")),
+                   (float("inf"), float("inf")), (float("inf"), -float("inf")),
+                   (-float("inf"), float("inf")), (-float("inf"),
+                                                   -float("inf")))
+    for dtype in np.float32, np.float64:
+      x1 = np.array(x1l).astype(dtype)
+      x2 = np.array(x2l).astype(dtype)
+      self._compareCpu(x1, x2, np.arctan2, math_ops.atan2)
+      self._compareGpu(x1, x2, np.arctan2, math_ops.atan2)
+

 class ComparisonOpTest(test.TestCase):

--- a/tensorflow/python/kernel_tests/tensor_priority_test.py
+++ b/tensorflow/python/kernel_tests/tensor_priority_test.py
@ -0,0 +1,86 @@
+# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for the binary ops priority mechanism."""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+
+from tensorflow.python.framework import ops
+from tensorflow.python.platform import test as test_lib
+
+
+class TensorPriorityTest(test_lib.TestCase):
+
+  def testSupportedRhsWithoutDelegation(self):
+
+    class NumpyArraySubclass(np.ndarray):
+      pass
+
+    supported_rhs_without_delegation = (3, 3.0, [1.0, 2.0], np.array(
+        [1.0, 2.0]), NumpyArraySubclass(
+            shape=(1, 2), buffer=np.array([1.0, 2.0])),
+                                        ops.convert_to_tensor([[1.0, 2.0]]))
+    for rhs in supported_rhs_without_delegation:
+      tensor = ops.convert_to_tensor([[10.0, 20.0]])
+      res = tensor + rhs
+      self.assertIsInstance(res, ops.Tensor)
+
+  def testUnsupportedRhsWithoutDelegation(self):
+
+    class WithoutReverseAdd(object):
+      pass
+
+    tensor = ops.convert_to_tensor([[10.0, 20.0]])
+    rhs = WithoutReverseAdd()
+    with self.assertRaisesWithPredicateMatch(
+        TypeError, lambda e: "Expected float" in str(e)):
+      # pylint: disable=pointless-statement
+      tensor + rhs
+
+  def testUnsupportedRhsWithDelegation(self):
+
+    class WithReverseAdd(object):
+
+      def __radd__(self, lhs):
+        return "Works!"
+
+    tensor = ops.convert_to_tensor([[10.0, 20.0]])
+    rhs = WithReverseAdd()
+    res = tensor + rhs
+    self.assertEqual(res, "Works!")
+
+  def testFullDelegationControlUsingRegistry(self):
+
+    class NumpyArraySubclass(np.ndarray):
+
+      def __radd__(self, lhs):
+        return "Works!"
+
+    def raise_to_delegate(value, dtype=None, name=None, as_ref=False):
+      del value, dtype, name, as_ref  # Unused.
+      raise TypeError
+
+    ops.register_tensor_conversion_function(
+        NumpyArraySubclass, raise_to_delegate, priority=0)
+    tensor = ops.convert_to_tensor([[10.0, 20.0]])
+    rhs = NumpyArraySubclass(shape=(1, 2), buffer=np.array([1.0, 2.0]))
+    res = tensor + rhs
+    self.assertEqual(res, "Works!")
+
+
+if __name__ == "__main__":
+  test_lib.main()
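The tests above exercise delegation to the right-hand operand's `__radd__`. The sketch below shows only the underlying Python protocol, under the assumption that the left operand signals "unsupported" by returning `NotImplemented`; `FakeTensor` is a hypothetical stand-in and does not reproduce how `ops.Tensor` or the conversion registry decide when to delegate.

```python
class FakeTensor(object):
    """Hypothetical stand-in for a tensor type; not TensorFlow's ops.Tensor."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, rhs):
        if isinstance(rhs, (int, float)):
            return FakeTensor(v + rhs for v in self.values)
        # Returning NotImplemented lets Python try rhs.__radd__(self) next.
        return NotImplemented


class WithReverseAdd(object):

    def __radd__(self, lhs):
        return "Works!"


class WithoutReverseAdd(object):
    pass


# The right-hand side takes over because FakeTensor declined the operation.
delegated = FakeTensor([10.0, 20.0]) + WithReverseAdd()

# With no __radd__ to fall back on, Python raises TypeError.
try:
    FakeTensor([10.0, 20.0]) + WithoutReverseAdd()
    raised_type_error = False
except TypeError:
    raised_type_error = True
```

In this sketch the `TypeError` comes from the interpreter itself; in the test file above, the expected message (`"Expected float"`) comes from TensorFlow's own conversion code.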
--- a/tensorflow/python/layers/convolutional.py
+++ b/tensorflow/python/layers/convolutional.py
@ -972,7 +972,7 @@ def separable_conv2d(inputs,


 class Conv2DTranspose(Conv2D):
-  """Transposed convolution layer (sometimes called Deconvolution).
+  """Transposed 2D convolution layer (sometimes called 2D Deconvolution).

  The need for transposed convolutions generally arises
  from the desire to use a transformation going in the opposite direction
@ -1086,19 +1086,9 @@ class Conv2DTranspose(Conv2D):
    kernel_h, kernel_w = self.kernel_size
    stride_h, stride_w = self.strides

-    def get_deconv_dim(dim_size, stride_size, kernel_size, padding):
-      if isinstance(dim_size, ops.Tensor):
-        dim_size = math_ops.multiply(dim_size, stride_size)
-      elif dim_size is not None:
-        dim_size *= stride_size
-
-      if padding == 'valid' and dim_size is not None:
-        dim_size += max(kernel_size - stride_size, 0)
-      return dim_size
-
    # Infer the dynamic output shape:
-    out_height = get_deconv_dim(height, stride_h, kernel_h, self.padding)
-    out_width = get_deconv_dim(width, stride_w, kernel_w, self.padding)
+    out_height = utils.get_deconv_dim(height, stride_h, kernel_h, self.padding)
+    out_width = utils.get_deconv_dim(width, stride_w, kernel_w, self.padding)

    if self.data_format == 'channels_first':
      output_shape = (batch_size, self.filters, out_height, out_width)
@ -1119,10 +1109,10 @@ class Conv2DTranspose(Conv2D):
    # Infer the static output shape:
    out_shape = inputs.get_shape().as_list()
    out_shape[c_axis] = self.filters
-    out_shape[h_axis] = get_deconv_dim(
-        out_shape[h_axis], stride_h, kernel_h, self.padding)
-    out_shape[w_axis] = get_deconv_dim(
-        out_shape[w_axis], stride_w, kernel_w, self.padding)
+    out_shape[h_axis] = utils.get_deconv_dim(out_shape[h_axis], stride_h,
+                                             kernel_h, self.padding)
+    out_shape[w_axis] = utils.get_deconv_dim(out_shape[w_axis], stride_w,
+                                             kernel_w, self.padding)
    outputs.set_shape(out_shape)

    if self.bias:
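The hunks above move the local `get_deconv_dim` helper into `utils`. As a rough illustration of the arithmetic in the removed helper, here is a sketch restricted to static Python integers (the original also handles `ops.Tensor` sizes via `math_ops.multiply`); the name `deconv_dim` is hypothetical.

```python
def deconv_dim(dim_size, stride_size, kernel_size, padding):
    """Static-shape version of the output-size rule for a transposed conv."""
    if dim_size is None:
        # Unknown input size stays unknown.
        return None
    dim_size *= stride_size
    if padding == 'valid':
        # 'valid' padding adds the part of the kernel that overhangs the stride.
        dim_size += max(kernel_size - stride_size, 0)
    return dim_size


valid_out = deconv_dim(4, stride_size=2, kernel_size=3, padding='valid')
same_out = deconv_dim(4, stride_size=2, kernel_size=3, padding='same')
unknown_out = deconv_dim(None, stride_size=2, kernel_size=3, padding='valid')
```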
@ -1152,7 +1142,7 @@ def conv2d_transpose(inputs,
                     trainable=True,
                     name=None,
                     reuse=None):
-  """Transposed convolution layer (sometimes called Deconvolution).
+  """Functional interface for transposed 2D convolution layer.

  The need for transposed convolutions generally arises
  from the desire to use a transformation going in the opposite direction
@ -1177,12 +1167,12 @@ def conv2d_transpose(inputs,
      `channels_last` corresponds to inputs with shape
      `(batch, height, width, channels)` while `channels_first` corresponds to
      inputs with shape `(batch, channels, height, width)`.
-    activation: Activation function. Set it to None to maintain a
+    activation: Activation function. Set it to `None` to maintain a
      linear activation.
    use_bias: Boolean, whether the layer uses a bias.
    kernel_initializer: An initializer for the convolution kernel.
-    bias_initializer: An initializer for the bias vector. If None, no bias will
-      be applied.
+    bias_initializer: An initializer for the bias vector. If `None`, then no
+      bias will be applied.
    kernel_regularizer: Optional regularizer for the convolution kernel.
    bias_regularizer: Optional regularizer for the bias vector.
    activity_regularizer: Regularizer function for the output.
@ -1215,6 +1205,250 @@ def conv2d_transpose(inputs,
  return layer.apply(inputs)


+class Conv3DTranspose(Conv3D):
+  """Transposed 3D convolution layer (sometimes called 3D Deconvolution).
+
+  Arguments:
+    filters: Integer, the dimensionality of the output space (i.e. the number
+      of filters in the convolution).
+    kernel_size: An integer or tuple/list of 3 integers, specifying the
+      depth, height and width of the 3D convolution window.
+      Can be a single integer to specify the same value for all spatial
+      dimensions.
+    strides: An integer or tuple/list of 3 integers, specifying the strides
+      of the convolution along the depth, height and width.
+      Can be a single integer to specify the same value for all spatial
+      dimensions.
+    padding: One of `"valid"` or `"same"` (case-insensitive).
+    data_format: A string, one of `channels_last` (default) or `channels_first`.
+      The ordering of the dimensions in the inputs.
+      `channels_last` corresponds to inputs with shape
+      `(batch, depth, height, width, channels)` while `channels_first`
+      corresponds to inputs with shape
+      `(batch, channels, depth, height, width)`.
+    activation: Activation function. Set it to `None` to maintain a
+      linear activation.
+    use_bias: Boolean, whether the layer uses a bias.
+    kernel_initializer: An initializer for the convolution kernel.
+    bias_initializer: An initializer for the bias vector. If `None`, then no
+      bias will be applied.
+    kernel_regularizer: Optional regularizer for the convolution kernel.
+    bias_regularizer: Optional regularizer for the bias vector.
+    activity_regularizer: Regularizer function for the output.
+    trainable: Boolean, if `True` also add variables to the graph collection
+      `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
+    name: A string, the name of the layer.
+  """
+
+  def __init__(self,
+               filters,
+               kernel_size,
+               strides=(1, 1, 1),
+               padding='valid',
+               data_format='channels_last',
+               activation=None,
+               use_bias=True,
+               kernel_initializer=None,
+               bias_initializer=init_ops.zeros_initializer(),
+               kernel_regularizer=None,
+               bias_regularizer=None,
+               activity_regularizer=None,
+               trainable=True,
+               name=None,
+               **kwargs):
+    super(Conv3DTranspose, self).__init__(
+        filters=filters,
+        kernel_size=kernel_size,
+        strides=strides,
+        padding=padding,
+        data_format=data_format,
+        activation=activation,
+        use_bias=use_bias,
+        kernel_initializer=kernel_initializer,
+        bias_initializer=bias_initializer,
+        kernel_regularizer=kernel_regularizer,
+        bias_regularizer=bias_regularizer,
+        activity_regularizer=activity_regularizer,
+        trainable=trainable,
+        name=name,
+        **kwargs)
+
+  def build(self, input_shape):
+    if len(input_shape) != 5:
+      raise ValueError('Inputs should have rank 5, received input shape: ' +
+                       str(input_shape))
+    if self.data_format == 'channels_first':
+      channel_axis = 1
+    else:
+      channel_axis = -1
+    if input_shape[channel_axis] is None:
+      raise ValueError('The channel dimension of the inputs '
+                       'should be defined, found None: ' + str(input_shape))
+    input_dim = input_shape[channel_axis]
+    kernel_shape = self.kernel_size + (self.filters, input_dim)
+
+    self.kernel = self.add_variable(
+        'kernel',
+        shape=kernel_shape,
+        initializer=self.kernel_initializer,
+        regularizer=self.kernel_regularizer,
+        trainable=True,
+        dtype=self.dtype)
+    if self.use_bias:
+      self.bias = self.add_variable(
+          'bias',
+          shape=(self.filters,),
+          initializer=self.bias_initializer,
+          regularizer=self.bias_regularizer,
+          trainable=True,
+          dtype=self.dtype)
+    else:
+      self.bias = None
+
+  def call(self, inputs):
+    inputs_shape = array_ops.shape(inputs)
+    batch_size = inputs_shape[0]
+    if self.data_format == 'channels_first':
+      c_axis, d_axis, h_axis, w_axis = 1, 2, 3, 4
+    else:
+      c_axis, d_axis, h_axis, w_axis = 4, 1, 2, 3
+
+    depth = inputs_shape[d_axis]
+    height = inputs_shape[h_axis]
+    width = inputs_shape[w_axis]
+
+    kernel_d, kernel_h, kernel_w = self.kernel_size
+    stride_d, stride_h, stride_w = self.strides
+
+    # Infer the dynamic output shape:
+    out_depth = utils.get_deconv_dim(depth, stride_d, kernel_d, self.padding)
+    out_height = utils.get_deconv_dim(height, stride_h, kernel_h, self.padding)
+    out_width = utils.get_deconv_dim(width, stride_w, kernel_w, self.padding)
+
+    if self.data_format == 'channels_first':
+      output_shape = (batch_size, self.filters, out_depth, out_height,
+                      out_width)
+      strides = (1, 1, stride_d, stride_h, stride_w)
+    else:
+      output_shape = (batch_size, out_depth, out_height, out_width,
+                      self.filters)
+      strides = (1, stride_d, stride_h, stride_w, 1)
+
+    output_shape_tensor = array_ops.stack(output_shape)
+    outputs = nn.conv3d_transpose(
+        inputs,
+        self.kernel,
+        output_shape_tensor,
+        strides,
+        data_format=utils.convert_data_format(self.data_format, ndim=5),
+        padding=self.padding.upper())
+
+    # Infer the static output shape:
+    out_shape = inputs.get_shape().as_list()
+    out_shape[c_axis] = self.filters
+    out_shape[d_axis] = utils.get_deconv_dim(out_shape[d_axis], stride_d,
+                                             kernel_d, self.padding)
+    out_shape[h_axis] = utils.get_deconv_dim(out_shape[h_axis], stride_h,
+                                             kernel_h, self.padding)
+    out_shape[w_axis] = utils.get_deconv_dim(out_shape[w_axis], stride_w,
+                                             kernel_w, self.padding)
+    outputs.set_shape(out_shape)
+
+    if self.bias is not None:
+      outputs_shape = outputs.shape.as_list()
+      if self.data_format == 'channels_first':
+        outputs_4d = array_ops.reshape(outputs, [
+            outputs_shape[0], outputs_shape[1],
+            outputs_shape[2] * outputs_shape[3], outputs_shape[4]
+        ])
+      else:
+        outputs_4d = array_ops.reshape(outputs, [
+            outputs_shape[0], outputs_shape[1] * outputs_shape[2],
+            outputs_shape[3], outputs_shape[4]
+        ])
+      outputs_4d = nn.bias_add(
+          outputs_4d,
+          self.bias,
+          data_format=utils.convert_data_format(self.data_format, ndim=4))
+      outputs = array_ops.reshape(outputs_4d, outputs_shape)
+
+    if self.activation is not None:
+      return self.activation(outputs)
+    return outputs
+
+
+def conv3d_transpose(inputs,
+                     filters,
+                     kernel_size,
+                     strides=(1, 1, 1),
+                     padding='valid',
+                     data_format='channels_last',
+                     activation=None,
+                     use_bias=True,
+                     kernel_initializer=None,
+                     bias_initializer=init_ops.zeros_initializer(),
+                     kernel_regularizer=None,
+                     bias_regularizer=None,
+                     activity_regularizer=None,
+                     trainable=True,
+                     name=None,
+                     reuse=None):
+  """Functional interface for transposed 3D convolution layer.
+
+  Arguments:
+    inputs: Input tensor.
+    filters: Integer, the dimensionality of the output space (i.e. the number
+      of filters in the convolution).
+    kernel_size: A tuple or list of 3 positive integers specifying the spatial
+      dimensions of the filters. Can be a single integer to specify the same
+      value for all spatial dimensions.
+    strides: A tuple or list of 3 positive integers specifying the strides
+      of the convolution. Can be a single integer to specify the same value for
+      all spatial dimensions.
+    padding: One of `"valid"` or `"same"` (case-insensitive).
+    data_format: A string, one of `channels_last` (default) or `channels_first`.
+      The ordering of the dimensions in the inputs.
+      `channels_last` corresponds to inputs with shape
+      `(batch, depth, height, width, channels)` while `channels_first`
+      corresponds to inputs with shape
+      `(batch, channels, depth, height, width)`.
+    activation: Activation function. Set it to None to maintain a
+      linear activation.
+    use_bias: Boolean, whether the layer uses a bias.
+    kernel_initializer: An initializer for the convolution kernel.
+    bias_initializer: An initializer for the bias vector. If None, no bias will
+      be applied.
+    kernel_regularizer: Optional regularizer for the convolution kernel.
+    bias_regularizer: Optional regularizer for the bias vector.
+    activity_regularizer: Regularizer function for the output.
+    trainable: Boolean, if `True` also add variables to the graph collection
+      `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
+    name: A string, the name of the layer.
+    reuse: Boolean, whether to reuse the weights of a previous layer
+      by the same name.
+
+  Returns:
+    Output tensor.
+  """
+  layer = Conv3DTranspose(
+      filters=filters,
+      kernel_size=kernel_size,
+      strides=strides,
+      padding=padding,
+      data_format=data_format,
+      activation=activation,
+      use_bias=use_bias,
+      kernel_initializer=kernel_initializer,
+      bias_initializer=bias_initializer,
+      kernel_regularizer=kernel_regularizer,
+      bias_regularizer=bias_regularizer,
+      activity_regularizer=activity_regularizer,
+      trainable=trainable,
+      name=name,
+      _reuse=reuse,
+      _scope=name)
+  return layer.apply(inputs)
+
+
 # Aliases

 Convolution1D = Conv1D
@ -1222,8 +1456,10 @@ Convolution2D = Conv2D
 Convolution3D = Conv3D
 SeparableConvolution2D = SeparableConv2D
 Convolution2DTranspose = Deconvolution2D = Deconv2D = Conv2DTranspose
+Convolution3DTranspose = Deconvolution3D = Deconv3D = Conv3DTranspose
 convolution1d = conv1d
 convolution2d = conv2d
 convolution3d = conv3d
 separable_convolution2d = separable_conv2d
 convolution2d_transpose = deconvolution2d = deconv2d = conv2d_transpose
+convolution3d_transpose = deconvolution3d = deconv3d = conv3d_transpose
--- a/tensorflow/python/layers/convolutional_test.py
+++ b/tensorflow/python/layers/convolutional_test.py
@ -651,5 +651,174 @@ class Conv2DTransposeTest(test.TestCase):
    self.assertEqual(len(variables.trainable_variables()), 4)


+class Conv3DTransposeTest(test.TestCase):
+
+  def testInvalidDataFormat(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+    with self.assertRaisesRegexp(ValueError, 'data_format'):
+      conv_layers.conv3d_transpose(volumes, 4, 3, data_format='invalid')
+
+  def testInvalidStrides(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+    with self.assertRaisesRegexp(ValueError, 'strides'):
+      conv_layers.conv3d_transpose(volumes, 4, 3, strides=(1, 2))
+
+    with self.assertRaisesRegexp(ValueError, 'strides'):
+      conv_layers.conv3d_transpose(volumes, 4, 3, strides=None)
+
+  def testInvalidKernelSize(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+    with self.assertRaisesRegexp(ValueError, 'kernel_size'):
+      conv_layers.conv3d_transpose(volumes, 4, (1, 2))
+
+    with self.assertRaisesRegexp(ValueError, 'kernel_size'):
+      conv_layers.conv3d_transpose(volumes, 4, None)
+
+  def testCreateConv3DTranspose(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32))
+    layer = conv_layers.Conv3DTranspose(4, [3, 3, 3], activation=nn_ops.relu)
+    output = layer.apply(volumes)
+    self.assertEqual(output.op.name, 'conv3d_transpose/Relu')
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth + 2, height + 2, width + 2, 4])
+    self.assertListEqual(layer.kernel.get_shape().as_list(), [3, 3, 3, 4, 32])
+    self.assertListEqual(layer.bias.get_shape().as_list(), [4])
+
+  def testCreateConv3DTransposeIntegerKernelSize(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32))
+    layer = conv_layers.Conv3DTranspose(4, 3)
+    output = layer.apply(volumes)
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth + 2, height + 2, width + 2, 4])
+    self.assertListEqual(layer.kernel.get_shape().as_list(), [3, 3, 3, 4, 32])
+    self.assertListEqual(layer.bias.get_shape().as_list(), [4])
+
+  def testCreateConv3DTransposeChannelsFirst(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, 32, depth, height, width))
+    layer = conv_layers.Conv3DTranspose(
+        4, [3, 3, 3], data_format='channels_first')
+    output = layer.apply(volumes)
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, 4, depth + 2, height + 2, width + 2])
+    self.assertListEqual(layer.kernel.get_shape().as_list(), [3, 3, 3, 4, 32])
+    self.assertListEqual(layer.bias.get_shape().as_list(), [4])
+
+  def testConv3DTransposePaddingSame(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 64), seed=1)
+    layer = conv_layers.Conv3DTranspose(
+        32, volumes.get_shape()[1:4], padding='same')
+    output = layer.apply(volumes)
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth, height, width, 32])
+
+  def testCreateConv3DTransposeWithStrides(self):
+    depth, height, width = 4, 6, 8
+    # Test strides tuple.
+    volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+    layer = conv_layers.Conv3DTranspose(
+        4, [3, 3, 3], strides=(2, 2, 2), padding='same')
+    output = layer.apply(volumes)
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth * 2, height * 2, width * 2, 4])
+
+    # Test strides integer.
+    layer = conv_layers.Conv3DTranspose(4, [3, 3, 3], strides=2, padding='same')
+    output = layer.apply(volumes)
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth * 2, height * 2, width * 2, 4])
+
+    # Test unequal strides.
+    layer = conv_layers.Conv3DTranspose(
+        4, [3, 3, 3], strides=(2, 1, 1), padding='same')
+    output = layer.apply(volumes)
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth * 2, height, width, 4])
+
+  def testConv3DTransposeKernelRegularizer(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32))
+    reg = lambda x: 0.1 * math_ops.reduce_sum(x)
+    layer = conv_layers.Conv3DTranspose(4, [3, 3, 3], kernel_regularizer=reg)
+    layer.apply(volumes)
+    loss_keys = ops.get_collection(ops.GraphKeys.REGULARIZATION_LOSSES)
+    self.assertEqual(len(loss_keys), 1)
+    self.assertListEqual(layer.losses, loss_keys)
+
+  def testConv3DTransposeBiasRegularizer(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32))
+    reg = lambda x: 0.1 * math_ops.reduce_sum(x)
+    layer = conv_layers.Conv3DTranspose(4, [3, 3, 3], bias_regularizer=reg)
+    layer.apply(volumes)
+    loss_keys = ops.get_collection(ops.GraphKeys.REGULARIZATION_LOSSES)
+    self.assertEqual(len(loss_keys), 1)
+    self.assertListEqual(layer.losses, loss_keys)
+
+  def testConv3DTransposeNoBias(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32))
+    layer = conv_layers.Conv3DTranspose(
+        4, [3, 3, 3], activation=nn_ops.relu, use_bias=False)
+    output = layer.apply(volumes)
+    self.assertEqual(output.op.name, 'conv3d_transpose/Relu')
+    self.assertListEqual(output.get_shape().as_list(),
+                         [5, depth + 2, height + 2, width + 2, 4])
+    self.assertListEqual(layer.kernel.get_shape().as_list(), [3, 3, 3, 4, 32])
+    self.assertEqual(layer.bias, None)
+
+  def testFunctionalConv3DTransposeReuse(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+    conv_layers.conv3d_transpose(volumes, 4, [3, 3, 3], name='deconv1')
+    self.assertEqual(len(variables.trainable_variables()), 2)
+    conv_layers.conv3d_transpose(
+        volumes, 4, [3, 3, 3], name='deconv1', reuse=True)
+    self.assertEqual(len(variables.trainable_variables()), 2)
+
+  def testFunctionalConv3DTransposeReuseFromScope(self):
+    with variable_scope.variable_scope('scope'):
+      depth, height, width = 5, 7, 9
+      volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+      conv_layers.conv3d_transpose(volumes, 4, [3, 3, 3], name='deconv1')
+      self.assertEqual(len(variables.trainable_variables()), 2)
+    with variable_scope.variable_scope('scope', reuse=True):
+      conv_layers.conv3d_transpose(volumes, 4, [3, 3, 3], name='deconv1')
+      self.assertEqual(len(variables.trainable_variables()), 2)
+
+  def testFunctionalConv3DTransposeInitializerFromScope(self):
+    with self.test_session() as sess:
+      with variable_scope.variable_scope(
+          'scope', initializer=init_ops.ones_initializer()):
+        depth, height, width = 5, 7, 9
+        volumes = random_ops.random_uniform(
+            (5, depth, height, width, 32), seed=1)
+        conv_layers.conv3d_transpose(volumes, 4, [3, 3, 3], name='deconv1')
+        weights = variables.trainable_variables()
+        # Check the names of weights in order.
+        self.assertTrue('kernel' in weights[0].name)
+        self.assertTrue('bias' in weights[1].name)
+        sess.run(variables.global_variables_initializer())
+        weights = sess.run(weights)
+        # Check that the kernel weights got initialized to ones (from scope)
+        self.assertAllClose(weights[0], np.ones((3, 3, 3, 4, 32)))
+        # Check that the bias still got initialized to zeros.
+        self.assertAllClose(weights[1], np.zeros((4,)))
+
+  def testFunctionalConv3DTransposeNoReuse(self):
+    depth, height, width = 5, 7, 9
+    volumes = random_ops.random_uniform((5, depth, height, width, 32), seed=1)
+    conv_layers.conv3d_transpose(volumes, 4, [3, 3, 3])
+    self.assertEqual(len(variables.trainable_variables()), 2)
+    conv_layers.conv3d_transpose(volumes, 4, [3, 3, 3])
+    self.assertEqual(len(variables.trainable_variables()), 4)
+
+
 if __name__ == '__main__':
  test.main()
--- a/tensorflow/python/layers/layers.py
+++ b/tensorflow/python/layers/layers.py
@ -23,6 +23,7 @@
@@conv3d
@@separable_conv2d
@@conv2d_transpose
+@@conv3d_transpose
@@average_pooling1d
@@max_pooling1d
@@average_pooling2d
@ -50,6 +51,7 @@ from tensorflow.python.layers.convolutional import conv2d
 from tensorflow.python.layers.convolutional import conv3d
 from tensorflow.python.layers.convolutional import separable_conv2d
 from tensorflow.python.layers.convolutional import conv2d_transpose
+from tensorflow.python.layers.convolutional import conv3d_transpose

 # Pooling layers.
 from tensorflow.python.layers.pooling import average_pooling1d
--- a/tensorflow/python/layers/utils.py
+++ b/tensorflow/python/layers/utils.py
@ -26,6 +26,7 @@ import numpy as np

 from tensorflow.python.ops import variables
 from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import math_ops
 from tensorflow.python.framework import ops
 from tensorflow.python.framework import tensor_util

@ -164,3 +165,28 @@ def constant_value(pred):
  else:
    raise TypeError('`pred` must be a Tensor, a Variable, or a Python bool.')
  return pred_value
+
+
+def get_deconv_dim(dim_size, stride_size, kernel_size, padding):
+  """Return output dimension of a deconv layer, based on input dimension.
+
+  Arguments:
+    dim_size: An int representing size of dimension, can be height, width
+      or depth.
+    stride_size: An int representing the stride of deconvolution filters
+      along the same dimension.
+    kernel_size: An int representing size of deconv kernel (filter) along
+      the same dimension.
+    padding: one of `"valid"` or `"same"` (case-insensitive).
+
+  Returns:
+    The size of the output dimension of the layer: an int, a `Tensor` if
+    `dim_size` is a `Tensor`, or `None` if `dim_size` is `None`.
+  """
+  if isinstance(dim_size, ops.Tensor):
+    dim_size = math_ops.multiply(dim_size, stride_size)
+  elif dim_size is not None:
+    dim_size *= stride_size
+
+  if padding == 'valid' and dim_size is not None:
+    dim_size += max(kernel_size - stride_size, 0)
+  return dim_size
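As a rough illustration of the arithmetic in `get_deconv_dim`, the sketch below re-implements only the integer case in plain Python and applies it per spatial axis to predict a `channels_last` output shape. The names `deconv_dim` and `conv3d_transpose_shape` are hypothetical helpers written for this note, not part of TensorFlow, and the `Tensor` branch is deliberately left out.

```python
def deconv_dim(dim_size, stride_size, kernel_size, padding):
    """Integer-only sketch of the output-size rule (None stays None)."""
    if dim_size is None:
        return None
    dim_size *= stride_size
    if padding == 'valid':
        # 'valid' padding adds the part of the kernel that overhangs the stride.
        dim_size += max(kernel_size - stride_size, 0)
    return dim_size


def conv3d_transpose_shape(input_shape, filters, kernel_size, strides, padding):
    """Predicts a channels_last output shape (hypothetical helper)."""
    batch, depth, height, width, _ = input_shape
    spatial = [
        deconv_dim(d, s, k, padding)
        for d, k, s in zip((depth, height, width), kernel_size, strides)
    ]
    return [batch] + spatial + [filters]


print(deconv_dim(28, 2, 5, 'valid'))  # 28 * 2 + max(5 - 2, 0) = 59
print(conv3d_transpose_shape((5, 5, 7, 9, 32), 4, (3, 3, 3), (1, 1, 1),
                             'valid'))  # [5, 7, 9, 11, 4]
```

The second call matches the `depth + 2, height + 2, width + 2` expectation in `testCreateConv3DTranspose`.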
--- a/tensorflow/python/layers/utils_test.py
+++ b/tensorflow/python/layers/utils_test.py
@ -62,6 +62,13 @@ class ConvUtilsTest(test.TestCase):
    with self.assertRaises(ValueError):
      utils.normalize_padding('invalid')

+  def testGetDeconvDim(self):
+    self.assertEqual(utils.get_deconv_dim(30, 1, 3, 'valid'), 32)
+    self.assertEqual(utils.get_deconv_dim(28, 1, 5, 'valid'), 32)
+    self.assertEqual(utils.get_deconv_dim(28, 2, 5, 'valid'), 59)
+    self.assertEqual(utils.get_deconv_dim(32, 1, 3, 'same'), 32)
+    self.assertEqual(utils.get_deconv_dim(32, 1, 5, 'same'), 32)
+    self.assertEqual(utils.get_deconv_dim(32, 2, 5, 'same'), 64)

 if __name__ == '__main__':
  test.main()
--- a/tensorflow/python/ops/array_ops.py
+++ b/tensorflow/python/ops/array_ops.py
@ -470,7 +470,10 @@ def _SliceHelper(tensor, slice_spec, var=None):
    else:
      begin.append(s)
      end.append(s + 1)
-      strides.append(1)
+      if isinstance(s, ops.Tensor):
+        strides.append(constant(1, s.dtype))
+      else:
+        strides.append(np.ones_like(s).dtype.type(1))
      shrink_axis_mask |= (1 << index)
    index += 1

--- a/tensorflow/python/ops/ctc_ops.py
+++ b/tensorflow/python/ops/ctc_ops.py
@ -197,7 +197,7 @@ def ctc_greedy_decoder(inputs, sequence_length, merge_repeated=True):
    merge_repeated: Boolean.  Default: True.

  Returns:
-    A tuple `(decoded, log_probabilities)` where
+    A tuple `(decoded, neg_sum_logits)` where
    decoded: A single-element list. `decoded[0]`
      is a `SparseTensor` containing the decoded outputs s.t.:
      `decoded.indices`: Indices matrix `(total_decoded_outputs x 2)`.
@ -206,8 +206,9 @@ def ctc_greedy_decoder(inputs, sequence_length, merge_repeated=True):
        The vector stores the decoded classes.
      `decoded.shape`: Shape vector, size `(2)`.
        The shape values are: `[batch_size, max_decoded_length]`
-    log_probability: A `float` matrix `(batch_size x 1)` containing sequence
-        log-probabilities.
+    neg_sum_logits: A `float` matrix `(batch_size x 1)` containing, for the
+        sequence found, the negative of the sum of the greatest logit at each
+        timeframe.
  """
  outputs = gen_ctc_ops._ctc_greedy_decoder(
      inputs, sequence_length, merge_repeated=merge_repeated)
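To make the renamed `neg_sum_logits` output concrete, here is a minimal plain-Python sketch of the quantity the updated docstring describes, for a single sequence. The helper name `neg_sum_of_max_logits` is hypothetical, and batching, `sequence_length`, and blank/repeat handling are all ignored here.

```python
def neg_sum_of_max_logits(logits_per_step):
    """logits_per_step: one list of class logits per timeframe."""
    # Take the greatest logit at each timeframe, sum them, and negate.
    return -sum(max(step) for step in logits_per_step)


sequence = [[0.5, 2.0, -1.0], [1.5, 0.0, 0.25]]
print(neg_sum_of_max_logits(sequence))  # -(2.0 + 1.5) = -3.5
```

Because raw logits are summed without normalization, this value is in general not a log-probability, which is what the rename reflects.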
--- a/tensorflow/python/ops/init_ops.py
+++ b/tensorflow/python/ops/init_ops.py
@ -39,6 +39,7 @@ from tensorflow.python.framework import constant_op
 from tensorflow.python.framework import dtypes
 from tensorflow.python.ops import array_ops
 from tensorflow.python.ops import linalg_ops
+from tensorflow.python.ops import math_ops
 from tensorflow.python.ops import random_ops


@ -487,16 +488,18 @@ class Orthogonal(Initializer):
    flat_shape = (num_rows, num_cols)

    # Generate a random matrix
-    a = random_ops.random_uniform(flat_shape, dtype=dtype, seed=self.seed)
-    # Compute the svd
-    _, u, v = linalg_ops.svd(a, full_matrices=False)
-    # Pick the appropriate singular value decomposition
-    if num_rows > num_cols:
-      q = u
-    else:
-      # Tensorflow departs from numpy conventions
-      # such that we need to transpose axes here
-      q = array_ops.transpose(v)
+    a = random_ops.random_normal(flat_shape, dtype=dtype, seed=self.seed)
+    # Compute the qr factorization
+    q, r = linalg_ops.qr(a, full_matrices=False)
+    # Make Q uniform
+    square_len = math_ops.minimum(num_rows, num_cols)
+    d = array_ops.diag_part(r[:square_len, :square_len])
+    ph = d / math_ops.abs(d)
+    q *= ph
+    # Pad zeros to Q (if rows smaller than cols)
+    if num_rows < num_cols:
+      padding = array_ops.zeros([num_rows, num_cols - num_rows], dtype=dtype)
+      q = array_ops.concat([q, padding], 1)
    return self.gain * array_ops.reshape(q, shape)

  def get_config(self):
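The QR-based construction in the `Orthogonal` hunk above can be sketched in NumPy as follows. This is an approximation for illustration only: `orthogonal_sketch` is a hypothetical name, NumPy's reduced QR stands in for `linalg_ops.qr(..., full_matrices=False)`, and the final reshape and `gain` scaling are omitted.

```python
import numpy as np


def orthogonal_sketch(num_rows, num_cols, seed=0):
    rng = np.random.RandomState(seed)
    a = rng.normal(size=(num_rows, num_cols))
    # Reduced QR: q is (rows, k) and r is (k, cols), with k = min(rows, cols).
    q, r = np.linalg.qr(a)
    # Flip column signs of Q using the signs of R's diagonal ("make Q uniform").
    d = np.diag(r)
    q = q * (d / np.abs(d))
    # Pad zeros to Q when there are fewer rows than columns.
    if num_rows < num_cols:
        padding = np.zeros((num_rows, num_cols - num_rows))
        q = np.concatenate([q, padding], axis=1)
    return q


tall = orthogonal_sketch(4, 3)
print(np.allclose(tall.T @ tall, np.eye(3)))  # columns are orthonormal
```

For a wide shape such as `(3, 5)`, the rows are orthonormal and the trailing columns are zeros.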
--- a/tensorflow/python/ops/logging_ops.py
+++ b/tensorflow/python/ops/logging_ops.py
@ -124,7 +124,7 @@ def image_summary(tag, tensor, max_images=3, collections=None, name=None):
  """Outputs a `Summary` protocol buffer with images.

  For an explanation of why this op was deprecated, and information on how to
-  migrate, look ['here'](https://www.tensorflow.org/code/tensorflow/contrib/deprecated/__init__.py)
+  migrate, look ['here'](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/deprecated/__init__.py)

  The summary has up to `max_images` summary values containing images. The
  images are built from `tensor` which must be 4-D with shape `[batch_size,
--- a/tensorflow/python/ops/math_grad.py
+++ b/tensorflow/python/ops/math_grad.py
@ -613,6 +613,16 @@ def _AtanGrad(op, grad):
    return grad * inv


+@ops.RegisterGradient("Atan2")
+def _Atan2Grad(op, grad):
+  """Returns grad * x / (x^2 + y^2), grad * -y / (x^2 + y^2)."""
+  y = op.inputs[0]
+  x = op.inputs[1]
+  with ops.control_dependencies([grad.op]):
+    grad_inv = grad / (math_ops.square(x) + math_ops.square(y))
+    return x * grad_inv, -y * grad_inv
+
+
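The formula in `_Atan2Grad` can be sanity-checked against central finite differences using only the standard library. The function names below are hypothetical and scalar-only; the real gradient operates on tensors.

```python
import math


def atan2_grad(y, x, grad=1.0):
    """Scalar version of the formula: returns (d/dy, d/dx) scaled by grad."""
    grad_inv = grad / (x * x + y * y)
    return x * grad_inv, -y * grad_inv


def numeric_grad(y, x, eps=1e-6):
    """Central finite differences of math.atan2 with respect to y and x."""
    dy = (math.atan2(y + eps, x) - math.atan2(y - eps, x)) / (2 * eps)
    dx = (math.atan2(y, x + eps) - math.atan2(y, x - eps)) / (2 * eps)
    return dy, dx


analytic = atan2_grad(1.0, 2.0)   # (2/5, -1/5)
numeric = numeric_grad(1.0, 2.0)
print(all(abs(a - n) < 1e-6 for a, n in zip(analytic, numeric)))
```

Note the argument order: as in the op, the first input is `y` and the second is `x`.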
@ops.RegisterGradient("AddN")
 def _AddNGrad(op, grad):
  """Copies the gradient to all inputs."""
--- a/tensorflow/python/ops/math_ops.py
+++ b/tensorflow/python/ops/math_ops.py
@ -56,6 +56,7 @@ See the @{$python/math_ops} guide.
@@acos
@@asin
@@atan
+@@atan2
@@lgamma
@@digamma
@@erf
@ -824,7 +825,16 @@ def _OverrideBinaryOperatorHelper(func, op_name, clazz_object=ops.Tensor):
  def binary_op_wrapper(x, y):
    with ops.name_scope(None, op_name, [x, y]) as name:
      if not isinstance(y, sparse_tensor.SparseTensor):
-        y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
+        try:
+          y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
+        except TypeError:
+          # If the RHS is not a tensor, it might be a tensor aware object
+          # that can implement the operator with knowledge of itself
+          # and the tensor.
+          if hasattr(type(y), "__r%s__" % op_name):
+            return NotImplemented
+          else:
+            raise
      return func(x, y, name=name)

  def binary_op_wrapper_sparse(sp_x, y):
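The `NotImplemented` fallback added to `binary_op_wrapper` relies on Python's reflected-operator protocol. The toy classes below (`FakeTensor` and `TensorAware`, both hypothetical stand-ins) sketch the same pattern for `+` only, with `float()` standing in for `ops.convert_to_tensor`.

```python
class FakeTensor(object):
    """Stand-in for a tensor whose operator first tries to convert the RHS."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        try:
            other_value = float(other)  # plays the role of convert_to_tensor
        except TypeError:
            # Defer to the RHS if it knows how to handle the reflected op.
            if hasattr(type(other), "__radd__"):
                return NotImplemented
            raise
        return FakeTensor(self.value + other_value)


class TensorAware(object):
    """An object that cannot be converted but implements __radd__ itself."""

    def __radd__(self, lhs):
        return ("handled by TensorAware", lhs.value)


print(FakeTensor(1.0) + TensorAware())  # ('handled by TensorAware', 1.0)
```

Returning `NotImplemented` makes Python call `TensorAware.__radd__`. Objects that are neither convertible nor tensor-aware still raise the original `TypeError`.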
--- a/tensorflow/python/ops/nn_impl.py
+++ b/tensorflow/python/ops/nn_impl.py
@ -301,9 +301,8 @@ def zero_fraction(value, name=None):
  This is useful in summaries to measure and report sparsity.  For example,

  ```python
-      z = tf.Relu(...)
-      summ = tf.contrib.deprecated.scalar_summary('sparsity',
-      tf.nn.zero_fraction(z))
+      z = tf.nn.relu(...)
+      summ = tf.summary.scalar('sparsity', tf.nn.zero_fraction(z))
  ```

  Args:
--- a/tensorflow/python/ops/nn_ops.py
+++ b/tensorflow/python/ops/nn_ops.py
@ -840,6 +840,11 @@ def pool(input,  # pylint: disable=redefined-builtin
 def atrous_conv2d(value, filters, rate, padding, name=None):
  """Atrous convolution (a.k.a. convolution with holes or dilated convolution).

+  This function is a simpler wrapper around the more general
+  @{tf.nn.convolution}, and exists only for backwards compatibility. You can
+  use @{tf.nn.convolution} to perform 1-D, 2-D, or 3-D atrous convolution.
+
+
  Computes a 2-D atrous convolution, also known as convolution with holes or
  dilated convolution, given 4-D `value` and `filters` tensors. If the `rate`
  parameter is equal to one, it performs regular 2-D convolution. If the `rate`
@ -959,93 +964,12 @@ def atrous_conv2d(value, filters, rate, padding, name=None):
    ValueError: If input/output depth does not match `filters`' shape, or if
      padding is other than `'VALID'` or `'SAME'`.
  """
-  with ops.name_scope(name, "atrous_conv2d", [value, filters]) as name:
-    value = ops.convert_to_tensor(value, name="value")
-    filters = ops.convert_to_tensor(filters, name="filters")
-    if not value.get_shape()[3].is_compatible_with(filters.get_shape()[2]):
-      raise ValueError(
-          "value's input channels does not match filters' input channels, "
-          "{} != {}".format(value.get_shape()[3], filters.get_shape()[2]))
-    if rate < 1:
-      raise ValueError("rate {} cannot be less than one".format(rate))
-
-    if rate == 1:
-      value = gen_nn_ops.conv2d(input=value,
-                                filter=filters,
-                                strides=[1, 1, 1, 1],
-                                padding=padding)
-      return value
-
-    # We have two padding contributions. The first is used for converting "SAME"
-    # to "VALID". The second is required so that the height and width of the
-    # zero-padded value tensor are multiples of rate.
-
-    # Padding required to reduce to "VALID" convolution
-    if padding == "SAME":
-      # Handle filters whose shape is unknown during graph creation.
-      if filters.get_shape().is_fully_defined():
-        filter_shape = filters.get_shape().as_list()
-      else:
-        filter_shape = array_ops.shape(filters)
-      filter_height, filter_width = filter_shape[0], filter_shape[1]
-
-      # Spatial dimensions of the filters and the upsampled filters in which we
-      # introduce (rate - 1) zeros between consecutive filter values.
-      filter_height_up = filter_height + (filter_height - 1) * (rate - 1)
-      filter_width_up = filter_width + (filter_width - 1) * (rate - 1)
-
-      pad_height = filter_height_up - 1
-      pad_width = filter_width_up - 1
-
-      # When pad_height (pad_width) is odd, we pad more to bottom (right),
-      # following the same convention as conv2d().
-      pad_top = pad_height // 2
-      pad_bottom = pad_height - pad_top
-      pad_left = pad_width // 2
-      pad_right = pad_width - pad_left
-    elif padding == "VALID":
-      pad_top = 0
-      pad_bottom = 0
-      pad_left = 0
-      pad_right = 0
-    else:
-      raise ValueError("Invalid padding")
-
-    # Handle input whose shape is unknown during graph creation.
-    if value.get_shape().is_fully_defined():
-      value_shape = value.get_shape().as_list()
-    else:
-      value_shape = array_ops.shape(value)
-
-    in_height = value_shape[1] + pad_top + pad_bottom
-    in_width = value_shape[2] + pad_left + pad_right
-
-    # More padding so that rate divides the height and width of the input.
-    pad_bottom_extra = (rate - in_height % rate) % rate
-    pad_right_extra = (rate - in_width % rate) % rate
-
-    # The paddings argument to space_to_batch includes both padding components.
-    space_to_batch_pad = [[pad_top, pad_bottom + pad_bottom_extra],
-                          [pad_left, pad_right + pad_right_extra]]
-
-    value = array_ops.space_to_batch(input=value,
-                                     paddings=space_to_batch_pad,
-                                     block_size=rate)
-
-    value = gen_nn_ops.conv2d(input=value,
-                              filter=filters,
-                              strides=[1, 1, 1, 1],
-                              padding="VALID",
-                              name=name)
-
-    # The crops argument to batch_to_space is just the extra padding component.
-    batch_to_space_crop = [[0, pad_bottom_extra], [0, pad_right_extra]]
-
-    value = array_ops.batch_to_space(input=value,
-                                     crops=batch_to_space_crop,
-                                     block_size=rate)
-
-    return value
+  return convolution(
+      input=value,
+      filter=filters,
+      padding=padding,
+      dilation_rate=np.broadcast_to(rate, (2,)),
+      name=name)


 def conv2d_transpose(value,
@ -1272,7 +1196,7 @@ def conv3d_transpose(value,
                     output_shape,
                     strides,
                     padding="SAME",
-                     data_format=None,
+                     data_format="NDHWC",
                     name=None):
  """The transpose of `conv3d`.

@ -1308,10 +1232,11 @@ def conv3d_transpose(value,
                      [value, filter, output_shape]) as name:
    value = ops.convert_to_tensor(value, name="value")
    filter = ops.convert_to_tensor(filter, name="filter")
-    if not value.get_shape()[4].is_compatible_with(filter.get_shape()[4]):
+    axis = 1 if data_format == "NCDHW" else 4
+    if not value.get_shape()[axis].is_compatible_with(filter.get_shape()[4]):
      raise ValueError("input channels does not match filter's input channels, "
-                       "{} != {}".format(value.get_shape()[4], filter.get_shape(
-                       )[4]))
+                       "{} != {}".format(value.get_shape()[axis],
+                                         filter.get_shape()[4]))

    output_shape_ = ops.convert_to_tensor(output_shape, name="output_shape")
    if not output_shape_.get_shape().is_compatible_with(tensor_shape.vector(5)):
--- a/tensorflow/python/ops/parsing_ops.py
+++ b/tensorflow/python/ops/parsing_ops.py
@ -845,7 +845,7 @@ def parse_single_sequence_example(
  Parses a single serialized [`SequenceExample`](https://www.tensorflow.org/code/tensorflow/core/example/example.proto)
  proto given in `serialized`.

-  This op parses a serialize sequence example into a tuple of dictionaries
+  This op parses a serialized sequence example into a tuple of dictionaries
  mapping keys to `Tensor` and `SparseTensor` objects respectively.
  The first dictionary contains mappings for keys appearing in
  `context_features`, and the second dictionary contains mappings for keys
--- a/tensorflow/python/ops/random_ops.py
+++ b/tensorflow/python/ops/random_ops.py
@ -324,7 +324,7 @@ def multinomial(logits, num_samples, seed=None, name=None):

  Args:
    logits: 2-D Tensor with shape `[batch_size, num_classes]`.  Each slice
-      `[i, :]` represents the unnormalized log probabilities for all classes.
+      `[i, :]` represents the log-odds for all classes.
    num_samples: 0-D.  Number of independent samples to draw for each row slice.
    seed: A Python integer. Used to create a random seed for the distribution.
      See
--- a/tensorflow/python/summary/writer/writer_cache.py
+++ b/tensorflow/python/summary/writer/writer_cache.py
@ -39,6 +39,10 @@ class FileWriterCache(object):
  def clear():
    """Clear cached summary writers. Currently only used for unit tests."""
    with FileWriterCache._lock:
+      # Make sure all the writers are closed now (otherwise open file handles
+      # may hang around, blocking deletions on Windows).
+      for item in FileWriterCache._cache.values():
+        item.close()
      FileWriterCache._cache = {}

  @staticmethod
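
Note on the hunk above: `clear()` now closes each cached writer before replacing the cache, so no file handles are left open. A minimal sketch of the same pattern follows. `DummyWriter` and `WriterCache` are hypothetical stand-ins for illustration, not the real `FileWriter` or `FileWriterCache`.

```python
import threading


class DummyWriter(object):
  """Hypothetical stand-in for a writer that owns an open file handle."""

  def __init__(self):
    self.closed = False

  def close(self):
    self.closed = True


class WriterCache(object):
  """Illustrative cache keyed by log directory."""
  _cache = {}
  _lock = threading.RLock()

  @staticmethod
  def get(logdir):
    with WriterCache._lock:
      if logdir not in WriterCache._cache:
        WriterCache._cache[logdir] = DummyWriter()
      return WriterCache._cache[logdir]

  @staticmethod
  def clear():
    with WriterCache._lock:
      # Close every writer before dropping the references to it.
      for writer in WriterCache._cache.values():
        writer.close()
      WriterCache._cache = {}


writer = WriterCache.get("/tmp/logs")
WriterCache.clear()
```

Replacing the dictionary alone would leave closing to garbage collection, which is what the added comment says can block deletions on Windows.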
--- a/tensorflow/python/tools/import_pb_to_tensorboard.py
+++ b/tensorflow/python/tools/import_pb_to_tensorboard.py
@ -0,0 +1,50 @@
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ================================
+"""Imports a protobuf model as a graph in Tensorboard."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.core.framework import graph_pb2
+from tensorflow.python.client import session
+from tensorflow.python.framework import importer
+from tensorflow.python.framework import ops
+from tensorflow.python.platform import gfile
+from tensorflow.python.summary import summary
+
+
+def import_to_tensorboard(model_dir, log_dir):
+  """View an imported protobuf model (`.pb` file) as a graph in Tensorboard.
+
+  Args:
+    model_dir: The location of the protobuf (`pb`) model to visualize
+    log_dir: The location for the Tensorboard log to begin visualisation from.
+
+  Usage:
+    Call this function with your model location and desired log directory.
+    Launch Tensorboard by pointing it to the log directory.
+    View your imported `.pb` model as a graph.
+  """
+  with session.Session(graph=ops.Graph()) as sess:
+    with gfile.FastGFile(model_dir, "rb") as f:
+      graph_def = graph_pb2.GraphDef()
+      graph_def.ParseFromString(f.read())
+      importer.import_graph_def(graph_def)
+
+    pb_visual_writer = summary.FileWriter(log_dir)
+    pb_visual_writer.add_graph(sess.graph)
+    print("Model Imported. Visualize by running: "
+          "> tensorboard --logdir={}".format(log_dir))
--- a/tensorflow/python/training/checkpoint_utils.py
+++ b/tensorflow/python/training/checkpoint_utils.py
@ -59,14 +59,14 @@ def load_checkpoint(ckpt_dir_or_file):


 def load_variable(ckpt_dir_or_file, name):
-  """Returns a tensor with the contents of the given variable in the checkpoint.
+  """Returns the tensor value of the given variable in the checkpoint.

  Args:
    ckpt_dir_or_file: Directory with checkpoints file or path to checkpoint.
-    name: Name of the tensor to return.
+    name: Name of the variable to return.

  Returns:
-    `Tensor` object.
+    A numpy `ndarray` with a copy of the value of this variable.
  """
  # TODO(b/29227106): Fix this in the right place and remove this.
  if name.endswith(":0"):
@ -210,9 +210,8 @@ def init_from_checkpoint(ckpt_dir_or_file, assignment_map):
      else:
        var_name = ",".join([v.name for v in var])
      _set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
-      logging.info("Initialize variable %s from checkpoint %s with %s" % (
-          var_name, ckpt_dir_or_file, tensor_name_in_ckpt
-      ))
+      logging.info("Initialize variable %s from checkpoint %s with %s",
+                   var_name, ckpt_dir_or_file, tensor_name_in_ckpt)
    else:
      scopes = ""
      # TODO(vihanjain): Support list of 'current_var_or_name' here.
@ -250,9 +249,8 @@ def init_from_checkpoint(ckpt_dir_or_file, assignment_map):
        if var is None:
          var = _collect_partitioned_variable(var_name, store_vars)
        _set_variable_or_list_initializer(var, ckpt_file, full_tensor_name)
-        logging.info("Initialize variable %s from checkpoint %s with %s" % (
-            var_name, ckpt_dir_or_file, full_tensor_name
-        ))
+        logging.info("Initialize variable %s from checkpoint %s with %s",
+                     var_name, ckpt_dir_or_file, full_tensor_name)


 def _get_checkpoint_filename(ckpt_dir_or_file):
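
Note on the hunks above: the `logging.info` calls now pass the arguments separately instead of pre-formatting the string with `%`. The sketch below shows the practical difference using the standard-library `logging` module, which is assumed here to behave like TensorFlow's logging wrapper in this respect. `CountingStr` is a hypothetical helper that counts how often it is formatted.

```python
import logging


class CountingStr(object):
  """Hypothetical argument that records how often it is converted to str."""

  def __init__(self):
    self.format_calls = 0

  def __str__(self):
    self.format_calls += 1
    return "var_0"


logger = logging.getLogger("lazy_logging_sketch")
logger.setLevel(logging.WARNING)  # INFO records are filtered out.

eager_arg = CountingStr()
logger.info("Initialize variable %s" % eager_arg)  # Formats before the call.

lazy_arg = CountingStr()
logger.info("Initialize variable %s", lazy_arg)  # Formatting never happens.
```

With separate arguments the message is only formatted when a record is actually emitted, so filtered-out log calls cost less.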
--- a/tensorflow/python/training/input.py
+++ b/tensorflow/python/training/input.py
@ -879,9 +879,6 @@ def batch(tensors, batch_size, num_threads=1, capacity=32,
  `get_shape` method will have a first `Dimension` value of `None`, and
  operations that depend on fixed batch_size would fail.

-  Note: if `num_epochs` is not `None`, this function creates local counter
-  `epochs`. Use `local_variables_initializer()` to initialize local variables.
-
  Args:
    tensors: The list or dictionary of tensors to enqueue.
    batch_size: The new batch size pulled from the queue.
@ -1181,9 +1178,6 @@ def shuffle_batch(tensors, batch_size, capacity, min_after_dequeue,
  `get_shape` method will have a first `Dimension` value of `None`, and
  operations that depend on fixed batch_size would fail.

-  Note: if `num_epochs` is not `None`, this function creates local counter
-  `epochs`. Use `local_variables_initializer()` to initialize local variables.
-
  Args:
    tensors: The list or dictionary of tensors to enqueue.
    batch_size: The new batch size pulled from the queue.
--- a/tensorflow/python/training/monitored_session.py
+++ b/tensorflow/python/training/monitored_session.py
@ -559,7 +559,7 @@ class MonitoredSession(_MonitoredSession):

  ```python
  saver_hook = CheckpointSaverHook(...)
-  summary_hook = SummaryHook(...)
+  summary_hook = SummarySaverHook(...)
  with MonitoredSession(session_creator=ChiefSessionCreator(...),
                        hooks=[saver_hook, summary_hook]) as sess:
    while not sess.should_stop():
@ -648,7 +648,7 @@ class SingularMonitoredSession(_MonitoredSession):
  Example usage:
  ```python
  saver_hook = CheckpointSaverHook(...)
-  summary_hook = SummaryHook(...)
+  summary_hook = SummarySaverHook(...)
  with SingularMonitoredSession(hooks=[saver_hook, summary_hook]) as sess:
    while not sess.should_stop():
      sess.run(train_op)
--- a/tensorflow/python/training/saver.py
+++ b/tensorflow/python/training/saver.py
@ -935,11 +935,11 @@ def get_checkpoint_state(checkpoint_dir, latest_filename=None):
          ckpt.all_model_checkpoint_paths[i] = os.path.join(checkpoint_dir, p)
  except errors.OpError as e:
    # It's ok if the file cannot be read
-    logging.warning(str(e))
+    logging.warning("%s: %s", type(e).__name__, e)
    logging.warning("%s: Checkpoint ignored", coord_checkpoint_filename)
    return None
  except text_format.ParseError as e:
-    logging.warning(str(e))
+    logging.warning("%s: %s", type(e).__name__, e)
    logging.warning("%s: Checkpoint ignored", coord_checkpoint_filename)
    return None
  finally:
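
Note on the hunk above: the warnings now include the exception type name alongside the message. A minimal sketch of why that helps, using a built-in `KeyError` in place of the TensorFlow error classes. The helper name `describe` is hypothetical.

```python
def describe(error):
  """Formats an exception the way the updated warning does."""
  return "%s: %s" % (type(error).__name__, error)


try:
  {}["checkpoint"]
except KeyError as e:
  bare = str(e)          # Message only; the error class is lost.
  labeled = describe(e)  # Error class plus message.
```

Here `bare` is just the quoted key, while `labeled` also says what kind of failure occurred.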
--- a/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
+++ b/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
@ -67,14 +67,6 @@ limitations under the License.
 extern bool FLAGS_check_gpu_leaks;
 bool FLAGS_prefer_cubin_to_ptx = true;

-namespace perftools {
-namespace gputools {
-namespace rng {
-class RngSupport;
-}  // namespace rng
-}  // namespace gputools
-}  // namespace perftools
-
 namespace perftools {
 namespace gputools {
 namespace cuda {
--- a/tensorflow/stream_executor/cuda/cuda_gpu_executor.h
+++ b/tensorflow/stream_executor/cuda/cuda_gpu_executor.h
@ -35,17 +35,6 @@ limitations under the License.
 #include "tensorflow/stream_executor/platform/thread_annotations.h"
 #include "tensorflow/stream_executor/stream_executor_internal.h"

-namespace perftools {
-namespace gputools {
-namespace blas {
-class BlasSupport;
-}
-namespace internal {
-class RngSupport;
-}  // namespace internal
-}  // namespace gputools
-}  // namespace perftools
-
 namespace perftools {
 namespace gputools {
 namespace cuda {
--- a/tensorflow/tensorboard/backend/application_test.py
+++ b/tensorflow/tensorboard/backend/application_test.py
@ -227,6 +227,19 @@ class TensorboardServerTest(test.TestCase):
      response.read()
      connection.close()

+  def testScalars(self):
+    """Test the format of /data/scalars."""
+    data = self._getJson('/data/scalars?run=run1&tag=simple_values')
+    self.assertEqual(len(data), self._SCALAR_COUNT)
+
+  def testScalarsCsv(self):
+    """Test the csv format of /data/scalars."""
+    data = self._get(
+        '/data/scalars?run=run1&tag=simple_values&format=csv').read()
+    line_count = data.count('\n')
+    self.assertEqual(line_count,
+                     self._SCALAR_COUNT + 1)  # include 1 more line for header
+
  def testHistograms(self):
    """Test the format of /data/histograms."""
    self.assertEqual(
--- a/tensorflow/tensorboard/backend/event_processing/event_accumulator_test.py
+++ b/tensorflow/tensorboard/backend/event_processing/event_accumulator_test.py
@ -225,6 +225,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
    self.assertTagsEqual(x.Tags(), {})

  def testTags(self):
+    """Tags should be found in EventAccumulator after adding some events."""
    gen = _EventGenerator(self)
    gen.AddScalar('s1')
    gen.AddScalar('s2')
@ -245,6 +246,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
    })

  def testReload(self):
+    """EventAccumulator contains suitable tags after calling Reload."""
    gen = _EventGenerator(self)
    acc = ea.EventAccumulator(gen)
    acc.Reload()
@ -267,6 +269,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
    })

  def testScalars(self):
+    """Tests whether EventAccumulator contains scalars after adding them."""
    gen = _EventGenerator(self)
    acc = ea.EventAccumulator(gen)
    s1 = ea.ScalarEvent(wall_time=1, step=10, value=32)
@ -293,6 +296,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
      self.assertEqual(expected_value, gotten_event.value[i])

  def testHealthPills(self):
+    """HealthPills should be properly inserted into EventAccumulator."""
    gen = _EventGenerator(self)
    acc = ea.EventAccumulator(gen)
    gen.AddHealthPill(13371337, 41, 'Add', 0, range(1, 13))
@ -328,6 +332,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
    self.assertItemsEqual(['Add', 'MatMul'], acc.GetOpsWithHealthPills())

  def testHistograms(self):
+    """Tests whether histograms are inserted into EventAccumulator."""
    gen = _EventGenerator(self)
    acc = ea.EventAccumulator(gen)

@ -377,6 +382,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
    self.assertEqual(acc.Histograms('hst2'), [hst2])

  def testCompressedHistograms(self):
+    """Tests compressed histograms inserted into EventAccumulator."""
    gen = _EventGenerator(self)
    acc = ea.EventAccumulator(gen, compression_bps=(0, 2500, 5000, 7500, 10000))

@ -428,6 +434,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
    self.assertEqual(acc.CompressedHistograms('hst2'), [expected_cmphst2])

  def testCompressedHistogramsWithEmptyHistogram(self):
+    """Tests that empty histograms compressed properly in EventAccumulator."""
    gen = _EventGenerator(self)
    acc = ea.EventAccumulator(gen, compression_bps=(0, 2500, 5000, 7500, 10000))

@ -481,6 +488,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
    self.assertAlmostEqual(vals[8].value, 1.0)

  def testImages(self):
+    """Tests 2 images inserted/accessed in EventAccumulator."""
    gen = _EventGenerator(self)
    acc = ea.EventAccumulator(gen)
    im1 = ea.ImageEvent(
@ -514,6 +522,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
    self.assertEqual(acc.Images('im2'), [im2])

  def testAudio(self):
+    """Tests 2 audio events inserted/accessed in EventAccumulator."""
    gen = _EventGenerator(self)
    acc = ea.EventAccumulator(gen)
    snd1 = ea.AudioEvent(
@ -551,6 +560,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
    self.assertEqual(acc.Audio('snd2'), [snd2])

  def testKeyError(self):
+    """KeyError should be raised when accessing non-existing keys."""
    gen = _EventGenerator(self)
    acc = ea.EventAccumulator(gen)
    acc.Reload()
@ -574,7 +584,7 @@ class MockingEventAccumulatorTest(EventAccumulatorTest):
      acc.Audio('hst1')

  def testNonValueEvents(self):
-    """Tests that non-value events in the generator don't cause early exits."""
+    """Non-value events in the generator don't cause early exits."""
    gen = _EventGenerator(self)
    acc = ea.EventAccumulator(gen)
    gen.AddScalar('s1', wall_time=1, step=10, value=20)
--- a/tensorflow/tensorboard/dist/tf-tensorboard.html
+++ b/tensorflow/tensorboard/dist/tf-tensorboard.html
@ -20112,7 +20112,7 @@ var TF;
          new TF.Dashboard.TfGraphDashboard(backend, debuggerDataEnabled),
          new TF.Dashboard.TfDistributionDashboard(backend),
          new TF.Dashboard.TfHistogramDashboard(backend),
-          new TF.Dashboard.VzProjectorDashboard('/data/plugin/projector'),
+          new TF.Dashboard.VzProjectorDashboard('data/plugin/projector'),
          new TF.Dashboard.TfTextDashboard(backend),
        ];
      },
--- a/tensorflow/tensorflow.bzl
+++ b/tensorflow/tensorflow.bzl
@ -1185,7 +1185,7 @@ def tf_version_info_genrule():
      ],
      outs=["util/version_info.cc"],
      cmd=
-      "$(location //tensorflow/tools/git:gen_git_source.py) --generate $(SRCS) \"$@\"",
+      "$(PYTHON_BIN_PATH) $(location //tensorflow/tools/git:gen_git_source.py) --generate $(SRCS) \"$@\"",
      local=1,
      tools=[clean_dep("//tensorflow/tools/git:gen_git_source.py")],)

--- a/tensorflow/tools/api/golden/tensorflow.layers.pbtxt
+++ b/tensorflow/tools/api/golden/tensorflow.layers.pbtxt
@ -32,6 +32,10 @@ tf_module {
    name: "conv3d"
    argspec: "args=[\'inputs\', \'filters\', \'kernel_size\', \'strides\', \'padding\', \'data_format\', \'dilation_rate\', \'activation\', \'use_bias\', \'kernel_initializer\', \'bias_initializer\', \'kernel_regularizer\', \'bias_regularizer\', \'activity_regularizer\', \'trainable\', \'name\', \'reuse\'], varargs=None, keywords=None, defaults=[\'(1, 1, 1)\', \'valid\', \'channels_last\', \'(1, 1, 1)\', \'None\', \'True\', \'None\', \'<tensorflow.python.ops.init_ops.Zeros object instance>\', \'None\', \'None\', \'None\', \'True\', \'None\', \'None\'], "
  }
+  member_method {
+    name: "conv3d_transpose"
+    argspec: "args=[\'inputs\', \'filters\', \'kernel_size\', \'strides\', \'padding\', \'data_format\', \'activation\', \'use_bias\', \'kernel_initializer\', \'bias_initializer\', \'kernel_regularizer\', \'bias_regularizer\', \'activity_regularizer\', \'trainable\', \'name\', \'reuse\'], varargs=None, keywords=None, defaults=[\'(1, 1, 1)\', \'valid\', \'channels_last\', \'None\', \'True\', \'None\', \'<tensorflow.python.ops.init_ops.Zeros object instance>\', \'None\', \'None\', \'None\', \'True\', \'None\', \'None\'], "
+  }
  member_method {
    name: "dense"
    argspec: "args=[\'inputs\', \'units\', \'activation\', \'use_bias\', \'kernel_initializer\', \'bias_initializer\', \'kernel_regularizer\', \'bias_regularizer\', \'activity_regularizer\', \'trainable\', \'name\', \'reuse\'], varargs=None, keywords=None, defaults=[\'None\', \'True\', \'None\', \'<tensorflow.python.ops.init_ops.Zeros object instance>\', \'None\', \'None\', \'None\', \'True\', \'None\', \'None\'], "
--- a/tensorflow/tools/api/golden/tensorflow.nn.pbtxt
+++ b/tensorflow/tools/api/golden/tensorflow.nn.pbtxt
@ -70,7 +70,7 @@ tf_module {
  }
  member_method {
    name: "conv3d_transpose"
-    argspec: "args=[\'value\', \'filter\', \'output_shape\', \'strides\', \'padding\', \'data_format\', \'name\'], varargs=None, keywords=None, defaults=[\'SAME\', \'None\', \'None\'], "
+    argspec: "args=[\'value\', \'filter\', \'output_shape\', \'strides\', \'padding\', \'data_format\', \'name\'], varargs=None, keywords=None, defaults=[\'SAME\', \'NDHWC\', \'None\'], "
  }
  member_method {
    name: "convolution"
--- a/tensorflow/tools/api/golden/tensorflow.pbtxt
+++ b/tensorflow/tools/api/golden/tensorflow.pbtxt
@ -636,6 +636,10 @@ tf_module {
    name: "atan"
    argspec: "args=[\'x\', \'name\'], varargs=None, keywords=None, defaults=[\'None\'], "
  }
+  member_method {
+    name: "atan2"
+    argspec: "args=[\'y\', \'x\', \'name\'], varargs=None, keywords=None, defaults=[\'None\'], "
+  }
  member_method {
    name: "batch_to_space"
    argspec: "args=[\'input\', \'crops\', \'block_size\', \'name\'], varargs=None, keywords=None, defaults=[\'None\'], "
--- a/tensorflow/tools/ci_build/update_version.sh
+++ b/tensorflow/tools/ci_build/update_version.sh
@ -73,8 +73,11 @@ OLD_MAJOR=$(cat ${VERSION_H} | grep -E "^#define TF_MAJOR_VERSION [0-9]+" | \
 cut -d ' ' -f 3)
 OLD_MINOR=$(cat ${VERSION_H} | grep -E "^#define TF_MINOR_VERSION [0-9]+" | \
 cut -d ' ' -f 3)
-OLD_PATCH=$(cat ${VERSION_H} | grep -E "^#define TF_PATCH_VERSION [[:alnum:]-]+" | \
+OLD_PATCH_NUM=$(cat ${VERSION_H} | grep -E "^#define TF_PATCH_VERSION [[:alnum:]-]+" | \
 cut -d ' ' -f 3)
+OLD_EXTENSION=$(cat ${VERSION_H} | grep -E "^#define TF_VERSION_SUFFIX \"[[:alnum:]-]+\"" | \
+cut -d ' ' -f 3)
+OLD_PATCH="$OLD_PATCH_NUM${OLD_EXTENSION//\"}"
 OLD_PIP_PATCH="${OLD_PATCH//-}"

 sed -i -e "s/^#define TF_MAJOR_VERSION ${OLD_MAJOR}/#define TF_MAJOR_VERSION ${MAJOR}/g" ${VERSION_H}
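
Note on the hunk above: the patch string is now assembled from `TF_PATCH_VERSION` plus `TF_VERSION_SUFFIX` with its quotes stripped, and the pip variant additionally drops the dashes. The sketch below renders those two shell expansions in Python for illustration only. The function name `pip_patch` and the sample values are assumptions, not taken from `version.h`.

```python
def pip_patch(patch_num, version_suffix):
  """Mirrors "$OLD_PATCH_NUM${OLD_EXTENSION//\"}" and "${OLD_PATCH//-}"."""
  # Strip the double quotes that surround the suffix in the #define.
  old_patch = patch_num + version_suffix.replace('"', "")
  # pip version strings carry no dashes.
  old_pip_patch = old_patch.replace("-", "")
  return old_patch, old_pip_patch


# Hypothetical values for a release candidate and for a final release.
rc_patch, rc_pip_patch = pip_patch("0", '"-rc2"')
final_patch, final_pip_patch = pip_patch("0", "")
```

When no suffix line matches, the extension is empty and both strings reduce to the bare patch number.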
--- a/tensorflow/tools/common/public_api.py
+++ b/tensorflow/tools/common/public_api.py
@ -38,7 +38,7 @@ class PublicAPIVisitor(object):
    self._visitor = visitor

    # Modules/classes we do not want to descend into if we hit them. Usually,
-    # sytem modules exposed through platforms for compatibility reasons.
+    # system modules exposed through platforms for compatibility reasons.
    # Each entry maps a module path to a name to ignore in traversal.
    self._do_not_descend_map = {
        '': [
--- a/tensorflow/tools/compatibility/tf_upgrade.py
+++ b/tensorflow/tools/compatibility/tf_upgrade.py
@ -34,6 +34,10 @@ class APIChangeSpec(object):
    # Maps from a function name to a dictionary that describes how to
    # map from an old argument keyword to the new argument keyword.
    self.function_keyword_renames = {
+        "tf.batch_matmul": {
+            "adj_x": "adjoint_a",
+            "adj_y": "adjoint_b",
+        },
        "tf.count_nonzero": {
            "reduction_indices": "axis"
        },
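
Note on the hunk above: the new entry maps the old `tf.batch_matmul` keywords to their replacements. A simplified sketch of how such a table can be applied to a call's keyword arguments follows. The real upgrade script rewrites source code; the helper `rename_keywords` is hypothetical and operates on plain dictionaries.

```python
function_keyword_renames = {
    "tf.batch_matmul": {
        "adj_x": "adjoint_a",
        "adj_y": "adjoint_b",
    },
    "tf.count_nonzero": {
        "reduction_indices": "axis"
    },
}


def rename_keywords(function_name, keywords):
  """Returns a copy of `keywords` with old argument names replaced."""
  renames = function_keyword_renames.get(function_name, {})
  # Keywords without an entry in the table are passed through unchanged.
  return {renames.get(key, key): value for key, value in keywords.items()}


renamed = rename_keywords("tf.batch_matmul", {"adj_x": True, "name": "mm"})
untouched = rename_keywords("tf.matmul", {"name": "mm"})
```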
--- a/tensorflow/tools/docker/Dockerfile.devel
+++ b/tensorflow/tools/docker/Dockerfile.devel
@ -92,7 +92,8 @@ WORKDIR /tensorflow
 ENV CI_BUILD_PYTHON python

 RUN tensorflow/tools/ci_build/builds/configured CPU \
-    bazel build -c opt tensorflow/tools/pip_package:build_pip_package && \
+    bazel build -c opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
+        tensorflow/tools/pip_package:build_pip_package && \
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip && \
    pip --no-cache-dir install --upgrade /tmp/pip/tensorflow-*.whl && \
    rm -rf /tmp/pip && \
--- a/tensorflow/tools/docker/Dockerfile.devel-gpu
+++ b/tensorflow/tools/docker/Dockerfile.devel-gpu
@ -92,7 +92,8 @@ ENV TF_NEED_CUDA 1
 ENV TF_CUDA_COMPUTE_CAPABILITIES=3.0,3.5,5.2,6.0,6.1

 RUN tensorflow/tools/ci_build/builds/configured GPU \
-    bazel build -c opt --config=cuda tensorflow/tools/pip_package:build_pip_package && \
+    bazel build -c opt --config=cuda --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
+        tensorflow/tools/pip_package:build_pip_package && \
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip && \
    pip --no-cache-dir install --upgrade /tmp/pip/tensorflow-*.whl && \
    rm -rf /tmp/pip && \
--- a/tensorflow/tools/docs/doc_generator_visitor.py
+++ b/tensorflow/tools/docs/doc_generator_visitor.py
@ -170,7 +170,7 @@ class DocGeneratorVisitor(object):
    master names to a lexicographically sorted list of all aliases for that name
    (incl. the master name).

-    All these are computed and set as fields if they haven't aready.
+    All these are computed and set as fields if they haven't already.
    """
    if self._reverse_index is not None:
      return
--- a/tensorflow/tools/docs/pretty_docs.py
+++ b/tensorflow/tools/docs/pretty_docs.py
@ -230,7 +230,7 @@ def _build_signature(obj_info):


 def _build_compatibility(compatibility):
-  """Return the compatability section as an md string."""
+  """Return the compatibility section as an md string."""
  parts = []
  sorted_keys = sorted(compatibility.keys())
  for key in sorted_keys:
--- a/tensorflow/tools/graph_transforms/README.md
+++ b/tensorflow/tools/graph_transforms/README.md
@ -81,10 +81,10 @@ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
 --out_graph=optimized_inception_graph.pb \
 --inputs='Mul:0' \
 --outputs='softmax:0' \
--transforms='\
-strip_unused_nodes(type=float, shape="1,299,299,3") \
-remove_nodes(op=Identity, op=CheckNumerics) \
-fold_old_batch_norms \
+--transforms='
+strip_unused_nodes(type=float, shape="1,299,299,3")
+remove_nodes(op=Identity, op=CheckNumerics)
+fold_old_batch_norms
 '
 ```

@ -94,7 +94,10 @@ transforms to modify the graph with. The transforms are given as a list of
 names, and can each have arguments themselves. These transforms define the
 pipeline of modifications that are applied in order to produce the output.
 Sometimes you need some transforms to happen before others, and the ordering
-within the list lets you specify which happen first.
+within the list lets you specify which happen first.
+Note that the optimization
+`remove_nodes(op=Identity, op=CheckNumerics)` will break models that contain
+control flow operations, such as `tf.cond`, `tf.map_fn`, and `tf.while_loop`.

 ## Inspecting Graphs

@ -169,7 +172,7 @@ then you'll need to make local modifications to the build files to include the
 right .cc file that defines it. In a lot of cases the op is just a vestigial
 remnant from the training process though, and if that's true then you can run
 the [strip_unused_nodes](#strip_unused_nodes), specifying the inputs and outputs
-of your inference usage, to remove those unneccessary nodes:
+of your inference usage, to remove those unnecessary nodes:

 ```bash
 bazel build tensorflow/tools/graph_transforms:transform_graph
--- a/tensorflow/tools/pip_package/BUILD
+++ b/tensorflow/tools/pip_package/BUILD
@ -68,6 +68,7 @@ py_binary(
        ":included_headers",
        "//tensorflow/contrib/nn:nn_py",
        "//tensorflow/contrib/session_bundle:session_bundle_pip",
+        "//tensorflow/contrib/signal:signal_py",
        "//tensorflow/contrib/slim/python/slim/data:data_pip",
        "//tensorflow/python:util_example_parser_configuration",
        "//tensorflow/python/debug:debug_pip",
@ -141,6 +142,7 @@ sh_binary(
            "//tensorflow/contrib/ndlstm:ndlstm",
            "//tensorflow/contrib/nn:nn_py",
            "//tensorflow/contrib/session_bundle:session_bundle_pip",
+            "//tensorflow/contrib/signal:signal_py",
            "//tensorflow/contrib/slim:slim",
            "//tensorflow/contrib/slim/python/slim/data:data_pip",
            "//tensorflow/contrib/slim/python/slim/nets:nets_pip",
--- a/tensorflow/tools/pip_package/setup.py
+++ b/tensorflow/tools/pip_package/setup.py
@ -29,7 +29,7 @@ from setuptools.dist import Distribution
 # This version string is semver compatible, but incompatible with pip.
 # For pip, we will remove all '-' characters from this string, and use the
 # result for pip.
-_VERSION = '1.1.0-rc2'
+_VERSION = '1.1.0'

 REQUIRED_PACKAGES = [
    'numpy >= 1.11.0',
--- a/tensorflow/tools/quantization/quantize_graph.py
+++ b/tensorflow/tools/quantization/quantize_graph.py
@ -453,7 +453,8 @@ class GraphRewriter(object):

  def round_nodes_recursively(self, current_node):
    """The entry point for simple rounding quantization."""
-    if self.already_visited[current_node.name]:
+    if (current_node.name in self.already_visited
+       ) and self.already_visited[current_node.name]:
      return
    self.already_visited[current_node.name] = True
    for input_node_name in current_node.input:
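
Note on the hunk above: the visited check now tests membership before indexing, which avoids a `KeyError` if `already_visited` is a plain dictionary (an assumption; the diff does not show how it is initialized). A minimal sketch of the guarded traversal on a hypothetical graph stored as a dictionary of input lists:

```python
def visit(name, inputs_by_name, already_visited, order):
  """Depth-first walk that processes every node exactly once."""
  # Membership test first, so unseen nodes do not raise KeyError.
  if name in already_visited and already_visited[name]:
    return
  already_visited[name] = True
  for input_name in inputs_by_name.get(name, []):
    visit(input_name, inputs_by_name, already_visited, order)
  order.append(name)


# Hypothetical graph: "out" depends on "a" and "b", which share input "in".
graph = {"out": ["a", "b"], "a": ["in"], "b": ["in"], "in": []}
order = []
visit("out", graph, {}, order)
```

The shared input `in` is reached twice but recorded only once.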
--- a/tensorflow/tools/tfprof/README.md
+++ b/tensorflow/tools/tfprof/README.md
@ -30,7 +30,7 @@ statistics.

 tfprof is part of TensorFlow core. Simply ```import tensorflow as tf```.

-### Examine the shapes and sizes of all trainiable Variables.
+### Examine the shapes and sizes of all trainable Variables.
 ```python
 # Print trainable variable parameter statistics to stdout.
 param_stats = tf.contrib.tfprof.model_analyzer.print_model_analysis(
@ -439,7 +439,7 @@ with gfile.Open(os.path.join(output_dir, "run_meta"), "w") as f:
 <b>--op_log_path:</b>
 tensorflow::tfprof::OpLog. A proto used to provide extra op information
 for ops. By giving a group of ops a type name, users can easily aggregate the
-statistics for those ops without accidently missing or including extra ops.
+statistics for those ops without accidentally missing or including extra ops.
 tfprof exposes the following Python API to add op information and logging.

 ```python