Merge changes from GitHub.

END_PUBLIC

---
Commit d0f53f77f authored by Penghao Cen<scorpiocph@gmail.com>
Committed by Shanqing Cai<cais@google.com>:
Fix minor typo (#11323)

---
Commit 02fcf564e authored by Chris Song<sjhshy@gmail.com>
Committed by Chris Song<sjhshy@gmail.com>:
Fix misspellings.

---
Commit 764c9b6b4 authored by Louis Tiao<ltiao@users.noreply.github.com>
Committed by GitHub<noreply@github.com>:
Fixed typo in docstring

---
Commit f8cd1283e authored by Shanqing Cai<cais@google.com>
Committed by Shanqing Cai<cais@google.com>:
Chaser

---
Commit 01383b946 authored by Shanqing Cai<cais@google.com>
Committed by Shanqing Cai<cais@google.com>:
Adapt TensorFlowTestCase.setUp() to new reset_default_graph() semantics

Avoid calling reset_default_graph() directly to prevent exceptions in
cases where test methods error out from within nested graph contexts,
which can leave _default_graph_stack non-empty in certain Python
versions.
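The failure mode above can be sketched with a toy model of the default-graph stack (hypothetical Python sketch; names and behavior are illustrative, not TensorFlow internals):

```python
# Toy model of a default-graph stack. reset() mirrors the new
# reset_default_graph() semantics: it raises when nested graph contexts
# are still open, e.g. after a test errored inside `with g.as_default():`.
class GraphStack:
    def __init__(self):
        self._stack = []

    def push(self, graph):
        # Models entering a `with g.as_default():` context.
        self._stack.append(graph)

    def reset(self):
        # Models reset_default_graph(): unsafe while contexts are open.
        if self._stack:
            raise AssertionError("Do not reset inside a nested graph context.")
        # ...a fresh default graph would be installed here...

    def safe_reset(self):
        # Models what an adapted setUp() effectively does: drop leftover
        # contexts first so the reset cannot raise.
        self._stack.clear()
        self.reset()
```

A test that errors out mid-context leaves the stack non-empty, so calling `reset()` directly raises, while `safe_reset()` does not.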

---
Commit 0ffc37890 authored by Amit Patankar<amitpatankar@google.com>
Committed by Amit Patankar<amitpatankar@google.com>:
Removing second declaration of functions.

---
Commit f9c9cacb0 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Refactor ElementalIrEmitter's slice index finding code into
IrArray::Index::SourceIndexOfSlice().

PiperOrigin-RevId: 161140653

---
Commit ba297aec9 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Update ops-related pbtxt files.

PiperOrigin-RevId: 161138258

---
Commit 68d666737 authored by Alexandre Passos<apassos@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Fixes a reentrant lock issue with tensors using ndarray memory which uses tensor memory.

PiperOrigin-RevId: 161137788

---
Commit a2ee8bca3 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Add support for int8 x int8 -> int32 matrix multiplication via cublasGemmEx to stream_executor.
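The semantics being added can be illustrated in plain Python (a sketch of int8 x int8 -> int32 accumulation only, not the stream_executor or cublasGemmEx API):

```python
# int8 x int8 -> int32 matmul semantics: operands are 8-bit, but each
# product is accumulated at 32-bit width, so the result cannot overflow
# the way an 8- or 16-bit accumulator would.
INT8_MIN, INT8_MAX = -128, 127

def int8_matmul(a, b):
    # a: m x k, b: k x n; all entries must fit in int8.
    assert all(INT8_MIN <= v <= INT8_MAX for row in a for v in row)
    assert all(INT8_MIN <= v <= INT8_MAX for row in b for v in row)
    k = len(b)
    return [[sum(a[i][t] * b[t][j] for t in range(k))
             for j in range(len(b[0]))]
            for i in range(len(a))]

c = int8_matmul([[127, 127, 127]], [[127], [127], [127]])
# 3 * 127 * 127 = 48387: representable in int32, not in int8/int16.
```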

PiperOrigin-RevId: 161137741

---
Commit 755fa7b50 authored by Mark Daoust<markdaoust@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Block generate_test and doc generation from running in Python 3.

- Doc generation is currently unsupported in Python 3

- Both end in errors in Python 3.5.1+

PiperOrigin-RevId: 161137467

---
Commit 97cbcac45 authored by Peter Hawkins<phawkins@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
[TF:XLA] Fix failure in functionalize_control_flow rewrite for Enter nodes that are unused. Make sure we ignore such nodes without producing an error.

PiperOrigin-RevId: 161136545

---
Commit dabcb60bc authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
[XLA] Add reasonable error messages to Builder::Build for bad parameter numbers.

PiperOrigin-RevId: 161136262

---
Commit 0cbd249e8 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Add complex tensors support to `matrix_determinant`.
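For intuition, the complex case is ordinary determinant arithmetic over complex entries; a 2x2 worked example in plain Python (an illustration only, not the op's implementation):

```python
# 2x2 determinant with complex entries: ad - bc, using Python's
# built-in complex type.
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

m = [[1 + 2j, 0 + 1j],
     [3 + 0j, 2 - 1j]]
# (1+2j)(2-1j) - (1j)(3+0j) = (4+3j) - 3j = 4
```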

PiperOrigin-RevId: 161132422

---
Commit 335f1f14d authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Extend static shape inference for SparseTensors with dense_shapes constructed using slicing.

PiperOrigin-RevId: 161132391

---
Commit 53604916e authored by Jianwei Xie<xiejw@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Fixed the missing labels test in TPUEstimator.

PiperOrigin-RevId: 161131282

---
Commit 9f57dc8dd authored by Bruno Rosa<bruno.rosa@eldorado.org.br>
Committed by Bruno Rosa<bruno.rosa@eldorado.org.br>:
Use mcpu instead of march for ppc64le

march is not supported by gcc on ppc64le
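The flag selection amounts to the following (a Python sketch of the shell logic added to `configure`):

```python
import platform

# gcc on ppc64le does not support -march, so -mcpu is used there instead.
def default_cc_opt_flags(machine=None):
    machine = machine or platform.machine()
    if machine == "ppc64le":
        return "-mcpu=native"
    return "-march=native"
```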

---
Commit 7d5c74a9c authored by Skye Wanderman-Milne<skyewm@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Move duplicate detection logic from Graph to FunctionLibraryDefinition

Turns out this is more useful, since there are many function libraries
that don't belong to a graph. This will be used in a future
change. Note that this maintains the current behavior of Graph.

In addition, updates FunctionDefsEqual() to handle unset attr entries
(I ran into this when using this in said future change).
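The unset-attr handling could look like this (a hypothetical Python model of the attr comparison in `FunctionDefsEqual()`, not the actual C++ code):

```python
# Compare two attr maps, treating an unset entry on one side as equal
# to the other side when both resolve to the declared default value.
def attrs_equal(a, b, defaults):
    for k in set(a) | set(b) | set(defaults):
        if a.get(k, defaults.get(k)) != b.get(k, defaults.get(k)):
            return False
    return True
```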

PiperOrigin-RevId: 161126628

---
Commit 2caec3af1 authored by Shanqing Cai<cais@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Disable more timeseries py tests failing in OSS PIP GPU builds

PiperOrigin-RevId: 161124799

---
Commit 0b5cce367 authored by Eugene Brevdo<ebrevdo@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Get TopK op working on GPU again.  Extend using cub's radix sort.

1. Undo rollback of Andreas Kirsch's initial implementation.
2. Use cub segmented radix sort instead of Andreas' heap-based impl
   for large k and small num_cols (thresholds of k=100, n=1000
   determined empirically).
3. Use cub segmented radix sort if k == num_cols (this case is always faster).
4. Added benchmarks.

Benchmarks show that the GPU implementation is up to 3x slower for small k but
can be 10x faster for large num_cols and k.
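The dispatch rule described above can be sketched as follows (a Python sketch; the threshold directions are an assumption read off the commit text, not the kernel's actual selection code):

```python
# Choose between the cub segmented radix sort path and the heap-based
# TopK kernel, per the rules above: radix when k == num_cols (always
# faster), or when k is large and num_cols small (empirical thresholds
# k=100, n=1000 quoted in the text).
K_THRESHOLD, N_THRESHOLD = 100, 1000

def choose_topk_impl(k, num_cols):
    if k == num_cols:
        return "radix"  # rule 3: always faster in this case
    if k >= K_THRESHOLD and num_cols <= N_THRESHOLD:
        return "radix"  # rule 2: large k, small num_cols
    return "heap"       # otherwise keep the heap-based kernel
```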

Benchmarks:

Benchmark: m_128_n_10_k_5_use_gpu_False          wall_time: 0.000166 s   Throughput: 0.0077 GB/s
Benchmark: m_128_n_10_k_5_use_gpu_True   wall_time: 0.000796 s   Throughput: 0.00161 GB/s
Benchmark: m_128_n_10_k_9_use_gpu_False          wall_time: 0.00017 s    Throughput: 0.00751 GB/s
Benchmark: m_128_n_10_k_9_use_gpu_True   wall_time: 0.000796 s   Throughput: 0.00161 GB/s
Benchmark: m_128_n_10_k_10_use_gpu_False         wall_time: 0.00017 s    Throughput: 0.00753 GB/s
Benchmark: m_128_n_10_k_10_use_gpu_True          wall_time: 0.000775 s   Throughput: 0.00165 GB/s
Benchmark: m_128_n_100_k_1_use_gpu_False         wall_time: 0.000155 s   Throughput: 0.0826 GB/s
Benchmark: m_128_n_100_k_1_use_gpu_True          wall_time: 0.000796 s   Throughput: 0.0161 GB/s
Benchmark: m_128_n_100_k_50_use_gpu_False        wall_time: 0.000247 s   Throughput: 0.0519 GB/s
Benchmark: m_128_n_100_k_50_use_gpu_True         wall_time: 0.0008 s     Throughput: 0.016 GB/s
Benchmark: m_128_n_100_k_99_use_gpu_False        wall_time: 0.000261 s   Throughput: 0.049 GB/s
Benchmark: m_128_n_100_k_99_use_gpu_True         wall_time: 0.000794 s   Throughput: 0.0161 GB/s
Benchmark: m_128_n_100_k_100_use_gpu_False       wall_time: 0.000239 s   Throughput: 0.0536 GB/s
Benchmark: m_128_n_100_k_100_use_gpu_True        wall_time: 0.000777 s   Throughput: 0.0165 GB/s
Benchmark: m_128_n_1000_k_1_use_gpu_False        wall_time: 0.000324 s   Throughput: 0.395 GB/s
Benchmark: m_128_n_1000_k_1_use_gpu_True         wall_time: 0.000916 s   Throughput: 0.14 GB/s
Benchmark: m_128_n_1000_k_10_use_gpu_False       wall_time: 0.00042 s    Throughput: 0.305 GB/s
Benchmark: m_128_n_1000_k_10_use_gpu_True        wall_time: 0.000902 s   Throughput: 0.142 GB/s
Benchmark: m_128_n_1000_k_500_use_gpu_False      wall_time: 0.0011 s     Throughput: 0.116 GB/s
Benchmark: m_128_n_1000_k_500_use_gpu_True       wall_time: 0.00097 s    Throughput: 0.132 GB/s
Benchmark: m_128_n_1000_k_990_use_gpu_False      wall_time: 0.00133 s    Throughput: 0.0962 GB/s
Benchmark: m_128_n_1000_k_990_use_gpu_True       wall_time: 0.000993 s   Throughput: 0.129 GB/s
Benchmark: m_128_n_1000_k_1000_use_gpu_False     wall_time: 0.00102 s    Throughput: 0.126 GB/s
Benchmark: m_128_n_1000_k_1000_use_gpu_True      wall_time: 0.000964 s   Throughput: 0.133 GB/s
Benchmark: m_128_n_10000_k_10_use_gpu_False      wall_time: 0.002 s      Throughput: 0.64 GB/s
Benchmark: m_128_n_10000_k_10_use_gpu_True       wall_time: 0.00288 s    Throughput: 0.445 GB/s
Benchmark: m_128_n_10000_k_100_use_gpu_False     wall_time: 0.00233 s    Throughput: 0.549 GB/s
Benchmark: m_128_n_10000_k_100_use_gpu_True      wall_time: 0.00325 s    Throughput: 0.394 GB/s
Benchmark: m_128_n_10000_k_5000_use_gpu_False    wall_time: 0.0127 s     Throughput: 0.101 GB/s
Benchmark: m_128_n_10000_k_5000_use_gpu_True     wall_time: 0.00381 s    Throughput: 0.336 GB/s
Benchmark: m_128_n_10000_k_9900_use_gpu_False    wall_time: 0.015 s      Throughput: 0.0853 GB/s
Benchmark: m_128_n_10000_k_9900_use_gpu_True     wall_time: 0.00438 s    Throughput: 0.292 GB/s
Benchmark: m_128_n_10000_k_10000_use_gpu_False   wall_time: 0.0104 s     Throughput: 0.123 GB/s
Benchmark: m_128_n_10000_k_10000_use_gpu_True    wall_time: 0.00427 s    Throughput: 0.3 GB/s
Benchmark: m_128_n_100000_k_100_use_gpu_False    wall_time: 0.0148 s     Throughput: 0.865 GB/s
Benchmark: m_128_n_100000_k_100_use_gpu_True     wall_time: 0.0262 s     Throughput: 0.488 GB/s
Benchmark: m_128_n_100000_k_1000_use_gpu_False   wall_time: 0.0201 s     Throughput: 0.636 GB/s
Benchmark: m_128_n_100000_k_1000_use_gpu_True    wall_time: 0.0263 s     Throughput: 0.486 GB/s
Benchmark: m_128_n_100000_k_50000_use_gpu_False          wall_time: 0.214 s      Throughput: 0.0599 GB/s
Benchmark: m_128_n_100000_k_50000_use_gpu_True   wall_time: 0.0322 s     Throughput: 0.398 GB/s
Benchmark: m_128_n_100000_k_99000_use_gpu_False          wall_time: 0.262 s      Throughput: 0.0489 GB/s
Benchmark: m_128_n_100000_k_99000_use_gpu_True   wall_time: 0.0377 s     Throughput: 0.34 GB/s
Benchmark: m_128_n_100000_k_100000_use_gpu_False         wall_time: 0.118 s      Throughput: 0.108 GB/s
Benchmark: m_128_n_100000_k_100000_use_gpu_True          wall_time: 0.0365 s     Throughput: 0.351 GB/s

END_PUBLIC

BEGIN_PUBLIC
Automated g4 rollback of changelist 157169178

PiperOrigin-RevId: 161476569
This commit is contained in:
Shanqing Cai 2017-07-10 19:22:04 -07:00 committed by TensorFlower Gardener
parent aa239529c4
commit 90d6421c5e
243 changed files with 2707 additions and 814 deletions


@@ -17,6 +17,7 @@ If you open a GitHub issue, here is our policy:
- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**:
- **TensorFlow installed from (source or binary)**:
- **TensorFlow version (use command below)**:
- **Python version**:
- **Bazel version (if compiling from source)**:
- **CUDA/cuDNN version**:
- **GPU model and memory**:


@@ -16,7 +16,7 @@ or more CPUs or GPUs in a desktop, server, or mobile device without rewriting
code. TensorFlow also includes TensorBoard, a data visualization toolkit.
TensorFlow was originally developed by researchers and engineers
working on the Google Brain team within Google's Machine Intelligence research
working on the Google Brain team within Google's Machine Intelligence Research
organization for the purposes of conducting machine learning and deep neural
networks research. The system is general enough to be applicable in a wide
variety of other domains, as well.
@@ -34,13 +34,13 @@ and discussion.**
People who are a little more adventurous can also try our nightly binaries:
* Linux CPU-only: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.2.0-cp27-none-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave)) / [Python 3.4](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.2.0-cp34-cp34m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/)) / [Python 3.5](https://ci.tensorflow.org/view/Nightly/job/nightly-python35-linux-cpu/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.2.0-cp35-cp35m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-python35-linux-cpu/))
* Linux GPU: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.2.0-cp27-none-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-linux/)) / [Python 3.4](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.2.0-cp34-cp34m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/)) / [Python 3.5](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3.5,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.2.0-cp35-cp35m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3.5,label=gpu-linux/))
* Mac CPU-only: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.2.0-py2-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/)) / [Python 3](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.2.0-py3-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/))
* Mac GPU: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.2.0-py2-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/)) / [Python 3](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.2.0-py3-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-mac/))
* Windows CPU-only: [Python 3.5 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows,PY=35/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow-1.2.0-cp35-cp35m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows,PY=35/)) / [Python 3.6 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows,PY=36/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow-1.2.0-cp36-cp36m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows,PY=36/))
* Windows GPU: [Python 3.5 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows-gpu,PY=35/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow_gpu-1.2.0-cp35-cp35m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows-gpu,PY=35/)) / [Python 3.6 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows-gpu,PY=36/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow_gpu-1.2.0-cp36-cp36m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows-gpu,PY=36/))
* Android: [demo APK](https://ci.tensorflow.org/view/Nightly/job/nightly-android/lastSuccessfulBuild/artifact/out/tensorflow_demo.apk), [native libs](https://ci.tensorflow.org/view/Nightly/job/nightly-android/lastSuccessfulBuild/artifact/out/native/)
* Linux CPU-only: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.2.1-cp27-none-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave)) / [Python 3.4](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.2.1-cp34-cp34m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/)) / [Python 3.5](https://ci.tensorflow.org/view/Nightly/job/nightly-python35-linux-cpu/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.2.1-cp35-cp35m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-python35-linux-cpu/))
* Linux GPU: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.2.1-cp27-none-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-linux/)) / [Python 3.4](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.2.1-cp34-cp34m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-linux/)) / [Python 3.5](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3.5,label=gpu-linux/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.2.1-cp35-cp35m-linux_x86_64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-linux-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3.5,label=gpu-linux/))
* Mac CPU-only: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.2.1-py2-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=mac-slave/)) / [Python 3](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.2.1-py3-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/))
* Mac GPU: [Python 2](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.2.1-py2-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=gpu-mac/)) / [Python 3](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-mac/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow_gpu-1.2.1-py3-none-any.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-mac-gpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=gpu-mac/))
* Windows CPU-only: [Python 3.5 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows,PY=35/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow-1.2.1-cp35-cp35m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows,PY=35/)) / [Python 3.6 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows,PY=36/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow-1.2.1-cp36-cp36m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows,PY=36/))
* Windows GPU: [Python 3.5 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows-gpu,PY=35/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow_gpu-1.2.1-cp35-cp35m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows-gpu,PY=35/)) / [Python 3.6 64-bit](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows-gpu,PY=36/lastSuccessfulBuild/artifact/cmake_build/tf_python/dist/tensorflow_gpu-1.2.1-cp36-cp36m-win_amd64.whl) ([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-win/M=windows-gpu,PY=36/))
* Android: [demo APK](https://ci.tensorflow.org/view/Nightly/job/nightly-android/lastSuccessfulBuild/artifact/out/tensorflow_demo.apk), [native libs](http://ci.tensorflow.org/view/Nightly/job/nightly-android/lastSuccessfulBuild/artifact/out/native/)
([build history](https://ci.tensorflow.org/view/Nightly/job/nightly-android/))
#### *Try your first TensorFlow program*


@@ -1,3 +1,9 @@
# Release 1.2.1
## Bug Fixes and Other Changes
* Updating markdown version required to >= 2.6.8.
* Support tensors as dropout rates again, by removing the min(max(..))
# Release 1.2.0
## Major Features and Improvements


@@ -25,6 +25,10 @@ function is_windows() {
[[ "${PLATFORM}" =~ msys_nt*|mingw*|cygwin*|uwin* ]]
}
function is_ppc64le() {
[[ "$(uname -m)" == "ppc64le" ]]
}
function sed_in_place() {
sed -e $1 $2 > "$2.bak"
mv "$2.bak" $2
@@ -294,7 +298,12 @@ fi # TF_NEED_MKL
## Set up architecture-dependent optimization flags.
if [ -z "$CC_OPT_FLAGS" ]; then
default_cc_opt_flags="-march=native"
if is_ppc64le; then
# gcc on ppc64le does not support -march, use mcpu instead
default_cc_opt_flags="-mcpu=native"
else
default_cc_opt_flags="-march=native"
fi
read -p "Please specify optimization flags to use during compilation when bazel option "\
"\"--config=opt\" is specified [Default is $default_cc_opt_flags]: " CC_OPT_FLAGS
if [ -z "$CC_OPT_FLAGS" ]; then


@@ -912,11 +912,13 @@ class CSession {
for (TF_Operation* o : outputs) {
outputs_.emplace_back(TF_Output{o, 0});
}
output_values_.resize(outputs_.size());
}
void SetOutputs(const std::vector<TF_Output>& outputs) {
ResetOutputValues();
outputs_ = outputs;
output_values_.resize(outputs_.size());
}
void SetTargets(std::initializer_list<TF_Operation*> targets) {


@@ -152,12 +152,12 @@ Status SymbolicGradientBuilder::Initialize() {
grad_outputs_->resize(inputs_.size());
// Populate `output_nodes_` from node ids in `outputs_`.
output_nodes_.reserve(outputs_.size());
for (int i = 0; i < outputs_.size(); ++i) {
for (size_t i = 0; i < outputs_.size(); ++i) {
output_nodes_.insert(outputs_[i].node()->id());
}
// Populate `input_nodes_` from Outputs in `inputs_`.
input_nodes_.reserve(inputs_.size());
for (int i = 0; i < inputs_.size(); ++i) {
for (size_t i = 0; i < inputs_.size(); ++i) {
input_nodes_.insert({inputs_[i], i});
}
@@ -341,7 +341,7 @@ Status SymbolicGradientBuilder::AddGradients() {
// gradient function to the src node/output to which it should be
// backproped. Maybe grad functions can return a vector of Output pairs to
// make this association explicit.
int dx_index = 0;
size_t dx_index = 0;
for (const Edge* e : n->in_edges()) {
if (e->IsControlEdge()) continue;
if (dx_index == dx.size()) {


@@ -203,6 +203,46 @@ Status TanhGrad(const Scope& scope, const Operation& op,
}
REGISTER_GRADIENT_OP("Tanh", TanhGrad);
Status AsinhGrad(const Scope& scope, const Operation& op,
const std::vector<Output>& grad_inputs,
std::vector<Output>* grad_outputs) {
// y = asinh(x)
// dy/dx = 1 / cosh(y)
auto dydx = Reciprocal(scope, Cosh(scope, op.output(0)));
// grad(x) = grad(y) * conj(dy/dx)
grad_outputs->push_back(
Mul(scope, grad_inputs[0], ConjugateHelper(scope, dydx)));
return scope.status();
}
REGISTER_GRADIENT_OP("Asinh", AsinhGrad);
Status AcoshGrad(const Scope& scope, const Operation& op,
const std::vector<Output>& grad_inputs,
std::vector<Output>* grad_outputs) {
// y = acosh(x)
// dy/dx = 1 / sinh(y)
auto dydx = Reciprocal(scope, Sinh(scope, op.output(0)));
// grad(x) = grad(y) * conj(dy/dx)
grad_outputs->push_back(
Mul(scope, grad_inputs[0], ConjugateHelper(scope, dydx)));
return scope.status();
}
REGISTER_GRADIENT_OP("Acosh", AcoshGrad);
Status AtanhGrad(const Scope& scope, const Operation& op,
const std::vector<Output>& grad_inputs,
std::vector<Output>* grad_outputs) {
// y = atanh(x)
// dy/dx = 1 / (1 - x^2)
auto one = Cast(scope, Const(scope, 1.0), op.input(0).type());
auto dydx = Reciprocal(scope, Sub(scope, one, Square(scope, op.input(0))));
// grad(x) = grad(y) * conj(dy/dx)
grad_outputs->push_back(
Mul(scope, grad_inputs[0], ConjugateHelper(scope, dydx)));
return scope.status();
}
REGISTER_GRADIENT_OP("Atanh", AtanhGrad);
Status SigmoidGrad(const Scope& scope, const Operation& op,
const std::vector<Output>& grad_inputs,
std::vector<Output>* grad_outputs) {


@@ -48,6 +48,9 @@ class CWiseUnaryGradTest : public ::testing::Test {
SINH,
COSH,
TANH,
ASINH,
ACOSH,
ATANH,
SIGMOID,
SIGN,
SIN,
@@ -122,6 +125,15 @@ class CWiseUnaryGradTest : public ::testing::Test {
case TANH:
y = Tanh(scope_, x);
break;
case ASINH:
y = Asinh(scope_, x);
break;
case ACOSH:
y = Acosh(scope_, x);
break;
case ATANH:
y = Atanh(scope_, x);
break;
case SIGMOID:
y = Sigmoid(scope_, x);
break;
@@ -413,6 +425,76 @@ TEST_F(CWiseUnaryGradTest, Tanh_Complex) {
TestCWiseGrad<complex64>(TANH, x_fn, dy_fn, dx_fn);
}
TEST_F(CWiseUnaryGradTest, Asinh) {
auto x_fn = [this](const int i) { return RV({0, -1, 1, -2, 2, -3, 3}); };
auto dy_fn = [this](const float x) { return x + RV({-2, 2, -3, 3, -4, 4}); };
auto dx_fn = [this](const float x, const float dy) {
auto y = std::asinh(x);
return dy / std::cosh(y);
};
TestCWiseGrad<float>(ASINH, x_fn, dy_fn, dx_fn);
}
TEST_F(CWiseUnaryGradTest, Asinh_Complex) {
auto x_fn = [this](const int i) {
return CRV({{1, 0}, {0, 1}, {2, -1}, {1, 2}, {3, 4}});
};
auto dy_fn = [this](const complex64& x) {
return x + CRV({{-2, 2}, {-3, 3}, {1, -4}});
};
auto dx_fn = [this](const complex64& x, const complex64& dy) {
auto y = std::asinh(x);
return dy / conjugate(std::cosh(y));
};
TestCWiseGrad<complex64>(ASINH, x_fn, dy_fn, dx_fn);
}
TEST_F(CWiseUnaryGradTest, Acosh) {
auto x_fn = [this](const int i) { return RV({1, 2, 3, 4, 5, 6, 7}); };
auto dy_fn = [this](const float x) { return x + RV({8, 9, 10, 11, 12, 13, 14}); };
auto dx_fn = [this](const float x, const float dy) {
auto y = std::acosh(x);
return dy / std::sinh(y);
};
TestCWiseGrad<float>(ACOSH, x_fn, dy_fn, dx_fn);
}
TEST_F(CWiseUnaryGradTest, Acosh_Complex) {
auto x_fn = [this](const int i) {
return CRV({{1, 1}, {2, 1}, {1, 4}, {1, 2}, {3, 4}});
};
auto dy_fn = [this](const complex64& x) {
return x + CRV({{2, 2}, {3, 3}, {1, 4}});
};
auto dx_fn = [this](const complex64& x, const complex64& dy) {
auto y = std::acosh(x);
return dy / conjugate(std::sinh(y));
};
TestCWiseGrad<complex64>(ACOSH, x_fn, dy_fn, dx_fn);
}
TEST_F(CWiseUnaryGradTest, Atanh) {
auto x_fn = [this](const int i) { return RV({0, -0.5, 0.5, -0.1, 0.1}); };
auto dy_fn = [this](const float x) { return x + RV({-2, 2, -3, 3, -4, 4}); };
auto dx_fn = [this](const float x, const float dy) {
return dy * (1. / (1. - x * x));
};
TestCWiseGrad<float>(ATANH, x_fn, dy_fn, dx_fn);
}
TEST_F(CWiseUnaryGradTest, Atanh_Complex) {
auto x_fn = [this](const int i) {
return CRV({{0.1, 0}, {0, 0.1}, {0.2, -0.1}, {0.1, 0.2}, {0.3, 0.4}});
};
auto dy_fn = [this](const complex64& x) {
return x + CRV({{-2, 2}, {-3, 3}, {1, -4}});
};
auto dx_fn = [this](const complex64& x, const complex64& dy) {
return dy / conjugate(one_ - x * x);
};
TestCWiseGrad<complex64>(ATANH, x_fn, dy_fn, dx_fn);
}
TEST_F(CWiseUnaryGradTest, Sigmoid) {
auto x_fn = [this](const int i) { return RV({0, -1, 1, -2, 2, -3, 3}); };
auto dy_fn = [this](const float x) { return x + RV({-2, 2, -3, 3, -4, 4}); };


@@ -383,7 +383,7 @@ Status FunctionalizeLoop(Graph* graph, Frame* frame,
}
}
if (arg.exit == nullptr) {
return errors::InvalidArgument("Mising Exit successor to ",
return errors::InvalidArgument("Missing Exit successor to ",
arg.switch_node->name());
}
}


@@ -63,7 +63,7 @@ class ParallelizationPreparation : public HloPassInterface {
// Outlines 'instruction' from entry computation, if it had
// been assigned parallel tasks in an earlier pass through the computation.
// Returns true if 'instruction' was succesfully outlined, false otherwise.
// Returns true if 'instruction' was successfully outlined, false otherwise.
bool OutlineParallelizableInstruction(HloInstruction* instruction);
// Returns true if 'instruction' can be outlined into the same sub-computation


@@ -21,6 +21,7 @@ limitations under the License.
#define TENSORFLOW_COMPILER_XLA_SERVICE_CPU_INFEED_MANAGER_H_
#include <deque>
#include <vector>
#include "tensorflow/compiler/xla/types.h"
#include "tensorflow/core/platform/mutex.h"


@@ -441,7 +441,7 @@ Status ParallelCpuExecutable::ExecuteComputeFunctions(
// TODO(b/27458679) Manage scheduling based on in-flight concurrency limits.
// For example, if we expect a library conv/matmul call to run at max
// concurrency, we should not dispatch runnable instructions until the
// libary call is finished (to avoid expensive cache invalidation).
// library call is finished (to avoid expensive cache invalidation).
Executor executor(functions, run_options, &pending, &results,
buffer_pointers.data(), profile_counters.data(),
assignment_.get());


@@ -20,7 +20,7 @@ namespace cpu {
std::vector<int64> ShapePartitionAssigner::Run(int64 target_partition_count) {
// Gather outer-most dims where dim_size >= 'target_partition_count'.
// Note: always leave inner-dim static for vectorization/optimzations.
// Note: always leave inner-dim static for vectorization/optimizations.
std::vector<int64> outer_dims;
int64 outer_dim_size = 1;
// TODO(b/27458679) Consider reserving enough minor dimensions (based on


@@ -38,7 +38,7 @@ namespace cpu {
//
// [0, 1), [1, 2), [2, 3), [3, 4), [4, 5) [5, 8)
//
// Note that the last parition has residule because the dimension size is
// Note that the last partition has residule because the dimension size is
// not a multiple of the partition count.
//
//


@@ -495,7 +495,7 @@ TEST_F(HloAliasAnalysisTest, NestedWhiles) {
};
// Build separate condition computations so the call graph is flat. The
// callgraph is always flattened in the compiler pipeline, and the flattened
// callgraph enables representive interference analysis.
// callgraph enables representative interference analysis.
HloComputation* condition1 =
module_->AddEmbeddedComputation(build_cond_computation());
HloComputation* condition2 =


@@ -527,7 +527,7 @@ xla_test(
)
# Tests the dot operation in some cases that can be performed via a
# runtime call on some backends - e.g. a runtime call to to Eigen.
# runtime call on some backends - e.g. a runtime call to Eigen.
xla_test(
name = "dot_operation_runtime_test",
srcs = ["dot_operation_test.cc"],

@@ -1,8 +1,9 @@
 """Build rules for XLA testing."""
 load("@local_config_cuda//cuda:build_defs.bzl", "cuda_is_configured")
+load("//tensorflow/compiler/xla/tests:plugin.bzl", "plugins")
-all_backends = ["cpu", "cpu_parallel", "gpu"]
+all_backends = ["cpu", "cpu_parallel", "gpu"] + plugins.keys()
 def filter_backends(backends):
 """Removes "gpu" from a backend list if CUDA is not enabled.
@@ -121,6 +122,11 @@ def xla_test(name,
 backend_deps = ["//tensorflow/compiler/xla/service:gpu_plugin"]
 backend_deps += ["//tensorflow/compiler/xla/tests:test_macros_gpu"]
 this_backend_tags += ["requires-gpu-sm35"]
+elif backend in plugins:
+backend_deps = plugins[backend]["deps"]
+this_backend_copts += plugins[backend]["copts"]
+this_backend_tags += plugins[backend]["tags"]
+this_backend_args += plugins[backend]["args"]
 else:
 fail("Unknown backend %s" % backend)

@@ -0,0 +1,32 @@
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Additional XLA devices to be included in the unit test suite."""
+# Example:
+#
+# plugins = {
+# "foo": {
+# "deps": [
+# "//tensorflow/compiler/plugin/foo:foo_lib",
+# "//tensorflow/compiler/plugin/foo:test_macros",
+# ],
+# "copts": [],
+# "tags": [],
+# "args": []
+# },
+# }
+plugins = {}

@@ -51,7 +51,7 @@ Status Concat(OpKernelContext* context, const gtl::ArraySlice<Tensor>& inputs,
 std::vector<std::unique_ptr<typename TTypes<T, 2>::ConstMatrix>> inputs_flat;
 inputs_flat.reserve(inputs.size());
 int64 output_dim0 = 0;
-for (int i = 0; i < inputs.size(); ++i) {
+for (size_t i = 0; i < inputs.size(); ++i) {
 const Tensor& input = inputs[i];
 if (input.dims() != input_dims) {
 return errors::InvalidArgument(
@@ -548,7 +548,7 @@ class BatchKernel : public AsyncOpKernel {
 return Status::OK();
 }
 int32 last_size = 0;
-for (int i = 0; i < allowed_batch_sizes_.size(); ++i) {
+for (size_t i = 0; i < allowed_batch_sizes_.size(); ++i) {
 const int32 size = allowed_batch_sizes_.at(i);
 if (i > 0 && size <= last_size) {
 return errors::InvalidArgument(
@@ -675,7 +675,7 @@ class UnbatchResource : public ResourceBase {
 // If we have a non-empty tensor, finish the waitlisted runs,
 // and store any remaining pieces.
 if (nonempty_input) {
-for (int i = 0; i < batch_keys.size(); ++i) {
+for (size_t i = 0; i < batch_keys.size(); ++i) {
 auto runs_it = waiting_callbacks_.find(batch_keys[i]);
 if (runs_it != waiting_callbacks_.end()) {
 runs_it->second.context->set_output(0, split_inputs[i]);

@@ -106,7 +106,7 @@ void CategoricalFeatureColumnHandler::GenerateFeatureSplitCandidates(
 NodeStats left_node_stats(learner_config, left_gradient_stats);
 NodeStats right_node_stats(learner_config, right_gradient_stats);
-// Generate split candiate and update best split candidate for the
+// Generate split candidate and update best split candidate for the
 // current root if needed.
 FeatureSplitCandidate split_candidate(
 slot_id_,

@@ -93,7 +93,7 @@ void DenseQuantizedFeatureColumnHandler::GenerateFeatureSplitCandidates(
 NodeStats left_node_stats(learner_config, left_gradient_stats);
 NodeStats right_node_stats(learner_config, right_gradient_stats);
-// Generate split candiate.
+// Generate split candidate.
 const float threshold = dense_quantiles_(bucket_id);
 FeatureSplitCandidate split_candidate(
 slot_id_, CreateDenseSplitNode(dense_feature_column_, threshold),

@@ -109,7 +109,7 @@ void SparseQuantizedFeatureColumnHandler::GenerateFeatureSplitCandidates(
 NodeStats left_node_stats(learner_config, left_gradient_stats);
 NodeStats right_node_stats(learner_config, right_gradient_stats);
-// Generate split candiate.
+// Generate split candidate.
 const float threshold = sparse_quantiles_(bucket_id);
 FeatureSplitCandidate split_candidate(
 slot_id_,
@@ -124,7 +124,7 @@ void SparseQuantizedFeatureColumnHandler::GenerateFeatureSplitCandidates(
 // Determine if we need a backward pass by checking if the residual gradient
 // after forward aggregation is almost the same as the aggregated gradient.
-// for the current root. This helps avoid unecessary computation as well
+// for the current root. This helps avoid unnecessary computation as well
 // as consistency due to floating point precision.
 if (!right_gradient_stats.IsAlmostZero()) {
 // Backward pass with left default direction.
@@ -147,7 +147,7 @@ void SparseQuantizedFeatureColumnHandler::GenerateFeatureSplitCandidates(
 NodeStats left_node_stats(learner_config, left_gradient_stats);
 NodeStats right_node_stats(learner_config, right_gradient_stats);
-// Generate split candiate.
+// Generate split candidate.
 const float threshold = sparse_quantiles_(bucket_id - 1);
 FeatureSplitCandidate split_candidate(
 slot_id_,

@@ -1149,7 +1149,7 @@ class PredictionOpsTest(test_util.TensorFlowTestCase):
 adjusted_tree_ensemble_config = (
 tree_config_pb2.DecisionTreeEnsembleConfig())
 # When we say to average over more trees than possible, it is averaging
-# accross all trees.
+# across all trees.
 total_num = 100
 for i in range(0, total_num):
 tree = tree_ensemble_config.trees.add()

@@ -18,6 +18,7 @@ set(cub_URL https://github.com/NVlabs/cub/archive/1.6.4.zip)
 set(cub_HASH SHA256=966d0c4f41e2bdc81aebf9ccfbf0baffaac5a74f00b826b06f4dee79b2bb8cee)
 set(cub_BUILD ${CMAKE_CURRENT_BINARY_DIR}/cub/src/cub)
 set(cub_INCLUDE_DIR ${CMAKE_CURRENT_BINARY_DIR}/cub/src/cub)
+set(cub_ARCHIVE_DIR ${CMAKE_CURRENT_BINARY_DIR}/external/cub_archive)
 ExternalProject_Add(cub
 PREFIX cub
@@ -26,4 +27,4 @@ ExternalProject_Add(cub
 DOWNLOAD_DIR "${DOWNLOAD_LOCATION}"
 BUILD_IN_SOURCE 1
 PATCH_COMMAND ${CMAKE_COMMAND} -E copy_if_different ${CMAKE_CURRENT_SOURCE_DIR}/patches/cub/CMakeLists.txt ${cub_BUILD}
-INSTALL_COMMAND "")
+INSTALL_COMMAND ${CMAKE_COMMAND} -E copy_directory ${cub_INCLUDE_DIR}/cub ${cub_ARCHIVE_DIR}/cub)

@@ -126,13 +126,15 @@ if(WIN32)
 "${tensorflow_source_dir}/tensorflow/core/kernels/*quantiz*.h"
 "${tensorflow_source_dir}/tensorflow/core/kernels/*quantiz*.cc"
 "${tensorflow_source_dir}/tensorflow/core/kernels/neon/*"
-# no in tensorflow.dll - comes from .so
+# not in core - those are loaded dynamically as dll
 "${tensorflow_source_dir}/tensorflow/contrib/resampler/kernels/resampler_ops.cc"
 "${tensorflow_source_dir}/tensorflow/contrib/rnn/kernels/blas_gemm.cc"
 "${tensorflow_source_dir}/tensorflow/contrib/rnn/kernels/gru_ops.cc"
 "${tensorflow_source_dir}/tensorflow/contrib/rnn/kernels/lstm_ops.cc"
 "${tensorflow_source_dir}/tensorflow/contrib/rnn/ops/gru_ops.cc"
 "${tensorflow_source_dir}/tensorflow/contrib/rnn/ops/lstm_ops.cc"
+"${tensorflow_source_dir}/tensorflow/contrib/seq2seq/kernels/beam_search_ops.cc"
+"${tensorflow_source_dir}/tensorflow/contrib/seq2seq/ops/beam_search_ops.cc"
 # temporarily disable nccl (nccl itself needs to be ported to windows first)
 "${tensorflow_source_dir}/tensorflow/contrib/nccl/kernels/nccl_manager.cc"
 "${tensorflow_source_dir}/tensorflow/contrib/nccl/kernels/nccl_ops.cc"

@@ -148,6 +148,7 @@ if (tensorflow_BUILD_PYTHON_TESTS)
 "${tensorflow_source_dir}/tensorflow/contrib/data/*_test.py"
 "${tensorflow_source_dir}/tensorflow/contrib/factorization/*_test.py"
 "${tensorflow_source_dir}/tensorflow/contrib/keras/python/keras/integration_test.py"
+"${tensorflow_source_dir}/tensorflow/contrib/seq2seq/python/kernel_tests/*_test.py"
 "${tensorflow_source_dir}/tensorflow/contrib/stateless/python/kernel_tests/*_test.py"
 # NOTE: tensor_forest tests in tensor_forest/hybrid/... still don't pass.
 "${tensorflow_source_dir}/tensorflow/contrib/tensor_forest/client/*_test.py"
@@ -171,6 +172,10 @@ if (tensorflow_BUILD_PYTHON_TESTS)
 "${tensorflow_source_dir}/tensorflow/python/saved_model/saved_model_test.py"
 # requires scipy
 "${tensorflow_source_dir}/tensorflow/contrib/keras/python/keras/preprocessing/*_test.py"
+"${tensorflow_source_dir}/tensorflow/contrib/tfprof/python/tools/tfprof/pprof_profiler_test.py"
+# flaky tests
+"${tensorflow_source_dir}/tensorflow/python/kernel_tests/cwise_ops_test.py"
+"${tensorflow_source_dir}/tensorflow/contrib/tfprof/python/tools/tfprof/internal/run_metadata_test.py"
 )
 if (WIN32)
 set(tf_test_src_py_exclude

@@ -196,7 +196,7 @@ class Binomial(distribution.Distribution):
 @property
 def probs(self):
-"""Probability of of drawing a `1`."""
+"""Probability of drawing a `1`."""
 return self._probs
 def _batch_shape_tensor(self):

@@ -160,7 +160,7 @@ class _VectorStudentT(transformed_distribution.TransformedDistribution):
 #### Examples
 A single instance of a "Vector Student's t-distribution" is defined by a mean
-vector of of length `k` and a scale matrix of shape `k x k`.
+vector of length `k` and a scale matrix of shape `k x k`.
 Extra leading dimensions, if provided, allow for batches.

@@ -800,7 +800,7 @@ class WALSModel(object):
 regularization: A tensor (scalar) that contains the normalized
 regularization term for the minibatch loss corresponding to sp_input.
 sum_weights: The sum of the weights corresponding to sp_input. This
-can be used with unregularized loss to caluclate the root weighted
+can be used with unregularized loss to calculate the root weighted
 squared error.
 """
 assert isinstance(sp_input, sparse_tensor.SparseTensor)

@@ -32,11 +32,11 @@ namespace functor {
 // Explicit instantiation of the CPU functor.
 typedef Eigen::ThreadPoolDevice CPUDevice;
-template class FillProjectiveTransform<CPUDevice, uint8>;
-template class FillProjectiveTransform<CPUDevice, int32>;
-template class FillProjectiveTransform<CPUDevice, int64>;
-template class FillProjectiveTransform<CPUDevice, float>;
-template class FillProjectiveTransform<CPUDevice, double>;
+template struct FillProjectiveTransform<CPUDevice, uint8>;
+template struct FillProjectiveTransform<CPUDevice, int32>;
+template struct FillProjectiveTransform<CPUDevice, int64>;
+template struct FillProjectiveTransform<CPUDevice, float>;
+template struct FillProjectiveTransform<CPUDevice, double>;
 } // end namespace functor
@@ -116,7 +116,7 @@ namespace functor {
 void FillProjectiveTransform<GPUDevice, TYPE>::operator()( \
 const GPUDevice& device, OutputType* output, const InputType& images, \
 const TransformsType& transform) const; \
-extern template class FillProjectiveTransform<GPUDevice, TYPE>
+extern template struct FillProjectiveTransform<GPUDevice, TYPE>
 TF_CALL_uint8(DECLARE_FUNCTOR);
 TF_CALL_int32(DECLARE_FUNCTOR);

@@ -21,6 +21,7 @@ from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
+import collections
 import json
 import os
@@ -263,6 +264,9 @@ def get_uid(prefix=''):
 ```
 """
 graph = ops.get_default_graph()
+if graph not in tf_base_layers.PER_GRAPH_LAYER_NAME_UIDS:
+tf_base_layers.PER_GRAPH_LAYER_NAME_UIDS[graph] = collections.defaultdict(
+int)
 layer_name_uids = tf_base_layers.PER_GRAPH_LAYER_NAME_UIDS[graph]
 layer_name_uids[prefix] += 1
 return layer_name_uids[prefix]

@@ -316,7 +316,7 @@ class Conv3D(tf_convolutional_layers.Conv3D, Layer):
 When using this layer as the first layer in a model,
 provide the keyword argument `input_shape`
 (tuple of integers, does not include the sample axis),
-e.g. `input_shape=(128, 128, 128, 3)` for 128x128x128 volumes
+e.g. `input_shape=(128, 128, 128, 1)` for 128x128x128 volumes
 with a single channel,
 in `data_format="channels_last"`.

@@ -78,7 +78,7 @@ def layer_test(layer_cls, kwargs=None, input_shape=None, input_dtype=None,
 if e is None:
 input_data_shape[i] = np.random.randint(1, 4)
 input_data = 10 * np.random.random(input_data_shape)
-if input_dtype[:4] == 'float':
+if input_dtype[:5] == 'float':
 input_data -= 0.5
 input_data = input_data.astype(input_dtype)
 elif input_shape is None:

@@ -177,7 +177,7 @@ class StringCrosser {
 static const auto k_feature_separator = "_X_";
 gtl::InlinedVector<InternalType, 6> cross_vec(columns_.size());
-for (int i = 0; i < permutation.size(); i++) {
+for (size_t i = 0; i < permutation.size(); i++) {
 cross_vec[i] = columns_[i]->Feature(batch_index, permutation[i]);
 }
 // TODO(zakaria): this will copy the string twice, might effect
@@ -267,7 +267,7 @@ class ProductIterator {
 next_permutation_.resize(columns_.size(), 0);
 // Sets has_next_ to false if any feature column has 0 features.
 has_next_ = true;
-for (int i = 0; i < columns_.size(); i++) {
+for (size_t i = 0; i < columns_.size(); i++) {
 if (columns_[i]->FeatureCount(batch_index_) == 0) {
 has_next_ = false;
 break;
@@ -581,7 +581,7 @@ class SparseFeatureCrossOp : public OpKernel {
 columns,
 int batch_index) {
 int64 cross_count = 1;
-for (int i = 0; i < columns.size(); i++) {
+for (size_t i = 0; i < columns.size(); i++) {
 const auto feature_count = columns[i]->FeatureCount(batch_index);
 // If one column is missing any feature, there won't be any cross.
 if (feature_count == 0) {

@@ -871,7 +871,7 @@ def _embedding_lookup_with_distributed_aggregation(params,
 p_segment_ids = array_ops.gather(segment_ids, pindices[p])
 # Number the p_segment_ids to meet segment_sum's requirements. Note
 # that unique_p_segment_ids contains unique segment ids of this
-# partiton and these ids' order is unchanged.
+# partition and these ids' order is unchanged.
 unique_p_segment_ids, unique_p_segment_idx = array_ops.unique(
 p_segment_ids)
 partitioned_segment_ids.append(unique_p_segment_ids)

@@ -165,7 +165,7 @@ class _LinearEmbeddingLookupArguments(
 "combiner"])):
 """Represents the information needed from a column for embedding lookup.
-Used to to compute DNN inputs and weighted sum.
+Used to compute DNN inputs and weighted sum.
 """
 pass
@@ -184,7 +184,7 @@ class _DeepEmbeddingLookupArguments(
 "trainable"])):
 """Represents the information needed from a column for embedding lookup.
-Used to to compute DNN inputs and weighted sum.
+Used to compute DNN inputs and weighted sum.
 """
 pass

@@ -938,7 +938,7 @@ def convolution(inputs,
 with "NC".
 num_outputs: Integer, the number of output filters.
 kernel_size: A sequence of N positive integers specifying the spatial
-dimensions of of the filters. Can be a single integer to specify the same
+dimensions of the filters. Can be a single integer to specify the same
 value for all spatial dimensions.
 stride: A sequence of N positive integers specifying the stride at which to
 compute output. Can be a single integer to specify the same value for all

@@ -1493,12 +1493,12 @@ class PartialFlattenTest(test.TestCase):
 def testSparsePartialFlatten(self):
 """Test `_inner_flatten` on `SparseTensor`s."""
-shape = [4, 3, 11, 6, 1, 3]
+shape = [4, 3, 11, 6]
 np.random.seed(10301)
 random_ = np.random.rand(*shape)
 indices, values, _ = _sparsify(random_)
-for new_rank in [1, 2, 3, 4, 5]:
+for new_rank in [1, 2, 3]:
 expected_shape = (shape[:new_rank - 1] + [np.prod(shape[new_rank - 1:])])
 reshaped_random_ = np.reshape(random_, expected_shape)
 expected_indices, expected_values, _ = _sparsify(reshaped_random_)

@@ -525,7 +525,7 @@ class Experiment(object):
 differences in resource control. First, the resources (e.g., memory) used
 by training will be released before evaluation (`train_and_evaluate` takes
 double resources). Second, more checkpoints will be saved as a checkpoint
-is generated at the end of each trainning iteration.
+is generated at the end of each training iteration.
 3. As the estimator.train starts from scratch (new graph, new states for
 input, etc) at each iteration, it is recommended to have the

@@ -79,7 +79,7 @@ cd $SCRIPT_DIR/gen/proto
 tar cf $FW_DIR_TFCORE_HDRS/tmp.tar tensorflow
 cd $FW_DIR_TFCORE_HDRS
 tar xf tmp.tar
-# Dont include the auto downloaded/generated to build this library
+# Don't include the auto downloaded/generated to build this library
 rm -rf tensorflow/contrib/makefile
 rm -f tmp.tar

@@ -283,7 +283,7 @@ def _luong_score(query, keys, scale):
 raise ValueError(
 "Incompatible or unknown inner dimensions between query and keys. "
 "Query (%s) has units: %s. Keys (%s) have units: %s. "
-"Perhaps you need to set num_units to the the keys' dimension (%s)?"
+"Perhaps you need to set num_units to the keys' dimension (%s)?"
 % (query, depth, keys, key_units, key_units))
 dtype = query.dtype

@@ -775,7 +775,7 @@ images, labels = LoadTestData(...)
 predictions = MyModel(images)
 mae_value_op, mae_update_op = slim.metrics.streaming_mean_absolute_error(predictions, labels)
-mre_value_op, mre_update_op = slim.metrics.streaming_mean_relative_error(predictions, labels, labels)
+mre_value_op, mre_update_op = slim.metrics.streaming_mean_relative_error(predictions, labels)
 pl_value_op, pl_update_op = slim.metrics.percentage_less(mean_relative_errors, 0.3)
 ```

@@ -315,7 +315,7 @@ class TensorForestEstimator(estimator.Estimator):
 though training might be distributed.
 version: String indicating TensorForest version to use, for backward
 compatibility. Either 'v2', 'v4', or None to let system pick.
-Overrides grpah_builder_class.
+Overrides graph_builder_class.
 Returns:
 A `TensorForestEstimator` instance.

@@ -109,7 +109,7 @@ class GrowStats {
 const TensorForestParams& params_;
-// We cache these beacuse they're used often.
+// We cache these because they're used often.
 const int split_after_samples_;
 const int num_splits_to_consider_;

@@ -689,11 +689,11 @@ class InputStatisticsFromMiniBatch(object):
 values = features[TrainEvalFeatures.VALUES]
 else:
 # times and values may not be available, for example during prediction. We
-# still need to retreive our variables so that they can be read from, even
+# still need to retrieve our variables so that they can be read from, even
 # if we're not going to update them.
 times = None
 values = None
-# Create/retreive variables representing input statistics, initialized
+# Create/retrieve variables representing input statistics, initialized
 # without data to avoid deadlocking if variables are initialized before
 # queue runners are started.
 with variable_scope.variable_scope("input_statistics", use_resource=True):

@@ -196,7 +196,7 @@ class ChainingStateManager(_OverridableStateManager):
 return time // self._state_saving_interval
 def _get_cached_states(self, times):
-"""Retreive cached states for a batch of times."""
+"""Retrieve cached states for a batch of times."""
 read_chunk_numbers = self._get_chunk_number(times)
 looked_up_state = list(self._cached_states.lookup(
 math_ops.cast(read_chunk_numbers, dtypes.int64)))
@@ -242,7 +242,7 @@ class ChainingStateManager(_OverridableStateManager):
 # written to the next bucket). This assumes fixed missing times (i.e. if we
 # were presented with times [10, 50] we will never see times [30, 50]).
 #
-# TODO(allenl): Retreive the highest time less than the current time rather
+# TODO(allenl): Retrieve the highest time less than the current time rather
 # than relying on fixed bucketing.
 write_chunk_numbers = math_ops.maximum(
 self._get_chunk_number(array_ops.concat(

@@ -150,7 +150,7 @@ class StateInterpolatingAnomalyDetector(FilteringStepPostprocessor):
 This is simply Bayes' theorem, where p(data | anomaly) is the
 alternative/anomaly distribution, p(data | not anomaly) is the model's
 predicted distribution, and anomaly_prior_probability is the prior probability
-of an anomaly occuring (user-specified, defaulting to 1%).
+of an anomaly occurring (user-specified, defaulting to 1%).
 Rather than computing p(anomaly | data) directly, we use the odds ratio:

@@ -70,7 +70,7 @@ class StructuralEnsemble(state_space_model.StateSpaceIndependentEnsemble):
 `observation_noise`, `level_noise`, `trend noise`, `seasonality_noise`, and
 `transient` are (typically scalar) Gaussian random variables whose variance is
-learned from data, and that variance is not time dependant in this
+learned from data, and that variance is not time dependent in this
 implementation. Level noise is optional due to its similarity with observation
 noise in some cases. Seasonality is enforced by constraining a full cycle of
 seasonal variables to have zero expectation, allowing seasonality to adapt

@@ -544,7 +544,7 @@ def _convert_model_fn_to_train_step(model_fn, dequeue_fn, mode, run_config,
 # TODO(xiejw): how to do we support hook and savers in the original
 # model_fn. Realistically, the original
-# model_fn will be excuted on TPU chips in a replica way. The hooks
+# model_fn will be executed on TPU chips in a replica way. The hooks
 # returned by the model_fn cannot be supported at all. If we have to,
 # the graph construction part in the model_fn should be separated from the
 # control part (such as hooks and savers). By that the graph construction

@@ -298,7 +298,7 @@ class InfeedQueue(object):
 input_tensors is a list of lists of Tensors whose types and shapes are used
 to set the queue configuration. The length of the outer list is the number
 of shards required, and each inner list is the tuple of Tensors to use to
-determine the types and shapes of the correponding shard. This method
+determine the types and shapes of the corresponding shard. This method
 depends on the shard dimension, and calling it freezes the shard policy.
 Args:

@@ -88,7 +88,7 @@ def bucket(tensors,
 This function is implemented using several queues. A `QueueRunner` for the
 queues is added to the current `Graph`'s `QUEUE_RUNNER` collection.
-As the returned tensors are the result of of a dequeue operation, evaluating
+As the returned tensors are the result of a dequeue operation, evaluating
 them will throw a `tf.errors.OutOfRangeError` when the input queue is
 exhausted. If these tensors are feeding another input queue, its queue runner
 will catch this exception, however, if they are used in your main thread

@@ -329,7 +329,7 @@ class EvaluateRepeatedlyTest(test.TestCase):
 if not gfile.Exists(checkpoint_dir):
 gfile.MakeDirs(checkpoint_dir)
-# We need a variable that that the saver will try to restore.
+# We need a variable that the saver will try to restore.
 variables.get_or_create_global_step()
 # Run with placeholders. If we actually try to evaluate this, we'd fail
@@ -394,7 +394,7 @@ class EvaluateRepeatedlyTest(test.TestCase):
 'evaluate_with_eval_feed_dict')
 self._train_model(checkpoint_dir, num_steps=1)
-# We need a variable that that the saver will try to restore.
+# We need a variable that the saver will try to restore.
 variables.get_or_create_global_step()
 # Create a variable and an eval op that increments it with a placeholder.

@@ -761,8 +761,8 @@ void RdmaTensorBuffer::SendNextItem() {
 (buffer_size > size_ && local_status_ == idle &&
 remote_status_ == idle)) {
 if ((local_status_ != none) && (buffer_size > size_)) {
-CHECK(rm.data_type_ == DT_STRING)
-<< "Only string tensor allows to change size";
+VLOG(2) << "Extend RDMA buffer from " << size_ << " to "
+<< buffer_size;
 }
 CreateCPUBuffer(buffer_size, false);
 mu_.unlock();
@@ -782,11 +782,13 @@ void RdmaTensorBuffer::SendNextItem() {
 // local/remote_status_ won't be set back to idle
 // unitl Write() is successful
 mu_.unlock();
-CHECK((buffer_size == size_ && rm.data_type_ != DT_STRING) ||
-(buffer_size <= size_ && rm.data_type_ == DT_STRING))
-<< "tensor and buffer size do not agree!"
-<< " buffer_size = " << size_
-<< " requested tensor size = " << buffer_size << in.DebugString();
+if (!((buffer_size == size_ && rm.data_type_ != DT_STRING) ||
+(buffer_size <= size_ && rm.data_type_ == DT_STRING))) {
+VLOG(2) << "Tensor and buffer size do not agree,"
+<< " buffer_size = " << size_
+<< " requested tensor size = "
+<< buffer_size << in.DebugString();
+}
 uint32_t imm_data = LookupBufferIndex(key);
 rm.type_ = RDMA_MESSAGE_TENSOR_WRITE;
 string message = RdmaMessage::CreateMessage(rm);

@@ -1221,6 +1221,9 @@ LIB_INTERNAL_WINDOWS_DEPS = glob(
 "platform/*.cc",
 "platform/profile_utils/**/*.h",
 "platform/profile_utils/**/*.cc",
+] + [
+"framework/resource_handle.h",
+"framework/resource_handle.cc",
 ],
 exclude = [
 "**/*test*",
@@ -1250,7 +1253,6 @@ cc_library(
 "platform/*.cc",
 "platform/profile_utils/**/*.h",
 "platform/profile_utils/**/*.cc",
-] + [
 "framework/resource_handle.h",
 "framework/resource_handle.cc",
 ],

@@ -77,9 +77,10 @@ Status OpRegistry::LookUp(const string& op_type_name,
 if (first_unregistered) {
 OpList op_list;
 Export(true, &op_list);
-VLOG(1) << "All registered Ops:";
-for (const auto& op : op_list.op()) {
-VLOG(1) << SummarizeOpDef(op);
+if (VLOG_IS_ON(3)) {
+LOG(INFO) << "All registered Ops:";
+for (const auto& op : op_list.op())
+LOG(INFO) << SummarizeOpDef(op);
+}
 first_unregistered = false;
 }

@@ -73,7 +73,7 @@ bool ConsumeEquals(StringPiece* description) {
 return false;
 }
-// Split `*orig` into two pieces at the first occurence of `split_ch`.
+// Split `*orig` into two pieces at the first occurrence of `split_ch`.
 // Returns whether `split_ch` was found. Afterwards, `*before_split`
 // contains the maximum prefix of the input `*orig` that doesn't
 // contain `split_ch`, and `*orig` contains everything after the

@@ -716,7 +716,7 @@ class OpKernelContext {
 StringPiece output_name, const TensorShape& output_shape,
 Tensor** output) TF_MUST_USE_RESULT;
-// Tries to reuse one of of the inputs given in input_indices as a temporary.
+// Tries to reuse one of the inputs given in input_indices as a temporary.
 // If none of the given inputs can be forwarded, calls
 // allocate_temp() to allocate a new temporary buffer.
 Status forward_input_or_allocate_temp(

@@ -77,6 +77,9 @@ class QueueInterface : public ResourceBase {
 virtual void Close(OpKernelContext* ctx, bool cancel_pending_enqueues,
 DoneCallback callback) = 0;
+// Returns true if a given queue is closed and false if it is open.
+virtual bool is_closed() const = 0;
 // Assuming *this represents a shared queue, verify that it matches
 // another instantiation indicated by node_def.
 virtual Status MatchesNodeDef(const NodeDef& node_def) = 0;

@@ -31,7 +31,7 @@ namespace tensorflow {
 const int Graph::kControlSlot = -1;
-struct NodeProperties {
+class NodeProperties {
+ public:
 NodeProperties(const OpDef* op_def, const NodeDef& node_def,
 const DataTypeSlice inputs, const DataTypeSlice outputs)

@@ -158,7 +158,7 @@ Status SingleMachine::Run(const GraphDef& graph_def,
 // Also clear the timeline to save memory
 init_metadata_.clear_step_stats();
 }
-for (int i = 0; i < queue_runner_defs_.size(); ++i) {
+for (size_t i = 0; i < queue_runner_defs_.size(); ++i) {
 std::unique_ptr<QueueRunner> queue_runner;
 TF_RETURN_IF_ERROR(QueueRunner::New(queue_runner_defs_[i],
 coordinator_.get(), &queue_runner));

@@ -141,7 +141,7 @@ Status GraphProperties::MergeEnqueueShapesAndTypes(
 "Enqueue nodes mixed number of tensors: ", shapes_and_types.size(),
 " vs ", queue_shapes_and_types->size());
 }
-for (int i = 0; i < shapes_and_types.size(); ++i) {
+for (size_t i = 0; i < shapes_and_types.size(); ++i) {
 const ShapeAndType& a = shapes_and_types[i];
 ShapeAndType& b = (*queue_shapes_and_types)[i];
 if (a.dtype != b.dtype) {
@@ -163,7 +163,7 @@ Status GraphProperties::RelaxEnqueueShapesAndMergeTypes(
 "Enqueue nodes mixed number of tensors: ", shapes_and_types.size(),
 " vs ", queue_shapes_and_types->size());
 }
-for (int i = 0; i < shapes_and_types.size(); ++i) {
+for (size_t i = 0; i < shapes_and_types.size(); ++i) {
 const ShapeAndType& a = shapes_and_types[i];
 ShapeAndType& b = (*queue_shapes_and_types)[i];
 if (a.dtype != b.dtype) {

@@ -365,7 +365,7 @@ NodeState& VirtualScheduler::GetNodeStateOrCreateIt(const NodeDef* node) {
 // Initialize output port related data:
 // Assume the size of OutputProperties represents the number of output ports
 // of this node.
-for (int i = 0; i < node_state.output_properties.size(); ++i) {
+for (size_t i = 0; i < node_state.output_properties.size(); ++i) {
 node_state.time_no_references[i] = Costs::Duration::max();
 node_state.num_outputs_executed[i] = 0;
 // Populate an empty vector for each port. The caller will add nodes

@@ -396,7 +396,7 @@ Status ConstantFolding::EvaluateOneFoldable(const NodeDef& node,
 if (output_tensors.empty()) {
 Status(error::INVALID_ARGUMENT, "Expected at least one output.");
 }
-for (int i = 0; i < output_tensors.size(); i++) {
+for (size_t i = 0; i < output_tensors.size(); i++) {
 string node_name = AddPrefixToNodeName(node.name(), kConstantFoldingConst);
 if (output_tensors.size() > 1) {
 node_name = strings::StrCat(node_name, "-", i);

@@ -32,6 +32,7 @@ load(
 "tf_kernel_library",
 "tf_mkl_kernel_library",
 "cc_header_only_library",
+"if_not_windows",
 )
 load("@local_config_sycl//sycl:build_defs.bzl", "if_sycl")
 load("//tensorflow:tensorflow.bzl", "tf_cuda_cc_test")
@@ -231,7 +232,7 @@ cc_library(
 name = "ops_util",
 srcs = ["ops_util.cc"],
 hdrs = ["ops_util.h"],
-copts = ["-Wno-sign-compare"],
+copts = if_not_windows(["-Wno-sign-compare"]),
 deps = [
 "//tensorflow/core:framework",
 "//tensorflow/core:lib",
@@ -344,7 +345,7 @@ cc_library(
 name = "save_restore_tensor",
 srcs = ["save_restore_tensor.cc"],
 hdrs = ["save_restore_tensor.h"],
-copts = ["-Wno-sign-compare"],
+copts = if_not_windows(["-Wno-sign-compare"]),
 deps = [
 ":bounds_check",
 "//tensorflow/core:framework",
@@ -1222,6 +1223,7 @@ tf_kernel_library(
 ],
 visibility = ["//visibility:private"],
 deps = [
+":ops_util",
 "//tensorflow/core:framework",
 "//tensorflow/core/kernels:conv_ops",
 "//third_party/eigen3",
@@ -3219,6 +3221,7 @@ cc_library(
 ":sparse_reduce_op",
 ":sparse_reorder_op",
 ":sparse_reshape_op",
+":sparse_slice_op",
 ":sparse_softmax",
 ":sparse_sparse_binary_op_shared",
 ":sparse_split_op",
@@ -3301,6 +3304,12 @@ tf_kernel_library(
 deps = SPARSE_DEPS,
 )
+tf_kernel_library(
+name = "sparse_slice_op",
+prefix = "sparse_slice_op",
+deps = SPARSE_DEPS,
+)
 tf_kernel_library(
 name = "sparse_softmax",
 prefix = "sparse_softmax",

@@ -413,7 +413,7 @@ class Barrier : public ResourceBase {
 }
 queue_closed_ = true;
 if (cancel_pending_enqueues) queue_cancelled_ = true;
-if (!ready_queue_->closed()) {
+if (!ready_queue_->is_closed()) {
 ready_queue_->Close(ctx, cancel_pending_enqueues, callback);
 }
 }

@ -45,20 +45,23 @@ struct CastFunctor<Eigen::SyclDevice, O, I> {
} // namespace functor
#define CURRY_TYPES3(FN, arg0, arg1) \
FN(arg0, arg1, bool); \
FN(arg0, arg1, uint8); \
FN(arg0, arg1, int8); \
FN(arg0, arg1, uint16); \
FN(arg0, arg1, int16); \
FN(arg0, arg1, int32); \
FN(arg0, arg1, int64); \
FN(arg0, arg1, Eigen::half); \
FN(arg0, arg1, float); \
FN(arg0, arg1, double); \
FN(arg0, arg1, std::complex<float>); \
#define CURRY_TYPES3_NO_HALF(FN, arg0, arg1) \
FN(arg0, arg1, bool); \
FN(arg0, arg1, uint8); \
FN(arg0, arg1, int8); \
FN(arg0, arg1, uint16); \
FN(arg0, arg1, int16); \
FN(arg0, arg1, int32); \
FN(arg0, arg1, int64); \
FN(arg0, arg1, float); \
FN(arg0, arg1, double); \
FN(arg0, arg1, std::complex<float>); \
FN(arg0, arg1, std::complex<double>)
#define CURRY_TYPES3(FN, arg0, arg1) \
CURRY_TYPES3_NO_HALF(FN, arg0, arg1) \
FN(arg0, arg1, Eigen::half);
#define CAST_CASE(DEVICE, IN, OUT) \
if (DataTypeToEnum<OUT>::value == dst_dtype) { \
return [](OpKernelContext* ctx, const Tensor& inp, Tensor* out) { \
@ -154,6 +157,15 @@ GetGpuCastFromBfloat(DataType dst_dtype);
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromBool(DataType dst_dtype);
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromUint8(DataType dst_dtype);
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromUint16(DataType dst_dtype);
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromInt16(DataType dst_dtype);
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromInt32(DataType dst_dtype);
@@ -165,10 +177,8 @@ GetSyclCastFromFloat(DataType dst_dtype);
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromDouble(DataType dst_dtype);
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow
#endif // THIRD_PARTY_TENSORFLOW_CORE_KERNELS_CAST_OP_IMPL_H_


@@ -38,10 +38,9 @@ GetGpuCastFromBool(DataType dst_dtype) {
typedef Eigen::SyclDevice SYCLDevice;
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromBool(DataType dst_dtype) {
CURRY_TYPES3(CAST_CASE, SYCLDevice, bool);
CURRY_TYPES3_NO_HALF(CAST_CASE, SYCLDevice, bool);
return nullptr;
}
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -38,10 +38,9 @@ GetGpuCastFromDouble(DataType dst_dtype) {
typedef Eigen::SyclDevice SYCLDevice;
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromDouble(DataType dst_dtype) {
CURRY_TYPES3(CAST_CASE, SYCLDevice, double);
CURRY_TYPES3_NO_HALF(CAST_CASE, SYCLDevice, double);
return nullptr;
}
#endif // TENSORFLOW_USE_SYC
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -53,10 +53,9 @@ GetGpuCastFromFloat(DataType dst_dtype) {
typedef Eigen::SyclDevice SYCLDevice;
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromFloat(DataType dst_dtype) {
CURRY_TYPES3(CAST_CASE, SYCLDevice, float);
CURRY_TYPES3_NO_HALF(CAST_CASE, SYCLDevice, float);
return nullptr;
}
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -34,4 +34,13 @@ GetGpuCastFromInt16(DataType dst_dtype) {
}
#endif // GOOGLE_CUDA
#ifdef TENSORFLOW_USE_SYCL
typedef Eigen::SyclDevice SYCLDevice;
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromInt16(DataType dst_dtype) {
CURRY_TYPES3_NO_HALF(CAST_CASE, SYCLDevice, int16);
return nullptr;
}
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -38,9 +38,9 @@ GetGpuCastFromInt32(DataType dst_dtype) {
typedef Eigen::SyclDevice SYCLDevice;
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromInt32(DataType dst_dtype) {
CURRY_TYPES3(CAST_CASE, SYCLDevice, int32);
CURRY_TYPES3_NO_HALF(CAST_CASE, SYCLDevice, int32);
return nullptr;
}
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -38,9 +38,9 @@ GetGpuCastFromInt64(DataType dst_dtype) {
typedef Eigen::SyclDevice SYCLDevice;
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromInt64(DataType dst_dtype) {
CURRY_TYPES3(CAST_CASE, SYCLDevice, int64);
CURRY_TYPES3_NO_HALF(CAST_CASE, SYCLDevice, int64);
return nullptr;
}
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -34,4 +34,13 @@ GetGpuCastFromInt8(DataType dst_dtype) {
}
#endif // GOOGLE_CUDA
#ifdef TENSORFLOW_USE_SYCL
typedef Eigen::SyclDevice SYCLDevice;
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromInt8(DataType dst_dtype) {
CURRY_TYPES3_NO_HALF(CAST_CASE, SYCLDevice, int8);
return nullptr;
}
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -34,4 +34,13 @@ GetGpuCastFromUint16(DataType dst_dtype) {
}
#endif // GOOGLE_CUDA
#ifdef TENSORFLOW_USE_SYCL
typedef Eigen::SyclDevice SYCLDevice;
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromUint16(DataType dst_dtype) {
CURRY_TYPES3_NO_HALF(CAST_CASE, SYCLDevice, uint16);
return nullptr;
}
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -34,4 +34,13 @@ GetGpuCastFromUint8(DataType dst_dtype) {
}
#endif // GOOGLE_CUDA
#ifdef TENSORFLOW_USE_SYCL
typedef Eigen::SyclDevice SYCLDevice;
std::function<void(OpKernelContext*, const Tensor&, Tensor*)>
GetSyclCastFromUint8(DataType dst_dtype) {
CURRY_TYPES3_NO_HALF(CAST_CASE, SYCLDevice, uint8);
return nullptr;
}
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -22,17 +22,6 @@ REGISTER5(UnaryOp, CPU, "Abs", functor::abs, float, Eigen::half, double, int32,
REGISTER2(UnaryOp, CPU, "ComplexAbs", functor::abs, complex64, complex128);
#endif
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Abs") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::abs<TYPE>>);
REGISTER_SYCL_KERNEL(float);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER4(UnaryOp, GPU, "Abs", functor::abs, float, Eigen::half, double, int64);
REGISTER2(UnaryOp, GPU, "ComplexAbs", functor::abs, complex64, complex128);
@@ -48,4 +37,13 @@ REGISTER_KERNEL_BUILDER(Name("Abs")
UnaryOp<CPUDevice, functor::abs<int32>>);
#endif
#if TENSORFLOW_USE_SYCL
REGISTER3(UnaryOp, SYCL, "Abs", functor::abs, float, double, int64);
REGISTER_KERNEL_BUILDER(Name("Abs")
.Device(DEVICE_SYCL)
.HostMemory("x")
.HostMemory("y")
.TypeConstraint<int32>("T"),
UnaryOp<CPUDevice, functor::abs<int32>>);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -18,19 +18,11 @@ limitations under the License.
namespace tensorflow {
REGISTER2(UnaryOp, CPU, "Acos", functor::acos, float, double);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Acos") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::acos<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER2(UnaryOp, GPU, "Acos", functor::acos, float, double);
#endif
#if TENSORFLOW_USE_SYCL
REGISTER2(UnaryOp, SYCL, "Acos", functor::acos, float, double);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -0,0 +1,38 @@
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/core/kernels/cwise_ops_common.h"
#include "tensorflow/core/kernels/cwise_ops_gradients.h"
namespace tensorflow {
REGISTER4(UnaryOp, CPU, "Acosh", functor::acosh, float, double,
complex64, complex128);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Acosh") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::acosh<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER2(UnaryOp, GPU, "Acosh", functor::acosh, float, double);
#endif
} // namespace tensorflow


@@ -19,26 +19,6 @@ namespace tensorflow {
REGISTER5(BinaryOp, CPU, "Add", functor::add, float, Eigen::half, double, int32,
int64);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Add") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
BinaryOp<SYCLDevice, functor::add<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
REGISTER_KERNEL_BUILDER(Name("Add")
.Device(DEVICE_SYCL)
.HostMemory("x")
.HostMemory("y")
.HostMemory("z")
.TypeConstraint<int32>("T"),
BinaryOp<CPUDevice, functor::add<int32>>);
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER3(BinaryOp, GPU, "Add", functor::add, float, Eigen::half, double);
@@ -54,4 +34,15 @@ REGISTER_KERNEL_BUILDER(Name("Add")
BinaryOp<CPUDevice, functor::add<int32>>);
#endif
#if TENSORFLOW_USE_SYCL
REGISTER2(BinaryOp, SYCL, "Add", functor::add, float, double);
REGISTER_KERNEL_BUILDER(Name("Add")
.Device(DEVICE_SYCL)
.HostMemory("x")
.HostMemory("y")
.HostMemory("z")
.TypeConstraint<int32>("T"),
BinaryOp<CPUDevice, functor::add<int32>>);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -18,19 +18,11 @@ limitations under the License.
namespace tensorflow {
REGISTER2(UnaryOp, CPU, "Asin", functor::asin, float, double);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Asin") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::asin<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER2(UnaryOp, GPU, "Asin", functor::asin, float, double);
#endif
#if TENSORFLOW_USE_SYCL
REGISTER2(UnaryOp, SYCL, "Asin", functor::asin, float, double);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -0,0 +1,38 @@
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/core/kernels/cwise_ops_common.h"
#include "tensorflow/core/kernels/cwise_ops_gradients.h"
namespace tensorflow {
REGISTER4(UnaryOp, CPU, "Asinh", functor::asinh, float, double,
complex64, complex128);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Asinh") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::asinh<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYC
#if GOOGLE_CUDA
REGISTER2(UnaryOp, GPU, "Asinh", functor::asinh, float, double);
#endif
} // namespace tensorflow


@@ -18,19 +18,11 @@ limitations under the License.
namespace tensorflow {
REGISTER2(UnaryOp, CPU, "Atan", functor::atan, float, double);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Atan") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::atan<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER2(UnaryOp, GPU, "Atan", functor::atan, float, double);
#endif
#if TENSORFLOW_USE_SYCL
REGISTER2(UnaryOp, SYCL, "Atan", functor::atan, float, double);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -0,0 +1,38 @@
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/core/kernels/cwise_ops_common.h"
#include "tensorflow/core/kernels/cwise_ops_gradients.h"
namespace tensorflow {
REGISTER4(UnaryOp, CPU, "Atanh", functor::atanh, float, double,
complex64, complex128);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Atanh") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::atanh<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYC
#if GOOGLE_CUDA
REGISTER2(UnaryOp, GPU, "Atanh", functor::atanh, float, double);
#endif
} // namespace tensorflow


@@ -18,19 +18,11 @@ limitations under the License.
namespace tensorflow {
REGISTER3(UnaryOp, CPU, "Ceil", functor::ceil, float, Eigen::half, double);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Ceil") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::ceil<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER3(UnaryOp, GPU, "Ceil", functor::ceil, float, Eigen::half, double);
#endif
#if TENSORFLOW_USE_SYCL
REGISTER2(UnaryOp, SYCL, "Ceil", functor::ceil, float, double);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -19,19 +19,11 @@ namespace tensorflow {
REGISTER5(UnaryOp, CPU, "Cos", functor::cos, float, Eigen::half, double,
complex64, complex128);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Cos") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::cos<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER3(UnaryOp, GPU, "Cos", functor::cos, float, Eigen::half, double);
#endif
#ifdef TENSORFLOW_USE_SYCL
REGISTER2(UnaryOp, SYCL, "Cos", functor::cos, float, double);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -24,32 +24,6 @@ REGISTER5(BinaryOp, CPU, "TruncateDiv", functor::safe_div, uint8, uint16, int16,
int32, int64);
REGISTER5(BinaryOp, CPU, "RealDiv", functor::div, float, Eigen::half, double,
complex64, complex128);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Div") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
BinaryOp<SYCLDevice, functor::div<TYPE>>); \
REGISTER_KERNEL_BUILDER( \
Name("RealDiv") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
BinaryOp<SYCLDevice, functor::div<TYPE>>);
REGISTER_SYCL_KERNEL(float)
REGISTER_SYCL_KERNEL(double)
#undef REGISTER_SYCL_KERNEL
// A special GPU kernel for int32.
// TODO(b/25387198): Also enable int32 in device memory. This kernel
// registration requires all int32 inputs and outputs to be in host memory.
REGISTER_KERNEL_BUILDER(Name("Div")
.Device(DEVICE_SYCL)
.HostMemory("x")
.HostMemory("y")
.HostMemory("z")
.TypeConstraint<int32>("T"),
BinaryOp<CPUDevice, functor::safe_div<int32>>);
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER9(BinaryOp, GPU, "Div", functor::div, float, Eigen::half, double, uint8,
uint16, int16, int64, complex64, complex128);
@@ -70,4 +44,15 @@ REGISTER_KERNEL_BUILDER(Name("Div")
BinaryOp<CPUDevice, functor::safe_div<int32>>);
#endif
#ifdef TENSORFLOW_USE_SYCL
REGISTER2(BinaryOp, SYCL, "Div", functor::div, float, double);
REGISTER2(BinaryOp, SYCL, "RealDiv", functor::div, float, double);
REGISTER_KERNEL_BUILDER(Name("Div")
.Device(DEVICE_SYCL)
.HostMemory("x")
.HostMemory("y")
.HostMemory("z")
.TypeConstraint<int32>("T"),
BinaryOp<CPUDevice, functor::safe_div<int32>>);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -47,8 +47,8 @@ REGISTER_KERNEL_BUILDER(Name("Equal")
#endif
#ifdef TENSORFLOW_USE_SYCL
REGISTER2(BinaryOp, SYCL, "Equal", functor::equal_to, float, double);
REGISTER5(BinaryOp, SYCL, "Equal", functor::equal_to, float, double, uint8,
int8, int16);
REGISTER_KERNEL_BUILDER(Name("Equal")
.Device(DEVICE_SYCL)
.HostMemory("x")


@@ -19,19 +19,11 @@ namespace tensorflow {
REGISTER5(UnaryOp, CPU, "Exp", functor::exp, float, Eigen::half, double,
complex64, complex128);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Exp") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::exp<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER3(UnaryOp, GPU, "Exp", functor::exp, float, Eigen::half, double);
#endif
#if TENSORFLOW_USE_SYCL
REGISTER2(UnaryOp, SYCL, "Exp", functor::exp, float, double);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -22,6 +22,6 @@ REGISTER5(UnaryOp, CPU, "Expm1", functor::expm1, float, Eigen::half, double,
REGISTER3(UnaryOp, GPU, "Expm1", functor::expm1, float, Eigen::half, double);
#endif
#ifdef TENSORFLOW_USE_SYCL
REGISTER(UnaryOp, SYCL, "Expm1", functor::expm1, float);
REGISTER2(UnaryOp, SYCL, "Expm1", functor::expm1, float, double);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -18,19 +18,10 @@ limitations under the License.
namespace tensorflow {
REGISTER3(UnaryOp, CPU, "Floor", functor::floor, float, Eigen::half, double);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("Floor") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::floor<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER3(UnaryOp, GPU, "Floor", functor::floor, float, Eigen::half, double);
#endif
#ifdef TENSORFLOW_USE_SYCL
REGISTER2(UnaryOp, SYCL, "Floor", functor::floor, float, double);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -0,0 +1,27 @@
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#if GOOGLE_CUDA
#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
#include "tensorflow/core/kernels/cwise_ops_gpu_gradients.cu.h"
namespace tensorflow {
namespace functor {
DEFINE_UNARY2(acosh, float, double);
} // namespace functor
} // namespace tensorflow
#endif // GOOGLE_CUDA


@@ -0,0 +1,27 @@
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#if GOOGLE_CUDA
#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
#include "tensorflow/core/kernels/cwise_ops_gpu_gradients.cu.h"
namespace tensorflow {
namespace functor {
DEFINE_UNARY2(asinh, float, double);
} // namespace functor
} // namespace tensorflow
#endif // GOOGLE_CUDA


@@ -0,0 +1,27 @@
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#if GOOGLE_CUDA
#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
#include "tensorflow/core/kernels/cwise_ops_gpu_gradients.cu.h"
namespace tensorflow {
namespace functor {
DEFINE_UNARY2(atanh, float, double);
} // namespace functor
} // namespace tensorflow
#endif // GOOGLE_CUDA


@@ -34,11 +34,8 @@ REGISTER_KERNEL_BUILDER(Name("Greater")
BinaryOp<CPUDevice, functor::greater<int32>>);
#endif
#ifdef TENSORFLOW_USE_SYCL
REGISTER(BinaryOp, SYCL, "Greater", functor::greater, float);
REGISTER2(BinaryOp, SYCL, "Greater", functor::greater, float, double);
// A special GPU kernel for int32.
// TODO(b/25387198): Also enable int32 in device memory. This kernel
// registration requires all int32 inputs and outputs to be in host memory.
REGISTER_KERNEL_BUILDER(Name("Greater")
.Device(DEVICE_SYCL)
.HostMemory("x")
@@ -47,5 +44,4 @@ REGISTER_KERNEL_BUILDER(Name("Greater")
.TypeConstraint<int32>("T"),
BinaryOp<CPUDevice, functor::greater<int32>>);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -35,7 +35,7 @@ REGISTER_KERNEL_BUILDER(Name("GreaterEqual")
#endif
#ifdef TENSORFLOW_USE_SYCL
REGISTER(BinaryOp, SYCL, "GreaterEqual", functor::greater_equal, float);
REGISTER2(BinaryOp, SYCL, "GreaterEqual", functor::greater_equal, float, double);
REGISTER_KERNEL_BUILDER(Name("GreaterEqual")
.Device(DEVICE_SYCL)


@@ -19,20 +19,12 @@ namespace tensorflow {
REGISTER3(UnaryOp, CPU, "IsFinite", functor::isfinite, float, Eigen::half,
double);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("IsFinite") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::isfinite<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER3(UnaryOp, GPU, "IsFinite", functor::isfinite, float, Eigen::half,
double);
#endif
#ifdef TENSORFLOW_USE_SYCL
REGISTER2(UnaryOp, SYCL, "IsFinite", functor::isfinite, float, double);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -18,19 +18,11 @@ limitations under the License.
namespace tensorflow {
REGISTER3(UnaryOp, CPU, "IsInf", functor::isinf, float, Eigen::half, double);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("IsInf") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::isinf<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER3(UnaryOp, GPU, "IsInf", functor::isinf, float, Eigen::half, double);
#endif
#ifdef TENSORFLOW_USE_SYCL
REGISTER2(UnaryOp, SYCL, "IsInf", functor::isinf, float, double);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -18,19 +18,11 @@ limitations under the License.
namespace tensorflow {
REGISTER3(UnaryOp, CPU, "IsNan", functor::isnan, float, Eigen::half, double);
#if TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(TYPE) \
REGISTER_KERNEL_BUILDER( \
Name("IsNan") \
.Device(DEVICE_SYCL) \
.TypeConstraint<TYPE>("T"), \
UnaryOp<SYCLDevice, functor::isnan<TYPE>>);
REGISTER_SYCL_KERNEL(float);
REGISTER_SYCL_KERNEL(double);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#if GOOGLE_CUDA
REGISTER3(UnaryOp, GPU, "IsNan", functor::isnan, float, Eigen::half, double);
#endif
#ifdef TENSORFLOW_USE_SYCL
REGISTER2(UnaryOp, SYCL, "IsNan", functor::isnan, float, double);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow


@@ -35,7 +35,6 @@ REGISTER_KERNEL_BUILDER(Name("Less")
#endif
#ifdef TENSORFLOW_USE_SYCL
REGISTER3(BinaryOp, SYCL, "Less", functor::less, float, double, int64);
REGISTER_KERNEL_BUILDER(Name("Less")
.Device(DEVICE_SYCL)
.HostMemory("x")


@@ -35,8 +35,8 @@ REGISTER_KERNEL_BUILDER(Name("LessEqual")
#endif
#ifdef TENSORFLOW_USE_SYCL
REGISTER(BinaryOp, SYCL, "LessEqual", functor::less_equal, float);
REGISTER6(BinaryOp, SYCL, "LessEqual", functor::less_equal, float, double,
int64, uint8, int8, int16);
REGISTER_KERNEL_BUILDER(Name("LessEqual")
.Device(DEVICE_SYCL)
.HostMemory("x")
@@ -45,5 +45,4 @@ REGISTER_KERNEL_BUILDER(Name("LessEqual")
.TypeConstraint<int32>("T"),
BinaryOp<CPUDevice, functor::less_equal<int32>>);
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow

Some files were not shown because too many files have changed in this diff.