Merge changes from GitHub.

END_PUBLIC

---
Commit 9f8523640 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Update ops-related pbtxt files.

PiperOrigin-RevId: 173145770

---
Commit 01b6b0638 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Cut tracing memory cost

PiperOrigin-RevId: 173144626

---
Commit 5e23e0e67 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
[XLA] Erase cloned instructions on the fly when merging fusion nodes.

This avoids the awkward situation where an RNG which is clearly eligible for fusion becomes ineligible mid-fusion because it suddenly has an extra (dead) user.

PiperOrigin-RevId: 173141716

---
Commit 1038927c0 authored by Saurabh Saxena<srbs@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Add SerializeIterator op that serializes an IteratorResource into a variant tensor.
Add DeserializeIterator op that builds an IteratorResource from a variant tensor.
Move BundleReaderWrapper and BundleWriterWrapper from dataset.h to iterator_ops.cc.
Add generic key-value store interfaces IteratorStateReader and IteratorStateWriter for reading/writing state of iterators.
Get rid of IteratorBundleReader and IteratorBundleWriter.

PiperOrigin-RevId: 173140858

---
Commit 57f3e529d authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Internal change

PiperOrigin-RevId: 173136642

---
Commit 0e56ffb7b authored by Shanqing Cai<cais@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Fix breakages in OSS builds

See example breakages logs at:
http://ci.tensorflow.org/job/tensorflow-cl-cpu-python3-pip/10847/console
http://ci.tensorflow.org/job/tensorflow-cl-gpu/11008/console

1. CL/172477381 added the no_oss tag to tests with oss_serial tags, which broke the logic of OSS_SERIAL tests in pip.sh and run_pip_test.sh. This CL fixes that.

2. The nccl_kernels BUILD target in contrib/nccl/BUILD was missing some dependencies. This CL adds the missing ones.

Fixes: #13918
PiperOrigin-RevId: 173133914

---
Commit 3ed049b67 authored by Alexandre Passos<apassos@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Allows calling keras layers in eager mode.
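For illustration (not part of the commit), a minimal sketch of what this enables, assuming the TF 1.4-era contrib eager API (`tf.contrib.eager.enable_eager_execution`) and the core `tf.keras.layers` module:

```
import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

# A regular Keras layer, called directly on an eager tensor.
layer = tf.keras.layers.Dense(4, activation="relu")
y = layer(tf.constant([[1.0, 2.0, 3.0]]))
print(y.numpy().shape)  # (1, 4)
```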

PiperOrigin-RevId: 173129805

---
Commit 4ec6f2b07 authored by Alexandre Passos<apassos@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Switching contrib.summaries API to be context-manager-centric

PiperOrigin-RevId: 173129793

---
Commit 03b02ffc9 authored by Justine Tunney<jart@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Put Bazel mirror URLs first

PiperOrigin-RevId: 173127955

---
Commit 46ab25e4d authored by David Majnemer<majnemer@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
[XLA] Add support for convolutions with no spatial dimensions

PiperOrigin-RevId: 173126950

---
Commit fc56349b7 authored by Derek Murray<mrry@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
[tf.data] Convert dataset arguments to tensors as early as possible.

This change raises a `TypeError` earlier if (for example) the `batch_size`
argument to `Dataset.batch()` has the incorrect type.
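For illustration (not part of the commit), a hedged sketch of the behavior this enables; whether a given bad value raises `TypeError` or another error depends on the tensor-conversion path:

```
import tensorflow as tf

dataset = tf.data.Dataset.range(10)
try:
    # batch_size is converted to a tensor at call time, so the bad type
    # is caught here rather than when the iterator is first used.
    dataset.batch("not-an-int")
except TypeError as e:
    print("caught at construction time:", e)
```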

PiperOrigin-RevId: 173126678

---
Commit 4f7503a87 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
K-FAC: Support for registering multiple minibatches with register_fully_connected()

PiperOrigin-RevId: 173121735

---
Commit 2845bfcd6 authored by Tim Harley<tharley@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Avoid listing all modified Enter/RefEnter nodes at INFO; use VLOG(1) instead.

Leave a single, simple message at INFO.

PiperOrigin-RevId: 173121726

---
Commit 434695921 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
K-FAC: _check_registration() supports multiple towers.

PiperOrigin-RevId: 173115870

---
Commit 670dddf4a authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Multi-minibatch support for
tf.contrib.kfac.fisher_blocks.FullyConnectedKFACBasicFB.

PiperOrigin-RevId: 173109677

---
Commit dc13a8e2f authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Fix import of meta graphs with partitioned variables into a scope.

Saver inspects SliceInfo to decide the variable name when creating a
checkpoint. Before this fix, even if a partitioned variable ("weights")
was imported into a scope "a", it would still be checkpointed as ("weights")
instead of ("a/weights"), since import_scoped_meta_graph was not adjusting
the SliceInfo.

WARNING: if you use import_meta_graph on graphs with partitioned variables
WITH an import_scope argument AND then create a Saver to write/read
checkpoints, this change may break your checkpoint loading.
PiperOrigin-RevId: 173105796

---
Commit eea089bdb authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
K-FAC: Multi-tower support for ConvDiagonalFB.

PiperOrigin-RevId: 173105412

---
Commit 9b9cbbe2a authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Add int64 Tperm type support for `Transpose` (#13909)

* Add int64 Tperm type support for `Transpose`

This fix adds int64 Tperm support for `Transpose`. In
`array_ops.cc`, `Transpose` and `ConjugateTranspose`
are specified as accepting both int32 and int64 perm
types; however, only int32 kernels had been registered.

This fix adds int64 perm support by removing
the constraint on Tperm, resolving the type at runtime,
and copying the data accordingly so that both int64 and
int32 perm types are handled correctly.

Additional tests have been added as well. A usage sketch follows at the
end of this entry.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add test cases for int64 of perm in Transpose.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add namespace to hide PermutationHelper

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Enable use_gpu=True for perm type test.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* extra // namespace annotation

* Adding a comment about int32 casting that should be safe.

Permutations only contain values that refer to dimensions, and the maximum number of dimensions we have is 254, so an int32 is always safe here.
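As referenced above, a usage sketch (not part of the original PR) of an int64 `perm` passed to `tf.transpose`, assuming a TF 1.x graph session:

```
import tensorflow as tf

x = tf.reshape(tf.range(6), [2, 3])
perm = tf.constant([1, 0], dtype=tf.int64)  # int64 perm, now has a registered kernel
y = tf.transpose(x, perm)

with tf.Session() as sess:
    print(sess.run(y))  # [[0 3] [1 4] [2 5]]
```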

---
Commit ac0004e71 authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Add int64 shape support on GPU for stateless random ops. (#13908)

* Add int64 shape support on GPU for stateless random ops.

This fix adds int64 shape support on GPU for stateless random ops
`StatelessRandomUniform`, `StatelessRandomNormal`, `StatelessTruncatedNormal`.

The int64 shape for stateless random ops is already supported on CPU
with int32/int64 processed properly through `MakeShape`.

However, on GPU a type constraint `.TypeConstraint<int32>("T")`
had been added improperly. That type constraint prevents
an int64 shape type from running on GPU (by comparison, there is
no such constraint on CPU).

This fix removes the type constraint and allows int64 shapes to run on GPU.
A usage sketch follows at the end of this entry.

This fix also adds test cases for int64 shape support on stateless random ops.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add test cases for int64 shape support for stateless random ops.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add int32 to shape types tested.
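As referenced above, a usage sketch (not part of the original PR); the `tf.contrib.stateless` wrapper name and signature are assumptions about the contrib API of that era:

```
import tensorflow as tf

shape = tf.constant([2, 3], dtype=tf.int64)  # int64 shape, now registered on GPU too
seed = tf.constant([1, 2], dtype=tf.int32)   # stateless ops take an explicit [2] seed

with tf.device("/gpu:0"):
    sample = tf.contrib.stateless.stateless_random_uniform(shape, seed)

# allow_soft_placement lets the example still run on CPU-only machines.
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print(sess.run(sample))
```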

---
Commit 0d437c3be authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Add int64 padding support for MirrorPad (#13907)

* Add int64 padding support for MirrorPad

This fix adds int64 padding support for `MirrorPad`.
In `array_ops.cc`, `MirrorPad`/`MirrorPadGrad`
are specified as supporting int64 paddings, but the related
kernels did not have int64 paddings registered.
This fix adds the int64 padding support, along with
additional test cases for coverage. A usage sketch follows at the end of
this entry.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Update template for CPU and GPU support of int64 paddings.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add int64 padding support for MirrorPad

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Put eigen header first like before, just in case.
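As referenced above, a usage sketch (not part of the original PR); `tf.pad` with `mode="REFLECT"` dispatches to the `MirrorPad` kernel:

```
import tensorflow as tf

x = tf.constant([[1, 2, 3], [4, 5, 6]])
paddings = tf.constant([[1, 1], [2, 2]], dtype=tf.int64)  # int64 paddings
y = tf.pad(x, paddings, mode="REFLECT")  # lowered to MirrorPad

with tf.Session() as sess:
    print(sess.run(y).shape)  # (4, 7)
```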

---
Commit 690003cc0 authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Add `int64` type `multiples` support for `tf.tile` (#13884)

* Add `int64` type `multiples` support for `tf.tile`

In the docs for `tf.tile` (tf.tile.__doc__), both `int32`
and `int64` are supported for `multiples`. However, the kernel
for `int64` was not registered yet.

This fix adds support for `int64` `multiples` so that the
behavior matches the description in the docs. A usage sketch follows at
the end of this entry.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Update functors for int64 multiples support in `tf.tile`

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Update test cases for int64 of multiples in `tf.tile`

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add GPU and non GPU tests

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* format with clang-format -i

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Move Tmultiples after T (as it is auxiliary)

And use `use_gpu=True`

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
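As referenced above, a usage sketch (not part of the original PR) of int64 `multiples` with `tf.tile`:

```
import tensorflow as tf

x = tf.constant([[1, 2], [3, 4]])
multiples = tf.constant([2, 3], dtype=tf.int64)  # int64 multiples
y = tf.tile(x, multiples)

with tf.Session() as sess:
    print(sess.run(y).shape)  # (4, 6)
```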

---
Commit fd8d517b9 authored by Yunxing Dai<yunxing@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Add tests for convolution 1D
RELNOTES: n/a

PiperOrigin-RevId: 173060283

---
Commit 40c475b48 authored by formath<jinpengliu@163.com>
Committed by Vijay Vasudevan<vrv@google.com>:
add segment_reduction_ops to tf_op_files (#13901)

---
Commit bfa4ec194 authored by Tayo Oguntebi<10927929+tayo@users.noreply.github.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Update node_def.proto comments (#13874)

The device field had outdated comments.

Note: We could consider adding tpu as an example here, e.g. "gpu" | "cpu" | "tpu".  Thoughts?
---
Commit c9cb5a58d authored by formath<jinpengliu@163.com>
Committed by Vijay Vasudevan<vrv@google.com>:
protobuf lib path bug fix for benchmark on OS X (#13878)

---
Commit 1c1dad105 authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Add int64 axis support for reduction ops. (#13891)

* Add int64 axis support for reduction ops.

This fix is a follow-up to PR 13863. PR 13863 fixed the
program crash when an int64 axis is passed to reduction ops,
e.g. reduce_sum, reduce_max, etc. However, 13863 did not
add int64 support; it merely fixed the crash.

This fix adds support for int64 axes in reduction ops. A usage sketch
follows at the end of this entry.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add int64 axis support for mean, prod, sum

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add int64 axis support for min and max.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add int64 axis support for reduce_all and reduce_any

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add test cases for int64 axis support of reduce_any and reduce_all

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
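As referenced above, a usage sketch (not part of the original PR) of an int64 axis passed to a reduction op:

```
import tensorflow as tf

x = tf.constant([[1, 2, 3], [4, 5, 6]])
axis = tf.constant(1, dtype=tf.int64)  # previously only int32 axes had kernels
y = tf.reduce_sum(x, axis)

with tf.Session() as sess:
    print(sess.run(y))  # [ 6 15]
```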

---
Commit 17096081e authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Improve resize_bicubic performance by reorganizing loops (#13840)

* Improve resize_bicubic performance by reorganizing loops

This fix tries to address the issue raised in 13693, where the
performance of `resize_bicubic` is not on par with OpenCV.

This fix rearranges the loops so that the code path is the same for
num_channels=40 and num_channels=3 (a rough timing sketch also follows at
the end of this entry):

Pre-fix:
```
CHANNEL=40
opencv: 145.08ms
tf: 314.26ms

CHANNEL=3
opencv: 11.95ms
tf: 8.95ms
```

Post-fix:
```
CHANNEL=40
opencv: 144.25ms
tf: 214.55ms

CHANNEL=3
opencv: 11.78ms
tf: 14.07ms
```

This fix fixes 13693.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Keep special handling of `num_channels=3` for `resize_bicubic`

This commit keeps special handling of `num_channels=3` for
`resize_bicubic`:
Without special handling:
```
opencv: 11.78ms
tf: 14.07ms
```
With special handling:
```
opencv: 11.74ms
tf: 9.46ms
```

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Expand Benchmark test for resize_bicubic

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Update from review feedback.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
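For illustration (not part of the PR), a rough timing sketch along the lines of the numbers above; image sizes and iteration counts here are arbitrary assumptions, not the benchmark used in the PR:

```
import time
import numpy as np
import tensorflow as tf

def bench(channels, iters=10):
    img = tf.constant(np.random.rand(1, 480, 640, channels), dtype=tf.float32)
    resized = tf.image.resize_bicubic(img, [960, 1280])
    with tf.Session() as sess:
        sess.run(resized)  # warm-up
        start = time.time()
        for _ in range(iters):
            sess.run(resized)
        return (time.time() - start) / iters * 1000.0

for c in (3, 40):
    print("channels=%d: %.2f ms per resize" % (c, bench(c)))
```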

---
Commit b927df57f authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Update protobuf.cmake to b04e5cba356212e4e8c66c61bbe0c3a20537c5b9 (#13893)

This fix tries to address the issue raised in 8187, where
protobuf.cmake used a different protobuf version than bazel.

The discrepancy was due to the fact that a customized
protobuf with a Windows patch was needed. Since the patch has been
merged upstream (https://github.com/google/protobuf/pull/2203),
it makes sense to update protobuf.cmake so that the same version
of protobuf is used in both builds.

This fix fixes 8187.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
---
Commit d1183ca6a authored by Vijay Vasudevan<vrv@google.com>
Committed by GitHub<noreply@github.com>:
Give each variable a unique name in accumulate_n_v2_eager_test. (#13886)

---
Commit a69945810 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Update pin for bazel-toolchains to latest version

PiperOrigin-RevId: 173002530

---
Commit 9d55c249c authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Fix doc in TF_CALL_ when invoked in mobile platform (#13881)

* Fix doc in TF_CALL_ when defined(IS_MOBILE_PLATFORM) && !defined(__ANDROID_TYPES_FULL__)

This is a small doc fix that includes bool as part of the types
that are supported on mobile (IS_MOBILE_PLATFORM && !__ANDROID_TYPES_FULL__),
as bool is clearly invoked in the following define.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Also add bool to android full version.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

---
Commit ba49d8583 authored by Bjarke Hammersholt Roune<broune@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Slight change to reduce_test to avoid generating inf, which was triggering an inf detector unnecessarily.

PiperOrigin-RevId: 172965466

---
Commit 93e8f3c67 authored by Anna R<annarev@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Adding Python ApiDef overrides.

PiperOrigin-RevId: 172960496

---
Commit 0d6a2e353 authored by Anna R<annarev@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Internal change.

PiperOrigin-RevId: 172960439

---
Commit 62df65c72 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Add dtype argument to Mean and Accuracy object-oriented metrics.

PiperOrigin-RevId: 172957714

---
Commit d7409d32b authored by Simone Cirillo<my.accounts@gmx.se>
Committed by Vijay Vasudevan<vrv@google.com>:
Fix import of spatial_softmax from tensorflow.contrib.layers (#13833)

---
Commit df8bce63d authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Fix crash when `int64` axis is passed to `tf.reduce_sum` (#13863)

* Fix crash when `int64` axis is passed to `tf.reduce_sum`

This fix tries to fix the crash triggered by `int64` axis passed
to `tf.reduce_sum`:
```
ubuntu@ubuntu:~/tensorflow2$ (cd && python)
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> v = tf.reduce_sum([1,2,3], tf.constant(0, tf.int64))
2017-10-20 15:55:06.993430: F tensorflow/core/framework/tensor.cc:601] Check failed: dtype() == expected_dtype (9 vs. 3)
ubuntu@ubuntu:~/tensorflow2$
```

The issue is caused by the fact that shape inference in `common_shape_fns.cc`
only assumes int32, without proper handling of different types. In `math_ops.cc`
both int32 and int64 are mentioned.

NOTE that this fix does not address the issue that int64 is not supported.
Allowing an int64 axis takes more than adding a template parameter to
`ReductionOp`, as the type of the axis seems to be decided some other way in
Eigen.

This fix merely fixes the crash so that an error message
("No OpKernel was registered to support Op 'Sum' with these attrs") is
returned instead of the Python program exiting.

Still, I think it's worthwhile to at least allow the program to continue in
case of an unsupported kernel.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Update implementation with a template helper function.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

---
Commit 29c7b4658 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Adding the Stanford TensorFlow class to community resources.

PiperOrigin-RevId: 172956049

---
Commit f758b24a8 authored by Alexandre Passos<apassos@google.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Variable name for the eager test (#13873)

---
Commit a5fe66b15 authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Removed some unnecessary broadcasts in binary ops where only one input needs
broadcasting (which is a fairly common case, even in the fallback path).

PiperOrigin-RevId: 172950493

---
Commit c77090a0a authored by Yong Tang<yong.tang.github@outlook.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Fix issues where int64 crops could not be passed to batch_to_space. (#13862)

* Fix issues where int64 crops could not be passed to batch_to_space.

This fix tries to address the issue where int64 `crops` could
not be passed to `batch_to_space`, even though both int32 and
int64 are specified as supported in the docs (tf.batch_to_space.__doc__).

The reason is that the BatchToSpace kernel puts an int32 constraint on the
crops data type.

This fix removes the constraint so that int64 `crops` is supported. A usage
sketch follows at the end of this entry.

NOTE: Just removing the constraint should work, and it is not necessary
to add a specialization to the kernel class template, as `SubtleMustCopyFlat`
called in the class already handles both the int32 and int64 cases correctly.
Besides, other data types (e.g., float or double) will not be passed to the
kernel, as they are guarded by the specification in `array_ops.cc`.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Also remove int64/int32 type constraints for SpaceToBatch kernels

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add test cases for int64 crops of batch_to_space and space_to_batch

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Fix test failures.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
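As referenced above, a usage sketch (not part of the original PR), assuming the TF 1.x `tf.batch_to_space(input, crops, block_size)` signature:

```
import tensorflow as tf

x = tf.reshape(tf.range(4), [4, 1, 1, 1])
crops = tf.constant([[0, 0], [0, 0]], dtype=tf.int64)  # int64 crops
y = tf.batch_to_space(x, crops, block_size=2)

with tf.Session() as sess:
    print(sess.run(y).shape)  # (1, 2, 2, 1)
```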

---
Commit 494837936 authored by Joshua V. Dillon<jvdillon@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Make `tf.contrib.distributions` quadrature family accept a `Tensor` for
`quadrature_grid_and_probs` argument.

PiperOrigin-RevId: 172950094

---
Commit 9c825d32c authored by Jinze Bai<baijinze1994@163.com>
Committed by Vijay Vasudevan<vrv@google.com>:
Merge two GPU kernel launches into one in DiagOp. (#13859)

---
Commit c0ca50a47 authored by Yan Facai (颜发才)<facai.yan@gmail.com>
Committed by Vijay Vasudevan<vrv@google.com>:
ENH: add Relu6GradGrad (#13268)

* ENH: add Relu6GradGrad

* TST: add test case

* CLN: import nn_grad

* TST: add init value
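For illustration (not part of the PR), a sketch of the second-order gradient that this registration enables; the second `tf.gradients` call needs a gradient for `Relu6Grad`:

```
import tensorflow as tf

x = tf.constant([-1.0, 2.0, 7.0])
y = tf.nn.relu6(x)
dy_dx = tf.gradients(y, x)[0]        # Relu6Grad: [0., 1., 0.]
d2y_dx2 = tf.gradients(dy_dx, x)[0]  # needs Relu6GradGrad; zero almost everywhere

with tf.Session() as sess:
    print(sess.run([dy_dx, d2y_dx2]))
```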

---
Commit 8ff33271e authored by Justin Lebar<jlebar@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Dump the computation's SessionModule as part of the tf_compile rule.

PiperOrigin-RevId: 172946149

---
Commit ebcae4a5e authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Add streaming_precision_recall_at_equal_thresholds

This helper method computes streaming tp, fp, tn, fn, precision, and recall for the user in a way that exhibits O(T + N) time and space complexity (instead of O(T * N)), where T is the number of thresholds and N is the size of the predictions tensor.

Thanks to Frank Chu for the efficient algorithm!
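For illustration (not part of the commit), a standalone NumPy sketch of the bucketing idea behind the O(T + N) bound; it is not the TF implementation and ignores weights and interpolation details:

```
import numpy as np

def pr_at_equal_thresholds(labels, predictions, num_thresholds):
    # Assumes predictions lie in [0, 1] and thresholds are equally spaced.
    labels = np.asarray(labels, dtype=bool)
    predictions = np.asarray(predictions, dtype=np.float64)
    # Assign each prediction to one of T equally spaced buckets: O(N).
    buckets = np.minimum((predictions * num_thresholds).astype(int),
                         num_thresholds - 1)
    tp_hist = np.bincount(buckets[labels], minlength=num_thresholds)
    fp_hist = np.bincount(buckets[~labels], minlength=num_thresholds)
    # Suffix sums yield tp/fp at every threshold in O(T), instead of
    # comparing each prediction against each threshold (O(T * N)).
    tp = np.cumsum(tp_hist[::-1])[::-1].astype(np.float64)
    fp = np.cumsum(fp_hist[::-1])[::-1].astype(np.float64)
    fn = labels.sum() - tp
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    return precision, recall

p, r = pr_at_equal_thresholds([1, 0, 1, 1, 0], [0.9, 0.8, 0.4, 0.3, 0.1], 4)
print(p, r)
```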

PiperOrigin-RevId: 172946073

---
Commit ccfd9c1e5 authored by Sanjoy Das<sanjoy@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Log Hlo IR during AOT compilation

PiperOrigin-RevId: 172944165

---
Commit 985031a10 authored by Alexandre Passos<apassos@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Allows tfe.enable_eager_execution(device_policy=tfe.DEVICE_POLICY_WARN).

PiperOrigin-RevId: 172943398

---
Commit 703182d85 authored by Mingxing Tan<tanmingxing@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Add performance guide for fused decode_and_crop_jpeg optimization.
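For illustration (not part of the commit), a sketch contrasting the fused op with the unfused baseline; the file path is a placeholder and the crop window is an arbitrary [y, x, height, width] assumption:

```
import tensorflow as tf

raw = tf.read_file("image.jpg")   # placeholder path
crop_window = [20, 30, 100, 100]  # y, x, height, width

# Fused op: only the kept region is fully decoded.
cropped = tf.image.decode_and_crop_jpeg(raw, crop_window)

# Unfused baseline: decode the whole image, then crop.
full = tf.image.decode_jpeg(raw)
baseline = tf.image.crop_to_bounding_box(full, 20, 30, 100, 100)
```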

PiperOrigin-RevId: 172943116

---
Commit 66b1f4383 authored by Francois Chollet<fchollet@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Make Network compatible with eager mode. Currently it only allows instantiating a Network in eager mode using the regular Keras API and calling it on eager tensors.

PiperOrigin-RevId: 172942569

---
Commit 41df2cec2 authored by ashankar<ashankar@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Testing pending CL: 172939383

---
Commit 37fd95179 authored by Alexandre Passos<apassos@google.com>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
Simplifies capturing code in graph_callable to use recent function improvements.

PiperOrigin-RevId: 172937003

---
Commit d1e7382af authored by A. Unique TensorFlower<gardener@tensorflow.org>
Committed by TensorFlower Gardener<gardener@tensorflow.org>:
BEGIN_PUBLIC
Automated g4 rollback of changelist 172924803

PiperOrigin-RevId: 173347587
This commit is contained in:
Benoit Steiner 2017-10-24 19:47:46 -07:00 committed by TensorFlower Gardener
parent e384e28a97
commit 355e25ebca
193 changed files with 5394 additions and 1399 deletions

.gitignore

@ -17,3 +17,8 @@ cmake_build/
.idea/**
/build/
/tensorflow/core/util/version_info.cc
/tensorflow/python/framework/fast_tensor_util.cpp
Pods
Podfile.lock
*.pbxproj
*.xcworkspacedata


@ -38,10 +38,11 @@ People who are a little more adventurous can also try our nightly binaries:
**Nightly pip packages**
* We are pleased to announce that TensorFlow now offers nightly pip packages
under the [tf-nightly](https://pypi.python.org/pypi/tf-nightly) project on pypi.
Simply run `pip install tf-nightly` in a clean environment to install the nightly
tensorflow build. We currently only support CPU packages on Linux, Mac, and Windows.
GPU packages on all platforms will arrive soon!
under the [tf-nightly](https://pypi.python.org/pypi/tf-nightly) and
[tf-nightly-gpu](https://pypi.python.org/pypi/tf-nightly-gpu) project on pypi.
Simply run `pip install tf-nightly` or `pip install tf-nightly-gpu` in a clean
environment to install the nightly TensorFlow build. We support CPU and GPU
packages on Linux, Mac, and Windows.
**Individual whl files**


@ -1,20 +1,51 @@
# Release 1.4.0
## Major Features And Improvements
* `tf.keras` is now part of the core TensorFlow API.
* [`tf.data`](http://tensorflow.org/programmers_guide/datasets) is now part of
the core TensorFlow API.
* The API is now subject to backwards compatibility guarantees.
* For a guide to migrating from the `tf.contrib.data` API, see the
[README] (https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/contrib/data/README.md).
[README](https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/contrib/data/README.md).
* Major new features include `Dataset.from_generator()` (for building an input
pipeline from a Python generator), and the `Dataset.apply()` method for
applying custom transformation functions.
* Several custom transformation functions have been added, including
`tf.contrib.data.batch_and_drop_remainder()` and
`tf.contrib.data.sloppy_interleave()`.
* Add `train_and_evaluate` for simple distributed `Estimator` training.
* Add `tf.spectral.dct` for computing the DCT-II.
* Add Mel-Frequency Cepstral Coefficient support to `tf.contrib.signal`
(with GPU and gradient support).
* Add a self-check on `import tensorflow` for Windows DLL issues.
* Add NCHW support to `tf.depth_to_space` on GPU.
* SinhArcsinh (scalar) distribution added to `contrib.distributions`.
* Make `GANEstimator` opensource.
* `Estimator.export_savedmodel()` now includes all valid serving signatures
that can be constructed from the Serving Input Receiver and all available
ExportOutputs. For instance, a classifier may provide regression- and
prediction-flavored outputs, in addition to the classification-flavored one.
Building signatures from these allows TF Serving to honor requests using the
different APIs (Classify, Regress, and Predict). Furthermore,
`serving_input_receiver_fn()` may now specify alternative subsets of nodes
that may act as inputs. This allows, for instance, producing a prediction
signature for a classifier that accepts raw `Tensors` instead of a serialized
`tf.Example`.
* Add `tf.contrib.bayesflow.hmc`.
* Add `tf.contrib.distributions.MixtureSameFamily`.
* Make `Dataset.shuffle()` always reshuffles after each iteration by default.
* Add `tf.contrib.bayesflow.metropolis_hastings`.
* Add `log_rate` parameter to `tf.contrib.distributions.Poisson`.
* Extend `tf.contrib.distributions.bijector` API to handle some non-injective
transforms.
* Java:
* Generics (e.g., `Tensor<Integer>`) for improved type-safety (courtesy @andrewcmyers).
* Generics (e.g., `Tensor<Integer>`) for improved type-safety
(courtesy @andrewcmyers).
* Support for multi-dimensional string tensors.
* Support loading of custom operations (e.g. many in `tf.contrib`) on Linux
and OS X
* All our prebuilt binaries have been built with CUDA 8 and cuDNN 6.
We anticipate releasing TensorFlow 1.5 with CUDA 9 and cuDNN 7.
## Bug Fixes and Other Changes
* `tf.nn.rnn_cell.DropoutWrapper` is now more careful about dropping out LSTM
@ -26,11 +57,57 @@
* Removed `tf.contrib.training.python_input`. The same behavior, in a more
flexible and reproducible package, is available via the new
`tf.contrib.data.Dataset.from_generator` method!
* Fix `tf.contrib.distributions.Affine` incorrectly computing log-det-jacobian.
* Fix `tf.random_gamma` incorrectly handling non-batch, scalar draws.
* Resolved a race condition in TensorForest TreePredictionsV4Op.
* Google Cloud Storage file system and Hadoop file system support are now
default build options.
* Custom op libraries must link against libtensorflow_framework.so
(installed at `tf.sysconfig.get_lib()`).
## Breaking Changes to the API
* The signature of the `tf.contrib.data.rejection_resample()` function has been
changed. It now returns a function that can be used as an argument to
`Dataset.apply()`.
* Remove `tf.contrib.data.Iterator.from_dataset()` method. Use
`Dataset.make_initializable_iterator()` instead.
* Remove seldom used and unnecessary `tf.contrib.data.Iterator.dispose_op()`.
* Reorder some TFGAN loss functions in a non-backwards compatible way.
## Thanks to our Contributors
This release contains contributions from many people at Google, as well as:
4d55397500, Abdullah Alrasheed, abenmao, Adam Salvail, Aditya Dhulipala, Ag Ramesh,
Akimasa Kimura, Alan Du, Alan Yee, Alexander, Amit Kushwaha, Amy, Andrei Costinescu,
Andrei Nigmatulin, Andrew Erlichson, Andrew Myers, Andrew Stepanov, Androbin, AngryPowman,
Anish Shah, Anton Daitche, Artsiom Chapialiou, asdf2014, Aseem Raj Baranwal, Ash Hall,
Bart Kiers, Batchu Venkat Vishal, ben, Ben Barsdell, Bill Piel, Carl Thomé, Catalin Voss,
Changming Sun, Chengzhi Chen, Chi Zeng, Chris Antaki, Chris Donahue, Chris Oelmueller,
Chris Tava, Clayne Robison, Codrut, Courtial Florian, Dalmo Cirne, Dan J, Darren Garvey,
David Kristoffersson, David Norman, David RöThlisberger, DavidNorman, Dhruv, DimanNe,
Dorokhov, Duncan Mac-Vicar P, EdwardDixon, EMCP, error.d, FAIJUL, Fan Xia,
Francois Xavier, Fred Reiss, Freedom" Koan-Sin Tan, Fritz Obermeyer, Gao, Xiang,
Guenther Schmuelling, Guo Yejun (郭叶军), Hans Gaiser, HectorSVC, Hyungsuk Yoon,
James Pruegsanusak, Jay Young, Jean Wanka, Jeff Carpenter, Jeremy Rutman, Jeroen BéDorf,
Jett Jones, Jimmy Jia, jinghuangintel, jinze1994, JKurland, Joel Hestness, joetoth,
John B Nelson, John Impallomeni, John Lawson, Jonas, Jonathan Dekhtiar, joshkyh, Jun Luan,
Jun Mei, Kai Sasaki, Karl Lessard, karl@kubx.ca, Kb Sriram, Kenichi Ueno, Kevin Slagle,
Kongsea, Lakshay Garg, lhlmgr, Lin Min, liu.guangcong, Loki Der Quaeler, Louie Helm,
lucasmoura, Luke Iwanski, Lyndon White, Mahmoud Abuzaina, Marcel Puyat, Mark Aaron Shirley,
Michele Colombo, MtDersvan, Namrata-Ibm, Nathan Luehr, Naurril, Nayana Thorat, Nicolas Lopez,
Niranjan Hasabnis, Nolan Liu, Nouce, Oliver Hennigh, osdamv, Patrik Erdes,
Patryk Chrabaszcz, Pavel Christof, Penghao Cen, postBG, Qingqing Cao, Qingying Chen, qjivy,
Raphael, Rasmi, raymondxyang, Renze Yu, resec, Roffel, Ruben Vereecken, Ryohei Kuroki,
sandipmgiri, Santiago Castro, Scott Kirkland, Sean Vig, Sebastian Raschka, Sebastian Weiss,
Sergey Kolesnikov, Sergii Khomenko, Shahid, Shivam Kotwalia, Stuart Berg, Sumit Gouthaman,
superzerg, Sven Mayer, tetris, Ti Zhou, Tiago Freitas Pereira, Tian Jin, Tomoaki Oiki,
Vaibhav Sood, vfdev, Vivek Rane, Vladimir Moskva, wangqr, Weber Xie, Will Frey,
Yan Facai (颜发才), yanivbl6, Yaroslav Bulatov, Yixing Lao, Yong Tang, youkaichao,
Yuan (Terry) Tang, Yue Zhang, Yuxin Wu, Ziming Dong, ZxYuan, 黄璞
We are also grateful to all who filed issues or helped resolve them, asked and
answered questions, and were part of inspiring discussions.
# Release 1.3.0


@ -989,6 +989,7 @@ def main():
run_gen_git_source(environ_cp)
if is_windows():
environ_cp['TF_NEED_S3'] = '0'
environ_cp['TF_NEED_GCP'] = '0'
environ_cp['TF_NEED_HDFS'] = '0'
environ_cp['TF_NEED_JEMALLOC'] = '0'
@ -1001,9 +1002,9 @@ def main():
set_build_var(environ_cp, 'TF_NEED_JEMALLOC', 'jemalloc as malloc',
'with_jemalloc', True)
set_build_var(environ_cp, 'TF_NEED_GCP', 'Google Cloud Platform',
'with_gcp_support', False, 'gcp')
'with_gcp_support', True, 'gcp')
set_build_var(environ_cp, 'TF_NEED_HDFS', 'Hadoop File System',
'with_hdfs_support', False, 'hdfs')
'with_hdfs_support', True, 'hdfs')
set_build_var(environ_cp, 'TF_NEED_S3', 'Amazon S3 File System',
'with_s3_support', True, 's3')
set_build_var(environ_cp, 'TF_ENABLE_XLA', 'XLA JIT', 'with_xla_support',


@ -323,6 +323,7 @@ filegroup(
"//tensorflow/compiler/jit/kernels:all_files",
"//tensorflow/compiler/jit/legacy_flags:all_files",
"//tensorflow/compiler/jit/ops:all_files",
"//tensorflow/compiler/plugin:all_files",
"//tensorflow/compiler/tests:all_files",
"//tensorflow/compiler/tf2xla:all_files",
"//tensorflow/compiler/tf2xla/cc:all_files",


@ -1153,7 +1153,7 @@ TF_CAPI_EXPORT extern TF_Function* TF_FunctionImportFunctionDef(
const void* proto, size_t proto_len, TF_Status* status);
// Sets function attribute named `attr_name` to value stored in `proto`.
// If this attribute is already set to another value, it is overriden.
// If this attribute is already set to another value, it is overridden.
// `proto` should point to a sequence of bytes of length `proto_len`
// representing a binary serialization of an AttrValue protocol
// buffer.


@ -13,6 +13,9 @@ See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#define _USE_MATH_DEFINES
#include <cmath>
#include "tensorflow/cc/ops/array_ops_internal.h"
#include "tensorflow/cc/ops/math_ops_internal.h"
#include "tensorflow/cc/ops/standard_ops.h"
@ -200,8 +203,8 @@ Status TanhGrad(const Scope& scope, const Operation& op,
// evaluated.
Scope grad_scope = scope.WithControlDependencies(grad);
auto y = ConjugateHelper(grad_scope, op.output(0));
grad_outputs->push_back(internal::TanhGrad(scope, y, grad));
return scope.status();
grad_outputs->push_back(internal::TanhGrad(grad_scope, y, grad));
return grad_scope.status();
}
REGISTER_GRADIENT_OP("Tanh", TanhGrad);
@ -256,8 +259,8 @@ Status SigmoidGrad(const Scope& scope, const Operation& op,
// evaluated.
Scope grad_scope = scope.WithControlDependencies(grad);
auto y = ConjugateHelper(grad_scope, op.output(0));
grad_outputs->push_back(internal::SigmoidGrad(scope, y, grad));
return scope.status();
grad_outputs->push_back(internal::SigmoidGrad(grad_scope, y, grad));
return grad_scope.status();
}
REGISTER_GRADIENT_OP("Sigmoid", SigmoidGrad);
@ -696,15 +699,32 @@ Status MeanGrad(const Scope& scope, const Operation& op,
}
REGISTER_GRADIENT_OP("Mean", MeanGrad);
Status ErfGrad(const Scope& scope, const Operation& op,
const std::vector<Output>& grad_inputs,
std::vector<Output>* grad_outputs) {
auto grad = grad_inputs[0];
auto two_over_root_pi = Cast(scope, Const(scope, 2 / std::sqrt(M_PI)),
grad.type());
Scope grad_scope = scope.WithControlDependencies(grad);
auto x = ConjugateHelper(grad_scope, op.input(0));
// grad * 2/sqrt(pi) * exp(-x**2)
auto dx = Mul(grad_scope,
Mul(grad_scope, grad, two_over_root_pi),
Exp(grad_scope, Neg(grad_scope, Square(grad_scope, x))));
grad_outputs->push_back(dx);
return grad_scope.status();
}
REGISTER_GRADIENT_OP("Erf", ErfGrad);
Status LgammaGrad(const Scope& scope, const Operation& op,
const std::vector<Output>& grad_inputs,
std::vector<Output>* grad_outputs) {
auto grad = grad_inputs[0];
Scope grad_scope = scope.WithControlDependencies(grad);
auto x = ConjugateHelper(grad_scope, op.input(0));
auto dx = Mul(scope, grad, Digamma(scope, x));
auto dx = Mul(grad_scope, grad, Digamma(grad_scope, x));
grad_outputs->push_back(dx);
return scope.status();
return grad_scope.status();
}
REGISTER_GRADIENT_OP("Lgamma", LgammaGrad);


@ -64,7 +64,9 @@ class CWiseUnaryGradTest : public ::testing::Test {
IMAG,
CONJ,
COMPLEX,
ANGLE
ANGLE,
LGAMMA,
ERF
};
template <typename X_T, typename Y_T>
@ -168,6 +170,12 @@ class CWiseUnaryGradTest : public ::testing::Test {
case ANGLE:
y = Angle(scope_, x);
break;
case LGAMMA:
y = Lgamma(scope_, x);
break;
case ERF:
y = Erf(scope_, x);
break;
}
float max_error;
@ -503,6 +511,42 @@ TEST_F(CWiseUnaryGradTest, Angle) {
TestCWiseGrad<complex64, float>(ANGLE, x_fn);
}
TEST_F(CWiseUnaryGradTest, Lgamma) {
auto x_fn = [this](const int i) {
return RV({-3.5, -2.5, -1.5, 1.0, 2.0, 3.5});
};
TestCWiseGrad<float, float>(LGAMMA, x_fn);
}
TEST_F(CWiseUnaryGradTest, Lgamma_Complex) {
auto x_fn = [this](const int i) {
return CRV({{-3.5, 0.5}, {-1.5, -0.5}, {1.5, -1.0}, {3.5, 1.0}});
};
// TODO(kbsriram)
// Add test when the lgamma kernel supports complex numbers
if (false) {
TestCWiseGrad<complex64, complex64>(LGAMMA, x_fn);
}
}
TEST_F(CWiseUnaryGradTest, Erf) {
auto x_fn = [this](const int i) {
return RV({-1.2, -1.0, -0.5, 0.3, 0.5, 1.3});
};
TestCWiseGrad<float, float>(ERF, x_fn);
}
TEST_F(CWiseUnaryGradTest, Erf_Complex) {
auto x_fn = [this](const int i) {
return CRV({{-1.2, 0.5}, {-0.5, -0.5}, {0.5, 0.5}, {1.2, -0.5}});
};
// TODO(kbsriram)
// Add test when the erf kernel supports complex numbers
if (false) {
TestCWiseGrad<complex64, complex64>(ERF, x_fn);
}
}
class MathGradTest : public ::testing::Test {
protected:
MathGradTest() : root_(Scope::NewRootScope().WithDevice("/cpu:0")) {}
@ -821,17 +865,5 @@ TEST_F(NaryGradTest, Minimum) {
RunTest(x, x_init_value, y, shape);
}
TEST_F(NaryGradTest, Lgamma) {
TensorShape shape({3, 2});
auto x = Placeholder(scope_, DT_FLOAT, Placeholder::Shape(shape));
auto y = Lgamma(scope_, x);
// Select values to avoid instability when computing finite differences.
// Ref: https://en.wikipedia.org/wiki/File:Gamma_plot.svg
Tensor x_init_value =
test::AsTensor<float>({-3.5f, -2.5f, -1.5f, 1.0f, 2.0f, 3.5f}, {3, 2});
RunTest(x, x_init_value, y, shape);
// TODO(suharshs): add test case for complex values
}
} // namespace
} // namespace tensorflow


@ -33,6 +33,7 @@ cc_library(
deps = [
":xla_cpu_device",
":xla_cpu_jit",
"//tensorflow/compiler/plugin",
] + if_cuda_is_configured([
":xla_gpu_device",
":xla_gpu_jit",


@ -0,0 +1,56 @@
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Configuration file for an XLA plugin.
please don't check in changes to this file. to prevent changes appearing
in git status, use:
git update-index --assume-unchanged tensorflow/compiler/plugin/BUILD
To add additional devices to the XLA subsystem, add targets to the
dependency list in the 'plugin' target. For instance:
deps = ["//tensorflow/compiler/plugin/example:plugin_lib"],
** Please don't remove this file - it is supporting some 3rd party plugins **
"""
licenses(["notice"])
package(
default_visibility = ["//visibility:public"],
)
cc_library(
name = "plugin",
deps = [
#"//tensorflow/compiler/plugin/example:example_lib",
],
)
#-----------------------------------------------------------------------------
filegroup(
name = "all_files",
srcs = glob(
["**/*"],
exclude = [
"**/METADATA",
"**/OWNERS",
],
),
visibility = ["//tensorflow:__subpackages__"],
)


@ -0,0 +1,16 @@
3rd party XLA devices
---------------------
This directory is intended as a place for 3rd party XLA devices which are _not_
integrated into the public repository.
By adding entries to the BUILD target in this directory, a third party device
can be included as a dependency of the JIT subsystem.
For integration into the unit test system, see the files:
- tensorflow/compiler/tests/plugin.bzl
- tensorflow/compiler/xla/tests/plugin.bzl
-


@ -310,7 +310,7 @@ TEST_F(HloComputationTest, DeepCopyArrayAtIndices) {
}
TEST_F(HloComputationTest, DeepCopyTupleAtIndices) {
// Test that DeepCopyInstruction properly copies elements of a a tuple as
// Test that DeepCopyInstruction properly copies elements of a tuple as
// specified by the given indices.
auto builder = HloComputation::Builder(TestName());
auto constant1 = builder.AddInstruction(HloInstruction::CreateConstant(


@ -90,8 +90,12 @@ Status InlinerVisitor::HandleMap(
// different than the map shape. Hence, a broadcast is needed, else the
// cloned operand with new shape and operands work.
if (root.opcode() != HloOpcode::kConstant) {
std::vector<HloInstruction*> params;
for (int64 o = 0; o < root.operands().size(); o++) {
params.push_back(operands[root.operand(o)->parameter_number()]);
}
HloInstruction* placed_instruction = computation_->AddInstruction(
root.CloneWithNewOperands(map->shape(), operands));
root.CloneWithNewOperands(map->shape(), params));
TF_RETURN_IF_ERROR(
computation_->ReplaceInstruction(map, placed_instruction));
} else {


@ -108,5 +108,44 @@ TEST_F(InlinerTest, MapConstant) {
LiteralTestUtil::ExpectEqual(*result, *expected);
}
TEST_F(InlinerTest, MapSubtractOppositeOrder) {
Shape r0f32 = ShapeUtil::MakeShape(F32, {});
// Note that the parameter ordinals are in the opposite order to their
// position as operands
auto max_builder = HloComputation::Builder(TestName());
auto param1 = max_builder.AddInstruction(
HloInstruction::CreateParameter(1, r0f32, "x"));
auto param2 = max_builder.AddInstruction(
HloInstruction::CreateParameter(0, r0f32, "y"));
max_builder.AddInstruction(HloInstruction::CreateBinary(
param1->shape(), HloOpcode::kSubtract, param1, param2));
auto max_f32 = max_builder.Build();
auto builder = HloComputation::Builder("MapSubFunction");
auto lhs = builder.AddInstruction(
HloInstruction::CreateConstant(Literal::CreateR1<float>({1, 2, 3, 4})));
auto rhs = builder.AddInstruction(
HloInstruction::CreateConstant(Literal::CreateR1<float>({4, 3, 2, 1})));
builder.AddInstruction(
HloInstruction::CreateMap(lhs->shape(), {lhs, rhs}, max_f32.get()));
auto computation = builder.Build();
auto hlo_module = CreateNewModule();
hlo_module->AddEmbeddedComputation(std::move(max_f32));
hlo_module->AddEntryComputation(std::move(computation));
Inliner inliner;
EXPECT_TRUE(inliner.Run(hlo_module.get()).ValueOrDie());
EXPECT_THAT(hlo_module->entry_computation()->root_instruction(),
op::Subtract(rhs, lhs));
// Verify execution on CPU.
auto result = ExecuteAndTransfer(std::move(hlo_module), {});
auto expected = Literal::CreateR1<float>({3, 1, -1, -3});
LiteralTestUtil::ExpectEqual(*result, *expected);
}
} // namespace
} // namespace xla


@ -191,7 +191,7 @@ def _ragged_split(tensor, pieces):
def _ring_permutations(num_workers, num_subchunks, gpu_perm):
""""Generate an array of device index arrays, one for for each subchunk.
""""Generate an array of device index arrays, one for each subchunk.
In the basic ring reduction algorithm there are size(T)/num_devices
data chunks and each device process one chunk per tick, i.e. sending


@ -1,7 +1,7 @@
# TF Boosted Trees (TFBT)
TF Boosted trees is an implementation of a gradient boosting algorithm with
trees used as week learners.
trees used as weak learners.
## Examples
Folder "examples" demonstrates how TFBT estimators can be used for various


@ -21,7 +21,7 @@ r"""Demonstrates multiclass MNIST TF Boosted trees example.
python tensorflow/contrib/boosted_trees/examples/binary_mnist.py \
--output_dir="/tmp/binary_mnist" --depth=4 --learning_rate=0.3 \
--batch_size=10761 --examples_per_layer=10761 --eval_batch_size=1030 \
--num_eval_steps=1 --num_trees=10 --l2=1 --vmodule=training_ops=1 \
--num_eval_steps=1 --num_trees=10 --l2=1 --vmodule=training_ops=1
When training is done, accuracy on eval data is reported. Point tensorboard
to the directory for the run to see how the training progresses:


@ -22,7 +22,7 @@ r"""Demonstrates multiclass MNIST TF Boosted trees example.
python tensorflow/contrib/boosted_trees/examples/mnist.py \
--output_dir="/tmp/mnist" --depth=4 --learning_rate=0.3 --batch_size=60000 \
--examples_per_layer=60000 --eval_batch_size=10000 --num_eval_steps=1 \
--num_trees=10 --l2=1 --vmodule=training_ops=1 \
--num_trees=10 --l2=1 --vmodule=training_ops=1
When training is done, accuracy on eval data is reported. Point tensorboard
to the directory for the run to see how the training progresses:


@ -14,8 +14,8 @@
# ==============================================================================
include (ExternalProject)
set(cub_URL https://mirror.bazel.build/github.com/NVlabs/cub/archive/1.7.3.zip)
set(cub_HASH SHA256=b7ead9e291d34ffa8074243541c1380d63be63f88de23de8ee548db573b72ebe)
set(cub_URL https://mirror.bazel.build/github.com/NVlabs/cub/archive/1.7.4.zip)
set(cub_HASH SHA256=20a1a39fd97e5da7f40f5f2e7fd73fd2ea59f9dc4bb8a6c5f228aa543e727e31)
set(cub_BUILD ${CMAKE_CURRENT_BINARY_DIR}/cub/src/cub)
set(cub_INCLUDE_DIR ${CMAKE_CURRENT_BINARY_DIR}/cub/src/cub)
set(cub_ARCHIVE_DIR ${CMAKE_CURRENT_BINARY_DIR}/external/cub_archive)


@ -15,8 +15,8 @@
include (ExternalProject)
set(PROTOBUF_INCLUDE_DIRS ${CMAKE_CURRENT_BINARY_DIR}/protobuf/src/protobuf/src)
set(PROTOBUF_URL https://github.com/mrry/protobuf.git) # Includes MSVC fix.
set(PROTOBUF_TAG 1d2c7b6c7376f396c8c7dd9b6afd2d4f83f3cb05)
set(PROTOBUF_URL https://github.com/google/protobuf.git)
set(PROTOBUF_TAG b04e5cba356212e4e8c66c61bbe0c3a20537c5b9)
if(WIN32)
set(protobuf_STATIC_LIBRARIES


@ -33,6 +33,8 @@ else(tensorflow_BUILD_ALL_KERNELS)
"${tensorflow_source_dir}/tensorflow/core/kernels/matmul_op.cc"
"${tensorflow_source_dir}/tensorflow/core/kernels/no_op.h"
"${tensorflow_source_dir}/tensorflow/core/kernels/no_op.cc"
"${tensorflow_source_dir}/tensorflow/core/kernels/ops_util.h"
"${tensorflow_source_dir}/tensorflow/core/kernels/ops_util.cc"
"${tensorflow_source_dir}/tensorflow/core/kernels/sendrecv_ops.h"
"${tensorflow_source_dir}/tensorflow/core/kernels/sendrecv_ops.cc"
)
@ -65,6 +67,8 @@ if(tensorflow_BUILD_CONTRIB_KERNELS)
"${tensorflow_source_dir}/tensorflow/contrib/boosted_trees/ops/split_handler_ops.cc"
"${tensorflow_source_dir}/tensorflow/contrib/boosted_trees/ops/stats_accumulator_ops.cc"
"${tensorflow_source_dir}/tensorflow/contrib/boosted_trees/ops/training_ops.cc"
"${tensorflow_source_dir}/tensorflow/contrib/cudnn_rnn/kernels/cudnn_rnn_ops.cc"
"${tensorflow_source_dir}/tensorflow/contrib/cudnn_rnn/ops/cudnn_rnn_ops.cc"
"${tensorflow_source_dir}/tensorflow/contrib/factorization/kernels/clustering_ops.cc"
"${tensorflow_source_dir}/tensorflow/contrib/factorization/kernels/masked_matmul_ops.cc"
"${tensorflow_source_dir}/tensorflow/contrib/factorization/kernels/wals_solver_ops.cc"


@ -179,6 +179,9 @@ if (tensorflow_BUILD_PYTHON_TESTS)
# exclude the ones we don't want
set(tf_test_src_py_exclude
# generally excluded
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/__init__.py"
# Python source line inspection tests are flaky on Windows (b/36375074).
"${tensorflow_source_dir}/tensorflow/python/debug/cli/analyzer_cli_test.py"
"${tensorflow_source_dir}/tensorflow/python/debug/cli/profile_analyzer_cli_test.py"
@ -188,19 +191,16 @@ if (tensorflow_BUILD_PYTHON_TESTS)
"${tensorflow_source_dir}/tensorflow/python/debug/lib/dist_session_debug_grpc_test.py"
"${tensorflow_source_dir}/tensorflow/python/debug/lib/session_debug_grpc_test.py"
# generally not working
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/__init__.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/benchmark_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/resource_variable_ops_test.py"
"${tensorflow_source_dir}/tensorflow/python/profiler/pprof_profiler_test.py"
# flaky test
"${tensorflow_source_dir}/tensorflow/python/profiler/internal/run_metadata_test.py"
# Fails because uses data dependencies with bazel
"${tensorflow_source_dir}/tensorflow/python/saved_model/saved_model_test.py"
# requires scipy
"${tensorflow_source_dir}/tensorflow/contrib/keras/python/keras/preprocessing/*_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/tfprof/python/tools/tfprof/pprof_profiler_test.py"
# flaky tests
# Takes very long to run without sharding (defined in bazel build file).
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/cwise_ops_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/tfprof/python/tools/tfprof/internal/run_metadata_test.py"
# Loading resources in contrib doesn't seem to work on Windows
"${tensorflow_source_dir}/tensorflow/contrib/tensor_forest/client/random_forest_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/tensor_forest/python/tensor_forest_test.py"
@ -213,47 +213,57 @@ if (tensorflow_BUILD_PYTHON_TESTS)
if (WIN32)
set(tf_test_src_py_exclude
${tf_test_src_py_exclude}
# generally excluded
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/__init__.py"
# TODO: failing tests.
# Nothing critical in here but should get this list down to []
# The failing list is grouped by failure source
# stl on windows handles overflows different
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/as_string_op_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/cast_op_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/string_to_number_op_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/clip_ops_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/tensor_array_ops_test.py" # Needs portpicker.
# Matrix_set_diag failing on GPU on windows.
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/cholesky_op_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/diag_op_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/linalg_ops_test.py"
"${tensorflow_source_dir}/tensorflow/python/ops/init_ops.py"
# Numerical issues, calculations off.
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/concat_op_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/factorization/python/ops/wals_test.py"
# Float division by zero
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/benchmark_test.py"
# Flaky, for unknown reasons. Cannot reproduce in terminal. Revisit once we can get stack traces.
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/batch_matmul_op_test.py"
# Flaky because of local cluster creation.
"${tensorflow_source_dir}/tensorflow/python/training/sync_replicas_optimizer_test.py"
"${tensorflow_source_dir}/tensorflow/python/debug/lib/session_debug_grpc_test.py"
"${tensorflow_source_dir}tensorflow/python/training/localhost_cluster_performance_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/iterator_ops_cluster_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/functional_ops_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/data/python/kernel_tests/iterator_ops_cluster_test.py"
# Type error in testRemoteIteratorUsingRemoteCallOpDirectSessionGPUCPU.
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/iterator_ops_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/self_adjoint_eig_op_test.py"
# misc
"${tensorflow_source_dir}/tensorflow/contrib/data/python/kernel_tests/iterator_ops_test.py"
# IteratorGetMax OutOfRangeError
"${tensorflow_source_dir}/tensorflow/contrib/data/python/kernel_tests/batch_dataset_op_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/variable_scope_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/reshape_op_test.py"
"${tensorflow_source_dir}/tensorflow/python/training/evaluation_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/neon_depthwise_conv_op_test.py" # Depends on gemmlowp -> pthread.
# Depends on gemmlowp -> pthread
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/neon_depthwise_conv_op_test.py"
# int32/int64 mixup
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/cast_op_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/variable_scope_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/functional_ops_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/py_func_test.py"
# Windows file management related issues.
"${tensorflow_source_dir}/tensorflow/python/training/evaluation_test.py"
# training tests
"${tensorflow_source_dir}/tensorflow/python/training/basic_session_run_hooks_test.py" # Needs tf.contrib fix.
"${tensorflow_source_dir}/tensorflow/python/training/localhost_cluster_performance_test.py" # Needs portpicker.
"${tensorflow_source_dir}/tensorflow/python/training/quantize_training_test.py" # Needs quantization ops to be included in windows.
"${tensorflow_source_dir}/tensorflow/python/training/supervisor_test.py" # Flaky I/O error on rename.
"${tensorflow_source_dir}/tensorflow/python/training/sync_replicas_optimizer_test.py" # Needs portpicker.
"${tensorflow_source_dir}/tensorflow/python/training/server_lib_test.py" # Test occasionally deadlocks.
"${tensorflow_source_dir}/tensorflow/python/debug/lib/session_debug_multi_gpu_test.py"
"${tensorflow_source_dir}/tensorflow/python/debug/lib/session_debug_multi_gpu_test.py" # Fails on multiple GPUs.
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/concat_op_test.py" # numerical issues
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/linalg_grad_test.py" # cudaSolver handle creation fails.
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/array_ops_test.py" # depends on python/framework/test_ops
# Dataset tests
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/dataset_constructor_op_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/dataset_constructor_op_test.py" # Segfaults on windows
"${tensorflow_source_dir}/tensorflow/contrib/data/python/kernel_tests/dataset_constructor_op_test.py" # Segfaults on Windows.
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/iterator_ops_cluster_test.py"
# Broken tensorboard test due to cmake issues.
"${tensorflow_source_dir}/tensorflow/contrib/data/python/kernel_tests/iterator_ops_cluster_test.py" # Needs portpicker
@ -264,8 +274,6 @@ if (tensorflow_BUILD_PYTHON_TESTS)
"${tensorflow_source_dir}/tensorflow/contrib/tensor_forest/python/kernel_tests/scatter_add_ndim_op_test.py" # Bad placement.
"${tensorflow_source_dir}/tensorflow/contrib/tensor_forest/python/topn_test.py" # Results inaccurate
"${tensorflow_source_dir}/tensorflow/python/ops/cloud/bigquery_reader_ops_test.py" # No libcurl support
# Newly running on Windows since TensorBoard backend move. Fail on Windows and need debug.
"${tensorflow_source_dir}/tensorflow/contrib/data/python/kernel_tests/dataset_constructor_op_test.py" # Segfaults on Windows.
# Dask.Dataframe bugs on Window Build
"${tensorflow_source_dir}/tensorflow/contrib/learn/python/learn/tests/dataframe/tensorflow_dataframe_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/learn/python/learn/learn_io/data_feeder_test.py"
@ -274,37 +282,15 @@ if (tensorflow_BUILD_PYTHON_TESTS)
# Need extra build
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/conditional_distribution_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/conditional_transformed_distribution_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/estimator_test.py"
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/array_ops_test.py" # depends on python/framework/test_ops
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/depthtospace_op_test.py" # QuantizeV2
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/spacetodepth_op_test.py" # QuantizeV2
# Windows Path
"${tensorflow_source_dir}/tensorflow/contrib/framework/python/ops/checkpoint_ops_test.py" #TODO: Fix path
"${tensorflow_source_dir}/tensorflow/contrib/keras/python/keras/models_test.py"
# Related to Windows Multiprocessing https://github.com/fchollet/keras/issues/5071
"${tensorflow_source_dir}/tensorflow/contrib/keras/python/keras/engine/training_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/keras/python/keras/utils/data_utils_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/keras/python/keras/callbacks_test.py"
# Scipy needed
"${tensorflow_source_dir}/tensorflow/contrib/keras/python/keras/preprocessing/image_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/bijectors/sigmoid_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/binomial_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/chi2_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/geometric_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/inverse_gamma_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/logistic_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/mixture_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/mvn_diag_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/mvn_full_covariance_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/mvn_tril_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/negative_binomial_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/poisson_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/quantized_distribution_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/relaxed_bernoulli_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/relaxed_onehot_categorical_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/transformed_distribution_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/vector_student_t_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/wishart_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/factorization/python/ops/kmeans_test.py"
"${tensorflow_source_dir}/tensorflow/contrib/learn/python/learn/estimators/kmeans_test.py"
# Failing with TF 1.3 (TODO)
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/estimator_test.py"
# Numpy upgrade needed?
"${tensorflow_source_dir}/tensorflow/contrib/distributions/python/kernel_tests/bijectors/sinh_arcsinh_test.py"
# Test should only be run manually
"${tensorflow_source_dir}/tensorflow/python/kernel_tests/reduction_ops_test_big.py"


@ -385,7 +385,7 @@ class CudnnRNNTestSaveRestore(TensorFlowTestCase):
reset_op = state_ops.assign(
opaque_params,
array_ops.zeros(array_ops.shape(opaque_params), dtype=dtype))
# Passing graph explictly, otherwise an old sess would be reused.
# Passing graph explicitly, otherwise an old sess would be reused.
with self.test_session(use_gpu=True, graph=g) as sess:
sess.run(variables.global_variables_initializer())
val = saver.save(sess, save_path)
@ -436,7 +436,7 @@ class CudnnRNNTestSaveRestore(TensorFlowTestCase):
save_path = os.path.join(self.get_temp_dir(),
"save-restore-variable-test2")
saver = saver_lib.Saver()
# Passing graph explictly, otherwise an old sess would be reused.
# Passing graph explicitly, otherwise an old sess would be reused.
with self.test_session(use_gpu=True, graph=g) as sess:
sess.run(variables.global_variables_initializer())
val = saver.save(sess, save_path)
@ -484,7 +484,7 @@ class CudnnRNNTestSaveRestore(TensorFlowTestCase):
array_ops.zeros(
array_ops.shape(rnn.trainable_variables[0]), dtype=dtype))
# Passing graph explictly, otherwise an old sess would be reused.
# Passing graph explicitly, otherwise an old sess would be reused.
with self.test_session(use_gpu=True, graph=g) as sess:
sess.run(variables.global_variables_initializer())
inputs, initial_state = model.SynthesizeInput(seq_length, batch_size)


@ -11,6 +11,9 @@ py_test(
size = "small",
srcs = ["batch_dataset_op_test.py"],
srcs_version = "PY2AND3",
tags = [
"manual", # b/67958604
],
deps = [
"//tensorflow/contrib/data/python/ops:dataset_ops",
"//tensorflow/contrib/data/python/ops:transformation_ops",
@ -358,6 +361,9 @@ py_test(
size = "small",
srcs = ["sloppy_transformation_dataset_op_test.py"],
srcs_version = "PY2AND3",
tags = [
"manual", # b/67958761
],
deps = [
"//tensorflow/contrib/data/python/ops:dataset_ops",
"//tensorflow/contrib/data/python/ops:transformation_ops",


@ -10,9 +10,8 @@ package(default_visibility = [
"//tensorflow:__subpackages__",
])
load("//tensorflow:tensorflow.bzl", "cuda_py_test")
load("//tensorflow:tensorflow.bzl", "tf_custom_op_py_library")
load("//tensorflow:tensorflow.bzl", "py_test")
load("//tensorflow:tensorflow.bzl", "tf_custom_op_py_library")
load("//tensorflow:tensorflow.bzl", "tf_custom_op_library")
load("//tensorflow:tensorflow.bzl", "tf_gen_op_wrapper_py")
load("//tensorflow:tensorflow.bzl", "tf_gen_op_libs")
@ -27,6 +26,7 @@ tf_custom_op_py_library(
"python/framework/experimental.py",
"python/framework/tensor_util.py",
"python/ops/__init__.py",
"python/ops/accumulate_n_v2.py",
"python/ops/arg_scope.py",
"python/ops/audio_ops.py",
"python/ops/checkpoint_ops.py",
@ -149,6 +149,31 @@ py_test(
],
)
py_test(
name = "accumulate_n_v2_test",
size = "small",
srcs = ["python/ops/accumulate_n_v2_test.py"],
srcs_version = "PY2AND3",
deps = [
":framework_py",
"//tensorflow/python:client_testlib",
"//tensorflow/python:framework_for_generated_wrappers",
],
)
py_test(
name = "accumulate_n_v2_eager_test",
size = "small",
srcs = ["python/ops/accumulate_n_v2_eager_test.py"],
srcs_version = "PY2AND3",
deps = [
":framework_py",
"//tensorflow/python:client_testlib",
"//tensorflow/python:framework_for_generated_wrappers",
"//tensorflow/python/eager:backprop",
],
)
py_test(
name = "ops_test",
size = "small",


@ -0,0 +1,111 @@
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Ops that will eventually be folded into tensorflow/python/ops/math_ops.py
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from tensorflow.python.eager import context
from tensorflow.python.framework import ops
from tensorflow.python.framework import tensor_shape
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import gen_math_ops
from tensorflow.python.ops import math_ops
def accumulate_n_v2(inputs, shape=None, tensor_dtype=None, name=None):
"""Returns the element-wise sum of a list of tensors.
Optionally, pass `shape` and `tensor_dtype` for shape and type checking,
otherwise, these are inferred.
`tf.accumulate_n_v2` performs the same operation as `tf.add_n`, but does not
wait for all of its inputs to be ready before beginning to sum. This can
save memory if inputs are ready at different times, since minimum temporary
storage is proportional to the output size rather than the inputs size.
Unlike the original `accumulate_n`, `accumulate_n_v2` is differentiable.
For example:
```python
a = tf.constant([[1, 2], [3, 4]])
b = tf.constant([[5, 0], [0, 6]])
tf.accumulate_n_v2([a, b, a]) # [[7, 4], [6, 14]]
# Explicitly pass shape and type
tf.accumulate_n_v2([a, b, a], shape=[2, 2], tensor_dtype=tf.int32)
# [[7, 4],
# [6, 14]]
```
Args:
inputs: A list of `Tensor` objects, each with same shape and type.
shape: Shape of elements of `inputs`.
tensor_dtype: The type of `inputs`.
name: A name for the operation (optional).
Returns:
A `Tensor` of same shape and type as the elements of `inputs`.
Raises:
ValueError: If `inputs` don't all have same shape and dtype or the shape
cannot be inferred.
"""
_INPUTS_ERR_MSG = ValueError("inputs must be a list of at least one Tensor"
"with the same dtype and shape")
if not inputs or not isinstance(inputs, (list, tuple)):
raise _INPUTS_ERR_MSG
inputs = ops.convert_n_to_tensor_or_indexed_slices(inputs)
if not all(isinstance(x, ops.Tensor) for x in inputs):
raise _INPUTS_ERR_MSG
if not all(x.dtype == inputs[0].dtype for x in inputs):
raise _INPUTS_ERR_MSG
if shape is not None:
shape = tensor_shape.as_shape(shape)
else:
shape = tensor_shape.unknown_shape()
for input_tensor in inputs:
if isinstance(input_tensor, ops.Tensor):
shape = shape.merge_with(input_tensor.get_shape())
# tensor_dtype is for safety only; operator's output type computed in C++
if tensor_dtype is not None and tensor_dtype != inputs[0].dtype:
raise TypeError("tensor_dtype is {}, but input is of type {}"
.format(tensor_dtype, inputs[0].dtype))
if len(inputs) == 1 and name is None:
return inputs[0]
elif len(inputs) == 1 and name is not None:
return array_ops.identity(inputs[0], name=name)
elif context.in_eager_mode():
# TemporaryVariable not currently supported in eager mode; fall back
# onto AddN for now.
# TODO(frreiss) remove this once the lifetime of eager variables gets
# addressed
return math_ops.add_n(inputs, name=name)
else:
return gen_math_ops._accumulate_nv2(inputs, name=name, shape=shape)
# The following code should eventually be merged into
# tensorflow/python/ops/math_grad.py
@ops.RegisterGradient("AccumulateNV2")
def _AddNGrad(op, grad):
"""Same as gradient for AddN. Copies the gradient to all inputs."""
# Not broadcasting.
return [grad] * len(op.inputs)
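A minimal usage sketch (assuming the TF 1.x contrib layout exercised by the tests in this change; in graph mode the AccumulateNV2 stub op is rewritten by the optimizer pass added later in this change):
import tensorflow as tf  # only for context; the internal-module style below matches the tests
from tensorflow.contrib.framework.python.ops import accumulate_n_v2 as av2
from tensorflow.python.client import session
from tensorflow.python.framework import ops
from tensorflow.python.ops import gradients
from tensorflow.python.ops import variables

with ops.Graph().as_default():
  a = variables.Variable(1.0)
  b = variables.Variable(2.0)
  total = av2.accumulate_n_v2([a, b])         # emits an AccumulateNV2 stub op
  grads = gradients.gradients(total, [a, b])  # registered gradient: all ones
  with session.Session() as sess:
    sess.run(variables.global_variables_initializer())
    print(sess.run([total] + grads))          # [3.0, 1.0, 1.0]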


@ -0,0 +1,85 @@
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for new version of accumulate_n op that will eventually go into
`ops.math_ops`.
These test cases specifically exercise the `eager` APIs. They need to be in a
separate file from the remaining tests because eager mode can currently be
turned on but cannot be turned off for the lifetime of the current process."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from tensorflow.contrib.framework.python.ops import accumulate_n_v2 as av2
from tensorflow.python.eager import backprop
from tensorflow.python.eager import context as eager_context
from tensorflow.python.eager import tape
from tensorflow.python.framework import constant_op
from tensorflow.python.framework import dtypes as dtypes_lib
from tensorflow.python.framework import ops
from tensorflow.python.framework import test_util
from tensorflow.python.ops import gradients
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import resource_variable_ops
from tensorflow.python.platform import test
class AccumulateNV2EagerTest(test_util.TensorFlowTestCase):
"""Tests of the new, differentiable version of accumulate_n"""
def testMinimalEagerMode(self):
forty = constant_op.constant(40)
two = constant_op.constant(2)
answer = av2.accumulate_n_v2([forty, two])
self.assertEqual(42, answer.numpy())
def testFloat(self):
np.random.seed(12345)
x = [np.random.random((1, 2, 3, 4, 5)) - 0.5 for _ in range(5)]
tf_x = ops.convert_n_to_tensor(x)
with self.test_session(use_gpu=True):
self.assertAllClose(sum(x), av2.accumulate_n_v2(tf_x).numpy())
self.assertAllClose(x[0] * 5, av2.accumulate_n_v2([tf_x[0]] * 5).numpy())
def testGrad(self):
np.random.seed(42)
num_inputs = 3
input_vars = [
resource_variable_ops.ResourceVariable(10.0 * np.random.random(),
name="t%d" % i)
for i in range(0, num_inputs)
]
def fn(first, second, third):
return av2.accumulate_n_v2([first, second, third])
grad_fn = backprop.gradients_function(fn)
grad = grad_fn(input_vars[0], input_vars[1], input_vars[2])
self.assertAllEqual(np.repeat(1.0, num_inputs), # d/dx (x + y + ...) = 1
[elem.numpy() for elem in grad])
if __name__ == "__main__":
ops.enable_eager_execution()
test.main()


@ -0,0 +1,123 @@
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for new version of accumulate_n op that will eventually go into
`ops.math_ops`."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from tensorflow.contrib.framework.python.ops import accumulate_n_v2 as av2
from tensorflow.python.framework import constant_op
from tensorflow.python.framework import dtypes as dtypes_lib
from tensorflow.python.framework import ops
from tensorflow.python.framework import test_util
from tensorflow.python.ops import gradients
from tensorflow.python.ops import variables
from tensorflow.python.platform import googletest
class AccumulateNV2Test(test_util.TensorFlowTestCase):
"""Tests of the new, differentiable version of accumulate_n"""
def testFloat(self):
np.random.seed(12345)
x = [np.random.random((1, 2, 3, 4, 5)) - 0.5 for _ in range(5)]
tf_x = ops.convert_n_to_tensor(x)
with self.test_session(use_gpu=True):
self.assertAllClose(sum(x), av2.accumulate_n_v2(tf_x).eval())
self.assertAllClose(x[0] * 5, av2.accumulate_n_v2([tf_x[0]] * 5).eval())
def testInt(self):
np.random.seed(54321)
x = [np.random.randint(-128, 128, (5, 4, 3, 2, 1)) for _ in range(6)]
tf_x = ops.convert_n_to_tensor(x)
with self.test_session(use_gpu=True):
self.assertAllEqual(sum(x), av2.accumulate_n_v2(tf_x).eval())
self.assertAllEqual(x[0] * 6, av2.accumulate_n_v2([tf_x[0]] * 6).eval())
def testGrad(self):
np.random.seed(42)
for num_inputs in range(1, 10):
with self.test_session(use_gpu=True) as sess:
input_vars = [
variables.Variable(10.0 * np.random.random())
for i in range(0, num_inputs)
]
accum_n = av2.accumulate_n_v2(input_vars)
sess.run(variables.global_variables_initializer())
accum_n_grad = gradients.gradients(accum_n, input_vars)
self.assertAllEqual(np.repeat(1.0, num_inputs), # d/dx (x + y + ...) = 1
[g.eval() for g in accum_n_grad])
# The tests below used to be in a separate class under cwise_ops_test.py,
# which did not run in the default test target.
# Putting them here so that everything that exercises AccumulateNV2 is in
# one place and the default build runs all unit tests.
def testSimple(self):
with self.test_session():
random_arrays = [
np.random.rand(16, 16, 16, 16).astype(np.float32) for _ in range(20)
]
random_tensors = [
ops.convert_to_tensor(
x, dtype=dtypes_lib.float32) for x in random_arrays
]
tf_val = av2.accumulate_n_v2(random_tensors)
np_val = random_arrays[0]
for random_array in random_arrays[1:]:
np_val += random_array
self.assertAllClose(np_val, tf_val.eval())
def testZeroArgs(self):
with self.test_session():
with self.assertRaises(ValueError):
tf_val = av2.accumulate_n_v2([])
tf_val.eval()
def testWrongShape(self):
with self.test_session():
with self.assertRaises(ValueError):
a = variables.Variable(0.2)
b = variables.Variable(0.1)
tf_val = av2.accumulate_n_v2([a,b], shape=[2,2]) # Should be shape=[]
def testIncompatibleShapes(self):
with self.test_session():
with self.assertRaises(ValueError):
a = variables.Variable(np.array([0.1,0.2]))
b = variables.Variable(np.array([[0.3],[0.4]]))
tf_val = av2.accumulate_n_v2([a,b])
def testWrongType(self):
with self.test_session():
with self.assertRaises(TypeError):
a = variables.Variable(0.2, dtype=np.float32)
b = variables.Variable(0.1, dtype=np.float32)
tf_val = av2.accumulate_n_v2([a,b], tensor_dtype=np.int32)
def testWrongTypeOneInput(self):
# Scenario that used to trigger a bug, even when testWrongType() worked
with self.test_session():
with self.assertRaises(TypeError):
a = variables.Variable(0.2, dtype=np.float32)
tf_val = av2.accumulate_n_v2([a], tensor_dtype=np.int32)
if __name__ == "__main__":
googletest.main()


@ -26,6 +26,8 @@ projective transforms (including rotation) are supported.
@@random_yiq_hsv
@@rotate
@@transform
@@translate
@@translations_to_projective_transforms
@@bipartite_match
@@single_image_random_dot_stereograms
"""
@ -41,6 +43,8 @@ from tensorflow.contrib.image.python.ops.image_ops import angles_to_projective_t
from tensorflow.contrib.image.python.ops.image_ops import compose_transforms
from tensorflow.contrib.image.python.ops.image_ops import rotate
from tensorflow.contrib.image.python.ops.image_ops import transform
from tensorflow.contrib.image.python.ops.image_ops import translate
from tensorflow.contrib.image.python.ops.image_ops import translations_to_projective_transforms
from tensorflow.contrib.image.python.ops.single_image_random_dot_stereograms import single_image_random_dot_stereograms
from tensorflow.python.util.all_util import remove_undocumented


@ -36,8 +36,8 @@ _DTYPES = set(
class ImageOpsTest(test_util.TensorFlowTestCase):
def test_zeros(self):
with self.test_session():
for dtype in _DTYPES:
for dtype in _DTYPES:
with self.test_session():
for shape in [(5, 5), (24, 24), (2, 24, 24, 3)]:
for angle in [0, 1, np.pi / 2.0]:
image = array_ops.zeros(shape, dtype)
@ -46,8 +46,8 @@ class ImageOpsTest(test_util.TensorFlowTestCase):
np.zeros(shape, dtype.as_numpy_dtype()))
def test_rotate_even(self):
with self.test_session():
for dtype in _DTYPES:
for dtype in _DTYPES:
with self.test_session():
image = array_ops.reshape(
math_ops.cast(math_ops.range(36), dtype), (6, 6))
image_rep = array_ops.tile(image[None, :, :, None], [3, 1, 1, 1])
@ -68,8 +68,8 @@ class ImageOpsTest(test_util.TensorFlowTestCase):
[1, 7, 13, 19, 25, 31], [0, 6, 12, 18, 24, 30]]])
def test_rotate_odd(self):
with self.test_session():
for dtype in _DTYPES:
for dtype in _DTYPES:
with self.test_session():
image = array_ops.reshape(
math_ops.cast(math_ops.range(25), dtype), (5, 5))
image_rep = array_ops.tile(image[None, :, :, None], [3, 1, 1, 1])
@ -87,9 +87,25 @@ class ImageOpsTest(test_util.TensorFlowTestCase):
[22, 17, 12, 7, 2], [23, 18, 13, 8, 3],
[24, 19, 14, 9, 4]]])
def test_translate(self):
for dtype in _DTYPES:
with self.test_session():
image = constant_op.constant(
[[1, 0, 1, 0],
[0, 1, 0, 1],
[1, 0, 1, 0],
[0, 1, 0, 1]], dtype=dtype)
translation = constant_op.constant([-1, -1], dtypes.float32)
image_translated = image_ops.translate(image, translation)
self.assertAllEqual(image_translated.eval(),
[[1, 0, 1, 0],
[0, 1, 0, 0],
[1, 0, 1, 0],
[0, 0, 0, 0]])
def test_compose(self):
with self.test_session():
for dtype in _DTYPES:
for dtype in _DTYPES:
with self.test_session():
image = constant_op.constant(
[[1, 1, 1, 0],
[1, 0, 0, 0],
@ -246,4 +262,3 @@ class BipartiteMatchTest(test_util.TensorFlowTestCase):
if __name__ == "__main__":
googletest.main()


@ -37,16 +37,18 @@ _IMAGE_DTYPES = set(
ops.RegisterShape("ImageProjectiveTransform")(common_shapes.call_cpp_shape_fn)
def rotate(images, angles, interpolation="NEAREST"):
def rotate(images, angles, interpolation="NEAREST", name=None):
"""Rotate image(s) by the passed angle(s) in radians.
Args:
images: A tensor of shape (num_images, num_rows, num_columns, num_channels)
(NHWC), (num_rows, num_columns, num_channels) (HWC), or
(num_rows, num_columns) (HW).
(num_rows, num_columns) (HW). The rank must be statically known (the
shape is not `TensorShape(None)`).
angles: A scalar angle to rotate all images by, or (if images has rank 4)
a vector of length num_images, with an angle for each image in the batch.
interpolation: Interpolation mode. Supported values: "NEAREST", "BILINEAR".
name: The name of the op.
Returns:
Image(s) with the same type and shape as `images`, rotated by the given
@ -55,38 +57,77 @@ def rotate(images, angles, interpolation="NEAREST"):
Raises:
TypeError: If `image` is an invalid type.
"""
image_or_images = ops.convert_to_tensor(images, name="images")
if image_or_images.dtype.base_dtype not in _IMAGE_DTYPES:
raise TypeError("Invalid dtype %s." % image_or_images.dtype)
if len(image_or_images.get_shape()) == 2:
images = image_or_images[None, :, :, None]
elif len(image_or_images.get_shape()) == 3:
images = image_or_images[None, :, :, :]
elif len(image_or_images.get_shape()) == 4:
images = image_or_images
else:
raise TypeError("Images should have rank between 2 and 4.")
with ops.name_scope(name, "rotate"):
image_or_images = ops.convert_to_tensor(images)
if image_or_images.dtype.base_dtype not in _IMAGE_DTYPES:
raise TypeError("Invalid dtype %s." % image_or_images.dtype)
elif image_or_images.get_shape().ndims is None:
raise TypeError("image_or_images rank must be statically known")
elif len(image_or_images.get_shape()) == 2:
images = image_or_images[None, :, :, None]
elif len(image_or_images.get_shape()) == 3:
images = image_or_images[None, :, :, :]
elif len(image_or_images.get_shape()) == 4:
images = image_or_images
else:
raise TypeError("Images should have rank between 2 and 4.")
image_height = math_ops.cast(array_ops.shape(images)[1], dtypes.float32)[None]
image_width = math_ops.cast(array_ops.shape(images)[2], dtypes.float32)[None]
output = transform(
images,
angles_to_projective_transforms(angles, image_height, image_width),
interpolation=interpolation)
if len(image_or_images.get_shape()) == 2:
return output[0, :, :, 0]
elif len(image_or_images.get_shape()) == 3:
return output[0, :, :, :]
else:
return output
image_height = math_ops.cast(array_ops.shape(images)[1],
dtypes.float32)[None]
image_width = math_ops.cast(array_ops.shape(images)[2],
dtypes.float32)[None]
output = transform(
images,
angles_to_projective_transforms(angles, image_height, image_width),
interpolation=interpolation)
if image_or_images.get_shape().ndims is None:
raise TypeError("image_or_images rank must be statically known")
elif len(image_or_images.get_shape()) == 2:
return output[0, :, :, 0]
elif len(image_or_images.get_shape()) == 3:
return output[0, :, :, :]
else:
return output
def angles_to_projective_transforms(angles, image_height, image_width):
def translate(images, translations, interpolation="NEAREST", name=None):
"""Translate image(s) by the passed vectors(s).
Args:
images: A tensor of shape (num_images, num_rows, num_columns, num_channels)
(NHWC), (num_rows, num_columns, num_channels) (HWC), or
(num_rows, num_columns) (HW). The rank must be statically known (the
shape is not `TensorShape(None)`).
translations: A vector representing [dx, dy] or (if images has rank 4)
a matrix of length num_images, with a [dx, dy] vector for each image in
the batch.
interpolation: Interpolation mode. Supported values: "NEAREST", "BILINEAR".
name: The name of the op.
Returns:
Image(s) with the same type and shape as `images`, translated by the given
vector(s). Empty space due to the translation will be filled with zeros.
Raises:
TypeError: If `image` is an invalid type.
"""
with ops.name_scope(name, "translate"):
return transform(
images,
translations_to_projective_transforms(translations),
interpolation=interpolation)
def angles_to_projective_transforms(angles,
image_height,
image_width,
name=None):
"""Returns projective transform(s) for the given angle(s).
Args:
angles: A scalar angle to rotate all images by, or (for batches of images)
a vector with an angle to rotate each image in the batch.
a vector with an angle to rotate each image in the batch. The rank must
be statically known (the shape is not `TensorShape(None)`).
image_height: Height of the image(s) to be transformed.
image_width: Width of the image(s) to be transformed.
@ -94,41 +135,89 @@ def angles_to_projective_transforms(angles, image_height, image_width):
A tensor of shape (num_images, 8). Projective transforms which can be given
to `tf.contrib.image.transform`.
"""
angle_or_angles = ops.convert_to_tensor(
angles, name="angles", dtype=dtypes.float32)
if len(angle_or_angles.get_shape()) == 0: # pylint: disable=g-explicit-length-test
angles = angle_or_angles[None]
elif len(angle_or_angles.get_shape()) == 1:
angles = angle_or_angles
else:
raise TypeError("Angles should have rank 0 or 1.")
x_offset = ((image_width - 1) - (math_ops.cos(angles) *
(image_width - 1) - math_ops.sin(angles) *
(image_height - 1))) / 2.0
y_offset = ((image_height - 1) - (math_ops.sin(angles) *
(image_width - 1) + math_ops.cos(angles) *
(image_height - 1))) / 2.0
num_angles = array_ops.shape(angles)[0]
return array_ops.concat(
values=[
math_ops.cos(angles)[:, None],
-math_ops.sin(angles)[:, None],
x_offset[:, None],
math_ops.sin(angles)[:, None],
math_ops.cos(angles)[:, None],
y_offset[:, None],
array_ops.zeros((num_angles, 2), dtypes.float32),
],
axis=1)
with ops.name_scope(name, "angles_to_projective_transforms"):
angle_or_angles = ops.convert_to_tensor(
angles, name="angles", dtype=dtypes.float32)
if len(angle_or_angles.get_shape()) == 0: # pylint: disable=g-explicit-length-test
angles = angle_or_angles[None]
elif len(angle_or_angles.get_shape()) == 1:
angles = angle_or_angles
else:
raise TypeError("Angles should have rank 0 or 1.")
x_offset = ((image_width - 1) - (math_ops.cos(angles) *
(image_width - 1) - math_ops.sin(angles) *
(image_height - 1))) / 2.0
y_offset = ((image_height - 1) - (math_ops.sin(angles) *
(image_width - 1) + math_ops.cos(angles) *
(image_height - 1))) / 2.0
num_angles = array_ops.shape(angles)[0]
return array_ops.concat(
values=[
math_ops.cos(angles)[:, None],
-math_ops.sin(angles)[:, None],
x_offset[:, None],
math_ops.sin(angles)[:, None],
math_ops.cos(angles)[:, None],
y_offset[:, None],
array_ops.zeros((num_angles, 2), dtypes.float32),
],
axis=1)
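# For reference: each row returned above has the form
#   [cos(a), -sin(a), x_offset, sin(a), cos(a), y_offset, 0, 0]
# where x_offset/y_offset are computed so that the image center maps to
# itself, i.e. the rotation is performed about the center of the image rather
# than its top-left corner.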
def transform(images, transforms, interpolation="NEAREST"):
def translations_to_projective_transforms(translations, name=None):
"""Returns projective transform(s) for the given translation(s).
Args:
translations: A 2-element list representing [dx, dy] or a matrix of
2-element lists representing [dx, dy] to translate for each image
(for a batch of images). The rank must be statically known (the shape
is not `TensorShape(None)`).
name: The name of the op.
Returns:
A tensor of shape (num_images, 8). Projective transforms which can be given
to `tf.contrib.image.transform`.
"""
with ops.name_scope(name, "translations_to_projective_transforms"):
translation_or_translations = ops.convert_to_tensor(
translations, name="translations", dtype=dtypes.float32)
if translation_or_translations.get_shape().ndims is None:
raise TypeError(
"translation_or_translations rank must be statically known")
elif len(translation_or_translations.get_shape()) == 1:
translations = translation_or_translations[None]
elif len(translation_or_translations.get_shape()) == 2:
translations = translation_or_translations
else:
raise TypeError("Translations should have rank 1 or 2.")
num_translations = array_ops.shape(translations)[0]
# The translation matrix looks like:
# [[1 0 -dx]
# [0 1 -dy]
# [0 0 1]]
# where the last entry is implicit.
# Translation matrices are always float32.
return array_ops.concat(
values=[
array_ops.ones((num_translations, 1), dtypes.float32),
array_ops.zeros((num_translations, 1), dtypes.float32),
-translations[:, 0, None],
array_ops.zeros((num_translations, 1), dtypes.float32),
array_ops.ones((num_translations, 1), dtypes.float32),
-translations[:, 1, None],
array_ops.zeros((num_translations, 2), dtypes.float32),
],
axis=1)
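# Worked example: for a single translation [dx, dy] = [2.0, 3.0] the result
# has shape (1, 8) and value [1, 0, -2, 0, 1, -3, 0, 0], the flattened top
# two rows of the matrix sketched above. When fed to transform() below, this
# row maps the *output* point (x, y) to the *input* point (x - 2, y - 3), so
# the image content shifts two pixels right and three pixels down, with zeros
# filling the vacated border (see test_translate earlier in this change).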
def transform(images, transforms, interpolation="NEAREST", name=None):
"""Applies the given transform(s) to the image(s).
Args:
images: A tensor of shape (num_images, num_rows, num_columns, num_channels)
(NHWC), (num_rows, num_columns, num_channels) (HWC), or
(num_rows, num_columns) (HW).
(num_rows, num_columns) (HW). The rank must be statically known (the
shape is not `TensorShape(None)`).
transforms: Projective transform matrix/matrices. A vector of length 8 or
tensor of size N x 8. If one row of transforms is
[a0, a1, a2, b0, b1, b2, c0, c1], then it maps the *output* point
@ -146,34 +235,40 @@ def transform(images, transforms, interpolation="NEAREST"):
Raises:
TypeError: If `image` is an invalid type.
"""
image_or_images = ops.convert_to_tensor(images, name="images")
transform_or_transforms = ops.convert_to_tensor(
transforms, name="transforms", dtype=dtypes.float32)
if image_or_images.dtype.base_dtype not in _IMAGE_DTYPES:
raise TypeError("Invalid dtype %s." % image_or_images.dtype)
if len(image_or_images.get_shape()) == 2:
images = image_or_images[None, :, :, None]
elif len(image_or_images.get_shape()) == 3:
images = image_or_images[None, :, :, :]
elif len(image_or_images.get_shape()) == 4:
images = image_or_images
else:
raise TypeError("Images should have rank between 2 and 4.")
with ops.name_scope(name, "transform"):
image_or_images = ops.convert_to_tensor(images, name="images")
transform_or_transforms = ops.convert_to_tensor(
transforms, name="transforms", dtype=dtypes.float32)
if image_or_images.dtype.base_dtype not in _IMAGE_DTYPES:
raise TypeError("Invalid dtype %s." % image_or_images.dtype)
elif image_or_images.get_shape().ndims is None:
raise TypeError("image_or_images rank must be statically known")
elif len(image_or_images.get_shape()) == 2:
images = image_or_images[None, :, :, None]
elif len(image_or_images.get_shape()) == 3:
images = image_or_images[None, :, :, :]
elif len(image_or_images.get_shape()) == 4:
images = image_or_images
else:
raise TypeError("Images should have rank between 2 and 4.")
if len(transform_or_transforms.get_shape()) == 1:
transforms = transform_or_transforms[None]
elif len(transform_or_transforms.get_shape()) == 2:
transforms = transform_or_transforms
else:
raise TypeError("Transforms should have rank 1 or 2.")
output = gen_image_ops.image_projective_transform(
images, transforms, interpolation=interpolation.upper())
if len(image_or_images.get_shape()) == 2:
return output[0, :, :, 0]
elif len(image_or_images.get_shape()) == 3:
return output[0, :, :, :]
else:
return output
if len(transform_or_transforms.get_shape()) == 1:
transforms = transform_or_transforms[None]
elif transform_or_transforms.get_shape().ndims is None:
raise TypeError(
"transform_or_transforms rank must be statically known")
elif len(transform_or_transforms.get_shape()) == 2:
transforms = transform_or_transforms
else:
raise TypeError("Transforms should have rank 1 or 2.")
output = gen_image_ops.image_projective_transform(
images, transforms, interpolation=interpolation.upper())
if len(image_or_images.get_shape()) == 2:
return output[0, :, :, 0]
elif len(image_or_images.get_shape()) == 3:
return output[0, :, :, :]
else:
return output
def compose_transforms(*transforms):
@ -191,11 +286,12 @@ def compose_transforms(*transforms):
order.
"""
assert transforms, "transforms cannot be empty"
composed = _flat_transforms_to_matrices(transforms[0])
for tr in transforms[1:]:
# Multiply batches of matrices.
composed = math_ops.matmul(composed, _flat_transforms_to_matrices(tr))
return _transform_matrices_to_flat(composed)
with ops.name_scope("compose_transforms"):
composed = _flat_transforms_to_matrices(transforms[0])
for tr in transforms[1:]:
# Multiply batches of matrices.
composed = math_ops.matmul(composed, _flat_transforms_to_matrices(tr))
return _transform_matrices_to_flat(composed)
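# Illustration: composing two pure translations multiplies their 3x3 matrices,
# so the offsets simply add. For example,
#   compose_transforms(translations_to_projective_transforms([1., 0.]),
#                      translations_to_projective_transforms([0., 2.]))
# yields the same flat transform as
#   translations_to_projective_transforms([1., 2.]).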
def _flat_transforms_to_matrices(transforms):
@ -211,8 +307,8 @@ def _flat_transforms_to_matrices(transforms):
def _transform_matrices_to_flat(transform_matrices):
# Flatten each matrix.
transforms = array_ops.reshape(
transform_matrices, constant_op.constant([-1, 9]))
transforms = array_ops.reshape(transform_matrices,
constant_op.constant([-1, 9]))
# Divide each matrix by the last entry (normally 1).
transforms /= transforms[:, 8:9]
return transforms[:, :8]
@ -260,10 +356,10 @@ def _image_projective_transform_grad(op, grad):
return [output, None]
def bipartite_match(
distance_mat,
num_valid_rows,
top_k=-1):
def bipartite_match(distance_mat,
num_valid_rows,
top_k=-1,
name="bipartite_match"):
"""Find bipartite matching based on a given distance matrix.
A greedy bi-partite matching algorithm is used to obtain the matching with
@ -282,6 +378,7 @@ def bipartite_match(
top_k: A scalar that specifies the number of top-k matches to retrieve.
If set to be negative, then is set according to the maximum number of
matches from `distance_mat`.
name: The name of the op.
Returns:
row_to_col_match_indices: A vector of length num_rows, which is the number
@ -292,7 +389,8 @@ def bipartite_match(
If `col_to_row_match_indices[j]` is not -1, column j is matched to row
`col_to_row_match_indices[j]`.
"""
result = gen_image_ops.bipartite_match(distance_mat, num_valid_rows, top_k)
result = gen_image_ops.bipartite_match(
distance_mat, num_valid_rows, top_k, name=name)
return result


@ -104,7 +104,7 @@ class LossFunction(object):
@abc.abstractmethod
def multiply_hessian_factor_transpose(self, vector):
"""Right-multiply a vector by the tranpose of a factor B of the Hessian.
"""Right-multiply a vector by the transpose of a factor B of the Hessian.
Here the 'Hessian' is the Hessian matrix (i.e. matrix of 2nd-derivatives)
of the loss function with respect to its inputs. Typically this will be
@ -218,7 +218,7 @@ class NegativeLogProbLoss(LossFunction):
@abc.abstractmethod
def multiply_fisher_factor_transpose(self, vector):
"""Right-multiply a vector by the tranpose of a factor B of the Fisher.
"""Right-multiply a vector by the transpose of a factor B of the Fisher.
Here the 'Fisher' is the Fisher information matrix (i.e. expected outer-
product of gradients) with respect to the parameters of the underlying
@ -397,7 +397,7 @@ class NormalMeanVarianceNegativeLogProbLoss(DistributionNegativeLogProbLoss):
This class parameterizes a multivariate normal distribution with n independent
dimensions. Unlike `NormalMeanNegativeLogProbLoss`, this class does not
assume the variance is held constant. The Fisher Information for for n = 1
assume the variance is held constant. The Fisher Information for n = 1
is given by,
F = [[1 / variance, 0],


@ -61,7 +61,7 @@ class OpQueue(object):
sess: tf.Session.
Returns:
Next Op chosen from from 'ops'.
Next Op chosen from 'ops'.
"""
# In Python 3, type(next_op_name) == bytes. Calling bytes.decode('ascii')
# returns a str.


@ -47,6 +47,7 @@ See the @{$python/contrib.layers} guide.
@@separable_conv2d
@@separable_convolution2d
@@softmax
@@spatial_softmax
@@stack
@@unit_norm
@@bow_encoder


@ -165,7 +165,7 @@ def run(experiment_fn, output_dir=None, schedule=None, run_config=None,
must be None.
2) It accepts two arguments `run_config` and `hparams`, which should be
used to create the `Estimator` (`run_config` passed as `config` to its
constructor; `hparams` used as the hyper-paremeters of the model).
constructor; `hparams` used as the hyper-parameters of the model).
It must return an `Experiment`. For this case, `output_dir` must be None.
output_dir: Base output directory [Deprecated].
schedule: The name of the method in the `Experiment` to run.


@ -28,6 +28,7 @@ from tensorflow.python.ops import math_ops
from tensorflow.python.ops import nn
from tensorflow.python.ops import nn_ops
from tensorflow.python.util.deprecation import deprecated
from tensorflow.python.util.deprecation import deprecated_args
__all__ = ["absolute_difference",
"add_loss",
@ -623,8 +624,9 @@ def mean_pairwise_squared_error(
@deprecated("2016-12-30", "Use tf.losses.cosine_distance instead.")
@deprecated_args(None, "dim is deprecated, use axis instead", "dim")
def cosine_distance(
predictions, labels=None, dim=None, weights=1.0, scope=None):
predictions, labels=None, axis=None, weights=1.0, scope=None, dim=None):
"""Adds a cosine-distance loss to the training procedure.
Note that the function assumes that `predictions` and `labels` are already
@ -633,10 +635,11 @@ def cosine_distance(
Args:
predictions: An arbitrary matrix.
labels: A `Tensor` whose shape matches 'predictions'
dim: The dimension along which the cosine distance is computed.
axis: The dimension along which the cosine distance is computed.
weights: Coefficients for the loss a scalar, a tensor of shape
[batch_size] or a tensor whose shape matches `predictions`.
scope: The scope for the operations performed in computing the loss.
dim: The old (deprecated) name for `axis`.
Returns:
A scalar `Tensor` representing the loss value.
@ -645,8 +648,12 @@ def cosine_distance(
ValueError: If `predictions` shape doesn't match `labels` shape, or
`weights` is `None`.
"""
if dim is None:
raise ValueError("`dim` cannot be None.")
if dim is not None:
if axis is not None:
raise ValueError("Cannot specify both 'axis' and 'dim'")
axis = dim
if axis is None and dim is None:
raise ValueError("You must specify 'axis'.")
with ops.name_scope(scope, "cosine_distance_loss",
[predictions, labels, weights]) as scope:
predictions.get_shape().assert_is_compatible_with(labels.get_shape())
@ -655,5 +662,5 @@ def cosine_distance(
labels = math_ops.to_float(labels)
radial_diffs = math_ops.multiply(predictions, labels)
losses = 1 - math_ops.reduce_sum(radial_diffs, reduction_indices=[dim,])
losses = 1 - math_ops.reduce_sum(radial_diffs, reduction_indices=[axis,])
return compute_weighted_loss(losses, weights, scope=scope)
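A standalone sketch (plain Python, no TensorFlow dependency) of the `dim` -> `axis` resolution implemented above:
def _resolve_axis(axis=None, dim=None):
  # 'dim' is accepted as a deprecated alias for 'axis'.
  if dim is not None:
    if axis is not None:
      raise ValueError("Cannot specify both 'axis' and 'dim'")
    axis = dim
  if axis is None:
    raise ValueError("You must specify 'axis'.")
  return axis

assert _resolve_axis(axis=2) == 2
assert _resolve_axis(dim=2) == 2  # old keyword still works and maps to axis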


@ -194,6 +194,10 @@ LIBFLAGS :=
# If we're on OS X, make sure that globals aren't stripped out.
ifeq ($(TARGET),OSX)
ifeq ($(HAS_GEN_HOST_PROTOC),true)
LIBFLAGS += -L$(MAKEFILE_DIR)/gen/protobuf-host/lib
export LD_LIBRARY_PATH=$(MAKEFILE_DIR)/gen/protobuf-host/lib
endif
LDFLAGS += -all_load
endif
# Make sure that we don't strip global constructors on Linux.


@ -54,7 +54,7 @@ download_and_extract() {
elif [[ "${url}" == *zip ]]; then
tempdir=$(mktemp -d)
tempdir2=$(mktemp -d)
wget ${url} -P ${tempdir}
wget -P ${tempdir} ${url}
unzip ${tempdir}/* -d ${tempdir2}
# unzip has no strip components, so unzip to a temp dir, and move the files
# we want from the tempdir to destination.


@ -264,3 +264,4 @@ tensorflow/core/kernels/spacetobatch_functor.cc
tensorflow/core/kernels/spacetobatch_op.cc
tensorflow/core/kernels/batchtospace_op.cc
tensorflow/core/kernels/warn_about_ints.cc
tensorflow/core/kernels/segment_reduction_ops.cc


@ -749,7 +749,7 @@ def meta_graph_transform(
base_meta_graph_def, meta_graph_def, collection_name,
removed_op_names)
# Append newly added initalizers to collection.
# Append newly added initializers to collection.
_add_new_inits_to_collection(meta_graph_def, updated_initializer_names)
# Copy signature_defs, excluding any pruned nodes


@ -5856,7 +5856,7 @@ class StreamingMeanIOUTest(test.TestCase):
sess.run(variables.local_variables_initializer())
for _ in range(5):
sess.run(update_op)
desired_output = np.mean([1.0 / 3.0, 2.0 / 4.0, 0.])
desired_output = np.mean([1.0 / 3.0, 2.0 / 4.0])
self.assertAlmostEqual(desired_output, miou.eval())
def testUpdateOpEvalIsAccumulatedConfusionMatrix(self):
@ -5938,6 +5938,58 @@ class StreamingMeanIOUTest(test.TestCase):
desired_miou = np.mean([2. / 4., 4. / 6.])
self.assertAlmostEqual(desired_miou, miou.eval())
def testMissingClassInLabels(self):
labels = constant_op.constant([
[[0, 0, 1, 1, 0, 0],
[1, 0, 0, 0, 0, 1]],
[[1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0]]])
predictions = constant_op.constant([
[[0, 0, 2, 1, 1, 0],
[0, 1, 2, 2, 0, 1]],
[[0, 0, 2, 1, 1, 1],
[1, 1, 2, 0, 0, 0]]])
num_classes = 3
with self.test_session() as sess:
miou, update_op = metrics.streaming_mean_iou(
predictions, labels, num_classes)
sess.run(variables.local_variables_initializer())
self.assertAllEqual([[7, 4, 3], [3, 5, 2], [0, 0, 0]], update_op.eval())
self.assertAlmostEqual(
1 / 3 * (7 / (7 + 3 + 7) + 5 / (5 + 4 + 5) + 0 / (0 + 5 + 0)),
miou.eval())
def testMissingClassOverallSmall(self):
labels = constant_op.constant([0])
predictions = constant_op.constant([0])
num_classes = 2
with self.test_session() as sess:
miou, update_op = metrics.streaming_mean_iou(
predictions, labels, num_classes)
sess.run(variables.local_variables_initializer())
self.assertAllEqual([[1, 0], [0, 0]], update_op.eval())
self.assertAlmostEqual(1, miou.eval())
def testMissingClassOverallLarge(self):
labels = constant_op.constant([
[[0, 0, 1, 1, 0, 0],
[1, 0, 0, 0, 0, 1]],
[[1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0]]])
predictions = constant_op.constant([
[[0, 0, 1, 1, 0, 0],
[1, 1, 0, 0, 1, 1]],
[[0, 0, 0, 1, 1, 1],
[1, 1, 1, 0, 0, 0]]])
num_classes = 3
with self.test_session() as sess:
miou, update_op = metrics.streaming_mean_iou(
predictions, labels, num_classes)
sess.run(variables.local_variables_initializer())
self.assertAllEqual([[9, 5, 0], [3, 7, 0], [0, 0, 0]], update_op.eval())
self.assertAlmostEqual(
1 / 2 * (9 / (9 + 3 + 5) + 7 / (7 + 5 + 3)), miou.eval())
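# The three tests above pin down the intended behaviour: a class that appears
# in neither labels nor predictions has an empty (0/0) IoU and is dropped from
# the mean, while a class that appears only in predictions contributes an IoU
# of exactly 0 and still counts toward the number of averaged classes.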
class StreamingConcatTest(test.TestCase):


@ -194,7 +194,7 @@ class DistributedOptimizer(tf.train.Optimizer):
See Optimizer.compute_gradients() for more info.
In DistributedOptimizer, compute_gradients() is overriden to also
In DistributedOptimizer, compute_gradients() is overridden to also
allreduce the gradients before returning them.
"""
gradients = (super(DistributedOptimizer, self)


@ -18,6 +18,7 @@
@@deprecated_flipped_softmax_cross_entropy_with_logits
@@deprecated_flipped_sparse_softmax_cross_entropy_with_logits
@@deprecated_flipped_sigmoid_cross_entropy_with_logits
@@nth_element
@@rank_sampled_softmax_loss
@@scaled_softplus
"""
@ -31,6 +32,7 @@ from tensorflow.contrib.nn.python.ops.alpha_dropout import *
from tensorflow.contrib.nn.python.ops.cross_entropy import *
from tensorflow.contrib.nn.python.ops.sampling_ops import *
from tensorflow.contrib.nn.python.ops.scaled_softplus import *
from tensorflow.python.ops.nn_ops import nth_element
# pylint: enable=unused-import,wildcard-import
from tensorflow.python.util.all_util import remove_undocumented


@ -27,13 +27,15 @@ import math
from tensorflow.contrib.receptive_field.python.util import graph_compute_order
from tensorflow.contrib.util import make_ndarray
from tensorflow.python.platform import tf_logging as logging
from tensorflow.python.framework import ops as framework_ops
import numpy as np
# White-listed layer operations, which do not affect the receptive field
# computation.
_UNCHANGED_RF_LAYER_OPS = [
"Softplus", "Relu", "BiasAdd", "Mul", "Add", "Const", "Identity",
"VariableV2", "Sub", "Rsqrt", "ConcatV2"
]
'Add', 'BiasAdd', 'Ceil', 'ConcatV2', 'Const', 'Floor', 'Identity', 'Log',
'Mul', 'Pow', 'RealDiv', 'Relu', 'Round', 'Rsqrt', 'Softplus', 'Sub',
'VariableV2']
# Different ways in which padding modes may be spelled.
_VALID_PADDING = ["VALID", b"VALID"]
@ -238,7 +240,8 @@ def _get_layer_params(node, name_to_order_node):
padding_x = 0
padding_y = 0
else:
raise ValueError("Unknown layer op: %s" % node.op)
raise ValueError("Unknown layer for operation '%s': %s" %
(node.name, node.op))
return kernel_size_x, kernel_size_y, stride_x, stride_y, padding_x, padding_y
@ -304,13 +307,103 @@ def _get_effective_padding_node_input(stride, padding,
return stride * effective_padding_output + padding
def compute_receptive_field_from_graph_def(graph_def, input_node, output_node):
"""Computes receptive field (RF) parameters from a GraphDef object.
class ReceptiveField:
"""
Receptive field of a convolutional neural network.
Args:
graph_def: GraphDef object.
input_node: Name of the input node from graph.
output_node: Name of the output node from graph.
size: Receptive field size.
stride: Effective stride.
padding: Effective padding.
"""
def __init__(self, size, stride, padding):
self.size = np.asarray(size)
self.stride = np.asarray(stride)
self.padding = np.asarray(padding)
def compute_input_center_coordinates(self, y, axis=None):
"""
Computes the center of the receptive field that generated a feature.
Args:
y: An array of feature coordinates with shape `(..., d)`, where `d` is the
number of dimensions of the coordinates.
axis: The dimensions for which to compute the input center coordinates.
If `None` (the default), compute the input center coordinates for all
dimensions.
Returns:
x: Center of the receptive field that generated the features, at the input
of the network.
Raises:
ValueError: If the number of dimensions of the feature coordinates does
not match the number of elements in `axis`.
"""
# Use all dimensions.
if axis is None:
axis = range(self.size.size)
# Ensure axis is a list because tuples have different indexing behavior.
axis = list(axis)
y = np.asarray(y)
if y.shape[-1] != len(axis):
raise ValueError("Dimensionality of the feature coordinates `y` (%d) "
"does not match dimensionality of `axis` (%d)" %
(y.shape[-1], len(axis)))
return - self.padding[axis] + y * self.stride[axis] + \
(self.size[axis] - 1) / 2
def compute_feature_coordinates(self, x, axis=None):
"""
Computes the position of a feature given the center of a receptive field.
Args:
x: An array of input center coordinates with shape `(..., d)`, where `d`
is the number of dimensions of the coordinates.
axis: The dimensions for which to compute the feature coordinates.
If `None` (the default), compute the feature coordinates for all
dimensions.
Returns:
y: Coordinates of the features.
Raises:
ValueError: If the number of dimensions of the input center coordinates
does not match the number of elements in `axis`.
"""
# Use all dimensions.
if axis is None:
axis = range(self.size.size)
# Ensure axis is a list because tuples have different indexing behavior.
axis = list(axis)
x = np.asarray(x)
if x.shape[-1] != len(axis):
raise ValueError("Dimensionality of the input center coordinates `x` "
"(%d) does not match dimensionality of `axis` (%d)" %
(x.shape[-1], len(axis)))
return (x + self.padding[axis] + (1 - self.size[axis]) / 2) / \
self.stride[axis]
def __iter__(self):
return iter(np.concatenate([self.size, self.stride, self.padding]))
def compute_receptive_field_from_graph_def(graph_def, input_node, output_node,
stop_propagation=None):
"""Computes receptive field (RF) parameters from a Graph or GraphDef object.
The algorithm stops the calculation of the receptive field whenever it
encounters an operation in the list `stop_propagation`. Stopping the
calculation early can be useful to calculate the receptive field of a
subgraph such as a single branch of the
[inception network](https://arxiv.org/abs/1512.00567).
Args:
graph_def: Graph or GraphDef object.
input_node: Name of the input node or Tensor object from graph.
output_node: Name of the output node or Tensor object from graph.
stop_propagation: List of operation or scope names for which to stop the
propagation of the receptive field.
Returns:
rf_size_x: Receptive field size of network in the horizontal direction, with
@ -331,6 +424,18 @@ def compute_receptive_field_from_graph_def(graph_def, input_node, output_node):
cannot be found. For network criterion alignment, see
photos/vision/features/delf/g3doc/rf_computation.md
"""
# Convert a graph to graph_def if necessary.
if isinstance(graph_def, framework_ops.Graph):
graph_def = graph_def.as_graph_def()
# Convert tensors to names.
if isinstance(input_node, framework_ops.Tensor):
input_node = input_node.op.name
if isinstance(output_node, framework_ops.Tensor):
output_node = output_node.op.name
stop_propagation = stop_propagation or []
# Computes order of computation for a given graph.
name_to_order_node = graph_compute_order.get_compute_order(
graph_def=graph_def)
@ -422,6 +527,10 @@ def compute_receptive_field_from_graph_def(graph_def, input_node, output_node):
# Loop over this node's inputs and potentially propagate information down.
for inp_name in node.input:
# Stop the propagation of the receptive field.
if any(inp_name.startswith(stop) for stop in stop_propagation):
logging.vlog(3, "Skipping explicitly ignored node %s.", node.name)
continue
logging.vlog(4, "inp_name = %s", inp_name)
inp_node = name_to_order_node[inp_name].node
logging.vlog(4, "inp_node = \n%s", inp_node)
@ -480,6 +589,7 @@ def compute_receptive_field_from_graph_def(graph_def, input_node, output_node):
raise ValueError("Output node was not found")
if input_node not in rf_sizes_x:
raise ValueError("Input node was not found")
return (rf_sizes_x[input_node], rf_sizes_y[input_node],
effective_strides_x[input_node], effective_strides_y[input_node],
effective_paddings_x[input_node], effective_paddings_y[input_node])
return ReceptiveField(
(rf_sizes_x[input_node], rf_sizes_y[input_node]),
(effective_strides_x[input_node], effective_strides_y[input_node]),
(effective_paddings_x[input_node], effective_paddings_y[input_node]))
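A quick numeric sketch of the coordinate mapping provided by the class above (the import path for `ReceptiveField` is an assumption; adjust it to wherever the class is exposed in your checkout):
import numpy as np
from tensorflow.contrib.receptive_field.python.util.receptive_field import ReceptiveField

rf = ReceptiveField(size=[3, 3], stride=[4, 4], padding=[1, 1])
y = np.array([[0, 0], [1, 2]])              # feature-map coordinates
x = rf.compute_input_center_coordinates(y)  # [[0, 0], [4, 8]] in input space
assert np.all(rf.compute_feature_coordinates(x) == y)  # round-trips exactly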


@ -25,6 +25,7 @@ from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import nn
from tensorflow.python.platform import test
import numpy as np
def create_test_network_1():
@ -150,6 +151,31 @@ def create_test_network_5():
return g
def create_test_network_6():
"""Aligned network with dropout for test.
The graph is similar to create_test_network_1(), except that the right branch
has dropout normalization.
Returns:
g: Tensorflow graph object (Graph proto).
"""
g = ops.Graph()
with g.as_default():
# An 8x8 test image.
x = array_ops.placeholder(dtypes.float32, (1, 8, 8, 1), name='input_image')
# Left branch.
l1 = slim.conv2d(x, 1, [1, 1], stride=4, scope='L1', padding='VALID')
# Right branch.
l2_pad = array_ops.pad(x, [[0, 0], [1, 0], [1, 0], [0, 0]])
l2 = slim.conv2d(l2_pad, 1, [3, 3], stride=2, scope='L2', padding='VALID')
l3 = slim.conv2d(l2, 1, [1, 1], stride=2, scope='L3', padding='VALID')
dropout = slim.dropout(l3)
# Addition.
nn.relu(l1 + dropout, name='output')
return g
class RfUtilsTest(test.TestCase):
def testComputeRFFromGraphDefAligned(self):
@ -220,6 +246,36 @@ class RfUtilsTest(test.TestCase):
self.assertEqual(effective_padding_x, 0)
self.assertEqual(effective_padding_y, 0)
def testComputeRFFromGraphDefStopPropagation(self):
graph_def = create_test_network_6().as_graph_def()
input_node = 'input_image'
output_node = 'output'
# Compute the receptive field but stop the propagation for the random
# uniform variable of the dropout.
(receptive_field_x, receptive_field_y, effective_stride_x,
effective_stride_y, effective_padding_x, effective_padding_y) = (
receptive_field.compute_receptive_field_from_graph_def(
graph_def, input_node, output_node,
['Dropout/dropout/random_uniform']))
self.assertEqual(receptive_field_x, 3)
self.assertEqual(receptive_field_y, 3)
self.assertEqual(effective_stride_x, 4)
self.assertEqual(effective_stride_y, 4)
self.assertEqual(effective_padding_x, 1)
self.assertEqual(effective_padding_y, 1)
def testComputeCoordinatesRoundtrip(self):
graph_def = create_test_network_1()
input_node = 'input_image'
output_node = 'output'
rf = receptive_field.compute_receptive_field_from_graph_def(
graph_def, input_node, output_node)
x = np.random.randint(0, 100, (50, 2))
y = rf.compute_feature_coordinates(x)
x2 = rf.compute_input_center_coordinates(y)
self.assertAllEqual(x, x2)
if __name__ == '__main__':
test.main()


@ -20,6 +20,7 @@ from __future__ import print_function
import numpy as np
from tensorflow.contrib import stateless
from tensorflow.python.framework import constant_op
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import random_seed
from tensorflow.python.ops import array_ops
@ -79,6 +80,21 @@ class StatelessOpsTest(test.TestCase):
for s1, v1 in values:
self.assertEqual(s0 == s1, np.all(v0 == v1))
def testShapeType(self):
with self.test_session(use_gpu=True):
for shape_dtype in [dtypes.int32, dtypes.int64]:
seed_t = array_ops.placeholder(dtypes.int64, shape=[2])
seeds = [(x, y) for x in range(5) for y in range(5)] * 3
for stateless_op, _ in CASES:
for shape in (), (3,), (2, 5):
pure = stateless_op(constant_op.constant(shape, dtype=shape_dtype),
seed=seed_t)
values = [(seed, pure.eval(feed_dict={seed_t: seed}))
for seed in seeds]
for s0, v0 in values:
for s1, v1 in values:
self.assertEqual(s0 == s1, np.all(v0 == v1))
if __name__ == '__main__':
test.main()


@ -783,6 +783,7 @@ cc_library(
"//tensorflow/core/kernels:dataset_ops",
"//tensorflow/core/kernels:fake_quant_ops",
"//tensorflow/core/kernels:function_ops",
"//tensorflow/core/kernels:histogram_op",
"//tensorflow/core/kernels:image",
"//tensorflow/core/kernels:io",
"//tensorflow/core/kernels:linalg",
@ -1943,6 +1944,7 @@ CORE_CPU_LIB_HEADERS = CORE_CPU_BASE_HDRS + [
tf_cuda_library(
name = "core_cpu_impl",
srcs = [
"common_runtime/accumulate_n_optimizer.cc",
"common_runtime/allocator_retry.cc",
"common_runtime/bfc_allocator.cc",
"common_runtime/build_graph_options.cc",
@ -2178,6 +2180,7 @@ tf_cuda_library(
":lib",
":lib_internal",
":protos_all_cc",
":stream_executor",
"//third_party/eigen3",
] + if_static([":gpu_runtime_impl"]),
)
@ -2673,6 +2676,22 @@ tf_cc_tests(
],
)
tf_cc_test_mkl(
name = "mkl_runtime_tests",
size = "small",
srcs = ["common_runtime/mkl_cpu_allocator_test.cc"],
linkstatic = 1,
deps = [
":core",
":core_cpu",
":framework",
":framework_internal",
":test",
":test_main",
":testlib",
],
)
tf_cc_test_mkl(
name = "mkl_related_tests",
size = "small",
@ -2700,7 +2719,20 @@ tf_cc_test_mkl(
"//tensorflow/cc:sendrecv_ops",
"//tensorflow/core/kernels:ops_util",
"//third_party/eigen3",
],
] + if_mkl([
"//tensorflow/core/kernels:mkl_aggregate_ops",
"//tensorflow/core/kernels:mkl_concat_op",
"//tensorflow/core/kernels:mkl_conv_op",
"//tensorflow/core/kernels:mkl_cwise_ops_common",
"//tensorflow/core/kernels:mkl_fused_batch_norm_op",
"//tensorflow/core/kernels:mkl_identity_op",
"//tensorflow/core/kernels:mkl_input_conversion_op",
"//tensorflow/core/kernels:mkl_lrn_op",
"//tensorflow/core/kernels:mkl_pooling_ops",
"//tensorflow/core/kernels:mkl_relu_op",
"//tensorflow/core/kernels:mkl_reshape_op",
"//tensorflow/core/kernels:mkl_tfconv_op",
]),
)
tf_cc_tests_gpu(


@ -0,0 +1,191 @@
/* Copyright 2016 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/core/common_runtime/optimization_registry.h"
#include "tensorflow/core/graph/node_builder.h"
namespace tensorflow {
namespace {
Tensor make_zeros(const DataType& dtype, const TensorShapeProto& shape) {
Tensor tensor(dtype, TensorShape(shape));
// Conveniently, all numeric data types have 0x0 == zero. Otherwise we would
// need a giant switch statement here.
memset(const_cast<char*>(tensor.tensor_data().data()), 0,
tensor.tensor_data().size());
return tensor;
}
// Replaces occurrences of the "AccumulateNV2" stub operator with a graph of
// lower-level ops. The graph is equivalent (modulo certain corner cases)
// to the semantics of the original accumulate_n() Python op in math_ops.py.
// Implementing the op with a rewrite allows this new variant of accumulate_n
// to be differentiable.
//
// The binary code that generates AccumulateNV2 stub ops is located in a
// dynamic library built out of tensorflow/contrib/framework. Ideally, this
// class would also be in contrib, but calls to REGISTER_OPTIMIZATION() from
// third-party libraries aren't currently supported.
class AccumulateNV2RemovePass : public GraphOptimizationPass {
public:
Status Run(const GraphOptimizationPassOptions& options) override {
// TODO(freiss.oss@gmail.com): Substantial shared code with
// ParallelConcatRemovePass::Run(). Consider refactoring if someone makes
// a third similar rewrite.
if (options.graph == nullptr) {
// TODO(apassos) returning OK feels weird here as we can't do anything
// without a graph, but some tests require this.
return Status::OK();
}
Graph* g = options.graph->get();
if (g == nullptr) {
return errors::Internal(
"AccumulateNV2 removal should happen before partitioning and a "
"graph should be available.");
}
// Build up a todo list of ops to replace, *then* modify the graph
gtl::InlinedVector<Node*, 2> matches;
for (Node* n : g->op_nodes()) {
if (n->type_string() == "AccumulateNV2") {
matches.push_back(n);
}
}
for (Node* n : matches) {
TF_RETURN_IF_ERROR(rewriteNode(n, g));
}
return Status::OK();
}
Status rewriteNode(Node* n, Graph* g) {
AttrSlice n_attrs = n->attrs();
auto base_make_node = [n, g, &n_attrs](const string& op,
const string& name) {
NodeBuilder node_builder(name, op);
// The pieces of AccumulateNV2 should all be on the same node.
node_builder.Device(n->requested_device());
string colo;
if (GetNodeAttr(n_attrs, kColocationAttrName, &colo).ok()) {
node_builder.Attr(kColocationAttrName, colo);
}
return node_builder;
};
auto make_node = [n, g, &n_attrs, &base_make_node](string op) {
return base_make_node(
op, g->NewName(strings::StrCat(n->name(), "/Internal")));
};
DataType dtype;
TF_RETURN_IF_ERROR(GetNodeAttr(n_attrs, "T", &dtype));
TensorShapeProto shape;
TF_RETURN_IF_ERROR(GetNodeAttr(n_attrs, "shape", &shape));
std::vector<const Edge*> data_edges, control_edges;
for (const Edge* input_edge : n->in_edges()) {
if (input_edge->IsControlEdge()) {
control_edges.push_back(input_edge);
} else {
data_edges.push_back(input_edge);
}
}
// Create the following ops to replace the AccumulateNV2 placeholder:
Node* create_accumulator = nullptr; // TemporaryVariable op
Node* initial_val = nullptr; // Const op
Node* initialize_accumulator = nullptr; // Assign op
std::vector<Node*> add_values_to_accumulator; // AssignAdd ops
Node* clean_up_accumulator = nullptr; // DestroyTemporaryVariable
const string accumulator_name =
strings::StrCat(n->name(), "/Internal/Accumulator");
TF_RETURN_IF_ERROR(make_node("TemporaryVariable")
.Attr("shape", shape)
.Attr("dtype", dtype)
.Attr("var_name", accumulator_name)
.Finalize(g, &create_accumulator));
TF_RETURN_IF_ERROR(make_node("Const")
.Attr("value", make_zeros(dtype, shape))
.Attr("dtype", dtype)
.Finalize(g, &initial_val));
TF_RETURN_IF_ERROR(make_node("Assign")
.Attr("T", dtype)
.Input(create_accumulator) // ref: Ref(T)
.Input(initial_val) // value: T
.Finalize(g, &initialize_accumulator));
for (int i = 0; i < data_edges.size(); ++i) {
Node* assignAdd;
TF_RETURN_IF_ERROR(make_node("AssignAdd")
.Attr("T", dtype)
.Attr("use_locking", true)
.Input(initialize_accumulator) // ref: Ref(T)
.Input(data_edges[i]->src(),
data_edges[i]->src_output()) // value: T
.Finalize(g, &assignAdd));
add_values_to_accumulator.push_back(assignAdd);
}
// Note that we use the original placeholder op's name here
TF_RETURN_IF_ERROR(base_make_node("DestroyTemporaryVariable", n->name())
.Attr("T", dtype)
.Attr("var_name", accumulator_name)
.Input(initialize_accumulator)
.Finalize(g, &clean_up_accumulator));
// Add edges to the graph to ensure that operations occur in the right
// order:
// 1. Do anything that had a control edge to the AccumulateNV2 placeholder
// 2. Initialize accumulator
// 3. Add input values to accumulator (already handled by data edges
// added above)
// 4. Reclaim the buffer that held the accumulator
// 5. Do anything that depended on the AccumulateNV2 placeholder
for (const Edge* control_edge : control_edges) {
g->AddControlEdge(control_edge->src(), initialize_accumulator);
}
for (Node* assign_add : add_values_to_accumulator) {
g->AddControlEdge(assign_add, clean_up_accumulator);
}
for (const Edge* out_edge : n->out_edges()) {
if (out_edge->IsControlEdge()) {
g->AddControlEdge(clean_up_accumulator, out_edge->dst());
} else {
g->AddEdge(clean_up_accumulator, 0, out_edge->dst(),
out_edge->dst_input());
}
}
// Remove the original AccumulateNV2 placeholder op.
// This removal modifies the op and must happen after we have finished
// using its incoming/outgoing edge sets.
g->RemoveNode(n);
return Status::OK();
}
};
REGISTER_OPTIMIZATION(OptimizationPassRegistry::PRE_PLACEMENT, 0,
AccumulateNV2RemovePass);
} // namespace
} // namespace tensorflow

View File

@ -21,9 +21,13 @@ limitations under the License.
#ifdef INTEL_MKL
#include <unistd.h>
#include <cstdlib>
#include <string>
#include "tensorflow/core/common_runtime/bfc_allocator.h"
#include "tensorflow/core/framework/allocator.h"
#include "tensorflow/core/lib/strings/numbers.h"
#include "tensorflow/core/lib/strings/str_util.h"
#include "tensorflow/core/platform/mem.h"
#include "i_malloc.h"
@ -46,10 +50,50 @@ class MklCPUAllocator : public Allocator {
public:
// Constructor and other standard functions
MklCPUAllocator() {
/// Environment variable that the user can set to specify an upper bound on memory allocation
static constexpr const char* kMaxLimitStr = "TF_MKL_ALLOC_MAX_BYTES";
/// Default upper limit on allocator size - 64GB
static const size_t kDefaultMaxLimit = 64LL << 30;
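/// For example (editor's note; the value is illustrative, not from the
/// original source): launching a process with TF_MKL_ALLOC_MAX_BYTES=17179869184
/// in its environment caps the BFC allocator below at 16GB instead of the
/// physical-RAM / 64GB default.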
MklCPUAllocator() { TF_CHECK_OK(Initialize()); }
~MklCPUAllocator() override { delete allocator_; }
Status Initialize() {
VLOG(2) << "MklCPUAllocator: In MklCPUAllocator";
allocator_ =
new BFCAllocator(new MklSubAllocator, kMaxMemSize, kAllowGrowth, kName);
// Set the upper bound on memory allocation to the physical RAM available on
// the CPU, unless explicitly specified by the user.
uint64 max_mem_bytes = kDefaultMaxLimit;
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
max_mem_bytes =
(uint64)sysconf(_SC_PHYS_PAGES) * (uint64)sysconf(_SC_PAGESIZE);
#endif
char* user_mem_bytes = getenv(kMaxLimitStr);
if (user_mem_bytes != NULL) {
uint64 user_val = 0;
if (!strings::safe_strtou64(user_mem_bytes, &user_val)) {
return errors::InvalidArgument("Invalid memory limit (", user_mem_bytes,
") specified for MKL allocator through ",
kMaxLimitStr);
}
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
if (user_val > max_mem_bytes) {
LOG(WARNING) << "The user specifed a memory limit " << kMaxLimitStr
<< "=" << user_val
<< " greater than available physical memory: "
<< max_mem_bytes
<< ". This could significantly reduce performance!";
}
#endif
max_mem_bytes = user_val;
}
VLOG(1) << "MklCPUAllocator: Setting max_mem_bytes: " << max_mem_bytes;
allocator_ = new BFCAllocator(new MklSubAllocator, max_mem_bytes,
kAllowGrowth, kName);
// For redirecting all allocations from MKL to this allocator
// From: http://software.intel.com/en-us/node/528565
@ -57,9 +101,9 @@ class MklCPUAllocator : public Allocator {
i_calloc = CallocHook;
i_realloc = ReallocHook;
i_free = FreeHook;
}
~MklCPUAllocator() override { delete allocator_; }
return Status::OK();
}
inline string Name() override { return kName; }
@ -71,6 +115,8 @@ class MklCPUAllocator : public Allocator {
allocator_->DeallocateRaw(ptr);
}
void GetStats(AllocatorStats* stats) { return allocator_->GetStats(stats); }
private:
// Hooks provided by this allocator for memory allocation routines from MKL
@ -96,11 +142,6 @@ class MklCPUAllocator : public Allocator {
TF_CHECK_OK(s); // way to assert with an error message
}
// TODO(jbobba): We should ideally move this into CPUOptions in config.proto.
/// Memory limit - 64GB
static const size_t kMaxMemSize =
static_cast<size_t>(64) * 1024 * 1024 * 1024;
/// Do we allow growth in BFC Allocator
static const bool kAllowGrowth = true;

View File

@ -0,0 +1,53 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifdef INTEL_MKL
#include "tensorflow/core/common_runtime/mkl_cpu_allocator.h"
#include "tensorflow/core/lib/core/status_test_util.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/test.h"
namespace tensorflow {
TEST(MKLBFCAllocatorTest, TestMaxLimit) {
AllocatorStats stats;
setenv(MklCPUAllocator::kMaxLimitStr, "1000", 1);
MklCPUAllocator a;
TF_EXPECT_OK(a.Initialize());
a.GetStats(&stats);
EXPECT_EQ(stats.bytes_limit, 1000);
unsetenv(MklCPUAllocator::kMaxLimitStr);
TF_EXPECT_OK(a.Initialize());
a.GetStats(&stats);
uint64 max_mem_bytes = MklCPUAllocator::kDefaultMaxLimit;
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
max_mem_bytes =
(uint64)sysconf(_SC_PHYS_PAGES) * (uint64)sysconf(_SC_PAGESIZE);
#endif
EXPECT_EQ(stats.bytes_limit, max_mem_bytes);
setenv(MklCPUAllocator::kMaxLimitStr, "wrong-input", 1);
EXPECT_TRUE(errors::IsInvalidArgument(a.Initialize()));
setenv(MklCPUAllocator::kMaxLimitStr, "-20", 1);
EXPECT_TRUE(errors::IsInvalidArgument(a.Initialize()));
}
} // namespace tensorflow
#endif // INTEL_MKL

View File

@ -1020,6 +1020,29 @@ Status UnknownShape(shape_inference::InferenceContext* c) {
return Status::OK();
}
template <typename T>
Status ReductionShapeHelper(const Tensor* reduction_indices_t,
const int32 input_rank,
std::set<int64>& true_indices) {
auto reduction_indices = reduction_indices_t->flat<T>();
for (int i = 0; i < reduction_indices_t->NumElements(); ++i) {
const T reduction_index = reduction_indices(i);
if (reduction_index < -input_rank || reduction_index >= input_rank) {
return errors::InvalidArgument("Invalid reduction dimension ",
reduction_index, " for input with ",
input_rank, " dimensions.");
}
auto wrapped_index = reduction_index;
if (wrapped_index < 0) {
wrapped_index += input_rank;
}
true_indices.insert(wrapped_index);
}
return Status::OK();
}
Status ReductionShape(InferenceContext* c) {
ShapeHandle input = c->input(0);
@ -1050,22 +1073,16 @@ Status ReductionShape(InferenceContext* c) {
}
const int32 input_rank = c->Rank(input);
std::set<int32> true_indices;
auto reduction_indices = reduction_indices_t->flat<int32>();
for (int i = 0; i < reduction_indices_t->NumElements(); ++i) {
int32 reduction_index = reduction_indices(i);
if (reduction_index < -input_rank || reduction_index >= input_rank) {
return errors::InvalidArgument("Invalid reduction dimension ",
reduction_index, " for input with ",
input_rank, " dimensions.");
}
int32 wrapped_index = reduction_index;
if (wrapped_index < 0) {
wrapped_index += input_rank;
}
true_indices.insert(wrapped_index);
std::set<int64> true_indices;
if (reduction_indices_t->dtype() == DataType::DT_INT32) {
TF_RETURN_IF_ERROR(ReductionShapeHelper<int32>(reduction_indices_t,
input_rank, true_indices));
} else if (reduction_indices_t->dtype() == DataType::DT_INT64) {
TF_RETURN_IF_ERROR(ReductionShapeHelper<int64>(reduction_indices_t,
input_rank, true_indices));
} else {
return errors::InvalidArgument(
"reduction_indices can only be int32 or int64");
}
std::vector<DimensionHandle> dims;
@ -1319,11 +1336,10 @@ Status ScatterNdUpdateShape(InferenceContext* c) {
Status s = c->Merge(prefix_indices, prefix_updates, &unused);
if (!s.ok()) {
return errors::InvalidArgument(
"The outer ", num_outer_dims,
" dimensions of indices.shape=", c->DebugString(indices_shape),
" must match the outer ", num_outer_dims,
" dimensions of updates.shape=", c->DebugString(updates_shape),
": ", s.error_message());
"The outer ", num_outer_dims, " dimensions of indices.shape=",
c->DebugString(indices_shape), " must match the outer ",
num_outer_dims, " dimensions of updates.shape=",
c->DebugString(updates_shape), ": ", s.error_message());
}
ShapeHandle input_suffix;

View File

@ -35,7 +35,7 @@ message NodeDef {
// CONSTRAINT ::= ("job:" JOB_NAME)
// | ("replica:" [1-9][0-9]*)
// | ("task:" [1-9][0-9]*)
// | ( ("gpu" | "cpu") ":" ([1-9][0-9]* | "*") )
// | ("device:" ("gpu" | "cpu") ":" ([1-9][0-9]* | "*") )
//
// Valid values for this string include:
// * "/job:worker/replica:0/task:1/device:GPU:3" (full specification)

View File

@ -87,7 +87,7 @@ limitations under the License.
#elif defined(__ANDROID_TYPES_FULL__)
// Only half, float, int32, int64, and quantized types are supported.
// Only half, float, int32, int64, bool, and quantized types are supported.
#define TF_CALL_float(m) m(float)
#define TF_CALL_double(m)
#define TF_CALL_int32(m) m(::tensorflow::int32)
@ -117,7 +117,7 @@ limitations under the License.
#else // defined(IS_MOBILE_PLATFORM) && !defined(__ANDROID_TYPES_FULL__)
// Only float and int32 are supported.
// Only float, int32, and bool are supported.
#define TF_CALL_float(m) m(float)
#define TF_CALL_double(m)
#define TF_CALL_int32(m) m(::tensorflow::int32)

View File

@ -210,7 +210,7 @@ class LocalRendezvousImpl : public Rendezvous {
ItemQueue* queue = &table_[key_hash];
if (queue->empty() || !queue->front()->IsSendValue()) {
// There is no message to pick up.
// Only recv-related fileds need to be filled.
// Only recv-related fields need to be filled.
Item* item = new Item;
item->waiter = std::move(done);
item->recv_args = recv_args;

View File

@ -639,7 +639,7 @@ class Graph {
std::unordered_map<string, int> device_names_map_;
// All the while contexts owned by this graph, keyed by frame name,
// corresonding to all the while loops contained in this graph (including
// corresponding to all the while loops contained in this graph (including
// nested loops). The stored contexts are usually accessed via
// AddWhileContext() or Node::while_ctx(), but this manages the lifetime.
std::map<string, WhileContext> while_ctxs_;

View File

@ -543,7 +543,7 @@ class MklLayoutRewritePass : public GraphOptimizationPass {
string reason;
// Substring that should be checked for in device name for CPU device.
const char* const kCPUDeviceSubStr = "cpu";
const char* const kCPUDeviceSubStr = "CPU";
// If Op has been specifically assigned to a non-CPU device, then No.
if (!n->assigned_device_name().empty() &&

View File

@ -39,7 +39,7 @@ limitations under the License.
namespace tensorflow {
namespace {
const char kCPUDevice[] = "/job:a/replica:0/task:0/cpu:0";
const char kCPUDevice[] = "/job:a/replica:0/task:0/device:CPU:0";
const char kGPUDevice[] = "/job:a/replica:0/task:0/device:GPU:0";
static void InitGraph(const string& s, Graph* graph,

View File

@ -480,6 +480,24 @@ Node* Conv2D(Graph* g, Node* in0, Node* in1) {
return ret;
}
Node* Diag(Graph* g, Node* in, DataType type) {
Node* ret;
TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Diag")
.Input(in)
.Attr("T", type)
.Finalize(g, &ret));
return ret;
}
Node* DiagPart(Graph* g, Node* in, DataType type) {
Node* ret;
TF_CHECK_OK(NodeBuilder(g->NewName("n"), "DiagPart")
.Input(in)
.Attr("T", type)
.Finalize(g, &ret));
return ret;
}
void ToGraphDef(Graph* g, GraphDef* gdef) { g->ToGraphDef(gdef); }
} // end namespace graph

View File

@ -199,6 +199,12 @@ Node* BiasAdd(Graph* g, Node* value, Node* bias);
// Add a Conv2D node in "g".
Node* Conv2D(Graph* g, Node* in0, Node* in1);
// Add a Diag node in "g".
Node* Diag(Graph* g, Node* in, DataType type);
// Add a DiagPart node in "g".
Node* DiagPart(Graph* g, Node* in, DataType type);
} // end namespace graph
} // end namespace test
} // end namespace tensorflow

View File

@ -104,7 +104,7 @@ Status ModelPruner::Optimize(Cluster* cluster, const GrapplerItem& item,
// - Don't remove nodes that receive reference values, as those can be
// converting references to non-references. It is important to preserve
// these non-references since the partitioner will avoid sending
// non-references accross partitions more than once.
// non-references across partitions more than once.
if (!rewriter.DrivesControlDependency(node) &&
!rewriter.IsDrivenByControlDependency(node) &&
!rewriter.IsConnectedToFunction(node) &&

View File

@ -2499,6 +2499,7 @@ cc_library(
":cross_op",
":cwise_op",
":fft_ops",
":histogram_op",
":matmul_op",
":population_count_op",
":reduction_ops",
@ -2635,6 +2636,24 @@ tf_kernel_library(
deps = MATH_DEPS,
)
tf_cc_test(
name = "sequence_ops_test",
size = "small",
srcs = ["sequence_ops_test.cc"],
deps = [
":ops_testutil",
":ops_util",
":sequence_ops",
"//tensorflow/core:core_cpu",
"//tensorflow/core:framework",
"//tensorflow/core:lib",
"//tensorflow/core:protos_all_cc",
"//tensorflow/core:test",
"//tensorflow/core:test_main",
"//tensorflow/core:testlib",
],
)
tf_cuda_cc_test(
name = "cast_op_test",
size = "small",
@ -2893,6 +2912,24 @@ tf_cuda_cc_test(
],
)
tf_cuda_cc_test(
name = "diag_op_test",
size = "small",
srcs = ["diag_op_test.cc"],
deps = [
":diag_op",
":ops_testutil",
":ops_util",
"//tensorflow/core:core_cpu",
"//tensorflow/core:framework",
"//tensorflow/core:lib",
"//tensorflow/core:protos_all_cc",
"//tensorflow/core:test",
"//tensorflow/core:test_main",
"//tensorflow/core:testlib",
],
)
# conv_grad_ops currently has to be built with conv_ops*.
# TODO(josh11b, zhengxq): put these a separate libraries in ":nn" below once
# conv_ops_gpu.h has be separated into its own library.
@ -2993,6 +3030,7 @@ cc_library(
":in_topk_op",
":l2loss_op",
":lrn_op",
":nth_element_op",
":relu_op",
":softmax_op",
":softplus_op",
@ -3079,6 +3117,12 @@ tf_kernel_library(
deps = NN_DEPS + if_cuda(["@cub_archive//:cub"]),
)
tf_kernel_library(
name = "nth_element_op",
prefix = "nth_element_op",
deps = NN_DEPS,
)
tf_kernel_library(
name = "xent_op",
prefix = "xent_op",
@ -3096,6 +3140,17 @@ tf_kernel_library(
],
)
tf_kernel_library(
name = "histogram_op",
prefix = "histogram_op",
deps = [
"//tensorflow/core:framework",
"//tensorflow/core:lib",
"//tensorflow/core:lib_internal",
"//third_party/eigen3",
] + if_cuda(["@cub_archive//:cub"]),
)
tf_kernel_library(
name = "l2loss_op",
prefix = "l2loss_op",

View File

@ -249,40 +249,34 @@ class BatchToSpaceOp : public OpKernel {
Tensor block_shape_;
};
#define REGISTER(T) \
REGISTER_KERNEL_BUILDER(Name("BatchToSpaceND") \
.Device(DEVICE_CPU) \
.TypeConstraint<T>("T") \
.TypeConstraint<int32>("Tblock_shape") \
.TypeConstraint<int32>("Tcrops") \
.HostMemory("block_shape") \
.HostMemory("crops"), \
BatchToSpaceNDOp<CPUDevice, T>); \
REGISTER_KERNEL_BUILDER(Name("BatchToSpace") \
.Device(DEVICE_CPU) \
.TypeConstraint<T>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("crops"), \
#define REGISTER(T) \
REGISTER_KERNEL_BUILDER(Name("BatchToSpaceND") \
.Device(DEVICE_CPU) \
.TypeConstraint<T>("T") \
.HostMemory("block_shape") \
.HostMemory("crops"), \
BatchToSpaceNDOp<CPUDevice, T>); \
REGISTER_KERNEL_BUILDER(Name("BatchToSpace") \
.Device(DEVICE_CPU) \
.TypeConstraint<T>("T") \
.HostMemory("crops"), \
BatchToSpaceOp<CPUDevice, T>);
TF_CALL_REAL_NUMBER_TYPES(REGISTER);
#undef REGISTER
#if GOOGLE_CUDA
#define REGISTER(T) \
REGISTER_KERNEL_BUILDER(Name("BatchToSpaceND") \
.Device(DEVICE_GPU) \
.TypeConstraint<T>("T") \
.TypeConstraint<int32>("Tblock_shape") \
.TypeConstraint<int32>("Tcrops") \
.HostMemory("block_shape") \
.HostMemory("crops"), \
BatchToSpaceNDOp<GPUDevice, T>); \
REGISTER_KERNEL_BUILDER(Name("BatchToSpace") \
.Device(DEVICE_GPU) \
.TypeConstraint<T>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("crops"), \
#define REGISTER(T) \
REGISTER_KERNEL_BUILDER(Name("BatchToSpaceND") \
.Device(DEVICE_GPU) \
.TypeConstraint<T>("T") \
.HostMemory("block_shape") \
.HostMemory("crops"), \
BatchToSpaceNDOp<GPUDevice, T>); \
REGISTER_KERNEL_BUILDER(Name("BatchToSpace") \
.Device(DEVICE_GPU) \
.TypeConstraint<T>("T") \
.HostMemory("crops"), \
BatchToSpaceOp<GPUDevice, T>);
TF_CALL_GPU_NUMBER_TYPES(REGISTER);

View File

@ -394,7 +394,7 @@ __global__ void SwapDimension1And2InTensor3SmallDim(const T* input,
int output_block_idx = SmallDim2 ? block_offset : block_offset * small_dim;
int output_block_origin_idx = output_block_offset + output_block_idx;
// Store the tranposed memory region in shared memory to device.
// Store the transposed memory region in shared memory to device.
if (x < tile_height) {
for (int y = 0; y < small_dim; y++) {
int output_idx = output_block_origin_idx + x +

View File

@ -61,8 +61,12 @@ class CropAndResizeOpTest : public OpsTestBase {
REGISTER_TEST(float)
REGISTER_TEST(double)
REGISTER_TEST(int8)
REGISTER_TEST(uint8)
REGISTER_TEST(uint16)
REGISTER_TEST(int8)
REGISTER_TEST(int16)
REGISTER_TEST(int32)
REGISTER_TEST(int64)
#undef REGISTER_TEST

View File

@ -412,7 +412,7 @@ class DatasetIterator : public IteratorBase {
// Owns one reference on the shared dataset resource.
const DatasetType* dataset;
// Identifies the sequence of iterators leading up to to this iterator.
// Identifies the sequence of iterators leading up to this iterator.
const string prefix;
};

View File

@ -14,65 +14,32 @@ limitations under the License.
==============================================================================*/
// See docs in ../ops/array_ops.cc
#define EIGEN_USE_THREADS
#if GOOGLE_CUDA
#define EIGEN_USE_GPU
#endif // GOOGLE_CUDA
#include "tensorflow/core/kernels/diag_op.h"
#include <algorithm>
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/register_types.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_types.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/util/work_sharder.h"
namespace tensorflow {
namespace {
template <typename T, size_t NumDims, size_t DoubleNumDims>
class DiagonalGenerator {
public:
explicit DiagonalGenerator(const Tensor& diagonal) : diagonal_(diagonal) {
static_assert(DoubleNumDims == 2 * NumDims,
"The second size must be the double of the first size.");
CHECK_EQ(diagonal.dims(), NumDims);
}
T operator()(
const Eigen::array<Eigen::DenseIndex, DoubleNumDims>& coordinates) const {
Eigen::array<Eigen::DenseIndex, NumDims> index;
for (size_t i = 0; i < NumDims; ++i) {
if (coordinates[i] != coordinates[NumDims + i]) {
return T(0);
}
index[i] = coordinates[i];
}
return diagonal_.tensor<T, NumDims>()(index);
}
private:
Tensor diagonal_;
};
template <typename T, size_t NumDims>
class DiagonalExtractor {
public:
explicit DiagonalExtractor(const Tensor& tensor) : tensor_(tensor) {
CHECK_EQ(tensor.dims(), 2 * NumDims);
}
T operator()(const Eigen::array<Eigen::Index, NumDims>& coordinates) const {
Eigen::array<Eigen::Index, 2 * NumDims> index;
for (size_t j = 0; j < NumDims; ++j){
index[j] = coordinates[j];
}
for (size_t j = NumDims; j < 2 * NumDims; ++j){
index[j] = index[j - NumDims];
}
return tensor_.tensor<T, 2 * NumDims>()(index);
}
private:
Tensor tensor_;
};
} // namespace
typedef Eigen::ThreadPoolDevice CPUDevice;
typedef Eigen::GpuDevice GPUDevice;
// Generate the diagonal tensor with the diagonal set to the input tensor.
// It only allows up to rank 3 input tensor, so the output tensor is up to
// rank 6.
template <typename T>
template <typename Device, typename T>
class DiagOp : public OpKernel {
public:
explicit DiagOp(OpKernelConstruction* context) : OpKernel(context) {}
@ -80,9 +47,8 @@ class DiagOp : public OpKernel {
void Compute(OpKernelContext* context) override {
const Tensor& diagonal = context->input(0);
const int num_dims = diagonal.dims();
OP_REQUIRES(context, 1 <= num_dims && num_dims <= 3,
errors::InvalidArgument("Expected 1 <= dims <= 3, got shape ",
diagonal.shape().DebugString()));
OP_REQUIRES(context, 0 != num_dims, errors::InvalidArgument(
"Input must be at least rank 1, got 0"));
TensorShape out_shape;
for (int i = 0; i < num_dims; ++i) {
out_shape.AddDim(diagonal.dim_size(i));
@ -93,45 +59,17 @@ class DiagOp : public OpKernel {
Tensor* output_tensor = nullptr;
OP_REQUIRES_OK(context,
context->allocate_output(0, out_shape, &output_tensor));
switch (num_dims) {
case 1:
output_tensor->tensor<T, 2>() = output_tensor->tensor<T, 2>().generate(
DiagonalGenerator<T, 1, 2>(diagonal));
break;
case 2:
output_tensor->tensor<T, 4>() = output_tensor->tensor<T, 4>().generate(
DiagonalGenerator<T, 2, 4>(diagonal));
break;
case 3:
output_tensor->tensor<T, 6>() = output_tensor->tensor<T, 6>().generate(
DiagonalGenerator<T, 3, 6>(diagonal));
break;
default:
context->SetStatus(errors::Unimplemented(
"Diagonal of rank ", num_dims, " tensor is not supported yet."));
return;
}
functor::DiagFunctor<Device, T> diagFunc;
Status s = diagFunc(context,
diagonal.NumElements(),
diagonal.flat<T>().data(),
output_tensor->flat<T>().data());
OP_REQUIRES_OK(context, s);
}
};
#define REGISTER_DIAGOP(T) \
REGISTER_KERNEL_BUILDER( \
Name("Diag").Device(DEVICE_CPU).TypeConstraint<T>("T"), DiagOp<T>)
REGISTER_DIAGOP(double);
REGISTER_DIAGOP(float);
REGISTER_DIAGOP(int32);
REGISTER_DIAGOP(int64);
REGISTER_DIAGOP(complex64);
REGISTER_DIAGOP(complex128);
#undef REGISTER_DIAGOP
// Generate the diagonal tensor with the diagonal set to the input tensor.
// It only allows rank 2, 4, or 6 input tensor, so the output tensor is
// rank 1, 2, or 3.
template <typename T>
// Extract the diagonal elements from the input tensor.
template <typename Device, typename T>
class DiagPartOp : public OpKernel {
public:
explicit DiagPartOp(OpKernelConstruction* context) : OpKernel(context) {}
@ -140,9 +78,9 @@ class DiagPartOp : public OpKernel {
const Tensor& tensor = context->input(0);
const int num_dims = tensor.dims();
const int out_dims = num_dims / 2;
OP_REQUIRES(context, 2 == num_dims || 4 == num_dims || 6 == num_dims,
errors::InvalidArgument("The rank of the tensor should be 2, \
4, or 6, got shape ",
OP_REQUIRES(context, 0 == num_dims % 2,
errors::InvalidArgument("The rank of the tensor should be \
even and positive, got shape ",
tensor.shape().DebugString()));
for (int i = 0; i < out_dims; i++){
OP_REQUIRES(context, tensor.dim_size(i) == tensor.dim_size(i + out_dims),
@ -160,39 +98,158 @@ class DiagPartOp : public OpKernel {
Tensor* output = nullptr;
OP_REQUIRES_OK(context,
context->allocate_output(0, out_shape, &output));
switch (num_dims) {
case 2:
output->tensor<T, 1>() = output->tensor<T, 1>().generate(
DiagonalExtractor<T, 1>(tensor));
break;
case 4:
output->tensor<T, 2>() = output->tensor<T, 2>().generate(
DiagonalExtractor<T, 2>(tensor));
break;
case 6:
output->tensor<T, 3>() = output->tensor<T, 3>().generate(
DiagonalExtractor<T, 3>(tensor));
break;
default:
context->SetStatus(errors::Unimplemented(
"Diagonal of rank ", num_dims, " tensor is not supported yet."));
return;
}
functor::DiagPartFunctor<Device, T> diagPartFunc;
Status s = diagPartFunc(context,
out_shape.num_elements(),
tensor.flat<T>().data(),
output->flat<T>().data());
OP_REQUIRES_OK(context, s);
}
};
#define REGISTER_DIAGPARTOP(T) \
REGISTER_KERNEL_BUILDER( \
Name("DiagPart").Device(DEVICE_CPU).TypeConstraint<T>("T"), DiagPartOp<T>)
// Implementation of the functor specialization for CPU.
//
// According to the diagonal definition,
// `output[i1,..., ik, i1,..., ik] = input[i1,..., ik]`,
//
// Let the shape of the input be [s1,..., sk]; then any offset into the
// input's buffer can be represented by a coordinate [i1,..., ik],
// where `index = i1*(s2*...*sk) + i2*(s3*...*sk) +... + ik`
//
// Let new_index be the offset into the output's buffer with coordinate
// [i1,..., ik, i1,..., ik]; then we have
// `new_index = i1*(s2*...*sk*s1*...*sk) + i2*(s3*...*sk*s1*...*sk) +... + \
// ik*(s1*...*sk) + i1*(s2*...*sk) + i2*(s3*...*sk) +... + ik
// = (i1*(s2*...*sk) + i2*(s3*...*sk) +... + ik) * (1 + s1*...*sk)
// = index * (1 + s1*...*sk)
//
// With `size = s1*...*sk`, we finally have `new_index = index * (1 + size)`,
// which is the transfer function we use below.
// This trick makes the implementation clear and easy to parallelize.
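//
// As a concrete check (editor's illustration): for an input of shape [2, 3],
// size = 6 and the output has shape [2, 3, 2, 3]. The input element at
// coordinate [1, 1] has index = 1*3 + 1 = 4, and its output coordinate
// [1, 1, 1, 1] has new_index = 1*18 + 1*6 + 1*3 + 1 = 28 = 4 * (1 + 6).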
namespace functor {
template <typename T>
struct DiagFunctor<CPUDevice, T> {
EIGEN_ALWAYS_INLINE Status
operator() (OpKernelContext* context, const int64 size,
const T* in, T* out) {
// Each shard of work is responsible for writing values in the output index
// range [start*size, limit*size).
auto subDiag = [in, out, size](int64 start, int64 limit) {
std::fill(out + size * start, out + size * limit, T());
for (int64 index = start; index < limit; ++index) {
out[(1 + size) * index] = in[index];
}
};
REGISTER_DIAGPARTOP(double);
REGISTER_DIAGPARTOP(float);
REGISTER_DIAGPARTOP(int32);
REGISTER_DIAGPARTOP(int64);
REGISTER_DIAGPARTOP(complex64);
REGISTER_DIAGPARTOP(complex128);
// Here, 5 is an empirical factor of cost_per_unit.
auto worker_threads = *(context->device()->tensorflow_cpu_worker_threads());
Shard(worker_threads.num_threads, worker_threads.workers, size,
5 * size, subDiag);
return Status::OK();
}
};
template <typename T>
struct DiagPartFunctor<CPUDevice, T> {
EIGEN_ALWAYS_INLINE Status
operator() (OpKernelContext* context, const int64 size,
const T* in, T* out) {
// Each shard of work is responsible for extracting values in the index
// range [start, limit).
auto subDiagPart = [in, out, size](int64 start, int64 limit) {
for (int64 index = start; index < limit; ++index) {
out[index] = in[(1 + size) * index];
}
};
// Here, 5 is an empirical factor of cost_per_unit.
auto worker_threads = *(context->device()->tensorflow_cpu_worker_threads());
Shard(worker_threads.num_threads, worker_threads.workers, size,
5, subDiagPart);
return Status::OK();
}
};
} // namespace functor
// Register the CPU kernels.
#define REGISTER_DIAGOP(T) \
REGISTER_KERNEL_BUILDER( \
Name("Diag").Device(DEVICE_CPU).TypeConstraint<T>("T"), \
DiagOp<CPUDevice, T>)
TF_CALL_double(REGISTER_DIAGOP);
TF_CALL_float(REGISTER_DIAGOP);
TF_CALL_int32(REGISTER_DIAGOP);
TF_CALL_int64(REGISTER_DIAGOP);
TF_CALL_complex64(REGISTER_DIAGOP);
TF_CALL_complex128(REGISTER_DIAGOP);
#undef REGISTER_DIAGOP
#define REGISTER_DIAGPARTOP(T) \
REGISTER_KERNEL_BUILDER( \
Name("DiagPart").Device(DEVICE_CPU).TypeConstraint<T>("T"), \
DiagPartOp<CPUDevice, T>)
TF_CALL_double(REGISTER_DIAGPARTOP);
TF_CALL_float(REGISTER_DIAGPARTOP);
TF_CALL_int32(REGISTER_DIAGPARTOP);
TF_CALL_int64(REGISTER_DIAGPARTOP);
TF_CALL_complex64(REGISTER_DIAGPARTOP);
TF_CALL_complex128(REGISTER_DIAGPARTOP);
#undef REGISTER_DIAGPARTOP
// Register the GPU kernels.
#ifdef GOOGLE_CUDA
// Forward declarations of the functor specializations for GPU.
namespace functor {
extern template struct DiagFunctor<GPUDevice, double>;
extern template struct DiagFunctor<GPUDevice, float>;
extern template struct DiagFunctor<GPUDevice, int32>;
extern template struct DiagFunctor<GPUDevice, int64>;
extern template struct DiagFunctor<GPUDevice, complex64>;
extern template struct DiagFunctor<GPUDevice, complex128>;
} // namespace functor
#define REGISTER_DIAGOP_GPU(T) \
REGISTER_KERNEL_BUILDER( \
Name("Diag").Device(DEVICE_GPU).TypeConstraint<T>("T"), \
DiagOp<GPUDevice, T>)
TF_CALL_double(REGISTER_DIAGOP_GPU);
TF_CALL_float(REGISTER_DIAGOP_GPU);
TF_CALL_int32(REGISTER_DIAGOP_GPU);
TF_CALL_int64(REGISTER_DIAGOP_GPU);
TF_CALL_complex64(REGISTER_DIAGOP_GPU);
TF_CALL_complex128(REGISTER_DIAGOP_GPU);
#undef REGISTER_DIAGOP_GPU
// Forward declarations of the functor specializations for GPU.
namespace functor {
extern template struct DiagPartFunctor<GPUDevice, double>;
extern template struct DiagPartFunctor<GPUDevice, float>;
extern template struct DiagPartFunctor<GPUDevice, int32>;
extern template struct DiagPartFunctor<GPUDevice, int64>;
extern template struct DiagPartFunctor<GPUDevice, complex64>;
extern template struct DiagPartFunctor<GPUDevice, complex128>;
} // namespace functor
#define REGISTER_DIAGPARTOP_GPU(T) \
REGISTER_KERNEL_BUILDER( \
Name("DiagPart").Device(DEVICE_GPU).TypeConstraint<T>("T"), \
DiagPartOp<GPUDevice, T>)
TF_CALL_double(REGISTER_DIAGPARTOP_GPU);
TF_CALL_float(REGISTER_DIAGPARTOP_GPU);
TF_CALL_int32(REGISTER_DIAGPARTOP_GPU);
TF_CALL_int64(REGISTER_DIAGPARTOP_GPU);
TF_CALL_complex64(REGISTER_DIAGPARTOP_GPU);
TF_CALL_complex128(REGISTER_DIAGPARTOP_GPU);
#undef REGISTER_DIAGPARTOP_GPU
#endif // GOOGLE_CUDA
} // namespace tensorflow

View File

@ -0,0 +1,43 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef TENSORFLOW_CORE_KERNELS_DIAG_OP_H_
#define TENSORFLOW_CORE_KERNELS_DIAG_OP_H_
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor_types.h"
#include "tensorflow/core/platform/types.h"
namespace tensorflow {
namespace functor {
template <typename Device, typename T>
struct DiagFunctor {
Status operator() (OpKernelContext* context, const int64 size,
const T* in, T* out);
};
template <typename Device, typename T>
struct DiagPartFunctor {
Status operator() (OpKernelContext* context, const int64 size,
const T* in, T* out);
};
} // namespace functor
} // namespace tensorflow
#endif // TENSORFLOW_CORE_KERNELS_DIAG_OP_H_

View File

@ -0,0 +1,139 @@
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#if GOOGLE_CUDA
#define EIGEN_USE_GPU
#include <complex>
#include "tensorflow/core/framework/register_types.h"
#include "tensorflow/core/util/cuda_kernel_helper.h"
#include "tensorflow/core/kernels/diag_op.h"
namespace tensorflow {
namespace functor {
typedef Eigen::GpuDevice GPUDevice;
template <typename T>
__global__ void DiagCudaKernel(const int num_threads,
const int64 size,
const T* in,
T* out) {
CUDA_1D_KERNEL_LOOP(index, num_threads) {
// Fill the diagonal elements, and set all other entries to zero.
if (index % (1 + size) == 0) {
out[index] = in[index / (1 + size)];
} else {
out[index] = T(0);
}
}
}
template <typename T>
struct DiagFunctor<GPUDevice, T> {
EIGEN_ALWAYS_INLINE Status
operator() (OpKernelContext* context, const int64 size,
const T* in, T* out) {
// An empty tensor cannot launch the kernel, so return early.
if (size == 0) {
return Status::OK();
}
// CudaLaunchConfig uses an int for virtual_thread_count,
// so `size*size` may overflow in extreme cases;
// check for integer multiplication overflow here.
if (size && (int(size * size) / size) != size) {
return errors::Internal(
"DiagOp got input size too large.");
}
int virtual_thread_count = int(size * size);
// Launch the GPU kernel.
const GPUDevice& device = context->eigen_device<GPUDevice>();
CudaLaunchConfig diag_config = GetCudaLaunchConfig(
virtual_thread_count, device);
DiagCudaKernel<<<diag_config.block_count,
diag_config.thread_per_block,
0, device.stream()>>>(
diag_config.virtual_thread_count, size, in, out);
auto err = cudaGetLastError();
if (err != cudaSuccess) {
return errors::Internal(
"Could not launch DiagOp kernel: ",
cudaGetErrorString(err), ".");
}
return Status::OK();
}
};
template struct DiagFunctor<GPUDevice, double>;
template struct DiagFunctor<GPUDevice, float>;
template struct DiagFunctor<GPUDevice, int32>;
template struct DiagFunctor<GPUDevice, int64>;
template struct DiagFunctor<GPUDevice, complex64>;
template struct DiagFunctor<GPUDevice, complex128>;
template <typename T>
__global__ void DiagPartCudaKernel(const int num_threads,
const int64 size,
const T* in,
T* out) {
CUDA_1D_KERNEL_LOOP(index, num_threads) {
out[index] = in[(1 + size) * index];
}
}
template <typename T>
struct DiagPartFunctor<GPUDevice, T> {
EIGEN_ALWAYS_INLINE Status
operator() (OpKernelContext* context, const int64 size,
const T* in, T* out) {
// An empty tensor cannot launch the kernel, so return early.
if (size == 0) {
return Status::OK();
}
const GPUDevice& device = context->eigen_device<GPUDevice>();
// Extract the diagonal elements.
CudaLaunchConfig diag_config = GetCudaLaunchConfig(size, device);
DiagPartCudaKernel<<<diag_config.block_count,
diag_config.thread_per_block,
0, device.stream()>>>(
diag_config.virtual_thread_count, size, in, out);
auto err = cudaGetLastError();
if (err != cudaSuccess) {
return errors::Internal(
"Could not launch DiagPartOp kernel: ",
cudaGetErrorString(err), ".");
}
return Status::OK();
}
};
template struct DiagPartFunctor<GPUDevice, double>;
template struct DiagPartFunctor<GPUDevice, float>;
template struct DiagPartFunctor<GPUDevice, int32>;
template struct DiagPartFunctor<GPUDevice, int64>;
template struct DiagPartFunctor<GPUDevice, complex64>;
template struct DiagPartFunctor<GPUDevice, complex128>;
} // end namespace functor
} // end namespace tensorflow
#endif // GOOGLE_CUDA

View File

@ -0,0 +1,54 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/platform/test.h"
#include "tensorflow/core/platform/test_benchmark.h"
namespace tensorflow {
template <typename T>
static Graph* Diag(int n, DataType type) {
Graph* g = new Graph(OpRegistry::Global());
Tensor in(type, TensorShape({n}));
in.flat<T>().setRandom();
Node* out = test::graph::Diag(g, test::graph::Constant(g, in), type);
test::graph::DiagPart(g, out, type);
return g;
}
#define BM_DiagDev(N, T, TFTYPE, DEVICE) \
static void BM_Diag##_##N##_##TFTYPE##_##DEVICE(int iters) { \
testing::UseRealTime(); \
testing::ItemsProcessed(static_cast<int64>(iters) * N * N); \
test::Benchmark(#DEVICE, Diag<T>(N, TFTYPE)).Run(iters); \
} \
BENCHMARK(BM_Diag##_##N##_##TFTYPE##_##DEVICE);
#define BM_Diag(N) \
BM_DiagDev(N, int, DT_INT32, cpu); \
BM_DiagDev(N, float, DT_FLOAT, cpu); \
BM_DiagDev(N, std::complex<float>, DT_COMPLEX64, cpu); \
BM_DiagDev(N, int, DT_INT32, gpu); \
BM_DiagDev(N, float, DT_FLOAT, gpu); \
BM_DiagDev(N, std::complex<float>, DT_COMPLEX64, gpu);
BM_Diag(16);
BM_Diag(128);
BM_Diag(512);
} // end namespace tensorflow

View File

@ -0,0 +1,147 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
// See docs in ../ops/math_ops.cc.
#define EIGEN_USE_THREADS
#include "tensorflow/core/kernels/histogram_op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/register_types.h"
#include "tensorflow/core/framework/types.h"
#include "tensorflow/core/lib/core/threadpool.h"
#include "tensorflow/core/platform/types.h"
namespace tensorflow {
typedef Eigen::ThreadPoolDevice CPUDevice;
typedef Eigen::GpuDevice GPUDevice;
namespace functor {
template <typename T, typename Tout>
struct HistogramFixedWidthFunctor<CPUDevice, T, Tout> {
static Status Compute(OpKernelContext* context,
const typename TTypes<T, 1>::ConstTensor& values,
const typename TTypes<T, 1>::ConstTensor& value_range,
int32 nbins, typename TTypes<Tout, 1>::Tensor& out) {
const CPUDevice& d = context->eigen_device<CPUDevice>();
Tensor index_to_bin_tensor;
TF_RETURN_IF_ERROR(context->forward_input_or_allocate_temp(
{0}, DataTypeToEnum<int32>::value, TensorShape({values.size()}),
&index_to_bin_tensor));
auto index_to_bin = index_to_bin_tensor.flat<int32>();
const double step = static_cast<double>(value_range(1) - value_range(0)) /
static_cast<double>(nbins);
// The calculation is done by finding the bin of each value in `values`.
// With value_range = [a, b]:
//   step = (b - a) / nbins
//   bin(x) = (x - a) / step, clamped to [0, nbins - 1]
// and then the resulting bin counts are written to the output.
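// For example (editor's illustration): with value_range = [0, 10] and
// nbins = 5, step = 2, so a value of 7 lands in bin (7 - 0) / 2 = 3, and any
// value >= 10 is clamped into the last bin (index 4) by the cwiseMin below.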
index_to_bin.device(d) =
((values.cwiseMax(value_range(0)) - values.constant(value_range(0)))
.template cast<double>() /
step)
.template cast<int32>()
.cwiseMin(nbins - 1);
out.setZero();
for (int32 i = 0; i < index_to_bin.size(); i++) {
out(index_to_bin(i)) += Tout(1);
}
return Status::OK();
}
};
} // namespace functor
template <typename Device, typename T, typename Tout>
class HistogramFixedWidthOp : public OpKernel {
public:
explicit HistogramFixedWidthOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
void Compute(OpKernelContext* ctx) override {
const Tensor& values_tensor = ctx->input(0);
const Tensor& value_range_tensor = ctx->input(1);
const Tensor& nbins_tensor = ctx->input(2);
OP_REQUIRES(ctx, TensorShapeUtils::IsVector(value_range_tensor.shape()),
errors::InvalidArgument("value_range should be a vector."));
OP_REQUIRES(ctx, (value_range_tensor.shape().num_elements() == 2),
errors::InvalidArgument(
"value_range should be a vector of 2 elements."));
OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(nbins_tensor.shape()),
errors::InvalidArgument("nbins should be a scalar."));
const auto values = values_tensor.flat<T>();
const auto value_range = value_range_tensor.flat<T>();
const auto nbins = nbins_tensor.scalar<int32>()();
OP_REQUIRES(
ctx, (value_range(0) < value_range(1)),
errors::InvalidArgument("value_range should satisfy value_range[0] < "
"value_range[1], but got '[",
value_range(0), ", ", value_range(1), "]'"));
OP_REQUIRES(
ctx, (nbins > 0),
errors::InvalidArgument("nbins should be a positive number, but got '",
nbins, "'"));
Tensor* out_tensor;
OP_REQUIRES_OK(ctx,
ctx->allocate_output(0, TensorShape({nbins}), &out_tensor));
auto out = out_tensor->flat<Tout>();
OP_REQUIRES_OK(
ctx, functor::HistogramFixedWidthFunctor<Device, T, Tout>::Compute(
ctx, values, value_range, nbins, out));
}
};
#define REGISTER_KERNELS(type) \
REGISTER_KERNEL_BUILDER(Name("HistogramFixedWidth") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("dtype"), \
HistogramFixedWidthOp<CPUDevice, type, int32>) \
REGISTER_KERNEL_BUILDER(Name("HistogramFixedWidth") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("dtype"), \
HistogramFixedWidthOp<CPUDevice, type, int64>)
TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNELS);
#undef REGISTER_KERNELS
#if GOOGLE_CUDA
#define REGISTER_KERNELS(type) \
REGISTER_KERNEL_BUILDER(Name("HistogramFixedWidth") \
.Device(DEVICE_GPU) \
.HostMemory("value_range") \
.HostMemory("nbins") \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("dtype"), \
HistogramFixedWidthOp<GPUDevice, type, int32>)
TF_CALL_GPU_NUMBER_TYPES(REGISTER_KERNELS);
#undef REGISTER_KERNELS
#endif // GOOGLE_CUDA
} // end namespace tensorflow

View File

@ -0,0 +1,38 @@
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef TENSORFLOW_HISTOGRAM_OP_H_
#define TENSORFLOW_HISTOGRAM_OP_H_
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor_types.h"
#include "tensorflow/core/framework/types.h"
#include "tensorflow/core/lib/core/errors.h"
namespace tensorflow {
namespace functor {
template <typename Device, typename T, typename Tout>
struct HistogramFixedWidthFunctor {
static Status Compute(OpKernelContext* context,
const typename TTypes<T, 1>::ConstTensor& values,
const typename TTypes<T, 1>::ConstTensor& value_range,
int32 nbins, typename TTypes<Tout, 1>::Tensor& out);
};
} // end namespace functor
} // end namespace tensorflow
#endif // TENSORFLOW_HISTOGRAM_OP_H_

View File

@ -0,0 +1,125 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#if GOOGLE_CUDA
#define EIGEN_USE_GPU
#include "tensorflow/core/kernels/histogram_op.h"
#include "external/cub_archive/cub/device/device_histogram.cuh"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/register_types.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_shape.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/util/cuda_kernel_helper.h"
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
namespace tensorflow {
typedef Eigen::GpuDevice GPUDevice;
namespace functor {
// TODO(yongtang): int64 atomicAdd is not supported yet.
template <typename T, typename Tout>
struct HistogramFixedWidthFunctor<GPUDevice, T, Tout> {
static Status Compute(OpKernelContext* context,
const typename TTypes<T, 1>::ConstTensor& values,
const typename TTypes<T, 1>::ConstTensor& value_range,
int32 nbins, typename TTypes<Tout, 1>::Tensor& out) {
tensorflow::AllocatorAttributes pinned_allocator;
pinned_allocator.set_on_host(true);
pinned_allocator.set_gpu_compatible(true);
Tensor levels_tensor;
TF_RETURN_IF_ERROR(context->allocate_temp(
DataTypeToEnum<T>::value, TensorShape({nbins + 1}), &levels_tensor,
pinned_allocator));
auto levels = levels_tensor.flat<T>();
const double step = static_cast<double>(value_range(1) - value_range(0)) /
static_cast<double>(nbins);
levels(0) = std::numeric_limits<T>::lowest();
for (int i = 1; i < nbins; i++) {
levels(i) =
static_cast<T>(static_cast<double>(value_range(0)) + step * i);
}
levels(nbins) = std::numeric_limits<T>::max();
size_t temp_storage_bytes = 0;
const T* d_samples = values.data();
Tout* d_histogram = out.data();
int num_levels = levels.size();
T* d_levels = levels.data();
int num_samples = values.size();
const cudaStream_t& stream = GetCudaStream(context);
// The first HistogramRange call only computes the temp storage size required;
// it is invoked with d_temp_storage = NULL.
auto err = cub::DeviceHistogram::HistogramRange(
/* d_temp_storage */ NULL,
/* temp_storage_bytes */ temp_storage_bytes,
/* d_samples */ d_samples,
/* d_histogram */ d_histogram,
/* num_levels */ num_levels,
/* d_levels */ d_levels,
/* num_samples */ num_samples,
/* stream */ stream);
if (err != cudaSuccess) {
return errors::Internal(
"Could not launch HistogramRange to get temp storage: ",
cudaGetErrorString(err), ".");
}
Tensor temp_storage;
TF_RETURN_IF_ERROR(context->allocate_temp(
DataTypeToEnum<int8>::value,
TensorShape({static_cast<int64>(temp_storage_bytes)}), &temp_storage));
void* d_temp_storage = temp_storage.flat<int8>().data();
// The second HistogramRange call actually runs, with d_temp_storage
// allocated to temp_storage_bytes.
err = cub::DeviceHistogram::HistogramRange(
/* d_temp_storage */ d_temp_storage,
/* temp_storage_bytes */ temp_storage_bytes,
/* d_samples */ d_samples,
/* d_histogram */ d_histogram,
/* num_levels */ num_levels,
/* d_levels */ d_levels,
/* num_samples */ num_samples,
/* stream */ stream);
if (err != cudaSuccess) {
return errors::Internal("Could not launch HistogramRange: ",
cudaGetErrorString(err), ".");
}
return Status::OK();
}
};
} // end namespace functor
#define REGISTER_GPU_SPEC(type) \
template struct functor::HistogramFixedWidthFunctor<GPUDevice, type, int32>;
TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_SPEC);
#undef REGISTER_GPU_SPEC
} // namespace tensorflow
#endif // GOOGLE_CUDA

View File

@ -24,12 +24,13 @@ limitations under the License.
#include "tensorflow/core/lib/core/status.h"
namespace tensorflow {
template <typename T>
template <typename T, typename Tidx>
class ListDiffOp : public OpKernel {
public:
explicit ListDiffOp(OpKernelConstruction* context) : OpKernel(context) {
const DataType dt = DataTypeToEnum<T>::v();
OP_REQUIRES_OK(context, context->MatchSignature({dt, dt}, {dt, DT_INT32}));
const DataType dtidx = DataTypeToEnum<Tidx>::v();
OP_REQUIRES_OK(context, context->MatchSignature({dt, dt}, {dt, dtidx}));
}
void Compute(OpKernelContext* context) override {
@ -72,9 +73,9 @@ class ListDiffOp : public OpKernel {
Tensor* indices = nullptr;
OP_REQUIRES_OK(context, context->allocate_output(1, {out_size}, &indices));
auto Tindices = indices->vec<int32>();
auto Tindices = indices->vec<Tidx>();
for (int i = 0, p = 0; i < static_cast<int32>(x_size); ++i) {
for (Tidx i = 0, p = 0; i < static_cast<Tidx>(x_size); ++i) {
if (y_set.count(Tx(i)) == 0) {
OP_REQUIRES(context, p < out_size,
errors::InvalidArgument(
@ -95,7 +96,12 @@ class ListDiffOp : public OpKernel {
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("out_idx"), \
ListDiffOp<type>)
ListDiffOp<type, int32>) \
REGISTER_KERNEL_BUILDER(Name("ListDiff") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("out_idx"), \
ListDiffOp<type, int64>)
TF_CALL_REAL_NUMBER_TYPES(REGISTER_LISTDIFF);
REGISTER_LISTDIFF(string);

View File

@ -111,15 +111,21 @@ class StagingMap : public ResourceBase {
void notify_inserters_if_bounded(std::unique_lock<std::mutex>* lock) {
if (has_capacity() || has_memory_limit()) {
lock->unlock();
full_.notify_one();
// Notify all inserters. The removal of an element
// may make memory available for many inserters
// to insert new elements
full_.notify_all();
}
}
// Notify any removers waiting to extract values
// Notify all removers waiting to extract values
// that data is now available
void notify_removers(std::unique_lock<std::mutex>* lock) {
lock->unlock();
not_empty_.notify_one();
// Notify all removers. This is because they are
// waiting for specific keys to appear in the map,
// so we don't know which one to wake up.
not_empty_.notify_all();
}
bool has_capacity() const { return capacity_ > 0; }

View File

@ -18,10 +18,10 @@ limitations under the License.
#define EIGEN_USE_THREADS
#include "tensorflow/core/kernels/mirror_pad_op.h"
#include <string>
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/register_types.h"
@ -35,7 +35,7 @@ limitations under the License.
namespace tensorflow {
template <typename Device, typename T>
template <typename Device, typename T, typename Tpaddings>
class MirrorPadOp : public OpKernel {
public:
explicit MirrorPadOp(OpKernelConstruction* context) : OpKernel(context) {
@ -82,10 +82,10 @@ class MirrorPadOp : public OpKernel {
// Compute the shape of the output tensor, and allocate it.
TensorShape output_shape;
TTypes<int32>::ConstMatrix paddings = in1.matrix<int32>();
typename TTypes<Tpaddings>::ConstMatrix paddings = in1.matrix<Tpaddings>();
for (int d = 0; d < dims; ++d) {
const int32 before = paddings(d, 0); // Pad before existing elements.
const int32 after = paddings(d, 1); // Pad after existing elements.
const Tpaddings before = paddings(d, 0); // Pad before existing elements.
const Tpaddings after = paddings(d, 1); // Pad after existing elements.
OP_REQUIRES(context, before >= 0 && after >= 0,
errors::InvalidArgument("paddings must be non-negative: ",
before, " ", after));
@ -121,7 +121,7 @@ class MirrorPadOp : public OpKernel {
#define MIRROR_PAD_CASE(i) \
case i: { \
functor::MirrorPad<Device, T, i>()( \
functor::MirrorPad<Device, T, Tpaddings, i>()( \
context->eigen_device<Device>(), To32Bit(output->tensor<T, i>()), \
To32Bit(in0.tensor<T, i>()), paddings, offset_); \
break; \
@ -152,20 +152,25 @@ using GpuDevice = Eigen::GpuDevice;
namespace functor {
// Forward declarations of the functor specializations defined in the sharded
// files.
#define DECLARE_CPU_SPEC(T, i) \
template <> \
void MirrorPad<CpuDevice, T, i>::operator()( \
const CpuDevice&, typename TTypes<T, i, int32>::Tensor, \
typename TTypes<T, i, int32>::ConstTensor, TTypes<int32>::ConstMatrix, \
int); \
extern template struct MirrorPad<CpuDevice, T, i>;
#define DECLARE_CPU_SPEC(T, Tpaddings, i) \
template <> \
void MirrorPad<CpuDevice, T, Tpaddings, i>::operator()( \
const CpuDevice&, typename TTypes<T, i, int32>::Tensor, \
typename TTypes<T, i, int32>::ConstTensor, \
TTypes<Tpaddings>::ConstMatrix, int); \
extern template struct MirrorPad<CpuDevice, T, Tpaddings, i>;
#define DECLARE_CPU_SPECS(T) \
DECLARE_CPU_SPEC(T, 1); \
DECLARE_CPU_SPEC(T, 2); \
DECLARE_CPU_SPEC(T, 3); \
DECLARE_CPU_SPEC(T, 4); \
DECLARE_CPU_SPEC(T, 5);
#define DECLARE_CPU_SPECS(T) \
DECLARE_CPU_SPEC(T, int32, 1); \
DECLARE_CPU_SPEC(T, int32, 2); \
DECLARE_CPU_SPEC(T, int32, 3); \
DECLARE_CPU_SPEC(T, int32, 4); \
DECLARE_CPU_SPEC(T, int32, 5); \
DECLARE_CPU_SPEC(T, int64, 1); \
DECLARE_CPU_SPEC(T, int64, 2); \
DECLARE_CPU_SPEC(T, int64, 3); \
DECLARE_CPU_SPEC(T, int64, 4); \
DECLARE_CPU_SPEC(T, int64, 5);
TF_CALL_POD_TYPES(DECLARE_CPU_SPECS);
@ -179,7 +184,13 @@ TF_CALL_POD_TYPES(DECLARE_CPU_SPECS);
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tpaddings") \
.HostMemory("paddings"), \
MirrorPadOp<CpuDevice, type>);
MirrorPadOp<CpuDevice, type, int32>); \
REGISTER_KERNEL_BUILDER(Name("MirrorPad") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tpaddings") \
.HostMemory("paddings"), \
MirrorPadOp<CpuDevice, type, int64>);
// Note that we do register for bool type, but not in the gradient op.
TF_CALL_POD_TYPES(REGISTER_KERNEL);
@ -188,20 +199,25 @@ TF_CALL_POD_TYPES(REGISTER_KERNEL);
#if GOOGLE_CUDA
namespace functor {
// Forward declarations of the functor specializations for GPU.
#define DECLARE_GPU_SPEC(T, i) \
template <> \
void MirrorPad<GpuDevice, T, i>::operator()( \
const GpuDevice&, typename TTypes<T, i, int32>::Tensor, \
typename TTypes<T, i, int32>::ConstTensor, TTypes<int32>::ConstMatrix, \
int); \
extern template struct MirrorPad<GpuDevice, T, i>;
#define DECLARE_GPU_SPEC(T, Tpaddings, i) \
template <> \
void MirrorPad<GpuDevice, T, Tpaddings, i>::operator()( \
const GpuDevice&, typename TTypes<T, i, int32>::Tensor, \
typename TTypes<T, i, int32>::ConstTensor, \
TTypes<Tpaddings>::ConstMatrix, int); \
extern template struct MirrorPad<GpuDevice, T, Tpaddings, i>;
#define DECLARE_GPU_SPECS(T) \
DECLARE_GPU_SPEC(T, 1); \
DECLARE_GPU_SPEC(T, 2); \
DECLARE_GPU_SPEC(T, 3); \
DECLARE_GPU_SPEC(T, 4); \
DECLARE_GPU_SPEC(T, 5);
#define DECLARE_GPU_SPECS(T) \
DECLARE_GPU_SPEC(T, int32, 1); \
DECLARE_GPU_SPEC(T, int32, 2); \
DECLARE_GPU_SPEC(T, int32, 3); \
DECLARE_GPU_SPEC(T, int32, 4); \
DECLARE_GPU_SPEC(T, int32, 5); \
DECLARE_GPU_SPEC(T, int64, 1); \
DECLARE_GPU_SPEC(T, int64, 2); \
DECLARE_GPU_SPEC(T, int64, 3); \
DECLARE_GPU_SPEC(T, int64, 4); \
DECLARE_GPU_SPEC(T, int64, 5);
TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
#undef DECLARE_GPU_SPECS
@ -215,14 +231,20 @@ TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
.TypeConstraint<T>("T") \
.TypeConstraint<int32>("Tpaddings") \
.HostMemory("paddings"), \
MirrorPadOp<GpuDevice, T>)
MirrorPadOp<GpuDevice, T, int32>); \
REGISTER_KERNEL_BUILDER(Name("MirrorPad") \
.Device(DEVICE_GPU) \
.TypeConstraint<T>("T") \
.TypeConstraint<int64>("Tpaddings") \
.HostMemory("paddings"), \
MirrorPadOp<GpuDevice, T, int64>);
TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNEL);
#undef REGISTER_GPU_KERNEL
#endif // GOOGLE_CUDA
// Gradient op.
template <typename Device, typename T>
template <typename Device, typename T, typename Tpaddings>
class MirrorPadGradOp : public OpKernel {
public:
explicit MirrorPadGradOp(OpKernelConstruction* context) : OpKernel(context) {
@ -269,10 +291,10 @@ class MirrorPadGradOp : public OpKernel {
// Compute the shape of the output tensor, and allocate it.
TensorShape output_shape;
TTypes<int32>::ConstMatrix paddings = in1.matrix<int32>();
typename TTypes<Tpaddings>::ConstMatrix paddings = in1.matrix<Tpaddings>();
for (int d = 0; d < dims; ++d) {
const int32 before = paddings(d, 0); // Pad before existing elements.
const int32 after = paddings(d, 1); // Pad after existing elements.
const Tpaddings before = paddings(d, 0); // Pad before existing elements.
const Tpaddings after = paddings(d, 1); // Pad after existing elements.
OP_REQUIRES(context, before >= 0 && after >= 0,
errors::InvalidArgument("Paddings must be non-negative: ",
before, ", ", after));
@ -308,7 +330,7 @@ class MirrorPadGradOp : public OpKernel {
#define MIRROR_PAD_GRAD_CASE(k) \
case k: { \
functor::MirrorPadGrad<Device, T, k>()( \
functor::MirrorPadGrad<Device, T, Tpaddings, k>()( \
context->eigen_device<Device>(), To32Bit(output->tensor<T, k>()), \
To32Bit(in0.tensor<T, k>()), paddings, offset_, \
To32Bit(scratch.tensor<T, k>())); \
@ -337,33 +359,45 @@ class MirrorPadGradOp : public OpKernel {
namespace functor {
// Forward declarations of the functor specializations defined in the sharded
// files.
#define DECLARE_CPU_SPEC(T, k) \
template <> \
void MirrorPadGrad<CpuDevice, T, k>::operator()( \
const CpuDevice&, typename TTypes<T, k, int32>::Tensor, \
typename TTypes<T, k, int32>::ConstTensor, TTypes<int32>::ConstMatrix, \
int, typename TTypes<T, k, int32>::Tensor); \
extern template struct MirrorPadGrad<CpuDevice, T, k>;
#define DECLARE_CPU_SPEC(T, Tpaddings, k) \
template <> \
void MirrorPadGrad<CpuDevice, T, Tpaddings, k>::operator()( \
const CpuDevice&, typename TTypes<T, k, int32>::Tensor, \
typename TTypes<T, k, int32>::ConstTensor, \
TTypes<Tpaddings>::ConstMatrix, int, \
typename TTypes<T, k, int32>::Tensor); \
extern template struct MirrorPadGrad<CpuDevice, T, Tpaddings, k>;
#define DECLARE_CPU_SPECS(T) \
DECLARE_CPU_SPEC(T, 1); \
DECLARE_CPU_SPEC(T, 2); \
DECLARE_CPU_SPEC(T, 3); \
DECLARE_CPU_SPEC(T, 4); \
DECLARE_CPU_SPEC(T, 5);
#define DECLARE_CPU_SPECS(T) \
DECLARE_CPU_SPEC(T, int32, 1); \
DECLARE_CPU_SPEC(T, int32, 2); \
DECLARE_CPU_SPEC(T, int32, 3); \
DECLARE_CPU_SPEC(T, int32, 4); \
DECLARE_CPU_SPEC(T, int32, 5); \
DECLARE_CPU_SPEC(T, int64, 1); \
DECLARE_CPU_SPEC(T, int64, 2); \
DECLARE_CPU_SPEC(T, int64, 3); \
DECLARE_CPU_SPEC(T, int64, 4); \
DECLARE_CPU_SPEC(T, int64, 5);
TF_CALL_NUMBER_TYPES(DECLARE_CPU_SPECS);
#undef DECLARE_CPU_SPECS
#undef DECLARE_CPU_SPEC
} // namespace functor
#define REGISTER_KERNEL(type) \
REGISTER_KERNEL_BUILDER(Name("MirrorPadGrad") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tpaddings") \
.HostMemory("paddings"), \
MirrorPadGradOp<CpuDevice, type>);
#define REGISTER_KERNEL(type) \
REGISTER_KERNEL_BUILDER(Name("MirrorPadGrad") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tpaddings") \
.HostMemory("paddings"), \
MirrorPadGradOp<CpuDevice, type, int32>); \
REGISTER_KERNEL_BUILDER(Name("MirrorPadGrad") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tpaddings") \
.HostMemory("paddings"), \
MirrorPadGradOp<CpuDevice, type, int64>);
TF_CALL_NUMBER_TYPES(REGISTER_KERNEL);
#undef REGISTER_KERNEL
@ -371,20 +405,26 @@ TF_CALL_NUMBER_TYPES(REGISTER_KERNEL);
#if GOOGLE_CUDA
namespace functor {
// Forward declarations of the functor specializations for GPU.
#define DECLARE_GPU_SPEC(T, k) \
template <> \
void MirrorPadGrad<GpuDevice, T, k>::operator()( \
const GpuDevice&, typename TTypes<T, k, int32>::Tensor, \
typename TTypes<T, k, int32>::ConstTensor, TTypes<int32>::ConstMatrix, \
int, typename TTypes<T, k, int32>::Tensor); \
extern template struct MirrorPadGrad<GpuDevice, T, k>;
#define DECLARE_GPU_SPEC(T, Tpaddings, k) \
template <> \
void MirrorPadGrad<GpuDevice, T, Tpaddings, k>::operator()( \
const GpuDevice&, typename TTypes<T, k, int32>::Tensor, \
typename TTypes<T, k, int32>::ConstTensor, \
TTypes<Tpaddings>::ConstMatrix, int, \
typename TTypes<T, k, int32>::Tensor); \
extern template struct MirrorPadGrad<GpuDevice, T, Tpaddings, k>;
#define DECLARE_GPU_SPECS(T) \
DECLARE_GPU_SPEC(T, 1); \
DECLARE_GPU_SPEC(T, 2); \
DECLARE_GPU_SPEC(T, 3); \
DECLARE_GPU_SPEC(T, 4); \
DECLARE_GPU_SPEC(T, 5);
#define DECLARE_GPU_SPECS(T) \
DECLARE_GPU_SPEC(T, int32, 1); \
DECLARE_GPU_SPEC(T, int32, 2); \
DECLARE_GPU_SPEC(T, int32, 3); \
DECLARE_GPU_SPEC(T, int32, 4); \
DECLARE_GPU_SPEC(T, int32, 5); \
DECLARE_GPU_SPEC(T, int64, 1); \
DECLARE_GPU_SPEC(T, int64, 2); \
DECLARE_GPU_SPEC(T, int64, 3); \
DECLARE_GPU_SPEC(T, int64, 4); \
DECLARE_GPU_SPEC(T, int64, 5);
TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
#undef DECLARE_GPU_SPECS
@ -398,7 +438,13 @@ TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
.TypeConstraint<T>("T") \
.TypeConstraint<int32>("Tpaddings") \
.HostMemory("paddings"), \
MirrorPadGradOp<GpuDevice, T>)
MirrorPadGradOp<GpuDevice, T, int32>); \
REGISTER_KERNEL_BUILDER(Name("MirrorPadGrad") \
.Device(DEVICE_GPU) \
.TypeConstraint<T>("T") \
.TypeConstraint<int64>("Tpaddings") \
.HostMemory("paddings"), \
MirrorPadGradOp<GpuDevice, T, int64>);
TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNEL);
#undef REGISTER_GPU_KERNEL

View File

@ -64,9 +64,8 @@ class TensorMirrorPadOp
StorageKind;
typedef typename Eigen::internal::traits<TensorMirrorPadOp>::Index Index;
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
TensorMirrorPadOp(const XprType& expr, const PaddingDimensions& padding_dims,
Index offset)
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMirrorPadOp(
const XprType& expr, const PaddingDimensions& padding_dims, Index offset)
: xpr_(expr), padding_dims_(padding_dims), offset_(offset) {}
EIGEN_DEVICE_FUNC
@ -336,12 +335,12 @@ namespace functor {
// offset argument must be either 0 or 1. This controls whether the boundary
// values are replicated (offset == 0) or not replicated (offset == 1).
template <typename Device, typename T, int Dims>
template <typename Device, typename T, typename Tpaddings, int Dims>
struct MirrorPad {
void operator()(const Device& device,
typename TTypes<T, Dims, int32>::Tensor output,
typename TTypes<T, Dims, int32>::ConstTensor input,
TTypes<int32>::ConstMatrix padding, int offset) {
typename TTypes<Tpaddings>::ConstMatrix padding, int offset) {
Eigen::array<Eigen::IndexPair<int32>, Dims> padding_dims;
for (int i = 0; i < Dims; ++i) {
@ -363,12 +362,12 @@ struct MirrorPad {
// offset argument must be either 0 or 1. This controls whether the boundary
// values are replicated (offset == 0) or not replicated (offset == 1).
template <typename Device, typename T, int Dims>
template <typename Device, typename T, typename Tpaddings, int Dims>
struct MirrorPadGrad {
void operator()(const Device& device,
typename TTypes<T, Dims, int32>::Tensor output,
typename TTypes<T, Dims, int32>::ConstTensor input,
TTypes<int32>::ConstMatrix paddings, int offset,
typename TTypes<Tpaddings>::ConstMatrix paddings, int offset,
typename TTypes<T, Dims, int32>::Tensor scratch) {
// Copy the gradient input into the scratch buffer.
scratch.device(device) = input;
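As an aside, the offset semantics described in the comments above can be checked with a minimal standalone sketch of the 1-D mirror index computation; mirror_index, the pad amounts, and the sample values below are illustrative and not part of this change.

#include <cstdio>

// Maps an output position to the input position it mirrors for a 1-D array of
// length n padded with pad_before elements in front. offset == 0 replicates
// the boundary value; offset == 1 does not.
int mirror_index(int i, int n, int pad_before, int offset) {
  if (i < pad_before) return pad_before - i - 1 + offset;  // left padding
  if (i < pad_before + n) return i - pad_before;           // original data
  const int j = i - (pad_before + n);
  return n - 1 - j - offset;                               // right padding
}

int main() {
  const int input[3] = {1, 2, 3};
  for (int offset = 0; offset <= 1; ++offset) {
    printf("offset=%d:", offset);
    for (int i = 0; i < 3 + 2 * 2; ++i)  // pad 2 on each side
      printf(" %d", input[mirror_index(i, 3, 2, offset)]);
    printf("\n");  // offset=0 -> 2 1 1 2 3 3 2; offset=1 -> 3 2 1 2 3 2 1
  }
  return 0;
}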

View File

@ -25,13 +25,17 @@ namespace tensorflow {
using CpuDevice = Eigen::ThreadPoolDevice;
#define DEFINE_CPU_SPECS(T) \
template struct functor::MirrorPad<CpuDevice, T, CPU_PROVIDED_IXDIM>;
#define DEFINE_CPU_SPECS(T) \
template struct functor::MirrorPad<CpuDevice, T, int32, CPU_PROVIDED_IXDIM>; \
template struct functor::MirrorPad<CpuDevice, T, int64, CPU_PROVIDED_IXDIM>;
TF_CALL_POD_TYPES(DEFINE_CPU_SPECS);
#undef DEFINE_CPU_SPECS
#define DEFINE_CPU_SPECS(T) \
template struct functor::MirrorPadGrad<CpuDevice, T, CPU_PROVIDED_IXDIM>;
#define DEFINE_CPU_SPECS(T) \
template struct functor::MirrorPadGrad<CpuDevice, T, int32, \
CPU_PROVIDED_IXDIM>; \
template struct functor::MirrorPadGrad<CpuDevice, T, int64, \
CPU_PROVIDED_IXDIM>;
TF_CALL_NUMBER_TYPES(DEFINE_CPU_SPECS);
#undef DEFINE_CPU_SPECS

View File

@ -25,17 +25,27 @@ namespace tensorflow {
using GpuDevice = Eigen::GpuDevice;
#define DEFINE_GPU_SPECS(T) \
template struct functor::MirrorPad<GpuDevice, T, 1>; \
template struct functor::MirrorPad<GpuDevice, T, 2>; \
template struct functor::MirrorPad<GpuDevice, T, 3>; \
template struct functor::MirrorPad<GpuDevice, T, 4>; \
template struct functor::MirrorPad<GpuDevice, T, 5>; \
template struct functor::MirrorPadGrad<GpuDevice, T, 1>; \
template struct functor::MirrorPadGrad<GpuDevice, T, 2>; \
template struct functor::MirrorPadGrad<GpuDevice, T, 3>; \
template struct functor::MirrorPadGrad<GpuDevice, T, 4>; \
template struct functor::MirrorPadGrad<GpuDevice, T, 5>;
#define DEFINE_GPU_SPECS(T) \
template struct functor::MirrorPad<GpuDevice, T, int32, 1>; \
template struct functor::MirrorPad<GpuDevice, T, int32, 2>; \
template struct functor::MirrorPad<GpuDevice, T, int32, 3>; \
template struct functor::MirrorPad<GpuDevice, T, int32, 4>; \
template struct functor::MirrorPad<GpuDevice, T, int32, 5>; \
template struct functor::MirrorPad<GpuDevice, T, int64, 1>; \
template struct functor::MirrorPad<GpuDevice, T, int64, 2>; \
template struct functor::MirrorPad<GpuDevice, T, int64, 3>; \
template struct functor::MirrorPad<GpuDevice, T, int64, 4>; \
template struct functor::MirrorPad<GpuDevice, T, int64, 5>; \
template struct functor::MirrorPadGrad<GpuDevice, T, int32, 1>; \
template struct functor::MirrorPadGrad<GpuDevice, T, int32, 2>; \
template struct functor::MirrorPadGrad<GpuDevice, T, int32, 3>; \
template struct functor::MirrorPadGrad<GpuDevice, T, int32, 4>; \
template struct functor::MirrorPadGrad<GpuDevice, T, int32, 5>; \
template struct functor::MirrorPadGrad<GpuDevice, T, int64, 1>; \
template struct functor::MirrorPadGrad<GpuDevice, T, int64, 2>; \
template struct functor::MirrorPadGrad<GpuDevice, T, int64, 3>; \
template struct functor::MirrorPadGrad<GpuDevice, T, int64, 4>; \
template struct functor::MirrorPadGrad<GpuDevice, T, int64, 5>;
TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_SPECS);
#undef DEFINE_GPU_SPECS

View File

@ -288,8 +288,10 @@ class MklConv2DOp : public OpKernel {
mkl_filter_output_mkl_shape.SetMklLayout(mkl_context.prim_fwd,
dnnResourceFilter);
size_t filter_sizes[4] = {filter.dim_size(0), filter.dim_size(1),
filter.dim_size(2), filter.dim_size(3)};
size_t filter_sizes[4] = {static_cast<size_t>(filter.dim_size(0)),
static_cast<size_t>(filter.dim_size(1)),
static_cast<size_t>(filter.dim_size(2)),
static_cast<size_t>(filter.dim_size(3))};
mkl_filter_output_mkl_shape.SetTfLayout(filter.dims(), filter_sizes,
mkl_context.filter_strides);

View File

@ -0,0 +1,139 @@
/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
// See docs in ../ops/nn_ops.cc.
#include "tensorflow/core/kernels/nth_element_op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/register_types.h"
#include "tensorflow/core/framework/types.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/util/work_sharder.h"
#include <vector>
#include <algorithm>
#include <iostream>
namespace tensorflow {
typedef Eigen::ThreadPoolDevice CPUDevice;
template <typename Device, typename T>
class NthElementOp : public OpKernel {
public:
explicit NthElementOp(OpKernelConstruction* context) : OpKernel(context) {
OP_REQUIRES_OK(context, context->GetAttr("reverse", &reverse_));
}
void Compute(OpKernelContext* context) override {
// The second argument is N, which must be a non-negative scalar.
const auto& n_in = context->input(1);
OP_REQUIRES(context, TensorShapeUtils::IsScalar(n_in.shape()),
errors::InvalidArgument("N must be scalar, got shape ",
n_in.shape().DebugString()));
int n = n_in.scalar<int32>()();
OP_REQUIRES(context, n >= 0,
errors::InvalidArgument("Need n >= 0, got ", n));
// The first argument is the input tensor, which must have at least 1 dimension.
const Tensor& input_in = context->input(0);
const int num_dims = input_in.dims();
OP_REQUIRES(context, num_dims >= 1,
errors::InvalidArgument("Input must be >= 1-D, got shape ",
input_in.shape().DebugString()));
// The last dimension of the input tensor must be greater than N.
OP_REQUIRES(context, input_in.dim_size(num_dims-1) > n,
errors::InvalidArgument("Input must have at least n+1 columns"));
// std::nth_element only supports nth-smallest selection.
if (reverse_) {
n = input_in.dim_size(num_dims - 1) - n - 1;
}
// Assume input_shape is [d1,d2,...dk], and output_shape is [d1,d2...dk-1].
TensorShape out_shape;
for (int i = 0; i < num_dims-1; ++i) {
out_shape.AddDim(input_in.dim_size(i));
}
Tensor* output_tensor = nullptr;
OP_REQUIRES_OK(context,
context->allocate_output(0, out_shape, &output_tensor));
functor::NthElementFunctor<Device, T> nthElementFunc;
nthElementFunc(context, input_in, *output_tensor, n, reverse_);
}
private:
bool reverse_;
};
namespace functor {
template <typename T>
struct NthElementFunctor<CPUDevice, T> {
void operator() (OpKernelContext* context,
const Tensor& input_tensor,
Tensor& output_tensor,
int n,
bool reverse) {
const T* input = input_tensor.flat<T>().data();
T* output = output_tensor.flat<T>().data();
// Assume input_shape is [d1,d2,...dk], and output_shape is [d1,d2...dk-1],
// then num_rows = d1*d2...dk-1, last_dim = dk.
const int num_rows = output_tensor.NumElements();
const int last_dim = input_tensor.dim_size(input_tensor.dims()-1);
// Allocate each row to different shard.
auto SubNthElement = [&, input, output, last_dim, n](int start,
int limit) {
// std::nth_element would rearrange the array, so we need a new buffer.
std::vector<T> buf(last_dim);
for (int b = start; b < limit; ++b) {
// Copy from one row of elements to buffer
const T* input_start = input + b * last_dim;
const T* input_end = input + (b+1) * last_dim;
std::copy(input_start, input_end, buf.begin());
std::nth_element(buf.begin(), buf.begin()+n, buf.end());
// The element placed in the nth position is exactly the element that
// would occur in this position if the range were fully sorted.
output[b] = buf[n];
}
};
auto worker_threads = *(context->device()->tensorflow_cpu_worker_threads());
// The average time complexity of partition-based nth_element is O(n),
// although the worst-case time complexity can be O(n^2).
// Here, 20 is an empirical factor for cost_per_unit.
Shard(worker_threads.num_threads, worker_threads.workers, num_rows,
20 * last_dim, SubNthElement);
}
};
} // namespace functor
#define REGISTER_NTHOP(T) \
REGISTER_KERNEL_BUILDER( \
Name("NthElement").Device(DEVICE_CPU).TypeConstraint<T>("T"), \
NthElementOp<CPUDevice, T>)
TF_CALL_REAL_NUMBER_TYPES(REGISTER_NTHOP);
#undef REGISTER_NTHOP
} // end namespace tensorflow
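A quick standalone illustration of the selection logic used above (sample values chosen arbitrarily): std::nth_element places the element that would land at position n after a full sort, and the reverse case is handled by remapping n onto the ascending order.

#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  std::vector<int> row = {7, 1, 9, 4, 3};
  const int last_dim = static_cast<int>(row.size());
  const int n = 1;  // ask for the 2nd smallest / 2nd largest

  std::vector<int> buf = row;  // nth_element rearranges, so work on a copy
  std::nth_element(buf.begin(), buf.begin() + n, buf.end());
  printf("2nd smallest: %d\n", buf[n]);  // 3

  // reverse == true: the nth largest is the (last_dim - n - 1)th smallest.
  const int n_rev = last_dim - n - 1;
  buf = row;
  std::nth_element(buf.begin(), buf.begin() + n_rev, buf.end());
  printf("2nd largest: %d\n", buf[n_rev]);  // 7
  return 0;
}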

View File

@ -0,0 +1,39 @@
/* Copyright 2016 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef TENSORFLOW_NTH_ELEMENT_OP_H_
#define TENSORFLOW_NTH_ELEMENT_OP_H_
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor_types.h"
#include "tensorflow/core/platform/types.h"
namespace tensorflow {
namespace functor {
template <typename Device, typename T>
struct NthElementFunctor {
void operator()(OpKernelContext* context, const Tensor& input_tensor,
                Tensor& output_tensor, int n, bool reverse);
};
} // namespace functor
} // namespace tensorflow
#endif // TENSORFLOW_NTH_ELEMENT_OP_H_

View File

@ -40,9 +40,9 @@ typedef Eigen::ThreadPoolDevice CPUDevice;
typedef Eigen::GpuDevice GPUDevice;
#ifdef TENSORFLOW_USE_SYCL
typedef Eigen::SyclDevice SYCLDevice;
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
template <typename Device, typename T>
template <typename Device, typename T, typename Tpadding>
class PadOp : public OpKernel {
public:
explicit PadOp(OpKernelConstruction* context) : OpKernel(context) {}
@ -82,10 +82,11 @@ class PadOp : public OpKernel {
// Compute the shape of the output tensor, and allocate it.
TensorShape output_shape;
TTypes<int32>::ConstMatrix paddings = in1.matrix<int32>();
typename TTypes<Tpadding>::ConstMatrix paddings = in1.matrix<Tpadding>();
for (int d = 0; d < fixed_dims; ++d) {
const int32 before_d = paddings(d, 0); // Pad before existing elements.
const int32 after_d = paddings(d, 1); // Pad after existing elements.
const Tpadding before_d =
paddings(d, 0); // Pad before existing elements.
const Tpadding after_d = paddings(d, 1); // Pad after existing elements.
OP_REQUIRES(context, before_d >= 0 && after_d >= 0,
errors::InvalidArgument("Paddings must be non-negative: ",
before_d, " ", after_d));
@ -142,32 +143,47 @@ class PadOp : public OpKernel {
template <int Dims>
void Operate(OpKernelContext* context,
typename TTypes<T, Dims>::ConstTensor input,
TTypes<int32>::ConstMatrix paddings, T pad_value,
typename TTypes<Tpadding>::ConstMatrix paddings, T pad_value,
Tensor* output) {
CHECK_EQ(Dims, paddings.dimension(0));
CHECK_EQ(2, paddings.dimension(1));
Eigen::array<Eigen::IndexPair<int32>, Dims> paddings_array;
Eigen::array<Eigen::IndexPair<Tpadding>, Dims> paddings_array;
for (int i = 0; i < Dims; ++i) {
paddings_array[i] = {paddings(i, 0), paddings(i, 1)};
}
functor::Pad<Device, T, Dims> functor;
functor::Pad<Device, T, Tpadding, Dims> functor;
functor(context->eigen_device<Device>(), output->tensor<T, Dims>(), input,
paddings_array, pad_value);
}
};
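The paddings input consumed by Operate above is a [Dims, 2] matrix of (before, after) pairs, so each output dimension is before + input_dim + after; a small standalone sketch of that shape arithmetic with made-up values:

#include <cstdio>

int main() {
  // A rank-2 input of shape [2, 3] with paddings = [[1, 1], [0, 2]].
  const long long in_dims[2] = {2, 3};
  const long long paddings[2][2] = {{1, 1}, {0, 2}};
  long long out_dims[2];
  for (int d = 0; d < 2; ++d) {
    out_dims[d] = paddings[d][0] + in_dims[d] + paddings[d][1];
  }
  printf("output shape: [%lld, %lld]\n", out_dims[0], out_dims[1]);  // [4, 5]
  return 0;
}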
#define REGISTER_KERNEL(type) \
REGISTER_KERNEL_BUILDER(Name("Pad") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.HostMemory("paddings"), \
PadOp<CPUDevice, type>); \
REGISTER_KERNEL_BUILDER(Name("PadV2") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.HostMemory("paddings") \
.HostMemory("constant_values"), \
PadOp<CPUDevice, type>);
#define REGISTER_KERNEL(type) \
REGISTER_KERNEL_BUILDER(Name("Pad") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tpaddings") \
.HostMemory("paddings"), \
PadOp<CPUDevice, type, int32>); \
REGISTER_KERNEL_BUILDER(Name("Pad") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tpaddings") \
.HostMemory("paddings"), \
PadOp<CPUDevice, type, int64>); \
REGISTER_KERNEL_BUILDER(Name("PadV2") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tpaddings") \
.HostMemory("paddings") \
.HostMemory("constant_values"), \
PadOp<CPUDevice, type, int32>); \
REGISTER_KERNEL_BUILDER(Name("PadV2") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tpaddings") \
.HostMemory("paddings") \
.HostMemory("constant_values"), \
PadOp<CPUDevice, type, int64>);
TF_CALL_POD_TYPES(REGISTER_KERNEL);
#undef REGISTER_KERNEL
@ -177,11 +193,17 @@ TF_CALL_POD_TYPES(REGISTER_KERNEL);
namespace functor {
#define DECLARE_GPU_SPEC(T, Dims) \
template <> \
void Pad<GPUDevice, T, Dims>::operator()( \
void Pad<GPUDevice, T, int32, Dims>::operator()( \
const GPUDevice& d, typename TTypes<T, Dims>::Tensor output, \
typename TTypes<T, Dims>::ConstTensor input, \
Eigen::array<Eigen::IndexPair<int32>, Dims> paddings, T pad_value); \
extern template struct Pad<GPUDevice, T, Dims>;
extern template struct Pad<GPUDevice, T, int32, Dims>; \
template <> \
void Pad<GPUDevice, T, int64, Dims>::operator()( \
const GPUDevice& d, typename TTypes<T, Dims>::Tensor output, \
typename TTypes<T, Dims>::ConstTensor input, \
Eigen::array<Eigen::IndexPair<int64>, Dims> paddings, T pad_value); \
extern template struct Pad<GPUDevice, T, int64, Dims>;
#define DECLARE_GPU_SPECS(T) \
DECLARE_GPU_SPEC(T, 0); \
@ -202,14 +224,27 @@ TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
.TypeConstraint<T>("T") \
.TypeConstraint<int32>("Tpaddings") \
.HostMemory("paddings"), \
PadOp<GPUDevice, T>); \
PadOp<GPUDevice, T, int32>); \
REGISTER_KERNEL_BUILDER(Name("Pad") \
.Device(DEVICE_GPU) \
.TypeConstraint<T>("T") \
.TypeConstraint<int64>("Tpaddings") \
.HostMemory("paddings"), \
PadOp<GPUDevice, T, int64>); \
REGISTER_KERNEL_BUILDER(Name("PadV2") \
.Device(DEVICE_GPU) \
.TypeConstraint<T>("T") \
.TypeConstraint<int32>("Tpaddings") \
.HostMemory("paddings") \
.HostMemory("constant_values"), \
PadOp<GPUDevice, T>)
PadOp<GPUDevice, T, int32>) \
REGISTER_KERNEL_BUILDER(Name("PadV2") \
.Device(DEVICE_GPU) \
.TypeConstraint<T>("T") \
.TypeConstraint<int64>("Tpaddings") \
.HostMemory("paddings") \
.HostMemory("constant_values"), \
PadOp<GPUDevice, T, int64>)
TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNEL);
@ -223,7 +258,15 @@ REGISTER_KERNEL_BUILDER(Name("Pad")
.HostMemory("input")
.HostMemory("paddings")
.HostMemory("output"),
PadOp<CPUDevice, int32>);
PadOp<CPUDevice, int32, int32>);
REGISTER_KERNEL_BUILDER(Name("Pad")
.Device(DEVICE_GPU)
.TypeConstraint<int32>("T")
.TypeConstraint<int64>("Tpaddings")
.HostMemory("input")
.HostMemory("paddings")
.HostMemory("output"),
PadOp<CPUDevice, int32, int64>);
REGISTER_KERNEL_BUILDER(Name("PadV2")
.Device(DEVICE_GPU)
.TypeConstraint<int32>("T")
@ -232,7 +275,16 @@ REGISTER_KERNEL_BUILDER(Name("PadV2")
.HostMemory("paddings")
.HostMemory("constant_values")
.HostMemory("output"),
PadOp<CPUDevice, int32>);
PadOp<CPUDevice, int32, int32>);
REGISTER_KERNEL_BUILDER(Name("PadV2")
.Device(DEVICE_GPU)
.TypeConstraint<int32>("T")
.TypeConstraint<int64>("Tpaddings")
.HostMemory("input")
.HostMemory("paddings")
.HostMemory("constant_values")
.HostMemory("output"),
PadOp<CPUDevice, int32, int64>);
#endif
#ifdef TENSORFLOW_USE_SYCL
@ -243,14 +295,27 @@ REGISTER_KERNEL_BUILDER(Name("PadV2")
.TypeConstraint<T>("T") \
.TypeConstraint<int32>("Tpaddings") \
.HostMemory("paddings"), \
PadOp<SYCLDevice, T>); \
PadOp<SYCLDevice, T, int32>); \
REGISTER_KERNEL_BUILDER(Name("Pad") \
.Device(DEVICE_SYCL) \
.TypeConstraint<T>("T") \
.TypeConstraint<int64>("Tpaddings") \
.HostMemory("paddings"), \
PadOp<SYCLDevice, T, int64>); \
REGISTER_KERNEL_BUILDER(Name("PadV2") \
.Device(DEVICE_SYCL) \
.TypeConstraint<T>("T") \
.TypeConstraint<int32>("Tpaddings") \
.HostMemory("paddings") \
.HostMemory("constant_values"), \
PadOp<SYCLDevice, T>)
PadOp<SYCLDevice, T, int32>) \
REGISTER_KERNEL_BUILDER(Name("PadV2") \
.Device(DEVICE_SYCL) \
.TypeConstraint<T>("T") \
.TypeConstraint<int64>("Tpaddings") \
.HostMemory("paddings") \
.HostMemory("constant_values"), \
PadOp<SYCLDevice, T, int64>)
TF_CALL_GPU_NUMBER_TYPES_NO_HALF(REGISTER_SYCL_KERNEL);
REGISTER_KERNEL_BUILDER(Name("Pad")
@ -260,7 +325,15 @@ REGISTER_KERNEL_BUILDER(Name("Pad")
.HostMemory("input")
.HostMemory("paddings")
.HostMemory("output"),
PadOp<CPUDevice, int32>);
PadOp<CPUDevice, int32, int32>);
REGISTER_KERNEL_BUILDER(Name("Pad")
.Device(DEVICE_SYCL)
.TypeConstraint<int32>("T")
.TypeConstraint<int64>("Tpaddings")
.HostMemory("input")
.HostMemory("paddings")
.HostMemory("output"),
PadOp<CPUDevice, int32, int64>);
REGISTER_KERNEL_BUILDER(Name("PadV2")
.Device(DEVICE_SYCL)
.TypeConstraint<int32>("T")
@ -269,8 +342,17 @@ REGISTER_KERNEL_BUILDER(Name("PadV2")
.HostMemory("paddings")
.HostMemory("constant_values")
.HostMemory("output"),
PadOp<CPUDevice, int32>);
PadOp<CPUDevice, int32, int32>);
REGISTER_KERNEL_BUILDER(Name("PadV2")
.Device(DEVICE_SYCL)
.TypeConstraint<int32>("T")
.TypeConstraint<int64>("Tpaddings")
.HostMemory("input")
.HostMemory("paddings")
.HostMemory("constant_values")
.HostMemory("output"),
PadOp<CPUDevice, int32, int64>);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
} // end namespace tensorflow

View File

@ -25,13 +25,13 @@ namespace tensorflow {
namespace functor {
// Functor used by PadOp to do the computations.
template <typename Device, typename T, int Dims>
template <typename Device, typename T, typename Tpadding, int Dims>
struct Pad {
// Pad "input" into "output", as specified by "paddings" and "pad_value".
// See pad_op.cc for details.
void operator()(const Device& d, typename TTypes<T, Dims>::Tensor output,
typename TTypes<T, Dims>::ConstTensor input,
Eigen::array<Eigen::IndexPair<int32>, Dims> paddings,
Eigen::array<Eigen::IndexPair<Tpadding>, Dims> paddings,
T pad_value) {
if (Eigen::internal::is_same<Device, Eigen::GpuDevice>::value &&
(output.size() <= std::numeric_limits<int32>::max())) {
@ -42,12 +42,12 @@ struct Pad {
}
};
template <typename Device, typename T>
struct Pad<Device, T, 0> {
template <typename Device, typename T, typename Tpadding>
struct Pad<Device, T, Tpadding, 0> {
// In the scalar case we simply copy the input.
void operator()(const Device& d, typename TTypes<T, 0>::Tensor output,
typename TTypes<T, 0>::ConstTensor input,
Eigen::array<Eigen::IndexPair<int32>, 0>, T) {
Eigen::array<Eigen::IndexPair<Tpadding>, 0>, T) {
output.device(d) = input;
}
};

View File

@ -26,14 +26,18 @@ namespace tensorflow {
typedef Eigen::GpuDevice GPUDevice;
// Definition of the GPU implementations declared in pad_op.cc.
#define DEFINE_GPU_SPECS(T) \
template struct functor::Pad<GPUDevice, T, 0>; \
template struct functor::Pad<GPUDevice, T, 1>; \
template struct functor::Pad<GPUDevice, T, 2>; \
template struct functor::Pad<GPUDevice, T, 3>; \
template struct functor::Pad<GPUDevice, T, 4>; \
template struct functor::Pad<GPUDevice, T, 5>; \
template struct functor::Pad<GPUDevice, T, 6>;
#define DEFINE_GPU_PAD_SPECS(T, Tpadding) \
template struct functor::Pad<GPUDevice, T, Tpadding, 0>; \
template struct functor::Pad<GPUDevice, T, Tpadding, 1>; \
template struct functor::Pad<GPUDevice, T, Tpadding, 2>; \
template struct functor::Pad<GPUDevice, T, Tpadding, 3>; \
template struct functor::Pad<GPUDevice, T, Tpadding, 4>; \
template struct functor::Pad<GPUDevice, T, Tpadding, 5>; \
template struct functor::Pad<GPUDevice, T, Tpadding, 6>;
#define DEFINE_GPU_SPECS(T) \
DEFINE_GPU_PAD_SPECS(T, int32) \
DEFINE_GPU_PAD_SPECS(T, int64)
TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_SPECS);

View File

@ -22,7 +22,13 @@ REGISTER_KERNEL_BUILDER(
.TypeConstraint<int32>("Tidx")
.Device(DEVICE_CPU)
.HostMemory("reduction_indices"),
ReductionOp<CPUDevice, bool, Eigen::internal::AndReducer>);
ReductionOp<CPUDevice, bool, int32, Eigen::internal::AndReducer>);
REGISTER_KERNEL_BUILDER(
Name("All")
.TypeConstraint<int64>("Tidx")
.Device(DEVICE_CPU)
.HostMemory("reduction_indices"),
ReductionOp<CPUDevice, bool, int64, Eigen::internal::AndReducer>);
#if GOOGLE_CUDA
REGISTER_KERNEL_BUILDER(
@ -30,7 +36,13 @@ REGISTER_KERNEL_BUILDER(
.TypeConstraint<int32>("Tidx")
.Device(DEVICE_GPU)
.HostMemory("reduction_indices"),
ReductionOp<GPUDevice, bool, Eigen::internal::AndReducer>);
ReductionOp<GPUDevice, bool, int32, Eigen::internal::AndReducer>);
REGISTER_KERNEL_BUILDER(
Name("All")
.TypeConstraint<int64>("Tidx")
.Device(DEVICE_GPU)
.HostMemory("reduction_indices"),
ReductionOp<GPUDevice, bool, int64, Eigen::internal::AndReducer>);
#endif
} // namespace tensorflow

View File

@ -22,7 +22,13 @@ REGISTER_KERNEL_BUILDER(
.TypeConstraint<int32>("Tidx")
.Device(DEVICE_CPU)
.HostMemory("reduction_indices"),
ReductionOp<CPUDevice, bool, Eigen::internal::OrReducer>);
ReductionOp<CPUDevice, bool, int32, Eigen::internal::OrReducer>);
REGISTER_KERNEL_BUILDER(
Name("Any")
.TypeConstraint<int64>("Tidx")
.Device(DEVICE_CPU)
.HostMemory("reduction_indices"),
ReductionOp<CPUDevice, bool, int64, Eigen::internal::OrReducer>);
#if GOOGLE_CUDA
REGISTER_KERNEL_BUILDER(
@ -30,7 +36,13 @@ REGISTER_KERNEL_BUILDER(
.TypeConstraint<int32>("Tidx")
.Device(DEVICE_GPU)
.HostMemory("reduction_indices"),
ReductionOp<GPUDevice, bool, Eigen::internal::OrReducer>);
ReductionOp<GPUDevice, bool, int32, Eigen::internal::OrReducer>);
REGISTER_KERNEL_BUILDER(
Name("Any")
.TypeConstraint<int64>("Tidx")
.Device(DEVICE_GPU)
.HostMemory("reduction_indices"),
ReductionOp<GPUDevice, bool, int64, Eigen::internal::OrReducer>);
#endif
} // namespace tensorflow

View File

@ -57,13 +57,12 @@ gtl::InlinedVector<int32, 8> ReductionHelper::permutation() {
return perm;
}
Status ReductionHelper::Simplify(const Tensor& data, const Tensor& axis,
const bool keep_dims) {
// bitmap[i] indicates whether to reduce data along i-th axis.
gtl::InlinedVector<bool, 4> bitmap(data.dims(), false);
auto axis_vec = axis.flat<int32>();
template <typename Tperm>
Status SimplifyHelper(const Tensor& data, const Tensor& axis,
gtl::InlinedVector<bool, 4>& bitmap) {
auto axis_vec = axis.flat<Tperm>();
for (int64 i = 0; i < axis.NumElements(); ++i) {
int32 index = axis_vec(i);
Tperm index = axis_vec(i);
if (index < -data.dims() || index >= data.dims()) {
return errors::InvalidArgument("Invalid reduction dimension (", index,
" for input with ", data.dims(),
@ -72,7 +71,18 @@ Status ReductionHelper::Simplify(const Tensor& data, const Tensor& axis,
index = (index + data.dims()) % data.dims();
bitmap[index] = true;
}
return Status::OK();
}
Status ReductionHelper::Simplify(const Tensor& data, const Tensor& axis,
const bool keep_dims) {
// bitmap[i] indicates whether to reduce data along i-th axis.
gtl::InlinedVector<bool, 4> bitmap(data.dims(), false);
if (axis.dtype() == DT_INT32) {
TF_RETURN_IF_ERROR(SimplifyHelper<int32>(data, axis, bitmap));
} else {
TF_RETURN_IF_ERROR(SimplifyHelper<int64>(data, axis, bitmap));
}
// Output tensor's dim sizes.
out_shape_.clear();
for (int i = 0; i < data.dims(); ++i) {
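The modular arithmetic in SimplifyHelper maps negative reduction axes onto their non-negative counterparts; a tiny illustrative sketch of that normalization:

#include <cstdio>

int main() {
  const int dims = 4;  // rank of the input tensor
  // Valid axes lie in [-dims, dims); negative values count from the end.
  const int axes[4] = {-4, -1, 0, 3};
  for (const int index : axes) {
    const int normalized = (index + dims) % dims;
    printf("axis %d -> %d\n", index, normalized);  // -4->0, -1->3, 0->0, 3->3
  }
  return 0;
}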

View File

@ -25,6 +25,7 @@ limitations under the License.
#include "third_party/eigen3/Eigen/Core"
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/numeric_op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/register_types.h"
@ -42,7 +43,7 @@ typedef Eigen::ThreadPoolDevice CPUDevice;
typedef Eigen::GpuDevice GPUDevice;
#ifdef TENSORFLOW_USE_SYCL
typedef Eigen::SyclDevice SYCLDevice;
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
template <typename Device>
struct Constants {
@ -68,11 +69,13 @@ struct ConstantsBase {
const Eigen::IndexList<Eigen::type2index<1>> kOne;
const Eigen::IndexList<Eigen::type2index<0>, Eigen::type2index<2>> kZeroTwo;
};
template<> struct Constants<CPUDevice> : ConstantsBase{};
template <>
struct Constants<CPUDevice> : ConstantsBase {};
#ifdef TENSORFLOW_USE_SYCL
template<> struct Constants<SYCLDevice> : ConstantsBase{};
#endif // TENSORFLOW_USE_SYCL
#endif // EIGEN_HAS_INDEX_LIST
template <>
struct Constants<SYCLDevice> : ConstantsBase {};
#endif // TENSORFLOW_USE_SYCL
#endif // EIGEN_HAS_INDEX_LIST
class ReductionHelper {
public:
@ -131,12 +134,13 @@ class ReductionHelper {
// For operations where the output is a reduction function along some
// dimensions of the input.
template <typename Device, class T, typename Reducer>
template <typename Device, class T, typename Tperm, typename Reducer>
class ReductionOp : public OpKernel {
public:
explicit ReductionOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
const DataType dt = DataTypeToEnum<T>::v();
OP_REQUIRES_OK(ctx, ctx->MatchSignature({dt, DT_INT32}, {dt}));
const DataType pt = DataTypeToEnum<Tperm>::v();
OP_REQUIRES_OK(ctx, ctx->MatchSignature({dt, pt}, {dt}));
OP_REQUIRES_OK(ctx, ctx->GetAttr("keep_dims", &keep_dims_));
}
@ -266,20 +270,19 @@ struct ReduceFunctorBase {
}
template <typename OUT_T>
static void FillIdentity(const Device& d, OUT_T out,
const Reducer& reducer) {
static void FillIdentity(const Device& d, OUT_T out, const Reducer& reducer) {
FillIdentityEigenImpl(d, out, reducer);
}
};
template <typename Reducer>
struct ReduceFunctor<CPUDevice, Reducer>
: ReduceFunctorBase<CPUDevice, Reducer>{};
: ReduceFunctorBase<CPUDevice, Reducer> {};
#if TENSORFLOW_USE_SYCL
template <typename Reducer>
struct ReduceFunctor<SYCLDevice, Reducer>
: ReduceFunctorBase<SYCLDevice, Reducer>{};
#endif // TENSORFLOW_USE_SYCL
: ReduceFunctorBase<SYCLDevice, Reducer> {};
#endif // TENSORFLOW_USE_SYCL
} // namespace functor
} // namespace tensorflow

View File

@ -17,26 +17,39 @@ limitations under the License.
namespace tensorflow {
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Max") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ReductionOp<CPUDevice, type, Eigen::internal::MaxReducer<type>>);
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Max") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ReductionOp<CPUDevice, type, int32, Eigen::internal::MaxReducer<type>>); \
REGISTER_KERNEL_BUILDER( \
Name("Max") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx"), \
ReductionOp<CPUDevice, type, int64, Eigen::internal::MaxReducer<type>>);
TF_CALL_REAL_NUMBER_TYPES(REGISTER_CPU_KERNELS);
#undef REGISTER_CPU_KERNELS
#if GOOGLE_CUDA
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Max") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, Eigen::internal::MaxReducer<type>>);
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Max") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, int32, Eigen::internal::MaxReducer<type>>); \
REGISTER_KERNEL_BUILDER( \
Name("Max") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, int64, Eigen::internal::MaxReducer<type>>);
REGISTER_GPU_KERNELS(float);
REGISTER_GPU_KERNELS(double);
REGISTER_GPU_KERNELS(int64);
@ -52,21 +65,37 @@ REGISTER_KERNEL_BUILDER(
.HostMemory("output")
.TypeConstraint<int32>("T")
.TypeConstraint<int32>("Tidx"),
ReductionOp<CPUDevice, int32, Eigen::internal::MaxReducer<int32>>);
ReductionOp<CPUDevice, int32, int32, Eigen::internal::MaxReducer<int32>>);
REGISTER_KERNEL_BUILDER(
Name("Max")
.Device(DEVICE_GPU)
.HostMemory("reduction_indices")
.HostMemory("input")
.HostMemory("output")
.TypeConstraint<int32>("T")
.TypeConstraint<int64>("Tidx"),
ReductionOp<CPUDevice, int32, int64, Eigen::internal::MaxReducer<int32>>);
#undef REGISTER_GPU_KERNELS
#endif
#ifdef TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Max") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, Eigen::internal::MaxReducer<type>>);
#define REGISTER_SYCL_KERNELS(type) \
REGISTER_KERNEL_BUILDER(Name("Max") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, int32, \
Eigen::internal::MaxReducer<type>>); \
REGISTER_KERNEL_BUILDER(Name("Max") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, int64, \
Eigen::internal::MaxReducer<type>>);
REGISTER_SYCL_KERNELS(float);
REGISTER_SYCL_KERNELS(double);
@ -78,8 +107,17 @@ REGISTER_KERNEL_BUILDER(
.HostMemory("output")
.TypeConstraint<int32>("T")
.TypeConstraint<int32>("Tidx"),
ReductionOp<CPUDevice, int32, Eigen::internal::MaxReducer<int32>>);
ReductionOp<CPUDevice, int32, int32, Eigen::internal::MaxReducer<int32>>);
REGISTER_KERNEL_BUILDER(
Name("Max")
.Device(DEVICE_SYCL)
.HostMemory("reduction_indices")
.HostMemory("input")
.HostMemory("output")
.TypeConstraint<int32>("T")
.TypeConstraint<int64>("Tidx"),
ReductionOp<CPUDevice, int32, int64, Eigen::internal::MaxReducer<int32>>);
#undef REGISTER_SYCL_KERNELS
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow

View File

@ -17,26 +17,39 @@ limitations under the License.
namespace tensorflow {
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Mean") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ReductionOp<CPUDevice, type, Eigen::internal::MeanReducer<type>>);
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER(Name("Mean") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ReductionOp<CPUDevice, type, int32, \
Eigen::internal::MeanReducer<type>>); \
REGISTER_KERNEL_BUILDER(Name("Mean") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx"), \
ReductionOp<CPUDevice, type, int64, \
Eigen::internal::MeanReducer<type>>);
TF_CALL_NUMBER_TYPES(REGISTER_CPU_KERNELS);
#undef REGISTER_CPU_KERNELS
#if GOOGLE_CUDA
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Mean") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, Eigen::internal::MeanReducer<type>>);
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER(Name("Mean") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, int32, \
Eigen::internal::MeanReducer<type>>); \
REGISTER_KERNEL_BUILDER(Name("Mean") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, int64, \
Eigen::internal::MeanReducer<type>>);
TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNELS);
TF_CALL_complex64(REGISTER_GPU_KERNELS);
TF_CALL_complex128(REGISTER_GPU_KERNELS);
@ -45,17 +58,24 @@ TF_CALL_complex128(REGISTER_GPU_KERNELS);
#endif
#ifdef TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Mean") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, Eigen::internal::MeanReducer<type>>);
#define REGISTER_SYCL_KERNELS(type) \
REGISTER_KERNEL_BUILDER(Name("Mean") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, int32, \
Eigen::internal::MeanReducer<type>>); \
REGISTER_KERNEL_BUILDER(Name("Mean") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, int64, \
Eigen::internal::MeanReducer<type>>);
REGISTER_SYCL_KERNELS(float);
REGISTER_SYCL_KERNELS(double);
#undef REGISTER_SYCL_KERNELS
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow

View File

@ -17,26 +17,39 @@ limitations under the License.
namespace tensorflow {
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Min") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ReductionOp<CPUDevice, type, Eigen::internal::MinReducer<type>>);
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Min") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ReductionOp<CPUDevice, type, int32, Eigen::internal::MinReducer<type>>); \
REGISTER_KERNEL_BUILDER( \
Name("Min") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx"), \
ReductionOp<CPUDevice, type, int64, Eigen::internal::MinReducer<type>>);
TF_CALL_REAL_NUMBER_TYPES(REGISTER_CPU_KERNELS);
#undef REGISTER_CPU_KERNELS
#if GOOGLE_CUDA
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Min") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, Eigen::internal::MinReducer<type>>);
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Min") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, int32, Eigen::internal::MinReducer<type>>); \
REGISTER_KERNEL_BUILDER( \
Name("Min") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, int64, Eigen::internal::MinReducer<type>>);
REGISTER_GPU_KERNELS(float);
REGISTER_GPU_KERNELS(double);
@ -51,21 +64,37 @@ REGISTER_KERNEL_BUILDER(
.HostMemory("output")
.TypeConstraint<int32>("T")
.TypeConstraint<int32>("Tidx"),
ReductionOp<CPUDevice, int32, Eigen::internal::MinReducer<int32>>);
ReductionOp<CPUDevice, int32, int32, Eigen::internal::MinReducer<int32>>);
REGISTER_KERNEL_BUILDER(
Name("Min")
.Device(DEVICE_GPU)
.HostMemory("reduction_indices")
.HostMemory("input")
.HostMemory("output")
.TypeConstraint<int32>("T")
.TypeConstraint<int64>("Tidx"),
ReductionOp<CPUDevice, int32, int64, Eigen::internal::MinReducer<int32>>);
#undef REGISTER_GPU_KERNELS
#endif
#ifdef TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Min") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, Eigen::internal::MinReducer<type>>);
#define REGISTER_SYCL_KERNELS(type) \
REGISTER_KERNEL_BUILDER(Name("Min") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, int32, \
Eigen::internal::MinReducer<type>>); \
REGISTER_KERNEL_BUILDER(Name("Min") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, int64, \
Eigen::internal::MinReducer<type>>);
REGISTER_SYCL_KERNELS(float);
REGISTER_SYCL_KERNELS(double);
@ -77,8 +106,17 @@ REGISTER_KERNEL_BUILDER(
.HostMemory("output")
.TypeConstraint<int32>("T")
.TypeConstraint<int32>("Tidx"),
ReductionOp<CPUDevice, int32, Eigen::internal::MinReducer<int32>>);
ReductionOp<CPUDevice, int32, int32, Eigen::internal::MinReducer<int32>>);
REGISTER_KERNEL_BUILDER(
Name("Min")
.Device(DEVICE_SYCL)
.HostMemory("reduction_indices")
.HostMemory("input")
.HostMemory("output")
.TypeConstraint<int32>("T")
.TypeConstraint<int64>("Tidx"),
ReductionOp<CPUDevice, int32, int64, Eigen::internal::MinReducer<int32>>);
#undef REGISTER_SYCL_KERNELS
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow

View File

@ -17,26 +17,39 @@ limitations under the License.
namespace tensorflow {
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Prod") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ReductionOp<CPUDevice, type, Eigen::internal::ProdReducer<type>>);
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER(Name("Prod") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ReductionOp<CPUDevice, type, int32, \
Eigen::internal::ProdReducer<type>>); \
REGISTER_KERNEL_BUILDER(Name("Prod") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx"), \
ReductionOp<CPUDevice, type, int64, \
Eigen::internal::ProdReducer<type>>);
TF_CALL_NUMBER_TYPES(REGISTER_CPU_KERNELS);
#undef REGISTER_CPU_KERNELS
#if GOOGLE_CUDA
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Prod") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, Eigen::internal::ProdReducer<type>>);
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER(Name("Prod") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, int32, \
Eigen::internal::ProdReducer<type>>); \
REGISTER_KERNEL_BUILDER(Name("Prod") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, int64, \
Eigen::internal::ProdReducer<type>>);
TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNELS);
TF_CALL_int32(REGISTER_GPU_KERNELS);
TF_CALL_complex64(REGISTER_GPU_KERNELS);
@ -46,18 +59,25 @@ TF_CALL_complex128(REGISTER_GPU_KERNELS);
#endif
#ifdef TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Prod") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, Eigen::internal::ProdReducer<type>>);
#define REGISTER_SYCL_KERNELS(type) \
REGISTER_KERNEL_BUILDER(Name("Prod") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, int32, \
Eigen::internal::ProdReducer<type>>); \
REGISTER_KERNEL_BUILDER(Name("Prod") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, int64, \
Eigen::internal::ProdReducer<type>>);
REGISTER_SYCL_KERNELS(int32);
REGISTER_SYCL_KERNELS(float);
REGISTER_SYCL_KERNELS(double);
#undef REGISTER_SYCL_KERNELS
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow

View File

@ -17,26 +17,39 @@ limitations under the License.
namespace tensorflow {
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Sum") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ReductionOp<CPUDevice, type, Eigen::internal::SumReducer<type>>);
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Sum") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ReductionOp<CPUDevice, type, int32, Eigen::internal::SumReducer<type>>); \
REGISTER_KERNEL_BUILDER( \
Name("Sum") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx"), \
ReductionOp<CPUDevice, type, int64, Eigen::internal::SumReducer<type>>);
TF_CALL_NUMBER_TYPES(REGISTER_CPU_KERNELS);
#undef REGISTER_CPU_KERNELS
#if GOOGLE_CUDA
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Sum") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, Eigen::internal::SumReducer<type>>);
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Sum") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, int32, Eigen::internal::SumReducer<type>>); \
REGISTER_KERNEL_BUILDER( \
Name("Sum") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<GPUDevice, type, int64, Eigen::internal::SumReducer<type>>);
TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNELS);
TF_CALL_complex64(REGISTER_GPU_KERNELS);
TF_CALL_complex128(REGISTER_GPU_KERNELS);
@ -53,19 +66,35 @@ REGISTER_KERNEL_BUILDER(
.HostMemory("input")
.HostMemory("output")
.HostMemory("reduction_indices"),
ReductionOp<CPUDevice, int32, Eigen::internal::SumReducer<int32>>);
ReductionOp<CPUDevice, int32, int32, Eigen::internal::SumReducer<int32>>);
REGISTER_KERNEL_BUILDER(
Name("Sum")
.Device(DEVICE_GPU)
.TypeConstraint<int32>("T")
.TypeConstraint<int64>("Tidx")
.HostMemory("input")
.HostMemory("output")
.HostMemory("reduction_indices"),
ReductionOp<CPUDevice, int32, int64, Eigen::internal::SumReducer<int32>>);
#endif
#ifdef TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Sum") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, Eigen::internal::SumReducer<type>>);
#define REGISTER_SYCL_KERNELS(type) \
REGISTER_KERNEL_BUILDER(Name("Sum") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, int32, \
Eigen::internal::SumReducer<type>>); \
REGISTER_KERNEL_BUILDER(Name("Sum") \
.Device(DEVICE_SYCL) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx") \
.HostMemory("reduction_indices"), \
ReductionOp<SYCLDevice, type, int64, \
Eigen::internal::SumReducer<type>>);
REGISTER_SYCL_KERNELS(float);
REGISTER_SYCL_KERNELS(double);
@ -77,8 +106,17 @@ REGISTER_KERNEL_BUILDER(
.HostMemory("input")
.HostMemory("output")
.HostMemory("reduction_indices"),
ReductionOp<CPUDevice, int32, Eigen::internal::SumReducer<int32>>);
ReductionOp<CPUDevice, int32, int32, Eigen::internal::SumReducer<int32>>);
REGISTER_KERNEL_BUILDER(
Name("Sum")
.Device(DEVICE_SYCL)
.TypeConstraint<int32>("T")
.TypeConstraint<int64>("Tidx")
.HostMemory("input")
.HostMemory("output")
.HostMemory("reduction_indices"),
ReductionOp<CPUDevice, int32, int64, Eigen::internal::SumReducer<int32>>);
#undef REGISTER_SYCL_KERNELS
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
} // namespace tensorflow

View File

@ -20,7 +20,6 @@ limitations under the License.
#include <algorithm>
#include <array>
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/register_types.h"
#include "tensorflow/core/framework/tensor.h"
@ -29,6 +28,7 @@ limitations under the License.
#include "tensorflow/core/kernels/image_resizer_state.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/platform/logging.h"
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
namespace tensorflow {
namespace {
@ -235,6 +235,7 @@ inline void interpolate_with_caching(
const T* input_b_ptr = input_data.data();
float* output_y_ptr = output_data.data();
std::vector<float> cached_value(num_channels == 3 ? 0 : 4 * num_channels, 0);
for (int64 b = 0; b < resizer_state.batch_size;
++b, input_b_ptr += in_batch_width) {
@ -248,6 +249,7 @@ inline void interpolate_with_caching(
const T* y_ptr_1 = input_b_ptr + y_wai.index_1 * in_row_width;
const T* y_ptr_2 = input_b_ptr + y_wai.index_2 * in_row_width;
const T* y_ptr_3 = input_b_ptr + y_wai.index_3 * in_row_width;
if (num_channels == 3) {
// Manually unroll case of 3 channels.
float cached_value_0[4] = {0};
@ -330,48 +332,61 @@ inline void interpolate_with_caching(
x_wai.weight_2, x_wai.weight_3);
}
} else {
for (int64 c = 0; c < num_channels; ++c) {
float cached_value[4] = {0};
for (int64 x = 0; x < resizer_state.out_width; ++x) {
const WeightsAndIndices& x_wai = x_wais[x];
// Shift values in cached_value to fill first 'advance' values.
switch (x_wai.advance) {
case 3:
cached_value[0] = cached_value[1];
cached_value[1] = cached_value[2];
cached_value[2] = cached_value[3];
break;
case 2:
cached_value[0] = cached_value[2];
cached_value[1] = cached_value[3];
break;
case 1: {
cached_value[0] = cached_value[3];
break;
for (int64 x = 0; x < resizer_state.out_width; ++x) {
const WeightsAndIndices& x_wai = x_wais[x];
// Shift values in cached_value to fill first 'advance' values.
switch (x_wai.advance) {
case 3:
for (int64 c = 0; c < num_channels; ++c) {
cached_value[4 * c + 0] = cached_value[4 * c + 1];
cached_value[4 * c + 1] = cached_value[4 * c + 2];
cached_value[4 * c + 2] = cached_value[4 * c + 3];
}
break;
case 2:
for (int64 c = 0; c < num_channels; ++c) {
cached_value[4 * c + 0] = cached_value[4 * c + 2];
cached_value[4 * c + 1] = cached_value[4 * c + 3];
}
break;
case 1: {
for (int64 c = 0; c < num_channels; ++c) {
cached_value[4 * c + 0] = cached_value[4 * c + 3];
}
break;
}
}
// Set the remaining '4-advance' values by computing.
switch (x_wai.advance) {
case 0:
cached_value[0] = ComputeYInterpolation(
// Set the remaining '4-advance' values by computing.
switch (x_wai.advance) {
case 0:
for (int64 c = 0; c < num_channels; ++c) {
cached_value[4 * c + 0] = ComputeYInterpolation(
0, c, y_wai, y_ptr_0, y_ptr_1, y_ptr_2, y_ptr_3, x_wai);
TF_FALLTHROUGH_INTENDED;
case 1:
cached_value[1] = ComputeYInterpolation(
}
TF_FALLTHROUGH_INTENDED;
case 1:
for (int64 c = 0; c < num_channels; ++c) {
cached_value[4 * c + 1] = ComputeYInterpolation(
1, c, y_wai, y_ptr_0, y_ptr_1, y_ptr_2, y_ptr_3, x_wai);
TF_FALLTHROUGH_INTENDED;
case 2:
cached_value[2] = ComputeYInterpolation(
}
TF_FALLTHROUGH_INTENDED;
case 2:
for (int64 c = 0; c < num_channels; ++c) {
cached_value[4 * c + 2] = ComputeYInterpolation(
2, c, y_wai, y_ptr_0, y_ptr_1, y_ptr_2, y_ptr_3, x_wai);
TF_FALLTHROUGH_INTENDED;
case 3:
cached_value[3] = ComputeYInterpolation(
}
TF_FALLTHROUGH_INTENDED;
case 3:
for (int64 c = 0; c < num_channels; ++c) {
cached_value[4 * c + 3] = ComputeYInterpolation(
3, c, y_wai, y_ptr_0, y_ptr_1, y_ptr_2, y_ptr_3, x_wai);
break;
}
}
break;
}
for (int64 c = 0; c < num_channels; ++c) {
output_y_ptr[x * num_channels + c] =
Compute(cached_value, x_wai.weight_0, x_wai.weight_1,
Compute(&cached_value[4 * c], x_wai.weight_0, x_wai.weight_1,
x_wai.weight_2, x_wai.weight_3);
}
}
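The advance/shift logic above reuses previously computed y-interpolated columns as the 4-tap bicubic window slides along x; a simplified single-channel sketch of that reuse pattern, where compute_column stands in for ComputeYInterpolation:

#include <cstdio>

// Stand-in for the per-column y-interpolation done by the real kernel.
float compute_column(int col) { return static_cast<float>(col); }

int main() {
  float window[4] = {0, 0, 0, 0};
  // Per output pixel, 'advance' old values are reused and '4 - advance'
  // fresh columns are computed; 0 means a full refill, 3 means a shift by one.
  const int advances[3] = {0, 3, 3};
  int next_col = 0;
  for (int x = 0; x < 3; ++x) {
    const int advance = advances[x];
    for (int j = 0; j < advance; ++j) window[j] = window[4 - advance + j];
    for (int j = advance; j < 4; ++j) window[j] = compute_column(next_col++);
    printf("x=%d window: %.0f %.0f %.0f %.0f\n", x, window[0], window[1],
           window[2], window[3]);
  }
  return 0;
}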

View File

@ -251,14 +251,15 @@ TEST_F(ResizeBicubicOpTest, TestAreaRandomDataSeveralInputsSizes4Channels) {
RunManyRandomTests(4);
}
static Graph* ResizeBicubic(int batch_size, int size, int channels) {
static Graph* ResizeBicubic(int batch_size, int size, int channels,
float scale_y = 0.3, float scale_x = 0.7) {
Graph* g = new Graph(OpRegistry::Global());
Tensor input(DT_FLOAT, TensorShape({batch_size, size, size, channels}));
input.flat<float>().setRandom();
Tensor shape(DT_INT32, TensorShape({2}));
auto shape_t = shape.flat<int32>();
shape_t(0) = 0.3 * size;
shape_t(1) = 0.7 * size;
shape_t(0) = scale_y * size;
shape_t(1) = scale_x * size;
test::graph::Binary(g, "ResizeBicubic", test::graph::Constant(g, input),
test::graph::Constant(g, shape));
return g;
@ -285,4 +286,17 @@ BM_ResizeBicubicDev(32, 128, 3);
BM_ResizeBicubicDev(32, 512, 3);
BM_ResizeBicubicDev(32, 1024, 3);
#define BM_ResizeBicubicExpand(BATCH, SIZE, CHANNELS) \
static void BM_ResizeBicubicExpand##_##BATCH##_##SIZE##_##CHANNELS(int iters) { \
testing::ItemsProcessed(static_cast<int64>(iters) * BATCH * SIZE * SIZE * \
CHANNELS * 8 * 8); \
test::Benchmark("cpu", ResizeBicubic(BATCH, SIZE, CHANNELS, 8, 8)) \
.Run(iters); \
} \
BENCHMARK(BM_ResizeBicubicExpand##_##BATCH##_##SIZE##_##CHANNELS);
BM_ResizeBicubicExpand(12, 48, 1);
BM_ResizeBicubicExpand(12, 48, 3);
BM_ResizeBicubicExpand(12, 48, 40);
} // end namespace tensorflow

View File

@ -175,6 +175,7 @@ class ReverseSequenceOp : public OpKernel {
REGISTER_REVERSE_SEQUENCE(type, int64);
TF_CALL_NUMBER_TYPES(REGISTER_REVERSE_SEQUENCE_LEN);
TF_CALL_bool(REGISTER_REVERSE_SEQUENCE_LEN);
#if GOOGLE_CUDA
@ -200,6 +201,7 @@ namespace functor {
DECLARE_GPU_SPEC_LEN(T, 5);
TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
TF_CALL_bool(DECLARE_GPU_SPECS);
} // namespace functor
@ -215,6 +217,7 @@ TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
REGISTER_REVERSE_SEQUENCE_GPU(type, int64);
TF_CALL_GPU_NUMBER_TYPES(REGISTER_REVERSE_SEQUENCE_GPU_LEN);
TF_CALL_bool(REGISTER_REVERSE_SEQUENCE_GPU_LEN);
#undef REGISTER_REVERSE_SEQUENCE_GPU

View File

@ -39,6 +39,7 @@ typedef Eigen::GpuDevice GPUDevice;
DEFINE_GPU_SPEC_LEN(T, 5);
TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_SPECS);
TF_CALL_bool(DEFINE_GPU_SPECS);
} // end namespace tensorflow

View File

@ -35,7 +35,7 @@ namespace tensorflow {
typedef Eigen::ThreadPoolDevice CPUDevice;
typedef Eigen::GpuDevice GPUDevice;
template <typename Device, class T, typename Reducer>
template <typename Device, class T, typename Reducer, typename Tidx>
class ScanOp : public OpKernel {
public:
explicit ScanOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
@ -51,8 +51,9 @@ class ScanOp : public OpKernel {
errors::InvalidArgument("ScanOp: axis must be a scalar, not ",
tensor_axis.shape().DebugString()));
const int axis_arg = internal::SubtleMustCopy(tensor_axis.scalar<int>()());
const int axis = (axis_arg < 0) ? input.dims() + axis_arg : axis_arg;
const Tidx axis_arg =
internal::SubtleMustCopy(tensor_axis.scalar<Tidx>()());
const Tidx axis = (axis_arg < 0) ? input.dims() + axis_arg : axis_arg;
OP_REQUIRES(ctx, FastBoundsCheck(axis, input.dims()),
errors::InvalidArgument(
"ScanOp: Expected scan axis in the range [", -input.dims(),
@ -70,11 +71,11 @@ class ScanOp : public OpKernel {
// Dim reduction.
int64 reduced_shape[3] = {1, 1, 1};
for (int i = 0; i < axis; ++i) {
for (Tidx i = 0; i < axis; ++i) {
reduced_shape[0] *= input.dim_size(i);
}
reduced_shape[1] = input.dim_size(axis);
for (int i = axis + 1; i < input.dims(); ++i) {
for (Tidx i = axis + 1; i < input.dims(); ++i) {
reduced_shape[2] *= input.dim_size(i);
}
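The three reduced_shape entries above collapse the input into [outer, scan axis, inner] so the scan runs over a rank-3 view; a small sketch with an arbitrary example shape:

#include <cstdio>

int main() {
  const long long dims[4] = {2, 3, 4, 5};  // input shape
  const int num_dims = 4;
  const int axis = 2;  // scan along the third dimension

  long long reduced[3] = {1, 1, 1};
  for (int i = 0; i < axis; ++i) reduced[0] *= dims[i];             // outer
  reduced[1] = dims[axis];                                          // scan axis
  for (int i = axis + 1; i < num_dims; ++i) reduced[2] *= dims[i];  // inner
  printf("[%lld, %lld, %lld]\n", reduced[0], reduced[1], reduced[2]);  // [6, 4, 5]
  return 0;
}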
@ -112,51 +113,76 @@ TF_CALL_GPU_NUMBER_TYPES(DECLARE_FOR_ALL_REDUCERS);
} // namespace functor
#endif // GOOGLE_CUDA
// Register Cumsum kernels
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Cumsum") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ScanOp<CPUDevice, type, Eigen::internal::SumReducer<type>>)
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Cumsum") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ScanOp<CPUDevice, type, Eigen::internal::SumReducer<type>, int32>) \
REGISTER_KERNEL_BUILDER( \
Name("Cumsum") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx"), \
ScanOp<CPUDevice, type, Eigen::internal::SumReducer<type>, int64>)
TF_CALL_NUMBER_TYPES(REGISTER_CPU_KERNELS);
#undef REGISTER_CPU_KERNELS
#if GOOGLE_CUDA
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Cumsum") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("axis"), \
ScanOp<GPUDevice, type, Eigen::internal::SumReducer<type>>)
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Cumsum") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("axis"), \
ScanOp<GPUDevice, type, Eigen::internal::SumReducer<type>, int32>) \
REGISTER_KERNEL_BUILDER( \
Name("Cumsum") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx") \
.HostMemory("axis"), \
ScanOp<GPUDevice, type, Eigen::internal::SumReducer<type>, int64>)
TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNELS)
#undef REGISTER_GPU_KERNELS
#endif // GOOGLE_CUDA
// Register Cumprod kernels
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Cumprod") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ScanOp<CPUDevice, type, Eigen::internal::ProdReducer<type>>)
#define REGISTER_CPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Cumprod") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx"), \
ScanOp<CPUDevice, type, Eigen::internal::ProdReducer<type>, int32>) \
REGISTER_KERNEL_BUILDER( \
Name("Cumprod") \
.Device(DEVICE_CPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx"), \
ScanOp<CPUDevice, type, Eigen::internal::ProdReducer<type>, int64>)
TF_CALL_NUMBER_TYPES(REGISTER_CPU_KERNELS);
#undef REGISTER_CPU_KERNELS
#if GOOGLE_CUDA
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Cumprod") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("axis"), \
ScanOp<GPUDevice, type, Eigen::internal::ProdReducer<type>>)
#define REGISTER_GPU_KERNELS(type) \
REGISTER_KERNEL_BUILDER( \
Name("Cumprod") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("axis"), \
ScanOp<GPUDevice, type, Eigen::internal::ProdReducer<type>, int32>) \
REGISTER_KERNEL_BUILDER( \
Name("Cumprod") \
.Device(DEVICE_GPU) \
.TypeConstraint<type>("T") \
.TypeConstraint<int64>("Tidx") \
.HostMemory("axis"), \
ScanOp<GPUDevice, type, Eigen::internal::ProdReducer<type>, int64>)
TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNELS)
#undef REGISTER_GPU_KERNELS
#endif // GOOGLE_CUDA
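Each registration macro above now expands to two REGISTER_KERNEL_BUILDER calls per value type, one instantiating ScanOp with an int32 axis index and one with int64. The pattern being instantiated, reduced to a self-contained sketch (plain C++ with std functors standing in for the Eigen reducers; Scan is an illustrative name, not the kernel):

#include <cstdint>
#include <cstdio>
#include <functional>
#include <vector>

// Inclusive scan templated on both the accumulation functor and the index
// type, mirroring ScanOp<Device, T, Reducer, Tidx>.
template <typename T, typename Reducer, typename Tidx>
std::vector<T> Scan(const std::vector<T>& in, Reducer reduce, T init) {
  std::vector<T> out(in.size());
  T acc = init;
  for (Tidx i = 0; i < static_cast<Tidx>(in.size()); ++i) {
    acc = reduce(acc, in[i]);
    out[i] = acc;
  }
  return out;
}

int main() {
  const std::vector<float> x = {1, 2, 3, 4};
  // "Cumsum" with a 32-bit index type.
  auto sums = Scan<float, std::plus<float>, int32_t>(x, std::plus<float>(), 0.f);
  // "Cumprod" with a 64-bit index type.
  auto prods = Scan<float, std::multiplies<float>, int64_t>(
      x, std::multiplies<float>(), 1.f);
  for (float v : sums) std::printf("%g ", v);   // 1 3 6 10
  std::printf("\n");
  for (float v : prods) std::printf("%g ", v);  // 1 2 6 24
  std::printf("\n");
  return 0;
}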


@@ -96,7 +96,7 @@ TF_CALL_double(REGISTER_SYCL_KERNEL);
TF_CALL_int32(REGISTER_SYCL_KERNEL);
TF_CALL_int64(REGISTER_SYCL_KERNEL);
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#endif // TENSORFLOW_USE_SYCL
TF_CALL_float(REGISTER_CPU_KERNEL);
TF_CALL_double(REGISTER_CPU_KERNEL);
@@ -116,7 +116,7 @@ TF_CALL_int64(REGISTER_GPU_KERNEL);
#undef REGISTER_CPU_KERNEL
#undef REGISTER_GPU_KERNEL
template <typename T>
template <typename T, typename Tnum>
class LinSpaceOp : public OpKernel {
public:
explicit LinSpaceOp(OpKernelConstruction* context) : OpKernel(context) {}
@@ -136,7 +136,7 @@ class LinSpaceOp : public OpKernel {
num_in.shape().DebugString()));
const T start = start_in.scalar<T>()();
const T stop = stop_in.scalar<T>()();
const int32 num = num_in.scalar<int32>()();
const Tnum num = num_in.scalar<Tnum>()();
OP_REQUIRES(context, num > 0,
errors::InvalidArgument("Requires num > 0: ", num));
Tensor* out = nullptr;
@@ -147,34 +147,46 @@ class LinSpaceOp : public OpKernel {
flat(0) = start;
} else {
const T step = (stop - start) / (num - 1);
for (int32 i = 0; i < num; ++i) flat(i) = start + step * i;
for (Tnum i = 0; i < num; ++i) flat(i) = start + step * i;
}
}
};
#define REGISTER_KERNEL(DEV, T) \
REGISTER_KERNEL_BUILDER(Name("LinSpace") \
.Device(DEV) \
.TypeConstraint<T>("T") \
.TypeConstraint<int32>("Tidx") \
.HostMemory("start") \
.HostMemory("stop") \
.HostMemory("num") \
.HostMemory("output"), \
LinSpaceOp<T>);
#define REGISTER_CPU_KERNEL(T) REGISTER_KERNEL(DEVICE_CPU, T)
#define REGISTER_KERNEL(DEV, T, Tidx) \
REGISTER_KERNEL_BUILDER(Name("LinSpace") \
.Device(DEV) \
.TypeConstraint<T>("T") \
.TypeConstraint<Tidx>("Tidx") \
.HostMemory("start") \
.HostMemory("stop") \
.HostMemory("num") \
.HostMemory("output"), \
LinSpaceOp<T, Tidx>);
#define REGISTER_KERNEL_ALL_NUMS(dev, T) \
REGISTER_KERNEL(dev, T, int32); \
REGISTER_KERNEL(dev, T, int64)
#define REGISTER_CPU_KERNEL(T) REGISTER_KERNEL_ALL_NUMS(DEVICE_CPU, T)
TF_CALL_float(REGISTER_CPU_KERNEL);
TF_CALL_double(REGISTER_CPU_KERNEL);
// NOTE(touts): We register the op on GPU but it still runs on CPU
// because its inputs and outputs are tagged as HostMemory.
#define REGISTER_GPU_KERNEL(T) REGISTER_KERNEL(DEVICE_GPU, T)
#define REGISTER_GPU_KERNEL(T) REGISTER_KERNEL_ALL_NUMS(DEVICE_GPU, T)
TF_CALL_float(REGISTER_GPU_KERNEL);
TF_CALL_double(REGISTER_GPU_KERNEL);
#undef REGISTER_GPU_KERNEL
#ifdef TENSORFLOW_USE_SYCL
#define REGISTER_SYCL_KERNEL(T) REGISTER_KERNEL(DEVICE_SYCL, T)
#define REGISTER_SYCL_KERNEL(T) REGISTER_KERNEL_ALL_NUMS(DEVICE_SYCL, T)
TF_CALL_float(REGISTER_SYCL_KERNEL);
TF_CALL_double(REGISTER_SYCL_KERNEL);
#endif // TENSORFLOW_USE_SYCL
#undef REGISTER_SYCL_KERNEL
#endif // TENSORFLOW_USE_SYCL
#undef REGISTER_CPU_KERNEL
#undef REGISTER_KERNEL_ALL_NUMS
#undef REGISTER_KERNEL
} // namespace tensorflow
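LinSpaceOp is now templated on the count type as well, and REGISTER_KERNEL_ALL_NUMS registers both int32 and int64 "Tidx" variants for each device. A self-contained sketch of the computation with the templated count (plain C++; LinSpace here is an illustrative free function, not the kernel):

#include <cstdint>
#include <cstdio>
#include <vector>

template <typename T, typename Tnum>
std::vector<T> LinSpace(T start, T stop, Tnum num) {
  std::vector<T> out(static_cast<size_t>(num));
  if (num == 1) {
    out[0] = start;  // a single point is just the start value
  } else {
    // Evenly spaced points including both endpoints.
    const T step = (stop - start) / static_cast<T>(num - 1);
    for (Tnum i = 0; i < num; ++i) out[i] = start + step * static_cast<T>(i);
  }
  return out;
}

int main() {
  // Same cases as the unit tests below: 3 points in [3, 7] with an int32
  // count, and a single point with an int64 count.
  for (float v : LinSpace<float, int32_t>(3.f, 7.f, 3)) std::printf("%g ", v);
  std::printf("\n");  // 3 5 7
  for (float v : LinSpace<float, int64_t>(9.f, 100.f, int64_t{1}))
    std::printf("%g ", v);
  std::printf("\n");  // 9
  return 0;
}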


@@ -0,0 +1,148 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/core/framework/allocator.h"
#include "tensorflow/core/framework/fake_input.h"
#include "tensorflow/core/framework/node_def_builder.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_testutil.h"
#include "tensorflow/core/framework/types.h"
#include "tensorflow/core/framework/types.pb.h"
#include "tensorflow/core/kernels/ops_testutil.h"
#include "tensorflow/core/kernels/ops_util.h"
#include "tensorflow/core/platform/test.h"
namespace tensorflow {
namespace {
class RangeOpTest : public OpsTestBase {
protected:
void MakeOp(DataType input_type) {
TF_ASSERT_OK(NodeDefBuilder("myop", "Range")
.Input(FakeInput(input_type))
.Input(FakeInput(input_type))
.Input(FakeInput(input_type))
.Finalize(node_def()));
TF_ASSERT_OK(InitOp());
}
};
class LinSpaceOpTest : public OpsTestBase {
protected:
void MakeOp(DataType input_type, DataType index_type) {
TF_ASSERT_OK(NodeDefBuilder("myop", "LinSpace")
.Input(FakeInput(input_type))
.Input(FakeInput(input_type))
.Input(FakeInput(index_type))
.Finalize(node_def()));
TF_ASSERT_OK(InitOp());
}
};
TEST_F(RangeOpTest, Simple_D32) {
MakeOp(DT_INT32);
// Feed and run
AddInputFromArray<int32>(TensorShape({}), {0});
AddInputFromArray<int32>(TensorShape({}), {10});
AddInputFromArray<int32>(TensorShape({}), {2});
TF_ASSERT_OK(RunOpKernel());
// Check the output
Tensor expected(allocator(), DT_INT32, TensorShape({5}));
test::FillValues<int32>(&expected, {0, 2, 4, 6, 8});
test::ExpectTensorEqual<int32>(expected, *GetOutput(0));
}
TEST_F(RangeOpTest, Simple_Float) {
MakeOp(DT_FLOAT);
// Feed and run
AddInputFromArray<float>(TensorShape({}), {0.5});
AddInputFromArray<float>(TensorShape({}), {2});
AddInputFromArray<float>(TensorShape({}), {0.3});
TF_ASSERT_OK(RunOpKernel());
// Check the output
Tensor expected(allocator(), DT_FLOAT, TensorShape({5}));
test::FillValues<float>(&expected, {0.5, 0.8, 1.1, 1.4, 1.7});
test::ExpectTensorEqual<float>(expected, *GetOutput(0));
}
TEST_F(RangeOpTest, Large_Double) {
MakeOp(DT_DOUBLE);
// Feed and run
AddInputFromArray<double>(TensorShape({}), {0.0});
AddInputFromArray<double>(TensorShape({}), {10000});
AddInputFromArray<double>(TensorShape({}), {0.5});
TF_ASSERT_OK(RunOpKernel());
// Check the output
Tensor expected(allocator(), DT_DOUBLE, TensorShape({20000}));
std::vector<double> result;
for (int32 i = 0; i < 20000; ++i) result.push_back(i * 0.5);
test::FillValues<double>(&expected, gtl::ArraySlice<double>(result));
test::ExpectTensorEqual<double>(expected, *GetOutput(0));
}
TEST_F(LinSpaceOpTest, Simple_D32) {
MakeOp(DT_FLOAT, DT_INT32);
// Feed and run
AddInputFromArray<float>(TensorShape({}), {3.0});
AddInputFromArray<float>(TensorShape({}), {7.0});
AddInputFromArray<int32>(TensorShape({}), {3});
TF_ASSERT_OK(RunOpKernel());
// Check the output
Tensor expected(allocator(), DT_FLOAT, TensorShape({3}));
test::FillValues<float>(&expected, {3.0, 5.0, 7.0});
test::ExpectTensorEqual<float>(expected, *GetOutput(0));
}
TEST_F(LinSpaceOpTest, Single_D64) {
MakeOp(DT_FLOAT, DT_INT64);
// Feed and run
AddInputFromArray<float>(TensorShape({}), {9.0});
AddInputFromArray<float>(TensorShape({}), {100.0});
AddInputFromArray<int64>(TensorShape({}), {1});
TF_ASSERT_OK(RunOpKernel());
// Check the output
Tensor expected(allocator(), DT_FLOAT, TensorShape({1}));
test::FillValues<float>(&expected, {9.0});
test::ExpectTensorEqual<float>(expected, *GetOutput(0));
}
TEST_F(LinSpaceOpTest, Simple_Double) {
MakeOp(DT_DOUBLE, DT_INT32);
// Feed and run
AddInputFromArray<double>(TensorShape({}), {5.0});
AddInputFromArray<double>(TensorShape({}), {6.0});
AddInputFromArray<int32>(TensorShape({}), {6});
TF_ASSERT_OK(RunOpKernel());
// Check the output
Tensor expected(allocator(), DT_DOUBLE, TensorShape({6}));
test::FillValues<double>(&expected, {5.0, 5.2, 5.4, 5.6, 5.8, 6.0});
test::ExpectTensorEqual<double>(expected, *GetOutput(0));
}
} // namespace
} // namespace tensorflow
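For reference, the element counts the Range fixtures above expect follow from size = ceil((limit - start) / delta); a quick standalone check (plain C++, values copied from the tests):

#include <cmath>
#include <cstdio>

int main() {
  // Simple_D32: Range(0, 10, 2) -> {0, 2, 4, 6, 8}, i.e. 5 elements.
  int int_count = (10 - 0 + 2 - 1) / 2;  // integer ceiling division
  // Large_Double: Range(0.0, 10000, 0.5) -> 20000 elements.
  long double_count = static_cast<long>(std::ceil((10000.0 - 0.0) / 0.5));
  std::printf("int range: %d elements, double range: %ld elements\n",
              int_count, double_count);  // prints 5 and 20000
  return 0;
}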
