This duplicates some of the BUILD dependency tree to go around the need to link huge bottleneck dependencies (such as `//tensorflow/core:framework`). Until TF can use `cc_shared_library` in a stable way (and all support in Bazel exists), we will need to use the duplicated tree for fuzzing.
PiperOrigin-RevId: 317326319
Change-Id: I1493e3ae7340298971fe15bd3702b63657f9bf9f
Both uint32 and uint64 had been omitted from TF_CALL_INTEGRAL_TYPES due
to concerns about binary size bloat. In reality, the size increase is
only around 2 MB. Further, this fixes #39649, since we are no longer
inadvertently using the XLA_CPU device to perform tf.reduce_mean.
PiperOrigin-RevId: 317259372
Change-Id: Iacf75eaedce198fbef4bd9fd59b6fefa584cbf34
This PR addresses the issue raised in 40471, where the output shape of
an autograph expression involving tf.equal could not be inferred
correctly. Specifically, with `x.shape == [None, 10, 1]` and
`y.shape == [None, 1, 4]`, inference only yielded `shape == None`
(it should be `shape == [None, 10, 4]`).
The reason was that the shape inference function for equal didn't
handle the cases where both x's and y's dims are None.
This PR fixes 40471.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
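For context, the broadcasting rule the fixed inference needs to follow can be sketched as below. This is an illustrative Python model, not TensorFlow's actual C++ shape-inference code; `None` stands for an unknown dimension:

```python
def broadcast_shape(x, y):
    """Elementwise-op output shape under broadcasting; None = unknown dim."""
    rank = max(len(x), len(y))
    # Align ranks by left-padding with 1s.
    x = [1] * (rank - len(x)) + list(x)
    y = [1] * (rank - len(y)) + list(y)
    out = []
    for a, b in zip(x, y):
        if a == 1:
            out.append(b)      # a size-1 dim broadcasts to the other dim
        elif b == 1:
            out.append(a)
        elif a is None or b is None:
            out.append(None)   # either side unknown: output dim is unknown
        elif a == b:
            out.append(a)
        else:
            raise ValueError(f"incompatible dims: {a} vs {b}")
    return out
```

With this rule, `[None, 10, 1]` and `[None, 1, 4]` combine to `[None, 10, 4]` rather than collapsing to a fully unknown shape.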
The private `_shared_rendezvous` property allows the function to use the
rendezvous of the parent. This is only needed in order to support code
where raw send/recv operations are inserted and when functions are run
in graph mode where they may not be inlined.
PiperOrigin-RevId: 315319264
Change-Id: Ieb6b3924c51ccfd201b4693f3a499f883c7c0b71
We add the TF_CALL_COMPLEX_TYPES macro and update related kernel
registrations with more compact macros rather than the individual dtype
listings.
This should be a no-op and should give better visibility into what is
the dtype coverage for many of our kernels.
PiperOrigin-RevId: 315224662
Change-Id: I14aad07711a407fa632a94d891238a48ae89bcab
If we are unable to find any valid devices for a node, we can do a quick
check to see if the node is even valid as per the op definition. This
greatly improves the eager error message since there is no point in
listing all the available kernels across all devices if we know none of
them can match.
Previous:
NotFoundError: Could not find device for node: {{node GatherV2}} = GatherV2[Taxis=DT_INT32, Tindices=DT_FLOAT, Tparams=DT_INT32, batch_dims=0]
All kernels registered for op GatherV2:
device='CPU'; Tparams in [DT_INT64]; Tindices in [DT_INT32]
device='CPU'; Tparams in [DT_INT64]; Tindices in [DT_INT64]
device='CPU'; Tparams in [DT_INT32]; Tindices in [DT_INT32]
... Many more registrations ...
New:
InvalidArgumentError: Value for attr 'Tindices' of float is not in the list of allowed values: int32, int64
; NodeDef: {{node GatherV2}}; ...
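The fail-fast idea can be sketched as follows. This is a simplified Python illustration with invented names and data structures, not TensorFlow's actual kernel-lookup code:

```python
def find_kernel(node_attrs, op_def_allowed, registered_kernels):
    """Hypothetical lookup: validate against the op definition first."""
    # If an attr value is not allowed by the op definition, no kernel on
    # any device can match, so fail fast with a precise error instead of
    # listing every registered kernel.
    for attr, value in node_attrs.items():
        allowed = op_def_allowed.get(attr)
        if allowed is not None and value not in allowed:
            raise ValueError(
                f"Value for attr '{attr}' of {value} is not in the list "
                f"of allowed values: {', '.join(allowed)}")
    # Only now search the (potentially long) kernel registry.
    for kernel in registered_kernels:
        if all(kernel.get(a) == v for a, v in node_attrs.items()):
            return kernel
    raise LookupError("Could not find device for node")
```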
PiperOrigin-RevId: 314963092
Change-Id: I8072e7ba9e6d316570a536780d78992691e620f1
This CL switches from using iterator prefix for identifying the parent node in the model tree when a node is constructed to directly passing a parent pointer to the constructor. In addition, this CL makes the `IteratorBase::InitializeBase` method public, which makes it possible to fix `cache` and `snapshot` implementations to reflect their use of nested iterators in the model tree.
PiperOrigin-RevId: 314570434
Change-Id: Ide0b37f404077938ad8dc4fbbd91489b7197c6e1
LargeAllocationWarningBytes() is implied to be a quantity (it calls
AvailableRam(), which, unless improperly named, is a non-negative
quantity).
Given that LargeAllocationWarningBytes() is a quantity, it can be cast
to size_t with behavior constrained to be as intended.
The timeout is set as an argument to a collective op. When a non-zero value is given, a completion timeout is set to detect staleness. If the timeout fires, the execution is aborted with a DEADLINE_EXCEEDED error.
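The completion-timeout pattern can be illustrated with a minimal, generic threading-based Python sketch; this is not the actual collective implementation, and the names are invented:

```python
import threading

class DeadlineExceededError(RuntimeError):
    """Stand-in for the DEADLINE_EXCEEDED error status."""

def run_with_timeout(fn, timeout_seconds):
    """Run fn on a worker thread; abort with an error if it stalls."""
    done = threading.Event()
    result = {}

    def worker():
        result["value"] = fn()
        done.set()  # signal completion to the waiting caller

    threading.Thread(target=worker, daemon=True).start()
    if not done.wait(timeout_seconds):
        # The op did not complete in time: treat it as stale and abort.
        raise DeadlineExceededError("collective op timed out")
    return result["value"]
```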
PiperOrigin-RevId: 313861868
Change-Id: I7fee45736608ad7fbcc9dd980db2fd302c9cb4df
1. Thread the RPC cancel signal through the eager service RunComponentFunction calls;
2. Always pass the cancellation manager to the underlying executor (instead of only passing it when `is_eager` is true, i.e., for pure eager ops). With this we do not need to cancel the rendezvous from the process FLR; instead, the ExecutorState takes care of it when an op fails.
3. Do not mark all statuses as derived when aborting rendezvous or triggering cancellation; doing so usually results in the original error being buried as one of the derived errors.
PiperOrigin-RevId: 313814162
Change-Id: Ia866f5f522a0b1aa54e9dce7b9cc0bcf7682136a
Slicing out contiguous pieces of tensors along the batch dimension and
copying them to another tensor.
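A minimal Python illustration of the operation, using nested lists in place of tensors (the helper name is invented):

```python
def slice_batch(tensor, start, length):
    """Copy a contiguous run of `length` examples starting at `start`
    along the batch (first) dimension into a new tensor."""
    # row[:] makes a shallow copy of each sliced row, so the result does
    # not alias the source tensor's rows.
    return [row[:] for row in tensor[start:start + length]]
```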
PiperOrigin-RevId: 313414257
Change-Id: I2530c58ed53ad8e92e5f976f2dd1728296d12185
These allow us to implement tf.data service compression/decompression as a part of the tf.data pipeline.
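The idea of running compression and decompression as ordinary pipeline stages can be sketched generically in Python, with zlib and pickle as stand-ins for the actual tf.data ops:

```python
import pickle
import zlib

def compress_element(elem):
    # Serialize then compress, mimicking a compression stage.
    return zlib.compress(pickle.dumps(elem))

def decompress_element(blob):
    return pickle.loads(zlib.decompress(blob))

def pipeline(elements):
    # Compression and decompression run as ordinary map stages,
    # analogous to transformations in a tf.data pipeline.
    compressed = map(compress_element, elements)
    return map(decompress_element, compressed)
```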
PiperOrigin-RevId: 312605093
Change-Id: I4a833bc89e602c8fd78abc4c1a0026c2a397449f
At present, the `CallFrameInterface` (and, by extension, all TensorFlow functions that pass arguments via `ArgOp`) retains ownership of one reference on each argument tensor for the lifetime of the frame. This prevents buffer forwarding optimizations from being performed on arguments, which can lead to performance issues. However, in some cases (e.g. the loop variables in a `WhileOp`, the arguments to the map function in `Dataset.map()`) we know enough about the caller to be able to "move" the arguments into the function call.
This change adds the following:
* Methods called `CanConsumeArg(int index)` and `ConsumeArg(int index, Tensor* val)` to `CallFrameInterface` that allow the runtime to query the call frame for consumable arguments. The default implementations do not permit any arguments to be consumed, for backwards compatibility.
* An implementation of these methods in tf.data's "captured_function.cc", allowing arguments to `Dataset.map()` functions (and similar) to be consumed by the function call.
* A specialized `CallFrameInterface` implementation for the synchronous pass in `WhileOp` that allows arguments to be consumed.
* Modifications to `SingleThreadedExecutorImpl` to query the `CanConsumeArg()` method and consume arguments wherever possible.
Potential future additions:
* Add `ConsumeArg()` support to the default executor.
* Use custom call frames with `ConsumeArg()` support in more functional ops.
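A toy Python model of the consumable-arguments protocol: the real interface is C++, and while the method names mirror `CanConsumeArg`/`ConsumeArg`, this class is invented purely for illustration:

```python
class CallFrame:
    """Sketch of a call frame whose arguments can be 'consumed' (moved)
    by the callee instead of copied, enabling buffer forwarding."""

    def __init__(self, args, consumable_indices):
        self._args = list(args)
        self._consumable = set(consumable_indices)

    def can_consume_arg(self, index):
        # Backwards-compatible default would return False for all args.
        return index in self._consumable and self._args[index] is not None

    def consume_arg(self, index):
        assert self.can_consume_arg(index)
        # Move the value out of the frame; the slot is emptied so the
        # argument cannot be consumed (or observed) twice.
        val, self._args[index] = self._args[index], None
        return val

    def get_arg(self, index):
        return self._args[index]
```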
PiperOrigin-RevId: 309981157
Change-Id: I02f7b9f5611a27b087ee0058540ab7e38e70c21d
- Expand the packed _Arg nodes when the graph is ready for graph partition.
- Introduce an optional sub-index to function Arg nodes, in order to distinguish between two arguments with the same "index". It happens after replacing a packed _Arg node which is assigned to a CompositeDevice with multiple replica nodes (one per device).
The "index" of an _Arg node is unique before expanding it. It's also unique within each subgraph after graph partition.
PiperOrigin-RevId: 309781835
Change-Id: Ic6e351f45b7523288b5dae30997ddf0dae86660b
This change adds an `OpKernelContext::executor_type()` method, which (by analogy with `OpKernelContext::runner()` and `OpKernelContext::run_all_kernels_inline()`) enables function calls within a kernel to inherit that option from the calling context. As a concrete example, this enables a WhileOp or IfOp running with the SINGLE_THREADED_EXECUTOR executor_type to use the same optimizations in the branch/cond/body functions of those ops.
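The inheritance behavior can be modeled in Python with a context variable; this is an illustrative sketch of option propagation, not the actual C++ mechanism:

```python
import contextvars

# Hypothetical option propagated from caller to nested function calls.
_executor_type = contextvars.ContextVar("executor_type", default="DEFAULT")

def current_executor_type():
    return _executor_type.get()

def call_function(fn, *args, executor_type=None):
    # Inherit the caller's executor_type when none is given explicitly,
    # mirroring how nested calls can reuse the calling context's setting.
    token = _executor_type.set(executor_type or _executor_type.get())
    try:
        return fn(*args)
    finally:
        _executor_type.reset(token)
```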
PiperOrigin-RevId: 309515120
Change-Id: I11b0b3ee458dd8ea1cdc9284f8acdb293c5cb770
Now we can iterate through tf.data service datasets with `for elem in dataset`, and `distributed_dataset.repeat()` will work correctly.
This CL removes the previous method of iteration via CreateJob/CreateDataServiceIterator. It wasn't yet made public, so it is OK to remove the old ops.
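A toy sketch of the iteration protocol being enabled, using a stand-in class rather than the tf.data service API:

```python
class Dataset:
    """Toy stand-in illustrating `for elem in dataset` and repeat()."""

    def __init__(self, elements):
        self._elements = list(elements)

    def __iter__(self):
        # Supporting the iterator protocol is what makes
        # `for elem in dataset` work.
        return iter(self._elements)

    def repeat(self, count):
        # Produce the dataset's elements `count` times over.
        return Dataset(self._elements * count)
```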
PiperOrigin-RevId: 309281077
Change-Id: I9531f7d2834ce6669f15896d8c830d23d8277b13
Many ops mark themselves as stateful even though their primary goal is
to disable optimizations such as constant folding and CSE. We add a new
method to clearly indicate this intent, even though we are currently not
adding a new flag.
PiperOrigin-RevId: 309253555
Change-Id: I8cae8bbc4c3b71819ee869b1870fce1e39e061be
In many cases, we invoke a TensorFlow function synchronously, and have to create a callback and notification to block on the result. With some executor types (e.g. SINGLE_THREADED_EXECUTOR), this causes unnecessary atomic operations to be executed (e.g. setting and destroying the Notification), which can account for ~100ns.
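The overhead being described can be seen by contrasting the notification-based pattern with a direct call; a generic Python sketch (an `Event` stands in for the Notification):

```python
import threading

def run_async(fn, done_callback):
    """Generic async runner: invokes fn on another thread."""
    def worker():
        done_callback(fn())
    threading.Thread(target=worker, daemon=True).start()

def run_sync_via_notification(fn):
    # The pattern being optimized away: allocate a notification and a
    # callback, then block, just to get a synchronous result.
    result = {}
    done = threading.Event()

    def on_done(value):
        result["value"] = value
        done.set()

    run_async(fn, on_done)
    done.wait()
    return result["value"]

def run_sync_direct(fn):
    # With a synchronous executor, just call fn inline; no notification,
    # callback, or atomic operations are needed.
    return fn()
```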
PiperOrigin-RevId: 309105760
Change-Id: Ie5b3aef5c4dcd3529e6a3a2701ac152b9630cc01
This change modifies these includes to point to
"tensorflow/core/common_runtime/graph_constructor.h" instead. This change will enable us to remove the accidental dependency from //tensorflow/core/graph to //tensorflow/core/common_runtime.
PiperOrigin-RevId: 309035649
Change-Id: I2af0fdd6a6ccc4ae8d351a9117a69b6fc80c22e9
1. Avoid re-evaluating the "TF_RUN_HANDLER_USE_SUB_THREAD_POOL" environment variable each time a task is enqueued, by caching the result in a static local variable.
2. A similar optimization in `ChooseRequestsWithExponentialDistribution()`.
3. Since all the arguments are string literals, we can pass `const char*` to the `ParamFromEnv*WithDefault()` methods, to avoid creating a temporary `std::string` on each call.
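The caching idea in item 1 can be sketched in Python; the helper name is invented, and the real code uses a static local variable in C++:

```python
import functools
import os

@functools.lru_cache(maxsize=None)
def use_sub_thread_pool():
    # Read the environment variable once and cache the result, instead
    # of re-evaluating it each time a task is enqueued.
    return os.environ.get("TF_RUN_HANDLER_USE_SUB_THREAD_POOL", "false") == "true"
```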
PiperOrigin-RevId: 308915732
Change-Id: I1642b5d924477eb006497acf95ac7cfc17956feb