1. We currently only call `UpdateProperties()` from `AddAttr()`, which ensures that we have unique ownership of the shared `NodeProperties`. This allows us to modify the `NodeProperties` in place.
2. We only need to update the `NodeProperties` when the input and output types change as a result of adding the new attr. Most calls to `AddAttr()` add inferred shape attributes, and do not modify the input or output types.
PiperOrigin-RevId: 307054556
Change-Id: I01fbc6ba832020bcd2d89822a5d3a2e3beba02ce
Instead, copy the class from the original node. This change also modifies the `kNodeClassTable` to use `absl::flat_hash_map<>`.
PiperOrigin-RevId: 306945263
Change-Id: I8eb1c80b57fdf204fbc7072a55615dd688025e87
Split node_def_util.cc/.h into node_def_util.cc/.h and graph_node_util.cc/.h, where only the latter depends on graph.h.

PiperOrigin-RevId: 288340739
Change-Id: I66932bab042bda4bd707f866514b18b80efa805b
Previously, we were converting the FunctionLibraryDefinition to a FunctionDefLibrary proto to read out its number of functions, which incurs heavy allocation and deallocation costs when there are many functions.
PiperOrigin-RevId: 274255340
This enables testing whether a node calls a function by checking the NodeClass enum, rather than looking up the type string in a FunctionLibraryDefinition map.
We call IsFunctionCall() for every node in several graph rewrites (including the control-flow lowering pass), so this should reduce the cost of rewrites on large graphs.
PiperOrigin-RevId: 272940579
This prevents insertion of H2D and D2H copies when XLA-GPU clusters
have int32 outputs. This merge is only used to merge the outputs
from the XlaRun node and the PartitionedCall node.
The existing check that tries to catch malformed graphs is not robust
when an op is registered with an expected number of inputs but has data
edges beyond that number.
PiperOrigin-RevId: 266826557
This method currently copies the given `const NodeDef&` proto into the returned `Node`. In many cases, the argument can be moved into the call, and we can elide a potentially large copy.
PiperOrigin-RevId: 254008134
Introduces a new n-way _SwitchN op+kernel.
I audited usages of the grappler and graph variants of IsSwitch and IsMerge, and believe the corrections in this CL are correct.
PiperOrigin-RevId: 250803634
We were resizing the edge free list on every call, which meant repeated calls would have a 100% probability of resizing the edge free list and making node removal O(n^2) in the number of nodes to remove.
Instead, never call reserve. Use std::vector's default resizing heuristics to amortize this away.
PiperOrigin-RevId: 246459207
* Don't copy all nodes in PruneForReverseReachability.
* Using std::vector<bool> instead of std::unordered_set<Node*> for marking visited nodes makes PruneForReverseReachability about 2X faster.
* std::move target nodes when calling PruneForReverseReachability.
* Don't clear content of nodes when moving them to the free list: They are immediately overwritten when re-used, so no need to touch the memory where they reside when recycling them.
* Add benchmarks for RemoveNode and PruneForReverseReachability.
PiperOrigin-RevId: 242508299
* Don't copy all nodes in PruneForReverseReachability.
* std::move target nodes when calling PruneForReverseReachability.
* Don't clear content of nodes when moving them to the free list: They are immediately overwritten when re-used, so no need to touch the memory where they reside when recycling them.
PiperOrigin-RevId: 241841395
As functions become more prominent, we have been directly checking
for op types in many places. This is slow and buggy because not all
places were updated for Device* versions of _Arg and _Retval.
PiperOrigin-RevId: 238044205
Also, make eager runtime always emit PartitionedCall and remove
special handling of xla compilation.
Because this change makes XLA look inside PartitionedCalls, it also
had to update or disable some tests that include PartitionedCalls with
uncompilable ops inside.
PiperOrigin-RevId: 237486703
This is a conservative approach to guarantee that any side-effects of the op are carried out.
This CL also reverts a previous (incomplete) solution to the same problem.
PiperOrigin-RevId: 233663631
This is in preparation for changing while_v2 to rewrite the forward
pass to output intermediates needed by the gradient, instead of
outputting all intermediates. Since While ops must always have the same
input and output types, we need to be able to add inputs in addition
to adding outputs.
PiperOrigin-RevId: 223812986
Prior to this change, the If lowering pass would always use the first
output of the predicate op as the predicate. This change makes it use
the correct output. In addition, this adds more OutputTensor plumbing
which required making the node field non-const.
PiperOrigin-RevId: 221308751
FakeParam claims to have a different output shape than it actually
outputs (since the output is not meant to ever be accessed). Prior to
this change, ConstantFold() would call ReplaceTensorWithConstant() with
the invalid FakeParam output, which would cause a
use-before-initialization error.
PiperOrigin-RevId: 217929903
Doesn't attempt to deal with cases where we might have already generated
the FunctionDef for the parent function, since in that case we cannot
easily modify the forward pass.
PiperOrigin-RevId: 216243224
where:
- E = the number of edges in the graph
- e = the number of edges on the node of interest

e is necessarily <= E and is typically very small
(the number of inputs to an operation plus control edges).
PiperOrigin-RevId: 210624296
Complete just enough of the core implementation to run
multi-device collectives locally within a single process.
Interfaces are still private and not available for general use.
PiperOrigin-RevId: 197617132
That is, instances of sp.ToString() are replaced with std::string(sp).
This will allow tensorflow::StringPiece::ToString to be removed, which is necessary before it can be replaced with absl::string_view.
PiperOrigin-RevId: 195689392