110 Commits

Author SHA1 Message Date
Derek Murray
f7bae45c69 [Graph] Avoid copying the NodeDef where possible in Node::UpdateProperties().
1. We currently only call `UpdateProperties()` from `AddAttr()`, which ensures that we have unique ownership of the shared `NodeProperties`. This allows us to modify the `NodeProperties` in place.
2. We only need to update the `NodeProperties` when the input and output types change as a result of adding the new attr. Most calls to `AddAttr()` add inferred shape attributes, and do not modify the input or output types.

PiperOrigin-RevId: 307054556
Change-Id: I01fbc6ba832020bcd2d89822a5d3a2e3beba02ce
2020-04-17 09:24:37 -07:00
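The two conditions listed in f7bae45c69 can be sketched roughly as follows. This is a minimal illustration, not the TensorFlow implementation: `Props`, `SketchNode`, `MaybeCopyOnWrite()` and the integer type codes are hypothetical stand-ins, and the real `NodeProperties` carries more state.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the shared node properties.
struct Props {
  std::vector<std::string> attrs;
  std::vector<int> output_types;
};

struct SketchNode {
  std::shared_ptr<Props> props = std::make_shared<Props>();

  // Clones the properties only if another node still shares them.
  void MaybeCopyOnWrite() {
    if (props.use_count() > 1) props = std::make_shared<Props>(*props);
  }

  // Adds an attr; returns true only if the output types had to be updated.
  // `new_types` is null when the attr cannot affect the types (assumption).
  bool AddAttr(const std::string& name, const std::vector<int>* new_types) {
    MaybeCopyOnWrite();
    assert(props.use_count() == 1);  // unique ownership: safe to edit in place
    props->attrs.push_back(name);
    if (new_types == nullptr || *new_types == props->output_types) {
      return false;  // common case: types unchanged, nothing to rebuild
    }
    props->output_types = *new_types;
    return true;
  }
};
```

In this sketch the clone happens at most once per node; later calls to `AddAttr()` modify the already-unique `Props` in place.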
Derek Murray
aec85065ff [Graph] Avoid calling Node::GetNodeClassForOp() when a node is copied.
Instead, copy the class from the original node. This change also modifies the `kNodeClassTable` to use `absl::flat_hash_map<>`.

PiperOrigin-RevId: 306945263
Change-Id: I8eb1c80b57fdf204fbc7072a55615dd688025e87
2020-04-16 16:38:09 -07:00
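The idea in aec85065ff, copying the cached class instead of looking it up again, might look like the sketch below. All names here (`NodeClass`, `ClassForOp`, `ClassifiedNode`, `g_class_lookups`) are hypothetical, and `std::unordered_map` stands in for `absl::flat_hash_map<>` so the example needs only the standard library.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

enum class NodeClass { kOther, kSwitch, kMerge };

// Counts table lookups; present only so the effect can be observed.
int g_class_lookups = 0;

// Maps an op name to its class with a single hash-table lookup.
NodeClass ClassForOp(const std::string& op) {
  static const std::unordered_map<std::string, NodeClass> kTable = {
      {"Switch", NodeClass::kSwitch}, {"Merge", NodeClass::kMerge}};
  ++g_class_lookups;
  const auto it = kTable.find(op);
  return it == kTable.end() ? NodeClass::kOther : it->second;
}

struct ClassifiedNode {
  std::string op;
  NodeClass node_class;  // computed once, when the node is first created

  explicit ClassifiedNode(std::string op_name)
      : op(std::move(op_name)), node_class(ClassForOp(op)) {}

  bool IsSwitch() const { return node_class == NodeClass::kSwitch; }
};

// Copies a node; the implicit copy constructor reuses the cached class.
ClassifiedNode CopyNode(const ClassifiedNode& original) {
  ClassifiedNode copy(original);
  assert(copy.node_class == original.node_class);
  return copy;
}
```

Checks such as `IsSwitch()` then reduce to an enum comparison, and copying a node performs no table lookup at all.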
A. Unique TensorFlower
5f3a3019ba Replace NodeDef with std::shared_ptr<NodeProperties> in the kernel creation code paths and try to avoid as many copies of NodeDefs as possible. This will in most cases allow sharing the NodeDef between the OpKernel and the graph Node from which it is created.
This reduces the number of allocations in the executor benchmark by about 8%:

name                                                 old time/op             new time/op             delta
BM_executor/16/1k       [Nodes = 9824  ]              911µs ± 3%              911µs ± 1%    ~     (p=0.548 n=5+5)
BM_executor/32/8k       [Nodes = 141991]             17.1ms ± 2%             16.8ms ± 1%  -2.17%  (p=0.016 n=5+5)
BM_executor/1k/16       [Nodes = 6781  ]             1.21ms ± 1%             1.25ms ± 7%    ~     (p=0.095 n=5+5)
BM_executor/8k/32       [Nodes = 130875]              4.35s ± 0%              4.34s ± 0%    ~     (p=0.841 n=5+5)
BM_executor/1k/1k       [Nodes = 526256]              3.33s ± 1%              3.31s ± 1%    ~     (p=0.095 n=5+5)
BM_FeedInputFetchOutput                              54.0µs ± 7%             56.9µs ±13%    ~     (p=0.222 n=5+5)

name                                                 old allocs/op           new allocs/op           delta
BM_executor/16/1k       [Nodes = 9824  ]              15.4k ± 0%              14.1k ± 0%  -7.95%  (p=0.008 n=5+5)
BM_executor/32/8k       [Nodes = 141991]               226k ± 0%               208k ± 0%  -7.86%  (p=0.008 n=5+5)
BM_executor/1k/16       [Nodes = 6781  ]              10.2k ± 0%               9.3k ± 0%  -8.36%  (p=0.008 n=5+5)
BM_executor/8k/32       [Nodes = 130875]               197k ± 0%               180k ± 0%  -8.31%  (p=0.016 n=4+5)
BM_executor/1k/1k       [Nodes = 526256]               771k ± 0%               706k ± 0%  -8.53%  (p=0.008 n=5+5)
BM_FeedInputFetchOutput                                58.0 ± 0%               57.0 ± 0%  -1.72%  (p=0.008 n=5+5)

PiperOrigin-RevId: 295803318
Change-Id: I0d262c6082822023f449f9817dc943d20bd302d5
2020-02-18 13:20:06 -08:00
Anna R
31c3789692 Split out node_def_util, op_def_builder, op_def_util and attr_value_util targets in tensorflow/core/framework/BUILD. Split out
node_def_util.cc/.h into node_def_util.cc/.h and graph_node_util.cc/.h, where only the latter depends on graph.h.

PiperOrigin-RevId: 288340739
Change-Id: I66932bab042bda4bd707f866514b18b80efa805b
2020-01-06 11:46:01 -08:00
TensorFlower Gardener
c87a16e17a Merge pull request from bas-aarts:xla-merge
PiperOrigin-RevId: 276286841
Change-Id: I4f9cbc4d82cc963676b0b55ea023d4792ee1b0c7
2019-10-23 09:14:40 -07:00
Derek Murray
1d0aff41ac In Graph constructor, use FunctionLibraryDefinition::num_functions() to get number of functions.
Previously, we were converting the FunctionLibraryDefinition to a FunctionDefLibrary proto to read out its number of functions, which incurs heavy allocation and deallocation costs when there are many functions.

PiperOrigin-RevId: 274255340
2019-10-11 19:26:30 -07:00
Derek Murray
fc30d76c55 Automated rollback of commit ce05dd80cc
PiperOrigin-RevId: 273515236
2019-10-08 07:37:58 -07:00
Derek Murray
ce05dd80cc Automated rollback of commit 3d830370a8
PiperOrigin-RevId: 272959765
2019-10-04 23:40:25 -07:00
Derek Murray
3d830370a8 Cache whether a Node is a function op in the Node's class.
This enables the testing for whether a node calls a function by checking the NodeClass enum, rather than looking up the type string in a FunctionLibraryDefinition map.

We call IsFunctionCall() for every node in several graph rewrites (including the control-flow lowering pass), so this should reduce the cost of rewrites on large graphs.

PiperOrigin-RevId: 272940579
2019-10-04 19:12:39 -07:00
Bas Aarts
791bf78c29 Add XLA-only merge that can merge all types.
This prevents insertion of H2D and D2H copies when XLA-GPU clusters
have int32 outputs. This merge is only used to merge the outputs
from the XlaRun and the PartitionedCall node.
2019-10-04 15:29:07 -07:00
Zhuoran Liu
c45537230d Make debug message more human-readable.
PiperOrigin-RevId: 268257613
2019-09-10 11:53:34 -07:00
Mehdi Amini
0197a2d8a3 Add a check to catch out-of-bound access on invalid Graphs
The existing Check trying to catch malformed graph is not robust when
an op is registered with an expected number of inputs but has data edges
beyond this.

PiperOrigin-RevId: 266826557
2019-09-02 16:57:38 -07:00
Ayush Dubey
39e7715eb0 Configure NcclGather in collective param resolution.
Also add python tests that cover NCCL implementations of broadcast and
all-gather.

PiperOrigin-RevId: 261408514
2019-08-02 17:19:53 -07:00
Yanhua Sun
170a95de67 In while_v2 emit a StatelessIf op if the body is stateless.
PiperOrigin-RevId: 260927755
2019-07-31 08:12:52 -07:00
Saurabh Saxena
22beb4c7d5 In cond_v2 emit a StatelessIf op if the body is stateless.
PiperOrigin-RevId: 256009375
2019-07-01 14:16:11 -07:00
Derek Murray
a101d48091 Change Graph::AddNode() to take node_def by value.
This method currently copies the given `const NodeDef&` proto into the returned `Node`. In many cases, the argument can be moved into the call, and we can elide a potentially large copy.

PiperOrigin-RevId: 254008134
2019-06-19 09:18:27 -07:00
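The effect of taking the argument by value, as a101d48091 describes, can be illustrated with a small copy-counting type. `CountingDef` and `SketchGraph` are hypothetical and only mimic the shape of `Graph::AddNode()`; they are not the TensorFlow types.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical proto-like payload that counts how often it is copied.
struct CountingDef {
  std::string name;
  static int copies;

  explicit CountingDef(std::string n) : name(std::move(n)) {}
  CountingDef(const CountingDef& other) : name(other.name) { ++copies; }
  CountingDef(CountingDef&& other) noexcept : name(std::move(other.name)) {}
};
int CountingDef::copies = 0;

struct SketchGraph {
  std::vector<CountingDef> defs;

  // By-value parameter: callers choose between a copy (lvalue argument)
  // and a move (rvalue argument); the body always moves into storage.
  void AddNode(CountingDef def) {
    assert(!def.name.empty());
    defs.push_back(std::move(def));
  }
};
```

A caller that still needs its definition passes it as an lvalue and pays for one copy; a caller that is done with it writes `graph.AddNode(std::move(def))` and pays for none.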
Brian Patton
10ed2f7bb5 Adds a lowering from Case to _SwitchN+Merge.
Introduces a new n-way _SwitchN op+kernel.

I audited usages of the grappler and graph variants of IsSwitch and IsMerge, and believe the corrections in this CL are correct.

PiperOrigin-RevId: 250803634
2019-05-30 18:28:08 -07:00
Andy Ly
1a40c07e83 Expose FindKernelDef with NodeDef components (name, op, device, etc.). Update Grappler util's IsKernelRegisteredForNode to use lower FindKernelDef.
PiperOrigin-RevId: 247104392
2019-05-07 16:33:58 -07:00
A. Unique TensorFlower
7867001d88 [TF core] Remove accidentally quadratic behaviour when repeatedly calling Graph::RemoveNode
We were resizing the edge free list on every call, which meant repeated calls would have a 100% probability of resizing the edge free list and making node removal O(n^2) in the number of nodes to remove.

Instead, never call reserve. Use std::vector's default resizing heuristics to amortize this away.

PiperOrigin-RevId: 246459207
2019-05-03 00:27:00 -07:00
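A rough way to see the difference described in 7867001d88 is to count how often a vector's capacity changes while items are appended. The sketch assumes the removed call reserved exactly one more slot each time; the function name is hypothetical, and the precise counts depend on the standard library in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends n items and returns how many times the capacity changed.
// With reserve_each_time, space is requested for exactly one more element
// before every append; otherwise std::vector's own growth policy is used.
std::size_t CountReallocations(std::size_t n, bool reserve_each_time) {
  std::vector<int> free_list;
  std::size_t reallocations = 0;
  std::size_t last_capacity = free_list.capacity();
  for (std::size_t i = 0; i < n; ++i) {
    if (reserve_each_time) free_list.reserve(free_list.size() + 1);
    free_list.push_back(static_cast<int>(i));
    if (free_list.capacity() != last_capacity) {
      ++reallocations;
      last_capacity = free_list.capacity();
    }
  }
  assert(free_list.size() == n);
  return reallocations;
}
```

On common implementations the per-call `reserve` variant reallocates on nearly every append, and each reallocation moves all existing elements, which is where the quadratic cost comes from. The default policy grows geometrically and reallocates only a logarithmic number of times.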
A. Unique TensorFlower
c42d974e7a Optimize tf graph manipulation:
* Don't copy all nodes in PruneForReverseReachability.
  * Using std::vector<bool> instead of std::unordered_set<Node*> for marking visited nodes makes PruneForReverseReachability about 2X faster.
  * std::move target nodes when calling PruneForReverseReachability.
  * Don't clear content of nodes when moving them to the free list: They are immediately overwritten when re-used, so no need to touch the memory where they reside when recycling them.
  * Add benchmarks for RemoveNode and PruneForReverseReachability.

PiperOrigin-RevId: 242508299
2019-04-08 12:27:12 -07:00
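The visited-marking change in c42d974e7a can be sketched as a reverse traversal that indexes a `std::vector<bool>` by node id. The adjacency layout and the function name below are assumptions made for illustration; the real pass works on `Node*` objects and also removes the unreached nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// in_edges[i] lists the ids of the nodes that feed node i (assumed layout).
// Returns a bitmap marking every node from which some target is reachable.
std::vector<bool> ReverseReachable(
    const std::vector<std::vector<int>>& in_edges, std::vector<int> targets) {
  std::vector<bool> visited(in_edges.size(), false);
  std::vector<int> stack = std::move(targets);  // reuse the caller's buffer
  for (int t : stack) {
    assert(t >= 0 && static_cast<std::size_t>(t) < in_edges.size());
    visited[t] = true;
  }
  while (!stack.empty()) {
    const int n = stack.back();
    stack.pop_back();
    for (int src : in_edges[n]) {
      if (!visited[src]) {  // one bit test instead of a hash-set lookup
        visited[src] = true;
        stack.push_back(src);
      }
    }
  }
  return visited;
}
```

Nodes whose bit remains false are the ones a pruning pass would remove.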
A. Unique TensorFlower
4591bad9f9 Automated rollback of commit fe30579d67
PiperOrigin-RevId: 241998820
2019-04-04 13:57:10 -07:00
A. Unique TensorFlower
fd32a7f773 Automated rollback of commit fe30579d67
PiperOrigin-RevId: 241944783
2019-04-04 09:36:17 -07:00
A. Unique TensorFlower
98c3cfbf74 Automated rollback of commit fe30579d67
PiperOrigin-RevId: 241863985
2019-04-03 21:26:55 -07:00
A. Unique TensorFlower
fe30579d67 Optimize tf graph manipulation:
* Don't copy all nodes in PruneForReverseReachability.
  * std::move target nodes when calling PruneForReverseReachability.
  * Don't clear content of nodes when moving them to the free list: They are immediately overwritten when re-used, so no need to touch the memory where they reside when recycling them.

PiperOrigin-RevId: 241841395
2019-04-03 17:57:18 -07:00
A. Unique TensorFlower
919b38007e Speed up removal of nodes from Graph by not removing edges one by one from the node's own EdgeSet. Only remove each edge from the neighboring nodes' EdgeSets, then clear the node's own in_edges_ and out_edges_ in one operation.
PiperOrigin-RevId: 240449165
2019-03-26 16:18:58 -07:00
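The removal strategy in 919b38007e might be sketched like this, with `std::set<int>` standing in for TensorFlow's `EdgeSet` and all type names hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct SketchEdge { int src; int dst; };
struct EdgeSketchNode { std::set<int> in_edges; std::set<int> out_edges; };

struct EdgeSketchGraph {
  std::vector<EdgeSketchNode> nodes;
  std::vector<SketchEdge> edges;

  // Adds an edge src -> dst and returns its id.
  int AddEdge(int src, int dst) {
    assert(static_cast<std::size_t>(src) < nodes.size());
    assert(static_cast<std::size_t>(dst) < nodes.size());
    const int id = static_cast<int>(edges.size());
    edges.push_back(SketchEdge{src, dst});
    nodes[src].out_edges.insert(id);
    nodes[dst].in_edges.insert(id);
    return id;
  }

  // Detaches node n: erase each edge from the neighbor's set only, then
  // drop the node's own sets with one clear() per set.
  void RemoveNodeEdges(int n) {
    for (int e : nodes[n].in_edges) nodes[edges[e].src].out_edges.erase(e);
    for (int e : nodes[n].out_edges) nodes[edges[e].dst].in_edges.erase(e);
    nodes[n].in_edges.clear();
    nodes[n].out_edges.clear();
  }
};
```

The node being removed never pays for a per-edge erase on its own sets, which is the saving the commit message describes.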
Igor Ganichev
ea294844b0 Add Node::IsArg() and Node::IsRetval() methods and use them
As functions become more prominent, we have been directly checking
for op types in many places. This is slow and buggy because not all
places were updated for Device* versions of _Arg and _Retval.

PiperOrigin-RevId: 238044205
2019-03-12 10:34:44 -07:00
Igor Ganichev
ed2b195990 Support function calls through PartitionedCall in tf2xla
Also, make eager runtime always emit PartitionedCall and remove
special handling of xla compilation.

Because this change makes XLA look inside PartitionedCalls, this change
had to update/disable some tests that include PartitionedCalls with
some uncompilable ops inside.

PiperOrigin-RevId: 237486703
2019-03-08 11:32:38 -08:00
Jiri Simsa
5702b86c96 [tf.data] Marking dataset ops that consume a dataset without an iterator as stateful to make sure they are not pruned from the graph in case their output is not used.
This is a conservative approach to guarantee that any side-effects of the op are carried out.

This CL also reverts a previous (incomplete) solution to the same problem.

PiperOrigin-RevId: 233663631
2019-02-12 19:11:25 -08:00
Jiri Simsa
eadd8aca53 Making sure dataset "output" ops are not pruned from function graphs as they might have side effects even though the ops are not marked as stateful.
PiperOrigin-RevId: 229425577
2019-01-15 13:35:35 -08:00
A. Unique TensorFlower
f9699b8d40 Fixing build due to ambiguous vector constructor.
PiperOrigin-RevId: 225859201
2018-12-17 11:30:46 -08:00
A. Unique TensorFlower
e8d6281e7e [Error improvement] We now put an attribute for keeping track of the original source nodes. We have also changed many optimizers to correctly transmit the original node values.
PiperOrigin-RevId: 225457141
2018-12-13 16:36:57 -08:00
Skye Wanderman-Milne
b4c2856141 Make AddWhileInputHack handle control inputs correctly.
PiperOrigin-RevId: 225131361
2018-12-11 23:19:14 -08:00
Skye Wanderman-Milne
62db4a3ccf Introduce Operation._add_while_inputs to allow adding inputs to a While op.
This is in preparation for changing while_v2 to rewrite the forward
pass to output intermediates needed by the gradient, instead of
outputting all intermediates. Since While ops always have the same
inputs and output types, we need to be able to add inputs in addition
to adding outputs.

PiperOrigin-RevId: 223812986
2018-12-03 10:11:42 -08:00
Skye Wanderman-Milne
199ead85e8 Fix bug in If lowering and make {Input,Output}Tensor.node non-const.
Prior to this change, the If lowering pass would always use the first
output of the predicate op as the predicate. This change makes it use
the correct output. In addition, this adds more OutputTensor plumbing
which required making the node field non-const.

PiperOrigin-RevId: 221308751
2018-11-13 12:09:26 -08:00
Skye Wanderman-Milne
d1b2537f33 Don't constant fold FakeParam ops.
FakeParam claims to have a different output shape than it actually
outputs (since the output is not meant to ever be accessed).  Prior to
this change, ConstantFold() would call ReplaceTensorWithConstant() with
the invalid FakeParam output, which would cause a
use-before-initialization error.

PiperOrigin-RevId: 217929903
2018-10-19 14:24:54 -07:00
A. Unique TensorFlower
be409cac81 During error conditions, we were putting the entire NodeDef in the user-facing error message, which made the messages very hard to read and understand. We now put only the node name in the message and write the NodeDef information to the logs.
PiperOrigin-RevId: 217616833
2018-10-17 17:13:58 -07:00
Peter Hawkins
3f23f4ddea Automated rollback of commit 6fa6bd045c
PiperOrigin-RevId: 217173355
2018-10-15 11:22:32 -07:00
Peter Hawkins
6fa6bd045c Replace references to tensorflow::StringPiece with absl::string_view. No functional changes.
PiperOrigin-RevId: 217170781
2018-10-15 11:01:32 -07:00
Tong Shen
72bf28cd1f Add a utility function to build node name to node index.
PiperOrigin-RevId: 216853788
2018-10-12 06:39:26 -07:00
Rachel Lim
09e098e505 Automated rollback of commit d6a3d6a829
PiperOrigin-RevId: 216617037
2018-10-10 16:55:28 -07:00
A. Unique TensorFlower
d6a3d6a829 Automated rollback of commit 950cf87104
PiperOrigin-RevId: 216500702
2018-10-10 02:47:15 -07:00
Rachel Lim
950cf87104 [tf.data vectorization] Add vectorizer for Add op
PiperOrigin-RevId: 216424512
2018-10-09 14:46:11 -07:00
Alexandre Passos
eec9ca8f0b Partially support tfe.defun in tf.gradients.
Doesn't attempt to deal with cases where we might have already generated
the functiondef for the parent function as in that case we cannot easily
modify the forward pass.

PiperOrigin-RevId: 216243224
2018-10-08 13:58:40 -07:00
Rachel Lim
47eafbaf43 [tf.data] Add utility to deduplicate graph node names (after vectorization)
PiperOrigin-RevId: 215595078
2018-10-03 11:29:40 -07:00
Sanjoy Das
9884cb3629 Check that IsValid{Input|Output}Tensor is only given non-control edges
PiperOrigin-RevId: 215338658
2018-10-01 23:12:22 -07:00
Asim Shankar
7f52de1a2b Make Graph::UpdateEdge() be O(e) instead of O(E)
where:
- E = number of edges in the graph
- e = number of edges on the node of interest

e is necessarily <= E and is typically really small
(# of inputs to an operation + control edges)

PiperOrigin-RevId: 210624296
2018-08-28 16:07:05 -07:00
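The complexity claim in 7f52de1a2b comes down to scanning only the destination node's own in-edges rather than every edge in the graph. A minimal sketch, with a hypothetical edge layout and a free function in place of `Graph::UpdateEdge()`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct InEdge { int src; int src_output; int dst_input; };
struct UpdNode { std::vector<InEdge> in_edges; };

// Rewires input slot dst_input of node dst to (new_src, new_src_output).
// Only the in-edges of dst are scanned, so the cost is O(e) for that node.
// Returns false if dst has no edge feeding that input slot.
bool UpdateEdge(std::vector<UpdNode>& nodes, int new_src, int new_src_output,
                int dst, int dst_input) {
  assert(dst >= 0 && static_cast<std::size_t>(dst) < nodes.size());
  for (InEdge& e : nodes[dst].in_edges) {
    if (e.dst_input == dst_input) {
      e.src = new_src;
      e.src_output = new_src_output;
      return true;
    }
  }
  return false;
}
```

A complete version would also update the out-edge sets of the old and new source nodes; that bookkeeping is omitted here.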
A. Unique TensorFlower
4f4e1b4886 Removed redundant std::string -> string conversions.
PiperOrigin-RevId: 210565027
2018-08-28 10:43:43 -07:00
Peter Hawkins
642a043de4 [TF:XLA] Replace bespoke NodeSlot class in subgraph encapsulation code with InputTensor and OutputTensor classes from TF core.
Add equality and hash methods to InputTensor and OutputTensor.

No functional changes intended.

PiperOrigin-RevId: 200440015
2018-06-13 13:08:14 -07:00
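Adding equality and hash methods, as 642a043de4 does for InputTensor and OutputTensor, is what lets such values serve as keys in hash containers. The sketch below uses a hypothetical `TensorRef` that identifies a node by integer id; the TensorFlow classes hold a node pointer instead, and the hash-combining formula is just one common choice.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical (node, slot) pair, loosely modeled on OutputTensor.
struct TensorRef {
  int node_id;
  int index;
  bool operator==(const TensorRef& other) const {
    return node_id == other.node_id && index == other.index;
  }
};

// Hash functor that combines both fields.
struct TensorRefHash {
  std::size_t operator()(const TensorRef& t) const {
    const std::size_t h1 = std::hash<int>()(t.node_id);
    const std::size_t h2 = std::hash<int>()(t.index);
    return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
  }
};

// Counts distinct references by inserting them into a hash set.
std::size_t CountDistinct(const std::vector<TensorRef>& refs) {
  std::unordered_set<TensorRef, TensorRefHash> seen(refs.begin(), refs.end());
  assert(seen.size() <= refs.size());
  return seen.size();
}
```

With both operations defined, a bespoke slot class is no longer needed for set or map lookups.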
A. Unique TensorFlower
9d2c6ff2a5 Collective Ops Part 7
Complete just enough of the core implementation to run
multi-device collectives locally within a single process.
Interfaces are still private and not available for general use.

PiperOrigin-RevId: 197617132
2018-05-22 13:51:22 -07:00
A. Unique TensorFlower
170634d5a1 Replaced calls to tensorflow::StringPiece::ToString with std::string conversions.
That is, instances of sp.ToString() are replaced with std::string(sp).

This will allow tensorflow::StringPiece::ToString to be removed, which is necessary before it can be replaced with absl::string_view.

PiperOrigin-RevId: 195689392
2018-05-07 16:39:29 -07:00