Move ownership of Python stack traces to the Graph object and make them accessible from the C++ API.
Expose stack-printing options and implement common-prefix filtering.
PiperOrigin-RevId: 345579757
Change-Id: I88673891e893b1f71a5b039e44f0bc30f190c18a
- `ConstructionContext::kDirectSession`: From `tensorflow::DirectSession`, the TF1 session API.
- `ConstructionContext::kFunctionDef`: From `FunctionDef`, `@tf.function`.
- `ConstructionContext::kUnknown`: Not tracked.
It can be accessed via `Graph::GetConstructionContext()`.
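A minimal sketch of how a caller might branch on the construction context. Only the three enum values listed above are modeled; the `Graph` class here is a stand-in, and `SetConstructionContext()` and `BuiltFromFunction()` are hypothetical names used for illustration, not confirmed parts of the real API.

```cpp
#include <cassert>

// Stand-in enum: only the values named in this change are modeled.
enum class ConstructionContext {
  kUnknown = 0,    // Not tracked.
  kDirectSession,  // From tensorflow::DirectSession (TF1 session API).
  kFunctionDef,    // From a FunctionDef (@tf.function).
};

// Stand-in for the real Graph, which has many more members.
class Graph {
 public:
  // Hypothetical setter name, assumed for this sketch.
  void SetConstructionContext(ConstructionContext context) {
    construction_context_ = context;
  }
  ConstructionContext GetConstructionContext() const {
    return construction_context_;
  }

 private:
  // Graphs start out untracked until whoever builds them records the origin.
  ConstructionContext construction_context_ = ConstructionContext::kUnknown;
};

// Hypothetical caller: a pass that treats function-built graphs specially.
bool BuiltFromFunction(const Graph& graph) {
  return graph.GetConstructionContext() == ConstructionContext::kFunctionDef;
}
```

Defaulting to `kUnknown` keeps existing construction paths valid without modification; only code that knows the graph's origin needs to record it.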
PiperOrigin-RevId: 343109880
Change-Id: I7b3488648855c9d86d4fb4a202bd66cf7182191a
Instead, copy the class from the original node. This change also modifies the `kNodeClassTable` to use `absl::flat_hash_map<>`.
PiperOrigin-RevId: 306945263
Change-Id: I8eb1c80b57fdf204fbc7072a55615dd688025e87
This CL does two things:
1) Supports inter-procedural constant information propagation, across
PartitionedCall and StatefulPartitionedCall.
2) Done naively, (1) leads to an exponential number of calls, as each function
will be reinlined for each (indirect) caller.
In order to address this performance issue, we cache the argument indices which
need to be constant, and attach that information to the Graph object.
This might require some clarification:
a) Caching in a passed map would not work, as duplication of constant
propagation for each top-level caller is still prohibitively expensive.
b) Caching in a global object would not work, as graphs are created and
destroyed during transformations.
c) Caching this meta-information on a `Graph` object has the added benefit that
we no longer perform the same constant propagation many times (a lot of
compilation passes call BackwardsConstAnalysis, and previously all this work
had to be repeated).
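The caching scheme can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the `Graph` struct, the `arg_feeds_const_input` field, and the helpers `ComputeConstArgIndices()`, `ConstArgIndices()` and `AnalysisRunCount()` are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature graph. One flag per function argument records
// whether that argument must be a compile-time constant.
struct Graph {
  std::vector<bool> arg_feeds_const_input;  // stand-in for real graph structure

  // The cache lives and dies with the graph, so it is never stale across
  // graphs that are created and destroyed during transformations. `mutable`
  // lets an analysis holding only a const Graph& record its result.
  mutable bool has_const_arg_cache = false;
  mutable std::vector<bool> const_arg_cache;
};

// Counts how often the expensive analysis runs (for illustration only).
inline int& AnalysisRunCount() {
  static int count = 0;
  return count;
}

// Stand-in for the real backwards walk over the graph.
std::vector<bool> ComputeConstArgIndices(const Graph& graph) {
  ++AnalysisRunCount();
  return graph.arg_feeds_const_input;
}

// Returns the cached result, computing it at most once per graph.
const std::vector<bool>& ConstArgIndices(const Graph& graph) {
  if (!graph.has_const_arg_cache) {
    graph.const_arg_cache = ComputeConstArgIndices(graph);
    graph.has_const_arg_cache = true;
  }
  return graph.const_arg_cache;
}
```

Because the result is stored on the graph itself, every pass that asks for it after the first gets the cached copy, regardless of which top-level caller triggered the original computation.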
PiperOrigin-RevId: 303860413
Change-Id: I78f92ca1487fc952044e5ac6526dcaa5b50d5f21
This enables testing whether a node calls a function by checking the NodeClass enum, rather than looking up the type string in a FunctionLibraryDefinition map.
We call IsFunctionCall() for every node in several graph rewrites (including the control-flow lowering pass), so this should reduce the cost of rewrites on large graphs.
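As an illustration of the idea, the sketch below classifies a node once at creation time so that later checks are a plain enum comparison. The enum values, the `Node` struct, and `GetNodeClassForOp()` / `MakeNode()` are assumptions for this example, and `std::unordered_map` stands in for whatever table type the real code uses.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical subset of node classes, assumed for this sketch.
enum NodeClass { NC_OTHER, NC_PARTITIONED_CALL, NC_FUNCTION_OP };

// Classifies an op type once, when the node is created. The string lookup
// happens here and nowhere else.
NodeClass GetNodeClassForOp(const std::string& op, bool is_library_function) {
  static const std::unordered_map<std::string, NodeClass> kNodeClassTable = {
      {"PartitionedCall", NC_PARTITIONED_CALL},
      {"StatefulPartitionedCall", NC_PARTITIONED_CALL},
  };
  auto it = kNodeClassTable.find(op);
  if (it != kNodeClassTable.end()) return it->second;
  return is_library_function ? NC_FUNCTION_OP : NC_OTHER;
}

struct Node {
  std::string type_string;
  NodeClass node_class;

  // Cheap check: two integer comparisons, no hashing or map lookup.
  bool IsFunctionCall() const {
    return node_class == NC_PARTITIONED_CALL || node_class == NC_FUNCTION_OP;
  }
};

Node MakeNode(const std::string& op, bool is_library_function) {
  return Node{op, GetNodeClassForOp(op, is_library_function)};
}
```

The cost of the string lookup is paid once per node rather than once per query, which matters when a rewrite calls `IsFunctionCall()` on every node of a large graph.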
PiperOrigin-RevId: 272940579
In the Status-returning GetNodeAttr(), constructing an `errors::NotFound()` when the attr is not present involves expensive string concatenation.
Additionally, change GetNodeAttr() to GetNodeAttrString() on hot codepaths (e.g. `Executor::PropagateOutputs()`) to avoid copying a string on each call, and add overloads of GetNodeAttrSimple() that enable accessing const-pointers to non-POD types in the AttrValue proto without copying them.
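To show why the two styles differ in cost, here is a minimal sketch. The `Status` struct, the `AttrMap` alias, and both function signatures are simplified assumptions for illustration; they do not reproduce the real TensorFlow signatures.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-ins for the real Status and attr storage.
struct Status {
  bool ok;
  std::string message;
};
using AttrMap = std::map<std::string, std::string>;

// Status-returning lookup: a miss builds an error message by string
// concatenation, and a hit copies the value into *value.
Status GetNodeAttr(const AttrMap& attrs, const std::string& name,
                   std::string* value) {
  auto it = attrs.find(name);
  if (it == attrs.end()) {
    return {false, "No attr named '" + name + "' in NodeDef"};
  }
  *value = it->second;  // copies the string
  return {true, ""};
}

// Bool-returning lookup (hypothetical signature): a miss allocates nothing,
// and a hit hands back a const pointer to the stored value without copying.
bool GetNodeAttrSimple(const AttrMap& attrs, const std::string& name,
                       const std::string** value) {
  auto it = attrs.find(name);
  if (it == attrs.end()) return false;
  *value = &it->second;
  return true;
}
```

Callers that only need to know whether an attr exists, or that can read the value in place, avoid both the error-message construction and the copy.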
PiperOrigin-RevId: 261141528
This method currently copies the given `const NodeDef&` proto into the returned `Node`. In many cases, the argument can be moved into the call, and we can elide a potentially large copy.
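One common way to allow this is to take the proto by value and move it into the node, sketched below. `NodeDef` and `Node` here are plain structs standing in for the real proto and class, and this free-function `AddNode()` is a hypothetical simplification of the real method.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Plain-struct stand-ins for the NodeDef proto and the Node class.
struct NodeDef {
  std::string name;
  std::vector<std::string> inputs;
};
struct Node {
  NodeDef def;
};

// Taking the argument by value lets the caller choose: passing an lvalue
// copies it, passing an rvalue (e.g. via std::move) moves it, and the
// function body moves it into the Node either way.
Node AddNode(NodeDef node_def) {
  return Node{std::move(node_def)};
}
```

A caller that no longer needs its `NodeDef` writes `AddNode(std::move(def))` and the potentially large payload is transferred rather than duplicated; a caller that still needs it passes `def` directly and keeps its copy.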
PiperOrigin-RevId: 254008134
As functions become more prominent, we have been directly checking
for op types in many places. This is slow and buggy because not all
places were updated for Device* versions of _Arg and _Retval.
PiperOrigin-RevId: 238044205
Also, make eager runtime always emit PartitionedCall and remove
special handling of xla compilation.
Because this change makes XLA look inside PartitionedCalls, it also
had to update or disable some tests that include PartitionedCalls with
uncompilable ops inside.
PiperOrigin-RevId: 237486703
This is a conservative approach to guarantee that any side-effects of the op are carried out.
This CL also reverts a previous (incomplete) solution to the same problem.
PiperOrigin-RevId: 233663631
This is in preparation for changing while_v2 to rewrite the forward
pass to output intermediates needed by the gradient, instead of
outputting all intermediates. Since While ops always have the same
input and output types, we need to be able to add inputs in addition
to adding outputs.
PiperOrigin-RevId: 223812986
Prior to this change, the If lowering pass would always use the first
output of the predicate op as the predicate. This change makes it use
the correct output. In addition, this adds more OutputTensor plumbing
which required making the node field non-const.
PiperOrigin-RevId: 221308751
FakeParam claims to have a different output shape than it actually
outputs (since the output is not meant to ever be accessed). Prior to
this change, ConstantFold() would call ReplaceTensorWithConstant() with
the invalid FakeParam output, which would cause a
use-before-initialization error.
PiperOrigin-RevId: 217929903
Doesn't attempt to deal with cases where we might have already generated
the functiondef for the parent function as in that case we cannot easily
modify the forward pass.
PiperOrigin-RevId: 216243224
where:
- E = number of edges in the graph
- e = number of edges on the node of interest
e is necessarily <= E and is typically really small
(# of inputs to an operation + control edges)
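The difference between the two bounds can be illustrated with a small sketch. The `Graph` and `Edge` types and both lookup functions below are hypothetical; they only show a scan over all E edges versus a scan over the e edges attached to one node.

```cpp
#include <cassert>
#include <vector>

struct Edge {
  int src;
  int dst;
  int dst_input;
};

// Hypothetical miniature graph with a per-node index of incoming edges.
struct Graph {
  std::vector<Edge> edges;                  // all E edges
  std::vector<std::vector<int>> in_edges;   // per node: indices into `edges`

  int AddNode() {
    in_edges.emplace_back();
    return static_cast<int>(in_edges.size()) - 1;
  }
  void AddEdge(int src, int dst, int dst_input) {
    in_edges[dst].push_back(static_cast<int>(edges.size()));
    edges.push_back(Edge{src, dst, dst_input});
  }
};

// O(E): walks every edge in the graph. Returns the edge index or -1.
int FindInputEdgeScanAll(const Graph& g, int dst, int dst_input) {
  for (int i = 0; i < static_cast<int>(g.edges.size()); ++i) {
    if (g.edges[i].dst == dst && g.edges[i].dst_input == dst_input) return i;
  }
  return -1;
}

// O(e): walks only the edges attached to the node of interest.
int FindInputEdgeScanNode(const Graph& g, int dst, int dst_input) {
  for (int i : g.in_edges[dst]) {
    if (g.edges[i].dst_input == dst_input) return i;
  }
  return -1;
}
```

Both functions return the same answer; the second simply never touches edges that belong to other nodes, which is where the saving on large graphs comes from.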
PiperOrigin-RevId: 210624296
Complete just enough of the core implementation to run
multi-device collectives locally within a single process.
Interfaces are still private and not available for general use.
PiperOrigin-RevId: 197617132
This includes changes to Executor that (1) set scope_id on nodes that are
decorated with _scoped_allocator attribute, (2) mark such nodes to never
forward input.
PiperOrigin-RevId: 194807086
Summary of changes:
1. Set MarkForCompilationPassFlags::tf_xla_cpu_global_jit default to true in
C_API unit test env when XLA-execute is intended. Together with setting session
config config.graph_options.optimizer_options.global_jit_level to > 0, this
turns on XLA for the entire graph (eligible nodes only, with _Arg and _Retval
nodes excluded).
We decided against defaulting MarkForCompilationPassFlags::tf_xla_cpu_global_jit
to true, due to performance concerns with the single-threaded nature of the XLA
CPU backend (see
https://www.tensorflow.org/performance/xla/jit#turning_on_jit_compilation).
2. In FindCompilationCandidates() during MarkForCompilationPass, skip compiling
any '_Arg'-typed nodes. This is necessary to avoid hitting an "Invalid argument
number" error during MarkForCompilationPass.
3. Extended C API based build rules to link in XLA libraries, and added unit
test "CAPI.Session_Min_XLA_CPU".
Also added some misc improvements and debugging aids.
PiperOrigin-RevId: 185193314