Commit Graph

202 Commits

Marissa Ikonomidis
0c4416e3c2 Update tf_mlir_enable_mlir_bridge to support unspecified
The existing tf_mlir_enable_mlir_bridge flag allows models to
selectively enable or disable the MLIR bridge via TF_XLA_FLAGS. If the
flag is not set, it defaults to false.

In order to slowly and safely roll out the mlir_bridge, we will
need to distinguish between unspecified and forcibly disabled.
If the flag is unspecified, we can selectively choose when the
bridge is enabled. This will allow us to slowly ramp up the
number of models that use the new bridge.

This patch continues to support the existing TF_XLA_FLAGS
interface (tf_mlir_enable_mlir_bridge can be set to true or false)
but internally, TensorFlow can now distinguish between false
(forcibly disabled) and unset (unspecified).

PiperOrigin-RevId: 337523318
Change-Id: I8ebb49da104663e12e5c1fa6399a1bf79239a44f
2020-10-16 09:55:12 -07:00
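The tri-state behavior described in the commit above maps naturally onto a small enum rather than a plain bool. A minimal sketch, assuming a hypothetical MlirBridgeRollout enum and ConvertBridgeFlag helper (not the actual TensorFlow identifiers):

```cpp
#include <optional>

// Hypothetical tri-state for the bridge flag; the real TensorFlow enum and
// helper names may differ.
enum class MlirBridgeRollout {
  kUnspecified,     // flag never set: TensorFlow may choose when to enable
  kEnabledByUser,   // --tf_mlir_enable_mlir_bridge=true
  kDisabledByUser,  // --tf_mlir_enable_mlir_bridge=false (forcibly disabled)
};

// Maps the optional boolean parsed from TF_XLA_FLAGS onto the tri-state value.
MlirBridgeRollout ConvertBridgeFlag(std::optional<bool> flag) {
  if (!flag.has_value()) return MlirBridgeRollout::kUnspecified;
  return *flag ? MlirBridgeRollout::kEnabledByUser
               : MlirBridgeRollout::kDisabledByUser;
}
```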
Yuanzhong Xu
ff2b597e36 Resubmit constant folding change without the 1024-byte limit. It was causing tf.where to fail in tf2xla.
PiperOrigin-RevId: 337236352
Change-Id: I44b8a99c0e74f2d4814933e05149e8eab5b04aaa
2020-10-14 21:51:24 -07:00
A. Unique TensorFlower
c201ae4531 Skip computing nodes with known oversized outputs in constant folding.
The threshold on tensor size was applied only after the value was computed, when replacing the old nodes. However, that could already have caused OOM in large models.

Changed compilation to XLA to limit TF constant folding to 1024 bytes, since constant folding there is only used for getting shapes, and XLA internally also has constant folding.

PiperOrigin-RevId: 337226951
Change-Id: Ib7ebb91950e379cac6978027a7162438eb0a58d2
2020-10-14 20:19:22 -07:00
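A sketch of the idea in the commit above: apply the size threshold before a node is evaluated rather than after. The helper and the 1024-byte cap below are illustrative assumptions, not the actual TensorFlow constant-folding code:

```cpp
#include <cstdint>

// Illustrative cap, mirroring the 1024-byte limit used for XLA compilation.
constexpr int64_t kMaxConstantFoldBytes = 1024;

// Decide up front whether constant folding should evaluate a node, given an
// output size estimated from inferred shapes and dtypes. Returning false means
// the (possibly huge) constant is never materialized, so it cannot cause OOM.
bool ShouldConstantFold(int64_t estimated_output_bytes) {
  return estimated_output_bytes >= 0 &&
         estimated_output_bytes <= kMaxConstantFoldBytes;
}
```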
Yuanzhong Xu
c577eb1a3d Skip computing nodes with known oversized outputs in constant folding.
The threshold on tensor size was applied only after the value was computed, when replacing the old nodes. However, that could already have caused OOM in large models.

Changed compilation to XLA to limit TF constant folding to 1024 bytes, since constant folding there is only used for getting shapes, and XLA internally also has constant folding.

PiperOrigin-RevId: 337221696
Change-Id: I4cdca20d28141f34b2c85120298bffb89e6df85d
2020-10-14 19:16:25 -07:00
Andy Ly
c0da1d4092 Update CompileGraphToXlaHlo to populate target/control ret nodes.
This is in preparation for updating graph pruning to always prune imported function graphs.

PiperOrigin-RevId: 335944889
Change-Id: I3f6156aa08384883eee6227210f8fc8f1b7cc575
2020-10-07 14:07:41 -07:00
A. Unique TensorFlower
c5d4acd09a Internal change
PiperOrigin-RevId: 335680049
Change-Id: I91e6edc767caf596d3cf1a28c075cc87388043e2
2020-10-06 12:14:02 -07:00
Marissa Ikonomidis
0eda09a3fb Update tf_mlir_enable_mlir_bridge to support unspecified
The existing tf_mlir_enable_mlir_bridge flag allows models to
selectively enable or disable the MLIR bridge via TF_XLA_FLAGS. If the
flag is not set, it defaults to false.

In order to slowly and safely roll out the mlir_bridge, we will
need to distinguish between unspecified and forcibly disabled.
If the flag is unspecified, we can selectively choose when the
bridge is enabled. This will allow us to slowly ramp up the
number of models that use the new bridge.

This patch continues to support the existing TF_XLA_FLAGS
interface (tf_mlir_enable_mlir_bridge can be set to true or false)
but internally, TensorFlow can now distinguish between false
(forcibly disabled) and unset (unspecified).

PiperOrigin-RevId: 335662030
Change-Id: Iefc44436620e52ff21a72583d57ebf29124a2691
2020-10-06 10:37:32 -07:00
Russell Power
5b5aab7f63 Internal change
PiperOrigin-RevId: 335147548
Change-Id: Ib445cfbcb28421b4eb522d4d9524e4a64fe631df
2020-10-02 20:33:42 -07:00
A. Unique TensorFlower
4eb05c3014 Use macro helpers for TPU builds and clean up define flags.
PiperOrigin-RevId: 334732331
Change-Id: Ice5d240cf785d64d11d4f634ff8955933da26b4d
2020-09-30 20:13:49 -07:00
Russell Power
45d693198d Use macro helpers for TPU builds and clean up define flags.
PiperOrigin-RevId: 334725778
Change-Id: Ib0c04366bd9e460329775075d82e7cfd47ed6d4e
2020-09-30 19:10:21 -07:00
Wenhao Jia
8f1362de18 Add target environment constraints to a subset of TensorFlow packages and targets.
PiperOrigin-RevId: 332884872
Change-Id: I65691fa2021c065e6c2ab57815d5a2b342d30ee2
2020-09-21 10:57:01 -07:00
Andy Ly
0651d1ac60 Update CompileGraphToXlaHlo to use llvm::ArrayRef<XlaArgument> instead of llvm::ArrayRef<const XlaArgument> (NFC).
This will allow std::vector<XlaArgument> and llvm::SmallVector arg parameters in CompileGraphToXlaHlo to be used under different builds.

PiperOrigin-RevId: 329757301
Change-Id: I1025f3106af21b2672e2157c3f5b80af07ef0d0f
2020-09-02 11:54:55 -07:00
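The distinction matters because llvm::ArrayRef<T> is already an immutable view of its elements, while ArrayRef<const T> has a different element type and will not bind implicitly to std::vector<T> or llvm::SmallVector<T>. A minimal sketch of the call-site effect, with a placeholder XlaArgument type standing in for the real one:

```cpp
#include <vector>

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"

struct XlaArgument {};  // placeholder for the real TensorFlow type

// ArrayRef<XlaArgument> already provides read-only access to the elements.
void CompileGraphToXlaHlo(llvm::ArrayRef<XlaArgument> args) {
  (void)args;  // elements cannot be modified through the ArrayRef
}

void Caller() {
  std::vector<XlaArgument> vec_args(4);
  llvm::SmallVector<XlaArgument, 4> small_args(4);
  // Both container types convert to ArrayRef<XlaArgument> implicitly,
  // regardless of which one a particular build happens to use.
  CompileGraphToXlaHlo(vec_args);
  CompileGraphToXlaHlo(small_args);
}
```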
Ken Franko
bbb8cbeeba Give more detail in error message for unsupported ops in XlaCompiler.
Explains the option of enabling soft_device_placement if an unsupported op
is encountered by XlaCompiler.

PiperOrigin-RevId: 328636713
Change-Id: I6913818d640902afe0695d05131534a064d3fb61
2020-08-26 17:00:28 -07:00
Ken Franko
216d406927 Don't return error when setting same device<->host transfer metadata if identical key/metadata.
In some rare cases, a function may be compiled multiple times and a key with the same data may be inserted multiple times. If this is the case, it should not result in an error.

PiperOrigin-RevId: 327653747
Change-Id: Ibb5f98e0916721bc50b67241b7fb947472398ff1
2020-08-20 10:53:19 -07:00
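The behavior above is the usual insert-or-verify pattern: re-inserting a key with identical metadata is accepted, and only a genuine conflict is reported. A minimal sketch under that assumption, with placeholder types rather than the actual XlaCompiler members:

```cpp
#include <map>
#include <string>

enum class InsertStatus { kOk, kConflict };

// Insert new metadata, tolerate an identical re-insert, reject a conflicting one.
InsertStatus SetTransferMetadata(std::map<std::string, std::string>& metadata_map,
                                 const std::string& key,
                                 const std::string& metadata) {
  auto [it, inserted] = metadata_map.emplace(key, metadata);
  if (inserted) return InsertStatus::kOk;                // first time this key is seen
  if (it->second == metadata) return InsertStatus::kOk;  // identical duplicate: not an error
  return InsertStatus::kConflict;                        // same key, different metadata
}
```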
Ken Franko
d179d2d42f Fix typo when checking existence of key in map in XlaCompiler::SetHostToDeviceMetadata.
PiperOrigin-RevId: 327492950
Change-Id: I0b8fcd3ff46e683639d99db0e12ad9e94d6b7414
2020-08-19 13:21:29 -07:00
Yunxing Dai
27da5d74dc Remove the use of SetDynamicBinding in tf2xla bridge.
- Replacing SetDynamicBinding with SetDimensionSize models the information in the IR, which makes problems easier to reproduce by just looking at the HLO graph.
- This is one of the last few places that use SetDynamicBinding; after the cleanup, we should be able to remove this old API.

PiperOrigin-RevId: 327057424
Change-Id: I7fbadef18a9cd076c12fc61a53310311498416a0
2020-08-17 11:20:28 -07:00
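Conceptually, the change makes the dynamic size an explicit operand in the IR instead of an out-of-band binding. A rough sketch with the XLA client builder; header paths and exact signatures are assumptions and may differ between TensorFlow versions:

```cpp
// Rough sketch only: include paths and signatures may differ by version.
#include "tensorflow/compiler/xla/client/xla_builder.h"
#include "tensorflow/compiler/xla/shape_util.h"

xla::XlaOp BuildWithDynamicDimension(xla::XlaBuilder* builder) {
  // A statically shaped f32[16] input and an s32 scalar carrying its real size.
  xla::XlaOp data = xla::Parameter(
      builder, /*parameter_number=*/0,
      xla::ShapeUtil::MakeShape(xla::F32, {16}), "data");
  xla::XlaOp size = xla::Parameter(
      builder, /*parameter_number=*/1,
      xla::ShapeUtil::MakeShape(xla::S32, {}), "size");
  // The dynamic size is now part of the HLO graph itself, so an HLO dump
  // alone reproduces the behavior, unlike the old SetDynamicBinding API.
  return xla::SetDimensionSize(data, size, /*dimension=*/0);
}
```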
TensorFlower Gardener
a647794f9f Merge pull request from tg-at-google:wsign-compare-semi-final-tf2xla
PiperOrigin-RevId: 324077068
Change-Id: Iefaa2f6e1641653d69abcbf572aa17f85ebfce7a
2020-07-30 14:19:02 -07:00
George Karpenkov
4aa666bff4 Rollback of rollback of enabling MLIR bridge for tf.function
PiperOrigin-RevId: 323912950
Change-Id: I596ed1e1e015bf36c07a11dbac083503b70f24e7
2020-07-29 18:39:33 -07:00
A. Unique TensorFlower
4353b9cd4d [TF2XLA] Enable using MLIR bridge when TF_XLA_FLAGS=--tf_mlir_enable_mlir_bridge is on for tf.function(compile=True)
PiperOrigin-RevId: 323707882
Change-Id: I34a513fad8a5119b8a68180fc7277ff80fc6a555
2020-07-28 20:15:42 -07:00
George Karpenkov
42a9b7f7ae [TF2XLA] Enable using MLIR bridge when TF_XLA_FLAGS=--tf_mlir_enable_mlir_bridge is on for tf.function(compile=True)
PiperOrigin-RevId: 323683301
Change-Id: Ib1cfaec1bd27c3bf691820c616cdca1721aabe25
2020-07-28 17:06:55 -07:00
George Karpenkov
bcfb60d0a1 [TF2XLA] [NFC] Break apart the [TF2XLA/MLIR] -> xla_compiler dependency edge
This is needed for invoking the MLIR tf2xla bridge from xla_compiler.

This CL breaks apart items from xla_compiler into individual build targets,
which the MLIR TF bridge then depends on.

PiperOrigin-RevId: 323640340
Change-Id: I78b972503db9e7b5254014ca7e889005490d8339
2020-07-28 13:36:06 -07:00
Taré Gaskin
1701747714 updates 2020-07-28 20:15:12 +00:00
Taré Gaskin
ad58928e65 tf2xla directory resolutions 2020-07-26 22:06:02 +00:00
Yunxing Dai
8f75c38677 Plumb TF node name into xla's argument's op metadata.
PiperOrigin-RevId: 322667361
Change-Id: Ifcd875d428ce92628fc13354be9d0b4829a65f67
2020-07-22 15:27:09 -07:00
George Karpenkov
b440bbb40f [TF/XLA] Fixup numbering of XLA parameters used for aliasing
Previously, the XLA argument parameter was incorrectly assumed to
correspond to the index in the vector of `XlaCompiler::Argument`.
This is not correct, since not all `XlaCompiler::Argument`s become arguments to
the compiler: notably, constants and uninitialized resource variables do not.

PiperOrigin-RevId: 321709603
Change-Id: I730fd6385949c360b2b831318a5b59c08f8362ef
2020-07-16 21:41:00 -07:00
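The fix boils down to mapping an index in the argument vector to an XLA parameter number by skipping entries that never become compiler parameters (constants and uninitialized resource variables). A small sketch of that mapping, using a simplified Argument type rather than the real XlaCompiler::Argument:

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for XlaCompiler::Argument.
struct Argument {
  enum class Kind { kParameter, kConstant, kUninitializedResource };
  Kind kind;
};

// Maps each argument index to its XLA parameter number, or -1 if the argument
// does not become an XLA parameter (constants, uninitialized resources).
std::vector<int64_t> ArgIndexToParameterNumber(const std::vector<Argument>& args) {
  std::vector<int64_t> mapping(args.size(), -1);
  int64_t next_param = 0;
  for (size_t i = 0; i < args.size(); ++i) {
    if (args[i].kind == Argument::Kind::kParameter) {
      mapping[i] = next_param++;
    }
  }
  return mapping;
}
```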
Tare Gaskin
55ee67e114 [-Wsign-compare] warning fixes batch 9 2020-07-07 01:37:54 +00:00
George Karpenkov
aa7ff6aa28 [TF2XLA] Set up aliasing for resource variables even when not returning a tuple
PiperOrigin-RevId: 317414582
Change-Id: I45cd1f314331cb86a0257e7b7cf9d0639be84e99
2020-06-19 18:20:41 -07:00
George Karpenkov
0dda89c61e [TF/XLA] Rollback of rollback of 313256383, with a UB fix.
PiperOrigin-RevId: 313319715
Change-Id: I4b73f95a228b3e6e4fed524492c9389a19629f02
2020-05-26 20:47:42 -07:00
A. Unique TensorFlower
53037dcd66 [TF/XLA] Ignore _noinline inside force-compiled clusters
The code surrounding the handling of _noinline functions is very rarely hit,
and as a result is not well tested.  For now, the better approach is to follow
a more well-lit codepath and try to minimize the use of _noinline functions.

As a starting point, inline functions even when they are marked _noinline inside
force-compiled clusters.

PiperOrigin-RevId: 313280139
Change-Id: I9f2d9b95d4bfe15eb2acea2a3d101b82355c14d5
2020-05-26 15:37:41 -07:00
George Karpenkov
0e4e0c593b [TF/XLA] Ignore _noinline inside force-compiled clusters
The code surrounding the handling of _noinline functions is very rarely hit,
and as a result is not well tested.  For now, the better approach is to follow
a more well-lit codepath and try to minimize the use of _noinline functions.

As a starting point, inline functions even when they are marked _noinline inside
force-compiled clusters.

PiperOrigin-RevId: 313256383
Change-Id: If2f60aac933ac8e27f3dcb65bf6b389611c45bd7
2020-05-26 13:29:22 -07:00
Derek Murray
000c8f09ea [Build cleanup] Update #includes of moved header "graph/graph_constructor.h".
This change modifies these includes to point to
"tensorflow/core/common_runtime/graph_constructor.h" instead. This change will enable us to remove the accidental dependency from //tensorflow/core/graph to //tensorflow/core/common_runtime.

PiperOrigin-RevId: 309035649
Change-Id: I2af0fdd6a6ccc4ae8d351a9117a69b6fc80c22e9
2020-04-29 09:20:48 -07:00
Eugene Zhulenev
55c4d9e49c [XLA] Force single-device inlined function body placement when optimizing compiled graphs
Leaving function body nodes unplaced might break downstream compilation for TPU models.

PiperOrigin-RevId: 307446242
Change-Id: I008ce74f05348aa66ab9446f0a9b8b6c2d97aef8
2020-04-20 12:00:20 -07:00
Andy Ly
79ca75b618 Add support for updating argument/result shapes and layouts with associated shardings of entry function.
Sharding is present with model parallelism. Depending on what type of sharding is present, argument/result shapes and layouts need to be updated. ShapeRepresentationFn and shardings are used to determine the new shapes and layouts.

PiperOrigin-RevId: 303182568
Change-Id: I4185c1ae12de618b0b2ce9c07d2cd795c4e329b8
2020-03-26 13:32:04 -07:00
Andy Ly
3947c77855 Expose RewriteLayoutWithShardedShape from XlaCompiler.
This call can be reused when determining argument layouts with sharding.

PiperOrigin-RevId: 302111008
Change-Id: I3607e41dc987e348e8405b96f09ebc549a8427bc
2020-03-20 15:36:13 -07:00
Smit Hinsu
499b528806 Move Graph-creation-from-NodeDef logic from XlaCompiler to XlaCompilationCache
XlaCompilationCache is the only user of single op compilation so we can move single op handling to the cache. This will allow MLIR based on demand compilation to reuse this logic in a follow-up change.

PiperOrigin-RevId: 300799049
Change-Id: I50d3f258e815cbc2caa6315eff0d902695146537
2020-03-13 11:57:30 -07:00
Yuanzhong Xu
ac2c05a1d5 [TF/XLA] Fix several layout issues.
1. The previous approach might have different layouts for computation.GetProgramShape() and xla_output_shape. It only used shape_representation_fn for xla_output_shape, but not for the entry computation's program shape. These being different is often confusing, and may make it hard to reproduce a bug from an HLO dump, which doesn't have HloModuleConfig.

2. Output shapes were not updated with layout when there is sharding.

3. The updated value of a resource did not preserve the fast_mem annotation on the argument.

PiperOrigin-RevId: 295811071
Change-Id: I801a46d3039b2349dd0196cbc14ec3d9a8211d55
2020-02-18 13:47:52 -08:00
Yunxing Dai
70d8aa322c Automatically set up user aliasing in tf2xla when a resource update is present.
- When a resource update is present, automatically alias the input and output.
- Also fix an issue where the input/output proto config is overwritten.

PiperOrigin-RevId: 294984983
Change-Id: I45e96513dfeaa91f523db63837355b698bd2fb85
2020-02-13 13:22:20 -08:00
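Input/output aliasing of this kind is expressed on the XLA builder by pointing an output index at the parameter that feeds it. A hedged sketch of the idea; the surrounding tf2xla bookkeeping and SetUpAlias's optional arguments are omitted and may differ by TensorFlow version:

```cpp
// Sketch only: include path and optional SetUpAlias arguments may differ.
#include "tensorflow/compiler/xla/client/xla_builder.h"

// For a resource variable that is read as parameter `param_number` and written
// as tuple output element `output_index`, alias the two buffers so XLA can
// update the variable in place instead of copying it.
void AliasUpdatedResource(xla::XlaBuilder* builder, int64_t output_index,
                          int64_t param_number) {
  builder->SetUpAlias(/*output_index=*/{output_index},
                      /*param_number=*/param_number,
                      /*param_index=*/{});
}
```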
Tong Shen
d7336a9186 Propagate sharded argument layouts through TF/XLA bridge.
After parameter sharding, each per-core argument might have a different layout. In the XLA compiler we can no longer deduce the layout for a sharded parameter (because we can no longer access shape_representation_fn), so we override the XLA parameter layout with the sharded parameter layout.

In XlaDeviceContext, CopyCPUTensorToDevice() uses shape_representation_fn(cpu_tensor_shape) as the device tensor shape, so we must use the same shape as the XLA compiler input shape. For CopyDeviceTensorToCPU(), the device tensor shape is defined by the XLA compiler directly, so we do not need to fix anything.

PiperOrigin-RevId: 284812560
Change-Id: I567f180a8035ff71982d49910b84c98d07eb25d1
2019-12-10 11:36:20 -08:00
TensorFlower Gardener
dc93e8445d Merge pull request from Agoniii:dev/ignore_label_for_xla
PiperOrigin-RevId: 283924423
Change-Id: I3cf795c9e490493822231625c8aee6a10d099e4f
2019-12-05 01:19:16 -08:00
Tong Shen
ecb57cfd7a Add debug information for _Arg nodes.
PiperOrigin-RevId: 283813323
Change-Id: I6696b29d6f4fb56af72fde91ee59d5ba6924e0a4
2019-12-04 12:54:57 -08:00
Agoniii
6a74e16e94 add label for xlaop 2019-11-28 15:01:34 +08:00
George Karpenkov
61b41cb08a [tf2xla] Do not resolve compile time constants from XlaCompiler::CompileGraph
Constant folding inside `XlaCompiler::CompileGraph` is not necessary for
correctness, but is a performance optimization. Even as an optimization it is
not needed: the XLA compiler performs constant folding in any case (also
using HloEvaluator), and in some cases constant folding in the bridge leads to
severe performance issues (1.5+ hr compile times): the XLA constant evaluator
does not perform folding on "broadcast" and "iota" operations, which can be
overly expensive for the interpreter.

This change pushes calculations which previously went through the HLO
interpreter to the corresponding HLO backend.
Consequently, numeric stability changes due to some optimizations performed by
the backends (namely: fast-math optimizations on the CPU backend, the
A / B => A * (1 / B) rewrite, and "-nvptx-prec-divf32=1" on GPU).
Additionally, reduction numerics are different: float reductions on the
interpreter use a double accumulator, while float reductions in XLA use a float
accumulator, and on non-CPU backends the error grows linearly with the size of
the input.

PiperOrigin-RevId: 281554507
Change-Id: Ic58c547727fc0cb9a93bfc7eb2db763dc1e8b02e
2019-11-20 13:32:04 -08:00
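The accumulator difference mentioned above is easy to see in isolation: summing many small floats into a float accumulator drifts, while a double accumulator stays close to the exact value. A standalone illustration, not TensorFlow code:

```cpp
#include <cstdio>

int main() {
  const int n = 10000000;
  const float x = 0.1f;

  float sum_f = 0.0f;  // float accumulator, as in XLA backend reductions
  double sum_d = 0.0;  // double accumulator, as in the HLO interpreter
  for (int i = 0; i < n; ++i) {
    sum_f += x;
    sum_d += x;
  }
  // The float accumulator ends up visibly far from the expected 1e6;
  // the double accumulator does not.
  std::printf("float accumulator:  %f\n", sum_f);
  std::printf("double accumulator: %f\n", sum_d);
  return 0;
}
```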
Gaurav Jain
309a3c7964 Avoid allocating ScopedStepContainer for each Run
We avoid recreating a ScopedStepContainer by storing one for reuse in
the KernelAndDeviceOp & KernelAndDeviceFunc classes. Further, we can
avoid a resource manager lookup when performing cleanup by adding a
dirty flag to indicate the ScopedStepContainer was accessed.

In addition, we simplify the signature of MakeResourceHandle by avoiding
the need to pass in the entire OpKernelContext object.

PiperOrigin-RevId: 281110991
Change-Id: I0a186583a1ff50b08bf68c18cfb99c912e05386d
2019-11-18 11:29:49 -08:00
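The dirty-flag idea is a generic pattern: keep one container around for reuse and only pay for cleanup when a run actually touched it. A minimal sketch of that pattern with placeholder types, not the real KernelAndDeviceOp/KernelAndDeviceFunc code:

```cpp
#include <functional>

// Placeholder for a per-step resource container that is reused across runs.
class ReusableStepContainer {
 public:
  // Marks the container dirty and hands it to the current run.
  ReusableStepContainer& Access() {
    dirty_ = true;
    return *this;
  }

  // Runs the cleanup only if some run actually accessed the container, so
  // untouched runs skip the resource-manager cleanup entirely.
  void CleanUpIfDirty(const std::function<void()>& clean_fn) {
    if (!dirty_) return;
    clean_fn();
    dirty_ = false;
  }

 private:
  bool dirty_ = false;
};
```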
Jeffrey A. Dean
ec030f72f3 Performance improvements to speed up invocation of XLA code, by making
canonicalization and signature generation faster

Added benchmark for XlaCompilationCache::BuildSignature to measure time
taken to build a signature for the cache.

Base is this CL with just the changes to add the benchmark in
xla_compilation_cache_test.cc, New is this whole CL.

Run on desktop machine (40 X 2793 MHz CPUs); 2019-09-17T08:30:04.125894664-07:00
CPU: Intel Ivybridge with HyperThreading (20 cores) dL1:32KB dL2:256KB dL3:25MB
Benchmark                                      Base (ns)    New (ns) Improvement
----------------------------------------------------------------------------
BM_BuildSignature/0                                  226          87    +61.5%
BM_BuildSignature/1                                  337         171    +49.3%
BM_BuildSignature/2                                  504         259    +48.6%
BM_BuildSignature/5                                 1008         592    +41.3%
BM_BuildSignature/10                                1751        1238    +29.3%

RELNOTES: n/a
PiperOrigin-RevId: 276289188
Change-Id: Ia47343203f6ac587a921a92f86c2428dd04db2a7
2019-10-23 09:39:27 -07:00
Eugene Brevdo
90f01af49a Pipe ConfigProto through FLR so that it can be accessed by Ops like PartitionedCallOp.
Also pass the ConfigProto through distributed function calls both in the standard
graph registration mode and in the new eager master setup.

The PFLR stores a std::optional<ConfigProto> instead of a pointer, because it may be created with a pointer that would dangle after its creation.  At the same time, we need to know if a ConfigProto was available at creation time, which is why it's a std::optional.  In contrast, the FLR gets a pointer directly because it is given a valid pointer that will outlast it in all cases.

PiperOrigin-RevId: 272763578
2019-10-03 16:20:55 -07:00
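The optional-versus-pointer distinction described above fits in a few lines: the process-level runtime copies the config (if one was supplied) so nothing can dangle, while the per-device runtime just borrows a pointer that is guaranteed to outlive it. A small sketch with a placeholder ConfigProto and simplified class shapes, not the actual TensorFlow definitions:

```cpp
#include <optional>

struct ConfigProto {};  // placeholder for the real tensorflow::ConfigProto

// Process-level runtime (PFLR-like): copies the config if present, and also
// records whether a config existed at creation time.
class ProcessRuntime {
 public:
  explicit ProcessRuntime(const ConfigProto* config)
      : config_(config ? std::optional<ConfigProto>(*config) : std::nullopt) {}
  bool has_config() const { return config_.has_value(); }

 private:
  std::optional<ConfigProto> config_;
};

// Per-device runtime (FLR-like): the caller guarantees the pointer outlives
// this object, so storing the raw pointer is sufficient.
class DeviceRuntime {
 public:
  explicit DeviceRuntime(const ConfigProto* config) : config_(config) {}
  const ConfigProto* config() const { return config_; }

 private:
  const ConfigProto* config_;
};
```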
Youlong Cheng
2932851e5d Annotate arg in FastMem for XLA compiler.
PiperOrigin-RevId: 272525033
2019-10-02 16:15:56 -07:00
Yunxing Dai
b6a97cecc5 [XLA][TF2XLA] Remove CF for Shape ops.
XLA does special shape value inference, so there is no need to do it in TF's constant folding.

PiperOrigin-RevId: 271218195
2019-09-25 16:18:56 -07:00
Gunhan Gulsoy
5267ea0caa Move error_codes.proto part 2.
Move the usages to protobuf/error_codes.proto

PiperOrigin-RevId: 270927284
2019-09-24 10:07:35 -07:00
A. Unique TensorFlower
92cf1204d2 Refactor some reused functionality.
PiperOrigin-RevId: 268115443
2019-09-09 18:00:40 -07:00
Tong Shen
25e5c91b97 In XlaCompiler, when creating the TUPLE root instruction, set its sharding according to its inputs' shardings.
This is required because HLO sharding propagation does not modify root instruction sharding.

PiperOrigin-RevId: 267735025
2019-09-06 23:08:47 -07:00