The existing tf_mlir_enable_mlir_bridge flag allows models to
selectively enable or disable the MLIR bridge via TF_XLA_FLAGS. If the
flag is not set, it defaults to false.
In order to roll out the mlir_bridge slowly and safely, we will
need to distinguish between unspecified and forcibly disabled.
If the flag is unspecified, we can selectively choose when the
bridge is enabled. This will allow us to slowly ramp up the
number of models that use the new bridge.
This patch continues to support the existing TF_XLA_FLAGS
interface (tf_mlir_enable_mlir_bridge can be set to true or false),
but internally TensorFlow can now distinguish between false
(forcibly disabled) and unset (unspecified).
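The unset/true/false distinction described above amounts to a tri-state value rather than a plain boolean. A minimal sketch of that pattern in Python (the enum, function names, and flag-parsing details are illustrative, not TensorFlow's actual internals):

```python
import enum
import os


class MlirBridgeRollout(enum.Enum):
    """Tri-state rollout policy for the MLIR bridge."""
    UNSPECIFIED = 0   # user set nothing; rollout logic may opt models in
    ENABLED = 1       # user forcibly enabled the bridge
    DISABLED = 2      # user forcibly disabled the bridge


def parse_bridge_flag(env=os.environ):
    """Maps the boolean-looking flag to the tri-state policy."""
    raw = env.get("TF_XLA_FLAGS", "")
    if "--tf_mlir_enable_mlir_bridge=true" in raw:
        return MlirBridgeRollout.ENABLED
    if "--tf_mlir_enable_mlir_bridge=false" in raw:
        return MlirBridgeRollout.DISABLED
    return MlirBridgeRollout.UNSPECIFIED


def bridge_is_enabled(policy, model_opted_in):
    """Only UNSPECIFIED defers to the gradual-rollout decision."""
    if policy is MlirBridgeRollout.ENABLED:
        return True
    if policy is MlirBridgeRollout.DISABLED:
        return False
    return model_opted_in
```

With a plain boolean, `False` and "not set" are indistinguishable, which is exactly what blocks the gradual ramp-up; the enum makes the third state explicit.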
PiperOrigin-RevId: 337523318
Change-Id: I8ebb49da104663e12e5c1fa6399a1bf79239a44f
The threshold on tensor size was applied only after the value was computed,
when replacing the old nodes. Computing the value, however, could already
have caused OOM in large models.
Changed compilation to XLA to limit TF constant folding to results of at
most 1024 bytes, since folding is only used for getting the shapes, and XLA
internally performs its own constant folding anyway.
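The key point above is that the size check must happen before materializing the value, using only the shape. A small sketch of that check (the function name and the fixed 1024-byte cap mirror the description above; everything else is hypothetical):

```python
# Illustrative sketch of a size-capped constant-folding predicate.
MAX_CONSTANT_FOLD_BYTES = 1024


def should_constant_fold(shape, dtype_size_bytes):
    """Decides from the shape alone whether folding is allowed.

    Checking the projected result size *before* computing the value
    avoids the OOM that a post-hoc threshold could not prevent.
    """
    num_elements = 1
    for dim in shape:
        if dim < 0:  # unknown dimension: be conservative, don't fold
            return False
        num_elements *= dim
    return num_elements * dtype_size_bytes <= MAX_CONSTANT_FOLD_BYTES
```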
PiperOrigin-RevId: 337226951
Change-Id: Ib7ebb91950e379cac6978027a7162438eb0a58d2
The threshold on tensor size was applied only after the value was computed,
when replacing the old nodes. Computing the value, however, could already
have caused OOM in large models.
Changed compilation to XLA to limit TF constant folding to results of at
most 1024 bytes, since folding is only used for getting the shapes, and XLA
internally performs its own constant folding anyway.
PiperOrigin-RevId: 337221696
Change-Id: I4cdca20d28141f34b2c85120298bffb89e6df85d
This is in preparation of updating graph pruning to always prune imported function graphs.
PiperOrigin-RevId: 335944889
Change-Id: I3f6156aa08384883eee6227210f8fc8f1b7cc575
The existing tf_mlir_enable_mlir_bridge flag allows models to
selectively enable or disable the MLIR bridge via TF_XLA_FLAGS. If the
flag is not set, it defaults to false.
In order to roll out the mlir_bridge slowly and safely, we will
need to distinguish between unspecified and forcibly disabled.
If the flag is unspecified, we can selectively choose when the
bridge is enabled. This will allow us to slowly ramp up the
number of models that use the new bridge.
This patch continues to support the existing TF_XLA_FLAGS
interface (tf_mlir_enable_mlir_bridge can be set to true or false),
but internally TensorFlow can now distinguish between false
(forcibly disabled) and unset (unspecified).
PiperOrigin-RevId: 335662030
Change-Id: Iefc44436620e52ff21a72583d57ebf29124a2691
This will allow for std::vector<XlaArgument> and llvm::SmallVector arg parameters in CompileGraphToXlaHlo to be used under different builds.
PiperOrigin-RevId: 329757301
Change-Id: I1025f3106af21b2672e2157c3f5b80af07ef0d0f
Explains the option of enabling soft_device_placement when an unsupported
op is encountered by XlaCompiler.
PiperOrigin-RevId: 328636713
Change-Id: I6913818d640902afe0695d05131534a064d3fb61
In some rare cases, a function may be compiled multiple times, and a key with the same data may therefore be inserted multiple times. When this happens, it should not result in an error.
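The behavior described above is an idempotent insert: re-inserting identical data is tolerated, while a genuine conflict still fails loudly. A sketch of that policy (the function name and dict-backed cache are hypothetical stand-ins for the compilation cache):

```python
def insert_or_verify(cache, key, value):
    """Idempotent cache insert.

    Re-inserting the same key with the same data is a no-op, matching
    the case where a function is compiled more than once. Raising only
    on *different* data keeps real bugs detectable.
    """
    existing = cache.get(key)
    if existing is None:
        cache[key] = value
        return value
    if existing != value:
        raise ValueError(f"conflicting cache entries for key {key!r}")
    return existing
```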
PiperOrigin-RevId: 327653747
Change-Id: Ibb5f98e0916721bc50b67241b7fb947472398ff1
- Replacing SetDynamicBinding with SetDimensionSize models the information directly in the IR, which makes problems easier to reproduce just by looking at the HLO graph.
- This is one of the last few places that use SetDynamicBinding; after the cleanup, we should be able to remove this old API.
PiperOrigin-RevId: 327057424
Change-Id: I7fbadef18a9cd076c12fc61a53310311498416a0
This is needed for invoking the MLIR tf2xla bridge from xla_compiler.
This CL breaks apart items from xla_compiler into individual build targets,
which are then depended on from the MLIR TF bridge.
PiperOrigin-RevId: 323640340
Change-Id: I78b972503db9e7b5254014ca7e889005490d8339
Previously, the XLA parameter number was incorrectly assumed to
correspond to the index in the vector of `XlaCompiler::Argument`.
This is not correct, since not all `XlaCompiler::Argument`s become arguments to
the compiler: notably, constants and uninitialized resource variables do not.
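Since constants and uninitialized resource variables are skipped, the parameter number must be derived by counting only the arguments that actually become compiler parameters. A hypothetical sketch of that mapping (the kind strings stand in for `XlaCompiler::Argument` kinds):

```python
# Stand-ins for argument kinds; only KIND_PARAMETER becomes an XLA parameter.
KIND_PARAMETER = "parameter"
KIND_CONSTANT = "constant"
KIND_UNINITIALIZED_RESOURCE = "uninitialized_resource"


def argument_to_parameter_index(argument_kinds):
    """Returns a map from argument index to XLA parameter number.

    Arguments that do not become compiler parameters (constants,
    uninitialized resources) simply have no entry in the map.
    """
    mapping = {}
    next_param = 0
    for arg_index, kind in enumerate(argument_kinds):
        if kind == KIND_PARAMETER:
            mapping[arg_index] = next_param
            next_param += 1
    return mapping
```

Assuming argument index equals parameter number, as the old code did, goes wrong as soon as a constant or uninitialized resource precedes a real parameter in the vector.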
PiperOrigin-RevId: 321709603
Change-Id: I730fd6385949c360b2b831318a5b59c08f8362ef
The code surrounding the handling of _noinline functions is very rarely hit,
and as a result is not well tested. For now, the better approach is to follow
a more well-lit codepath and try to minimize the use of _noinline functions.
As a starting point, inline functions marked _noinline when they appear
inside force-compiled blocks.
PiperOrigin-RevId: 313280139
Change-Id: I9f2d9b95d4bfe15eb2acea2a3d101b82355c14d5
The code surrounding the handling of _noinline functions is very rarely hit,
and as a result is not well tested. For now, the better approach is to follow
a more well-lit codepath and try to minimize the use of _noinline functions.
As a starting point, inline functions marked _noinline when they appear
inside force-compiled blocks.
PiperOrigin-RevId: 313256383
Change-Id: If2f60aac933ac8e27f3dcb65bf6b389611c45bd7
This change modifies these includes to point to
"tensorflow/core/common_runtime/graph_constructor.h" instead. This change will enable us to remove the accidental dependency from //tensorflow/core/graph to //tensorflow/core/common_runtime.
PiperOrigin-RevId: 309035649
Change-Id: I2af0fdd6a6ccc4ae8d351a9117a69b6fc80c22e9
Sharding is present with model parallelism. Depending on what type of sharding is present, argument/result shapes and layouts need to be updated. ShapeRepresentationFn and shardings are used to determine the new shapes and layouts.
PiperOrigin-RevId: 303182568
Change-Id: I4185c1ae12de618b0b2ce9c07d2cd795c4e329b8
This call can be reused when determining argument layouts with sharding.
PiperOrigin-RevId: 302111008
Change-Id: I3607e41dc987e348e8405b96f09ebc549a8427bc
XlaCompilationCache is the only user of single-op compilation, so we can move single-op handling into the cache. This will allow MLIR-based on-demand compilation to reuse this logic in a follow-up change.
PiperOrigin-RevId: 300799049
Change-Id: I50d3f258e815cbc2caa6315eff0d902695146537
1. The previous approach might produce different layouts for computation.GetProgramShape() and xla_output_shape: it applied shape_representation_fn to xla_output_shape but not to the entry computation's program shape. Having these differ is often confusing, and can make it hard to reproduce a bug from an HLO dump that lacks the HloModuleConfig.
2. Output shapes were not updated with layout when there is sharding.
3. The updated value of a resource did not preserve the fast_mem annotation on the argument.
PiperOrigin-RevId: 295811071
Change-Id: I801a46d3039b2349dd0196cbc14ec3d9a8211d55
- When a resource update is present, automatically alias the input and output.
- Also fix an issue where the input/output proto config is overwritten.
PiperOrigin-RevId: 294984983
Change-Id: I45e96513dfeaa91f523db63837355b698bd2fb85
After parameter sharding, per-core arguments might have different layouts. In the XLA compiler we can no longer deduce the layout of a sharded parameter (because we can no longer access shape_representation_fn), so we override the XLA parameter layout with the sharded parameter layout.
In XlaDeviceContext, CopyCPUTensorToDevice() uses shape_representation_fn(cpu_tensor_shape) as the device tensor shape, so we must use the same shape as the XLA compiler's input shape. For CopyDeviceTensorToCPU(), the device tensor shape is defined by the XLA compiler directly, so nothing needs to change there.
PiperOrigin-RevId: 284812560
Change-Id: I567f180a8035ff71982d49910b84c98d07eb25d1
Constant folding inside `XlaCompiler::CompileGraph` is not necessary for
correctness, but is a performance optimization. This optimization is not
necessary though: the XLA compiler performs constant folding in any case (also
using HloEvaluator), and in some cases constant folding in the bridge leads to
severe performance issues (1.5+hrs compile time): the XLA constant evaluator
does not perform folding on "broadcast" and "iota" operations, which can be
overly expensive for the interpreter.
This change pushes calculations which previously went through the HLO
interpreter to the corresponding HLO backend.
Consequently, numeric stability changes due to some optimizations performed
by the backends (namely: fast-math optimizations on the CPU backend, the
A / B => A * (1 / B) rewrite, and "-nvptx-prec-divf32=1" on GPU).
Additionally, reduction numerics are different: float reductions on the
interpreter use a double accumulator, while float reductions in XLA use a
float accumulator, and on non-CPU backends the error grows linearly with
the size of the input.
PiperOrigin-RevId: 281554507
Change-Id: Ic58c547727fc0cb9a93bfc7eb2db763dc1e8b02e
We avoid recreating a ScopedStepContainer by storing one for reuse in
the KernelAndDeviceOp & KernelAndDeviceFunc classes. Further, we can
avoid doing a resource manager lookup during cleanup by adding a
dirty flag to indicate whether the ScopedStepContainer was accessed.
In addition, we simplify the signature of MakeResourceHandle by avoiding
the need to pass in the entire OpKernelContext object.
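The reuse-plus-dirty-flag pattern above can be sketched as follows (class and method names are illustrative stand-ins, not the actual TensorFlow types):

```python
class ReusableStepContainer:
    """Container created once and reused across kernel invocations.

    Cleanup (here, clearing the resource dict) runs only if the
    container was actually touched since the last clean, so untouched
    steps skip the expensive lookup-and-clear entirely.
    """

    def __init__(self):
        self._resources = {}
        self._dirty = False
        self.cleanups = 0  # instrumentation for this example only

    def lookup_or_create(self, name, factory):
        """Any access marks the container dirty."""
        self._dirty = True
        if name not in self._resources:
            self._resources[name] = factory()
        return self._resources[name]

    def clean(self):
        if not self._dirty:
            return  # untouched since last clean: nothing to do
        self._resources.clear()
        self._dirty = False
        self.cleanups += 1
```

The dirty flag trades one boolean check per step for a resource-manager cleanup that would otherwise run unconditionally.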
PiperOrigin-RevId: 281110991
Change-Id: I0a186583a1ff50b08bf68c18cfb99c912e05386d
Make canonicalization and signature generation faster.
Added benchmark for XlaCompilationCache::BuildSignature to measure time
taken to build a signature for the cache.
Base is this CL with just the changes to add the benchmark in
xla_compilation_cache_test.cc, New is this whole CL.
Run on desktop machine (40 X 2793 MHz CPUs); 2019-09-17T08:30:04.125894664-07:00
CPU: Intel Ivybridge with HyperThreading (20 cores) dL1:32KB dL2:256KB dL3:25MB
Benchmark Base (ns) New (ns) Improvement
----------------------------------------------------------------------------
BM_BuildSignature/0 226 87 +61.5%
BM_BuildSignature/1 337 171 +49.3%
BM_BuildSignature/2 504 259 +48.6%
BM_BuildSignature/5 1008 592 +41.3%
BM_BuildSignature/10 1751 1238 +29.3%
RELNOTES: n/a
PiperOrigin-RevId: 276289188
Change-Id: Ia47343203f6ac587a921a92f86c2428dd04db2a7
Also pass the ConfigProto through distributed function calls both in the standard
graph registration mode and in the new eager master setup.
The PFLR stores a std::optional<ConfigProto> instead of a pointer because it may be created with a pointer that would dangle after its creation. At the same time, we need to know whether a ConfigProto was available at creation time, which is why it is a std::optional rather than a plain copy. In contrast, the FLR gets a pointer directly because it is given a valid pointer that will outlast it in all cases.
PiperOrigin-RevId: 272763578
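The ownership distinction above (copy-into-optional for the long-lived PFLR, borrowed pointer for the shorter-lived FLR) can be sketched with Python stand-ins; class names and fields are illustrative, with a dict playing the role of the ConfigProto:

```python
class ProcessFLR:
    """Copies the config at construction, like std::optional<ConfigProto>.

    None means "no config was supplied", which stays distinguishable
    from an empty config, and the copy cannot dangle if the caller's
    object is destroyed later.
    """

    def __init__(self, config=None):
        self._config = dict(config) if config is not None else None

    def has_config(self):
        return self._config is not None

    def config(self):
        return self._config


class FLR:
    """Borrows a reference, like the raw ConfigProto* in the FLR.

    Valid only because the caller guarantees the config outlives this
    object; no copy is made.
    """

    def __init__(self, config_ref):
        self._config_ref = config_ref

    def config(self):
        return self._config_ref
```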