STT-tensorflow/tensorflow/compiler/xla/client
Tayo Oguntebi 6983bacea1 Enables per-host dummy args for TPUExecute (TF1) and adds XLA options.
Enabling this logic removes cross-worker send/recv dependencies required for TPUExecuteOp nodes to access a model's variables. This decreases overhead at the start of a training loop.

The approach replaces remote variable reads with zero tensors on every worker except the primary worker. The zero tensors feed the TPUExecute nodes that are local to that worker. For large distributed systems with large variables, this removes the need for the initial Send/Recv variable broadcast, which can be expensive.
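
The substitution described above can be illustrated with a toy model. This is a minimal sketch in plain Python, not TensorFlow or XLA code: `PRIMARY_WORKER`, `zeros`, `build_execute_args`, and the `per_host_dummy_args` flag name are hypothetical, and the sketch assumes all variables live on the primary worker.

```python
from typing import Dict, Tuple

PRIMARY_WORKER = 0  # assumption: worker 0 holds the model's variables


def zeros(shape: Tuple[int, ...]):
    """Build a nested list of 0.0 values with the given shape."""
    if not shape:
        return 0.0
    return [zeros(shape[1:]) for _ in range(shape[0])]


def build_execute_args(
    variables: Dict[str, Tuple[int, ...]],
    values: Dict[str, object],
    worker: int,
    per_host_dummy_args: bool,
):
    """Return (args, remote_reads) for the execute node on `worker`.

    `remote_reads` counts the variables that would need a cross-worker
    Send/Recv in this toy model.
    """
    args = {}
    remote_reads = 0
    for name, shape in variables.items():
        if worker == PRIMARY_WORKER:
            # The primary worker reads its own variables locally.
            args[name] = values[name]
        elif per_host_dummy_args:
            # Non-primary worker: feed a local zero tensor of the same shape
            # instead of reading the variable from the primary worker.
            args[name] = zeros(shape)
        else:
            # Without the option, each non-primary worker reads remotely.
            args[name] = values[name]
            remote_reads += 1
    return args, remote_reads
```

In this model, calling `build_execute_args` for a non-primary worker with the flag enabled yields zero-filled arguments and no remote reads, while the primary worker's arguments are unchanged.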

PiperOrigin-RevId: 351904109
Change-Id: I9f1ed63c2401f227646010a94a70c04f1c96cb7e
2021-01-14 17:03:51 -08:00
lib [XLA] Avoid reshape to R1 in NormalFloatingPointDistribution 2020-12-15 11:13:11 -08:00
BUILD Move CreateModuleConfig to a new hlo_module_util header. 2020-12-14 15:14:29 -08:00
client_library.cc - Change std::set<int> to absl::optional<std::set<int>> for allowed devices 2018-12-12 19:50:40 -08:00
client_library.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
client.cc xla directory resolutions 2020-07-26 22:14:33 +00:00
client.h Fix minor typos 2019-05-31 09:16:33 +09:00
compile_only_client.cc Integrate LLVM at https://github.com/llvm/llvm-project/commit/f0bab7875e78 2020-06-26 09:31:16 -07:00
compile_only_client.h Expose fusion configuration as part of HLO module's config and AOT compilation options. 2019-09-27 15:45:57 -07:00
executable_build_options.cc Enables per-host dummy args for TPUExecute (TF1) and adds XLA options. 2021-01-14 17:03:51 -08:00
executable_build_options.h Enables per-host dummy args for TPUExecute (TF1) and adds XLA options. 2021-01-14 17:03:51 -08:00
global_data.cc
global_data.h Allow multiple GlobalData objects to be released in one RPC swipe. 2018-10-18 19:02:47 -07:00
local_client.cc Remove platform field from shaped buffer. 2020-10-30 14:26:55 -07:00
local_client.h [XLA] Store host shape in ExecutionInput 2020-07-13 20:03:34 -07:00
padding_test.cc
padding.cc Fix 64-bit integer portability problems in TensorFlow compiler. 2020-01-16 13:16:05 -08:00
padding.h
sharding_builder.cc [XLA:SPMD] Minor fixes and utils for manual sharding 2020-12-01 13:37:36 -08:00
sharding_builder.h [XLA:SPMD] Minor fixes and utils for manual sharding 2020-12-01 13:37:36 -08:00
xla_builder_test.cc Speed up Shape creation by avoiding unnecessary validations. 2020-11-22 20:40:07 -08:00
xla_builder.cc [XLA:TPU] Implement 2D AllGather algorithm with use_global_device_ids = true. 2020-12-22 13:44:19 -08:00
xla_builder.h [XLA:TPU] Implement 2D AllGather algorithm with use_global_device_ids = true. 2020-12-22 13:44:19 -08:00
xla_computation.cc
xla_computation.h Take proto by value. 2020-05-20 23:40:37 -07:00