STT-tensorflow/tensorflow/core/protobuf/tpu
Tayo Oguntebi 6983bacea1 Enables per-host dummy args for TPUExecute (TF1) and adds XLA options.
Enabling this logic removes cross-worker send/recv dependencies required for TPUExecuteOp nodes to access a model's variables. This decreases overhead at the start of a training loop.

The approach replaces remote variable reads with zero tensors on every worker except the primary worker; the zero tensors feed the TPUExecute nodes local to that worker. For large distributed systems with large variables, this removes the need for the initial Send/Recv variable broadcast, which can be expensive.
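The idea can be illustrated with a minimal sketch (not the actual TensorFlow implementation): each non-primary worker substitutes zero-filled dummy tensors matching each variable's shape and dtype, so no cross-worker transfer of the real values is needed. The function name `args_for_worker` is hypothetical.

```python
import numpy as np

def args_for_worker(variables, worker_id, primary_id=0):
    """Sketch of the per-host dummy-args idea: the primary worker
    reads the real variables; every other worker feeds its local
    TPUExecute nodes zero tensors of matching shape and dtype,
    avoiding a cross-worker Send/Recv of the variable values."""
    if worker_id == primary_id:
        return variables  # real values, read locally on the primary host
    return [np.zeros_like(v) for v in variables]  # dummy zero tensors

variables = [np.ones((2, 3), dtype=np.float32),
             np.arange(4, dtype=np.int32)]
primary_args = args_for_worker(variables, worker_id=0)
dummy_args = args_for_worker(variables, worker_id=1)
```

The dummy tensors are shape- and dtype-compatible stand-ins, so compilation and graph construction on the non-primary hosts proceed unchanged.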

PiperOrigin-RevId: 351904109
Change-Id: I9f1ed63c2401f227646010a94a70c04f1c96cb7e
2021-01-14 17:03:51 -08:00
BUILD - BUILD file cleanup (2020-09-30 16:03:47 -07:00)
compilation_result.proto - Add error payload in status. (2021-01-07 00:09:29 -08:00)
compile_metadata.proto - Enables per-host dummy args for TPUExecute (TF1) and adds XLA options. (2021-01-14 17:03:51 -08:00)
dynamic_padding.proto
optimization_parameters.proto - Add a set of dynamic embedding optimizers directly taking an HloModule. (2020-09-29 18:51:01 -07:00)
topology.proto
tpu_embedding_configuration.proto - API changes to allow TPU embedding table sizes that do not fit in 32 bits (2021-01-12 16:19:20 -08:00)
tpu_embedding_output_layout.proto - Deprecated and removed uses of the TPUEmbeddingOutputLayout proto and the output_layout field in TPUEmbeddingConfiguration. (2020-09-30 14:37:01 -07:00)