STT-tensorflow/tensorflow/core/protobuf/tpu
Tayo Oguntebi 6983bacea1 Enables per-host dummy args for TPUExecute (TF1) and adds XLA options.
Enabling this logic removes cross-worker send/recv dependencies required for TPUExecuteOp nodes to access a model's variables. This decreases overhead at the start of a training loop.

The approach replaces remote variable reads with zero tensors on every worker except the primary worker; the zero tensors feed the TPUExecute nodes local to that worker. For large distributed systems with large variables, this removes the need for the initial Send/Recv variable broadcast, which can be expensive.
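The idea can be illustrated with a minimal sketch (not the actual TensorFlow implementation): each non-primary worker substitutes zero-filled dummy tensors matching each variable's shape and dtype, so no cross-worker transfer of the real values is needed. The function name `args_for_worker` is hypothetical.

```python
import numpy as np

def args_for_worker(variables, worker_id, primary_id=0):
    """Sketch of the per-host dummy-args idea: the primary worker
    reads the real variables; every other worker feeds its local
    TPUExecute nodes zero tensors of matching shape and dtype,
    avoiding a cross-worker Send/Recv of the variable values."""
    if worker_id == primary_id:
        return variables  # real values, read locally on the primary host
    return [np.zeros_like(v) for v in variables]  # dummy zero tensors

variables = [np.ones((2, 3), dtype=np.float32),
             np.arange(4, dtype=np.int32)]
primary_args = args_for_worker(variables, worker_id=0)
dummy_args = args_for_worker(variables, worker_id=1)
```

The dummy tensors are shape- and dtype-compatible stand-ins, so compilation and graph construction on the non-primary hosts proceed unchanged.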

PiperOrigin-RevId: 351904109
Change-Id: I9f1ed63c2401f227646010a94a70c04f1c96cb7e
2021-01-14 17:03:51 -08:00
BUILD - BUILD file cleanup (2020-09-30 16:03:47 -07:00)
compilation_result.proto - Add error payload in status. (2021-01-07 00:09:29 -08:00)
compile_metadata.proto - Enables per-host dummy args for TPUExecute (TF1) and adds XLA options. (2021-01-14 17:03:51 -08:00)
dynamic_padding.proto
optimization_parameters.proto - Add a set of dynamic embedding optimizers directly taking an HloModule. (2020-09-29 18:51:01 -07:00)
topology.proto
tpu_embedding_configuration.proto - API changes to allow TPU embedding table sizes that do not fit in 32 bits (2021-01-12 16:19:20 -08:00)
tpu_embedding_output_layout.proto - Deprecated and removed uses of the TPUEmbeddingOutputLayout proto and the output_layout field in TPUEmbeddingConfiguration. (2020-09-30 14:37:01 -07:00)