STT-tensorflow/tensorflow/python/distribute/cluster_resolver
Rick Chao 61c2e69663 Make task_type and task_id standard properties in tf.distribute cluster resolvers.
PiperOrigin-RevId: 317736970
Change-Id: Ia9c76462afc4c2fcc42a149960b50b2cbcafd482
2020-06-22 14:48:14 -07:00
tpu Add a TPUClusterResolver.connect API to simplify TPU initialization. 2020-06-20 00:09:59 -07:00
__init__.py Update API docs of ClusterResolver and all its implementations. 2019-07-21 19:40:32 -07:00
BUILD Move TPUClusterResolver into tpu subdirectory. 2020-05-13 14:59:47 -07:00
cluster_resolver_test.py Add get_tpu_system_metadata API to TPUClusterResolver. Also export tf.tpu.experimental.TPUSystemMetadata and tf.tpu.experimental.Topology symbols. 2020-03-18 19:48:04 -07:00
cluster_resolver.py Make task_type and task_id standard properties in tf.distribute cluster resolvers. 2020-06-22 14:48:14 -07:00
gce_cluster_resolver_test.py Make task_type and task_id standard properties in tf.distribute cluster resolvers. 2020-06-22 14:48:14 -07:00
gce_cluster_resolver.py Docstring fixes for cluster resolvers. 2020-06-08 23:10:17 -07:00
kubernetes_cluster_resolver_test.py Add get_tpu_system_metadata API to TPUClusterResolver. Also export tf.tpu.experimental.TPUSystemMetadata and tf.tpu.experimental.Topology symbols. 2020-03-18 19:48:04 -07:00
kubernetes_cluster_resolver.py Docstring fixes for cluster resolvers. 2020-06-08 23:10:17 -07:00
README_Slurm.md Merge pull request #38355 from Flamefire:slurm_cluster_resolver_docu 2020-04-09 23:49:21 -07:00
README.md Moves ClusterResolvers into tensorflow.python.distribute in preparation for TensorFlow 2.0 2018-11-29 13:41:18 -08:00
slurm_cluster_resolver_test.py Add get_tpu_system_metadata API to TPUClusterResolver. Also export tf.tpu.experimental.TPUSystemMetadata and tf.tpu.experimental.Topology symbols. 2020-03-18 19:48:04 -07:00
slurm_cluster_resolver.py fix some linter errors for slurm_cluster_resolver. 2020-05-29 16:56:51 -07:00
tfconfig_cluster_resolver_test.py Add get_tpu_system_metadata API to TPUClusterResolver. Also export tf.tpu.experimental.TPUSystemMetadata and tf.tpu.experimental.Topology symbols. 2020-03-18 19:48:04 -07:00
tfconfig_cluster_resolver.py Docstring fixes for cluster resolvers. 2020-06-08 23:10:17 -07:00
tpu_cluster_resolver.py Move TPUClusterResolver into tpu subdirectory. 2020-05-13 14:59:47 -07:00

Cluster Resolvers

Cluster Resolvers are a new way of specifying cluster information for distributed execution. Built on top of the existing ClusterSpec framework, Cluster Resolvers let users specify only a configuration and a cluster management service. A ClusterResolver then automatically fetches the relevant information from that service and populates the ClusterSpecs.
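The idea can be sketched without TensorFlow installed. The class below is a hypothetical stand-in, not the library's implementation: `SketchTFConfigResolver` and its `environ` parameter are names invented for this illustration. It assumes the cluster description arrives as JSON in the `TF_CONFIG` environment variable and exposes the cluster layout together with `task_type` and `task_id` properties, roughly mirroring what a resolver provides.

```python
import json
import os


class SketchTFConfigResolver:
    """Hypothetical stand-in for a cluster resolver; not TensorFlow's class."""

    def __init__(self, environ=None):
        # Default to the real process environment; callers may pass a dict instead.
        self._environ = os.environ if environ is None else environ

    def _load(self):
        # TF_CONFIG holds a JSON string; a missing variable is treated as empty.
        return json.loads(self._environ.get("TF_CONFIG", "{}"))

    def cluster_spec(self):
        # Map each job name (e.g. "worker", "ps") to its "host:port" addresses.
        cluster = self._load().get("cluster", {})
        return {job: list(addresses) for job, addresses in cluster.items()}

    @property
    def task_type(self):
        # Job name of the current process, or None if not configured.
        return self._load().get("task", {}).get("type")

    @property
    def task_id(self):
        # Index of the current process within its job, or None if not configured.
        return self._load().get("task", {}).get("index")


# Example configuration (host names are made up for illustration).
example_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {
            "worker": ["host1:2222", "host2:2222"],
            "ps": ["host3:2222"],
        },
        "task": {"type": "worker", "index": 1},
    })
}

resolver = SketchTFConfigResolver(example_env)
print(resolver.cluster_spec())
print(resolver.task_type, resolver.task_id)
```

The sketch re-reads the configuration on every call, so a change in the environment is picked up the next time the cluster layout is requested. A production resolver would instead query its cluster management service and return a real ClusterSpec object.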

ClusterResolvers are designed to work well with MonitoredTrainingSession and ClusterSpec propagation so that distributed training sessions remain robust in the face of node and network failures.