STT-tensorflow/tensorflow/python/distribute/cluster_resolver
Rick Chao 61c2e69663 Make task_type and task_id standard properties in tf.distribute cluster resolvers.
PiperOrigin-RevId: 317736970
Change-Id: Ia9c76462afc4c2fcc42a149960b50b2cbcafd482
2020-06-22 14:48:14 -07:00
tpu Add a `TPUClusterResolver.connect` API to simplify TPU initialization. 2020-06-20 00:09:59 -07:00
BUILD Move TPUClusterResolver into tpu subdirectory. 2020-05-13 14:59:47 -07:00
README.md
README_Slurm.md
__init__.py
cluster_resolver.py Make task_type and task_id standard properties in tf.distribute cluster resolvers. 2020-06-22 14:48:14 -07:00
cluster_resolver_test.py
gce_cluster_resolver.py Docstring fixes for cluster resolvers. 2020-06-08 23:10:17 -07:00
gce_cluster_resolver_test.py Make task_type and task_id standard properties in tf.distribute cluster resolvers. 2020-06-22 14:48:14 -07:00
kubernetes_cluster_resolver.py Docstring fixes for cluster resolvers. 2020-06-08 23:10:17 -07:00
kubernetes_cluster_resolver_test.py
slurm_cluster_resolver.py fix some linter errors for slurm_cluster_resolver. 2020-05-29 16:56:51 -07:00
slurm_cluster_resolver_test.py
tfconfig_cluster_resolver.py Docstring fixes for cluster resolvers. 2020-06-08 23:10:17 -07:00
tfconfig_cluster_resolver_test.py
tpu_cluster_resolver.py Move TPUClusterResolver into tpu subdirectory. 2020-05-13 14:59:47 -07:00

README.md

Cluster Resolvers

Cluster Resolvers are a new way of specifying cluster information for distributed execution. Built on top of the existing ClusterSpec framework, they let users specify a configuration and a cluster management service; a ClusterResolver then automatically fetches the relevant information from that service and populates a ClusterSpec.

ClusterResolvers are designed to work well with ManagedTrainingSession and ClusterSpec propagation so that distributed training sessions remain robust in the face of node and network failures.
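
To make the idea concrete, here is a minimal, standard-library-only sketch of what a resolver does: it reads a configuration from some source and turns it into a mapping of job names to task addresses, along with the current task's type and id. The class name `EnvClusterResolver`, its constructor argument, and the example addresses are hypothetical and are not part of TensorFlow; the only thing borrowed from TensorFlow is the JSON layout of the `TF_CONFIG` environment variable. The real resolvers in this directory return `ClusterSpec` objects and talk to actual cluster management services, so treat this as an illustration of the pattern rather than the library's implementation.

```python
import json
import os


class EnvClusterResolver:
    """Hypothetical resolver that reads the cluster layout from TF_CONFIG."""

    def __init__(self, environ=None):
        # Accept an injected mapping so the sketch can run without
        # touching the real process environment.
        self._environ = os.environ if environ is None else environ

    def _config(self):
        # TF_CONFIG holds a JSON document; fall back to an empty one.
        return json.loads(self._environ.get("TF_CONFIG", "{}"))

    def cluster_spec(self):
        # A plain dict of job name -> list of "host:port" addresses.
        # (The real resolvers return a ClusterSpec object instead.)
        return self._config().get("cluster", {})

    @property
    def task_type(self):
        return self._config().get("task", {}).get("type")

    @property
    def task_id(self):
        index = self._config().get("task", {}).get("index")
        return None if index is None else int(index)


# Example configuration; the host names are made up for illustration.
example_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {
            "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
            "ps": ["ps0.example.com:2222"],
        },
        "task": {"type": "worker", "index": 1},
    })
}

resolver = EnvClusterResolver(example_env)
spec = resolver.cluster_spec()
```

The caller only asks the resolver for the cluster layout and its own task identity; where that information comes from (an environment variable here, a cluster management service in the real resolvers) stays hidden behind the same small interface.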