STT-tensorflow/tensorflow/python/distribute/cluster_resolver
Kibeom Kim 0c67638ac2 Remove deprecated tfrt_enabled test target flag.
PiperOrigin-RevId: 338530097
Change-Id: I0bd2ad366210330ece06f99a4fdb16de395ece05
2020-10-22 12:55:06 -07:00
tpu Remove deprecated tfrt_enabled test target flag. 2020-10-22 12:55:06 -07:00
__init__.py
BUILD Remove deprecated tfrt_enabled test target flag. 2020-10-22 12:55:06 -07:00
cluster_resolver_test.py Add get_tpu_system_metadata API to TPUClusterResolver. Also export tf.tpu.experimental.TPUSystemMetadata and tf.tpu.experimental.Topology symbols. 2020-03-18 19:48:04 -07:00
cluster_resolver.py Make task_type and task_id standard properties in tf.distribute cluster resolvers. 2020-06-22 14:48:14 -07:00
gce_cluster_resolver_test.py Move away from deprecated asserts 2020-06-30 16:10:22 -07:00
gce_cluster_resolver.py Docstring fixes for cluster resolvers. 2020-06-08 23:10:17 -07:00
kubernetes_cluster_resolver_test.py Move away from deprecated asserts 2020-06-30 16:10:22 -07:00
kubernetes_cluster_resolver.py Docstring fixes for cluster resolvers. 2020-06-08 23:10:17 -07:00
README_Slurm.md Merge pull request #38355 from Flamefire:slurm_cluster_resolver_docu 2020-04-09 23:49:21 -07:00
README.md
sagemaker_cluster_resolver_test.py Merge pull request #43251 from sboshin:sagemaker_resolver 2020-09-17 20:22:03 -07:00
sagemaker_cluster_resolver.py Merge pull request #43251 from sboshin:sagemaker_resolver 2020-09-17 20:22:03 -07:00
slurm_cluster_resolver_test.py Add get_tpu_system_metadata API to TPUClusterResolver. Also export tf.tpu.experimental.TPUSystemMetadata and tf.tpu.experimental.Topology symbols. 2020-03-18 19:48:04 -07:00
slurm_cluster_resolver.py fix some linter errors for slurm_cluster_resolver. 2020-05-29 16:56:51 -07:00
tfconfig_cluster_resolver_test.py Add get_tpu_system_metadata API to TPUClusterResolver. Also export tf.tpu.experimental.TPUSystemMetadata and tf.tpu.experimental.Topology symbols. 2020-03-18 19:48:04 -07:00
tfconfig_cluster_resolver.py Docstring fixes for cluster resolvers. 2020-06-08 23:10:17 -07:00
tpu_cluster_resolver.py Move TPUClusterResolver into tpu subdirectory. 2020-05-13 14:59:47 -07:00

Cluster Resolvers

Cluster Resolvers are a new way of specifying cluster information for distributed execution. Built on top of the existing ClusterSpec framework, Cluster Resolvers let users specify a configuration and a cluster management service; a ClusterResolver then automatically fetches the relevant information from that service and populates ClusterSpecs.
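To make the idea concrete, here is a minimal sketch of the resolver pattern in plain Python. It is not the TensorFlow implementation: `EnvClusterResolver` is a hypothetical class invented for illustration, and it returns a plain dict rather than a real `ClusterSpec`. It assumes cluster information is published in a `TF_CONFIG`-style JSON environment variable with `cluster` and `task` keys.

```python
import json
import os


class EnvClusterResolver:
    """Hypothetical resolver that reads cluster info from an environment variable.

    Illustrative stand-in only; it does not use or reproduce TensorFlow code.
    """

    def __init__(self, env_var="TF_CONFIG"):
        self._env_var = env_var

    def _load(self):
        # Re-read on every call so that updated cluster information is picked up.
        return json.loads(os.environ.get(self._env_var, "{}"))

    def cluster_spec(self):
        # Returns a dict mapping each job name to its list of "host:port" addresses.
        return dict(self._load().get("cluster", {}))

    @property
    def task_type(self):
        return self._load().get("task", {}).get("type")

    @property
    def task_id(self):
        return self._load().get("task", {}).get("index")


# Example configuration; the host names are made up for this sketch.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
        "ps": ["ps0.example.com:2222"],
    },
    "task": {"type": "worker", "index": 1},
})

resolver = EnvClusterResolver()
spec = resolver.cluster_spec()
print(spec["worker"], resolver.task_type, resolver.task_id)
```

In TensorFlow itself, `tf.distribute.cluster_resolver.TFConfigClusterResolver` plays a similar role for `TF_CONFIG`, and its `cluster_spec()` method returns a `tf.train.ClusterSpec`. The other resolvers in this directory query a cluster management service instead of an environment variable.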

ClusterResolvers are designed to work well with ManagedTrainingSession and ClusterSpec propagation so that distributed training sessions remain robust in the face of node and network failures.