STT-tensorflow/tensorflow/python/distribute/cluster_resolver
A. Unique TensorFlower 034633f23b PY2 removal cleanup
PiperOrigin-RevId: 352106691
Change-Id: I382d53c64f0d29da430b8cb6d2395a2cb281509e
2021-01-15 16:48:57 -08:00
tpu PY2 removal cleanup 2021-01-15 16:48:57 -08:00
__init__.py
BUILD PY2 removal cleanup 2021-01-15 16:48:57 -08:00
cluster_resolver_test.py Add get_tpu_system_metadata API to TPUClusterResolver. Also export tf.tpu.experimental.TPUSystemMetadata and tf.tpu.experimental.Topology symbols. 2020-03-18 19:48:04 -07:00
cluster_resolver.py Make task_type and task_id standard properties in tf.distribute cluster resolvers. 2020-06-22 14:48:14 -07:00
gce_cluster_resolver_test.py Move away from deprecated asserts 2020-06-30 16:10:22 -07:00
gce_cluster_resolver.py Docstring fixes for cluster resolvers. 2020-06-08 23:10:17 -07:00
kubernetes_cluster_resolver_test.py Move away from deprecated asserts 2020-06-30 16:10:22 -07:00
kubernetes_cluster_resolver.py Docstring fixes for cluster resolvers. 2020-06-08 23:10:17 -07:00
README_Slurm.md Merge pull request from Flamefire:slurm_cluster_resolver_docu 2020-04-09 23:49:21 -07:00
README.md
sagemaker_cluster_resolver_test.py Merge pull request from sboshin:sagemaker_resolver 2020-09-17 20:22:03 -07:00
sagemaker_cluster_resolver.py Merge pull request from sboshin:sagemaker_resolver 2020-09-17 20:22:03 -07:00
slurm_cluster_resolver_test.py Add get_tpu_system_metadata API to TPUClusterResolver. Also export tf.tpu.experimental.TPUSystemMetadata and tf.tpu.experimental.Topology symbols. 2020-03-18 19:48:04 -07:00
slurm_cluster_resolver.py fix some linter errors for slurm_cluster_resolver. 2020-05-29 16:56:51 -07:00
tfconfig_cluster_resolver_test.py Add get_tpu_system_metadata API to TPUClusterResolver. Also export tf.tpu.experimental.TPUSystemMetadata and tf.tpu.experimental.Topology symbols. 2020-03-18 19:48:04 -07:00
tfconfig_cluster_resolver.py Docstring fixes for cluster resolvers. 2020-06-08 23:10:17 -07:00
tpu_cluster_resolver.py Move TPUClusterResolver into tpu subdirectory. 2020-05-13 14:59:47 -07:00

Cluster Resolvers

Cluster Resolvers are a new way of specifying cluster information for distributed execution. Built on top of the existing ClusterSpec framework, Cluster Resolvers let users specify only a configuration and a cluster management service. A ClusterResolver then automatically fetches the relevant information from that service and populates the ClusterSpecs.
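
To make the idea concrete, here is a minimal sketch of what a resolver does, using only the Python standard library. The class name `SketchTFConfigResolver` is hypothetical and is not TensorFlow's implementation. It assumes the cluster description arrives as JSON in the `TF_CONFIG` environment variable, with a `cluster` entry mapping job names to task addresses and a `task` entry giving this process's type and index. The host names and ports are made up for illustration.

```python
import json
import os


class SketchTFConfigResolver:
    """Hypothetical stand-in for a cluster resolver; not TensorFlow's class."""

    def __init__(self, env_var="TF_CONFIG"):
        # Parse the JSON once; an unset variable yields an empty config.
        self._config = json.loads(os.environ.get(env_var, "{}"))

    def cluster_spec(self):
        # Map each job name to its list of "host:port" task addresses.
        return dict(self._config.get("cluster", {}))

    @property
    def task_type(self):
        # Job name of the current process, or None if not specified.
        return self._config.get("task", {}).get("type")

    @property
    def task_id(self):
        # Index of the current process within its job, or None.
        return self._config.get("task", {}).get("index")


# Illustrative configuration; the addresses are assumptions.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "worker": ["host1:2222", "host2:2222"],
        "ps": ["host3:2222"],
    },
    "task": {"type": "worker", "index": 1},
})

resolver = SketchTFConfigResolver()
print(resolver.cluster_spec())
print(resolver.task_type, resolver.task_id)
```

The point of the design is that calling code asks the resolver for the cluster layout instead of hard-coding addresses, so the same training script can run under different cluster managers by swapping the resolver.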

ClusterResolvers are designed to work well with MonitoredTrainingSession and ClusterSpec propagation, so that distributed training sessions remain robust in the face of node and network failures.