Cluster Resolvers
Cluster Resolvers are a new way of specifying cluster information for distributed execution. Built on top of the existing `ClusterSpec` framework, Cluster Resolvers let users specify a configuration and a cluster management service. A `ClusterResolver` then automatically fetches the relevant information from the service and populates `ClusterSpec`s.

`ClusterResolver`s are designed to work well with `ManagedTrainingSession` and `ClusterSpec` propagation so that distributed training sessions remain robust in the face of node and network failures.
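
The resolver pattern described above can be illustrated with a minimal, pure-Python sketch. It does not use TensorFlow or reproduce its API: the class names `SketchClusterResolver` and `EnvJsonResolver` and the helper `demo` are hypothetical and exist only for illustration. The sketch assumes the cluster description arrives as JSON in a `TF_CONFIG`-style environment variable, with a `cluster` key mapping job names to lists of `host:port` addresses.

```python
import json
import os
from abc import ABC, abstractmethod
from typing import Dict, List


class SketchClusterResolver(ABC):
    """Hypothetical stand-in for a resolver interface (not the TensorFlow class)."""

    @abstractmethod
    def cluster_spec(self) -> Dict[str, List[str]]:
        """Return a mapping of job name -> list of "host:port" task addresses."""


class EnvJsonResolver(SketchClusterResolver):
    """Reads cluster membership from a TF_CONFIG-style JSON environment variable."""

    def __init__(self, env_var: str = "TF_CONFIG") -> None:
        self._env_var = env_var

    def cluster_spec(self) -> Dict[str, List[str]]:
        # Re-read the environment on every call, so a caller that asks again
        # after the cluster changes sees the updated membership.
        raw = os.environ.get(self._env_var, "{}")
        cluster = json.loads(raw).get("cluster", {})
        return {job: list(tasks) for job, tasks in cluster.items()}


def demo() -> Dict[str, List[str]]:
    # Illustrative addresses only; a real deployment would have the cluster
    # manager set this variable.
    os.environ["TF_CONFIG"] = json.dumps({
        "cluster": {
            "worker": ["host1:2222", "host2:2222"],
            "ps": ["host3:2222"],
        },
        "task": {"type": "worker", "index": 0},
    })
    return EnvJsonResolver().cluster_spec()


if __name__ == "__main__":
    print(demo())
```

The point of the design is that callers depend only on `cluster_spec()`. Swapping the environment-variable source for a query against a cluster management service would, under this sketch's assumptions, require only a different subclass.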