# Tensorflow Distribute Libraries

## Overview

`tf.distribute.Strategy` is a TensorFlow API to distribute training across multiple GPUs, multiple machines, or TPUs. Using this API, users can distribute their existing models and training code with minimal code changes.

It can be used with TensorFlow's high-level APIs, `tf.keras` and `tf.estimator`, with just a few lines of code changed. It does so by making the underlying components of TensorFlow strategy-aware. This includes variables, layers, models, optimizers, metrics, summaries, and checkpoints.
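
For example, variables created inside a strategy's scope become distributed variables that the strategy keeps in sync across replicas. A minimal sketch (the exact printed type is an implementation detail and may vary by TensorFlow version):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Any variable created inside the scope becomes strategy-aware: MirroredStrategy
# creates one copy per device and keeps the copies synchronized.
with strategy.scope():
  v = tf.Variable(1.0)

print(type(v))  # A distributed variable wrapper, not a plain tf.Variable.
```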

## Documentation

*   Distributed Training Guide
*   Distributed Training With Keras Tutorial
*   Distributed Training With Custom Training Loops Tutorial
*   Multi-worker Training With Keras Tutorial
*   Multi-worker Training With Estimator Tutorial
*   Save and Load with Distribution Strategy

## Simple Examples

### Using `compile`/`fit` with GPUs.

```python
import tensorflow as tf

# Create the strategy instance. It will automatically detect all the GPUs.
mirrored_strategy = tf.distribute.MirroredStrategy()

# Create and compile the keras model under strategy.scope().
with mirrored_strategy.scope():
  model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
  model.compile(loss='mse', optimizer='sgd')

# Call model.fit and model.evaluate as before.
dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(100).batch(10)
model.fit(dataset, epochs=2)
model.evaluate(dataset)
```
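
Note that the batch size set on the dataset is the global batch size, which the strategy splits across replicas. A common pattern, sketched below with an assumed per-replica batch size of 16, scales the global batch size by the number of replicas:

```python
import tensorflow as tf

mirrored_strategy = tf.distribute.MirroredStrategy()

# Assumed per-replica batch size; the dataset batch size passed to model.fit
# is the global batch size, i.e. the per-replica size times the replica count.
per_replica_batch_size = 16
global_batch_size = (per_replica_batch_size *
                     mirrored_strategy.num_replicas_in_sync)

dataset = tf.data.Dataset.from_tensors(
    ([1.], [1.])).repeat(100).batch(global_batch_size)
```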

### Custom training loop with TPUs.

```python
import tensorflow as tf

# Connect to the TPU cluster and create the strategy instance.
# The TPU address/name passed to the resolver depends on your environment.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
tpu_strategy = tf.distribute.TPUStrategy(resolver)

# Create the model (here a single Dense layer) under strategy.scope().
with tpu_strategy.scope():
  model = tf.keras.layers.Dense(1, name="dense")

# Create the custom training loop body as a tf.function.
@tf.function
def train_step(iterator):
  def step_fn(inputs):
    images, targets = inputs
    with tf.GradientTape() as tape:
      outputs = model(images)
      loss = tf.reduce_sum(outputs - targets)
    grads = tape.gradient(loss, model.variables)
    return grads

  return tpu_strategy.run(
      step_fn, args=(next(iterator),))

# Run the loop body once on a distributed dataset.
dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(100).batch(10)
input_iterator = iter(tpu_strategy.experimental_distribute_dataset(dataset))
train_step(input_iterator)
```
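
The step above only computes gradients. A complete training step would usually also apply them with an optimizer and aggregate the per-replica losses. The following is a sketch, not part of the original example: it reuses the `model`, `tpu_strategy`, and `input_iterator` defined above and assumes an SGD optimizer created under the same scope.

```python
GLOBAL_BATCH_SIZE = 10  # Must match the dataset batch size above.

with tpu_strategy.scope():
  optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

@tf.function
def full_train_step(iterator):
  def step_fn(inputs):
    images, targets = inputs
    with tf.GradientTape() as tape:
      outputs = model(images)
      # Scale by the global batch size so that gradients summed across
      # replicas correspond to the mean over the full batch.
      loss = tf.reduce_sum(outputs - targets) / GLOBAL_BATCH_SIZE
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

  per_replica_loss = tpu_strategy.run(step_fn, args=(next(iterator),))
  # Sum the (already scaled) per-replica losses into a single scalar.
  return tpu_strategy.reduce(
      tf.distribute.ReduceOp.SUM, per_replica_loss, axis=None)

full_train_step(input_iterator)
```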

## Testing

Tests here should cover all distribution strategies to ensure feature parity. This can be done using the test decorators in `strategy_combinations.py`.
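
As an illustration only (test and method names below are hypothetical, and the exact decorator arguments vary across tests in this directory), a test parameterized over strategies typically looks like this:

```python
from absl.testing import parameterized

from tensorflow.python.distribute import combinations
from tensorflow.python.distribute import strategy_combinations
from tensorflow.python.platform import test


class ExampleStrategyTest(test.TestCase, parameterized.TestCase):

  @combinations.generate(
      combinations.combine(
          distribution=strategy_combinations.all_strategies,
          mode=["eager"]))
  def testUnderAllStrategies(self, distribution):
    # The test body runs once per strategy in `all_strategies`.
    with distribution.scope():
      pass  # Build variables / models under the strategy being tested.


if __name__ == "__main__":
  test.main()
```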