Update generated Python Op docs.
Change: 141245282
@@ -1,294 +0,0 @@
Class to synchronize, aggregate gradients and pass them to the optimizer.

In a typical asynchronous training environment, it's common to have some
stale gradients. For example, with N-replica asynchronous training,
gradients will be applied to the variables N times independently. Depending
on each replica's training speed, some gradients might be calculated from
copies of the variable from several steps back (N-1 steps on average). This
optimizer avoids stale gradients by collecting gradients from all replicas,
summing them, and then applying them to the variables in one shot, after
which the replicas can fetch the new variables and continue.

The following queues are created:

* N `gradient` queues, one per variable to train. Gradients are pushed to
  these queues and the chief worker will `dequeue_many` and then sum them
  before applying them to the variables.
* 1 `token` queue where the optimizer pushes the new global_step value after
  all gradients have been applied.

The following variables are created:

* N `local_step` variables, one per replica. Each is compared against the
  global step to check for staleness of the gradients.

This adds nodes to the graph to collect gradients and pause the trainers until
variables are updated.

For the PS:

1. A queue is created for each variable, and each replica now pushes the
   gradients into the queue instead of directly applying them to the
   variables.
2. For each `gradient_queue`, pop and sum the gradients once enough
   replicas (replicas_to_aggregate) have pushed gradients to the queue.
3. Apply the aggregated gradients to the variables.
4. Only after all variables have been updated, increment the global step.
5. Only after step 4, clear all the gradients in the queues, as they are now
   stale (this can happen when replicas are restarted and push to the queues
   multiple times, or from the backup replicas).
6. Only after step 5, push `global_step` into the `token_queue`, once for
   each worker replica. Each worker can then fetch it into its local_step
   variable and start the next batch.

For the replicas:

1. Start a step: fetch variables and compute gradients.
2. Once the gradients have been computed, push them into the `gradient_queue`,
   but only if local_step equals global_step; otherwise the gradients are
   simply dropped. This avoids stale gradients.
3. After pushing all the gradients, dequeue an updated value of global_step
   from the token queue and record that step in its local_step variable. Note
   that this is effectively a barrier.
4. Start the next batch.

### Usage

```python
# Create any optimizer to update the variables, say a simple SGD:
opt = GradientDescentOptimizer(learning_rate=0.1)

# Wrap the optimizer with sync_replicas_optimizer with 50 replicas: at each
# step the optimizer collects 50 gradients before applying them to the
# variables.
opt = tf.train.SyncReplicasOptimizer(opt, replicas_to_aggregate=50,
                                     replica_id=task_id,
                                     total_num_replicas=50)
# Note that if you want to have 2 backup replicas, you can change
# total_num_replicas=52 and make sure this number matches how many physical
# replicas you started in your job.

# Some models have startup_delays to help stabilize the model, but when using
# sync_replicas training, set it to 0.

# Now you can call `minimize()` or `compute_gradients()` and
# `apply_gradients()` normally.
train_op = opt.minimize(total_loss, global_step=self.global_step)

# You can now call get_init_tokens_op() and get_chief_queue_runner().
# Note that get_init_tokens_op() must be called before creating the session
# because it modifies the graph.
init_token_op = opt.get_init_tokens_op()
chief_queue_runner = opt.get_chief_queue_runner()
```

In the training program, every worker will run the train_op as if it were not
synchronized. But one worker (usually the chief) also needs to execute the
chief_queue_runner and the init_token_op produced by this optimizer.

```python
# After the session is created by the Supervisor and before the main while
# loop:
if is_chief and FLAGS.sync_replicas:
  sv.start_queue_runners(sess, [chief_queue_runner])
  # Insert initial tokens to the queue.
  sess.run(init_token_op)
```
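
A minimal sketch of the per-worker main loop, assuming the `sv`, `sess` and
`train_op` names from the snippets above: every replica (chief or not) simply
runs the train op each step, and the token dequeue inside it acts as the
barrier described earlier.

```python
while not sv.should_stop():
  # Blocks until enough gradients have been aggregated and the updated
  # variables are available.
  sess.run(train_op)
```
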
- - -

#### `tf.train.SyncReplicasOptimizer.__init__(opt, replicas_to_aggregate, variable_averages=None, variables_to_average=None, replica_id=None, total_num_replicas=0, use_locking=False, name='sync_replicas')` {#SyncReplicasOptimizer.__init__}

Construct a sync_replicas optimizer.

##### Args:


*  <b>`opt`</b>: The actual optimizer that will be used to compute and apply the
    gradients. Must be one of the Optimizer classes.
*  <b>`replicas_to_aggregate`</b>: Number of replicas to aggregate for each variable
    update.
*  <b>`variable_averages`</b>: Optional `ExponentialMovingAverage` object, used to
    maintain moving averages for the variables passed in
    `variables_to_average`.
*  <b>`variables_to_average`</b>: A list of variables that need to be averaged. Only
    needed if `variable_averages` is passed in.
*  <b>`replica_id`</b>: The task/worker/replica ID. Needed as an index to access the
    local_steps to check staleness. Must be in the interval
    [0, total_num_replicas).
*  <b>`total_num_replicas`</b>: Total number of tasks/workers/replicas, which can be
    different from replicas_to_aggregate.
    If total_num_replicas > replicas_to_aggregate: it equals
    replicas_to_aggregate + the number of backup replicas.
    If total_num_replicas < replicas_to_aggregate: replicas compute
    multiple batches per update to the variables.
*  <b>`use_locking`</b>: If True, use locks for update operations.
*  <b>`name`</b>: String. Optional name of the returned operation.

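
For illustration, a hedged sketch of the backup-replica configuration mentioned
in the usage example above (50 gradients aggregated per step, 2 backup
replicas, so 52 physical workers; `FLAGS.task_id` is a hypothetical flag):

```python
opt = tf.train.SyncReplicasOptimizer(
    tf.train.GradientDescentOptimizer(learning_rate=0.1),
    replicas_to_aggregate=50,
    replica_id=FLAGS.task_id,  # hypothetical flag holding this worker's ID
    total_num_replicas=52)     # 50 aggregated + 2 backup replicas
```
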
- - -

#### `tf.train.SyncReplicasOptimizer.compute_gradients(*args, **kwargs)` {#SyncReplicasOptimizer.compute_gradients}

Compute gradients of "loss" for the variables in "var_list".

This simply wraps the compute_gradients() from the real optimizer. The
gradients are aggregated in apply_gradients(), so that the user can modify
the gradients beforehand, for example clipping them with a per-replica global
norm if needed. The global norm over the aggregated gradients can be
misleading, as one replica's huge gradients can drown out the gradients from
the other replicas.

##### Args:


*  <b>`*args`</b>: Arguments for compute_gradients().
*  <b>`**kwargs`</b>: Keyword arguments for compute_gradients().

##### Returns:

  A list of (gradient, variable) pairs.

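
A minimal sketch of the per-replica clipping this enables, assuming `opt` is
the SyncReplicasOptimizer wrapper, `total_loss` is as in the usage example
above, and `global_step` is the training step variable:

```python
# Clip each replica's gradients before they are aggregated across replicas.
grads_and_vars = opt.compute_gradients(total_loss)
clipped = [(tf.clip_by_norm(grad, 5.0), var)
           for grad, var in grads_and_vars if grad is not None]
train_op = opt.apply_gradients(clipped, global_step=global_step)
```
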
- - -

#### `tf.train.SyncReplicasOptimizer.apply_gradients(grads_and_vars, global_step=None, name=None)` {#SyncReplicasOptimizer.apply_gradients}

Apply gradients to variables.

This contains most of the synchronization implementation and also wraps the
apply_gradients() from the real optimizer.

##### Args:


*  <b>`grads_and_vars`</b>: List of (gradient, variable) pairs as returned by
    compute_gradients().
*  <b>`global_step`</b>: Optional Variable to increment by one after the
    variables have been updated.
*  <b>`name`</b>: Optional name for the returned operation. Defaults to the
    name passed to the Optimizer constructor.

##### Returns:


*  <b>`train_op`</b>: The op to dequeue a token so the replicas can exit this batch
    and start the next one. This is executed by each replica.

##### Raises:


*  <b>`ValueError`</b>: If grads_and_vars is empty.
*  <b>`ValueError`</b>: If global_step is not provided, since the staleness cannot
    be checked without it.

- - -

#### `tf.train.SyncReplicasOptimizer.get_chief_queue_runner()` {#SyncReplicasOptimizer.get_chief_queue_runner}

Returns the QueueRunner for the chief to execute.

This includes the operations to synchronize replicas: aggregate the gradients,
apply them to the variables, increment the global step, and insert tokens into
the token queue.

Note that this can only be called after calling apply_gradients(), which
actually generates this queue runner.

##### Returns:

  A `QueueRunner` for the chief to execute.

##### Raises:


*  <b>`ValueError`</b>: If this is called before apply_gradients().

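
A hedged sketch of running this queue runner without a `Supervisor`, using a
plain `Coordinator` instead (assumes the `sess` and `chief_queue_runner` names
from the usage example above):

```python
coord = tf.train.Coordinator()
threads = chief_queue_runner.create_threads(sess, coord=coord, start=True)
# ... run the chief's training loop here ...
coord.request_stop()
coord.join(threads)
```
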
- - -

#### `tf.train.SyncReplicasOptimizer.get_init_tokens_op(num_tokens=-1)` {#SyncReplicasOptimizer.get_init_tokens_op}

Returns the op to fill the sync_token_queue with the tokens.

This is supposed to be executed at the beginning of the chief/sync thread so
that even if total_num_replicas is less than replicas_to_aggregate, the model
can still proceed, as the replicas can compute multiple steps per variable
update. Make sure:
`num_tokens >= replicas_to_aggregate - total_num_replicas`.

##### Args:


*  <b>`num_tokens`</b>: Number of tokens to add to the queue.

##### Returns:

  An op for the chief/sync replica to fill the token queue.

##### Raises:


*  <b>`ValueError`</b>: If this is called before apply_gradients().
*  <b>`ValueError`</b>: If num_tokens is smaller than replicas_to_aggregate -
    total_num_replicas.

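
A hedged sketch of choosing `num_tokens` so that the constraint above holds,
assuming the Python variables `replicas_to_aggregate` and `total_num_replicas`
hold the same values passed to the constructor:

```python
# Seed enough tokens to cover any shortfall in physical replicas.
num_tokens = max(replicas_to_aggregate - total_num_replicas, 0)
init_token_op = opt.get_init_tokens_op(num_tokens)
```
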
#### Other Methods
- - -

#### `tf.train.SyncReplicasOptimizer.get_clean_up_op()` {#SyncReplicasOptimizer.get_clean_up_op}

Returns the clean-up op for the chief to execute before exit.

This includes the operation to abort the device with the token queue so that
all other replicas can also restart. This can avoid a potential hang when the
chief restarts.

Note that this can only be called after calling apply_gradients().

##### Returns:

  A clean_up_op for the chief to execute before it exits.

##### Raises:


*  <b>`ValueError`</b>: If this is called before apply_gradients().

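
A minimal sketch of how the chief might use this, assuming the `sv`, `sess`
and `train_op` names from the usage snippets above (the exact shutdown
sequence depends on your training setup):

```python
clean_up_op = opt.get_clean_up_op()
try:
  while not sv.should_stop():
    sess.run(train_op)
finally:
  # Abort the token queue so replicas blocked on it are released.
  sess.run(clean_up_op)
```
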
- - -

#### `tf.train.SyncReplicasOptimizer.get_slot(*args, **kwargs)` {#SyncReplicasOptimizer.get_slot}

Return a slot named "name" created for "var" by the Optimizer.

This simply wraps the get_slot() from the actual optimizer.

##### Args:


*  <b>`*args`</b>: Arguments for get_slot().
*  <b>`**kwargs`</b>: Keyword arguments for get_slot().

##### Returns:

  The `Variable` for the slot if it was created, `None` otherwise.

- - -

#### `tf.train.SyncReplicasOptimizer.get_slot_names(*args, **kwargs)` {#SyncReplicasOptimizer.get_slot_names}

Return a list of the names of slots created by the `Optimizer`.

This simply wraps the get_slot_names() from the actual optimizer.

##### Args:


*  <b>`*args`</b>: Arguments for get_slot_names().
*  <b>`**kwargs`</b>: Keyword arguments for get_slot_names().

##### Returns:

  A list of strings.

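
A hedged example of inspecting the wrapped optimizer's slots, assuming `opt`
wraps a `MomentumOptimizer` and `var_list` is a list of the trainable
variables:

```python
for slot_name in opt.get_slot_names():  # e.g. ["momentum"]
  for var in var_list:
    slot_var = opt.get_slot(var, slot_name)
    if slot_var is not None:
      print(slot_name, slot_var.name)
```
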
@@ -1,4 +1,4 @@
-### `tf.nn.nce_loss(weights, biases, inputs, labels, num_sampled, num_classes, num_true=1, sampled_values=None, remove_accidental_hits=False, partition_strategy='mod', name='nce_loss')` {#nce_loss}
+### `tf.nn.nce_loss(weights, biases, labels, inputs, num_sampled, num_classes, num_true=1, sampled_values=None, remove_accidental_hits=False, partition_strategy='mod', name='nce_loss')` {#nce_loss}
 
 Computes and returns the noise-contrastive estimation training loss.
 
@@ -30,10 +30,10 @@ with an otherwise unused class.
     objects whose concatenation along dimension 0 has shape
     [num_classes, dim]. The (possibly-partitioned) class embeddings.
 *  <b>`biases`</b>: A `Tensor` of shape `[num_classes]`. The class biases.
-*  <b>`inputs`</b>: A `Tensor` of shape `[batch_size, dim]`. The forward
-    activations of the input network.
 *  <b>`labels`</b>: A `Tensor` of type `int64` and shape `[batch_size,
     num_true]`. The target classes.
+*  <b>`inputs`</b>: A `Tensor` of shape `[batch_size, dim]`. The forward
+    activations of the input network.
 *  <b>`num_sampled`</b>: An `int`. The number of classes to randomly sample per batch.
 *  <b>`num_classes`</b>: An `int`. The number of possible classes.
 *  <b>`num_true`</b>: An `int`. The number of target classes per training example.
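
For reference, a hedged sketch of a call using the new argument order (`labels`
now precedes `inputs`); the names `nce_weights`, `nce_biases`, `train_labels`,
`embed` and `vocabulary_size` are illustrative, not part of this change:

```python
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=train_labels,
                   inputs=embed,
                   num_sampled=64,
                   num_classes=vocabulary_size))
```
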
@@ -660,7 +660,6 @@
 * [`summary_iterator`](../../api_docs/python/train.md#summary_iterator)
 * [`SummarySaverHook`](../../api_docs/python/train.md#SummarySaverHook)
 * [`Supervisor`](../../api_docs/python/train.md#Supervisor)
-* [`SyncReplicasOptimizer`](../../api_docs/python/train.md#SyncReplicasOptimizer)
 * [`SyncReplicasOptimizerV2`](../../api_docs/python/train.md#SyncReplicasOptimizerV2)
 * [`WorkerSessionCreator`](../../api_docs/python/train.md#WorkerSessionCreator)
 * [`write_graph`](../../api_docs/python/train.md#write_graph)
@@ -3379,7 +3379,7 @@ TensorFlow provides the following sampled loss functions for faster training.
 
 - - -
 
-### `tf.nn.nce_loss(weights, biases, inputs, labels, num_sampled, num_classes, num_true=1, sampled_values=None, remove_accidental_hits=False, partition_strategy='mod', name='nce_loss')` {#nce_loss}
+### `tf.nn.nce_loss(weights, biases, labels, inputs, num_sampled, num_classes, num_true=1, sampled_values=None, remove_accidental_hits=False, partition_strategy='mod', name='nce_loss')` {#nce_loss}
 
 Computes and returns the noise-contrastive estimation training loss.
 
@@ -3411,10 +3411,10 @@ with an otherwise unused class.
     objects whose concatenation along dimension 0 has shape
     [num_classes, dim]. The (possibly-partitioned) class embeddings.
 *  <b>`biases`</b>: A `Tensor` of shape `[num_classes]`. The class biases.
-*  <b>`inputs`</b>: A `Tensor` of shape `[batch_size, dim]`. The forward
-    activations of the input network.
 *  <b>`labels`</b>: A `Tensor` of type `int64` and shape `[batch_size,
     num_true]`. The target classes.
+*  <b>`inputs`</b>: A `Tensor` of shape `[batch_size, dim]`. The forward
+    activations of the input network.
 *  <b>`num_sampled`</b>: An `int`. The number of classes to randomly sample per batch.
 *  <b>`num_classes`</b>: An `int`. The number of possible classes.
 *  <b>`num_true`</b>: An `int`. The number of target classes per training example.