diff --git a/tensorflow/g3doc/api_docs/python/functions_and_classes/shard1/tf.nn.moments.md b/tensorflow/g3doc/api_docs/python/functions_and_classes/shard1/tf.nn.moments.md
index d3aa88d68de..dd56055311e 100644
--- a/tensorflow/g3doc/api_docs/python/functions_and_classes/shard1/tf.nn.moments.md
+++ b/tensorflow/g3doc/api_docs/python/functions_and_classes/shard1/tf.nn.moments.md
@@ -6,6 +6,9 @@ The mean and variance are calculated by aggregating the contents of `x`
across `axes`. If `x` is 1-D and `axes = [0]` this is just the mean
and variance of a vector.
+Note: for numerical stability, when `shift=None`, the true mean
+is computed and used as the shift.
+
When using these moments for batch normalization (see
`tf.nn.batch_normalization`):
@@ -20,8 +23,9 @@ When using these moments for batch normalization (see
* `axes`: Array of ints. Axes along which to compute mean and
variance.
* `shift`: A `Tensor` containing the value by which to shift the data for
- numerical stability, or `None` if no shift is to be performed. A shift
- close to the true mean provides the most numerically stable results.
+ numerical stability, or `None`, in which case the true mean of the data is
+ used as the shift. A shift close to the true mean provides the most
+ numerically stable results.
* `name`: Name used to scope the operations that compute the moments.
* `keep_dims`: produce moments with the same dimensionality as the input.
diff --git a/tensorflow/g3doc/api_docs/python/nn.md b/tensorflow/g3doc/api_docs/python/nn.md
index fcc73663efd..983f68f8558 100644
--- a/tensorflow/g3doc/api_docs/python/nn.md
+++ b/tensorflow/g3doc/api_docs/python/nn.md
@@ -1932,6 +1932,9 @@ The mean and variance are calculated by aggregating the contents of `x`
across `axes`. If `x` is 1-D and `axes = [0]` this is just the mean
and variance of a vector.
+Note: for numerical stability, when `shift=None`, the true mean
+is computed and used as the shift.
+
When using these moments for batch normalization (see
`tf.nn.batch_normalization`):
@@ -1946,8 +1949,9 @@ When using these moments for batch normalization (see
* `axes`: Array of ints. Axes along which to compute mean and
variance.
* `shift`: A `Tensor` containing the value by which to shift the data for
- numerical stability, or `None` if no shift is to be performed. A shift
- close to the true mean provides the most numerically stable results.
+ numerical stability, or `None`, in which case the true mean of the data is
+ used as the shift. A shift close to the true mean provides the most
+ numerically stable results.
* `name`: Name used to scope the operations that compute the moments.
* `keep_dims`: produce moments with the same dimensionality as the input.
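The role of `shift` can be illustrated with a minimal NumPy sketch. It is an assumption-laden illustration, not TensorFlow's implementation: `shifted_moments` is a hypothetical helper that only mirrors the documented contract. The data is shifted by `shift`, the mean and mean-of-squares of the shifted data are accumulated, and the shift is then added back to the mean. Variance is invariant to the shift, so a shift near the true mean keeps the accumulated squares small and avoids cancellation. As documented, `shift=None` falls back to the true mean. The sketch also assumes `shift` broadcasts against `x`.

```python
import numpy as np


def shifted_moments(x, axes, shift=None):
    """Hypothetical sketch of mean/variance computed from shifted data.

    Mirrors the documented contract only: shift=None means the true
    mean of the data is used as the shift. Not TensorFlow's code.
    """
    x = np.asarray(x)
    axes = tuple(axes)
    if shift is None:
        # Documented fallback: use the true mean as the shift.
        shift = np.mean(x, axis=axes, keepdims=True)
    # Assumption: shift broadcasts against x.
    shifted = x - shift
    # Statistics of the shifted data; these stay small when shift ~ mean.
    shifted_mean = np.mean(shifted, axis=axes, keepdims=True)
    shifted_sq_mean = np.mean(shifted * shifted, axis=axes, keepdims=True)
    # Add the shift back to recover the mean; variance needs no correction.
    mean = shifted_mean + shift
    variance = shifted_sq_mean - shifted_mean * shifted_mean
    return np.squeeze(mean, axis=axes), np.squeeze(variance, axis=axes)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # float32 data with a large offset: E[x^2] - E[x]^2 cancels badly here.
    x = (np.float32(1e4) + rng.standard_normal(10000)).astype(np.float32)
    reference = np.var(x.astype(np.float64))
    _, var_no_shift = shifted_moments(x, [0], shift=np.float32(0.0))
    _, var_true_mean = shifted_moments(x, [0], shift=None)
    print("float64 reference:", reference)
    print("shift=0.0:        ", var_no_shift)
    print("shift=None:       ", var_true_mean)
```

With `shift=0.0` the float32 result is typically far from the float64 reference, because the mean of squares is around 1e8 while the variance is around 1. With `shift=None` the result should closely match the reference.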