Update keras MultiHeadAttention attention mask docstring to state that 1 means attend and 0 means mask.
https://github.com/tensorflow/tensorflow/issues/45854

PiperOrigin-RevId: 351258459
Change-Id: I1b6631cc7297d8a8077e754f45d701a03dccfe3e
commit 5e90f547a2
parent 161fcca9a7
@@ -308,7 +308,8 @@ class Softmax(Layer):
       normalization is applied.
   Call arguments:
     inputs: The inputs, or logits to the softmax layer.
-    mask: A boolean mask of the same shape as `inputs`. Defaults to `None`.
+    mask: A boolean mask of the same shape as `inputs`. Defaults to `None`. The
+      mask specifies 1 to keep and 0 to mask.

   Returns:
     softmaxed output with the same shape as `inputs`.
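To illustrate the convention the added docstring line describes, here is a minimal sketch (not part of the commit; layer and values chosen only for illustration) of `tf.keras.layers.Softmax` with a mask: a 1/True entry keeps a position, a 0/False entry masks it out.

    import tensorflow as tf

    layer = tf.keras.layers.Softmax()
    logits = tf.constant([[2.0, 1.0, 0.5]])
    mask = tf.constant([[True, True, False]])  # 1/True keeps, 0/False masks

    # The masked (last) position receives ~0 probability; the kept
    # positions share the probability mass.
    probs = layer(logits, mask)
    print(probs.numpy())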
@@ -321,7 +322,7 @@ class Softmax(Layer):

   def call(self, inputs, mask=None):
     if mask is not None:
-      # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
+      # Since mask is 1.0 for positions we want to keep and 0.0 for
       # masked positions, this operation will create a tensor which is 0.0 for
       # positions we want to attend and -1e9 for masked positions.
       adder = (1.0 - math_ops.cast(mask, inputs.dtype)) * (
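A short sketch of the arithmetic the comment above describes, under the same convention (illustrative values, not from the commit): a mask of 1.0 contributes an adder of 0.0, while a mask of 0.0 contributes roughly -1e9, which drives the softmax probability of that position to ~0.

    import tensorflow as tf

    logits = tf.constant([2.0, 1.0, 0.5])
    mask = tf.constant([1.0, 1.0, 0.0])  # keep, keep, mask

    # (1.0 - mask) is 0.0 for kept positions and 1.0 for masked ones,
    # so the adder is 0.0 where we attend and -1e9 where we mask.
    adder = (1.0 - tf.cast(mask, logits.dtype)) * -1e9
    print(tf.nn.softmax(logits + adder).numpy())  # masked entry gets ~0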
@@ -193,7 +193,10 @@ class MultiHeadAttention(Layer):
     key: Optional key `Tensor` of shape `[B, S, dim]`. If not given, will use
       `value` for both `key` and `value`, which is the most common case.
     attention_mask: a boolean mask of shape `[B, T, S]`, that prevents
-      attention to certain positions.
+      attention to certain positions. The boolean mask specifies which query
+      elements can attend to which key elements, 1 indicates attention and 0
+      indicates no attention. Broadcasting can happen for the missing batch
+      dimensions and the head dimension.
     return_attention_scores: A boolean to indicate whether the output should
       be (attention_output, attention_scores) if True, or attention_output if
       False. Defaults to False.
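A hedged usage sketch of the documented `attention_mask` convention (shapes and values here are assumptions for illustration, not from the commit): each 1/True entry in the `[B, T, S]` mask lets a query position attend to a key position, and each 0/False entry blocks it.

    import tensorflow as tf

    layer = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)
    query = tf.random.normal([1, 3, 8])  # [B, T, dim]
    value = tf.random.normal([1, 5, 8])  # [B, S, dim]

    # [B, T, S]: every query token may attend to the first four key
    # tokens; the 0 blocks attention to the last one.
    attention_mask = tf.constant([[[1, 1, 1, 1, 0]] * 3], dtype=tf.bool)

    output, scores = layer(query, value,
                           attention_mask=attention_mask,
                           return_attention_scores=True)
    print(output.shape)  # (1, 3, 8): [B, T, dim]
    print(scores.shape)  # (1, 2, 3, 5): [B, num_heads, T, S]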