PiperOrigin-RevId: 337402890
Change-Id: I23d0b41b721f5a0ba8d021ffe8e64f66befa65a2
Yash Katariya 2020-10-15 16:10:25 -07:00 committed by TensorFlower Gardener
parent a70f132c8f
commit 53ae1fce37

@@ -63,7 +63,7 @@ def one_hot(input_text,
filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
lower=True,
split=' '):
- """One-hot encodes a text into a list of word indexes of size `n`.
+ r"""One-hot encodes a text into a list of word indexes of size `n`.
This function receives as input a string of text and returns a
list of encoded integers each corresponding to a word (or token)
@@ -73,8 +73,11 @@ def one_hot(input_text,
input_text: Input text (string).
n: int. Size of vocabulary.
filters: list (or concatenation) of characters to filter out, such as
- punctuation. Default: ``!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\\t\\n``,
- includes basic punctuation, tabs, and newlines.
+ punctuation. Default:
+ ```
+ '!"#$%&()*+,-./:;<=>?@[\]^_`{|}~\t\n
+ ```,
+ + includes basic punctuation, tabs, and newlines.
lower: boolean. Whether to set the text to lowercase.
split: str. Separator for word splitting.
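
The commit message does not say why the docstring gained an `r` prefix. A likely reason is that the new text writes `\t` and `\n` with single backslashes, and a non-raw docstring would turn those into real tab and newline characters. The sketch below shows the difference using two hypothetical functions (`plain_doc` and `raw_doc` are illustrative names, not part of the commit):

```python
def plain_doc():
    """Default: '\t\n'"""  # non-raw: \t and \n become real tab and newline characters


def raw_doc():
    r"""Default: '\t\n'"""  # raw: the backslashes are kept as written


# repr() makes the stored characters visible.
print(repr(plain_doc.__doc__))
print(repr(raw_doc.__doc__))

# The raw docstring is two characters longer, because each escape
# is stored as a backslash plus a letter instead of one control character.
print(len(plain_doc.__doc__), len(raw_doc.__doc__))
```

With the raw prefix, the rendered documentation shows the default `filters` value as it appears in the function signature, without the doubled backslashes the old docstring needed.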