diff --git a/tensorflow/g3doc/api_docs/python/functions_and_classes/tf.string_to_hash_bucket_fast.md b/tensorflow/g3doc/api_docs/python/functions_and_classes/tf.string_to_hash_bucket_fast.md
index 79cc778eb94..e684058326f 100644
--- a/tensorflow/g3doc/api_docs/python/functions_and_classes/tf.string_to_hash_bucket_fast.md
+++ b/tensorflow/g3doc/api_docs/python/functions_and_classes/tf.string_to_hash_bucket_fast.md
@@ -4,11 +4,15 @@ Converts each string in the input Tensor to its hash mod by a number of buckets.
 
 The hash function is deterministic on the content of the string within the
 process and will never change. However, it is not suitable for cryptography.
+This function may be used when CPU time is scarce and inputs are trusted or
+unimportant. There is a risk of adversaries constructing inputs that all hash
+to the same bucket. To prevent this problem, use a strong hash function with
+`tf.string_to_hash_bucket_strong`.
 
 ##### Args:
 
 
-* `input`: A `Tensor` of type `string`. The strings to assing a hash bucket.
+* `input`: A `Tensor` of type `string`. The strings to assign a hash bucket.
 * `num_buckets`: An `int` that is `>= 1`. The number of buckets.
 * `name`: A name for the operation (optional).
 
diff --git a/tensorflow/g3doc/api_docs/python/functions_and_classes/tf.string_to_hash_bucket_strong.md b/tensorflow/g3doc/api_docs/python/functions_and_classes/tf.string_to_hash_bucket_strong.md
new file mode 100644
index 00000000000..67cf3b6fd98
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/functions_and_classes/tf.string_to_hash_bucket_strong.md
@@ -0,0 +1,30 @@
+### `tf.string_to_hash_bucket_strong(input, num_buckets, key, name=None)` {#string_to_hash_bucket_strong}
+
+Converts each string in the input Tensor to its hash mod by a number of buckets.
+
+The hash function is deterministic on the content of the string within the
+process. The hash function is a keyed hash function, where attribute `key`
+defines the key of the hash function. `key` is an array of 2 elements.
+
+A strong hash is important when inputs may be malicious, e.g. URLs with
+additional components. Adversaries could try to make their inputs hash to the
+same bucket for a denial-of-service attack or to skew the results. A strong
+hash prevents this by making it difficult, if not infeasible, to compute inputs
+that hash to the same bucket. This comes at a cost of roughly 4x higher compute
+time than `tf.string_to_hash_bucket_fast`.
+
+##### Args:
+
+
+* `input`: A `Tensor` of type `string`. The strings to assign a hash bucket.
+* `num_buckets`: An `int` that is `>= 1`. The number of buckets.
+* `key`: A list of `ints`.
+  The key for the keyed hash function passed as a list of two uint64
+  elements.
+* `name`: A name for the operation (optional).
+
+##### Returns:
+
+  A `Tensor` of type `int64`.
+  A Tensor of the same shape as the input `string_tensor`.
+
diff --git a/tensorflow/g3doc/api_docs/python/index.md b/tensorflow/g3doc/api_docs/python/index.md
index c5f259fa891..39307e1248e 100644
--- a/tensorflow/g3doc/api_docs/python/index.md
+++ b/tensorflow/g3doc/api_docs/python/index.md
@@ -260,6 +260,7 @@
   * [`reduce_join`](../../api_docs/python/string_ops.md#reduce_join)
   * [`string_to_hash_bucket`](../../api_docs/python/string_ops.md#string_to_hash_bucket)
   * [`string_to_hash_bucket_fast`](../../api_docs/python/string_ops.md#string_to_hash_bucket_fast)
+  * [`string_to_hash_bucket_strong`](../../api_docs/python/string_ops.md#string_to_hash_bucket_strong)
 
 * **[Histograms](../../api_docs/python/histogram_ops.md)**:
   * [`histogram_fixed_width`](../../api_docs/python/histogram_ops.md#histogram_fixed_width)
diff --git a/tensorflow/g3doc/api_docs/python/string_ops.md b/tensorflow/g3doc/api_docs/python/string_ops.md
index 302d9df8099..a516d851cf2 100644
--- a/tensorflow/g3doc/api_docs/python/string_ops.md
+++ b/tensorflow/g3doc/api_docs/python/string_ops.md
@@ -20,11 +20,15 @@ Converts each string in the input Tensor to its hash mod by a number of
 buckets.
 
 The hash function is deterministic on the content of the string within the
 process and will never change. However, it is not suitable for cryptography.
+This function may be used when CPU time is scarce and inputs are trusted or
+unimportant. There is a risk of adversaries constructing inputs that all hash
+to the same bucket. To prevent this problem, use a strong hash function with
+`tf.string_to_hash_bucket_strong`.
 
 ##### Args:
 
 
-* `input`: A `Tensor` of type `string`. The strings to assing a hash bucket.
+* `input`: A `Tensor` of type `string`. The strings to assign a hash bucket.
 * `num_buckets`: An `int` that is `>= 1`. The number of buckets.
 * `name`: A name for the operation (optional).
@@ -34,6 +38,39 @@ process and will never change. However, it is not suitable for cryptography.
   A Tensor of the same shape as the input `string_tensor`.
 
 
+- - -
+
+### `tf.string_to_hash_bucket_strong(input, num_buckets, key, name=None)` {#string_to_hash_bucket_strong}
+
+Converts each string in the input Tensor to its hash mod by a number of buckets.
+
+The hash function is deterministic on the content of the string within the
+process. The hash function is a keyed hash function, where attribute `key`
+defines the key of the hash function. `key` is an array of 2 elements.
+
+A strong hash is important when inputs may be malicious, e.g. URLs with
+additional components. Adversaries could try to make their inputs hash to the
+same bucket for a denial-of-service attack or to skew the results. A strong
+hash prevents this by making it difficult, if not infeasible, to compute inputs
+that hash to the same bucket. This comes at a cost of roughly 4x higher compute
+time than `tf.string_to_hash_bucket_fast`.
+
+##### Args:
+
+
+* `input`: A `Tensor` of type `string`. The strings to assign a hash bucket.
+* `num_buckets`: An `int` that is `>= 1`. The number of buckets.
+* `key`: A list of `ints`.
+  The key for the keyed hash function passed as a list of two uint64
+  elements.
+* `name`: A name for the operation (optional).
+
+##### Returns:
+
+  A `Tensor` of type `int64`.
+  A Tensor of the same shape as the input `string_tensor`.
+
+
 - - -
 
 ### `tf.string_to_hash_bucket(string_tensor, num_buckets, name=None)` {#string_to_hash_bucket}
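The semantics documented above (a keyed hash mapping each string to a bucket in `[0, num_buckets)`, with the key passed as two uint64 values) can be sketched in plain Python. This is an illustrative stand-in only, not TensorFlow's implementation: it uses `hashlib.blake2b`'s `key` parameter as the keyed hash, so its bucket assignments will not match the op's. The point it demonstrates is that the output is deterministic for a fixed key, yet unpredictable without it.

```python
# Illustrative sketch only -- NOT TensorFlow's implementation. blake2b's
# keyed mode stands in for the op's keyed hash, to show why an attacker
# who does not know `key` cannot precompute strings that collide.
import hashlib
import struct

def string_to_hash_bucket_strong(strings, num_buckets, key):
    """Map each string to a bucket in [0, num_buckets) using a keyed hash.

    `key` is a list of two uint64 values, mirroring the op's `key` attribute.
    """
    key_bytes = struct.pack("<QQ", key[0], key[1])  # two uint64s -> 16 bytes
    buckets = []
    for s in strings:
        digest = hashlib.blake2b(s.encode("utf-8"), key=key_bytes).digest()
        # Fold the first 8 digest bytes into an integer, then reduce mod
        # num_buckets, as the docs describe ("its hash mod by a number of
        # buckets").
        buckets.append(int.from_bytes(digest[:8], "little") % num_buckets)
    return buckets

out = string_to_hash_bucket_strong(["Hello", "TensorFlow"], 7, [123, 456])
print(out)  # deterministic for a fixed key; each value lies in [0, 7)
```

Changing `key` reshuffles the bucket assignments, which is exactly the property that blocks the precomputed-collision attacks described in the new docs; the unkeyed fast variant lacks it.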