Add description of Nudging logic to FakeQuant documentation.
PiperOrigin-RevId: 238716177
parent 25b44710c5
commit 5feba9d4a7
@@ -8,6 +8,15 @@ when `narrow_range` is false and `[1; 2^num_bits - 1]` when it is true) and
then de-quantized and output as floats in the `[min; max]` interval.
`num_bits` is the bitwidth of the quantization and must be between 2 and 16,
inclusive.

Before quantization, the `min` and `max` values are adjusted ("nudged")
according to the following logic.
It is recommended that `min <= 0 <= max`. If `0` is not in the range of values,
the range is shifted so that one of its endpoints becomes `0`, which can be
unexpected:
If `0 < min < max`: `min_adj = 0` and `max_adj = max - min`.
If `min < max < 0`: `min_adj = min - max` and `max_adj = 0`.
If `min <= 0 <= max`: `scale = (max - min) / (2^num_bits - 1)` (the denominator
is `2^num_bits - 2` when `narrow_range` is true),
`min_adj = scale * round(min / scale)` and `max_adj = max + min_adj - min`.

Quantization is called fake because the output is still in floating point.
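The three nudging cases above can be sketched in plain Python. This is an illustration under stated assumptions, not the TensorFlow kernel: `nudge` and `round_half_away` are hypothetical helper names, `narrow_range` is assumed to be false, and ties in `round` are assumed to go away from zero (as C++ `std::round` does), since the description does not specify a tie rule.

```python
import math
from typing import Tuple


def round_half_away(v: float) -> float:
    """Round to the nearest integer, ties away from zero (like C++ std::round).

    Assumption: the description only says `round`; this tie rule is our choice.
    """
    return math.copysign(math.floor(abs(v) + 0.5), v)


def nudge(min_val: float, max_val: float, num_bits: int = 8) -> Tuple[float, float]:
    """Return (min_adj, max_adj) following the three cases in the description.

    Hypothetical helper; assumes narrow_range is false, so the grid has
    2**num_bits - 1 steps.
    """
    if not 2 <= num_bits <= 16:
        raise ValueError("num_bits must be between 2 and 16, inclusive")
    if not min_val < max_val:
        # Guard added for this sketch: min == max would give scale == 0.
        raise ValueError("expected min < max")
    if min_val > 0:
        # 0 < min < max: shift the range down so it starts at 0.
        return 0.0, max_val - min_val
    if max_val < 0:
        # min < max < 0: shift the range up so it ends at 0.
        return min_val - max_val, 0.0
    # min <= 0 <= max: snap min onto the grid so that 0.0 lands on a level.
    scale = (max_val - min_val) / (2**num_bits - 1)
    min_adj = scale * round_half_away(min_val / scale)
    max_adj = max_val + min_adj - min_val
    return min_adj, max_adj
```

For example, `nudge(-0.1, 1.0)` returns roughly `(-0.0992, 1.0008)`: the width of the range stays `1.1`, and `0.0` now falls on level 23 of the 255 steps.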
END
}
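The clamp, quantize, and de-quantize steps described for these ops can be sketched the same way. The helper name `fake_quant` is hypothetical; it assumes `min_adj` and `max_adj` have already been nudged and that `narrow_range` is false, so the integer levels are `0` through `2^num_bits - 1`.

```python
import math
from typing import List


def fake_quant(values: List[float], min_adj: float, max_adj: float,
               num_bits: int = 8) -> List[float]:
    """Clamp, quantize, and de-quantize `values` (hypothetical helper).

    Assumes min_adj < max_adj were already nudged and narrow_range is false,
    so the integer levels are 0 .. 2**num_bits - 1.
    """
    quant_max = 2**num_bits - 1
    scale = (max_adj - min_adj) / quant_max
    out = []
    for x in values:
        # 1. Clamp into the adjusted interval [min_adj, max_adj].
        clamped = min(max(x, min_adj), max_adj)
        # 2. Quantize to the nearest integer level; the shifted value is
        #    never negative, so floor(v + 0.5) rounds ties upward.
        level = math.floor((clamped - min_adj) / scale + 0.5)
        # 3. De-quantize back to a float: the result never leaves floating point.
        out.append(min_adj + level * scale)
    return out
```

With `num_bits=2` and the range `[0, 1]`, every output snaps to one of the four floats `0`, `1/3`, `2/3`, or `1`; the values are quantized but stay in floating point, which is what makes the quantization fake.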
@@ -10,6 +10,15 @@ when `narrow_range` is false and `[1; 2^num_bits - 1]` when it is true) and
then de-quantized and output as floats in the `[min; max]` interval.
`num_bits` is the bitwidth of the quantization and must be between 2 and 16,
inclusive.

Before quantization, the `min` and `max` values are adjusted ("nudged")
according to the following logic.
It is recommended that `min <= 0 <= max`. If `0` is not in the range of values,
the range is shifted so that one of its endpoints becomes `0`, which can be
unexpected:
If `0 < min < max`: `min_adj = 0` and `max_adj = max - min`.
If `min < max < 0`: `min_adj = min - max` and `max_adj = 0`.
If `min <= 0 <= max`: `scale = (max - min) / (2^num_bits - 1)` (the denominator
is `2^num_bits - 2` when `narrow_range` is true),
`min_adj = scale * round(min / scale)` and `max_adj = max + min_adj - min`.

This operation has a gradient and thus allows for training `min` and `max`
values.
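The description only states that a gradient exists. One common choice, assumed here rather than taken from this commit, is a straight-through estimator: inside `[min_adj; max_adj]` the rounding is treated as the identity, while inputs clamped below `min_adj` (or above `max_adj`) pass their gradient to `min` (or `max`). `fake_quant_grad` is a hypothetical name, and the sketch ignores how the nudging step itself depends on `min` and `max`.

```python
from typing import List, Tuple


def fake_quant_grad(grads: List[float], values: List[float],
                    min_adj: float, max_adj: float
                    ) -> Tuple[List[float], float, float]:
    """Straight-through-style gradient sketch (an assumption, not the commit's).

    Returns (grad_wrt_inputs, grad_wrt_min, grad_wrt_max).
    """
    grad_inputs: List[float] = []
    grad_min = 0.0
    grad_max = 0.0
    for g, x in zip(grads, values):
        if x < min_adj:
            # Output is pinned to min_adj, so the gradient flows to `min`.
            grad_inputs.append(0.0)
            grad_min += g
        elif x > max_adj:
            # Output is pinned to max_adj, so the gradient flows to `max`.
            grad_inputs.append(0.0)
            grad_max += g
        else:
            # Inside the range, rounding is treated as the identity.
            grad_inputs.append(g)
    return grad_inputs, grad_min, grad_max
```

For example, `fake_quant_grad([1.0, 2.0, 3.0], [-5.0, 0.5, 5.0], 0.0, 1.0)` returns `([0.0, 2.0, 0.0], 1.0, 3.0)`.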
END
@@ -11,6 +11,15 @@ when `narrow_range` is false and `[1; 2^num_bits - 1]` when it is true) and
then de-quantized and output as floats in the `[min; max]` interval.
`num_bits` is the bitwidth of the quantization and must be between 2 and 16,
inclusive.

Before quantization, the `min` and `max` values are adjusted ("nudged")
according to the following logic.
It is recommended that `min <= 0 <= max`. If `0` is not in the range of values,
the range is shifted so that one of its endpoints becomes `0`, which can be
unexpected:
If `0 < min < max`: `min_adj = 0` and `max_adj = max - min`.
If `min < max < 0`: `min_adj = min - max` and `max_adj = 0`.
If `min <= 0 <= max`: `scale = (max - min) / (2^num_bits - 1)` (the denominator
is `2^num_bits - 2` when `narrow_range` is true),
`min_adj = scale * round(min / scale)` and `max_adj = max + min_adj - min`.

This operation has a gradient and thus allows for training `min` and `max`
values.
END