Update a stale comment in NEON kernels.

PiperOrigin-RevId: 246073838
Benoit Jacob 2019-04-30 20:45:28 -07:00 committed by TensorFlower Gardener
parent 7114b99543
commit e0df2327cf

@@ -508,18 +508,26 @@ void Kernel8bitNeonOutOfOrder(const KernelParams8bit<4, 4>& params) {
 "sqrdmulh v18.4s, v18.4s, v15.4s\n"
 "sqrdmulh v19.4s, v19.4s, v15.4s\n"
-// We have some rounding
-// division-by-power-of-two to do. Normally, this should be just
-// a rounding-right-shift, srshl. However, that does not quite
-// implement the round-to-nearest semantics that we need. See
-// Appendix B of https://arxiv.org/pdf/1712.05877.pdf
+// We have some rounding division-by-power-of-two to do. This should
+// always use "round to nearest". We allow for some
+// freedom in how ties are broken, to strike a good compromise of
+// performance on given hardware vs. perfect agreement of results
+// across hardware.
 //
-// Because we are going to get benchmarked against less-careful
-// competition, let's give people the ability to get faster, less
-// careful arithmetic if they want --- define RUY_SLOPPY. We don't
-// recommend using that in production, we have observed measurable
-// loss of accuracy from this on MobileNets (which is how we noticed
-// this whole issue in the first place).
+// When RUY_OPT_NATIVE_ROUNDING is enabled, we allow for implementation
+// defined tie-breaks to help performance. On NEON, this means that we
+// can just use the NEON rounding instructions, such as srshl. They
+// happen to be breaking ties upward.
+//
+// When RUY_OPT_NATIVE_ROUNDING is disabled, we implement strict
+// break-ties-away-from zero, as described in Appendix B of
+// https://arxiv.org/pdf/1712.05877.pdf
+// When we wrote that, we thought that that would be better unbiased
+// than the NEON upwards tie-breaks, and we had observed some
+// improvement on some model. However, that is only more unbiased for
+// data centered at zero, which was likely the case in that model,
+// but is not always the case. If we wanted something more consistently
+// unbiased then we should try breaking ties toward-nearest-even.
 #if !(RUY_OPT_SET & RUY_OPT_NATIVE_ROUNDING)
 // Fix up values to be right-shifted, so that the (round to nearest,
 // break ties upward) behavior of srshl applied to these fixed-up
@@ -1434,17 +1442,26 @@ void Kernel8bitNeonInOrder(const KernelParams8bit<4, 4>& params) {
 "ldr x4, [%[rhs_ptr], #56]\n"
 "sqrdmulh v19.4s, v19.4s, v15.4s\n"
-// We have some rounding
-// division-by-power-of-two to do. Normally, this should be just
-// a rounding-right-shift, srshl. However, that does not quite
-// implement the round-to-nearest semantics that we need. See
-// Appendix B of https://arxiv.org/pdf/1712.05877.pdf
-// Because we are going to get benchmarked against less-careful
-// competition, let's give people the ability to get faster, less
-// careful arithmetic if they want --- define RUY_SLOPPY. We don't
-// recommend using that in production, we have observed measurable
-// loss of accuracy from this on MobileNets (which is how we noticed
-// this whole issue in the first place).
+// We have some rounding division-by-power-of-two to do. This should
+// always use "round to nearest". We allow for some
+// freedom in how ties are broken, to strike a good compromise of
+// performance on given hardware vs. perfect agreement of results
+// across hardware.
+//
+// When RUY_OPT_NATIVE_ROUNDING is enabled, we allow for implementation
+// defined tie-breaks to help performance. On NEON, this means that we
+// can just use the NEON rounding instructions, such as srshl. They
+// happen to be breaking ties upward.
+//
+// When RUY_OPT_NATIVE_ROUNDING is disabled, we implement strict
+// break-ties-away-from zero, as described in Appendix B of
+// https://arxiv.org/pdf/1712.05877.pdf
+// When we wrote that, we thought that that would be better unbiased
+// than the NEON upwards tie-breaks, and we had observed some
+// improvement on some model. However, that is only more unbiased for
+// data centered at zero, which was likely the case in that model,
+// but is not always the case. If we wanted something more consistently
+// unbiased then we should try breaking ties toward-nearest-even.
 #if !(RUY_OPT_SET & RUY_OPT_NATIVE_ROUNDING)
 // Fix up values to be right-shifted, so that the (round to nearest,
 // break ties upward) behavior of srshl applied to these fixed-up
@@ -2485,17 +2502,26 @@ void Kernel8bitNeonDotprodOutOfOrder(const KernelParams8bit<8, 8>& params) {
 "sqrdmulh v30.4s, v30.4s, v14.4s\n"
 "sqrdmulh v31.4s, v31.4s, v15.4s\n"
-// We have some rounding
-// division-by-power-of-two to do. Normally, this should be just
-// a rounding-right-shift, srshl. However, that does not quite
-// implement the round-to-nearest semantics that we need. See
-// Appendix B of https://arxiv.org/pdf/1712.05877.pdf
-// Because we are going to get benchmarked against less-careful
-// competition, let's give people the ability to get faster, less
-// careful arithmetic if they want --- define RUY_SLOPPY. We don't
-// recommend using that in production, we have observed measurable
-// loss of accuracy from this on MobileNets (which is how we noticed
-// this whole issue in the first place).
+// We have some rounding division-by-power-of-two to do. This should
+// always use "round to nearest". We allow for some
+// freedom in how ties are broken, to strike a good compromise of
+// performance on given hardware vs. perfect agreement of results
+// across hardware.
+//
+// When RUY_OPT_NATIVE_ROUNDING is enabled, we allow for implementation
+// defined tie-breaks to help performance. On NEON, this means that we
+// can just use the NEON rounding instructions, such as srshl. They
+// happen to be breaking ties upward.
+//
+// When RUY_OPT_NATIVE_ROUNDING is disabled, we implement strict
+// break-ties-away-from zero, as described in Appendix B of
+// https://arxiv.org/pdf/1712.05877.pdf
+// When we wrote that, we thought that that would be better unbiased
+// than the NEON upwards tie-breaks, and we had observed some
+// improvement on some model. However, that is only more unbiased for
+// data centered at zero, which was likely the case in that model,
+// but is not always the case. If we wanted something more consistently
+// unbiased then we should try breaking ties toward-nearest-even.
 #if !(RUY_OPT_SET & RUY_OPT_NATIVE_ROUNDING)
 // Fix up values to be right-shifted, so that the (round to nearest,
 // break ties upward) behavior of srshl applied to these fixed-up
@@ -3504,17 +3530,26 @@ void Kernel8bitNeonDotprodInOrder(const KernelParams8bit<8, 8>& params) {
 "sqrdmulh v30.4s, v30.4s, v14.4s\n"
 "sqrdmulh v31.4s, v31.4s, v15.4s\n"
-// We have some rounding
-// division-by-power-of-two to do. Normally, this should be just
-// a rounding-right-shift, srshl. However, that does not quite
-// implement the round-to-nearest semantics that we need. See
-// Appendix B of https://arxiv.org/pdf/1712.05877.pdf
-// Because we are going to get benchmarked against less-careful
-// competition, let's give people the ability to get faster, less
-// careful arithmetic if they want --- define RUY_SLOPPY. We don't
-// recommend using that in production, we have observed measurable
-// loss of accuracy from this on MobileNets (which is how we noticed
-// this whole issue in the first place).
+// We have some rounding division-by-power-of-two to do. This should
+// always use "round to nearest". We allow for some
+// freedom in how ties are broken, to strike a good compromise of
+// performance on given hardware vs. perfect agreement of results
+// across hardware.
+//
+// When RUY_OPT_NATIVE_ROUNDING is enabled, we allow for implementation
+// defined tie-breaks to help performance. On NEON, this means that we
+// can just use the NEON rounding instructions, such as srshl. They
+// happen to be breaking ties upward.
+//
+// When RUY_OPT_NATIVE_ROUNDING is disabled, we implement strict
+// break-ties-away-from zero, as described in Appendix B of
+// https://arxiv.org/pdf/1712.05877.pdf
+// When we wrote that, we thought that that would be better unbiased
+// than the NEON upwards tie-breaks, and we had observed some
+// improvement on some model. However, that is only more unbiased for
+// data centered at zero, which was likely the case in that model,
+// but is not always the case. If we wanted something more consistently
+// unbiased then we should try breaking ties toward-nearest-even.
 #if !(RUY_OPT_SET & RUY_OPT_NATIVE_ROUNDING)
 // Fix up values to be right-shifted, so that the (round to nearest,
 // break ties upward) behavior of srshl applied to these fixed-up