Rewrite a useless std::memory_order_release as std::memory_order_relaxed.

This is safe because these stores happen before the call to Execute, and
other threads only consume these values after Execute has communicated with
them, which involves a release-store anyway.

This might matter because the Compiler Explorer output linked below suggests that, at least on ARM32, each release-store may involve a __sync_synchronize, and the compiler may not know to keep only the last memory barrier in a loop that only performs stores:
https://godbolt.org/z/HYijeI

PiperOrigin-RevId: 259982740
Benoit Jacob 2019-07-25 11:09:09 -07:00 committed by TensorFlower Gardener
parent 64105e1c41
commit 625f9aad07

@@ -237,7 +237,7 @@ void TrMul(TrMulParams* params, Context* context) {
       const int size = NumBlocksPerSide(side, block_map);
       allocator->Allocate(size, &packed[side]);
       for (int i = 0; i < size; i++) {
-        packed[side][i].store(false, std::memory_order_release);
+        packed[side][i].store(false, std::memory_order_relaxed);
       }
     }
   }
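
The reasoning in the commit message can be illustrated with a minimal standalone sketch. It is not the ruy code: `CountClearedFlagsSeenByWorker`, `flags`, and `ready` are hypothetical names, and the single `ready` flag is an assumed stand-in for whatever release-store Execute performs when it hands work to other threads. The flags are cleared with relaxed stores in a loop, and one later release-store, paired with an acquire-load in the consumer, is what makes those relaxed stores visible.

```cpp
#include <atomic>
#include <memory>
#include <thread>

// Hypothetical stand-in for the packed[side] flags in the diff above.
// Returns how many flags the worker thread observed as false.
int CountClearedFlagsSeenByWorker(int size) {
  std::unique_ptr<std::atomic<bool>[]> flags(new std::atomic<bool>[size]);
  std::atomic<bool> ready(false);  // assumed stand-in for Execute's hand-off
  int seen_false = 0;

  std::thread worker([&] {
    // This acquire-load pairs with the release-store on `ready` below.
    while (!ready.load(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
    // Everything stored before the release-store is now visible here,
    // including the relaxed stores to `flags`.
    for (int i = 0; i < size; i++) {
      if (!flags[i].load(std::memory_order_relaxed)) {
        seen_false++;
      }
    }
  });

  // Relaxed stores: no per-iteration barrier is needed in this loop.
  for (int i = 0; i < size; i++) {
    flags[i].store(false, std::memory_order_relaxed);
  }

  // A single release-store publishes all of the stores above.
  ready.store(true, std::memory_order_release);

  worker.join();  // join also orders the worker's write to seen_false
  return seen_false;
}
```

Under these assumptions, the worker is guaranteed to see every flag as false even though none of the individual flag stores uses release ordering, which is the property the commit relies on.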