Rewrite a useless std::memory_order_release as std::memory_order_relaxed.

This is safe because these stores happen before Execute is called, and the values are only consumed by other threads after Execute has communicated with those threads, which involves a release-store anyway. The change might matter for performance: this Compiler Explorer example suggests that, at least on ARM32, each release-store may involve a __sync_synchronize, and that the compiler may not know to keep only the last memory barrier in a loop that performs only stores: https://godbolt.org/z/HYijeI

PiperOrigin-RevId: 259982740
2019-07-25 11:09:09 -07:00
commit 625f9aad07
parent 64105e1c41
1 changed file with 1 addition and 1 deletion
--- a/tensorflow/lite/experimental/ruy/trmul.cc
+++ b/tensorflow/lite/experimental/ruy/trmul.cc
@@ -237,7 +237,7 @@ void TrMul(TrMulParams* params, Context* context) {
      const int size = NumBlocksPerSide(side, block_map);
      allocator->Allocate(size, &packed[side]);
      for (int i = 0; i < size; i++) {
-        packed[side][i].store(false, std::memory_order_release);
+        packed[side][i].store(false, std::memory_order_relaxed);
      }
    }
  }
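The argument in the commit message (the per-block flags can be reset with relaxed stores because a later release-store publishes them) can be sketched in standalone C++. This is a minimal illustration under stated assumptions, not ruy code: `CountUnpackedSeenByWorker`, the `go` flag, and the worker thread are hypothetical stand-ins for `packed[side]` and for whatever hand-off Execute actually performs.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical sketch (not ruy's API): returns how many "packed" flags a
// worker thread observes as false after a single release/acquire hand-off.
int CountUnpackedSeenByWorker(int num_blocks) {
  assert(num_blocks >= 0);
  // Stand-in for packed[side] in TrMul: one atomic flag per block.
  std::unique_ptr<std::atomic<bool>[]> packed(
      new std::atomic<bool>[num_blocks]);
  for (int i = 0; i < num_blocks; i++) {
    packed[i].store(true, std::memory_order_relaxed);  // stale values
  }
  // Hypothetical hand-off flag, standing in for the communication that
  // Execute performs with worker threads.
  std::atomic<bool> go{false};
  int seen_false = 0;  // written only by the worker, read after join()

  std::thread worker([&] {
    // Acquire load: once it reads true, every write sequenced before the
    // release store on `go` is visible to this thread.
    while (!go.load(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
    for (int i = 0; i < num_blocks; i++) {
      if (!packed[i].load(std::memory_order_relaxed)) seen_false++;
    }
  });

  // The worker is already running, so thread creation does not order these
  // resets; they need no ordering of their own (relaxed is enough)...
  for (int i = 0; i < num_blocks; i++) {
    packed[i].store(false, std::memory_order_relaxed);
  }
  // ...because this single release store publishes all of them at once.
  go.store(true, std::memory_order_release);

  worker.join();
  return seen_false;
}
```

Calling `CountUnpackedSeenByWorker(8)` should return 8: the relaxed resets are sequenced before the release store to `go`, and the worker's acquire load that reads `true` synchronizes with it, so every reset is visible. Only that one store carries ordering, which mirrors the intent of the change: the reset loop itself does not need a barrier per iteration.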