Fix size computation logic in TransposeSimple

Use the right element type (`int32` rather than `int64`) when computing `num_bytes`. The wrong type caused the crash observed in the bug, but I could not reproduce it in a unit test (even with cuda_asan) since the `InlinedVector` always uses stack storage.

PiperOrigin-RevId: 321199018
Change-Id: I339307a2d2d098d4ad73b363b5f96c19ed65ea52
2020-07-14 11:30:37 -07:00
commit e9516a8b0e
parent 9fd2e39518
1 changed file with 1 addition and 1 deletion
--- a/tensorflow/core/kernels/transpose_functor_gpu.cu.cc
+++ b/tensorflow/core/kernels/transpose_functor_gpu.cu.cc
@@ -72,7 +72,7 @@ void TransposeSimple(const GPUDevice& d, const Tensor& in,
    host_buf[ndims * 2 + i] = perm[i];
  }
  // Copies the input strides, output strides and permutation to the device.
-  auto num_bytes = sizeof(int64) * host_buf.size();
+  auto num_bytes = sizeof(int32) * host_buf.size();
  auto dev_buf = d.allocate(num_bytes);
  // NOTE: host_buf is not allocated by GpuHostAllocator, and
  // therefore we are doing a sync copy effectively.
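
The size mismatch fixed above can be illustrated with a minimal standalone sketch. It assumes, as the fix implies, that `host_buf` holds 32-bit integers. `std::vector` stands in for the `InlinedVector`, and `ByteSize`, `MakeHostBuf`, `OldNumBytes` and `CheckSizes` are hypothetical names used only for illustration; none of them are part of TensorFlow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a TensorFlow API): derives the byte count from the
// container's own element type, so the size cannot drift from the storage.
template <typename Container>
std::size_t ByteSize(const Container& c) {
  return sizeof(typename Container::value_type) * c.size();
}

// Mirrors the layout in the diff: input strides, output strides and
// permutation, ndims entries each. The 32-bit element type is an assumption
// inferred from the fix.
inline std::vector<std::int32_t> MakeHostBuf(int ndims) {
  return std::vector<std::int32_t>(static_cast<std::size_t>(ndims) * 3);
}

// Size as the old line computed it, with a hard-coded 64-bit element.
inline std::size_t OldNumBytes(const std::vector<std::int32_t>& buf) {
  return sizeof(std::int64_t) * buf.size();
}

// For ndims = 4 the buffer has 12 elements: 48 bytes of real storage, but the
// old computation asks for 96, so a copy of that size reads 48 bytes too many.
inline void CheckSizes() {
  const std::vector<std::int32_t> buf = MakeHostBuf(4);
  assert(ByteSize(buf) == 48);
  assert(OldNumBytes(buf) == 96);
}
```

Deriving the size from `value_type` is one possible way to guard against this class of bug; the commit itself keeps the explicit `sizeof(int32)`.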