Fix size computation logic in TransposeSimple

Use the right element type when computing `num_bytes`.  The wrong type caused
the crash observed in the bug, but I could not reproduce it in a unit test (even
with cuda_asan) since the `InlinedVector` always uses stack storage.

PiperOrigin-RevId: 321199018
Change-Id: I339307a2d2d098d4ad73b363b5f96c19ed65ea52
Sanjoy Das 2020-07-14 11:30:37 -07:00 committed by TensorFlower Gardener
parent 9fd2e39518
commit e9516a8b0e

@@ -72,7 +72,7 @@ void TransposeSimple(const GPUDevice& d, const Tensor& in,
     host_buf[ndims * 2 + i] = perm[i];
   }
   // Copies the input strides, output strides and permutation to the device.
-  auto num_bytes = sizeof(int64) * host_buf.size();
+  auto num_bytes = sizeof(int32) * host_buf.size();
   auto dev_buf = d.allocate(num_bytes);
   // NOTE: host_buf is not allocated by GpuHostAllocator, and
   // therefore we are doing a sync copy effectively.
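
The size mismatch behind this change can be illustrated in isolation. The following is a minimal sketch, not code from this commit: it assumes `host_buf` holds 32-bit elements (as the fix implies), uses `std::vector` as a stand-in for the `InlinedVector`, and introduces a hypothetical helper `ByteSize` that derives the element size from the container itself so the two cannot drift apart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of TensorFlow): number of bytes occupied by
// the elements of a container, computed from the container's own value_type.
template <typename Container>
std::size_t ByteSize(const Container& c) {
  return sizeof(typename Container::value_type) * c.size();
}

// Compares the byte count from the old expression with the derived one.
inline void CheckExample() {
  // Stand-in for host_buf: 3 groups (input strides, output strides, perm)
  // of ndims = 3 entries each, stored as 32-bit integers.
  std::vector<std::int32_t> host_buf(9);

  // Old expression: assumes 64-bit elements, so it reports twice the
  // bytes the buffer actually holds.
  std::size_t wrong_bytes = sizeof(std::int64_t) * host_buf.size();

  // Derived from the element type: matches the real storage.
  std::size_t right_bytes = ByteSize(host_buf);

  assert(wrong_bytes == 2 * right_bytes);
  assert(right_bytes == host_buf.size() * sizeof(std::int32_t));
}
```

A copy sized with the old expression would read past the end of the host buffer. When that buffer lives on the stack, the over-read may go unnoticed, which is consistent with the difficulty in reproducing the crash described in the commit message.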