Fix size computation logic in TransposeSimple
Use the right type when computing `num_bytes`. The wrong type caused the crash observed in the bug, but I could not reproduce it in a unit test (even with cuda_asan) since the `InlinedVector` always uses stack storage.

PiperOrigin-RevId: 321199018
Change-Id: I339307a2d2d098d4ad73b363b5f96c19ed65ea52
parent 9fd2e39518
commit e9516a8b0e
@@ -72,7 +72,7 @@ void TransposeSimple(const GPUDevice& d, const Tensor& in,
     host_buf[ndims * 2 + i] = perm[i];
   }
   // Copies the input strides, output strides and permutation to the device.
-  auto num_bytes = sizeof(int64) * host_buf.size();
+  auto num_bytes = sizeof(int32) * host_buf.size();
   auto dev_buf = d.allocate(num_bytes);
   // NOTE: host_buf is not allocated by GpuHostAllocator, and
   // therefore we are doing a sync copy effectively.
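The size mismatch fixed in this hunk can be illustrated with a small standalone sketch. It assumes, as the fix implies, that `host_buf` holds 32-bit integers; `ByteSize` and `StageForCopy` are hypothetical helpers written for this illustration and are not part of TensorFlow. Deriving the byte count from the container's own element type keeps the count in step with the element type if that type ever changes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: byte size of a contiguous container, computed from
// the container's own element type instead of a separately written type.
template <typename Container>
std::size_t ByteSize(const Container& c) {
  return sizeof(typename Container::value_type) * c.size();
}

// Hypothetical stand-in for the host-to-device copy: copies exactly the
// bytes that host_buf owns into a destination of the same byte size.
std::vector<unsigned char> StageForCopy(
    const std::vector<std::int32_t>& host_buf) {
  const std::size_t num_bytes = ByteSize(host_buf);
  assert(num_bytes == host_buf.size() * sizeof(std::int32_t));
  std::vector<unsigned char> staged(num_bytes);
  if (num_bytes > 0) {
    std::memcpy(staged.data(), host_buf.data(), num_bytes);
  }
  return staged;
}
```

With 12 elements (three arrays for a 4-dimensional tensor), `ByteSize` gives 48 bytes, whereas `sizeof(std::int64_t) * host_buf.size()` gives 96, so a copy sized that way would read 48 bytes past the end of the buffer.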