Fix size computation logic in TransposeSimple
Use the right type when computing `num_bytes`: the buffer's elements are `int32`, not `int64`, so the old expression doubled the byte count. This caused the crash observed in the bug, but I could not reproduce it in a unit test (even with cuda_asan), since the `InlinedVector` always uses stack storage.

PiperOrigin-RevId: 321199018
Change-Id: I339307a2d2d098d4ad73b363b5f96c19ed65ea52
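A minimal C++ sketch of the miscount follows. It assumes, as the fix implies, that the packed buffer of input strides, output strides and permutation stores `int32` values. `std::vector` stands in for `gtl::InlinedVector`, and `MakeHostBuf`, `BuggyNumBytes` and `FixedNumBytes` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the packed host buffer: 3 * ndims int32 entries
// (input strides, output strides, permutation). std::vector replaces
// gtl::InlinedVector for this sketch.
std::vector<int32_t> MakeHostBuf(int ndims) {
  assert(ndims >= 0);
  return std::vector<int32_t>(static_cast<std::size_t>(ndims) * 3);
}

// Byte count as computed before the fix: wrong element type, twice too large.
std::size_t BuggyNumBytes(const std::vector<int32_t>& host_buf) {
  return sizeof(int64_t) * host_buf.size();
}

// Byte count as computed after the fix: matches the stored element type.
std::size_t FixedNumBytes(const std::vector<int32_t>& host_buf) {
  return sizeof(int32_t) * host_buf.size();
}
```

For `ndims = 4` the buffer holds 12 values (48 bytes), while the old expression asks for 96 bytes. Any copy of `num_bytes` starting at `host_buf.data()` would therefore read 48 bytes past the packed values.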
parent 9fd2e39518
commit e9516a8b0e
@@ -72,7 +72,7 @@ void TransposeSimple(const GPUDevice& d, const Tensor& in,
     host_buf[ndims * 2 + i] = perm[i];
   }
   // Copies the input strides, output strides and permutation to the device.
-  auto num_bytes = sizeof(int64) * host_buf.size();
+  auto num_bytes = sizeof(int32) * host_buf.size();
   auto dev_buf = d.allocate(num_bytes);
   // NOTE: host_buf is not allocated by GpuHostAllocator, and
   // therefore we are doing a sync copy effectively.
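The change above hard-codes `int32` to match the buffer. One way to keep the size expression from drifting away from the buffer's type again is to derive the byte count from the container's own element type. The sketch below uses a hypothetical helper, `ByteSize`. It is not what this commit or TensorFlow does, and it assumes the container exposes `value_type`, as standard containers do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of TensorFlow): byte size of a contiguous
// container, computed from its own element type so the two cannot disagree.
template <typename Container>
std::size_t ByteSize(const Container& c) {
  using Elem = typename Container::value_type;
  return sizeof(Elem) * c.size();
}

// Sanity check: the helper agrees with the corrected expression in the diff.
inline void CheckByteSize() {
  std::vector<int32_t> host_buf(12);
  assert(ByteSize(host_buf) == sizeof(int32_t) * host_buf.size());
}
```

Assuming `host_buf`'s type exposes `value_type`, the changed line could then read `auto num_bytes = ByteSize(host_buf);`. An equivalent inline form is `sizeof(host_buf[0]) * host_buf.size()`. The operand of `sizeof` is not evaluated, so that form is safe even when the buffer is empty.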