[ROCm] Raising the memory allocation cap for GPU unit tests from 1GB to 2GB
This PR updates the `parallel_gpu_execute.sh` script to raise the GPU memory allocation cap from 1GB to 2GB when running unit tests.
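
The actual change lives in `parallel_gpu_execute.sh` (linked below). As a rough illustration only, assuming the cap is expressed in MB through the `TF_PER_DEVICE_MEMORY_LIMIT_MB` env var discussed later in this note, the change amounts to something like the following (not the literal diff):

```
# Illustrative only -- the real logic lives in parallel_gpu_execute.sh.
# Old cap: 1GB of GPU memory per device for each unit test.
# export TF_PER_DEVICE_MEMORY_LIMIT_MB=1024
# New cap: 2GB per device, giving the failing tests enough headroom.
export TF_PER_DEVICE_MEMORY_LIMIT_MB=2048
```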

Recently, a couple of unit tests started failing on the ROCm platform because they were running out of memory:

```
//tensorflow/python/kernel_tests:extract_image_patches_grad_test_gpu
//tensorflow/python/ops/numpy_ops:np_interop_test_gpu
```

GPU unit tests (at least on the ROCm platform) are run with a memory cap that is set and enforced in the locations below (a sketch of the mechanism follows the list):

* https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/gpu_build/parallel_gpu_execute.sh#L26-L32
* https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/stream_executor_pimpl.cc#L130-L137
* https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/stream_executor_pimpl.cc#L151
* https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/stream_executor_pimpl.cc#L487-L503
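
Putting those pieces together, the mechanism is roughly: the wrapper script exports the limit env var and then runs the test binary, and `stream_executor` reads the var at startup and rejects allocations beyond it. A minimal sketch of such a wrapper, assuming the env var name above; the real script also handles GPU selection and test serialization, which is omitted here:

```
#!/usr/bin/env bash
# Sketch of a per-test GPU memory-cap wrapper; not the real
# parallel_gpu_execute.sh, which also picks a free GPU and serializes tests.
set -euo pipefail

# Cap (in MB) that stream_executor reads at startup and enforces on every
# device memory allocation made by the test process.
export TF_PER_DEVICE_MEMORY_LIMIT_MB=2048

# Run the test binary (and its arguments) handed to us by the test runner.
exec "$@"
```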

The `parallel_gpu_execute.sh` script no longer appears to be used on the CUDA platform (it may have been in the past); there is no reference to it in the `Invocation Details` tab of the `Linux GPU` CI job.

For example: https://source.cloud.google.com/results/invocations/09d63e6a-f7a9-4fc6-9708-2fdd40b8b193/details

GPU unit tests on the CUDA platform also do not appear to be subject to the 1GB memory cap. This can be verified by looking at the `Target Log` for the `//tensorflow/python/ops/numpy_ops:np_interop_test_gpu` test (or any other GPU unit test) in the `Linux GPU` CI job.

For example: https://source.cloud.google.com/results/invocations/09d63e6a-f7a9-4fc6-9708-2fdd40b8b193/targets/%2F%2Ftensorflow%2Fpython%2Fops%2Fnumpy_ops:np_interop_test_gpu/log

On the ROCm platform, we see the following log messages, which are generated as a consequence of the memory cap when TF tries to grab the entire available GPU memory at startup:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/stream_executor_pimpl.cc#L488-L494

```
 W tensorflow/stream_executor/stream_executor_pimpl.cc:490] Not enough memory to allocate 16133306368 on device 0 within provided limit. [used=0, limit=1073741824]
 W tensorflow/stream_executor/stream_executor_pimpl.cc:490] Not enough memory to allocate 14519974912 on device 0 within provided limit. [used=0, limit=1073741824]
 W tensorflow/stream_executor/stream_executor_pimpl.cc:490] Not enough memory to allocate 13067976704 on device 0 within provided limit. [used=0, limit=1073741824]
 W tensorflow/stream_executor/stream_executor_pimpl.cc:490] Not enough memory to allocate 11761178624 on device 0 within provided limit. [used=0, limit=1073741824]
...
...
...
```

These messages are not present in the unit-test logs for the `Linux GPU` CI job, which suggests that the env var `TF_PER_DEVICE_MEMORY_LIMIT_MB` is not set when those tests are run (either that, or the GPUs running the tests have only 1GB of total memory, which is unlikely).
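
For what it's worth, one way to reproduce the warnings above locally is to force the old 1GB cap onto a single test via bazel's `--test_env` flag. This is just a suggested check; `--config=rocm` assumes a ROCm build of TF:

```
# Re-run one of the failing tests with the old 1GB cap to trigger the
# "Not enough memory to allocate ... within provided limit" warnings.
bazel test \
  --config=rocm \
  --test_env=TF_PER_DEVICE_MEMORY_LIMIT_MB=1024 \
  //tensorflow/python/ops/numpy_ops:np_interop_test_gpu
```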