[ROCm] Raising the memory allocation cap for GPU unit tests from 1GB to 2GB

This commit updates the `parallel_gpu_execute.sh` script to raise the GPU memory allocation cap from 1GB to 2GB when running unit tests.

Recently a couple of unit tests started failing on the ROCm platform because they were running out of memory:

```
//tensorflow/python/kernel_tests:extract_image_patches_grad_test_gpu
//tensorflow/python/ops/numpy_ops:np_interop_test_gpu
```

GPU unit tests (at least on the ROCm platform) are run with a memory cap that is set and enforced as shown here (a simplified sketch follows the list of links):

* https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/gpu_build/parallel_gpu_execute.sh#L26-L32
* https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/stream_executor_pimpl.cc#L130-L137
* https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/stream_executor_pimpl.cc#L151
* https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/stream_executor_pimpl.cc#L487-L503
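
In other words, `parallel_gpu_execute.sh` exports `TF_PER_DEVICE_MEMORY_LIMIT_MB`, and the StreamExecutor allocation path reads that env var and rejects any device allocation that would push usage past the cap. Below is a minimal, self-contained C++ sketch of that logic, not the actual TF code; `CappedDeviceAllocator` and its members are made-up names for illustration:

```cpp
#include <cstdint>
#include <cstdlib>
#include <iostream>

// Hypothetical stand-in for the capped StreamExecutor allocation path.
// Reads the per-device cap from TF_PER_DEVICE_MEMORY_LIMIT_MB (unset/0
// means "no limit"), then rejects any allocation that would exceed it.
class CappedDeviceAllocator {
 public:
  CappedDeviceAllocator() {
    if (const char* env = std::getenv("TF_PER_DEVICE_MEMORY_LIMIT_MB")) {
      limit_bytes_ = static_cast<int64_t>(std::atoll(env)) * 1024 * 1024;
    }
  }

  // Returns true if the allocation fits under the cap; otherwise logs a
  // warning mirroring the one emitted by stream_executor_pimpl.cc.
  bool Allocate(int64_t size, int device_ordinal) {
    if (limit_bytes_ > 0 && used_bytes_ + size > limit_bytes_) {
      std::cerr << "W Not enough memory to allocate " << size
                << " on device " << device_ordinal
                << " within provided limit. [used=" << used_bytes_
                << ", limit=" << limit_bytes_ << "]\n";
      return false;
    }
    used_bytes_ += size;  // Real code would also perform the device malloc.
    return true;
  }

 private:
  int64_t limit_bytes_ = 0;  // 0 => unlimited (env var unset).
  int64_t used_bytes_ = 0;
};
```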

The `parallel_gpu_execute.sh` script does not seem to be used on the CUDA platform anymore (it may have been in the past); there is no reference to it in the `Invocation Details` tab of the `Linux GPU` CI job.

For example: https://source.cloud.google.com/results/invocations/09d63e6a-f7a9-4fc6-9708-2fdd40b8b193/details

GPU unit tests on the CUDA platform also do not appear to be subjected to the 1GB memory cap. This can be verified by looking at the `Target Log` for the `//tensorflow/python/ops/numpy_ops:np_interop_test_gpu` test (or, in fact, any GPU unit test) in the `Linux GPU` CI job.

For example: https://source.cloud.google.com/results/invocations/09d63e6a-f7a9-4fc6-9708-2fdd40b8b193/targets/%2F%2Ftensorflow%2Fpython%2Fops%2Fnumpy_ops:np_interop_test_gpu/log

On the ROCm platform, we see the following log messages, which are generated as a consequence of the memory cap when TF tries to grab the entire available GPU memory on startup:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/stream_executor_pimpl.cc#L488-L494

```
 W tensorflow/stream_executor/stream_executor_pimpl.cc:490] Not enough memory to allocate 16133306368 on device 0 within provided limit. [used=0, limit=1073741824]
 W tensorflow/stream_executor/stream_executor_pimpl.cc:490] Not enough memory to allocate 14519974912 on device 0 within provided limit. [used=0, limit=1073741824]
 W tensorflow/stream_executor/stream_executor_pimpl.cc:490] Not enough memory to allocate 13067976704 on device 0 within provided limit. [used=0, limit=1073741824]
 W tensorflow/stream_executor/stream_executor_pimpl.cc:490] Not enough memory to allocate 11761178624 on device 0 within provided limit. [used=0, limit=1073741824]
...
...
...
```
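
Note the ~0.9 ratio between successive requests in the log: when the initial attempt to grab all free GPU memory fails against the cap, TF retries with a smaller request. The following is a hypothetical sketch of that back-off loop, inferred from the log output above rather than copied from the TF sources (`GrabGpuMemoryWithBackoff` and `TryAllocate` are made-up names):

```cpp
#include <cstdint>

// Hypothetical sketch of the startup behavior implied by the log above:
// ask for all free GPU memory first, and on each rejected attempt retry
// with ~90% of the previous request until one succeeds. `TryAllocate`
// stands in for the capped StreamExecutor allocation call.
int64_t GrabGpuMemoryWithBackoff(int64_t free_bytes,
                                 bool (*TryAllocate)(int64_t)) {
  int64_t request = free_bytes;
  constexpr int64_t kMinRequest = 1 << 20;  // Give up below 1MB.
  while (request > kMinRequest) {
    if (TryAllocate(request)) {
      return request;  // Succeeded; this becomes the GPU memory pool size.
    }
    // Each failed attempt printed one "Not enough memory ..." warning;
    // shrink the request and retry (16.1GB -> 14.5GB -> 13.1GB -> ...).
    request = static_cast<int64_t>(request * 0.9);
  }
  return 0;  // Could not allocate anything within the limit.
}
```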

These warning messages are not present in the unit test logs for the `Linux GPU` CI job, which suggests that the env var `TF_PER_DEVICE_MEMORY_LIMIT_MB` is not set when those tests are run. Either that, or the GPU on which the tests run has only 1GB of total memory, which is unlikely.
The commit changes 2 files, with 2 additions and 8 deletions: the `BUILD` change removes the `no_rocm` tag from `np_interop_test` (one of the failing tests listed above), and the script change raises the default memory cap to 2GB.

`tensorflow/python/ops/numpy_ops/BUILD`:
```diff
@@ -110,7 +110,6 @@ cuda_py_test(
 cuda_py_test(
     name = "np_interop_test",
     srcs = ["np_interop_test.py"],
-    tags = ["no_rocm"],
     deps = [
         ":numpy",
         "//tensorflow:tensorflow_py",
```

`tensorflow/tools/ci_build/gpu_build/parallel_gpu_execute.sh`:

```diff
@@ -23,13 +23,8 @@
 TF_GPU_COUNT=${TF_GPU_COUNT:-4}
 TF_TESTS_PER_GPU=${TF_TESTS_PER_GPU:-8}
-# We want to allow running one of the following configs:
-# - 4 tests per GPU on k80
-# - 8 tests per GPU on p100
-# p100 has minimum 12G memory. Therefore, we should limit each test to 1.5G.
-# To leave some room in case we want to run more tests in parallel in the
-# future and to use a rounder number, we set it to 1G.
-export TF_PER_DEVICE_MEMORY_LIMIT_MB=${TF_PER_DEVICE_MEMORY_LIMIT_MB:-1024}
+export TF_PER_DEVICE_MEMORY_LIMIT_MB=${TF_PER_DEVICE_MEMORY_LIMIT_MB:-2048}
 # *******************************************************************
 # This section of the script is needed to
```