[ROCm] Fix for ROCm CSB breakage - 200527

The following commit introduces a new unit-test which fails on ROCm.

dbef0933eb

I think that this unit-test is for checking the reduced memory usage of the gradient checkpointing method.

The sub-test `test_does_not_raise_oom_exception` fails on ROCm because, on the ROCm platform, the scratch space required for the backward convolution pushes the total memory allocation just beyond the 1 GB (1024 MB) limit imposed by the testcase.

This fix raises the threshold by 128 MB (from 1024 MB to 1152 MB). This still preserves the intent of the unit-test, i.e. the `test_raises_oom_exception` sub-test continues to raise the exception, while also allowing the `test_does_not_raise_oom_exception` sub-test to pass on the ROCm platform.
Deven Desai 2020-05-27 17:17:39 +00:00
parent b847ff9b30
commit 1c2527fd11

@@ -75,7 +75,7 @@ def _limit_gpu_memory():
   if gpus:
     tf.config.experimental.set_virtual_device_configuration(
         gpus[0],
-        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
+        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1152)])
     return True
   return False
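
For readers without the full test file at hand, the helper touched by this change can be sketched roughly as follows. This is a minimal standalone approximation, not the code in the repository: the names `limit_gpu_memory`, `OLD_LIMIT_MB`, `HEADROOM_MB` and `MEMORY_LIMIT_MB` are hypothetical, the guarded import is added only so the snippet loads without TensorFlow installed, and the `list_physical_devices` call is an assumption about how `gpus` is obtained (that line is outside the hunk shown above).

```python
# Sketch only: names and the guarded import are illustrative assumptions,
# not part of the actual test file.
try:
    import tensorflow as tf
except ImportError:
    tf = None  # lets the sketch load on machines without TensorFlow

OLD_LIMIT_MB = 1024   # limit before this commit
HEADROOM_MB = 128     # extra room for ROCm backward-convolution scratch space
MEMORY_LIMIT_MB = OLD_LIMIT_MB + HEADROOM_MB  # limit after this commit


def limit_gpu_memory(limit_mb=MEMORY_LIMIT_MB):
    """Cap the first visible GPU at `limit_mb` MB; return True if a cap was set."""
    if tf is None:
        return False
    # Assumption: the real helper obtains `gpus` in a similar way.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if gpus:
        # Must run before the GPU is initialized, otherwise TensorFlow
        # raises a RuntimeError.
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(
                memory_limit=limit_mb)])
        return True
    return False
```

The `memory_limit` argument is expressed in megabytes, so the new value corresponds to the 1024 MB + 128 MB described in the commit message. A test would typically call the helper once at start-up and skip the OOM checks when it returns `False`.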