From 4cb805dba2d406debcfeea724efd3f973589c0f3 Mon Sep 17 00:00:00 2001
From: Deven Desai
Date: Tue, 2 Jun 2020 14:47:24 +0000
Subject: [PATCH] [ROCm] Adding no_rocm tag to a unit-test regression.

The following commit, which bumps the LLVM commit pointer, introduces a
unit-test regression on the ROCm platform:

https://github.com/tensorflow/tensorflow/commit/4de4c60972da38d09662842614ad4dcfd019a6be

The above commit contains changes other than the LLVM commit pointer bump,
but it is the change in LLVM version that seems to be causing the regression.

The failing unit-test is

```
//tensorflow/python/keras/optimizer_v2:adam_test_gpu FAILED in 13.7s
...
...
[ RUN ] NonFusedAdamOptimizerTest.testSparse
2020-06-02 11:32:19.520540: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
2020-06-02 11:32:19.661617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1562] Found device 0 with properties:
pciBusID: 0000:23:00.0 name: Vega 10 [Radeon Instinct MI25] ROCm AMD GPU ISA: gfx900
...
...
2020-06-02 11:32:28.304094: I tensorflow/compiler/jit/xla_compilation_cache.cc:314] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
Memory access fault by GPU node-4 (Agent handle: 0x345e4b0) on address 0x7f4200258000.
Reason: Page not present or supervisor privilege.
Fatal Python error: Aborted
...
```

This commit puts a no_rocm tag on that test to get the ROCm CSB to pass
while we root-cause the regression and come up with a fix.
---
 tensorflow/python/keras/optimizer_v2/BUILD | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tensorflow/python/keras/optimizer_v2/BUILD b/tensorflow/python/keras/optimizer_v2/BUILD
index 9e844b41332..be793378538 100644
--- a/tensorflow/python/keras/optimizer_v2/BUILD
+++ b/tensorflow/python/keras/optimizer_v2/BUILD
@@ -105,6 +105,7 @@ cuda_py_test(
     size = "medium",
     srcs = ["adam_test.py"],
     shard_count = 4,
+    tags = ["no_rocm"],
     deps = [
         ":optimizer_v2",
         "//tensorflow/python:client_testlib",