Tensorflow GPU tests failing for TF 2.16 and 2.17 #46150

Open
@smuzaffar

@cms-sw/ml-l2, we are trying to update TensorFlow for CMSSW. We have TF 2.16 in the CMSSW_14_2_TF_X_2024-09-24-1100 IB and TF 2.17 in this week's IB (will be available soon). We noticed that the GPU tests are failing [a] with these newer TF versions. Can you please look into these failures?

[a] Run on an lxplus-gpu node:

> ssh lxplus-gpu
> cd /tmp/$(whoami)
> cmssw-el8 --nv
Singularity> scram p CMSSW_14_2_TF_X_2024-09-24-1100
Singularity> cmsenv
Singularity> git cms-addpkg PhysicsTools/TensorFlow
Singularity> scram build -j 8
Singularity> cmsenv
Singularity> ./test/el8_amd64_gcc12/testTFHelloWorldCUDA
Running .
46.0

CUDA service enabled: 1
Testing CUDA backend
E

##Failure Location unknown## : Error
Test name: testHelloWorldCUDA::test
uncaught exception of type std::exception (or derived).
- An exception of category 'UnavailableAccelerator' occurred while
   [0] Calling tensorflow::setBackend()
Exception Message:
Cuda backend requested, NVIDIA GPU visible to cmssw, but not visible to TensorFlow in the job

Failures !!!
Run: 1   Failure total: 1   Failures: 0   Errors: 1
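
The exception above says the GPU is visible to CMSSW but not to TensorFlow. A quick way to narrow this down might be to ask TensorFlow directly what it sees from inside the same container and environment. The following is only a minimal sketch: it assumes the Python `tensorflow` module in the CMSSW environment comes from the same build as the C++ library used by `testTFHelloWorldCUDA`, which may not hold. The helper name `tf_gpu_report` is hypothetical and not part of CMSSW.

```python
import importlib.util
import os


def tf_gpu_report():
    """Collect basic facts about GPU visibility from TensorFlow's point of view.

    Hypothetical helper for illustration; not part of CMSSW or TensorFlow.
    """
    report = {
        # If this is set to an empty string or "-1", CUDA devices are hidden.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "tensorflow_installed": importlib.util.find_spec("tensorflow") is not None,
    }
    if not report["tensorflow_installed"]:
        return report

    import tensorflow as tf

    report["tf_version"] = tf.__version__
    # False here would mean this TensorFlow build has no CUDA support at all.
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    # An empty list means TensorFlow could not register any GPU device.
    report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report


if __name__ == "__main__":
    for key, value in tf_gpu_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is true but `gpus` is empty, TensorFlow's own log messages printed during the import and device query may indicate which CUDA library failed to load. This only reflects the Python side, so it is a hint rather than a diagnosis of the C++ test failure.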
