Context
I'm switching from my Mac to Rosenberg.
Issue
Torch version
In the `WandbTrainer`, autocast is currently called as `autocast(self.device.type, enabled=True)`, but recent torch versions no longer accept the device argument there, which raises an error. This should be changed to `autocast(enabled=True)`. The problem didn't show up on the Mac because I wasn't using CUDA.
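A small compatibility shim can avoid hard-coding either signature. This is only a sketch: `make_autocast` is a hypothetical helper (not part of the trainer), and it assumes the new-style context manager exposes a `device_type` parameter while the old-style one does not, which it detects via `inspect.signature`.

```python
import inspect


def make_autocast(autocast_cls, device_type: str, enabled: bool = True):
    """Build an autocast context, passing device_type only if accepted.

    Hypothetical helper: works with any context-manager class whose
    constructor either takes a leading ``device_type`` argument
    (new-style) or only ``enabled=...`` (old-style).
    """
    params = inspect.signature(autocast_cls).parameters
    if "device_type" in params:
        # New-style signature, e.g. autocast("cuda", enabled=True)
        return autocast_cls(device_type, enabled=enabled)
    # Old-style signature, e.g. autocast(enabled=True)
    return autocast_cls(enabled=enabled)
```

With this, the trainer could call `make_autocast(autocast, self.device.type)` and keep working across torch versions, at the cost of one reflection call per context creation.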
CUDA Capability
Rosenberg's GPUs are too old for the Triton compile backend:
raise GPUTooOldForTriton(device_props, inspect.currentframe())
torch._inductor.exc.GPUTooOldForTriton: Found Tesla P100-SXM2-16GB which is too old to be supported by the triton GPU compiler, which is used as the backend. Triton only supports devices of CUDA Capability >= 7.0, but your device is of CUDA capability 6.0
I tried to:
- use torch < 2.0 --> requires a venv with Python 3.10 --> Python 3.10 is not installed and I can't install it on the VM ❌
- use a Docker container --> Docker can't access the right GPU driver (most likely a compatibility issue) ❌
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
- change the do_i_compile function in the nnUnetTrainer to always return False --> we no longer compile --> no more errors ✅
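Rather than always returning False, the hook could gate compilation on the GPU's actual CUDA capability, so newer machines still benefit from `torch.compile`. This is a hypothetical sketch: `should_compile` is an assumed helper, and the (7, 0) threshold comes from the Triton error message above; the commented trainer override only illustrates how it might be wired in.

```python
def should_compile(device_capability: tuple) -> bool:
    """Return True only if the GPU meets Triton's minimum CUDA capability.

    Triton requires CUDA capability >= 7.0 (per the error above); the
    P100 on Rosenberg reports (6, 0), so this returns False there.
    """
    major, minor = device_capability
    return (major, minor) >= (7, 0)


# Assumed usage inside the trainer (names follow the issue, untested):
# class CapabilityAwareTrainer(nnUnetTrainer):
#     def do_i_compile(self):
#         if not torch.cuda.is_available():
#             return False
#         return should_compile(torch.cuda.get_device_capability())
```

This keeps the workaround self-documenting: the capability check explains *why* compilation is skipped, instead of a bare `return False`.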