Compatibility issues #26

@Luugaaa

Description

Context

I'm switching from my Mac to Rosenberg.

Issue

Torch version

In the WandbTrainer, we're using autocast as `autocast(self.device.type, enabled=True)`, but in recent torch versions the device is no longer passed to autocast, which raises an error. This should be changed to `autocast(enabled=True)`. This didn't show up on Mac because I wasn't using CUDA.
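A defensive way to handle this is a small wrapper that tries both signatures instead of hard-coding one. This is only a sketch: `amp_autocast` and the fallback chain are my own names, not part of the WandbTrainer, and it degrades to a no-op context when torch (or a working autocast) isn't available.

```python
import contextlib


def amp_autocast(device_type: str, enabled: bool = True):
    """Return an autocast context that works across torch versions (sketch).

    Tries the device-typed signature first, then the device-less form
    described in this issue, then falls back to a no-op context.
    """
    try:
        import torch
    except ImportError:
        return contextlib.nullcontext()  # torch absent: run without autocast

    for call in (
        lambda: torch.autocast(device_type, enabled=enabled),  # device-typed form
        lambda: torch.autocast(enabled=enabled),               # device-less form from this issue
    ):
        try:
            return call()
        except TypeError:
            continue  # signature not accepted by this torch version
    return contextlib.nullcontext()
```

In the trainer this would be a drop-in replacement: `with amp_autocast(self.device.type): ...`.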

Cuda Capability

Rosenberg's GPUs are too old for the Triton compiler that `torch.compile` uses as its backend:

```
    raise GPUTooOldForTriton(device_props, inspect.currentframe())
torch._inductor.exc.GPUTooOldForTriton: Found Tesla P100-SXM2-16GB which is too old to be supported by the triton GPU compiler, which is used as the backend. Triton only supports devices of CUDA Capability >= 7.0, but your device is of CUDA capability 6.0
```

I tried to:

- use torch < 2.0 --> requires a venv with Python 3.10, which is not installed and which I can't install on the VM ❌
- use a Docker container --> Docker can't access the right GPU driver (most likely a compatibility issue) ❌
  ```
  WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
     Use the NVIDIA Container Toolkit to start this container with GPU support; see
     https://docs.nvidia.com/datacenter/cloud-native/ .
  ```
- change the `do_i_compile` function in the nnUnetTrainer to always return `False` --> we don't compile anymore --> no more errors ✅
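Rather than always returning `False`, the override could check the GPU's CUDA capability so that newer GPUs keep the `torch.compile` speed-up. A hedged sketch, assuming torch is installed; the method name comes from this issue, the body is illustrative:

```python
def do_i_compile() -> bool:
    """Only enable torch.compile on GPUs that Triton supports (sketch).

    Triton's error message above states a minimum CUDA capability of 7.0;
    the P100 reports (6, 0), so this returns False on Rosenberg while
    leaving compilation on for capable GPUs.
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False  # CPU / MPS: nothing for the Triton GPU compiler to do
    major, minor = torch.cuda.get_device_capability()
    return (major, minor) >= (7, 0)
```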
