No access to rocminfo in a production environment - ability to manually set GPU arch. #1444

Open
@isaranto

System Info

Working on a Kubernetes deployment with Debian, PyTorch 2.4.0, and ROCm 6.1.
The deployment uses the multi-backend alpha release available in the parent bitsandbytes repo.

Reproduction

Trying to load a model with bitsandbytes fails because there is no access to rocminfo.

Detection code (excerpt, truncated):

def get_rocm_gpu_arch() -> str:
    logger = logging.getLogger(__name__)
    try:
        if torch.version.hip:
            result = subprocess.run(["rocminfo"], capture_output=True, text=True)
            match = re.search(r"Name:\s+gfx([a-zA-Z\d]+)", result.stdout)

Log output:

ERROR:bitsandbytes.cuda_specs:Could not detect ROCm GPU architecture: [Errno 2] No such file or directory: 'rocminfo'
WARNING:bitsandbytes.cuda_specs:
ROCm GPU architecture detection failed despite ROCm being available.

Source: https://github.com/ROCm/bitsandbytes/blob/4aad810bc1d93c38a5316ec54c822cd12b1f1cd2/bitsandbytes/cuda_specs.py#L54
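
For context, the `[Errno 2]` text in the log is what Python reports when `subprocess.run` cannot find the executable at all. The sketch below reproduces that failure mode without ROCm installed. The binary name and the `try_run` helper are hypothetical, chosen only for illustration, and are not part of bitsandbytes.

```python
import subprocess

# Hypothetical name standing in for a missing `rocminfo` binary.
MISSING_BINARY = "rocminfo-not-installed-example"


def try_run(binary: str) -> str:
    """Run `binary` and return its stdout, or a description of the failure."""
    try:
        result = subprocess.run([binary], capture_output=True, text=True)
        return result.stdout
    except FileNotFoundError as exc:
        # Raised before any process starts when the executable is not on PATH.
        return f"Could not run {binary}: {exc}"


message = try_run(MISSING_BINARY)
print(message)
```

Because the exception is raised before the regex ever sees any output, the architecture cannot be detected in an image that does not ship `rocminfo`, even when the GPU itself is usable.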

Expected behavior

I would prefer to set the architecture via an environment variable, with rocminfo used as the fallback when the env var is not set.
The related code snippet is linked above, under Reproduction.
Happy to work on this if other people feel it is a good workaround.
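
A minimal sketch of the proposed behavior, under stated assumptions: the variable name `BNB_ROCM_GPU_ARCH` and the `parse_rocminfo` helper are hypothetical and not existing bitsandbytes names. The regex is the one from the excerpt in the Reproduction section. The `torch.version.hip` check is left out so the sketch runs without PyTorch.

```python
import os
import re
import subprocess

# Hypothetical variable name, not an existing bitsandbytes setting.
ARCH_ENV_VAR = "BNB_ROCM_GPU_ARCH"


def parse_rocminfo(output: str) -> str:
    """Extract the first gfx architecture name from rocminfo-style output."""
    match = re.search(r"Name:\s+gfx([a-zA-Z\d]+)", output)
    return "gfx" + match.group(1) if match else "unknown"


def get_rocm_gpu_arch(env=None) -> str:
    """Prefer the environment override; fall back to calling rocminfo."""
    env = os.environ if env is None else env
    override = env.get(ARCH_ENV_VAR, "").strip()
    if override:
        # An explicit setting wins, so rocminfo is never invoked.
        return override
    try:
        result = subprocess.run(["rocminfo"], capture_output=True, text=True)
    except OSError:
        # rocminfo is missing or not executable in this environment.
        return "unknown"
    return parse_rocminfo(result.stdout)
```

With something like this, a deployment could set the variable in the container spec (for example to `gfx90a`, used here purely as an illustration) and skip the `rocminfo` call, while environments that do ship `rocminfo` would keep the current detection path.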

Metadata

Labels

Low Risk: Risk of bugs in transformers and other libraries
ROCm
contributions-welcome: We welcome contributions to fix this issue!
cross-platform
medium priority: will be worked on after all high priority issues
