Description
FBGEMM-GPU v1.5.0 does not appear to support SM70 (Volta) and SM75 (Turing) GPU architectures, despite them being listed as supported in the official release documentation.
Steps to Reproduce
Running cuobjdump on the compiled shared library reveals that only sm_80 and sm_90a architectures are included:
/usr/local/cuda-12.6/bin/cuobjdump /opt/conda/lib/python3.11/site-packages/fbgemm_gpu/fbgemm_gpu_tbe_training_forward.so \
| grep -E "sm_" \
| sort \
| uniq
Output:
arch = sm_80
arch = sm_90a
Expected Behavior
According to the official release documentation, SM70 and SM75 should be supported, so the compiled binary should include CUDA kernels targeting these architectures.
Actual Behavior
The shared library contains kernels compiled only for sm_80 and sm_90a. This makes it impossible to run FBGEMM-GPU on SM70 (e.g., V100) or SM75 (e.g., T4) GPUs.
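The reason those GPUs are excluded can be checked mechanically. Neither a compiled cubin nor embedded PTX can run on a GPU whose compute capability is lower than the arch it was built for. So at least one embedded arch must be at or below the device's capability. The sketch below applies that necessary (not sufficient) condition to cuobjdump output like the one above. The helper names `parse_archs`, `arch_to_capability` and `can_possibly_run` are hypothetical and are not part of FBGEMM-GPU or the CUDA toolkit. On a real machine, the device capability could come from `torch.cuda.get_device_capability()`.

```python
import re


def parse_archs(cuobjdump_output: str) -> set[str]:
    # Collect distinct "arch = sm_XX" entries, like the grep | sort | uniq pipeline.
    return set(re.findall(r"arch = (sm_\d+a?)", cuobjdump_output))


def arch_to_capability(arch: str) -> tuple[int, int]:
    # "sm_80" -> (8, 0), "sm_90a" -> (9, 0), "sm_100" -> (10, 0)
    m = re.fullmatch(r"sm_(\d+)(\d)a?", arch)
    if m is None:
        raise ValueError(f"unrecognized arch string: {arch!r}")
    return int(m.group(1)), int(m.group(2))


def can_possibly_run(archs: set[str], capability: tuple[int, int]) -> bool:
    # Necessary condition only: some embedded arch must be <= the device's
    # compute capability. Passing this does not guarantee compatibility
    # (e.g. sm_90a cubins run only on exactly 9.0).
    return any(arch_to_capability(a) <= capability for a in archs)


if __name__ == "__main__":
    output = "arch = sm_80\narch = sm_90a\n"  # output reported in this issue
    archs = parse_archs(output)
    print(sorted(archs))                    # ['sm_80', 'sm_90a']
    print(can_possibly_run(archs, (7, 5)))  # False: T4 (SM75)
    print(can_possibly_run(archs, (7, 0)))  # False: V100 (SM70)
    print(can_possibly_run(archs, (8, 0)))  # True: A100 (SM80)
```

With only sm_80 and sm_90a embedded, the check fails for both SM70 and SM75.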
Environment
- CUDA Version: 12.6
- Python Version: 3.11
- Packages: torch==2.10.0 and fbgemm-gpu==1.5.0, installed with:
  pip install torch==2.10.0 fbgemm-gpu==1.5.0 --index-url https://download.pytorch.org/whl/cu126
Additional Notes
Could you clarify whether SM70/SM75 support has been dropped in recent releases? If so, please update the documentation accordingly. If this is unintentional, a fix or recompilation targeting these architectures would be appreciated.