Potential solution to RuntimeError: No such operator fbgemm::jagged_2d_to_dense #3168
Description
My envs BEFORE (the error occurs):
- in conda virtual env
- NVIDIA Tesla V100
- python: 3.10
- torch 2.1.0 + cu118
- cudnn 8.7.0
- fbgemm_gpu 0.7.0
My envs AFTER (the error is gone):
- in conda virtual env
- NVIDIA Tesla V100
- python: 3.10 -> 3.12
- torch 2.1.0 + cu118 -> torch 2.3.0 + cu118 (this is the key change)
- cudnn 8.7.0
- fbgemm_gpu 0.7.0
PS: With fbgemm_gpu 0.8.0 I get a different error: AttributeError: '_OpNamespace' 'fbgemm' object has no attribute 'merge_pooled_embeddings'. I have no idea why the later version has this error.
Hint: If you hit similar errors, check your configuration in the following order:
(1) GPU device: My NVIDIA RTX 4090 does not work with the same configuration as in envs AFTER. It seems that only V and A devices (such as the Tesla V100 above) work.
(2) PyTorch and CUDA: If possible, try running fbgemm in a conda virtual env instead of Docker or bare Linux. CUDA 11.8 or 12.1 is recommended. Use torch 2.3.0+, NOT 2.1.0. As for libnvidia_ml.so and libtorch.so, they get installed whether you use pip or conda to install torch.
(3) fbgemm_gpu version: Try 0.7.0 rather than 0.8.0.
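
To run through the checklist above quickly, here is a minimal sketch of a diagnostic script. It is not part of fbgemm_gpu; the helper names (`version_tuple`, `installed_version`, `has_fbgemm_op`, `collect_report`) are made up for this post. It assumes the packages are registered under the distribution names `torch` and `fbgemm_gpu` (nightly or CPU-only builds may use a different name), and it only reports what is installed; it does not fix anything.

```python
import importlib
import importlib.metadata
import re


def version_tuple(version):
    """Turn a string like '2.3.0+cu118' into (2, 3, 0), ignoring the suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def has_fbgemm_op(name):
    """Return True if torch.ops.fbgemm.<name> can be resolved, else False."""
    try:
        torch = importlib.import_module("torch")
        # Importing fbgemm_gpu is what registers the fbgemm operators.
        importlib.import_module("fbgemm_gpu")
        getattr(torch.ops.fbgemm, name)
        return True
    except Exception:
        # Covers a missing package, a failed shared-library load, and a
        # missing operator (reported as RuntimeError or AttributeError).
        return False


def collect_report():
    """Gather the facts from hints (1) to (3) into one dictionary."""
    torch_version = installed_version("torch")
    report = {
        "torch": torch_version,
        "torch_at_least_2_3": (
            torch_version is not None
            and version_tuple(torch_version) >= (2, 3, 0)
        ),
        "fbgemm_gpu": installed_version("fbgemm_gpu"),
        "cuda": None,
        "gpu": None,
    }
    try:
        torch = importlib.import_module("torch")
        report["cuda"] = torch.version.cuda
        if torch.cuda.is_available():
            report["gpu"] = torch.cuda.get_device_name(0)
    except Exception:
        pass  # torch is missing or broken; leave cuda and gpu as None
    for op in ("jagged_2d_to_dense", "merge_pooled_embeddings"):
        report[f"op:{op}"] = has_fbgemm_op(op)
    return report


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

Run it inside the same conda env you use for training. If `torch_at_least_2_3` is False or `op:jagged_2d_to_dense` is False, start with hint (2) before touching anything else.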