I tried to replace Qwen2-7B-instruct with Qwen2.5-14B-instruct, and then I saw "CUDA out of memory".
Environment:
1. Nvidia RTX 4090 GPU * 2 (48GB total memory)
2. CUDA_VISIBLE_DEVICES=0,1

Does this project support using multiple GPUs to load models?