Description
When I use the lmquant-v0.0.0 branch to convert a fake-quantized model, how can I configure it to use multiple GPUs when the model is large, such as llama-30b?
My GPU: A100-40G x2
cmd: python -m lmquant.llm.run projects/llm/configs/llm.yaml projects/llm/configs/qoq/gchn.yaml --model-name llama-30b --smooth-xw-alpha 0.1 --smooth-xw-beta 0.9 --model-path /home/lxxx/models/llama-30b/
I have already set "export CUDA_VISIBLE_DEVICE=0,1", but I still hit OOM: "torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 456.00 MiB. GPU 0 has a total capacity of 39.49 GiB of which 218.06 MiB is free. Process 2760330 has 39.26 GiB memory in use. Of the allocated memory 38.45 GiB is allocated by PyTorch, and 338.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)"
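As a point of reference for the kind of behavior I am looking for: a minimal sketch of what I have been considering as a workaround, assuming the checkpoint can simply be loaded through Hugging Face transformers with Accelerate's device_map so the layers are split across both A100-40G cards. This is not lmquant's actual API, just an illustration of the multi-GPU sharding I would like the quantization run to do; the memory limits are assumptions for my setup.

```python
# Hypothetical workaround sketch (not lmquant's actual API): shard the fp16
# llama-30b weights across both GPUs via Accelerate's device_map before any
# quantization pass touches the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/home/lxxx/models/llama-30b/"  # same local path as in my command

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,            # fp16 weights of a 30B model (~60 GB) cannot fit on one 40 GB card
    device_map="auto",                     # let Accelerate place layers on GPU 0 and GPU 1
    max_memory={0: "38GiB", 1: "38GiB"},   # assumed per-GPU budget, leaving some headroom on each 40 GB card
)
print(model.hf_device_map)                 # shows which layers landed on which GPU
```

Is there an equivalent option or configuration in lmquant.llm.run to spread the model over multiple GPUs like this?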
Looking forward to your help!