Description
When I use the lmquant-v0.0.0 branch to convert a fake-quantized model, how can I configure it to use multiple GPUs when the model is large, such as llama-30b?
My GPU: A100-40G x2
cmd: python -m lmquant.llm.run projects/llm/configs/llm.yaml projects/llm/configs/qoq/gchn.yaml --model-name llama-30b --smooth-xw-alpha 0.1 --smooth-xw-beta 0.9 --model-path /home/lxxx/models/llama-30b/
I have already set "export CUDA_VISIBLE_DEVICE=0,1", but I still hit OOM: "torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 456.00 MiB. GPU 0 has a total capacity of 39.49 GiB of which 218.06 MiB is free. Process 2760330 has 39.26 GiB memory in use. Of the allocated memory 38.45 GiB is allocated by PyTorch, and 338.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)"
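As a point of reference for the kind of behavior I am looking for: a minimal sketch of what I have been considering as a workaround, assuming the checkpoint can simply be loaded through Hugging Face transformers with Accelerate's device_map so the layers are split across both A100-40G cards. This is not lmquant's actual API, just an illustration of the multi-GPU sharding I would like the quantization run to do; the memory limits are assumptions for my setup.

```python
# Hypothetical workaround sketch (not lmquant's actual API): shard the fp16
# llama-30b weights across both GPUs via Accelerate's device_map before any
# quantization pass touches the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/home/lxxx/models/llama-30b/"  # same local path as in my command

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,            # fp16 weights of a 30B model (~60 GB) cannot fit on one 40 GB card
    device_map="auto",                     # let Accelerate place layers on GPU 0 and GPU 1
    max_memory={0: "38GiB", 1: "38GiB"},   # assumed per-GPU budget, leaving some headroom on each 40 GB card
)
print(model.hf_device_map)                 # shows which layers landed on which GPU
```

Is there an equivalent option or configuration in lmquant.llm.run to spread the model over multiple GPUs like this?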
Looking forward to your help!