How to convert model using multi-gpus? #112

@lisuying214

Description

When I used the lmquant v0.0.0 branch to convert the fake-quantized model, how can I configure it to use multiple GPUs when the model is large, such as llama-30b?

My GPUs: 2x A100-40G
Command: python -m lmquant.llm.run projects/llm/configs/llm.yaml projects/llm/configs/qoq/gchn.yaml --model-name llama-30b --smooth-xw-alpha 0.1 --smooth-xw-beta 0.9 --model-path /home/lxxx/models/llama-30b/

I have already set "export CUDA_VISIBLE_DEVICE=0,1", but I still get an out-of-memory error: "torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 456.00 MiB. GPU 0 has a total capacity of 39.49 GiB of which 218.06 MiB is free. Process 2760330 has 39.26 GiB memory in use. Of the allocated memory 38.45 GiB is allocated by PyTorch, and 338.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)"
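
As far as I understand, setting CUDA_VISIBLE_DEVICES only makes both GPUs visible to the process; it does not by itself shard the model, so everything still lands on GPU 0. Below is a minimal sketch of the kind of multi-GPU placement I am hoping for, using the Hugging Face transformers device_map="auto" mechanism as an assumed example; I do not know whether lmquant loads the model this way internally or exposes an equivalent option:

```python
# Sketch only: shard a large checkpoint across both visible GPUs with
# Hugging Face transformers + accelerate. This is an assumption about the
# desired behavior, not lmquant's documented loading path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/home/lxxx/models/llama-30b/"  # same local path as in the command above

# device_map="auto" (requires the accelerate package) places layers on
# cuda:0 and cuda:1 instead of loading the whole model onto GPU 0.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```

Is there a flag or config entry in lmquant that achieves this kind of layer placement across multiple GPUs?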

Looking forward to your help!
