
Load in 16-bit quantization (float16 or bfloat16) #48

@p-md-zeeshan-sheikh

Description


Hi @qiudi0127,

Thanks for the support and for sharing this repo. I want to load the models in float16 or bfloat16, but even though I have 46 GB of GPU memory, I am still running into out-of-memory issues just loading the models.

Below are the things I tried on an AWS g6e.2xlarge (https://instances.vantage.sh/aws/ec2/g6e.2xlarge); a rough sketch of the loading code follows the list:

  1. Tried loading with 16-bit quantization --> OOM error
  2. Tried bfloat16 quantization --> dtype mismatch error in the prepare_latents method, which uses float32
  3. Tried bfloat16 quantization and updated the dtype in the prepare_latents method --> OOM error during generation
  4. Tried CPU offload --> still an OOM error
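
For reference, here is a minimal sketch of the loading path from attempts 2–4, assuming the repo exposes a diffusers-style pipeline; the checkpoint path and pipeline class below are placeholders, not the repo's actual names:

```python
import torch
from diffusers import DiffusionPipeline  # assuming a diffusers-style pipeline; the repo's class may differ

# Placeholder checkpoint path -- substitute the actual model from this repo.
model_id = "path/to/checkpoint"

# Load every submodule in bfloat16 to roughly halve memory versus float32.
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Instead of pipe.to("cuda"), let accelerate move submodules to the GPU only
# when they are needed (requires `pip install accelerate`).
pipe.enable_model_cpu_offload()

# Attention slicing trades a little speed for a lower peak during generation.
pipe.enable_attention_slicing()

# If prepare_latents hard-codes float32, casting the latents to the pipeline's
# dtype inside that method avoids the mismatch from attempt 2, e.g.:
#   latents = latents.to(dtype=self.transformer.dtype)  # hypothetical; depends on the repo's code
```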

When loaded in bfloat16, the model occupied 44221 MiB / 46068 MiB on the NVIDIA L40S.
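
That figure is from nvidia-smi; the same can be cross-checked from PyTorch with something like the snippet below (nvidia-smi also counts the CUDA context and non-PyTorch allocations, so the numbers will not match exactly):

```python
import torch

# Current GPU memory as PyTorch sees it; nvidia-smi's 44221 MiB figure
# additionally includes the CUDA context and any non-PyTorch allocations.
print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.0f} MiB")
print(f"total:     {torch.cuda.get_device_properties(0).total_memory / 2**20:.0f} MiB")
```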

Can you help me with how to proceed from here, or do I need more compute? Please share the required details.

Thanks in advance,
Zeeshan
