
Load in 16-bit quantization (float16 or bfloat16) #48

@p-md-zeeshan-sheikh

Description


Hi @qiudi0127,

Thanks for the support and for sharing this repo. I want to load the models in float16 or bfloat16, but even though I have 46 GB of GPU memory, I am still running into out-of-memory issues just loading the models.

Below are the things I tried on an AWS g6e.2xlarge (https://instances.vantage.sh/aws/ec2/g6e.2xlarge); a rough sketch of the loading code follows the list:

  1. Tried loading with 16-bit quantization --> OOM error
  2. Tried bfloat16 quantization --> dtype mismatch error in the prepare_latents method, which uses float32
  3. Tried bfloat16 quantization and updated the dtype in the prepare_latents method --> OOM error during generation
  4. Tried CPU offload --> still an OOM error
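
For reference, here is a minimal sketch of the loading path from attempts 2–4, assuming the repo exposes a diffusers-style pipeline; the checkpoint path and pipeline class below are placeholders, not the repo's actual names:

```python
import torch
from diffusers import DiffusionPipeline  # assuming a diffusers-style pipeline; the repo's class may differ

# Placeholder checkpoint path -- substitute the actual model from this repo.
model_id = "path/to/checkpoint"

# Load every submodule in bfloat16 to roughly halve memory versus float32.
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Instead of pipe.to("cuda"), let accelerate move submodules to the GPU only
# when they are needed (requires `pip install accelerate`).
pipe.enable_model_cpu_offload()

# Attention slicing trades a little speed for a lower peak during generation.
pipe.enable_attention_slicing()

# If prepare_latents hard-codes float32, casting the latents to the pipeline's
# dtype inside that method avoids the mismatch from attempt 2, e.g.:
#   latents = latents.to(dtype=self.transformer.dtype)  # hypothetical; depends on the repo's code
```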

When loaded in bfloat16, the model occupied 44221 MiB / 46068 MiB on the NVIDIA L40S.
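
That figure is from nvidia-smi; the same can be cross-checked from PyTorch with something like the snippet below (nvidia-smi also counts the CUDA context and non-PyTorch allocations, so the numbers will not match exactly):

```python
import torch

# Current GPU memory as PyTorch sees it; nvidia-smi's 44221 MiB figure
# additionally includes the CUDA context and any non-PyTorch allocations.
print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.0f} MiB")
print(f"total:     {torch.cuda.get_device_properties(0).total_memory / 2**20:.0f} MiB")
```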

Can you help me with how to proceed from here, or do I need more compute? Please share the required details.

Thanks in advance,
Zeeshan
