Auto Quantization?

If a model hits an out-of-memory (OOM) error, we could try quantizing it automatically for the user.

Users can already specify quantization themselves; this would only be an automatic fallback in the error-handling path.

CUDA OOM errors: If your model doesn't fit on our 4xA40 (48 GB each) server, we return an error. Coming soon, we should fall back to Accelerate ZeRO stage-3 (CPU/disk offload), and/or allow a flag for quantization: load_in_8bit=True or load_in_4bit=True.
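
A rough sketch of what the automatic fallback could look like, in case it helps the discussion. Nothing here is the server's actual code: `load_with_quantization_fallback`, `is_cuda_oom`, and `FALLBACK_KWARGS` are hypothetical names, the loader is passed in as a parameter, and the order "unquantized, then 8-bit, then 4-bit" is an assumption. Only the flag names `load_in_8bit` and `load_in_4bit` come from the text above. OOM detection is a heuristic based on the exception's class name or message, so the sketch does not need to import torch.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Attempts in order: unquantized first, then progressively heavier quantization.
# The flag names come from the issue text; the ordering is an assumption.
FALLBACK_KWARGS: List[Dict[str, Any]] = [
    {},
    {"load_in_8bit": True},
    {"load_in_4bit": True},
]


def is_cuda_oom(exc: BaseException) -> bool:
    """Heuristic: an exception class named OutOfMemoryError, or any
    RuntimeError whose message mentions 'out of memory', counts as OOM."""
    if type(exc).__name__ == "OutOfMemoryError":
        return True
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc).lower()


def load_with_quantization_fallback(
    load_fn: Callable[..., Any],
    model_id: str,
    attempts: Optional[List[Dict[str, Any]]] = None,
) -> Tuple[Any, Dict[str, Any]]:
    """Call load_fn(model_id, **kwargs) for each kwargs set until one succeeds.

    Returns (model, kwargs_used). Non-OOM errors propagate immediately; if
    every attempt hits OOM, the last OOM error is re-raised.
    """
    last_oom: Optional[BaseException] = None
    for kwargs in (FALLBACK_KWARGS if attempts is None else attempts):
        try:
            return load_fn(model_id, **kwargs), dict(kwargs)
        except Exception as exc:
            if not is_cuda_oom(exc):
                raise  # unrelated failure: do not mask it with a retry
            last_oom = exc
    if last_oom is None:
        raise ValueError("no load attempts were provided")
    raise last_oom
```

Returning the kwargs that finally worked lets the caller tell the user that their model was quantized behind the scenes, which seems important since quantization can change model outputs. A request that already specifies its own quantization would presumably skip this path and just surface the error.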

Auto Quantization? #2
