Open
Description
If a model hits an OOM error, we could try quantizing it for the user automatically.
Users can already specify quantization themselves; this would only be an automatic fallback in the error-handling path.
CUDA OOM errors: If your model doesn't fit on our 4xA40 (48 GB) server, we return an error. Coming soon, we should fall back to Accelerate ZeRO stage 3 (CPU/disk offload), and/or allow a quantization flag, load_in_8bit=True or load_in_4bit=True.
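A rough sketch of what the automatic fallback could look like, not a finished design. It assumes the model is loaded through some callable that accepts the same load_in_8bit / load_in_4bit keyword flags mentioned above. The names `load_with_fallback`, `is_oom`, and `FALLBACK_STEPS` are hypothetical. To avoid importing torch here, OOM is detected by matching "out of memory" in the exception message, which is a simplifying assumption.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

# Settings to try in order: as requested first, then 8-bit, then 4-bit.
# (Hypothetical ordering; the flag names are the ones from the issue text.)
FALLBACK_STEPS: Tuple[Dict[str, bool], ...] = (
    {},
    {"load_in_8bit": True},
    {"load_in_4bit": True},
)


def is_oom(exc: BaseException) -> bool:
    """Heuristic OOM check based on the exception type or message."""
    return isinstance(exc, MemoryError) or "out of memory" in str(exc).lower()


def load_with_fallback(
    loader: Callable[..., Any],
    steps: Sequence[Dict[str, bool]] = FALLBACK_STEPS,
) -> Tuple[Any, Dict[str, bool]]:
    """Call loader(**kwargs) for each step until one does not hit OOM.

    Returns the loaded object and the kwargs that worked, so the caller
    can tell the user which quantization was applied on their behalf.
    """
    last_exc = None
    for kwargs in steps:
        try:
            return loader(**kwargs), dict(kwargs)
        except Exception as exc:
            if not is_oom(exc):
                raise  # unrelated failure: do not mask it with a retry
            last_exc = exc
    raise RuntimeError("model did not fit at any fallback step") from last_exc
```

In a real handler, `loader` would wrap the actual model-loading call, and GPU memory from the failed attempt would probably need to be released before each retry. Returning the kwargs that succeeded lets the server report that the model was quantized automatically instead of doing it silently.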