Quantization parameter not working in vLLM backend (cloud-edge LLM example) #372

@yashasviyadav30

Description

While going through vllm_llm.py, I noticed the quantization parameter is commented out with a TODO:

```python
# quantization=self.quantization  # TODO need to align with vllm API
```
I looked into why this was disabled: `BaseLLM._parse_kwargs()` never parses `quantization` from kwargs, so `self.quantization` is undefined when `VllmLLM._load()` tries to use it.

Other examples, such as `examples/PIPL/edge-cloud_collaborative_learning_bench`, already set `quantization` in their configs, and the original proposal also mentions it as a planned feature.

I checked the vLLM docs: the `quantization` parameter can be passed directly with values such as `bitsandbytes`, `awq`, or `gptq`.

Proposed fix:

1. Parse `quantization` in `BaseLLM._parse_kwargs()`
2. Conditionally pass it to vLLM in `VllmLLM._load()`
Happy to submit a PR for this.
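For illustration, the two steps could look roughly like this. The class shapes below are a minimal sketch, not the project's actual code: the attribute names and the dict-based engine args are assumptions, and the real `VllmLLM._load()` would pass the kwargs to `vllm.LLM` rather than return them.

```python
# Sketch of the proposed fix (hypothetical class shapes, for discussion only).

class BaseLLM:
    def __init__(self, **kwargs):
        self._parse_kwargs(**kwargs)

    def _parse_kwargs(self, **kwargs):
        self.model = kwargs.get("model", "")
        # Proposed addition: parse quantization (e.g. "bitsandbytes",
        # "awq", "gptq"), defaulting to None so it stays optional.
        self.quantization = kwargs.get("quantization", None)


class VllmLLM(BaseLLM):
    def _load(self):
        engine_args = {"model": self.model}
        # Proposed addition: only forward quantization when it was set,
        # so existing unquantized configs keep their current behavior.
        if self.quantization is not None:
            engine_args["quantization"] = self.quantization
        # In the real code this would be: return LLM(**engine_args)
        return engine_args  # returned here only to keep the sketch self-contained
```

With this shape, a config that omits `quantization` never touches the parameter, which avoids the API-alignment problem the TODO mentions.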
