While going through vllm_llm.py, I noticed the quantization parameter is commented out with a TODO:
#quantization=self.quantization # TODO need to align with vllm API
Looked into why this was disabled: BaseLLM._parse_kwargs() never parses quantization from kwargs, so self.quantization would be undefined if VllmLLM._load() tried to use it.
Other examples like examples/PIPL/edge-cloud_collaborative_learning_bench already use quantization in their configs, and the original proposal also mentions it as a planned feature.
Checked the vLLM docs - the LLM constructor accepts a quantization argument directly, with values such as "bitsandbytes", "awq", and "gptq".
Fix:
Parse quantization in BaseLLM._parse_kwargs(), defaulting to None so the attribute is always defined
Conditionally pass it to vLLM in VllmLLM._load() only when it is set
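The two steps above could look roughly like this. This is a sketch against stripped-down stand-ins for BaseLLM and VllmLLM - the real classes parse many more kwargs and actually construct a vllm.LLM engine, so attribute and method bodies here are assumptions, not the actual implementation:

```python
class BaseLLM:
    def _parse_kwargs(self, **kwargs):
        # Step 1: parse quantization alongside the other kwargs.
        # Defaulting to None means self.quantization is always defined,
        # which is what currently breaks VllmLLM._load().
        self.quantization = kwargs.get("quantization", None)


class VllmLLM(BaseLLM):
    def _load(self, model_path):
        engine_kwargs = {"model": model_path}
        # Step 2: forward quantization only when it was configured,
        # so unquantized models keep today's behavior unchanged.
        if self.quantization is not None:
            engine_kwargs["quantization"] = self.quantization
        # In the real code these kwargs would go to vllm.LLM(**engine_kwargs);
        # returned here so the logic is visible in isolation.
        return engine_kwargs
```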
Happy to submit a PR for this.