🚀 Feature Request: Add grammar support (BNF/xgrammar) to vLLM backend
📌 Overview
I would like to request the addition of formal grammar support (via BNF or xgrammar) to the vLLM backend in LocalAI. This feature would allow users to enforce structured outputs from LLMs using context-free grammars, which is particularly useful for generating JSON, code, XML, or other machine-readable formats with strict syntactic rules.
📚 Background
- vLLM Documentation: The official vLLM documentation highlights features such as PagedAttention and speculative decoding, and recent releases also ship native structured-output support (guided decoding, including an xgrammar backend); that support, however, is not yet wired into LocalAI.
- Current Limitation in LocalAI: While LocalAI already supports constrained grammars through the `llama.cpp` backend (via the `--grammar` flag or the `grammar` API parameter), this functionality is not available for the `vLLM` backend; a small example of the grammar format accepted today is sketched after this list.
- Use Case Example: Users want to generate valid JSON responses for API integrations, or Python code that can be executed directly, but are limited by the vLLM backend's lack of grammar enforcement.
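For context, a constrained grammar in `llama.cpp`'s GBNF dialect (what the `grammar` parameter accepts today) looks roughly like the following. This is a minimal, hypothetical grammar forcing the model to emit a flat JSON object with string keys and values, shown as a Python string so later sketches can reuse it:

```python
# Minimal GBNF grammar (llama.cpp's BNF dialect). Illustrative only,
# not taken from LocalAI's documentation.
JSON_OBJECT_GRAMMAR = r'''
root   ::= "{" ws pair (ws "," ws pair)* ws "}"
pair   ::= string ws ":" ws string
string ::= "\"" [a-zA-Z0-9 _-]* "\""
ws     ::= [ \t\n]*
'''
```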
 
✅ Acceptance Criteria
- Support for BNF (Backus-Naur Form) and/or xgrammar syntax in the vLLM backend.
- Grammar validation during model generation (via API request parameters).
- Compatibility with the OpenAI-like API spec (e.g., `grammar: { type: "xgrammar", value: "..." }`); a hypothetical request is sketched after this list.
- Documentation updated in LocalAI's feature documentation and vLLM integration guide.
- A working example in the LocalAI-examples repo (e.g., `examples/grammar-vllm.json`).
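To make the API-spec criterion concrete, a request against LocalAI's OpenAI-compatible endpoint could look like the sketch below. The `grammar` field shape follows the proposal above and does not exist in LocalAI's vLLM backend today; the port assumes a default LocalAI install:

```python
import requests

# Proposed (hypothetical) payload shape: the "grammar" object mirrors the
# acceptance criteria above and is NOT an existing LocalAI parameter.
payload = {
    "model": "my-vllm-model",  # placeholder model name
    "messages": [{"role": "user", "content": "Return the user profile as JSON."}],
    "grammar": {
        "type": "xgrammar",
        "value": 'root ::= "{" pair ("," pair)* "}"',  # grammar text, truncated
    },
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```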
🔗 Relevant Links
- vLLM GitHub: https://github.com/vllm-project/vllm
- vLLM Docs on Structured Output: https://vllm.ai/en/latest/serving/structured_outputs.html
- LocalAI Constrained Grammars Feature: https://localai.io/features/constrained_grammars/
- LocalAI vLLM Backend: https://github.com/mudler/LocalAI/tree/master/backends/vllm
- Example Grammar (xgrammar): https://github.com/vllm-project/vllm/blob/main/examples/xgrammar_example.py
 
🧑‍💻 Suggested Implementation
- Integrate `xgrammar`, or port the grammar-parsing logic from `llama.cpp`, into the vLLM backend.
- Use vLLM's native grammar-based (guided) sampling via `vllm.SamplingParams`, as sketched below.
- Ensure the grammar is passed through the HTTP API layer as a JSON string.
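A minimal sketch of the second bullet, assuming a recent vLLM release where `GuidedDecodingParams` is available in `vllm.sampling_params` (the model name and grammar text are placeholders; the exact grammar dialect accepted depends on the configured guided-decoding backend):

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# Illustrative grammar constraining output to a flat JSON object.
grammar = r'''
root   ::= "{" pair ("," pair)* "}"
pair   ::= string ":" string
string ::= "\"" [a-zA-Z0-9 _-]* "\""
'''

llm = LLM(model="my-vllm-model")  # placeholder; any vLLM-supported model
params = SamplingParams(
    max_tokens=128,
    guided_decoding=GuidedDecodingParams(grammar=grammar),
)

outputs = llm.generate(["Return the user profile as JSON:"], params)
print(outputs[0].outputs[0].text)
```

In the backend, the grammar string received over the HTTP API would be forwarded into `GuidedDecodingParams` in this way, so no new sampling machinery should be needed on the LocalAI side.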
 
🏷️ Labels
enhancement, vLLM, grammar, structured-output
👥 Tagging
@mudler (project maintainer)
@U08FLGN0QJE (core contributor, vLLM-related work)
✅ Note: This request is inspired by similar issues in other frameworks (e.g., Hugging Face Transformers, llama.cpp) and aligns with the growing need for reliable structured-output generation in LLM applications.