
Feature Request: Add grammar support (BNF/xgrammar) to vLLM backend #6857

@localai-bot

🚀 Feature Request: Add grammar support (BNF/xgrammar) to vLLM backend

📌 Overview

I would like to request the addition of formal grammar support (via BNF or xgrammar) to the vLLM backend in LocalAI. This feature would allow users to enforce structured outputs from LLMs using context-free grammars, which is particularly useful for generating JSON, code, XML, or other machine-readable formats with strict syntactic rules.

📚 Background

  • vLLM Documentation: The official vLLM documentation highlights features such as speculative decoding and PagedAttention, but grammar-constrained structured output is not exposed through LocalAI's vLLM backend.
  • Current Limitation in LocalAI: While LocalAI already supports constrained grammars through the llama.cpp backend (via --grammar or grammar parameter in the API), this functionality is not available for the vLLM backend.
  • Use Case Example: Users want to generate valid JSON responses for API integrations, or generate Python code that can be directly executed, but are limited by vLLM's lack of grammar enforcement.
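For context, LocalAI's llama.cpp backend already accepts a grammar string in the completion request. A minimal sketch of such a payload (the model name is hypothetical, and the exact field name should be checked against LocalAI's docs):

```python
import json

# A tiny GBNF grammar that constrains the model to answer "yes" or "no".
gbnf_grammar = r'''
root   ::= answer
answer ::= "yes" | "no"
'''

# Sketch of a chat-completion payload for the llama.cpp backend;
# "grammar" is the field LocalAI uses to pass the grammar through.
payload = {
    "model": "llama-3-8b",  # hypothetical model name
    "messages": [{"role": "user", "content": "Is the sky blue?"}],
    "grammar": gbnf_grammar,
}

body = json.dumps(payload)
# The grammar text survives JSON serialization unchanged.
print(json.loads(body)["grammar"] == gbnf_grammar)
```

This request shape is what the vLLM backend would need to honor as well.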

✅ Acceptance Criteria

  • Support for BNF (Backus-Naur Form) and/or xgrammar syntax in the vLLM backend.
  • Grammar validation during model generation (via API request parameters).
  • Compatibility with the OpenAI-like API spec (e.g., grammar: { type: "xgrammar", value: "..." }).
  • Documentation updated in LocalAI's feature documentation and vLLM integration guide.
  • Working example in the LocalAI-examples repo (e.g., examples/grammar-vllm.json).
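To make the acceptance criteria concrete, here is a hedged sketch of how the proposed `{ type, value }` grammar object could be validated at the API layer. The parameter shape follows the example above; the function name and accepted type strings are assumptions, not an existing LocalAI API:

```python
# Hypothetical validator for the proposed `grammar` request parameter.
SUPPORTED_GRAMMAR_TYPES = {"bnf", "xgrammar"}

def validate_grammar_param(grammar: dict) -> str:
    """Check the {type, value} grammar object and return the grammar text."""
    gtype = grammar.get("type")
    if gtype not in SUPPORTED_GRAMMAR_TYPES:
        raise ValueError(f"unsupported grammar type: {gtype!r}")
    value = grammar.get("value")
    if not isinstance(value, str) or not value.strip():
        raise ValueError("grammar value must be a non-empty string")
    return value

# Valid request parameter passes through; the grammar text is returned as-is.
print(validate_grammar_param({"type": "xgrammar", "value": 'root ::= "ok"'}))
```

Rejecting unknown types early keeps error reporting at the HTTP layer rather than deep inside the backend.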

🧑‍💻 Suggested Implementation

  • Integrate xgrammar or grammar parsing logic from llama.cpp into the vLLM backend.
  • Leverage vLLM’s native sampling controls (vllm.SamplingParams) to apply grammar-constrained sampling.
  • Ensure the grammar is passed through the HTTP API layer as a JSON string.
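The steps above can be sketched as a small piece of backend plumbing that pulls the grammar out of the HTTP request body and maps it to the keyword arguments that would be handed to vLLM's sampling layer. Note the hedge in the docstring: recent vLLM releases route grammar constraints through a guided-decoding parameter, but the exact class and field names used here are assumptions, not verified API:

```python
import json

def sampling_kwargs_from_request(raw_body: str) -> dict:
    """Extract the grammar from a JSON request body and map it to kwargs
    for vllm.SamplingParams.

    In recent vLLM releases, grammar constraints are expected to flow
    through a guided-decoding parameter (e.g. something like
    GuidedDecodingParams(grammar=...)); the names here are assumptions.
    """
    req = json.loads(raw_body)
    kwargs = {"max_tokens": req.get("max_tokens", 256)}
    grammar = req.get("grammar")
    if grammar is not None:
        # Accept both the proposed {type, value} object and a bare string,
        # and pass the grammar text through unchanged.
        kwargs["grammar"] = grammar["value"] if isinstance(grammar, dict) else grammar
    return kwargs

print(sampling_kwargs_from_request(
    '{"grammar": {"type": "bnf", "value": "root ::= \\"ok\\""}}'
))
```

Keeping the grammar as an opaque string at the HTTP layer, as the bullet above suggests, means the API surface stays stable even if the backend later switches between BNF and xgrammar engines.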

🏷️ Labels

enhancement, vLLM, grammar, structured-output

👥 Tagging

@mudler (project maintainer)
@U08FLGN0QJE (core contributor, vLLM-related work)

Note: This request is inspired by similar issues in other frameworks (e.g., HuggingFace Transformers, llama.cpp), and aligns with the growing need for reliable structured output generation in LLM applications.
