Add simple granite4 tool parser by maxdebayser · Pull Request #36827 · vllm-project/vllm

maxdebayser · 2026-03-11T22:17:59Z

Purpose

Note: this is a simpler alternative to #35948 based on suggestions by @sfeng33

IBM's Granite 4 models use the Hermes tool calling convention and until now have been using the hermes parser. However, because the Hermes format is so popular, many additions have been made to that parser to serve specific needs, such as working without dedicated tool calling tokens. As a result, the parser's code has become very hard to read. We have found bugs that arise from its interaction with other features, such as stop sequences, and these are very hard to fix given the state of the code. The complexity also makes it very hard for maintainers to trust that a PR won't break something else.
There is also a Granite 4 specific behavior that the tool parser needs to handle: the models tend to generate the arguments as an escaped string instead of as JSON text.
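The escaped-arguments behavior can be absorbed with one extra decoding step. Below is a minimal sketch, assuming the model emits `"arguments": "{\"city\": \"Paris\"}"` (a JSON string that itself contains JSON) where an object is expected. `normalize_arguments` is a hypothetical helper for illustration, not part of vLLM or of this PR.

```python
import json
from typing import Any


def normalize_arguments(arguments: Any) -> dict[str, Any]:
    """Hypothetical helper: return tool-call arguments as a dict.

    Accepts either an already decoded JSON object or a JSON-encoded string
    such as '{"city": "Paris"}', which is decoded one more time.
    """
    if isinstance(arguments, str):
        # Granite 4 sometimes escapes the arguments: decode the inner JSON text.
        # json.JSONDecodeError is a subclass of ValueError.
        arguments = json.loads(arguments)
    if not isinstance(arguments, dict):
        raise ValueError(f"expected a JSON object, got {type(arguments).__name__}")
    return arguments


as_object = json.loads('{"name": "get_weather", "arguments": {"city": "Paris"}}')
as_string = json.loads('{"name": "get_weather", "arguments": "{\\"city\\": \\"Paris\\"}"}')
print(normalize_arguments(as_object["arguments"]))
print(normalize_arguments(as_string["arguments"]))
```

Both calls yield the same dict. Since OpenAI-compatible responses carry `arguments` as a string, a parser would typically `json.dumps()` the normalized dict when it builds the final tool call.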

The granite4 parser in this PR has been rewritten from the ground up to avoid the brittle partial JSON parsing that we see in other tool call parsers. Because only complete tool calls are streamed, no partial JSON parsing is required.

Main design decisions:

- Remove streaming of tool names ahead of arguments
- Remove streaming of partial arguments: this complicates things and arguably doesn't benefit the end user at all
- Rely only on text, not on tokens
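Taken together, these decisions reduce streaming to buffering. Below is a minimal text-only sketch, assuming Hermes-style `<tool_call>`/`</tool_call>` markers around one JSON object. `CompleteToolCallStreamer` and `partial_marker_len` are hypothetical names, not the PR's API. Plain content is emitted as soon as it arrives, except for a trailing fragment that could still grow into `<tool_call>` (such as `<t`). A tool call body is buffered until its closing marker and then decoded once with `json.loads()`, so no partial JSON parsing happens. Error handling for malformed JSON is omitted.

```python
import json
from dataclasses import dataclass
from typing import Any

TOOL_CALL_START = "<tool_call>"
TOOL_CALL_END = "</tool_call>"


def partial_marker_len(text: str, marker: str) -> int:
    """Length of the longest suffix of `text` that is a proper prefix of `marker`."""
    for n in range(min(len(text), len(marker) - 1), 0, -1):
        if marker.startswith(text[-n:]):
            return n
    return 0


@dataclass
class CompleteToolCallStreamer:
    """Hypothetical sketch: stream content eagerly, tool calls only once complete."""

    buffer: str = ""
    in_tool_call: bool = False

    def feed(self, delta_text: str) -> tuple[str, list[dict[str, Any]]]:
        """Consume one text delta; return (content_to_emit, completed_tool_calls)."""
        self.buffer += delta_text
        content: list[str] = []
        calls: list[dict[str, Any]] = []
        while True:
            if not self.in_tool_call:
                start = self.buffer.find(TOOL_CALL_START)
                if start == -1:
                    # Hold back a tail such as "<t" or "<tool_c" that may become a marker.
                    cut = len(self.buffer) - partial_marker_len(self.buffer, TOOL_CALL_START)
                    content.append(self.buffer[:cut])
                    self.buffer = self.buffer[cut:]
                    break
                content.append(self.buffer[:start])
                self.buffer = self.buffer[start + len(TOOL_CALL_START):]
                self.in_tool_call = True
            else:
                end = self.buffer.find(TOOL_CALL_END)
                if end == -1:
                    break  # the call body is never parsed before it is complete
                calls.append(json.loads(self.buffer[:end]))
                self.buffer = self.buffer[end + len(TOOL_CALL_END):]
                self.in_tool_call = False
        return "".join(content), calls

    def finish(self) -> str:
        """End of stream: release held-back text, including an unterminated call."""
        rest = (TOOL_CALL_START if self.in_tool_call else "") + self.buffer
        self.buffer, self.in_tool_call = "", False
        return rest


streamer = CompleteToolCallStreamer()
for delta in ["Sure. <t", "ool_c", 'all>\n{"name": "get_acme",', ' "arguments": {}}\n', "</tool_call>"]:
    print(streamer.feed(delta))
```

With the marker split as `<t`, `ool_c`, `all>` (the same split that appears in the stop-sequence log further down), only `"Sure. "` is emitted as content, and the tool call comes out in one piece when `</tool_call>` arrives. Because the state depends only on the accumulated text, how the deltas are chunked does not change the result.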

Test Plan

Since the parser is compatible with Hermes tool calling, I'm reusing the Hermes tests, except for one that allows incomplete input. I'm also adding tests for the lexer and parser, as well as tests for known bugs.

Test Result

All the added or modified tests are passing locally.

This tool parser should be compatible with most models that use the Hermes tool calling pattern. It has been rewritten from the ground up to avoid the brittle partial JSON parsing that we see in other tool call parsers. By relying on a stream-enabled parser, it avoids bugs such as interference from stop sequences, which change the sequence of deltas that the parser sees.

Main design decisions:

- Remove streaming of partial arguments: this complicates things and arguably doesn't benefit the end user at all
- Decompose the parser into several layers that are independently testable
- Use a formal grammar to specify the parser
- For the parser, use a library that is already part of vLLM's dependencies. Lark is imported by llguidance.

Since the parser is compatible with Hermes tool calling, I'm reusing the Hermes tests, except for one that allows incomplete input. I'm also adding tests for the lexer and parser, as well as tests for known bugs.

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

- Fix wrong scope for server fixtures in tests
- Prevent the tool parser from streaming pieces of the <tool_call> marker as message content
- Reduce unnecessary delta messages

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Mypy is complaining about code outside of my changes for some reason Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Previously, some lexing tasks were left to the vLLM tool parser, which is the wrong abstraction level and led to unnecessary complexity. Now the lexer also handles free text, so what comes out of the low-level Lark parser is already organized into text and tool calling segments. Since the lexer and Lark parser are now aware of the surrounding text, it is easier to handle multiple tool calls cleanly.

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

If we can assume that:

1) Streaming the name ahead of the arguments has no relevant use case;
2) Validating the tool call JSON while it is being assembled is not useful;

then the parser can be simplified a lot by streaming only complete tool calls. Then we can use simple regexes to find the tool call tokens and use json.loads() to handle anything in between.

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
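Under those two assumptions, extracting tool calls from a complete model output reduces to a regex plus `json.loads()`. Below is a minimal sketch, assuming Hermes-style markers and one JSON object per `<tool_call>` block. `TOOL_CALL_RE` and `extract_tool_calls` are illustrative names, not the code in this PR. A malformed body raises `json.JSONDecodeError` here; how the real parser recovers from that is not shown.

```python
import json
import re
from typing import Any

# Illustrative pattern: the non-greedy body between Hermes-style markers.
# re.DOTALL lets "." match the newlines placed around the JSON object.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(model_output: str) -> tuple[str, list[dict[str, Any]]]:
    """Split a complete model output into plain content and decoded tool calls."""
    # Each body is decoded in one shot: no partial JSON parsing is involved.
    calls = [json.loads(m.group(1)) for m in TOOL_CALL_RE.finditer(model_output)]
    # Whatever is left outside the markers is ordinary message content.
    content = TOOL_CALL_RE.sub("", model_output).strip()
    return content, calls


output = (
    "Let me check.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "get_time", "arguments": {}}\n</tool_call>'
)
content, calls = extract_tool_calls(output)
print(content)
print([c["name"] for c in calls])
```

Multiple tool calls fall out naturally because the non-greedy `.*?` stops at the first closing marker after each opening one.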

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

mergify · 2026-03-11T22:18:50Z

Documentation preview: https://vllm--36827.org.readthedocs.build/en/36827/

mergify · 2026-03-11T22:22:11Z

Hi @maxdebayser, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

gemini-code-assist

Code Review

This pull request introduces a new granite4 tool parser, which is a simplified implementation for IBM's Granite 4 models. The changes include the parser logic, registration, documentation updates, and new tests. The existing hermes tool parser tests are also refactored to be parameterized and reused for the new parser. My review found a critical bug in the streaming logic of the new parser that could lead to an AttributeError and incorrect state management. I've also suggested an improvement to the new test file to simplify the logic for reconstructing tool calls, making it more readable and aligned with the parser's design of not streaming partial arguments.

vllm/tool_parsers/granite4_tool_parser.py

tests/entrypoints/openai/tool_parsers/test_granite4_tool_parser.py

maxdebayser · 2026-03-11T22:30:02Z

@sfeng33, here is an alternative implementation based on your suggestion. It can really be made much shorter after giving up on incremental parsing and streaming.

I'm going to answer your question from the other PR here, because it applies as well:

On partial <tool_call> handling: I'd suggest removing the partial matching logic for the <tool_call> tag. Since <tool_call> is a single token in the Granite 4 tokenizer, it will always arrive atomically in a single delta — it can never be split across chunks. The regex library's partial=True matching in consume_text adds complexity for a case that can't actually occur.

Relying only on text lets us run tests with models that don't have dedicated tool calling tokens. But beyond that, I really prefer to have a single source of truth for the input, and since we have to parse JSON, the most appropriate input is text. To illustrate my point, there is currently a bug in vLLM that causes the text deltas and the token deltas to go out of sync. If you run test_granite4_tool_parser.py::test_stop_sequence_interference and print the deltas that arrive at the tool parser, you'll see:

(APIServer pid=250128) delta_text=''
(APIServer pid=250128) delta_token_ids=[100270]
(APIServer pid=250128) delta_text=''
(APIServer pid=250128) delta_token_ids=[198]
(APIServer pid=250128) delta_text=''
(APIServer pid=250128) delta_token_ids=[5018]
(APIServer pid=250128) delta_text=''
(APIServer pid=250128) delta_token_ids=[609]
(APIServer pid=250128) delta_text=''
(APIServer pid=250128) delta_token_ids=[794]
(APIServer pid=250128) delta_text=''
(APIServer pid=250128) delta_token_ids=[330]
(APIServer pid=250128) delta_text=''
(APIServer pid=250128) delta_token_ids=[456]
(APIServer pid=250128) delta_text=''
(APIServer pid=250128) delta_token_ids=[62]
(APIServer pid=250128) delta_text=''
(APIServer pid=250128) delta_token_ids=[582]
(APIServer pid=250128) delta_text=''
(APIServer pid=250128) delta_token_ids=[2727]
(APIServer pid=250128) delta_text=''
(APIServer pid=250128) delta_token_ids=[62]
(APIServer pid=250128) delta_text='<t'
(APIServer pid=250128) delta_token_ids=[4030]
(APIServer pid=250128) delta_text='ool_c'
(APIServer pid=250128) delta_token_ids=[1292]
(APIServer pid=250128) delta_text='all>'
(APIServer pid=250128) delta_token_ids=[5595]
(APIServer pid=250128) delta_text='\n{"name": "g'
...
(APIServer pid=250128) delta_text='467722Z", "a'
(APIServer pid=250128) delta_token_ids=[100271]
(APIServer pid=250128) delta_text='cme_region": "A9345"}}\n</tool_call>'
(APIServer pid=250128) delta_token_ids=[100257]

But if you comment out the stop argument in the request, you see:

(APIServer pid=251071) delta_text='<tool_call>'
(APIServer pid=251071) delta_token_ids=[100270]
(APIServer pid=251071) delta_text='\n'
(APIServer pid=251071) delta_token_ids=[198]
(APIServer pid=251071) delta_text='{"'
(APIServer pid=251071) delta_token_ids=[5018]
(APIServer pid=251071) delta_text='name'
(APIServer pid=251071) delta_token_ids=[609]
(APIServer pid=251071) delta_text='":'
(APIServer pid=251071) delta_token_ids=[794]
(APIServer pid=251071) delta_text=' "'
(APIServer pid=251071) delta_token_ids=[330]
(APIServer pid=251071) delta_text='get'
(APIServer pid=251071) delta_token_ids=[456]
(APIServer pid=251071) delta_text='_'
(APIServer pid=251071) delta_token_ids=[62]
(APIServer pid=251071) delta_text='ac'
(APIServer pid=251071) delta_token_ids=[582]
(APIServer pid=251071) delta_text='me'
(APIServer pid=251071) delta_token_ids=[2727]
...
(APIServer pid=251071) delta_text='"}}\n'
(APIServer pid=251071) delta_token_ids=[96742]
(APIServer pid=251071) delta_text='</tool_call>'
(APIServer pid=251071) delta_token_ids=[100271]
(APIServer pid=251071) delta_text=''
(APIServer pid=251071) delta_token_ids=[100257]

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

maxdebayser · 2026-03-11T22:48:53Z

I've opened an issue for the bug I described above: #36830

maxdebayser and others added 15 commits March 3, 2026 23:50

Merge branch 'main' into add_granite4_tool_parser

fa765a3

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Merge branch 'main' into add_granite4_tool_parser

adc3076

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Merge branch 'upstream_main' into add_granite4_tool_parser

c220770

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

fix type annotations

ad3033e

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

Merge branch 'upstream_main' into add_granite4_tool_parser

dd3261b

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

Fix several small issues

61715d0

- Fix wrong scope for server fixtures in tests
- Prevent the tool parser from streaming pieces of the <tool_call> marker as message content
- Reduce unnecessary delta messages

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Merge branch 'main' into add_granite4_tool_parser

d9d147e

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Appease mypy

2208783

Mypy is complaining about code outside of my changes for some reason Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Merge branch 'main' into add_granite4_tool_parser

2d2eadd

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

fix typos

3b0aa34

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Merge branch 'main' into add_granite4_tool_parser

fbbdb8d

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

Merge branch 'main' into add_simple_granite4_tool_parser

5ab907c

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

maxdebayser requested review from DarkLight1337, NickLucche, aarnphm, chaunceyjiang and robertgshaw2-redhat as code owners March 11, 2026 22:18

mergify bot added documentation tool-calling labels Mar 11, 2026

github-project-automation bot added this to Tool Calling Mar 11, 2026

gemini-code-assist bot reviewed Mar 11, 2026


vllm/tool_parsers/granite4_tool_parser.py

tests/entrypoints/openai/tool_parsers/test_granite4_tool_parser.py

fix linter complaint

6a2232d

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

This was referenced Mar 11, 2026

Add granite4 tool parser #35948

Open

[Bug]: delta_text and delta_token_ids get out of sync when stop sequences are used. #36830

Open

maxdebayser wants to merge 16 commits into vllm-project:main from maxdebayser:add_simple_granite4_tool_parser
