[Tool Parser] Add tool parser for NVIDIA-Nemotron-Nano-9B-v2 model #42255
sniper35 wants to merge 6 commits into
Register a built-in parser for Nemotron <TOOLCALL> JSON payloads, add a matching chat template example, and cover streaming extraction for content-plus-tool and parallel-call chunk boundaries. Signed-off-by: Dong Wang <dongw2019@gmail.com>
Documentation preview: https://vllm--42255.org.readthedocs.build/en/42255/
Code Review
This PR adds the NemotronJSONToolParser and a corresponding Jinja2 chat template to support tool calling for Nemotron models. It includes documentation and tests for streaming and non-streaming tool extraction. Feedback points out a potential JSON escaping issue in the Jinja2 template, recommending the tojson filter for function names instead of manual quoting.
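The escaping concern can be reproduced without the template. This is a rough sketch in which `json.dumps` stands in for Jinja2's `tojson` filter. Their exact output can differ, but both quote and escape the string. The helpers `render_manual` and `render_escaped` are hypothetical and only mimic the two template styles.

```python
import json


def render_manual(name):
    # Mimics a template that wraps the name in hand-written quotes.
    return '{"name": "' + name + '"}'


def render_escaped(name):
    # Mimics a template that lets the serializer add quotes and escapes.
    return '{"name": ' + json.dumps(name) + '}'


tricky = 'say_"hi"'  # a function name containing double quotes

# The serializer-based rendering round-trips to the original name.
print(json.loads(render_escaped(tricky))["name"] == tricky)

# The hand-quoted rendering is not valid JSON for this input.
try:
    json.loads(render_manual(tricky))
except json.JSONDecodeError as exc:
    print("manual quoting failed:", exc)
```

Names made only of letters, digits, and underscores render identically either way, so the difference shows up only for names containing quotes, backslashes, or control characters.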
     class ExampleToolParser(ToolParser):
    -    def __init__(self, tokenizer: TokenizerLike):
    -        super().__init__(tokenizer)
    +    def __init__(self, tokenizer: TokenizerLike, tools=None):
This is to align with the latest code base.
I am from the Nemotron team and worked on this model. The tool-parser implementation looks correct, and the tests show that it functions correctly.
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Dong W <89223086+sniper35@users.noreply.github.com>
@tomeras91 @aarnphm @chaunceyjiang @sfeng33 @bbrowning Could you please review? Thanks!
Purpose
Closes #42065
Nemotron cookbook updated to align with vllm codebase: NVIDIA-NeMo/Nemotron#196
Related: HF repo update: https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2/discussions/37
Test Plan
Start the server with the newly registered tool parser:
Non-streaming:
Streaming:
Test Result
Non-streaming:
Streaming:
test_nanotron_streaming.py
test_nanotron_non_streaming.py
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.