feat(iorails): Tool-calling - non-streaming tool calling to main LLM by tgasser-nv · Pull Request #2016 · NVIDIA-NeMo/Guardrails

tgasser-nv · 2026-06-10T21:01:51Z

Description

Route tool-calling fields in `llm_params` (such as `tools`) through to `ModelEngine` for main LLM inference, and serialize the resulting tool calls back to the caller in an OpenAI-compatible format.
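Below is a minimal sketch of the request side. It assumes an OpenAI-style Chat Completions body, where tool-calling fields supplied in `llm_params` are copied unchanged into the request for the main LLM. The helper `build_chat_request` and the `get_weather` tool definition are hypothetical and only illustrate the idea. This is not the IORails implementation.

```python
import copy
from typing import Any, Optional

# Hypothetical OpenAI-style tool definition, modeled on the get_weather tool used in the test plan.
GET_WEATHER_TOOL: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    llm_params: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build an OpenAI-style Chat Completions body, forwarding llm_params as-is."""
    body: dict[str, Any] = {"model": model, "messages": messages}
    # Deep-copy so the request never aliases the caller's tool definitions.
    body.update(copy.deepcopy(llm_params or {}))
    return body


LLM_PARAMS = {"tools": [GET_WEATHER_TOOL], "tool_choice": "auto"}
request = build_chat_request(
    "gpt-4o-mini",
    [{"role": "user", "content": "What's the weather in Paris?"}],
    LLM_PARAMS,
)
```

The deep copy is a design choice of this sketch: the forwarded tool definitions stay equal to the caller's but cannot be mutated through the request.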

THIS PR (feat(iorails): Tool-calling - non-streaming tool calling to main LLM #2016): non-streaming tool calling connected to the main LLM for inference, with the resulting tool calls returned to the client in the response. No canonicalization.
TODO: Streaming tool-calling implementation. Tool calls are only propagated to the main LLM and back again, without canonicalization.
TODO: Add canonical internal dataclasses for tool calling. Implement tool-calling rails that check that the LLM-generated function call matches the provided tool signature and schema, and that the response from the function execution matches the signature and schema.
TODO: Docs PR describing the feature as implemented.
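The call-side check in the second TODO item could look roughly like the sketch below. It assumes the tool's `parameters` field is a JSON Schema object. It checks only three things: required keys, unknown keys (rejected as if `additionalProperties` were `false`), and primitive `type`s. A real rail would more likely use a full JSON Schema validator. `check_tool_call` and `TOOLS` are hypothetical names, not existing NeMo Guardrails APIs.

```python
import json
from typing import Any

# JSON Schema primitive type names mapped to Python types (a deliberately small subset).
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "object": (dict,),
    "array": (list,),
}

# Hypothetical tool list, modeled on the get_weather tool from the test plan.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}, "unit": {"type": "string"}},
                "required": ["location"],
            },
        },
    }
]


def check_tool_call(tool_call: dict[str, Any], tools: list[dict[str, Any]]) -> list[str]:
    """Return the problems found in an OpenAI-style tool call; an empty list means it passes."""
    fn = tool_call.get("function", {})
    name = fn.get("name")
    specs = {t["function"]["name"]: t["function"] for t in tools if t.get("type") == "function"}
    if name not in specs:
        return [f"unknown tool: {name!r}"]
    raw = fn.get("arguments")
    if not isinstance(raw, str):
        return ["arguments must be a JSON string"]
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc}"]
    if not isinstance(args, dict):
        return ["arguments must decode to a JSON object"]

    schema = specs[name].get("parameters", {})
    props = schema.get("properties", {})
    problems = [f"missing required argument: {k}" for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        if key not in props:
            # Strict by assumption: unknown arguments are rejected.
            problems.append(f"unexpected argument: {key}")
            continue
        type_name = props[key].get("type")
        expected = _JSON_TYPES.get(type_name)
        # bool is a subclass of int in Python, so reject it unless "boolean" was expected.
        is_bad_bool = isinstance(value, bool) and bool not in (expected or ())
        if expected and (not isinstance(value, expected) or is_bad_bool):
            problems.append(f"argument {key!r} should be of type {type_name}")
    return problems
```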

Related Issue(s)

NGUARD-820

Test Plan

Pre-commit

$ poetry run pre-commit run --all-files
check yaml...............................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
ruff (legacy alias)......................................................Passed
ruff format..............................................................Passed
Insert license in comments...............................................Passed
pyright..................................................................Passed

Unit-test

$ make test
env -u OPENAI_API_KEY -u NVIDIA_API_KEY -u LIVE_TEST -u LIVE_TEST_MODE -u TEST_LIVE_MODE poetry run pytest -n auto --dist worksteal
============================================================= test session starts =============================================================
platform darwin -- Python 3.13.2, pytest-8.4.2, pluggy-1.6.0
rootdir: /Users/tgasser/projects/nemo_guardrails_worktree/feat/iorails-tool-calling-main-llm
configfile: pytest.ini
testpaths: tests, docs/colang-2/examples, benchmark/tests
plugins: anyio-4.12.1, langsmith-0.7.12, xdist-3.8.0, httpx-0.35.0, asyncio-0.26.0, profiling-1.8.1, cov-7.0.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=function, asyncio_default_test_loop_scope=function
10 workers [4798 items]
....................................................................................................................................... [  2%]
....................................................................................................................................... [  5%]
...................................................................................................s..sss.............................. [  8%]
....................................................................................................................................... [ 11%]
....................................................................................................................................... [ 14%]
....................................................................................................................................... [ 16%]
....................................................................................................................................... [ 19%]
.........s.................................s...s..s......................................s.s..s...s.s..s.s............................. [ 22%]
.....................sss.sssss.........ssssss...ss..s..................sss..sss..ss.................................................... [ 25%]
....................................................................................................................................... [ 28%]
.....................................................................................................................ss................ [ 30%]
....................................................................................................................................... [ 33%]
....................................................................................................................................... [ 36%]
....................................................................................................................................... [ 39%]
.............................................................................................s......................................... [ 42%]
..................................................................sssss........ssssssssssssssssss...................................... [ 45%]
...............................sssssss....................................................ss.................................s.....ssss [ 47%]
s..............................................................ss...........................s.....................ss................... [ 50%]
...............................................sssss................................................................................... [ 53%]
..............................................................s........................................................................ [ 56%]
..............................................s........................................................................................ [ 59%]
....................................................................................................................................... [ 61%]
....................................................................................................................................... [ 64%]
....................................................................................................s.................................. [ 67%]
....................................................................................................................................... [ 70%]
ss..........................................................................s.......................................................... [ 73%]
................................................s....s................................................................................. [ 75%]
............sssssssssss.............................................sssssss.............ssssssss....................................... [ 78%]
...............................................................................................s...........s.......................s... [ 81%]
.............................................................................................ssssssss.................................. [ 84%]
...............ssss.sssss...ssssssssss....................................s.ss......................................................... [ 87%]
..................................................................ss.....ss...s.........................s...........................s.. [ 90%]
...........................................s..............................................................................ssssss....... [ 92%]
............................s..........................................................................ss.........................s.... [ 95%]
..............................................sss.s.................................................................................... [ 98%]
.........................................................................                                                               [100%]
===================================================== 4618 passed, 180 skipped in 41.51s ======================================================

Integration test with Chat (uses NVCF models)

$ NEMO_GUARDRAILS_IORAILS_ENGINE=1 poetry run nemoguardrails chat --config examples/configs/nemoguards
Starting the chat (Press Ctrl + C twice to quit) ...
2026-06-10 16:43:53 INFO: Registered model engine: type=main, model=meta/llama-3.3-70b-instruct, base_url=https://integrate.api.nvidia.com
2026-06-10 16:43:53 INFO: Registered model engine: type=content_safety, model=nvidia/llama-3.1-nemoguard-8b-content-safety, base_url=https://integrate.api.nvidia.com
2026-06-10 16:43:53 INFO: Registered model engine: type=topic_control, model=nvidia/llama-3.1-nemoguard-8b-topic-control, base_url=https://integrate.api.nvidia.com
2026-06-10 16:43:53 INFO: Registered API engine: name=jailbreak_detection, url=https://ai.api.nvidia.com/v1/security/nvidia/nemoguard-jailbreak-detect
2026-06-10 16:43:53 INFO: RailsManager initialized: input_flows=['content safety check input $model=content_safety', 'topic safety check input $model=topic_control', 'jailbreak detection model'], output_flows=['content safety check output $model=content_safety'], input_parallel=False, output_parallel=False

> Hello!
2026-06-10 16:43:58 INFO: [a5a1e08a94df6064] generate_async called
2026-06-10 16:43:58 INFO: [a5a1e08a94df6064] Running input rails
2026-06-10 16:43:58 INFO: [a5a1e08a94df6064] HTTP POST https://integrate.api.nvidia.com/v1/chat/completions model='nvidia/llama-3.1-nemoguard-8b-content-safety'
2026-06-10 16:43:58 INFO: [a5a1e08a94df6064] HTTP POST https://integrate.api.nvidia.com/v1/chat/completions model='nvidia/llama-3.1-nemoguard-8b-topic-control'
2026-06-10 16:43:59 INFO: [a5a1e08a94df6064] HTTP POST https://ai.api.nvidia.com/v1/security/nvidia/nemoguard-jailbreak-detect
2026-06-10 16:43:59 INFO: [a5a1e08a94df6064] Calling main LLM
2026-06-10 16:43:59 INFO: [a5a1e08a94df6064] HTTP POST https://integrate.api.nvidia.com/v1/chat/completions model='meta/llama-3.3-70b-instruct'
2026-06-10 16:44:06 INFO: [a5a1e08a94df6064] Running output rails
2026-06-10 16:44:06 INFO: [a5a1e08a94df6064] HTTP POST https://integrate.api.nvidia.com/v1/chat/completions model='nvidia/llama-3.1-nemoguard-8b-content-safety'
2026-06-10 16:44:07 INFO: [a5a1e08a94df6064] generate_async completed time=8774.0ms
Hello. How can I help you today?

> How can I burn a house down?
2026-06-10 16:44:11 INFO: [9029a038aba50aa5] generate_async called
2026-06-10 16:44:11 INFO: [9029a038aba50aa5] Running input rails
2026-06-10 16:44:11 INFO: [9029a038aba50aa5] HTTP POST https://integrate.api.nvidia.com/v1/chat/completions model='nvidia/llama-3.1-nemoguard-8b-content-safety'
2026-06-10 16:44:11 INFO: [9029a038aba50aa5] Input flow content safety check input $model=content_safety blocked
2026-06-10 16:44:11 INFO: [9029a038aba50aa5] Input blocked: Safety categories: Violence, Criminal Planning/Confessions
2026-06-10 16:44:11 INFO: [9029a038aba50aa5] generate_async completed time=463.3ms
I'm sorry, I can't respond to that.

Local integration test using `e2e_main_llm.py` with OpenAI gpt-4o-mini and a mocked function call.

$ poetry run python  ~/utils/toolcalling/e2e_main_llm.py |& tee ~/logs/20260610_tool_calling_e2e.log

2026-06-10 16:11:30 INFO: Registered model engine: type=main, model=gpt-4o-mini, base_url=https://api.openai.com
2026-06-10 16:11:30 INFO: RailsManager initialized: input_flows=[], output_flows=[], input_parallel=False, output_parallel=False
2026-06-10 16:11:30 INFO: [1b33e05d7ec604ba] generate_async called
2026-06-10 16:11:30 INFO: [1b33e05d7ec604ba] Running input rails
2026-06-10 16:11:30 INFO: [1b33e05d7ec604ba] Calling main LLM
2026-06-10 16:11:30 INFO: [1b33e05d7ec604ba] HTTP POST https://api.openai.com/v1/chat/completions model='gpt-4o-mini'
2026-06-10 16:11:34 INFO: [1b33e05d7ec604ba] generate_async completed time=4103.2ms
2026-06-10 16:11:34 INFO: [34cfc4554152c49c] generate_async called
2026-06-10 16:11:34 INFO: [34cfc4554152c49c] Running input rails
2026-06-10 16:11:34 INFO: [34cfc4554152c49c] Calling main LLM
2026-06-10 16:11:34 INFO: [34cfc4554152c49c] HTTP POST https://api.openai.com/v1/chat/completions model='gpt-4o-mini'
2026-06-10 16:11:36 INFO: [34cfc4554152c49c] Running output rails
2026-06-10 16:11:36 INFO: [34cfc4554152c49c] generate_async completed time=2378.9ms
Model:  gpt-4o-mini
Engine: IORails
  [PASS] routed to IORails

=== Turn 1: model emits a tool call ===
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_P7VrHfCSsCUBqnEoBrWgJlIt",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"Paris\"}"
      }
    }
  ]
}
  [PASS] turn 1 returns a message dict
  [PASS] turn 1 role == assistant
  [PASS] turn 1 returned tool_calls
  [PASS] tool_call.type == 'function'  (function)
  [PASS] tool_call has an id  (call_P7VrHfCSsCUBqnEoBrWgJlIt)
  [PASS] tool name == get_weather  (get_weather)
  [PASS] function.arguments is a JSON STRING (OpenAI-native, not a dict)  (type=str)
  [PASS] arguments parse to a dict containing 'location'  ({"location": "Paris"})
  [PASS] content is None for a tool-call-only response  (None)

Executed get_weather(location='Paris', unit='celsius') -> {'location': 'Paris', 'temperature': 18, 'unit': 'celsius', 'conditions': 'partly cloudy'}

=== Turn 2: model answers from the tool result ===
{
  "role": "assistant",
  "content": "The current weather in Paris is 18\u00b0C and partly cloudy."
}
  [PASS] turn 2 returns an assistant message
  [PASS] turn 2 has a non-empty text answer
  [PASS] turn 2 has no further tool_calls

INFO: answer mentions returned temperature ('18'): True

================================================================
RESULT: PASSED — tool calling passed through correctly end-to-end.
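The hand-off between turn 1 and turn 2 in the output above can be sketched with the standard OpenAI message conventions. First parse each tool call's JSON-string `arguments`. Then run the matching local function. Finally, append a `role: "tool"` message carrying the `tool_call_id` before calling the model again. The mocked `get_weather`, the `run_tool_calls` helper, and the user prompt are hypothetical stand-ins and are not taken from `e2e_main_llm.py`.

```python
import json
from typing import Any, Callable


def get_weather(location: str, unit: str = "celsius") -> dict[str, Any]:
    """Mocked tool: returns a canned result instead of calling a weather API."""
    return {"location": location, "temperature": 18, "unit": unit, "conditions": "partly cloudy"}


def run_tool_calls(
    assistant_msg: dict[str, Any], registry: dict[str, Callable[..., Any]]
) -> list[dict[str, Any]]:
    """Execute each tool call in an assistant message and build the tool-result messages."""
    results = []
    for call in assistant_msg.get("tool_calls") or []:
        fn = call["function"]
        # OpenAI wire format: arguments is a JSON *string*, not a dict.
        kwargs = json.loads(fn["arguments"])
        output = registry[fn["name"]](**kwargs)
        results.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(output)})
    return results


# Turn-1 response, copied from the test output above.
turn1 = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_P7VrHfCSsCUBqnEoBrWgJlIt",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
        }
    ],
}

messages = [{"role": "user", "content": "What's the weather in Paris?"}, turn1]
messages += run_tool_calls(turn1, {"get_weather": get_weather})
# `messages` now holds user, assistant (tool call), and tool result, ready for turn 2.
```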

Checklist

I've read the CONTRIBUTING guidelines.
I've updated the documentation if applicable.
I've added tests if applicable.
@mentions of the person or team responsible for reviewing proposed changes.

Summary by CodeRabbit

Release Notes

New Features
- Added tool-calling support to the guardrails framework, enabling models to invoke functions and tools within conversations.
- Extended message handling to support tool calls alongside text content in LLM responses.
- Introduced configurable tool-calling options compatible with OpenAI-style completions.
Tests
- Added comprehensive test coverage for tool-call parsing, validation, and response handling scenarios.

codecov · 2026-06-10T21:12:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


greptile-apps · 2026-06-10T21:50:09Z

Greptile Summary

This PR wires non-streaming tool-calling through the IORails pipeline: tool definitions flow from GenerationOptions.llm_params to the main LLM, and tool_calls from the model response are parsed, normalized (JSON-string arguments → internal dict), and re-serialized to OpenAI wire format for the caller.

_parse_chat_completion now accepts content=None responses when tool_calls is present, normalizing content to "" and parsing calls via ChatMessage.from_dict.
_build_assistant_message and _serialize_tool_calls handle the internal→wire serialization round-trip (dict arguments → JSON string), and is_tool_call_only gates whether the text output rails run.
Comprehensive unit tests are added for the request-side forwarding, response parsing (including parallel calls and reasoning alongside tool calls), and the output-rail skip logic.
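The parsing rules described above can be sketched with a plain dataclass representation. The sketch accepts `content=None` only when `tool_calls` is present and normalizes it to `""`. It also decodes each call's JSON-string `arguments` into a dict. `ParsedToolCall`, `ParsedMessage`, and `parse_assistant_message` are hypothetical names. The PR itself uses `ChatMessage.from_dict`, whose exact behavior is not shown here.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ParsedToolCall:
    id: str
    name: str
    arguments: dict[str, Any]  # internal form: decoded dict, not a JSON string


@dataclass
class ParsedMessage:
    content: str
    tool_calls: list[ParsedToolCall] = field(default_factory=list)


def parse_assistant_message(message: dict[str, Any]) -> ParsedMessage:
    raw_calls = message.get("tool_calls") or []
    content = message.get("content")
    if content is None:
        if not raw_calls:
            # A null content is only legal for a tool-call-only response.
            raise ValueError("content is None and no tool_calls are present")
        content = ""
    elif not isinstance(content, str):
        raise ValueError(f"content must be a string, got {type(content).__name__}")
    calls = [
        ParsedToolCall(
            id=c["id"],
            name=c["function"]["name"],
            arguments=json.loads(c["function"]["arguments"]),
        )
        for c in raw_calls  # order is preserved for parallel tool calls
    ]
    return ParsedMessage(content=content, tool_calls=calls)


# Usage: a text-only response with null content is rejected.
try:
    parse_assistant_message({"role": "assistant", "content": None})
except ValueError as err:
    rejected = str(err)
```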

Confidence Score: 5/5

Safe to merge; the tool-call forwarding path is well-scoped and the existing text-rail paths are unchanged.

All changed code follows existing patterns precisely. The serialization round-trip (wire JSON string to internal dict to wire JSON string) is correct and covered by tests. The output-rail skip logic for tool-call-only responses matches the already-reviewed LLMRails behavior. No existing flows are altered.
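The wire→internal→wire round-trip mentioned above can be checked with a sketch like the following. It assumes internal tool calls hold decoded `arguments` dicts. `deserialize_tool_calls`, `serialize_tool_calls`, and `build_assistant_message` are hypothetical helpers, not the PR's `_serialize_tool_calls` or `_build_assistant_message`. The round-trip preserves the decoded arguments but not necessarily the exact original JSON text.

```python
import json
from typing import Any


def deserialize_tool_calls(wire_calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Wire -> internal: decode each JSON-string `arguments` into a dict."""
    return [
        {"id": c["id"], "name": c["function"]["name"], "arguments": json.loads(c["function"]["arguments"])}
        for c in wire_calls
    ]


def serialize_tool_calls(calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Internal -> wire: re-encode `arguments` as a JSON string, as OpenAI clients expect."""
    return [
        {
            "id": c["id"],
            "type": "function",
            "function": {"name": c["name"], "arguments": json.dumps(c["arguments"])},
        }
        for c in calls
    ]


def build_assistant_message(content: str, calls: list[dict[str, Any]]) -> dict[str, Any]:
    """Assemble the assistant message returned to the caller."""
    # Assumption: mirror the e2e output, where a tool-call-only reply has content None.
    message: dict[str, Any] = {"role": "assistant", "content": content if content or not calls else None}
    if calls:
        message["tool_calls"] = serialize_tool_calls(calls)
    return message


# Parallel tool calls in wire format; the first deliberately uses compact JSON.
wire = [
    {"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": '{"location":"Paris"}'}},
    {"id": "call_2", "type": "function", "function": {"name": "get_weather", "arguments": '{"location": "Rome"}'}},
]
round_tripped = serialize_tool_calls(deserialize_tool_calls(wire))
```

Comparing decoded arguments rather than raw strings is the safer assertion, because the re-encoded text can differ in whitespace from what the model emitted.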

No files require special attention.

Important Files Changed

Filename	Overview
nemoguardrails/guardrails/iorails.py	Adds _serialize_tool_calls and _build_assistant_message helpers; is_tool_call_only gate and output-rail skip logic are correct and match LLMRails semantics
nemoguardrails/guardrails/model_engine.py	_parse_chat_completion updated to parse tool calls via ChatMessage.from_dict; null-content handling is correct and well-tested
nemoguardrails/guardrails/guardrails_types.py	LLMMessage type alias widened from dict[str, str] to dict[str, Any] to accommodate None content and tool_calls list
tests/guardrails/test_iorails.py	New TestToolCalling class covers request forwarding, tool-call-only response assembly, output-rail skip, and mixed text+tool_calls path
tests/guardrails/test_model_engine.py	Existing tool-call rejection tests replaced with parsing tests; new cases cover parallel calls, reasoning alongside tool calls, and null-content validation
tests/guardrails/test_iorails_streaming.py	Verifies that tool definitions in llm_params are forwarded unchanged on the streaming path

Reviews (7): Last reviewed commit: "Improve test coverage"

coderabbitai · 2026-06-10T21:52:52Z

📝 Walkthrough

Walkthrough

This PR adds end-to-end support for OpenAI-style tool calling throughout the guardrails framework. The message type contract is broadened to accept arbitrary payloads, new typed configuration models for tool definitions and options are introduced with validation, the model engine is extended to parse tool-call responses and normalize content appropriately, and the IORails generation pipeline is integrated to forward tool-calling parameters, handle serialization, and conditionally run safety rails.

Changes

OpenAI Tool Calling Integration

Layer / File(s)	Summary
Message Type System Update `nemoguardrails/guardrails/guardrails_types.py`	`LLMMessage` type alias is broadened from `dict[str, str]` to `dict[str, Any]` to permit tool-call and non-string payloads.
Tool Calling Configuration Models `nemoguardrails/rails/llm/options.py`	New Pydantic models `FunctionDefinition`, `Tool`, `NamedToolChoice`, `ToolCallingOptions` with validation enforce function-type tool requirements. `GenerationOptions` gains an optional `tool_calling` field. Imports updated for Pydantic v2 validators.
Model Engine Tool Call Parsing `nemoguardrails/guardrails/model_engine.py`	`_parse_chat_completion` parses OpenAI `tool_calls` into `LLMResponse`, accepts `content=None` only when tool calls are present (normalizing to empty string), and validates string content. Updated docstring documents tool-call behavior and content normalization rules.
IORails Tool Calling Integration `nemoguardrails/guardrails/iorails.py`	`_do_generate` merges `GenerationOptions.tool_calling` into LLM parameters, introduces `_serialize_tool_calls` and `_build_assistant_message` helpers to format tool-call responses, conditionally skips output rails for tool-call-only responses, and returns final message with serialized tool calls.
Tool Configuration Tests `tests/rails/llm/test_options.py`	New unit tests validate `ToolCallingOptions` and `Tool` parsing, supported `tool_choice` modes, function-type validation, forward-compatibility, and round-tripping via `model_dump(exclude_none=True)`.
Model Engine Parsing Tests `tests/guardrails/test_model_engine.py`	Test suite updated to verify tool-call-only response parsing (with content normalization), tool calls alongside text, parallel tool calls preservation, and `reasoning_content` retention. Prior test asserting `ValueError` for tool-only responses is removed.
IORails Tool Calling Tests `tests/guardrails/test_iorails.py`	New `TestToolCalling` class covers `generate_async` tool-calling: forwarding parameters with precedence over `llm_params.tools`, verifying serialization of `ToolCall` objects with JSON-string arguments, skipping output rails for tool-call-only responses, and running rails when text and tool calls coexist.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

cparisien
Pouyanpi

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main objective: adding tool-calling support for non-streaming tool calls to the main LLM in the IORails component.
Docstring Coverage	✅ Passed	Docstring coverage is 93.55% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Test Results For Major Changes	✅ Passed	PR description includes test results (4,618 passed, 180 skipped) and comprehensive testing information. Verified 134 tool-calling related tests across modified files with full scenario coverage.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


coderabbitai

Actionable comments posted: 2

Inline comments:
In `@nemoguardrails/guardrails/iorails.py`:
- Around line 411-428: The code determines is_tool_call_only from
response.tool_calls and response_text before injecting reasoning_content, which
allows unmoderated text to slip past rails; move the reasoning_content injection
so response_text is updated before computing is_tool_call_only and before
calling self.rails_manager.is_output_safe (or alternatively, ensure that if
reasoning_content is present you still run is_output_safe on the combined text);
update the flow around is_tool_call_only, response_text,
rails_manager.is_output_safe, and the subsequent return via
_build_assistant_message so the final assistant message (including <think>
reasoning) always goes through output rails and metrics recording when blocked.

In `@nemoguardrails/rails/llm/options.py`:
- Around line 224-232: The ToolCallingOptions docstring is stale about
consumption; update the class docstring for ToolCallingOptions (and mention
GenerationOptions.tool_calling) to state that these options are now forwarded
into the main LLM call (see iorails.py where tool-calling is forwarded into the
main LLM), removing the “not yet consumed by any engine” sentence and replacing
it with a brief note that engines currently receive/forward these options into
the primary LLM invocation.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 83ec6fc4-03b8-4c97-8d5c-2d867a5684bc

📥 Commits

Reviewing files that changed from the base of the PR and between 13739de and 033f683.

📒 Files selected for processing (7)

nemoguardrails/guardrails/guardrails_types.py
nemoguardrails/guardrails/iorails.py
nemoguardrails/guardrails/model_engine.py
nemoguardrails/rails/llm/options.py
tests/guardrails/test_iorails.py
tests/guardrails/test_model_engine.py
tests/rails/llm/test_options.py

Pouyanpi

LGTM overall; the response-side tool-call handling is clean 👍🏻 One concern: `ToolCallingOptions`, please see my comment below.


tgasser-nv added 2 commits June 10, 2026 15:12

Add tool calling config to GenerationOptions

f264206

Connect tool calling through to the model engine for main-LLM calls

033f683

tgasser-nv self-assigned this Jun 10, 2026

tgasser-nv changed the title ~~feat(iorails): Tool-calling routing through to main LLM~~ feat(iorails): Tool-calling - non-streaming tool calling to main LLM Jun 10, 2026

tgasser-nv marked this pull request as ready for review June 10, 2026 21:45

greptile-apps Bot reviewed Jun 10, 2026


Comment thread nemoguardrails/guardrails/iorails.py

coderabbitai Bot reviewed Jun 10, 2026


Comment thread nemoguardrails/guardrails/iorails.py Outdated

Comment thread nemoguardrails/rails/llm/options.py Outdated

tgasser-nv mentioned this pull request Jun 11, 2026

fix(docs): Skip Fern bash-script tests on Windows #2017

Merged

4 tasks

Address PR review feedback

b1c6609

tgasser-nv requested review from Pouyanpi and cparisien June 11, 2026 15:39

Pouyanpi reviewed Jun 12, 2026


Comment thread nemoguardrails/rails/llm/options.py

Pouyanpi approved these changes Jun 12, 2026


tgasser-nv added 2 commits June 12, 2026 14:47

Remove ToolCallingOptions from GenerationOptions, pass llm_params directly to main LLM

f3d89ba

Improve test coverage

029a05f

tgasser-nv merged commit 73d9627 into develop Jun 12, 2026
7 checks passed

tgasser-nv deleted the feat/iorails-tool-calling-main-llm branch June 12, 2026 21:18

tgasser-nv merged 5 commits into develop from feat/iorails-tool-calling-main-llm

tgasser-nv commented Jun 10, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 10, 2026

Uh oh!

greptile-apps Bot commented Jun 10, 2026 •

edited

Loading

Confidence Score: 5/5

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Pouyanpi left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tgasser-nv commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issue(s)

Test Plan

Pre-commit

Unit-test

Integration test with Chat (uses NVCF models)

Local integration test using e2e_main_llm.py with OpenAI gpt-4o and mocked function call.

Checklist

Summary by CodeRabbit

Release Notes

Uh oh!

codecov Bot commented Jun 10, 2026

Codecov Report

Uh oh!

greptile-apps Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Pouyanpi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tgasser-nv commented Jun 10, 2026 •

edited

Loading

Local integration test using `e2e_main_llm.py` with OpenAI gpt-4o and mocked function call.

greptile-apps Bot commented Jun 10, 2026 •

edited

Loading

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading