feat(iorails): Tool-calling - non-streaming tool calling to main LLM#2016

Merged
tgasser-nv merged 5 commits into develop from feat/iorails-tool-calling-main-llm
Jun 12, 2026
Conversation

@tgasser-nv

@tgasser-nv tgasser-nv commented Jun 10, 2026

Collaborator

Description

Route tool-calling llm_params fields through to ModelEngine and serialize to an OpenAI-compatible format on main LLM inference.

  • THIS PR (#2016): Non-streaming tool calling is connected to the main LLM for inference, and the response is returned to the client. No canonicalization.
  • TODO: Streaming tool-calling implementation, which only propagates tool calls to the main LLM and back again without canonicalization.
  • TODO: Add canonical internal dataclasses for tool calling. Implement tool-calling rails that check that the LLM-generated function call matches the provided tool signature and schema, and that the response from the function execution matches the signature and schema.
  • TODO: Docs PR describing the feature as implemented.
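
As a rough illustration of what "routing tool-calling llm_params fields through" could look like on the request side, here is a minimal sketch. The helper `build_chat_payload` and the tool description are hypothetical and are not taken from the PR's code; only the OpenAI-compatible shape of the `tools` entry and the `get_weather` name (from the test plan) are grounded.

```python
import json
from typing import Any

# Hypothetical OpenAI-style tool definition; the name matches the mocked
# get_weather tool in the test plan, the description and schema are assumed.
GET_WEATHER_TOOL: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}


def build_chat_payload(
    model: str, messages: list[dict[str, Any]], llm_params: dict[str, Any]
) -> dict[str, Any]:
    """Hypothetical helper: copy llm_params (tool fields included) into the request body."""
    payload: dict[str, Any] = {"model": model, "messages": messages}
    # Forward every parameter that is actually set; unset (None) values are dropped.
    payload.update({k: v for k, v in llm_params.items() if v is not None})
    return payload


payload = build_chat_payload(
    "gpt-4o-mini",
    [{"role": "user", "content": "What's the weather in Paris?"}],
    {"tools": [GET_WEATHER_TOOL], "tool_choice": "auto", "temperature": None},
)
print(json.dumps(payload, indent=2))
```

The point of the sketch is that tool fields travel unchanged alongside ordinary sampling parameters; the real ModelEngine may filter or validate them differently.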

Related Issue(s)

NGUARD-820

Test Plan

Pre-commit

$ poetry run pre-commit run --all-files
check yaml...............................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
ruff (legacy alias)......................................................Passed
ruff format..............................................................Passed
Insert license in comments...............................................Passed
pyright..................................................................Passed

Unit-test

$ make test
env -u OPENAI_API_KEY -u NVIDIA_API_KEY -u LIVE_TEST -u LIVE_TEST_MODE -u TEST_LIVE_MODE poetry run pytest -n auto --dist worksteal
============================================================= test session starts =============================================================
platform darwin -- Python 3.13.2, pytest-8.4.2, pluggy-1.6.0
rootdir: /Users/tgasser/projects/nemo_guardrails_worktree/feat/iorails-tool-calling-main-llm
configfile: pytest.ini
testpaths: tests, docs/colang-2/examples, benchmark/tests
plugins: anyio-4.12.1, langsmith-0.7.12, xdist-3.8.0, httpx-0.35.0, asyncio-0.26.0, profiling-1.8.1, cov-7.0.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=function, asyncio_default_test_loop_scope=function
10 workers [4798 items]
....................................................................................................................................... [  2%]
....................................................................................................................................... [  5%]
...................................................................................................s..sss.............................. [  8%]
....................................................................................................................................... [ 11%]
....................................................................................................................................... [ 14%]
....................................................................................................................................... [ 16%]
....................................................................................................................................... [ 19%]
.........s.................................s...s..s......................................s.s..s...s.s..s.s............................. [ 22%]
.....................sss.sssss.........ssssss...ss..s..................sss..sss..ss.................................................... [ 25%]
....................................................................................................................................... [ 28%]
.....................................................................................................................ss................ [ 30%]
....................................................................................................................................... [ 33%]
....................................................................................................................................... [ 36%]
....................................................................................................................................... [ 39%]
.............................................................................................s......................................... [ 42%]
..................................................................sssss........ssssssssssssssssss...................................... [ 45%]
...............................sssssss....................................................ss.................................s.....ssss [ 47%]
s..............................................................ss...........................s.....................ss................... [ 50%]
...............................................sssss................................................................................... [ 53%]
..............................................................s........................................................................ [ 56%]
..............................................s........................................................................................ [ 59%]
....................................................................................................................................... [ 61%]
....................................................................................................................................... [ 64%]
....................................................................................................s.................................. [ 67%]
....................................................................................................................................... [ 70%]
ss..........................................................................s.......................................................... [ 73%]
................................................s....s................................................................................. [ 75%]
............sssssssssss.............................................sssssss.............ssssssss....................................... [ 78%]
...............................................................................................s...........s.......................s... [ 81%]
.............................................................................................ssssssss.................................. [ 84%]
...............ssss.sssss...ssssssssss....................................s.ss......................................................... [ 87%]
..................................................................ss.....ss...s.........................s...........................s.. [ 90%]
...........................................s..............................................................................ssssss....... [ 92%]
............................s..........................................................................ss.........................s.... [ 95%]
..............................................sss.s.................................................................................... [ 98%]
.........................................................................                                                               [100%]
===================================================== 4618 passed, 180 skipped in 41.51s ======================================================

Integration test with Chat (uses NVCF models)

$ NEMO_GUARDRAILS_IORAILS_ENGINE=1 poetry run nemoguardrails chat --config examples/configs/nemoguards
Starting the chat (Press Ctrl + C twice to quit) ...
2026-06-10 16:43:53 INFO: Registered model engine: type=main, model=meta/llama-3.3-70b-instruct, base_url=https://integrate.api.nvidia.com
2026-06-10 16:43:53 INFO: Registered model engine: type=content_safety, model=nvidia/llama-3.1-nemoguard-8b-content-safety, base_url=https://integrate.api.nvidia.com
2026-06-10 16:43:53 INFO: Registered model engine: type=topic_control, model=nvidia/llama-3.1-nemoguard-8b-topic-control, base_url=https://integrate.api.nvidia.com
2026-06-10 16:43:53 INFO: Registered API engine: name=jailbreak_detection, url=https://ai.api.nvidia.com/v1/security/nvidia/nemoguard-jailbreak-detect
2026-06-10 16:43:53 INFO: RailsManager initialized: input_flows=['content safety check input $model=content_safety', 'topic safety check input $model=topic_control', 'jailbreak detection model'], output_flows=['content safety check output $model=content_safety'], input_parallel=False, output_parallel=False

> Hello!
2026-06-10 16:43:58 INFO: [a5a1e08a94df6064] generate_async called
2026-06-10 16:43:58 INFO: [a5a1e08a94df6064] Running input rails
2026-06-10 16:43:58 INFO: [a5a1e08a94df6064] HTTP POST https://integrate.api.nvidia.com/v1/chat/completions model='nvidia/llama-3.1-nemoguard-8b-content-safety'
2026-06-10 16:43:58 INFO: [a5a1e08a94df6064] HTTP POST https://integrate.api.nvidia.com/v1/chat/completions model='nvidia/llama-3.1-nemoguard-8b-topic-control'
2026-06-10 16:43:59 INFO: [a5a1e08a94df6064] HTTP POST https://ai.api.nvidia.com/v1/security/nvidia/nemoguard-jailbreak-detect
2026-06-10 16:43:59 INFO: [a5a1e08a94df6064] Calling main LLM
2026-06-10 16:43:59 INFO: [a5a1e08a94df6064] HTTP POST https://integrate.api.nvidia.com/v1/chat/completions model='meta/llama-3.3-70b-instruct'
2026-06-10 16:44:06 INFO: [a5a1e08a94df6064] Running output rails
2026-06-10 16:44:06 INFO: [a5a1e08a94df6064] HTTP POST https://integrate.api.nvidia.com/v1/chat/completions model='nvidia/llama-3.1-nemoguard-8b-content-safety'
2026-06-10 16:44:07 INFO: [a5a1e08a94df6064] generate_async completed time=8774.0ms
Hello. How can I help you today?

> How can I burn a house down?
2026-06-10 16:44:11 INFO: [9029a038aba50aa5] generate_async called
2026-06-10 16:44:11 INFO: [9029a038aba50aa5] Running input rails
2026-06-10 16:44:11 INFO: [9029a038aba50aa5] HTTP POST https://integrate.api.nvidia.com/v1/chat/completions model='nvidia/llama-3.1-nemoguard-8b-content-safety'
2026-06-10 16:44:11 INFO: [9029a038aba50aa5] Input flow content safety check input $model=content_safety blocked
2026-06-10 16:44:11 INFO: [9029a038aba50aa5] Input blocked: Safety categories: Violence, Criminal Planning/Confessions
2026-06-10 16:44:11 INFO: [9029a038aba50aa5] generate_async completed time=463.3ms
I'm sorry, I can't respond to that.

Local integration test using e2e_main_llm.py with OpenAI gpt-4o-mini and a mocked function call.

$ poetry run python  ~/utils/toolcalling/e2e_main_llm.py |& tee ~/logs/20260610_tool_calling_e2e.log

2026-06-10 16:11:30 INFO: Registered model engine: type=main, model=gpt-4o-mini, base_url=https://api.openai.com
2026-06-10 16:11:30 INFO: RailsManager initialized: input_flows=[], output_flows=[], input_parallel=False, output_parallel=False
2026-06-10 16:11:30 INFO: [1b33e05d7ec604ba] generate_async called
2026-06-10 16:11:30 INFO: [1b33e05d7ec604ba] Running input rails
2026-06-10 16:11:30 INFO: [1b33e05d7ec604ba] Calling main LLM
2026-06-10 16:11:30 INFO: [1b33e05d7ec604ba] HTTP POST https://api.openai.com/v1/chat/completions model='gpt-4o-mini'
2026-06-10 16:11:34 INFO: [1b33e05d7ec604ba] generate_async completed time=4103.2ms
2026-06-10 16:11:34 INFO: [34cfc4554152c49c] generate_async called
2026-06-10 16:11:34 INFO: [34cfc4554152c49c] Running input rails
2026-06-10 16:11:34 INFO: [34cfc4554152c49c] Calling main LLM
2026-06-10 16:11:34 INFO: [34cfc4554152c49c] HTTP POST https://api.openai.com/v1/chat/completions model='gpt-4o-mini'
2026-06-10 16:11:36 INFO: [34cfc4554152c49c] Running output rails
2026-06-10 16:11:36 INFO: [34cfc4554152c49c] generate_async completed time=2378.9ms
Model:  gpt-4o-mini
Engine: IORails
  [PASS] routed to IORails

=== Turn 1: model emits a tool call ===
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_P7VrHfCSsCUBqnEoBrWgJlIt",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"Paris\"}"
      }
    }
  ]
}
  [PASS] turn 1 returns a message dict
  [PASS] turn 1 role == assistant
  [PASS] turn 1 returned tool_calls
  [PASS] tool_call.type == 'function'  (function)
  [PASS] tool_call has an id  (call_P7VrHfCSsCUBqnEoBrWgJlIt)
  [PASS] tool name == get_weather  (get_weather)
  [PASS] function.arguments is a JSON STRING (OpenAI-native, not a dict)  (type=str)
  [PASS] arguments parse to a dict containing 'location'  ({"location": "Paris"})
  [PASS] content is None for a tool-call-only response  (None)

Executed get_weather(location='Paris', unit='celsius') -> {'location': 'Paris', 'temperature': 18, 'unit': 'celsius', 'conditions': 'partly cloudy'}

=== Turn 2: model answers from the tool result ===
{
  "role": "assistant",
  "content": "The current weather in Paris is 18\u00b0C and partly cloudy."
}
  [PASS] turn 2 returns an assistant message
  [PASS] turn 2 has a non-empty text answer
  [PASS] turn 2 has no further tool_calls

INFO: answer mentions returned temperature ('18'): True

================================================================
RESULT: PASSED — tool calling passed through correctly end-to-end.
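
The e2e_main_llm.py script itself is not part of this PR, so the following is only a reconstruction of the step between Turn 1 and Turn 2 in the log above: the caller executes the mocked tool and sends the result back as a `tool` message. The `tool_result_messages` helper and the mock body are assumptions; the message shapes follow the OpenAI chat-completions format.

```python
import json
from typing import Any, Callable

# Turn 1 assistant message, copied from the log output above.
assistant_turn: dict[str, Any] = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_P7VrHfCSsCUBqnEoBrWgJlIt",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
        }
    ],
}


def get_weather(location: str, unit: str = "celsius") -> dict[str, Any]:
    """Mocked tool; returns the same values as the log above."""
    return {"location": location, "temperature": 18, "unit": unit, "conditions": "partly cloudy"}


def tool_result_messages(
    assistant_message: dict[str, Any], registry: dict[str, Callable[..., Any]]
) -> list[dict[str, Any]]:
    """Hypothetical helper: run each requested tool and build the 'tool' role replies."""
    replies = []
    for call in assistant_message["tool_calls"]:
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        result = registry[fn["name"]](**args)
        replies.append(
            {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
        )
    return replies


replies = tool_result_messages(assistant_turn, {"get_weather": get_weather})
print(replies)
```

For Turn 2, the caller would append `assistant_turn` and `replies` to the conversation and call the model again.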

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added tool-calling support to the guardrails framework, enabling models to invoke functions and tools within conversations.
    • Extended message handling to support tool calls alongside text content in LLM responses.
    • Introduced configurable tool-calling options compatible with OpenAI-style completions.
  • Tests

    • Added comprehensive test coverage for tool-call parsing, validation, and response handling scenarios.

@tgasser-nv tgasser-nv self-assigned this Jun 10, 2026
@codecov

codecov Bot commented Jun 10, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

@tgasser-nv tgasser-nv changed the title from "feat(iorails): Tool-calling routing through to main LLM" to "feat(iorails): Tool-calling - non-streaming tool calling to main LLM" Jun 10, 2026
@tgasser-nv tgasser-nv marked this pull request as ready for review June 10, 2026 21:45
@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Contributor

Greptile Summary

This PR wires non-streaming tool-calling through the IORails pipeline: tool definitions flow from GenerationOptions.llm_params to the main LLM, and tool_calls from the model response are parsed, normalized (JSON-string arguments → internal dict), and re-serialized to OpenAI wire format for the caller.

  • _parse_chat_completion now accepts content=None responses when tool_calls is present, normalizing content to "" and parsing calls via ChatMessage.from_dict.
  • _build_assistant_message and _serialize_tool_calls handle the internal→wire serialization round-trip (dict arguments → JSON string), and is_tool_call_only gates whether the text output rails run.
  • Comprehensive unit tests are added for the request-side forwarding, response parsing (including parallel calls and reasoning alongside tool calls), and the output-rail skip logic.
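
The round-trip described above (wire JSON-string arguments to an internal dict and back) can be sketched as follows. These functions are simplified stand-ins, not the PR's `_serialize_tool_calls` or `ChatMessage.from_dict`; the internal dict layout and the exact `is_tool_call_only` condition are assumptions.

```python
import json
from typing import Any, Optional


def parse_tool_calls(wire_calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Wire -> internal: decode each function.arguments JSON string into a dict."""
    parsed = []
    for call in wire_calls:
        fn = call["function"]
        parsed.append(
            {
                "id": call["id"],
                "type": call.get("type", "function"),
                "name": fn["name"],
                "arguments": json.loads(fn["arguments"]),
            }
        )
    return parsed


def serialize_tool_calls(calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Internal -> wire: re-encode the arguments dict as a JSON string."""
    return [
        {
            "id": c["id"],
            "type": c["type"],
            "function": {"name": c["name"], "arguments": json.dumps(c["arguments"])},
        }
        for c in calls
    ]


def is_tool_call_only(content: Optional[str], calls: list[dict[str, Any]]) -> bool:
    """Assumed gate: tool calls are present and there is no text to moderate."""
    return bool(calls) and not content


wire = [
    {
        "id": "call_P7VrHfCSsCUBqnEoBrWgJlIt",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
    }
]
internal = parse_tool_calls(wire)
print(internal[0]["arguments"])
print(serialize_tool_calls(internal) == wire)
```

The equality check holds here because the original string already uses `json.dumps` default separators; in general only the decoded arguments, not the exact whitespace, survive the round-trip.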

Confidence Score: 5/5

Safe to merge; the tool-call forwarding path is well-scoped and the existing text-rail paths are unchanged.

All changed code follows existing patterns precisely. The serialization round-trip (wire JSON string to internal dict to wire JSON string) is correct and covered by tests. The output-rail skip logic for tool-call-only responses matches the already-reviewed LLMRails behavior. No existing flows are altered.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| nemoguardrails/guardrails/iorails.py | Adds _serialize_tool_calls and _build_assistant_message helpers; is_tool_call_only gate and output-rail skip logic are correct and match LLMRails semantics |
| nemoguardrails/guardrails/model_engine.py | _parse_chat_completion updated to parse tool calls via ChatMessage.from_dict; null-content handling is correct and well-tested |
| nemoguardrails/guardrails/guardrails_types.py | LLMMessage type alias widened from dict[str, str] to dict[str, Any] to accommodate None content and tool_calls list |
| tests/guardrails/test_iorails.py | New TestToolCalling class covers request forwarding, tool-call-only response assembly, output-rail skip, and mixed text+tool_calls path |
| tests/guardrails/test_model_engine.py | Existing tool-call rejection tests replaced with parsing tests; new cases cover parallel calls, reasoning alongside tool calls, and null-content validation |
| tests/guardrails/test_iorails_streaming.py | Verifies that tool definitions in llm_params are forwarded unchanged on the streaming path |

Reviews (7): Last reviewed commit: "Improve test coverage"

Comment thread nemoguardrails/guardrails/iorails.py
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Contributor

Walkthrough

This PR adds end-to-end support for OpenAI-style tool calling throughout the guardrails framework. The message type contract is broadened to accept arbitrary payloads, new typed configuration models for tool definitions and options are introduced with validation, the model engine is extended to parse tool-call responses and normalize content appropriately, and the IORails generation pipeline is integrated to forward tool-calling parameters, handle serialization, and conditionally run safety rails.

Changes

OpenAI Tool Calling Integration

| Layer / File(s) | Summary |
|---|---|
| Message Type System Update / nemoguardrails/guardrails/guardrails_types.py | LLMMessage type alias is broadened from dict[str, str] to dict[str, Any] to permit tool-call and non-string payloads. |
| Tool Calling Configuration Models / nemoguardrails/rails/llm/options.py | New Pydantic models FunctionDefinition, Tool, NamedToolChoice, ToolCallingOptions with validation enforce function-type tool requirements. GenerationOptions gains an optional tool_calling field. Imports updated for Pydantic v2 validators. |
| Model Engine Tool Call Parsing / nemoguardrails/guardrails/model_engine.py | _parse_chat_completion parses OpenAI tool_calls into LLMResponse, accepts content=None only when tool calls are present (normalizing to empty string), and validates string content. Updated docstring documents tool-call behavior and content normalization rules. |
| IORails Tool Calling Integration / nemoguardrails/guardrails/iorails.py | _do_generate merges GenerationOptions.tool_calling into LLM parameters, introduces _serialize_tool_calls and _build_assistant_message helpers to format tool-call responses, conditionally skips output rails for tool-call-only responses, and returns final message with serialized tool calls. |
| Tool Configuration Tests / tests/rails/llm/test_options.py | New unit tests validate ToolCallingOptions and Tool parsing, supported tool_choice modes, function-type validation, forward-compatibility, and round-tripping via model_dump(exclude_none=True). |
| Model Engine Parsing Tests / tests/guardrails/test_model_engine.py | Test suite updated to verify tool-call-only response parsing (with content normalization), tool calls alongside text, parallel tool calls preservation, and reasoning_content retention. Prior test asserting ValueError for tool-only responses is removed. |
| IORails Tool Calling Tests / tests/guardrails/test_iorails.py | New TestToolCalling class covers generate_async tool-calling: forwarding parameters with precedence over llm_params.tools, verifying serialization of ToolCall objects with JSON-string arguments, skipping output rails for tool-call-only responses, and running rails when text and tool calls coexist. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • cparisien
  • Pouyanpi
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main objective: adding tool-calling support for non-streaming tool calls to the main LLM in the IORails component. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 93.55% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Test Results For Major Changes | ✅ Passed | PR description includes test results (4,618 passed, 180 skipped) and comprehensive testing information. Verified 134 tool-calling related tests across modified files with full scenario coverage. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nemoguardrails/guardrails/iorails.py`:
- Around line 411-428: The code determines is_tool_call_only from
response.tool_calls and response_text before injecting reasoning_content, which
allows unmoderated text to slip past rails; move the reasoning_content injection
so response_text is updated before computing is_tool_call_only and before
calling self.rails_manager.is_output_safe (or alternatively, ensure that if
reasoning_content is present you still run is_output_safe on the combined text);
update the flow around is_tool_call_only, response_text,
rails_manager.is_output_safe, and the subsequent return via
_build_assistant_message so the final assistant message (including <think>
reasoning) always goes through output rails and metrics recording when blocked.

In `@nemoguardrails/rails/llm/options.py`:
- Around line 224-232: The ToolCallingOptions docstring is stale about
consumption; update the class docstring for ToolCallingOptions (and mention
GenerationOptions.tool_calling) to state that these options are now forwarded
into the main LLM call (see iorails.py where tool-calling is forwarded into the
main LLM), removing the “not yet consumed by any engine” sentence and replacing
it with a brief note that engines currently receive/forward these options into
the primary LLM invocation.
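
The ordering issue in the first finding above can be illustrated with a small sketch: reasoning is folded into the text before the tool-call-only gate is evaluated, so any text that reaches the client also passes through the output rail. The function `moderate_response`, the `<think>` wrapping format, and the `(text, rails_ran)` return shape are assumptions for illustration, not the code that was merged.

```python
from typing import Any, Callable, Optional

REFUSAL = "I'm sorry, I can't respond to that."


def moderate_response(
    text: str,
    reasoning: Optional[str],
    tool_calls: list[dict[str, Any]],
    is_output_safe: Callable[[str], bool],
) -> tuple[str, bool]:
    """Return (final_text, rails_ran) for one LLM response."""
    # 1. Inject reasoning first, so the gate below sees the combined text.
    if reasoning:
        text = f"<think>{reasoning}</think>\n{text}"
    # 2. Only now decide whether there is any text to moderate.
    is_tool_call_only = bool(tool_calls) and not text.strip()
    if is_tool_call_only:
        return text, False
    # 3. Everything else goes through the output rail.
    if not is_output_safe(text):
        return REFUSAL, True
    return text, True


seen: list[str] = []


def fake_rail(candidate: str) -> bool:
    """Stand-in for an output rail: records its input, blocks text containing 'secret'."""
    seen.append(candidate)
    return "secret" not in candidate


calls = [{"id": "call_1"}]
print(moderate_response("", None, calls, fake_rail))
print(moderate_response("", "secret plan", calls, fake_rail))
```

With the gate computed before step 1, the second call would have been classified as tool-call-only and the reasoning text would have skipped the rail.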
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 83ec6fc4-03b8-4c97-8d5c-2d867a5684bc

📥 Commits

Reviewing files that changed from the base of the PR and between 13739de and 033f683.

📒 Files selected for processing (7)
  • nemoguardrails/guardrails/guardrails_types.py
  • nemoguardrails/guardrails/iorails.py
  • nemoguardrails/guardrails/model_engine.py
  • nemoguardrails/rails/llm/options.py
  • tests/guardrails/test_iorails.py
  • tests/guardrails/test_model_engine.py
  • tests/rails/llm/test_options.py

Comment thread nemoguardrails/guardrails/iorails.py Outdated
Comment thread nemoguardrails/rails/llm/options.py Outdated
@tgasser-nv tgasser-nv requested review from Pouyanpi and cparisien June 11, 2026 15:39

@Pouyanpi Pouyanpi left a comment

Collaborator

LGTM overall; the response-side tool call handling is clean 👍🏻 One concern about ToolCallingOptions: please see my comment below.

Comment thread nemoguardrails/rails/llm/options.py
@tgasser-nv tgasser-nv merged commit 73d9627 into develop Jun 12, 2026
7 checks passed
@tgasser-nv tgasser-nv deleted the feat/iorails-tool-calling-main-llm branch June 12, 2026 21:18