fix(iorails): Apply inference-time llm_params on top of Model.parameters in ModelEngine by tgasser-nv · Pull Request #2020 · NVIDIA-NeMo/Guardrails

tgasser-nv · 2026-06-12T16:07:34Z

Description

IORails was silently dropping llm_params sampling fields configured in a model's parameters (i.e. Model.parameters). It took GenerationOptions.llm_params at inference-time only.

This PR filters out reserved parameters (_RESERVED_LLM_PARAMETERS) in the Model.parameters dict and stores these per-model so they're used on every call. If GenerationOptions.llm_params keys are provided they overwrite the Model.parameters values.

Related Issue(s)

#2009 Introduced LLM span attributes including parameters.

Test Plan

Pre-commit

$ poetry run pre-commit run --all-files
check yaml...............................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
ruff (legacy alias)......................................................Passed
ruff format..............................................................Passed
Insert license in comments...............................................Passed
pyright..................................................................Passed

Unit-test

$ make test
env -u OPENAI_API_KEY -u NVIDIA_API_KEY -u LIVE_TEST -u LIVE_TEST_MODE -u TEST_LIVE_MODE poetry run pytest -n auto --dist worksteal
========================================================== test session starts ==========================================================
platform darwin -- Python 3.13.2, pytest-8.4.2, pluggy-1.6.0
rootdir: /Users/tgasser/projects/nemo_guardrails_worktree/fix/iorails-model-param-merge
configfile: pytest.ini
testpaths: tests, docs/colang-2/examples, benchmark/tests
plugins: anyio-4.12.1, langsmith-0.7.12, xdist-3.8.0, httpx-0.35.0, asyncio-0.26.0, profiling-1.8.1, cov-7.0.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=function, asyncio_default_test_loop_scope=function
10 workers [4820 items]
................................................................................................................................. [  2%]
................................................................................................................................. [  5%]
......................................................................................................s.sss...................... [  8%]
................................................................................................................................. [ 10%]
................................................................................................................................. [ 13%]
................................................................................................................................. [ 16%]
.................................................................................ssssssss.........ssssss...ss...s................ [ 18%]
.....................s.s..s..sssss............................................................................................... [ 21%]
............s...................s...s............................................................................................ [ 24%]
......................................ss.............s...............s........s....s......s...................................... [ 26%]
....s.......................................................................................................ss................... [ 29%]
................................................................................................................................. [ 32%]
................................................................................................................................. [ 34%]
....................................................................s............................................................ [ 37%]
................................................................................................................................. [ 40%]
.....................................sss.ss...........................................sssssss........................ssssssssssss [ 42%]
ssssss.........................ss.......................................s........................................................ [ 45%]
...........ss......s............................................................................................................. [ 48%]
...ss......................s.s.sss............................................................s.................................. [ 50%]
...............................................................s.s.sss........................................................... [ 53%]
.............................................................................................s................................... [ 56%]
................................................................................................................................. [ 58%]
................................................................................................................................. [ 61%]
................................................................................................................................. [ 64%]
................................................................................................................................. [ 66%]
................................................................................................................................. [ 69%]
................................................................................................................................. [ 72%]
...........................................................s......................................................s.............s [ 74%]
sss.ssssss..............................................................................ss....................................... [ 77%]
........................................................................sssss.ss.............sssssss.s........................... [ 80%]
..............................s..s................................s.............................................................. [ 82%]
.........ssssssss.............................................................sss.........s.s.................................... [ 85%]
................s..............................s.s.........ss........s........................................................... [ 88%]
...........s...........................................ssss.sss..ss...ssss.s.....ssss...s........................................ [ 90%]
................................................................................................................................. [ 93%]
......ssss...s...............................................................................ssssss.ss........................... [ 96%]
......ss..........................s.............................................................................................. [ 99%]
...............................................                                                                                   [100%]
================================================== 4640 passed, 180 skipped in 37.63s ===================================================

Checklist

I've read the CONTRIBUTING guidelines.
I've updated the documentation if applicable.
I've added tests if applicable.
@mentions of the person or team responsible for reviewing proposed changes.

….parameters (after filtering out reserved parameters)

codecov · 2026-06-12T16:12:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

greptile-apps · 2026-06-12T16:16:25Z

Greptile Summary

This PR fixes a bug where Model.parameters sampling fields (e.g. temperature, max_tokens) were silently dropped by IORails — only per-call GenerationOptions.llm_params were forwarded to the inference endpoint. The fix introduces _RESERVED_LLM_PARAMETERS to strip transport/secret/streaming-control keys from Model.parameters, stores the rest as an immutable MappingProxyType on each ModelEngine, and merges it under per-call kwargs in both model_call and stream_model_call.

model_engine.py: Adds _RESERVED_LLM_PARAMETERS frozenset with documented rationale for each exclusion, and body_param_defaults: Mapping[str, Any] wrapping the filtered params in a MappingProxyType so the snapshot is immutable after __init__.
engine_registry.py: Both model_call and stream_model_call compute merged_params = {**engine.body_param_defaults, **kwargs} before passing to chat_completion/stream_chat_completion and set_llm_request_attributes, so the span reflects the actual sent parameters.
Tests: Comprehensive coverage in TestModelEngineBodyParamDefaults and TestEngineRegistryParameterDefaults validates filtering, precedence, and the stream duplicate-keyword guard.

Confidence Score: 5/5

Safe to merge — the change is a targeted bugfix with no observable regressions for callers that set no Model.parameters, and a correct improvement for those that do.

The merge logic is straightforward ({**defaults, **overrides}), reserved-key exclusions are well-documented and cover all keys that could collide with explicit body or stream arguments, MappingProxyType prevents shared-state mutation, and the test suite validates every combination including precedence order and the stream duplicate-keyword guard.

No files require special attention.

Important Files Changed

Filename	Overview
nemoguardrails/guardrails/model_engine.py	Adds `_RESERVED_LLM_PARAMETERS` frozenset and `body_param_defaults` (immutable `MappingProxyType`) to `ModelEngine.__init__`, correctly filtering transport/secret/streaming keys from `Model.parameters` before exposing them as per-request body defaults.
nemoguardrails/guardrails/engine_registry.py	Both `model_call` and `stream_model_call` now compute `merged_params = {engine.body_param_defaults, kwargs}` before passing to the engine and span attributes, correctly giving per-call kwargs priority over config-level defaults.
tests/guardrails/test_model_engine.py	New `TestModelEngineBodyParamDefaults` class covers all reserved-key categories (transport, secret, streaming-control, identity, client-only) and the sampling-pass-through case with isolated unit tests.
tests/guardrails/test_engine_registry.py	New `TestEngineRegistryParameterDefaults` class covers both non-streaming and streaming paths including precedence ordering, reserved-key exclusion, and the `stream` duplicate-keyword guard.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant EngineRegistry
    participant ModelEngine

    Note over Caller,ModelEngine: Before this PR — Model.parameters dropped
    Caller->>EngineRegistry: "model_call(model_type, messages, **llm_params)"
    EngineRegistry->>ModelEngine: "chat_completion(messages, **llm_params)"
    Note right of ModelEngine: Model.parameters ignored

    Note over Caller,ModelEngine: After this PR — parameters merged
    Caller->>EngineRegistry: "model_call(model_type, messages, **llm_params)"
    EngineRegistry->>EngineRegistry: "merged = {**body_param_defaults, **llm_params}"
    Note right of EngineRegistry: body_param_defaults = Model.parameters minus _RESERVED_LLM_PARAMETERS
    EngineRegistry->>ModelEngine: "chat_completion(messages, **merged)"
    Note right of ModelEngine: Config defaults + per-call overrides both applied

_{Reviews (3): Last reviewed commit: "Add default_headers and default_query to..." | Re-trigger Greptile}

coderabbitai · 2026-06-12T16:17:46Z

📝 Walkthrough

Walkthrough

Model parameter defaults are now computed and filtered during engine initialization, retaining only sampling/body parameters while excluding reserved transport and streaming keys. When EngineRegistry invokes model calls, it merges these model-configured defaults with per-call kwargs, with per-call values taking precedence, and applies the merged parameters to both LLM span request attributes and the actual chat-completion call.

Changes

Model Parameter Defaults and Merging

Layer / File(s)	Summary
ModelEngine reserved parameters and body_param_defaults `nemoguardrails/guardrails/model_engine.py`, `tests/guardrails/test_model_engine.py`	Defines `_RESERVED_LLM_PARAMETERS` enumerating transport/retry, API key, streaming control, and identity/model/message fields that must not appear in request bodies. During `ModelEngine.__init__`, computes `body_param_defaults` by filtering `Model.parameters` to exclude reserved keys, exposing sampling parameters as per-request defaults. Tests verify sampling parameters are preserved and reserved/transport/secret keys are excluded.
EngineRegistry parameter merging in model_call and stream_model_call `nemoguardrails/guardrails/engine_registry.py`, `tests/guardrails/test_engine_registry.py`	Both `model_call` and `stream_model_call` merge `engine.body_param_defaults` with per-call `kwargs`, with per-call values overriding defaults. Merged parameters populate LLM span request attributes and are passed to `engine.chat_completion` or `engine.stream_chat_completion`. Tests verify config defaults populate request bodies and span attributes, per-call overrides take precedence, and streaming path correctly excludes reserved keys while preserving sampling defaults.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Test Results For Major Changes	⚠️ Warning	PR description provided has no test results/CI or regression/perf notes—only a checklist—despite changing LLM parameter merging behavior.	Update PR description to include testing info (e.g., pytest/CI run, affected suites) and, if relevant, any performance/numeric/regression validation before/after.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title accurately describes the main change: applying inference-time llm_params on top of Model.parameters in ModelEngine, which is the core objective addressed across engine_registry.py and model_engine.py.
Docstring Coverage	✅ Passed	Docstring coverage is 90.91% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/iorails-model-param-merge

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Pouyanpi

LGTM 👍🏻

a side note:

IORails does not use the shared OpenAI-compatible client yet, so I don’t think we need to fully support default_headers or default_query here. But since these are client only options, can we add them to the reserved (filter) list so they don’t get forwarded as chat completion body fields? that keeps current behavior safe and leave proper client wiring for a (potential) future refactor.

Update ModelEngine to apply inference-time llm_params on top of Model…

e6e6a3d

….parameters (after filtering out reserved parameters)

tgasser-nv marked this pull request as ready for review June 12, 2026 16:12

greptile-apps Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread nemoguardrails/guardrails/model_engine.py

Comment thread nemoguardrails/guardrails/model_engine.py Outdated

Apply Greptile feedback

08e2eb5

tgasser-nv self-assigned this Jun 12, 2026

tgasser-nv requested review from Pouyanpi and cparisien June 12, 2026 16:42

Pouyanpi added this to the v0.23.0 milestone Jun 12, 2026

Pouyanpi approved these changes Jun 12, 2026

View reviewed changes

Add default_headers and default_query to reserved model parameters

53cb26e

tgasser-nv merged commit 81df835 into develop Jun 12, 2026
7 checks passed

tgasser-nv deleted the fix/iorails-model-param-merge branch June 12, 2026 21:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(iorails): Apply inference-time llm_params on top of Model.parameters in ModelEngine#2020

fix(iorails): Apply inference-time llm_params on top of Model.parameters in ModelEngine#2020
tgasser-nv merged 3 commits into
developfrom
fix/iorails-model-param-merge

tgasser-nv commented Jun 12, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 12, 2026

Uh oh!

greptile-apps Bot commented Jun 12, 2026 •

edited

Loading

Confidence Score: 5/5

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

Pouyanpi left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tgasser-nv commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issue(s)

Test Plan

Pre-commit

Unit-test

Checklist

Uh oh!

codecov Bot commented Jun 12, 2026

Codecov Report

Uh oh!

greptile-apps Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

Pouyanpi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tgasser-nv commented Jun 12, 2026 •

edited

Loading

greptile-apps Bot commented Jun 12, 2026 •

edited

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading