feat(iorails): IORails support for streaming output rails by tgasser-nv · Pull Request #1765 · NVIDIA-NeMo/Guardrails

tgasser-nv · 2026-04-06T18:16:10Z

Description

Adds output-rail streaming to IORails. Integration into the API service, chat, etc. via the Guardrails top level comes in follow-on PR #1766. Key points:

Two concurrent paths: _generation_task pushes chunks into the StreamingHandler queue; _wrapped_iterator (or _run_output_rails_in_streaming) consumes from it
StreamingHandler is the bridge — it's an AsyncIterator backed by an asyncio.Queue
RollingBuffer sits between the StreamingHandler and the caller, batching tokens and providing overlap context for output rail checks
Output rails use the same ModelEngine.call() (non-streaming) as input rails — only the main LLM uses stream_call()
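
To make the producer/consumer split concrete, here is a minimal sketch of an async iterator backed by an `asyncio.Queue`. It is not the PR's `StreamingHandler`: the class name `QueueBackedStream`, the `END_OF_STREAM` sentinel object, and the helper functions are assumptions made for illustration only.

```python
import asyncio

# Hypothetical sentinel; the value the PR uses for END_OF_STREAM is not shown here.
END_OF_STREAM = object()


class QueueBackedStream:
    """Async iterator that yields whatever a producer pushes into its queue."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def push_chunk(self, chunk) -> None:
        await self._queue.put(chunk)

    def __aiter__(self):
        return self

    async def __anext__(self) -> str:
        chunk = await self._queue.get()
        if chunk is END_OF_STREAM:
            raise StopAsyncIteration
        return chunk


async def produce(handler: QueueBackedStream, tokens) -> None:
    """Background path: push every token, then signal the end of the stream."""
    for token in tokens:
        await handler.push_chunk(token)
    await handler.push_chunk(END_OF_STREAM)


async def collect(tokens):
    """Consumer path: start the producer as a task and drain the iterator."""
    handler = QueueBackedStream()
    task = asyncio.create_task(produce(handler, tokens))
    received = [chunk async for chunk in handler]
    await task
    return received
```

The consumer never touches the producer directly; the queue is the only shared state, which is what lets the two paths run concurrently.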

High-level overview of the change below:

*  IORails.stream_async(messages)
  │
  ├─ _validate_streaming_with_output_rails()
  │
  └─ _wrapped_iterator()                          [async generator returned to caller]
     │
     ├─ Semaphore.acquire()                       [concurrency gate]
     ├─ set_new_request_id()
     │
     ├─ asyncio.create_task(_generation_task)      [background task ─────────────────┐
     │                                                                               │
     │   _generation_task()                                                          │
     │   │                                                                           │
     │   ├─ RailsManager.is_input_safe(messages)   [input rails, non-streaming]      │
     │   │   └─ ModelManager.generate_async("content_safety", ...)                   │
     │   │       └─ ModelEngine.call()              [HTTP POST, full response]       │
     │   │                                                                           │
     │   ├─ ModelManager.stream_async("main", messages)                              │
     │   │   └─ ModelEngine.stream_call()           [HTTP POST stream=True, SSE]     │
     │   │       └─ yields content delta strings ──┐                                 │
     │   │                                         │                                 │
     │   │   for each chunk:                       │                                 │
     │   │       StreamingHandler.push_chunk(chunk) ◄┘                               │
     │   │           └─ queue.put(chunk)                                             │
     │   │                                                                           │
     │   └─ StreamingHandler.push_chunk(END_OF_STREAM)                               │
     │                                              │                                │
     │                                              ▼                                │
     │                                     StreamingHandler                          │
     │                                     [AsyncIterator]                           │
     │                                     queue: chunk→chunk→...→EOS                │
     │                                              │                                │
     │  ┌───────────────────────────────────────────┘                                │
     │  │                                                                            │
     │  ▼  [if output rails streaming enabled]                                       │
     │                                                                               │
     ├─ _run_output_rails_in_streaming(streaming_handler, messages)                  │
     │  │                                                                            │
     │  ├─ get_buffer_strategy(config)                                               │
     │  │   └─ RollingBuffer(context_size, chunk_size)                               │
     │  │                                                                            │
     │  └─ async for chunk_batch in buffer_strategy(streaming_handler):              │
     │     │                                                                         │
     │     │  RollingBuffer.process_stream()                                         │
     │     │  ├─ accumulates chunks until len(buffer) >= chunk_size                  │
     │     │  └─ yields ChunkBatch:                                                  │
     │     │       .processing_context  = [context_size overlap + chunk_size tokens] │
     │     │       .user_output_chunks  = [new tokens only]                          │
     │     │                                                                         │
     │     ├─ [stream_first=True]  yield user_output_chunks to caller                │
     │     │                                                                         │
     │     ├─ RailsManager.is_output_safe(messages, processing_context_text)         │
     │     │   └─ ModelManager.generate_async("content_safety", ...)                 │
     │     │       └─ ModelEngine.call()                                             │
     │     │                                                                         │
     │     ├─ [unsafe] → yield error JSON, return                                    │
     │     │                                                                         │
     │     └─ [stream_first=False] yield user_output_chunks to caller                │
     │                                                                               │
     │  [if NO output rails]                                                         │
     ├─ async for chunk in streaming_handler: yield chunk                            │
     │                                                                               │
     └─ finally:                                                                     │
        ├─ Semaphore.release()                                                       │
        ├─ await task  ◄─────────────────────────────────────────────────────────────┘
        └─ reset_request_id(token)
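
The batching step in the middle of the diagram can be sketched as follows. This is an illustrative stand-in for `RollingBuffer`, assuming that sizes are counted in chunks and that the tail is flushed when the stream ends; `rolling_batches` and `demo_batches` are hypothetical names, and the real class may differ in both respects.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkBatch:
    processing_context: List[str]  # overlap from earlier batches + new chunks
    user_output_chunks: List[str]  # new chunks only


async def rolling_batches(stream, context_size: int, chunk_size: int):
    """Group chunks into batches of chunk_size, carrying context_size chunks of overlap."""
    overlap: List[str] = []
    buffer: List[str] = []
    async for chunk in stream:
        buffer.append(chunk)
        if len(buffer) >= chunk_size:
            yield ChunkBatch(overlap + buffer, list(buffer))
            # Keep the last context_size chunks as context for the next batch.
            overlap = (overlap + buffer)[-context_size:] if context_size > 0 else []
            buffer = []
    if buffer:
        # Flush whatever is left when the stream ends before a full batch.
        yield ChunkBatch(overlap + buffer, list(buffer))


async def demo_batches():
    async def tokens():
        for token in ["a", "b", "c", "d", "e"]:
            yield token

    return [b async for b in rolling_batches(tokens(), context_size=1, chunk_size=2)]
```

With `context_size=1` and `chunk_size=2`, the rail sees `["b", "c", "d"]` for the second batch while the caller only receives `["c", "d"]`, so a violation that straddles a batch boundary is still visible to the check.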

Test Plan

Pre-commit

$ poetry run pre-commit run --all-files
check yaml...............................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
ruff (legacy alias)......................................................Passed
ruff format..............................................................Passed
Insert license in comments...............................................Passed
pyright..................................................................Passed

Unit-tests

$ poetry run pytest -q
.......................ssss.........................................................................................s............................... [  4%]
.................................................................................................................................................... [  9%]
.................................................................................................................................................... [ 13%]
.................................................................................................................................................... [ 18%]
.................................................................................................................................................... [ 22%]
.......................s......ss...................sssssss.......................................................................................... [ 27%]
...................................................................s.......s.........................................ss...........................s. [ 31%]
..............s.......sssss...............................................................s......................................................... [ 36%]
.............................................................ss........ss...ss............................................s......................... [ 40%]
............................s............s.......................................................................................................... [ 45%]
.................................................................................................................................................... [ 50%]
...................................................................sssss......ssssssssssssssssss.........sssss...................................... [ 54%]
..............................................s...........ss...................................sssssssss.ssssssssss................................s [ 59%]
...................................................s....s........................................................ssssssss..............sss...ss...ss [ 63%]
.....ssssssssssssss................................................................................................................................. [ 68%]
.....s..............................................................................................................s....................ssssssss... [ 72%]
......ss......................................................................................................................................ssssss [ 77%]
s...........................................................................s....................................................................... [ 81%]
..................................ss................................................................................................................ [ 86%]
.................................................................................................................................................... [ 90%]
...................................s................................................................................................................ [ 95%]
.................................................................................................................................................... [100%]
3115 passed, 141 skipped in 137.15s (0:02:17)

Local integration test

20260406_iorails_streaming_integ_tests.ipynb

Checklist

I've read the CONTRIBUTING guidelines.
I've updated the documentation if applicable.
I've added tests if applicable.
@mentions of the person or team responsible for reviewing proposed changes.

greptile-apps · 2026-04-06T18:22:53Z

Greptile Summary

This PR adds streaming output rail support to IORails via a two-task architecture: a background _generation_task pushes SSE chunks into a StreamingHandler queue, and the _wrapped_iterator (or _run_output_rails_in_streaming) consumes them with optional RollingBuffer-based batching for output-rail checks. The ModelEngine gains a new stream_call() method for SSE parsing, and ModelManager exposes it via stream_async(). The implementation is well-structured with good concurrency control and error propagation.

One logic issue in _run_output_rails_in_streaming: when a generation error and content chunks land in the same buffer batch (e.g. the stream is shorter than chunk_size), the content chunks before the error are silently dropped even in stream_first=True mode.

Confidence Score: 4/5

Safe to merge after addressing the content-chunk loss bug in _run_output_rails_in_streaming.

One P1 logic bug: content chunks that share a flush batch with an error chunk are silently dropped, violating stream_first=True semantics. All previous thread concerns (TOCTOU race, reset_request_id ordering, semaphore leak path) are resolved. The rest of the implementation is well-structured and comprehensively tested.

nemoguardrails/guardrails/iorails.py — specifically _run_output_rails_in_streaming() error-detection ordering.

Important Files Changed

Filename	Overview
nemoguardrails/guardrails/iorails.py	Core streaming implementation: adds stream_async(), _generation_task(), _wrapped_iterator(), and _run_output_rails_in_streaming(); has a logic bug where content chunks preceding an error in the same buffer batch are silently dropped in stream_first=True mode.
nemoguardrails/guardrails/model_engine.py	Adds stream_call() with correct SSE line-by-line parsing via readline(); refactors shared request-building logic into _prepare_request()/_ensure_running()/_raise_for_status(); sock_read timeout reuses total timeout which may be too tight for slow generators.
nemoguardrails/guardrails/model_manager.py	Adds stream_async() as a thin async generator proxy over engine.stream_call(); straightforward and correct.
tests/guardrails/test_iorails_streaming.py	New streaming test suite covering validation, no-output-rails path, stream_first/gated modes, error propagation, and concurrency; does not assert against content-chunk loss when error shares a buffer batch with real chunks.
tests/guardrails/test_iorails.py	Adds two tests for generate_async: dict options forwarding and exception propagation; both correct.
tests/guardrails/test_model_engine.py	Tests for the new stream_call() and refactored helper methods; coverage looks adequate.
tests/guardrails/test_model_manager.py	Adds tests for ModelManager.stream_async(); straightforward and correct.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant _wrapped_iterator
    participant _generation_task
    participant StreamingHandler
    participant RollingBuffer
    participant RailsManager
    participant ModelManager

    Caller->>_wrapped_iterator: async for chunk in stream_async()
    _wrapped_iterator->>_wrapped_iterator: semaphore.acquire()
    _wrapped_iterator->>_generation_task: asyncio.create_task()

    par Background Task
        _generation_task->>RailsManager: is_input_safe(messages)
        RailsManager-->>_generation_task: RailResult
        alt input safe
            _generation_task->>ModelManager: stream_async("main", messages)
            loop SSE chunks
                ModelManager-->>StreamingHandler: push_chunk(delta)
            end
            _generation_task->>StreamingHandler: push_chunk(END_OF_STREAM)
        else input blocked
            _generation_task->>StreamingHandler: push_chunk(REFUSAL_MESSAGE)
            _generation_task->>StreamingHandler: push_chunk(END_OF_STREAM)
        end
    and Consumer (with output rails)
        _wrapped_iterator->>RollingBuffer: buffer_strategy(streaming_handler)
        loop chunk_batch from RollingBuffer
            RollingBuffer->>RollingBuffer: accumulate until chunk_size tokens
            RollingBuffer-->>_wrapped_iterator: ChunkBatch
            alt stream_first=True
                _wrapped_iterator-->>Caller: yield user_output_chunks
            end
            _wrapped_iterator->>RailsManager: is_output_safe(messages, processing_context)
            RailsManager-->>_wrapped_iterator: RailResult
            alt output safe AND stream_first=False
                _wrapped_iterator-->>Caller: yield user_output_chunks
            else output blocked
                _wrapped_iterator-->>Caller: yield error JSON
            end
        end
    end

    _wrapped_iterator->>_generation_task: task.cancel() (if not done)
    _wrapped_iterator->>_wrapped_iterator: semaphore.release()

Prompt To Fix All With AI

This is a comment left during a code review.
Path: nemoguardrails/guardrails/iorails.py
Line: 352-363

Comment:
**Content chunks silently dropped in same-batch error scenario**

The error-detection loop runs before the `if stream_first: yield` block. When a generation error and content chunks land in the same flush batch (e.g. `chunk_size=5` but the stream raises after only 2 tokens), all preceding content chunks in `user_output_chunks` are discarded — only the error JSON is yielded. In `stream_first=True` mode this directly contradicts the "deliver eagerly" contract.

This is reproducible with the existing `_mid_stream_failure` mock (`chunk_size=5`, 2 real chunks then raise): all three items end up in a single flush at `END_OF_STREAM`, `real1`/`real2` are silently lost.

```python
# Separate error from real content first
real_chunks: list = []
error_chunk: str | None = None
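for chunk in user_output_chunks:
    try:
        parsed = json.loads(chunk)
    except (json.JSONDecodeError, TypeError):
        parsed = None
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if isinstance(error, dict) and error.get("type") == _GENERATION_ERROR_TYPE:
        error_chunk = chunk
        break
    # Keep every non-error chunk, including ones that happen to parse as JSON (e.g. "42")
    real_chunks.append(chunk)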

if stream_first:
    for chunk in real_chunks:
        yield chunk
if error_chunk:
    yield error_chunk
    return

# Run output rails on the accumulated context
req_id = get_request_id()
log.info("[%s] Running output rails", req_id)
output_result = await self.rails_manager.is_output_safe(messages, bot_response_chunk)
if not output_result.is_safe:
    log.info("[%s] Output blocked: %s", req_id, output_result.reason)
    error_data = {
        "error": {
            "message": f"Blocked by output rails: {output_result.reason}",
            "type": "guardrails_violation",
            "code": "content_blocked",
        }
    }
    yield json.dumps(error_data)
    return

if not stream_first:
    for chunk in real_chunks:
        yield chunk
```
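
For a regression test, the splitting step can also be pulled out as a pure function. The sketch below is a standalone illustration, not code from the PR: `split_batch` is a hypothetical helper and the value of `GENERATION_ERROR_TYPE` is assumed, since the real constant's value is not shown in this thread.

```python
import json
from typing import List, Optional, Tuple

# Assumed value for illustration; the real _GENERATION_ERROR_TYPE is not shown.
GENERATION_ERROR_TYPE = "generation_error"


def split_batch(chunks: List[str]) -> Tuple[List[str], Optional[str]]:
    """Return (content chunks before the error, the error chunk or None)."""
    real: List[str] = []
    for chunk in chunks:
        try:
            parsed = json.loads(chunk)
        except (json.JSONDecodeError, TypeError):
            parsed = None
        error = parsed.get("error") if isinstance(parsed, dict) else None
        if isinstance(error, dict) and error.get("type") == GENERATION_ERROR_TYPE:
            return real, chunk
        # Plain text and JSON-looking tokens such as "42" both count as content.
        real.append(chunk)
    return real, None
```

A test for the same-batch scenario then only needs to assert that the content chunks come back alongside the error instead of being dropped.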


---

This is a comment left during a code review.
Path: nemoguardrails/guardrails/model_engine.py
Line: 286-290

Comment:
**`sock_read` reuses the non-streaming total timeout**

`self._timeout.total` was sized for a complete blocking request. Using it as the per-chunk `sock_read` deadline means a model configured with `timeout=30` will abort streaming if no SSE chunk arrives within 30 seconds — which is plausible for a slow first-token scenario. Consider exposing a separate `timeout_sock_read` parameter, or documenting that operators should set `timeout` to the expected inter-chunk interval rather than the total generation duration.

```python
stream_timeout = aiohttp.ClientTimeout(
    total=None,
    connect=self._timeout.connect,
    sock_read=float(params.get("timeout_sock_read", self._timeout.total)),
)
```
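
The distinction between a total deadline and a per-chunk deadline can be illustrated without aiohttp. The sketch below uses only `asyncio`; `with_idle_timeout` and `demo_idle` are hypothetical names, and the code models the idea of an inter-chunk limit rather than aiohttp's actual `sock_read` implementation.

```python
import asyncio


async def with_idle_timeout(stream, idle_timeout: float):
    """Re-yield chunks, failing only if the gap between two chunks exceeds idle_timeout."""
    iterator = stream.__aiter__()
    while True:
        try:
            # The deadline restarts for every chunk, so total duration is unbounded.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=idle_timeout)
        except StopAsyncIteration:
            return
        yield chunk


async def demo_idle(gap: float, idle_timeout: float):
    async def tokens():
        yield "a"
        await asyncio.sleep(gap)  # simulated pause between two chunks
        yield "b"

    received = []
    try:
        async for chunk in with_idle_timeout(tokens(), idle_timeout):
            received.append(chunk)
    except asyncio.TimeoutError:
        received.append("<timeout>")
    return received
```

A stream that pauses longer than the idle limit is aborted after the chunks already delivered, which is the behavior an operator would see if the per-chunk limit were set shorter than the slowest expected gap.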


Reviews (6): Last reviewed commit: "Address Greptile feedback on SSE parsing..."

codecov · 2026-04-06T18:23:40Z

Codecov Report

❌ Patch coverage is 99.02913% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
nemoguardrails/guardrails/model_engine.py	97.43%	2 Missing ⚠️


coderabbitai · 2026-04-06T18:29:45Z

📝 Walkthrough


This PR introduces streaming support to the NemoGuardrails framework across multiple layers. It adds a stream_async() method to IORails for applying input and output rails to streamed content with concurrency control, a stream_call() method in ModelEngine to handle SSE parsing, and a wrapper in ModelManager. A configuration file is also updated to use the NIM engine instead of NVIDIA AI Endpoints, with adjusted streaming chunk parameters.

Changes

Cohort / File(s)	Summary
Configuration Update `examples/configs/gs_content_safety/config/config.yml`	Updated engine from `nvidia_ai_endpoints` to `nim` for both main and content safety models; adjusted streaming `chunk_size` from 200 to 5 and `context_size` from 50 to 1.
Model Engine Streaming `nemoguardrails/guardrails/model_engine.py`	Refactored request handling with helper methods (`_ensure_running()`, `_prepare_request()`, `_raise_for_status()`, `_wrap_exception()`); added `stream_call()` public method for SSE streaming with `stream=True`, JSON event parsing, `[DONE]` termination detection, and consistent error handling.
Rails Streaming Layer `nemoguardrails/guardrails/iorails.py`	Added `stream_async()` method with input rails validation and output rails buffering; introduced `_run_output_rails_in_streaming()` for buffered validation with `stream_first` mode support; added semaphore-based concurrency control (`STREAM_MAX_CONCURRENCY = 256`) and `_validate_streaming_with_output_rails()` error validation.
Manager Streaming Wrapper `nemoguardrails/guardrails/model_manager.py`	Added `stream_async()` async generator method that obtains a `ModelEngine` and yields chunks from `engine.stream_call()` with request logging.
Streaming Tests `tests/guardrails/test_iorails_streaming.py`	Comprehensive async unit tests covering validation errors, input-only rails, output rails in both `stream_first` modes, error propagation, and semaphore-based concurrency control with mock LLM streaming and rail checks.
Engine Streaming Tests `tests/guardrails/test_model_engine.py`	Added `TestModelEngineStreamCall` suite verifying SSE delta parsing, `[DONE]` termination, request payload construction, HTTP error responses, pre-start validation, timeout configuration, and delta content filtering.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant IORails
    participant ModelManager
    participant ModelEngine
    participant BufferStrategy

    Client->>IORails: stream_async(messages, options)
    IORails->>IORails: _validate_streaming_with_output_rails()
    IORails->>IORails: Apply input rails (synchronous)
    
    alt Input Unsafe
        IORails-->>Client: REFUSAL_MESSAGE
    else Input Safe
        IORails->>ModelManager: stream_async(model_type, messages)
        ModelManager->>ModelEngine: stream_call(messages, kwargs)
        
        loop Streaming
            ModelEngine-->>ModelManager: SSE chunk (delta.content)
            ModelManager-->>IORails: chunk
            
            alt Output Rails Enabled
                IORails->>BufferStrategy: Buffer chunk
                BufferStrategy-->>IORails: buffered content
                IORails->>IORails: Check is_output_safe()
                
                alt stream_first=true
                    IORails-->>Client: Yield chunk (optimistic)
                    alt Unsafe Detected
                        IORails-->>Client: Error JSON payload
                    end
                else stream_first=false
                    alt Unsafe Detected
                        IORails-->>Client: Error JSON payload
                    else Safe
                        IORails-->>Client: Yield chunk
                    end
                end
            else Output Rails Disabled
                IORails-->>Client: Yield chunk
            end
        end
        
        ModelEngine-->>IORails: [DONE] signal
        IORails-->>Client: Stream completion
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 61.40% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes	⚠️ Warning	PR introduces major changes (536+ lines of new code) but PR description lacks documentation of test results and testing information.	Update PR description to include Testing section documenting test scenarios, referencing test files, and confirming tests pass.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly and clearly describes the main feature addition: streaming output rails support in IORails, which is the primary focus of the changeset.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemoguardrails/guardrails/iorails.py`:
- Around line 284-302: The output-buffering path is incompatible with
metadata-emitting chunks: before calling get_buffer_strategy(buffer_strategy)
and async for chunk_batch in buffer_strategy(streaming_handler) detect if
output_streaming_config.rails.output.streaming is enabled and the incoming
streaming_handler yields dicts/has include_metadata=True; either raise a clear
error rejecting this configuration or wrap/normalize the streaming_handler into
a transformer that maps dict chunks to plain text (extracting the text field)
before passing to buffer_strategy and then reattach the original metadata to
each yielded chunk after buffering/rails checks; update logic around
streaming_handler, buffer_strategy, get_buffer_strategy, and stream_first to use
the chosen approach so buffering always consumes List[str] and output yields
preserve metadata.
- Around line 252-256: The pre-check using self._stream_semaphore._value is
racy; replace it with a truly non-blocking acquisition attempt using
asyncio.wait_for to avoid blocking. Change the code to attempt await
asyncio.wait_for(self._stream_semaphore.acquire(), timeout=0) inside a
try/except asyncio.TimeoutError and on TimeoutError raise
asyncio.QueueFull("Streaming concurrency limit reached"); keep using
self._stream_semaphore.acquire() for the actual acquisition but only via the
non-blocking wait_for wrapper so the loser does not block.
- Around line 275-278: The finally block in _wrapped_iterator() releases
self._stream_semaphore before cancelling/awaiting self._generation_task, which
lets the streaming slot free while the background generation keeps pulling
tokens; change the teardown to first cancel the background task (call
self._generation_task.cancel()), then await it while suppressing
asyncio.CancelledError using contextlib.suppress (import suppress from
contextlib), and only after the task has been cancelled/awaited release
self._stream_semaphore and call reset_request_id(token); ensure this ordering
covers both early consumer exit and _run_output_rails_in_streaming() early
returns.
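
A minimal sketch of the two concurrency points above (non-blocking admission and teardown ordering), using only documented `asyncio.Semaphore` methods. It takes a different route from the `wait_for(..., timeout=0)` suggestion: it checks `locked()` and calls `acquire()` back to back, which cannot be interleaved with another task because there is no suspension point in between. `StreamGate`, `endless`, and `demo_gate` are hypothetical names, and `None` is used as an assumed end-of-stream marker.

```python
import asyncio
from contextlib import suppress


class StreamGate:
    """Hypothetical wrapper: non-blocking concurrency gate plus ordered teardown."""

    def __init__(self, limit: int) -> None:
        self._semaphore = asyncio.Semaphore(limit)

    async def stream(self, produce):
        # No await happens between locked() and a successful acquire(),
        # so no other task can take the slot after the check.
        if self._semaphore.locked():
            raise asyncio.QueueFull("Streaming concurrency limit reached")
        await self._semaphore.acquire()
        queue: asyncio.Queue = asyncio.Queue()
        task = asyncio.create_task(produce(queue))
        try:
            while (chunk := await queue.get()) is not None:
                yield chunk
        finally:
            # Stop the background producer first, then free the slot.
            task.cancel()
            with suppress(asyncio.CancelledError):
                await task
            self._semaphore.release()


async def demo_gate():
    gate = StreamGate(limit=1)
    cancelled = asyncio.Event()

    async def endless(queue):
        try:
            while True:
                await queue.put("tok")
                await asyncio.sleep(0)
        except asyncio.CancelledError:
            cancelled.set()
            raise

    first = gate.stream(endless)
    got = await first.__anext__()
    try:
        await gate.stream(endless).__anext__()
        rejected = False
    except asyncio.QueueFull:
        rejected = True
    await first.aclose()  # early consumer exit triggers the finally block
    return got, rejected, cancelled.is_set(), gate._semaphore.locked()
```

In this sketch a producer that failed with its own exception would re-raise from `await task` before the release; real code would likely want to log and release in a nested `finally`.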

In `@nemoguardrails/guardrails/model_engine.py`:
- Around line 318-321: The SSE parser in model_engine.py currently assumes
chunk.get("choices", [{}])[0] exists and will IndexError on events with an empty
choices list; change the code to retrieve choices = chunk.get("choices") and
skip/continue if not choices (or if len(choices) == 0) before accessing [0],
then extract delta = choices[0].get("delta", {}) and yield content as before.
Also add a regression test that feeds an SSE event with "choices": [] (the
include_usage case) into the parser to assert it does not raise and simply skips
that event.
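
The guard described in the last comment can be sketched as a small synchronous parser. It assumes the chat-completions chunk layout referenced in this thread (`choices[0].delta.content`, terminated by `[DONE]`); `iter_sse_content` is a hypothetical helper name and operates on already-decoded lines rather than on an HTTP response.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield delta content strings from decoded SSE lines, skipping events without choices."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        choices = json.loads(payload).get("choices")
        if not choices:
            continue  # e.g. a usage-only event with "choices": []
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content
```

Feeding it an event with `"choices": []` between two content events is enough for the regression test: the parser should skip it and keep going.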



📥 Commits

Reviewing files that changed from the base of the PR and between 7aeba43 and 75fb98d.

📒 Files selected for processing (6)

examples/configs/gs_content_safety/config/config.yml
nemoguardrails/guardrails/iorails.py
nemoguardrails/guardrails/model_engine.py
nemoguardrails/guardrails/model_manager.py
tests/guardrails/test_iorails_streaming.py
tests/guardrails/test_model_engine.py

tgasser-nv · 2026-04-06T20:09:12Z

@greptile check the latest commit and update summary and score

tgasser-nv · 2026-04-06T20:20:11Z

@coderabbitai Re: Pre-merge checks. Re-check docstring coverage for latest commit. For "Test Results For Major Changes" see the Test Plan section in the PR description. Update table accordingly

coderabbitai · 2026-04-06T20:21:15Z


🧩 Analysis chain

The PR description image shows a Jupyter notebook titled "IORails streaming integration test" serving as the Test Plan. It demonstrates:

Cell [5]: iorails.stream_async() streaming a long query about London — received 977 chunks of coherent text output
Cell [6]: iorails.generate_async() for a short query ("How many items in a bakers dozen?") — returned a correct, complete response
Cell [7]: The response printed as a dict with role: assistant and accurate content

This confirms end-to-end streaming and non-streaming functionality works with the content safety rails config.

Now let me check docstring coverage on the latest commit's changed files.

🏁 Script executed:

#!/bin/bash
echo "=== Checking docstrings in iorails.py ==="
python3 - <<'EOF'
import ast, sys

files = [
    "nemoguardrails/guardrails/iorails.py",
    "nemoguardrails/guardrails/model_engine.py",
    "nemoguardrails/guardrails/model_manager.py",
]

for filepath in files:
    print(f"\n--- {filepath} ---")
    try:
        with open(filepath) as f:
            source = f.read()
        tree = ast.parse(source)
    except Exception as e:
        print(f"  ERROR: {e}")
        continue

    missing = []
    covered = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            name = node.name
            if name.startswith("__") and name.endswith("__"):
                continue  # skip dunder
            docstring = ast.get_docstring(node)
            kind = "class" if isinstance(node, ast.ClassDef) else "func"
            if docstring:
                covered.append(f"  [OK]  {kind}: {name} (line {node.lineno})")
            else:
                missing.append(f"  [MISS] {kind}: {name} (line {node.lineno})")

    for m in missing:
        print(m)
    for c in covered:
        print(c)
    print(f"\n  Summary: {len(covered)} with docstrings, {len(missing)} missing")
EOF

Length of output: 1668

@tgasser-nv: Here is the updated pre-merge checks table based on the latest commit's docstring scan and the Test Plan section visible in the PR description image (the "IORails streaming integration test" Jupyter notebook).

🔍 Pre-Merge Checks — Updated

Check	Status	Notes
Docstring Coverage	✅ Pass (with minor pre-existing gaps)	All newly introduced public and private methods in this PR (`stream_async`, `_run_output_rails_in_streaming`, `_validate_streaming_with_output_rails`, `_generation_task`, `_wrapped_iterator`, `stream_call`, `ModelManager.stream_async`, and all helpers in `model_engine.py`) have docstrings. Two pre-existing functions are still missing docstrings — `_run_sync_iorails` (iorails.py, line 106) and `api_call` (model_manager.py, line 172) — but neither was introduced in this PR.
Test Results For Major Changes	✅ Pass	The PR description includes an integration test notebook ("IORails streaming integration test") demonstrating: (1) `stream_async()` successfully streaming 977 chunks for a long LLM query, and (2) `generate_async()` returning a correct non-streaming response. This covers both the new streaming path and backward compatibility of the existing non-streaming path.

Summary of docstring coverage by file:

File	With Docstrings	Missing	Missing Detail
`nemoguardrails/guardrails/iorails.py`	10	1	`_run_sync_iorails` (line 106) — pre-existing private helper
`nemoguardrails/guardrails/model_engine.py`	14	0	✅ Full coverage
`nemoguardrails/guardrails/model_manager.py`	8	1	`api_call` (line 172) — pre-existing method

🧠 Learnings used

Learnt from: CR
Repo: NVIDIA-NeMo/Guardrails PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2026-03-09T09:16:48.423Z
Learning: For PRs with major changes (new features, breaking changes, significant refactoring), verify that the PR description includes test results or testing information

tgasser-nv · 2026-04-06T20:28:57Z

@Pouyanpi , @cparisien , @trebedea This is ready for review

Pouyanpi

Looks good 👍🏻 Just some comments to review before merging. I also opened issue #1780, which seems necessary.

The only duplication is the output-rails-in-streaming glue loop. If we do extract it, the shared helper would look like:

async def apply_output_rails_to_stream(
    streaming_handler,
    buffer_strategy,
    stream_first: bool,
    check_fn: Callable[[str], Awaitable[RailResult]],
) -> AsyncIterator[str]:

No need to address this in this PR. We might do a follow-up, but I don't see any need for that either.

Initial commit of streaming code (IOrails and below)

75fb98d

greptile-apps Bot reviewed Apr 6, 2026

Comment thread nemoguardrails/guardrails/iorails.py Outdated

Comment thread nemoguardrails/guardrails/iorails.py Outdated

Comment thread nemoguardrails/guardrails/iorails.py Outdated

coderabbitai Bot reviewed Apr 6, 2026

Comment thread nemoguardrails/guardrails/iorails.py Outdated

Comment thread nemoguardrails/guardrails/iorails.py Outdated

Comment thread nemoguardrails/guardrails/iorails.py

Comment thread nemoguardrails/guardrails/model_engine.py Outdated

tgasser-nv added 3 commits April 6, 2026 13:31

Revert accidental changes to streaming content-safety config

bb4fd46

Fix TOCTOU race and improve line coverage

3fa799c

Address code rabbit feedback

b6475d5

Add single-line docstrings to all unit-tests

d85fd91

tgasser-nv requested review from cparisien and trebedea April 6, 2026 20:28

tgasser-nv mentioned this pull request Apr 7, 2026

feat(iorails): IORails stream_async() integration into chat and server #1766

Merged

3 tasks

Pouyanpi assigned tgasser-nv Apr 9, 2026

Pouyanpi added the enhancement New feature or request label Apr 9, 2026

Pouyanpi added this to the v0.22.0 milestone Apr 9, 2026

Pouyanpi added the status: in review label Apr 9, 2026

Pouyanpi approved these changes Apr 9, 2026

Comment thread nemoguardrails/guardrails/iorails.py

Comment thread nemoguardrails/guardrails/iorails.py

Comment thread nemoguardrails/guardrails/iorails.py Outdated

Comment thread nemoguardrails/guardrails/iorails.py Outdated

Comment thread tests/guardrails/test_iorails_streaming.py

tgasser-nv added 5 commits April 9, 2026 14:33

Start IORails on first inference-call to stream_async or generate_async. Make logging and request ID generation symmetric over both

30f483d

Convert dict to GenerationOptions, add helper to check if a config has streaming output rails

cbbb97f

Add tests for mid-stream errors, rail exceptions dict GenerationOptions

008dd1d

Adding extra tests to get line coverage and refactor slightly

d3b7b16

Address Greptile feedback on SSE parsing and error chunks bypassing output rails

c18fefd

tgasser-nv merged commit b78f48b into develop Apr 9, 2026
7 checks passed

tgasser-nv deleted the feat/streaming-output branch April 9, 2026 22:48

tgasser-nv mentioned this pull request Apr 14, 2026

fix(iorails): Fix failing tests due to ModelManager refactor #1791

Merged

4 tasks

tgasser-nv mentioned this pull request May 21, 2026

docs: expand 0.22 release notes #1909

Merged

4 tasks

greptile-apps Bot commented Apr 6, 2026 (edited)

Greptile Summary

Confidence Score: 4/5

coderabbitai Bot commented Apr 6, 2026 (edited)

❌ Failed checks (2 warnings)
