Skip to content

feat(iorails): IORails support for streaming output rails#1765

Merged
tgasser-nv merged 10 commits into
developfrom
feat/streaming-output
Apr 9, 2026
Merged

feat(iorails): IORails support for streaming output rails#1765
tgasser-nv merged 10 commits into
developfrom
feat/streaming-output

Conversation

@tgasser-nv

@tgasser-nv tgasser-nv commented Apr 6, 2026

Copy link
Copy Markdown
Collaborator

Description

Adds output-rail streaming to IORails. Integrations into the API service, chat, etc via Guardrails top-level in follow-on PR #1766 . Key points:

  • Two concurrent paths: _generation_task pushes chunks into the StreamingHandler queue; _wrapped_iterator (or _run_output_rails_in_streaming) consumes
    from it
  • StreamingHandler is the bridge — it's an AsyncIterator backed by an asyncio.Queue
  • RollingBuffer sits between the StreamingHandler and the caller, batching tokens and providing overlap context for output rail checks
  • Output rails use the same ModelEngine.call() (non-streaming) as input rails — only the main LLM uses stream_call()

High-level overview of the change below:

*  IORails.stream_async(messages)
  │
  ├─ _validate_streaming_with_output_rails()
  │
  └─ _wrapped_iterator()                          [async generator returned to caller]
     │
     ├─ Semaphore.acquire()                       [concurrency gate]
     ├─ set_new_request_id()
     │
     ├─ asyncio.create_task(_generation_task)      [background task ─────────────────┐
     │                                                                               │
     │   _generation_task()                                                          │
     │   │                                                                           │
     │   ├─ RailsManager.is_input_safe(messages)   [input rails, non-streaming]      │
     │   │   └─ ModelManager.generate_async("content_safety", ...)                   │
     │   │       └─ ModelEngine.call()              [HTTP POST, full response]       │
     │   │                                                                           │
     │   ├─ ModelManager.stream_async("main", messages)                              │
     │   │   └─ ModelEngine.stream_call()           [HTTP POST stream=True, SSE]     │
     │   │       └─ yields content delta strings ──┐                                 │
     │   │                                         │                                 │
     │   │   for each chunk:                       │                                 │
     │   │       StreamingHandler.push_chunk(chunk) ◄┘                               │
     │   │           └─ queue.put(chunk)                                             │
     │   │                                                                           │
     │   └─ StreamingHandler.push_chunk(END_OF_STREAM)                               │
     │                                              │                                │
     │                                              ▼                                │
     │                                     StreamingHandler                          │
     │                                     [AsyncIterator]                           │
     │                                     queue: chunk→chunk→...→EOS                │
     │                                              │                                │
     │  ┌───────────────────────────────────────────┘                                │
     │  │                                                                            │
     │  ▼  [if output rails streaming enabled]                                       │
     │                                                                               │
     ├─ _run_output_rails_in_streaming(streaming_handler, messages)                  │
     │  │                                                                            │
     │  ├─ get_buffer_strategy(config)                                               │
     │  │   └─ RollingBuffer(context_size, chunk_size)                               │
     │  │                                                                            │
     │  └─ async for chunk_batch in buffer_strategy(streaming_handler):              │
     │     │                                                                         │
     │     │  RollingBuffer.process_stream()                                         │
     │     │  ├─ accumulates chunks until len(buffer) >= chunk_size                  │
     │     │  └─ yields ChunkBatch:                                                  │
     │     │       .processing_context  = [context_size overlap + chunk_size tokens] │
     │     │       .user_output_chunks  = [new tokens only]                          │
     │     │                                                                         │
     │     ├─ [stream_first=True]  yield user_output_chunks to caller                │
     │     │                                                                         │
     │     ├─ RailsManager.is_output_safe(messages, processing_context_text)         │
     │     │   └─ ModelManager.generate_async("content_safety", ...)                 │
     │     │       └─ ModelEngine.call()                                             │
     │     │                                                                         │
     │     ├─ [unsafe] → yield error JSON, return                                    │
     │     │                                                                         │
     │     └─ [stream_first=False] yield user_output_chunks to caller                │
     │                                                                               │
     │  [if NO output rails]                                                         │
     ├─ async for chunk in streaming_handler: yield chunk                            │
     │                                                                               │
     └─ finally:                                                                     │
        ├─ Semaphore.release()                                                       │
        ├─ await task  ◄─────────────────────────────────────────────────────────────┘
        └─ reset_request_id(token)



Test Plan

Pre-commit

$ poetry run pre-commit run --all-files
check yaml...............................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
ruff (legacy alias)......................................................Passed
ruff format..............................................................Passed
Insert license in comments...............................................Passed
pyright..................................................................Passed

Unit-tests

$ poetry run pytest -q
.......................ssss.........................................................................................s............................... [  4%]
.................................................................................................................................................... [  9%]
.................................................................................................................................................... [ 13%]
.................................................................................................................................................... [ 18%]
.................................................................................................................................................... [ 22%]
.......................s......ss...................sssssss.......................................................................................... [ 27%]
...................................................................s.......s.........................................ss...........................s. [ 31%]
..............s.......sssss...............................................................s......................................................... [ 36%]
.............................................................ss........ss...ss............................................s......................... [ 40%]
............................s............s.......................................................................................................... [ 45%]
.................................................................................................................................................... [ 50%]
...................................................................sssss......ssssssssssssssssss.........sssss...................................... [ 54%]
..............................................s...........ss...................................sssssssss.ssssssssss................................s [ 59%]
...................................................s....s........................................................ssssssss..............sss...ss...ss [ 63%]
.....ssssssssssssss................................................................................................................................. [ 68%]
.....s..............................................................................................................s....................ssssssss... [ 72%]
......ss......................................................................................................................................ssssss [ 77%]
s...........................................................................s....................................................................... [ 81%]
..................................ss................................................................................................................ [ 86%]
.................................................................................................................................................... [ 90%]
...................................s................................................................................................................ [ 95%]
.................................................................................................................................................... [100%]
3115 passed, 141 skipped in 137.15s (0:02:17)

Local integration test

image

20260406_iorails_streaming_integ_tests.ipynb

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@greptile-apps

greptile-apps Bot commented Apr 6, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds streaming output rail support to IORails via a two-task architecture: a background _generation_task pushes SSE chunks into a StreamingHandler queue, and the _wrapped_iterator (or _run_output_rails_in_streaming) consumes them with optional RollingBuffer-based batching for output-rail checks. The ModelEngine gains a new stream_call() method for SSE parsing, and ModelManager exposes it via stream_async(). The implementation is well-structured with good concurrency control and error propagation.

One logic issue in _run_output_rails_in_streaming: when a generation error and content chunks land in the same buffer batch (e.g. the stream is shorter than chunk_size), the content chunks before the error are silently dropped even in stream_first=True mode.

Confidence Score: 4/5

Safe to merge after addressing the content-chunk loss bug in _run_output_rails_in_streaming.

One P1 logic bug: content chunks that share a flush batch with an error chunk are silently dropped, violating stream_first=True semantics. All previous thread concerns (TOCTOU race, reset_request_id ordering, semaphore leak path) are resolved. The rest of the implementation is well-structured and comprehensively tested.

nemoguardrails/guardrails/iorails.py — specifically _run_output_rails_in_streaming() error-detection ordering.

Important Files Changed

Filename Overview
nemoguardrails/guardrails/iorails.py Core streaming implementation: adds stream_async(), _generation_task(), _wrapped_iterator(), and _run_output_rails_in_streaming(); has a logic bug where content chunks preceding an error in the same buffer batch are silently dropped in stream_first=True mode.
nemoguardrails/guardrails/model_engine.py Adds stream_call() with correct SSE line-by-line parsing via readline(); refactors shared request-building logic into _prepare_request()/_ensure_running()/_raise_for_status(); sock_read timeout reuses total timeout which may be too tight for slow generators.
nemoguardrails/guardrails/model_manager.py Adds stream_async() as a thin async generator proxy over engine.stream_call(); straightforward and correct.
tests/guardrails/test_iorails_streaming.py New streaming test suite covering validation, no-output-rails path, stream_first/gated modes, error propagation, and concurrency; does not assert against content-chunk loss when error shares a buffer batch with real chunks.
tests/guardrails/test_iorails.py Adds two tests for generate_async: dict options forwarding and exception propagation; both correct.
tests/guardrails/test_model_engine.py Tests for the new stream_call() and refactored helper methods; coverage looks adequate.
tests/guardrails/test_model_manager.py Adds tests for ModelManager.stream_async(); straightforward and correct.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant _wrapped_iterator
    participant _generation_task
    participant StreamingHandler
    participant RollingBuffer
    participant RailsManager
    participant ModelManager

    Caller->>_wrapped_iterator: async for chunk in stream_async()
    _wrapped_iterator->>_wrapped_iterator: semaphore.acquire()
    _wrapped_iterator->>_generation_task: asyncio.create_task()

    par Background Task
        _generation_task->>RailsManager: is_input_safe(messages)
        RailsManager-->>_generation_task: RailResult
        alt input safe
            _generation_task->>ModelManager: stream_async("main", messages)
            loop SSE chunks
                ModelManager-->>StreamingHandler: push_chunk(delta)
            end
            _generation_task->>StreamingHandler: push_chunk(END_OF_STREAM)
        else input blocked
            _generation_task->>StreamingHandler: push_chunk(REFUSAL_MESSAGE)
            _generation_task->>StreamingHandler: push_chunk(END_OF_STREAM)
        end
    and Consumer (with output rails)
        _wrapped_iterator->>RollingBuffer: buffer_strategy(streaming_handler)
        loop chunk_batch from RollingBuffer
            RollingBuffer->>RollingBuffer: accumulate until chunk_size tokens
            RollingBuffer-->>_wrapped_iterator: ChunkBatch
            alt stream_first=True
                _wrapped_iterator-->>Caller: yield user_output_chunks
            end
            _wrapped_iterator->>RailsManager: is_output_safe(messages, processing_context)
            RailsManager-->>_wrapped_iterator: RailResult
            alt output safe AND stream_first=False
                _wrapped_iterator-->>Caller: yield user_output_chunks
            else output blocked
                _wrapped_iterator-->>Caller: yield error JSON
            end
        end
    end

    _wrapped_iterator->>_generation_task: task.cancel() (if not done)
    _wrapped_iterator->>_wrapped_iterator: semaphore.release()
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemoguardrails/guardrails/iorails.py
Line: 352-363

Comment:
**Content chunks silently dropped in same-batch error scenario**

The error-detection loop runs before the `if stream_first: yield` block. When a generation error and content chunks land in the same flush batch (e.g. `chunk_size=5` but the stream raises after only 2 tokens), all preceding content chunks in `user_output_chunks` are discarded — only the error JSON is yielded. In `stream_first=True` mode this directly contradicts the "deliver eagerly" contract.

This is reproducible with the existing `_mid_stream_failure` mock (`chunk_size=5`, 2 real chunks then raise): all three items end up in a single flush at `END_OF_STREAM`, `real1`/`real2` are silently lost.

```python
# Separate error from real content first
real_chunks: list = []
error_chunk: str | None = None
for chunk in user_output_chunks:
    try:
        parsed = json.loads(chunk)
        if isinstance(parsed, dict) and parsed.get("error", {}).get("type") == _GENERATION_ERROR_TYPE:
            error_chunk = chunk
            break
    except (json.JSONDecodeError, TypeError):
        real_chunks.append(chunk)

if stream_first:
    for chunk in real_chunks:
        yield chunk
if error_chunk:
    yield error_chunk
    return

# Run output rails on the accumulated context
req_id = get_request_id()
log.info("[%s] Running output rails", req_id)
output_result = await self.rails_manager.is_output_safe(messages, bot_response_chunk)
if not output_result.is_safe:
    log.info("[%s] Output blocked: %s", req_id, output_result.reason)
    error_data = {
        "error": {
            "message": f"Blocked by output rails: {output_result.reason}",
            "type": "guardrails_violation",
            "code": "content_blocked",
        }
    }
    yield json.dumps(error_data)
    return

if not stream_first:
    for chunk in real_chunks:
        yield chunk
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: nemoguardrails/guardrails/model_engine.py
Line: 286-290

Comment:
**`sock_read` reuses the non-streaming total timeout**

`self._timeout.total` was sized for a complete blocking request. Using it as the per-chunk `sock_read` deadline means a model configured with `timeout=30` will abort streaming if no SSE chunk arrives within 30 seconds — which is plausible for a slow first-token scenario. Consider exposing a separate `timeout_sock_read` parameter, or documenting that operators should set `timeout` to the expected inter-chunk interval rather than the total generation duration.

```python
stream_timeout = aiohttp.ClientTimeout(
    total=None,
    connect=self._timeout.connect,
    sock_read=float(params.get("timeout_sock_read", self._timeout.total)),
)
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (6): Last reviewed commit: "Address Greptile feedback on SSE parsing..." | Re-trigger Greptile

Comment thread nemoguardrails/guardrails/iorails.py Outdated
Comment thread nemoguardrails/guardrails/iorails.py Outdated
Comment thread nemoguardrails/guardrails/iorails.py Outdated
@codecov

codecov Bot commented Apr 6, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 99.02913% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
nemoguardrails/guardrails/model_engine.py 97.43% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

@coderabbitai

coderabbitai Bot commented Apr 6, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This PR introduces streaming support to the NemoGuardrails framework across multiple layers. It adds a stream_async() method to IORails for applying input and output rails to streamed content with concurrency control, a stream_call() method in ModelEngine to handle SSE parsing, and a wrapper in ModelManager. A configuration file is also updated to use the NIM engine instead of NVIDIA AI Endpoints, with adjusted streaming chunk parameters.

Changes

Cohort / File(s) Summary
Configuration Update
examples/configs/gs_content_safety/config/config.yml
Updated engine from nvidia_ai_endpoints to nim for both main and content safety models; adjusted streaming chunk_size from 200 to 5 and context_size from 50 to 1.
Model Engine Streaming
nemoguardrails/guardrails/model_engine.py
Refactored request handling with helper methods (_ensure_running(), _prepare_request(), _raise_for_status(), _wrap_exception()); added stream_call() public method for SSE streaming with stream=True, JSON event parsing, [DONE] termination detection, and consistent error handling.
Rails Streaming Layer
nemoguardrails/guardrails/iorails.py
Added stream_async() method with input rails validation and output rails buffering; introduced _run_output_rails_in_streaming() for buffered validation with stream_first mode support; added semaphore-based concurrency control (STREAM_MAX_CONCURRENCY = 256) and _validate_streaming_with_output_rails() error validation.
Manager Streaming Wrapper
nemoguardrails/guardrails/model_manager.py
Added stream_async() async generator method that obtains a ModelEngine and yields chunks from engine.stream_call() with request logging.
Streaming Tests
tests/guardrails/test_iorails_streaming.py
Comprehensive async unit tests covering validation errors, input-only rails, output rails in both stream_first modes, error propagation, and semaphore-based concurrency control with mock LLM streaming and rail checks.
Engine Streaming Tests
tests/guardrails/test_model_engine.py
Added TestModelEngineStreamCall suite verifying SSE delta parsing, [DONE] termination, request payload construction, HTTP error responses, pre-start validation, timeout configuration, and delta content filtering.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant IORails
    participant ModelManager
    participant ModelEngine
    participant BufferStrategy

    Client->>IORails: stream_async(messages, options)
    IORails->>IORails: _validate_streaming_with_output_rails()
    IORails->>IORails: Apply input rails (synchronous)
    
    alt Input Unsafe
        IORails-->>Client: REFUSAL_MESSAGE
    else Input Safe
        IORails->>ModelManager: stream_async(model_type, messages)
        ModelManager->>ModelEngine: stream_call(messages, kwargs)
        
        loop Streaming
            ModelEngine-->>ModelManager: SSE chunk (delta.content)
            ModelManager-->>IORails: chunk
            
            alt Output Rails Enabled
                IORails->>BufferStrategy: Buffer chunk
                BufferStrategy-->>IORails: buffered content
                IORails->>IORails: Check is_output_safe()
                
                alt stream_first=true
                    IORails-->>Client: Yield chunk (optimistic)
                    alt Unsafe Detected
                        IORails-->>Client: Error JSON payload
                    end
                else stream_first=false
                    alt Unsafe Detected
                        IORails-->>Client: Error JSON payload
                    else Safe
                        IORails-->>Client: Yield chunk
                    end
                end
            else Output Rails Disabled
                IORails-->>Client: Yield chunk
            end
        end
        
        ModelEngine-->>IORails: [DONE] signal
        IORails-->>Client: Stream completion
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.40% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes ⚠️ Warning PR introduces major changes (536+ lines of new code) but PR description lacks documentation of test results and testing information. Update PR description to include Testing section documenting test scenarios, referencing test files, and confirming tests pass.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and clearly describes the main feature addition: streaming output rails support in IORails, which is the primary focus of the changeset.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/streaming-output

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemoguardrails/guardrails/iorails.py`:
- Around line 284-302: The output-buffering path is incompatible with
metadata-emitting chunks: before calling get_buffer_strategy(buffer_strategy)
and async for chunk_batch in buffer_strategy(streaming_handler) detect if
output_streaming_config.rails.output.streaming is enabled and the incoming
streaming_handler yields dicts/has include_metadata=True; either raise a clear
error rejecting this configuration or wrap/normalize the streaming_handler into
a transformer that maps dict chunks to plain text (extracting the text field)
before passing to buffer_strategy and then reattach the original metadata to
each yielded chunk after buffering/rails checks; update logic around
streaming_handler, buffer_strategy, get_buffer_strategy, and stream_first to use
the chosen approach so buffering always consumes List[str] and output yields
preserve metadata.
- Around line 252-256: The pre-check using self._stream_semaphore._value is
racy; replace it with a truly non-blocking acquisition attempt using
asyncio.wait_for to avoid blocking. Change the code to attempt await
asyncio.wait_for(self._stream_semaphore.acquire(), timeout=0) inside a
try/except asyncio.TimeoutError and on TimeoutError raise
asyncio.QueueFull("Streaming concurrency limit reached"); keep using
self._stream_semaphore.acquire() for the actual acquisition but only via the
non-blocking wait_for wrapper so the loser does not block.
- Around line 275-278: The finally block in _wrapped_iterator() releases
self._stream_semaphore before cancelling/awaiting self._generation_task, which
lets the streaming slot free while the background generation keeps pulling
tokens; change the teardown to first cancel the background task (call
self._generation_task.cancel()), then await it while suppressing
asyncio.CancelledError using contextlib.suppress (import suppress from
contextlib), and only after the task has been cancelled/awaited release
self._stream_semaphore and call reset_request_id(token); ensure this ordering
covers both early consumer exit and _run_output_rails_in_streaming() early
returns.

In `@nemoguardrails/guardrails/model_engine.py`:
- Around line 318-321: The SSE parser in model_engine.py currently assumes
chunk.get("choices", [{}])[0] exists and will IndexError on events with an empty
choices list; change the code to retrieve choices = chunk.get("choices") and
skip/continue if not choices (or if len(choices) == 0) before accessing [0],
then extract delta = choices[0].get("delta", {}) and yield content as before.
Also add a regression test that feeds an SSE event with "choices": [] (the
include_usage case) into the parser to assert it does not raise and simply skips
that event.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8a3dadd0-d15f-49e8-84ee-1f776e73e4ba

📥 Commits

Reviewing files that changed from the base of the PR and between 7aeba43 and 75fb98d.

📒 Files selected for processing (6)
  • examples/configs/gs_content_safety/config/config.yml
  • nemoguardrails/guardrails/iorails.py
  • nemoguardrails/guardrails/model_engine.py
  • nemoguardrails/guardrails/model_manager.py
  • tests/guardrails/test_iorails_streaming.py
  • tests/guardrails/test_model_engine.py

Comment thread nemoguardrails/guardrails/iorails.py Outdated
Comment thread nemoguardrails/guardrails/iorails.py Outdated
Comment thread nemoguardrails/guardrails/iorails.py
Comment thread nemoguardrails/guardrails/model_engine.py Outdated
@tgasser-nv

Copy link
Copy Markdown
Collaborator Author

@greptile check the latest commit and update summary and score

@tgasser-nv

Copy link
Copy Markdown
Collaborator Author

@coderabbitai Re: Pre-merge checks. Re-check docstring coverage for latest commit. For "Test Results For Major Changes" see the Test Plan section in the PR description. Update table accordingly

@coderabbitai

coderabbitai Bot commented Apr 6, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

The PR description image shows a Jupyter notebook titled "IORails streaming integration test" serving as the Test Plan. It demonstrates:

  • Cell [5]: iorails.stream_async() streaming a long query about London — received 977 chunks of coherent text output
  • Cell [6]: iorails.generate_async() for a short query ("How many items in a bakers dozen?") — returned a correct, complete response
  • Cell [7]: The response printed as a dict with role: assistant and accurate content

This confirms end-to-end streaming and non-streaming functionality works with the content safety rails config.

Now let me check docstring coverage on the latest commit's changed files.


🏁 Script executed:

#!/bin/bash
echo "=== Checking docstrings in iorails.py ==="
python3 - <<'EOF'
import ast, sys

files = [
    "nemoguardrails/guardrails/iorails.py",
    "nemoguardrails/guardrails/model_engine.py",
    "nemoguardrails/guardrails/model_manager.py",
]

for filepath in files:
    print(f"\n--- {filepath} ---")
    try:
        with open(filepath) as f:
            source = f.read()
        tree = ast.parse(source)
    except Exception as e:
        print(f"  ERROR: {e}")
        continue

    missing = []
    covered = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            name = node.name
            if name.startswith("__") and name.endswith("__"):
                continue  # skip dunder
            docstring = ast.get_docstring(node)
            kind = "class" if isinstance(node, ast.ClassDef) else "func"
            if docstring:
                covered.append(f"  [OK]  {kind}: {name} (line {node.lineno})")
            else:
                missing.append(f"  [MISS] {kind}: {name} (line {node.lineno})")

    for m in missing:
        print(m)
    for c in covered:
        print(c)
    print(f"\n  Summary: {len(covered)} with docstrings, {len(missing)} missing")
EOF

Length of output: 1668


@tgasser-nv: Here is the updated pre-merge checks table based on the latest commit's docstring scan and the Test Plan section visible in the PR description image (the "IORails streaming integration test" Jupyter notebook).


🔍 Pre-Merge Checks — Updated

Check Status Notes
Docstring Coverage ✅ Pass (with minor pre-existing gaps) All newly introduced public and private methods in this PR (stream_async, _run_output_rails_in_streaming, _validate_streaming_with_output_rails, _generation_task, _wrapped_iterator, stream_call, ModelManager.stream_async, and all helpers in model_engine.py) have docstrings. Two pre-existing functions are still missing docstrings — _run_sync_iorails (iorails.py, line 106) and api_call (model_manager.py, line 172) — but neither was introduced in this PR.
Test Results For Major Changes ✅ Pass The PR description includes an integration test notebook ("IORails streaming integration test") demonstrating: (1) stream_async() successfully streaming 977 chunks for a long LLM query, and (2) generate_async() returning a correct non-streaming response. This covers both the new streaming path and backward compatibility of the existing non-streaming path.

Summary of docstring coverage by file:

File With Docstrings Missing Missing Detail
nemoguardrails/guardrails/iorails.py 10 1 _run_sync_iorails (line 106) — pre-existing private helper
nemoguardrails/guardrails/model_engine.py 14 0 ✅ Full coverage
nemoguardrails/guardrails/model_manager.py 8 1 api_call (line 172) — pre-existing method

🧠 Learnings used
Learnt from: CR
Repo: NVIDIA-NeMo/Guardrails PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2026-03-09T09:16:48.423Z
Learning: For PRs with major changes (new features, breaking changes, significant refactoring), verify that the PR description includes test results or testing information

@tgasser-nv tgasser-nv requested review from cparisien and trebedea April 6, 2026 20:28
@tgasser-nv

Copy link
Copy Markdown
Collaborator Author

@Pouyanpi , @cparisien , @trebedea This is ready for review

@Pouyanpi Pouyanpi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍🏻 just some comments to review before merging. just opened an issue which seems necessary #1780

the only duplication is output-rails-in-streaming glue loop If we do extract, the shared helper would look like:

async def apply_output_rails_to_stream(
    streaming_handler,
    buffer_strategy,
    stream_first: bool,
    check_fn: Callable[[str], Awaitable[RailResult]],
) -> AsyncIterator[str]:

no need to address in this PR we might do a follow up, but i don't see any need for that either.

Comment thread nemoguardrails/guardrails/iorails.py
Comment thread nemoguardrails/guardrails/iorails.py
Comment thread nemoguardrails/guardrails/iorails.py Outdated
Comment thread nemoguardrails/guardrails/iorails.py Outdated
Comment thread tests/guardrails/test_iorails_streaming.py
@tgasser-nv tgasser-nv merged commit b78f48b into develop Apr 9, 2026
7 checks passed
@tgasser-nv tgasser-nv deleted the feat/streaming-output branch April 9, 2026 22:48
@tgasser-nv tgasser-nv mentioned this pull request May 21, 2026
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request status: in review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants