test(recorded): add rails library coverage (5/5) by Pouyanpi · Pull Request #1978 · NVIDIA-NeMo/Guardrails

Pouyanpi · 2026-06-03T15:00:28Z

Summary

Adds recorded coverage for library rails and their composition order.

Why

Built-in rails need replayable end-to-end coverage so future rail changes can be reviewed against stable behavior.

What Changed

Adds library rail configs, cassettes, and tests for regex, injection, self-check, content safety, jailbreak, topic control, and composition.
Adds library README guidance requiring recorded coverage for new rails.

Review Notes

This is the final stack PR and includes no additional runtime changes.

Stack Position

Part 5 of 5.

Previous: test(recorded): add rails public API coverage (4/5) #1977
Next: none, final stack PR

Stack Context

This stack decomposes recorded end-to-end replay coverage into reviewable slices. The PRs should be reviewed against their parent branch in the stack.

Please review each PR against its parent branch, not directly against the root base branch, except for part 1.

Order	PR	Branch	Base
1	#1974	`stack/recorded-tests-01-harness`	`develop`
2	#1975	`stack/recorded-tests-02-deterministic-library-load`	`stack/recorded-tests-01-harness`
3	#1976	`stack/recorded-tests-03-clients`	`stack/recorded-tests-02-deterministic-library-load`
4	#1977	`stack/recorded-tests-04-public-api`	`stack/recorded-tests-03-clients`
5	#1978	`stack/recorded-tests-05-library-rails`	`stack/recorded-tests-04-public-api`

Validation

poetry check --lock
poetry lock --no-update
poetry install --with dev
poetry run pytest tests/recorded --block-network -q
pre-commit hooks passed during commit creation

codecov · 2026-06-03T15:12:54Z

Codecov Report

❌ Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
nemoguardrails/rails/llm/llmrails.py	66.66%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

greptile-apps · 2026-06-11T10:48:19Z

Greptile Summary

This is the final PR in a 5-part stack adding recorded end-to-end test coverage for NeMo Guardrails library rails. It also ships a runtime fix to llmrails.py that corrects the streaming path's user-input extraction.

Adds seven test modules (regex, injection, self-check, content safety, jailbreak, topic control, composition) with VCR cassettes and supporting configs that pin the public contract of every built-in rail.
Fixes _get_latest_user_message in llmrails.py to return the message's .content string instead of the full dict, so the streaming output-rail path renders user_input correctly in prompt templates (the cassette for test_content_safety_output_blocks_fake_main_stream now shows user: hello instead of user: {'role': 'user', 'content': 'hello'}).
Adds contribution guidance to nemoguardrails/library/README.md requiring recorded coverage for new rails.

Confidence Score: 5/5

Safe to merge — adds test infrastructure only and a well-evidenced single-function fix in the streaming path.

The runtime change is narrow: one inner function inside the streaming output-rail handler now returns a content string instead of a full message dict. The cassettes before and after both verify the correct prompt is sent to the provider. All 45 changed files are either new test fixtures, configs, or VCR cassettes; no existing logic outside the streaming path is affected.

No files require special attention. The cassettes, configs, and test helpers are internally consistent.

Important Files Changed

Filename	Overview
nemoguardrails/rails/llm/llmrails.py	Fixes `_get_latest_user_message` to return the message content string instead of the full dict; cassettes confirm the streaming prompt now renders correctly.
tests/recorded/rails/library/helpers.py	Provides `check_rails`, `generate_with_fake_main`, and `stream_with_fake_main` helpers; streaming helper intentionally passes no `FakeLLMModel` and relies on `options={"rails": ["output"]}` to prevent main-model calls.
tests/recorded/rails/library/test_composition.py	Composition-order tests covering regex→self-check→jailbreak→content safety→topic control ordering; cassettes confirm each rail short-circuits correctly before the next rail runs.
tests/recorded/rails/library/configs/full_stack/config.yml	Full-stack config now has consistent jailbreak-before-content-safety ordering matching `full_stack_no_topic` (previous ordering discrepancy resolved).
tests/recorded/rails/library/test_content_safety.py	Covers safe-pass, block, streaming block, generation block, and provider-error scenarios for NIM content safety; streaming cassette confirms the user_input fix.
tests/recorded/rails/library/test_injection.py	Tests injection detection (block and omit modes) plus exception propagation when `enable_rails_exceptions=True`; all regex-based, no cassettes needed.
tests/recorded/rails/library/configs.py	Centralises `RailsConfigSource` constants and the shared `JAILBREAK_PROMPT` test fixture.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant T as Test
    participant R as LLMRails
    participant Regex as regex check
    participant SC as self check (OpenAI)
    participant JB as jailbreak detect (NIM)
    participant CS as content safety (NIM)
    participant TC as topic safety (NIM)

    T->>R: "check_async(messages, rail_types=[INPUT])"
    R->>Regex: detect_regex_pattern
    Regex-->>R: passed
    R->>SC: self_check_input prompt
    SC-->>R: safe → passed
    R->>JB: nemoguard-jailbreak-detect
    JB-->>R: "jailbreak=true → BLOCKED (stops here)"
    note over CS,TC: not reached when jailbreak blocks

    T->>R: "stream_async(messages, generator, rails=[output])"
    note over R: generator provides main content
    R->>CS: content_safety_check_output (NIM)
    CS-->>R: unsafe → stream error chunk emitted

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant T as Test
    participant R as LLMRails
    participant Regex as regex check
    participant SC as self check (OpenAI)
    participant JB as jailbreak detect (NIM)
    participant CS as content safety (NIM)
    participant TC as topic safety (NIM)

    T->>R: "check_async(messages, rail_types=[INPUT])"
    R->>Regex: detect_regex_pattern
    Regex-->>R: passed
    R->>SC: self_check_input prompt
    SC-->>R: safe → passed
    R->>JB: nemoguard-jailbreak-detect
    JB-->>R: "jailbreak=true → BLOCKED (stops here)"
    note over CS,TC: not reached when jailbreak blocks

    T->>R: "stream_async(messages, generator, rails=[output])"
    note over R: generator provides main content
    R->>CS: content_safety_check_output (NIM)
    CS-->>R: unsafe → stream error chunk emitted

_{Reviews (10): Last reviewed commit: "test(recorded): fix self-check-facts fal..." | Re-trigger Greptile}

greptile-apps · 2026-06-11T10:48:23Z

+
+          <END CONVERSATION>
+
+          Provide your safety assessment for the above conversation in the below output JSON format:
+          {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}
+
+          Do not include anything other than the output JSON in your response.
+          Output JSON:


Streaming path sends raw dict repr as user_input to content safety prompt

The cassette captures the content safety check output prompt with user: {'role': 'user', 'content': 'hello'} — a Python dict repr — instead of the expected string user: hello. The equivalent non-streaming test cassette (test_content_safety_output_blocks_fake_main_generation.yaml) correctly renders user: hello for the same scenario. This means the streaming path in stream_with_fake_main passes the raw message dict (rather than its .content) as the user_input template variable, so the content safety model receives a structurally different prompt during streaming than during generation. VCR replay masks the defect because the cassette is returned regardless of prompt content, but live streaming calls to the provider will receive malformed prompts.

Prompt To Fix With AI

This is a comment left during a code review. Path: tests/recorded/rails/library/cassettes/test_content_safety/test_content_safety_output_blocks_fake_main_stream.yaml Line: 57-64 Comment: **Streaming path sends raw dict repr as `user_input` to content safety prompt** The cassette captures the content safety check output prompt with `user: {'role': 'user', 'content': 'hello'}` — a Python dict repr — instead of the expected string `user: hello`. The equivalent non-streaming test cassette (`test_content_safety_output_blocks_fake_main_generation.yaml`) correctly renders `user: hello` for the same scenario. This means the streaming path in `stream_with_fake_main` passes the raw message dict (rather than its `.content`) as the `user_input` template variable, so the content safety model receives a structurally different prompt during streaming than during generation. VCR replay masks the defect because the cassette is returned regardless of prompt content, but live streaming calls to the provider will receive malformed prompts. How can I resolve this? If you propose a fix, please make it concise.

this is a real bug.

@greptileai this is resolved in the latest commit. Do you confirm?

Confirmed. Commit 5cb2ed0 ("fix(streaming): pass user content to output rails") resolves it. The cassette now shows:

user: hello

instead of the broken:

user: {'role': 'user', 'content': 'hello'}

The streaming path now correctly extracts .content before passing user_input to the output rail prompt, bringing it in line with the generation path.

greptile-apps · 2026-06-11T10:48:25Z

+    flows:
+      - regex check input
+      - self check input
+      - content safety check input $model=content_safety
+      - topic safety check input $model=topic_control
+      - jailbreak detection model
+  output:


jailbreak detection model ordering differs between full_stack and full_stack_no_topic

In full_stack/config.yml jailbreak is last in the input rail order (content safety → topic safety → jailbreak detection model). In full_stack_no_topic/config.yml it is before content safety (self check → jailbreak detection model → content safety). The composition test test_input_jailbreak_runs_before_content_safety specifically uses the no-topic config to assert this order, but there is no test asserting jailbreak behavior in the full-stack config. If the difference is intentional, a short comment explaining the deliberate ordering would prevent future editors from treating the full-stack flow as a typo.

Prompt To Fix With AI

This is a comment left during a code review. Path: tests/recorded/rails/library/configs/full_stack/config.yml Line: 30-36 Comment: **`jailbreak detection model` ordering differs between `full_stack` and `full_stack_no_topic`** In `full_stack/config.yml` jailbreak is last in the input rail order (`content safety` → `topic safety` → `jailbreak detection model`). In `full_stack_no_topic/config.yml` it is before content safety (`self check` → `jailbreak detection model` → `content safety`). The composition test `test_input_jailbreak_runs_before_content_safety` specifically uses the no-topic config to assert this order, but there is no test asserting jailbreak behavior in the full-stack config. If the difference is intentional, a short comment explaining the deliberate ordering would prevent future editors from treating the full-stack flow as a typo. How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Yes, you're right, this is subtle but still worth fixing.

@greptileai this is resolved in the latest commit. Do you confirm?

Confirmed. Both configs now have a consistent ordering:

full_stack: regex → self check → jailbreak → content safety → topic safety

full_stack_no_topic: regex → self check → jailbreak → content safety → (no topic)

Jailbreak runs before content safety in both, resolving the discrepancy. ✅

Fold library/helpers.py onto the shared build_rails()/async_chunks() instead of its own LLMRails(load_config(...)) + local _chunks (D11/F), and assert the content-safety output block via assert_blocked_generation (refusal + rail stop) rather than the weak assert_generation_response non-empty check (D6).

The test passed a list for relevant_chunks, which crashes retrieve_relevant_chunks (list + str) after the self_check_facts LLM call; the swallowed error produced the generic internal-error refusal, and the snapshot pinned that string as the expected fact-check block. A refactor that fixed the crash would have flipped this test red. Pass relevant_chunks as its canonical string form so the rail completes and the snapshot pins the real fact-check refusal. The underlying retrieve_relevant_chunks list+str bug is pre-existing on develop and tracked separately.

This was referenced Jun 3, 2026

test(recorded): add replay harness (1/5) #1974

Open

fix(llmrails): load library files deterministically (2/5) #1975

Open

test(recorded): add client cassette coverage (3/5) #1976

Open

test(recorded): add rails public API coverage (4/5) #1977

Open

Pouyanpi mentioned this pull request Jun 3, 2026

WIP: add recorded E2E tests #1938

Closed

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from e9ed8b3 to e3e8722 Compare June 9, 2026 16:22

Pouyanpi force-pushed the stack/recorded-tests-05-library-rails branch from 05feb29 to 4fbda3a Compare June 9, 2026 16:22

Pouyanpi marked this pull request as ready for review June 11, 2026 10:33

greptile-apps Bot reviewed Jun 11, 2026

View reviewed changes

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from e3e8722 to 3b279b4 Compare June 11, 2026 10:52

Pouyanpi force-pushed the stack/recorded-tests-05-library-rails branch from 4fbda3a to 100ca5d Compare June 11, 2026 10:52

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from 3b279b4 to 27732c7 Compare June 11, 2026 12:44

Pouyanpi force-pushed the stack/recorded-tests-05-library-rails branch 2 times, most recently from 039b4e1 to 9e5bfa4 Compare June 11, 2026 13:18

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from 27732c7 to a759e09 Compare June 11, 2026 13:18

Pouyanpi force-pushed the stack/recorded-tests-05-library-rails branch from 9e5bfa4 to e7713a2 Compare June 11, 2026 13:36

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from a759e09 to bbc9455 Compare June 11, 2026 13:36

Pouyanpi force-pushed the stack/recorded-tests-05-library-rails branch 2 times, most recently from 0092cf2 to 86ca895 Compare June 11, 2026 13:56

Pouyanpi self-assigned this Jun 11, 2026

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from efee56d to d7a1262 Compare June 15, 2026 08:57

Pouyanpi force-pushed the stack/recorded-tests-05-library-rails branch 2 times, most recently from 0c5207e to e25239a Compare June 15, 2026 12:00

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from d7a1262 to 19ec9e4 Compare June 15, 2026 12:00

Pouyanpi force-pushed the stack/recorded-tests-05-library-rails branch from e25239a to dac2f3c Compare June 15, 2026 14:18

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch 2 times, most recently from 0f531ea to e6370fc Compare June 16, 2026 08:36

Pouyanpi force-pushed the stack/recorded-tests-05-library-rails branch from dac2f3c to 16ec0ef Compare June 16, 2026 08:36

Pouyanpi added 5 commits June 17, 2026 14:13

test(recorded): add rails library coverage

b1af730

fix(streaming): pass user content to output rails

abe4ff0

test(recorded): align full-stack jailbreak ordering

60fb6d2

Pouyanpi force-pushed the stack/recorded-tests-05-library-rails branch from 16ec0ef to 80962e0 Compare June 17, 2026 12:17

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from e6370fc to 399c914 Compare June 17, 2026 12:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(recorded): add rails library coverage (5/5)#1978

test(recorded): add rails library coverage (5/5)#1978
Pouyanpi wants to merge 5 commits into
stack/recorded-tests-04-public-apifrom
stack/recorded-tests-05-library-rails

Pouyanpi commented Jun 3, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 11, 2026 •

edited

Loading

Confidence Score: 5/5

Sequence Diagram

Uh oh!

greptile-apps Bot Jun 11, 2026

Uh oh!

Pouyanpi Jun 11, 2026

Uh oh!

greptile-apps Bot Jun 11, 2026

Uh oh!

greptile-apps Bot Jun 11, 2026

Uh oh!

Pouyanpi Jun 11, 2026

Uh oh!

Pouyanpi Jun 11, 2026

Uh oh!

greptile-apps Bot Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Pouyanpi commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

What Changed

Review Notes

Stack Position

Stack Context

Validation

Uh oh!

codecov Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

greptile-apps Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Pouyanpi Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Pouyanpi Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Pouyanpi Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Pouyanpi commented Jun 3, 2026 •

edited

Loading

codecov Bot commented Jun 3, 2026 •

edited

Loading

greptile-apps Bot commented Jun 11, 2026 •

edited

Loading