test(recorded): add rails public API coverage (4/5) by Pouyanpi · Pull Request #1977 · NVIDIA-NeMo/Guardrails

Pouyanpi · 2026-06-03T15:00:24Z

Summary

Adds recorded public API coverage for generate, check, dialog, request parameters, and streaming behavior.

Why

The public LLMRails API contract should be pinned with deterministic replay across OpenAI and NIM-backed calls.

What Changed

Adds public API configs and cassettes.
Covers sync and async generate/check flows.
Covers streaming, request parameters, task-specific models, and negative provider/input cases.

Stack Position

Part 4 of 5.

Stack Context

This stack decomposes recorded end-to-end replay coverage into reviewable slices. The PRs should be reviewed against their parent branch in the stack.

Please review each PR against its parent branch, not directly against the root base branch, except for part 1.

Order	PR	Branch	Base
1	#1974	`stack/recorded-tests-01-harness`	`develop`
2	#1975	`stack/recorded-tests-02-deterministic-library-load`	`stack/recorded-tests-01-harness`
3	#1976	`stack/recorded-tests-03-clients`	`stack/recorded-tests-02-deterministic-library-load`
4	#1977	`stack/recorded-tests-04-public-api`	`stack/recorded-tests-03-clients`
5	#1978	`stack/recorded-tests-05-library-rails`	`stack/recorded-tests-04-public-api`

Validation

poetry check --lock
poetry lock --no-update
poetry install --with dev
poetry run pytest tests/recorded --block-network -q
pre-commit hooks passed during commit creation

codecov · 2026-06-03T15:11:52Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

greptile-apps · 2026-06-11T10:48:31Z

Greptile Summary

This PR adds recorded (VCR-based) public API coverage for LLMRails — covering generate, check, dialog, request parameters, and streaming — backed by OpenAI and NIM cassettes. It also ships a focused streaming.py fix that prevents the usage metadata token from being re-emitted in the final END_OF_STREAM chunk.

nemoguardrails/streaming.py: After emitting a non-terminal chunk that carries usage in its metadata, usage is now popped from current_metadata so it cannot bleed into the final synthesised chunk on END_OF_STREAM.
tests/recorded/rails/public_api/: New test files cover sync/async generate, check (input/output/parallel rails), dialog, single-call, request-parameter forwarding, task-specific models, and streaming contracts with inline snapshots.
tests/recorded/test_recorded_helpers.py / tests/test_streaming_handler.py: New unit tests pin load_config isolation semantics and the no-duplicate-usage guarantee for the streaming fix.

Confidence Score: 5/5

Safe to merge — the production change is a minimal two-line fix in streaming.py that is directly covered by a new unit test, and the rest of the PR is additive test infrastructure only.

The only production code touched is a two-line guard in StreamingHandler._process_chunk that clears usage from current_metadata after it has been emitted in a non-terminal chunk. The logic is correct and test_raw_usage_metadata_chunk_is_not_repeated_on_final_chunk exercises exactly the fixed path. All other files are new test code with no impact on the library's runtime behaviour.

No files require special attention.

Important Files Changed

Filename	Overview
nemoguardrails/streaming.py	Two-line fix pops `usage` from `current_metadata` after emitting a usage-bearing chunk, preventing it from re-appearing in the final END_OF_STREAM chunk; logic is sound and covered by a new unit test.
tests/recorded/rails/public_api/configs.py	Centralises all config fixtures; STREAMING_DISABLED_CONFIG naming is a previously-flagged concern that doesn't affect correctness.
tests/recorded/rails/public_api/test_stream.py	Covers OpenAI and NIM stream contracts, metadata/usage assertions, streaming output rail allow/block, and StreamingNotSupportedError; snapshots and assertions look correct.
tests/recorded/rails/public_api/test_generate.py	Covers sync/async generate for OpenAI, NIM, and nemoguards_full; includes LLM call log assertions, invalid model error path, and FakeLLMModel-based output rail tests.
tests/recorded/rails/public_api/test_check.py	Exhaustive check-API coverage: 14 scenarios across input/output/parallel/auto/explicit rail types, both sync and async, with a VCR-backed NIM test and an edge-case empty-messages test.
tests/recorded/rails/public_api/test_requests.py	Verifies LLM parameter forwarding for both generate and stream, and validates task-specific model routing via recorded cassettes.
tests/recorded/test_recorded_helpers.py	Two focused unit tests: one pins that load_config returns a fresh copy after LLMRails mutates it, the other asserts assert_stream_contract accepts metadata-bearing chunks.
tests/test_streaming_handler.py	New unit test test_raw_usage_metadata_chunk_is_not_repeated_on_final_chunk directly verifies the streaming.py fix; checks that exactly one usage chunk is emitted and the final chunk's metadata contains no usage key.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant P as Provider (OpenAI/NIM)
    participant SH as StreamingHandler
    participant Q as asyncio.Queue
    participant C as Consumer

    P->>SH: push_chunk("Hello")
    SH->>Q: "put({"text": "Hello"})"
    Q->>C: "{"text": "Hello"}"

    P->>SH: "push_chunk("", metadata={"usage": {...}})"
    note over SH: current_metadata.update({"usage": {...}})
    SH->>Q: "put({"text": "", "metadata": {"usage": {...}}})"
    note over SH: pop("usage") from current_metadata
    Q->>C: "{"text": "", "metadata": {"usage": {...}}}"

    P->>SH: push_chunk(None) → END_OF_STREAM
    note over SH: current_metadata has no "usage" key
    SH->>Q: "put({"text": END_OF_STREAM, "metadata": {"response_metadata": None, "usage_metadata": None}})"
    Q->>C: final chunk (no duplicate usage)

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant P as Provider (OpenAI/NIM)
    participant SH as StreamingHandler
    participant Q as asyncio.Queue
    participant C as Consumer

    P->>SH: push_chunk("Hello")
    SH->>Q: "put({"text": "Hello"})"
    Q->>C: "{"text": "Hello"}"

    P->>SH: "push_chunk("", metadata={"usage": {...}})"
    note over SH: current_metadata.update({"usage": {...}})
    SH->>Q: "put({"text": "", "metadata": {"usage": {...}}})"
    note over SH: pop("usage") from current_metadata
    Q->>C: "{"text": "", "metadata": {"usage": {...}}}"

    P->>SH: push_chunk(None) → END_OF_STREAM
    note over SH: current_metadata has no "usage" key
    SH->>Q: "put({"text": END_OF_STREAM, "metadata": {"response_metadata": None, "usage_metadata": None}})"
    Q->>C: final chunk (no duplicate usage)

_{Reviews (11): Last reviewed commit: "test(recorded): consume shared harness i..." | Re-trigger Greptile}

greptile-apps · 2026-06-11T10:48:37Z

+
+from nemoguardrails.embeddings.index import EmbeddingsIndex, IndexItem
+
+
+class SimpleEmbeddingSearchProvider(EmbeddingsIndex):
+    @property
+    def embedding_size(self):
+        return 0
+
+    def __init__(self):
+        self.items: List[IndexItem] = []
+
+    async def add_item(self, item: IndexItem):
+        self.items.append(item)
+
+    async def add_items(self, items: List[IndexItem]):
+        self.items.extend(items)
+
+    async def build(self):
+        return None
+
+    async def search(self, text: str, max_results: int, threshold=None):
+        normalized = text.lower()
+        matches = [item for item in self.items if item.text.lower() in normalized or normalized in item.text.lower()]
+        return matches[:max_results] or self.items[:max_results]
+
+
+def init(app):
+    app.register_embedding_search_provider("simple", SimpleEmbeddingSearchProvider)


Three-way duplication of SimpleEmbeddingSearchProvider

configs/dialog/config.py, configs/single_call/config.py, and configs/task_models/config.py are byte-for-byte identical. When the EmbeddingsIndex interface changes, all three will need to be updated in sync. A shared module (e.g. a configs/shared/embedding.py imported by each config.py) or a factory helper would eliminate the duplication.

Prompt To Fix With AI

This is a comment left during a code review. Path: tests/recorded/rails/public_api/configs/dialog/config.py Line: 17-45 Comment: **Three-way duplication of `SimpleEmbeddingSearchProvider`** `configs/dialog/config.py`, `configs/single_call/config.py`, and `configs/task_models/config.py` are byte-for-byte identical. When the `EmbeddingsIndex` interface changes, all three will need to be updated in sync. A shared module (e.g. a `configs/shared/embedding.py` imported by each `config.py`) or a factory helper would eliminate the duplication. How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@greptileai this is resolved in the latest commit. Do you confirm?

Not confirmed — the three files are still byte-for-byte identical in the latest commit. All three (configs/dialog/config.py, configs/single_call/config.py, configs/task_models/config.py) share the same blob SHA (72be185f684ee8a6a3e57b06c896aaa66ea17709) and identical content. The duplication hasn't been resolved yet.

@greptileai what about now?

Confirmed ✅ — all three config.py files now import SimpleEmbeddingSearchProvider from a shared simple_embedding_provider module instead of duplicating the class. The duplication is resolved.

Use build_rails() instead of inline LLMRails(load_config(...)) (D11) and wire the previously-orphaned assert_generation_response / assert_llm_tasks helpers into the task-specific-models test instead of hand-rolling the llm-task set (D9).

This was referenced Jun 3, 2026

test(recorded): add rails library coverage (5/5) #1978

Open

test(recorded): add replay harness (1/5) #1974

Open

fix(llmrails): load library files deterministically (2/5) #1975

Open

test(recorded): add client cassette coverage (3/5) #1976

Open

Pouyanpi mentioned this pull request Jun 3, 2026

WIP: add recorded E2E tests #1938

Closed

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from e9ed8b3 to e3e8722 Compare June 9, 2026 16:22

Pouyanpi force-pushed the stack/recorded-tests-03-clients branch from dc3a4a7 to 8000a8e Compare June 9, 2026 16:22

Pouyanpi marked this pull request as ready for review June 11, 2026 10:33

greptile-apps Bot reviewed Jun 11, 2026

View reviewed changes

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from e3e8722 to 3b279b4 Compare June 11, 2026 10:52

Pouyanpi force-pushed the stack/recorded-tests-03-clients branch 2 times, most recently from a253ae2 to 3d1d7ea Compare June 11, 2026 12:44

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch 2 times, most recently from 27732c7 to a759e09 Compare June 11, 2026 13:18

Pouyanpi force-pushed the stack/recorded-tests-03-clients branch from 3d1d7ea to 3ee9d48 Compare June 11, 2026 13:18

greptile-apps Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread tests/recorded/rails/public_api/test_stream.py

Pouyanpi force-pushed the stack/recorded-tests-03-clients branch from 3ee9d48 to 5153569 Compare June 11, 2026 13:36

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch 2 times, most recently from efee56d to d7a1262 Compare June 15, 2026 08:57

Pouyanpi force-pushed the stack/recorded-tests-03-clients branch 2 times, most recently from caafb61 to 3136aa1 Compare June 15, 2026 12:00

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch 2 times, most recently from 19ec9e4 to 0f531ea Compare June 15, 2026 14:18

Pouyanpi force-pushed the stack/recorded-tests-03-clients branch from 3136aa1 to 06858a2 Compare June 15, 2026 14:18

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from 0f531ea to e6370fc Compare June 16, 2026 08:36

Pouyanpi force-pushed the stack/recorded-tests-03-clients branch from 06858a2 to 474d54d Compare June 16, 2026 08:36

Pouyanpi added 2 commits June 17, 2026 14:13

test(recorded): add rails public API coverage

5ccf539

fix(streaming): avoid duplicate usage metadata chunk

6a19124

Pouyanpi added 2 commits June 17, 2026 14:13

test(recorded): share public api embedding fixture

00e1c34

Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from e6370fc to 399c914 Compare June 17, 2026 12:17

Pouyanpi force-pushed the stack/recorded-tests-03-clients branch from 474d54d to 36679f9 Compare June 17, 2026 12:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(recorded): add rails public API coverage (4/5)#1977

test(recorded): add rails public API coverage (4/5)#1977
Pouyanpi wants to merge 4 commits into
stack/recorded-tests-03-clientsfrom
stack/recorded-tests-04-public-api

Pouyanpi commented Jun 3, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 3, 2026

Uh oh!

greptile-apps Bot commented Jun 11, 2026 •

edited

Loading

Confidence Score: 5/5

Sequence Diagram

Uh oh!

greptile-apps Bot Jun 11, 2026

Uh oh!

Pouyanpi Jun 11, 2026

Uh oh!

greptile-apps Bot Jun 11, 2026

Uh oh!

Pouyanpi Jun 11, 2026

Uh oh!

greptile-apps Bot Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Pouyanpi commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

What Changed

Stack Position

Stack Context

Validation

Uh oh!

codecov Bot commented Jun 3, 2026

Codecov Report

Uh oh!

greptile-apps Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Pouyanpi Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Pouyanpi Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Pouyanpi commented Jun 3, 2026 •

edited

Loading

greptile-apps Bot commented Jun 11, 2026 •

edited

Loading