Skip to content

test(recorded): add rails public API coverage (4/5)#1977

Open
Pouyanpi wants to merge 4 commits into
stack/recorded-tests-03-clientsfrom
stack/recorded-tests-04-public-api
Open

test(recorded): add rails public API coverage (4/5)#1977
Pouyanpi wants to merge 4 commits into
stack/recorded-tests-03-clientsfrom
stack/recorded-tests-04-public-api

Conversation

@Pouyanpi

@Pouyanpi Pouyanpi commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds recorded public API coverage for generate, check, dialog, request parameters, and streaming behavior.

Why

The public LLMRails API contract should be pinned with deterministic replay across OpenAI and NIM-backed calls.

What Changed

  • Adds public API configs and cassettes.
  • Covers sync and async generate/check flows.
  • Covers streaming, request parameters, task-specific models, and negative provider/input cases.

Stack Position

Part 4 of 5.

Stack Context

This stack decomposes recorded end-to-end replay coverage into reviewable slices. The PRs should be reviewed against their parent branch in the stack.

Please review each PR against its parent branch, not directly against the root base branch, except for part 1.

Order PR Branch Base
1 #1974 stack/recorded-tests-01-harness develop
2 #1975 stack/recorded-tests-02-deterministic-library-load stack/recorded-tests-01-harness
3 #1976 stack/recorded-tests-03-clients stack/recorded-tests-02-deterministic-library-load
4 #1977 stack/recorded-tests-04-public-api stack/recorded-tests-03-clients
5 #1978 stack/recorded-tests-05-library-rails stack/recorded-tests-04-public-api

Validation

poetry check --lock
poetry lock --no-update
poetry install --with dev
poetry run pytest tests/recorded --block-network -q
pre-commit hooks passed during commit creation

@codecov

codecov Bot commented Jun 3, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from e9ed8b3 to e3e8722 Compare June 9, 2026 16:22
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-03-clients branch from dc3a4a7 to 8000a8e Compare June 9, 2026 16:22
@Pouyanpi Pouyanpi marked this pull request as ready for review June 11, 2026 10:33
@greptile-apps

greptile-apps Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds recorded (VCR-based) public API coverage for LLMRails — covering generate, check, dialog, request parameters, and streaming — backed by OpenAI and NIM cassettes. It also ships a focused streaming.py fix that prevents the usage metadata token from being re-emitted in the final END_OF_STREAM chunk.

  • nemoguardrails/streaming.py: After emitting a non-terminal chunk that carries usage in its metadata, usage is now popped from current_metadata so it cannot bleed into the final synthesised chunk on END_OF_STREAM.
  • tests/recorded/rails/public_api/: New test files cover sync/async generate, check (input/output/parallel rails), dialog, single-call, request-parameter forwarding, task-specific models, and streaming contracts with inline snapshots.
  • tests/recorded/test_recorded_helpers.py / tests/test_streaming_handler.py: New unit tests pin load_config isolation semantics and the no-duplicate-usage guarantee for the streaming fix.

Confidence Score: 5/5

Safe to merge — the production change is a minimal two-line fix in streaming.py that is directly covered by a new unit test, and the rest of the PR is additive test infrastructure only.

The only production code touched is a two-line guard in StreamingHandler._process_chunk that clears usage from current_metadata after it has been emitted in a non-terminal chunk. The logic is correct and test_raw_usage_metadata_chunk_is_not_repeated_on_final_chunk exercises exactly the fixed path. All other files are new test code with no impact on the library's runtime behaviour.

No files require special attention.

Important Files Changed

Filename Overview
nemoguardrails/streaming.py Two-line fix pops usage from current_metadata after emitting a usage-bearing chunk, preventing it from re-appearing in the final END_OF_STREAM chunk; logic is sound and covered by a new unit test.
tests/recorded/rails/public_api/configs.py Centralises all config fixtures; STREAMING_DISABLED_CONFIG naming is a previously-flagged concern that doesn't affect correctness.
tests/recorded/rails/public_api/test_stream.py Covers OpenAI and NIM stream contracts, metadata/usage assertions, streaming output rail allow/block, and StreamingNotSupportedError; snapshots and assertions look correct.
tests/recorded/rails/public_api/test_generate.py Covers sync/async generate for OpenAI, NIM, and nemoguards_full; includes LLM call log assertions, invalid model error path, and FakeLLMModel-based output rail tests.
tests/recorded/rails/public_api/test_check.py Exhaustive check-API coverage: 14 scenarios across input/output/parallel/auto/explicit rail types, both sync and async, with a VCR-backed NIM test and an edge-case empty-messages test.
tests/recorded/rails/public_api/test_requests.py Verifies LLM parameter forwarding for both generate and stream, and validates task-specific model routing via recorded cassettes.
tests/recorded/test_recorded_helpers.py Two focused unit tests: one pins that load_config returns a fresh copy after LLMRails mutates it, the other asserts assert_stream_contract accepts metadata-bearing chunks.
tests/test_streaming_handler.py New unit test test_raw_usage_metadata_chunk_is_not_repeated_on_final_chunk directly verifies the streaming.py fix; checks that exactly one usage chunk is emitted and the final chunk's metadata contains no usage key.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant P as Provider (OpenAI/NIM)
    participant SH as StreamingHandler
    participant Q as asyncio.Queue
    participant C as Consumer

    P->>SH: push_chunk("Hello")
    SH->>Q: "put({"text": "Hello"})"
    Q->>C: "{"text": "Hello"}"

    P->>SH: "push_chunk("", metadata={"usage": {...}})"
    note over SH: current_metadata.update({"usage": {...}})
    SH->>Q: "put({"text": "", "metadata": {"usage": {...}}})"
    note over SH: pop("usage") from current_metadata
    Q->>C: "{"text": "", "metadata": {"usage": {...}}}"

    P->>SH: push_chunk(None) → END_OF_STREAM
    note over SH: current_metadata has no "usage" key
    SH->>Q: "put({"text": END_OF_STREAM, "metadata": {"response_metadata": None, "usage_metadata": None}})"
    Q->>C: final chunk (no duplicate usage)
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant P as Provider (OpenAI/NIM)
    participant SH as StreamingHandler
    participant Q as asyncio.Queue
    participant C as Consumer

    P->>SH: push_chunk("Hello")
    SH->>Q: "put({"text": "Hello"})"
    Q->>C: "{"text": "Hello"}"

    P->>SH: "push_chunk("", metadata={"usage": {...}})"
    note over SH: current_metadata.update({"usage": {...}})
    SH->>Q: "put({"text": "", "metadata": {"usage": {...}}})"
    note over SH: pop("usage") from current_metadata
    Q->>C: "{"text": "", "metadata": {"usage": {...}}}"

    P->>SH: push_chunk(None) → END_OF_STREAM
    note over SH: current_metadata has no "usage" key
    SH->>Q: "put({"text": END_OF_STREAM, "metadata": {"response_metadata": None, "usage_metadata": None}})"
    Q->>C: final chunk (no duplicate usage)
Loading

Reviews (11): Last reviewed commit: "test(recorded): consume shared harness i..." | Re-trigger Greptile

Comment on lines +17 to +45

from nemoguardrails.embeddings.index import EmbeddingsIndex, IndexItem


class SimpleEmbeddingSearchProvider(EmbeddingsIndex):
@property
def embedding_size(self):
return 0

def __init__(self):
self.items: List[IndexItem] = []

async def add_item(self, item: IndexItem):
self.items.append(item)

async def add_items(self, items: List[IndexItem]):
self.items.extend(items)

async def build(self):
return None

async def search(self, text: str, max_results: int, threshold=None):
normalized = text.lower()
matches = [item for item in self.items if item.text.lower() in normalized or normalized in item.text.lower()]
return matches[:max_results] or self.items[:max_results]


def init(app):
app.register_embedding_search_provider("simple", SimpleEmbeddingSearchProvider)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Three-way duplication of SimpleEmbeddingSearchProvider

configs/dialog/config.py, configs/single_call/config.py, and configs/task_models/config.py are byte-for-byte identical. When the EmbeddingsIndex interface changes, all three will need to be updated in sync. A shared module (e.g. a configs/shared/embedding.py imported by each config.py) or a factory helper would eliminate the duplication.

Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/recorded/rails/public_api/configs/dialog/config.py
Line: 17-45

Comment:
**Three-way duplication of `SimpleEmbeddingSearchProvider`**

`configs/dialog/config.py`, `configs/single_call/config.py`, and `configs/task_models/config.py` are byte-for-byte identical. When the `EmbeddingsIndex` interface changes, all three will need to be updated in sync. A shared module (e.g. a `configs/shared/embedding.py` imported by each `config.py`) or a factory helper would eliminate the duplication.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptileai this is resolved in the latest commit. Do you confirm?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not confirmed — the three files are still byte-for-byte identical in the latest commit. All three (configs/dialog/config.py, configs/single_call/config.py, configs/task_models/config.py) share the same blob SHA (72be185f684ee8a6a3e57b06c896aaa66ea17709) and identical content. The duplication hasn't been resolved yet.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptileai what about now?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed ✅ — all three config.py files now import SimpleEmbeddingSearchProvider from a shared simple_embedding_provider module instead of duplicating the class. The duplication is resolved.

@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from e3e8722 to 3b279b4 Compare June 11, 2026 10:52
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-03-clients branch 2 times, most recently from a253ae2 to 3d1d7ea Compare June 11, 2026 12:44
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch 2 times, most recently from 27732c7 to a759e09 Compare June 11, 2026 13:18
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-03-clients branch from 3d1d7ea to 3ee9d48 Compare June 11, 2026 13:18
Comment thread tests/recorded/rails/public_api/test_stream.py
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-03-clients branch from 3ee9d48 to 5153569 Compare June 11, 2026 13:36
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch 2 times, most recently from efee56d to d7a1262 Compare June 15, 2026 08:57
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-03-clients branch 2 times, most recently from caafb61 to 3136aa1 Compare June 15, 2026 12:00
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch 2 times, most recently from 19ec9e4 to 0f531ea Compare June 15, 2026 14:18
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-03-clients branch from 3136aa1 to 06858a2 Compare June 15, 2026 14:18
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from 0f531ea to e6370fc Compare June 16, 2026 08:36
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-03-clients branch from 06858a2 to 474d54d Compare June 16, 2026 08:36
Pouyanpi added 2 commits June 17, 2026 14:13
Use build_rails() instead of inline LLMRails(load_config(...)) (D11) and wire the
previously-orphaned assert_generation_response / assert_llm_tasks helpers into the
task-specific-models test instead of hand-rolling the llm-task set (D9).
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-04-public-api branch from e6370fc to 399c914 Compare June 17, 2026 12:17
@Pouyanpi Pouyanpi force-pushed the stack/recorded-tests-03-clients branch from 474d54d to 36679f9 Compare June 17, 2026 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant