test(recorded): add rails public API coverage (4/5) #1977
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Greptile Summary
This PR adds recorded (VCR-based) public API coverage for
| Filename | Overview |
|---|---|
| nemoguardrails/streaming.py | Two-line fix pops usage from current_metadata after emitting a usage-bearing chunk, preventing it from re-appearing in the final END_OF_STREAM chunk; logic is sound and covered by a new unit test. |
| tests/recorded/rails/public_api/configs.py | Centralises all config fixtures; STREAMING_DISABLED_CONFIG naming is a previously-flagged concern that doesn't affect correctness. |
| tests/recorded/rails/public_api/test_stream.py | Covers OpenAI and NIM stream contracts, metadata/usage assertions, streaming output rail allow/block, and StreamingNotSupportedError; snapshots and assertions look correct. |
| tests/recorded/rails/public_api/test_generate.py | Covers sync/async generate for OpenAI, NIM, and nemoguards_full; includes LLM call log assertions, invalid model error path, and FakeLLMModel-based output rail tests. |
| tests/recorded/rails/public_api/test_check.py | Exhaustive check-API coverage: 14 scenarios across input/output/parallel/auto/explicit rail types, both sync and async, with a VCR-backed NIM test and an edge-case empty-messages test. |
| tests/recorded/rails/public_api/test_requests.py | Verifies LLM parameter forwarding for both generate and stream, and validates task-specific model routing via recorded cassettes. |
| tests/recorded/test_recorded_helpers.py | Two focused unit tests: one pins that load_config returns a fresh copy after LLMRails mutates it, the other asserts assert_stream_contract accepts metadata-bearing chunks. |
| tests/test_streaming_handler.py | New unit test test_raw_usage_metadata_chunk_is_not_repeated_on_final_chunk directly verifies the streaming.py fix; checks that exactly one usage chunk is emitted and the final chunk's metadata contains no usage key. |
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant P as Provider (OpenAI/NIM)
participant SH as StreamingHandler
participant Q as asyncio.Queue
participant C as Consumer
P->>SH: push_chunk("Hello")
SH->>Q: "put({"text": "Hello"})"
Q->>C: "{"text": "Hello"}"
P->>SH: "push_chunk("", metadata={"usage": {...}})"
note over SH: current_metadata.update({"usage": {...}})
SH->>Q: "put({"text": "", "metadata": {"usage": {...}}})"
note over SH: pop("usage") from current_metadata
Q->>C: "{"text": "", "metadata": {"usage": {...}}}"
P->>SH: push_chunk(None) → END_OF_STREAM
note over SH: current_metadata has no "usage" key
SH->>Q: "put({"text": END_OF_STREAM, "metadata": {"response_metadata": None, "usage_metadata": None}})"
Q->>C: final chunk (no duplicate usage)
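The flow in the diagram can be sketched in a few lines of Python. This is a simplified illustration, not the actual `nemoguardrails/streaming.py` code: the class name `SketchStreamingHandler`, the `END_OF_STREAM` sentinel object, and the `collect_chunks` helper are all hypothetical stand-ins, and the final chunk's metadata is reduced to whatever remains in `current_metadata`. The point it shows is the pop-after-emit step that keeps `usage` out of the final chunk.

```python
import asyncio
from typing import Any, Dict, List, Optional

END_OF_STREAM = object()  # hypothetical sentinel, stands in for the real marker


class SketchStreamingHandler:
    """Toy handler (assumption): queues chunks and tracks running metadata."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()
        self.current_metadata: Dict[str, Any] = {}

    async def push_chunk(
        self, text: Optional[str], metadata: Optional[Dict[str, Any]] = None
    ) -> None:
        if metadata:
            self.current_metadata.update(metadata)

        if text is None:
            # Final chunk: carries whatever metadata is still pending.
            await self.queue.put(
                {"text": END_OF_STREAM, "metadata": dict(self.current_metadata)}
            )
            return

        chunk: Dict[str, Any] = {"text": text}
        if self.current_metadata:
            chunk["metadata"] = dict(self.current_metadata)
        await self.queue.put(chunk)

        # The step under review: once usage has been emitted, drop it so
        # it cannot reappear on the END_OF_STREAM chunk.
        self.current_metadata.pop("usage", None)


async def collect_chunks() -> List[Dict[str, Any]]:
    handler = SketchStreamingHandler()
    await handler.push_chunk("Hello")
    await handler.push_chunk("", metadata={"usage": {"total_tokens": 3}})
    await handler.push_chunk(None)

    chunks = []
    while not handler.queue.empty():
        chunks.append(handler.queue.get_nowait())
    return chunks


chunks = asyncio.run(collect_chunks())
```

In this sketch, removing the `pop` call would make the final chunk carry the same `usage` payload a second time, which is the duplication the new unit test guards against.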
Reviews (11): Last reviewed commit: "test(recorded): consume shared harness i..."
from typing import List  # needed by the annotations below

from nemoguardrails.embeddings.index import EmbeddingsIndex, IndexItem


class SimpleEmbeddingSearchProvider(EmbeddingsIndex):
    @property
    def embedding_size(self):
        return 0

    def __init__(self):
        self.items: List[IndexItem] = []

    async def add_item(self, item: IndexItem):
        self.items.append(item)

    async def add_items(self, items: List[IndexItem]):
        self.items.extend(items)

    async def build(self):
        return None

    async def search(self, text: str, max_results: int, threshold=None):
        # Case-insensitive substring match in either direction; fall back to
        # the first items when nothing matches.
        normalized = text.lower()
        matches = [item for item in self.items if item.text.lower() in normalized or normalized in item.text.lower()]
        return matches[:max_results] or self.items[:max_results]


def init(app):
    app.register_embedding_search_provider("simple", SimpleEmbeddingSearchProvider)
**Three-way duplication of `SimpleEmbeddingSearchProvider`**
Path: tests/recorded/rails/public_api/configs/dialog/config.py, lines 17-45
`configs/dialog/config.py`, `configs/single_call/config.py`, and `configs/task_models/config.py` are byte-for-byte identical. When the `EmbeddingsIndex` interface changes, all three will need to be updated in sync. A shared module (e.g. a `configs/shared/embedding.py` imported by each `config.py`) or a factory helper would eliminate the duplication.
@greptileai this is resolved in the latest commit. Do you confirm?
Not confirmed — the three files are still byte-for-byte identical in the latest commit. All three (configs/dialog/config.py, configs/single_call/config.py, configs/task_models/config.py) share the same blob SHA (72be185f684ee8a6a3e57b06c896aaa66ea17709) and identical content. The duplication hasn't been resolved yet.
Confirmed ✅ — all three config.py files now import SimpleEmbeddingSearchProvider from a shared simple_embedding_provider module instead of duplicating the class. The duplication is resolved.
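For readers skimming the thread, the matching rule in the provider's `search` method can be tried in isolation. The sketch below is a standalone approximation: `Item` is a hypothetical stand-in for `IndexItem` (only a `text` field is assumed), and `simple_search` is a free function copying the method's logic rather than anything from the PR.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """Hypothetical stand-in for IndexItem; only `text` is modelled."""

    text: str


async def simple_search(items: List[Item], text: str, max_results: int) -> List[Item]:
    # Case-insensitive substring match in either direction.
    normalized = text.lower()
    matches = [
        item
        for item in items
        if item.text.lower() in normalized or normalized in item.text.lower()
    ]
    # When nothing matches, fall back to the first stored items.
    return matches[:max_results] or items[:max_results]


items = [Item("express greeting"), Item("ask about weather")]
hit = asyncio.run(simple_search(items, "Weather", 5))
miss = asyncio.run(simple_search(items, "unrelated", 1))
```

Here `hit` contains only the weather item, while `miss` falls back to the first stored item, so a query never returns an empty list as long as the index is non-empty. That fallback is presumably what keeps the recorded tests independent of a real embedding model.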
Use build_rails() instead of inline LLMRails(load_config(...)) (D11) and wire the previously-orphaned assert_generation_response / assert_llm_tasks helpers into the task-specific-models test instead of hand-rolling the llm-task set (D9).
Summary
Adds recorded public API coverage for generate, check, dialog, request parameters, and streaming behavior.
Why
The public LLMRails API contract should be pinned with deterministic replay across OpenAI and NIM-backed calls.
What Changed
Stack Position
Part 4 of 5.
Stack Context
This stack decomposes recorded end-to-end replay coverage into reviewable slices.
Please review each PR against its parent branch in the stack, not directly against the root base branch, except for part 1.
| Branch | Parent branch |
|---|---|
| stack/recorded-tests-01-harness | develop |
| stack/recorded-tests-02-deterministic-library-load | stack/recorded-tests-01-harness |
| stack/recorded-tests-03-clients | stack/recorded-tests-02-deterministic-library-load |
| stack/recorded-tests-04-public-api | stack/recorded-tests-03-clients |
| stack/recorded-tests-05-library-rails | stack/recorded-tests-04-public-api |
Validation