Skip to content

RHAIENG-5096: extract shared load_golden() from duplicated conftest files#107

Merged
andrewdonheiser merged 1 commit into
mainfrom
RHAIENG-5096-extract-shared-load-golden-utility-from-duplicated-conftest-files
May 20, 2026
Merged

RHAIENG-5096: extract shared load_golden() from duplicated conftest files#107
andrewdonheiser merged 1 commit into
mainfrom
RHAIENG-5096-extract-shared-load-golden-utility-from-duplicated-conftest-files

Conversation

@andrewdonheiser
Copy link
Copy Markdown
Contributor

Summary

  • Extracts the duplicated load_golden() helper (load golden_queries.yaml, optionally filter by category) from 5 behavioral test files into a shared evals/harness/fixtures.py module
  • Each consumer keeps a thin 2-line wrapper that binds fixtures_dir and preserves the existing zero-arg call signature — no test logic changes
  • Updates docs/adding-behavioral-tests.md with the new pattern

Files changed

File Change
evals/harness/fixtures.py New — shared load_golden(fixtures_dir, category)
agents/autogen/mcp_agent/tests/behavioral/conftest.py Use shared loader
agents/crewai/websearch_agent/tests/behavioral/conftest.py Use shared loader
agents/langgraph/agentic_rag/tests/behavioral/conftest.py Use shared loader
agents/langgraph/react_agent/tests/behavioral/test_tool_usage.py Use shared loader
agents/vanilla_python/openai_responses_agent/tests/behavioral/conftest.py Use shared loader
docs/adding-behavioral-tests.md Document the new pattern

Test plan

  • Import resolution: from harness.fixtures import load_golden resolves via pythonpath = ["evals"]
  • load_golden is NOT re-exported from harness.__init__ (consumers import from harness.fixtures directly)
  • pytest --collect-only passes for all 5 affected agents (77 tests collected, 0 errors)
  • Cross-agent consistency check: 10/10 checks pass (import alias, FIXTURES_DIR naming, wrapper signature/body, no leftover inline implementations, no stale imports, doc accuracy)
  • No remaining golden_queries.yaml references in agent Python files (grep returns 0 matches)

🤖 Generated with Claude Code

@andrewdonheiser andrewdonheiser requested a review from a team as a code owner May 19, 2026 19:15
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 19, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: dbe4de0f-060a-4595-8157-7d37be318757

📥 Commits

Reviewing files that changed from the base of the PR and between a4767e8 and 3ed17bb.

📒 Files selected for processing (7)
  • agents/autogen/mcp_agent/tests/behavioral/conftest.py
  • agents/crewai/websearch_agent/tests/behavioral/conftest.py
  • agents/langgraph/agentic_rag/tests/behavioral/conftest.py
  • agents/langgraph/react_agent/tests/behavioral/test_tool_usage.py
  • agents/vanilla_python/openai_responses_agent/tests/behavioral/conftest.py
  • docs/adding-behavioral-tests.md
  • evals/harness/fixtures.py
✅ Files skipped from review due to trivial changes (1)
  • docs/adding-behavioral-tests.md

📝 Walkthrough

Walkthrough

Adds a shared golden-query loader (evals/harness/fixtures.py), updates multiple agent behavioral test modules to delegate their load_golden() to it (binding a local FIXTURES_DIR), and documents the wrapper pattern.

Changes

Golden Test Query Fixture Consolidation

Layer / File(s) Summary
Shared fixture utility
evals/harness/fixtures.py
New module exports load_golden(fixtures_dir, category) that reads golden_queries.yaml, extracts queries, and optionally filters by category.
Agent behavioral test migrations
agents/autogen/mcp_agent/tests/behavioral/conftest.py, agents/crewai/websearch_agent/tests/behavioral/conftest.py, agents/langgraph/agentic_rag/tests/behavioral/conftest.py, agents/langgraph/react_agent/tests/behavioral/test_tool_usage.py, agents/vanilla_python/openai_responses_agent/tests/behavioral/conftest.py
These test modules import the shared loader (aliased locally), define FIXTURES_DIR pointing at their fixtures/ folder, and replace inline YAML-reading/filtering with a thin wrapper that calls the shared loader with optional category.
Documentation update
docs/adding-behavioral-tests.md
Adds a load_golden() wrapper example that binds FIXTURES_DIR and delegates to harness.fixtures.load_golden.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main objective: extracting a duplicated load_golden() helper from multiple conftest files into a shared module, which matches the changeset perfectly.
Description check ✅ Passed The description accurately explains the refactoring effort, lists all affected files, details the test plan, and confirms consistency checks—directly related to and supporting the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch RHAIENG-5096-extract-shared-load-golden-utility-from-duplicated-conftest-files

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/adding-behavioral-tests.md`:
- Around line 43-50: The snippet uses Path and Any but doesn't import them; add
the missing imports for pathlib.Path and typing.Any at the top of the snippet so
the example is copy/paste-safe. Update the top of the snippet before the
existing from harness.fixtures import load_golden as _load_golden_from line to
import Path and Any, leaving FIXTURES_DIR, load_golden, and _load_golden_from
unchanged.

In `@evals/harness/fixtures.py`:
- Around line 2-5: Reorder and group the imports to satisfy Ruff I001: keep
"from __future__ import annotations" first, then the standard library imports in
alphabetical order ("from pathlib import Path" then "from typing import Any"),
then a blank line, then third-party imports ("import yaml"); ensure blank lines
separate the future, stdlib, and third-party groups and that the exact import
statements ("from __future__ import annotations", "from pathlib import Path",
"from typing import Any", "import yaml") are used as shown.
- Around line 14-15: yaml.safe_load(f) can return None or a non-mapping which
makes the subsequent data.get("queries", []) call raise; update the load logic
(the variable data from yaml.safe_load) to ensure it's a mapping before calling
get (e.g., if not isinstance(data, dict): set data = {} or directly set queries
= []), then derive queries from data.get("queries", []) so that queries is
always a list and does not blow up when the YAML root is empty/scalar.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 40a75545-352b-471c-b7bc-2cf62da88c66

📥 Commits

Reviewing files that changed from the base of the PR and between 7fb3ca4 and a4767e8.

📒 Files selected for processing (7)
  • agents/autogen/mcp_agent/tests/behavioral/conftest.py
  • agents/crewai/websearch_agent/tests/behavioral/conftest.py
  • agents/langgraph/agentic_rag/tests/behavioral/conftest.py
  • agents/langgraph/react_agent/tests/behavioral/test_tool_usage.py
  • agents/vanilla_python/openai_responses_agent/tests/behavioral/conftest.py
  • docs/adding-behavioral-tests.md
  • evals/harness/fixtures.py

Comment thread docs/adding-behavioral-tests.md
Comment thread evals/harness/fixtures.py
Comment thread evals/harness/fixtures.py Outdated
@andrewdonheiser andrewdonheiser force-pushed the RHAIENG-5096-extract-shared-load-golden-utility-from-duplicated-conftest-files branch 2 times, most recently from b1f78f8 to b955476 Compare May 19, 2026 19:21
Comment thread evals/harness/fixtures.py
tarun-etikala
tarun-etikala previously approved these changes May 20, 2026
The load_golden() helper was copy-pasted identically across 5 agent
test files. Extract it into a shared module with an explicit
fixtures_dir parameter; each consumer keeps a thin 2-line wrapper
that preserves the existing zero-arg call signature, so no test
files need changes.

Closes RHAIENG-5096

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@andrewdonheiser andrewdonheiser force-pushed the RHAIENG-5096-extract-shared-load-golden-utility-from-duplicated-conftest-files branch from b955476 to 3ed17bb Compare May 20, 2026 13:50
@andrewdonheiser andrewdonheiser merged commit 33cc35f into main May 20, 2026
8 checks passed
@andrewdonheiser andrewdonheiser deleted the RHAIENG-5096-extract-shared-load-golden-utility-from-duplicated-conftest-files branch May 20, 2026 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants