chore: add conventional commits pre-commit hook by sanafayyaz315 · Pull Request #84 · red-hat-data-services/agentic-starter-kits

sanafayyaz315 · 2026-04-29T09:23:51Z

Description

Adds a pre-commit hook to enforce the Conventional Commits specification on all commit messages. Uses the espressif/conventional-precommit-linter hook, which validates commit message format at the commit-msg stage.

Changes:

.pre-commit-config.yaml: new file with the conventional commits hook configured with allowed types, subject length limits, and breaking change support
CONTRIBUTING.md: added "Development setup" section with pre-commit install instructions and updated "Commit message conventions" with the enforced format, all allowed types, and examples

Jira Ticket

RHAIENG-4066

Testing

make test passes (run from the affected agent directory)
Manual testing performed (describe steps below)
No testing required (documentation/config change only)

Verified locally:

Bad commit messages (e.g. "updated stuff", "chore: doc") are rejected
Valid commit messages (e.g. "chore: add conventional commits pre-commit hook") pass

Checklist

I have read CONTRIBUTING.md
No .env or secret files are included in this PR
All changes are within scope of the linked Jira ticket

Review Guidance

.pre-commit-config.yaml — review the args to confirm the team is happy with the allowed types and subject length limits (min 10, max 72)
Additional args available from the linter that we're not currently using:
- --scopes: restrict to specific scopes (currently unrestricted)
- --body-max-line-length: limit body line length (currently unrestricted)
- --summary-uppercase: enforce uppercase first letter
- --scope-case-insensitive: allow uppercase in scopes
This is phase 1 of pre-commit setup. Ruff, markdownlint, and file-hygiene hooks will be added in a follow-up PR once the linter configs (ruff.toml, .markdownlint.json) are in place.

Related PRs

#80 (Tarun's Ruff linting and formatting — provides ruff.toml that future pre-commit hooks will use)

Migrate the evalhub_adapter package from the original behavioral-tests repo into tests/behavioral/evalhub_adapter/ as a first-class source package with unit tests and benchmark fixture files. - Adapter: AgenticEvalAdapter implementing FrameworkAdapter interface with 6-phase eval pipeline (init, load, eval, post-process, persist, report), 10 scorer dispatches, MLflow trace enrichment - Config: AgenticEvalParams bridge from JobSpec to TaskConfig - Benchmarks: 7 benchmark definitions with golden query YAML loader - Fixtures: tool_use.yaml with 5 queries, 6 stub YAMLs for remaining benchmarks - Tests: 32 unit tests covering config, benchmarks, scorer resolution, YAML loading, real fixture parsing, and edge cases - pyproject.toml: evalhub optional dep, unit marker, package include Key design decisions: - Lazy __init__.py import so unit tests run without evalhub SDK - verify_ssl defaults to True with warning log when disabled - query_error metric excluded from overall score computation - _report_fatal accepts phase parameter for accurate EvalHub reporting - MLflow gating requires both tracking_uri and experiment_name RHAIENG-4604 Made-with: Cursor

Strip speculative code from the EvalHub adapter migration: - Remove 6 empty benchmark YAML stubs (keep only agentic-tool-use) - Trim BENCHMARKS registry to the one working benchmark - Remove unused QuerySpec.difficulty/category fields - Remove unused TaskConfig import from adapter.py - Simplify __init__.py to plain import - Fix main() docstring to reference FrameworkAdapter base class Add missing deliverables: - README.md documenting adapter design, what works, what's planned - test_adapter.py with 28 unit tests for scorer dispatch and aggregation - .gitignore entries for .cursor/ and REFACTORING.md Made-with: Cursor

- Treat TaskResult(success=False) as a query failure before scoring; previously failed agent calls bypassed failed_count and ran through normal scorers - Include query_error in overall_score so failed queries penalize the result instead of being silently excluded - Escape regex metacharacter in test match= pattern Made-with: Cursor

- Move evalhub stub from test_adapter.py to conftest.py so tests work regardless of collection order or file selection - Wrap MLflow trace enrichment in its own try/except to prevent flaky MLflow from invalidating successful query results - Validate missing 'query' key in YAML entries with descriptive ValueError; widen adapter's load_queries catch to include ValueError - Strip orphan difficulty/category fields from tool_use.yaml - Add __post_init__ validation for positive timeout/latency values - Broaden .gitignore REFACTORING.md pattern to **/REFACTORING.md - Add TODO for asyncio.gather concurrent query execution - Update README: validation docs, MLflow fault tolerance, sequential execution note, conftest.py stub explanation Made-with: Cursor

…ng guide

coderabbitai · 2026-04-29T09:24:05Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

This PR introduces an EvalHub on-cluster adapter that bridges pytest-based behavioral evaluations with EvalHub's Kubernetes orchestration. It includes configuration validation, benchmark management, scoring aggregation, optional MLflow integration, and comprehensive test coverage with stubs for the evalhub package.

Changes

Cohort / File(s)	Summary
Configuration & Tooling `.gitignore`, `.pre-commit-config.yaml`, `CONTRIBUTING.md`, `pyproject.toml`	Extended .gitignore for `.cursor` and `REFACTORING.md`; added pre-commit hook configuration for conventional commits linting; updated CONTRIBUTING.md with stricter commit message format, expanded types, and breaking-change handling; added evalhub optional dependency group, pytest `unit` marker, and `evalhub_adapter*` package discovery in pyproject.toml.
Adapter Core `tests/behavioral/evalhub_adapter/__init__.py`, `tests/behavioral/evalhub_adapter/adapter.py`, `tests/behavioral/evalhub_adapter/benchmarks.py`, `tests/behavioral/evalhub_adapter/config.py`	Main EvalHub framework adapter (`AgenticEvalAdapter`) that validates job config, loads golden queries, orchestrates async task execution via httpx, runs scorer dispatch logic, aggregates per-query scores into per-metric results with pass rates/min/max, computes overall score, and conditionally logs to MLflow; benchmark registry with query/scorer management; configuration validation with positive-constraint checking and optional MLflow trace enrichment support.
Adapter Documentation `tests/behavioral/evalhub_adapter/README.md`	Comprehensive guide defining file architecture, configuration expectations via `JobSpec.parameters`, supported benchmarks/scorers, MLflow tracing capabilities, roadmap, and test execution guidance with evalhub stub notes.
Test Infrastructure `tests/behavioral/evalhub_adapter/tests/__init__.py`, `tests/behavioral/evalhub_adapter/tests/conftest.py`	Test package initializer; conftest with evalhub package stubs for import compatibility and `fixtures_dir` pytest fixture for isolated YAML fixture testing.
Adapter Tests `tests/behavioral/evalhub_adapter/tests/test_adapter.py`, `tests/behavioral/evalhub_adapter/tests/test_config_and_benchmarks.py`	Unit tests validating scorer dispatch, score aggregation with metric grouping, overall score computation with rounding; comprehensive config/benchmark tests covering `AgenticEvalParams` mapping/defaults, `TaskConfig` construction, benchmark registry lookups with error reporting, scorer resolution including `"all"` expansion, and YAML query loading with validation and fixture sanity checks.
Test Fixtures `tests/behavioral/fixtures/benchmarks/tool_use.yaml`	Golden benchmark fixture defining five agent queries with expected tool calls and extracted elements for the `agentic-tool-use` benchmark.

Sequence Diagram(s)

sequenceDiagram
    participant EvalHub
    participant AgenticEvalAdapter
    participant BenchmarkMgr as Benchmark Manager
    participant QueryLoader as Query Loader
    participant Agent as Agent Server
    participant Scorers
    participant MLflow
    
    EvalHub->>AgenticEvalAdapter: run_benchmark_job(JobSpec)
    activate AgenticEvalAdapter
    
    AgenticEvalAdapter->>AgenticEvalAdapter: validate config
    AgenticEvalAdapter->>BenchmarkMgr: get_benchmark(benchmark_id)
    BenchmarkMgr-->>AgenticEvalAdapter: BenchmarkDef
    
    AgenticEvalAdapter->>QueryLoader: load_queries(benchmark)
    QueryLoader-->>AgenticEvalAdapter: list[QuerySpec]
    
    AgenticEvalAdapter->>AgenticEvalAdapter: init MLflow (optional)
    
    loop for each query
        AgenticEvalAdapter->>Agent: run_task(query, config)
        Agent-->>AgenticEvalAdapter: TaskResult
        
        AgenticEvalAdapter->>Scorers: score_result(task_result, scorers)
        loop for each scorer
            Scorers-->>AgenticEvalAdapter: Score
        end
        
        alt MLflow enabled
            AgenticEvalAdapter->>MLflow: log trace enrichment
        end
    end
    
    AgenticEvalAdapter->>AgenticEvalAdapter: aggregate_scores()
    AgenticEvalAdapter->>AgenticEvalAdapter: compute_overall_score()
    
    alt has results
        AgenticEvalAdapter->>MLflow: log metrics/params
    end
    
    AgenticEvalAdapter-->>EvalHub: JobResults
    deactivate AgenticEvalAdapter

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding a conventional commits pre-commit hook configuration. It is concise, specific, and directly reflects the primary objective of the PR.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset. It explains the purpose, lists all key changes, includes testing verification, and provides review guidance specific to the modifications.
Docstring Coverage	✅ Passed	Docstring coverage is 97.06% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 0/1 reviews remaining, refill in 60 minutes.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-04-29T09:24:19Z

Large PR detected (1700 lines changed)

This PR exceeds 1200 lines of code changes (excluding lock files, generated content, and images). Large PRs are harder to review thoroughly and are more likely to introduce bugs.

Consider splitting this PR into smaller, focused changes.

andrewdonheiser and others added 7 commits April 22, 2026 12:59

chore: add conventional commits pre-commit hook

f652670

chore: add pre-commit config

cbd4d0e

chore: add conventional commits pre-commit hook and update contributi…

560fc30

…ng guide

sanafayyaz315 requested a review from tarun-etikala April 29, 2026 09:23

github-actions Bot added area/docs area/tests size/xl labels Apr 29, 2026

sanafayyaz315 closed this Apr 29, 2026

sanafayyaz315 deleted the pre-commit-hooks branch April 29, 2026 09:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: add conventional commits pre-commit hook#84

chore: add conventional commits pre-commit hook#84
sanafayyaz315 wants to merge 7 commits into
red-hat-data-services:mainfrom
sanafayyaz315:pre-commit-hooks

sanafayyaz315 commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

github-actions Bot commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sanafayyaz315 commented Apr 29, 2026

Description

Jira Ticket

Testing

Checklist

Review Guidance

Review Guidance

Related PRs

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

github-actions Bot commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading