Skip to content

chore: add conventional commits pre-commit hook#84

Closed
sanafayyaz315 wants to merge 7 commits into
red-hat-data-services:mainfrom
sanafayyaz315:pre-commit-hooks
Closed

chore: add conventional commits pre-commit hook#84
sanafayyaz315 wants to merge 7 commits into
red-hat-data-services:mainfrom
sanafayyaz315:pre-commit-hooks

Conversation

@sanafayyaz315
Copy link
Copy Markdown
Contributor

Description

Adds a pre-commit hook to enforce the Conventional Commits specification on all commit messages. Uses the espressif/conventional-precommit-linter hook, which validates commit message format at the commit-msg stage.

Changes:

  • .pre-commit-config.yaml: new file with the conventional commits hook configured with allowed types, subject length limits, and breaking change support
  • CONTRIBUTING.md: added "Development setup" section with pre-commit install instructions and updated "Commit message conventions" with the enforced format, all allowed types, and examples

Jira Ticket

RHAIENG-4066

Testing

  • make test passes (run from the affected agent directory)
  • Manual testing performed (describe steps below)
  • No testing required (documentation/config change only)

Verified locally:

  • Bad commit messages (e.g. "updated stuff", "chore: doc") are rejected
  • Valid commit messages (e.g. "chore: add conventional commits pre-commit hook") pass

Checklist

  • I have read CONTRIBUTING.md
  • No .env or secret files are included in this PR
  • All changes are within scope of the linked Jira ticket

Review Guidance

Review Guidance

  • .pre-commit-config.yaml — review the args to confirm the team is happy with the allowed types and subject length limits (min 10, max 72)
  • Additional args available from the linter that we're not currently using:
    • --scopes: restrict to specific scopes (currently unrestricted)
    • --body-max-line-length: limit body line length (currently unrestricted)
    • --summary-uppercase: enforce uppercase first letter
    • --scope-case-insensitive: allow uppercase in scopes
  • This is phase 1 of pre-commit setup. Ruff, markdownlint, and file-hygiene hooks will be added in a follow-up PR once the linter configs (ruff.toml, .markdownlint.json) are in place.

Related PRs

#80 (Tarun's Ruff linting and formatting — provides ruff.toml that future pre-commit hooks will use)

andrewdonheiser and others added 7 commits April 22, 2026 12:59
Migrate the evalhub_adapter package from the original behavioral-tests
repo into tests/behavioral/evalhub_adapter/ as a first-class source
package with unit tests and benchmark fixture files.

- Adapter: AgenticEvalAdapter implementing FrameworkAdapter interface
  with 6-phase eval pipeline (init, load, eval, post-process, persist,
  report), 10 scorer dispatches, MLflow trace enrichment
- Config: AgenticEvalParams bridge from JobSpec to TaskConfig
- Benchmarks: 7 benchmark definitions with golden query YAML loader
- Fixtures: tool_use.yaml with 5 queries, 6 stub YAMLs for remaining
  benchmarks
- Tests: 32 unit tests covering config, benchmarks, scorer resolution,
  YAML loading, real fixture parsing, and edge cases
- pyproject.toml: evalhub optional dep, unit marker, package include

Key design decisions:
- Lazy __init__.py import so unit tests run without evalhub SDK
- verify_ssl defaults to True with warning log when disabled
- query_error metric excluded from overall score computation
- _report_fatal accepts phase parameter for accurate EvalHub reporting
- MLflow gating requires both tracking_uri and experiment_name

RHAIENG-4604

Made-with: Cursor
Strip speculative code from the EvalHub adapter migration:
- Remove 6 empty benchmark YAML stubs (keep only agentic-tool-use)
- Trim BENCHMARKS registry to the one working benchmark
- Remove unused QuerySpec.difficulty/category fields
- Remove unused TaskConfig import from adapter.py
- Simplify __init__.py to plain import
- Fix main() docstring to reference FrameworkAdapter base class

Add missing deliverables:
- README.md documenting adapter design, what works, what's planned
- test_adapter.py with 28 unit tests for scorer dispatch and aggregation
- .gitignore entries for .cursor/ and REFACTORING.md

Made-with: Cursor
- Treat TaskResult(success=False) as a query failure before scoring;
  previously failed agent calls bypassed failed_count and ran through
  normal scorers
- Include query_error in overall_score so failed queries penalize the
  result instead of being silently excluded
- Escape regex metacharacter in test match= pattern

Made-with: Cursor
- Move evalhub stub from test_adapter.py to conftest.py so tests work
  regardless of collection order or file selection
- Wrap MLflow trace enrichment in its own try/except to prevent flaky
  MLflow from invalidating successful query results
- Validate missing 'query' key in YAML entries with descriptive
  ValueError; widen adapter's load_queries catch to include ValueError
- Strip orphan difficulty/category fields from tool_use.yaml
- Add __post_init__ validation for positive timeout/latency values
- Broaden .gitignore REFACTORING.md pattern to **/REFACTORING.md
- Add TODO for asyncio.gather concurrent query execution
- Update README: validation docs, MLflow fault tolerance, sequential
  execution note, conftest.py stub explanation

Made-with: Cursor
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

This PR introduces an EvalHub on-cluster adapter that bridges pytest-based behavioral evaluations with EvalHub's Kubernetes orchestration. It includes configuration validation, benchmark management, scoring aggregation, optional MLflow integration, and comprehensive test coverage with stubs for the evalhub package.

Changes

Cohort / File(s) Summary
Configuration & Tooling
.gitignore, .pre-commit-config.yaml, CONTRIBUTING.md, pyproject.toml
Extended .gitignore for .cursor and REFACTORING.md; added pre-commit hook configuration for conventional commits linting; updated CONTRIBUTING.md with stricter commit message format, expanded types, and breaking-change handling; added evalhub optional dependency group, pytest unit marker, and evalhub_adapter* package discovery in pyproject.toml.
Adapter Core
tests/behavioral/evalhub_adapter/__init__.py, tests/behavioral/evalhub_adapter/adapter.py, tests/behavioral/evalhub_adapter/benchmarks.py, tests/behavioral/evalhub_adapter/config.py
Main EvalHub framework adapter (AgenticEvalAdapter) that validates job config, loads golden queries, orchestrates async task execution via httpx, runs scorer dispatch logic, aggregates per-query scores into per-metric results with pass rates/min/max, computes overall score, and conditionally logs to MLflow; benchmark registry with query/scorer management; configuration validation with positive-constraint checking and optional MLflow trace enrichment support.
Adapter Documentation
tests/behavioral/evalhub_adapter/README.md
Comprehensive guide defining file architecture, configuration expectations via JobSpec.parameters, supported benchmarks/scorers, MLflow tracing capabilities, roadmap, and test execution guidance with evalhub stub notes.
Test Infrastructure
tests/behavioral/evalhub_adapter/tests/__init__.py, tests/behavioral/evalhub_adapter/tests/conftest.py
Test package initializer; conftest with evalhub package stubs for import compatibility and fixtures_dir pytest fixture for isolated YAML fixture testing.
Adapter Tests
tests/behavioral/evalhub_adapter/tests/test_adapter.py, tests/behavioral/evalhub_adapter/tests/test_config_and_benchmarks.py
Unit tests validating scorer dispatch, score aggregation with metric grouping, overall score computation with rounding; comprehensive config/benchmark tests covering AgenticEvalParams mapping/defaults, TaskConfig construction, benchmark registry lookups with error reporting, scorer resolution including "all" expansion, and YAML query loading with validation and fixture sanity checks.
Test Fixtures
tests/behavioral/fixtures/benchmarks/tool_use.yaml
Golden benchmark fixture defining five agent queries with expected tool calls and extracted elements for the agentic-tool-use benchmark.

Sequence Diagram(s)

sequenceDiagram
    participant EvalHub
    participant AgenticEvalAdapter
    participant BenchmarkMgr as Benchmark Manager
    participant QueryLoader as Query Loader
    participant Agent as Agent Server
    participant Scorers
    participant MLflow
    
    EvalHub->>AgenticEvalAdapter: run_benchmark_job(JobSpec)
    activate AgenticEvalAdapter
    
    AgenticEvalAdapter->>AgenticEvalAdapter: validate config
    AgenticEvalAdapter->>BenchmarkMgr: get_benchmark(benchmark_id)
    BenchmarkMgr-->>AgenticEvalAdapter: BenchmarkDef
    
    AgenticEvalAdapter->>QueryLoader: load_queries(benchmark)
    QueryLoader-->>AgenticEvalAdapter: list[QuerySpec]
    
    AgenticEvalAdapter->>AgenticEvalAdapter: init MLflow (optional)
    
    loop for each query
        AgenticEvalAdapter->>Agent: run_task(query, config)
        Agent-->>AgenticEvalAdapter: TaskResult
        
        AgenticEvalAdapter->>Scorers: score_result(task_result, scorers)
        loop for each scorer
            Scorers-->>AgenticEvalAdapter: Score
        end
        
        alt MLflow enabled
            AgenticEvalAdapter->>MLflow: log trace enrichment
        end
    end
    
    AgenticEvalAdapter->>AgenticEvalAdapter: aggregate_scores()
    AgenticEvalAdapter->>AgenticEvalAdapter: compute_overall_score()
    
    alt has results
        AgenticEvalAdapter->>MLflow: log metrics/params
    end
    
    AgenticEvalAdapter-->>EvalHub: JobResults
    deactivate AgenticEvalAdapter
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding a conventional commits pre-commit hook configuration. It is concise, specific, and directly reflects the primary objective of the PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset. It explains the purpose, lists all key changes, includes testing verification, and provides review guidance specific to the modifications.
Docstring Coverage ✅ Passed Docstring coverage is 97.06% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

Large PR detected (1700 lines changed)

This PR exceeds 1200 lines of code changes (excluding lock files, generated content, and images). Large PRs are harder to review thoroughly and are more likely to introduce bugs.

Consider splitting this PR into smaller, focused changes.

@sanafayyaz315 sanafayyaz315 deleted the pre-commit-hooks branch April 29, 2026 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants