CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project overview

eval-hub-contrib is a collection of community-contributed evaluation framework adapters for the eval-hub service. Each adapter under adapters/ wraps an external evaluation framework (LightEval, GuideLLM, MTEB, IBM CLEAR) and implements the FrameworkAdapter pattern from the evalhub-sdk.

Build and test commands

Container images are built with podman by default (override with BUILD_TOOL=docker):

make image-lighteval          # build one adapter
make images                   # build all adapters
make push-lighteval REGISTRY=quay.io/your-org VERSION=v1.0.0

Each adapter has its own Python venv, dependencies, and pytest suite:

make test-lighteval           # run the LightEval adapter's tests
make test-clear               # run the CLEAR adapter's tests
make tests                    # run all adapter tests

To run a single test file or test case manually:

cd adapters/lighteval
python3 -m venv .venv && .venv/bin/pip install -r requirements.txt -r requirements-test.txt
.venv/bin/pytest tests/test_adapter.py -v             # all tests in file
.venv/bin/pytest tests/test_adapter.py::test_lighteval_happy_path -v  # single test

Commit conventions

This project enforces Conventional Commits via commitizen. A pre-commit hook validates commit messages (install it with pre-commit install --hook-type commit-msg), and CI also rejects non-conforming messages on PRs to main.

Format: <type>(<scope>): <subject>

Common types: feat, fix, docs, chore, refactor, test, ci. Scope is typically the adapter name (e.g., lighteval, guidellm, clear, mteb).
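
To make the format above concrete, here is a minimal sketch of a header check. It is not commitizen's own validation logic: the regular expression, the optional scope, and the name parse_commit_header are assumptions made for illustration, and the sketch ignores other Conventional Commits features such as the "!" breaking-change marker.

```python
import re

# Types listed in the text above; the project's commitizen config may allow more.
COMMON_TYPES = ("feat", "fix", "docs", "chore", "refactor", "test", "ci")

# <type>(<scope>): <subject>, with the scope treated as optional (an assumption).
PATTERN = re.compile(
    r"^(?P<type>" + "|".join(COMMON_TYPES) + r")"
    r"(?:\((?P<scope>[a-z0-9_-]+)\))?"
    r": (?P<subject>\S.*)$"
)


def parse_commit_header(header):
    """Return (type, scope, subject) if the header matches, else None."""
    match = PATTERN.match(header)
    if match is None:
        return None
    return match.group("type"), match.group("scope"), match.group("subject")
```

For example, "feat(lighteval): add batch size parameter" parses into its three parts, while a header such as "updated stuff" is rejected.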

Architecture

Adapter structure

Every adapter lives in adapters/<name>/ with a consistent layout:

main.py -- single-file adapter implementing FrameworkAdapter.run_benchmark_job(config, callbacks) -> JobResults
provider.yaml -- declares the adapter to eval-hub (id, runtime resources, benchmarks, parameters). CI validates this file on new-adapter PRs.
meta/job.json -- sample JobSpec used by tests
tests/ -- pytest suite; conftest.py sets up fixtures, test_adapter.py tests the happy-path plumbing by monkeypatching the framework execution (not the evalhub-sdk layer)
Containerfile, requirements.txt, requirements-test.txt
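
The layout above can be checked mechanically. The following is a small sketch, not part of the repository or its CI: the helper name missing_files is hypothetical, and the file list is simply taken from the layout described above.

```python
from pathlib import Path

# Files named in the layout above, relative to adapters/<name>/.
REQUIRED_FILES = (
    "main.py",
    "provider.yaml",
    "meta/job.json",
    "tests/conftest.py",
    "tests/test_adapter.py",
    "Containerfile",
    "requirements.txt",
    "requirements-test.txt",
)


def missing_files(adapter_dir):
    """Return the required relative paths that do not exist under adapter_dir."""
    root = Path(adapter_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

Calling missing_files("adapters/lighteval") from the repository root would return an empty list if the adapter follows the layout.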

FrameworkAdapter contract (from evalhub-sdk)

Each adapter subclass:

Receives a JobSpec (benchmark id, model config, parameters) and JobCallbacks
Reports progress through a fixed lifecycle of phases: INITIALIZING -> LOADING_DATA -> RUNNING_EVALUATION -> POST_PROCESSING -> PERSISTING_ARTIFACTS
Invokes the underlying framework (typically via subprocess CLI)
Extracts metrics into EvaluationResult objects and computes an overall_score
Optionally persists detailed results as OCI artifacts via callbacks.create_oci_artifact()
Returns JobResults
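
The contract above can be sketched as follows. This is an illustration only: every type here (Phase, EvaluationResult, JobResults, RecordingCallbacks) is a stand-in defined locally, not the evalhub-sdk class of the same name, and the report method, the dict-based config, the run_framework callable, and the mean-based overall score are all assumptions. Consult the evalhub-sdk for the real signatures.

```python
from dataclasses import dataclass, field
from enum import Enum


# Stand-in types for illustration; the real classes come from evalhub-sdk
# and their fields and method names may differ.
class Phase(Enum):
    INITIALIZING = "initializing"
    LOADING_DATA = "loading_data"
    RUNNING_EVALUATION = "running_evaluation"
    POST_PROCESSING = "post_processing"
    PERSISTING_ARTIFACTS = "persisting_artifacts"


@dataclass
class EvaluationResult:
    metric_name: str
    metric_value: float


@dataclass
class JobResults:
    results: list
    overall_score: float


@dataclass
class RecordingCallbacks:
    """Hypothetical callbacks object that only records reported phases."""
    phases: list = field(default_factory=list)

    def report(self, phase):
        self.phases.append(phase)


def run_benchmark_job(config, callbacks, run_framework):
    """Walk the lifecycle; run_framework is a hypothetical callable returning {metric: value}."""
    callbacks.report(Phase.INITIALIZING)
    benchmark_id = config["benchmark_id"]
    callbacks.report(Phase.LOADING_DATA)
    callbacks.report(Phase.RUNNING_EVALUATION)
    raw_metrics = run_framework(benchmark_id)  # a real adapter typically shells out to a CLI here
    callbacks.report(Phase.POST_PROCESSING)
    results = [EvaluationResult(name, value) for name, value in raw_metrics.items()]
    # Unweighted mean as the overall score: an assumption, adapters may aggregate differently.
    overall = sum(r.metric_value for r in results) / len(results) if results else 0.0
    callbacks.report(Phase.PERSISTING_ARTIFACTS)  # OCI artifact persistence omitted in this sketch
    return JobResults(results=results, overall_score=overall)
```

Passing the framework invocation in as a callable mirrors how the tests monkeypatch framework execution: the lifecycle plumbing can be exercised without running the underlying framework.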

Adding a new adapter

Create adapters/<name>/ with the files above
Add provider.yaml (see adapters/mteb/provider.yaml for an annotated example)
Add build/push/test targets to the root Makefile
Add a CI workflow at .github/workflows/test-<name>.yml
