This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
eval-hub-contrib is a collection of community-contributed evaluation framework adapters for the eval-hub service. Each adapter under adapters/ wraps an external evaluation framework (LightEval, GuideLLM, MTEB, IBM CLEAR) and implements the FrameworkAdapter pattern from the evalhub-sdk.
Container images are built with podman by default (override with BUILD_TOOL=docker):
make image-lighteval # build one adapter
make images # build all adapters
make push-lighteval REGISTRY=quay.io/your-org VERSION=v1.0.0
Each adapter has its own Python venv, dependencies, and pytest suite:
make test-lighteval # run one adapter's tests
make test-clear # run one adapter's tests
make tests # run all adapter tests
To run a single test file or test case manually:
cd adapters/lighteval
python3 -m venv .venv && .venv/bin/pip install -r requirements.txt -r requirements-test.txt
.venv/bin/pytest tests/test_adapter.py -v # all tests in file
.venv/bin/pytest tests/test_adapter.py::test_lighteval_happy_path -v # single test
This project enforces Conventional Commits via commitizen. A pre-commit hook validates commit messages (install with pre-commit install --hook-type commit-msg). CI will also reject non-conforming messages on PRs to main.
Format: <type>(<scope>): <subject>
Common types: feat, fix, docs, chore, refactor, test, ci. Scope is typically the adapter name (e.g., lighteval, guidellm, clear, mteb).
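As a rough illustration of the header format above, the following sketch parses a commit header with a regular expression. It is not how commitizen validates messages. The type list is only the "common types" named here, and the names parse_commit_header, COMMON_TYPES and HEADER_RE are hypothetical. The scope is treated as optional, as the Conventional Commits specification allows.

```python
import re

# Types listed in this document as common; the project's commitizen
# configuration may accept others (assumption: this list is not exhaustive).
COMMON_TYPES = ("feat", "fix", "docs", "chore", "refactor", "test", "ci")

# <type>(<scope>): <subject>, with an optional scope.
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMON_TYPES) + r")"
    r"(?:\((?P<scope>[a-z0-9_-]+)\))?"
    r": (?P<subject>\S.*)$"
)


def parse_commit_header(header):
    """Return (type, scope, subject) if the header matches, else None."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    return match.group("type"), match.group("scope"), match.group("subject")
```

For example, parse_commit_header("fix(mteb): handle empty task list") yields the type fix, the scope mteb, and the subject, while a free-form message such as "Fixed stuff" yields None.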
Every adapter lives in adapters/<name>/ with a consistent layout:
- main.py -- single-file adapter implementing FrameworkAdapter.run_benchmark_job(config, callbacks) -> JobResults
- provider.yaml -- declares the adapter to eval-hub (id, runtime resources, benchmarks, parameters). CI validates this file on new-adapter PRs.
- meta/job.json -- sample JobSpec used by tests
- tests/ -- pytest suite; conftest.py sets up fixtures, test_adapter.py tests the happy-path plumbing by monkeypatching the framework execution (not the evalhub-sdk layer)
- Containerfile, requirements.txt, requirements-test.txt
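The testing approach described for tests/ can be sketched roughly as follows. This is not taken from any adapter in the repository: run_framework_cli and the fake-framework command are hypothetical, and the sketch assumes the framework is invoked through subprocess.run and prints JSON metrics on stdout. The test replaces subprocess.run through pytest's monkeypatch fixture, so no real framework process is started.

```python
import json
import subprocess


def run_framework_cli(args):
    """Hypothetical helper: run a framework CLI and parse JSON metrics from stdout."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


def test_run_framework_cli_happy_path(monkeypatch):
    calls = []

    def fake_run(args, **kwargs):
        # Record the call and return canned output instead of spawning a process.
        calls.append(args)
        return subprocess.CompletedProcess(
            args, 0, stdout='{"accuracy": 0.5}', stderr=""
        )

    # Patch only the framework execution; the code under test is untouched.
    monkeypatch.setattr(subprocess, "run", fake_run)

    metrics = run_framework_cli(["fake-framework", "--task", "demo"])

    assert metrics == {"accuracy": 0.5}
    assert calls == [["fake-framework", "--task", "demo"]]
```

Patching at the process boundary keeps the test fast and independent of the framework being installed, while still exercising the argument building and output parsing around it.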
Each adapter subclass:
- Receives a JobSpec (benchmark id, model config, parameters) and JobCallbacks
- Reports progress through a fixed lifecycle of phases: INITIALIZING -> LOADING_DATA -> RUNNING_EVALUATION -> POST_PROCESSING -> PERSISTING_ARTIFACTS
- Invokes the underlying framework (typically via subprocess CLI)
- Extracts metrics into EvaluationResult objects and computes an overall_score
- Optionally persists detailed results as OCI artifacts via callbacks.create_oci_artifact()
- Returns JobResults
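The lifecycle above can be sketched end to end with stand-in types. Everything here is an assumption made for illustration: the dataclasses only mimic the names JobSpec, EvaluationResult and JobResults, RecordingCallbacks stands in for JobCallbacks, report_phase is a hypothetical method name, and the argument passed to create_oci_artifact is a guess. The real evalhub-sdk classes and signatures may differ. The "framework" is a one-line Python subprocess that prints JSON, and overall_score is taken as the plain mean of the metrics, which is one possible choice and not necessarily what any adapter here does.

```python
import json
import subprocess
import sys
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    INITIALIZING = "INITIALIZING"
    LOADING_DATA = "LOADING_DATA"
    RUNNING_EVALUATION = "RUNNING_EVALUATION"
    POST_PROCESSING = "POST_PROCESSING"
    PERSISTING_ARTIFACTS = "PERSISTING_ARTIFACTS"


@dataclass
class JobSpec:  # stand-in, not the SDK class
    benchmark_id: str
    model: dict
    parameters: dict = field(default_factory=dict)


@dataclass
class EvaluationResult:  # stand-in, not the SDK class
    metric_name: str
    metric_value: float


@dataclass
class JobResults:  # stand-in, not the SDK class
    results: list
    overall_score: float


class RecordingCallbacks:
    """Stand-in for JobCallbacks; report_phase is a hypothetical method name."""

    def __init__(self):
        self.phases = []
        self.artifacts = []

    def report_phase(self, phase):
        self.phases.append(phase)

    def create_oci_artifact(self, payload):  # argument shape is an assumption
        self.artifacts.append(payload)


class EchoAdapter:
    """Toy adapter: the 'framework' is a subprocess that prints fixed metrics."""

    def run_benchmark_job(self, config, callbacks):
        callbacks.report_phase(Phase.INITIALIZING)
        command = [
            sys.executable,
            "-c",
            "import json; print(json.dumps({'accuracy': 0.5, 'f1': 1.0}))",
        ]
        callbacks.report_phase(Phase.LOADING_DATA)  # nothing to load in this toy

        callbacks.report_phase(Phase.RUNNING_EVALUATION)
        completed = subprocess.run(command, capture_output=True, text=True, check=True)

        callbacks.report_phase(Phase.POST_PROCESSING)
        raw = json.loads(completed.stdout)
        results = [EvaluationResult(name, float(value)) for name, value in sorted(raw.items())]
        overall = sum(r.metric_value for r in results) / len(results)  # mean: an assumption

        callbacks.report_phase(Phase.PERSISTING_ARTIFACTS)
        callbacks.create_oci_artifact(raw)
        return JobResults(results, overall)
```

Calling EchoAdapter().run_benchmark_job(JobSpec("demo", {"name": "stub"}), RecordingCallbacks()) walks the five phases in order and returns an overall_score of 0.75 for the two canned metrics.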
To add a new adapter:
- Create adapters/<name>/ with the files above
- Add provider.yaml (see adapters/mteb/provider.yaml for an annotated example)
- Add build/push/test targets to the root Makefile
- Add a CI workflow at .github/workflows/test-<name>.yml