Adding a New EvalHub Agent Integration

How to add a new agent to the EvalHub on-cluster evaluation pipeline.

For behavioral test coverage (pytest-based, inner loop), see Adding Behavioral Tests. For the full adapter architecture and end-to-end walkthrough, see the EvalHub Adapter README.

Prerequisites

The agent is deployed and exposes /chat/completions (JSON and SSE responses) and /health
The EvalHub adapter provider is registered
You have push access to a container registry
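The first prerequisite can be checked with a small probe before any fixtures are written. The following is a minimal sketch, not part of EvalHub: the helper names `endpoint_urls` and `health_ok` are hypothetical, and it assumes /health answers a plain GET with a 2xx status, which may not match your agent.

```python
from urllib.error import URLError
from urllib.request import urlopen


def endpoint_urls(base_url: str) -> dict:
    """Build the two endpoint URLs from the agent base URL."""
    base = base_url.rstrip("/")
    return {
        "health": f"{base}/health",
        "chat": f"{base}/chat/completions",
    }


def health_ok(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with a 2xx status.

    Assumption: the health endpoint accepts an unauthenticated GET.
    """
    try:
        with urlopen(endpoint_urls(base_url)["health"], timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError):
        # Covers refused connections, timeouts, and HTTP 4xx/5xx responses.
        return False
```

Calling `health_ok("https://<agent-route>")` with the same base URL you plan to put in `model.url` gives an early signal that the route is reachable.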

1. Create Fixture Queries

mkdir -p agents/<framework>/<agent_name>/evalhub

Create evalhub/tool_use.yaml:

queries:
  - query: "A question that should trigger tool_a"
    expected_tools: ["tool_a"]
    expected_elements: ["keyword_from_tool_output"]

  - query: "A question that should trigger both tools"
    expected_tools: ["tool_a", "tool_b"]
    expected_elements: ["keyword_a", "keyword_b"]

  - query: "Hello, how are you today?"
    expected_tools: []
    expected_elements: []

expected_tools must match the agent's @tool function names exactly. Include at least one no-tool query and one multi-tool query.
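The two rules above (exact tool names, plus at least one no-tool and one multi-tool query) are easy to check mechanically. The sketch below is illustrative only: `validate_fixture` is a hypothetical helper, and it works on an already-parsed mapping so it has no dependencies. Loading the real file would need a YAML parser such as PyYAML's `yaml.safe_load`, which is assumed rather than confirmed to be available in your environment.

```python
def validate_fixture(fixture: dict, known_tools: set) -> list:
    """Return a list of problems found in a parsed tool_use.yaml mapping."""
    problems = []
    queries = fixture.get("queries", [])

    # Every expected tool must be one of the agent's real tool names.
    for i, q in enumerate(queries):
        unknown = set(q.get("expected_tools", [])) - known_tools
        if unknown:
            problems.append(f"query {i}: unknown tools {sorted(unknown)}")

    # Coverage rules: one query with no tools, one with two or more.
    if not any(len(q.get("expected_tools", [])) == 0 for q in queries):
        problems.append("no query with an empty expected_tools list")
    if not any(len(q.get("expected_tools", [])) >= 2 for q in queries):
        problems.append("no query that expects two or more tools")
    return problems


# Dict equivalent of the sample fixture shown above.
example = {
    "queries": [
        {"query": "A question that should trigger tool_a",
         "expected_tools": ["tool_a"]},
        {"query": "A question that should trigger both tools",
         "expected_tools": ["tool_a", "tool_b"]},
        {"query": "Hello, how are you today?", "expected_tools": []},
    ]
}

print(validate_fixture(example, {"tool_a", "tool_b"}))  # prints []
```

Passing the set of tool names by hand keeps the check independent of how the agent registers its tools.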

Existing fixtures:

agents/langgraph/react_agent/evalhub/tool_use.yaml
agents/vanilla_python/openai_responses_agent/evalhub/tool_use.yaml

2. Add COPY Line to Containerfile

In evals/evalhub_adapter/Containerfile, add a COPY for your fixtures and extend the build-time assertion:

COPY agents/<framework>/<agent_name>/evalhub/ ./fixtures/<short_name>/

RUN python -c "from pathlib import Path; assert Path('fixtures/<short_name>/tool_use.yaml').exists()"

<short_name> should be unique across agents, since every agent's fixtures are copied under ./fixtures/ (e.g. crewai_websearch).
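The one-line assertion becomes harder to read as more agents are added. As a sketch of the same check in a more readable form, the hypothetical helpers below (not part of the adapter) report which fixture directories are missing their tool_use.yaml and which short names are duplicated. The directory layout `fixtures/<short_name>/tool_use.yaml` is taken from the COPY line above.

```python
from pathlib import Path


def missing_fixtures(fixtures_root: Path, short_names: list) -> list:
    """Return the short names whose tool_use.yaml is absent under fixtures_root."""
    return [
        name
        for name in short_names
        if not (fixtures_root / name / "tool_use.yaml").is_file()
    ]


def duplicate_names(short_names: list) -> set:
    """Return short names that appear more than once."""
    seen, dupes = set(), set()
    for name in short_names:
        if name in seen:
            dupes.add(name)
        else:
            seen.add(name)
    return dupes
```

A build-time script could call both and exit non-zero if either returns a non-empty result, which fails the image build in the same way the inline assertion does.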

3. Create Eval Submission YAML

Create evals/evalhub_adapter/eval-<agent_name>.yaml:

name: agentic-tool-use-<agent-name>
description: EvalHub orchestration run for <framework> <agent_name>
model:
  name: <framework>-<agent-name>
  url: https://<agent-route>
benchmarks:
  - id: agentic-tool-use
    provider_id: <provider-id-from-registration>
    parameters:
      known_tools: ["tool_a", "tool_b"]
      forbidden_actions: ["shell execution"]
      max_latency_seconds: 8.0
      timeout_seconds: 45.0
      verify_ssl: true
      fixtures_path: fixtures/<short_name>
      mlflow_tracking_uri: https://<mlflow-route>
      mlflow_experiment_name: <unique-run-experiment>
      mlflow_trace_experiment_name: <agent-experiment>

model.url: the agent's base URL, not the /chat/completions path
fixtures_path: must be fixtures/<short_name>, using the <short_name> from step 2
provider_id: the ID shown by evalhub providers list

See evals/evalhub_adapter/eval-react-agent.yaml.example and eval-openai-responses-agent.yaml.example for working examples. Full parameter reference is in the adapter README.
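The three field rules above can be linted before submitting. This is a minimal sketch under stated assumptions: `check_submission` is a hypothetical helper, it takes the submission as an already-parsed mapping (for example from a YAML parser), and it only checks the three fields discussed here, not the full parameter set.

```python
def check_submission(config: dict, short_name: str) -> list:
    """Return a list of problems in a parsed eval submission mapping."""
    problems = []

    # model.url must be the base URL; the endpoint path is not part of it.
    url = config.get("model", {}).get("url", "")
    if url.rstrip("/").endswith("/chat/completions"):
        problems.append("model.url should be the base URL, not the /chat/completions path")

    expected_path = f"fixtures/{short_name}"
    for bench in config.get("benchmarks", []):
        params = bench.get("parameters", {})
        # fixtures_path must point at the directory created by the COPY line.
        if params.get("fixtures_path", "").rstrip("/") != expected_path:
            problems.append(f"fixtures_path should be {expected_path}")
        # provider_id must be filled in from the provider registration.
        if not bench.get("provider_id"):
            problems.append("provider_id is missing")
    return problems
```

An empty list means the three fields are consistent with steps 1 and 2; it does not validate the remaining parameters.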

4. Rebuild and Push the Adapter Image

IMAGE_TAG=$(git rev-parse --short HEAD)
ADAPTER_IMAGE="quay.io/<your-user>/evalhub-agentic-adapter:${IMAGE_TAG}"

podman build -t "${ADAPTER_IMAGE}" -f evals/evalhub_adapter/Containerfile .
podman push "${ADAPTER_IMAGE}"

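The tag is derived from the current commit, so it changes with every new commit. Re-register the provider whenever the pushed tag differs from the one it was registered with.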
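Whether re-registration is needed comes down to comparing two image tags. The sketch below shows one way to do that comparison. Both function names are hypothetical, and how you obtain the currently registered image reference is left to you, since that depends on your provider registration.

```python
from typing import Optional


def image_tag(image_ref: str) -> Optional[str]:
    """Extract the tag from an image reference, or None if it has no tag."""
    # Drop registry and namespace so a registry port is not mistaken for a tag.
    name = image_ref.rsplit("/", 1)[-1]
    # Drop a digest suffix such as @sha256:..., which is not a tag.
    name = name.split("@", 1)[0]
    _, sep, tag = name.partition(":")
    return tag if sep else None


def needs_reregistration(registered_image: str, new_image: str) -> bool:
    """Return True when the pushed image's tag differs from the registered one."""
    return image_tag(registered_image) != image_tag(new_image)
```

For example, comparing `quay.io/<your-user>/evalhub-agentic-adapter:<old-tag>` with the freshly pushed `${ADAPTER_IMAGE}` returns True after any new commit, because the short commit hash changes.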

5. Submit and Verify

evalhub eval run --config evals/evalhub_adapter/eval-<agent_name>.yaml --wait --poll-interval 5
evalhub eval results <job-id> --format json

Metrics and result interpretation are documented in the adapter README.

Files Changed

| File | Action |
| --- | --- |
| `agents/<framework>/<agent_name>/evalhub/tool_use.yaml` | Create |
| `evals/evalhub_adapter/Containerfile` | Edit: add `COPY` + assertion |
| `evals/evalhub_adapter/eval-<agent_name>.yaml` | Create |
| `evals/evalhub_adapter/README.md` | Edit: note new agent under "What works now" |
