# Submitting Results

This guide shows the full submission flow from run outputs to leaderboard ingestion.

## Prerequisites

Your run output should contain task folders in this shape:

```
<run_output_dir>/
  <task_id>/
    agent_response.json
    network.har
```

You can submit partial coverage. Package creation reports leaderboard coverage counts for valid, incomplete, and missing tasks.
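Before packaging, you can precheck coverage yourself. The sketch below is illustrative only (the `classify_task` and `coverage` helpers are hypothetical, not part of the CLI); it mirrors the valid/incomplete/missing categories described later in this guide, assuming the folder shape above:

```python
from pathlib import Path

# The two files each task folder must contain (per the shape above).
REQUIRED = {"agent_response.json", "network.har"}

def classify_task(task_dir: Path) -> str:
    """Classify a task folder by how many required files it contains."""
    if not task_dir.is_dir():
        return "missing"
    present = {f.name for f in task_dir.iterdir()} & REQUIRED
    if present == REQUIRED:
        return "valid"
    if present:
        return "incomplete"
    return "missing"

def coverage(run_output_dir: Path, expected_task_ids: list[str]) -> dict:
    """Count valid/incomplete/missing tasks against an expected task list."""
    counts = {"valid": 0, "incomplete": 0, "missing": 0}
    for task_id in expected_task_ids:
        counts[classify_task(run_output_dir / task_id)] += 1
    counts["expected"] = len(expected_task_ids)
    return counts
```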

## Step 1 - Create Submission Package

```bash
uvx webarena-verified create-submission-pkg \
  --run-output-dir ./output \
  --output ./my-submission \
  --leaderboard both
```

If you prefer, you can run the same CLI through a local install, `uv run`, or Docker.

The `--output` path is the submission package directory itself. If it already exists, the command fails unless you pass `--force` to overwrite it.

Expected result: a `./my-submission/` folder containing task folders, `submission.json`, and `manifest.json`.

`create-submission-pkg` embeds coverage stats in `submission.json` under `packaged_tasks`:

- `valid`: tasks with both required files
- `incomplete`: tasks with exactly one required file
- `missing`: tasks with no files or no task directory
- `expected`: total expected tasks for the leaderboard scope
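By definition, the first three counts should sum to the expected total for each leaderboard scope. A minimal sanity-check sketch (the `check_packaged_tasks` helper is hypothetical; the `packaged_tasks` keys follow the example in Step 2):

```python
import json
from pathlib import Path

def check_packaged_tasks(submission_json: Path) -> None:
    """Assert valid + incomplete + missing == expected for each scope."""
    stats = json.loads(submission_json.read_text())["packaged_tasks"]
    for scope, c in stats.items():
        total = c["valid"] + c["incomplete"] + c["missing"]
        assert total == c["expected"], f"{scope}: {total} != {c['expected']}"
```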

## Step 2 - Edit submission.json

Open `submission.json` and replace the placeholder values for `name`, `model`, `reference`, and `contact_email`:

```json
{
  "name": "MySystem-v1",
  "model": "gpt-4.1-mini",
  "leaderboard": "both",
  "reference": "https://example.com/paper",
  "code_repository": "https://github.com/org/repo",
  "contact_email": "team@example.com",
  "packaged_tasks": {
    "full": {
      "valid": 750,
      "incomplete": 12,
      "missing": 50,
      "expected": 812
    },
    "hard": {
      "valid": 230,
      "incomplete": 5,
      "missing": 23,
      "expected": 258
    }
  }
}
```
| Field | Required | Description |
| --- | --- | --- |
| `name` | Yes | Submission name (e.g. `MySystem-v1`) |
| `model` | Yes | Model identifier used for this submission (e.g. `gpt-4.1-mini`) |
| `leaderboard` | Yes | Auto-filled from `create-submission-pkg --leaderboard`; do not change unless you recreate the package |
| `reference` | Yes | HTTP(S) URL to a paper or model reference |
| `code_repository` | No | HTTP(S) URL to the agent code repository |
| `contact_email` | Yes | Contact email used only for submission-maintenance communication |
| `packaged_tasks` | Yes | Auto-filled coverage summary; do not edit |
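Forgotten placeholders are easy to catch programmatically. A hedged sketch of a local precheck (the `missing_fields` helper is hypothetical and not part of the CLI, which performs its own validation on submit):

```python
import json
from pathlib import Path

# Required fields per the table above; code_repository is optional.
REQUIRED_FIELDS = ("name", "model", "leaderboard", "reference", "contact_email")

def missing_fields(submission_json: Path) -> list[str]:
    """Return required fields that are absent or empty in submission.json."""
    data = json.loads(submission_json.read_text())
    return [f for f in REQUIRED_FIELDS if not data.get(f)]
```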

!!! info "How contact_email is used"
    `contact_email` is used only to contact the submission author when a modification to the submission is required. Maintainers may use it to verify that modification requests come from the original author. If you prefer not to share a real address, you can use a dummy value.

## Step 3 - Submit To Leaderboard

```bash
uvx webarena-verified submit \
  --submission-dir ./my-submission
```

What this command does:

1. Validates the package and reads your `submission.json`.
2. Regenerates `manifest.json` for integrity.
3. Uploads the payload to HuggingFace and creates a dataset PR.
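Step 2 above can be sketched as a content-hash manifest. This is illustrative only; the CLI defines the real manifest format, and the flat `path → sha256` mapping here is an assumption:

```python
import hashlib
import json
from pathlib import Path

def build_manifest(pkg_dir: Path) -> dict[str, str]:
    """Map each file's package-relative path to its SHA-256 digest."""
    manifest = {}
    for path in sorted(pkg_dir.rglob("*")):
        # Skip the manifest itself so regeneration is idempotent.
        if path.is_file() and path.name != "manifest.json":
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(pkg_dir))] = digest
    return manifest

def write_manifest(pkg_dir: Path) -> None:
    """Regenerate manifest.json inside the package directory."""
    payload = json.dumps(build_manifest(pkg_dir), indent=2)
    (pkg_dir / "manifest.json").write_text(payload)
```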

Expected output includes the HuggingFace PR URL, for example:

```
PR URL: https://huggingface.co/datasets/<org>/<repo>/discussions/<N>
```

!!! info "Authentication"
    The `submit` command requires HuggingFace authentication. Use either method:

    ```bash
    # Option 1: Login via CLI (persistent)
    hf auth login

    # Option 2: Set token as environment variable
    export HF_TOKEN=hf_...
    ```

## What Happens Next

The automated ingestion pipeline runs every 30 minutes:

```mermaid
flowchart LR
    A[Submit CLI] --> B[HF dataset PR created]
    B --> C[Ingestion job validates payload]
    C --> D[Deterministic evaluation by task and site]
    D --> E[Canonical records updated]
    E --> F[Leaderboard artifacts published]
```

If your submission fails ingestion, review the PR payload and retry with a corrected package. For support, open an issue in the WebArena-Verified repository.