This guide shows the full submission flow from run outputs to leaderboard ingestion.
Your run output should contain task folders in this shape:

```text
<run_output_dir>/
  <task_id>/
    agent_response.json
    network.har
```
You can submit partial coverage. Package creation reports leaderboard coverage counts for valid, incomplete, and missing tasks.
```bash
uvx webarena-verified create-submission-pkg \
  --run-output-dir ./output \
  --output ./my-submission \
  --leaderboard both
```

If you prefer, you can run the same CLI through a local install, `uv run`, or Docker.
The `--output` path is the submission package directory itself. If it already exists, the command fails unless you pass `--force` to overwrite it.
Expected result: a `./my-submission/` folder containing task folders, `submission.json`, and `manifest.json`.
`create-submission-pkg` now embeds coverage stats in `submission.json` under `packaged_tasks`:

- `valid`: tasks with both required files
- `incomplete`: tasks with exactly one required file
- `missing`: tasks with no files or no task directory
- `expected`: total expected tasks for the leaderboard scope
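
To preview these counts before packaging, the classification rules above can be sketched in a few lines of Python. This is a minimal sketch, not the CLI's actual implementation: it assumes the two required files are `agent_response.json` and `network.har` (as in the run-output layout), and `expected_task_ids` is a hypothetical list of task IDs that you would supply yourself.

```python
from collections import Counter
from pathlib import Path

# Assumption: these are the two required files, per the run-output layout.
REQUIRED_FILES = ("agent_response.json", "network.har")


def classify_task(run_output_dir: Path, task_id: str) -> str:
    """Return 'valid', 'incomplete', or 'missing' for one task folder."""
    task_dir = run_output_dir / task_id
    # is_file() is False when the file or the whole task directory is absent.
    present = sum((task_dir / name).is_file() for name in REQUIRED_FILES)
    if present == len(REQUIRED_FILES):
        return "valid"
    if present == 1:
        return "incomplete"
    return "missing"  # no files, or no task directory at all


def coverage(run_output_dir: Path, expected_task_ids: list[str]) -> dict[str, int]:
    """Count tasks per category; expected_task_ids is a hypothetical input."""
    counts = Counter(classify_task(run_output_dir, t) for t in expected_task_ids)
    return {
        "valid": counts["valid"],
        "incomplete": counts["incomplete"],
        "missing": counts["missing"],
        "expected": len(expected_task_ids),
    }
```

Calling `coverage(Path("./output"), ids)` returns a dictionary with the same four keys as each `packaged_tasks` entry, so you can spot incomplete task folders before creating the package.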
Open `submission.json` and replace the placeholder values for `name`, `model`, `reference`, and `contact_email`:

```json
{
  "name": "MySystem-v1",
  "model": "gpt-4.1-mini",
  "leaderboard": "both",
  "reference": "https://example.com/paper",
  "code_repository": "https://github.com/org/repo",
  "contact_email": "team@example.com",
  "packaged_tasks": {
    "full": {
      "valid": 750,
      "incomplete": 12,
      "missing": 50,
      "expected": 812
    },
    "hard": {
      "valid": 230,
      "incomplete": 5,
      "missing": 23,
      "expected": 258
    }
  }
}
```

| Field | Required | Description |
|---|---|---|
| `name` | Yes | Submission name (e.g. `MySystem-v1`) |
| `model` | Yes | Model identifier used for this submission (e.g. `gpt-4.1-mini`) |
| `leaderboard` | Yes | Auto-filled from `create-submission-pkg --leaderboard`; do not change unless you recreate the package |
| `reference` | Yes | HTTP(S) URL to paper or model reference |
| `code_repository` | No | HTTP(S) URL to the agent code repository |
| `contact_email` | Yes | Contact email used only for submission-maintenance communication |
| `packaged_tasks` | Yes | Auto-filled coverage summary; do not edit |
!!! info "How contact_email is used"
    `contact_email` is used only to contact the submission author when a modification to the submission is required.
    Maintainers may use it to verify that modification requests come from the original author.
    If you prefer not to share a real address, you can use a dummy value.
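
Before submitting, it can help to sanity-check the edited metadata locally. The following is a minimal sketch based only on the field table above; it is not the validator that the CLI runs, which may apply stricter or different rules (for example, detecting leftover placeholder values). The function names are hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

# Fields marked "Yes" in the Required column of the field table.
REQUIRED_FIELDS = (
    "name", "model", "leaderboard", "reference", "contact_email", "packaged_tasks",
)
# Fields documented as HTTP(S) URLs; code_repository is optional.
URL_FIELDS = ("reference", "code_repository")


def is_http_url(value: object) -> bool:
    """True if value is a string with an http(s) scheme and a host."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_submission(metadata: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not metadata.get(field):
            problems.append(f"missing or empty required field: {field}")
    for field in URL_FIELDS:
        value = metadata.get(field)
        if value and not is_http_url(value):
            problems.append(f"{field} is not an HTTP(S) URL: {value!r}")
    return problems


def check_submission_file(path: Path) -> list[str]:
    """Load submission.json from path and run the checks above."""
    return check_submission(json.loads(path.read_text(encoding="utf-8")))
```

Running `check_submission_file(Path("./my-submission/submission.json"))` returns an empty list when none of these checks fail; the package still has to pass the validation performed by the CLI itself.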
```bash
uvx webarena-verified submit \
  --submission-dir ./my-submission
```

What this command does:

- Validates the package and reads your `submission.json`.
- Regenerates `manifest.json` for integrity.
- Uploads the payload to HuggingFace and creates a dataset PR.
Expected output includes the HuggingFace PR URL, for example:

```text
PR URL: https://huggingface.co/datasets/<org>/<repo>/discussions/<N>
```
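
If you drive the submission from a script, you may want to capture that URL. The sketch below assumes the CLI prints a line of the form `PR URL: <url>` exactly as in the example above; the real output format may differ, and `extract_pr_url` is a hypothetical helper, not part of the CLI.

```python
import re
from typing import Optional

# Assumption: the CLI prints a line of the form "PR URL: <url>".
PR_URL_PATTERN = re.compile(
    r"^PR URL:\s*(https://huggingface\.co/\S+)\s*$", re.MULTILINE
)


def extract_pr_url(cli_output: str) -> Optional[str]:
    """Return the first PR URL found in captured CLI output, or None."""
    match = PR_URL_PATTERN.search(cli_output)
    return match.group(1) if match else None
```

Pass it the captured standard output of the `submit` command; a `None` result means no matching line was found, not necessarily that the submission failed.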
!!! info "Authentication"
    The `submit` command requires HuggingFace authentication. Use either method:

    ```bash
    # Option 1: Login via CLI (persistent)
    hf auth login

    # Option 2: Set token as environment variable
    export HF_TOKEN=hf_...
    ```
The automated ingestion pipeline runs every 30 minutes:
```mermaid
flowchart LR
    A[Submit CLI] --> B[HF dataset PR created]
    B --> C[Ingestion job validates payload]
    C --> D[Deterministic evaluation by task and site]
    D --> E[Canonical records updated]
    E --> F[Leaderboard artifacts published]
```
If your submission fails ingestion, review the PR payload and retry with a corrected package. For support, open an issue in the WebArena-Verified repository.