Skip to content

New metric definitions for llama-3-3-70b as judge in Arena Hard benchmark #7498

New metric definitions for llama-3-3-70b as judge in Arena Hard benchmark

New metric definitions for llama-3-3-70b as judge in Arena Hard benchmark #7498

Re-run triggered October 27, 2025 18:19
Status Failure
Total duration 1m 1s
Artifacts

catalog_preparation.yml

on: pull_request
Matrix: preparation
Fit to window
Zoom out
Zoom in

Annotations

15 errors
preparation (0)
Process completed with exit code 1.
preparation (2)
The strategy configuration was canceled because "preparation._0" failed
preparation (2)
Process completed with exit code 1.
preparation (3)
The strategy configuration was canceled because "preparation._0" failed
preparation (3)
The operation was canceled.
preparation (5)
The strategy configuration was canceled because "preparation._0" failed
preparation (5)
The operation was canceled.
preparation (7)
The strategy configuration was canceled because "preparation._0" failed
preparation (7)
The operation was canceled.
preparation (4)
The strategy configuration was canceled because "preparation._0" failed
preparation (4)
The operation was canceled.
preparation (1)
The strategy configuration was canceled because "preparation._0" failed
preparation (1)
The operation was canceled.
preparation (6)
The strategy configuration was canceled because "preparation._0" failed
preparation (6)
The operation was canceled.