Skip to content

New metric definitions for llama-3-3-70b as judge in Arena Hard benchmark #7451

New metric definitions for llama-3-3-70b as judge in Arena Hard benchmark

New metric definitions for llama-3-3-70b as judge in Arena Hard benchmark #7451

Triggered via pull request October 27, 2025 13:32
Status Failure
Total duration 1m 28s
Artifacts

library_tests.yml

on: pull_request
unittests
53s
unittests
Fit to window
Zoom out
Zoom in

Annotations

1 error
unittests
Process completed with exit code 1.