New metric definitions for llama-3-3-70b as judge in Arena Hard benchmark #6875

Triggered via pull request October 27, 2025 13:32
Status: Success
Total duration: 8m 13s

catalog_consistency.yml

on: pull_request