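`load_config_dataset()` merges configs by positional index, with no alignment key. This is safe only when every model run uses the same `--seed`/`--max-samples` and the source dataset doesn't change. If either differs, rows belonging to different source images are silently paired, so the merge is fragile.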
Add a content hash column (e.g. a hash of the source image) during `ocr-bench run`, and have `judge` check that the hashes match row by row across configs before merging.
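Below is a minimal sketch of the proposed fix. It is plain Python and represents rows as lists of dicts rather than whatever dataset type `ocr-bench` actually uses. The names `image_hash`, `add_content_hash` and `validate_alignment`, the `image_bytes` field, and the `content_hash` column are all hypothetical. SHA-256 over the raw image bytes is one assumed choice of hash.

```python
import hashlib


def image_hash(image_bytes: bytes) -> str:
    """Hypothetical helper: SHA-256 hex digest of the raw source-image bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def add_content_hash(rows: list[dict], image_key: str = "image_bytes") -> list[dict]:
    """At `run` time (assumed hook): attach a `content_hash` column to each row."""
    return [{**row, "content_hash": image_hash(row[image_key])} for row in rows]


def validate_alignment(configs: dict[str, list[dict]]) -> None:
    """At `judge` time (assumed hook): require identical hashes in identical order.

    Raises ValueError naming the first config and row index that disagree,
    so a positional merge never pairs rows from different source images.
    """
    names = list(configs)
    if not names:
        return
    ref_name = names[0]
    ref = [row["content_hash"] for row in configs[ref_name]]
    for name in names[1:]:
        hashes = [row["content_hash"] for row in configs[name]]
        if len(hashes) != len(ref):
            raise ValueError(
                f"{name} has {len(hashes)} rows, {ref_name} has {len(ref)} rows"
            )
        for i, (expected, actual) in enumerate(zip(ref, hashes)):
            if expected != actual:
                raise ValueError(f"row {i}: {name} does not match {ref_name}")
```

Failing loudly on the first mismatch leaves the existing positional merge untouched. A stricter alternative would be to join configs on `content_hash` instead of on position.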