2026.md: 7 additions & 3 deletions
@@ -49,22 +49,26 @@ Participants will be ranked along two primary (character-level) metrics:
1. **Span Identification**: Intersection-over-Union (IoU) of characters marked as hallucinated in the gold reference vs. predicted
2. **Confidence Calibration**: Correlation between the probability assigned by a participant's system that a character is hallucinated and the empirical probability observed in our multi-annotator gold data
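
As a rough illustration of the two character-level metrics above, here is a minimal sketch. It is not the official scorer: the function names, the `(start, end)` span format with an exclusive end, the treatment of empty inputs, and the choice of Pearson correlation are all assumptions made for this example, since the text only says "correlation".

```python
from math import sqrt


def char_iou(gold_spans, pred_spans):
    """Character-level IoU between two lists of (start, end) spans.

    Assumption: `end` is exclusive and spans index characters of one output.
    """
    gold = {i for start, end in gold_spans for i in range(start, end)}
    pred = {i for start, end in pred_spans for i in range(start, end)}
    if not gold and not pred:
        # Assumption: nothing marked on either side counts as full agreement.
        return 1.0
    return len(gold & pred) / len(gold | pred)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of floats.

    Assumption: returns 0.0 when either sequence is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical example: gold marks characters 0-3, the system marks 2-5.
iou = char_iou([(0, 4)], [(2, 6)])  # 2 shared characters out of 6 in the union

# Hypothetical per-character probabilities: fraction of annotators marking
# each character (empirical) vs. the system's predicted probability.
empirical = [0.0, 0.5, 1.0]
predicted = [0.1, 0.5, 0.9]
calibration = pearson(predicted, empirical)
```

Working on sets of character indices keeps the IoU independent of how a system segments its spans: two adjacent predicted spans score the same as one merged span.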
-Rankings and submissions will be handled **separately per language**. Participants can also download the scoring program [here](./scorer.py) ${\color{red} BROKEN LINK}$ for reference and system development.
+Rankings and submissions will be handled **separately per language**.
+<!-- Participants can also download the scoring program [here](./scorer.py) ${\color{red} BROKEN LINK}$ for reference and system development. -->
#### Dataset Overview
We provide a curated dataset of 20,000 samples with multiple annotations, labeled using a fine-grained, span-level scheme.
| Dataset Split | Size | Composition | Access |
|--------------|------|-------------|--------|
-|**Training set**|~15,200 samples | Outputs from 5 diverse LVLMs, ~3,800 samples per language |[Download](https://a3s.fi/shroom-visions/train.zip) ${\color{red} BROKEN LINK}$ |
+|**Training set**|~15,200 samples | Outputs from 5 diverse LVLMs, ~3,800 samples per language ||
|**Test set**| 4,800 samples | 1,200 samples per language | Closed test set |
+Download the annotated training set and the unlabelled test set: [Download data](https://a3s.fi/mickusti-2007780-pub/shroom-visions-data.zip)
Supplementary materials, including annotation guidelines, raw annotations with comments, and image metadata, can be downloaded from [this link](https://a3s.fi/shroom-visions/extra-info.tar.gz) ${\color{red} BROKEN LINK}$.
$
-<!--
We are releasing a participant kit containing:
- Scoring program and format checker
- Two baselines: a random baseline and a multimodal transformer-based system