Commit 0cabf1f

Add PAN'26 tasks
1 parent 2cddcce commit 0cabf1f

10 files changed

+1011
-14
lines changed

_includes/organizations/clef-organizations-section.html

Lines changed: 7 additions & 0 deletions
@@ -124,6 +124,13 @@
124124
</a>
125125
</div>
126126
{% endif %}
127+
{% if include.year == 2026 %}
128+
<div>
129+
<a href="https://clef2026.clef-initiative.eu/" target="_blank">
130+
<img src="{{ '/img/organizations/logo-clef26.svg' | relative_url }}" alt="CLEF Jena 2026 logo">
131+
</a>
132+
</div>
133+
{% endif %}
127134
{% if include.year >= 2010 %}
128135
<div>
129136
<a href="http://www.clef-initiative.eu/" target="_blank">

clef26/pan26-web/generated-content-analysis.html

Lines changed: 238 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
1+
---
2+
layout: default
3+
nav_active: shared-tasks
4+
title: PAN at CLEF 2026 - Generated Plagiarism Detection
5+
description: PAN at CLEF 2026 - Generated Plagiarism Detection
6+
---
7+
<nav class="uk-container">
8+
<ul class="uk-breadcrumb">
9+
<li><a href="../../index.html">PAN</a></li>
10+
<li><a href="../../shared-tasks.html">Shared Tasks</a></li>
11+
<li class="uk-disabled"><a href="#">Generated Plagiarism Detection</a></li>
12+
</ul>
13+
</nav>
14+
15+
<main class="uk-section uk-section-default">
16+
<div class="uk-container">
17+
<div class="uk-container uk-margin-small">
18+
<div>
19+
<h1 class="uk-margin-remove-top">Generative Plagiarism Detection 2026</h1>
20+
<ul class="uk-list">
21+
<li><span data-uk-icon="chevron-down"></span><a class="uk-margin-small-right" href="#synopsis">Synopsis</a></li>
22+
<li><span data-uk-icon="chevron-down"></span><a class="uk-margin-small-right" href="#task">Task Overview</a></li>
23+
<li><span data-uk-icon="chevron-down"></span><a class="uk-margin-small-right" href="#data">Data</a></li>
24+
<!-- <li><span data-uk-icon="chevron-down"></span><a class="uk-margin-small-right" href="#submission">Submission</a></li>-->
25+
<li><span data-uk-icon="chevron-down"></span><a class="uk-margin-small-right" href="#results">Results</a></li>
26+
<li><span data-uk-icon="chevron-down"></span><a class="uk-margin-small-right" href="#related-work">Related Work</a></li>
27+
<li><span data-uk-icon="chevron-down"></span><a class="uk-margin-small-right" href="#task-committee">Task Committee</a></li>
28+
</ul>
29+
</div>
30+
</div>
31+
32+
33+
<div class="uk-container uk-margin-medium">
34+
<h2 id="synopsis">Synopsis</h2>
35+
<ul>
36+
<li>Task: Given a pair of documents, your task is to identify all contiguous maximal-length passages of reused text between them.</li>
37+
<li>Important dates:
38+
<ul>
39+
<li><strong>May 07, 2026:</strong> software submission</li>
40+
<li><strong>May 28, 2026:</strong> participant notebook submission
41+
[<a href="../../pan-notebook-paper-template/pan-notebook-paper-template.zip">template</a>]
42+
[<a href="https://easychair.org/conferences/?conf=clef2026">submission</a>&nbsp; – <em>select "Stylometry and Digital Text Forensics (PAN)"</em> ]</li>
43+
</ul>
44+
</li>
45+
<!-- <li>Input: [<a href="{{ 'data.html#pan25-text-alignment' | relative_url }}">data</a>].</li>-->
46+
<!-- <li>Baselines: [<a href="https://github.com/pan-webis-de/pan-code/blob/master/clef25/generated-plagiarism-detection" target="_blank">code</a>].</li>-->
47+
<!-- <li>Evaluation: [<a href="https://github.com/pan-webis-de/pan-code/blob/master/clef25/generated-plagiarism-detection/evaluation" target="_blank">code</a>].</li>-->
48+
<!-- <li>Submission: Deployment on TIRA [<a href="https://www.tira.io/task-overview/pan25-generated-plagiarism-detection">submit</a>]</li>-->
49+
</ul>
50+
51+
<h2 id="task">Task Overview</h2>
52+
<p>
53+
To develop your software, we provide you with a training and validation corpus that consists of pairs of
54+
documents, one of which may contain passages of text reused from the other. The reused text is
55+
subject to automatic LLM paraphrasing to hide the fact it has been reused. Multiple LLMs have been utilized
56+
and the documents may contain additional genuine LLM-paraphrased text (i.e., paraphrased text that is not reused).
57+
The input and output formats are the same as in previous text-alignment tasks.
58+
<a href="../../clef14/pan14-web/text-alignment.html">Learn more »</a>
59+
</p>
60+
61+
62+
<h2 id="data">Data</h2>
63+
<p>The dataset is available via <a href="https://zenodo.org/records/14969012">Zenodo</a>.
64+
Please register first at <a href="https://www.tira.io/task-overview/pan25-generated-plagiarism-detection">Tira</a>.
65+
The dataset contains copyrighted material and may be used only for research purposes. <strong>No redistribution allowed.</strong></p>
66+
67+
<p>The training and validation corpora each contain two folders: (1) the text data and (2) the annotation data (<code>_truths</code> suffix).
68+
<ul>
69+
<li>Text Data: contains a <code>pairs</code> file which lists all pairs of suspicious documents (in the <code>susp</code> folder) and source documents (in the <code>src</code> folder) to be compared.</li>
70+
<li>Annotation Data: contains XML files for each pair in the <code>pairs</code> file, which give the location of each reused passage and its source.</li>
71+
</ul>
72+
73+
The annotation data contains the following information that should be used for training:</p>
74+
<pre class="prettyprint lang-xml" style="overflow-x:auto"><nobr>&lt;document reference="suspicious-documentXYZ.txt"&gt;</nobr>
75+
&lt;feature
76+
name="plagiarism"
77+
this_offset="5"
78+
this_length="1000"
79+
&nbsp;&nbsp;<nobr>source_reference="source-documentABC.txt"</nobr>
80+
source_offset="100"
81+
source_length="1000"
82+
...
83+
/&gt;
84+
&lt;feature
85+
name="altered"
86+
this_offset="5"
87+
this_length="1000"
88+
&nbsp;&nbsp;<nobr>source_reference="source-documentABC.txt"</nobr>
89+
...
90+
/&gt;
91+
...
92+
&lt;/document&gt;</pre>
93+
<p>The <code>plagiarism</code> feature specifies an aligned passage of text between <code>suspicious-documentXYZ.txt</code>
94+
and <code>source-documentABC.txt</code> that is 1000 characters long, starting at
95+
character offset 5 in the suspicious document and at character offset 100 in the source
96+
document. The other attributes are used to allow for a more detailed analysis of the results and can be ignored for training.</p>
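The attribute layout described above can be read with a few lines of Python. This is only a minimal sketch, not the official baseline: the names `Alignment`, `read_plagiarism_features`, `passage`, and the `EXAMPLE` document are hypothetical, and it assumes that offsets are zero-based character positions in the decoded text.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """One aligned passage, mirroring the attributes of a plagiarism feature."""
    this_offset: int
    this_length: int
    source_reference: str
    source_offset: int
    source_length: int


def read_plagiarism_features(xml_text: str) -> List[Alignment]:
    """Return all features named "plagiarism"; other features are skipped."""
    root = ET.fromstring(xml_text)
    alignments = []
    for feature in root.iter("feature"):
        if feature.get("name") != "plagiarism":
            continue
        alignments.append(
            Alignment(
                this_offset=int(feature.get("this_offset")),
                this_length=int(feature.get("this_length")),
                source_reference=feature.get("source_reference"),
                source_offset=int(feature.get("source_offset")),
                source_length=int(feature.get("source_length")),
            )
        )
    return alignments


def passage(text: str, offset: int, length: int) -> str:
    """Slice a passage out of a document (assumes zero-based character offsets)."""
    return text[offset:offset + length]


# Hypothetical annotation document, shaped like the example above.
EXAMPLE = """<document reference="suspicious-documentXYZ.txt">
  <feature name="plagiarism" this_offset="5" this_length="1000"
           source_reference="source-documentABC.txt"
           source_offset="100" source_length="1000" />
  <feature name="altered" this_offset="2000" this_length="300" />
</document>"""

alignments = read_plagiarism_features(EXAMPLE)
```

Calling `passage(suspicious_text, a.this_offset, a.this_length)` for an alignment `a` would then yield the reused span in the suspicious document, under the offset assumption stated above.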
97+
98+
<p>The <code>altered</code> feature specifies the location of paraphrased text that was not reused (no plagiarism). This makes it possible
99+
to distinguish between genuine LLM-generated text and reused text. For the evaluation, only the <code>plagiarism</code> features
100+
need to be predicted.</p>
101+
102+
<p>For each pair <code>suspicious-documentXYZ.txt</code> and <code>source-documentABC.txt</code> in the <code>pairs</code> file,
103+
your plagiarism detector shall output an XML file <code>suspicious-documentXYZ-source-documentABC.xml</code>
104+
which specifies the location of the plagiarism cases detected within. Each feature should be named <code>detected-plagiarism</code>
105+
and specify the offsets and lengths in the suspicious and the source document. No other attributes are evaluated. For example:</p>
106+
<pre class="prettyprint lang-xml" style="overflow-x:auto"><nobr>&lt;document reference="suspicious-documentXYZ.txt"&gt;</nobr>
107+
&lt;feature
108+
name="detected-plagiarism"
109+
this_offset="5"
110+
this_length="1000"
111+
&nbsp;&nbsp;<nobr>source_reference="source-documentABC.txt"</nobr>
112+
source_offset="100"
113+
source_length="1000"
114+
/&gt;
115+
&lt;feature ... /&gt;
116+
...
117+
&lt;/document&gt;</pre>
118+
<p>For evaluation, the offset and length attributes of the <code>detected-plagiarism</code> features will be compared against the <code>plagiarism</code> features in the annotation data.
119+
No other information will be evaluated.</p>
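The output format described above could be produced along the following lines. This is a hedged sketch rather than the task's reference implementation: the helper names `detection_filename` and `build_detection_xml` and the `Detection` tuple layout are hypothetical, and it assumes the output file name is formed by dropping the `.txt` extension from both document names.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePath
from typing import Iterable, Tuple

# (this_offset, this_length, source_offset, source_length); hypothetical layout.
Detection = Tuple[int, int, int, int]


def detection_filename(susp_name: str, src_name: str) -> str:
    """Build the output file name for one pair (assumes ".txt" is dropped)."""
    return f"{PurePath(susp_name).stem}-{PurePath(src_name).stem}.xml"


def build_detection_xml(
    susp_name: str, src_name: str, detections: Iterable[Detection]
) -> str:
    """Serialize detections as "detected-plagiarism" features."""
    document = ET.Element("document", {"reference": susp_name})
    for this_offset, this_length, source_offset, source_length in detections:
        ET.SubElement(
            document,
            "feature",
            {
                "name": "detected-plagiarism",
                "this_offset": str(this_offset),
                "this_length": str(this_length),
                "source_reference": src_name,
                "source_offset": str(source_offset),
                "source_length": str(source_length),
            },
        )
    return ET.tostring(document, encoding="unicode")


xml_out = build_detection_xml(
    "suspicious-documentXYZ.txt",
    "source-documentABC.txt",
    [(5, 1000, 100, 1000)],
)
```

The resulting string would be written to the file named by `detection_filename(...)`, one file per line of the `pairs` file.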
120+
121+
<h2 id="results">Results</h2>
122+
tba.
123+
124+
125+
<h2 id="related-work">Related Work</h2>
126+
<ol>
127+
<li>
128+
<a href="{{ 'publications.html#?q=2014%20plagiarism%20potthast' | relative_url }}">Plagiarism Detection, PAN @ CLEF'14</a>
129+
</li>
130+
<li>
131+
<a href="{{ 'publications.html#?q=2013%20plagiarism%20potthast' | relative_url }}">Plagiarism Detection, PAN @ CLEF'13</a>
132+
</li>
133+
<li>
134+
<a href="{{ 'publications.html#?q=2012%20plagiarism%20potthast' | relative_url }}">Plagiarism Detection, PAN @ CLEF'12</a>
135+
</li>
136+
<li>
137+
<a href="{{ 'publications.html#?q=2011%20plagiarism%20potthast' | relative_url }}">Plagiarism Detection, PAN @ CLEF'11</a>
138+
</li>
139+
<li>
140+
<a href="{{ 'publications.html#?q=2010%20plagiarism%20potthast' | relative_url }}">Plagiarism Detection, PAN @ CLEF'10</a>
141+
</li>
142+
<li>
143+
<a href="{{ 'publications.html#?q=2009%20plagiarism%20potthast' | relative_url }}">Plagiarism Detection, PAN @ SEPLN'09</a>
144+
</li>
145+
</ol>
146+
147+
<h2 id="task-committee">Task Committee</h2>
148+
<div data-uk-grid class="uk-grid uk-grid-match uk-grid-small thumbnail-card-grid">
149+
{% include people-cards/greinerpetter.html %}
150+
{% include people-cards/philipwahle.html %}
151+
{% include people-cards/ruas.html %}
152+
{% include people-cards/gipp.html %}
153+
</div>
154+
<div class="uk-container uk-padding-large uk-padding-remove-bottom">
155+
{% include organizations/clef-organizations-section.html year=2026 %}
156+
</div>
157+
</div>
158+
</div>
159+
</main>
