<p>We highly recommend starting from our example systems: for sub-task 1 <a href="https://github.com/touche-webis-de/touche-code/tree/main/clef25/retrieval-augmented-debating/debating-systems/basic-elastic-js">in JavaScript</a> or <a href="https://github.com/touche-webis-de/touche-code/tree/main/clef25/retrieval-augmented-debating/debating-systems/basic-elastic-py">in Python</a>, and for sub-task 2 <a href="https://github.com/touche-webis-de/touche-code/tree/main/clef25/retrieval-augmented-debating/evaluation-systems/evaluation-1-baseline-py">in Python</a>. They all provide endpoints for a <a href="https://github.com/webis-de/GenIRSim">GenIRSim</a> service that runs in the background and is started automatically, as they extend our <a href="https://github.com/touche-webis-de/touche-code/tree/main/clef25/retrieval-augmented-debating/debating-systems/base">base image</a>. By adapting the examples, you do not need to take care of this service yourself and can focus on providing the endpoints, e.g., for sub-task 1 <a href="https://github.com/touche-webis-de/touche-code/blob/ee6056630141737a54287ec9b25c2b4a9f936a51/clef25/retrieval-augmented-debating/debating-systems/basic-elastic-js/index.js#L2">in JavaScript</a> and <a href="https://github.com/touche-webis-de/touche-code/blob/ee6056630141737a54287ec9b25c2b4a9f936a51/clef25/retrieval-augmented-debating/debating-systems/basic-elastic-py/main.py#L18">in Python</a>, and for sub-task 2 <a href="https://github.com/touche-webis-de/touche-code/blob/b9138c906f2b716a6922a1bdf3543ab81e64c1bd/clef25/retrieval-augmented-debating/evaluation-systems/evaluation-1-baseline-py/main.py#L32-L70">in Python</a>.</p>
<h2 id="results">Results</h2>
<h3 id="results-subtask-1">Sub-Task 1</h3>
<p>We report results of sub-task 1 as the percentage of responses in the test debates that fulfill the specific criterion (quantity, quality, relation, or manner).</p>
<div class='uk-overflow-auto'><table class='uk-table uk-table-divider uk-table-small uk-table-hover sortable'><caption>Submitted run of each team for sub-task 1.</caption>
<p>We report results of sub-task 2 as precision (P), recall (R), and F1-score for the task of classifying, for each response in the test debates, whether it fulfills the respective criterion (quantity, quality, relation, or manner).</p>
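<p>As a rough illustration of these measures, the sketch below computes P, R, and F1 for a single criterion from boolean labels. It assumes that "fulfills the criterion" is the positive class and that undefined ratios are reported as 0.0; both are assumptions of this sketch, not a description of the official evaluation code. The function name <code>precision_recall_f1</code> and the example labels are hypothetical.</p>

```python
from typing import Dict, List


def precision_recall_f1(gold: List[bool], predicted: List[bool]) -> Dict[str, float]:
    """Compute P, R, and F1 for one criterion (e.g., quantity).

    Assumption: True means "the response fulfills the criterion" and is
    treated as the positive class. Undefined ratios fall back to 0.0.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"P": precision, "R": recall, "F1": f1}


# Hypothetical labels for four responses and one criterion.
example_gold = [True, True, False, True]
example_predicted = [True, False, True, True]
example_scores = precision_recall_f1(example_gold, example_predicted)
```

<p>In this hypothetical example there are two true positives, one false positive, and one false negative, so all three values come out at 2/3.</p>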
<div class='uk-overflow-auto'><table class='uk-table uk-table-divider uk-table-small uk-table-hover sortable'><caption>Submitted run of each team for sub-task 2.</caption>
<thead><tr><th class='header'><span>Team</span></th><th>Run</th><th>Score (F1)</th><th colspan="3" class='header' style="text-align: center">Quantity</th><th colspan="3" style="text-align: center" class='header'>Quality</th><th style="text-align: center" colspan="3" class='header'>Relation</th><th style="text-align: center" colspan="3" class='header'>Manner</th></tr>