This repository contains the manual annotations for GePaDeSE (including the annotation guidelines) as well as all code to train and evaluate the Situation Entity (SE) classifier for our LREC 2026 submission *GePaDeSE: A new resource for clause-level aspect in German Parliamentary Debates*.

```
gepadese
├── configs
│   ├── eval.yaml
│   └── train.yaml
├── data
│   ├── json
│   └── pkl
├── guidelines
│   └── Guidelines_SitEnt_in_Parliamentary_Debates.pdf
├── predictions
│   ├── BERT_multi-label-run-65-69
│   └── README.md
├── scripts
│   ├── data_preproc.py
│   ├── eval.py
│   ├── metrics.py
│   ├── SitEnt.py
│   ├── train.py
│   └── utils.py
├── splits
│   ├── dev_split.txt
│   ├── splits.tsv
│   ├── test_split.txt
│   └── train_split.txt
├── eval.sh
├── README.md
├── requirements.txt
└── train.sh
```

| Situation Entities | A1 | A2 | Avg. |
|---|---|---|---|
| State | 12,987 | 11,655 | 12,321.0 |
| Generic | 2,012 | 2,708 | 2,360.0 |
| Event | 1,406 | 2,885 | 2,145.5 |
| Generalizing | 942 | 1,062 | 1,002.0 |
| Event-Perfect-State | 1,286 | 259 | 772.5 |
| Question | 447 | 444 | 445.5 |
| Imperative | 345 | 335 | 340.0 |
| Report | 251 | 328 | 289.5 |
| Total | 19,676 | 19,676 | 19,676.0 |

| Abstract Entities | A1 | A2 | Avg. |
|---|---|---|---|
| Proposition | 315 | 222 | 268.5 |
| Fact | 301 | 123 | 212.0 |
The columns A1 and A2 show the number of instances for annotator 1 and 2, respectively.
The last column displays the average number of instances for each SE type.
Install the required dependencies:

```bash
pip install -r requirements.txt
```
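
If you prefer an isolated environment, one option (our suggestion, not a requirement stated by the repository) is to run the install inside a virtual environment:

```bash
# Optional: create and activate a virtual environment before installing
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```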
You can train a new SE classifier either using SLURM or by running the training script directly.
**Option 1: Using SLURM**

Submit the training job:

```bash
sbatch ./train.sh
```

Make sure to modify `train.sh` to match your environment.
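
For orientation, a minimal `train.sh` might look like the sketch below; the partition name, resource requests, and environment setup are assumptions and need to be adapted to your cluster.

```bash
#!/bin/bash
#SBATCH --job-name=gepadese-train   # job name shown in the queue
#SBATCH --partition=gpu             # assumption: adjust to your cluster's GPU partition
#SBATCH --gres=gpu:1                # request a single GPU
#SBATCH --mem=32G                   # assumption: adjust to your model/data size
#SBATCH --time=24:00:00             # wall-clock limit

# Activate your Python environment (assumption: a venv at ./venv)
source venv/bin/activate

# Cache directory, run id, and seed as expected by scripts/train.py
python scripts/train.py configs/train.yaml /path/to/cache_dir run-01 42
```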
**Option 2: Using Python**

Run the training script directly:

```bash
python scripts/train.py configs/train.yaml /path/to/cache_dir <run_id> <seed>
```

Replace:

- `/path/to/cache_dir`: directory where datasets and model checkpoints are cached.
- `<run_id>`: a name/string to identify this run.
- `<seed>`: the random seed to use for reproducibility.
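
As a concrete illustration, the call below uses hypothetical placeholder values for the cache directory, run id, and seed (they are not values used in the paper):

```bash
# Hypothetical example values; adjust the cache path to your setup
python scripts/train.py configs/train.yaml ~/.cache/gepadese run-01 42
```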