GePaDeSE: A new resource for clause-level aspect in German Parliamentary Debates

This repository contains the manual annotations for GePaDeSE (including the annotation guidelines) as well as all code to train and evaluate the Situation Entity (SE) classifier for our LREC 2026 submission, "GePaDeSE: A new resource for clause-level aspect in German Parliamentary Debates".

Structure

gepadese
├── configs
│   ├── eval.yaml
│   └── train.yaml
├── data
│   ├── json
│   └── pkl
├── guidelines
│   └── Guidelines_SitEnt_in_Parliamentary_Debates.pdf
├── predictions
│   ├── BERT_multi-label-run-65-69
│   └── README.md
├── scripts
│   ├── data_preproc.py
│   ├── eval.py
│   ├── metrics.py
│   ├── SitEnt.py
│   ├── train.py
│   └── utils.py
├── splits
│   ├── dev_split.txt
│   ├── splits.tsv
│   ├── test_split.txt
│   └── train_split.txt
├── eval.sh
├── README.md
├── requirements.txt
└── train.sh

Distribution of SE types in the GePaDeSE corpus

Situation Entities	A1	A2	Avg.
State	12,987	11,655	12,321.0
Generic	2,012	2,708	2,360.0
Event	1,406	2,885	2,145.5
Generalizing	942	1,062	1,002.0
Event-Perfect-State	1,286	259	772.5
Question	447	444	445.5
Imperative	345	335	340.0
Report	251	328	289.5
Total	19,676	19,676	19,676.0

Abstract Entities	A1	A2	Avg.
Proposition	315	222	268.5
Fact	301	123	212.0

The columns A1 and A2 show the number of instances labelled by annotators 1 and 2, respectively.
The Avg. column gives the mean of the two annotators' counts for each SE type.
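The Avg. column can be reproduced from the A1 and A2 counts. The snippet below is a minimal sketch for the Situation Entities table only; the counts are copied from that table, while the names `SE_COUNTS` and `average` and the relative-share column are illustrative additions and not part of the repository code.

```python
# Counts copied from the Situation Entities table: (annotator 1, annotator 2).
SE_COUNTS = {
    "State": (12987, 11655),
    "Generic": (2012, 2708),
    "Event": (1406, 2885),
    "Generalizing": (942, 1062),
    "Event-Perfect-State": (1286, 259),
    "Question": (447, 444),
    "Imperative": (345, 335),
    "Report": (251, 328),
}


def average(a1, a2):
    """Mean of the two annotators' counts (the Avg. column)."""
    return (a1 + a2) / 2


averages = {label: average(a1, a2) for label, (a1, a2) in SE_COUNTS.items()}

# Column totals; both annotators labelled the same number of instances.
total_a1 = sum(a1 for a1, _ in SE_COUNTS.values())
total_a2 = sum(a2 for _, a2 in SE_COUNTS.values())
total_avg = average(total_a1, total_a2)

# Print each row with its average and its share of all SE instances.
for label, (a1, a2) in SE_COUNTS.items():
    share = averages[label] / total_avg
    print(f"{label:<20}{a1:>8,}{a2:>8,}{averages[label]:>10,.1f}{share:>8.1%}")
```

The per-annotator counts differ most for Event and Event-Perfect-State, which is why the average is reported alongside the individual columns.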

SE Classifier Usage

Install the required dependencies:

pip install -r requirements.txt

1. Running an existing SE Classifier

TODO

2. Training a new SE Classifier

You can train a new SE classifier either using SLURM or by running the training script directly.

Option 1: Using SLURM

Submit the training job:

sbatch ./train.sh

Make sure to modify train.sh to match your environment.

Option 2: Using Python

Run the training script directly:

python scripts/train.py configs/train.yaml /path/to/cache_dir <run_id> <seed>

Replace the placeholders as follows:

/path/to/cache_dir: the directory where datasets and model checkpoints are cached.
<run_id>: a name (any string) that identifies this run.
<seed>: the random seed to use for reproducibility.
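To train several models with different seeds, the command above can be wrapped in a small driver script. The following is a minimal sketch, not part of the repository: the helper names `build_train_command` and `train_all`, the `run-<seed>` naming scheme, and the seeds shown are all hypothetical. It only assumes the positional argument order given above (config file, cache directory, run ID, seed) and that it is started from the repository root.

```python
import shlex
import subprocess


def build_train_command(cache_dir, run_id, seed, config="configs/train.yaml"):
    """Return the argument list for one training run, in the documented order."""
    return ["python", "scripts/train.py", config, str(cache_dir), str(run_id), str(seed)]


def train_all(cache_dir, seeds, run_prefix="run", dry_run=True):
    """Build one training command per seed; execute it unless dry_run is set."""
    commands = []
    for seed in seeds:
        # Hypothetical naming scheme: the run ID is derived from the seed.
        cmd = build_train_command(cache_dir, f"{run_prefix}-{seed}", seed)
        commands.append(cmd)
        if dry_run:
            # Only show what would be run.
            print(shlex.join(cmd))
        else:
            # Stop at the first failing run instead of silently continuing.
            subprocess.run(cmd, check=True)
    return commands


# Illustrative seeds and cache directory; with dry_run=True nothing is executed.
commands = train_all("/path/to/cache_dir", seeds=[1, 2, 3])
```

Calling `train_all(..., dry_run=False)` would launch the runs one after another; on a cluster, the SLURM route is probably the better fit.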
