umanlp/gepadese

GePaDeSE: A new resource for clause-level aspect in German Parliamentary Debates

This repository contains the manual annotations for GePaDeSE (including the annotation guidelines) as well as all code to train and evaluate the Situation Entity (SE) classifier for our LREC 2026 submission, GePaDeSE: A new resource for clause-level aspect in German Parliamentary Debates.


Structure

gepadese
├── configs
│   ├── eval.yaml
│   └── train.yaml
├── data
│   ├── json
│   └── pkl
├── guidelines
│   └── Guidelines_SitEnt_in_Parliamentary_Debates.pdf
├── predictions
│   ├── BERT_multi-label-run-65-69
│   └── README.md
├── scripts
│   ├── data_preproc.py
│   ├── eval.py
│   ├── metrics.py
│   ├── SitEnt.py
│   ├── train.py
│   └── utils.py
├── splits
│   ├── dev_split.txt
│   ├── splits.tsv
│   ├── test_split.txt
│   └── train_split.txt
├── eval.sh
├── README.md
├── requirements.txt
└── train.sh

Distribution of SE types in the GePaDeSE corpus

Situation Entities     A1       A2       Avg.
State                  12,987   11,655   12,321.0
Generic                2,012    2,708    2,360.0
Event                  1,406    2,885    2,145.5
Generalizing           942      1,062    1,002.0
Event-Perfect-State    1,286    259      772.5
Question               447      444      445.5
Imperative             345      335      340.0
Report                 251      328      289.5
Total                  19,676   19,676   19,676

Abstract Entities      A1       A2       Avg.
Proposition            315      222      268.5
Fact                   301      123      212.0

The A1 and A2 columns give the number of instances labeled by annotator 1 and annotator 2, respectively.
The Avg. column is the mean of the two annotators' counts for each SE type.
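As a quick sanity check, the Avg. column and the Total row can be recomputed from the A1 and A2 counts. The snippet below is only an illustrative sketch: the counts are copied from the table above, while the variable and function names are assumptions and not part of the repository code. The Total row covers the eight situation entity types only, so the abstract entities are kept in a separate dictionary.

```python
# Counts copied from the table above: SE type -> (A1, A2).
situation_entities = {
    "State": (12987, 11655),
    "Generic": (2012, 2708),
    "Event": (1406, 2885),
    "Generalizing": (942, 1062),
    "Event-Perfect-State": (1286, 259),
    "Question": (447, 444),
    "Imperative": (345, 335),
    "Report": (251, 328),
}
abstract_entities = {
    "Proposition": (315, 222),
    "Fact": (301, 123),
}


def average(a1, a2):
    """Mean of the two annotators' counts for one SE type."""
    return (a1 + a2) / 2


averages = {
    label: average(a1, a2)
    for label, (a1, a2) in {**situation_entities, **abstract_entities}.items()
}

# Column totals over the situation entity types only.
total_a1 = sum(a1 for a1, _ in situation_entities.values())
total_a2 = sum(a2 for _, a2 in situation_entities.values())

for label, avg in averages.items():
    print(f"{label:<20} {avg:>10,.1f}")
print(f"{'Total':<20} {total_a1:>10,} {total_a2:>10,}")
```

Both annotators labeled the same number of situation entity instances, which is why the two column totals agree.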


SE Classifier Usage

Install the required dependencies:

pip install -r requirements.txt

1. Running an existing SE Classifier

TODO

2. Training a new SE Classifier

You can train a new SE classifier either using SLURM or by running the training script directly.

Option 1: Using SLURM

Submit the training job:

sbatch ./train.sh

Make sure to modify train.sh to match your environment.

Option 2: Using Python

Run the training script directly:

python scripts/train.py configs/train.yaml /path/to/cache_dir <run_id> <seed>

Replace:

  • /path/to/cache_dir: directory where datasets and model checkpoints are cached.
  • <run_id>: a name (string) that identifies this run.
  • <seed>: the random seed, set for reproducibility.
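To train several models that differ only in their random seed, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper names `build_train_command` and `launch`, the seed values, and the run ID pattern are all hypothetical, and only the positional argument order is taken from the command shown above. By default it prints the commands instead of running them.

```python
import shlex
import subprocess


def build_train_command(cache_dir, run_id, seed, config="configs/train.yaml"):
    """Return the argument list for one training run (hypothetical helper).

    The argument order mirrors the documented call:
    python scripts/train.py <config> <cache_dir> <run_id> <seed>
    """
    return ["python", "scripts/train.py", config, str(cache_dir), str(run_id), str(seed)]


def launch(cmd, dry_run=True):
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(shlex.join(cmd))
        return None
    # check=True raises CalledProcessError if training exits with an error.
    return subprocess.run(cmd, check=True)


# Hypothetical seeds and run IDs; replace them with your own values.
for seed in (1, 2, 3):
    cmd = build_train_command("/path/to/cache_dir", f"run-{seed}", seed)
    launch(cmd, dry_run=True)
```

Call `launch(cmd, dry_run=False)` from the repository root to actually start a run; each run then executes sequentially in the current environment.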
