This implementation was developed as part of the Legal Data Science Lab (LDSI_LAB) at Technical University of Munich (TUM) for the Master's in Informatics program (SS22). The work explores innovative approaches to legal NLP by applying sequence-to-sequence models to the structured analysis of Indian legal judgment documents, contributing to the broader effort of automating legal document processing in populous judicial systems.
This module implements a sequence-to-sequence approach for automatic rhetorical role classification in Indian legal judgment documents. The system uses transformer models (specifically T5-small, a sequence-to-sequence transformer) to predict rhetorical roles for individual sentences in legal documents. The approach treats the classification task as a text generation problem, where the model learns to generate appropriate rhetorical role labels given input legal text segments.
For this lab course (Praktikum), multiple methods were systematically implemented and evaluated:
- Sequential T5 Transformers: The primary focus of this repository, an implementation of T5 fine-tuning for rhetorical role prediction
- BiLSTM-CRF: Implemented using the approach from https://github.com/Law-AI/semantic-segmentation
- BERT-HSLN: Implemented based on the methodology from https://github.com/Legal-NLP-EkStep/rhetorical-role-baseline
This repository specifically focuses on the T5 fine-tuning approach; comprehensive results comparing all methods are available in the project report (PDF) included in this repository.
Legal judgment documents can be segmented into topically coherent semantic units called Rhetorical Roles (RRs). These roles help structure legal documents for better organization, search, and automated processing.
Dataset paper authors: Prathamesh Kalamkar, Aman Tiwari, Astha Agarwal, Saurabh Karn, Smita Gupta, Vivek Raghavan, and Ashutosh Modi (EkStep Foundation; Thoughtworks Technologies India Pvt. Ltd.; Agami; Indian Institute of Technology Kanpur, IIT-K).
- Source: Indian Supreme Court judgment documents in English
- Corpus Size: 354 legal documents with 40,305 annotated sentences
- Annotation: 12 different rhetorical role categories
- Granularity: Sentence-level annotations
- Origin: Part of the BUILDNyAI project for legal NLP corpus development
- Preamble: Document header and case identification
- Facts: Statement of facts and case background
- Arguments: Legal arguments presented by parties
- Statute: Referenced laws and statutory provisions
- Precedent: Cited previous court decisions
- Ratio: Court's reasoning and legal principles
- Ruling: Final decision and orders
- Dissent: Dissenting opinions (if any)
- Concurrence: Concurring opinions
- Analysis: Court's analysis of law and facts
- Issues: Legal issues identified by the court
- Other: Miscellaneous content not fitting other categories
Unlike traditional classification approaches, this implementation treats rhetorical role prediction as a text generation task:
- Input: Legal sentence text (tokenized)
- Processing: T5 encoder-decoder architecture
- Output: Generated rhetorical role label
- Advantage: Leverages pre-trained language model's understanding of legal language
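The text-to-text casting can be illustrated with a minimal sketch. The task prefix and the exact target strings below are illustrative assumptions, not necessarily the formatting used in this repository:

```python
# Minimal sketch: casting (sentence, rhetorical role) pairs into T5 source/target text.
# The "classify rhetorical role:" prefix is an assumption for illustration only.
def to_text_pair(sentence: str, role: str) -> tuple[str, str]:
    """Turn a (sentence, rhetorical role) pair into T5 input/target strings."""
    source = "classify rhetorical role: " + sentence
    target = role  # e.g. "Facts", "Ratio", "Ruling"
    return source, target

# Illustrative example pair
src, tgt = to_text_pair(
    "The appellant challenged the order of the High Court.",
    "Facts",
)
```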
- Base Model: T5-small (Text-to-Text Transfer Transformer)
- Task Formulation: Sentence → Rhetorical Role Label generation
- Preprocessing: Custom spaCy-based tokenization for legal text
- Post-processing: Label extraction and cleaning
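The pieces above fit together roughly as in the following sketch. It assumes a Hugging Face `t5-small` checkpoint (in practice a fine-tuned one), a blank spaCy English pipeline for tokenization, the same illustrative task prefix as above, and a simple substring heuristic for label cleanup; none of these details are confirmed by the repository itself.

```python
# Sketch of the preprocessing -> generation -> label-cleanup path.
# Checkpoint names, prefix, lengths, and the label-matching heuristic are assumptions.
import spacy
from transformers import T5ForConditionalGeneration, T5Tokenizer

ROLE_LABELS = [
    "Preamble", "Facts", "Arguments", "Statute", "Precedent", "Ratio",
    "Ruling", "Dissent", "Concurrence", "Analysis", "Issues", "Other",
]

nlp = spacy.blank("en")  # lightweight spaCy pipeline used only for tokenization
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")  # or a fine-tuned checkpoint

def preprocess(sentence: str) -> str:
    """spaCy-based cleanup: retokenize and rejoin to normalize whitespace."""
    return " ".join(tok.text for tok in nlp(sentence) if not tok.is_space)

def predict_role(sentence: str) -> str:
    """Generate a rhetorical role label for a single legal sentence."""
    text = "classify rhetorical role: " + preprocess(sentence)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    output_ids = model.generate(**inputs, max_length=8)
    raw = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
    # Post-processing: map the generated string onto the closest known label.
    for label in ROLE_LABELS:
        if label.lower() in raw.lower():
            return label
    return "Other"
```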
We evaluated the T5 sequence-to-sequence model on multiple datasets to assess its effectiveness for rhetorical role classification in legal documents. The model was trained with limited computational resources (Google Colab free tier), which constrained training time and the number of training epochs.
Kalamkar et al. dataset (https://arxiv.org/abs/2201.13125):
- Training Duration: 5 epochs
- Accuracy: 0.503
- Macro F1: 0.219
- Weighted F1: 0.448
We then tested our approach on a secondary dataset by Bhattacharya et al. (http://arxiv.org/abs/1911.05405):
- Training Duration: 6 epochs
- Accuracy: 0.363
- Macro F1: 0.246
- Weighted F1: 0.356
The T5 sequence-to-sequence model demonstrated notably better performance on the Kalamkar dataset compared to the Bhattacharya dataset across all evaluation metrics. The Kalamkar dataset yielded approximately 39% higher accuracy (0.503 vs 0.363) and 26% higher weighted F1-score (0.448 vs 0.356).
The performance disparity between datasets can be attributed primarily to the difference in training data availability. The Bhattacharya dataset contains only 50 documents, substantially fewer than the Kalamkar dataset's 354. This limited training data severely constrains the model's ability to learn effective representations for rhetorical role classification. To validate this hypothesis, we conducted additional experiments on the PubMed20k RCT dataset, which provides a much larger corpus for sequential sentence classification.
PubMed20k RCT Results:
- Accuracy: 0.752
- Macro F1: 0.682
- Weighted F1: 0.747
The superior performance on PubMed20k RCT (49% improvement in accuracy over Kalamkar and 107% improvement over Bhattacharya) strongly corroborates that dataset size is a critical factor in the T5 model's performance for sequential sentence classification tasks.
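For reference, the reported accuracy, macro F1, and weighted F1 correspond to their standard definitions; a minimal sketch of their computation, assuming gold and predicted labels are available as plain strings and using scikit-learn for illustration:

```python
# Sketch: computing the reported metrics from decoded label strings (scikit-learn).
from sklearn.metrics import accuracy_score, f1_score

def report_metrics(gold: list[str], pred: list[str]) -> dict:
    """Accuracy, macro-averaged F1, and support-weighted F1 over label strings."""
    return {
        "accuracy": accuracy_score(gold, pred),
        "macro_f1": f1_score(gold, pred, average="macro"),
        "weighted_f1": f1_score(gold, pred, average="weighted"),
    }
```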
The experimental results should be interpreted with the following limitations in mind:
- Model Size: Due to compute resource constraints, T5-small was used; it has far fewer trainable parameters than larger T5 variants, which limited the achievable accuracy.
- Limited Training Epochs: Due to computational resource constraints, models were trained for only 5-6 epochs, which may be insufficient for full convergence.
- Hyperparameter Optimization: The scope of hyperparameter tuning was restricted due to high computational requirements and limited compute provisioning.
- Early Stopping: Training was terminated early due to resource usage limits, potentially preventing the models from reaching optimal performance.
- Dataset Size Dependency: The T5 sequence-to-sequence approach shows strong sensitivity to training data size, with performance scaling significantly with larger datasets.
- Legal Domain Challenges: Performance on legal datasets (Kalamkar and Bhattacharya) was notably lower than on biomedical abstracts (PubMed20k), suggesting domain-specific challenges in legal text processing.
- Resource Requirements: Even with T5-small, the computational demands of sequence-to-sequence models present practical constraints for legal NLP applications with limited resources.