This implementation was developed as part of the Legal Data Science Lab (LDSI_LAB) at Technical University of Munich (TUM) for the Master's in Informatics program (SS22). The work explores innovative approaches to legal NLP by applying sequence-to-sequence models to the structured analysis of Indian legal judgment documents, contributing to the broader effort of automating legal document processing in populous judicial systems.
This module implements a sequence-to-sequence approach for automatic rhetorical role classification in Indian legal judgment documents. The system uses transformer models (specifically T5-small, a sequence-to-sequence transformer) to predict rhetorical roles for individual sentences in legal documents. The approach treats the classification task as a text generation problem, where the model learns to generate appropriate rhetorical role labels given input legal text segments.
For this lab course (Praktikum), multiple methods were systematically implemented and evaluated:
- Sequential T5 Transformers: The primary focus of this repository, an implementation of T5 fine-tuning for rhetorical role prediction
- BiLSTM-CRF: Implemented using the approach from https://github.com/Law-AI/semantic-segmentation
- BERT-HSLN: Implemented based on the methodology from https://github.com/Legal-NLP-EkStep/rhetorical-role-baseline
This repository specifically focuses on the T5 fine-tuning approach; comprehensive results comparing all methods are available in the project report (PDF) included in this repository.
Legal judgment documents can be segmented into topically coherent semantic units called Rhetorical Roles (RRs). These roles help structure legal documents for better organization, search, and automated processing.
Dataset paper authors: Prathamesh Kalamkar, Aman Tiwari, Astha Agarwal, Saurabh Karn, Smita Gupta, Vivek Raghavan, and Ashutosh Modi (EkStep Foundation; Thoughtworks Technologies India Pvt. Ltd.; Agami; Indian Institute of Technology Kanpur, IIT-K).
- Source: Indian Supreme Court judgment documents in English
- Corpus Size: 354 legal documents with 40,305 annotated sentences
- Annotation: 12 different rhetorical role categories
- Granularity: Sentence-level annotations
- Origin: Part of the BUILDNyAI project for legal NLP corpus development
- Preamble: Document header and case identification
- Facts: Statement of facts and case background
- Arguments: Legal arguments presented by parties
- Statute: Referenced laws and statutory provisions
- Precedent: Cited previous court decisions
- Ratio: Court's reasoning and legal principles
- Ruling: Final decision and orders
- Dissent: Dissenting opinions (if any)
- Concurrence: Concurring opinions
- Analysis: Court's analysis of law and facts
- Issues: Legal issues identified by the court
- Other: Miscellaneous content not fitting other categories
Unlike traditional classification approaches, this implementation treats rhetorical role prediction as a text generation task:
- Input: Legal sentence text (tokenized)
- Processing: T5 encoder-decoder architecture
- Output: Generated rhetorical role label
- Advantage: Leverages pre-trained language model's understanding of legal language
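The text-to-text casting can be illustrated with a minimal sketch. The task prefix and the exact target strings below are illustrative assumptions, not necessarily the formatting used in this repository:

```python
# Minimal sketch: casting (sentence, rhetorical role) pairs into T5 source/target text.
# The "classify rhetorical role:" prefix is an assumption for illustration only.
def to_text_pair(sentence: str, role: str) -> tuple[str, str]:
    """Turn a (sentence, rhetorical role) pair into T5 input/target strings."""
    source = "classify rhetorical role: " + sentence
    target = role  # e.g. "Facts", "Ratio", "Ruling"
    return source, target

# Illustrative example pair
src, tgt = to_text_pair(
    "The appellant challenged the order of the High Court.",
    "Facts",
)
```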
- Base Model: T5-small (Text-to-Text Transfer Transformer)
- Task Formulation: Sentence → Rhetorical Role Label generation
- Preprocessing: Custom spaCy-based tokenization for legal text
- Post-processing: Label extraction and cleaning
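The pieces above fit together roughly as in the following sketch. It assumes a Hugging Face `t5-small` checkpoint (in practice a fine-tuned one), a blank spaCy English pipeline for tokenization, the same illustrative task prefix as above, and a simple substring heuristic for label cleanup; none of these details are confirmed by the repository itself.

```python
# Sketch of the preprocessing -> generation -> label-cleanup path.
# Checkpoint names, prefix, lengths, and the label-matching heuristic are assumptions.
import spacy
from transformers import T5ForConditionalGeneration, T5Tokenizer

ROLE_LABELS = [
    "Preamble", "Facts", "Arguments", "Statute", "Precedent", "Ratio",
    "Ruling", "Dissent", "Concurrence", "Analysis", "Issues", "Other",
]

nlp = spacy.blank("en")  # lightweight spaCy pipeline used only for tokenization
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")  # or a fine-tuned checkpoint

def preprocess(sentence: str) -> str:
    """spaCy-based cleanup: retokenize and rejoin to normalize whitespace."""
    return " ".join(tok.text for tok in nlp(sentence) if not tok.is_space)

def predict_role(sentence: str) -> str:
    """Generate a rhetorical role label for a single legal sentence."""
    text = "classify rhetorical role: " + preprocess(sentence)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    output_ids = model.generate(**inputs, max_length=8)
    raw = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
    # Post-processing: map the generated string onto the closest known label.
    for label in ROLE_LABELS:
        if label.lower() in raw.lower():
            return label
    return "Other"
```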
We evaluated the T5 sequence-to-sequence model on multiple datasets to assess its effectiveness for rhetorical role classification in legal documents. The model was trained with limited computational resources (Google Colab free tier), which constrained training time and the number of training epochs.
Kalamkar et al. dataset (https://arxiv.org/abs/2201.13125):
- Training Duration: 5 epochs
- Accuracy: 0.503
- Macro F1: 0.219
- Weighted F1: 0.448
We then tested our approach on a secondary dataset by Bhattacharya et al. (http://arxiv.org/abs/1911.05405):
- Training Duration: 6 epochs
- Accuracy: 0.363
- Macro F1: 0.246
- Weighted F1: 0.356
The T5 sequence-to-sequence model demonstrated notably better performance on the Kalamkar dataset compared to the Bhattacharya dataset across all evaluation metrics. The Kalamkar dataset yielded approximately 39% higher accuracy (0.503 vs 0.363) and 26% higher weighted F1-score (0.448 vs 0.356).
The performance disparity between datasets can be attributed primarily to the difference in training data availability. The Bhattacharya dataset contains only 50 documents, substantially fewer than the Kalamkar dataset's 354. This limited training data severely constrains the model's ability to learn effective representations for rhetorical role classification. To validate this hypothesis, we conducted additional experiments on the PubMed20k RCT dataset, which provides a much larger corpus for sequential sentence classification.
PubMed20k RCT Results:
- Accuracy: 0.752
- Macro F1: 0.682
- Weighted F1: 0.747
The superior performance on PubMed20k RCT (49% improvement in accuracy over Kalamkar and 107% improvement over Bhattacharya) strongly corroborates that dataset size is a critical factor in the T5 model's performance for sequential sentence classification tasks.
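For reference, the reported accuracy, macro F1, and weighted F1 correspond to their standard definitions; a minimal sketch of their computation, assuming gold and predicted labels are available as plain strings and using scikit-learn for illustration:

```python
# Sketch: computing the reported metrics from decoded label strings (scikit-learn).
from sklearn.metrics import accuracy_score, f1_score

def report_metrics(gold: list[str], pred: list[str]) -> dict:
    """Accuracy, macro-averaged F1, and support-weighted F1 over label strings."""
    return {
        "accuracy": accuracy_score(gold, pred),
        "macro_f1": f1_score(gold, pred, average="macro"),
        "weighted_f1": f1_score(gold, pred, average="weighted"),
    }
```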
The experimental results should be interpreted with the following limitations in mind:
- Model Size: Due to compute resource constraints, T5-small was used; it has far fewer trainable parameters than larger T5 variants, which limited the achievable accuracy.
- Limited Training Epochs: Due to computational resource constraints, models were trained for only 5-6 epochs, which may be insufficient for full convergence.
- Hyperparameter Optimization: The scope of hyperparameter tuning was restricted due to high computational requirements and limited compute provisioning.
- Early Stopping: Training was terminated early due to resource usage limits, potentially preventing the models from reaching optimal performance.
- Dataset Size Dependency: The T5 sequence-to-sequence approach shows strong sensitivity to training data size, with performance scaling significantly with larger datasets.
- Legal Domain Challenges: Performance on legal datasets (Kalamkar and Bhattacharya) was notably lower than on biomedical abstracts (PubMed20k), suggesting domain-specific challenges in legal text processing.
- Resource Requirements: Even with T5-small, the computational demands of sequence-to-sequence models present practical constraints for legal NLP applications with limited resources.