Author Page: marten-during #5009

Open
@KeerthanaMurugaraj

Description

Author Pages

@inproceedings{murugaraj-etal-2025-mining,
title = "Mining the Past: A Comparative Study of Classical and Neural Topic Models on Historical Newspaper Archives",
author = "Murugaraj, Keerthana and
Lamsiyah, Salima and
During, Marten and
Theobald, Martin",
editor = {H{\"a}m{\"a}l{\"a}inen, Mika and
{\"O}hman, Emily and
Bizzoni, Yuri and
Miyagawa, So and
Alnajjar, Khalid},
booktitle = "Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities",
month = may,
year = "2025",
address = "Albuquerque, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.nlp4dh-1.39/",
pages = "452--463",
ISBN = "979-8-89176-234-3",
abstract = "Analyzing historical discourse in large-scale newspaper archives requires scalable and interpretable methods to uncover hidden themes. This study systematically evaluates topic modeling approaches for newspaper articles from 1955 to 2018, comparing probabilistic LDA, matrix factorization NMF, and neural-based models such as Top2Vec and BERTopic across various preprocessing strategies. We benchmark these methods on topic coherence, diversity, scalability, and interpretability. While LDA is commonly used in historical text analysis, our findings demonstrate that BERTopic, leveraging contextual embeddings, consistently outperforms classical models in all tested aspects, making it a more robust choice for large-scale textual corpora. Additionally, we highlight the trade-offs between preprocessing strategies and model performance, emphasizing the importance of tailored pipeline design. These insights advance the field of historical NLP, offering concrete guidance for historians and computational social scientists in selecting the most effective topic-modeling approach for analyzing digitized archives. Our code will be publicly available on GitHub."
}

Type of Author Metadata Correction

  • The author page wrongly conflates different people with the same name.
  • This author has multiple pages with different spellings of their name.
  • This author has permanently changed their name.

Supporting Information

https://www.c2dh.uni.lu/people/marten-during

Labels

correction: for corrections submitted to the anthology
metadata: Correction to metadata
