
Author Page for Marten During #5024


Open
wants to merge 1 commit into master
2 changes: 1 addition & 1 deletion data/xml/2025.nlp4dh.xml
@@ -410,7 +410,7 @@
<title>Mining the Past: A Comparative Study of Classical and Neural Topic Models on Historical Newspaper Archives</title>
<author><first>Keerthana</first><last>Murugaraj</last><affiliation>University of Luxemburg</affiliation></author>
<author><first>Salima</first><last>Lamsiyah</last><affiliation>University of Luxemburg</affiliation></author>
-<author><first>Marten</first><last>During</last><affiliation>University of Luxemburg</affiliation></author>
+<author id="marten-during-ul"><first>Marten</first><last>During</last><affiliation>University of Luxemburg</affiliation></author>
<author><first>Martin</first><last>Theobald</last><affiliation>University of Luxembourg</affiliation></author>
<pages>452-463</pages>
<abstract>Analyzing historical discourse in large-scale newspaper archives requires scalable and interpretable methods to uncover hidden themes. This study systematically evaluates topic modeling approaches for newspaper articles from 1955 to 2018, comparing probabilistic LDA, matrix factorization NMF, and neural-based models such as Top2Vec and BERTopic across various preprocessing strategies. We benchmark these methods on topic coherence, diversity, scalability, and interpretability. While LDA is commonly used in historical text analysis, our findings demonstrate that BERTopic, leveraging contextual embeddings, consistently outperforms classical models in all tested aspects, making it a more robust choice for large-scale textual corpora. Additionally, we highlight the trade-offs between preprocessing strategies and model performance, emphasizing the importance of tailored pipeline design. These insights advance the field of historical NLP, offering concrete guidance for historians and computational social scientists in selecting the most effective topic-modeling approach for analyzing digitized archives. Our code will be publicly available on GitHub.</abstract>
2 changes: 1 addition & 1 deletion data/xml/W14.xml
@@ -1151,7 +1151,7 @@
<title>Mining the Twentieth Century’s History from the Time Magazine Corpus</title>
<author><first>Mike</first><last>Kestemont</last></author>
<author><first>Folgert</first><last>Karsdorp</last></author>
-<author><first>Marten</first><last>Düring</last></author>
+<author id="marten-during"><first>Marten</first><last>Düring</last></author>
<pages>62–70</pages>
<url hash="3c8cbe0e">W14-0609</url>
<doi>10.3115/v1/W14-0609</doi>
6 changes: 6 additions & 0 deletions data/yaml/name_variants.yaml
@@ -10724,3 +10724,9 @@
  id: hannah-cyberey
  variants:
  - {first: Hannah, last: Chen}
+- canonical: {first: Marten, last: During}
+  comment: University of Luxembourg
+  id: marten-during-ul
+- canonical: {first: Marten, last: Düring}
+  comment: May refer to several people
+  id: marten-during
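Taken together, the hunks above give the University of Luxembourg author an explicit `id` attribute that pins him to the new `marten-during-ul` entry, while the ambiguous `marten-during` entry keeps the remaining papers. A minimal sketch of how such an explicit `id` can override name-based matching (the variant table is distilled from the YAML hunk above; the fallback slug rule is a simplifying assumption, not the Anthology's actual build logic):

```python
import xml.etree.ElementTree as ET

# Variant table distilled from the name_variants.yaml hunk above.
VARIANTS = [
    {"canonical": {"first": "Marten", "last": "During"},
     "comment": "University of Luxembourg", "id": "marten-during-ul"},
    {"canonical": {"first": "Marten", "last": "Düring"},
     "comment": "May refer to several people", "id": "marten-during"},
]

def resolve_author_id(author_xml: str) -> str:
    """Return the author-page id for one <author> element.

    An explicit id attribute (as added in this PR) wins; otherwise
    fall back to matching the canonical name in the variant table.
    """
    elem = ET.fromstring(author_xml)
    explicit = elem.get("id")
    if explicit:
        return explicit
    name = {"first": elem.findtext("first"), "last": elem.findtext("last")}
    for entry in VARIANTS:
        if entry["canonical"] == name:
            return entry["id"]
    # Hypothetical default: slugify the name (the real build does more).
    return f"{name['first']}-{name['last']}".lower()

# The two authors touched by this PR now resolve to distinct pages:
print(resolve_author_id(
    '<author id="marten-during-ul"><first>Marten</first>'
    '<last>During</last></author>'))  # marten-during-ul
print(resolve_author_id(
    '<author><first>Marten</first><last>Düring</last></author>'))  # marten-during
```

With the `id` attributes in place, the two XML records above stop colliding on the same author page even though the names differ only by a diacritic.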