Commit 7b24987 (1 parent: 07fb021)

Added author page for Marten During (#5024)

File tree

3 files changed: +8 −2 lines changed


data/xml/2025.nlp4dh.xml

Lines changed: 1 addition & 1 deletion
@@ -411,7 +411,7 @@
       <title>Mining the Past: A Comparative Study of Classical and Neural Topic Models on Historical Newspaper Archives</title>
       <author><first>Keerthana</first><last>Murugaraj</last><affiliation>University of Luxemburg</affiliation></author>
       <author><first>Salima</first><last>Lamsiyah</last><affiliation>University of Luxemburg</affiliation></author>
-      <author><first>Marten</first><last>During</last><affiliation>University of Luxemburg</affiliation></author>
+      <author id="marten-during-ul"><first>Marten</first><last>During</last><affiliation>University of Luxemburg</affiliation></author>
       <author><first>Martin</first><last>Theobald</last><affiliation>University of Luxembourg</affiliation></author>
       <pages>452-463</pages>
       <abstract>Analyzing historical discourse in large-scale newspaper archives requires scalable and interpretable methods to uncover hidden themes. This study systematically evaluates topic modeling approaches for newspaper articles from 1955 to 2018, comparing probabilistic LDA, matrix factorization NMF, and neural-based models such as Top2Vec and BERTopic across various preprocessing strategies. We benchmark these methods on topic coherence, diversity, scalability, and interpretability. While LDA is commonly used in historical text analysis, our findings demonstrate that BERTopic, leveraging contextual embeddings, consistently outperforms classical models in all tested aspects, making it a more robust choice for large-scale textual corpora. Additionally, we highlight the trade-offs between preprocessing strategies and model performance, emphasizing the importance of tailored pipeline design. These insights advance the field of historical NLP, offering concrete guidance for historians and computational social scientists in selecting the most effective topic-modeling approach for analyzing digitized archives. Our code will be publicly available on GitHub.</abstract>

data/xml/W14.xml

Lines changed: 1 addition & 1 deletion
@@ -1151,7 +1151,7 @@
       <title>Mining the Twentieth Century’s History from the Time Magazine Corpus</title>
       <author><first>Mike</first><last>Kestemont</last></author>
       <author><first>Folgert</first><last>Karsdorp</last></author>
-      <author><first>Marten</first><last>Düring</last></author>
+      <author id="marten-during"><first>Marten</first><last>Düring</last></author>
       <pages>62–70</pages>
       <url hash="3c8cbe0e">W14-0609</url>
       <doi>10.3115/v1/W14-0609</doi>

data/yaml/name_variants.yaml

Lines changed: 6 additions & 0 deletions
@@ -10724,3 +10724,9 @@
   id: hannah-cyberey
   variants:
   - {first: Hannah, last: Chen}
+- canonical: {first: Marten, last: During}
+  comment: University of Luxembourg
+  id: marten-during-ul
+- canonical: {first: Marten, last: Düring}
+  comment: May refer to several people
+  id: marten-during
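The YAML entries above map each `id` used in the XML `<author id="…">` attributes to a canonical name form. As a minimal illustrative sketch of how a consumer of this mapping might resolve such an id, the snippet below hard-codes the two entries added in this commit as Python dicts; the `canonical_name` helper is hypothetical and not part of the Anthology codebase.

```python
# The two entries added to name_variants.yaml in this commit,
# represented directly as Python dicts for a self-contained example.
NAME_VARIANTS = [
    {"canonical": {"first": "Marten", "last": "During"},
     "comment": "University of Luxembourg",
     "id": "marten-during-ul"},
    {"canonical": {"first": "Marten", "last": "Düring"},
     "comment": "May refer to several people",
     "id": "marten-during"},
]


def canonical_name(author_id: str) -> str:
    """Return the canonical 'First Last' form for an author id.

    Hypothetical helper: scans the variant entries for a matching id
    and formats the canonical name; raises KeyError if the id is unknown.
    """
    for entry in NAME_VARIANTS:
        if entry.get("id") == author_id:
            c = entry["canonical"]
            return f"{c['first']} {c['last']}"
    raise KeyError(author_id)
```

With distinct ids, the two same-named authors resolve to different canonical entries, which is the disambiguation this commit introduces.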
