New author system #6807

mjpost · 2025-12-15T14:57:20Z

I am opening this draft PR to produce a build in service of wrapping up the new author representation. Discussion here can largely continue from #5471.

List of TODO items

UI changes:

New author system UI changes #6824

The following are important tasks, but need not block merging:

Modify the ingestion script to use the library and match ORCID iDs on ingested papers (cf. Missing functionality for modifying and ingesting data with Python lib #4766)
Add tooling to show people how to become verified
Fix bulk metadata script to use library

Administrative tasks:

Make sure submitting conferences use ORCID iDs on all papers (or as many as possible) (cf. Outreach to data providers to require ORCIDs #7048)


nschneid · 2026-01-06T20:10:58Z

Would it be a good idea to make a post/reach out to stakeholders (ARR, EACL 2026) warning them that the author transition is coming? E.g.:

In the coming days, we will update our database to make core technical changes that will improve our ability to maintain high-quality author pages. We thank authors who have submitted corrections for their patience during this process.

The user-facing changes at this time will be minor; the upgrade is not expected to require any downtime or break existing URLs. Users may notice that /unverified will appear in URLs for author pages that are constructed based on the name string alone, without any verification that they represent unique individuals. We are moving toward adding ORCIDs to better identify unique authors, and will be working with upcoming venues to encourage submitting authors to include ORCIDs in their profiles so they are attached to papers received by the Anthology.

We are sure about no URLs breaking, right? Any current unverified author page will be forwarded to /unverified? (This matters for the current ARR cycle, which uses author pages in its processes.)

mbollmann · 2026-01-06T22:00:19Z

We are sure about no URLs breaking, right?

I could run another systematic comparison (build under old + new system and run a script to compare which pages are being generated) before we switch. Can't hurt to be too careful IMO.
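A comparison of this kind could be sketched roughly as follows. This is a minimal sketch, not the script actually used: the function names and the assumption that each generated page is a directory containing an `index.html` are mine, and it only compares which pages exist, not their contents.

```python
from pathlib import Path


def generated_pages(build_dir: Path) -> set[str]:
    """Return the relative paths of all directories that contain an index.html."""
    return {
        p.parent.relative_to(build_dir).as_posix()
        for p in build_dir.rglob("index.html")
    }


def compare_builds(old_dir: Path, new_dir: Path) -> tuple[set[str], set[str]]:
    """Return (pages only in the old build, pages only in the new build)."""
    old_pages = generated_pages(old_dir)
    new_pages = generated_pages(new_dir)
    return old_pages - new_pages, new_pages - old_pages


def missing_without_fallback(old_dir: Path, new_dir: Path) -> set[str]:
    """Old pages that vanished and have no '<page>/unverified' page in the new build."""
    new_pages = generated_pages(new_dir)
    only_old, _ = compare_builds(old_dir, new_dir)
    return {page for page in only_old if f"{page}/unverified" not in new_pages}
```

Any path returned by `missing_without_fallback` would be a URL that existed before the switch and has neither a page nor an `/unverified` counterpart afterwards, which is the case worth inspecting by hand.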

Any current unverified author page will be forwarded to /unverified?

Unless there’s a bug here that I haven’t caught, yes:

acl-anthology/hugo/static/.htaccess

Lines 84 to 87 in 4fc9af2

    
    # If the requested author page does not exist, soft-redirect [303 See Other] to the /unverified URL
    RewriteCond %{REQUEST_FILENAME}/index.html !-f
    RewriteCond %{REQUEST_FILENAME}/unverified -d
    RewriteRule ^people/([^/]+)/?$ people/$1/unverified [L,R=303]
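To make the intended behavior concrete, here is a small Python model of the rule above. It only illustrates the logic and is not how Apache evaluates it: the `redirect_target` function and the set-based stand-ins for the file system are hypothetical, and trailing-slash handling is simplified.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule: /people/<slug> with an optional trailing slash.
AUTHOR_URL = re.compile(r"^people/([^/]+)/?$")


def redirect_target(path: str, files: set[str], dirs: set[str]) -> Optional[str]:
    """Return the 303 redirect target for `path`, or None if no redirect applies.

    `files` and `dirs` stand in for the files and directories under the
    document root (relative paths, no leading slash).
    """
    match = AUTHOR_URL.match(path)
    if match is None:
        return None
    base = f"people/{match.group(1)}"
    page_missing = f"{base}/index.html" not in files  # first RewriteCond (!-f)
    has_unverified = f"{base}/unverified" in dirs     # second RewriteCond (-d)
    if page_missing and has_unverified:
        return f"{base}/unverified"
    return None
```

Under this model, a request for `people/yang-liu` redirects only when there is no verified page of that name and an `unverified` directory exists; an existing verified page such as `people/yang-janet-liu` is served as-is.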


nschneid · 2026-01-10T15:27:53Z

Would it be a good idea to make a post/reach out to stakeholders (ARR, EACL 2026) warning them that the author transition is coming? E.g.:

In the coming days, we will update our database to make core technical changes that will improve our ability to maintain high-quality author pages. We thank authors who have submitted corrections for their patience during this process.

The user-facing changes at this time will be minor; the upgrade is not expected to require any downtime or break existing URLs. Users may notice that /unverified will appear in URLs for author pages that are constructed based on the name string alone, without any verification that they represent unique individuals. We are moving toward adding ORCIDs to better identify unique authors, and will be working with upcoming venues to encourage submitting authors to include ORCIDs in their profiles so they are attached to papers received by the Anthology.

@mjpost thoughts on an advance announcement? It seems like a good way to address the fact that there are a lot of open correction issues, i.e. we are busy on important work and will get to those soon. :)

mjpost · 2026-01-10T15:54:35Z

Yes, we definitely need to do this. My thinking is:

Complete New author system UI changes #6824
Set date for merging New author system #6807 (can we do early next week?)
Write blog post describing the new system and the ORCID iD aspirations
Reach out to many chairs and point them to the blog post


nschneid · 2026-01-10T18:18:53Z

https://preview.aclanthology.org/master-new-author-system-ui/people/yang-liu/unverified/:

https://preview.aclanthology.org/master-new-author-system-ui/people/yang-janet-liu/:

Where are the Chinese characters coming from? I thought unverified = no entry in our database, so how would we be able to disambiguate the Chinese spelling of the name? Is this a bug in the API?

nschneid · 2026-01-10T19:56:52Z

^ It seems that if a Person instance has multiple entries returned by names, and one of them is in a different script, it gets displayed in parentheses. My question is why/whether an unverified person should ever have variant names across scripts.

mbollmann · 2026-01-10T20:16:29Z

Variants in foreign script can be recorded directly in the XML. EDIT: ...and if that author doesn't have an explicit ID, they're of course unverified.

nschneid · 2026-01-10T20:41:34Z

Ah. So my concern is, for the unverified Yang Liu page, 扬刘 is but one possible Chinese variant of the name. Maybe it was the only one that happened to be present in an XML file where Yang Liu was unverified. But it seems misleading to show it in parentheses as if it represents ALL the unverified Yang Lius—some (presumably most) of whom didn't have a script variant in the XML.

I don't know if this is a representation problem in the library, or if the page should just not show the alternate script variant on unverified authors. Maybe this logic should be modified to check verification:

acl-anthology/bin/create_hugo_data.py

Lines 387 to 396 in efd9fde

    
    data["variant_entries"] = []
    diff_script_variants = []
    for n in person.names[1:]:
        data["variant_entries"].append(
            {"first": n.first, "last": n.last, "full": n.as_full()}
        )
        if n.script is not None:
            diff_script_variants.append(n.as_full())
    if diff_script_variants:
        data["full"] = f"{data['full']} ({', '.join(diff_script_variants)})"
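One way such a check might look, as a minimal sketch: the `Name` and `Person` classes below are simplified stand-ins written for this example, and `is_verified` is a hypothetical attribute name, not necessarily what the library exposes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Name:
    full: str
    script: Optional[str] = None  # set only for names in a different script

    def as_full(self) -> str:
        return self.full


@dataclass
class Person:
    names: list[Name]
    is_verified: bool = False  # hypothetical flag, assumed for illustration


def display_name(person: Person) -> str:
    """Build the display name; append script variants only for verified people."""
    full = person.names[0].as_full()
    if not person.is_verified:
        # An unverified page may pool several individuals, so a variant
        # seen on one paper should not be shown as if it applied to all.
        return full
    variants = [n.as_full() for n in person.names[1:] if n.script is not None]
    return f"{full} ({', '.join(variants)})" if variants else full
```

With this guard, the unverified Yang Liu page would show only "Yang Liu", while a verified person with a recorded script variant would still get the parenthesized form.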

mbollmann · 2026-01-10T21:31:55Z

I don't see how this would be a representation problem in the library; the author tag with the name variant needs to be resolved to an ID, and that can be an unverified one. Still, the Han name variant belongs to that author ID, so it will be returned as one of the possible names for this unverified person. That does appear perfectly correct to me.

nschneid · 2026-01-10T21:40:07Z

If we were being really nitpicky we might distinguish script variants that occur with all instances of an unverified author from ones that occur only some of the time (because there can always be multiple people under the same unverified author and their names can have different script variants). But this feels like overkill, and I was able to implement a front-end change in #6824 that removes this source of confusion.
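For illustration only, the "nitpicky" distinction could be computed along these lines. This is a sketch under assumptions: the input is a hypothetical list with one entry per paper-level occurrence of the unverified author, holding the script variant recorded there or `None`.

```python
from collections import Counter
from typing import Optional


def variant_coverage(occurrences: list[Optional[str]]) -> dict[str, float]:
    """Map each script variant to the fraction of occurrences that carry it."""
    total = len(occurrences)
    counts = Counter(v for v in occurrences if v is not None)
    return {variant: n / total for variant, n in counts.items()}
```

A variant with coverage 1.0 appears on every occurrence; anything lower appears only some of the time and so may belong to just one of several people sharing the unverified name.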

mbollmann · 2026-01-10T21:53:22Z

If we were being really really nitpicky I would say different-script name variants don't really belong on the paper level at all, but instead could trigger/require a verified ID assignment so that the variant can be stored at the author level. :)


mjpost · 2026-01-12T15:56:27Z

Hi, I've run out of time for this stuff today, will revert and reconvert tomorrow.

mbollmann added 30 commits July 18, 2025 17:27

Make logger use stderr (#5474)

10a9e58

Merge branch 'master' into python-dev

1479c56

Add ORCID field to Person

ee78650

Add new fields to NameSpec and Person, add check for verified IDs

309ad65

Removed outdated special case when slugifying

5b9ca84

Switch from name_variants.yaml to people.yaml & new name resolution logic

aa2411a

Transition test data & fix tests outside of personindex_test.py

248009d

Remove tests for get_or_create_person, fix remaining ones

3471119

Refactor get_or_create_person to resolve_namespec, refactor exceptions

7ee6ac8

Refactor exceptions (again), add checks for ORCID on NameSpecification

983bbc8

Add ORCID validation (incl. checksum)

44ae702

Add integration test for PersonIndex, currently expected to fail

f287f16

Bump Codecov action to v5

2c11c80

Add by_orcid, rename name_to_ids to by_name

a582685

Disallow person IDs starting with numbers

e5511f4

Add tests for name resolution logic

5fe6470

Increase test coverage, fix small bug (checked for wrong exception)

bfc8bbc

Add function & tests for ingestion logic

4d7b39b

Update CHANGELOG

59ef07b

Refactor Person.names to store if NameLink was EXPLICIT or INFERRED

5f20fc2

Add save functionality for people.yaml

fb64a67

Let changes to Person automatically update PersonIndex

e65733f

Add Person.make_explicit + more people.yaml saving tests

6f6bbe2

Refactor PersonIndex tests & add check for duplicate ORCIDs

e5330c8

Move PersonIndex fields behind getters that auto-load data

cbecb5a

Add Person.update_id

157e313

Automatically call ingest_namespec() on create_ functions

43575a4

Add PersonIndex.create_person()

61cedf7

Update documentation (WIP)

7ac5e12

Change slugs_to_verified_ids to contain sets (fixes bug with IDs being added multiple times)

10a317f

nschneid added 2 commits January 6, 2026 10:16

Spacing: add some daylight between the name and the icon so it doesn't look like a continuation of the last name

3625997

bootstrap class to set icon margin

ce7741f

nschneid added 6 commits January 6, 2026 21:15

verification.md: a bit more detail on the icons and what verification means at the paper level

d297b22

header_navbar.html "GitHub" capitalization

10b99e0

verification.md: steps for verifying - we want an ORCID iD regardless of verification status

5139fe4

orcid.md: OpenReview is one word

c7709ef

verification.md: fix icons

0fcdc4b

verification.md: rephrase part about unverified page for ambiguous names

abde567

nschneid and others added 5 commits January 10, 2026 11:07

author page template: no checkmark icon for verification, just ORCID or not

893ca21

No ? for verified accounts

84063f9

Revert "No ? for verified accounts"

a2c9fd9

This reverts commit 84063f9.

Revert "author page template: no checkmark icon for verification, just ORCID or not"

3d9bc64

This reverts commit 893ca21.

Make question mark green

e3d7106

mjpost and others added 3 commits January 10, 2026 13:23

Restore paper verification notice in tooltip

20b83be

ORCID iD isn't recorded

cea36bd

accidental </a>

3f312cc

for author's full display name, only include script variants if verified

d2a8772

Merge pull request #6824 from acl-org/master-new-author-system-ui

728f349

New author system UI changes
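One of the commits listed above adds ORCID validation including a checksum. For readers unfamiliar with it, the final character of an ORCID iD is a check digit computed with ISO 7064 MOD 11-2 over the first 15 digits. The sketch below shows that calculation; the function names are my own and are not taken from the library.

```python
import re

# Four hyphen-separated groups of four; the last character may be X.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and the check digit of an ORCID iD like 0000-0002-1825-0097."""
    if ORCID_PATTERN.fullmatch(orcid) is None:
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

A check like this catches most single-character typos in an iD, but it says nothing about whether the iD is actually registered or belongs to the author in question.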
