@mjpost
Member

mjpost commented Dec 15, 2025

I am opening this draft PR to produce a build in service of wrapping up the new author representation. Discussion here can largely continue from #5471.

List of TODO items

UI changes:

The following are important tasks, but need not block merging:

Administrative tasks:

mbollmann added 30 commits July 18, 2025 17:27
@nschneid
Contributor

nschneid commented Jan 6, 2026

Would it be a good idea to make a post/reach out to stakeholders (ARR, EACL 2026) warning them that the author transition is coming? E.g.:

In the coming days, we will update our database to make core technical changes that will improve our ability to maintain high-quality author pages. We thank authors who have submitted corrections for their patience during this process.

The user-facing changes at this time will be minor; the upgrade is not expected to require any downtime or break existing URLs. Users may notice that /unverified will appear in URLs for author pages that are constructed based on the name string alone, without any verification that they represent unique individuals. We are moving toward adding ORCIDs to better identify unique authors, and will be working with upcoming venues to encourage submitting authors to include ORCIDs in their profiles so they are attached to papers received by the Anthology.

We are sure about no URLs breaking, right? Any current unverified author page will be forwarded to /unverified? (This matters for the current ARR cycle, which uses author pages in its processes.)

@mbollmann
Member

We are sure about no URLs breaking, right?

I could run another systematic comparison (build under old + new system and run a script to compare which pages are being generated) before we switch. Can't hurt to be too careful IMO.
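A minimal sketch of what such a comparison could look like, assuming each build is a directory tree in which every generated page is an `index.html` file. The function names and the two build roots are illustrative, not part of the actual build tooling.

```python
from pathlib import Path


def generated_pages(build_root: Path) -> set[str]:
    """Collect the directory of every index.html, relative to the build root."""
    return {
        p.parent.relative_to(build_root).as_posix()
        for p in build_root.rglob("index.html")
    }


def compare_builds(old_root: Path, new_root: Path) -> tuple[list[str], list[str]]:
    """Return (pages only in the old build, pages only in the new build)."""
    old_pages = generated_pages(old_root)
    new_pages = generated_pages(new_root)
    return sorted(old_pages - new_pages), sorted(new_pages - old_pages)
```

Any page that shows up only in the old build is a URL that has to be covered by a redirect, so that list is the one worth reviewing before the switch.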

Any current unverified author page will be forwarded to /unverified?

Unless there’s a bug here that I haven’t caught, yes:

# If the requested author page does not exist, soft-redirect [303 See Other] to the /unverified URL
RewriteCond %{REQUEST_FILENAME}/index.html !-f
RewriteCond %{REQUEST_FILENAME}/unverified -d
RewriteRule ^people/([^/]+)/?$ people/$1/unverified [L,R=303]
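For checking a build offline, the two conditions and the rule can be approximated in Python. This is only a sketch of the intended decision, assuming the document root mirrors the URL layout; `redirect_target` is a hypothetical helper, and it does not replace testing against Apache itself.

```python
import re
from pathlib import Path
from typing import Optional

# Mirrors the RewriteRule pattern: people/<slug> with an optional trailing slash.
AUTHOR_URL = re.compile(r"^people/([^/]+)/?$")


def redirect_target(docroot: Path, url_path: str) -> Optional[str]:
    """Return the redirect location for url_path, or None if no redirect applies."""
    match = AUTHOR_URL.match(url_path)
    if match is None:
        return None
    slug = match.group(1)
    page_dir = docroot / "people" / slug
    # First condition: the requested author page must not exist.
    if (page_dir / "index.html").is_file():
        return None
    # Second condition: an unverified directory must exist for this slug.
    if not (page_dir / "unverified").is_dir():
        return None
    return f"people/{slug}/unverified"
```

Running this over every author URL from the old build would flag any page that neither exists in the new build nor has an `/unverified` counterpart.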

@nschneid
Contributor

@mjpost thoughts on an advance announcement? It seems like a good way to address the fact that there are a lot of open correction issues, i.e. we are busy on important work and will get to those soon. :)

@mjpost
Member Author

mjpost commented Jan 10, 2026

Yes, we definitely need to do this. My thinking is:

  1. Complete New author system UI changes #6824
  2. Set date for merging New author system #6807 (can we do early next week?)
  3. Write blog post describing the new system and the ORCID iD aspirations
  4. Reach out to chairs and point them to the blog post

@nschneid
Contributor

https://preview.aclanthology.org/master-new-author-system-ui/people/yang-liu/unverified/:

image

https://preview.aclanthology.org/master-new-author-system-ui/people/yang-janet-liu/:

image

Where are the Chinese characters coming from? I thought unverified = no entry in our database, so how would we be able to disambiguate the Chinese spelling of the name? Is this a bug in the API?

@nschneid
Contributor

^ It seems that if a Person instance has multiple entries returned by names, and one of them is in a different script, it gets displayed in parentheses. My question is why/whether an unverified person should ever have variant names across scripts.

@mbollmann
Member

mbollmann commented Jan 10, 2026

Variants in foreign script can be recorded directly in the XML. EDIT: ...and if that author doesn't have an explicit ID, they're of course unverified.

@nschneid
Contributor

Ah. So my concern is, for the unverified Yang Liu page, 扬 刘 is but one possible Chinese variant of the name. Maybe it was the only one that happened to be present in an XML file where Yang Liu was unverified. But it seems misleading to show it in parentheses as if it represents ALL the unverified Yang Lius—some (presumably most) of whom didn't have a script variant in the XML.

I don't know if this is a representation problem in the library, or if the page should just not show the alternate script variant on unverified authors. Maybe this logic should be modified to check verification:

data["variant_entries"] = []
diff_script_variants = []
for n in person.names[1:]:
    data["variant_entries"].append(
        {"first": n.first, "last": n.last, "full": n.as_full()}
    )
    if n.script is not None:
        diff_script_variants.append(n.as_full())
if diff_script_variants:
    data["full"] = f"{data['full']} ({', '.join(diff_script_variants)})"
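One way the verification check could look, as a self-contained sketch. The `Name` and `Person` classes here are simplified stand-ins rather than the library's real classes, and `is_verified` is a hypothetical attribute; this is also not necessarily the change that was made in #6824.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Name:
    """Stand-in for the library's name object (assumption, simplified)."""
    first: Optional[str]
    last: str
    script: Optional[str] = None

    def as_full(self) -> str:
        return f"{self.first} {self.last}" if self.first else self.last


@dataclass
class Person:
    """Stand-in person; is_verified is a hypothetical flag."""
    names: list[Name]
    is_verified: bool


def build_name_data(person: Person) -> dict:
    data = {"full": person.names[0].as_full(), "variant_entries": []}
    diff_script_variants = []
    for n in person.names[1:]:
        data["variant_entries"].append(
            {"first": n.first, "last": n.last, "full": n.as_full()}
        )
        # Only verified persons get script variants in the page title, since a
        # variant on an unverified page may belong to just one of several people.
        if person.is_verified and n.script is not None:
            diff_script_variants.append(n.as_full())
    if diff_script_variants:
        data["full"] = f"{data['full']} ({', '.join(diff_script_variants)})"
    return data
```

With this variant, the script variant is still listed in `variant_entries` for an unverified person; it is only kept out of the parenthesized display name.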

@mbollmann
Member

I don't see how this would be a representation problem in the library; the author tag with the name variant needs to be resolved to an ID, and that can be an unverified one. Still, the Han name variant belongs to that author ID, so it will be returned as one of the possible names for this unverified person. That does appear perfectly correct to me.

@nschneid
Contributor

If we were being really nitpicky we might distinguish script variants that occur with all instances of an unverified author from ones that occur only some of the time (because there can always be multiple people under the same unverified author and their names can have different script variants). But this feels like overkill, and I was able to implement a front-end change in #6824 that removes this source of confusion.

@mbollmann
Member

If we were being really really nitpicky I would say different-script name variants don't really belong on the paper level at all, but instead could trigger/require a verified ID assignment so that the variant can be stored at the author level. :)

@mjpost
Member Author

mjpost commented Jan 12, 2026

Hi, I've run out of time for this stuff today, will revert and reconvert tomorrow.
