IA import dropped Unicode characters #11637

@cdrini

Problem

The import at https://openlibrary.org/recentchanges/2025/12/30/edit-book/160483814 dropped Unicode characters such as é and è from the subjects. The same subjects appear correctly on Internet Archive.

Reproducing the bug

  1. Locally, import https://openlibrary.org/import/preview?source=ia:worldalmanacbook1991mark
  • Expected behavior: subjects include the Unicode characters exactly as they appear on Internet Archive
  • Actual behavior: Unicode characters are dropped from the subjects
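
To confirm the symptom after running the preview, the two subject lists can be compared programmatically. The following is a minimal sketch, not Open Library code: `strip_non_ascii` and `find_dropped_unicode` are hypothetical helper names, the sample subjects are invented for illustration, and it assumes "dropped" means the non-ASCII characters are removed outright rather than replaced.

```python
import unicodedata


def strip_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character removed (the assumed failure mode)."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def find_dropped_unicode(ia_subjects, ol_subjects):
    """Return (ia_subject, ol_subject) pairs where the Open Library subject
    equals the Internet Archive subject minus its non-ASCII characters."""
    ol_set = set(ol_subjects)
    dropped = []
    for subject in ia_subjects:
        # Normalize to NFC so composed and decomposed accents compare equally.
        normalized = unicodedata.normalize("NFC", subject)
        stripped = strip_non_ascii(normalized)
        if stripped != normalized and normalized not in ol_set and stripped in ol_set:
            dropped.append((normalized, stripped))
    return dropped


# Hypothetical sample data, not taken from the actual record.
ia = ["Almanacs", "Caf\u00e9 culture"]
ol = ["Almanacs", "Caf culture"]
print(find_dropped_unicode(ia, ol))
```

An empty result means no subject matches this particular pattern; it does not rule out other kinds of mangling, such as replacement characters or mojibake.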

Context

  • Browser (Chrome, Safari, Firefox, etc):
  • OS (Windows, Mac, etc):
  • Logged in (Y/N):
  • Environment (prod, dev, local): prod

Breakdown

Requirements Checklist

  • [ ]

Related files

Stakeholders


Instructions for Contributors

  • Please run these commands to ensure your repository is up to date before creating a new branch to work on this issue and each time after pushing code to Github, because the pre-commit bot may add commits to your PRs upstream.

Metadata

Assignees

No one assigned

    Labels

    • Needs: Breakdown (This big issue needs a checklist or subissues to describe a breakdown of work. [managed])
    • Needs: Lead
    • Needs: Response (Issues which require feedback from lead)
    • Needs: Triage (This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed])
    • Type: Bug (Something isn't working. [managed])

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
