fix: handle Unicode characters in h5ad metadata without misleading format error #2762

Closed
fresh3nough wants to merge 1 commit into chanzuckerberg:main from fresh3nough:fix/2754-unicode-metadata-error

Summary

Fixes #2754

Loading an h5ad file whose metadata contains non-ASCII Unicode characters (e.g. the Greek letter phi in a cell label such as Interstitial Mφ perivascular) produced a misleading error: File must be in the .h5ad format. The file was valid h5ad; the broad except ValueError handler in _load_data() discarded the original error context.
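One way a broad handler can hide an encoding problem: in Python, UnicodeDecodeError and UnicodeEncodeError are subclasses of ValueError, so a bare except ValueError catches them too. The sketch below is illustrative only. DatasetAccessError and load_with_context are hypothetical names, and it assumes encoding failures surface as UnicodeError, which this PR description does not confirm.

```python
class DatasetAccessError(Exception):
    """Hypothetical stand-in for the server's dataset error type."""


def load_with_context(read_fn, path):
    """Call read_fn(path) and keep the original error text on failure.

    read_fn is any callable that reads a file and may raise ValueError.
    """
    try:
        return read_fn(path)
    except UnicodeError as e:
        # UnicodeError subclasses ValueError, so it must be handled first,
        # otherwise the generic branch below would swallow it.
        raise DatasetAccessError(
            f"Unicode/encoding problem while reading {path}: {e}"
        ) from e
    except ValueError as e:
        # Keep the generic message, but append the original error for debugging.
        raise DatasetAccessError(
            f"File must be in the .h5ad format. Original error: {e}"
        ) from e
```

Raising with "from e" also preserves the original traceback on the new exception's __cause__ attribute.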

Changes

server/data_anndata/anndata_adaptor.py

  • Improved ValueError handling in _load_data() to detect Unicode/encoding-related errors and display a meaningful message instead of the generic format error
  • Appended the original error message to the generic format error for all other ValueError cases, aiding debugging
  • Added _normalize_unicode_strings() static method that normalizes all string metadata in obs/var DataFrames to NFC form during _validate_and_initialize(), preventing encoding issues during serialization
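As a rough illustration of the NFC normalization described in the last bullet, the following sketch normalizes string values in a pandas DataFrame. It is not the PR's _normalize_unicode_strings() implementation, which is not shown here; the function name and the handling of categorical columns are assumptions.

```python
import unicodedata

import pandas as pd


def normalize_unicode_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with string values normalized to NFC (sketch)."""

    def nfc(value):
        # Leave non-string values (numbers, NaN, None) untouched.
        return unicodedata.normalize("NFC", value) if isinstance(value, str) else value

    out = df.copy()
    for name in out.columns:
        col = out[name]
        if isinstance(col.dtype, pd.CategoricalDtype):
            # Rebuild the categorical so labels that become identical after
            # normalization collapse into one category (ordering is not kept).
            out[name] = col.astype(object).map(nfc).astype("category")
        elif pd.api.types.is_object_dtype(col) or pd.api.types.is_string_dtype(col):
            out[name] = col.map(nfc)
    return out
```

In an adaptor, a helper like this would presumably be applied to both the obs and var DataFrames; labels such as Interstitial Mφ perivascular are already in NFC form and pass through unchanged.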

test/unit/data_anndata/test_unicode_metadata.py

  • Added regression tests that load h5ad files containing Greek letters, accented characters, and other non-ASCII Unicode in metadata
  • Added test verifying NFC normalization is applied to obs metadata strings
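The shape of such a normalization test can be sketched with the standard library alone. This is not the PR's test file; the class and method names are hypothetical, and it only checks the Unicode behavior the tests rely on.

```python
import unicodedata
import unittest


class UnicodeNormalizationSketch(unittest.TestCase):
    def test_nfc_merges_equivalent_spellings(self):
        composed = "caf\u00e9"     # single code point for e-acute
        decomposed = "cafe\u0301"  # "e" followed by a combining acute accent
        # The two spellings render identically but compare unequal.
        self.assertNotEqual(composed, decomposed)
        self.assertEqual(unicodedata.normalize("NFC", decomposed), composed)

    def test_greek_letter_is_unchanged(self):
        label = "Interstitial M\u03c6 perivascular"
        # phi has no decomposition, so NFC leaves the label as-is.
        self.assertEqual(unicodedata.normalize("NFC", label), label)
```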

Steps to reproduce the original bug

  1. Create an h5ad file with a metadata column containing φ (or similar non-ASCII characters)
  2. Run cellxgene launch anndata.h5ad
  3. Observe misleading error: File must be in the .h5ad format

Steps to verify the fix

python -m unittest test.unit.data_anndata.test_unicode_metadata -v

All 3 new tests pass, and existing data_anndata tests remain unaffected.

@fresh3nough fresh3nough closed this by deleting the head repository Mar 5, 2026
