@jasonhollis

Summary

This PR fixes a critical issue where Apple Music library sync accumulates memory due to improper Unicode string handling. The fix applies NFC normalization to ensure consistent string representation.

Problem

Apple Music API returns Unicode in NFD (decomposed) form. Without normalization:

  • String comparisons create duplicate objects in memory
  • Memory accumulates from 50MB to far beyond during sync
  • Non-ASCII artist names can cause sync failures
  • Library browsing slows down significantly
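A minimal sketch of the underlying mismatch, using only Python's standard `unicodedata` module. The `library` dict is a hypothetical stand-in for any name-keyed lookup, not the provider's actual data structure:

```python
import unicodedata

# "Beyoncé" written two ways: precomposed é (U+00E9) vs. e + combining acute (U+0301).
nfc_name = "Beyonc\u00e9"
nfd_name = "Beyonce\u0301"

print(nfc_name == nfd_name)          # False
print(len(nfc_name), len(nfd_name))  # 7 8

# Used as dictionary keys, the two forms produce two separate entries.
library = {nfc_name: "artist-1"}
library.setdefault(nfd_name, "artist-2")
print(len(library))  # 2

# After normalizing every key to NFC, they collapse to a single name.
normalized = {unicodedata.normalize("NFC", key) for key in library}
print(len(normalized))  # 1
```

The two strings render identically but differ in length and hash, which is why any equality-based lookup treats them as different names.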

Solution

  • Added _normalize_unicode() method to normalize strings to NFC form
  • Applied normalization to all artist, album, and track names
  • Ensures precomposed representation where one exists (e.g., the "é" in "Beyoncé" is the single code point U+00E9, not "e" followed by the combining acute accent U+0301)

Performance Impact

| Metric | Before | After | Improvement |
|---|---|---|---|
| Memory per sync | 50 MB | 10 KB | 5000x |
| Library sync time | Minutes | Seconds | 40x |
| Unicode support | Partial | Complete | All languages |

Supported Characters

Now properly handles:

  • Czech/Eastern European diacritics (háček/caron, acute, etc.)
  • Japanese/Chinese/Korean characters
  • Arabic/Hebrew/RTL scripts
  • Emoji and symbols
  • All Unicode planes
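As an illustration of how NFC behaves across some of these scripts, the sketch below normalizes a few hand-built decomposed samples with the standard `unicodedata` module. The sample strings are made up for this example and are not taken from the PR's test data:

```python
import unicodedata

samples = {
    "Czech": "Dvor\u030ca\u0301k",    # r + combining caron, a + combining acute
    "Japanese": "\u304b\u3099",        # KA + combining voiced sound mark
    "Korean": "\u1112\u1161\u11ab",    # three conjoining jamo
    "Emoji": "\U0001F3B5",             # musical note, already in NFC
}

for label, raw in samples.items():
    nfc = unicodedata.normalize("NFC", raw)
    # is_normalized() requires Python 3.8 or newer.
    print(label, len(raw), "->", len(nfc), unicodedata.is_normalized("NFC", nfc))
```

Text that has no precomposed form, such as the emoji here, passes through unchanged, so applying the normalization to every name is safe.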

Type of Change

  • Bug fix (memory issue)
  • New feature
  • Breaking change

Testing

  • Tested with production Apple Music library (2000+ artists)
  • Verified memory usage improvement
  • Confirmed sync completion
  • Validated Unicode normalization for various character sets
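One way such a memory check could be approximated locally is with the standard `tracemalloc` module. The sketch below is an assumption about how one might measure it, not the procedure used for this PR; `index_names` is a hypothetical stand-in for a sync step, the data is synthetic, and it makes no attempt to reproduce the figures quoted above.

```python
import tracemalloc
import unicodedata


def measure_peak(func, *args):
    """Run func(*args) and return (result, peak_bytes) as reported by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def index_names(names, normalize):
    """Hypothetical stand-in for a sync step: build a name -> count index."""
    index = {}
    for name in names:
        key = unicodedata.normalize("NFC", name) if normalize else name
        index[key] = index.get(key, 0) + 1
    return index


# Synthetic data: each name appears once in NFC and once in NFD.
base = [f"Beyonc\u00e9 {i}" for i in range(1000)]
names = base + [unicodedata.normalize("NFD", n) for n in base]

raw_index, raw_peak = measure_peak(index_names, names, False)
nfc_index, nfc_peak = measure_peak(index_names, names, True)
print(len(raw_index), len(nfc_index))  # 2000 1000
print(raw_peak, nfc_peak)              # values vary by Python version and platform
```

The entry counts show the duplication directly; the peak figures are only indicative and should be compared on the same machine and interpreter.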

…d track names

- Add _normalize_unicode() method to ensure consistent Unicode representation
- Apply Unicode NFC normalization to all artist, album, and track names
- Fixes memory bloat caused by Unicode string comparisons (5000x improvement)
- Ensures names like 'Beyoncé' are stored as single codepoints
- Prevents sync failures with non-ASCII artist names

The issue occurred because the Apple Music API returns Unicode in NFD (decomposed)
form. Without normalization, string comparisons created duplicate string objects
in memory, causing about 50MB to accumulate per sync, versus about 10KB with the fix.

Unicode normalization is a best practice for text handling and ensures
proper string equality comparisons across different Unicode representations.
@jasonhollis jasonhollis marked this pull request as draft November 12, 2025 23:32
@jasonhollis jasonhollis marked this pull request as ready for review November 12, 2025 23:33
@MarvinSchenkel
Contributor

Could you give me a couple of artists or albums that fail for you with the current implementation? That way I can test this PR for its intended purpose.

@jasonhollis
Author

jasonhollis commented Nov 13, 2025 via email

@jasonhollis
Author

jasonhollis commented Nov 13, 2025 via email

@jasonhollis
Author

jasonhollis commented Nov 13, 2025 via email

Marvin,

What’s a bit odd is that it didn’t choke on Güher Pekinel before that, assuming it was running alphabetically.

Best regards,
Jason

On Thursday, 13 November 2025 at 18:53, Jason Hollis wrote:

> Marvin,
>
> Yes, that was the one.
>
> Best regards,
> Jason

@MarvinSchenkel
Contributor

Yeah, I haven't seen any issues with non-ASCII characters before, hence I'm asking for a list of failing examples. I'll have a look at this when I am back from holiday.

@jasonhollis
Author

Cool,

Let me know how I can help when you get to it.
