Improve Sonarr and Radarr sync performance #3306

Open
mjc wants to merge 9 commits into morpheus65535:development from mjc:optimize-sonarr-sync-indexes

mjc commented Apr 22, 2026

Summary

This PR improves Sonarr and Radarr sync performance by removing repeated work from full library syncs and adding indexes for common sync lookup paths.

The main changes are:

  • Add SQLite indexes used by Sonarr episode sync and wanted-subtitle queries
  • Avoid duplicate Sonarr episode syncs during full-series sync
  • Reuse Sonarr profile/tag/language context across a full sync
  • Skip unchanged Sonarr series row updates
  • Collapse duplicate per-series episode DB queries
  • Avoid unnecessary episode file size checks when Sonarr already reports a valid size
  • Cache episode parser settings during a series sync
  • Replace Radarr O(n²) movie comparison with keyed lookups

Performance notes

Tested against a large Bazarr database: thousands of series, tens of thousands of episodes, and thousands of movies. Measurements came from live profiling, SQLite EXPLAIN, copied-database timing, and focused micro-benchmarks.

  • SQLite EXPLAIN previously showed SCAN table_episodes for per-series Sonarr episode lookups and wanted-subtitle counts. With the new indexes, those plans use indexed SEARCH operations.
  • A copied-database sweep over all series returned on the order of 100k episode rows per round. Median CPU dropped from 160.239s to 0.327s; wall time dropped from 161.061s to 0.329s.
  • Full Sonarr sync could roughly double per-series episode sync work because update_series() called sync_episodes() directly and update_one_series() could also call it. The full sync path now keeps one explicit episode sync per processed series.
  • Full Sonarr sync now computes profile/tag/language context once and passes the already fetched Sonarr series payload through, while standalone callers still fetch as before.
  • Unchanged series rows can now return before issuing an UPDATE, avoiding thousands of unnecessary SQLite writes on large libraries.
  • Episode sync now derives episode ids from the full-row query, removing one duplicate per-series SELECT.
  • episodeParser() now trusts Sonarr's reported file size first and only falls back to a filesystem stat when needed.
  • Parser settings that were resolved repeatedly through Dynaconf are now resolved once per series sync and passed into the parser.
  • Radarr movie comparison now uses a radarrId keyed dictionary. With thousands of movies, the old subset scan used 0.955625s CPU in a micro-benchmark; the keyed lookup used 0.001418s CPU, about 674x faster for that comparison step.

Test plan

  • Added regression coverage for standalone Sonarr series refreshes so API/manual/SignalR callers still process fetched series data
  • Added regression coverage that full-series Sonarr sync only performs one explicit episode sync per processed series
  • Added regression coverage that unchanged Sonarr series rows skip UPDATE while manual callers still sync episodes
  • Added regression coverage that episodeParser() avoids filesystem stat calls when Sonarr's reported size is already valid
  • Added regression coverage that Radarr movie comparison checks the matching radarrId row rather than scanning every movie row

Copilot AI review requested due to automatic review settings April 22, 2026 20:07
mjc added 9 commits April 22, 2026 14:09
Measured on a large Bazarr database with thousands of series, tens of thousands of episodes, and thousands of movies.

Before this migration, EXPLAIN showed SCAN table_episodes for the per-series Sonarr episode lookup and for the wanted-subtitle count. After adding idx_table_episodes_sonarrSeriesId and the partial missing_subtitles indexes, those plans changed to indexed SEARCH operations.

A copied-database sweep over all series returned on the order of 100k episode rows per round. Median CPU fell from 160.239s without the indexes to 0.327s with them, and wall time fell from 161.061s to 0.329s.
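
The plan change described above can be reproduced on a toy database. This is a minimal sketch, not the migration itself: the table schema and row counts are made up, and the index is assumed to cover only the `sonarrSeriesId` column, as its name suggests. It uses `EXPLAIN QUERY PLAN`, the SQLite statement that reports `SCAN` and `SEARCH` steps.

```python
import sqlite3

# Toy schema: only the columns needed for the lookup are modelled (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_episodes ("
    "sonarrEpisodeId INTEGER PRIMARY KEY, "
    "sonarrSeriesId INTEGER, "
    "title TEXT)"
)
conn.executemany(
    "INSERT INTO table_episodes (sonarrSeriesId, title) VALUES (?, ?)",
    [(series_id, f"episode {n}") for series_id in range(50) for n in range(20)],
)

QUERY = "SELECT * FROM table_episodes WHERE sonarrSeriesId = ?"


def query_plan(connection, sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Without an index, SQLite has to walk the whole table.
plan_before = query_plan(conn, QUERY, (7,))

conn.execute(
    "CREATE INDEX idx_table_episodes_sonarrSeriesId "
    "ON table_episodes (sonarrSeriesId)"
)

# With the index, the same query becomes an indexed lookup.
plan_after = query_plan(conn, QUERY, (7,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch only relies on the `SCAN` and `SEARCH` keywords.
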
Code inspection and live profiling showed update_series() called update_one_series() and then sync_episodes(), while update_one_series() also called sync_episodes() for non-SignalR updates.

On a library with thousands of series, the full Sonarr pass could therefore roughly double per-series episode sync work before doing any useful episode comparison work.

This adds an explicit flag so the full-series loop updates the series row once and keeps its existing single sync_episodes(series_id=...) call.
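
A rough sketch of the flag idea follows. The function names mirror the ones in this commit message, but the bodies are stand-ins, and the parameter names `is_signalr` and `sync_episodes_after` are hypothetical rather than the names used in the actual change.

```python
sync_calls = []


def sync_episodes(series_id):
    # Stand-in for the real episode sync; only records that it ran.
    sync_calls.append(series_id)


def update_one_series(series_id, is_signalr=False, sync_episodes_after=True):
    # ... the series row update would happen here ...
    # Standalone callers keep the default and still get an episode sync.
    if sync_episodes_after and not is_signalr:
        sync_episodes(series_id=series_id)


def update_series(series_ids):
    for series_id in series_ids:
        # The full sync opts out of the nested call and keeps its own
        # single explicit episode sync per processed series.
        update_one_series(series_id, sync_episodes_after=False)
        sync_episodes(series_id=series_id)


update_series([1, 2, 3])
full_sync_calls = list(sync_calls)

sync_calls.clear()
update_one_series(42)
standalone_calls = list(sync_calls)
```
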
Live py-spy samples after restart showed full-series sync spending time in get_language_profiles() and update_one_series() setup while processing the same full Sonarr series payload already fetched by update_series().

Before this change, update_one_series() rebuilt the audio profile list, tag map, and language profiles once per series on a library with thousands of series, and its standalone path could also fetch the series from Sonarr again.

The full sync path now computes the shared profile/tag context once, passes the already fetched show payload into update_one_series(), and leaves the per-series fallback behavior for standalone and SignalR calls.
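
One way to picture this is optional parameters with a per-call fallback. The sketch below is illustrative only: `SyncContext`, `build_context()`, and `fetch_series()` are hypothetical helpers, and the counters exist just to show how often the expensive work runs.

```python
from dataclasses import dataclass

build_count = 0
fetch_count = 0


@dataclass(frozen=True)
class SyncContext:
    # Hypothetical container for the shared profile/tag data.
    audio_profiles: tuple
    tags: dict
    language_profiles: tuple


def build_context():
    global build_count
    build_count += 1
    return SyncContext(
        audio_profiles=("default",),
        tags={1: "anime"},
        language_profiles=("en",),
    )


def fetch_series(series_id):
    # Stand-in for a Sonarr API request.
    global fetch_count
    fetch_count += 1
    return {"id": series_id, "title": f"Series {series_id}"}


def update_one_series(series_id, context=None, show=None):
    # Standalone and SignalR-style callers pass nothing and get the fallback.
    if context is None:
        context = build_context()
    if show is None:
        show = fetch_series(series_id)
    return show["title"], context.language_profiles


def update_series(shows):
    # The full sync builds the context once and reuses the fetched payloads.
    context = build_context()
    return [
        update_one_series(show["id"], context=context, show=show)
        for show in shows
    ]


update_series([{"id": i, "title": f"Series {i}"} for i in range(3)])
counts_after_full_sync = (build_count, fetch_count)

update_one_series(9)
counts_after_standalone = (build_count, fetch_count)
```
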
Live py-spy samples still showed the full sync active inside database.execute(update(TableShows)...) for Sonarr series rows even when the parsed values were unchanged.

On the measured library, that meant thousands of SQLite update transactions during a full Sonarr pass before episode sync work. Comparing the parsed series dict to the existing row lets unchanged series return before issuing UPDATE.

Standalone non-SignalR calls still keep their episode sync behavior when the row is unchanged, so this only removes the avoidable series-row write.
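
The unchanged-row check can be sketched as a plain dictionary comparison. The helper names and the sample columns below are hypothetical, and the real code compares against a database row rather than a dict literal.

```python
writes = []


def row_needs_update(parsed, existing_row):
    """Return True when any parsed field differs from the stored row."""
    return any(existing_row.get(key) != value for key, value in parsed.items())


def update_series_row(parsed, existing_row):
    if not row_needs_update(parsed, existing_row):
        # Nothing changed: skip the UPDATE entirely.
        return False
    # Stand-in for the real database UPDATE.
    writes.append(parsed["sonarrSeriesId"])
    return True


existing = {
    "sonarrSeriesId": 7,
    "title": "Show",
    "monitored": True,
    "updated_at": "2026-01-01",  # extra stored column, ignored by the check
}
unchanged = {"sonarrSeriesId": 7, "title": "Show", "monitored": True}
changed = {"sonarrSeriesId": 7, "title": "Show (2026)", "monitored": True}

wrote_unchanged = update_series_row(unchanged, existing)
wrote_changed = update_series_row(changed, existing)
```
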
Live py-spy samples during startup sync showed sync_episodes() spending time in _fetchall_impl/all at the TableEpisodes lookup for each series.

Code inspection showed two queries for the same sonarrSeriesId: one query for episode ids and a second full-row query used for comparisons.

The full-row query already contains the sonarrEpisodeId keys, so this derives the id list from that dictionary and removes one per-series SQLite SELECT from the measured thousands-series sync path.
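
A minimal sketch of deriving the id list from the full-row result is shown below. It uses the standard-library `sqlite3` module and a toy schema to stay self-contained, whereas the code under review goes through the project's own database layer; `load_series_episodes()` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE table_episodes ("
    "sonarrEpisodeId INTEGER PRIMARY KEY, "
    "sonarrSeriesId INTEGER, "
    "title TEXT)"
)
conn.executemany(
    "INSERT INTO table_episodes VALUES (?, ?, ?)",
    [(101, 7, "Pilot"), (102, 7, "Second"), (201, 8, "Other show")],
)


def load_series_episodes(connection, series_id):
    # One SELECT per series: the full rows used for comparisons.
    rows = connection.execute(
        "SELECT * FROM table_episodes WHERE sonarrSeriesId = ?",
        (series_id,),
    ).fetchall()
    episodes_by_id = {row["sonarrEpisodeId"]: dict(row) for row in rows}
    # The id list falls out of the dictionary keys, so no second query.
    episode_ids = list(episodes_by_id)
    return episode_ids, episodes_by_id


episode_ids, episodes_by_id = load_series_episodes(conn, 7)
```
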
After the earlier Sonarr sync reductions, live py-spy samples showed episodeParser() spending CPU in os.path.getsize()/genericpath for episode files.

sync_episodes() already accepts an episode when Sonarr reports episodeFile.size above MINIMUM_VIDEO_SIZE, but episodeParser() still re-statted the path for every parsed episode.

This trusts Sonarr's reported size first and only falls back to os.path.getsize() when the reported size is too small and the file is not an enabled .strm entry, removing the common per-episode filesystem stat from startup sync.
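
The ordering of the checks might look roughly like the sketch below. `resolve_file_size()` is a hypothetical helper, the `MINIMUM_VIDEO_SIZE` value is a placeholder, and what the real parser returns for an enabled `.strm` entry is an assumption here.

```python
import os

MINIMUM_VIDEO_SIZE = 20480  # placeholder threshold in bytes (assumption)


def resolve_file_size(reported_size, path, strm_enabled=False,
                      getsize=os.path.getsize):
    # Fast path: trust the size Sonarr already reported.
    if reported_size and reported_size > MINIMUM_VIDEO_SIZE:
        return reported_size
    # Enabled .strm entries are small by design, so do not stat them.
    # Returning the reported size here is an assumption of this sketch.
    if strm_enabled and path.lower().endswith(".strm"):
        return reported_size or 0
    # Slow path: only now touch the filesystem.
    try:
        return getsize(path)
    except OSError:
        return 0


stat_calls = []


def fake_getsize(path):
    # Records each call so the demo can show when a stat would happen.
    stat_calls.append(path)
    return 5_000_000


trusted = resolve_file_size(700_000_000, "/tv/a.mkv", getsize=fake_getsize)
stat_calls_after_trusted = list(stat_calls)

fallback = resolve_file_size(0, "/tv/b.mkv", getsize=fake_getsize)
strm = resolve_file_size(0, "/tv/c.strm", strm_enabled=True,
                         getsize=fake_getsize)
```

Injecting `getsize` as a parameter is only there to make the sketch observable without real files.
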
After avoiding the repeated file stat, live py-spy samples showed remaining parser overhead in Dynaconf setting resolution, including recursively_evaluate_lazy_format, from settings.general.enable_strm_support and settings.general.parse_embedded_audio_track.

Those settings do not change inside one sync_episodes() pass, but the old code resolved them for each parsed episode.

This resolves both settings once per series sync and passes the values to episodeParser(), while episodeParser() keeps fallback reads for standalone callers such as sync_one_episode().
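
The pattern of resolving settings once and passing them down can be sketched as follows. `CountingSettings` is a hypothetical stand-in for the Dynaconf settings object that counts reads, and the lowercase function names and parameter names are illustrative rather than the real signatures.

```python
class CountingSettings:
    """Stand-in settings object that counts how often a value is resolved."""

    def __init__(self, **values):
        self._values = values
        self.reads = 0

    def get(self, name):
        self.reads += 1
        return self._values[name]


def episode_parser(episode, settings, strm_support=None,
                   parse_embedded_audio=None):
    # Fallback reads keep standalone callers working unchanged.
    if strm_support is None:
        strm_support = settings.get("enable_strm_support")
    if parse_embedded_audio is None:
        parse_embedded_audio = settings.get("parse_embedded_audio_track")
    return {
        "id": episode["id"],
        "strm": strm_support,
        "embedded_audio": parse_embedded_audio,
    }


def sync_episodes(episodes, settings):
    # Resolve both settings once per series sync, not once per episode.
    strm_support = settings.get("enable_strm_support")
    parse_embedded_audio = settings.get("parse_embedded_audio_track")
    return [
        episode_parser(ep, settings, strm_support, parse_embedded_audio)
        for ep in episodes
    ]


settings = CountingSettings(enable_strm_support=False,
                            parse_embedded_audio_track=True)
parsed = sync_episodes([{"id": n} for n in range(100)], settings)
reads_after_sync = settings.reads

episode_parser({"id": 1}, settings)
reads_after_standalone = settings.reads
```
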
The old full movie sync compared each parsed Radarr movie against every database movie row with any(parsed_movie.items() <= x for x in current_movies_db_kv), making the unchanged-row check O(movie_count squared).

Measured with thousands of movies, the old subset scan used 0.955625s CPU in a micro-benchmark, while the keyed radarrId lookup used 0.001418s CPU, about 674x faster for the comparison step.

This builds one radarrId-keyed dictionary from TableMovies, uses a set for the existing id membership check, and compares each parsed movie only against its matching row.
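
The two comparison strategies can be put side by side on sample data. This is a sketch under assumptions: the helper names and movie fields are made up, and both functions simply return the movies that would need a write. They agree as long as `radarrId` is unique per row.

```python
def changed_movies_scan(parsed_movies, db_movies):
    # Old shape: every parsed movie is checked against every database row.
    db_items = [row.items() for row in db_movies]
    return [
        movie for movie in parsed_movies
        if not any(movie.items() <= items for items in db_items)
    ]


def changed_movies_keyed(parsed_movies, db_movies):
    # New shape: one dictionary keyed by radarrId, one comparison per movie.
    db_by_id = {row["radarrId"]: row for row in db_movies}
    existing_ids = set(db_by_id)
    changed = []
    for movie in parsed_movies:
        if movie["radarrId"] not in existing_ids:
            changed.append(movie)  # not in the database yet
        elif not movie.items() <= db_by_id[movie["radarrId"]].items():
            changed.append(movie)  # at least one field differs
    return changed


db_movies = [
    {"radarrId": i, "title": f"Movie {i}", "monitored": True,
     "path": f"/movies/{i}"}
    for i in range(1, 6)
]
parsed_movies = [
    {"radarrId": i, "title": f"Movie {i}", "monitored": True}
    for i in range(1, 6)
]
parsed_movies[2]["title"] = "Movie 3 (Extended)"  # simulate a changed title
parsed_movies.append({"radarrId": 99, "title": "New Movie", "monitored": True})

scan_ids = [m["radarrId"] for m in changed_movies_scan(parsed_movies, db_movies)]
keyed_ids = [m["radarrId"] for m in changed_movies_keyed(parsed_movies, db_movies)]
```
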
Covers the review risks around standalone Sonarr series refreshes, the full-series explicit episode sync path, unchanged series update skipping, episodeParser file-size handling, and keyed Radarr movie comparisons.

These tests keep the measured optimizations from regressing while avoiding exact library-size assumptions.

mjc force-pushed the optimize-sonarr-sync-indexes branch from 3967b54 to 4d2acd2 April 22, 2026 20:10

mjc (Author) commented Apr 22, 2026

Force-pushed the branch only to re-sign the commits and fix GitHub commit verification. No code changes were made in that push.

Copilot AI left a comment

Pull request overview

This PR improves Bazarr’s Sonarr/Radarr sync performance by reducing repeated per-item work during full syncs and adding database indexes that align with common subtitle-sync query patterns.

Changes:

  • Add SQLite indexes to speed up Sonarr episode lookups and “wanted subtitles” queries.
  • Reduce redundant Sonarr sync work (reuse context across a full sync, avoid duplicate episode syncs, skip unchanged series updates, collapse duplicate episode DB queries).
  • Replace Radarr’s O(n²) “is this movie changed?” comparison with keyed lookups, and add regression tests for the new fast paths.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/bazarr/test_sync_performance_paths.py | Adds regression tests covering the optimized Sonarr/Radarr sync paths and parser behavior changes. |
| migrations/versions/6f0b2c8d9a1e_.py | Adds indexes for episodes/movies missing-subtitles queries and per-series episode lookups. |
| bazarr/sonarr/sync/series.py | Reuses Sonarr context across full syncs, avoids duplicate episode syncs, and skips unchanged series updates. |
| bazarr/sonarr/sync/parser.py | Avoids filesystem stat calls when Sonarr’s reported episode file size is already valid; caches settings via parameters. |
| bazarr/sonarr/sync/episodes.py | Removes a redundant per-series DB query and passes cached parser settings to reduce repeated config access. |
| bazarr/radarr/sync/movies.py | Replaces per-movie subset scans with radarrId-keyed comparisons to avoid O(n²) behavior. |

@morpheus65535
Owner

Could you review this PR against the recent changes in the development branch? We've already added indexes, and there have been other DB migrations that must be taken into account.

If you must create a new PR, please split the .strm file support and the sync performance improvements into two separate PRs.

Thanks!
