Improve Sonarr and Radarr sync performance #3306
Measured on a large Bazarr database with thousands of series, tens of thousands of episodes, and thousands of movies. Before this migration, EXPLAIN showed SCAN table_episodes for the per-series Sonarr episode lookup and for the wanted-subtitle count. After adding idx_table_episodes_sonarrSeriesId and the partial missing_subtitles indexes, those plans changed to indexed SEARCH operations. A copied-database sweep over all series returned on the order of 100k episode rows per round. Median CPU fell from 160.239s without the indexes to 0.327s with them, and wall time fell from 161.061s to 0.329s.
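The plan change described above can be reproduced in miniature with Python's built-in sqlite3 module. This is only a sketch: the three-column schema is a simplified stand-in for the real table_episodes, the `plan` helper is hypothetical, and the partial missing_subtitles indexes are left out because their exact definitions are not given here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in schema; the real table has many more columns.
conn.execute(
    "CREATE TABLE table_episodes ("
    " sonarrEpisodeId INTEGER PRIMARY KEY,"
    " sonarrSeriesId INTEGER,"
    " missing_subtitles TEXT)"
)

QUERY = "SELECT * FROM table_episodes WHERE sonarrSeriesId = ?"


def plan(connection, sql, params=()):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of every plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(conn, QUERY, (1,))

conn.execute(
    "CREATE INDEX idx_table_episodes_sonarrSeriesId"
    " ON table_episodes (sonarrSeriesId)"
)
plan_after = plan(conn, QUERY, (1,))

print(plan_before)  # a full-table SCAN of table_episodes
print(plan_after)   # a SEARCH using idx_table_episodes_sonarrSeriesId
```

The exact wording of the plan text varies between SQLite versions, so only the SCAN versus SEARCH distinction should be relied on.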
Code inspection and live profiling showed update_series() called update_one_series() and then sync_episodes(), while update_one_series() also called sync_episodes() for non-SignalR updates. On a library with thousands of series, the full Sonarr pass could therefore roughly double per-series episode sync work before doing any useful episode comparison work. This adds an explicit flag so the full-series loop updates the series row once and keeps its existing single sync_episodes(series_id=...) call.
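A minimal sketch of the call pattern this commit describes. The function names come from the commit message, but the signatures are simplified and the flag name `sync_episodes_after` is a hypothetical placeholder, since the real flag is not named here.

```python
calls = []


def sync_episodes(series_id):
    # Stand-in for the real episode sync; only records that it ran.
    calls.append(series_id)


def update_one_series(series_id, sync_episodes_after=True):
    # ... the series row update would happen here ...
    if sync_episodes_after:
        # Standalone (non-SignalR) callers keep the implicit episode sync.
        sync_episodes(series_id=series_id)


def update_series(series_ids):
    for series_id in series_ids:
        # The full-series loop opts out of the implicit sync
        # and keeps its own single explicit call.
        update_one_series(series_id, sync_episodes_after=False)
        sync_episodes(series_id=series_id)


update_series([1, 2, 3])
print(calls)  # each series id appears exactly once
```

Defaulting the flag to the old behavior means existing standalone callers do not need to change.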
Live py-spy samples after restart showed full-series sync spending time in get_language_profiles() and update_one_series() setup while processing the same full Sonarr series payload already fetched by update_series(). Before this change, update_one_series() rebuilt the audio profile list, tag map, and language profiles once per series on a library with thousands of series, and its standalone path could also fetch the series from Sonarr again. The full sync path now computes the shared profile/tag context once, passes the already fetched show payload into update_one_series(), and leaves the per-series fallback behavior for standalone and SignalR calls.
Live py-spy samples still showed the full sync active inside database.execute(update(TableShows)...) for Sonarr series rows even when the parsed values were unchanged. On the measured library, that meant thousands of SQLite update transactions during a full Sonarr pass before episode sync work. Comparing the parsed series dict to the existing row lets unchanged series return before issuing UPDATE. Standalone non-SignalR calls still keep their episode sync behavior when the row is unchanged, so this only removes the avoidable series-row write.
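One way to express the unchanged-row check, as a sketch rather than the actual implementation: `row_needs_update` and `update_series_row` are hypothetical helpers, and the `writes` list stands in for the real `database.execute(update(TableShows)...)` call.

```python
writes = []


def row_needs_update(parsed, existing):
    """Return True when any parsed value differs from the stored row."""
    if existing is None:
        return True
    return any(existing.get(key) != value for key, value in parsed.items())


def update_series_row(parsed, existing):
    if not row_needs_update(parsed, existing):
        return False  # unchanged: skip the UPDATE entirely
    # Stand-in for issuing the UPDATE statement.
    writes.append(parsed["sonarrSeriesId"])
    return True


stored = {"sonarrSeriesId": 7, "title": "Example", "path": "/tv/example"}
same = {"sonarrSeriesId": 7, "title": "Example"}
renamed = {"sonarrSeriesId": 7, "title": "Example (2024)"}

update_series_row(same, stored)     # no write recorded
update_series_row(renamed, stored)  # one write recorded
```

Comparing only the parsed keys lets the stored row carry extra columns without forcing a write.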
Live py-spy samples during startup sync showed sync_episodes() spending time in _fetchall_impl/all at the TableEpisodes lookup for each series. Code inspection showed two queries for the same sonarrSeriesId: one query for episode ids and a second full-row query used for comparisons. The full-row query already contains the sonarrEpisodeId keys, so this derives the id list from that dictionary and removes one per-series SQLite SELECT from the measured thousands-series sync path.
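A small sketch of deriving the id list from the full-row result instead of running a second query. It uses sqlite3 directly rather than Bazarr's database layer, and `load_series_episodes` plus the trimmed three-column schema are assumptions made for the example.

```python
import sqlite3


def load_series_episodes(conn, series_id):
    """Fetch the full rows once and derive the id list from them."""
    cursor = conn.execute(
        "SELECT sonarrEpisodeId, title FROM table_episodes"
        " WHERE sonarrSeriesId = ?",
        (series_id,),
    )
    rows_by_id = {
        episode_id: {"sonarrEpisodeId": episode_id, "title": title}
        for episode_id, title in cursor
    }
    # The dictionary keys already are the episode ids, so no second SELECT.
    return list(rows_by_id), rows_by_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_episodes ("
    " sonarrEpisodeId INTEGER PRIMARY KEY,"
    " sonarrSeriesId INTEGER,"
    " title TEXT)"
)
conn.executemany(
    "INSERT INTO table_episodes VALUES (?, ?, ?)",
    [(10, 1, "Pilot"), (11, 1, "Second"), (20, 2, "Other")],
)

episode_ids, episodes = load_series_episodes(conn, 1)
```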
After the earlier Sonarr sync reductions, live py-spy samples showed episodeParser() spending CPU in os.path.getsize()/genericpath for episode files. sync_episodes() already accepts an episode when Sonarr reports episodeFile.size above MINIMUM_VIDEO_SIZE, but episodeParser() still re-statted the path for every parsed episode. This trusts Sonarr's reported size first and only falls back to os.path.getsize() when the reported size is too small and the file is not an enabled .strm entry, removing the common per-episode filesystem stat from startup sync.
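The decision order described above might look like the sketch below. `has_usable_size` is a hypothetical helper, the `MINIMUM_VIDEO_SIZE` value is a placeholder, and accepting an enabled .strm entry without a stat is an assumption about how that case is resolved.

```python
import os

MINIMUM_VIDEO_SIZE = 20480  # placeholder value for this sketch, in bytes


def has_usable_size(reported_size, path, strm_enabled, getsize=os.path.getsize):
    # 1. Trust the size Sonarr already reported.
    if reported_size > MINIMUM_VIDEO_SIZE:
        return True
    # 2. Assumption: an enabled .strm entry is accepted without a stat.
    if strm_enabled and path.lower().endswith(".strm"):
        return True
    # 3. Only now touch the filesystem.
    try:
        return getsize(path) > MINIMUM_VIDEO_SIZE
    except OSError:
        return False


stat_calls = []


def fake_getsize(path):
    # Test double that records every stat request.
    stat_calls.append(path)
    return 50_000


has_usable_size(1_000_000, "/tv/a.mkv", False, fake_getsize)  # no stat
has_usable_size(0, "/tv/b.mkv", False, fake_getsize)          # one stat
```

Passing `getsize` as a parameter is only there so the sketch can count filesystem calls without touching real files.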
After avoiding the repeated file stat, live py-spy samples showed remaining parser overhead in Dynaconf setting resolution, including recursively_evaluate_lazy_format, from settings.general.enable_strm_support and settings.general.parse_embedded_audio_track. Those settings do not change inside one sync_episodes() pass, but the old code resolved them for each parsed episode. This resolves both settings once per series sync and passes the values to episodeParser(), while episodeParser() keeps fallback reads for standalone callers such as sync_one_episode().
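The hoisting pattern can be illustrated with a counting stand-in for the settings object. `CountingSettings`, `episode_parser`, and the parameter names are hypothetical; the real code reads Dynaconf attributes such as settings.general.enable_strm_support.

```python
class CountingSettings:
    """Stand-in for the settings object that counts every read."""

    def __init__(self):
        self.reads = 0
        self._values = {
            "enable_strm_support": False,
            "parse_embedded_audio_track": False,
        }

    def get(self, name):
        self.reads += 1
        return self._values[name]


settings = CountingSettings()


def episode_parser(episode, strm_enabled=None, parse_audio=None):
    # Fallback reads keep standalone callers working unchanged.
    if strm_enabled is None:
        strm_enabled = settings.get("enable_strm_support")
    if parse_audio is None:
        parse_audio = settings.get("parse_embedded_audio_track")
    return {"id": episode["id"], "strm": strm_enabled, "audio": parse_audio}


def sync_episodes(episodes):
    # Resolve both settings once per pass, then pass them down.
    strm_enabled = settings.get("enable_strm_support")
    parse_audio = settings.get("parse_embedded_audio_track")
    return [episode_parser(e, strm_enabled, parse_audio) for e in episodes]


parsed = sync_episodes([{"id": n} for n in range(100)])
print(settings.reads)  # two reads for the whole pass
```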
The old full movie sync compared each parsed Radarr movie against every database movie row with any(parsed_movie.items() <= x for x in current_movies_db_kv), making the unchanged-row check O(movie_count squared). Measured with thousands of movies, the old subset scan used 0.955625s CPU in a micro-benchmark, while the keyed radarrId lookup used 0.001418s CPU, about 674x faster for the comparison step. This builds one radarrId-keyed dictionary from TableMovies, uses a set for the existing id membership check, and compares each parsed movie only against its matching row.
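The difference between the two comparison strategies can be sketched like this. Both helper names are hypothetical, and the rows are plain dicts rather than TableMovies results.

```python
def is_unchanged_subset_scan(parsed_movie, db_rows):
    # Old shape: test the parsed movie against every stored row.
    return any(parsed_movie.items() <= row.items() for row in db_rows)


def split_movies_keyed(parsed_movies, db_rows):
    # New shape: one dictionary build, then one lookup per parsed movie.
    rows_by_id = {row["radarrId"]: row for row in db_rows}
    existing_ids = set(rows_by_id)
    to_add, to_update = [], []
    for movie in parsed_movies:
        if movie["radarrId"] not in existing_ids:
            to_add.append(movie)
        elif not movie.items() <= rows_by_id[movie["radarrId"]].items():
            to_update.append(movie)
    return to_add, to_update


db_rows = [
    {"radarrId": 1, "title": "Alpha", "monitored": True},
    {"radarrId": 2, "title": "Beta", "monitored": True},
]
parsed_movies = [
    {"radarrId": 1, "title": "Alpha"},          # unchanged
    {"radarrId": 2, "title": "Beta (Recut)"},   # changed
    {"radarrId": 3, "title": "Gamma"},          # new
]

to_add, to_update = split_movies_keyed(parsed_movies, db_rows)
```

The subset scan does work proportional to the number of stored rows for every parsed movie, while the keyed version does a constant-time lookup per movie after a single pass to build the dictionary.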
Covers the review risks around standalone Sonarr series refreshes, the full-series explicit episode sync path, unchanged series update skipping, episodeParser file-size handling, and keyed Radarr movie comparisons. These tests keep the measured optimizations from regressing while avoiding exact library-size assumptions.
Force-pushed from 3967b54 to 4d2acd2.
Force-pushed the branch only to re-sign the commits and fix GitHub commit verification. No code changes were made in that push.
Pull request overview
This PR improves Bazarr’s Sonarr/Radarr sync performance by reducing repeated per-item work during full syncs and adding database indexes that align with common subtitle-sync query patterns.
Changes:
- Add SQLite indexes to speed up Sonarr episode lookups and “wanted subtitles” queries.
- Reduce redundant Sonarr sync work (reuse context across a full sync, avoid duplicate episode syncs, skip unchanged series updates, collapse duplicate episode DB queries).
- Replace Radarr’s O(n²) “is this movie changed?” comparison with keyed lookups, and add regression tests for the new fast paths.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/bazarr/test_sync_performance_paths.py | Adds regression tests covering the optimized Sonarr/Radarr sync paths and parser behavior changes. |
| migrations/versions/6f0b2c8d9a1e_.py | Adds indexes for episodes/movies missing-subtitles queries and per-series episode lookups. |
| bazarr/sonarr/sync/series.py | Reuses Sonarr context across full syncs, avoids duplicate episode syncs, and skips unchanged series updates. |
| bazarr/sonarr/sync/parser.py | Avoids filesystem stat calls when Sonarr’s reported episode file size is already valid; caches settings via parameters. |
| bazarr/sonarr/sync/episodes.py | Removes a redundant per-series DB query and passes cached parser settings to reduce repeated config access. |
| bazarr/radarr/sync/movies.py | Replaces per-movie subset scans with radarrId-keyed comparisons to avoid O(n²) behavior. |
Could you review this PR in light of recent changes in the development branch? We've already added indexes, and there have been other DB migrations that must be taken into account. If you need to create a new PR, please split the strm file support and the sync performance improvements into two separate PRs. Thanks!
Summary
This PR improves Sonarr and Radarr sync performance by removing repeated work from full library syncs and adding indexes for common sync lookup paths.
The main changes are:
Performance notes
Tested against a large Bazarr database: thousands of series, tens of thousands of episodes, and thousands of movies. Measurements came from live profiling, SQLite EXPLAIN, copied-database timing, and focused micro-benchmarks.
- EXPLAIN previously showed SCAN table_episodes for per-series Sonarr episode lookups and wanted-subtitle counts. With the new indexes, those plans use indexed SEARCH operations.
- In the copied-database sweep over all series, median CPU dropped from 160.239s to 0.327s; wall time dropped from 161.061s to 0.329s.
- update_series() called sync_episodes() directly and update_one_series() could also call it. The full sync path now keeps one explicit episode sync per processed series.
- Unchanged series rows now return before issuing UPDATE, avoiding thousands of unnecessary SQLite writes on large libraries.
- Episode sync derives the episode id list from the full-row query, removing one per-series SELECT.
- episodeParser() now trusts Sonarr's reported file size first and only falls back to a filesystem stat when needed.
- Movie comparison now uses a radarrId keyed dictionary. With thousands of movies, the old subset scan used 0.955625s CPU in a micro-benchmark; the keyed lookup used 0.001418s CPU, about 674x faster for that comparison step.
Test plan
- Unchanged series skip UPDATE while manual callers still sync episodes
- episodeParser() avoids filesystem stat calls when Sonarr's reported size is already valid
- Parsed movies are compared against their matching radarrId row rather than scanning every movie row