Fix Incremental sync with trakt #1316
Conversation
Code Review
This pull request implements incremental synchronization for Trakt imports by tracking the last successful import date and utilizing the start_at parameter in API requests. It also introduces logic to handle movie rewatches and deduplicate episode entries within a one-day overlap window. The review feedback identifies several critical improvements: correcting the JSON-based filter for task arguments, increasing the precision of the last import timestamp to avoid redundant processing, eliminating a redundant database query when updating movies, and addressing inefficiencies in movie processing caused by the overlap window.
```python
result = (
    TaskResult.objects.filter(
        task_name=task_name,
        task_kwargs__contains=f"'user_id': {user.id},",
```
The task_kwargs field in TaskResult is typically stored as a JSON string. Using single quotes and a trailing comma in the __contains filter is very fragile and likely to fail. JSON standard uses double quotes, and the trailing comma won't be present if user_id is the last key in the dictionary. This could cause the incremental sync to always fail to find a previous result, falling back to a full sync every time.
```diff
- task_kwargs__contains=f"'user_id': {user.id},",
+ task_kwargs__contains=f'"user_id": {user.id}',
```
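The quoting mismatch can be demonstrated without Django. This is a minimal sketch assuming `task_kwargs` is serialized as a JSON string (as the comment above notes); the payload values are hypothetical:

```python
import json

# Hypothetical kwargs payload for the import task, serialized the way
# a JSON-backed task_kwargs field would store it.
kwargs = {"user_id": 42, "mode": "new"}
serialized = json.dumps(kwargs)  # double quotes, no Python-style repr

# The single-quoted pattern with a trailing comma never matches JSON output:
assert "'user_id': 42," not in serialized
# The double-quoted pattern without the trailing comma does:
assert '"user_id": 42' in serialized
```

If the pattern never matches, `filter(...).first()` returns `None` and every sync silently degrades to a full import.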
```python
        .order_by("-date_done")
        .first()
    )
    return result.date_done.date() if result else None
```
Returning only the .date() part of date_done reduces the precision of the incremental sync. If multiple imports happen on the same day, using just the date will cause the sync to re-fetch data that was already processed earlier that day. It's better to return the full datetime object to allow for a more precise start_at filter in the Trakt API.
```diff
- return result.date_done.date() if result else None
+ return result.date_done if result else None
```
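A small illustration of the precision loss (times are made up for the example): two imports hours apart on the same day become indistinguishable once truncated to a date.

```python
from datetime import datetime, timedelta

# Two imports on the same day, six hours apart (illustrative values):
first_run = datetime(2024, 5, 1, 9, 0, 0)
second_run = first_run + timedelta(hours=6)

# Truncated to dates, the runs collapse to the same value, so the
# second run would re-fetch everything since midnight:
assert first_run.date() == second_run.date()

# The full datetimes preserve the six-hour gap for a precise start_at:
assert second_run - first_run == timedelta(hours=6)
```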
```python
movies_to_update = list(
    Movie.objects.filter(
        user=user,
        item__media_id__in=updates.keys(),
        item__source=Sources.TMDB.value,
    ).select_related("item")
)
for movie in movies_to_update:
    movie.end_date = updates[movie.item.media_id]
```
This database query is redundant. The existing_media dictionary already contains the Movie instances (with their related Item objects) for the IDs found in updates. You can iterate over the updates dictionary and update the instances directly from existing_media before calling bulk_update_with_history, saving a database hit.
```python
movies_to_update = []
for tmdb_id, end_date in updates.items():
    movie = existing_media[tmdb_id]
    movie.end_date = end_date
    movies_to_update.append(movie)
```

```python
while True:
    url = f"{endpoint}?page={page}&limit={BULK_PAGE_SIZE}"
    if start_at:
        url += f"&start_at={start_at.isoformat()}"
```
The Trakt API documentation for the history endpoint specifies that the start_at parameter should be in the format YYYY-MM-DDTHH:mm:ss.SSSZ. Since start_at is currently a date object, isoformat() will only produce YYYY-MM-DD, which may lead to API errors or unexpected filtering behavior. If get_last_import_date is updated to return a datetime, this will produce a more compatible string.
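The difference is easy to see with the standard library. This sketch assumes the timestamp should end up in the `YYYY-MM-DDTHH:mm:ss.SSSZ` shape the comment cites; the `.000` milliseconds literal is a simplification:

```python
from datetime import date, datetime, timezone

# A plain date produces only YYYY-MM-DD:
assert date(2024, 5, 1).isoformat() == "2024-05-01"

# A UTC datetime produces a full timestamp...
dt = datetime(2024, 5, 1, 14, 30, 0, tzinfo=timezone.utc)
assert dt.isoformat() == "2024-05-01T14:30:00+00:00"

# ...and can be formatted to match the documented shape
# (milliseconds hard-coded to .000 in this sketch):
assert dt.strftime("%Y-%m-%dT%H:%M:%S.000Z") == "2024-05-01T14:30:00.000Z"
```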
```python
    ):
        return
    # Skip existence check when date-based filtering guarantees all entries are new
    if not (self.mode == "new" and self.last_import_at):
```
The comment stating that date-based filtering guarantees all entries are new is incorrect because of the 1-day overlap window used in process_history. By skipping should_process_media, the importer will fetch metadata and create objects for all movies watched in the last 24 hours on every sync, even if they were already imported. While bulk_create with ignore_conflicts=True prevents DB duplicates, this is inefficient. Consider keeping the existence check or implementing a more precise dedup mechanism for movies similar to the one used for episodes.
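A more precise dedup for movies could mirror the episode approach: key each history entry by ID and watch timestamp and skip pairs seen in a previous import. This is a hypothetical sketch; `filter_new_entries`, the dict keys, and the sample IDs are illustrative, not code from this PR:

```python
def filter_new_entries(entries, already_imported):
    """Keep only history entries not seen in a previous import.

    entries: list of dicts with "tmdb_id" and "watched_at" keys.
    already_imported: set of (tmdb_id, watched_at) tuples from the
    previous sync's overlap window.
    """
    return [
        e for e in entries
        if (e["tmdb_id"], e["watched_at"]) not in already_imported
    ]


# Entries inside the 1-day overlap window; the first was already imported:
seen = {(603, "2024-05-01T10:00:00.000Z")}
entries = [
    {"tmdb_id": 603, "watched_at": "2024-05-01T10:00:00.000Z"},  # duplicate
    {"tmdb_id": 604, "watched_at": "2024-05-01T12:00:00.000Z"},  # new
]
assert filter_new_entries(entries, seen) == [entries[1]]
```

This avoids re-fetching metadata for movies that fall inside the overlap window while still relying on `ignore_conflicts=True` as a safety net.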
|
I have yet to test this; please let me know whether the approach is worth pursuing so that I can go ahead with testing it.
Currently, incremental sync with Trakt fetches all the data from Trakt again, which is hugely wasteful. With this change, we fetch only the data since the last successful sync.
This also updates the episodes of shows that were watched or rewatched.
Movie rewatches are also fixed.
Fixes #655