
Fix Incremental sync with trakt #1316

Open
darkcoderrises wants to merge 1 commit into FuzzyGrim:dev from ForkedRepos:harshil/fix-trakt-new-sync

Conversation

@darkcoderrises

Currently, an incremental sync with Trakt fetches all the data from Trakt again, which is hugely wasteful. With this change, we fetch only the data since the last time a sync happened.
It also updates the episodes of shows that were watched / rewatched.
Movie rewatches are fixed as well.
Fixes #655

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request implements incremental synchronization for Trakt imports by tracking the last successful import date and utilizing the start_at parameter in API requests. It also introduces logic to handle movie rewatches and deduplicate episode entries within a one-day overlap window. The review feedback identifies several critical improvements: correcting the JSON-based filter for task arguments, increasing the precision of the last import timestamp to avoid redundant processing, eliminating a redundant database query when updating movies, and addressing inefficiencies in movie processing caused by the overlap window.

result = (
    TaskResult.objects.filter(
        task_name=task_name,
        task_kwargs__contains=f"'user_id': {user.id},",
Contributor


high

The task_kwargs field in TaskResult is typically stored as a JSON string. Using single quotes and a trailing comma in the __contains filter is very fragile and likely to fail. JSON standard uses double quotes, and the trailing comma won't be present if user_id is the last key in the dictionary. This could cause the incremental sync to always fail to find a previous result, falling back to a full sync every time.

Suggested change
task_kwargs__contains=f"'user_id': {user.id},",
task_kwargs__contains=f'"user_id": {user.id}',

    .order_by("-date_done")
    .first()
)
return result.date_done.date() if result else None
Contributor


medium

Returning only the .date() part of date_done reduces the precision of the incremental sync. If multiple imports happen on the same day, using just the date will cause the sync to re-fetch data that was already processed earlier that day. It's better to return the full datetime object to allow for a more precise start_at filter in the Trakt API.

Suggested change
return result.date_done.date() if result else None
return result.date_done if result else None
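
A quick standalone illustration (not from the PR) of the precision loss:

from datetime import datetime

date_done = datetime(2024, 5, 1, 18, 30)

print(date_done.date().isoformat())  # 2024-05-01, so the sync re-fetches the whole day
print(date_done.isoformat())         # 2024-05-01T18:30:00, a precise cutoff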

Comment on lines +223 to +231
movies_to_update = list(
    Movie.objects.filter(
        user=user,
        item__media_id__in=updates.keys(),
        item__source=Sources.TMDB.value,
    ).select_related("item")
)
for movie in movies_to_update:
    movie.end_date = updates[movie.item.media_id]
Contributor


medium

This database query is redundant. The existing_media dictionary already contains the Movie instances (with their related Item objects) for the IDs found in updates. You can iterate over the updates dictionary and update the instances directly from existing_media before calling bulk_update_with_history, saving a database hit.

    movies_to_update = []
    for tmdb_id, end_date in updates.items():
        movie = existing_media[tmdb_id]
        movie.end_date = end_date
        movies_to_update.append(movie)
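
The list can then be flushed in one call. A sketch assuming the django-simple-history helper (the PR may use a project-specific wrapper of the same name):

from simple_history.utils import bulk_update_with_history

# Persist the end_date changes in a single bulk query while still
# recording a history row for each updated movie.
bulk_update_with_history(movies_to_update, Movie, ["end_date"])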

while True:
    url = f"{endpoint}?page={page}&limit={BULK_PAGE_SIZE}"
    if start_at:
        url += f"&start_at={start_at.isoformat()}"
Contributor


medium

The Trakt API documentation for the history endpoint specifies that the start_at parameter should be in the format YYYY-MM-DDTHH:mm:ss.SSSZ. Since start_at is currently a date object, isoformat() will only produce YYYY-MM-DD, which may lead to API errors or unexpected filtering behavior. If get_last_import_date is updated to return a datetime, this will produce a more compatible string.
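
As a sketch, a helper (the name format_start_at is hypothetical, not part of the PR) that renders a datetime in the documented YYYY-MM-DDTHH:mm:ss.SSSZ shape:

from datetime import datetime, timezone

def format_start_at(dt: datetime) -> str:
    # Convert to UTC, then append zero-padded milliseconds plus the "Z" suffix.
    dt_utc = dt.astimezone(timezone.utc)
    return dt_utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt_utc.microsecond // 1000:03d}Z"

print(format_start_at(datetime(2024, 5, 1, 18, 30, tzinfo=timezone.utc)))
# 2024-05-01T18:30:00.000Z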

    ):
        return
    # Skip existence check when date-based filtering guarantees all entries are new
    if not (self.mode == "new" and self.last_import_at):
Contributor


medium

The comment stating that date-based filtering guarantees all entries are new is incorrect because of the 1-day overlap window used in process_history. By skipping should_process_media, the importer will fetch metadata and create objects for all movies watched in the last 24 hours on every sync, even if they were already imported. While bulk_create with ignore_conflicts=True prevents DB duplicates, this is inefficient. Consider keeping the existence check or implementing a more precise dedup mechanism for movies similar to the one used for episodes.
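
One possible shape for such a movie dedup, sketched under the assumption that Movie rows expose item__media_id and end_date as in the hunk above; the entry parsing and the parse_watched_at helper are hypothetical simplifications:

from datetime import datetime

def parse_watched_at(value: str):
    # Trakt timestamps look like 2024-05-01T18:30:00.000Z; keep only the
    # date so it is comparable to the stored end_date.
    return datetime.fromisoformat(value.replace("Z", "+00:00")).date()

# Collect the (tmdb_id, watched date) pairs already stored for this user, then
# drop history entries from the 1-day overlap window that are already present.
existing = set(
    Movie.objects.filter(
        user=user,
        item__media_id__in=[str(e["movie"]["ids"]["tmdb"]) for e in entries],
        item__source=Sources.TMDB.value,
    ).values_list("item__media_id", "end_date")
)

new_entries = [
    e
    for e in entries
    if (str(e["movie"]["ids"]["tmdb"]), parse_watched_at(e["watched_at"])) not in existing
]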

@darkcoderrises
Author

I have yet to test this. Please let me know whether the approach is worthwhile, so that I can go ahead with testing it.



Development

Successfully merging this pull request may close these issues.

[BUG] Import not importing new watched episodes
