

@rgregg rgregg commented Oct 9, 2025

Description

This change updates the duplicate resolution logic to prefer certain file types over others, for example HEIC over JPG, or DNG over native camera raw formats. It uses a hard-coded priority list of different formats as the first duplicate selection mechanism.

This makes it much easier to remove duplicates when the same file exists in multiple formats: the newer or higher-quality format is preferred, even when its file size is smaller.
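
A minimal sketch of the selection order described above, not the code in this PR. The `Candidate` shape, the `FORMAT_RANK` values, and `pickKeeper` are hypothetical names chosen for illustration; the only things taken from the description are that format is checked first and that HEIC is preferred over JPG and DNG over camera raw formats. Size and EXIF count are assumed to act as tie-breakers.

```typescript
// Hypothetical shape of a duplicate candidate; not Immich's real asset type.
interface Candidate {
  id: string;
  fileName: string;
  fileSizeInBytes: number;
  exifFieldCount: number;
}

// Illustrative ranks only (lower is preferred). The actual list may differ.
const FORMAT_RANK = new Map<string, number>([
  ['dng', 0],
  ['heic', 0],
  ['cr2', 1],
  ['cr3', 1],
  ['jpg', 1],
  ['jpeg', 1],
]);
const UNKNOWN_RANK = 2;

// Rank a file by its lowercased extension; unknown extensions rank last.
function formatRank(fileName: string): number {
  const ext = fileName.split('.').pop()?.toLowerCase() ?? '';
  return FORMAT_RANK.get(ext) ?? UNKNOWN_RANK;
}

// Pick the asset to keep: format rank first, then larger file, then more EXIF fields.
function pickKeeper(candidates: Candidate[]): Candidate | undefined {
  return [...candidates].sort(
    (a, b) =>
      formatRank(a.fileName) - formatRank(b.fileName) ||
      b.fileSizeInBytes - a.fileSizeInBytes ||
      b.exifFieldCount - a.exifFieldCount,
  )[0];
}
```

With this ordering, a smaller `IMG_1.HEIC` is chosen over a larger `IMG_1.JPG`, while two files of the same rank still fall back to the size and EXIF comparison.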

I didn't see an open issue on this, but there was a lot of discussion about this in #10665.

How Has This Been Tested?

  • Added unit tests, which run as expected
  • Ran this against my existing Immich backend server through a variety of situations where I had the same image in different formats - typically DNG and RAW/CR2/CR3. Another test was JPG and HEIC.

Screenshots (if appropriate)

[Screenshot 2025-10-08 at 9 55 41 PM] [Screenshot 2025-10-08 at 9 57 31 PM]

Checklist:

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if applicable
  • I have no unrelated changes in the PR.
  • [] I have confirmed that any new dependencies are strictly necessary.
  • I have written tests for new code (if applicable)
  • I have followed naming conventions/patterns in the surrounding code
  • All code in src/services/ uses repositories implementations for database calls, filesystem operations, etc.
  • All code in src/repositories/ is pretty basic/simple and does not have any immich specific logic (that belongs in src/services/)

Please describe to which degree, if any, an LLM was used in creating this pull request.

I did use an LLM (Codex) to draft the first iteration of this code before refining and testing by hand.

@rgregg rgregg requested a review from danieldietzler as a code owner October 9, 2025 05:03
Contributor

github-actions bot commented Oct 9, 2025

Label error. Requires exactly 1 of: changelog:.*. Found: . A maintainer will add the required label.

@bo0tzz
Member

bo0tzz commented Oct 9, 2025

a hard-coded priority list of different formats

I doubt we want to do that, surely different people will have different priorities?

@rgregg
Author

rgregg commented Oct 9, 2025

I can imagine it being a configurable priority list, and I built the code with that in mind. However, since the existing prioritization was about preserving the most information (size and EXIF count), I think these priority groups are logical defaults. DNG files are manually created, so there was effort involved, and HEIC files are richer than standard file formats due to Live Photos.

I could imagine a few option paths if we wanted to make it configurable:

  • One choice would be enabling or disabling each rule: format, size, EXIF count.
  • Another path could be allowing users to customize the ranking order for formats. I think that's probably overkill for most people, but I'm interested in what the community thinks.
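
As a rough sketch of how both options could be expressed in one preference object (everything here is hypothetical: `DedupePreferences`, `RuleName`, `DuplicateCandidate`, and `buildComparator` are illustrative names, not existing Immich settings or APIs):

```typescript
// Hypothetical rule names matching the three rules mentioned above.
type RuleName = 'format' | 'size' | 'exifCount';

// Hypothetical preference shape; not an existing Immich config schema.
interface DedupePreferences {
  // Rules are applied in this order; leaving one out disables it.
  rules: RuleName[];
  // Extensions from most to least preferred; only used by the 'format' rule.
  formatOrder: string[];
}

// Hypothetical candidate shape for illustration.
interface DuplicateCandidate {
  fileName: string;
  fileSizeInBytes: number;
  exifFieldCount: number;
}

type Comparator = (a: DuplicateCandidate, b: DuplicateCandidate) => number;

// Build a comparator that sorts the preferred candidate first.
function buildComparator(prefs: DedupePreferences): Comparator {
  // Position in formatOrder; extensions not listed rank after all listed ones.
  const rank = (c: DuplicateCandidate): number => {
    const ext = c.fileName.split('.').pop()?.toLowerCase() ?? '';
    const index = prefs.formatOrder.indexOf(ext);
    return index === -1 ? prefs.formatOrder.length : index;
  };

  const comparators: Record<RuleName, Comparator> = {
    format: (a, b) => rank(a) - rank(b),
    size: (a, b) => b.fileSizeInBytes - a.fileSizeInBytes,
    exifCount: (a, b) => b.exifFieldCount - a.exifFieldCount,
  };

  // Walk the enabled rules in order and stop at the first one that decides.
  return (a, b) => {
    for (const rule of prefs.rules) {
      const result = comparators[rule](a, b);
      if (result !== 0) {
        return result;
      }
    }
    return 0;
  };
}
```

Passing `rules: ['size', 'exifCount']` would reproduce a size-then-EXIF ordering with the format rule switched off, while `rules: ['format', 'size', 'exifCount']` with a custom `formatOrder` covers the second option.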

@D-Demny

D-Demny commented Oct 9, 2025

@rgregg a choice would be perfect. Currently the .heic files are often part of an album while the .jpg copy is not. Since the .jpg is usually larger, it gets auto-selected.

Also, an idea for another PR: offer "stack" for all selected duplicates. Currently it's only possible to "delete all", which doesn't make sense. No one wants to delete both versions of the picture. It would be nice to have some bulk actions implemented.

@bwees bwees changed the title from "Add additional dedupe logic based on image format" to "feat: additional dedupe logic based on image format" Oct 9, 2025