Hi @pauloxnet,

Thank you for the thoughtful questions and for taking the time to understand the architecture!

1. Why ResNet152 instead of reusing Immich's CLIP embeddings?

The key difference lies in what these models capture. For duplicate detection, we need pixel-level visual similarity rather than semantic similarity. Two completely different beach photos might be very "similar" to CLIP (both are beaches), but ResNet152 can distinguish them as different images. CLIP is optimized for queries like "sunset at beach" → finds beach photos. But for finding actual duplicates or near-duplicates, visual feature extraction produces fewer false positives.
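To make this concrete, here is a minimal sketch of the general technique (illustrative only, not MediaKit's exact pipeline; the file names are placeholders):

```python
# Sketch: pixel-level similarity via ResNet152 features.
# Requires torch, torchvision, and Pillow.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# ResNet152 with the classifier head removed -> 2048-dim feature vectors
weights = models.ResNet152_Weights.DEFAULT
backbone = models.resnet152(weights=weights)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return backbone(x).squeeze(0)

# Cosine similarity close to 1.0 suggests a duplicate or near-duplicate
a, b = embed("beach_1.jpg"), embed("beach_2.jpg")
score = torch.nn.functional.cosine_similarity(a, b, dim=0).item()
print(f"visual similarity: {score:.3f}")
```

With features like these, two unrelated beach photos score low, while a resized or re-encoded copy of the same shot scores close to 1.0.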
2. Why Qdrant instead of pgvector?

A few reasons for this design choice:

3. GPU support

MediaKit already supports GPU acceleration! Check the README under "Choose CPU or GPU Version" for Docker (NVIDIA CUDA) and source installation (macOS MPS, Windows CUDA) options; the small device-selection sketch at the end of this reply shows the usual pattern.

4. Reusing Immich's GPU environment

Integrating into Immich's container would increase architectural complexity and coupling. The standalone approach allows independent updates, easier debugging, and flexibility for different deployment scenarios.

As mentioned in the README, this project started simply as a tool to help organize my family's large photo collection. Of course, anyone interested could take a similar architecture and extend it into a desktop app or other applications - but that's a personal choice. After all, there are always many different approaches to solving the same problem.

Thanks again for the detailed questions!
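P.S. For reference, the usual PyTorch device-selection pattern behind point 3 looks like this (a generic sketch, not necessarily MediaKit's exact code):

```python
import torch

# Prefer NVIDIA CUDA, then Apple Silicon MPS, then fall back to CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"running inference on: {device}")
# model.to(device) and batch.to(device) move the work onto the accelerator
```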
Hi @pauloxnet,

Thanks for the continued interest! MediaKit is simply a tool I built to solve a specific problem — managing duplicate photos in my family's library. The current architecture (Qdrant + SQLite + read-only Immich access) was chosen to keep things lightweight and simple for this purpose. Features like alternative vector backends, additional GPU support, or photo quality scoring are beyond the current scope of the project. However, since this is an open-source project, contributions are always welcome if any of these ideas interest you! Thanks for understanding, and happy organizing!
Hi!
First of all, thank you for this amazing project. I’ve just started using Immich locally and I’ve recently discovered immich mediakit. The idea behind it is brilliant, and I really appreciate the clean Python implementation and the flexibility of running it on CPU or GPUs.
While reading through the documentation and looking at how mediakit processes all assets from scratch with ResNet152 and stores vectors in Qdrant, a few questions came to my mind. These are not criticisms at all; I'm just very interested in how the architecture fits together and whether some parts of Immich could be reused to speed things up.
Here are my thoughts and questions:
From what I understand, Immich already computes embeddings for each asset during import and stores them in PostgreSQL using pgvector. Since mediakit also performs a full re-scan to compute embeddings with its own model, I was wondering:
Would it be technically possible to reuse Immich's existing pgvector embeddings instead of recomputing everything from scratch? (I sketch below what reading them back might look like.)
Is the embedding model used by Immich too different (or not expressive enough) to support the advanced similarity and duplicate detection features that mediakit provides?
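For context, here is roughly what reading those embeddings back could look like (a hypothetical sketch: the table and column names are my guesses for illustration, not Immich's documented schema):

```python
# Hypothetical sketch: pulling Immich's CLIP embeddings out of pgvector.
# "smart_search" and its columns are assumed names, not a confirmed schema.
import psycopg2

conn = psycopg2.connect("dbname=immich user=postgres host=localhost")
with conn.cursor() as cur:
    cur.execute('SELECT "assetId", embedding FROM smart_search LIMIT 5')
    for asset_id, embedding in cur.fetchall():
        # Without registering the pgvector adapter, values arrive as
        # strings like "[0.12,0.03,...]"
        print(asset_id, str(embedding)[:40], "...")
```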
If ResNet152 is required to achieve better similarity or duplicate detection:
Is this because the embeddings generated by Immich (based on its CLIP-like model) are not suitable for the type of comparisons mediakit performs?
Or is it simply a design choice to ensure consistency in the vectors used for Qdrant indexing?
Since Immich already ships with PostgreSQL + pgvector:
Would it be feasible (even theoretically) to store mediakit embeddings directly in Immich's PostgreSQL instance, avoiding the need to set up a separate Qdrant vector database? (A rough sketch follows these questions.)
Are there specific Qdrant features that are necessary for your workflow (e.g., HNSW tuning, shard management, filtering, hybrid scoring) that pgvector cannot provide?
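Just to make my question concrete, the pgvector side of such a setup would presumably look something like this (the mediakit_vectors table is invented for illustration; 2048 matches ResNet152's pooled feature size):

```python
# Hypothetical sketch: storing and querying mediakit vectors in pgvector
# instead of Qdrant. Table and column names are invented for illustration.
import psycopg2

conn = psycopg2.connect("dbname=immich user=postgres host=localhost")
with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS mediakit_vectors (
            asset_id text PRIMARY KEY,
            embedding vector(2048)
        )
    """)
    # <=> is pgvector's cosine-distance operator (smaller = more similar)
    query_vec = "[" + ",".join(["0.1"] * 2048) + "]"  # placeholder vector
    cur.execute(
        "SELECT asset_id, embedding <=> %s::vector AS dist "
        "FROM mediakit_vectors ORDER BY dist LIMIT 10",
        (query_vec,),
    )
    for asset_id, dist in cur.fetchall():
        print(asset_id, dist)
conn.commit()
```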
Immich has a very convenient GPU-enabled Docker setup (supporting NVIDIA, AMD, Intel, and Apple Silicon).
Would it make sense — or would it even be possible — for mediakit to reuse Immich’s existing GPU-capable environment instead of requiring a separate container or Python environment?
From what I understand, Immich uses CLIP-based models (e.g., ViT-B-32 and other CLIP/SigLIP variants) to generate semantic embeddings stored in PostgreSQL via pgvector/VectorChord, while immich-mediakit uses ResNet152 to extract more “pixel-level” visual features stored in Qdrant.
This makes me wonder whether Immich’s existing embeddings could also be reused for this purpose, or whether the architectural choice of ResNet152 + Qdrant is essential for the quality of mediakit’s advanced duplicate-finding workflow.
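For what it's worth, this is how I picture the semantic side in code (using the open_clip package purely as an illustration; I know Immich's actual inference stack is different):

```python
# Sketch: CLIP scores text <-> image *semantic* similarity. A query like
# "sunset at beach" matches any beach photo, which is great for search but
# useless for telling two different beach photos apart.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

with torch.no_grad():
    image = preprocess(Image.open("beach_1.jpg")).unsqueeze(0)
    text = tokenizer(["sunset at beach"])
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    print("semantic match:", (img_feat @ txt_feat.T).item())
```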
Again, thank you for your work. immich mediakit looks extremely promising, especially for users with large photo libraries who want more advanced deduplication workflows. I’m asking these questions just out of curiosity and to better understand the reasoning behind the architecture — I’m really excited about the project and would love to follow its evolution.
Please let me know if any of the above points make sense or if I misunderstood anything.
Thanks again for your time and effort!