Conversation
WalkthroughAdjusts face-embedding clustering by adding validation, normalization, duplicate detection, and tightening DBSCAN params; tightens OpenAPI metadata schema and modifies onboarding UI step indicators from 1-based to 0-based indexing. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
frontend/src/components/OnboardingSteps/ThemeSelectionStep.tsx (1)
54-68: Critical: Revert to 1-based step indexing for user display.Same issue as in AvatarSelectionStep.tsx - the removal of
+ 1breaks user-facing display. Users will see "Step 0 of 3" with 0% progress at the first step.Apply this diff to restore correct 1-based indexing:
- const progressPercent = Math.round(((stepIndex ) / totalSteps) * 100); + const progressPercent = Math.round(((stepIndex + 1) / totalSteps) * 100); return ( <> <Card className="flex max-h-full w-1/2 flex-col border p-4"> <CardHeader className="p-3"> <div className="text-muted-foreground mb-1 flex justify-between text-xs"> <span> - Step {stepIndex } of {totalSteps} + Step {stepIndex + 1} of {totalSteps} </span> <span>{progressPercent}%</span> </div>frontend/src/components/OnboardingSteps/FolderSetupStep.tsx (1)
67-83: Critical: Revert to 1-based step indexing for user display.Same issue as in the other onboarding step components - the removal of
+ 1breaks the step indicator and progress bar for users.Apply this diff to restore correct 1-based indexing:
- const progressPercent = Math.round(((stepIndex) / totalSteps) * 100); + const progressPercent = Math.round(((stepIndex + 1) / totalSteps) * 100); return ( <> <Card className="flex max-h-full w-1/2 flex-col border p-4"> <CardHeader className="p-3"> <div className="text-muted-foreground mb-1 flex justify-between text-xs"> <span> - Step {stepIndex} of {totalSteps} + Step {stepIndex + 1} of {totalSteps} </span> <span>{progressPercent}%</span> </div>
🧹 Nitpick comments (2)
docs/backend/backend_python/openapi.json (1)
1120-1127: Consider simplifying the parameter schema pattern.The change wraps a single
$refin anallOfconstruct and adds metadata (title, default) at the parameter level. While valid in OpenAPI 3.1.0, this pattern is unconventional. Consider whether you should instead:
- Enhance the
InputTypeschema directly (at lines 2261–2267) withtitleanddefaultproperties, then use a plain$ref, or- Keep the direct
$refpattern if the title and default are parameter-specific (for documentation clarity alone).The current approach works but may confuse downstream code generators or API tooling that expect simpler schema references.
If stricter metadata handling is the goal, consider this alternative:
{ "name": "input_type", "in": "query", "required": false, "schema": { "$ref": "#/components/schemas/InputType" }, "description": "Choose input type: 'path' or 'base64'", "example": "path" }Then enhance the
InputTypeschema incomponents/schemasto include a default:"InputType": { "type": "string", "enum": ["path", "base64"], "title": "InputType", "default": "path" }backend/app/utils/face_clusters.py (1)
351-356: Remove commented-out code.The old line is dead code and should be removed for clarity.
# Normalize the face embedding - # face_norm = face_embedding / np.linalg.norm(face_embedding) norm = np.linalg.norm(face_embedding) if norm < 1e-6: return np.ones(len(cluster_means)) # max distance face_norm = face_embedding / norm -
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
frontend/package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (6)
backend/app/utils/face_clusters.py(3 hunks)docs/backend/backend_python/openapi.json(1 hunks)frontend/src/components/OnboardingSteps/AvatarSelectionStep.tsx(1 hunks)frontend/src/components/OnboardingSteps/FolderSetupStep.tsx(1 hunks)frontend/src/components/OnboardingSteps/OnboardingStep.tsx(1 hunks)frontend/src/components/OnboardingSteps/ThemeSelectionStep.tsx(1 hunks)
🔇 Additional comments (2)
backend/app/utils/face_clusters.py (2)
161-162: LGTM!The adjusted DBSCAN parameters (
eps=0.35,min_samples=3) are reasonable for reducing over-clustering with cosine distance on normalized embeddings. This aligns with the PR objective to prevent unrelated faces from collapsing into a single cluster.
210-220: LGTM!Normalizing embeddings before DBSCAN with cosine metric is the correct approach. The duplicate detection is a useful debug heuristic that won't affect clustering behavior.
|
Hi, sorry. Even after the changes, the problem still persists. So I am closing this PR. |
Solves issue: BUG: Incorrect clustering for a folder with many images #722
PR Title
Fix face clustering collapsing unrelated faces into a single cluster
Description
Problem
In some cases, AI face clustering grouped a large number of unrelated faces into a single cluster.
This typically occurred on larger datasets when invalid, zero, or unnormalized face embeddings were passed to DBSCAN, causing cosine distance to behave incorrectly and collapse clusters.
Root Cause
Solution
This PR introduces minimal, targeted fixes to the face clustering utility:
None) and near-zero face embeddings.These changes ensure that only meaningful embeddings participate in clustering, preventing unrelated faces from collapsing into a single cluster.
Impact
Testing
Checklist
Team Name - EtherX
Summary by CodeRabbit
Improvements
Bug Fixes
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.