refactor best shot engine handling and improve ETA calculations #60
duartebarbosadev merged 6 commits into main from
Conversation
Pull Request Overview
This pull request refactors the best shot analysis engine by removing the local on-device model support and consolidating on the LLM-based approach. The changes simplify the codebase by removing best shot engine configuration options and related model checking infrastructure, while adding ETA (estimated time of arrival) calculations to provide better user feedback during image processing operations.
Key changes:
- Removed the local best shot engine option, making LLM the only engine
- Added ETA calculations to BestShotWorker, AiRatingWorker, and SimilarityEngine
- Improved threading and executor management with explicit shutdown logic
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| workers/best_shot_worker.py | Removed models_root and engine parameters; added executor management with locks and ETA calculations |
| src/workers/ai_rating_worker.py | Removed models_root and engine parameters; added executor management and ETA calculations |
| src/core/similarity_engine.py | Added ETA calculation support for embedding generation |
| src/core/app_settings.py | Removed best shot engine configuration settings |
| src/core/ai/best_shot_pipeline.py | Removed LocalBestShotStrategy class; simplified to only support LLM; added cancellation support |
| src/core/ai/model_checker.py | Deleted entire file (no longer needed without local models) |
| src/core/ai/best_photo_selector.py | Deleted entire file (no longer needed without local models) |
| src/core/ai/init.py | Updated exports to reflect LLM-only approach |
| src/ui/worker_manager.py | Removed models_root and engine parameters; removed best_shot_models_missing signal |
| src/ui/dialog_manager.py | Removed best shot engine selection UI and related dialog |
| src/ui/app_controller.py | Removed best shot engine retrieval and models_missing signal handler |
| tests/test_best_shot_worker_connectivity.py | Updated tests to remove engine parameter |
| tests/test_best_shot_worker.py | Removed monkeypatch usage for model checking |
| tests/test_best_shot_model_checker.py | Deleted entire test file |
| tests/test_best_photo_selector.py | Deleted entire test file |
| README.md | Updated documentation to reflect LLM-only approach |
Comments suppressed due to low confidence (1)
workers/best_shot_worker.py:379
- Potential race condition: the `futures` dictionary is being updated (lines 377-379) outside of the `_executor_lock`. If `stop()` is called from another thread during this loop, `_shutdown_executor()` will capture only a partial list of futures to cancel. Consider either moving the entire submission loop inside the lock, or using a thread-safe approach to track futures as they're created.
```python
for cluster_id, paths in self._iter_clusters():
    if self._should_stop:
        logger.info("Best shot worker stop requested before submit.")
        break
    if not paths:
        continue
    futures[
        executor.submit(self._analyze_cluster, cluster_id, paths)
    ] = cluster_id
```
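One way to address the race described in the comment above is to register each future under the same lock that `stop()` uses, so a concurrent shutdown can never observe a partially populated dict. The sketch below is illustrative (class and method names are hypothetical, not the project's actual worker code):

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class ClusterWorker:
    """Sketch: futures are registered and cancelled under one lock."""

    def __init__(self, max_workers: int = 2) -> None:
        self._executor_lock = threading.Lock()
        self._futures: Dict[Future, int] = {}
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._should_stop = False

    def submit(self, cluster_id: int, fn: Callable[[int], object]) -> None:
        # Submit and record the future atomically with respect to stop().
        with self._executor_lock:
            if self._should_stop:
                return
            self._futures[self._executor.submit(fn, cluster_id)] = cluster_id

    def stop(self) -> None:
        # Holding the lock guarantees we see every future submit() created.
        with self._executor_lock:
            self._should_stop = True
            for future in list(self._futures):
                future.cancel()
            self._executor.shutdown(wait=False)
```

The trade-off is that the lock is held briefly per submission, which is usually negligible next to the submitted work itself.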
src/workers/ai_rating_worker.py
Outdated
```python
@staticmethod
def _format_eta(seconds: float) -> str:
    bounded = max(0, int(round(seconds)))
    return str(timedelta(seconds=bounded))
```
The _format_eta method is duplicated here from best_shot_worker.py and similarity_engine.py. This is the same utility function that should be extracted into a shared module to avoid code duplication.
src/core/similarity_engine.py
Outdated
```python
@staticmethod
def _format_eta(seconds: float) -> str:
    bounded = max(0, int(round(seconds)))
    return str(timedelta(seconds=bounded))
```
The _format_eta method is duplicated here from best_shot_worker.py and ai_rating_worker.py. This is the same utility function that should be extracted into a shared module to avoid code duplication.
src/workers/ai_rating_worker.py
Outdated
```python
start_time = time.perf_counter()
try:
    for path in self.image_paths:
        futures[executor.submit(self._rate_single, path)] = path
```
Potential race condition: The futures dictionary is being updated (line 157) outside of the _executor_lock. If stop() is called from another thread during this loop, _shutdown_executor() will capture only a partial list of futures to cancel. Consider either moving the entire submission loop inside the lock, or use a thread-safe approach to track futures as they're created.
Suggested change:

```diff
-start_time = time.perf_counter()
-try:
-    for path in self.image_paths:
-        futures[executor.submit(self._rate_single, path)] = path
+for path in self.image_paths:
+    futures[executor.submit(self._rate_single, path)] = path
+start_time = time.perf_counter()
+try:
```
src/core/ai/best_shot_pipeline.py
Outdated
```python
def request_cancel(self) -> None:
    self._cancel_event.set()
    try:
        if hasattr(self._client, "close") and not self._client_closed:
            self._client.close()
            self._client_closed = True
    except Exception:
        logger.debug("Failed to close LLM client during cancellation", exc_info=True)
```
Potential race condition: request_cancel() closes the _client while concurrent threads may be using it in _call_llm() (lines 274-296). Although _lock protects the actual API call, the client is closed outside the lock. This could lead to errors in threads that are about to acquire the lock or are checking _cancel_event but haven't yet made the API call. Consider checking _client_closed flag inside the lock before making the API call, or ensure the client closing happens in a way that safely terminates ongoing requests.
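A sketch of the lock-based fix the comment hints at: both the close and the API call acquire the same lock, and the call re-checks the closed flag after acquiring it. The class and method names here are illustrative; only the attribute names (`_lock`, `_client_closed`, `_cancel_event`) mirror the snippet above.

```python
import threading


class LlmStrategy:
    """Sketch: serialize client close against in-flight calls with one lock."""

    def __init__(self, client) -> None:
        self._client = client
        self._lock = threading.Lock()
        self._client_closed = False
        self._cancel_event = threading.Event()

    def request_cancel(self) -> None:
        self._cancel_event.set()
        # Close under the same lock that guards API calls, so no thread
        # can be mid-call while the client is being torn down.
        with self._lock:
            if not self._client_closed:
                self._client.close()
                self._client_closed = True

    def call(self, payload):
        with self._lock:
            # Re-check after acquiring the lock: a concurrent cancel may
            # have closed the client while we were waiting.
            if self._client_closed or self._cancel_event.is_set():
                return None
            return self._client.send(payload)
```

The cost is that cancellation waits for the in-flight call to finish; truly interrupting an ongoing HTTP request would need support from the client library itself.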
workers/best_shot_worker.py
Outdated
```python
@staticmethod
def _format_eta(seconds: float) -> str:
    bounded = max(0, int(round(seconds)))
    return str(timedelta(seconds=bounded))
```
The _format_eta method is duplicated across three files (best_shot_worker.py, ai_rating_worker.py, and similarity_engine.py). Consider extracting this utility function into a shared module (e.g., core/utils.py or workers/worker_utils.py) to reduce code duplication and improve maintainability.
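Extracted, the shared helper is tiny. A sketch of the module-level version (the module location, e.g. a `core/utils` time module, is the reviewer's suggestion, not something this diff settles):

```python
from datetime import timedelta


def format_eta(seconds: float) -> str:
    """Format a duration estimate as H:MM:SS, clamping negatives to zero."""
    bounded = max(0, int(round(seconds)))
    return str(timedelta(seconds=bounded))
```

Each worker would then import `format_eta` from the shared module instead of redefining a private `_format_eta` staticmethod.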
- Removed best shot engine settings and related functions from app_settings.py.
- Updated SimilarityEngine to include ETA calculations during image processing.
- Simplified AppController by removing best shot engine retrieval and related signals.
- Removed best shot engine configuration from DialogManager and adjusted UI accordingly.
- Cleaned up WorkerManager by removing unused best shot engine parameters.
- Enhanced AiRatingWorker and BestShotWorker to manage threading and ETA calculations more effectively.
- Deleted obsolete tests related to best shot engine and model checking.
- Improved overall code readability and maintainability by consolidating logic and removing unnecessary complexity.
903ff2b to 7ac3bba
Pull Request Overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
src/core/ai/best_shot_pipeline.py
Outdated
```python
def _extract_rating(self, analysis: str) -> Optional[int]:
    if not analysis:
        return None

    # Try JSON parsing first, as the prompt requests structured output
    try:
        parsed = json.loads(analysis)
        if isinstance(parsed, dict) and "rating" in parsed:
            return int(round(float(parsed["rating"])))
    except (ValueError, TypeError, json.JSONDecodeError):
        # Fall back to unstructured parsing if the model returned plain text.
        pass

    patterns = [
        r"\brating\b[^0-9]*([1-5](?:\.[0-9]+)?)",
        r"\boverall rating\b[^0-9]*([1-5](?:\.[0-9]+)?)",
        r"\bscore\b[^0-9]*([1-5](?:\.[0-9]+)?)",
        r"([1-5])\s*/\s*5",
        r"([1-5])\s*out of\s*5",
        r"([1-5])\s*stars",
    ]
    for pattern in patterns:
        match = re.search(pattern, analysis, re.IGNORECASE)
        if match:
            try:
                return int(round(float(match.group(1))))
            except (ValueError, TypeError):
                continue

    return None
```
The _extract_rating method is defined but never called. Since the rate_image method now uses structured tool calls to get the rating directly from structured_payload.get("overall_rating") (line 567), this fallback parsing method appears to be dead code and should be removed.
Suggested change: delete the entire `_extract_rating` method shown above.
workers/best_shot_worker.py
Outdated
```python
futures: Dict[Future, int] = {}
executor = ThreadPoolExecutor(max_workers=self._max_workers)
with self._executor_lock:
    self._executor = executor
    self._futures = futures
```
The executor and futures are initialized outside the context manager, and the futures dictionary is only populated during the iteration over `self._iter_clusters()`. If the loop never starts or returns no clusters, `futures` remains empty but was already assigned at line 364. However, the assignment at line 368 (`self._futures = futures`) happens before any futures are added to the dict (lines 420+), creating a potential race condition. `self._futures` should be updated inside the lock whenever futures are added to the local `futures` dict.
Pull Request Overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
src/workers/ai_rating_worker.py
Outdated
```python
futures: Dict[Future, str] = {}
with self._executor_lock:
    self._executor = executor
    self._futures = futures
```
There's a potential race condition in how `futures` and `self._futures` are managed. On line 153, `self._futures = futures` creates a reference to the same dictionary object. When futures are added to the local `futures` dict (line 159), they're added under a lock, which is good. However, this means `self._futures` and the local `futures` variable are the same object throughout execution.
The issue: if `stop()` is called from another thread during the loop (lines 162-197), `_shutdown_executor` will call `list(self._futures.keys())` while the main thread is iterating over `futures` with `as_completed(futures)`. This creates a potential race where:
- The main thread is iterating with `as_completed(futures)`
- Another thread calls `stop()` → `_shutdown_executor()` → clears `self._futures = {}`
- The main thread's `futures` dict is now empty too (same object reference)

Suggestion: create a copy when assigning to `self._futures` to avoid this shared-reference issue:

```python
with self._executor_lock:
    self._executor = executor
    self._futures = futures.copy()  # or dict(futures)
```

Or update `self._futures` within the lock each time instead of sharing the reference.
Suggested change:

```diff
-self._futures = futures
+self._futures = futures.copy()
```
workers/best_shot_worker.py
Outdated
```python
processed += 1
percent = int((processed / total_jobs) * 100)
elapsed = max(0.0, time.perf_counter() - start_time)
avg = elapsed / processed if processed else 0.0
```
The condition if processed else 0.0 is redundant here. At this point in the code (line 416), processed has just been incremented to at least 1 (line 413), so it's guaranteed to be non-zero. The condition will always evaluate to the true branch (elapsed / processed).
Suggested change:

```diff
-avg = elapsed / processed if processed else 0.0
+avg = elapsed / processed
```
src/workers/ai_rating_worker.py
Outdated
```python
    return self._format_eta(0)
remaining = max(0, total - processed)
elapsed = max(0.0, time.perf_counter() - start_time)
avg = elapsed / processed if processed else 0.0
```
The condition `if processed else 0.0` is redundant here. At this point in the code (line 233), `processed` is guaranteed to be at least 1, so the ternary expression always evaluates to `elapsed / processed`.
Suggested change:

```diff
-avg = elapsed / processed if processed else 0.0
+avg = elapsed / processed
```
src/core/similarity_engine.py
Outdated
```python
elapsed = max(0.0, time.perf_counter() - start_time)
avg = elapsed / processed_count if processed_count else 0.0
remaining = max(0, total_to_process - processed_count)
eta_seconds = avg * remaining if processed_count else 0.0
```
The condition if processed_count else 0.0 is redundant here. At line 269, this check occurs after processed_count has been incremented (line 260), so processed_count is guaranteed to be at least 1. The ternary expression will always evaluate to avg * remaining.
Suggested change:

```diff
-eta_seconds = avg * remaining if processed_count else 0.0
+eta_seconds = avg * remaining
```
src/core/similarity_engine.py
Outdated
```python
eta_text = (
    self._format_eta(eta_seconds)
    if processed_count
    else eta_placeholder
)
```
The conditional if processed_count on line 272 is redundant. At this point, processed_count has already been incremented (line 260) and is guaranteed to be at least 1. The condition will always be true, so the else branch (eta_placeholder) can never be reached.
Suggested change:

```diff
-eta_text = (
-    self._format_eta(eta_seconds)
-    if processed_count
-    else eta_placeholder
-)
+eta_text = self._format_eta(eta_seconds)
```
Pull Request Overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 12 comments.
```python
def start_best_shot_analysis(
    self,
    cluster_map: Dict[int, List[str]],
    models_root: Optional[str] = None,
    *,
    folder_path: Optional[str] = None,
    analysis_cache=None,
```
[nitpick] The removal of the models_root parameter from start_best_shot_analysis() is a breaking change to the public API. If this method is called by external code or plugins, they will receive a TypeError for unexpected keyword arguments. Consider deprecating the parameter first or documenting this as a breaking change.
```python
def start_ai_rating(
    self,
    image_paths: List[str],
    models_root: Optional[str] = None,
    engine: Optional[str] = None,
) -> None:
```
The start_ai_rating() method signature has changed from accepting models_root and engine parameters to accepting neither. This is a breaking API change. If this method is part of a public API or called by external code, existing callers passing these arguments will fail with TypeError. Consider maintaining backward compatibility by accepting these parameters as optional **kwargs and ignoring them with a deprecation warning.
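The backward-compatible shape the comment suggests could look like the sketch below: keep the removed parameters as ignored, deprecated arguments. This is illustrative only (the PR itself removes them outright, and `_start` is a hypothetical internal launch method):

```python
import warnings
from typing import List, Optional


class WorkerManager:
    """Sketch: keep removed parameters as deprecated no-ops for old callers."""

    def start_ai_rating(
        self,
        image_paths: List[str],
        models_root: Optional[str] = None,
        engine: Optional[str] = None,
    ) -> None:
        # Old callers still work, but are told the arguments do nothing now.
        if models_root is not None or engine is not None:
            warnings.warn(
                "models_root and engine are ignored; the LLM engine is the only backend.",
                DeprecationWarning,
                stacklevel=2,
            )
        self._start(image_paths)  # hypothetical internal launch

    def _start(self, image_paths: List[str]) -> None:
        self.last_started = list(image_paths)
```

After a deprecation period, the parameters can be dropped for real without surprising external callers with a `TypeError`.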
```markdown
the API key blank.
PhotoSort relies on an OpenAI-compatible vision model to rank
similar shots and request AI star ratings. Configure the endpoint under
**Preferences → AI Rating Engine** (`F10`) by providing the API key (optional for
```
The documentation states that users should configure settings under "Preferences → AI Rating Engine" but based on the dialog_manager.py changes, it's not clear if this menu item or dialog section was renamed. Verify that the menu path referenced in the documentation accurately reflects the actual UI labels.
Suggested change:

```diff
-**Preferences → AI Rating Engine** (`F10`) by providing the API key (optional for
+**Settings → AI Rating Engine** (`F10`) by providing the API key (optional for
```
workers/best_shot_worker.py
Outdated
```python
def _format_eta(seconds: float) -> str:
    bounded = max(0, int(round(seconds)))
    return str(timedelta(seconds=bounded))
```
The _format_eta method is duplicated across multiple workers (BestShotWorker, AiRatingWorker, and SimilarityEngine). Consider extracting this to a shared utility module to reduce code duplication and improve maintainability.
workers/best_shot_worker.py
Outdated
```python
elapsed = max(0.0, time.perf_counter() - start_time)
avg = elapsed / processed
remaining = max(0, total_jobs - processed)
eta_seconds = avg * remaining
eta_text = self._format_eta(eta_seconds)
```
The ETA calculation logic is duplicated between BestShotWorker.run() (lines 420-424) and _estimate_eta_text() method in AiRatingWorker (lines 233-240). Consider consolidating this into a single reusable method to avoid inconsistencies and improve maintainability.
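The shared piece could be a small pure function; a sketch under the reviewer's suggestion (the name `estimate_eta_seconds` is illustrative, not from the PR):

```python
def estimate_eta_seconds(processed: int, total: int, elapsed: float) -> float:
    """Estimate remaining seconds from the average per-item time so far."""
    if processed <= 0:
        return 0.0
    remaining = max(0, total - processed)
    return (elapsed / processed) * remaining
```

Both workers would then compute `estimate_eta_seconds(processed, total_jobs, elapsed)` and format the result with the (also shared) ETA formatter, keeping the two progress messages consistent by construction.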
```python
f" - {progress_detail}"
self.progress_update.emit(
    percent,
    f"Cluster {cluster_id}: best candidate {os.path.basename(best_path)} • ETA {eta_text}",
```
[nitpick] The bullet point separator has been changed from " - " to " • " (middle dot). Ensure this change is intentional and consistent with the application's style guide, as it changes the visual presentation of progress messages.
Suggested change:

```diff
-f"Cluster {cluster_id}: best candidate {os.path.basename(best_path)} • ETA {eta_text}",
+f"Cluster {cluster_id}: best candidate {os.path.basename(best_path)} - ETA {eta_text}",
```
src/core/ai/best_shot_pipeline.py
Outdated
```python
def request_cancel(self) -> None:
    self._cancel_event.set()
    try:
        if hasattr(self._client, "close") and not self._client_closed:
            self._client.close()
            self._client_closed = True
    except Exception:
        logger.debug(
            "Failed to close LLM client during cancellation", exc_info=True
        )

def shutdown(self) -> None:
    self.request_cancel()
```
[nitpick] The shutdown() method calls request_cancel(), but this creates a side effect where shutting down also sets the cancel event. Consider whether shutdown should be a clean termination separate from cancellation, or document that shutdown implies cancellation. The naming suggests they serve different purposes.
Suggested change:

```diff
-def request_cancel(self) -> None:
-    self._cancel_event.set()
-    try:
-        if hasattr(self._client, "close") and not self._client_closed:
-            self._client.close()
-            self._client_closed = True
-    except Exception:
-        logger.debug(
-            "Failed to close LLM client during cancellation", exc_info=True
-        )
-
-def shutdown(self) -> None:
-    self.request_cancel()
+def _close_client(self) -> None:
+    try:
+        if hasattr(self._client, "close") and not self._client_closed:
+            self._client.close()
+            self._client_closed = True
+    except Exception:
+        logger.debug(
+            "Failed to close LLM client", exc_info=True
+        )
+
+def request_cancel(self) -> None:
+    self._cancel_event.set()
+    self._close_client()
+
+def shutdown(self) -> None:
+    self._close_client()
```
src/core/ai/best_shot_pipeline.py
Outdated
```python
    if hasattr(self._client, "close") and not self._client_closed:
        self._client.close()
        self._client_closed = True
except Exception:
    logger.debug(
        "Failed to close LLM client during cancellation", exc_info=True
    )
```
Potential race condition: The _client_closed flag is checked without synchronization at line 193, but it's set within the same method. If request_cancel() is called concurrently from multiple threads, multiple threads could pass the check and attempt to close the client, or one thread could close it while another is in the middle of checking. Consider using self._lock to protect this check-and-set operation.
Suggested change:

```diff
-    if hasattr(self._client, "close") and not self._client_closed:
-        self._client.close()
-        self._client_closed = True
+    with self._lock:
+        if hasattr(self._client, "close") and not self._client_closed:
+            self._client.close()
+            self._client_closed = True
 except Exception:
     logger.debug(
         "Failed to close LLM client during cancellation", exc_info=True
     )
```
src/core/similarity_engine.py
Outdated
```python
def _format_eta(seconds: float) -> str:
    bounded = max(0, int(round(seconds)))
    return str(timedelta(seconds=bounded))
```
The _format_eta method duplicates functionality across BestShotWorker (line 454-456), AiRatingWorker (line 229-231), and SimilarityEngine (line 77-79). This is the same utility being defined three times. Consider extracting it to a shared time utilities module.
src/core/ai/__init__.py
Outdated
```diff
 """
-AI helper utilities for advanced ranking/scoring pipelines.
+This package exposes the LLM-based best-shot pipeline used for ranking clusters
+and assigning AI star ratings.
 """
-
-from .best_photo_selector import (  # noqa: F401
-    BestPhotoSelector,
-    BestShotResult,
-    MetricSpec,
-)
```
[nitpick] The module docstring change from "AI helper utilities for best-shot ranking and scoring." to a multi-line description is inconsistent with the previous single-line style. Consider keeping the docstring format consistent with other modules in the codebase.
Pull Request Overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
```diff
@@ -65,7 +65,7 @@ def rate_image(self, image_path):  # pragma: no cover - unused in test
     return None
```
The `DummyStrategy` class is missing the `request_cancel` method that was added to `BaseBestShotStrategy` in the refactored code (line 145 in best_shot_pipeline.py). This will cause an `AttributeError` when the worker calls `self._strategy.request_cancel()` during shutdown (line 73 in best_shot_worker.py). Add a stub implementation: `def request_cancel(self) -> None: pass`.
Suggested change:

```python
def request_cancel(self) -> None:
    pass
```
```diff
 class _FailingStrategy(BaseBestShotStrategy):
     def __init__(self):
-        super().__init__(models_root=None, image_pipeline=None, llm_config=None)
+        super().__init__(image_pipeline=None, llm_config=None)

     def rank_cluster(self, cluster_id, image_paths):
         return []
```
The _FailingStrategy class is missing the request_cancel method that was added to BaseBestShotStrategy. This will cause AttributeError when the worker calls self._strategy.request_cancel() during shutdown (line 73 in best_shot_worker.py). Add a stub implementation: def request_cancel(self) -> None: pass.
workers/best_shot_worker.py
Outdated
```python
@staticmethod
def _format_eta(seconds: float) -> str:
    bounded = max(0, int(round(seconds)))
    return str(timedelta(seconds=bounded))
```
[nitpick] The _format_eta method is duplicated across three files (best_shot_worker.py line 454, ai_rating_worker.py line 229, and similarity_engine.py line 77). Consider extracting this to a shared utility module (e.g., core.utils.time_utils) to reduce code duplication and improve maintainability.
…ing utility imports
Pull Request Overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated no new comments.
This pull request refactors the AI best-shot ranking system to exclusively use OpenAI-compatible vision models, removing support for the local IQA (MUSIQ/MANIQA/LIQE) pipeline. It updates the documentation, cleans up related code, and streamlines the user interface and configuration options to reflect this change. Additionally, it introduces a new ETA display for similarity embedding progress and a utility function for formatting ETA.
AI Best-Shot Ranking Refactor
Removed local IQA pipeline support and dependency checks, making OpenAI-compatible vision models the sole engine for best-shot ranking and AI star ratings. This includes deleting `src/core/ai/model_checker.py`, removing related settings and functions, and updating imports and signal handling throughout the codebase. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15]

Updated documentation in `README.md` to clarify that only OpenAI-compatible vision models are supported for best-shot ranking and ratings, and removed references to the local pipeline. [1] [2]

User Interface and Preferences

Similarity Embedding Progress

Added ETA display to similarity embedding progress updates in `src/core/similarity_engine.py`, enhancing user feedback during long-running operations. [1] [2]

Introduced a new utility function `format_eta` in `src/core/utils/time_utils.py` for formatting ETA values in HH:MM:SS format.

Build System

Updated `.github/workflows/release-build.yml` to ensure all necessary paths are included for packaging. [1] [2]