Adds crash-recoverable image storage maintenance for moving existing images into the active image_subfolder_strategy.
The feature includes:
Admin-only UI controls in Settings, available to the single local user when multi-user is disabled.
External script entry point at scripts/image_storage_maintenance.py.
A journaled backend move service with startup recovery.
Database tables for move jobs and move items.
Maintenance guards around image reads, uploads, deletes, gallery mutations, queue enqueue, recall, and image-to-prompt.
Empty source directory cleanup after successful moves and recovery.
Special handling for missing intermediate image files, treating them as already cleaned up while preserving hard failures for missing non-intermediate images.
Documentation under docs/src/content/docs/features/image-storage-maintenance.mdx.
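The journaling idea behind the move service can be illustrated with a minimal sketch. This is not the PR's implementation: the `move_items` layout, the two state names, and the helpers `plan_move`, `apply_move`, and `recover` are assumptions made for illustration. The point is only that the intent is written to the database before the file is renamed, so a restart can finish whatever was interrupted.

```python
import os
import sqlite3
from pathlib import Path

# Hypothetical journal schema; the PR's real tables are not reproduced here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS move_items (
    id INTEGER PRIMARY KEY,
    src TEXT NOT NULL,
    dst TEXT NOT NULL,
    state TEXT NOT NULL DEFAULT 'planned'
)
"""


def open_journal(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def plan_move(conn: sqlite3.Connection, src: Path, dst: Path) -> int:
    """Record the intended move durably before touching the filesystem."""
    cur = conn.execute(
        "INSERT INTO move_items (src, dst) VALUES (?, ?)", (str(src), str(dst))
    )
    conn.commit()
    return cur.lastrowid


def apply_move(conn: sqlite3.Connection, item_id: int) -> None:
    """Perform (or finish) one journaled move; safe to call again after a crash."""
    row = conn.execute(
        "SELECT src, dst FROM move_items WHERE id = ?", (item_id,)
    ).fetchone()
    src, dst = Path(row[0]), Path(row[1])
    if src.exists() and dst.exists():
        # Ambiguous state: refuse to guess which copy is authoritative.
        raise RuntimeError(f"both old and new files exist for item {item_id}")
    if src.exists():
        dst.parent.mkdir(parents=True, exist_ok=True)
        os.replace(src, dst)
    elif not dst.exists():
        raise FileNotFoundError(src)
    # Reaching this point means the file is at dst, so the item can be committed.
    conn.execute("UPDATE move_items SET state = 'committed' WHERE id = ?", (item_id,))
    conn.commit()


def recover(conn: sqlite3.Connection) -> int:
    """Startup recovery: finish every move that was planned but never committed."""
    pending = [
        row[0]
        for row in conn.execute("SELECT id FROM move_items WHERE state = 'planned'")
    ]
    for item_id in pending:
        apply_move(conn, item_id)
    return len(pending)
```

If the process dies between `os.replace` and the state update, `recover` finds the item still marked `planned`, sees that only the destination exists, and commits it without moving anything again.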
Confirm gallery images still load and no duplicate old/new full-size files remain for moved images.
Expected behavior:
Non-admin users in multi-user mode cannot see or use the Settings control.
The single local user can use it when multi-user mode is disabled.
Queue work must be idle before a move starts.
Missing intermediate image files should not fail the move.
Missing non-intermediate image files should cause the job to require operator attention.
Merge Plan
No special merge sequencing expected. This adds a SQLite migration, so use the normal InvokeAI DB migration path.
Known limitation: the external script does not yet take an interprocess lock against a running InvokeAI process. Operators should run scripts/image_storage_maintenance.py only while InvokeAI is stopped. The code includes a TODO to add this guard in a follow-up.
An additional follow-up may also display a "busy" interface in the UI or allow chunked move operations to be safely performed in the background, synchronized with the UI.
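One possible shape for the interprocess guard mentioned in the known limitation is an exclusive lock file. The sketch below is an assumption, not the planned follow-up: `maintenance_lock`, `MaintenanceLockHeld`, and the lock file location are hypothetical names. It relies only on `os.open` with `O_CREAT | O_EXCL`, which creates the file atomically and fails if it already exists.

```python
import os
from contextlib import contextmanager
from pathlib import Path


class MaintenanceLockHeld(RuntimeError):
    """Raised when another process already holds the maintenance lock."""


@contextmanager
def maintenance_lock(lock_path: Path):
    """Hold an exclusive lock file for the duration of the with-block."""
    try:
        # O_EXCL makes creation atomic: exactly one process can create the file.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError as exc:
        raise MaintenanceLockHeld(f"lock already held: {lock_path}") from exc
    try:
        # Store the owner's PID so an operator can see who holds the lock.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)
```

Both the server and the script would need to take the same lock for this to help. A lock file also survives a hard crash, so a real guard would need a way to detect and clear a stale lock, which this sketch does not attempt.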
Checklist
The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)
High: invokeai/app/api/routers/boards.py:104-127 — delete_board with include_images=true calls images.delete_images_on_board(board_id=board_id), which writes to disk via image_files.delete, but the handler lacks assert_image_move_maintenance_inactive(). Every other image-mutating route on the branch was guarded (see invokeai/app/api/routers/images.py, invokeai/app/api/routers/board_images.py, invokeai/app/api/routers/recall_parameters.py, invokeai/app/api/routers/session_queue.py, invokeai/app/api/routers/utilities.py). Scenario: an admin starts an image move job; a parallel DELETE /api/v1/boards/{board_id}?include_images=true then races os.replace calls in complete_partial_filesystem_moves (invokeai/app/services/image_moves/image_moves_default.py:390-441) against image_files.delete. Outcomes: either the move's preflight rule Both old and new image files exist fires (invokeai/app/services/image_moves/image_moves_default.py:405-406) and mark_job_unrecoverable (invokeai/app/services/image_moves/image_moves_default.py:729-735) writes the job to permanent error state, or the commit-validation check (invokeai/app/services/image_moves/image_moves_default.py:481-497) fails because images rows were deleted/changed under the active job. Either way the move journal is poisoned. To expose this issue, add a test that asserts DELETE /api/v1/boards/{board_id}?include_images=true returns 409 when image_moves.is_maintenance_active() is True, parallel to tests/app/routers/test_board_images_maintenance.py.
Medium: invokeai/backend/util/image_storage_maintenance.py:67 (entry points: invokeai/backend/util/image_storage_maintenance.py:58-86, invokeai/scripts/image_storage_maintenance.py) — the TODO on line 67 is load-bearing. There is no interprocess guard preventing the CLI image_storage_maintenance script from running move_all_images while the API server process is up. Scenario: API server is idle (no pending or in_progress queue items, so the CLI's service.assert_no_active_queue_work() at line 74 passes). The CLI completes one batch and commits the job. Between commit_database_updates and the next iteration's create_move_job, the API server's assert_image_move_maintenance_inactive() checks _get_active_job_id() (invokeai/app/services/image_moves/image_moves_default.py:626-638) which returns None (the last job is committed), so a POST /api/v1/queue/{id}/enqueue_batch slips through. The session processor dequeues it and the running invocation calls image_files.save via images.create (invokeai/app/services/images/images_default.py:68, 92), writing to disk while the CLI's next plan_batch reads images and races against a half-written file. For non-intermediate images, preflight_moves raises FileNotFoundError (invokeai/app/services/image_moves/image_moves_default.py:362-364) and aborts the whole batch. Acknowledged by the in-tree TODO but unmitigated. To expose this issue, add a test that boots two ImageMoveService instances against the same SQLite DB / image root, asserts that an API-enqueue path between batches of the other can succeed, and that at least one job ends with state == 'error'.
Medium: invokeai/app/api/routers/images.py:441-455 and invokeai/app/api/routers/images.py:476-490 — get_image_full and get_image_thumbnail now call assert_image_move_maintenance_inactive(), returning HTTP 409 to every <img src> request during a move. The frontend uses these endpoints as static image sources for the gallery, board covers, viewer, and thumbnails; for a non-trivial library the move can take many minutes, during which every rendered image in the UI breaks with 409. The patch acknowledges this implicitly (the comment at invokeai/app/api/routers/images.py:438-440 already mentions browser-issued <img> URLs). To expose this issue, add a test that asserts GET /api/v1/images/i/{image_name}/full returns 409 when maintenance is active, and decide whether the UX should instead render a "maintenance in progress" placeholder rather than broken <img> tags.
Medium: invokeai/app/services/image_moves/image_moves_default.py:309-378 — plan_batch rejects the entire batch on the first non-intermediate move whose source file is missing (line 363-364, raise FileNotFoundError). For an environment whose disk has even one missing non-intermediate file (crash mid-write, external file removal), move_all_images raises out of the loop at invokeai/app/services/image_moves/image_moves_default.py:228-232 and the entire move_all background operation fails before reaching subsequent batches. There is no skip-and-continue path. To expose this issue, add a test that seeds two non-intermediate images rows, deletes one source file, calls ImageMoveService.move_all_images(), and asserts that the second image still moves (currently it will not — the FileNotFoundError aborts the whole batch).
Medium: invokeai/app/services/image_moves/image_moves_default.py:104-109 — stop() calls self._executor.shutdown(wait=False, cancel_futures=False) so the background worker thread continues to mutate filesystem and DB after the Invoker has stopped. Other services calling stop() in the shutdown sequence may release resources the worker still depends on. BACKGROUND_SHUTDOWN_ERROR is recorded on the active job even if the worker successfully finishes after shutdown, which then leaves the next startup with an error_message despite the job being internally consistent. The journal + startup_recovery flow recovers on next boot, so this is recoverable, but the recorded error is misleading. To expose this issue, add a test that starts a long-running fake move_all target, calls service.stop() while it's still in progress, lets it complete, then asserts the journal job's error_message reflects actual failure rather than a shutdown notice.
Low: invokeai/app/api/routers/images.py:281-282, invokeai/app/api/routers/images.py:340-341, invokeai/app/api/routers/images.py:397-398, and invokeai/app/api/routers/recall_parameters.py:436-440 — the maintenance assertion runs before the per-route ownership/authorization assertion. Any authenticated caller learns whether maintenance is active for any image name they choose (including names they do not own), because the 409 response fires before the _assert_image_owner 403/404 path. Minor information disclosure (presence/absence of a maintenance window, not user data). To expose this issue, add a test that an authenticated non-admin user receives 409 (not 403/404) when calling DELETE /api/v1/images/i/{owned-by-someone-else} while maintenance is active, then decide whether to reorder the checks.
Low: invokeai/frontend/web/src/features/system/components/SettingsModal/SettingsImageStorageMaintenance.tsx:54-57 — pollingInterval: 2000 runs unconditionally for as long as the settings modal is mounted and canAccess is true. There is no backoff when no job is running; an admin who leaves the settings panel open generates a GET /api/v1/image_moves/status every two seconds indefinitely. Also lastInvalidatedJobIdRef resets on remount, so closing and reopening the modal after a committed move causes a second full tag invalidation against the same latest_job.id, refetching the gallery again. To expose this issue, add a vitest that mounts and unmounts the component twice with the same committed latest job and asserts api.util.invalidateTags is dispatched only once across mounts (e.g. by hoisting the de-duplication into a redux slice instead of a ref).
Low: invokeai/app/api/routers/images.py:438-440 and invokeai/app/api/routers/images.py:471-475 — the docstring claims get_image_full and get_image_thumbnail are unauthenticated because they're hit by <img> tags. Adding assert_image_move_maintenance_inactive() does not change the auth posture but the comment block no longer reflects that the route can also fail with 409. To expose this issue, update the docstrings to mention the 409 contract; no test needed.
Open Questions
invokeai/app/services/image_moves/image_moves_default.py:191-195: _refresh_finished_future_locked clears self._future and self._future_operation when the future completes, but self._last_background_error is left in place. After a failed background operation, a fresh start_background_move_all call resets self._last_background_error = None (line 158) before submitting, but a get_background_status call made between the failure and the next start will report is_running=False, operation=None, last_error=<old error>. Does the UI's Status running: None / Status: <state> composition (invokeai/frontend/web/src/features/system/components/SettingsModal/SettingsImageStorageMaintenance.tsx:86-97) intentionally swallow last_error? The component only renders latestJob.error_message, not status.last_error, so the recorded last error is invisible.
invokeai/app/services/image_moves/image_moves_default.py:688-696: _remove_empty_parents uses start.resolve() and root.resolve() and walks up via current.parent. Python's Path.resolve(strict=False) returns a path even for non-existent directories. After rmdir succeeds, current no longer exists, and current.parent on a resolved non-existent path returns the lexical parent, not a re-resolved one. On systems with symlinked outputs roots this is fine in practice, but it is worth adding a unit test confirming that a symlinked image_root does not allow the cleanup to escape past root.
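A bounded version of the upward cleanup discussed in the second question might look like the sketch below. It is an illustration, not the code at the cited lines: the public name `remove_empty_parents` and the returned list are assumptions. The loop condition checks containment against the resolved root on every step, so the walk stops at root even though `current.parent` is purely lexical.

```python
from pathlib import Path


def remove_empty_parents(start: Path, root: Path) -> list[Path]:
    """Remove empty directories from start upward, never root or anything outside it."""
    root = root.resolve()
    current = start.resolve()
    removed: list[Path] = []
    # Continue only while current is strictly inside root.
    while current != root and root in current.parents:
        try:
            current.rmdir()  # succeeds only when the directory is empty
        except OSError:
            break  # non-empty, missing, or not removable: stop walking up
        removed.append(current)
        current = current.parent
    return removed
```

A unit test for the symlink case could create a temporary directory, symlink `image_root` to it, and assert that nothing at or above the resolved root is removed.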
DELETE /api/v1/boards/{board_id}?include_images=true now returns 409 during image storage maintenance.
Added router coverage to ensure image deletion through board deletion is not called during maintenance.
Tightened auth ordering:
Image delete, image update, workflow/graph read, clear intermediates, and recall now perform ownership/admin/read checks before returning maintenance 409.
This avoids leaking maintenance state before authorization is established.
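The ordering can be shown with a framework-free sketch. Everything here is a simplified stand-in: `HTTPError`, `User`, and the two assertion helpers are hypothetical and only mirror the roles of the ownership check and the maintenance check named in the review.

```python
from dataclasses import dataclass


class HTTPError(Exception):
    """Stand-in for a web framework's HTTP exception (hypothetical)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code


@dataclass
class User:
    user_id: str
    is_admin: bool = False


def assert_image_owner(user: User, owner_id: str) -> None:
    if not user.is_admin and user.user_id != owner_id:
        raise HTTPError(403, "Not authorized for this image")


def assert_maintenance_inactive(maintenance_active: bool) -> None:
    if maintenance_active:
        raise HTTPError(409, "Image storage maintenance is in progress")


def delete_image(user: User, owner_id: str, maintenance_active: bool) -> str:
    # Authorization first: a caller who may not touch the image gets 403
    # whether or not maintenance is running, so the 409 reveals nothing.
    assert_image_owner(user, owner_id)
    assert_maintenance_inactive(maintenance_active)
    return "deleted"
```

With this order, a non-owner sees 403 in both states, and only a caller who is already allowed to act on the image can observe the 409.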
Clarified full/thumbnail behavior:
Updated image full/thumbnail route docstrings to mention 409 during maintenance.
Updated docs to state gallery images and thumbnails may be unavailable while maintenance is active.
Improved move resilience:
Missing non-intermediate source files no longer abort the whole move_all_images() run.
They are recorded as error jobs, and later valid images continue moving.
Missing intermediate source files are treated as already cleaned up and committed.
Orphaned thumbnails for missing intermediate files are deleted.
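The skip-and-continue behavior can be sketched as follows. This is an assumption-laden illustration rather than the service's code: `PlannedMove`, `move_all`, and the three result buckets are hypothetical, and thumbnail cleanup is left out.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PlannedMove:
    src: Path
    dst: Path
    is_intermediate: bool


def move_all(moves: list[PlannedMove]) -> dict[str, list[Path]]:
    """Move every file that can be moved; record problems instead of aborting."""
    result: dict[str, list[Path]] = {"moved": [], "skipped": [], "errors": []}
    for move in moves:
        if not move.src.exists():
            # Missing intermediates count as already cleaned up; missing
            # non-intermediates are recorded for operator attention.
            key = "skipped" if move.is_intermediate else "errors"
            result[key].append(move.src)
            continue
        move.dst.parent.mkdir(parents=True, exist_ok=True)
        os.replace(move.src, move.dst)
        result["moved"].append(move.dst)
    return result
```

The important property is that a missing source only affects its own entry, so later valid images in the same run are still moved.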
Improved shutdown behavior:
ImageMoveService.stop() now waits for the active background worker to finish instead of recording a misleading shutdown error while the worker keeps mutating DB/filesystem state.
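A minimal sketch of a blocking stop, using only the standard library, is shown below. `BackgroundMover` is a hypothetical stand-in for the service; the only behavior it demonstrates is that `ThreadPoolExecutor.shutdown(wait=True)` does not return until the submitted job has finished.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BackgroundMover:
    """Hypothetical stand-in for a service that runs one background job."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)
        self.finished = False

    def start(self, release: threading.Event) -> None:
        self._executor.submit(self._work, release)

    def _work(self, release: threading.Event) -> None:
        release.wait(timeout=5)  # pretend to move files until released
        self.finished = True

    def stop(self) -> None:
        # wait=True blocks until the running job returns, so no worker keeps
        # mutating state after stop() has come back.
        self._executor.shutdown(wait=True)
```

The trade-off is that shutdown now takes as long as the in-flight work, which is why a bounded batch size or a cooperative cancel flag is often paired with this pattern.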
Improved Settings UI behavior:
Polling now runs only while a maintenance job is active or after starting/recovering a job.
Completed-job cache invalidation is deduped across modal remounts.
Background last_error is surfaced in the Settings panel.
Related Issues / Discussions
#8969
QA Instructions
Validated with focused backend and frontend checks.
To test manually, use a development install with a temporary InvokeAI root. Do not test this against an existing image library. You have been warned.
image_subfolder_strategy: flat.
image_subfolder_strategy in invokeai.yaml to date, type, or hash.
outputs/images/ and outputs/images/thumbnails/ according to the configured strategy.
External script path:
status exits successfully when no job needs attention.
move is idempotent after the UI move has already completed.
Recovery smoke test: