-
-
Notifications
You must be signed in to change notification settings - Fork 978
Description
Describe the feature you'd like
Currently, there are some edge case issue for larger databases.
The main conversation for it starts on Discord here
My proposed fixes/helps are on Discord here
Current Behavior:
When filtering scenes by folder path (e.g., /data/namer/), Stash builds a query like:
SELECT DISTINCT scenes.id
FROM scenes
LEFT JOIN scenes_files ON scenes_files.scene_id = scenes.id
LEFT JOIN files ON scenes_files.file_id = files.id
LEFT JOIN folders ON files.parent_folder_id = folders.id
...
WHERE (folders.path || '/' || files.basename LIKE '%/data/namer/%')
AND (TRIM(scene_stash_ids.stash_id) = '' OR scene_stash_ids.stash_id IS NULL)
This approach has several performance issues:
-
LIKE '%pattern%'with leading wildcard prevents SQLite from using the index onfolders.path, forcing a full table scan -
String concatenation
(folders.path || '/' || files.basename)must be computed for every row -
TRIM()function call prevents index usage and adds per-row overhead -
Join order: Starting from scenes and
LEFT JOINeverything processes all 689k scenes before filtering
Proposed Optimizations:
- Use
GLOBinstead ofLIKEfor prefix matching
-
Instead of:
folders.path LIKE '%/data/namer/%'use:folders.path GLOB '/data/namer/*' -
GLOBenables SQLite to use theB-tree indexonfolders.pathfor prefix matching. It's also case-sensitive, which is correct for filesystem paths and reduces comparison overhead.
- Reorder joins to filter early
- Start from folders (the most selective table after filtering) and
INNER JOINtofiles->scenes_files->scenes, rather than starting from scenes andLEFT JOINeverything. This lets SQLite narrow down to matching scenes immediately instead of processing all scenes first.
- Add INDEXED BY hints
- Explicitly hint the query planner to use optimal indexes:
INDEXED BY index_folders_on_path_unique(TESTING REQUIRED)INDEXED BY index_scenes_files_file_id(TESTING REQUIRED
- Remove
TRIM()from stash_id check
-
Instead of:
TRIM(scene_stash_ids.stash_id) = '' -
Use:
scene_stash_ids.stash_id = ''
The TRIM() function prevents index usage and adds overhead. Stash IDs should be trimmed/validated when stored, not on every query.
Performance Testing
Tested on a database with:
- 689,458 scenes
- 673,000+ folders
- 50+ million files
| Query Version | Time |
|---|---|
| Current (LIKE with leading wildcard) | ~530ms |
| Remove leading wildcard only | ~400ms |
| + Optimized join order | ~290ms |
| + GLOB + all optimizations | ~100ms |
Describe the benefits this would bring to existing users
Faster SQL and no warnings in their logs
Is there an existing way to achieve this goal?
No
Have you searched for an existing open/closed issue?
- I have searched for existing issues and none cover the core request of my proposal
Additional context
A simpler interim improvement would be adding a "Starts With" option to the folder filter UI. This would allow users to opt into LIKE '/path/%' (no leading wildcard) when they know their target is a top-level folder, without requiring query builder changes.