Skip to content

[FR] SQL Query Improvments for Larger DB #6455

@Gykes

Description

@Gykes

Describe the feature you'd like

Currently, there are some edge case issue for larger databases.

The main conversation for it starts on Discord here

My proposed fixes/helps are on Discord here

Current Behavior:

When filtering scenes by folder path (e.g., /data/namer/), Stash builds a query like:

  SELECT DISTINCT scenes.id
  FROM scenes
  LEFT JOIN scenes_files ON scenes_files.scene_id = scenes.id
  LEFT JOIN files ON scenes_files.file_id = files.id
  LEFT JOIN folders ON files.parent_folder_id = folders.id
  ...
  WHERE (folders.path || '/' || files.basename LIKE '%/data/namer/%')
  AND (TRIM(scene_stash_ids.stash_id) = '' OR scene_stash_ids.stash_id IS NULL)

This approach has several performance issues:

  1. LIKE '%pattern%' with leading wildcard prevents SQLite from using the index on folders.path, forcing a full table scan

  2. String concatenation (folders.path || '/' || files.basename) must be computed for every row

  3. TRIM() function call prevents index usage and adds per-row overhead

  4. Join order: Starting from scenes and LEFT JOIN everything processes all 689k scenes before filtering

Proposed Optimizations:

  1. Use GLOB instead of LIKE for prefix matching
  • Instead of: folders.path LIKE '%/data/namer/%' use: folders.path GLOB '/data/namer/*'

  • GLOB enables SQLite to use the B-tree index on folders.path for prefix matching. It's also case-sensitive, which is correct for filesystem paths and reduces comparison overhead.

  1. Reorder joins to filter early
  • Start from folders (the most selective table after filtering) and INNER JOIN to files -> scenes_files -> scenes, rather than starting from scenes and LEFT JOIN everything. This lets SQLite narrow down to matching scenes immediately instead of processing all scenes first.
  1. Add INDEXED BY hints
  • Explicitly hint the query planner to use optimal indexes:
    • INDEXED BY index_folders_on_path_unique (TESTING REQUIRED)
    • INDEXED BY index_scenes_files_file_id (TESTING REQUIRED
  1. Remove TRIM() from stash_id check
  • Instead of: TRIM(scene_stash_ids.stash_id) = ''

  • Use: scene_stash_ids.stash_id = ''

The TRIM() function prevents index usage and adds overhead. Stash IDs should be trimmed/validated when stored, not on every query.

Performance Testing

Tested on a database with:

  • 689,458 scenes
  • 673,000+ folders
  • 50+ million files
Query Version Time
Current (LIKE with leading wildcard) ~530ms
Remove leading wildcard only ~400ms
+ Optimized join order ~290ms
+ GLOB + all optimizations ~100ms

Describe the benefits this would bring to existing users

Faster SQL and no warnings in their logs

Is there an existing way to achieve this goal?

No

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover the core request of my proposal

Additional context

A simpler interim improvement would be adding a "Starts With" option to the folder filter UI. This would allow users to opt into LIKE '/path/%' (no leading wildcard) when they know their target is a top-level folder, without requiring query builder changes.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions