Optimize import pipeline and local cache writes #524

Open
prodriguezxyz wants to merge 1 commit into jmathai:master from prodriguezxyz:review/optimization-analysis
@prodriguezxyz

Summary

This PR improves import performance by reducing repeated local cache I/O and lowering ExifTool overhead during
import.

Changes

  • Reused a single Db instance per command and deferred cache writes with flush().
  • Ensured import and update persist successful work even when exiting with partial errors.
  • Added in-memory geolocation caching plus a simple spatial index for coordinate lookups.
  • Batched EXIF reads during import and limited ExifTool requests to only the tags needed for path and filename
    generation.
  • Added benchmark and profiling scripts for reproducible validation.
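The deferred-write idea from the first two bullets can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DeferredHashDb`, `add_hash`, `write_count`, and `import_files` are hypothetical names, and the JSON file layout is an assumption. The point is that updates touch memory only, one `flush()` writes to disk, and a `finally` block keeps successful work even when some files fail.

```python
import json
import os


class DeferredHashDb:
    """Hypothetical stand-in for a hash cache; not elodie's actual Db class."""

    def __init__(self, path):
        self.path = path
        self.dirty = False
        self.write_count = 0  # counts disk writes, for illustration only
        self.hash_db = {}
        if os.path.exists(path):
            with open(path) as f:
                self.hash_db = json.load(f)

    def add_hash(self, key, value):
        # Update memory only; the disk write is deferred until flush().
        self.hash_db[key] = value
        self.dirty = True

    def flush(self):
        # Write once, and only if something changed since the last flush.
        if not self.dirty:
            return
        with open(self.path, "w") as f:
            json.dump(self.hash_db, f)
        self.dirty = False
        self.write_count += 1


def import_files(db, files):
    """Record each (name, checksum) pair; a checksum of None simulates a failure."""
    errors = 0
    try:
        for name, checksum in files:
            if checksum is None:
                errors += 1
                continue
            db.add_hash(checksum, name)
    finally:
        db.flush()  # runs on partial failure too, so successes are persisted
    return errors
```

With one shared instance per command, importing N files costs one cache write instead of N, and the caller can still exit with a non-zero status when `errors` is greater than zero.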

Impact

  • hash_db writes are now dramatically faster in local benchmarks.
  • Geolocation lookup scales better for larger location caches.
  • Profiled import of 50 plain.jpg files improved from 1.2477 s to 0.9922 s, roughly a 20% reduction.
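To illustrate why an index helps geolocation lookup scale, here is one possible shape for a simple spatial index: a fixed-size grid keyed by rounded-down coordinates. This is a sketch under assumptions, not the implementation in this PR; `GridLocationIndex`, `haversine_m`, the 1-degree cell size, and the metre threshold are all hypothetical choices. A lookup scans only the 3x3 block of cells around the query point instead of every cached location.

```python
import math
from collections import defaultdict


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


class GridLocationIndex:
    """Hypothetical grid index over cached (lat, lon, name) entries."""

    def __init__(self, cell_degrees=1.0):
        self.cell_degrees = cell_degrees
        self.cells = defaultdict(list)

    def _cell(self, lat, lon):
        return (math.floor(lat / self.cell_degrees),
                math.floor(lon / self.cell_degrees))

    def add(self, lat, lon, name):
        self.cells[self._cell(lat, lon)].append((lat, lon, name))

    def lookup(self, lat, lon, threshold_m):
        """Return the nearest cached name within threshold_m metres, or None."""
        row, col = self._cell(lat, lon)
        best_name, best_dist = None, threshold_m
        # Scan only the query cell and its eight neighbours.
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                for c_lat, c_lon, name in self.cells.get((row + d_row, col + d_col), ()):
                    dist = haversine_m(lat, lon, c_lat, c_lon)
                    if dist <= best_dist:
                        best_name, best_dist = name, dist
        return best_name
```

The neighbour scan is only exhaustive while the threshold is smaller than one cell, and this sketch ignores wrap-around at the antimeridian and the narrowing of longitude cells near the poles.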

Validation

Executed:

  • venv/bin/python -m pytest elodie/tests/localstorage_test.py
  • venv/bin/python -m pytest elodie/tests/localstorage_test.py elodie/tests/filesystem_test.py -k 'defers_hash_db_write'
  • venv/bin/python -m pytest elodie/tests/elodie_test.py -k 'import_file_photo or cli_import_persists_hash_db_on_success or cli_import_flushes_successes_even_when_command_exits_with_error'
