This repository was archived by the owner on Jun 27, 2020. It is now read-only.

Index caching #82

Open
wants to merge 1 commit into
base: master

Conversation

sblinch
Contributor

@sblinch sblinch commented Mar 14, 2019

Added support for saving the index to a simple JSON file after each indexing job, and loading it (followed by a quick refresh) at the next startup. This allows the book library to be populated immediately at startup, rather than waiting a potentially long time for the initial indexing job to complete.

The index file is saved as index.json in the existing cover path.
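A minimal sketch of the save-and-reload cycle described above. The `Book` record and the helper names `saveIndex`, `loadIndex`, and `roundTrip` are assumptions for illustration; the PR's actual types are not shown in this conversation. Writing to a temporary file and renaming it is also an assumption, added so that an interrupted save does not leave a truncated index.json behind.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Book is a hypothetical stand-in for BookBrowser's book record.
type Book struct {
	Title    string `json:"title"`
	Author   string `json:"author"`
	FilePath string `json:"file_path"`
}

// saveIndex writes the book list to index.json inside dir.
// It writes a temporary file first and renames it into place.
func saveIndex(dir string, books []Book) error {
	data, err := json.Marshal(books)
	if err != nil {
		return err
	}
	tmp := filepath.Join(dir, "index.json.tmp")
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(dir, "index.json"))
}

// loadIndex reads index.json from dir. A missing file is not an error:
// it only means that no index has been cached yet.
func loadIndex(dir string) ([]Book, error) {
	data, err := os.ReadFile(filepath.Join(dir, "index.json"))
	if os.IsNotExist(err) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	var books []Book
	if err := json.Unmarshal(data, &books); err != nil {
		return nil, err
	}
	return books, nil
}

// roundTrip saves a sample index to a temporary directory and loads it back.
func roundTrip() ([]Book, error) {
	dir, err := os.MkdirTemp("", "covers")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	sample := []Book{{Title: "Dune", Author: "Frank Herbert", FilePath: "/books/dune.epub"}}
	if err := saveIndex(dir, sample); err != nil {
		return nil, err
	}
	return loadIndex(dir)
}

func main() {
	books, err := roundTrip()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(books), books[0].Title)
}
```

At startup, the loaded list would populate the library immediately, and the quick refresh would then reconcile it with the files on disk.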

There's a lot of room for improvement here; this was a quick implementation to eliminate the 15-minute-plus delay at each startup while BookBrowser indexed my tens of thousands of ebooks stored on a remote filesystem.

…ndexing job, and loading it (followed by a quick refresh) at the next startup
@pgaskin
Owner

pgaskin commented Mar 14, 2019

Thanks! For the hashing, you may want to have a look at some code I wrote here (ff4de3e) instead of using the ID. I'll test the code this weekend.

@pgaskin pgaskin self-requested a review March 14, 2019 23:52
@pgaskin pgaskin self-assigned this Mar 14, 2019
@pgaskin pgaskin added this to the v4.1.0 milestone Mar 15, 2019
@sblinch
Contributor Author

sblinch commented Mar 15, 2019

some code I wrote here (ff4de3e) instead of using the ID

I did see that, actually! It's a great approach for epub files, which seem to be Zip-based, but it unfortunately won't work with most other ebook formats like mobi, pdf, etc., which are not.

That was part of the rationale behind my filename/mtime/size approach -- it'll work with literally any ebook format. The other part was that it's a lot faster to just stat() each file during the index refresh rather than opening and reading from each file. It brought the initial (uncached) indexing time down from minutes to seconds on my own library.
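The filename/mtime/size check can be sketched roughly as follows. The `fingerprint` type and the function names are hypothetical, not taken from the PR; the point is only that a single `os.Stat` call yields everything needed, so the file itself is never opened.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// fingerprint identifies one version of a file by metadata only (hypothetical type).
type fingerprint struct {
	Path    string
	ModTime time.Time
	Size    int64
}

// statFingerprint builds a fingerprint from a single stat call.
func statFingerprint(path string) (fingerprint, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return fingerprint{}, err
	}
	return fingerprint{Path: path, ModTime: fi.ModTime(), Size: fi.Size()}, nil
}

// unchanged reports whether two fingerprints describe the same file version.
func (f fingerprint) unchanged(other fingerprint) bool {
	return f.Path == other.Path && f.Size == other.Size && f.ModTime.Equal(other.ModTime)
}

// demo fingerprints a temporary file twice, rewrites it with different
// content, and fingerprints it again.
func demo() (bool, bool, error) {
	f, err := os.CreateTemp("", "book-*.mobi")
	if err != nil {
		return false, false, err
	}
	name := f.Name()
	defer os.Remove(name)
	if _, err := f.WriteString("v1"); err != nil {
		f.Close()
		return false, false, err
	}
	if err := f.Close(); err != nil {
		return false, false, err
	}

	before, err := statFingerprint(name)
	if err != nil {
		return false, false, err
	}
	again, err := statFingerprint(name)
	if err != nil {
		return false, false, err
	}

	// Rewriting the file with longer content changes its size.
	if err := os.WriteFile(name, []byte("version two"), 0o644); err != nil {
		return false, false, err
	}
	after, err := statFingerprint(name)
	if err != nil {
		return false, false, err
	}
	return before.unchanged(again), before.unchanged(after), nil
}

func main() {
	sameBefore, sameAfter, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sameBefore, sameAfter) // prints "true false"
}
```

The trade-off is that an edit which preserves both size and modification time goes unnoticed, which a content hash would catch at the cost of reading every file.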

Owner

@pgaskin pgaskin left a comment

Looks good so far. 👍

@@ -25,6 +26,7 @@ type Indexer struct {
booklist booklist.BookList
mu sync.Mutex
indMu sync.Mutex
seen *SeenCache
Owner

This should be an interface to allow for different cache implementations.

@@ -45,7 +47,79 @@ func New(paths []string, coverpath *string, exts []string) (*Indexer, error) {
cp = &p
}

-	return &Indexer{paths: paths, coverpath: cp, exts: exts}, nil
+	return &Indexer{paths: paths, coverpath: cp, exts: exts, seen: NewSeenCache()}, nil
Owner

I think the cache should be passed as an argument to New.
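One way the two suggestions (a cache interface, injected through the constructor) could fit together is sketched below. Everything here is an assumption for illustration: the `Cache` interface, its `Has`/`Add` methods, the two implementations, and the simplified `New` signature, which omits the cover path and extension arguments of the real constructor.

```go
package main

import "fmt"

// Cache is a hypothetical interface the indexer could depend on
// instead of a concrete *SeenCache.
type Cache interface {
	Has(key string) bool
	Add(key string)
}

// memoryCache is one possible implementation, backed by a map.
type memoryCache struct {
	seen map[string]struct{}
}

func newMemoryCache() *memoryCache {
	return &memoryCache{seen: make(map[string]struct{})}
}

func (c *memoryCache) Has(key string) bool {
	_, ok := c.seen[key]
	return ok
}

func (c *memoryCache) Add(key string) {
	c.seen[key] = struct{}{}
}

// noCache never remembers anything, so every file is always re-indexed.
type noCache struct{}

func (noCache) Has(string) bool { return false }
func (noCache) Add(string)      {}

// Indexer is a simplified stand-in for the real indexer.
type Indexer struct {
	paths []string
	seen  Cache
}

// New receives the cache from the caller; a nil cache disables caching.
func New(paths []string, seen Cache) (*Indexer, error) {
	if seen == nil {
		seen = noCache{}
	}
	return &Indexer{paths: paths, seen: seen}, nil
}

// needsIndexing reports whether key is new to the cache and records it.
func (i *Indexer) needsIndexing(key string) bool {
	if i.seen.Has(key) {
		return false
	}
	i.seen.Add(key)
	return true
}

func main() {
	cached, _ := New([]string{"/books"}, newMemoryCache())
	fmt.Println(cached.needsIndexing("a.epub"), cached.needsIndexing("a.epub")) // prints "true false"

	uncached, _ := New([]string{"/books"}, nil)
	fmt.Println(uncached.needsIndexing("a.epub"), uncached.needsIndexing("a.epub")) // prints "true true"
}
```

With this shape, tests can pass in a fake cache, and a no-op implementation gives a natural way to switch caching off.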

return &Indexer{paths: paths, coverpath: cp, exts: exts, seen: NewSeenCache()}, nil
}

func (i *Indexer) Load() error {
Owner

Some of this code may fit better as part of the cache. Ideally, the cache would handle its own loading, and the indexer would query the cache during indexing.
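A rough sketch of that division of labor, under stated assumptions: the `fileCache` type, its `Lookup`/`Store`/`Save` methods, the `entry` and `file` records, and the `index` loop are all hypothetical and only illustrate a cache that owns its persistence while the indexer just asks whether a file is already known.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// entry is what the hypothetical cache remembers about one file.
type entry struct {
	ModTime int64  `json:"mtime"` // Unix seconds
	Size    int64  `json:"size"`
	Title   string `json:"title"`
}

// fileCache owns its persistence; the indexer only calls Lookup and Store.
type fileCache struct {
	path    string
	entries map[string]entry
}

// openCache loads the cache from path, starting empty if the file is missing.
func openCache(path string) (*fileCache, error) {
	c := &fileCache{path: path, entries: map[string]entry{}}
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return c, nil
	}
	if err != nil {
		return nil, err
	}
	if err := json.Unmarshal(data, &c.entries); err != nil {
		return nil, err
	}
	if c.entries == nil { // a file containing "null" would reset the map
		c.entries = map[string]entry{}
	}
	return c, nil
}

// Save writes the cache back to the file it was opened from.
func (c *fileCache) Save() error {
	data, err := json.Marshal(c.entries)
	if err != nil {
		return err
	}
	return os.WriteFile(c.path, data, 0o644)
}

// Lookup returns the cached title if name is known with the same mtime and size.
func (c *fileCache) Lookup(name string, mtime, size int64) (string, bool) {
	e, ok := c.entries[name]
	if !ok || e.ModTime != mtime || e.Size != size {
		return "", false
	}
	return e.Title, true
}

// Store records or replaces the entry for name.
func (c *fileCache) Store(name string, e entry) { c.entries[name] = e }

// file describes one library file as the indexer sees it.
type file struct {
	Name          string
	ModTime, Size int64
}

// index parses only the files the cache cannot answer for and
// returns how many files had to be parsed.
func index(c *fileCache, files []file, parse func(string) string) int {
	parsed := 0
	for _, f := range files {
		if _, ok := c.Lookup(f.Name, f.ModTime, f.Size); ok {
			continue
		}
		c.Store(f.Name, entry{ModTime: f.ModTime, Size: f.Size, Title: parse(f.Name)})
		parsed++
	}
	return parsed
}

// demo indexes two files, saves, simulates a restart with one file changed,
// and indexes again.
func demo() (int, int, error) {
	dir, err := os.MkdirTemp("", "cache")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "index.json")
	files := []file{{Name: "a.epub", ModTime: 100, Size: 10}, {Name: "b.pdf", ModTime: 200, Size: 20}}
	parse := func(name string) string { return "title of " + name }

	c, err := openCache(path)
	if err != nil {
		return 0, 0, err
	}
	first := index(c, files, parse)
	if err := c.Save(); err != nil {
		return 0, 0, err
	}

	files[1].Size = 25 // b.pdf changed between runs
	c, err = openCache(path)
	if err != nil {
		return 0, 0, err
	}
	return first, index(c, files, parse), nil
}

func main() {
	first, second, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(first, second) // prints "2 1"
}
```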

@@ -91,6 +98,13 @@ func (s *Server) RefreshBookIndex() error {
}

debug.FreeOSMemory()

err = s.Indexer.Save()
Owner

It should be possible to disable this with a command-line flag.
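A sketch of such a switch using the standard library `flag` package. The flag name `--no-index-cache`, the `options` struct, and the `afterRefresh` helper are assumptions for illustration, and BookBrowser's real command-line handling may use a different flag library.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the parsed settings (hypothetical).
type options struct {
	cacheIndex bool
}

// parseArgs parses the arguments; index caching stays on unless
// --no-index-cache is given.
func parseArgs(args []string) (options, error) {
	fs := flag.NewFlagSet("bookbrowser", flag.ContinueOnError)
	noCache := fs.Bool("no-index-cache", false, "do not save or load index.json")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return options{cacheIndex: !*noCache}, nil
}

// afterRefresh calls save only when caching is enabled.
func afterRefresh(opts options, save func() error) error {
	if !opts.cacheIndex {
		return nil
	}
	return save()
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	err = afterRefresh(opts, func() error {
		fmt.Println("saving index")
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Run without arguments, this prints "saving index"; with `--no-index-cache`, the save step is skipped.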
