Index caching #82
base: master
…ndexing job, and loading it (followed by a quick refresh) at the next startup
Thanks! For the hashing, you may want to have a look at some code I wrote here (ff4de3e) instead of using the ID. I'll test the code this weekend.
I did see that, actually! It's a great approach for epub files, which seem to be Zip-based, but it unfortunately won't work with most other ebook formats like mobi, pdf, etc., which are not. That was part of the rationale behind my filename/mtime/size approach -- it'll work with literally any ebook format. The other part was that it's a lot faster to just stat() each file during the index refresh rather than opening and reading from each file. It brought the initial (uncached) indexing time down from minutes to seconds on my own library.
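To make the filename/mtime/size idea concrete, here is a minimal sketch of a stat-only fingerprint. The `fileKey` type and the `keyFor` and `unchanged` names are hypothetical and not taken from the PR; the only real API used is `os.Stat`.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// fileKey is a hypothetical fingerprint built only from metadata that
// os.Stat returns, so the file itself is never opened or read.
type fileKey struct {
	Path    string
	ModTime time.Time
	Size    int64
}

// keyFor stats path and returns its fingerprint.
func keyFor(path string) (fileKey, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return fileKey{}, err
	}
	return fileKey{Path: path, ModTime: fi.ModTime(), Size: fi.Size()}, nil
}

// unchanged reports whether two fingerprints describe the same file state.
// time.Time values are compared with Equal rather than ==.
func (k fileKey) unchanged(other fileKey) bool {
	return k.Path == other.Path && k.Size == other.Size && k.ModTime.Equal(other.ModTime)
}

func main() {
	// Stat the current directory twice and compare the two fingerprints.
	a, err := keyFor(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	b, err := keyFor(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("unchanged:", a.unchanged(b))
}
```

The trade-off is that a file rewritten with identical size and mtime would go unnoticed, which a content hash would catch.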
Looks good so far. 👍
@@ -25,6 +26,7 @@ type Indexer struct {
 	booklist booklist.BookList
 	mu sync.Mutex
 	indMu sync.Mutex
+	seen *SeenCache
This should be an interface to allow for different cache implementations.
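One possible shape for such an interface, as a rough sketch only. The `Cache` name, its `Seen` and `Mark` methods, and both implementations are assumptions for illustration; the PR itself only has the concrete `*SeenCache`.

```go
package main

import "fmt"

// Cache is a hypothetical interface for the indexer's "seen" cache.
// The method names and parameters are assumptions, not BookBrowser's API.
type Cache interface {
	// Seen reports whether path was already indexed with this mtime and size.
	Seen(path string, mtime, size int64) bool
	// Mark records path as indexed with this mtime and size.
	Mark(path string, mtime, size int64)
}

type stamp struct{ mtime, size int64 }

// MemCache keeps entries in a map. It is not safe for concurrent use.
type MemCache struct{ entries map[string]stamp }

func NewMemCache() *MemCache { return &MemCache{entries: make(map[string]stamp)} }

func (c *MemCache) Seen(path string, mtime, size int64) bool {
	s, ok := c.entries[path]
	return ok && s.mtime == mtime && s.size == size
}

func (c *MemCache) Mark(path string, mtime, size int64) {
	c.entries[path] = stamp{mtime: mtime, size: size}
}

// NopCache remembers nothing, so every file is treated as new.
type NopCache struct{}

func (NopCache) Seen(string, int64, int64) bool { return false }
func (NopCache) Mark(string, int64, int64)      {}

func main() {
	// Both implementations satisfy Cache and can be swapped freely.
	for _, c := range []Cache{NewMemCache(), NopCache{}} {
		c.Mark("book.epub", 1700000000, 4096)
		fmt.Printf("%T seen: %v\n", c, c.Seen("book.epub", 1700000000, 4096))
	}
}
```

With something like this, the `seen` field could be declared as `Cache`, and a no-op implementation would give a natural "caching off" mode.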
@@ -45,7 +47,79 @@ func New(paths []string, coverpath *string, exts []string) (*Indexer, error) {
 		cp = &p
 	}

-	return &Indexer{paths: paths, coverpath: cp, exts: exts}, nil
+	return &Indexer{paths: paths, coverpath: cp, exts: exts, seen: NewSeenCache()}, nil
I think the cache should be passed as an argument to New.
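A sketch of what passing the cache into `New` might look like. The `Indexer` here is trimmed to the fields visible in the diff, and the one-method `Cache` interface, the `noCache` fallback, and the nil-handling are all assumptions rather than anything the PR implements.

```go
package main

import "fmt"

// Cache is a hypothetical one-method stand-in for whatever interface the
// seen-cache ends up exposing.
type Cache interface {
	Seen(path string) bool
}

// noCache is used when the caller passes nil; it treats every file as new.
type noCache struct{}

func (noCache) Seen(string) bool { return false }

// Indexer is trimmed to the fields that appear in the diff.
type Indexer struct {
	paths     []string
	coverpath *string
	exts      []string
	seen      Cache
}

// New takes the cache from the caller instead of constructing one itself.
// The error return is kept only to mirror the existing signature.
func New(paths []string, coverpath *string, exts []string, seen Cache) (*Indexer, error) {
	if seen == nil {
		seen = noCache{} // assumption: nil means "no caching"
	}
	return &Indexer{paths: paths, coverpath: coverpath, exts: exts, seen: seen}, nil
}

func main() {
	idx, err := New([]string{"/books"}, nil, []string{".epub"}, nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cache hit:", idx.seen.Seen("/books/a.epub"))
}
```

Injecting the cache this way also makes the indexer easier to test, since a test can hand in a stub.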
+	return &Indexer{paths: paths, coverpath: cp, exts: exts, seen: NewSeenCache()}, nil
 }

+func (i *Indexer) Load() error {
Some of this code may fit better as part of the cache. Ideally, the cache would handle its own loading, and the indexer would query the cache during indexing.
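As a rough illustration of that split, the sketch below lets a cache own its on-disk format through `Load` and `Save`, while callers only ask `Seen` and `Mark`. The `FileCache` type, the `stamp` struct, and the `seen.json` file name are hypothetical; only standard-library calls are used.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// stamp is the per-file metadata the cache remembers.
type stamp struct {
	ModTime int64 `json:"mtime"`
	Size    int64 `json:"size"`
}

// FileCache is a hypothetical cache that owns its on-disk representation.
type FileCache struct {
	file    string
	entries map[string]stamp
}

func NewFileCache(file string) *FileCache {
	return &FileCache{file: file, entries: make(map[string]stamp)}
}

// Load reads the cache file. A missing file just means an empty cache.
func (c *FileCache) Load() error {
	b, err := os.ReadFile(c.file)
	if errors.Is(err, fs.ErrNotExist) {
		return nil
	}
	if err != nil {
		return err
	}
	entries := make(map[string]stamp)
	if err := json.Unmarshal(b, &entries); err != nil {
		return err
	}
	if entries == nil { // the file contained JSON null
		entries = make(map[string]stamp)
	}
	c.entries = entries
	return nil
}

// Save writes the cache back to its file.
func (c *FileCache) Save() error {
	b, err := json.Marshal(c.entries)
	if err != nil {
		return err
	}
	return os.WriteFile(c.file, b, 0o644)
}

// Seen reports whether path is cached with the same mtime and size.
func (c *FileCache) Seen(path string, s stamp) bool {
	old, ok := c.entries[path]
	return ok && old == s
}

// Mark records path as indexed.
func (c *FileCache) Mark(path string, s stamp) { c.entries[path] = s }

// run round-trips a cache through a temporary directory.
func run() error {
	dir, err := os.MkdirTemp("", "seencache")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	file := filepath.Join(dir, "seen.json")
	first := NewFileCache(file)
	if err := first.Load(); err != nil { // no file yet: empty cache
		return err
	}
	first.Mark("a.epub", stamp{ModTime: 1, Size: 2})
	if err := first.Save(); err != nil {
		return err
	}

	second := NewFileCache(file)
	if err := second.Load(); err != nil {
		return err
	}
	if !second.Seen("a.epub", stamp{ModTime: 1, Size: 2}) {
		return errors.New("entry lost in round trip")
	}
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("cache round trip ok")
}
```

The indexer would then shrink to calling `Seen` before parsing a file and `Mark` afterwards, with no knowledge of JSON or file paths.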
@@ -91,6 +98,13 @@ func (s *Server) RefreshBookIndex() error {
 	}

 	debug.FreeOSMemory()
+
+	err = s.Indexer.Save()
It should be possible to disable this with a command-line flag.
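A minimal sketch of such a switch using the standard library `flag` package. The flag name `no-index-cache` is hypothetical, `saveIndex` is only a stand-in for `s.Indexer.Save()`, and the project may well use a different flag library.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the parsed settings. The flag name "no-index-cache" is
// hypothetical and not an existing BookBrowser flag.
type options struct {
	noIndexCache bool
}

// parseArgs parses command-line arguments into options.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("bookbrowser", flag.ContinueOnError)
	fs.BoolVar(&o.noIndexCache, "no-index-cache", false, "do not save or load the cached index")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

// saveIndex stands in for s.Indexer.Save() from the diff.
func saveIndex() error { return nil }

// afterRefresh saves the index only when caching is enabled, and
// reports whether a save was attempted.
func afterRefresh(o options) (bool, error) {
	if o.noIndexCache {
		return false, nil
	}
	return true, saveIndex()
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	saved, err := afterRefresh(o)
	fmt.Println("saved:", saved, "err:", err)
}
```

Running the program with `-no-index-cache` skips the save step; without it, the save runs after every refresh as in the PR.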
Added support for saving the index to a simple JSON file after each indexing job, and loading it (followed by a quick refresh) at the next startup. This allows the book library to be populated immediately at startup, rather than waiting a potentially long time for the initial indexing job to complete.
The index file is saved as index.json in the existing cover path.
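The startup flow described above can be sketched roughly as follows. The `Book` type, the `startup` helper, and its `refresh` and `publish` callbacks are hypothetical stand-ins; only the `index.json` name and its location in the cover path come from the description. The refresh runs synchronously here for simplicity, whereas a server would presumably run it in the background.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// Book is a stand-in for the real book type.
type Book struct {
	Title string `json:"title"`
	Path  string `json:"path"`
}

// indexFile returns the index location inside the cover directory.
func indexFile(coverDir string) string { return filepath.Join(coverDir, "index.json") }

// saveIndex writes the book list as JSON.
func saveIndex(coverDir string, books []Book) error {
	b, err := json.Marshal(books)
	if err != nil {
		return err
	}
	return os.WriteFile(indexFile(coverDir), b, 0o644)
}

// loadIndex reads the saved book list.
func loadIndex(coverDir string) ([]Book, error) {
	b, err := os.ReadFile(indexFile(coverDir))
	if err != nil {
		return nil, err
	}
	var books []Book
	err = json.Unmarshal(b, &books)
	return books, err
}

// startup publishes the cached list first (if one can be loaded), then
// runs a refresh, publishes the fresh list, and saves it for next time.
// A missing or unreadable index is treated as a cache miss.
func startup(coverDir string, refresh func() ([]Book, error), publish func([]Book)) error {
	if cached, err := loadIndex(coverDir); err == nil && len(cached) > 0 {
		publish(cached)
	}
	fresh, err := refresh()
	if err != nil {
		return err
	}
	publish(fresh)
	return saveIndex(coverDir, fresh)
}

// run starts up twice against a temporary cover directory and checks
// how many times the list was published each time.
func run() error {
	dir, err := os.MkdirTemp("", "covers")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	published := 0
	publish := func([]Book) { published++ }
	refresh := func() ([]Book, error) {
		return []Book{{Title: "Example", Path: "/books/example.epub"}}, nil
	}

	if err := startup(dir, refresh, publish); err != nil {
		return err
	}
	if published != 1 { // cold start: only the fresh list
		return errors.New("cold start should publish once")
	}
	if err := startup(dir, refresh, publish); err != nil {
		return err
	}
	if published != 3 { // warm start: cached list, then fresh list
		return errors.New("warm start should publish twice")
	}
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("startup flow ok")
}
```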
There's a lot of room for improvement here; this was a quick implementation to eliminate the 15-minute-plus delay at each startup while BookBrowser indexed my tens of thousands of ebooks stored on a remote filesystem.