Index caching #82

sblinch · 2019-03-14T00:16:56Z

Added support for saving the index to a simple JSON file after each indexing job, and loading it (followed by a quick refresh) at the next startup. This allows the book library to be populated immediately at startup, rather than waiting a potentially long time for the initial indexing job to complete.

The index file is saved as index.json in the existing cover path.
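The save/load round trip described above could look roughly like the sketch below. The `Book` struct and the helper names `indexPath`, `saveIndex`, and `loadIndex` are assumptions for illustration, not the types or functions used in this PR. Writing to a temporary file and renaming it is an extra precaution added in this sketch so that an interrupted save does not leave a truncated index.json behind.

```go
package main

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// Book is a simplified stand-in for the real book record (assumption).
type Book struct {
	Title    string `json:"title"`
	Author   string `json:"author"`
	FilePath string `json:"filepath"`
}

// indexPath returns the location of the cache file inside the cover directory.
func indexPath(coverDir string) string {
	return filepath.Join(coverDir, "index.json")
}

// saveIndex writes the book list as JSON. It writes to a temporary file and
// then renames it into place, so a crash mid-write cannot truncate an
// existing index.
func saveIndex(coverDir string, books []Book) error {
	data, err := json.Marshal(books)
	if err != nil {
		return err
	}
	tmp := indexPath(coverDir) + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, indexPath(coverDir))
}

// loadIndex reads a previously saved index. A missing file is not an error:
// it only means nothing has been cached yet.
func loadIndex(coverDir string) ([]Book, error) {
	data, err := os.ReadFile(indexPath(coverDir))
	if os.IsNotExist(err) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	var books []Book
	if err := json.Unmarshal(data, &books); err != nil {
		return nil, err
	}
	return books, nil
}
```

At startup, the result of `loadIndex` would populate the library immediately, and the regular indexing job would then refresh it in the background.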

There's a lot of room for improvement here; this was a quick implementation to eliminate the 15-minute-plus delay at each startup while BookBrowser indexed my tens of thousands of ebooks stored on a remote filesystem.


pgaskin · 2019-03-14T23:51:01Z

Thanks! For the hashing, you may want to have a look at some code I wrote here (ff4de3e) instead of using the ID. I'll test the code this weekend.

sblinch · 2019-03-15T04:28:14Z

> some code I wrote here (ff4de3e) instead of using the ID

I did see that, actually! It's a great approach for epub files, which seem to be Zip-based, but it unfortunately won't work with most other ebook formats like mobi, pdf, etc., which are not.

That was part of the rationale behind my filename/mtime/size approach -- it'll work with literally any ebook format. The other part was that it's a lot faster to just stat() each file during the index refresh rather than opening and reading from each file. It brought the initial (uncached) indexing time down from minutes to seconds on my own library.
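As a rough illustration of the filename/mtime/size idea, the sketch below fingerprints a file using only `os.Stat`. `SeenCache` and `NewSeenCache` are names that appear in this PR's diff, but the fields and the `Changed` and `Mark` methods shown here are assumptions and may not match the actual implementation.

```go
package main

import (
	"os"
	"time"
)

// fileSig is the cheap fingerprint: size and modification time.
type fileSig struct {
	Size    int64
	ModTime time.Time
}

// SeenCache maps a file path to the signature recorded when the file was
// last indexed. The layout is hypothetical.
type SeenCache struct {
	seen map[string]fileSig
}

// NewSeenCache returns an empty cache.
func NewSeenCache() *SeenCache {
	return &SeenCache{seen: make(map[string]fileSig)}
}

// Changed reports whether the file at path is new or differs from the
// recorded signature. Only os.Stat is called; the file is never opened.
func (c *SeenCache) Changed(path string) (bool, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	old, ok := c.seen[path]
	return !ok || old.Size != fi.Size() || !old.ModTime.Equal(fi.ModTime()), nil
}

// Mark records the current signature of path after it has been indexed.
func (c *SeenCache) Mark(path string) error {
	fi, err := os.Stat(path)
	if err != nil {
		return err
	}
	c.seen[path] = fileSig{Size: fi.Size(), ModTime: fi.ModTime()}
	return nil
}
```

The trade-off is that a file rewritten with identical size and modification time would go unnoticed, which a content hash would catch at the cost of reading every file.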

pgaskin

Looks good so far. 👍

pgaskin · 2019-03-17T14:54:39Z

indexer/indexer.go

@@ -25,6 +26,7 @@ type Indexer struct {
 	booklist  booklist.BookList
 	mu        sync.Mutex
 	indMu     sync.Mutex
+	seen      *SeenCache


This should be an interface to allow for different cache implementations.
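One possible shape for such an interface is sketched below. The `Cache` interface, its method names, and the string signature are hypothetical and chosen only for illustration; `memCache` is a trivial in-memory implementation to show that the `Indexer` field no longer depends on a concrete type.

```go
package main

import "sync"

// Cache is a hypothetical interface the Indexer could depend on.
type Cache interface {
	// Seen reports whether path was already indexed with the same signature.
	Seen(path, sig string) bool
	// Record stores the signature for path.
	Record(path, sig string)
}

// memCache is one possible implementation: a map guarded by a mutex.
type memCache struct {
	mu   sync.Mutex
	sigs map[string]string
}

func newMemCache() *memCache {
	return &memCache{sigs: make(map[string]string)}
}

func (m *memCache) Seen(path, sig string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	old, ok := m.sigs[path]
	return ok && old == sig
}

func (m *memCache) Record(path, sig string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.sigs[path] = sig
}

// Compile-time check that memCache satisfies Cache.
var _ Cache = (*memCache)(nil)

// Indexer holds the interface rather than a concrete cache type.
// Other fields are omitted in this sketch.
type Indexer struct {
	seen Cache
}
```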

pgaskin · 2019-03-17T14:55:25Z

indexer/indexer.go

@@ -45,7 +47,79 @@ func New(paths []string, coverpath *string, exts []string) (*Indexer, error) {
 		cp = &p
 	}

-	return &Indexer{paths: paths, coverpath: cp, exts: exts}, nil
+	return &Indexer{paths: paths, coverpath: cp, exts: exts, seen: NewSeenCache()}, nil


I think the cache should be passed as an argument to New.
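A sketch of what passing the cache into the constructor might look like follows. The `Cache` interface and `nopCache` type are assumptions, and the path validation done by the real `New` is left out. A no-op implementation is one way for a caller to turn caching off without the indexer needing a special case.

```go
package main

import "errors"

// Cache is a hypothetical minimal interface for this sketch.
type Cache interface {
	Seen(path string) bool
}

// nopCache never reports a hit, which effectively disables caching.
type nopCache struct{}

func (nopCache) Seen(string) bool { return false }

// Indexer mirrors the fields visible in the diff; the rest is omitted.
type Indexer struct {
	paths     []string
	coverpath *string
	exts      []string
	seen      Cache
}

// New takes the cache as a parameter instead of constructing it internally,
// so callers decide which implementation (if any) is used.
func New(paths []string, coverpath *string, exts []string, cache Cache) (*Indexer, error) {
	if cache == nil {
		return nil, errors.New("indexer: cache must not be nil")
	}
	return &Indexer{paths: paths, coverpath: coverpath, exts: exts, seen: cache}, nil
}
```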

pgaskin · 2019-03-17T14:56:12Z

indexer/indexer.go

+	return &Indexer{paths: paths, coverpath: cp, exts: exts, seen: NewSeenCache()}, nil
+}
+
+func (i *Indexer) Load() error {


Some of this code may fit better as part of the cache. Ideally, the cache would handle its own loading, and the indexer would query the cache during indexing.
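To illustrate that split of responsibilities, the sketch below lets the cache own its serialization while the indexer only calls `Lookup` and `Put`. The `Entry` shape and the `Load`, `Dump`, `Lookup`, and `Put` method names are assumptions, not code from this PR.

```go
package main

import "encoding/json"

// Entry is what the cache remembers about one file (hypothetical shape).
type Entry struct {
	Size    int64  `json:"size"`
	ModUnix int64  `json:"mtime"`
	Title   string `json:"title"`
}

// SeenCache owns both its contents and its on-disk representation.
type SeenCache struct {
	entries map[string]Entry
}

// NewSeenCache returns an empty cache.
func NewSeenCache() *SeenCache {
	return &SeenCache{entries: make(map[string]Entry)}
}

// Load replaces the cache contents with the given JSON document.
// On a decode error the existing contents are left untouched.
func (c *SeenCache) Load(data []byte) error {
	entries := make(map[string]Entry)
	if err := json.Unmarshal(data, &entries); err != nil {
		return err
	}
	c.entries = entries
	return nil
}

// Dump serializes the cache contents to JSON.
func (c *SeenCache) Dump() ([]byte, error) {
	return json.Marshal(c.entries)
}

// Lookup returns the cached entry for path if size and mtime still match.
// This is the only call the indexer needs during a refresh.
func (c *SeenCache) Lookup(path string, size, modUnix int64) (Entry, bool) {
	e, ok := c.entries[path]
	if !ok || e.Size != size || e.ModUnix != modUnix {
		return Entry{}, false
	}
	return e, true
}

// Put stores or replaces the entry for path after it has been indexed.
func (c *SeenCache) Put(path string, e Entry) {
	c.entries[path] = e
}
```

With this arrangement the indexer would not need its own `Load` method; it would receive an already-loaded cache and fall back to full parsing whenever `Lookup` misses.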

pgaskin · 2019-03-17T14:57:02Z

server/server.go

@@ -91,6 +98,13 @@ func (s *Server) RefreshBookIndex() error {
 	}

 	debug.FreeOSMemory()
+
+	err = s.Indexer.Save()


It should be possible to disable this with a command-line flag.
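A minimal sketch of such a switch follows, using the standard library `flag` package. The flag name `-no-index-cache`, the `options` struct, and both helper functions are hypothetical; the project may use a different flag library or naming.

```go
package main

import "flag"

// options holds parsed command-line settings (hypothetical).
type options struct {
	noIndexCache bool
}

// parseOptions parses args (without the program name) using its own FlagSet,
// so it does not touch the global flag state.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("bookbrowser", flag.ContinueOnError)
	fs.BoolVar(&o.noIndexCache, "no-index-cache", false,
		"do not save or load the cached index")
	err := fs.Parse(args)
	return o, err
}

// saveIfEnabled calls save unless caching was disabled on the command line.
func saveIfEnabled(o options, save func() error) error {
	if o.noIndexCache {
		return nil
	}
	return save()
}
```

In `RefreshBookIndex`, the unconditional `s.Indexer.Save()` call would then be wrapped in a check like `saveIfEnabled`.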

Commit 0c7051c: Added support for saving the index to a simple JSON file after each indexing job, and loading it (followed by a quick refresh) at the next startup

pgaskin self-requested a review March 14, 2019 23:52

pgaskin self-assigned this Mar 14, 2019

pgaskin added the enhancement label Mar 14, 2019

pgaskin added this to the v4.1.0 milestone Mar 15, 2019

pgaskin suggested changes Mar 17, 2019
