Add a MapDB implementation of CitationCache by sstults · Pull Request #234 · adsabs/montysolr

sstults · 2025-04-28T15:01:14Z

One of the limitations of our current cache is that the whole cache is stored in memory. This was an intentional design decision to prefer query speed over hardware cost. One of the drawbacks of this approach is that each cache instance is only aware of the Lucene documents in its local index. To deploy Montysolr in a multi-shard collection, we will need to implement a cache that uses the external document identifiers, which are unique across the collection, rather than the internal Lucene docids, which are unique only within a shard.

This PR is a step toward expanding the number of documents that can be indexed by reducing the memory requirements per shard.
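
To illustrate the identifier issue described above, here is a minimal sketch of a citation lookup keyed by collection-wide identifiers instead of shard-local Lucene docids. It is not the code in this PR: the class name `GlobalIdCitationCache`, its methods, and the sample identifiers are hypothetical, and a plain `HashMap` stands in for the disk-backed MapDB store so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a citation cache keyed by collection-wide identifiers. */
public class GlobalIdCitationCache {
    // Assumption: a HashMap stands in for a disk-backed map here.
    // Keys are external document identifiers, not Lucene docids.
    private final Map<String, List<String>> citations = new HashMap<>();

    /** Record that the document `citing` cites the document `cited`. */
    public void addCitation(String citing, String cited) {
        citations.computeIfAbsent(cited, k -> new ArrayList<>()).add(citing);
    }

    /** Return the identifiers of documents citing `cited`, or an empty list. */
    public List<String> getCitations(String cited) {
        return citations.getOrDefault(cited, List.of());
    }

    public static void main(String[] args) {
        GlobalIdCitationCache cache = new GlobalIdCitationCache();
        // Two shards can both assign Lucene docid 0 to different documents,
        // but external identifiers stay distinct across the whole collection.
        cache.addCitation("doc-on-shard-1", "doc-on-shard-2");
        cache.addCitation("other-doc-on-shard-1", "doc-on-shard-2");
        System.out.println(cache.getCitations("doc-on-shard-2"));
    }
}
```

Because the keys do not depend on any one shard's docid numbering, the same lookup would give the same answer regardless of which shard holds the citing documents.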


sstults added 12 commits April 16, 2025 13:00

Add MapDB implementation of CitationCache

0d6d9ed

Cleanups

8781ed5

Cleanups

6a0402c

Cleanups

9440b62

Cleanups

5e25250

Cleanups

2be4f4f

Cleanups

5a2d89b

Update to fix deprecation of NO_MORE_ORDS

c0c3cea

Remove test code

Added graph creation and serialization functionality

ad7fb94

Added graph creation and serialization functionality

2374377

Bugfixes around cache rebuilding

25d13f1

Fix deprecated method call

a04ee27

sstults wants to merge 12 commits into adsabs:main from sstults:sstults-mapdbcache
