Description
Problem
The current statement-store uses a single RwLock<Index> that protects all index data:
```rust
struct Store {
    index: RwLock<Index>, // Writers block readers, readers block writers
    db: parity_db::Db,
}
```

This creates two issues:
- Lock Contention: When a write (`submit`) is in progress, all reads (new subscriptions, network workers, etc.) must wait. When multiple reads are active, writes must wait.
- Memory Usage: The entire index is held in memory, which puts an upper bound on the number of statements we can handle since there's only so much we can scale with RAM.
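To make the contention concrete, here is a minimal sketch (not the actual store code) of how both paths currently funnel through the same lock; `Index` is a stand-in type and the method bodies are placeholders:

```rust
use std::sync::RwLock;

// Stand-in for the real index; fields elided.
struct Index;

struct Store {
    index: RwLock<Index>,
}

impl Store {
    fn submit(&self) {
        // Exclusive guard: every concurrent reader waits until this drops.
        let _index = self.index.write().unwrap();
        // ... constraint checks and index updates happen here ...
    }

    fn query(&self) {
        // Shared guard: waits for (and is waited on by) any in-flight writer.
        let _index = self.index.read().unwrap();
        // ... by_topic / by_dec_key lookups happen here ...
    }
}
```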
Key Insight
Analysis reveals that read and write operations access completely different fields:
| Operation | Data Read | Data Written |
|---|---|---|
| Write (`submit`) | `accounts`, `total_size`, `statement_count` | `accounts`, `entries`, `by_topic`, `by_dec_key` |
| Read | `by_topic`, `by_dec_key` | Nothing |
The write path reads accounts for constraint checking (per-account limits, channel priorities).
The read path reads by_topic and by_dec_key for query filtering.
They don't overlap! This means we can safely separate them with independent locks.
Proposed Solution
An iterative approach in two stages:
Stage 1: Separate Read and Write Locks (In-Memory)
Split the single Index into two structures with independent locks:
```rust
struct Store {
    write_index: RwLock<WriteIndex>, // For submit/insert operations
    read_index: RwLock<ReadIndex>,   // For query operations
    db: parity_db::Db,
}

struct WriteIndex {
    accounts: HashMap<AccountId, StatementsForAccount>,
    entries: HashMap<Hash, (AccountId, Priority, usize)>,
    expired: HashMap<Hash, u64>,
    recent: HashSet<Hash>,
    total_size: usize,
    options: Options,
}

struct ReadIndex {
    by_topic: HashMap<Topic, HashSet<Hash>>,
    by_dec_key: HashMap<Option<DecryptionKey>, HashSet<Hash>>,
}
```

Write path (`submit`):
- Acquire the `write_index` lock for constraint checking
- After the DB commit succeeds, notify a background worker to update `read_index` (as sketched below)
- Release the lock

Read path (new subscriptions, network workers, etc.):

- Only acquires the `read_index` lock
- Never touches `write_index`
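As a rough sketch of how the two paths could look after the split; the worker channel, the `IndexUpdate` message type, and the field names beyond those above are assumptions for illustration, not the final design:

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{mpsc, RwLock};

type Hash = [u8; 32];
type Topic = [u8; 32];

#[derive(Default)]
struct WriteIndex {
    total_size: usize, // accounts, entries, ... elided
}

#[derive(Default)]
struct ReadIndex {
    by_topic: HashMap<Topic, HashSet<Hash>>,
}

// Update handed off to a background worker that owns read-index maintenance.
enum IndexUpdate {
    Inserted { hash: Hash, topics: Vec<Topic> },
}

struct Store {
    write_index: RwLock<WriteIndex>,
    read_index: RwLock<ReadIndex>,
    update_tx: mpsc::Sender<IndexUpdate>,
}

impl Store {
    fn submit(&self, hash: Hash, topics: Vec<Topic>, size: usize) {
        // 1. Constraint checking and accounting only touch the write index.
        let mut write_index = self.write_index.write().unwrap();
        write_index.total_size += size;
        // 2. ... commit the statement to the DB here ...
        // 3. Hand the read-index update to a background worker; concurrent
        //    queries are never blocked by this submit.
        let _ = self.update_tx.send(IndexUpdate::Inserted { hash, topics });
    }

    fn statements_for_topic(&self, topic: &Topic) -> Vec<Hash> {
        // Queries only take the read-index lock and never touch write_index.
        let read_index = self.read_index.read().unwrap();
        read_index
            .by_topic
            .get(topic)
            .map(|set| set.iter().copied().collect())
            .unwrap_or_default()
    }
}
```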
Benefits of Stage 1:
- Reads no longer block writes during constraint checking
- Same memory usage as before (preparation for Stage 2)
- Can be validated independently before Stage 2
Stage 2: Move Read Index to Disk + LRU Cache for Write Index
2a. Move Read Index to Disk with LRU Cache
Move by_topic and by_dec_key to on-disk indexes, but keep frequently accessed topics in a bounded LRU cache for fast lookups. Cache misses fall back to database queries. This provides a good balance between memory usage and read performance.
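A minimal sketch of that lookup path, assuming the `lru` crate for the bounded cache; `load_topic_from_db` is a placeholder standing in for the on-disk index query:

```rust
use std::collections::HashSet;
use std::num::NonZeroUsize;
use std::sync::Mutex;

use lru::LruCache;

type Hash = [u8; 32];
type Topic = [u8; 32];

struct ReadIndex {
    // Bounded cache of recently used topics; everything else lives on disk.
    by_topic: Mutex<LruCache<Topic, HashSet<Hash>>>,
}

impl ReadIndex {
    fn new(capacity: usize) -> Self {
        let capacity = NonZeroUsize::new(capacity).expect("capacity must be non-zero");
        Self { by_topic: Mutex::new(LruCache::new(capacity)) }
    }

    fn statements_for_topic(&self, topic: &Topic) -> HashSet<Hash> {
        let mut cache = self.by_topic.lock().unwrap();
        if let Some(hashes) = cache.get(topic) {
            return hashes.clone(); // cache hit: no DB access
        }
        // Cache miss: fall back to the on-disk index, then populate the cache.
        let hashes = load_topic_from_db(topic);
        cache.put(*topic, hashes.clone());
        hashes
    }
}

// Placeholder for the parity-db query against the on-disk topic index.
fn load_topic_from_db(_topic: &Topic) -> HashSet<Hash> {
    HashSet::new()
}
```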
2b. LRU Cache for Write Index
Replace the full `accounts` HashMap with a lightweight summary plus a bounded LRU cache:

```rust
struct WriteIndex {
    // Lightweight summary for active accounts (~16 bytes each)
    account_summary: LruCache<AccountId, AccountSummary>,
    // Full details for active accounts (bounded LRU cache)
    accounts_cache: LruCache<AccountId, StatementsForAccount>,
    // Global counters (replaces `entries`)
    statement_count: usize,
    total_size: usize,
    expired: HashMap<Hash, u64>,
    recent: HashSet<Hash>,
    options: Options,
}
```

Hot accounts will hit the LRU cache more often than not, and for the less active ones the cost of going to the DB should be acceptable.
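For the write side, a per-account constraint check could then look roughly like this; `AccountSummary`, `max_statements_per_account`, and `load_account_from_db` are illustrative assumptions, and the `lru` crate is again used for the bounded cache:

```rust
use std::num::NonZeroUsize;

use lru::LruCache;

type AccountId = [u8; 32];

// Illustrative summary: enough to enforce per-account limits without
// holding every statement for the account in memory.
#[derive(Clone, Copy, Default)]
struct AccountSummary {
    statement_count: u32,
    total_size: u32,
}

struct WriteIndex {
    account_summary: LruCache<AccountId, AccountSummary>,
    max_statements_per_account: u32,
}

impl WriteIndex {
    fn would_exceed_limit(&mut self, account: &AccountId) -> bool {
        let summary = match self.account_summary.get(account) {
            Some(summary) => *summary,
            None => {
                // Cache miss: rebuild the summary from the DB and cache it.
                let summary = load_account_from_db(account);
                self.account_summary.put(*account, summary);
                summary
            }
        };
        summary.statement_count >= self.max_statements_per_account
    }
}

// Placeholder for reconstructing an account's summary from parity-db.
fn load_account_from_db(_account: &AccountId) -> AccountSummary {
    AccountSummary::default()
}
```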
Benefits
The key benefits of this refactor are:

- Reduced lock contention: Read and write paths use independent locks, so readers don't block writers and vice versa. After Stage 2, the read path is completely lock-free.
- No need to hold the entire index in memory: Memory usage doesn't become the limiting factor anymore.