Skip to content

idea: mmap the kv array? #45

@ThomasWaldmann

Description

@ThomasWaldmann

the memory footprint of HashTableNT can get rather large overall, it is a sum of this:

  • the hashtable itself. to work efficiently, it needs to have a lot of unused buckets, but that does not need too much memory as we only store the key idx and the value idx into the hashtable, not the full key and full value. to work efficiently, the ht should for sure be always in RAM completely.
  • the kv array stores all the full keys and full values (key can e.g. be 256bit from a hash function, value can be of any size) in a dense memory block pointed to by uint8_t * self.kv and arranged like key0, value0, key1, value1, ...

Guess mmap under tight memory conditions only performs well if there is some locality for the memory accesses. If accesses would just randomly spread over the complete arrays it would not perform great, but maybe still better than swapping (with mmap, pages can be just discarded and reloaded instead of having to be swapped out).

The locality could be achieved by sorting the KV arrays by application-specific value tuple members (e.g. for borgbackup: sort by pack_id). Sorting the kv array shall not be part of the initial implementation, but could be done in a future optimization.

Update: when borgbackup writes new chunks to a new pack, all kv pairs appended to the kv array will belong to same pack_id. At restore time extracting files will be in archive order and also access the hashtable in order of pack_ids. So, initially we have some locality here naturally. Over time, when chunks get deleted and packs get compacted, the ordering will get mixed up though.

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions