repository - going away from transactions, log, refcounts? #7377

## current state (current master branch, borg 1.x, borg 0.x, attic)

A borg repository is primarily a key/value store (with some auxiliary functions).

The key is the chunk id (== MAC(plaintext)), the value is the compressed/encrypted/authenticated data.
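To make the key/value idea concrete, here is a minimal in-memory sketch. It assumes HMAC-SHA256 as the MAC and zlib as the compressor, and it omits encryption and authentication entirely. `ID_KEY`, `chunk_id` and `make_value` are hypothetical names for illustration, not borg's API.

```python
import hashlib
import hmac
import zlib

# Hypothetical secret key for the MAC (assumption; borg derives its own id key).
ID_KEY = b"example-id-key"

def chunk_id(plaintext: bytes) -> bytes:
    """Key of the store: a MAC over the plaintext (HMAC-SHA256 assumed)."""
    return hmac.new(ID_KEY, plaintext, hashlib.sha256).digest()

def make_value(plaintext: bytes) -> bytes:
    """Value of the store: compressed data (encryption/authentication omitted here)."""
    return zlib.compress(plaintext)

# The repository modeled as a plain dict: chunk id -> stored value.
store: dict[bytes, bytes] = {}
data = b"hello borg"
store[chunk_id(data)] = make_value(data)
```

Because the key depends only on the plaintext (and the secret key), storing the same chunk twice hits the same key, which is what makes deduplication work.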

borg uses transactions and a LOG when writing to the repo:
- start of transaction (usually triggered by a PUT/DEL)
- write more objects by appending PUT entries to the log
- delete objects by appending DEL entries to the log
- commit (append a COMMIT entry to the log)
- end of transaction (S = server side: saves repo index and hints; C = client side: saves chunks index and files cache)
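The transaction steps above can be sketched as a toy log that is replayed on startup. This is an assumption-laden model, not borg's on-disk format: entries are kept in a Python list, and `Entry`, `Log` and `replay` are hypothetical names. It shows why an append-only log is safe: anything after the last COMMIT is simply ignored, which rolls the repo back to the last committed state.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    tag: str            # "PUT", "DEL" or "COMMIT"
    key: bytes = b""
    value: bytes = b""

class Log:
    """Toy append-only log (hypothetical, in-memory)."""

    def __init__(self):
        self.entries: list[Entry] = []

    def put(self, key: bytes, value: bytes) -> None:
        self.entries.append(Entry("PUT", key, value))

    def delete(self, key: bytes) -> None:
        self.entries.append(Entry("DEL", key))

    def commit(self) -> None:
        self.entries.append(Entry("COMMIT"))

    def replay(self) -> dict:
        """Rebuild the key/value state, applying only committed entries."""
        state, pending = {}, []
        for e in self.entries:
            if e.tag == "COMMIT":
                # apply everything since the previous COMMIT
                for p in pending:
                    if p.tag == "PUT":
                        state[p.key] = p.value
                    else:
                        state.pop(p.key, None)
                pending = []
            else:
                pending.append(e)
        # entries left in `pending` were never committed: rolled back
        return state
```

For example, after `put(b"a", b"1")`, `commit()`, `delete(b"a")`, `put(b"b", b"2")` and a crash before the next commit, `replay()` still returns `{b"a": b"1"}`.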

LOG means that new entries are always appended at the end of the last (current) segment file. In general, old segment files are never modified in place.
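A minimal sketch of appending to the current segment and rotating to a new one once it would exceed a size limit. `MAX_SEGMENT_SIZE`, `SegmentWriter` and the record layout (just key followed by value, no headers or length fields) are hypothetical simplifications. The writer also maintains an `id -> (segment, offset)` index, the kind of lookup structure the repository needs to find objects again.

```python
# Hypothetical, tiny limit for illustration; real segment files are much larger.
MAX_SEGMENT_SIZE = 64

class SegmentWriter:
    def __init__(self):
        self.segments: list[bytearray] = [bytearray()]
        self.index: dict[bytes, tuple[int, int]] = {}  # id -> (segment, offset)

    def append(self, key: bytes, value: bytes) -> tuple[int, int]:
        record = key + value  # no header/length framing in this sketch
        current = self.segments[-1]
        if current and len(current) + len(record) > MAX_SEGMENT_SIZE:
            # start a new segment; older segments are never modified again
            self.segments.append(bytearray())
            current = self.segments[-1]
        offset = len(current)
        current += record  # in-place append to the current segment
        location = (len(self.segments) - 1, offset)
        self.index[key] = location
        return location
```

With 28-byte records, two fit into each 64-byte segment, so the third append lands at `(1, 0)` in a fresh segment.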

`borg compact` defrags non-compact segment files:
- a segment file contains PUTs, DELs and COMMITs
- if a PUT(id) is later deleted by a DEL(id), this creates a logical hole in the segment file (that object is no longer used), making the segment non-compact
- compaction (defragging) works by reading all still-needed objects from an old segment file and appending them to a new segment file. After that is finished, the old segment file is deleted, which frees disk space because the new segment file is smaller.
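The compaction step can be sketched like this, assuming segments are modeled as a dict of entry lists and the index maps `id -> (segment, position)`. A PUT is treated as still needed only if the index still points at exactly that entry. As a simplification, this sketch drops DEL and COMMIT entries; real compaction must keep a DEL as long as an older segment still holds the matching PUT. `compact` is a hypothetical name.

```python
def compact(segments: dict[int, list], index: dict[bytes, tuple[int, int]],
            seg_no: int) -> int:
    """Copy still-needed PUTs out of segments[seg_no], then delete that segment."""
    new_no = max(segments) + 1
    new_segment = []
    for pos, (tag, key, value) in enumerate(segments[seg_no]):
        # still needed only if the index points at exactly this entry
        if tag == "PUT" and index.get(key) == (seg_no, pos):
            index[key] = (new_no, len(new_segment))
            new_segment.append((tag, key, value))
        # DEL / COMMIT entries are dropped here (simplification, see above)
    segments[new_no] = new_segment
    del segments[seg_no]  # deleting the old segment file frees disk space
    return new_no
```

Note that the new segment has to be fully written before the old one can go, which is why compaction needs some free space to work.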

## advantages of this approach
- transactions and an append-only log are a very safe approach (even if something crashes, the repo can usually roll back to the previous committed state and be fine again)
- segment files are medium size files: not too large, not too small, not too many
  - works well even on filesystems that do not scale well
  - little space is wasted due to fs block / cluster size
  - can be copied or deleted rather quickly (not many fs objects)
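A rough worked example of the block / cluster overhead, assuming 4096-byte clusters and 1000 objects of 1000 bytes each (illustrative numbers, not measurements). It compares one file per object against appending the objects into medium-size segments; segment headers and per-entry metadata are ignored.

```python
def allocated(size: int, block: int = 4096) -> int:
    """Bytes a file of `size` bytes occupies with `block`-byte clusters."""
    return (size + block - 1) // block * block

def per_object_files(sizes, block=4096) -> int:
    """Total allocation when every object is its own file."""
    return sum(allocated(s, block) for s in sizes)

def packed(sizes, segment_limit, block=4096):
    """(total allocation, file count) when objects are appended to segments."""
    segments = [0]
    for s in sizes:
        if segments[-1] and segments[-1] + s > segment_limit:
            segments.append(0)  # current segment is full, start a new one
        segments[-1] += s
    return sum(allocated(n, block) for n in segments), len(segments)
```

With these assumptions, one file per object allocates 4,096,000 bytes across 1000 files, while 500,000-byte segments allocate 1,007,616 bytes in just 2 files.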

## disadvantages of this approach
- borg compact can cause lots of I/O when shuffling objects from old non-compact segments to new compact segments
- borg compact needs some free space on the fs to be able to work, which is bad if your fs is 100% full
- the compaction code is rather complex, and so is the transaction management
- to access objects quickly, the repository needs an index mapping `id -> (segment, offset, flags)`
- borg currently loads the repo index (a hashtable) into memory. RAM usage is about 44 bytes * object_count, plus the free space in the hashtable. If you have a lot of files and/or a lot of data volume, the repo index can need GBs of RAM.
- to implement this, some special borg code is needed that has access to the repo filesystem
- it is hard to work like this without locking the repository against concurrent access
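A back-of-the-envelope version of the RAM estimate above. The 44 bytes per entry come from the text; the 0.75 load factor, used to model the hashtable's free space, is an assumption for illustration, as is the function name `repo_index_ram`.

```python
import math

def repo_index_ram(object_count: int, bytes_per_entry: int = 44,
                   load_factor: float = 0.75) -> int:
    """Approximate repo index size in bytes (load factor is an assumption)."""
    # used entries / load_factor = total hashtable slots, including free ones
    return math.ceil(object_count * bytes_per_entry / load_factor)
```

Under these assumptions, 100 million objects need roughly 5.9 GB for the repo index alone.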
