feat(iavl): add KV data reader & writer, and mmap wrapper #25645
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| package internal | ||
|
|
||
| import ( | ||
| "bufio" | ||
| "fmt" | ||
| "io" | ||
| "os" | ||
| ) | ||
|
|
||
| // FileWriter is a buffered writer that tracks the number of bytes written. | ||
| type FileWriter struct { | ||
| writer *bufio.Writer | ||
| written int | ||
| } | ||
|
|
||
| // NewFileWriter creates a new FileWriter. | ||
| // Currently, it uses a buffer size of 512 KiB. | ||
| // If we want to make that configurable, we can add a constructor with a buffer size parameter in the future. | ||
| func NewFileWriter(file *os.File) *FileWriter { | ||
| const defaultBufferSize = 512 * 1024 // 512 KiB | ||
| return &FileWriter{ | ||
| writer: bufio.NewWriterSize(file, defaultBufferSize), | ||
| } | ||
| } | ||
|
|
||
| // Write writes data to the underlying buffered writer and updates the written byte count. | ||
| func (f *FileWriter) Write(p []byte) (n int, err error) { | ||
| n, err = f.writer.Write(p) | ||
| f.written += n | ||
| return n, err | ||
| } | ||
|
|
||
| // Flush flushes the underlying buffered writer. | ||
| func (f *FileWriter) Flush() error { | ||
| if err := f.writer.Flush(); err != nil { | ||
| return fmt.Errorf("failed to flush writer: %w", err) | ||
| } | ||
| return nil | ||
| } | ||
|
|
||
| // Size returns the total number of bytes written so far, including bytes that are still buffered and not yet flushed. | ||
| func (f *FileWriter) Size() int { | ||
| return f.written | ||
| } | ||
|
|
||
| var _ io.Writer = (*FileWriter)(nil) |
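
The byte counter in `FileWriter` advances as soon as `Write` accepts data, so `Size()` can serve as the offset at which the next entry will start even while nothing has reached the disk yet. The following is a minimal sketch of that behavior, not code from the PR: the `FileWriter` type is copied from the diff so the example compiles on its own, and `writeAndMeasure` and the temp-file name pattern are hypothetical names introduced only for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// FileWriter is copied from the diff above so this sketch compiles on its own.
type FileWriter struct {
	writer  *bufio.Writer
	written int
}

func NewFileWriter(file *os.File) *FileWriter {
	const defaultBufferSize = 512 * 1024
	return &FileWriter{writer: bufio.NewWriterSize(file, defaultBufferSize)}
}

func (f *FileWriter) Write(p []byte) (n int, err error) {
	n, err = f.writer.Write(p)
	f.written += n
	return n, err
}

func (f *FileWriter) Flush() error {
	if err := f.writer.Flush(); err != nil {
		return fmt.Errorf("failed to flush writer: %w", err)
	}
	return nil
}

func (f *FileWriter) Size() int { return f.written }

var _ io.Writer = (*FileWriter)(nil)

// writeAndMeasure is a hypothetical helper (not part of the PR). It writes each
// payload to a temporary file, records Size() before every write as that
// payload's start offset, and reports the on-disk size before and after Flush.
func writeAndMeasure(payloads [][]byte) (offsets []int, diskBefore, diskAfter int64, err error) {
	file, err := os.CreateTemp("", "kvdata-*.dat")
	if err != nil {
		return nil, 0, 0, err
	}
	defer os.Remove(file.Name())
	defer file.Close()

	w := NewFileWriter(file)
	for _, p := range payloads {
		offsets = append(offsets, w.Size())
		if _, err = w.Write(p); err != nil {
			return nil, 0, 0, err
		}
	}

	// Small payloads are still sitting in the 512 KiB buffer at this point.
	info, err := file.Stat()
	if err != nil {
		return nil, 0, 0, err
	}
	diskBefore = info.Size()

	if err = w.Flush(); err != nil {
		return nil, 0, 0, err
	}
	info, err = file.Stat()
	if err != nil {
		return nil, 0, 0, err
	}
	diskAfter = info.Size()
	return offsets, diskBefore, diskAfter, nil
}

func main() {
	offsets, before, after, err := writeAndMeasure([][]byte{[]byte("hello"), []byte("iavl")})
	if err != nil {
		panic(err)
	}
	fmt.Println("offsets:", offsets, "on disk before flush:", before, "after flush:", after)
}
```

For payloads smaller than the buffer, the file stays empty until `Flush` is called, while `Size()` already reflects every accepted byte. A caller that hands out offsets from `Size()` therefore needs to flush before anything reads the file at those offsets.
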
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| package internal | ||
|
|
||
| // KVEntryType represents the type of entry in the KV data file. | ||
| type KVEntryType byte | ||
|
|
||
| const ( | ||
| // KVEntryWALStart is the first entry in an uncompacted KV data file which indicates that this KV data file | ||
| // can be used for WAL replay restoration. It must immediately be followed by the varint-encoded version number | ||
| // corresponding to the first version in this changeset. | ||
| KVEntryWALStart KVEntryType = 0x0 | ||
|
|
||
| // KVEntryWALSet indicates a set operation for a key-value pair. | ||
| // This should be followed by: | ||
| // - varint key length + key bytes, OR if KVFlagCachedKey is set, a 32-bit LE offset to a cached key | ||
| // - varint value length + value bytes | ||
| // Offsets point to the start of the varint length field, not the type byte. | ||
| KVEntryWALSet KVEntryType = 0x1 | ||
|
|
||
| // KVEntryWALDelete indicates a delete operation for a key. | ||
| // This should be followed by: | ||
| // - varint key length + key bytes, OR if KVFlagCachedKey is set, a 32-bit LE offset to a cached key | ||
| // Offsets point to the start of the varint length field, not the type byte. | ||
| KVEntryWALDelete KVEntryType = 0x2 | ||
|
|
||
| // KVEntryWALCommit indicates the commit operation for a version. | ||
| // This must be followed by a varint-encoded version number. | ||
| KVEntryWALCommit KVEntryType = 0x3 | ||
|
|
||
| // KVEntryKeyBlob indicates a standalone key data entry. | ||
| // This should be followed by varint length + raw bytes. | ||
| // Used for compacted (non-WAL) leaf or branch keys not already cached. | ||
| KVEntryKeyBlob KVEntryType = 0x4 | ||
|
|
||
| // KVEntryValueBlob indicates a standalone value data entry. | ||
| // This should be followed by varint length + raw bytes. | ||
| // Used for compacted (non-WAL) leaf values. | ||
| // The main difference between KVEntryKeyBlob and KVEntryValueBlob is that key | ||
| // entries may be cached for faster access, while value entries are not cached. | ||
|
Comment on lines +29 to +38
Contributor
Can you explain more about the caching here? This wasn't in the previous version, was it?
Member
Author
Caching wasn't in the previous version, no. Looking at the data files, the KV data files are much larger than any of the others, and my theory is that we're storing lots of duplicate key data. There are likely many storage locations in the database that are written to repeatedly with the same key. Also, branch nodes always share keys with some leaf nodes. So this caching is a very simple form of data compression that will hopefully lead to some reduction in storage. There are other forms of compression we could consider, but this seems pretty straightforward and likely to have some payoff, and all it costs us is a little extra memory to maintain the cache while we're writing a file. My suggestion would be to try this and compare data file sizes between the current version and this one. |
||
| KVEntryValueBlob KVEntryType = 0x5 | ||
|
|
||
| // KVFlagCachedKey indicates that the key for this entry is cached and should be referenced by | ||
| // a 32-bit little-endian offset instead of being stored inline. | ||
| KVFlagCachedKey KVEntryType = 0x80 | ||
| ) | ||
This was missing in the previous PR
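
The entry layout described in the comments on `KVEntryWALSet` and `KVFlagCachedKey` can be sketched as a small in-memory encoder. This is an illustration under stated assumptions, not the writer from this PR: `setEncoder`, `newSetEncoder`, `writeLenPrefixed`, and `writeSet` are hypothetical names, the varints are assumed to be unsigned varints as produced by `encoding/binary`, the flag is assumed to be OR-ed into the type byte, and offsets are assumed to be measured from the start of the file.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// KVEntryType and the two constants are copied from the diff above.
type KVEntryType byte

const (
	KVEntryWALSet   KVEntryType = 0x1
	KVFlagCachedKey KVEntryType = 0x80
)

// setEncoder is a hypothetical encoder used only for this sketch.
type setEncoder struct {
	buf        bytes.Buffer
	keyOffsets map[string]uint32 // key -> offset of its varint length field
}

func newSetEncoder() *setEncoder {
	return &setEncoder{keyOffsets: make(map[string]uint32)}
}

// writeLenPrefixed appends an unsigned varint length followed by the raw bytes.
// Assumption: the PR's "varint" is the unsigned form from encoding/binary.
func (e *setEncoder) writeLenPrefixed(b []byte) {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], uint64(len(b)))
	e.buf.Write(tmp[:n])
	e.buf.Write(b)
}

// writeSet appends one set entry. A key seen before is replaced by a 32-bit
// little-endian offset pointing at the varint length field of its first copy.
func (e *setEncoder) writeSet(key, value []byte) {
	if off, ok := e.keyOffsets[string(key)]; ok {
		// Assumption: the cached-key flag is OR-ed into the type byte.
		e.buf.WriteByte(byte(KVEntryWALSet | KVFlagCachedKey))
		var le [4]byte
		binary.LittleEndian.PutUint32(le[:], off)
		e.buf.Write(le[:])
	} else {
		e.buf.WriteByte(byte(KVEntryWALSet))
		// Record the offset after the type byte, at the length varint.
		e.keyOffsets[string(key)] = uint32(e.buf.Len())
		e.writeLenPrefixed(key)
	}
	e.writeLenPrefixed(value)
}

func main() {
	e := newSetEncoder()
	e.writeSet([]byte("balance/alice"), []byte("100"))
	e.writeSet([]byte("balance/alice"), []byte("250"))
	fmt.Printf("% x\n", e.buf.Bytes())
}
```

Under these assumptions a cached reference always costs 4 bytes, so it only saves space when the inline form (length varint plus key bytes) is longer than that. This fits the suggestion in the review thread to compare data file sizes with and without caching.
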