50 commits
b27e5d3
feat(iavl): initialize disk layout
aaronc Dec 1, 2025
3599010
change package
aaronc Dec 1, 2025
e581282
switch size to uint40, update docs and tests
aaronc Dec 2, 2025
abf0976
update docs
aaronc Dec 2, 2025
7fcba13
Merge branch 'main' into aaronc/iavlx-init
aaronc Dec 2, 2025
a716e2f
reorder code
aaronc Dec 2, 2025
ca25571
Merge remote-tracking branch 'origin/aaronc/iavlx-init' into aaronc/i…
aaronc Dec 2, 2025
cccd14b
documented Uint40 endianness and added fmt.Stringer
aaronc Dec 2, 2025
07d0d5e
feat(iavl): add Node, MemNode, and NodePointer
aaronc Dec 2, 2025
a65f1c6
switch table tests to key: value struct init
aaronc Dec 2, 2025
66d3dee
Merge branch 'aaronc/iavlx-init' into aaronc/iavlx-init2
aaronc Dec 2, 2025
b212f96
add basic mem node getter tests
aaronc Dec 2, 2025
e6ccd8d
adding mutation, hash, verification code, basic tests
aaronc Dec 2, 2025
fab53dc
Merge branch 'main' of github.com:cosmos/cosmos-sdk into aaronc/iavlx…
aaronc Dec 2, 2025
d19f0e7
add tests
aaronc Dec 3, 2025
95318bb
Merge branch 'main' of github.com:cosmos/cosmos-sdk into aaronc/iavlx…
aaronc Dec 3, 2025
90bad56
reduce PR size
aaronc Dec 3, 2025
e37410b
add get tests and update docs
aaronc Dec 3, 2025
0564bdb
add more test explanations
aaronc Dec 3, 2025
f4607bf
update doc, add missing test
aaronc Dec 3, 2025
78faae5
feat(iavl): define KV data format
aaronc Dec 3, 2025
e8cfa93
WIP on kv data design
aaronc Dec 3, 2025
3ff4da5
WIP on kv data design
aaronc Dec 3, 2025
c75dde0
WIP on kv data writer
aaronc Dec 3, 2025
aea0c86
WIP on kv data writer
aaronc Dec 3, 2025
6d28292
Merge branch 'main' of github.com:cosmos/cosmos-sdk into aaronc/iavlx…
aaronc Dec 4, 2025
8a74b7a
update leaf size, add missing ChangesetInfo size check
aaronc Dec 4, 2025
9b4a396
add Mmap
aaronc Dec 4, 2025
b6b140d
implement KVDataReader
aaronc Dec 4, 2025
8f72673
fixes to KVDataReader
aaronc Dec 4, 2025
647f932
fixes to KVDataReader
aaronc Dec 4, 2025
b0efee3
Merge branch 'main' of github.com:cosmos/cosmos-sdk into aaronc/iavlx…
aaronc Dec 4, 2025
b0b1dd1
Merge branch 'main' of github.com:cosmos/cosmos-sdk into aaronc/iavlx…
aaronc Dec 5, 2025
f42a7af
add tests, update go.mod's
aaronc Dec 5, 2025
4156068
WIP on tests, add WAL mode checks
aaronc Dec 5, 2025
b76da6b
WIP on tests
aaronc Dec 5, 2025
a34a174
WIP on tests
aaronc Dec 5, 2025
db113dc
document FileWriter
aaronc Dec 5, 2025
493430b
add empty key/value test cases
aaronc Dec 5, 2025
9db9564
add mmap tests, fix empty close bug, minor cleanups
aaronc Dec 5, 2025
951c767
lint fix
aaronc Dec 5, 2025
938c3c3
fix lint
aaronc Dec 5, 2025
b38450b
lint fix
aaronc Dec 8, 2025
eb82e46
Merge branch 'main' into aaronc/iavlx-part6
aljo242 Dec 8, 2025
506dcf2
Merge branch 'main' of github.com:cosmos/cosmos-sdk into aaronc/iavlx…
aaronc Dec 15, 2025
f42ae70
fixes
aaronc Dec 15, 2025
e91d310
Merge branch 'main' into aaronc/iavlx-part6
aaronc Jan 5, 2026
e2f38c9
Merge branch 'main' into aaronc/iavlx-part6
aljo242 Jan 6, 2026
5e76007
Merge branch 'main' into aaronc/iavlx-part6
aljo242 Jan 6, 2026
0b14264
add key and value size limits
aaronc Jan 7, 2026
1 change: 1 addition & 0 deletions go.mod
@@ -27,6 +27,7 @@ require (
github.com/cosmos/gogoproto v1.7.2
github.com/cosmos/ledger-cosmos-go v1.0.0
github.com/decred/dcrd/dcrec/secp256k1/v4 v4.4.0
github.com/edsrzf/mmap-go v1.0.0
github.com/golang/protobuf v1.5.4
github.com/google/go-cmp v0.7.0
github.com/google/gofuzz v1.2.0
1 change: 1 addition & 0 deletions go.sum
@@ -280,6 +280,7 @@ github.com/eapache/go-xerial-snappy v0.0.0-20180814174437-776d5712da21/go.mod h1
github.com/eapache/queue v1.1.0/go.mod h1:6eCeP0CKFpHLu8blIFXhExK/dRa7WDZfr6jVFPTqq+I=
github.com/ebitengine/purego v0.9.1 h1:a/k2f2HQU3Pi399RPW1MOaZyhKJL9w/xFpKAg4q1s0A=
github.com/ebitengine/purego v0.9.1/go.mod h1:iIjxzd6CiRiOG0UyXP+V1+jWqUXVjPKLAI0mRfJZTmQ=
github.com/edsrzf/mmap-go v1.0.0 h1:CEBF7HpRnUCSJgGUb5h1Gm7e3VkmVDrR8lvWVLtrOFw=
github.com/edsrzf/mmap-go v1.0.0/go.mod h1:YO35OhQPt3KJa3ryjFM5Bs14WD66h8eGKpfaBNrHW5M=
github.com/emicklei/dot v1.8.0 h1:HnD60yAKFAevNeT+TPYr9pb8VB9bqdeSo0nzwIW6IOI=
github.com/emicklei/dot v1.8.0/go.mod h1:DeV7GvQtIw4h2u73RKBkkFdvVAz0D9fzeJrgPW6gy/s=
20 changes: 15 additions & 5 deletions iavl/internal/changeset_info.go
@@ -7,6 +7,17 @@ import (
	"unsafe"
)

+const (
+	sizeChangesetInfo = 32
+)
+
+func init() {
+	// Verify the size of ChangesetInfo is what we expect it to be at runtime.
+	if unsafe.Sizeof(ChangesetInfo{}) != sizeChangesetInfo {
+		panic(fmt.Sprintf("invalid ChangesetInfo size: got %d, want %d", unsafe.Sizeof(ChangesetInfo{}), sizeChangesetInfo))
+	}
+}
Member Author

This was missing in the previous PR


// ChangesetInfo holds metadata about a changeset.
// This mainly tracks the start and end version of the changeset and also contains statistics about orphans in the
// changeset so that compaction can be efficiently scheduled.
@@ -34,7 +45,7 @@ type ChangesetInfo struct {
// RewriteChangesetInfo rewrites the info file with the given changeset info.
// This method is okay to call the first time the file is created as well.
func RewriteChangesetInfo(file *os.File, info *ChangesetInfo) error {
-	data := unsafe.Slice((*byte)(unsafe.Pointer(info)), int(unsafe.Sizeof(*info)))
+	data := unsafe.Slice((*byte)(unsafe.Pointer(info)), sizeChangesetInfo)
	if _, err := file.WriteAt(data, 0); err != nil {
		return fmt.Errorf("failed to write changeset info: %w", err)
	}
@@ -45,8 +56,7 @@ func RewriteChangesetInfo(file *os.File, info *ChangesetInfo) error {
// ReadChangesetInfo reads changeset info from a file. It returns an empty default struct if file is empty.
func ReadChangesetInfo(file *os.File) (*ChangesetInfo, error) {
	var info ChangesetInfo
-	size := int(unsafe.Sizeof(info))
-	data := unsafe.Slice((*byte)(unsafe.Pointer(&info)), size)
+	data := unsafe.Slice((*byte)(unsafe.Pointer(&info)), sizeChangesetInfo)

	n, err := file.ReadAt(data, 0)
	if err == io.EOF && n == 0 {
@@ -55,8 +65,8 @@ func ReadChangesetInfo(file *os.File) (*ChangesetInfo, error) {
	if err != nil && err != io.EOF {
		return nil, fmt.Errorf("failed to read changeset info: %w", err)
	}
-	if n != size {
-		return nil, fmt.Errorf("info file has unexpected size: %d, expected %d", n, size)
+	if n != sizeChangesetInfo {
+		return nil, fmt.Errorf("info file has unexpected size: %d, expected %d", n, sizeChangesetInfo)
	}

	return &info, nil
46 changes: 46 additions & 0 deletions iavl/internal/file_writer.go
@@ -0,0 +1,46 @@
package internal

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// FileWriter is a buffered writer that tracks the number of bytes written.
type FileWriter struct {
	writer  *bufio.Writer
	written int
}

// NewFileWriter creates a new FileWriter.
// It currently uses a 512 KiB buffer.
// If we want to make that configurable, we can add a constructor with a buffer size parameter in the future.
func NewFileWriter(file *os.File) *FileWriter {
	const defaultBufferSize = 512 * 1024 // 512 KiB
	return &FileWriter{
		writer: bufio.NewWriterSize(file, defaultBufferSize),
	}
}

// Write writes data to the underlying buffered writer and updates the written byte count.
func (f *FileWriter) Write(p []byte) (n int, err error) {
	n, err = f.writer.Write(p)
	f.written += n
	return n, err
}

// Flush flushes the underlying buffered writer.
func (f *FileWriter) Flush() error {
	if err := f.writer.Flush(); err != nil {
		return fmt.Errorf("failed to flush writer: %w", err)
	}
	return nil
}

// Size returns the total number of bytes written so far, including bytes not yet flushed to the file.
func (f *FileWriter) Size() int {
	return f.written
}

var _ io.Writer = (*FileWriter)(nil)
44 changes: 44 additions & 0 deletions iavl/internal/kvdata_entry.go
@@ -0,0 +1,44 @@
package internal

// KVEntryType represents the type of entry in the KV data file.
type KVEntryType byte

const (
	// KVEntryWALStart is the first entry in an uncompacted KV data file, which indicates that this KV data file
	// can be used for WAL replay restoration. It must be immediately followed by the varint-encoded version number
	// corresponding to the first version in this changeset.
	KVEntryWALStart KVEntryType = 0x0

	// KVEntryWALSet indicates a set operation for a key-value pair.
	// This should be followed by:
	// - varint key length + key bytes, OR if KVFlagCachedKey is set, a 32-bit LE offset to a cached key
	// - varint value length + value bytes
	// Offsets point to the start of the varint length field, not the type byte.
	KVEntryWALSet KVEntryType = 0x1

	// KVEntryWALDelete indicates a delete operation for a key.
	// This should be followed by:
	// - varint key length + key bytes, OR if KVFlagCachedKey is set, a 32-bit LE offset to a cached key
	// Offsets point to the start of the varint length field, not the type byte.
	KVEntryWALDelete KVEntryType = 0x2

	// KVEntryWALCommit indicates the commit operation for a version.
	// This must be followed by a varint-encoded version number.
	KVEntryWALCommit KVEntryType = 0x3

	// KVEntryKeyBlob indicates a standalone key data entry.
	// This should be followed by varint length + raw bytes.
	// Used for compacted (non-WAL) leaf or branch keys not already cached.
	KVEntryKeyBlob KVEntryType = 0x4

	// KVEntryValueBlob indicates a standalone value data entry.
	// This should be followed by varint length + raw bytes.
	// Used for compacted (non-WAL) leaf values.
	// The main difference between KVEntryKeyBlob and KVEntryValueBlob is that key
	// entries may be cached for faster access, while value entries are not cached.
Comment on lines +29 to +38
Contributor

Can you explain more about the caching here? This wasn't in the previous version, was it?

Member Author

aaronc Dec 19, 2025

No, caching wasn't in the previous version. Looking at the data files, the KV data files are much larger than any of the other files, and my theory is that we're storing a lot of duplicate key data. Many storage locations in the database are likely written to repeatedly under the same key, and branch nodes always share their keys with some leaf node. Caching keys is therefore a very simple form of data compression that will hopefully reduce storage. There are other forms of compression we could consider, but this one is straightforward and likely to pay off, and its only cost is a little extra memory to maintain the cache while a file is being written. I'd suggest trying it and comparing data file sizes between the current version and this one.

	KVEntryValueBlob KVEntryType = 0x5

	// KVFlagCachedKey indicates that the key for this entry is cached and should be referenced by
	// a 32-bit little-endian offset instead of being stored inline.
	KVFlagCachedKey KVEntryType = 0x80
)
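The `KVEntryWALSet` layout and the cached-key reference can be made concrete with a small encoder/decoder. This is a sketch under assumptions the diff does not state:

- "varint" means Go's unsigned `binary.Uvarint` encoding.
- `KVFlagCachedKey` is OR-ed into the type byte.
- Every newly written key is cached.
- Cached-key offsets are absolute positions from the start of the data (here, the start of the buffer).

The names `appendSet`, `readBlob`, `readSet` and `errMalformed` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// KVEntryType and these two constants are copied from kvdata_entry.go.
type KVEntryType byte

const (
	KVEntryWALSet   KVEntryType = 0x1
	KVFlagCachedKey KVEntryType = 0x80
)

var errMalformed = errors.New("malformed KV data")

// appendSet (hypothetical) appends a WALSet entry to buf. A key already in cache is
// written as a 4-byte little-endian offset; a new key is written inline and cached.
func appendSet(buf []byte, cache map[string]uint32, key, value []byte) []byte {
	if off, ok := cache[string(key)]; ok {
		buf = append(buf, byte(KVEntryWALSet|KVFlagCachedKey))
		buf = binary.LittleEndian.AppendUint32(buf, off)
	} else {
		buf = append(buf, byte(KVEntryWALSet))
		cache[string(key)] = uint32(len(buf)) // offset of the length field, not the type byte
		buf = binary.AppendUvarint(buf, uint64(len(key)))
		buf = append(buf, key...)
	}
	buf = binary.AppendUvarint(buf, uint64(len(value)))
	return append(buf, value...)
}

// readBlob decodes a uvarint length plus that many bytes starting at off and
// returns the bytes and the offset just past them.
func readBlob(data []byte, off int) ([]byte, int, error) {
	if off < 0 || off >= len(data) {
		return nil, 0, errMalformed
	}
	n, w := binary.Uvarint(data[off:])
	if w <= 0 || n > uint64(len(data)-off-w) {
		return nil, 0, errMalformed
	}
	end := off + w + int(n)
	return data[off+w : end], end, nil
}

// readSet (hypothetical) decodes the WALSet entry whose type byte is at off and
// returns its key, its value, and the offset of the next entry.
func readSet(data []byte, off int) (key, value []byte, next int, err error) {
	if off < 0 || off >= len(data) || KVEntryType(data[off])&^KVFlagCachedKey != KVEntryWALSet {
		return nil, nil, 0, errMalformed
	}
	next = off + 1
	switch {
	case KVEntryType(data[off])&KVFlagCachedKey == 0:
		key, next, err = readBlob(data, next) // inline key
	case next+4 <= len(data):
		key, _, err = readBlob(data, int(binary.LittleEndian.Uint32(data[next:])))
		next += 4
	default:
		err = errMalformed // truncated key reference
	}
	if err != nil {
		return nil, nil, 0, err
	}
	if value, next, err = readBlob(data, next); err != nil {
		return nil, nil, 0, err
	}
	return key, value, next, nil
}

func main() {
	cache := map[string]uint32{}
	buf := appendSet(nil, cache, []byte("alice"), []byte("10"))
	second := len(buf)
	buf = appendSet(buf, cache, []byte("alice"), []byte("20")) // repeated key
	key, value, next, err := readSet(buf, second)
	fmt.Println(second, len(buf)-second, string(key), string(value), next == len(buf), err)
	// 10 8 alice 20 true <nil>
}
```

An inline key costs one length byte plus the key itself (for keys under 128 bytes), while a reference always costs 4 bytes. A reference therefore only saves space for keys longer than 3 bytes, so a writer could reasonably skip caching very short keys. The diff does not specify that choice. A `KVEntryWALDelete` entry would reuse the same key encoding without the trailing value.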