Releases: pithecene/lode

v0.9.0

08 Mar 15:01
e4c873c

Retry-aware commits with full backoff control

Summary

Lode v0.9.0 adds opt-in automatic CAS retry on snapshot conflict for both Dataset and Volume, eliminating the need for callers to implement retry loops. The Volume option surface is unified with Dataset's shared Option interface, removing the separate VolumeOption type.

Highlights

  • Opt-in CAS retry: WithRetryCount(n) enables bounded automatic retry within the commit path. Data files are written once — only the manifest re-parent and pointer CAS are retried.
  • Full backoff customization: WithRetryBaseDelay(d), WithRetryMaxDelay(d), WithRetryJitter(j) for jittered exponential backoff tuning. Defaults: 10ms base, 2s max, full jitter.
  • Unified Option interface: NewVolume now accepts the same Option type as NewDataset and NewDatasetReader. Options that don't apply return an error at construction time.
  • WithChecksum works everywhere: Replaces the Volume-specific WithVolumeChecksum — one option for Dataset file checksums and Volume block checksums.

Breaking Changes

  • VolumeOption type removed — replace with Option
  • WithVolumeChecksum(c) removed — replace with WithChecksum(c)

Upgrade Notes

Search-and-replace migration:

  • lode.VolumeOption → lode.Option
  • lode.WithVolumeChecksum( → lode.WithChecksum(

Retry is opt-in: default behavior (0 retries) is unchanged. No data migration required.

References

Full Changelog: v0.8.0...v0.9.0

v0.8.0

24 Feb 17:32
c4548d1

Safe concurrent writes, quality hardening, and S3 promotion

Summary

v0.8.0 adds CAS optimistic concurrency so concurrent writers get conflict detection instead of corruption, promotes the S3 adapter from experimental, and includes a broad hardening pass across code, tests, CI, and docs.

Highlights

  • CAS optimistic concurrency: ConditionalWriter interface with CompareAndSwap on FS (Unix), Memory, and S3 stores — automatic conflict detection via ErrSnapshotConflict with retry pattern
  • S3 adapter promoted: No longer experimental; documentation, ARCH_INDEX, and examples updated
  • Optimistic concurrency example: examples/optimistic_concurrency/ demonstrates CAS conflict detection and retry
  • Vector artifact pipeline example: examples/vector_artifacts/ demonstrates embeddings, indices, and active pointers
  • CI hardened: Race detection added as parallel job; all workflows scoped to least-privilege permissions
  • Test hardening: Store coverage (Exists, List, path safety), sentinel error messages, Parquet type conversion, table-driven manifest validation

Upgrade Notes

  • CAS is always-on: When the store implements ConditionalWriter, CAS is used automatically. No configuration required.
  • Retry on conflict: Callers that write concurrently should handle ErrSnapshotConflict by re-reading Latest(), merging state, and re-committing.
  • No migration required: Existing data and stores work without modification. Stores without ConditionalWriter retain the existing Delete+Put single-writer path.

References

Full Changelog: v0.7.4...v0.8.0

v0.7.4

13 Feb 03:36
7b4760c

Documented complexity, enforced bounds

Summary

v0.7.4 adds a complexity bounds contract documenting the cost of every public method, resolves all known complexity violations, and applies a focused readability pass to volume internals. No API changes; no migration required.

Highlights

  • CONTRACT_COMPLEXITY.md: Every public method now has documented cost in store calls, memory, and CPU — cost exceeding the bound is a bug
  • 4 complexity violations fixed: sort-at-load-time for blocks (CX-1/CX-6), single-pass ListPartitions (CX-3), filepath.WalkDir for fsStore.List (CX-7)
  • 3 complexity items reclassified: ListManifests validation Gets (contractually required), Snapshot(id) degraded fallback (documented path), Write partitioned memory (interface aliases, not copies)
  • Cross-references: 4 existing contracts now link to CONTRACT_COMPLEXITY.md for cost definitions
  • Readability pass: 10× fmt.Errorf → errors.New, extracted resolveParent method and ensureBlocksSortedByOffset helper

Upgrade Notes

  • No API changes; all improvements are internal and transparent
  • No migration required; existing data is compatible without modification
  • Safe to upgrade from v0.7.3

References

Full Changelog: v0.7.3...v0.7.4

What's Changed

  • docs(contracts): 📝 add complexity bounds contract and cross-references by @justapithecus in #129
  • perf(core): ⚡️ resolve complexity violations and activate CONTRACT_COMPLEXITY by @justapithecus in #130
  • refactor(core): ♻️ post-complexity readability pass in volume.go by @justapithecus in #131
  • docs(meta): 📝 v0.7.4 changelog and README update by @justapithecus in #132

v0.7.3

13 Feb 00:23
fe55cc4

O(1) snapshot resolution and complexity sweep

Summary

Eliminates all known O(N) hotspots across Dataset and Volume read/write paths. Cold-start Latest() on remote stores drops from ~30 minutes (5,800 manifests on R2) to ~2 seconds. No API changes; existing data is compatible without migration.

Highlights

  • Persistent latest pointer: Datasets and Volumes write a latest file for O(1) cold-start resolution (1 Get instead of N manifest downloads)
  • Pointer-before-manifest protocol: Pointer write failure aborts the commit, eliminating stale pointer bugs across process restarts
  • O(1) Snapshot(ctx, id) on HiveLayout: Canonical manifest at the non-partitioned path replaces O(N) partition scan
  • O(log B) Volume block lookup: Binary search on sorted blocks replaces linear scan in ReadAt
  • O(N + K log K) block merge: Merge-insert algorithm avoids re-sorting the full cumulative block set on each commit
  • Optimized scan fallback: latestByScan loads only the last manifest (1 List + 1 Get), self-heals by writing the pointer

Upgrade Notes

  • No API changes — all improvements are internal and transparent
  • No migration required — pre-v0.7.3 datasets and volumes work without modification
  • Automatic self-healing — on first write after upgrade, the latest pointer and canonical manifest (HiveLayout) are created automatically
  • Safe to upgrade from v0.7.2

References

Full Changelog: v0.7.2...v0.7.3

v0.7.2

10 Feb 02:54
0f2614d

Rename Go module path to pithecene-io

Summary

The Go module path has been renamed from github.com/justapithecus/lode to github.com/pithecene-io/lode to reflect the GitHub organization rename. No behavior changes — all APIs, semantics, and contracts remain unchanged.

Highlights

  • 🚚 Update go.mod module declaration to github.com/pithecene-io/lode
  • 🚚 Update all Go import paths across source, tests, and examples (22 files)
  • 📝 Update documentation references (README, PUBLIC_API, contracts, tooling)
  • 📝 Add CHANGELOG entry with import path migration notes

Breaking Changes

  • Import paths have changed. All consumers must update their imports:
    • github.com/justapithecus/lode/lode → github.com/pithecene-io/lode/lode
    • github.com/justapithecus/lode/lode/s3 → github.com/pithecene-io/lode/lode/s3

Upgrade Notes

  • Update require directive in go.mod to github.com/pithecene-io/lode
  • Find-and-replace justapithecus → pithecene-io in all import blocks
  • No API, behavior, or semantic changes

References

  • #111 — Rename PR

Full Changelog: v0.7.1...v0.7.2

What's Changed

  • refactor(build): 🚚 rename module path from justapithecus to pithecene-io by @justapithecus in #111

v0.7.1

10 Feb 01:29
a07634d

Fix O(n²) write degradation on remote stores

Summary

Write(), StreamWrite(), and StreamWriteRecords() called Latest() on every invocation, scanning all manifests via store.List + N×store.Get. On remote stores (S3, R2) with ~50–100ms per API call, this caused sequential writes to degrade quadratically. Parent snapshot ID is now cached after each successful write, turning every write after the first into O(1).

Highlights

  • ⚡️ Cache last-written DatasetSnapshotID in the dataset struct
  • 🔧 All three write paths (Write, StreamWrite, StreamWriteRecords) and streamWriter.Commit update the cache
  • 🧊 Cold start (first write) falls back to Latest(); all subsequent writes are O(1)
  • 📈 New sequential write benchmarks with simulated store latency as regression guard
  • 📝 CHANGELOG comparison links updated to pithecene-io org
  • 🩹 Fixed stale README gotcha: nil metadata has been safe since v0.7.0

Upgrade Notes

  • No API changes; transparent internal optimization
  • All write paths benefit automatically
  • Safe to upgrade from v0.7.0

References

Full Changelog: v0.7.0...v0.7.1

What's Changed

  • docs(v1-readiness): 📝 add v1.0 release criteria and dogfooding tracker by @justapithecus in #106
  • docs(volume): 🔥 consolidate VOLUME_DIRECTION.md into CONTRACT_VOLUME.md by @justapithecus in #107
  • fix(dataset): ⚡️ cache parent snapshot ID to eliminate O(n²) writes by @justapithecus in #109
  • docs(release): 📝 v0.7.1 changelog, README updates, and test matrix by @justapithecus in #110

v0.7.0

07 Feb 03:41
79ddbbd

Codec-agnostic per-file statistics and nil metadata coalescing

Summary

v0.7.0 adds per-file column statistics to manifests (enabling pruning workflows without opening data files) and relaxes nil metadata handling across all write paths to coalesce to empty instead of returning an error.

Highlights

  • Per-file column statistics: New StatisticalCodec and StatisticalStreamEncoder interfaces allow any codec to report per-file column stats (min, max, null count, distinct count) persisted on FileRef
  • Parquet statistics: The Parquet codec implements StatisticalCodec, reporting column-level min/max/null count for all orderable types (int32, int64, float32, float64, string, timestamp)
  • New public types: FileStats, ColumnStats on the public API surface
  • Nil metadata coalescing: Write, StreamWrite, StreamWriteRecords, and Volume.Commit now coalesce nil metadata to Metadata{} instead of returning an error
  • Contract updates: CONTRACT_CORE, CONTRACT_WRITE_API, CONTRACT_VOLUME, and CONTRACT_PARQUET updated to reflect new semantics
  • 14 new stats tests and 4 updated coalescing tests with full traceability matrix coverage

Upgrade Notes

  • Callers that previously passed Metadata{} solely to avoid nil errors can now pass nil safely
  • Callers that relied on nil metadata returning an error should remove that expectation
  • Per-file stats are opt-in: only codecs implementing StatisticalCodec produce them; manifests without stats remain valid

References

  • Per-file statistics: #103
  • Nil metadata coalescing + housekeeping: #104

Full Changelog: v0.6.0...v0.7.0

What's Changed

  • docs(agents): 📝 enhance AGENTS.md with Go style and composition guardrails by @justapithecus in #102
  • feat(manifest): ✨ add per-file column statistics for codec-agnostic pruning by @justapithecus in #103
  • chore(api): 🩹 post-stats housekeeping and nil metadata coalescing by @justapithecus in #104
  • docs: 📝 backfill CHANGELOG for v0.6.0 and v0.7.0 by @justapithecus in #105

v0.6.0

07 Feb 00:50
1145ee3

Dual persistence: Dataset + Volume

Summary

Introduces Volume as a second first-class persistence paradigm alongside Dataset. Volume provides sparse, resumable, range-addressable byte-space persistence with incremental commits and overflow-safe arithmetic throughout.

Highlights

  • NewVolume constructor with VolumeID, TotalLength, and optional WithVolumeChecksum
  • StageWriteAt / Commit / ReadAt for incremental block-level persistence
  • Cumulative snapshot manifests with strict overlap validation
  • Latest / Snapshots / Snapshot for Volume history access
  • ErrRangeMissing and ErrOverlappingBlocks error sentinels
  • Overflow-safe bounds checks across all Volume code paths

Breaking Changes

  • Snapshot → DatasetSnapshot, SnapshotID → DatasetSnapshotID
  • Reader → DatasetReader, NewReader → NewDatasetReader
  • SegmentRef → ManifestRef (Dataset), BlockRef (Volume)
  • ListSegments → ListManifests, SegmentListOptions → ManifestListOptions
  • ErrOverlappingSegments → ErrOverlappingBlocks
  • ErrOptionNotValidForReader → ErrOptionNotValidForDatasetReader

Upgrade Notes

  • All Dataset type renames are mechanical find-and-replace in consuming code
  • Volume is entirely additive — no changes needed if you only use Dataset
  • Volume uses a fixed internal layout; the Layout abstraction remains Dataset-specific

References

  • docs/contracts/CONTRACT_VOLUME.md
  • docs/contracts/CONTRACT_WRITE_API.md (concurrency matrices)
  • docs/contracts/CONTRACT_ERRORS.md (new sentinels)
  • examples/volume_sparse/

Full Changelog: v0.5.0...v0.6.0

What's Changed

  • docs(roadmap): 📝 add v0.6 volume contract + api plan by @justapithecus in #92
  • refactor(api): ♻️ rename Dataset/Reader types and add Volume type definitions by @justapithecus in #94
  • feat(volume): ✨ implement core Volume persistence by @justapithecus in #95
  • test(volume): ✅ add comprehensive Volume test suite by @justapithecus in #96
  • feat(examples): ✨ add sparse Volume ranges example by @justapithecus in #97
  • docs(volume): 📝 finalize Volume documentation and README refresh by @justapithecus in #98
  • docs: 📝 pre-v0.6.0 documentation audit fixes by @justapithecus in #99
  • fix(volume): 🐛 use overflow-safe arithmetic in bounds checks by @justapithecus in #100
  • docs(roadmap): 📝 mark Parquet and Volume Phase 6 deliverables complete by @justapithecus in #101

v0.5.0

06 Feb 04:18
d09ea08

Parquet codec and schema‑validated columnar storage

Summary

Adds a Parquet codec with explicit schema validation and new Parquet examples, plus dedicated Parquet error sentinels.

Highlights

  • Introduces NewParquetCodec(schema, opts...) with schema validation at construction time.
  • Adds Parquet schema/types (ParquetSchema, ParquetField, ParquetType) and compression options.
  • Adds examples/parquet/ and Parquet-specific error sentinels.

Breaking Changes

  • NewParquetCodec now returns (Codec, error) instead of Codec.

Upgrade Notes

  • Parquet is non‑streaming: StreamWriteRecords returns ErrCodecNotStreamable; use Dataset.Write.
  • When using Parquet, set the Lode compressor to NewNoOpCompressor() to avoid double compression.
  • Invalid Parquet schemas now fail at construction time.

References

  • docs/contracts/CONTRACT_PARQUET.md
  • docs/contracts/CONTRACT_ERRORS.md

Full Changelog: v0.4.1...v0.5.0

v0.4.1

05 Feb 13:55
7a7fea9

S3 multipart atomicity hardening

Summary

Hardens atomic no‑overwrite semantics for large S3 multipart uploads and documents backend compatibility.

Highlights

  • Adds a documented S3 backend compatibility matrix for multipart completion behavior.
  • Uses conditional CompleteMultipartUpload for large uploads to close the TOCTOU window.

Known Limitations

  • Atomic no‑overwrite for large uploads is verified on AWS S3 and assumed, but untested, on other S3‑compatible backends.

References

  • PUBLIC_API.md
  • docs/contracts/

Full Changelog: v0.4.0...v0.4.1