Releases: pithecene/lode
v0.9.0
Retry-aware commits with full backoff control
Summary
Lode v0.9.0 adds opt-in automatic CAS retry on snapshot conflict for both Dataset and Volume, eliminating the need for callers to implement retry loops. The Volume option surface is unified with Dataset's shared Option interface, removing the separate VolumeOption type.
Highlights
- Opt-in CAS retry: WithRetryCount(n) enables bounded automatic retry within the commit path. Data files are written once; only the manifest re-parent and pointer CAS are retried.
- Full backoff customization: WithRetryBaseDelay(d), WithRetryMaxDelay(d), WithRetryJitter(j) for jittered exponential backoff tuning. Defaults: 10ms base, 2s max, full jitter.
- Unified Option interface: NewVolume now accepts the same Option type as NewDataset and NewDatasetReader. Options that don't apply return an error at construction time.
- WithChecksum works everywhere: Replaces the Volume-specific WithVolumeChecksum; one option for Dataset file checksums and Volume block checksums.
Breaking Changes
- VolumeOption type removed; replace with Option
- WithVolumeChecksum(c) removed; replace with WithChecksum(c)
Upgrade Notes
Search-and-replace migration:
- lode.VolumeOption → lode.Option
- lode.WithVolumeChecksum( → lode.WithChecksum(
Retry is opt-in: default behavior (0 retries) is unchanged. No data migration required.
References
- CONTRACT_WRITE_API.md: Automatic retry semantics
- CONTRACT_VOLUME.md: Unified Option interface, Volume retry
- #163: Originating issue
Full Changelog: v0.8.0...v0.9.0
v0.8.0
Safe concurrent writes, quality hardening, and S3 promotion
Summary
v0.8.0 adds CAS optimistic concurrency so concurrent writers get conflict detection instead of corruption, promotes the S3 adapter from experimental, and includes a broad hardening pass across code, tests, CI, and docs.
Highlights
- CAS optimistic concurrency: ConditionalWriter interface with CompareAndSwap on FS (Unix), Memory, and S3 stores; automatic conflict detection via ErrSnapshotConflict with retry pattern
- S3 adapter promoted: No longer experimental; documentation, ARCH_INDEX, and examples updated
- Optimistic concurrency example: examples/optimistic_concurrency/ demonstrates CAS conflict detection and retry
- Vector artifact pipeline example: examples/vector_artifacts/ demonstrates embeddings, indices, and active pointers
- CI hardened: Race detection added as parallel job; all workflows scoped to least-privilege permissions
- Test hardening: Store coverage (Exists, List, path safety), sentinel error messages, Parquet type conversion, table-driven manifest validation
Upgrade Notes
- CAS is always-on: When the store implements ConditionalWriter, CAS is used automatically. No configuration required.
- Retry on conflict: Callers that write concurrently should handle ErrSnapshotConflict by re-reading Latest(), merging state, and re-committing.
- No migration required: Existing data and stores work without modification. Stores without ConditionalWriter retain the existing Delete+Put single-writer path.
References
Full Changelog: v0.7.4...v0.8.0
v0.7.4
Documented complexity, enforced bounds
Summary
v0.7.4 adds a complexity bounds contract documenting the cost of every public method, resolves all known complexity violations, and applies a focused readability pass to volume internals. No API changes; no migration required.
Highlights
- CONTRACT_COMPLEXITY.md: Every public method now has documented cost in store calls, memory, and CPU; cost exceeding the bound is a bug
- 4 complexity violations fixed: sort-at-load-time for blocks (CX-1/CX-6), single-pass ListPartitions (CX-3), filepath.WalkDir for fsStore.List (CX-7)
- 3 complexity items reclassified: ListManifests validation Gets (contractually required), Snapshot(id) degraded fallback (documented path), Write partitioned memory (interface aliases, not copies)
- Cross-references: 4 existing contracts now link to CONTRACT_COMPLEXITY.md for cost definitions
- Readability pass: 10× fmt.Errorf → errors.New, extracted resolveParent method and ensureBlocksSortedByOffset helper
Upgrade Notes
- No API changes; all improvements are internal and transparent
- No migration required; existing data is compatible without modification
- Safe to upgrade from v0.7.3
References
Full Changelog: v0.7.3...v0.7.4
What's Changed
- docs(contracts): 📝 add complexity bounds contract and cross-references by @justapithecus in #129
- perf(core): ⚡️ resolve complexity violations and activate CONTRACT_COMPLEXITY by @justapithecus in #130
- refactor(core): ♻️ post-complexity readability pass in volume.go by @justapithecus in #131
- docs(meta): 📝 v0.7.4 changelog and README update by @justapithecus in #132
v0.7.3
O(1) snapshot resolution and complexity sweep
Summary
Eliminates all known O(N) hotspots across Dataset and Volume read/write paths. Cold-start Latest() on remote stores drops from ~30 minutes (5,800 manifests on R2) to ~2 seconds. No API changes; existing data is compatible without migration.
Highlights
- Persistent latest pointer: Datasets and Volumes write a latest file for O(1) cold-start resolution (1 Get instead of N manifest downloads)
- Pointer-before-manifest protocol: Pointer write failure aborts the commit, eliminating stale pointer bugs across process restarts
- O(1) Snapshot(ctx, id) on HiveLayout: Canonical manifest at the non-partitioned path replaces O(N) partition scan
- O(log B) Volume block lookup: Binary search on sorted blocks replaces linear scan in ReadAt
- O(N + K log K) block merge: Merge-insert algorithm avoids re-sorting the full cumulative block set on each commit
- Optimized scan fallback: latestByScan loads only the last manifest (1 List + 1 Get), self-heals by writing the pointer
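The two Volume items in the list above, binary-search lookup and merge-insert, can be illustrated with a short sketch. The block type and the findBlock and mergeBlocks helpers are hypothetical names chosen for this example; Lode's internal types and its overlap validation are not shown in these notes and are omitted here.

```go
package main

import (
	"fmt"
	"sort"
)

// block is a hypothetical committed byte range [Offset, Offset+Length).
type block struct {
	Offset, Length int64
}

// findBlock returns the index of the block containing pos, or -1.
// blocks must be sorted by Offset and non-overlapping, which makes the
// block end offsets monotonic and the search O(log B).
func findBlock(blocks []block, pos int64) int {
	// Find the first block whose end lies past pos.
	i := sort.Search(len(blocks), func(i int) bool {
		return blocks[i].Offset+blocks[i].Length > pos
	})
	if i < len(blocks) && blocks[i].Offset <= pos {
		return i
	}
	return -1
}

// mergeBlocks sorts only the K newly added blocks (K log K) and then
// merges them into the already-sorted existing set in one pass (N + K).
// Overlap validation is intentionally left out of this sketch.
func mergeBlocks(existing, added []block) []block {
	added = append([]block(nil), added...) // do not reorder the caller's slice
	sort.Slice(added, func(i, j int) bool { return added[i].Offset < added[j].Offset })

	out := make([]block, 0, len(existing)+len(added))
	i, j := 0, 0
	for i < len(existing) && j < len(added) {
		if existing[i].Offset <= added[j].Offset {
			out = append(out, existing[i])
			i++
		} else {
			out = append(out, added[j])
			j++
		}
	}
	out = append(out, existing[i:]...)
	return append(out, added[j:]...)
}

func main() {
	existing := []block{{0, 10}, {20, 10}, {50, 5}}
	merged := mergeBlocks(existing, []block{{40, 5}, {10, 10}})
	fmt.Println(merged)
	fmt.Println(findBlock(merged, 15), findBlock(merged, 35))
}
```

Because the cumulative set stays sorted after every merge, each later lookup can rely on the ordering without re-sorting.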
Upgrade Notes
- No API changes — all improvements are internal and transparent
- No migration required — pre-v0.7.3 datasets and volumes work without modification
- Automatic self-healing: on first write after upgrade, the latest pointer and canonical manifest (HiveLayout) are created automatically
- Safe to upgrade from v0.7.2
References
Full Changelog: v0.7.2...v0.7.3
What's Changed
- refactor(test): ♻️ use b.Context() in benchmarks by @justapithecus in #112
- docs(v1): 📝 populate quarry dogfooding entry by @justapithecus in #113
- docs(meta): 📝 update CONTRIBUTING.md and fix LICENSE formatting by @justapithecus in #114
- feat(build): 🔧 add benchmarks, integration nightly, and yamllint by @justapithecus in #115
- feat(build): 📊 add benchstat tooling and statistical benchmark tasks by @justapithecus in #116
- docs(build): 📊 add CI benchmark results for cross-environment comparison by @justapithecus in #117
- perf(core): ⚡️ O(1) resolution and complexity sweep by @justapithecus in #127
- docs(meta): 📝 v0.7.3 changelog and README update by @justapithecus in #128
v0.7.2
Rename Go module path to pithecene-io
Summary
The Go module path has been renamed from github.com/justapithecus/lode to github.com/pithecene-io/lode to reflect the GitHub organization rename. No behavior changes — all APIs, semantics, and contracts remain unchanged.
Highlights
- 🚚 Update go.mod module declaration to github.com/pithecene-io/lode
- 🚚 Update all Go import paths across source, tests, and examples (22 files)
- 📝 Update documentation references (README, PUBLIC_API, contracts, tooling)
- 📝 Add CHANGELOG entry with import path migration notes
Breaking Changes
- Import paths have changed. All consumers must update their imports:
  - github.com/justapithecus/lode/lode → github.com/pithecene-io/lode/lode
  - github.com/justapithecus/lode/lode/s3 → github.com/pithecene-io/lode/lode/s3
Upgrade Notes
- Update require directive in go.mod to github.com/pithecene-io/lode
- Find-and-replace justapithecus → pithecene-io in all import blocks
- No API, behavior, or semantic changes
References
- #111 — Rename PR
Full Changelog: v0.7.1...v0.7.2
What's Changed
- refactor(build): 🚚 rename module path from justapithecus to pithecene-io by @justapithecus in #111
v0.7.1
Fix O(n²) write degradation on remote stores
Summary
Write(), StreamWrite(), and StreamWriteRecords() called Latest() on every invocation, scanning all manifests via store.List + N×store.Get. On remote stores (S3, R2) with ~50–100ms per API call, this caused sequential writes to degrade quadratically. Parent snapshot ID is now cached after each successful write, turning every write after the first into O(1).
Highlights
- ⚡️ Cache last-written DatasetSnapshotID in the dataset struct
- 🔧 All three write paths (Write, StreamWrite, StreamWriteRecords) and streamWriter.Commit update the cache
- 🧊 Cold start (first write) falls back to Latest(); all subsequent writes are O(1)
- 📈 New sequential write benchmarks with simulated store latency as regression guard
- 📝 CHANGELOG comparison links updated to pithecene-io org
- 🩹 Fixed stale README gotcha: nil metadata has been safe since v0.7.0
Upgrade Notes
- No API changes; transparent internal optimization
- All write paths benefit automatically
- Safe to upgrade from v0.7.0
References
Full Changelog: v0.7.0...v0.7.1
What's Changed
- docs(v1-readiness): 📝 add v1.0 release criteria and dogfooding tracker by @justapithecus in #106
- docs(volume): 🔥 consolidate VOLUME_DIRECTION.md into CONTRACT_VOLUME.md by @justapithecus in #107
- fix(dataset): ⚡️ cache parent snapshot ID to eliminate O(n²) writes by @justapithecus in #109
- docs(release): 📝 v0.7.1 changelog, README updates, and test matrix by @justapithecus in #110
v0.7.0
Codec-agnostic per-file statistics and nil metadata coalescing
Summary
v0.7.0 adds per-file column statistics to manifests (enabling pruning workflows without opening data files) and relaxes nil metadata handling across all write paths to coalesce to empty instead of returning an error.
Highlights
- Per-file column statistics: New StatisticalCodec and StatisticalStreamEncoder interfaces allow any codec to report per-file column stats (min, max, null count, distinct count) persisted on FileRef
- Parquet statistics: The Parquet codec implements StatisticalCodec, reporting column-level min/max/null count for all orderable types (int32, int64, float32, float64, string, timestamp)
- New public types: FileStats, ColumnStats on the public API surface
- Nil metadata coalescing: Write, StreamWrite, StreamWriteRecords, and Volume.Commit now coalesce nil metadata to Metadata{} instead of returning an error
- Contract updates: CONTRACT_CORE, CONTRACT_WRITE_API, CONTRACT_VOLUME, and CONTRACT_PARQUET updated to reflect new semantics
- 14 new stats tests and 4 updated coalescing tests with full traceability matrix coverage
Upgrade Notes
- Callers that previously passed Metadata{} solely to avoid nil errors can now pass nil safely
- Callers that relied on nil metadata returning an error should remove that expectation
- Per-file stats are opt-in: only codecs implementing StatisticalCodec produce them; manifests without stats remain valid
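As an illustration of the pruning workflow that per-file min/max statistics enable, here is a minimal sketch. The types columnRange and fileEntry and the function pruneFiles are hypothetical and deliberately do not mirror Lode's FileStats or ColumnStats, whose fields are not described in these notes. Files without stats are kept, matching the note that manifests without stats remain valid.

```go
package main

import "fmt"

// columnRange is a hypothetical stand-in for per-file stats on one int64 column.
type columnRange struct {
	Min, Max int64
}

// fileEntry pairs a data file path with optional stats.
// HasStats is false for files written by codecs that report no statistics.
type fileEntry struct {
	Path     string
	HasStats bool
	Stats    columnRange
}

// pruneFiles returns the paths of files that may contain values in [lo, hi].
// A file is skipped only when its stats prove the ranges cannot intersect;
// files without stats are always kept because nothing can be ruled out.
func pruneFiles(files []fileEntry, lo, hi int64) []string {
	var keep []string
	for _, f := range files {
		if f.HasStats && (f.Stats.Max < lo || f.Stats.Min > hi) {
			continue
		}
		keep = append(keep, f.Path)
	}
	return keep
}

func main() {
	files := []fileEntry{
		{Path: "a.parquet", HasStats: true, Stats: columnRange{0, 9}},
		{Path: "b.parquet", HasStats: true, Stats: columnRange{10, 19}},
		{Path: "c.jsonl"}, // no stats reported
		{Path: "d.parquet", HasStats: true, Stats: columnRange{20, 29}},
	}
	fmt.Println(pruneFiles(files, 12, 15))
}
```

The decision uses only manifest-level data, so a reader can skip non-matching files without opening them.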
References
Full Changelog: v0.6.0...v0.7.0
What's Changed
- docs(agents): 📝 enhance AGENTS.md with Go style and composition guardrails by @justapithecus in #102
- feat(manifest): ✨ add per-file column statistics for codec-agnostic pruning by @justapithecus in #103
- chore(api): 🩹 post-stats housekeeping and nil metadata coalescing by @justapithecus in #104
- docs: 📝 backfill CHANGELOG for v0.6.0 and v0.7.0 by @justapithecus in #105
v0.6.0
Dual persistence: Dataset + Volume
Summary
Introduces Volume as a second first-class persistence paradigm alongside Dataset. Volume provides sparse, resumable, range-addressable byte-space persistence with incremental commits and overflow-safe arithmetic throughout.
Highlights
- NewVolume constructor with VolumeID, TotalLength, and optional WithVolumeChecksum
- StageWriteAt / Commit / ReadAt for incremental block-level persistence
- Cumulative snapshot manifests with strict overlap validation
- Latest / Snapshots / Snapshot for Volume history access
- ErrRangeMissing and ErrOverlappingBlocks error sentinels
- Overflow-safe bounds checks across all Volume code paths
Breaking Changes
- Snapshot → DatasetSnapshot, SnapshotID → DatasetSnapshotID
- Reader → DatasetReader, NewReader → NewDatasetReader
- SegmentRef → ManifestRef (Dataset), BlockRef (Volume)
- ListSegments → ListManifests, SegmentListOptions → ManifestListOptions
- ErrOverlappingSegments → ErrOverlappingBlocks
- ErrOptionNotValidForReader → ErrOptionNotValidForDatasetReader
Upgrade Notes
- All Dataset type renames are mechanical find-and-replace in consuming code
- Volume is entirely additive — no changes needed if you only use Dataset
- Volume uses a fixed internal layout; the Layout abstraction remains Dataset-specific
References
- docs/contracts/CONTRACT_VOLUME.md
- docs/contracts/CONTRACT_WRITE_API.md (concurrency matrices)
- docs/contracts/CONTRACT_ERRORS.md (new sentinels)
- examples/volume_sparse/
Full Changelog: v0.5.0...v0.6.0
What's Changed
- docs(roadmap): 📝 add v0.6 volume contract + api plan by @justapithecus in #92
- refactor(api): ♻️ rename Dataset/Reader types and add Volume type definitions by @justapithecus in #94
- feat(volume): ✨ implement core Volume persistence by @justapithecus in #95
- test(volume): ✅ add comprehensive Volume test suite by @justapithecus in #96
- feat(examples): ✨ add sparse Volume ranges example by @justapithecus in #97
- docs(volume): 📝 finalize Volume documentation and README refresh by @justapithecus in #98
- docs: 📝 pre-v0.6.0 documentation audit fixes by @justapithecus in #99
- fix(volume): 🐛 use overflow-safe arithmetic in bounds checks by @justapithecus in #100
- docs(roadmap): 📝 mark Parquet and Volume Phase 6 deliverables complete by @justapithecus in #101
v0.5.0
Parquet codec and schema‑validated columnar storage
Summary
Adds a Parquet codec with explicit schema validation and new Parquet examples, plus dedicated Parquet error sentinels.
Highlights
- Introduces NewParquetCodec(schema, opts...) with schema validation at construction time.
- Adds Parquet schema/types (ParquetSchema, ParquetField, ParquetType) and compression options.
- Adds examples/parquet/ and Parquet-specific error sentinels.
Breaking Changes
- NewParquetCodec now returns (Codec, error) instead of Codec.
Upgrade Notes
- Parquet is non-streaming: StreamWriteRecords returns ErrCodecNotStreamable; use Dataset.Write.
- When using Parquet, set the Lode compressor to NewNoOpCompressor() to avoid double compression.
- Invalid Parquet schemas now fail at construction time.
References
- docs/contracts/CONTRACT_PARQUET.md
- docs/contracts/CONTRACT_ERRORS.md
Full Changelog: v0.4.1...v0.5.0
v0.4.1
S3 multipart atomicity hardening
Summary
Improves S3 multipart atomic no‑overwrite for large uploads and documents backend compatibility.
Highlights
- Adds a documented S3 backend compatibility matrix for multipart completion behavior.
- Uses conditional CompleteMultipartUpload for large uploads to close the TOCTOU window.
Known Limitations
- Atomic no‑overwrite for large uploads is verified on AWS S3 and assumed, but untested, on other S3‑compatible backends.
References
- PUBLIC_API.md
- docs/contracts/
Full Changelog: v0.4.0...v0.4.1