This document defines the core data model and invariants for Lode. It is authoritative for all implementations and adapters.
Lode defines persistence structure only. It does not define execution, planning, scheduling, or optimization.
In scope:
- Immutable, self-describing snapshots.
- Explicit, persisted metadata.
- Linear history per dataset.
- Backend-agnostic layout semantics.

Out of scope:
- Query planning or execution.
- Background compaction or mutation.
- Backend-specific behavioral flags.
A dataset is a logical collection of snapshots. Each dataset has a stable DatasetID.
A snapshot is an immutable point-in-time state of a dataset. Each snapshot has a stable
SnapshotID and (optionally) a parent snapshot ID.
A manifest is the authoritative description of a snapshot and its data files.
Manifests MUST be self-describing and persisted.
Minimum required fields:
- schema name + version
- dataset ID
- snapshot ID
- creation time
- explicit metadata object (see below)
- list of files with sizes, plus checksums when a checksum component is configured
- parent snapshot ID (when applicable)
- row/event count (total data units in snapshot)
- min/max timestamp (when data units implement Timestamped; omit if not applicable)

Optional fields:
- codec name (omit when no codec is configured)
- per-file statistics (when the codec reports them via StatisticalCodec; omit when not available)
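
The field lists above can be sketched as a Go struct. This is a minimal illustration, not Lode's actual type: the struct name, field names, and JSON keys are all assumptions. The point it shows is that required fields are always written, while optional or conditional fields are dropped from the encoded manifest when unset.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// FileRef describes one data file. Names and JSON keys are hypothetical.
type FileRef struct {
	Path      string `json:"path"`
	SizeBytes int64  `json:"size_bytes"`
	Checksum  string `json:"checksum,omitempty"` // only when a checksum component is configured
}

// Manifest is an illustrative shape for the required and optional fields.
type Manifest struct {
	SchemaName       string         `json:"schema_name"`
	SchemaVersion    int            `json:"schema_version"`
	DatasetID        string         `json:"dataset_id"`
	SnapshotID       string         `json:"snapshot_id"`
	CreatedAt        time.Time      `json:"created_at"`
	Metadata         map[string]any `json:"metadata"` // no omitempty: always persisted
	Files            []FileRef      `json:"files"`
	ParentSnapshotID string         `json:"parent_snapshot_id,omitempty"`
	RowCount         int64          `json:"row_count"`
	MinTimestamp     *time.Time     `json:"min_timestamp,omitempty"`
	MaxTimestamp     *time.Time     `json:"max_timestamp,omitempty"`
	Codec            string         `json:"codec,omitempty"`
}

// persistedKeys marshals the manifest and reports which top-level JSON keys
// would actually be written.
func persistedKeys(m Manifest) (map[string]bool, error) {
	raw, err := json.Marshal(m)
	if err != nil {
		return nil, err
	}
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(raw, &fields); err != nil {
		return nil, err
	}
	keys := make(map[string]bool, len(fields))
	for k := range fields {
		keys[k] = true
	}
	return keys, nil
}

func main() {
	// A root snapshot: no parent, no codec, no timestamps.
	root := Manifest{
		SchemaName:    "lode.manifest",
		SchemaVersion: 1,
		DatasetID:     "events",
		SnapshotID:    "snap-1",
		CreatedAt:     time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC),
		Metadata:      map[string]any{},
		Files:         []FileRef{{Path: "data/0001.jsonl", SizeBytes: 128}},
		RowCount:      3,
	}
	keys, err := persistedKeys(root)
	if err != nil {
		panic(err)
	}
	fmt.Println("metadata:", keys["metadata"])
	fmt.Println("parent_snapshot_id:", keys["parent_snapshot_id"])
	fmt.Println("codec:", keys["codec"])
}
```

Pointer types for the timestamps are one way to distinguish "not applicable" from a zero time; other encodings would satisfy the same rule.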
FileRef MAY contain per-file statistics reported by the codec.
- Statistics are codec-agnostic: any codec may report them via the StatisticalCodec interface.
- When a codec reports statistics, they MUST be persisted on the FileRef.
- When a codec does not report statistics, the stats field MUST be omitted.
- Statistics values MUST be JSON-serializable.
- Statistics MUST NOT be inferred; they are reported by the codec from observed data.
- Per-file statistics include: row count, and per-column min, max, null count, and distinct count.
- Distinct count is optional; zero means not computed.
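
A rough sketch of how the statistics rules might look in Go follows. The document names StatisticalCodec and FileRef but does not give their definitions, so the method set, struct fields, and JSON keys below are hypothetical. The sketch attaches statistics only when the codec reports them and never derives them itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ColumnStats and FileStats are illustrative shapes, not Lode's real types.
type ColumnStats struct {
	Min           any   `json:"min"`
	Max           any   `json:"max"`
	NullCount     int64 `json:"null_count"`
	DistinctCount int64 `json:"distinct_count"` // zero means not computed
}

type FileStats struct {
	RowCount int64                  `json:"row_count"`
	Columns  map[string]ColumnStats `json:"columns"`
}

type FileRef struct {
	Path      string     `json:"path"`
	SizeBytes int64      `json:"size_bytes"`
	Stats     *FileStats `json:"stats,omitempty"` // omitted when the codec reports nothing
}

// Codec stands in for whatever base codec interface exists.
type Codec interface {
	Name() string
}

// StatisticalCodec uses an assumed signature; the real interface may differ.
type StatisticalCodec interface {
	Codec
	Stats() *FileStats
}

// newFileRef copies codec-reported statistics onto the FileRef.
// It does not infer statistics when the codec provides none.
func newFileRef(path string, size int64, c Codec) FileRef {
	ref := FileRef{Path: path, SizeBytes: size}
	if sc, ok := c.(StatisticalCodec); ok {
		ref.Stats = sc.Stats()
	}
	return ref
}

// plainCodec reports no statistics.
type plainCodec struct{}

func (plainCodec) Name() string { return "plain" }

// statsCodec reports statistics it observed while encoding.
type statsCodec struct{ stats *FileStats }

func (statsCodec) Name() string        { return "stats" }
func (c statsCodec) Stats() *FileStats { return c.stats }

func main() {
	observed := &FileStats{
		RowCount: 3,
		Columns:  map[string]ColumnStats{"ts": {Min: 10, Max: 30}},
	}
	refs := []FileRef{
		newFileRef("a.jsonl", 120, plainCodec{}),
		newFileRef("b.bin", 480, statsCodec{stats: observed}),
	}
	for _, ref := range refs {
		out, err := json.Marshal(ref)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```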
- Checksum computation is opt-in and explicit.
- When a checksum component is configured, manifests MUST record:
  - the checksum component name, and
  - a checksum value for each file written by the dataset.
- When no checksum component is configured, checksum fields MUST be omitted.
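
The checksum rules above amount to an all-or-nothing invariant, which can be sketched as a validation function. The types and the `ChecksumAlgorithm` field name are assumptions made for illustration; an empty string stands in for an omitted field.

```go
package main

import "fmt"

// FileRef and Manifest are reduced, hypothetical shapes holding only the
// fields relevant to the checksum rules.
type FileRef struct {
	Path     string
	Checksum string // empty means the field is omitted
}

type Manifest struct {
	ChecksumAlgorithm string // empty means no checksum component is configured
	Files             []FileRef
}

// validateChecksums enforces both directions of the rule: a configured
// component requires a checksum on every file, and an unconfigured one
// forbids checksums entirely.
func validateChecksums(m Manifest) error {
	for _, f := range m.Files {
		switch {
		case m.ChecksumAlgorithm != "" && f.Checksum == "":
			return fmt.Errorf("file %q is missing a %s checksum", f.Path, m.ChecksumAlgorithm)
		case m.ChecksumAlgorithm == "" && f.Checksum != "":
			return fmt.Errorf("file %q has a checksum but no checksum component is recorded", f.Path)
		}
	}
	return nil
}

func main() {
	ok := Manifest{
		ChecksumAlgorithm: "md5",
		Files:             []FileRef{{Path: "a.jsonl", Checksum: "9e107d9d372bb6826bd81d3542a419d6"}},
	}
	bad := Manifest{
		ChecksumAlgorithm: "md5",
		Files:             []FileRef{{Path: "b.jsonl"}},
	}
	fmt.Println(validateChecksums(ok))
	fmt.Println(validateChecksums(bad))
}
```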
Manifests are immutable once written.
- Metadata MUST be explicit on every snapshot.
- nil metadata is coalesced to empty ({}) at the API boundary.
- Empty metadata ({}) is valid and MUST be persisted as an explicit object.
- Metadata values MUST be JSON-serializable.
- Manifests always contain a non-nil metadata map.
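
The metadata rules can be illustrated with a small Go sketch. The `Metadata` type and both function names are hypothetical. The detail worth seeing is that `encoding/json` writes a nil map as `null` and an empty map as `{}`, so the coalescing step is what guarantees an explicit object in the manifest.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metadata is a hypothetical alias for the snapshot metadata object.
type Metadata map[string]any

// coalesceMetadata turns nil into an empty, non-nil map.
func coalesceMetadata(md Metadata) Metadata {
	if md == nil {
		return Metadata{}
	}
	return md
}

// encodeMetadata coalesces and then marshals, surfacing values that are
// not JSON-serializable as an error.
func encodeMetadata(md Metadata) (string, error) {
	b, err := json.Marshal(coalesceMetadata(md))
	if err != nil {
		return "", fmt.Errorf("metadata is not JSON-serializable: %w", err)
	}
	return string(b), nil
}

func main() {
	out, err := encodeMetadata(nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {}

	// A channel cannot be encoded as JSON, so this is rejected.
	_, err = encodeMetadata(Metadata{"bad": make(chan int)})
	fmt.Println(err != nil) // prints true
}
```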
- Snapshot history is strictly linear (single head).
- Snapshots are immutable after commit.
- Data files are immutable after write.
- Commits create new state; they do not mutate existing state.
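
One way to picture the linear-history invariant is an append-only log that accepts a commit only when its parent is the current head. This is an in-memory sketch under that assumption; the `History` type, its methods, and `ErrNotHead` are invented for illustration and say nothing about how Lode stores or enforces history.

```go
package main

import (
	"errors"
	"fmt"
)

// Snapshot holds only the identifiers needed to show the parent chain.
type Snapshot struct {
	ID       string
	ParentID string // empty for the first snapshot
}

// ErrNotHead is a hypothetical error for commits that would fork history.
var ErrNotHead = errors.New("parent is not the current head")

// History is an append-only list; existing entries are never modified.
type History struct {
	snapshots []Snapshot
}

// Head returns the latest snapshot, or false when the history is empty.
func (h *History) Head() (Snapshot, bool) {
	if len(h.snapshots) == 0 {
		return Snapshot{}, false
	}
	return h.snapshots[len(h.snapshots)-1], true
}

// Commit appends a new snapshot only if its parent is the current head,
// which keeps the history a single chain.
func (h *History) Commit(id, parentID string) error {
	head, ok := h.Head()
	switch {
	case !ok && parentID != "":
		return fmt.Errorf("first snapshot %q must not declare a parent: %w", id, ErrNotHead)
	case ok && parentID != head.ID:
		return fmt.Errorf("snapshot %q declares parent %q, head is %q: %w", id, parentID, head.ID, ErrNotHead)
	}
	h.snapshots = append(h.snapshots, Snapshot{ID: id, ParentID: parentID})
	return nil
}

func main() {
	h := &History{}
	fmt.Println(h.Commit("snap-1", ""))
	fmt.Println(h.Commit("snap-2", "snap-1"))
	// A second child of snap-1 would create two heads, so it is rejected.
	fmt.Println(errors.Is(h.Commit("snap-3", "snap-1"), ErrNotHead))
}
```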
Lode stores facts, not interpretations.