
Commit 2194a76

docs(v1-readiness): 📝 reclassify blob file completeness as out-of-scope (#167)
## Summary

Reclassifies V1_READINESS §9 (Blob File Completeness) as out-of-scope based on quarry dogfood analysis. The friction quarry reported (sidecar files not enumerable via manifest) stems from writing files via `Store.Put()` directly, bypassing the Dataset write path. This is a caller-side usage pattern, not a Lode design gap. Adds PUBLIC_API.md guidance documenting correct patterns.

## Highlights

- **§9 reclassified**: Both criteria marked as out-of-scope with design rationale. Manifest self-description integrity is preserved: the manifest describes files Lode wrote, not files the caller wrote outside Lode's API.
- **Quarry friction updated**: API friction row updated to reflect CAS resolution (v0.9.0 `WithRetryCount`) and §9 reclassification. Remaining friction is caller-side.
- **New PUBLIC_API.md section**: "Sidecar Files and Store Access" documents correct patterns (`StreamWrite` per blob, file inventory in `Metadata`) and the anti-pattern (`Store.Put()` bypass without tracking).

## Test plan

- [x] `scripts/verify-snippets.sh` passes (new code fences use `<!-- illustrative -->`)
- [ ] Review §9 evidence text for accuracy
- [ ] Review PUBLIC_API.md guidance for clarity and completeness

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent e4c873c commit 2194a76

2 files changed

Lines changed: 63 additions & 11 deletions


PUBLIC_API.md

Lines changed: 52 additions & 0 deletions
@@ -401,6 +401,58 @@ Common errors when using streaming APIs:

---

## Sidecar Files and Store Access

**Lode's manifest tracks only files written through Lode's write path.**

Files written via `Store.Put()` directly (bypassing `Dataset.Write`,
`StreamWrite`, or `StreamWriteRecords`) are not recorded in the manifest
and are not discoverable via `Snapshots()`, `Read()`, or any Dataset API.

This is by design. The manifest is authoritative because Lode produced the
files it describes: it knows their size, computed their checksums (when
configured), and applied the configured codec and compressor. Files Lode
didn't write cannot carry these guarantees.

### Correct Patterns for Binary Sidecar Files

| Pattern | When to Use | Completeness |
|---------|-------------|-------------|
| `StreamWrite` per file | Each sidecar is its own snapshot; manifest tracks it | Full: `Snapshots()` enumerates all committed blobs |
| `Write` with `[]byte` data | Small blobs that fit in memory | Full: manifest tracks the file |
| File inventory in `Metadata` | Caller manages storage; Lode tracks the inventory | Caller-verified: paths listed in metadata, existence checked by caller |

### StreamWrite for Sidecar Blobs (Recommended)

<!-- illustrative -->
```go
// Each sidecar file gets its own snapshot via StreamWrite
sw, _ := ds.StreamWrite(ctx, lode.Metadata{"filename": "model.bin", "run_id": "r-123"})
io.Copy(sw, file)
snapshot, _ := sw.Commit(ctx)
// snapshot.Manifest.Files[0] tracks the blob with size and checksum
```

### File Inventory in Metadata (When Managing Storage Directly)

<!-- illustrative -->
```go
// Caller writes files outside Lode and records paths in metadata
store.Put(ctx, "custom/path/model.bin", modelReader)
store.Put(ctx, "custom/path/index.bin", indexReader)

snapshot, _ := ds.Write(ctx, records, lode.Metadata{
"sidecar_files": []string{"custom/path/model.bin", "custom/path/index.bin"},
})
// Caller can later read snapshot.Manifest.Metadata["sidecar_files"]
// to enumerate files, but must verify existence independently
```
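
The "verify existence independently" step is left to the caller. One way to sketch it, assuming the inventory has already been read back as a list of paths, is to check each path against storage and report the ones that are missing. `missingSidecars` and the `exists` callback are hypothetical names for illustration, and a map stands in for real storage.

```go
package main

import (
	"fmt"
	"sort"
)

// missingSidecars returns the inventory paths for which exists reports false.
// exists is a caller-supplied check (hypothetical here); in practice it would
// query whatever storage the sidecar files were written to.
func missingSidecars(inventory []string, exists func(path string) (bool, error)) ([]string, error) {
	var missing []string
	for _, p := range inventory {
		ok, err := exists(p)
		if err != nil {
			return nil, fmt.Errorf("check %q: %w", p, err)
		}
		if !ok {
			missing = append(missing, p)
		}
	}
	// Sort so the result is stable regardless of inventory order.
	sort.Strings(missing)
	return missing, nil
}

func main() {
	// Paths as they would be listed under the "sidecar_files" metadata key.
	inventory := []string{"custom/path/model.bin", "custom/path/index.bin"}

	// A map stands in for real storage in this sketch.
	stored := map[string]bool{"custom/path/model.bin": true}
	exists := func(path string) (bool, error) { return stored[path], nil }

	missing, err := missingSidecars(inventory, exists)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("missing:", missing)
}
```

Keeping the existence check behind a callback means the same function works whether the files live in the same store as the dataset or somewhere else.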

**Anti-pattern:** Writing files via `Store.Put()` without tracking them in
metadata or using `StreamWrite`, then expecting the manifest to enumerate them.

---

## Choosing a Write API

- Use `Write` for in-memory data, partitioned data, or codecs that do not support streaming.

docs/V1_READINESS.md

Lines changed: 11 additions & 11 deletions
@@ -43,7 +43,7 @@ as `(private)` instead of by number.
 | Dataset write round-trip || 2026-02-23 | Hive-partitioned writes and reads over S3-compatible backend at >100k object scale |
 | Volume write round-trip | N/A || Dataset-only integration |
 | Error sentinels observed || 2026-02-23 | `ErrNoSnapshots` handled on cold start; latest-pointer resolution exercised |
-| API friction (none = pass) | | 2026-02-23 | Blob file completeness not verifiable without prefix scan (see §9); concurrent write safety now documented but still operational friction for non-CAS stores |
+| API friction (none = pass) | ⚠️ | 2026-03-22 | Blob file completeness (§9) reclassified as out-of-scope: sidecar files written via `Store.Put()` bypass the Dataset API and are the caller's responsibility (see §9 and `PUBLIC_API.md` §Sidecar Files). CAS friction resolved for CAS-capable stores (R2) via `WithRetryCount` (v0.9.0); non-CAS store guidance documented in `PUBLIC_API.md` §Adapter CAS Support. Remaining friction is caller-side. |

 <!--
 ### <project-name or redacted alias>
@@ -273,19 +273,19 @@ as `(private)` instead of by number.

 ### 9. Blob File Completeness

-- [ ] Blob files written alongside structured data within a snapshot are enumerable without prefix-scanning storage
+- [x] ~~Blob files written alongside structured data within a snapshot are enumerable without prefix-scanning storage~~ Reclassified: out of scope (see below)

-> **Evidence:** _not yet recorded_
-> Date: | Observer: | Project:
-> Summary:
-> Issue: #___
+> **Evidence:** Reclassified as out-of-scope. Files written via `Store.Put()` directly (bypassing the Dataset write path) are not tracked by manifests by design. The manifest describes files Lode wrote; caller-managed sidecar files are the caller's responsibility. Correct patterns: use `StreamWrite` to write blobs through Lode, or track file inventories in snapshot `Metadata`. See `PUBLIC_API.md` §Sidecar Files and Store Access.
+> Date: 2026-03-22 | Observer: @pithecene-io | Project: lode
+> Summary: Design analysis confirmed this is a caller-side usage pattern, not a Lode gap. Manifest self-description integrity preserved.
+> Issue:

-- [ ] Consumers can verify which blob files were successfully persisted for a given commit
+- [x] ~~Consumers can verify which blob files were successfully persisted for a given commit~~ Reclassified: out of scope (see above)

-> **Evidence:** _not yet recorded_
-> Date: | Observer: | Project:
-> Summary:
-> Issue: #___
+> **Evidence:** Same reclassification. Blobs written through Lode's write path (`Write`, `StreamWrite`, `StreamWriteRecords`) are tracked in `Manifest.Files` and verifiable. Blobs written outside Lode are outside the manifest's scope.
+> Date: 2026-03-22 | Observer: @pithecene-io | Project: lode
+> Summary: Completeness verification is available for all files written through Lode's API. Files bypassing the API are the caller's concern.
+> Issue:


 ### 10. Data Correctness
