Commit 306aa40

Phase 08: Adaptive High-Cardinality Indexing (#19)
* test(08-01): add failing test for adaptive hot-term promotion
- Locks adaptive config defaults for high-cardinality paths
- Expects adaptive path metadata and promoted exact terms
- Verifies hot terms outrank long-tail values by RG coverage

* feat(08-01): add adaptive finalize-time string promotion
- Add adaptive string path mode, config knobs, and summary metadata
- Build promoted exact terms and deterministic bucket fallbacks at finalize time
- Cover hot-term promotion with a focused regression test

* test(08-01): add failing tests for adaptive query fallback
- Lock bucket-backed positive lookups to no-false-negative supersets
- Require non-promoted adaptive negatives to stay conservative
- Preserve exact negative inversion for promoted hot terms

* feat(08-01): route adaptive string queries through exact and buckets
- Add shared adaptive lookup with explicit exact-vs-lossy results
- Use bucket-backed supersets for adaptive EQ and IN tail terms
- Keep adaptive NE and NIN conservative unless the match was exact

* test(08-01): update threshold property for three-mode strings
- Exercise exact, adaptive-hybrid, and bloom-only high-cardinality outcomes
- Assert threshold breaches may resolve to adaptive or bloom-only modes
- Keep positive EQ lookups false-negative-free across all three cases

* docs(08-01): complete adaptive-high-cardinality-indexing plan
Tasks completed: 3/3
- Add adaptive-hybrid path structures and finalize-time promotion
- Route string membership queries through exact promotion and bucket fallback safely
- Update property coverage for the three-mode threshold contract
SUMMARY: .planning/phases/08-adaptive-high-cardinality-indexing/08-01-SUMMARY.md

* test(08-02): add failing tests for adaptive serialization
- Covers adaptive config round-trip expectations
- Covers adaptive path metadata and bitmap round-trip
- Locks oversized adaptive bucket section rejection

* feat(08-02): persist adaptive serialization metadata
- Bump the wire format to version 5 for adaptive sections
- Serialize adaptive config knobs and per-path adaptive indexes explicitly
- Reject malformed adaptive path, term, and bucket counts on decode

* fix(08-03): restore benchmark baseline from stray 08-02 tests
- remove incomplete adaptive serialization RED tests from serialize_security_test.go
- realign verification to the requested 424aceb content boundary

* test(08-03): add skewed adaptive benchmark fixture
- add deterministic skewed head-tail fixture generation for $.user_id
- define explicit exact, bloom-only, and adaptive-hybrid benchmark configs
- add BenchmarkAdaptiveHighCardinality with mode= and shape= naming

* test(08-02): add failing tests for CLI adaptive info output
- Locks exact, bloom-only, and adaptive-hybrid mode labels
- Verifies adaptive summary counters in formatter output
- Uses local-only index fixtures without S3 dependencies

* feat(08-02): expose adaptive mode in CLI info output
- Extract index info rendering to a reusable writer helper
- Report exact, bloom-only, and adaptive-hybrid modes per path
- Include compact adaptive promoted, bucket, threshold, and cap counters

* docs(08-02): document adaptive high-cardinality behavior
- Describe exact, adaptive-hybrid, and bloom-only string path modes
- Document adaptive config defaults and tuning knobs by name
- Replace bloom-only-only language in comparisons and examples

* fix(08-03): remove stray 08-02 cli red tests
- drop unfinished adaptive CLI info tests from cmd/gin-index/main_test.go
- keep 08-03 verification scoped to the requested 424aceb baseline

* test(08-03): report adaptive pruning and size metrics
- report candidate_rgs and encoded_bytes for hot and tail probes
- assert adaptive hot probe prunes better than bloom-only before timing runs
- reuse shared skewed fixtures across exact, bloom-only, and adaptive modes

* docs(08-02): complete adaptive serialization and CLI info plan
- Tasks completed: 3/3
- SUMMARY: .planning/phases/08-adaptive-high-cardinality-indexing/08-02-SUMMARY.md
- Shared state artifacts intentionally left untouched for orchestrator ownership

* docs(08-03): complete adaptive benchmark evidence plan
- record HCARD-05 benchmark metrics and verification evidence
- document blocker fixes required to restore the requested 424aceb baseline
- leave STATE.md and ROADMAP.md untouched per execution request

* fix(08-02): restore adaptive serialization and cli tests after wave overlap

* fix(08): close review regressions in builder, cli, and decode

* fix(08): harden decode limits and align CLI/docs

* docs(phase-08): complete phase execution

* docs(phase-08): evolve PROJECT.md after phase completion

* docs(phase-08): add security threat verification

* fix: harden adaptive path mode invariants

* fix: address post-fix review hardening

* fix: tighten round-3 adaptive hardening

* docs(phase-08): clean up comments and add godoc for adaptive surface
- expand Version constant comment with migration note and v4/v5/v6 history
- dedupe PathEntry.AdaptivePromotedTerms/AdaptiveBucketCount comments into one grouped godoc that documents the "derived, never persisted" contract
- justify the three serialize.go max-constants (maxDecodedIndexSize, maxHeaderRowGroups, maxHeaderDocs, maxAdaptivePaths) with the reasoning future maintainers need to tune them
- add field godoc on AdaptiveStringIndex explaining the sorted-terms + lossy-bucket contract
- document WithAdaptive* option semantics including the disable-via-zero conventions on PromotedTermCap and BucketCount
- rewrite the *WithIO shim comment to match the actual asymmetric pattern (only build/extract have wrappers; query/info are *WithIO-only)
- simplify adaptivePathSummary: the fallback to PathEntry counters is unreachable through both Decode and Finalize, so treat missing section as invariant-violation-return-zero instead of silently reading stale data
- add cmd/gin-index/ errcheck exclusion in .golangci.yml; the CLI uses fmt.Fprintf to stdout/stderr for user-facing output where ignoring write errors is standard practice

* feat(phase-08): add PathMode.IsValid and decode-side enforcement
PathMode was validated only structurally (through validatePathReferences) after the whole directory was read. Any byte >= 3 would flow through as an "unknown" mode until the downstream switch caught it, which risked diverging behavior if future code paths forgot the guard.
- PathMode.IsValid() reports whether the value is one of the declared constants
- readPathDirectory now rejects unknown mode bytes immediately with ErrInvalidFormat, matching the fail-closed posture of the other per-field bounds checks
- cover the new helper with a table test and the decoder with a targeted corruption test that flips a single mode byte in a valid encoded payload to 99

* feat(phase-08): export SetAdaptiveInvariantLogger for library consumers
Adaptive invariant violations (path flagged PathModeAdaptiveHybrid with no matching section) were always logged to log.Default() through a package-private variable. Host processes couldn't redirect into structured logging pipelines without poking internals, and concurrent writes to the variable weren't safe.
- SetAdaptiveInvariantLogger swaps the logger under sync.RWMutex; nil silences violations entirely for embedders that prefer not to have any stderr chatter
- currentAdaptiveInvariantLogger reads the current logger with the read lock so Evaluate paths don't serialize on logger access
- migrate the existing TestAdaptiveInvariantViolationLogs to the exported API and add TestSetAdaptiveInvariantLoggerNilSilences covering the nil case

* feat(phase-08): poison builder after partial merge failure
mergeStagedPaths mutates pathData, bloom, presentRGs, and nullRGs path-by-path in a sorted loop. A mid-loop failure leaves earlier paths merged for the current document while later ones are untouched - subsequent AddDocument calls would compound the corruption. validateStagedPaths' preview catches every naturally-occurring trigger today (mixed numeric promotion), so this is defensive: if a future code path lands a merge-time failure that bypasses the preview, the builder now flags poisonErr and refuses further AddDocument calls with a wrapped error pointing at the original failure. Finalize's contract is unchanged - callers who honor AddDocument's error never reach a corrupt Finalize regardless.

* test(phase-08): cover remaining validate arm and non-EQ adaptive probes
- add the "adaptive path missing adaptive section" row to TestValidatePathReferencesRejectsModeMismatches. All 6 arms of the mode switch at gin.go:587-608 now have direct unit coverage instead of the previous 5-arm subset + 1 transitive EncodeWithLevel assertion.
- expand BenchmarkAdaptiveHighCardinality beyond EQ: add NE, IN, NIN, and Contains probes alongside the existing hot/tail EQ probes. Each probe ReportMetric's candidate_rgs and encoded_bytes per mode so regressions in adaptive pruning show up as metric shifts in bench diffs. The hot-EQ soundness guard that asserts adaptive < bloom-only is preserved unchanged; no new strict assertions are layered over the new probes because different operators have legitimately different pruning semantics across modes.

* test(phase-08): add cross-mode soundness property
For the same fixture, EQ results must widen monotonically: classic ⊆ adaptive ⊆ bloom-only. A violation means an adaptive-mode index is strictly more aggressive than the exact index - i.e. it drops row groups that really do match, which is unsound and silently loses data. The existing per-mode no-false-negatives property catches missed ground-truth matches against the fixture, but not the case where adaptive's bucket layer under-reports compared to its own exact-index peer. 1000 generated cases over the existing cardinality threshold generator.
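
The cross-mode soundness property (classic ⊆ adaptive ⊆ bloom-only) reduces to two subset checks over candidate row-group sets. The following is a minimal Go sketch of that check, not the repository's property test. It assumes a map-backed set in place of the library's bitmap type, and `rgSet`, `newRGSet`, `isSubset`, and `checkWidening` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// rgSet is a hypothetical stand-in for a row-group bitmap: the set of
// row-group IDs that a query reports as candidates.
type rgSet map[int]struct{}

// newRGSet builds a set from a list of row-group IDs.
func newRGSet(ids ...int) rgSet {
	s := make(rgSet, len(ids))
	for _, id := range ids {
		s[id] = struct{}{}
	}
	return s
}

// isSubset reports whether every row group in a is also present in b.
func isSubset(a, b rgSet) bool {
	for id := range a {
		if _, ok := b[id]; !ok {
			return false
		}
	}
	return true
}

// checkWidening returns an error unless the EQ candidates widen
// monotonically across modes: classic ⊆ adaptive ⊆ bloomOnly.
func checkWidening(classic, adaptive, bloomOnly rgSet) error {
	if !isSubset(classic, adaptive) {
		return errors.New("adaptive dropped row groups that the exact index matched")
	}
	if !isSubset(adaptive, bloomOnly) {
		return errors.New("adaptive reported row groups outside the bloom-only superset")
	}
	return nil
}

func main() {
	classic := newRGSet(2, 5)
	adaptive := newRGSet(2, 5, 7) // bucket fallback may add row group 7
	bloomOnly := newRGSet(0, 1, 2, 5, 7)

	fmt.Println("sound case:", checkWidening(classic, adaptive, bloomOnly))
	// An adaptive result that loses row group 5 violates the property.
	fmt.Println("unsound case:", checkWidening(classic, newRGSet(2), bloomOnly))
}
```

A property test would presumably generate the three result sets from one fixture under three configurations and fail on the first non-nil error.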
* refactor(cli): drop outer var err error shadows in *WithIO helpers
Each *WithIO helper declared a function-scope var err error that only worked because one "idx, err = ..." assignment per function happened to be in a branch without a block-local err declaration. Future edits that added or removed err := inside inner blocks could silently read or write the wrong err variable. Restructure the 4 helpers so every err value lives in the smallest block that uses it: the two branches that previously needed the outer err now use "loaded, err := ...; idx = loaded" instead of "idx, err = ...". The remaining idx, err = ... sites are inside blocks with their own err := and continue to work unchanged.

* style(benchmark): apply gci formatting to long probe func literals

* test: fix lint violations in test fixtures

* perf(builder): in-place merge for adaptive tail bucket bitmaps
Adds RGSet.UnionWith for in-place merges and uses it in buildAdaptiveStringIndex to avoid one bitmap clone per non-promoted term. On high-cardinality paths with many tail terms (the workload adaptive mode targets) this removes O(tail) allocations from the hot builder path. Addresses PR #19 review item #2.

* fix(serialize): produce deterministic byte output for all index sections
Only writeAdaptiveStringIndexes was using sortedPathIDs; every other section writer iterated maps in non-deterministic Go order. Two Encode() calls on the same in-memory index could therefore produce different bytes, breaking content-addressable caching and making encoded-byte test assertions fragile. Fixed:
- writeStringIndexes, writeStringLengthIndexes, writeNumericIndexes, writeNullIndexes, writeTrigramIndexes, writeHyperLogLogs now use sortedPathIDs at the outer level
- writeTrigramIndexes now sorts its inner Trigrams map by trigram
- writeConfig now sorts transformerSpecs by path
Decode is order-tolerant so this is wire-compatible. Addresses PR #19 review item #3.
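
The determinism fix in the fix(serialize) entry rests on one idea: collect the map keys, sort them, and write sections in that order. The following is a minimal Go sketch of that idea, not the library's wire format. The name `sortedPathIDs` comes from the commit message, but its signature and body here are assumptions, and `encodeSections` with its id/length/payload layout is purely hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// sortedPathIDs returns the map keys in ascending order so callers can
// iterate in a stable order despite Go's randomized map iteration.
// The signature is an assumption made for this sketch.
func sortedPathIDs(m map[uint16][]byte) []uint16 {
	ids := make([]uint16, 0, len(m))
	for id := range m {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

// encodeSections writes each section as: path ID, payload length, payload.
// This layout is illustrative only.
func encodeSections(sections map[uint16][]byte) []byte {
	var buf bytes.Buffer
	for _, id := range sortedPathIDs(sections) {
		payload := sections[id]
		// Writes to a bytes.Buffer do not fail, so the errors are ignored.
		_ = binary.Write(&buf, binary.LittleEndian, id)
		_ = binary.Write(&buf, binary.LittleEndian, uint32(len(payload)))
		buf.Write(payload)
	}
	return buf.Bytes()
}

func main() {
	sections := map[uint16][]byte{
		3: []byte("c"),
		1: []byte("a"),
		2: []byte("b"),
	}
	first := encodeSections(sections)
	second := encodeSections(sections)
	fmt.Println("identical bytes:", bytes.Equal(first, second))
}
```

Because only the write order changes and each section still carries its own path ID, a decoder that keys sections by ID stays compatible, which matches the "order-tolerant" remark in the commit message.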
* refactor(query): tighten adaptive lookup paths and silence default logger
Three small adaptive-mode cleanups from PR #19 review:
- lookupAdaptiveStringMatch: drop the unreachable bucketID bounds check and nil bucket guard. NewAdaptiveStringIndex (used by both build and decode paths) already enforces a power-of-two bucket count and non-nil bucket bitmaps, and adaptiveBucketIndex masks with bucketCount-1, so neither guard can fire. Replace with an invariant comment. Addresses item #1.
- evaluateNIN: when a non-string element forces non-adaptive fallback, call evaluateINNonAdaptive(values) directly with the already-extracted slice instead of routing through evaluateIN, which would re-enter the adaptive path before falling back. Addresses item #7.
- adaptiveInvariantLogger: default to nil instead of log.Default(). Per Go library convention, libraries should not write to stderr unless the consumer opts in. Existing callers can install log.Default() explicitly. Addresses item #6.

* docs(gin): clarify v5 history and AdaptiveBucketCount=0 disable sentinel
- Version comment now notes v5 was never released; the in-tree v5 shape was iterated before the wire format was finalised in v6 and v5 payloads are always rejected on decode. Addresses item #4.
- WithAdaptiveBucketCount godoc now explains the asymmetry between the option (rejects 0) and the validate() path (accepts 0 as the disable sentinel for struct-literal callers). validate() also carries an inline comment explaining why 0 is permitted there. Addresses item #5.
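
The refactor(query) entry argues that the bucket bounds check is unreachable because the bucket count is a power of two and the index is masked with bucketCount-1. The following is a minimal Go sketch of why that holds. The name `adaptiveBucketIndex` is taken from the commit message, but its signature and the FNV-1a hash are assumptions; the commit does not say which hash function the library uses. `isPowerOfTwo` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// isPowerOfTwo reports whether n is a positive power of two.
func isPowerOfTwo(n uint32) bool {
	return n != 0 && n&(n-1) == 0
}

// adaptiveBucketIndex maps a term to a bucket in [0, bucketCount).
// When bucketCount is a power of two, bucketCount-1 is a mask of low bits,
// so the result is always in range without a separate bounds check.
// FNV-1a is an assumption made for this sketch.
func adaptiveBucketIndex(term string, bucketCount uint32) uint32 {
	h := fnv.New32a()
	_, _ = h.Write([]byte(term)) // hash.Hash writes never return an error
	return h.Sum32() & (bucketCount - 1)
}

func main() {
	const bucketCount = 8
	if !isPowerOfTwo(bucketCount) {
		panic("bucket count must be a power of two")
	}
	for _, term := range []string{"user-1", "user-2", "user-3"} {
		fmt.Printf("%s -> bucket %d of %d\n", term, adaptiveBucketIndex(term, bucketCount), bucketCount)
	}
}
```

The invariant only holds if the constructor rejects every non-power-of-two count, which is what the commit message attributes to NewAdaptiveStringIndex.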
1 parent 0a638a2 commit 306aa40

20 files changed

Lines changed: 4268 additions & 323 deletions

.golangci.yml

Lines changed: 5 additions & 0 deletions
@@ -39,6 +39,11 @@ linters:
 - linters:
 - errcheck
 path: examples/
+# CLI writes status/errors via fmt.Fprintf to stdout/stderr; ignoring
+# those errors is standard practice for terminal output.
+- linters:
+- errcheck
+path: cmd/gin-index/

 issues:
 max-same-issues: 0

.planning/PROJECT.md

Lines changed: 5 additions & 5 deletions
@@ -21,11 +21,11 @@ Material pruning quality and hot-path efficiency gains without turning the libra
 - ✓ MIT LICENSE, public module path, and release automation — completed in `v0.1.0`
 - ✓ Deserialization hardening and CI/security workflows — completed in `v0.1.0`
 - ✓ Canonical supported JSONPath lookup and constant-time path resolution — validated in Phase 06
+- ✓ Reduce builder ingest cost and preserve numeric intent during parsing/indexing — validated in Phase 07: builder-parsing-numeric-fidelity
+- ✓ Replace all-or-nothing bloom-only fallback with adaptive high-cardinality hybrid indexing — validated in Phase 08: adaptive-high-cardinality-indexing

 ### Active

-- [ ] Reduce builder ingest cost and preserve numeric intent during parsing/indexing
-- [ ] Replace all-or-nothing bloom-only fallback with adaptive high-cardinality hybrid indexing
 - [ ] Support raw-plus-derived index representations instead of transformer replacement only
 - [ ] Compact serialized path and term dictionaries using the existing prefix-compression direction

@@ -41,8 +41,8 @@ Material pruning quality and hot-path efficiency gains without turning the libra

 - `v0.1.0` is tagged on `main`; the OSS launch milestone is complete enough to move on
 - Phase 06 completed canonical path lookup, decode parity guards, and fixed-width benchmark coverage for EQ, CONTAINS, REGEX, and direct path lookup
-- Builder ingest still uses `json.Unmarshal(..., &any)` and classifies numbers after generic decoding
-- High-cardinality string paths currently fall back to bloom-only behavior, which preserves correctness but gives up pruning power for hot values
+- Phase 07 completed the streaming JSON ingest path and explicit numeric-fidelity handling
+- Phase 08 completed adaptive high-cardinality string indexing with exact hot-term pruning, bounded tail fallback, and benchmark evidence
 - Field transformers currently replace the raw indexed value rather than adding a derived representation alongside it
 - `PrefixCompressor` exists, but path and term serialization still write raw repeated strings

@@ -80,4 +80,4 @@ This document evolves at phase transitions and milestone boundaries.
 3. Refresh Context to reflect the new starting point

 ---
-*Last updated: 2026-04-14 after Phase 06 completion*
+*Last updated: 2026-04-15 after Phase 08 completion*

.planning/ROADMAP.md

Lines changed: 3 additions & 3 deletions
@@ -13,7 +13,7 @@

 - [x] **Phase 06: Query Path Hot Path** - Remove linear path scans and canonicalize supported JSONPath lookup (completed 2026-04-14)
 - [x] **Phase 07: Builder Parsing & Numeric Fidelity** - Lower ingest overhead and make number handling explicit and safe (completed 2026-04-15)
-- [ ] **Phase 08: Adaptive High-Cardinality Indexing** - Recover exact pruning for hot values without exploding index size
+- [x] **Phase 08: Adaptive High-Cardinality Indexing** - Recover exact pruning for hot values without exploding index size (completed 2026-04-15)
 - [ ] **Phase 09: Derived Representations** - Add raw-plus-derived indexing instead of replacement-only transformers
 - [ ] **Phase 10: Serialization Compaction** - Shrink encoded path and term dictionaries once functional layout stabilizes

@@ -55,7 +55,7 @@ Plans:
 3. Query evaluation uses exact bitmaps for promoted terms and conservative compact fallback for non-hot terms with no false negatives
 4. Path metadata and CLI/info output distinguish exact, bloom-only, and adaptive-hybrid paths
 5. Benchmarks and fixtures show improved pruning effectiveness on realistic high-cardinality datasets with bounded size growth
-**Plans:** TBD
+**Plans:** 3/3 plans complete

 ### Phase 09: Derived Representations
 **Goal**: Raw values remain queryable while derived representations become first-class indexed companions
@@ -88,7 +88,7 @@ Phases execute in numeric order: `06 → 07 → 08 → 09 → 10`
 |-------|----------------|--------|-----------|
 | 06. Query Path Hot Path | 2/2 | Complete | 2026-04-14 |
 | 07. Builder Parsing & Numeric Fidelity | 2/2 | Complete | 2026-04-15 |
-| 08. Adaptive High-Cardinality Indexing | 0/0 | Not started | - |
+| 08. Adaptive High-Cardinality Indexing | 3/3 | Complete | 2026-04-15 |
 | 09. Derived Representations | 0/0 | Not started | - |
 | 10. Serialization Compaction | 0/0 | Not started | - |


.planning/STATE.md

Lines changed: 12 additions & 12 deletions
@@ -2,15 +2,15 @@
 gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
-status: "Phase 07 shipped — PR #18"
-stopped_at: "Phase 07 shipped — PR #18"
-last_updated: "2026-04-15T14:00:46.863Z"
+status: executing
+stopped_at: Phase 08 context gathered
+last_updated: "2026-04-15T20:25:30.855Z"
 last_activity: 2026-04-15
 progress:
 total_phases: 8
-completed_phases: 2
-total_plans: 4
-completed_plans: 4
+completed_phases: 3
+total_plans: 7
+completed_plans: 7
 percent: 100
 ---

@@ -21,13 +21,13 @@ progress:
 See: `.planning/PROJECT.md` (updated 2026-04-14)

 **Core value:** Material pruning quality and hot-path efficiency gains without turning the library into a heavyweight database or document store
-**Current focus:** Phase 07builder-parsing-numeric-fidelity
+**Current focus:** Phase 08adaptive-high-cardinality-indexing

 ## Current Position

 Phase: 999.1
 Plan: Not started
-Status: Phase 07 shipped — PR #18
+Status: Executing Phase 08
 Last activity: 2026-04-15

 Progress: [██░░░░░░░░] 20%
@@ -46,7 +46,7 @@ Progress: [██░░░░░░░░] 20%
 |-------|-------|-------|----------|
 | 06 | 2 | - | - |
 | 07 | 2 | - | - |
-| 08 | 0 | - | - |
+| 08 | 3 | - | - |
 | 09 | 0 | - | - |
 | 10 | 0 | - | - |

@@ -75,6 +75,6 @@ Recent decisions affecting current work:

 ## Session Continuity

-Last session: 2026-04-14T14:30:51.861Z
-Stopped at: Phase 06 complete
-Resume file: Start with Phase 07 discussion/planning artifacts
+Last session: 2026-04-15T17:47:05.423Z
+Stopped at: Phase 08 context gathered
+Resume file: .planning/phases/08-adaptive-high-cardinality-indexing/08-CONTEXT.md
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+---
+phase: 08-adaptive-high-cardinality-indexing
+plan: 01
+subsystem: indexing
+tags: [gin, adaptive-indexing, row-group-pruning, property-tests, query-routing]
+requires:
+- phase: 06-query-path-hot-path
+provides: canonical path lookup and string query fast-path structure
+- phase: 07-builder-parsing-numeric-fidelity
+provides: current builder finalize layout and additive config expectations
+provides:
+- adaptive-hybrid string path mode with promoted exact terms and bucket fallbacks
+- conservative adaptive NE/NIN semantics that avoid lossy inversion
+- three-mode property coverage for exact, adaptive, and bloom-only threshold behavior
+affects: [08-02 serialization, 08-02 CLI info, 08 adaptive metadata]
+tech-stack:
+added: []
+patterns: [TDD red-green commits, adaptive exact-plus-bucket string lookup, conservative negative pruning]
+key-files:
+created: [.planning/phases/08-adaptive-high-cardinality-indexing/08-01-SUMMARY.md]
+modified: [gin.go, builder.go, query.go, gin_test.go, integration_property_test.go]
+key-decisions:
+- "Adaptive paths use a dedicated AdaptiveStringIndex map plus FlagAdaptiveHybrid instead of overloading bloom-only state."
+- "Promotion selection is ranked by RG coverage, while promoted terms are stored lexically for query-time binary search."
+- "Adaptive NE/NIN invert only exact promoted matches; bucket-backed matches return present RGs conservatively."
+patterns-established:
+- "Adaptive finalize pattern: threshold breach + adaptive enabled builds promoted exact terms and non-promoted hash buckets."
+- "Adaptive query pattern: bloom reject -> string-length reject -> exact promoted lookup or lossy bucket fallback."
+requirements-completed: [HCARD-01, HCARD-02, HCARD-03]
+duration: 12 min
+completed: 2026-04-15
+---
+
+# Phase 08 Plan 01: Adaptive High-Cardinality Indexing Summary
+
+**Adaptive high-cardinality string paths now keep bounded exact hot-term bitmaps and deterministic bucket fallbacks instead of collapsing directly to bloom-only behavior.**
+
+## Performance
+
+- **Duration:** 12 min
+- **Started:** 2026-04-15T18:45:15Z
+- **Completed:** 2026-04-15T18:57:08Z
+- **Tasks:** 3
+- **Files modified:** 5
+
+## Accomplishments
+- Added an additive adaptive string path model, config knobs, and finalize-time promotion logic driven by row-group coverage.
+- Routed adaptive `EQ`, `IN`, `NE`, and `NIN` through a shared lookup that distinguishes exact promoted matches from lossy bucket matches.
+- Replaced the old threshold property cliff with a three-mode contract covering exact, adaptive-hybrid, and bloom-only outcomes.
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Add adaptive-hybrid path structures and finalize-time promotion** - `25960b8` (test), `b573abe` (feat)
+2. **Task 2: Route string membership queries through exact promotion and bucket fallback safely** - `2be7987` (test), `cf22bce` (feat)
+3. **Task 3: Update property coverage for the three-mode threshold contract** - `dda9ec0` (test)
+
+## Files Created/Modified
+- `gin.go` - adaptive path flag, summary metadata, config defaults, option helpers, and index storage
+- `builder.go` - finalize-time exact/adaptive/bloom mode selection, hot-term ranking, and deterministic bucket construction
+- `query.go` - shared adaptive lookup plus conservative adaptive `NE`/`NIN` handling
+- `gin_test.go` - TDD regressions for promotion, bucket fallback, and negative predicate behavior
+- `integration_property_test.go` - three-mode threshold property coverage with false-negative-free positive lookups
+
+## Decisions Made
+- Adaptive high-cardinality paths stay adaptive whenever the threshold is breached and adaptive knobs remain enabled, even if no terms qualify for promotion.
+- Bucket bitmaps exclude promoted terms so tail lookups do not automatically pull in hot-term-only row groups.
+- The property contract now treats adaptive-hybrid and bloom-only as valid threshold-breach outcomes, provided positive lookups remain supersets of true matches.
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+
+- Task 1's broader verification command exposed the stale pre-adaptive threshold property. The planned Task 3 rewrite resolved that mismatch and brought the broader verification back to green.
+
+## User Setup Required
+
+None - no external service configuration required.
+
+## Next Phase Readiness
+
+- Adaptive in-memory behavior is stable and covered by focused regressions plus property tests.
+- The next plan can safely focus on serialize/decode support and CLI metadata visibility for adaptive paths.
+
+## Self-Check: PASSED
+
+- Verified `.planning/phases/08-adaptive-high-cardinality-indexing/08-01-SUMMARY.md` exists.
+- Verified task commits `25960b8`, `b573abe`, `2be7987`, `cf22bce`, and `dda9ec0` exist in git history.
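
The key decisions in the 08-01 summary describe promoted terms stored lexically for binary search, lossy hash buckets for the tail, and NE that inverts only exact matches. The following is a minimal Go sketch of that lookup shape, not the code in `query.go`. It omits the bloom and string-length rejects that the summary lists before the lookup step. The types `rgBitmap` and `adaptiveIndex`, the methods `lookup` and `notEqual`, the helper `bucketOf`, and the FNV-1a hash are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// rgBitmap is a hypothetical stand-in for a row-group bitmap:
// element i is true when row group i is a candidate.
type rgBitmap []bool

// adaptiveIndex sketches the described layout: promoted terms sorted
// lexically with one exact bitmap each, plus lossy buckets for tail terms.
type adaptiveIndex struct {
	terms   []string   // sorted ascending, parallel to exact
	exact   []rgBitmap // exact row groups per promoted term
	buckets []rgBitmap // length must be a power of two
}

// bucketOf hashes a term into [0, n), assuming n is a power of two.
func bucketOf(term string, n int) int {
	h := fnv.New32a()
	_, _ = h.Write([]byte(term))
	return int(h.Sum32() & uint32(n-1))
}

// lookup returns the candidate row groups for term and reports whether
// the answer is exact (promoted term) or a lossy bucket superset.
func (ix *adaptiveIndex) lookup(term string) (rgBitmap, bool) {
	i := sort.SearchStrings(ix.terms, term)
	if i < len(ix.terms) && ix.terms[i] == term {
		return ix.exact[i], true
	}
	return ix.buckets[bucketOf(term, len(ix.buckets))], false
}

// notEqual mirrors the described NE rule: invert only exact matches and
// otherwise return every row group where the path is present.
func (ix *adaptiveIndex) notEqual(term string, present rgBitmap) rgBitmap {
	match, exact := ix.lookup(term)
	out := make(rgBitmap, len(present))
	for rg := range present {
		if exact {
			out[rg] = present[rg] && !match[rg]
		} else {
			out[rg] = present[rg]
		}
	}
	return out
}

func main() {
	ix := &adaptiveIndex{
		terms:   []string{"alice", "bob"},
		exact:   []rgBitmap{{true, false, false, false}, {false, true, false, false}},
		buckets: []rgBitmap{{false, false, true, false}, {false, false, false, true}},
	}
	present := rgBitmap{true, true, true, true}

	_, exact := ix.lookup("alice")
	fmt.Println("alice exact:", exact, "NE:", ix.notEqual("alice", present))
	_, exact = ix.lookup("zed")
	fmt.Println("zed exact:", exact, "NE:", ix.notEqual("zed", present))
}
```

Returning the exactness flag alongside the bitmap is what lets negative predicates stay conservative: a bucket bitmap is a superset, so inverting it could drop row groups that really match.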
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+---
+phase: 08-adaptive-high-cardinality-indexing
+plan: 02
+subsystem: indexing
+tags: [gin, adaptive-serialization, cli, docs]
+requires:
+- phase: 08-adaptive-high-cardinality-indexing
+provides: adaptive-hybrid in-memory path mode, promoted exact terms, and bucket fallback behavior from 08-01
+provides:
+- adaptive config knobs and per-path adaptive metadata persisted in wire format version 5
+- mode-aware `gin-index info` output for exact, bloom-only, and adaptive-hybrid paths
+- README coverage for the three-mode high-cardinality model and additive adaptive knobs
+affects: [08-03 benchmarks, phase-10 serialization]
+tech-stack:
+added: []
+patterns: [versioned string-section serialization, io.Writer-based CLI rendering]
+key-files:
+created: [.planning/phases/08-adaptive-high-cardinality-indexing/08-02-SUMMARY.md]
+modified: [gin.go, serialize.go, serialize_security_test.go, cmd/gin-index/main.go, cmd/gin-index/main_test.go, README.md]
+key-decisions:
+- "Adaptive wire-format data ships in an explicit version 5 section between string indexes and string-length indexes."
+- "CLI info rendering derives mode from path flags and appends adaptive counters from per-path metadata plus header/config thresholds."
+patterns-established:
+- "Adaptive serialization pattern: persist global knobs in SerializedConfig and per-path adaptive state in a dedicated binary section."
+- "CLI info pattern: separate index loading from rendering through an `io.Writer` helper for local tests."
+requirements-completed: [HCARD-02, HCARD-04]
+duration: 19 min
+completed: 2026-04-15
+---
+
+# Phase 08 Plan 02: Adaptive High-Cardinality Indexing Summary
+
+**Versioned adaptive serialization, mode-aware `gin-index info`, and README docs now make the adaptive-hybrid high-cardinality behavior explicit end to end.**
+
+## Performance
+
+- **Duration:** 19 min
+- **Started:** 2026-04-15T19:03:45Z
+- **Completed:** 2026-04-15T19:23:12Z
+- **Tasks:** 3
+- **Files modified:** 6
+
+## Accomplishments
+- Persisted adaptive config knobs and per-path adaptive string metadata in an explicit version 5 wire format with decode hardening.
+- Exposed `mode=exact`, `mode=bloom-only`, and `mode=adaptive-hybrid` in `gin-index info`, including compact adaptive counters.
+- Updated public docs to describe exact, adaptive-hybrid, and bloom-only high-cardinality behavior plus the new additive config defaults.
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Persist adaptive config and index metadata with explicit format handling** - `0c9abe3` (test), `a26d221` (feat)
+2. **Task 2: Expose adaptive mode and summary counters in CLI info output** - `573e576` (test), `0c0fc0e` (feat)
+3. **Task 3: Update README configuration and behavior docs for adaptive high-cardinality indexing** - `b0c7c0c` (docs)
+
+## Files Created/Modified
+- `gin.go` - bumped the binary format version to 5 for the adaptive layout
+- `serialize.go` - added adaptive config fields plus dedicated adaptive section read/write helpers and bounds checks
+- `serialize_security_test.go` - added adaptive config/path round-trip coverage and malformed adaptive section guards
+- `cmd/gin-index/main.go` - extracted reusable info rendering and surfaced path mode plus adaptive summary counters
+- `cmd/gin-index/main_test.go` - locked the CLI info contract with helper-based local tests
+- `README.md` - documented the three-mode high-cardinality model and additive adaptive knobs
+
+## Decisions Made
+
+- Kept adaptive per-path state out of the existing string-index section and path-directory layout so the wire-format change stays explicit and grouped with other string structures.
+- Used header/config metadata for `threshold` and `cap` reporting while persisting promoted and bucket counters per adaptive path.
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+
+- The exact `go test ./... -count=1` verification command needed a PTY-backed rerun in this runtime because silent non-PTY runs returned no exit record. The PTY run completed green with the exact same command.
+
+## User Setup Required
+
+None - no external service configuration required.
+
+## Next Phase Readiness
+
+- Adaptive config, metadata, CLI visibility, and public docs are now aligned on the version 5 layout.
+- Phase 08-03 can benchmark pruning and encoded-size behavior on a stable adaptive wire format.
+
+## Self-Check: PASSED
+
+- Verified `.planning/phases/08-adaptive-high-cardinality-indexing/08-02-SUMMARY.md` exists.
+- Verified task commits `0c9abe3`, `a26d221`, `573e576`, `0c0fc0e`, and `b0c7c0c` exist in git history.
+
+---
+*Phase: 08-adaptive-high-cardinality-indexing*
+*Completed: 2026-04-15*
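
The 08-02 summary mentions decode hardening and bounds checks on adaptive path, term, and bucket counts. One common way to do this is to validate each count against a fixed limit before allocating anything sized by it. The following is a minimal Go sketch of that pattern, not the code in `serialize.go`. `ErrInvalidFormat` and `maxAdaptivePaths` are names from the commit message, but their definitions and the limit value here are assumptions, as are the little-endian uint32 encoding and the helpers `readBoundedCount` and `decodeCount`.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// ErrInvalidFormat mirrors the sentinel named in the commit message;
// this definition is an assumption.
var ErrInvalidFormat = errors.New("invalid format")

// maxAdaptivePaths is a hypothetical limit chosen for this sketch.
const maxAdaptivePaths = 1 << 16

// readBoundedCount reads a little-endian uint32 count and rejects it
// before any allocation if it exceeds limit (fail-closed decoding).
func readBoundedCount(r io.Reader, limit uint32) (uint32, error) {
	var n uint32
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return 0, fmt.Errorf("read count: %w", err)
	}
	if n > limit {
		return 0, fmt.Errorf("%w: count %d exceeds limit %d", ErrInvalidFormat, n, limit)
	}
	return n, nil
}

// decodeCount is a small convenience wrapper for byte slices.
func decodeCount(raw []byte, limit uint32) (uint32, error) {
	return readBoundedCount(bytes.NewReader(raw), limit)
}

func main() {
	// A well-formed count of 3 adaptive paths.
	n, err := decodeCount([]byte{3, 0, 0, 0}, maxAdaptivePaths)
	fmt.Println(n, err)

	// A corrupted count that would otherwise drive a huge allocation.
	_, err = decodeCount([]byte{0xff, 0xff, 0xff, 0xff}, maxAdaptivePaths)
	fmt.Println(errors.Is(err, ErrInvalidFormat))
}
```

Checking the count first means a four-byte corruption cannot make the decoder allocate gigabytes before noticing the payload is truncated.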
