
Commit 0a638a2

Phase 07: Builder Parsing & Numeric Fidelity (#18)
* test(07-01): add failing parser and atomic ingest regressions
  - lock explicit decoder usage in AddDocument
  - cover unsupported numeric rollback and transformer path compatibility
* feat(07-01): add transactional explicit-number builder
* fix(07-01): preserve int numeric decode compatibility
* test(07-02): add phase 07 parser delta benchmarks
* docs(phase-07): complete phase execution
* fix(07): WR-01 prune int-only fractional ranges
* fix(07): WR-02 normalize wildcard transformer input
* fix: address phase 07 review findings
* fix: harden builder re-review findings
* test: cover nested transformer value preparation
* docs(07): ship phase 07 — PR #18
* fix: guard float64-to-int64 overflow at MaxInt64 boundary
  float64(math.MaxInt64) rounds up to 2^63, which overflows to math.MinInt64 on int64 conversion. Use strict < against a named float64 constant to reject the boundary value.
* fix: use sorted iteration in stageMaterializedValue map branch
  Matches the streaming path (stageStreamValue) which already uses sortedObjectKeys for deterministic error reporting.
1 parent 0b45d75 commit 0a638a2

16 files changed

Lines changed: 2430 additions & 110 deletions

.planning/ROADMAP.md

Lines changed: 3 additions & 3 deletions
@@ -12,7 +12,7 @@
 - This milestone starts at `Phase 06`
 
 - [x] **Phase 06: Query Path Hot Path** - Remove linear path scans and canonicalize supported JSONPath lookup (completed 2026-04-14)
-- [ ] **Phase 07: Builder Parsing & Numeric Fidelity** - Lower ingest overhead and make number handling explicit and safe
+- [x] **Phase 07: Builder Parsing & Numeric Fidelity** - Lower ingest overhead and make number handling explicit and safe (completed 2026-04-15)
 - [ ] **Phase 08: Adaptive High-Cardinality Indexing** - Recover exact pruning for hot values without exploding index size
 - [ ] **Phase 09: Derived Representations** - Add raw-plus-derived indexing instead of replacement-only transformers
 - [ ] **Phase 10: Serialization Compaction** - Shrink encoded path and term dictionaries once functional layout stabilizes
@@ -43,7 +43,7 @@ Plans:
 3. Integers within the supported range are indexed without pre-index rounding loss
 4. Unsupported numeric values return an explicit error instead of being silently mis-indexed
 5. Benchmarks report ingest/build latency and allocation deltas for the new parser path
-**Plans:** TBD
+**Plans:** 2/2 plans complete
 
 ### Phase 08: Adaptive High-Cardinality Indexing
 **Goal**: High-cardinality string paths keep exact pruning power for hot values while retaining compact fallback behavior for the long tail
@@ -87,7 +87,7 @@ Phases execute in numeric order: `06 → 07 → 08 → 09 → 10`
 | Phase | Plans Complete | Status | Completed |
 |-------|----------------|--------|-----------|
 | 06. Query Path Hot Path | 2/2 | Complete | 2026-04-14 |
-| 07. Builder Parsing & Numeric Fidelity | 0/0 | Not started | - |
+| 07. Builder Parsing & Numeric Fidelity | 2/2 | Complete | 2026-04-15 |
 | 08. Adaptive High-Cardinality Indexing | 0/0 | Not started | - |
 | 09. Derived Representations | 0/0 | Not started | - |
 | 10. Serialization Compaction | 0/0 | Not started | - |

.planning/STATE.md

Lines changed: 14 additions & 14 deletions
@@ -2,16 +2,16 @@
 gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
-status: planning
-stopped_at: Phase 06 complete
-last_updated: "2026-04-14T19:24:30.753Z"
-last_activity: "2026-04-14 -- Phase 06 shipped — PR #17"
+status: "Phase 07 shipped — PR #18"
+stopped_at: "Phase 07 shipped — PR #18"
+last_updated: "2026-04-15T14:00:46.863Z"
+last_activity: 2026-04-15
 progress:
-  total_phases: 5
-  completed_phases: 1
-  total_plans: 2
-  completed_plans: 2
-  percent: 20
+  total_phases: 8
+  completed_phases: 2
+  total_plans: 4
+  completed_plans: 4
+  percent: 100
 ---
 
 # Project State
@@ -21,14 +21,14 @@ progress:
 See: `.planning/PROJECT.md` (updated 2026-04-14)
 
 **Core value:** Material pruning quality and hot-path efficiency gains without turning the library into a heavyweight database or document store
-**Current focus:** Phase 07 — Builder Parsing & Numeric Fidelity
+**Current focus:** Phase 07 — builder-parsing-numeric-fidelity
 
 ## Current Position
 
-Phase: 07
+Phase: 999.1
 Plan: Not started
-Status: Ready to plan Phase 07
-Last activity: 2026-04-14 -- Phase 06 shipped — PR #17
+Status: Phase 07 shipped — PR #18
+Last activity: 2026-04-15
 
 Progress: [██░░░░░░░░] 20%
 
@@ -45,7 +45,7 @@ Progress: [██░░░░░░░░] 20%
 | Phase | Plans | Total | Avg/Plan |
 |-------|-------|-------|----------|
 | 06 | 2 | - | - |
-| 07 | 0 | - | - |
+| 07 | 2 | - | - |
 | 08 | 0 | - | - |
 | 09 | 0 | - | - |
 | 10 | 0 | - | - |
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
+---
+phase: 07-builder-parsing-numeric-fidelity
+plan: 01
+subsystem: indexing
+tags: [json, parser, int64, serialization, transformers]
+requires: []
+provides:
+  - transactional AddDocument staging with explicit numeric parsing
+  - exact-int numeric metadata and int-aware query evaluation
+  - regression coverage for atomic failure, decode parity, and transformer numeric paths
+affects: [07-02, 08-adaptive-high-cardinality-indexing, 09-derived-representations, 10-serialization-compaction]
+tech-stack:
+  added: []
+  patterns: [transactional document staging, exact-int numeric mode, transformer subtree materialization]
+key-files:
+  created: []
+  modified: [builder.go, gin.go, query.go, serialize.go, gin_test.go, transformers_test.go, transformer_registry_test.go, serialize_security_test.go]
+key-decisions:
+  - "Keep AddDocument transactional by staging per-document observations and merging only after parse/validation succeeds."
+  - "Store int-only path stats as exact int64 values and promote to float mode only when every integer remains exact inside float64."
+  - "Preserve existing transformer input expectations by normalizing transformer-targeted subtrees before applying the transformer, then classify transformed outputs explicitly."
+patterns-established:
+  - "Parser path: stream objects by default, materialize only transformer-targeted subtrees and array items that need both indexed and wildcard paths."
+  - "Numeric mode: ValueType 0 = int-only, ValueType 1 = float-or-mixed across build, query, and serialization."
+requirements-completed: [BUILD-01, BUILD-02, BUILD-03, BUILD-04]
+duration: 32min
+completed: 2026-04-15
+---
+
+# Phase 07: Builder Parsing & Numeric Fidelity Summary
+
+**Transactional explicit-number ingest with exact-int path semantics, guarded mixed-mode promotion, and decode-parity regressions**
+
+## Performance
+
+- **Duration:** 32 min
+- **Started:** 2026-04-15T10:40:22Z
+- **Completed:** 2026-04-15T11:12:36Z
+- **Tasks:** 3
+- **Files modified:** 8
+
+## Accomplishments
+- Replaced eager `json.Unmarshal(..., &any)` ingest with transactional per-document staging driven by `json.Decoder` and `UseNumber()`.
+- Added exact `int64` path storage/query behavior plus explicit rejection for lossy mixed integer/decimal promotion.
+- Extended regression coverage to lock atomic failure, decode parity, transformer numeric compatibility, numeric transformer config round-trip behavior, and the new numeric decode bounds layout.
+
+## Task Commits
+
+Execution landed as one tightly-coupled implementation commit plus a small verification follow-up fix because the parser, numeric mode, and regression work shared the same core builder changes:
+
+1. **Task 1: Replace eager generic decode with transactional explicit-number staging** - `cb5b7bf` (feat)
+2. **Task 2: Add exact-int numeric mode and reject lossy mixed-path promotion** - `cb5b7bf` (feat)
+3. **Task 3: Add regression coverage for atomic failure, transformer compatibility, and decode parity** - `cb5b7bf` (feat)
+4. **Post-verification compatibility fix: preserve int-only `GlobalMin` / `GlobalMax` expectations and update numeric decode bounds coverage** - `fc813f9` (fix)
+
+## Files Created/Modified
+- `builder.go` - transactional document staging, explicit numeric classification, transformer-aware subtree materialization, and merge-on-success ingest
+- `gin.go` - exact-int numeric metadata on `NumericIndex` and `RGNumericStat`
+- `query.go` - int-aware equality and range evaluation for int-only numeric paths
+- `serialize.go` - exact-int numeric field encode/decode support with format version bump
+- `gin_test.go` - atomic failure, exact `int64` fidelity, mixed-promotion rejection, and decode-parity regressions
+- `transformers_test.go` - transformer numeric compatibility and transformed exact-int decode parity coverage
+- `transformer_registry_test.go` - numeric registered-transformer config/query round-trip coverage
+- `serialize_security_test.go` - numeric index decode bounds coverage updated for the new int metadata layout
+
+## Decisions Made
+- Preserve the existing transformer contract by converting transformer input subtrees to legacy-style scalar shapes before running the transformer, but always classify transformed outputs through the new explicit numeric path.
+- Keep the streaming parser path sparse by materializing array items only where both indexed and wildcard paths must be staged from the same value.
+- Bump the binary format version when persisting new int-only numeric metadata so decode semantics stay explicit.
+
+## Deviations from Plan
+
+### Auto-fixed Issues
+
+**1. [Rule 1 - Bug] Int-only numeric indexes dropped legacy float globals after the exact-int refactor**
+- **Found during:** Full-suite verification after Task 3
+- **Issue:** Existing transformer/date tests still read `GlobalMin` / `GlobalMax`, and the numeric decode bounds test was still writing the pre-Phase-07 binary layout.
+- **Fix:** Populate float global min/max alongside exact int globals for int-only paths during finalize, and update the numeric decode bounds test to the new encoded layout.
+- **Files modified:** `builder.go`, `serialize_security_test.go`
+- **Verification:** `go test ./... -run 'Test(DecodeBoundsNumericRGs|DateTransformerIntegration)' -count=1` and `go test ./... -count=1`
+- **Committed in:** `fc813f9` (fix)
+
+---
+
+**Total deviations:** 1 auto-fixed (1 bug)
+**Impact on plan:** Compatibility fix only. No scope creep and no change to the Phase 07 requirements.
+
+## Issues Encountered
+
+- The stale milestone branch required a one-time sync/merge from `main` before execution could start.
+- The subagent execution path did not return completion signals in this runtime, so the plan was executed inline under the orchestrator.
+
+## User Setup Required
+
+None - no external service configuration required.
+
+## Next Phase Readiness
+
+- Builder, query, and serialization layers now agree on explicit numeric mode semantics, so benchmark deltas in Plan `07-02` can measure the new parser path directly.
+- The final benchmark work can stay isolated to `benchmark_test.go`; no additional production changes are required for `BUILD-05`.
+
+---
+*Phase: 07-builder-parsing-numeric-fidelity*
+*Completed: 2026-04-15*
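The staging and numeric-mode decisions in the 07-01 summary can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the code in `builder.go`: `classify`, `pathStats`, `builder`, and this `AddDocument` are hypothetical; documents are assumed to be flat objects; non-numeric values are ignored; and integer literals outside the `int64` range are assumed to be the "unsupported" case. Each touched path is copied, validated, and merged only after the whole document passes, so a rejected document leaves the builder unchanged.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// classify keeps integer literals as exact int64 values, parses other numbers
// as float64, and returns an explicit error instead of rounding silently.
func classify(path string, n json.Number) (isInt bool, i int64, f float64, err error) {
	if i, err = n.Int64(); err == nil {
		return true, i, 0, nil
	}
	if !strings.ContainsAny(n.String(), ".eE") {
		return false, 0, 0, fmt.Errorf("path %q: integer %s is outside the int64 range", path, n)
	}
	if f, err = n.Float64(); err != nil {
		return false, 0, 0, fmt.Errorf("path %q: unsupported number %s: %w", path, n, err)
	}
	return false, 0, f, nil
}

// pathStats holds the numbers observed at one path (hypothetical layout).
type pathStats struct {
	ints   []int64
	floats []float64
}

// clone copies s (nil-safe) so staging never mutates committed state.
func (s *pathStats) clone() *pathStats {
	if s == nil {
		return &pathStats{}
	}
	return &pathStats{append([]int64(nil), s.ints...), append([]float64(nil), s.floats...)}
}

// checkPromotion allows float-or-mixed mode only if every int round-trips via float64.
func (s *pathStats) checkPromotion(path string) error {
	if len(s.floats) == 0 {
		return nil // int-only path: stays in exact int64 mode
	}
	for _, i := range s.ints {
		// The < 2^63 guard matters: float64(math.MaxInt64) rounds up to 2^63.
		if f := float64(i); !(f < float64(1<<63) && int64(f) == i) {
			return fmt.Errorf("path %q: promoting %d to float64 would lose precision", path, i)
		}
	}
	return nil
}

type builder struct{ paths map[string]*pathStats }

// AddDocument stages copies of touched paths and merges only if all values validate.
func (b *builder) AddDocument(doc string) error {
	dec := json.NewDecoder(strings.NewReader(doc))
	dec.UseNumber()
	var obj map[string]any
	if err := dec.Decode(&obj); err != nil {
		return err
	}
	keys := make([]string, 0, len(obj))
	for k := range obj {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order keeps the reported error stable
	staged := map[string]*pathStats{}
	for _, k := range keys {
		n, ok := obj[k].(json.Number)
		if !ok {
			continue // non-numeric values are out of scope for this sketch
		}
		isInt, i, f, err := classify(k, n)
		if err != nil {
			return err // nothing has been merged yet
		}
		next := b.paths[k].clone()
		if isInt {
			next.ints = append(next.ints, i)
		} else {
			next.floats = append(next.floats, f)
		}
		if err := next.checkPromotion(k); err != nil {
			return err
		}
		staged[k] = next
	}
	for k, s := range staged {
		b.paths[k] = s // merge: reached only when every value validated
	}
	return nil
}

func main() {
	b := &builder{paths: map[string]*pathStats{}}
	fmt.Println(b.AddDocument(`{"a": 9007199254740993, "b": 1}`))
	fmt.Println(b.AddDocument(`{"a": 1.5, "b": 2}`), len(b.paths["b"].ints))
}
```

In the second call, path `a` already holds 2^53+1, which is not exact in `float64`, so adding `1.5` is rejected and path `b` keeps its single earlier value.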
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+---
+phase: 07-builder-parsing-numeric-fidelity
+plan: 02
+subsystem: testing
+tags: [benchmark, parser, performance, allocations, transformers]
+requires:
+  - phase: 07-01
+    provides: transactional explicit-number builder and exact-int numeric semantics
+provides:
+  - in-repo legacy parser benchmark control path
+  - deterministic Phase 07 benchmark fixtures for int-only, mixed-safe, wide-flat, and transformer-heavy documents
+  - Phase 07 benchmark families for add-document, build, and finalize parser deltas
+affects: [08-adaptive-high-cardinality-indexing, 09-derived-representations, 10-serialization-compaction]
+tech-stack:
+  added: []
+  patterns: [in-repo benchmark control path, parser mode benchmark labeling, deterministic fixture generation]
+key-files:
+  created: []
+  modified: [benchmark_test.go]
+key-decisions:
+  - "Keep the legacy control path local to benchmark_test.go so BUILD-05 stays reproducible without reviving production code."
+  - "Benchmark both wide-flat and transformer-heavy fixtures so Phase 07 measures the two review-identified slow paths directly."
+patterns-established:
+  - "Benchmark labels use parser=/docs=/shape= segments for historical comparisons."
+  - "Parser delta benchmarks reuse the same fixtures and doc counts across legacy and explicit modes."
+requirements-completed: [BUILD-05]
+duration: 8min
+completed: 2026-04-15
+---
+
+# Phase 07: Builder Parsing & Numeric Fidelity Summary
+
+**Reproducible parser-delta benchmarks with an in-repo legacy control and deterministic fixture families for Phase 07**
+
+## Performance
+
+- **Duration:** 8 min
+- **Started:** 2026-04-15T11:05:25Z
+- **Completed:** 2026-04-15T11:13:09Z
+- **Tasks:** 2
+- **Files modified:** 1
+
+## Accomplishments
+- Added the benchmark-only `benchmarkAddDocumentLegacy` control path and kept it local to `benchmark_test.go`.
+- Added deterministic fixture generators for `shape=int-only`, `shape=mixed-safe`, `shape=wide-flat`, and `shape=transformer-heavy` plus the required `parser=` and `docs=` labels.
+- Landed `BenchmarkAddDocumentPhase07`, `BenchmarkBuildPhase07`, and `BenchmarkFinalizePhase07`, then verified them with `go test ./... -run '^$' -bench 'Benchmark(AddDocumentPhase07|BuildPhase07|FinalizePhase07)' -benchtime=1x -count=1`.
+
+## Task Commits
+
+1. **Task 1: Add deterministic Phase 07 fixtures and a benchmark-only legacy parser control** - `c0b6afb` (test)
+2. **Task 2: Add legacy-vs-explicit ingest/build benchmark families with explicit naming and alloc reporting** - `c0b6afb` (test)
+
+## Files Created/Modified
+- `benchmark_test.go` - deterministic Phase 07 fixtures, benchmark-only legacy ingest control, and Phase 07 benchmark families for add-document, build, and finalize deltas
+
+## Decisions Made
+- Reused the current builder for both modes and changed only the ingest path so the benchmark deltas stay attributable to parser behavior rather than fixture drift.
+- Preserved the control path inside the repo instead of relying on historical benchmark notes or old git checkouts.
+- Measured both wide-document and transformer-heavy shapes explicitly because those were the main review concerns about the new staging parser.
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+
+None.
+
+## User Setup Required
+
+None - no external service configuration required.
+
+## Next Phase Readiness
+
+- `BUILD-05` is satisfied with a same-branch benchmark harness and reproducible parser mode labels.
+- The current benchmark output shows the explicit parser is still more allocation-heavy on the wide-flat and transformer-heavy build paths, which gives Phase 08+ a concrete baseline for future optimization work.
+
+---
+*Phase: 07-builder-parsing-numeric-fidelity*
+*Completed: 2026-04-15*

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ Transform values before indexing via `GINConfig.fieldTransformers`. Use cases: d
 - `gin.go` - `FieldTransformer` type, `GINConfig.fieldTransformers`, `WithFieldTransformer` option
 - `transformers.go` - All built-in transformers
 - `transformers_test.go` - Unit and integration tests
-- `builder.go:147` - Transformer application in `walkJSON` before type switch
+- `builder.go` - Transformer application in `decodeTransformedValue()` and `stageMaterializedValue()` before type switch
 
 ## Go Conventions
