| 1 | +--- |
| 2 | +phase: 07-builder-parsing-numeric-fidelity |
| 3 | +plan: 01 |
| 4 | +subsystem: indexing |
| 5 | +tags: [json, parser, int64, serialization, transformers] |
| 6 | +requires: [] |
| 7 | +provides: |
| 8 | + - transactional AddDocument staging with explicit numeric parsing |
| 9 | + - exact-int numeric metadata and int-aware query evaluation |
| 10 | + - regression coverage for atomic failure, decode parity, and transformer numeric paths |
| 11 | +affects: [07-02, 08-adaptive-high-cardinality-indexing, 09-derived-representations, 10-serialization-compaction] |
| 12 | +tech-stack: |
| 13 | + added: [] |
| 14 | + patterns: [transactional document staging, exact-int numeric mode, transformer subtree materialization] |
| 15 | +key-files: |
| 16 | + created: [] |
| 17 | + modified: [builder.go, gin.go, query.go, serialize.go, gin_test.go, transformers_test.go, transformer_registry_test.go, serialize_security_test.go] |
| 18 | +key-decisions: |
| 19 | + - "Keep AddDocument transactional by staging per-document observations and merging only after parse/validation succeeds." |
| 20 | + - "Store int-only path stats as exact int64 values and promote to float mode only when every integer remains exactly representable as a float64." |
| 21 | + - "Preserve existing transformer input expectations by normalizing transformer-targeted subtrees before applying the transformer, then classify transformed outputs explicitly." |
| 22 | +patterns-established: |
| 23 | + - "Parser path: stream objects by default, materialize only transformer-targeted subtrees and array items that need both indexed and wildcard paths." |
| 24 | + - "Numeric mode: ValueType 0 = int-only and ValueType 1 = float-or-mixed, applied consistently across build, query, and serialization." |
| 25 | +requirements-completed: [BUILD-01, BUILD-02, BUILD-03, BUILD-04] |
| 26 | +duration: 32min |
| 27 | +completed: 2026-04-15 |
| 28 | +--- |
| 29 | + |
| 30 | +# Phase 07: Builder Parsing & Numeric Fidelity Summary |
| 31 | + |
| 32 | +**Transactional explicit-number ingest with exact-int path semantics, guarded mixed-mode promotion, and decode-parity regressions** |
| 33 | + |
| 34 | +## Performance |
| 35 | + |
| 36 | +- **Duration:** 32 min |
| 37 | +- **Started:** 2026-04-15T10:40:22Z |
| 38 | +- **Completed:** 2026-04-15T11:12:36Z |
| 39 | +- **Tasks:** 3 |
| 40 | +- **Files modified:** 8 |
| 41 | + |
| 42 | +## Accomplishments |
| 43 | +- Replaced eager `json.Unmarshal(..., &any)` ingest with transactional per-document staging driven by `json.Decoder` and `UseNumber()`. |
| 44 | +- Added exact `int64` path storage and query behavior, plus explicit rejection of lossy mixed integer/decimal promotion. |
| 45 | +- Extended regression coverage to lock in atomic failure, decode parity, transformer numeric compatibility, numeric transformer config round-trip behavior, and the new numeric decode bounds layout. |
| 46 | + |
| 47 | +## Task Commits |
| 48 | + |
| 49 | +Tasks 1-3 landed as one tightly coupled implementation commit, followed by a small post-verification fix, because the parser, numeric-mode, and regression work all depended on the same core builder changes: |
| 50 | + |
| 51 | +1. **Task 1: Replace eager generic decode with transactional explicit-number staging** - `cb5b7bf` (feat) |
| 52 | +2. **Task 2: Add exact-int numeric mode and reject lossy mixed-path promotion** - `cb5b7bf` (feat) |
| 53 | +3. **Task 3: Add regression coverage for atomic failure, transformer compatibility, and decode parity** - `cb5b7bf` (feat) |
| 54 | +4. **Post-verification compatibility fix: preserve int-only `GlobalMin` / `GlobalMax` expectations and update numeric decode bounds coverage** - `fc813f9` (fix) |
| 55 | + |
| 56 | +## Files Created/Modified |
| 57 | +- `builder.go` - transactional document staging, explicit numeric classification, transformer-aware subtree materialization, and merge-on-success ingest |
| 58 | +- `gin.go` - exact-int numeric metadata on `NumericIndex` and `RGNumericStat` |
| 59 | +- `query.go` - int-aware equality and range evaluation for int-only numeric paths |
| 60 | +- `serialize.go` - exact-int numeric field encode/decode support with format version bump |
| 61 | +- `gin_test.go` - atomic failure, exact `int64` fidelity, mixed-promotion rejection, and decode-parity regressions |
| 62 | +- `transformers_test.go` - transformer numeric compatibility and transformed exact-int decode parity coverage |
| 63 | +- `transformer_registry_test.go` - numeric registered-transformer config/query round-trip coverage |
| 64 | +- `serialize_security_test.go` - numeric index decode bounds coverage updated for the new int metadata layout |
| 65 | + |
| 66 | +## Decisions Made |
| 67 | +- Preserve the existing transformer contract by converting transformer input subtrees to legacy-style scalar shapes before running the transformer, but always classify transformed outputs through the new explicit numeric path. |
| 68 | +- Keep the streaming parser path sparse by materializing array items only where both indexed and wildcard paths must be staged from the same value. |
| 69 | +- Bump the binary format version when persisting new int-only numeric metadata so decode semantics stay explicit. |
| 70 | + |
| 71 | +## Deviations from Plan |
| 72 | + |
| 73 | +### Auto-fixed Issues |
| 74 | + |
| 75 | +**1. [Rule 1 - Bug] Int-only numeric indexes dropped legacy float globals after the exact-int refactor** |
| 76 | +- **Found during:** Full-suite verification after Task 3 |
| 77 | +- **Issue:** Int-only paths no longer populated the float `GlobalMin` / `GlobalMax` fields that existing transformer/date tests still read, and the numeric decode bounds test was still writing the pre-Phase-07 binary layout. |
| 78 | +- **Fix:** Populate float global min/max alongside exact int globals for int-only paths during finalize, and update the numeric decode bounds test to the new encoded layout. |
| 79 | +- **Files modified:** `builder.go`, `serialize_security_test.go` |
| 80 | +- **Verification:** `go test ./... -run 'Test(DecodeBoundsNumericRGs|DateTransformerIntegration)' -count=1` and `go test ./... -count=1` |
| 81 | +- **Committed in:** `fc813f9` (fix) |
| 82 | + |
| 83 | +--- |
| 84 | + |
| 85 | +**Total deviations:** 1 auto-fixed (1 bug) |
| 86 | +**Impact on plan:** Compatibility fix only. No scope creep and no change to the Phase 07 requirements. |
| 87 | + |
| 88 | +## Issues Encountered |
| 89 | + |
| 90 | +- The stale milestone branch required a one-time sync/merge from `main` before execution could start. |
| 91 | +- The subagent execution path did not return completion signals in this runtime, so the plan was executed inline under the orchestrator. |
| 92 | + |
| 93 | +## User Setup Required |
| 94 | + |
| 95 | +None - no external service configuration required. |
| 96 | + |
| 97 | +## Next Phase Readiness |
| 98 | + |
| 99 | +- Builder, query, and serialization layers now agree on explicit numeric mode semantics, so benchmark deltas in Plan `07-02` can measure the new parser path directly. |
| 100 | +- The final benchmark work can stay isolated to `benchmark_test.go`; no additional production changes are required for `BUILD-05`. |
| 101 | + |
| 102 | +--- |
| 103 | +*Phase: 07-builder-parsing-numeric-fidelity* |
| 104 | +*Completed: 2026-04-15* |
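
The merge-on-success ingest described in the summary can be illustrated with a minimal sketch. This is not the code in `builder.go`: the `builder` and `staged` types, the `$.key` path format, and the restriction to flat objects are all assumptions made for illustration, and the sketch decodes each document in one call rather than streaming it. It only shows the staging idea: observations are collected per document and copied into the builder after parsing and validation have both succeeded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// builder is a hypothetical stand-in for the real index builder.
type builder struct {
	docs   int
	values map[string][]json.Number // path -> numeric literals seen so far
}

// staged holds one document's observations until the document is known to be valid.
type staged struct {
	values map[string][]json.Number
}

func newBuilder() *builder {
	return &builder{values: map[string][]json.Number{}}
}

// AddDocument parses doc into a staging area and merges it only on success,
// so a failing document leaves the builder untouched.
func (b *builder) AddDocument(doc string) error {
	dec := json.NewDecoder(strings.NewReader(doc))
	dec.UseNumber() // keep numbers as literals instead of float64

	var obj map[string]any
	if err := dec.Decode(&obj); err != nil {
		return fmt.Errorf("parse: %w", err)
	}

	st := staged{values: map[string][]json.Number{}}
	for key, v := range obj {
		path := "$." + key // assumed path format
		switch x := v.(type) {
		case json.Number:
			st.values[path] = append(st.values[path], x)
		case string, bool, nil:
			// Non-numeric scalars are ignored in this sketch.
		default:
			// Nested values are rejected here purely to trigger a mid-document failure.
			return fmt.Errorf("unsupported value at %s: %T", path, v)
		}
	}

	// Validation passed: merge the staged observations into shared state.
	for path, vs := range st.values {
		b.values[path] = append(b.values[path], vs...)
	}
	b.docs++
	return nil
}

func main() {
	b := newBuilder()
	fmt.Println(b.AddDocument(`{"a": 1, "s": "x"}`))
	fmt.Println(b.AddDocument(`{"a": 2, "b": [1]}`)) // fails after "a" may have been staged
	fmt.Println(b.docs, len(b.values["$.a"]))
}
```

Because the failing document only ever writes to its own `staged` value, no rollback logic is needed; the partially collected observations are simply dropped when `AddDocument` returns the error.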
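
The explicit numeric parsing and guarded promotion described in the summary can be sketched as follows. The helper names (`decodeNumber`, `isIntLiteral`, `exactInFloat64`, `promote`) are hypothetical and do not come from the repository; the sketch assumes that "exact inside float64" means the integer survives a round trip through `float64`, which is one reasonable reading of the key decision rather than a confirmed detail of the implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// decodeNumber decodes a single JSON number without routing it through float64.
func decodeNumber(raw string) (json.Number, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.UseNumber() // numbers arrive as json.Number (a string type)
	var v any
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	n, ok := v.(json.Number)
	if !ok {
		return "", fmt.Errorf("expected a JSON number, got %T", v)
	}
	return n, nil
}

// isIntLiteral reports whether the literal has no fraction or exponent part.
func isIntLiteral(n json.Number) bool {
	return !strings.ContainsAny(n.String(), ".eE")
}

// exactInFloat64 reports whether i survives a round trip through float64.
func exactInFloat64(i int64) bool {
	f := float64(i)
	if f >= 1<<63 { // float64(math.MaxInt64) rounds up to 2^63, outside int64 range
		return false
	}
	return int64(f) == i
}

// promote converts a path's integers to float64 and refuses when any value
// would lose precision (a hypothetical model of mixed-path promotion).
func promote(ints []int64) ([]float64, error) {
	out := make([]float64, 0, len(ints))
	for _, i := range ints {
		if !exactInFloat64(i) {
			return nil, fmt.Errorf("lossy promotion: %d is not exact in float64", i)
		}
		out = append(out, float64(i))
	}
	return out, nil
}

func main() {
	n, err := decodeNumber("9007199254740993") // 2^53 + 1
	if err != nil {
		panic(err)
	}
	i, err := strconv.ParseInt(n.String(), 10, 64)
	if err != nil {
		panic(err)
	}
	fmt.Println(isIntLiteral(n), i, exactInFloat64(i))

	_, err = promote([]int64{1, i})
	fmt.Println(err)
}
```

The round-trip check is stricter than a simple magnitude bound of 2^53: it also accepts larger integers that happen to be exactly representable, such as powers of two, while still rejecting values like 2^53 + 1.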