Skip to content

Phase 18: Structured IngestError + CLI integration#33

Merged
tazarov merged 36 commits into
mainfrom
phase/18-structured-ingesterror-cli-integration
Apr 27, 2026
Merged

Phase 18: Structured IngestError + CLI integration#33
tazarov merged 36 commits into
mainfrom
phase/18-structured-ingesterror-cli-integration

Conversation

@tazarov

@tazarov tazarov commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Summary

Phase 18: Structured IngestError + CLI integration
Goal: Make per-document failures actionable by callers through a structured IngestError, then surface the same structure in gin-index experiment --on-error continue summaries.
Status: Verified

Phase 18 completes the v1.2 ingest-correctness milestone by adding an exported structured ingest error API, wiring hard parser/transformer/numeric/schema document failures into that API, and reporting grouped ingest failures from the experiment CLI in both text and JSON output. It preserves Phase 17 soft-skip behavior and keeps parser contract/internal tragic errors outside the public per-document IngestError boundary.

Changes

Plan 18-01: Public IngestError API

Added IngestLayer and IngestError with Path, Layer, Value, and Err, plus Error, Unwrap, and Cause compatibility.

Key files:

  • Created: ingest_error.go
  • Modified: builder.go, failure_modes_test.go, parser_test.go, gin_test.go, examples/failure-modes/main_test.go

Plan 18-02: Structured Error Guard Coverage

Added behavioral and AST guard coverage so hard ingest sites do not drift back to opaque plain errors.

Key files:

  • Created: ingest_error_guard_test.go
  • Modified: failure_modes_test.go

Plan 18-03: Experiment CLI Grouped Failures

Added deterministic grouped failure summaries for gin-index experiment --on-error continue, including bounded structured samples in text and JSON reports.

Key files:

  • Modified: cmd/gin-index/experiment.go, cmd/gin-index/experiment_output.go, cmd/gin-index/experiment_test.go

Plan 18-04: Docs and Final Verification

Documented the public API and release-facing behavior, including the verbatim caller-owned Value policy, then recorded final validation evidence.

Key files:

  • Modified: CHANGELOG.md, ingest_error.go, .planning/phases/18-structured-ingesterror-cli-integration/18-VALIDATION.md

Requirements Addressed

  • IERR-01: Exported IngestError carries Path, Layer, Value, and Err; errors.As extraction is tested.
  • IERR-02: Hard ingest sites wrap underlying parser, transformer, numeric, and schema errors in IngestError; behavior matrix and AST guard coverage enforce this.
  • IERR-03: gin-index experiment --on-error continue groups failures by layer with structured samples in text and JSON output.

Verification

  • Phase verification: .planning/phases/18-structured-ingesterror-cli-integration/18-VERIFICATION.md reports status: passed with 16/16 must-haves verified.
  • go test ./...
  • make lint
  • Phase code review artifact is clean.

Key Decisions

  • IngestError.Value is verbatim, not redacted or truncated by the library; callers own redaction and output-size policy.
  • Parser contract errors, tragic/internal builder errors, and Phase 17 soft-mode skips remain non-IngestError.
  • CLI failure groups are deterministic: parser, transformer, numeric, schema, then unknown layers lexically.
  • CLI samples are bounded by count, not truncation: at most 3 samples per layer for normal grouped failures.

tazarov added 30 commits April 24, 2026 14:58
- Cover errors.As and pkg/errors.Cause extraction
- Pin IngestError formatting and staged numeric formatting
- Export IngestLayer and IngestError with wrapping helpers
- Add deterministic staged numeric value formatting
- Cover parser, transformer, and schema IngestError extraction
- Assert parser contract errors remain outside IngestError
- Return parser IngestError after stage callback and soft-skip checks
- Wrap transformer and unsupported value failures with source path context
- Cover malformed literal and validator rejected promotion values
- Assert soft parser numeric and transformer paths return nil
- Return numeric IngestError for malformed literals and mixed promotion
- Preserve raw numeric value strings through validator replay
- Update affected expectations for structured error messages
- Add execution summary with verification and deviations
- Update phase progress and requirement traceability
- Assert IngestError extraction through outer wrapping
- Prove hard failures leave builders reusable and exceptions unstructured
- Parse builder.go and inspect named hard-ingest functions
- Fail direct plain errors returns outside IngestError wrappers
- Add 18-02 execution summary
- Advance Phase 18 progress to plan 03
- Cover deterministic layer ordering for ingest failure aggregation
- Assert parser sample line, input index, value, and message fields
- Add structured failure groups and samples to experiment summaries
- Convert accumulated IngestError groups in deterministic layer order
- Assert continue mode renders grouped parser failure samples
- Assert abort mode returns parser ingest error without grouped summary
- Record structured IngestError failures during continue-mode ingest
- Print grouped failure samples after error counts in text reports
- Assert grouped JSON failure samples and ordering
- Add deterministic 100-line fixture with parser, transformer, and numeric failures
- Add overridable experiment default config for package tests
- Verify JSON failure groups with deterministic 100-line parser transformer numeric fixture
- Add execution summary and verification evidence
- Update roadmap, requirements, and state for completed IERR-03 work
- Add Godoc for exported IngestError API surface
- Note verbatim Value policy and CLI grouped summaries in changelog
- Mark Phase 18 validation rows green with final command evidence
- Fix lint findings blocking the final make lint gate
- Add plan summary with verification and deviation details
- Mark Phase 18 complete in state and roadmap
@tazarov

tazarov commented Apr 24, 2026

Copy link
Copy Markdown
Contributor Author

@claude

@claude

claude Bot commented Apr 24, 2026

Copy link
Copy Markdown

Claude finished @tazarov's task in 3m 19s —— View job


Code Review — Phase 18: Structured IngestError + CLI Integration

Tasks:

  • Gather context and understand request
  • Read key changed files (ingest_error.go, builder.go, CLI files)
  • Review IngestError API design and implementation
  • Review builder error-wrapping changes
  • Review CLI experiment grouped failure output
  • Post final review

Summary

Phase 18 is a well-executed structured-error milestone. The IngestError API is clean, the AST guard is a creative enforcement mechanism, and the CLI grouped-failure output is deterministic and bounded. I found a few small issues worth fixing before merge and one design observation to confirm.


ingest_error.go — API Design

Strengths:

  • Nil-safe receiver methods on all accessors (Path(), Layer(), Value(), Error(), Unwrap(), Cause()) — good defensive practice.
  • Both Unwrap() (stdlib) and Cause() (pkg/errors) ensure callers using either error ecosystem get proper extraction.
  • Error() format is declared API-stable in the comment — good.
  • newIngestError / newIngestErrorString returns nil when err == nil, preventing nil-wrapping bugs.

Observation — (*IngestError)(nil).Error() returns "<nil>":
The nil receiver returns "<nil>" which differs from the pattern of many stdlib types. This is fine, but callers who fmt.Fprintf(stderr, ..., err) where err is typed as error will never hit this path anyway (a nil *IngestError assigned to error is non-nil). The nil guards are still valuable for safety.


builder.go — Error Wrapping Changes

remapCompanionIngestErrorPath mutates a live error:

// builder.go:695
ingestErr.path = sourcePath

This mutates the IngestError struct in place via the pointer obtained through errors.As. Because this happens before the error is returned to the caller, the mutation is never externally observable in its pre-remap state — but the approach is fragile: if the call order between stageMaterializedValue and remapCompanionIngestErrorPath ever changes, or if stageMaterializedValue internally logs/records the error before returning it, the path would be captured with the internal derived path. A factory approach (building a new IngestError with the remapped path) would be safer, though it would require errors.As to then re-wrap. The current approach is an acceptable trade-off given the comment, but worth noting.

Consider making path remapping non-mutating →

formatStagedNumericValue uses 'g' format for floats:

// builder.go:769
return strconv.FormatFloat(value.floatVal, 'g', -1, 64)

For values sourced via stageNativeNumeric (native Go numerics from transformers), the raw field is empty, so this format is used. 'g' uses scientific notation for very large or very small floats (e.g., 1.23e+20). This may differ from the original JSON representation, making IngestError.Value() less recognizable to callers. For stageJSONNumberLiteral the raw field preserves the exact JSON string — consistent. Only a UX observation, not a bug.

stageCompanionRepresentations error return after remap:

// builder.go:660-662
if err := b.stageMaterializedValue(registration.TargetPath, transformed, state, false); err != nil {
    remapCompanionIngestErrorPath(err, canonicalPath, registration.TargetPath)
    return err
}

The remap happens before return — correct. No issue here.


ingest_error_guard_test.go — AST Guard

This is creative and robust. A few observations:

trackBadAssignments and tuple assignments:

// ingest_error_guard_test.go:182
for i, left := range lhs {
    ...
    if i >= len(rhs) {
        delete(tainted, ident.Name)
        continue
    }

For a tuple assignment a, b := f() where len(rhs) == 1, when i=1 >= len(rhs)=1, the variable at index 1 gets cleared from tainted regardless of whether f() could taint it. In practice this doesn't cause false negatives because all denied construction calls (errors.New, errors.Errorf, etc.) return single values — multi-value assignments can only come from function calls, which the guard doesn't track as taint sources. This is an acceptable limitation worth a brief comment.

Guard auto-discovery relies on // +hard-ingest directive or stage prefix:

The validateStagedPaths function uses the directive // +hard-ingest validate staged numeric transitions before merge. at builder.go:1009. This works, but the directive comment format (with extra text after +hard-ingest) could drift if copy-pasted with different text. Consider a stricter sentinel like //nolint:hard-ingest or a dedicated standalone line. Minor.


cmd/gin-index/experiment.go — CLI Grouped Failures

withExperimentDefaultConfig missing "no parallel" warning:

// experiment_test.go:138
func withExperimentDefaultConfig(t *testing.T, fn func() gin.GINConfig) {
    t.Helper()
    original := experimentDefaultConfig
    experimentDefaultConfig = fn
    ...
}

withExperimentBuilderFactory includes the explicit comment // This helper overrides a package-global seam; callers must not run in parallel while the override is active. but withExperimentDefaultConfig does not. The test at line 1054 that uses it does not call t.Parallel(), so there's no active race — but adding the matching comment would prevent future authors from accidentally adding t.Parallel().

Add "no parallel" comment to withExperimentDefaultConfig →

experimentTragicAbortError.Error() string construction:

// experiment.go:499
message := fmt.Sprintf("%s after line %d", experimentStatusTragic, e.lineNumber)

The constant experimentStatusTragic = "aborted:tragic" doubles as a display token in the error string. This means a constant rename changes user-visible output. If the status string is user-facing in reports, this coupling is worth documenting or separating. Minor.

recordExperimentIngestFailure forceSample semantics:

When forceSample=true (tragic abort), the sample is always appended regardless of the cap. This is correct and intentional (preserve the triggering sample), and the comment explains it. The only theoretical concern — if forceSample were called multiple times — doesn't apply since only one tragic abort can occur per build.

newExperimentReport changes paths from null to [] in JSON:

paths := collectExperimentPathRows(result.idx)
if paths == nil {
    paths = []experimentPathRow{}
}

When idx == nil (tragic abort), the JSON output now contains "paths": [] rather than "paths": null. This is good JSON API hygiene. Confirm this is intentional and won't break any consumers expecting null for aborted builds.

experimentIngestLayerRank treats experimentUnknownFailureLayer as rank 4 (known):

case experimentUnknownFailureLayer:
    return 4, true

Returning (4, true) for the sentinel means "unknown" sorts before any future unrecognized layer strings. This is the right behavior — known layers (including the explicit "unknown" sentinel) sort before truly new layer strings. Clear and correct.


Minor Nits

  • ingest_error.go imports only "fmt" — could be in a file-level doc comment territory per Go conventions, but the package already has a doc comment in another file so this is fine.
  • The +hard-ingest guard comment format should be explicitly called out in a developer doc or CONTRIBUTING note so future contributors know to add it when writing new hard-ingest sites.

Verdict

Approve with minor suggestions. The two actionable items are:

  1. Add the "no parallel" safety comment to withExperimentDefaultConfig (low risk).
  2. Confirm or document the "paths": [] vs "paths": null JSON change for tragic aborts.

Everything else is observation-level. The AST guard coverage, error chain design, and deterministic CLI ordering are all solid.

tazarov added 3 commits April 25, 2026 08:45
The AST guard previously matched any comment containing the substring
'+hard-ingest', allowing free-text trailing on the same line and
near-identical tokens like '+hard-ingest-foo' to pass as directives.
Require the canonical standalone form (the directive on its own
trimmed comment line) and migrate the existing call site.

Addresses PR #33 review item 4.
withExperimentDefaultConfig swaps the same kind of package-global seam
as withExperimentBuilderFactory but lacked the matching warning that
callers must not run in parallel while the override is active. Add the
comment for symmetry to prevent a future caller from accidentally
adding t.Parallel().

Addresses PR #33 review item 5.
Quick task PLAN/SUMMARY plus STATE.md row for the +hard-ingest guard
tightening (5ec60bb) and withExperimentDefaultConfig safety comment
mirror (54b7e34).
@tazarov

tazarov commented Apr 27, 2026

Copy link
Copy Markdown
Contributor Author

@claude

@claude

claude Bot commented Apr 27, 2026

Copy link
Copy Markdown

Claude finished @tazarov's task in 2m 0s —— View job


Claude finished @tazarov's task ——


Follow-up Review — Phase 18: Prior Review Items

Tasks:

  • Gather context on recent commits since initial review
  • Review changes addressing prior review feedback
  • Assess current state and provide follow-up summary

All five items flagged in the initial review have been addressed. Here's the rundown:

Item 4 — +hard-ingest guard sentinel form (5ec60bb)

The hasHardIngestDirective function now requires exact canonical form:

// ingest_error_guard_test.go
text := strings.TrimSpace(strings.TrimPrefix(comment.Text, "//"))
if text == "+hard-ingest" {

Trailing text and near-identical tokens (e.g. +hard-ingest-foo) are now rejected. The builder's existing directive was migrated to the clean two-line form — directive on its own line, doc comment below. The updated comment in the guard file now documents this constraint explicitly. Clean fix.

Item 5 — withExperimentDefaultConfig no-parallel comment (54b7e34)

The comment is now mirrored exactly:

// This helper overrides a package-global seam; callers must not run in
// parallel while the override is active.

Matches withExperimentBuilderFactory. Symmetry achieved.

remapCompanionIngestErrorPath mutation guard (72eea02)

The path-guard condition was strengthened — remapping now only fires when the error's path matches the companion target shape (exact match, sub-path prefix, or internal representation prefix). This addresses the fragility concern raised in the original review without needing a factory approach.

Tragic abort "paths": [] vs "paths": null (15ebe9a)

The newExperimentReport nil-to-empty conversion (paths = []experimentPathRow{}) is confirmed intentional and explicitly tested:

// experiment_test.go:526-528
parserPaths := decodeJSONArray(t, root["paths"])
if len(parserPaths) != 0 {
    t.Fatalf("len(paths) = %d, want 0 for tragic report without finalized index", len(parserPaths))
}

The requireJSONKeys call also asserts "paths" is always present in JSON output. The [] vs null distinction is a deliberate API stability choice — consumers always get an array, never a missing or null field.

Bonus fixes (not explicitly requested)

  • f6fbbf6: Test assertions use string(experimentUnknownFailureLayer) instead of the string literal "unknown" — decouples tests from string value of the constant.
  • 72eea02: TestExperimentTragicAbortErrorUnwrapExposesOnlyBuilderErr pins the Unwrap contract; ordering comment added before recordExperimentIngestFailure.

Verdict: ready to merge. All actionable items from the initial review are resolved. The 'g' float format in formatStagedNumericValue remains as-is (UX observation, not a bug) — acceptable given the raw JSON string path still covers the primary case.

@tazarov tazarov merged commit 8f4545d into main Apr 27, 2026
5 checks passed
@tazarov tazarov deleted the phase/18-structured-ingesterror-cli-integration branch April 27, 2026 06:50
@tazarov tazarov restored the phase/18-structured-ingesterror-cli-integration branch April 27, 2026 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant