Add Claude Code GitHub Workflow#3
Merged
Merged
Conversation
tazarov
added a commit
that referenced
this pull request
Apr 16, 2026
Only writeAdaptiveStringIndexes was using sortedPathIDs; every other section writer iterated maps in non-deterministic Go order. Two Encode() calls on the same in-memory index could therefore produce different bytes, breaking content-addressable caching and making encoded-byte test assertions fragile. Fixed: - writeStringIndexes, writeStringLengthIndexes, writeNumericIndexes, writeNullIndexes, writeTrigramIndexes, writeHyperLogLogs now use sortedPathIDs at the outer level - writeTrigramIndexes now sorts its inner Trigrams map by trigram - writeConfig now sorts transformerSpecs by path Decode is order-tolerant so this is wire-compatible. Addresses PR #19 review item #3.
tazarov
added a commit
that referenced
this pull request
Apr 16, 2026
* test(08-01): add failing test for adaptive hot-term promotion - Locks adaptive config defaults for high-cardinality paths - Expects adaptive path metadata and promoted exact terms - Verifies hot terms outrank long-tail values by RG coverage * feat(08-01): add adaptive finalize-time string promotion - Add adaptive string path mode, config knobs, and summary metadata - Build promoted exact terms and deterministic bucket fallbacks at finalize time - Cover hot-term promotion with a focused regression test * test(08-01): add failing tests for adaptive query fallback - Lock bucket-backed positive lookups to no-false-negative supersets - Require non-promoted adaptive negatives to stay conservative - Preserve exact negative inversion for promoted hot terms * feat(08-01): route adaptive string queries through exact and buckets - Add shared adaptive lookup with explicit exact-vs-lossy results - Use bucket-backed supersets for adaptive EQ and IN tail terms - Keep adaptive NE and NIN conservative unless the match was exact * test(08-01): update threshold property for three-mode strings - Exercise exact, adaptive-hybrid, and bloom-only high-cardinality outcomes - Assert threshold breaches may resolve to adaptive or bloom-only modes - Keep positive EQ lookups false-negative-free across all three cases * docs(08-01): complete adaptive-high-cardinality-indexing plan Tasks completed: 3/3 - Add adaptive-hybrid path structures and finalize-time promotion - Route string membership queries through exact promotion and bucket fallback safely - Update property coverage for the three-mode threshold contract SUMMARY: .planning/phases/08-adaptive-high-cardinality-indexing/08-01-SUMMARY.md * test(08-02): add failing tests for adaptive serialization - Covers adaptive config round-trip expectations - Covers adaptive path metadata and bitmap round-trip - Locks oversized adaptive bucket section rejection * feat(08-02): persist adaptive serialization metadata - Bump the wire format to version 5 for adaptive sections - Serialize adaptive config knobs and per-path adaptive indexes explicitly - Reject malformed adaptive path, term, and bucket counts on decode * fix(08-03): restore benchmark baseline from stray 08-02 tests - remove incomplete adaptive serialization RED tests from serialize_security_test.go - realign verification to the requested 424aceb content boundary * test(08-03): add skewed adaptive benchmark fixture - add deterministic skewed head-tail fixture generation for $.user_id - define explicit exact, bloom-only, and adaptive-hybrid benchmark configs - add BenchmarkAdaptiveHighCardinality with mode= and shape= naming * test(08-02): add failing tests for CLI adaptive info output - Locks exact, bloom-only, and adaptive-hybrid mode labels - Verifies adaptive summary counters in formatter output - Uses local-only index fixtures without S3 dependencies * feat(08-02): expose adaptive mode in CLI info output - Extract index info rendering to a reusable writer helper - Report exact, bloom-only, and adaptive-hybrid modes per path - Include compact adaptive promoted, bucket, threshold, and cap counters * docs(08-02): document adaptive high-cardinality behavior - Describe exact, adaptive-hybrid, and bloom-only string path modes - Document adaptive config defaults and tuning knobs by name - Replace bloom-only-only language in comparisons and examples * fix(08-03): remove stray 08-02 cli red tests - drop unfinished adaptive CLI info tests from cmd/gin-index/main_test.go - keep 08-03 verification scoped to the requested 424aceb baseline * test(08-03): report adaptive pruning and size metrics - report candidate_rgs and encoded_bytes for hot and tail probes - assert adaptive hot probe prunes better than bloom-only before timing runs - reuse shared skewed fixtures across exact, bloom-only, and adaptive modes * docs(08-02): complete adaptive serialization and CLI info plan - Tasks completed: 3/3 - SUMMARY: .planning/phases/08-adaptive-high-cardinality-indexing/08-02-SUMMARY.md - Shared state artifacts intentionally left untouched for orchestrator ownership * docs(08-03): complete adaptive benchmark evidence plan - record HCARD-05 benchmark metrics and verification evidence - document blocker fixes required to restore the requested 424aceb baseline - leave STATE.md and ROADMAP.md untouched per execution request * fix(08-02): restore adaptive serialization and cli tests after wave overlap * fix(08): close review regressions in builder, cli, and decode * fix(08): harden decode limits and align CLI/docs * docs(phase-08): complete phase execution * docs(phase-08): evolve PROJECT.md after phase completion * docs(phase-08): add security threat verification * fix: harden adaptive path mode invariants * fix: address post-fix review hardening * fix: tighten round-3 adaptive hardening * docs(phase-08): clean up comments and add godoc for adaptive surface - expand Version constant comment with migration note and v4/v5/v6 history - dedupe PathEntry.AdaptivePromotedTerms/AdaptiveBucketCount comments into one grouped godoc that documents the "derived, never persisted" contract - justify the three serialize.go max-constants (maxDecodedIndexSize, maxHeaderRowGroups, maxHeaderDocs, maxAdaptivePaths) with the reasoning future maintainers need to tune them - add field godoc on AdaptiveStringIndex explaining the sorted-terms + lossy-bucket contract - document WithAdaptive* option semantics including the disable-via-zero conventions on PromotedTermCap and BucketCount - rewrite the *WithIO shim comment to match the actual asymmetric pattern (only build/extract have wrappers; query/info are *WithIO-only) - simplify adaptivePathSummary: the fallback to PathEntry counters is unreachable through both Decode and Finalize, so treat missing section as invariant-violation-return-zero instead of silently reading stale data - add cmd/gin-index/ errcheck exclusion in .golangci.yml; the CLI uses fmt.Fprintf to stdout/stderr for user-facing output where ignoring write errors is standard practice * feat(phase-08): add PathMode.IsValid and decode-side enforcement PathMode was validated only structurally (through validatePathReferences) after the whole directory was read. Any byte >= 3 would flow through as an "unknown" mode until the downstream switch caught it, which risked diverging behavior if future code paths forgot the guard. - PathMode.IsValid() reports whether the value is one of the declared constants - readPathDirectory now rejects unknown mode bytes immediately with ErrInvalidFormat, matching the fail-closed posture of the other per-field bounds checks - cover the new helper with a table test and the decoder with a targeted corruption test that flips a single mode byte in a valid encoded payload to 99 * feat(phase-08): export SetAdaptiveInvariantLogger for library consumers Adaptive invariant violations (path flagged PathModeAdaptiveHybrid with no matching section) were always logged to log.Default() through a package- private variable. Host processes couldn't redirect into structured logging pipelines without poking internals, and concurrent writes to the variable weren't safe. - SetAdaptiveInvariantLogger swaps the logger under sync.RWMutex; nil silences violations entirely for embedders that prefer not to have any stderr chatter - currentAdaptiveInvariantLogger reads the current logger with the read lock so Evaluate paths don't serialize on logger access - migrate the existing TestAdaptiveInvariantViolationLogs to the exported API and add TestSetAdaptiveInvariantLoggerNilSilences covering the nil case * feat(phase-08): poison builder after partial merge failure mergeStagedPaths mutates pathData, bloom, presentRGs, and nullRGs path-by-path in a sorted loop. A mid-loop failure leaves earlier paths merged for the current document while later ones are untouched - subsequent AddDocument calls would compound the corruption. validateStagedPaths' preview catches every naturally-occurring trigger today (mixed numeric promotion), so this is defensive: if a future code path lands a merge-time failure that bypasses the preview, the builder now flags poisonErr and refuses further AddDocument calls with a wrapped error pointing at the original failure. Finalize's contract is unchanged - callers who honor AddDocument's error never reach a corrupt Finalize regardless. * test(phase-08): cover remaining validate arm and non-EQ adaptive probes - add the "adaptive path missing adaptive section" row to TestValidatePathReferencesRejectsModeMismatches. All 6 arms of the mode switch at gin.go:587-608 now have direct unit coverage instead of the previous 5-arm subset + 1 transitive EncodeWithLevel assertion. - expand BenchmarkAdaptiveHighCardinality beyond EQ: add NE, IN, NIN, and Contains probes alongside the existing hot/tail EQ probes. Each probe ReportMetric's candidate_rgs and encoded_bytes per mode so regressions in adaptive pruning show up as metric shifts in bench diffs. The hot-EQ soundness guard that asserts adaptive < bloom-only is preserved unchanged; no new strict assertions are layered over the new probes because different operators have legitimately different pruning semantics across modes. * test(phase-08): add cross-mode soundness property For the same fixture, EQ results must widen monotonically: classic ⊆ adaptive ⊆ bloom-only. A violation means an adaptive-mode index is strictly more aggressive than the exact index - i.e. it drops row groups that really do match, which is unsound and silently loses data. The existing per-mode no-false-negatives property catches missed ground- truth matches against the fixture, but not the case where adaptive's bucket layer under-reports compared to its own exact-index peer. 1000 generated cases over the existing cardinality threshold generator. * refactor(cli): drop outer var err error shadows in *WithIO helpers Each *WithIO helper declared a function-scope var err error that only worked because one "idx, err = ..." assignment per function happened to be in a branch without a block-local err declaration. Future edits that added or removed err := inside inner blocks could silently read or write the wrong err variable. Restructure the 4 helpers so every err value lives in the smallest block that uses it: the two branches that previously needed the outer err now use "loaded, err := ...; idx = loaded" instead of "idx, err = ...". The remaining idx, err = ... sites are inside blocks with their own err := and continue to work unchanged. * style(benchmark): apply gci formatting to long probe func literals * test: fix lint violations in test fixtures * perf(builder): in-place merge for adaptive tail bucket bitmaps Adds RGSet.UnionWith for in-place merges and uses it in buildAdaptiveStringIndex to avoid one bitmap clone per non-promoted term. On high-cardinality paths with many tail terms (the workload adaptive mode targets) this removes O(tail) allocations from the hot builder path. Addresses PR #19 review item #2. * fix(serialize): produce deterministic byte output for all index sections Only writeAdaptiveStringIndexes was using sortedPathIDs; every other section writer iterated maps in non-deterministic Go order. Two Encode() calls on the same in-memory index could therefore produce different bytes, breaking content-addressable caching and making encoded-byte test assertions fragile. Fixed: - writeStringIndexes, writeStringLengthIndexes, writeNumericIndexes, writeNullIndexes, writeTrigramIndexes, writeHyperLogLogs now use sortedPathIDs at the outer level - writeTrigramIndexes now sorts its inner Trigrams map by trigram - writeConfig now sorts transformerSpecs by path Decode is order-tolerant so this is wire-compatible. Addresses PR #19 review item #3. * refactor(query): tighten adaptive lookup paths and silence default logger Three small adaptive-mode cleanups from PR #19 review: - lookupAdaptiveStringMatch: drop the unreachable bucketID bounds check and nil bucket guard. NewAdaptiveStringIndex (used by both build and decode paths) already enforces a power-of-two bucket count and non-nil bucket bitmaps, and adaptiveBucketIndex masks with bucketCount-1, so neither guard can fire. Replace with an invariant comment. Addresses item #1. - evaluateNIN: when a non-string element forces non-adaptive fallback, call evaluateINNonAdaptive(values) directly with the already-extracted slice instead of routing through evaluateIN, which would re-enter the adaptive path before falling back. Addresses item #7. - adaptiveInvariantLogger: default to nil instead of log.Default(). Per Go library convention, libraries should not write to stderr unless the consumer opts in. Existing callers can install log.Default() explicitly. Addresses item #6. * docs(gin): clarify v5 history and AdaptiveBucketCount=0 disable sentinel - Version comment now notes v5 was never released; the in-tree v5 shape was iterated before the wire format was finalised in v6 and v5 payloads are always rejected on decode. Addresses item #4. - WithAdaptiveBucketCount godoc now explains the asymmetry between the option (rejects 0) and the validate() path (accepts 0 as the disable sentinel for struct-literal callers). validate() also carries an inline comment explaining why 0 is permitted there. Addresses item #5.
tazarov
added a commit
that referenced
this pull request
Apr 21, 2026
* docs(13): capture phase context * docs(state): record phase 13 context session * docs(13): add research and validation strategy * docs(13): add pattern map * docs(13): create phase 13 parser seam extraction plans * docs(13): add three phase 13 parser-seam-extraction plans Plan 01 (wave 1): additive parser surface — Parser interface, parserSink (package-private, 6 methods per D-01), stdlibParser (zero- field struct, Name() == "stdlib"), WithParser option, three new builder fields. Existing test suite remains green — seam not wired. Plan 02 (wave 2, depends on 01): wire AddDocument through b.parser.Parse; default stdlibParser and cache b.parserName at NewBuilder with empty-name rejection (D-06); delete parseAndStage- Document / stageStreamValue / decodeTransformedValue from builder.go. Classifier (stageJSONNumberLiteral etc.) preserved per Pitfall #1. Plan 03 (wave 3, depends on 02): parity harness — authored goldens (7 fixtures per D-05 dim #1/#4), gopter determinism property (dim #2), 12-operator Evaluate matrix (dim #3). parity_goldens_test.go gated by //go:build regenerate_goldens. Merge-gate benchmark delta captured in 13-03-BENCH.md. D-02 ROADMAP deviation (ParserSink + stdlibParser ship package-private) flagged in SUMMARY. * docs(13): address plan-checker issues (iteration 1) * docs(13): cross-AI review for phase 13 (gemini + codex) * docs(13): revise plans per cross-AI review feedback * refactor(parser): add parser seam surface * docs(13): record plan 01 completion * refactor(parser): wire AddDocument through parser seam * docs(13): record plan 02 completion * test(parser): add phase 13 parity harness * docs(13): record plan 03 benchmark review hold * docs(phase-13): add/update security threat verification * docs(phase-13): accept benchmark risk * docs(phase-13): reconcile accepted benchmark exception * docs(phase-13): refresh validation strategy * Enforce parser sink contract consistently * Keep parser state opaque * docs(13): ship phase 13 - PR #28 * docs(parser): clarify seam contract and fix numeric present-marking Close a latent contract asymmetry in the Phase 13 parser seam: stageNumericObservation now marks its path present, so all Stage* sink methods are uniformly self-marking. Parsers only need to call MarkPresent explicitly for object and array roots. Behavior-preserving for stdlibParser -- parity goldens unchanged. Addresses review items 1 and 2 on PR #28.
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🤖 Installing Claude Code GitHub App
This PR adds a GitHub Actions workflow that enables Claude Code integration in our repository.
What is Claude Code?
Claude Code is an AI coding agent that can help with:
How it works
Once this PR is merged, we'll be able to interact with Claude by mentioning @claude in a pull request or issue comment.
Once the workflow is triggered, Claude will analyze the comment and surrounding context, and execute on the request in a GitHub action.
Important Notes
Security
There's more information in the Claude Code action repo.
After merging this PR, let's try mentioning @claude in a comment on any PR to get started!