Skip to content

Phase 1: Foundation — License, module path, internal cleanup#7

Merged
tazarov merged 20 commits into
mainfrom
gsd/v0.1.0-milestone
Mar 26, 2026
Merged

Phase 1: Foundation — License, module path, internal cleanup#7
tazarov merged 20 commits into
mainfrom
gsd/v0.1.0-milestone

Conversation

@tazarov

@tazarov tazarov commented Mar 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Phase 1: Foundation
Goal: The repository is legally usable and go get github.com/amikos-tech/ami-gin works
Status: Verified ✓ (6/6 must-haves passed)

Establishes the legal and technical foundation for open-source release: adds MIT license, renames the Go module path from gin-index to ami-gin, removes internal planning artifacts from git tracking, and scrubs all internal references.

Changes

Plan 01-01: License & Module Rename

MIT LICENSE file added and Go module path renamed from gin-index to ami-gin across 16 source files with build/test/lint all green.

Key files:

  • LICENSE — MIT License, Copyright (c) 2026 Amikos Tech Ltd.
  • go.mod — Module path updated to github.com/amikos-tech/ami-gin
  • All 12 Go source files (cmd + examples) — import paths updated
  • .golangci.yml — gci prefix updated
  • README.md, CLAUDE.md — references updated

Plan 01-02: Internal Artifact Cleanup

Removed internal PRD document and gitignored .planning/ directory to prevent internal artifacts from leaking in public repository.

Key files:

  • gin-index-prd.md — deleted (contained internal "Kiba Team" references)
  • .gitignore — added .planning/ and .claude/settings.local.json

Requirements Addressed

ID Description
FOUND-01 Add MIT LICENSE file
FOUND-02 Change module path to github.com/amikos-tech/ami-gin
FOUND-03 Remove internal PRD document
FOUND-04 Add .planning/ to .gitignore and untrack

Verification

  • Automated verification: 6/6 must-haves passed
  • go build ./... passes
  • go test ./... passes (all tests green)
  • golangci-lint run — 0 issues
  • No github.com/amikos-tech/gin-index references in tracked files
  • No internal references (Kiba Team, gin-index-prd.md) in tracked files
  • Manual: go get github.com/amikos-tech/ami-gin resolves (requires public repo)

Test plan

  • Verify go build ./... passes in CI
  • Verify go test ./... passes in CI
  • Verify LICENSE file is present at repo root
  • Verify no references to old module path gin-index in Go imports

tazarov added 19 commits March 24, 2026 09:28
Generated .planning/codebase/ with 7 structured documents:
- STACK.md, INTEGRATIONS.md (tech stack & dependencies)
- ARCHITECTURE.md, STRUCTURE.md (design & file layout)
- CONVENTIONS.md, TESTING.md (code quality & test practices)
- CONCERNS.md (tech debt & improvement areas)
- MIT License with copyright (c) 2026 Amikos Tech Ltd.
- Establishes legal identity for open source release
- Update go.mod module declaration from gin-index to ami-gin
- Update all 12 Go source file imports (cmd + examples)
- Update .golangci.yml gci prefix to match new module path
- Update README.md go get and import references
- Regenerate go.sum with new module path
- Add 01-01-SUMMARY.md with execution results
- Update STATE.md with position, decisions, metrics
- Update ROADMAP.md with plan progress
- Mark FOUND-01, FOUND-02 requirements complete
- Delete gin-index-prd.md (49KB internal planning doc with "Kiba Team" references)
- Prevents internal artifacts from leaking in public repository
- Add .planning/ to .gitignore to prevent tracking GSD workflow artifacts
- Remove .planning/ from git tracking while preserving local files
- Ensures no internal planning artifacts leak in public repository
Three auto-generated GSD blocks still referenced the old module path.
@tazarov

tazarov commented Mar 26, 2026

Copy link
Copy Markdown
Contributor Author

@claude PTAL

@claude

claude Bot commented Mar 26, 2026

Copy link
Copy Markdown

Claude finished @tazarov's task in 1m 36s —— View job


PR Review: Phase 1 — Foundation

Tasks:

  • Gather context and understand the request
  • Review LICENSE file
  • Review go.mod module rename
  • Review CLAUDE.md changes
  • Review .gitignore and .golangci.yml changes
  • Review README.md changes
  • Check for any remaining internal references
  • Post final review

Overall this is a clean, well-scoped phase 1. All 6 must-have requirements are satisfied and the changes are appropriately minimal. A few observations below.

LICENSE

The MIT license text is correct and GitHub's license detection will recognise it. One cosmetic point: the permission grant (line 5), condition (line 7), and disclaimer (line 9) are each squashed onto a single long line rather than using the canonical line-wrapped format. This is legally equivalent but harder to read in a plain-text viewer and less recognisable at a glance.

The copyright year 2026 matches the current year — no issue there.

go.mod — dependency promotion (unannounced change)

Several packages were silently promoted from // indirect to direct dependencies:

github.com/aws/aws-sdk-go-v2         indirect → direct
github.com/aws/aws-sdk-go-v2/config  indirect → direct
github.com/aws/aws-sdk-go-v2/credentials  indirect → direct
github.com/aws/aws-sdk-go-v2/service/s3   indirect → direct
github.com/leanovate/gopter           indirect → direct
github.com/parquet-go/parquet-go      indirect → direct
github.com/pkg/errors                 indirect → direct

This is a correctness fix (they are directly imported in s3.go, parquet.go, and *_test.go) so the change is welcome, but it wasn't called out in the PR description or plan. Worth noting so the audit trail is complete.

CLAUDE.md — Build Commands section incomplete

The Technology Stack section generated by the GSD tooling has a ## Build Commands block where only comment lines were emitted, with no actual commands:

## Build Commands
# Build library
# Build CLI tool
# Run all tests with coverage (via gotestsum)
# Run tests directly
# Run specific test
# Lint
# Lint with auto-fix
# Clean artifacts

These are headings without bodies — looks like the generator couldn't extract the commands from the Makefile. Not a blocker since CLAUDE.md already has the correct commands in the "Build and Test Commands" section above, but the stale placeholder may cause confusion. Fix this →

Everything else ✓

Check Result
No amikos-tech/gin-index module path references ✓ clean
No "Kiba Team" or PRD references ✓ clean
gin-index occurrences are CLI tool name only ✓ expected
.golangci.yml gci prefix updated
README install/import paths updated ✓ (3 occurrences)
.gitignore additions (.planning/, .claude/settings.local.json) ✓ appropriate
gin-index-prd.md deleted

The .claude/settings.local.json entry is intentionally scoped (not .claude/) so that settings.json stays tracked — that's the right call.

Verdict: Ready to merge with the go.mod change documented and the CLAUDE.md Build Commands section addressed (can be a follow-up).

- Reformat LICENSE to canonical line-wrapped MIT format
- Remove duplicate empty Build Commands section from CLAUDE.md
  (real commands already in Build and Test Commands section above)
@tazarov tazarov merged commit ebd4915 into main Mar 26, 2026
@tazarov tazarov deleted the gsd/v0.1.0-milestone branch March 26, 2026 17:47
tazarov added a commit that referenced this pull request Apr 16, 2026
…gger

Three small adaptive-mode cleanups from PR #19 review:

- lookupAdaptiveStringMatch: drop the unreachable bucketID bounds
  check and nil bucket guard. NewAdaptiveStringIndex (used by both
  build and decode paths) already enforces a power-of-two bucket
  count and non-nil bucket bitmaps, and adaptiveBucketIndex masks
  with bucketCount-1, so neither guard can fire. Replace with an
  invariant comment. Addresses item #1.
- evaluateNIN: when a non-string element forces non-adaptive
  fallback, call evaluateINNonAdaptive(values) directly with the
  already-extracted slice instead of routing through evaluateIN,
  which would re-enter the adaptive path before falling back.
  Addresses item #7.
- adaptiveInvariantLogger: default to nil instead of log.Default().
  Per Go library convention, libraries should not write to stderr
  unless the consumer opts in. Existing callers can install
  log.Default() explicitly. Addresses item #6.
tazarov added a commit that referenced this pull request Apr 16, 2026
* test(08-01): add failing test for adaptive hot-term promotion

- Locks adaptive config defaults for high-cardinality paths
- Expects adaptive path metadata and promoted exact terms
- Verifies hot terms outrank long-tail values by RG coverage

* feat(08-01): add adaptive finalize-time string promotion

- Add adaptive string path mode, config knobs, and summary metadata
- Build promoted exact terms and deterministic bucket fallbacks at finalize time
- Cover hot-term promotion with a focused regression test

* test(08-01): add failing tests for adaptive query fallback

- Lock bucket-backed positive lookups to no-false-negative supersets
- Require non-promoted adaptive negatives to stay conservative
- Preserve exact negative inversion for promoted hot terms

* feat(08-01): route adaptive string queries through exact and buckets

- Add shared adaptive lookup with explicit exact-vs-lossy results
- Use bucket-backed supersets for adaptive EQ and IN tail terms
- Keep adaptive NE and NIN conservative unless the match was exact

* test(08-01): update threshold property for three-mode strings

- Exercise exact, adaptive-hybrid, and bloom-only high-cardinality outcomes
- Assert threshold breaches may resolve to adaptive or bloom-only modes
- Keep positive EQ lookups false-negative-free across all three cases

* docs(08-01): complete adaptive-high-cardinality-indexing plan

Tasks completed: 3/3
- Add adaptive-hybrid path structures and finalize-time promotion
- Route string membership queries through exact promotion and bucket fallback safely
- Update property coverage for the three-mode threshold contract

SUMMARY: .planning/phases/08-adaptive-high-cardinality-indexing/08-01-SUMMARY.md

* test(08-02): add failing tests for adaptive serialization

- Covers adaptive config round-trip expectations
- Covers adaptive path metadata and bitmap round-trip
- Locks oversized adaptive bucket section rejection

* feat(08-02): persist adaptive serialization metadata

- Bump the wire format to version 5 for adaptive sections
- Serialize adaptive config knobs and per-path adaptive indexes explicitly
- Reject malformed adaptive path, term, and bucket counts on decode

* fix(08-03): restore benchmark baseline from stray 08-02 tests

- remove incomplete adaptive serialization RED tests from serialize_security_test.go
- realign verification to the requested 424aceb content boundary

* test(08-03): add skewed adaptive benchmark fixture

- add deterministic skewed head-tail fixture generation for $.user_id
- define explicit exact, bloom-only, and adaptive-hybrid benchmark configs
- add BenchmarkAdaptiveHighCardinality with mode= and shape= naming

* test(08-02): add failing tests for CLI adaptive info output

- Locks exact, bloom-only, and adaptive-hybrid mode labels
- Verifies adaptive summary counters in formatter output
- Uses local-only index fixtures without S3 dependencies

* feat(08-02): expose adaptive mode in CLI info output

- Extract index info rendering to a reusable writer helper
- Report exact, bloom-only, and adaptive-hybrid modes per path
- Include compact adaptive promoted, bucket, threshold, and cap counters

* docs(08-02): document adaptive high-cardinality behavior

- Describe exact, adaptive-hybrid, and bloom-only string path modes
- Document adaptive config defaults and tuning knobs by name
- Replace bloom-only-only language in comparisons and examples

* fix(08-03): remove stray 08-02 cli red tests

- drop unfinished adaptive CLI info tests from cmd/gin-index/main_test.go
- keep 08-03 verification scoped to the requested 424aceb baseline

* test(08-03): report adaptive pruning and size metrics

- report candidate_rgs and encoded_bytes for hot and tail probes
- assert adaptive hot probe prunes better than bloom-only before timing runs
- reuse shared skewed fixtures across exact, bloom-only, and adaptive modes

* docs(08-02): complete adaptive serialization and CLI info plan

- Tasks completed: 3/3
- SUMMARY: .planning/phases/08-adaptive-high-cardinality-indexing/08-02-SUMMARY.md
- Shared state artifacts intentionally left untouched for orchestrator ownership

* docs(08-03): complete adaptive benchmark evidence plan

- record HCARD-05 benchmark metrics and verification evidence
- document blocker fixes required to restore the requested 424aceb baseline
- leave STATE.md and ROADMAP.md untouched per execution request

* fix(08-02): restore adaptive serialization and cli tests after wave overlap

* fix(08): close review regressions in builder, cli, and decode

* fix(08): harden decode limits and align CLI/docs

* docs(phase-08): complete phase execution

* docs(phase-08): evolve PROJECT.md after phase completion

* docs(phase-08): add security threat verification

* fix: harden adaptive path mode invariants

* fix: address post-fix review hardening

* fix: tighten round-3 adaptive hardening

* docs(phase-08): clean up comments and add godoc for adaptive surface

- expand Version constant comment with migration note and v4/v5/v6 history
- dedupe PathEntry.AdaptivePromotedTerms/AdaptiveBucketCount comments into one
  grouped godoc that documents the "derived, never persisted" contract
- justify the three serialize.go max-constants (maxDecodedIndexSize,
  maxHeaderRowGroups, maxHeaderDocs, maxAdaptivePaths) with the reasoning
  future maintainers need to tune them
- add field godoc on AdaptiveStringIndex explaining the sorted-terms +
  lossy-bucket contract
- document WithAdaptive* option semantics including the disable-via-zero
  conventions on PromotedTermCap and BucketCount
- rewrite the *WithIO shim comment to match the actual asymmetric pattern
  (only build/extract have wrappers; query/info are *WithIO-only)
- simplify adaptivePathSummary: the fallback to PathEntry counters is
  unreachable through both Decode and Finalize, so treat missing section as
  invariant-violation-return-zero instead of silently reading stale data
- add cmd/gin-index/ errcheck exclusion in .golangci.yml; the CLI uses
  fmt.Fprintf to stdout/stderr for user-facing output where ignoring write
  errors is standard practice

* feat(phase-08): add PathMode.IsValid and decode-side enforcement

PathMode was validated only structurally (through validatePathReferences)
after the whole directory was read. Any byte >= 3 would flow through as
an "unknown" mode until the downstream switch caught it, which risked
diverging behavior if future code paths forgot the guard.

- PathMode.IsValid() reports whether the value is one of the declared
  constants
- readPathDirectory now rejects unknown mode bytes immediately with
  ErrInvalidFormat, matching the fail-closed posture of the other
  per-field bounds checks
- cover the new helper with a table test and the decoder with a
  targeted corruption test that flips a single mode byte in a valid
  encoded payload to 99

* feat(phase-08): export SetAdaptiveInvariantLogger for library consumers

Adaptive invariant violations (path flagged PathModeAdaptiveHybrid with no
matching section) were always logged to log.Default() through a package-
private variable. Host processes couldn't redirect into structured logging
pipelines without poking internals, and concurrent writes to the variable
weren't safe.

- SetAdaptiveInvariantLogger swaps the logger under sync.RWMutex; nil
  silences violations entirely for embedders that prefer not to have any
  stderr chatter
- currentAdaptiveInvariantLogger reads the current logger with the read
  lock so Evaluate paths don't serialize on logger access
- migrate the existing TestAdaptiveInvariantViolationLogs to the exported
  API and add TestSetAdaptiveInvariantLoggerNilSilences covering the nil
  case

* feat(phase-08): poison builder after partial merge failure

mergeStagedPaths mutates pathData, bloom, presentRGs, and nullRGs path-by-path
in a sorted loop. A mid-loop failure leaves earlier paths merged for the
current document while later ones are untouched - subsequent AddDocument
calls would compound the corruption.

validateStagedPaths' preview catches every naturally-occurring trigger today
(mixed numeric promotion), so this is defensive: if a future code path
lands a merge-time failure that bypasses the preview, the builder now
flags poisonErr and refuses further AddDocument calls with a wrapped error
pointing at the original failure. Finalize's contract is unchanged - callers
who honor AddDocument's error never reach a corrupt Finalize regardless.

* test(phase-08): cover remaining validate arm and non-EQ adaptive probes

- add the "adaptive path missing adaptive section" row to
  TestValidatePathReferencesRejectsModeMismatches. All 6 arms of the mode
  switch at gin.go:587-608 now have direct unit coverage instead of the
  previous 5-arm subset + 1 transitive EncodeWithLevel assertion.
- expand BenchmarkAdaptiveHighCardinality beyond EQ: add NE, IN, NIN, and
  Contains probes alongside the existing hot/tail EQ probes. Each probe
  ReportMetric's candidate_rgs and encoded_bytes per mode so regressions in
  adaptive pruning show up as metric shifts in bench diffs. The hot-EQ
  soundness guard that asserts adaptive < bloom-only is preserved
  unchanged; no new strict assertions are layered over the new probes
  because different operators have legitimately different pruning
  semantics across modes.

* test(phase-08): add cross-mode soundness property

For the same fixture, EQ results must widen monotonically:
classic ⊆ adaptive ⊆ bloom-only. A violation means an adaptive-mode index
is strictly more aggressive than the exact index - i.e. it drops row
groups that really do match, which is unsound and silently loses data.

The existing per-mode no-false-negatives property catches missed ground-
truth matches against the fixture, but not the case where adaptive's
bucket layer under-reports compared to its own exact-index peer. 1000
generated cases over the existing cardinality threshold generator.

* refactor(cli): drop outer var err error shadows in *WithIO helpers

Each *WithIO helper declared a function-scope var err error that only
worked because one "idx, err = ..." assignment per function happened to
be in a branch without a block-local err declaration. Future edits that
added or removed err := inside inner blocks could silently read or write
the wrong err variable.

Restructure the 4 helpers so every err value lives in the smallest block
that uses it: the two branches that previously needed the outer err now
use "loaded, err := ...; idx = loaded" instead of "idx, err = ...". The
remaining idx, err = ... sites are inside blocks with their own err :=
and continue to work unchanged.

* style(benchmark): apply gci formatting to long probe func literals

* test: fix lint violations in test fixtures

* perf(builder): in-place merge for adaptive tail bucket bitmaps

Adds RGSet.UnionWith for in-place merges and uses it in
buildAdaptiveStringIndex to avoid one bitmap clone per non-promoted
term. On high-cardinality paths with many tail terms (the workload
adaptive mode targets) this removes O(tail) allocations from the
hot builder path. Addresses PR #19 review item #2.

* fix(serialize): produce deterministic byte output for all index sections

Only writeAdaptiveStringIndexes was using sortedPathIDs; every other
section writer iterated maps in non-deterministic Go order. Two
Encode() calls on the same in-memory index could therefore produce
different bytes, breaking content-addressable caching and making
encoded-byte test assertions fragile.

Fixed:
- writeStringIndexes, writeStringLengthIndexes, writeNumericIndexes,
  writeNullIndexes, writeTrigramIndexes, writeHyperLogLogs now use
  sortedPathIDs at the outer level
- writeTrigramIndexes now sorts its inner Trigrams map by trigram
- writeConfig now sorts transformerSpecs by path

Decode is order-tolerant so this is wire-compatible. Addresses PR #19
review item #3.

* refactor(query): tighten adaptive lookup paths and silence default logger

Three small adaptive-mode cleanups from PR #19 review:

- lookupAdaptiveStringMatch: drop the unreachable bucketID bounds
  check and nil bucket guard. NewAdaptiveStringIndex (used by both
  build and decode paths) already enforces a power-of-two bucket
  count and non-nil bucket bitmaps, and adaptiveBucketIndex masks
  with bucketCount-1, so neither guard can fire. Replace with an
  invariant comment. Addresses item #1.
- evaluateNIN: when a non-string element forces non-adaptive
  fallback, call evaluateINNonAdaptive(values) directly with the
  already-extracted slice instead of routing through evaluateIN,
  which would re-enter the adaptive path before falling back.
  Addresses item #7.
- adaptiveInvariantLogger: default to nil instead of log.Default().
  Per Go library convention, libraries should not write to stderr
  unless the consumer opts in. Existing callers can install
  log.Default() explicitly. Addresses item #6.

* docs(gin): clarify v5 history and AdaptiveBucketCount=0 disable sentinel

- Version comment now notes v5 was never released; the in-tree v5
  shape was iterated before the wire format was finalised in v6
  and v5 payloads are always rejected on decode. Addresses item #4.
- WithAdaptiveBucketCount godoc now explains the asymmetry between
  the option (rejects 0) and the validate() path (accepts 0 as the
  disable sentinel for struct-literal callers). validate() also
  carries an inline comment explaining why 0 is permitted there.
  Addresses item #5.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant