Skip to content

Emit core HNSW index properties in extract summary#1

Merged
tazarov merged 3 commits into
mainfrom
codex/extract-index-properties
Mar 5, 2026
Merged

Emit core HNSW index properties in extract summary#1
tazarov merged 3 commits into
mainfrom
codex/extract-index-properties

Conversation

@tazarov

@tazarov tazarov commented Mar 4, 2026

Copy link
Copy Markdown
Contributor

Summary

  • add index_properties to extract summaries with m, ef_construction, cur_element_count, max_elements, persisted_version, and word_size_bytes
  • populate the new fields from parsed header.bin values during extraction
  • expose the updated extract response shape in Go bindings and documentation
  • add extractor tests for the new properties, including capacity/count assertions in property-based coverage

Validation

  • cargo test
  • cargo clippy --all-targets -- -D warnings
  • go test ./...

@tazarov

tazarov commented Mar 4, 2026

Copy link
Copy Markdown
Contributor Author

Addressed the review follow-ups in commit 5e1ff08:

  • changed Go to unsigned ()
  • added Go JSON deserialization coverage for /
  • made Rust JSON casing explicit with
  • narrowed Rust to (serialized from header word size bytes)

Validation rerun:

  • cargo test
  • cargo clippy --all-targets -- -D warnings
  • go test ./...

@tazarov

tazarov commented Mar 4, 2026

Copy link
Copy Markdown
Contributor Author

Addressed the review follow-ups in commit 5e1ff08:

  • changed Go WordSizeBytes to unsigned type (uint32)
  • added Go JSON deserialization coverage for ExtractResponse and index_properties
  • made Rust ExtractIndexProperties JSON casing explicit with serde rename_all snake_case
  • narrowed Rust word_size_bytes to u32 (serialized from header word size bytes)

Validation rerun:

  • cargo test
  • cargo clippy --all-targets -- -D warnings
  • go test ./...

@tazarov

tazarov commented Mar 4, 2026

Copy link
Copy Markdown
Contributor Author

@claude review

@claude

claude Bot commented Mar 4, 2026

Copy link
Copy Markdown

Claude finished @tazarov's task in 1m 55s —— View job


Code Review

  • Gather context and read changed files
  • Review Rust changes (src/extractor.rs, src/lib.rs)
  • Review Go changes (hnswtoolbox.go, hnswtoolbox_test.go)
  • Review documentation (README.md)
  • Post final review

Overall Assessment: ✅ Looks Good

The implementation is clean, correct, and focused. The index_properties contract is well-defined, types are properly aligned between Rust and Go, and test coverage is solid. A few minor observations follow.


Rust (src/extractor.rs, src/lib.rs)

ExtractIndexProperties struct (extractor.rs:36-43):

  • Field types correctly mirror PersistentHeader fields (m: u64, ef_construction: u64, persisted_version: i32)
  • word_size_bytes: header.word_size.bytes() as u32 — the as u32 cast from usize is safe here since HeaderWordSize::bytes() can only return 4 or 8 (closed enum). No truncation risk.
  • #[serde(rename_all = "snake_case")] on fields that are already snake_case is a no-op, but it makes the serialization contract explicit — good defensive practice.

cur_element_count in index_properties vs scanned in ExtractSummary (extractor.rs:147-149):

  • index_properties.cur_element_count is a header snapshot taken before scanning begins, while scanned is incremented per-record. These will always be equal for a valid index. The property tests explicitly verify this relationship (prop_assert_eq!(summary.index_properties.cur_element_count, fixture_records.len() as u64)). This is correct and intentional.

Minor: No Deserialize on ExtractIndexProperties (extractor.rs:34):

  • The struct only derives Serialize, not Deserialize. This is consistent with ExtractSummary and is correct for the producer role this struct fills. Worth noting if a future consumer (e.g., round-trip Rust tests) needs deserialization.

lib.rs: ExtractIndexProperties is properly re-exported at the crate root (lib.rs:10). ✓


Go (hnswtoolbox.go, hnswtoolbox_test.go)

Type alignment (hnswtoolbox.go:47-54):

Go field Go type Rust type Match
M uint64 u64
EfConstruction uint64 u64
CurElementCount uint64 u64
MaxElements uint64 u64
PersistedVersion int32 i32
WordSizeBytes uint32 u32

All JSON tag names match the Rust serde snake_case output. ✓

hnswtoolbox_test.go:

  • Covers JSON deserialization roundtrip for all six new fields ✓
  • Uses t.Fatalf for every assertion — this stops at the first failure rather than reporting all mismatches. Using t.Errorf for non-fatal assertions would give a fuller picture on failure, but this is a minor style point for a new-struct deserialization test.

Documentation (README.md)

  • New index_properties block is added to both the "What It Extracts" summary (README.md:23-30) and the example Response JSON (README.md:114-123) ✓
  • Field list matches the struct definition exactly ✓

Test Fixture Sanity (src/extractor.rs:941-1010)

The write_fixture_with_capacity function produces a 100-byte header (U64 word size), which is correctly detected as HeaderWordSize::U64 by PersistentHeader::from_bytes. The written values (m=4, ef_construction=100, word_size_bytes=8) are consistent with all test assertions. ✓

@tazarov tazarov merged commit dd8d5f7 into main Mar 5, 2026
3 checks passed
@tazarov tazarov deleted the codex/extract-index-properties branch March 5, 2026 07:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant