tellmeY18
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 112 additions & 6 deletions
diff --git a/Changelog.md b/Changelog.md
Lines changed: 131 additions & 0 deletions
@@ -183,6 +183,33 @@ pub const ClientConfig = struct {
 
 Use `std.net.Stream` directly for the connection pool transport. Do not use `std.http.Client` (no external connection lifecycle control, API instability) or `http.zig` (server-only library, no outbound client API). ES's REST API is pure request/response HTTP/1.1 — framing it by hand in `pool.zig` is ~200 lines and gives the `ConnectionPool` full control over socket acquire/release/reuse.
 
+> **Known issue — `std.http.Client` vs DELETE-with-body (Zig 0.15):**
+>
+> Elasticsearch uses `DELETE` with a JSON body for several endpoints (clear
+> scroll, close PIT, delete by query). Zig's `std.http.Client` has a hard
+> `assert(r.method.requestHasBody())` inside `sendBodyUnflushed` (Client.zig
+> L924), and `requestHasBody()` returns `false` for `DELETE` (http.zig L38).
+> As a result, calling `req.sendBodyComplete(body)` on a DELETE request trips
+> the assertion, which panics in safety-checked builds.
+>
+> **Workaround in `pool.zig`:** When `sendRequest` detects a body on a method
+> where `requestHasBody()` is `false`, it bypasses `sendBodyComplete` and
+> instead writes the HTTP request directly to the connection's writer:
+>
+> 1. Set `req.transfer_encoding = .{ .content_length = payload.len }` so
+>    the body length is recorded on the request (`sendHead` would use it
+>    to emit the `content-length` header).
+> 2. `sendHead` is private and cannot be called from `pool.zig`, so
+>    replicate its header-writing logic using `req.connection.?.writer()`.
+> 3. Write the body bytes and flush.
+> 4. `receiveHead` works normally afterwards because it just reads from
+>    the same connection.
+>
+> This affects: `ClearScrollRequest` (`DELETE /_search/scroll`),
+> `PitCloseRequest` (`DELETE /_search/point_in_time`), and any future
+> DELETE-with-body endpoint.
+>
+> Other ES clients (elasticsearch-py, go-elasticsearch, elasticsearch-java)
+> use HTTP libraries that allow a body on DELETE. RFC 9110 §9.3.5 permits
+> this, although it gives such content no generally defined semantics.
+
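The workaround above can be illustrated with a minimal sketch of hand-framing a DELETE request that carries a body. `frameRequest` is a hypothetical name, and the header set shown here is an assumption chosen for brevity; it is not the exact list that `pool.zig` writes. The only standard-library facts relied on are that `std.http.Method.requestHasBody()` returns `false` for `DELETE` and that `std.fmt.bufPrint` formats into a caller-supplied buffer.

```zig
const std = @import("std");

/// Hypothetical helper: formats a complete HTTP/1.1 request (head plus body)
/// into `buf` and returns the written slice. The headers are a minimal,
/// assumed set, not the project's actual header list.
fn frameRequest(
    buf: []u8,
    method: std.http.Method,
    host: []const u8,
    path: []const u8,
    body: []const u8,
) ![]u8 {
    return std.fmt.bufPrint(
        buf,
        "{s} {s} HTTP/1.1\r\n" ++
            "host: {s}\r\n" ++
            "content-type: application/json\r\n" ++
            "content-length: {d}\r\n" ++
            "\r\n" ++
            "{s}",
        .{ @tagName(method), path, host, body.len, body },
    );
}

/// Decides which send path a request needs: the regular body API when the
/// method officially carries a body, hand-framing when it does not.
fn needsManualFraming(method: std.http.Method, body: ?[]const u8) bool {
    return body != null and !method.requestHasBody();
}
```

Under these assumptions, `needsManualFraming(.DELETE, body)` is true for clear-scroll and close-PIT, while a `POST` with a body continues to use the regular send path.
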
 ### 4. Bulk Indexer
 
 Critical for Snowstorm's RF2 import pipeline. A dedicated `BulkIndexer`
@@ -474,12 +501,91 @@ thresholds, and per-action failure reporting. Can drive RF2 import workloads at
 ### M6 — Scroll + PIT (weeks 14–15)
 **Test backend: Elasticsearch (OpenSearch)**
 
-- [ ] `ScrollIterator` — page through results, auto-clear on `deinit`
-- [ ] Point-in-time: `openPit`, `closePit`, search with `search_after`
-- [ ] `PitIterator` — preferred over scroll for read-heavy queries
-- [ ] Memory cap: never buffer more than one page
-
-Deliverable: Can page through 500K+ concept documents.
+#### Phase 1 — Scroll API Types (`src/api/scroll.zig`)
+- [x] `ScrollSearchRequest` struct — wraps a `SearchRequest` with `scroll` keep-alive duration (e.g. `"1m"`)
+- [x] `ScrollSearchRequest.httpMethod()` → `"POST"`
+- [x] `ScrollSearchRequest.httpPath(allocator)` → `"/<index>/_search?scroll=<duration>"`
+- [x] `ScrollSearchRequest.httpBody(allocator)` → delegates to inner `SearchRequest.httpBody()`
+- [x] `ScrollNextRequest` struct — holds `scroll_id: []const u8` and `scroll: []const u8` (keep-alive)
+- [x] `ScrollNextRequest.httpMethod()` → `"POST"`
+- [x] `ScrollNextRequest.httpPath(allocator)` → `"/_search/scroll"`
+- [x] `ScrollNextRequest.httpBody(allocator)` → `{"scroll": "<duration>", "scroll_id": "<id>"}`
+- [x] `ClearScrollRequest` struct — holds `scroll_id: []const u8`
+- [x] `ClearScrollRequest.httpMethod()` → `"DELETE"`
+- [x] `ClearScrollRequest.httpPath(allocator)` → `"/_search/scroll"`
+- [x] `ClearScrollRequest.httpBody(allocator)` → `{"scroll_id": "<id>"}`
+- [x] `ScrollSearchResponse(T)` — extends `SearchResponse(T)` with `_scroll_id: ?[]const u8`
+- [x] Unit tests: verify HTTP method, path (with scroll param), body for each request type
+
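The request types in Phase 1 all expose the same three methods. The sketch below shows what `ScrollNextRequest` might look like under that contract. The field and method names come from the checklist; the method bodies are assumptions, and the JSON is built with `std.fmt.allocPrint` on the assumption that neither field needs escaping.

```zig
const std = @import("std");

/// Sketch of `ScrollNextRequest`. Names follow the checklist above;
/// the implementation details are assumed, not taken from the project.
const ScrollNextRequest = struct {
    scroll_id: []const u8,
    scroll: []const u8, // keep-alive duration, e.g. "1m"

    pub fn httpMethod(_: ScrollNextRequest) []const u8 {
        return "POST";
    }

    /// Caller owns the returned slice.
    pub fn httpPath(_: ScrollNextRequest, allocator: std.mem.Allocator) ![]u8 {
        return allocator.dupe(u8, "/_search/scroll");
    }

    /// Caller owns the returned slice.
    /// Assumes `scroll` and `scroll_id` contain no characters that need JSON escaping.
    pub fn httpBody(self: ScrollNextRequest, allocator: std.mem.Allocator) ![]u8 {
        return std.fmt.allocPrint(
            allocator,
            "{{\"scroll\":\"{s}\",\"scroll_id\":\"{s}\"}}",
            .{ self.scroll, self.scroll_id },
        );
    }
};
```

A caller would free both the path and the body with the same allocator once the request has been written to the connection.
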
+#### Phase 2 — ScrollIterator (`src/api/scroll.zig`)
+- [x] `ScrollIterator(T)` struct — generic over document type `T`
+- [x] Fields: `allocator`, `pool: *ConnectionPool`, `compression: bool`, `scroll_duration`, `scroll_id`, `current_page`, `done: bool`
+- [x] `ScrollIterator.init(allocator, pool, compression, index, query, opts, scroll_duration)` — sends initial `_search?scroll=` request, parses first page
+- [x] `ScrollIterator.next()` → `?[]const Hit(T)` — returns next page of hits, or `null` when exhausted
+- [x] On each `next()`: sends `POST /_search/scroll` with current `scroll_id`, parses response, updates `scroll_id`
+- [x] Returns `null` when `hits.hits` is empty (no more results)
+- [x] `ScrollIterator.deinit()` — sends `DELETE /_search/scroll` to clear server-side scroll context, frees memory
+- [x] Memory cap: only one page of hits is live at a time; previous page is freed on `next()`
+- [x] Error handling: transport errors propagate; on `deinit` clear-scroll errors are silently ignored
+- [x] Unit tests: request-construction tests that need no mocks
+
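The memory cap in Phase 2 (free the previous page on each `next()`, return `null` on an empty page) can be sketched without any network code. `FakeSource` and `PageIterator` are hypothetical stand-ins: the real `ScrollIterator(T)` talks to a `ConnectionPool` and parses `Hit(T)` values, and fetches its first page in `init`, whereas this simplified version only demonstrates the ownership pattern on plain `u32` pages.

```zig
const std = @import("std");

/// Hypothetical stand-in for the server: hands out `remaining` items
/// in pages of at most `page_size`.
const FakeSource = struct {
    remaining: usize,
    page_size: usize,

    fn fetch(self: *FakeSource, allocator: std.mem.Allocator) ![]u32 {
        const n = @min(self.remaining, self.page_size);
        const page = try allocator.alloc(u32, n);
        for (page, 0..) |*slot, i| slot.* = @intCast(i);
        self.remaining -= n;
        return page;
    }
};

/// Simplified iterator that keeps at most one page allocated.
const PageIterator = struct {
    allocator: std.mem.Allocator,
    source: *FakeSource,
    current_page: ?[]const u32 = null,
    done: bool = false,

    /// Returns the next page, or null when exhausted.
    /// The returned slice is valid until the next call to `next` or `deinit`.
    fn next(self: *PageIterator) !?[]const u32 {
        if (self.done) return null;
        self.freeCurrent(); // release the previous page before fetching
        const page = try self.source.fetch(self.allocator);
        if (page.len == 0) {
            self.allocator.free(page);
            self.done = true;
            return null;
        }
        self.current_page = page;
        return self.current_page;
    }

    fn freeCurrent(self: *PageIterator) void {
        if (self.current_page) |p| self.allocator.free(p);
        self.current_page = null;
    }

    fn deinit(self: *PageIterator) void {
        self.freeCurrent();
    }
};
```

With 25 items and a page size of 10, this yields pages of 10, 10, and 5 items before returning `null`, which mirrors the shape of the `integration_scroll_basic` test.
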
+#### Phase 3 — PIT API Types (`src/api/pit.zig`)
+- [x] `PitOpenRequest` struct — holds `index: []const u8` and `keep_alive: []const u8` (e.g. `"5m"`)
+- [x] `PitOpenRequest.httpMethod()` → `"POST"`
+- [x] `PitOpenRequest.httpPath(allocator)` → `"/<index>/_search/point_in_time?keep_alive=<duration>"` (OpenSearch-compatible)
+- [x] `PitOpenRequest.httpBody(allocator)` → `null` (no body needed)
+- [x] `PitOpenResponse` struct — `pit_id: []const u8`
+- [x] `PitCloseRequest` struct — holds `pit_id: []const u8`
+- [x] `PitCloseRequest.httpMethod()` → `"DELETE"`
+- [x] `PitCloseRequest.httpPath(allocator)` → `"/_search/point_in_time"` (OpenSearch-compatible)
+- [x] `PitCloseRequest.httpBody(allocator)` → `{"pit_id": "<id>"}`
+- [x] `PitSearchRequest` struct — search with PIT context: `pit_id`, `keep_alive`, `query`, `size`, `search_after`, `sort`
+- [x] `PitSearchRequest.httpMethod()` → `"POST"`
+- [x] `PitSearchRequest.httpPath(allocator)` → `"/_search"` (no index in path when using PIT)
+- [x] `PitSearchRequest.httpBody(allocator)` → `{"pit": {"id": "...", "keep_alive": "..."}, "query": {...}, "size": N, "sort": [...], "search_after": [...]}`
+- [x] `PitSearchResponse(T)` — extends `SearchResponse(T)` with `pit_id: ?[]const u8` (refreshed PIT ID)
+- [x] Unit tests: verify HTTP method, path, body for each request type; verify `search_after` + `sort` serialization
+
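The `PitSearchRequest` body shape listed in Phase 3 can be sketched as follows. `PitSearchBody` and `render` are hypothetical names, and the sketch assumes that `query_json`, `sort_json`, and `search_after_json` are already-serialized JSON fragments; the real struct presumably holds typed values and serializes them itself. `search_after` is emitted only when a previous page supplied sort values.

```zig
const std = @import("std");

/// Hypothetical body builder for a PIT search. All `*_json` fields are
/// assumed to be valid, pre-serialized JSON; no escaping is performed.
const PitSearchBody = struct {
    pit_id: []const u8,
    keep_alive: []const u8,
    query_json: []const u8,
    size: u32,
    sort_json: []const u8 = "[{\"_doc\":\"asc\"}]", // default sort from the checklist
    search_after_json: ?[]const u8 = null, // null on the first page

    /// Caller owns the returned slice.
    pub fn render(self: PitSearchBody, allocator: std.mem.Allocator) ![]u8 {
        // Everything up to and including "sort"; the closing brace is added below.
        const head = try std.fmt.allocPrint(
            allocator,
            "{{\"pit\":{{\"id\":\"{s}\",\"keep_alive\":\"{s}\"}},\"query\":{s},\"size\":{d},\"sort\":{s}",
            .{ self.pit_id, self.keep_alive, self.query_json, self.size, self.sort_json },
        );
        defer allocator.free(head);

        if (self.search_after_json) |sa| {
            return std.fmt.allocPrint(allocator, "{s},\"search_after\":{s}}}", .{ head, sa });
        }
        return std.fmt.allocPrint(allocator, "{s}}}", .{head});
    }
};
```

For the second and later pages, an iterator would set `search_after_json` to the `sort` array of the last hit on the previous page before rendering.
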
+#### Phase 4 — PitIterator (`src/api/pit.zig`)
+- [x] `PitIterator(T)` struct — generic over document type `T`, preferred over scroll for read-heavy queries
+- [x] Fields: `allocator`, `pool: *ConnectionPool`, `compression: bool`, `pit_id`, `keep_alive`, `sort_fields`, `last_sort_values`, `current_page`, `done: bool`, `page_size: u32`
+- [x] `PitIterator.init(allocator, pool, compression, index, query, opts)` — opens PIT via `POST /<index>/_search/point_in_time`, sends initial search, parses first page
+- [x] `PitIterator.next()` → `?[]const Hit(T)` — returns next page of hits, or `null` when exhausted
+- [x] On each `next()`: extracts `sort` values from last hit of previous page, sends `search_after` search, updates `pit_id` (may be refreshed by ES)
+- [x] Returns `null` when `hits.hits` is empty
+- [x] `PitIterator.deinit()` — sends `DELETE /_search/point_in_time` to close PIT, frees memory
+- [x] Memory cap: only one page of hits is live at a time; previous page is freed on `next()`
+- [x] Default sort: `[{"_doc": "asc"}]` (most efficient for full-index scans)
+- [x] Error handling: transport errors propagate; on `deinit` close-PIT errors are silently ignored
+
+#### Phase 5 — ESClient Convenience Methods (`src/client.zig`)
+- [x] `ESClient.scrollSearch(comptime T, index, query, opts, scroll_duration)` → `ScrollIterator(T)` — convenience to create and initialize a scroll iterator
+- [x] `ESClient.openPit(index, keep_alive)` → `[]u8` — open a point-in-time, returns owned pit_id
+- [x] `ESClient.closePit(pit_id)` → `void` — close a point-in-time
+- [x] `ESClient.pitSearch(comptime T, index, query, page_size, keep_alive)` → `PitIterator(T)` — convenience to create and initialize a PIT iterator
+- [x] Update `src/request.zig` — replaced placeholder `ScrollRequest`, `ClearScrollRequest`, `PitOpenRequest`, `PitCloseRequest` with real types from `api/scroll.zig` and `api/pit.zig`
+- [x] Update `src/root.zig` — re-exported `ScrollIterator`, `PitIterator`, `ScrollSearchRequest`, `PitOpenRequest`, `PitCloseRequest`, `PitSearchRequest`, and all related types
+
+#### Phase 6 — Integration Tests (`tests/integration/scroll_pit_integration.zig`)
+- [x] `integration_scroll_basic` — index 25 docs, scroll with `size=10`, collect all pages, verify 25 total hits across 3 pages
+- [x] `integration_scroll_with_query` — index 20 docs (10 active, 10 inactive), scroll with `term(active, true)`, verify only 10 hits
+- [x] `integration_scroll_empty_result` — scroll on empty index, verify immediate `null` from `next()`
+- [x] `integration_scroll_auto_clear` — scroll through partial results, call `deinit()`, verify scroll context is cleared (no leaked server resources)
+- [x] `integration_scroll_single_page` — index 5 docs, scroll with `size=10`, verify all returned in first page, `next()` returns `null`
+- [x] `integration_pit_basic` — index 25 docs, PIT iterate with `size=10`, collect all pages, verify 25 total hits
+- [x] `integration_pit_with_query` — index 20 docs, PIT iterate with query filter, verify correct subset
+- [x] `integration_pit_empty_result` — PIT iterate on empty index, verify immediate `null`
+- [x] `integration_pit_auto_close` — iterate partially, call `deinit()`, verify PIT is closed
+- [x] `integration_pit_open_close` — open PIT explicitly, verify `pit_id` returned, close PIT, verify no error
+- [x] `integration_scroll_large_dataset` — index 500 docs, scroll with `size=50`, verify all 500 retrieved across 10 pages
+- [x] Each test creates UUID-named index, indexes docs via `BulkIndexer`, refreshes, iterates, asserts, deletes index
+- [x] `build.zig` — added `scroll_pit_integration.zig` to `test-integration` step
+
+Deliverable: `ScrollIterator` and `PitIterator` page through arbitrarily large result
+sets without buffering more than one page in memory. Auto-clear/close on `deinit()`
+prevents leaked server-side resources. Both iterators use the same `Hit(T)` type from
+`SearchResponse`. Integration tests verify pagination, query filtering, empty results,
+and resource cleanup against OpenSearch. Can page through 500K+ concept documents.
 
 ---
 
 
@@ -7,6 +7,137 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Milestone M6 — Scroll + PIT ✅
+
+**Status: Complete**
+**Test backend: Elasticsearch (OpenSearch) on port 9200**
+
+#### Added
+
+- **`src/api/scroll.zig`** — Scroll API types and iterator for paging through large result sets.
+  - `ScrollSearchRequest` — initial scroll search (`POST /<index>/_search?scroll=<duration>`).
+    Builds the same JSON body as `SearchRequest` (query, size, from, `_source`, aggs).
+  - `ScrollNextRequest` — fetch next page (`POST /_search/scroll`).
+    Body: `{"scroll":"<duration>","scroll_id":"<id>"}`.
+  - `ClearScrollRequest` — clear scroll context (`DELETE /_search/scroll`).
+    Body: `{"scroll_id":"<id>"}`.
+  - `ScrollSearchResponse(T)` — generic response with `_scroll_id`, `hits: HitsEnvelope(T)`, `took`.
+  - `ScrollIterator(T)` — pages through results using a two-page memory strategy:
+    - `init()` sends the initial scroll search and parses the first page.
+    - `next()` returns the current page's hits (valid until next call), then prefetches the next page.
+    - `deinit()` sends `DELETE /_search/scroll` (errors silently ignored) and frees all memory.
+    - Uses `previous_page` / `current_page` to keep returned hits alive until the next `next()` call.
+    - `scroll_id` is an owned copy — survives parsed response arena frees.
+  - 14 unit tests: request method/path/body serialization, response deserialization.
+
+- **`src/api/pit.zig`** — Point-in-Time (PIT) API types and iterator.
+  - `PitOpenRequest` — open a PIT (`POST /<index>/_search/point_in_time?keep_alive=<duration>`), no body.
+  - `PitOpenResponse` — parses `{"pit_id": "..."}` from the open response.
+  - `PitCloseRequest` — close a PIT (`DELETE /_search/point_in_time`).
+    Body: `{"pit_id":"<id>"}`.
+  - `PitSearchRequest` — search with PIT context (`POST /_search`, no index in path).
+    Body includes `pit`, `query`, `size`, `sort`, `search_after`.
+    Defaults to `[{"_doc":"asc"}]` sort when none specified.
+  - `SortField` — sort specification with `field` and `order` strings.
+  - `PitHit(T)` — like `Hit(T)` but with an additional `sort: ?[]const std.json.Value` field
+    for `search_after` pagination.
+  - `PitHitsEnvelope(T)` — hits envelope with `total`, `hits`, `max_score`.
+  - `PitSearchResponse(T)` — response with `pit_id`, `hits`, `took`.
+  - `PitIterator(T)` — pages through results using PIT + `search_after`:
+    - `init()` opens a PIT, sends the initial search, stores the first page.
+    - `next()` returns the current page's hits and prefetches the next page; returns `null` when exhausted.
+    - `deinit()` frees all memory and sends a best-effort `DELETE` to close the server-side PIT.
+    - PIT ID is updated if the server returns a refreshed one.
+    - Only one page of hits is live at a time (previous page freed on next `next()` call).
+  - 14 unit tests: request method/path/body serialization, response deserialization.
+
+- **`src/pool.zig`** — Added `sendBodyForMethodWithoutBody` helper for DELETE-with-body.
+  Zig 0.15's `std.http.Client` hard-asserts in `sendBodyUnflushed` that the HTTP method
+  supports a body (`requestHasBody()` returns `false` for DELETE). Elasticsearch legitimately
+  requires DELETE with a JSON body for clear-scroll, close-PIT, and delete-by-query.
+  The helper replicates the essential parts of the private `sendHead` function, writing the
+  request line, headers (Host, Connection, Accept-Encoding, Content-Length, extra headers),
+  body, and flushing directly to the connection's buffered writer. After this call the
+  connection is in the correct state for `receiveHead`.
+
+- **`src/client.zig`** — Added 4 convenience methods on `ESClient`:
+  - `scrollSearch(comptime T, index, query, opts, scroll_duration)` → `ScrollIterator(T)` —
+    creates and initializes a scroll iterator.
+  - `openPit(index, keep_alive)` → `[]u8` — opens a PIT, returns the `pit_id` as an owned string.
+  - `closePit(pit_id)` → `void` — closes a PIT via DELETE.
+  - `pitSearch(comptime T, index, query, page_size, keep_alive)` → `PitIterator(T)` —
+    creates and initializes a PIT iterator.
+
+- **`src/request.zig`** — Replaced 4 placeholder structs with real types:
+  - `ScrollRequest` → `scroll_api.ScrollSearchRequest`
+  - `ClearScrollRequest` → `scroll_api.ClearScrollRequest`
+  - `PitOpenRequest` → `pit_api.PitOpenRequest`
+  - `PitCloseRequest` → `pit_api.PitCloseRequest`
+
+- **`src/root.zig`** — Re-exports for all new public types:
+  `ScrollSearchRequest`, `ScrollNextRequest`, `ClearScrollRequest`, `ScrollSearchResponse`,
+  `ScrollIterator`, `PitOpenRequest`, `PitOpenResponse`, `PitCloseRequest`, `PitSearchRequest`,
+  `SortField`, `PitSearchResponse`, `PitHit`, `PitIterator`.
+
+- **`tests/integration/scroll_pit_integration.zig`** — M6 integration tests (11 tests):
+  - `integration_scroll_basic` — 25 docs, page size 10, expects 3 pages totaling 25 hits.
+  - `integration_scroll_with_query` — 20 docs with alternating `active` values; a term query filters to 10.
+  - `integration_scroll_empty_result` — empty index, `next()` returns null immediately.
+  - `integration_scroll_single_page` — 5 docs, page size 10, all fit in one page.
+  - `integration_scroll_large_dataset` — 500 docs, page size 50, expects 10 pages.
+  - `integration_scroll_auto_clear` — partial consumption then `deinit()` clears scroll context.
+  - `integration_pit_open_close` — explicit open/close PIT lifecycle.
+  - `integration_pit_basic` — 25 docs, page size 10, collects all via PIT iterator.
+  - `integration_pit_with_query` — 20 docs with alternating `active` values; a term query filters to 10.
+  - `integration_pit_empty_result` — empty index, `next()` returns null immediately.
+  - `integration_pit_auto_close` — partial consumption then `deinit()` closes PIT.
+  - All tests use `ESClient` directly via the `elaztic` module.
+
+- **`build.zig`** — Added `scroll_pit_integration.zig` to `test-integration` step.
+
+- **`CLAUDE.md`** — Documented the `std.http.Client` DELETE-with-body limitation in §3a
+  and enriched the M6 section with a detailed phase-by-phase checklist.
+
+#### M6 Checklist
+
+- [x] `ScrollSearchRequest` with `httpMethod()`, `httpPath()`, `httpBody()` — initial scroll search
+- [x] `ScrollNextRequest` — fetch next page with scroll_id
+- [x] `ClearScrollRequest` — clear scroll context (`DELETE /_search/scroll` with body)
+- [x] `ScrollSearchResponse(T)` — response with `_scroll_id` field
+- [x] `ScrollIterator(T)` — two-page memory strategy, auto-clear on `deinit()`
+- [x] `ScrollIterator` scroll_id is an owned copy — survives parsed response arena frees
+- [x] `PitOpenRequest` — open PIT (OpenSearch-compatible `/_search/point_in_time` path)
+- [x] `PitOpenResponse` — parse `pit_id` from response
+- [x] `PitCloseRequest` — close PIT (`DELETE /_search/point_in_time` with body)
+- [x] `PitSearchRequest` — search with PIT context, `search_after`, `sort`
+- [x] `PitSearchResponse(T)` — response with refreshed `pit_id`
+- [x] `PitHit(T)` — hit with `sort` values for `search_after` pagination
+- [x] `PitIterator(T)` — PIT + search_after pagination, auto-close on `deinit()`
+- [x] `SortField` struct — sort specification with field and order
+- [x] Default sort `[{"_doc":"asc"}]` for efficient full-index scans
+- [x] `sendBodyForMethodWithoutBody` in pool.zig — DELETE-with-body workaround for Zig 0.15
+- [x] `ESClient.scrollSearch()` — convenience method
+- [x] `ESClient.openPit()` / `closePit()` — explicit PIT lifecycle
+- [x] `ESClient.pitSearch()` — convenience method
+- [x] `request.zig` — replaced 4 M6 placeholder structs with real types
+- [x] `root.zig` — re-exports for all scroll/PIT public types
+- [x] Unit tests: 28 tests (14 scroll + 14 PIT) for request/response serialization
+- [x] Integration tests: 11 tests (6 scroll + 5 PIT) against OpenSearch
+- [x] `build.zig` — `scroll_pit_integration.zig` in `test-integration` step
+- [x] Memory safety: zero leaks in all tests (GPA-verified)
+
+#### Deliverable
+
+`ScrollIterator` and `PitIterator` page through arbitrarily large result sets without
+buffering more than one page in memory. Auto-clear/close on `deinit()` prevents leaked
+server-side resources. Both iterators use the same document type `T` from `SearchResponse`.
+Integration tests verify pagination (25 docs across 3 pages, 500 docs across 10 pages),
+query filtering, empty results, single-page results, and resource cleanup against OpenSearch.
+The DELETE-with-body workaround in pool.zig enables clear-scroll and close-PIT operations
+despite Zig 0.15's `std.http.Client` limitation. 153 unit tests + 33 integration tests pass.
+
+---
+
 ### Milestone M5 — Bulk Indexer ✅
 
 **Status: Complete**