@@ -183,6 +183,33 @@ pub const ClientConfig = struct {
183183
184184Use ` std.net.TcpStream ` directly for the connection pool transport. Do not use ` std.http.Client ` (no external connection lifecycle control, API instability) or ` http.zig ` (server-only library, no outbound client API). ES's REST API is pure request/response HTTP/1.1 — framing it by hand in ` pool.zig ` is ~ 200 lines and gives the ` ConnectionPool ` full control over socket acquire/release/reuse.
185185
186+ > ** Known issue — ` std.http.Client ` vs DELETE-with-body (Zig 0.15):**
187+ >
188+ > Elasticsearch uses ` DELETE ` with a JSON body for several endpoints (clear
189+ > scroll, close PIT, delete by query). Zig's ` std.http.Client ` has a hard
190+ > ` assert(r.method.requestHasBody()) ` inside ` sendBodyUnflushed ` (Client.zig
191+ > L924), and ` requestHasBody() ` returns ` false ` for ` DELETE ` (http.zig L38).
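192+ > As a result, calling ` req.sendBodyComplete(body) ` on a DELETE request trips the assertion: a panic in safe build modes (Debug, ReleaseSafe) and undefined behavior in the others.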
193+ >
194+ > ** Workaround in ` pool.zig ` :** When ` sendRequest ` detects a body on a method
195+ > where ` requestHasBody() ` is ` false ` , it bypasses ` sendBodyComplete ` and
196+ > instead writes the HTTP request directly to the connection's writer:
197+ >
198+ > 1. Set ` req.transfer_encoding = .{ .content_length = payload.len } ` so
199+ > the body length is known when the ` content-length ` header is written.
200+ > 2. ` sendHead ` is private and cannot be called from ` pool.zig ` , so
201+ > replicate its header-writing logic using ` req.connection.?.writer() ` .
202+ > 3. Write the body bytes and flush.
203+ > 4. Call ` receiveHead ` as usual; it only reads from the same
204+ > connection.
205+ >
206+ > This affects: ` ClearScrollRequest ` (` DELETE /_search/scroll ` ),
207+ > ` PitCloseRequest ` (` DELETE /_search/point_in_time ` ), and any future
208+ > DELETE-with-body endpoint.
209+ >
210+ > Other ES clients (elasticsearch-py, go-elasticsearch, elasticsearch-java)
211+ > use HTTP libraries that accept a body on DELETE. RFC 9110 §9.3.5 does not forbid this; it only notes that such content has no generally defined semantics.
212+
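To illustrate why hand-framing sidesteps the assertion, here is a minimal sketch of writing a request with an explicit `Content-Length` regardless of method. `frameRequest` and its parameters are hypothetical names chosen for this example, not the actual `pool.zig` API, and the header set is an assumption (a real request may also need auth and encoding headers).

```zig
const std = @import("std");

/// Formats a complete HTTP/1.1 request with an explicit Content-Length,
/// independent of the method. Caller owns the returned slice.
/// NOTE: hypothetical helper for illustration; not pool.zig's real API.
fn frameRequest(
    allocator: std.mem.Allocator,
    method: []const u8,
    path: []const u8,
    host: []const u8,
    body: []const u8,
) ![]u8 {
    return std.fmt.allocPrint(
        allocator,
        "{s} {s} HTTP/1.1\r\n" ++
            "Host: {s}\r\n" ++
            "Content-Type: application/json\r\n" ++
            "Content-Length: {d}\r\n" ++
            "\r\n" ++
            "{s}",
        .{ method, path, host, body.len, body },
    );
}
```

Because message framing does not depend on the method, the same helper covers `DELETE /_search/scroll` and `POST /_search` alike; the bytes it returns can be written to the socket in one call.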
186213### 4. Bulk Indexer
187214
188215Critical for Snowstorm's RF2 import pipeline. A dedicated ` BulkIndexer `
@@ -474,12 +501,91 @@ thresholds, and per-action failure reporting. Can drive RF2 import workloads at
474501### M6 — Scroll + PIT (weeks 14–15)
475502** Test backend: Elasticsearch (OpenSearch)**
476503
477- - [ ] ` ScrollIterator ` — page through results, auto-clear on ` deinit `
478- - [ ] Point-in-time: ` openPit ` , ` closePit ` , search with ` search_after `
479- - [ ] ` PitIterator ` — preferred over scroll for read-heavy queries
480- - [ ] Memory cap: never buffer more than one page
481-
482- Deliverable: Can page through 500K+ concept documents.
504+ #### Phase 1 — Scroll API Types (` src/api/scroll.zig ` )
505+ - [x] ` ScrollSearchRequest ` struct — wraps a ` SearchRequest ` with ` scroll ` keep-alive duration (e.g. ` "1m" ` )
506+ - [x] ` ScrollSearchRequest.httpMethod() ` → ` "POST" `
507+ - [x] ` ScrollSearchRequest.httpPath(allocator) ` → ` "/<index>/_search?scroll=<duration>" `
508+ - [x] ` ScrollSearchRequest.httpBody(allocator) ` → delegates to inner ` SearchRequest.httpBody() `
509+ - [x] ` ScrollNextRequest ` struct — holds ` scroll_id: []const u8 ` and ` scroll: []const u8 ` (keep-alive)
510+ - [x] ` ScrollNextRequest.httpMethod() ` → ` "POST" `
511+ - [x] ` ScrollNextRequest.httpPath(allocator) ` → ` "/_search/scroll" `
512+ - [x] ` ScrollNextRequest.httpBody(allocator) ` → ` {"scroll": "<duration>", "scroll_id": "<id>"} `
513+ - [x] ` ClearScrollRequest ` struct — holds ` scroll_id: []const u8 `
514+ - [x] ` ClearScrollRequest.httpMethod() ` → ` "DELETE" `
515+ - [x] ` ClearScrollRequest.httpPath(allocator) ` → ` "/_search/scroll" `
516+ - [x] ` ClearScrollRequest.httpBody(allocator) ` → ` {"scroll_id": "<id>"} `
517+ - [x] ` ScrollSearchResponse(T) ` — extends ` SearchResponse(T) ` with ` _scroll_id: ?[]const u8 `
518+ - [x] Unit tests: verify HTTP method, path (with scroll param), body for each request type
519+
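The checklist above describes each request type as a struct exposing `httpMethod`, `httpPath`, and `httpBody`. A minimal sketch of that shape for `ClearScrollRequest` follows. The return types and the choice to duplicate the constant path are assumptions made for this example, and the body builder assumes the scroll ID needs no JSON escaping.

```zig
const std = @import("std");

/// Illustrative shape only; the real type lives in src/api/scroll.zig
/// and its exact signatures may differ.
const ClearScrollRequest = struct {
    scroll_id: []const u8,

    pub fn httpMethod(_: ClearScrollRequest) []const u8 {
        return "DELETE";
    }

    /// The path is constant; it is duplicated so the caller can free it
    /// the same way as paths that require formatting.
    pub fn httpPath(_: ClearScrollRequest, allocator: std.mem.Allocator) ![]u8 {
        return allocator.dupe(u8, "/_search/scroll");
    }

    /// Assumption: scroll_id is an opaque token that needs no JSON escaping.
    /// `{{` and `}}` are literal braces in Zig format strings.
    pub fn httpBody(self: ClearScrollRequest, allocator: std.mem.Allocator) ![]u8 {
        return std.fmt.allocPrint(allocator, "{{\"scroll_id\": \"{s}\"}}", .{self.scroll_id});
    }
};
```

Keeping the three accessors uniform across request types lets the transport layer treat every request the same way, which is what makes the unit tests for method, path, and body straightforward.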
520+ #### Phase 2 — ScrollIterator (` src/api/scroll.zig ` )
521+ - [x] ` ScrollIterator(T) ` struct — generic over document type ` T `
522+ - [x] Fields: ` allocator ` , ` pool: *ConnectionPool ` , ` compression: bool ` , ` scroll_duration ` , ` scroll_id ` , ` current_page ` , ` done: bool `
523+ - [x] ` ScrollIterator.init(allocator, pool, compression, index, query, opts, scroll_duration) ` — sends initial ` _search?scroll= ` request, parses first page
524+ - [x] ` ScrollIterator.next() ` → ` ?[]const Hit(T) ` — returns next page of hits, or ` null ` when exhausted
525+ - [x] On each ` next() ` : sends ` POST /_search/scroll ` with current ` scroll_id ` , parses response, updates ` scroll_id `
526+ - [x] Returns ` null ` when ` hits.hits ` is empty (no more results)
527+ - [x] ` ScrollIterator.deinit() ` — sends ` DELETE /_search/scroll ` to clear server-side scroll context, frees memory
528+ - [x] Memory cap: only one page of hits is live at a time; previous page is freed on ` next() `
529+ - [x] Error handling: transport errors propagate; on ` deinit ` clear-scroll errors are silently ignored
530+ - [x] Unit tests: request-construction tests that run without a mock server
531+
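The memory cap described above (one live page, previous page freed on `next()`) can be sketched without any network code. `FakeBackend` and `PageIterator` are hypothetical stand-ins for the server and for `ScrollIterator(T)`; unlike the real iterator, this sketch folds the initial fetch into the first `next()` call and pages over plain `u32` IDs rather than `Hit(T)`.

```zig
const std = @import("std");

/// Hypothetical stand-in for the server: hands out pages of doc IDs.
const FakeBackend = struct {
    total: usize,
    served: usize = 0,

    /// Returns an owned page of at most `size` IDs; empty when exhausted.
    fn fetch(self: *FakeBackend, allocator: std.mem.Allocator, size: usize) ![]u32 {
        const n = @min(size, self.total - self.served);
        const page = try allocator.alloc(u32, n);
        for (page, 0..) |*id, i| id.* = @intCast(self.served + i);
        self.served += n;
        return page;
    }
};

const PageIterator = struct {
    allocator: std.mem.Allocator,
    backend: *FakeBackend,
    page_size: usize,
    current_page: ?[]u32 = null,
    done: bool = false,

    /// Frees the previous page before fetching the next one, so at most
    /// one page is live at a time. Returns null once a fetch comes back empty.
    fn next(self: *PageIterator) !?[]const u32 {
        if (self.done) return null;
        self.freeCurrent();
        const page = try self.backend.fetch(self.allocator, self.page_size);
        if (page.len == 0) {
            self.allocator.free(page);
            self.done = true;
            return null;
        }
        self.current_page = page;
        return page;
    }

    fn freeCurrent(self: *PageIterator) void {
        if (self.current_page) |p| self.allocator.free(p);
        self.current_page = null;
    }

    /// A real iterator would also send the clear-scroll request here
    /// and ignore any error it returns.
    fn deinit(self: *PageIterator) void {
        self.freeCurrent();
    }
};
```

A slice returned by `next()` is only valid until the following `next()` or `deinit()` call, which is the price of the one-page cap; callers that need to keep hits must copy them.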
532+ #### Phase 3 — PIT API Types (` src/api/pit.zig ` )
533+ - [x] ` PitOpenRequest ` struct — holds ` index: []const u8 ` and ` keep_alive: []const u8 ` (e.g. ` "5m" ` )
534+ - [x] ` PitOpenRequest.httpMethod() ` → ` "POST" `
535+ - [x] ` PitOpenRequest.httpPath(allocator) ` → ` "/<index>/_search/point_in_time?keep_alive=<duration>" ` (OpenSearch-compatible)
536+ - [x] ` PitOpenRequest.httpBody(allocator) ` → ` null ` (no body needed)
537+ - [x] ` PitOpenResponse ` struct — ` pit_id: []const u8 `
538+ - [x] ` PitCloseRequest ` struct — holds ` pit_id: []const u8 `
539+ - [x] ` PitCloseRequest.httpMethod() ` → ` "DELETE" `
540+ - [x] ` PitCloseRequest.httpPath(allocator) ` → ` "/_search/point_in_time" ` (OpenSearch-compatible)
541+ - [x] ` PitCloseRequest.httpBody(allocator) ` → ` {"pit_id": "<id>"} `
542+ - [x] ` PitSearchRequest ` struct — search with PIT context: ` pit_id ` , ` keep_alive ` , ` query ` , ` size ` , ` search_after ` , ` sort `
543+ - [x] ` PitSearchRequest.httpMethod() ` → ` "POST" `
544+ - [x] ` PitSearchRequest.httpPath(allocator) ` → ` "/_search" ` (no index in path when using PIT)
545+ - [x] ` PitSearchRequest.httpBody(allocator) ` → ` {"pit": {"id": "...", "keep_alive": "..."}, "query": {...}, "size": N, "sort": [...], "search_after": [...]} `
546+ - [x] ` PitSearchResponse(T) ` — extends ` SearchResponse(T) ` with ` pit_id: ?[]const u8 ` (refreshed PIT ID)
547+ - [x] Unit tests: verify HTTP method, path, body for each request type; verify ` search_after ` + ` sort ` serialization
548+
549+ #### Phase 4 — PitIterator (` src/api/pit.zig ` )
550+ - [x] ` PitIterator(T) ` struct — generic over document type ` T ` , preferred over scroll for read-heavy queries
551+ - [x] Fields: ` allocator ` , ` pool: *ConnectionPool ` , ` compression: bool ` , ` pit_id ` , ` keep_alive ` , ` sort_fields ` , ` last_sort_values ` , ` current_page ` , ` done: bool ` , ` page_size: u32 `
552+ - [x] ` PitIterator.init(allocator, pool, compression, index, query, opts) ` — opens PIT via ` POST /<index>/_search/point_in_time ` , sends initial search, parses first page
553+ - [x] ` PitIterator.next() ` → ` ?[]const Hit(T) ` — returns next page of hits, or ` null ` when exhausted
554+ - [x] On each ` next() ` : extracts ` sort ` values from last hit of previous page, sends ` search_after ` search, updates ` pit_id ` (may be refreshed by ES)
555+ - [x] Returns ` null ` when ` hits.hits ` is empty
556+ - [x] ` PitIterator.deinit() ` — sends ` DELETE /_search/point_in_time ` to close PIT, frees memory
557+ - [x] Memory cap: only one page of hits is live at a time; previous page is freed on ` next() `
558+ - [x] Default sort: ` [{"_doc": "asc"}] ` (most efficient for full-index scans)
559+ - [x] Error handling: transport errors propagate; on ` deinit ` close-PIT errors are silently ignored
560+
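As a rough illustration of how the PIT search body changes between the first page and later pages, the sketch below builds the JSON with the default `_doc` sort and an optional `search_after` cursor. `pitSearchBody` is a hypothetical helper, the cursor is assumed to arrive as a pre-serialized JSON array taken from the last hit's `sort` values, and the `query` clause is omitted for brevity.

```zig
const std = @import("std");

/// Builds a PIT search body. `search_after_json` is a pre-serialized JSON
/// array (for example "[42]") or null for the first page.
/// NOTE: hypothetical helper; the real PitSearchRequest.httpBody also
/// serializes the query and caller-supplied sort fields.
fn pitSearchBody(
    allocator: std.mem.Allocator,
    pit_id: []const u8,
    keep_alive: []const u8,
    size: u32,
    search_after_json: ?[]const u8,
) ![]u8 {
    // `{{` and `}}` are literal braces in Zig format strings.
    const head = "{{\"pit\": {{\"id\": \"{s}\", \"keep_alive\": \"{s}\"}}, " ++
        "\"size\": {d}, \"sort\": [{{\"_doc\": \"asc\"}}]";
    if (search_after_json) |cursor| {
        return std.fmt.allocPrint(
            allocator,
            head ++ ", \"search_after\": {s}}}",
            .{ pit_id, keep_alive, size, cursor },
        );
    }
    return std.fmt.allocPrint(allocator, head ++ "}}", .{ pit_id, keep_alive, size });
}
```

On each `next()`, the iterator would pass the refreshed `pit_id` from the previous response together with the last hit's sort values, so the server resumes exactly after the final document of the previous page.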
561+ #### Phase 5 — ESClient Convenience Methods (` src/client.zig ` )
562+ - [x] ` ESClient.scrollSearch(comptime T, index, query, opts, scroll_duration) ` → ` ScrollIterator(T) ` — convenience to create and initialize a scroll iterator
563+ - [x] ` ESClient.openPit(index, keep_alive) ` → ` []u8 ` — open a point-in-time, returns owned pit_id
564+ - [x] ` ESClient.closePit(pit_id) ` → ` void ` — close a point-in-time
565+ - [x] ` ESClient.pitSearch(comptime T, index, query, page_size, keep_alive) ` → ` PitIterator(T) ` — convenience to create and initialize a PIT iterator
566+ - [x] Update ` src/request.zig ` — replaced placeholder ` ScrollRequest ` , ` ClearScrollRequest ` , ` PitOpenRequest ` , ` PitCloseRequest ` with real types from ` api/scroll.zig ` and ` api/pit.zig `
567+ - [x] Update ` src/root.zig ` — re-exported ` ScrollIterator ` , ` PitIterator ` , ` ScrollSearchRequest ` , ` PitOpenRequest ` , ` PitCloseRequest ` , ` PitSearchRequest ` , and all related types
568+
569+ #### Phase 6 — Integration Tests (` tests/integration/scroll_pit_integration.zig ` )
570+ - [x] ` integration_scroll_basic ` — index 25 docs, scroll with ` size=10 ` , collect all pages, verify 25 total hits across 3 pages
571+ - [x] ` integration_scroll_with_query ` — index 20 docs (10 active, 10 inactive), scroll with ` term(active, true) ` , verify only 10 hits
572+ - [x] ` integration_scroll_empty_result ` — scroll on empty index, verify immediate ` null ` from ` next() `
573+ - [x] ` integration_scroll_auto_clear ` — scroll through partial results, call ` deinit() ` , verify scroll context is cleared (no leaked server resources)
574+ - [x] ` integration_scroll_single_page ` — index 5 docs, scroll with ` size=10 ` , verify all returned in first page, ` next() ` returns ` null `
575+ - [x] ` integration_pit_basic ` — index 25 docs, PIT iterate with ` size=10 ` , collect all pages, verify 25 total hits
576+ - [x] ` integration_pit_with_query ` — index 20 docs, PIT iterate with query filter, verify correct subset
577+ - [x] ` integration_pit_empty_result ` — PIT iterate on empty index, verify immediate ` null `
578+ - [x] ` integration_pit_auto_close ` — iterate partially, call ` deinit() ` , verify PIT is closed
579+ - [x] ` integration_pit_open_close ` — open PIT explicitly, verify ` pit_id ` returned, close PIT, verify no error
580+ - [x] ` integration_scroll_large_dataset ` — index 500 docs, scroll with ` size=50 ` , verify all 500 retrieved across 10 pages
581+ - [x] Each test creates UUID-named index, indexes docs via ` BulkIndexer ` , refreshes, iterates, asserts, deletes index
582+ - [x] ` build.zig ` — added ` scroll_pit_integration.zig ` to ` test-integration ` step
583+
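The page counts asserted in the tests above follow from ceiling division of the hit count by the page size. A small sketch, with `expectedPages` as a hypothetical helper name:

```zig
const std = @import("std");

/// Number of non-empty pages needed to return `total_hits` documents
/// at `page_size` hits per page. Fails with an error if `page_size` is zero.
fn expectedPages(total_hits: usize, page_size: usize) !usize {
    return std.math.divCeil(usize, total_hits, page_size);
}
```

Since the iterators report exhaustion only when a response comes back with empty `hits.hits`, a full traversal presumably costs one more search request than the number of non-empty pages.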
584+ Deliverable: ` ScrollIterator ` and ` PitIterator ` page through arbitrarily large result
585+ sets without buffering more than one page in memory. Auto-clear/close on ` deinit() `
586+ prevents leaked server-side resources. Both iterators use the same ` Hit(T) ` type from
587+ ` SearchResponse ` . Integration tests verify pagination, query filtering, empty results,
588+ and resource cleanup against OpenSearch. Can page through 500K+ concept documents.
483589
484590---
485591