This project builds a production-grade Elasticsearch client library in Zig, designed as a prerequisite for a larger Snowstorm SNOMED CT terminology server rewrite. The client should be a first-class standalone open-source library — not a throwaway internal tool.
Design north star: Take architectural inspiration from
lambdaworks/zio-elasticsearch (typed field accessors, ADT request model,
streaming separation) but implemented idiomatically in Zig using comptime
instead of runtime reflection.
Target: Elasticsearch 7.x and 8.x. HTTP/1.1 only (no HTTP/2 needed).
Zig version: Track latest stable release (pinned via zig-overlay in flake).
This project uses Nix flakes for a fully reproducible dev environment.
The flake is at the root of the repo (flake.nix).
Entering the dev shell:
nix develop # stable Zig (default)
nix develop .#nightly # Zig nightly/master
nix develop .#ci # minimal, for automation
What the dev shell provides:
- Zig (stable or nightly depending on shell)
- ZLS (Zig Language Server)
- just task runner
- opensearch (from nixpkgs, Apache 2.0 licensed)
- es-start / es-stop / es-status helper scripts
- git, pkg-config
- Platform debugger: gdb + valgrind on Linux, lldb on macOS
Never install Zig or ZLS globally — always use nix develop. This ensures
every contributor is on the exact same toolchain version.
Building:
zig build # debug build
zig build -Doptimize=ReleaseSafe # release build
nix build # build via Nix (reproducible)
All tests (smoke and integration) run against OpenSearch, which is available
as pkgs.opensearch from nixpkgs. OpenSearch is the Apache 2.0 licensed fork
of Elasticsearch, wire-compatible with the ES 7.x REST API. It is fully
managed by Nix — no Docker required.
Starting OpenSearch:
es-start # starts OpenSearch on port 9200, data in .opensearch-data/
es-stop # stops it
es-status # check if running
Set ES_URL=http://localhost:9200 when running tests.
Tests are skipped automatically if ES_URL is not set.
Per-test index isolation: Every test creates a fresh index
with a UUID-based name and tears it down in defer. Tests never share indices.
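A minimal sketch of the skip-and-isolate pattern described above. The helper name uniqueIndexName and the "test-" prefix are hypothetical, and the suffix is 16 random bytes in lowercase hex rather than a formally versioned UUID; the sketch assumes a Zig release where std.process.getEnvVarOwned is available.

```zig
const std = @import("std");

/// Hypothetical helper: fills `buf` with "test-" followed by 32 lowercase
/// hex characters (index names must be lowercase).
fn uniqueIndexName(buf: *[37]u8) []const u8 {
    const hex = "0123456789abcdef";
    var raw: [16]u8 = undefined;
    std.crypto.random.bytes(&raw);
    @memcpy(buf[0..5], "test-");
    for (raw, 0..) |byte, i| {
        buf[5 + 2 * i] = hex[byte >> 4];
        buf[5 + 2 * i + 1] = hex[byte & 0x0f];
    }
    return buf[0..];
}

test "skips without ES_URL, otherwise uses a fresh index name" {
    const allocator = std.testing.allocator;
    const es_url = std.process.getEnvVarOwned(allocator, "ES_URL") catch |err| switch (err) {
        error.EnvironmentVariableNotFound => return error.SkipZigTest,
        else => return err,
    };
    defer allocator.free(es_url);

    var name_buf: [37]u8 = undefined;
    const index_name = uniqueIndexName(&name_buf);
    // A real test would create `index_name` here and delete it in a `defer`.
    try std.testing.expect(index_name.len == 37);
}
```

Returning error.SkipZigTest makes the test runner report the test as skipped instead of failed.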
elaztic/
├── src/
│ ├── root.zig # Public API surface, re-exports
│ ├── client.zig # ESClient, connection pool, config
│ ├── pool.zig # HTTP connection pool (keep-alive)
│ ├── request.zig # ElasticRequest tagged union
│ ├── query/
│ │ ├── builder.zig # Comptime query DSL (BoolQuery, TermQuery, etc.)
│ │ ├── field.zig # Comptime field path accessor (FieldPath(T))
│ │ └── aggregation.zig # Aggregation DSL
│ ├── api/
│ │ ├── search.zig # Search request/response types
│ │ ├── bulk.zig # Bulk indexer
│ │ ├── index.zig # Index management (create, delete, alias)
│ │ ├── scroll.zig # Scroll API
│ │ └── pit.zig # Point-in-time API
│ ├── json/
│ │ ├── serialize.zig # Comptime JSON serializer (structs → ES JSON)
│ │ └── deserialize.zig # Comptime JSON deserializer (ES responses → structs)
│ └── error.zig # Error types and ES error envelope parsing
├── tests/
│ ├── smoke/ # Against Elasticsearch (OpenSearch)
│ └── integration/ # Against Elasticsearch (OpenSearch)
├── examples/
│ ├── basic_search.zig
│ ├── bulk_index.zig
│ └── scroll_large.zig
├── bench/ # Throughput benchmarks (separate from tests)
├── flake.nix # Dev environment — always use this
├── flake.lock
├── justfile # Task runner commands
└── build.zig
Do NOT use string literals for field names in queries. Use comptime-validated field paths that fail at compile time if the field doesn't exist.
pub fn field(comptime T: type, comptime name: []const u8) FieldPath(T) {
    if (!@hasField(T, name)) {
        @compileError("Field '" ++ name ++ "' does not exist on " ++ @typeName(T));
    }
    return .{ .name = name };
}
Usage:
const Concept = struct { id: u64, active: bool, module_id: u64 };
const q = Query.bool(.{
    .must = &.{
        Query.term(field(Concept, "active"), true),
        Query.range(field(Concept, "module_id")).gte(900000000000207008),
    },
});
// field(Concept, "typo") → compile error: Field 'typo' does not exist on Concept
All operations are values of a single tagged union, dispatched by a single
execute function. This makes the API surface minimal and composable.
pub const ElasticRequest = union(enum) {
    search: SearchRequest,
    bulk: BulkRequest,
    create_index: CreateIndexRequest,
    delete_index: DeleteIndexRequest,
    get: GetRequest,
    delete: DeleteRequest,
    scroll: ScrollRequest,
    clear_scroll: ClearScrollRequest,
    pit_open: PitOpenRequest,
    pit_close: PitCloseRequest,
    put_mapping: PutMappingRequest,
    refresh: RefreshRequest,
    count: CountRequest,
};
No async/await (not stable in Zig). Fixed thread pool with persistent HTTP/1.1 keep-alive connections per ES node. This is correct for an ES client: outbound to a small cluster, not inbound fan-out.
pub const ClientConfig = struct {
    max_connections_per_node: u32 = 10,
    request_timeout_ms: u32 = 30_000,
    retry_on_failure: u32 = 3,
    retry_backoff_ms: u32 = 100,
    compression: bool = true,
};
Use std.net.Stream directly for the connection pool transport. Do not use std.http.Client (no external connection lifecycle control, API instability) or http.zig (server-only library, no outbound client API). ES's REST API is pure request/response HTTP/1.1; framing it by hand in pool.zig takes about 200 lines and gives the ConnectionPool full control over socket acquire/release/reuse.
Known issue: std.http.Client vs DELETE-with-body (Zig 0.15)
Elasticsearch uses DELETE with a JSON body for several endpoints (clear scroll, close PIT, delete by query). Zig's std.http.Client has a hard assert(r.method.requestHasBody()) inside sendBodyUnflushed (Client.zig L924), and requestHasBody() returns false for DELETE (http.zig L38). This means calling req.sendBodyComplete(body) on a DELETE request panics.
Workaround in pool.zig: When sendRequest detects a body on a method where requestHasBody() is false, it bypasses sendBodyComplete and instead writes the HTTP request directly to the connection's writer:
- Set req.transfer_encoding = .{ .content_length = payload.len } so the content-length header is emitted by sendHead.
- Calling the private sendHead indirectly is not possible; instead replicate the header-writing logic using req.connection.?.writer().
- Write the body bytes and flush.
- receiveHead works normally afterwards because it just reads from the same connection.
This affects: ClearScrollRequest (DELETE /_search/scroll), PitCloseRequest (DELETE /_search/point_in_time), and any future DELETE-with-body endpoint.
Other ES clients (elasticsearch-py, go-elasticsearch, elasticsearch-java) use HTTP libraries that allow DELETE with body per RFC 9110 §9.3.5.
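The header-writing step of the workaround can be sketched as plain string framing. This is an illustration only, not the code in pool.zig: the function name frameRequest and the fixed Content-Type header are assumptions, and a real implementation would write into the connection's writer rather than a caller-supplied buffer.

```zig
const std = @import("std");

/// Hypothetical helper: frames one HTTP/1.1 request with a Content-Length
/// body into `buf`, regardless of the method. Returns the used prefix.
fn frameRequest(
    buf: []u8,
    method: []const u8,
    path: []const u8,
    host: []const u8,
    body: []const u8,
) ![]u8 {
    return std.fmt.bufPrint(
        buf,
        "{s} {s} HTTP/1.1\r\n" ++
            "Host: {s}\r\n" ++
            "Content-Type: application/json\r\n" ++
            "Content-Length: {d}\r\n" ++
            "\r\n" ++
            "{s}",
        .{ method, path, host, body.len, body },
    );
}
```

For example, frameRequest(&buf, "DELETE", "/_search/scroll", "localhost:9200", body) yields a request whose head carries the body length, which is all the server needs to read a DELETE body.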
Critical for Snowstorm's RF2 import pipeline. A dedicated BulkIndexer
handles batching and flushing — not a thin wrapper over the bulk endpoint.
pub const BulkConfig = struct {
    max_docs: usize = 1000,
    max_bytes: usize = 5 * 1024 * 1024, // 5 MiB
    flush_interval_ms: ?u64 = null, // null = manual flush only
};
The streaming API must yield pages and never buffer a full result set in memory. Snowstorm ECL queries can return millions of concept IDs.
Zig's std.json parses to a dynamic Value tree — insufficient for typed ES
responses. Build a comptime deserializer layer on top.
Rules:
- Omit null optional fields (ES treats missing and null differently)
- Snake_case field names map 1:1 to ES field names
- SNOMED concept IDs are u64, never i32 or u32
Never panic. Return errors up the stack. ES errors have a well-defined JSON envelope — parse them into typed errors, not raw strings.
pub const ESError = error{
    ConnectionRefused,
    ConnectionTimeout,
    RequestTimeout,
    TooManyRequests,
    IndexNotFound,
    DocumentNotFound,
    VersionConflict,
    MappingConflict,
    ShardFailure,
    ClusterUnavailable,
    UnexpectedResponse,
    MalformedJson,
};
Retry on 429 and 503 with backoff. Never retry 4xx (except 429).
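The retry rule above reduces to a small predicate on the HTTP status code. The function name isRetryableStatus is hypothetical; the sketch encodes only the two status codes the rule names.

```zig
const std = @import("std");

/// Hypothetical helper: true only for the statuses that are retried
/// with backoff (429 Too Many Requests and 503 Service Unavailable).
fn isRetryableStatus(status: u16) bool {
    return switch (status) {
        429, 503 => true,
        else => false,
    };
}
```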
Test backend: Elasticsearch (OpenSearch)
- ConnectionPool: persistent TCP connections, keep-alive, round-robin
- HTTP/1.1 request serializer and response parser
- gzip body compression (std.compress.zlib)
- Retry logic with exponential backoff
- Smoke test: client.ping() against Elasticsearch on port 9200
Deliverable: client.ping() returns a cluster health response.
Test backend: Elasticsearch (OpenSearch)
- Comptime struct serializer → ES JSON
- Comptime JSON deserializer → typed structs
- SearchResponse(T): generic over _source document type
- BulkResponse: parse per-action results
- ErrorEnvelope: parse ES error JSON
- Unit tests with captured response fixtures (no network)
- Smoke test: round-trip a struct through Elasticsearch index → get
Deliverable: Typed round-trip works against Elasticsearch.
Test backend: Elasticsearch (OpenSearch)
- FieldPath struct holding comptime field name string
- field(comptime T, comptime name): validates field exists via @hasField, returns FieldPath
- Nested path support: field(Outer, "inner.value") splits on . and walks struct types
- @compileError with human-readable message on invalid field names
- Unit tests: valid field, invalid field (compile error), nested paths, u64 fields
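The nested-path walk could look roughly like the sketch below. It is an assumption-laden illustration, not the project's field.zig: the name assertFieldPath is hypothetical, it relies on the @FieldType builtin (assumed available, Zig 0.14 or later), and it does not unwrap optional or pointer fields.

```zig
const std = @import("std");

/// Hypothetical helper: fails compilation unless every dot-separated
/// segment of `path` names a field on the struct reached so far.
fn assertFieldPath(comptime T: type, comptime path: []const u8) void {
    comptime {
        // Locate the first '.' by hand; everything before it is one segment.
        var dot: usize = 0;
        while (dot < path.len and path[dot] != '.') : (dot += 1) {}
        const head = path[0..dot];
        if (!@hasField(T, head)) {
            @compileError("Field '" ++ head ++ "' does not exist on " ++ @typeName(T));
        }
        if (dot < path.len) {
            // Descend into the field's type and validate the remainder.
            assertFieldPath(@FieldType(T, head), path[dot + 1 ..]);
        }
    }
}

const Inner = struct { value: u64 };
const Outer = struct { id: u64, inner: Inner };

comptime {
    assertFieldPath(Outer, "id");
    assertFieldPath(Outer, "inner.value");
    // assertFieldPath(Outer, "inner.typo"); // would fail to compile
}
```

Because the whole check runs at compile time, a valid path costs nothing at runtime.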
- Query namespace with toJson(allocator) → []u8 on every query type
- Query.term(field_name, value) → {"term": {"field": value}}
- Query.terms(field_name, values_slice) → {"terms": {"field": [...]}} (large []u64 support)
- Query.match(field_name, text) → {"match": {"field": text}}
- Query.matchAll() → {"match_all": {}}
- Query.bool(opts) with .must, .filter, .should, .must_not (each []const Query)
- Query.range(field_name) with .gt(), .gte(), .lt(), .lte() chainable builder
- Query.exists(field_name) → {"exists": {"field": "name"}}
- Query.prefix(field_name, value) → {"prefix": {"field": value}}
- Query.ids(id_slice) → {"ids": {"values": [...]}}
- Query.nested(path, query) → {"nested": {"path": "...", "query": {...}}}
- Query.wildcard(field_name, pattern) → {"wildcard": {"field": pattern}}
- All queries serialize to std.json.Value (object tree) for composability
- Unit tests per query type: construct → serialize → diff against expected JSON string
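To make the target JSON shape concrete, here is a hedged sketch that renders a term query on a u64 field into a fixed buffer. It is not the Query API: termQueryU64 is a hypothetical stand-in, and it assumes the field name needs no JSON escaping.

```zig
const std = @import("std");

/// Hypothetical helper: renders {"term":{"<field>":<value>}} into `buf`.
/// The u64 is printed as a plain integer, so large SNOMED IDs keep full precision.
fn termQueryU64(buf: []u8, field_name: []const u8, value: u64) ![]u8 {
    return std.fmt.bufPrint(buf, "{{\"term\":{{\"{s}\":{d}}}}}", .{ field_name, value });
}
```

A unit test in the "construct → serialize → diff" style compares the returned slice against the expected JSON string.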
- Aggregation namespace with toJson(allocator) → []u8
- Aggregation.terms(name, field_name, size) → {"name": {"terms": {"field": "...", "size": N}}}
- Aggregation.valueCount(name, field_name) → {"name": {"value_count": {"field": "..."}}}
- Aggregation.topHits(name, size) → {"name": {"top_hits": {"size": N}}}
- Sub-aggregation nesting: .subAggs(child_agg) for terms → top_hits patterns
- Unit tests per aggregation type
- _source: false to exclude source entirely
- _source: ["field1", "field2"] include list
- _source: {"includes": [...], "excludes": [...]} full form
- Integrated into search request body builder
- Unit tests for each source filtering mode
- integration_term_query: index 3 docs, term query on active=true, assert hit count
- integration_terms_query: terms query with []u64 concept IDs, assert correct docs returned
- integration_bool_query: must + filter combination, verify results
- integration_range_query: range on module_id with .gte(), verify boundaries
- integration_match_query: full-text match on a term field
- integration_exists_query: filter docs with/without optional field
- integration_nested_bool: deeply nested bool with should + must_not
- integration_aggregation_terms: terms agg on module_id, verify bucket counts
- integration_source_filtering: search with _source: ["id"], verify only id returned
- Each test creates UUID-named index, indexes docs, refreshes, queries, asserts, deletes index
- build.zig: add test-integration step wiring up integration test files
Deliverable: Full query DSL. Compile-time field validation works. All query types serialize to correct ES JSON. Integration tests pass against OpenSearch.
Test backend: Elasticsearch (OpenSearch)
- CreateIndexRequest: index name, optional settings (shards, replicas), optional mappings JSON
- DeleteIndexRequest: index name
- RefreshRequest: index name (or _all)
- PutMappingRequest: index name + mapping body (JSON []u8)
- PutAliasRequest: index name + alias name
- GetAliasRequest: alias name, returns list of indices
- Each request type has a toHttpRequest() → method, path, body
- Unit tests: verify correct HTTP method, path, and body for each
- IndexDocRequest: index name, optional doc ID, document body (serialized via serialize.toJson)
- GetDocRequest: index name + doc ID, returns typed T document
- DeleteDocRequest: index name + doc ID
- GetDocResponse(T): wraps _index, _id, _version, found, _source: T
- IndexDocResponse: wraps _index, _id, _version, result ("created"/"updated")
- DeleteDocResponse: wraps _index, _id, _version, result ("deleted"/"not_found")
- Unit tests: serialize/deserialize round-trips for request/response types
- SearchRequest: index name/pattern, query (Query), optional size/from, optional source filter, optional aggs
- SearchRequest.toJsonBody(allocator) → full {"query": {...}, "size": N, ...} body
- CountRequest: index name/pattern, optional query
- CountResponse: count: u64, _shards info
- Reuse SearchResponse(T) from deserialize.zig for search results
- Unit tests: search body serialization with all optional fields
- ESClient.execute(request: ElasticRequest): dispatch tagged union to HTTP
- For each variant: compute HTTP method, path, body → call connection_pool.sendRequest
- Parse response: on 2xx → deserialize typed response; on 4xx/5xx → parse ErrorEnvelope → return ESError
- Typed return: search → SearchResponse(T), get → GetDocResponse(T), etc.
- Convenience methods on ESClient:
  - search(comptime T, index, query, opts) → SearchResponse(T)
  - getDoc(comptime T, index, id) → GetDocResponse(T)
  - indexDoc(comptime T, index, doc, opts) → IndexDocResponse
  - deleteDoc(index, id) → DeleteDocResponse
  - count(index, query) → u64
  - createIndex(index, opts) → void (or error)
  - deleteIndex(index) → void (or error)
  - refresh(index) → void
  - putMapping(index, mapping_body) → void
  - putAlias(index, alias) → void
- Replace placeholder structs with real request types from api/ modules
- ElasticRequest variants carry actual data, not empty structs
- ElasticRequest.toHttpMethod() → []const u8
- ElasticRequest.toPath(allocator) → []u8
- ElasticRequest.toBody(allocator) → ?[]u8
- integration_create_delete_index: create index with settings, verify exists, delete, verify gone
- integration_index_get_doc: index a Concept doc, get by ID, verify all fields
- integration_delete_doc: index a doc, delete it, verify 404 on get
- integration_search_with_query: index docs, search with term query via client, verify hits
- integration_count: index docs, count with/without query filter
- integration_refresh: index doc, refresh, verify searchable
- integration_put_mapping: create index, add mapping field, verify accepted
- integration_put_alias: create index, add alias, search via alias
- integration_index_without_id: index doc without explicit ID, verify auto-generated ID returned
- integration_error_index_not_found: search on non-existent index, verify IndexNotFound error
- Each test creates UUID-named index, performs operation, asserts, cleans up
- build.zig: add api_integration.zig to test-integration step
Deliverable: Complete CRUD surface. All operations go through ESClient.execute or
typed convenience methods. Error responses parsed into ESError. Integration tests
verify every operation against OpenSearch.
Test backend: Elasticsearch (OpenSearch)
- BulkConfig struct: max_docs, max_bytes, flush_interval_ms thresholds
- BulkIndexer struct: batches documents and flushes to the _bulk endpoint
- BulkIndexer.init(allocator, client, config): creates indexer tied to an ESClient
- BulkIndexer.deinit(): frees internal buffer, does NOT auto-flush (caller must flush first)
- BulkIndexer.add(index, id, doc): serialize doc to JSON, append NDJSON action+source lines
- BulkIndexer.addRaw(index, id, json_bytes): append pre-serialized JSON (no double-serialize)
- BulkIndexer.addDelete(index, id): append delete action (no source line)
- Auto-flush: add triggers flush when max_docs or max_bytes threshold is exceeded
- BulkIndexer.flush(): send buffered NDJSON to POST /_bulk, parse BulkResponse, reset buffer
- BulkIndexer.pendingCount(): number of buffered actions not yet flushed
- BulkIndexer.pendingBytes(): byte size of the buffered NDJSON payload
- Flush returns BulkResult with total, succeeded, failed, items (per-action results)
- Buffer is a single ArrayList(u8): NDJSON lines appended contiguously, no per-doc allocation
- Action line format: {"index":{"_index":"<idx>","_id":"<id>"}}\n (or {"delete":...})
- Unit tests: add docs, verify pending counts, verify NDJSON format, verify auto-flush threshold
- appendActionLine(writer, action, index, id): write the action metadata JSON line
- appendSourceLine(writer, json_bytes): write the source document JSON line + newline
- Action types: index, create, delete (update deferred to M7+)
- NDJSON must end with a trailing newline (ES requirement)
- No heap allocation per document: all writes go into the shared ArrayList(u8) buffer
- Unit tests: verify NDJSON output matches ES spec for various action types
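The action-plus-source pair for one document can be sketched as a single formatted write. This is an illustration under assumptions, not the project's appendActionLine/appendSourceLine: writeIndexAction is a hypothetical name, it writes into a fixed buffer instead of the shared ArrayList(u8), and it assumes index and id need no JSON escaping.

```zig
const std = @import("std");

/// Hypothetical helper: writes one "index" action line and its source line,
/// each terminated by '\n', into `buf`. Returns the bytes written.
fn writeIndexAction(
    buf: []u8,
    index: []const u8,
    id: []const u8,
    source_json: []const u8,
) ![]u8 {
    return std.fmt.bufPrint(
        buf,
        "{{\"index\":{{\"_index\":\"{s}\",\"_id\":\"{s}\"}}}}\n{s}\n",
        .{ index, id, source_json },
    );
}
```

Since every document ends with its own newline, a payload built by concatenating such pairs always satisfies the trailing-newline requirement.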
- ESClient.bulkIndexer(config): convenience to create a BulkIndexer bound to this client
- BulkIndexer.flush() uses ESClient.rawRequest("POST", "/_bulk", ndjson_body) internally
- Response body parsed via existing parseBulkResponse from src/api/bulk.zig
- On partial failure: BulkResult.failed > 0 but no error returned (caller inspects items)
- On transport error: propagate the error from rawRequest
- BulkResult struct: total: usize, succeeded: usize, failed: usize, took_ms: u64
- BulkResult.items: optional []BulkItemResult for per-action inspection
- BulkResult.hasFailures() → bool
- BulkResult.failedItems(): iterator/slice over only failed items
- BulkResult.deinit(): frees the underlying BulkResponse arena
- Unit tests: verify result counts, hasFailures, failedItems filtering
- integration_bulk_index_basic: bulk index 10 docs, flush, verify all created, search to confirm
- integration_bulk_auto_flush: set max_docs=5, add 7 docs, verify auto-flush at 5, manual flush for remaining 2
- integration_bulk_mixed_actions: mix index + delete actions in one bulk, verify results
- integration_bulk_large_batch: index 1000 docs in one flush, verify count matches
- integration_bulk_byte_threshold: set max_bytes low, verify auto-flush triggers on size
- integration_bulk_empty_flush: flush with no pending docs, verify no error and 0 results
- integration_bulk_partial_failure: index to a read-only index or with bad mapping, verify partial failures reported
- Each test creates UUID-named index, performs operations, asserts, cleans up
- build.zig: add bulk_integration.zig to test-integration step
- Benchmark harness: index N docs via BulkIndexer, measure wall-clock time
- Target: >50K docs/sec on localhost against OpenSearch
- Configurable: doc count, batch size, doc size
- Print throughput (docs/sec) and latency (ms per flush)
- build.zig: add bench build step
Deliverable: BulkIndexer handles batching, NDJSON serialization, auto-flush on
thresholds, and per-action failure reporting. Can drive RF2 import workloads at
50K docs/sec. Integration tests verify all flush modes against OpenSearch.
Test backend: Elasticsearch (OpenSearch)
- ScrollSearchRequest struct: wraps a SearchRequest with scroll keep-alive duration (e.g. "1m")
- ScrollSearchRequest.httpMethod() → "POST"
- ScrollSearchRequest.httpPath(allocator) → "/<index>/_search?scroll=<duration>"
- ScrollSearchRequest.httpBody(allocator) → delegates to inner SearchRequest.httpBody()
- ScrollNextRequest struct: holds scroll_id: []const u8 and scroll: []const u8 (keep-alive)
- ScrollNextRequest.httpMethod() → "POST"
- ScrollNextRequest.httpPath(allocator) → "/_search/scroll"
- ScrollNextRequest.httpBody(allocator) → {"scroll": "<duration>", "scroll_id": "<id>"}
- ClearScrollRequest struct: holds scroll_id: []const u8
- ClearScrollRequest.httpMethod() → "DELETE"
- ClearScrollRequest.httpPath(allocator) → "/_search/scroll"
- ClearScrollRequest.httpBody(allocator) → {"scroll_id": "<id>"}
- ScrollSearchResponse(T): extends SearchResponse(T) with _scroll_id: ?[]const u8
- Unit tests: verify HTTP method, path (with scroll param), body for each request type
- ScrollIterator(T) struct: generic over document type T
- Fields: allocator, pool: *ConnectionPool, compression: bool, scroll_duration, scroll_id, current_page, done: bool
- ScrollIterator.init(allocator, pool, compression, index, query, opts, scroll_duration): sends initial _search?scroll= request, parses first page
- ScrollIterator.next() → ?[]const Hit(T): returns next page of hits, or null when exhausted
- On each next(): sends POST /_search/scroll with current scroll_id, parses response, updates scroll_id
- Returns null when hits.hits is empty (no more results)
- ScrollIterator.deinit(): sends DELETE /_search/scroll to clear server-side scroll context, frees memory
- Memory cap: only one page of hits is live at a time; previous page is freed on next()
- Error handling: transport errors propagate; on deinit clear-scroll errors are silently ignored
- Unit tests: mock-free design tests for request construction
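The next() contract (a page per call, null when exhausted) can be illustrated with an in-memory stand-in. PageIterator below is hypothetical and performs no network I/O; it only mirrors the shape a consumer loop sees, with a slice taking the place of server-side scroll state.

```zig
const std = @import("std");

/// Hypothetical in-memory stand-in for a paging iterator over items of type T.
fn PageIterator(comptime T: type) type {
    return struct {
        items: []const T,
        page_size: usize,
        offset: usize = 0,

        const Self = @This();

        /// Returns the next page, or null once every item has been yielded.
        pub fn next(self: *Self) ?[]const T {
            if (self.offset >= self.items.len) return null;
            const end = @min(self.offset + self.page_size, self.items.len);
            const page = self.items[self.offset..end];
            self.offset = end;
            return page;
        }
    };
}
```

A consumer drains it with while (it.next()) |page| { ... }, touching one page at a time.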
- PitOpenRequest struct: holds index: []const u8 and keep_alive: []const u8 (e.g. "5m")
- PitOpenRequest.httpMethod() → "POST"
- PitOpenRequest.httpPath(allocator) → "/<index>/_search/point_in_time?keep_alive=<duration>" (OpenSearch-compatible)
- PitOpenRequest.httpBody(allocator) → null (no body needed)
- PitOpenResponse struct: pit_id: []const u8
- PitCloseRequest struct: holds pit_id: []const u8
- PitCloseRequest.httpMethod() → "DELETE"
- PitCloseRequest.httpPath(allocator) → "/_search/point_in_time" (OpenSearch-compatible)
- PitCloseRequest.httpBody(allocator) → {"pit_id": "<id>"}
- PitSearchRequest struct: search with PIT context (pit_id, keep_alive, query, size, search_after, sort)
- PitSearchRequest.httpMethod() → "POST"
- PitSearchRequest.httpPath(allocator) → "/_search" (no index in path when using PIT)
- PitSearchRequest.httpBody(allocator) → {"pit": {"id": "...", "keep_alive": "..."}, "query": {...}, "size": N, "sort": [...], "search_after": [...]}
- PitSearchResponse(T): extends SearchResponse(T) with pit_id: ?[]const u8 (refreshed PIT ID)
- Unit tests: verify HTTP method, path, body for each request type; verify search_after + sort serialization
- PitIterator(T) struct: generic over document type T, preferred over scroll for read-heavy queries
- Fields: allocator, pool: *ConnectionPool, compression: bool, pit_id, keep_alive, sort_fields, last_sort_values, current_page, done: bool, page_size: u32
- PitIterator.init(allocator, pool, compression, index, query, opts): opens PIT via POST /<index>/_search/point_in_time, sends initial search, parses first page
- PitIterator.next() → ?[]const Hit(T): returns next page of hits, or null when exhausted
- On each next(): extracts sort values from last hit of previous page, sends search_after search, updates pit_id (may be refreshed by ES)
- Returns null when hits.hits is empty
- PitIterator.deinit(): sends DELETE /_search/point_in_time to close PIT, frees memory
- Memory cap: only one page of hits is live at a time; previous page is freed on next()
- Default sort: [{"_doc": "asc"}] (most efficient for full-index scans)
- Error handling: transport errors propagate; on deinit close-PIT errors are silently ignored
- ESClient.scrollSearch(comptime T, index, query, opts, scroll_duration) → ScrollIterator(T): convenience to create and initialize a scroll iterator
- ESClient.openPit(index, keep_alive) → []u8: open a point-in-time, returns owned pit_id
- ESClient.closePit(pit_id) → void: close a point-in-time
- ESClient.pitSearch(comptime T, index, query, page_size, keep_alive) → PitIterator(T): convenience to create and initialize a PIT iterator
- Update src/request.zig: replaced placeholder ScrollRequest, ClearScrollRequest, PitOpenRequest, PitCloseRequest with real types from api/scroll.zig and api/pit.zig
- Update src/root.zig: re-exported ScrollIterator, PitIterator, ScrollSearchRequest, PitOpenRequest, PitCloseRequest, PitSearchRequest, and all related types
- integration_scroll_basic: index 25 docs, scroll with size=10, collect all pages, verify 25 total hits across 3 pages
- integration_scroll_with_query: index 20 docs (10 active, 10 inactive), scroll with term(active, true), verify only 10 hits
- integration_scroll_empty_result: scroll on empty index, verify immediate null from next()
- integration_scroll_auto_clear: scroll through partial results, call deinit(), verify scroll context is cleared (no leaked server resources)
- integration_scroll_single_page: index 5 docs, scroll with size=10, verify all returned in first page, next() returns null
- integration_pit_basic: index 25 docs, PIT iterate with size=10, collect all pages, verify 25 total hits
- integration_pit_with_query: index 20 docs, PIT iterate with query filter, verify correct subset
- integration_pit_empty_result: PIT iterate on empty index, verify immediate null
- integration_pit_auto_close: iterate partially, call deinit(), verify PIT is closed
- integration_pit_open_close: open PIT explicitly, verify pit_id returned, close PIT, verify no error
- integration_scroll_large_dataset: index 500 docs, scroll with size=50, verify all 500 retrieved across 10 pages
- Each test creates UUID-named index, indexes docs via BulkIndexer, refreshes, iterates, asserts, deletes index
- build.zig: added scroll_pit_integration.zig to test-integration step
Deliverable: ScrollIterator and PitIterator page through arbitrarily large result
sets without buffering more than one page in memory. Auto-clear/close on deinit()
prevents leaked server-side resources. Both iterators use the same Hit(T) type from
SearchResponse. Integration tests verify pagination, query filtering, empty results,
and resource cleanup against OpenSearch. Can page through 500K+ concept documents.
Test backend: Elasticsearch (OpenSearch)
- Replace deterministic backoff *= 2 with full-jitter: random(0, min(cap, base * 2^attempt))
- Add max_retry_backoff_ms: u32 = 30_000 cap to ClientConfig to prevent unbounded growth
- Use std.crypto.random for jitter (cryptographically secure, no seed needed)
- Differentiate 429 vs 5xx in retry loop: 429 → TooManyRequests, 503 → ClusterUnavailable
- On 429, use Retry-After header from response if present (seconds), fall back to jittered backoff
- Unit tests: verify backoff values are within expected range, verify cap is respected
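The full-jitter formula above can be sketched as a pure function that takes its random source as a parameter, which keeps it testable with a seeded generator. The name fullJitterBackoffMs is hypothetical, and the sketch assumes a Zig release that exposes the std.Random interface (0.13 or later); production code would pass std.crypto.random.

```zig
const std = @import("std");

/// Hypothetical helper: returns a delay drawn uniformly from
/// [0, min(cap_ms, base_ms * 2^attempt)].
fn fullJitterBackoffMs(rng: std.Random, base_ms: u32, cap_ms: u32, attempt: u32) u32 {
    // Clamp the exponent so base_ms * 2^attempt cannot overflow a u64.
    const shift: u6 = @intCast(@min(attempt, 31));
    const ceiling: u64 = @as(u64, base_ms) << shift;
    // The minimum never exceeds cap_ms, so it always fits back into a u32.
    const upper: u32 = @intCast(@min(ceiling, @as(u64, cap_ms)));
    return rng.uintAtMost(u32, upper);
}
```

Usage with the defaults named above would be fullJitterBackoffMs(std.crypto.random, config.retry_backoff_ms, config.max_retry_backoff_ms, attempt).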
- Add dead_since: ?i64 = null field to Node: timestamp (ms) when marked unhealthy
- Add resurrect_after_ms: u32 = 60_000 to ClientConfig: minimum time before retrying a dead node
- In markUnhealthy: set dead_since = std.time.milliTimestamp()
- In nextNode: if all nodes are unhealthy, check if any node's dead_since + resurrect_after_ms < now; if so, try that node (give it a chance to recover)
- On successful request to a resurrected node, clear dead_since and mark healthy
- Unit tests: verify dead nodes are skipped, verify resurrection after timeout, verify healthy-on-success
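The resurrection check in nextNode is a single comparison, sketched here as a pure function so that the clock can be injected in unit tests. The name canResurrect and the explicit now_ms parameter are assumptions for illustration.

```zig
const std = @import("std");

/// Hypothetical helper: true when a dead node has been dead longer than
/// `resurrect_after_ms`. A null `dead_since_ms` means the node is not dead.
fn canResurrect(dead_since_ms: ?i64, resurrect_after_ms: u32, now_ms: i64) bool {
    const since = dead_since_ms orelse return false;
    return since + @as(i64, resurrect_after_ms) < now_ms;
}
```

In production the caller would pass std.time.milliTimestamp() as now_ms.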
- Wire existing ClientConfig.basic_auth ("user:password") into pool as Authorization: Basic <base64> header
- Add api_key: ?[]const u8 = null to ClientConfig: API key auth (Authorization: ApiKey <key>)
- basic_auth and api_key are mutually exclusive; if both are set, basic_auth takes precedence
- Auth header is added via extra_headers on every request in sendRequest
- Base64 encoding uses std.base64.standard.Encoder
- Unit tests: verify correct Authorization header for basic auth, API key, and no-auth cases
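Building the Basic header value with std.base64.standard.Encoder might look like the sketch below. The function name basicAuthHeaderValue and the caller-supplied buffer are assumptions; how the value is attached to extra_headers is left out.

```zig
const std = @import("std");

/// Hypothetical helper: writes "Basic <base64(credentials)>" into `buf`
/// and returns the written slice.
fn basicAuthHeaderValue(buf: []u8, credentials: []const u8) ![]const u8 {
    const prefix = "Basic ";
    const encoder = std.base64.standard.Encoder;
    const needed = prefix.len + encoder.calcSize(credentials.len);
    if (buf.len < needed) return error.NoSpaceLeft;
    @memcpy(buf[0..prefix.len], prefix);
    const encoded = encoder.encode(buf[prefix.len..needed], credentials);
    return buf[0 .. prefix.len + encoded.len];
}
```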
- Add scheme: []const u8 = "http" to ClientConfig (values: "http" or "https")
- Use config.scheme instead of hardcoded "http" in ConnectionPool.init
- std.http.Client handles TLS natively for https:// URIs, so no extra code is needed
- Add ESClient.initFromUrl(allocator, url_string) convenience: parses http://host:port or https://host:port into ClientConfig
- Unit tests: verify URL parsing for http and https schemes
- Define LogLevel enum: debug, info, warn, err
- Define LogEvent tagged union with variants:
  - request_start: { method, path } (before sending)
  - request_success: { method, path, status_code, duration_ms } (on 2xx)
  - request_retry: { method, path, attempt, status_code, backoff_ms } (on retryable error)
  - request_error: { method, path, status_code, error_type } (on non-retryable error)
  - node_unhealthy: { host, port } (when a node is marked dead)
  - node_recovered: { host, port } (when a dead node comes back)
- Add log_fn: ?*const fn (LogEvent) void = null to ClientConfig
- Call log_fn at appropriate points in sendRequest (before request, on success, on retry, on error, on node state change)
- No-op when log_fn is null: zero overhead in the default case
- Unit tests: verify log events are emitted in correct order for success/retry/error scenarios
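The optional-callback hook can be sketched with a reduced LogEvent. Only three of the variants are shown, and the payload field types (u16 status, u64 duration, slices for strings) are assumptions, since only the field names are listed above. The emit and countingLogger names are hypothetical.

```zig
const std = @import("std");

/// Reduced, illustrative subset of the LogEvent union.
const LogEvent = union(enum) {
    request_start: struct { method: []const u8, path: []const u8 },
    request_success: struct { method: []const u8, path: []const u8, status_code: u16, duration_ms: u64 },
    node_unhealthy: struct { host: []const u8, port: u16 },
};

const LogFn = *const fn (LogEvent) void;

/// Calls the hook when one is configured; does nothing otherwise.
fn emit(log_fn: ?LogFn, event: LogEvent) void {
    if (log_fn) |callback| callback(event);
}

var seen_events: usize = 0;

/// Example hook that only counts the events it receives.
fn countingLogger(event: LogEvent) void {
    _ = event;
    seen_events += 1;
}
```

A test can install &countingLogger as log_fn and assert on seen_events after each operation.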
- Verify all integration tests run under std.testing.allocator (GPA in debug): already the case
- Add explicit std.heap.GeneralPurposeAllocator usage to the benchmark harness (bench/bulk_bench.zig) to catch leaks in hot paths
- Audit ScrollIterator.deinit() and PitIterator.deinit() for leaks when partially consumed
- Audit BulkIndexer for leaks on error paths (flush failure mid-batch)
- Audit ESClient convenience methods for leaks when handleErrorResponse is called
- Document any known leak-safe patterns in CLAUDE.md conventions section
- integration_basic_auth: configure client with basic_auth, ping cluster, verify success (OpenSearch accepts any auth on unauthenticated clusters)
- integration_retry_success: verify client retries and succeeds (index doc, search immediately; tests the retry path naturally)
- integration_node_failover: add a fake dead node + real node, verify requests still succeed via the healthy node
- integration_node_recovery: mark a node unhealthy, verify it's skipped, wait for resurrect timeout, verify it's retried
- integration_logging_events: configure log_fn, perform operations, verify events are emitted
- Each test uses UUID-named index, cleans up after itself
- build.zig: add hardening_integration.zig to test-integration step
Deliverable: Production-ready transport layer with jittered backoff preventing thundering herd on 429/503, automatic node health recovery, HTTP Basic and API key authentication, HTTPS support via std.http.Client's native TLS, and structured logging hooks for observability. All existing tests continue to pass. Memory safety verified under GPA.
Goal: Make elaztic a first-class, discoverable, well-documented open-source
Zig package that users can install with a single zig fetch command.
Reference libraries studied:
- karlseguin/pg.zig: README-as-docs pattern, standalone example project, build.zig.zon structure, API reference inline in README
- elastic/elasticsearch-rs: progressive disclosure (zero-config → URL → auth), compatibility matrix, module-level doc tutorial, escape-hatch pattern
- zigistry.dev: auto-indexed via GitHub zig-package topic
Current state entering M8:
- 167 unit tests + 44 integration/smoke tests = 211 total, all passing, zero leaks
- CI already exists (.github/workflows/ci.yml + release.yml)
- build.zig.zon exists but needs version + paths update
- No README.md, no examples/ directory, no justfile
- Doc comments already present on most public symbols
- License: AGPL-3.0
- Audit every pub symbol in src/root.zig: ensure /// doc comment present
- Audit src/client.zig: every public method on ESClient has /// with:
  - One-line summary
  - Parameter descriptions (what each arg does, default behaviour)
  - Return value description (what the caller receives, who owns the memory)
  - Error conditions (which errors from ESError can be returned and when)
  - Example usage snippet where non-obvious
- Audit src/pool.zig: ConnectionPool, Node, HttpResponse, LogEvent, LogLevel
- Audit src/error.zig: ESError variants, ErrorEnvelope, parseErrorEnvelope
- Audit src/request.zig: ElasticRequest union and all variants
- Audit src/api/document.zig: all request/response types and their methods
- Audit src/api/index_mgmt.zig: all request types and their methods
- Audit src/api/search.zig: SearchRequest, CountRequest, SearchOptions, responses
- Audit src/api/bulk.zig: BulkResponse, BulkItemResult, parseBulkResponse
- Audit src/api/bulk_indexer.zig: BulkIndexer, BulkConfig, BulkResult, all methods
- Audit src/api/scroll.zig: all request/response types, ScrollIterator and its methods
- Audit src/api/pit.zig: all request/response types, PitIterator and its methods
- Audit src/query/builder.zig: Query namespace, every query constructor
- Audit src/query/field.zig: FieldPath, field() function
- Audit src/query/aggregation.zig: Aggregation namespace, all aggregation constructors
- Audit src/query/source_filter.zig: SourceFilter and its variants
- Audit src/json/serialize.zig: all public serialization functions
- Audit src/json/deserialize.zig: all public deserialization functions, TotalHits, Hit, HitsEnvelope, SearchResponse
- Add module-level //! doc comments to every file that lacks them (one-line summary of what the module provides)
- Verify: every deinit() method documents what memory it frees
- Verify: every function returning allocated memory documents caller-owns semantics
- Expand the top-level `//!` doc comment in `src/root.zig` into a full tutorial (following the Rust ES client's `lib.rs` pattern):
  - `//! # elaztic`: title
  - `//! ## Overview`: one paragraph covering what this library is and which ES versions it targets
  - `//! ## Compatibility`: ES 7.x / 8.x, tested against OpenSearch
  - `//! ## Quick Start`: progressive examples:
    - Zero-config: `ESClient.init(allocator, .{})` → `ping()`
    - Custom URL: `ESClient.initFromUrl(allocator, "http://es:9200")`
    - With auth: `ESClient.init(allocator, .{ .basic_auth = "user:pass" })`
  - `//! ## Query DSL`: comptime field validation example (the key differentiator)
  - `//! ## Bulk Indexing`: `BulkIndexer` example
  - `//! ## Scrolling Large Result Sets`: `ScrollIterator` / `PitIterator`
  - `//! ## Error Handling`: `ESError` switch example, retry semantics
  - `//! ## Memory Ownership`: who owns what, `deinit()` patterns
- Keep existing re-exports unchanged; only expand the `//!` header
The README is the primary documentation surface (Zig ecosystem convention: README = docs).
Follows the pg.zig pattern of exhaustive inline API docs.
- Header section:
  - Title: `# elaztic`
  - One-liner: `A production-grade Elasticsearch client library for Zig.`
  - Badges: Zig version, license (AGPL-3.0), CI status, GitHub stars
  - One paragraph: what it is, ES 7.x/8.x target, comptime field validation as key feature
- Compatibility section:
  - Table: elaztic version × ES version × OpenSearch version
  - Note: tested against OpenSearch (Apache 2.0 fork, wire-compatible with ES 7.x REST API)
  - Note: HTTP/1.1 only (no HTTP/2)
  - Note: Zig 0.15.2+ required (tracked via `minimum_zig_version` in `build.zig.zon`)
- Install section (two steps, following pg.zig pattern):
  - Step 1: `zig fetch --save git+https://github.com/<owner>/elaztic`
  - Step 2: `build.zig` snippet showing `b.dependency("elaztic", ...).module("elaztic")`
  - Note about Nix: `nix develop` for reproducible toolchain
- Quick Start section (progressive disclosure, following Rust ES client pattern):
  - Example 1: Connect + ping (zero config, localhost:9200)
  - Example 2: Index a document + get it back
  - Example 3: Search with comptime-validated query DSL
  - Each example is self-contained with imports, `main()`, error handling
- Query DSL section (the key selling point, so lead with it prominently):
  - Comptime field path example: `field(Concept, "active")` vs compile error on typo
  - `Query.term`, `Query.bool`, `Query.range`, `Query.match` examples
  - Nested bool query example
  - Aggregation example
  - Source filtering example
- Document CRUD section:
  - `indexDoc`: with and without explicit ID
  - `getDoc`: typed response
  - `deleteDoc`
  - `createIndex` / `deleteIndex` / `refresh` / `putMapping` / `putAlias`
- Bulk Indexing section:
  - `BulkIndexer` lifecycle: init → add → flush → deinit
  - Auto-flush on `max_docs` / `max_bytes` thresholds
  - `BulkResult` inspection: `hasFailures()`, `failedItems()`
  - Performance note: >50K docs/sec target
- Scrolling & Point-in-Time section:
  - `ScrollIterator` example: init → `while (iter.next())` loop → auto-clear on `deinit()`
  - `PitIterator` example: same pattern, preferred for read-heavy queries
  - When to use scroll vs PIT
  - Memory guarantee: one page in memory at a time
- Configuration section:
  - Full `ClientConfig` field reference with defaults
  - `initFromUrl` for URL-based config
  - Auth: `basic_auth` vs `api_key` (mutually exclusive, basic takes precedence)
  - TLS: `scheme = "https"` (`std.http.Client` handles TLS natively)
  - Retry: `retry_on_failure`, `retry_backoff_ms`, `max_retry_backoff_ms`
  - Node recovery: `resurrect_after_ms`
  - Logging: `log_fn` callback with `LogEvent` variants
  - Compression: `compression = true` (gzip)
- Error Handling section:
  - `ESError` enum: every variant documented with when it occurs
  - Retry semantics: 429 + 503 retried, other 4xx never retried
  - `ErrorEnvelope`: parsed from ES JSON error responses
  - Example: catching `IndexNotFound` vs `VersionConflict`
- Memory Ownership section:
  - Rule: caller owns memory returned by the library
  - `deinit()` patterns: `ESClient`, `ClusterHealth`, `BulkResult`, `ErrorEnvelope`, `ScrollIterator`, `PitIterator`
  - Arena allocators: `BulkResponse._arena`, `ErrorEnvelope._arena`
  - All tests run under `std.testing.allocator` (GPA) to catch leaks
- Building & Testing section:
  - `nix develop`: required, never install Zig globally
  - `zig build` / `zig build test` / `zig build test-smoke` / `zig build test-integration`
  - `es-start` / `es-stop` for OpenSearch
  - `ES_URL=http://localhost:9200` environment variable
  - `zig build bench` for throughput benchmarks
- License section:
  - AGPL-3.0: link to LICENSE file
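
The retry rule listed under Error Handling above (429 and 503 retried, other 4xx never retried) can be sketched as a pure function. This is only an illustration: `shouldRetry` is a hypothetical name rather than the library's API, and treating every other status (including other 5xx codes) as non-retryable is an assumption the plan does not state.

```zig
const std = @import("std");

/// Hypothetical helper: decides whether an HTTP status is worth retrying.
fn shouldRetry(status: u16) bool {
    return switch (status) {
        429, 503 => true, // Too Many Requests, Service Unavailable
        else => false, // assumption: everything else surfaces to the caller
    };
}

test "retry only on 429 and 503" {
    try std.testing.expect(shouldRetry(429));
    try std.testing.expect(shouldRetry(503));
    try std.testing.expect(!shouldRetry(404));
    try std.testing.expect(!shouldRetry(409));
}
```

Keeping the decision in one small function makes the README's "Retry semantics" bullet easy to verify with a unit test that needs no network.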
Standalone example project with its own build.zig + build.zig.zon (following
the pg.zig pattern — proves the library is consumable as a dependency).
- Create `examples/` directory
- `examples/build.zig.zon`: standalone manifest declaring `elaztic` as a path dependency: `.dependencies = .{ .elaztic = .{ .path = ".." } }`
- `examples/build.zig`: builds each example as a separate executable, each importing `elaztic`
- `examples/basic_search.zig`: complete, runnable example:
  - Connect to localhost:9200
  - Create a UUID-named index
  - Define a `Concept` struct with `id: u64`, `active: bool`, `module_id: u64`, `term: []const u8`
  - Index 5 sample SNOMED-like concepts
  - Refresh the index
  - Search with `Query.bool` + `Query.term(field(Concept, "active"), true)` + `Query.range(field(Concept, "module_id")).gte(900000000000207008)`
  - Print results
  - Delete the index
  - Proper error handling and `defer` cleanup throughout
- `examples/bulk_index.zig`: bulk indexing example:
  - Connect to localhost:9200
  - Create index
  - Create `BulkIndexer` with `max_docs = 500`
  - Index 1000 documents in a loop
  - Show auto-flush behaviour
  - Final manual `flush()`
  - Print `BulkResult` stats (total, succeeded, failed, took_ms)
  - Delete index
- `examples/scroll_large.zig`: scroll through a large result set:
  - Connect to localhost:9200
  - Create index, bulk-index 200 documents
  - Refresh
  - Create `ScrollIterator` with `size = 50`
  - Page through all results, print page count and hit count per page
  - Auto-clear on `deinit()`
  - Also demonstrate `PitIterator` as alternative
  - Delete index
- Each example has a comment header explaining what it demonstrates
- Each example compiles and runs standalone: `cd examples && zig build run-basic-search`
- Verify all examples run against OpenSearch (test manually with `es-start`)
- Update `.version` from `"0.0.0"` to `"0.1.0"` (first public release)
- Add `"LICENSE"` to `.paths` array (required for package distribution)
- Add `"README.md"` to `.paths` array (displayed by registries and zig tools)
- Verify `.minimum_zig_version = "0.15.2"` is correct
- Verify `.name = .elaztic` matches the module name in `build.zig`
- Keep `.fingerprint` unchanged (security/trust implications)
- Remove boilerplate comments from the template (clean up for publishing)
- Verify `zig fetch --save` works with a local path dependency
- Remove excessive template comments (keep only comments that add value)
- Verify module exposure: `b.addModule("elaztic", .{ .root_source_file = b.path("src/root.zig") })`
- Verify all test steps are wired up: `test`, `test-smoke`, `test-integration`, `bench`
- Add `test-all` step that depends on `test` + `test-smoke` + `test-integration`
- Verify examples can be built from the examples directory
- Ensure `zig build --help` output is clean and descriptive (step names + descriptions)
- Remove the `exe` (CLI executable) build target, since this is a library, not a CLI tool:
  - Remove `src/main.zig` executable build
  - Remove `run` step
  - Keep the `exe_tests` if they test anything useful, otherwise remove
  - The `elaztic` module is the only thing consumers import
- Create `justfile` at project root with all commands from CLAUDE.md:
  - `just build` → `zig build`
  - `just test` → `zig build test --summary all`
  - `just smoke` → `zig build test-smoke --summary all` (requires `ES_URL`)
  - `just integration` → `zig build test-integration --summary all` (requires `ES_URL`)
  - `just all` → `zig build test-all --summary all` (requires `ES_URL`)
  - `just es-start` → `es-start`
  - `just es-stop` → `es-stop`
  - `just es-status` → `es-status`
  - `just es-logs` → `tail -f .opensearch.log`
  - `just bench` → `zig build bench`
  - `just fmt` → `zig fmt src/ tests/ bench/ build.zig`
  - `just fmt-check` → `zig fmt --check src/ tests/ bench/ build.zig`
  - `just clean` → `rm -rf zig-out .zig-cache .opensearch-data`
  - `just docs` → `zig build-lib src/root.zig -femit-docs` (if supported)
  - `just loc` → line count summary (`find src/ -name '*.zig' | xargs wc -l`)
- Add `justfile` to `.paths` in `build.zig.zon`? No, not needed for package consumers
CI already exists and is functional. This phase hardens it.
- Review `ci.yml`: verify all steps pass on current main branch
- Add `zig fmt --check` to cover `examples/` directory (currently only `src/ tests/ bench/ build.zig`)
- Add a dedicated "Examples Build" job:
  - `cd examples && zig build`: verifies examples compile
  - Does NOT run them (they need OpenSearch), but compilation proves the module import works
- Add build matrix comment documenting what each job does
- Review `release.yml`:
  - Currently builds a CLI binary; update to package the library tarball instead
  - Or remove binary release entirely (library consumers use `zig fetch`, not binaries)
  - Create a source tarball that matches what `zig fetch` would download
- Add GitHub Actions badge to README.md: `![CI](https://github.com/<owner>/elaztic/actions/workflows/ci.yml/badge.svg)`
- Verify Nix cache is working (`DeterminateSystems/magic-nix-cache-action`)
- Set GitHub repo description: `Production-grade Elasticsearch client library for Zig. Comptime-validated query DSL. ES 7.x/8.x.`
- Add GitHub topics: `zig-package`, `elasticsearch`, `opensearch`, `zig`, `search`, `database-client`
  - `zig-package` is required for zigistry.dev auto-indexing
- Set repository URL in `build.zig.zon` or README (for discoverability)
- Verify LICENSE file is AGPL-3.0 and properly detected by GitHub
- Add a `.github/FUNDING.yml` if sponsorship is desired (optional)
- Update `Changelog.md` with M8 section:
  - List all files created/modified
  - Document README creation, examples, `build.zig.zon` updates
  - Include M8 checklist (following the pattern of M1–M7 entries)
- Review all milestone entries in `Changelog.md` for accuracy
- Add release date to the `[Unreleased]` section header → `[0.1.0] - YYYY-MM-DD`
- Create git tag `v0.1.0` after all M8 work is merged
- Verify `release.yml` triggers on the tag push and creates a GitHub Release
- Write release notes summarizing the full M1–M8 journey:
  - Transport layer with connection pooling and keep-alive
  - Comptime-validated query DSL (the key innovation)
  - Full CRUD operations
  - Bulk indexer with auto-flush (>50K docs/sec)
  - Scroll + PIT iterators for large result sets
  - Production hardening (jittered backoff, node recovery, auth, TLS, logging)
  - 211+ tests, zero memory leaks
- Verify `zig fetch --save git+https://github.com/<owner>/elaztic` works from a fresh project
- Create a minimal test project that depends on `elaztic` to verify the package is consumable:
  - `zig init`
  - Add dependency
  - `@import("elaztic")` in main.zig
  - `zig build` succeeds
- zigistry.dev: no action needed beyond adding `zig-package` topic (Phase 9)
  - Zigistry auto-crawls GitHub repos with the `zig-package` topic
  - Verify listing appears after push (may take a few hours)
- Update CLAUDE.md: replace `pkg.zig.guru` reference with `zigistry.dev`
- Consider writing a short announcement post (Ziggit forum, Reddit r/zig), optional
- `zig build test --summary all`: all 167+ unit tests pass
- `zig build test-smoke --summary all`: smoke tests pass against OpenSearch
- `zig build test-integration --summary all`: all 44+ integration tests pass
- `zig build bench`: bulk benchmark runs, >50K docs/sec on localhost
- `zig fmt --check src/ tests/ bench/ build.zig`: no formatting issues
- `cd examples && zig build`: all examples compile
- `nix build`: reproducible Nix build succeeds
- `nix flake check`: flake checks pass
- Zero memory leaks across all test suites (GPA-verified)
- README renders correctly on GitHub (check images, code blocks, badges)
- `zig fetch --save` from a clean project succeeds
- All CLAUDE.md milestone checkboxes are checked
Deliverable: elaztic v0.1.0 published as a first-class Zig package. README serves
as comprehensive documentation with progressive quickstart examples, full API reference,
and the comptime field validation story front and center. Standalone examples prove the
library is consumable. zig fetch --save works out of the box. Listed on zigistry.dev.
CI validates formatting, unit tests, smoke tests, and integration tests on every push.
211+ tests with zero memory leaks.
SNOMED concept IDs are u64 — they exceed 32-bit range. Never use i32
or u32 for concept/description/relationship IDs anywhere in the codebase.
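
As a quick illustration of why 32-bit types are ruled out, the test below takes the module ID already used in this plan and shows that it exceeds `u32` range; a checked narrowing cast returns `null` rather than silently truncating.

```zig
const std = @import("std");

test "SNOMED identifiers do not fit in 32 bits" {
    // Module ID taken from the basic_search example in this plan.
    const module_id: u64 = 900000000000207008;

    try std.testing.expect(module_id > std.math.maxInt(u32));
    // std.math.cast reports the overflow instead of truncating.
    try std.testing.expect(std.math.cast(u32, module_id) == null);
}
```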
Branch-aware query filters — every ES query gets wrapped with branch visibility filters. The query DSL must support arbitrary filter injection without breaking the builder chain.
Multi-index search — Snowstorm queries concept, description, and relationship indices simultaneously. Support index patterns.
Large ancestor arrays — SNOMED concept documents contain []u64 ancestor
arrays with thousands of entries. The deserializer must handle these without
per-element allocation.
Large terms queries — ECL produces concept ID sets passed back to ES
as terms filters. These can contain tens of thousands of IDs. The query
serializer must handle large []u64 slices efficiently.
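
One way to meet the no-per-element-allocation goal is to write the IDs straight into a caller-supplied buffer. The sketch below is only an illustration under stated assumptions: `writeTermsFilter` is a hypothetical name, the field name is assumed to need no JSON escaping, and a real serializer would more likely stream to a writer than to a fixed buffer.

```zig
const std = @import("std");

/// Hypothetical helper: renders {"terms":{"<field>":[id,id,...]}} into `buf`
/// with no heap allocation. Fails with error.NoSpaceLeft if `buf` is too small.
fn writeTermsFilter(buf: []u8, field_name: []const u8, ids: []const u64) ![]u8 {
    var pos: usize = 0;
    pos += (try std.fmt.bufPrint(buf[pos..], "{{\"terms\":{{\"{s}\":[", .{field_name})).len;
    for (ids, 0..) |id, i| {
        const sep = if (i == 0) "" else ",";
        pos += (try std.fmt.bufPrint(buf[pos..], "{s}{d}", .{ sep, id })).len;
    }
    pos += (try std.fmt.bufPrint(buf[pos..], "]}}}}", .{})).len;
    return buf[0..pos];
}

test "terms filter over u64 ids" {
    var buf: [128]u8 = undefined;
    const ids = [_]u64{ 138875005, 900000000000207008 };
    const json = try writeTermsFilter(&buf, "ancestors", &ids);
    try std.testing.expectEqualStrings(
        "{\"terms\":{\"ancestors\":[138875005,900000000000207008]}}",
        json,
    );
}
```

Because each ID is formatted directly as a decimal integer, the cost per element is one bounded write, which is the property that matters when the slice holds tens of thousands of IDs.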
- Query DSL serialization snapshot tests (no network)
- JSON serialize/deserialize round-trips
- Error envelope parsing against fixtures
- FieldPath compile-error validation
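
The comptime field validation that the unit tests cover can be sketched with `@hasField` and `@compileError`. This is a minimal illustration, not the library's actual `field()` implementation: the `FieldPath` layout shown here is an assumption, and the `Concept` struct mirrors the one described for the examples.

```zig
const std = @import("std");

const Concept = struct {
    id: u64,
    active: bool,
    module_id: u64,
    term: []const u8,
};

/// Assumed shape: a validated field name and nothing else.
const FieldPath = struct { name: []const u8 };

/// Rejects unknown field names at compile time.
fn field(comptime T: type, comptime name: []const u8) FieldPath {
    if (!@hasField(T, name)) {
        @compileError("no field '" ++ name ++ "' on " ++ @typeName(T));
    }
    return .{ .name = name };
}

test "field accepts a declared field" {
    const path = field(Concept, "active");
    try std.testing.expectEqualStrings("active", path.name);
    // field(Concept, "actve") would stop the build with the message above.
}
```

Because the check runs at compile time, a typo in a query never reaches Elasticsearch; it fails the build instead.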
Smoke tests:
- Requires `ES_URL=http://localhost:9200`
- Start with `es-start` from dev shell
- Validates transport and basic JSON round-trips

Integration tests:
- Requires `ES_URL=http://localhost:9200`
- Start with `es-start` from dev shell
- Each test creates and destroys its own UUID-named index
- Skipped automatically if `ES_URL` is unset
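
The skip-when-unset behaviour and per-test index isolation described above could be sketched as follows. Both helpers are hypothetical names; the index name uses 128 random bits rendered as hex as a stand-in for a real UUID, and `std.posix.getenv` is not available on Windows.

```zig
const std = @import("std");

/// Returns the cluster URL, or skips the calling test when ES_URL is unset.
fn esUrlOrSkip() error{SkipZigTest}![]const u8 {
    const url = std.posix.getenv("ES_URL") orelse return error.SkipZigTest;
    return url;
}

/// Hypothetical helper: writes "test-" plus 32 random hex digits into `buf`.
fn uniqueIndexName(buf: []u8) ![]u8 {
    return std.fmt.bufPrint(buf, "test-{x:0>16}{x:0>16}", .{
        std.crypto.random.int(u64),
        std.crypto.random.int(u64),
    });
}

test "integration_example" {
    const url = try esUrlOrSkip();
    _ = url; // a real test would pass this to the client

    var buf: [64]u8 = undefined;
    const index = try uniqueIndexName(&buf);
    // A real test would create `index` here and delete it in a `defer`.
    try std.testing.expectEqual(@as(usize, 37), index.len);
}
```

Returning `error.SkipZigTest` makes the test runner report the test as skipped rather than failed, so `zig build test-integration` stays green on machines without a running cluster.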
just build # zig build
just test # unit tests only
just smoke # unit + smoke tests (start es-start first)
just integration # all tests including ES integration
just es-start # start OpenSearch on :9200 (from nix dev shell)
just es-stop # stop OpenSearch
just es-status # check if OpenSearch is running
just es-logs # tail OpenSearch logs
just bench # run throughput benchmarks
just fmt # zig fmt
just clean # rm -rf zig-out .zig-cache .opensearch-data
- ES REST API spec: https://www.elastic.co/docs/api/doc/elasticsearch
- ES Query DSL: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
- ES Bulk API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
- ES PIT API: https://www.elastic.co/guide/en/elasticsearch/reference/current/point-in-time-api.html
- OpenSearch docs: https://opensearch.org/docs/latest/
- zio-elasticsearch (architecture reference): https://github.com/lambdaworks/zio-elasticsearch
- Snowstorm (query patterns to support): https://github.com/IHTSDO/snowstorm
- All public symbols have doc comments (`///`)
- All allocations are explicit: no hidden allocations in library code
- Caller owns memory returned by the library; `deinit()` is always explicit
- No global state: `ESClient` is the root of all state
- Error sets are exhaustive: no `anyerror` in public API signatures
- Smoke tests are prefixed `smoke_`, integration tests prefixed `integration_`
- Comptime DSL errors use `@compileError` with human-readable messages
- Benchmarks live in `bench/` and are never mixed with correctness tests
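
The caller-owns-memory convention with an explicit `deinit()` can be illustrated with an arena-backed value. `Envelope` is a hypothetical type used only for this sketch; its `_arena` field mirrors the naming of `BulkResponse._arena` and `ErrorEnvelope._arena` mentioned earlier, but the real types may be laid out differently.

```zig
const std = @import("std");

/// Hypothetical response type that owns its strings through an arena.
const Envelope = struct {
    _arena: std.heap.ArenaAllocator,
    reason: []const u8,

    /// Copies `reason` into an arena owned by the returned value.
    fn init(allocator: std.mem.Allocator, reason: []const u8) !Envelope {
        var arena = std.heap.ArenaAllocator.init(allocator);
        errdefer arena.deinit();
        const copy = try arena.allocator().dupe(u8, reason);
        return .{ ._arena = arena, .reason = copy };
    }

    /// Frees every allocation made through `_arena`, including `reason`.
    fn deinit(self: *Envelope) void {
        self._arena.deinit();
    }
};

test "caller owns the result and must call deinit" {
    var env = try Envelope.init(std.testing.allocator, "index_not_found_exception");
    defer env.deinit();
    try std.testing.expectEqualStrings("index_not_found_exception", env.reason);
}
```

Running this under `std.testing.allocator` fails the test if `deinit()` is forgotten, which is how a leak check can enforce the convention.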