The PHP wrapper supports two communication modes with the Rust core, selected automatically:
Direct in-process calls via PHP's FFI extension. Data crosses the boundary as:
- JSON strings for documents and query results
- Opaque pointer (void*) for the engine handle
- int32_t/int64_t for status codes and counts
The C API is defined in wrappers/php/src/FFI/anvildb.h, with Rust implementations in core/src/ffi.rs. See the full function list in the C API Reference.
When ext-ffi is not available, the wrapper spawns anvildb-server as a long-running subprocess and communicates over stdin/stdout using a line-delimited JSON protocol:
PHP (ProcessDriver) anvildb-server
│ │
│── {"cmd":"insert",...}\n ──>│
│<── {"ok":true,"data":...}──│
│ │
Each command is a single JSON line on stdin; each response is a single JSON line on stdout. The process stays alive for the lifetime of the AnvilDb instance and is terminated on close()/shutdown() or __destruct().
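The one-line-in, one-line-out framing can be sketched as a small loop. This is a minimal illustration, not the actual anvildb-server code: `serve_lines` and the `handle` callback are hypothetical names, blank-line skipping is an assumption, and JSON parsing is left to the callback so the sketch needs only the standard library.

```rust
use std::io::{self, BufRead, Write};

/// Reads one command per line from `input` and writes exactly one
/// response line to `output` for each command. The output is flushed
/// after every response so the peer never waits on a buffered reply.
pub fn serve_lines<R, W, F>(input: R, mut output: W, mut handle: F) -> io::Result<()>
where
    R: BufRead,
    W: Write,
    F: FnMut(&str) -> String,
{
    for line in input.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue; // assumption: blank lines are ignored
        }
        let response = handle(&line);
        // A response must stay on a single line, or the framing breaks.
        debug_assert!(!response.contains('\n'));
        writeln!(output, "{response}")?;
        output.flush()?;
    }
    // Reaching EOF means the parent closed the pipe.
    Ok(())
}
```

In a real server the reader would be a locked stdin and the writer a locked stdout. Because the function is generic over `BufRead` and `Write`, the same loop can be exercised against in-memory buffers.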
DriverFactory selects the driver automatically:
- If env var ANVILDB_DRIVER=ffi or process → use that driver explicitly
- If ext-ffi is loaded → FFIDriver
- Otherwise → ProcessDriver
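The selection order above can be written as a pure function. The real logic lives in the PHP DriverFactory; the Rust sketch below only restates the rule, with hypothetical names (`Driver`, `select_driver`). How an unrecognized ANVILDB_DRIVER value is treated is not stated in this document, so falling back to auto-detection is an assumption.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Driver {
    Ffi,
    Process,
}

/// `env_override` is the value of ANVILDB_DRIVER, if set.
/// `ffi_loaded` says whether ext-ffi is available.
pub fn select_driver(env_override: Option<&str>, ffi_loaded: bool) -> Driver {
    match env_override {
        // An explicit override wins over detection.
        Some("ffi") => Driver::Ffi,
        Some("process") => Driver::Process,
        // Assumption: unset or unrecognized values fall back to auto-detection.
        _ if ffi_loaded => Driver::Ffi,
        _ => Driver::Process,
    }
}
```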
AnvilDb / Collection / QueryBuilder
│
DriverInterface
/ \
FFIDriver ProcessDriver
│ │
Bridge.php anvildb-server (stdin/stdout)
│ │
libanvildb.so Rust Engine
In FFI mode, Rust-allocated strings returned to PHP must be freed with anvildb_free_string(). The FFIDriver handles this automatically. On PHP 8.4+, FFI may return native PHP strings instead of CData pointers — the driver handles both cases.
In Process mode, memory management is handled entirely by the server process — no manual cleanup needed on the PHP side.
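For the FFI case, the ownership rule can be illustrated with the standard `CString::into_raw` / `CString::from_raw` pair. This is a sketch of the general technique, not the contents of core/src/ffi.rs: the `sketch_*` names are hypothetical stand-ins, and returning null for strings with interior NUL bytes is an assumption.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hands a Rust string to the caller as a NUL-terminated C string.
/// Ownership moves to the caller, who must pass the pointer back to
/// `sketch_free_string` exactly once.
pub fn sketch_into_c_string(s: &str) -> *mut c_char {
    match CString::new(s) {
        Ok(c) => c.into_raw(),
        // Assumption: interior NUL bytes are reported as a null pointer.
        Err(_) => std::ptr::null_mut(),
    }
}

/// Copies the contents out, as a caller would before freeing.
///
/// # Safety
/// `ptr` must point to a live string from `sketch_into_c_string`.
pub unsafe fn sketch_copy_c_string(ptr: *const c_char) -> String {
    unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned()
}

/// Reclaims and drops a string produced by `sketch_into_c_string`.
///
/// # Safety
/// `ptr` must be null or an unfreed pointer from `sketch_into_c_string`.
pub unsafe extern "C" fn sketch_free_string(ptr: *mut c_char) {
    if ptr.is_null() {
        return;
    }
    // Rebuilding the CString lets Rust's allocator release the buffer.
    drop(unsafe { CString::from_raw(ptr) });
}
```

The buffer has to be released by the same allocator that created it, which is why the caller returns the pointer to Rust rather than calling C's `free`.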
- anvildb_open(path, key): creates an Engine instance, discovers collections (lazy), boxed as *mut Engine
- All operations receive the engine handle
- anvildb_close(handle): reconstructs the Box<Engine> and drops it
- anvildb_shutdown(handle): flushes all write buffers before close (via Drop)
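The handle lifecycle above follows the usual `Box::into_raw` / `Box::from_raw` pattern. The sketch below uses a stand-in `Engine` with a hypothetical `path` field and hypothetical `sketch_*` functions. The drop counter exists only to make the drop observable; it is not part of the described engine.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts drops so the sketch can show that close really frees the engine.
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the real engine; the field is hypothetical.
pub struct Engine {
    pub path: String,
}

impl Drop for Engine {
    fn drop(&mut self) {
        // The real Drop impl is described as flushing write buffers here.
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Boxes the engine and hands out a raw pointer as the opaque handle.
pub fn sketch_open(path: &str) -> *mut Engine {
    Box::into_raw(Box::new(Engine { path: path.to_owned() }))
}

/// Borrows the engine behind a handle for one operation.
///
/// # Safety
/// `handle` must be null or a live pointer from `sketch_open`.
pub unsafe fn sketch_with_engine<T>(
    handle: *mut Engine,
    f: impl FnOnce(&Engine) -> T,
) -> Option<T> {
    unsafe { handle.as_ref() }.map(f)
}

/// Rebuilds the Box from the handle and drops it.
///
/// # Safety
/// `handle` must be null or a live pointer from `sketch_open`, used once.
pub unsafe fn sketch_close(handle: *mut Engine) {
    if !handle.is_null() {
        drop(unsafe { Box::from_raw(handle) });
    }
}
```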
Collections are discovered on open() but not loaded from disk. Each collection starts as LazyCollection::Unloaded and transitions to LazyCollection::Loaded on first access via ensure_loaded(). This uses a double-check locking pattern: read lock to check, write lock to load.
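A minimal sketch of that double-check pattern follows, assuming documents are plain strings and that the disk read is passed in as a `load` closure. Both are simplifications for illustration; the `Collections` wrapper, `discovered`, and `len_of` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

pub enum LazyCollection {
    Unloaded,
    Loaded(Vec<String>), // document type simplified for the sketch
}

pub struct Collections {
    map: RwLock<HashMap<String, LazyCollection>>,
}

impl Collections {
    /// Registers names without reading any data, as discovery on open does.
    pub fn discovered(names: &[&str]) -> Self {
        let map = names
            .iter()
            .map(|n| (n.to_string(), LazyCollection::Unloaded))
            .collect();
        Collections { map: RwLock::new(map) }
    }

    /// Returns false for unknown collections. `load` stands in for the disk read.
    pub fn ensure_loaded(&self, name: &str, load: impl FnOnce() -> Vec<String>) -> bool {
        // First check under the shared read lock: the common, cheap path.
        {
            let map = self.map.read().unwrap();
            match map.get(name) {
                None => return false,
                Some(LazyCollection::Loaded(_)) => return true,
                Some(LazyCollection::Unloaded) => {}
            }
        } // read lock released here
        // Second check under the write lock: another thread may have
        // loaded the collection between the two lock acquisitions.
        let mut map = self.map.write().unwrap();
        match map.get_mut(name) {
            None => false,
            Some(slot) => {
                if matches!(*slot, LazyCollection::Unloaded) {
                    *slot = LazyCollection::Loaded(load());
                }
                true
            }
        }
    }

    pub fn len_of(&self, name: &str) -> Option<usize> {
        match self.map.read().unwrap().get(name) {
            Some(LazyCollection::Loaded(docs)) => Some(docs.len()),
            _ => None,
        }
    }
}
```

The second check is what keeps two racing threads from both running `load`.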
- Process level: RwLock around the collections map protects in-memory state
Collections are stored as compressed binary files in data/collections/{name}.anvil. The codec pipeline:
- Write: NDJSON bytes → deflate compress → (optional) AES-256-GCM encrypt → disk
- Read: disk → (optional) decrypt → decompress → parse NDJSON → Vec<Value>
All writes are full rewrites via atomic temp file + rename. A metadata.json in the DB root tracks the format version and encryption state.
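The temp-file-plus-rename step can be sketched with the standard library alone. The compression and encryption stages are omitted because they need external crates. The `.tmp` suffix, the `sync_all` call before the rename, and the name `atomic_rewrite` are assumptions made for illustration.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Replaces `target` with `bytes` so readers see either the old file
/// or the new one, never a partially written file.
pub fn atomic_rewrite(target: &Path, bytes: &[u8]) -> io::Result<()> {
    // The temp file sits next to the target so the rename stays on one filesystem.
    let mut tmp_name = target.as_os_str().to_owned();
    tmp_name.push(".tmp"); // hypothetical suffix
    let tmp = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // assumption: data is forced to disk before publishing
    drop(file);

    // The rename swaps the new contents in as a single step.
    fs::rename(&tmp, target)
}
```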
The buffer (core/src/buffer.rs) tracks which collections have pending (unflushed) writes as a dirty set. Documents are visible immediately in queries (they're in Collection.documents), but disk writes are batched:
- Threshold flush: when a collection's dirty count reaches max_docs (default 100), it's rewritten synchronously
- Timer flush: a background thread rewrites all dirty collections every flush_interval_secs (default 5s)
- Drop/shutdown: the Drop impl stops the thread and flushes remaining dirty collections
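The bookkeeping behind these three triggers can be sketched as a per-collection counter. `DirtyTracker` and its methods are hypothetical names, and the background thread is left out: a timer tick or the Drop path would call `take_dirty` and rewrite whatever it returns.

```rust
use std::collections::HashMap;

pub struct DirtyTracker {
    max_docs: usize,
    pending: HashMap<String, usize>, // collection name -> unflushed writes
}

impl DirtyTracker {
    pub fn new(max_docs: usize) -> Self {
        DirtyTracker { max_docs, pending: HashMap::new() }
    }

    /// Records one write. Returns true when the collection has reached
    /// the threshold and should be rewritten right away.
    pub fn record_write(&mut self, collection: &str) -> bool {
        let count = self.pending.entry(collection.to_owned()).or_insert(0);
        *count += 1;
        *count >= self.max_docs
    }

    /// Clears one collection after it has been rewritten.
    pub fn mark_flushed(&mut self, collection: &str) {
        self.pending.remove(collection);
    }

    /// Drains every dirty collection name, sorted for stable output.
    pub fn take_dirty(&mut self) -> Vec<String> {
        let mut names: Vec<String> = self.pending.drain().map(|(name, _)| name).collect();
        names.sort();
        names
    }
}
```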
The codec (core/src/storage/codec.rs) handles all data encoding/decoding:
- Compression: always active, using miniz_oxide (pure Rust deflate). Reduces file sizes 5-10x for typical JSON data.
- Encryption: opt-in AES-256-GCM via aes-gcm (pure Rust). Each file gets a unique 12-byte random nonce prepended to the ciphertext. Key is a 32-byte value passed as 64-char hex string through the FFI boundary.
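Two details from the encryption bullet lend themselves to a short sketch: decoding the 64-character hex key into 32 bytes, and splitting an encrypted file body into its 12-byte nonce and the ciphertext that follows. The function names are hypothetical and the actual AES-256-GCM call is omitted, since it depends on the aes-gcm crate.

```rust
pub const NONCE_LEN: usize = 12;

/// Decodes a 64-character hex string into a 32-byte key.
/// Returns None on a wrong length or a non-hex character.
pub fn parse_hex_key(hex: &str) -> Option<[u8; 32]> {
    let bytes = hex.as_bytes();
    if bytes.len() != 64 {
        return None;
    }
    let mut key = [0u8; 32];
    for (i, pair) in bytes.chunks(2).enumerate() {
        let hi = (pair[0] as char).to_digit(16)?;
        let lo = (pair[1] as char).to_digit(16)?;
        key[i] = (hi * 16 + lo) as u8;
    }
    Some(key)
}

/// Splits an encrypted file body into (nonce, ciphertext).
/// Returns None if the body is too short to hold a nonce.
pub fn split_nonce(file: &[u8]) -> Option<(&[u8], &[u8])> {
    if file.len() < NONCE_LEN {
        return None;
    }
    Some(file.split_at(NONCE_LEN))
}
```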
The query engine (core/src/query/engine.rs) supports INNER and LEFT joins via hash join:
- Build a HashMap on the right collection's join field, O(m)
- Probe each left document against the map, O(1) per doc
- Merge matched documents with prefixed field names
- Apply filters, sort, limit/offset on the merged result set
Multiple joins are applied sequentially (left-to-right).
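The build and probe phases can be sketched over simplified documents. Here a document is a flat `HashMap<String, String>` rather than a JSON value. The exact prefixing scheme is not spelled out in this document, so the caller-supplied `prefix` is an assumption, as are the names `Doc`, `JoinKind`, `doc`, and `hash_join`.

```rust
use std::collections::HashMap;

pub type Doc = HashMap<String, String>;

#[derive(Clone, Copy, PartialEq, Eq)]
pub enum JoinKind {
    Inner,
    Left,
}

/// Small helper for building documents in examples.
pub fn doc(pairs: &[(&str, &str)]) -> Doc {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

pub fn hash_join(
    left: &[Doc],
    right: &[Doc],
    left_field: &str,
    right_field: &str,
    prefix: &str,
    kind: JoinKind,
) -> Vec<Doc> {
    // Build phase: map each right-side key to the rows holding it, O(m).
    let mut table: HashMap<&str, Vec<usize>> = HashMap::new();
    for (i, d) in right.iter().enumerate() {
        if let Some(key) = d.get(right_field) {
            table.entry(key.as_str()).or_default().push(i);
        }
    }
    // Probe phase: one expected O(1) lookup per left document.
    let mut out = Vec::new();
    for l in left {
        let matches = l.get(left_field).and_then(|k| table.get(k.as_str()));
        match matches {
            Some(rows) => {
                for &i in rows {
                    let mut merged = l.clone();
                    for (k, v) in &right[i] {
                        merged.insert(format!("{prefix}{k}"), v.clone());
                    }
                    out.push(merged);
                }
            }
            // LEFT keeps unmatched left documents; INNER drops them.
            None if kind == JoinKind::Left => out.push(l.clone()),
            None => {}
        }
    }
    out
}
```

Filters, sorting, and limit/offset would then run over the returned vector. A second join would take that vector as its left input.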
- Hash: HashMap<String, Vec<usize>> for equality lookups
- Unique: HashMap<String, usize> for equality with uniqueness enforcement
- Range: BTreeMap<String, Vec<usize>> for ordered lookups (>, <, >=, <=, between)
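A sketch of how the unique and range shapes listed above behave, using the same map types. The wrapper structs and method names are hypothetical. One caveat: `String` keys in a `BTreeMap` sort lexicographically, so "10" orders before "9". How the engine encodes numeric values as keys is not covered here.

```rust
use std::collections::{BTreeMap, HashMap};
use std::ops::Bound;

#[derive(Default)]
pub struct UniqueIndex(pub HashMap<String, usize>);

#[derive(Default)]
pub struct RangeIndex(pub BTreeMap<String, Vec<usize>>);

impl UniqueIndex {
    /// Rejects a second document with the same key.
    pub fn insert(&mut self, key: &str, pos: usize) -> Result<(), String> {
        if self.0.contains_key(key) {
            return Err(format!("duplicate key: {key}"));
        }
        self.0.insert(key.to_owned(), pos);
        Ok(())
    }
}

impl RangeIndex {
    pub fn insert(&mut self, key: &str, pos: usize) {
        self.0.entry(key.to_owned()).or_default().push(pos);
    }

    /// Document positions with lo <= key <= hi.
    pub fn between(&self, lo: &str, hi: &str) -> Vec<usize> {
        if lo > hi {
            return Vec::new(); // BTreeMap::range panics on an inverted range
        }
        self.0
            .range::<str, _>((Bound::Included(lo), Bound::Included(hi)))
            .flat_map(|(_, positions)| positions.iter().copied())
            .collect()
    }

    /// Document positions with key > lo.
    pub fn greater_than(&self, lo: &str) -> Vec<usize> {
        self.0
            .range::<str, _>((Bound::Excluded(lo), Bound::Unbounded))
            .flat_map(|(_, positions)| positions.iter().copied())
            .collect()
    }
}
```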
Indexes are persisted to data/indexes/{collection}_{field}.idx.anvil (compressed, optionally encrypted) and loaded into memory on first access.