This file provides guidance to coding agents collaborating on this repository.
Vizier is a database advisor and finetuner for DuckDB. It is implemented as a DuckDB extension in Zig and C. Priorities, in order:
- Correct, trustworthy recommendations backed by transparent scoring.
- Clean SQL API that feels natural to DuckDB users.
- Clear separation between C (DuckDB API glue) and Zig (business logic).
- Maintainable and well-tested code.
- Use English for code, comments, docs, and tests.
- Keep global mutable state minimal; currently limited to the capture buffer and persistent connection (
g_flush_conn) inextension.c. - Prefer small, focused changes over large refactoring.
- Add comments only when they clarify non-obvious behavior.
- Do not add features, error handling, or abstractions beyond what is needed for the current task.
- Use Oxford commas in inline lists: "a, b, and c" not "a, b, c".
- Do not use em dashes. Restructure the sentence, or use a colon or semicolon instead.
- Avoid colorful adjectives and adverbs. Write "TCP proxy" not "lightweight TCP proxy", "scoring components" not "transparent scoring components".
- Use noun phrases for checklist items, not imperative verbs. Write "redundant index detection" not "detect redundant indexes".
- SQL keywords in prose should be backticked:
`JOIN`,`EXPLAIN`,`CREATE INDEX`. - Headings in Markdown files must be in the title case: "Build from Source" not "Build from source". Minor words (a, an, the, and, but, or, for, in, on, at, to, by, of, is, are, was, were, be) stay lowercase unless they are the first word.
src/extension.c: C entry point, DuckDB function registration, capture/flush logic. Usesadd_pending_capture()helper for all capture paths andREGISTER_TABLE_FUNC_0macro for no-parameter table function registration.src/lib.zig: Root Zig module with C-exported functions. Key exports:zig_run_all_advisors(conn)orchestrates all advisors,zig_extract_and_store(conn, sig, sql)populates predicates/JSON,zig_normalize_sql()for query normalization.src/lib_test.zig: Test harness that pulls in all inline module tests.src/duckdb.zig: Auto-generated DuckDB C API bindings (do not edit manually).src/vizier/schema.zig: DDL for metadata tables and views.src/vizier/capture.zig: Query hashing and INSERT SQL building.src/vizier/extract.zig: Heuristic SQL tokenizer and predicate/table extractor (pure Zig, no DB deps). Captures both sides of JOIN conditions.src/vizier/sql_runner.zig: Connect/query/disconnect wrapper usingduckdb_ext_api. AlsorunOnConn()for reusing an existing connection.src/vizier/inspect.zig: Table reference parsing (pure Zig). Theinspect_tablemacro is defined inschema.zig.src/vizier/advisor.zig: All advisor SQL queries (index, sort, redundant index, parquet layout, summary table, join-path, no-action), theall_advisor_sqlsarray for orchestration, and recommendation helper functions.src/vizier/summary.zig: Workload summary view SQL definition.tests/property_tests.zig: Property-based tests (using the Minish framework).tests/integration_tests.zig: Integration tests that spawn DuckDB and validate output.tests/sql/: Standalone SQL test files runnable withduckdb -unsigned -c ".read file.sql".benches/: SQL-based benchmarks with timing metrics. Run viamake benchor./benches/run.sh.scripts/: Build tooling (metadata appender script).build.zig/build.zig.zon: Zig build configuration and dependencies.Makefile: GNU Make wrapper aroundzig build.
extension.cowns the DuckDB entry point (DUCKDB_EXTENSION_ENTRYPOINT), function registration, and table function callbacks (bind/init/execute).- Zig modules under
src/vizier/own business logic: schema DDL, hashing, SQL construction, and advisor orchestration. - C calls Zig via
externfunctions exported withcallconv(.c). Advisor execution is fully delegated tozig_run_all_advisors(conn). - Zig calls DuckDB API functions through the
duckdb_ext_apiglobal struct (not directextern fnfromduckdb.zig).
These are hard constraints discovered during development:
- No new connections from table function callbacks after init. Opening
duckdb_connectfrom callbacks causes segfaults or stale handles. Use the persistent connection (g_flush_conn) opened during init, or use SQL VIEWs/MACROs that run on the user's connection. duckdb_value_*functions are not induckdb_ext_api_v1. Useduckdb_fetch_chunk+ vector operations to read query results.- Type mismatches cause segfaults.** When copying vector data between chunks, the source column types must match exactly (e.g., INT32 vs BIGINT). Use explicit casts in SQL queries if needed.
duckdb_queryexecutes only one statement.** Multi-statement strings silently ignore everything after the first semicolon.access->get_database(info)may return a temporary pointer. Heap-copy theduckdb_databasevalue if storing it beyond the entry point scope.- Check the validity bitmap before reading scalar query results. A NULL cell's
duckdb_string_tbytes are undefined; callingduckdb_string_t_length/_dataon it reads garbage and is non-deterministic (manifests as test flakiness under a concurrent load). Useduckdb_vector_get_validity+duckdb_validity_row_is_validfirst.
Queries are buffered in a global C array (g_pending), not written to DB immediately.
vizier_flush() uses a persistent connection (g_flush_conn) opened during extension init to write all pending captures.
vizier.workload_summary and vizier.inspect_table() are SQL VIEWs/MACROs, not table functions, because they run on the user's connection and avoid
snapshot isolation issues.
Multiple capture methods feed into the same g_pending buffer:
vizier_capture(sql): single query capture.vizier_capture_bulk(table, column): reads SQL from a table column viag_flush_conn.vizier_start_capture()/vizier_session_log(sql)/vizier_stop_capture(): session-based capture usingvizier.session_logtable.vizier_import_profile(path): reads from DuckDB JSON profiling output viaread_json_auto.
- Zig code uses the
duckdbmodule (imported viabuild.zig) for type definitions. - Zig code accesses DuckDB functions via
extern var duckdb_ext_api: duckdb.duckdb_ext_api_v1. - C code uses
-std=c11and includesduckdb_extension.h. - C formatting is handled by
clang-format; Zig formatting byzig fmt. - SQL keywords must be lowercase. Write
select,from,where,create table, notSELECT,FROM,WHERE,CREATE TABLE. This applies to all SQL strings in Zig modules, C source, and generated recommendation SQL.
Run make test for any change. This runs all three test suites:
| Target | Command | What it runs |
|---|---|---|
| Unit + regression | make test-unit |
Inline test blocks in src/vizier/*.zig |
| Property-based | make test-property |
tests/property_tests.zig with Minish |
| Integration | make test-integration |
tests/integration_tests.zig (spawns DuckDB, validates output) |
| All | make test |
Runs all of the above |
For interactive exploration: make duckdb.
- Read
src/extension.cand the relevantsrc/vizier/*.zigmodule. - Implement the smallest possible change.
- Add or update inline
testblocks in the changed Zig module. Add integration tests intests/integration_tests.zigif SQL behavior changed. - Run
make test. - Verify interactively with
make duckdbif needed.
Good first tasks:
- Add a new metadata column to a vizier table (update DDL in
schema.zig). - Add a new advisor in
advisor.zig(define SQL, add toall_advisor_sqlsarray; it auto-registers). - Add a new scalar function (follow the
vizier_versionpattern inextension.c). - Add a new table macro (add SQL constant in
schema.zig, register inddl_statementsarray).
- Unit/regression tests live as inline
testblocks in the module they cover (src/vizier/*.zig).src/lib_test.zigis just a harness that pulls them in. No DuckDB runtime is required. - Property-based tests live in
tests/property_tests.zigusing the Minish framework. Use these to fuzz parsers and verify invariants (like the "tokenizer never crashes on arbitrary input"). - Integration tests live in
tests/integration_tests.zig. They spawnduckdbas a child process, load the extension, run SQL, and assert on the output. Add a test here for any new SQL-visible function. - No SQL-facing change is complete without an integration test.
- Integration-test assertions must be uniquely attributable. The harness checks for substrings in CSV output, so
"ok","1","2", or"0."collide with capture status rows, hex signatures, timestamps, and captured SQL literals. Emit a distinctive marker from the finalselect, for exampleselect 'check_count=' || count(*) from ..., and assert on"check_count=2". Save bare-substring assertions for values that are already unique (recommendation kinds, reason fragments, specific column names).
Before coding:
- Identify whether the change touches C callbacks, Zig logic, or both.
- Check if new DuckDB API functions are needed and verify they exist in
duckdb_ext_api_v1. - Consider the capture or flush constraint if the change involves writing data from callbacks.
Before submitting:
make testpasses (all three suites).- Docs updated if the API surface changed.
- Keep commits scoped to one logical change.
- PR descriptions should include:
- Behavioral change summary.
- Tests added/updated.
- Interactive verification is done (yes/no).
- Docs updated (yes/no).
Suggested PR checklist:
- Inline unit tests added/updated for logic changes
- Integration test added for new SQL-visible functions
-
make testpasses (unit + property + integration) - Docs/README updated (if API surface changed)