Duckle v0.5.0 - Data Quality, Governance & Time-Travel
The biggest release since the platform went cross-OS. v0.5.0 turns Duckle into a
governance-grade studio: a full data-quality / MDM / governance component pack,
time-travel data diffing on DuckLake, column-level lineage, a browser-based
management console for running and monitoring pipelines, and high-throughput
bulk writes to SQL Server. Plus a first-run guided tour, a redesigned console,
and a fix for the startup console flash some Windows users hit.
Local-first as ever: one executable, no servers, no JVM, no control plane. The
component catalog now stands at 348.
Highlights
- Web Management Console (`duckle serve`) - operate every pipeline from a browser: job-grouped overview, run history, logs, and a built-in interval scheduler. One-click "Open web dashboard" button in the desktop app. (#75)
- Time-Travel + Data Diff - read any DuckLake table "AS OF" a snapshot/timestamp, browse snapshots, and diff two snapshots to see exactly what changed, with an AI-readable change summary.
- Data Quality / Governance / MDM pack - 17 new components for masking, survivorship, match-grouping, expectations, referential integrity, profiling, reconciliation, classification, SCD3, outliers, sessionization, freshness, contracts, and surrogate keys.
- Bulk SQL Server writes via the DuckDB `mssql` community extension (TDS, COPY/INSERT) instead of row-by-row inserts. (#86)
- Column-level lineage resolver across the whole pipeline (provenance foundation).
- Guided tour + redesigned console and a fix for the Windows startup console flash.
New features
Web Management Console (#75)
- Launch the console from a terminal with `duckle serve --workspace <path>` (the desktop binary delegates to the embedded headless runner; `duckle-runner serve` also works for the standalone runner). Zero-dependency std-only HTTP, embedded HTML, no extra binary on the release.
- Overview grouped by job: every pipeline as a card with last status, schedule, last-run time, duration, run-count and success-rate, with an expandable per-job run history and logs.
- Runs tab: a full execution timeline across all pipelines with status / period / search filters.
- Schedules tab: enable an interval per pipeline; the console runs due schedules itself while open.
- Run any pipeline from the browser; results, the compiled plan and logs are all reachable.
- Desktop integration: a green-glowing Open web dashboard button in the top bar boots the console and opens it in your browser. (e0a1b63, 0bcc63a)
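The Schedules tab behaviour (one interval per pipeline, with due schedules run while the console is open) can be pictured with a small sketch. This is not Duckle's implementation: the `Schedule` class, its fields, and `due_pipelines` are hypothetical names chosen for illustration, and treating a never-run schedule as immediately due is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Schedule:
    """Hypothetical per-pipeline interval schedule (not Duckle's real schema)."""
    pipeline: str
    interval_secs: int
    enabled: bool = True
    last_run: Optional[float] = None  # epoch seconds of the last run, None if never run


def due_pipelines(schedules, now):
    """Return the names of enabled pipelines whose interval has elapsed at `now`."""
    due = []
    for s in schedules:
        if not s.enabled:
            continue
        # Assumption: a schedule that has never run is due immediately.
        if s.last_run is None or now - s.last_run >= s.interval_secs:
            due.append(s.pipeline)
    return due


schedules = [
    Schedule("orders_daily", interval_secs=3600, last_run=1000.0),
    Schedule("customers_sync", interval_secs=300, last_run=4500.0),
    Schedule("audit_export", interval_secs=60, enabled=False),
    Schedule("first_time", interval_secs=600),
]
print(due_pipelines(schedules, now=4700.0))
```

In a model like this the check only happens while the loop is running, which matches the note that the console runs due schedules "while open".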
Time-travel & Data Diff (DuckLake)
- AS OF reads: query a DuckLake table at a specific snapshot version or timestamp.
- Snapshot inspector data source (`ducklake_snapshots`) to list a table's history.
- Browse Snapshots picker for the AS OF field, so you can pick a point in time visually.
- Data Diff node (`src.ducklake.diff`): the change feed between two snapshots (inserts / updates / deletes).
- AI diff summary (`xf.diffsummary`): a human-readable summary of what changed between snapshots.
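To make the idea of a change feed between two snapshots concrete, here is a minimal sketch that classifies rows as inserts, updates, or deletes by comparing two keyed row sets in memory. It says nothing about how `src.ducklake.diff` or `xf.diffsummary` are implemented; `diff_snapshots`, `summarize`, and the sample rows are all hypothetical.

```python
def diff_snapshots(old_rows, new_rows, key):
    """Classify row-level changes between two snapshots, matching rows on `key`."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    inserts = [new[k] for k in new if k not in old]
    deletes = [old[k] for k in old if k not in new]
    # An update is a key present on both sides whose row content differs.
    updates = [(old[k], new[k]) for k in new if k in old and old[k] != new[k]]
    return {"inserts": inserts, "updates": updates, "deletes": deletes}


def summarize(diff):
    """Produce a one-line, human-readable change summary."""
    return "{} inserted, {} updated, {} deleted".format(
        len(diff["inserts"]), len(diff["updates"]), len(diff["deletes"])
    )


# Hypothetical contents of the same table at two snapshots.
snapshot_a = [
    {"id": 1, "name": "Ada", "tier": "gold"},
    {"id": 2, "name": "Bo", "tier": "silver"},
    {"id": 3, "name": "Cy", "tier": "bronze"},
]
snapshot_b = [
    {"id": 1, "name": "Ada", "tier": "gold"},
    {"id": 2, "name": "Bo", "tier": "gold"},
    {"id": 4, "name": "Di", "tier": "bronze"},
]

diff = diff_snapshots(snapshot_a, snapshot_b, key="id")
print(summarize(diff))
```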
Data Quality, Governance & MDM component pack
17 new engine components (with full property manifests), no external service or LLM required:
- Masking / anonymization - in-place column masking for governance.
- Survivorship - golden-record survivorship (the MDM merge step).
- Match groups, expectations, and advanced sampling.
- Referential integrity - cross-input orphan / foreign-key checks.
- Advanced profiling, record linkage, reconciliation, and classification.
- Data contracts - a contract gate that fails a run on violation.
- Surrogate keys and labeled bucketize.
- SCD type 3, outlier detection (with a reject port), sessionization, and freshness checks.
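As one illustration of what a check in this pack does, the referential-integrity idea (finding child rows whose foreign key has no matching parent) can be sketched as below. The function name, the valid/orphan split, and the choice to let NULL foreign keys pass are assumptions for the example, not the component's documented behaviour.

```python
def find_orphans(child_rows, parent_rows, fk, pk):
    """Split child rows into (valid, orphans) by looking up `fk` in the parents' `pk`."""
    parent_keys = {p[pk] for p in parent_rows}
    valid, orphans = [], []
    for row in child_rows:
        # Assumption: a NULL foreign key is not an orphan, mirroring the
        # default behaviour of SQL FOREIGN KEY constraints.
        if row[fk] is None or row[fk] in parent_keys:
            valid.append(row)
        else:
            orphans.append(row)
    return valid, orphans


customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 3},     # no such customer
    {"order_id": 102, "customer_id": None},  # unknown customer
]

valid, orphans = find_orphans(orders, customers, fk="customer_id", pk="customer_id")
print([o["order_id"] for o in orphans])
```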
Bulk SQL Server writes (#86)
- `snk.sqlserver` / `snk.synapse` now bulk-load through the DuckDB `mssql` community extension (pure TDS, `ATTACH` + `COPY`/`INSERT`) instead of row-by-row driver inserts - a large speedup on big loads.
- On by default (`bulk: true`); set `bulk: false` to keep the fully offline row-by-row driver. Live-verified against SQL Server 2022.
Column-level lineage
- A column-level lineage resolver with cross-stage stitching back to root sources, exposed as `Engine::pipeline_column_lineage` - the foundation for impact analysis and provenance.
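Cross-stage stitching can be thought of as walking each output column upstream until only root-source columns remain. The sketch below shows that idea on a hand-written pipeline description; the data shape, the stage names, and `resolve_lineage` are hypothetical and are not the signature or output format of `Engine::pipeline_column_lineage`.

```python
def resolve_lineage(stages, stage, column):
    """Return the set of (root_source, column) pairs that feed `stage.column`.

    `stages` maps a stage name to {output_column: [(upstream_stage, upstream_column), ...]}.
    A name that does not appear in `stages` is treated as a root source.
    Assumes the pipeline is acyclic.
    """
    if stage not in stages:
        return {(stage, column)}
    roots = set()
    for up_stage, up_col in stages[stage].get(column, []):
        roots |= resolve_lineage(stages, up_stage, up_col)
    return roots


# Hypothetical three-stage pipeline: join -> currency conversion -> sink.
stages = {
    "join_orders": {
        "customer_name": [("customers.csv", "name")],
        "amount": [("orders.csv", "amount")],
        "fx_rate": [("rates.csv", "rate")],
    },
    "to_usd": {
        "customer_name": [("join_orders", "customer_name")],
        "amount_usd": [("join_orders", "amount"), ("join_orders", "fx_rate")],
    },
    "sink": {
        "customer": [("to_usd", "customer_name")],
        "total_usd": [("to_usd", "amount_usd")],
    },
}

print(sorted(resolve_lineage(stages, "sink", "total_usd")))
```

Reversing the same edges gives impact analysis: starting from a root column and walking downstream lists every output it can affect.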
Guided tour & console UX
- First-run guided tour: a spotlight walkthrough of the palette, canvas, properties, Run, and the web dashboard. Skippable on every step and replayable from Settings -> Replay guided tour.
- Redesigned management console with the real Duckle brand mark, status pills, and metric cards.
Enhancements
- #82 - bulk column rename from an external map file (JSON / CSV / YAML) in `xf.rename`.
- #83 - `src.csv` filename passthrough and extra DuckDB CSV read-options.
- #84 - auto-load the spatial extension in the SQL Template when the SQL uses spatial functions over a CSV.
- #10 - per-column date / timestamp format in `xf.cast`.
- #76 - per-source `ATTACH` aliases and Auto live-view predicate pushdown for multiple duck sources in one pipeline.
- #85 - Simplified Chinese (zh-CN) i18n update merged. Thanks to the contributor.
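For the bulk rename from a map file (#82), the general idea is a lookup from old to new column names applied across the schema. The sketch below assumes a flat JSON object of `old_name: new_name` pairs; the file layout that `xf.rename` actually expects is not described here, and `apply_rename_map` is a hypothetical helper.

```python
import json


def apply_rename_map(columns, rename_map):
    """Rename columns via the map; columns missing from the map keep their name."""
    renamed = [rename_map.get(c, c) for c in columns]
    # Guard against a map that collapses two columns into one name.
    if len(set(renamed)) != len(renamed):
        raise ValueError("rename map produces duplicate column names")
    return renamed


# Assumed map format: a flat JSON object of old -> new names.
map_text = '{"cust_id": "customer_id", "amt": "amount"}'
rename_map = json.loads(map_text)

print(apply_rename_map(["cust_id", "amt", "created_at"], rename_map))
```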
Bug fixes
- #39 - "merge" write mode now resolves its input columns through an upstream transform, instead of erroring when a transform sits between source and sink.
- #7 - SQL/code export now documents control-flow steps instead of emitting empty stages.
- Set operations - `INTERSECT` / `EXCEPT` no longer produce a parser error; `DISTINCT ON` ordering is deterministic.
- Secret redaction - connection secrets are stripped from run errors, run history, and NDJSON logs.
- `code.javascript` - preserves 64-bit integers (no BIGINT to DOUBLE precision loss).
- Sink nodes - removed the meaningless "View" materialize option (it caused a false-negative).
- Startup console flash (Windows) - the app no longer shells `git` during launch (the CI-status badge defers its first poll), and the git invocation is hardened (`core.fsmonitor=false` plus non-interactive env) so it can never spawn an fsmonitor / credential-manager / prompt child. The earlier eager dbt provisioning that spawned a `uv` console at launch was also removed.
- Guided tour - the tour tooltip is clamped into the viewport so steps anchored to large targets (the canvas) stay navigable.
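The `code.javascript` fix concerns a general property of IEEE 754 doubles, which hold integers exactly only up to 2^53. The sketch below shows the kind of loss that occurs when a 64-bit integer passes through a double; it uses Python's `float` (also a double) purely to demonstrate the arithmetic and is not Duckle code. `survives_double_round_trip` is a hypothetical helper.

```python
def survives_double_round_trip(n):
    """True if integer `n` is unchanged after being converted to a double and back."""
    return int(float(n)) == n


big = 2**53 + 1          # fits comfortably in a signed 64-bit integer
as_double = float(big)   # nearest representable double is 2**53

print(big)
print(int(as_double))
print(survives_double_round_trip(2**53))
print(survives_double_round_trip(big))
```

The second printed value is one less than the first, which is the silent off-by-one that matters for surrogate keys and other large identifiers.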
Issues addressed in this release
#7, #10, #39, #75, #76, #82, #83, #84, #85, #86.
(Issues are left open for the reporters to confirm against this build.)
Notes
- Windows SmartScreen: the binaries are not yet code-signed, so Windows may show a brief one-time SmartScreen check the first time you run a freshly downloaded `.exe`. This is the OS scanning an unsigned binary, not Duckle; subsequent launches are unaffected. Signed builds are on the roadmap.
- The `mssql` extension for SQL Server bulk writes is fetched once from the DuckDB community repository on first use (needs network that one time); set `bulk: false` on the SQL Server sink for fully offline operation.
Install
Download the binary for your OS from the assets below. No installer; it is the raw executable.
- Windows x64 / arm64
- macOS arm64 (Apple Silicon) / x64 (Intel)
- Linux x64 / arm64