Duckle v0.5.0

github-actions released this 22 Jun 12:52

Duckle v0.5.0 - Data Quality, Governance & Time-Travel

The biggest release since the platform went cross-OS. v0.5.0 turns Duckle into a
governance-grade studio: a full data-quality / MDM / governance component pack,
time-travel data diffing on DuckLake, column-level lineage, a browser-based
management console for running and monitoring pipelines, and high-throughput
bulk writes to SQL Server. Plus a first-run guided tour, a redesigned console,
and a fix for the startup console flash some Windows users hit.

Local-first as ever: one executable, no servers, no JVM, no control plane. The
component catalog now stands at 348.


Highlights

  • Web Management Console (duckle serve) - operate every pipeline from a browser: job-grouped overview, run history, logs, and a built-in interval scheduler. One-click "Open web dashboard" button in the desktop app. (#75)
  • Time-Travel + Data Diff - read any DuckLake table "AS OF" a snapshot/timestamp, browse snapshots, and diff two snapshots to see exactly what changed, with an AI-readable change summary.
  • Data Quality / Governance / MDM pack - 17 new components for masking, survivorship, match-grouping, expectations, referential integrity, profiling, reconciliation, classification, SCD3, outliers, sessionization, freshness, contracts, and surrogate keys.
  • Bulk SQL Server writes via the DuckDB mssql community extension (TDS, COPY/INSERT) instead of row-by-row inserts. (#86)
  • Column-level lineage resolver across the whole pipeline (provenance foundation).
  • Guided tour + redesigned console and a fix for the Windows startup console flash.

New features

Web Management Console (#75)

  • Launch the console from a terminal with duckle serve --workspace <path> (the desktop binary delegates to the embedded headless runner; duckle-runner serve also works for the standalone runner). The server is zero-dependency, std-only HTTP with embedded HTML, and adds no extra binary to the release.
  • Overview grouped by job: every pipeline as a card with last status, schedule, last-run time, duration, run-count and success-rate, with an expandable per-job run history and logs.
  • Runs tab: a full execution timeline across all pipelines with status / period / search filters.
  • Schedules tab: enable an interval per pipeline; the console runs due schedules itself while open.
  • Run any pipeline from the browser; results, the compiled plan and logs are all reachable.
  • Desktop integration: a green-glowing Open web dashboard button in the top bar boots the console and opens it in your browser. (e0a1b63, 0bcc63a)
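
The interval scheduling described for the Schedules tab (one interval per pipeline, with due schedules run while the console is open) can be pictured with a small sketch. This is not Duckle's code: the Schedule class, its fields, and due_pipelines are hypothetical names chosen for illustration, and the real console may track schedules differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Schedule:
    """Hypothetical per-pipeline interval schedule (not Duckle's real type)."""
    pipeline: str
    interval_seconds: int
    enabled: bool = True
    last_run: Optional[float] = None  # epoch seconds of the last run, if any


def due_pipelines(schedules, now):
    """Return the names of enabled pipelines whose interval has elapsed at `now`."""
    due = []
    for s in schedules:
        if not s.enabled:
            continue
        # Assumption: a schedule that has never run is treated as due immediately.
        if s.last_run is None or now - s.last_run >= s.interval_seconds:
            due.append(s.pipeline)
    return due


schedules = [
    Schedule("orders_daily", interval_seconds=3600, last_run=0.0),
    Schedule("customers_sync", interval_seconds=600, last_run=3300.0),
    Schedule("paused_job", interval_seconds=60, enabled=False),
]
print(due_pipelines(schedules, now=3600.0))  # ['orders_daily']
```

A loop that calls a check like this on a timer only fires while the process is alive, which matches the note that the console runs due schedules "while open".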

Time-travel & Data Diff (DuckLake)

  • AS OF reads: query a DuckLake table at a specific snapshot version or timestamp.
  • Snapshot inspector data source (ducklake_snapshots) to list a table's history.
  • Browse Snapshots picker for the AS OF field, so you can pick a point in time visually.
  • Data Diff node (src.ducklake.diff): the change feed between two snapshots (inserts / updates / deletes).
  • AI diff summary (xf.diffsummary): a human-readable summary of what changed between snapshots.
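
To make the shape of a change feed between two snapshots concrete, here is a minimal sketch that classifies rows as inserts, updates, or deletes. It is an illustration only: diff_snapshots and the in-memory row dictionaries are assumptions, and src.ducklake.diff may compute and report changes differently.

```python
def diff_snapshots(old_rows, new_rows, key):
    """Classify changes between two snapshots of a table, matched on `key`.

    Hypothetical helper: rows are plain dicts and `key` names a unique column.
    """
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    inserts = [new[k] for k in new if k not in old]
    deletes = [old[k] for k in old if k not in new]
    # An update is a key present in both snapshots whose row content differs.
    updates = [(old[k], new[k]) for k in new if k in old and old[k] != new[k]]
    return {"inserts": inserts, "updates": updates, "deletes": deletes}


snapshot_a = [
    {"id": 1, "status": "new"},
    {"id": 2, "status": "paid"},
    {"id": 3, "status": "paid"},
]
snapshot_b = [
    {"id": 1, "status": "shipped"},
    {"id": 2, "status": "paid"},
    {"id": 4, "status": "new"},
]
changes = diff_snapshots(snapshot_a, snapshot_b, key="id")
for kind, rows in changes.items():
    print(kind, rows)
```

Here row 1 is reported as an update, row 3 as a delete, row 4 as an insert, and the unchanged row 2 is omitted.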

Data Quality, Governance & MDM component pack

17 new engine components (with full property manifests), no external service or LLM required:

  • Masking / anonymization - in-place column masking for governance.
  • Survivorship - golden-record survivorship (the MDM merge step).
  • Match groups, expectations, and advanced sampling.
  • Referential integrity - cross-input orphan / foreign-key checks.
  • Advanced profiling, record linkage, reconciliation, and classification.
  • Data contracts - a contract gate that fails a run on violation.
  • Surrogate keys and labeled bucketize.
  • SCD type 3, outlier detection (with a reject port), sessionization, and freshness checks.
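
As an illustration of what a cross-input orphan check involves, the sketch below flags child rows whose foreign key has no match in the parent input. The function name, column names, and the choice to ignore null foreign keys are all assumptions made for this example, not the behaviour of the Duckle component.

```python
def find_orphans(child_rows, parent_rows, fk, pk):
    """Return child rows whose `fk` value is absent from the parents' `pk` values.

    Assumption: a null foreign key is not counted as an orphan, following the
    usual SQL foreign-key convention.
    """
    parent_keys = {row[pk] for row in parent_rows}
    return [
        row for row in child_rows
        if row[fk] is not None and row[fk] not in parent_keys
    ]


customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 99},    # no such customer
    {"order_id": 12, "customer_id": None},  # unknown customer, not an orphan here
]
print(find_orphans(orders, customers, fk="customer_id", pk="customer_id"))
# [{'order_id': 11, 'customer_id': 99}]
```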

Bulk SQL Server writes (#86)

  • snk.sqlserver / snk.synapse now bulk-load through the DuckDB mssql community extension (pure TDS, ATTACH + COPY/INSERT) instead of row-by-row driver inserts - a large speedup on big loads.
  • On by default (bulk: true); set bulk: false to keep the fully offline row-by-row driver. Live-verified against SQL Server 2022.

Column-level lineage

  • A column-level lineage resolver with cross-stage stitching back to root sources, exposed as Engine::pipeline_column_lineage - the foundation for impact analysis and provenance.
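
The idea of stitching column lineage across stages back to root sources can be sketched as a graph walk. This is a conceptual model only: the (stage, column) tuples, the lineage dictionary, and resolve_roots are hypothetical, and they say nothing about the actual signature or return type of Engine::pipeline_column_lineage.

```python
def resolve_roots(lineage, stage, column, _seen=None):
    """Walk column-level edges upstream and return the root source columns.

    `lineage` maps a (stage, column) node to the list of upstream
    (stage, column) nodes it is derived from. A node with no entry is a root.
    """
    seen = set() if _seen is None else _seen
    node = (stage, column)
    if node in seen:  # already visited; also guards against cycles
        return set()
    seen.add(node)
    parents = lineage.get(node, [])
    if not parents:
        return {node}
    roots = set()
    for parent_stage, parent_column in parents:
        roots |= resolve_roots(lineage, parent_stage, parent_column, seen)
    return roots


# Hypothetical three-stage pipeline: two sources -> join -> aggregate.
lineage = {
    ("join", "full_name"): [("src_customers", "first"), ("src_customers", "last")],
    ("join", "amount"): [("src_orders", "amount")],
    ("agg", "name_len"): [("join", "full_name")],
    ("agg", "total"): [("join", "amount")],
}
print(sorted(resolve_roots(lineage, "agg", "name_len")))
# [('src_customers', 'first'), ('src_customers', 'last')]
```

Impact analysis is the same walk in the opposite direction: invert the edges and ask which downstream columns depend on a given source column.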

Guided tour & console UX

  • First-run guided tour: a spotlight walkthrough of the palette, canvas, properties, Run, and the web dashboard. Skippable on every step and replayable from Settings -> Replay guided tour.
  • Redesigned management console with the real Duckle brand mark, status pills, and metric cards.

Enhancements

  • #82 - bulk column rename from an external map file (JSON / CSV / YAML) in xf.rename.
  • #83 - src.csv filename passthrough and extra DuckDB CSV read-options.
  • #84 - auto-load the spatial extension in the SQL Template when the SQL uses spatial functions over a CSV.
  • #10 - per-column date / timestamp format in xf.cast.
  • #76 - per-source ATTACH aliases and Auto live-view predicate pushdown for multiple duck sources in one pipeline.
  • #85 - Simplified Chinese (zh-CN) i18n update merged. Thanks to the contributor.

Bug fixes

  • #39 - "merge" write mode now resolves its input columns through an upstream transform, instead of erroring when a transform sits between source and sink.
  • #7 - SQL/code export now documents control-flow steps instead of emitting empty stages.
  • Set operations - INTERSECT / EXCEPT no longer produce a parser error; DISTINCT ON ordering is deterministic.
  • Secret redaction - connection secrets are stripped from run errors, run history, and NDJSON logs.
  • code.javascript - preserves 64-bit integers (no BIGINT to DOUBLE precision loss).
  • Sink nodes - removed the meaningless "View" materialize option (it caused a false-negative).
  • Startup console flash (Windows) - the app no longer shells out to git during launch (the CI-status badge defers its first poll). The git invocation is also hardened (core.fsmonitor=false plus a non-interactive env) so it cannot spawn an fsmonitor, credential-manager, or prompt child process. The earlier eager dbt provisioning, which spawned a uv console at launch, was removed as well.
  • Guided tour - the tour tooltip is clamped into the viewport so steps anchored to large targets (the canvas) stay navigable.
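
For context on the code.javascript fix above: a 64-bit float (DOUBLE) has a 53-bit significand, so integers beyond 2^53 cannot all be represented exactly. The short demonstration below shows the general effect in Python; it illustrates the class of precision loss, not Duckle's internals.

```python
# 2**53 + 1 fits comfortably in a signed 64-bit integer (BIGINT) ...
big = 2**53 + 1
print(big)                     # 9007199254740993

# ... but converting it to a 64-bit float rounds it to a neighbouring value.
as_double = float(big)
print(int(as_double))          # 9007199254740992
print(int(as_double) == big)   # False
```

Identifiers and counters stored as BIGINT are the typical victims, since an off-by-one value silently points at a different row.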

Issues addressed in this release

#7, #10, #39, #75, #76, #82, #83, #84, #85, #86.

(Issues are left open for the reporters to confirm against this build.)


Notes

  • Windows SmartScreen: the binaries are not yet code-signed, so Windows may show a brief one-time SmartScreen check the first time you run a freshly downloaded .exe. The check comes from the OS scanning an unsigned binary, not from Duckle itself; subsequent launches are unaffected. Signed builds are on the roadmap.
  • The mssql extension for SQL Server bulk writes is fetched once from the DuckDB community repository on first use (needs network that one time); set bulk: false on the SQL Server sink for fully offline operation.

Install

Download the binary for your OS from the assets below. No installer; it is the raw executable.

  • Windows x64 / arm64
  • macOS arm64 (Apple Silicon) / x64 (Intel)
  • Linux x64 / arm64