Commit de5c72b

fix: skip COUNT marker for extension-ATTACH source stages
DuckDB v1.5's binder consistently rejects the marker shape 'SELECT COUNT(*) AS r FROM <extension_view>' inside a batched session, where <extension_view> is a view backed by an extension's ATTACH machinery (postgres, mysql, sqlite, duckdb, motherduck, etc.). Three earlier marker variants all tripped it ("Failed to bind column reference count_star() / r / _duckle_r"). Per-stage mode avoids the binder bug only because each spawn is a fresh session; in one batched session the same shape fails reproducibly under mysql-integration in CI.

Pre-compute the set of extension-backed node ids at the start of execute_batched and emit count-less markers ('SELECT NULL AS _duckle_r') for any stage whose count target lands on one of them (view stage = its own node id; sink stage = its `from` upstream). Also skip preview generation for those view stages for the same reason.

Cost: rows is None in batched mode for stages reading from an extension source.

Net: pipelines that mix file + extension sources still batch the file-only portion; their extension portion just loses the per-stage row count.
1 parent 8ba6bd2 commit de5c72b
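
The marker rule described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the `Stage` and `StageKind` types, the `marker_sql` function, and the double-quoting of the view name are all assumptions made for the example, and the existing `count_unsafe` check visible in the diff is left out.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for the engine's stage kinds.
enum StageKind<'a> {
    /// A view stage counts rows of its own node id.
    View,
    /// A sink stage counts rows of the upstream node named by `from`.
    Sink { from: &'a str },
}

/// Hypothetical, trimmed-down stage record.
struct Stage<'a> {
    node_id: &'a str,
    kind: StageKind<'a>,
}

/// Build the marker statement for one stage (illustrative only).
/// If the count target is an extension-backed view, emit the
/// count-less marker so the batched session never binds COUNT(*)
/// against it.
fn marker_sql(stage: &Stage, extension_node_ids: &HashSet<&str>) -> String {
    let target = match stage.kind {
        StageKind::View => stage.node_id,
        StageKind::Sink { from } => from,
    };
    if extension_node_ids.contains(target) {
        "SELECT NULL AS _duckle_r".to_string()
    } else {
        // Quoting the identifier is an assumption of this sketch.
        format!("SELECT COUNT(*) AS _duckle_r FROM \"{}\"", target)
    }
}
```

With a set containing only a hypothetical node id `pg_src`, a view stage on a file-backed node keeps its `COUNT(*)` marker, while both the `pg_src` view itself and any sink whose `from` is `pg_src` fall back to `SELECT NULL AS _duckle_r`.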

1 file changed

Lines changed: 40 additions & 0 deletions

crates/duckdb-engine/src/lib.rs

@@ -874,6 +874,41 @@ impl DuckdbEngine {
                 .replace('\'', "''")
         };

+        // Pre-compute the set of node ids whose stage SQL produces a
+        // view backed by a DuckDB extension's ATTACH machinery
+        // (postgres, mysql, sqlite, duckdb, motherduck, etc.). Their
+        // views work for sinks downstream that do plain COPY, but
+        // they break the marker's `SELECT COUNT(*) AS r FROM <v>`:
+        // DuckDB's binder rejects the aliased-aggregate shape inside
+        // a batched session with "Failed to bind column reference r".
+        // Per-stage avoids it because each spawn is a fresh session.
+        // For these stages and any sink that reads from them, emit a
+        // count-less marker; we lose batched-mode row counts on
+        // those, but the pipeline still runs and the perf win for
+        // the rest of the pipeline is preserved.
+        let extension_attach = |cid: &str| -> bool {
+            matches!(
+                cid,
+                "src.postgres"
+                    | "src.cockroach"
+                    | "src.pgvector"
+                    | "src.redshift"
+                    | "src.mysql"
+                    | "src.mariadb"
+                    | "src.motherduck"
+                    | "src.ducklake"
+                    | "src.bigquery"
+                    | "src.quack"
+                    | "src.duckdb"
+                    | "src.sqlite"
+            )
+        };
+        let extension_node_ids: std::collections::HashSet<&str> = stages
+            .iter()
+            .filter(|s| extension_attach(&s.component_id))
+            .map(|s| s.node_id.as_str())
+            .collect();
+
         // Build the batched SQL: secret prefix, PRAGMA preset (once),
         // then per-stage SQL + per-stage markers + per-view previews.
         let mut batched_sql = String::new();
@@ -944,6 +979,10 @@ impl DuckdbEngine {
                 plan::StageKind::View if !count_unsafe => Some(stage.node_id.as_str()),
                 plan::StageKind::View => None,
             };
+            // Skip the COUNT(*) entirely if the target is an extension
+            // ATTACH view; the binder bug above otherwise aborts the
+            // batch.
+            let count_target = count_target.filter(|t| !extension_node_ids.contains(t));
             // Marker shape is just `SELECT COUNT(*) AS _duckle_r FROM <t>`
             // (or `SELECT NULL AS _duckle_r` when there's no countable
             // target). No string-literal projected alongside the
@@ -979,6 +1018,7 @@ impl DuckdbEngine {
             if matches!(stage.kind, plan::StageKind::View)
                 && stage.component_id != "ctl.switch"
                 && stage.component_id != "xf.assert"
+                && !extension_node_ids.contains(stage.node_id.as_str())
             {
                 let schema = marker_dir.join(format!("{}_schema.json", i));
                 let rows = marker_dir.join(format!("{}_rows.json", i));
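
To see the pre-compute and filter steps from the diff in isolation, here is a small standalone sketch. It assumes a trimmed-down `Stage` struct with only the two fields the diff reads; the helper names `is_extension_attach`, `extension_node_ids`, and `filter_count_target` are made up for this example (the commit uses a closure and inline expressions inside `execute_batched`).

```rust
use std::collections::HashSet;

/// Hypothetical stage record with only the fields the diff reads.
struct Stage {
    node_id: String,
    component_id: String,
}

/// Same component-id list as the `extension_attach` closure in the diff.
fn is_extension_attach(component_id: &str) -> bool {
    matches!(
        component_id,
        "src.postgres"
            | "src.cockroach"
            | "src.pgvector"
            | "src.redshift"
            | "src.mysql"
            | "src.mariadb"
            | "src.motherduck"
            | "src.ducklake"
            | "src.bigquery"
            | "src.quack"
            | "src.duckdb"
            | "src.sqlite"
    )
}

/// Collect the node ids of all extension-backed stages once, up front.
fn extension_node_ids(stages: &[Stage]) -> HashSet<&str> {
    stages
        .iter()
        .filter(|s| is_extension_attach(&s.component_id))
        .map(|s| s.node_id.as_str())
        .collect()
}

/// Drop the count target when it names an extension-backed node, so the
/// caller falls through to the count-less marker.
fn filter_count_target<'a>(
    count_target: Option<&'a str>,
    ext: &HashSet<&str>,
) -> Option<&'a str> {
    count_target.filter(|t| !ext.contains(*t))
}
```

Building the set once keeps each per-stage check a single hash lookup, and `Option::filter` turns an extension-backed target into `None` without touching the earlier `match` that chose the target.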
