Kyuubi Dangerous Join Watchdog detects risky join planning patterns before query execution. It helps reduce accidental Cartesian products, oversized broadcast attempts, and long-running nested loop joins.
In shared SQL gateway environments, a single risky join can consume excessive driver memory or create very slow jobs. The Dangerous Join Watchdog adds planning-time checks for these high-risk patterns.
- Rule 1: Equi-join is marked dangerous when it degrades to a Cartesian pattern.
- Rule 2: Equi-join is marked dangerous when the estimated build side exceeds the configured broadcast ratio threshold.
- Rule 1: Non-equi join is marked dangerous when both sides exceed the broadcast threshold, which effectively creates a Cartesian product risk.
- Rule 2: Non-equi join is marked dangerous when no build side can be selected and the plan falls back to a second broadcast nested loop join (BNLJ) pattern.
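
The four rules above can be read as a small decision procedure. The following is a minimal sketch of that reading, not Kyuubi's implementation: `JoinInfo`, `dangerous_reason`, and the `limit` parameter are hypothetical names, and treating the smaller input as the build side is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JoinInfo:
    """Hypothetical summary of a planned join; not a Kyuubi or Spark class."""
    is_equi: bool
    degrades_to_cartesian: bool
    left_size: int   # estimated size in bytes
    right_size: int  # estimated size in bytes
    build_side_selectable: bool = True


def dangerous_reason(join: JoinInfo, limit: int) -> Optional[str]:
    """Return a label for the first matching rule, or None if no rule fires."""
    if join.is_equi:
        if join.degrades_to_cartesian:
            return "equi-join rule 1: Cartesian pattern"
        # Assumption: the build side is the smaller of the two inputs.
        if min(join.left_size, join.right_size) > limit:
            return "equi-join rule 2: build side over limit"
        return None
    if join.left_size > limit and join.right_size > limit:
        return "non-equi rule 1: both sides over limit"
    if not join.build_side_selectable:
        return "non-equi rule 2: BNLJ fallback"
    return None


# A non-equi join whose inputs are both larger than the limit.
print(dangerous_reason(JoinInfo(False, False, 200, 300), limit=100))
```

Returning a reason label rather than a boolean keeps the sketch close to the diagnostics the watchdog reports, where each decision carries a `reason`.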
| Name | Default | Meaning |
|---|---|---|
| kyuubi.watchdog.dangerousJoin.enabled | false | Enable or disable dangerous join detection |
| kyuubi.watchdog.dangerousJoin.broadcastRatio | 0.8 | Ratio against Spark broadcast threshold for warning/reject decision |
| kyuubi.watchdog.dangerousJoin.action | WARN | WARN logs diagnostics; REJECT throws exception and rejects submission |
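
To make the ratio setting concrete, here is a small arithmetic sketch. It assumes the effective size limit is the Spark broadcast threshold multiplied by `broadcastRatio`; the exact formula Kyuubi applies is not stated here, and `effective_limit` is a hypothetical helper name. The 10485760-byte threshold is Spark's default for `spark.sql.autoBroadcastJoinThreshold` (10 MB).

```python
def effective_limit(broadcast_threshold_bytes: int, broadcast_ratio: float) -> int:
    """Size limit in bytes, assuming limit = threshold * ratio (an assumption)."""
    return int(broadcast_threshold_bytes * broadcast_ratio)


# Spark's default broadcast threshold (10 MB) with the default ratio of 0.8.
limit = effective_limit(10 * 1024 * 1024, 0.8)
print(limit)  # prints 8388608
```

Under this assumption, the default settings would flag a build side estimated above 8 MiB, even though Spark's own broadcast threshold is 10 MiB.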
- Put the Kyuubi Spark extension jar on the Spark classpath.
- Configure SQL extensions:

      spark.sql.extensions=org.apache.kyuubi.sql.KyuubiSparkSQLExtension,org.apache.kyuubi.sql.watchdog.KyuubiDangerousJoinExtension

- Configure action:

      kyuubi.watchdog.dangerousJoin.action=WARN

  or

      kyuubi.watchdog.dangerousJoin.action=REJECT

When action is WARN, Kyuubi writes a structured JSON payload:
KYUUBI_LOG_KEY={"sql":"SELECT ...","joinType":"INNER","reason":"Cartesian","leftSize":10485760,"rightSize":15728640,"broadcastThreshold":10485760,"broadcastRatio":0.8}
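
Because the payload is JSON, it can be pulled out of a log line for alerting or auditing. The sketch below uses only the Python standard library; `parse_watchdog_line` is a hypothetical helper, and it assumes the JSON object starts immediately after the `KYUUBI_LOG_KEY=` marker, as in the sample above.

```python
import json
from typing import Optional

LOG_MARKER = "KYUUBI_LOG_KEY="


def parse_watchdog_line(line: str) -> Optional[dict]:
    """Return the decoded payload, or None if the marker is absent."""
    idx = line.find(LOG_MARKER)
    if idx < 0:
        return None
    # raw_decode stops at the end of the JSON object, so any text that a
    # log appender adds after the payload is ignored.
    payload, _end = json.JSONDecoder().raw_decode(line, idx + len(LOG_MARKER))
    return payload


sample = (
    'KYUUBI_LOG_KEY={"sql":"SELECT ...","joinType":"INNER","reason":"Cartesian",'
    '"leftSize":10485760,"rightSize":15728640,'
    '"broadcastThreshold":10485760,"broadcastRatio":0.8}'
)
payload = parse_watchdog_line(sample)
print(payload["reason"], payload["leftSize"], payload["rightSize"])
```

Searching for the marker instead of requiring it at the start of the line lets the same helper handle lines that carry a timestamp or logger prefix.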
When action is REJECT, query submission fails with:
errorCode=41101
Query rejected due to dangerous join strategy: {...details...}
- Disable watchdog:

      kyuubi.watchdog.dangerousJoin.enabled=false

- Increase tolerance:

      kyuubi.watchdog.dangerousJoin.broadcastRatio=0.95

Dangerous Join Watchdog runs in the planner strategy phase and evaluates pre-execution plan statistics. AQE may still optimize runtime plans, but watchdog decisions are made before query execution starts.