Skip to content

Add validation for CardinalityEffect::equal when executing a plan#20875

Open
xanderbailey wants to merge 3 commits intoapache:mainfrom
xanderbailey:xb/add_validation_for_dropping_rows
Open

Add validation for CardinalityEffect::equal when executing a plan#20875
xanderbailey wants to merge 3 commits intoapache:mainfrom
xanderbailey:xb/add_validation_for_dropping_rows

Conversation

@xanderbailey
Copy link
Contributor

Which issue does this PR close?

N/A — preventive measure inspired by #20683 / #20672 where RepartitionExec silently dropped rows when spilling under memory pressure.

Rationale for this change

The bug fixed in #20672 (repartition dropping data when spilling) was a silent correctness issue — queries returned wrong results with no error. This class of bug is particularly dangerous because it can go undetected.

DataFusion operators already declare their expected cardinality effect via CardinalityEffect::Equal (meaning "I output exactly as many rows as I receive"), but nothing actually verified this at runtime. This PR adds a post-execution sanity check that catches any such violation, so bugs like #20683 can be detected immediately rather than silently producing wrong results.

What changes are included in this PR?

  1. New session config flag datafusion.execution.verify_cardinality_effect (default: false) in ExecutionOptions
  2. New cardinality_check module in datafusion-physical-plan with a validate_cardinality_effect() function that walks the executed plan tree using ExecutionPlanVisitor and verifies that every operator declaring CardinalityEffect::Equal produced the same number of output rows as its input (based on post-execution metrics)
  3. Hooks in collect() and collect_partitioned() that run the validation after all streams are consumed, when the config flag is enabled
  4. #[derive(Debug, Clone, Copy)] on CardinalityEffect — the enum was missing these basic derives

The check gracefully skips operators where metrics are unavailable, where fetch limits are set, or where the operator is not unary (one in - one out). When a violation is detected, it returns DataFusionError::Internal with the operator name and mismatched row counts.

Are these changes tested?

Yes. Six unit tests cover:

  • Equal cardinality passes (matching row counts)
  • Equal cardinality violation detected (mismatched row counts)
  • Skips operators with fetch set
  • Skips operators without metrics
  • Skips non-Equal operators
  • Detects violations in nested plan trees

Are there any user-facing changes?

New config option datafusion.execution.verify_cardinality_effect (default false). When enabled via SET execution.verify_cardinality_effect = true, DataFusion will validate row count invariants after query execution and return an error if any operator that should preserve row counts (like RepartitionExec, ProjectionExec, CoalesceBatchesExec, SortExec, etc.) produces a different number of rows than its input.

@github-actions github-actions bot added documentation Improvements or additions to documentation common Related to common crate physical-plan Changes to the physical-plan crate labels Mar 11, 2026
/// number of output rows as their input. This is a post-execution
/// sanity check useful for debugging correctness issues.
/// Disabled by default as it adds a small amount of overhead.
pub verify_cardinality_effect: bool, default = false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to have this just be on by default... The cost IMO is pretty small, it's a single tree walk and an extra Arc::clone. It may also be that there are current bugs in CardinalityEffect::Equal declaration?

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate documentation Improvements or additions to documentation physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant