fix(tlp): avoid false positives in TLP by kumarUjjawal · Pull Request #6 · datafusion-contrib/datafusion-fuzzer

kumarUjjawal · 2026-04-18T11:59:15Z

Summary

This PR fixes false positives in the TLP oracles.

The main change is that TLP WHERE and HAVING predicates now use a safe boolean generator. That keeps the predicates well-typed and valid for TLP checks.

The NoCrash path still keeps broader invalid-expression coverage, so this does not reduce the general crash-fuzzing goal.

Why

Before this change, the fuzzer could generate invalid predicates for TLP queries.

That caused cases where:

- the base query succeeded
- the partitioned TLP query failed during planning or type coercion

Those cases were reported as oracle failures, but they were fuzzer-side false positives rather than real DataFusion correctness bugs.
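To make the distinction concrete, here is a minimal sketch of a TLP check over in-memory rows. It is an illustration only, not the fuzzer's code: `Tri`, `TlpOutcome`, and `tlp_check` are hypothetical names, `None` stands in for SQL NULL, and an `Err` from the predicate stands in for a planning or type-coercion failure.

```rust
/// Three-valued result of evaluating a predicate on one row.
/// `None` models SQL NULL. (Hypothetical type, for illustration only.)
type Tri = Option<bool>;

#[derive(Debug, PartialEq)]
enum TlpOutcome {
    /// The three partitions together reproduce the base result.
    Pass,
    /// The partitions disagree with the base result: a real oracle failure.
    Mismatch,
    /// The predicate itself was rejected: a generator problem, not an engine bug.
    InvalidPredicate,
}

/// Checks that `WHERE p`, `WHERE NOT p`, and `WHERE p IS NULL`
/// together return exactly the rows of the unfiltered base query.
fn tlp_check<F>(rows: &[i64], pred: F) -> TlpOutcome
where
    F: Fn(i64) -> Result<Tri, String>,
{
    let mut partitioned: Vec<i64> = Vec::new();
    // One pass per partition: p is TRUE, p is FALSE, p is NULL.
    for want in [Some(true), Some(false), None].iter() {
        for &row in rows {
            match pred(row) {
                // Models the partitioned query failing during planning.
                Err(_) => return TlpOutcome::InvalidPredicate,
                Ok(v) if v == *want => partitioned.push(row),
                Ok(_) => {}
            }
        }
    }
    // Compare as multisets by sorting both sides.
    let mut base = rows.to_vec();
    base.sort();
    partitioned.sort();
    if base == partitioned {
        TlpOutcome::Pass
    } else {
        TlpOutcome::Mismatch
    }
}
```

In this sketch only `Mismatch` points at a possible engine bug. Reporting `InvalidPredicate` as if it were `Mismatch` is the false positive described above.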

What changed

- restrict TLP predicate generation to safe boolean expressions
- keep the old broader expression generation for non-TLP fuzzing
- add regression coverage for valid boolean predicate execution
- update deterministic integration snapshots
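The idea behind a safe boolean generator can be sketched as follows. This is a simplified illustration under assumptions, not the code in this PR: the `Expr` AST, `type_of`, the `Rng` helper (a plain xorshift64), and `gen_safe_bool` are all hypothetical. The point is that every production only composes operands of the type it requires, so the result is boolean by construction.

```rust
/// Tiny expression AST (hypothetical, for illustration only).
#[derive(Debug, Clone)]
enum Expr {
    IntCol(&'static str),
    IntLit(i64),
    BoolLit(bool),
    Gt(Box<Expr>, Box<Expr>),  // int > int   -> bool
    And(Box<Expr>, Box<Expr>), // bool AND bool -> bool
    Not(Box<Expr>),            // NOT bool    -> bool
    IsNull(Box<Expr>),         // any IS NULL -> bool
}

#[derive(Debug, PartialEq)]
enum Ty {
    Int,
    Bool,
}

/// Returns the type of a well-typed expression, or `None` if it is ill-typed.
fn type_of(e: &Expr) -> Option<Ty> {
    match e {
        Expr::IntCol(_) | Expr::IntLit(_) => Some(Ty::Int),
        Expr::BoolLit(_) => Some(Ty::Bool),
        Expr::Gt(l, r) => match (type_of(l)?, type_of(r)?) {
            (Ty::Int, Ty::Int) => Some(Ty::Bool),
            _ => None,
        },
        Expr::And(l, r) => match (type_of(l)?, type_of(r)?) {
            (Ty::Bool, Ty::Bool) => Some(Ty::Bool),
            _ => None,
        },
        Expr::Not(x) => match type_of(x)? {
            Ty::Bool => Some(Ty::Bool),
            Ty::Int => None,
        },
        Expr::IsNull(x) => type_of(x).map(|_| Ty::Bool),
    }
}

/// Minimal xorshift64 generator so the sketch needs no external crates.
/// The seed must be non-zero.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Generates an integer-typed leaf.
fn gen_int(rng: &mut Rng) -> Expr {
    if rng.below(2) == 0 {
        Expr::IntCol("a")
    } else {
        Expr::IntLit(rng.below(100) as i64)
    }
}

/// Generates an expression that is boolean by construction.
fn gen_safe_bool(rng: &mut Rng, depth: u32) -> Expr {
    if depth == 0 {
        return Expr::BoolLit(rng.below(2) == 0);
    }
    match rng.below(4) {
        0 => Expr::Gt(Box::new(gen_int(rng)), Box::new(gen_int(rng))),
        1 => Expr::And(
            Box::new(gen_safe_bool(rng, depth - 1)),
            Box::new(gen_safe_bool(rng, depth - 1)),
        ),
        2 => Expr::Not(Box::new(gen_safe_bool(rng, depth - 1))),
        _ => Expr::IsNull(Box::new(gen_int(rng))),
    }
}
```

The cost of this style is visible in the sketch: each new boolean-returning expression needs its own arm in `gen_safe_bool`, whereas a best-effort generator can pick operands freely and accept that some results are ill-typed.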

2010YOUY01 · 2026-04-19T03:21:58Z

Thank you for working on this!

The current expression generation strategy is to try to produce an expression of the target type on a best-effort basis. For example, generate_random_expr(BOOLEAN) returns a valid boolean expression about 80% of the time; in the remaining cases, it produces either an invalid expression or one with a different type.

This design is simpler and introduces more randomness, which makes it more likely to detect panics or crashes from invalid queries.

The tradeoff introduced by this PR is that we now need to implement a dedicated generator for valid boolean expressions. This also requires updating it whenever new boolean-returning expressions are added. In return, the valid query rate for TLP oracle testing improves from ~80% to 100%.

I’d like to think about this a bit more. For now, I prefer to keep things simple. However:

- If the query success rate becomes too low, this feature would be necessary.
- Alternatively, we could try improving the current “best-effort” generator to increase the valid type rate.
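One way to decide when the success rate has become "too low" is to track it during a fuzzing run. The following is a minimal sketch under assumptions: `ValidityStats` and its methods are hypothetical and not part of the fuzzer, and the threshold is left to the caller.

```rust
/// Running count of valid and invalid generated queries.
/// (Hypothetical helper, for illustration only.)
#[derive(Default)]
struct ValidityStats {
    valid: u64,
    invalid: u64,
}

impl ValidityStats {
    /// Records whether one generated query planned and executed successfully.
    fn record(&mut self, query_ok: bool) {
        if query_ok {
            self.valid += 1;
        } else {
            self.invalid += 1;
        }
    }

    /// Fraction of valid queries so far, or `None` before any query is recorded.
    fn valid_rate(&self) -> Option<f64> {
        let total = self.valid + self.invalid;
        if total == 0 {
            None
        } else {
            Some(self.valid as f64 / total as f64)
        }
    }

    /// True when at least one query was recorded and the rate is under `threshold`.
    fn is_below(&self, threshold: f64) -> bool {
        matches!(self.valid_rate(), Some(rate) if rate < threshold)
    }
}
```

With 8 valid and 2 invalid queries recorded, `valid_rate()` is 0.8, which corresponds to the roughly 80% figure mentioned above for the best-effort generator.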

kumarUjjawal · 2026-04-20T03:20:18Z

Sounds good! I will explore this further.

fix(tlp): avoid false positives in TLP

e81786a

kumarUjjawal wants to merge 1 commit into datafusion-contrib:main from kumarUjjawal:fix/tlp-false-positive-tests
