> This project is still under active development. The following documentation is AI-generated and requires future cleanup and validation.
6
-
>
7
-
> This is a Rust rewrite of [datafusion-sqlancer](https://github.com/apache/datafusion/issues/11030), originally implemented in Java. The rewrite aims to simplify implementation, enable better integration with existing DataFusion tooling, and make test oracles applicable to `sqllogictests`. See [this issue](https://github.com/apache/datafusion/issues/14535) for more details on the motivation behind the Rust rewrite.
3
+
A fuzzing tool for Apache DataFusion that tests SQL query execution and helps find potential bugs, crashes, and inconsistencies in query results.
8
4
9
-
A comprehensive fuzzing tool for Apache DataFusion, designed to test SQL query execution and find potential bugs, crashes, or inconsistencies in the query engine.
5
+
## Overview
6
+
This fuzzer primarily:
7
+
1. Generates random tables and SQL queries.
8
+
2. Runs them on DataFusion and checks whether the results satisfy an oracle-defined consistency rule.
10
9
11
-
## Quick Start
12
-
13
-
To run the fuzzer with default settings:
14
-
15
-
```bash
16
-
cargo run --release
17
-
```
18
-
19
-
To run with a custom configuration:
10
+
### Example
11
+
```text
12
+
Oracle: TLP (Ternary Logic Partitioning)
20
13
21
-
```bash
22
-
cargo run --release -- --config datafusion-fuzzer.toml
23
-
```
14
+
Random query (Q1):
15
+
SELECT * FROM t1;
24
16
25
-
To run with command-line options:
26
-
```bash
27
-
cargo run --release -- --config datafusion-fuzzer.toml --rounds 5 --queries-per-round 20
28
-
```
17
+
Mutated query (Q2):
18
+
SELECT * FROM t1 WHERE v1 > 0
19
+
UNION ALL
20
+
SELECT * FROM t1 WHERE NOT (v1 > 0)
21
+
UNION ALL
22
+
SELECT * FROM t1 WHERE (v1 > 0) IS NULL;
29
23
30
-
To run with verbose oracle/query logs to stdout:
31
-
```bash
32
-
RUST_LOG=info cargo run -- --config datafusion-fuzzer.toml --display-logs
24
+
Consistency check:
25
+
Q1 and Q2 should return the same multiset of rows.
33
26
```
34
27
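
The mutated query in the example above can be assembled mechanically from a table name and a predicate. The following is a minimal sketch of that construction, not the fuzzer's actual implementation; the helper name `tlp_where_query` and its string-based signature are assumptions made for illustration.

```rust
/// Builds a TLP-mutated query over `WHERE` for a base `SELECT * FROM <table>`.
/// Hypothetical helper: the name and the string-based interface are
/// illustrative assumptions, not taken from the fuzzer's source.
fn tlp_where_query(table: &str, predicate: &str) -> String {
    // Under SQL's three-valued logic, every row makes the predicate
    // TRUE, FALSE, or NULL, so these three filters partition the table.
    let partitions = [
        format!("SELECT * FROM {table} WHERE {predicate}"),
        format!("SELECT * FROM {table} WHERE NOT ({predicate})"),
        format!("SELECT * FROM {table} WHERE ({predicate}) IS NULL"),
    ];
    // UNION ALL keeps duplicates, so the combined result should contain
    // exactly the rows of the unfiltered query.
    partitions.join("\nUNION ALL\n")
}
```

Calling `tlp_where_query("t1", "v1 > 0")` yields the three-branch query shown as Q2, without the trailing semicolon.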
35
-
## Oracles
36
-
37
-
The runner currently chooses one oracle at random for each test case:
38
-
39
-
- `NoCrashOracle`: checks for non-whitelisted crashes/errors.
40
-
- `TlpWhereOracle`: validates TLP partitioning over `WHERE` (`p`, `NOT p`, `p IS NULL`) via value-level multiset comparison.
41
-
- `TlpHavingOracle`: validates TLP partitioning over `HAVING` (`p`, `NOT p`, `p IS NULL`) via value-level multiset comparison.
42
-
43
-
## Configuration
28
+
This project is inspired by [SQLancer](https://github.com/sqlancer/sqlancer).
44
29
45
-
The fuzzer supports extensive configuration options to customize the fuzzing process.
30
+
For an introduction to database fuzzing techniques, see this talk by the author of SQLancer: https://youtu.be/Np46NQ6lqP8?si=lSVAU7Jy3H-QtrWV
46
31
47
-
You can configure DataFusion Fuzzer in two ways:
32
+
## Quick Start
48
33
49
-
1. **Configuration file**: Use a TOML file to specify detailed settings
50
-
2. **Command-line arguments**: Override configuration file settings, or use them on their own without a configuration file
34
+
To run the fuzzer with the default sample configuration:
51
35
52
-
### Configuration File
36
+
```bash
37
+
cargo run --release -- --config fuzzer-default.toml
38
+
```
53
39
54
-
See `datafusion-fuzzer.toml` for an example configuration file:
40
+
This runs the fuzzer against the DataFusion version specified in `Cargo.toml`.
55
41
56
-
```toml
57
-
# Fuzzing execution settings
58
-
seed = 42
59
-
rounds = 3
60
-
queries_per_round = 10
61
-
timeout_seconds = 2
42
+
The config file controls options such as round count, timeout, and log directory.
62
43
63
-
# Logging settings
64
-
display_logs = false
65
-
enable_tui = true
66
-
log_path = "logs"
67
-
sample_interval_secs = 5
44
+
If a bug is found, use the CLI output and generated log files to reproduce it.
68
45
69
-
# Table generation parameters
70
-
max_column_count = 5
71
-
max_row_count = 100
72
-
max_expr_level = 3
73
-
max_group_by_count = 3
74
-
max_table_count = 3
75
-
max_insert_per_table = 20
46
+
To override configuration file values with CLI arguments:
47
+
```bash
48
+
cargo run --release -- --config fuzzer-default.toml --rounds 5 --queries-per-round 20
76
49
```
77
50
51
+
See `fuzzer-default.toml` for supported options.
52
+
78
53
### Command Line Options
79
54
80
55
```
@@ -91,27 +66,28 @@ Options:
91
66
-V, --version Print version
92
67
```
93
68
94
-
### Configuration Parameters
69
+
## Roadmap
70
+
71
+
### Implemented Oracles
72
+
The runner currently chooses one oracle at random for each test case:
95
73
96
-
- `max_table_count`: Maximum number of tables that can be selected in a single query (default: 3)
97
-
- `max_column_count`: Maximum number of columns per generated table (default: 5)
98
-
- `max_row_count`: Maximum number of rows per generated table (default: 100)
99
-
- `max_expr_level`: Maximum expression nesting level (default: 3)
100
-
- `max_group_by_count`: Maximum number of `GROUP BY` expressions (default: 3)
74
+
- [x] `NoCrashOracle`: checks for non-whitelisted crashes and errors.
75
+
- [x] `TlpWhereOracle`: validates TLP partitioning over `WHERE` (`p`, `NOT p`, `p IS NULL`) using value-level multiset comparison.
76
+
- [x] `TlpHavingOracle`: validates TLP partitioning over `HAVING` (`p`, `NOT p`, `p IS NULL`) using value-level multiset comparison.
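
The value-level multiset comparison used by the TLP oracles can be illustrated as follows. This is a sketch under stated assumptions, not the project's code: rows are modeled as vectors of optional strings (the hypothetical `Row` alias), whereas DataFusion itself returns Arrow record batches, and the function names are invented for this example.

```rust
use std::collections::HashMap;

/// Assumed row model: each value rendered as a string, with `None` for NULL.
/// This is a simplification for illustration only.
type Row = Vec<Option<String>>;

/// Counts how many times each distinct row occurs.
fn row_counts(rows: &[Row]) -> HashMap<&Row, usize> {
    let mut counts = HashMap::new();
    for row in rows {
        *counts.entry(row).or_insert(0) += 1;
    }
    counts
}

/// Returns true when both results contain the same rows with the same
/// multiplicities, regardless of row order.
fn same_multiset(a: &[Row], b: &[Row]) -> bool {
    a.len() == b.len() && row_counts(a) == row_counts(b)
}
```

Comparing multiplicities rather than ordered rows matters because `UNION ALL` gives no ordering guarantee, and comparing sets instead of multisets would hide duplicated or dropped rows.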