Commit 34c1aa0

Merge pull request #5 from 2010YOUY01/readme-update
Update `README`
2 parents 5326340 + 93f1388 commit 34c1aa0

3 files changed

Lines changed: 54 additions & 103 deletions

File tree

.cursor/rules/system-prompt.mdc

Lines changed: 0 additions & 21 deletions
This file was deleted.

README.md

Lines changed: 54 additions & 82 deletions
@@ -1,80 +1,55 @@
11
# DataFusion Fuzzer
22

3-
> **🚧 Work In Progress**
4-
>
5-
> This project is still under active development. The following documentation is AI-generated and requires future cleanup and validation.
6-
>
7-
> This is a Rust rewrite of [datafusion-sqlancer](https://github.com/apache/datafusion/issues/11030), originally implemented in Java. The rewrite aims to simplify implementation, enable better integration with existing DataFusion tooling, and make test oracles applicable to `sqllogictests`. See [this issue](https://github.com/apache/datafusion/issues/14535) for more details on the motivation behind the Rust rewrite.
3+
A fuzzing tool for Apache DataFusion that tests SQL query execution and helps find potential bugs, crashes, and inconsistencies in query results.
84

9-
A comprehensive fuzzing tool for Apache DataFusion, designed to test SQL query execution and find potential bugs, crashes, or inconsistencies in the query engine.
5+
## Overview
6+
This fuzzer primarily:
7+
1. Generates random tables and SQL queries.
8+
2. Runs them on DataFusion and checks whether the results satisfy an oracle-defined consistency rule.
109

11-
## Quick Start
12-
13-
To run the fuzzer with default settings:
14-
15-
```bash
16-
cargo run --release
17-
```
18-
19-
To run with a custom configuration:
10+
### Example
11+
```text
12+
Oracle: TLP (Ternary Logic Partitioning)
2013
21-
```bash
22-
cargo run --release -- --config datafusion-fuzzer.toml
23-
```
14+
Random query (Q1):
15+
SELECT * FROM t1;
2416
25-
To run with command-line options:
26-
```bash
27-
cargo run --release -- --config datafusion-fuzzer.toml --rounds 5 --queries-per-round 20
28-
```
17+
Mutated query (Q2):
18+
SELECT * FROM t1 WHERE v1 > 0
19+
UNION ALL
20+
SELECT * FROM t1 WHERE NOT (v1 > 0)
21+
UNION ALL
22+
SELECT * FROM t1 WHERE (v1 > 0) IS NULL;
2923
30-
To run with verbose oracle/query logs to stdout:
31-
```bash
32-
RUST_LOG=info cargo run -- --config datafusion-fuzzer.toml --display-logs
24+
Consistency check:
25+
Q1 and Q2 should return the same multiset of rows.
3326
```
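
The consistency check above compares results as multisets: row order is ignored, but duplicate counts must match. The following is a minimal sketch of such a comparison, not the project's actual implementation. The `Row` alias and the function names `row_counts` and `same_multiset` are hypothetical, and the sketch assumes every value (including NULL) has already been rendered to a string.

```rust
use std::collections::HashMap;

/// Hypothetical row representation: one rendered string per column.
type Row = Vec<String>;

/// Count how many times each distinct row occurs in a result set.
fn row_counts(rows: &[Row]) -> HashMap<&Row, usize> {
    let mut counts = HashMap::new();
    for row in rows {
        *counts.entry(row).or_insert(0) += 1;
    }
    counts
}

/// Return true when both result sets contain the same rows with the same
/// multiplicities, regardless of row order.
fn same_multiset(q1: &[Row], q2: &[Row]) -> bool {
    q1.len() == q2.len() && row_counts(q1) == row_counts(q2)
}
```

Counting occurrences, rather than sorting or deduplicating, keeps the check sensitive to a partition that returns a row too many or too few times.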
3427

35-
## Oracles
36-
37-
The runner currently chooses one oracle at random for each test case:
38-
39-
- `NoCrashOracle`: checks for non-whitelisted crashes/errors.
40-
- `TlpWhereOracle`: validates TLP partitioning over `WHERE` (`p`, `NOT p`, `p IS NULL`) via value-level multiset comparison.
41-
- `TlpHavingOracle`: validates TLP partitioning over `HAVING` (`p`, `NOT p`, `p IS NULL`) via value-level multiset comparison.
42-
43-
## Configuration
28+
This project is inspired by [SQLancer](https://github.com/sqlancer/sqlancer).
4429

45-
The fuzzer supports extensive configuration options to customize the fuzzing process.
30+
For an introduction to database fuzzing techniques, see this talk by the author of SQLancer: https://youtu.be/Np46NQ6lqP8?si=lSVAU7Jy3H-QtrWV
4631

47-
You can configure DataFusion Fuzzer in two ways:
32+
## Quick Start
4833

49-
1. **Configuration file**: Use a TOML file to specify detailed settings
50-
2. **Command-line arguments**: Override configuration file settings or use standalone
34+
To run the fuzzer with the default sample configuration:
5135

52-
### Configuration File
36+
```bash
37+
cargo run --release -- --config fuzzer-default.toml
38+
```
5339

54-
See `datafusion-fuzzer.toml` for an example configuration file:
40+
This runs the fuzzer against the DataFusion version specified in `Cargo.toml`.
5541

56-
```toml
57-
# Fuzzing execution settings
58-
seed = 42
59-
rounds = 3
60-
queries_per_round = 10
61-
timeout_seconds = 2
42+
The config file controls options such as round count, timeout, and log directory.
6243

63-
# Logging settings
64-
display_logs = false
65-
enable_tui = true
66-
log_path = "logs"
67-
sample_interval_secs = 5
44+
If a bug is found, use the CLI output and generated log files to reproduce it.
6845

69-
# Table generation parameters
70-
max_column_count = 5
71-
max_row_count = 100
72-
max_expr_level = 3
73-
max_group_by_count = 3
74-
max_table_count = 3
75-
max_insert_per_table = 20
46+
To override values from the configuration file with CLI arguments:
47+
```bash
48+
cargo run --release -- --config fuzzer-default.toml --rounds 5 --queries-per-round 20
7649
```
7750

51+
See `fuzzer-default.toml` for supported options.
52+
7853
### Command Line Options
7954

8055
```
@@ -91,27 +66,28 @@ Options:
9166
-V, --version Print version
9267
```
9368

94-
### Configuration Parameters
69+
## Roadmap
70+
71+
### Implemented Oracles
72+
The runner currently chooses one oracle at random for each test case:
9573

96-
- `max_table_count`: Maximum number of tables that can be selected in a single query (default: 3)
97-
- `max_column_count`: Maximum number of columns per generated table (default: 5)
98-
- `max_row_count`: Maximum number of rows per generated table (default: 100)
99-
- `max_expr_level`: Maximum expression nesting level (default: 3)
100-
- `max_group_by_count`: Maximum number of `GROUP BY` expressions (default: 3)
74+
- [x] `NoCrashOracle`: checks for non-whitelisted crashes and errors.
75+
- [x] `TlpWhereOracle`: validates TLP partitioning over `WHERE` (`p`, `NOT p`, `p IS NULL`) using value-level multiset comparison.
76+
- [x] `TlpHavingOracle`: validates TLP partitioning over `HAVING` (`p`, `NOT p`, `p IS NULL`) using value-level multiset comparison.
77+
- [ ] `NoREC` (planned): [paper](https://www.manuelrigger.at/preprints/NoREC.pdf)
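
To illustrate the `p`, `NOT p`, `p IS NULL` partitioning that `TlpWhereOracle` relies on, here is a minimal sketch that builds the partitioned query as a string. It is an assumption-laden illustration rather than the fuzzer's real code: the function name `tlp_where_partitions` is hypothetical, and it assumes the base query has no `WHERE` clause of its own.

```rust
/// Build the TLP-partitioned form of `base_query` for a predicate `p`.
/// Hypothetical helper: assumes `base_query` has no WHERE clause yet.
fn tlp_where_partitions(base_query: &str, predicate: &str) -> String {
    // Under SQL's three-valued logic, every row makes the predicate
    // true, false, or NULL, so the three branches together cover each
    // row exactly once.
    let branches = [
        format!("{base_query} WHERE {predicate}"),
        format!("{base_query} WHERE NOT ({predicate})"),
        format!("{base_query} WHERE ({predicate}) IS NULL"),
    ];
    // UNION ALL keeps duplicates, which a multiset comparison needs.
    branches.join("\nUNION ALL\n")
}
```

Calling `tlp_where_partitions("SELECT * FROM t1", "v1 > 0")` yields the same three-branch shape as the mutated query in the README example; the oracle would then compare its result with that of the unpartitioned base query.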
10178

102-
## Progress Tracker
10379
### SQL Features
104-
- [x] where
105-
- [ ] sort + limit, offset
106-
- [ ] aggregate
107-
- [x] having
108-
- [ ] join
109-
- [ ] union/union all/intersect/except
110-
111-
### SQL - Subqueries
112-
- [ ] views
113-
- [ ] scalar subquery
114-
- [ ] 'relation-like' subquery
80+
- [x] WHERE
81+
- [ ] ORDER BY + LIMIT/OFFSET
82+
- [ ] AGGREGATE
83+
- [x] HAVING
84+
- [ ] JOIN
85+
- [ ] UNION/UNION ALL/INTERSECT/EXCEPT
86+
87+
### SQL Subqueries
88+
- [ ] Views
89+
- [ ] Scalar subquery
90+
- [ ] Relation-like subquery
11591

11692
### Expressions
11793
- [ ] Operators
@@ -120,15 +96,11 @@ Options:
12096
- [ ] Window Functions
12197

12298
### Types
123-
- [ ] Complete Primitive types
99+
- [ ] Complete primitive type coverage
124100
- [ ] Time-related types
125101
- [ ] Array types
126-
- [ ] Struct/Json
102+
- [ ] Struct/JSON
127103

128104
### Infrastructure
129105
- [x] CLI
130106
- [x] Oracle interface
131-
132-
## License
133-
134-
[MIT](LICENSE)
