README.md (91 additions & 42 deletions)
@@ -5,80 +5,129 @@ Documentation is undergoing a significant revamp - the new documentation will be

 ## Overview

-`dft` is a batteriesincluded suite of a [DataFusion](https://github.com/apache/arrow-datafusion) applications. The batteries being several common features to modern query execution engines such as:
+`dft` is a batteries-included suite of [DataFusion](https://github.com/apache/arrow-datafusion) applications that provides:

-- Query files from S3 or HuggingFace datasets
-- Support for common table formats (Deltalake, Iceberg, Hudi)
-- UDFs defined in multiple languages (WASM and soon Python)
-- Popular helper functions (for example for working with JSON and Parquet data)
+- **Data Source Integration**: Query files from S3, local filesystems, or HuggingFace datasets
+- **Table Format Support**: Native support for Delta Lake, Iceberg, and Hudi
+- **Extensibility**: UDFs defined in WASM (and soon Python)
+- **Helper Functions**: Built-in functions for JSON and Parquet data processing

-It provides two client interfaces to the query execution engine:
-1. Text User Interface (TUI): An IDE for DataFusion developers and users that provides a local database experience with utilities to analyze / benchmark queries.
-2. Command Line Interface (CLI): Scriptable engine for executing queries from files.
+The project offers four complementary interfaces:

-And two server implementation, FlightSQL & HTTP, leveraging the same execution engine behind the TUI and CLI. This allows users to iterate and quickly develop a database then seamlessly deploy applications built on it.
+1. **Text User Interface (TUI)**: An interactive SQL IDE with real-time query analysis, benchmarking, and catalog exploration
+2. **Command Line Interface (CLI)**: A scriptable engine for executing queries from files or command line
+3. **FlightSQL Server**: A standards-compliant SQL interface for programmatic access
+4. **HTTP Server**: A REST API for SQL queries and catalog exploration

-`dft` is inspired by [`datafusion-cli`], but has some differences:
-1. The TUI focuses on more complete and interactive experience for users.
-2. It contains many built in integrations such as Delta Lake and Iceberg that are not available in `datafusion-cli`.
-3. It provides FlightSQL and HTTP server implementations to make it easy to deploy DataFusion based applications / backends.
+All interfaces share the same execution engine, allowing you to develop locally with the TUI and then seamlessly deploy with the server implementations.
+`dft` builds upon [`datafusion-cli`](https://datafusion.apache.org/user-guide/cli/overview.html) with enhanced interactivity, additional integrations, and ready-to-use server implementations.

 ## User Guide

 ### Installation

-Currently, the only supported packaging is on [crates.io](https://crates.io/search?q=datafusion-dft). If you already have Rust installed it can be installed by running `cargo install datafusion-dft`. If rust is not installed you can download following the directions [here](https://www.rust-lang.org/tools/install).
+#### From crates.io (Recommended)
+```sh
+# If you have Rust installed
+cargo install datafusion-dft

-### Running the apps
+# For full functionality with all features
+cargo install datafusion-dft --all-features
+```

-The command for each of the apps are:
+If you don't have Rust installed, follow the [installation instructions](https://www.rust-lang.org/tools/install).
+See the [Features documentation](docs/features.md) for all available features.
+
+### Running the apps
+
+```sh
+# Interactive TUI (default)
 dft

-# Execute command with CLI (enabled by default)
-dft -c "SELECT 1"
+# CLI with direct query execution
+dft -c "SELECT 1 + 2"

-# Execute SQL from file (enabled by default)
+# CLI with file-based query
 dft -f query.sql

+# Benchmark a query (with stats)
+dft -c "SELECT * FROM my_table" --bench
+
 # Start FlightSQL Server (requires `flightsql` feature)
 dft serve-flightsql

 # Start HTTP Server (requires `http` feature)
 dft serve-http
 ```

-### DDL
-
-The CLI can also run your configured DDL prior to executing the query by adding the `--run-ddl` parameter.
+### Setting Up Tables with DDL

-To have the best experience with `dft` it is highly recommended to define all of your DDL in `~/.config/ddl.sql` so that any tables you wish to query are available at startup. Additionally, now that DataFusion supports `CREATE VIEW` via sql you can also make a `VIEW` based on these tables.
+`dft` can automatically load table definitions at startup, giving you a persistent "database-like" experience.

-For example, your DDL file could look like the following:
+#### Using DDL Files

-```
-CREATE EXTERNAL TABLE users STORED AS NDJSON LOCATION 's3://bucket/users';
-
-CREATE EXTERNAL TABLE transactions STORED AS PARQUET LOCATION 's3://bucket/transactions';
+1. Create a DDL file (default: `~/.config/dft/ddl.sql`)
+2. Add your table and view definitions:

-CREATE EXTERNAL TABLE listings STORED AS PARQUET LOCATION 'file://folder/listings';
+```sql
+-- S3 data source (requires s3 feature)
+CREATE EXTERNAL TABLE users
+STORED AS NDJSON
+LOCATION 's3://bucket/users';

-CREATE VIEW OR REPLACE users_listings AS SELECT * FROM users LEFT JOIN listings USING (user_id);
-```
+-- Parquet files
+CREATE EXTERNAL TABLE transactions
+STORED AS PARQUET
+LOCATION 's3://bucket/transactions';

-This would make the tables `users`, `transactions`, `listings`, and the view `users_listings` available at startup. Any of these DDL statements could also be run interactively from the SQL editor as well to create the tables.
+-- Local files
+CREATE EXTERNAL TABLE listings
+STORED AS PARQUET
+LOCATION 'file://folder/listings';

-# Additional Documentation
+-- Create views from tables
+CREATE VIEW users_listings AS
+SELECT * FROM users
+LEFT JOIN listings USING (user_id);

-Links to more detailed documentation for each of the apps and all of the features can be found below.
+-- Delta Lake table (requires deltalake feature)
+CREATE EXTERNAL TABLE delta_table
+STORED AS DELTATABLE
+LOCATION 's3://bucket/delta_table';
+```

-- [Features](docs/features.md)
-- [CLI Docs](docs/cli.md)
-- [TUI Docs](docs/tui.md)
-- [FlightSQL Server Docs](docs/flightsql_server.md)
-- [HTTP Server Docs](docs/http_server.md)
-- [Config Reference](docs/config.md)
+#### Loading DDL
+
+- **TUI**: DDL is automatically loaded at startup
+- **CLI**: Add `--run-ddl` flag to execute DDL before your query
+- **Custom Path**: Configure a custom DDL path in your config file
docs/config.md (29 additions & 3 deletions)
@@ -1,19 +1,45 @@
-# Config Reference
+# Configuration Reference

-`dft` is configuration through a TOML file with default location of `~/.config/dft/config.toml`. Within the config there is a shared configuration that can be used across all apps and app specific configuration. The shared and app specific configs are merged to come up with a final configuration that is used. DataFusion's execution configuration can be fully customized as part of this.
+`dft` is configured through a TOML file located at `~/.config/dft/config.toml` by default. The configuration system uses a hierarchical approach:
+
+1. **Default values** built into the application
+2. **Shared configuration** applied to all interfaces
+3. **App-specific configuration** for each interface (TUI, CLI, etc.)
+
+The final configuration for each interface is a merge of these three layers, with app-specific settings taking precedence over shared settings.
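The three-layer precedence described above can be illustrated with a small Python sketch. This is not how `dft` implements its configuration; it only shows the general idea of merging layers so that later ones win. The setting names (`execution`, `batch_size`, `target_partitions`) and their values are hypothetical and chosen purely for illustration.

```python
from copy import deepcopy


def merge_layers(*layers):
    """Merge dicts left to right; values in later layers override earlier ones.

    Nested dicts are merged key by key rather than replaced wholesale.
    """
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_layers(merged[key], value)
            else:
                merged[key] = deepcopy(value)
    return merged


# Hypothetical settings, for illustration only (not dft's real option names).
defaults = {"execution": {"batch_size": 8192, "target_partitions": 4}}
shared = {"execution": {"batch_size": 1024}}      # stands in for [shared]
tui = {"execution": {"target_partitions": 8}}     # stands in for [tui]

# Defaults first, then shared, then app-specific: the last layer wins.
tui_config = merge_layers(defaults, shared, tui)
print(tui_config)
```

In this sketch the TUI ends up with the shared `batch_size` and its own `target_partitions`, while another interface with no app-specific section would keep the shared and default values.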
+
+## Configuration Structure

-The sections for configuring each app are shown below.
 ```toml
+# Settings applied to all interfaces
 [shared]
+# ...shared settings...

+# CLI-specific settings
 [cli]
+# ...CLI settings...

+# TUI-specific settings
 [tui]
+# ...TUI settings...

+# FlightSQL client settings
 [flightsql_client]
+# ...FlightSQL client settings...

+# FlightSQL server settings
 [flightsql_server]
+# ...FlightSQL server settings...
+
+# HTTP server settings
+[http_server]
+# ...HTTP server settings...
+```
+
+You can specify a different config file with the `--config` parameter:
-`dft` includes an **experimental FlightSQL server** that you can run to provide programmatic SQL access to DataFusion.
-
----
+`dft` includes a FlightSQL server that provides programmatic SQL access to DataFusion through a standards-compliant interface. This enables you to connect using language-specific FlightSQL clients and build applications on top of DataFusion.

 ## Starting the Server

 ```sh
+# Start with default settings
 dft serve-flightsql
+
+# Start with your custom DDL loaded
+dft serve-flightsql --run-ddl
 ```

-## Endpoints
+## Supported Operations
+
+The server implements the FlightSQL protocol, providing:

-TODO List endpoints
+- SQL query execution
+- Schema fetching (TODO)
+- Prepared statements (TODO)
+- Catalog browsing (TODO)

-## Auth
+## Client Connections (TODO - Test this)

-Require basic or bearer authentication to make requests.
+You can connect to the server using any FlightSQL-compatible client: