Commit 147cc46

Docs (#310)
1 parent 5dcd5f6 commit 147cc46

4 files changed

Lines changed: 258 additions & 79 deletions

README.md

Lines changed: 91 additions & 42 deletions
@@ -5,80 +5,129 @@ Documentation is undergoing a significant revamp - the new documentation will be

 ## Overview

-`dft` is a batteries included suite of a [DataFusion](https://github.com/apache/arrow-datafusion) applications. The batteries being several common features to modern query execution engines such as:
+`dft` is a batteries-included suite of [DataFusion](https://github.com/apache/arrow-datafusion) applications that provides:

-- Query files from S3 or HuggingFace datasets
-- Support for common table formats (Deltalake, Iceberg, Hudi)
-- UDFs defined in multiple languages (WASM and soon Python)
-- Popular helper functions (for example for working with JSON and Parquet data)
+- **Data Source Integration**: Query files from S3, local filesystems, or HuggingFace datasets
+- **Table Format Support**: Native support for Delta Lake, Iceberg, and Hudi
+- **Extensibility**: UDFs defined in WASM (and soon Python)
+- **Helper Functions**: Built-in functions for JSON and Parquet data processing

-It provides two client interfaces to the query execution engine:
-1. Text User Interface (TUI): An IDE for DataFusion developers and users that provides a local database experience with utilities to analyze / benchmark queries.
-2. Command Line Interface (CLI): Scriptable engine for executing queries from files.
+The project offers four complementary interfaces:

-And two server implementation, FlightSQL & HTTP, leveraging the same execution engine behind the TUI and CLI. This allows users to iterate and quickly develop a database then seamlessly deploy applications built on it.
+1. **Text User Interface (TUI)**: An interactive SQL IDE with real-time query analysis, benchmarking, and catalog exploration
+2. **Command Line Interface (CLI)**: A scriptable engine for executing queries from files or the command line
+3. **FlightSQL Server**: A standards-compliant SQL interface for programmatic access
+4. **HTTP Server**: A REST API for SQL queries and catalog exploration

-`dft` is inspired by [`datafusion-cli`], but has some differences:
-1. The TUI focuses on more complete and interactive experience for users.
-2. It contains many built in integrations such as Delta Lake and Iceberg that are not available in `datafusion-cli`.
-3. It provides FlightSQL and HTTP server implementations to make it easy to deploy DataFusion based applications / backends.
+All interfaces share the same execution engine, allowing you to develop locally with the TUI and then seamlessly deploy with the server implementations.

-[`datafusion-cli`]: https://datafusion.apache.org/user-guide/cli/overview.html
+`dft` builds upon [`datafusion-cli`](https://datafusion.apache.org/user-guide/cli/overview.html) with enhanced interactivity, additional integrations, and ready-to-use server implementations.

 ## User Guide

 ### Installation

-Currently, the only supported packaging is on [crates.io](https://crates.io/search?q=datafusion-dft). If you already have Rust installed it can be installed by running `cargo install datafusion-dft`. If rust is not installed you can download following the directions [here](https://www.rust-lang.org/tools/install).
+#### From crates.io (Recommended)
+```sh
+# If you have Rust installed
+cargo install datafusion-dft

-### Running the apps
+# For full functionality with all features
+cargo install datafusion-dft --all-features
+```

-The command for each of the apps are:
+If you don't have Rust installed, follow the [installation instructions](https://www.rust-lang.org/tools/install).

+#### Feature Flags
+Common feature combinations:
 ```sh
-# TUI (enabled by default)
+# Core with S3 support
+cargo install datafusion-dft --features=s3
+
+# Data lake formats
+cargo install datafusion-dft --features=deltalake,iceberg,hudi
+
+# With JSON and Parquet functions
+cargo install datafusion-dft --features=functions-json,functions-parquet
+```
+
+See the [Features documentation](docs/features.md) for all available features.
+
+### Running the apps
+
+```sh
+# Interactive TUI (default)
 dft

-# Execute command with CLI (enabled by default)
-dft -c "SELECT 1"
+# CLI with direct query execution
+dft -c "SELECT 1 + 2"

-# Execute SQL from file (enabled by default)
+# CLI with file-based query
 dft -f query.sql

+# Benchmark a query (with stats)
+dft -c "SELECT * FROM my_table" --bench
+
 # Start FlightSQL Server (requires `flightsql` feature)
 dft serve-flightsql

 # Start HTTP Server (requires `http` feature)
 dft serve-http
 ```

-### DDL
-
-The CLI can also run your configured DDL prior to executing the query by adding the `--run-ddl` parameter.
+### Setting Up Tables with DDL

-To have the best experience with `dft` it is highly recommended to define all of your DDL in `~/.config/ddl.sql` so that any tables you wish to query are available at startup. Additionally, now that DataFusion supports `CREATE VIEW` via sql you can also make a `VIEW` based on these tables.
+`dft` can automatically load table definitions at startup, giving you a persistent "database-like" experience.

-For example, your DDL file could look like the following:
+#### Using DDL Files

-```
-CREATE EXTERNAL TABLE users STORED AS NDJSON LOCATION 's3://bucket/users';
-
-CREATE EXTERNAL TABLE transactions STORED AS PARQUET LOCATION 's3://bucket/transactions';
+1. Create a DDL file (default: `~/.config/dft/ddl.sql`)
+2. Add your table and view definitions:

-CREATE EXTERNAL TABLE listings STORED AS PARQUET LOCATION 'file://folder/listings';
+```sql
+-- S3 data source (requires s3 feature)
+CREATE EXTERNAL TABLE users
+STORED AS NDJSON
+LOCATION 's3://bucket/users';

-CREATE VIEW OR REPLACE users_listings AS SELECT * FROM users LEFT JOIN listings USING (user_id);
-```
+-- Parquet files
+CREATE EXTERNAL TABLE transactions
+STORED AS PARQUET
+LOCATION 's3://bucket/transactions';

-This would make the tables `users`, `transactions`, `listings`, and the view `users_listings` available at startup. Any of these DDL statements could also be run interactively from the SQL editor as well to create the tables.
+-- Local files
+CREATE EXTERNAL TABLE listings
+STORED AS PARQUET
+LOCATION 'file://folder/listings';

-# Additional Documentation
+-- Create views from tables
+CREATE VIEW users_listings AS
+SELECT * FROM users
+LEFT JOIN listings USING (user_id);

-Links to more detailed documentation for each of the apps and all of the features can be found below.
+-- Delta Lake table (requires deltalake feature)
+CREATE EXTERNAL TABLE delta_table
+STORED AS DELTATABLE
+LOCATION 's3://bucket/delta_table';
+```

-- [Features](docs/features.md)
-- [CLI Docs](docs/cli.md)
-- [TUI Docs](docs/tui.md)
-- [FlightSQL Server Docs](docs/flightsql_server.md)
-- [HTTP Server Docs](docs/http_server.md)
-- [Config Reference](docs/config.md)
+#### Loading DDL
+
+- **TUI**: DDL is automatically loaded at startup
+- **CLI**: Add `--run-ddl` flag to execute DDL before your query
+- **Custom Path**: Configure a custom DDL path in your config file
+```toml
+[execution]
+ddl_path = "/path/to/my/ddl.sql"
+```
+
+## Quick Reference
+
+| Feature | Documentation |
+|---------|---------------|
+| **Core Features** | [Features Guide](docs/features.md) |
+| **TUI Interface** | [TUI Guide](docs/tui.md) |
+| **CLI Usage** | [CLI Guide](docs/cli.md) |
+| **FlightSQL Server** | [FlightSQL Guide](docs/flightsql_server.md) |
+| **HTTP Server** | [HTTP Guide](docs/http_server.md) |
+| **Configuration Options** | [Config Reference](docs/config.md) |
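
Note on the README changes: the CLI flags shown in this hunk (`-c`, `-f`, `--run-ddl`, `--bench`) lend themselves to scripting. The following is a minimal sketch, not part of the commit, of how a script might assemble a `dft` invocation from those flags. The helper name `build_dft_command` and its parameters are hypothetical, and the sketch only builds the argument list; it assumes `dft` accepts the flags in any order and does not run the binary.

```python
import shlex


def build_dft_command(query=None, sql_file=None, run_ddl=False, bench=False):
    """Build an argv list for the dft CLI (hypothetical helper, illustration only)."""
    # Exactly one query source is expected: inline SQL (-c) or a file (-f).
    if (query is None) == (sql_file is None):
        raise ValueError("pass exactly one of query or sql_file")
    argv = ["dft"]
    if run_ddl:
        argv.append("--run-ddl")  # execute the configured DDL before the query
    if bench:
        argv.append("--bench")  # benchmark the query
    argv += ["-c", query] if query is not None else ["-f", sql_file]
    return argv


cmd = build_dft_command(query="SELECT * FROM users_listings", run_ddl=True)
# shlex.join quotes the SQL so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` avoids shell quoting problems with SQL that contains spaces or `*`.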

docs/config.md

Lines changed: 29 additions & 3 deletions
@@ -1,19 +1,45 @@
-# Config Reference
+# Configuration Reference

-`dft` is configuration through a TOML file with default location of `~/.config/dft/config.toml`. Within the config there is a shared configuration that can be used across all apps and app specific configuration. The shared and app specific configs are merged to come up with a final configuration that is used. DataFusion's execution configuration can be fully customized as part of this.
+`dft` is configured through a TOML file located at `~/.config/dft/config.toml` by default. The configuration system uses a hierarchical approach:
+
+1. **Default values** built into the application
+2. **Shared configuration** applied to all interfaces
+3. **App-specific configuration** for each interface (TUI, CLI, etc.)
+
+The final configuration for each interface is a merge of these three layers, with app-specific settings taking precedence over shared settings.
+
+## Configuration Structure

-The sections for configuring each app are shown below.
 ```toml
+# Settings applied to all interfaces
 [shared]
+# ...shared settings...

+# CLI-specific settings
 [cli]
+# ...CLI settings...

+# TUI-specific settings
 [tui]
+# ...TUI settings...

+# FlightSQL client settings
 [flightsql_client]
+# ...FlightSQL client settings...

+# FlightSQL server settings
 [flightsql_server]
+# ...FlightSQL server settings...
+
+# HTTP server settings
+[http_server]
+# ...HTTP server settings...
+```
+
+You can specify a different config file with the `--config` parameter:

+```bash
+dft --config /path/to/custom/config.toml
 ```

 ## Execution Config
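
Note on the `docs/config.md` changes: the three-layer precedence described in this hunk (defaults, then shared, then app-specific) can be illustrated with a short sketch. This is not `dft`'s implementation; it assumes a recursive, key-by-key merge in which later layers win, and the setting names and values are hypothetical.

```python
from copy import deepcopy


def merge_layers(*layers: dict) -> dict:
    """Merge config layers left to right; later layers win on conflicts."""
    merged: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are tables: merge them key by key.
                merged[key] = merge_layers(merged[key], value)
            else:
                # Scalars, or a table replacing a non-table: later layer wins.
                merged[key] = deepcopy(value)
    return merged


# Hypothetical values, for illustration only.
defaults = {"execution": {"target_partitions": 4, "batch_size": 8192}}
shared = {"execution": {"target_partitions": 8}}
tui = {"execution": {"batch_size": 1024}}

tui_config = merge_layers(defaults, shared, tui)
print(tui_config)
```

In this sketch the TUI ends up with `target_partitions` from the shared layer and `batch_size` from its own section, while any key set in neither falls back to the default.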

docs/flightsql_server.md

Lines changed: 51 additions & 11 deletions
@@ -1,36 +1,76 @@
 # FlightSQL Server Guide

-`dft` includes an **experimental FlightSQL server** that you can run to provide programmatic SQL access to DataFusion.
-
----
+`dft` includes a FlightSQL server that provides programmatic SQL access to DataFusion through a standards-compliant interface. This enables you to connect using language-specific FlightSQL clients and build applications on top of DataFusion.

 ## Starting the Server

 ```sh
+# Start with default settings
 dft serve-flightsql
+
+# Start with your custom DDL loaded
+dft serve-flightsql --run-ddl
 ```

-## Endpoints
+## Supported Operations
+
+The server implements the FlightSQL protocol, providing:

-TODO List endpoints
+- SQL query execution
+- Schema fetching (TODO)
+- Prepared statements (TODO)
+- Catalog browsing (TODO)

-## Auth
+## Client Connections (TODO - Test this)

-Require basic or bearer authentication to make requests.
+You can connect to the server using any FlightSQL-compatible client:
+
+```python
+# TODO: add an example using https://docs.influxdata.com/influxdb3/clustered/reference/client-libraries/flight/python-flightsql-dbapi/
+```
+
+## Authentication
+
+Configure Basic Auth or Bearer Token authentication:

 ```toml
 [flightsql_server.auth]
+# Option 1: Bearer token auth
 bearer_token = "MyToken"
-basic_auth.username = "User"
+
+# Option 2: Basic auth
+basic_auth.username = "User"
 basic_auth.password = "Pass"
 ```

-## Metrics
-
-Prometheus metrics are automatically published.
+## Metrics and Monitoring

+Prometheus metrics are automatically published to help you monitor server performance:

 ```toml
 [flightsql_server]
+# Configure metrics port
 server_metrics_port = "0.0.0.0:9000"
 ```
+
+Available metrics include:
+- Query execution time
+- Active connections
+- Errors by type
+- Memory usage
+
+## Configuration
+
+Set connection and execution parameters in your config file:
+
+```toml
+[flightsql_server]
+# Server address
+connection_url = "http://localhost:50051"
+
+# Configure execution parameters
+[flightsql_server.execution.datafusion]
+target_partitions = 8
+```
+
+See the [Config Reference](config.md) for all available options.
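
Note on the Authentication section: the guide does not yet show what a client sends for either auth option. The sketch below, which is not part of the commit, builds the header values a client would typically attach. It assumes the server checks the conventional `authorization` header using the standard Basic and Bearer schemes; the helper names are hypothetical, and the credentials are the placeholders from the TOML example.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return a Basic scheme value: base64 of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def bearer_auth_header(token: str) -> str:
    """Return a Bearer scheme value carrying the token verbatim."""
    return f"Bearer {token}"


# Placeholder credentials taken from the config example in the guide.
headers_basic = {"authorization": basic_auth_header("User", "Pass")}
headers_bearer = {"authorization": bearer_auth_header("MyToken")}
print(headers_basic)
print(headers_bearer)
```

Most FlightSQL client libraries accept either credentials or raw headers at connection time, so a dictionary like these is usually all the client side needs.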
