datafusion-skills

A Claude Code plugin that adds Apache DataFusion-powered skills for data exploration, querying, and materialized views.

Installation

From GitHub

Add the repository as a plugin source and install:

/plugin marketplace add datafusion-contrib/datafusion-skills
/plugin install datafusion-skills@datafusion-skills

This registers the GitHub repo as a marketplace and installs the plugin. Skills will be available as /datafusion-skills:<skill-name> in all future sessions.

Updating

/plugin marketplace update datafusion-skills
/plugin update datafusion-skills@datafusion-skills

Skills

`query`

Run SQL queries against registered tables or ad-hoc against files. Accepts raw SQL or natural language questions. Supports Parquet, CSV, JSON, Arrow IPC, and Avro.

/datafusion-skills:query SELECT * FROM 'trades.parquet' WHERE symbol = 'AAPL' LIMIT 10
/datafusion-skills:query "what are the top 5 symbols by volume?"
/datafusion-skills:query FROM sales WHERE amount > 100

`read-file`

Read and explore any data file — Parquet, CSV, JSON, Arrow IPC, Avro — locally or from S3/GCS. Auto-detects format by extension.

/datafusion-skills:read-file trades.parquet what columns does it have?
/datafusion-skills:read-file s3://my-bucket/data.parquet describe the schema
/datafusion-skills:read-file metrics.csv how many rows?
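
Extension-based detection can be pictured as a simple lookup. The following is a minimal sketch, not the plugin's actual code: the function name `detect_format` and the exact list of extensions are assumptions made for illustration.

```shell
# Hypothetical helper: map a file path or URL to a format name by its extension.
detect_format() {
  # Lowercase the path so DATA.PARQUET and data.parquet match the same pattern.
  path=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$path" in
    *.parquet)       echo parquet ;;
    *.csv)           echo csv ;;
    *.json|*.ndjson) echo json ;;
    *.arrow|*.ipc)   echo arrow ;;
    *.avro)          echo avro ;;
    *)               echo unknown ;;
  esac
}

detect_format trades.parquet
detect_format s3://my-bucket/data.parquet
detect_format metrics.csv
```

Because only the suffix is inspected, the same lookup works for local paths and for object-store URLs.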

`create-table`

Register a data file as a persistent external table. Explores the schema and persists the registration so all other skills can access the table automatically.

/datafusion-skills:create-table trades.parquet
/datafusion-skills:create-table data.csv --name sales --format csv
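
To make the registration concrete, here is a rough sketch of the kind of statement that could be persisted for each table. The helper name `create_table_sql` is hypothetical, and the plugin may emit additional clauses (for example CSV header options, which vary by DataFusion version).

```shell
# Hypothetical helper: print a CREATE EXTERNAL TABLE statement for a table
# name, a format (parquet, csv, json, arrow, avro), and a file location.
create_table_sql() {
  name=$1
  # DataFusion's DDL names the format after STORED AS; uppercase it here.
  format=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
  location=$3
  printf "CREATE EXTERNAL TABLE %s STORED AS %s LOCATION '%s';\n" \
    "$name" "$format" "$location"
}

create_table_sql trades parquet trades.parquet
create_table_sql sales csv data.csv
```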

`materialized-view`

Create and manage materialized views — persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes.

/datafusion-skills:materialized-view "create a daily summary of trades grouped by symbol"
/datafusion-skills:materialized-view refresh trades_daily
/datafusion-skills:materialized-view status
/datafusion-skills:materialized-view list
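
One way to picture "persist query results as Parquet" is a `COPY ... TO` statement followed by a table registration. This is a sketch under assumptions, not the plugin's implementation: the helper `materialize_sql`, the output file name, and the `volume` column are all hypothetical, and dependency tracking and refresh logic are not shown.

```shell
# Hypothetical helper: print SQL that writes a query's result to a Parquet
# file and registers that file as a table with the view's name.
materialize_sql() {
  view=$1    # view name, e.g. trades_daily
  query=$2   # the SELECT statement to persist
  file="$view.parquet"
  printf "COPY (%s) TO '%s' STORED AS PARQUET;\n" "$query" "$file"
  printf "CREATE EXTERNAL TABLE %s STORED AS PARQUET LOCATION '%s';\n" "$view" "$file"
}

materialize_sql trades_daily \
  "SELECT symbol, SUM(volume) AS total_volume FROM trades GROUP BY symbol"
```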

`explain-plan`

Visualize and analyze query execution plans. Identifies performance bottlenecks and suggests optimizations.

/datafusion-skills:explain-plan SELECT * FROM trades WHERE date > '2024-01-01'
/datafusion-skills:explain-plan --analyze SELECT COUNT(*) FROM large_table GROUP BY category
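
In DataFusion SQL, `EXPLAIN` shows the plan without running the query, while `EXPLAIN ANALYZE` runs it and reports runtime metrics. The sketch below shows how the `--analyze` flag could map onto those two statements; the helper name `explain_sql` is hypothetical and the plugin's real handling may differ.

```shell
# Hypothetical helper: turn explain-plan arguments into an EXPLAIN statement.
# A leading --analyze selects EXPLAIN ANALYZE; otherwise plain EXPLAIN is used.
explain_sql() {
  prefix="EXPLAIN"
  if [ "$1" = "--analyze" ]; then
    prefix="EXPLAIN ANALYZE"
    shift   # drop the flag so only the query text remains
  fi
  printf '%s %s;\n' "$prefix" "$*"
}

explain_sql "SELECT * FROM trades WHERE date > '2024-01-01'"
explain_sql --analyze "SELECT COUNT(*) FROM large_table GROUP BY category"
```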

`datafusion-docs`

Search Apache DataFusion documentation — user guide, SQL reference, and API docs. Returns relevant documentation for a question or keyword.

/datafusion-skills:datafusion-docs window functions
/datafusion-skills:datafusion-docs "how do I create an external table?"
/datafusion-skills:datafusion-docs APPROX_PERCENTILE_CONT

`install-datafusion`

Install or update datafusion-cli. Supports Homebrew, cargo install, and pre-built binaries.

/datafusion-skills:install-datafusion
/datafusion-skills:install-datafusion --update
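
The choice between install methods can be sketched as a check for what is already on the PATH. This is an illustration only: the helper `suggest_install` is hypothetical, it prints a command instead of running one, and the order of preference is an assumption.

```shell
# Hypothetical helper: print an install command for datafusion-cli, or report
# that it is already on the PATH. Nothing is installed by this function.
suggest_install() {
  if command -v datafusion-cli >/dev/null 2>&1; then
    echo "datafusion-cli is already installed"
  elif command -v brew >/dev/null 2>&1; then
    echo "brew install datafusion"
  elif command -v cargo >/dev/null 2>&1; then
    echo "cargo install datafusion-cli"
  else
    echo "download a pre-built binary"
  fi
}

suggest_install
```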

Session state

All skills share a single state.sql file per project — a plain SQL file containing CREATE EXTERNAL TABLE statements and configuration. When state is first needed, you'll be asked where to store it:

In the project directory (.datafusion-skills/state.sql) — colocated with the project, optionally gitignored
In your home directory (~/.datafusion-skills/<project>/state.sql) — keeps the repo clean

Any skill restores the session via datafusion-cli --file state.sql.
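
The two storage locations and the restore step can be sketched as follows. The helper `state_path` is hypothetical, and deriving `<project>` from the project directory's base name is an assumption; the restore command itself is the one given above.

```shell
# Hypothetical helper: print the state file path for a storage choice.
# "project" keeps state inside the project directory; "home" keeps it under
# the user's home directory, keyed by the project directory's base name.
state_path() {
  choice=$1
  project_dir=$2
  case "$choice" in
    project) echo "$project_dir/.datafusion-skills/state.sql" ;;
    home)    echo "$HOME/.datafusion-skills/$(basename "$project_dir")/state.sql" ;;
  esac
}

state_file=$(state_path project "$PWD")
echo "$state_file"

# Restore the session only when both the CLI and the state file are present.
if command -v datafusion-cli >/dev/null 2>&1 && [ -f "$state_file" ]; then
  datafusion-cli --file "$state_file"
fi
```

Since state.sql is plain SQL, it can also be inspected or edited by hand before it is replayed.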

How the skills work together

Skills reference each other where it makes sense:

read-file suggests query for follow-up exploration and create-table for persisting data
query uses session state from create-table automatically
materialized-view creates persistent Parquet files registered via create-table
explain-plan helps optimize queries from query
All skills use datafusion-docs to troubleshoot DataFusion errors automatically

Why DataFusion?

Apache DataFusion is a fast, extensible query engine built in Rust on top of Apache Arrow. It offers:

High performance: Vectorized execution, predicate pushdown, partition pruning
Standard SQL: Full SQL support including window functions, CTEs, subqueries
Extensibility: Custom table providers, UDFs, optimizer rules
File format support: Parquet, CSV, JSON, Arrow IPC, Avro
Cloud native: S3, GCS, Azure object store support
Materialized views: Persist query results and track dependencies (unique to the DataFusion ecosystem)

Local development

# Clone the repo
git clone https://github.com/datafusion-contrib/datafusion-skills.git
cd datafusion-skills

# Launch Claude Code with the local plugin directory
claude --plugin-dir .

Test individual skills:

/datafusion-skills:read-file some_local_file.parquet
/datafusion-skills:query SELECT 42
/datafusion-skills:datafusion-docs window functions

Prerequisite: datafusion-cli must be installed. If it isn't, the skills will offer to install it via /datafusion-skills:install-datafusion.

Platform support

These skills have been tested on macOS and Linux. Windows is not yet fully supported.

Reporting issues

Found a bug or have an idea? Open an issue at:

https://github.com/datafusion-contrib/datafusion-skills/issues

For DataFusion-specific bugs, please include the datafusion-cli version (datafusion-cli --version) and the full error message.

License

Apache License 2.0. See LICENSE for details.
