datafusion-skills

A Claude Code plugin that adds Apache DataFusion-powered skills for data exploration, querying, and materialized views.

Installation

From GitHub

Add the repository as a plugin source and install:

/plugin marketplace add datafusion-contrib/datafusion-skills
/plugin install datafusion-skills@datafusion-skills

This registers the GitHub repo as a marketplace and installs the plugin. Skills will be available as /datafusion-skills:<skill-name> in all future sessions.

Updating

/plugin marketplace update datafusion-skills
/plugin update datafusion-skills@datafusion-skills

Skills

query

Run SQL queries against registered tables or ad-hoc against files. Accepts raw SQL or natural language questions. Supports Parquet, CSV, JSON, Arrow IPC, and Avro.

/datafusion-skills:query SELECT * FROM 'trades.parquet' WHERE symbol = 'AAPL' LIMIT 10
/datafusion-skills:query "what are the top 5 symbols by volume?"
/datafusion-skills:query FROM sales WHERE amount > 100
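The README doesn't spell out how the query skill tells raw SQL from a natural-language question. One plausible heuristic, sketched here purely as an illustration, checks the first keyword and expands the FROM-first shorthand from the third example into a full SELECT. The keyword list, the expansion rule and the classify helper are all assumptions, not the plugin's code.

```python
from typing import Tuple

# Leading keywords assumed to mark raw SQL rather than a natural-language question.
SQL_KEYWORDS = {"SELECT", "WITH", "FROM", "VALUES", "SHOW", "DESCRIBE", "EXPLAIN"}


def classify(request: str) -> Tuple[str, str]:
    """Return ("sql", statement) or ("question", text) for a query argument (hypothetical)."""
    text = request.strip()
    if len(text) >= 2 and text[0] == text[-1] == '"':
        text = text[1:-1].strip()  # quotes wrapping a whole question are not part of it
    words = text.split(maxsplit=1)
    first = words[0].upper() if words else ""
    if first not in SQL_KEYWORDS:
        return ("question", text)
    if first == "FROM":
        # Expand the FROM-first shorthand into a complete SELECT (assumed behaviour).
        return ("sql", "SELECT * " + text)
    return ("sql", text)
```

Anything classified as a question would then be turned into SQL by the model before being handed to datafusion-cli.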

read-file

Read and explore a data file in Parquet, CSV, JSON, Arrow IPC, or Avro format, either locally or from S3/GCS. The format is auto-detected from the file extension.

/datafusion-skills:read-file trades.parquet what columns does it have?
/datafusion-skills:read-file s3://my-bucket/data.parquet describe the schema
/datafusion-skills:read-file metrics.csv how many rows?
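Auto-detection by extension amounts to a lookup from file suffix to the format name DataFusion expects in a STORED AS clause. A minimal sketch, assuming exactly these five extensions; the EXTENSION_TO_FORMAT table and detect_format helper are hypothetical, and the plugin's real mapping may differ.

```python
from pathlib import PurePosixPath

# Hypothetical suffix -> DataFusion STORED AS keyword mapping.
EXTENSION_TO_FORMAT = {
    ".parquet": "PARQUET",
    ".csv": "CSV",
    ".json": "JSON",  # DataFusion's JSON reader expects newline-delimited JSON
    ".arrow": "ARROW",
    ".avro": "AVRO",
}


def detect_format(location: str) -> str:
    """Guess the storage format of a local path or an s3:// / gs:// URL."""
    # PurePosixPath also splits object-store URLs on '/', so .suffix is the file extension.
    suffix = PurePosixPath(location).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"cannot infer a format from {location!r}") from None
```

For example, detect_format("s3://my-bucket/data.parquet") yields "PARQUET". Failing loudly on an unknown extension leaves room to ask the user instead of guessing.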

create-table

Register a data file as a persistent external table. Explores the schema and persists the registration so all other skills can access the table automatically.

/datafusion-skills:create-table trades.parquet
/datafusion-skills:create-table data.csv --name sales --format csv
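Persisting a registration can be as simple as appending a CREATE EXTERNAL TABLE statement to state.sql. This sketch assumes the table name defaults to the file stem (trades.parquet becomes trades) and leaves out format-specific options such as CSV header handling; create_table_sql and persist are hypothetical helpers, not the plugin's code.

```python
import re
from pathlib import Path, PurePosixPath
from typing import Optional

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_table_sql(location: str, fmt: str, name: Optional[str] = None) -> str:
    """Build a CREATE EXTERNAL TABLE statement for one data file."""
    table = name or PurePosixPath(location).stem  # assumed default: trades.parquet -> trades
    if not IDENTIFIER.fullmatch(table):
        raise ValueError(f"not a plain SQL identifier: {table!r}")
    target = location.replace("'", "''")  # escape quotes for the SQL string literal
    return f"CREATE EXTERNAL TABLE {table} STORED AS {fmt.upper()} LOCATION '{target}';"


def persist(statement: str, state_file: Path) -> None:
    """Append the statement so that replaying state.sql re-registers every table."""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    with state_file.open("a", encoding="utf-8") as f:
        f.write(statement + "\n")
```

Because state.sql is plain SQL, replaying it with datafusion-cli --file recreates each table in a fresh session.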

materialized-view

Create and manage materialized views — persist SQL query results as Parquet files for fast repeated access. Track source dependencies and refresh when data changes.

/datafusion-skills:materialized-view "create a daily summary of trades grouped by symbol"
/datafusion-skills:materialized-view refresh trades_daily
/datafusion-skills:materialized-view status
/datafusion-skills:materialized-view list
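Refreshing a view comes down to two pieces: rewriting the Parquet file with DataFusion's COPY (query) TO 'path' STORED AS PARQUET statement, and deciding when a rewrite is needed. This sketch assumes local source files and uses modification times as the change signal; refresh_sql and is_stale are hypothetical names, and the plugin may track dependencies differently.

```python
from pathlib import Path
from typing import Iterable


def refresh_sql(query: str, output: str) -> str:
    """Build a statement that (re)writes the view's results to a Parquet file."""
    body = query.strip().rstrip(";").strip()  # a trailing ';' is not valid inside COPY (...)
    target = output.replace("'", "''")  # escape quotes for the SQL string literal
    return f"COPY ({body}) TO '{target}' STORED AS PARQUET;"


def is_stale(output: Path, sources: Iterable[Path]) -> bool:
    """A view needs a refresh if it was never built or any source file is newer."""
    if not output.exists():
        return True
    built_at = output.stat().st_mtime
    return any(src.stat().st_mtime > built_at for src in sources)
```

Sources on S3 or GCS would need object metadata (for example, last-modified timestamps) instead of local mtimes.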

explain-plan

Visualize and analyze query execution plans. Identifies performance bottlenecks and suggests optimizations.

/datafusion-skills:explain-plan SELECT * FROM trades WHERE date > '2024-01-01'
/datafusion-skills:explain-plan --analyze SELECT COUNT(*) FROM large_table GROUP BY category
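The --analyze flag maps naturally onto DataFusion's two forms. EXPLAIN prints the logical and physical plans without executing the query. EXPLAIN ANALYZE runs the query and annotates each operator with runtime metrics. A hypothetical sketch of the translation the skill might perform (explain_sql is an assumed name):

```python
def explain_sql(args: str) -> str:
    """Translate the skill's arguments into an EXPLAIN statement (hypothetical)."""
    text = args.strip()
    flag, _, rest = text.partition(" ")
    analyze = flag == "--analyze"
    query = (rest if analyze else text).strip().rstrip(";").strip()
    if not query:
        raise ValueError("no query to explain")
    # EXPLAIN only plans the query; EXPLAIN ANALYZE also executes it to collect metrics.
    prefix = "EXPLAIN ANALYZE" if analyze else "EXPLAIN"
    return f"{prefix} {query};"
```

Since EXPLAIN ANALYZE really executes the query, it is worth keeping behind an explicit flag for expensive queries.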

datafusion-docs

Search Apache DataFusion documentation — user guide, SQL reference, and API docs. Returns relevant documentation for a question or keyword.

/datafusion-skills:datafusion-docs window functions
/datafusion-skills:datafusion-docs "how do I create an external table?"
/datafusion-skills:datafusion-docs APPROX_PERCENTILE_CONT

install-datafusion

Install or update datafusion-cli. Supports Homebrew, cargo install, and pre-built binaries.

/datafusion-skills:install-datafusion
/datafusion-skills:install-datafusion --update
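Choosing an install method could look like the following sketch. It prefers Homebrew (brew install datafusion), then cargo (cargo install datafusion-cli), and otherwise signals a fallback to a pre-built binary. The preference order and the install_command helper are assumptions. The which parameter is injectable only so the logic is easy to test.

```python
import shutil
from typing import Callable, List, Optional


def install_command(update: bool = False,
                    which: Callable[[str], Optional[str]] = shutil.which) -> Optional[List[str]]:
    """Pick an install/upgrade command for datafusion-cli (assumed preference order)."""
    if which("brew"):
        return ["brew", "upgrade" if update else "install", "datafusion"]
    if which("cargo"):
        # cargo install replaces an older installed version when a newer one is published.
        return ["cargo", "install", "datafusion-cli"]
    return None  # neither tool found: fall back to a pre-built binary
```

After installing, datafusion-cli --version confirms which version is on the PATH.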

Session state

All skills share a single state.sql file per project — a plain SQL file containing CREATE EXTERNAL TABLE statements and configuration. When state is first needed, you'll be asked where to store it:

  1. In the project directory (.datafusion-skills/state.sql) — colocated with the project, optionally gitignored
  2. In your home directory (~/.datafusion-skills/<project>/state.sql) — keeps the repo clean

Any skill restores the session via datafusion-cli --file state.sql.
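Resolving the two storage locations and the restore command might look like this. The sketch assumes <project> is the project directory's name; state_path and restore_command are hypothetical helpers, and only the datafusion-cli --file invocation comes from this README.

```python
from pathlib import Path
from typing import List, Optional


def state_path(project_dir: Path, in_project: bool, home: Optional[Path] = None) -> Path:
    """Return where state.sql lives for the two storage choices."""
    if in_project:
        # Option 1: colocated with the project
        return project_dir / ".datafusion-skills" / "state.sql"
    # Option 2: under the home directory; <project> assumed to be the directory's name
    base = home if home is not None else Path.home()
    return base / ".datafusion-skills" / project_dir.name / "state.sql"


def restore_command(state: Path) -> List[str]:
    """Command that replays the saved CREATE EXTERNAL TABLE statements."""
    return ["datafusion-cli", "--file", str(state)]
```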

How the skills work together

Skills reference each other where it makes sense:

  • read-file suggests query for follow-up exploration and create-table for persisting data
  • query uses session state from create-table automatically
  • materialized-view creates persistent Parquet files registered via create-table
  • explain-plan helps optimize queries from query
  • All skills use datafusion-docs to troubleshoot DataFusion errors automatically

Why DataFusion?

Apache DataFusion is a fast, extensible query engine built in Rust on top of Apache Arrow. It offers:

  • High performance: Vectorized execution, predicate pushdown, partition pruning
  • Standard SQL: Broad SQL support, including window functions, CTEs, and subqueries
  • Extensibility: Custom table providers, UDFs, optimizer rules
  • File format support: Parquet, CSV, JSON, Arrow IPC, Avro
  • Cloud native: S3, GCS, Azure object store support
  • Materialized views: Persist query results and track their dependencies (provided here by the materialized-view skill)

Local development

# Clone the repo
git clone https://github.com/datafusion-contrib/datafusion-skills.git
cd datafusion-skills

# Launch Claude Code with the local plugin directory
claude --plugin-dir .

Test individual skills:

/datafusion-skills:read-file some_local_file.parquet
/datafusion-skills:query SELECT 42
/datafusion-skills:datafusion-docs window functions

Prerequisites: datafusion-cli must be installed. If it isn't, the skills will offer to install it via /datafusion-skills:install-datafusion.

Platform support

These skills have been tested on macOS and Linux. Windows is not yet fully supported.

Reporting issues

Found a bug or have an idea? Open an issue at:

https://github.com/datafusion-contrib/datafusion-skills/issues

For DataFusion-specific bugs, please include the datafusion-cli version (datafusion-cli --version) and the full error message.

License

Apache License 2.0. See LICENSE for details.
