Drop-in Python files that add new data-source connectors to Data Formulator without modifying its source code. If the built-in connectors don't cover your data source (an internal warehouse, a SaaS API, a niche database), write a small plugin and DF will pick it up on the next restart.
This folder contains example plugins. Treat them as templates: copy one, rename it, and adapt the body.
- Find your plugin directory. It lives under your Data Formulator home dir:

      $DATA_FORMULATOR_HOME/plugins/

  If `DATA_FORMULATOR_HOME` is not set, DF defaults to `~/.data_formulator/`, so the plugin dir is `~/.data_formulator/plugins/`.

  Power users can point somewhere else with `DF_PLUGIN_DIR` (highest precedence), which is useful for sharing one plugin folder across multiple DF installs.

- Copy an example into it. For instance:

      mkdir -p "${DATA_FORMULATOR_HOME:-$HOME/.data_formulator}/plugins"
      cp examples/plugins/sqlite_data_loader.py \
        "${DATA_FORMULATOR_HOME:-$HOME/.data_formulator}/plugins/"

- Restart Data Formulator. The new connector appears in the UI automatically. No registry edits, no rebuilds.

To verify it loaded, check the startup log for lines like:

    INFO ... Plugin loader 'sqlite' registered from sqlite_data_loader.py
    INFO ... Plugin scan complete: 1 registered, 0 failed (dir=..., reason=WORKSPACE_BACKEND=local)
DF resolves the plugin directory in this order of precedence:

| Precedence | Source | Default |
|---|---|---|
| 1 | `DF_PLUGIN_DIR` env var (explicit override) | (none) |
| 2 | `$DATA_FORMULATOR_HOME/plugins` | (none) |
| 3 | Fallback | `~/.data_formulator/plugins/` |
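The precedence above can be sketched in a few lines of Python. This is only an illustration of the documented order, not DF's actual resolution code; `resolve_plugin_dir` is a hypothetical helper, and treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_plugin_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Hypothetical helper mirroring the documented precedence."""
    # 1. Explicit override wins.
    override = env.get("DF_PLUGIN_DIR")
    if override:
        return Path(override).expanduser()
    # 2. Otherwise use the plugins folder under the DF home dir.
    home = env.get("DATA_FORMULATOR_HOME")
    if home:
        return Path(home).expanduser() / "plugins"
    # 3. Fall back to ~/.data_formulator/plugins/.
    return Path.home() / ".data_formulator" / "plugins"


print(resolve_plugin_dir({"DF_PLUGIN_DIR": "/srv/shared-plugins",
                          "DATA_FORMULATOR_HOME": "/opt/df"}))
print(resolve_plugin_dir({"DATA_FORMULATOR_HOME": "/opt/df"}))
print(resolve_plugin_dir({}))
```

Passing the environment in as a mapping keeps the sketch easy to try with different combinations of variables without touching the real environment.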
The registry key is derived from the plugin's filename:

| Filename | Registry key |
|---|---|
| `sqlite_data_loader.py` | `sqlite` |
| `acme_warehouse_data_loader.py` | `acme_warehouse` |
| `notion_data_loader.py` | `notion` |
Rules:
- The filename must end in `_data_loader.py`.
- The prefix becomes the registry key; keep it lowercase, no spaces.
- If the key matches a built-in (e.g. `mysql_data_loader.py`), the plugin overrides the built-in. Useful for hot-patching.
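As a rough illustration of the naming rules, the filename-to-key mapping might look like the sketch below. `registry_key` and `follows_naming_advice` are hypothetical helpers written for this example; whether DF itself normalises or rejects keys that break the lowercase/no-spaces advice is not stated here, so the sketch only reports it.

```python
from pathlib import Path
from typing import Optional

SUFFIX = "_data_loader.py"


def registry_key(filename: str) -> Optional[str]:
    """Return the prefix before _data_loader.py, or None if the name does not qualify."""
    name = Path(filename).name
    if not name.endswith(SUFFIX):
        return None
    key = name[: -len(SUFFIX)]
    return key or None  # a bare "_data_loader.py" has no usable prefix


def follows_naming_advice(key: str) -> bool:
    """True when the key is lowercase and contains no spaces."""
    return key == key.lower() and " " not in key


for candidate in ["sqlite_data_loader.py", "acme_warehouse_data_loader.py", "notes.py"]:
    print(candidate, "->", registry_key(candidate))
```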
Each plugin defines exactly one class that subclasses `ExternalDataLoader`. The minimum surface area:
from data_formulator.data_loader.external_data_loader import (
ExternalDataLoader, MAX_IMPORT_ROWS,
)
import pyarrow as pa
class MyLoader(ExternalDataLoader):
# Optional: human-friendly UI label. Without this, the registry key
# is title-cased (``"my_warehouse"`` → ``"My Warehouse"``). Override
# to fix awkward casing (``"SQLite"``, ``"BigQuery"``).
DISPLAY_NAME = "My Warehouse"
@staticmethod
def list_params() -> list[dict]:
"""Declare connection-form fields. The UI auto-renders this."""
return [
{"name": "endpoint", "type": "string", "required": True,
"tier": "connection", "description": "Server URL"},
{"name": "token", "type": "string", "required": True,
"tier": "auth", "sensitive": True, "description": "API token"},
]
@staticmethod
def auth_instructions() -> str:
"""Markdown help text shown next to the form."""
return "Get your API token from https://example.com/settings/tokens"
def __init__(self, params: dict):
self.params = params
# validate + open connection here
def list_tables(self, table_filter: str | None = None) -> list[dict]:
"""Return catalog: [{name, metadata: {columns, row_count}}, ...]"""
...
def fetch_data_as_arrow(self, source_table: str,
import_options: dict | None = None) -> pa.Table:
"""Read rows. Honour import_options['size'] up to MAX_IMPORT_ROWS."""
        ...

Look at `sqlite_data_loader.py` for a runnable implementation (~170 lines, stdlib only).
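To make the `list_tables` return shape more concrete, here is a small stdlib-only sketch against an in-memory SQLite database. It follows the `[{name, metadata: {columns, row_count}}, ...]` outline from the docstring, but the inner layout of `columns` (a list of `{"name", "type"}` dicts) and the substring behaviour of `table_filter` are assumptions made for illustration; check the bundled SQLite example for the shape DF actually expects.

```python
import sqlite3


def list_tables_sketch(conn, table_filter=None):
    """Build a catalog in the outline shape; the column layout is an assumption."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    catalog = []
    for name in names:
        # Assumed semantics: case-insensitive substring match.
        if table_filter and table_filter.lower() not in name.lower():
            continue
        quoted = '"' + name.replace('"', '""') + '"'
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = [{"name": col[1], "type": col[2]}
                   for col in conn.execute(f"PRAGMA table_info({quoted})")]
        row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        catalog.append({"name": name,
                        "metadata": {"columns": columns, "row_count": row_count}})
    return catalog


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
catalog = list_tables_sketch(conn)
print(catalog)
```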
Each dict returned by `list_params()` accepts these keys:

| Key | Meaning |
|---|---|
| `name` | Parameter key passed into `__init__(params)` |
| `type` | `"string"`, `"int"`, `"bool"`, `"password"` |
| `required` | If `True`, DF rejects connections that omit it |
| `default` | Pre-filled value in the form |
| `sensitive` | If `True`, value is redacted from logs / stored metadata |
| `tier` | `"auth"`, `"connection"`, or `"filter"`; groups fields visually |
| `description` | Help text shown under the field |
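When writing a plugin it can help to lint the `list_params()` output against the keys in the table before restarting DF. The checker below is a hypothetical development aid, not part of DF; it assumes `name` and `type` are always present and treats unknown keys as suspicious, which DF itself may or may not do.

```python
ALLOWED_KEYS = {"name", "type", "required", "default",
                "sensitive", "tier", "description"}
ALLOWED_TYPES = {"string", "int", "bool", "password"}
ALLOWED_TIERS = {"auth", "connection", "filter"}


def check_params(params):
    """Return a list of human-readable problems; empty means nothing was flagged."""
    problems = []
    for index, param in enumerate(params):
        label = param.get("name") or f"param #{index}"
        if not isinstance(param.get("name"), str) or not param.get("name"):
            problems.append(f"{label}: missing or empty 'name'")
        unknown = set(param) - ALLOWED_KEYS
        if unknown:
            problems.append(f"{label}: unknown keys {sorted(unknown)}")
        if param.get("type") not in ALLOWED_TYPES:
            problems.append(f"{label}: unexpected type {param.get('type')!r}")
        if "tier" in param and param["tier"] not in ALLOWED_TIERS:
            problems.append(f"{label}: unexpected tier {param['tier']!r}")
    return problems


good = [
    {"name": "endpoint", "type": "string", "required": True,
     "tier": "connection", "description": "Server URL"},
    {"name": "token", "type": "string", "required": True,
     "tier": "auth", "sensitive": True, "description": "API token"},
]
bad = [{"name": "port", "type": "integer", "teir": "connection"}]

print(check_params(good))
print(check_params(bad))
```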
Plugins execute arbitrary Python in the server process. To prevent accidental code execution in shared deployments, the plugin scanner is enabled only in single-user local mode:
- `WORKSPACE_BACKEND` unset or `local` → scanner runs.
- `WORKSPACE_BACKEND` is anything else → scanner is skipped silently.
- To opt in for a hosted deployment, set `DF_ALLOW_PLUGINS=1` and make sure the plugin directory is writable only by trusted administrators.
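The gating rules above amount to a small decision function. The sketch below restates them in Python; `plugin_scanner_enabled` is a hypothetical name, and details such as treating an empty `WORKSPACE_BACKEND` as unset or accepting only the literal value `1` for `DF_ALLOW_PLUGINS` are assumptions, since the exact parsing is not described here.

```python
import os
from typing import Mapping


def plugin_scanner_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical restatement of the documented gating rules."""
    backend = env.get("WORKSPACE_BACKEND", "")
    # Single-user local mode: the scanner runs.
    if backend in ("", "local"):
        return True
    # Any other backend: skipped unless an administrator opts in.
    return env.get("DF_ALLOW_PLUGINS") == "1"


print(plugin_scanner_enabled({}))
print(plugin_scanner_enabled({"WORKSPACE_BACKEND": "azure_blob"}))
print(plugin_scanner_enabled({"WORKSPACE_BACKEND": "azure_blob",
                              "DF_ALLOW_PLUGINS": "1"}))
```

The backend name `azure_blob` is only a stand-in for "anything other than `local`".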
The scanner reports every failure two ways:

- In the log, with a full traceback:

      WARNING Failed to load plugin foo_data_loader.py: ... ModuleNotFoundError ...

- In `DISABLED_LOADERS`, which the frontend surfaces as a greyed-out connector with the reason. Common reasons:

  | Message | Fix |
  |---|---|
  | `missing dependency 'X' (pip install X)` | `uv pip install X` in DF's venv |
  | `no ExternalDataLoader subclass found` | Add a `class Foo(ExternalDataLoader)` to the file |
  | `could not create import spec` | File isn't valid Python or has wrong extension |
  | `<ExceptionType>: <message>` | Anything else raised during import; check the traceback |

The plugin scanner cleans `sys.modules` on failure, so simply restarting DF after editing the file picks up the fix. No manual cleanup is needed.
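For readers curious how a load-then-clean-up scan can work in general, here is a minimal sketch using `importlib`. It is not DF's scanner: `try_load_plugin` and the `df_plugin_` module-name prefix are hypothetical, and the reason strings are simply borrowed from the table of common messages.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def try_load_plugin(path: Path) -> Tuple[Optional[object], Optional[str]]:
    """Import one file; return (module, None) on success or (None, reason)."""
    module_name = f"df_plugin_{path.stem}"  # hypothetical naming scheme
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        return None, "could not create import spec"
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except ModuleNotFoundError as exc:
        # Drop the half-initialised module so a later retry starts clean.
        sys.modules.pop(module_name, None)
        return None, f"missing dependency '{exc.name}' (pip install {exc.name})"
    except Exception as exc:
        sys.modules.pop(module_name, None)
        return None, f"{type(exc).__name__}: {exc}"
    return module, None


with tempfile.TemporaryDirectory() as tmp:
    broken = Path(tmp) / "foo_data_loader.py"
    broken.write_text("import a_package_that_does_not_exist_xyz\n")
    module, reason = try_load_plugin(broken)
    print(reason)
```

Removing the entry from `sys.modules` in the failure branches is what keeps a broken import from leaving a stale, partially executed module behind.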
If your plugin needs a package that isn't already in DF's environment:

    # from the data-formulator repo root
    uv pip install <package>

The plugin file is loaded into the same Python process as DF, so anything importable from DF's venv is importable from your plugin.
When asked to add a new data source to a user's Data Formulator install, prefer writing a plugin file over modifying DF's source tree:
- Read `sqlite_data_loader.py` and the `ExternalDataLoader` ABC in `py-src/data_formulator/data_loader/external_data_loader.py` to understand the required methods.
- Create a new file `<source>_data_loader.py` in this folder (or directly in `~/.data_formulator/plugins/`).
- Subclass `ExternalDataLoader`. Implement at minimum: `list_params`, `auth_instructions`, `__init__`, `list_tables`, `fetch_data_as_arrow`.
- Return data as a `pyarrow.Table` from `fetch_data_as_arrow`; do not convert to pandas in the hot path.
- Quote identifiers when building SQL (see `_quote_ident` in the SQLite example) to avoid injection vulnerabilities.
- Respect `import_options['size']` and cap at `MAX_IMPORT_ROWS`.
- If the source has credentials, mark those params `sensitive: True` and `tier: "auth"` so DF redacts them from stored metadata.
- Do not modify the user's data; open read-only connections where the source supports it.
After writing the file, verify it loads with:
DF_PLUGIN_DIR=<dir> uv run python -c \
"from data_formulator import data_loader as dl; \
print(dl.PLUGIN_LOADERS, dl.DISABLED_LOADERS)"