MPZSQL - Apache Arrow FlightSQL Server


MPZSQL is a Python implementation of an Apache Arrow FlightSQL server that supports DuckLake.

See these resources for more understanding:

Update 17-Sep-2025 MPZSQL now targets DuckDB 1.4.0 alongside the DuckLake 0.3 platform update. Read more about the DuckLake 0.3 release here.

Update 12-Sep-2025 The Azure deployment repo can be found here: https://github.com/MiguelElGallo/mpzsql_azinfra

Warning!

This software is in an experimental state. I have not tested the security features yet. The software it depends on, such as DuckLake, is also experimental. (Aligned with the September 2025 DuckLake 0.3 release and the DuckDB 1.4.0 runtime.)

Do NOT use in production!

Minimum Configuration for Starting in DuckLake Mode

Azure Login

Make sure you are authenticated with Azure CLI:

az login

The server relies on Azure Identity's DefaultAzureCredential, so it automatically picks up the credentials produced by az login when it runs. The Azure account you log in with must have data-plane permissions on the storage account that hosts your DuckLake files:

  1. Storage Blob Data Contributor (if you need write access)

You can grant this role at the storage-account or container scope. See Microsoft's documentation for details.

Without the correct data-plane privileges, the server will fail when it tries to list, read, or write blobs.
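
As a quick sanity check, the snippet below (a minimal sketch, assuming the azure-identity and azure-storage-blob packages; the account and container names are placeholders) verifies that the identity picked up by DefaultAzureCredential can actually list blobs in the DuckLake container:

```python
# Minimal sketch: verify that the identity picked up by DefaultAzureCredential
# (e.g. the one produced by `az login`) can list blobs in the DuckLake container.
# The storage account and container names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()
service = BlobServiceClient(
    account_url="https://<your-storage-account>.blob.core.windows.net",
    credential=credential,
)
container = service.get_container_client("<your-container>")

# A simple list operation fails fast if the data-plane role assignment is missing.
for blob in container.list_blobs():
    print(blob.name)
    break
```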

Running the Server

Set the required environment variables (POSTGRESQL_SERVER, POSTGRESQL_USER, etc.) and then run the following command:

uv run python -m mpzsql.cli \
  --database "localconf.duckdb" \
  --print-queries \
  --secret-key "test-secret-key" \
  --postgresql-server "$POSTGRESQL_SERVER" \
  --postgresql-port "$POSTGRESQL_PORT" \
  --postgresql-user "$POSTGRESQL_USER" \
  --postgresql-password "$POSTGRESQL_PASSWORD" \
  --postgresql-catalogdb "$POSTGRESQL_CATALOGDB" \
  --azure-storage-account "$AZURE_STORAGE_ACCOUNT" \
  --azure-storage-container "$AZURE_STORAGE_CONTAINER"

The PostgreSQL catalog database must already exist.

The server is verbose and creates five log files; check the root folder.

Testing with a Client

The server will run on localhost port 8080. You can use any JDBC client to connect, with a connection string like: jdbc:arrow-flight-sql://localhost:8080?useEncryption=false&disableCertificateVerification=true

I have tested with DBeaver, and I found this guide.
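
If you prefer Python over JDBC, a minimal sketch using the ADBC Flight SQL driver is shown below (it assumes the adbc-driver-flightsql and pyarrow packages and a server started without TLS or authentication; adjust the URI and driver options for your setup):

```python
# Minimal sketch: connect to the local MPZSQL server over Flight SQL with the
# ADBC driver (pip install adbc-driver-flightsql pyarrow). Assumes no TLS and
# no username/password.
import adbc_driver_flightsql.dbapi as flightsql

with flightsql.connect("grpc://localhost:8080") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 42 AS answer")
        print(cur.fetch_arrow_table())
```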

Notes about Logfire

The server can send logs to Logfire. Just set the environment variable LOGFIRE_WRITE_TOKEN.

Features

  • Apache Arrow FlightSQL interface
  • JDBC interface

FlightSQL Implementation Status

✅ Implemented FlightSQL Actions

  • CreatePreparedStatement - Create prepared statements for efficient query execution
  • ClosePreparedStatement - Clean up prepared statements
  • BeginTransaction - Start database transactions
  • EndTransaction - Commit or rollback transactions
  • CloseSession - Clean up session resources and state

✅ Implemented FlightSQL Commands

  • CommandStatementQuery - Execute SQL queries
  • CommandGetCatalogs - List database catalogs
  • CommandGetDbSchemas - List database schemas
  • CommandGetTables - List tables with metadata
  • CommandGetTableTypes - List available table types
  • CommandGetColumns - List column metadata
  • CommandGetSqlInfo - SQL capability information
  • CommandPreparedStatementQuery - Execute prepared statements
  • CommandStatementUpdate - Execute INSERT/UPDATE/DELETE statements
  • CommandPreparedStatementUpdate - Execute prepared statement updates

❌ Not Implemented Yet

The following FlightSQL commands are not yet implemented but could be added for enhanced JDBC compatibility:

Priority 1: Essential Metadata Commands

  • GetXdbcTypeInfo - JDBC/ODBC type information
  • GetPrimaryKeys - Primary key metadata

Priority 2: Advanced Metadata Commands

  • GetImportedKeys - Foreign key information (imported)
  • GetExportedKeys - Foreign key information (exported)
  • GetCrossReference - Cross-reference between tables

  • Current compatibility: ~85% FlightSQL compatible
  • With Priority 1: ~95% FlightSQL compatible
  • With full implementation: 100% FlightSQL compatible

FlightSQL → DuckDB backend mapping

This section maps FlightSQL operations to their DuckDB backend implementations and the SQL each method executes.

  • Flight SQL: the FlightSQL command/action.
  • Backend method: method in src/mpzsql/backends/duckdb_backend.py.
  • SQL / Behavior: the actual SQL executed (or behavior when no direct SQL).
| Flight SQL | Backend method | SQL / Behavior |
| --- | --- | --- |
| CommandGetCatalogs | get_catalogs | SELECT DISTINCT catalog_name FROM information_schema.schemata ORDER BY catalog_name; fallback: SHOW DATABASES |
| CommandGetDbSchemas | get_db_schemas | Base: SELECT catalog_name, schema_name AS db_schema_name FROM information_schema.schemata WHERE 1 = 1 [AND catalog_name = ? or AND catalog_name = CURRENT_DATABASE()] [AND schema_name LIKE ?] ORDER BY catalog_name, db_schema_name |
| CommandGetTables | get_tables | Base: SELECT table_catalog AS catalog_name, table_schema AS db_schema_name, table_name, table_type FROM information_schema.tables WHERE 1=1 [AND table_catalog = ? or AND table_catalog = CURRENT_DATABASE()] [AND table_schema LIKE ?] [AND table_name LIKE ?] [AND table_type IN (?, ...)] ORDER BY table_name |
| CommandGetTableTypes | get_table_types | No direct SQL; returns a static Arrow table: ["BASE TABLE","VIEW","LOCAL TEMPORARY","SYSTEM TABLE"] |
| CommandGetColumns | get_columns | Base: SELECT table_catalog AS catalog_name, table_schema AS db_schema_name, table_name, column_name, data_type, ordinal_position AS "DECIMAL_DIGITS", 'YES' AS "IS_NULLABLE", 0 AS "NUM_PREC_RADIX" FROM information_schema.columns WHERE 1=1 [AND table_catalog = '...'] [AND table_schema LIKE '...'] [AND table_name LIKE '...'] [AND column_name LIKE '...'] ORDER BY table_catalog, table_schema, table_name, ordinal_position; then reshapes the result to the FlightSQL GetColumns schema (adds/maps data_type codes, etc.) |
| CommandGetSqlInfo | get_sql_info | No direct SQL; returns a placeholder Arrow table from the provided info codes |
| CommandStatementQuery | execute_query | Executes the client-provided SELECT/SQL and returns an Arrow table (varies per client query) |
| CommandStatementUpdate | execute_update | Executes client-provided DML (INSERT/UPDATE/DELETE); returns affected rows (DuckDB returns a BIGINT count) |
| CommandPreparedStatementQuery | get_statement_schema (prepare phase) / execute_query (execution) | Prepare phase (schema inference): typically PREPARE stmt AS {query}; DESCRIBE stmt; DEALLOCATE stmt. For parameterized queries: attempts a LIMIT 0 variant replacing ? with 1; falls back to PREPARE with NULLs. Execution: same as CommandStatementQuery, with parameters |
| CommandPreparedStatementUpdate | execute_update | Executes client-provided DML with bound parameters; returns affected rows |
| Action: CreatePreparedStatement | get_statement_schema | Same schema inference logic as above: PREPARE / DESCRIBE / DEALLOCATE; for parameterized queries, tries LIMIT 0 with dummy values, falling back to PREPARE with NULLs |
| Action: ClosePreparedStatement | — | No backend SQL; server clears the cached handle |
| Action: BeginTransaction / EndTransaction | — | No backend SQL in the current DuckDB backend; transaction state is tracked in the server layer |
| Action: CloseSession | — | No backend SQL; server/session cleanup only |
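
The prepare-phase schema inference mentioned for CommandPreparedStatementQuery can be pictured with the sketch below (plain DuckDB in Python, assuming an in-memory connection and the pyarrow package; not the server's actual code): placeholders are replaced with a dummy value and LIMIT 0 is appended so the Arrow schema can be read without materializing any rows.

```python
# Sketch of the LIMIT 0 schema-inference idea, assuming an in-memory DuckDB
# connection and pyarrow; not the server's actual implementation.
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE t (id INTEGER, name VARCHAR)")

query = "SELECT id, name FROM t WHERE id = ?"
# Replace the placeholder with a dummy value and add LIMIT 0 so no rows are
# returned, then read the Arrow schema of the empty result set.
probe = query.replace("?", "1") + " LIMIT 0"
schema = con.execute(probe).fetch_arrow_table().schema
print(schema)  # id: int32, name: string
```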

Notes:

  • Patterns: Incoming FlightSQL patterns use * and are converted to SQL LIKE % in backend methods (see the helper sketch after these notes).
  • CURRENT_DATABASE(): When catalog is not provided for GetTables/GetDbSchemas, backend uses CURRENT_DATABASE() to match FlightSQL metadata behavior.
  • Errors/Empty results: On errors, backend returns empty Arrow tables with correct schemas where appropriate.
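
A hypothetical one-liner for the pattern note above (the function name is illustrative, not taken from the codebase):

```python
# Illustrative only: convert a FlightSQL-style "*" wildcard pattern into a
# SQL LIKE pattern, as described in the note above.
def to_like_pattern(flight_pattern: str) -> str:
    return flight_pattern.replace("*", "%")

print(to_like_pattern("main*"))  # main%
```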

Arrow Flight → DuckDB backend mapping

This section maps the core Arrow Flight methods implemented in MinimalFlightSQLServer to DuckDB backend behavior in DuckDBBackend, including the SQL they execute (when applicable).

Reference: https://arrow.apache.org/docs/python/api/flight.html

| Arrow Flight API | Server method | Backend method(s) | DuckDB SQL / Behavior |
| --- | --- | --- | --- |
| Handshake | MinimalFlightSQLServer.handshake | — | Authentication handshake; no SQL. Returns capability/auth messages. |
| ListFlights | MinimalFlightSQLServer.list_flights | — | Advertises a few metadata endpoints as PATH descriptors; no SQL. |
| GetSchema (DescriptorType.PATH) | MinimalFlightSQLServer._get_schema_for_path | — | Static schemas for “catalogs”, “schemas”, “tables”, “table_types”, “sql_info”. Unknown paths return a generic schema; no SQL. |
| GetFlightInfo (DescriptorType.PATH: “<table_name>”) | MinimalFlightSQLServer.get_flight_info | DuckDBBackend.get_table_schema, DuckDBBackend.get_table_row_count | Schema: inferred via a lightweight select (typically SELECT * FROM {table_name} LIMIT 1) to get the Arrow schema. Row count: SELECT COUNT(*) FROM {table_name}. Ticket returned: PATH:{table_name}. |
| DoGet (Ticket produced by PATH GetFlightInfo) | MinimalFlightSQLServer.do_get | — | MISSING: current implementation does |
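
The GetFlightInfo row above boils down to two small queries; a rough sketch with plain DuckDB (assuming an in-memory connection and pyarrow; illustrative, not the server's code) looks like this:

```python
# Rough sketch of the GetFlightInfo behavior described above: infer the Arrow
# schema with a LIMIT 1 probe and count rows with COUNT(*).
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE t (i BIGINT)")
con.execute("INSERT INTO t VALUES (1), (2), (3)")

table_name = "t"
schema = con.execute(f"SELECT * FROM {table_name} LIMIT 1").fetch_arrow_table().schema
row_count = con.execute(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]
print(schema)     # i: int64
print(row_count)  # 3
```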

Configuration Options

MPZSQL can be configured using command-line switches or environment variables. Environment variables override the built-in defaults, while explicit CLI switches override environment variables, with one exception: MPZSQL_PORT always takes precedence over --port.
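
As an illustration of those precedence rules (a hypothetical sketch, not the actual CLI code), port resolution could look like this:

```python
# Hypothetical sketch of the precedence described above: MPZSQL_PORT always
# wins, then an explicit --port switch, then the built-in default.
import os

def resolve_port(cli_port: int | None = None, default: int = 8080) -> int:
    env_port = os.environ.get("MPZSQL_PORT")
    if env_port is not None:
        return int(env_port)   # MPZSQL_PORT overrides everything
    if cli_port is not None:
        return cli_port        # explicit --port switch
    return default             # built-in default

print(resolve_port(cli_port=9090))
```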

Backend Options

| Switch | Environment Variable | Description | Type | Default |
| --- | --- | --- | --- | --- |
| --backend | - | Database backend (duckdb) | string | duckdb |
| --database | - | Database filename (defaults to in-memory for DuckDB) | string | None |

Network Options

| Switch | Environment Variable | Description | Type | Default |
| --- | --- | --- | --- | --- |
| --hostname | MPZSQL_HOSTNAME | Server hostname to listen on | string | localhost |
| --advertised-hostname | MPZSQL_ADVERTISED_HOSTNAME or WEBSITE_HOSTNAME | Hostname to advertise to clients (defaults to hostname) | string | None |
| --port | MPZSQL_PORT | Server port (MPZSQL_PORT env var takes precedence over CLI) | int (1-65535) | 8080 |

Authentication Options

| Switch | Environment Variable | Description | Type | Default |
| --- | --- | --- | --- | --- |
| --username | MPZSQL_USERNAME | Authentication username | string | None |
| --password | MPZSQL_PASSWORD | Authentication password | string | None |
| --secret-key | SECRET_KEY | JWT secret key (random if not provided) | string | None |

TLS Options

| Switch | Environment Variable | Description | Type | Default |
| --- | --- | --- | --- | --- |
| --tls-cert | - | TLS certificate file path | string | None |
| --tls-key | - | TLS private key file path | string | None |
| --mtls-ca | MPZSQL_MTLS_CA | mTLS CA certificate for client verification | string | None |

SQL Initialization Options

| Switch | Environment Variable | Description | Type | Default |
| --- | --- | --- | --- | --- |
| --init-sql | MPZSQL_INIT_SQL | SQL commands to run on startup | string | None |
| --init-sql-file | MPZSQL_INIT_SQL_FILE | File containing SQL commands to run on startup | string | None |

Server Behavior Options

| Switch | Environment Variable | Description | Type | Default |
| --- | --- | --- | --- | --- |
| --print-queries | - | Print executed queries to console | bool | False |
| --read-only | - | Enable read-only mode | bool | False |

PostgreSQL Connection Options

| Switch | Environment Variable | Description | Type | Default |
| --- | --- | --- | --- | --- |
| --postgresql-server | POSTGRESQL_SERVER | PostgreSQL server hostname | string | None |
| --postgresql-port | POSTGRESQL_PORT | PostgreSQL server port | int | 5432 |
| --postgresql-user | POSTGRESQL_USER | PostgreSQL username | string | None |
| --postgresql-password | POSTGRESQL_PASSWORD | PostgreSQL password (use "AZURE" for Azure AD auth) | string | None |
| --postgresql-catalogdb | POSTGRESQL_CATALOGDB | PostgreSQL catalog database name | string | None |

Azure Storage Connection Options

| Switch | Environment Variable | Description | Type | Default |
| --- | --- | --- | --- | --- |
| --azure-storage-account | AZURE_STORAGE_ACCOUNT | Azure Storage account name | string | None |
| --azure-storage-container | AZURE_STORAGE_CONTAINER | Azure Storage container name | string | None |

Other Options

| Switch | Environment Variable | Description | Type | Default |
| --- | --- | --- | --- | --- |
| --version | - | Show version and exit | bool | False |

Notes

  • Both --tls-cert and --tls-key must be provided together
  • The MPZSQL_PORT environment variable takes precedence over the --port CLI option
  • For Azure AD authentication with PostgreSQL, set --postgresql-password AZURE (see the sketch after these notes)
  • The WEBSITE_HOSTNAME environment variable is automatically set by Azure Web Apps
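
For the Azure AD note above, the usual pattern on Azure is to swap the static password for an access token. The sketch below mirrors the standard Azure Database for PostgreSQL flow and may differ from how MPZSQL handles --postgresql-password AZURE internally:

```python
# Hedged sketch: obtain an Azure AD access token for Azure Database for
# PostgreSQL via DefaultAzureCredential and use it in place of a password.
# This illustrates the common Azure pattern; MPZSQL's internal handling of
# "--postgresql-password AZURE" may differ.
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://ossrdbms-aad.database.windows.net/.default")
print(token.token[:16], "...")  # pass token.token as the PostgreSQL password
```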

Starting the Server

Check the help with:

uv run python -m mpzsql.cli --help

Development

Running Tests

Tests can be run locally using:

uv run pytest tests/ -v

For coverage reporting:

uv run coverage run -m pytest tests/
uv run coverage report
uv run coverage xml

Setting up Codecov (Optional)

If you want to enable code coverage reporting to Codecov in GitHub Actions, you need to:

  1. Create a free account at codecov.io
  2. Add your repository to Codecov
  3. Get your repository's upload token from Codecov
  4. Add the token as a repository secret named CODECOV_TOKEN in your GitHub repository settings

Without this token, the Codecov upload step will be skipped in the CI pipeline, but all tests will still run successfully.
