Add DuckDB-style Table Functions for Catalog Metadata#5

Merged: shefeek-jinnah merged 9 commits into main from shefeek/catalog_management_functions on Dec 17, 2025

@shefeek-jinnah
Collaborator

@shefeek-jinnah shefeek-jinnah commented Dec 11, 2025

Adds SQL-queryable information_schema virtual tables that expose DuckLake catalog metadata, following the SQL standard pattern.

#4

Changes

  • New virtual tables in information_schema schema:

    • snapshots - Lists all catalog snapshots
    • schemata - Lists schemas with paths
    • tables - Lists tables across all schemas
    • columns - Lists columns with types
    • files - Lists data files with size and delete file status
  • Live querying: All tables query the catalog database on every execution (no caching)

  • Integration: Registered as a special schema in DuckLakeCatalog
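The "special schema" integration described above can be pictured as a name check that runs before the normal schema lookup. The following is a minimal, standard-library-only sketch of that idea. The `Catalog` struct, the `SchemaKind` enum, and the helper function are hypothetical stand-ins for illustration, not the types used in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical result of a schema lookup (not the PR's actual type).
#[derive(Debug, Clone, PartialEq)]
enum SchemaKind {
    /// The reserved metadata schema, synthesized by the catalog itself.
    Information,
    /// A regular schema, identified here by its storage path.
    User(String),
}

/// Hypothetical catalog: maps user schema names to their paths.
struct Catalog {
    user_schemas: HashMap<String, String>,
}

impl Catalog {
    /// The reserved name is intercepted first; everything else
    /// falls through to the ordinary lookup.
    fn schema(&self, name: &str) -> Option<SchemaKind> {
        if name == "information_schema" {
            return Some(SchemaKind::Information);
        }
        self.user_schemas
            .get(name)
            .map(|path| SchemaKind::User(path.clone()))
    }
}

/// Virtual table names taken from the list in the PR description.
const VIRTUAL_TABLES: [&str; 5] = ["snapshots", "schemata", "tables", "columns", "files"];

/// Returns true if `name` is one of the virtual metadata tables.
fn information_schema_has_table(name: &str) -> bool {
    VIRTUAL_TABLES.contains(&name)
}
```

Checking the reserved name first means a user schema can never shadow the metadata schema. Whether the real implementation orders the checks this way is an assumption.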

User SQL Query
  ↓
DataFusion Query Planner
  ↓
DuckLakeCatalog.schema("information_schema")
  ↓
InformationSchemaProvider.table("tables")
  ↓
TablesTable (Custom TableProvider)
  ↓
TablesTable.scan() → LIVE QUERY
  ↓
MetadataProvider.list_*() methods
  ↓
DuckDB Catalog Database
  ↓
Arrow RecordBatch (fresh results)
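The live-query step in this flow means the virtual table holds only a handle to the metadata source and fetches rows again on every scan. A minimal sketch of that pattern, using only the standard library, is shown here. The `MetadataSource` trait, this simplified `TablesTable`, and `CountingSource` are hypothetical illustrations; the real code implements DataFusion's `TableProvider` and returns Arrow record batches rather than a `Vec<String>`.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical stand-in for the MetadataProvider.list_*() methods.
trait MetadataSource {
    fn list_tables(&self) -> Vec<String>;
}

/// Hypothetical virtual table: stores a handle to the source, never rows.
struct TablesTable<'a> {
    source: &'a dyn MetadataSource,
}

impl<'a> TablesTable<'a> {
    /// Every scan goes back to the source, so no cached state can go stale.
    fn scan(&self) -> Vec<String> {
        self.source.list_tables()
    }
}

/// Test double that counts queries and whose contents can change between scans.
struct CountingSource {
    calls: Cell<usize>,
    tables: RefCell<Vec<String>>,
}

impl MetadataSource for CountingSource {
    fn list_tables(&self) -> Vec<String> {
        self.calls.set(self.calls.get() + 1);
        self.tables.borrow().clone()
    }
}
```

The trade-off is one catalog round trip per query in exchange for results that always reflect the current catalog state.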

New Table Functions

Three new table functions are available:

  • ducklake_snapshots() - List all snapshots in the catalog
  • ducklake_table_info() - Table metadata with aggregated file statistics
  • ducklake_list_files() - Enumerate all data files across tables
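To illustrate what "aggregated file statistics" could mean for ducklake_table_info(), here is a small sketch that folds per-file rows into a per-table file count and total size. The `FileRow` and `TableInfo` structs and the `aggregate_table_info` function are hypothetical. The column names follow the SQL examples in this description, but the actual aggregation may well happen inside the catalog database rather than in Rust.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-file row, loosely mirroring ducklake_list_files() columns.
struct FileRow {
    table_name: String,
    file_size_bytes: u64,
}

/// Hypothetical per-table summary, loosely mirroring ducklake_table_info() columns.
#[derive(Debug, PartialEq, Default)]
struct TableInfo {
    file_count: u64,
    file_size_bytes: u64,
}

/// Groups files by table name, counting files and summing their sizes.
fn aggregate_table_info(files: &[FileRow]) -> BTreeMap<String, TableInfo> {
    let mut out: BTreeMap<String, TableInfo> = BTreeMap::new();
    for file in files {
        let entry = out.entry(file.table_name.clone()).or_default();
        entry.file_count += 1;
        entry.file_size_bytes += file.file_size_bytes;
    }
    out
}
```

A `BTreeMap` keeps the output ordered by table name, which makes results deterministic; that choice is for the sketch only.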

Registration

use datafusion::prelude::*;
use datafusion_ducklake::{DuckLakeCatalog, DuckdbMetadataProvider, register_ducklake_functions};
use std::sync::Arc;

let provider = DuckdbMetadataProvider::new("catalog.db")?;
let ducklake_catalog = DuckLakeCatalog::new(provider)?;
let ctx = SessionContext::new();

// Get provider before moving catalog
let provider = ducklake_catalog.provider();

// Standard DataFusion two-step registration
ctx.register_catalog("ducklake", Arc::new(ducklake_catalog));
register_ducklake_functions(&ctx, provider);
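The "Get provider before moving catalog" comment reflects ordinary Rust ownership: once the catalog is moved into `Arc::new(...)` and handed to the context, the local binding is gone, so any shared handle has to be taken first. A standard-library-only sketch of that ordering follows. The `Provider` and `Catalog` structs and `register_catalog` are hypothetical stand-ins, and the assumption that `provider()` returns a cloned `Arc` is not confirmed by the PR text.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a metadata provider.
struct Provider {
    path: String,
}

/// Hypothetical stand-in for the catalog; it keeps a shared handle to the provider.
struct Catalog {
    provider: Arc<Provider>,
}

impl Catalog {
    fn new(provider: Provider) -> Self {
        Catalog { provider: Arc::new(provider) }
    }

    /// Hands out another shared handle; only the reference count changes.
    fn provider(&self) -> Arc<Provider> {
        Arc::clone(&self.provider)
    }
}

/// Stand-in for registration: takes ownership of the catalog handle.
fn register_catalog(catalog: Arc<Catalog>) -> Arc<Catalog> {
    catalog
}
```

Both handles point at the same provider, so the table functions and the catalog read the same metadata source.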

Usage

Table Functions (DuckDB-style)

-- List all snapshots
SELECT * FROM ducklake_snapshots();

-- Get table statistics
SELECT table_name, file_count, file_size_bytes
FROM ducklake_table_info();

-- List all data files
SELECT schema_name, table_name, file_path, file_size_bytes
FROM ducklake_list_files();

-- Aggregations work naturally
SELECT SUM(file_size_bytes) as total_bytes
FROM ducklake_table_info();

information_schema Tables (SQL Standard)

The underlying information_schema virtual tables remain available:

-- List all snapshots
SELECT * FROM ducklake.information_schema.snapshots;

-- Find all tables in a schema
SELECT table_name FROM ducklake.information_schema.tables
WHERE schema_name = 'main';

-- Get column details
SELECT column_name, column_type FROM ducklake.information_schema.columns
WHERE table_name = 'users';

-- List all data files
SELECT schema_name, table_name, file_path, file_size_bytes
FROM ducklake.information_schema.files;

-- Aggregated table statistics
SELECT table_name, file_count, file_size_bytes
FROM ducklake.information_schema.table_info;

@shefeek-jinnah shefeek-jinnah changed the title from "Add information_schema virtual tables for catalog metadata queries" to "Add DuckDB-style Table Functions for Catalog Metadata" on Dec 12, 2025
@shefeek-jinnah shefeek-jinnah marked this pull request as ready for review December 15, 2025 16:41
@shefeek-jinnah shefeek-jinnah marked this pull request as draft December 16, 2025 15:47
@shefeek-jinnah shefeek-jinnah marked this pull request as ready for review December 16, 2025 17:04
Collaborator

@zfarrell zfarrell left a comment

🚀

@shefeek-jinnah shefeek-jinnah merged commit 0b06715 into main Dec 17, 2025
3 checks passed
@shefeek-jinnah shefeek-jinnah deleted the shefeek/catalog_management_functions branch December 17, 2025 05:08
2 participants