[epic] multi-catalog support #107

## Background

DataFusion uses a three-level catalog hierarchy (catalog → schema → table). DuckLake 1.0 is two-level (schema → table). Today this extension collapses them: one DuckLake metadata database corresponds to one DataFusion catalog. This blocks use cases that require multiple catalogs without the operational burden of maintaining parallel metadata infrastructure (for example, DuckDB's pattern of attaching the same metadata database multiple times under different `METADATA_SCHEMA` values).

## Goals

- Refactor the extension's metadata provider trait so catalog becomes a first-class dimension.
- Preserve compatibility with vanilla DuckLake 1.0 metadata. When using a standard metadata provider against a standard metadata database, the extension exposes a single implicit default catalog and behaves exactly as it does today.
- Enable a specialized metadata provider, backed by a metadata schema that is a superset of DuckLake 1.0, to expose multiple catalogs from a single metadata database.
- Longer term, help drive an upstream DuckLake spec change for first-class catalogs. Not an immediate goal.
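
To make the first three goals concrete, here is a minimal sketch of what a catalog-aware provider trait could look like. Everything in it is an assumption for illustration: the trait name `MetadataProvider`, the methods `list_catalogs` and `list_schemas`, the `DEFAULT_CATALOG` name, and the in-memory structs are hypothetical and do not reflect the extension's current API.

```rust
use std::collections::BTreeMap;

/// Hypothetical name for the implicit catalog exposed over vanilla DuckLake 1.0 metadata.
pub const DEFAULT_CATALOG: &str = "default";

/// Hypothetical catalog-aware provider trait (illustrative only).
pub trait MetadataProvider {
    /// Names of all catalogs visible through this provider.
    fn list_catalogs(&self) -> Vec<String>;
    /// Schemas within one catalog, or `None` if the catalog does not exist.
    fn list_schemas(&self, catalog: &str) -> Option<Vec<String>>;
}

/// Stand-in for a standard provider: two-level metadata, one implicit catalog.
pub struct StandardProvider {
    pub schemas: Vec<String>,
}

impl MetadataProvider for StandardProvider {
    fn list_catalogs(&self) -> Vec<String> {
        // Vanilla metadata has no catalog concept, so exactly one is exposed.
        vec![DEFAULT_CATALOG.to_string()]
    }

    fn list_schemas(&self, catalog: &str) -> Option<Vec<String>> {
        // Only the implicit default catalog resolves.
        (catalog == DEFAULT_CATALOG).then(|| self.schemas.clone())
    }
}

/// Stand-in for a specialized provider backed by a superset metadata schema.
pub struct MultiCatalogProvider {
    /// Catalog name mapped to its schema names.
    pub catalogs: BTreeMap<String, Vec<String>>,
}

impl MetadataProvider for MultiCatalogProvider {
    fn list_catalogs(&self) -> Vec<String> {
        // BTreeMap keys iterate in sorted order.
        self.catalogs.keys().cloned().collect()
    }

    fn list_schemas(&self, catalog: &str) -> Option<Vec<String>> {
        self.catalogs.get(catalog).cloned()
    }
}
```

With this shape, callers always pass a catalog name. The standard provider satisfies the compatibility goal by answering only for the default catalog, while a multi-catalog provider can answer for several.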

## Non-goals

- Cross-engine interoperability of the multi-catalog functionality. A catalog-aware metadata database is not expected to be readable by DuckDB or other DuckLake implementations.
- Reading or writing standard DuckLake metadata through the multi-catalog provider. Compatibility with standard DuckLake metadata applies only when using a compatible metadata provider.
- Landing upstream DuckLake spec changes as part of delivering this epic.

## Design note: metadata changes are not cosmetic

A naive approach would add a `ducklake_catalog` table and a `catalog_id` column on `ducklake_schema`, letting scoping flow transitively through `schema_id`. On review, this is not sufficient.

`schema_version` on `ducklake_snapshot` is a single global counter for the entire instance. Any DDL anywhere bumps it, and cache invalidation keys off this number:

```
ducklake_snapshot
├── snapshot_id
├── schema_version   (one number, entire instance)
├── next_catalog_id
└── next_file_id
```

With multiple catalogs in one metadata database, an `ALTER TABLE` on a small staging table in catalog B would invalidate cached metadata for every schema in catalog A. `ducklake_snapshot_lineage` helps with time travel and conflict detection but does not scope `schema_version` per catalog.

The implication: `catalog_id` needs to propagate down through the metadata, at minimum into snapshot-level versioning, so cache invalidation and snapshot evolution can be scoped per catalog rather than instance-wide. The exact shape is part of this work.
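
The difference between instance-wide and per-catalog versioning can be sketched in a few lines. This is a toy in-memory model, not the extension's cache or the DuckLake metadata layout: `GlobalVersion`, `PerCatalogVersions`, `record_ddl`, and `is_stale` are hypothetical names, and the eventual metadata shape is still open.

```rust
use std::collections::HashMap;

/// Hypothetical catalog identifier type.
pub type CatalogId = u64;

/// Models the current behaviour: one schema version for the whole instance.
#[derive(Default)]
pub struct GlobalVersion {
    schema_version: u64,
}

impl GlobalVersion {
    /// Any DDL bumps the single counter, regardless of catalog.
    pub fn record_ddl(&mut self, _catalog: CatalogId) {
        self.schema_version += 1;
    }

    pub fn version(&self) -> u64 {
        self.schema_version
    }

    /// A cache entry is stale whenever the global counter has moved.
    pub fn is_stale(&self, _catalog: CatalogId, cached_version: u64) -> bool {
        cached_version != self.schema_version
    }
}

/// Models the proposed direction: one schema version per catalog.
#[derive(Default)]
pub struct PerCatalogVersions {
    versions: HashMap<CatalogId, u64>,
}

impl PerCatalogVersions {
    /// DDL bumps only the counter of the catalog it touched.
    pub fn record_ddl(&mut self, catalog: CatalogId) {
        *self.versions.entry(catalog).or_insert(0) += 1;
    }

    /// Catalogs with no recorded DDL are at version 0.
    pub fn version(&self, catalog: CatalogId) -> u64 {
        self.versions.get(&catalog).copied().unwrap_or(0)
    }

    /// A cache entry is stale only if its own catalog's counter has moved.
    pub fn is_stale(&self, catalog: CatalogId, cached_version: u64) -> bool {
        cached_version != self.version(catalog)
    }
}
```

In this model, recording DDL against catalog B makes a cached entry for catalog A stale under `GlobalVersion` but leaves it valid under `PerCatalogVersions`, which is the scoping the design note asks for.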
