Feature Request: Support for add_files Functionality
Problem Statement
Currently, pg_lake does not provide a way to register existing Parquet/ORC/Avro files into an Iceberg table without rewriting the data. This is a common use case when:
- Migrating existing data lakes to Iceberg
- Integrating external data that's already in optimal file formats
- Building incremental pipelines where data is written by other systems
- Avoiding unnecessary data rewrites for cost and performance reasons
While PostgreSQL's COPY command can import data, it rewrites the underlying files, which defeats the purpose of registering existing data in place.
Proposed Solution
Iceberg's Java and Python APIs provide an add_files method that allows programmatic registration of existing data files into the catalog without rewriting them. I'd like to request similar functionality in pg_lake, preferably through one of these approaches:
Option 1: SQL Procedure/Function
-- Register existing files into an Iceberg table
CALL iceberg.add_files(
    table_name := 'my_table',
    file_paths := ARRAY[
        's3://bucket/data/file1.parquet',
        's3://bucket/data/file2.parquet'
    ]
);

Option 2: Catalog Interoperability
Expose pg_lake's catalog in a way that's compatible with PyIceberg's SqlCatalog, or provide a REST catalog interface, so that users can perform metadata operations like add_files through PyIceberg while querying the data through pg_lake.
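For illustration, here is a rough sketch of what Option 2 could look like from the client side, assuming pg_lake's catalog were reachable as a PyIceberg SqlCatalog; the catalog name, connection URI, warehouse location, and table identifier below are placeholders, not confirmed pg_lake configuration:

# Hypothetical sketch: registering existing files through PyIceberg,
# assuming pg_lake's catalog could be loaded as a PyIceberg SqlCatalog.
# The catalog name, URI, warehouse, and table identifier are placeholders.
from pyiceberg.catalog.sql import SqlCatalog

catalog = SqlCatalog(
    "pg_lake",
    uri="postgresql+psycopg2://user:pass@localhost:5432/mydb",
    warehouse="s3://bucket/warehouse",
)

table = catalog.load_table("public.my_table")

# add_files registers the files in table metadata without rewriting the data
table.add_files(file_paths=[
    "s3://bucket/data/file1.parquet",
    "s3://bucket/data/file2.parquet",
])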
Use Case Example
I have a data lake with thousands of existing Parquet files organized by date. I want to:
- Create an Iceberg table in pg_lake with the appropriate schema
- Register the existing Parquet files without copying/rewriting them
- Query the data through PostgreSQL using pg_lake
- Continue adding new files as they arrive (see the sketch below)
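To make that workflow concrete, here is a rough sketch of what the incremental registration loop could look like if Option 1 existed; the DDL, the iceberg.add_files procedure, and the connection details are placeholders rather than actual pg_lake syntax:

# Hypothetical sketch of the workflow above, written against the proposed
# iceberg.add_files procedure from Option 1. The DDL, procedure name, and
# connection string are placeholders, not confirmed pg_lake syntax.
import psycopg

new_files = [
    "s3://bucket/data/dt=2024-06-01/part-000.parquet",
    "s3://bucket/data/dt=2024-06-01/part-001.parquet",
]

with psycopg.connect("dbname=mydb user=me") as conn:
    with conn.cursor() as cur:
        # 1. Create the Iceberg table with a schema matching the files
        #    (placeholder DDL; actual pg_lake syntax may differ)
        cur.execute("""
            CREATE TABLE IF NOT EXISTS my_table (
                id bigint,
                event_time timestamptz,
                value double precision
            ) USING iceberg
        """)
        # 2. Register newly arrived files in place; no data is copied or rewritten
        cur.execute(
            "CALL iceberg.add_files(table_name := %s, file_paths := %s)",
            ("my_table", new_files),
        )
    # psycopg's connection context manager commits on successful exit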
Related Issues
This is related to #41 regarding catalog interoperability. Having either native add_files support or a way to use PyIceberg with pg_lake's catalog would solve this use case.
Benefits
- Enables zero-copy data lake migrations to Iceberg
- Reduces storage costs and migration time
- Allows pg_lake to integrate with existing data pipelines
- Provides feature parity with standard Iceberg tooling
Would love to hear the maintainers' thoughts on this! Happy to provide more details or help test if this feature is being considered.