
@chl-wxp (Contributor) commented Jan 6, 2026

DuckDB Connector Notes / Known Limitations

This PR intentionally does not add DuckDB end-to-end (E2E) tests.
The decision is based on DuckDB’s embedded nature and the following concrete limitations.

1. DuckDB database file is process-scoped

A DuckDB database file is process-scoped: in read-write mode, a single .db file can be opened by only one process at a time.
If another process attempts to open the same database file, DuckDB fails at runtime because the file is already locked.

2. Only one connection per process is effectively supported

Within a single process, DuckDB effectively supports only one active connection.
Maintaining multiple concurrent connections in the same process may lead to exceptions or undefined behavior.
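
If several reader or writer tasks run in the same JVM and each opens its own connection, they run into this limitation. One way around it, assuming all tasks can be routed through a shared per-process registry, is to open one connection per database path and close it only when its last user releases it. The `SharedConnectionHolder` class below is a hypothetical sketch, not part of SeaTunnel or the DuckDB driver; it is generic over `AutoCloseable` so it can be exercised without a driver.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/**
 * Hypothetical per-process registry: one shared resource (for example a
 * java.sql.Connection) per DuckDB file path, reference-counted so that it is
 * opened once and closed by its last user.
 */
final class SharedConnectionHolder<T extends AutoCloseable> {

    private static final class Entry<R> {
        final R resource;
        int users;

        Entry(R resource) {
            this.resource = resource;
        }
    }

    private final Map<String, Entry<T>> entries = new HashMap<>();

    /** Returns the shared resource for dbPath, creating it via opener on first use. */
    public synchronized T acquire(String dbPath, Callable<T> opener) throws Exception {
        Entry<T> entry = entries.get(dbPath);
        if (entry == null) {
            entry = new Entry<>(opener.call()); // only the first caller opens
            entries.put(dbPath, entry);
        }
        entry.users++;
        return entry.resource;
    }

    /** Drops one user of dbPath and closes the resource once nobody uses it. */
    public synchronized void release(String dbPath) throws Exception {
        Entry<T> entry = entries.get(dbPath);
        if (entry == null) {
            throw new IllegalStateException("release() without acquire(): " + dbPath);
        }
        entry.users--;
        if (entry.users == 0) {
            entries.remove(dbPath);
            entry.resource.close();
        }
    }

    /** Number of database paths that currently have an open resource. */
    public synchronized int openPaths() {
        return entries.size();
    }
}
```

In a connector, the opener would be something like `() -> DriverManager.getConnection("jdbc:duckdb:/path/to/file.db")` (the path is illustrative). This only serializes access inside one JVM; it does nothing for the cross-process restriction described in limitation 1.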

3. Primary key and unique key metadata cannot be reliably retrieved

DuckDB does not provide a stable or reliable way to retrieve primary key or unique constraint information via JDBC metadata.

Currently:

  • DatabaseMetaData does not consistently expose primary key or unique constraint information
  • There is no officially supported or stable workaround

As a result, features that depend on schema constraints cannot be safely implemented.
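
Because an empty `getPrimaryKeys` result cannot be distinguished from "this table has no primary key", a defensive reader has to treat it as unknown rather than as an absent key. A minimal sketch using only the standard `java.sql.DatabaseMetaData` API; the class name `PrimaryKeyProbe` is hypothetical, and passing `null` for the catalog (JDBC's "do not filter by catalog") is an assumption about how the connector would call it.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

final class PrimaryKeyProbe {
    private PrimaryKeyProbe() {}

    /**
     * Returns the primary-key columns in key order, or Optional.empty() when the
     * driver reports none. "None reported" is treated as "unknown", not as
     * "the table has no primary key", because the metadata may be incomplete.
     */
    static Optional<List<String>> primaryKeyColumns(
            DatabaseMetaData metaData, String schema, String table) throws SQLException {
        TreeMap<Short, String> columnsBySeq = new TreeMap<>();
        // A null catalog means "do not filter by catalog" in JDBC.
        try (ResultSet rs = metaData.getPrimaryKeys(null, schema, table)) {
            while (rs.next()) {
                // KEY_SEQ is the 1-based position of the column within the key.
                columnsBySeq.put(rs.getShort("KEY_SEQ"), rs.getString("COLUMN_NAME"));
            }
        }
        if (columnsBySeq.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new ArrayList<>(columnsBySeq.values()));
    }
}
```

Returning `Optional.empty()` forces callers to decide explicitly what to do when key information is missing, instead of silently assuming a keyless table.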

4. Column length metadata derived from SQL is severely incomplete

When deriving schema metadata from SQL queries (e.g., via ResultSetMetaData), column length and precision information is often missing or highly incomplete.

This limits the usefulness of metadata-driven schema inference.
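
A defensive way to read lengths from `ResultSetMetaData` is to treat a reported precision of 0 (which JDBC uses for "not applicable") as missing rather than as a real length, and let the caller fall back to a configured default. A minimal sketch; `ColumnLengthProbe` is a hypothetical helper, not connector code.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;

final class ColumnLengthProbe {
    private ColumnLengthProbe() {}

    /**
     * Maps each column label to its reported precision (the length in characters
     * for character types), or OptionalInt.empty() when the driver reports 0,
     * which is read here as "unknown".
     */
    static Map<String, OptionalInt> columnLengths(ResultSetMetaData md) throws SQLException {
        Map<String, OptionalInt> lengths = new LinkedHashMap<>(); // keeps column order
        for (int i = 1; i <= md.getColumnCount(); i++) { // JDBC column indexes start at 1
            int precision = md.getPrecision(i);
            lengths.put(md.getColumnLabel(i),
                    precision > 0 ? OptionalInt.of(precision) : OptionalInt.empty());
        }
        return lengths;
    }
}
```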

5. Identifier case behavior

DuckDB behaves as follows for table identifiers:

  • Table names keep the casing used at creation time (case-preserving)
  • Names are matched case-insensitively, so creating another table whose name differs only in casing is not allowed

Example:

CREATE TABLE MyTable (id INTEGER);
CREATE TABLE mytable (id INTEGER); -- not allowed: matches the existing MyTable case-insensitively
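
A connector that creates tables from an upstream schema could detect such clashes before issuing any CREATE TABLE statements. A minimal sketch; `IdentifierCollisions` is hypothetical, and lowercasing with `Locale.ROOT` is an assumption that approximates DuckDB's matching for ASCII names but may not match it for every non-ASCII name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class IdentifierCollisions {
    private IdentifierCollisions() {}

    /**
     * Returns groups of table names that differ only in letter case, i.e. names
     * that would clash under case-insensitive matching. Unique names are omitted.
     */
    static List<List<String>> caseInsensitiveClashes(List<String> tableNames) {
        Map<String, List<String>> byFoldedName = new LinkedHashMap<>();
        for (String name : tableNames) {
            // Assumption: plain lowercasing approximates DuckDB's matching;
            // Locale.ROOT keeps the result independent of the JVM's default locale.
            String folded = name.toLowerCase(Locale.ROOT);
            byFoldedName.computeIfAbsent(folded, key -> new ArrayList<>()).add(name);
        }
        List<List<String>> clashes = new ArrayList<>();
        for (List<String> group : byFoldedName.values()) {
            if (group.size() > 1) {
                clashes.add(group);
            }
        }
        return clashes;
    }
}
```

For the example above, `caseInsensitiveClashes(List.of("MyTable", "mytable"))` returns a single group, `[MyTable, mytable]`.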

6. UPSERT is not supported

DuckDB currently does not support UPSERT in a way that can be used by SeaTunnel
(e.g., no compatible MERGE or INSERT ... ON CONFLICT support for the connector’s upsert abstraction).

Therefore, this PR does not implement upsert for DuckDB.

7. Why no DuckDB E2E tests are added

DuckDB is an embedded database, not a standalone service.
Introducing DuckDB via Docker in E2E tests has limited practical value because:

  • the connector fundamentally interacts with a local embedded engine rather than a remote service
  • process-level file locking makes multi-process E2E setups fragile
  • the incremental validation gained from Docker-based E2E tests is low compared to the added maintenance cost

chl-wxp marked this pull request as draft January 6, 2026 05:54
github-actions bot removed the api label Jan 6, 2026
github-actions bot removed the e2e label Jan 7, 2026
chl-wxp marked this pull request as ready for review January 7, 2026 07:36
@chl-wxp (Contributor, Author) commented Jan 7, 2026

[Screenshots: task before (2 images), task (2 images), task after (1 image)]


5 participants