New Source Connector: DuckDB 🦆  #31

Open
@aaronsteers

Overview

We do not yet have a DuckDB source connector. Normally, DuckDB databases are local files and not very useful as sources, but now they can also be remote (e.g. MotherDuck), and they can act as a pass-through for other data sources (e.g. #30 and the Hugging Face Datasets).

Technical spec

You would write a new source connector that can connect to a (remote) DuckDB dataset or database and emit records from it, allowing Airbyte users to send these to any Airbyte destination.
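
As a rough illustration of the read path described above, here is a minimal sketch, not a proposed implementation. It assumes the `duckdb` Python package and that MotherDuck databases are addressed with an `md:` prefix; `connect_duckdb`, `quote_ident`, and `read_records` are hypothetical names. The reader only uses the `execute` / `description` / `fetchmany` calls of the connection, and it emits plain dicts shaped like Airbyte `RECORD` messages rather than CDK objects.

```python
import time
from typing import Any, Dict, Iterator


def connect_duckdb(database: str):
    """Open a DuckDB database: a local path, or (assumed) "md:<name>" for MotherDuck."""
    import duckdb  # third-party driver, imported lazily

    return duckdb.connect(database)


def quote_ident(name: str) -> str:
    """Double-quote an identifier so a table name cannot break out of the query."""
    return '"' + name.replace('"', '""') + '"'


def read_records(con, stream: str, batch_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield one RECORD-shaped dict per row of the table named `stream`."""
    cursor = con.execute(f"SELECT * FROM {quote_ident(stream)}")
    columns = [col[0] for col in cursor.description]
    while True:
        # Fetch in batches so large tables are not loaded into memory at once.
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield {
                "type": "RECORD",
                "record": {
                    "stream": stream,
                    "data": dict(zip(columns, row)),
                    "emitted_at": int(time.time() * 1000),  # epoch milliseconds
                },
            }
```

A real connector would presumably serialize these messages through the CDK instead of building dicts by hand; the sketch only shows the shape of the loop.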

Notes:

  • We do have a DuckDB Destination and a PyAirbyte Cache and SQLProcessor.
  • It is not obvious how (or if) incremental processing should be handled for DuckDB sources. Whoever picks up this task should plan to propose a path forward for this during development.
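
One possible direction for the incremental question, offered only as a starting point for discussion: a user-selected cursor column, with the last seen value kept in state. This is an assumption, not something the issue prescribes. The function name `read_incremental`, the state layout (`{stream: last_cursor_value}`), and the use of `?` placeholders are all illustrative choices.

```python
from typing import Any, Dict, List, Tuple


def quote_ident(name: str) -> str:
    """Double-quote an identifier for safe interpolation into SQL."""
    return '"' + name.replace('"', '""') + '"'


def read_incremental(
    con, stream: str, cursor_field: str, state: Dict[str, Any]
) -> Tuple[List[Dict[str, Any]], Dict[str, Any]]:
    """Return rows newer than the saved cursor value, plus the updated state."""
    sql = f"SELECT * FROM {quote_ident(stream)}"
    params: List[Any] = []
    last_seen = state.get(stream)
    if last_seen is not None:
        # Only rows strictly after the last emitted cursor value.
        sql += f" WHERE {quote_ident(cursor_field)} > ?"
        params.append(last_seen)
    sql += f" ORDER BY {quote_ident(cursor_field)}"

    cursor = con.execute(sql, params) if params else con.execute(sql)
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]

    new_state = dict(state)
    if rows:
        # Rows are ordered by the cursor column, so the last one is the maximum.
        new_state[stream] = rows[-1][cursor_field]
    return rows, new_state
```

This approach misses deletes and any update that does not advance the cursor column, and it says nothing about rows whose cursor value is NULL, so it may only suit append-mostly tables. Whether that trade-off is acceptable is part of what the proposal would need to settle.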

Definition of Done

  • You would build a new "DuckDB" source in Python (reusing code if helpful).
  • If primary keys exist, they should be registered in the catalog.
  • You should use the CDK as much as possible.
  • The connector should pass integration tests and acceptance tests.
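
To make the primary-key requirement above concrete, the following sketch builds one catalog stream entry from `PRAGMA table_info`, which reports a `pk` flag per column. It is an illustration under assumptions: `discover_stream` and the `JSON_TYPES` mapping are hypothetical, the type mapping is deliberately tiny, and the result is a plain dict using the Airbyte stream fields `json_schema`, `supported_sync_modes`, and `source_defined_primary_key` rather than CDK model classes.

```python
from typing import Any, Dict, List

# Hypothetical, deliberately small mapping; unknown types fall back to "string".
JSON_TYPES = {
    "BOOLEAN": "boolean",
    "INTEGER": "integer",
    "BIGINT": "integer",
    "DOUBLE": "number",
    "FLOAT": "number",
}


def discover_stream(con, table: str) -> Dict[str, Any]:
    """Describe one table as a catalog stream, registering primary keys if any."""
    literal = table.replace("'", "''")  # escape for use inside a SQL string literal
    rows = con.execute(f"PRAGMA table_info('{literal}')").fetchall()

    properties: Dict[str, Any] = {}
    primary_key: List[List[str]] = []
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    for _cid, name, col_type, _notnull, _default, pk in rows:
        json_type = JSON_TYPES.get(str(col_type).upper(), "string")
        properties[name] = {"type": ["null", json_type]}
        if pk:
            primary_key.append([name])

    stream: Dict[str, Any] = {
        "name": table,
        "json_schema": {"type": "object", "properties": properties},
        "supported_sync_modes": ["full_refresh"],
    }
    if primary_key:
        # Only set the key when the table actually declares one.
        stream["source_defined_primary_key"] = primary_key
    return stream
```

For remote or pass-through datasets that expose no constraints, the key is simply omitted, which matches the "if primary keys exist" wording.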

Metadata

Assignees

Labels

No labels

Type

No type

Projects

  • Status

    Not Started

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
