New Destination Connector: Dewey #77757
Replies: 2 comments
Hi Gareth Ari Aye (@lambdabaa), thanks for the detailed proposal and for offering to contribute and maintain this! We've escalated this to our team for review and tracking: airbytehq/oncall#12197. Note that community discussions and contributions do not have an SLA — review may take some time, and acceptance of a new community-maintained destination is a decision for the engineering team rather than something we can pre-approve here. A few things that will help when you open the PR:
Feel free to open the PR whenever you're ready. We'll also share this with the connectors team via the internal tracking issue so they're aware before it lands. Need more help? Join the Airbyte Community Slack for peer support and to chat with the connectors team about contributions.
PR for new connector: #77771
Problem
Dewey is a managed RAG service that ingests documents and runs the chunking, embedding, indexing, and retrieval pipeline server-side. Today, anyone wanting to load Airbyte source data into Dewey has to script the document upload themselves — there is no native destination, so RAG users on Dewey can't take advantage of Airbyte's source catalog for keeping their collections in sync.
Proposed Solution
Add a Python destination connector, `destination-dewey`. Design summary:
- Closer in design to `destination-vectara` than to `destination-pinecone` / `destination-weaviate`. Dewey owns embedding and chunking, so the connector does not pull in the `airbyte_cdk.destinations.vector_db_based` framework (no Embedder/Indexer/Writer/DocumentProcessor).
- Stream routing: a `stream_collections: { stream_name → collection_id }` map, with optional `auto_create_collections` for first-time setups.
- Upload: `POST /collections/:id/documents` (multipart). Dewey accepts JSON natively, so no markdown synthesis is required. Optional `text_fields` projects only specific dot-paths into the indexed body; `metadata_fields` lifts other fields into Dewey's per-document `metadata` for query-time filtering.
- Sync modes:
  - `overwrite`: at sync start, list documents tagged `airbyte_stream:<stream>` and batch-delete them, then upload.
  - `append`: upload only.
  - `append_dedup`: at flush time, delete prior versions by `airbyte_pk:<pk>` tag, then upload.
- Authentication: API key (`dwy_live_...` / `dwy_test_...`). Default base URL `https://api.meetdewey.com/v1`, overridable for self-hosted.
Implementation status
A working implementation is ready locally: 27 unit tests + 4 live integration tests passing end-to-end against the production Dewey API (check, append_dedup with PK-based replacement, overwrite). I will open the PR once this discussion is acknowledged.
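To make the design summary concrete, here is a minimal sketch of how a record might be projected into a document and how each sync mode could be ordered. It is an illustration under stated assumptions, not the code in the PR: the function names, the `body`/`metadata`/`tags` document shape, the `|` separator for composite primary keys, and the `delete_by_tag`/`upload` step names are all hypothetical. Only the `airbyte_stream:<stream>` and `airbyte_pk:<pk>` tag formats and the three mode behaviours come from the proposal. No Dewey API calls are made.

```python
from __future__ import annotations

from typing import Any


def get_path(record: dict[str, Any], dot_path: str) -> Any:
    """Return the value at a dot-path such as "profile.bio", or None if absent."""
    current: Any = record
    for key in dot_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def project(record: dict[str, Any], paths: list[str]) -> dict[str, Any]:
    """Keep only the dot-paths that are present in the record."""
    found = {path: get_path(record, path) for path in paths}
    return {path: value for path, value in found.items() if value is not None}


def build_document(
    stream: str,
    record: dict[str, Any],
    text_fields: list[str],
    metadata_fields: list[str],
    primary_key: list[str],
) -> dict[str, Any]:
    """Build a hypothetical document payload for one Airbyte record."""
    # With no text_fields configured, the whole JSON record is the indexed body.
    body = project(record, text_fields) if text_fields else record
    tags = [f"airbyte_stream:{stream}"]
    if primary_key:
        # Assumption: composite keys are joined with "|".
        pk = "|".join(str(get_path(record, key)) for key in primary_key)
        tags.append(f"airbyte_pk:{pk}")
    return {"body": body, "metadata": project(record, metadata_fields), "tags": tags}


def plan_steps(mode: str, stream: str, docs: list[dict[str, Any]]) -> list[tuple[str, Any]]:
    """Return the ordered steps for a single batch under the given sync mode."""
    steps: list[tuple[str, Any]] = []
    if mode == "overwrite":
        # Clear everything previously written for this stream, then upload.
        steps.append(("delete_by_tag", f"airbyte_stream:{stream}"))
    elif mode == "append_dedup":
        # Remove prior versions of each incoming record by its primary-key tag.
        for doc in docs:
            for tag in doc["tags"]:
                if tag.startswith("airbyte_pk:"):
                    steps.append(("delete_by_tag", tag))
    elif mode != "append":
        raise ValueError(f"unknown sync mode: {mode}")
    steps.append(("upload", len(docs)))
    return steps
```

For example, `build_document("users", {"id": 7, "profile": {"bio": "hi"}, "plan": "pro"}, ["profile.bio"], ["plan"], ["id"])` keeps only `profile.bio` in the body, lifts `plan` into the metadata, and tags the document with both the stream name and the primary key, so a later `append_dedup` flush can find and replace it.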
Author
Personal contribution — happy to maintain.