Create dbt package generator using GenAI and PyAirbyte #6

## Summary

The goal of this application is to take the raw data schema of a source run with PyAirbyte and auto-generate a simple dbt project for that data.

This could be the foundation of a new type of integration opportunity for Airbyte users.

## Definition of Done

These requirements are not specific to GenAI, but form the foundation for the code generation:

- Solution should be written in Python.
- Solution can be written as a new feature in PyAirbyte or as a standalone project. (Author's preference, but probably easier/faster as a new project that just calls PyAirbyte.)
- Solution should be able to generate a basic dbt project scaffold, including a basic `profiles.yml` and `dbt_project.yml`. (Okay if these are hard-coded or hand-written as generic boilerplate.)
- Solution should be able to generate a "sources" yaml file for one or more sources that are being extracted using PyAirbyte. This file should describe the tables being used.
- Solution should be executable with `dbt run`, as proof that it works.
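The scaffold requirements above could be sketched roughly as follows. This is a minimal illustration and not a prescribed design: the function names (`render_sources_yml`, `write_scaffold`), the project name, the schema name `main`, and the stream names are all hypothetical, and the profile assumes the `dbt-duckdb` adapter is the chosen backend.

```python
from pathlib import Path

# Generic boilerplate, hard-coded as the issue allows.
DBT_PROJECT_YML = """\
name: {project}
version: "1.0.0"
config-version: 2
profile: {project}
model-paths: ["models"]
"""

# Assumes the dbt-duckdb adapter; db_path is a hypothetical local file.
PROFILES_YML = """\
{project}:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: {db_path}
"""


def render_sources_yml(source_name: str, schema: str, tables: list[str]) -> str:
    """Render a dbt "sources" yaml describing the raw tables."""
    lines = [
        "version: 2",
        "",
        "sources:",
        f"  - name: {source_name}",
        f"    schema: {schema}",
        "    tables:",
    ]
    lines += [f"      - name: {table}" for table in tables]
    return "\n".join(lines) + "\n"


def write_scaffold(
    root: str,
    project: str,
    db_path: str,
    source_name: str,
    schema: str,
    tables: list[str],
) -> Path:
    """Write dbt_project.yml, profiles.yml, and models/sources.yml under root."""
    root_path = Path(root)
    (root_path / "models").mkdir(parents=True, exist_ok=True)
    (root_path / "dbt_project.yml").write_text(
        DBT_PROJECT_YML.format(project=project)
    )
    (root_path / "profiles.yml").write_text(
        PROFILES_YML.format(project=project, db_path=db_path)
    )
    (root_path / "models" / "sources.yml").write_text(
        render_sources_yml(source_name, schema, tables)
    )
    return root_path


if __name__ == "__main__":
    # Hypothetical values; the table names would come from the PyAirbyte streams.
    write_scaffold(
        "demo_dbt", "demo_project", "demo.duckdb", "faker", "main",
        ["users", "purchases"],
    )
```

With the files in place, running `dbt run --profiles-dir .` from inside the generated directory would be one way to exercise the last requirement, assuming dbt and the DuckDB adapter are installed.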

The GenAI "code gen" application portion of this project is:

- Solution should be able to generate a dbt model (a .sql file) performing some basic transforms on top of the source table(s) defined in the "sources" yaml.
  - For instance, if the raw data is a 'sales' table, the LLM may create an aggregate table.
  - The LLM may also create staging tables that map the raw schema to new column names with a conformed naming convention and/or conformed data types.
- Solution should use an LLM to generate the SQL.
- Instructions to the LLM can be hard-coded to one particular source, but the LLM should be doing the work of generating the SQL.
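One possible shape for the code-gen step is sketched below. It is an illustration under assumptions: the prompt wording, the `stg_` file naming, and every function name are hypothetical, and the LLM is passed in as a plain callable so that any provider client can be wrapped behind it. The `fake_llm` stand-in exists only so the sketch runs without an API key.

```python
from pathlib import Path
from typing import Callable

FENCE = "`" * 3  # markdown code fence, built this way to keep this sketch readable

# Hard-coded instructions; the LLM still does the work of writing the SQL.
PROMPT_TEMPLATE = """\
You are writing a dbt staging model.
Source: {source_name}, table: {table}.
Columns and types:
{column_list}
Return a single SQL SELECT that reads from
{{{{ source('{source_name}', '{table}') }}}}
and renames columns to snake_case. Return only SQL.
"""


def build_prompt(source_name: str, table: str, columns: dict[str, str]) -> str:
    """Fill the prompt template with the raw table's columns."""
    column_list = "\n".join(f"- {name}: {dtype}" for name, dtype in columns.items())
    return PROMPT_TEMPLATE.format(
        source_name=source_name, table=table, column_list=column_list
    )


def strip_code_fences(text: str) -> str:
    """Drop a leading and trailing markdown fence if the LLM added them."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].startswith(FENCE):
        lines = lines[:-1]
    return "\n".join(lines).strip() + "\n"


def generate_staging_model(
    llm: Callable[[str], str],
    source_name: str,
    table: str,
    columns: dict[str, str],
    models_dir: str,
) -> Path:
    """Ask the LLM for SQL and write it to models_dir/stg_<table>.sql."""
    sql = strip_code_fences(llm(build_prompt(source_name, table, columns)))
    out_path = Path(models_dir) / f"stg_{table}.sql"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(sql)
    return out_path


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned, fenced response."""
    return (
        FENCE + "sql\n"
        "select id as user_id, name as user_name\n"
        "from {{ source('faker', 'users') }}\n" + FENCE
    )


if __name__ == "__main__":
    path = generate_staging_model(
        fake_llm, "faker", "users", {"id": "BIGINT", "name": "VARCHAR"}, "demo_dbt/models"
    )
    print(path.read_text())
```

Keeping the LLM behind a callable makes it straightforward to swap providers or to test the file-writing logic with a canned response.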

In terms of documentation:

- A README.md will be required for this project.
- A walkthrough tutorial explaining usage is also required. The walkthrough can exist within the README.md or can be provided in any other format, such as a blog post.
- A demo video walkthrough is optional.

## Suggestions (Per Author's Discretion)

These are suggestions, not requirements:

- We suggest using a simple source like `source-faker`, `source-coin-api`, or similar.
- We suggest using DuckDB as a backend, since it is easy to replicate results locally, doesn't require a paid account, and has good SQL support.


## Resources to Assist

- PyAirbyte can be used to gather the JSON schema for each stream.
- (@aaronsteers will add more resources and info here shortly.)
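Once a stream's JSON schema is available as a Python dict, it could be flattened into a column-to-type mapping to feed the "sources" yaml or the LLM prompt. The sketch below assumes the schema has already been obtained from PyAirbyte. The type mapping is illustrative, `json_schema_to_columns` is a hypothetical helper, and the sample schema is made up for the example.

```python
# Illustrative JSON-schema-to-SQL type mapping; anything unlisted becomes VARCHAR.
TYPE_MAP = {
    "string": "VARCHAR",
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
}


def json_schema_to_columns(schema: dict) -> dict[str, str]:
    """Map each top-level property of a JSON schema to a SQL type name."""
    columns: dict[str, str] = {}
    for name, spec in schema.get("properties", {}).items():
        json_type = spec.get("type", "string")
        # Nullable fields are often written as a list such as ["null", "string"].
        if isinstance(json_type, list):
            non_null = [t for t in json_type if t != "null"]
            json_type = non_null[0] if non_null else "string"
        if json_type == "string" and spec.get("format") == "date-time":
            columns[name] = "TIMESTAMP"
        else:
            columns[name] = TYPE_MAP.get(json_type, "VARCHAR")
    return columns


if __name__ == "__main__":
    # Made-up schema for demonstration only.
    sample_schema = {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "email": {"type": ["null", "string"]},
            "created_at": {"type": "string", "format": "date-time"},
            "address": {"type": "object"},
        },
    }
    print(json_schema_to_columns(sample_schema))
```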
