feat: Add Kedro+Ibis POC with source-faker integration (do not merge) #130
feat: Add Kedro+Ibis POC with source-faker integration
Summary
This PR adds a standalone proof-of-concept demonstrating Kedro+Ibis integration patterns for scalable data pipelines. The POC is located in the kedro-ibis-poc/ directory and implements a three-stage pipeline.
Key features:
- kedro-datasets[ibis-duckdb], maintained by the Kedro team
The implementation follows patterns from the Kedro+Ibis blog article and the deepyaman/jaffle-shop reference implementation.
Review & Testing Checklist for Human
Critical (3 items) - This POC has not been tested end-to-end:
- cd kedro-ibis-poc && uv sync && kedro run - verify it completes without errors
- duckdb data/kedro_ibis_poc.duckdb -c "SELECT COUNT(*) FROM raw_users; SELECT * FROM stg_users LIMIT 5; SELECT * FROM fct_user_purchases LIMIT 10;"
- uv sync - verify it installs all required packages without conflicts (kedro~=0.19.0, airbyte~=0.24.2, ibis-framework[duckdb]~=9.0)
Notes
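The DuckDB check in the checklist above could also be scripted, so a reviewer gets a pass/fail result instead of reading query output. What follows is only a minimal sketch and not part of this PR. The table names and database path are taken from the checklist query. The helper names (count_rows, empty_tables, main) are hypothetical, and treating an empty table as a failure is an assumption, not something the POC specifies.

```python
"""Sketch: script the table checks from the review checklist."""

# Table names come from the checklist query in this PR description.
EXPECTED_TABLES = ("raw_users", "stg_users", "fct_user_purchases")


def count_rows(con, tables=EXPECTED_TABLES):
    """Return {table: row_count}; the query raises if a table is missing.

    `con` is any connection exposing execute(sql).fetchone(), which
    DuckDB's Python connections provide.
    """
    counts = {}
    for table in tables:
        # Table names are the fixed constants above, never user input.
        row = con.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        counts[table] = row[0]
    return counts


def empty_tables(counts):
    """Return the sorted names of tables that have zero rows."""
    return sorted(name for name, n in counts.items() if n == 0)


def main(db_path="data/kedro_ibis_poc.duckdb"):
    """Hypothetical entry point: run from kedro-ibis-poc/ after `kedro run`."""
    # Imported lazily so the helpers above work without duckdb installed.
    import duckdb

    con = duckdb.connect(db_path, read_only=True)
    try:
        counts = count_rows(con)
    finally:
        con.close()

    for name, n in counts.items():
        print(f"{name}: {n} rows")

    # Assumption: an empty table means the pipeline did not populate it.
    empty = empty_tables(counts)
    if empty:
        raise SystemExit(f"empty tables: {', '.join(empty)}")
```

Calling main() from the kedro-ibis-poc/ directory after a pipeline run would print one row count per table and exit non-zero if any table is empty. It does not replace eyeballing the SELECT * samples in the checklist, since it only checks that rows exist and does not check their contents.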
Link to Devin run: https://app.devin.ai/sessions/9c3ec40c87694e3ba931dcc52837b7c3
Requested by: @aaronsteers