0.1.2 warehousing using dbt, duckdb and iceberg #55
base: v0.1.1
@redpheonixx what issue is this associated with?
* Create publish.yml * Update publish.yml
* Create publish.yml * Update publish.yml * Update publish.yml
* Update publish.yml * Update publish.yml * Update publish.yml
* Update publish.yml * Update publish.yml * Update publish.yml * Update publish.yml updated yaml file to copy distribution o/p from build to root directory * Update publish.yml added detailed copy from /ldf/dist to gihub/wo../dist * Update publish.yml added a code for creating dist directory --------- Co-authored-by: Tushar Choudhary <[email protected]>
* Update publish.yml * Update publish.yml * Update publish.yml * Update publish.yml updated yaml file to copy distribution o/p from build to root directory * Update publish.yml added detailed copy from /ldf/dist to gihub/wo../dist * Update publish.yml added a code for creating dist directory * Update publish.yml relocating dots from ./github to /.github --------- Co-authored-by: Tushar Choudhary <[email protected]>
* Update publish.yml * Update pyproject.toml
* Update publish.yml * Update pyproject.toml * Update publish.yml
* Release v1.1 dist changes * Release v1.1 publish.yml changes * Release v1.1 publish.yml changes
Update publish.yml
Create manual.yml
Update publish.yml
* Release v1.1 dist changes * Release v1.1 publish.yml changes * Release v1.1 publish.yml changes * Release v1.1
Raising a pull request (PR) from a fork's main branch to a repository's main branch is generally discouraged for several reasons:
Best Practice Workflow:
This approach keeps your development process clean and organized. It ensures that the main branches remain stable and only contain code that is ready for production. Feel free to ask if you need more details or help with Git workflows! 🚀✨
dbt transformations commit in new branch
* fixed bug warehouse uri Thu Oct 24 9:25 PM IST * added logging to CSV.get() * supported big query ts format * refactored parameter to config from catalog to reduce confusion * Fixed bug of logger in GCP * replaced local path with dynamic path * replaced local path with dynamic path * replaced local path with dynamic path * demo * demo * Release v1.1 dist changes * Release v1.1 publish.yml changes * Release v1.1 publish.yml changes * added a class implementation for github issue hoping it is useful for guiding users to resolution ETAs eventually * added a exception for PlanNotFound to ask users to raise issues on the repository for resolution * Updated overview and milestones. Added directory structure under technical specifications. * Updated components in technical speciifcations * added testing for BigQueryToCSV.extract() * added testing for BigQueryToCSV.extract() * added testing for Iceberg.get() * Release v1.1 * Release v1.1 bug fix * Release v1.1 bug fix * Release v1.1 bug fix * Pytest Added for BigQuery Source --------- Co-authored-by: Tushar Choudhary <[email protected]>
Please fix review comments and move into draft PR until changes are made.
Great work on exception handling and clean coding efforts!
- name: Set up Python
  uses: actions/setup-python@v4
  with:
    python-version: "3.x"
Discuss and settle on a Python version, 3.6.5 or higher.
Check that the library dependencies support it.
Please don't commit tar.gz, whl, or other heavy build artifacts.
PRs should contain only code, sample data, config files, and scripts.
Consider moving all SQL files to a central place.
Consider keeping code, data, queries, and configs in separate directories.
This script should be restructured in an object-oriented style.
Does this file need to be part of the commit? If not, please add it to .gitignore.
p=Path('C:/Users//singsina//Desktop//local-data-platform//local-data-platform//tmp//warehouse//')
print(p)
p_path="file:///"+str(p)
Use f-strings for string manipulation
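One way to act on this suggestion is sketched below. It assumes the goal is a `file:///` URI for the warehouse directory; the helper name `warehouse_uri` and the example paths are hypothetical and not taken from the PR.

```python
from pathlib import PurePosixPath, PureWindowsPath


def warehouse_uri(path):
    """Build a file URI with an f-string instead of '+' concatenation.

    as_posix() renders the path with forward slashes, and pathlib has
    already collapsed repeated or trailing separators when parsing it.
    """
    return f"file:///{path.as_posix().lstrip('/')}"


# Hypothetical paths, used only for illustration.
windows_uri = warehouse_uri(PureWindowsPath("C:/data//warehouse//"))
posix_uri = warehouse_uri(PurePosixPath("/tmp/warehouse"))
print(windows_uri)
print(posix_uri)
```

For an absolute concrete path, `Path.as_uri()` is an alternative that also percent-encodes special characters.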
Title: Boilerplate Code for dbt Transformations
Assignees: Amit Singh
Labels: Transformation layer
Description:
This pull request introduces boilerplate code for dbt transformations using the medallion architecture. The transformations are structured across three layers:
Bronze Layer: Initial raw data ingestion and storage.
Silver Layer: Intermediate transformations to refine and standardize data.
Gold Layer: Final transformations for analytical purposes, providing clean and consumable data.
The implementation uses the Iceberg REST catalog to manage the datasets and transformations. This setup aims to streamline the development and maintenance of dbt models, ensuring a robust and scalable data pipeline.
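The three layers can be illustrated with a minimal sketch. It is not the PR's implementation: Python's built-in `sqlite3` stands in for DuckDB and the Iceberg catalog, and the table, view, and column names are hypothetical. In dbt, each `SELECT` would typically live in its own model file.

```python
import sqlite3


def build_layers(conn, raw_rows):
    """Create bronze, silver, and gold objects in a single database."""
    # Bronze: raw data stored exactly as ingested (everything is text).
    conn.execute(
        "CREATE TABLE bronze_orders (order_id TEXT, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO bronze_orders VALUES (?, ?, ?)", raw_rows)

    # Silver: cast types, standardize values, drop rows without an amount.
    conn.execute(
        """
        CREATE VIEW silver_orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               CAST(amount AS REAL)      AS amount,
               UPPER(TRIM(country))      AS country
        FROM bronze_orders
        WHERE amount IS NOT NULL AND TRIM(amount) <> ''
        """
    )

    # Gold: aggregate into a shape that analysts can query directly.
    conn.execute(
        """
        CREATE VIEW gold_revenue_by_country AS
        SELECT country, SUM(amount) AS revenue
        FROM silver_orders
        GROUP BY country
        """
    )


conn = sqlite3.connect(":memory:")
sample_rows = [
    ("1", "10.5", " in "),
    ("2", "4.5", "IN"),
    ("3", "", "us"),  # missing amount, filtered out in silver
    ("4", "7", "us"),
]
build_layers(conn, sample_rows)
rows = conn.execute(
    "SELECT country, revenue FROM gold_revenue_by_country ORDER BY country"
).fetchall()
print(rows)
```

Each layer reads only from the one below it, so a fix to cleaning logic in the silver layer flows into gold without touching the raw data.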
#45
wrote dbt transformation layer as per medallion architecture
dbt's low learning curve and SQL-based workflow empower data teams without advanced programming knowledge to build and maintain transformations. This makes it a better fit for companies looking for agility and simplicity in data pipelines.