Data Workflow Github API

Databricks Spark workflow for getting data from GitHub REST API and loading into Databricks lakehouse tables.

Summary

Get source data from the following endpoints:

Organizations
Repositories
Contributors
Pull Requests

API response data is conditionally loaded to the github_api database in Databricks warehouse using upsert logic. Basically, if any new data is encountered it's inserted into the tables and if any modified data is encountered the affected rows are overwritten.

Repo Organization

config
- etl: set etl job config values (org names, repo names, etc) here
etl
- github
  - api_request: general framework for making api requests with requests library
  - contributors: api response data from stats/contributors endpoint
  - org: api response data from orgs endpoint
  - pulls: api response data from pulls endpoint
  - repo: api response data from repos endpoint
- spark
  - date_utils: general datetime helpers
  - etl_config: class for creating dynamic config
  - spark_table: SparkTable class for working with source tables in our metastore
  - spark_utills: general databricks spark helpers
  - sql_table: SqlTable subclass of SparkTable for tables created from a sql query of source tables
- tables
  - users: SqlTable for users dimension table
src
- dag_notebook: databricks notebook with workflow code to be run with orhcestrator
- source_tables: etl logic for source tables (warehouse tables ending in _source_)

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
config		config
etl		etl
src		src
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
README.md		README.md
chart_example.png		chart_example.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Workflow Github API

Summary

Repo Organization

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Data Workflow Github API

Summary

Repo Organization

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages