This project provides a data engineering pipeline to extract, load, and transform data. It uses `dlt` for data ingestion and `dbt` for data transformation.
The pipeline follows an ELT (Extract, Load, Transform) architecture:
- Extract & Load: A Python script using the
dlt
library fetches log data from Etherscan, and some other sources. This raw data is then loaded into a local DuckDB database. - Transform:
dbt
is used to transform raw data into a analytics-ready tables. - Example usage:
scripts/curve_dlt_pipeline.py
for Curve Finance's crvUSD market data, raw data is loaded intodata/raw/raw_curve.duckdb
.dbt_subprojects/curve/models/staging/logs.sql
for parsing the raw logs into decoded logs (with decoded topics).
- Python: The core language for the data ingestion scripts.
- dlt (Data Load Tool): For creating robust and scalable data ingestion pipelines.
- dbt (Data Build Tool): For transforming data in the warehouse using SQL.
- DuckDB: As the local data warehouse.
- Etherscan API: As the source for blockchain data.
- uv: For Python package management.
```
.
├── data/              # Raw and processed data
├── dbt_subprojects/   # dbt projects for data transformation (example: dbt_subprojects/curve)
├── notebooks/         # Jupyter notebooks for analysis
├── scripts/           # Python scripts for data ingestion (dlt pipelines)
├── src/               # Python source code
├── pyproject.toml     # Project dependencies
└── README.md
```
```bash
git clone [email protected]:newgnart/stables.git
cd stables
uv sync
cp .env.example .env  # then add your Etherscan API key to the .env file
```
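The ingestion script reads the key from the environment. Assuming the variable is named `ETHERSCAN_API_KEY` (check `.env.example` for the actual name), loading it looks roughly like:

```python
import os

from dotenv import load_dotenv  # python-dotenv

load_dotenv()  # read variables from the .env file into the process environment
api_key = os.environ["ETHERSCAN_API_KEY"]  # assumed variable name
```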
To start the data ingestion process, run the `dlt` pipeline script:

```bash
python scripts/curve_dlt_pipeline.py
```

This fetches the event logs and loads them into `data/raw/raw_curve.duckdb`.
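You can verify the load with DuckDB's Python client. The schema and table names here (`raw.logs`) are assumptions based on typical dlt naming, so adjust them to whatever the pipeline actually creates:

```python
import duckdb

con = duckdb.connect("data/raw/raw_curve.duckdb", read_only=True)
# "raw" is the dlt dataset name (DuckDB schema); "logs" is an assumed table name.
print(con.sql("SELECT count(*) FROM raw.logs"))
con.close()
```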
Once the raw data is loaded, you can run the `dbt` models to transform it. Navigate to the dbt project directory and run the models:
```bash
cd dbt_subprojects/curve
uv run dbt run
```
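If you prefer to trigger the transformation from Python (for example, to chain ingestion and transformation in one script), dbt-core 1.5+ exposes a programmatic runner. This is a sketch, not part of the repo:

```python
from dbt.cli.main import dbtRunner

# Invoke `dbt run` against the Curve subproject; equivalent to the CLI command above.
result = dbtRunner().invoke(["run", "--project-dir", "dbt_subprojects/curve"])
if not result.success:
    raise RuntimeError("dbt run failed")
```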
This runs the dbt models and saves the transformed data in the `staged` schema of `data/staged/staged_curve.duckdb`.
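To sanity-check the output, you can open the staged database and list what dbt materialized. The schema name `staged` comes from above; the table names will depend on the models:

```python
import duckdb

con = duckdb.connect("data/staged/staged_curve.duckdb", read_only=True)
# List the tables/views dbt built in the staged schema.
print(con.sql(
    "SELECT table_name FROM information_schema.tables WHERE table_schema = 'staged'"
))
con.close()
```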