
Commit d6755c5: "docs and readme" (1 parent: a988a47)

File tree

3 files changed: +136 −143 lines


README.md

Lines changed: 43 additions & 91 deletions

````diff
@@ -1,123 +1,75 @@
-# Ethereum Blockchain Data Analytics Platform
+# Stables Analytics
 
-Capstone project for [Foundry AI Academy](https://www.foundry.academy/) Data & AI Engineering program. An ELT pipeline for extracting, loading, and transforming Ethereum blockchain data with focus on stablecoin analytics.
+Production-grade ELT pipeline for on-chain stablecoin analytics. Built for [Foundry AI Academy](https://www.foundry.academy/) Data & AI Engineering program.
 
-Inspired by [Visa on Chain Analytics](https://visaonchainanalytics.com/).
+**Pipeline**: `HyperSync GraphQL → PostgreSQL/Snowflake → dbt → Analytics`
 
 ## Quick Start
 
-### Prerequisites
 ```bash
-# Create Docker network
-docker network create fa-dae2-capstone_kafka_network
-
-# Start PostgreSQL
+# Setup
 docker-compose up -d
-
-# Install dependencies
 uv sync
-
-# Setup environment
 cp .env.example .env
-export $(cat .env | xargs)
-```
 
-### Extract Data
-```bash
-# Extract logs and transactions from Etherscan
-uv run python scripts/el/extract_etherscan.py \
-    -c ethereum \
-    -a 0x02950460e2b9529d0e00284a5fa2d7bdf3fa4d72 \
-    --logs --transactions \
-    --from_block 18.5M --to_block 20M \
-    -v
-```
-
-### Load Data
-```bash
-# Load Parquet to PostgreSQL
-uv run python scripts/el/load.py \
-    -f .data/raw/ethereum_0xaddress_logs_18500000_20000000.parquet \
-    -c postgres \
-    -s raw \
-    -t logs \
-    -w append
-```
+# Extract blockchain data via GraphQL
+# Run the indexer separately: https://github.com/newgnart/envio-stablecoins
+uv run python scripts/el/extract_graphql.py --from_block 18500000 --to_block 20000000 -v
 
-### Transform Data
-```bash
-# Run dbt models
-./scripts/dbt.sh run
+# Load to database
+uv run python scripts/el/load.py -f .data/raw/data_*.parquet -c postgres -s raw -t raw_transfer -w append
 
-# Run specific model
-./scripts/dbt.sh run --select stg_logs_decoded
+# Transform with dbt
+cd dbt_project && dbt run
 ```
 
 ## Architecture
 
-**Extract** → **Load** → **Transform**
+```
+HyperSync GraphQL API → Parquet Files → PostgreSQL/Snowflake → dbt → Analytics Tables
+```
 
-1. **Extract** (`scripts/el/extract_etherscan.py`): Pulls blockchain data from Etherscan API to `.data/raw/*.parquet`
-2. **Load** (`scripts/el/load.py`): Loads Parquet files into PostgreSQL/Snowflake `raw` schema
-3. **Transform** (`dbt_project/`): dbt models transform raw data into analytics-ready tables
+**Key Components:**
+
+- **Extract**: High-performance GraphQL API (HyperSync) for blockchain data with block-range filtering
+- **Load**: Pluggable loaders for PostgreSQL (dev) and Snowflake (prod) with `append`/`replace`/`merge` modes
+- **Transform**: dbt three-tier modeling (Staging → Intermediate → Marts) with SCD Type 2 support
+- **Migrate**: PostgreSQL to Snowflake data transfer via `pg2sf_raw_transfer.py`
 
 ### Project Structure
+
 ```
-├── scripts/el/             # Extract & Load scripts
-├── src/onchaindata/        # Reusable Python package
-│   ├── data_extraction/    # Etherscan/GraphQL clients
-│   ├── data_pipeline/      # Loader classes
-│   └── utils/              # Database clients
-├── dbt_project/            # dbt transformation layer
-│   ├── models/01_staging/  # Raw data cleanup (views)
-│   ├── models/intermediate/# Business logic (ephemeral)
-│   └── models/marts/       # Analytics tables (tables)
-└── .data/raw/              # Extracted Parquet files
+scripts/el/      # Extract & Load scripts
+src/onchaindata/ # Python package (extraction, loading, database clients)
+dbt_project/     # dbt models, snapshots, macros
+docs/            # MkDocs documentation
```
 
-## Key Features
+## Features
 
-- **Multi-chain support**: Ethereum, Polygon, BSC via chainid mapping
-- **Automatic retry**: Failed extractions retry with smaller chunks (10x reduction)
-- **Flexible loading**: PostgreSQL and Snowflake support
-- **Block number shortcuts**: Use `18.5M` instead of `18500000`
-- **dbt transformations**: Staging → Intermediate → Marts layers
+- **High-Performance Extraction**: HyperSync GraphQL API for fast blockchain data retrieval
+- **Flexible Loading**: PostgreSQL & Snowflake support with multiple write modes
+- **Multi-Chain**: Ethereum, Polygon, BSC via configurable endpoints
+- **SCD Type 2**: Historical tracking for stablecoin metadata via dbt snapshots
+- **Cross-Database Migration**: Seamless PostgreSQL → Snowflake transfers
 
-## Environment Variables
+## Tech Stack
 
-Required (see `.env.example`):
-- `POSTGRES_*`: Database connection
-- `ETHERSCAN_API_KEY`: API access
-- `DB_SCHEMA`: Default schema
+Python 3.11+ • SQL • Polars • dlt • PostgreSQL • Snowflake • dbt Core • Docker • uv
 
-Optional (for Snowflake):
-- `SNOWFLAKE_*`: Snowflake connection details
+## Documentation
 
-## Common Commands
+- **Full Docs**: [https://newgnart.github.io/stables-analytics/](https://newgnart.github.io/stables-analytics/)
+- **Dev Guide**: [CLAUDE.md](CLAUDE.md)
 
-```bash
-# SQL operations
-./scripts/sql_pg.sh ./scripts/sql/init.sql
-
-# dbt operations
-./scripts/dbt.sh test                    # Run tests
-./scripts/dbt.sh docs generate           # Generate docs
-./scripts/dbt.sh run --select staging.*  # Run staging models
-
-# Extract with time range
-uv run python scripts/el/extract_etherscan.py \
-    -a 0x02950460e2b9529d0e00284a5fa2d7bdf3fa4d72 \
-    --logs --transactions \
-    --last_n_days 7
-```
+## Environment Setup
 
-## Database Schema
+Create `.env` file with database credentials:
+- `POSTGRES_*`: PostgreSQL connection details
+- `SNOWFLAKE_*`: (Optional) Snowflake connection details
 
-- `raw.logs`: Raw event logs with JSONB topics
-- `raw.transactions`: Transaction data
-- `staging.stg_logs_decoded`: Decoded logs with parsed topics (topic0-topic3)
-- Marts: Analytics tables created by dbt
+See `.env.example` for full configuration.
 
-## Documentation
+---
 
-For detailed documentation, see [CLAUDE.md](CLAUDE.md) or the [docs/](docs/) directory.
+**License**: MIT • Educational capstone project
````
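The new extract command above works over an explicit block range (`--from_block 18500000 --to_block 20000000`). A minimal sketch of how such a range can be split into fixed-size chunks for batched extraction; the `block_ranges` helper and chunk size are illustrative, not the script's actual internals:

```python
def block_ranges(from_block: int, to_block: int, chunk_size: int = 100_000):
    """Split [from_block, to_block) into half-open chunks for batched extraction."""
    start = from_block
    while start < to_block:
        end = min(start + chunk_size, to_block)
        yield start, end
        start = end

# Each (start, end) pair would drive one GraphQL request / one Parquet file.
ranges = list(block_ranges(18_500_000, 20_000_000, chunk_size=500_000))
print(ranges)  # [(18500000, 19000000), (19000000, 19500000), (19500000, 20000000)]
```

Half-open ranges make it easy to resume a failed run: the next chunk starts exactly where the last completed one ended, with no overlap or gap.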

docs/index.md

Lines changed: 50 additions & 6 deletions

````diff
@@ -1,8 +1,52 @@
-# Home
-The capstone project for [Foundry AI Academy](https://www.foundry.academy/) Data & AI Engineering program.
+# Stables Analytics Platform
 
-Inspired by [Visa on Chain Analytics](https://visaonchainanalytics.com/), which showcases how fiat-backed stablecoins move globally via public blockchains:
-- Key Metrics: Stablecoin Supply, Transaction Volume, Addresses and Lending
-- Stablecoins: USDC, USDT, PYUSD, FDUSD, USDP and USDG on several blockchains
+A production-grade data analytics platform for on-chain stablecoin transactions, built as a capstone project for the [Foundry AI Academy](https://www.foundry.academy/) Data & AI Engineering program.
 
-This project is built as an analytics platform for decentralized stablecoin usage on Ethereum.
+## Technical Overview
+
+### Data Engineering Architecture
+
+The platform implements a modern **ELT (Extract, Load, Transform)** pipeline optimized for blockchain data:
+
+```
+HyperSync GraphQL API → PostgreSQL/Snowflake → dbt Transformations → Analytics Tables
+```
+
+**Key Engineering Components:**
+
+#### 1. **Extraction Layer** (`Python + GraphQL`)
+- High-performance blockchain data extraction via the HyperSync GraphQL API
+- Block-range filtering with dynamic query generation (supports custom `from_block`/`to_block` parameters)
+- Batch mode for large historical extracts with automatic Parquet serialization
+- Streaming mode for real-time data ingestion directly to databases
+- Multi-chain support (Ethereum, Polygon, BSC) through configurable endpoints
+
+#### 2. **Loading Layer** (`dlt + SQL`)
+- Pluggable database clients supporting PostgreSQL and Snowflake
+- Multiple write modes via dlt pipelines: `append`, `replace`, and `merge` (upsert) with composite-key support
+- Connection pooling and optimized batch loading for high-throughput ingestion
+
+#### 3. **Transformation Layer** (`dbt`)
+- Three-tier modeling: Staging (views) → Intermediate (ephemeral) → Marts (tables)
+- Slowly Changing Dimension (SCD Type 2) implementation via dbt snapshots for stablecoin metadata
+- Custom Ethereum macros for address extraction and uint256 conversion
+- Cross-database compatibility (PostgreSQL for dev, Snowflake for production)
+
+#### 4. **Data Migration** (`Python`)
+- Block-range based PostgreSQL to Snowflake data transfer for cloud warehousing
+- Polars-powered efficient data transformation and loading
+
+### Tech Stack
+
+- **Languages**: Python 3.11+, SQL
+- **Data Processing**: Polars, Pandas, PyArrow, dlt
+- **Databases**: PostgreSQL, Snowflake
+- **Transformation**: dbt Core (Postgres/Snowflake adapters)
+- **Infrastructure**: Docker, uv (dependency management)
+- **Documentation**: MkDocs Material
+
+## Getting Started
+
+Detailed setup instructions and an API reference are available in the navigation menu.
+
+For development workflows, see [CLAUDE.md](https://github.com/newgnart/stables-analytics/blob/main/CLAUDE.md).
````
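The loading layer's `merge` (upsert) mode with composite-key support, as described in the docs page above, reduces to: incoming rows replace existing rows that share the key, and rows with new keys are inserted. A pure-Python sketch of those semantics (the column names `chain`, `tx_hash`, `log_index` are hypothetical; in the project this would be handled by dlt's merge write disposition, not hand-rolled code):

```python
def merge_rows(existing, incoming, keys=("chain", "tx_hash", "log_index")):
    """Upsert: rows in `incoming` replace `existing` rows sharing the composite key."""
    merged = {tuple(r[k] for k in keys): r for r in existing}
    for row in incoming:
        merged[tuple(row[k] for k in keys)] = row
    return list(merged.values())

existing = [{"chain": "ethereum", "tx_hash": "0xaa", "log_index": 0, "amount": 100}]
incoming = [
    {"chain": "ethereum", "tx_hash": "0xaa", "log_index": 0, "amount": 150},  # update
    {"chain": "ethereum", "tx_hash": "0xbb", "log_index": 1, "amount": 50},   # insert
]
result = merge_rows(existing, incoming)  # one updated row, one new row
```

A composite key of chain + transaction hash + log index is a natural uniqueness guarantee for event logs, which is why merge mode makes re-extraction of overlapping block ranges idempotent.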

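The SCD Type 2 tracking for stablecoin metadata is implemented with dbt snapshots in the project; the logic a snapshot applies can be sketched in Python: when a tracked attribute changes, close the superseded version and open a new one. Field names and dates here are illustrative only:

```python
from datetime import date

def snapshot_scd2(history, current, key="symbol", as_of=date(2024, 1, 1)):
    """SCD Type 2 sketch: rows carry valid_from/valid_to (None = still current)."""
    open_rows = {r[key]: r for r in history if r["valid_to"] is None}
    out = list(history)
    for row in current:
        prev = open_rows.get(row[key])
        attrs = {k: v for k, v in row.items() if k != key}
        if prev is not None and all(prev.get(k) == v for k, v in attrs.items()):
            continue  # unchanged: keep the open version as-is
        if prev is not None:
            prev["valid_to"] = as_of  # close the superseded version
        out.append({**row, "valid_from": as_of, "valid_to": None})
    return out

history = [{"symbol": "USDT", "name": "Tether USD",
            "valid_from": date(2023, 1, 1), "valid_to": None}]
current = [{"symbol": "USDT", "name": "Tether USDt"}]  # metadata changed
rows = snapshot_scd2(history, current)  # old row closed, new open row appended
```

The point of keeping both versions is that marts built on the snapshot can join transfers to the metadata that was valid at the time of each transfer, not just the latest values.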
mkdocs.yml

Lines changed: 43 additions & 46 deletions

```diff
@@ -1,60 +1,57 @@
 site_name: Stables Analytics
+# Repository
+repo_url: https://github.com/newgnart/stables-analytics
+repo_name: stables-analytics
+# Theme configuration
 theme:
   name: material
-  features:
-    - announce.dismiss
-    - content.action.edit
-    - content.action.view
-    - content.code.annotate
-    - content.code.copy
-    # - content.code.select
-    # - content.footnote.tooltips
-    # - content.tabs.link
-    - content.tooltips
-    # - header.autohide
-    # - navigation.expand
-    - navigation.footer
-    - navigation.indexes
-    # - navigation.instant
-    # - navigation.instant.prefetch
-    # - navigation.instant.progress
-    # - navigation.prune
-    - navigation.sections
-    - navigation.tabs
-    # - navigation.tabs.sticky
-    - navigation.top
-    - navigation.tracking
-    - search.highlight
-    - search.share
-    - search.suggest
-    - toc.follow
   palette:
-    - media: "(prefers-color-scheme)"
-      toggle:
-        icon: material/link
-        name: Switch to light mode
-    - media: "(prefers-color-scheme: light)"
-      scheme: default
+    # Light mode
+    - scheme: default
       primary: indigo
       accent: indigo
       toggle:
-        icon: material/toggle-switch
+        icon: material/brightness-7
         name: Switch to dark mode
-    - media: "(prefers-color-scheme: dark)"
-      scheme: slate
-      primary: black
+    # Dark mode
+    - scheme: slate
+      primary: indigo
       accent: indigo
       toggle:
-        icon: material/toggle-switch-off
-        name: Switch to system preference
-  font:
-    text: Roboto
-    code: Roboto Mono
-  # favicon: assets/favicon.png
-  # icon:
-  #   logo: logo
-  # - toc.integrate
+        icon: material/brightness-4
+        name: Switch to light mode
+  features:
+    - navigation.tabs
+    - navigation.sections
+    - navigation.top
+    - navigation.tracking
+    - search.suggest
+    - search.highlight
+    - content.code.copy
+    - content.code.annotate
 
+# Extensions
+markdown_extensions:
+  - pymdownx.highlight:
+      anchor_linenums: true
+      line_spans: __span
+      pygments_lang_class: true
+  - pymdownx.inlinehilite
+  - pymdownx.snippets
+  - pymdownx.superfences:
+      custom_fences:
+        - name: mermaid
+          class: mermaid
+          format: !!python/name:pymdownx.superfences.fence_code_format
+  - pymdownx.tabbed:
+      alternate_style: true
+  - pymdownx.details
+  - admonition
+  - toc:
+      permalink: true
+  - tables
+  - attr_list
+  - md_in_html
 nav:
   - Home: index.md
   - Getting Started: 01_getting_started.md
```
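docs/index.md above mentions custom Ethereum macros for address extraction and uint256 conversion; those are dbt/SQL macros in the project, but the decoding they perform can be illustrated in Python. An indexed address in an event topic is a 32-byte word whose last 20 bytes are the address, and numeric fields are 32-byte big-endian unsigned integers (the example topic reuses the contract address from the old README's extract command; the amount is illustrative):

```python
def topic_to_address(topic: str) -> str:
    """Extract the 20-byte address from a 32-byte event topic word."""
    return "0x" + topic.removeprefix("0x")[-40:]

def hex_to_uint256(word: str) -> int:
    """Decode a 32-byte hex word as an unsigned 256-bit integer."""
    return int(word, 16)

# ERC-20 Transfer event: indexed topic holds an address, data holds the value.
topic1 = "0x00000000000000000000000002950460e2b9529d0e00284a5fa2d7bdf3fa4d72"
value = "0x0000000000000000000000000000000000000000000000000000000005f5e100"
print(topic_to_address(topic1))  # 0x02950460e2b9529d0e00284a5fa2d7bdf3fa4d72
print(hex_to_uint256(value))     # 100000000, i.e. 100 tokens at 6 decimals
```

In the dbt layer the same slicing and casting would run in SQL so that staging models can decode raw topics without round-tripping through Python.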

0 commit comments