ingest_aws_-blockchain

Imports the public AWS Ethereum transactions Parquet dataset into a TiDB/MySQL-compatible database.

Features

  • Read from three sources:
    • download: S3 → local directory (concurrent, resumable), then import locally
    • local: read existing local parquet files
    • s3: read directly from S3 without downloading
  • Multi-day import: from a start date counting backward N days
  • Robust insert: bulk multi-row INSERT with a fallback to pandas to_sql (see the sketch after this list)
  • Schema: primary key (date, hash, block_timestamp) and input as LONGTEXT
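
The bulk-insert-with-fallback path can be pictured with a minimal sketch; the insert_chunk helper name is illustrative, and the real chunk handling lives in eth_import_transactions.py:

import pandas as pd
from sqlalchemy import text
from sqlalchemy.engine import Engine

def insert_chunk(engine: Engine, df: pd.DataFrame, table: str = "eth_transactions") -> None:
    # Build one parameterized multi-row INSERT for the whole chunk.
    cols = list(df.columns)
    col_list = ", ".join(f"`{c}`" for c in cols)
    placeholders = ", ".join(f":{c}" for c in cols)
    stmt = text(f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})")
    try:
        with engine.begin() as conn:
            # SQLAlchemy executes this as an executemany over all rows in the chunk.
            conn.execute(stmt, df.to_dict(orient="records"))
    except Exception:
        # Fallback: slower, but lets pandas handle typing and quoting.
        df.to_sql(table, engine, if_exists="append", index=False)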

Requirements

Python 3.9+ recommended.

Install Python dependencies:

pip install -r requirements.txt

Contents of requirements.txt:

pyarrow==17.0.0
pandas==2.2.3
s3fs==2024.9.0
SQLAlchemy==2.0.36
mysql-connector-python==9.0.0
python-dotenv==1.0.1

Environment variables (TiDB/MySQL)

Set these to point to your TiDB/MySQL-compatible endpoint (defaults shown):

export TIDB_USER=root
export TIDB_PASSWORD=""
export TIDB_HOST=127.0.0.1
export TIDB_PORT=4000
export TIDB_DB=eth
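
These variables are typically combined into the mysql+mysqlconnector SQLAlchemy URL that the script uses; a minimal sketch (the make_engine helper name is illustrative):

import os
from urllib.parse import quote_plus
from sqlalchemy import create_engine

def make_engine():
    user = os.getenv("TIDB_USER", "root")
    password = quote_plus(os.getenv("TIDB_PASSWORD", ""))  # URL-escape special characters
    host = os.getenv("TIDB_HOST", "127.0.0.1")
    port = os.getenv("TIDB_PORT", "4000")
    db = os.getenv("TIDB_DB", "eth")
    return create_engine(f"mysql+mysqlconnector://{user}:{password}@{host}:{port}/{db}")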

Usage

Entry script: eth_import_transactions.py

Key arguments:

  • --start-date YYYY-MM-DD: start date (inclusive)
  • --days N: number of days counting backward from the start date (see the date-range sketch after this list)
  • --source {download,local,s3}: input mode
  • --local-dir PATH: where parquet files are stored/read (for download/local)
  • --drop-table: drop and recreate table before import (default can be changed in code)
  • --chunksize: rows per DB insert chunk (default in code)
  • --batch-size: arrow scanner batch size (default in code)
  • Download tuning: --download-timeout, --download-retries, --download-workers
  • -v/--verbose: verbose logging
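
The backward date range can be pictured with a short sketch (the date_range helper name is illustrative; the script's actual implementation may differ):

from datetime import date, timedelta

def date_range(start_date: str, days: int) -> list[str]:
    # --start-date is inclusive and --days counts backward, so
    # start 2025-10-29 with days=3 covers 2025-10-29, 2025-10-28, 2025-10-27.
    start = date.fromisoformat(start_date)
    return [(start - timedelta(days=i)).isoformat() for i in range(days)]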

Examples

Download 3 days of data starting from 2025-10-29 to a local directory, then import:

python eth_import_transactions.py \
  --start-date 2025-10-29 --days 3 \
  --source download \
  --chunksize 1000 --batch-size 1000

Import from local files only (no download), using date-prefixed filenames in local_data_multi/:

python eth_import_transactions.py \
  --start-date 2025-10-29 --days 3 \
  --source local --local-dir local_data_multi

Read directly from S3 (no local files):

python eth_import_transactions.py \
  --start-date 2025-10-29 --days 3 \
  --source s3

Recreate table before import:

python eth_import_transactions.py --start-date 2025-10-29 --days 1 --drop-table

File naming for local mode

Downloaded files are saved as:

<local-dir>/<YYYY-MM-DD>__<original_parquet_name>.parquet

The importer uses the YYYY-MM-DD prefix to set the date column and group files by day.
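
A minimal sketch of recovering the day from those filenames and grouping them (the group_by_day helper name is illustrative):

from collections import defaultdict
from pathlib import Path

def group_by_day(local_dir: str) -> dict[str, list[Path]]:
    # Filenames look like 2025-10-29__part-00000.parquet; the text before
    # "__" becomes the `date` column and the grouping key.
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in Path(local_dir).glob("*.parquet"):
        day, sep, _rest = p.name.partition("__")
        if sep:  # skip files without the expected date prefix
            groups[day].append(p)
    return dict(groups)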

Defaults in code

You can change the defaults without using the CLI by editing the constants near the top of main():

DEFAULT_START_DATE = "YYYY-MM-DD"
DEFAULT_DAYS = 1
DEFAULT_CHUNKSIZE = 10000
DEFAULT_BATCH_SIZE = 10000
DEFAULT_SOURCE = "download"  # download | local | s3
DEFAULT_LOCAL_DIR = "local_data_multi"
DEFAULT_DROP_TABLE = False
DEFAULT_DOWNLOAD_TIMEOUT = 300
DEFAULT_DOWNLOAD_RETRIES = 3
DEFAULT_DOWNLOAD_WORKERS = 4

Schema

The script creates the database eth and the table eth.eth_transactions if they do not exist:

CREATE TABLE IF NOT EXISTS eth.eth_transactions (
  date VARCHAR(10) NOT NULL,
  hash VARCHAR(66) NOT NULL,
  block_timestamp BIGINT NOT NULL,
  nonce BIGINT NULL,
  transaction_index BIGINT NULL,
  from_address VARCHAR(42) NULL,
  to_address VARCHAR(42) NULL,
  value DOUBLE NULL,
  gas BIGINT NULL,
  gas_price BIGINT NULL,
  input LONGTEXT NULL,
  receipt_cumulative_gas_used BIGINT NULL,
  receipt_gas_used BIGINT NULL,
  receipt_contract_address VARCHAR(42) NULL,
  receipt_status BIGINT NULL,
  block_number BIGINT NULL,
  block_hash VARCHAR(66) NULL,
  max_fee_per_gas BIGINT NULL,
  max_priority_fee_per_gas BIGINT NULL,
  transaction_type BIGINT NULL,
  receipt_effective_gas_price BIGINT NULL,
  PRIMARY KEY (date, hash, block_timestamp)
);

Notes & Troubleshooting

  • The S3 region is us-east-2 and the public bucket path is aws-public-blockchain/v1.0/eth/transactions/date=YYYY-MM-DD (see the read-from-S3 sketch after this list).
  • If direct S3 reads stall with very large batch sizes, reduce --batch-size (e.g., 1000–2000).
  • Bulk insert is faster; on error it falls back to to_sql automatically.
  • For TiDB, prefer the mysql+mysqlconnector URI used in the script; make sure the port (4000 by default) is reachable.
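
For reference, reading one day's partition directly from the public bucket with pyarrow and s3fs looks roughly like this (a sketch; the batch size is the same knob as --batch-size above):

import pyarrow.dataset as ds
import s3fs

fs = s3fs.S3FileSystem(anon=True, client_kwargs={"region_name": "us-east-2"})
path = "aws-public-blockchain/v1.0/eth/transactions/date=2025-10-29"
dataset = ds.dataset(path, filesystem=fs, format="parquet")
for batch in dataset.to_batches(batch_size=1000):
    df = batch.to_pandas()
    # ... insert df into eth.eth_transactions in chunks ...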
