Imports the public AWS Ethereum transactions parquet data into a TiDB/MySQL-compatible database.
- Read from three sources:
  - `download`: S3 → local directory (concurrent, resumable), then import locally
  - `local`: read existing local parquet files
  - `s3`: read directly from S3 without downloading
- Multi-day import: from a start date counting backward N days
- Robust insert: bulk multi-row INSERT with a fallback to `to_sql` (see the sketch after this list)
- Schema: primary key `(date, hash, block_timestamp)`, with `input` stored as `LONGTEXT`
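The bulk-insert-with-fallback pattern looks roughly like the following sketch, assuming a SQLAlchemy engine and one pandas DataFrame per chunk; the helper name `insert_chunk` is illustrative, not the script's actual internals:

```python
import pandas as pd
from sqlalchemy import text
from sqlalchemy.engine import Engine

def insert_chunk(engine: Engine, df: pd.DataFrame) -> None:
    """Insert one DataFrame chunk, preferring a single multi-row INSERT."""
    cols = ", ".join(f"`{c}`" for c in df.columns)
    params = ", ".join(f":{c}" for c in df.columns)
    stmt = text(f"INSERT INTO eth.eth_transactions ({cols}) VALUES ({params})")
    try:
        # Fast path: one executemany-style INSERT for the whole chunk.
        with engine.begin() as conn:
            conn.execute(stmt, df.to_dict(orient="records"))
    except Exception:
        # Fallback: let pandas write the chunk via to_sql instead.
        df.to_sql("eth_transactions", engine, schema="eth",
                  if_exists="append", index=False)
```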
Python 3.9+ recommended.
Install Python dependencies:
```bash
pip install -r requirements.txt
```

Contents of `requirements.txt`:

```
pyarrow==17.0.0
pandas==2.2.3
s3fs==2024.9.0
SQLAlchemy==2.0.36
mysql-connector-python==9.0.0
python-dotenv==1.0.1
```

Set these environment variables to point at your TiDB/MySQL-compatible endpoint (defaults shown):

```bash
export TIDB_USER=root
export TIDB_PASSWORD=""
export TIDB_HOST=127.0.0.1
export TIDB_PORT=4000
export TIDB_DB=eth
```
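The connection string is assembled from these variables; a minimal sketch of the idea, assuming the `mysql+mysqlconnector` URI mentioned in the notes below (the variable names match the exports above, the rest is illustrative):

```python
import os
from dotenv import load_dotenv  # python-dotenv, listed in requirements.txt
from sqlalchemy import create_engine

load_dotenv()  # also picks up a local .env file if present

user = os.getenv("TIDB_USER", "root")
password = os.getenv("TIDB_PASSWORD", "")
host = os.getenv("TIDB_HOST", "127.0.0.1")
port = os.getenv("TIDB_PORT", "4000")
db = os.getenv("TIDB_DB", "eth")

engine = create_engine(f"mysql+mysqlconnector://{user}:{password}@{host}:{port}/{db}")
```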
Entry script: `eth_import_transactions.py`

Key arguments:
- `--start-date YYYY-MM-DD`: start date (inclusive)
- `--days N`: number of days counting backward from the start date
- `--source {download,local,s3}`: input mode
- `--local-dir PATH`: where parquet files are stored/read (for `download`/`local`)
- `--drop-table`: drop and recreate the table before import (the default can be changed in code)
- `--chunksize`: rows per DB insert chunk (default in code)
- `--batch-size`: Arrow scanner batch size (default in code)
- Download tuning: `--download-timeout`, `--download-retries`, `--download-workers` (see the sketch after this list)
- `-v`/`--verbose`: verbose logging
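A minimal sketch of what the concurrent, resumable download does, assuming anonymous `s3fs` access to the public bucket; the skip-if-size-matches resume check and the helper name are illustrative:

```python
import os
from concurrent.futures import ThreadPoolExecutor
import s3fs

fs = s3fs.S3FileSystem(anon=True, client_kwargs={"region_name": "us-east-2"})
day = "2025-10-29"
prefix = f"aws-public-blockchain/v1.0/eth/transactions/date={day}"
local_dir = "local_data_multi"

def fetch(key: str) -> str:
    # Date-prefixed name, matching the <YYYY-MM-DD>__<original>.parquet layout.
    dest = os.path.join(local_dir, f"{day}__{os.path.basename(key)}")
    # Resume: skip files that were already fully downloaded.
    if os.path.exists(dest) and os.path.getsize(dest) == fs.size(key):
        return dest
    fs.get(key, dest)
    return dest

os.makedirs(local_dir, exist_ok=True)
with ThreadPoolExecutor(max_workers=4) as pool:  # cf. --download-workers
    list(pool.map(fetch, fs.ls(prefix)))
```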
Download 3 days starting from 2025-10-29 to the local directory, then import:

```bash
python eth_import_transactions.py \
  --start-date 2025-10-29 --days 3 \
  --source download \
  --chunksize 1000 --batch-size 1000
```

Import from local files only (no download), using date-prefixed filenames in `local_data_multi/`:
```bash
python eth_import_transactions.py \
  --start-date 2025-10-29 --days 3 \
  --source local --local-dir local_data_multi
```

Read directly from S3 (no local files):
```bash
python eth_import_transactions.py \
  --start-date 2025-10-29 --days 3 \
  --source s3
```

Recreate the table before import:

```bash
python eth_import_transactions.py --start-date 2025-10-29 --days 1 --drop-table
```

Downloaded files are saved as:
`<local-dir>/<YYYY-MM-DD>__<original_parquet_name>.parquet`

The importer uses the `YYYY-MM-DD` prefix to set the `date` column and to group files by day, as in the sketch below.
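A minimal sketch of that grouping step (the helper name is illustrative; the filename convention is the one above):

```python
import os
from collections import defaultdict

def group_by_day(local_dir: str) -> dict[str, list[str]]:
    """Map each YYYY-MM-DD prefix to its parquet files in local_dir."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in sorted(os.listdir(local_dir)):
        if name.endswith(".parquet") and "__" in name:
            day = name.split("__", 1)[0]  # e.g. "2025-10-29"
            groups[day].append(os.path.join(local_dir, name))
    return dict(groups)
```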
You can change the defaults without CLI flags by editing the constants near the top of `main()`:

```python
DEFAULT_START_DATE = "YYYY-MM-DD"
DEFAULT_DAYS = 1
DEFAULT_CHUNKSIZE = 10000
DEFAULT_BATCH_SIZE = 10000
DEFAULT_SOURCE = "download" # download | local | s3
DEFAULT_LOCAL_DIR = "local_data_multi"
DEFAULT_DROP_TABLE = False
DEFAULT_DOWNLOAD_TIMEOUT = 300
DEFAULT_DOWNLOAD_RETRIES = 3
DEFAULT_DOWNLOAD_WORKERS = 4
```

The script creates the database `eth` and the table `eth.eth_transactions` if they are not present:

```sql
CREATE TABLE IF NOT EXISTS eth.eth_transactions (
date VARCHAR(10) NOT NULL,
hash VARCHAR(66) NOT NULL,
block_timestamp BIGINT NOT NULL,
nonce BIGINT NULL,
transaction_index BIGINT NULL,
from_address VARCHAR(42) NULL,
to_address VARCHAR(42) NULL,
value DOUBLE NULL,
gas BIGINT NULL,
gas_price BIGINT NULL,
input LONGTEXT NULL,
receipt_cumulative_gas_used BIGINT NULL,
receipt_gas_used BIGINT NULL,
receipt_contract_address VARCHAR(42) NULL,
receipt_status BIGINT NULL,
block_number BIGINT NULL,
block_hash VARCHAR(66) NULL,
max_fee_per_gas BIGINT NULL,
max_priority_fee_per_gas BIGINT NULL,
transaction_type BIGINT NULL,
receipt_effective_gas_price BIGINT NULL,
PRIMARY KEY (date, hash, block_timestamp)
);
```

- S3 region is `us-east-2` and the public bucket path is `aws-public-blockchain/v1.0/eth/transactions/date=YYYY-MM-DD`.
- If direct S3 reads stall with very large batch sizes, reduce `--batch-size` (e.g., 1000–2000); see the sketch below.
- Bulk insert is faster; on error it falls back to `to_sql` automatically.
- For TiDB, prefer the `mysql+mysqlconnector` URI used in the script, and ensure the port (4000) is reachable.
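For reference, a minimal sketch of the direct-S3 read path with `pyarrow` and `s3fs`, showing where `--batch-size` applies; the day shown and the insert step are illustrative:

```python
import pyarrow.dataset as ds
import s3fs

fs = s3fs.S3FileSystem(anon=True, client_kwargs={"region_name": "us-east-2"})
path = "aws-public-blockchain/v1.0/eth/transactions/date=2025-10-29"

dataset = ds.dataset(path, filesystem=fs, format="parquet")
# Smaller batches (e.g., 1000-2000) keep memory bounded and avoid stalls.
for batch in dataset.to_batches(batch_size=2000):
    df = batch.to_pandas()
    # ... insert df into eth.eth_transactions in --chunksize pieces ...
```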