Scrapes historical stock data from cafef.vn for Vietnamese stock market tickers.
- Tab 1 - Lich su gia (Price History): Daily OHLC prices, volume, and negotiated trades
- uv for Python version and virtual environment management
- Docker (for PostgreSQL)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.12
uv python pin 3.12
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
uv is a fast Python package and environment manager. This project keeps its dependencies in requirements.txt, and uv handles installing Python and creating the local virtual environment.
uv python pin 3.12 creates a local .python-version file for your machine so uv venv uses Python 3.12 automatically.
cp .env.example .env # edit credentials if needed
docker compose up -d # starts PostgreSQL 16
The schema is auto-applied on first container start via db/schema.sql.
# Scrape price history to CSV
python main.py scrape HDB
# Scrape with date range (MM/DD/YYYY)
python main.py scrape HDB --start 01/01/2024 --end 12/31/2025
# Load CSV into PostgreSQL
python main.py load HDB
# Scrape and load in one step
python main.py scrape-and-load HDB
Run the commands above after activating the environment with source .venv/bin/activate.
Output CSVs are saved to data/{SYMBOL}_price_history.csv.
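The two conventions used by the commands above (MM/DD/YYYY dates and the data/{SYMBOL}_price_history.csv output path) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names parse_cli_date and csv_path are hypothetical and are not taken from main.py, which may handle both differently.

```python
from datetime import date, datetime
from pathlib import Path

# MM/DD/YYYY, the format shown for --start and --end above
DATE_FORMAT = "%m/%d/%Y"


def parse_cli_date(text: str) -> date:
    """Parse an MM/DD/YYYY string (hypothetical helper).

    Raises ValueError if the text does not match the format.
    """
    return datetime.strptime(text, DATE_FORMAT).date()


def csv_path(symbol: str, data_dir: str = "data") -> Path:
    """Build the documented output path for a ticker (hypothetical helper)."""
    return Path(data_dir) / f"{symbol}_price_history.csv"


if __name__ == "__main__":
    start = parse_cli_date("01/01/2024")
    end = parse_cli_date("12/31/2025")
    if start > end:
        raise ValueError("start date must not be after end date")
    print(csv_path("HDB").as_posix())  # data/HDB_price_history.csv
```

Parsing the dates up front means a day-first value such as 31/12/2025 fails with a ValueError before any request is made.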
- API endpoint: cafef.vn/du-lieu/Ajax/PageNew/DataHistory/PriceHistory.ashx
- URL pattern: cafef.vn/du-lieu/lich-su-giao-dich-{ticker}-{tab}.chn, where tab 1-6 maps to:
  1. Lich su gia (Price History)
  2. Thong ke dat lenh (Order Statistics)
  3. Khoi ngoai (Foreign Trading)
  4. Tu doanh (Proprietary Trading)
  5. Khop lenh theo phien (Intraday Matching)
  6. Co dong & Noi bo (Shareholders & Insiders)
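The URL pattern and tab numbering above can be expressed as a small lookup plus a formatter. The sketch below is not the project's code: the names TABS and history_url are hypothetical, the https:// scheme is an assumption (the pattern above omits the scheme), and the ticker is inserted exactly as given.

```python
# Tab numbers and names as listed above
TABS = {
    1: "Lich su gia (Price History)",
    2: "Thong ke dat lenh (Order Statistics)",
    3: "Khoi ngoai (Foreign Trading)",
    4: "Tu doanh (Proprietary Trading)",
    5: "Khop lenh theo phien (Intraday Matching)",
    6: "Co dong & Noi bo (Shareholders & Insiders)",
}


def history_url(ticker: str, tab: int = 1) -> str:
    """Fill in the documented URL pattern (hypothetical helper).

    The https:// scheme is assumed; the README gives only the host and path.
    """
    if tab not in TABS:
        raise ValueError(f"tab must be between 1 and 6, got {tab}")
    return f"https://cafef.vn/du-lieu/lich-su-giao-dich-{ticker.strip()}-{tab}.chn"


if __name__ == "__main__":
    for number, name in TABS.items():
        print(number, name, history_url("HDB", number))
```

Only tab 1 (Price History) is scraped by this project; the other entries are included to mirror the mapping.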
kabushiki/
├── main.py # CLI entry point
├── scraper/
│ ├── config.py # API URL, headers, rate limit
│ └── price_history.py # Price history scraper
├── db/
│ ├── schema.sql # PostgreSQL schema
│ └── loader.py # CSV -> PostgreSQL loader
├── data/ # Output CSVs (gitignored)
├── docker-compose.yml # PostgreSQL container
├── requirements.txt # Python dependencies
└── .env.example # DB config template