The ai-trader framework now includes a powerful SQLite-based persistent data storage layer using SQLModel ORM. This eliminates redundant API calls and dramatically speeds up repeated backtests.
Benefits:
- 40x faster on repeated fetches (~2-3s → ~50ms)
- Incremental updates - Only fetch missing dates
- Multi-market support - US stocks, Taiwan stocks, crypto, forex, VIX
- Type-safe ORM - SQLModel with proper schema constraints
- Easy management - CLI commands for list, delete, clean
# Default: Save to CSV only
ai-trader fetch AAPL --market us_stock --start-date 2024-01-01
# NEW: Save to SQLite only
ai-trader fetch AAPL --market us_stock --start-date 2024-01-01 --storage sqlite
# Save to both CSV and SQLite
ai-trader fetch AAPL --market us_stock --start-date 2024-01-01 --storage both# List all cached tickers
ai-trader data list
# Filter by market
ai-trader data list --market us_stock
# View database statistics
ai-trader data info
# Delete specific ticker
ai-trader data delete --ticker AAPL --market us_stock
# Clean old data (before 2020)
ai-trader data clean --market us_stock --before 2020-01-01database: data/market_data.db
Per-Market Tables:
├── us_stock_data [ticker, date, open, high, low, close, volume, adj_close]
├── tw_stock_data [ticker, date, open, high, low, close, volume]
├── crypto_data [ticker, date, open, high, low, close, volume, adj_close]
├── forex_data [ticker, date, open, high, low, close, volume, adj_close]
├── vix_data [ticker, date, open, high, low, close, volume, adj_close]
└── data_metadata [ticker, market_type, first_date, last_date, total_rows, last_fetched_at]
Constraints:
├── UNIQUE(ticker, date) on each market table
├── Indexed: ticker, date
└── last_fetched_at tracks when data was last updated
- Schema Flexibility - Taiwan stocks don't have
adj_close, no nullable columns needed - Query Performance - Fast indexed lookups, no cross-market JOINs
- Data Integrity - Market-specific validation and constraints
- Future Extensibility - Easy to add market-specific columns
from ai_trader.data.storage import SQLiteDataStorage
import pandas as pd
# Initialize
storage = SQLiteDataStorage(db_path="data/market_data.db")
# Save data
df = pd.DataFrame(...) # OHLCV with DatetimeIndex
rows = storage.save(df=df, ticker="AAPL", market_type="us_stock")
# Load data
df = storage.load("AAPL", "us_stock", "2024-01-01", "2024-12-31")
# Check coverage
coverage = storage.get_coverage("AAPL", "us_stock")
# Returns: (date(2024,1,1), date(2024,12,31))
# Find missing dates
missing = storage.get_missing_ranges("AAPL", "us_stock",
"2023-01-01", "2024-12-31")
# Returns: [(date(2023,1,1), date(2023,12,31))]
# List tickers
tickers = storage.list_tickers()
tickers = storage.list_tickers(market_type="us_stock") # Filter by market
# Database info
info = storage.get_database_info()
# Clean up
storage.delete_ticker("AAPL", "us_stock")
storage.delete_before("us_stock", "2020-01-01")| Market | Table | Has adj_close | Example Tickers |
|---|---|---|---|
us_stock |
us_stock_data | Yes | AAPL, MSFT, TSLA |
tw_stock |
tw_stock_data | No | 2330, 2454, 3008 |
crypto |
crypto_data | Yes | BTC-USD, ETH-USD |
forex |
forex_data | Yes | EURUSD, GBPUSD |
vix |
vix_data | Yes | ^VIX |
The system intelligently handles incremental updates:
Request: Fetch AAPL data for 2023-01-01 to 2024-12-31
Step 1: Check metadata table
Database has: 2024-01-01 to 2024-06-30
Step 2: Calculate missing ranges
Need: 2023-01-01 -------- 2024-06-30 -------- 2024-12-31
Have: [████████████]
Missing: [████████████████] [██████████]
Step 3: Fetch only missing data (2 ranges)
API call 1: 2023-01-01 to 2023-12-31
API call 2: 2024-07-01 to 2024-12-31
Step 4: Merge with existing data and update metadata
| Operation | CSV | SQLite |
|---|---|---|
| First fetch (500 rows) | ~2-3s (API) | ~2-3s (API) |
| Repeated fetch (same data) | ~2-3s (API) | ~50ms (cache) |
| Incremental fetch (partial) | ~2-3s (full API) | ~500ms (partial API) |
| Query 1 year | ~100ms | ~30ms |
Savings on typical workflow:
- 50 backtests × 2.95s saved = 147.5 seconds (~2.5 minutes)
- ~40x faster for repeated data
# First run: Fetches from API (~3 seconds)
ai-trader fetch AAPL --market us_stock --start-date 2024-01-01 --storage sqlite
ai-trader quick SMAStrategy data/AAPL.csv
# Second run: Uses cached data (~1 second total)
ai-trader fetch AAPL --market us_stock --start-date 2024-01-01 --storage sqlite
ai-trader quick BBandsStrategy data/AAPL.csv
# Third run: Incremental update only (~1 second total)
ai-trader fetch AAPL --market us_stock --start-date 2024-01-01 --storage sqlite
ai-trader quick RSIStrategy data/AAPL.csv# Fetch multiple symbols and cache them
ai-trader fetch AAPL MSFT GOOGL --market us_stock --start-date 2024-01-01 --storage sqlite
# List what was cached
ai-trader data list --market us_stock
# Output:
# • AAPL (us_stock): 2024-01-01 to 2024-03-28 (251 rows)
# • MSFT (us_stock): 2024-01-01 to 2024-03-28 (251 rows)
# • GOOGL (us_stock): 2024-01-01 to 2024-03-28 (251 rows)
# Later: Add one more year of data (incremental)
ai-trader fetch AAPL MSFT GOOGL --market us_stock --start-date 2024-01-01 --end-date 2025-03-28 --storage sqlite
# Only fetches 2024-03-29 to 2025-03-28 from APIpython scripts/examples/sqlite_storage_example.py-
Use SQLite for repeated backtests
ai-trader fetch AAPL --storage sqlite # Later, same command = instant load -
Use CSV for sharing data
ai-trader fetch AAPL --storage csv # Share data/us_stock/AAPL_*.csv with team -
Use both for production workflows
ai-trader fetch AAPL --storage both # Persistent cache + shareable CSV -
Periodically clean old data
ai-trader data clean --market us_stock --before 2015-01-01
-
Monitor database size
ai-trader data info
-
Don't use SQLite for one-off backtests
- CSV is fine if data isn't reused
-
Don't commit database files to git
- Database regenerates on demand
- Use
.gitignorefordata/market_data.db
-
Don't delete metadata accidentally
- Database tracks coverage, deletion loses this info
- Use
ai-trader data delete --ticker Xto clean properly
-
Don't mix storage modes without understanding
# BAD: Creates confusion about data origin ai-trader fetch AAPL --storage sqlite ai-trader fetch AAPL --storage csv # Different location! # GOOD: Use consistent approach ai-trader fetch AAPL --storage sqlite ai-trader fetch AAPL --storage sqlite # Same database
# Old way: CSV files accumulate
ai-trader fetch AAPL --market us_stock --start-date 2020-01-01 # Creates CSV
# New way: Use SQLite
ai-trader fetch AAPL --market us_stock --start-date 2020-01-01 --storage sqlite
# Import existing CSVs (if needed)
python -c """
from ai_trader.data.storage import SQLiteDataStorage
import pandas as pd
from pathlib import Path
storage = SQLiteDataStorage()
for csv_file in Path('data/us_stock').glob('*.csv'):
df = pd.read_csv(csv_file, index_col=0, parse_dates=True)
ticker = csv_file.stem.split('_')[0]
storage.save(df=df, ticker=ticker, market_type='us_stock')
"""Q: Database file is too large
A: Clean old data
ai-trader data clean --market us_stock --before 2020-01-01Q: Why is first fetch not faster?
A: First fetch always calls API. SQLite speeds up REPEATED fetches.
Speedup happens on 2nd and subsequent calls.Q: Can I use the same database for multiple projects?
A: Yes, each table is per-market, no conflicts.
Or use different db_path: SQLiteDataStorage(db_path="project2/data.db")Q: How to reset the database?
A: Delete the database file
rm data/market_data.db
Next fetch will create a fresh one.All table definitions use SQLModel for type safety:
from ai_trader.data.storage import USStockData, TWStockData, CryptoData
# These are proper SQLModel ORM models, not raw SQL tables
# Provides:
# - Type hints and IDE completion
# - Automatic schema validation
# - Migration-free updates
# - Seamless integration with SQLAlchemy 2.0All database operations use the SQLModel ORM:
# Good: Using ORM
session.exec(select(USStockData).where(USStockData.ticker == "AAPL")).all()
# Bad: Raw SQL (not used in this project)
session.exec("SELECT * FROM us_stock_data WHERE ticker = 'AAPL'").all()SQLite storage is thread-safe:
# Safe for concurrent access
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=4) as executor:
futures = [
executor.submit(storage.load, symbol, "us_stock", "2024-01-01", "2024-12-31")
for symbol in ["AAPL", "MSFT", "GOOGL", "AMZN"]
]
results = [f.result() for f in futures]- Main Documentation: README.md
- Agentic Engine Guide: agentic_ai_trader/trading-backtester/README.md
- Examples: scripts/examples/sqlite_storage_example.py
- Source Code: ai_trader/data/storage/
Licensed under GPL-3.0. See LICENSE for details.