A Python tool to extract cryptocurrency purchase information from email data stored in mbox files. This project aims to help crypto enthusiasts quickly parse their email history for purchase records.
- This is a helper tool, not a production-ready solution
- Results may be incomplete or inaccurate - always verify the data
- The project is not actively maintained
- Uses a local LLM (Ollama) for email parsing and analysis
Get started with Docker in one command:
./setup.sh
This automated setup script will:
- Check and install Docker if needed
- Build the Docker image
- Set up Ollama service with the required model
- Start all services with docker-compose
Manual Docker commands:
# Build and start services
docker compose up -d
# Run the harvester
docker compose exec harvester digital-asset-harvester --help
# Process emails
docker compose exec harvester digital-asset-harvester \
--mbox-file /app/your_emails.mbox \
--output /app/output/crypto_purchases.csv
# View logs
docker compose logs -f
# Stop services
docker compose down
Note: Place your .mbox files in the project directory; they will be accessible in the container at /app/. Output files are saved to ./output/.
- Set up the project environment:
Windows (PowerShell):
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
pip install -e .
Linux/macOS:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .
- Test the setup:
pytest
- Process your emails:
- From an mbox file:
digital-asset-harvester --mbox-file your_emails.mbox --output crypto_purchases.csv
- Directly from Gmail:
digital-asset-harvester --gmail --output crypto_purchases.csv
- From an IMAP server:
digital-asset-harvester --imap --imap-server imap.example.com --imap-user user@example.com --output crypto_purchases.csv
(or run python -m digital_asset_harvester.cli if you prefer module execution)
The Digital Asset Purchase Harvester now includes an interactive web interface for processing email files.
python -m digital_asset_harvester.web.run
The web interface will be available at http://localhost:8000
- Upload: Navigate to the home page and upload your mbox file
- Processing: The file will be processed in the background, and you'll see a real-time status page
- Results: Once complete, view extracted purchases in a sortable table
- Export: Download results in CSV or JSON format
The web UI provides:
- 📤 File upload with background processing
- 📊 Real-time processing status updates
- 📋 Interactive table display with sortable columns
- 💾 Export options (CSV and JSON)
- 🔄 Process multiple files sequentially
You can install the package locally in editable mode to get the console script:
pip install -e .
Install development dependencies (tests, linting, docs) with:
pip install -e ".[dev]"
Create a wheel and source distribution using the Python build tool:
pip install build
python -m build
Artifacts will appear under dist/ and can be uploaded to a package index of your choice.
- Smart Preprocessing: Filters out newsletters, marketing emails, and non-purchase content before LLM analysis
- Comprehensive Keyword Detection: Recognizes 30+ major cryptocurrency exchanges and 50+ crypto terms
- Improved Classification Prompts: Better LLM prompts with specific examples and exclusion criteria
- Confidence Scoring: Each detection includes a confidence score to help identify uncertain results
- Data Validation: Validates extracted data for completeness and reasonableness
- Faster Processing: Pre-filtering reduces unnecessary LLM calls by ~70% for typical email sets
- Enhanced Error Handling: Robust error handling with detailed logging and retry mechanisms
- Major Exchanges: Coinbase, Binance, Kraken, Gemini, Bitfinex, Bitstamp, and 25+ others
- Regional Platforms: Coinspot, BTCMarkets, Swyftx (AU), Coinsquare, Newton (CA), and more
- Cryptocurrency Coverage: Bitcoin, Ethereum, Litecoin, and 30+ major cryptocurrencies
- Extracts email data from mbox files with improved accuracy
- Intelligent cryptocurrency purchase email identification
- Extracts detailed purchase information: amounts, currencies, vendors, dates, and transaction IDs
- Advanced preprocessing to filter out irrelevant emails
- Confidence scoring and data validation
- Comprehensive logging and error reporting
- Outputs extracted data to CSV format
- Resilient LLM client wrapper with automatic retries and JSON parsing
- Dedicated validation module with typed purchase schemas and reusable validators
- Configurable prompt templates with reusable manager
- Structured logging and lightweight metrics summary reporting
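The pre-filtering idea from the feature list above can be illustrated with a minimal sketch. The keyword sets and the function name looks_like_purchase are assumptions made for illustration; the harvester's real lists are larger and its internal API may differ.

```python
# Hypothetical keyword sets; the real project recognizes many more terms.
EXCHANGE_KEYWORDS = {"coinbase", "binance", "kraken", "gemini"}
PURCHASE_KEYWORDS = {"purchase", "bought", "order confirmation", "receipt"}
NON_PURCHASE_MARKERS = {"newsletter", "price alert", "weekly digest"}


def looks_like_purchase(subject: str, body: str) -> bool:
    """Return True if the email is worth sending to the LLM for analysis."""
    text = f"{subject}\n{body}".lower()
    # Drop obvious newsletters and alerts before any LLM call is made.
    if any(marker in text for marker in NON_PURCHASE_MARKERS):
        return False
    # Require both an exchange name and a purchase-related term.
    has_exchange = any(keyword in text for keyword in EXCHANGE_KEYWORDS)
    has_purchase = any(keyword in text for keyword in PURCHASE_KEYWORDS)
    return has_exchange and has_purchase
```

Only emails for which this kind of check returns True would need an LLM call, which is where the savings from pre-filtering come from.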
In addition to the default local LLM (Ollama), the harvester now supports cloud-based LLM providers.
- Enable Cloud Providers: Set the DAP_ENABLE_CLOUD_LLM environment variable to true.
- Select a Provider: Use DAP_LLM_PROVIDER to choose between ollama, openai, or anthropic.
- API Keys: Provide the appropriate API key for your chosen cloud provider:
- DAP_OPENAI_API_KEY
- DAP_ANTHROPIC_API_KEY
For privacy-conscious users, the harvester supports a Privacy Mode that ensures all email content is processed locally and never sent to cloud providers.
- Enable Privacy Mode: Set DAP_ENABLE_PRIVACY_MODE=true. This forces the use of local Ollama and disables all cloud-based features.
- PII Scrubbing: Privacy mode automatically enables Personally Identifiable Information (PII) scrubbing to mask sensitive data like email addresses, phone numbers, and wallet addresses before they reach the local LLM.
- Context Window: Local models often have smaller default context windows. You can adjust this using DAP_LLM_CONTEXT_WINDOW (default: 4096) to ensure long emails are parsed correctly.
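The PII scrubbing step described above can be sketched roughly as follows. The regular expressions, the placeholder tokens, and the function name scrub_pii are assumptions for illustration; the harvester's actual scrubber may cover more formats (for example, other wallet address types).

```python
import re

# Illustrative patterns only; real-world PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
ETH_ADDRESS_RE = re.compile(r"\b0x[a-fA-F0-9]{40}\b")
# Matches international-style numbers such as "+1 415-555-0100".
PHONE_RE = re.compile(r"\+\d{1,3}[\s-]?\(?\d{2,4}\)?[\s-]?\d{3,4}[\s-]?\d{3,4}")


def scrub_pii(text: str) -> str:
    """Replace email addresses, Ethereum addresses, and phone numbers with tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = ETH_ADDRESS_RE.sub("[WALLET]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text
```

The phone pattern deliberately requires a leading + so that decimal amounts such as 0.00123456 BTC are left untouched.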
Example configuration for maximum privacy:
export DAP_ENABLE_PRIVACY_MODE=true
export DAP_LLM_PROVIDER=ollama
export DAP_LLM_MODEL_NAME="llama3.2:3b"
If you have a slow computer and local Ollama processing takes too long, you can enable automatic fallback to a cloud provider. Note: Fallback is disabled when Privacy Mode is active.
- Enable Fallback: Set DAP_ENABLE_OLLAMA_FALLBACK=true.
- Threshold: Set the timeout threshold in seconds using DAP_OLLAMA_FALLBACK_THRESHOLD_SECONDS (default: 10).
- Cloud Provider: Specify the fallback cloud provider with DAP_FALLBACK_CLOUD_PROVIDER (default: openai).
When enabled, if Ollama takes longer than the threshold, the harvester will automatically switch to the configured cloud provider for that email.
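The threshold-based fallback described above can be sketched with a worker thread and a timeout. This is a minimal illustration, not the harvester's implementation: call_with_fallback and the two callables are hypothetical names, and a real client would also need to handle errors from either provider.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable


def call_with_fallback(
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    prompt: str,
    threshold_seconds: float = 10.0,
) -> str:
    """Try the primary (local) call; use the fallback if it exceeds the threshold."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(primary, prompt)
    try:
        return future.result(timeout=threshold_seconds)
    except FutureTimeout:
        # The local call is not cancelled; it keeps running in its thread
        # and its result is simply discarded.
        return fallback(prompt)
    finally:
        # Do not block on the possibly still-running primary call.
        executor.shutdown(wait=False)
```

Because the decision is made per call, a slow response for one email does not force the cloud provider for the next one.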
Example configuration for using OpenAI as the provider:
export DAP_ENABLE_CLOUD_LLM=true
export DAP_LLM_PROVIDER=openai
export DAP_OPENAI_API_KEY="your-openai-api-key"
- Exchange-Specific Email Format Guides: A reference for the email formats used by various cryptocurrency exchanges.
- Ollama Setup Guide for Windows: Detailed instructions for installing and configuring Ollama on Windows (Native or WSL2).
digital_asset_harvester/
├── cli.py # CLI entry point
├── config.py # Configuration handling
├── ingest/ # Email ingestion (mbox, Gmail, IMAP)
├── llm/ # LLM clients
├── output/ # CSV output utilities
├── processing/ # Extraction logic
├── web/ # Web UI (FastAPI)
└── validation/ # Data validation
- Python 3.7+
- Required Python packages (listed in requirements.txt)
- Clone the repository:
git clone https://github.com/yourusername/digital-asset-purchase-harvester.git
cd digital-asset-purchase-harvester
- Create and activate a virtual environment:
Windows (PowerShell):
python -m venv venv
.\venv\Scripts\Activate.ps1
Windows (Command Prompt):
python -m venv venv
venv\Scripts\activate.bat
Linux/macOS:
python3 -m venv venv
source venv/bin/activate
- Install the required packages:
pip install -r requirements.txt
Note: Always activate the virtual environment before running the script or installing packages. You should see (venv) in your terminal prompt when the virtual environment is active.
Important: Always activate the virtual environment before running any commands!
Windows:
.\venv\Scripts\Activate.ps1 # PowerShell
# or
venv\Scripts\activate.bat # Command Prompt
Linux/macOS:
source venv/bin/activate
- Run the script:
- From an mbox file:
digital-asset-harvester --mbox-file path/to/your.mbox --output path/to/output.csv
- Directly from Gmail:
digital-asset-harvester --gmail --output path/to/output.csv
The Digital Asset Purchase Harvester can generate a CSV file compatible with Koinly's "universal" format for manual import. This allows you to easily upload your transaction data to Koinly for tax reporting.
To use the Koinly CSV export, you must enable the enable_koinly_csv_export feature flag in your configuration. You can do this by setting the DAP_ENABLE_KOINLY_CSV_EXPORT environment variable to true:
export DAP_ENABLE_KOINLY_CSV_EXPORT=true
Once the feature flag is enabled, you can generate the Koinly-compatible CSV file by specifying koinly as the --output-format:
digital-asset-harvester --mbox-file your_emails.mbox --output-format koinly --output koinly_transactions.csv
This will create a koinly_transactions.csv file in the correct format for manual upload to Koinly.
The harvester can verify your harvested totals against actual on-chain wallet balances using the blockchain-core library.
To enable this feature:
- Set the DAP_ENABLE_BLOCKCHAIN_VERIFICATION environment variable to true.
- Provide your wallet addresses using the DAP_BLOCKCHAIN_WALLETS environment variable as a comma-separated list of ASSET:ADDRESS pairs.
Example:
export DAP_ENABLE_BLOCKCHAIN_VERIFICATION=true
export DAP_BLOCKCHAIN_WALLETS="BTC:1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa,ETH:0xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe"
Then run the harvester with the --verify flag:
digital-asset-harvester --mbox-file your_emails.mbox --verify
A verification report will be displayed in the logs, showing matches and discrepancies.
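For reference, a comma-separated list of ASSET:ADDRESS pairs like the one in DAP_BLOCKCHAIN_WALLETS could be parsed as in the sketch below. The function name parse_wallets and the choice to group addresses per asset are assumptions for illustration; the harvester's own parsing may differ.

```python
from typing import Dict, List


def parse_wallets(raw: str) -> Dict[str, List[str]]:
    """Parse 'ASSET:ADDRESS,ASSET:ADDRESS' into a mapping of asset to addresses."""
    wallets: Dict[str, List[str]] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and blank entries
        # Split on the first colon only, so the address is kept whole.
        asset, separator, address = entry.partition(":")
        asset, address = asset.strip(), address.strip()
        if not separator or not asset or not address:
            raise ValueError(f"Expected ASSET:ADDRESS, got {entry!r}")
        wallets.setdefault(asset.upper(), []).append(address)
    return wallets
```

Grouping by asset allows several wallets for the same currency to be listed in one variable.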
Note: Koinly does not currently provide a public API for uploading transactions. The API integration feature is implemented as a placeholder for future compatibility.
The harvester includes a Koinly API client that is designed to support direct transaction uploads when Koinly releases an API. To enable this feature:
export DAP_ENABLE_KOINLY_API=true
export DAP_KOINLY_API_KEY=your_api_key
export DAP_KOINLY_PORTFOLIO_ID=your_portfolio_id
Then use the --koinly-upload flag:
digital-asset-harvester --mbox-file your_emails.mbox --koinly-upload
Current Status: When the API upload is attempted, the client will provide informative error messages and automatically fall back to CSV export. The CSV file can then be manually uploaded through Koinly's web interface at https://app.koinly.io.
Manual Upload Process:
- Generate a Koinly CSV file as described above
- Log in to your Koinly account
- Navigate to: Wallets > Add Wallet > File Import
- Upload the generated CSV file
- Review and confirm the imported transactions
- The script will process the mbox file and output the purchase data to the specified CSV file.
- When finished, deactivate the virtual environment:
deactivate
Configuration is centralized in digital_asset_harvester/config.py, which defines a typed HarvesterSettings dataclass. Use the helper functions to access or override settings:
from digital_asset_harvester import get_settings, get_settings_with_overrides
settings = get_settings()
customised = get_settings_with_overrides(llm_model_name="gemma3:4b")
Every field can be overridden with environment variables using the DAP_ prefix. For example, in Windows Command Prompt (use export instead of set on Linux/macOS):
set DAP_LLM_MODEL_NAME=gemma3:4b
set DAP_ENABLE_PREPROCESSING=false
set DAP_MIN_CONFIDENCE_THRESHOLD=0.75
set DAP_LOG_JSON_OUTPUT=true
Call reload_settings() to refresh the cached settings after changing environment variables or advanced configuration:
from digital_asset_harvester import reload_settings
reload_settings()
Run the test suite to validate the detection capabilities:
pytest
This will test various email scenarios including:
- Coinbase, Binance, and Kraken purchase confirmations
- Newsletter and price alert filtering
- Processing speed improvements
# Basic usage
digital-asset-harvester --mbox-file example.mbox --output output/purchase_data.csv
# The improved system will now:
# 1. Pre-filter emails using keyword detection
# 2. Use enhanced LLM prompts for better accuracy
# 3. Validate extracted data for quality
# 4. Provide detailed processing statistics
Advanced defaults can be configured in a TOML configuration file (default: config/config.toml).
[harvester]
llm_model_name = "llama3.2:3b"
min_confidence_threshold = 0.6
enable_preprocessing = true
strict_validation = true
enable_debug_output = false
The application now uses a factory function, get_llm_client, to create the appropriate LLM client based on the current settings. You can still provide a custom client to the EmailPurchaseExtractor:
from digital_asset_harvester import EmailPurchaseExtractor, get_llm_client, get_settings_with_overrides
# Example of using the factory with custom settings
settings = get_settings_with_overrides(llm_provider="openai", enable_cloud_llm=True)
llm_client = get_llm_client(provider="openai")
extractor = EmailPurchaseExtractor(settings=settings, llm_client=llm_client)
All extracted purchase records flow through PurchaseValidator, which enforces numeric sanity checks, ISO currency formatting, and vendor/date presence. You can swap in your own validator or disable strict mode via configuration:
from digital_asset_harvester import EmailPurchaseExtractor, PurchaseValidator, get_settings_with_overrides
settings = get_settings_with_overrides(strict_validation=False)
validator = PurchaseValidator(allow_unknown_crypto=False)
extractor = EmailPurchaseExtractor(settings=settings)
extractor.validator = validator
Prompts are stored centrally via PromptManager. You can supply custom templates at runtime:
from digital_asset_harvester import EmailPurchaseExtractor, PromptManager, get_settings
settings = get_settings()
prompts = PromptManager()
prompts.register("classification", "Custom classification prompt for ${email_content}")
prompts.register("extraction", "Custom extraction prompt for ${email_content}")
extractor = EmailPurchaseExtractor(settings=settings, prompts=prompts)
If you use a niche exchange that isn't supported out of the box, or if your purchase emails use unusual language, you can extend the pre-filtering list with custom keywords.
- Create a file named keywords.txt in the root of the project.
- Add one keyword or phrase per line.
- Lines starting with # are ignored.
The harvester will load these keywords and use them to ensure your emails are not filtered out during the preprocessing stage.
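The file format described above (one keyword or phrase per line, with # lines ignored) can be read with a few lines of Python. This is a sketch under assumptions: load_custom_keywords is a hypothetical name, and lowercasing the keywords and skipping blank lines are illustrative choices rather than documented behavior.

```python
from pathlib import Path
from typing import List, Union


def load_custom_keywords(path: Union[str, Path] = "keywords.txt") -> List[str]:
    """Load one keyword or phrase per line, skipping blanks and '#' comments."""
    keyword_file = Path(path)
    if not keyword_file.exists():
        return []  # a missing file simply means no custom keywords
    keywords: List[str] = []
    for line in keyword_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keywords.append(line.lower())  # assumption: matching is case-insensitive
    return keywords
```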
You can also specify a different filename using the DAP_CUSTOM_KEYWORDS_FILE environment variable:
export DAP_CUSTOM_KEYWORDS_FILE="my_niche_keywords.txt"
Enable JSON-formatted logs and capture processing metrics with the built-in telemetry helpers:
from digital_asset_harvester import MetricsTracker, StructuredLoggerFactory, log_event
factory = StructuredLoggerFactory(json_output=True)
logger = factory.build("demo", default_fields={"component": "demo"})
metrics = MetricsTracker()
metrics.increment("emails_processed")
log_event(logger, "demo_event", status="ok")
If you need to recreate the virtual environment:
# Remove existing environment
rm -rf venv # Linux/macOS
rmdir /s venv # Windows
# Create new environment
python -m venv venv
For contributors, install development dependencies:
# After activating virtual environment
pip install -r requirements-dev.txt
Virtual Environment Issues:
- "venv not recognized": Make sure you're in the project directory
- Permission errors on Windows: Run PowerShell as Administrator or use Command Prompt
- Python not found: Ensure Python 3.7+ is installed and in your PATH
Ollama Issues:
- Model not found: Run ollama pull llama3.2:3b to download the model
- Ollama not running: Start the Ollama service or desktop application
- Connection errors: Check if Ollama is running on the default port (11434)
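To check the connection case above from Python, a plain TCP probe is usually enough. This sketch only tests whether something accepts connections on the port; it does not confirm that the listener is actually Ollama. The function name is_port_open is a hypothetical helper, not part of the harvester.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 11434, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    status = "reachable" if is_port_open() else "not reachable"
    print(f"Ollama default port 11434 is {status}")
```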
Import Errors:
- Always ensure virtual environment is activated before running scripts
- Reinstall requirements: pip install --force-reinstall -r requirements.txt
You can set these environment variables to customize behavior:
# Windows
set OLLAMA_HOST=http://localhost:11434
set PYTHONPATH=%PYTHONPATH%;.
# Linux/macOS
export OLLAMA_HOST=http://localhost:11434
export PYTHONPATH=$PYTHONPATH:.
To use the Gmail integration, you need to enable the Gmail API and create credentials.
- Enable the Gmail API:
- Go to the Google Cloud Console.
- Create a new project or select an existing one.
- In the API Library, search for "Gmail API" and enable it.
- Create OAuth 2.0 Credentials:
- Go to the "Credentials" page in the Google Cloud Console.
- Click "Create Credentials" and select "OAuth client ID".
- Choose "Desktop app" as the application type.
- Download the JSON file and save it as credentials.json in the root of the project.
When you run the script with the --gmail flag for the first time, you will be prompted to authorize the application.
To use the IMAP integration, you need to enable the feature flag and provide the server address and credentials.
The IMAP feature is controlled by the enable_imap feature flag. You can enable it by setting the DAP_ENABLE_IMAP environment variable to true:
export DAP_ENABLE_IMAP=true
Alternatively, you can set enable_imap = true in your configuration file.
If your IMAP server uses password authentication, you can provide your credentials using the --imap-user and --imap-password arguments:
digital-asset-harvester --imap \
--imap-server imap.example.com \
--imap-user user@example.com \
--imap-password your_password \
--output crypto_purchases.csv
If you're using Gmail, you'll need to use OAuth2. First, follow the instructions in the "Gmail API Setup" section to get your credentials.json file. Then, you can run the script with the --imap-auth-type gmail_oauth2 argument:
digital-asset-harvester --imap \
--imap-server imap.gmail.com \
--imap-user user@gmail.com \
--imap-auth-type gmail_oauth2 \
--output crypto_purchases.csv
If you're using Outlook, you'll need to use OAuth2. First, you'll need to register an application in the Azure portal and get a client ID and authority URL. Then, you can run the script with the --imap-auth-type outlook_oauth2 argument:
digital-asset-harvester --imap \
--imap-server outlook.office365.com \
--imap-user user@outlook.com \
--imap-auth-type outlook_oauth2 \
--client-id your_client_id \
--authority https://login.microsoftonline.com/your_tenant_id \
--output crypto_purchases.csv
In addition to mbox files and IMAP, the harvester supports direct ingestion using Gmail and Outlook APIs. This is often faster and more convenient than downloading large mbox files.
Follow the "Gmail API Setup" section to get your credentials.json. Then run:
digital-asset-harvester --gmail --output crypto_purchases.csv
You can customize the search query with --gmail-query (default: from:coinbase OR from:binance).
To use the Outlook API, you'll need to register an application in the Azure portal and get a client ID and authority URL. Then, you can run the script with the --outlook flag:
digital-asset-harvester --outlook \
--client-id your_client_id \
--authority https://login.microsoftonline.com/your_tenant_id \
--output crypto_purchases.csv
You can customize the search query with --outlook-query (default: from:coinbase OR from:binance).
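The default queries shown above follow the pattern from:X OR from:Y. If you track several exchanges, a query string of the same shape can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of the harvester, and it assumes the sender terms need no quoting or escaping.

```python
from typing import Iterable


def build_sender_query(senders: Iterable[str]) -> str:
    """Join sender names into a 'from:a OR from:b' style search query."""
    return " OR ".join(f"from:{sender}" for sender in senders)


if __name__ == "__main__":
    # Prints a string suitable for passing to --gmail-query or --outlook-query.
    print(build_sender_query(["coinbase", "binance", "kraken"]))
```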
This project is licensed under the MIT License. See the LICENSE file for details.