Backup tool for Harvest Time Tracking data using the Harvest API v2. (Please use our affiliate link to try out Harvest and support the development of this tool as open source software.) This tool automatically discovers all your Harvest accounts and backs up all data from all endpoints, including invoices, estimates, time entries, projects, clients, and more.
We used the Harvest API v2 OpenAPI specification, kindly provided by jolicode, to generate this tool.
The goal of this tool is to provide at least a full backup or export of all data from Harvest. The backup can be used to restore data to a new Harvest account, to archive it for compliance purposes, or to migrate it to a new or different system.
At the moment the tool only implements backup functionality, but it could easily be extended to include restore functionality in the future. We decided to solve that problem when we actually need it. The JSON files generated by the tool are fully compliant with the Harvest API v2 OpenAPI specification and can be used to restore data to a new Harvest account via simple POST requests. No need to write a restore tool, just use the JSON files as input for the POST requests.
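As a rough illustration of the restore idea described above, the following sketch builds (without sending) a POST request from one backed-up object. It is not part of the tool: `build_restore_request` and `READ_ONLY_FIELDS` are hypothetical names, and dropping `id`, `created_at`, and `updated_at` before posting is an assumption about which fields the server assigns itself. The base URL and the `Authorization`, `Harvest-Account-Id`, and `User-Agent` headers follow the public Harvest API v2 conventions.

```python
import json
import urllib.request

# Assumption: fields assigned by the server that should not be sent back.
READ_ONLY_FIELDS = {"id", "created_at", "updated_at"}


def build_restore_request(resource, obj, account_id, pat,
                          user_agent="HarvestBackupTool/0.1.0"):
    """Build (but do not send) a POST request that re-creates one object."""
    payload = {k: v for k, v in obj.items() if k not in READ_ONLY_FIELDS}
    return urllib.request.Request(
        url=f"https://api.harvestapp.com/v2/{resource}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {pat}",
            "Harvest-Account-Id": str(account_id),
            "User-Agent": user_agent,
            "Content-Type": "application/json",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. Objects that reference other objects (for example a project's client) may need their IDs mapped to the newly created records first; that step is left out of this sketch.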
- Auto-discovery: Automatically discovers all Harvest accounts using your Personal Access Token (PAT)
- Comprehensive Backup: Backs up all Harvest API v2 endpoints including:
  - Clients
  - Contacts
  - Projects and assignments
  - Tasks
  - Time entries
  - Users and assignments (including /users/me)
  - Expenses and expense categories
  - Invoices (with PDF downloads) and invoice item categories
  - Estimates (with PDF downloads) and estimate item categories
  - Roles
  - Company settings
- Incremental Backup: Binary artifacts (PDFs) are downloaded incrementally
- Rate Limiting: Automatically handles Harvest API rate limits (100 requests per 15 seconds)
- Error Handling: Robust error handling with retries and exponential backoff
- Docker Support: Run in a containerized environment
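The auto-discovery feature listed above can be pictured with a minimal sketch. It assumes the Harvest ID accounts endpoint (`https://id.getharvest.com/api/v2/accounts`) and its response shape with an `accounts` array whose entries carry `id`, `name`, and `product`. The function names are hypothetical and are not taken from the tool's source.

```python
import json
import urllib.request

ACCOUNTS_URL = "https://id.getharvest.com/api/v2/accounts"


def build_accounts_request(pat, user_agent="HarvestBackupTool/0.1.0"):
    """Build the request that lists all accounts reachable with this PAT."""
    return urllib.request.Request(
        ACCOUNTS_URL,
        headers={"Authorization": f"Bearer {pat}", "User-Agent": user_agent},
    )


def harvest_accounts(response_body):
    """Return (id, name) pairs for Harvest accounts, skipping other products."""
    data = json.loads(response_body)
    return [
        (acct["id"], acct["name"])
        for acct in data.get("accounts", [])
        if acct.get("product") == "harvest"
    ]
```

In use, the body returned by `urllib.request.urlopen(build_accounts_request(pat)).read()` would be passed to `harvest_accounts`, and each resulting ID would then be sent as the `Harvest-Account-Id` header on the per-account API calls.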
- Get a Personal Access Token (PAT) in Harvest ID for a user with the necessary permissions to access the data you need to back up.
- Create a directory to store the backup:
  mkdir -p backup
- Run the backup tool via Docker like this:
docker run --rm -t \
-u $(id -u):$(id -g) \
-e HARVEST_PAT=your_personal_access_token \
-v $(pwd)/backup:/backup ghcr.io/tektit/getharvest-backup:latest

The backup will be stored in the backup directory. Add --verbose to see more detailed output or --quiet to show only errors.
First, install uv if you haven't already:
curl -LsSf https://astral.sh/uv/install.sh | sh

Then install the project:
git clone https://github.com/tektit/getharvest-backup.git
cd getharvest-backup
uv sync --no-dev

Note: By default, uv sync includes dev dependencies. Use --no-dev to install only runtime dependencies.
docker build -t getharvest-backup .

Or use the pre-built image from GitHub Packages:
docker pull ghcr.io/tektit/getharvest-backup:latest

docker run --rm -t \
-e HARVEST_PAT=your_personal_access_token \
-v $(pwd)/backup:/backup \
ghcr.io/tektit/getharvest-backup:latest

# Using environment variable for PAT
export HARVEST_PAT=your_personal_access_token
uv run harvest-backup --output ./backup
# Or specify PAT directly
uv run harvest-backup --pat your_personal_access_token --output ./backup
# Dry run mode (test without writing files)
uv run harvest-backup --pat your_pat --dry-run
# Verbose logging
uv run harvest-backup --pat your_pat --verbose

# Build the image
docker build -t getharvest-backup .
# Run the backup (one-shot job)
# Use -t for interactive use (full terminal width)
docker run --rm -t \
-e HARVEST_PAT=your_personal_access_token \
-v $(pwd)/backup:/backup \
getharvest-backup

You can also use a .env file for the PAT:
# Create .env file with your PAT
echo "HARVEST_PAT=your_personal_access_token" > .env
# Run with env file
# Use -t for interactive use (full terminal width)
docker run --rm -t \
--env-file .env \
-v $(pwd)/backup:/backup \
getharvest-backup

The backup will be stored in /backup inside the container (mapped to ./backup on your host by the -v option shown above).
backup/
├── accounts.json  # List of all discovered accounts
├── harvest_account_12345/
│   ├── clients/
│   │   ├── list.json  # All clients (source of truth - contains full objects)
│   │   └── 123.json  # Individual client (synthetically created from list.json)
│   ├── contacts/
│   │   ├── list.json  # All contacts
│   │   └── 456.json  # Individual contact detail
│   ├── projects/
│   │   ├── list.json
│   │   ├── 456.json
│   │   ├── 456_user_assignments.json
│   │   ├── 456_task_assignments.json
│   │   └── artifacts/  # Project receipts (incremental)
│   ├── tasks/
│   │   ├── list.json
│   │   └── 789.json
│   ├── time_entries/
│   │   ├── list.json
│   │   └── 100.json
│   ├── users/
│   │   ├── list.json
│   │   ├── 1.json
│   │   ├── 1_billable_rates.json
│   │   ├── 1_cost_rates.json
│   │   ├── 1_project_assignments.json
│   │   └── 1_teammates.json
│   ├── users_me/
│   │   └── data.json  # Current user info
│   ├── users_me_project_assignments/
│   │   └── list.json  # Current user project assignments
│   ├── expenses/
│   │   ├── list.json
│   │   └── 200.json
│   ├── expense_categories/
│   │   ├── list.json
│   │   └── 300.json
│   ├── invoices/
│   │   ├── list.json
│   │   └── 300.json
│   │   # Note: PDFs not available via API - client_key is in JSON for web access
│   ├── invoice_item_categories/
│   │   ├── list.json
│   │   └── 400.json
│   ├── estimates/
│   │   ├── list.json
│   │   └── 400.json
│   │   # Note: PDFs not available via API - client_key is in JSON for web access
│   ├── estimate_item_categories/
│   │   ├── list.json
│   │   └── 500.json
│   ├── roles/
│   │   ├── list.json
│   │   └── 500.json
│   └── company/
│       └── data.json
└── harvest_account_67890/
    └── ...
- list.json: Contains all items from the API list endpoint. This is the source of truth and contains full object data (same schema as detail endpoints).
- Individual {id}.json files: Created synthetically from list.json for convenience and direct access. No additional API calls are made.
- For restoration: Use the data from list.json or individual files (they contain the same data). The Harvest API uses POST endpoints with the object data to create/restore items.
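How per-item files can be derived from list.json without further API calls might look like the sketch below. It assumes list.json holds a JSON array of objects that each carry an `id` field; the actual on-disk layout used by the tool is not confirmed here, and `write_individual_files` is a hypothetical name.

```python
import json
from pathlib import Path


def write_individual_files(list_path):
    """Write one {id}.json next to list.json for every object in the list."""
    list_path = Path(list_path)
    items = json.loads(list_path.read_text(encoding="utf-8"))
    written = []
    for item in items:
        target = list_path.parent / f"{item['id']}.json"
        target.write_text(json.dumps(item, indent=2), encoding="utf-8")
        written.append(target)
    return written
```

Because every per-item file is a copy of an entry in the list, either source can serve as input for a later restore.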
The tool uses incremental backup for binary artifacts:
- JSON files: Always written (overwritten) to ensure data is up-to-date
- Binary files: Only downloaded if they don't already exist or have changed
Note on PDFs: The Harvest API does not provide PDF downloads via the /v2/invoices/{id}.pdf or /v2/estimates/{id}.pdf endpoints. These endpoints return JSON identical to the detail endpoints. PDFs can be accessed via the public web URL using the client_key field from the invoice/estimate JSON: https://{subdomain}.harvestapp.com/client/invoices/{client_key}.pdf
Artifact manifests are stored in .artifacts_manifest.json in the backup root directory.
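The incremental check for binary files could work along the lines of this sketch. Only the manifest file name comes from the text above. The manifest format (a mapping from relative path to a change fingerprint such as an `updated_at` value) and the function names are assumptions made for illustration.

```python
import json
from pathlib import Path

MANIFEST_NAME = ".artifacts_manifest.json"


def load_manifest(backup_root):
    """Read the manifest, or start with an empty one on the first run."""
    path = Path(backup_root) / MANIFEST_NAME
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def needs_download(backup_root, manifest, relative_path, fingerprint):
    """True if the file is missing or its recorded fingerprint differs."""
    if not (Path(backup_root) / relative_path).exists():
        return True
    return manifest.get(relative_path) != fingerprint


def record_download(backup_root, manifest, relative_path, fingerprint):
    """Remember the fingerprint of a freshly downloaded file and persist it."""
    manifest[relative_path] = fingerprint
    path = Path(backup_root) / MANIFEST_NAME
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
```

A caller would check `needs_download` before fetching each artifact and call `record_download` after writing it, so unchanged files are skipped on the next run.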
The tool backs up all Harvest API v2 data endpoints as defined in the official OpenAPI specification:
- Clients: /v2/clients - All clients with full details
- Contacts: /v2/contacts - All client contacts (standalone endpoint)
- Projects: /v2/projects - All projects with user and task assignments
- Tasks: /v2/tasks - All tasks
- Time Entries: /v2/time_entries - All time entries
- Users: /v2/users - All users with nested resources:
  - Billable rates (/v2/users/{id}/billable_rates)
  - Cost rates (/v2/users/{id}/cost_rates)
  - Project assignments (/v2/users/{id}/project_assignments)
  - Teammates (/v2/users/{id}/teammates)
- Current User:
  - /v2/users/me - Authenticated user information
  - /v2/users/me/project_assignments - Current user's project assignments
- Expenses:
  - /v2/expenses - All expenses
  - /v2/expense_categories - All expense categories
- Invoices:
  - /v2/invoices - All invoices (note: PDFs not available via API)
  - /v2/invoice_item_categories - All invoice item categories
- Estimates:
  - /v2/estimates - All estimates (note: PDFs not available via API)
  - /v2/estimate_item_categories - All estimate item categories
- Roles: /v2/roles - All roles
- Company: /v2/company - Company settings (single resource)
Not included:
- Report endpoints (/v2/reports/*): These require date range parameters and are analytical rather than raw data. They can be added as a future enhancement if needed.
All endpoints are backed up with full pagination support, ensuring complete data coverage even for accounts with thousands of records.
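To make the pagination support concrete, here is a minimal sketch of following pages until none remain. It accepts either a `links.next` URL or a numeric `next_page` field in each response. `fetch_page` is a caller-supplied function that returns one decoded JSON page for a URL; it and the other names are hypothetical rather than the tool's actual code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_page_url(current_url, page):
    """Return the URL of the next page, or None when this was the last one."""
    links_next = (page.get("links") or {}).get("next")
    if links_next:
        return links_next
    next_page = page.get("next_page")
    if next_page is None:
        return None
    # Fall back to rewriting the page query parameter of the current URL.
    parts = urlsplit(current_url)
    query = dict(parse_qsl(parts.query))
    query["page"] = str(next_page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_all(fetch_page, url, resource_key):
    """Collect the items stored under resource_key across all pages."""
    items = []
    while url:
        page = fetch_page(url)
        items.extend(page.get(resource_key, []))
        url = next_page_url(url, page)
    return items
```

Keeping the HTTP call behind `fetch_page` lets the same loop run against real responses or canned test data.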
The tool automatically handles Harvest API rate limits and pagination:
- Rate limit: 100 requests per 15 seconds
- Pagination: Automatically handles pagination with up to 2000 items per page (maximum per Harvest API)
- Pagination detection: Supports both next_page (number) and links.next (URL) for maximum compatibility
- Automatic retry: Exponential backoff on rate limit errors (429)
- Network retry: Automatic retry on network errors
- No retry: Authentication errors (401, 403) are not retried
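One way to stay under the limit of 100 requests per 15 seconds is a sliding-window limiter, sketched below. This is an illustration of the general technique, not the tool's implementation. The class name and the injectable `clock` and `sleep` parameters are assumptions, added so the behaviour can be tested without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds span."""

    def __init__(self, max_requests=100, window_seconds=15.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # Start times of requests still inside the window.

    def acquire(self):
        """Block until one more request fits into the current window."""
        while True:
            now = self._clock()
            # Forget requests that have left the window.
            while self._stamps and now - self._stamps[0] >= self.window_seconds:
                self._stamps.popleft()
            if len(self._stamps) < self.max_requests:
                self._stamps.append(now)
                return
            # Window is full: wait until the oldest request expires.
            self._sleep(self._stamps[0] + self.window_seconds - now)
```

Calling `limiter.acquire()` before every HTTP request keeps the client within the documented limit, so 429 responses become the exception rather than the normal flow.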
The tool's test suite covers:
- API client functionality (rate limiting, pagination, retries)
- Account discovery
- Backup executor
- File writer with incremental backup
- Error handling
- Integration tests with full backup flow
Run tests with pytest:
# Install dev dependencies
uv sync --dev
# Run tests
uv run pytest
# Run with coverage
uv run pytest --cov=harvest_backup --cov-report=html

You can test the backup tool locally without a real Harvest API token using mock data:
# Run backup with mock data (no API token needed)
python tests/run_mock_backup.py --output ./test_backup
# Run with verbose logging to see all API calls
python tests/run_mock_backup.py --output ./test_backup --verbose
# Run with debug logging
python tests/run_mock_backup.py --output ./test_backup --debug
# Use any CLI option
python tests/run_mock_backup.py --output ./test_backup --quiet

The mock backup script:
- Uses sample test data for all endpoints (clients, projects, invoices, etc.)
- Includes PDF test data for invoices and estimates
- Supports all CLI features (logging, progress indicators, error handling)
- Produces the same output structure as a real backup
- Shows VERBOSE logs when using the --verbose flag
Example output:
[20:20:13] INFO Starting backup of all Harvest accounts...
[20:20:13] INFO Discovering Harvest accounts...
[20:20:13] INFO Found 2 Harvest account(s)
[20:20:13] INFO - Account 12345: Test Company
[20:20:13] INFO - Account 67890: Another Company
[20:20:13] INFO Backing up account 12345 (Test Company)...
[20:20:13] INFO Completed backup for account 12345
[20:20:13] INFO Backing up account 67890 (Another Company)...
[20:20:13] INFO Completed backup for account 67890
[20:20:13] INFO Backup completed successfully
[20:20:13] INFO ✓ Backup completed successfully
This is useful for:
- Testing the backup tool without API access
- Verifying output structure and file organization
- Debugging backup logic locally
- Demonstrating the tool's functionality
HARVEST_PAT: Personal Access Token (can also be provided via the --pat argument)

- --pat: Personal Access Token (overrides HARVEST_PAT env var)
- --output, -o: Output directory (default: ./backup)
- --user-agent: User-Agent header value (default: HarvestBackupTool/0.1.0)
- --verbose, -v: Enable verbose logging
- --debug, -d: Enable debug logging
- --quiet, -q: Enable quiet logging (only show errors)
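The precedence between --pat and HARVEST_PAT can be sketched with argparse. The option names and defaults below mirror the ones documented above, plus the --dry-run flag from the usage examples. The parser itself, `build_parser`, and `resolve_pat` are illustrative assumptions rather than the tool's real code.

```python
import argparse
import os


def build_parser():
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(prog="harvest-backup")
    parser.add_argument("--pat", default=None)
    parser.add_argument("--output", "-o", default="./backup")
    parser.add_argument("--user-agent", default="HarvestBackupTool/0.1.0")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--verbose", "-v", action="store_true")
    parser.add_argument("--debug", "-d", action="store_true")
    parser.add_argument("--quiet", "-q", action="store_true")
    return parser


def resolve_pat(args, environ=os.environ):
    """Prefer --pat; fall back to HARVEST_PAT; fail clearly if neither is set."""
    pat = args.pat or environ.get("HARVEST_PAT")
    if not pat:
        raise SystemExit("No PAT given: pass --pat or set HARVEST_PAT")
    return pat
```

Passing `environ` explicitly keeps the lookup testable without touching the real process environment.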
The tool handles various error scenarios:
- Rate limit errors (429): Automatic retry with backoff based on Retry-After header
- Network errors: Retry with exponential backoff
- Authentication errors: Clear error message, no retry
- Missing accounts: Skip and log warning
- File write errors: Log and continue with other endpoints
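The retry rules listed above can be illustrated with a short sketch: honour Retry-After on 429, back off exponentially on network errors, and let authentication errors propagate immediately. `ApiError`, `retry_delay`, `call_with_retries`, and the base, cap, and attempt values are all hypothetical choices for this example.

```python
import time


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next try (attempt counts from 0)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except (TypeError, ValueError):
            pass  # Header was not a plain number of seconds; use backoff.
    return min(cap, base * (2 ** attempt))


def call_with_retries(func, max_attempts=5, sleep=time.sleep):
    """Call func, retrying on 429 and network errors but not on auth errors."""
    for attempt in range(max_attempts):
        try:
            return func()
        except ApiError as err:
            # Only 429 is retried; 401, 403, and other statuses propagate.
            if err.status != 429 or attempt == max_attempts - 1:
                raise
            sleep(retry_delay(attempt, err.retry_after))
        except OSError:
            # Network-level failure (urllib.error.URLError is an OSError).
            if attempt == max_attempts - 1:
                raise
            sleep(retry_delay(attempt))
```

Wrapping each API call in `call_with_retries` keeps the retry policy in one place, separate from the code that builds requests.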
Apache License 2.0 - see LICENSE file for details.
Contributions are welcome! Please ensure:
- Code follows Python 3.13 best practices
- All tests pass
- Type hints are used throughout
- Code is formatted with black (line length 100) and passes flake8 linting
For issues, questions, or feature requests, please open an issue on GitHub.
For commercial support please contact us via our website or reach out to Schlomo.
For small-scale sponsoring, please use our Try Harvest affiliate link to take a look at Harvest for Time Tracking.