getharvest-backup

Backup tool for Harvest Time Tracking data using the Harvest API v2. (Please use our affiliate link to try out Harvest; doing so supports the development of this tool as open source software.) The tool automatically discovers all your Harvest accounts and backs up all data from all endpoints, including invoices, estimates, time entries, projects, clients, and more.

We used the Harvest API v2 OpenAPI specification, kindly provided by jolicode, to generate this tool.

The idea behind this tool is to have, at a minimum, a full backup or export of all data from Harvest. The backup can be used to restore data to a new Harvest account, to archive it for compliance purposes, or to migrate it to a different system.

At the moment the tool only implements backup functionality, but it could easily be extended to include restore functionality in the future. We decided to solve that problem when we actually need it. The JSON files generated by the tool are fully compliant with the Harvest API v2 OpenAPI specification and can be used to restore data to a new Harvest account via simple POST requests. There is no need to write a restore tool; just use the JSON files as input for the POST requests.
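To make the restore idea concrete, here is a minimal sketch of how one backed-up record could be turned into a POST request. It is not part of the tool. The helper name `build_restore_request` and the `READ_ONLY_FIELDS` set are hypothetical, and the sketch assumes that server-assigned fields such as `id` have to be dropped before re-creating an object; nested references (for example a client inside a project) may need further mapping that is not shown here.

```python
import json
import urllib.request

API_BASE = "https://api.harvestapp.com/v2"

# Assumption: these server-assigned fields are dropped before re-creating a record.
READ_ONLY_FIELDS = {"id", "created_at", "updated_at"}


def build_restore_request(
    endpoint: str, record: dict, pat: str, account_id: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request that re-creates one backed-up record."""
    payload = {k: v for k, v in record.items() if k not in READ_ONLY_FIELDS}
    return urllib.request.Request(
        url=f"{API_BASE}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {pat}",
            "Harvest-Account-Id": str(account_id),
            "User-Agent": "HarvestRestoreSketch/0.0",  # hypothetical value
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; building and sending are kept separate so the request can be inspected first.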

Features

Auto-discovery: Automatically discovers all Harvest accounts using your Personal Access Token (PAT)
Comprehensive Backup: Backs up all Harvest API v2 endpoints including:
- Clients
- Contacts
- Projects and assignments
- Tasks
- Time entries
- Users and assignments (including /users/me)
- Expenses and expense categories
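- Invoices and invoice item categories (PDFs are not served by the API; see the note on PDFs under Incremental Backup)
- Estimates and estimate item categories (PDFs are not served by the API; see the note on PDFs under Incremental Backup)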
- Roles
- Company settings
Incremental Backup: Binary artifacts (PDFs) are downloaded incrementally
Rate Limiting: Automatically handles Harvest API rate limits (100 requests per 15 seconds)
Error Handling: Robust error handling with retries and exponential backoff
Docker Support: Run in a containerized environment
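The auto-discovery feature listed above can be sketched roughly as follows. This is an illustration under assumptions, not the tool's code: it assumes the Harvest ID accounts endpoint returns an `accounts` array whose entries carry a `product` field, and the function names are hypothetical.

```python
import json
import urllib.request

ACCOUNTS_URL = "https://id.getharvest.com/api/v2/accounts"


def harvest_accounts(payload: dict) -> list:
    """Keep only Harvest accounts; the same token may also list other products."""
    return [a for a in payload.get("accounts", []) if a.get("product") == "harvest"]


def discover_accounts(pat: str) -> list:
    """Fetch the account list for a Personal Access Token (performs a network call)."""
    request = urllib.request.Request(
        ACCOUNTS_URL,
        headers={
            "Authorization": f"Bearer {pat}",
            "User-Agent": "HarvestBackupTool/0.1.0",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return harvest_accounts(json.load(response))
```

Each returned account ID would then be sent as the `Harvest-Account-Id` header on subsequent API requests.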

Quick Start

Get a Personal Access Token (PAT) from Harvest ID for a user who has the permissions required to access the data you want to back up.
Create a directory to store the backup:

mkdir -p backup

Run the backup tool via Docker like this:

docker run --rm -t \
  -u $(id -u):$(id -g) \
  -e HARVEST_PAT=your_personal_access_token \
  -v $(pwd)/backup:/backup ghcr.io/tektit/getharvest-backup:latest

The backup will be stored in the backup directory. Add --verbose to see more detailed output or --quiet to show only errors.

Installation

From Source

First, install uv if you haven't already:

curl -LsSf https://astral.sh/uv/install.sh | sh

Then install the project:

git clone https://github.com/tektit/getharvest-backup.git
cd getharvest-backup
uv sync --no-dev

Note: By default, uv sync includes dev dependencies. Use --no-dev to install only runtime dependencies.

Using Docker

docker build -t getharvest-backup .

Or use the pre-built image from GitHub Packages:

docker pull ghcr.io/tektit/getharvest-backup:latest

docker run --rm -t \
  -e HARVEST_PAT=your_personal_access_token \
  -v $(pwd)/backup:/backup \
  ghcr.io/tektit/getharvest-backup:latest

Usage

Command Line

# Using environment variable for PAT
export HARVEST_PAT=your_personal_access_token
uv run harvest-backup --output ./backup

# Or specify PAT directly
uv run harvest-backup --pat your_personal_access_token --output ./backup

# Dry run mode (test without writing files)
uv run harvest-backup --pat your_pat --dry-run

# Verbose logging
uv run harvest-backup --pat your_pat --verbose

Docker

# Build the image
docker build -t getharvest-backup .

# Run the backup (one-shot job)
# Use -t for interactive use (full terminal width)
docker run --rm -t \
  -e HARVEST_PAT=your_personal_access_token \
  -v $(pwd)/backup:/backup \
  getharvest-backup

You can also use a .env file for the PAT:

# Create .env file with your PAT
echo "HARVEST_PAT=your_personal_access_token" > .env

# Run with env file
# Use -t for interactive use (full terminal width)
docker run --rm -t \
  --env-file .env \
  -v $(pwd)/backup:/backup \
  getharvest-backup

The backup will be stored in /backup inside the container (mapped to ./backup on your host by the -v option shown above).

Backup Output Structure

backup/
├── accounts.json                    # List of all discovered accounts
├── harvest_account_12345/
│   ├── clients/
│   │   ├── list.json               # All clients (source of truth - contains full objects)
│   │   └── 123.json                # Individual client (synthetically created from list.json)
│   ├── contacts/
│   │   ├── list.json               # All contacts
│   │   └── 456.json                # Individual contact detail
│   ├── projects/
│   │   ├── list.json
│   │   ├── 456.json
│   │   ├── 456_user_assignments.json
│   │   ├── 456_task_assignments.json
│   │   └── artifacts/             # Project receipts (incremental)
│   ├── tasks/
│   │   ├── list.json
│   │   └── 789.json
│   ├── time_entries/
│   │   ├── list.json
│   │   └── 100.json
│   ├── users/
│   │   ├── list.json
│   │   ├── 1.json
│   │   ├── 1_billable_rates.json
│   │   ├── 1_cost_rates.json
│   │   ├── 1_project_assignments.json
│   │   └── 1_teammates.json
│   ├── users_me/
│   │   └── data.json               # Current user info
│   ├── users_me_project_assignments/
│   │   └── list.json               # Current user project assignments
│   ├── expenses/
│   │   ├── list.json
│   │   └── 200.json
│   ├── expense_categories/
│   │   ├── list.json
│   │   └── 300.json
│   ├── invoices/
│   │   ├── list.json
│   │   └── 300.json
│   │   # Note: PDFs not available via API - client_key is in JSON for web access
│   ├── invoice_item_categories/
│   │   ├── list.json
│   │   └── 400.json
│   ├── estimates/
│   │   ├── list.json
│   │   └── 400.json
│   │   # Note: PDFs not available via API - client_key is in JSON for web access
│   ├── estimate_item_categories/
│   │   ├── list.json
│   │   └── 500.json
│   ├── roles/
│   │   ├── list.json
│   │   └── 500.json
│   └── company/
│       └── data.json
└── harvest_account_67890/
    └── ...
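As a quick sanity check on a finished backup, the layout above can be walked with a few lines of Python. This is a sketch that only relies on the directory names shown in the tree; the function name `summarize_backup` is hypothetical.

```python
from pathlib import Path


def summarize_backup(root: Path) -> dict:
    """Map each account directory to {endpoint: number of per-record JSON files}."""
    summary = {}
    for account_dir in sorted(root.glob("harvest_account_*")):
        if not account_dir.is_dir():
            continue
        counts = {}
        for endpoint_dir in sorted(p for p in account_dir.iterdir() if p.is_dir()):
            # Per-record files are named by numeric ID (e.g. 123.json); list.json
            # and nested files such as 456_user_assignments.json are skipped.
            counts[endpoint_dir.name] = sum(
                1 for f in endpoint_dir.glob("*.json") if f.stem.isdigit()
            )
        summary[account_dir.name] = counts
    return summary
```

For example, `summarize_backup(Path("./backup"))` returns one entry per `harvest_account_*` directory.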

Data Structure Notes

list.json: Contains all items from the API list endpoint. This is the source of truth and contains full object data (same schema as detail endpoints).
Individual {id}.json files: Created synthetically from list.json for convenience and direct access. No additional API calls are made.
For restoration: Use the data from list.json or individual files (they contain the same data). The Harvest API uses POST endpoints with the object data to create/restore items.
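The relationship between list.json and the individual files can be illustrated with a short sketch. It assumes list.json holds a JSON array of objects that each have an `id` field; the actual on-disk layout may wrap the array differently, and the function name is hypothetical.

```python
import json
from pathlib import Path


def write_individual_files(endpoint_dir: Path) -> int:
    """Create one {id}.json per entry in list.json; returns the number written."""
    items = json.loads((endpoint_dir / "list.json").read_text(encoding="utf-8"))
    written = 0
    for item in items:
        target = endpoint_dir / f"{item['id']}.json"
        target.write_text(
            json.dumps(item, indent=2, ensure_ascii=False), encoding="utf-8"
        )
        written += 1
    return written
```

Because the per-record files are derived purely from data already on disk, no extra API requests are involved.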

Incremental Backup

The tool uses incremental backup for binary artifacts:

JSON files: Always written (overwritten) to ensure data is up-to-date
Binary files: Only downloaded if they don't already exist or have changed

Note on PDFs: The Harvest API does not provide PDF downloads via the /v2/invoices/{id}.pdf or /v2/estimates/{id}.pdf endpoints. These endpoints return JSON identical to the detail endpoints. PDFs can be accessed via the public web URL using the client_key field from the invoice/estimate JSON: https://{subdomain}.harvestapp.com/client/invoices/{client_key}.pdf
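The URL pattern from the note can be assembled from a backed-up invoice like this. The sketch covers invoices only, since that is the pattern given above; the function name is hypothetical and the subdomain has to be supplied by the caller.

```python
from urllib.parse import quote


def invoice_pdf_url(subdomain: str, invoice: dict) -> str:
    """Build the public web URL of an invoice PDF from its client_key field."""
    client_key = quote(str(invoice["client_key"]), safe="")
    return f"https://{subdomain}.harvestapp.com/client/invoices/{client_key}.pdf"
```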

Artifact manifests are stored in .artifacts_manifest.json in the backup root directory.
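A manifest-based incremental check could look roughly like the sketch below. The manifest format is not documented here, so this assumes a flat JSON object that maps a relative file path to a fingerprint string (for example an `updated_at` timestamp or a checksum); all function names are hypothetical.

```python
import json
from pathlib import Path

MANIFEST_NAME = ".artifacts_manifest.json"


def load_manifest(root: Path) -> dict:
    """Read the manifest from the backup root, or start empty if none exists."""
    path = root / MANIFEST_NAME
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def needs_download(root: Path, manifest: dict, relative_path: str, fingerprint: str) -> bool:
    """True when the file is missing or its recorded fingerprint differs."""
    missing = not (root / relative_path).exists()
    return missing or manifest.get(relative_path) != fingerprint


def record_download(root: Path, manifest: dict, relative_path: str, fingerprint: str) -> None:
    """Remember the fingerprint of a freshly downloaded file and persist the manifest."""
    manifest[relative_path] = fingerprint
    (root / MANIFEST_NAME).write_text(
        json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8"
    )
```

A download loop would call `needs_download` before fetching each artifact and `record_download` after writing it.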

API Endpoint Coverage

The tool backs up all Harvest API v2 data endpoints as defined in the official OpenAPI specification:

Core Data Endpoints

Clients: /v2/clients - All clients with full details
Contacts: /v2/contacts - All client contacts (standalone endpoint)
Projects: /v2/projects - All projects with user and task assignments
Tasks: /v2/tasks - All tasks
Time Entries: /v2/time_entries - All time entries
Users: /v2/users - All users with nested resources:
- Billable rates (/v2/users/{id}/billable_rates)
- Cost rates (/v2/users/{id}/cost_rates)
- Project assignments (/v2/users/{id}/project_assignments)
- Teammates (/v2/users/{id}/teammates)
Current User:
- /v2/users/me - Authenticated user information
- /v2/users/me/project_assignments - Current user's project assignments
Expenses:
- /v2/expenses - All expenses
- /v2/expense_categories - All expense categories
Invoices:
- /v2/invoices - All invoices (note: PDFs not available via API)
- /v2/invoice_item_categories - All invoice item categories
Estimates:
- /v2/estimates - All estimates (note: PDFs not available via API)
- /v2/estimate_item_categories - All estimate item categories
Roles: /v2/roles - All roles
Company: /v2/company - Company settings (single resource)

What's Not Included

Report endpoints (/v2/reports/*): These require date range parameters and are analytical rather than raw data. They can be added as a future enhancement if needed.

All endpoints are backed up with full pagination support, ensuring complete data coverage even for accounts with thousands of records.
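A pagination loop of this kind can be sketched as follows. Harvest list responses carry a numeric `next_page` field and a `links.next` URL; the sketch checks both. The `fetch_page` callable is a hypothetical stand-in for whatever performs the HTTP request, so the example stays free of network calls.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import parse_qs, urlparse


def next_page_number(body: dict) -> Optional[int]:
    """Return the next page number, or None when this is the last page."""
    if body.get("next_page"):
        return int(body["next_page"])
    next_url = (body.get("links") or {}).get("next")
    if next_url:
        page = parse_qs(urlparse(next_url).query).get("page")
        if page:
            return int(page[0])
    return None


def iterate_items(fetch_page: Callable[[int], dict], key: str) -> Iterator[dict]:
    """Yield every item under `key` (e.g. "clients") across all pages."""
    page: Optional[int] = 1
    while page is not None:
        body = fetch_page(page)
        yield from body.get(key, [])
        page = next_page_number(body)
```

Collecting `list(iterate_items(fetch_page, "clients"))` would give the full content for a list.json file.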

Rate Limiting and Pagination

The tool automatically handles Harvest API rate limits and pagination:

Rate limit: 100 requests per 15 seconds
Pagination: Automatically handles pagination with up to 2000 items per page (maximum per Harvest API)
Pagination detection: Supports both next_page (number) and links.next (URL) for maximum compatibility
Automatic retry: Exponential backoff on rate limit errors (429)
Network retry: Automatic retry on network errors
No retry: Authentication errors (401, 403) are not retried
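One common way to stay under a limit of 100 requests per 15 seconds is a sliding-window limiter. The sketch below shows that technique under the stated limit; it is not necessarily how the tool implements it, and the class name is hypothetical. The clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds interval."""

    def __init__(
        self,
        max_requests: int = 100,
        window_seconds: float = 15.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._sent: deque = deque()  # timestamps of recent requests, oldest first

    def acquire(self) -> None:
        """Block until another request may be sent, then record it."""
        while True:
            now = self._clock()
            # Forget requests that have left the window.
            while self._sent and now - self._sent[0] >= self._window:
                self._sent.popleft()
            if len(self._sent) < self._max:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest request expires.
            self._sleep(self._window - (now - self._sent[0]))
```

Calling `limiter.acquire()` before every HTTP request keeps the client below the limit, so 429 responses should only occur when other clients share the same quota.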

Testing

The test suite covers:

API client functionality (rate limiting, pagination, retries)
Account discovery
Backup executor
File writer with incremental backup
Error handling
Integration tests with full backup flow

Unit Tests

Run tests with pytest:

# Install dev dependencies
uv sync --dev

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=harvest_backup --cov-report=html

Local Testing with Mock Data

You can test the backup tool locally without a real Harvest API token using mock data:

# Run backup with mock data (no API token needed)
python tests/run_mock_backup.py --output ./test_backup

# Run with verbose logging to see all API calls
python tests/run_mock_backup.py --output ./test_backup --verbose

# Run with debug logging
python tests/run_mock_backup.py --output ./test_backup --debug

# Use any CLI option
python tests/run_mock_backup.py --output ./test_backup --quiet

The mock backup script:

Uses sample test data for all endpoints (clients, projects, invoices, etc.)
Includes PDF test data for invoices and estimates
Supports all CLI features (logging, progress indicators, error handling)
Produces the same output structure as a real backup
Shows VERBOSE logs when using --verbose flag

Example output:

[20:20:13] INFO     Starting backup of all Harvest accounts...
[20:20:13] INFO     Discovering Harvest accounts...
[20:20:13] INFO     Found 2 Harvest account(s)
[20:20:13] INFO       - Account 12345: Test Company
[20:20:13] INFO       - Account 67890: Another Company
[20:20:13] INFO     Backing up account 12345 (Test Company)...
[20:20:13] INFO     Completed backup for account 12345
[20:20:13] INFO     Backing up account 67890 (Another Company)...
[20:20:13] INFO     Completed backup for account 67890
[20:20:13] INFO     Backup completed successfully
[20:20:13] INFO     ✓ Backup completed successfully

This is useful for:

Testing the backup tool without API access
Verifying output structure and file organization
Debugging backup logic locally
Demonstrating the tool's functionality

Configuration

Environment Variables

HARVEST_PAT: Personal Access Token (can also be provided via --pat argument)

Command Line Options

--pat: Personal Access Token (overrides HARVEST_PAT env var)
--output, -o: Output directory (default: ./backup)
--dry-run: Test run without writing files
--user-agent: User-Agent header value (default: HarvestBackupTool/0.1.0)
--verbose, -v: Enable verbose logging
--debug, -d: Enable debug logging
--quiet, -q: Enable quiet logging (only show errors)
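For readers who want to script around the tool, the options above correspond to an argparse setup along these lines. This is an illustrative sketch, not the tool's actual parser: treating the three logging flags as mutually exclusive is an assumption, and `--dry-run` is taken from the usage examples.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="harvest-backup")
    # The PAT falls back to the HARVEST_PAT environment variable.
    parser.add_argument("--pat", default=os.environ.get("HARVEST_PAT"))
    parser.add_argument("--output", "-o", default="./backup")
    parser.add_argument("--user-agent", default="HarvestBackupTool/0.1.0")
    parser.add_argument("--dry-run", action="store_true")
    # Assumption: only one logging level can be selected at a time.
    level = parser.add_mutually_exclusive_group()
    level.add_argument("--verbose", "-v", action="store_true")
    level.add_argument("--debug", "-d", action="store_true")
    level.add_argument("--quiet", "-q", action="store_true")
    return parser
```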

Error Handling

The tool handles various error scenarios:

Rate limit errors (429): Automatic retry with backoff based on Retry-After header
Network errors: Retry with exponential backoff
Authentication errors: Clear error message, no retry
Missing accounts: Skip and log warning
File write errors: Log and continue with other endpoints
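The retry rules above can be sketched as a small wrapper. It is a sketch under assumptions: the `Retry-After` header is taken to hold a number of seconds, the base delay, cap, and attempt count are illustrative values, and the function names are hypothetical. The `send` callable stands in for one HTTP request made with `urllib`.

```python
import time
import urllib.error
from typing import Callable, Optional


def backoff_delay(
    attempt: int, retry_after: Optional[str] = None, base: float = 1.0, cap: float = 60.0
) -> float:
    """Prefer the server's Retry-After value; otherwise back off exponentially."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a plain number of seconds; fall back to exponential backoff
    return min(cap, base * 2**attempt)


def call_with_retries(
    send: Callable[[], object],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Retry on 429 and network errors; re-raise other HTTP errors immediately."""
    for attempt in range(max_attempts):
        last = attempt == max_attempts - 1
        try:
            return send()
        except urllib.error.HTTPError as error:
            # Authentication errors (401, 403) and other statuses are not retried.
            if error.code != 429 or last:
                raise
            sleep(backoff_delay(attempt, error.headers.get("Retry-After")))
        except urllib.error.URLError:
            # Network-level failure (DNS, refused connection, timeout).
            if last:
                raise
            sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

`HTTPError` is caught before `URLError` because it is a subclass of it; reversing the order would make every HTTP error look like a network failure.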

License

Apache License 2.0 - see LICENSE file for details.

Contributing

Contributions are welcome! Please ensure:

Code follows Python 3.13 best practices
All tests pass
Type hints are used throughout
Code is formatted with black (line length 100) and passes flake8 linting

Support & Sponsoring

For issues, questions, or feature requests, please open an issue on GitHub.

For commercial support, please contact us via our website or reach out to Schlomo.

For small-scale sponsoring, please use our Try Harvest affiliate link to try out Harvest for time tracking.
