Migrate is a command-line tool for orchestrating large-scale Archivematica AIP migrations. It automates the process of moving and replicating AIPs between Archivematica Storage Service locations, e.g. migrating AIPs from on-premises storage into DuraCloud while keeping a local replica set.
Archivematica’s Storage Service already supports moving and replicating AIPs, but running those operations at scale (tens of thousands of AIPs) is:
- Slow and error-prone if done manually
- Hard to resume if interrupted
- Lacking visibility into what has succeeded or failed
Migrate solves these problems by adding a durable orchestration layer:
- Move and replicate AIPs between Storage Service locations
- Batch processing from a simple UUID list (
input.txt) - Reliable orchestration with Temporal so workflows survive crashes and restarts
- State tracking in a local SQLite database
- Parallel execution across multiple workers
- CSV reports (
move-report.csv,replication-report.csv) for QA, auditing, and compliance
In short: give it a list of AIPs, and Migrate will orchestrate every move and replication reliably until the job is done.
Because Storage Service was never designed to be a workflow engine. It can move and replicate AIPs, but it has no durable queue, no orchestration layer, and no good way to resume or audit thousands of operations. It works for one-off jobs, but at scale it becomes brittle.
Migrate fills that gap: it wraps Storage Service operations in durable Temporal workflows, adding the missing layer of scale, reliability, and reporting that Storage Service can't provide on its own. Over time, we expect new approaches to emerge for large-scale digital preservation workflows, but today Migrate is the practical tool to get the job done.
Migrate separates preparation, command submission, and execution:
-
Preparation (
migrate load-input): Reads UUIDs frominput.txt, inserts them into the local SQLite database, queries Storage Service for their metadata, and produces an initial report. This gives you a baseline snapshot of AIP states before running long migrations. -
Command submission (
migrate replicate/migrate move): The client submits one workflow per AIP to Temporal. -
Execution (
migrate worker): Workers pick up tasks and execute them: talk to the Storage Service API, request move or replication operations, check fixity, and update the database. -
Tracking & reporting (
migrate export): The database keeps a durable record of every AIP’s state. At any time you can generate CSV reports to monitor progress or validate results.
The following diagram illustrates the basic architecture:
flowchart TD
A[Client: migrate CLI] -->|Submit workflows| B[Temporal Server]
B --> C[Worker: migrate worker]
C --> D[Archivematica Storage Service]
D -->|Move/Replicate| E1[Storage Location A]
D -->|Move/Replicate| E2[Storage Location B]
D -->|Move/Replicate| E3[Storage Location C]
C --> F[(SQLite DB: configurable path)]
F --> G[CSV Reports]
When you run migrate move or migrate replicate, the CLI reads input.txt,
starts a Temporal workflow for the first UUID, and waits until that workflow
finishes before it touches the next line. AIPs are handled strictly one after
another: even though Temporal can queue lots of work, the client makes the job
serial by blocking on each result.
Temporal gives us durability and resumability, but we reintroduce a bottleneck by tying workflow submission to a blocking CLI loop. We can explore future iterations that decouple submission from the CLI process so that Temporal can exploit higher parallelism while we retain the same operational guarantees.
If you stop the CLI and run it again, it will reread input.txt and inspect the
local database for each UUID. Entries whose status is already moved or
not-found are skipped, so completed work is not repeated.
-
Access to an Archivematica Storage Service instance with valid API credentials.
-
A running Temporal Server:
-
Production-ready deployment (recommended):
Connect to a server already maintained by your organization, use Temporal Cloud, or deploy your own cluster by following the production deployment guide. Be aware that operating and maintaining your own cluster adds substantial complexity to your setup.
-
Local development (quick start):
Start a lightweight development server with the Temporal CLI:
temporal server start-dev --db-filename ./temporal.db
-
-
Install Migrate: prebuilt packages are available from the releases page.
Copy the provided config.example.json to config.json and adjust the values
for your environment. The application looks for config.json in several
locations, in this order:
- The current working directory
- The directory containing the
migratebinary $XDG_CONFIG_HOME/migrate(or%AppData%\migrateon Windows)- Your home directory
The first readable file found wins. The parser accepts standard JSON, but also supports comments and trailing commas via HuJSON.
Create an input.txt file containing the UUIDs of AIPs you want to process
(one UUID per line):
12345678-1234-1234-1234-123456789abc
87654321-4321-4321-4321-cba987654321
abcdef01-2345-6789-abcd-ef0123456789If you need to trim an existing UUID list before loading it, use the
migrate list-filter subcommand. Place original_list.txt and
to_filter_out.txt in the current directory, run migrate list-filter, and
then use the generated final_list.txt as your filtered list.
migrate load-inputThis validates the UUIDs in input.txt and initializes them in the database.
migrate workerThis starts a worker process that handles Temporal workflows. Keep this running in a separate terminal.
At this point, you can either replicate or move AIPs.
The following command starts the replication process for AIPs to the configured replication locations:
migrate replicate
On the other hand, to move AIPs from source to destination, run:
migrate move
Generate CSV reports for move or replication workflows:
migrate export move
migrate export replicate
Each command writes the corresponding report (move-report.csv or
replication-report.csv) with the latest status for every AIP.