Smelt can capture the complete state of a running stack to disk and later restore it — contract state on chain, docker volumes, identity keys, UCAN proofs, and the manifest that described the topology. A snapshot-restored boot reaches "piri registered, ready to upload" in ~10s instead of the ~45s a cold boot costs.
A cold boot of smelt is dominated by on-chain activity: contract deployment, piri registration with the delegator and upload services, and PDP data-set creation. Each of those steps waits on blockchain finality (several epochs × 3s), serializes across multiple piri nodes, and totals tens of seconds to multiple minutes depending on topology. For iterative development — tweak code, restart stack, reproduce a bug — paying this cost on every cycle is brutal.
Snapshots let you pay the cost once, then resume from the checkpointed state on every subsequent boot.
- You brought up a stack, it reached a known-good state, and you want to preserve that state for tomorrow's dev session.
- You're iterating on a service and want to skip the multi-minute registration dance on every restart.
- You want to switch between different scenarios (1 piri vs 3, sqlite vs postgres) without paying full cold-boot costs each time.
- Docker engine 25+. Smelt pins the compose project name and uses healthcheck.start_interval (both added in engine 25). The Makefile's check-docker target fails early with an upgrade pointer if you're on an older engine.
- Linux or macOS host. Smelt assumes a Unix-family docker host.
- CI. CI should exercise the cold-boot path, because that is the path that breaks when an upstream image or contract regresses. Snapshots hide such regressions.
- Contract code changes. The committed baseline (systems/blockchain/state/anvil-state.json) encodes deployed contract addresses and state. Saved snapshots encode them too. When filecoin-services ships a new release, recapture your snapshots.
Stack up and healthy:

./smelt snapshot save baseline

Later, restore and boot in one step:

YES=1 make nuke
make up SNAPSHOT=baseline

The stack comes up in ~10s, already at the state you saved.
Captures the running stack into generated/snapshots/<name>/. Stops the
stack gracefully (so the blockchain can dump chain state on SIGTERM),
archives each docker volume, copies keys/proofs/manifest, writes a
descriptor. The stack is left stopped on success — run make up to
resume or make up SNAPSHOT=<name> to restart from the saved point.
The stack must be fully healthy at save time. A save of a half-healthy stack would produce an inconsistent checkpoint.
Table of known snapshots with age, size, and volume count.
Deletes the snapshot directory. No undo.
The canonical way to boot from a snapshot. Accepts either a name
(resolved under generated/snapshots/<name>/) or a path (absolute or
relative, for snapshots elsewhere on disk):
make up SNAPSHOT=baseline
make up SNAPSHOT=/tmp/archived-snap

Under the hood, this runs ./smelt snapshot load <value> and then the
normal make up flow.
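The name-or-path rule can be sketched in a few lines of Go. This is a minimal illustration, not smelt's actual resolver: resolveSnapshot is a hypothetical helper, and the rule it applies (absolute values or values containing a path separator are paths, everything else is a name) is an assumption about how the two cases are told apart.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveSnapshot is a hypothetical helper, not smelt's actual code.
// Assumed rule: a value that is absolute or contains a "/" is used as a
// path; anything else is a name under generated/snapshots/.
func resolveSnapshot(projectRoot, value string) string {
	if filepath.IsAbs(value) || strings.ContainsRune(value, '/') {
		return filepath.Clean(value)
	}
	return filepath.Join(projectRoot, "generated", "snapshots", value)
}

func main() {
	// "/work/smelt" is an illustrative project root.
	fmt.Println(resolveSnapshot("/work/smelt", "baseline"))
	fmt.Println(resolveSnapshot("/work/smelt", "/tmp/archived-snap"))
	fmt.Println(resolveSnapshot("/work/smelt", "snapshots/quickstart"))
}
```

Under this assumed rule a bare word is always treated as a name, so a snapshot directory sitting in the current directory would need a ./ prefix to be picked up as a path.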
A building block for make up SNAPSHOT=…. Populates on-disk state from
the snapshot but doesn't start the stack. Useful for tests or when you
want to poke at the restored state before boot.
generated/snapshots/<name>/
├── manifest.json # name, created_at, volumes, keys, proofs, images{tag,digest}
├── smelt.yml # topology at save time (session manifest source)
├── blockchain/
│ ├── anvil-state.json # chain state captured via SIGTERM dump
│ └── deployed-addresses.json # PDP contract addresses
├── keys/ # every *.pem, *.pub, *.hex in generated/keys/
├── proofs/ # every *.txt in generated/proofs/
└── volumes/ # .tar per named docker volume
├── piri-0-data.tar
├── dynamodb-data.tar # delegator allow list, upload registry
├── minio-data.tar # upload's S3 backend
├── ipni-data.tar # content discovery index
├── guppy-data.tar # client login + spaces
├── piri-postgres-data.tar # only when topology uses postgres
└── piri-minio-data.tar # only when topology uses S3
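For tooling that wants to inspect a snapshot, the descriptor fields listed for manifest.json above could be decoded roughly as follows. The Go type names, the field types, and the assumption that images is keyed by service name are all hypothetical; only the JSON field names come from the layout above, and the sample values are purely illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// ImageRef and Descriptor are hypothetical shapes inferred from the
// field list in the layout; smelt's real types may differ.
type ImageRef struct {
	Tag    string `json:"tag"`
	Digest string `json:"digest"`
}

type Descriptor struct {
	Name      string              `json:"name"`
	CreatedAt time.Time           `json:"created_at"` // assumes RFC 3339 timestamps
	Volumes   []string            `json:"volumes"`
	Keys      []string            `json:"keys"`
	Proofs    []string            `json:"proofs"`
	Images    map[string]ImageRef `json:"images"` // assumed: keyed by service name
}

// parseDescriptor decodes the bytes of a manifest.json file.
func parseDescriptor(data []byte) (Descriptor, error) {
	var d Descriptor
	err := json.Unmarshal(data, &d)
	return d, err
}

func main() {
	// Illustrative sample only, not real smelt output.
	sample := []byte(`{
	  "name": "baseline",
	  "created_at": "2025-01-02T03:04:05Z",
	  "volumes": ["piri-0-data", "dynamodb-data"],
	  "keys": ["example.pem"],
	  "proofs": ["example.txt"],
	  "images": {"piri-0": {"tag": "ghcr.io/storacha/piri:main", "digest": "sha256:example"}}
	}`)
	d, err := parseDescriptor(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(d.Name, len(d.Volumes), d.Images["piri-0"].Tag)
}
```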
Tracked files at the project root (smelt.yml,
systems/blockchain/state/*.json) are never modified by a load. Your
git working tree stays clean.
When you run make up SNAPSHOT=X, smelt copies the snapshot's
smelt.yml to generated/snapshot-scratch/smelt.yml. This path is
gitignored and acts as the session manifest: while it exists,
smelt generate and smelt snapshot save read from it instead of the
project's tracked smelt.yml.
This has two important consequences:
- Subsequent make up (without SNAPSHOT) stays on the same topology. The session persists across make down / make up cycles until you explicitly end it. This is what makes the resume-from-dump flow work: you can boot from a snapshot, stop, come back tomorrow, and continue.
- Your tracked smelt.yml is never silently overridden. Edits to it don't take effect while a session is active. To apply changes from smelt.yml, end the session first.
Ending a session:
- make clean: wipes volumes + chain state + session manifest.
- make nuke / make fresh: same, plus keys/proofs.
After either, the next make up reads the project's smelt.yml.
Integration tests using the testcontainers-go stack (stack.MustNewStack)
can boot from a snapshot in two ways depending on where the test lives.
Every snapshot committed to smelt's snapshots/ directory is bundled
into the Go module via //go:embed, so consumers that import smelt as
a dependency can reach them without knowing any filesystem paths:
import "github.com/storacha/smelt/pkg/stack"
func TestFromExternalRepo(t *testing.T) {
s := stack.MustNewStack(t,
stack.WithEmbeddedSnapshot("3-piri-filesystem-sqlite"),
)
// Stack up in ~10s from the bundled snapshot.
}

Discover what's available at runtime:

names, _ := stack.ListEmbeddedSnapshots()
// → ["3-piri-filesystem-sqlite", ...]

This is the recommended path for anything outside the smelt repo: snapshots travel with the Go import, there's nothing to vendor, and bumping the smelt version gives you the latest snapshot fixtures too.
If you're writing tests inside the smelt repo, or you have a snapshot
that isn't bundled (e.g., one you saved yourself via smelt snapshot save), pass a filesystem path:
s := stack.MustNewStack(t,
stack.WithSnapshot("../../snapshots/3-piri-filesystem-sqlite"),
)

Paths are resolved relative to the test's working directory. Absolute
paths also work. Under the hood, WithEmbeddedSnapshot extracts the
embedded files to a temp subdir and feeds the resulting path through
the same code as WithSnapshot — they differ only in how the snapshot
files reach disk.
Topology always comes from the snapshot's embedded smelt.yml —
pairing either option with WithPiriCount or WithPiriNodes returns
an error from NewStack, since the snapshot already dictates piri
count and backend mix.
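The conflict check described here is a common pattern with functional options: collect everything first, then validate combinations before doing any work. The sketch below is a generic illustration of that pattern with hypothetical lowercase names; it does not reproduce pkg/stack's real option types or error text.

```go
package main

import (
	"errors"
	"fmt"
)

// config, option, withSnapshot, withPiriCount and build are hypothetical
// stand-ins used only to illustrate the validation pattern.
type config struct {
	snapshot  string
	piriCount int
}

type option func(*config)

func withSnapshot(path string) option { return func(c *config) { c.snapshot = path } }
func withPiriCount(n int) option      { return func(c *config) { c.piriCount = n } }

// build applies every option, then rejects combinations where an explicit
// piri count competes with the topology a snapshot already dictates.
func build(opts ...option) (*config, error) {
	c := &config{}
	for _, o := range opts {
		o(c)
	}
	if c.snapshot != "" && c.piriCount != 0 {
		return nil, errors.New("snapshot and piri count are mutually exclusive: topology comes from the snapshot")
	}
	return c, nil
}

func main() {
	_, err := build(withSnapshot("snapshots/example"), withPiriCount(3))
	fmt.Println("conflict rejected:", err != nil)
}
```

Validating after all options are applied keeps the check independent of the order in which options are passed.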
Each test gets its own compose project (smeltery-<sanitized-testname>),
so parallel tests that load the same snapshot get isolated copies of
every volume. Teardown (t.Cleanup → composeStack.Down(RemoveVolumes=true))
removes both containers and volumes.
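To make the per-test project naming concrete, here is one way a test name could be turned into a valid compose project name. The exact sanitization smelt applies is not documented here, so the rule below (lowercase, with every character outside a-z, 0-9, underscore and hyphen replaced by a hyphen) is an assumption, and projectName is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// projectName is a hypothetical sketch: it prefixes "smeltery-" and maps
// the test name onto the characters compose accepts in project names
// (lowercase letters, digits, '_' and '-').
func projectName(testName string) string {
	var b strings.Builder
	b.WriteString("smeltery-")
	for _, r := range strings.ToLower(testName) {
		switch {
		case r >= 'a' && r <= 'z', r >= '0' && r <= '9', r == '_', r == '-':
			b.WriteRune(r)
		default:
			b.WriteRune('-') // e.g. the "/" that separates subtests
		}
	}
	return b.String()
}

func main() {
	fmt.Println(projectName("TestUpload/3_piri")) // smeltery-testupload-3_piri
	fmt.Println(projectName("TestUpload/1_piri")) // smeltery-testupload-1_piri
}
```

Distinct test names map to distinct project names in the common case, which is what lets parallel tests load the same snapshot without sharing volumes.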
CI should exercise the cold-boot path to catch regressions in contract deploy or piri registration. Gate either snapshot option on an env var:
var opts []stack.Option
if os.Getenv("SMELT_TEST_NO_SNAPSHOT") == "" {
opts = append(opts, stack.WithEmbeddedSnapshot("3-piri-filesystem-sqlite"))
}
s := stack.MustNewStack(t, opts...)

Then CI sets SMELT_TEST_NO_SNAPSHOT=1 to force cold-boots.
The SDK registers t.Cleanup before calling compose.Up, so
containers are torn down even if the stack fails to start (healthcheck
timeout, image pull failure, etc). But cleanup can still be skipped
when a test process is SIGKILLed, panics mid-cleanup, or runs under
WithKeepOnFailure and the developer forgets to tear the stack down
manually.
Reset a dirty docker state before a suite runs:
func TestMain(m *testing.M) {
	// Sweep leftovers from earlier runs; a failed sweep is logged, not fatal.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	if err := stack.CleanupLeaked(ctx); err != nil {
		log.Printf("smeltery: sweep warning: %v", err)
	}
	// Call cancel explicitly: os.Exit skips deferred functions.
	cancel()
	os.Exit(m.Run())
}

CleanupLeaked removes every container and volume whose name starts
with smeltery- (the compose project prefix used by NewStack). It
doesn't touch the shared storacha-network or anything outside that
namespace. Safe to run on a machine where unrelated docker projects
live; not safe when two pkg/stack-using test suites run concurrently
(the sweeper from one will nuke the other's live stacks).
- No session manifest: tests are ephemeral; there's no across-run resume semantics to preserve.
- No image-drift warning: the test author chose the snapshot path deliberately, so is responsible for ensuring images match. If you need the warning, use the compose path or call snapshot.LoadFiles / snapshot.RestoreVolume directly.
- No save from tests: capturing a snapshot from a running test stack isn't supported. If you need to regenerate snapshots, boot the compose path and run ./smelt snapshot save.
make up # cold boot; wait for healthy
./smelt snapshot save good

make up SNAPSHOT=good
# ... work ...
make down # pause; volumes + chain dump preserved
make up # resume exactly where you left off

The second make up reads the session manifest, so there's no
SNAPSHOT= needed — you're still in the session.
# Save a sqlite+filesystem baseline
./smelt snapshot save sqlite-baseline
# Edit smelt.yml (topology becomes postgres+s3 with 3 piri)
YES=1 make clean # wipe state and end any prior session
make up # cold boot new topology
./smelt snapshot save postgres-s3-3x
# Later, switch freely
YES=1 make clean
make up SNAPSHOT=sqlite-baseline
YES=1 make clean
make up SNAPSHOT=postgres-s3-3x

cp -r generated/snapshots/good /tmp/good-snap
# Later, possibly from a different clone of smelt:
make up SNAPSHOT=/tmp/good-snap

A postgres data directory is version-specific. If systems/piri/ is
regenerated against a bumped postgres:17 image, a snapshot saved
with postgres:16 will fail to boot: postgres refuses to start on a
data directory written by a different major version. Recapture the
snapshot after version bumps.
An upstream filecoin-services release changes the committed baseline
anvil-state.json. Your existing snapshots are still on the old
contracts and may behave unexpectedly against a mixed-version stack.
Treat filecoin-services releases as "recapture snapshots."
Snapshots are portable across Linux/macOS checkouts as long as:
- The compose project name is pinned (name: smelt at the top of compose.yml does this, shipped by default).
- Docker engine ≥ 25 on both saver and loader (enforced by Make's check-docker target; see Prerequisites).
- Image identity matches between save and load.
Smelt captures both the image reference (tag) and the content digest for every service at save time. On load it compares against the current compose config and reports two kinds of drift independently:
WARNING: images differ from snapshot:
piri-0: tag ghcr.io/storacha/piri:main → ghcr.io/storacha/piri:test-build
blockchain: digest drift at filecoin-localdev:local (sha256:83bfc639… → sha256:a1b2c3d4…)
- Tag drift means your .env resolves a service to a different image reference than the snapshot was saved against, likely a local override.
- Digest drift at the same tag catches the rolling-tag silent-pull case: both sides use ghcr.io/storacha/piri:main, but one pulled Monday and one pulled Wednesday, so the bytes differ.
A warning doesn't block the restore. It's a heads-up: if the image actually changed, behavior may diverge from what the snapshot's state was produced against.
Committing snapshots: smelt expects committed, team-shared snapshots
to live under snapshots/ at the project root (not gitignored), while
personal/throwaway ones stay in generated/snapshots/ (gitignored by
the existing generated/ rule). make up SNAPSHOT=… accepts either
a name (resolved under generated/snapshots/) or a path (use for
committed ones, e.g. SNAPSHOT=snapshots/quickstart).
Windows / non-Unix hosts: not supported. Smelt assumes a Linux or macOS docker host.
Volume tarballs compress the data dirs but still add up. ./smelt snapshot list shows sizes; rm as you go.
The snapshot's smelt.yml is authoritative while a session is active —
your tracked smelt.yml edits don't apply. Run make clean first to
leave the session and pick up your changes.
Run make down first. A snapshot load must wipe docker volumes, and
docker refuses to remove volumes attached to running containers.
Stopped-but-extant containers are still holding mounts. Load runs
docker compose down --remove-orphans automatically to clean these
up; if you still see it, manually run that command and retry.
Most likely, the snapshot's chain state has drifted from the
filecoin-services image referenced by your current .env. The load
output prints a WARNING: images differ from snapshot: block for
exactly this case — check which services drifted and either match
their tags to what the snapshot expects, or recapture the snapshot
against your current images.
It shouldn't — snapshot load writes to generated/snapshot-scratch/
only. If you see it, you may be on an older binary; rebuild:
go build -o smelt ./cmd/smelt

You're in a snapshot session. Run make clean to end it and re-run
make up. Confirm via ls generated/snapshot-scratch/smelt.yml
(should be gone).