This file tells AI agents (Claude Code, Codex, etc.) how to drive this CLI without inquirer prompts, watch progress, decide pass/fail, and summarize results for the user.
If you are doing anything other than running a playbook end-to-end, ignore this file and use README.md.
```bash
# 1. Pre-flight (no real spend)
docker info > /dev/null && grep -E '^(MAIN_PRIVATE_KEY|PARENT_CHAIN_RPC)=' .env
# 2. Run in the background (~7–10 min for malicious-mint, 30–90 min for bold-challenge)
yarn run:script examples/malicious-mint.yaml > /tmp/run.stdout 2>&1 &
# 3. Decide pass/fail from one file
jq '.exitCode, .result.success' logs/latest-result.json
```

Exit 0 + `result.success: true` ⇒ demo worked. Anything else ⇒ inspect the `.failure` block in `logs/latest-result.json`, then the text log whose path is stored in `logs/latest-log.txt`.
Use this guide when the user asks for any of the following:
- "Run the malicious-mint demo"
- "Run the BoLD challenge demo"
- "Verify a chain feature playbook end-to-end"
- "Test that the headless runner still works"
If the user wants to add a new playbook or understand the architecture, read readme-dev.md instead. This file is only about running existing playbooks.
| Playbook | Command | Mode | Typical duration | Approx. Sepolia ETH |
|---|---|---|---|---|
| malicious-validator | malicious-mint | chain | 7–10 min | ~0.06 ETH |
| malicious-validator | bold-challenge | chain | 30–90 min | ~0.10 ETH |
Both spend real Arbitrum Sepolia ETH. Confirm with the user before kicking off a run unless they explicitly asked.
Run all five checks before starting. Each takes under a second; the only slow step is obtaining a missing image, which is skipped when the image is already cached.
```bash
# 1. Node version (must be 23+)
node --version | grep -E 'v(2[3-9]|[3-9][0-9])'
# 2. Docker daemon up
docker info > /dev/null 2>&1 && echo OK
# 3. Required env vars present
grep -E '^(MAIN_PRIVATE_KEY|PARENT_CHAIN_RPC)=.+' .env | wc -l # expect: 2
# 4. Submodules populated (nitro-devnode pinned at 1e535a6)
test -f nitro-devnode/run-dev-node.sh && echo OK
# 5. Required Docker images (see "Obtaining Docker images" below if missing)
docker image inspect jasonwan123/nitro-node-malicious-arbminter > /dev/null 2>&1 && echo OK # malicious-mint
docker image inspect nitro-malicious-playbook-challenge-demo:latest > /dev/null 2>&1 && echo OK # bold-challenge
```

If any check fails, follow the matching row below. Don't run the demo until all checks pass.
| Failed check | Action | Cost |
|---|---|---|
| 1: Node | Tell the user; don't auto-install | none |
| 2: Docker daemon | Tell the user to start Docker Desktop | none |
| 3: env vars | Tell the user which var is missing; never auto-write `.env` | none |
| 4: submodule | `git submodule update --init --recursive` (pulls ~5 MB) | < 30 s |
| 5: images | See "Obtaining Docker images"; confirm with the user first, since the pull or build is heavy | 0–25 min |
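The five checks above can be bundled so that every failing check is named in one pass. This is a minimal sketch, assuming bash and that it runs from the repo root; `preflight`, `node_major_ok` and `env_var_count` are hypothetical helper names, not part of this CLI. It inspects both images, so a `FAIL 5` line only matters for the demo you are about to run.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the CLI). Run from the repo root.

# Check 1: succeeds if a "vMAJOR.MINOR.PATCH" string has MAJOR >= 23.
node_major_ok() {
  local v=${1#v}          # drop the leading "v"
  local major=${v%%.*}    # keep everything before the first dot
  [[ $major =~ ^[0-9]+$ ]] && (( 10#$major >= 23 ))
}

# Check 3: counts non-empty MAIN_PRIVATE_KEY / PARENT_CHAIN_RPC lines (expect 2).
env_var_count() {
  grep -cE '^(MAIN_PRIVATE_KEY|PARENT_CHAIN_RPC)=.+' "${1:-.env}" 2>/dev/null
}

# Runs all five checks; prints one FAIL line per failing check, returns 1 if any failed.
preflight() {
  local failed=0 img
  node_major_ok "$(node --version 2>/dev/null)" || { echo "FAIL 1: Node 23+ required"; failed=1; }
  docker info > /dev/null 2>&1 || { echo "FAIL 2: Docker daemon not running"; failed=1; }
  [[ $(env_var_count .env) == 2 ]] || { echo "FAIL 3: MAIN_PRIVATE_KEY / PARENT_CHAIN_RPC missing"; failed=1; }
  [[ -f nitro-devnode/run-dev-node.sh ]] || { echo "FAIL 4: submodules not populated"; failed=1; }
  for img in jasonwan123/nitro-node-malicious-arbminter nitro-malicious-playbook-challenge-demo:latest; do
    docker image inspect "$img" > /dev/null 2>&1 || { echo "FAIL 5: image $img not found"; failed=1; }
  done
  if (( failed == 0 )); then echo "ALL CHECKS PASSED"; fi
  return "$failed"
}
```

Each `FAIL n` line maps onto the row with the same number in the table above; the helper never installs, pulls or edits anything.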
### Obtaining Docker images

There are two images. The first is public and pullable; the second must be built locally.
Image 1: `jasonwan123/nitro-node-malicious-arbminter` (malicious-mint demo)

Public on Docker Hub, ~3.3 GB.

```bash
# Confirm with the user before pulling: this is a 3+ GB download.
docker pull jasonwan123/nitro-node-malicious-arbminter:latest
```

Image 2: `nitro-malicious-playbook-challenge-demo:latest` (bold-challenge demo)

Must be built from the nitro repo's `Dockerfile.malicious`. The build takes 15–25 minutes (it compiles the Rust prover, the Solidity contracts and the Go binaries) and only needs to run once per machine.
```bash
# Confirm with the user before kicking this off: it's a long build.
git clone --depth 1 \
  --branch malicious-validator-readInbox \
  --recurse-submodules --shallow-submodules \
  https://github.com/OffchainLabs/nitro.git /tmp/nitro-malicious-build
cd /tmp/nitro-malicious-build
docker build -f Dockerfile.malicious -t nitro-malicious-remote-test .
docker tag nitro-malicious-remote-test nitro-malicious-playbook-challenge-demo:latest
# Optional: verify the build by checking the module root hash:
docker run --rm --entrypoint cat nitro-malicious-playbook-challenge-demo:latest \
  /home/user/target/machines/latest/module-root.txt
# Expected: 0xc2c02df561d4afaf9a1d6785f70098ec3874765c638e3cb6dbe8d3c83333e14c
```

Source of truth: if the commands above drift, defer to `src/playbooks/malicious-validator/README.md`.
```bash
# Existing nitro-* containers from previous runs
docker ps --filter "name=nitro-" --format '{{.Names}}'
```

If any are listed and `orphanContainerPolicy` is `warn` (the default), the run will warn but not stop them, and a port conflict on 9642 / 8449 will likely fail the run. Two options:

- Set `orphanContainerPolicy: stop` in the YAML, or
- Stop each one manually with `docker stop <name>` before starting the run.
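The policy in the YAML and the leftover containers can be checked together before launching. This is a minimal sketch, assuming the YAML keeps `orphanContainerPolicy` as a top-level `key: value` line like the scripts in `examples/`; `yaml_policy` and `orphan_risk` are hypothetical helpers. It only reads state and never stops a container.

```bash
# Hypothetical helper: prints the orphanContainerPolicy in a playbook YAML,
# or "warn" (the documented default) when the key is absent.
yaml_policy() {
  local line
  line=$(grep -E '^orphanContainerPolicy:' "$1" | head -n 1)
  line=${line#orphanContainerPolicy:}                  # drop the key
  line=${line%%#*}                                     # drop a trailing comment
  line=$(printf '%s' "$line" | tr -d "[:space:]\"'")   # drop spaces and quotes
  echo "${line:-warn}"
}

# Hypothetical helper: reports leftover nitro-* containers and whether the
# policy will clean them up (otherwise they can collide on ports 9642 / 8449).
orphan_risk() {
  local policy names
  policy=$(yaml_policy "$1")
  names=$(docker ps --filter "name=nitro-" --format '{{.Names}}')
  names=${names//$'\n'/ }                              # one line for the report
  if [[ -z $names ]]; then
    echo "clear"
  elif [[ $policy == stop ]]; then
    echo "ok: runner will stop $names"
  else
    echo "risk: policy=$policy leaves $names running"
  fi
}
# Usage: orphan_risk examples/malicious-mint.yaml
```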
Use the existing examples first. Only write your own if the user wants non-default params.
```bash
ls examples/
# malicious-mint.yaml
# bold-challenge.yaml
```

Minimum viable script:
```yaml
mode: chain
playbook: malicious-validator
command: malicious-mint
chainRestorePolicy: auto # let the runner detect "this command redeploys"
orphanContainerPolicy: warn # use "stop" to auto-clean leftover containers
params: {} # all defaults; override only what the user requested
# timeoutSeconds: 1800 # uncomment to add a hard cap
```

ETH amounts in `params` accept decimal strings (`"0.005"`) and are converted to bigint wei automatically.
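When the user does want a custom script, generating it from the same keys avoids typos. This is a minimal sketch: `write_script` is a hypothetical helper, it only emits the keys shown in the example above, and it leaves `params` as `{}` because the playbook's own parameter names are not listed in this guide.

```bash
# Hypothetical helper: writes a minimal malicious-validator script.
# Usage: write_script OUT_FILE COMMAND [TIMEOUT_SECONDS]
write_script() {
  local out=$1 command=$2 timeout=${3:-}
  case $command in
    malicious-mint|bold-challenge) ;;   # the two commands documented here
    *) echo "unknown command: $command" >&2; return 1 ;;
  esac
  {
    echo "mode: chain"
    echo "playbook: malicious-validator"
    echo "command: $command"
    echo "chainRestorePolicy: auto"
    echo "orphanContainerPolicy: warn"
    echo "params: {}"
    if [[ -n $timeout ]]; then echo "timeoutSeconds: $timeout"; fi
  } > "$out"
}
# Usage: write_script /tmp/bold.yaml bold-challenge 5400
```

Any real `params` override still has to come from the playbook's own documentation.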
`yarn start` does not work for you (it spawns interactive inquirer prompts). Always use `yarn run:script`:
```bash
yarn run:script examples/malicious-mint.yaml > /tmp/run.stdout 2>&1 &
echo "started pid=$!"
```

Or use your harness's background-task mechanism. Do not keep the foreground tool blocked for the length of the run (roughly 7–90 minutes, depending on the demo).
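A variant of the launch above can also drop a timestamp marker, so later status checks can tell this run's result file apart from one left by an earlier run. This is a sketch only: `launch_bg` and the marker path are hypothetical, and this guide does not say whether the runner clears an old `logs/latest-result.json` at start-up.

```bash
# Hypothetical helper: touches a marker, starts COMMAND in the background with
# stdout/stderr sent to LOGFILE, and prints the background PID.
# Usage: launch_bg MARKER LOGFILE COMMAND [ARGS...]
launch_bg() {
  local marker=$1 log=$2
  shift 2
  touch "$marker"          # timestamp of this launch
  "$@" > "$log" 2>&1 &     # fully redirected, so $(launch_bg ...) returns at once
  echo "$!"
}
# Example (not run here):
#   launch_bg /tmp/run.marker /tmp/run.stdout yarn run:script examples/malicious-mint.yaml
```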
The demo emits progress every few seconds. Do not poll more often than every 30 s; RPC traffic is already heavy.
Sensible check cadence:
| Demo | First check | Subsequent |
|---|---|---|
| malicious-mint | 90 s | every 60 s |
| bold-challenge | 3 min | every 3 min |
```bash
test -f logs/latest-result.json && echo DONE || echo RUNNING
```

`logs/latest-result.json` is only written after the run has emitted its envelope (success, failure, or cancellation). If it doesn't exist, the run is still going.
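If a `logs/latest-result.json` from a previous run might still be on disk, the plain `test -f` check would report DONE immediately. This minimal sketch compares modification times against a marker file touched just before launch (the marker is an assumption of this sketch, not something the runner creates); `run_status` is a hypothetical name, and it returns at once rather than blocking.

```bash
# Hypothetical non-blocking status check.
# Usage: run_status MARKER [RESULT_FILE]
run_status() {
  local marker=$1 result=${2:-logs/latest-result.json}
  if [[ ! -f $result ]]; then
    echo RUNNING
  elif [[ ! -e $marker || $result -nt $marker ]]; then
    echo DONE                                      # written after this launch (or no marker to compare)
  else
    echo "RUNNING (result file predates this launch)"
  fi
}
```

Call it once per polling interval from the cadence table; runs last minutes, so one-second timestamp resolution is not a concern.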
```bash
# Most recent step transition (latest-log.txt is a pointer file)
grep -oE '\[[0-9]+/[0-9]+\] [^"]+' "$(cat logs/latest-log.txt)" | tail -3
# Most recent EVENT marker (high-signal milestones)
grep '\[EVENT\]' "$(cat logs/latest-log.txt)" | tail -5
```

Examples of high-signal events (substring matches):

- `Chain deployed successfully` → step 1/11 done
- `Node started` → step 2/11 done
- `Hacker minted` → malicious tx confirmed
- `Withdrawal executed successfully on L1!` → demo essentially won
- `Edge Added: Block, Length=128 ... [LayerZero]` → BoLD challenge started bisecting
- `EdgeConfirmedByOneStepProof` → BoLD challenge resolved by OSP
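The substring list above translates directly into a lookup. This is a sketch, assuming the substrings appear verbatim in the text log; `milestone` and `latest_milestone` are hypothetical helpers, and the BoLD edge pattern is shortened to its fixed prefix because the `...` part varies.

```bash
# Hypothetical helper: prints the milestone a log line signals, or nothing.
milestone() {
  case $1 in
    *'Chain deployed successfully'*)             echo "step 1/11 done" ;;
    *'Node started'*)                            echo "step 2/11 done" ;;
    *'Hacker minted'*)                           echo "malicious tx confirmed" ;;
    *'Withdrawal executed successfully on L1!'*) echo "demo essentially won" ;;
    *'EdgeConfirmedByOneStepProof'*)             echo "BoLD challenge resolved by OSP" ;;
    *'Edge Added: Block, Length=128'*)           echo "BoLD challenge started bisecting" ;;
  esac
}

# Hypothetical helper: latest milestone in the log named by a pointer file.
latest_milestone() {
  local line m last=""
  while IFS= read -r line; do
    m=$(milestone "$line")
    if [[ -n $m ]]; then last=$m; fi
  done < "$(cat "${1:-logs/latest-log.txt}")"
  echo "${last:-none yet}"
}
```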
`logs/latest-events.txt` is a one-line pointer file containing the absolute path of this run's events JSONL. Read it, then tail the real file:

```bash
tail -n 20 "$(cat logs/latest-events.txt)" | jq -c .
# Each line: {"ts": ..., "type": "challenge", "payload": {ChallengeEvent}}
```

malicious-mint doesn't emit structured events; its `events-*.jsonl` will be empty (expected, not a bug). The same pointer pattern applies to `latest-log.txt`, `latest-jsonl.txt` and `latest-transcript.txt`: each contains an absolute path. `latest-result.json` is the only one that is a real JSON file rather than a pointer.
Always read logs/latest-result.json first. It has everything needed in one file.
```bash
jq '{exitCode, success: .result.success, failure: .failure}' logs/latest-result.json
```

| Exit code | Meaning | What to do |
|---|---|---|
| 0 | Success | Build a summary for the user (see the templates below) |
| 2 | Playbook reported failure | Read `.failure.failedAtStep` + `.failure.errorMessage` |
| 3 | Validation error | YAML or env wrong; fix and re-run |
| 64 | Usage error | Wrong CLI args |
| 130 | Cancelled (timeout / SIGINT / SIGTERM) | Either the user killed it or `timeoutSeconds` was hit |
| 1 | Fatal / unexpected | Read `.failure.errorStack` + the text log whose path is in `logs/latest-log.txt` |
Even when `success: true`, sanity-check that the numeric fields are non-zero (the runner has had a class of soft-failure bugs where it returned all zeros). For malicious-mint:
```bash
jq '.result.data | {mintAmount, withdrawAmount, bridgeBalanceFinal}' logs/latest-result.json
```

If any of those are `"0"`, treat the run as a failure even though `success: true`.
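The zero check above is easy to make mechanical. This is a minimal sketch, assuming the values are read with `jq -r` (so a string `"0"` arrives as a bare `0` and a missing field as `null`); `all_nonzero` is a hypothetical helper.

```bash
# Hypothetical helper: fails on "0", 0, empty or null values (the all-zeros
# soft-failure pattern). Usage: all_nonzero VALUE...
all_nonzero() {
  local v
  for v in "$@"; do
    case $v in
      ''|null|0|'"0"') echo "suspicious value: '$v'"; return 1 ;;
    esac
  done
  echo "ok"
}
# Example (needs jq):
#   all_nonzero $(jq -r '.result.data | .mintAmount, .withdrawAmount, .bridgeBalanceFinal' logs/latest-result.json)
```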
For bold-challenge:

```bash
jq '.result.data | {success, winner, totalEdges, totalBisections, ospTxHash}' logs/latest-result.json
```

A real win has `winner: "honest"` and `ospTxHash: "0x..."`. `winner: "timeout"` means the monitor ran out of time, not that the demo logic failed; relay this nuance to the user.
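The same three-way reading of a bold-challenge result can be sketched as follows, assuming `ospTxHash` is a standard 32-byte transaction hash (`0x` plus 64 hex digits); `bold_outcome` is a hypothetical helper.

```bash
# Hypothetical helper: classifies .result.data.winner / .ospTxHash.
# Usage: bold_outcome WINNER OSP_TX_HASH
bold_outcome() {
  local winner=$1 osp=$2
  if [[ $winner == honest && $osp =~ ^0x[0-9a-fA-F]{64}$ ]]; then
    echo "win"
  elif [[ $winner == timeout ]]; then
    echo "monitor timeout (not a protocol failure; tell the user)"
  else
    echo "failure (winner=$winner, ospTxHash=$osp)"
  fi
}
```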
Use these templates after a successful run, filling them in from `logs/latest-result.json`:
```text
✅ Malicious Mint demo completed in {duration}.
Demonstrated: a chain running a malicious ArbMinter precompile creates ETH
out of thin air, then withdraws it through the bridge, and the withdrawal
is confirmed on L1 because no honest validator is challenging.
Numbers:
- Hacker minted: {mintAmount} wei (≈ {mintAmount in ETH})
- Hacker withdrew: {withdrawAmount} wei (executed on L1)
- L1 execution tx: {tx hash found with: grep '"executeTransaction"' "$(cat logs/latest-log.txt)" | tail -1}
- Bridge balance: {bridgeBalanceInitial} → {bridgeBalanceFinal} wei
Logs:
- Result: logs/latest-result.json
- Full text log: cat the path in logs/latest-log.txt
```
```text
✅ BoLD Challenge demo completed in {duration}.
Demonstrated: an honest validator defeats a malicious validator through
multi-level bisection (Block → BigStep → SmallStep → OSP).
Numbers:
- Edges created: {totalEdges}
- Bisections: {totalBisections}
- Winner: {winner}
- OSP tx hash: {ospTxHash}
Logs:
- Result: logs/latest-result.json
- Structured events: cat the path in logs/latest-events.txt
- Full text log: cat the path in logs/latest-log.txt
```
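For the `{mintAmount in ETH}` placeholder, the wei value has to be divided by 10^18. This sketch does it with string operations so amounts beyond 64-bit shell integers still convert exactly; `wei_to_eth` is a hypothetical helper and expects a plain integer wei string.

```bash
# Hypothetical helper: converts an integer wei string to ETH (1 ETH = 10^18 wei).
wei_to_eth() {
  local wei=$1 int frac
  [[ $wei =~ ^[0-9]+$ ]] || { echo "not an integer wei amount: $wei" >&2; return 1; }
  while (( ${#wei} < 19 )); do wei="0$wei"; done   # ensure at least one integer digit
  int=${wei:0:${#wei}-18}                          # digits left of the decimal point
  frac=${wei: -18}                                 # the 18 fractional digits
  while [[ ${#int} -gt 1 && ${int:0:1} == 0 ]]; do int=${int:1}; done   # trim leading zeros
  while [[ -n $frac && ${frac: -1} == 0 ]]; do frac=${frac%0}; done      # trim trailing zeros
  if [[ -n $frac ]]; then echo "$int.$frac"; else echo "$int"; fi
}
# Example: wei_to_eth 5000000000000000 prints 0.005
```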
After a failed run, lead with the failure cause, not the demo description:

```text
❌ {playbook}/{command} failed at step "{failure.failedAtStep}" (exit code {exitCode}).
Reason: {failure.errorMessage or .result.message}
Completed steps before failure: {failure.completedSteps}
Inspect: cat the path in logs/latest-log.txt
```
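The failure template can be filled mechanically once the fields are extracted. This is a sketch: `render_failure` is a hypothetical helper, and the jq usage assumes the field paths this guide already reads from `latest-result.json` (jq's `//` operator supplies the `.result.message` fallback).

```bash
# Hypothetical helper: prints the failure template.
# Usage: render_failure PLAYBOOK COMMAND STEP EXIT_CODE REASON COMPLETED_STEPS
render_failure() {
  printf '❌ %s/%s failed at step "%s" (exit code %s).\n' "$1" "$2" "$3" "$4"
  printf 'Reason: %s\n' "$5"
  printf 'Completed steps before failure: %s\n' "$6"
  printf 'Inspect: cat the path in logs/latest-log.txt\n'
}
# Example (needs jq):
#   r=logs/latest-result.json
#   render_failure malicious-validator bold-challenge \
#     "$(jq -r .failure.failedAtStep "$r")" "$(jq -r .exitCode "$r")" \
#     "$(jq -r '.failure.errorMessage // .result.message' "$r")" \
#     "$(jq -c .failure.completedSteps "$r")"
```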
| Symptom | Cause | Fix |
|---|---|---|
| `port is already allocated` on 9642 / 8449 | Leftover container from a previous run | Set `orphanContainerPolicy: stop`, or stop the leftovers manually: `docker stop $(docker ps -q --filter name=nitro-)` |
| `node-config.json chain-id (X) does not match … chainId (Y). Refusing to overwrite` | `CHAIN_DEPLOYMENT_TRANSACTION_HASH` env var disagrees with a leftover `node-config.json` | The YAML should have `chainRestorePolicy: auto` (the example does). If it does, the playbook's `redeploysChain` flag may not be set; for redeployment commands this is a code bug, so escalate. |
| `Insufficient balance. Current: 0.0… ETH, Required: ~0.05 ETH` | Deployer key out of Sepolia ETH | Tell the user; they can get more from https://www.alchemy.com/faucets/arbitrum-sepolia |
| `Docker image "…" not found locally` | Image not pulled / built | See "Obtaining Docker images" above. Confirm the cost (3 GB pull or 20-min build) with the user before kicking it off. |
| `Message status: UNCONFIRMED` for many polls in malicious-mint | Normal: the assertion covering the L2 send root hasn't been confirmed yet | Wait. Up to 10 min of polling is built in. |
| Run cancelled with exit 130 | Timeout, SIGINT, or SIGTERM | Check `.failure.failedAtStep` to see how far it got. The chain on Sepolia is real and intact. |
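The symptom column above can be matched against an error line automatically. This is a sketch: `diagnose` is a hypothetical helper, its patterns are the fixed parts of the symptoms listed in the table, and anything else falls through to reading the full log.

```bash
# Hypothetical helper: prints the fix for the first matching symptom.
diagnose() {
  case $1 in
    *'port is already allocated'*) echo "leftover container: set orphanContainerPolicy: stop or stop nitro-* containers" ;;
    *'Refusing to overwrite'*)     echo "chain-id mismatch: check chainRestorePolicy: auto, else escalate" ;;
    *'Insufficient balance'*)      echo "deployer out of Sepolia ETH: tell the user (faucet)" ;;
    *'not found locally'*)         echo "missing image: see Obtaining Docker images, confirm cost first" ;;
    *'UNCONFIRMED'*)               echo "normal: keep waiting (up to 10 min of polling is built in)" ;;
    *)                             echo "no known match: read the full text log" ;;
  esac
}
# Example: diagnose the last error-looking line of the current text log.
#   diagnose "$(grep -iE 'error|refusing|insufficient|allocated' "$(cat logs/latest-log.txt)" | tail -n 1)"
```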
- Do not keep a foreground command blocked for the entire run. Always background-and-poll.
- Do not poll more often than every 30 seconds; JSON-RPC backoff is real.
- Do not auto-`docker pull` (3 GB), auto-`docker build` (20 min, heavy CPU), auto-fund the deployer key, or auto-edit `.env`. Confirm the cost with the user first; only proceed on an explicit go-ahead.
- Do not delete `logs/`, `node-config*.json`, or `.arbitrum/` without confirming; they hold debug state for any failed run.
- Do not confuse `winner: "timeout"` with demo failure. It means the monitor stopped watching, not that the protocol failed.
- Do not report `success: true` without sanity-checking the numeric fields in `.result.data` (see the sanity checks above).
- Do tell the user the approximate cost in ETH and time before kicking off a run, unless they explicitly authorized spend.