
Make worker installs reproducible with iii.lock #1527

Merged

ytallo merged 9 commits into main from reproducible-worker-registry on Apr 24, 2026

Conversation

@ytallo
Contributor

@ytallo ytallo commented Apr 22, 2026

Summary

Adds reproducible registry-managed worker installs through iii.lock, including dependency graph locking, CI verification, update flows, legacy registry fallback, portable binary artifacts across platforms, and fail-closed lockfile validation.

This makes registry shorthand installs resolver-first: when the registry returns a worker graph, the CLI installs the whole graph and records deterministic pins for every resolved worker.

What Changed

Lockfile

  • Added iii.lock YAML format with version: 1 and sorted workers entries.
  • Each locked worker records its resolved version, type, dependencies, and pinned source.
  • Image workers must pin the resolved OCI image by digest.
  • Binary workers now use a platform-neutral artifact map keyed by target triple:
source:
  kind: binary
  artifacts:
    aarch64-apple-darwin:
      url: https://example.com/worker-aarch64-apple-darwin.tar.gz
      sha256: ...
    x86_64-unknown-linux-gnu:
      url: https://example.com/worker-x86_64-unknown-linux-gnu.tar.gz
      sha256: ...

This is closer to Cargo's lockfile model: the resolved worker version and dependency graph are shared across machines, while each platform selects its matching artifact. Older single-target binary sources using target, url, and sha256 remain readable and are rewritten into the artifacts shape on update.
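Putting those pieces together, a complete iii.lock for a root binary worker with one image-based dependency could look like the sketch below. All worker names, versions, digests, and URLs here are illustrative placeholders, not real registry entries; note the sorted workers keys and the per-platform artifacts map on the binary worker:

```yaml
version: 1
workers:
  example-dep:
    version: 0.3.2
    type: image
    dependencies: []
    source:
      kind: image
      image: registry.example.com/example-dep@sha256:4f5c0d2e...
  example-root:
    version: 1.2.0
    type: binary
    dependencies:
      - example-dep
    source:
      kind: binary
      artifacts:
        aarch64-apple-darwin:
          url: https://example.com/example-root-aarch64-apple-darwin.tar.gz
          sha256: 9a1b3c...
        x86_64-unknown-linux-gnu:
          url: https://example.com/example-root-x86_64-unknown-linux-gnu.tar.gz
          sha256: 77c04e...
```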

Worker CLI

  • iii worker add <worker[@version]> tries /resolve first and writes or merges lockfile entries when a graph is returned.
  • A worker with dependencies installs the full graph. For example, one root worker plus three dependencies adds all four workers and records all four in iii.lock.
  • Graph installs now validate resolved worker names, resolved lockfile pins, and any existing iii.lock before install side effects.
  • Graph installs snapshot config.yaml and restore it if a later node, lockfile merge, or lockfile write fails, avoiding configured-but-unpinned partial installs.
  • iii worker update [worker] re-resolves latest for one locked worker, or inferred root workers when no worker is supplied.
  • Dependency-only lockfile entries are refreshed through their root graph instead of being treated as independent update roots.
  • iii worker update with an empty lockfile now reports that there is nothing to update instead of exiting silently.
  • iii worker verify checks lockfile-managed workers from config.yaml, allows extra lockfile entries, skips built-ins/direct OCI/local-path sandbox workers, and validates that locked binary workers include an artifact for the current target.
  • iii worker sync --frozen delegates to verify.
  • iii worker sync remains read-only for now: it reads/parses iii.lock and reports the pinned worker count.
  • /resolve compatibility failures such as 404/405 fall back to the legacy /download endpoint.
  • CLI help for sync now matches the current read-only implementation.

Docs

  • Added a how-to guide for reproducing worker installs with iii.lock.
  • Added a managed-worker lockfile reference page documenting schema and command behavior.
  • Linked the lockfile workflow from the container-worker guide.
  • Added a 0.11.0 changelog note.

Validation

  • cargo fmt --all
  • cargo fmt --all -- --check
  • cargo test -p iii-worker cli::lockfile::tests -- --nocapture
  • cargo test -p iii-worker cli::config_file::tests -- --nocapture
  • cargo test -p iii-worker managed::tests -- --nocapture
  • cargo test -p iii-worker cli::registry::tests -- --nocapture
  • cargo test -p iii-worker sync_help_matches_current_read_only_behavior -- --nocapture
  • cargo test -p iii-worker handle_resolved_graph_add_rejects_invalid_existing_lockfile_before_side_effects -- --nocapture
  • cargo test -p iii-worker handle_resolved_graph_add_restores_config_when_later_node_fails -- --nocapture
  • cargo test -p iii-worker handle_worker_update_reports_empty_lockfile_without_resolving -- --nocapture
  • Parsed docs/docs.json
  • git diff --check

Notes / Known Gaps

  • pnpm fmt:check was not run in this shell because pnpm was not available on PATH. Earlier local attempts also exposed existing Biome nested-root configuration issues unrelated to this PR.
  • Full cargo test -p iii-worker was not used as the final verification path because an existing worker-list discovery integration test previously hung locally on a copied /bin/sleep sentinel. The affected lockfile, registry, managed-worker, config-file, and CLI-help suites were run directly.
  • Non-frozen worker sync intentionally remains a read/parse/report command in this PR; it does not install artifacts from the lockfile yet.

iii-worker can parse registry graph responses, write deterministic lockfiles, verify config against the lock, and expose sync/update/verify commands. The add path tries the graph resolver first while retaining the older single-worker response fallback.

Constraint: Preserve existing iii worker add behavior for old registries and built-ins

Rejected: Store lock metadata in config.yaml | mixes generated pins with user config

Rejected: Remove /download fallback | would break current registry fixtures and older APIs

Confidence: medium

Scope-risk: moderate

Directive: Keep stdout machine-readable; human status stays on stderr

Tested: cargo test -p iii-worker --no-fail-fast; cargo fmt --all; git -C ../iii diff --check

Not-tested: cargo clippy -p iii-worker --lib -- -D warnings due to pre-existing lint debt in local_worker.rs, managed.rs, vm_boot.rs, worker_manager/libkrun.rs
@vercel
Contributor

vercel Bot commented Apr 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project      Deployment  Actions           Updated (UTC)
iii-website  Ready       Preview, Comment  Apr 24, 2026 0:06am
motia-docs   Ready       Preview, Comment  Apr 24, 2026 0:06am


@coderabbitai
Contributor

coderabbitai Bot commented Apr 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a YAML-backed iii.lock lockfile, registry-resolved worker-graph models and resolution, read/write/verify helpers, and three new CLI subcommands (update, sync, verify) with handlers wired into main; updates managed-worker install/update flows to emit and consume lockfiles; and adds tests and documentation.

Changes

  • CLI Command Definitions — crates/iii-worker/src/cli/app.rs: Added Commands::Update { worker_name: Option<String> }, Commands::Sync { frozen: bool }, and Commands::Verify.
  • Lockfile Module — crates/iii-worker/src/cli/lockfile.rs: New YAML lockfile implementation (WorkerLockfile, LockedWorker, LockedWorkerType, LockedSource), serde mappings, deterministic serialization, read/write helpers, verify_config_workers(), and lockfile_path().
  • Managed Worker Handlers — crates/iii-worker/src/cli/managed.rs: Added handle_worker_update, handle_worker_sync, and handle_worker_verify; a binary_config_yaml() helper; conversion of resolved graphs into lockfile entries; install/merge/write logic; an updated handle_managed_add fallback; and extensive tests for flows and errors.
  • Registry API & Models — crates/iii-worker/src/cli/registry.rs: Replaced WorkerConfig with raw serde_json::Value; added resolved-graph types (ResolvedWorkerGraph, ResolvedWorker, ResolvedEdge, ResolvedRoot) and fetch_resolved_worker_graph(), which supports file:// fixtures and POST /resolve over HTTP.
  • Module Exports & CLI Dispatch — crates/iii-worker/src/cli/mod.rs, crates/iii-worker/src/main.rs: Exported the new lockfile module and added update/sync/verify match arms in main() that call the new async handlers.
  • Tests & Integration — crates/iii-worker/tests/worker_integration.rs, crates/iii-worker/src/lib.rs: Expanded CLI parse tests for update/sync/verify; added a test-only TEST_ENV_LOCK mutex; added unit tests for lockfile serialization, graph→lock conversions, and handler behaviors.
  • Documentation — docs/*, docs/how-to/*, docs/workers/*, docs/changelog.mdx, docs/docs.json: New how-to and lockfile docs, a changelog entry, and docs navigation updates describing iii.lock semantics and update/sync/verify workflows.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as "iii (CLI)"
    participant FS as "Filesystem"
    participant Registry as "Registry API"
    participant Sources as "Binary/Image Sources"

    User->>CLI: iii worker update [<name>]
    CLI->>FS: read `config.yaml`
    CLI->>Registry: fetch_resolved_worker_graph(name?, version, target)
    Registry-->>CLI: ResolvedWorkerGraph
    CLI->>Sources: fetch artifacts (binaries / images)
    Sources-->>CLI: artifacts (url, sha256 / image ref)
    CLI->>CLI: build `WorkerLockfile` from graph
    CLI->>FS: read/merge/write `iii.lock`

    User->>CLI: iii worker sync [--frozen]
    CLI->>FS: read `iii.lock`
    alt frozen = true
        CLI->>FS: read `config.yaml`
        CLI->>CLI: verify_config_workers()
    else
        CLI->>Sources: (optionally) ensure artifacts installed per lockfile
    end

    User->>CLI: iii worker verify
    CLI->>FS: read `config.yaml` & `iii.lock`
    CLI->>CLI: verify_config_workers()
    CLI-->>User: verification result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • sergiofilhowz

Poem

🐰
I hop through YAML, pins snug and bright,
Resolve the graph by lantern light.
Sync, verify, update the thread,
Lock the versions, store them instead.
A rabbit cheers and nibbles a byte. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 66.28%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check — the title clearly and concisely summarizes the main objective of the PR: adding reproducible worker installs via the iii.lock file.
  • Linked Issues check — skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — skipped because no linked issues were found for this pull request.
  • Description check — the description comprehensively covers the What, Why, and Notes sections with extensive detail on changes, validation steps, and known gaps.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/iii-worker/src/cli/lockfile.rs`:
- Around line 62-64: The from_yaml function currently accepts any schema; after
deserializing into Self, validate the lockfile version field and reject
unsupported versions instead of silently accepting them. Change from_yaml to
parse into Self, check self.version (or the version field on the deserialized
struct) equals the supported value (e.g. 1), and return Err(format!("unsupported
{LOCKFILE_NAME} version: {}", self.version)) if not; keep using LOCKFILE_NAME
and ensure callers of from_yaml (e.g., sync/verify) only operate on validated
lockfiles.

In `@crates/iii-worker/src/cli/managed.rs`:
- Around line 222-225: The current flow calls
super::config_file::list_worker_names() which silently returns an empty Vec on
read errors, allowing lockfile.verify_config_workers(&names) to erroneously
succeed; change the call to a fallible API (e.g., add/use a read_result variant
or make list_worker_names return Result<Vec<_>, Error>) and handle the error
before calling lockfile.verify_config_workers: if reading config.yaml fails,
surface the error (e.g., eprintln! with context including "config.yaml" and the
error) and exit/return non‑zero instead of proceeding to call
verify_config_workers; update the branch that currently prints "config.yaml
matches iii.lock" to only run when the config read succeeded and
verify_config_workers returned Ok.
- Around line 587-605: The current match on fetch_resolved_worker_graph only
falls back when the graph parse fails, but it should also fall back when the
registry doesn't support /resolve (e.g., 404 or 405 responses) instead of
exiting; update the Err arm(s) in the match around fetch_resolved_worker_graph
so that errors indicating an unsupported resolve endpoint (HTTP 404/405 or
equivalent messages) are treated like the parse-failure case and run the same
fallback logic (call handle_resolved_graph_add(&graph, brief) flow or continue
to the existing /download path) rather than printing the error and returning 1;
refer to fetch_resolved_worker_graph, handle_resolved_graph_add, and finish_add
to locate the code to change.
- Around line 367-409: The loop in handle_resolved_graph_add calls handlers with
node.name without validating it; add a validation step to prevent invalid worker
names from causing install/config side effects. Before performing any installs,
validate each node.name (e.g., non-empty and matches your worker-name rules or
reuse an existing validator) and return a non-zero error with a clear message if
any name is invalid; alternately, perform a pre-check pass over graph to
validate all names before the main install loop. Update
handle_resolved_graph_add to reference node.name and use the validation result
to short-circuit before calling handle_binary_add or handle_oci_pull_and_add.
- Around line 188-210: handle_worker_sync currently only reads the lockfile and
prints the count instead of performing the non-frozen sync; replace the
"count-only" branch so that after reading the lockfile
(WorkerLockfile::read_from(lock_path)) you iterate lockfile.workers and perform
the lockfile-driven install/update workflow (invoke the existing install/pull
routines for each worker entry — e.g., the function(s) responsible for
installing binaries or images), apply any necessary updates to config.yaml,
surface errors (return non-zero on failure) and log progress, while keeping the
frozen path that delegates to handle_worker_verify() intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0251bb52-439f-4d81-ad26-d5f6a7fbfe60

📥 Commits

Reviewing files that changed from the base of the PR and between 1c92a73 and f52d70d.

📒 Files selected for processing (7)
  • crates/iii-worker/src/cli/app.rs
  • crates/iii-worker/src/cli/lockfile.rs
  • crates/iii-worker/src/cli/managed.rs
  • crates/iii-worker/src/cli/mod.rs
  • crates/iii-worker/src/cli/registry.rs
  • crates/iii-worker/src/main.rs
  • crates/iii-worker/tests/worker_integration.rs

Add regression coverage for paths exercised by the new `sync`, `verify`,
and `update` commands: lockfile round-trip through disk, verify_config
symmetry, lockfile_from_graph error branches (missing binaries, wrong
target, missing image, unsupported type), binary_config_yaml(null), and
handler return codes when iii.lock is absent or the named worker is not
pinned.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
crates/iii-worker/src/cli/lockfile.rs (1)

62-64: ⚠️ Potential issue | 🟡 Minor

Reject unsupported iii.lock versions on read.

from_yaml still accepts any version, so future/incompatible schemas can be silently treated as v1 by sync/verify.

🛡️ Proposed validation
 pub fn from_yaml(input: &str) -> Result<Self, String> {
-    serde_yaml::from_str(input).map_err(|e| format!("failed to parse {LOCKFILE_NAME}: {e}"))
+    let lockfile: Self =
+        serde_yaml::from_str(input).map_err(|e| format!("failed to parse {LOCKFILE_NAME}: {e}"))?;
+    if lockfile.version != LOCKFILE_VERSION {
+        return Err(format!(
+            "unsupported {LOCKFILE_NAME} version {} (expected {})",
+            lockfile.version, LOCKFILE_VERSION
+        ));
+    }
+    Ok(lockfile)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/iii-worker/src/cli/lockfile.rs` around lines 62 - 64, The from_yaml
function currently accepts any lockfile version, risking silently treating
future/incompatible schemas as v1; update the from_yaml implementation to
deserialize into the lockfile struct (using serde_yaml::from_str) then
explicitly check the deserialized.version field against the supported version
constant (or allowed set) and return Err(format!("unsupported {LOCKFILE_NAME}
version: {}", version)) when it doesn't match; make this change inside from_yaml
(and keep the existing error wrapping for parse failures) so callers (e.g.,
sync/verify) only operate on supported lockfile versions.
🧹 Nitpick comments (1)
crates/iii-worker/src/cli/lockfile.rs (1)

78-80: Write iii.lock via a temporary file before replacing it.

std::fs::write truncates the destination first; a crash or full disk can leave iii.lock partially written and break subsequent sync/verify runs. Prefer writing beside the lockfile and renaming into place.

♻️ Safer replacement pattern
     pub fn write_to(&self, path: &Path) -> Result<(), String> {
         let yaml = self.to_yaml()?;
-        std::fs::write(path, yaml).map_err(|e| format!("failed to write {}: {e}", path.display()))
+        let tmp_path = path.with_extension("lock.tmp");
+        std::fs::write(&tmp_path, yaml)
+            .map_err(|e| format!("failed to write {}: {e}", tmp_path.display()))?;
+        std::fs::rename(&tmp_path, path).map_err(|e| {
+            let _ = std::fs::remove_file(&tmp_path);
+            format!("failed to replace {}: {e}", path.display())
+        })
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/iii-worker/src/cli/lockfile.rs` around lines 78 - 80, The write_to
method currently uses std::fs::write which truncates the destination; change it
to write the YAML to a temporary file next to the target (e.g., create a temp
file in the same directory using a unique extension or NamedTempFile), flush and
sync the temp file to disk, then atomically rename (std::fs::rename) the temp
file over the target path to replace iii.lock; ensure errors from
creating/writing/syncing/renaming are propagated as the existing Result<String,
()> pattern so callers of write_to see the failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/iii-worker/src/cli/lockfile.rs`:
- Around line 62-64: The from_yaml function currently accepts any lockfile
version, risking silently treating future/incompatible schemas as v1; update the
from_yaml implementation to deserialize into the lockfile struct (using
serde_yaml::from_str) then explicitly check the deserialized.version field
against the supported version constant (or allowed set) and return
Err(format!("unsupported {LOCKFILE_NAME} version: {}", version)) when it doesn't
match; make this change inside from_yaml (and keep the existing error wrapping
for parse failures) so callers (e.g., sync/verify) only operate on supported
lockfile versions.

---

Nitpick comments:
In `@crates/iii-worker/src/cli/lockfile.rs`:
- Around line 78-80: The write_to method currently uses std::fs::write which
truncates the destination; change it to write the YAML to a temporary file next
to the target (e.g., create a temp file in the same directory using a unique
extension or NamedTempFile), flush and sync the temp file to disk, then
atomically rename (std::fs::rename) the temp file over the target path to
replace iii.lock; ensure errors from creating/writing/syncing/renaming are
propagated as the existing Result<String, ()> pattern so callers of write_to see
the failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 41b95339-d294-4140-8d77-416e7f3efe40

📥 Commits

Reviewing files that changed from the base of the PR and between f52d70d and 72ce5ef.

📒 Files selected for processing (2)
  • crates/iii-worker/src/cli/lockfile.rs
  • crates/iii-worker/src/cli/managed.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/iii-worker/src/cli/managed.rs

PR #1527 adds lockfile-backed managed worker resolution, so the docs now need a workflow page and reference entry that describe the current user-facing behavior accurately, including the intentionally limited sync behavior.

Constraint: Docs must reflect the current PR implementation where non-frozen sync reads and reports the lockfile but does not install artifacts.

Rejected: Present iii worker sync as a full installer | current handler does not install from the lockfile yet

Confidence: high

Scope-risk: narrow

Directive: Keep sync wording aligned with CLI behavior before expanding the docs to an install workflow

Tested: docs/docs.json parse; docs navigation target check; changed-doc link check; git diff --check; Mintlify preview at http://localhost:3002 returned 200 for both new pages

Not-tested: pnpm fmt:check due to missing node_modules/local Biome; cargo fmt due to a pre-existing Rust formatting diff in PR code
The PR branch needed to absorb main after the changelog gained the new iii worker exec entry. The conflict resolution keeps main's exec note and the PR's reproducible worker install note together in the 0.11.0 changelog.

Constraint: Preserve main's merged changelog updates while keeping the iii.lock documentation entry from this PR.

Rejected: Drop either changelog entry | both describe distinct user-facing worker changes

Confidence: high

Scope-risk: moderate

Directive: Keep future changelog edits additive when multiple worker features land in the same release section

Tested: git diff --cached --check; cargo fmt --all -- --check; no unmerged files; conflict-marker search; docs/docs.json navigation target check; changed-doc link check

Not-tested: Full repo test suite after merging main

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/iii-worker/src/cli/managed.rs (1)

142-154: Silent fallback on serde_yaml serialization failure.

unwrap_or_default() turns any serialization error into an empty string and then into None, so a malformed registry config is silently dropped as "no config" instead of surfacing. Serialization failures on JSON→YAML are vanishingly rare in practice, but since this function is used for the config block that ends up in config.yaml, propagating the error (or at least tracing::warn!ing it) would prevent a confusing "worker installed but config section missing" outcome.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/iii-worker/src/cli/managed.rs` around lines 142 - 154, The current use
of serde_yaml::to_string(config).unwrap_or_default() swallows serialization
errors; change it to explicitly handle the Result: call
serde_yaml::to_string(config) and match on Ok(yaml_str) to continue with the
existing strip_prefix/trim_end/empty checks, and on Err(e) call
tracing::warn!(%e, ?config, "failed to serialize registry config to YAML") (or
propagate the error if the surrounding function returns a Result) and then
return None; this uses the same symbols (serde_yaml::to_string, tracing::warn!,
strip_prefix) so you can locate and update the block that produces the YAML
string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/iii-worker/src/cli/managed.rs`:
- Around line 235-287: handle_worker_update currently re-resolves every entry in
lockfile.workers (names) including transitive deps and blindly inserts merged
nodes via handle_resolved_graph_add, which fails to prune removed nodes and
allows overlapping re-resolves to overwrite correct pins; fix by (1) changing
the None branch that builds names to only include top-level root workers from
your project config (parse config.yaml to get the true roots instead of using
lockfile.workers.keys()), (2) when processing each resolved graph returned by
fetch_resolved_worker_graph(name, ...) compute the set of nodes in that graph
and diff against lockfile.workers before calling handle_resolved_graph_add so
you remove any lockfile entries that are no longer referenced by that root, and
(3) update handle_resolved_graph_add (or replace its merge behavior) to perform
a replace-with-diff for the root’s subtree instead of unconditional
lockfile.workers.insert(...) so transitive dependencies aren’t re-pinned by
separate resolutions.

---

Nitpick comments:
In `@crates/iii-worker/src/cli/managed.rs`:
- Around line 142-154: The current use of
serde_yaml::to_string(config).unwrap_or_default() swallows serialization errors;
change it to explicitly handle the Result: call serde_yaml::to_string(config)
and match on Ok(yaml_str) to continue with the existing
strip_prefix/trim_end/empty checks, and on Err(e) call tracing::warn!(%e,
?config, "failed to serialize registry config to YAML") (or propagate the error
if the surrounding function returns a Result) and then return None; this uses
the same symbols (serde_yaml::to_string, tracing::warn!, strip_prefix) so you
can locate and update the block that produces the YAML string.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 20849afa-1f0e-452f-940e-a2758e4de63e

📥 Commits

Reviewing files that changed from the base of the PR and between 46c2c09 and 9e810ef.

📒 Files selected for processing (8)
  • crates/iii-worker/src/cli/app.rs
  • crates/iii-worker/src/cli/managed.rs
  • crates/iii-worker/src/cli/mod.rs
  • crates/iii-worker/src/main.rs
  • crates/iii-worker/tests/worker_integration.rs
  • docs/changelog.mdx
  • docs/docs.json
  • docs/how-to/managing-container-workers.mdx
✅ Files skipped from review due to trivial changes (3)
  • crates/iii-worker/src/cli/mod.rs
  • docs/docs.json
  • docs/changelog.mdx
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/how-to/managing-container-workers.mdx
  • crates/iii-worker/tests/worker_integration.rs

@ytallo ytallo closed this Apr 23, 2026
@ytallo ytallo deleted the reproducible-worker-registry branch April 23, 2026 18:40
@ytallo ytallo restored the reproducible-worker-registry branch April 23, 2026 18:42
@ytallo ytallo reopened this Apr 23, 2026
The lockfile flow now matches the supported worker taxonomy: frozen verification checks registry-managed workers without rejecting built-ins, direct OCI refs, or local-path sandbox workers. Legacy registries that do not expose /resolve continue through the existing /download path, and bulk update skips dependency-only pins instead of treating every locked entry as a root.

Constraint: Direct OCI refs, built-ins, and local-path sandbox workers remain outside the current iii.lock format.

Constraint: iii.lock has no explicit roots field, so root updates are inferred from dependency edges for this PR.

Rejected: Require every config.yaml worker to be locked | breaks valid mixed worker projects and contradicts the documented lockfile scope.

Rejected: Add a lockfile roots schema field | broader public schema change than needed for this PR.

Confidence: high

Scope-risk: moderate

Directive: Do not broaden frozen verification without adding a config-level marker or lockfile schema support for unmanaged worker kinds.

Tested: cargo fmt --all

Tested: cargo fmt --all -- --check

Tested: cargo test -p iii-worker managed::tests -- --nocapture

Tested: cargo test -p iii-worker cli::registry::tests -- --nocapture

Tested: cargo test -p iii-worker sync_help_matches_current_read_only_behavior -- --nocapture

Tested: docs/docs.json parsed with node

Tested: git diff --check

Not-tested: full cargo test -p iii-worker did not complete; worker_list_discovery_integration hung on a pre-existing copied /bin/sleep sentinel and the run was stopped after all affected suites had passed.

Not-tested: pnpm fmt:check did not complete; Biome exits on an existing nested-root configuration in console/packages/console-frontend/biome.json and sdk/biome.json.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (4)
crates/iii-worker/src/cli/managed.rs (4)

415-457: ⚠️ Potential issue | 🟠 Major

Resolved graph node names are still not validated before install/config side effects.

node.name comes straight from the registry payload and is passed into handle_binary_add / handle_oci_pull_and_add / append_worker without validate_worker_name. A hostile (or broken) registry can inject names with .., path separators, or YAML-breaking characters and cause filesystem writes and config.yaml edits before the rest of the pipeline notices.

🛡️ Pre-validate all nodes before any install side effect
 async fn handle_resolved_graph_add(graph: &ResolvedWorkerGraph, brief: bool) -> i32 {
     let target = graph
         .target
         .clone()
         .unwrap_or_else(|| binary_download::current_target().to_string());

+    for node in &graph.graph {
+        if let Err(e) = super::registry::validate_worker_name(&node.name) {
+            eprintln!(
+                "{} invalid resolved worker '{}': {}",
+                "error:".red(),
+                node.name,
+                e
+            );
+            return 1;
+        }
+    }
+
     for node in &graph.graph {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/iii-worker/src/cli/managed.rs` around lines 415 - 457, The loop in
handle_resolved_graph_add performs installs/config changes using node.name
without validation; before doing any side effects, iterate over graph.graph and
call validate_worker_name for each node.name (and also reject empty/missing
names) and return a non-zero error if validation fails so no calls to
handle_binary_add, handle_oci_pull_and_add, or append_worker occur; ensure you
reference ResolvedWorkerGraph/graph.graph and node.name when adding this
pre-check so all nodes are validated up front.

262-314: ⚠️ Potential issue | 🟠 Major

worker update still never prunes stale lockfile entries.

locked_root_worker_names (lines 316–335) is a good improvement for the None branch — it filters out transitive deps before re-resolving, addressing the overlap concern. However, the merge in handle_resolved_graph_add (lines 467–471) is still insert-only: entries that existed in iii.lock but are no longer in the newly-resolved graph remain indefinitely. If A@1 dropped its dep on B upstream, B's pin lingers forever and verify / sync --frozen keep honoring a ghost requirement.

Consider diffing the resolved subtree for each root against the existing lockfile and removing entries that are no longer reachable from any root's new graph before writing.
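The suggested pruning pass can be sketched as follows. The `Node` shape and map types are illustrative stand-ins for the real ResolvedWorkerGraph and WorkerLockfile structures, not the crate's actual API:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical minimal node shape; the real graph node carries more fields.
struct Node {
    name: String,
}

// Keep only lockfile entries still reachable from some freshly resolved
// root graph; everything else is a stale pin and gets dropped.
fn prune_unreachable(
    workers: &mut BTreeMap<String, String>, // name -> pinned version
    resolved_graphs: &[Vec<Node>],
) {
    let reachable: BTreeSet<&str> = resolved_graphs
        .iter()
        .flat_map(|g| g.iter().map(|n| n.name.as_str()))
        .collect();
    workers.retain(|name, _| reachable.contains(name.as_str()));
}
```

Because each resolved graph already lists the root plus every transitive dependency, a flat union over all graphs is enough to compute reachability.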

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/iii-worker/src/cli/managed.rs` around lines 262 - 314,
handle_worker_update currently re-resolves root graphs via
locked_root_worker_names but never prunes stale entries because
handle_resolved_graph_add only inserts new pins; to fix, after fetching each
resolved graph in handle_worker_update (or inside handle_resolved_graph_add)
compute the set of nodes reachable from the newly resolved root graph and then
remove any entries from the in-memory WorkerLockfile that are not reachable from
any root's new graph before persisting iii.lock; use the existing symbols
locked_root_worker_names, fetch_resolved_worker_graph, handle_resolved_graph_add
and the WorkerLockfile structure to locate where to diff the resolved subtree
against lockfile.workers and delete unreachable keys so removed upstream deps no
longer linger.

212-234: ⚠️ Potential issue | 🟠 Major

verify can still report success when config.yaml is unreadable.

list_worker_names() swallows read errors and returns an empty Vec. lockfile_relevant_config_worker_names then filters an empty list to an empty list, and verify_config_workers(&[]) returns Ok(()), so the command prints "config.yaml matches iii.lock" and exits 0 on an unreadable/invalid config. Use a fallible read API here and fail closed; filtering does not cover this case.
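A fail-closed reader along the lines the comment suggests could look like this. The `- name:` line format is an assumption about config.yaml's worker list; the point is that read errors surface as `Err` instead of an empty `Vec`:

```rust
use std::{fs, io};

// Fallible counterpart to list_worker_names: unreadable or missing config
// propagates an error instead of silently yielding zero workers.
fn list_worker_names_result(path: &str) -> io::Result<Vec<String>> {
    let text = fs::read_to_string(path)?; // unreadable file -> Err, not vec![]
    Ok(text
        .lines()
        .filter_map(|line| line.trim().strip_prefix("- name:"))
        .map(|name| name.trim().to_string())
        .collect())
}
```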

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/iii-worker/src/cli/managed.rs` around lines 212 - 234,
handle_worker_verify currently calls list_worker_names which swallows read
errors, causing verify to succeed on unreadable config; change the flow to use a
fallible API (or make list_worker_names return Result) so the function returns
Err on config read failure: replace the call to list_worker_names() with a
fallible reader (e.g., a new list_worker_names_result or a read_config_names
that returns Result<Vec<String>, Error>), propagate the error out of
handle_worker_verify (printing the error and returning non-zero) before calling
lockfile_relevant_config_worker_names and verify_config_workers; keep using
lockfile_relevant_config_worker_names(&lockfile, &names) and
lockfile.verify_config_workers(&names) after successful read.

188-210: ⚠️ Potential issue | 🟠 Major

Non-frozen worker sync still only reports — does not install.

handle_worker_sync reads iii.lock and prints the count; it never installs binaries/images or reconciles config.yaml. A fresh checkout will print "✓ N worker(s) pinned" and exit 0 without producing any runnable state. This matches the "known gap" called out in the PR description but is worth landing before users hit it.
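One shape the lockfile-driven reconciliation could take is sketched below. The `Lockfile` struct and install callback are hypothetical placeholders; the real routine would dispatch to the binary/image installers and reconcile config.yaml:

```rust
use std::collections::BTreeMap;

struct Lockfile {
    workers: BTreeMap<String, String>, // name -> pinned version
}

// Walk the pinned workers and install whatever is missing, returning how
// many installs were performed so the caller can report real work done.
fn sync_from_lockfile<F>(lock: &Lockfile, installed: &[&str], mut install: F) -> usize
where
    F: FnMut(&str, &str),
{
    let mut count = 0;
    for (name, version) in &lock.workers {
        if !installed.contains(&name.as_str()) {
            install(name, version); // only missing workers get installed
            count += 1;
        }
    }
    count
}
```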

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/iii-worker/src/cli/managed.rs` around lines 188 - 210,
handle_worker_sync currently reads iii.lock (via WorkerLockfile::read_from) and
only prints the pinned count (using lockfile.workers.len()) then exits; when
frozen is false it should instead perform reconciliation/installation of worker
binaries/images and reconcile config.yaml. Replace the printing-only path in
handle_worker_sync with a call to the worker reconcile/install routine
(implement one if missing, e.g., reconcile_workers_from_lockfile or
handle_worker_install that accepts the loaded WorkerLockfile) and propagate its
Result/exit code (return non-zero on errors); keep the verify path calling
handle_worker_verify() for frozen=true and ensure errors from
WorkerLockfile::read_from still produce non-zero exit codes as before.
🧹 Nitpick comments (1)
crates/iii-worker/src/cli/managed.rs (1)

131-156: Unwrap of inner "config" key can surprise users.

When the top-level registry config object happens to contain a "config" key, this code unwraps to that inner value and drops the rest. That is correct for the documented {name, config} registry shape, but a legitimate user config that defines its own config: subkey (not the wrapper) will be silently rewritten to just that subkey's contents. It is worth a comment here pointing to the exact wrapper shape this is keyed on so future contributors don't broaden it accidentally.
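A guarded unwrap keyed on the exact wrapper shape could look like this. The toy `Json` enum stands in for serde_json's `Value`, and the exact-two-keys guard is an assumption about how strict the check should be:

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
enum Json {
    Str(String),
    Obj(BTreeMap<String, Json>),
}

// Unwrap the inner "config" only for the documented {name, config} registry
// wrapper; a user config that merely happens to contain its own `config:`
// subkey passes through untouched.
fn unwrap_registry_config(value: Json) -> Json {
    if let Json::Obj(map) = &value {
        if map.len() == 2 && map.contains_key("name") && map.contains_key("config") {
            return map["config"].clone();
        }
    }
    value
}
```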

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/iii-worker/src/cli/managed.rs` around lines 131 - 156, The function
binary_config_yaml currently unwraps a nested "config" key for any top-level
JSON object which can mistakenly drop other keys; update binary_config_yaml to
only unwrap into the inner "config" when the object matches the expected
registry wrapper shape (e.g., contains exactly the keys "name" and "config" or
at least both "name" and "config" and no other unknown keys), and add a clear
comment above the match explaining this is intentional for the documented {name,
config} wrapper shape so future contributors don't broaden the behavior
accidentally; reference the function name binary_config_yaml and the "config"
key when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/iii-worker/src/cli/managed.rs`:
- Around line 421-484: The loop currently appends workers to config.yaml via
handle_binary_add/handle_oci_pull_and_add but returns early on first failure,
leaving iii.lock out of sync; modify managed.rs to track the names (and
corresponding WorkerLockfile entries) of successfully installed workers during
the for node in &graph.graph loop (e.g., collect into a Vec or a temporary
WorkerLockfile map), and if any handler returns rc != 0 then before returning
that rc write a best-effort lockfile containing only those successfully
installed workers (use the same WorkerLockfile struct read_from/insert/write_to
flow used later) so config.yaml and iii.lock remain consistent; update places
where lockfile_from_graph(graph, &target) is used to still produce the full
lockfile on success but ensure the early-return path uses the collected subset
and calls lockfile.write_to(...) before returning.
- Around line 279-314: After computing names (from worker_name or
locked_root_worker_names), add a check for names.is_empty() and if true print a
clear stderr message like "no workers pinned in iii.lock; nothing to update"
(using eprintln! and the same red prefix style as other errors) and return a
non-zero exit code (e.g., 1) so the command does not silently succeed; place
this check before the for name in &names loop (the surrounding function that
uses fetch_resolved_worker_graph, handle_resolved_graph_add, and fail_count).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 05165721-712b-4cf1-ba54-ed66f7b33959

📥 Commits

Reviewing files that changed from the base of the PR and between 9e810ef and ddb1c2b.

📒 Files selected for processing (7)
  • crates/iii-worker/src/cli/app.rs
  • crates/iii-worker/src/cli/managed.rs
  • crates/iii-worker/src/cli/registry.rs
  • crates/iii-worker/src/lib.rs
  • crates/iii-worker/tests/worker_integration.rs
  • docs/how-to/reproduce-worker-installs.mdx
  • docs/workers/managed-worker-lockfile.mdx
✅ Files skipped from review due to trivial changes (2)
  • docs/how-to/reproduce-worker-installs.mdx
  • docs/workers/managed-worker-lockfile.mdx
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/iii-worker/src/cli/app.rs

Comment on lines +279 to +314
    let names: Vec<String> = match worker_name {
        Some(name) => {
            if !lockfile.workers.contains_key(name) {
                eprintln!("{} Worker '{}' is not in iii.lock", "error:".red(), name);
                return 1;
            }
            vec![name.to_string()]
        }
        None => locked_root_worker_names(&lockfile),
    };

    let mut fail_count = 0;
    for name in &names {
        let graph = match fetch_resolved_worker_graph(
            name,
            Some("latest"),
            Some(binary_download::current_target()),
        )
        .await
        {
            Ok(graph) => graph,
            Err(e) => {
                eprintln!("{} {}", "error:".red(), e);
                fail_count += 1;
                continue;
            }
        };

        let rc = handle_resolved_graph_add(&graph, false).await;
        if rc != 0 {
            fail_count += 1;
        }
    }

    if fail_count == 0 { 0 } else { 1 }
}

⚠️ Potential issue | 🟡 Minor

Minor: update with no name and an empty lockfile exits 0 silently.

If iii.lock parses but contains zero entries, locked_root_worker_names returns an empty vec, the loop runs zero iterations, and the command reports success with no output. This is very easy to mistake for "updated successfully" in CI logs. Consider emitting a message like "no workers pinned in iii.lock; nothing to update" (or a non-zero exit) to distinguish from the happy path.


Comment thread crates/iii-worker/src/cli/managed.rs
Binary lock entries record a single target triple today, so a lockfile created on macOS could previously satisfy Linux CI as long as the worker name existed. Frozen verification now checks the current target for binary workers and fails when the lock was resolved for another platform.

Constraint: iii.lock v1 still stores one binary target per entry.

Rejected: Expand the schema to multi-target binary pins | requires resolver/API certainty that every graph response includes all platform artifacts.

Confidence: high

Scope-risk: narrow

Directive: If the resolver starts returning all binary artifacts reliably, prefer evolving iii.lock to multi-target binary source pins instead of weakening this target check.
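The single-target check described above amounts to a strict equality comparison that fails closed. A minimal sketch, with illustrative names rather than the real iii.lock types:

```rust
// iii.lock v1 pins exactly one target triple per binary entry, so frozen
// verification must reject a lock resolved on a different platform.
fn verify_binary_target(locked_target: &str, current_target: &str) -> Result<(), String> {
    if locked_target == current_target {
        Ok(())
    } else {
        Err(format!(
            "iii.lock pins binary target '{locked_target}' but this machine is '{current_target}'"
        ))
    }
}
```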

Tested: cargo fmt --all

Tested: cargo fmt --all -- --check

Tested: cargo test -p iii-worker cli::lockfile::tests -- --nocapture

Tested: cargo test -p iii-worker managed::tests -- --nocapture

Tested: docs/docs.json parsed with node

Tested: git diff --check

Not-tested: pnpm fmt:check; still blocked by existing Biome nested-root configuration errors.
Binary lockfile entries now record a sorted artifact map keyed by target triple, so one resolved worker version can carry macOS and Linux binaries in the same iii.lock. The resolver path asks for a platform-neutral graph, while verify checks only that the current target exists in the locked artifact set. Legacy single-target binary source entries are still accepted on read and rewritten into the artifacts form.

Constraint: Existing single-target iii.lock files must remain readable.

Rejected: Keep one binary target per lock entry | reproduces the macOS-to-Linux CI failure the lockfile is meant to avoid.

Confidence: high

Scope-risk: moderate

Directive: Do not reintroduce target-filtered graph resolution unless the registry has a separate immutable multi-target metadata source.

Tested: cargo fmt --all; cargo fmt --all -- --check; cargo test -p iii-worker cli::lockfile::tests -- --nocapture; cargo test -p iii-worker managed::tests -- --nocapture; cargo test -p iii-worker cli::registry::tests -- --nocapture; docs/docs.json parse; git diff --check

Not-tested: pnpm fmt:check because pnpm is not available on PATH in this shell.
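The legacy read-path upgrade and the relaxed multi-target verify described above can be sketched like this. The struct shapes are assumptions modeled on the artifact map shown in the PR description, not the crate's real iii.lock types:

```rust
use std::collections::BTreeMap;

// Illustrative legacy single-target pin (target/url/sha256 fields).
struct LegacySource {
    target: String,
    url: String,
    sha256: String,
}

#[derive(Debug, PartialEq)]
struct Artifact {
    url: String,
    sha256: String,
}

// Read-path upgrade: a legacy entry becomes a one-entry artifacts map keyed
// by target triple, so writers always emit the new shape.
fn upgrade_legacy(src: LegacySource) -> BTreeMap<String, Artifact> {
    BTreeMap::from([(src.target, Artifact { url: src.url, sha256: src.sha256 })])
}

// Verify only requires that the current platform exists in the locked set.
fn current_target_locked(artifacts: &BTreeMap<String, Artifact>, target: &str) -> bool {
    artifacts.contains_key(target)
}
```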
sergiofilhowz previously approved these changes Apr 23, 2026
The lockfile is now validated before read/write callers use it, so unsupported schema versions, invalid binary checksums, unpinned images, invalid worker names, and type/source mismatches do not pass as reproducible pins. Worker verify now uses a fallible config.yaml name reader so parse/read errors cannot turn into empty-config success, and resolved registry graph nodes are prevalidated before any install or config side effects.

Constraint: Non-frozen worker sync remains intentionally read-only in this PR.

Rejected: Implement full lockfile sync installer here | larger behavior change already documented as follow-up and outside the necessary review fixes.

Confidence: high

Scope-risk: narrow

Directive: Keep verify fail-closed; do not route it through tolerant config helpers that hide read or YAML errors.
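Two of the field checks mentioned above are small enough to sketch directly. The exact rules (64-character lowercase hex sha256, digest-pinned image reference) are assumptions based on the comment, not the crate's real validators:

```rust
// A binary checksum must be a 64-char lowercase hex sha256 digest.
fn valid_sha256(s: &str) -> bool {
    s.len() == 64 && s.chars().all(|c| c.is_ascii_digit() || ('a'..='f').contains(&c))
}

// An image pin must reference the resolved OCI image by digest, not by tag.
fn image_is_digest_pinned(image: &str) -> bool {
    image.contains("@sha256:")
}
```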

Tested: cargo fmt --all; cargo fmt --all -- --check; cargo test -p iii-worker cli::lockfile::tests -- --nocapture; cargo test -p iii-worker managed::tests -- --nocapture; cargo test -p iii-worker cli::config_file::tests -- --nocapture; git diff --check

Not-tested: pnpm fmt:check because pnpm is not available on PATH in this shell.
Registry graph installs can mutate config.yaml before iii.lock is written. This makes the graph path validate the resolved lockfile and existing lockfile before install, snapshot config.yaml, and restore it on any later graph failure so verify is not left with configured-but-unpinned workers. Empty lockfile updates now say there is nothing to update instead of exiting silently.

Constraint: CodeRabbit flagged partial graph installs and empty lockfile updates as actionable review feedback.

Rejected: Remove only newly added worker names | unsafe when a graph update touches workers that already existed in config.yaml.

Confidence: high

Scope-risk: narrow

Directive: Keep graph-level side effects fail-closed until sync grows real lockfile-driven install semantics.
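The snapshot-and-restore behavior described above can be sketched as a small wrapper. The function name and callback shape are illustrative; the real code would wrap the per-node install loop:

```rust
use std::{fs, io, path::Path};

// Capture config.yaml before any node installs, and put the snapshot back if
// a later node fails, so config.yaml and iii.lock cannot drift apart.
fn with_config_snapshot<F>(config: &Path, install_all: F) -> io::Result<()>
where
    F: FnOnce() -> io::Result<()>,
{
    let snapshot = fs::read(config)?; // taken before any side effect
    match install_all() {
        Ok(()) => Ok(()),
        Err(e) => {
            fs::write(config, &snapshot)?; // roll back partial installs
            Err(e)
        }
    }
}
```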

Tested: cargo fmt --all -- --check

Tested: cargo test -p iii-worker managed::tests -- --nocapture

Tested: cargo test -p iii-worker handle_resolved_graph_add_rejects_invalid_existing_lockfile_before_side_effects -- --nocapture

Tested: cargo test -p iii-worker handle_resolved_graph_add_restores_config_when_later_node_fails -- --nocapture

Tested: cargo test -p iii-worker handle_worker_update_reports_empty_lockfile_without_resolving -- --nocapture

Tested: git diff --check

Not-tested: full cargo test -p iii-worker, due to the pre-existing local full-suite hang noted earlier on worker_list_discovery_integration.
@ytallo ytallo merged commit 011a607 into main Apr 24, 2026
36 checks passed

2 participants