feat: migrate DefenseClaw POC from Docker to Fly.io#2066

Draft
mfbx9da4 wants to merge 10 commits into main from feat/claw-poc-fly

@mfbx9da4 mfbx9da4 commented Apr 1, 2026

Summary

  • Migrates the DefenseClaw x Gram POC from Docker containers to Fly.io Firecracker microVMs
  • OpenShell runs natively on a real VM — no --privileged, no nested namespace hacks
  • Secrets managed via fly secrets set instead of command-line env vars
  • DNS uses the VM's own resolv.conf instead of hardcoded 8.8.8.8

What changed

|  | Docker | Fly.io |
| --- | --- | --- |
| Runtime | docker run --privileged | fly deploy (Firecracker VM) |
| Orchestration | docker build/run/exec/logs | fly deploy/ssh/logs |
| Namespace hack | unshare --mount | Direct mount --bind |
| DNS | Hardcoded 8.8.8.8 | VM's /etc/resolv.conf |
| Secrets | -e GRAM_API_KEY=... on CLI | fly secrets set (encrypted) |
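
The DNS row can be illustrated with a minimal sketch of reading resolvers from the VM's own resolv.conf rather than hardcoding 8.8.8.8. The function names `read_nameservers` and `vm_nameservers` are hypothetical and are not taken from start.py; how the PR actually consumes the file is not shown here.

```python
from pathlib import Path


def read_nameservers(resolv_conf_text: str) -> list[str]:
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        # Drop comments, which resolv.conf marks with '#' or ';'.
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def vm_nameservers(path: str = "/etc/resolv.conf") -> list[str]:
    """Read the VM's resolver list; return an empty list if the file is unreadable."""
    try:
        return read_nameservers(Path(path).read_text())
    except OSError:
        return []
```

A caller could fall back to a fixed resolver only when `vm_nameservers()` returns an empty list, which keeps the hardcoded address out of the normal path.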

Test plan

  • fly auth login succeeds
  • uv run python poc.py deploys to Fly, starts OpenShell sandbox, passes all 3 tests
  • fly apps destroy defenseclaw-gram-poc --yes tears down cleanly
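
For anyone reproducing the test plan by hand, the lifecycle can be sketched as three argv builders. This is an illustration, not the contents of poc.py: the helper names are hypothetical, and passing `--app` explicitly is an assumption (the PR may rely on fly.toml for the app name instead).

```python
import shlex

APP = "defenseclaw-gram-poc"  # app name taken from the test plan above


def secrets_set_argv(app: str, secrets: dict[str, str]) -> list[str]:
    """Build `fly secrets set NAME=VALUE ...` for the given app."""
    pairs = [f"{name}={value}" for name, value in sorted(secrets.items())]
    return ["fly", "secrets", "set", "--app", app, *pairs]


def deploy_argv(app: str) -> list[str]:
    """Build `fly deploy` for the given app."""
    return ["fly", "deploy", "--app", app]


def destroy_argv(app: str) -> list[str]:
    """Build the teardown command from the test plan."""
    return ["fly", "apps", "destroy", app, "--yes"]


if __name__ == "__main__":
    # Print the commands instead of running them; the value is a placeholder.
    for argv in (
        secrets_set_argv(APP, {"GRAM_API_KEY": "<redacted>"}),
        deploy_argv(APP),
        destroy_argv(APP),
    ):
        print(shlex.join(argv))
```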

mfbx9da4 and others added 7 commits March 31, 2026 19:12
Provisions an OpenClaw agent inside Docker, pre-configured with a Gram
tenant's MCP servers. LLM calls route through Gram's completions endpoint
and network egress is locked down via iptables to only allowed hosts.

Verified end-to-end: MCP tool calls, chat completions, network allow/block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace manual iptables-based network policy with OpenShell sandbox,
providing kernel-level isolation via Linux namespaces, Landlock LSM,
seccomp-BPF, and OPA/Rego network policy enforcement.

- Add openshell-sandbox binary install from NVIDIA OCI image
- Add OPA Rego policy from DefenseClaw for per-CONNECT evaluation
- Route all sandbox HTTP(S) through OpenShell's CONNECT proxy
- Rewrite entrypoint as start.py with proper sandbox lifecycle
- Add start-openclaw.sh that runs inside the sandbox namespace
- Container now requires --privileged for namespace creation
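
To make "per-CONNECT evaluation" concrete, here is a minimal default-deny sketch in Python. It is a stand-in for illustration only: the real decision is made by the DefenseClaw Rego policy inside OpenShell, which is not shown in this PR page. The `Endpoint` type, the function names, and the exact host/port matching rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


def parse_connect(request_line: str) -> Endpoint:
    """Parse 'CONNECT host:port HTTP/1.1' into an Endpoint."""
    method, target, _version = request_line.split()
    if method != "CONNECT":
        raise ValueError(f"not a CONNECT request: {request_line!r}")
    host, _, port = target.rpartition(":")
    return Endpoint(host.lower(), int(port))


def is_allowed(request_line: str, allowed: frozenset[Endpoint]) -> bool:
    """Default-deny: only exact host/port matches are let through."""
    try:
        return parse_connect(request_line) in allowed
    except ValueError:
        # Malformed or non-CONNECT requests are refused.
        return False
```

Evaluating each CONNECT separately means a policy decision is made before any TLS bytes flow, so the proxy never needs to inspect encrypted traffic.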

The previous tests used a non-existent web_fetch tool, causing the LLM
to hallucinate responses and bypass the proxy entirely. Now tests use
curl via nsenter into the gateway's network namespace, where traffic
goes through the OpenShell CONNECT proxy with OPA policy enforcement.

- Fix HOME not set in start-openclaw.sh (openshell-sandbox doesn't set it)
- Fix .env.local values not overriding mise env vars in config.py
- Add sandbox_curl() that runs curl inside sandbox network namespace
- asdf.com now properly blocked by OPA policy (CONNECT denied)
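
A rough sketch of what a `sandbox_curl()` helper could look like, assuming it wraps curl with nsenter as described above. The commit names the function but its body is not shown here, so the argument list, the split into `sandbox_curl_argv`, and the proxy URL parameter are assumptions. Running it for real requires root and a live sandbox PID.

```python
import subprocess


def sandbox_curl_argv(
    sandbox_pid: int, url: str, proxy: str, timeout_s: int = 10
) -> list[str]:
    """Build an argv that runs curl inside the network namespace of sandbox_pid."""
    return [
        "nsenter", "--target", str(sandbox_pid), "--net",
        "curl", "--silent", "--output", "/dev/null",
        "--write-out", "%{http_code}",
        "--max-time", str(timeout_s),
        "--proxy", proxy,
        url,
    ]


def sandbox_curl(sandbox_pid: int, url: str, proxy: str) -> tuple[int, str]:
    """Run the request and return (curl exit code, captured stdout)."""
    result = subprocess.run(
        sandbox_curl_argv(sandbox_pid, url, proxy),
        capture_output=True,
        text=True,
        check=False,  # a denied CONNECT is an expected outcome, not an error
    )
    return result.returncode, result.stdout.strip()
```

Because the request is issued by curl itself rather than by the LLM, a blocked host shows up as a proxy refusal instead of a hallucinated response.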

…iency

- Remove unused OPENSHELL_NETWORK_POLICY and GUARDRAIL dicts from config.py
  (start.py generates the policy YAML directly, these were never consumed)
- Fix README and config.py referencing deleted start.sh (now start.py)
- Add warning when gateway health check times out in start.py
- Remove unconditional sleep before sandbox PID lookup
- Check-then-sleep instead of sleep-then-check in poc.py gateway wait loop
- Reuse DefenseClaw clone from dc-builder stage instead of cloning twice
- Remove git from runtime image (no longer needed)
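
The check-then-sleep change can be sketched as a small polling helper. This is not the code from poc.py: `wait_until` and its parameters are hypothetical, and the injectable `sleep` and `clock` exist only to make the sketch testable.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        # Check first: a gateway that is already up costs no sleep at all.
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

With sleep-then-check, every call pays at least one interval even when the gateway is already healthy; checking first removes that fixed delay.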

Policy files (/etc/defenseclaw/) are now root:root 644, matching
DefenseClaw's production architecture. The sandbox user can read
them but cannot modify them to weaken its own network restrictions.

The openshell-sandbox binary loads policy at startup on the host side
of the namespace boundary — even if the file were somehow modified,
the running proxy would not reload it.
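
A startup script could assert the ownership and mode described above before launching the sandbox. The helper below is a hypothetical sketch (the name `has_expected_ownership` is not from the PR), and whether start.py performs any such check is not stated.

```python
import os
import stat


def has_expected_ownership(
    path: str, owner_uid: int = 0, owner_gid: int = 0, mode: int = 0o644
) -> bool:
    """True if path is owned by owner_uid:owner_gid and has exactly the given mode."""
    st = os.stat(path)
    return (
        st.st_uid == owner_uid
        and st.st_gid == owner_gid
        and stat.S_IMODE(st.st_mode) == mode
    )
```

With the defaults this checks for root:root 644, so any group or world write bit on a policy file makes the check fail.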

Replace Docker container runtime with Fly.io Firecracker microVMs.
OpenShell runs natively on a real VM — no --privileged flag, no nested
namespace hacks, no Docker-specific iptables bridging.

- Add fly.toml for Fly app configuration
- Simplify start.py: remove unshare --mount, use VM's DNS directly
- Rewrite poc.py: use fly CLI (deploy, ssh, logs) instead of docker CLI
- Update README with Fly architecture and setup instructions

changeset-bot bot commented Apr 1, 2026

⚠️ No Changeset found

Latest commit: 773917a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


vercel bot commented Apr 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| gram-docs-redirect | Ready | Preview, Comment | Apr 1, 2026 0:47am |

github-actions bot commented Apr 1, 2026


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@mfbx9da4 mfbx9da4 changed the base branch from feat/claw-poc to main April 1, 2026 12:02

OpenShell's OPA policy checks the full ancestor chain of every process
making network requests. Commands run via fly ssh have /.fly/hallpass
in their ancestry, which fails the integrity check.

Fix: run network tests from start-openclaw.sh (inside the sandbox),
where the process tree is bash -> python3 -> curl. poc.py reads the
test results from Fly logs instead of running curl remotely.

Also add grant-scopes.py and test-network-policy.py as baked-in scripts
to avoid shell quoting issues with fly ssh -C.
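
The ancestor-chain failure can be illustrated with a small model. This is a sketch under assumptions: the real check lives in OpenShell's policy and its exact criteria are not shown here, so treating "integrity" as membership in a set of trusted executable paths is a guess. `ancestor_exes` and `chain_is_trusted` are hypothetical names, and the lookups are injected so the example runs without reading /proc.

```python
from typing import Callable, Iterable, Iterator, Optional


def ancestor_exes(
    pid: int,
    parent_of: Callable[[int], Optional[int]],
    exe_of: Callable[[int], str],
) -> Iterator[str]:
    """Yield the executable of pid, then of each ancestor, until there is no parent."""
    seen: set[int] = set()
    current: Optional[int] = pid
    while current and current not in seen:  # stop at parent 0/None or on a cycle
        seen.add(current)
        yield exe_of(current)
        current = parent_of(current)


def chain_is_trusted(exes: Iterable[str], trusted: frozenset[str]) -> bool:
    """A single untrusted ancestor is enough to fail the whole chain."""
    return all(exe in trusted for exe in exes)
```

Under this model a curl launched over fly ssh fails even though curl itself is trusted, because /.fly/hallpass appears further up the chain, while the bash -> python3 -> curl tree inside the sandbox passes.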

- Merge fly_ssh() and fly_ssh_ok() into one function with check param
- Move inline import json to top-level
- Extract FLY_ORG constant (was hardcoded speakeasy-lab)
- Fix config.py docstring referencing Docker container
- Add grant-scopes.py and test-network-policy.py to README files table
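
One plausible shape for the merged helper, assuming it shells out to `fly ssh console -C`. The signature below is a guess, not the code in poc.py; the `run` parameter is added only so the sketch can be exercised without the fly CLI installed.

```python
import subprocess
from typing import Callable

APP = "defenseclaw-gram-poc"  # app name from the test plan


def fly_ssh(
    command: str,
    check: bool = True,
    app: str = APP,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run command on the app's VM.

    check=True raises CalledProcessError on a non-zero exit;
    check=False returns the result so the caller can inspect returncode.
    """
    argv = ["fly", "ssh", "console", "--app", app, "-C", command]
    return run(argv, capture_output=True, text=True, check=check)
```

A caller that previously used `fly_ssh_ok()` would presumably switch to `fly_ssh(cmd, check=False).returncode == 0`.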