Miladyos local agent on android #7176
…LADY mirrors
`shouldWarmupLocalEmbeddingModel` had two real bugs that made every override silently no-op:
1. Both branches of each `||` referenced the same ELIZA env var, so the intended MILADY_* mirrors (used in discord-runtime-roundtrip-live and the electrobun-packaged test helpers) never short-circuited.
2. The reader compared against `=== "1"`, but `eliza/packages/agent/src/runtime/eliza.ts` sets `ELIZA_CLOUD_EMBEDDINGS_DISABLED = "true"`. The values never matched, so cloud-embedding-disabled environments still skipped local warmup.
Replace both checks with a small `isTruthyEnv(...names)` helper that accepts "1" | "true" | "yes" (case-insensitive, trimmed) across both ELIZA_* and MILADY_* names.
Cover all four var/value combinations in a new vitest.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
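A minimal sketch of what such a helper could look like, under stated assumptions. The commit describes the helper as `isTruthyEnv(...names)`; the explicit `env` parameter here is an addition so the sketch stays self-contained, and `MILADY_CLOUD_EMBEDDINGS_DISABLED` is a hypothetical mirror name used only for illustration.

```typescript
// Assumption: the environment is passed in explicitly instead of reading process.env.
type Env = Record<string, string | undefined>;

// The accepted values come from the commit message: "1" | "true" | "yes".
const TRUTHY_VALUES = new Set(["1", "true", "yes"]);

/** Returns true when any of the named variables holds a truthy value. */
function isTruthyEnv(env: Env, ...names: string[]): boolean {
  return names.some((name) => {
    const raw = env[name];
    // Trim and lowercase so " TRUE " and "Yes" are accepted as well.
    return raw !== undefined && TRUTHY_VALUES.has(raw.trim().toLowerCase());
  });
}

// Hypothetical usage: the check passes if either the ELIZA_* or MILADY_* name is set.
const cloudEmbeddingsDisabled = isTruthyEnv(
  { ELIZA_CLOUD_EMBEDDINGS_DISABLED: "true" },
  "ELIZA_CLOUD_EMBEDDINGS_DISABLED",
  "MILADY_CLOUD_EMBEDDINGS_DISABLED", // hypothetical mirror name
);
```

Checking a list of names in one call avoids the copy-paste mistake from bug 1, where both sides of the `||` read the same variable.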
…TALOG
- `smollm2-360m` (SmolLM2 360M Instruct, Q4_K_M, ~270MB): genuinely phone-friendly default for the tiny/small bucket. Verified file `SmolLM2-360M-Instruct-Q4_K_M.gguf` exists in bartowski/SmolLM2-360M-Instruct-GGUF.
- `bonsai-8b-1bit` (apothic/bonsai-8B-1bit-turboquant, ~1.16GB): 1-bit TurboQuant Qwen3-8B. Weights load on stock llama.cpp; the KV-cache memory win requires the apothic/llama.cpp-1bit-turboquant fork. Blurb flags it as mobile-experimental. The file lives at `models/gguf/8B/Bonsai-8B.gguf` inside the repo, so `buildHuggingFaceResolveUrl` now encodes path segments individually instead of percent-encoding the slashes.
Side changes:
- `CatalogModel.params` gains `"360M"` so SmolLM2-360M typechecks.
- The catalog test no longer requires `quant` to start with `Q\d`, because Bonsai uses a non-standard `1-bit TurboQuant` quant scheme.
- New regression test for nested-path resolve URLs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
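The per-segment encoding described above can be sketched as follows. The function name matches the commit, but the signature (`repo`, `filePath`, `revision`) is an assumption, as is the choice to leave the `owner/name` repo id unencoded; the real helper may differ.

```typescript
/**
 * Sketch: build a Hugging Face "resolve" URL for a file inside a repo.
 * Assumed signature; only the per-segment encoding is taken from the commit.
 */
function buildHuggingFaceResolveUrl(
  repo: string,
  filePath: string,
  revision: string = "main",
): string {
  // Encode each path segment on its own so the "/" separators survive.
  // Encoding the whole path at once would turn them into "%2F".
  const encodedPath = filePath
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return `https://huggingface.co/${repo}/resolve/${encodeURIComponent(revision)}/${encodedPath}`;
}

// Nested path from the Bonsai catalog entry.
const bonsaiUrl = buildHuggingFaceResolveUrl(
  "apothic/bonsai-8B-1bit-turboquant",
  "models/gguf/8B/Bonsai-8B.gguf",
);
```

Characters that need escaping inside a single segment, such as spaces, are still percent-encoded; only the separators are left alone.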
…erviceRouting
Lets callers opt specific non-llmText capabilities (e.g. embeddings) out of the cloud-proxy default routing. The argument is type-safe: `excludeServices` is typed as `Exclude<ServiceCapability, "llmText">[]`, so passing a bad capability is a compile error and `llmText` stays gated by `includeInference`. Default behavior is unchanged when the arg is absent.
The pre-existing `next[capability] ??= ...` semantics are preserved for capabilities that are not in the exclude set, so a base route already supplied by the caller still wins.
Adds focused unit tests covering: default routing, exclude embeddings, exclude with pre-existing base, and empty-exclude ≡ default.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
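The exclusion semantics can be illustrated with a small sketch. Everything except `ServiceCapability`, `excludeServices`, `includeInference`, and the `??=` defaulting is hypothetical: the `ServiceRoute` shape, the `base` argument name, and the function name `buildRoutingSketch` are stand-ins, and the sketch assumes an excluded capability keeps whatever route the caller already supplied.

```typescript
// Capability names taken from the PR text (llmText, embeddings, tts, media, rpc).
type ServiceCapability = "llmText" | "embeddings" | "tts" | "media" | "rpc";
type NonTextCapability = Exclude<ServiceCapability, "llmText">;

// Hypothetical route shape, for illustration only.
type ServiceRoute = { transport: string };
type ServiceRouting = Partial<Record<ServiceCapability, ServiceRoute>>;

interface RoutingArgs {
  base?: ServiceRouting; // routes the caller already configured (assumed name)
  includeInference?: boolean; // llmText is only routed when this is true
  excludeServices?: NonTextCapability[]; // "llmText" here is a compile error
}

const CLOUD_PROXY: ServiceRoute = { transport: "cloud-proxy" };
const NON_TEXT: NonTextCapability[] = ["embeddings", "tts", "media", "rpc"];

function buildRoutingSketch(args: RoutingArgs = {}): ServiceRouting {
  const next: ServiceRouting = { ...args.base };
  const excluded = new Set<ServiceCapability>(args.excludeServices ?? []);
  if (args.includeInference) next.llmText ??= CLOUD_PROXY;
  for (const capability of NON_TEXT) {
    // Excluded capabilities are simply not defaulted.
    if (excluded.has(capability)) continue;
    // ??= keeps a route the caller supplied and fills in only the gaps.
    next[capability] ??= CLOUD_PROXY;
  }
  return next;
}

// Embeddings stay unconfigured; tts/media/rpc still get the cloud proxy.
const localEmbeddingsRouting = buildRoutingSketch({ excludeServices: ["embeddings"] });
```

Typing the exclude list as `Exclude<ServiceCapability, "llmText">[]` means `excludeServices: ["llmText"]` fails to compile, which keeps text inference under the single `includeInference` switch.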
Adds `onboardingUseLocalEmbeddings?: boolean` to `BuildOnboardingConnectionArgs`. When true, the cloud-defaults branch passes `excludeServices: ["embeddings"]` to `buildDefaultElizaCloudServiceRouting`, so embeddings stay unconfigured and the agent can fall back to a local embedding provider instead of the cloud-proxy route. tts/media/rpc still route to the cloud proxy.
Default behavior is unchanged: absent or `false` → embeddings keep the existing cloud-proxy route.
Existing test file location: `eliza/packages/app-core/test/onboarding-config.test.ts` (not under `src/`). Extended with three cases covering default, flag-true, and flag-false.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `useLocalEmbeddings?: boolean` to:
- `applyOnboardingConnectionConfig` (3rd `options` arg). Translates `true` to `excludeServices: ["embeddings"]` at the boundary so the public surface stays a clean boolean. Both `buildDefaultElizaCloudServiceRouting` callsites (the cloud-managed branch and the local-provider-with-cloud-defaults branch) honor the flag.
- `applyFirstTimeSetupTopology` args. Same translation at the builder callsite.
- `POST /api/provider/switch` request body. The route handler reads the boolean and forwards it to `applyOnboardingConnectionConfig`. Inline body validation keeps the existing pattern (no zod here).
Default behavior is unchanged: absent or `false` → embeddings keep the existing cloud-proxy route.
Tests:
- `provider-switch-config.test.ts` (new): 4 cases covering the default/true/false translations on cloud-managed connections plus the local-provider-with-cloud-defaults branch.
- `first-time-setup.test.ts`: 2 added cases for the topology builder.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The agent now actually exec's on the cuttlefish image. Successive fixes:
- MiladyAgentService writes a per-boot 32-byte hex token to
/data/data/<pkg>/files/auth/local-agent-token (mode 0600) and passes
it via ELIZA_API_TOKEN to the bun child. Server-side, the new
MILADY_REQUIRE_LOCAL_AUTH=1 env var flips
isTrustedLocalRequest()'s "loopback is implicitly trusted" heuristic
off, so every API call from the WebView (or anywhere else) needs a
bearer token even though it lives on the same loopback. Closes the
multi-app-on-loopback IPC gap the user surfaced.
- ABI selection in MiladyAgentService.resolveRuntimeAbi() now respects
Build.SUPPORTED_ABIS[0] order — cuttlefish_x86_64 reports
["x86_64","arm64-v8a"] and was previously preferring arm64
unconditionally, which produced ENOEXEC on the wrong-arch binary.
- Phase A's stage-android-agent.mjs and the spawn site land binaries
back under assets/agent/{abi}/ instead of jniLibs/{abi}/. The
earlier jniLibs pivot worked around an SELinux execute denial that
turned out to be the platform_app vs priv_app domain (we kept the
platform certificate) — narrower fix is a single vendor sepolicy
allow rule. The lib*.so renaming + symlink dance disappears with
it.
- Vendor sepolicy adds milady_agent.te:
allow platform_app app_data_file:file { execute execute_no_trans };
AOSP's stock platform_app does not include the equivalent of
priv_app's `allow priv_app privapp_data_file:file execute`, so
without this rule the bun execve fails with avc: denied { execute }
on the loader. validateSepolicy() now pins the rule.
- MiladyAgentService creates a libstdc++.so.6 → libstdc++.so.6.0.33
symlink in agent/{abi}/ at extraction time so bun's musl loader
finds the shared object by its soname instead of crashing with
hundreds of relocation errors.
- run-mobile-build.mjs's isCapacitorPlatformReady() now also requires
AndroidManifest.xml to exist (was the cause of one round-trip APK
build failure earlier — a deleted manifest never re-generated).
- RuntimeGate auto-picks Local Agent on Android when the probe
succeeds and no mode is persisted. User boots cuttlefish → straight
into chat. Settings ▸ Runtime can re-open the picker for users who
want cloud/remote later.
- Phase D bundle's externals list adds @elizaos/plugin-browser-bridge
→ null-plugin stub. The eager static import in eliza.ts was
pulling puppeteer transitively into the mobile bundle.
End state on cuttlefish: bun process executes, MiladyAgentService
keeps it alive via watchdog with exponential backoff, /api/health
responds 200 against the spike-stub bundle. The full @elizaos/agent
bundle exec's but exits with SIGSYS (code 159) — Android's seccomp
filter blocks a syscall bun makes (likely io_uring_setup or one of
the platform-specific futex variants). That's the next yard:
seccomp policy diagnosis + targeted exemption in vendor sepolicy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
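The loopback-trust change in the first bullet of this commit can be sketched roughly as below. This is not the code in `server-helpers-auth.ts`: the `LocalRequest` shape, the `isAuthorized` wrapper, the loopback address list, and the `/api/agents` path are all hypothetical. Only the `MILADY_REQUIRE_LOCAL_AUTH=1` switch, the `ELIZA_API_TOKEN` variable, and the `/api/health` exemption come from the PR text.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical, simplified view of an incoming request.
interface LocalRequest {
  remoteAddress: string;
  path: string;
  authorization?: string; // raw Authorization header, if present
}

const LOOPBACK_ADDRESSES = new Set(["127.0.0.1", "::1", "::ffff:127.0.0.1"]);

/** Loopback is trusted implicitly unless MILADY_REQUIRE_LOCAL_AUTH=1 is set. */
function isTrustedLocalRequest(req: LocalRequest, env: Env): boolean {
  if (env["MILADY_REQUIRE_LOCAL_AUTH"] === "1") return false;
  return LOOPBACK_ADDRESSES.has(req.remoteAddress);
}

/** Hypothetical wrapper combining the health exemption, loopback trust, and bearer check. */
function isAuthorized(req: LocalRequest, env: Env): boolean {
  if (req.path === "/api/health") return true; // probe endpoint stays open
  if (isTrustedLocalRequest(req, env)) return true;
  const expected = env["ELIZA_API_TOKEN"];
  if (!expected) return false;
  // Simplification: production code should use a constant-time comparison.
  return req.authorization === `Bearer ${expected}`;
}

// On Android every app shares the loopback interface, so the flag is set
// and a loopback caller without the token is rejected.
const androidEnv: Env = { MILADY_REQUIRE_LOCAL_AUTH: "1", ELIZA_API_TOKEN: "example-token" };
const rejected = isAuthorized({ remoteAddress: "127.0.0.1", path: "/api/agents" }, androidEnv);
```

With the flag unset, the same loopback request is accepted without a token, which matches the previous behavior.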
useEffect(() => {
  if (!isAndroid) return;
  if (!showLocalOption) return;
  if (readPersistedMobileRuntimeMode() != null) return;
  finishAsLocal();
  // intentionally only triggers once: finishAsLocal persists the mode
  // and dispatches SPLASH_CONTINUE; a second invocation is a no-op.
  // eslint-disable-next-line react-hooks/exhaustive-deps
}, [showLocalOption]);

const showLocalOption = localProbeResult === true;
showLocalOption used before declaration — TDZ ReferenceError
useEffect(fn, [showLocalOption]) is called at line 157, but const showLocalOption = localProbeResult === true is not declared until line 167. A const binding stays in the temporal dead zone until execution reaches its declaration. Because the dependency array [showLocalOption] is evaluated as soon as useEffect(...) is called, every render of RuntimeGate throws ReferenceError: Cannot access 'showLocalOption' before initialization. This crashes the component on all platforms, not just on Android.
Move the new useEffect to after line 167 (after both showLocalOption and localProbePending are declared).
Suggested change:
- useEffect(() => {
-   if (!isAndroid) return;
-   if (!showLocalOption) return;
-   if (readPersistedMobileRuntimeMode() != null) return;
-   finishAsLocal();
-   // intentionally only triggers once: finishAsLocal persists the mode
-   // and dispatches SPLASH_CONTINUE; a second invocation is a no-op.
-   // eslint-disable-next-line react-hooks/exhaustive-deps
- }, [showLocalOption]);
- const showLocalOption = localProbeResult === true;
+ const showLocalOption = localProbeResult === true;
+ const localProbePending = localProbeResult === null;
+ useEffect(() => {
+   if (!isAndroid) return;
+   if (!showLocalOption) return;
+   if (readPersistedMobileRuntimeMode() != null) return;
+   finishAsLocal();
+   // intentionally only triggers once: finishAsLocal persists the mode
+   // and dispatches SPLASH_CONTINUE; a second invocation is a no-op.
+   // eslint-disable-next-line react-hooks/exhaustive-deps
+ }, [showLocalOption]);
    }
}

// bun's binary requests `libstdc++.so.6` at runtime (the soname),
// but the actual file we shipped is the versioned realpath
// (`libstdc++.so.6.0.33`). Without a symlink the musl loader
// can't find the shared object and bun crashes with hundreds of
// "Error relocating: symbol not found" lines. Create the symlink
// pointing from the soname to the realpath inside the same abi
// dir so LD_LIBRARY_PATH resolution works without LD_PRELOAD.
for (String name : abiFiles) {
    if (name.startsWith("libstdc++.so.6.")) {
        File realPath = new File(abiDir, name);
        File symlink = new File(abiDir, "libstdc++.so.6");
        if (realPath.exists() && !symlink.exists()) {
            try {
                java.nio.file.Files.createSymbolicLink(
Symlink idempotency breaks on ABI library version bump
The guard !symlink.exists() prevents recreating libstdc++.so.6 if it already exists. If the app is updated and libstdc++.so.6.0.33 is replaced by libstdc++.so.6.0.34, the old symlink is left in place pointing to the now-absent versioned file, while copyAssetIfMissing skips overwriting existing ABI files — so the agent silently fails to start with unresolved-symbol errors. Consider replacing the !symlink.exists() check with a version-aware comparison (e.g. delete and re-create when the target name changes), or recreate the symlink unconditionally on each extract.
    if (!dir.exists() && !dir.mkdirs()) {
        throw new IOException("Could not create " + dir);
    }
    File file = new File(dir, "local-agent-token");
    try (FileOutputStream out = new FileOutputStream(file)) {
        out.write(token.getBytes());
    }
    file.setReadable(false, false);
    file.setReadable(true, true);
    file.setWritable(false, false);
    file.setWritable(true, true);
}

private long safePid(Process process) {
    // Process#pid() is Java 9+; Android's java.lang.Process exposes it
    // since API 24. AGP's d8 desugaring on this project rejects the
Token file world-readable for a brief window during creation
new FileOutputStream(file) creates the file with the process's default umask permissions before the setReadable/setWritable calls restrict it. On Android the default umask typically allows group/world read, so there is a short race between file creation and the setReadable(false, false) call during which another process could observe the token. Use openFileOutput(name, MODE_PRIVATE) (Context API, always 0600) or write to a temp file and renameTo to keep the file restricted from the moment it is created.
Greptile Summary
This PR wires up a local Eliza agent running inside an Android foreground service (MiladyAgentService), including per-boot bearer-token auth (shared loopback requires explicit token, not just loopback trust), libstdc++ symlink creation for the musl runtime, local-embeddings opt-out across the service-routing layer, and a new auto-pick flow in RuntimeGate that lands Android users directly in chat when the on-device agent is detected.
- RuntimeGate.tsx: The new useEffect references showLocalOption (and passes it as a dependency) before const showLocalOption = ... is declared; this is a Temporal Dead Zone ReferenceError that crashes every render of the component.
- MiladyAgentService.java: The libstdc++.so.6 symlink guard (!symlink.exists()) will silently leave a dangling symlink after an app update that bumps the versioned filename, breaking agent startup with unresolved-symbol errors.
Confidence Score: 2/5
Not safe to merge — the TDZ crash in RuntimeGate will break the component on every render, and the symlink issue will silently break the agent on app updates.
A P0 TDZ ReferenceError in RuntimeGate.tsx crashes the UI component universally on every render, and a P1 symlink idempotency bug in MiladyAgentService.java silently breaks the Android agent on library version bumps. Both must be fixed before merge.
packages/app-core/src/components/shell/RuntimeGate.tsx (P0 TDZ crash) and packages/app-core/platforms/android/app/src/main/java/ai/elizaos/app/MiladyAgentService.java (P1 symlink / P2 token race)
Security Review
- MiladyAgentService.java: The per-boot bearer token is written via new FileOutputStream(file), which creates the file with default (potentially world-readable) permissions before setReadable(false, false) is called. Using Context.openFileOutput(name, MODE_PRIVATE) or an atomic rename would eliminate the window.
- server-helpers-auth.ts: Design is sound. The env var correctly disables loopback-only trust on Android where the loopback interface is shared across apps, requiring bearer auth on all routes except /api/health.
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant AS as MiladyAgentService (Android)
    participant BUN as Bun/Agent Process
    participant WV as WebView (Capacitor)
    participant RT as RuntimeGate (React)
    participant SR as ElizaOS Server
    AS->>AS: Generate per-boot bearer token (SecureRandom)
    AS->>AS: Persist token to restricted file
    AS->>BUN: execve(musl-loader, bun, bundle) with token + auth env vars
    BUN->>SR: Start HTTP on port 31337
    WV->>AS: Read token via Capacitor plugin
    AS-->>WV: token
    WV->>SR: GET /api/health (local probe)
    SR-->>WV: 200 OK
    WV->>RT: localProbeResult = true
    RT->>RT: showLocalOption = true
    Note over RT: P0 - useEffect references showLocalOption BEFORE declaration
    RT->>RT: finishAsLocal() → persist mode → SPLASH_CONTINUE
    WV->>SR: API calls with Authorization Bearer token
    SR->>SR: Loopback trust disabled on Android → bearer token required
Reviews (1): Last reviewed commit: "feat(local-agent-android): bun execve + ..."