- Never ask for permission to access folders, run commands, search the web, or use tools. Just do it.
- Never ask for confirmation. Just act. Make decisions autonomously and proceed without checking in.
- You have full access to the user's computer — browser, desktop, all apps. Never ask the user to do something you can do yourself (sign in, click buttons, dismiss dialogs, etc.).
You have multiple MCP tools for controlling the Mac. Use the right tool for each job — don't bounce between tools.
cliclick c:X,Y # click
cliclick dc:X,Y # double-click
cliclick rc:X,Y # right-click
cliclick m:X,Y # move mouse
cliclick t:"text" # type text
cliclick p # print current mouse position
cliclick kd:cmd ku:cmd # key down/up
cliclick uses CGEvent, handles Retina correctly, works across all displays. No MCP overhead.
- mcp__codriver__desktop_screenshot: capture screen (use scale: 0.5 for speed)
- mcp__codriver__desktop_ocr: find text positions on screen
- mcp__codriver__desktop_windows: list/focus windows
- Take screenshot with codriver to see the screen
- Identify the coordinates of what to click (use OCR if needed)
- Click with cliclick c:X,Y via Bash (instant, reliable)
Already documented below. Use for connected SwiftUI/AppKit apps.
- playwright MCP: headless browser, most reliable for web automation
- claude-in-chrome: for existing browser tabs (only when extension is connected)
- codriver screenshot + cliclick: fallback if browser tools fail
- NEVER try 3+ different click tools for the same action — pick one and commit
- For multi-monitor: always check coordinates against the screenshot scale factor
- codriver screenshots at scale: 0.5 means multiply coordinates by 2 before clicking
- Prefer cliclick over automac/mac-use-mcp click; they have coordinate bugs on multi-monitor
- When a tool errors (e.g., "helper binary not found", "extension not connected"), immediately switch to the fallback; don't retry the broken tool
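The scale rule above is plain arithmetic. A minimal Python sketch, assuming the point was measured from the top-left corner of a screenshot captured at a known scale; the helper names `to_click_point` and `cliclick_command` are hypothetical and not part of codriver or cliclick:

```python
def to_click_point(x: float, y: float, scale: float = 0.5) -> tuple[int, int]:
    """Map a point measured on a scaled screenshot back to screen coordinates."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    # A screenshot taken at scale 0.5 is half-size, so dividing by the scale
    # is the same as multiplying by 2.
    return round(x / scale), round(y / scale)


def cliclick_command(x: float, y: float, scale: float = 0.5) -> str:
    """Build the cliclick click command for a point found on a scaled screenshot."""
    sx, sy = to_click_point(x, y, scale)
    return f"cliclick c:{sx},{sy}"


print(cliclick_command(320, 210))  # cliclick c:640,420
```

This sketch only undoes the screenshot scale. It does not account for any display offset on multi-monitor setups, which still needs to be checked against the screenshot.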
Run before your first commit — formatting is enforced by CI:
ln -s -f ../../scripts/pre-commit .git/hooks/pre-commit
cd app && bash setup.sh ios # or: bash setup.sh android
All imports must be at the module top level. Never import inside functions.
Follow the module hierarchy when importing. Higher-level modules import from lower-level modules, never the reverse.
Module hierarchy (lowest to highest):
- database/ - Database connections, cache instances
- utils/ - Utility functions, helpers
- routers/ - API endpoints
- main.py - Application entry point
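The import direction can be checked mechanically. A minimal Python sketch, assuming modules are identified by dotted paths whose first segment names the layer; `routers.chat` and `database.cache` are hypothetical module names, and allowing imports within the same layer is an assumption the rule above does not state:

```python
# Layers from lowest to highest, as listed above.
LAYERS = ["database", "utils", "routers", "main"]


def layer_of(module: str) -> int:
    """Return the hierarchy rank of a dotted module path such as 'utils.log_sanitizer'."""
    top = module.split(".")[0]
    return LAYERS.index(top)  # raises ValueError for modules outside the hierarchy


def is_allowed_import(importer: str, imported: str) -> bool:
    """A module may import from its own layer (assumed) or a lower one, never a higher one."""
    return layer_of(imported) <= layer_of(importer)


print(is_allowed_import("routers.chat", "utils.log_sanitizer"))  # True
print(is_allowed_import("database.cache", "routers.chat"))       # False
```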
Free large objects immediately after use. E.g., del for byte arrays after processing, .clear() for dicts/lists holding data.
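A minimal Python sketch of that rule, assuming a hypothetical helper `digest_audio` that assembles byte chunks into one buffer; it is illustrative only and not taken from the codebase:

```python
import hashlib


def digest_audio(chunks: list[bytes]) -> str:
    """Hash a list of byte chunks, releasing the large buffers once they are consumed."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)

    result = hashlib.sha256(buffer).hexdigest()

    # Drop the large byte array right after processing instead of waiting for scope exit.
    del buffer
    # Empty the caller's list so the chunk objects can be collected too.
    chunks.clear()
    return result


chunks = [b"abc", b"def"]
print(digest_audio(chunks) == hashlib.sha256(b"abcdef").hexdigest())  # True
print(chunks)  # []
```

Clearing the caller's list is a deliberate side effect in this sketch. It only makes sense when the caller no longer needs the data.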
Never log raw sensitive data. Use sanitize() and sanitize_pii() from utils.log_sanitizer.
Rules:
- sanitize() for response.text, API responses, and error bodies.
- sanitize_pii() for names, emails, and user text.
- Keep log levels as-is (don't downgrade to hide data).
- Keep UIDs, IPs, status codes, and structural info visible for debugging.
- Never put raw response.text in exception messages.
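The rules above might look like this in practice. This is a sketch under stated assumptions: `sanitize_stub` is a hypothetical stand-in for `sanitize()` from utils.log_sanitizer, whose real signature and behavior are not shown in this file, and `UpstreamError` and `handle_failure` are illustrative names:

```python
import logging
import re

logger = logging.getLogger(__name__)


def sanitize_stub(text: str, max_len: int = 80) -> str:
    """Hypothetical stand-in for utils.log_sanitizer.sanitize(); the real one may differ."""
    # Mask anything that looks like a bearer token, then truncate long bodies.
    masked = re.sub(r"Bearer\s+\S+", "Bearer ***", text)
    return masked if len(masked) <= max_len else masked[:max_len] + "..."


class UpstreamError(Exception):
    """Raised when an upstream call fails; carries only structural info."""


def handle_failure(uid: str, status_code: int, body: str) -> None:
    # UID and status code stay visible; the body is sanitized before logging.
    logger.error("upstream failed uid=%s status=%s body=%s", uid, status_code, sanitize_stub(body))
    # The exception message never includes the raw body.
    raise UpstreamError(f"upstream failed with status {status_code}")
```

The log level stays at error, and the exception carries the status code only, so the raw body cannot leak through a traceback.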
Shared: Firestore, Redis
backend (main.py)
├── ws ──► pusher (pusher/)
├── ──────► diarizer (diarizer/)
├── ──────► vad (modal/)
└── ──────► deepgram (self-hosted or cloud)
pusher
├── ──────► diarizer (diarizer/)
└── ──────► deepgram (cloud)
agent-proxy (agent-proxy/main.py)
└── ws ──► user agent VM (private IP, port 8080)
notifications-job (modal/job.py) [cron]
Helm charts: backend/charts/{backend-listen,pusher,diarizer,vad,deepgram-self-hosted,agent-proxy}/
See service descriptions in AGENTS.md. Update both files when service boundaries change.
- All user-facing strings must use l10n. Use context.l10n.keyName instead of hardcoded strings. Add new keys to ARB files using jq (never read full ARB files - they're large and will burn tokens). See skill add-a-new-localization-key-l10n-arb for details.
- Translate all locales: When adding new l10n keys, provide real translations for all 33 non-English locales; do not leave English text in non-English ARB files. Use the omi-add-missing-language-keys-l10n skill to generate proper translations. Ensure {parameter} placeholders match the English ARB exactly.
- After modifying ARB files in app/lib/l10n/, regenerate the localization files:
cd app && flutter gen-l10n
After editing Flutter UI code, verify the change programmatically; do not just hot restart and hope.
Marionette is already integrated in debug builds (marionette_flutter: ^0.3.0). Install agent-flutter once: npm install -g agent-flutter-cli.
Edit → Verify → Evidence loop:
# 1. Edit Dart code, then hot restart
kill -SIGUSR2 $(pgrep -f "flutter run" | head -1)
# 2. Connect (must reconnect after every hot restart)
AGENT_FLUTTER_LOG=/tmp/flutter-run.log agent-flutter connect
# 3. See what's on screen
agent-flutter snapshot -i # list interactive widgets
agent-flutter snapshot -i --json # structured data for parsing
# 4. Interact
agent-flutter press @e3 # tap by ref
agent-flutter press 540 1200 # tap by coordinates (ADB fallback)
agent-flutter dismiss # dismiss system dialogs (location, permissions)
agent-flutter find type button press # find and tap (more stable than @ref)
agent-flutter fill @e5 "hello" # type into textfield
agent-flutter scroll down # scroll current view
# 5. Screenshot evidence for PRs
agent-flutter screenshot /tmp/after-change.png
Key rules:
- Refs go stale frequently (Flutter rebuilds aggressively); always re-snapshot before every interaction. Use press x y as fallback.
- AGENT_FLUTTER_LOG must point to the flutter run stdout log file (not logcat). This is how agent-flutter finds the correct VM Service URI.
- find type X or find text "label" is more stable than hardcoded @ref numbers.
- When adding new interactive widgets, use Key('descriptive_name') so agents can use find key (survives i18n and theme changes).
- Android: auto-detects via ADB. iOS: requires AGENT_FLUTTER_LOG or explicit URI.
- App flows & exploration skill: See app/e2e/SKILL.md for navigation architecture, screen map, widget patterns, and known flows. Read this when developing features or exploring the app.
Never run flutterfire configure — it overwrites prod credentials. Prod config files in app/ios/Config/Prod/, app/lib/firebase_options_prod.dart, app/android/app/src/prod/.
After editing Swift UI code, verify the change programmatically via the macOS Accessibility API — no app-side instrumentation needed.
Install agent-swift once: brew install beastoin/tap/agent-swift. Requires Accessibility permission for Terminal.app (System Settings → Privacy & Security → Accessibility).
Edit → Verify → Evidence loop:
# 1. Edit Swift code, rebuild and run
cd desktop && ./run.sh
# 2. Connect to the running app
agent-swift connect --bundle-id com.omi.desktop-dev
# 3. See what's on screen
agent-swift snapshot -i # interactive elements only (recommended)
agent-swift snapshot -i --json # structured data for parsing
# 4. Interact
agent-swift click @e3 # CGEvent click (works with SwiftUI)
agent-swift press @e3 # AXPress action (AppKit buttons)
agent-swift fill @e5 "search text" # type into a text field
agent-swift find role button click # find + chained action
agent-swift scroll down # scroll the view
# 5. Assert & wait
agent-swift is exists @e3 # exit 0 = true, exit 1 = false
agent-swift wait text "Settings" # wait for text to appear (5s default)
# 6. Screenshot evidence for PRs
agent-swift screenshot /tmp/after-change.png # capture app window
Key rules:
- agent-swift doctor verifies Accessibility permission and can check the target app.
- Prefer click over press for SwiftUI apps: click sends CGEvent mouse clicks that trigger NavigationLink/gesture handlers, while press sends AXPress which only works for AppKit buttons.
- Refs go stale after click/press/fill/scroll; re-snapshot before the next interaction.
- Always use snapshot -i (interactive only); full snapshots of complex apps are very verbose.
- Argument order: get <property> <ref>, is <condition> <ref>, wait <condition> [<target>], find <locator> <value>.
- JSON output: --json flag, AGENT_SWIFT_JSON=1 env var, or pipe to auto-detect.
- 15 commands: doctor, connect, disconnect, status, snapshot, press, click, fill, get, find, screenshot, is, wait, scroll, schema.
- Works with any macOS app (SwiftUI, AppKit, Electron); no Marionette or app-side setup.
- Bundle ID for dev: com.omi.desktop-dev. For prod: com.omi.computer-macos.
- Named test bundles: When testing a feature or bug fix, ALWAYS create a separate named bundle with OMI_APP_NAME="feature-name" ./run.sh. This installs to /Applications/feature-name.app with bundle ID com.omi.feature-name, running side-by-side with "Omi Dev" and "Omi Beta". NEVER overwrite "Omi Dev" when testing a specific change; the user may have it running. Connect agent-swift with --bundle-id com.omi.feature-name.
- Keep the bundle suffix and app name identical so auth callbacks reopen the correct app. Example: 1233.app should use com.omi.1233, search.app should use com.omi.search, and mismatches like 1233.app with com.omi.desktop-dev are not allowed.
- App flows & exploration skill: See desktop/e2e/SKILL.md for navigation architecture, screen map, interaction patterns (click vs press), and known flows. Read this when developing features or exploring the app.
- When asked to build or rebuild the desktop app for testing, don't stop at a successful compile: launch the named test app, interact with it programmatically to confirm it actually runs, and report any environment blocker if full interaction is impossible.
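The naming rule for named test bundles can be expressed as a small check. A minimal Python sketch, assuming the `com.omi.` prefix and `.app` suffix shown in the examples above; the helper names `bundle_id_for` and `is_consistent` are hypothetical and not part of run.sh or agent-swift:

```python
BUNDLE_PREFIX = "com.omi."


def bundle_id_for(app_name: str) -> str:
    """Derive the bundle ID for a named test bundle, e.g. 'search.app' -> 'com.omi.search'."""
    return BUNDLE_PREFIX + app_name.removesuffix(".app")


def is_consistent(app_file: str, bundle_id: str) -> bool:
    """True when the bundle suffix matches the app name exactly."""
    return bundle_id == bundle_id_for(app_file)


print(bundle_id_for("1233.app"))                         # com.omi.1233
print(is_consistent("search.app", "com.omi.search"))     # True
print(is_consistent("1233.app", "com.omi.desktop-dev"))  # False
```

The check applies to named test bundles only. The stock dev and prod bundle IDs listed above follow their own fixed names.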
Always format code after making changes. The pre-commit hook handles this automatically, but you can also run manually:
dart format --line-length 120 <files>
Note: Files ending in .gen.dart or .g.dart are auto-generated and should not be formatted manually.
black --line-length 120 --skip-string-normalization <files>
clang-format -i <files>
- Always commit to the current branch; never switch branches.
- Never push directly to main.
- Never merge directly from a local branch. Land changes through a PR only.
- When a change should go remote, create or use a feature branch, commit there, open/update a PR, and merge via the PR.
- Never squash merge PRs — use regular merge.
- Make individual commits per file, not bulk commits.
- The pre-commit hook auto-formats staged code — no need to format manually before committing.
- If push fails because the remote is ahead, pull with rebase first: git pull --rebase && git push.
- Never push or create PRs unless explicitly asked; commit locally by default.
- Always work in a git worktree for code changes. Use EnterWorktree at the start of a task to isolate your work.
When the user says "RELEASE", create a branch from main, make individual commits per changed file, push/create a PR, merge without squash, then switch back to main and pull.
Run the full RELEASE flow, then deploy backend to production with gh workflow run gcp_backend.yml -f environment=prod -f branch=main.
See docs/runbooks/deploy.md for deploy triggers and checks.
See docs/runbooks/logging.md for log commands.
- If a PR changes setup steps, test commands, safety rules, service boundaries, or env vars — update this file in the same PR.
- Keep AGENTS.md synced with this file. Update both in the same commit.
- Keep rules concise (one-line statements). No code examples or verbose prose in this file.
- For significant changes to architecture, core flows, or APIs, update the Mintlify docs (docs/) in the same PR. Key files: docs/doc/developer/backend/backend_deepdive.mdx (architecture), docs/doc/developer/backend/chat_system.mdx (chat), docs/doc/developer/backend/transcription.mdx (STT pipeline).
- If a PR changes how audio streaming, transcription, conversation lifecycle, speaker identification, or the listen/pusher WebSocket protocol works, update docs/doc/developer/backend/listen_pusher_pipeline.mdx in the same PR. This includes changes to timeouts, event types, processing flow, or inter-service communication between listen and pusher.
Run backend/test-preflight.sh to verify environment. Run backend/test.sh (backend) or app/test.sh (app) before committing.