A macOS-native intelligent overlay for pharmacy-terminal workflows.
Reads an RxConnect-style terminal via screen capture + OCR, reasons about the task, researches the answer, and — under earned, per-workflow autonomy — drives the terminal with synthetic input. Pixels in, keystrokes out. No backdoors.
> [!NOTE]
> Status: blueprint / design phase. Synthetic data only. Independent build: not connected to, and must not be wired to, any proprietary pharmacy system. The docs in docs/ are the source of truth; there is no application code yet.
- What Shield does
- The goal
- The concept in one picture
- How it learns the job
- Read the blueprint
- Stack
- Architectural invariants
- Decisions
## What Shield does

    reads the screen  →  reasons about the task  →  acts (under guardrails)
      (Vision OCR)       (LLM + knowledge base)     (synthetic keyboard/mouse)
It turns an unadjudicated or rejected prescription into a closed, billable (or escalated) outcome — eligibility verification, benefits investigation, reject interpretation, COB fixes, test claims, prior-auth initiation, outreach, and documentation. The integration is deliberately black-box: image in, keyboard/mouse out, more image in — exactly like a human, with no API, DB, or accessibility-tree access to the target.
## The goal

A drop-in for the closable majority of a benefits/PA tech's work: the target is to save 5 of 8 hours (worst case), leaving the human the ~3 hours that need a person. Not a human replacement. It runs end-to-end on what it can close, sends outreach on an act-then-notify basis, and routes the rest to a review queue. Oversight, approvals, and notifications run through companion macOS / watchOS / iOS apps.
## The concept in one picture

    ┌──────────── SwiftUI overlay (operator) ────────────┐
    │ proposals · research · confirmations · teach mode  │
    └───────▲───────────────────────────────────┬────────┘
            │ observe / advise                  │ typed intents
    ┌───────┴─────────── Orchestration ─────────▼────────┐
    │ task state machine · plan → execute → verify ·     │
    │ confidence gate · earned-autonomy policy           │
    └────▲────────────────▲─────────────────────┬────────┘
         │ ScreenState    │ ResearchResult      │ Action
    ┌────┴──────┐ ┌───────┴────────┐ ┌──────────▼────────┐
    │Perception │ │ Research engine│ │ Actuation         │
    │capture+OCR│ │ LLM router +   │ │ CGEvent / visual  │
    │(on-device)│ │ knowledge base │ │ aiming            │
    └───────────┘ └────────────────┘ └───────────────────┘

Division of labor: the LLM owns what & why (diagnosis, planning, drafting); deterministic code owns how & whether-it's-safe (navigation, input, interlocks, fact lookups). The LLM never emits a keystroke.
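The split between typed intents and keystrokes can be illustrated with a minimal sketch. Everything named here (`Intent`, `InputEvent`, `compile`, and the function-key bindings) is hypothetical and invented for illustration; the docs in docs/ remain the source of truth for the real domain model.

```swift
// Hypothetical intent vocabulary: the only thing the LLM side may produce.
enum Intent: Equatable {
    case openClaim(rxNumber: String)
    case enterField(name: String, value: String)
    case submitTestClaim
}

// Hypothetical low-level input events. Only deterministic code builds these.
enum InputEvent: Equatable {
    case key(String)   // a named key such as "TAB" or "ENTER"
    case text(String)  // literal characters to type
}

// Deterministic translation from intent to input events.
// The key bindings below are made up and describe no real terminal.
func compile(_ intent: Intent) -> [InputEvent] {
    switch intent {
    case .openClaim(let rxNumber):
        return [.key("F2"), .text(rxNumber), .key("ENTER")]
    case .enterField(_, let value):
        return [.text(value), .key("TAB")]
    case .submitTestClaim:
        return [.key("F9")]
    }
}

// Synthetic example: the model asks to open a claim; code decides the keys.
let events = compile(.openClaim(rxNumber: "RX-0001"))
print(events)
```

Because the model can only pick a case of `Intent`, any keystroke that reaches the terminal has passed through code that can be reviewed and tested on its own.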
## How it learns the job

Shield is taught, not fine-tuned. Knowledge enters through three channels:
| Channel | Role | Risk |
|---|---|---|
| 🎓 Interactive teaching | Primary. Teach a module on synthetic cases — teach → attempt → correct — producing a versioned Site Profile | Low (synthetic) |
| 🔬 Passive observation | Hardening. Thin real-run capture for production messiness | Higher (PHI) |
| 🔁 Learning loop | The long tail. Novel cases reasoned once, human-reviewed, promoted into Skill/Fact/Rule stores | Bounded |
Generalize to solve it once. Specialize to repeat it forever — by growing auditable tables and macros, never by drifting model weights.
See docs/TRAINABILITY.md for the full design.
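As a rough illustration of the learning loop, the sketch below promotes a reviewed candidate into a versioned store. The type names (`Store`, `Candidate`, `Review`, `KnowledgeBase`), the audit-log format, and the synthetic reject code are assumptions made for this example and are not taken from docs/TRAINABILITY.md.

```swift
// The three stores named in the learning-loop channel.
enum Store: Hashable { case skill, fact, rule }

// Hypothetical shape of something learned from a novel case.
struct Candidate {
    let store: Store
    let key: String    // e.g. a synthetic reject code
    let value: String  // the resolution reasoned out once
}

// Hypothetical outcome of human review.
enum Review {
    case approved(reviewer: String)
    case rejected(reason: String)
}

struct KnowledgeBase {
    private(set) var version = 0
    private(set) var entries: [Store: [String: String]] = [:]
    private(set) var auditLog: [String] = []

    init() {}

    // Promote only after human approval; each promotion bumps the version
    // and leaves an audit entry. No model weights are involved.
    mutating func promote(_ candidate: Candidate, review: Review) -> Bool {
        guard case .approved(let reviewer) = review else { return false }
        entries[candidate.store, default: [:]][candidate.key] = candidate.value
        version += 1
        auditLog.append("v\(version): \(candidate.key) approved by \(reviewer)")
        return true
    }

    func lookup(_ store: Store, _ key: String) -> String? {
        entries[store]?[key]
    }
}

// Synthetic data only.
var kb = KnowledgeBase()
let candidate = Candidate(store: .fact, key: "SYN-REJECT-01",
                          value: "resubmit with corrected group ID")
let rejected = kb.promote(candidate, review: .rejected(reason: "no source cited"))
let approved = kb.promote(candidate, review: .approved(reviewer: "lead-tech"))
print(rejected, approved, kb.version)
```

The point of the sketch is that specialization shows up as a new table row plus a version bump, both of which can be audited or rolled back.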
## Read the blueprint

| Doc | What it covers |
|---|---|
| 📋 docs/JOB.md | The end-to-end job Shield automates (intake → resolution → follow-up). |
| 🏗️ docs/ARCHITECTURE.md | System design: Perception / Orchestration / Research / Actuation / Teaching, frameworks, domain model, testing, security, roadmap. |
| 🧠 docs/DECISION-ARCHITECTURE.md | Logic gates vs. LLM — who controls what, plan→execute→verify, the autonomy policy, the "LLM council." |
| 💸 docs/MODEL-STRATEGY.md | Cost vs. capability: model routing, when a council pays off, OpenRouter vs. direct, PHI guardrails. |
| 🎓 docs/TRAINABILITY.md | Teaching it what a good tech knows — interactive teaching, the three channels, the three stores, long-tail saturation, the capability ladder. |
## Stack

Swift + SwiftUI · ScreenCaptureKit (capture) · Vision (on-device OCR) · CoreGraphics CGEvent + visual aiming (input) · GRDB/SQLite + SQLCipher (persistence) · tiered LLM router (cheap workhorse → top-tier judgment) over OpenRouter / direct.
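One way to picture the tiered router is the sketch below. It assumes a two-tier split and a per-endpoint BAA flag; the names `Tier`, `Endpoint`, `Request`, and `route`, and the endpoint labels, are hypothetical. The actual routing rules are covered in docs/MODEL-STRATEGY.md.

```swift
enum Tier { case workhorse, topTier }

// Hypothetical endpoint description. `hasBAA` is an assumed flag marking
// endpoints that are covered by a business associate agreement.
struct Endpoint {
    let name: String
    let tier: Tier
    let hasBAA: Bool
}

// Hypothetical request summary produced by deterministic code.
struct Request {
    let needsJudgment: Bool
    let containsPHI: Bool
}

// Pick the first endpoint of the wanted tier. PHI-touching requests are
// only ever matched to BAA endpoints; nil means "do not send".
func route(_ request: Request, among endpoints: [Endpoint]) -> Endpoint? {
    let wanted: Tier = request.needsJudgment ? .topTier : .workhorse
    return endpoints.first(where: { endpoint in
        endpoint.tier == wanted && (!request.containsPHI || endpoint.hasBAA)
    })
}

// Placeholder endpoints, not real providers or models.
let pool = [
    Endpoint(name: "cheap-no-baa", tier: .workhorse, hasBAA: false),
    Endpoint(name: "cheap-baa", tier: .workhorse, hasBAA: true),
    Endpoint(name: "top-baa", tier: .topTier, hasBAA: true),
]
let choice = route(Request(needsJudgment: false, containsPHI: true), among: pool)
print(choice?.name ?? "no eligible endpoint")
```

Returning `nil` instead of falling back to a non-BAA endpoint keeps the PHI guardrail a hard constraint instead of a preference.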
> [!WARNING]
> This is a macOS-native (Swift) project. It cannot be built or run in a Linux CI / web session; verifying Swift code requires a macOS toolchain.
## Architectural invariants

The nine load-bearing decisions:
- The LLM never emits a keystroke — it emits typed intents; the state machine executes them.
- plan → execute → verify — every actuation is bracketed by verified screen reads; re-plan on divergence.
- Confidence gate — low-confidence OCR is marked uncertain and forces confirmation, never action.
- Facts come from tables, not the model — reject codes, formularies, BIN/PCN live in a queried knowledge base.
- Autonomy is E2E but earned per-workflow — gate the few dangerous writes hard; outreach is act-then-notify.
- PHI stays on-device for perception — OCR runs locally; PHI-touching LLM calls go only to BAA endpoints.
- Pure-pixels target, no backdoors — no API/DB/AX on the target; aim clicks visually, verify visually.
- Learning grows tables/macros, never weights — interactive, synthetic-first teaching → versioned Site Profiles.
- Oversight runs through companion apps — macOS/watchOS/iOS own approvals, notifications, the review queue.
Full text in CLAUDE.md.
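The plan → execute → verify bracket and the confidence gate from the list above can be sketched together. This is an illustrative model only: `ScreenRead`, `Step`, `StepOutcome`, `FakeTerminal`, and the 0.90 threshold are assumptions, and the screen is simulated in memory with synthetic titles.

```swift
// Hypothetical result of one capture + OCR pass.
struct ScreenRead {
    let text: String
    let confidence: Double  // OCR confidence in 0...1
}

enum StepOutcome: Equatable {
    case verified           // screen matched before and after the action
    case needsConfirmation  // OCR too uncertain; ask the operator
    case replan             // screen diverged from the expectation
}

// One planned step: what the screen should show before and after acting.
struct Step {
    let expectBefore: String
    let expectAfter: String
    let act: () -> Void
}

let confidenceThreshold = 0.90  // hypothetical value, not from the docs

func run(_ step: Step, read: () -> ScreenRead) -> StepOutcome {
    // Verify the precondition; low confidence never leads to action.
    let before = read()
    guard before.confidence >= confidenceThreshold else { return .needsConfirmation }
    guard before.text == step.expectBefore else { return .replan }

    step.act()

    // Verify the postcondition with a fresh read.
    let after = read()
    guard after.confidence >= confidenceThreshold else { return .needsConfirmation }
    return after.text == step.expectAfter ? .verified : .replan
}

// In-memory stand-in for the terminal, using synthetic screen titles.
final class FakeTerminal { var title = "CLAIM LIST" }

let terminal = FakeTerminal()
let step = Step(expectBefore: "CLAIM LIST", expectAfter: "CLAIM DETAIL",
                act: { terminal.title = "CLAIM DETAIL" })
let outcome = run(step, read: { ScreenRead(text: terminal.title, confidence: 0.97) })
print(outcome)
```

In this sketch an uncertain read before the action returns early, so `act` is never called on a screen the system could not read with confidence.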
## Decisions

> [!TIP]
> Decided: pure-pixels interface (no backdoors/AX on target) · E2E autonomy with act-then-notify outreach · companion macOS/watchOS/iOS oversight apps · interactive teach mode (synthetic-first) as the training front door.
>
> Still open: real research source (clearinghouse vs. payer FHIR vs. portal) · multi-monitor capture scope · how the existing Swift/OCR code maps into ShieldKit · the thin hardening-capture mechanism under PHI constraints.