EquiSmile Known Issues

Phase 33 — Patrick deferred items: dental chart colouring, manual/follow-up booking, intervention catalogue (2026-06-05)

ID	Severity	Description	Resolution
PR33-DENTAL-CHART-COLOUR (A2)	Feature	The interactive `Fiche dentaire` arcade chart and the PDF report both marked a tooth with a single colour ("has a finding"), so the vet couldn't tell what kind of finding (caries vs fracture vs wave …) a tooth carried at a glance. Patrick's dental-form spec wanted the chart to convey finding type.	Resolved — new shared `lib/dental/finding-styles.ts` (single source of truth for colours + labels + the per-tooth finding extraction, mirroring the PDF's correct per-section logic incl. diastema bounding teeth + `present` gating). `components/dental/fiche/ToothChart.tsx` now colours each tooth by its dominant finding category (clinical priority order) via inline hex (no Tailwind-purge risk) and renders a legend of the finding types present; the tooltip lists every finding on a tooth. `lib/services/dental-report.service.ts::drawOcclusalChart` renders the same per-category colours + a wrapped colour legend under the chart. `FicheDentaire` swaps its flat `marked` set for `teethFindingsByCategory`. 12 new unit tests (`finding-styles.test.ts`); existing fiche tests unchanged.
PR33-MANUAL-FOLLOWUP-BOOKING (B5)	Feature	There was no fast way to create an appointment by hand (a phone booking) or to say "see this horse again in three weeks" — appointments only came from approved route runs or completion-triggered follow-up visit requests in the pool.	Resolved — new `lib/services/manual-booking.service.ts` creates a backing `VisitRequest` (`BOOKED`) + `CONFIRMED` `Appointment` + optional primary staff assignment in one transaction, with the duration from the shared timing model (overridable). New `POST /api/appointments` (VET+, `createManualAppointmentSchema`). New `/[locale]/appointments/new` quick-add page (customer → yard scoped to that customer → services → date/time → vet), reachable from a "New appointment" button on the appointments list. The same page doubles as the follow-up screen: the appointment detail page now has a "Book follow-up" card with 3/6/12-week presets (+ custom) that deep-link `/appointments/new` pre-filled from the visit and dated N weeks out. 7 new service tests.
PR33-INTERVENTION-CATALOGUE (B6)	Feature	The intervention catalogue was missing the types Patrick listed (Medication Delivery, Client Call, Administrative, Private), and five service durations (consultation / blood sample / osteopathy / lameness-gait / massage) were hard-coded in `visit-timing.service.ts` rather than team-editable — contradicting "make every duration editable".	Resolved — `VisitServiceType` gains `MEDICATION_DELIVERY` (on-yard, billable) + `CLIENT_CALL` / `ADMIN_TASK` / `PRIVATE` (non-visit time blocks, zero travel overhead). `PracticeSchedulingConfig` gains 11 additive columns (the 5 previously-hard-coded per-horse durations, the OTHER base + per-extra-horse, and 4 flat durations for the new types) — all defaulting to the old constants, so behaviour is unchanged for any practice that never opens the tuning UI. `visit-timing.service.ts` reads every duration from config; overhead applies only to genuine on-yard work (time blocks get none). Admin UI (`/admin/practice-config`) gains "Other on-yard services" + "Non-visit time blocks" sections; `VisitServiceChip` + `myDay.services` (EN/FR) gain the four labels. Additive migration `20260605020000_intervention_catalogue` (4 enum values + 11 columns). 11 new visit-timing tests.
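
The per-tooth colouring rule in PR33-DENTAL-CHART-COLOUR can be sketched as below. This is a minimal illustration, not the contents of `lib/dental/finding-styles.ts`. The category names, their priority order and the hex colours are assumptions chosen for the example. Only the idea is taken from the entry: pick the dominant category by clinical priority, use inline hex values, and show a legend of the categories present.

```typescript
// Hypothetical finding categories in an assumed clinical priority order
// (highest first); the real order lives in lib/dental/finding-styles.ts.
const FINDING_PRIORITY = ["fracture", "caries", "mobile", "diastema", "wave", "point"] as const;
type FindingCategory = (typeof FINDING_PRIORITY)[number];

// Illustrative colours only. Inline hex values cannot be dropped by a
// Tailwind purge, unlike dynamically built class names.
const FINDING_COLOUR: Record<FindingCategory, string> = {
  fracture: "#dc2626",
  caries: "#7c3aed",
  mobile: "#ea580c",
  diastema: "#0891b2",
  wave: "#2563eb",
  point: "#65a30d",
};

// Return the highest-priority category present on one tooth, or null.
function dominantCategory(findings: readonly FindingCategory[]): FindingCategory | null {
  for (const category of FINDING_PRIORITY) {
    if (findings.includes(category)) return category;
  }
  return null;
}

// Fill colour for a tooth; null means "render as an unmarked tooth".
function toothFill(findings: readonly FindingCategory[]): string | null {
  const category = dominantCategory(findings);
  return category === null ? null : FINDING_COLOUR[category];
}

// Legend entries: only categories that appear somewhere on the chart, in priority order.
function legendCategories(byTooth: Map<number, FindingCategory[]>): FindingCategory[] {
  const present = new Set<FindingCategory>();
  byTooth.forEach((findings) => findings.forEach((f) => present.add(f)));
  return FINDING_PRIORITY.filter((c) => present.has(c));
}
```

Keeping priority, colour and legend in one module is what lets the interactive chart and the PDF renderer agree on every tooth.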

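The duration rule in PR33-INTERVENTION-CATALOGUE can be summarised in a few lines: on-yard work pays the arrival and departure overhead, while non-visit time blocks do not. The sketch below is hypothetical throughout. The field names are illustrative, and only a subset of the service types is modelled. It also assumes that the two 15-minute terms in the 15 + n×30 + 15 dental formula (from PR30-VISIT-TIMING-DRIFT) are that overhead. The flat durations are example values, not the real defaults.

```typescript
type ServiceType = "DENTAL" | "MEDICATION_DELIVERY" | "CLIENT_CALL" | "ADMIN_TASK" | "PRIVATE";

// Non-visit time blocks carry no travel/setup overhead.
const TIME_BLOCKS: ReadonlySet<ServiceType> = new Set<ServiceType>(["CLIENT_CALL", "ADMIN_TASK", "PRIVATE"]);

// Hypothetical, trimmed-down config; the real PracticeSchedulingConfig has more columns.
interface TimingConfig {
  arrivalOverheadMinutes: number;
  departureOverheadMinutes: number;
  dentalPerHorseMinutes: number;
  flatMinutes: Record<Exclude<ServiceType, "DENTAL">, number>;
}

// Example values only (dental: 15 + n x 30 + 15; flat durations assumed).
const EXAMPLE_CONFIG: TimingConfig = {
  arrivalOverheadMinutes: 15,
  departureOverheadMinutes: 15,
  dentalPerHorseMinutes: 30,
  flatMinutes: { MEDICATION_DELIVERY: 10, CLIENT_CALL: 15, ADMIN_TASK: 30, PRIVATE: 60 },
};

function visitDurationMinutes(service: ServiceType, horseCount: number, cfg: TimingConfig): number {
  // Per-horse services scale with the herd; the other types use a flat duration.
  const core = service === "DENTAL" ? horseCount * cfg.dentalPerHorseMinutes : cfg.flatMinutes[service];
  // Overhead applies only to genuine on-yard work.
  const overhead = TIME_BLOCKS.has(service) ? 0 : cfg.arrivalOverheadMinutes + cfg.departureOverheadMinutes;
  return core + overhead;
}
```

With these example values, a 4-horse dental visit comes out at 150 minutes, the figure quoted for Kathelijne's real visits. A client call takes exactly its configured 15 minutes.
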
Phase 31 — Emergency auto-alert + stop-automation gate (2026-06-04)

ID	Severity	Description	Resolution
PR31-EMERGENCY-AUTO-ALERT	Feature	Client requirement: urgent cases must leave the standard workflow immediately — "remove from standard workflow, notify the vet immediately, summarise, ask the vet to contact the client, stop further automated scheduling until reviewed." Before this, an URGENT enquiry created an `URGENT_REVIEW` triage task but the vet was not actively paged, and nothing structurally prevented the case from being pulled into an automated route proposal / slot suggestion.	Resolved — new `lib/services/emergency.service.ts`. When `auto-triage.service.ts` classifies a request URGENT (uncertain/unknown urgency is treated as urgent too — fail-safe towards the vet), it keeps the `URGENT_REVIEW` task and calls `emergencyService.raiseUrgentAlert`: sets `VisitRequest.automationHold = true` (+ reason + timestamp) and dispatches an immediate vet alert — a concise summary (customer, horse count, yard, message excerpt, clinical flags) plus "please contact the client directly". Alert is idempotent (`urgent-alert:<visitRequestId>`, persistent idempotency table) and never throws out of intake (per-channel + outer catch → dead-letter). Recipient resolves from `VET_ALERT_EMAIL` / `VET_ALERT_WHATSAPP`, falling back to an active VET-role `Staff` email; if none is reachable the hold still applies and the miss is dead-lettered. Stop-gate: `route-proposal.service.ts` pool query now excludes `automationHold: true`; `slot-suggestion.service.ts` returns no suggestions for a held request. Clear-hold: VET-only `POST /api/visit-requests/[id]/clear-hold` (`emergencyService.clearHold` records `automationClearedById/At`) plus a "Clear hold" control + hold banner on `/[locale]/triage`. Additive Prisma migration `20260605000000_emergency_automation_hold` adds five columns + an index. The operator-facing planning workspace (`planning.service.ts`) deliberately still shows held items so the vet sees the urgent case — only automated scheduling is gated. 16 new tests (emergency service, auto-triage gate, pool exclusion, clear-hold RBAC).

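The three rules behind PR31-EMERGENCY-AUTO-ALERT can be sketched in isolation:

- Fail-safe urgency classification.
- An idempotent alert key.
- The stop-gate that keeps held requests out of automated scheduling.

In the real services the stop-gate is reported as a Prisma query filter, and the idempotency store is a persistent table. Here both are replaced by plain in-memory equivalents. The `Urgency` values and the `IdempotencyLedger` class are hypothetical.

```typescript
// Hypothetical urgency values; "UNKNOWN" stands for an uncertain classification.
type Urgency = "URGENT" | "ROUTINE" | "UNKNOWN";

interface PoolRequest {
  id: string;
  urgency: Urgency;
  automationHold: boolean;
}

// Fail-safe towards the vet: anything not clearly routine raises the alert.
function shouldRaiseUrgentAlert(urgency: Urgency): boolean {
  return urgency !== "ROUTINE";
}

// Same key shape as the entry: at most one alert per visit request.
function urgentAlertKey(visitRequestId: string): string {
  return `urgent-alert:${visitRequestId}`;
}

// In-memory stand-in for the persistent idempotency table.
class IdempotencyLedger {
  private readonly claimed = new Set<string>();

  // Returns true only the first time a key is claimed.
  claim(key: string): boolean {
    if (this.claimed.has(key)) return false;
    this.claimed.add(key);
    return true;
  }
}

// Stop-gate: automated route proposals and slot suggestions only see unheld requests.
function automatedSchedulingPool(requests: readonly PoolRequest[]): PoolRequest[] {
  return requests.filter((r) => !r.automationHold);
}
```

The planning workspace would simply skip the `automatedSchedulingPool` filter, which is how held cases stay visible to the vet.
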
Issue #143 — Vercel preview deploy red on every PR (env-var drift) (2026-05-20)

ID	Severity	Description	Resolution
ISSUE-143-VERCEL-PREVIEW-RED	Low (cosmetic)	Vercel preview deploys persistently failed on `main` and on every PR (e.g. #141, #142). Root cause traced via the build pipeline: `vercel.json` → `scripts/vercel-build.sh` → `npx next build` → page-data collection imports server routes → routes import `lib/env.ts` → `validateEnv()` throws because `DATABASE_URL` (a `z.string().min(1)`) is unset on the Preview environment. The five-check local gate (`lint` / `typecheck` / `prisma validate` / `test` / `build`) was unaffected — the failure was Vercel-side configuration only. `docs/VERCEL.md` §5 already documents this exact failure mode and the operator-side fix.	Resolved (operator action + code-side polish). Operator action (no-code): install the Neon Vercel integration per `docs/VERCEL.md` §5.1 (writes a Preview-scoped `DATABASE_URL` per PR branch), OR add `DATABASE_URL` manually with only the "Preview" checkbox ticked. Code-side: `scripts/vercel-build.sh` now runs an upfront env probe immediately before `next build` and emits a single signposted error block (with a direct pointer to `docs/VERCEL.md` §5) instead of letting `next build` abort with a noisy Zod stack trace deep in page-data collection. `lib/env.ts` was deliberately NOT softened — that fail-fast guard exists to catch the production-secret-drift class of bug (per the AUTH_SECRET incident). `docs/HANDOVER.md` § 1 also updated to inventory previously-missing Phase 17–22 vars (`WHATSAPP_TEST_NUMBER`, `NEXT_PUBLIC_MAP_ROUTING_MODE`, `SENTRY_DSN`, `VERCEL_PREVIEW_MIGRATE`, `EQUISMILE_FORCE_DEMO_SEED`).

Phase 30 — Phase 2 build, 10 rounds, 11 slices (2026-05-26 → 2026-05-27)

ID	Severity	Description	Resolution
PR30-QUICK-ANSWER-MODE	Feature	Vet had only two paths to reply to an enquiry — pre-built templates (slow for multi-question replies) or pure free-text typing on a phone between yards (also slow). Kathelijne planning call 2026-05-26: "I get a simple question, AI can answer this."	Resolved Round 2 (#163) — new `/[locale]/enquiries/[id]/answer` single-screen UI generates three AI-drafted reply options (Direct / Friendly / Detailed) via Claude Haiku, pre-fills the editor, vet edits + taps Send. Curated FAQ match surfaces as a fourth card when confidence ≥ 0.55 (Round 10 integration with § 2.11). Hard rule preserved: AI never sends; vet's tap is the only outbound path. DEMO_MODE / no-key returns deterministic mocked drafts. sessionStorage caches drafts 1h per enquiry id.
PR30-FICHE-DENTAIRE-FORM	Feature	`DentalChart` only stored `generalNotes` (free text). Vets had to rewrite the same 11-section dental form from scratch at each visit, and there was no structured path for querying findings later. The real `Fiche dentaire` PDF and flowchart docx were shared by Kathelijne on 2026-05-26.	Resolved Round 3 (#164) — new 11-section structured form at `/[locale]/horses/[id]/dental-charts/new` (Surdents / Vagues / Pointes / Escalier / Diastèmes / Slabfracture / Caries infundibulaires / Canaux pulpaires cariés / Fracture dentaire / Dent mobile / Caries périphériques). Each section defaults to "Non"; flipping it to "Oui" auto-opens that section's detail fields. The form includes a Triadan tooth picker and AI prefill from pasted text, and is persisted as `DentalChart.checklist` JSONB. Round 10 added structured-finding summary badges on past-chart rows in `HorseClinicalHistory`.
PR30-LOST-MONEY	Feature	Kathelijne's most emotional planning-call moment: "Where I think we lose a lot of money today — we forget to invoice medication that we dropped off." Real recurring financial loss because no system captured the act of "I dropped off X for Y mid-route".	Resolved Round 4 (#165) — new `SelfInvoiceTask` model + intake routing. Vet WhatsApps the practice number from her own phone ("dropped off 100ml Equimax for Cajoleur 45chf"); intake matches sender to `Staff` row, runs the text through Claude Haiku → fuzzy-matches Customer + Horse → PENDING task on `/[locale]/admin/self-invoice`. Vet edits + Confirms → real `Invoice` issued via existing `invoiceService.issue()`. Reject preserves the row for audit. Round 10 integration: voice notes with `wakeIntent=PRESCRIPTION` skip this path (clinical, not billable).
PR30-VOICE-INTAKE	Feature	WhatsApp `type=audio` messages were silently skipped at intake (`reason: 'non-text-message'`). Voice notes were invisible to the app. Kathelijne wanted voice-to-invoice + voice-to-prescription.	Resolved Round 9 (#170) + Round 10 routing — new `voiceTranscriptionService` (mock-fallback by default; production Whisper is a one-function drop-in) + pure `detectWakeWord` function. Intake transcribes the audio, runs wake-word detection, persists transcript + intent on `EnquiryMessage`. UI shows "Voice note" + "Invoice trigger" / "Prescription trigger" badges in the message thread. Round 10 wired PRESCRIPTION-intent routing to skip self-invoice creation so clinical notes don't become billable lines.
PR30-VET-PAIRING	Feature	Route planner produced one RouteRun per cluster regardless of yard size. Kathelijne's rule from the call: "3+ horses at a stable, both vets go together; 1-2 horses, we split up."	Resolved Round 5 (#166) — `planPairing` pure function in `lib/services/vet-pairing.service.ts` classifies stops as joint vs solo, distributes solo postcode-aware. When joint stops exist + ≥2 active vets exist, planner emits N parallel `RouteRun` rows sharing a `parallelGroupId`; joint stops marked `isJoint=true`. UI shows lead-vet pill + "Parallel route" + "Joint stops" badges on each card. Single-vet practice collapses to one run. Threshold + vets-per-joint tunable via Practice Config.
PR30-VISIT-TIMING-DRIFT	Bug	Three places in the codebase computed visit duration differently — `triage-rules.estimateDuration` used 30+25(n-1), route planner used n×30+15, route-constraints had unused constants. For a 4-horse dental visit Kathelijne's actual time is 150 min (15+4×30+15); the planner used 135, triage used 105. Days regularly ran past 17:30 in real operations.	Resolved Round 1 (#161) — single source of truth at `lib/services/visit-timing.service.ts:calculateVisitDurationWithConfig()`. Triage + route planner both call it. Replaced four inline formulas in `route-proposal.service.ts` with `v.estimatedDurationMinutes` reads. New `VisitServiceType` enum + `VisitRequest.services` field so dry-needling, vaccination, OTHER are addressable per visit (not just dental).
PR30-CONFIG-TUNABLE	Feature	All scheduling thresholds + timings were code constants in `lib/config/route-constraints.ts`. The practice couldn't tune them without a code change.	Resolved Round 1 (#161) + Round 10 admin UI — singleton `PracticeSchedulingConfig` row holds dayStart/End, lunch, max travel/yards/horses, all six timing coefficients, joint-visit threshold, vets-per-joint. `lib/services/practice-config.service.ts` reads via 30s in-process cache; defaults match the legacy constants so behaviour is unchanged for any practice that never opens the admin UI. New `/[locale]/admin/practice-config` page (ADMIN+) editable form propagates changes within 30s.
PR30-VETUP-IMPORT	Feature	Real VetUp PMS export Kathelijne shared (2,277 horses × 24 columns) was 60% covered by EquiSmile's schema. Production cutover required full data-shape parity.	Resolved Round 6 (#167) — additive schema parity (12 new nullable Customer + Horse columns, two new enums `AnimalSpecies` + `AnimalSex`, both with unique `vetup*Id` keys for idempotent re-import). CLI-only import at `scripts/import-vetup.ts` with `--dry-run` support. Path-safety guard refuses to read CSV files inside the repo (PII guard). Liberal date parsing for VetUp's French DD.MM.YYYY format. Defensive merge: manually-curated email/phone never erased by blank import values.
PR30-SLOT-SUGGESTION	Feature	Vet picked slots manually from the calendar. Contract § 4.2 carry-over: "system proposes morning/afternoon slots in regional rounds, for vet approval".	Resolved Round 7 (#168) — `slotSuggestionService.suggest()` computes 1–3 ranked options per visit request. JOIN_EXISTING (non-booked RouteRun within 25km in next 60 days, scored by distance + preferred-day match); NEW_DAY fallback (next preferred-day date in the window). Inline expandable panel on every `/visit-requests` row. Results cached on `VisitRequest.suggestedSlots` so re-renders are cheap.
PR30-FAQ-AI	Feature	Kathelijne wanted AI to handle common-question replies but without autonomous send (Patrick's constraint). Path forward: curate the practice's approved answers + match incoming enquiries to them.	Resolved Round 8 (#169) — new `FaqEntry` model (key, topic, aliases, EN + FR answers, audit). Two-pass matcher: lexical pre-filter (top 6 candidates by alias word overlap) → Claude Haiku picks the best (or null) with 0..1 confidence. Below 0.55 = no suggestion. 5 seeded starter FAQs (price.routine-dental, coverage.area, service.pony, emergency.process, service.frequency). Admin UI at `/[locale]/admin/faqs`. Round 10 integration: Quick Answer Mode surfaces the matched FAQ as a fourth "★ Curated answer" card.
PR30-ONPREM-VARIANT	Infrastructure	Kathelijne preferred on-prem hosting per the 2026-05-26 call ("I don't like the cloud very much").	Scoped + built Round 10 (#172) — `docker-compose.onprem.yml` packages the full stack (postgres / redis / app / n8n / caddy / cloudflared / backup-runner) for a CHF 700–1000 mini-PC. `docs/ONPREM_SETUP.md` is the 9-section one-day install runbook. Cloud variant stays canonical until Kathelijne opts in post-UAT — see `docs/ARCHITECTURE_ONPREM.md § 11` for the decision framework.
PR30-SCHEMA-MERGE-DROPPED-MODEL	Build hygiene	Multi-PR merge of Rounds 4 + 5 + 6 dropped the `SelfInvoiceTask` model definition while preserving the back-relations on Customer / Horse / Staff. Caught by `prisma generate` failing with "Type SelfInvoiceTask is neither a built-in type nor refers to another model".	Restored Round 10 — `SelfInvoiceTask` model put back into `prisma/schema.prisma`. Migration unchanged because Round 4's migration is still applied (the loss was in the schema file only). Lesson logged in `.claude/memory.md`: after multi-PR merges, `git grep "^model <NewModel>"` for each Phase-introduced model before `prisma generate`.
PR30-GITGUARDIAN-FALSE-POSITIVE	Build hygiene	GitGuardian flagged `docker-compose.onprem.yml` lines 33, 76 and 105 as "Generic Password". The literal text inside the Compose required-variable syntax (the `${VAR:?MESSAGE}` form) was matched as a credential whenever the MESSAGE happened to describe how to generate the secret.	Resolved Round 10 (commit 647555c) — replaced the descriptive error messages with the bare keyword `required`. Compose behaviour is unchanged (the same fail-fast on a missing env var). Lesson logged in `.claude/memory.md`. Future docs that describe this fix should paraphrase the original strings rather than quote them, to avoid re-tripping the same scanner on the doc PR.
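
The JOIN_EXISTING branch of PR30-SLOT-SUGGESTION can be sketched as a distance-and-date filter. The entry does not say how the 25 km radius is measured. This sketch assumes straight-line (haversine) distance from the yard to one hypothetical representative point per run. It also ranks by distance only and leaves out the preferred-day score that the real `slotSuggestionService.suggest()` combines with it.

```typescript
interface LatLng {
  lat: number;
  lng: number;
}

interface CandidateRun {
  id: string;
  date: Date;
  centre: LatLng; // assumed: one representative point per run
  booked: boolean;
}

const EARTH_RADIUS_KM = 6371;
const DAY_MS = 86_400_000;

// Great-circle distance in km (haversine formula).
function haversineKm(a: LatLng, b: LatLng): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Non-booked runs within maxKm and the next windowDays, nearest first.
function joinExistingCandidates(
  yard: LatLng,
  runs: readonly CandidateRun[],
  today: Date,
  maxKm = 25,
  windowDays = 60,
): CandidateRun[] {
  const start = today.getTime();
  const end = start + windowDays * DAY_MS;
  return runs
    .filter((r) => !r.booked)
    .filter((r) => r.date.getTime() >= start && r.date.getTime() <= end)
    .map((r) => ({ run: r, km: haversineKm(yard, r.centre) }))
    .filter((x) => x.km <= maxKm)
    .sort((x, y) => x.km - y.km)
    .map((x) => x.run);
}
```

Straight-line distance is cheap and needs no Maps API call, which matters when candidates are computed for every visit-request row. The trade-off is that it can understate real driving distance in hilly terrain.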

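Kathelijne's pairing rule in PR30-VET-PAIRING (3+ horses: both vets go together; 1-2 horses: split up) can be sketched as a pure classification step. The sketch is hypothetical in several ways:

- `planPairingSketch` is not the real `planPairing` signature.
- The "postcode-aware" distribution is approximated by sorting solo stops by postcode and cutting them into contiguous chunks, one per vet.
- The vets-per-joint setting is not modelled.

```typescript
interface Stop {
  yardId: string;
  horseCount: number;
  postcode: string;
}

interface PairingPlan {
  joint: Stop[]; // both vets attend together
  soloByVet: Stop[][]; // one list per vet (a single list for a single-vet practice)
}

function planPairingSketch(stops: readonly Stop[], activeVets: number, jointThreshold = 3): PairingPlan {
  // Single-vet practice: everything collapses to one run.
  if (activeVets < 2) return { joint: [], soloByVet: [[...stops]] };

  const joint = stops.filter((s) => s.horseCount >= jointThreshold);

  // Assumed postcode-aware split: sort by postcode, then cut into contiguous chunks
  // so each vet stays roughly in one area.
  const solo = stops
    .filter((s) => s.horseCount < jointThreshold)
    .sort((a, b) => a.postcode.localeCompare(b.postcode));
  const soloByVet: Stop[][] = Array.from({ length: activeVets }, () => [] as Stop[]);
  const chunkSize = Math.ceil(solo.length / activeVets);
  solo.forEach((stop, i) => soloByVet[Math.floor(i / chunkSize)].push(stop));

  return { joint, soloByVet };
}
```

The threshold is a parameter because the entry notes it is tunable via Practice Config.
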
Phase 29 — Free-text reply UI + Triage in desktop sidebar (2026-05-22)

ID	Severity	Description	Resolution
PR29-NO-FREE-TEXT-REPLY	Medium	Kathelijne flagged during the 2026-05-21 demo: "how do you just basically respond to an email or whatsapp — that's basic functionality". The app only had the four pre-approved template replies on `/en/triage`, and `/en/triage` itself wasn't in the desktop sidebar (only the mobile nav), so even that path was hard to find.	Resolved — new `FreeTextReplyComposer` on the enquiry detail page lets a NURSE+ operator type a reply, see a live 24-hour WhatsApp customer-service-window indicator, and send via `POST /api/enquiries/[id]/reply` → `replyService.sendReply`. Channel is chosen automatically (enquiry's own channel first, then customer's preferred, then whichever contact is populated). Outbound WhatsApp messages are logged to `EnquiryMessage` via the existing `messageLogService` path inside `whatsappService.sendTextMessage`; the operator action is recorded in `AuditLog` as `ENQUIRY_FREE_TEXT_REPLY_SENT`. Outside the 24h WhatsApp window the service refuses and the UI guides the operator to the template-reply path on `/triage`. Sidebar fix: `Triage` is now in the desktop sidebar between `Inbox` and `Enquiries` (the i18n key `nav.triage` already existed). 24 new tests (window util ×7, service ×10, API endpoint ×7); all 1415 prior tests still pass.

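The window indicator and channel choice in PR29-NO-FREE-TEXT-REPLY can be sketched as two small pure functions. Meta's customer-service window runs for 24 hours from the customer's last inbound message. The function names and the `WindowStatus` shape are hypothetical and are not the project's window util.

```typescript
const CUSTOMER_SERVICE_WINDOW_MS = 24 * 60 * 60 * 1000;

interface WindowStatus {
  open: boolean;
  closesAt: Date | null; // null when the customer has never written in
  minutesLeft: number;
}

// Free text is allowed only while the 24h window since the last inbound message is open.
function whatsappWindowStatus(lastInboundAt: Date | null, now: Date): WindowStatus {
  if (lastInboundAt === null) return { open: false, closesAt: null, minutesLeft: 0 };
  const closesAt = new Date(lastInboundAt.getTime() + CUSTOMER_SERVICE_WINDOW_MS);
  const msLeft = closesAt.getTime() - now.getTime();
  return { open: msLeft > 0, closesAt, minutesLeft: Math.max(0, Math.floor(msLeft / 60_000)) };
}

type Channel = "WHATSAPP" | "EMAIL";

// Order from the entry: enquiry's own channel, then preferred, then any populated contact.
function pickReplyChannel(
  enquiryChannel: Channel | null,
  preferred: Channel | null,
  contact: { phone?: string; email?: string },
): Channel | null {
  if (enquiryChannel) return enquiryChannel;
  if (preferred) return preferred;
  if (contact.phone) return "WHATSAPP";
  if (contact.email) return "EMAIL";
  return null;
}
```

When `open` is false, the composer would point the operator to the template-reply path, matching the behaviour described above.
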
Phase 28 — DLQ visibility + replay for failed inbound webhooks (2026-05-21)

ID	Severity	Description	Resolution
PR28-INBOUND-FAILURES-VANISH	High	The WhatsApp webhook responded 200 to Meta within ~50ms and then ran `processWhatsAppPayload(payload).catch(logger.error)` asynchronously. Any database failure inside the async chain — Neon cold-start, Prisma pool timeout, transient network blip, malformed Prisma query — was logged and then silently swallowed. Meta would not retry (it saw 200), and the customer's message would simply never appear in `/inbox`. Confirmed live during the 2026-05-21 client demo: webhook logs showed the message being parsed correctly, followed by `Can't reach database server at ep-green-dust...neon.tech:5432`, after which the message was lost forever. The email webhook had the same shape — n8n would retry, but only within its configured retry budget.	Resolved — webhook routes now enqueue async failures to the existing `FailedOperation` DLQ (Phase 14 PR D) with scopes `whatsapp-inbound` / `email-inbound`. `deadLetterService` gains a `replay(id)` method that re-runs the original intake for these scopes (outbound scopes still require manual mark-replayed). New `POST /api/admin/observability/failed-operations/[id]/replay` endpoint (ADMIN-only, audit-logged) drives the new "Replay" button on the `/admin/observability` DLQ table. Replay is idempotent — the intake services dedupe by `externalMessageId`, so re-running a row whose underlying enquiry already exists just flips the row to REPLAYED without creating duplicates. 17 new tests cover the service (6 replay branches), webhook wiring (4 scenarios), and the API endpoint (6 status paths). The failure case that bit us in the demo — Neon cold-start kills async intake — would now produce a visible PENDING row in `/admin/observability` with a "Replay" button.

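The DLQ flow in PR28-INBOUND-FAILURES-VANISH can be sketched end to end in memory. After the 200 response, background intake failures are recorded instead of only logged. Replay then re-runs intake, which dedupes on `externalMessageId`, so replaying a row twice is harmless. `IntakeStore`, `DeadLetterQueue` and `processInBackground` are hypothetical stand-ins for the intake services, the `FailedOperation` table and the webhook wiring.

```typescript
type Scope = "whatsapp-inbound" | "email-inbound";

interface InboundMessage {
  externalMessageId: string;
  body: string;
}

interface FailedOperation {
  id: string;
  scope: Scope;
  payload: InboundMessage;
  status: "PENDING" | "REPLAYED";
}

// Stand-in for the intake pipeline: dedupes on externalMessageId like the real services.
class IntakeStore {
  private readonly seen = new Set<string>();

  ingest(msg: InboundMessage): "created" | "duplicate" {
    if (this.seen.has(msg.externalMessageId)) return "duplicate";
    this.seen.add(msg.externalMessageId);
    return "created";
  }
}

class DeadLetterQueue {
  readonly rows = new Map<string, FailedOperation>();
  private nextId = 1;

  enqueue(scope: Scope, payload: InboundMessage): string {
    const id = `fo_${this.nextId++}`;
    this.rows.set(id, { id, scope, payload, status: "PENDING" });
    return id;
  }

  // Re-run intake; a duplicate simply flips the row, so repeated replays are safe.
  replay(id: string, intake: IntakeStore): "created" | "duplicate" {
    const row = this.rows.get(id);
    if (!row) throw new Error(`unknown failed operation ${id}`);
    const outcome = intake.ingest(row.payload);
    row.status = "REPLAYED";
    return outcome;
  }
}

// Webhook wiring after responding 200: a failure lands in the DLQ instead of vanishing.
function processInBackground(
  scope: Scope,
  payload: InboundMessage,
  runIntake: (p: InboundMessage) => Promise<void>,
  dlq: DeadLetterQueue,
): void {
  void runIntake(payload).catch(() => {
    dlq.enqueue(scope, payload);
  });
}
```
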
Phase 27 — Demo simulator actually writes to /inbox (2026-05-21)

ID	Severity	Description	Resolution
PR27-DEMO-SIM-DOES-NOT-WRITE-TO-INBOX	High	The `/en/demo` page buttons ("Simulate WhatsApp EN/FR", "Simulate Email EN/FR") generated realistic inbound webhook payloads and returned them as JSON, but never fed those payloads to the intake pipeline. Net effect: clicking the buttons displayed a JSON blob in the Results panel but nothing appeared in `/en/inbox` or anywhere else in the app. The architectural gap surfaced on 2026-05-21 during a live client demo when the real-WhatsApp path was blocked by Neon Free-tier auto-suspend AND the in-app simulator was the supposed fallback. Both paths failed; the demo's WhatsApp flow could not be shown.	Resolved — extracted the intake logic from `app/api/webhooks/whatsapp/route.ts` and `app/api/webhooks/email/route.ts` into `lib/services/whatsapp-intake.service.ts` (`processWhatsAppPayload`) and `lib/services/email-intake.service.ts` (`processEmailPayload`). The webhook routes are now thin wrappers — signature/rate-limit/key checks delegate to the service. The demo simulator endpoints (`/api/demo/simulate-whatsapp`, `/api/demo/simulate-email`) call the same service after generating the payload, so the message goes through the full dedup → customer resolution → enquiry create → message log → appointment matching → yard/horse matching → visit-request → auto-triage pipeline and lands in `/inbox`. The demo page UI now shows a "Created enquiry for X — Open in Inbox" summary line above the raw JSON (collapsed in a `<details>`). 22 webhook + simulator tests pass (all 1398 in the suite). Behaviour for real Meta webhooks is byte-identical; the route still calls `processWhatsAppPayload` from the same async-fire-and-forget catch as before.

Phase 26 — Commercial paperwork for client demo (2026-05-20, revised 2026-05-21)

ID	Severity	Description	Resolution
PR26-CONTRACT-DRAFT-FROM-PDF-WITH-PATRICK-V2-EDITS	Medium	Patrick's 2026-05-19 second pass on the EquiSmile contract raised four points against the in-flight revised draft (FH-ES-2026-004, 15 May 2026): (1) soften "field-service operations platform" wording, (2) add a phased roadmap section (Phase 1 MVP + stabilisation 40-55h / 30-day target, Phase 2 semi-automation, Phase 3 advanced optimisation), (3) extend the 30-day warranty to a 90-day warranty + stabilisation period and introduce an optional maintenance retainer, (4) clarify running-cost disclosure including whether the practice needs to invest in hosting. The first cut at the v2 draft in this repo was structured loosely against the PDF; the canonical draft for client review needed to preserve the PDF's section structure so the supplier and client can read the same numbering.	Resolved — `docs/CONTRACT_DRAFT_v3.md` (FH-ES-2026-005, 21 May 2026) supersedes both FH-ES-2026-004 and the in-repo v2 draft. Preserves the PDF's section structure (renumbered for two inserts) and applies all four Patrick points: § 1 wording softened ("AI-assisted workflow and scheduling system"); new § 4 three-phase roadmap (indicative, not committed); § 7.2 warranty rewritten as 90-day warranty + stabilisation with explicit inclusions (bug fixes, workflow adjustments, refinements, real-world edge cases) and exclusions (major new features, Phase 2 items, architectural redesign, external API changes outside supplier control); new § 8 post-delivery support model with three retainer tiers (Light CHF 200 / Standard CHF 350 / Premium CHF 600) plus CHF 100/hr hourly alternative; § 6 running-cost lead paragraph answering "does the practice need hosting?" directly. Appendix A carries a change-history table mapping every diff vs FH-ES-2026-004 back to which Patrick point drove it. Cross-references in the companion docs (`docs/RUNNING_COSTS.md`, `docs/SUPPORT_MODEL.md`, `docs/CLIENT_DEMO_DAY.md`) were updated to point at v3 section numbers. `CLAUDE.md` updated. The previous v2 draft was deleted.
PR26-DEMO-PATH-WAS-LAPTOP-BOUND	Low	`docs/DEMO_RUNBOOK.md` documented the Pinggy-tunnel demo path, which requires the developer's laptop to stay on, the tunnel to stay up, and the demo to happen synchronously. For a real client-facing first-demo session this is brittle: the client can't re-open the URL later, the tunnel can flake, and the demo URL changes between sessions.	Resolved — `docs/CLIENT_DEMO_DAY.md` documents the Vercel-preview path as the primary client-demo arrangement. Branch `demo/<client>-<date>` off main, Vercel Preview scope with `DEMO_MODE=true` + `VERCEL_PREVIEW_MIGRATE=true` + `EQUISMILE_LIVE_MAPS=true` + Maps keys, leave `AUTH_URL`/`NEXT_PUBLIC_APP_URL` unset (Vercel auto-derives). The runbook includes pre-flight checklist, 25-min walkthrough script, anticipated Q&A bank, mid-demo recovery table, and post-demo follow-up steps. `docs/DEMO_RUNBOOK.md` remains the laptop-bound fallback.

Phase 25 — Build hardening: SKIP_ENV_VALIDATION honoured at module-import time (2026-05-19)

ID	Severity	Description	Resolution
PR25-BUILD-IGNORES-SKIP-FLAG	Medium	`SKIP_ENV_VALIDATION=true npm run build` failed locally and in CI for every PR going back to at least the Phase 22 batch (this was the "pre-existing failure" footnote in PRs #148 and #149). Root cause: `lib/env.ts` validated `process.env` against the Zod schema at module-import time via `export const env = validateEnv()`. The `SKIP_ENV_VALIDATION=true` flag only short-circuited the standalone `scripts/check-env.ts` validator (which runs as a separate node process before `next build`), NOT the module-level validation triggered when `next build` evaluated route modules during page-data collection. Net effect: the build threw "Environment variable validation failed: DATABASE_URL is required" the moment any route module imported `lib/env`, with the misleading "Failed to collect page data for /api/appointments/[id]/cancel" wrapper. Every PR with five-check-gate language was operating with a 4-of-5 gate, not 5-of-5.	Resolved — `lib/env.ts` `validateEnv()` now checks `SKIP_ENV_VALIDATION` at the top of the function. When the flag is set, the validator supplies a placeholder `DATABASE_URL=postgresql://skip:skip@localhost:5432/skip` (only if DATABASE_URL is unset) and parses with Zod's `.optional().default(…)` fields filling the rest — no exception thrown. A `console.warn` fires so a production-runtime leak of the flag is loud (suppressed in tests). Pinned by 5 new tests in `__tests__/unit/lib/env-skip-validation.test.ts` covering: regression guard (throws when flag unset + DATABASE_URL missing), fix (does not throw when flag set + DATABASE_URL missing), placeholder application, real-DATABASE_URL preservation alongside the flag, normal-flow unchanged. Production runtime semantics identical to before — real Vercel / Docker production builds never set the flag and validate normally.

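The fix in PR25-BUILD-IGNORES-SKIP-FLAG hinges on where the flag is checked. It has to be inside `validateEnv()`, so that a module-level `export const env = validateEnv()` also honours it. The real module validates with Zod. This dependency-free sketch swaps in a hand-written check for a single required key. It takes the environment as a parameter for testability (the real function reads `process.env`), and the trimmed-down `Env` shape is hypothetical. Only the placeholder URL and the warn-on-skip behaviour come from the entry.

```typescript
// Hypothetical, trimmed-down env shape; the real Zod schema has many more keys.
type EnvSource = Record<string, string | undefined>;

interface Env {
  DATABASE_URL: string;
  NODE_ENV: string;
}

const SKIP_PLACEHOLDER_DATABASE_URL = "postgresql://skip:skip@localhost:5432/skip";

function validateEnv(source: EnvSource): Env {
  // The flag is checked inside the function, so module-level callers
  // evaluated during `next build` page-data collection also honour it.
  const input: EnvSource = { ...source };
  if (source.SKIP_ENV_VALIDATION === "true") {
    // Loud on purpose: a production-runtime leak of the flag should be visible.
    console.warn("SKIP_ENV_VALIDATION=true: environment validation is bypassed");
    // Only fill the placeholder when no real value was supplied.
    if (!input.DATABASE_URL) input.DATABASE_URL = SKIP_PLACEHOLDER_DATABASE_URL;
  }
  if (!input.DATABASE_URL) {
    throw new Error("Environment variable validation failed: DATABASE_URL is required");
  }
  return { DATABASE_URL: input.DATABASE_URL, NODE_ENV: input.NODE_ENV ?? "development" };
}
```

These checks mirror the five regression cases listed in the entry: a throw without the flag, no throw with it, the placeholder being applied, a real URL being preserved, and normal flow staying unchanged.
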
Phase 24 — Operator readiness: UAT refresh + DR drill + operator quick-start (2026-05-19)

ID	Severity	Description	Resolution
PR24-STALE-UAT	Medium	`docs/UAT_v2_VALIDATION.md` (2026-05-07) was the most recent UAT pass, against commit `7cb7efb`. Phases 17–23 have shipped since — maps cost control, unified inbox + IMAP, CSV import, WhatsApp simulator, RouteMap DirectionsService fix, Sentry sink option, WhatsApp token boot probe, pre-migrate snapshot, SW cache verification + VersionBanner. A future UAT pass had no plan for what to re-test vs. what's newly testable.	Resolved — new `docs/UAT_v3_REFRESH.md`. Lists the delta from v2 per phase, updates the status of v2's three defects (D-2 status-check, D-3 closed by Phase E `/recalls`, D-4 status-check), and adds 14 new test cases across four new sections (Maps cost, Inbox/IMAP, Admin tools, Observability/PWA). Total v3 matrix is 39 cases across 9 sections. The doc is a plan for the next live UAT pass, not a fresh validation — the actual execution needs a live deploy URL.
PR24-NO-DR-DRILL	Medium	`docs/BACKUP.md` § 4 + § 7 and `docs/OPERATIONS.md` § 4 documented restore procedures + the weekly automated `backup-restore-verify.sh` smoke test, but there was no operator-facing rehearsal book. Operators had no muscle memory for DR scenarios because nothing said "on a Tuesday morning, run these three drills."	Resolved — new `docs/DR_DRILL.md` with three quarterly-cadence rehearsal scenarios: Drill A (bad migration, uses Phase 22 pre-migrate snapshot, RTO 30 min), Drill B (disk lost overnight, uses Phase 16 nightly dump, RTO 2 h, RPO ≤ 24 h), Drill C (weekly verify failed, the meta-recovery drill that protects the recovery path itself). Each drill has scenario narrative, RTO/RPO targets, step-by-step rehearsal procedure, success criteria, and a common-failure table that maps rehearsal gotchas to production incident causes. The doc cross-references BACKUP.md + OPERATIONS.md as the reference manual rather than duplicating them.
PR24-NO-QUICKSTART	Medium	A new operator handed EquiSmile had to read 12+ docs in the order set by `CLAUDE.md`'s doc-first principle to know what to do on day 1, week 1, month 1. There was no single-page index that linked the existing runbooks in operational order without duplicating them.	Resolved — new `docs/OPERATOR_QUICKSTART.md`. One-page checklist with three time horizons (day 1: get the stack up, verify probes, sign in — 8 steps; week 1: load real data, start Meta approval, walk the simulator with Kathelijne — 9 steps; month 1: Meta cutover, first DR drill, spend baseline establishment — 10 steps). Explicit stop conditions per horizon. Standing-state reference table linking each operational topic to its canonical doc. Emergency-contacts sequence (5 scenarios → 5 doc references). Every step links to a deeper runbook for the actual procedure — this doc is the index.

Phase 23.1 — Live maps end-to-end fix (2026-05-19)

ID	Severity	Description	Resolution
PR23.1-YARD-PATCH-NO-REGEOCODE	High	`PATCH /api/yards/[id]` updated address fields without re-geocoding. The DB held stale `latitude`/`longitude`, so the route map kept rendering pins at the pre-edit location indefinitely. Compounded by `geocodingService.geocodeYard()` short-circuiting on `geocodedAt` alone — even a direct call after the edit was a no-op.	Resolved — PATCH now diffs the incoming payload against the row's existing address fields; on change it clears `latitude/longitude/geocodedAt/formattedAddress` and fires `geocodingService.geocodeYard(id)` fire-and-forget. The service's short-circuit now also requires the persisted `formattedAddress` to contain the current postcode, so address edits always re-run the geocoder. New regression test in `__tests__/unit/services/geocoding.service.test.ts`.
PR23.1-YARD-POST-NO-GEOCODE	High	`POST /api/yards` returned 201 with `latitude/longitude = null`. New yards never appeared on the route map until someone hit the batch endpoint or the route planner ran.	Resolved — POST handler now fires `geocodingService.geocodeYard(id)` fire-and-forget after create. Failures still flip `geocodeFailed=true` and surface on `/admin/maps-usage`; they never block yard creation.
PR23.1-GEOCODE-REGION-GB	Medium	`geocodingService.geocodeAddress()` passed `region=gb` to Google Geocoding. EquiSmile is Swiss; short Swiss postcodes (e.g. `1807`) could be biased toward GB look-alikes. `google-maps.client.ts` already used `ch` correctly — this was a one-line drift.	Resolved — region is now `ch` in both call sites.
PR23.1-DEMO-ROUTES-NO-STOPS	Medium	`POST /api/demo/generate-routes` created the `RouteRun` summary but never inserted any `RouteRunStop` rows. The `/route-runs/[id]` page rendered an empty stops list and a map with zero pins, so the demo's most visible button looked broken.	Resolved — endpoint now creates `RouteRunStop` rows inside a transaction with `routeRun`, mapping each ordered waypoint back to its source `VisitRequest` by coords and computing planned arrival/departure from leg durations anchored to 08:00.
PR23.1-PUBLIC-HOME-BASE-MISSING	Low	`app/[locale]/route-runs/[id]/page.tsx` read `NEXT_PUBLIC_HOME_BASE_LAT/LNG` but `.env.example` only documented the server-side `HOME_BASE_LAT/LNG`. The map always fell back to a hardcoded Blonay default; operators couldn't relocate the H marker without grepping the source.	Resolved — added `NEXT_PUBLIC_HOME_BASE_LAT/LNG` to `.env.example` next to their server-side twins with a comment explaining the duplication.

See docs/LIVE_MAPS_TEST_CHECKLIST.md for the end-to-end verification runbook and sample data.
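The PR23.1-DEMO-ROUTES-NO-STOPS fix derives planned arrival and departure times from the optimizer's leg durations, anchored to 08:00. The sketch below shows that arithmetic. It assumes leg durations arrive in seconds and that each stop carries an on-site duration (`serviceMinutes`). The source does not say how on-site time is handled, and the type and function names are hypothetical.

```typescript
interface OrderedStop {
  visitRequestId: string;
  legSecondsFromPrev: number; // drive time from the previous point (home base for the first stop)
  serviceMinutes: number;     // assumed on-site duration; not specified in the source
}

interface PlannedStop {
  visitRequestId: string;
  sequence: number;
  plannedArrival: string;   // "HH:MM"
  plannedDeparture: string; // "HH:MM"
}

const DAY_START_MINUTES = 8 * 60; // the 08:00 anchor

function toClock(minutesSinceMidnight: number): string {
  const h = Math.floor(minutesSinceMidnight / 60);
  const m = minutesSinceMidnight % 60;
  return `${String(h).padStart(2, "0")}:${String(m).padStart(2, "0")}`;
}

// Walks the ordered stops once, advancing a clock by drive time then on-site time.
function planStopTimes(stops: OrderedStop[]): PlannedStop[] {
  let clock = DAY_START_MINUTES;
  return stops.map((stop, index) => {
    clock += Math.ceil(stop.legSecondsFromPrev / 60); // round drive time up to whole minutes
    const arrival = clock;
    clock += stop.serviceMinutes;
    return {
      visitRequestId: stop.visitRequestId,
      sequence: index + 1,
      plannedArrival: toClock(arrival),
      plannedDeparture: toClock(clock),
    };
  });
}
```

Working in whole minutes since midnight keeps the sketch independent of the server's time zone. A real implementation would combine the result with the route date before writing `RouteRunStop` rows.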

Phase 23 — Go-live runbooks: WhatsApp Meta production approval + production data load (2026-05-16)

ID	Severity	Description	Resolution
PR23-NO-META-APPROVAL-DOC	Medium	`docs/OPERATIONS.md` § 1 covered WhatsApp token rotation after Meta approval but there was no operator-facing runbook for the externally-blocked work that gets you there: business verification, display name approval, template submission per template per locale, system-user token mint, webhook + verify-token install, phased cutover. Meta review is the longest external lead time on the go-live critical path (1–2 weeks typical) and "guess and check" was actively burning that timer.	Resolved — new `docs/WHATSAPP_PRODUCTION_APPROVAL.md` (10 sections). Covers the full sequence end-to-end: Swiss business verification documents (Handelsregisterauszug, VAT/UID, authorised signatory), Meta Business account, WABA creation, display name approval, all nine `lib/demo/template-registry.ts` templates × EN/FR (18 submissions) with category guidance + common rejection causes, system-user permanent token mint (cross-references `docs/OPERATIONS.md` § 1.2 rather than duplicating), webhook + verify-token install in the Meta App Dashboard, phased cutover via the Phase 20 simulator's "Send to me (real)" path, rollback via `DEMO_MODE=true`, and a failure-mode quick-reference table.
PR23-NO-DATA-LOAD-DOC	Medium	`docs/IMPORT_GUIDE.md` covered the mechanics of `/admin/import` (dry-run + commit, conflict policies, column reference) but not the upstream prep: source-data inventory across Kathelijne's existing book shapes (VetUp export, Outlook contacts, appointment diary, WhatsApp history, handwritten yard notes), practice-specific dedup + mapping decisions, data-quality pre-checks, post-load verification queries, rollback paths. The Phase 20 import UI was operationally invisible without this runbook.	Resolved — new `docs/PRODUCTION_DATA_LOAD.md` (9 sections). Covers source-data inventory, the practice-specific mapping decisions IMPORT_GUIDE deliberately stays generic about (couple-vs-single legal-entity for customers, E.164 Swiss numbers, francophone-vs-anglophone preferred language, when to leave Lat/Lng blank, owner-vs-yard-manager distinction for horses), pre-load data-quality checks, the customer→yard→horse load procedure with a manual pre-migrate snapshot bracket, a post-load SQL verification rollup, three-tier rollback (re-import / manual snapshot / nightly backup window), and a common-gotchas table.

Phase 22 — Audit tail: WhatsApp token boot probe + pre-migrate snapshot + SW cache verification (2026-05-16)

ID	Severity	Description	Resolution
PR22-MED05-WA-TOKEN-PROBE	Medium	`WHATSAPP_ACCESS_TOKEN` could be revoked by Meta (rotation forgotten, business-account owner removed, suspected compromise) and the app would only discover it when the first outbound confirmation failed — often hours later. No boot-time signal existed; `docs/OPERATIONS.md` §1 only covered the manual rotation procedure.	Resolved — new `lib/services/whatsapp-token-probe.service.ts` fires once per server start from `instrumentation.ts`. Makes a single non-state-mutating `GET https://graph.facebook.com/v21.0/<phone_number_id>` with a 5-second timeout. On HTTP 401 writes `AuditLog{action:'WHATSAPP_TOKEN_INVALID'}` and sends a once-per-UTC-day alert email via `emailService.sendBrandedEmail` to `MAPS_ALERT_EMAIL` (re-uses the Phase 17 `maybeFireSoftCapAlert` dedup pattern). Transient failures (5xx, network errors) log but never alert. Demo mode skips the probe entirely.
PR22-LOW01-PRE-MIGRATE-SNAPSHOT	Low	The nightly `backup` compose service runs `pg_dump` at 02:30 UTC, so the newest dump could be almost a day old when a migration ran. A destructive migration deployed mid-afternoon lost every write since that morning's dump, and one deployed shortly before the next nightly run could lose close to 24 hours of data; the operator's only recovery path was the last nightly dump.	Resolved — new `pre-migrate-snapshot` compose service (postgres:16-alpine) runs `docker/pre-migrate-snapshot.sh` once on every `docker compose up` immediately before `migrator`, writing a labelled `pre-migrate-<UTC>.sql.gz` into the shared `backups_data` volume. `migrator` now `depends_on: pre-migrate-snapshot: service_completed_successfully`, so the snapshot lands before any schema change. Skips cleanly on first-ever boot (empty schema). Same safety guards as the nightly backup (libpq `.pgpass`, narrow env-var whitelists, no password literals in commands). Retention reuses the nightly backup's `BACKUP_RETENTION_DAYS` sweep. Runbook in `docs/BACKUP.md` §7.
PR22-LOW03-SW-CACHE-VERIFY	Low	The audit asked "does Serwist actually invalidate after deploy?". The answer for next-navigation loads is yes — hashed-asset `__SW_MANIFEST` + `skipWaiting: true` + `clientsClaim: true` is canonically invalidation-safe. The gap was a tab that stayed open across a deploy (Kathelijne's inbox sitting open all day): the SW installs the new bundle but the existing tab silently runs the old code until the user reloads.	Resolved — verification documented in `docs/OPERATIONS.md` §7. Defensive open-tab safety net shipped on top: `scripts/write-version.ts` stamps `public/version.json` with the Git SHA at `prebuild` time; new `components/system/VersionBanner.tsx` polls `/version.json` every 5 minutes (cache-busted), captures the bootstrap SHA on first poll, and surfaces a non-modal `<div role="status" aria-live="polite">` banner with a "Refresh" button when the SHA changes. Skipped when the bootstrap SHA is `'dev'` (no production build). EN + FR i18n keys under `version.*`.
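The alerting rule in PR22-MED05-WA-TOKEN-PROBE reduces to a classification step plus a once-per-UTC-day gate. The sketch below covers only that decision logic and leaves out the Graph API request itself. The result shape and the function names are assumptions. So is the choice to treat non-401 client errors as log-only.

```typescript
// What the probe observed: an HTTP status, or a network-level failure/timeout.
type ProbeResult = { status: number } | { networkError: string };

type ProbeVerdict =
  | { kind: "ok" }
  | { kind: "token-invalid" } // write the audit row and (at most daily) email
  | { kind: "log-only"; detail: string }; // transient or unexpected: never alert

function classifyProbe(result: ProbeResult): ProbeVerdict {
  if ("networkError" in result) return { kind: "log-only", detail: result.networkError };
  if (result.status === 401) return { kind: "token-invalid" };
  if (result.status >= 200 && result.status < 300) return { kind: "ok" };
  // 5xx is transient per the issue; other statuses are treated the same way (assumption).
  return { kind: "log-only", detail: `HTTP ${result.status}` };
}

// "YYYY-MM-DD" in UTC, so the dedup window rolls over at UTC midnight.
function utcDayKey(now: Date): string {
  return now.toISOString().slice(0, 10);
}

// Email at most once per UTC day; lastAlertDay would need durable storage in practice.
function shouldEmailAlert(lastAlertDay: string | null, now: Date): boolean {
  return lastAlertDay !== utcDayKey(now);
}
```

On Node 18+ the 5-second limit on the request itself could be expressed as `fetch(url, { signal: AbortSignal.timeout(5000) })`. The resulting abort would then be reported as a `networkError`.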

Phase 21 — Audit residue: Sentry option + pool-param boot warning (2026-05-15)

ID	Severity	Description	Resolution
PR21-HIGH02-SENTRY	High	The 2026-04-18 production-readiness audit asked for Sentry-grade error tracking. Phase 16 shipped a generic webhook error sink (`lib/observability/webhook-error-sink.ts`) which covers Slack / Teams / generic log collectors — but a practice that explicitly wants Sentry's grouped issues + breadcrumbs + release tracking had no first-class path.	Resolved — new `lib/observability/sentry-error-sink.ts`. When `SENTRY_DSN` is set AND `@sentry/nextjs` is installed (operator opts in via `npm install @sentry/nextjs`), a Sentry sink registers alongside the existing webhook sink — both fire in parallel. When the SDK isn't installed the sink logs a one-time stderr warning and falls through to the webhook path. `@sentry/nextjs` stays an OPTIONAL operator install — no new hard dependency. Operator runbook in `docs/OPERATIONS.md` §6.
PR21-HIGH05-POOL-PARAMS	High	`DATABASE_URL` could ship without Prisma pool-tuning params (`connection_limit` / `pool_timeout`), leaving the app on Prisma's default 5-connection pool. Under concurrent load (WhatsApp webhooks + n8n callbacks + UI traffic) the pool would silently exhaust and requests would time out. Phase 15 documented the recipe in `docs/OPERATIONS.md` §2 but didn't enforce it.	Resolved — `lib/utils/env-check.ts` now warns at boot when `DATABASE_URL` lacks the pool params (non-demo mode only). `/api/status` exposes `probes.database.poolConfigured` + `poolMissing[]` so the gap is visible on the observability admin page. `.env.example` shows the recommended URL with `?connection_limit=10&pool_timeout=10`. The URL is never silently mutated — the operator decides.
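The PR21-HIGH05-POOL-PARAMS boot check only needs to inspect the query string of `DATABASE_URL`. The sketch below uses the WHATWG `URL` parser, which also accepts `postgresql://` URLs. The function names are hypothetical, and the real `lib/utils/env-check.ts` may differ.

```typescript
const POOL_PARAMS = ["connection_limit", "pool_timeout"] as const;

// Returns the Prisma pool params absent from DATABASE_URL (empty array = tuned).
// Throws TypeError on a malformed URL, which a boot check would report separately.
function missingPoolParams(databaseUrl: string): string[] {
  const params = new URL(databaseUrl).searchParams;
  return POOL_PARAMS.filter((name) => !params.has(name));
}

// Warn rather than rewrite: the URL is only inspected, never mutated.
function poolWarning(databaseUrl: string, demoMode: boolean): string | null {
  if (demoMode) return null;
  const missing = missingPoolParams(databaseUrl);
  return missing.length === 0
    ? null
    : `DATABASE_URL is missing ${missing.join(", ")}; see docs/OPERATIONS.md §2`;
}
```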

Phase 20 — Template UX + customer-DB import + WhatsApp simulator + road routing (2026-05-13)

ID	Severity	Description	Resolution
PR20-TEMPLATES-RAW	Medium	The `/admin/templates` editor showed raw positional `{{1}}` / `{{2}}` placeholders and required a manual Save click per language pane. Non-technical operators (such as the vet) found the screen intimidating.	Resolved — `components/admin/TemplatesAdmin.tsx` rewritten with click-to-insert placeholder pills (per-template toolbar), debounced auto-save, live validation badges (ok / missing / unknown), and a "Preview as customer" panel that renders against real DB data. Storage format unchanged (Meta API still gets positional placeholders). New `lib/utils/template-placeholders.ts` provides a round-trip-tested `{{N}}` ↔ `[name]` serialiser. New DELETE endpoint + `messageTemplateService.deleteOverride()` for the Reset-to-default button.
PR20-NO-IMPORT	High	The practice could `Export Customers` / `Export Yards` / `Export VetUp CSV` from the Customers page but had no way to upload an existing customer database — bulk onboarding required manual UI clicks per customer.	Resolved — new `/[locale]/admin/import` admin page with drag-drop CSV upload, profile selector (customers / yards / horses), dry-run preview table, and three conflict policies (skip / update / abort). Round-trips with the existing VetUp export schema. New `csv-parse.service.ts` + `csv-import.service.ts` + `app/api/admin/import/{preview,commit}/route.ts`. Atomic `$transaction` commit; ADMIN-only; file SHA-256 + per-action counts written to `AuditLog{action:'IMPORT_RUN'}`; uploaded file is NOT persisted on disk. New runbook `docs/IMPORT_GUIDE.md`.
PR20-NO-SIMULATOR	Medium	No way to test what a WhatsApp template would look like for a real customer without actually sending — the only options were "send to a real customer" or "wait until production exercises it".	Resolved — new `/[locale]/admin/simulator` page lets the admin pick template + locale + customer + (optional) appointment and see the rendered output. "Simulate send" never touches Meta and writes a `TEMPLATE_SIMULATED` audit row. "Send to me (real)" is gated on `WHATSAPP_TEST_NUMBER` env var, rate-limited 3/hour per admin, audited as `TEMPLATE_TEST_SENT`. Renders use `lib/services/template-render.service.ts` (shared with the templates editor preview).
PR20-MAP-CROSSES-LAKE	Medium	`components/maps/RouteMap.tsx` drew a `geodesic: true` straight-line `Polyline` between yards. For Vaud-side and Valais-side practice yards, the line went straight through Lake Geneva instead of routing around it via Lausanne / Évian.	Resolved — `RoutePolyline` replaced by `RouteDirections`, which uses Google's client-side `DirectionsService` (no server quota cost; included in the Maps JS API base load) for road-following polylines per leg. Per-leg results cached in `sessionStorage` keyed by `lat,lng→lat,lng` so revisits don't re-request. Falls back to a fainter geodesic line on per-leg failure. New `NEXT_PUBLIC_MAP_ROUTING_MODE` env var (`directions` in production, `straight` for demo deploys with synthetic coordinates). The optimizer's own time/distance estimates (Phase 5 + Phase 17) are unchanged — the map is a visualisation layer only.
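The `{{N}}` ↔ `[name]` round trip from PR20-TEMPLATES-RAW can be sketched as two regex replacements. Both are driven by an ordered list of placeholder names per template. The names list, the function names, and the validation shape are assumptions. `lib/utils/template-placeholders.ts` remains the actual implementation.

```typescript
// Meta stores positional placeholders ({{1}}, {{2}}, ...); the editor shows [name].
const POSITIONAL = /\{\{(\d+)\}\}/g;
const NAMED = /\[([A-Za-z_]\w*)\]/g;

function toNamed(body: string, names: string[]): string {
  return body.replace(POSITIONAL, (match: string, index: string) => {
    const name = names[Number(index) - 1];
    return name === undefined ? match : `[${name}]`; // out-of-range positions stay as-is
  });
}

function toPositional(body: string, names: string[]): string {
  return body.replace(NAMED, (match: string, name: string) => {
    const position = names.indexOf(name);
    return position === -1 ? match : `{{${position + 1}}}`; // unknown names stay as-is
  });
}

// Feeds the ok / missing / unknown badges: which expected names are absent,
// and which bracketed names the template does not define.
function validateNamed(body: string, names: string[]): { missing: string[]; unknown: string[] } {
  const used = new Set(Array.from(body.matchAll(NAMED), (m) => m[1]));
  return {
    missing: names.filter((name) => !used.has(name)),
    unknown: Array.from(used).filter((name) => !names.includes(name)),
  };
}
```

Leaving unrecognised tokens untouched in both directions is what makes the round trip lossless. Anything the serialiser does not understand survives and shows up as an "unknown" badge instead of disappearing.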

Phase 19 — Outlook setup + scope clarifications + handover runbook (2026-05-13)

ID	Severity	Description	Resolution
PR19-OUTLOOK-NO-DOC	Low	Phase 18 wired Gmail through the existing n8n IMAP workflow but no operator documentation existed for pointing the same workflow at Outlook / Microsoft 365. The build-update doc had deferred Outlook inbound to a follow-up.	Resolved — new `docs/OUTLOOK_INBOUND.md` with full IMAP + app-password setup runbook against Outlook / 365. Covers troubleshooting, "running Gmail AND Outlook simultaneously" pattern (deduped by `Message-ID` at the webhook level), and a sketched OAuth2 / Microsoft Graph upgrade path for when IMAP becomes untenable. No code changes — the existing `emailReadImap` node is provider-agnostic.
PR19-SCOPE-AMBIGUITY	Medium	Patrick's 2026-04-12 consultant review surfaced six pointed questions about what "appointment management" actually means in the MVP, a recurring stakeholder question. The contract excluded automatic time-slot proposals (§ 3.3), but the boundary between "regional grouping" and "route optimisation" was never recorded in writing for the practice operator.	Resolved — new `docs/SCOPE_CLARIFICATIONS.md` answers each of the six questions point-by-point against the as-built state (Phase 18). Records the MVP positioning ("intelligent workflow automation and scheduling assistant", not autonomous scheduler), the in-scope vs. out-of-scope register, and a path-to-yes sketch for auto AM/PM slot suggestion if Kathelijne finds manual selection painful in practice. Establishes that scope changes require a contract amendment, not a doc update.
PR19-H06-NO-HANDOVER	Medium	Contract row H-06 (source-code transfer to the practice-owned GitHub account) was marked Pending in the April-12 build-update doc. No runbook existed; the procedure lived in tribal knowledge.	Resolved — new `docs/HANDOVER.md` covers the full transfer including pre-transfer secret inventory (~40 env vars cross-referenced to `lib/env.ts`), external integration inventory (Meta WhatsApp webhook, Vercel, n8n credentials, Anthropic billing, Google Maps API key, GitHub OAuth app, GitGuardian), the GitHub transfer itself, post-transfer verification checklist, and a rollback plan (GitHub transfers are reversible within 48h by the new owner).

Phase 18 — Unified inbox + journey-planner reorder (2026-05-13)

ID	Severity	Description	Resolution
PR18-NO-INBOX	Medium	The April-12 stakeholder build-update doc promised a "unified inbox" aggregating WhatsApp + email. The `/enquiries` page existed as a triage queue but there was no operator-facing thread view; mobile nav surfaced the triage queue, not an inbox.	Resolved — new `/[locale]/inbox` page + `components/inbox/InboxView.tsx`. Thread-grouped (per-customer or per-sourceFrom for unknown senders), channel-segmented (ALL / WhatsApp / Email), search-debounced, mobile-first. Sidebar entry added; MobileNav now surfaces Inbox in place of the triage queue. Tapping a thread links to the existing `/enquiries/[id]` page which already renders the `EnquiryMessage` timeline.
PR18-N8N-STUBS	Medium	`n8n/02-inbound-email.json` shipped as noOp placeholder nodes. The webhook handler at `POST /api/webhooks/email` was complete and idempotent, but no actual mail flowed through it because the n8n pipeline didn't exist beyond skeleton form.	Resolved — workflow now contains real nodes: `emailReadImap` trigger (Gmail/365 compatible; credential configured in n8n UI), a Code node reshaping the IMAP item into the strict zod payload contract, an HTTP Request POSTing to `/api/webhooks/email` with `Authorization: Bearer ${N8N_API_KEY}`, and an IF branch routing failures to a logger. Shipped inactive — operator activates after configuring the IMAP credential. New static-analysis test (`__tests__/unit/n8n/inbound-email-workflow.test.ts`) fails CI if the workflow ever regresses to noOp stubs.
PR18-ROUTE-REORDER-MISSING	Medium	Patrick's feedback assumed the vet would adjust proposed routes before approval. The implementation only let the vet approve or reject — no drag-reorder, no sequence edit, no mobile-friendly affordance.	Resolved — new `PATCH /api/route-planning/proposals/[id]/reorder-stops` endpoint (atomic `$transaction` reorder of all stops; locks at APPROVED+) + new `components/route-runs/RouteRunStopsList.tsx` component with two reorder mechanisms: HTML5 drag-and-drop on the row, and up/down arrow buttons (accessible, touch-friendly, screen-reader-labelled). Reorder clears the per-stop `travelFromPrev*` figures (stale after a resequence) so the UI doesn't show misleading numbers; an operator can re-run route generation to refresh them. A `<BottomSheet>` wrapper opens the same reorderable list in a focused mobile drawer via a "Reorder stops" button visible only on `<lg` viewports.
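The invariants behind `PATCH /api/route-planning/proposals/[id]/reorder-stops` fit in one pure function. The endpoint refuses once the run is approved, accepts only a full permutation, and clears stale travel figures. In this sketch the status names are assumptions; only the "locks at APPROVED+" rule comes from the issue. The fields `travelFromPrevMinutes` / `travelFromPrevKm` are also assumed, standing in for the real `travelFromPrev*` columns. The real endpoint additionally wraps the writes in a `$transaction`.

```typescript
// Assumed status set; only "locks at APPROVED+" comes from the issue text.
type RouteRunStatus = "DRAFT" | "PROPOSED" | "APPROVED" | "BOOKED" | "COMPLETED";
const LOCKED_STATUSES: ReadonlySet<RouteRunStatus> = new Set<RouteRunStatus>([
  "APPROVED",
  "BOOKED",
  "COMPLETED",
]);

interface RouteStop {
  id: string;
  sequence: number;
  travelFromPrevMinutes: number | null;
  travelFromPrevKm: number | null;
}

function reorderStops(status: RouteRunStatus, stops: RouteStop[], orderedIds: string[]): RouteStop[] {
  if (LOCKED_STATUSES.has(status)) {
    throw new Error(`Route run is ${status}; stops can no longer be reordered`);
  }
  if (orderedIds.length !== stops.length || new Set(orderedIds).size !== orderedIds.length) {
    throw new Error("orderedIds must list every stop exactly once");
  }
  const byId = new Map(stops.map((stop): [string, RouteStop] => [stop.id, stop]));
  return orderedIds.map((id, index) => {
    const stop = byId.get(id);
    if (stop === undefined) throw new Error(`Unknown stop ${id}`);
    // Travel figures described the old sequence, so they are cleared rather than guessed.
    return { ...stop, sequence: index + 1, travelFromPrevMinutes: null, travelFromPrevKm: null };
  });
}
```

The drag-and-drop and up/down-arrow controls can then share this single function. Each produces a new `orderedIds` array, and validation happens in one place.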

Phase 17 — Google Maps cost-control + go-live readiness gate (2026-05-13)

ID	Severity	Description	Resolution
PR17-MAPS-UNCAPPED	High	`EQUISMILE_LIVE_MAPS=true` opened uncapped billing exposure on Google Geocoding + Route Optimization. No per-call telemetry, no daily spend cap, no operator-visible usage page. A runaway batch (or a malicious test of the geocode endpoint) could rack up unbounded cost before anyone noticed.	Resolved — new `MapsApiCall` model + `lib/services/maps-cost-tracker.service.ts` (`checkBudget` / `recordCall` / `getDailySpendUsd`). Three call sites instrumented: `googleMapsClient.geocode`, `geocodingService.geocodeAddress`, `routeOptimizerService.optimizeRoute`. Hard cap (`MAPS_DAILY_SPEND_CAP_USD`) throws `MapsBudgetExceededError` before the network call; soft cap (`MAPS_SOFT_CAP_PCT`) flags the threshold + sends a once-per-UTC-day email (`MAPS_ALERT_EMAIL`). New admin page `/[locale]/admin/maps-usage` polls `/api/admin/maps-usage` for today's spend + 7-day rollup + recent calls. Demo-mode calls are not wrapped and produce no telemetry. KI-001 (rate-limit on large batches) closes mechanically — `batchGeocodeYards` now uses budget-driven gating instead of a fixed 100ms delay.
PR17-DOCS-BACKFILL	Low	`docs/BUILD_PLAN.md` had been maintained through Phase 14; Phase 15 (Production-readiness, 2026-04-23) and Phase 16 (Overnight hardening, eight slices, 2026-04-25/27) were documented only in this file.	Resolved alongside Phase 17 entry — Phase 15 + 16 backfilled into the BUILD_PLAN phase-overview table and Phase 17 added as the new entry.
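The hard-cap / soft-cap split in PR17-MAPS-UNCAPPED comes down to comparing projected spend against two thresholds before each network call. The sketch below rests on three assumptions. The caller passes today's spend and a per-call cost estimate. `MAPS_SOFT_CAP_PCT` is a percentage such as 80. `evaluateBudget` is a hypothetical stand-in for `checkBudget`, whose real signature is not documented here.

```typescript
// Stand-in for the real error class; thrown before any request leaves the server.
class MapsBudgetExceededError extends Error {
  constructor(readonly spentUsd: number, readonly capUsd: number) {
    super(`Maps daily cap reached: $${spentUsd.toFixed(2)} of $${capUsd.toFixed(2)}`);
    this.name = "MapsBudgetExceededError";
  }
}

interface BudgetConfig {
  dailyCapUsd: number; // MAPS_DAILY_SPEND_CAP_USD
  softCapPct: number;  // MAPS_SOFT_CAP_PCT, assumed to be a percentage such as 80
}

interface BudgetDecision {
  softCapReached: boolean; // caller sends the once-per-day alert and may slow down
  projectedUsd: number;
}

function evaluateBudget(spentTodayUsd: number, callCostUsd: number, config: BudgetConfig): BudgetDecision {
  const projectedUsd = spentTodayUsd + callCostUsd;
  if (projectedUsd > config.dailyCapUsd) {
    throw new MapsBudgetExceededError(spentTodayUsd, config.dailyCapUsd);
  }
  const softCapUsd = config.dailyCapUsd * (config.softCapPct / 100);
  return { softCapReached: projectedUsd >= softCapUsd, projectedUsd };
}
```

Under this sketch, a batch geocoder (the KI-001 case) would call the check before every request. It would slow down while `softCapReached` is true and stop at the first `MapsBudgetExceededError`, returning whatever it had already resolved.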

Phase 16 — Overnight hardening, eighth slice (2026-04-27)

ID	Severity	Description	Resolution
OVH8-SOFTDEL-UI	Medium	The soft-delete infrastructure shipped across PRs #51, #52, the AuditLog parity work, and the Prisma extension was operationally invisible — operators had to `curl` the DELETE endpoints. No UI button, no confirmation flow, no toast. The feature was in practice unused, leaving the AuditLog table empty and the audit story untested in production.	Resolved — new `components/ui/DeleteEntityButton.tsx` reusable component (role-aware, modal-confirmed, toast-on-result, locale-aware redirect). Wired into the four detail pages: `app/[locale]/{customers,yards,horses,enquiries}/[id]/page.tsx`. Customer/yard/enquiry require admin; horse requires vet (mirrors the API). EN + FR i18n strings added under `softDelete.*`. 12 vitest cases regress role gating (admin vs readonly/nurse/vet/no-session), the no-one-click rule, fetch wiring, success-toast-and-redirect, error-toast-and-stay, and network-throw handling.
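The role gating that OVH8-SOFTDEL-UI's `DeleteEntityButton` mirrors from the API can be expressed as a minimum-role table. The sketch assumes a READONLY < NURSE < VET < ADMIN ranking, which also lets an admin delete horses. The issue names these roles but not their order, and the type and function names are hypothetical.

```typescript
type Role = "READONLY" | "NURSE" | "VET" | "ADMIN";
type DeletableEntity = "customer" | "yard" | "horse" | "enquiry";

// Assumed ranking; a higher rank inherits every permission of the ranks below it.
const ROLE_RANK: Record<Role, number> = { READONLY: 0, NURSE: 1, VET: 2, ADMIN: 3 };

// Minimum role per entity, as listed in OVH8-SOFTDEL-UI.
const MIN_ROLE_TO_DELETE: Record<DeletableEntity, Role> = {
  customer: "ADMIN",
  yard: "ADMIN",
  enquiry: "ADMIN",
  horse: "VET",
};

// No session means no button at all.
function canDelete(role: Role | null, entity: DeletableEntity): boolean {
  if (role === null) return false;
  return ROLE_RANK[role] >= ROLE_RANK[MIN_ROLE_TO_DELETE[entity]];
}
```

The UI check only decides whether the button renders. The DELETE endpoints still enforce the same rule server-side.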

Phase 16 — Overnight hardening, seventh slice (2026-04-27)

ID	Severity	Description	Resolution
OVH7-SETUP-EXECSYNC	Medium	PR #51 known risk #4 — `/api/setup` invoked `execSync('npx prisma migrate deploy')` and `execSync('npx tsx prisma/seed-demo.ts')` from an HTTP handler. Three problems: (1) child-process spawn from a request handler is a code-execution vector if the `DEMO_MODE` gate ever weakens; (2) `execSync` blocks the Node event loop for the full duration of the migration/seed, starving every other in-flight request; (3) error handling worked off raw stderr text, which can carry DB credentials in failure modes.	Resolved — handler rewritten to a stable 410 Gone response with operator guidance. The compose stack already runs migrations correctly via the `migrator` service; local-dev callers see `npx prisma migrate deploy && npx tsx prisma/seed-demo.ts` in the response body. `DEMO_MODE` gate retained as defence-in-depth. New `__tests__/unit/api/setup.test.ts` (5 cases) including a static-analysis regression that fails CI if `child_process`, `execSync`, `spawn`, or `fork` ever return to the route.
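The static-analysis regression in OVH7-SETUP-EXECSYNC boils down to scanning the route's source text for process-spawning identifiers. The sketch below does this without dependencies. The function name is hypothetical, and the real test in `__tests__/unit/api/setup.test.ts` may read the file and match differently.

```typescript
const FORBIDDEN_PROCESS_APIS = ["child_process", "execSync", "spawn", "fork"];

// Whole-word match, so "respawned" or "forked" do not trip the check.
// Deliberately blunt: a mention inside a comment also fails, which keeps the rule simple.
function forbiddenProcessApis(source: string): string[] {
  return FORBIDDEN_PROCESS_APIS.filter((token) => new RegExp(`\\b${token}\\b`).test(source));
}
```

Treat this as a tripwire rather than a proof. A module name assembled at runtime and passed to a dynamic `require` would get past it.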

Phase 16 — Overnight hardening (2026-04-25)

ID	Severity	Description	Resolution
OVH-AUTH-COMPLETE	High	No mechanical proof that every business `app/api/*` route is gated by a session; coverage relied on per-route audits	Resolved — `__tests__/unit/auth/auth-guard-completeness.test.ts` walks every `route.ts` under `app/api/` and asserts that every non-whitelisted path returns 401 to an unauthenticated request.
OVH-DEMO-LEAK-CLIENT	Medium	`RouteMap` read `process.env.NEXT_PUBLIC_DEMO_MODE`, baking demo-mode state into the live client bundle	Resolved — removed; client now uses absence-of-`NEXT_PUBLIC_GOOGLE_MAPS_BROWSER_KEY` + an explicit `forceStatic` prop driven by server-side runtime status.
OVH-ENQ-SOFTDEL	High	Phase 15 added soft-delete to Customer/Yard/Horse but Enquiry rows (inbound customer messages) were still hard-deletable	Resolved — `Enquiry.deletedAt` + `Enquiry.deletedById` migration `20260425000000_phase16_enquiry_softdelete_auditlog`; repository filters `deletedAt: null` by default.
OVH-NO-AUDIT-GENERIC	Medium	`SecurityAuditLog` covers security events; `TriageAuditLog` covers visit-request fields; nothing covered generic operator mutations (enquiry tombstone, route-run flips)	Resolved — generic `AuditLog` model + `lib/services/audit-log.service.ts` with redacted JSON `details`, append-only writes, best-effort failure handling.
OVH-CADDY-CSP	Medium	Caddy emitted basic security headers but no CSP — a request that bypassed the Next middleware (cached static, n8n subdomain, error page) had no CSP fallback	Resolved — Caddyfile now sets a CSP at the proxy layer mirroring `lib/security/headers.ts`, plus `Permissions-Policy`, COOP and CORP.
OVH-STATUS-SHALLOW	Medium	`/api/status` reported integration modes but did not actively probe DB / n8n / messaging readiness	Resolved — `/api/status` now runs a live `SELECT 1`, n8n `/healthz` probe (3s timeout), and per-integration readiness summaries with `missing[]` lists.
OVH-PII-RESIDUAL	Low	Two stray PII paths: full address in geocoding partial-match warning, raw error object in manual-enquiry auto-triage failure	Resolved — geocoding now logs postcode prefix only; auto-triage failure logs `error.message` against `enquiryId`.

Phase 16 — Overnight hardening, third slice (2026-04-26)

ID	Severity	Description	Resolution
OVH3-AUDITLOG-PARITY	Medium	The generic `AuditLog` table introduced in PR #51 had only one caller (`DELETE /api/enquiries/[id]` from PR #52). The pre-existing Customer / Yard / Horse soft-delete handlers (Phase 15) wrote only `SecurityAuditLog`, leaving the `AuditLog` table half-built — an operator looking up "everything that has happened to customer X" via `AuditLog.entityId` would see only enquiries.	Resolved — Customer / Yard / Horse `DELETE` handlers now dual-write to BOTH `SecurityAuditLog` (security-event timeline) AND `AuditLog` (per-entity index). New tests in `__tests__/unit/api/yards.test.ts` and `__tests__/unit/api/horses.test.ts` plus an extended `customers.test.ts` regression. Documented as a hard rule in `docs/ARCHITECTURE.md` → "Audit trail" so future contributors don't drift.

Phase 16 — Overnight hardening, second slice (2026-04-25)

ID	Severity	Description	Resolution
OVH2-COERCE-BOOL	High	`enquiryQuerySchema.includeDeleted` used `z.coerce.boolean()`. JS `Boolean()` returns true for any non-empty string — including `"false"`. `?includeDeleted=false` would have silently exposed tombstoned enquiries containing inbound customer messages. (Cursor Bugbot #c7a7eb5c.)	Resolved — replaced with `z.enum(['true','false']).transform(v => v === 'true')` matching the customer/yard/horse pattern, plus regression tests asserting `'false'` → `false` and rejection of `'1'`/`'yes'`/empty string.
OVH2-ENQ-INC-DEL-GATE	Medium	The new `includeDeleted` flag flowed unguarded through `GET /api/enquiries`. A `READONLY` user could URL-hack `?includeDeleted=true` and read tombstoned enquiry PII. (Cursor Bugbot #99773815.)	Resolved — handler now silently downgrades the flag to `false` for non-admin sessions, mirroring `app/api/customers/route.ts`. Three new vitest cases lock this in.
OVH2-N8N-PROBE-DEAD-BRANCH	Low	`probeN8n` returned `'unconfigured'` when `!env.N8N_HOST`, but `lib/env.ts` defaults `N8N_HOST` to `'localhost'` — making the branch dead code. Stacks with no n8n burned a 3-second timeout per `/api/status` poll and reported `'unreachable'`. (Cursor Bugbot #da88139d.)	Resolved — branch now keys on `!env.N8N_API_KEY`, the credential every n8n callback already fail-closes on. New regression test asserts no `fetch()` call when the key is absent.
OVH2-ENQ-DELETE-ROUTE	High	The Enquiry repository gained `delete()` / `restore()` / `hardDelete()` in PR #51 but no HTTP entry point existed — admins could delete customers/yards/horses but not misrouted spam enquiries.	Resolved — `DELETE /api/enquiries/[id]` (admin-gated, soft-delete, writes both `SecurityAuditLog{event:'ENQUIRY_DELETED'}` and the generic `AuditLog{action:'ENQUIRY_DELETED'}`). Migration `20260425100000_phase16_enquiry_audit_events` adds the new enum values.
OVH2-AUTOTRIAGE-LOG-PII	Low	Auto-triage failure log included `err.message`. Inner triage services occasionally embed raw inbound text in their error messages (e.g. `"failed to parse: '<customer message>'"`), bypassing the `maskPhone`/`maskEmail` utilities.	Resolved — log now records `errorClass` (e.g. `"ZodError"`) plus `enquiryId` only. Operators reach the full payload through redacted channels.
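The root cause of OVH2-COERCE-BOOL is plain JavaScript truthiness, which `z.coerce.boolean()` applies to the raw query string. The snippet below reproduces the bug and the strict two-value parse without pulling in zod. It illustrates the semantics of the fix, not the actual schema code. Treating an absent parameter as `false` is an assumption.

```typescript
// What z.coerce.boolean() effectively does to a query-string value.
function truthinessCoerce(raw: string): boolean {
  return Boolean(raw); // any non-empty string, including "false", is truthy
}

// Same accepted values as z.enum(['true','false']).transform(v => v === 'true').
function parseIncludeDeleted(raw: string | undefined): boolean {
  if (raw === undefined) return false; // parameter omitted (assumed default)
  if (raw !== "true" && raw !== "false") {
    throw new RangeError(`includeDeleted must be "true" or "false", got ${JSON.stringify(raw)}`);
  }
  return raw === "true";
}
```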

Phase 16 — Operational-readiness uplift (2026-04-23)

ID	Severity	Description	Resolution
PR16-NO-ERRSINK-IMPL	Medium	Phase 15 shipped a sink interface but no wireable implementation	Resolved — `lib/observability/webhook-error-sink.ts` + `instrumentation.ts` auto-register when `EQUISMILE_ERROR_WEBHOOK_URL` is set.
PR16-BACKUP-MANUAL	High	Backup was a host cron the operator had to install by hand	Resolved — `backup` compose service runs `pg_dump` on an internal cron; no host setup required.
PR16-NO-RESTORE-DRILL	Medium	No mechanical way to verify a backup is restorable	Resolved — `scripts/backup-restore-verify.sh` restores the newest dump into a scratch DB and asserts schema + row presence.
PR16-NO-OPS-UI	Medium	No operator-visible view of DLQ depth, audit activity, backup freshness	Resolved — `/api/admin/observability` + `/[locale]/admin/observability` page (admin-only).
PR16-PII-SWEEP	Low	Raw phone numbers were still logged by confirmation.service and the n8n send-whatsapp trigger	Resolved — `maskPhone()` applied to all outbound logs.

Phase 15 — Production-readiness uplift (2026-04-23)

Filed and closed during the Phase 15 PR. See PRODUCTION_READINESS.md for the updated go-live checklist.

ID	Severity	Description	Resolution
PR15-SOFT-DEL	High	Hard deletes on Customer/Yard/Horse cascaded clinical records	Resolved — `deletedAt` / `deletedById` tombstones + repo-level `deletedAt: null` default filter.
PR15-DOCKER-ENV	High	Docker compose missing Auth.js / Anthropic / `NEXT_PUBLIC_GOOGLE_MAPS_BROWSER_KEY` pass-through	Resolved — `env_file: .env` + explicit `args:` for NEXT_PUBLIC_* build-time vars.
PR15-NO-BACKUP	High	No backup script or restore runbook	Resolved — `scripts/backup-db.sh` + `docs/BACKUP.md`.
PR15-API-RATE	Medium	Only webhook & vision endpoints were rate-limited; no cap on authenticated write traffic	Resolved — middleware-level per-user API write limit (120 writes per 60 s window).
PR15-PII-LOGS	Medium	Raw phone/email in WhatsApp/email/n8n-trigger logs	Resolved — `maskPhone()` / `maskEmail()` wrapped around every outbound log.
PR15-NO-ERRSINK	Low	No hook to forward errors to Sentry / log aggregator	Resolved — `registerErrorSink()` in `lib/utils/logger.ts`.
PR15-NO-LEGAL	Low	No public privacy notice or terms page	Resolved — `/[locale]/privacy` + `/[locale]/terms` (EN + FR).
PR15-NO-TOKENOPS	Low	WhatsApp token lifecycle not documented	Resolved — `docs/OPERATIONS.md` §1.
PR15-NO-POOLTUNE	Low	Prisma pool tuning / `pool_timeout` not documented	Resolved — `docs/OPERATIONS.md` §2.
PR15-WEAK-DBPW	High	`docker-compose.yml` used `equismile_dev` as a default POSTGRES_PASSWORD	Resolved — compose now fails loud via `${POSTGRES_PASSWORD:?}` with no default; `.env.example` uses an obvious `<strong-password-here>` placeholder.
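PR15-API-RATE caps authenticated writes at 120 per user per 60 seconds. The sketch below implements that rule as a sliding window. It keeps state in memory, which matches the single-instance limitation noted under KI-007. Whether the real middleware uses a sliding or a fixed window is an assumption, and so are the class and method names.

```typescript
const WRITE_METHODS = new Set(["POST", "PUT", "PATCH", "DELETE"]);

class WriteRateLimiter {
  // userId -> timestamps (ms) of accepted writes inside the current window
  private readonly hits = new Map<string, number[]>();

  constructor(
    private readonly maxWrites = 120,
    private readonly windowMs = 60_000,
  ) {}

  // Returns false when the request should be rejected (e.g. with HTTP 429).
  allow(userId: string, method: string, nowMs: number): boolean {
    if (!WRITE_METHODS.has(method.toUpperCase())) return true; // reads are not counted
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.maxWrites;
    if (allowed) recent.push(nowMs); // rejected attempts do not extend the window
    this.hits.set(userId, recent);
    return allowed;
  }
}
```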

Active Issues

ID	Phase	Severity	Description	Workaround
~~KI-001~~	5	~~Low~~	~~Google Maps API rate limiting may cause batch geocoding to fail for large batches (50+ yards)~~	Resolved in Phase 17 — `batchGeocodeYards()` now drives request rate from `mapsCostTrackerService.checkBudget()`: slows to 200ms-per-call when the soft cap is hit, and aborts with a partial-result return when the hard cap is hit. See `docs/MAPS_COST_CONTROL.md`.
~~KI-002~~	6	~~Low~~	~~Reminder scheduling depends on `POST /api/reminders/check` being called periodically — no built-in cron~~	Resolved in Phase 12d
~~KI-003~~	7	~~Low~~	~~PWA offline queue does not retry mutations in the original submission order if multiple were queued~~	Resolved in v1.1 — see Resolved Issues table
KI-004	3	Medium	WhatsApp webhook verification requires the app to be publicly accessible — not possible in local dev	Use ngrok or similar tunnel for local WhatsApp testing
KI-005	4	Low	Auto-triage confidence scores are heuristic-based and may misclassify edge cases	Manual override is available; triage tasks created for low-confidence classifications
KI-006	9	Info	`/api/webhooks/`, `/api/n8n/`, and `/api/reminders/check` intentionally bypass session auth and stay behind the separate `N8N_API_KEY` check — by design, because n8n calls them server-to-server without a browser session. Phase 14 PR E hardened this: the key gate now FAILS CLOSED in production (HTTP 500) when `N8N_API_KEY` is unset, instead of silently accepting anonymous traffic.	No action; enforced in `middleware.ts` via `PUBLIC_PATH_PATTERNS` + `lib/utils/signature.ts#requireN8nApiKey`.
KI-007	14	Info	In-memory rate limiters (`lib/utils/rate-limit.ts`) do not share state across horizontally-scaled instances. Acceptable for the single-vet single-VPS deploy shape; promote to Redis when the deploy goes multi-node.	No action required for v1 scale.

v1.0.0 Retrospective Audit — AMBER items (2026-04-20)

Filed during the Phase Verification Plan audit. See V1_AUDIT_FINDINGS.md for the per-phase evidence tables.

ID	Phase	Severity	Description	Workaround / Recommendation
~~AMBER-01~~	7	~~Low~~	~~Demo-startup exec-bit test fails on Windows~~	Closed in-audit — guarded 3 exec-bit tests with `itPosix` helper in `__tests__/unit/infra/demo-startup.test.ts`; POSIX CI still enforces
~~AMBER-02~~	1	~~Low~~	~~Brand colour is `#1e40af` (blue) in manifest/layout/globals.css instead of `#9b214d` (maroon) specified in PHASE_1_MASTER_PROMPT § 1.2 and shown in Logo.png~~	Resolved — aligned all four code sites (globals.css, manifest.ts, layout.tsx, RouteMap.tsx) to the spec maroon `#9b214d`. Added `--color-primary-light` (`#c23b6c`) and `--color-primary-dark` (`#6f1738`) tints.
~~AMBER-03~~	2	~~Low~~	~~Seed counts below Phase 2 target (5c/8y/15h/10e/5vr vs 8/6/20/12/10)~~	Resolved by PR #17 (Phase 12d) — `seed.ts` split into production (minimal) + `seed-demo.ts` (8c/8y/20h/12e)
~~AMBER-04~~	2	~~Low~~	~~No dedicated `/visit-requests` route~~	Resolved in Phase 14 PR D — added `/[locale]/visit-requests` list page with status + urgency filters.
~~AMBER-05~~	4	~~Low~~	~~Triage dispositions split across `TriageStatus` + `PlanningStatus` + `TriageTaskType`~~	Resolved by docs in Phase 14 PR D — `docs/ARCHITECTURE.md` now carries an explicit disposition mapping table.
~~AMBER-06~~	5	~~Low~~	~~Geocoding fields on Yard lack `source`, `precision`, `formattedAddress`~~	Resolved in Phase 14 PR D — added `geocodeSource`, `geocodePrecision`, `formattedAddress` nullable columns via additive migration.
~~AMBER-07~~	5	~~Low~~	~~`RouteRun`/`RouteRunStop` used instead of master prompt's `RouteProposal`/`RouteStop`~~	Resolved by docs in Phase 14 PR D — explicit rename rationale + mapping in `docs/ARCHITECTURE.md`.
~~AMBER-08~~	6	~~Medium~~	~~Single `AppointmentStatus` enum instead of separate Booking/Confirmation/Reminder enums~~	Resolved by docs in Phase 14 PR D — rationale in `docs/ARCHITECTURE.md`; multi-send audit now captured by `ConfirmationDispatch` (AMBER-10).
AMBER-09	6	Low	No explicit `AppointmentHorse` link table; horses inferred from VisitRequest relation	Adequate if per-appointment horse metadata (order, per-horse duration) is not tracked
~~AMBER-10~~	6	~~Medium~~	~~No `ConfirmationDispatch` event log~~	Resolved in Phase 14 PR D — `ConfirmationDispatch` table + `appointmentAuditService.logConfirmationDispatch`; every send attempt (success or failure) recorded.
~~AMBER-11~~	6	~~Medium~~	~~No `AppointmentResponse` model~~	Resolved in Phase 14 PR D — `AppointmentResponse` table + `appointmentAuditService.logResponse`; captures inbound confirm/cancel/reschedule replies linked directly to the appointment.
~~AMBER-12~~	6	~~Low~~	~~No `ReminderSchedule` queue~~	Resolved by docs in Phase 14 PR D — inline timestamps + idempotent cron are adequate for single-vet scale; promotion plan documented in `docs/ARCHITECTURE.md`.
~~AMBER-13~~	6	~~Low~~	~~No `AppointmentStatusHistory` table~~	Resolved in Phase 14 PR D — `AppointmentStatusHistory` table; booking / reschedule / visit-outcome services write history rows in the same transaction as status mutations.
~~AMBER-14~~	7	~~Medium~~	~~Idempotency key store is in-memory (`processedKeys: Set<string>`) — lost on restart and not shared across instances~~	Resolved in Phase 13 — `IdempotencyKey` Prisma model + `lib/services/idempotency.service.ts` (Postgres-backed). `hasBeenProcessed`/`markAsProcessed` are now async. Survives restarts, shared across instances, 30-day TTL with `pruneExpired()` cron.
~~AMBER-15~~	7	~~Low~~	~~No dead-letter queue for permanent failures after `maxRetries`~~	Resolved in Phase 14 PR D — `FailedOperation` table + `deadLetterService`. `whatsappService` and `emailService` enqueue permanent failures; operators replay via `deadLetterService.markStatus`. Payloads scrubbed with `redact()` before storage.
~~AMBER-16~~	7	~~Low~~	~~No direct unit test for retry.ts~~	Retracted — `__tests__/unit/utils/retry.test.ts` already exists with full coverage
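
The AMBER-14 resolution (TTL-bounded keys plus a pruning cron) can be pictured with a minimal sketch. It is not the contents of `lib/services/idempotency.service.ts`: an in-memory `Map` stands in for the `IdempotencyKey` table, the class name `InMemoryIdempotencyStore` and the injectable clock are hypothetical, and the methods are synchronous for brevity even though the real `hasBeenProcessed`/`markAsProcessed` are async.

```typescript
// Hypothetical in-memory stand-in for the Postgres-backed IdempotencyKey table.
// The real service methods are async and hit the database; these are synchronous for brevity.
const TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30-day TTL, as stated in AMBER-14

interface IdempotencyRecord {
  key: string;
  processedAt: number; // epoch milliseconds
}

class InMemoryIdempotencyStore {
  private readonly records = new Map<string, IdempotencyRecord>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting 30 days.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  hasBeenProcessed(key: string): boolean {
    const record = this.records.get(key);
    // A key past its TTL counts as unseen even before pruneExpired() has run.
    return record !== undefined && this.now() - record.processedAt < TTL_MS;
  }

  markAsProcessed(key: string): void {
    // Upsert: marking an existing key again refreshes its timestamp.
    this.records.set(key, { key, processedAt: this.now() });
  }

  // Cron entry point: deletes every record at or past the TTL and returns the count.
  pruneExpired(): number {
    const cutoff = this.now() - TTL_MS;
    let removed = 0;
    this.records.forEach((record, key) => {
      if (record.processedAt <= cutoff) {
        this.records.delete(key);
        removed++;
      }
    });
    return removed;
  }
}
```

The expiry check in `hasBeenProcessed` and the deletion rule in `pruneExpired` use the same boundary, so a key never reads as "seen" after the TTL even if the prune cron is late.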

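The `redact()` step in AMBER-15 matters because `FailedOperation` rows persist whole payloads for later replay. The sketch below assumes `redact()` masks values stored under a fixed set of sensitive key names and recurses through nested objects and arrays; the key list, the `[REDACTED]` marker and the `Json` type are hypothetical, and the real helper may match on different rules.

```typescript
// Hypothetical set of sensitive field names (compared case-insensitively).
const SENSITIVE_KEYS = new Set(["phone", "email", "token", "authorization", "password"]);
const MASK = "[REDACTED]";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Returns a scrubbed deep copy; the input payload is never mutated.
function redact(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? MASK : redact(value[key]);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Returning a copy rather than scrubbing in place keeps the original payload available to the failing caller for logging decisions, while only the masked copy reaches storage.
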
Resolved Issues

ID	Phase	Description	Resolution
KI-002	6	Reminder scheduling had no built-in cron	Added `n8n/07-reminder-scheduling.json` — n8n workflow triggers `GET /api/reminders/check` every 15 minutes
KI-003	7	PWA offline queue did not retry mutations in submission order and broke on the first failure, blocking subsequent items	v1.1 PR — `lib/offline/queue-replay.ts` adds an explicit monotonic `sequence` field per queued record, sorts on replay, removes records that receive a 2xx (success) or 4xx (permanent rejection) response, retains 5xx for retry, and aborts only when the fetch itself throws (genuine offline). Service worker (`app/sw.ts`) wires through the helpers; pure logic covered by `__tests__/unit/offline/queue-replay.test.ts` (13 cases).
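
The KI-003 replay rules can be expressed as a small pure function. This is a sketch under assumptions, not the code in `lib/offline/queue-replay.ts`: apart from `sequence`, the `QueuedRecord` fields, the `send` callback (standing in for `fetch`, resolving to an HTTP status and rejecting when offline), the `isDroppable` helper and the result shape are all hypothetical. Treating any status that is neither 2xx nor 4xx as retryable is also an assumption of the sketch.

```typescript
// Hypothetical queued mutation; only `sequence` is named in KI-003.
interface QueuedRecord {
  sequence: number; // monotonic, assigned at enqueue time
  url: string;
  method: string;
  body?: string;
}

// Stand-in for fetch: resolves to the HTTP status, rejects when the network is unavailable.
type Send = (record: QueuedRecord) => Promise<number>;

interface ReplayResult {
  removed: number[];  // sequences dropped from the queue (2xx success or 4xx permanent rejection)
  retained: number[]; // sequences kept for the next replay (5xx, or never attempted after an abort)
  aborted: boolean;   // true when a send threw, i.e. the device is genuinely offline
}

const isDroppable = (status: number): boolean =>
  (status >= 200 && status < 300) || (status >= 400 && status < 500);

async function replayQueue(records: QueuedRecord[], send: Send): Promise<ReplayResult> {
  // Replay in submission order regardless of how the store returned the records.
  const ordered = records.slice().sort((a, b) => a.sequence - b.sequence);
  const result: ReplayResult = { removed: [], retained: [], aborted: false };

  for (let i = 0; i < ordered.length; i++) {
    const record = ordered[i];
    let status: number;
    try {
      status = await send(record);
    } catch {
      // Offline: stop here and keep this record plus everything after it, in order.
      result.aborted = true;
      for (let j = i; j < ordered.length; j++) result.retained.push(ordered[j].sequence);
      return result;
    }
    if (isDroppable(status)) {
      result.removed.push(record.sequence);
    } else {
      result.retained.push(record.sequence); // e.g. 5xx: retry later, but keep replaying the rest
    }
  }
  return result;
}
```

A 5xx no longer blocks the queue (the original bug); only a thrown fetch stops the loop, because in that case every later request would fail the same way.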

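AMBER-12 and KI-002 together describe an idempotent reminder check polled every 15 minutes. A minimal sketch of why re-running it is safe, assuming a per-appointment `reminderSentAt` timestamp and a 24-hour lead time (both hypothetical; the real field names and window are not stated here), with an in-memory array in place of the database:

```typescript
// Hypothetical appointment row; `startsAt` and `reminderSentAt` are assumed field names.
interface AppointmentRow {
  id: string;
  status: string; // e.g. "CONFIRMED"
  startsAt: Date;
  reminderSentAt: Date | null; // the inline timestamp that makes the check idempotent
}

const REMINDER_LEAD_MS = 24 * 60 * 60 * 1000; // assumed reminder window

// Confirmed, not yet reminded, and starting within the window.
function dueForReminder(rows: AppointmentRow[], now: Date): AppointmentRow[] {
  return rows.filter((row) => {
    const msUntilStart = row.startsAt.getTime() - now.getTime();
    return (
      row.status === "CONFIRMED" &&
      row.reminderSentAt === null &&
      msUntilStart > 0 &&
      msUntilStart <= REMINDER_LEAD_MS
    );
  });
}

// One cron tick: send each due reminder, then stamp it so later ticks skip it.
function runReminderCheck(
  rows: AppointmentRow[],
  now: Date,
  send: (row: AppointmentRow) => void,
): number {
  const due = dueForReminder(rows, now);
  for (const row of due) {
    send(row);
    row.reminderSentAt = now; // only reached if send did not throw
  }
  return due.length;
}
```

Because the stamp is written only after `send` returns, a reminder whose send throws stays unstamped and is picked up again on the next tick.
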
Conventions

Log issues discovered during development or UAT here
Include phase, severity (low/medium/high/critical), and description
Add workaround if available
Move to Resolved section when fixed, with resolution notes
Remove from Resolved after one release cycle
