Phase 33 — Patrick deferred items: dental chart colouring, manual/follow-up booking, intervention catalogue (2026-06-05)
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR33-DENTAL-CHART-COLOUR (A2) | Feature | The interactive Fiche dentaire arcade chart and the PDF report both marked a tooth with a single colour ("has a finding"), so the vet couldn't tell at a glance what kind of finding (caries vs fracture vs wave …) a tooth carried. Patrick's dental-form spec wanted the chart to convey finding type. | Resolved: new shared lib/dental/finding-styles.ts (single source of truth for colours + labels + the per-tooth finding extraction, mirroring the PDF's correct per-section logic incl. diastema bounding teeth + present gating). components/dental/fiche/ToothChart.tsx now colours each tooth by its dominant finding category (clinical priority order) via inline hex (no Tailwind-purge risk) and renders a legend of the finding types present; the tooltip lists every finding on a tooth. lib/services/dental-report.service.ts::drawOcclusalChart renders the same per-category colours + a wrapped colour legend under the chart. FicheDentaire swaps its flat marked set for teethFindingsByCategory. 12 new unit tests (finding-styles.test.ts); existing fiche tests unchanged. |
| PR33-MANUAL-FOLLOWUP-BOOKING (B5) | Feature | There was no fast way to create an appointment by hand (a phone booking) or to say "see this horse again in three weeks" — appointments only came from approved route runs or completion-triggered follow-up visit requests in the pool. | Resolved — new lib/services/manual-booking.service.ts creates a backing VisitRequest (BOOKED) + CONFIRMED Appointment + optional primary staff assignment in one transaction, with the duration from the shared timing model (overridable). New POST /api/appointments (VET+, createManualAppointmentSchema). New /[locale]/appointments/new quick-add page (customer → yard scoped to that customer → services → date/time → vet), reachable from a "New appointment" button on the appointments list. The same page doubles as the follow-up screen: the appointment detail page now has a "Book follow-up" card with 3/6/12-week presets (+ custom) that deep-link /appointments/new pre-filled from the visit and dated N weeks out. 7 new service tests. |
| PR33-INTERVENTION-CATALOGUE (B6) | Feature | The intervention catalogue was missing the types Patrick listed (Medication Delivery, Client Call, Administrative, Private), and five service durations (consultation / blood sample / osteopathy / lameness-gait / massage) were hard-coded in visit-timing.service.ts rather than team-editable, contradicting "make every duration editable". | Resolved: VisitServiceType gains MEDICATION_DELIVERY (on-yard, billable) + CLIENT_CALL / ADMIN_TASK / PRIVATE (non-visit time blocks, zero travel overhead). PracticeSchedulingConfig gains 11 additive columns (the 5 previously-hard-coded per-horse durations, the OTHER base + per-extra-horse, and 4 flat durations for the new types), all defaulting to the old constants, so behaviour is unchanged for any practice that never opens the tuning UI. visit-timing.service.ts reads every duration from config; overhead applies only to genuine on-yard work (time blocks get none). Admin UI (/admin/practice-config) gains "Other on-yard services" + "Non-visit time blocks" sections; VisitServiceChip + myDay.services (EN/FR) gain the four labels. Additive migration 20260605020000_intervention_catalogue (4 enum values + 11 columns). 11 new visit-timing tests. |
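
The "dominant finding category" rule from PR33-DENTAL-CHART-COLOUR can be sketched as below. This is an illustration only: the category names, their priority order, and the hex values are assumptions, not the contents of lib/dental/finding-styles.ts, and the helper names are hypothetical.

```typescript
// Hypothetical categories; the real list lives in lib/dental/finding-styles.ts.
type FindingCategory = "fracture" | "caries" | "diastema" | "wave" | "point";

// Assumed clinical priority, most significant first.
const PRIORITY: readonly FindingCategory[] = ["fracture", "caries", "diastema", "wave", "point"];

// Illustrative inline hex values (inline styles are not subject to CSS purging).
const COLOURS: Record<FindingCategory, string> = {
  fracture: "#dc2626",
  caries: "#f97316",
  diastema: "#eab308",
  wave: "#3b82f6",
  point: "#8b5cf6",
};

// The first category in priority order that the tooth carries wins.
function dominantCategory(findings: readonly FindingCategory[]): FindingCategory | undefined {
  for (const category of PRIORITY) {
    if (findings.indexOf(category) !== -1) return category;
  }
  return undefined;
}

// Colour for one tooth, or undefined when the tooth has no finding.
function toothColour(findings: readonly FindingCategory[]): string | undefined {
  const dominant = dominantCategory(findings);
  return dominant ? COLOURS[dominant] : undefined;
}

// Legend entries: only the categories present on the chart, in priority order.
function legendFor(byTooth: Record<string, readonly FindingCategory[]>): FindingCategory[] {
  return PRIORITY.filter((category) =>
    Object.keys(byTooth).some((tooth) => byTooth[tooth].indexOf(category) !== -1),
  );
}
```

Sharing one priority list between the chart and the legend is what keeps the interactive view and the PDF consistent in this sketch.
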
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR31-EMERGENCY-AUTO-ALERT | Feature | Client requirement: urgent cases must leave the standard workflow immediately: "remove from standard workflow, notify the vet immediately, summarise, ask the vet to contact the client, stop further automated scheduling until reviewed." Before this, an URGENT enquiry created an URGENT_REVIEW triage task but the vet was not actively paged, and nothing structurally prevented the case from being pulled into an automated route proposal / slot suggestion. | Resolved: new lib/services/emergency.service.ts. When auto-triage.service.ts classifies a request URGENT (uncertain/unknown urgency is treated as urgent too, failing safe towards the vet), it keeps the URGENT_REVIEW task and calls emergencyService.raiseUrgentAlert: sets VisitRequest.automationHold = true (+ reason + timestamp) and dispatches an immediate vet alert: a concise summary (customer, horse count, yard, message excerpt, clinical flags) plus "please contact the client directly". Alert is idempotent (urgent-alert:<visitRequestId>, persistent idempotency table) and never throws out of intake (per-channel + outer catch → dead-letter). Recipient resolves from VET_ALERT_EMAIL / VET_ALERT_WHATSAPP, falling back to an active VET-role Staff email; if none is reachable the hold still applies and the miss is dead-lettered. Stop-gate: route-proposal.service.ts pool query now excludes automationHold: true; slot-suggestion.service.ts returns no suggestions for a held request. Clear-hold: VET-only POST /api/visit-requests/[id]/clear-hold (emergencyService.clearHold records automationClearedById/At) plus a "Clear hold" control + hold banner on /[locale]/triage. Additive Prisma migration 20260605000000_emergency_automation_hold adds five columns + an index. The operator-facing planning workspace (planning.service.ts) deliberately still shows held items so the vet sees the urgent case; only automated scheduling is gated. 16 new tests (emergency service, auto-triage gate, pool exclusion, clear-hold RBAC). |
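
The hold-then-alert ordering and the stop-gate described for PR31-EMERGENCY-AUTO-ALERT can be sketched as follows. It is a minimal in-memory illustration: the `sentAlerts` map stands in for the persistent idempotency table, `outbox` stands in for the real alert channels, and the function signatures are assumptions rather than the ones in emergency.service.ts.

```typescript
interface VisitRequest {
  id: string;
  automationHold: boolean;
  automationHoldReason?: string;
  automationClearedById?: string;
}

// In-memory stand-ins (assumptions) for the idempotency table and the alert channels.
const sentAlerts: Record<string, true> = {};
const outbox: string[] = [];

// Returns true when an alert was dispatched, false when it was a duplicate.
function raiseUrgentAlert(req: VisitRequest, summary: string): boolean {
  // Apply the hold first so gating holds even if no alert can be delivered.
  req.automationHold = true;
  req.automationHoldReason = "URGENT";

  const key = "urgent-alert:" + req.id;
  if (sentAlerts[key]) return false; // already alerted for this request
  sentAlerts[key] = true;
  outbox.push(summary + " Please contact the client directly.");
  return true;
}

// Stop-gate: automated route proposals never see held requests.
function poolForRouteProposal(all: VisitRequest[]): VisitRequest[] {
  return all.filter((r) => !r.automationHold);
}

// Clearing the hold records who reviewed the case.
function clearHold(req: VisitRequest, staffId: string): void {
  req.automationHold = false;
  req.automationClearedById = staffId;
}
```
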
| ID | Severity | Description | Resolution |
|---|---|---|---|
| ISSUE-143-VERCEL-PREVIEW-RED | Low (cosmetic) | Vercel preview deploys persistently failed on main and on every PR (e.g. #141, #142). Root cause traced via the build pipeline: vercel.json → scripts/vercel-build.sh → npx next build → page-data collection imports server routes → routes import lib/env.ts → validateEnv() throws because DATABASE_URL (a z.string().min(1)) is unset on the Preview environment. The five-check local gate (lint / typecheck / prisma validate / test / build) was unaffected; the failure was Vercel-side configuration only. docs/VERCEL.md §5 already documents this exact failure mode and the operator-side fix. | Resolved (operator action + code-side polish). Operator action (no-code): install the Neon Vercel integration per docs/VERCEL.md §5.1 (writes a Preview-scoped DATABASE_URL per PR branch), OR add DATABASE_URL manually with only the "Preview" checkbox ticked. Code-side: scripts/vercel-build.sh now runs an upfront env probe immediately before next build and emits a single signposted error block (with a direct pointer to docs/VERCEL.md §5) instead of letting next build abort with a noisy Zod stack trace deep in page-data collection. lib/env.ts was deliberately NOT softened; that fail-fast guard exists to catch the production-secret-drift class of bug (per the AUTH_SECRET incident). docs/HANDOVER.md § 1 also updated to inventory previously-missing Phase 17–22 vars (WHATSAPP_TEST_NUMBER, NEXT_PUBLIC_MAP_ROUTING_MODE, SENTRY_DSN, VERCEL_PREVIEW_MIGRATE, EQUISMILE_FORCE_DEMO_SEED). |
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR30-QUICK-ANSWER-MODE | Feature | Vet had only two paths to reply to an enquiry — pre-built templates (slow for multi-question replies) or pure free-text typing on a phone between yards (also slow). Kathelijne planning call 2026-05-26: "I get a simple question, AI can answer this." | Resolved Round 2 (#163) — new /[locale]/enquiries/[id]/answer single-screen UI generates three AI-drafted reply options (Direct / Friendly / Detailed) via Claude Haiku, pre-fills the editor, vet edits + taps Send. Curated FAQ match surfaces as a fourth card when confidence ≥ 0.55 (Round 10 integration with § 2.11). Hard rule preserved: AI never sends; vet's tap is the only outbound path. DEMO_MODE / no-key returns deterministic mocked drafts. sessionStorage caches drafts 1h per enquiry id. |
| PR30-FICHE-DENTAIRE-FORM | Feature | DentalChart only stored generalNotes (free text). Vets had to rewrite the same 11-section dental form from scratch each visit; no structured query path for later. Real Fiche dentaire PDF + flowchart docx shared by Kathelijne 2026-05-26. | Resolved Round 3 (#164): new 11-section structured form at /[locale]/horses/[id]/dental-charts/new (Surdents / Vagues / Pointes / Escalier / Diastèmes / Slabfracture / Caries infundibulaires / Canaux pulpaires cariés / Fracture dentaire / Dent mobile / Caries périphériques). Each section defaults to "Non"; flipping it to "Oui" auto-opens the detail. Triadan tooth picker, AI prefill from pasted text, persisted as DentalChart.checklist JSONB. Round 10 added structured-finding summary badges on past-chart rows in HorseClinicalHistory. |
| PR30-LOST-MONEY | Feature | Kathelijne's most emotional planning-call moment: "Where I think we lose a lot of money today — we forget to invoice medication that we dropped off." Real recurring financial loss because no system captured the act of "I dropped off X for Y mid-route". | Resolved Round 4 (#165) — new SelfInvoiceTask model + intake routing. Vet WhatsApps the practice number from her own phone ("dropped off 100ml Equimax for Cajoleur 45chf"); intake matches sender to Staff row, runs the text through Claude Haiku → fuzzy-matches Customer + Horse → PENDING task on /[locale]/admin/self-invoice. Vet edits + Confirms → real Invoice issued via existing invoiceService.issue(). Reject preserves the row for audit. Round 10 integration: voice notes with wakeIntent=PRESCRIPTION skip this path (clinical, not billable). |
| PR30-VOICE-INTAKE | Feature | WhatsApp type=audio messages were silently skipped at intake (reason: 'non-text-message'). Voice notes were invisible to the app. Kathelijne wanted voice-to-invoice + voice-to-prescription. | Resolved Round 9 (#170) + Round 10 routing: new voiceTranscriptionService (mock-fallback by default; production Whisper is a one-function drop-in) + pure detectWakeWord function. Intake transcribes the audio, runs wake-word detection, persists transcript + intent on EnquiryMessage. UI shows "Voice note" + "Invoice trigger" / "Prescription trigger" badges in the message thread. Round 10 wired PRESCRIPTION-intent routing to skip self-invoice creation so clinical notes don't become billable lines. |
| PR30-VET-PAIRING | Feature | Route planner produced one RouteRun per cluster regardless of yard size. Kathelijne's rule from the call: "3+ horses at a stable, both vets go together; 1-2 horses, we split up." | Resolved Round 5 (#166) — planPairing pure function in lib/services/vet-pairing.service.ts classifies stops as joint vs solo, distributes solo postcode-aware. When joint stops exist + ≥2 active vets exist, planner emits N parallel RouteRun rows sharing a parallelGroupId; joint stops marked isJoint=true. UI shows lead-vet pill + "Parallel route" + "Joint stops" badges on each card. Single-vet practice collapses to one run. Threshold + vets-per-joint tunable via Practice Config. |
| PR30-VISIT-TIMING-DRIFT | Bug | Three places in the codebase computed visit duration differently: triage-rules.estimateDuration used 30+25(n-1), route planner used n×30+15, route-constraints had unused constants. For a 4-horse dental visit Kathelijne's actual time is 150 min (15+4×30+15); the planner used 135, triage used 105. Days regularly ran past 17:30 in real operations. | Resolved Round 1 (#161): single source of truth at lib/services/visit-timing.service.ts:calculateVisitDurationWithConfig(). Triage + route planner both call it. Replaced four inline formulas in route-proposal.service.ts with v.estimatedDurationMinutes reads. New VisitServiceType enum + VisitRequest.services field so dry-needling, vaccination, OTHER are addressable per visit (not just dental). |
| PR30-CONFIG-TUNABLE | Feature | All scheduling thresholds + timings were code constants in lib/config/route-constraints.ts. The practice couldn't tune them without a code change. | Resolved Round 1 (#161) + Round 10 admin UI: singleton PracticeSchedulingConfig row holds dayStart/End, lunch, max travel/yards/horses, all six timing coefficients, joint-visit threshold, vets-per-joint. lib/services/practice-config.service.ts reads via 30s in-process cache; defaults match the legacy constants so behaviour is unchanged for any practice that never opens the admin UI. New /[locale]/admin/practice-config page (ADMIN+) editable form propagates changes within 30s. |
| PR30-VETUP-IMPORT | Feature | Real VetUp PMS export Kathelijne shared (2,277 horses × 24 columns) was 60% covered by EquiSmile's schema. Production cutover required full data-shape parity. | Resolved Round 6 (#167) — additive schema parity (12 new nullable Customer + Horse columns, two new enums AnimalSpecies + AnimalSex, both with unique vetup*Id keys for idempotent re-import). CLI-only import at scripts/import-vetup.ts with --dry-run support. Path-safety guard refuses to read CSV files inside the repo (PII guard). Liberal date parsing for VetUp's French DD.MM.YYYY format. Defensive merge: manually-curated email/phone never erased by blank import values. |
| PR30-SLOT-SUGGESTION | Feature | Vet picked slots manually from the calendar. Contract § 4.2 carry-over: "system proposes morning/afternoon slots in regional rounds, for vet approval". | Resolved Round 7 (#168) — slotSuggestionService.suggest() computes 1–3 ranked options per visit request. JOIN_EXISTING (non-booked RouteRun within 25km in next 60 days, scored by distance + preferred-day match); NEW_DAY fallback (next preferred-day date in the window). Inline expandable panel on every /visit-requests row. Results cached on VisitRequest.suggestedSlots so re-renders are cheap. |
| PR30-FAQ-AI | Feature | Kathelijne wanted AI to handle common-question replies but without autonomous send (Patrick's constraint). Path forward: curate the practice's approved answers + match incoming enquiries to them. | Resolved Round 8 (#169) — new FaqEntry model (key, topic, aliases, EN + FR answers, audit). Two-pass matcher: lexical pre-filter (top 6 candidates by alias word overlap) → Claude Haiku picks the best (or null) with 0..1 confidence. Below 0.55 = no suggestion. 5 seeded starter FAQs (price.routine-dental, coverage.area, service.pony, emergency.process, service.frequency). Admin UI at /[locale]/admin/faqs. Round 10 integration: Quick Answer Mode surfaces the matched FAQ as a fourth "★ Curated answer" card. |
| PR30-ONPREM-VARIANT | Infrastructure | Kathelijne preferred on-prem hosting per the 2026-05-26 call ("I don't like the cloud very much"). | Scoped + built Round 10 (#172) — docker-compose.onprem.yml packages the full stack (postgres / redis / app / n8n / caddy / cloudflared / backup-runner) for a CHF 700–1000 mini-PC. docs/ONPREM_SETUP.md is the 9-section one-day install runbook. Cloud variant stays canonical until Kathelijne opts in post-UAT — see docs/ARCHITECTURE_ONPREM.md § 11 for the decision framework. |
| PR30-SCHEMA-MERGE-DROPPED-MODEL | Build hygiene | Multi-PR merge of Rounds 4 + 5 + 6 dropped the SelfInvoiceTask model definition while preserving the back-relations on Customer / Horse / Staff. Caught by prisma generate failing with "Type SelfInvoiceTask is neither a built-in type nor refers to another model". | Restored Round 10: SelfInvoiceTask model put back into prisma/schema.prisma. Migration unchanged because Round 4's migration is still applied (the loss was in the schema file only). Lesson logged in .claude/memory.md: after multi-PR merges, git grep "^model <NewModel>" for each Phase-introduced model before prisma generate. |
| PR30-GITGUARDIAN-FALSE-POSITIVE | Build hygiene | GitGuardian flagged docker-compose.onprem.yml lines 33 + 105 + 76 as "Generic Password": the literal text inside Compose required-var syntax (${VAR:?MESSAGE} form) was matched as a credential when the MESSAGE happened to describe how to generate the secret. | Resolved Round 10 (commit 647555c): replaced the descriptive error messages with the bare keyword required. Compose behaviour unchanged (same fail-fast on missing env var). Lesson logged in .claude/memory.md. Future docs that describe this fix should paraphrase the original strings rather than quote them, to avoid re-tripping the same scanner on the doc PR. |
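
The arithmetic behind PR30-VISIT-TIMING-DRIFT can be checked with a short sketch. The two legacy formulas and the 15 + n×30 + 15 figure come from the row above; the config field names and the `visitMinutes` helper are hypothetical and are not the signature of calculateVisitDurationWithConfig().

```typescript
// Legacy formulas quoted in the issue description (n = number of horses).
const legacyTriageMinutes = (n: number): number => 30 + 25 * (n - 1);
const legacyPlannerMinutes = (n: number): number => n * 30 + 15;

// Hypothetical config shape: fixed time before and after the per-horse work.
interface TimingConfig {
  overheadBeforeMinutes: number;
  perHorseMinutes: number;
  overheadAfterMinutes: number;
}

// Defaults chosen to reproduce the 15 + n×30 + 15 figure for a dental visit.
const DENTAL_DEFAULTS: TimingConfig = {
  overheadBeforeMinutes: 15,
  perHorseMinutes: 30,
  overheadAfterMinutes: 15,
};

// One formula, read by every caller, so triage and planner cannot drift apart.
function visitMinutes(horses: number, cfg: TimingConfig = DENTAL_DEFAULTS): number {
  return cfg.overheadBeforeMinutes + horses * cfg.perHorseMinutes + cfg.overheadAfterMinutes;
}

console.log(legacyTriageMinutes(4), legacyPlannerMinutes(4), visitMinutes(4)); // 105 135 150
```

For a 4-horse visit the legacy estimates fall 45 and 15 minutes short of the 150-minute figure, which is consistent with days overrunning.
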
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR29-NO-FREE-TEXT-REPLY | Medium | Kathelijne flagged during the 2026-05-21 demo: "how do you just basically respond to an email or whatsapp, that's basic functionality". The app only had the four pre-approved template replies on /en/triage, and /en/triage itself wasn't in the desktop sidebar (only the mobile nav), so even that path was hard to find. | Resolved: new FreeTextReplyComposer on the enquiry detail page lets a NURSE+ operator type a reply, see a live 24-hour WhatsApp customer-service-window indicator, and send via POST /api/enquiries/[id]/reply → replyService.sendReply. Channel is chosen automatically (enquiry's own channel first, then customer's preferred, then whichever contact is populated). Outbound WhatsApp messages are logged to EnquiryMessage via the existing messageLogService path inside whatsappService.sendTextMessage; the operator action is recorded in AuditLog as ENQUIRY_FREE_TEXT_REPLY_SENT. Outside the 24h WhatsApp window the service refuses and the UI guides the operator to the template-reply path on /triage. Sidebar fix: Triage is now in the desktop sidebar between Inbox and Enquiries (the i18n key nav.triage already existed). 24 new tests (window util ×7, service ×10, API endpoint ×7) plus all 1415 prior. |
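
The window check and channel fallback from PR29-NO-FREE-TEXT-REPLY might look roughly like this. It is a sketch under assumptions: the function names and the `ReplyContext` shape are hypothetical, the window is treated as closing exactly 24 hours after the last inbound message, and a channel is only picked when a matching contact detail exists.

```typescript
type Channel = "WHATSAPP" | "EMAIL";

const WINDOW_MS = 24 * 60 * 60 * 1000;

// True while a free-text WhatsApp reply is still allowed (assumed exclusive upper bound).
function isWithinServiceWindow(lastInboundAt: Date | null, now: Date): boolean {
  if (lastInboundAt === null) return false;
  const elapsed = now.getTime() - lastInboundAt.getTime();
  return elapsed >= 0 && elapsed < WINDOW_MS;
}

// Hypothetical view of what the reply service knows about the enquiry and customer.
interface ReplyContext {
  enquiryChannel?: Channel;
  preferredChannel?: Channel;
  phone?: string;
  email?: string;
}

function canReach(channel: Channel, ctx: ReplyContext): boolean {
  return channel === "WHATSAPP" ? Boolean(ctx.phone) : Boolean(ctx.email);
}

// Enquiry's own channel first, then the customer's preference, then any populated contact.
function chooseChannel(ctx: ReplyContext): Channel | null {
  const candidates: Array<Channel | undefined> = [
    ctx.enquiryChannel,
    ctx.preferredChannel,
    "WHATSAPP",
    "EMAIL",
  ];
  for (const candidate of candidates) {
    if (candidate !== undefined && canReach(candidate, ctx)) return candidate;
  }
  return null;
}
```
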
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR28-INBOUND-FAILURES-VANISH | High | The WhatsApp webhook responded 200 to Meta within ~50ms and then ran processWhatsAppPayload(payload).catch(logger.error) asynchronously. Any database failure inside the async chain (Neon cold-start, Prisma pool timeout, transient network blip, malformed Prisma query) was logged and then silently swallowed. Meta would not retry (it saw 200), and the customer's message would simply never appear in /inbox. Confirmed live during the 2026-05-21 client demo: webhook logs showed the message being parsed correctly, followed by Can't reach database server at ep-green-dust...neon.tech:5432, after which the message was lost forever. The email webhook had the same shape: n8n would retry, but only within its configured retry budget. | Resolved: webhook routes now enqueue async failures to the existing FailedOperation DLQ (Phase 14 PR D) with scopes whatsapp-inbound / email-inbound. deadLetterService gains a replay(id) method that re-runs the original intake for these scopes (outbound scopes still require manual mark-replayed). New POST /api/admin/observability/failed-operations/[id]/replay endpoint (ADMIN-only, audit-logged) drives the new "Replay" button on the /admin/observability DLQ table. Replay is idempotent: the intake services dedupe by externalMessageId, so re-running a row whose underlying enquiry already exists just flips the row to REPLAYED without creating duplicates. 17 new tests cover the service (6 replay branches), webhook wiring (4 scenarios), and the API endpoint (6 status paths). The failure case that bit us in the demo (Neon cold-start kills async intake) would now produce a visible PENDING row in /admin/observability with a "Replay" button. |
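
The dead-letter and replay behaviour for PR28-INBOUND-FAILURES-VANISH can be illustrated with a synchronous, in-memory sketch. The real webhook acknowledges first and processes asynchronously against the database; here a `dbUp` flag simulates the outage, and the `dlq` array, `intake`, `handleWebhook`, and `replay` shapes are assumptions for illustration only.

```typescript
type Status = "PENDING" | "REPLAYED";

interface InboundPayload {
  externalMessageId: string;
  text: string;
}

interface FailedOperation {
  id: number;
  scope: string;
  payload: InboundPayload;
  status: Status;
}

// In-memory stand-ins for the FailedOperation table and the enquiry store.
const dlq: FailedOperation[] = [];
const enquiriesByMessageId: Record<string, string> = {};

// Intake dedupes on externalMessageId, which is what makes replay idempotent.
function intake(payload: InboundPayload, dbUp: boolean): void {
  if (!dbUp) throw new Error("Can't reach database server");
  if (enquiriesByMessageId[payload.externalMessageId] !== undefined) return;
  enquiriesByMessageId[payload.externalMessageId] = payload.text;
}

// The webhook always answers 200; a failed intake becomes a visible PENDING row.
function handleWebhook(payload: InboundPayload, dbUp: boolean): number {
  try {
    intake(payload, dbUp);
  } catch (err) {
    dlq.push({ id: dlq.length + 1, scope: "whatsapp-inbound", payload, status: "PENDING" });
  }
  return 200;
}

// Re-runs the original intake; if it throws again the row stays PENDING.
function replay(id: number, dbUp: boolean): Status {
  const row = dlq.filter((r) => r.id === id)[0];
  if (!row) throw new Error("Unknown failed operation " + id);
  intake(row.payload, dbUp);
  row.status = "REPLAYED";
  return row.status;
}
```
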
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR27-DEMO-SIM-DOES-NOT-WRITE-TO-INBOX | High | The /en/demo page buttons ("Simulate WhatsApp EN/FR", "Simulate Email EN/FR") generated realistic inbound webhook payloads and returned them as JSON, but never fed those payloads to the intake pipeline. Net effect: clicking the buttons displayed a JSON blob in the Results panel but nothing appeared in /en/inbox or anywhere else in the app. The architectural gap surfaced on 2026-05-21 during a live client demo when the real-WhatsApp path was blocked by Neon Free-tier auto-suspend AND the in-app simulator was the supposed fallback. Both paths failed; the demo's WhatsApp flow could not be shown. | Resolved: extracted the intake logic from app/api/webhooks/whatsapp/route.ts and app/api/webhooks/email/route.ts into lib/services/whatsapp-intake.service.ts (processWhatsAppPayload) and lib/services/email-intake.service.ts (processEmailPayload). The webhook routes are now thin wrappers: they run the signature/rate-limit/key checks and then delegate to the service. The demo simulator endpoints (/api/demo/simulate-whatsapp, /api/demo/simulate-email) call the same service after generating the payload, so the message goes through the full dedup → customer resolution → enquiry create → message log → appointment matching → yard/horse matching → visit-request → auto-triage pipeline and lands in /inbox. The demo page UI now shows a "Created enquiry for X — Open in Inbox" summary line above the raw JSON (collapsed in a <details>). 22 webhook + simulator tests pass (all 1398 in the suite). Behaviour for real Meta webhooks is byte-identical; the route still calls processWhatsAppPayload from the same async-fire-and-forget catch as before. |
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR26-CONTRACT-DRAFT-FROM-PDF-WITH-PATRICK-V2-EDITS | Medium | Patrick's 2026-05-19 second pass on the EquiSmile contract raised four points against the in-flight revised draft (FH-ES-2026-004, 15 May 2026): (1) soften "field-service operations platform" wording, (2) add a phased roadmap section (Phase 1 MVP + stabilisation 40-55h / 30-day target, Phase 2 semi-automation, Phase 3 advanced optimisation), (3) extend the 30-day warranty to a 90-day warranty + stabilisation period and introduce an optional maintenance retainer, (4) clarify running-cost disclosure including whether the practice needs to invest in hosting. The first cut at the v2 draft in this repo was structured loosely against the PDF; the canonical draft for client review needed to preserve the PDF's section structure so the supplier and client can read the same numbering. | Resolved: docs/CONTRACT_DRAFT_v3.md (FH-ES-2026-005, 21 May 2026) supersedes both FH-ES-2026-004 and the in-repo v2 draft. Preserves the PDF's section structure (renumbered for two inserts) and applies all four Patrick points: § 1 wording softened ("AI-assisted workflow and scheduling system"); new § 4 three-phase roadmap (indicative, not committed); § 7.2 warranty rewritten as 90-day warranty + stabilisation with explicit inclusions (bug fixes, workflow adjustments, refinements, real-world edge cases) and exclusions (major new features, Phase 2 items, architectural redesign, external API changes outside supplier control); new § 8 post-delivery support model with three retainer tiers (Light CHF 200 / Standard CHF 350 / Premium CHF 600) plus CHF 100/hr hourly alternative; § 6 running-cost lead paragraph answering "does the practice need hosting?" directly. Appendix A carries a change-history table mapping every diff vs FH-ES-2026-004 back to which Patrick point drove it. Cross-references in the companion docs (docs/RUNNING_COSTS.md, docs/SUPPORT_MODEL.md, docs/CLIENT_DEMO_DAY.md) updated to point at v3 section numbers. CLAUDE.md updated. Previous v2 draft deleted. |
| PR26-DEMO-PATH-WAS-LAPTOP-BOUND | Low | docs/DEMO_RUNBOOK.md documented the Pinggy-tunnel demo path, which requires the developer's laptop to stay on, the tunnel to stay up, and the demo to happen synchronously. For a real client-facing first-demo session this is brittle: the client can't re-open the URL later, the tunnel can flake, and the demo URL changes between sessions. | Resolved: docs/CLIENT_DEMO_DAY.md documents the Vercel-preview path as the primary client-demo arrangement. Branch demo/<client>-<date> off main, Vercel Preview scope with DEMO_MODE=true + VERCEL_PREVIEW_MIGRATE=true + EQUISMILE_LIVE_MAPS=true + Maps keys, leave AUTH_URL/NEXT_PUBLIC_APP_URL unset (Vercel auto-derives). The runbook includes pre-flight checklist, 25-min walkthrough script, anticipated Q&A bank, mid-demo recovery table, and post-demo follow-up steps. docs/DEMO_RUNBOOK.md remains the laptop-bound fallback. |
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR25-BUILD-IGNORES-SKIP-FLAG | Medium | SKIP_ENV_VALIDATION=true npm run build failed locally and in CI for every PR going back to at least the Phase 22 batch (this was the "pre-existing failure" footnote in PRs #148 and #149). Root cause: lib/env.ts validated process.env against the Zod schema at module-import time via export const env = validateEnv(). The SKIP_ENV_VALIDATION=true flag only short-circuited the standalone scripts/check-env.ts validator (which runs as a separate node process before next build), NOT the module-level validation triggered when next build evaluated route modules during page-data collection. Net effect: the build threw "Environment variable validation failed: DATABASE_URL is required" the moment any route module imported lib/env, with the misleading "Failed to collect page data for /api/appointments/[id]/cancel" wrapper. Every PR with five-check-gate language was operating with a 4-of-5 gate, not 5-of-5. | Resolved: lib/env.ts validateEnv() now checks SKIP_ENV_VALIDATION at the top of the function. When the flag is set, the validator supplies a placeholder DATABASE_URL=postgresql://skip:skip@localhost:5432/skip (only if DATABASE_URL is unset) and parses with Zod's .optional().default(…) fields filling the rest; no exception is thrown. A console.warn fires so a production-runtime leak of the flag is loud (suppressed in tests). Pinned by 5 new tests in __tests__/unit/lib/env-skip-validation.test.ts covering: regression guard (throws when flag unset + DATABASE_URL missing), fix (does not throw when flag set + DATABASE_URL missing), placeholder application, real-DATABASE_URL preservation alongside the flag, normal-flow unchanged. Production runtime semantics are identical to before: real Vercel / Docker production builds never set the flag and validate normally. |
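
The skip-flag behaviour described for PR25-BUILD-IGNORES-SKIP-FLAG can be sketched without Zod. This is a simplified stand-in, not the code in lib/env.ts: it validates only DATABASE_URL, takes the environment and the warn function as parameters (an assumption made so the sketch is testable), and uses a hand-written check in place of the schema.

```typescript
type EnvSource = Record<string, string | undefined>;

interface Env {
  DATABASE_URL: string;
}

const PLACEHOLDER_DATABASE_URL = "postgresql://skip:skip@localhost:5432/skip";

function validateEnv(source: EnvSource, warn: (msg: string) => void = console.warn): Env {
  let databaseUrl = source.DATABASE_URL;

  // Check the flag inside the validator, so module-level callers honour it too.
  if (source.SKIP_ENV_VALIDATION === "true") {
    warn("SKIP_ENV_VALIDATION is set; environment validation is relaxed.");
    // Only fill the placeholder when no real value is present.
    if (!databaseUrl) databaseUrl = PLACEHOLDER_DATABASE_URL;
  }

  // Normal flow stays fail-fast.
  if (!databaseUrl) {
    throw new Error("Environment variable validation failed: DATABASE_URL is required");
  }
  return { DATABASE_URL: databaseUrl };
}
```
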
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR24-STALE-UAT | Medium | docs/UAT_v2_VALIDATION.md (2026-05-07) was the most recent UAT pass, against commit 7cb7efb. Phases 17–23 have shipped since: maps cost control, unified inbox + IMAP, CSV import, WhatsApp simulator, RouteMap DirectionsService fix, Sentry sink option, WhatsApp token boot probe, pre-migrate snapshot, SW cache verification + VersionBanner. A future UAT pass had no plan for what to re-test vs. what's newly testable. | Resolved: new docs/UAT_v3_REFRESH.md. Lists the delta from v2 per phase, updates the status of v2's three defects (D-2 status-check, D-3 closed by Phase E /recalls, D-4 status-check), and adds 14 new test cases across four new sections (Maps cost, Inbox/IMAP, Admin tools, Observability/PWA). Total v3 matrix is 39 cases across 9 sections. The doc is a plan for the next live UAT pass, not a fresh validation; the actual execution needs a live deploy URL. |
| PR24-NO-DR-DRILL | Medium | docs/BACKUP.md § 4 + § 7 and docs/OPERATIONS.md § 4 documented restore procedures + the weekly automated backup-restore-verify.sh smoke test, but there was no operator-facing rehearsal book. Operators had no muscle memory for DR scenarios because nothing said "on a Tuesday morning, run these three drills." | Resolved: new docs/DR_DRILL.md with three quarterly-cadence rehearsal scenarios: Drill A (bad migration, uses Phase 22 pre-migrate snapshot, RTO 30 min), Drill B (disk lost overnight, uses Phase 16 nightly dump, RTO 2 h, RPO ≤ 24 h), Drill C (weekly verify failed, the meta-recovery drill that protects the recovery path itself). Each drill has scenario narrative, RTO/RPO targets, step-by-step rehearsal procedure, success criteria, and a common-failure table that maps rehearsal gotchas to production incident causes. The doc cross-references BACKUP.md + OPERATIONS.md as the reference manual rather than duplicating them. |
| PR24-NO-QUICKSTART | Medium | A new operator handed EquiSmile had to read 12+ docs in the order set by CLAUDE.md's doc-first principle to know what to do on day 1, week 1, month 1. There was no single-page index that linked the existing runbooks in operational order without duplicating them. | Resolved: new docs/OPERATOR_QUICKSTART.md. One-page checklist with three time horizons (day 1, 8 steps: get the stack up, verify probes, sign in; week 1, 9 steps: load real data, start Meta approval, walk the simulator with Kathelijne; month 1, 10 steps: Meta cutover, first DR drill, spend baseline establishment). Explicit stop conditions per horizon. Standing-state reference table linking each operational topic to its canonical doc. Emergency-contacts sequence (5 scenarios → 5 doc references). Every step links to a deeper runbook for the actual procedure; this doc is the index. |
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR23.1-YARD-PATCH-NO-REGEOCODE | High | PATCH /api/yards/[id] updated address fields without re-geocoding. The DB held stale latitude/longitude, so the route map kept rendering pins at the pre-edit location indefinitely. Compounded by geocodingService.geocodeYard() short-circuiting on geocodedAt alone, so even a direct call after the edit was a no-op. | Resolved: PATCH now diffs the incoming payload against the row's existing address fields; on change it clears latitude/longitude/geocodedAt/formattedAddress and fires geocodingService.geocodeYard(id) fire-and-forget. The service's short-circuit now also requires the persisted formattedAddress to contain the current postcode, so address edits always re-run the geocoder. New regression test in __tests__/unit/services/geocoding.service.test.ts. |
| PR23.1-YARD-POST-NO-GEOCODE | High | POST /api/yards returned 201 with latitude/longitude = null. New yards never appeared on the route map until someone hit the batch endpoint or the route planner ran. | Resolved: POST handler now fires geocodingService.geocodeYard(id) fire-and-forget after create. Failures still flip geocodeFailed=true and surface on /admin/maps-usage; they never block yard creation. |
| PR23.1-GEOCODE-REGION-GB | Medium | geocodingService.geocodeAddress() passed region=gb to Google Geocoding. EquiSmile is Swiss; short Swiss postcodes (e.g. 1807) could be biased toward GB look-alikes. google-maps.client.ts already used ch correctly; this was a one-line drift. | Resolved: region is now ch in both call sites. |
| PR23.1-DEMO-ROUTES-NO-STOPS | Medium | POST /api/demo/generate-routes created the RouteRun summary but never inserted any RouteRunStop rows. The /route-runs/[id] page rendered an empty stops list and a map with zero pins, so the demo's most visible button looked broken. | Resolved: endpoint now creates RouteRunStop rows inside a transaction with routeRun, mapping each ordered waypoint back to its source VisitRequest by coords and computing planned arrival/departure from leg durations anchored to 08:00. |
| PR23.1-PUBLIC-HOME-BASE-MISSING | Low | app/[locale]/route-runs/[id]/page.tsx read NEXT_PUBLIC_HOME_BASE_LAT/LNG but .env.example only documented the server-side HOME_BASE_LAT/LNG. The map always fell back to a hardcoded Blonay default; operators couldn't relocate the H marker without grepping the source. | Resolved: added NEXT_PUBLIC_HOME_BASE_LAT/LNG to .env.example next to their server-side twins with a comment explaining the duplication. |
See docs/LIVE_MAPS_TEST_CHECKLIST.md for the end-to-end verification
runbook and sample data.
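
The re-geocode rule from PR23.1-YARD-PATCH-NO-REGEOCODE can be sketched as two small functions. The latitude / longitude / geocodedAt / formattedAddress fields come from the row above; the address field names (`addressLine1`, `town`) and the helper names are assumptions, and the actual geocoder call is left out.

```typescript
interface Yard {
  addressLine1: string;
  town: string;
  postcode: string;
  latitude: number | null;
  longitude: number | null;
  geocodedAt: Date | null;
  formattedAddress: string | null;
}

type AddressPatch = Partial<Pick<Yard, "addressLine1" | "town" | "postcode">>;

const ADDRESS_FIELDS = ["addressLine1", "town", "postcode"] as const;

// Applies the patch and, if any address field changed, clears the stale geocode.
// Returns true when the caller should trigger a fresh geocode.
function applyAddressPatch(yard: Yard, patch: AddressPatch): boolean {
  let changed = false;
  for (const field of ADDRESS_FIELDS) {
    const next = patch[field];
    if (next !== undefined && next !== yard[field]) {
      yard[field] = next;
      changed = true;
    }
  }
  if (changed) {
    yard.latitude = null;
    yard.longitude = null;
    yard.geocodedAt = null;
    yard.formattedAddress = null;
  }
  return changed;
}

// Short-circuit only when the stored result still matches the current postcode.
function canSkipGeocode(yard: Yard): boolean {
  return (
    yard.geocodedAt !== null &&
    yard.formattedAddress !== null &&
    yard.formattedAddress.indexOf(yard.postcode) !== -1
  );
}
```

Checking the postcode inside `canSkipGeocode` is a second line of defence: even if some other write path changes the address without clearing `geocodedAt`, the stale result is not trusted.
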
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR23-NO-META-APPROVAL-DOC | Medium | docs/OPERATIONS.md § 1 covered WhatsApp token rotation after Meta approval but there was no operator-facing runbook for the externally-blocked work that gets you there: business verification, display name approval, template submission per template per locale, system-user token mint, webhook + verify-token install, phased cutover. Meta review is the longest external lead time on the go-live critical path (1–2 weeks typical) and "guess and check" was actively burning that timer. | Resolved: new docs/WHATSAPP_PRODUCTION_APPROVAL.md (10 sections). Covers the full sequence end-to-end: Swiss business verification documents (Handelsregisterauszug, VAT/UID, authorised signatory), Meta Business account, WABA creation, display name approval, all nine lib/demo/template-registry.ts templates × EN/FR (18 submissions) with category guidance + common rejection causes, system-user permanent token mint (cross-references docs/OPERATIONS.md § 1.2 rather than duplicating), webhook + verify-token install in the Meta App Dashboard, phased cutover via the Phase 20 simulator's "Send to me (real)" path, rollback via DEMO_MODE=true, and a failure-mode quick-reference table. |
| PR23-NO-DATA-LOAD-DOC | Medium | docs/IMPORT_GUIDE.md covered the mechanics of /admin/import (dry-run + commit, conflict policies, column reference) but not the upstream prep: source-data inventory across Kathelijne's existing book shapes (VetUp export, Outlook contacts, appointment diary, WhatsApp history, handwritten yard notes), practice-specific dedup + mapping decisions, data-quality pre-checks, post-load verification queries, rollback paths. The Phase 20 import UI was operationally invisible without this runbook. | Resolved — new docs/PRODUCTION_DATA_LOAD.md (9 sections). Covers source-data inventory, the practice-specific mapping decisions IMPORT_GUIDE deliberately stays generic about (couple-vs-single legal-entity for customers, E.164 Swiss numbers, francophone-vs-anglophone preferred language, when to leave Lat/Lng blank, owner-vs-yard-manager distinction for horses), pre-load data-quality checks, the customer→yard→horse load procedure with a manual pre-migrate snapshot bracket, a post-load SQL verification rollup, three-tier rollback (re-import / manual snapshot / nightly backup window), and a common-gotchas table. |
Phase 22 — Audit tail: WhatsApp token boot probe + pre-migrate snapshot + SW cache verification (2026-05-16)
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR22-MED05-WA-TOKEN-PROBE | Medium | WHATSAPP_ACCESS_TOKEN could be revoked by Meta (rotation forgotten, business-account owner removed, suspected compromise) and the app would only discover it when the first outbound confirmation failed — often hours later. No boot-time signal existed; docs/OPERATIONS.md §1 only covered the manual rotation procedure. | Resolved — new lib/services/whatsapp-token-probe.service.ts fires once per server start from instrumentation.ts. Makes a single non-state-mutating GET https://graph.facebook.com/v21.0/<phone_number_id> with a 5-second timeout. On HTTP 401 writes AuditLog{action:'WHATSAPP_TOKEN_INVALID'} and sends a once-per-UTC-day alert email via emailService.sendBrandedEmail to MAPS_ALERT_EMAIL (re-uses the Phase 17 maybeFireSoftCapAlert dedup pattern). Transient failures (5xx, network errors) log but never alert. Demo mode skips the probe entirely. |
| PR22-LOW01-PRE-MIGRATE-SNAPSHOT | Low | The nightly backup compose service runs pg_dump at 02:30 UTC. A destructive migration deployed at 14:00 left up to a 23-hour data-loss window before the most-recent dump aged out — the operator's only recovery path was a backup from the previous day. | Resolved — new pre-migrate-snapshot compose service (postgres:16-alpine) runs docker/pre-migrate-snapshot.sh once on every docker compose up immediately before migrator, writing a labelled pre-migrate-<UTC>.sql.gz into the shared backups_data volume. migrator now depends_on: pre-migrate-snapshot: service_completed_successfully, so the snapshot lands before any schema change. Skips cleanly on first-ever boot (empty schema). Same safety guards as the nightly backup (libpq .pgpass, narrow env-var whitelists, no password literals in commands). Retention reuses the nightly backup's BACKUP_RETENTION_DAYS sweep. Runbook in docs/BACKUP.md §7. |
| PR22-LOW03-SW-CACHE-VERIFY | Low | The audit asked "does Serwist actually invalidate after deploy?". The answer for next-navigation loads is yes — hashed-asset __SW_MANIFEST + skipWaiting: true + clientsClaim: true is canonically invalidation-safe. The gap was a tab that stayed open across a deploy (Kathelijne's inbox sitting open all day): the SW installs the new bundle but the existing tab silently runs the old code until the user reloads. | Resolved — verification documented in docs/OPERATIONS.md §7. Defensive open-tab safety net shipped on top: scripts/write-version.ts stamps public/version.json with the Git SHA at prebuild time; new components/system/VersionBanner.tsx polls /version.json every 5 minutes (cache-busted), captures the bootstrap SHA on first poll, and surfaces a non-modal <div role="status" aria-live="polite"> banner with a "Refresh" button when the SHA changes. Skipped when the bootstrap SHA is 'dev' (no production build). EN + FR i18n keys under version.*. |
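
The once-per-UTC-day alert dedup mentioned for PR22-MED05-WA-TOKEN-PROBE could look roughly like the sketch below. It is an in-memory illustration only: `DailyAlertGate` and `utcDayKey` are hypothetical names, and how the real maybeFireSoftCapAlert pattern stores its "already sent today" marker is not described in this document.

```typescript
// Returns the UTC calendar day (YYYY-MM-DD) for a timestamp.
function utcDayKey(now: Date): string {
  return now.toISOString().slice(0, 10);
}

// Hypothetical in-memory gate: allows one alert per alert kind per UTC day.
class DailyAlertGate {
  private lastSentDay = new Map<string, string>();

  shouldSend(alertKind: string, now: Date): boolean {
    const day = utcDayKey(now);
    if (this.lastSentDay.get(alertKind) === day) {
      return false; // already alerted today (UTC)
    }
    this.lastSentDay.set(alertKind, day);
    return true;
  }
}
```

Keying on the UTC date rather than on "24 hours since the last alert" means a token that is still invalid the next morning alerts again shortly after midnight UTC, regardless of when the first alert fired.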
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR21-HIGH02-SENTRY | High | The 2026-04-18 production-readiness audit asked for Sentry-grade error tracking. Phase 16 shipped a generic webhook error sink (lib/observability/webhook-error-sink.ts) which covers Slack / Teams / generic log collectors — but a practice that explicitly wants Sentry's grouped issues + breadcrumbs + release tracking had no first-class path. | Resolved — new lib/observability/sentry-error-sink.ts. When SENTRY_DSN is set AND @sentry/nextjs is installed (operator opts in via npm install @sentry/nextjs), a Sentry sink registers alongside the existing webhook sink — both fire in parallel. When the SDK isn't installed the sink logs a one-time stderr warning and falls through to the webhook path. @sentry/nextjs stays an OPTIONAL operator install — no new hard dependency. Operator runbook in docs/OPERATIONS.md §6. |
| PR21-HIGH05-POOL-PARAMS | High | DATABASE_URL could ship without Prisma pool-tuning params (connection_limit / pool_timeout), leaving the app on Prisma's default pool size (num_physical_cpus × 2 + 1, e.g. 5 connections on a two-core host). Under concurrent load (WhatsApp webhooks + n8n callbacks + UI traffic) the pool would silently exhaust and requests would time out. Phase 15 documented the recipe in docs/OPERATIONS.md §2 but didn't enforce it. | Resolved — lib/utils/env-check.ts now warns at boot when DATABASE_URL lacks the pool params (non-demo mode only). /api/status exposes probes.database.poolConfigured + poolMissing[] so the gap is visible on the observability admin page. .env.example shows the recommended URL with ?connection_limit=10&pool_timeout=10. The URL is never silently mutated — the operator decides. |
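
A boot-time check of the kind described for PR21-HIGH05-POOL-PARAMS might be sketched as follows. `missingPoolParams` is a hypothetical helper written for this example, not the contents of lib/utils/env-check.ts; it only inspects the query string and never rewrites the URL, matching the "operator decides" rule above.

```typescript
// The two Prisma pool-tuning parameters named in the entry above.
const REQUIRED_POOL_PARAMS: string[] = ["connection_limit", "pool_timeout"];

// Returns the pool params that are absent from the connection string's query.
function missingPoolParams(databaseUrl: string): string[] {
  const queryStart = databaseUrl.indexOf("?");
  const query = queryStart === -1 ? "" : databaseUrl.slice(queryStart + 1);
  const present = query
    .split("&")
    .map((pair) => pair.split("=")[0])
    .filter((name) => name.length > 0);
  return REQUIRED_POOL_PARAMS.filter((name) => present.indexOf(name) === -1);
}

// Example: shape of a status payload built from the check (field names taken
// from the entry above; the surrounding object is an assumption).
function poolProbe(databaseUrl: string): { poolConfigured: boolean; poolMissing: string[] } {
  const poolMissing = missingPoolParams(databaseUrl);
  return { poolConfigured: poolMissing.length === 0, poolMissing };
}
```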
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR20-TEMPLATES-RAW | Medium | The /admin/templates editor showed raw positional {{1}} / {{2}} placeholders and required a manual Save click per language pane. Non-technical operators (vet) found the screen intimidating. | Resolved — components/admin/TemplatesAdmin.tsx rewritten with click-to-insert placeholder pills (per-template toolbar), debounced auto-save, live validation badges (ok / missing / unknown), and a "Preview as customer" panel that renders against real DB data. Storage format unchanged (Meta API still gets positional placeholders). New lib/utils/template-placeholders.ts provides a round-trip-tested {{N}} ↔ [name] serialiser. New DELETE endpoint + messageTemplateService.deleteOverride() for the Reset-to-default button. |
| PR20-NO-IMPORT | High | The practice could Export Customers / Export Yards / Export VetUp CSV from the Customers page but had no way to upload an existing customer database — bulk onboarding required manual UI clicks per customer. | Resolved — new /[locale]/admin/import admin page with drag-drop CSV upload, profile selector (customers / yards / horses), dry-run preview table, and three conflict policies (skip / update / abort). Round-trips with the existing VetUp export schema. New csv-parse.service.ts + csv-import.service.ts + app/api/admin/import/{preview,commit}/route.ts. Atomic $transaction commit; ADMIN-only; file SHA-256 + per-action counts written to AuditLog{action:'IMPORT_RUN'}; uploaded file is NOT persisted on disk. New runbook docs/IMPORT_GUIDE.md. |
| PR20-NO-SIMULATOR | Medium | No way to test what a WhatsApp template would look like for a real customer without actually sending — the only options were "send to a real customer" or "wait until production exercises it". | Resolved — new /[locale]/admin/simulator page lets the admin pick template + locale + customer + (optional) appointment and see the rendered output. "Simulate send" never touches Meta and writes a TEMPLATE_SIMULATED audit row. "Send to me (real)" is gated on WHATSAPP_TEST_NUMBER env var, rate-limited 3/hour per admin, audited as TEMPLATE_TEST_SENT. Renders use lib/services/template-render.service.ts (shared with the templates editor preview). |
| PR20-MAP-CROSSES-LAKE | Medium | components/maps/RouteMap.tsx drew a geodesic: true straight-line Polyline between yards. For Vaud-side and Valais-side practice yards, the line went straight through Lake Geneva instead of routing around it via Lausanne / Évian. | Resolved — RoutePolyline replaced by RouteDirections, which uses Google's client-side DirectionsService (no server quota cost; included in the Maps JS API base load) for road-following polylines per leg. Per-leg results cached in sessionStorage keyed by lat,lng→lat,lng so revisits don't re-request. Falls back to a fainter geodesic line on per-leg failure. New NEXT_PUBLIC_MAP_ROUTING_MODE env var (directions in production, straight for demo deploys with synthetic coordinates). The optimizer's own time/distance estimates (Phase 5 + Phase 17) are unchanged — the map is a visualisation layer only. |
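
The {{N}} ↔ [name] round trip described for PR20-TEMPLATES-RAW can be illustrated with a small sketch. The function names and the per-template `names` array are assumptions for the example; the actual lib/utils/template-placeholders.ts API is not shown in this document.

```typescript
// Hypothetical per-template name list: index 0 names {{1}}, index 1 names {{2}}, ...

// Storage format -> editor format: {{1}} becomes [customerName].
function toNamed(text: string, names: string[]): string {
  return text.replace(/\{\{(\d+)\}\}/g, (match: string, n: string) => {
    const name = names[Number(n) - 1];
    // Unknown positions are left untouched so validation can flag them.
    return name === undefined ? match : "[" + name + "]";
  });
}

// Editor format -> storage format: [customerName] becomes {{1}}.
function toPositional(text: string, names: string[]): string {
  return text.replace(/\[([A-Za-z_][A-Za-z0-9_]*)\]/g, (match: string, name: string) => {
    const index = names.indexOf(name);
    // Unknown names are left untouched rather than silently dropped.
    return index === -1 ? match : "{{" + (index + 1) + "}}";
  });
}
```

Because both directions leave unrecognised tokens as they are, `toPositional(toNamed(stored, names), names)` returns the stored text unchanged, which is the property a round-trip test would assert.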
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR19-OUTLOOK-NO-DOC | Low | Phase 18 wired Gmail through the existing n8n IMAP workflow but no operator documentation existed for pointing the same workflow at Outlook / Microsoft 365. The build-update doc had marked Outlook inbound as defer-to-follow-up. | Resolved — new docs/OUTLOOK_INBOUND.md with full IMAP + app-password setup runbook against Outlook / 365. Covers troubleshooting, "running Gmail AND Outlook simultaneously" pattern (deduped by Message-ID at the webhook level), and a sketched OAuth2 / Microsoft Graph upgrade path for when IMAP becomes untenable. No code changes — the existing emailReadImap node is provider-agnostic. |
| PR19-SCOPE-AMBIGUITY | Medium | Patrick's 2026-04-12 consultant review surfaced six pointed questions about what "appointment management" actually means in MVP. Recurring stakeholder ask. The contract excluded automatic time-slot proposals (§ 3.3), but the boundary between "regional grouping" and "route optimisation" was never recorded in writing for the practice operator. | Resolved — new docs/SCOPE_CLARIFICATIONS.md answers each of the six questions point-by-point against the as-built state (Phase 18). Records the MVP positioning ("intelligent workflow automation and scheduling assistant", not autonomous scheduler), the in-scope vs. out-of-scope register, and a path-to-yes sketch for auto AM/PM slot suggestion if Kathelijne finds manual selection painful in practice. Establishes that scope changes require a contract amendment, not a doc update. |
| PR19-H06-NO-HANDOVER | Medium | Contract row H-06 (source-code transfer to the practice-owned GitHub account) was marked Pending in the April-12 build-update doc. No runbook existed; the procedure lived in tribal knowledge. | Resolved — new docs/HANDOVER.md covers the full transfer including pre-transfer secret inventory (~40 env vars cross-referenced to lib/env.ts), external integration inventory (Meta WhatsApp webhook, Vercel, n8n credentials, Anthropic billing, Google Maps API key, GitHub OAuth app, GitGuardian), the GitHub transfer itself, post-transfer verification checklist, and a rollback plan (GitHub transfers are reversible within 48h by the new owner). |
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR18-NO-INBOX | Medium | The April-12 stakeholder build-update doc promised a "unified inbox" aggregating WhatsApp + email. The /enquiries page existed as a triage queue but there was no operator-facing thread view; mobile nav surfaced the triage queue, not an inbox. | Resolved — new /[locale]/inbox page + components/inbox/InboxView.tsx. Thread-grouped (per-customer or per-sourceFrom for unknown senders), channel-segmented (ALL / WhatsApp / Email), search-debounced, mobile-first. Sidebar entry added; MobileNav now surfaces Inbox in place of the triage queue. Tapping a thread links to the existing /enquiries/[id] page which already renders the EnquiryMessage timeline. |
| PR18-N8N-STUBS | Medium | n8n/02-inbound-email.json shipped as noOp placeholder nodes. The webhook handler at POST /api/webhooks/email was complete and idempotent, but no actual mail flowed through it because the n8n pipeline didn't exist beyond skeleton form. | Resolved — workflow now contains real nodes: emailReadImap trigger (Gmail/365 compatible; credential configured in n8n UI), a Code node reshaping the IMAP item into the strict zod payload contract, an HTTP Request POSTing to /api/webhooks/email with Authorization: Bearer ${N8N_API_KEY}, and an IF branch routing failures to a logger. Shipped inactive — operator activates after configuring the IMAP credential. New static-analysis test (__tests__/unit/n8n/inbound-email-workflow.test.ts) fails CI if the workflow ever regresses to noOp stubs. |
| PR18-ROUTE-REORDER-MISSING | Medium | Patrick's feedback assumed the vet would adjust proposed routes before approval. The implementation only let the vet approve or reject — no drag-reorder, no sequence edit, no mobile-friendly affordance. | Resolved — new PATCH /api/route-planning/proposals/[id]/reorder-stops endpoint (atomic $transaction reorder of all stops; locks at APPROVED+) + new components/route-runs/RouteRunStopsList.tsx component with two reorder mechanisms: HTML5 drag-and-drop on the row, and up/down arrow buttons (accessible, touch-friendly, screen-reader-labelled). Reorder clears the per-stop travelFromPrev* figures (stale after a resequence) so the UI doesn't show misleading numbers; an operator can re-run route generation to refresh them. A <BottomSheet> wrapper opens the same reorderable list in a focused mobile drawer via a "Reorder stops" button visible only on <lg viewports. |
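
The reorder behaviour described for PR18-ROUTE-REORDER-MISSING (resequence every stop, clear the stale travel figures) can be sketched as a pure function. The `Stop` shape and the single `travelFromPrevMinutes` field are stand-ins for the travelFromPrev* columns, and `reorderStops` is a hypothetical name; the real endpoint performs the update inside a database transaction, which this sketch does not model.

```typescript
interface Stop {
  id: string;
  sequence: number;
  // Hypothetical field standing in for the travelFromPrev* figures.
  travelFromPrevMinutes: number | null;
}

// Applies a full new ordering. Every existing stop must appear exactly once.
function reorderStops(stops: Stop[], orderedIds: string[]): Stop[] {
  if (orderedIds.length !== stops.length) {
    throw new Error("reorder must list every stop exactly once");
  }
  const byId = new Map<string, Stop>();
  for (const stop of stops) byId.set(stop.id, stop);

  const seen = new Set<string>();
  return orderedIds.map((id, i) => {
    const stop = byId.get(id);
    if (!stop || seen.has(id)) {
      throw new Error("unknown or duplicate stop id: " + id);
    }
    seen.add(id);
    // Travel figures are stale after a resequence, so they are cleared
    // instead of being carried over to the new position.
    return { ...stop, sequence: i + 1, travelFromPrevMinutes: null };
  });
}
```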
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR17-MAPS-UNCAPPED | High | EQUISMILE_LIVE_MAPS=true opened uncapped billing exposure on Google Geocoding + Route Optimization. No per-call telemetry, no daily spend cap, no operator-visible usage page. A runaway batch (or a malicious test of the geocode endpoint) could rack up unbounded cost before anyone noticed. | Resolved — new MapsApiCall model + lib/services/maps-cost-tracker.service.ts (checkBudget / recordCall / getDailySpendUsd). Three call sites instrumented: googleMapsClient.geocode, geocodingService.geocodeAddress, routeOptimizerService.optimizeRoute. Hard cap (MAPS_DAILY_SPEND_CAP_USD) throws MapsBudgetExceededError before the network call; soft cap (MAPS_SOFT_CAP_PCT) flags the threshold + sends a once-per-UTC-day email (MAPS_ALERT_EMAIL). New admin page /[locale]/admin/maps-usage polls /api/admin/maps-usage for today's spend + 7-day rollup + recent calls. Demo-mode calls are not wrapped and produce no telemetry. KI-001 (rate-limit on large batches) closes mechanically — batchGeocodeYards now uses budget-driven gating instead of a fixed 100ms delay. |
| PR17-DOCS-BACKFILL | Low | docs/BUILD_PLAN.md had been maintained through Phase 14; Phase 15 (Production-readiness, 2026-04-23) and Phase 16 (Overnight hardening, eight slices, 2026-04-25/27) were documented only in this file. | Resolved alongside Phase 17 entry — Phase 15 + 16 backfilled into the BUILD_PLAN phase-overview table and Phase 17 added as the new entry. |
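
The hard-cap / soft-cap split described for PR17-MAPS-UNCAPPED can be sketched as below. The names `checkBudget` and `MapsBudgetExceededError` come from the entry above, but the signature, the `BudgetConfig` shape, and the idea of passing in a projected call cost are assumptions for illustration; the real service reads spend from the MapsApiCall table.

```typescript
class MapsBudgetExceededError extends Error {
  constructor(projectedUsd: number, capUsd: number) {
    super("Maps daily spend " + projectedUsd + " USD would exceed cap " + capUsd + " USD");
    this.name = "MapsBudgetExceededError";
  }
}

// Hypothetical config mirroring MAPS_DAILY_SPEND_CAP_USD and MAPS_SOFT_CAP_PCT.
interface BudgetConfig {
  dailyCapUsd: number;
  softCapPct: number; // e.g. 80 means "warn at 80% of the daily cap"
}

// Called BEFORE the network request: throws on the hard cap, flags the soft cap.
function checkBudget(
  spentTodayUsd: number,
  nextCallCostUsd: number,
  cfg: BudgetConfig
): { softCapReached: boolean } {
  const projected = spentTodayUsd + nextCallCostUsd;
  if (projected > cfg.dailyCapUsd) {
    throw new MapsBudgetExceededError(projected, cfg.dailyCapUsd);
  }
  const softThreshold = cfg.dailyCapUsd * (cfg.softCapPct / 100);
  return { softCapReached: projected >= softThreshold };
}
```

Checking before the call, rather than recording afterwards and reacting, is what keeps a runaway batch from overshooting the cap by more than one request's cost.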
| ID | Severity | Description | Resolution |
|---|---|---|---|
| OVH8-SOFTDEL-UI | Medium | The soft-delete infrastructure shipped across PRs #51, #52, the AuditLog parity work, and the Prisma extension was operationally invisible — operators had to curl the DELETE endpoints. No UI button, no confirmation flow, no toast. The feature was in practice unused, leaving the AuditLog table empty and the audit story untested in production. | Resolved — new components/ui/DeleteEntityButton.tsx reusable component (role-aware, modal-confirmed, toast-on-result, locale-aware redirect). Wired into the four detail pages: app/[locale]/{customers,yards,horses,enquiries}/[id]/page.tsx. Customer/yard/enquiry require admin; horse requires vet (mirrors the API). EN + FR i18n strings added under softDelete.*. 12 vitest cases regress role gating (admin vs readonly/nurse/vet/no-session), the no-one-click rule, fetch wiring, success-toast-and-redirect, error-toast-and-stay, and network-throw handling. |
| ID | Severity | Description | Resolution |
|---|---|---|---|
| OVH7-SETUP-EXECSYNC | Medium | PR #51 known risk #4 — /api/setup invoked execSync('npx prisma migrate deploy') and execSync('npx tsx prisma/seed-demo.ts') from an HTTP handler. Three problems: (1) child-process spawn from a request handler is a code-execution vector if the DEMO_MODE gate ever weakens; (2) execSync blocks the Node event loop for the full duration of the migration/seed, starving every other in-flight request; (3) error handling worked off raw stderr text, which can carry DB credentials in failure modes. | Resolved — handler rewritten to a stable 410 Gone response with operator guidance. The compose stack already runs migrations correctly via the migrator service; local-dev callers see npx prisma migrate deploy && npx tsx prisma/seed-demo.ts in the response body. DEMO_MODE gate retained as defence-in-depth. New __tests__/unit/api/setup.test.ts (5 cases) including a static-analysis regression that fails CI if child_process, execSync, spawn, or fork ever return to the route. |
| ID | Severity | Description | Resolution |
|---|---|---|---|
| OVH-AUTH-COMPLETE | High | No mechanical proof that every business app/api/* route is gated by a session — relied on per-route audits | Resolved — __tests__/unit/auth/auth-guard-completeness.test.ts walks every route.ts under app/api/ and asserts that any non-whitelisted path returns 401 unauthenticated. |
| OVH-DEMO-LEAK-CLIENT | Medium | RouteMap read process.env.NEXT_PUBLIC_DEMO_MODE, baking demo-mode state into the live client bundle | Resolved — removed; client now uses absence-of-NEXT_PUBLIC_GOOGLE_MAPS_BROWSER_KEY + an explicit forceStatic prop driven by server-side runtime status. |
| OVH-ENQ-SOFTDEL | High | Phase 15 added soft-delete to Customer/Yard/Horse but Enquiry rows (inbound customer messages) were still hard-deletable | Resolved — Enquiry.deletedAt + Enquiry.deletedById added via migration 20260425000000_phase16_enquiry_softdelete_auditlog; repository filters deletedAt: null by default. |
| OVH-NO-AUDIT-GENERIC | Medium | SecurityAuditLog covers security events; TriageAuditLog covers visit-request fields; nothing covered generic operator mutations (enquiry tombstone, route-run flips) | Resolved — generic AuditLog model + lib/services/audit-log.service.ts with redacted JSON details, append-only writes, best-effort failure handling. |
| OVH-CADDY-CSP | Medium | Caddy emitted basic security headers but no CSP — a request that bypassed the Next middleware (cached static, n8n subdomain, error page) had no CSP fallback | Resolved — Caddyfile now sets a CSP at the proxy layer mirroring lib/security/headers.ts, plus Permissions-Policy, COOP and CORP. |
| OVH-STATUS-SHALLOW | Medium | /api/status reported integration modes but did not actively probe DB / n8n / messaging readiness | Resolved — /api/status now runs a live SELECT 1, n8n /healthz probe (3s timeout), and per-integration readiness summaries with missing[] lists. |
| OVH-PII-RESIDUAL | Low | Two stray PII paths: full address in geocoding partial-match warning, raw error object in manual-enquiry auto-triage failure | Resolved — geocoding now logs postcode prefix only; auto-triage failure logs error.message against enquiryId. |
| ID | Severity | Description | Resolution |
|---|---|---|---|
| OVH3-AUDITLOG-PARITY | Medium | The generic AuditLog table introduced in PR #51 had only one caller (DELETE /api/enquiries/[id] from PR #52). The pre-existing Customer / Yard / Horse soft-delete handlers (Phase 15) wrote only SecurityAuditLog, leaving the AuditLog table half-built — an operator looking up "everything that has happened to customer X" via AuditLog.entityId would see only enquiries. | Resolved — Customer / Yard / Horse DELETE handlers now dual-write to BOTH SecurityAuditLog (security-event timeline) AND AuditLog (per-entity index). New tests in __tests__/unit/api/yards.test.ts and __tests__/unit/api/horses.test.ts plus an extended customers.test.ts regression. Documented as a hard rule in docs/ARCHITECTURE.md → "Audit trail" so future contributors don't drift. |
| ID | Severity | Description | Resolution |
|---|---|---|---|
| OVH2-COERCE-BOOL | High | enquiryQuerySchema.includeDeleted used z.coerce.boolean(). JS Boolean() returns true for any non-empty string — including "false". ?includeDeleted=false would have silently exposed tombstoned enquiries containing inbound customer messages. (Cursor Bugbot #c7a7eb5c.) | Resolved — replaced with z.enum(['true','false']).transform(v => v === 'true') matching the customer/yard/horse pattern, plus regression tests asserting 'false' → false and rejection of '1'/'yes'/empty string. |
| OVH2-ENQ-INC-DEL-GATE | Medium | The new includeDeleted flag flowed unguarded through GET /api/enquiries. A READONLY user could URL-hack ?includeDeleted=true and read tombstoned enquiry PII. (Cursor Bugbot #99773815.) | Resolved — handler now silently downgrades the flag to false for non-admin sessions, mirroring app/api/customers/route.ts. Three new vitest cases lock this in. |
| OVH2-N8N-PROBE-DEAD-BRANCH | Low | probeN8n returned 'unconfigured' when !env.N8N_HOST, but lib/env.ts defaults N8N_HOST to 'localhost' — making the branch dead code. Stacks with no n8n burned a 3-second timeout per /api/status poll and reported 'unreachable'. (Cursor Bugbot #da88139d.) | Resolved — branch now keys on !env.N8N_API_KEY, the credential every n8n callback already fail-closes on. New regression test asserts no fetch() call when the key is absent. |
| OVH2-ENQ-DELETE-ROUTE | High | The Enquiry repository gained delete() / restore() / hardDelete() in PR #51 but no HTTP entry point existed — admins could delete customers/yards/horses but not misrouted spam enquiries. | Resolved — DELETE /api/enquiries/[id] (admin-gated, soft-delete, writes both SecurityAuditLog{event:'ENQUIRY_DELETED'} and the generic AuditLog{action:'ENQUIRY_DELETED'}). Migration 20260425100000_phase16_enquiry_audit_events adds the new enum values. |
| OVH2-AUTOTRIAGE-LOG-PII | Low | Auto-triage failure log included err.message. Inner triage services occasionally embed raw inbound text in their error messages (e.g. "failed to parse: '<customer message>'"), bypassing the maskPhone/maskEmail utilities. | Resolved — log now records errorClass (e.g. "ZodError") plus enquiryId only. Operators reach the full payload through redacted channels. |
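
The pitfall behind OVH2-COERCE-BOOL and the admin gate from OVH2-ENQ-INC-DEL-GATE can be shown without any schema library. The sketch below is a dependency-free stand-in for the z.enum(['true','false']) approach, not the project's zod schema; `parseStrictBoolean`, `effectiveIncludeDeleted`, and the `Role` union are hypothetical names for the example.

```typescript
// Why coercion is unsafe here: any non-empty string is truthy in JavaScript.
const coerced: boolean = Boolean("false"); // true

// Strict alternative: accept only the literal strings "true" and "false".
function parseStrictBoolean(raw: string | undefined): boolean | undefined {
  if (raw === undefined) return undefined; // parameter not supplied
  if (raw === "true") return true;
  if (raw === "false") return false;
  throw new Error('expected "true" or "false", got ' + JSON.stringify(raw));
}

// Hypothetical role names; the entry above only distinguishes admin from non-admin.
type Role = "admin" | "vet" | "nurse" | "readonly";

// Non-admin sessions are silently downgraded instead of receiving an error.
function effectiveIncludeDeleted(requested: boolean | undefined, role: Role): boolean {
  return requested === true && role === "admin";
}
```

Rejecting "1", "yes", and the empty string outright is stricter than strictly necessary, but it avoids having two spellings of the same flag with different security consequences.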
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR16-NO-ERRSINK-IMPL | Medium | Phase 15 shipped a sink interface but no wireable implementation | Resolved — lib/observability/webhook-error-sink.ts + instrumentation.ts auto-register when EQUISMILE_ERROR_WEBHOOK_URL is set. |
| PR16-BACKUP-MANUAL | High | Backup was a host cron the operator had to install by hand | Resolved — backup compose service runs pg_dump on an internal cron; no host setup required. |
| PR16-NO-RESTORE-DRILL | Medium | No mechanical way to verify a backup is restorable | Resolved — scripts/backup-restore-verify.sh restores the newest dump into a scratch DB and asserts schema + row presence. |
| PR16-NO-OPS-UI | Medium | No operator-visible view of DLQ depth, audit activity, backup freshness | Resolved — /api/admin/observability + /[locale]/admin/observability page (admin-only). |
| PR16-PII-SWEEP | Low | Remaining raw phone in confirmation.service and n8n send-whatsapp trigger | Resolved — maskPhone() applied to all outbound logs. |
Filed and closed during the Phase 15 PR. See PRODUCTION_READINESS.md for the updated go-live checklist.
| ID | Severity | Description | Resolution |
|---|---|---|---|
| PR15-SOFT-DEL | High | Hard deletes on Customer/Yard/Horse cascaded clinical records | Resolved — deletedAt / deletedById tombstones + repo-level deletedAt: null default filter. |
| PR15-DOCKER-ENV | High | Docker compose missing Auth.js / Anthropic / NEXT_PUBLIC_GOOGLE_MAPS_BROWSER_KEY pass-through | Resolved — env_file: .env + explicit args: for NEXT_PUBLIC_* build-time vars. |
| PR15-NO-BACKUP | High | No backup script or restore runbook | Resolved — scripts/backup-db.sh + docs/BACKUP.md. |
| PR15-API-RATE | Medium | Only webhook & vision endpoints rate-limited; no floor on authenticated write traffic | Resolved — middleware-level per-user API write-limit (60s / 120 writes). |
| PR15-PII-LOGS | Medium | Raw phone/email in WhatsApp/email/n8n-trigger logs | Resolved — maskPhone() / maskEmail() wrapped around every outbound log. |
| PR15-NO-ERRSINK | Low | No hook to forward errors to Sentry / log aggregator | Resolved — registerErrorSink() in lib/utils/logger.ts. |
| PR15-NO-LEGAL | Low | No public privacy notice or terms page | Resolved — /[locale]/privacy + /[locale]/terms (EN + FR). |
| PR15-NO-TOKENOPS | Low | WhatsApp token lifecycle not documented | Resolved — docs/OPERATIONS.md §1. |
| PR15-NO-POOLTUNE | Low | Prisma pool tuning / pool_timeout not documented | Resolved — docs/OPERATIONS.md §2. |
| PR15-WEAK-DBPW | High | docker-compose.yml used equismile_dev as a default POSTGRES_PASSWORD | Resolved — compose now fails loud via ${POSTGRES_PASSWORD:?} with no default; .env.example uses an obvious <strong-password-here> placeholder. |
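
The per-user write limit from PR15-API-RATE (120 writes per 60 seconds) could be implemented in several ways; the entry does not say which. The sketch below assumes a simple fixed-window counter held in memory, and `WriteRateLimiter` is a hypothetical class, not the middleware's code. Like the limiters noted in KI-007, state of this kind is per-process and is not shared across instances.

```typescript
// Assumption: fixed-window counting. A sliding window or token bucket
// would satisfy the same "60s / 120 writes" description.
class WriteRateLimiter {
  private windows = new Map<string, { startedAt: number; count: number }>();

  constructor(
    private readonly windowMs: number = 60000,
    private readonly maxWrites: number = 120
  ) {}

  // Returns true when the write may proceed, false when the caller should
  // answer with a rate-limit response.
  allow(userId: string, nowMs: number): boolean {
    const current = this.windows.get(userId);
    if (!current || nowMs - current.startedAt >= this.windowMs) {
      // First write, or the previous window has expired: start a new one.
      this.windows.set(userId, { startedAt: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.maxWrites) {
      return false;
    }
    current.count += 1;
    return true;
  }
}
```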
| ID | Phase | Severity | Description | Workaround |
|---|---|---|---|---|
Resolved in Phase 17 — batchGeocodeYards() now drives request rate from mapsCostTrackerService.checkBudget(): slows to 200ms-per-call when the soft cap is hit, and aborts with a partial-result return when the hard cap is hit. See docs/MAPS_COST_CONTROL.md. |
||||
POST /api/reminders/check being called periodically — no built-in cron |
Resolved in Phase 12d | |||
| Resolved in v1.1 — see Resolved Issues table | ||||
| KI-004 | 3 | Medium | WhatsApp webhook verification requires the app to be publicly accessible — not possible in local dev | Use ngrok or similar tunnel for local WhatsApp testing |
| KI-005 | 4 | Low | Auto-triage confidence scores are heuristic-based and may misclassify edge cases | Manual override is available; triage tasks created for low-confidence classifications |
| KI-006 | 9 | Info | /api/webhooks/*, /api/n8n/*, and /api/reminders/check intentionally bypass session auth and stay behind the separate N8N_API_KEY check — by design, because n8n calls them server-to-server without a browser session. Phase 14 PR E hardened this: the key gate now FAILS CLOSED in production (HTTP 500) when N8N_API_KEY is unset, instead of silently accepting anonymous traffic. | No action; enforced in middleware.ts via PUBLIC_PATH_PATTERNS + lib/utils/signature.ts#requireN8nApiKey. |
| KI-007 | 14 | Info | In-memory rate limiters (lib/utils/rate-limit.ts) do not share state across horizontally-scaled instances. Acceptable for the single-vet single-VPS deploy shape; promote to Redis when the deploy goes multi-node. | No action required for v1 scale. |
Filed during the Phase Verification Plan audit. See V1_AUDIT_FINDINGS.md for the per-phase evidence tables.
| ID | Phase | Severity | Description | Workaround / Recommendation |
|---|---|---|---|---|
Closed in-audit — guarded 3 exec-bit tests with itPosix helper in __tests__/unit/infra/demo-startup.test.ts; POSIX CI still enforces |
||||
#1e40af (blue) in manifest/layout/globals.css instead of #9b214d (maroon) specified in PHASE_1_MASTER_PROMPT § 1.2 and shown in Logo.png |
Resolved — aligned all four code sites (globals.css, manifest.ts, layout.tsx, RouteMap.tsx) to the spec maroon #9b214d. Added --color-primary-light (#c23b6c) and --color-primary-dark (#6f1738) tints. |
|||
Resolved by PR #17 (Phase 12d) — seed.ts split into production (minimal) + seed-demo.ts (8c/8y/20h/12e) |
||||
/visit-requests route |
Resolved in Phase 14 PR D — added /[locale]/visit-requests list page with status + urgency filters. |
|||
TriageStatus + PlanningStatus + TriageTaskType |
Resolved by docs in Phase 14 PR D — docs/ARCHITECTURE.md now carries an explicit disposition mapping table. |
|||
source, precision, formattedAddress |
Resolved in Phase 14 PR D — added geocodeSource, geocodePrecision, formattedAddress nullable columns via additive migration. |
|||
RouteRun/RouteRunStop used instead of master prompt's RouteProposal/RouteStop |
Resolved by docs in Phase 14 PR D — explicit rename rationale + mapping in docs/ARCHITECTURE.md. |
|||
AppointmentStatus enum instead of separate Booking/Confirmation/Reminder enums |
Resolved by docs in Phase 14 PR D — rationale in docs/ARCHITECTURE.md; multi-send audit now captured by ConfirmationDispatch (AMBER-10). |
|||
| AMBER-09 | 6 | Low | No explicit AppointmentHorse link table; horses inferred from VisitRequest relation | Adequate if per-appointment horse metadata (order, per-horse duration) is not tracked |
ConfirmationDispatch event log |
Resolved in Phase 14 PR D — ConfirmationDispatch table + appointmentAuditService.logConfirmationDispatch; every send attempt (success or failure) recorded. |
|||
AppointmentResponse model |
Resolved in Phase 14 PR D — AppointmentResponse table + appointmentAuditService.logResponse; captures inbound confirm/cancel/reschedule replies linked directly to the appointment. |
|||
ReminderSchedule queue |
Resolved by docs in Phase 14 PR D — inline timestamps + idempotent cron are adequate for single-vet scale; promotion plan documented in docs/ARCHITECTURE.md. |
|||
AppointmentStatusHistory table |
Resolved in Phase 14 PR D — AppointmentStatusHistory table; booking / reschedule / visit-outcome services write history rows in the same transaction as status mutations. |
|||
processedKeys: Set<string>) — lost on restart and not shared across instances |
Resolved by Phase 13 — IdempotencyKey Prisma model + lib/services/idempotency.service.ts (Postgres-backed). hasBeenProcessed/markAsProcessed are now async. Survives restarts, shared across instances, 30-day TTL with pruneExpired() cron. |
|||
maxRetries |
Resolved in Phase 14 PR D — FailedOperation table + deadLetterService. whatsappService and emailService enqueue permanent failures; operators replay via deadLetterService.markStatus. Payloads scrubbed with redact() before storage. |
|||
Retracted — __tests__/unit/utils/retry.test.ts already exists with full coverage |
| ID | Phase | Description | Resolution |
|---|---|---|---|
| KI-002 | 6 | Reminder scheduling had no built-in cron | Added n8n/07-reminder-scheduling.json — n8n workflow triggers GET /api/reminders/check every 15 minutes |
| KI-003 | 7 | PWA offline queue did not retry mutations in submission order and broke on the first failure, blocking subsequent items | v1.1 PR — lib/offline/queue-replay.ts adds an explicit monotonic sequence field per queued record, sorts on replay, drops 4xx/2xx, retains 5xx for retry, and aborts only on a fetch throw (genuine offline). Service worker (app/sw.ts) wires through the helpers; pure logic covered by __tests__/unit/offline/queue-replay.test.ts (13 cases). |
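
The replay rules recorded for KI-003 (sort by sequence, drop 2xx/4xx, retain 5xx, abort only on a network throw) can be sketched as pure logic with the network call abstracted behind a callback. The `QueuedRequest` and `Outcome` shapes and the `replayQueue` signature are assumptions for the example, not the contents of lib/offline/queue-replay.ts; status codes outside 2xx/4xx/5xx are not covered by the entry and are simply dropped here.

```typescript
interface QueuedRequest {
  sequence: number; // monotonic submission order
  url: string;
}

// Hypothetical result of one send attempt: an HTTP response, or a thrown fetch.
type Outcome = { kind: "response"; status: number } | { kind: "network-error" };

function replayQueue(
  queue: QueuedRequest[],
  attempt: (req: QueuedRequest) => Outcome
): { retained: QueuedRequest[]; aborted: boolean } {
  // Replay strictly in submission order, whatever order storage returns.
  const ordered = queue.slice().sort((a, b) => a.sequence - b.sequence);
  const retained: QueuedRequest[] = [];

  for (let i = 0; i < ordered.length; i++) {
    const outcome = attempt(ordered[i]);
    if (outcome.kind === "network-error") {
      // Genuinely offline: stop, and keep this item plus everything after it.
      return { retained: retained.concat(ordered.slice(i)), aborted: true };
    }
    if (outcome.status >= 500) {
      retained.push(ordered[i]); // server-side failure: keep for a later retry
    }
    // 2xx (delivered) and 4xx (a retry will not help) are dropped, and the
    // loop continues so one failure no longer blocks the rest of the queue.
  }
  return { retained, aborted: false };
}
```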
- Log issues discovered during development or UAT here
- Include phase, severity (low/medium/high/critical), and description
- Add workaround if available
- Move to Resolved section when fixed, with resolution notes
- Remove from Resolved after one release cycle