| 1 | +# WhatsApp Evaluation — Decision Document |
| 2 | + |
| 3 | +Status: draft |
| 4 | +Date: 2026-05-03 |
| 5 | + |
| 6 | +Summary |
| 7 | +- Goal: evaluate WhatsApp as a messaging surface for GAIA and recommend a Phase 1 path (integrate, not ready, or defer). |
| 8 | +- Short recommendation: For Phase 1, pursue the WhatsApp Business Cloud API via a partner (Twilio or 360dialog) **only if** we can justify the cost and business verification friction; otherwise defer WhatsApp until post-v0.20.0 and focus on Telegram for Phase 0. |
| 9 | + |
| 10 | +Integration paths (concrete pros/cons) |
| 11 | + |
| 12 | +1) WhatsApp Business Cloud API (Meta-hosted) |
| 13 | + - Pros: |
| 14 | + - Official, supported API with predictable behaviour and low ban risk. |
| 15 | + - Scales to many users and supports media messages, templates, and webhooks. |
| 16 | + - Legal / terms-of-service alignment - suitable for production. |
| 17 | + - Cons: |
| 18 | + - Requires Business Manager verification, a phone number, and template approval for outbound messages. |
| 19 | + - Initial outbound messages are template-gated (session vs template model): outside the 24-hour window that follows a user's message, only pre-approved templates can be sent, which constrains UX for unsolicited streaming or voice-first flows. |
| 20 | + - Metadata and message content transit Meta servers - privacy implications. |
| 21 | + |
| 22 | +2) WhatsApp Business API via partners (Twilio, 360dialog, MessageBird) |
| 23 | + - Pros: |
| 24 | + - Polished onboarding, SDKs, billing, and delivery guarantees; some partners reduce Meta verification friction. |
| 25 | + - Single integration point with enterprise-grade features (message queues, retries, logging). |
| 26 | + - Cons: |
| 27 | + - Adds third-party vendor costs and another privacy/metadata recipient. |
| 28 | + - Still subject to Meta template rules; a partner wraps the Meta API but does not remove its underlying constraints. |
| 29 | + |
| 30 | +3) whatsapp-web.js (community library driving WhatsApp Web) |
| 31 | + - Pros: |
| 32 | + - Works with personal accounts, quick to prototype, free software (MIT). |
| 33 | + - Enables features not available via Business API (ad-hoc messages, voice notes from personal threads) and can be run locally. |
| 34 | + - Cons: |
| 35 | + - Violates WhatsApp Terms of Service for automated non-official clients; accounts are commonly banned. |
| 36 | + - Unreliable long-term: session invalidation, frequent breakages when WhatsApp updates Web protocol. |
| 37 | + - Metadata still goes to Meta (via Web client) - plus our client holds session credentials. |
| 38 | + |
| 39 | +4) Baileys (reverse-engineered protocol) |
| 40 | + - Pros: low-level control, performant, community-maintained, avoids Puppeteer overhead. |
| 41 | + - Cons: same TOS risk as whatsapp-web.js; higher maintenance; frequent breakage after protocol changes. |
| 42 | + |
| 43 | +Privacy posture (which servers see message data / metadata) |
| 44 | +- Business Cloud API: Message payloads and metadata flow through Meta Cloud API endpoints (Meta-managed infra). If using a partner, message copies / delivery metadata also flow through the partner's servers. |
| 45 | +- Partner (Twilio/360dialog): Partner sees metadata and possibly message bodies (depending on partner model). They store logs/delivery receipts per contract. |
| 46 | +- whatsapp-web.js / Baileys: Messages are end-to-end encrypted between participants but are relayed through Meta's servers (as with WhatsApp Web). Our local client holds the session keys and can therefore read message content; Meta still sees metadata (device/IP). Because these are unofficial clients, WhatsApp monitors for automation patterns and may ban accounts. |
| 47 | + |
| 48 | +Cost model (production) |
| 49 | +- Business Cloud API (Meta): Meta pricing varies by region and often includes a free tier for limited volumes; template messages may incur per-message charges in some regions. Expect a low per-message cost, but budget for operational overhead (verification) and for per-message charges on large campaigns. |
| 50 | +- Partner (Twilio): Typical model is per-message + monthly number fees + optional throughput/queueing. Budget: modest monthly fixed + per-message variable (example: $0.005–$0.05/msg depending on region and message type). |
| 51 | +- whatsapp-web.js / Baileys: Zero direct provider fees, but higher maintenance cost and risk (replace suspended accounts, dev time). Not viable for reliable production. |
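To make the partner budget above concrete, here is a minimal sketch of the arithmetic: fixed monthly fees plus a per-message charge, evaluated at both ends of the example $0.005 to $0.05 range. The function name, the `CostRange` type, and the fixed-fee figure in the usage lines are hypothetical; real partner price lists would replace them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRange:
    """Low and high monthly cost estimates in USD."""
    low: float
    high: float


def monthly_partner_cost(messages_per_month: int,
                         fixed_monthly_fee: float,
                         per_msg_low: float = 0.005,
                         per_msg_high: float = 0.05) -> CostRange:
    """Estimate monthly cost as fixed fee + per-message charge.

    The default per-message bounds are the example range from this
    document, not a quoted partner price.
    """
    if messages_per_month < 0:
        raise ValueError("messages_per_month must be non-negative")
    return CostRange(
        low=fixed_monthly_fee + messages_per_month * per_msg_low,
        high=fixed_monthly_fee + messages_per_month * per_msg_high,
    )


if __name__ == "__main__":
    # Hypothetical inputs: 10,000 messages/month and a $15 fixed fee.
    estimate = monthly_partner_cost(10_000, fixed_monthly_fee=15.0)
    print(f"${estimate.low:.2f} to ${estimate.high:.2f} per month")
```

The spread between the low and high bounds is a factor of ten at volume, which is why region and message type should be pinned down before committing to a budget.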
| 52 | + |
| 53 | +UX implications vs Telegram (side-by-side) |
| 54 | +- Message templates + session model (WhatsApp Business) make unsolicited streaming (e.g., long TTS or progressive voice notes) awkward: initial outbound must be a pre-approved template in many cases, while Telegram allows free-form message delivery and bot-initiated rich interactions. |
| 55 | +- Voice notes: WhatsApp supports voice messages natively, but streaming partial audio to render progressive playback is not supported by Business Cloud API — we'd need to upload final audio or use ephemeral media sessions. Telegram supports bots sending audio and streaming-like behaviours more easily. |
| 56 | +- Presence and discoverability: WhatsApp relies on phone numbers and a business profile; Telegram bots have usernames and deep-linking that are easier for consumer discovery and opt-in. |
| 57 | + |
| 58 | +Recommended Phase 1 path and rationale |
| 59 | +- Recommendation: Defer full WhatsApp integration until post-v0.20.0 unless there is a funded business case requiring WhatsApp immediately. If we must ship Phase 1, target the **WhatsApp Business Cloud API via a partner (Twilio or 360dialog)** as the implementation path for production use. Rationale: |
| 60 | + - Official APIs + partner tooling reduce ban/legal risk and provide production SLAs. |
| 61 | + - Although template gating reduces some UX flexibility, a partner + careful conversation design can deliver acceptable UX for business use-cases (support agents, notifications, private assistant flows initiated by users). |
| 62 | + - Avoid community drivers for Phase 1 due to high ban risk and maintenance burden. |
| 63 | + |
| 64 | +Sample conversation (Business Cloud via partner) |
| 65 | +User -> GAIA: "Hey, summarize my travel plan for May 11" |
| 66 | +Partner -> GAIA (incoming webhook): the agent resolves the intent and GAIA replies with a session message (no template is needed because the user initiated the conversation within the last 24h). If GAIA needs to message proactively later, it uses an approved template: "Your travel summary is ready: [link]". |
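The session-versus-template rule in the sample conversation can be sketched as a small decision function. This is an illustration only, assuming the adapter stores the timestamp of the user's most recent inbound message; `choose_message_kind` and `SESSION_WINDOW` are hypothetical names, not part of any partner SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# 24-hour window after the user's last message, as described above.
SESSION_WINDOW = timedelta(hours=24)


def choose_message_kind(last_user_message_at: Optional[datetime],
                        now: datetime) -> str:
    """Return "session" when a free-form reply is allowed, else "template"."""
    if last_user_message_at is None:
        # The user has never written to us: only a template can be sent.
        return "template"
    if now - last_user_message_at < SESSION_WINDOW:
        return "session"
    return "template"


if __name__ == "__main__":
    now = datetime(2026, 5, 3, 12, 0, tzinfo=timezone.utc)
    print(choose_message_kind(now - timedelta(hours=2), now))
    print(choose_message_kind(now - timedelta(hours=30), now))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule easy to test.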
| 67 | + |
| 68 | +Setup steps (partner-assisted Phase 1) |
| 69 | +1. Create Meta Business Manager, complete verification (documented; can take days). |
| 70 | +2. Provision WhatsApp Business account and phone number (via Twilio/360dialog). |
| 71 | +3. Configure webhooks to point at GAIA's API server (TLS endpoint) and map partner events to the `gaia` adapter. |
| 72 | +4. Implement conversation mapping: webhook -> adapter -> agent -> reply via partner API. |
| 73 | +5. Submit required message templates for proactive messages. |
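As a rough illustration of the webhook -> adapter step, the sketch below normalizes an inbound notification into a flat message record that an agent could consume. It assumes the payload follows the nested `entry` / `changes` / `value` / `messages` shape used by Meta Cloud API webhook notifications. Partners may deliver a different format (Twilio, for example, uses its own webhook schema), so the mapping would need to be adapted per partner. `InboundMessage` and `extract_text_messages` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class InboundMessage:
    """Normalized text message handed from the adapter to the agent."""
    sender: str       # sender's phone number as reported in the payload
    message_id: str   # provider message id, useful for deduplication
    text: str


def extract_text_messages(payload: Dict[str, Any]) -> List[InboundMessage]:
    """Collect text messages from a Cloud-API-shaped webhook payload.

    Non-text messages (audio, images, status updates) are skipped here;
    a fuller adapter would map them to their own record types.
    """
    messages: List[InboundMessage] = []
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            value = change.get("value", {})
            for msg in value.get("messages", []):
                if msg.get("type") != "text":
                    continue
                messages.append(InboundMessage(
                    sender=msg["from"],
                    message_id=msg["id"],
                    text=msg["text"]["body"],
                ))
    return messages
```

Keeping the adapter's output independent of the provider payload means the agent side does not change if the partner is swapped later.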
| 74 | + |
| 75 | +Phase 1 investigation tasks & effort estimate |
| 76 | +- Spike A — whatsapp-web.js prototype (safety spike): 2–3d engineer. Deliverable: short-running prototype, documented ban-rate evidence (public threads & maintainers), and a decision note explaining operational risk. |
| 77 | +- Spike B — Business Cloud API free-tier onboarding: 3–5d engineer (mainly procedural). Deliverable: documented steps, expected Meta verification time, sample webhook round-trip, template submission trial. |
| 78 | +- Design task — UX mapping vs Telegram: 1–2d PM/Designer to map conversational flows and constraints (template gating, voice notes, streaming). Deliverable: side-by-side UX matrix and concrete recommendations. |
| 79 | +- Integration spec — Partner integration adapter + auth + message mapping: 3–5d engineer to draft Phase 1 implementation PR scope and API shapes. |
| 80 | +- Total Phase 1 investigation estimate: 2–3 weeks (one engineer + PM part-time). |
| 81 | + |
| 82 | +Risks & mitigations |
| 83 | +- Account bans (whatsapp-web.js/Baileys): mitigation -> avoid community drivers for Phase 1. |
| 84 | +- Long Meta verification times: mitigation -> use partner onboarding guidance and start process early, budget for delays. |
| 85 | +- Privacy concerns: mitigation -> document data flows, use opt-in flows, keep logging minimal, and review partner contracts. |
| 86 | + |
| 87 | +Decision outcome & next steps |
| 88 | +- Primary choice: defer by default, or build on the Business Cloud API via a partner if the project has commercial justification. Phase 1 should focus on the investigation spikes, followed by a partner-backed adapter only where that justification exists. |
| 89 | +- Next steps (Phase 1): run Spike A and Spike B in parallel, finalize integration spec, then open implementation PR scoped to partner adapter + tests + docs. |
| 90 | + |
| 91 | +Links and references |
| 92 | +- WhatsApp Business Cloud API: https://developers.facebook.com/docs/whatsapp |
| 93 | +- whatsapp-web.js: https://github.com/pedroslopez/whatsapp-web.js |
| 94 | +- Baileys: https://github.com/WhiskeySockets/Baileys |