Commit caaf64a

committed
docs(spec): add WhatsApp messaging adapter evaluation and decision document
Signed-off-by: theonlychant <sacehenry@gmail.com>
1 parent 63e4995 commit caaf64a

7 files changed

Lines changed: 2393 additions & 0 deletions

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
1+
# Phase 1: WhatsApp Integration — Implementation Proposal
2+
3+
Status: draft
4+
Related: docs/spec/whatsapp-evaluation.md, issue #889 (Telegram Phase 0)
5+
6+
Objective
7+
- Produce a scoped Phase 1 implementation that validates feasibility and produces a production-ready adapter design for WhatsApp using the chosen path.
8+
9+
Decision (from evaluation)
10+
- Preferred production path: WhatsApp Business Cloud API via a partner (Twilio or 360dialog). If there is no commercial funding or business justification, defer until after v0.20.0.
11+
12+
Scope (what this PR will do)
13+
- Implement a `whatsapp` messaging adapter for GAIA that:
14+
- Registers a partner-backed transport (Twilio/360dialog) with config-driven credentials.
15+
- Translates incoming webhooks into GAIA adapter events (message, media, reaction, read receipts).
16+
- Sends replies via partner API including media uploads.
17+
- Handles template message sending for proactive outbound workflows.
18+
- Includes unit tests for adapter mapping and an integration smoke test (manual-run) guide.
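
The webhook-to-event translation in the scope list could look roughly like the sketch below. It assumes the Twilio path and Twilio's form-encoded webhook fields (`From`, `Body`, `MessageSid`, `NumMedia`, `MediaUrl0`, `MediaContentType0`); the GAIA event shape (`channel`, `kind`, `from`, `text`, `media`, `providerMessageId`) and the function name are hypothetical and not taken from the GAIA codebase.

```javascript
// Sketch: map a Twilio-style WhatsApp webhook (parsed form fields) to a
// hypothetical GAIA adapter event. The event shape is an assumption.
function mapTwilioWebhookToEvent(params) {
  const mediaCount = Number.parseInt(params.NumMedia || '0', 10) || 0;
  const media = [];
  for (let i = 0; i < mediaCount; i += 1) {
    media.push({
      url: params[`MediaUrl${i}`],
      contentType: params[`MediaContentType${i}`],
    });
  }
  return {
    channel: 'whatsapp',
    kind: media.length > 0 ? 'media' : 'message',
    // Twilio prefixes WhatsApp addresses with "whatsapp:".
    from: String(params.From || '').replace(/^whatsapp:/, ''),
    text: params.Body || '',
    media,
    providerMessageId: params.MessageSid || null,
  };
}

// Example payload (all values are placeholders).
const event = mapTwilioWebhookToEvent({
  From: 'whatsapp:+15550001111',
  Body: 'Hey, summarize my travel plan for May 11',
  MessageSid: 'SM00000000000000000000000000000000',
  NumMedia: '0',
});
console.log(event.kind, event.from);
```

Keeping the mapping a pure function makes it straightforward to cover with the unit tests listed above, without a partner account.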
19+
20+
Out of scope
21+
- Supporting community drivers (`whatsapp-web.js`, Baileys) in the same PR; these remain experimental spikes.
22+
- Automating Meta Business verification or partner account creation.
23+
24+
Phase 1 tasks (concrete)
25+
1) Onboarding docs & credentials (owner: PM) - start Meta Business verification and partner account setup; document expected timelines. (3–7d external)
26+
2) Adapter skeleton (owner: eng) - implement webhook handling, auth, and event mappings; unit tests. (3–5d)
27+
3) Template workflow (owner: eng) - implement template send path and test harness for approved templates. (2–4d)
28+
4) Media handling (owner: eng) - media upload/download flow via partner. (2–4d)
29+
5) Integration spec & docs (owner: eng/pm) - link to docs/spec/whatsapp-evaluation.md, configuration docs, and a runbook for rate limiting and errors. (2d)
30+
6) Spikes (parallel):
31+
- Spike A: whatsapp-web.js prototype (safety spike) - 2–3d, document ban-rate and operational risks.
32+
- Spike B: Business Cloud API free-tier onboarding trial - 3–5d, document verification friction and webhook round-trip.
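
For the template workflow in task 3, a minimal sketch of a template message body follows. It assumes the JSON shape used by the Cloud API messages endpoint (`messaging_product`, `type: 'template'`, `template.name`, `template.language.code`, `components`); a partner may wrap or replace this format, and the template name `travel_summary_ready` is a made-up example.

```javascript
// Sketch: build a Cloud API-style template message body.
// Partners may expose a different request format; treat this as illustrative.
function buildTemplateMessage({ to, templateName, languageCode = 'en_US', bodyParams = [] }) {
  if (!to || !templateName) {
    throw new TypeError('to and templateName are required');
  }
  const template = { name: templateName, language: { code: languageCode } };
  if (bodyParams.length > 0) {
    // Each body parameter fills one {{n}} placeholder of the approved template.
    template.components = [
      {
        type: 'body',
        parameters: bodyParams.map((text) => ({ type: 'text', text: String(text) })),
      },
    ];
  }
  return { messaging_product: 'whatsapp', to, type: 'template', template };
}

// Hypothetical template name and recipient.
const payload = buildTemplateMessage({
  to: '15550001111',
  templateName: 'travel_summary_ready',
  bodyParams: ['https://example.invalid/summary'],
});
console.log(JSON.stringify(payload, null, 2));
```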
33+
34+
Acceptance criteria
35+
- Adapter code checked in under `src/gaia/` with unit tests.
36+
- Documentation updated: `docs/spec/whatsapp-evaluation.md` and adapter README with setup steps.
37+
- Spike reports for A/B filed under `docs/spikes/whatsapp-webjs.md` and `docs/spikes/whatsapp-cloudapi.md` (manual deliverables).
38+
39+
Risks
40+
- Long external wait times (Meta verification) - mitigate by starting verification early and running local integration tests with partner sandbox numbers.
41+
- Privacy and contractual review is required before storing message logs - involve legal.
42+
43+
Estimates
44+
- Implementation (adapter + tests + docs): 2–3 engineer-weeks.
45+
- Spikes & onboarding: 1–2 engineer-weeks (parallelizable).
46+
47+
Next steps
48+
- Confirm path (partner vs. defer). If confirmed, assign engineer and PM, start partner onboarding, and open a scoped implementation PR with the adapter skeleton.

docs/spec/whatsapp-evaluation.md

Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
1+
# WhatsApp Evaluation — Decision Document
2+
3+
Status: draft
4+
Date: 2026-05-03
5+
6+
Summary
7+
- Goal: evaluate WhatsApp as a messaging surface for GAIA and recommend a Phase 1 path (integration/not-ready/deferral).
8+
- Short recommendation: For Phase 1, pursue the WhatsApp Business Cloud API via a partner (Twilio or 360dialog) **only if** we can justify the cost and business verification friction; otherwise defer WhatsApp until post-v0.20.0 and focus on Telegram for Phase 0.
9+
10+
Integration paths (concrete pros/cons)
11+
12+
1) WhatsApp Business Cloud API (Meta-hosted)
13+
- Pros:
14+
- Official, supported API with predictable behaviour and low ban risk.
15+
- Scales to many users and supports media messages, templates, and webhooks.
16+
- Legal / terms-of-service alignment - suitable for production.
17+
- Cons:
18+
- Requires Business Manager verification, a phone number, and template approval for outbound messages.
19+
- Business-initiated conversations must open with a pre-approved template; free-form replies are only allowed within 24h of the user's last message (session vs template model). This constrains UX for unsolicited streaming or voice-first flows.
20+
- Metadata and message content transit Meta servers - privacy implications.
21+
22+
2) WhatsApp Business API via partners (Twilio, 360dialog, MessageBird)
23+
- Pros:
24+
- Polished onboarding, SDKs, billing, and delivery guarantees; some partners reduce Meta verification friction.
25+
- Single integration point with enterprise-grade features (message queues, retries, logging).
26+
- Cons:
27+
- Adds third-party vendor costs and another privacy/metadata recipient.
28+
- Still subject to Meta template rules; a partner does not remove the underlying Meta platform constraints.
29+
30+
3) whatsapp-web.js (community library driving WhatsApp Web)
31+
- Pros:
32+
- Works with personal accounts, quick to prototype, free software (MIT).
33+
- Enables features not available via Business API (ad-hoc messages, voice notes from personal threads) and can be run locally.
34+
- Cons:
35+
- Violates WhatsApp Terms of Service for automated non-official clients; accounts are commonly banned.
36+
- Unreliable long-term: session invalidation, frequent breakages when WhatsApp updates Web protocol.
37+
- Metadata still goes to Meta (via Web client) - plus our client holds session credentials.
38+
39+
4) Baileys (reverse-engineered protocol)
40+
- Pros: low-level control, performant, community-maintained, avoids Puppeteer overhead.
41+
- Cons: same TOS risk as whatsapp-web.js; higher maintenance; frequent breakage after protocol changes.
42+
43+
Privacy posture (which servers see message data / metadata)
44+
- Business Cloud API: Message payloads and metadata flow through Meta Cloud API endpoints (Meta-managed infra). If using a partner, message copies / delivery metadata also flow through the partner's servers.
45+
- Partner (Twilio/360dialog): Partner sees metadata and possibly message bodies (depending on partner model). They store logs/delivery receipts per contract.
46+
- whatsapp-web.js / Baileys: Messages are end-to-end encrypted between participants but relayed through Meta's servers (the WhatsApp Web channel). Our local client holds session keys; Meta still sees metadata (device/IP) and handles it per its privacy policy. Because these are unofficial clients, WhatsApp monitors for automation patterns and may ban accounts.
47+
48+
Cost model (production)
49+
- Business Cloud API (Meta): Meta pricing varies and often includes a free tier for limited messages; template messages may carry per-message charges in some regions. Expect a low per-message cost plus operational overhead (verification), with per-message charges adding up for large campaigns.
50+
- Partner (Twilio): Typical model is per-message + monthly number fees + optional throughput/queueing. Budget: modest monthly fixed + per-message variable (example: $0.005–$0.05/msg depending on region and message type).
51+
- whatsapp-web.js / Baileys: Zero direct provider fees, but higher maintenance cost and risk (replace suspended accounts, dev time). Not viable for reliable production.
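
To make the partner cost range concrete, the arithmetic can be sketched as below. The per-message bounds come from the example range above ($0.005 to $0.05); the monthly volume and the fixed monthly fee in the usage line are placeholder assumptions, not quotes from any partner.

```javascript
// Per-message range taken from the example in the cost model above.
const PER_MESSAGE_USD = { low: 0.005, high: 0.05 };

// Returns a low/high monthly estimate in USD, rounded to cents.
function estimateMonthlyCost(messagesPerMonth, fixedMonthlyUsd = 0) {
  const roundToCents = (x) => Math.round(x * 100) / 100;
  return {
    low: roundToCents(fixedMonthlyUsd + messagesPerMonth * PER_MESSAGE_USD.low),
    high: roundToCents(fixedMonthlyUsd + messagesPerMonth * PER_MESSAGE_USD.high),
  };
}

// Assumed inputs: 10,000 messages/month and a $15 fixed monthly fee.
console.log(estimateMonthlyCost(10000, 15));
```

With those assumed inputs the estimate spans $65 to $515 per month, which shows how strongly region and message type drive the budget.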
52+
53+
UX implications vs Telegram (side-by-side)
54+
- Message templates + session model (WhatsApp Business) make unsolicited streaming (e.g., long TTS or progressive voice notes) awkward: initial outbound must be a pre-approved template in many cases, while Telegram allows free-form message delivery and bot-initiated rich interactions.
55+
- Voice notes: WhatsApp supports voice messages natively, but streaming partial audio to render progressive playback is not supported by Business Cloud API — we'd need to upload final audio or use ephemeral media sessions. Telegram supports bots sending audio and streaming-like behaviours more easily.
56+
- Presence and discoverability: WhatsApp relies on phone numbers and a business profile; Telegram bots have usernames and deep-linking that are easier for consumer discovery and opt-in.
57+
58+
Recommended Phase 1 path and rationale
59+
- Recommendation: Defer full WhatsApp integration until post-v0.20.0 unless there is a funded business case requiring WhatsApp immediately. If we must ship Phase 1, target the **WhatsApp Business Cloud API via a partner (Twilio or 360dialog)** as the implementation path for production-grade users. Rationale:
60+
- Official APIs + partner tooling reduce ban/legal risk and provide production SLAs.
61+
- Although template gating reduces some UX flexibility, a partner + careful conversation design can deliver acceptable UX for business use-cases (support agents, notifications, private assistant flows initiated by users).
62+
- Avoid community drivers for Phase 1 due to high ban risk and maintenance burden.
63+
64+
Sample conversation (Business Cloud via partner)
65+
User -> GAIA: "Hey, summarize my travel plan for May 11"
66+
GAIA -> (incoming webhook) -> agent resolves intent and replies as a session message (no template needed if user initiated within 24h). If GAIA needs to proactively message later, it uses an approved template: "Your travel summary is ready: [link]".
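
The session-versus-template choice in this conversation can be expressed as a small decision function. This is a sketch under the 24h rule stated above; the function name and return values are hypothetical.

```javascript
// 24-hour window after the user's last inbound message.
const SESSION_WINDOW_MS = 24 * 60 * 60 * 1000;

// Returns 'session' when a free-form reply is allowed, otherwise 'template'.
function chooseSendMode(lastInboundAt, now = new Date()) {
  if (!lastInboundAt) return 'template'; // user has never messaged us
  const elapsed = now.getTime() - new Date(lastInboundAt).getTime();
  // Invalid dates yield NaN, which fails both comparisons and falls back to 'template'.
  return elapsed >= 0 && elapsed < SESSION_WINDOW_MS ? 'session' : 'template';
}

// User wrote at 08:00 UTC; GAIA replies 12 hours later, then 25 hours later.
console.log(chooseSendMode('2026-05-11T08:00:00Z', new Date('2026-05-11T20:00:00Z')));
console.log(chooseSendMode('2026-05-11T08:00:00Z', new Date('2026-05-12T09:00:00Z')));
```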
67+
68+
Setup steps (partner-assisted Phase 1)
69+
1. Create Meta Business Manager, complete verification (documented; can take days).
70+
2. Provision WhatsApp Business account and phone number (via Twilio/360dialog).
71+
3. Configure webhooks to GAIA's API server (TLS endpoint) and map partner events to the GAIA `whatsapp` adapter.
72+
4. Implement conversation mapping: webhook -> adapter -> agent -> reply via partner API.
73+
5. Submit required message templates for proactive messages.
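
Step 4 (webhook -> adapter -> agent -> reply) can be sketched as one async function with its dependencies injected. The names `handleWebhook`, `toEvent`, `runAgent`, and `sendReply` are hypothetical; the real GAIA agent interface and partner client are not shown in this document.

```javascript
// Sketch of the webhook -> adapter -> agent -> reply flow.
// All three dependencies are hypothetical and injected so the flow
// can be exercised without a partner account.
async function handleWebhook(rawPayload, { toEvent, runAgent, sendReply }) {
  const event = toEvent(rawPayload); // adapter: partner payload -> GAIA event
  if (!event) return { handled: false }; // e.g. payloads the adapter ignores
  const replyText = await runAgent(event); // agent resolves intent
  await sendReply({ to: event.from, text: replyText }); // partner API call
  return { handled: true, to: event.from };
}

// Usage with in-memory stubs standing in for the adapter, agent, and transport.
const sent = [];
const stubs = {
  toEvent: (payload) => (payload.text ? { from: payload.from, text: payload.text } : null),
  runAgent: async (event) => `Echo: ${event.text}`,
  sendReply: async (message) => {
    sent.push(message);
  },
};

const done = handleWebhook({ from: '+15550001111', text: 'hi' }, stubs).then((result) => {
  console.log(result, sent);
  return result;
});
```

Injecting the transport keeps the partner choice (Twilio vs 360dialog) out of the flow itself, so swapping partners should only touch `toEvent` and `sendReply`.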
74+
75+
Phase 1 investigation tasks & effort estimate
76+
- Spike A — whatsapp-web.js prototype (safety spike): 2–3d engineer. Deliverable: short-running prototype, documented ban-rate evidence (public threads & maintainers), and a decision note explaining operational risk.
77+
- Spike B — Business Cloud API free-tier onboarding: 3–5d engineer (mainly procedural). Deliverable: documented steps, expected Meta verification time, sample webhook round-trip, template submission trial.
78+
- Design task — UX mapping vs Telegram: 1–2d PM/Designer to map conversational flows and constraints (template gating, voice notes, streaming). Deliverable: side-by-side UX matrix and concrete recommendations.
79+
- Integration spec — Partner integration adapter + auth + message mapping: 3–5d engineer to draft Phase 1 implementation PR scope and API shapes.
80+
- Total Phase 1 investigation estimate: 2–3 weeks (one engineer + PM part-time).
81+
82+
Risks & mitigations
83+
- Account bans (whatsapp-web.js/Baileys): mitigation -> avoid community drivers for Phase 1.
84+
- Long Meta verification times: mitigation -> use partner onboarding guidance and start process early, budget for delays.
85+
- Privacy concerns: mitigate via docs, opt-in flows, minimal logging, and contractual review with partners.
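
One possible reading of "minimal logging" is to mask phone numbers before a log line is written. The helper below is a sketch, not an agreed policy: the function name, the mask format, and the 8 to 15 digit heuristic (E.164 numbers have at most 15 digits) are all assumptions.

```javascript
// Sketch: mask digit runs that look like phone numbers, keeping the
// last two digits so log lines can still be correlated during debugging.
function redactPhoneNumbers(text) {
  return String(text).replace(/\+?\d{8,15}/g, (match) => {
    const digits = match.replace(/\D/g, '');
    return `[phone:***${digits.slice(-2)}]`;
  });
}

console.log(redactPhoneNumbers('IN 15550001111@c.us hello'));
console.log(redactPhoneNumbers('reply sent to +15550001111'));
```

ISO timestamps pass through unchanged because they contain no run of eight or more digits.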
86+
87+
Decision outcome & next steps
88+
- Primary choice: Defer or build on Business Cloud API via partner. Phase 1 should focus on investigation spikes and a partner-backed adapter if the project has commercial justification.
89+
- Next steps (Phase 1): run Spike A and Spike B in parallel, finalize integration spec, then open implementation PR scoped to partner adapter + tests + docs.
90+
91+
Links and references
92+
- WhatsApp Business Cloud API: https://developers.facebook.com/docs/whatsapp
93+
- whatsapp-web.js: https://github.com/pedroslopez/whatsapp-web.js
94+
- Baileys: https://github.com/WhiskeySockets/Baileys

docs/spikes/whatsapp-cloudapi.md

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
# Spike B - WhatsApp Business Cloud API onboarding trial
2+
3+
Status: in-progress
4+
Owner: engineering / PM
5+
Start date: 2026-05-03
6+
7+
Objective
8+
- Attempt the Business Cloud API free-tier onboarding and verify webhook round-trip, template submission, and expected Meta business verification friction.
9+
10+
Success criteria
11+
- Able to register a test WhatsApp Business Account (or provision via partner sandbox), register a webhook, and send/receive a test message.
12+
- Document time-to-verify for Business Manager and template approval steps.
13+
14+
How to run the trial (high level)
15+
1. Create a Meta Business Manager account and begin verification (requires legal company info).
16+
2. Option A: Use direct Business Cloud API sandbox (if available) — follow Meta docs.
17+
3. Option B (recommended for Phase 1): Provision a partner sandbox (Twilio/360dialog) with a test number and follow their webhook docs.
18+
4. Configure a TLS endpoint to receive webhooks from partner / Meta and map to GAIA adapter.
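
For Option A (direct Cloud API), the webhook endpoint in step 4 needs two checks: the one-time verification handshake and a per-request signature check. The sketch below assumes Meta's documented `hub.mode` / `hub.verify_token` / `hub.challenge` query parameters and the `X-Hub-Signature-256` header (HMAC-SHA256 of the raw body, keyed with the app secret). Partner webhooks (Option B) use their own schemes, so this does not apply to them as written. Function names and secrets are placeholders.

```javascript
const crypto = require('node:crypto');

// Handshake: Meta sends a GET request and expects hub.challenge echoed back.
function handleVerification(query, expectedToken) {
  if (query['hub.mode'] === 'subscribe' && query['hub.verify_token'] === expectedToken) {
    return { status: 200, body: String(query['hub.challenge'] ?? '') };
  }
  return { status: 403, body: 'Forbidden' };
}

// Signature check for POST deliveries. rawBody must be the unparsed request body.
function isValidSignature(rawBody, signatureHeader, appSecret) {
  const expected =
    'sha256=' + crypto.createHmac('sha256', appSecret).update(rawBody).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(String(signatureHeader || ''));
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// Placeholder token and challenge values.
const handshake = handleVerification(
  { 'hub.mode': 'subscribe', 'hub.verify_token': 'test-token', 'hub.challenge': '12345' },
  'test-token',
);
console.log(handshake);
```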
19+
20+
Notes
21+
- Meta verification can take days; partners often streamline onboarding and provide sandbox/test numbers.
22+
- Templates must be submitted and approved for proactive outbound messages; this step can also take time.
23+
24+
Deliverables
25+
- `docs/spikes/whatsapp-cloudapi.md` (this doc, updated with findings)
26+
- Short onboarding log with screenshots and time estimates.

docs/spikes/whatsapp-webjs.md

Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
# Spike A - whatsapp-web.js prototype
2+
3+
Status: in-progress
4+
Owner: engineering
5+
Start date: 2026-05-03
6+
7+
Objective
8+
- Build a minimal prototype using `whatsapp-web.js` to confirm the connection flow, message send/receive, and media handling, and to measure ban/instability signals over short runs.
9+
10+
Success criteria
11+
- Can connect to a personal WhatsApp account and receive/send messages programmatically.
12+
- Produce a short log of breakages, session invalidations, or bans over a 48–72h window (if feasible).
13+
- Produce a brief operational risk note summarizing public ban-rate signals from maintainers and GitHub issues.
14+
15+
How to run the prototype (manual steps)
16+
1. Prepare a dedicated test phone + WhatsApp account (not a primary personal account).
17+
2. On a machine with Node.js (18+), run:
18+
19+
```bash
cd experiments/whatsapp-webjs
npm install
node index.js
```
24+
25+
3. Scan the QR code with the test phone (the script prints a QR to the terminal).
26+
4. Send messages to the account and observe logs.
27+
28+
Initial run notes
29+
- The prototype started and printed the QR code to the terminal; once the QR was scanned with the test phone, the script accepted the handshake payload. The process was interrupted with Ctrl+C (exit code 130) during the manual run - see `experiments/whatsapp-webjs/run.log` for recorded events.
30+
- Puppeteer required native Chromium libraries; on Debian/Ubuntu install `libnss3`, `libnspr4`, `libgtk-3-0`, etc. before running.
31+
32+
Next steps for the spike
33+
- Keep the prototype running on a sacrificial test account for 24–72h and record any `disconnected`, `auth_failure`, or account ban signals in `run.log`.
34+
- Collect public evidence: search maintainer issues and community threads for ban-rate anecdotes and link findings in this doc.
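
To turn `run.log` into the instability summary this spike asks for, the events can be counted with a short script. The sketch assumes the line format written by the prototype's `writeLog` (`[ISO timestamp] EVENT <name> ...`); the sample log text below is illustrative only and is not data from an actual run.

```javascript
// Count EVENT lines per event name in run.log-style text.
function summarizeRunLog(logText) {
  const counts = {};
  const lineRe = /^\[([^\]]+)\] EVENT (\S+)/;
  for (const line of logText.split('\n')) {
    const match = lineRe.exec(line);
    if (match) {
      counts[match[2]] = (counts[match[2]] || 0) + 1;
    }
  }
  return counts;
}

// Illustrative sample, NOT real run data.
const sampleLog = [
  '[2026-05-03T10:00:00.000Z] EVENT qr',
  '[2026-05-03T10:00:20.000Z] EVENT authenticated',
  '[2026-05-03T10:00:25.000Z] EVENT ready',
  '[2026-05-03T10:05:00.000Z] IN 15550001111@c.us hello',
  '[2026-05-03T18:30:00.000Z] EVENT disconnected NAVIGATION',
].join('\n');

console.log(summarizeRunLog(sampleLog));
```

For a real run, pass the contents of the log file instead, for example `summarizeRunLog(require('node:fs').readFileSync('run.log', 'utf8'))`.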
35+
36+
Notes and warnings
37+
- This uses an unofficial client that automates WhatsApp Web; using it for automation is against WhatsApp's TOS and accounts do get banned. Use a sacrificial test account and do not use production / personal numbers.
38+
- Keep runs short and document any account suspensions. Do not publish account credentials.
39+
40+
Deliverables
41+
- `experiments/whatsapp-webjs` prototype scaffolding (index.js, package.json).
42+
- `docs/spikes/whatsapp-webjs.md` — this doc (updated with findings).
43+
- Short report capturing observed instability and references to public threads.
Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
const { Client, LocalAuth } = require('whatsapp-web.js');
const qrcode = require('qrcode-terminal');
const fs = require('fs');
const path = require('path');

const LOG_PATH = path.resolve(__dirname, 'run.log');

// Write a timestamped line to stdout and append it to run.log.
function writeLog(...parts) {
  const line = `[${new Date().toISOString()}] ${parts.join(' ')}\n`;
  process.stdout.write(line);
  try {
    fs.appendFileSync(LOG_PATH, line);
  } catch (e) {
    // Logging is best-effort: a failed file write must not crash the prototype.
  }
}

// Minimal prototype: prints QR, logs incoming messages, echoes them back.
// LocalAuth persists the session on disk so restarts do not need a new QR scan.
const client = new Client({ authStrategy: new LocalAuth() });

// Print the pairing QR code to the terminal for scanning with the test phone.
client.on('qr', (qr) => {
  qrcode.generate(qr, { small: true });
  writeLog('EVENT qr');
});

client.on('authenticated', () => {
  writeLog('EVENT authenticated');
});

client.on('auth_failure', (msg) => {
  writeLog('EVENT auth_failure', msg);
});

client.on('ready', () => {
  writeLog('EVENT ready');
});

// Session invalidations and possible ban signals surface here.
client.on('disconnected', (reason) => {
  writeLog('EVENT disconnected', reason);
});

// Echo every incoming message back to the sender.
client.on('message', async (msg) => {
  writeLog('IN', msg.from, msg.body);
  try {
    await msg.reply('Echo: ' + msg.body);
    writeLog('OUT reply', msg.from);
  } catch (e) {
    writeLog('ERROR reply', e && e.message);
  }
});

// Close the browser session cleanly on Ctrl+C.
process.on('SIGINT', async () => {
  writeLog('SIGINT received - shutting down');
  try {
    await client.destroy();
  } catch (e) {
    // Ignore teardown errors; the process is exiting anyway.
  }
  process.exit(0);
});

// initialize() returns a promise; log startup failures instead of
// leaving an unhandled rejection.
client.initialize().catch((e) => {
  writeLog('ERROR initialize', e && e.message);
  process.exit(1);
});
