Production-ready WhatsApp auto-responder powered by Google's Gemini API with AI persona learning and voice message support.
- Features
- Requirements
- Getting Started
- Project Layout
- API Endpoints
- Voice Message Configuration
- Deployment Notes
- Frontend Usage
- Troubleshooting
- Contributing
- License
- Modular service architecture under `src/` (config, middleware, controllers, routes, services, utils)
- Secure multi-session management gated by rotating auth codes
- Configurable Gemini model, API key, and optional system prompt per session
- Personal chat only: Automatically ignores group messages and status updates
- Toggleable auto replies with per-session context windows (10-100 messages retained)
- Conversation memory for the last N user/assistant messages to improve AI relevance
- AI Persona Learning: Automatically learns your typing style from chat history
  - Uses contact-specific persona for chats with 250+ messages
  - Learns from conversation pairs (user message → your reply) to understand context
  - Sees how you respond to different types of messages
  - Falls back to universal persona for new conversations (your replies only)
  - Number of examples used matches your context window setting (10-1000 messages)
  - Intelligently filters out AI-generated messages to learn only from your actual writing
  - Mimics your tone, sentence structure, punctuation, emojis, vocabulary, and contextual adaptation
- Voice message support with Google Speech-to-Text and Text-to-Speech APIs
- Custom keyword, prefix, or regex replies that trigger before AI hand-off
- Bulk messaging console with CSV import and delivery reporting
- Scheduled messaging queue with cancel support for time-based campaigns
- Rate limiting, Helmet hardening, compression, and centralized logging
- 24-hour per-chat opt-out via `!stop` command
- 24-hour re-enable via `!start` command
- Graceful shutdown with automatic WhatsApp client teardown
- Health endpoint (`/health`) and structured logs for observability
- Memory management with automatic pruning for 1GB RAM environments
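
The persona bullets above describe a selection rule that can be sketched in a few lines. This is an illustration only, not the project's actual code: the `PersonaMessage` shape, its `source` field, and the function name are hypothetical. Only the 250-message threshold, the AI-message filter, and the context-window cap come from the feature list.

```typescript
// Hypothetical message shape; the real persona store may look different.
interface PersonaMessage {
  role: "user" | "reply";   // incoming message or outgoing reply
  source: "human" | "ai";   // who wrote it
  text: string;
}

// Threshold taken from the feature list ("250+ messages").
const CONTACT_PERSONA_THRESHOLD = 250;

// Choose the examples the AI learns from for one chat.
function selectPersonaExamples(
  contactHistory: PersonaMessage[],
  universalReplies: PersonaMessage[],
  contextWindow: number,
): PersonaMessage[] {
  // Established chats use their own history; new chats use the universal persona.
  const pool =
    contactHistory.length >= CONTACT_PERSONA_THRESHOLD
      ? contactHistory
      : universalReplies;

  // Drop AI-generated replies so only human writing is learned from.
  const humanOnly = pool.filter(
    (m) => !(m.role === "reply" && m.source === "ai"),
  );

  // Keep at most `contextWindow` of the most recent examples.
  return contextWindow > 0 ? humanOnly.slice(-contextWindow) : [];
}
```

Filtering before slicing means the window is filled with human-written examples where possible, rather than being shortened by discarded AI replies.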
- Node.js 20.x or newer (for native `fetch` and `AbortSignal.timeout` support)
- Google Gemini API key (for AI text responses)
- Google Cloud Speech-to-Text API key (optional, for voice message transcription)
- Google Cloud Text-to-Speech API key (optional, for voice message replies)
- Chrome-compatible environment for Puppeteer (required by `@wppconnect-team/wppconnect`)
- MongoDB instance (for session persistence, AI config, and scheduled jobs)
- Install dependencies: `npm install` (or `bun install`)
- Copy `.env.example` to `.env` and adjust values. At minimum set:
  - `AUTH_CODES` (comma-separated) for login access, or configure in `codes/codes.json`
  - `MONGO_URI` for MongoDB connection
  - `NODE_ENV`, `PORT`, `LOG_LEVEL` as needed
  - Optional resource tuning flags:
    - `ENABLE_COMPRESSION` (true/false, default auto-enables in production)
    - `ENABLE_REQUEST_LOGGER` (true/false, default off in production)
    - `AUTO_RESTORE_SESSIONS` (true/false, default `true`)
    - `SESSION_RESTORE_THROTTLE_MS` (delay between session restores, default `1000`)
- Run lint checks (optional but recommended before commits): `npm run lint`
- Run in development (includes auto-reload and pretty logs): `npm run dev` (or `npm run devn` for nodemon)
- Launch the production server: `npm run start`
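
To make the defaults above concrete, here is a minimal sketch of how the listed variables could be read. The variable names come from the list; the `AppConfig` shape, the `loadConfig` and `parseBool` helpers, and the decision to throw when `MONGO_URI` is missing are assumptions, not the project's actual loader. The request logger defaulting to on outside production is also an assumption inferred from "default off in production".

```typescript
// Assumed config shape for illustration only.
interface AppConfig {
  authCodes: string[];
  mongoUri: string;
  enableCompression: boolean;
  enableRequestLogger: boolean;
  autoRestoreSessions: boolean;
  sessionRestoreThrottleMs: number;
}

type Env = Record<string, string | undefined>;

// Interpret "true"/"false" strings, using the fallback when the value is unset.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value.trim() === "") return fallback;
  return value.trim().toLowerCase() === "true";
}

function loadConfig(env: Env): AppConfig {
  const isProduction = env["NODE_ENV"] === "production";

  const mongoUri = env["MONGO_URI"];
  if (!mongoUri) throw new Error("MONGO_URI is required"); // assumed behaviour

  const throttle = Number(env["SESSION_RESTORE_THROTTLE_MS"] ?? "1000");

  return {
    // "a, b,,c" becomes ["a", "b", "c"]
    authCodes: (env["AUTH_CODES"] ?? "")
      .split(",")
      .map((code) => code.trim())
      .filter((code) => code.length > 0),
    mongoUri,
    enableCompression: parseBool(env["ENABLE_COMPRESSION"], isProduction),
    enableRequestLogger: parseBool(env["ENABLE_REQUEST_LOGGER"], !isProduction),
    autoRestoreSessions: parseBool(env["AUTO_RESTORE_SESSIONS"], true),
    sessionRestoreThrottleMs:
      Number.isFinite(throttle) && throttle >= 0 ? throttle : 1000,
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.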
The frontend is served from frontend/ and provides a comprehensive control panel for QR login, AI configuration, bulk messaging, scheduling, voice settings, and persona management.
src/
├── app.ts # Express app factory
├── index.ts # Process bootstrap & graceful shutdown
├── bootstrap/ # Startup, shutdown, and session restoration helpers
├── config/ # Environment + logger setup
├── constants/ # Application constants
├── controllers/ # Route handlers (auth, AI config, health, QR, messages, persona)
├── middleware/ # Logging, errors, rate limiting, request logger
├── routes/ # Route registration modules
├── services/ # WhatsApp session, AI, database, and persistence services
├── types/ # TypeScript type definitions
├── utils/ # Shared helpers (HTTP fetch wrapper)
├── validation/ # Zod schemas for request validation
frontend/ # Static control panel (HTML, JS, CSS)
codes/ # Authentication codes store (codes.json)
server.js # CommonJS entry point
- Deploy behind HTTPS and supply a persistent data directory so `LocalAuth` can reuse QR sessions.
- Keep `codes/codes.json` out of version control or override with the `AUTH_CODES` environment variable.
- Add process supervision (PM2, systemd, Docker, etc.) for automatic restarts.
- Tune rate limits (`RATE_LIMIT_MAX`, `AUTH_RATE_LIMIT_MAX`) to match expected traffic.
- Monitor logs and the `/health` endpoint to detect failures early.
- Enter a valid auth code to request a QR code.
- Scan the QR code with the WhatsApp account to be paired. Once connected, configure the Gemini API key, model, and optional system prompt. These settings are persisted in MongoDB for the session.
- Toggle auto replies, adjust the context window, and manage custom replies directly in the console. Saved rules persist in MongoDB and carry across restarts.
- Navigate to the Utils tab to enable voice message support:
  - Toggle "Enable voice message replies"
  - Enter your Google Cloud Speech-to-Text API key (for transcribing incoming voice notes)
  - Enter your Google Cloud Text-to-Speech API key (for generating voice responses)
  - Select the desired language and voice gender
  - Click "Save voice configuration"
  - When enabled, the bot will automatically transcribe voice messages and reply with voice messages
- Navigate to the Persona Manager tab to inspect and manage AI learning data:
  - View all contacts with saved persona data
  - Search contacts by phone number
  - View universal persona (used for new chats)
  - View contact-specific personas (used for established chats with 250+ messages)
  - See statistics: total messages, user messages, your replies, AI replies
  - Edit or delete individual messages from any persona
  - Filter to show only "My reply:" messages used for learning
- Paste or upload recipients to broadcast bulk messages. The results show successes and failures.
- Schedule messages in advance; monitor, cancel, or remove jobs from the schedule table. Scheduled runs survive restarts and resume automatically once the service is back online.
- Users can send `!stop` in the chat to disable automated replies for 24 hours, or `!start` to re-enable them early.
  - Note: `!stop` disables both text and voice auto-replies. The bot will not process any messages (including voice transcription) from stopped users to save API costs.
  - Users must send `!start` as a text message to re-enable auto-replies.
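
The `!stop` / `!start` behaviour described above amounts to a per-chat expiry timestamp. The sketch below shows one way to model it. The `OptOutRegistry` class, its in-memory `Map`, and the case-insensitive command matching are assumptions for illustration; the real service may store and match this differently.

```typescript
const OPT_OUT_MS = 24 * 60 * 60 * 1000; // 24 hours, as documented

// In-memory sketch of the per-chat opt-out window.
class OptOutRegistry {
  private readonly stoppedUntil = new Map<string, number>();

  // Returns true when the text was a command and needs no further handling.
  handleCommand(chatId: string, text: string, now: number): boolean {
    const command = text.trim().toLowerCase();
    if (command === "!stop") {
      this.stoppedUntil.set(chatId, now + OPT_OUT_MS);
      return true;
    }
    if (command === "!start") {
      this.stoppedUntil.delete(chatId);
      return true;
    }
    return false;
  }

  // True while the chat is inside its opt-out window.
  isStopped(chatId: string, now: number): boolean {
    const until = this.stoppedUntil.get(chatId);
    if (until === undefined) return false;
    if (now >= until) {
      this.stoppedUntil.delete(chatId); // window expired, clean up
      return false;
    }
    return true;
  }
}
```

A message handler would call `handleCommand` on every incoming text message first, then check `isStopped` before doing any transcription or AI work, so stopped chats incur no API cost.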
- `GET /ai/:code` – Retrieve the sanitized AI configuration for a session (includes voice settings).
- `POST /ai/:code` – Update API key, model, auto-reply toggle, context window, and voice settings.
- `POST /ai/:code/replies` – Update custom keyword replies for a session.
- `POST /messages/:code/bulk` – Send an immediate broadcast to multiple numbers.
- `POST /messages/:code/voice` – Send a voice message (used internally for AI voice replies).
- `GET /messages/:code/schedule` – List upcoming scheduled jobs and historical results.
- `POST /messages/:code/schedule` – Schedule a delayed broadcast.
- `DELETE /messages/:code/schedule/:jobId` – Cancel a pending scheduled job (`?mode=remove` to delete the record entirely).
- `GET /persona/:code/contacts` – List all contacts with persona data.
- `GET /persona/:code/contact/:contactId` – Get persona messages for a specific contact.
- `GET /persona/:code/universal` – Get universal persona messages.
- `PUT /persona/:code/contact/:contactId/message/:messageIndex` – Update a message in contact persona.
- `PUT /persona/:code/universal/message/:messageIndex` – Update a message in universal persona.
- `DELETE /persona/:code/contact/:contactId/message/:messageIndex` – Delete a message from contact persona.
- `DELETE /persona/:code/universal/message/:messageIndex` – Delete a message from universal persona.
- `GET /auth/:code` – Validate authentication code.
- `GET /qr/:code` – Get QR code for WhatsApp session authentication.
- `GET /health` – Health check endpoint.
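
As a rough illustration of calling these endpoints from a script, the sketch below wraps three of the routes listed above. The helper names, the `FetchLike` type, and the assumption that responses are JSON are all hypothetical; request and response bodies are not documented here, so the example sticks to calls that need no body.

```typescript
// Minimal structural types so the sketch works with Node 20's global fetch or a stub.
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string, init?: { method?: string }) => Promise<HttpResponse>;

// Join URL-encoded path segments onto a base URL without doubling slashes.
function endpoint(baseUrl: string, ...segments: string[]): string {
  const path = segments.map((s) => encodeURIComponent(s)).join("/");
  return `${baseUrl.replace(/\/+$/, "")}/${path}`;
}

// Issue a request and parse the body as JSON (assumed response format).
async function requestJson(
  fetchFn: FetchLike,
  url: string,
  method: string = "GET",
): Promise<unknown> {
  const res = await fetchFn(url, { method });
  if (!res.ok) throw new Error(`${method} ${url} failed with HTTP ${res.status}`);
  return res.json();
}

// GET /health
function getHealth(fetchFn: FetchLike, baseUrl: string): Promise<unknown> {
  return requestJson(fetchFn, endpoint(baseUrl, "health"));
}

// GET /ai/:code
function getAiConfig(fetchFn: FetchLike, baseUrl: string, code: string): Promise<unknown> {
  return requestJson(fetchFn, endpoint(baseUrl, "ai", code));
}

// DELETE /messages/:code/schedule/:jobId, optionally with ?mode=remove
function cancelJob(
  fetchFn: FetchLike,
  baseUrl: string,
  code: string,
  jobId: string,
  remove: boolean = false,
): Promise<unknown> {
  const url = endpoint(baseUrl, "messages", code, "schedule", jobId);
  return requestJson(fetchFn, remove ? `${url}?mode=remove` : url, "DELETE");
}
```

On Node.js 20 the global `fetch` can be passed directly as `fetchFn`, for example `getHealth(fetch, baseUrl)` with whatever base URL the server listens on. A wrapper such as `(url, init) => fetch(url, { ...init, signal: AbortSignal.timeout(5000) })` adds a timeout.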
The bot supports automatic transcription of incoming voice notes and can reply with AI-generated voice messages using Google Cloud APIs.
- Create Google Cloud Project:
  - Go to Google Cloud Console
  - Create a new project or select an existing one
  - Enable billing for the project
- Enable Required APIs:
  - Enable Speech-to-Text API
  - Enable Text-to-Speech API
- Generate API Keys:
  - Go to "APIs & Services" → "Credentials"
  - Click "Create Credentials" → "API Key"
  - Create two separate API keys (or use the same key for both):
    - One for Speech-to-Text
    - One for Text-to-Speech
  - Optionally restrict the keys to only the necessary APIs
- Configure in Frontend:
  - Navigate to the "Utils" tab
  - Enable "Voice message replies"
  - Paste your API keys
  - Select language (e.g., en-US, es-ES, fr-FR)
  - Select voice gender (Neutral, Male, or Female)
  - Click "Save voice configuration"
- When a user sends a voice note, the bot:
  - Downloads the audio file (OGG/OPUS format from WhatsApp)
  - Transcribes it using Google Speech-to-Text API
  - Processes the transcribed text through the AI (Gemini)
  - Converts the AI response to speech using Google Text-to-Speech API
  - Sends the audio back as a voice message
- Voice settings are persisted in MongoDB and automatically restored on session reconnect
- If voice processing fails, the bot logs the error and skips the message
- Text messages continue to work normally alongside voice message support
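
The voice flow above is a three-stage pipeline with a log-and-skip failure mode. The sketch below expresses that control flow with injected stand-ins for the external services. The `VoiceServices` interface and `handleVoiceNote` function are hypothetical names, and skipping empty transcripts is an added assumption; the actual Google and Gemini calls are deliberately left out.

```typescript
// Hypothetical service interface; each method stands in for a real API call.
interface VoiceServices {
  transcribe(audio: Uint8Array, language: string): Promise<string>;  // Speech-to-Text
  generateReply(prompt: string): Promise<string>;                    // Gemini
  synthesize(text: string, language: string): Promise<Uint8Array>;   // Text-to-Speech
  log(message: string): void;
}

// Returns the reply audio, or null when the message should be skipped.
async function handleVoiceNote(
  audio: Uint8Array,
  language: string,
  services: VoiceServices,
): Promise<Uint8Array | null> {
  try {
    const transcript = await services.transcribe(audio, language);
    // Assumption: an empty transcript means there is nothing to answer.
    if (transcript.trim() === "") return null;

    const reply = await services.generateReply(transcript);
    return await services.synthesize(reply, language);
  } catch (error) {
    // Mirrors the documented behaviour: log the error and skip the message.
    const reason = error instanceof Error ? error.message : String(error);
    services.log(`voice processing failed: ${reason}`);
    return null;
  }
}
```

Because a failure in any stage returns `null` instead of throwing, the caller can keep handling text messages for the same chat without extra error handling.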
The voice feature supports 20+ languages including:
- English (US/UK)
- Spanish (Spain/US)
- French, German, Italian
- Portuguese (Brazil/Portugal)
- Japanese, Korean, Chinese (Simplified/Traditional)
- Arabic, Hindi, Russian, Turkish, Polish, Dutch, Swedish
- Google Cloud charges per request:
  - Speech-to-Text: ~$0.006 per 15 seconds of audio
  - Text-to-Speech: ~$4.00 per 1 million characters
- Monitor usage in Google Cloud Console
- Set up billing alerts to avoid unexpected charges
- Consider implementing rate limits for voice messages if needed
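
For budgeting, the approximate rates above can be turned into a quick per-message estimate. The function below is a back-of-the-envelope sketch: the name is hypothetical, the rates are the approximate figures quoted in this section, and rounding audio up to whole 15-second blocks is an assumption. Check current Google Cloud pricing before relying on the numbers.

```typescript
// Approximate rates quoted in this section (USD).
const STT_USD_PER_15S = 0.006;
const TTS_USD_PER_MILLION_CHARS = 4.0;

// Estimate the cost of transcribing one voice note and speaking one reply.
function estimateVoiceReplyCost(audioSeconds: number, replyChars: number): number {
  // Assumption: audio is billed in whole 15-second blocks, rounded up.
  const sttBlocks = Math.ceil(audioSeconds / 15);
  const sttCost = sttBlocks * STT_USD_PER_15S;
  const ttsCost = (replyChars / 1e6) * TTS_USD_PER_MILLION_CHARS;
  return sttCost + ttsCost;
}
```

Under these assumptions, a 40-second voice note answered with a 300-character reply costs about $0.019: three 15-second blocks of transcription ($0.018) plus $0.0012 of synthesis. Transcription dominates, so capping the length of voice notes the bot will process is the more effective cost control.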
- If QR codes stop refreshing, delete the `.wwebjs_auth` folder for the affected code and restart the service.
- Gemini API errors will be logged with HTTP status. Verify API key, model name, and rate limits.
- Use `LOG_LEVEL=debug` for verbose output during incident response.
We welcome contributions! Please follow these guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Run `npm run lint` before committing to ensure code quality
- Use `npm run dev` for development with auto-reload
- Test your changes thoroughly before submitting
- Follow the existing TypeScript/JavaScript style
- Use ESLint configuration for consistency
- Write clear, descriptive commit messages
This project is licensed under the ISC License - see the LICENSE file for details.