WhatsApp Automation

Production-ready WhatsApp auto-responder powered by Google's Gemini API with AI persona learning and voice message support.

Table of Contents

  • Features
  • Requirements
  • Getting Started
  • Project Layout
  • Deployment Notes
  • Frontend Usage
  • API Endpoints
  • Voice Message Configuration
  • Troubleshooting
  • Contributing
  • License

Features

  • Modular service architecture under src/ (config, middleware, controllers, routes, services, utils)
  • Secure multi-session management gated by rotating auth codes
  • Configurable Gemini model, API key, and optional system prompt per session
  • Personal chat only: Automatically ignores group messages and status updates
  • Toggleable auto replies with per-session context windows (10-100 messages retained)
  • Conversation memory for the last N user/assistant messages to improve AI relevance
  • AI Persona Learning: Automatically learns your typing style from chat history (see the selection sketch after this list)
    • Uses contact-specific persona for chats with 250+ messages
      • Learns from conversation pairs (user message → your reply) to understand context
      • Sees how you respond to different types of messages
    • Falls back to universal persona for new conversations (your replies only)
    • Number of examples used matches your context window setting (10-1000 messages)
    • Intelligently filters out AI-generated messages to learn only from your actual writing
    • Mimics your tone, sentence structure, punctuation, emojis, vocabulary, and contextual adaptation
  • Voice message support with Google Speech-to-Text and Text-to-Speech APIs
  • Custom keyword, prefix, or regex replies that trigger before AI hand-off
  • Bulk messaging console with CSV import and delivery reporting
  • Scheduled messaging queue with cancel support for time-based campaigns
  • Rate limiting, Helmet hardening, compression, and centralized logging
  • 24-hour per-chat opt-out via !stop command
  • Early re-enable via !start command
  • Graceful shutdown with automatic WhatsApp client teardown
  • Health endpoint (/health) and structured logs for observability
  • Memory management with automatic pruning for 1GB RAM environments
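
The persona selection described in the AI Persona Learning feature could look roughly like the minimal sketch below. The names (PersonaMessage, selectPersonaExamples) are hypothetical and the threshold simply mirrors the behaviour listed above; the project's actual logic lives in the services under src/.

// Hypothetical TypeScript sketch, not the project's implementation.
interface PersonaMessage {
  from: "user" | "me";
  text: string;
  aiGenerated: boolean; // true when the reply was produced by the bot
}

const CONTACT_PERSONA_THRESHOLD = 250; // messages needed for a contact-specific persona

function selectPersonaExamples(
  contactHistory: PersonaMessage[],
  universalPersona: PersonaMessage[],
  contextWindow: number,
): PersonaMessage[] {
  // Learn only from messages you actually wrote, never from AI-generated replies.
  const humanOnly = (msgs: PersonaMessage[]) =>
    msgs.filter((m) => !(m.from === "me" && m.aiGenerated));

  // Established chats (250+ messages) use the contact-specific history;
  // new conversations fall back to the universal persona (your replies only).
  const source =
    contactHistory.length >= CONTACT_PERSONA_THRESHOLD
      ? humanOnly(contactHistory)
      : humanOnly(universalPersona).filter((m) => m.from === "me");

  // The number of examples matches the configured context window.
  return source.slice(-contextWindow);
}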

Requirements

  • Node.js 20.x or newer (for native fetch and AbortSignal.timeout support)
  • Google Gemini API key (for AI text responses)
  • Google Cloud Speech-to-Text API key (optional, for voice message transcription)
  • Google Cloud Text-to-Speech API key (optional, for voice message replies)
  • Chrome-compatible environment for Puppeteer (required by @wppconnect-team/wppconnect)
  • MongoDB instance (for session persistence, AI config, and scheduled jobs)

Getting Started

  1. Install dependencies:
    npm install
    # or
    bun install
  2. Copy .env.example to .env and adjust values (see the example .env after these steps). At minimum set:
    • AUTH_CODES (comma-separated) for login access, or configure in codes/codes.json
    • MONGO_URI for MongoDB connection
    • NODE_ENV, PORT, LOG_LEVEL as needed
    • Optional resource tuning flags:
      • ENABLE_COMPRESSION (true/false; enabled by default in production)
      • ENABLE_REQUEST_LOGGER (true/false; disabled by default in production)
      • AUTO_RESTORE_SESSIONS (true/false, default true)
      • SESSION_RESTORE_THROTTLE_MS (delay between session restores, default 1000)
  3. Run lint checks (optional but recommended before commits):
    npm run lint
  4. Run in development (includes auto-reload and pretty logs):
    npm run dev
    # or for nodemon
    npm run devn
  5. Launch the production server:
    npm run start
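
A minimal .env for local development might look like the following. The values are placeholders and only the variables mentioned in the steps above are shown.

# Placeholder values; adjust to your environment
NODE_ENV=development
PORT=3000
LOG_LEVEL=debug
AUTH_CODES=alpha123,beta456
MONGO_URI=mongodb://localhost:27017/whatsapp-automation
AUTO_RESTORE_SESSIONS=true
SESSION_RESTORE_THROTTLE_MS=1000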

The frontend is served from frontend/ and provides a comprehensive control panel for QR login, AI configuration, bulk messaging, scheduling, voice settings, and persona management.

Project Layout

src/
├── app.ts              # Express app factory
├── index.ts            # Process bootstrap & graceful shutdown
├── bootstrap/          # Startup, shutdown, and session restoration helpers
├── config/             # Environment + logger setup
├── constants/          # Application constants
├── controllers/        # Route handlers (auth, AI config, health, QR, messages, persona)
├── middleware/         # Logging, errors, rate limiting, request logger
├── routes/             # Route registration modules
├── services/           # WhatsApp session, AI, database, and persistence services
├── types/              # TypeScript type definitions
├── utils/              # Shared helpers (HTTP fetch wrapper)
└── validation/         # Zod schemas for request validation
frontend/               # Static control panel (HTML, JS, CSS)
codes/                  # Authentication codes store (codes.json)
server.js               # CommonJS entry point

Deployment Notes

  • Deploy behind HTTPS and supply a persistent data directory so LocalAuth can reuse authenticated sessions without rescanning the QR code.
  • Keep codes/codes.json out of version control or override with the AUTH_CODES environment variable.
  • Add process supervision (PM2, systemd, Docker, etc.) for automatic restarts; see the example command below.
  • Tune rate limits (RATE_LIMIT_MAX, AUTH_RATE_LIMIT_MAX) to match expected traffic.
  • Monitor logs and the /health endpoint to detect failures early.
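
As a concrete example of process supervision, a minimal PM2 setup (assuming PM2 is installed globally) could be:

pm2 start npm --name whatsapp-automation -- run start
pm2 save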

Frontend Usage

  1. Enter a valid auth code to request a QR code.
  2. Scan the QR code with the WhatsApp account you want to pair. Once connected, configure the Gemini API key, model, and optional system prompt; these settings are persisted in MongoDB for the session.
  3. Toggle auto replies, adjust the context window, and manage custom replies directly in the console—saved rules persist in MongoDB and carry across restarts.
  4. Navigate to the Utils tab to enable voice message support:
    • Toggle "Enable voice message replies"
    • Enter your Google Cloud Speech-to-Text API key (for transcribing incoming voice notes)
    • Enter your Google Cloud Text-to-Speech API key (for generating voice responses)
    • Select the desired language and voice gender
    • Click "Save voice configuration"
    • When enabled, the bot will automatically transcribe voice messages and reply with voice messages
  5. Navigate to the Persona Manager tab to inspect and manage AI learning data:
    • View all contacts with saved persona data
    • Search contacts by phone number
    • View universal persona (used for new chats)
    • View contact-specific personas (used for established chats with 250+ messages)
    • See statistics: total messages, user messages, your replies, AI replies
    • Edit or delete individual messages from any persona
    • Filter to show only "My reply:" messages used for learning
  6. Paste or upload recipients to broadcast bulk messages—results show successes and failures.
  7. Schedule messages in advance; monitor, cancel, or remove jobs from the schedule table. Scheduled runs survive restarts and resume automatically once the service is back online.
  8. Users can send !stop in the chat to disable automated replies for 24 hours, or !start to re-enable them early.
    • Note: !stop disables both text and voice auto-replies. The bot will not process any messages (including voice transcription) from stopped users to save API costs.
    • Users must send !start as a text message to re-enable auto-replies.
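
The 24-hour opt-out described in step 8 can be pictured with the minimal sketch below. The names (optOutUntil, isOptedOut, handleOptOutCommand) are hypothetical; this only illustrates the behaviour, not the project's actual handling in the services under src/.

// Hypothetical sketch of the per-chat 24-hour opt-out window.
const OPT_OUT_MS = 24 * 60 * 60 * 1000;
const optOutUntil = new Map<string, number>(); // chatId -> expiry timestamp

function handleOptOutCommand(chatId: string, text: string): void {
  const command = text.trim().toLowerCase();
  if (command === "!stop") {
    optOutUntil.set(chatId, Date.now() + OPT_OUT_MS); // pause auto-replies for 24 hours
  } else if (command === "!start") {
    optOutUntil.delete(chatId); // re-enable early
  }
}

function isOptedOut(chatId: string): boolean {
  const until = optOutUntil.get(chatId);
  return until !== undefined && until > Date.now();
}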

API Endpoints

AI Configuration

  • GET /ai/:code – Retrieve the sanitized AI configuration for a session (includes voice settings).
  • POST /ai/:code – Update API key, model, auto-reply toggle, context window, and voice settings.
  • POST /ai/:code/replies – Update custom keyword replies for a session.
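
For illustration, updating a session's AI configuration from a script could look like the sketch below. The payload field names (apiKey, model, autoReply, contextWindow) and the local base URL are assumptions; check the Zod schemas under src/validation/ for the actual shape.

// Hypothetical request sketch; field names are assumed, not documented.
async function updateAiConfig(baseUrl: string, code: string): Promise<void> {
  const res = await fetch(`${baseUrl}/ai/${code}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      apiKey: "YOUR_GEMINI_API_KEY", // assumed field name
      model: "gemini-1.5-flash",     // assumed field name
      autoReply: true,               // assumed field name
      contextWindow: 50,             // assumed field name
    }),
  });
  console.log(res.status, await res.json());
}

updateAiConfig("http://localhost:3000", "alpha123").catch(console.error);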

Messaging

  • POST /messages/:code/bulk – Send an immediate broadcast to multiple numbers.
  • POST /messages/:code/voice – Send a voice message (used internally for AI voice replies).
  • GET /messages/:code/schedule – List upcoming scheduled jobs and historical results.
  • POST /messages/:code/schedule – Schedule a delayed broadcast.
  • DELETE /messages/:code/schedule/:jobId – Cancel a pending scheduled job (?mode=remove to delete the record entirely).
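
Similarly, scheduling a broadcast one hour out might look like this sketch; the payload fields (numbers, message, sendAt) are assumptions for illustration only.

// Hypothetical request sketch; verify the real schema in src/validation/.
async function scheduleBroadcast(baseUrl: string, code: string): Promise<void> {
  const res = await fetch(`${baseUrl}/messages/${code}/schedule`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      numbers: ["15551234567", "15559876543"],                     // assumed field name
      message: "Reminder: our call starts in one hour.",
      sendAt: new Date(Date.now() + 60 * 60 * 1000).toISOString(), // assumed field name
    }),
  });
  console.log(res.status, await res.json());
}

scheduleBroadcast("http://localhost:3000", "alpha123").catch(console.error);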

Persona Management

  • GET /persona/:code/contacts – List all contacts with persona data.
  • GET /persona/:code/contact/:contactId – Get persona messages for a specific contact.
  • GET /persona/:code/universal – Get universal persona messages.
  • PUT /persona/:code/contact/:contactId/message/:messageIndex – Update a message in contact persona.
  • PUT /persona/:code/universal/message/:messageIndex – Update a message in universal persona.
  • DELETE /persona/:code/contact/:contactId/message/:messageIndex – Delete a message from contact persona.
  • DELETE /persona/:code/universal/message/:messageIndex – Delete a message from universal persona.

Other

  • GET /auth/:code – Validate authentication code.
  • GET /qr/:code – Get QR code for WhatsApp session authentication.
  • GET /health – Health check endpoint.

Voice Message Configuration

The bot supports automatic transcription of incoming voice notes and can reply with AI-generated voice messages using Google Cloud APIs.

Setup Steps

  1. Create Google Cloud Project:

    • Go to Google Cloud Console
    • Create a new project or select an existing one
    • Enable billing for the project
  2. Enable Required APIs:

    • In "APIs & Services" → "Library", enable the Cloud Speech-to-Text API
    • Enable the Cloud Text-to-Speech API
  3. Generate API Keys:

    • Go to "APIs & Services" → "Credentials"
    • Click "Create Credentials" → "API Key"
    • Create two separate API keys (or use the same key for both):
      • One for Speech-to-Text
      • One for Text-to-Speech
    • Optionally restrict the keys to only the necessary APIs
  4. Configure in Frontend:

    • Navigate to the "Utils" tab
    • Enable "Voice message replies"
    • Paste your API keys
    • Select language (e.g., en-US, es-ES, fr-FR)
    • Select voice gender (Neutral, Male, or Female)
    • Click "Save voice configuration"

How It Works

  • When a user sends a voice note, the bot runs the following pipeline (sketched in code after this list):

    1. Downloads the audio file (OGG/OPUS format from WhatsApp)
    2. Transcribes it using Google Speech-to-Text API
    3. Processes the transcribed text through the AI (Gemini)
    4. Converts the AI response to speech using Google Text-to-Speech API
    5. Sends the audio back as a voice message
  • Voice settings are persisted in MongoDB and automatically restored on session reconnect

  • If voice processing fails, the bot logs the error and skips the message

  • Text messages continue to work normally alongside voice message support
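
A stripped-down version of that pipeline, calling the public REST endpoints of both Google APIs with Node's built-in fetch, could look like the sketch below. It omits the WhatsApp wiring, the Gemini step in between, error handling, and persistence, and is not the project's actual service code.

// Minimal sketches of the speech-to-text and text-to-speech calls.
async function transcribeVoiceNote(oggBase64: string, apiKey: string, languageCode: string): Promise<string> {
  const res = await fetch(`https://speech.googleapis.com/v1/speech:recognize?key=${apiKey}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      config: { encoding: "OGG_OPUS", sampleRateHertz: 16000, languageCode },
      audio: { content: oggBase64 }, // base64-encoded audio downloaded from WhatsApp
    }),
  });
  const data = await res.json();
  return data.results?.[0]?.alternatives?.[0]?.transcript ?? "";
}

async function synthesizeReply(text: string, apiKey: string, languageCode: string): Promise<string> {
  const res = await fetch(`https://texttospeech.googleapis.com/v1/text:synthesize?key=${apiKey}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      input: { text },
      voice: { languageCode, ssmlGender: "NEUTRAL" },
      audioConfig: { audioEncoding: "OGG_OPUS" },
    }),
  });
  const data = await res.json();
  return data.audioContent; // base64-encoded audio to send back as a voice note
}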

Supported Languages

The voice feature supports 20+ languages including:

  • English (US/UK)
  • Spanish (Spain/US)
  • French, German, Italian
  • Portuguese (Brazil/Portugal)
  • Japanese, Korean, Chinese (Simplified/Traditional)
  • Arabic, Hindi, Russian, Turkish, Polish, Dutch, Swedish

Cost Considerations

  • Google Cloud charges per request:
    • Speech-to-Text: ~$0.006 per 15 seconds of audio
    • Text-to-Speech: ~$4.00 per 1 million characters
  • Monitor usage in Google Cloud Console
  • Set up billing alerts to avoid unexpected charges
  • Consider implementing rate limits for voice messages if needed
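  • As a rough worked example at the rates above: transcribing a 30-second voice note costs about 2 × $0.006 ≈ $0.012, and synthesizing a 300-character reply costs about 300 ÷ 1,000,000 × $4.00 ≈ $0.0012, so one voice round trip is on the order of a cent (before any Gemini usage)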

Troubleshooting

  • If QR codes stop refreshing, delete the .wwebjs_auth folder for the affected code and restart the service.
  • Gemini API errors will be logged with HTTP status. Verify API key, model name, and rate limits.
  • Use LOG_LEVEL=debug for verbose output during incident response.

Contributing

We welcome contributions! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup

  • Run npm run lint before committing to ensure code quality
  • Use npm run dev for development with auto-reload
  • Test your changes thoroughly before submitting

Code Style

  • Follow the existing TypeScript/JavaScript style
  • Use ESLint configuration for consistency
  • Write clear, descriptive commit messages

License

This project is licensed under the ISC License - see the LICENSE file for details.
