vMeNext - AI-Powered Blog Chatbot

An AI chatbot that answers questions about blog content, monitors website availability, and manages user interactions through a Gradio web interface.

Features • Installation • Configuration • Usage • API • Deployment

📋 Table of Contents

Overview
Features
Installation
Configuration
Usage
API Reference
Architecture
Deployment
Troubleshooting
Contributing
License

🎯 Overview

vMeNext is an AI chatbot that serves as a conversational interface to blog content and basic website management. Written in Python, it combines OpenAI's GPT models with automated web scraping, availability monitoring, and user engagement features.

Key Capabilities

Intelligent Conversations: Uses OpenAI GPT models (selected via OPENAI_MODEL) for natural, context-aware responses
Blog Content Integration: Automatic scraping, processing, and summarization of blog posts
Website Monitoring: Continuous availability checking with real-time alerts
Document Processing: Support for multiple file formats (PDF, DOCX, TXT, MD)
User Engagement: Automated email notifications and contact management
Analytics Dashboard: Website uptime statistics with visualizations

✨ Features

🤖 AI-Powered Chat

Context-Aware Responses: Maintains conversation history and context
Tool Integration: Can execute functions like sending emails and fetching data
Customizable Personality: Configurable system prompts for different personas
Streaming Responses: Real-time response generation for better UX

📝 Blog Content Management

Automatic Scraping: Uses Playwright for robust web scraping
Content Summarization: AI-powered summarization of blog posts
Multi-format Support: Handles various blog layouts and structures
Pagination Handling: Automatically follows pagination links

🌐 Website Monitoring

Continuous Monitoring: 24/7 website availability checking
Email Alerts: Instant notifications when issues are detected
Response Time Tracking: Monitors and logs response times
Historical Data: Maintains logs with configurable retention periods

📊 Analytics & Statistics

Uptime Visualization: Interactive charts showing website availability
Performance Metrics: Response time analysis and trends
Data Export: JSON-based logging for external analysis

📧 Communication System

SMTP Integration: Uses SMTP2GO for reliable email delivery
User Notifications: Automated contact form handling
Admin Alerts: System status and user interaction notifications

🚀 Installation

Prerequisites

Python 3.8 or higher
OpenAI API key
SMTP2GO account (for email functionality)
Modern web browser

Step 1: Clone the Repository

git clone https://github.com/yourusername/vMeNext.git
cd vMeNext

Step 2: Install Dependencies

pip install -r requirements.txt

Step 3: Install Playwright Browsers

playwright install chromium

Note: The application automatically installs Playwright Chromium on startup, but you can also install it manually for faster startup times.

Step 4: Environment Configuration

Copy the environment template:
```
cp .env.example .env
```
Edit .env with your configuration (see Configuration section)
Validate your setup:
```
python check_env.py
```

Step 5: Run the Application

python main.py

The application will start a Gradio web interface accessible at http://localhost:7860

⚙️ Configuration

The application is configured through environment variables in the .env file. All variables are validated by check_env.py.

Required Environment Variables

OpenAI Configuration

OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4  # or gpt-3.5-turbo

Email Configuration (SMTP2GO)

SMTP2GO_API_KEY=your_smtp2go_api_key
ALERT_EMAIL_TO=your_alert_recipient@example.com
ALERT_EMAIL_FROM=your_sender_address@example.com
SMTP2GO_API_URL=https://api.smtp2go.com/v3/email/send
EMAIL_TIMEOUT=30

Website Monitoring

MONITOR_URL=https://yourwebsite.com
ALLOWED_SCRAPE_DOMAIN=yourdomain.com
HTTP_TIMEOUT=10
USER_AGENT=Mozilla/5.0 (compatible; vMeNext/1.0)

File Paths

LOGS_DIR=logs
DOCUMENTS_DIR=about_me
AVAILABILITY_LOG_FILE=logs/availability_log.json
CHAT_MEMORY_FILE=memory/chat_memory.json
BLOG_SUMMARY_FILE=about_me/blog_posts_summary.md
AVAILABILITY_PLOT_FILE=logs/availability_plot.png

Blog Configuration

BLOG_CREATOR_NAME=Your Name
BLOG_BASE_URL=https://yourblog.com
MAX_SCRAPE_PAGES=10
MAX_SCRAPE_POSTS=50
PLAYWRIGHT_TIMEOUT=30000

System Configuration

TIMEZONE=UTC
LOG_RETENTION_DAYS=30
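
As an illustration of what LOG_RETENTION_DAYS might control, here is a minimal sketch of pruning availability log entries that fall outside the retention window. The function name `prune_old_entries` and the `timestamp` and `up` field names are assumptions made for this example; the actual log format in this project may differ.

```python
from datetime import datetime, timedelta, timezone


def prune_old_entries(entries, retention_days, now=None):
    """Return only the entries whose timestamp lies inside the retention window.

    Assumes each entry is a dict with an ISO 8601 "timestamp" that includes a
    UTC offset (hypothetical format, chosen for this sketch).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [
        entry
        for entry in entries
        if datetime.fromisoformat(entry["timestamp"]) >= cutoff
    ]


# Example: with a 30-day window, the December entry is dropped.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
entries = [
    {"timestamp": "2023-12-01T00:00:00+00:00", "up": True},
    {"timestamp": "2024-01-30T12:00:00+00:00", "up": False},
]
kept = prune_old_entries(entries, retention_days=30, now=now)
```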

Gradio Interface

GRADIO_TITLE=AI Blog Assistant
GRADIO_WELCOME_MESSAGE=Welcome! How can I help you today?
GRADIO_INPUT_LABEL=Your message
GRADIO_BUTTON_TEXT=Send

Hugging Face Deployment

HF_TOKEN=hf_your_access_token_here
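
To give a feel for the kind of check that environment validation involves, the sketch below reports which required variables are missing or blank. It is not the project's check_env.py; the function name `find_missing` and the shortened `REQUIRED_VARS` list are assumptions for illustration.

```python
import os

# A shortened, illustrative subset of the variables listed above.
REQUIRED_VARS = ["OPENAI_API_KEY", "OPENAI_MODEL", "SMTP2GO_API_KEY", "MONITOR_URL"]


def find_missing(required, env=None):
    """Return the names in `required` that are unset or contain only whitespace."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]


# Example with an explicit mapping instead of the real environment.
sample_env = {"OPENAI_API_KEY": "sk-test", "OPENAI_MODEL": "gpt-4", "MONITOR_URL": "  "}
missing = find_missing(REQUIRED_VARS, sample_env)
for name in missing:
    print(f"Missing required variable: {name}")
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.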

💻 Usage

Basic Chat Interface

Start the application:
```
python main.py
```
Access the web interface at http://localhost:7860
Begin chatting with the AI assistant about blog content, cybersecurity, or any configured topics

Special Commands

The chatbot supports several special admin commands:

read all blog posts - Scrapes and summarizes all blog posts
display stats - Shows website availability statistics
reload context - Reloads document context from files

Document Processing

Place documents in the about_me/ directory:

PDF files: Automatically processed and indexed
DOCX files: Microsoft Word documents supported
TXT files: Plain text files
MD files: Markdown documents
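
A minimal sketch of how files in the documents directory could be selected by extension before loading. The helper name `list_supported_documents` is hypothetical, and the project's own loader in src/document_loader.py may work differently.

```python
from pathlib import Path

# Extensions matching the formats listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}


def list_supported_documents(directory):
    """Return the supported document files found directly in `directory`, sorted."""
    root = Path(directory)
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

The extension comparison is case-insensitive, so a file named `CV.PDF` would be picked up as well.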

Website Monitoring

The monitoring system runs automatically in the background:

Checks website availability every 60 seconds
Logs all results with timestamps
Sends email alerts for downtime
Generates uptime statistics and visualizations
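
The monitoring steps above can be sketched roughly as follows: one check produces a timestamped record, and uptime is the share of records marked as up. This uses only the standard library; the names `http_status`, `check_once`, and `uptime_percent` and the record fields are assumptions for this example, not the project's actual implementation.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def http_status(url, timeout):
    """Return the HTTP status code for `url`, or None if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, and similar


def check_once(url, timeout=10, fetch=http_status):
    """Run a single availability check and return a log record."""
    started = time.monotonic()
    status = fetch(url, timeout)
    elapsed_ms = round((time.monotonic() - started) * 1000, 1)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "url": url,
        "status": status,
        "up": status is not None and 200 <= status < 400,
        "response_ms": elapsed_ms,
    }


def uptime_percent(entries):
    """Return the percentage of records marked as up (0.0 for an empty log)."""
    if not entries:
        return 0.0
    return 100.0 * sum(1 for entry in entries if entry["up"]) / len(entries)
```

A background thread could call `check_once` in a loop with a 60-second sleep between iterations and append each record to the log file.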

📚 API Reference

Core Functions

`chat(user_input: str, chat_id: Optional[str]) -> Tuple[str, str, Optional[str], str]`

Main chat function that processes user input and generates AI responses.

Parameters:

user_input: User's message
chat_id: Optional session identifier

Returns:

Response text
Table HTML (if applicable)
Graph HTML (if applicable)
Final chat ID
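
The following sketch shows how a caller might unpack the four return values and carry the chat ID across turns. The `chat` defined here is only a stand-in with the documented signature. That a new ID is generated when `chat_id` is None is an assumption of this example.

```python
import uuid
from typing import Optional, Tuple


def chat(user_input: str, chat_id: Optional[str]) -> Tuple[str, str, Optional[str], str]:
    """Stand-in with the documented signature (does not call any AI model)."""
    final_id = chat_id or uuid.uuid4().hex  # assumed behaviour for a missing ID
    return f"(echo) {user_input}", "", None, final_id


# First turn: no session yet, so no chat ID is passed.
reply, table_html, graph_html, chat_id = chat("What is the latest post about?", None)

# Follow-up turn: pass the returned ID back to stay in the same session.
reply2, _, _, same_id = chat("Summarize it in one sentence.", chat_id)
```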

`scrape_and_summarize(base_url: str, max_pages: int, max_posts: int) -> int`

Scrapes blog posts and generates AI summaries.

Parameters:

base_url: Blog URL to scrape
max_pages: Maximum pages to crawl
max_posts: Maximum posts to process

Returns:

Number of posts successfully summarized
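
As a sketch of how the `max_posts` limit and the ALLOWED_SCRAPE_DOMAIN setting might interact, the helper below resolves links against the base URL, skips duplicates and off-domain links, and stops at the cap. The names `is_allowed` and `select_post_urls` are hypothetical, and the real scraper in src/blog_scraper.py may apply these rules differently.

```python
from urllib.parse import urljoin, urlparse


def is_allowed(url, allowed_domain):
    """Return True if the URL's host is the allowed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    allowed = allowed_domain.lower()
    return host == allowed or host.endswith("." + allowed)


def select_post_urls(base_url, hrefs, allowed_domain, max_posts):
    """Resolve hrefs against base_url; keep unique, on-domain URLs up to max_posts."""
    seen, selected = set(), []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        if absolute in seen or not is_allowed(absolute, allowed_domain):
            continue
        seen.add(absolute)
        selected.append(absolute)
        if len(selected) >= max_posts:
            break
    return selected


urls = select_post_urls(
    "https://yourblog.com/",
    ["/post-1", "https://evil.example/post", "/post-1", "/post-2", "/post-3"],
    allowed_domain="yourblog.com",
    max_posts=2,
)
```

Matching on the parsed hostname rather than on a substring of the URL avoids accepting look-alike hosts such as `notyourblog.com`.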

`check_availability() -> None`

Continuously monitors website availability in a background thread.

Tool Functions

`send_user_contact_notification(user_email: str, user_name: str, message: str, chat_id: str) -> str`

Sends admin notification when user provides contact information.
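
A rough sketch of how such a notification could be assembled and sent as JSON. The payload field names (`api_key`, `sender`, `to`, `subject`, `text_body`) reflect my understanding of SMTP2GO's v3 send endpoint and should be checked against its documentation. The helpers `build_contact_email` and `post_json` are hypothetical and are not the project's code.

```python
import json
import urllib.request


def build_contact_email(api_key, sender, recipient, user_email, user_name, message, chat_id):
    """Build the JSON payload for an admin notification (field names are assumed)."""
    body = (
        f"Name: {user_name}\n"
        f"Email: {user_email}\n"
        f"Chat ID: {chat_id}\n\n"
        f"{message}"
    )
    return {
        "api_key": api_key,
        "sender": sender,
        "to": [recipient],
        "subject": f"New contact from {user_name}",
        "text_body": body,
    }


def post_json(url, payload, timeout=30):
    """POST a JSON payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


payload = build_contact_email(
    api_key="your_smtp2go_api_key",
    sender="bot@example.com",
    recipient="admin@example.com",
    user_email="visitor@example.com",
    user_name="Visitor",
    message="Please get in touch.",
    chat_id="chat-42",
)
```

In this sketch, `post_json(SMTP2GO_API_URL, payload, timeout=EMAIL_TIMEOUT)` would perform the actual send; it is left uncalled here so the example runs without network access.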

🏗️ Architecture

vMeNext/
├── main.py                 # Application entry point and Gradio interface
├── check_env.py           # Environment validation
├── requirements.txt       # Python dependencies
├── packages.txt          # System dependencies (libnss3 for Playwright)
├── src/
│   ├── chatbot.py        # Core AI conversation logic
│   ├── memory.py         # Chat session persistence
│   ├── availability_checker.py  # Website monitoring
│   ├── email.py          # Email notification system
│   ├── blog_scraper.py   # Blog content extraction
│   ├── document_loader.py # Document processing
│   ├── stats.py          # Analytics and visualizations
│   └── utils.py          # Utility functions
├── prompts/
│   └── system_prompt.txt # AI system prompt template
├── about_me/             # Document storage directory
├── logs/                 # Log files and visualizations
└── memory/               # Chat session storage

Component Overview

Gradio Interface: Modern web UI with real-time chat
OpenAI Integration: GPT model integration with tool calling
Playwright Scraping: Robust web scraping with browser automation
Email System: SMTP2GO integration for notifications
Monitoring System: Background thread for continuous monitoring
Document Processing: Multi-format document loading and indexing

🚀 Deployment

vMeNext can be deployed on multiple platforms. Choose the option that best fits your needs:

Deployment on Hugging Face Spaces

This application can be deployed to Hugging Face Spaces using the Gradio CLI. Follow the steps below:

Before you begin

Ensure your about_me/ folder contains personalized documents (e.g., your résumé, portfolio summary)
Remove any pre-existing README.md files inside your project directory if created by previous deployments
Ensure all required files are in your project directory:
- main.py (application entry point)
- requirements.txt (Python dependencies)
- packages.txt (system dependencies)
- .env file with all required environment variables

Step-by-step Deployment

Create a Hugging Face account:
- Visit huggingface.co and sign up or log in
Generate an Access Token:
- Click your avatar in the top right → "Access Tokens"
- Click "Create New Token", name it something like gradio-deploy, and give it WRITE permissions
- Copy the generated token
Add the token to your .env file:
```
HF_TOKEN=hf_...
```
Deploy with Gradio CLI: Run from your project directory:
```
uv run gradio deploy
```
If Hugging Face doesn't detect your token, use:
```
uv run dotenv -f .env run -- uv run gradio deploy
```
During deployment, you'll be prompted to enter:
- Space name: e.g., vme or vmenext
- Script path: main.py
- Hardware type: cpu-basic (free) or upgrade for better performance
- Secrets: Add secrets such as OPENAI_API_KEY and SMTP2GO_API_KEY
- Skip GitHub Actions unless you're automating CI/CD
Access your deployed application:
- Your application will be available at https://huggingface.co/spaces/yourusername/your-space-name
- The deployment process will automatically handle system dependencies from packages.txt

Alternative: Manual Deployment via Web Interface

If you prefer using the web interface instead of CLI:

Go to huggingface.co/spaces and sign in
Click "Create new Space"
Fill in the space details:
- Space name: vmenext (or your preferred name)
- License: Choose appropriate license (e.g., MIT)
- SDK: Gradio
- Hardware: CPU basic (free) or upgrade for better performance
- Visibility: Public or Private
Connect your GitHub repository and set:
- App file: main.py
- SDK: gradio
- SDK version: 5.38.0 (or latest)
Configure Environment Variables in Settings → Variables
Deploy: Your Space will automatically build and deploy

Gradio Cloud Deployment (Alternative)

Gradio Cloud also provides hosting for Gradio applications.

Prerequisites for Gradio Deployment

Gradio Account: Sign up at gradio.app
GitHub Repository: Push your code to a GitHub repository
Environment Variables: Configure all required environment variables in Gradio Cloud

Deployment Steps

Prepare your repository:
- Ensure all files are committed to your GitHub repository
- The main.py file should be in the root directory
- Include requirements.txt and packages.txt files
Deploy to Gradio Cloud:
- Go to gradio.app and sign in
- Click "Create" → "New Space"
- Connect your GitHub repository
- Set the following configuration:
  - App File: main.py
  - SDK: gradio
  - SDK Version: 5.38.0 (or latest)
Configure Environment Variables:
- In your Gradio Space settings, add all required environment variables
- Go to Settings → Variables and add each variable from the Configuration section
System Dependencies:
- The packages.txt file includes system dependencies (libnss3) required for Playwright
- Gradio Cloud will automatically install these dependencies

Local Development

# Run locally; for auto-reload during development, use `gradio main.py` instead
python main.py

The application will be accessible at http://localhost:7860

Deployment Considerations

Platform Comparison

| Feature | Hugging Face Spaces | Gradio Cloud |
| --- | --- | --- |
| Free Tier | ✅ CPU basic | ✅ Available |
| Performance | ⭐⭐⭐ Excellent | ⭐⭐ Good |
| Community | ⭐⭐⭐ Large ML community | ⭐⭐ Gradio-focused |
| Custom Domains | ❌ No | ✅ Yes |
| Private Spaces | ✅ Yes | ✅ Yes |
| Hardware Upgrades | ✅ CPU/GPU options | ✅ Available |

Environment Considerations

Security: Use environment variables for all sensitive data
Monitoring: Set up log rotation for the logs directory
Backup: Regularly back up the memory and logs directories
Updates: Keep dependencies updated for security patches
Rate Limits: Be aware of API rate limits for OpenAI and email services
Resource Usage: Monitor memory and CPU usage, especially for web scraping

🔧 Troubleshooting

Common Issues

Environment Variables Not Set

# Check environment configuration
python check_env.py

Solution: Ensure all required variables are set in the .env file

Playwright Browser Issues

# Reinstall browsers
playwright install chromium

Solution: Ensure Playwright browsers are properly installed

Email Not Sending

Verify SMTP2GO API key and configuration
Check network connectivity
Review the email timeout setting (EMAIL_TIMEOUT)

Website Monitoring Not Working

Verify MONITOR_URL is accessible
Check HTTP_TIMEOUT settings
Review logs in logs/availability_log.json

Chat Memory Issues

Ensure memory/ directory exists and is writable
Check CHAT_MEMORY_FILE path configuration

Hugging Face Spaces Deployment Issues

Build Failures: Check that all dependencies in requirements.txt are compatible
Environment Variables: Ensure all required variables are set in Space settings
Memory Issues: Consider upgrading to a higher hardware tier if running out of memory
Timeout Issues: Increase timeout values for web scraping operations (for example, PLAYWRIGHT_TIMEOUT)
Playwright Issues: Verify that packages.txt includes all required system dependencies

Debug Mode

Enable debug logging by setting:

DEBUG=true
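
How the application reads this flag is not shown here. As one possible approach, a DEBUG variable could be mapped to Python's logging level as in the sketch below; the function name `configure_logging` and the accepted values are assumptions for this example.

```python
import logging
import os


def configure_logging(env=None):
    """Set the root log level from a DEBUG flag and return the chosen level."""
    env = os.environ if env is None else env
    debug = str(env.get("DEBUG", "")).strip().lower() in {"1", "true", "yes"}
    level = logging.DEBUG if debug else logging.INFO
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,
    )
    return level
```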

Log Files

Availability logs: logs/availability_log.json
Chat memory: memory/chat_memory.json
Application logs: Check console output

🤝 Contributing

We welcome contributions! Please follow these guidelines:

Development Setup

Fork the repository

Create a feature branch:

git checkout -b feature/your-feature-name

Make your changes
Test thoroughly
Submit a pull request

Code Style

Follow PEP 8 guidelines
Use type hints for function parameters and returns
Add docstrings for all functions and classes
Keep functions focused and modular

Testing

Test all new features thoroughly
Ensure environment validation passes
Test with various document formats
Verify email functionality

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

OpenAI for providing the GPT models
Gradio for the excellent web interface framework
Playwright for robust web scraping capabilities
SMTP2GO for reliable email delivery

Made with ❤️ for the cybersecurity community

Report Bug • Request Feature • Documentation
