InfosecOTB/vme-next

vMeNext - AI-Powered Blog Chatbot

A sophisticated AI chatbot system that provides intelligent responses about blog content, handles website monitoring, and manages user interactions through a modern web interface.

🎯 Overview

vMeNext is an AI-powered chatbot that answers questions about a blog's content and monitors the availability of the website behind it. It is written in Python and combines OpenAI's GPT models with Playwright-based web scraping, background uptime monitoring, and email notifications, all exposed through a Gradio web interface.

Key Capabilities

  • Intelligent Conversations: Uses the OpenAI GPT model set in OPENAI_MODEL for natural, context-aware responses
  • Blog Content Integration: Automatic scraping, processing, and summarization of blog posts
  • Website Monitoring: Continuous availability checking with real-time alerts
  • Document Processing: Support for multiple file formats (PDF, DOCX, TXT, MD)
  • User Engagement: Automated email notifications and contact management
  • Analytics Dashboard: Website uptime statistics with visualizations

✨ Features

🤖 AI-Powered Chat

  • Context-Aware Responses: Maintains conversation history and context
  • Tool Integration: Can execute functions like sending emails and fetching data
  • Customizable Personality: Configurable system prompts for different personas
  • Streaming Responses: Real-time response generation for better UX

📝 Blog Content Management

  • Automatic Scraping: Uses Playwright for robust web scraping
  • Content Summarization: AI-powered summarization of blog posts
  • Multi-format Support: Handles various blog layouts and structures
  • Pagination Handling: Automatically follows pagination links

🌐 Website Monitoring

  • Continuous Monitoring: 24/7 website availability checking
  • Email Alerts: Instant notifications when issues are detected
  • Response Time Tracking: Monitors and logs response times
  • Historical Data: Maintains logs with configurable retention periods

📊 Analytics & Statistics

  • Uptime Visualization: Interactive charts showing website availability
  • Performance Metrics: Response time analysis and trends
  • Data Export: JSON-based logging for external analysis

📧 Communication System

  • SMTP Integration: Uses SMTP2GO for reliable email delivery
  • User Notifications: Automated contact form handling
  • Admin Alerts: System status and user interaction notifications

🚀 Installation

Prerequisites

  • Python 3.8 or higher
  • OpenAI API key
  • SMTP2GO account (for email functionality)
  • Modern web browser

Step 1: Clone the Repository

git clone https://github.com/yourusername/vMeNext.git
cd vMeNext

Step 2: Install Dependencies

pip install -r requirements.txt

Step 3: Install Playwright Browsers

playwright install chromium

Note: The application automatically installs Playwright Chromium on startup, but you can also install it manually for faster startup times.

Step 4: Environment Configuration

  1. Copy the environment template:

    cp .env.example .env
  2. Edit .env with your configuration (see Configuration section)

  3. Validate your setup:

    python check_env.py

Step 5: Run the Application

python main.py

The application will start a Gradio web interface accessible at http://localhost:7860

⚙️ Configuration

The application reads all of its settings from environment variables, typically supplied through the .env file. Run check_env.py to validate them before starting the application.
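The kind of check that check_env.py performs can be sketched as follows. This is a minimal illustration and not the project's actual script: the function name `find_missing` and the short `REQUIRED` tuple are assumptions, and only the variable names are taken from this README.

```python
import os
from typing import Iterable, List, Mapping

# A subset of the variables documented in this README; the real script may check more.
REQUIRED = ("OPENAI_API_KEY", "OPENAI_MODEL", "SMTP2GO_API_KEY", "MONITOR_URL")


def find_missing(env: Mapping[str, str], required: Iterable[str] = REQUIRED) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


missing = find_missing(os.environ)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing the environment in as a mapping keeps the check easy to test with a plain dictionary.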

Required Environment Variables

OpenAI Configuration

OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4  # or gpt-3.5-turbo

Email Configuration (SMTP2GO)

SMTP2GO_API_KEY=your_smtp2go_api_key
ALERT_EMAIL_TO=your_email@example.com
ALERT_EMAIL_FROM=sender@example.com
SMTP2GO_API_URL=https://api.smtp2go.com/v3/email/send
EMAIL_TIMEOUT=30

Website Monitoring

MONITOR_URL=https://yourwebsite.com
ALLOWED_SCRAPE_DOMAIN=yourdomain.com
HTTP_TIMEOUT=10
USER_AGENT=Mozilla/5.0 (compatible; vMeNext/1.0)
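One way a scraper might enforce ALLOWED_SCRAPE_DOMAIN is sketched below. The helper name `is_allowed` and the decision to accept subdomains are assumptions made for illustration; the project's own rule may differ.

```python
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domain: str) -> bool:
    """True if url is http(s) and its host is allowed_domain or one of its subdomains."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    allowed = allowed_domain.lower().lstrip(".")
    if parsed.scheme not in ("http", "https") or not host or not allowed:
        return False
    # Compare on a label boundary so "evilyourdomain.com" does not match "yourdomain.com".
    return host == allowed or host.endswith("." + allowed)


print(is_allowed("https://blog.yourdomain.com/post/1", "yourdomain.com"))
print(is_allowed("https://yourdomain.com.evil.org/", "yourdomain.com"))
```

Matching on the parsed hostname rather than on a substring of the URL avoids accepting look-alike hosts.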

File Paths

LOGS_DIR=logs
DOCUMENTS_DIR=about_me
AVAILABILITY_LOG_FILE=logs/availability_log.json
CHAT_MEMORY_FILE=memory/chat_memory.json
BLOG_SUMMARY_FILE=about_me/blog_posts_summary.md
AVAILABILITY_PLOT_FILE=logs/availability_plot.png

Blog Configuration

BLOG_CREATOR_NAME=Your Name
BLOG_BASE_URL=https://yourblog.com
MAX_SCRAPE_PAGES=10
MAX_SCRAPE_POSTS=50
PLAYWRIGHT_TIMEOUT=30000

System Configuration

TIMEZONE=UTC
LOG_RETENTION_DAYS=30
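As an illustration of what LOG_RETENTION_DAYS could control, the sketch below drops log entries older than the retention window. The entry schema (an ISO 8601 `timestamp` field with a UTC offset) and the function name `prune_old_entries` are assumptions, not the project's documented format.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def prune_old_entries(entries: List[Dict], retention_days: int, now: datetime) -> List[Dict]:
    """Keep only entries whose 'timestamp' falls within the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [e for e in entries if datetime.fromisoformat(e["timestamp"]) >= cutoff]


log = [
    {"timestamp": "2024-01-30T00:00:00+00:00", "up": True},
    {"timestamp": "2023-12-01T00:00:00+00:00", "up": False},
]
recent = prune_old_entries(log, retention_days=30, now=datetime(2024, 1, 31, tzinfo=timezone.utc))
print(len(recent))
```

Taking `now` as a parameter keeps the function deterministic; a caller would normally pass `datetime.now(timezone.utc)`.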

Gradio Interface

GRADIO_TITLE=AI Blog Assistant
GRADIO_WELCOME_MESSAGE=Welcome! How can I help you today?
GRADIO_INPUT_LABEL=Your message
GRADIO_BUTTON_TEXT=Send

Hugging Face Deployment

HF_TOKEN=hf_your_access_token_here

💻 Usage

Basic Chat Interface

  1. Start the application:

    python main.py
  2. Access the web interface at http://localhost:7860

  3. Begin chatting with the AI assistant about blog content, cybersecurity, or any configured topics

Special Commands

The chatbot supports several special admin commands:

  • read all blog posts - Scrapes and summarizes all blog posts
  • display stats - Shows website availability statistics
  • reload context - Reloads document context from files

Document Processing

Place documents in the about_me/ directory:

  • PDF files: Automatically processed and indexed
  • DOCX files: Microsoft Word documents supported
  • TXT files: Plain text files
  • MD files: Markdown documents
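Discovering the supported files in the documents directory can be sketched as below. The function name `list_documents` is hypothetical and the snippet only selects files by extension; how the application actually parses PDF or DOCX content is not shown here.

```python
from pathlib import Path
from typing import List

# Extensions listed in this README.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}


def list_documents(directory: str) -> List[Path]:
    """Return supported files directly inside directory, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


for doc in list_documents("about_me"):
    print(doc.name)
```

The suffix comparison is case-insensitive, so a file such as `CV.PDF` is picked up as well.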

Website Monitoring

The monitoring system runs automatically in the background:

  • Checks website availability every 60 seconds
  • Logs all results with timestamps
  • Sends email alerts for downtime
  • Generates uptime statistics and visualizations
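The uptime statistics mentioned above could be derived from the logged checks roughly as follows. The entry fields `up` and `response_ms` are assumed for illustration and may not match the actual layout of logs/availability_log.json.

```python
from typing import Dict, Iterable, List, Optional


def uptime_percent(entries: Iterable[Dict]) -> Optional[float]:
    """Percentage of checks marked up, or None when there are no entries."""
    results: List[bool] = [bool(e["up"]) for e in entries]
    if not results:
        return None
    return 100.0 * sum(results) / len(results)


def mean_response_ms(entries: Iterable[Dict]) -> Optional[float]:
    """Average response time over successful checks, or None if there were none."""
    values = [e["response_ms"] for e in entries if e.get("up") and e.get("response_ms") is not None]
    return sum(values) / len(values) if values else None


checks = [
    {"up": True, "response_ms": 100},
    {"up": True, "response_ms": 200},
    {"up": False, "response_ms": None},
    {"up": True, "response_ms": 150},
]
print(uptime_percent(checks), mean_response_ms(checks))
```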

📚 API Reference

Core Functions

chat(user_input: str, chat_id: Optional[str]) -> Tuple[str, str, Optional[str], str]

Main chat function that processes user input and generates AI responses.

Parameters:

  • user_input: User's message
  • chat_id: Optional session identifier

Returns:

  • Response text
  • Table HTML (if applicable)
  • Graph HTML (if applicable)
  • Final chat ID

scrape_and_summarize(base_url: str, max_pages: int, max_posts: int) -> int

Scrapes blog posts and generates AI summaries.

Parameters:

  • base_url: Blog URL to scrape
  • max_pages: Maximum pages to crawl
  • max_posts: Maximum posts to process

Returns:

  • Number of posts successfully summarized
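How the max_pages and max_posts limits might bound a paginated crawl is sketched below. This is not the project's scraper: `collect_post_urls` and the injected `fetch_page` callable are hypothetical, and in the real application the page fetching would be done with Playwright.

```python
from typing import Callable, List, Optional, Tuple

# A page fetcher returns (post URLs found on the page, URL of the next page or None).
FetchPage = Callable[[str], Tuple[List[str], Optional[str]]]


def collect_post_urls(base_url: str, max_pages: int, max_posts: int, fetch_page: FetchPage) -> List[str]:
    """Follow pagination from base_url, stopping when either limit is reached."""
    posts: List[str] = []
    seen_pages = set()
    page_url: Optional[str] = base_url
    pages_visited = 0
    while page_url and pages_visited < max_pages and len(posts) < max_posts:
        if page_url in seen_pages:  # guard against pagination loops
            break
        seen_pages.add(page_url)
        links, next_url = fetch_page(page_url)
        pages_visited += 1
        for link in links:
            if link not in posts and len(posts) < max_posts:
                posts.append(link)
        page_url = next_url
    return posts


# In-memory stand-in for a blog with three listing pages.
fake_site = {
    "page1": (["post-a", "post-b"], "page2"),
    "page2": (["post-c", "post-d"], "page3"),
    "page3": (["post-e"], None),
}
print(collect_post_urls("page1", max_pages=2, max_posts=50, fetch_page=fake_site.__getitem__))
```

Injecting the fetcher keeps the limit logic testable without launching a browser.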

check_availability() -> None

Continuously monitors website availability in a background thread.
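A background availability check of this kind can be sketched with the standard library alone. Everything below is an assumption made for illustration: the names `http_status`, `check_once`, and `monitor`, the log entry fields, and the use of urllib rather than whichever HTTP client the project actually uses.

```python
import threading
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Dict


def http_status(url: str, timeout: float) -> int:
    """Fetch url and return the HTTP status code (raises on network errors)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code


def check_once(url: str, timeout: float, fetch: Callable[[str, float], int] = http_status) -> Dict:
    """Run a single check and return a JSON-serializable log entry."""
    started = time.monotonic()
    try:
        status = fetch(url, timeout)
        up = 200 <= status < 400
    except Exception:  # DNS failure, timeout, refused connection, ...
        status, up = None, False
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "url": url,
        "status": status,
        "up": up,
        "response_ms": round((time.monotonic() - started) * 1000, 1),
    }


def monitor(url: str, interval_s: float, timeout: float,
            on_result: Callable[[Dict], None], stop: threading.Event,
            fetch: Callable[[str, float], int] = http_status) -> None:
    """Check url every interval_s seconds until stop is set."""
    while not stop.is_set():
        on_result(check_once(url, timeout, fetch))
        stop.wait(interval_s)  # returns early if stop is set in the meantime
```

A caller could start it with `threading.Thread(target=monitor, args=(url, 60, 10, handle_entry, stop_event), daemon=True).start()`, where `handle_entry` is a hypothetical callback that appends to the log and sends alerts.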

Tool Functions

send_user_contact_notification(user_email: str, user_name: str, message: str, chat_id: str) -> str

Sends an admin notification when a user provides contact information.
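The request such a notification might send to SMTP2GO_API_URL is sketched below. The helper names are hypothetical, and the JSON field names (`api_key`, `to`, `sender`, `subject`, `text_body`) are assumed from SMTP2GO's v3 email/send API; they should be verified against the current SMTP2GO documentation before use.

```python
import json
import urllib.request
from typing import Dict


def build_contact_payload(api_key: str, sender: str, recipient: str,
                          user_email: str, user_name: str,
                          message: str, chat_id: str) -> Dict:
    """Build the JSON body for an admin notification (field names are assumed)."""
    return {
        "api_key": api_key,
        "to": [recipient],
        "sender": sender,
        "subject": f"New contact from {user_name}",
        "text_body": (
            f"Name: {user_name}\n"
            f"Email: {user_email}\n"
            f"Chat ID: {chat_id}\n\n"
            f"{message}"
        ),
    }


def post_json(api_url: str, payload: Dict, timeout: float) -> int:
    """POST payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In this sketch, `sender` and `recipient` would come from ALERT_EMAIL_FROM and ALERT_EMAIL_TO, and `timeout` from EMAIL_TIMEOUT.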

🏗️ Architecture

vMeNext/
├── main.py                 # Application entry point and Gradio interface
├── check_env.py           # Environment validation
├── requirements.txt       # Python dependencies
├── packages.txt          # System dependencies (libnss3 for Playwright)
├── src/
│   ├── chatbot.py        # Core AI conversation logic
│   ├── memory.py         # Chat session persistence
│   ├── availability_checker.py  # Website monitoring
│   ├── email.py          # Email notification system
│   ├── blog_scraper.py   # Blog content extraction
│   ├── document_loader.py # Document processing
│   ├── stats.py          # Analytics and visualizations
│   └── utils.py          # Utility functions
├── prompts/
│   └── system_prompt.txt # AI system prompt template
├── about_me/             # Document storage directory
├── logs/                 # Log files and visualizations
└── memory/               # Chat session storage

Component Overview

  • Gradio Interface: Modern web UI with real-time chat
  • OpenAI Integration: GPT model integration with tool calling
  • Playwright Scraping: Robust web scraping with browser automation
  • Email System: SMTP2GO integration for notifications
  • Monitoring System: Background thread for continuous monitoring
  • Document Processing: Multi-format document loading and indexing

🚀 Deployment

vMeNext can be deployed on multiple platforms. Choose the option that best fits your needs:

Deployment on Hugging Face Spaces

This application can be deployed to Hugging Face Spaces using Gradio. Follow the steps below:

Before you begin

  • Ensure your about_me/ folder contains personalized documents (e.g., your résumé, portfolio summary)
  • Remove any README.md file left in your project directory by a previous deployment
  • Ensure all required files are in your project directory:
    • main.py (application entry point)
    • requirements.txt (Python dependencies)
    • packages.txt (system dependencies)
    • .env file with all required environment variables

Step-by-step Deployment

  1. Create a Hugging Face account.

  2. Generate an Access Token:

    • Click your avatar in the top right → "Access Tokens"
    • Click "Create New Token", name it something like gradio-deploy, and give it WRITE permissions
    • Copy the generated token
  3. Add the token to your .env file:

    HF_TOKEN=hf_...
  4. Deploy with Gradio CLI: Run from your project directory:

    uv run gradio deploy

    If Hugging Face doesn't detect your token, use:

    uv run dotenv -f ../.env run -- uv run gradio deploy
  5. During deployment, you'll be prompted to enter:

    • Space name: e.g., vme or vmenext
    • Script path: main.py
    • Hardware type: cpu-basic (free) or upgrade for better performance
    • Secrets: Add secrets such as OPENAI_API_KEY and SMTP2GO_API_KEY
    • Skip GitHub Actions unless you're automating CI/CD
  6. Access your deployed application:

    • Your application will be available at https://huggingface.co/spaces/yourusername/your-space-name
    • The deployment process will automatically handle system dependencies from packages.txt

Alternative: Manual Deployment via Web Interface

If you prefer using the web interface instead of CLI:

  1. Go to huggingface.co/spaces and sign in
  2. Click "Create new Space"
  3. Fill in the space details:
    • Space name: vmenext (or your preferred name)
    • License: Choose appropriate license (e.g., MIT)
    • SDK: Gradio
    • Hardware: CPU basic (free) or upgrade for better performance
    • Visibility: Public or Private
  4. Connect your GitHub repository and set:
    • App file: main.py
    • SDK: gradio
    • SDK version: 5.38.0 (or latest)
  5. Configure Environment Variables in Settings → Variables
  6. Deploy: Your Space will automatically build and deploy

Gradio Cloud Deployment (Alternative)

Gradio Cloud also provides hosting for Gradio applications.

Prerequisites for Gradio Deployment

  1. Gradio Account: Sign up at gradio.app
  2. GitHub Repository: Push your code to a GitHub repository
  3. Environment Variables: Configure all required environment variables in Gradio Cloud

Deployment Steps

  1. Prepare your repository:

    • Ensure all files are committed to your GitHub repository
    • The main.py file should be in the root directory
    • Include requirements.txt and packages.txt files
  2. Deploy to Gradio Cloud:

    • Go to gradio.app and sign in
    • Click "Create" → "New Space"
    • Connect your GitHub repository
    • Set the following configuration:
      • App File: main.py
      • SDK: gradio
      • SDK Version: 5.38.0 (or latest)
  3. Configure Environment Variables:

    • In your Gradio Space settings, add all required environment variables
    • Go to Settings → Variables and add each variable from the Configuration section
  4. System Dependencies:

    • The packages.txt file includes system dependencies (libnss3) required for Playwright
    • Gradio Cloud will automatically install these dependencies

Local Development

# Run the application locally
python main.py
# For auto-reload during development, Gradio's reload mode can be used instead:
# gradio main.py

The application will be accessible at http://localhost:7860

Deployment Considerations

Platform Comparison

| Feature           | Hugging Face Spaces     | Gradio Cloud      |
|-------------------|-------------------------|-------------------|
| Free Tier         | ✅ CPU basic            | ✅ Available      |
| Performance       | ⭐⭐⭐ Excellent        | ⭐⭐ Good         |
| Community         | ⭐⭐⭐ Large ML community | ⭐⭐ Gradio-focused |
| Custom Domains    | ❌ No                   | ✅ Yes            |
| Private Spaces    | ✅ Yes                  | ✅ Yes            |
| Hardware Upgrades | ✅ CPU/GPU options      | ✅ Available      |

Environment Considerations

  • Security: Use environment variables for all sensitive data
  • Monitoring: Set up log rotation for the logs directory
  • Backup: Regularly backup the memory and logs directories
  • Updates: Keep dependencies updated for security patches
  • Rate Limits: Be aware of API rate limits for OpenAI and email services
  • Resource Usage: Monitor memory and CPU usage, especially for web scraping

🔧 Troubleshooting

Common Issues

Environment Variables Not Set

# Check environment configuration
python check_env.py

Solution: Ensure all required variables are set in .env file

Playwright Browser Issues

# Reinstall browsers
playwright install chromium

Solution: Ensure Playwright browsers are properly installed

Email Not Sending

  • Verify SMTP2GO API key and configuration
  • Check network connectivity
  • Review email timeout settings

Website Monitoring Not Working

  • Verify MONITOR_URL is accessible
  • Check HTTP_TIMEOUT settings
  • Review logs in logs/availability_log.json

Chat Memory Issues

  • Ensure memory/ directory exists and is writable
  • Check CHAT_MEMORY_FILE path configuration
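The two checks above can be automated with a small script such as the one below. It is only a sketch: the function name `ensure_memory_file` is hypothetical, and initializing a missing store as an empty JSON object is an assumption about the file's format.

```python
import json
import os
from pathlib import Path


def ensure_memory_file(path: str) -> Path:
    """Create the parent directory and an empty JSON store if they are missing."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.write_text("{}", encoding="utf-8")  # assumed initial content
    json.loads(target.read_text(encoding="utf-8"))  # raises ValueError if the file is corrupt
    if not os.access(target, os.W_OK):
        raise PermissionError(f"{target} is not writable")
    return target
```

Calling `ensure_memory_file(os.environ.get("CHAT_MEMORY_FILE", "memory/chat_memory.json"))` before starting the application surfaces path and permission problems early. An existing file is never overwritten.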

Hugging Face Spaces Deployment Issues

  • Build Failures: Check that all dependencies in requirements.txt are compatible
  • Environment Variables: Ensure all required variables are set in Space settings
  • Memory Issues: Consider upgrading to a higher hardware tier if running out of memory
  • Timeout Issues: Increase timeout values for web scraping operations (for example PLAYWRIGHT_TIMEOUT)
  • Playwright Issues: Verify that packages.txt includes all required system dependencies

Debug Mode

Enable debug logging by setting:

DEBUG=true

Log Files

  • Availability logs: logs/availability_log.json
  • Chat memory: memory/chat_memory.json
  • Application logs: Check console output

🤝 Contributing

We welcome contributions! Please follow these guidelines:

Development Setup

  1. Fork the repository
  2. Create a feature branch:
    git checkout -b feature/your-feature-name
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

Code Style

  • Follow PEP 8 guidelines
  • Use type hints for function parameters and returns
  • Add docstrings for all functions and classes
  • Keep functions focused and modular

Testing

  • Test all new features thoroughly
  • Ensure environment validation passes
  • Test with various document formats
  • Verify email functionality

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI for providing the GPT models
  • Gradio for the excellent web interface framework
  • Playwright for robust web scraping capabilities
  • SMTP2GO for reliable email delivery

Made with ❤️ for the cybersecurity community
