MarketPulse

A comprehensive web scraping and data analysis system for tracking product prices across multiple e-commerce platforms. MarketPulse features automated daily scraping, price trend analysis, anomaly detection, and a responsive dashboard for visualization.

Features

  • Multi-Store Scraping: Automated price collection from Amazon, eBay, Best Buy, and Target
  • Price Trend Analysis: Historical price tracking with trend visualization
  • Anomaly Detection: Automatic detection of unusual price changes (see the sketch after this list)
  • Competitor Analysis: Cross-store price comparison for similar products
  • RESTful API: Full-featured API for product search and analytics
  • Interactive Dashboard: React-based frontend with price charts and trends
  • Automated Scheduling: Daily scraping with Celery task queue
  • Data Warehouse: Structured database schema optimized for time-series data
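
The anomaly detection logic lives in analytics/detector.py. As an illustration of the general approach rather than that module's actual implementation, a rolling z-score check over a product's recent prices might look like this:

from statistics import mean, stdev

def is_price_anomaly(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` as anomalous if it deviates more than `threshold`
    standard deviations from the recent price history."""
    if len(history) < 5:          # not enough data points to judge
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:                # perfectly flat history: any change stands out
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# Example: a sudden drop from a stable ~$100 price is flagged
print(is_price_anomaly([99.9, 99.5, 100.2, 99.8, 100.0], 49.99))  # True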

Architecture

┌─────────────────┐    ┌──────────────┐    ┌─────────────────┐
│   Web Scrapers  │────│ ETL Pipeline │────│   Data Warehouse│
│   (Selenium/BS4)│    │  (Validation)│    │   (SQLite/PG)   │
└─────────────────┘    └──────────────┘    └─────────────────┘
                                                     │
┌─────────────────┐    ┌──────────────┐             │
│  React Dashboard│────│  FastAPI     │─────────────┘
│   (Charts/UI)   │    │   (REST API) │
└─────────────────┘    └──────────────┘

┌─────────────────┐    ┌──────────────┐
│ Celery Scheduler│────│  Analytics   │
│ (Daily Tasks)   │    │  (Anomalies) │
└─────────────────┘    └──────────────┘
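
The ETL stage between the scrapers and the warehouse is implemented in scrapers/pipeline.py. A hypothetical sketch of the kind of validation and normalization it performs before a record is stored (the field names here are illustrative, not the module's actual schema):

from datetime import datetime, timezone

def validate_record(raw: dict) -> dict:
    """Normalize a scraped record and reject obviously bad data."""
    name = (raw.get("name") or "").strip()
    if not name:
        raise ValueError("missing product name")

    # Strip currency symbols and thousands separators before parsing
    price = float(str(raw.get("price", "")).replace("$", "").replace(",", ""))
    if price <= 0:
        raise ValueError(f"non-positive price: {price}")

    return {
        "name": name,
        "price": round(price, 2),
        "currency": raw.get("currency", "USD"),
        "store": raw["store"],
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }

# Example
print(validate_record({"name": " USB-C Cable ", "price": "$12.99", "store": "amazon"}))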

Quick Start

1. Environment Setup

# Clone and setup
git clone <repository>
cd price-compare

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements-minimal.txt

2. Database Setup

# Initialize database and create sample data
python database/setup.py
python demo.py

3. Start Services

# Terminal 1: Start API server
source venv/bin/activate
python api/main.py

# Terminal 2: Start the frontend
cd frontend
npm install
npm start

# Terminal 3: Start Celery worker (optional, for automated scraping)
source venv/bin/activate
celery -A scrapers.tasks worker --loglevel=info
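
The worker above only executes tasks; the daily schedule itself comes from Celery beat. A minimal sketch of what a beat schedule in scrapers/tasks.py could look like (the actual task names and timing live in that module, and the broker URL should come from REDIS_URL):

from celery import Celery
from celery.schedules import crontab

app = Celery("scrapers", broker="redis://localhost:6379/0")

# Hypothetical daily schedule; the real task name is defined in scrapers/tasks.py
app.conf.beat_schedule = {
    "scrape-all-products-daily": {
        "task": "scrapers.tasks.scrape_all_products",   # assumed task name
        "schedule": crontab(hour=2, minute=0),          # every day at 02:00
    },
}

The scheduler runs alongside the worker and is started with celery -A scrapers.tasks beat --loglevel=info.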

4. Access the Application
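
With the default configuration, the services should be reachable at:

  • API: http://localhost:8000 (interactive API docs at http://localhost:8000/docs)
  • Dashboard: http://localhost:3000 (the React dev server's default port)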

Project Structure

price-compare/
├── api/                    # FastAPI application
│   ├── main.py            # API routes and configuration
│   ├── crud.py            # Database operations
│   └── schemas.py         # Pydantic models
├── analytics/             # Price analysis and anomaly detection
│   └── detector.py        # Analytics engine
├── database/              # Database models and utilities
│   ├── models.py          # SQLAlchemy models
│   ├── connection.py      # Database connection management
│   ├── setup.py           # Database initialization
│   └── seed.py            # Initial data seeding
├── scrapers/              # Web scraping framework
│   ├── base.py            # Abstract scraper base class
│   ├── amazon.py          # Amazon-specific scraper
│   ├── ebay.py            # eBay-specific scraper
│   ├── pipeline.py        # ETL data processing
│   └── tasks.py           # Celery background tasks
├── frontend/              # React dashboard
│   └── src/
│       ├── components/    # React components
│       ├── services/      # API client
│       └── types/         # TypeScript definitions
├── config/                # Configuration management
│   └── settings.py        # Application settings
├── tests/                 # Test suite
├── docker-compose.yml     # Docker orchestration
└── requirements.txt       # Python dependencies

API Endpoints

Products

  • GET /products/ - List products with latest prices (see the usage example below)
  • GET /products/{id} - Get specific product details
  • GET /products/{id}/price-history - Get price history
  • GET /products/{id}/trends - Get price trend analysis

Stores

  • GET /stores/ - List all configured stores
  • GET /stores/{id} - Get store details

Search & Analytics

  • POST /search - Search products across stores
  • GET /analytics/price-changes - Recent price changes
  • POST /scrape/product - Trigger manual product scraping
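
A minimal example of calling the API from Python with the requests library, assuming the default host and port from the Configuration section below; the exact response and request fields are defined by the Pydantic schemas in api/schemas.py:

import requests

BASE_URL = "http://localhost:8000"

# List products with their latest prices
products = requests.get(f"{BASE_URL}/products/", timeout=10).json()

# Fetch the price history for the first product (assumes at least one exists)
if products:
    product_id = products[0]["id"]  # field name assumed from the schema
    history = requests.get(f"{BASE_URL}/products/{product_id}/price-history", timeout=10).json()
    print(history)

# Search across stores (request body fields are illustrative)
results = requests.post(f"{BASE_URL}/search", json={"query": "wireless mouse"}, timeout=10)
print(results.json())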

Configuration

Copy .env.example to .env and configure:

# Database
DATABASE_URL=sqlite:///./price_compare.db

# API
API_HOST=0.0.0.0
API_PORT=8000

# Scraping
SCRAPING_DELAY=1
MAX_CONCURRENT_REQUESTS=5

# Redis (for Celery)
REDIS_URL=redis://localhost:6379/0
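
These variables are read by config/settings.py. A hypothetical sketch of how they might be loaded (the actual module may use Pydantic settings instead):

import os

class Settings:
    """Application settings pulled from the environment, using the same
    defaults as the .env example above."""
    DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///./price_compare.db")
    API_HOST = os.getenv("API_HOST", "0.0.0.0")
    API_PORT = int(os.getenv("API_PORT", "8000"))
    SCRAPING_DELAY = float(os.getenv("SCRAPING_DELAY", "1"))
    MAX_CONCURRENT_REQUESTS = int(os.getenv("MAX_CONCURRENT_REQUESTS", "5"))
    REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379/0")

settings = Settings()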

Development

Running Tests

source venv/bin/activate
python -m pytest tests/ -v

Adding New Scrapers

  1. Create new scraper class inheriting from BaseScraper
  2. Implement extract_product_info() and search_products() methods (see the skeleton after this list)
  3. Add scraper to the registry in scrapers/tasks.py
  4. Update database with new store entry
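
A skeleton of what a new scraper might look like, assuming BaseScraper exposes the two methods above; the exact base-class signature lives in scrapers/base.py, and the selectors here are purely illustrative:

import requests
from bs4 import BeautifulSoup

from scrapers.base import BaseScraper  # import path per the project structure


class TargetScraper(BaseScraper):
    """Hypothetical scraper; replace the selectors with ones for the real site."""

    store_name = "Target"

    def extract_product_info(self, url: str) -> dict:
        soup = BeautifulSoup(requests.get(url, timeout=15).text, "html.parser")
        return {
            "name": soup.select_one("h1").get_text(strip=True),
            "price": soup.select_one("[data-test='product-price']").get_text(strip=True),
            "url": url,
            "store": self.store_name,
        }

    def search_products(self, query: str) -> list[dict]:
        # A real implementation would paginate through the site's search results
        return []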

Database Migrations

For production use, implement proper migrations:

# Install Alembic
pip install alembic

# Initialize migrations
alembic init alembic

# Create migration
alembic revision --autogenerate -m "description"

# Apply migrations
alembic upgrade head

Production Deployment

Docker Deployment

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f api

# Scale workers
docker-compose up -d --scale celery=3

Manual Deployment

  1. Database: Use PostgreSQL for production
  2. Redis: Required for Celery task queue
  3. Web Server: Use Nginx as reverse proxy
  4. Process Manager: Use systemd or supervisor for process management
  5. Monitoring: Implement logging aggregation and metrics collection

Security Considerations

  • Rate Limiting: Built-in request throttling to avoid IP blocking
  • Data Validation: Comprehensive input validation on all endpoints
  • Environment Variables: Sensitive configuration via environment variables
  • CORS: Configurable cross-origin resource sharing (see the sketch after this list)
  • Error Handling: Graceful error handling without data exposure
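
For example, the CORS setting typically corresponds to FastAPI's CORSMiddleware; a minimal sketch, where the allowed origins are placeholders and the real values should come from configuration:

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Restrict cross-origin access to the dashboard's origin in production
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # placeholder; read from settings
    allow_methods=["*"],
    allow_headers=["*"],
)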

Legal and Ethical Considerations

  • Terms of Service: Ensure compliance with each site's ToS
  • Rate Limiting: Respectful scraping with appropriate delays
  • User Agents: Proper identification in requests
  • Data Usage: Only collect publicly available pricing data
  • Caching: Implement reasonable caching to minimize requests

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Implement changes with tests
  4. Ensure all tests pass
  5. Submit a pull request

License

This project is for educational purposes. Ensure compliance with all applicable terms of service and legal requirements when scraping websites.
