pranavsoh/myntra_review_project

Myntra Review Scraper & Analysis

A modular Python project to scrape product reviews from Myntra, store them in MongoDB, and perform interactive analysis using Streamlit.
The project is designed with a clear separation between data collection, storage, analysis, and presentation layers.

⚠️ This project is built strictly for educational and learning purposes.


📌 Project Overview

The Myntra Review Scraper allows users to:

  • Search products on Myntra
  • Scrape customer reviews programmatically
  • Store scraped data in MongoDB
  • Perform exploratory analysis and generate insights using an interactive Streamlit dashboard

The project follows a clean, layered architecture, making it easy to extend, debug, and maintain.


🧱 Architecture & Design

The project is structured into independent modules:

  • Scraping Layer
    Handles product search and review extraction logic.

  • Persistence Layer (Cloud I/O)
    Manages MongoDB operations such as storing and fetching reviews.

  • Analytics Layer
    Generates summary statistics and product-level insights from review data.

  • Presentation Layer
    Streamlit pages for search and analysis.

This separation ensures:

  • No UI logic inside scraping code
  • No database logic inside UI pages
  • Better testability and scalability
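
The separation described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `Review`, `ReviewStore`, `InMemoryStore`, and `average_rating` are hypothetical names, and the in-memory store only stands in for the MongoDB-backed persistence layer.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Review:
    """One scraped review (field names are assumptions for this sketch)."""
    product: str
    rating: float
    comment: str


class ReviewStore(Protocol):
    """Persistence-layer interface; a MongoDB-backed class could implement it."""

    def save(self, reviews: list[Review]) -> None: ...

    def fetch(self, product: str) -> list[Review]: ...


class InMemoryStore:
    """Hypothetical stand-in for the MongoDB layer, useful in tests."""

    def __init__(self) -> None:
        self._rows: list[Review] = []

    def save(self, reviews: list[Review]) -> None:
        self._rows.extend(reviews)

    def fetch(self, product: str) -> list[Review]:
        return [r for r in self._rows if r.product == product]


def average_rating(store: ReviewStore, product: str) -> float | None:
    """Analytics layer: reads through the store and knows nothing about the UI."""
    ratings = [r.rating for r in store.fetch(product)]
    return sum(ratings) / len(ratings) if ratings else None
```

Because the analytics function depends only on the `ReviewStore` interface, it can be exercised without a running database or a Streamlit session, which is the testability benefit the list above refers to.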

📂 Project Structure

myntra-review-scraper/
│
├── dashboard/
│   └── streamlit_app.py              # Streamlit entry point
│
├── src/
│   ├── cloud_io/                     # MongoDB interaction layer
│   │   └── cloud_io.py
│   │
│   ├── constants/                    # Application constants
│   │   └── constants.py
│   │
│   ├── data_report/                  # Analytics & dashboard logic
│   │   └── generate_data_report.py
│   │
│   ├── scrapper/                     # Myntra scraping logic
│   │   └── scrape.py
│   │
│   ├── utils/                        # Utility/helper functions
│   │   └── __init__.py
│   │
│   └── exception.py                  # Custom exception handling
│
├── requirements.txt
├── .gitignore
└── README.md


🛠️ Tech Stack

  • Language: Python
  • Web Scraping: Selenium, BeautifulSoup
  • Database: MongoDB
  • Data Handling: Pandas
  • Dashboard / UI: Streamlit
  • Version Control: Git & GitHub

🚀 Features

  • Product-based review scraping
  • Config-driven session management
  • MongoDB-backed persistent storage
  • Interactive data analysis dashboard
  • Modular, extensible codebase

⚙️ Setup Instructions

1️⃣ Clone the repository

git clone https://github.com/<your-username>/myntra-review-scraper.git
cd myntra-review-scraper

2️⃣ Create and activate a virtual environment

python -m venv env
env\Scripts\activate          # Windows
source env/bin/activate       # macOS/Linux

3️⃣ Install dependencies

pip install -r requirements.txt

4️⃣ Configure MongoDB

  • Ensure MongoDB is running locally, or
  • Provide your MongoDB connection details in src/cloud_io/cloud_io.py
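
One possible way to supply the connection details without hardcoding them is to read them from environment variables. The sketch below is an assumption, not the project's actual configuration code: the variable names (`MONGO_URI`, `MONGO_DB`, `MONGO_COLLECTION`), the default database and collection names, and the `load_mongo_settings` helper are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MongoSettings:
    """Connection details the persistence layer needs (hypothetical shape)."""
    uri: str
    database: str
    collection: str


def load_mongo_settings(env: Optional[Mapping[str, str]] = None) -> MongoSettings:
    """Read settings from the environment, defaulting to a local MongoDB server."""
    env = os.environ if env is None else env
    return MongoSettings(
        # 27017 is MongoDB's default port; the names below are assumed, not from the repo.
        uri=env.get("MONGO_URI", "mongodb://localhost:27017"),
        database=env.get("MONGO_DB", "myntra_reviews"),
        collection=env.get("MONGO_COLLECTION", "reviews"),
    )
```

If the project uses PyMongo (an assumption; the README only names MongoDB), the resulting `uri` could be passed to `pymongo.MongoClient(settings.uri)` inside cloud_io.py.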


▶️ How to Run the Project

Run the Streamlit dashboard:

streamlit run dashboard/streamlit_app.py

The dashboard will open in your browser at:

http://localhost:8501

📊 Analysis Workflow

  • Search for a product using the Streamlit search page
  • Scrape reviews and store them in MongoDB
  • Navigate to the analysis page
  • Generate:
      • General review statistics
      • Product-level insights
      • Structured exploratory analysis
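
To make the "general statistics" and "product-level insights" steps concrete, here is a small standard-library sketch. It is illustrative only: the project itself lists Pandas for data handling, and the record fields (`product`, `rating`) and function names are assumptions about how the stored reviews might look.

```python
from collections import Counter, defaultdict
from statistics import mean


def general_stats(reviews):
    """Overall figures across every stored review."""
    ratings = [r["rating"] for r in reviews]
    return {
        "total_reviews": len(ratings),
        "average_rating": round(mean(ratings), 2) if ratings else None,
        "rating_counts": dict(Counter(ratings)),
    }


def product_insights(reviews):
    """Review count and average rating for each product."""
    by_product = defaultdict(list)
    for r in reviews:
        by_product[r["product"]].append(r["rating"])
    return {
        name: {"reviews": len(vals), "average_rating": round(mean(vals), 2)}
        for name, vals in by_product.items()
    }


# Made-up sample records, standing in for documents fetched from MongoDB.
sample = [
    {"product": "Sneakers", "rating": 5},
    {"product": "Sneakers", "rating": 4},
    {"product": "Kurta", "rating": 3},
]
overall = general_stats(sample)
per_product = product_insights(sample)
```

Keeping these functions free of Streamlit calls means the dashboard page only has to render the returned dictionaries.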

⚠️ Disclaimer

  • This project is not affiliated with Myntra
  • Scraping is performed only for educational purposes
  • Users are responsible for complying with website terms of service

📈 Learning Outcomes

This project demonstrates the following concepts:

  • Modular Python project design
  • Web scraping with Selenium
  • Database-backed analytics pipelines
  • Streamlit-based data applications
  • Clean separation of concerns

📬 Future Improvements

  • API-based backend (Flask/FastAPI)
  • Advanced sentiment analysis
  • Pagination & filtering
  • Async scraping for performance
  • Deployment on cloud platforms

👤 Author

Pranav Sohaney
Data Analytics & Data Science Enthusiast
