A modular Python project to scrape product reviews from Myntra, store them in MongoDB, and perform interactive analysis using Streamlit.
The project is designed with a clear separation between data collection, storage, analysis, and presentation layers.
⚠️ This project is built strictly for educational and learning purposes.
The Myntra Review Scraper allows users to:
- Search products on Myntra
- Scrape customer reviews programmatically
- Store scraped data in MongoDB
- Perform exploratory analysis and generate insights using an interactive Streamlit dashboard
The project follows a clean, layered architecture, making it easy to extend, debug, and maintain.
The project is structured into independent modules:
- Scraping Layer: handles product search and review extraction logic.
- Persistence Layer (Cloud I/O): manages MongoDB operations such as storing and fetching reviews.
- Analytics Layer: generates summary statistics and product-level insights from review data.
- Presentation Layer: Streamlit pages for search and analysis.
This separation ensures:
- No UI logic inside scraping code
- No database logic inside UI pages
- Better testability and scalability
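As a rough illustration of why this separation helps testability, the sketch below wires a stub scraper to an in-memory store through one small glue function. All names here (`Review`, `StubScraper`, `InMemoryStore`, `collect_reviews`) are hypothetical and do not come from the project's source; the real layers would use Selenium and MongoDB instead.

```python
from dataclasses import dataclass, asdict


@dataclass
class Review:
    # Hypothetical record shape; the project's actual fields may differ.
    product: str
    rating: float
    comment: str


class StubScraper:
    """Stands in for the scraping layer; returns canned reviews, no network access."""

    def reviews_for(self, product: str) -> list[Review]:
        return [
            Review(product, 4.0, "Good fit"),
            Review(product, 5.0, "Great fabric"),
        ]


class InMemoryStore:
    """Stands in for the persistence layer; a real one would wrap MongoDB."""

    def __init__(self) -> None:
        self._rows: list[dict] = []

    def save(self, reviews: list[Review]) -> int:
        # Store plain dicts so the UI and analytics never depend on scraper types.
        self._rows.extend(asdict(review) for review in reviews)
        return len(reviews)

    def fetch(self, product: str) -> list[dict]:
        return [row for row in self._rows if row["product"] == product]


def collect_reviews(scraper, store, product: str) -> int:
    """Glue code: the only place where scraping and storage meet."""
    return store.save(scraper.reviews_for(product))


store = InMemoryStore()
saved = collect_reviews(StubScraper(), store, "Blue Shirt")
print(saved, "reviews stored")
```

Because `collect_reviews` only relies on `reviews_for` and `save`, either side can be swapped for a real implementation or a test double without touching the other.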
myntra-review-scraper/
│
├── dashboard/
│   └── streamlit_app.py          # Streamlit entry point
│
├── src/
│   ├── cloud_io/                 # MongoDB interaction layer
│   │   └── cloud_io.py
│   │
│   ├── constants/                # Application constants
│   │   └── constants.py
│   │
│   ├── data_report/              # Analytics & dashboard logic
│   │   └── generate_data_report.py
│   │
│   ├── scrapper/                 # Myntra scraping logic
│   │   └── scrape.py
│   │
│   ├── utils/                    # Utility/helper functions
│   │   └── __init__.py
│   │
│   └── exception.py              # Custom exception handling
│
├── requirements.txt
├── .gitignore
└── README.md
Tech stack:
- Language: Python
- Web Scraping: Selenium, BeautifulSoup
- Database: MongoDB
- Data Handling: Pandas
- Dashboard / UI: Streamlit
- Version Control: Git & GitHub
Key features:
- Product-based review scraping
- Config-driven session management
- MongoDB-backed persistent storage
- Interactive data analysis dashboard
- Modular, extensible codebase
1️⃣ Clone the repository
git clone https://github.com/<your-username>/myntra-review-scraper.git
cd myntra-review-scraper
2️⃣ Create and activate virtual environment
python -m venv env
env\Scripts\activate # Windows
source env/bin/activate # macOS/Linux
3️⃣ Install dependencies
pip install -r requirements.txt
4️⃣ Configure MongoDB
- Ensure MongoDB is running locally, or
- Provide the connection details for your MongoDB instance inside src/cloud_io/cloud_io.py
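One possible way to supply these details is sketched below. It assumes the project connects through `pymongo` (the README does not name the driver), and the environment variable names, database name, and collection name are all hypothetical. Reading the URI from the environment is an alternative to hard-coding it in `cloud_io.py`, which keeps credentials out of version control.

```python
import os


def mongo_settings(env=None):
    """Resolve connection settings, falling back to a local MongoDB instance.

    MONGODB_URI and MONGODB_DATABASE are hypothetical variable names.
    """
    env = os.environ if env is None else env
    return {
        "uri": env.get("MONGODB_URI", "mongodb://localhost:27017"),
        "database": env.get("MONGODB_DATABASE", "myntra_reviews"),
    }


def get_collection(name="reviews"):
    """Return a collection handle (assumes pymongo is installed)."""
    # Imported lazily so the settings helper works without the driver.
    from pymongo import MongoClient

    settings = mongo_settings()
    # Fail after 5 seconds instead of hanging when the server is unreachable.
    client = MongoClient(settings["uri"], serverSelectionTimeoutMS=5000)
    return client[settings["database"]][name]


print(mongo_settings({"MONGODB_URI": "mongodb://db.example:27017"}))
```

Calling `get_collection()` does not contact the server by itself; the first read or write does, so a misconfigured URI typically surfaces at that point.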
▶️ How to Run the Project
Run Streamlit Dashboard
streamlit run dashboard/streamlit_app.py
The dashboard will open in your browser at:
http://localhost:8501
📊 Analysis Workflow
- Search for a product using the Streamlit search page
- Scrape reviews and store them in MongoDB
- Navigate to the analysis page
- Generate:
  - General review statistics
  - Product-level insights
  - Structured exploratory analysis
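To make the last step concrete, here is a minimal sketch of what "general review statistics" and "product-level insights" could look like. The field names (`product`, `rating`, `comment`) and the sample rows are assumptions for illustration, not the project's actual schema or data. The project itself lists Pandas for data handling; this sketch sticks to the standard library so it runs without extra dependencies.

```python
from collections import defaultdict
from statistics import mean


def general_statistics(reviews):
    """Dataset-wide summary: review count, product count, mean rating."""
    ratings = [review["rating"] for review in reviews]
    return {
        "total_reviews": len(reviews),
        "products": len({review["product"] for review in reviews}),
        "average_rating": round(mean(ratings), 2) if ratings else None,
    }


def product_insights(reviews):
    """Per-product summary: review count, mean rating, lowest rating."""
    ratings_by_product = defaultdict(list)
    for review in reviews:
        ratings_by_product[review["product"]].append(review["rating"])
    return {
        product: {
            "reviews": len(ratings),
            "average_rating": round(mean(ratings), 2),
            "lowest_rating": min(ratings),
        }
        for product, ratings in ratings_by_product.items()
    }


# Made-up rows standing in for documents fetched from MongoDB.
sample = [
    {"product": "Blue Shirt", "rating": 4, "comment": "Good fit"},
    {"product": "Blue Shirt", "rating": 5, "comment": "Great fabric"},
    {"product": "Black Jeans", "rating": 2, "comment": "Runs small"},
]

print(general_statistics(sample))
print(product_insights(sample))
```

With Pandas, the per-product part would roughly correspond to a `groupby` on the product column followed by an aggregation.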
⚠️ Disclaimer
- This project is not affiliated with Myntra.
- Scraping is performed only for educational purposes.
- Users are responsible for complying with the website's terms of service.
📈 Learning Outcomes
Through this project, the following concepts are demonstrated:
- Modular Python project design
- Web scraping with Selenium
- Database-backed analytics pipelines
- Streamlit-based data applications
- Clean separation of concerns
📬 Future Improvements
- API-based backend (Flask/FastAPI)
- Advanced sentiment analysis
- Pagination & filtering
- Async scraping for performance
- Deployment on cloud platforms
👤 Author
Pranav Sohaney
Data Analytics & Data Science Enthusiast