An end-to-end NLP application that scrapes Daraz product reviews, performs dynamic topic modeling, sentiment analysis, and topic-wise summarization, and presents insights through an interactive UI.
The system is designed to work with English, Nepali, and code-mixed reviews, making it suitable for real-world Nepali e-commerce data.
- Paste any Daraz product URL
- Scrape dynamically loaded reviews
- Automatically group reviews into meaningful topics
- Multilingual topic modeling (English + Nepali + mixed text)
- Sentiment analysis using a fine-tuned model
- Topic-wise review summarization
- Interactive UI with expandable topic sections
- BERTopic: Used for dynamic topic modeling and topic grouping. It clusters semantically similar reviews using embeddings and extracts interpretable topic representations without predefined labels.
- SentenceTransformers: Used to generate dense multilingual embeddings for reviews. These embeddings capture semantic meaning and are required by BERTopic for accurate clustering.
- XLM-RoBERTa (Fine-tuned): Used for sentiment analysis. The model is fine-tuned on review data, enabling accurate sentiment detection for multilingual and code-mixed customer feedback.
- Facebook BART Large CNN: Used for abstractive summarization. Generates concise, high-quality summaries for each topic group.
- Playwright: Used for scraping Daraz reviews. Handles JavaScript-rendered and dynamically loaded content that traditional scrapers cannot reliably extract.
- FastAPI: Used as the backend service. Handles scraping and heavy NLP processing while keeping the system modular and scalable.
- Streamlit: Used for building the interactive user interface. Displays topics, summaries, and expandable review sections cleanly.
- Python: Core language used for orchestration, NLP pipelines, backend logic, and scraping.
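As a rough illustration of how SentenceTransformers and BERTopic fit together, the sketch below passes a sentence encoder to BERTopic and groups reviews by their assigned topic. The model name `paraphrase-multilingual-MiniLM-L12-v2`, the `min_topic_size` value, and the helper names `build_topic_model` and `group_by_topic` are assumptions for illustration and are not taken from this repository.

```python
from collections import defaultdict


def build_topic_model(model_name="paraphrase-multilingual-MiniLM-L12-v2"):
    """Create a BERTopic model backed by a multilingual sentence encoder.

    The model name is an assumption; the project may use a different one.
    Imports are local so that group_by_topic below can be used without
    these heavy dependencies installed.
    """
    from bertopic import BERTopic
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer(model_name)
    # min_topic_size=5 is an illustrative value for small review sets.
    return BERTopic(embedding_model=embedder, min_topic_size=5)


def group_by_topic(reviews, topics):
    """Map each topic id to the list of reviews assigned to it.

    BERTopic assigns topic id -1 to outliers; they are kept under that key.
    """
    if len(reviews) != len(topics):
        raise ValueError("reviews and topics must have the same length")
    grouped = defaultdict(list)
    for review, topic_id in zip(reviews, topics):
        grouped[topic_id].append(review)
    return dict(grouped)


# Typical use (requires bertopic and sentence-transformers):
#   topics, _ = build_topic_model().fit_transform(reviews)
#   grouped = group_by_topic(reviews, topics)
```

Keeping the grouping step separate from the model makes it easy to check the downstream logic with hand-written topic ids.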
- User pastes a product link into the UI
- URL is sent to the FastAPI backend
- Reviews are scraped using Playwright
- Text is cleaned and normalized
- SentenceTransformers generate embeddings
- BERTopic clusters reviews into topics
- XLM-RoBERTa predicts sentiment per review
- BART Large CNN summarizes reviews per topic
- Results are rendered in the Streamlit UI
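The flow above can be sketched as a single function that runs the stages in order. This is a minimal sketch, not the project's actual code: `clean_text`, `analyze`, and the stage callables (`scrape`, `assign_topics`, `predict_sentiment`, `summarize`) are hypothetical names, and the cleaning rules (Unicode NFC normalization, URL removal, whitespace collapsing) are assumed.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Normalize Unicode, drop URLs, and collapse whitespace (assumed rules)."""
    text = unicodedata.normalize("NFC", text)
    text = URL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def analyze(url, scrape, assign_topics, predict_sentiment, summarize):
    """Run the pipeline stages in order; each stage is an injected callable.

    scrape(url) -> list of raw review strings
    assign_topics(reviews) -> one topic id per review
    predict_sentiment(review) -> sentiment label
    summarize(reviews) -> summary string for one topic
    """
    reviews = [clean_text(raw) for raw in scrape(url)]
    reviews = [r for r in reviews if r]  # drop reviews that became empty
    topics = assign_topics(reviews)
    sentiments = [predict_sentiment(r) for r in reviews]

    by_topic = {}
    for review, topic_id, sentiment in zip(reviews, topics, sentiments):
        bucket = by_topic.setdefault(topic_id, {"reviews": [], "sentiments": []})
        bucket["reviews"].append(review)
        bucket["sentiments"].append(sentiment)

    # Summarize each topic from the reviews grouped under it.
    for bucket in by_topic.values():
        bucket["summary"] = summarize(bucket["reviews"])
    return by_topic
```

Passing the stages in as callables lets the orchestration be exercised with simple stubs before the Playwright scraper or the models are wired in.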
- Review dataset collection from e-commerce platforms
- Text cleaning and normalization
- Multilingual tokenization using XLM-RoBERTa tokenizer
- Fine-tuning on labeled sentiment data
- Evaluation using Accuracy, Precision, Recall, F1-score
- Deployment inside the FastAPI inference pipeline
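To make the evaluation step concrete, here is a small sketch that computes accuracy together with precision, recall, and F1-score from true and predicted labels. Macro averaging across classes and the function name `evaluate_predictions` are assumptions; the project may average differently or rely on a library implementation.

```python
def evaluate_predictions(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # A class with no predictions (or no true samples) scores 0 here.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Macro averaging weights every sentiment class equally, which can be informative when one class dominates the review data.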
git clone https://github.com/roshan-acharya/Review-Analyzer.git
cd Review-Analyzer
python -m venv venv
source venv/bin/activate # Linux / Mac
venv\Scripts\activate # Windows
pip install -r requirements.txt
uvicorn api:app --reload
streamlit run app.py
- Scraping may fail if Daraz blocks automated requests
- Very short reviews may reduce topic quality
- Performance depends on system resources
- The sentiment analysis model is still imperfect because Nepali is a low-resource language
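One possible way to reduce the impact of very short reviews is to set them aside before topic modeling. The sketch below is only an illustration of that idea: the helper name `split_by_length` and the three-token threshold are assumptions, and whether this improves topic quality would need to be checked on real data.

```python
def split_by_length(reviews, min_tokens=3):
    """Split reviews into (kept, short) by whitespace token count.

    min_tokens=3 is an illustrative threshold, not a tuned value.
    """
    kept, short = [], []
    for review in reviews:
        if len(review.split()) >= min_tokens:
            kept.append(review)
        else:
            short.append(review)
    return kept, short
```

The short reviews can still be passed to sentiment analysis, since only the clustering step is sensitive to their length.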
Roshan Acharya

