🛒 Daraz Review Topic Analyzer

An end-to-end NLP application that scrapes Daraz product reviews, performs dynamic topic modeling, sentiment analysis, and topic-wise summarization, and presents insights through an interactive UI.

The system is designed to work with English, Nepali, and code-mixed reviews, making it suitable for real-world Nepali e-commerce data.

🚀 Features

Paste any Daraz product URL
Scrape dynamically loaded reviews
Automatically group reviews into meaningful topics
Multilingual topic modeling (English + Nepali + mixed text)
Sentiment analysis using a fine-tuned model
Topic-wise review summarization
Interactive UI with expandable topic sections

🧠 Tech Stack (with Purpose)

BERTopic
Used for dynamic topic modeling and topic grouping.
It clusters semantically similar reviews using embeddings and extracts interpretable topic representations without predefined labels.
SentenceTransformers
Used to generate dense multilingual embeddings for reviews.
These embeddings capture semantic meaning and are required by BERTopic for accurate clustering.
XLM-RoBERTa (Fine-tuned)
Used for sentiment analysis.
The model is fine-tuned on review data to detect sentiment in multilingual and code-mixed customer feedback.
Facebook BART Large CNN
Used for abstractive summarization.
Generates concise, high-quality summaries for each topic group.
Playwright
Used for scraping Daraz reviews.
Handles JavaScript-rendered and dynamically loaded content that traditional scrapers cannot reliably extract.
FastAPI
Used as the backend service.
Handles scraping and heavy NLP processing while keeping the system modular and scalable.
Streamlit
Used for building the interactive user interface.
Displays topics, summaries, and expandable review sections cleanly.
Python
Core language used for orchestration, NLP pipelines, backend logic, and scraping.
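
As a rough illustration of how the topic modeling and summarization components above might be loaded, here is a minimal sketch. The embedding checkpoint name, the `min_topic_size` value, and the function names are assumptions made for illustration; the repository may use different checkpoints and settings. The fine-tuned XLM-RoBERTa checkpoint is not shown because its location is not stated here. Imports are deferred into the functions so the heavy libraries load only when a model is actually built.

```python
# Assumed multilingual embedding checkpoint; the project may use another one.
EMBEDDING_MODEL = "paraphrase-multilingual-MiniLM-L12-v2"
# Hugging Face id for Facebook BART Large CNN.
SUMMARIZER_MODEL = "facebook/bart-large-cnn"


def build_topic_model(min_topic_size: int = 5):
    """Create a BERTopic model backed by multilingual sentence embeddings."""
    from bertopic import BERTopic
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer(EMBEDDING_MODEL)
    # min_topic_size=5 is an illustrative value, not the project's setting.
    return BERTopic(embedding_model=embedder, min_topic_size=min_topic_size)


def build_summarizer():
    """Create an abstractive summarization pipeline."""
    from transformers import pipeline

    return pipeline("summarization", model=SUMMARIZER_MODEL)
```

With these helpers, `build_topic_model().fit_transform(reviews)` would return one topic id per review, and `build_summarizer()(text)` would return a list containing a dict with a `summary_text` key.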

🏗️ System Architecture & Workflow

🔄 Overall Workflow Diagram

Workflow Explanation

User pastes a product link into the UI
URL is sent to the FastAPI backend
Reviews are scraped using Playwright
Text is cleaned and normalized
SentenceTransformers generate embeddings
BERTopic clusters reviews into topics
XLM-RoBERTa predicts sentiment per review
BART Large CNN summarizes reviews per topic
Results are rendered in the Streamlit UI
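
The cleaning step and the final per-topic grouping in the workflow above can be sketched with the standard library alone. This is a hypothetical illustration, not the repository's code: the function names, the cleaning rules (Unicode normalization, URL removal, whitespace collapsing), and the shape of the grouped result are all assumptions.

```python
import re
import unicodedata
from collections import Counter, defaultdict

URL_RE = re.compile(r"https?://\S+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Normalize Unicode, drop URLs, and collapse whitespace."""
    # NFC composes characters without altering Devanagari text.
    text = unicodedata.normalize("NFC", text)
    text = URL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def group_by_topic(reviews, topic_ids, sentiments):
    """Collect reviews and sentiment counts under each topic id."""
    grouped = defaultdict(lambda: {"reviews": [], "sentiment_counts": Counter()})
    for review, topic_id, sentiment in zip(reviews, topic_ids, sentiments):
        grouped[topic_id]["reviews"].append(review)
        grouped[topic_id]["sentiment_counts"][sentiment] += 1
    return dict(grouped)
```

Here `topic_ids` stands in for the per-review output of BERTopic and `sentiments` for the per-review labels from the sentiment model; the grouped reviews for each topic are what a summarizer would then receive.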

🧪 Model Fine-tuning Pipeline

🔧 Fine-tuning Diagram

Fine-tuning Process (XLM-RoBERTa)

Review dataset collection from e-commerce platforms
Text cleaning and normalization
Multilingual tokenization using XLM-RoBERTa tokenizer
Fine-tuning on labeled sentiment data
Evaluation using Accuracy, Precision, Recall, F1-score
Deployment inside the FastAPI inference pipeline
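
To make the evaluation step concrete, the sketch below computes accuracy along with macro-averaged precision, recall, and F1-score from gold and predicted labels. Macro averaging and the function name `evaluate` are assumptions; the original pipeline may average differently or rely on a library such as scikit-learn.

```python
def evaluate(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1_scores = [], [], []

    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }
```

Macro averaging weights every sentiment class equally, which can be useful when one class (for example, neutral) is rare in the review data.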

⚙️ Installation

1. Clone the repository

git clone https://github.com/roshan-acharya/Review-Analyzer.git
cd Review-Analyzer

2. Create virtual environment

python -m venv venv
source venv/bin/activate      # Linux / Mac
venv\Scripts\activate         # Windows

3. Install dependencies

pip install -r requirements.txt

4. Start Backend

uvicorn api:app --reload

5. Start UI

streamlit run app.py
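
Once the backend is running, it can in principle be called without the UI. The sketch below builds such a request with the standard library. The `/analyze` path and the `{"url": ...}` payload are hypothetical; check `api.py` for the actual route and request schema. `http://127.0.0.1:8000` is the default address uvicorn binds to.

```python
import json
import urllib.request


def build_analyze_request(product_url, base_url="http://127.0.0.1:8000", path="/analyze"):
    """Build a JSON POST request; path and payload shape are assumptions."""
    payload = json.dumps({"url": product_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(product_url, timeout=300):
    """Send the request and decode the JSON response."""
    request = build_analyze_request(product_url)
    # Scraping plus NLP inference can be slow, hence the generous timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```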

⚠️ Limitations

Scraping may fail if Daraz blocks automated requests
Very short reviews may reduce topic quality
Processing speed depends on system resources, since embedding, sentiment, and summarization models are compute-heavy
The sentiment analysis model is still imperfect because Nepali is a low-resource language
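
One common way to soften transient scraping failures is to retry with exponential backoff. The helper below is a generic sketch, not part of the project; `retry_with_backoff` and its parameters are hypothetical names. It will not get around deliberate blocking, only temporary errors such as timeouts.

```python
import time


def retry_with_backoff(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # Re-raise the last error once all attempts are used up.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A scraping call could be wrapped as `retry_with_backoff(lambda: scrape(url))`, where `scrape` stands for whatever function performs the Playwright session. The `sleep` parameter exists so the delay can be replaced in tests.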

👤 Author

Roshan Acharya

