Amazon Product Review Analysis

This project analyzes Amazon product reviews to detect fake reviews and perform customer segmentation using machine learning techniques.

Project Overview

The project aims to:

Analyze customer behavior through review patterns
Segment customers based on their reviewing patterns
Detect potentially fake reviews using unsupervised learning
Perform sentiment analysis on reviews

Dependencies

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

Data Preprocessing

Text Processing:

# NLTK preprocessing
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('omw-1.4')

Custom Functions:

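from nltk import pos_tag
from nltk.corpus import stopwords, wordnet
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words('english'))
# Assumed mapping (not shown in this README): first letter of the Penn Treebank tag -> WordNet POS
pos_dict = {'J': wordnet.ADJ, 'V': wordnet.VERB, 'N': wordnet.NOUN, 'R': wordnet.ADV}

def token_stop_pos(text):
    # Tokenize, POS-tag, and drop stop words; returns (word, wordnet_pos) pairs
    tags = pos_tag(word_tokenize(text))
    return [(word, pos_dict.get(tag[0])) for word, tag in tags
            if word.lower() not in stop_words]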

Customer Segmentation

Feature Engineering:

Review counts per customer
Average expenditure
Positive/negative review ratio
Review length statistics
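
The features listed above can be derived with a single pandas groupby. The following is a minimal sketch on a toy table, not the notebook's actual code: the column names (`reviewerID`, `price`, `overall`, `reviewText`) and the "4 stars or more counts as positive" threshold are assumptions chosen for illustration, and the ratio is expressed as the share of positive reviews.

```python
import pandas as pd

# Toy review table; all column names are hypothetical stand-ins.
reviews = pd.DataFrame({
    "reviewerID": ["a", "a", "b", "b", "b", "c"],
    "price": [10.0, 30.0, 5.0, 15.0, 10.0, 50.0],
    "overall": [5, 1, 4, 5, 2, 3],  # star rating, 1-5
    "reviewText": ["great", "bad product", "ok", "love it a lot", "meh", "fine"],
})

# Per-review helper columns
reviews["is_positive"] = reviews["overall"] >= 4  # assumed threshold
reviews["review_len"] = reviews["reviewText"].str.split().str.len()  # length in words

# Aggregate to one row per customer
customer_features = reviews.groupby("reviewerID").agg(
    review_count=("overall", "size"),
    avg_expenditure=("price", "mean"),
    positive_ratio=("is_positive", "mean"),
    mean_review_len=("review_len", "mean"),
    max_review_len=("review_len", "max"),
)
print(customer_features)
```

Each row of `customer_features` then describes one customer and can be passed on to scaling and clustering.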

K-means Clustering:

kmeans = KMeans(n_clusters=6, random_state=47)
clusters = kmeans.fit_predict(dfs1[columns])
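
K-means is distance-based, so features on very different scales (for example review counts versus expenditure) are usually standardized first, and the number of clusters is often checked against a quality metric. The sketch below illustrates both steps on synthetic data. It is not how the value `n_clusters=6` was chosen in the notebook; the synthetic blobs and the use of the silhouette score are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(47)

# Synthetic stand-in for a per-customer feature table:
# three groups whose two features live on very different scales.
centers = np.array([[2.0, 20.0], [10.0, 200.0], [30.0, 50.0]])
X = np.vstack([c + rng.normal(scale=[0.5, 5.0], size=(50, 2)) for c in centers])

# Standardize so that no single feature dominates the Euclidean distance
X_scaled = StandardScaler().fit_transform(X)

# Compare candidate cluster counts by mean silhouette score (higher is better)
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=47).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On real review data the silhouette curve is rarely this clear-cut, so it is best read alongside the interpretability of the resulting segments.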

Fake Review Detection

Text Vectorization:

tfidf = TfidfVectorizer(ngram_range=(1, 2), max_df=0.9)
text_features = tfidf.fit_transform(dfl['summaryreview_lemma'].values)
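
To make the effect of these two parameters concrete, here is a small sketch on a made-up corpus (the four sentences are assumptions, not project data). `ngram_range=(1, 2)` adds bigrams to the vocabulary, and `max_df=0.9` drops any term that occurs in more than 90% of the documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up corpus; "product" appears in every document
docs = [
    "great product fast delivery",
    "great product poor packaging",
    "great product would buy again",
    "terrible product stopped working",
]

tfidf = TfidfVectorizer(ngram_range=(1, 2), max_df=0.9)
text_features = tfidf.fit_transform(docs)  # sparse matrix, one row per document

vocab = tfidf.vocabulary_  # maps each kept term to its column index
print(text_features.shape)
print("product" in vocab)        # unigram present in 100% of documents
print("great product" in vocab)  # bigram present in 75% of documents
```

The document-frequency filter is applied per term after n-grams are generated, so a bigram can survive even when one of its words is filtered out as a unigram.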

Anomaly Detection:

from sklearn.ensemble import IsolationForest
isolation_forest = IsolationForest(contamination=0.1)
outlier_labels = isolation_forest.fit_predict(outlier_detection_df)
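
The output of `fit_predict` is easier to interpret with a small experiment. The sketch below uses synthetic data (an assumption, standing in for `outlier_detection_df`) and fixes `random_state` so the result is repeatable. It shows that the labels are `-1` for outliers and `1` for inliers, and that `contamination=0.1` sets the score threshold so that roughly 10% of the training rows are flagged, whether or not that many are truly anomalous.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic data: 180 "normal" rows plus 20 rows far from the bulk
inliers = rng.normal(loc=0.0, scale=1.0, size=(180, 3))
outliers = rng.uniform(low=6.0, high=8.0, size=(20, 3))
X = np.vstack([inliers, outliers])

forest = IsolationForest(contamination=0.1, random_state=0)
labels = forest.fit_predict(X)        # -1 = outlier, 1 = inlier
scores = forest.decision_function(X)  # negative values correspond to label -1

n_flagged = int((labels == -1).sum())
print(n_flagged, "of", len(X), "rows flagged")
```

Because the flagged share is driven by `contamination`, the value is worth treating as a tunable assumption about the data rather than a result.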

Results and Analysis

Customer Segments:

Cluster 0: Moderate reviewers
Cluster 1: Negative reviewers (potentially fake)
Cluster 2: High-volume reviewers
Clusters 3-5: Various authentic patterns
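
Segment descriptions like these are typically obtained by profiling each cluster, that is, by comparing per-cluster averages of the input features. The sketch below shows the idea on a toy table. The numbers and column names are invented for illustration and are not the project's results.

```python
import pandas as pd

# Toy per-customer table with an assigned cluster label (invented values)
profile_input = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 2, 2],
    "review_count": [5, 7, 3, 2, 40, 60],
    "positive_ratio": [0.6, 0.7, 0.1, 0.0, 0.8, 0.9],
})

# One row per cluster: size and mean of each feature
cluster_profile = profile_input.groupby("cluster").agg(
    n_customers=("review_count", "size"),
    mean_review_count=("review_count", "mean"),
    mean_positive_ratio=("positive_ratio", "mean"),
)
print(cluster_profile)
```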

Fake Review Indicators:

Extreme sentiment scores
Unusual review lengths
Irregular voting patterns
Suspicious customer behavior
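
Two of these indicators can be turned into simple per-review flags. The following is a rough sketch under stated assumptions: `compound` is taken to be a sentiment score in [-1, 1] (such as VADER's compound score), the 0.95 cutoff is arbitrary, and unusual lengths are detected with a median-based modified z-score using the commonly cited 3.5 rule of thumb. None of these choices is confirmed by the notebook.

```python
import pandas as pd

# Toy data; column names and values are hypothetical
reviews = pd.DataFrame({
    "compound": [0.2, -0.1, 0.98, 0.4, -0.97, 0.3, 0.1, 0.5],
    "review_len": [40, 55, 3, 60, 45, 50, 400, 48],
})

def modified_z(series):
    """Median/MAD-based z-score; assumes the MAD is non-zero."""
    median = series.median()
    mad = (series - median).abs().median()
    return 0.6745 * (series - median) / mad

# Flag very strong sentiment in either direction (assumed cutoff)
reviews["extreme_sentiment"] = reviews["compound"].abs() >= 0.95
# Flag lengths far from the typical review (rule-of-thumb cutoff)
reviews["unusual_length"] = modified_z(reviews["review_len"]).abs() > 3.5
# Count how many indicators fire for each review
reviews["n_flags"] = reviews[["extreme_sentiment", "unusual_length"]].sum(axis=1)
print(reviews)
```

Flags like these are weak signals on their own; counting how many fire per review gives a simple way to rank candidates for closer inspection.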

Conclusions

Customer behavior patterns can effectively identify suspicious reviewing activity
Combined analysis of text features and numerical metrics improves fake review detection
Unsupervised learning techniques successfully segment customers and identify anomalies

Future Improvements

Include more features for analysis
Implement supervised learning with labeled data
Add real-time detection capabilities
Enhance visualization techniques

For detailed implementation and code examples, please refer to the Jupyter notebook.

Dataset

The dataset used in this project is not included in this repository.

👉 You can access the original dataset from the following sources:

[Ni, J., Li, J., & McAuley, J. (2019, November). Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 188-197).]

[Amazon Product Data by Julian McAuley at https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/]

If you're using a modified or custom-labeled version of the dataset, please contact the author for more information.

Notice

⚠️ This repository is part of an ongoing academic research project. The code is released under the MIT License for educational and non-commercial use. Please do not reuse this work in publications or derivative projects without proper citation or prior permission. If you're interested in collaborating, feel free to get in touch!

Contact

Zahra Hasannejad

📧 [email protected]

🌐 GitHub: Zahra Hasannejad
