## Evaluation and Explainability
To evaluate the model, we generate a classification report chart showing precision, recall, and F1-score per class, as well as a confusion matrix (true vs. predicted labels). Both are saved as `.png` images under the `/metrics` directory. Our model presented the following results:

The blue (precision), orange (recall) and green (F1-score) bars show the metrics for each class. The model is overall more precise when detecting positive reviews, but shows solid scores for both classes.

As the confusion matrix shows, the model's most frequent mistake is misclassifying negative reviews as positive. Aside from trying other models, this could also be a result of an imbalanced dataset, one with more positive reviews than negative ones.
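As a rough illustration (not the exact evaluation code in this repo), the two images could be produced with scikit-learn and matplotlib along these lines; the function name, file names, and the string labels `"negative"`/`"positive"` are assumptions:

```python
# Sketch only: the actual script may differ. Assumes string labels "negative"/"positive".
import os

import matplotlib
matplotlib.use("Agg")  # write images to disk without needing a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import ConfusionMatrixDisplay, classification_report


def save_evaluation_plots(y_test, y_pred, out_dir="metrics"):
    os.makedirs(out_dir, exist_ok=True)

    # Per-class precision / recall / F1 as a grouped bar chart
    report = classification_report(y_test, y_pred, output_dict=True)
    per_class = pd.DataFrame(report).loc[
        ["precision", "recall", "f1-score"], ["negative", "positive"]
    ].T
    per_class.plot(kind="bar", rot=0)
    plt.title("Precision / Recall / F1 per class")
    plt.savefig(os.path.join(out_dir, "classification_report.png"), bbox_inches="tight")
    plt.close()

    # Confusion matrix: true labels vs. predicted labels
    ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
    plt.savefig(os.path.join(out_dir, "confusion_matrix.png"), bbox_inches="tight")
    plt.close()
```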
For explainability, we want to know why a review was classified as positive or negative. We use two complementary tools:
### LIME (Local Interpretable Model-agnostic Explanations)

LIME works on individual predictions. For a given review, it identifies the top words that pushed the prediction towards the positive or the negative class.
This graph shows which words pushed the model most strongly towards a specific prediction. Green words weigh towards a positive review and red ones towards a negative review.
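To illustrate the LIME step, a per-review explanation can be generated roughly as below; the model path, variable names, and the sample review are assumptions, not the repository's actual ones:

```python
# Illustrative sketch; assumes a saved scikit-learn Pipeline (TF-IDF -> Logistic Regression)
# that exposes predict_proba on raw text. The path and the sample review are made up.
import joblib
from lime.lime_text import LimeTextExplainer

pipeline = joblib.load("models/sentiment_pipeline.joblib")  # hypothetical artifact path
explainer = LimeTextExplainer(class_names=["negative", "positive"])

review = "Produto chegou rápido e bem embalado, recomendo!"
explanation = explainer.explain_instance(
    review,
    pipeline.predict_proba,  # maps a list of raw texts to class probabilities
    num_features=10,         # top 10 words driving this prediction
)

print(explanation.as_list())  # [(word, weight), ...]; positive weights push towards "positive"
explanation.save_to_file("explainability/lime_example.html")
```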
### SHAP (SHapley Additive exPlanations)
SHAP provides a more general view: instead of explaining only one prediction, it highlights the most influential words across many reviews. The `explainability.py` script loads the trained Logistic Regression model and the TF–IDF pipeline, samples a set of reviews, and generates the LIME and SHAP visualizations in the `/explainability` folder.
This is a SHAP summary plot. The Y-axis shows the most important words/features, sorted by overall importance, while the X-axis shows their SHAP value (impact on the model’s output).
Red means the feature increased the chance of predicting “positive”, while blue means it pushed the prediction towards “negative”. The position along the X-axis shows how strong the impact was.
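A condensed sketch of how the summary plot could be produced is shown below; the artifact paths, dataset file, column name, and sample size are assumptions rather than the repository's exact code:

```python
# Rough sketch, not the repository's exact code. Paths, the column name and the
# sample size are assumptions.
import joblib
import matplotlib.pyplot as plt
import pandas as pd
import shap

vectorizer = joblib.load("models/tfidf_vectorizer.joblib")  # hypothetical artifacts
model = joblib.load("models/logreg_model.joblib")

reviews = pd.read_csv("data/reviews.csv")["review_text"].dropna()  # hypothetical dataset file
sample = reviews.sample(300, random_state=42)
X = vectorizer.transform(sample).toarray()  # densify so the plot can colour by feature value

# LinearExplainer suits a Logistic Regression over TF-IDF features
explainer = shap.LinearExplainer(model, X)
shap_values = explainer.shap_values(X)

# Beeswarm summary: features ranked by overall importance; x-position = SHAP value
shap.summary_plot(shap_values, X, feature_names=vectorizer.get_feature_names_out(), show=False)
plt.savefig("explainability/shap_summary.png", bbox_inches="tight")
plt.close()
```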
## Summary
In this project, we built a sentiment analysis model using Olist’s Brazilian E-Commerce Public Dataset. Our objective was to automatically classify customer reviews as positive or negative and to further interpret the model’s decisions using explainability tools.
The final model is a Logistic Regression classifier, trained on TF–IDF features with an 80/20 train/test split and tuned with GridSearchCV to maximize F1-score.
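A condensed sketch of that training setup follows; the hyperparameter grid and helper name are illustrative, and labels are assumed to be encoded as 0/1 rather than the project's actual values:

```python
# Illustrative sketch of the training setup described above; the parameter grid is an
# assumption, and labels are assumed to be encoded as 0 (negative) / 1 (positive).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline


def train_sentiment_model(texts, labels):
    # 80/20 split, stratified to keep the positive/negative ratio in both sets
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=42
    )

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    # Grid search selecting the hyperparameters that maximise F1-score
    param_grid = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__C": [0.1, 1.0, 10.0],
    }
    search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5, n_jobs=-1)
    search.fit(X_train, y_train)

    return search.best_estimator_, (X_test, y_test)
```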
### Results:
- **Accuracy**: ~88%
- **F1-score**: 0.82 (negative), 0.91 (positive)
- **Precision/Recall**: the model is more reliable at detecting positive reviews, but still achieves solid results for negative ones.
### Explainability:
**LIME** demonstrated how individual words influence single predictions. **SHAP** provided a global perspective, ranking the most influential words across the dataset.
Overall, the project achieved its dual goal: building a performant model for sentiment classification and making its predictions transparent and interpretable.