<!DOCTYPE html>
<html>
<head>
<title>EPAI Sentiment Analysis of Tweets Project</title>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://www.w3schools.com/w3css/4/w3.css">
<link rel="stylesheet" href="https://www.w3schools.com/lib/w3-theme-indigo.css">
<link rel="stylesheet" href="website_styles.css">
</head>
<body>
<!-- Header -->
<header class="w3-theme-d5 (w3-theme-dark) w3-container w3-center w3-padding-32">
<h1>Sentiment Analysis of Tweets</h1>
<h1 class="w3-xlarge">EPAI</h1>
<div class="w3-padding-32 link-bar">
<div class="w3-bar w3-border">
<a href="#intro" class="w3-bar-item w3-button w3-padding-16 w3-border w3-border-white w3-hover-pale-blue">Introduction</a>
<a href="#project" class="w3-bar-item w3-button w3-padding-16 w3-border w3-border-white w3-hover-pale-blue">Our Project</a>
<a href="#download" class="w3-bar-item w3-button w3-padding-16 w3-border w3-border-white w3-hover-pale-blue">Download</a>
<a href="#team" class="w3-bar-item w3-button w3-padding-16 w3-border w3-border-white w3-hover-pale-blue">Our Team</a>
</div>
</div>
</header>
<!-- Page content -->
<div class="w3-theme-d5 (w3-theme-dark)">
<div class="w3-content" style="max-width:1100px">
<!-- Introduction Section -->
<div class="w3-container" id="intro">
<h1 style="color:orange;">Introduction</h1><br>
</div>
<div class="w3-row w3-padding-32">
<div class="w3-half w3-container">
<p>Toxic posts, abuse, accusations, attacks, and hateful language have become a regular occurrence online. One way to contain such posts is content moderation, but hate speech is hard to contain once it reaches the internet, especially on popular social media platforms. In this project, we ask: can we design something that stops harmful posts before a user publishes them? To answer this question, we built an application for hate speech detection using NLP models.</p>
</div>
<div class="w3-half w3-container">
<div class="video-container">
<iframe src="https://www.youtube.com/embed/diIKAaBDHe0"></iframe>
</div>
</div>
</div>
<!-- Our Project Section -->
<div class="w3-container" id="project">
<h1 style="color:orange;">Our Project</h1><br>
</div>
<!-- Dataset Section-->
<div class="w3-half w3-container">
</div>
<div class="w3-half w3-container">
<h4 style="color:orange;">Dataset</h4>
</div>
<div class="w3-row w3-padding-32">
<div class="w3-half w3-container" id="histograms">
<img src="https://github.com/AngelinaZhai/epai-sentiment-of-tweets/blob/main/images/histograms_of_data.png?raw=true" class="w3-round w3-image w3-opacity-min" alt="Histograms of Data" style="width:100%">
</div>
<div class="w3-half w3-container">
<p>We used the <a href="https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech" title="Measuring Hate Speech Dataset" target="_blank" class="w3-text-light-blue w3-hover-text-orange">Measuring Hate Speech Dataset</a> from UC Berkeley, which contains 39,565 comments from YouTube, Reddit, and Twitter. After cleaning, each comment is between 4 and 600 characters long. A total of 7,912 annotators reviewed the comments and, for each one, indicated any identities described and rated 10 labels. Of these, we used eight: Respect, Insult, Humiliate, Status, Dehumanize, Violence, Genocide, and Attack-Defend. Respect describes whether the comment was respectful towards the group(s) mentioned. Insult describes whether the comment insulted the group(s) mentioned. Humiliate describes whether the comment humiliated the group(s) mentioned. Dehumanize describes whether the comment dehumanized the group(s) mentioned. Violence describes whether the comment called for violence against the group(s) mentioned. Genocide describes whether the comment called for the deliberate killing of the group(s) mentioned. Aside from Status and Attack-Defend, each label is an ordinal Likert-style variable with 5 response options: strongly disagree, disagree, neutral, agree, strongly agree. Status describes whether the comment portrays the group(s) mentioned as strongly inferior, inferior, neutral, superior, or strongly superior, and Attack-Defend describes whether the comment was strongly defending, defending, neutral, attacking, or strongly attacking the group(s) mentioned. For each label, all of a comment's ratings were combined into a single continuous score.</p>
</div>
</div>
<!-- Data Processing Section-->
<div class="w3-container">
<h4 style="color:orange;">Data Processing</h4>
<p>The dataset was processed using the pandas and NumPy libraries. All comments were cleaned so that punctuation marks are separated and treated as individual “words” for the convenience of sentiment analysis. Due to the nature of our task and the problematic language in the dataset, pre-built word embeddings such as GloVe and BERT could not be used, so a custom word embedding was needed. All words were therefore compiled into a dictionary to create word-to-index and index-to-word mappings for lookup, and the lookup arrays were stored in their respective pickle files. The data was then split using a 70:15:15 train:validation:test ratio, giving around 90k training samples and 20k validation samples.</p>
</div>
<!-- Model & Results Section-->
<div class="w3-half w3-container">
</div>
<div class="w3-half w3-container">
<h4 style="color:orange;">Model & Results</h4>
</div>
<div class="w3-row">
<div class="w3-half w3-container">
<img src="https://github.com/AngelinaZhai/epai-sentiment-of-tweets/blob/main/images/training_curve_graph.png?raw=true" class="w3-round w3-image w3-opacity-min" alt="Training Curve Graph" style="width:90%">
</div>
<div class="w3-half w3-container">
<p>Using the processed data, we experimented with three models: a basic RNN, an LSTM, and a GRU. We trained the RNN first, but it had trouble learning in later epochs, and its validation accuracy plateaued at around 30%. Although tweets are short by nature and limited by a character count, our dataset contained longer comments, so we decided to try models with long-term memory. Indeed, the LSTM was able to capture long-term dependencies and reached a test F1 score of around 50%. Further experiments and fine-tuning with the GRU gave the best results, with a test F1 score of 60%.</p><br>
</div>
</div>
<div class="w3-row w3-padding-32">
<div class="w3-half w3-container">
<img src="https://github.com/AngelinaZhai/epai-sentiment-of-tweets/blob/main/images/loss_curve_graph.png?raw=true" class="w3-round w3-image w3-opacity-min" alt="Loss Curve Graph" style="width:90%">
</div>
<div class="w3-half w3-container">
<img src="https://github.com/AngelinaZhai/epai-sentiment-of-tweets/blob/main/images/validation_curve_graph.png?raw=true" class="w3-round w3-image w3-opacity-min" alt="Validation Curve Graph" style="width:90%">
</div>
</div>
<!-- Limitations & Future Improvements Section-->
<div class="w3-container">
<h4 style="color:orange;">Limitations & Future Improvements</h4>
<p>Though the quantitative results are promising, qualitative testing exposed several limitations of the model. First, because we use custom word embeddings, the model ignores any word that did not appear in the dataset; if the user makes a typo or uses recently coined slang, the result will be less accurate, since the model cannot capture the sentiment of words it does not recognize. Second, the model produces more accurate scores when the input text is longer. This could be caused by the nature of the training data, and could potentially be improved with more training data containing short text. Lastly, due to the stochastic nature of the model, the sentiment scores fluctuate when the model analyzes the same body of text more than once. Based on qualitative observations, in categories where the distribution of scores is uneven, such as Violence and Genocide, the fluctuation can be as high as 15%, while categories with an even distribution, such as Humiliate and Dehumanize, produce smaller fluctuations (see histograms <a href="#histograms" class="w3-text-light-blue w3-hover-text-orange">here</a>). This behaviour could be improved by balancing the distribution in the data processing stage, though this could prove difficult because the categories are related to one another. For instance, Violence and Genocide are intrinsically linked, and Attack-Defend is usually the opposite of Insult, so it will be difficult to balance one category without visibly impacting the others.</p>
</div>
<!-- Github Section -->
<div class="w3-container">
<p>For the source code, visit our GitHub repository <a href="https://github.com/AngelinaZhai/epai-sentiment-of-tweets" title="GITHUB" target="_blank" class="w3-text-light-blue w3-hover-text-orange">here</a>.</p>
</div>
<!-- Download Section -->
<div class="w3-container w3-padding-32" id="download">
<h1 style="color:orange;">Download</h1><br>
<p>1. Download the macOS version <a href="https://www.mediafire.com/file/8czyfoy10s1non9/EPAI_Sentiment_Analyzer_%2528mac%2529.zip/file" title="Download macOS version" target="_blank" class="w3-text-light-blue w3-hover-text-orange">here</a> or Windows version <a href="https://www.mediafire.com/file/skhrdj8ovkxy8pq/EPAI_Sentiment_Analyzer_%2528windows%2529.zip/file" title="Download Windows version" target="_blank" class="w3-text-light-blue w3-hover-text-orange">here</a>.</p>
<p>2. Once the download finishes, unzip the folder and run main.exe (this may take up to one minute to load).</p>
<p>3. Enjoy using our application!</p>
<div class="w3-row">
<p>NOTE: If your computer warns that the application is from an unidentified developer, follow the steps below.</p>
<div class="w3-row">
<p>Mac:</p>
<p>Go to Settings -> Privacy & Security, and select “Open Anyway”</p>
<img src="https://github.com/AngelinaZhai/epai-sentiment-of-tweets/blob/main/images/mac_block.png?raw=true" class="w3-round w3-image w3-opacity-min" alt="mac instruction" style="width:60%">
</div>
<div class="w3-row">
<p>Windows:</p>
<div class="w3-half w3-container">
<p>1. Select “More Info”</p>
<img src="https://github.com/AngelinaZhai/epai-sentiment-of-tweets/blob/main/images/windows_more_info_white.png?raw=true" class="w3-round w3-image w3-opacity-min" alt="windows instruction 1" style="width:100%">
</div>
<div class="w3-half w3-container">
<p>2. Select “Run Anyway”</p>
<img src="https://github.com/AngelinaZhai/epai-sentiment-of-tweets/blob/main/images/windows_run_white.png?raw=true" class="w3-round w3-image w3-opacity-min" alt="windows instruction 2" style="width:100%">
</div>
</div>
</div>
</div>
<!-- Our Team Section -->
<div class="w3-container w3-padding-32" id="team">
<h1 style="color:orange;">Our Team</h1><br>
<div class="w3-col w3-center" style="width:25%">
<p class="member"><b>Angelina Zhai</b></p>
<p>Project Lead</p>
</div>
<div class="w3-col w3-center" style="width:25%">
<p class="member"><b>Dian Rong</b></p>
<p>Backend Developer</p>
</div>
<div class="w3-col w3-center" style="width:25%">
<p class="member"><b>Kathy Lin</b></p>
<p>Webmaster & Backend Developer</p>
</div>
<div class="w3-col w3-center" style="width:25%">
<p class="member"><b>Akriti Sharma</b></p>
<p>Project Researcher</p>
</div>
</div>
</div>
<!-- End page content -->
</div>
<!-- Footer -->
<footer class="w3-center w3-theme-d5 (w3-theme-dark) w3-padding-32">
<p>Powered by <a href="https://www.w3schools.com/w3css/default.asp" title="W3.CSS" target="_blank" class="w3-text-light-blue w3-hover-text-orange">w3.css</a></p>
</footer>
</body>
</html>