US President Donald Trump and President-Elect Joe Biden had the chance to challenge each other face to face twice during the Presidential Debates. In this project we analyze and visualize a dataset containing the transcripts of those debates.
- Python : 3.8.5
- Libraries : pandas, seaborn, wordcloud, bs4, re, nltk, sklearn
- Data : Kaggle, additional data scraped from Factba.se and Rev
An initial analysis of the data obtained from Kaggle reveals a number of data cleaning and pre-processing steps that are required.
- Inconsistent speaker names : The same speaker appears under several names, such as Chris Wallace being represented as 'Chris Wallace' and 'Chris Wallace : ', and President Trump being represented as 'President Donald J. Trump', 'President Trump' and 'Donald Trump'. Each speaker's variants are normalised to a single name, for example all of President Trump's variants become 'Donald Trump'.
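
The speaker-name normalisation can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column name `speaker`, the `CANONICAL` mapping, and the helper `normalise_speaker` are assumptions made for the example, and the sample rows only reuse the variants listed above.

```python
import re

import pandas as pd

# Hypothetical mapping from known variants to one canonical name per speaker.
CANONICAL = {
    "President Donald J. Trump": "Donald Trump",
    "President Trump": "Donald Trump",
    "Donald Trump": "Donald Trump",
    "Chris Wallace": "Chris Wallace",
}


def normalise_speaker(name: str) -> str:
    """Strip trailing colons/whitespace, then map known variants to a canonical name."""
    cleaned = re.sub(r"[\s:]+$", "", name.strip())
    # Names not in the mapping are kept as cleaned, so nothing is silently dropped.
    return CANONICAL.get(cleaned, cleaned)


# Toy frame standing in for the transcript data (the column name is assumed).
df = pd.DataFrame(
    {
        "speaker": [
            "Chris Wallace",
            "Chris Wallace : ",
            "President Donald J. Trump",
            "President Trump",
            "Donald Trump",
        ]
    }
)
df["speaker"] = df["speaker"].map(normalise_speaker)
```

After this step, `df["speaker"].unique()` is a quick way to check that no unexpected variants remain before moving on to further cleaning.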

















