This repository showcases a data cleaning project performed entirely in Microsoft Excel, using a dataset of U.S. Presidents. The work includes identifying and correcting inconsistencies, formatting data for readability, and preparing the dataset for future data analysis or visualization.
The dataset contains historical information about U.S. Presidents, including:
- President Name
- Political Party
- Vice President
- Salary
- Date Created
- Date Updated
- (Originally included more columns like "prior role" and row indexes)
All cleaning was done manually using Excel formulas, formatting tools, and filters. Below are the key changes made:
- Deleted the index column (`Unnamed: 0`) and unnecessary metadata.
- Removed the "prior" column due to inconsistent formatting and encoding errors.
- Fixed inconsistent casing (e.g., `john adams`, `JAMES MONROE`) by converting all names to title case.
- Trimmed extra spaces within names (e.g., `George  Clinton` → `George Clinton`).
- Standardized inconsistent party names, e.g., `Democratic- Republican` → `Democratic-Republican`.
- Verified numeric consistency and formatting for the salary column.
- Ensured all `date_created` and `date_updated` fields follow the standard ISO format: `YYYY-MM-DD`.
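The cleaning in this project was done by hand in Excel, but the same steps can be roughly reproduced in a script, for example to double-check the cleaned sheet. The following is a minimal Python sketch under stated assumptions, not the method used here: the function names are hypothetical, `str.title()` only approximates `PROPER()`, and the `%m/%d/%Y` input date format is an assumption, since the raw date format is not documented above.

```python
import re
from datetime import datetime


def clean_name(raw: str) -> str:
    """Approximate Excel's TRIM() followed by PROPER().

    Collapses runs of whitespace to single spaces, strips the ends,
    then title-cases each word.
    """
    collapsed = " ".join(raw.split())
    return collapsed.title()


def clean_party(raw: str) -> str:
    """Remove stray spaces around hyphens in party names."""
    collapsed = " ".join(raw.split())
    return re.sub(r"\s*-\s*", "-", collapsed)


def to_iso_date(raw: str, input_format: str = "%m/%d/%Y") -> str:
    """Convert a date string to YYYY-MM-DD.

    input_format is an assumed example; adjust it to match the raw data.
    """
    return datetime.strptime(raw.strip(), input_format).date().isoformat()


print(clean_name("john adams"))               # John Adams
print(clean_name("JAMES MONROE"))             # James Monroe
print(clean_name("George  Clinton"))          # George Clinton
print(clean_party("Democratic- Republican"))  # Democratic-Republican
print(to_iso_date("3/14/2021"))               # 2021-03-14
```

Like `PROPER()`, plain title-casing lowercases every letter after the first in each word, so a name such as `McKinley` would come out as `Mckinley` and still needs a manual check.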
- `President Information - Data Cleaning.xlsx`: The Excel file containing:
  - `US_Presidents Data`: original raw dataset
  - `US_Presidents Data Fixed`: cleaned and formatted version
- Microsoft Excel (no code required!)
- Find & Replace
- Text functions (e.g., `PROPER()`, `TRIM()`)
- Filter and Sort
- Manual inspection and correction
- Export cleaned dataset to CSV for public data analysis.
- Visualize trends in U.S. Presidential data (e.g., salaries, party changes).
- Augment dataset with additional fields like education, birthplace, or term years.
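
For the CSV export step, Excel's own "Save As" CSV option works without any code. If the export is ever scripted instead, a minimal Python sketch might look like the one below. The column headers are taken from the dataset description above (their exact spelling in the workbook is an assumption), `rows_to_csv` is a hypothetical helper, and the sample row contains placeholder values only, not real data.

```python
import csv
import io

# Headers follow the dataset description; the exact spelling in the
# workbook is an assumption.
FIELDNAMES = [
    "President Name",
    "Political Party",
    "Vice President",
    "Salary",
    "Date Created",
    "Date Updated",
]


def rows_to_csv(rows, fieldnames=FIELDNAMES) -> str:
    """Serialize a list of row dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder row for illustration only; not taken from the dataset.
sample_rows = [
    {
        "President Name": "Example President",
        "Political Party": "Example Party",
        "Vice President": "Example Vice President",
        "Salary": "0",
        "Date Created": "2000-01-01",
        "Date Updated": "2000-01-01",
    }
]

csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

To save the result to disk, write `csv_text` to a file opened with `newline=""` and `encoding="utf-8"`.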