This is the first task of my internship. The objective is to clean and preprocess the provided dataset to make it suitable for further analysis and machine learning tasks.
Data cleaning is important because raw data often contains missing values, duplicates, incorrect data types, and outliers. By cleaning the data, we ensure better quality and reliability of results.
- `Task1_DataCleaning.ipynb` → Jupyter Notebook containing all data cleaning steps.
- `dataset.csv` → The raw dataset provided for this task.
- `README.md` → Project documentation (this file).
- Importing Libraries – Pandas, NumPy, Matplotlib, Seaborn.
- Loading Data – Read the dataset with `pandas.read_csv()`.
- Exploring Data – Used `.head()`, `.info()`, `.describe()` to understand the structure.
- Handling Missing Values – Checked with `.isnull().sum()`, then applied mean/median/mode imputation or dropped rows.
- Removing Duplicates – Identified duplicates with `.duplicated()` and removed them.
- Data Type Conversion – Converted incorrect data types to correct ones (e.g., object → datetime/int).
- Outlier Treatment – Used boxplots and IQR method to detect and handle outliers.
- Renaming & Standardizing Columns – Fixed inconsistent column names.
- Final Output – Saved the cleaned dataset for further use.
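The steps above can be sketched as a small pandas pipeline. This is a minimal illustration, not the notebook's exact code; the demo columns (`Product Name`, `Price`) are hypothetical stand-ins for the real dataset:

```python
import numpy as np
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps described above to a DataFrame."""
    df = df.copy()

    # Handling missing values: median for numeric columns, mode otherwise
    for col in df.columns:
        if df[col].isnull().any():
            if pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].fillna(df[col].median())
            else:
                df[col] = df[col].fillna(df[col].mode()[0])

    # Removing duplicates
    df = df.drop_duplicates().reset_index(drop=True)

    # Outlier treatment: clip numeric values to the IQR fences
    # [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    for col in df.select_dtypes(include=np.number).columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # Renaming & standardizing columns: lowercase with underscores
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    return df

# Tiny demo frame standing in for dataset.csv (hypothetical data)
raw = pd.DataFrame({
    "Product Name": ["A", "B", "B", None],
    "Price": [10.0, None, 12.0, 1000.0],
})
cleaned = clean(raw)
# In the notebook the result would then be saved, e.g.:
# cleaned.to_csv("cleaned_dataset.csv", index=False)
```

After imputation the second and third rows become identical, so `drop_duplicates()` removes one of them; clipping rather than dropping outliers keeps the row count stable.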
- The dataset was cleaned successfully.
- Missing values and duplicates were handled.
- Outliers were treated.
- The final dataset is now ready for the upcoming tasks.
- Clone the repository:
```bash
git clone <your-repository-link>
cd DataCleaningTask
```
- Open Jupyter Notebook:
```bash
jupyter notebook
```
- Run the notebook `Task1_DataCleaning.ipynb` step by step.
✅ Conclusion
This task provided practical experience in data preprocessing. The cleaned dataset will be used for analysis and modeling in the upcoming internship tasks.