
🧹 Data Cleaning Task - Internship

📌 Overview

This is the first task of my internship. The objective is to clean and preprocess the provided dataset (the Titanic dataset) so that it is suitable for further analysis and machine learning tasks.

Data cleaning is important because raw data often contains missing values, duplicates, incorrect data types, and outliers. By cleaning the data, we ensure better quality and reliability of results.


📂 Files in this Repository

  • Task1_DataCleaning.ipynb → Jupyter Notebook containing all data cleaning steps.
  • dataset.csv → The raw dataset provided for this task.
  • README.md → Project documentation (this file).

🛠️ Steps Performed

  1. Importing Libraries – Pandas, NumPy, Matplotlib, Seaborn.
  2. Loading Data – Read the dataset with pandas.read_csv().
  3. Exploring Data – Used .head(), .info(), .describe() to understand structure.
  4. Handling Missing Values – Checked with .isnull().sum(), applied mean/median/mode imputation or dropped rows.
  5. Removing Duplicates – Identified duplicate rows with .duplicated() and removed them with .drop_duplicates().
  6. Data Type Conversion – Converted incorrect data types to correct ones (e.g., object → datetime/int).
  7. Outlier Treatment – Used boxplots and IQR method to detect and handle outliers.
  8. Renaming & Standardizing Columns – Fixed inconsistent column names.
  9. Final Output – Saved the cleaned dataset for further use.
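The pipeline above can be sketched in pandas as follows. This is a minimal, self-contained illustration: the tiny in-memory DataFrame and its column names are hypothetical stand-ins for dataset.csv, not the actual data used in the notebook.

```python
import numpy as np
import pandas as pd

# Toy data standing in for dataset.csv (illustrative only):
# a messy column name, a missing value, a duplicate row,
# an invalid date, and an extreme outlier.
raw = pd.DataFrame({
    "Name ": ["A", "B", "B", "C", "D"],
    "Age": [22, np.nan, np.nan, 35, 200],
    "JoinDate": ["2021-01-01", "2021-02-01", "2021-02-01",
                 "not-a-date", "2021-03-01"],
})

df = raw.copy()

# Step 8: standardize column names (strip whitespace, lower-case).
df.columns = df.columns.str.strip().str.lower()

# Step 5: remove duplicate rows.
df = df.drop_duplicates()

# Step 4: impute missing values (median imputation for a numeric column).
df["age"] = df["age"].fillna(df["age"].median())

# Step 6: convert types; invalid dates become NaT instead of raising.
df["joindate"] = pd.to_datetime(df["joindate"], errors="coerce")

# Step 7: IQR outlier treatment — clip values outside
# [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df["age"] = df["age"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Step 9: save the cleaned dataset (commented out in this sketch).
# df.to_csv("cleaned_dataset.csv", index=False)
```

Clipping (winsorizing) is only one way to treat outliers; the notebook may instead drop the offending rows, which the same IQR bounds also support.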

📊 Results

  • The dataset was cleaned successfully.
  • Missing values and duplicates were handled.
  • Outliers were treated.
  • Final dataset is now ready for upcoming tasks.

🚀 How to Run

  1. Clone the repository:
    git clone <your-repository-link>
    cd Data_Cleaning_Task
  2. Launch Jupyter Notebook:
    jupyter notebook
  3. Open Task1_DataCleaning.ipynb and run the cells in order.

✅ Conclusion

This task provided practical experience in data preprocessing. The cleaned dataset will be used for analysis and modeling in the upcoming internship tasks.

About

Data cleaning & preprocessing task for the Titanic dataset.
