🧹 Data Cleaning Task - Internship

📌 Overview

This is the first task of my internship. The objective is to clean and preprocess the provided dataset to make it suitable for further analysis and machine learning tasks.

Data cleaning is important because raw data often contains missing values, duplicates, incorrect data types, and outliers. By cleaning the data, we ensure better quality and reliability of results.


📂 Files in this Repository

  • Task1_DataCleaning.ipynb → Jupyter Notebook containing all data cleaning steps.
  • dataset.csv → The raw dataset provided for this task.
  • README.md → Project documentation (this file).

🛠️ Steps Performed

  1. Importing Libraries – Pandas, NumPy, Matplotlib, Seaborn.
  2. Loading Data – Read the dataset with pandas.read_csv().
  3. Exploring Data – Used .head(), .info(), .describe() to understand structure.
  4. Handling Missing Values – Checked with .isnull().sum(), applied mean/median/mode imputation or dropped rows.
  5. Removing Duplicates – Identified using .duplicated(), then removed duplicates.
  6. Data Type Conversion – Converted incorrect data types to correct ones (e.g., object → datetime/int).
  7. Outlier Treatment – Used boxplots and IQR method to detect and handle outliers.
  8. Renaming & Standardizing Columns – Fixed inconsistent column names.
  9. Final Output – Saved the cleaned dataset for further use.
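The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy DataFrame stands in for dataset.csv (whose real columns are not listed here), the column names and helper functions are hypothetical, and the choices of median/mode imputation and clipping at 1.5 × IQR are assumptions picked from the options listed in steps 4 and 7.

```python
import numpy as np
import pandas as pd


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Trim, lower-case, and snake_case the column names (step 8)."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    )
    return out


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric gaps with the median and other gaps with the mode (step 4)."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isnull().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out


def clip_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds (step 7)."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    out = df.copy()
    out[column] = out[column].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Hypothetical toy data; in the notebook this would come from
# pd.read_csv("dataset.csv") instead.
raw = pd.DataFrame(
    {
        " Order Date ": [
            "2024-01-05", "2024-01-06", "2024-01-06",
            "2024-01-07", "2024-01-08", "2024-01-09",
        ],
        "Amount": [10.0, 12.0, 12.0, np.nan, 11.0, 500.0],
        "City": ["Pune", "Delhi", "Delhi", None, "Pune", "Pune"],
    }
)

cleaned = standardize_columns(raw)
cleaned = cleaned.drop_duplicates().reset_index(drop=True)      # step 5
cleaned = impute_missing(cleaned)                               # step 4
cleaned["order_date"] = pd.to_datetime(cleaned["order_date"])   # step 6
cleaned = clip_outliers_iqr(cleaned, "amount")                  # step 7

print(cleaned)
```

To save the result (step 9), something like `cleaned.to_csv("cleaned_dataset.csv", index=False)` would work; the output file name here is only an example. Clipping keeps every row and caps extreme values, whereas dropping outlier rows is the other common option and may suit some datasets better.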

📊 Results

  • The dataset was cleaned successfully.
  • Missing values and duplicates were handled.
  • Outliers were treated.
  • Final dataset is now ready for upcoming tasks.
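One way to confirm these results is a quick check on the cleaned DataFrame. The sketch below is an assumption about how such a check could look, not code taken from the notebook; `cleaning_report` and the sample data are hypothetical.

```python
import pandas as pd


def cleaning_report(df: pd.DataFrame) -> dict:
    """Summarize row count, remaining missing values, and remaining duplicate rows."""
    return {
        "rows": len(df),
        "missing_values": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Hypothetical stand-in for the cleaned dataset.
sample = pd.DataFrame(
    {"amount": [10.0, 12.0, 11.0], "city": ["Pune", "Delhi", "Pune"]}
)

report = cleaning_report(sample)
print(report)
```

A cleaned dataset should report zero missing values and zero duplicate rows.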

🚀 How to Run

  1. Clone the repository:
    git clone <your-repository-link>
    cd DataCleaningTask
  2. Open Jupyter Notebook:
    jupyter notebook
  3. Run the notebook Task1_DataCleaning.ipynb step by step.

✅ Conclusion

This task provided practical experience in data preprocessing. The cleaned dataset will be used for analysis and modeling in the upcoming internship tasks.