Water Quality Analysis & Employee Data Tasks

A set of scripts for data processing, ML classification, and database integration for the club project.

1. Prerequisites

Python 3.10+
MySQL Server
Virtualenv (recommended)

2. Installation

Clone the repo and install the requirements via pip.

We set up a virtual environment and install the dependencies into it. Global installs often run into permission problems on users' machines, and different projects often need different versions of the same dependency, which is why a virtual environment is used here. That said, the scripts should run fine on any system that has pandas installed.

python -m venv venv
source venv/bin/activate  # venv\Scripts\activate on Windows
pip install -r requirements.txt

TASK - 1:

Implements a Gaussian Naive Bayes model to predict water potability based on chemical features. The pipeline includes median-based null handling and exports results directly to a SQLite database.
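The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it assumes scikit-learn's GaussianNB is the model implementation, uses a small synthetic dataset in place of the real water quality file, and the column names (ph, hardness, potability), the table name (predictions), and the in-memory database are all hypothetical.

```python
import sqlite3

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the water quality data (assumed column names).
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "ph": rng.normal(7.0, 1.0, n_rows),
    "hardness": rng.normal(200.0, 30.0, n_rows),
    "potability": rng.integers(0, 2, n_rows),
})
df.loc[::10, "ph"] = np.nan  # simulate missing readings

features = ["ph", "hardness"]

# Median-based null handling: fill each feature's NaNs with that column's median.
df[features] = df[features].fillna(df[features].median())

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["potability"], test_size=0.25, random_state=0
)

# Fit Gaussian Naive Bayes and predict potability for the held-out rows.
model = GaussianNB().fit(X_train, y_train)
results = X_test.copy()
results["actual"] = y_test
results["predicted"] = model.predict(X_test)

# Export the results to SQLite (in-memory here; a file path would persist them).
conn = sqlite3.connect(":memory:")
results.to_sql("predictions", conn, index=False, if_exists="replace")
stored_rows = pd.read_sql("SELECT COUNT(*) AS n FROM predictions", conn)["n"].iloc[0]
conn.close()
```

For simplicity the medians here are computed over the whole dataset before splitting; computing them on the training split only would avoid leaking information from the test rows.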

TASK - 2:

As per the task, I wrote a commented program that replaces all NaNs in the salary column with the department-wise median.

Why is Grouped Imputation better than Global Imputation? In a company, people are paid differently in each department. If we replace all null values with the median of the entire salary column, we will underestimate the salary of a Project Manager and overestimate the salary of a janitor. That is why I took the median of the salary department-wise: the null values are replaced with values closer to what each employee is likely to earn.
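A small sketch of the difference, using made-up data. The column names (department, salary) and the figures are assumptions for illustration only and do not come from the task's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical employee data with one missing salary per department.
df = pd.DataFrame({
    "department": ["Management", "Management", "Management",
                   "Facilities", "Facilities", "Facilities"],
    "salary": [120000.0, np.nan, 100000.0, 30000.0, 34000.0, np.nan],
})

# Global imputation: every NaN gets the median of the whole column.
global_filled = df["salary"].fillna(df["salary"].median())

# Grouped imputation: every NaN gets the median of its own department.
# transform("median") returns a Series aligned with df's index.
dept_median = df.groupby("department")["salary"].transform("median")
grouped_filled = df["salary"].fillna(dept_median)

print(pd.DataFrame({
    "department": df["department"],
    "global": global_filled,
    "grouped": grouped_filled,
}))
```

With these numbers the global median is 67000, which is too low for the Management row and too high for the Facilities row, while the grouped fill gives 110000 and 32000 respectively.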

TASK - 3:

I first converted the timestamp from string to datetime so that I could sort by it, as mentioned in the task. I then found the duplicates in transaction id and stored them in a Series, so that the length of the Series tells how many duplicates there were. This wasn't in the task, but it seems like valuable info. Finally, I sorted the DataFrame by timestamp and removed all duplicates except the last.
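The steps above might look something like this in pandas. The data and the column names (transaction_id, timestamp, amount) are hypothetical, and the sort is assumed to be ascending, so keeping the last duplicate keeps the row with the latest timestamp.

```python
import pandas as pd

# Hypothetical transactions; T1 and T2 each appear twice.
df = pd.DataFrame({
    "transaction_id": ["T1", "T2", "T1", "T3", "T2"],
    "timestamp": ["2024-01-03 10:00", "2024-01-01 09:00", "2024-01-05 12:00",
                  "2024-01-02 08:00", "2024-01-01 07:00"],
    "amount": [10, 20, 15, 30, 25],
})

# Convert the timestamp strings to datetime so sorting is chronological.
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Collect repeated transaction IDs in a Series; its length is the duplicate count.
duplicate_ids = df.loc[df["transaction_id"].duplicated(), "transaction_id"]
n_duplicates = len(duplicate_ids)

# Sort by timestamp, then keep only the last row for each transaction ID.
deduped = (
    df.sort_values("timestamp")
      .drop_duplicates(subset="transaction_id", keep="last")
      .reset_index(drop=True)
)

print(f"Duplicates found: {n_duplicates}")
print(deduped)
```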

YashBhatt2/BHCG-The-Nudge-DA-Tasks