This repository is part of a larger project: Adversarial AI in Wealth Management, a Capstone Project for IE University.
- Pandas: used for dataset manipulation (Polars could have been used instead). `pd.get_dummies` gives a quick one-hot-encoding-like result; ordinal variables are treated differently.
- `ydata_profiling.ProfileReport`: used to automatically generate the EDA report.
- `sklearn.preprocessing.OrdinalEncoder`: used to handle ordinal variables.
- `sklearn.impute.KNNImputer`: used to fill in missing values. Not recommended for larger datasets, but for a dataset of ~15,000 records it works fine.
- Attached to the repository as index.html. Check it out here.
| ydata_profiling alert | Alert Type |
|---|---|
| Income has 2250 (15.0%) missing values | Missing |
| Credit Score has 2250 (15.0%) missing values | Missing |
| Loan Amount has 2250 (15.0%) missing values | Missing |
| Assets Value has 2250 (15.0%) missing values | Missing |
| Number of Dependents has 2250 (15.0%) missing values | Missing |
| Previous Defaults has 2250 (15.0%) missing values | Missing |
| Debt-to-Income Ratio has unique values | Unique |
| Years at Current Job has 727 (4.8%) zeros | Zeros |
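These alerts can be reproduced directly in pandas. A minimal sketch on a small synthetic frame (the values below are hypothetical stand-ins for the loan dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loan dataset
data = pd.DataFrame({
    "Income": [50000, np.nan, 72000, np.nan],
    "Years at Current Job": [0, 5, 0, 12],
})

# Count and percentage of missing values per column (the "Missing" alerts)
missing = data.isna().sum()
missing_pct = data.isna().mean() * 100

# Count of zeros in a column (the "Zeros" alert on Years at Current Job)
zeros = (data["Years at Current Job"] == 0).sum()
```

Here `missing["Income"]` is 2 (50.0%) and `zeros` is 2 for the toy frame; on the full dataset the same calls reproduce the ydata_profiling counts above.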
- Risk Rating (Low, Medium, High)
- Education Level (High School, Bachelor's, Master's, PhD)
- Payment History (Poor, Fair, Good, Excellent)
- Number of Dependents
- Previous Defaults
- Gender
- Marital Status
- Loan Purpose
- Unemployment Status
- Age
- Income
- Credit Score
- Loan Amount
- Years at Current Job
- Debt-to-Income Ratio
- Assets Value
- Number of Dependents
- Previous Defaults
```python
from ydata_profiling import ProfileReport

report = ProfileReport(data)
report.to_file("report.html")  # Save report as an HTML file
```
At present, I cannot handle location data. I attempted to concatenate the City and Country columns, but was unsuccessful, so the location columns are dropped.
```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.impute import KNNImputer

# Drop location columns (not handled yet)
data = data.drop(['City', 'State', 'Country'], axis=1)

# Explicit category order for each ordinal variable
ordinal_mapping = {
    'Education Level': ['High School', "Bachelor's", "Master's", "PhD"],
    'Risk Rating': ["Low", "Medium", "High"],
    'Payment History': ["Poor", "Fair", "Good", "Excellent"]
}
ordinal_cols = list(ordinal_mapping.keys())
encoder = OrdinalEncoder(categories=list(ordinal_mapping.values()))
data[ordinal_cols] = encoder.fit_transform(data[ordinal_cols])

# One-hot encode the remaining (nominal) categorical columns
data = pd.get_dummies(data)

# Impute missing values from the 5 nearest neighbours
imputer = KNNImputer(n_neighbors=5)
data = pd.DataFrame(imputer.fit_transform(data), columns=data.columns)
```
The final dataset is exported in `.parquet` format.