Sampling Techniques and Model Performance Evaluation

Overview

This project evaluates how five sampling techniques affect the accuracy of five machine learning models trained on a credit card dataset. The objective is to measure the impact of each sampling method on model accuracy and to identify the best-performing combination of model and sampling technique.

Dataset

The dataset used for this project is [Credit Card Data]. It is highly imbalanced, so sampling techniques are used to convert it into a class-balanced dataset.
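The README does not say exactly how sampling.py balances the classes. As one possibility, the sketch below balances them by random oversampling with pandas, drawing extra minority-class rows with replacement until every class matches the majority count. The label column name `Class` and the helper name `balance_by_oversampling` are hypothetical.

```python
import pandas as pd


def balance_by_oversampling(df: pd.DataFrame, target: str = "Class", seed: int = 42) -> pd.DataFrame:
    """Randomly oversample every minority class up to the majority-class count.

    Assumption: `target` names the label column; "Class" is a hypothetical default.
    """
    counts = df[target].value_counts()
    majority = counts.max()
    parts = []
    for label, n in counts.items():
        group = df[df[target] == label]
        if n < majority:
            # Draw extra rows with replacement so this class reaches the majority size.
            extra = group.sample(n=majority - n, replace=True, random_state=seed)
            group = pd.concat([group, extra])
        parts.append(group)
    # Shuffle so the classes are interleaved rather than stacked one after another.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


if __name__ == "__main__":
    # Toy frame: 8 rows of class 0, 2 rows of class 1.
    toy = pd.DataFrame({"Amount": range(10), "Class": [0] * 8 + [1] * 2})
    print(balance_by_oversampling(toy)["Class"].value_counts().to_dict())
```

Since the requirements include imbalanced-learn, the script may instead use one of its resamplers, such as `RandomOverSampler` or `SMOTE`; the README does not specify which.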

Sampling Techniques

The following sampling techniques were used:

Simple Random Sampling: Selects a subset of individuals randomly from the larger dataset, each individual having the same chance of selection.
Stratified Sampling: Divides the population into homogeneous subgroups (strata) and samples from each stratum.
Cluster Sampling: Divides the population into clusters and randomly selects entire clusters.
Bootstrap Sampling: Selects a subset of individuals randomly with replacement from a larger dataset, allowing the same individual to be selected multiple times.
Systematic Sampling: Selects every k-th record at a fixed interval, starting from a random position.
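The five techniques can be sketched with pandas and NumPy as follows. This is an illustration, not the code in sampling.py: the sample sizes, the label column `Class`, and the way rows are grouped into clusters (contiguous blocks of rows) are all assumptions, and every function name is hypothetical.

```python
import numpy as np
import pandas as pd


def simple_random_sample(df, n, seed=42):
    # Every row has the same chance of selection; no row is picked twice.
    return df.sample(n=n, replace=False, random_state=seed)


def stratified_sample(df, n, strata="Class", seed=42):
    # Draw the same fraction from each stratum so the class proportions are preserved.
    # Per-group rounding means the total can differ slightly from n.
    frac = n / len(df)
    return df.groupby(strata).sample(frac=frac, random_state=seed)


def cluster_sample(df, n_clusters, n_chosen, seed=42):
    # Assumption: clusters are contiguous blocks of rows of roughly equal size.
    cluster_ids = np.arange(len(df)) * n_clusters // len(df)
    rng = np.random.default_rng(seed)
    chosen = rng.choice(n_clusters, size=n_chosen, replace=False)
    # Keep every row belonging to the randomly chosen clusters.
    return df[np.isin(cluster_ids, chosen)]


def bootstrap_sample(df, n, seed=42):
    # Sampling with replacement: the same row may appear several times.
    return df.sample(n=n, replace=True, random_state=seed)


def systematic_sample(df, n, seed=42):
    # Take every k-th row, starting from a random offset in [0, k).
    k = max(len(df) // n, 1)
    start = int(np.random.default_rng(seed).integers(0, k))
    return df.iloc[start::k].head(n)
```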

Machine Learning Models

The following machine learning models were evaluated:

Random Forest
Logistic Regression
Support Vector Machine (SVM)
K-Nearest Neighbors (KNN)
Decision Tree
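One way to set up these five models in scikit-learn is sketched below. The hyperparameters are library defaults apart from the fixed `random_state` and `max_iter=1000`, which are assumptions; wrapping the scale-sensitive models (SVM, KNN, and here also Logistic Regression) in a `StandardScaler` pipeline follows the workflow's note on normalizing data for SVM and KNN. The helper name `build_models` is hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=42):
    """Return unfitted estimators keyed by display name (hypothetical helper)."""
    return {
        # Tree-based models are insensitive to feature scale, so no scaler is added.
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        # Distance- and margin-based models are scaled to zero mean, unit variance.
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    }
```

Putting the scaler inside a pipeline means it is fitted only on the training split, so no statistics from the test set leak into training.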

Requirements

This project is implemented in Python, using libraries such as Pandas, NumPy, scikit-learn, and imbalanced-learn. Ensure you have these installed:

pip install pandas numpy scikit-learn imbalanced-learn

Setup and Execution

Clone the repository or download the project to your local machine.
Place your dataset in the root directory or modify the dataset path in the script.
Run the script using Python:

python sampling.py
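Loading the dataset and applying the basic preprocessing steps from the workflow (handling missing values and encoding categorical features) could look like the sketch below. Dropping incomplete rows and one-hot encoding with `pd.get_dummies` are assumed choices, and `load_and_clean` is a hypothetical helper; pass it whatever dataset path you configured.

```python
import pandas as pd


def load_and_clean(path):
    """Load a CSV dataset and apply basic cleaning (hypothetical helper)."""
    df = pd.read_csv(path)
    # One simple way to handle missing values: drop incomplete rows.
    df = df.dropna()
    # One-hot encode any non-numeric feature columns.
    categorical = list(df.select_dtypes(exclude="number").columns)
    return pd.get_dummies(df, columns=categorical)
```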

Project Workflow

Data Preprocessing:
- Handle missing values, if any.
- Normalize or standardize the data for models like SVM and KNN.
- Encode categorical features as required.
Balancing the Dataset:
- Apply sampling techniques to create balanced datasets.
Model Training and Evaluation:
- Split the dataset into training and testing sets.
- Train models using each sampling technique.
- Evaluate the models using accuracy as the performance metric.
Result Analysis:
- Compare the accuracy of models across different sampling techniques.
- Identify the best combination of model and sampling method.
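The training and evaluation steps can be sketched as a loop over every sampled dataset and every model, recording test accuracy and pivoting the result into a table with models as rows and sampling techniques as columns. The input format (two dictionaries), the 80/20 stratified split, and the label column `Class` are assumptions, and `evaluate_grid` is a hypothetical name.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def evaluate_grid(samples, models, target="Class", test_size=0.2, seed=42):
    """Train every model on every sampled dataset and return an accuracy pivot table.

    samples: mapping of sampling-technique name -> DataFrame (hypothetical input format)
    models:  mapping of model name -> unfitted scikit-learn estimator
    """
    records = []
    for technique, data in samples.items():
        X = data.drop(columns=[target])
        y = data[target]
        # Stratify so both classes appear in the training and test splits.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed, stratify=y
        )
        for name, estimator in models.items():
            model = clone(estimator)  # fresh, unfitted copy for each combination
            model.fit(X_train, y_train)
            acc = accuracy_score(y_test, model.predict(X_test))
            records.append({"Model": name, "Sampling": technique, "Accuracy": acc})
    # Rows = models, columns = sampling techniques, cells = test accuracy.
    return pd.DataFrame(records).pivot(index="Model", columns="Sampling", values="Accuracy")
```

With bootstrap or oversampled data, duplicated rows can land in both the training and test splits, which tends to inflate the measured accuracy.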

Results

Results are saved as pivot tables in the Results folder: one with all results and one with the best result. In all_results, each row represents a machine learning model and each column a sampling technique; best_results shows the best combination of model and sampling method.
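Given an all-results pivot table of that shape, the best combination can be read off by locating its maximum cell. The sketch below does this and writes both tables to the Results folder; the CSV format and the file names `all_results.csv` and `best_results.csv` are assumptions, as are the helper names.

```python
from pathlib import Path

import pandas as pd


def best_combination(all_results: pd.DataFrame) -> pd.DataFrame:
    """Return a one-row table with the highest-accuracy model/sampling pair.

    all_results: pivot table with models as rows and sampling techniques as columns.
    """
    technique = all_results.max().idxmax()   # column whose best score is highest
    model = all_results[technique].idxmax()  # row holding that score
    accuracy = all_results.loc[model, technique]
    return pd.DataFrame([{"Model": model, "Sampling": technique, "Accuracy": accuracy}])


def save_results(all_results: pd.DataFrame, out_dir="Results"):
    # Hypothetical file names; the README names the tables but not their format.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    all_results.to_csv(out / "all_results.csv")
    best_combination(all_results).to_csv(out / "best_results.csv", index=False)
```

If several cells tie for the maximum, `idxmax` returns the first one it encounters.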

Repository: PrishaSingh11/Sampling_102217109