Data to Decision

This repository contains the code and data for the research project "From Data to Decision: Investigating Bias Amplification in Decision-Making Algorithms". The project examines how biases present in the Adult/Census Income dataset influence the outputs of machine learning models, exploring whether these biases are amplified by Logistic Regression, Decision Tree, and Random Forest algorithms using various fairness metrics.

Title: From Data to Decision: Investigating Bias Amplification in Decision-Making Algorithms
Date: June 23, 2024
Institution: Delft University of Technology
Link: Download the thesis (PDF)

Repository Structure

data-to-decision/
├── adult.csv
├── preprocess/
│   ├── preprocess_remove.py
│   ├── preprocess_impute_simple.py
│   ├── adult_preprocessed_train.csv
│   └── adult_preprocessed_test.csv
├── algorithms/
│   ├── algs.py
│   └── adult_trained_test_results.csv
├── bias/
│   ├── metrics_race.py
│   ├── metrics_sex.py
│   ├── metrics_plots.py
│   ├── race_plots.py
│   ├── sex_plots.py
│   └── dummy/
│       ├── data.csv
│       ├── pred.csv
│       └── metrics_dummy.py
└── README.md

preprocess/: Scripts and preprocessed data for cleaning, encoding, and scaling the raw dataset. Note that preprocess_impute_simple.py is kept for reference only and should not be used in the analysis pipeline.
algorithms/: Contains algs.py, which trains Logistic Regression, Decision Tree, and Random Forest models on the preprocessed data and writes test-set predictions and performance metrics to adult_trained_test_results.csv.
bias/: Scripts to compute fairness metrics (Demographic Parity, Disparate Impact, Equal Opportunity, Equalized Odds) and generate the corresponding plots.
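
For readers unfamiliar with the four fairness metrics named above, the sketch below shows one common way to define them for a binary classifier and a binary sensitive attribute. It is a dependency-free illustration, not the code in bias/: the function names (`group_rates`, `fairness_report`), the group labels, and the toy data are all assumptions made for this example, and taking Equalized Odds as the larger of the TPR and FPR gaps is only one convention.

```python
def group_rates(y_true, y_pred, groups, group):
    """Selection rate, TPR and FPR for one group (labels and predictions are 0/1).

    Assumes the group contains at least one positive and one negative example.
    """
    rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    preds_for_positives = [p for t, p in rows if t == 1]
    preds_for_negatives = [p for t, p in rows if t == 0]
    return {
        "selection_rate": sum(p for _, p in rows) / len(rows),
        "tpr": sum(preds_for_positives) / len(preds_for_positives),
        "fpr": sum(preds_for_negatives) / len(preds_for_negatives),
    }


def fairness_report(y_true, y_pred, groups, privileged, unprivileged):
    """Compare the two groups on the four metrics (hypothetical helper)."""
    priv = group_rates(y_true, y_pred, groups, privileged)
    unpriv = group_rates(y_true, y_pred, groups, unprivileged)
    tpr_gap = abs(priv["tpr"] - unpriv["tpr"])
    fpr_gap = abs(priv["fpr"] - unpriv["fpr"])
    return {
        # Gap in the share of positive predictions between the groups.
        "demographic_parity_difference": abs(
            priv["selection_rate"] - unpriv["selection_rate"]
        ),
        # Ratio of selection rates; 1.0 means both groups are selected equally often.
        "disparate_impact": unpriv["selection_rate"] / priv["selection_rate"],
        # Gap in true positive rates.
        "equal_opportunity_difference": tpr_gap,
        # Larger of the TPR gap and the FPR gap (one common convention).
        "equalized_odds_difference": max(tpr_gap, fpr_gap),
    }


# Toy data, made up for illustration only.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

report = fairness_report(y_true, y_pred, groups, privileged="A", unprivileged="B")
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

A value of 0 for the difference metrics, or 1 for Disparate Impact, would indicate that the two groups are treated identically on that criterion.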

Installation

Clone the repository:

git clone https://github.com/emihalache/data-to-decision.git
cd data-to-decision

(Optional) Create a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install dependencies:

pip install pandas numpy scikit-learn fairlearn matplotlib seaborn

Usage

Data Preprocessing
- Run the preprocessing pipeline to clean, encode, and scale the raw dataset:
```
python preprocess/preprocess_remove.py
```
- The processed training and test sets will be saved as adult_preprocessed_train.csv and adult_preprocessed_test.csv in the preprocess/ directory.
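
As a rough illustration of the cleaning step: the Adult dataset marks missing values with `?`, and the script name suggests that rows containing them are removed rather than imputed (an assumption, not confirmed by this README). The sketch below uses only the standard library on a tiny made-up sample; the column subset, the helper name `drop_rows_with_missing`, and the `label` field are hypothetical and do not reflect what preprocess_remove.py actually does.

```python
import csv
import io

# Tiny stand-in for adult.csv; the columns and rows are made up for illustration.
RAW = """age,workclass,sex,income
39,State-gov,Male,<=50K
50,?,Male,>50K
38,Private,Female,<=50K
28,Private,Female,>50K
"""


def drop_rows_with_missing(rows, missing_marker="?"):
    """Keep only rows in which no field equals the missing-value marker."""
    return [
        row
        for row in rows
        if missing_marker not in (value.strip() for value in row.values())
    ]


rows = list(csv.DictReader(io.StringIO(RAW)))
clean = drop_rows_with_missing(rows)

# Encode the target as 0/1 so that it can be fed to a classifier.
for row in clean:
    row["label"] = 1 if row["income"].strip() == ">50K" else 0

print(f"kept {len(clean)} of {len(rows)} rows")
```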

Model Training
- Train the classification models and generate test-set results:
```
python algorithms/algs.py
```
- The script will train Logistic Regression, Decision Tree, and Random Forest models, then save predictions and performance metrics to algorithms/adult_trained_test_results.csv.
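
The training step can be pictured with the minimal scikit-learn sketch below. It fits the same three model families on synthetic data and records test-set predictions and accuracy. The hyperparameters, the synthetic dataset, and the dictionary names are assumptions made for illustration; algs.py may be configured differently and reads the preprocessed CSV files instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed Adult data (illustration only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters here are assumptions, not the repository's settings.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

predictions = {}
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    accuracy[name] = accuracy_score(y_test, predictions[name])
    print(f"{name}: accuracy = {accuracy[name]:.3f}")
```

Keeping the per-model predictions alongside the true labels is what makes the later fairness analysis possible, since the metrics are computed from predictions grouped by sensitive attribute.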

Bias Analysis
- Change into the bias/ directory:
```
cd bias
```
- Compute the fairness metrics and generate the plots for each sensitive attribute (race and sex):
```
python metrics_race.py
python race_plots.py
python metrics_sex.py
python sex_plots.py
```
- When finished, return to the root:
```
cd ..
```
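
To make the idea of bias amplification concrete, the sketch below compares a disparity measured on the ground-truth labels with the same disparity measured on a model's predictions. This is only one simple way to frame the question and is not necessarily the definition used in the thesis; the helper names (`positive_rate`, `parity_gap`) and the toy data are hypothetical.

```python
def positive_rate(values, groups, group):
    """Share of 1s among the entries that belong to the given group."""
    selected = [v for v, g in zip(values, groups) if g == group]
    return sum(selected) / len(selected)


def parity_gap(values, groups, privileged, unprivileged):
    """Positive rate of the privileged group minus that of the unprivileged group."""
    return positive_rate(values, groups, privileged) - positive_rate(
        values, groups, unprivileged
    )


# Toy data, made up for illustration only.
sex = ["Male"] * 5 + ["Female"] * 5
y_true = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]  # labels in the data
y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]  # predictions of some model

data_gap = parity_gap(y_true, sex, "Male", "Female")
model_gap = parity_gap(y_pred, sex, "Male", "Female")
amplification = model_gap - data_gap

print(f"gap in data:        {data_gap:.2f}")
print(f"gap in predictions: {model_gap:.2f}")
print(f"amplification:      {amplification:.2f}")
```

Under this framing, a positive amplification value means the model's predictions are more skewed toward the privileged group than the labels it was trained on.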

Requirements

Python 3.7 or higher
pandas
numpy
scikit-learn
fairlearn
matplotlib
seaborn

Acknowledgements

Adult/Census Income Dataset: UCI Machine Learning Repository
