Data anonymization using k-Anonymity

✔️ Team's contribution (Amir, Nikita, Karol)

This repository was forked from https://github.com/kaylode/k-anonymity

Our contribution:

- We prepared our own dataset to be anonymized, together with the hierarchy definitions for its quasi-identifiers. Both can be found in /data/covid
- We anonymized the dataset with the datafly algorithm for different k values. The results can be found in /results/covid
- We calculated the CAVG, DM and NCP metrics for the different k values. The results are stored in metrics_calculated.csv
- We generated plots of these metrics, which can be found in /plots

To anonymize the data, calculate the metrics, and save them to a CSV file, run:

python metrics_calculate.py

The variables methods, dataset, and k_array inside metrics_calculate.py select the algorithm, the dataset, and the k values, respectively. By default, the covid dataset and the datafly algorithm are used. The script might take a while to run.

Plots can then be generated by calling:

python metrics_plot.py

Explanations below come directly from the forked repository and have not been changed

✔️ Experiments

Provides 5 k-anonymization methods:
- Datafly
- Incognito
- Topdown Greedy
- Classic Mondrian
- Basic Mondrian
Implements 3 anonymization metrics:
- Average Equivalence Class Size metric (CAVG)
- Discernibility Metric (DM)
- Normalized Certainty Penalty (NCP)
Implements 3 classification models:
- Random Forests
- Support Vector Machines
- K-Nearest Neighbors

📖 Reports

Report edit link: link
Slide link: link

Folder Structure

A dataset must come with a .csv file containing the feature data and a hierarchies folder containing predefined generalization hierarchies for its QID (quasi-identifier) attributes.

this repo
│   anonymize.py
│
└───data  
│   │
│   └───adult
│       │   adult.csv
│       └───hierarchies
│       │     adult_hierarchy_workclass.csv
│       │     ....

Here is an example of a generalization hierarchy for the 'workclass' attribute of the ADULT dataset, defined in adult_hierarchy_workclass.csv, a CSV file that uses ";" as the delimiter:

Private;Non-Government;*
Self-emp-not-inc;Non-Government;*
Self-emp-inc;Non-Government;*
Federal-gov;Government;*
Local-gov;Government;*
State-gov;Government;*
Without-pay;Unemployed;*
Never-worked;Unemployed;*

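Each row lists a value followed by its successively more general forms, so the file describes a three-level tree: the leaf values (for example, Private) generalize to Non-Government, Government, or Unemployed, and all of these generalize to the root *.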
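To make the hierarchy format concrete, here is a minimal sketch of how such a file could be read and used to generalize a value. It is not the repository's own loader: the function names load_hierarchy and generalize are hypothetical, and the rows are embedded as a string so that the example runs on its own.

```python
import csv
import io

# Same rows as adult_hierarchy_workclass.csv shown above (";"-delimited).
HIERARCHY_CSV = """\
Private;Non-Government;*
Self-emp-not-inc;Non-Government;*
Self-emp-inc;Non-Government;*
Federal-gov;Government;*
Local-gov;Government;*
State-gov;Government;*
Without-pay;Unemployed;*
Never-worked;Unemployed;*
"""


def load_hierarchy(text):
    """Map each leaf value to its generalization path (leaf first, root last)."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    return {row[0]: row for row in reader if row}


def generalize(hierarchy, value, level):
    """Return `value` generalized `level` steps up, capped at the root."""
    path = hierarchy[value]
    return path[min(level, len(path) - 1)]


hierarchy = load_hierarchy(HIERARCHY_CSV)
print(generalize(hierarchy, "Private", 0))       # Private
print(generalize(hierarchy, "Federal-gov", 1))   # Government
print(generalize(hierarchy, "Never-worked", 2))  # *
```

Level 0 keeps the original value, and each higher level moves one column to the right in the file, which is why every row ends in the fully suppressed value *.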

🌟 Executing

To anonymize a dataset, run:

python anonymize.py --method=<model_type> --k=<k-anonymity> --dataset=<dataset_name>

model_type: [mondrian | classic_mondrian | mondrian_ldiv | topdown | cluster | datafly]
dataset_name: [adult | cahousing | cmc | mgm | informs | italia]

Results are written to the results/{dataset}/{method} folder
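To anonymize the same dataset for several k values, the command above can be assembled in a loop. The following is a minimal sketch and not part of the repository: the helper names build_command and run_all and the chosen k values are hypothetical, and only the flags documented above are used.

```python
import subprocess


def build_command(method, k, dataset):
    """Build the anonymize.py command line for one (method, k, dataset) run."""
    return [
        "python",
        "anonymize.py",
        f"--method={method}",
        f"--k={k}",
        f"--dataset={dataset}",
    ]


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical k values; method and dataset are taken from the lists above.
K_VALUES = [2, 5, 10]
commands = [build_command("datafly", k, "adult") for k in K_VALUES]

for command in commands:
    print(" ".join(command))

# Call run_all(commands) from the repository root to execute the runs.
```

The commands are only printed here; run_all is left uncalled so that the sketch can be tried outside the repository.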

To run the evaluation metrics on every combination of algorithm, dataset, and k value, run:

python visualize.py

Results are written to demo/{metrics.png, metrics_ml.png}

K-Anonymity examples

Before anonymization	After anonymization with k = 2
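As a rough illustration of what k = 2 means, the sketch below checks whether every combination of quasi-identifier values occurs at least k times. The rows, column names, and function names are made up for this example and are not taken from the repository or its datasets.

```python
from collections import Counter


def equivalence_class_sizes(rows, qid_columns):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[col] for col in qid_columns) for row in rows)


def is_k_anonymous(rows, qid_columns, k):
    """True if every equivalence class contains at least k rows."""
    sizes = equivalence_class_sizes(rows, qid_columns)
    return all(size >= k for size in sizes.values())


QID = ["age", "workclass"]

# Hypothetical records before anonymization: every row is unique.
before = [
    {"age": "25", "workclass": "Private"},
    {"age": "27", "workclass": "Self-emp-inc"},
    {"age": "41", "workclass": "Federal-gov"},
    {"age": "45", "workclass": "State-gov"},
]

# The same records after generalizing both quasi-identifiers.
after = [
    {"age": "20-29", "workclass": "Non-Government"},
    {"age": "20-29", "workclass": "Non-Government"},
    {"age": "40-49", "workclass": "Government"},
    {"age": "40-49", "workclass": "Government"},
]

print(is_k_anonymous(before, QID, 2))  # False
print(is_k_anonymous(after, QID, 2))   # True
```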

Evaluation Metrics

Evaluate anonymization using information loss metrics
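The three information loss metrics can be sketched as follows. This uses common textbook definitions and may differ from the repository's own implementation, for example in how suppressed records are penalized or how NCP is normalized across attributes. The function names and the age groups are hypothetical.

```python
def discernibility_metric(class_sizes):
    """DM: each record is charged the size of its equivalence class."""
    return sum(size * size for size in class_sizes)


def average_class_size(class_sizes, k):
    """CAVG: (records / number of classes) / k; 1.0 is the ideal value."""
    return sum(class_sizes) / (len(class_sizes) * k)


def ncp_numeric(groups, domain_min, domain_max):
    """NCP for one numeric attribute, averaged over all records.

    Each record is charged the width of its group's value range
    relative to the width of the whole attribute domain.
    """
    domain_range = domain_max - domain_min
    total = sum(len(group) for group in groups)
    penalty = sum(
        len(group) * (max(group) - min(group)) / domain_range for group in groups
    )
    return penalty / total


# Hypothetical ages, already partitioned into equivalence classes.
age_groups = [[25, 27], [41, 45], [52, 55, 58]]
sizes = [len(group) for group in age_groups]

print(discernibility_metric(sizes))  # 17
print(average_class_size(sizes, k=2))
print(ncp_numeric(age_groups, domain_min=20, domain_max=60))
```

Lower values mean less information loss for all three metrics: DM and CAVG grow with the size of the equivalence classes, while NCP grows with how widely the values inside each class are generalized.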

Evaluate anonymization using classification models

References:

Basic Mondrian, Top-Down Greedy, Cluster-based (https://github.com/fhstp/k-AnonML)
L-Diversity (https://github.com/Nuclearstar/K-Anonymity, https://github.com/qiyuangong/Mondrian_L_Diversity)
Classic Mondrian (https://github.com/qiyuangong/Mondrian)
Datafly Algorithm (https://github.com/nazilkbahar/python-datafly)
Normalized Certainty Penalty from Utility-Based Anonymization for Privacy Preservation with Less Information Loss
Discernibility, Average Equivalent Class Size from A Systematic Comparison and Evaluation of k-Anonymization Algorithms for Practitioners
Privacy in a Mobile-Social World
Code and idea based on k-Anonymity in Practice: How Generalisation and Suppression Affect Machine Learning Classifiers
