SKALD

SKALD is a scalable, chunk-wise k-anonymization tool based on the Optimal Lattice Anonymization (OLA) algorithm. It handles large datasets by processing them in manageable chunks, protecting privacy while preserving as much data utility as possible.

Features

- Chunk-wise k-anonymization using the Optimal Lattice Anonymization (OLA) method
- Support for numerical and categorical quasi-identifiers
- Efficient encoding for sparse numerical attributes
- Decoding of generalized encoded values for interpretability
- Global histogram merging for optimal bin width selection
- Suppression mechanism to reach k-anonymity without excessive data distortion
- Clean architecture separating core logic, generalization, and utilities
- Fully configurable via a YAML file
- Logging support and a reproducible output structure

Installation

git clone https://github.com/datakaveri/k-anonymisation-SKALD.git
cd k-anonymisation-SKALD
pip install -r requirements.txt
pip install -e .

Requirements: Python 3.8+ and the dependencies listed in requirements.txt.

Usage

Run SKALD from the command line:

SKALD --config config.yaml [--k 10] [--chunks 5] [--chunk_dir chunks]

CLI Arguments

| Argument | Description |
|---|---|
| `--k` | Desired k-anonymity level (e.g., 500) |
| `--chunks` | Number of chunks to process (e.g., 100) |
| `--chunk_dir` | Directory containing the dataset chunks |
| `--config` | Path to a custom YAML configuration file |
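The bracketed flags in the command above are optional, which suggests they override the matching keys in config.yaml. The sketch below shows one way that precedence could work with Python's standard argparse module. The `parse_args` and `apply_overrides` helpers, the default config path, and the mapping of flags to config keys (`--chunks` to `number_of_chunks`, `--chunk_dir` to `chunk_directory`) are assumptions for illustration, not SKALD's actual cli.py.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical parser mirroring the documented flags; not SKALD's own CLI code.
    parser = argparse.ArgumentParser(prog="SKALD")
    parser.add_argument("--config", default="config.yaml", help="Path to the YAML config file")
    parser.add_argument("--k", type=int, help="Desired k-anonymity level")
    parser.add_argument("--chunks", type=int, help="Number of chunks to process")
    parser.add_argument("--chunk_dir", help="Directory containing the dataset chunks")
    return parser.parse_args(argv)


def apply_overrides(config, args):
    # Assumption: a flag given on the command line wins over the config file value.
    merged = dict(config)
    if args.k is not None:
        merged["k"] = args.k
    if args.chunks is not None:
        merged["number_of_chunks"] = args.chunks
    if args.chunk_dir is not None:
        merged["chunk_directory"] = args.chunk_dir
    return merged


if __name__ == "__main__":
    config = {"k": 10, "number_of_chunks": 1, "chunk_directory": "datachunks"}
    args = parse_args(["--config", "config.yaml", "--k", "500"])
    print(apply_overrides(config, args))
```

Keeping the merge in a separate function means the config file stays the single source of defaults, and the command line only changes what is explicitly passed.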

Configuration File (`config.yaml`)

number_of_chunks: 1
k: 10
max_number_of_eq_classes: 15000000
suppression_limit: 0.001

chunk_directory: datachunks
output_path: generalized_chunk1.csv
log_file: log.txt
save_output: true

quasi_identifiers:
  numerical:
    - column: Age
      encode: true
      type: int
    - column: BMI
      encode: true
      type: float
    - column: PIN Code
      encode: true
      type: int
  categorical: 
    - column: Blood Group
    - column: Profession

bin_width_multiplication_factor:
  Age: 2
  BMI: 2
  PIN Code: 2

hardcoded_min_max:
  Age: [19, 85]
  BMI: [12.7, 35.8]
  PIN Code: [560001, 591346]
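The configuration above is ordinary YAML, so PyYAML's `yaml.safe_load` turns it into a nested dict. The hypothetical `check_config` below runs a few consistency checks on such a dict before a long run. Its rules (k of at least 2, a fractional suppression_limit, and a min/max pair plus a multiplication factor for every numerical quasi-identifier) are assumptions drawn from the example, not validation SKALD is known to perform. `suppression_budget` shows the arithmetic if suppression_limit is read as a fraction of all records, which is also an assumption.

```python
import math


def check_config(cfg):
    """Return a list of problems found in a config dict such as yaml.safe_load produces."""
    problems = []
    if cfg.get("k", 0) < 2:
        problems.append("k should be at least 2")
    if not 0 <= cfg.get("suppression_limit", 0) < 1:
        problems.append("suppression_limit should be a fraction in [0, 1)")
    qis = cfg.get("quasi_identifiers") or {}
    min_max = cfg.get("hardcoded_min_max") or {}
    factors = cfg.get("bin_width_multiplication_factor") or {}
    # Assumption: every numerical QI needs both a [min, max] pair and a factor.
    for qi in qis.get("numerical") or []:
        col = qi["column"]
        if col not in min_max:
            problems.append(f"{col}: missing hardcoded_min_max entry")
        elif min_max[col][0] >= min_max[col][1]:
            problems.append(f"{col}: min is not below max")
        if col not in factors:
            problems.append(f"{col}: missing bin_width_multiplication_factor")
    return problems


def suppression_budget(n_records, suppression_limit):
    # Assumes the limit is a fraction of all records; round down to stay within it.
    return math.floor(n_records * suppression_limit)
```

Under that reading, `suppression_budget(1_000_000, 0.001)` returns 1000: a dataset of one million records could have at most 1,000 records suppressed.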

Example Workflow

1. Prepare your chunked dataset (e.g., datachunks/KanonMedicalData_chunk1.csv, ..., chunk100.csv).
2. Define your quasi-identifiers in config.yaml.
3. Run SKALD:

SKALD --config config.yaml
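The workflow assumes the data is already split into chunk files. If it is a single large CSV, a sketch like the following could produce them. The `<prefix>_chunk<i>.csv` naming copies the example file names above, but whether SKALD requires that exact pattern or a header row in every chunk, and the `split_rows`/`split_csv` helpers themselves, are assumptions. The sketch streams rows with the standard csv module, so the full file is never held in memory at once.

```python
import csv
import tempfile
from pathlib import Path


def split_rows(rows, rows_per_chunk):
    """Group an iterable of rows into lists of at most rows_per_chunk rows."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == rows_per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # Last, possibly shorter, chunk.


def split_csv(src, out_dir="datachunks", prefix="KanonMedicalData", rows_per_chunk=100_000):
    """Write <prefix>_chunk1.csv, <prefix>_chunk2.csv, ... with the header repeated in each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for i, rows in enumerate(split_rows(reader, rows_per_chunk), start=1):
            path = out / f"{prefix}_chunk{i}.csv"
            with open(path, "w", newline="") as g:
                writer = csv.writer(g)
                writer.writerow(header)
                writer.writerows(rows)
            paths.append(path)
    return paths


if __name__ == "__main__":
    # Small self-check on a throwaway file; point src at your real dataset instead.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "full.csv"
        src.write_text("Age,BMI\n30,22.1\n41,27.5\n55,31.0\n")
        for p in split_csv(src, Path(tmp) / "datachunks", rows_per_chunk=2):
            print(p.name)
```

Choose rows_per_chunk so that the number of files matches number_of_chunks (or `--chunks`) in your run.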

Output

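- generalized_chunk.csv (or the file named by output_path in config.yaml): anonymized first chunk
- encodings/: JSON encoding maps for the encoded numerical quasi-identifiers
- log.txt (or the file named by log_file): log output, including bin width information
- Console: runtime and generalization details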
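The encodings/ maps are what make generalized encoded values readable again. The sketch below illustrates the idea with one plausible scheme: distinct values of a sparse numerical column are sorted and given consecutive integer codes, the map is serialized to JSON, and a generalized code range is decoded back to original bounds. SKALD's real encoding format, file names, and function names may differ; `build_encoding`, `dump_encoding`, `load_encoding`, and `decode_interval` are hypothetical.

```python
import json


def build_encoding(values):
    # Hypothetical scheme: distinct values, sorted, get consecutive codes 1, 2, 3, ...
    return {v: code for code, v in enumerate(sorted(set(values)), start=1)}


def dump_encoding(mapping):
    # JSON object keys must be strings, so original values are stored as text.
    return json.dumps({str(v): code for v, code in mapping.items()})


def load_encoding(text, cast=int):
    # Restore the original value type (int or float) from the JSON keys.
    return {cast(v): code for v, code in json.loads(text).items()}


def decode_interval(mapping, lo_code, hi_code):
    # Map a generalized code range [lo_code, hi_code] back to original-value bounds.
    inverse = {code: v for v, code in mapping.items()}
    return inverse[lo_code], inverse[hi_code]
```

For the PIN codes 560034, 560001, and 591346, `build_encoding` yields `{560001: 1, 560034: 2, 591346: 3}`, so a generalized code range [1, 2] decodes to PIN codes 560001 through 560034. A sparse range spanning tens of thousands of possible values collapses to three codes, which is the motivation for encoding before binning. The JSON text can then be written to a file under encodings/ with `pathlib.Path.write_text`.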

Project Structure

k-anonymisation-SKALD/
├── SKALD/
│   ├── core.py
│   ├── generalization_ri.py
│   ├── generalization_rf.py
│   ├── quasi_identifier.py
│   ├── utils.py
│   ├── cli.py
│   └── ...
├── encodings/
├── datachunks/
├── generalized_chunk.csv
├── config.yaml
├── README.md
└── requirements.txt

How It Works

1. Chunk encoding: encodes high-cardinality numerical QIs and stores the mappings as JSON.
2. OLA Phase 1: constructs a lattice of bin widths to meet the equivalence class constraints.
3. OLA Phase 2: merges histograms from all chunks to refine the bin widths.
4. Generalization: applies the finalized bin widths to the target chunk for anonymization.
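To make the lattice, histogram-merging, and generalization steps concrete, here is a deliberately simplified sketch for numerical QIs only. It enumerates every lattice node by brute force, least generalized first (OLA itself prunes this search rather than visiting every node). Each QI's bin width at a node is assumed to be base * factor**level, in the spirit of bin_width_multiplication_factor; nodes whose worst-case class count exceeds a cap (standing in for max_number_of_eq_classes) are skipped; per-chunk histograms are summed; and the first node whose undersized classes fit within the suppression budget is accepted. All function names, the base widths, and the node ordering are assumptions, not SKALD's implementation.

```python
import math
from collections import Counter
from itertools import product


def bin_index(x, lo, width):
    # Index of the fixed-width bin [lo + i*width, lo + (i+1)*width) that contains x.
    return math.floor((x - lo) / width)


def widths_for(levels, base, factor):
    # Bin width of each QI at one lattice node: base * factor**level.
    return [b * f ** lvl for lvl, b, f in zip(levels, base, factor)]


def class_bound(widths, bounds):
    # Worst-case number of equivalence classes: product of per-QI bin counts.
    return math.prod(bin_index(hi, lo, w) + 1 for w, (lo, hi) in zip(widths, bounds))


def chunk_histogram(rows, widths, bounds):
    # Records per equivalence class (a tuple of bin indices) within one chunk.
    return Counter(
        tuple(bin_index(x, lo, w) for x, w, (lo, _) in zip(row, widths, bounds))
        for row in rows
    )


def find_node(chunks, bounds, base, factor, max_levels, k, suppression_limit, max_classes):
    total = sum(len(rows) for rows in chunks)
    # Brute-force lattice walk, lowest total level first.
    nodes = sorted(product(*(range(m + 1) for m in max_levels)), key=lambda n: (sum(n), n))
    for levels in nodes:
        widths = widths_for(levels, base, factor)
        if class_bound(widths, bounds) > max_classes:
            continue  # Too many potential classes at this node.
        merged = Counter()
        for rows in chunks:
            merged.update(chunk_histogram(rows, widths, bounds))  # Global histogram.
        # Records in classes smaller than k would have to be suppressed.
        suppressed = sum(n for n in merged.values() if n < k)
        if suppressed <= suppression_limit * total:
            return levels, widths
    return None


def generalize(row, widths, bounds):
    # Replace each value with the half-open interval of its bin.
    out = []
    for x, w, (lo, _) in zip(row, widths, bounds):
        start = lo + bin_index(x, lo, w) * w
        out.append(f"[{start}, {start + w})")
    return out
```

For example, with Age bounded by [19, 85], base width 1, factor 2, k = 2, and no suppression allowed, four records with ages 20, 21, 22, and 23 split across two chunks first become 2-anonymous at level 3 (bin width 8), where all of them fall into the interval [19, 27). Merging counts before checking k is what lets a class that is too small within one chunk still count as large enough across the whole dataset.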

Authors

Kailash R — Core Developer

Acknowledgements

Based on the Optimal Lattice Anonymization (OLA) algorithm.
Uses open-source libraries such as pandas, NumPy, and PyYAML.
