Adversary-Adaptive Representation Learning for Privacy-Preserving Image Tasks
CS 750/850 Course Project — University of New Hampshire
This project learns image representations that preserve utility labels (for example, emotion or category) while suppressing sensitive biometric information (identity, gender, age).
Image → CNN Feature Extractor
→ Privacy Filter (Conv1D + VIB)
→ Task Model (Attention Pooling + Classifier) [utility]
→ GRL → Multi-Head Adversary [privacy]
Training objective: min_{θ,φ} max_{ψ} L_utility - λ · L_privacy
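As a sketch of how the gradient reversal layer (GRL) could realize this min-max objective in a single backward pass, the per-step updates can be written as below. This assumes θ are the privacy-filter parameters, φ the task-model parameters, and ψ the adversary parameters (the objective above does not define them), and it introduces a hypothetical learning rate η:

```latex
\begin{aligned}
\psi   &\leftarrow \psi   - \eta \, \nabla_{\psi} L_{\text{privacy}} \\
\phi   &\leftarrow \phi   - \eta \, \nabla_{\phi} L_{\text{utility}} \\
\theta &\leftarrow \theta - \eta \left( \nabla_{\theta} L_{\text{utility}} - \lambda \, \nabla_{\theta} L_{\text{privacy}} \right)
\end{aligned}
```

The adversary descends on L_privacy, while the GRL flips and scales that gradient by λ before it reaches the filter, so the filter ascends on L_privacy. The adversary step differs from exact gradient ascent on the objective only by the positive factor λ, which does not change its direction.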
bash scripts/setup_env.sh appr-photos 3.10 auto
conda activate appr-photos
The auto backend uses CPU wheels on Linux and standard PyTorch wheels on Apple
Silicon. Use cpu when you want a small CPU-only environment explicitly:
bash scripts/setup_env.sh appr-photos 3.10 cpu
conda activate appr-photos
Use a CUDA wheel tag only when you want an accelerated PyTorch install:
bash scripts/setup_env.sh appr-photos 3.10 cuda:cu128
conda activate appr-photos
Recommended practical dataset: CelebA.
This repo downloads CelebA through torchvision's official dataset integration
and builds a repo-compatible metadata.csv automatically. The default setup uses:
- utility label: Smiling vs not_smiling
- privacy labels: identity and gender
bash scripts/download_data.sh celeba data/raw/celeba
This writes the prepared dataset to data/raw/celeba and creates metadata.csv.
For a detailed setup, dataset, training, and comparison guide, see
docs/setup_dataset_runbook.md.
For a direct training command checklist, see
docs/training_commands.md.
Organize images under data/raw/celeba:
data/raw/celeba/
  <class_name>/
    <speaker_id>/image_001.jpg
    <speaker_id>/image_002.jpg
Or place metadata.csv in data/raw/celeba/ with fields such as:
filename,utility_label,speaker_id,gender,age.
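A quick header check can catch a malformed metadata.csv before training starts. The sketch below is not the repo's own verification code: `missing_fields` is a hypothetical helper, and it assumes all five fields listed above are required, which the text does not confirm. The sample row is illustrative only.

```python
import csv
import io

# Assumed required columns, taken from the field list above.
REQUIRED_FIELDS = ("filename", "utility_label", "speaker_id", "gender", "age")


def missing_fields(csv_file, required=REQUIRED_FIELDS):
    """Return the required column names absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in required if name not in header]


# Illustrative in-memory sample; a real file would be opened with
# open(path, newline="") instead.
sample = io.StringIO(
    "filename,utility_label,speaker_id,gender,age\n"
    "000001.jpg,smiling,2880,female,young\n"
)
print(missing_fields(sample))
```

An empty list means every assumed column is present; any names returned are the ones to add.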
Verification and metadata generation:
python scripts/prepare_datasets.py --verify --root data/raw/celeba
python scripts/prepare_datasets.py --build-metadata --root data/raw/celeba
python scripts/prepare_datasets.py --stats --root data/raw/celeba
# CelebA baseline training
python scripts/train.py --config configs/experiment/celeba_baseline.yaml
# Larger-batch config for machines with enough accelerator or CPU memory
python scripts/train.py --config configs/experiment/celeba_accelerated.yaml
# Precompute features for faster training loops (optional)
python scripts/precompute_features.py --config configs/experiment/celeba_baseline.yaml
python scripts/train.py --config configs/experiment/celeba_cached.yaml
# Override config values from CLI
python scripts/train.py --config configs/experiment/celeba_baseline.yaml training.num_epochs=20 training.lambda_privacy=0.05
The CelebA configs use smiling classification as the utility task and identity/gender as the baseline privacy targets.
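The dotted key=value overrides in the training command above can be thought of as writes into a nested config. The following is a minimal sketch of that idea, not the repo's actual config loader: `apply_overrides` is a hypothetical helper, and the starting values are placeholders rather than the real YAML defaults.

```python
import ast


def apply_overrides(config, overrides):
    """Apply dotted key=value strings to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        try:
            # Parse numbers and other Python literals (20 -> int, 0.05 -> float).
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Fall back to the raw string for bare words and paths.
            value = raw_value
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            # Create intermediate sections when they do not exist yet.
            node = node.setdefault(key, {})
        node[leaf] = value
    return config


# Hypothetical starting values; the real defaults live in the YAML config.
config = {"training": {"num_epochs": 10, "lambda_privacy": 0.1}}
apply_overrides(config, ["training.num_epochs=20", "training.lambda_privacy=0.05"])
print(config)
```

Parsing values as literals keeps `training.num_epochs` an integer rather than the string "20", which matters if the value is later used as a loop bound.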
Run the full comparison with separate utility runs and one combined multi-utility run:
python scripts/run_celeba_attribute_comparison.py \
--mode both \
--epochs 10 \
--batch-size 128 \
--num-workers 8
This uses the CelebA-provided utility labels Smiling, Mouth_Slightly_Open,
Eyeglasses, Wearing_Hat, and Blurry, with privacy heads for speaker_id,
gender, and young.
# Evaluate a trained checkpoint
python scripts/evaluate.py --checkpoint outputs/celeba_baseline/checkpoints/best_model.pt
# Sweep the privacy weight lambda
python scripts/sweep_lambda.py --config configs/experiment/celeba_baseline.yaml --epochs 20
# Visualize a trained checkpoint
python scripts/visualize.py --checkpoint outputs/celeba_baseline/checkpoints/best_model.pt
For report-ready figures from a trained run:
python scripts/generate_report_figures.py \
--checkpoint outputs/celeba_baseline/checkpoints/best_model.pt \
--output_dir outputs/report_figures
# Run the test suite
pytest tests/ -v
Utility (higher is better): UAR, Weighted Accuracy, Macro F1
Privacy (lower is more private): identity accuracy, gender accuracy, MI(Z; S). De-identification rate runs the other way: higher is more private.
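For reference, two of the utility metrics can be computed from scratch as follows. This is a sketch using the standard definitions (UAR as the mean of per-class recall, Macro F1 as the unweighted mean of per-class F1); the repo's own metrics module may differ in edge-case handling. The function names and the toy labels are illustrative.

```python
def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        counts[label] = (tp, fp, fn)
    return counts


def uar(y_true, y_pred):
    """Unweighted average recall: mean recall over classes present in y_true."""
    recalls = [
        tp / (tp + fn)
        for tp, fp, fn in per_class_counts(y_true, y_pred).values()
        if tp + fn > 0
    ]
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Macro F1: unweighted mean of per-class F1, with F1 = 0 when undefined."""
    scores = []
    for tp, fp, fn in per_class_counts(y_true, y_pred).values():
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only, not results from this project.
y_true = ["smiling", "smiling", "smiling", "not_smiling"]
y_pred = ["smiling", "smiling", "not_smiling", "not_smiling"]
print(uar(y_true, y_pred), macro_f1(y_true, y_pred))
```

Because UAR averages recall per class rather than per sample, it is not inflated by a majority class, which makes it a reasonable choice when utility labels are imbalanced.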
src/aapr/
├── data/ # Photo dataset loaders and split/collation utils
├── features/ # CNN image feature extractor + feature cache
├── models/ # Privacy filter, task model, adversary, GRL
├── training/ # Adversarial trainer, losses, schedulers, metrics
├── evaluation/ # Evaluator, cross-dataset, Pareto analysis
├── visualization/ # Embeddings, training curves, Pareto plots
└── utils/ # Config, logging, seed, device detection
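The evaluation package lists Pareto analysis. As a sketch of what that could involve, the snippet below selects the non-dominated runs from a set of (utility, leakage) pairs, such as those a lambda sweep might produce. `pareto_front` is a hypothetical helper, not the repo's implementation, and the numbers are made up for illustration rather than measured.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is (utility, leakage): higher utility and lower leakage are better.
    A point is dominated when another point is at least as good on both axes
    and strictly better on at least one.
    """
    front = []
    for i, (u_i, l_i) in enumerate(points):
        dominated = any(
            (u_j >= u_i and l_j <= l_i) and (u_j > u_i or l_j < l_i)
            for j, (u_j, l_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((u_i, l_i))
    return front


# Made-up (UAR, identity accuracy) pairs, for illustration only.
runs = [(0.90, 0.60), (0.88, 0.30), (0.85, 0.35), (0.80, 0.10)]
print(pareto_front(runs))
```

Here the run at (0.85, 0.35) is dropped because (0.88, 0.30) has both higher utility and lower leakage; the remaining three each trade one axis against the other.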