This repository contains the implementation of FR-CCA (Fair Representation Canonical Correlation Analysis), a method for fair representation learning that makes the projected features independent of sensitive attributes while preserving high correlation between data modalities.
Traditional Canonical Correlation Analysis (CCA) methods often overlook fairness considerations, potentially leading to biased representations that amplify societal biases. Our FR-CCA method addresses this limitation by:
- Ensuring independence between learned representations and sensitive attributes
- Maintaining high correlation between different data modalities
- Optimizing for both fairness and classification performance
- Providing computational efficiency comparable to standard CCA
FR-CCA extends traditional CCA by adding fairness constraints that prevent any classifier from predicting sensitive attributes (e.g., sex, race, age) based solely on the projected data. The method solves:
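$$
\begin{aligned}
\max_{U,\,V} \quad & \operatorname{trace}\left(U^\top X^\top Y V\right) \\
\text{subject to} \quad & U^\top X^\top X U = V^\top Y^\top Y V = I_R, \\
& U^\top X^\top \hat{z} = V^\top Y^\top \hat{z} = 0,
\end{aligned}
$$

where $\hat{z}$ (ẑ) is the centered sensitive attribute vector.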
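The constrained problem above can be explored with a small NumPy sketch. This is not the solver used in the notebooks; it is one possible approach, assuming a single sensitive attribute vector: restrict U and V to the subspaces orthogonal to X^T ẑ and Y^T ẑ, then solve ordinary CCA there by whitening and an SVD. The function names (`null_space_basis`, `inv_sqrt`, `constrained_cca`), the ridge term `reg`, and the toy data are all illustrative assumptions.

```python
import numpy as np

def null_space_basis(a):
    """Orthonormal basis of {u : a^T u = 0}, shape (D, D-1)."""
    # Full QR of the single column a: the first column of Q spans a,
    # the remaining columns span its orthogonal complement.
    q, _ = np.linalg.qr(a.reshape(-1, 1), mode="complete")
    return q[:, 1:]

def inv_sqrt(m, eps=1e-8):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(m)
    w = np.maximum(w, eps)
    return (v / np.sqrt(w)) @ v.T

def constrained_cca(X, Y, z, R, reg=1e-6):
    """Sketch: CCA with projections orthogonal to the centered attribute."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    z_hat = z - z.mean()
    # Any U = Nx @ A satisfies U^T X^T z_hat = 0 by construction.
    Nx = null_space_basis(X.T @ z_hat)
    Ny = null_space_basis(Y.T @ z_hat)
    Xr, Yr = X @ Nx, Y @ Ny
    # Whiten each reduced view (small ridge for numerical stability).
    Wx = inv_sqrt(Xr.T @ Xr + reg * np.eye(Xr.shape[1]))
    Wy = inv_sqrt(Yr.T @ Yr + reg * np.eye(Yr.shape[1]))
    A, s, Bt = np.linalg.svd(Wx @ Xr.T @ Yr @ Wy)
    U = Nx @ Wx @ A[:, :R]
    V = Ny @ Wy @ Bt.T[:, :R]
    return U, V, s[:R]

# Toy data (assumed for illustration): both views shift with the attribute.
rng = np.random.default_rng(0)
N = 200
z = rng.integers(0, 2, size=N).astype(float)
X = rng.normal(size=(N, 6)) + 0.8 * z[:, None]
Y = rng.normal(size=(N, 5)) + 0.5 * z[:, None]
U, V, corr = constrained_cca(X, Y, z, R=2)
```

With this construction the orthogonality constraints hold exactly (up to floating-point error), and the normalization constraints hold up to the small ridge term.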
├── FRCCA_AV14_MRI_Update_X.ipynb # Implementation for MRI modality
├── FRCCA_AV14_MRI_Update_Y.ipynb # Implementation for AV1451 (Tau PET) modality
├── data/ # Data directory (ADNI datasets)
├── utils/ # Utility functions
└── README.md # This file
pip install numpy pandas scikit-learn matplotlib seaborn
pip install scipy shap jupyter

- MRI: 66 cortical thickness features from structural MRI scans
- AV1451: 68 SUVR features from Tau PET scans
- Participants: 375 subjects (182 males, 193 females)
- Diagnoses: Cognitively Normal (CN), Mild Cognitive Impairment (MCI), Alzheimer's Disease (AD)
- Multivariate Gaussian distributions with embedded sensitive attribute bias
- 500 samples with controllable correlation structure
- Used for method validation and parameter tuning
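The exact generator for the synthetic data is defined in the notebooks and is not reproduced here. As a rough illustration of what "Gaussian views with embedded sensitive attribute bias and controllable correlation" could look like, the sketch below draws two views from a shared latent factor and adds a group-dependent mean shift. The function `make_synthetic_pair` and its parameters (`rho`, `bias`, the feature dimensions) are hypothetical, and `rho` is assumed to lie in [0, 1].

```python
import numpy as np

def make_synthetic_pair(n=500, dx=10, dy=8, rho=0.7, bias=1.0, seed=0):
    """Hypothetical generator: two Gaussian views sharing one latent factor,
    with a group-dependent mean shift standing in for the embedded bias."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)      # binary sensitive attribute
    latent = rng.normal(size=n)         # shared signal linking X and Y
    wx = rng.normal(size=dx)            # loadings of the latent factor on X
    wy = rng.normal(size=dy)            # loadings of the latent factor on Y
    # rho controls how much of each view comes from the shared factor.
    X = np.sqrt(rho) * np.outer(latent, wx) + np.sqrt(1 - rho) * rng.normal(size=(n, dx))
    Y = np.sqrt(rho) * np.outer(latent, wy) + np.sqrt(1 - rho) * rng.normal(size=(n, dy))
    # Shift every feature by `bias` for the z == 1 group.
    X = X + bias * z[:, None]
    Y = Y + bias * z[:, None]
    return X, Y, z

X, Y, z = make_synthetic_pair()
```

Setting `bias=0.0` removes the attribute dependence, which gives a convenient baseline when checking that a fairness constraint has an effect.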
# Run the synthetic X notebook
jupyter notebook FRCCA_Synthetic_X.ipynb

# Run the synthetic Y notebook
jupyter notebook FRCCA_Synthetic_Y.ipynb

Access to ADNI data is required to run the MRI and AV1451 notebooks.

# Run the MRI notebook
jupyter notebook FRCCA_AV14_MRI_Update_X.ipynb

# Run the AV1451 notebook
jupyter notebook FRCCA_AV14_MRI_Update_Y.ipynb

Both notebooks include:
- Data preprocessing and standardization
- FR-CCA model training
- Fairness evaluation (DPG, EOG, GSG metrics)
- Classification performance assessment
Our FR-CCA method demonstrates:
- Superior Fairness: The lowest demographic parity gap (DPG), equalized odds gap (EOG), and group sufficiency gap (GSG) among the compared methods in the table below
- Maintained Accuracy: Classification accuracy on par with the SVM baseline and within a few points of standard CCA
- Computational Efficiency: Time complexity similar to standard CCA: O((Dx + Dy)³ + N·Dx² + N·Dy² + N·Dx·Dy)
- Clinical Relevance: Identifies important brain regions consistent with Alzheimer's disease pathology
| Method | GSG ↓ | DPG ↓ | EOG ↓ | Accuracy |
|---|---|---|---|---|
| SVM | 0.329±0.042 | 0.081±0.040 | 0.084±0.037 | 0.612±0.033 |
| CCA | 0.239±0.208 | 0.028±0.024 | 0.015±0.017 | 0.635±0.026 |
| SF-CCA | 0.286±0.033 | 0.036±0.024 | 0.033±0.031 | 0.619±0.026 |
| MF-CCA | 0.242±0.042 | 0.015±0.013 | 0.026±0.022 | 0.621±0.015 |
| FR-CCA | 0.215±0.041 | 0.008±0.006 | 0.010±0.005 | 0.612±0.009 |
- Demographic Parity Gap (DPG): Measures score consistency across sensitive groups
- Equalized Odds Gap (EOG): Ensures equal treatment across all label-attribute combinations
- Group Sufficiency Gap (GSG): Validates prediction relevance regardless of sensitive attributes
- MRI Focus: Memory, language, and visual processing regions (entorhinal cortex, temporal gyri, fusiform gyrus)
- AV1451 Focus: Tau pathology in sensory processing and emotional regulation areas (postcentral gyrus, temporal pole, anterior cingulate)
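To make the fairness metrics listed above more concrete, here is a minimal sketch of DPG and EOG for the simplest case of binary predictions and a categorical sensitive attribute. These are common textbook definitions and are an assumption here; the notebooks may compute the metrics differently (for example, on scores or for multi-class labels). GSG is omitted because its definition is less standardized. The function names are illustrative, and every (label, group) cell is assumed to be non-empty.

```python
import numpy as np

def demographic_parity_gap(y_pred, z):
    """Largest difference in positive-prediction rate between groups."""
    y_pred, z = np.asarray(y_pred), np.asarray(z)
    rates = [y_pred[z == g].mean() for g in np.unique(z)]
    return max(rates) - min(rates)

def equalized_odds_gap(y_true, y_pred, z):
    """Largest between-group difference in positive-prediction rate,
    taken over each true label (assumes no empty label/group cell)."""
    y_true, y_pred, z = np.asarray(y_true), np.asarray(y_pred), np.asarray(z)
    gaps = []
    for label in np.unique(y_true):
        mask = y_true == label
        rates = [y_pred[mask & (z == g)].mean() for g in np.unique(z)]
        gaps.append(max(rates) - min(rates))
    return max(gaps)

# Small hand-checkable example.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
z      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])

dpg = demographic_parity_gap(y_pred, z)      # rates 0.25 vs 0.75
eog = equalized_odds_gap(y_true, y_pred, z)  # 0.5 gap for both labels
```

Both gaps are 0.5 on this toy input; a value of 0 would indicate identical rates across groups.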
@misc{hou2025fairccafairrepresentation,
title={Fair CCA for Fair Representation Learning: An ADNI Study},
author={Bojian Hou and Zhanliang Wang and Zhuoping Zhou and Boning Tong and Zexuan Wang and Jingxuan Bao and Duy Duong-Tran and Qi Long and Li Shen},
year={2025},
eprint={2507.09382},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2507.09382},
}

We welcome contributions! Please feel free to submit issues, feature requests, or pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
- Alzheimer's Disease Neuroimaging Initiative (ADNI) for providing the neuroimaging data
- The research community for foundational work in fair machine learning and canonical correlation analysis
For questions or collaboration opportunities, please open an issue in this repository.
Note: This implementation is for research purposes. Clinical applications should undergo appropriate validation and regulatory approval.