An end‑to‑end credit‑scoring workflow that automatically generates the technical evidence required by the EU AI Act.
Financial institutions must comply with the EU AI Act for any high‑risk AI system. Meeting Articles 9–18 requires extensive documentation and auditing. This pipeline delivers a production‑ready workflow that:
- Generates all required technical evidence automatically
- Ensures robust risk management and human oversight
- Maintains full audit trails with versioned artifacts
- Provides real‑time compliance dashboards for stakeholders
This project leverages the Home Credit Default Risk dataset provided by the Home Credit Group. The raw dataset contains potentially sensitive attributes such as `CODE_GENDER`, `DAYS_BIRTH`, `NAME_EDUCATION_TYPE`, `NAME_FAMILY_STATUS`, and `NAME_HOUSING_TYPE`, which can be filtered using the pipeline's `sensitive_attributes` parameter to comply with fairness requirements.
Key fields used for modeling:
| Field | Description |
|---|---|
| `AMT_INCOME_TOTAL` | Annual income of the applicant |
| `AMT_CREDIT` | Credit amount of the loan |
| `AMT_ANNUITY` | Loan annuity amount |
| `EXT_SOURCE_1/2/3` | External source scores (credit history proxies) |
| `TARGET` | Default indicator (0 = no default, 1 = default) |
Automated preprocessing handles:
- Missing value imputation using `SimpleImputer`
- Feature scaling with `StandardScaler` (optional)
- Categorical encoding with `OneHotEncoder`
- Feature engineering, including age derivation from `DAYS_BIRTH`
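A minimal scikit-learn sketch of such a preprocessing pipeline follows; the concrete column lists, imputation strategies, and sample values are illustrative assumptions, not the project's actual configuration:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative sample rows with missing values.
df = pd.DataFrame({
    "AMT_INCOME_TOTAL": [202500.0, np.nan, 157500.0],
    "NAME_EDUCATION_TYPE": ["Higher education", np.nan, "Secondary"],
    "DAYS_BIRTH": [-12005, -16765, -19046],
})

# Feature engineering: derive age in years from DAYS_BIRTH (stored as
# negative days relative to the application date).
df["AGE_YEARS"] = (-df["DAYS_BIRTH"] / 365.25).round(1)

numeric = ["AMT_INCOME_TOTAL", "AGE_YEARS"]
categorical = ["NAME_EDUCATION_TYPE"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]),
     categorical),
])
X = preprocess.fit_transform(df)
```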
The system implements three main pipelines that map directly to EU AI Act requirements:
| Pipeline | Key Steps | EU AI Act Focus |
|---|---|---|
| Feature Engineering | Ingest → Record SHA-256 provenance 📥 <br> Profile → WhyLogs data governance 📊 <br> Preprocess → Impute, encode, normalize 🔧 | Arts. 10, 12, 15 |
| Training | Train → LightGBM w/ class-imbalance handling 🎯 <br> Evaluate → Accuracy, AUC, fairness analysis ⚖️ <br> Assess → Risk scoring & model registry 📋 | Arts. 9, 11, 15 |
| Deployment | Approve → Human oversight gate 🙋‍♂️ <br> Deploy → Modal API deployment 🚀 <br> Monitor → SBOM + post-market tracking 📈 | Arts. 14, 17, 18 |
Each pipeline run automatically versions all inputs/outputs, generates profiling reports, creates risk assessments, produces a Software Bill of Materials (SBOM), and compiles complete Annex IV technical documentation.
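The SHA-256 provenance step can be sketched with Python's standard library alone. This is a hedged illustration of the idea, not the project's actual implementation; the file name and record fields are assumptions:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_provenance(path: Path) -> dict:
    """Build a small provenance record suitable for an audit trail."""
    return {
        "file": path.name,
        "sha256": sha256_of_file(path),
        "size_bytes": path.stat().st_size,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

# Demo on a hypothetical sample file.
demo = Path("application_train_sample.csv")
demo.write_text("SK_ID_CURR,TARGET\n100002,1\n")
record = record_provenance(demo)
print(json.dumps(record, indent=2))
```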
```
credit-scorer/
│
├── run.py                  # Main pipeline execution script
├── run_dashboard.py        # Dashboard launcher
│
├── src/
│   ├── data/               # Dataset directory
│   ├── pipelines/          # Pipeline definitions
│   ├── steps/              # Pipeline step implementations
│   ├── configs/            # Configuration files
│   ├── utils/              # Utility functions
│   └── constants/          # Project constants
│
├── streamlit_app/          # Compliance dashboard
├── modal_app/              # Modal deployment code
├── docs/                   # Documentation and compliance artifacts
├── models/                 # Saved model artifacts
├── assets/                 # Images and static resources
└── scripts/                # Helper scripts
```
- Python 3.12+
- ZenML >= 0.82.1
- Modal account (for deployment pipeline)
- WhyLogs integration (for data profiling)
- Install dependencies:

```bash
pip install -r requirements.txt
```

- Set up ZenML:

```bash
zenml init
```

- Install the WhyLogs integration for data profiling:

```bash
zenml integration install whylogs
zenml data-validator register whylogs_data_validator --flavor=whylogs
zenml stack update <STACK_NAME> -dv whylogs_data_validator
```

- Install the Slack integration for the deployment approval gate and incident reporting:

```bash
zenml integration install slack
zenml secret create slack_token --oauth_token=<SLACK_TOKEN>
zenml alerter register slack_alerter \
    --flavor=slack \
    --slack_token={{slack_token.oauth_token}} \
    --slack_channel_id=<SLACK_CHANNEL_ID>
zenml stack update <STACK_NAME> -al slack_alerter
```
```bash
# Run individual pipelines
python run.py --feature   # Feature engineering (Articles 10, 12)
python run.py --train     # Model training (Articles 9, 11, 15)
python run.py --deploy    # Deployment (Articles 14, 17, 18)

# Pipeline options
python run.py --train --auto-approve              # Skip manual approval steps
python run.py --feature --no-cache                # Disable ZenML caching
python run.py --deploy --config-dir ./my-configs  # Custom config directory
```
The project includes a Streamlit-based compliance dashboard that provides:
- Real-time visibility into EU AI Act compliance status
- Executive summary of current risk levels and compliance metrics
- Generated Annex IV documentation with export options
To run the dashboard:
```bash
# Launch the Streamlit compliance dashboard
python run_dashboard.py
```
Note: All compliance artifacts are also directly accessible through the ZenML dashboard. The Streamlit dashboard is provided as a convenient additional interface for browsing compliance information locally and offline.
Pipeline configurations are stored in `src/configs/`:

- `feature_engineering.yaml` - Data processing and profiling settings
- `training.yaml` - Model training and evaluation parameters
- `deployment.yaml` - Deployment and monitoring configuration
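For orientation, a config file might look like the following hypothetical `training.yaml` fragment. Every key shown here is an illustrative assumption; consult the actual files in `src/configs/` for the real schema:

```yaml
# Hypothetical sketch only - not the project's actual schema.
model:
  type: lightgbm
  class_weight: balanced        # class-imbalance handling
evaluation:
  metrics: [accuracy, auc]
  fairness:
    protected_attribute: CODE_GENDER
risk:
  approval_threshold: 0.8       # minimum score before registry entry
```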
You can store artifacts and run pipelines locally, but storing them in the cloud enables you to visualize the data artifacts produced by pipelines directly in the ZenML dashboard.
See the Cloud Deployment Guide for step-by-step instructions on setting up a cloud artifact store and orchestrator.
Each pipeline run creates a unique release directory in `docs/releases/<run_id>/` containing all compliance artifacts. The following guides explain the artifacts produced and the ZenML features used to produce them:
- ZenML Documentation
- QMS Templates - Quality management system documentation templates
Note: This project provides the technical evidence required by the EU AI Act. For complete compliance, organizations must also maintain formal quality management documentation and processes.
This project is licensed under the Apache License 2.0.