CreditScorer: EU AI Act-Compliant Pipeline

An end‑to‑end credit‑scoring workflow that automatically generates the technical evidence required by the EU AI Act.

*Compliance dashboard (screenshot)*

🎯 Regulatory Context

Financial institutions must comply with the EU AI Act for any high‑risk AI system. Meeting Articles 9–18 requires extensive documentation and auditing. This pipeline delivers a production‑ready workflow that:

  • Generates all required technical evidence automatically
  • Ensures robust risk management and human oversight
  • Maintains full audit trails with versioned artifacts
  • Provides real‑time compliance dashboards for stakeholders

🔍 Data Overview

This project leverages the Home Credit Default Risk dataset provided by the Home Credit Group. The raw dataset contains potentially sensitive attributes such as CODE_GENDER, DAYS_BIRTH, NAME_EDUCATION_TYPE, NAME_FAMILY_STATUS, and NAME_HOUSING_TYPE, which can be filtered using the pipeline's sensitive_attributes parameter to comply with fairness requirements.
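As a minimal, hypothetical illustration of that filtering (assuming a pandas-based ingestion step; the pipeline's `sensitive_attributes` parameter drives the real behavior):

```python
import pandas as pd

# Illustrative list of sensitive columns; in the pipeline these come from
# the `sensitive_attributes` parameter rather than a hard-coded constant.
SENSITIVE_ATTRIBUTES = [
    "CODE_GENDER",
    "DAYS_BIRTH",
    "NAME_EDUCATION_TYPE",
    "NAME_FAMILY_STATUS",
    "NAME_HOUSING_TYPE",
]

df = pd.read_csv("application_train.csv")  # Home Credit Default Risk data
df = df.drop(columns=SENSITIVE_ATTRIBUTES, errors="ignore")
```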

Key fields used for modeling:

| Field | Description |
| --- | --- |
| `AMT_INCOME_TOTAL` | Annual income of the applicant |
| `AMT_CREDIT` | Credit amount of the loan |
| `AMT_ANNUITY` | Loan annuity amount |
| `EXT_SOURCE_1/2/3` | External source scores (credit-history proxies) |
| `TARGET` | Default indicator (0 = no default, 1 = default) |

Automated preprocessing handles the following (see the sketch after this list):

  • Missing value imputation using SimpleImputer
  • Feature scaling with StandardScaler (optional)
  • Categorical encoding with OneHotEncoder
  • Feature engineering including age derivation from DAYS_BIRTH
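
A hypothetical sketch of this preprocessing with scikit-learn; the real step implementations live in src/steps/, and the column choices below are illustrative assumptions:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def add_age_feature(df: pd.DataFrame) -> pd.DataFrame:
    """Derive applicant age in years from DAYS_BIRTH (negative day counts)."""
    df = df.copy()
    df["AGE_YEARS"] = (-df["DAYS_BIRTH"] / 365.25).round(1)
    return df

numeric_cols = ["AMT_INCOME_TOTAL", "AMT_CREDIT", "AMT_ANNUITY", "AGE_YEARS"]
categorical_cols = ["NAME_CONTRACT_TYPE"]  # illustrative categorical column

preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),  # scaling is optional, per config
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])
```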

🚀 Pipeline Architecture

*End-to-end architecture (diagram)*

The system implements three main pipelines that map directly to EU AI Act requirements:

| Pipeline | Key Steps | EU AI Act Focus |
| --- | --- | --- |
| Feature Engineering | Ingest → record SHA‑256 provenance 📥<br>Profile → WhyLogs data governance 📊<br>Preprocess → impute, encode, normalize 🔧 | Arts. 10, 12, 15 |
| Training | Train → LightGBM with class‑imbalance handling 🎯<br>Evaluate → accuracy, AUC, fairness analysis ⚖️<br>Assess → risk scoring & model registry 📋 | Arts. 9, 11, 15 |
| Deployment | Approve → human oversight gate 🙋‍♂️<br>Deploy → Modal API deployment 🚀<br>Monitor → SBOM + post‑market tracking 📈 | Arts. 14, 17, 18 |

Each pipeline run automatically versions all inputs/outputs, generates profiling reports, creates risk assessments, produces a Software Bill of Materials (SBOM), and compiles complete Annex IV technical documentation.
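For orientation, a minimal sketch of how such a pipeline is wired together with ZenML's `@step` and `@pipeline` decorators; the step names and bodies here are illustrative stand-ins for the real definitions in src/pipelines/ and src/steps/:

```python
import hashlib
from pathlib import Path

import pandas as pd
from zenml import pipeline, step

@step
def ingest(path: str) -> pd.DataFrame:
    # Record SHA-256 provenance of the raw input (Art. 12 record-keeping).
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    print(f"Ingested {path} (sha256={digest})")
    return pd.read_csv(path)

@step
def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for the real impute/encode/normalize logic.
    return df.fillna(df.median(numeric_only=True))

@pipeline
def feature_engineering(path: str = "application_train.csv"):
    df = ingest(path)
    preprocess(df)

if __name__ == "__main__":
    feature_engineering()  # every run is versioned by ZenML automatically
```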

🛠️ Project Structure

```
credit-scorer/
│
├── run.py                  # Main pipeline execution script
├── run_dashboard.py        # Dashboard launcher
│
├── src/
│   ├── data/               # Dataset directory
│   ├── pipelines/          # Pipeline definitions
│   ├── steps/              # Pipeline step implementations
│   ├── configs/            # Configuration files
│   ├── utils/              # Utility functions
│   └── constants/          # Project constants
│
├── streamlit_app/          # Compliance dashboard
├── modal_app/              # Modal deployment code
├── docs/                   # Documentation and compliance artifacts
├── models/                 # Saved model artifacts
├── assets/                 # Images and static resources
└── scripts/                # Helper scripts
```

🚀 Getting Started

Prerequisites

  • Python 3.12+
  • ZenML >= 0.82.1
  • Modal account (for deployment pipeline)
  • WhyLogs integration (for data profiling)

Installation & Configuration

  1. Install dependencies:

```bash
pip install -r requirements.txt
```

  2. Set up ZenML:

```bash
zenml init
```

  3. Install the WhyLogs integration for data profiling:

```bash
zenml integration install whylogs
zenml data-validator register whylogs_data_validator --flavor=whylogs
zenml stack update <STACK_NAME> -dv whylogs_data_validator
```

  4. Install the Slack integration for the deployment approval gate and incident reporting:

```bash
zenml integration install slack
zenml secret create slack_token --oauth_token=<SLACK_TOKEN>
zenml alerter register slack_alerter \
    --flavor=slack \
    --slack_token={{slack_token.oauth_token}} \
    --slack_channel_id=<SLACK_CHANNEL_ID>
zenml stack update <STACK_NAME> -al slack_alerter
```

📊 Running Pipelines

Basic Commands

```bash
# Run individual pipelines
python run.py --feature    # Feature engineering (Articles 10, 12)
python run.py --train      # Model training (Articles 9, 11, 15)
python run.py --deploy     # Deployment (Articles 14, 17, 18)

# Pipeline options
python run.py --train --auto-approve               # Skip manual approval steps
python run.py --feature --no-cache                 # Disable ZenML caching
python run.py --deploy --config-dir ./my-configs   # Custom config directory
```
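For context, a hypothetical sketch of how run.py might map these flags onto pipeline runs; the actual script may differ, and `with_options(enable_cache=...)` is ZenML's standard way to toggle caching:

```python
import argparse

# Hypothetical flag-dispatch sketch; see run.py for the real implementation.
parser = argparse.ArgumentParser(description="Run credit-scorer pipelines")
parser.add_argument("--feature", action="store_true", help="Run feature engineering")
parser.add_argument("--train", action="store_true", help="Run model training")
parser.add_argument("--deploy", action="store_true", help="Run deployment")
parser.add_argument("--no-cache", action="store_true", help="Disable ZenML caching")
parser.add_argument("--auto-approve", action="store_true", help="Skip manual approval")
parser.add_argument("--config-dir", default="src/configs", help="Config directory")
args = parser.parse_args()

if args.feature:
    # e.g. feature_engineering.with_options(enable_cache=not args.no_cache)()
    print(f"Running feature engineering (cache={'off' if args.no_cache else 'on'})")
```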

View Compliance Dashboard

The project includes a Streamlit-based compliance dashboard that provides:

  • Real-time visibility into EU AI Act compliance status
  • Executive summary of current risk levels and compliance metrics
  • Generated Annex IV documentation with export options

To run the dashboard:

```bash
# Launch the Streamlit compliance dashboard
python run_dashboard.py
```

Note: All compliance artifacts are also directly accessible through the ZenML dashboard. The Streamlit dashboard is provided as a convenient additional interface for browsing compliance information locally and offline.

🔧 Configuration

Pipeline configurations are stored in src/configs/ (an illustrative excerpt follows this list):

  • feature_engineering.yaml - Data processing and profiling settings
  • training.yaml - Model training and evaluation parameters
  • deployment.yaml - Deployment and monitoring configuration
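
As a purely illustrative example, here is roughly what a training config could contain, parsed with PyYAML; the key names are assumptions, not the project's actual schema:

```python
import yaml

# Hypothetical excerpt of src/configs/training.yaml; key names are
# illustrative, not the project's actual schema.
example = """
model:
  type: lightgbm
  class_weight: balanced
evaluation:
  metrics: [accuracy, auc]
  fairness:
    sensitive_attributes: [CODE_GENDER]
"""
cfg = yaml.safe_load(example)
print(cfg["model"]["type"])  # -> lightgbm
```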

☁️ Cloud Deployment

You can store artifacts and run pipelines locally, but storing them in the cloud enables you to visualize the data artifacts produced by pipelines directly in the ZenML dashboard.

See the Cloud Deployment Guide for step-by-step instructions on setting up a cloud artifact store and orchestrator.

📄 Generated Artifacts

Each pipeline run creates a unique release directory in docs/releases/<run_id>/ containing all compliance artifacts; the Documentation section below collects guides to navigating those artifacts and the ZenML features used to produce them.
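The exact contents vary by run, but based on the artifacts described above, a release directory might look roughly like this (the file and folder names are hypothetical):

```
docs/releases/<run_id>/
├── annex_iv/     # Annex IV technical documentation
├── profiles/     # WhyLogs data-profiling reports
├── risk/         # Risk assessment records
└── sbom/         # Software Bill of Materials (SBOM)
```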

📚 Documentation

Note: This project provides the technical evidence required by the EU AI Act. For complete compliance, organizations must also maintain formal quality management documentation and processes.

Additional Resources

📄 License

This project is licensed under the Apache License 2.0.