CreditScorer: EU AI Act-Compliant Pipeline

An end‑to‑end credit‑scoring workflow that automatically generates the technical evidence required by the EU AI Act.

*Compliance dashboard (screenshot)*

🎯 Regulatory Context

Financial institutions must comply with the EU AI Act for any high‑risk AI system. Meeting Articles 9–18 requires extensive documentation and auditing. This pipeline delivers a production‑ready workflow that:

  • Generates all required technical evidence automatically
  • Ensures robust risk management and human oversight
  • Maintains full audit trails with versioned artifacts
  • Provides real‑time compliance dashboards for stakeholders

🔍 Data Overview

This project leverages the Home Credit Default Risk dataset provided by the Home Credit Group. The raw dataset contains potentially sensitive attributes such as CODE_GENDER, DAYS_BIRTH, NAME_EDUCATION_TYPE, NAME_FAMILY_STATUS, and NAME_HOUSING_TYPE, which can be filtered using the pipeline's sensitive_attributes parameter to comply with fairness requirements.
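As a minimal, hypothetical illustration of that filtering (assuming a pandas-based ingestion step; the pipeline's `sensitive_attributes` parameter drives the real behavior):

```python
import pandas as pd

# Illustrative list of sensitive columns; in the pipeline these come from
# the `sensitive_attributes` parameter rather than a hard-coded constant.
SENSITIVE_ATTRIBUTES = [
    "CODE_GENDER",
    "DAYS_BIRTH",
    "NAME_EDUCATION_TYPE",
    "NAME_FAMILY_STATUS",
    "NAME_HOUSING_TYPE",
]

df = pd.read_csv("application_train.csv")  # Home Credit Default Risk data
df = df.drop(columns=SENSITIVE_ATTRIBUTES, errors="ignore")
```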

Key fields used for modeling:

| Field | Description |
| --- | --- |
| `AMT_INCOME_TOTAL` | Annual income of the applicant |
| `AMT_CREDIT` | Credit amount of the loan |
| `AMT_ANNUITY` | Loan annuity amount |
| `EXT_SOURCE_1/2/3` | External source scores (credit-history proxies) |
| `TARGET` | Default indicator (0 = no default, 1 = default) |

Automated preprocessing handles the following (see the sketch after this list):

  • Missing value imputation using SimpleImputer
  • Feature scaling with StandardScaler (optional)
  • Categorical encoding with OneHotEncoder
  • Feature engineering including age derivation from DAYS_BIRTH
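
A hypothetical sketch of this preprocessing with scikit-learn; the real step implementations live in src/steps/, and the column choices below are illustrative assumptions:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def add_age_feature(df: pd.DataFrame) -> pd.DataFrame:
    """Derive applicant age in years from DAYS_BIRTH (negative day counts)."""
    df = df.copy()
    df["AGE_YEARS"] = (-df["DAYS_BIRTH"] / 365.25).round(1)
    return df

numeric_cols = ["AMT_INCOME_TOTAL", "AMT_CREDIT", "AMT_ANNUITY", "AGE_YEARS"]
categorical_cols = ["NAME_CONTRACT_TYPE"]  # illustrative categorical column

preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),  # scaling is optional, per config
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])
```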

🚀 Pipeline Architecture

*End-to-end architecture (diagram)*

The system implements three main pipelines that map directly to EU AI Act requirements:

| Pipeline | Key Steps | EU AI Act Focus |
| --- | --- | --- |
| Feature Engineering | Ingest → record SHA‑256 provenance 📥<br>Profile → WhyLogs data governance 📊<br>Preprocess → impute, encode, normalize 🔧 | Arts. 10, 12, 15 |
| Training | Train → LightGBM with class‑imbalance handling 🎯<br>Evaluate → accuracy, AUC, fairness analysis ⚖️<br>Assess → risk scoring & model registry 📋 | Arts. 9, 11, 15 |
| Deployment | Approve → human oversight gate 🙋‍♂️<br>Deploy → Modal API deployment 🚀<br>Monitor → SBOM + post‑market tracking 📈 | Arts. 14, 17, 18 |

Each pipeline run automatically versions all inputs/outputs, generates profiling reports, creates risk assessments, produces a Software Bill of Materials (SBOM), and compiles complete Annex IV technical documentation.
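For orientation, a minimal sketch of how such a pipeline is wired together with ZenML's `@step` and `@pipeline` decorators; the step names and bodies here are illustrative stand-ins for the real definitions in src/pipelines/ and src/steps/:

```python
import hashlib
from pathlib import Path

import pandas as pd
from zenml import pipeline, step

@step
def ingest(path: str) -> pd.DataFrame:
    # Record SHA-256 provenance of the raw input (Art. 12 record-keeping).
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    print(f"Ingested {path} (sha256={digest})")
    return pd.read_csv(path)

@step
def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for the real impute/encode/normalize logic.
    return df.fillna(df.median(numeric_only=True))

@pipeline
def feature_engineering(path: str = "application_train.csv"):
    df = ingest(path)
    preprocess(df)

if __name__ == "__main__":
    feature_engineering()  # every run is versioned by ZenML automatically
```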

🛠️ Project Structure

```
credit-scorer/
│
├── run.py                  # Main pipeline execution script
├── run_dashboard.py        # Dashboard launcher
│
├── src/
│   ├── data/               # Dataset directory
│   ├── pipelines/          # Pipeline definitions
│   ├── steps/              # Pipeline step implementations
│   ├── configs/            # Configuration files
│   ├── utils/              # Utility functions
│   └── constants/          # Project constants
│
├── streamlit_app/          # Compliance dashboard
├── modal_app/              # Modal deployment code
├── docs/                   # Documentation and compliance artifacts
├── models/                 # Saved model artifacts
├── assets/                 # Images and static resources
└── scripts/                # Helper scripts
```

🚀 Getting Started

Prerequisites

  • Python 3.12+
  • ZenML >= 0.82.1
  • Modal account (for deployment pipeline)
  • WhyLogs integration (for data profiling)

Installation & Configuration

  1. Install dependencies:

```bash
pip install -r requirements.txt
```

  2. Set up ZenML:

```bash
zenml init
```

  3. Install the WhyLogs integration for data profiling:

```bash
zenml integration install whylogs
zenml data-validator register whylogs_data_validator --flavor=whylogs
zenml stack update <STACK_NAME> -dv whylogs_data_validator
```

  4. Install the Slack integration for the deployment approval gate and incident reporting:

```bash
zenml integration install slack
zenml secret create slack_token --oauth_token=<SLACK_TOKEN>
zenml alerter register slack_alerter \
    --flavor=slack \
    --slack_token={{slack_token.oauth_token}} \
    --slack_channel_id=<SLACK_CHANNEL_ID>
zenml stack update <STACK_NAME> -al slack_alerter
```

📊 Running Pipelines

Basic Commands

```bash
# Run individual pipelines
python run.py --feature    # Feature engineering (Articles 10, 12)
python run.py --train      # Model training (Articles 9, 11, 15)
python run.py --deploy     # Deployment (Articles 14, 17, 18)

# Pipeline options
python run.py --train --auto-approve               # Skip manual approval steps
python run.py --feature --no-cache                 # Disable ZenML caching
python run.py --deploy --config-dir ./my-configs   # Custom config directory
```
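For context, a hypothetical sketch of how run.py might map these flags onto pipeline runs; the actual script may differ, and `with_options(enable_cache=...)` is ZenML's standard way to toggle caching:

```python
import argparse

# Hypothetical flag-dispatch sketch; see run.py for the real implementation.
parser = argparse.ArgumentParser(description="Run credit-scorer pipelines")
parser.add_argument("--feature", action="store_true", help="Run feature engineering")
parser.add_argument("--train", action="store_true", help="Run model training")
parser.add_argument("--deploy", action="store_true", help="Run deployment")
parser.add_argument("--no-cache", action="store_true", help="Disable ZenML caching")
parser.add_argument("--auto-approve", action="store_true", help="Skip manual approval")
parser.add_argument("--config-dir", default="src/configs", help="Config directory")
args = parser.parse_args()

if args.feature:
    # e.g. feature_engineering.with_options(enable_cache=not args.no_cache)()
    print(f"Running feature engineering (cache={'off' if args.no_cache else 'on'})")
```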

View Compliance Dashboard

The project includes a Streamlit-based compliance dashboard that provides:

  • Real-time visibility into EU AI Act compliance status
  • Executive summary of current risk levels and compliance metrics
  • Generated Annex IV documentation with export options

To run the dashboard:

```bash
# Launch the Streamlit compliance dashboard
python run_dashboard.py
```

Note: All compliance artifacts are also directly accessible through the ZenML dashboard. The Streamlit dashboard is provided as a convenient additional interface for browsing compliance information locally and offline.

🔧 Configuration

Pipeline configurations are stored in src/configs/ (an illustrative excerpt follows this list):

  • feature_engineering.yaml - Data processing and profiling settings
  • training.yaml - Model training and evaluation parameters
  • deployment.yaml - Deployment and monitoring configuration
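
As a purely illustrative example, here is roughly what a training config could contain, parsed with PyYAML; the key names are assumptions, not the project's actual schema:

```python
import yaml

# Hypothetical excerpt of src/configs/training.yaml; key names are
# illustrative, not the project's actual schema.
example = """
model:
  type: lightgbm
  class_weight: balanced
evaluation:
  metrics: [accuracy, auc]
  fairness:
    sensitive_attributes: [CODE_GENDER]
"""
cfg = yaml.safe_load(example)
print(cfg["model"]["type"])  # -> lightgbm
```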

☁️ Cloud Deployment

You can store artifacts and run pipelines locally, but storing them in the cloud enables you to visualize the data artifacts produced by pipelines directly in the ZenML dashboard.

See the Cloud Deployment Guide for step-by-step instructions on setting up a cloud artifact store and orchestrator.

📄 Generated Artifacts

Each pipeline run creates a unique release directory in docs/releases/<run_id>/ containing all compliance artifacts; the Documentation section below collects guides to navigating those artifacts and the ZenML features used to produce them.
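The exact contents vary by run, but based on the artifacts described above, a release directory might look roughly like this (the file and folder names are hypothetical):

```
docs/releases/<run_id>/
├── annex_iv/     # Annex IV technical documentation
├── profiles/     # WhyLogs data-profiling reports
├── risk/         # Risk assessment records
└── sbom/         # Software Bill of Materials (SBOM)
```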

📚 Documentation

Note: This project provides the technical evidence required by the EU AI Act. For complete compliance, organizations must also maintain formal quality management documentation and processes.

Additional Resources

📄 License

This project is licensed under the Apache License 2.0.