Dual-Agency Housing Challenge 🏠

2nd Place Solution — Regression with Distribution Shift Handling

Overview

This repository contains my solution for the APPLAI ’26 Dual-Agency Housing Challenge, where the goal was to predict median house values from two heterogeneous data sources with severe distribution shifts.

The focus of this work was not only model accuracy, but also robust data preprocessing, distribution-shift detection, and reproducible ML engineering.
The final solution achieved 2nd place on the private leaderboard.

Key Contributions

- Detected and analyzed distribution shifts between merged data sources.
- Designed custom preprocessing transformers to handle:
  - Unit and scale mismatches
  - Bimodal feature distributions
  - Outliers and extreme values
- Built a K-Fold neural network training pipeline with ensembling.
- Saved preprocessing artifacts and trained models for full reproducibility.
- Performed systematic experimentation to improve generalization.

Methodology

1. Data Exploration & Shift Detection

- Exploratory data analysis to compare feature distributions across agencies.
- Identification of inconsistent units and bimodal behaviors in multiple features.
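
One way to quantify a shift between the two agencies is a per-feature two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs). The sketch below is illustrative only: this README does not say which test was used, and the `ks_statistic` helper and the synthetic data are assumptions.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    # Empirical CDF of each sample, evaluated at every observed value.
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Synthetic stand-in for one feature reported by two agencies (hypothetical data).
rng = np.random.default_rng(0)
agency_a = rng.normal(loc=3.0, scale=1.0, size=500)
agency_b_same = rng.normal(loc=3.0, scale=1.0, size=500)
agency_b_scaled = agency_a * 100.0  # mimics a unit mismatch

stat_same = ks_statistic(agency_a, agency_b_same)
stat_shifted = ks_statistic(agency_a, agency_b_scaled)
print(f"same units: {stat_same:.3f}, mismatched units: {stat_shifted:.3f}")
```

A statistic close to 0 suggests the two sources agree on that feature, while a value close to 1 points to a scale or unit problem worth inspecting.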

2. Preprocessing Pipeline

Custom preprocessing components were implemented to stabilize training:

- UnitAligner: Aligns features with inconsistent measurement units.
- BimodalSplitter: Separates and normalizes bimodal feature distributions.
- Clipper: Limits extreme outliers to reduce training instability.
- Robust scaling and missing-value handling.

All preprocessing steps are fitted on training data only and saved as artifacts.
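
As a rough illustration of the "fit on training data only" rule, here is a minimal clipping transformer with a scikit-learn style `fit` / `transform` interface. It is a hypothetical stand-in, not the repository's `Clipper`: the class name `QuantileClipper`, the quantile-based bounds, and the toy data are all assumptions.

```python
import numpy as np


class QuantileClipper:
    """Hypothetical stand-in for a Clipper step: clips each column to
    quantile bounds learned from the training data only."""

    def __init__(self, lower=0.01, upper=0.99):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # nanquantile ignores missing values when learning the bounds.
        self.low_ = np.nanquantile(X, self.lower, axis=0)
        self.high_ = np.nanquantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Bounds come from fit(); test data never influences them.
        return np.clip(X, self.low_, self.high_)


X_train = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])
X_test = np.array([[-50.0], [2.5], [5000.0]])

clipper = QuantileClipper(lower=0.0, upper=0.75).fit(X_train)
X_test_clipped = clipper.transform(X_test)
print(X_test_clipped.ravel())
```

In a scikit-learn setup, a class like this could also inherit from `BaseEstimator` and `TransformerMixin` so that it plugs into a `Pipeline`.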

3. Model Training

- Neural networks built with TensorFlow / Keras.
- K-Fold cross-validation to reduce variance.
- Ensembling across folds for final predictions.
- Evaluation using Mean Squared Error (MSE).
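
The K-Fold training and fold-ensembling loop can be sketched as follows. To keep the example small and dependency-light, a least-squares linear model stands in for the Keras networks; the helper names, the synthetic data, and the choice of five folds are assumptions, not details taken from the notebooks.

```python
import numpy as np


def kfold_indices(n_samples, n_splits, seed):
    """Yield (train_idx, val_idx) pairs from a shuffled K-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for i in range(n_splits):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]


def fit_linear(X, y):
    """Stand-in model (least squares with intercept), not a neural network."""
    X1 = np.column_stack([X, np.ones(len(X))])
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return weights


def predict_linear(weights, X):
    return np.column_stack([X, np.ones(len(X))]) @ weights


# Synthetic regression data (hypothetical).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
X_test = rng.normal(size=(20, 3))

val_mse, test_preds = [], []
for train_idx, val_idx in kfold_indices(len(X), n_splits=5, seed=0):
    w = fit_linear(X[train_idx], y[train_idx])
    val_mse.append(np.mean((predict_linear(w, X[val_idx]) - y[val_idx]) ** 2))
    test_preds.append(predict_linear(w, X_test))

# Ensemble: average the test predictions of the per-fold models.
ensemble_pred = np.mean(test_preds, axis=0)
print(f"mean validation MSE: {np.mean(val_mse):.4f}")
```

Each fold's model is scored on data it never saw, and averaging the per-fold test predictions tends to reduce the variance of the final output.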

4. Experiments

- Baseline vs enhanced preprocessing
- Log-transformed targets
- Loss function variations
- Seed-based ensembling
- Stacking and averaging strategies
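
For the log-transformed target experiment, a common pattern is to train on a log-scaled target and invert the transform before scoring. The snippet below is a sketch under assumptions: this README does not specify the exact transform, so `log1p` / `expm1` and the example values are illustrative.

```python
import numpy as np

# Hypothetical house values; the transform compresses the long right tail.
y = np.array([50_000.0, 120_000.0, 250_000.0, 1_500_000.0])

y_log = np.log1p(y)       # target the model would be trained on
y_back = np.expm1(y_log)  # inverse transform applied to predictions

print(y_log.round(3))
```

Predictions must be mapped back with the inverse transform before they are compared against the original targets or written to a submission file.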

Results

- Private Leaderboard Rank: 🥈 2nd Place
- Private Score (MSE): 0.23875
- Submission file: submission_enhancements.csv

Technologies Used

- Python
- TensorFlow / Keras
- Scikit-learn
- Joblib
- NumPy / Pandas
- Git

Reproducibility

- Fixed random seeds for experiments.
- Saved preprocessing pipelines and trained models using joblib.
- Clear separation between training, validation, and inference steps.
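
Seed fixing might look like the sketch below. The helper name `set_global_seed` is hypothetical, and only the Python and NumPy generators are seeded here so the snippet runs without TensorFlow installed.

```python
import random

import numpy as np


def set_global_seed(seed):
    """Seed the Python and NumPy generators (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    # With TensorFlow installed, tf.random.set_seed(seed) would be added here.


set_global_seed(123)
first = (random.random(), np.random.rand(3))

set_global_seed(123)
second = (random.random(), np.random.rand(3))

# Re-seeding reproduces the same random draws.
print(first[0] == second[0], np.array_equal(first[1], second[1]))
```

Calling the helper once at the top of each notebook or script keeps data shuffles and weight initialization comparable between runs.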

Notes & Limitations

- The solution is optimized for robustness under distribution shifts rather than minimal model complexity.
- Neural networks were chosen for flexibility; tree-based models were explored but underperformed under shifts.
- Further improvements could include uncertainty estimation and automated shift detection.

Author

Menna Thabet
Computer Science & AI Student
