ISIC 2024 Skin Cancer Detection with 3D total body photographs

Project Overview

This repository contains the implementation of our machine learning project, Skin Cancer Detection with 3D-TBP (3D total body photography), developed for the ISIC 2024 Challenge. The goal is to build models that accurately distinguish malignant from benign skin lesions using both lesion images and their metadata, including low-quality images resembling smartphone photographs.

Dataset

Data Sources

ISIC 2024 Challenge Dataset
- Over 400,000 images of individual skin lesions.
- Metadata with 54 features, including patient demographics, lesion characteristics, and diagnostic labels.
External Datasets

Data Distribution

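Malignant cases are heavily underrepresented compared with benign cases, which motivates the class-balancing and stratified cross-validation steps described below.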

Methodology

Tabular Data

Preprocessing Techniques
- Handling Missing Values: Imputation of numerical features (e.g., filling with the column mean).
- Categorical Features: One-hot encoding for variables such as sex and lesion location.
- Normalization: Scaling numerical features so that no single feature dominates because of its range.
- Feature Selection: Correlation-matrix analysis to identify redundant features, and PCA for dimensionality reduction.
- Balancing Classes: Applied SMOTE and ADASYN to oversample malignant cases.
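The preprocessing steps above can be illustrated with a minimal pure-Python sketch. The helper names (`mean_impute`, `one_hot`, `standardize`, `smote`) are hypothetical and not taken from this repository, which would more typically rely on libraries such as scikit-learn and imbalanced-learn. The `smote` function shows only the SMOTE interpolation idea; ADASYN differs by generating more synthetic points near hard-to-classify minority samples.

```python
import math
import random

# Hypothetical helpers illustrating the tabular preprocessing steps; the
# repository's real pipeline and function names are not shown in this README.


def mean_impute(rows):
    """Fill None entries in each numeric column with that column's mean."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)] for r in rows]


def one_hot(values):
    """Map each category (e.g. sex, lesion location) to a 0/1 indicator vector."""
    categories = sorted(set(values))
    return categories, [[1 if v == cat else 0 for cat in categories] for v in values]


def standardize(rows):
    """Z-score each column so no feature dominates because of its units."""
    n_cols = len(rows[0])
    out = [list(r) for r in rows]
    for c in range(n_cols):
        col = [r[c] for r in rows]
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((x - mu) ** 2 for x in col) / len(col))
        for r in out:
            r[c] = (r[c] - mu) / sd if sd > 0 else 0.0
    return out


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic point lies on the segment between
    a random minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Oversampling of this kind is normally applied only inside each training fold, so that synthetic malignant points never leak into the validation data.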
Models Explored
- Random Forest, Extra Trees, XGBoost, and LightGBM.
- LightGBM and XGBoost achieved the best results.
Optimization
- Cross-validation (5-fold stratified).
- Bayesian optimization for hyperparameter tuning.
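Stratified 5-fold splitting can be sketched as follows. This is a minimal pure-Python illustration under the assumption of binary labels (1 = malignant); the function name and seed are hypothetical, and in practice a library splitter such as scikit-learn's `StratifiedKFold` would typically be used. During Bayesian hyperparameter search, each candidate configuration would then be scored by its average validation metric across these folds.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs in which every fold keeps roughly the
    same malignant/benign ratio as the full dataset."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal this class's samples round-robin so each fold gets an equal share.
        for j, idx in enumerate(idxs):
            folds[j % n_splits].append(idx)

    for k in range(n_splits):
        val = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, val
```

With 10 malignant and 90 benign samples, each of the five validation folds receives exactly 2 malignant cases, which matters when positives are this rare.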

Image Data

Preprocessing Techniques
- Hair Removal: Used the DullRazor Algorithm to remove hair artifacts.
- Image Resizing: All images resized to 224x224 pixels to ensure uniform input.
- Data Augmentation: (1) random horizontal and vertical flips; (2) random resized cropping.
- Normalization: Standardized each color channel using the mean and standard deviation expected by the pre-trained models.
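The flip augmentations and channel normalization above can be sketched in pure Python on an image stored as a list of rows of RGB tuples. This is only an illustration: the ImageNet mean and standard deviation values are an assumption (they are the usual choice for ImageNet-pretrained MobileNet and ViT weights, but the README does not state the values used), and resizing, random resized cropping, and DullRazor hair removal are omitted. In a PyTorch pipeline the same steps would typically be expressed with torchvision's `RandomHorizontalFlip`, `RandomVerticalFlip`, `RandomResizedCrop`, and `Normalize` transforms.

```python
import random

# Assumption: standard ImageNet channel statistics, the usual choice for
# ImageNet-pretrained MobileNet / ViT weights. The README does not list the values used.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def hflip(img):
    """Mirror an H x W x 3 image (list of rows of RGB tuples) left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top to bottom."""
    return img[::-1]


def random_flips(img, rng, p=0.5):
    """Apply each flip independently with probability p (training-time only)."""
    if rng.random() < p:
        img = hflip(img)
    if rng.random() < p:
        img = vflip(img)
    return img


def normalize(img, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale 0-255 pixels to [0, 1], then standardize each RGB channel."""
    return [
        [tuple((px[c] / 255.0 - mean[c]) / std[c] for c in range(3)) for px in row]
        for row in img
    ]
```

Flips are applied only during training, while normalization is applied identically at training and inference time so that inputs match the statistics the pre-trained weights expect.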
Models Explored
- MobileNet
- Vision Transformer (ViT)
Cross-validation

Used stratified cross-validation, which preserves the malignant/benign ratio in every fold, to obtain more reliable performance estimates and reduce overfitting.

Ensemble Models

Combined predictions from tabular and image models for better overall performance.

Method

- Arithmetic Mean
- Geometric Mean
- Soft Voting
- Stacking
- Bagging
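The simpler combination rules listed above can be sketched as per-lesion operations on the malignancy probabilities produced by each model. This is a minimal illustration with hypothetical function names and weights, not the repository's code; stacking (training a meta-model on out-of-fold predictions) and bagging are not shown.

```python
import math


def arithmetic_mean(prob_lists):
    """Average the malignancy probabilities predicted by each model per lesion."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def geometric_mean(prob_lists, eps=1e-7):
    """Geometric mean per lesion; eps guards against log(0)."""
    out = []
    for ps in zip(*prob_lists):
        logs = [math.log(min(max(p, eps), 1.0)) for p in ps]
        out.append(math.exp(sum(logs) / len(logs)))
    return out


def soft_vote(prob_lists, weights):
    """Weighted average of probabilities, e.g. to trust one model more."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, ps)) / total for ps in zip(*prob_lists)]
```

For example, with a tabular model predicting `[0.2, 0.9]` and an image model predicting `[0.8, 0.1]` for two lesions, the arithmetic mean gives `[0.5, 0.5]` while the geometric mean gives `[0.4, 0.3]`: the geometric mean penalizes disagreement, pulling the score toward the lower prediction.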

Evaluation

Metric: Partial Area Under the ROC Curve (pAUC) above an 80% True Positive Rate (TPR). Because only the region of the ROC curve with TPR ≥ 0.8 is counted, scores range from 0.0 to 0.2.
Implementation: ISIC pAUC above TPR
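A pure-Python sketch of this metric is shown below. It follows the same idea as the challenge's reference scoring code as we understand it (swap the labels, invert the scores, then compute an unnormalized partial AUC up to FPR = 0.2), but the function names are hypothetical and reported scores should come from the official ISIC implementation.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points from the strictest to the loosest threshold; tied scores share one point."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative labels")
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(points, max_fpr):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr] (unnormalized)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the max_fpr cut-off.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def pauc_above_tpr(y_true, scores, min_tpr=0.80):
    """pAUC over the region TPR >= min_tpr, with 1 = malignant.

    Swapping the labels and negating the scores maps the band TPR >= min_tpr
    onto FPR <= 1 - min_tpr, so an ordinary partial AUC can be used.
    The result lies in [0, 1 - min_tpr].
    """
    flipped_labels = [1 - y for y in y_true]
    flipped_scores = [-s for s in scores]
    return partial_auc(roc_points(flipped_labels, flipped_scores), 1.0 - min_tpr)
```

A perfect ranking scores 0.2, a fully inverted ranking scores 0.0, and a model that gives every lesion the same score obtains 0.02, the area of the triangle between the diagonal and TPR = 0.8.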
