nastiapetrovych/ISIC-2024---Skin-Cancer-Detection-with-3D-TBP

 
 

ISIC 2024 Skin Cancer Detection with 3D total body photographs

Table of Contents

  • Project Overview
  • Dataset
  • Methodology
    • Tabular Data
    • Image Data
    • Ensemble Models
  • Evaluation
  • Results

Project Overview

This repository contains the implementation of our machine learning project, Skin Cancer Detection with 3D-TBP, developed for the ISIC 2024 Challenge. The goal is to build machine learning models that accurately distinguish malignant from benign skin lesions using both images and metadata, even when the photographs are of low, smartphone-like quality.

Dataset

Data Sources

  • ISIC 2024 Challenge Dataset
    • Over 400,000 images of individual skin lesions.
    • Metadata with 54 features, including patient demographics, lesion characteristics, and diagnostic labels.
  • External Datasets

Data Distribution

  • Malignant cases are significantly underrepresented.

[Figure: target value distribution]

Methodology

Tabular Data

  • Preprocessing Techniques
    • Handling Missing Values: Imputation (e.g., mean filling) for numerical features.
    • Categorical Features: One-hot encoding for variables like sex and lesion location.
    • Normalization: Scaling features to ensure balanced contributions.
    • Feature Selection: Correlation matrix and PCA for dimensionality reduction.
    • Balancing Classes: Applied SMOTE and ADASYN to oversample malignant cases.
  • Models Explored
    • Random Forest, Extra Trees, XGBoost, and LightGBM.
    • LightGBM and XGBoost achieved the best results.
  • Optimization
    • Cross-validation (5-fold stratified).
    • Bayesian optimization for hyperparameter tuning.
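The tabular steps above (mean imputation, one-hot encoding, scaling, and 5-fold stratified cross-validation) can be sketched as follows. This is a minimal illustration and not the project's actual code: the column names and synthetic data are hypothetical, Extra Trees stands in for the boosted models, and `class_weight="balanced"` is used here in place of SMOTE/ADASYN to keep the example dependent only on pandas and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in for the metadata table (hypothetical column names).
df = pd.DataFrame({
    "age": rng.normal(55, 15, n),
    "lesion_size_mm": rng.gamma(2.0, 2.0, n),
    "sex": rng.choice(["male", "female"], n),
    "site": rng.choice(["torso", "arm", "leg", "head"], n),
})
df.loc[rng.choice(n, 40, replace=False), "age"] = np.nan  # simulate missing values

# Imbalanced binary target: 10% positives.
y = np.zeros(n, dtype=int)
y[rng.choice(n, 40, replace=False)] = 1

numeric = ["age", "lesion_size_mm"]
categorical = ["sex", "site"]

# Preprocessing lives inside the pipeline so it is re-fit on each training fold.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", ExtraTreesClassifier(n_estimators=100, class_weight="balanced",
                                 random_state=0)),
])

# Stratified folds keep the malignant/benign ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, valid_idx in cv.split(df, y):
    model.fit(df.iloc[train_idx], y[train_idx])
    proba = model.predict_proba(df.iloc[valid_idx])[:, 1]
    scores.append(roc_auc_score(y[valid_idx], proba))

print("per-fold ROC AUC:", np.round(scores, 3))
```

Keeping the imputer, scaler, and encoder inside the pipeline means their statistics are computed from the training fold only, which avoids leaking validation data into preprocessing. The same caution applies to oversampling: it should be applied to training folds only.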

Image Data

  • Preprocessing Techniques

    • Hair Removal: Used the DullRazor Algorithm to remove hair artifacts.
    • Image Resizing: All images resized to 224x224 pixels to ensure uniform input.
    • Data Augmentation: Random horizontal and vertical flips, and random resized cropping.
    • Normalization: Applied mean and standard deviation values to match pre-trained model requirements.
  • Models Explored

  • Cross-validation

    Used stratified cross-validation so that every fold preserves the class ratio, which gives more reliable performance estimates and helps detect overfitting.
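The resizing, augmentation, and normalization steps listed above can be illustrated with a dependency-light NumPy sketch. It is an assumption-laden illustration and not the project's pipeline: the function names are hypothetical, nearest-neighbour resizing is used for simplicity, the crop keeps the original aspect ratio, and the ImageNet mean and standard deviation are assumed as the "pre-trained model requirements". Hair removal is not covered here.

```python
import numpy as np

# Channel statistics commonly used with ImageNet-pretrained models (assumed here).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def resize_nearest(img, size=224):
    """Resize an HxWxC image to size x size with nearest-neighbour sampling."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]


def random_resized_crop(img, rng, size=224, scale=(0.5, 1.0)):
    """Crop a random region covering `scale` of the area, then resize it."""
    h, w = img.shape[:2]
    side = np.sqrt(rng.uniform(*scale))  # side-length fraction of the crop
    ch = max(1, int(round(h * side)))
    cw = max(1, int(round(w * side)))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return resize_nearest(img[top:top + ch, left:left + cw], size)


def random_flips(img, rng, p=0.5):
    """Apply horizontal and vertical flips independently with probability p."""
    if rng.random() < p:
        img = img[:, ::-1]
    if rng.random() < p:
        img = img[::-1]
    return img


def normalize(img_uint8, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale to [0, 1] and standardize each channel."""
    x = img_uint8.astype(np.float32) / 255.0
    return (x - mean) / std


def train_transform(img, rng):
    """Augmented path used for training images."""
    return normalize(random_flips(random_resized_crop(img, rng), rng))


def eval_transform(img):
    """Deterministic path used for validation and test images."""
    return normalize(resize_nearest(img))


rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)  # fake lesion image
train_out = train_transform(photo, rng)
eval_out = eval_transform(photo)
print(train_out.shape, eval_out.shape, train_out.dtype)
```

Random augmentations are applied only on the training path; validation and test images go through the deterministic resize and normalization so that scores are reproducible.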

Ensemble Models

Predictions from the tabular and image models were combined to improve overall performance.

Methods

  • Arithmetic Mean
  • Geometric Mean
  • Soft Voting
  • Stacking
  • Bagging
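The simplest of these combiners operate directly on the predicted probabilities. The sketch below shows the arithmetic mean, the geometric mean, and a weighted soft vote; the function names, the example probabilities, and the weights are hypothetical, and stacking and bagging are not shown.

```python
import numpy as np


def arithmetic_mean(preds):
    """Average probabilities across models; preds has shape (n_models, n_samples)."""
    return np.mean(np.asarray(preds, dtype=float), axis=0)


def geometric_mean(preds, eps=1e-12):
    """Geometric mean computed in log space; eps guards against log(0)."""
    p = np.clip(np.asarray(preds, dtype=float), eps, 1.0)
    return np.exp(np.mean(np.log(p), axis=0))


def soft_vote(preds, weights):
    """Weighted average of probabilities; weights are normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    return np.average(np.asarray(preds, dtype=float), axis=0, weights=w / w.sum())


# Hypothetical malignancy probabilities for three lesions from two models.
tabular_pred = np.array([0.2, 0.8, 0.5])
image_pred = np.array([0.4, 0.6, 0.1])
stacked = np.stack([tabular_pred, image_pred])

mean_pred = arithmetic_mean(stacked)
geo_pred = geometric_mean(stacked)
vote_pred = soft_vote(stacked, weights=[3, 1])  # trust the tabular model more

print(mean_pred, geo_pred, vote_pred)
```

The geometric mean is pulled toward the lower prediction, so a lesion scores high only when both models agree, while the arithmetic mean treats disagreement symmetrically.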

Evaluation

  • Metric: Partial area under the ROC curve (pAUC) above an 80% true positive rate (TPR). Because only the region with TPR ≥ 0.8 is counted, scores range from 0.0 to 0.2.

  • Implementation: ISIC pAUC above TPR
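As a rough illustration of how such a score can be computed, the sketch below flips the labels and negates the scores so that the region above 80% TPR becomes the region below 20% FPR, then integrates the ROC curve up to that limit. It assumes binary labels with 1 = malignant, uses scikit-learn's `roc_curve` and `auc`, and the function name `pauc_above_tpr` is hypothetical; it is not the official scoring code.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve


def pauc_above_tpr(y_true, y_score, min_tpr=0.80):
    """Area under the ROC curve restricted to TPR >= min_tpr (max is 1 - min_tpr)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)

    # Swap classes and reverse the ranking: the ROC curve is mirrored, so
    # "TPR >= min_tpr" turns into "FPR <= 1 - min_tpr".
    flipped_true = 1 - y_true
    flipped_score = -y_score
    max_fpr = 1.0 - min_tpr

    fpr, tpr, _ = roc_curve(flipped_true, flipped_score)

    # Cut the curve at max_fpr, interpolating the TPR at the cut point.
    stop = np.searchsorted(fpr, max_fpr, side="right")
    tpr_at_cut = np.interp(max_fpr, [fpr[stop - 1], fpr[stop]],
                           [tpr[stop - 1], tpr[stop]])
    fpr = np.append(fpr[:stop], max_fpr)
    tpr = np.append(tpr[:stop], tpr_at_cut)
    return auc(fpr, tpr)


labels = np.array([0, 0, 1, 1])
perfect = pauc_above_tpr(labels, [0.1, 0.2, 0.8, 0.9])   # malignant ranked highest
inverted = pauc_above_tpr(labels, [0.9, 0.8, 0.2, 0.1])  # malignant ranked lowest
print(perfect, inverted)
```

A classifier that ranks every malignant lesion above every benign one reaches the upper bound of 0.2, and a fully inverted ranking scores 0.0.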

Results

Tabular Models

  • Best Model: XGBoost
  • Results on Kaggle Submission

[Figure: Tabular Model Comparison]

Image Models

[Figure: Image Model Comparison]

Ensemble Models

[Figure: Ensemble Model]

About

The Machine Learning course project
