
This research uses machine learning algorithms to predict cancer patient survival rates. Using the TRACERx lung cancer dataset, we aim to improve prognostic accuracy by combining feature generation methods with predictive models.

rafipatel/MLCancerResearch

ML Cancer Research Project

Overview

This repository contains our research project on predicting cancer progression and survival rates by combining evolutionary cancer trees with machine learning algorithms. The approach uses multi-regional sequencing data to model cancer dynamics and improve prognostic accuracy.

Cancer is a fundamentally genetic disease characterized by complex clonal evolution, where different cancer cell populations evolve over time. Understanding these evolutionary dynamics is critical for developing targeted treatments and improving prognostic accuracy. In this project, we employ evolutionary cancer trees, constructed from multi-regional sequencing data, to model the evolutionary relationships among cancer clones.
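
To make the idea of an evolutionary cancer tree concrete, here is a minimal sketch of how such a tree could be turned into numeric features for a model. It assumes the tree is given as a mapping from each clone to its parent clone. The clone names, the `tree_features` function, and the chosen features are hypothetical and are not taken from this project's code.

```python
from collections import defaultdict

def tree_features(parent_of):
    """Summarise a clone tree given as {clone: parent}, with the root mapped to None."""
    children = defaultdict(list)
    root = None
    for clone, parent in parent_of.items():
        if parent is None:
            root = clone
        else:
            children[parent].append(clone)

    # Walk the tree from the root, recording the depth of every clone.
    depth = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            depth[child] = depth[node] + 1
            stack.append(child)

    # A leaf is a clone with no descendant clones.
    leaves = [c for c in parent_of if not children[c]]
    return {
        "n_clones": len(parent_of),
        "max_depth": max(depth.values()),
        "n_leaves": len(leaves),
        "max_branching": max((len(v) for v in children.values()), default=0),
    }

# Hypothetical tree: a truncal clone A with two branches, one of which branches again.
example_tree = {"A": None, "B": "A", "C": "A", "D": "B"}
features = tree_features(example_tree)
```

Each patient's tree would yield one such feature dictionary, which could then be used as a row of model input alongside clinical variables.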

By integrating these evolutionary models with machine learning algorithms such as linear regression, random forests, support vector machines, and genetic algorithms, we aim to enhance the prediction of survival rates among cancer patients. Our study is grounded in the TRACERx lung cancer dataset, providing a rich and clinically relevant foundation for predictive analysis.

Project Structure

ml-cancer-research/
├── data/                   # Dataset directory
├── models/                 # Saved model files
├── checkpoints/            # Training checkpoints
├── graphs/                 # Generated visualizations
├── dissertation/           # Research documentation
├── structuring_project/    # Main project code
│   ├── preprocessing.py    # Data processing pipeline
│   ├── train_models.py     # Model training scripts
│   ├── evaluation.py       # Model evaluation tools
│   ├── utils.py            # Utility functions
│   └── experiments.ipynb   # Initial Experiments
├── NN and XGboost.csv      # Model comparison data
├── requirements.txt        # Project dependencies
└── LICENSE                 # License information

Installation

  1. Clone the repository:
     git clone https://github.com/rafipatel/MLCancerResearch.git
     cd MLCancerResearch
  2. Create a virtual environment with Python or conda (recommended):
     python -m venv venv
     source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
     pip install -r requirements.txt

Usage

Data Preprocessing, Model Training and Model Evaluation

Preprocessing, training, and evaluation are launched from a single script:

python structuring_project/train_models.py

Data

The project uses lung cancer datasets stored in the data/ directory. Key components:

  • Input data is placed in data/
  • The data preprocessing pipeline is defined in structuring_project/preprocessing.py

Models

The project implements several machine learning models:

  • Linear Regression
  • Lasso Regression
  • Ridge Regression
  • Neural Networks
  • XGBoost
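
The difference between the three linear models in the list above can be illustrated with a small sketch. It assumes the simplest possible case, a single feature with no intercept, where each model has a closed-form slope. The function names and the penalty scaling are assumptions made for illustration; the project's own training code may use a library implementation with different conventions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_slope(x, y, penalty="none", lam=0.0):
    """Closed-form slope of y ~ w * x (no intercept) under three penalties.

    none : minimise 0.5 * sum((y - w*x)^2)
    ridge: the same loss plus 0.5 * lam * w^2
    lasso: the same loss plus lam * |w|
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    if penalty == "none":
        return sxy / sxx
    if penalty == "ridge":
        return sxy / (sxx + lam)  # shrinks the slope but never to exactly zero
    if penalty == "lasso":
        return soft_threshold(sxy, lam) / sxx  # can set the slope to exactly zero
    raise ValueError(f"unknown penalty: {penalty}")

# Toy data with a true slope of 2.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
w_linear = fit_slope(x, y)
w_ridge = fit_slope(x, y, penalty="ridge", lam=14.0)
w_lasso = fit_slope(x, y, penalty="lasso", lam=28.0)
```

Ridge shrinks coefficients smoothly, while Lasso can drive them to zero, which is why Lasso is often used when only a few features are expected to matter.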

Model artifacts are saved in:

  • models/: Saved model files and architectures
  • checkpoints/: Training checkpoints for model recovery and selection
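
As a rough illustration of using checkpoints for model selection, the sketch below writes one file per epoch and then picks the one with the lowest validation loss. The JSON layout, the file naming, and the `save_checkpoint` and `best_checkpoint` helpers are assumptions for this example; the format actually used in checkpoints/ may differ.

```python
import json
import tempfile
from pathlib import Path

def save_checkpoint(directory, epoch, val_loss, params):
    """Write one checkpoint as JSON and return its path."""
    path = Path(directory) / f"epoch_{epoch:03d}.json"
    path.write_text(json.dumps({"epoch": epoch, "val_loss": val_loss, "params": params}))
    return path

def best_checkpoint(directory):
    """Load every checkpoint in the directory and return the one with the lowest val_loss."""
    checkpoints = [json.loads(p.read_text()) for p in Path(directory).glob("epoch_*.json")]
    if not checkpoints:
        raise FileNotFoundError(f"no checkpoints in {directory}")
    return min(checkpoints, key=lambda c: c["val_loss"])

# Simulate three epochs in a temporary directory (hypothetical losses and parameters).
with tempfile.TemporaryDirectory() as tmp:
    for epoch, loss in enumerate([0.9, 0.4, 0.6], start=1):
        save_checkpoint(tmp, epoch, loss, {"w": [0.1 * epoch]})
    best = best_checkpoint(tmp)
```

Keeping the validation loss inside each checkpoint means the best model can be recovered later without rerunning training.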

Scripts

preprocessing.py

  • Data cleaning and normalization
  • Feature engineering
  • Data transformation pipelines
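
A minimal sketch of the cleaning and normalization steps listed above, using only the standard library. The record layout, the column name `tumour_size`, and both helper functions are hypothetical; the real pipeline in preprocessing.py may handle missing values and scaling differently.

```python
from statistics import mean, pstdev

def clean_rows(rows, columns):
    """Drop rows where any of the listed columns is missing (None or empty string)."""
    return [r for r in rows if all(r.get(c) not in (None, "") for c in columns)]

def zscore(rows, column):
    """Return copies of the rows with `column` standardised to mean 0 and std 1."""
    values = [float(r[column]) for r in rows]
    mu, sigma = mean(values), pstdev(values)
    scale = sigma if sigma > 0 else 1.0  # avoid dividing by zero for constant columns
    return [{**r, column: (float(r[column]) - mu) / scale} for r in rows]

# Hypothetical records; the column names are illustrative only.
raw = [
    {"patient": "p1", "tumour_size": "2.0"},
    {"patient": "p2", "tumour_size": ""},
    {"patient": "p3", "tumour_size": "4.0"},
]
cleaned = clean_rows(raw, ["tumour_size"])
normalised = zscore(cleaned, "tumour_size")
```

In practice the mean and standard deviation should be computed on the training split only and then reused for validation and test data, so that no information leaks across splits.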

train_models.py

  • Model architecture definitions
  • Training loop implementation
  • Hyperparameter configuration
  • Checkpoint management

evaluation.py

  • Performance metric calculations
  • Model comparison tools
  • Visualization generation
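
The metric calculation and model comparison steps could look roughly like the sketch below. It assumes the models predict a continuous target, so it uses MAE, RMSE, and R². These metrics and the `regression_metrics` and `rank_models` helpers are assumptions for illustration, not a description of what evaluation.py reports.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R^2 for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(ss_res / n),
        # R^2 is undefined when the target is constant.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }

def rank_models(predictions, y_true, metric="rmse"):
    """Return model names sorted from best to worst by an error metric (lower is better)."""
    if metric not in ("mae", "rmse"):
        raise ValueError("metric must be 'mae' or 'rmse'")
    scores = {name: regression_metrics(y_true, pred)[metric] for name, pred in predictions.items()}
    return sorted(scores, key=scores.get)

# Hypothetical predictions from two models on three samples.
y_true = [1.0, 2.0, 3.0]
metrics = regression_metrics(y_true, [1.0, 2.0, 4.0])
ranking = rank_models({"model_a": [1.0, 2.0, 4.0], "model_b": [1.0, 2.0, 3.0]}, y_true)
```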

utils.py

  • Data loading/saving utilities
  • Common helper functions
  • Configuration management

Documentation

  • Detailed project documentation is available in the dissertation/ directory
  • Technical implementation details are in MLCancerResearch_final.zip
  • Additional research context: "The evolution of lung cancer TracerX.pdf"

License

See the LICENSE file for the license terms of this project.
