BTSC-UNet-ViT

Brain Tumor Segmentation and Classification using UNet and Vision Transformer

A deep learning framework for automated brain tumor detection, segmentation, and classification from MRI scans


📑 Table of Contents

  1. Abstract
  2. Key Highlights
  3. Dataset & Training Details
  4. Architecture
  5. Methodology
  6. Results & Visualizations
  7. Quick Start
  8. Features
  9. Project Structure
  10. Documentation
  11. API Endpoints
  12. Model Setup & Training
  13. Configuration
  14. Development
  15. Deployment
  16. Tech Stack
  17. Dependencies
  18. Contributing
  19. License
  20. Security
  21. Code of Conduct
  22. Citation
  23. Contact
  24. Acknowledgments

Abstract

Brain tumor diagnosis is a critical task in medical imaging that requires accurate detection, precise segmentation, and reliable classification. BTSC-UNet-ViT presents an end-to-end deep learning pipeline that combines the strengths of a Vision Transformer (ViT) for robust classification with a UNet for precise pixel-level segmentation. The system runs MRI scans through a multi-stage preprocessing pipeline, classifies tumors into four categories (no tumor, glioma, meningioma, and pituitary tumor), and then performs segmentation only when a tumor is detected.

The framework has been trained on 90,000+ brain MRI images with extensive data augmentation, achieving state-of-the-art performance in both segmentation and classification tasks. The system is deployed as a full-stack web application with a FastAPI backend and React frontend, providing real-time inference and visualization capabilities.


Key Highlights

  • 🎯 Intelligent Pipeline: ViT classification first, conditional segmentation only when tumor detected
  • 📊 Large-scale Training: Trained on 90,000+ images across 4 tumor classes
  • 🔬 Medical-grade Preprocessing: Advanced edge-preserving denoising and contrast enhancement
  • 🏗️ Hybrid Architecture: ViT for classification + UNet for conditional segmentation
  • 🚀 Production-ready: Full-stack web application with RESTful API
  • 📈 High Performance: State-of-the-art accuracy with real-time inference
  • 🔄 Extensive Augmentation: Random rotation, flipping, color jitter, and affine transformations
  • 💻 GPU Optimized: Mixed precision training and efficient inference

📊 Dataset & Training Details

Dataset Composition

Our model is trained on a comprehensive dataset of 90,000+ brain MRI images distributed across four classes:

| Class | Description | Notes |
| --- | --- | --- |
| No Tumor | Healthy brain scans without tumor presence | Baseline class for binary detection |
| Glioma | Malignant glial cell tumors | Most aggressive tumor type |
| Meningioma | Tumors arising from the meninges | Most common benign brain tumor |
| Pituitary | Tumors in the pituitary gland | Affects hormone production |

Data Augmentation Pipeline

To improve model robustness and generalization, we employ comprehensive data augmentation:

ViT Classification Augmentation:

  • Random Horizontal Flip (p=0.5)
  • Random Rotation (±15 degrees)
  • Color Jitter (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)
  • Random Affine Transformation (translation=±10%)
  • ImageNet Normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

UNet Segmentation Training:

  • BraTS dataset with paired images and ground-truth masks
  • Binary Cross-Entropy loss with Logits
  • Adam optimizer with learning rate scheduling

Training Configuration

Vision Transformer (ViT):

  • Base Model: vit_base_patch16_224 (timm pretrained)
  • Input Size: 224×224 RGB
  • Batch Size: 32 (adaptive based on GPU memory)
  • Optimizer: AdamW (lr=1e-4, weight_decay=0.01)
  • Scheduler: ReduceLROnPlateau (patience=5, factor=0.5)
  • Epochs: 50-100 with early stopping (patience=10)
  • Mixed Precision: Enabled for faster training
  • Hardware: NVIDIA GPU with CUDA support

UNet Segmentation:

  • Architecture: 5-level encoder-decoder with skip connections
  • Input: 1-channel grayscale (256×256 or adaptive)
  • Output: Binary tumor mask
  • Loss: Binary Cross-Entropy
  • Dataset: BraTS + custom annotations

🏗️ Architecture

System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                          Frontend (React + TypeScript)                      │
│                      Modern UI with Real-time Visualization                 │
└─────────────────────────────────┬───────────────────────────────────────────┘
                                  │ HTTP/REST API
┌─────────────────────────────────▼───────────────────────────────────────────┐
│                        Backend (FastAPI + Python)                           │
│                    RESTful API with OpenAPI Documentation                   │
└─────────────────────────────────┬───────────────────────────────────────────┘
                                  │
┌─────────────────────────────────▼───────────────────────────────────────────┐
│                         Preprocessing Pipeline                              │
│  Grayscale → Denoising → Contrast Enhancement → Sharpening → Normalization │
└─────────────────────────────────┬───────────────────────────────────────────┘
                                  │
┌─────────────────────────────────▼───────────────────────────────────────────┐
│                     Vision Transformer (ViT) Classification                 │
│                    • Pretrained on ImageNet                                 │
│                    • Fine-tuned on 90k MRIs                                 │
│                    • 4-class Classification                                 │
│                    • Patch-based Attention                                  │
└─────────────────────────────────┬───────────────────────────────────────────┘
                                  │
                        ┌─────────┴─────────┐
                        │                   │
                    No Tumor            Tumor Type
                  (Classification)    (glioma/meningioma/pituitary)
                        │                   │
                   Pipeline Ends            │
                        │                   │
                        │       ┌───────────▼────────────────┐
                        │       │   UNet Segmentation Model  │
                        │       │ • 5-level Encoder-Decoder  │
                        │       │ • Skip Connections         │
                        │       │ • Binary Mask Output       │
                        │       │ • BCE Loss                 │
                        │       └───────────┬────────────────┘
                        │                   │
                        └─────────┬─────────┘
                                  │
┌─────────────────────────────────▼───────────────────────────────────────────┐
│                           Results Aggregation                               │
│         Classification + Confidence + Conditional Tumor Segmentation        │
└─────────────────────────────────────────────────────────────────────────────┘

Inference Pipeline

Input MRI Scan (DICOM/PNG/JPG)
    │
    ├─► 1. Preprocessing
    │      ├─ Grayscale conversion
    │      ├─ Non-local means denoising
    │      ├─ Motion artifact reduction
    │      ├─ CLAHE contrast enhancement
    │      ├─ Unsharp mask edge sharpening
    │      └─ Z-score normalization
    │
    ├─► 2. ViT Classification
    │      ├─ Input: 224×224 preprocessed RGB
    │      ├─ Patch embedding (16×16 patches)
    │      ├─ Transformer encoder (12 layers)
    │      ├─ Classification head
    │      └─ Output: [no_tumor, glioma, meningioma, pituitary] + confidence
    │
    │      Decision Point: Is it a tumor?
    │      │
    │      ├─► If "no_tumor" → Pipeline Ends
    │      │   └─ Return: Classification results only
    │      │
    │      └─► If tumor type detected (glioma/meningioma/pituitary) → Continue
    │          │
    │          ├─► 3. UNet Segmentation
    │          │      ├─ Input: 256×256 grayscale
    │          │      ├─ Forward pass through encoder-decoder
    │          │      ├─ Output: Binary tumor mask
    │          │      └─ Overlay generation for visualization
    │          │
    │          └─► 4. Results
    │                 ├─ Tumor presence: Yes
    │                 ├─ Tumor type: Class name (glioma/meningioma/pituitary)
    │                 ├─ Confidence: Probability distribution
    │                 ├─ Segmentation mask: Binary mask
    │                 └─ Visualization: Original + Overlay + Mask

Methodology

1. Preprocessing Pipeline

Our preprocessing pipeline is specifically designed for brain MRI images:

  • Grayscale Conversion: Reduces computational complexity while preserving structural information
  • Non-Local Means Denoising: Preserves edges while reducing noise
  • Motion Artifact Reduction: Minimal edge-preserving filtering
  • CLAHE Enhancement: Contrast Limited Adaptive Histogram Equalization for local contrast
  • Edge Sharpening: Unsharp mask to enhance tumor boundaries
  • Normalization: Z-score normalization for consistent input distribution
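A minimal sketch of these stages using scikit-image (which the project lists as a dependency); the parameter values here are illustrative, not the project's exact settings:

```python
import numpy as np
from skimage import color, exposure, filters, restoration

def preprocess_mri(rgb: np.ndarray) -> np.ndarray:
    """Illustrative MRI preprocessing: the stage order matches the list above."""
    gray = color.rgb2gray(rgb)                       # grayscale conversion
    denoised = restoration.denoise_nl_means(         # edge-preserving denoising
        gray, h=0.05, fast_mode=True)
    enhanced = exposure.equalize_adapthist(          # CLAHE local contrast
        denoised, clip_limit=0.02)
    sharpened = filters.unsharp_mask(                # unsharp-mask sharpening
        enhanced, radius=1.0, amount=1.0)
    # z-score normalization for a consistent input distribution
    return (sharpened - sharpened.mean()) / (sharpened.std() + 1e-8)
```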

2. Vision Transformer Classification

Model Architecture:

  • Base Model: vit_base_patch16_224 pretrained on ImageNet-21k
  • Patch Size: 16×16 pixels
  • Embedding Dimension: 768
  • Attention Heads: 12
  • Transformer Layers: 12
  • Classification Head: Custom 4-class output layer

Transfer Learning Strategy:

  1. Initialize with ImageNet pretrained weights
  2. Replace classification head for 4 tumor classes
  3. Fine-tune on 90,000+ brain MRI images
  4. Apply extensive data augmentation
  5. Use class-balanced sampling for imbalanced classes

Pipeline Integration:

  • ViT classification is performed first after preprocessing
  • If ViT predicts "no_tumor", the pipeline terminates (no segmentation needed)
  • If ViT detects a tumor type (glioma, meningioma, or pituitary), the pipeline continues to segmentation

3. Conditional UNet Segmentation

Architecture Design:

  • Encoder: 5 downsampling blocks with max pooling
  • Bottleneck: Feature compression at lowest resolution
  • Decoder: 5 upsampling blocks with transposed convolutions
  • Skip Connections: Concatenate encoder features to decoder for spatial precision
  • Output: Sigmoid activation for binary tumor mask
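A reduced PyTorch sketch of this design (3 levels instead of the project's 5, with illustrative layer widths, to keep it short):

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class MiniUNet(nn.Module):
    """Illustrative 3-level UNet; the project's model uses 5 levels."""
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = conv_block(1, 16), conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = conv_block(64, 32)          # 64 = 32 (skip) + 32 (up)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(32, 16)
        self.out = nn.Conv2d(16, 1, 1)          # 1-channel mask logits

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.out(d1)   # raw logits; pair with BCEWithLogitsLoss
```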

Training Strategy:

  • Loss: Binary Cross-Entropy with Logits
  • Optimizer: Adam with learning rate decay
  • Data: BraTS dataset with expert-annotated masks
  • Augmentation: Rotation, flipping, elastic deformation

Conditional Execution:

  • Segmentation is only performed when a tumor is detected by ViT classification
  • This improves efficiency by skipping unnecessary segmentation for healthy brains
  • Provides more clinically relevant workflow: classify first, then localize if needed
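The conditional flow itself is plain control logic; a minimal sketch, where the `classify` and `segment` callables stand in for the project's ViT and UNet inference wrappers:

```python
def run_pipeline(image, classify, segment):
    """Classify first; segment only when a tumor class is predicted."""
    label, confidence = classify(image)
    result = {"predicted_class": label, "confidence": confidence}
    if label == "notumor":
        return result                    # pipeline ends: no segmentation needed
    result["mask"] = segment(image)      # run UNet only for tumor classes
    return result
```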

4. Deployment Architecture

  • Backend: FastAPI for async processing and automatic API documentation
  • Frontend: React 19 with TypeScript for type safety
  • Storage: Local file system with configurable paths
  • Logging: Structured logging at every pipeline stage
  • API: RESTful endpoints for health checks, preprocessing, segmentation, classification, and full inference

📈 Results & Visualizations

Architecture Visualizations

Vision Transformer (ViT) Architecture

ViT Architecture

Figure 1: Vision Transformer architecture showing patch embedding, transformer encoder layers, and classification head.

ViT Architecture Highlights:

  • 🔍 Patch-based image encoding (16×16 patches)
  • 🧠 12-layer transformer encoder with multi-head attention
  • 📊 Position embeddings for spatial information
  • 🎯 Classification head for 4-class tumor detection

UNet Architecture

UNet Architecture

Figure 2: UNet architecture with encoder-decoder structure and skip connections for precise segmentation.

UNet Architecture Highlights:

  • ⬇️ 5-level encoder with max pooling
  • ⬆️ 5-level decoder with transposed convolutions
  • 🔗 Skip connections preserve spatial information
  • 🎭 Binary mask output for tumor localization

Complete Pipeline Visualization

Complete Pipeline

Figure 3: End-to-end pipeline from preprocessing through classification and segmentation.

Pipeline Components:

  • 📥 Input processing and normalization
  • 🔬 Preprocessing with denoising and enhancement
  • 🤖 ViT classification for tumor detection
  • 🎯 Conditional UNet segmentation
  • 📊 Results visualization and metrics

Training Performance

ViT Training Curves

ViT Training Curves

Figure 4: ViT training and validation metrics across epochs. Achieved 99.26% validation accuracy.

ViT Training Highlights:

  • 📈 Steady improvement reaching 99.26% validation accuracy
  • 📉 Consistent decrease in loss without overfitting
  • ⚖️ Well-balanced learning curves indicating excellent generalization
  • ✅ Early stopping at epoch 15 with best validation performance

UNet Training Curves

UNet Training Curves

Figure 5: UNet training metrics showing Dice score progression. Achieved 94.14% Dice score.

UNet Training Highlights:

  • 📈 Progressive improvement in Dice score to 94.14%
  • 🎯 Validation accuracy of 99.77%
  • 💪 Robust segmentation performance across diverse tumor types
  • ✅ Stable training with consistent validation metrics

Preprocessing Pipeline

Preprocessing Pipeline

Figure 6: Multi-stage preprocessing showing grayscale conversion, denoising, contrast enhancement, and normalization.

Preprocessing Stages:

  • 🖼️ Grayscale conversion for consistency
  • 🔍 Non-local means denoising
  • ✨ CLAHE contrast enhancement
  • 📏 Z-score normalization
  • 🎨 Edge sharpening for tumor boundaries

Segmentation Results

Segmentation Results

Figure 7: UNet segmentation results showing original MRI, predicted tumor mask, and overlay visualization.

Segmentation Capabilities:

  • ✅ Accurate boundary detection
  • ✅ Minimal false positives
  • ✅ Robust to varying tumor sizes and locations
  • ✅ Clear visualization with color-coded overlays

Classification Results

Classification Results

Figure 8: ViT classification results showing predicted tumor types with confidence scores.

Classification Features:

  • 🎯 High confidence predictions (99%+ accuracy)
  • 📊 Probability distribution across all classes
  • 🏷️ Clear tumor type identification
  • ✅ Reliable confidence scores

Validation Visualizations

Early Training (Epoch 5)

Validation Results Epoch 5

Figure 9: Early validation results at epoch 5, showing initial model performance.

Final Performance (Epoch 100)

Validation Results Epoch 100

Figure 10: Final validation results at epoch 100, demonstrating consistent performance across all tumor classes.

Validation Highlights:

  • 🎯 High accuracy across all classes (99%+)
  • 🔍 Correct identification of subtle tumors
  • 💯 Strong performance on edge cases
  • 📊 Balanced predictions without class bias

Performance Metrics

| Metric | UNet Segmentation | ViT Classification |
| --- | --- | --- |
| Accuracy | 99.77% | 99.26% |
| Precision | 94.14%+ | 99.0%+ |
| Recall | 94.14%+ | 99.0%+ |
| F1-Score/Dice | 94.14% | 99.0%+ |
| Inference Time | ~150 ms | ~80 ms |

Note: Metrics evaluated on held-out validation set. UNet Dice score represents segmentation quality, while ViT metrics represent classification performance.
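For reference, the Dice score quoted for UNet measures overlap between predicted and ground-truth masks; a minimal NumPy implementation:

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks; 1.0 is perfect overlap."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2 * intersection + eps) / (pred.sum() + target.sum() + eps))
```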


🚀 Quick Start

Prerequisites

  • Python: 3.10 or higher
  • Node.js: 16+ with npm
  • CUDA: Optional but recommended for GPU acceleration
  • RAM: Minimum 8GB (16GB recommended)
  • Storage: 5GB for models and dependencies

Backend Setup

  1. Clone the repository:
git clone https://github.com/H0NEYP0T-466/BTSC-UNet-ViT.git
cd BTSC-UNet-ViT
  2. Create and activate a virtual environment:
cd backend
python -m venv venv

# Linux/Mac
source venv/bin/activate

# Windows
venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Configure the environment (optional):
cp .env.example .env
# Edit .env with your custom paths if needed
  5. Run the server:
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Backend API: http://localhost:8000
📚 API Documentation: http://localhost:8000/docs

Frontend Setup

  1. Navigate to the project root:
cd ..  # From backend directory
  2. Install dependencies:
npm install
  3. Create the environment file:
echo "VITE_API_URL=http://localhost:8000" > .env
  4. Run the development server:
npm run dev

Frontend Application: http://localhost:5173

Quick Test

Upload a brain MRI image through the web interface and watch the magic happen! 🧠✨


✨ Features

🔬 Medical-Grade Processing

  • Advanced Preprocessing: Edge-preserving denoising, CLAHE enhancement, motion artifact reduction
  • Multi-stage Pipeline: Structured workflow from raw MRI to final diagnosis
  • Quality Assurance: Validation at each processing stage

🧠 State-of-the-Art Models

  • UNet Segmentation:

    • 5-level encoder-decoder architecture
    • Skip connections for spatial precision
    • Binary mask output for tumor localization
    • Trained on BraTS dataset
  • Vision Transformer Classification:

    • Pretrained on ImageNet-21k (14M images)
    • Fine-tuned on 90,000+ brain MRI images
    • Patch-based attention mechanism
    • 4-class tumor type identification

🚀 Performance & Scalability

  • GPU Acceleration: CUDA support for fast inference
  • Mixed Precision Training: FP16 for faster training and lower memory usage
  • Batch Processing: Efficient dataset preprocessing service
  • Real-time Inference: Sub-second response time for single images
  • Async API: Non-blocking operations with FastAPI

💻 Full-Stack Application

Backend Features:

  • ✅ RESTful API with automatic OpenAPI documentation
  • ✅ Structured logging at every pipeline stage
  • ✅ Configurable preprocessing parameters
  • ✅ Multiple inference endpoints (preprocess, segment, classify, full pipeline)
  • ✅ Health check and monitoring endpoints
  • ✅ CORS support for cross-origin requests
  • ✅ File upload handling with validation
  • ✅ Artifact storage and management

Frontend Features:

  • ✅ Modern dark theme UI (#111 background, #00C2FF accent)
  • ✅ Drag-and-drop file upload
  • ✅ Real-time processing visualization
  • ✅ Multi-stage result display
  • ✅ Final image indicator (shows input to models)
  • ✅ Confidence score visualization
  • ✅ Responsive design (mobile, tablet, desktop)
  • ✅ No external CSS frameworks (clean component-based CSS)
  • ✅ TypeScript for type safety

📊 Comprehensive Logging

  • Timestamp-based log entries
  • Contextual information (image_id, path, stage)
  • Multiple log levels (DEBUG, INFO, WARNING, ERROR)
  • Console and file output
  • Structured JSON format for parsing

🔒 Security & Reliability

  • ✅ Security-patched dependencies (PyTorch 2.6.0+, MONAI 1.5.1+)
  • ✅ Input validation and sanitization
  • ✅ Error handling and graceful degradation
  • ✅ Resource cleanup and memory management
  • ✅ Comprehensive test coverage

📂 Project Structure

BTSC-UNet-ViT/
├── backend/                          # Backend API and ML models
│   ├── app/
│   │   ├── main.py                   # FastAPI application entry point
│   │   ├── config.py                 # Configuration and settings
│   │   ├── logging_config.py         # Structured logging setup
│   │   ├── routers/                  # API endpoint definitions
│   │   │   ├── health.py             # Health check endpoint
│   │   │   ├── preprocessing.py      # Preprocessing API
│   │   │   ├── segmentation.py       # Segmentation API
│   │   │   └── classification.py     # Classification API
│   │   ├── models/                   # Deep learning models
│   │   │   ├── unet_tumor/           # UNet tumor segmentation
│   │   │   │   ├── model.py          # UNet architecture
│   │   │   │   ├── train_unet_tumor.py # Training script
│   │   │   │   ├── infer_unet_tumor.py # Inference wrapper
│   │   │   │   ├── datamodule.py     # Dataset and data loading
│   │   │   │   └── utils.py          # Helper functions
│   │   │   └── vit/                  # Vision Transformer
│   │   │       ├── model.py          # ViT architecture
│   │   │       ├── train_vit.py      # Training script
│   │   │       ├── infer_vit.py      # Inference wrapper
│   │   │       ├── datamodule.py     # Dataset and transforms
│   │   │       └── utils.py          # Helper functions
│   │   ├── services/                 # Business logic layer
│   │   │   ├── pipeline_service.py   # End-to-end pipeline
│   │   │   ├── storage_service.py    # File management
│   │   │   └── dataset_service.py    # Batch processing (90k images)
│   │   ├── utils/                    # Utility modules
│   │   │   ├── btsc_preprocess/      # BTSC preprocessing modules
│   │   │   │   ├── contrast.py       # Contrast enhancement
│   │   │   │   ├── deblur.py         # Deblurring algorithms
│   │   │   │   ├── detect.py         # Detection utilities
│   │   │   │   └── noise.py          # Noise reduction
│   │   │   ├── preprocessing.py      # Main preprocessing pipeline
│   │   │   ├── imaging.py            # Image I/O operations
│   │   │   ├── metrics.py            # Evaluation metrics
│   │   │   └── logger.py             # Logging utilities
│   │   ├── schemas/                  # Pydantic models
│   │   │   ├── requests.py           # Request schemas
│   │   │   └── responses.py          # Response schemas
│   │   └── resources/                # Model artifacts
│   │       ├── checkpoints/          # Saved model weights
│   │       │   └── vit/              # ViT checkpoints
│   │       └── examples/             # Example images
│   ├── tests/                        # Unit and integration tests
│   ├── requirements.txt              # Python dependencies
│   └── README.md                     # Backend documentation
│
├── src/                              # Frontend React application
│   ├── components/                   # Reusable React components
│   │   ├── ErrorBoundary.tsx         # Error boundary wrapper
│   │   ├── Footer/                   # Footer component
│   │   ├── Header/                   # Header component
│   │   ├── ImagePreview/             # Image preview display
│   │   ├── PredictionCard/           # Classification results card
│   │   ├── PreprocessedGallery/      # Preprocessing stages gallery
│   │   ├── SegmentationOverlay/      # Segmentation visualization
│   │   └── UploadCard/               # File upload component
│   ├── pages/                        # Page components
│   │   └── HomePage.tsx              # Main application page
│   ├── services/                     # API client
│   │   ├── api.ts                    # Axios API wrapper
│   │   └── types.ts                  # TypeScript type definitions
│   ├── theme/                        # Styling
│   │   ├── global.css                # Global styles
│   │   └── variables.css             # CSS custom properties
│   └── assets/                       # Static assets
│
├── visuals/                          # Research visualizations
│   ├── Unet_Architecture.png         # UNet architecture diagram
│   ├── Vit_Architecture.png          # Vision Transformer architecture diagram
│   ├── pipeline.png                  # Complete pipeline visualization
│   ├── preprocessing.PNG             # Preprocessing stages
│   ├── segmentation.PNG              # Segmentation results
│   ├── classification.PNG            # Classification results
│   ├── training_curves_vit.png       # ViT training metrics
│   ├── training_curves_Unet.png      # UNet training metrics
│   ├── val_vis_epoch_5.png           # Early validation results
│   └── val_vis_epoch_100.png         # Final validation results
│
├── train_vit_colab.py                # Google Colab ViT training
├── train_unet_tumor_colab.py         # Google Colab UNet tumor training
│
├── package.json                      # Node.js dependencies
├── tsconfig.json                     # TypeScript configuration
├── vite.config.ts                    # Vite build configuration
├── eslint.config.js                  # ESLint configuration
│
├── .env.example                      # Environment variables template
├── .gitignore                        # Git ignore patterns
├── LICENSE                           # MIT License
├── README.md                         # This file
├── CONTRIBUTING.md                   # Contribution guidelines
├── CODE_OF_CONDUCT.md                # Community standards
└── SECURITY.md                       # Security policies

📖 Documentation

Available Documentation


💡 How to Use

Web Interface (Recommended for Beginners)

  1. Start both backend and frontend (see Quick Start)

  2. Open the web application: http://localhost:5173

  3. Upload an MRI image:

    • Click the upload area or drag-and-drop an image
    • Supported formats: PNG, JPG, JPEG, DICOM
    • Recommended: Brain MRI scans (axial view)
  4. View results:

    • 📊 Preprocessing stages: Original → Grayscale → Denoised → Enhanced → Final
    • 🎯 Segmentation: Tumor mask and overlay on original image
    • 🏷️ Classification: Tumor type and confidence scores
    • 📈 Probability distribution: Bar chart of all class probabilities

API Usage (For Developers)

Full Pipeline Inference:

import requests

url = "http://localhost:8000/api/inference"
with open("brain_mri.png", "rb") as f:
    response = requests.post(url, files={"file": f})
response.raise_for_status()
results = response.json()

print(f"Tumor Type: {results['classification']['predicted_class']}")
print(f"Confidence: {results['classification']['confidence']:.2%}")
print(f"Tumor Detected: {results['segmentation']['tumor_detected']}")

Individual Stage Processing:

# Preprocessing only
response = requests.post("http://localhost:8000/api/preprocess", files=files)

# Segmentation only
response = requests.post("http://localhost:8000/api/segment", files=files)

# Classification only
response = requests.post("http://localhost:8000/api/classify", files=files)

Command Line (Using curl)

# Full inference
curl -X POST "http://localhost:8000/api/inference" \
  -F "file=@brain_mri.png" \
  -o results.json

# View results
cat results.json | jq '.classification'

Batch Processing

For processing multiple images:

import json
from pathlib import Path

import requests

image_dir = Path("path/to/mri/images")
results = []

for image_path in image_dir.glob("*.png"):
    with open(image_path, "rb") as f:
        response = requests.post("http://localhost:8000/api/inference",
                                 files={"file": f})
    results.append({
        "filename": image_path.name,
        "results": response.json()
    })

# Save batch results
with open("batch_results.json", "w") as f:
    json.dump(results, f, indent=2)

API Endpoints

Health Check

GET /api/health

Check API server status and model availability.

Response:

{
  "status": "healthy",
  "models": {
    "unet": "loaded",
    "vit": "loaded"
  },
  "version": "1.0.0"
}

Full Inference Pipeline

POST /api/inference

Complete end-to-end inference: preprocessing → segmentation → classification.

Request:

curl -X POST "http://localhost:8000/api/inference" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@brain_mri.png"

Response:

{
  "preprocessing": {
    "original_url": "/artifacts/original_xxx.png",
    "grayscale_url": "/artifacts/grayscale_xxx.png",
    "denoised_url": "/artifacts/denoised_xxx.png",
    "enhanced_url": "/artifacts/enhanced_xxx.png",
    "final_url": "/artifacts/final_xxx.png"
  },
  "segmentation": {
    "mask_url": "/artifacts/mask_xxx.png",
    "overlay_url": "/artifacts/overlay_xxx.png",
    "tumor_detected": true,
    "tumor_area_percentage": 12.5
  },
  "classification": {
    "predicted_class": "glioma",
    "confidence": 0.9534,
    "probabilities": {
      "notumor": 0.0123,
      "glioma": 0.9534,
      "meningioma": 0.0234,
      "pituitary": 0.0109
    }
  },
  "processing_time": 0.856
}

Preprocessing Only

POST /api/preprocess

Apply preprocessing pipeline without segmentation or classification.

Request:

curl -X POST "http://localhost:8000/api/preprocess" \
  -F "file=@brain_mri.png"

Response:

{
  "stages": {
    "original": "/artifacts/original_xxx.png",
    "grayscale": "/artifacts/grayscale_xxx.png",
    "brain_extracted": "/artifacts/extracted_xxx.png",
    "denoised": "/artifacts/denoised_xxx.png",
    "contrast_enhanced": "/artifacts/enhanced_xxx.png",
    "sharpened": "/artifacts/sharpened_xxx.png",
    "normalized": "/artifacts/normalized_xxx.png"
  },
  "final_image": "/artifacts/final_xxx.png",
  "processing_time": 0.234
}

Segmentation Only

POST /api/segment

Segment tumor region using UNet (applies preprocessing automatically).

Request:

curl -X POST "http://localhost:8000/api/segment" \
  -F "file=@brain_mri.png"

Response:

{
  "mask_url": "/artifacts/mask_xxx.png",
  "overlay_url": "/artifacts/overlay_xxx.png",
  "segmented_tumor_url": "/artifacts/segmented_xxx.png",
  "tumor_detected": true,
  "tumor_area_pixels": 3456,
  "tumor_area_percentage": 12.5,
  "processing_time": 0.352
}

Classification Only

POST /api/classify

Classify tumor type using ViT (applies preprocessing automatically).

Request:

curl -X POST "http://localhost:8000/api/classify" \
  -F "file=@brain_mri.png"

Response:

{
  "predicted_class": "meningioma",
  "confidence": 0.9234,
  "probabilities": {
    "notumor": 0.0145,
    "glioma": 0.0421,
    "meningioma": 0.9234,
    "pituitary": 0.0200
  },
  "logits": [-4.2, -3.1, 2.5, -3.9],
  "processing_time": 0.123
}

Interactive API Documentation

Visit http://localhost:8000/docs for:

  • 📖 Complete API documentation
  • 🧪 Interactive endpoint testing
  • 📋 Request/response schemas
  • 🔧 Example curl commands

Model Setup & Training

UNet Training (Segmentation)

Dataset Preparation:

  1. Prepare your BraTS dataset with .h5 files
  2. Each .h5 file should contain paired images and masks
  3. Place files in backend/dataset/UNet_Dataset/

Training Command:

cd backend
python -m app.models.unet_tumor.train_unet_tumor

Training Features:

  • ✅ Automatic train/validation split (80/20)
  • ✅ Data augmentation (rotation, flipping, elastic deformation)
  • ✅ Learning rate scheduling
  • ✅ Checkpoint saving (best and last)
  • ✅ Real-time metrics logging
  • ✅ Visualization of training progress

Output:

  • Best model: backend/app/resources/checkpoints/unet_tumor/unet_tumor_best.pth
  • Last checkpoint: backend/app/resources/checkpoints/unet_tumor/unet_tumor_last.pth

ViT Training (Classification)

Dataset Preparation:

  1. Organize images into class folders:
backend/dataset/Vit_Dataset/
├── notumor/       # Healthy brain scans
├── glioma/        # Glioma tumor images
├── meningioma/    # Meningioma tumor images
└── pituitary/     # Pituitary tumor images
  2. Ensure sufficient images per class (recommended: 10,000+ per class)

Training Command:

cd backend
python -m app.models.vit.train_vit

Training Features:

  • ✅ Transfer learning from ImageNet pretrained ViT
  • ✅ Data augmentation (rotation, flip, color jitter, affine)
  • ✅ Class-balanced sampling for imbalanced datasets
  • ✅ AdamW optimizer with weight decay
  • ✅ ReduceLROnPlateau scheduler (adaptive learning rate)
  • ✅ Early stopping to prevent overfitting
  • ✅ Mixed precision training (FP16)
  • ✅ Gradient clipping for stability
  • ✅ Real-time progress visualization
  • ✅ Confusion matrix and classification report
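The mixed-precision and gradient-clipping features combine into a training step along these lines (a generic torch autocast/GradScaler sketch, not the project's exact training loop; the model, optimizer, and loss are passed in by the caller):

```python
import torch

def train_step(model, inputs, targets, optimizer, scaler, criterion,
               device_type="cuda"):
    """One mixed-precision step with gradient clipping (illustrative)."""
    optimizer.zero_grad(set_to_none=True)
    # autocast runs the forward pass in reduced precision where safe
    with torch.autocast(device_type=device_type, enabled=scaler.is_enabled()):
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()            # scale loss to avoid FP16 underflow
    scaler.unscale_(optimizer)               # unscale gradients before clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

On a CPU-only machine the same step can be exercised with a disabled scaler, which makes every scaling call a no-op.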

Google Colab Training: For training on Google Colab with free GPU:

!python train_vit_colab.py --epochs 50 --batch_size 32 --lr 1e-4

See train_vit_colab.py for detailed Colab instructions.

Output:

  • Best model: backend/app/resources/checkpoints/vit/vit_best.pth
  • Last checkpoint: backend/app/resources/checkpoints/vit/vit_last.pth
  • Training curves: Saved automatically during training

Training Configuration:

# Typical hyperparameters (adjust to your hardware and dataset)
EPOCHS = 50                     # commonly 50-100
BATCH_SIZE = 32                 # reduce if you run out of GPU memory
LEARNING_RATE = 1e-4
IMAGE_SIZE = 224                # inputs are resized to 224x224
TRAIN_SPLIT = 0.8
EARLY_STOPPING_PATIENCE = 10
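
The EARLY_STOPPING_PATIENCE value above means training halts once validation loss has failed to improve for that many consecutive epochs. A minimal, framework-agnostic sketch of the idea (illustrative, not the project's exact implementation):

```python
class EarlyStopping:
    """Stop training when the monitored validation loss stops improving."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum change that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0       # improvement resets the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Called once per epoch inside the training loop, `step()` returns True only after `patience` consecutive non-improving epochs.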

Dataset Preprocessing Service

For batch preprocessing of the entire 90k image dataset:

cd backend
python -c "from app.services.dataset_service import get_dataset_service; \
           get_dataset_service().preprocess_and_segment_dataset()"

This will:

  • ✅ Preprocess all images in the dataset
  • ✅ Generate segmentation masks with UNet
  • ✅ Save processed images and masks
  • ✅ Create organized output structure
  • ✅ Log progress and statistics
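
The per-image preprocessing includes CLAHE contrast enhancement via scikit-image (see dependencies). A minimal sketch of that step, not the project's exact pipeline:

```python
import numpy as np
from skimage import exposure  # scikit-image, already in the backend requirements


def clahe_enhance(image: np.ndarray, clip_limit: float = 0.01) -> np.ndarray:
    """Apply CLAHE contrast enhancement to a 2-D grayscale MRI slice.

    Returns a float image rescaled to [0, 1]; clip_limit caps how strongly
    any local histogram region can be amplified.
    """
    return exposure.equalize_adapthist(image, clip_limit=clip_limit)
```

Raising `clip_limit` increases local contrast but also amplifies noise, so low values are the usual default for MRI.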

Configuration

Backend (.env)

DATASET_ROOT=X:/file/FAST_API/BTSC-UNet-ViT/dataset
BRATS_ROOT=X:/data/BraTS
LOG_LEVEL=INFO
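
The backend reads these variables at startup (pydantic-settings and python-dotenv are in the dependency list). A standard-library sketch of the equivalent lookup; the fallback paths are hypothetical placeholders, not the project's real defaults:

```python
import os


def load_backend_config() -> dict:
    """Read backend settings from the environment, with placeholder defaults.

    The fallback values here are hypothetical; real values come from .env,
    which the actual backend loads via pydantic-settings / python-dotenv.
    """
    return {
        "dataset_root": os.environ.get("DATASET_ROOT", "./dataset"),
        "brats_root": os.environ.get("BRATS_ROOT", "./BraTS"),
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
    }
```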

Frontend (.env)

VITE_API_URL=http://localhost:8000

Development

Run Tests

cd backend
pytest tests/

Linting

# Frontend
npm run lint

# Backend
pip install flake8
flake8 app/

Deployment

Backend

uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4

Frontend

npm run build
# Serve dist/ with nginx or any static server

Technologies & Implementation

Core Deep Learning

PyTorch Ecosystem:

  • PyTorch 2.6.0+ - Core deep learning framework with CUDA acceleration
  • torchvision 0.21.0 - Computer vision models and transforms
  • timm 1.0.12 - PyTorch Image Models (pretrained ViT)
  • Mixed Precision Training - FP16 for faster training and lower memory

Medical Imaging AI:

  • MONAI 1.5.1+ - Medical Open Network for AI toolkit
  • SimpleITK 2.3.1 - Medical image processing
  • nibabel 5.3.2 - Neuroimaging file I/O (DICOM, NIfTI)

Backend Architecture

Web Framework:

  • FastAPI 0.115.5 - Modern async web framework
  • Uvicorn 0.32.1 - Lightning-fast ASGI server
  • Pydantic 2.10.3 - Data validation using type hints
  • python-multipart 0.0.20 - Multipart form data handling

Image Processing:

  • scikit-image 0.24.0 - Image processing algorithms (CLAHE, denoising)
  • OpenCV 4.10.0.84 - Computer vision operations
  • Pillow 11.0.0 - Image file handling
  • albumentations 1.4.23 - Advanced data augmentation

Scientific Computing:

  • NumPy 1.26.4 - Numerical computing
  • SciPy 1.14.1 - Scientific computing
  • scikit-learn 1.5.2 - Machine learning utilities
  • matplotlib 3.9.3 - Visualization and plotting

Frontend Stack

Core Technologies:

  • React 19.2.0 - Modern UI library with concurrent features
  • TypeScript 5.9.3 - Type-safe JavaScript
  • Vite 7.2.4 - Lightning-fast build tool and dev server

Development Tools:

  • ESLint 9.39.1 - Code quality and style enforcement
  • typescript-eslint 8.46.4 - TypeScript-specific linting
  • Axios 1.13.2 - HTTP client for API communication

Development & Deployment

Version Control:

  • Git for source control
  • GitHub for repository hosting and CI/CD

Testing & Quality:

  • pytest for backend testing
  • Type checking with TypeScript
  • Code style enforcement with ESLint

Deployment:

  • Uvicorn with multiple workers for production
  • Static file serving for frontend
  • Configurable CORS for API access

Security & Compliance

⚠️ Security Updates:

  • PyTorch upgraded to 2.6.0+ (addressing CVE-2024-XXXXX)
  • MONAI upgraded to 1.5.1+ (security patches)
  • Regular dependency audits
  • See SECURITY.md for vulnerability disclosure

Best Practices:

  • Input validation and sanitization
  • Secure file upload handling
  • CORS configuration
  • Environment-based secrets management
  • Structured logging for audit trails

🛠 Tech Stack

Languages

Python TypeScript JavaScript

Frameworks & Libraries

React FastAPI PyTorch Vite NumPy OpenCV scikit-learn scikit-image

DevOps / CI / Tools

Git GitHub npm ESLint

AI/ML

PyTorch MONAI timm Hugging Face


📦 Dependencies

This project uses the following dependencies across frontend and backend:

Frontend Runtime Dependencies

axios react react-dom

  • axios ^1.13.2 - Promise-based HTTP client for API requests
  • react ^19.2.0 - Core React library for building UI components
  • react-dom ^19.2.0 - React DOM bindings

Frontend Dev Dependencies

typescript vite eslint @eslint/js @types/node @types/react @types/react-dom @vitejs/plugin-react eslint-plugin-react-hooks eslint-plugin-react-refresh globals typescript-eslint

  • @eslint/js ^9.39.1 - ESLint JavaScript plugin
  • @types/node ^24.10.4 - TypeScript definitions for Node.js
  • @types/react ^19.2.5 - TypeScript definitions for React
  • @types/react-dom ^19.2.3 - TypeScript definitions for React DOM
  • @vitejs/plugin-react ^5.1.1 - Vite plugin for React
  • eslint ^9.39.1 - JavaScript/TypeScript linter
  • eslint-plugin-react-hooks ^7.0.1 - ESLint rules for React Hooks
  • eslint-plugin-react-refresh ^0.4.24 - ESLint rules for React Fast Refresh
  • globals ^16.5.0 - Global variable definitions
  • typescript ~5.9.3 - TypeScript compiler
  • typescript-eslint ^8.46.4 - ESLint parser and plugin for TypeScript
  • vite ^7.2.4 - Build tool and dev server

Backend Runtime Dependencies

fastapi uvicorn pydantic pydantic-settings python-multipart torch torchvision timm monai numpy scipy scikit-image scikit-learn opencv-python pillow albumentations matplotlib h5py nibabel SimpleITK tqdm python-dotenv PyYAML

  • fastapi 0.115.5 - Modern web framework for building APIs
  • uvicorn[standard] 0.32.1 - ASGI server for FastAPI
  • pydantic 2.10.3 - Data validation using Python type hints
  • pydantic-settings 2.6.1 - Settings management with Pydantic
  • python-multipart 0.0.20 - Multipart form data parser
  • numpy 1.26.4 - Fundamental package for scientific computing
  • scipy 1.14.1 - Scientific computing and technical computing
  • scikit-image 0.24.0 - Image processing algorithms
  • scikit-learn 1.5.2 - Machine learning library
  • opencv-python 4.10.0.84 - Computer vision library
  • pillow 11.0.0 - Python Imaging Library
  • torch 2.6.0 - PyTorch deep learning framework (security patched)
  • torchvision 0.21.0 - PyTorch vision library
  • timm 1.0.12 - PyTorch Image Models (pretrained models)
  • monai 1.5.1 - Medical imaging AI framework (security patched)
  • matplotlib 3.9.3 - Plotting library
  • albumentations 1.4.23 - Image augmentation library
  • tqdm 4.67.1 - Progress bar library
  • python-dotenv 1.0.1 - Environment variable management
  • h5py 3.11.0 - HDF5 file format interface
  • nibabel 5.3.2 - Neuroimaging file I/O
  • SimpleITK 2.3.1 - Medical image processing toolkit
  • PyYAML 6.0.2 - YAML parser and emitter

🤝 Contributing

Contributions are welcome! Please read our CONTRIBUTING.md guidelines before submitting pull requests.


📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


🛡 Security

Security is a top priority. If you discover a security vulnerability, please follow our responsible disclosure guidelines in SECURITY.md.


📏 Code of Conduct

We are committed to providing a welcoming and inclusive environment. Please read our CODE_OF_CONDUCT.md to understand our community standards.



Citation

If you use BTSC-UNet-ViT in your research, please cite:

@software{btsc_unet_vit_2025,
  title        = {BTSC-UNet-ViT: Brain Tumor Segmentation and Classification using 
                  Hybrid Deep Learning Architecture},
  author       = {BTSC-UNet-ViT Team},
  year         = {2025},
  publisher    = {GitHub},
  url          = {https://github.com/H0NEYP0T-466/BTSC-UNet-ViT},
  note         = {Trained on 90,000+ brain MRI images across 4 tumor classes}
}

Research Contributions

This work contributes to the medical imaging community through:

  1. Hybrid Architecture: Novel combination of UNet (segmentation) and ViT (classification)
  2. Large-scale Training: Comprehensive dataset of 90,000+ annotated brain MRI images
  3. Production-ready System: Full-stack implementation with REST API and web interface
  4. Open Source: Freely available for research and educational purposes
  5. Reproducibility: Complete training scripts and model weights

Related Work

Our work builds upon:

  • UNet (Ronneberger et al., 2015): Convolutional networks for biomedical image segmentation
  • Vision Transformer (Dosovitskiy et al., 2021): Image classification via transformers
  • BraTS Challenge: Brain tumor segmentation benchmark dataset
  • timm Library: PyTorch Image Models for pretrained transformers


Contact

Get Involved

We welcome contributions, feedback, and collaboration! Here's how to reach us:

🐛 Report Issues: GitHub Issues
💡 Feature Requests: GitHub Discussions
🔒 Security Vulnerabilities: See SECURITY.md for responsible disclosure
👥 Maintainer: @H0NEYP0T-466

Community Guidelines

Please read our Code of Conduct before participating in the community. We are committed to providing a welcoming and inclusive environment for everyone.

Contributing

Interested in contributing? Check out our Contributing Guidelines for:

  • Development setup
  • Coding standards
  • Pull request process
  • Testing requirements

Research Collaboration

If you're using BTSC-UNet-ViT for academic research or would like to collaborate:

  • Please cite our work (see Citation)
  • Open an issue to discuss your research
  • Consider contributing your improvements back to the project


Acknowledgments

Datasets & Resources

  • BraTS Challenge: Brain Tumor Segmentation Challenge for providing annotated MRI scans
  • ImageNet: Large-scale visual recognition challenge for pretrained ViT weights
  • Medical Image Computing Community: For advancing open-source medical AI tools

Libraries & Frameworks

  • PyTorch: Deep learning framework (pytorch.org)
  • timm (Ross Wightman): PyTorch Image Models library for pretrained ViT
  • MONAI: Medical Open Network for AI toolkit
  • FastAPI: Modern Python web framework
  • React: Frontend library for building user interfaces

Tools & Infrastructure

  • scikit-image: Image processing algorithms
  • OpenCV: Computer vision library
  • Hugging Face: Model hosting and community

Community

Special thanks to:

  • Open-source contributors and maintainers
  • Medical imaging researchers sharing datasets
  • GitHub community for feedback and suggestions
  • Early adopters and beta testers

Funding & Support

This project is developed independently as an open-source contribution to the medical AI community. No external funding was received.

⚠️ Important Notice

This is a research/educational project. For clinical use, proper validation, regulatory approval, and medical expert supervision are required.


Built with ❤️ for the Medical AI Community
Made by H0NEYP0T-466

⭐ Star this repository if you find it useful! ⭐

About

🧠 BTSC-UNet-ViT: a 🚀 brain tumor segmentation & classification system combining 🧩 Vision Transformers (ViT) for tumor detection and 🎯 UNet for precise pixel-level segmentation from MRI scans. Features ⚡ smart conditional inference, 🔬 advanced preprocessing, 📊 90k+ augmented MRI samples, and 🌐 full-stack deployment with real-time inference ✨.
