Brain Tumor Segmentation and Classification using UNet and Vision Transformer
A deep learning framework for automated brain tumor detection, segmentation, and classification from MRI scans
- 📊 Dataset & Training
- 🏗️ Architecture
- 📈 Results & Visualizations
- 🚀 Quick Start
- ✨ Features
- 📂 Project Structure
- 🛠 Tech Stack
- 📦 Dependencies
- 🤝 Contributing
- 📜 License
- 🛡 Security
- Abstract
- Key Highlights
- Dataset & Training Details
- Architecture
- Methodology
- Results & Visualizations
- Quick Start
- Features
- Project Structure
- Documentation
- API Endpoints
- Model Setup & Training
- Configuration
- Development
- Deployment
- Tech Stack
- Dependencies
- Contributing
- License
- Security
- Code of Conduct
- Citation
- Contact
- Acknowledgments
Brain tumor diagnosis is a critical task in medical imaging that requires accurate detection, precise segmentation, and reliable classification. BTSC-UNet-ViT presents an end-to-end deep learning pipeline that combines the strengths of Vision Transformer (ViT) for robust classification and UNet for precise pixel-level segmentation. Our system processes MRI scans through a sophisticated preprocessing pipeline, first classifies tumors into four categories (no tumor, glioma, meningioma, and pituitary tumor), and then conditionally performs segmentation only when a tumor is detected.
The framework has been trained on 90,000+ brain MRI images with extensive data augmentation, achieving state-of-the-art performance in both segmentation and classification tasks. The system is deployed as a full-stack web application with a FastAPI backend and React frontend, providing real-time inference and visualization capabilities.
- 🎯 Intelligent Pipeline: ViT classification first, conditional segmentation only when tumor detected
- 📊 Large-scale Training: Trained on 90,000+ images across 4 tumor classes
- 🔬 Medical-grade Preprocessing: Advanced edge-preserving denoising and contrast enhancement
- 🏗️ Hybrid Architecture: ViT for classification + UNet for conditional segmentation
- 🚀 Production-ready: Full-stack web application with RESTful API
- 📈 High Performance: State-of-the-art accuracy with real-time inference
- 🔄 Extensive Augmentation: Random rotation, flipping, color jitter, and affine transformations
- 💻 GPU Optimized: Mixed precision training and efficient inference
Our model is trained on a comprehensive dataset of 90,000+ brain MRI images distributed across four classes:
| Class | Description | Training Strategy |
|---|---|---|
| No Tumor | Healthy brain scans without tumor presence | Baseline class for binary detection |
| Glioma | Malignant glial cell tumors | Most aggressive tumor type |
| Meningioma | Tumors arising from meninges | Most common benign brain tumor |
| Pituitary | Tumors in the pituitary gland | Affects hormone production |
To improve model robustness and generalization, we employ comprehensive data augmentation:
ViT Classification Augmentation:
- ✅ Random Horizontal Flip (p=0.5)
- ✅ Random Rotation (±15 degrees)
- ✅ Color Jitter (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)
- ✅ Random Affine Transformation (translation=±10%)
- ✅ ImageNet Normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
UNet Segmentation Training:
- BraTS dataset with paired images and ground-truth masks
- Binary Cross-Entropy loss with Logits
- Adam optimizer with learning rate scheduling
Vision Transformer (ViT):
- Base Model: `vit_base_patch16_224` (timm pretrained)
- Input Size: 224×224 RGB
- Batch Size: 32 (adaptive based on GPU memory)
- Optimizer: AdamW (lr=1e-4, weight_decay=0.01)
- Scheduler: ReduceLROnPlateau (patience=5, factor=0.5)
- Epochs: 50-100 with early stopping (patience=10)
- Mixed Precision: Enabled for faster training
- Hardware: NVIDIA GPU with CUDA support
UNet Segmentation:
- Architecture: 5-level encoder-decoder with skip connections
- Input: 1-channel grayscale (256×256 or adaptive)
- Output: Binary tumor mask
- Loss: Binary Cross-Entropy
- Dataset: BraTS + custom annotations
┌─────────────────────────────────────────────────────────────────────────────┐
│ Frontend (React + TypeScript) │
│ Modern UI with Real-time Visualization │
└─────────────────────────────────┬───────────────────────────────────────────┘
│ HTTP/REST API
┌─────────────────────────────────▼───────────────────────────────────────────┐
│ Backend (FastAPI + Python) │
│ RESTful API with OpenAPI Documentation │
└─────────────────────────────────┬───────────────────────────────────────────┘
│
┌─────────────────────────────────▼───────────────────────────────────────────┐
│ Preprocessing Pipeline │
│ Grayscale → Denoising → Contrast Enhancement → Normalization → Sharpening │
└─────────────────────────────────┬───────────────────────────────────────────┘
│
┌─────────────────────────────────▼───────────────────────────────────────────┐
│ Vision Transformer (ViT) Classification │
│ • Pretrained on ImageNet │
│ • Fine-tuned on 90k MRIs │
│ • 4-class Classification │
│ • Patch-based Attention │
└─────────────────────────────────┬───────────────────────────────────────────┘
│
┌─────────┴─────────┐
│ │
No Tumor Tumor Type
(Classification) (glioma/meningioma/pituitary)
│ │
Pipeline Ends │
│ │
│ ┌───────────▼────────────────┐
│ │ UNet Segmentation Model │
│ │ • 5-level Encoder-Decoder │
│ │ • Skip Connections │
│ │ • Binary Mask Output │
│ │ • BCE Loss │
│ └───────────┬────────────────┘
│ │
└─────────┬─────────┘
│
┌─────────────────────────────────▼───────────────────────────────────────────┐
│ Results Aggregation │
│ Classification + Confidence + Conditional Tumor Segmentation │
└─────────────────────────────────────────────────────────────────────────────┘
Input MRI Scan (DICOM/PNG/JPG)
│
├─► 1. Preprocessing
│ ├─ Grayscale conversion
│ ├─ Non-local means denoising
│ ├─ Motion artifact reduction
│ ├─ CLAHE contrast enhancement
│ ├─ Unsharp mask edge sharpening
│ └─ Z-score normalization
│
├─► 2. ViT Classification
│ ├─ Input: 224×224 preprocessed RGB
│ ├─ Patch embedding (16×16 patches)
│ ├─ Transformer encoder (12 layers)
│ ├─ Classification head
│ └─ Output: [no_tumor, glioma, meningioma, pituitary] + confidence
│
│ Decision Point: Is it a tumor?
│ │
│ ├─► If "no_tumor" → Pipeline Ends
│ │ └─ Return: Classification results only
│ │
│ └─► If tumor type detected (glioma/meningioma/pituitary) → Continue
│ │
│ ├─► 3. UNet Segmentation
│ │ ├─ Input: 256×256 grayscale
│ │ ├─ Forward pass through encoder-decoder
│ │ ├─ Output: Binary tumor mask
│ │ └─ Overlay generation for visualization
│ │
│ └─► 4. Results
│ ├─ Tumor presence: Yes
│ ├─ Tumor type: Class name (glioma/meningioma/pituitary)
│ ├─ Confidence: Probability distribution
│ ├─ Segmentation mask: Binary mask
│ └─ Visualization: Original + Overlay + Mask
Our preprocessing pipeline is specifically designed for brain MRI images:
- Grayscale Conversion: Reduces computational complexity while preserving structural information
- Non-Local Means Denoising: Preserves edges while reducing noise
- Motion Artifact Reduction: Minimal edge-preserving filtering
- CLAHE Enhancement: Contrast Limited Adaptive Histogram Equalization for local contrast
- Edge Sharpening: Unsharp mask to enhance tumor boundaries
- Normalization: Z-score normalization for consistent input distribution
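The last two stages, edge sharpening and z-score normalization, can be sketched in a few lines of NumPy/SciPy. This is an illustrative version; the project's actual pipeline lives in `backend/app/utils/preprocessing.py`:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(img: np.ndarray, sigma: float = 1.0, amount: float = 1.0) -> np.ndarray:
    """Sharpen edges by adding back the high-frequency residual."""
    img = img.astype(np.float32)
    blurred = gaussian_filter(img, sigma=sigma)
    return img + amount * (img - blurred)

def zscore_normalize(img: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score normalization: zero mean, unit variance."""
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + eps)

# Example: sharpen then normalize one grayscale MRI slice
slice_ = np.random.rand(256, 256).astype(np.float32)
out = zscore_normalize(unsharp_mask(slice_))
```

Normalizing last keeps the model's input distribution consistent regardless of how the earlier enhancement stages shift intensities.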
Model Architecture:
- Base Model: `vit_base_patch16_224` pretrained on ImageNet-21k
- Patch Size: 16×16 pixels
- Embedding Dimension: 768
- Attention Heads: 12
- Transformer Layers: 12
- Classification Head: Custom 4-class output layer
Transfer Learning Strategy:
- Initialize with ImageNet pretrained weights
- Replace classification head for 4 tumor classes
- Fine-tune on 90,000+ brain MRI images
- Apply extensive data augmentation
- Use class-balanced sampling for imbalanced classes
Pipeline Integration:
- ViT classification is performed first after preprocessing
- If ViT predicts "no_tumor", the pipeline terminates (no segmentation needed)
- If ViT detects a tumor type (glioma, meningioma, or pituitary), the pipeline continues to segmentation
Architecture Design:
- Encoder: 5 downsampling blocks with max pooling
- Bottleneck: Feature compression at lowest resolution
- Decoder: 5 upsampling blocks with transposed convolutions
- Skip Connections: Concatenate encoder features to decoder for spatial precision
- Output: Sigmoid activation for binary tumor mask
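The encoder–decoder design above can be written compactly in PyTorch. This is an illustrative reimplementation, not the project's exact model in `backend/app/models/unet_tumor/model.py`; the channel widths are assumptions:

```python
import torch
from torch import nn

def conv_block(c_in, c_out):
    """Two 3x3 convolutions with BatchNorm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    """Compact UNet: 5-level encoder with max pooling, decoder with
    transposed convolutions, skip connections via channel concatenation."""
    def __init__(self, in_ch=1, out_ch=1, widths=(32, 64, 128, 256, 512)):
        super().__init__()
        self.encoders = nn.ModuleList()
        c = in_ch
        for w in widths:
            self.encoders.append(conv_block(c, w))
            c = w
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(widths[-1], widths[-1] * 2)
        self.upconvs = nn.ModuleList()
        self.decoders = nn.ModuleList()
        c = widths[-1] * 2
        for w in reversed(widths):
            self.upconvs.append(nn.ConvTranspose2d(c, w, 2, stride=2))
            self.decoders.append(conv_block(w * 2, w))  # *2 for the skip concat
            c = w
        self.head = nn.Conv2d(widths[0], out_ch, 1)  # logits; sigmoid at inference

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)       # saved for the skip connection
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.upconvs, self.decoders, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))
        return self.head(x)

mask_logits = UNet()(torch.randn(1, 1, 256, 256))
```

Emitting logits and applying sigmoid only at inference pairs naturally with the BCE-with-logits loss used for training.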
Training Strategy:
- Loss: Binary Cross-Entropy with Logits
- Optimizer: Adam with learning rate decay
- Data: BraTS dataset with expert-annotated masks
- Augmentation: Rotation, flipping, elastic deformation
Conditional Execution:
- Segmentation is only performed when a tumor is detected by ViT classification
- This improves efficiency by skipping unnecessary segmentation for healthy brains
- Provides more clinically relevant workflow: classify first, then localize if needed
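The classify-then-segment control flow reduces to a small branch. A sketch with stub models (the real orchestration lives in `backend/app/services/pipeline_service.py`):

```python
# Illustrative control flow: segmentation runs only when the classifier
# predicts a tumor class. The callables here are stand-ins for the models.
def run_pipeline(image, classify, segment):
    """classify(image) -> (label, confidence); segment(image) -> mask."""
    label, confidence = classify(image)
    result = {"predicted_class": label, "confidence": confidence, "mask": None}
    if label != "notumor":          # conditional segmentation
        result["mask"] = segment(image)
    return result

# Stub models for demonstration
healthy = run_pipeline("scan_a", classify=lambda img: ("notumor", 0.99),
                       segment=lambda img: "mask")
tumor = run_pipeline("scan_b", classify=lambda img: ("glioma", 0.95),
                     segment=lambda img: "mask")
```

For a healthy scan the segmenter is never invoked, which is where the efficiency gain on no-tumor inputs comes from.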
- Backend: FastAPI for async processing and automatic API documentation
- Frontend: React 19 with TypeScript for type safety
- Storage: Local file system with configurable paths
- Logging: Structured logging at every pipeline stage
- API: RESTful endpoints for health checks, preprocessing, segmentation, classification, and full inference
Figure 1: Vision Transformer architecture showing patch embedding, transformer encoder layers, and classification head.
ViT Architecture Highlights:
- 🔍 Patch-based image encoding (16×16 patches)
- 🧠 12-layer transformer encoder with multi-head attention
- 📊 Position embeddings for spatial information
- 🎯 Classification head for 4-class tumor detection
Figure 2: UNet architecture with encoder-decoder structure and skip connections for precise segmentation.
UNet Architecture Highlights:
- ⬇️ 5-level encoder with max pooling
- ⬆️ 5-level decoder with transposed convolutions
- 🔗 Skip connections preserve spatial information
- 🎭 Binary mask output for tumor localization
Pipeline Components:
- 📥 Input processing and normalization
- 🔬 Preprocessing with denoising and enhancement
- 🤖 ViT classification for tumor detection
- 🎯 Conditional UNet segmentation
- 📊 Results visualization and metrics
ViT Training Highlights:
- 📈 Steady improvement reaching 99.26% validation accuracy
- 📉 Consistent decrease in loss without overfitting
- ⚖️ Well-balanced learning curves indicating excellent generalization
- ✅ Early stopping at epoch 15 with best validation performance
UNet Training Highlights:
- 📈 Progressive improvement in Dice score to 94.14%
- 🎯 Validation accuracy of 99.77%
- 💪 Robust segmentation performance across diverse tumor types
- ✅ Stable training with consistent validation metrics
Figure 6: Multi-stage preprocessing showing grayscale conversion, denoising, contrast enhancement, and normalization.
Preprocessing Stages:
- 🖼️ Grayscale conversion for consistency
- 🔍 Non-local means denoising
- ✨ CLAHE contrast enhancement
- 📏 Z-score normalization
- 🎨 Edge sharpening for tumor boundaries
Figure 7: UNet segmentation results showing original MRI, predicted tumor mask, and overlay visualization.
Segmentation Capabilities:
- ✅ Accurate boundary detection
- ✅ Minimal false positives
- ✅ Robust to varying tumor sizes and locations
- ✅ Clear visualization with color-coded overlays
Classification Features:
- 🎯 High confidence predictions (99%+ accuracy)
- 📊 Probability distribution across all classes
- 🏷️ Clear tumor type identification
- ✅ Reliable confidence scores
Figure 10: Final validation results at epoch 100, demonstrating consistent performance across all tumor classes.
Validation Highlights:
- 🎯 High accuracy across all classes (99%+)
- 🔍 Correct identification of subtle tumors
- 💯 Strong performance on edge cases
- 📊 Balanced predictions without class bias
| Metric | UNet Segmentation | ViT Classification |
|---|---|---|
| Accuracy | 99.77% | 99.26% |
| Precision | 94.14%+ | 99.0%+ |
| Recall | 94.14%+ | 99.0%+ |
| F1-Score/Dice | 94.14% | 99.0%+ |
| Inference Time | ~150ms | ~80ms |
Note: Metrics evaluated on held-out validation set. UNet Dice score represents segmentation quality, while ViT metrics represent classification performance.
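For reference, the Dice score reported for UNet measures mask overlap and can be computed in a few lines of NumPy (an illustrative sketch; the project's evaluation code is in `backend/app/utils/metrics.py`):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

a = np.zeros((4, 4), dtype=bool); a[:2, :2] = True   # 4 positive pixels
b = np.zeros((4, 4), dtype=bool); b[:2, :4] = True   # 8 positive pixels, 4 overlap
# dice = 2*4 / (4 + 8) = 0.666...
```

Unlike plain pixel accuracy, Dice is insensitive to the large tumor-free background, which is why it is the headline segmentation metric here.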
- Python: 3.10 or higher
- Node.js: 16+ with npm
- CUDA: Optional but recommended for GPU acceleration
- RAM: Minimum 8GB (16GB recommended)
- Storage: 5GB for models and dependencies
- Clone the repository:

```bash
git clone https://github.com/H0NEYP0T-466/BTSC-UNet-ViT.git
cd BTSC-UNet-ViT
```

- Create and activate virtual environment:

```bash
cd backend
python -m venv venv
# Linux/Mac
source venv/bin/activate
# Windows
venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Configure environment (optional):

```bash
cp .env.example .env
# Edit .env with your custom paths if needed
```

- Run the server:

```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

✅ Backend API: http://localhost:8000
📚 API Documentation: http://localhost:8000/docs
- Navigate to project root:

```bash
cd ..  # From backend directory
```

- Install dependencies:

```bash
npm install
```

- Create environment file:

```bash
echo "VITE_API_URL=http://localhost:8000" > .env
```

- Run development server:

```bash
npm run dev
```

✅ Frontend Application: http://localhost:5173
Upload a brain MRI image through the web interface and watch the magic happen! 🧠✨
- Advanced Preprocessing: Edge-preserving denoising, CLAHE enhancement, motion artifact reduction
- Multi-stage Pipeline: Structured workflow from raw MRI to final diagnosis
- Quality Assurance: Validation at each processing stage
- UNet Segmentation:
- 5-level encoder-decoder architecture
- Skip connections for spatial precision
- Binary mask output for tumor localization
- Trained on BraTS dataset
- Vision Transformer Classification:
- Pretrained on ImageNet-21k (14M images)
- Fine-tuned on 90,000+ brain MRI images
- Patch-based attention mechanism
- 4-class tumor type identification
- GPU Acceleration: CUDA support for fast inference
- Mixed Precision Training: FP16 for faster training and lower memory usage
- Batch Processing: Efficient dataset preprocessing service
- Real-time Inference: Sub-second response time for single images
- Async API: Non-blocking operations with FastAPI
Backend Features:
- ✅ RESTful API with automatic OpenAPI documentation
- ✅ Structured logging at every pipeline stage
- ✅ Configurable preprocessing parameters
- ✅ Multiple inference endpoints (preprocess, segment, classify, full pipeline)
- ✅ Health check and monitoring endpoints
- ✅ CORS support for cross-origin requests
- ✅ File upload handling with validation
- ✅ Artifact storage and management
Frontend Features:
- ✅ Modern dark theme UI (#111 background, #00C2FF accent)
- ✅ Drag-and-drop file upload
- ✅ Real-time processing visualization
- ✅ Multi-stage result display
- ✅ Final image indicator (shows input to models)
- ✅ Confidence score visualization
- ✅ Responsive design (mobile, tablet, desktop)
- ✅ No external CSS frameworks (clean component-based CSS)
- ✅ TypeScript for type safety
- Timestamp-based log entries
- Contextual information (image_id, path, stage)
- Multiple log levels (DEBUG, INFO, WARNING, ERROR)
- Console and file output
- Structured JSON format for parsing
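A structured JSON log entry of this shape can be produced with a small stdlib formatter. This is an illustrative sketch; the project's actual setup is in `backend/app/logging_config.py`:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Contextual fields (image_id, path, stage) arrive via `extra=`
        for key in ("image_id", "path", "stage"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)

logger = logging.getLogger("btsc")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("preprocessing complete", extra={"image_id": "xxx", "stage": "denoise"})
```

One-JSON-object-per-line output is what makes the logs machine-parseable, e.g. with `jq` or a log aggregator.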
- ✅ Security patched dependencies (PyTorch 2.6.0+, MONAI 1.5.1+)
- ✅ Input validation and sanitization
- ✅ Error handling and graceful degradation
- ✅ Resource cleanup and memory management
- ✅ Comprehensive test coverage
BTSC-UNet-ViT/
├── backend/ # Backend API and ML models
│ ├── app/
│ │ ├── main.py # FastAPI application entry point
│ │ ├── config.py # Configuration and settings
│ │ ├── logging_config.py # Structured logging setup
│ │ ├── routers/ # API endpoint definitions
│ │ │ ├── health.py # Health check endpoint
│ │ │ ├── preprocessing.py # Preprocessing API
│ │ │ ├── segmentation.py # Segmentation API
│ │ │ └── classification.py # Classification API
│ │ ├── models/ # Deep learning models
│ │ │ ├── unet_tumor/ # UNet tumor segmentation
│ │ │ │ ├── model.py # UNet architecture
│ │ │ │ ├── train_unet_tumor.py # Training script
│ │ │ │ ├── infer_unet_tumor.py # Inference wrapper
│ │ │ │ ├── datamodule.py # Dataset and data loading
│ │ │ │ └── utils.py # Helper functions
│ │ │ └── vit/ # Vision Transformer
│ │ │ ├── model.py # ViT architecture
│ │ │ ├── train_vit.py # Training script
│ │ │ ├── infer_vit.py # Inference wrapper
│ │ │ ├── datamodule.py # Dataset and transforms
│ │ │ └── utils.py # Helper functions
│ │ ├── services/ # Business logic layer
│ │ │ ├── pipeline_service.py # End-to-end pipeline
│ │ │ ├── storage_service.py # File management
│ │ │ └── dataset_service.py # Batch processing (90k images)
│ │ ├── utils/ # Utility modules
│ │ │ ├── btsc_preprocess/ # BTSC preprocessing modules
│ │ │ │ ├── contrast.py # Contrast enhancement
│ │ │ │ ├── deblur.py # Deblurring algorithms
│ │ │ │ ├── detect.py # Detection utilities
│ │ │ │ └── noise.py # Noise reduction
│ │ │ ├── preprocessing.py # Main preprocessing pipeline
│ │ │ ├── imaging.py # Image I/O operations
│ │ │ ├── metrics.py # Evaluation metrics
│ │ │ └── logger.py # Logging utilities
│ │ ├── schemas/ # Pydantic models
│ │ │ ├── requests.py # Request schemas
│ │ │ └── responses.py # Response schemas
│ │ └── resources/ # Model artifacts
│ │ ├── checkpoints/ # Saved model weights
│ │ │ └── vit/ # ViT checkpoints
│ │ └── examples/ # Example images
│ ├── tests/ # Unit and integration tests
│ ├── requirements.txt # Python dependencies
│ └── README.md # Backend documentation
│
├── src/ # Frontend React application
│ ├── components/ # Reusable React components
│ │ ├── ErrorBoundary.tsx # Error boundary wrapper
│ │ ├── Footer/ # Footer component
│ │ ├── Header/ # Header component
│ │ ├── ImagePreview/ # Image preview display
│ │ ├── PredictionCard/ # Classification results card
│ │ ├── PreprocessedGallery/ # Preprocessing stages gallery
│ │ ├── SegmentationOverlay/ # Segmentation visualization
│ │ └── UploadCard/ # File upload component
│ ├── pages/ # Page components
│ │ └── HomePage.tsx # Main application page
│ ├── services/ # API client
│ │ ├── api.ts # Axios API wrapper
│ │ └── types.ts # TypeScript type definitions
│ ├── theme/ # Styling
│ │ ├── global.css # Global styles
│ │ └── variables.css # CSS custom properties
│ └── assets/ # Static assets
│
├── visuals/ # Research visualizations
│ ├── Unet_Architecture.png # UNet architecture diagram
│ ├── Vit_Architecture.png # Vision Transformer architecture diagram
│ ├── pipeline.png # Complete pipeline visualization
│ ├── preprocessing.PNG # Preprocessing stages
│ ├── segmentation.PNG # Segmentation results
│ ├── classification.PNG # Classification results
│ ├── training_curves_vit.png # ViT training metrics
│ ├── training_curves_Unet.png # UNet training metrics
│ ├── val_vis_epoch_5.png # Early validation results
│ └── val_vis_epoch_100.png # Final validation results
│
├── train_vit_colab.py # Google Colab ViT training
├── train_unet_tumor_colab.py # Google Colab UNet tumor training
│
├── package.json # Node.js dependencies
├── tsconfig.json # TypeScript configuration
├── vite.config.ts # Vite build configuration
├── eslint.config.js # ESLint configuration
│
├── .env.example # Environment variables template
├── .gitignore # Git ignore patterns
├── LICENSE # MIT License
├── README.md # This file
├── CONTRIBUTING.md # Contribution guidelines
├── CODE_OF_CONDUCT.md # Community standards
└── SECURITY.md # Security policies
- Backend README - API details, training procedures, deployment guide
- Security Policy - Security updates and vulnerability disclosure
- Contributing Guidelines - How to contribute to the project
- Code of Conduct - Community standards and expectations
- Start both backend and frontend (see Quick Start)
- Open the web application: http://localhost:5173
- Upload an MRI image:
  - Click the upload area or drag-and-drop an image
  - Supported formats: PNG, JPG, JPEG, DICOM
  - Recommended: brain MRI scans (axial view)
- View results:
  - 📊 Preprocessing stages: Original → Grayscale → Denoised → Enhanced → Final
  - 🎯 Segmentation: tumor mask and overlay on the original image
  - 🏷️ Classification: tumor type and confidence scores
  - 📈 Probability distribution: bar chart of all class probabilities
Full Pipeline Inference:

```python
import requests

url = "http://localhost:8000/api/inference"
with open("brain_mri.png", "rb") as f:
    response = requests.post(url, files={"file": f})
results = response.json()

print(f"Tumor Type: {results['classification']['predicted_class']}")
print(f"Confidence: {results['classification']['confidence']:.2%}")
print(f"Tumor Detected: {results['segmentation']['tumor_detected']}")
```

Individual Stage Processing:

```python
# Preprocessing only
response = requests.post("http://localhost:8000/api/preprocess", files=files)

# Segmentation only
response = requests.post("http://localhost:8000/api/segment", files=files)

# Classification only
response = requests.post("http://localhost:8000/api/classify", files=files)
```

```bash
# Full inference
curl -X POST "http://localhost:8000/api/inference" \
  -F "file=@brain_mri.png" \
  -o results.json

# View results
cat results.json | jq '.classification'
```

For processing multiple images:

```python
import json
from pathlib import Path

import requests

image_dir = Path("path/to/mri/images")
results = []

for image_path in image_dir.glob("*.png"):
    with open(image_path, "rb") as f:
        response = requests.post(
            "http://localhost:8000/api/inference", files={"file": f}
        )
    results.append({
        "filename": image_path.name,
        "results": response.json(),
    })

# Save batch results
with open("batch_results.json", "w") as f:
    json.dump(results, f, indent=2)
```

GET /api/health
Check API server status and model availability.
Response:

```json
{
  "status": "healthy",
  "models": {
    "unet": "loaded",
    "vit": "loaded"
  },
  "version": "1.0.0"
}
```

POST /api/inference
Complete end-to-end inference: preprocessing → segmentation → classification.
Request:

```bash
curl -X POST "http://localhost:8000/api/inference" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@brain_mri.png"
```

Response:

```json
{
  "preprocessing": {
    "original_url": "/artifacts/original_xxx.png",
    "grayscale_url": "/artifacts/grayscale_xxx.png",
    "denoised_url": "/artifacts/denoised_xxx.png",
    "enhanced_url": "/artifacts/enhanced_xxx.png",
    "final_url": "/artifacts/final_xxx.png"
  },
  "segmentation": {
    "mask_url": "/artifacts/mask_xxx.png",
    "overlay_url": "/artifacts/overlay_xxx.png",
    "tumor_detected": true,
    "tumor_area_percentage": 12.5
  },
  "classification": {
    "predicted_class": "glioma",
    "confidence": 0.9534,
    "probabilities": {
      "notumor": 0.0123,
      "glioma": 0.9534,
      "meningioma": 0.0234,
      "pituitary": 0.0109
    }
  },
  "processing_time": 0.856
}
```

POST /api/preprocess
Apply preprocessing pipeline without segmentation or classification.
Request:

```bash
curl -X POST "http://localhost:8000/api/preprocess" \
  -F "file=@brain_mri.png"
```

Response:

```json
{
  "stages": {
    "original": "/artifacts/original_xxx.png",
    "grayscale": "/artifacts/grayscale_xxx.png",
    "brain_extracted": "/artifacts/extracted_xxx.png",
    "denoised": "/artifacts/denoised_xxx.png",
    "contrast_enhanced": "/artifacts/enhanced_xxx.png",
    "sharpened": "/artifacts/sharpened_xxx.png",
    "normalized": "/artifacts/normalized_xxx.png"
  },
  "final_image": "/artifacts/final_xxx.png",
  "processing_time": 0.234
}
```

POST /api/segment
Segment tumor region using UNet (applies preprocessing automatically).
Request:

```bash
curl -X POST "http://localhost:8000/api/segment" \
  -F "file=@brain_mri.png"
```

Response:

```json
{
  "mask_url": "/artifacts/mask_xxx.png",
  "overlay_url": "/artifacts/overlay_xxx.png",
  "segmented_tumor_url": "/artifacts/segmented_xxx.png",
  "tumor_detected": true,
  "tumor_area_pixels": 3456,
  "tumor_area_percentage": 12.5,
  "processing_time": 0.352
}
```

POST /api/classify
Classify tumor type using ViT (applies preprocessing automatically).
Request:

```bash
curl -X POST "http://localhost:8000/api/classify" \
  -F "file=@brain_mri.png"
```

Response:

```json
{
  "predicted_class": "meningioma",
  "confidence": 0.9234,
  "probabilities": {
    "notumor": 0.0145,
    "glioma": 0.0421,
    "meningioma": 0.9234,
    "pituitary": 0.0200
  },
  "logits": [-4.2, -3.1, 2.5, -3.9],
  "processing_time": 0.123
}
```

Visit http://localhost:8000/docs for:
- 📖 Complete API documentation
- 🧪 Interactive endpoint testing
- 📋 Request/response schemas
- 🔧 Example curl commands
Dataset Preparation:
- Prepare your BraTS dataset with .h5 files
- Each .h5 file should contain paired images and masks
- Place files in `backend/dataset/UNet_Dataset/`

Training Command:

```bash
cd backend
python -m app.models.unet_tumor.train_unet_tumor
```

Training Features:
- ✅ Automatic train/validation split (80/20)
- ✅ Data augmentation (rotation, flipping, elastic deformation)
- ✅ Learning rate scheduling
- ✅ Checkpoint saving (best and last)
- ✅ Real-time metrics logging
- ✅ Visualization of training progress

Output:
- Best model: `backend/app/resources/checkpoints/unet_tumor/unet_tumor_best.pth`
- Last checkpoint: `backend/app/resources/checkpoints/unet_tumor/unet_tumor_last.pth`
Dataset Preparation:
- Organize images into class folders:

```
backend/dataset/Vit_Dataset/
├── notumor/       # Healthy brain scans
├── glioma/        # Glioma tumor images
├── meningioma/    # Meningioma tumor images
└── pituitary/     # Pituitary tumor images
```

- Ensure sufficient images per class (recommended: 10,000+ per class)

Training Command:

```bash
cd backend
python -m app.models.vit.train_vit
```

Training Features:
- ✅ Transfer learning from ImageNet pretrained ViT
- ✅ Data augmentation (rotation, flip, color jitter, affine)
- ✅ Class-balanced sampling for imbalanced datasets
- ✅ AdamW optimizer with weight decay
- ✅ ReduceLROnPlateau scheduler (adaptive learning rate)
- ✅ Early stopping to prevent overfitting
- ✅ Mixed precision training (FP16)
- ✅ Gradient clipping for stability
- ✅ Real-time progress visualization
- ✅ Confusion matrix and classification report

Google Colab Training: for training on Google Colab with a free GPU:

```bash
!python train_vit_colab.py --epochs 50 --batch_size 32 --lr 1e-4
```

See train_vit_colab.py for detailed Colab instructions.

Output:
- Best model: `backend/app/resources/checkpoints/vit/vit_best.pth`
- Last checkpoint: `backend/app/resources/checkpoints/vit/vit_last.pth`
- Training curves: saved automatically during training

Training Configuration:

```python
# Typical hyperparameters
EPOCHS = 100                     # typically 50-100 with early stopping
BATCH_SIZE = 32                  # adjust based on GPU memory
LEARNING_RATE = 1e-4
IMAGE_SIZE = 224                 # 224x224 input
TRAIN_SPLIT = 0.8
EARLY_STOPPING_PATIENCE = 10
```

For batch preprocessing of the entire 90k image dataset:
```bash
cd backend
python -c "from app.services.dataset_service import get_dataset_service; \
get_dataset_service().preprocess_and_segment_dataset()"
```

This will:
- ✅ Preprocess all images in the dataset
- ✅ Generate segmentation masks with UNet
- ✅ Save processed images and masks
- ✅ Create organized output structure
- ✅ Log progress and statistics
Backend environment (backend/.env):

```env
DATASET_ROOT=X:/file/FAST_API/BTSC-UNet-ViT/dataset
BRATS_ROOT=X:/data/BraTS
LOG_LEVEL=INFO
```

Frontend environment (.env):

```env
VITE_API_URL=http://localhost:8000
```

Run the backend tests:

```bash
cd backend
pytest tests/
```

Linting:

```bash
# Frontend
npm run lint

# Backend
pip install flake8
flake8 app/
```

Production backend (multiple workers):

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
```

Production frontend build:

```bash
npm run build
# Serve dist/ with nginx or any static server
```

PyTorch Ecosystem:
- PyTorch 2.6.0+ - Core deep learning framework with CUDA acceleration
- torchvision 0.21.0 - Computer vision models and transforms
- timm 1.0.12 - PyTorch Image Models (pretrained ViT)
- Mixed Precision Training - FP16 for faster training and lower memory
Medical Imaging AI:
- MONAI 1.5.1+ - Medical Open Network for AI toolkit
- SimpleITK 2.3.1 - Medical image processing
- nibabel 5.3.2 - Neuroimaging file I/O (DICOM, NIfTI)
Web Framework:
- FastAPI 0.115.5 - Modern async web framework
- Uvicorn 0.32.1 - Lightning-fast ASGI server
- Pydantic 2.10.3 - Data validation using type hints
- python-multipart 0.0.20 - Multipart form data handling
Image Processing:
- scikit-image 0.24.0 - Image processing algorithms (CLAHE, denoising)
- OpenCV 4.10.0.84 - Computer vision operations
- Pillow 11.0.0 - Image file handling
- albumentations 1.4.23 - Advanced data augmentation
Scientific Computing:
- NumPy 1.26.4 - Numerical computing
- SciPy 1.14.1 - Scientific computing
- scikit-learn 1.5.2 - Machine learning utilities
- matplotlib 3.9.3 - Visualization and plotting
Core Technologies:
- React 19.2.0 - Modern UI library with concurrent features
- TypeScript 5.9.3 - Type-safe JavaScript
- Vite 7.2.4 - Lightning-fast build tool and dev server
Development Tools:
- ESLint 9.39.1 - Code quality and style enforcement
- typescript-eslint 8.46.4 - TypeScript-specific linting
- Axios 1.13.2 - HTTP client for API communication
Version Control:
- Git for source control
- GitHub for repository hosting and CI/CD
Testing & Quality:
- pytest for backend testing
- Type checking with TypeScript
- Code style enforcement with ESLint
Deployment:
- Uvicorn with multiple workers for production
- Static file serving for frontend
- Configurable CORS for API access
- PyTorch upgraded to 2.6.0+ (addressing CVE-2024-XXXXX)
- MONAI upgraded to 1.5.1+ (security patches)
- Regular dependency audits
- See SECURITY.md for vulnerability disclosure
Best Practices:
- Input validation and sanitization
- Secure file upload handling
- CORS configuration
- Environment-based secrets management
- Structured logging for audit trails
This project uses the following dependencies across frontend and backend:
Frontend Runtime Dependencies
- axios ^1.13.2 - Promise-based HTTP client for API requests
- react ^19.2.0 - Core React library for building UI components
- react-dom ^19.2.0 - React DOM bindings
Frontend Dev Dependencies
- @eslint/js ^9.39.1 - ESLint JavaScript plugin
- @types/node ^24.10.4 - TypeScript definitions for Node.js
- @types/react ^19.2.5 - TypeScript definitions for React
- @types/react-dom ^19.2.3 - TypeScript definitions for React DOM
- @vitejs/plugin-react ^5.1.1 - Vite plugin for React
- eslint ^9.39.1 - JavaScript/TypeScript linter
- eslint-plugin-react-hooks ^7.0.1 - ESLint rules for React Hooks
- eslint-plugin-react-refresh ^0.4.24 - ESLint rules for React Fast Refresh
- globals ^16.5.0 - Global variable definitions
- typescript ~5.9.3 - TypeScript compiler
- typescript-eslint ^8.46.4 - ESLint parser and plugin for TypeScript
- vite ^7.2.4 - Build tool and dev server
Backend Runtime Dependencies
- fastapi 0.115.5 - Modern web framework for building APIs
- uvicorn[standard] 0.32.1 - ASGI server for FastAPI
- pydantic 2.10.3 - Data validation using Python type hints
- pydantic-settings 2.6.1 - Settings management with Pydantic
- python-multipart 0.0.20 - Multipart form data parser
- numpy 1.26.4 - Fundamental package for scientific computing
- scipy 1.14.1 - Scientific computing and technical computing
- scikit-image 0.24.0 - Image processing algorithms
- scikit-learn 1.5.2 - Machine learning library
- opencv-python 4.10.0.84 - Computer vision library
- pillow 11.0.0 - Python Imaging Library
- torch 2.6.0 - PyTorch deep learning framework (security patched)
- torchvision 0.21.0 - PyTorch vision library
- timm 1.0.12 - PyTorch Image Models (pretrained models)
- monai 1.5.1 - Medical imaging AI framework (security patched)
- matplotlib 3.9.3 - Plotting library
- albumentations 1.4.23 - Image augmentation library
- tqdm 4.67.1 - Progress bar library
- python-dotenv 1.0.1 - Environment variable management
- h5py 3.11.0 - HDF5 file format interface
- nibabel 5.3.2 - Neuroimaging file I/O
- SimpleITK 2.3.1 - Medical image processing toolkit
- PyYAML 6.0.2 - YAML parser and emitter
Contributions are welcome! Please read our CONTRIBUTING.md guidelines before submitting pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
Security is a top priority. If you discover a security vulnerability, please follow our responsible disclosure guidelines in SECURITY.md.
We are committed to providing a welcoming and inclusive environment. Please read our CODE_OF_CONDUCT.md to understand our community standards.
If you use BTSC-UNet-ViT in your research, please cite:
```bibtex
@software{btsc_unet_vit_2025,
  title     = {BTSC-UNet-ViT: Brain Tumor Segmentation and Classification
               using Hybrid Deep Learning Architecture},
  author    = {BTSC-UNet-ViT Team},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/H0NEYP0T-466/BTSC-UNet-ViT},
  note      = {Trained on 90,000+ brain MRI images across 4 tumor classes}
}
```

This work contributes to the medical imaging community through:
- Hybrid Architecture: Novel combination of UNet (segmentation) and ViT (classification)
- Large-scale Training: Comprehensive dataset of 90,000+ annotated brain MRI images
- Production-ready System: Full-stack implementation with REST API and web interface
- Open Source: Freely available for research and educational purposes
- Reproducibility: Complete training scripts and model weights
Our work builds upon:
- UNet (Ronneberger et al., 2015): Convolutional networks for biomedical image segmentation
- Vision Transformer (Dosovitskiy et al., 2021): Image classification via transformers
- BraTS Challenge: Brain tumor segmentation benchmark dataset
- timm Library: PyTorch Image Models for pretrained transformers
We welcome contributions, feedback, and collaboration! Here's how to reach us:
🐛 Report Issues: GitHub Issues
💡 Feature Requests: GitHub Discussions
🔒 Security Vulnerabilities: See SECURITY.md for responsible disclosure
👥 Maintainer: @H0NEYP0T-466
Please read our Code of Conduct before participating in the community. We are committed to providing a welcoming and inclusive environment for everyone.
Interested in contributing? Check out our Contributing Guidelines for:
- Development setup
- Coding standards
- Pull request process
- Testing requirements
If you're using BTSC-UNet-ViT for academic research or would like to collaborate:
- Please cite our work (see Citation)
- Open an issue to discuss your research
- Consider contributing your improvements back to the project
- BraTS Challenge: Brain Tumor Segmentation Challenge for providing annotated MRI scans
- ImageNet: Large-scale visual recognition challenge for pretrained ViT weights
- Medical Image Computing Community: For advancing open-source medical AI tools
- PyTorch: Deep learning framework (torch.org)
- timm (Ross Wightman): PyTorch Image Models library for pretrained ViT
- MONAI: Medical Open Network for AI toolkit
- FastAPI: Modern Python web framework
- React: Frontend library for building user interfaces
- scikit-image: Image processing algorithms
- OpenCV: Computer vision library
- Hugging Face: Model hosting and community
Special thanks to:
- Open-source contributors and maintainers
- Medical imaging researchers sharing datasets
- GitHub community for feedback and suggestions
- Early adopters and beta testers
This project is developed independently as an open-source contribution to the medical AI community. No external funding was received.
This is a research/educational project. For clinical use, proper validation, regulatory approval, and medical expert supervision are required.
Built with ❤️ for the Medical AI Community
Made by H0NEYP0T-466
⭐ Star this repository if you find it useful! ⭐



