Welcome to the SwiFT infant neurodevelopment research project! This guide will help you contribute effectively to our codebase and research efforts.
SwiFT (Swin 4D fMRI Transformer) for infant neurodevelopment is a deep learning framework that predicts early neurodevelopmental outcomes from neonatal fMRI data using the Developing Human Connectome Project (dHCP) dataset.
Key Research Goals:
- Predict Bayley-III composite scores (cognitive, language, motor) from neonatal fMRI
- Enable early intervention through risk prediction at birth
- Advance understanding of early brain development patterns
- Python 3.9.12+
- CUDA-capable GPU (tested on 8x RTX 3090)
- 32GB+ RAM recommended
- Git with LFS support
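The prerequisites above can be checked before installation with a short script. This is only a sketch: the function names are hypothetical and not part of the repository, the PyTorch check is skipped when `torch` is not installed yet, and the RAM recommendation is not checked.

```python
import shutil
import subprocess
import sys
from typing import Optional, Sequence

# Minimum interpreter version taken from the prerequisites list.
MIN_PYTHON = (3, 9, 12)


def check_python(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the interpreter is at least MIN_PYTHON."""
    return tuple(version_info[:3]) >= MIN_PYTHON


def check_git_lfs() -> bool:
    """Return True if `git lfs version` runs successfully."""
    if shutil.which("git") is None:
        return False
    result = subprocess.run(
        ["git", "lfs", "version"], capture_output=True, text=True
    )
    return result.returncode == 0


def check_cuda() -> Optional[bool]:
    """Return CUDA availability, or None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Python >= 3.9.12:", check_python())
    print("Git LFS available:", check_git_lfs())
    print("CUDA available:", check_cuda())
```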
# Clone the repository
git clone https://github.com/Transconnectome/infant-fmri.git
cd infant-fmri
# Create conda environment
conda env create -f envs/py39.yaml
conda activate py39
# Verify installation
python test/module_test_swin4d.py
# Additional dev tools
pip install black isort flake8 pytest pytest-cov jupyter
infant-fmri/
├── project/                     # Main codebase
│   ├── main.py                  # Training entry point
│   └── module/
│       ├── pl_classifier.py     # PyTorch Lightning module
│       ├── models/              # Model architectures
│       │   ├── swin4d_transformer_ver7.py
│       │   ├── patchembedding.py
│       │   └── ...
│       └── utils/               # Utility modules
│           ├── data_module.py
│           ├── data_preprocess_and_load/
│           └── ...
├── paper/                       # Synchronized manuscript
│   ├── bookchapter.tex          # Main LaTeX source
│   └── img/                     # Research figures
├── data/splits/                 # Dataset split definitions
├── pretrained_models/           # Model checkpoints
├── sample_scripts/              # Example training scripts
├── test/                        # Unit tests
├── interpretation/              # Model interpretability
└── envs/                        # Environment configs
# Create feature branch
git checkout -b feature/your-feature-name
# Install development hooks (recommended)
pre-commit install
# Run tests to ensure everything works
pytest test/
Python Code Style:
# Format code
black project/
isort project/
# Check linting
flake8 project/ --max-line-length=88 --ignore=E203,W503
Key Conventions:
- Use type hints for function signatures
- Follow PyTorch Lightning patterns for model modules
- Document complex functions with docstrings
- Keep functions focused and modular
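As an illustration of these conventions, here is a minimal sketch of a small, typed, documented helper. The function `zscore_scores` is hypothetical and does not exist in the codebase; it only shows the expected shape of a signature and docstring.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_scores(scores: Sequence[float]) -> List[float]:
    """Standardize scores to zero mean and unit variance.

    Hypothetical helper, shown only to illustrate the style conventions.

    Args:
        scores: Raw outcome scores (for example, composite scores).

    Returns:
        The scores rescaled with the population standard deviation.

    Raises:
        ValueError: If fewer than two scores are given or all are equal.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sigma for s in scores]
```

The function does one thing, states its failure modes in the docstring, and can be tested without a GPU or dataset.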
Unit Tests:
# Run all tests
pytest test/ -v
# Run with coverage
pytest test/ --cov=project/ --cov-report=html
Test Categories:
- test_model_*.py: Model architecture tests
- test_data_*.py: Data loading and preprocessing tests
- test_training_*.py: Training pipeline tests
Required Test Coverage:
- New model components: 90%+ coverage
- Data processing functions: 85%+ coverage
- Utility functions: 80%+ coverage
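A data-loading test in the `test_data_*.py` category might look like the sketch below. The helper `window_starts` is hypothetical and is defined inline only to keep the example self-contained; in practice the test would import the function under test from `project/`.

```python
from typing import List


def window_starts(n_frames: int, sequence_length: int, stride: int) -> List[int]:
    """Start indices of full-length windows over a scan with n_frames volumes.

    Hypothetical helper used only to give the tests something to exercise.
    """
    if sequence_length <= 0 or stride <= 0:
        raise ValueError("sequence_length and stride must be positive")
    if n_frames < sequence_length:
        return []
    return list(range(0, n_frames - sequence_length + 1, stride))


def test_data_window_starts_stay_in_bounds() -> None:
    starts = window_starts(n_frames=120, sequence_length=50, stride=25)
    assert starts == [0, 25, 50]
    # Every window must end inside the scan.
    assert all(start + 50 <= 120 for start in starts)


def test_data_window_starts_short_scan() -> None:
    # A scan shorter than one window yields no windows.
    assert window_starts(n_frames=30, sequence_length=50, stride=25) == []
```

Covering the normal case and the edge case (a scan that is too short) is usually what moves a data function toward the coverage targets above.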
- Create model in project/module/models/:
# example_model.py
import torch
import torch.nn as nn
from .utils import ModelUtils


class NewModelComponent(nn.Module):
    """Brief description of model component.

    Args:
        input_dim: Input dimension
        output_dim: Output dimension

    Example:
        >>> model = NewModelComponent(128, 64)
        >>> input_tensor = torch.randn(4, 128)
        >>> output = model(input_tensor)
    """

    def __init__(self, input_dim: int, output_dim: int):
        super().__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)
- Add corresponding test:
# test/test_new_model.py
import torch
import pytest
from project.module.models.example_model import NewModelComponent


def test_new_model_component():
    model = NewModelComponent(128, 64)
    x = torch.randn(4, 128)
    output = model(x)
    assert output.shape == (4, 64)
- Integration with main framework:
# Update pl_classifier.py if needed
# Update main.py argument parsing
# Update configuration files
Adding New Datasets:
- Create dataset class in project/module/utils/data_preprocess_and_load/datasets.py:
class NewDataset(BaseDataset):
    """New dataset for fMRI analysis."""

    def __init__(self, data_path: str, **kwargs):
        super().__init__(**kwargs)
        self.data_path = data_path

    def __getitem__(self, idx: int):
        # Implement data loading logic
        pass
- Add preprocessing pipeline:
# Add to preprocessing.py
def preprocess_new_dataset(input_path: str, output_path: str):
    """Preprocess new dataset to SwiFT format."""
    pass
- Update data module:
# Modify data_module.py to include new dataset
Standard Training Script:
# Basic training
python project/main.py \
--dataset_name dHCP \
--downstream_task cognitive \
--model swin4d_ver7 \
--learning_rate 0.001 \
--batch_size 8 \
--max_epochs 100 \
--devices 1
# Multi-label with ICA
python project/main.py \
--dataset_name dHCP \
--downstream_task cognitive,language,motor \
--model swin4d_ver7 \
--use_ica_features \
--learning_rate 0.0005 \
--batch_size 4 \
--max_epochs 150
Recommended Starting Points:
# Model Architecture
embed_dim = [24, 36, 48] # Start with 24
depths = [2, 2, 6, 2] # Standard configuration
num_heads = [3, 6, 12, 24] # Proportional to embed_dim
window_size = [4, 4, 4, 4] # Standard 4D windows
# Training
learning_rate = [0.001, 0.0005, 0.0001] # Start with 0.001
batch_size = [4, 8, 16] # Limited by GPU memory
sequence_length = [20, 50, 100] # 50 often optimal
Neptune Integration:
# Set up Neptune logging
export NEPTUNE_API_TOKEN="your-token"
python project/main.py \
--loggername neptune \
--project_name your-username/infant-fmri \
--downstream_task cognitive
TensorBoard Alternative:
# Local logging
python project/main.py \
--loggername tensorboard \
--default_root_dir ./experiments/
Setup (if not already configured):
# Check current sync status
./sync_paper.sh status
# Pull latest changes from Overleaf
./sync_paper.sh pull
# After making local changes to paper/
git add paper/
git commit -m "Update paper content"
./sync_paper.sh push
Contribution Workflow:
- Pull latest changes: ./sync_paper.sh pull
- Make edits in paper/bookchapter.tex
- Test LaTeX compilation locally
- Commit changes: git add paper/ && git commit -m "description"
- Push to Overleaf: ./sync_paper.sh push
Required Documentation for New Features:
- Code Documentation: Inline docstrings and comments
- API Documentation: Update relevant .md files
- Usage Examples: Add to sample scripts or notebooks
- Test Coverage: Comprehensive unit tests
Adding New Experiments:
- Hypothesis Definition:
  - Clear research question
  - Expected outcomes
  - Statistical analysis plan
- Implementation:
  - Extend existing model or create new variant
  - Implement proper evaluation metrics
  - Design appropriate baselines
- Validation:
  - Cross-validation strategy
  - Statistical significance testing
  - Ablation studies
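The validation steps can be sketched with the standard library alone. The example below builds k-fold splits and runs a permutation test on the correlation between true and predicted scores. All function names are hypothetical, and using Pearson correlation as the evaluation metric is an assumption, not necessarily the project's choice.

```python
import random
from math import sqrt
from typing import List, Sequence, Tuple


def kfold_indices(
    n_samples: int, n_folds: int, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists for shuffled k-fold cross-validation."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, test))
    return splits


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def permutation_p_value(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    n_permutations: int = 1000,
    seed: int = 0,
) -> float:
    """Two-sided permutation p-value for the observed correlation."""
    rng = random.Random(seed)
    observed = abs(pearson_r(y_true, y_pred))
    shuffled = list(y_pred)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(y_true, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

Fixing the seed makes both the splits and the null distribution reproducible, which helps when results are reported in the manuscript.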
Adding New Interpretation Methods:
# interpretation/new_method.py
import torch
from captum.attr import IntegratedGradients


class NewInterpretationMethod:
    """New method for model interpretation."""

    def __init__(self, model):
        self.model = model

    def attribute(self, inputs, targets):
        """Generate attribution maps."""
        # Implementation here
        pass
Requirements for Clinical Features:
- Neurobiological Plausibility: Literature support
- Statistical Validation: Proper significance testing
- Clinical Relevance: Connection to developmental outcomes
- Interpretability: Explainable to clinicians
Clinical Validation Checklist:
- Appropriate patient cohort
- Ethical approval documentation
- Clinical endpoint definition
- Statistical power analysis
- Bias assessment and mitigation
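For the power-analysis item, a rough sample-size estimate can be computed from the Fisher z approximation for a correlation. This is a minimal sketch: the function name is hypothetical, a two-sided test is assumed, and the approximation is no substitute for a study-specific power analysis.

```python
from math import atanh, ceil
from statistics import NormalDist


def required_sample_size(
    expected_r: float, alpha: float = 0.05, power: float = 0.8
) -> int:
    """Approximate n needed to detect a correlation of size expected_r.

    Uses the Fisher z approximation:
        n = ((z_{1 - alpha/2} + z_{power}) / atanh(|r|))^2 + 3
    """
    if not 0 < abs(expected_r) < 1:
        raise ValueError("expected_r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_beta) / atanh(abs(expected_r))) ** 2 + 3
    return ceil(n)
```

Smaller expected effects require larger cohorts, so the expected correlation should come from prior literature rather than from the data being analyzed.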
Before Submitting:
- Code follows style guidelines
- All tests pass
- Documentation updated
- Changes tested on sample data
- No performance regressions
PR Template:
## Description
Brief description of changes
## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update
## Testing
- [ ] Unit tests added/updated
- [ ] Manual testing completed
- [ ] Performance benchmarks run
## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- Automatic Checks: CI/CD pipeline runs tests
- Code Review: Maintainer review for quality and consistency
- Research Review: Scientific validity for research contributions
- Final Testing: Integration testing before merge
- GitHub Issues: Bug reports and feature requests
- Discussions: Research questions and general help
- Direct Contact: Reach out to maintainers for urgent issues
GPU Memory Issues:
# Reduce batch size
--batch_size 2
# Use gradient checkpointing
--gradient_checkpointing
# Mixed precision training
--precision 16
Data Loading Errors:
# Check data path
--image_path /path/to/preprocessed/data
# Verify split files
ls data/splits/dHCP/
Code Contributions:
- New model architectures
- Performance improvements
- Bug fixes and optimizations
Research Contributions:
- Novel experiments and analyses
- Clinical validation studies
- Interpretability improvements
Documentation Contributions:
- Tutorial creation
- API documentation
- Clinical guides
Community Contributions:
- Issue triage and support
- Code reviews
- Testing and validation
This project is released under [LICENSE]. Contributors retain copyright over their contributions while granting the project rights to use and distribute the work.
Citation Requirements:
@article{styll2024swift,
title={Swin fMRI Transformer Predicts Early Neurodevelopmental Outcomes from Neonatal fMRI},
author={Styll, Patrick and Kim, Dowon and Cha, Jiook},
journal={[Journal Name]},
year={2024}
}
Thank you for contributing to advancing early neurodevelopmental prediction and improving outcomes for infants worldwide!