Less Confusion in Diffusion

### Title

Less Confusion in Diffusion

### Leaders

Nancy Newlin @nancynewlin-masi, Elias Levy @eliasxlevy

### Collaborators

Karthik Ramadass @KarthikMasi, Gaurav Rudravaram @GauravR1206, Andre Hucke @AndreHucke

### Project Description

The goal of “Less Confusion in Diffusion” is to develop an LLM-based tool that identifies image issues (eddy current distortions, significant motion, poor resolution, an insufficient number of b-vector directions, missing slices, top of the brain outside the FOV) in diffusion-weighted images (DWI) and recommends solutions. Working with DWI can be tricky (especially if you don’t have a diffusion imaging expert on call!), and there is a wide range of distortions, artifacts, and noise that needs to be corrected. The proposed tool is designed for people getting started in DWI. This project is the first step toward a tool that gives advice based on data acquisition and quality. 
Contributors to this project will gain experience in 1) using Hugging Face, 2) fine-tuning LLMs, 3) debugging LLMs, and 4) creating a user interface. 
This project requires GPUs for model training. However, anyone is welcome to join the discussions and help develop the research plan.

### Link to project repository/sources

https://github.com/nancynewlin-masi/LessConfusionInDiffusion 

### Concrete Goals with Specific Tasks for Brainhack Vanderbilt 2025

1. Get a Hugging Face account.
2. Dataset curation: There is a directory of diffusion image slices. Have an expert create the associated text labels (what should the completion look like?).
3. Data visualization: How many samples are there of each type? What do the responses look like? This is the time to look at the data and understand what each of the expected cases is.
4. Model selection: Which Hugging Face models are appropriate for this image-to-text task?
5. Test run: Run the model as-is on a test dataset (hopefully provided by the Hugging Face project). At this stage, we need to make sure the model behaves as expected (inputs, outputs).
6. Data loading: Set up a dataloader that gets slices and labels from the directory and interfaces properly with the chosen model.
7. Test: Try the model on a few samples and observe the behaviour!
8. Documentation: Input/output description, open problems, how to use.
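
As a starting point for goals 2, 3, and 6, here is a minimal sketch of pairing slices with their labels and counting samples per label. It uses only the Python standard library. The directory layout is an assumption, not something the repository defines: each slice `name.png` (or `name.npy`) is assumed to sit next to a plain-text label file `name.txt`. The function names are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Assumption: slices are stored as .png or .npy files, and each slice
# has a sibling text file with the same stem holding the expert label.
IMAGE_SUFFIXES = {".png", ".npy"}


def pair_slices_with_labels(root):
    """Return (image_path, label_text) pairs, skipping slices with no label file."""
    pairs = []
    for image_path in sorted(Path(root).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label_path = image_path.with_suffix(".txt")
        if label_path.exists():
            label = label_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, label))
    return pairs


def count_labels(pairs):
    """Count how many samples carry each label text."""
    return Counter(label for _, label in pairs)
```

The list of pairs could later back a PyTorch `Dataset`, and the counts give a quick view of class balance before any training.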


- Advanced: Joint embedding of text and image inputs (e.g., “here is my image and I have a b-vector file with 100 directions and b-values ranging 0-2000”).
- Extra 1: Improve response quality (more conversational, more information provided).
- Extra 2: User interface: Set up a local server that takes an image slice as input and provides a response.
- Extra 3: Upload the model to Hugging Face!
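
For the advanced goal, the text half of the input could be generated from the gradient files instead of typed by hand. The sketch below is one possible approach, not part of the project code. It assumes FSL-style files (b-values as whitespace-separated numbers, b-vectors as three rows for x, y, z), treats b-values at or below a hypothetical threshold of 50 as b0 volumes, and counts opposite directions as the same direction. All function names are hypothetical.

```python
def parse_bvals(text):
    """Parse whitespace-separated b-values (assumed FSL-style layout)."""
    return [float(value) for value in text.split()]


def parse_bvecs(text):
    """Parse three rows (x, y, z) into a list of (x, y, z) tuples."""
    rows = [[float(v) for v in line.split()] for line in text.splitlines() if line.strip()]
    if len(rows) != 3 or len({len(row) for row in rows}) != 1:
        raise ValueError("expected three rows of equal length (x, y, z)")
    return list(zip(*rows))


def canonical_direction(vec, decimals=3):
    """Round a direction and flip its sign so that v and -v compare equal."""
    rounded = [round(component, decimals) for component in vec]
    for component in rounded:
        if component != 0:
            if component < 0:
                rounded = [-x for x in rounded]
            break
    return tuple(rounded)


def summarize_acquisition(bvals, bvecs, b0_threshold=50.0):
    """Summarize the scheme; b0_threshold is an assumed cutoff, not a standard."""
    if len(bvals) != len(bvecs):
        raise ValueError("b-values and b-vectors must have the same length")
    directions = {
        canonical_direction(vec)
        for bval, vec in zip(bvals, bvecs)
        if bval > b0_threshold
    }
    return {
        "n_volumes": len(bvals),
        "n_directions": len(directions),
        "b_min": min(bvals),
        "b_max": max(bvals),
    }


def build_prompt(summary):
    """Turn the summary into a sentence that can accompany the image."""
    return (
        f"Here is my image. My b-vector file has {summary['n_directions']} "
        f"unique directions and b-values ranging from {summary['b_min']:g} "
        f"to {summary['b_max']:g}."
    )
```

The same summary could also drive a simple rule-based check (for example, flagging a low direction count) to compare against the model's advice.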


### Good first issues

1. Issue one: Explore Hugging Face. Find three potential models for this project and weigh the benefits and limitations of each. What size are they? What was each model pre-trained on? What are the expected inputs (.png, .npy, .nii.gz) and outputs?

2. Issue two: Practical experience. Get one of those models and run it on your machine as-is with a simple training dataset. Observe the training curves/losses. What is the quality of the output?



### Skills

Must haves:    

- Proficient in Python 
- Proficient in PyTorch
- Basic knowledge of medical images (they have headers and metadata)
- Working knowledge of machine learning principles (training, inference, data loaders)
- Able to pull from and push to GitHub

Preferred: Experience with diffusion weighted MRI


### Onboarding documentation

- Add your name to CONTRIBUTING.md by committing to the repo
- Get a Hugging Face account: https://huggingface.co/
- Basics of the diffusion-weighted MRI modality: https://radiopaedia.org/articles/diffusion-weighted-imaging-2?lang=us
- Common issues with these images and their solutions: https://pubmed.ncbi.nlm.nih.gov/33533094/
- Downloading a model from Hugging Face: https://huggingface.co/docs/hub/models-downloading


### What will participants learn?

Contributors to this project will gain experience in 1) using HuggingFace, 2) fine-tuning LLMs, 3) debugging LLMs, and 4) creating a user interface. 

### Public data to use

The data are currently in NIfTI format here: https://vanderbilt.box.com/s/v50gfkqzirr2pp05dgf9rs45sum3lq8h 
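
To get a first look at what a NIfTI file contains, the sketch below reads the image dimensions and voxel sizes from a NIfTI-1 header using only the Python standard library. It is an illustration, not project code: in practice a library such as nibabel would normally be used. The function name is hypothetical, and the code assumes single-file NIfTI-1 (`.nii` or `.nii.gz`), not NIfTI-2.

```python
import gzip
import struct
from pathlib import Path

NIFTI1_HEADER_SIZE = 348  # fixed header size defined by the NIfTI-1 format


def read_nifti1_header(path):
    """Return the shape and voxel size stored in a NIfTI-1 header."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as handle:
        raw = handle.read(NIFTI1_HEADER_SIZE)
    if len(raw) < NIFTI1_HEADER_SIZE:
        raise ValueError("file is too short to hold a NIfTI-1 header")

    # The first field (sizeof_hdr) must equal 348; use it to detect byte order.
    for byte_order in ("<", ">"):
        (sizeof_hdr,) = struct.unpack_from(byte_order + "i", raw, 0)
        if sizeof_hdr == NIFTI1_HEADER_SIZE:
            break
    else:
        raise ValueError("not a NIfTI-1 header")

    dim = struct.unpack_from(byte_order + "8h", raw, 40)     # dim[0] = number of dimensions
    pixdim = struct.unpack_from(byte_order + "8f", raw, 76)  # pixdim[1..3] = voxel size
    ndim = dim[0]
    if not 1 <= ndim <= 7:
        raise ValueError("invalid number of dimensions in header")
    return {
        "shape": dim[1:1 + ndim],
        # Units are defined by the header's xyzt_units field (not read here);
        # millimetres are typical, but that is an assumption.
        "voxel_size": pixdim[1:4],
        "magic": raw[344:348],
    }
```

For a DWI series the fourth entry of `shape` is the number of volumes, which can be compared against the number of entries in the b-value file.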

### Number of collaborators

4+

### Credit to collaborators

Name listed in the README (make sure you have added your name to CONTRIBUTING.md) and co-authorship on any resulting publication or conference proceeding.

### Image

![brainhack](https://github.com/user-attachments/assets/bc5a338b-5e14-4232-9ce9-94ae21393339)

### Project Summary

This project aims to help beginners in diffusion-weighted imaging (DWI) by detecting issues in DWIs and offering LLM-powered preprocessing recommendations.

### Type

method_development

### Development status

0_concept_no_content

### Topic

diffusion, machine_learning, MR_methodologies

### Tools

ANTs, DIPY, Freesurfer, HuggingFace, MRtrix, Pytorch

### Programming language

documentation, Python, html_css

### Modalities

DWI

### Git skills

1_commit_push

### Anything else?

_No response_

### Things to do after the project is submitted and ready to review.

- [x] Add a comment below the main post of your issue saying: `Hi @brainhack-vandy/project-monitors my project is ready!`
