Roadmap

This roadmap reflects the current project state and the next major build priorities for the multimodal pancreatic cancer detection repository.

Now: Stabilize The Hybrid Research Repo

Current baseline:

Highest-priority remaining work:

continue refactoring reusable notebook logic into src/
make the notebook more orchestration-focused and less implementation-heavy
keep documentation aligned with the repo's real state
preserve a clean boundary between tracked lightweight outputs and ignored heavy assets
keep the thesis interpretation visible in the codebase: strong biomarker reproducibility, ambiguous CT generalization, and exploratory-only fusion claims

Status:

extend the current lightweight test baseline beyond stable helper modules
add more explicit run paths for common tasks such as report generation and preprocessing
reduce notebook-only helper duplication in EDA and diagnostics sections
tighten artifact naming and report consistency across reports/ and figures/
keep uv dependencies aligned with actual runtime imports
keep the processed-data-first run path explicit so the notebook can run without raw assets in normal use
codify negative-control and multi-seed evaluation patterns so future fusion work stays methodologically honest

Status:

The next major structural improvement after modularization is a cleaner runner surface around the existing notebook and modules.

Targets:

expose common experiment steps through small scripts or thin CLI entry points
make key reports reproducible without manually re-running the full notebook
separate stable experiment interfaces from one-off thesis exploration code
keep notebook usage focused on analysis, visual inspection, and narrative assembly
make it easier to rerun the bias-analysis and summary-generation parts of the workflow outside a full notebook session

Status:

Potential later work includes:

stronger uncertainty tooling under src/uncertainty/
cleaner support for repeated ablation runs
broader experiment packaging for publication or demo purposes
adaptation to real paired multimodal cohorts if such data becomes available
external CT validation on additional institutions to resolve whether the current CT signal is pathological, domain-driven, or mixed
domain-adversarial CT training, likely via a gradient reversal layer, to suppress dataset-of-origin information in learned representations
matched CT-plus-biomarker cohorts so fusion can be evaluated as a real clinical question instead of a synthetic pairing exercise
volumetric CT architectures or transformers once the data and validation setup justify moving beyond slice-based modelling

Status: