- 2025.1.12
- tak1217
Reusable Docker-based environments for reproducible bioinformatics analysis.
This repository provides curated Docker environments for bioinformatics workflows, designed to be shared across projects and machines (Windows + WSL2, macOS, Linux servers).
The goals are:
- Reproducibility
- Portability
- Separation of environment and project code
- Easy interactive use (Jupyter, bash, R, Python)
├── README.md
└── env
    ├── dev
    │   ├── Dockerfile
    │   └── environment.yml
    └── rnaseq
Each subdirectory under env/ represents an independent environment
that can be built into its own Docker image.
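Because each environment is just a directory with its own Dockerfile, the buildable images can be enumerated from the layout above. The following is a minimal sketch, not part of the repository: `print_build_commands` is a hypothetical helper, and it assumes the `bio-<name>` naming seen in `bio-dev` carries over to other environments. It only prints commands; it does not run them.

```shell
# Print a `docker build` command for every env/<name>/ that has a Dockerfile.
# Assumption: images are named bio-<name>, following the bio-dev example.
# The body runs in a subshell so its variables do not leak into the caller.
print_build_commands() (
  root="${1:-.}"
  tag="${2:-step2}"
  for dir in "$root"/env/*/; do
    # Skip placeholder directories that have no Dockerfile yet.
    [ -f "${dir}Dockerfile" ] || continue
    name="$(basename "$dir")"
    printf 'docker build -t bio-%s:%s -f env/%s/Dockerfile env/%s\n' \
      "$name" "$tag" "$name" "$name"
  done
)

# Usage from the repository root:
#   print_build_commands . step2
```

Directories without a Dockerfile (such as a still-empty `rnaseq`) are skipped silently.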
Purpose
Exploratory analysis and development environment for laptops and workstations.
Typical use cases:
- Interactive data analysis (Python / R)
- Visualization (matplotlib, seaborn, plotly, ggplot2)
- Jupyter notebooks
- Light bioinformatics processing (QC, file manipulation)
- Writing reusable scripts across projects
Included software
Python stack:
- numpy, pandas, scipy
- matplotlib, seaborn, plotly
- scikit-learn, statsmodels
- tqdm, joblib, pyyaml
- openpyxl, xlrd, nbconvert, pandoc
- ipython, ipykernel
- biopython
R stack:
- r-base (4.3)
- tidyverse
- data.table
- patchwork
- r-essentials
Bioinformatics tools:
- samtools, bcftools, htslib
- bedtools
- seqkit
- fastqc, multiqc
Utilities:
- git, curl, wget, unzip, pigz, parallel, tree
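After building an image, it can be useful to confirm that the command-line tools listed above are actually on `PATH` inside the container. A small sketch, assuming a POSIX shell in the container; `check_tools` is a hypothetical helper name. Note that htslib is a library, so the example checks for `bgzip`, one of the programs it ships.

```shell
# Report which of the given commands are missing from PATH.
# Exit status is 0 when all are present and non-zero otherwise.
check_tools() (
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  # The status of this test becomes the status of the function.
  [ "$missing" -eq 0 ]
)

# Usage inside the container:
#   check_tools samtools bcftools bgzip bedtools seqkit fastqc multiqc \
#     git curl wget unzip pigz parallel tree
```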
From the repository root:
docker build -t bio-dev:step2 -f env/dev/Dockerfile env/dev
Run an interactive shell:
docker run --rm -it bio-dev:step2
The container starts with the work environment activated:
(work) mambauser@container:/work$ python --version
Python 3.11.14
Assume a project structure like:
TR001/
├── Data/
├── Scripts/
└── Work/
From inside the project root:
docker run --rm -it \
  -v "$PWD":/work \
  -w /work \
  -p 8888:8888 \
  bio-dev:step2
Inside the container:
ls
# Data/ Scripts/ Work/ ...
This keeps:
- Project code/data → in project repository
- Environment → managed centrally in bioenv
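To avoid retyping the mount and port flags in every project, the `docker run` invocation can be wrapped in a shell function. This is a sketch under stated assumptions: the function name `bioenv` and the variables `BIOENV_IMAGE`, `BIOENV_PORT` and `DOCKER` are hypothetical and not defined by this repository. Setting `DOCKER=echo` prints the arguments instead of starting a container.

```shell
# Start a container with the current directory mounted at /work.
# Hypothetical knobs for this sketch:
#   BIOENV_IMAGE  image to run        (default: bio-dev:step2)
#   BIOENV_PORT   host port for 8888  (default: 8888)
#   DOCKER        docker executable   (default: docker; use echo for a dry run)
bioenv() {
  ${DOCKER:-docker} run --rm -it \
    -v "$PWD":/work \
    -w /work \
    -p "${BIOENV_PORT:-8888}:8888" \
    "${BIOENV_IMAGE:-bio-dev:step2}" "$@"
}

# Usage from a project root:
#   bioenv                  # interactive shell, same as the command above
#   BIOENV_PORT=8890 bioenv # use another host port if 8888 is taken
```

Any extra arguments are passed to the container as its command. Whether such a command runs with the work environment activated depends on the image's entrypoint, so treat that as something to verify rather than a given.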
Inside the container:
jupyter lab --ip=0.0.0.0 --no-browser --allow-root
Then open the displayed URL in your browser.
- Environment separation
  - Environments live in this repository
  - Project-specific code lives in each project repository
- Reproducibility
  - All packages are declared in environment.yml
  - Images can be versioned via Docker tags (e.g. step2, later v0.1.0)
- Portability
  - Same environment usable across:
    - Windows + WSL2
    - macOS (Apple Silicon)
    - Linux servers
- Modularity
  - Separate images for different purposes:
    - dev (exploration, visualization, development)
    - rnaseq (planned: heavy pipelines like STAR/RSEM/Salmon)
    - scrna (planned: Seurat-based analysis)
Currently using experimental tags during development:
bio-dev:step1
bio-dev:step2
Planned transition to stable versioning:
bio-dev:v0.1.0
bio-dev:v0.2.0
bio-dev:v1.0.0
Git tags and Docker image tags will be aligned for reproducibility.
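One way to keep the two kinds of tags aligned is to derive the Docker tag from the Git tag at build time. The sketch below assumes the `bio-<env>` image naming used so far; `image_ref` and `current_version` are hypothetical helper names. `git describe --tags --exact-match` fails when the current commit is not tagged, which prevents building a versioned image from an untagged state.

```shell
# Build the image reference for an environment and a version string.
# Assumption: images are named bio-<env>, as with bio-dev.
image_ref() {
  printf 'bio-%s:%s\n' "$1" "$2"
}

# Print the Git tag that points at the current commit.
# Fails with a non-zero status if HEAD is not exactly on a tag.
current_version() {
  git describe --tags --exact-match
}

# Usage from the repository root, on a tagged commit:
#   docker build -t "$(image_ref dev "$(current_version)")" \
#     -f env/dev/Dockerfile env/dev
```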
- env/rnaseq/
  - STAR, RSEM, Salmon
  - samtools, bedtools, multiqc
  - Possibly Nextflow pipelines
- env/scrna/
  - R + Seurat
  - Single-cell analysis workflows
This repository is currently intended for personal / internal research use. Add a LICENSE file if you plan to publish or share externally.