bioenv

  • 2025.1.12
  • tak1217

Reusable Docker-based environments for reproducible bioinformatics analysis.

This repository provides curated Docker environments for bioinformatics workflows, designed to be shared across projects and machines (Windows + WSL2, macOS, Linux servers).

The goals are:

  • Reproducibility
  • Portability
  • Separation of environment and project code
  • Easy interactive use (Jupyter, bash, R, Python)

Repository structure


├── README.md
└── env
    ├── dev
    │   ├── Dockerfile
    │   └── environment.yml
    └── rnaseq

Each subdirectory under env/ represents an independent environment that can be built into its own Docker image.
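Because every environment is a self-contained directory, building them can be scripted with a simple loop. The following is a minimal sketch, not part of the repository: the function name print_build_commands is hypothetical, and the bio-<dirname> image naming is an assumption extrapolated from the bio-dev example. It only prints the commands, so nothing is built until you choose to run them.

```shell
# Print (not run) one `docker build` command per environment directory.
# Assumption: images are named bio-<dirname>, following the bio-dev example.
# The tag argument is illustrative and defaults to "latest".
print_build_commands() {
  env_root=${1:-env}
  tag=${2:-latest}
  for dir in "$env_root"/*/; do
    dir=${dir%/}
    # Skip directories without a Dockerfile (e.g. an environment still being planned).
    [ -f "$dir/Dockerfile" ] || continue
    name=$(basename "$dir")
    echo "docker build -t bio-$name:$tag -f $dir/Dockerfile $dir"
  done
}
```

From the repository root, print_build_commands env step2 lists the commands, and print_build_commands env step2 | sh would execute them, provided the paths contain no spaces.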


Environments

bio-dev (currently available)

Purpose
Exploratory analysis and development environment for laptops and workstations.

Typical use cases:

  • Interactive data analysis (Python / R)
  • Visualization (matplotlib, seaborn, plotly, ggplot2)
  • Jupyter notebooks
  • Light bioinformatics processing (QC, file manipulation)
  • Writing reusable scripts across projects

Included software

Python stack:

  • numpy, pandas, scipy
  • matplotlib, seaborn, plotly
  • scikit-learn, statsmodels
  • tqdm, joblib, pyyaml
  • openpyxl, xlrd, nbconvert, pandoc
  • ipython, ipykernel
  • biopython

R stack:

  • r-base (4.3)
  • tidyverse
  • data.table
  • patchwork
  • r-essentials

Bioinformatics tools:

  • samtools, bcftools, htslib
  • bedtools
  • seqkit
  • fastqc, multiqc

Utilities:

  • git, curl, wget, unzip, pigz, parallel, tree
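After building an image, it can be useful to confirm that the listed tools are actually on the PATH. The helper below is a hedged sketch (check_tools is a hypothetical name, not something shipped in the image); it reports each command as present or missing and returns a non-zero status if any is absent.

```shell
# Report which of the given commands are available on PATH.
# Returns 0 if all are present, 1 if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%-8s %s\n' ok "$tool"
    else
      printf '%-8s %s\n' MISSING "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Inside the container this could be called as check_tools python R samtools bcftools bedtools seqkit fastqc multiqc. htslib is a library rather than a single command, so it is checked indirectly through tools such as samtools.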

Build

From the repository root:

docker build -t bio-dev:step2 -f env/dev/Dockerfile env/dev

Basic usage

Run an interactive shell:

docker run --rm -it bio-dev:step2

The container starts with the environment named work (see the prompt prefix) already activated:

(work) mambauser@container:/work$ python --version
Python 3.11.14

Using with a project directory

Assume a project structure like:

TR001/
├── Data/
├── Scripts/
└── Work/

From inside the project root:

docker run --rm -it \
  -v "$PWD":/work \
  -w /work \
  -p 8888:8888 \
  bio-dev:step2

Inside the container:

ls
# Data/ Scripts/ Work/ ...

This keeps the two concerns separate:

  • Project code and data stay in the project repository
  • The environment is managed centrally in bioenv
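Typing the full docker run line in every project gets repetitive, so a small shell helper may be convenient. This is a sketch under stated assumptions: bioenv_run_command and the BIOENV_IMAGE variable are hypothetical names introduced here, and the optional second argument lets you pick a different host port when 8888 is already in use. The function only prints the command.

```shell
# Print the `docker run` command that mounts a project directory at /work.
# BIOENV_IMAGE is a hypothetical override; the default matches the image built above.
bioenv_run_command() {
  project_dir=${1:-$PWD}
  port=${2:-8888}
  image=${BIOENV_IMAGE:-bio-dev:step2}
  # The container side of the port mapping stays at 8888.
  printf 'docker run --rm -it -v "%s":/work -w /work -p %s:8888 %s\n' \
    "$project_dir" "$port" "$image"
}
```

From a project root, eval "$(bioenv_run_command "$PWD")" would start the container. The eval step assumes the path contains no double quotes or dollar signs.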

JupyterLab

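Inside a container started with -p 8888:8888 (as in the project directory example above):

jupyter lab --ip=0.0.0.0 --no-browser --allow-root

Then open the URL that JupyterLab prints (the one beginning with http://127.0.0.1:8888) in a browser on the host machine.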


Design principles

  • Environment separation

    • Environments live in this repository
    • Project-specific code lives in each project repository
  • Reproducibility

    • All packages are declared in environment.yml
    • Images can be versioned via Docker tags (e.g. step2, later v0.1.0)
  • Portability

    • Same environment usable across:

      • Windows + WSL2
      • macOS (Apple Silicon)
      • Linux servers
  • Modularity

    • Separate images for different purposes:

      • dev (exploration, visualization, development)
      • rnaseq (planned: heavy pipelines like STAR/RSEM/Salmon)
      • scrna (planned: Seurat-based analysis)

Versioning policy

Images currently use experimental tags during development:

bio-dev:step1
bio-dev:step2

Planned transition to stable versioning:

bio-dev:v0.1.0
bio-dev:v0.2.0
bio-dev:v1.0.0

Git tags and Docker image tags will be aligned for reproducibility.
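One way to keep the two kinds of tag aligned is to derive the Docker tag from Git at build time. The sketch below is an illustration rather than the repository's actual process: image_tag_from_git is a hypothetical helper, and the dev-<short hash> fallback for untagged commits is an assumption.

```shell
# Derive a Docker image tag from the current Git state.
# On a tagged commit, reuse the Git tag (e.g. v0.1.0).
# Otherwise fall back to "dev-<short hash>" (an assumed convention).
image_tag_from_git() {
  if tag=$(git describe --tags --exact-match 2>/dev/null); then
    echo "$tag"
  else
    echo "dev-$(git rev-parse --short HEAD)"
  fi
}
```

A build could then use docker build -t "bio-dev:$(image_tag_from_git)" -f env/dev/Dockerfile env/dev, so an image built from the commit tagged v0.1.0 is automatically named bio-dev:v0.1.0.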


Planned environments

  • env/rnaseq/

    • STAR, RSEM, Salmon
    • samtools, bedtools, multiqc
    • Possibly Nextflow pipelines
  • env/scrna/

    • R + Seurat
    • Single-cell analysis workflows

License

This repository is currently intended for personal / internal research use. Add a LICENSE file if you plan to publish or share externally.
