Commit dabeee8

dpark01 and claude committed
Add baseimage dependencies and enhance Docker images
- Create docker/requirements/baseimage.txt with general utilities: miniwdl, udocker, awscli, google-cloud-storage, csvkit, jq, parallel, pigz, unzip, zstd
- Move general utilities from core.txt to baseimage.txt
- Add seaborn to core.txt for data visualization
- Update Dockerfile.baseimage to install from baseimage.txt
- Set MINIWDL__SCHEDULER__CONTAINER_BACKEND=udocker env var
- Update Dockerfile.core to install baseimage.txt + core.txt together
- Add Python dependencies to pyproject.toml for pip-based installs
- Update AGENT_CONTEXT.md with learnings (udocker quirks, qsv x86-only)

Note: qsv excluded as it lacks ARM64 support

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent eb8576d commit dabeee8

6 files changed: +85 -16 lines changed

AGENT_CONTEXT.md

Lines changed: 20 additions & 1 deletion
@@ -361,6 +361,16 @@ For tools without ARM64 support (novoalign, mvicuna):
 
 These packages are in bioconda but only have x86_64 builds. Use exact versions (e.g., `novoalign=3.09.04`, `mvicuna=1.0`) since newer versions may not exist.
 
+### udocker Quirks
+
+- udocker refuses to run as root by default. Use `--allow-root` flag for Docker build verification
+- In Dockerfile, use `udocker --allow-root version` to verify installation
+- The `MINIWDL__SCHEDULER__CONTAINER_BACKEND=udocker` environment variable tells miniwdl to use udocker
+
+### qsv is x86-Only
+
+The `qsv` package (fast CSV tool) is only available for linux-64, osx-64, win-64 - no ARM64 build. For multi-arch Docker builds, either skip it or add to an x86-only requirements file.
+
 ### PrexistingUnixCommand is the Only InstallMethod
 
 Since all tools are installed via conda, `PrexistingUnixCommand` is the only `InstallMethod` subclass needed. It just checks if the tool exists in PATH.
@@ -418,7 +428,7 @@ When working on this migration:
 ## Next Steps
 
 1. Read `MONOREPO_IMPLEMENTATION_PLAN.md` for the detailed task list
-2. **Phase 2**: Migrate viral-core with git history preservation using `git filter-repo`
+2. **Phase 3a**: Migrate viral-assemble with git history preservation using `git filter-repo`
 3. Work through phases sequentially
 4. Verify each phase before moving to the next
 
@@ -454,6 +464,15 @@ When working on this migration:
 - Deleted legacy viral-core CI workflow (build.yml) that was accidentally merged with git history
 - Docker build verified with all module imports working
 
+### Baseimage Enhancements (Post-Phase 2)
+- Created `docker/requirements/baseimage.txt` with general utilities
+- Added to baseimage: miniwdl, udocker, awscli, google-cloud-storage, csvkit, jq, parallel, pigz, unzip, zstd
+- Set `MINIWDL__SCHEDULER__CONTAINER_BACKEND=udocker` environment variable
+- Moved general utilities from core.txt to baseimage.txt
+- Added seaborn to core.txt for data visualization
+- Updated pyproject.toml with Python dependencies for pip-based installs
+- Note: qsv is x86-only (no ARM64 build), excluded for multi-arch support
+
 ## Reference Repositories
 
 When implementing Docker patterns, refer to:
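
The AGENT_CONTEXT.md notes above say that `MINIWDL__SCHEDULER__CONTAINER_BACKEND=udocker` tells miniwdl to use udocker. miniwdl reads configuration overrides from environment variables named `MINIWDL__SECTION__KEY`, so this variable corresponds to the `container_backend` key of the `[scheduler]` section. The sketch below only illustrates that naming convention; `miniwdl_overrides` is a hypothetical helper written for this note, not miniwdl's own parser.

```python
import os

PREFIX = "MINIWDL__"


def miniwdl_overrides(environ=None):
    """Collect MINIWDL__SECTION__KEY variables as {(section, key): value}.

    Hypothetical helper for illustration; miniwdl performs its own parsing.
    """
    environ = os.environ if environ is None else environ
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        # Split only once so keys containing single underscores stay intact.
        parts = name[len(PREFIX):].split("__", 1)
        if len(parts) != 2 or not all(parts):
            continue  # not of the form MINIWDL__SECTION__KEY
        section, key = (part.lower() for part in parts)
        overrides[(section, key)] = value
    return overrides
```

Calling `miniwdl_overrides()` inside the built image should then report the `("scheduler", "container_backend")` pair, assuming the `ENV` line from Dockerfile.baseimage is in effect.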

docker/Dockerfile.baseimage

Lines changed: 17 additions & 3 deletions
@@ -1,10 +1,11 @@
 # viral-ngs baseimage
-# Base image with micromamba and Python for viral-ngs tools
+# Base image with micromamba, Python, and general utilities for viral-ngs tools
 #
-# This provides a minimal foundation with:
+# This provides a foundation with:
 # - micromamba (conda-compatible package manager)
 # - Python 3.12
 # - conda/mamba symlinks for compatibility
+# - General utilities (miniwdl, udocker, cloud CLIs, etc.)
 #
 # Derivative images (core, classify, assemble, phylo) build on this.
 
@@ -56,10 +57,23 @@ ENV VIRAL_NGS_PATH="/opt/viral-ngs/source" \
     PYTHONPATH="/opt/viral-ngs/source" \
     PATH="/opt/conda/bin:/usr/local/bin:$PATH"
 
+# Configure miniwdl to use udocker as container backend
+ENV MINIWDL__SCHEDULER__CONTAINER_BACKEND="udocker"
+
+# Copy requirements and install script
+COPY docker/requirements/baseimage.txt /tmp/requirements/
+COPY docker/install-conda-deps.sh /tmp/
+
+# Install baseimage dependencies (miniwdl, udocker, cloud CLIs, utilities)
+RUN /tmp/install-conda-deps.sh /tmp/requirements/baseimage.txt && \
+    rm -rf /tmp/requirements /tmp/install-conda-deps.sh
+
 WORKDIR /opt/viral-ngs/source
 
 # Verify installation
 RUN python --version && \
-    micromamba --version
+    micromamba --version && \
+    miniwdl --version && \
+    udocker --allow-root version
 
 CMD ["/bin/bash"]
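
The "Verify installation" step above runs each tool's version command at build time. A lighter check that can be reused in tests is to confirm the same tools resolve on `PATH`. The sketch below is one way to do that, assuming the tool list mirrors the `RUN` step; `EXPECTED_TOOLS` and `missing_tools` are hypothetical names, and a `PATH` lookup is weaker than actually invoking the tools.

```python
import shutil

# Tools invoked by the Dockerfile's verification step (assumed list).
EXPECTED_TOOLS = ["python", "micromamba", "miniwdl", "udocker"]


def missing_tools(tools=EXPECTED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, preserving input order.

    The `which` parameter is injectable so the check can be tested
    without the real tools installed.
    """
    return [tool for tool in tools if which(tool) is None]
```

An empty list from `missing_tools()` means every expected executable was found; it does not prove the tools run correctly, which is what the version commands in the Dockerfile cover.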

docker/Dockerfile.core

Lines changed: 4 additions & 3 deletions
@@ -18,12 +18,13 @@ LABEL org.opencontainers.image.source="https://github.com/broadinstitute/viral-n
 ARG MAMBA_DOCKERFILE_ACTIVATE=1
 
 # Copy requirements and dependency installation script
-COPY docker/requirements/core.txt docker/requirements/core-x86.txt /tmp/requirements/
+COPY docker/requirements/baseimage.txt docker/requirements/core.txt docker/requirements/core-x86.txt /tmp/requirements/
 COPY docker/install-conda-deps.sh /tmp/
 
 # Install conda dependencies (bioinformatics tools)
-# First install all-architecture packages, then x86-only packages (skipped on ARM)
-RUN /tmp/install-conda-deps.sh /tmp/requirements/core.txt && \
+# Install baseimage.txt + core.txt together in single resolver call for proper dependency resolution
+# Then install x86-only packages (skipped on ARM)
+RUN /tmp/install-conda-deps.sh /tmp/requirements/baseimage.txt /tmp/requirements/core.txt && \
     /tmp/install-conda-deps.sh --x86-only /tmp/requirements/core-x86.txt
 
 # Copy source code
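
The Dockerfile.core change above passes baseimage.txt and core.txt to one install call so the solver sees all constraints at once. The contents of `install-conda-deps.sh` are not part of this commit, so the following is only a sketch of the idea under stated assumptions: the helper names are hypothetical, and the `conda-forge` and `bioconda` channels are assumed from the comments in the requirements files.

```python
def read_specs(text):
    """Return package specs from requirements text, skipping comments and blanks."""
    specs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line:
            specs.append(line)
    return specs


def merge_specs(*texts):
    """Merge several requirements files into one de-duplicated spec list."""
    merged = []
    for text in texts:
        for spec in read_specs(text):
            if spec not in merged:
                merged.append(spec)
    return merged


def install_command(specs, channels=("conda-forge", "bioconda")):
    """Build a single micromamba invocation covering every spec.

    Channel names are an assumption; the real script may differ.
    """
    cmd = ["micromamba", "install", "-y"]
    for channel in channels:
        cmd += ["-c", channel]
    return cmd + list(specs)
```

Building one command from the merged list, rather than one command per file, is what lets the resolver reconcile version constraints across both files instead of solving them in sequence.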

docker/requirements/baseimage.txt

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# Conda dependencies for viral-ngs baseimage
+#
+# These are general-purpose tools installed in the base image that are
+# useful across all derivative images (core, classify, assemble, phylo).
+#
+# All packages are from conda-forge unless noted.
+
+# WDL local execution (miniwdl + udocker for containerized task execution)
+miniwdl>=1.11.1
+udocker>=1.3.16
+
+# Cloud CLIs
+awscli
+google-cloud-storage
+
+# General utilities
+csvkit>=1.0.4
+jq>=1.6
+parallel>=20190922
+pigz>=2.4
+unzip>=6.0
+zstd>=1.5.7
+
+# Note: qsv is x86-only (no linux-aarch64 build), skipping for multi-arch support

docker/requirements/core.txt

Lines changed: 2 additions & 6 deletions
@@ -4,6 +4,7 @@
 # installation and better binary compatibility. This includes packages
 # with C extensions that would otherwise require compilation.
 #
+# Note: General utilities (parallel, pigz, jq, etc.) are in baseimage.txt
 # Note: x86-only packages (novoalign, mvicuna) are in core-x86.txt
 
 # Python runtime dependencies
@@ -18,6 +19,7 @@ pybedtools>=0.7.10
 pysam>=0.23.0
 pyyaml
 scipy
+seaborn
 zstandard>=0.23.0
 
 # Bioinformatics tools
@@ -26,22 +28,16 @@ bcftools>=1.10
 bedtools>=2.29.2
 bwa>=0.7.17
 cd-hit>=4.6.8
-csvkit>=1.0.4
 fastqc>=0.11.7
 fgbio>=2.2.1
 gatk=3.8
 illumina-interop=1.5.0
-jq>=1.6
 lbzip2>=2.5
 lz4-c>=1.8.3
 minimap2>=2.17
-parallel>=20190922
 picard=2.25.6
-pigz>=2.4
 prinseq>=0.20.4
 sambamba>=1.0.1
 samtools>=1.21
 splitcode>=0.31.4
 trimmomatic>=0.38
-unzip>=6.0
-zstd>=1.5.7

pyproject.toml

Lines changed: 18 additions & 3 deletions
@@ -24,9 +24,24 @@ classifiers = [
     "Topic :: Scientific/Engineering :: Bio-Informatics",
 ]
 keywords = ["bioinformatics", "genomics", "viral", "ngs", "sequencing"]
-# All runtime dependencies are installed via conda for faster installation
-# and better binary compatibility. See docker/requirements/core.txt
-dependencies = []
+# Python dependencies for pip-based installs
+# Note: For Docker-based usage, all deps (including bioinformatics tools) are
+# installed via conda. See docker/requirements/core.txt
+dependencies = [
+    "arrow>=0.12.1",
+    "biopython>=1.72",
+    "jinja2>=2.11.3",
+    "lz4>=2.2.1",
+    "matplotlib>=2.2.4",
+    "pandas",
+    "psutil>=6.1.0",
+    "pybedtools>=0.7.10",
+    "pysam>=0.23.0",
+    "pyyaml",
+    "scipy",
+    "seaborn",
+    "zstandard>=0.23.0",
+]
 
 [project.optional-dependencies]
 dev = [
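
With this change the Python dependencies are declared twice: in pyproject.toml for pip-based installs and in docker/requirements/core.txt for conda. A small check can flag entries that exist in one list but not the other. The sketch below compares names only and is a heuristic, since PyPI and conda package names do not always match; `spec_name` and `missing_from_conda` are hypothetical helpers, not part of the repository.

```python
import re


def spec_name(spec):
    """Extract the normalized package name from a spec such as 'pysam>=0.23.0'."""
    # Cut at the first version operator, whitespace, extras bracket, or marker.
    name = re.split(r"[<>=!~\s\[;]", spec.strip(), maxsplit=1)[0]
    # Treat runs of '-', '_' and '.' as equivalent, and ignore case.
    return re.sub(r"[-_.]+", "-", name).lower()


def missing_from_conda(pip_specs, conda_specs):
    """Return pip dependency names with no same-named conda requirement."""
    conda_names = {spec_name(spec) for spec in conda_specs}
    pip_names = {spec_name(spec) for spec in pip_specs}
    return sorted(pip_names - conda_names)
```

A name reported by `missing_from_conda` is a prompt to look, not proof of a gap: the conda package may simply be published under a different name.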
