Alphafold in nextflow using azure batch #6843
venkatt007 asked this question in Q&A
Azure Batch + nf-core/proteinfold: AlphaFold DB Files Always Staging (Even with blobfuse2 Mounts)
Hi all,
I’m running nf-core/proteinfold (v1.1.1) on Azure Batch with the azurebatch executor in a closed private network, and I’m trying to prevent the multi-terabyte AlphaFold database from being staged into the Azure Blob work directory on every run.
Despite using blobfuse2 mounts and setting stageInMode = 'symlink', the pipeline keeps staging DB files into az://.../work/stage-*. The files are correctly mounted on the Batch nodes as well.
I’m looking for confirmation of expected behavior and/or best practice for this architecture.
Environment
Nextflow: 25.10.x
nf-core/proteinfold: 1.1.1
Executor: azurebatch
Containers: Docker
Azure Storage: Blob storage mounted on compute nodes via blobfuse2
Database size: Multi-terabyte AlphaFold reference DB
Azure Batch Setup
Each Batch node has blobfuse2 mounts configured at:
/mnt/batch/tasks/fsmounts/input
/mnt/batch/tasks/fsmounts/results
/mnt/batch/tasks/fsmounts/work
Verified with:
```
blobfuse2  fuse  24G  ...  /mnt/batch/tasks/fsmounts/input
blobfuse2  fuse  24G  ...  /mnt/batch/tasks/fsmounts/results
blobfuse2  fuse  24G  ...  /mnt/batch/tasks/fsmounts/work
```
The AlphaFold DB is located under:
/mnt/batch/tasks/fsmounts/work/alphafolddb/alphafolddb
Goal
Avoid staging/copying the AlphaFold DB into:
az:///work/stage-/...
The DB already exists on mounted storage accessible to all nodes.
Configuration Attempt
nextflow config (simplified):

```groovy
process {
    executor     = 'azurebatch'
    stageInMode  = 'symlink'
    stageOutMode = 'rsync'
}

workDir = '/mnt/batch/tasks/fsmounts/work/work'

fusion.enabled = false
wave.enabled   = false
tower.enabled  = false
docker.enabled = true
```
params:

```yaml
input: "/mnt/batch/tasks/fsmounts/input/samplesheet.csv"
outdir: "/mnt/batch/tasks/fsmounts/results/test1"
alphafold2_db: "/mnt/batch/tasks/fsmounts/work/alphafolddb/alphafolddb"
bfd_path: "/mnt/batch/tasks/fsmounts/work/alphafolddb/alphafolddb/bfd/*"
...
```
Observed Behavior
Even with:
Local POSIX paths only (no az://)
stageInMode = 'symlink'
Mounted storage on all nodes
The log still shows:
```
FilePorter - Copying foreign file /mnt/batch/tasks/fsmounts/work/alphafolddb/...
  to work dir: az:///work/stage-/...
```
And on interruption:
```
port 4: (value) bound ; channel: bfd/*
port 5: (value) bound ; channel: small_bfd/*
port 6: (value) bound ; channel: mgnify/*
...
```
So it appears that the pipeline is materializing DB glob paths as path inputs, which forces Azure Batch localization via object storage staging.
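To make the distinction concrete, here is a minimal, purely illustrative process pair (not actual proteinfold code; names are hypothetical) contrasting a `path` input, which Nextflow localizes, with a `val` input, which it does not:

```nextflow
// Hypothetical sketch. A `path` input is a managed input: Nextflow
// localizes it into the task work dir, which on Azure Batch means
// staging it through az:// object storage.
process WITH_PATH_INPUT {
    input:
    path bfd_db          // copied into az://.../work/stage-*

    script:
    """
    run_search --db ${bfd_db}
    """
}

// A `val` input is an opaque string: Nextflow performs no staging,
// so the task simply reads the path from the node-local blobfuse2 mount.
process WITH_VAL_INPUT {
    input:
    val bfd_db_path      // e.g. /mnt/batch/tasks/fsmounts/work/alphafolddb/...

    script:
    """
    run_search --db ${bfd_db_path}
    """
}
```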
What I’ve Tried
Using blobfuse2 mounts only (no Fusion)
Using Fusion instead of blobfuse
Mounting DB inside container with containerOptions
Overriding RUN_ALPHAFOLD2 module to use DB root directly
Using both az:// and POSIX-only configurations
The staging persists as long as DB-related parameters are passed as path inputs.
My Understanding (Please Confirm)
It seems that:
Azure Batch executor requires inputs to be localized into the remote workDir.
If a process declares path inputs (e.g., path('bfd/*')), Nextflow treats them as managed inputs.
On Azure Batch, this results in uploading those files into the az://work/stage- area.
blobfuse mounts do not prevent this behavior.
The only way to avoid DB staging is:
Use Fusion with az:// paths, or
Refactor the pipeline so the DB is passed as a val string (not path inputs).
Is that correct?
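If the `val`-based refactor is indeed the way to go, the override I have in mind looks roughly like this (a sketch only, assuming the module can be overridden locally and that params.alphafold2_db points at the blobfuse2 mount on every node; the script invocation is abbreviated, not the module's real command line):

```nextflow
// Assumed to resolve on every Batch node via the blobfuse2 mount.
params.alphafold2_db = '/mnt/batch/tasks/fsmounts/work/alphafolddb/alphafolddb'

process RUN_ALPHAFOLD2 {
    // Bind-mount the host DB path into the container instead of
    // declaring it as a `path` input, so nothing goes through az:// staging.
    containerOptions "-v ${params.alphafold2_db}:${params.alphafold2_db}:ro"

    input:
    val db_root          // plain string; Nextflow performs no localization

    script:
    """
    run_alphafold.py --data_dir ${db_root} ...
    """
}
```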
Questions
Is there any supported way to prevent localization of large path inputs on Azure Batch when using mounted blob storage?
Has anyone successfully run nf-core/proteinfold on private Azure Batch with multi-TB AlphaFold DBs without massive staging overhead?
Any clarification or architectural recommendations would be greatly appreciated.