GPU problem #1849

Draft
bgruening wants to merge 2 commits into master from bgruening-patch-5

@bgruening

Member

No description provided.

submit_requirements: "{galaxy_group}"
request_gpus: "{gpus or 0}"
docker_run_extra_arguments: "{entity.params.get('docker_run_extra_arguments') or ''} --gpus all --env CUDA_VISIBLE_DEVICES=$_CONDOR_AssignedGPUs --env NVIDIA_VISIBLE_DEVICES=$_CONDOR_AssignedGPUs"
##docker_run_extra_arguments: "{entity.params.get('docker_run_extra_arguments') or ''} --gpus all --env CUDA_VISIBLE_DEVICES=$_CONDOR_AssignedGPUs --env NVIDIA_VISIBLE_DEVICES=$_CONDOR_AssignedGPUs"

Member Author

If we include this, we cannot override the settings ... and in my case the --gpus all needed to be overridden.
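
To illustrate the override problem, here is a minimal Python sketch. It assumes the destination-level value is composed the way the template above reads: the tool-level docker_run_extra_arguments first, then a fixed suffix. The helper render_extra_args and the constant DEST_SUFFIX are hypothetical names for illustration only, not part of TPV or Galaxy.

```python
def render_extra_args(tool_params, destination_suffix):
    """Mimic the destination template: tool-level value first, then a fixed suffix.

    This is a simplified stand-in for the template expression, not the real
    evaluation code.
    """
    tool_value = tool_params.get("docker_run_extra_arguments") or ""
    return f"{tool_value} {destination_suffix}".strip()


# Fixed suffix as in the destination-level line (shortened for the example).
DEST_SUFFIX = "--gpus all --env CUDA_VISIBLE_DEVICES=$_CONDOR_AssignedGPUs"

# Tool-level setting that wants a specific device instead of all GPUs.
tool_params = {
    "docker_run_extra_arguments": '--gpus "device=${NVIDIA_VISIBLE_DEVICES}" --shm-size 16g'
}

with_suffix = render_extra_args(tool_params, DEST_SUFFIX)
without_suffix = render_extra_args(tool_params, "")

print(with_suffix)     # contains both the tool-level --gpus and "--gpus all"
print(without_suffix)  # contains only the tool-level --gpus
```

With the suffix in place, the rendered string always carries "--gpus all" next to whatever the tool sets, so the tool-level value cannot replace it. With the destination-level line commented out, only the tool-level flag remains.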

mem: 12
params:
docker_run_extra_arguments: ' --gpus all --shm-size 16g --env CUDA_VISIBLE_DEVICES=$_CONDOR_AssignedGPUs '
docker_run_extra_arguments: ' --gpus "device=${NVIDIA_VISIBLE_DEVICES}" --shm-size 16g '

Member Author

This seems to do the trick and actually puts only one GPU into the container, which makes certain things for tools inside the container easier.

This is how the container looks:

root@437d5bcfdd09:/data/jwd08/main/096/130/96130318/working# env | sort
CUDA_VISIBLE_DEVICES=GPU-b2c83767
GALAXY_BIAPY_GPU_STRING=0
GALAXY_MEMORY_MB=12288
GALAXY_MEMORY_MB_PER_SLOT=12288
GALAXY_SLOTS=1
GPU_AVAILABLE=1
HOME=/data/jwd08/main/096/130/96130318/home
HOSTNAME=437d5bcfdd09
LANG=C.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/data/jwd08/main/096/130/96130318/working
SHLVL=1
TEMP=/data/jwd08/main/096/130/96130318/tmp
TERM=xterm
TMP=/data/jwd08/main/096/130/96130318/tmp
TMPDIR=/data/jwd08/main/096/130/96130318/tmp
_=/usr/bin/env
_GALAXY_JOB_HOME_DIR=/data/jwd08/main/096/130/96130318/home
_GALAXY_JOB_TMP_DIR=/data/jwd08/main/096/130/96130318/tmp
root@437d5bcfdd09:/data/jwd08/main/096/130/96130318/working# nvidia-smi 
Fri Jan 30 23:01:49 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S                    Off |   00000000:61:00.0 Off |                    0 |
| N/A   25C    P8             31W /  350W |      67MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

I think this is what we want for most of our tools.
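
As a rough sketch of why a single exposed GPU simplifies things for a tool, the following Python snippet reads CUDA_VISIBLE_DEVICES the way the environment dump above shows it. The function names visible_gpus and pick_device_index are hypothetical, and the rule "one visible GPU means device index 0" is an assumption based on the nvidia-smi output above, not a documented Galaxy behaviour.

```python
def visible_gpus(env):
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES (comma-separated)."""
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


def pick_device_index(env):
    """Return 0 when exactly one GPU is visible, otherwise None.

    Assumption: with a single GPU in the container, it is enumerated as
    device 0, so the tool does not need any further selection logic.
    """
    gpus = visible_gpus(env)
    if len(gpus) == 1:
        return 0
    return None


# Values copied from the environment dump above.
container_env = {"CUDA_VISIBLE_DEVICES": "GPU-b2c83767", "GPU_AVAILABLE": "1"}

print(visible_gpus(container_env))
print(pick_device_index(container_env))
```

In a real tool, os.environ would be passed instead of the hand-written dictionary.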

Contributor

Shall we add this to the GPU destinations then?

Member Author

Yes, I think so ...
At least once someone else can confirm it or we can test this again.

@mira-miracoli Feb 10, 2026

Contributor

Is it OK to postpone this until after the switch?

Member Author

Yes, no urgency here.
