Make flexynesis GPU version dependent and add docker user 999 #2116
mira-miracoli wants to merge 4 commits into
toolshed.g2.bx.psu.edu/repos/bgruening/flexynesis/flexynesis/.*:
  rules:
    - if: helpers.tool_version_eq(tool, '1.1.11+galaxy0')
I think this should be greater-than-or-equal; all future versions will need Docker as well, I guess.
@nilchia only one specific option needs a GPU, not all flexynesis jobs, correct?
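To illustrate the difference between an equality check and a greater-than-or-equal check on Galaxy-style version strings, here is a minimal sketch. The helper names `parse_galaxy_version`, `version_eq`, and `version_gte` are hypothetical stand-ins written for this example; they are not TPV's `helpers.tool_version_eq` or `helpers.tool_version_gte`, whose real implementation may compare versions differently.

```python
import re


def parse_galaxy_version(version):
    """Split e.g. '1.1.11+galaxy0' into ((1, 1, 11), 0). Hypothetical helper."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:\+galaxy(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    upstream = tuple(int(part) for part in match.group(1).split("."))
    wrapper = int(match.group(2) or 0)  # a missing '+galaxyN' suffix is treated as 0
    return upstream, wrapper


def version_eq(version, target):
    """True only for exactly the target version."""
    return parse_galaxy_version(version) == parse_galaxy_version(target)


def version_gte(version, target):
    """True for the target version and every later one."""
    return parse_galaxy_version(version) >= parse_galaxy_version(target)


if __name__ == "__main__":
    target = "1.1.11+galaxy0"
    for candidate in ("1.1.10+galaxy3", "1.1.11+galaxy0", "1.1.11+galaxy1", "1.2.0+galaxy0"):
        print(candidate, version_eq(candidate, target), version_gte(candidate, target))
```

Under these assumptions, an equality rule stops matching as soon as the wrapper is bumped to a later `+galaxyN` revision, while the greater-than-or-equal rule keeps matching.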
@@ -821,7 +825,6 @@ tools:
retval
- if: helpers.tool_version_gte(tool, '1.1.11+galaxy0')
  params:
    docker_run_extra_arguments: --user 999
- id: flexynesis_gnn_high_mem
  if: |
    retval = False
    if helpers.tool_version_gte(tool, '1.1.11+galaxy0'):
        options = job.get_param_values(app)
        if options:
            training_type = options.get('training_type', {})
            if training_type and isinstance(training_type, dict):
                model_select = training_type.get('model_class', {})
                if model_select and isinstance(model_select, dict):
                    retval = model_select.get('model_class') == 'GNN'
    retval
  gpu: 1
@anuprulez also suggested specifying the GPU, as is done here:
https://github.com/usegalaxy-eu/infrastructure-playbook/blob/master/files/galaxy/tpv/tools.yml#L341
Looks good, I think. The parabricks tool needs to exclude V100 GPUs; is there also a GPU model or CUDA version limitation for your tool?
The tool itself does not have a limitation. I think it is just the memory limitation.
I ran it once on Galaxy and it failed because of low memory:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.52 GiB. GPU 0 has a total capacity of 14.56 GiB of which 2.92 GiB is free. Including non-PyTorch memory, this process has 11.64 GiB memory in use. Of the allocated memory 7.82 GiB is allocated by PyTorch, and 3.69 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
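The figures in that error message can be checked with a little arithmetic, and the allocator setting it mentions can be applied from Python. The sketch below is only an illustration: the numbers are copied from the message above, the variable names are made up for this example, and whether `expandable_segments:True` would actually have rescued this job is not something the message establishes.

```python
import os

# The allocator option named in the error message. It has to be in the
# environment before PyTorch makes its first CUDA allocation, so setting it
# before importing torch is the safest place.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# Values reported in the error message, in GiB.
total_capacity = 14.56
free = 2.92
requested = 3.52
reserved_unallocated = 3.69

# Memory already in use on the device (matches the 11.64 GiB in the message).
in_use = round(total_capacity - free, 2)

# How far the failed allocation exceeded the free memory.
shortfall = round(requested - free, 2)

# The message hints at fragmentation when reserved-but-unallocated memory is
# large; here it is larger than the request that failed.
fragmentation_suspected = reserved_unallocated >= requested

if __name__ == "__main__":
    print(f"in use: {in_use} GiB, shortfall: {shortfall} GiB")
    print(f"fragmentation suspected: {fragmentation_suspected}")
```

Either way, a GPU with roughly 14.5 GiB of usable memory leaves very little headroom for this workload.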
toolshed.g2.bx.psu.edu/repos/bgruening/flexynesis/flexynesis/.*:
  rules:
    - if: helpers.tool_version_gte(tool, '1.1.11+galaxy0')
      params:
        docker_run_extra_arguments: --user 999
    - id: flexynesis_gnn_high_mem
      if: |
        retval = False
        if helpers.tool_version_gte(tool, '1.1.11+galaxy0'):
            options = job.get_param_values(app)
            if options:
                training_type = options.get('training_type', {})
                if training_type and isinstance(training_type, dict):
                    model_select = training_type.get('model_class', {})
                    if model_select and isinstance(model_select, dict):
                        retval = model_select.get('model_class') == 'GNN'
        retval
      gpu: 1
      cores: 20
      mem: 100
      context:
        exclude_gpu_models: ["Tesla T4"]  # T4 GPUs have only 16 GB of memory, which is not enough for the GNN model
So excluding T4 should fix it.
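The nested parameter lookup in the `flexynesis_gnn_high_mem` rule can be tried outside Galaxy with plain dictionaries. This is a minimal sketch: `is_gnn_job` is a hypothetical function name, the sample dictionaries are invented, and the assumption that `job.get_param_values(app)` returns nested dicts of this shape comes only from how the rule reads them. The version check from the rule is left out.

```python
def is_gnn_job(options):
    """Return True when the nested tool parameters select the GNN model class.

    Mirrors the defensive checks in the rule: every level must exist and be a
    dict before the next key is read.
    """
    if not options:
        return False
    training_type = options.get("training_type", {})
    if not training_type or not isinstance(training_type, dict):
        return False
    model_select = training_type.get("model_class", {})
    if not model_select or not isinstance(model_select, dict):
        return False
    return model_select.get("model_class") == "GNN"


if __name__ == "__main__":
    # Invented parameter sets for illustration only.
    gnn_params = {"training_type": {"model_class": {"model_class": "GNN"}}}
    other_params = {"training_type": {"model_class": {"model_class": "DirectPred"}}}
    flat_params = {"training_type": "supervised"}

    print(is_gnn_job(gnn_params))
    print(is_gnn_job(other_params))
    print(is_gnn_job(flat_params))
    print(is_gnn_job(None))
```

The early returns replace the nested `if` pyramid without changing the outcome: any missing or non-dict level yields `False`, so only jobs that explicitly select GNN would be routed to the GPU rule.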
The parameters are correct.
I think the problem is at gpu: 1; it should be gpus: 1.
I am also not sure whether include_gpu_models is needed as well.
0709aea to b047a42

Deployed and working, somewhat (@nilchia, could you try with the correct parameters?)