
Make flexynesis GPU version dependent and add docker user 999#2116

Open
mira-miracoli wants to merge 4 commits into usegalaxy-eu:master from mira-miracoli:flexynesis-user

Conversation

mira-miracoli commented on Jun 8, 2026

Contributor

Deployed and working, more or less. @nilchia, could you try it with the correct parameters?

Comment thread files/galaxy/tpv/tools.yml Outdated

  toolshed.g2.bx.psu.edu/repos/bgruening/flexynesis/flexynesis/.*:
    rules:
      - if: helpers.tool_version_eq(tool, '1.1.11+galaxy0')

Member

This should be greater than or equal, I think; all future versions will presumably need Docker as well.
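To illustrate the eq vs. gte point, here is a simplified, hypothetical comparator for Galaxy-style version strings such as 1.1.11+galaxy0. It is not TPV's helpers implementation: it assumes a purely numeric upstream version plus an optional +galaxyN suffix. It only shows that an equality check pins the rule to one release, while a greater-or-equal check keeps applying to later releases and wrapper revisions.

```python
# Hypothetical, simplified comparison of Galaxy tool version strings such as
# "1.1.11+galaxy0". NOT TPV's helpers implementation; it only illustrates why
# an "eq" check matches a single release while a "gte" check also matches
# every later one. Assumes a numeric upstream version and an optional
# "+galaxyN" suffix ("1.1" and "1.1.0" would compare as different).


def parse_tool_version(version: str) -> tuple[tuple[int, ...], int]:
    """Split "1.1.11+galaxy0" into ((1, 1, 11), 0)."""
    upstream, _, suffix = version.partition("+")
    numbers = tuple(int(part) for part in upstream.split("."))
    galaxy_rev = int(suffix.removeprefix("galaxy")) if suffix else 0
    return numbers, galaxy_rev


def version_eq(version: str, reference: str) -> bool:
    return parse_tool_version(version) == parse_tool_version(reference)


def version_gte(version: str, reference: str) -> bool:
    # Tuple comparison: upstream numbers first, then the galaxy revision.
    return parse_tool_version(version) >= parse_tool_version(reference)


if __name__ == "__main__":
    reference = "1.1.11+galaxy0"
    for candidate in ["1.1.10+galaxy0", "1.1.11+galaxy0", "1.1.11+galaxy1", "1.2.0+galaxy0"]:
        print(candidate, version_eq(candidate, reference), version_gte(candidate, reference))
```

With the reference 1.1.11+galaxy0, only 1.1.11+galaxy0 itself passes the equality check, whereas 1.1.11+galaxy1 and 1.2.0+galaxy0 also pass the greater-or-equal check.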

@nilchia, only one specific option needs a GPU, not every flexynesis job, correct?

Contributor

Correct. Only "GNN".

Comment thread files/galaxy/tpv/tools.yml Outdated
Comment on lines 811 to 825
@@ -821,7 +825,6 @@ tools:
retval

Contributor

Suggested change
      - if: helpers.tool_version_gte(tool, '1.1.11+galaxy0')
        params:
          docker_run_extra_arguments: --user 999
      - id: flexynesis_gnn_high_mem
        if: |
          retval = False
          if helpers.tool_version_gte(tool, '1.1.11+galaxy0'):
            options = job.get_param_values(app)
            if options:
              training_type = options.get('training_type', {})
              if training_type and isinstance(training_type, dict):
                model_select = training_type.get('model_class', {})
                if model_select and isinstance(model_select, dict):
                  retval = model_select.get('model_class') == 'GNN'
          retval
        gpu: 1
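The GNN check inside the suggested rule can be pulled out and tested on its own. This is a sketch: the nested dict stands in for what the rule reads via job.get_param_values(app), and the training_type → model_class → model_class layout is taken from the rule above rather than from the flexynesis tool XML. The function name is hypothetical.

```python
# Standalone sketch of the GNN detection in the suggested TPV rule.
# The nested layout mirrors the rule, not a verified flexynesis tool XML.
from collections.abc import Mapping
from typing import Any, Optional


def is_gnn_job(options: Optional[Mapping[str, Any]]) -> bool:
    """Return True only when the selected model class is "GNN"."""
    if not options:
        return False
    training_type = options.get("training_type", {})
    if not isinstance(training_type, Mapping):
        return False
    model_select = training_type.get("model_class", {})
    if not isinstance(model_select, Mapping):
        return False
    return model_select.get("model_class") == "GNN"


if __name__ == "__main__":
    print(is_gnn_job({"training_type": {"model_class": {"model_class": "GNN"}}}))
    # "not_gnn" is a placeholder for any other model class.
    print(is_gnn_job({"training_type": {"model_class": {"model_class": "not_gnn"}}}))
```

Early returns replace the nested ifs of the inline rule but keep the same behaviour: missing or non-dict levels fall through to False, so non-GNN flexynesis jobs never request a GPU.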

Contributor

Something like this?

Contributor Author

Looks good, I think. The parabricks tool needs to exclude V100 GPUs; is there a similar GPU model or CUDA version limitation for your tool?

Contributor

The tool itself has no such limitation. I think it is just a memory limitation.

I ran it once on Galaxy and it failed because it ran out of GPU memory:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.52 GiB. GPU 0 has a total capacity of 14.56 GiB of which 2.92 GiB is free. Including non-PyTorch memory, this process has 11.64 GiB memory in use. Of the allocated memory 7.82 GiB is allocated by PyTorch, and 3.69 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
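A quick back-of-the-envelope check of the numbers in this log (all in GiB, copied from the message) shows why the allocation failed. It assumes this process was the only one on the GPU. The 3.52 GiB request exceeds the 2.92 GiB that was free, but it would fit into free plus reserved-but-unallocated memory. That is presumably why PyTorch suggests expandable_segments. Even so, the process already held about 11.6 GiB on a 14.56 GiB card, so a GPU with more memory looks like the more robust fix.

```python
# Arithmetic on the values reported in the OOM message above (GiB).
# Assumes no other process was using the GPU.
total_capacity = 14.56
free = 2.92
requested = 3.52
allocated_by_pytorch = 7.82
reserved_unallocated = 3.69

in_use = total_capacity - free                        # log reports 11.64 in use
pytorch_reserved = allocated_by_pytorch + reserved_unallocated
non_pytorch = in_use - pytorch_reserved               # CUDA context, other libs

print(f"in use: {in_use:.2f} GiB")
print(f"reserved by PyTorch: {pytorch_reserved:.2f} GiB")
print(f"non-PyTorch: {non_pytorch:.2f} GiB")
print(f"request fits in free memory: {requested <= free}")
print(f"request fits if reserved-but-unallocated memory were usable: "
      f"{requested <= free + reserved_unallocated}")
```

This prints 11.64, 11.51 and 0.13 GiB, then False and True for the two fit checks.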

Contributor

  toolshed.g2.bx.psu.edu/repos/bgruening/flexynesis/flexynesis/.*:
    rules:
      - if: helpers.tool_version_gte(tool, '1.1.11+galaxy0')
        params:
          docker_run_extra_arguments: --user 999
      - id: flexynesis_gnn_high_mem
        if: |
          retval = False
          if helpers.tool_version_gte(tool, '1.1.11+galaxy0'):
            options = job.get_param_values(app)
            if options:
              training_type = options.get('training_type', {})
              if training_type and isinstance(training_type, dict):
                model_select = training_type.get('model_class', {})
                if model_select and isinstance(model_select, dict):
                  retval = model_select.get('model_class') == 'GNN'
          retval
        gpu: 1
        cores: 20
        mem: 100
        context:
          exclude_gpu_models: ["Tesla T4"] # T4 GPUs have only 16 GB of memory, which is not enough for the GNN model

So excluding T4 should fix it.
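How the exclude_gpu_models context variable is consumed depends on the destination definitions, which this thread does not show. As a hypothetical sketch of the idea, here is a filter that drops candidate destinations whose GPU model is on a deny list. The Destination record, its fields, and the destination names are invented for illustration.

```python
# Hypothetical sketch: filter candidate destinations by a GPU-model deny list,
# the kind of decision an exclude_gpu_models context variable could drive.
# Destination and its fields are invented; not usegalaxy.eu's actual wiring.
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Destination:
    name: str
    gpu_model: Optional[str]  # None for CPU-only destinations


def eligible_gpu_destinations(
    destinations: Iterable[Destination], exclude_gpu_models: Iterable[str]
) -> List[Destination]:
    """Keep GPU destinations whose model is not in the exclusion list."""
    excluded = set(exclude_gpu_models)
    return [
        d for d in destinations
        if d.gpu_model is not None and d.gpu_model not in excluded
    ]


if __name__ == "__main__":
    candidates = [
        Destination("gpu_t4", "Tesla T4"),
        Destination("gpu_v100", "Tesla V100"),
        Destination("cpu_only", None),
    ]
    for d in eligible_gpu_destinations(candidates, ["Tesla T4"]):
        print(d.name)
```

With ["Tesla T4"] excluded, only gpu_v100 remains. A tool like parabricks would pass ["Tesla V100"] instead.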

Contributor Author

I tested your rule, but I am not sure whether I chose the right parameters. In any case, my job did not trigger it:
[screenshot]

Contributor

The parameters are correct.

I think the problem is gpu: 1; it should be gpus: 1.
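A key typo like gpu vs. gpus is easy to miss in YAML. As a rough sketch, a rule dict (for example, one loaded with yaml.safe_load) can be checked against a set of expected keys, with difflib suggesting the closest match. The key set below is an assumed subset for illustration, not TPV's authoritative schema.

```python
# Minimal lint for TPV-style rule dicts. KNOWN_RULE_KEYS is an assumed
# subset of rule fields for illustration, not TPV's full schema.
import difflib

KNOWN_RULE_KEYS = {
    "id", "if", "fail", "cores", "mem", "gpus", "env", "params",
    "scheduling", "context", "rules",
}


def lint_rule(rule: dict) -> list[str]:
    """Report keys not in KNOWN_RULE_KEYS, with a close-match hint if any."""
    problems = []
    for key in sorted(set(rule) - KNOWN_RULE_KEYS):
        hint = difflib.get_close_matches(key, sorted(KNOWN_RULE_KEYS), n=1)
        suggestion = f" (did you mean {hint[0]!r}?)" if hint else ""
        problems.append(f"unknown key {key!r}{suggestion}")
    return problems


if __name__ == "__main__":
    rule = {
        "id": "flexynesis_gnn_high_mem",
        "if": "retval",
        "gpu": 1,  # the suspected typo from this thread
        "cores": 20,
        "mem": 100,
    }
    print(lint_rule(rule))
```

For the rule above this prints ["unknown key 'gpu' (did you mean 'gpus'?)"].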

I am also not sure whether include_gpu_models is needed as well.

Comment thread files/galaxy/tpv/tools.yml Outdated