Skip to content

Update flexynesis#1679

Merged
bgruening merged 20 commits intobgruening:masterfrom
nilchia:update_flexynesis
Oct 16, 2025
Merged

Update flexynesis#1679
bgruening merged 20 commits intobgruening:masterfrom
nilchia:update_flexynesis

Conversation

@nilchia
Copy link
Copy Markdown
Collaborator

@nilchia nilchia commented Sep 16, 2025

This PR updates flexynesis to the newest version.
Features:

  1. faster module loading
  2. saving model as safetensors

@nilchia nilchia marked this pull request as ready for review September 16, 2025 19:38
@nilchia nilchia requested a review from bgruening September 17, 2025 07:18
@nilchia nilchia marked this pull request as draft September 17, 2025 07:48
@nilchia
Copy link
Copy Markdown
Collaborator Author

nilchia commented Sep 17, 2025

Still work in process. The inference mode should be added. Please don't review :)

@nilchia nilchia marked this pull request as ready for review October 14, 2025 13:39
Comment thread tools/flexynesis/flexynesis.xml Outdated
Comment thread tools/flexynesis/flexynesis.xml Outdated
</assert_contents>
</element>
</output_collection>
<output name="model" ftype="safetensors" file="model_1.safetensors" compare="sim_size"/>
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any other way to test this apart from sim_size?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe some asserts? (Easiest: size?)

Comment thread tools/flexynesis/flexynesis.xml Outdated
<collection name="results" type="list" label="${tool.name} on ${on_string}: results">
<discover_datasets pattern="(?P&lt;name&gt;.+)\.tabular$" format="tabular" directory="output"/>
</collection>
<data name="model" format="safetensors" from_work_dir="output/job.final_model.safetensors" label="${tool.name} on ${on_string}: trained_model" />
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always needed? Should we make this optional?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we can make it optional.
But I think the optional should be set to True as default.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for creating a boolean parameter and making this output optional

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we can make it optional. But I think the optional should be set to True as default.

This would mean your safetensor model will always be a output dataset unless the user unselects it. Do we want that?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would mean your safetensor model will always be a output dataset unless the user unselects it. Do we want that?
Yes, it will reduce the unnecessary job repetition. Imagine you are training a big data which takes hours to finish, and then you realize the trained model is not saved.

Especially in the next updates, where it is possible to use the model to integrate new data.

For me, the model is not a byproduct.

Comment thread tools/flexynesis/flexynesis.xml Outdated
</assert_contents>
</element>
</output_collection>
<output name="model" ftype="safetensors" file="model_18.safetensors" compare="sim_size"/>
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to test this output here, I guess one of those outputs is enough? sim_size is not a very reliable test

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know of other test for this data.
The size of them are different though.

Comment thread tools/flexynesis/macros.xml Outdated
<token name="@VERSION_SUFFIX@">3</token>
<token name="@TOOL_VERSION@">1.1.1</token>
<token name="@VERSION_SUFFIX@">0</token>
<token name="@PROFILE@">24.1</token>
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

25.0 ... if your datatype gets merged?

$use_cv
$evaluate_baseline_performance
--feature_importance_method $feature_importance_method
\${GALAXY_FLEXYNESIS_EXTRA_ARGUMENTS}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you remove this? We can now not set the --device to GPU

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok 👍

Comment thread tools/flexynesis/flexynesis.xml Outdated
</assert_contents>
</element>
</output_collection>
<output name="model" ftype="safetensors" file="model_6.safetensors" compare="sim_size"/>
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use sim-size maybe assert a size, this way we do not need to ship the safetonsor files in the git repo here.

@nilchia nilchia closed this Oct 15, 2025
@nilchia nilchia reopened this Oct 15, 2025
@nilchia nilchia requested a review from SaimMomin12 October 15, 2025 22:59
@nilchia nilchia requested a review from bgruening October 15, 2025 22:59
Comment thread tools/flexynesis/flexynesis.xml Outdated
<output name="model" ftype="safetensors">
<assert_contents>
<has_size size="107104"/>
<has_size size="20"/>
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably sim_size was a better option.
Why is this passing?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test passes with both size="107104" and size="20"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know for sure, but I assume because of https://github.com/bgruening/galaxytools/actions/runs/18556838587/job/52896358516?pr=1679#step:7:494
basically with the surrounding ftype check not being executable, the inner check won't run?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this locally and it passed:

                <element name="job.stats">
                    <assert_contents>
                        <has_size size="123456789"/>
                        <has_text_matching expression="DirectPred\tErlotinib\tnumerical\tmse\t"/>
                        <has_text_matching expression="DirectPred\tErlotinib\tnumerical\tr2\t"/>
                        <has_text_matching expression="DirectPred\tErlotinib\tnumerical\tpearson_corr\t"/>
                    </assert_contents>
                </element>

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. If you think its a bug, please look at one of those Galaxy internal test:

https://github.com/search?q=repo%3Agalaxyproject%2Fgalaxy+assert_contents+path%3A%2F%5Etest%5C%2F%2F+size+language%3AXML&type=code&l=XML

Try to replicated it and create a test that clearly shows its not considerd. If people have a test to work against it, they can fix it.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, now I see it ... its not size ... its value :)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bgruening bgruening merged commit bcee554 into bgruening:master Oct 16, 2025
12 of 14 checks passed
@nilchia nilchia deleted the update_flexynesis branch October 16, 2025 20:16
@SaimMomin12
Copy link
Copy Markdown
Collaborator

Deployment step was skipped in CI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants