-
Notifications
You must be signed in to change notification settings - Fork 263
TabPFN updates #1598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TabPFN updates #1598
Changes from 10 commits
03a2d99
64e2dd5
166f825
ed5c01d
b51f7ac
91490cd
1744d96
3cbf64e
f32b0fa
650b6c6
ded15b4
bc74fde
653a189
504325f
d745cf3
84fa20d
03e7b4c
38e0d0b
606bf12
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,119 +1,123 @@ | ||
| <?xml version="1.0"?> | ||
| <tool id="tabpfn" name="Tabular data prediction using TabPFN" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.0"> | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to change the name now?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree if the tool name is not updated while keeping |
||
| <description>with PyTorch</description> | ||
| <macros> | ||
| <token name="@TOOL_VERSION@">2.0.3</token> | ||
| <token name="@VERSION_SUFFIX@">1.1</token> | ||
| </macros> | ||
| <creator> | ||
| <organization name="European Galaxy Team" url="https://galaxyproject.org/eu/" /> | ||
| <person givenName="Anup" familyName="Kumar" email="kumara@informatik.uni-freiburg.de" /> | ||
| <person givenName="Frank" familyName="Hutter" email="fh@cs.uni-freiburg.de" /> | ||
| </creator> | ||
| <requirements> | ||
| <requirement type="package" version="@TOOL_VERSION@">tabpfn</requirement> | ||
| <requirement type="package" version="2.2.2">pandas</requirement> | ||
| <requirement type="package" version="3.9.2">matplotlib</requirement> | ||
| </requirements> | ||
| <version_command>echo "@VERSION@"</version_command> | ||
| <command detect_errors="aggressive"> | ||
| <![CDATA[ | ||
| <description>with PyTorch</description> | ||
| <macros> | ||
| <token name="@TOOL_VERSION@">2.0.3</token> | ||
| <token name="@VERSION_SUFFIX@">1.2</token> | ||
| </macros> | ||
| <creator> | ||
| <organization name="European Galaxy Team" url="https://galaxyproject.org/eu/"/> | ||
| <person givenName="Anup" familyName="Kumar" email="kumara@informatik.uni-freiburg.de"/> | ||
| <person givenName="Frank" familyName="Hutter" email="fh@cs.uni-freiburg.de"/> | ||
| </creator> | ||
| <requirements> | ||
| <requirement type="package" version="@TOOL_VERSION@">tabpfn</requirement> | ||
| <requirement type="package" version="1.2.7">catboost</requirement> | ||
| </requirements> | ||
| <version_command>echo "@VERSION@"</version_command> | ||
| <command detect_errors="aggressive"><![CDATA[ | ||
| python '$__tool_directory__/main.py' | ||
| --selected_task '$selected_task' | ||
| --train_data '$train_data' | ||
| --testhaslabels '$testhaslabels' | ||
| --test_data '$test_data' | ||
| ]]> | ||
| </command> | ||
| <inputs> | ||
| <param name="selected_task" type="select" label="Select a machine learning task"> | ||
| <option value="Classification" selected="true"></option> | ||
| <option value="Regression" selected="false"></option> | ||
| </param> | ||
| <param name="train_data" type="data" format="tabular" label="Train data" help="Please provide training data for training model. It should contain labels/class/target in the last column" /> | ||
| <param name="test_data" type="data" format="tabular" label="Test data" help="Please provide test data for evaluating model. It may or may not contain labels/class/target in the last column" /> | ||
| <param name="testhaslabels" type="boolean" truevalue="haslabels" falsevalue="" checked="false" label="Does test data contain labels?" help="Set this parameter when test data contains labels" /> | ||
| </inputs> | ||
| <outputs> | ||
| <data format="tabular" name="output_predicted_data" from_work_dir="output_predicted_data" label="Predicted data"></data> | ||
| <data format="png" name="output_plot" from_work_dir="output_plot.png" label="Prediction plot on test data"> | ||
| <filter>testhaslabels is True</filter> | ||
| </data> | ||
| </outputs> | ||
| <tests> | ||
| <test expect_num_outputs="1"> | ||
| <param name="selected_task" value="Classification" /> | ||
| <param name="train_data" value="classification_local_train_rows.tabular" ftype="tabular" /> | ||
| <param name="test_data" value="classification_local_test_rows.tabular" ftype="tabular" /> | ||
| <param name="testhaslabels" value="" /> | ||
| <output name="output_predicted_data"> | ||
| <assert_contents> | ||
| <has_n_columns n="42" /> | ||
| <has_n_lines n="3" /> | ||
| </assert_contents> | ||
| </output> | ||
| </test> | ||
| <test expect_num_outputs="2"> | ||
| <param name="selected_task" value="Classification" /> | ||
| <param name="train_data" value="classification_local_train_rows.tabular" ftype="tabular" /> | ||
| <param name="test_data" value="classification_local_test_rows_labels.tabular" ftype="tabular" /> | ||
| <param name="testhaslabels" value="haslabels" /> | ||
| <output name="output_plot" file="prc_binary.png" compare="sim_size" /> | ||
| </test> | ||
| <test expect_num_outputs="2"> | ||
| <param name="selected_task" value="Classification" /> | ||
| <param name="train_data" value="train_data_multiclass.tabular" ftype="tabular" /> | ||
| <param name="test_data" value="test_data_multiclass_labels.tabular" ftype="tabular" /> | ||
| <param name="testhaslabels" value="haslabels" /> | ||
| <output name="output_plot" file="prc_multiclass.png" compare="sim_size" /> | ||
| </test> | ||
| <test expect_num_outputs="1"> | ||
| <param name="selected_task" value="Classification" /> | ||
| <param name="train_data" value="train_data_multiclass.tabular" ftype="tabular" /> | ||
| <param name="test_data" value="test_data_multiclass_nolabels.tabular" ftype="tabular" /> | ||
| <param name="testhaslabels" value="" /> | ||
| <output name="output_predicted_data"> | ||
| <assert_contents> | ||
| <has_n_columns n="11" /> | ||
| <has_n_lines n="502" /> | ||
| </assert_contents> | ||
| </output> | ||
| </test> | ||
| <test expect_num_outputs="2"> | ||
| <param name="selected_task" value="Regression" /> | ||
| <param name="train_data" value="regression_local_train_rows.tabular" ftype="tabular" /> | ||
| <param name="test_data" value="regression_local_test_rows_labels.tabular" ftype="tabular" /> | ||
| <param name="testhaslabels" value="haslabels" /> | ||
| <output name="output_plot" file="r2_curve.png" compare="sim_size" /> | ||
| </test> | ||
| <test expect_num_outputs="1"> | ||
| <param name="selected_task" value="Regression" /> | ||
| <param name="train_data" value="regression_local_train_rows.tabular" ftype="tabular" /> | ||
| <param name="test_data" value="regression_local_test_rows.tabular" ftype="tabular" /> | ||
| <param name="testhaslabels" value="" /> | ||
| <output name="output_predicted_data"> | ||
| <assert_contents> | ||
| <has_n_columns n="14" /> | ||
| <has_n_lines n="105" /> | ||
| </assert_contents> | ||
| </output> | ||
| </test> | ||
| </tests> | ||
| <help> | ||
| <![CDATA[ | ||
| ]]></command> | ||
| <inputs> | ||
| <param name="selected_task" type="select" label="Select a machine learning task"> | ||
| <option value="Classification" selected="true"/> | ||
| <option value="Regression" selected="false"/> | ||
| </param> | ||
| <param name="train_data" type="data" format="tabular" label="Train data" help="Please provide training data for training model. It should contain labels/class/target in the last column"/> | ||
| <param name="test_data" type="data" format="tabular" label="Test data" help="Please provide test data for evaluating model. It may or may not contain labels/class/target in the last column"/> | ||
| <param name="testhaslabels" type="boolean" truevalue="haslabels" falsevalue="" checked="false" label="Does test data contain labels?" help="Set this parameter when test data contains labels"/> | ||
| </inputs> | ||
| <outputs> | ||
| <data format="tabular" name="output_predicted_data_TabPFN" from_work_dir="output_predicted_data_TabPFN" label="Predicted data by TabPFN"/> | ||
| <data format="tabular" name="output_predicted_data_CatBoost" from_work_dir="output_predicted_data_CatBoost" label="Predicted data by CatBoost"/> | ||
| <data format="png" name="output_plot_TabPFN" from_work_dir="output_plot_TabPFN.png" label="Prediction plot on test data TabPFN"> | ||
| <filter>testhaslabels is True</filter> | ||
| </data> | ||
| <data format="png" name="output_plot_CatBoost" from_work_dir="output_plot_CatBoost.png" label="Prediction plot on test data using CatBoost"> | ||
| <filter>testhaslabels is True</filter> | ||
| </data> | ||
| </outputs> | ||
| <tests> | ||
| <test expect_num_outputs="2"> | ||
| <param name="selected_task" value="Classification"/> | ||
| <param name="train_data" value="classification_local_train_rows.tabular" ftype="tabular"/> | ||
| <param name="test_data" value="classification_local_test_rows.tabular" ftype="tabular"/> | ||
| <param name="testhaslabels" value=""/> | ||
| <output name="output_predicted_data_TabPFN"> | ||
| <assert_contents> | ||
| <has_n_columns n="42"/> | ||
| <has_n_lines n="3"/> | ||
| </assert_contents> | ||
| </output> | ||
| </test> | ||
| <test expect_num_outputs="3"> | ||
| <param name="selected_task" value="Classification"/> | ||
| <param name="train_data" value="classification_local_train_rows.tabular" ftype="tabular"/> | ||
| <param name="test_data" value="classification_local_test_rows_labels.tabular" ftype="tabular"/> | ||
| <param name="testhaslabels" value="haslabels"/> | ||
| <output name="output_plot_TabPFN" file="prc_binary.png" compare="sim_size"/> | ||
| </test> | ||
| <test expect_num_outputs="3"> | ||
| <param name="selected_task" value="Classification"/> | ||
| <param name="train_data" value="train_data_multiclass.tabular" ftype="tabular"/> | ||
| <param name="test_data" value="test_data_multiclass_labels.tabular" ftype="tabular"/> | ||
| <param name="testhaslabels" value="haslabels"/> | ||
| <output name="output_plot_TabPFN" file="prc_multiclass.png" compare="sim_size"/> | ||
| </test> | ||
| <test expect_num_outputs="2"> | ||
| <param name="selected_task" value="Classification"/> | ||
| <param name="train_data" value="train_data_multiclass.tabular" ftype="tabular"/> | ||
| <param name="test_data" value="test_data_multiclass_nolabels.tabular" ftype="tabular"/> | ||
| <param name="testhaslabels" value=""/> | ||
| <output name="output_predicted_data_CatBoost"> | ||
| <assert_contents> | ||
| <has_n_columns n="11"/> | ||
| <has_n_lines n="502"/> | ||
| </assert_contents> | ||
| </output> | ||
| </test> | ||
| <test expect_num_outputs="3"> | ||
| <param name="selected_task" value="Regression"/> | ||
| <param name="train_data" value="regression_local_train_rows.tabular" ftype="tabular"/> | ||
| <param name="test_data" value="regression_local_test_rows_labels.tabular" ftype="tabular"/> | ||
| <param name="testhaslabels" value="haslabels"/> | ||
| <output name="output_plot_TabPFN" file="r2_curve.png" compare="sim_size"/> | ||
| </test> | ||
| <test expect_num_outputs="2"> | ||
| <param name="selected_task" value="Regression"/> | ||
| <param name="train_data" value="regression_local_train_rows.tabular" ftype="tabular"/> | ||
| <param name="test_data" value="regression_local_test_rows.tabular" ftype="tabular"/> | ||
| <param name="testhaslabels" value=""/> | ||
| <output name="output_predicted_data_TabPFN"> | ||
| <assert_contents> | ||
| <has_n_columns n="14"/> | ||
| <has_n_lines n="105"/> | ||
| </assert_contents> | ||
| </output> | ||
| </test> | ||
| </tests> | ||
| <help><![CDATA[ | ||
| **What it does** | ||
|
|
||
| Classification and Regression on tabular data by TabPFN | ||
|
|
||
| **Input files** | ||
| - Training data: the training data should contain features and the last column should be the class labels. It should be in tabular format. | ||
| - Test data: the test data should also contain the same features as the training data and the last column should be the class labels if labels are avaialble. It should be in tabular format. It is not required for the test data to have labels. | ||
| - Above files show performance comparison of TabPFN with CatBoost (https://github.com/catboost/catboost). | ||
|
|
||
| **Output files** | ||
| - Predicted data along with predicted labels. | ||
| - Prediction plot (when test data has labels available). | ||
| ]]> | ||
| </help> | ||
| <citations> | ||
| <citation type="doi">10.1038/s41586-024-08328-6</citation> | ||
| </citations> | ||
|
|
||
| **License** | ||
| - TabPFN is available under an open source license (https://github.com/PriorLabs/TabPFN?tab=License-1-ov-file) that combines Apache with a LLama-like attribution clause. It requires you to prominently display "Built with TabPFN" when you use a pipeline including it in production. | ||
| ]]></help> | ||
| <citations> | ||
| <citation type="doi">10.1038/s41586-024-08328-6</citation> | ||
| </citations> | ||
| </tool> | ||
Uh oh!
There was an error while loading. Please reload this page.