
TabPFN: Add ignore_pretraining_limits to classifiers and regressors#1847

Merged
bgruening merged 4 commits into bgruening:master from anuprulez:patch-12 on Apr 22, 2026

Conversation

@anuprulez
Contributor


The parameter `ignore_pretraining_limits=True` lifts the default restriction that limits training to 1000 samples.

PriorLabs/TabPFN#169
https://huggingface.co/Prior-Labs/TabPFN-v2-reg/discussions/2

Currently, the tool fails when the training set has more than 1000 samples:

```
Traceback (most recent call last):
  File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/tabpfn/ed78e1448387/tabpfn/main.py", line 167, in <module>
    train_evaluate(args)
  File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/tabpfn/ed78e1448387/tabpfn/main.py", line 98, in train_evaluate
    classifier.fit(tr_features, tr_labels)
  File "/usr/local/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn_common_utils/telemetry/core/decorators.py", line 321, in wrapper
    return _safe_call_with_telemetry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn_common_utils/telemetry/core/decorators.py", line 365, in _safe_call_with_telemetry
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn/classifier.py", line 758, in fit
    ensemble_configs, X, y = self._initialize_dataset_preprocessing(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn/classifier.py", line 642, in _initialize_dataset_preprocessing
    X, y, feature_names, n_features, original_y_name = ensure_compatible_fit_inputs(
                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn/validation.py", line 75, in ensure_compatible_fit_inputs
    validate_dataset_size(
  File "/usr/local/lib/python3.12/site-packages/tabpfn/validation.py", line 137, in validate_dataset_size
    _validate_num_samples_for_cpu(
  File "/usr/local/lib/python3.12/site-packages/tabpfn/validation.py", line 274, in _validate_num_samples_for_cpu
    raise RuntimeError(
RuntimeError: Running on CPU with more than 1000 samples is not allowed by default due to slow performance.
To override this behavior, set the environment variable TABPFN_ALLOW_CPU_LARGE_DATASET=1 or set ignore_pretraining_limits=True.
Alternatively, consider using a GPU or the tabpfn-client API: https://github.com/PriorLabs/tabpfn-client
```
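For reference, a minimal sketch of the two override routes named in the error message. The `TabPFNClassifier` usage is shown only in a comment, since it assumes the `tabpfn` package is installed:

```python
import os

# Route 1: environment variable, as suggested by the error message.
os.environ["TABPFN_ALLOW_CPU_LARGE_DATASET"] = "1"

# Route 2: the constructor parameter this PR adds, e.g.
#   from tabpfn import TabPFNClassifier
#   clf = TabPFNClassifier(ignore_pretraining_limits=True)
# (commented out: assumes tabpfn is installed in this environment)

print(os.environ["TABPFN_ALLOW_CPU_LARGE_DATASET"])
```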
@anuprulez anuprulez changed the title Add ignore_pretraining_limits to classifiers and regressors TabPFN: Add ignore_pretraining_limits to classifiers and regressors Apr 21, 2026
@bgruening
Owner

Are there any disadvantages here? Should that be a user option? Or an option that we toggle based on input size?

@anuprulez
Contributor Author

> Are there any disadvantages here?

It will take more time to run with > 1000 examples. I also selected the GPU option to run it, but it still produced the error `RuntimeError: Running on CPU with more than 1000 samples is not allowed by default`. I'm not sure why it was considered to be running on CPUs, since it executed on a GPU node (c64m384g8-n3801.bi.privat).
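To debug why it fell back to CPU, here is a quick check of what the process actually sees. This assumes TabPFN follows PyTorch's device visibility, which the traceback suggests:

```python
# Check whether PyTorch (and hence TabPFN) can actually see a GPU in this
# process; even on a GPU node this can resolve to "cpu" if the CUDA drivers
# or CUDA_VISIBLE_DEVICES are not set up for the job.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"  # torch not installed in this environment
print(device)
```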

> Should that be a user option? Or an option that we toggle based on input size?

Yes, we can toggle it directly in the tool's script based on the size of the user's training data:

```python
if X.shape[0] <= 1000:
    classifier = TabPFNClassifier()
else:
    classifier = TabPFNClassifier(ignore_pretraining_limits=True)
```
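That toggle could be factored into a small helper shared by the classifier and regressor paths. The `tabpfn_kwargs` name is illustrative; the 1000-sample threshold comes from the error message above:

```python
def tabpfn_kwargs(n_samples: int, cpu_limit: int = 1000) -> dict:
    """Return constructor kwargs for TabPFNClassifier / TabPFNRegressor:
    lift the pretraining limit only when the training set exceeds the
    default CPU sample limit."""
    if n_samples <= cpu_limit:
        return {}
    return {"ignore_pretraining_limits": True}

# Example usage (assumes tabpfn is installed):
#   clf = TabPFNClassifier(**tabpfn_kwargs(X.shape[0]))
print(tabpfn_kwargs(500), tabpfn_kwargs(5000))
```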

@anuprulez
Contributor Author

Can someone add the `skip-version-check` label to allow the tests to pass? I don't have the necessary permissions. Thanks a lot!

```
Linting repository /home/runner/work/galaxytools/galaxytools/tools/tabpfn
.. INFO: Included files all found.
.. INFO: No tool_dependencies.xml, skipping.
.. INFO: No tool_dependencies.xml, skipping.
.. INFO: No tool_dependencies.xml, skipping.
.. INFO: No repository_dependencies.xml, skipping.
.. INFO: .shed.yml found and appears to be valid YAML.
.. INFO: No README found, skipping.
.. INFO: No tool_dependencies.xml, skipping.
+Linting tool /home/runner/work/galaxytools/galaxytools/tools/tabpfn/tabpfn.xml
.. CHECK (TestsNoValid): 8 test(s) found.
.. INFO (OutputsNumber): 2 outputs found.
.. INFO (InputsNum): Found 9 input parameters.
.. CHECK (HelpPresent): Tool contains help section.
.. CHECK (HelpValidRST): Help contains valid reStructuredText.
.. CHECK (ToolIDValid): Tool defines an id [tabpfn].
.. CHECK (ToolNameValid): Tool defines a name [Tabular data prediction using TabPFN].
.. CHECK (ToolProfileValid): Tool specifies profile version [24.2].
.. CHECK (ToolVersionValid): Tool defines a version [7.0.0+galaxy1].
.. INFO (CommandInfo): Tool contains a command.
.. CHECK (CitationsFound): Found 1 citations.
.. INFO: URL OK https://github.com/PriorLabs/TabPFN?tab=License-1-ov-file
.. INFO: URL OK https://huggingface.co/Prior-Labs/tabpfn_2_5/blob/main/LICENSE
.. ERROR (ShedVersion): tabpfn: version 7.0.0+galaxy1 is less or equal than version of the latest installable revision 7.0.0+galaxy1
.. INFO: Found all shed metadata fields required for automated repository creation and/or updates.
```

@anuprulez anuprulez closed this Apr 22, 2026
@anuprulez anuprulez reopened this Apr 22, 2026
@bgruening bgruening merged commit 5f1f7b8 into bgruening:master Apr 22, 2026
9 of 22 checks passed
@anuprulez anuprulez deleted the patch-12 branch April 23, 2026 07:13