
TabPFN: Add ignore_pretraining_limits to classifiers and regressors#1847

Merged
bgruening merged 4 commits into bgruening:master from anuprulez:patch-12 on Apr 22, 2026

Conversation

@anuprulez
Contributor


The parameter `ignore_pretraining_limits=True` lifts the default restriction that limits training to 1000 samples.

PriorLabs/TabPFN#169
https://huggingface.co/Prior-Labs/TabPFN-v2-reg/discussions/2

Currently, the tool fails when the training set has more than 1000 samples:

```
Traceback (most recent call last):
  File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/tabpfn/ed78e1448387/tabpfn/main.py", line 167, in <module>
    train_evaluate(args)
  File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/tabpfn/ed78e1448387/tabpfn/main.py", line 98, in train_evaluate
    classifier.fit(tr_features, tr_labels)
  File "/usr/local/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn_common_utils/telemetry/core/decorators.py", line 321, in wrapper
    return _safe_call_with_telemetry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn_common_utils/telemetry/core/decorators.py", line 365, in _safe_call_with_telemetry
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn/classifier.py", line 758, in fit
    ensemble_configs, X, y = self._initialize_dataset_preprocessing(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn/classifier.py", line 642, in _initialize_dataset_preprocessing
    X, y, feature_names, n_features, original_y_name = ensure_compatible_fit_inputs(
                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn/validation.py", line 75, in ensure_compatible_fit_inputs
    validate_dataset_size(
  File "/usr/local/lib/python3.12/site-packages/tabpfn/validation.py", line 137, in validate_dataset_size
    _validate_num_samples_for_cpu(
  File "/usr/local/lib/python3.12/site-packages/tabpfn/validation.py", line 274, in _validate_num_samples_for_cpu
    raise RuntimeError(
RuntimeError: Running on CPU with more than 1000 samples is not allowed by default due to slow performance.
To override this behavior, set the environment variable TABPFN_ALLOW_CPU_LARGE_DATASET=1 or set ignore_pretraining_limits=True.
Alternatively, consider using a GPU or the tabpfn-client API: https://github.com/PriorLabs/tabpfn-client
```
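For reference, a minimal sketch of the two override routes named in the error message. The `TabPFNClassifier` usage is shown only in a comment, since it assumes the `tabpfn` package is installed:

```python
import os

# Route 1: environment variable, as suggested by the error message.
os.environ["TABPFN_ALLOW_CPU_LARGE_DATASET"] = "1"

# Route 2: the constructor parameter this PR adds, e.g.
#   from tabpfn import TabPFNClassifier
#   clf = TabPFNClassifier(ignore_pretraining_limits=True)
# (commented out: assumes tabpfn is installed in this environment)

print(os.environ["TABPFN_ALLOW_CPU_LARGE_DATASET"])
```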
@anuprulez anuprulez changed the title Add ignore_pretraining_limits to classifiers and regressors TabPFN: Add ignore_pretraining_limits to classifiers and regressors Apr 21, 2026
@bgruening
Owner

Are there any disadvantages here? Should that be a user option? Or an option that we toggle based on input size?

@anuprulez
Contributor Author

> Are there any disadvantages here?

It will take more time to run with > 1000 examples. I also selected the GPU option to run it, but it still produced the error `RuntimeError: Running on CPU with more than 1000 samples is not allowed by default`. I'm not sure why it was considered to be running on CPUs, since it executed on a GPU node (c64m384g8-n3801.bi.privat).
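To debug why it fell back to CPU, here is a quick check of what the process actually sees. This assumes TabPFN follows PyTorch's device visibility, which the traceback suggests:

```python
# Check whether PyTorch (and hence TabPFN) can actually see a GPU in this
# process; even on a GPU node this can resolve to "cpu" if the CUDA drivers
# or CUDA_VISIBLE_DEVICES are not set up for the job.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"  # torch not installed in this environment
print(device)
```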

> Should that be a user option? Or an option that we toggle based on input size?

Yes, we can toggle it directly in the tool's script based on the size of the user's training data:

```python
if X.shape[0] <= 1000:
    classifier = TabPFNClassifier()
else:
    classifier = TabPFNClassifier(ignore_pretraining_limits=True)
```
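That toggle could be factored into a small helper shared by the classifier and regressor paths. The `tabpfn_kwargs` name is illustrative; the 1000-sample threshold comes from the error message above:

```python
def tabpfn_kwargs(n_samples: int, cpu_limit: int = 1000) -> dict:
    """Return constructor kwargs for TabPFNClassifier / TabPFNRegressor:
    lift the pretraining limit only when the training set exceeds the
    default CPU sample limit."""
    if n_samples <= cpu_limit:
        return {}
    return {"ignore_pretraining_limits": True}

# Example usage (assumes tabpfn is installed):
#   clf = TabPFNClassifier(**tabpfn_kwargs(X.shape[0]))
print(tabpfn_kwargs(500), tabpfn_kwargs(5000))
```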

@anuprulez
Contributor Author

Can someone add the `skip-version-check` label to allow the tests to pass? I don't have the necessary permissions. Thanks a lot!

```
Linting repository /home/runner/work/galaxytools/galaxytools/tools/tabpfn
.. INFO: Included files all found.
.. INFO: No tool_dependencies.xml, skipping.
.. INFO: No tool_dependencies.xml, skipping.
.. INFO: No tool_dependencies.xml, skipping.
.. INFO: No repository_dependencies.xml, skipping.
.. INFO: .shed.yml found and appears to be valid YAML.
.. INFO: No README found, skipping.
.. INFO: No tool_dependencies.xml, skipping.
+Linting tool /home/runner/work/galaxytools/galaxytools/tools/tabpfn/tabpfn.xml
.. CHECK (TestsNoValid): 8 test(s) found.
.. INFO (OutputsNumber): 2 outputs found.
.. INFO (InputsNum): Found 9 input parameters.
.. CHECK (HelpPresent): Tool contains help section.
.. CHECK (HelpValidRST): Help contains valid reStructuredText.
.. CHECK (ToolIDValid): Tool defines an id [tabpfn].
.. CHECK (ToolNameValid): Tool defines a name [Tabular data prediction using TabPFN].
.. CHECK (ToolProfileValid): Tool specifies profile version [24.2].
.. CHECK (ToolVersionValid): Tool defines a version [7.0.0+galaxy1].
.. INFO (CommandInfo): Tool contains a command.
.. CHECK (CitationsFound): Found 1 citations.
.. INFO: URL OK https://github.com/PriorLabs/TabPFN?tab=License-1-ov-file
.. INFO: URL OK https://huggingface.co/Prior-Labs/tabpfn_2_5/blob/main/LICENSE
.. ERROR (ShedVersion): tabpfn: version 7.0.0+galaxy1 is less or equal than version of the latest installable revision 7.0.0+galaxy1
.. INFO: Found all shed metadata fields required for automated repository creation and/or updates.
```

@anuprulez anuprulez closed this Apr 22, 2026
@anuprulez anuprulez reopened this Apr 22, 2026
@bgruening bgruening merged commit 5f1f7b8 into bgruening:master Apr 22, 2026
9 of 22 checks passed
@anuprulez anuprulez deleted the patch-12 branch April 23, 2026 07:13