TabPFN: Add ignore_pretraining_limits to classifiers and regressors#1847
Merged
bgruening merged 4 commits into bgruening:master from Apr 22, 2026
Conversation
The parameter `ignore_pretraining_limits=True` removes the limit of using only 1000 samples for training. PriorLabs/TabPFN#169 https://huggingface.co/Prior-Labs/TabPFN-v2-reg/discussions/2

Currently, the tool fails when the training size exceeds 1000 samples:

```
Traceback (most recent call last):
  File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/tabpfn/ed78e1448387/tabpfn/main.py", line 167, in <module>
    train_evaluate(args)
  File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/bgruening/tabpfn/ed78e1448387/tabpfn/main.py", line 98, in train_evaluate
    classifier.fit(tr_features, tr_labels)
  File "/usr/local/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn_common_utils/telemetry/core/decorators.py", line 321, in wrapper
    return _safe_call_with_telemetry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn_common_utils/telemetry/core/decorators.py", line 365, in _safe_call_with_telemetry
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn/classifier.py", line 758, in fit
    ensemble_configs, X, y = self._initialize_dataset_preprocessing(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn/classifier.py", line 642, in _initialize_dataset_preprocessing
    X, y, feature_names, n_features, original_y_name = ensure_compatible_fit_inputs(
                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tabpfn/validation.py", line 75, in ensure_compatible_fit_inputs
    validate_dataset_size(
  File "/usr/local/lib/python3.12/site-packages/tabpfn/validation.py", line 137, in validate_dataset_size
    _validate_num_samples_for_cpu(
  File "/usr/local/lib/python3.12/site-packages/tabpfn/validation.py", line 274, in _validate_num_samples_for_cpu
    raise RuntimeError(
RuntimeError: Running on CPU with more than 1000 samples is not allowed by default due to slow performance. To override this behavior, set the environment variable TABPFN_ALLOW_CPU_LARGE_DATASET=1 or set ignore_pretraining_limits=True. Alternatively, consider using a GPU or the tabpfn-client API: https://github.com/PriorLabs/tabpfn-client
```
Owner

Are there any disadvantages here? Should that be a user option, or an option that we toggle based on input size?
Contributor
Author

It will take more time to run if there are > 1000 examples. So I selected the GPU option to run, but it still produced the error.
Yes, we can add it directly in the tool based on the user's training data (toggle in the script).
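A minimal sketch of the size-based toggle suggested above. The helper name `build_model_kwargs` and the threshold constant are hypothetical; the real TabPFN constructors (`TabPFNClassifier`/`TabPFNRegressor`) do accept `ignore_pretraining_limits`, per the error message.

```python
MAX_PRETRAIN_SAMPLES = 1000  # TabPFN's default CPU sample limit (from the error above)


def build_model_kwargs(n_train: int) -> dict:
    """Return extra constructor kwargs for TabPFNClassifier/TabPFNRegressor.

    Only sets ignore_pretraining_limits when the training set exceeds the
    default 1000-sample limit, so small datasets keep the default safety check.
    """
    kwargs = {}
    if n_train > MAX_PRETRAIN_SAMPLES:
        # Lifts the 1000-sample restriction; fitting may be slow on CPU.
        kwargs["ignore_pretraining_limits"] = True
    return kwargs


# Usage in the tool script, e.g.:
#   classifier = TabPFNClassifier(**build_model_kwargs(len(tr_features)))
```

This keeps the toggle out of the user-facing options: the tool decides automatically from the training-data size, as proposed in the discussion.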
Contributor
Author

Can someone add