fix(skore): Convert DataFrame column names to strings #2034
Merged: auguste-probabl merged 3 commits into `probabl-ai:main` from `waridrox:handleMultiClassDataset` on Oct 7, 2025.
Collaborator: Thanks for your investigation and your contribution. Please add a dedicated test.
waridrox commented on Sep 16, 2025
auguste-probabl (Contributor) approved these changes on Oct 7, 2025 and left a comment:
Thanks for this!
Simplified the tests a bit.
Closes #2029
CC: @thomass-dev
When debugging, I looked at the relevant method in `data_accessor.py` (`skore/skore/src/skore/_sklearn/_estimator/data_accessor.py`, lines 49 to 67 in eb0d6e9).
This method only converts NumPy arrays to DataFrames with string column names; it does not handle the case where the input is already a DataFrame but has integer column names.
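One way to close this gap is to coerce every column label to a string regardless of the input type. The following is a minimal sketch assuming pandas/NumPy input. `to_frame_with_str_columns` is a hypothetical helper, not the function in `data_accessor.py`, and the `"0"`, `"1"`, ... labels for arrays are an assumption (the actual naming scheme may differ).

```python
import numpy as np
import pandas as pd


def to_frame_with_str_columns(X):
    """Return a DataFrame whose column labels are all strings (hypothetical helper)."""
    if isinstance(X, np.ndarray):
        # Assumption: arrays get positional labels, stringified below to "0", "1", ...
        X = pd.DataFrame(X)
    elif not isinstance(X, pd.DataFrame):
        raise TypeError(
            f"expected a NumPy array or pandas DataFrame, got {type(X).__name__}"
        )
    # rename(columns=str) returns a new DataFrame, so the caller's object keeps
    # its original labels; integer RangeIndex labels 0, 1, ... become "0", "1", ...
    return X.rename(columns=str)


df = pd.DataFrame([[1, 2], [3, 4]])  # default RangeIndex columns: 0, 1
print(list(to_frame_with_str_columns(df).columns))  # ['0', '1']
print(list(df.columns))  # [0, 1]: the input is left untouched
```

Using `rename` rather than assigning to `X.columns` avoids mutating a DataFrame the user passed in, which matters if the same object is reused elsewhere.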
These integer column names are then passed down to the skrub functions, which expect strings, and the call ultimately fails with `TypeError: cannot use a string pattern on a bytes-like object`. The error occurs because `suggested_name` is an integer taken from the `RangeIndex` columns, while `tag_pattern` is a string regex pattern, and `re.findall(tag_pattern, suggested_name)` expects both arguments to be strings or both to be bytes-like objects.

I tried to fix this by ensuring that, when the input is already a DataFrame, its column names are converted to strings before it is passed down from `data_accessor.py`, which avoids the issue in skrub later on.

Alternatively, should I simply raise an exception if the computation fails in subsequent steps instead?
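The root cause can be reproduced with the standard library alone. In this sketch, `tag_pattern` is a hypothetical stand-in for skrub's internal pattern (the real regex is not reproduced here) and `find_tags` is a hypothetical wrapper. With a plain Python `int` the `TypeError` message reads "expected string or bytes-like object" rather than the bytes-like message quoted above, but in both cases the remedy is to pass a `str`.

```python
import re

# Hypothetical pattern standing in for skrub's ``tag_pattern``.
tag_pattern = r"<([^>]+)>"


def find_tags(suggested_name):
    """Apply a *string* regex to a column label, as the failing code path does."""
    return re.findall(tag_pattern, suggested_name)


# A label taken from a RangeIndex is an integer, not a string.
try:
    find_tags(0)
except TypeError as exc:
    print(f"TypeError: {exc}")

# Converting the label to a string first makes the call valid.
print(find_tags(str(0)))  # []
print(find_tags("<cat>price"))  # ['cat']
```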