fix(skore): Convert DataFrame column names to strings #2034

waridrox · 2025-09-15T07:25:56Z

Closes #2029
CC: @thomass-dev

While debugging, I traced the problem to this block in data_accessor.py:

skore/skore/src/skore/_sklearn/_estimator/data_accessor.py

Lines 49 to 67 in eb0d6e9

    if X is None:
        raise ValueError(err_msg.format(f"X_{dataset}", data_source))
    elif not sbd.is_dataframe(X):
        X = pd.DataFrame(X, columns=[f"Feature {i}" for i in range(X.shape[1])])

    if with_y:
        if y is None:
            raise ValueError(err_msg.format(f"y_{dataset}", data_source))

        if isinstance(y, pd.Series) and y.name is not None:
            y = y.to_frame()
        elif not sbd.is_dataframe(y):
            if y.ndim == 1:
                columns = ["Target"]
            else:
                columns = [f"Target {i}" for i in range(y.shape[1])]
            y = pd.DataFrame(y, columns=columns)

    return X, y

This method only converts NumPy arrays to DataFrames with string column names. It does not handle inputs that are already DataFrames but have integer column names.

The data is then passed down to the skrub functions, which expect string column names. The final error, TypeError: cannot use a string pattern on a bytes-like object, occurs because suggested_name is an integer taken from the RangeIndex columns.

def _get_new_name(suggested_name, forbidden_names):
    """Get a new name for a column."""
    # .......    
    tags = re.findall(tag_pattern, suggested_name)  # <==== 
    # .......

tag_pattern is a string regex pattern, and re.findall(tag_pattern, suggested_name) expects both arguments to be strings or both to be bytes-like objects.
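To make the failure mode concrete, here is a minimal sketch using only the standard library. TAG_PATTERN is a hypothetical stand-in, since the real skrub pattern is not shown in the excerpt above, and find_tags is an illustrative helper rather than skrub's function. The exact TypeError message depends on the type of the non-string value, so the sketch only checks that a TypeError is raised.

```python
import re

# Hypothetical stand-in for skrub's tag_pattern (the real pattern is elided above).
TAG_PATTERN = r"__skrub_[0-9a-f]+__"


def find_tags(suggested_name):
    """Illustrative helper: look for tag-like substrings in a column name."""
    return re.findall(TAG_PATTERN, suggested_name)


def raises_type_error(name):
    """Return True if looking up tags in `name` raises a TypeError."""
    try:
        find_tags(name)
    except TypeError:
        return True
    return False


# An integer column label, as produced by a default RangeIndex, is rejected by re.
int_name_fails = raises_type_error(0)

# The same label converted to a string is accepted.
str_name_fails = raises_type_error(str(0))

# A string name that contains a tag is matched as expected.
tags = find_tags("x__skrub_1a2b__")
```

Converting the label with str() before it reaches the regex call is enough to avoid the error in this sketch.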

I worked around this by ensuring that an input that is already a DataFrame gets string column names before it is passed down from data_accessor.py, which avoids the later failure in skrub.
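The idea can be sketched as follows. This is an illustration under stated assumptions, not the code merged in this PR: ensure_string_columns is a hypothetical helper name, and the example assumes pandas is installed.

```python
import pandas as pd


def ensure_string_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `df` whose column labels are all str."""
    out = df.copy()
    # Convert every label, so integer labels from a RangeIndex become "0", "1", ...
    out.columns = [str(col) for col in out.columns]
    return out


# A DataFrame built without explicit column names gets integer labels.
X = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])
X_fixed = ensure_string_columns(X)
```

Working on a copy leaves the caller's DataFrame untouched, which seems preferable for an accessor that only reads the data.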

Alternatively, should I simply raise an exception if the computation fails in a subsequent step instead?

thomass-dev · 2025-09-15T07:32:03Z

Thanks for your investigation and your contribution. Please add a dedicated test.

skore/tests/unit/reports/estimator/data/test_accessor.py

auguste-probabl

Thanks for this!

Simplified the tests a bit.

github-actions · 2025-10-07T14:00:10Z

Coverage Report for skore/

File	Stmts	Miss	Cover	Missing
skore/src/skore
__init__.py	23	0	100%
_config.py	31	0	100%
exceptions.py	4	4	0%	4, 15, 19, 23
skore/src/skore/_sklearn
__init__.py	6	0	100%
_base.py	198	14	92%	45, 58, 127, 130, 183, 186–187, 189–192, 225, 228–229
find_ml_task.py	61	0	100%
types.py	27	1	96%	28
skore/src/skore/_sklearn/_comparison
__init__.py	7	0	100%
feature_importance_accessor.py	39	2	94%	90, 109
metrics_accessor.py	178	3	98%	173, 253, 1215
report.py	107	0	100%
utils.py	54	0	100%
skore/src/skore/_sklearn/_cross_validation
__init__.py	9	0	100%
data_accessor.py	45	3	93%	128, 131, 134
feature_importance_accessor.py	24	0	100%
metrics_accessor.py	182	1	99%	244
report.py	135	1	99%	487
skore/src/skore/_sklearn/_estimator
__init__.py	9	0	100%
data_accessor.py	66	1	98%	76
feature_importance_accessor.py	144	2	98%	223–224
metrics_accessor.py	356	8	97%	200, 202, 209, 300, 369, 373, 388, 423
report.py	167	2	98%	448–449
skore/src/skore/_sklearn/_plot
__init__.py	3	0	100%
base.py	102	7	93%	61–62, 200, 224–226, 230
utils.py	77	0	100%
skore/src/skore/_sklearn/_plot/data
__init__.py	2	0	100%
table_report.py	185	1	99%	706
skore/src/skore/_sklearn/_plot/metrics
__init__.py	6	0	100%
confusion_matrix.py	70	4	94%	92, 100, 122, 230
feature_importance_display.py	67	21	68%	88, 121–122, 124, 142–146, 148–155, 158–160, 162
metrics_summary_display.py	8	0	100%
precision_recall_curve.py	280	5	98%	455, 555, 559, 619, 751
prediction_error.py	227	5	97%	179, 186, 422, 505, 705
roc_curve.py	292	8	97%	385, 508, 513, 614, 619, 623, 692, 832
skore/src/skore/_sklearn/train_test_split
__init__.py	0	0	100%
train_test_split.py	58	0	100%
skore/src/skore/_sklearn/train_test_split/warning
__init__.py	8	0	100%
high_class_imbalance_too_few_examples_warning.py	19	1	94%	83
high_class_imbalance_warning.py	20	0	100%
random_state_unset_warning.py	10	0	100%
shuffle_true_warning.py	9	0	100%
stratify_is_set_warning.py	10	0	100%
time_based_column_warning.py	21	0	100%
train_test_split_warning.py	3	0	100%
skore/src/skore/_utils
__init__.py	6	2	66%	8, 13
_accessor.py	90	3	96%	34, 146, 190
_environment.py	27	0	100%
_fixes.py	8	0	100%
_index.py	5	0	100%
_logger.py	22	4	81%	15–17, 19
_measure_time.py	10	0	100%
_parallel.py	38	3	92%	23, 33, 124
_patch.py	13	5	61%	21, 23–24, 35, 37
_progress_bar.py	46	0	100%
_repr_html.py	8	0	100%
_show_versions.py	38	0	100%
_testing.py	55	0	100%
skore/src/skore/project
__init__.py	2	0	100%
project.py	48	0	100%
summary.py	75	1	98%	120
widget.py	165	6	96%	436, 439–441, 525–526
TOTAL	4005	118	97%

Tests	Skipped	Failures	Errors	Time
1074	5 💤	0 ❌	0 🔥	4m 21s ⏱️

github-actions · 2025-10-07T14:05:03Z

Documentation preview @ b33ad33

fix: convert DataFrame column names to string type

eca2659

github-actions bot assigned waridrox Sep 15, 2025

test: Added tests to assert different column types in DataFrame

c2c8de7

waridrox commented Sep 16, 2025

skore/tests/unit/reports/estimator/data/test_accessor.py (outdated, resolved)

waridrox commented Sep 16, 2025

skore/tests/unit/reports/estimator/data/test_accessor.py (outdated, resolved)

thomass-dev mentioned this pull request Sep 22, 2025

fix(skore): Data analyze fails when called with multi class dataset #2029

Closed

auguste-probabl mentioned this pull request Sep 26, 2025

fix(data_accessor): Correct behaviour when data are DataFrames without column names #2052

Closed

test: Simplify tests

b33ad33

auguste-probabl approved these changes Oct 7, 2025


auguste-probabl changed the title ~~fix(skore): convert DataFrame column names to string type~~ fix(skore): Convert DataFrame column names to strings Oct 7, 2025

auguste-probabl enabled auto-merge October 7, 2025 13:57

auguste-probabl added this pull request to the merge queue Oct 7, 2025

Merged via the queue into probabl-ai:main with commit 6e7292b Oct 7, 2025
32 of 33 checks passed
