Skip to content

Helper function to sanitize tables#935

Merged
LucaMarconato merged 16 commits intomainfrom
bugfix/issue934-utils-function-to-sanitise-obs-columns
May 27, 2025
Merged

Helper function to sanitize tables#935
LucaMarconato merged 16 commits intomainfrom
bugfix/issue934-utils-function-to-sanitise-obs-columns

Conversation

@timtreis
Copy link
Copy Markdown
Member

No description provided.

@timtreis timtreis linked an issue May 18, 2025 that may be closed by this pull request
@codecov
Copy link
Copy Markdown

codecov bot commented May 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.14%. Comparing base (eb5a202) to head (b994960).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #935      +/-   ##
==========================================
+ Coverage   92.11%   92.14%   +0.03%     
==========================================
  Files          48       48              
  Lines        7429     7473      +44     
==========================================
+ Hits         6843     6886      +43     
- Misses        586      587       +1     
Files with missing lines Coverage Δ
src/spatialdata/__init__.py 96.55% <100.00%> (+0.12%) ⬆️
src/spatialdata/_core/_utils.py 100.00% <100.00%> (ø)
src/spatialdata/_core/validation.py 100.00% <100.00%> (ø)

... and 3 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@timtreis timtreis requested a review from Copilot May 18, 2025 16:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces helper functions to sanitize table keys in AnnData objects and updates validation error messaging to guide users in fixing naming issues.

  • Adds sanitize_name and sanitize_table in src/spatialdata/_utils.py
  • Provides comprehensive tests in tests/utils/test_sanitize.py to validate various sanitization scenarios
  • Updates error messages in src/spatialdata/_core/validation.py to include a remediation hint

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
tests/utils/test_sanitize.py Adds tests covering multiple sanitization edge cases
src/spatialdata/_utils.py Implements core sanitization logic for AnnData objects
src/spatialdata/_core/validation.py Updates ValidationError messaging with sanitization suggestion
Comments suppressed due to low confidence (1)

src/spatialdata/_core/validation.py:382

  • The error message now suggests running 'spatialdata.utils.sanitize_table(adata)'. Please verify that the module reference accurately reflects the actual location or re-export of the sanitize_table function.
raise ValidationError(

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@timtreis timtreis requested a review from LucaMarconato May 18, 2025 16:47
Copy link
Copy Markdown
Member

@LucaMarconato LucaMarconato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tim for implementing this functionality! I corrected an edge case, made the docstring more explicit (by referencing the naming restrictions #707) and added the functions to the docs.

Please give a review of my changes and feel free to merge.

@LucaMarconato
Copy link
Copy Markdown
Member

There was an edge case around the user of lstrip('_'). In particular:
-_valid is actually a valid string and

  • _index is valid when it's not a obs or var column

Also, now I sanitize __ to _ (and the same for __________, as I basically remove one _ at the time until the string doesn't start with __ anymore).


def sanitize_table(data: AnnData, inplace: bool = True) -> AnnData | None:
"""
Sanitize all keys in an AnnData table to comply with SpatialData naming rules.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should maybe have a doc on these naming rules unless we had one already. WDYT @LucaMarconato

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, at some point we will make a doc (when working on better documenting the file format). For the moment I added a link to #707.

Copy link
Copy Markdown
Collaborator

@melonora melonora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, but here and there some minor requests.

Copy link
Copy Markdown
Collaborator

@melonora melonora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After applying changes based on discussion with @LucaMarconato

@LucaMarconato
Copy link
Copy Markdown
Member

Thanks great fixes, I agree with all the changes.

@LucaMarconato LucaMarconato merged commit 872f8c2 into main May 27, 2025
7 checks passed
@LucaMarconato LucaMarconato deleted the bugfix/issue934-utils-function-to-sanitise-obs-columns branch May 27, 2025 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Utils function to sanitise obs columns

4 participants