feat: Cache Labels for Taxonomy #7383

sbatchelder · 2025-04-16T12:25:38Z

See issue #7382

This fix allows for the value(s) of Taxonomy annotations to be displayed by the experimental Cache Labels feature.
Prior to this PR, Cache Labels would create an all-blank column when applied to a Control Tag for a Taxonomy annotation.
With this PR, values selected in a Taxonomy annotation are correctly displayed.

The issue arose from Taxonomy annotation values being returned as a list rather than a str. For Taxonomy annotations, each selected element is stored as a list so as to capture the hierarchical path taken to that value in the taxonomy tree. In this PR, care was taken to show only the selected value, and not the parts of the path leading up to it. One or more selected values, as well as leaf and intermediate/branch values, are supported by this fix.
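As a rough illustration of the behavior described above (not the actual code in cache_labels.py), here is a minimal sketch. It assumes a Taxonomy result value is a list of paths, each path being a list of strings from root to the selected node. The function name `leaf_labels` is hypothetical.

```python
from typing import List


def leaf_labels(taxonomy_value: List[List[str]]) -> List[str]:
    """Return only the selected node of each path (its last element).

    Hypothetical helper: `taxonomy_value` is assumed to look like
    [["Animal", "Mammal", "Dog"], ["Animal", "Bird"]].
    """
    # Empty paths are skipped so that path[-1] never raises IndexError.
    return [path[-1] for path in taxonomy_value if path]


# Two selections: one leaf ("Dog") and one intermediate/branch node ("Bird").
selected = [["Animal", "Mammal", "Dog"], ["Animal", "Bird"]]
print(leaf_labels(selected))  # ['Dog', 'Bird']
```

Because a str value needs no such unpacking, only list-valued results require this step.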

This is a very simple, very small, low risk PR. It is also my first PR with Label Studio, yay!
My thanks to @makseq who pointed me towards the cache_labels.py file lines most likely responsible for the blank-taxonomy-cache-labels bug.

sbatchelder · 2025-04-17T14:31:30Z

In discussions with Max, it was determined that displaying just the leaf node of a selected taxonomy was not always desirable, as important information about taxonomic hierarchy paths can be lost.

I've made an update such that cache_labels will match the formatting schema of the given Taxonomy tag's labeling interface showFullPath and pathSeparator attributes. This was achieved by instantiating a label_studio_sdk.label_interface.LabelInterface object with config of the given project. label_studio/data_manager/actions/cache_labels.extract_labels(...) now optionally accepts this label_interface object.

Now <Taxonomy name="Type" toName="image" showFullPath="true" pathSeparator="."> leads to:

Defaults are respected, so <Taxonomy name="Type" toName="image" showFullPath="true"> becomes:

And all defaults, i.e. a basic <Taxonomy name="Type" toName="image">, becomes:

Defaults are hardcoded and come from Label Interface Taxonomy Params.
If the default taxonomy parameters for showFullPath and pathSeparator are ever updated, they will need to be manually updated here as well, as I am unaware of where default values for control tags are stored.

If the taxonomy control tag is a Custom control tag, naturally it will not have any Labeling Interface configuration to specify its formatting. In this case, showFullPath='true' and pathSeparator='/' are used. If a Labeling Interface object is not provided to extract_labels(...) (or is None) these defaults are also used. Although showFullPath='true' differs from the default for Label Interface Taxonomy Params, I think here it made more sense to err on the side of displaying/transcribing more information (the full path) to the cache_label than not.
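The formatting rules described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function and constant names are hypothetical, tag attributes are assumed to arrive as a plain dict of strings, and the tag defaults (showFullPath="false", pathSeparator="/") are assumed from the Taxonomy tag documentation.

```python
from typing import List, Mapping, Optional

# Assumed Taxonomy tag defaults, applied when an attribute is omitted.
TAG_DEFAULTS = {"showFullPath": "false", "pathSeparator": "/"}
# Fallback for Custom control tags or when no label interface is provided.
FALLBACK = {"showFullPath": "true", "pathSeparator": "/"}


def format_taxonomy(
    paths: List[List[str]], attrs: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Format each selected taxonomy path according to the tag attributes.

    `attrs=None` stands for "no Labeling Interface configuration available".
    """
    settings = FALLBACK if attrs is None else {**TAG_DEFAULTS, **attrs}
    show_full = settings["showFullPath"].lower() == "true"
    sep = settings["pathSeparator"]
    # Full path joined by the separator, or just the selected node.
    return [sep.join(p) if show_full else p[-1] for p in paths if p]


paths = [["Animal", "Mammal", "Dog"]]
print(format_taxonomy(paths, {"showFullPath": "true", "pathSeparator": "."}))
print(format_taxonomy(paths, {"showFullPath": "true"}))
print(format_taxonomy(paths, {}))
print(format_taxonomy(paths))
```

The four calls correspond to the three tag configurations above plus the Custom-tag fallback, and print `['Animal.Mammal.Dog']`, `['Animal/Mammal/Dog']`, `['Dog']`, and `['Animal/Mammal/Dog']` respectively.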

Copilot

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

label_studio/data_manager/actions/cache_labels.py:73

[nitpick] Consider renaming 'showFullPath' to 'show_full_path' and 'pathSeparator' to 'path_separator' for consistency with Python naming conventions.

showFullPath = 'true'

label_studio/data_manager/actions/cache_labels.py

sbatchelder · 2025-04-17T19:20:17Z

New commit: I've pre-loaded the label interface tags before the tasks and annotations loops. The project's label interface control tags are now accessible via a dict, which should be orders of magnitude more efficient than looking them up inside the loops.
Also corrected the "delim-" typo :)
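A minimal sketch of the pre-loading idea, using a stand-in `ControlTag` dataclass instead of the real label_studio_sdk objects so that it runs on its own. The class, the helper names, and the shape of the result dicts are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ControlTag:
    """Hypothetical stand-in for a control tag from the label interface."""
    name: str
    tag: str
    attr: Dict[str, str] = field(default_factory=dict)


def index_control_tags(tags: List[ControlTag]) -> Dict[str, ControlTag]:
    """Build the name -> tag dict once, before looping over tasks."""
    return {tag.name: tag for tag in tags}


def tag_type_for(result: dict, tags_by_name: Dict[str, ControlTag]) -> Optional[str]:
    """Constant-time lookup per result; None means 'not in the config'."""
    tag = tags_by_name.get(result["from_name"])
    return tag.tag if tag is not None else None


tags = [
    ControlTag("Type", "Taxonomy", {"showFullPath": "true"}),
    ControlTag("a", "Choices"),
]
tags_by_name = index_control_tags(tags)  # done once, outside the loops

results = [{"from_name": "Type"}, {"from_name": "unknown"}]
for result in results:
    print(result["from_name"], tag_type_for(result, tags_by_name))
```

A result whose name is missing from the dict yields None, which a caller could treat as a Custom control tag with default formatting.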

label_studio/data_manager/actions/cache_labels.py

sbatchelder · 2025-04-24T22:51:19Z

Hey @makseq ,

I tried following your steps to set up the situation you described above, but I find myself unable to remove Choices name="a" using the UI after having submitted a couple of tasks containing the name=a Choices. I get a Created annotations are incompatible with provided labeling schema, we found: 2 with from_name=a, to_name=image, type=choices error when attempting to remove the name=a Choices field; the new config does not save. Note that I am using an Image data object instead of Text, because that's what I have on hand for my project, but I'd imagine this shouldn't affect this step.

So unfortunately I am unable to test the situation you describe.

That said, I think my implementation still works as intended and I am not convinced the label interface config needs to be rechecked on-the-fly. If an annotation is not present in the label_interface_tags = {tag.name:tag for tag in label_interface.find_tags('control')} dict but is present in the annotation results, then currently my code treats the value like a Custom Control Tag and it would get included in a cache_all column with default formatting (showFullPath='true' and pathSeparator='/').

Have you tested my code against the situation you describe and found it to fail? If yes, I'd love a snapshot of the issue because I can't reproduce your scenario.
I am open to stepping through the code with you in a live call to test or discuss if you like.

Also, I'm not sure how the situation you describe above relates to rechecking the label interface config on-the-fly.
Is that still a concern you hold?

makseq · 2025-04-30T23:31:14Z

@sbatchelder yes, this situation does not occur very often and it's a bit tricky to reproduce. Maybe we can ignore it.

I am going to run pytests for it tomorrow. If they pass, we can merge it.

makseq · 2025-05-08T21:20:31Z

Could you please merge the latest develop to your branch? It should fix pytests

makseq · 2025-05-08T21:21:01Z

Also, you need to fix Linter. To do this, run:

pip install pre-commit
make fmt-all

sbatchelder · 2025-05-15T17:42:26Z

Aww I'm sad this didn't make it into the recent version release.
@makseq , thank you for the approval. What are the failing checks about?

hotfix for Taxonomy Cache Labels HumanSignal#7382

20cb972

github-actions bot added the title needs formatting label Apr 16, 2025

sbatchelder changed the title ~~[fix] for Taxonomy Cache Labels~~ fix: for Taxonomy Cache Labels Apr 16, 2025

github-actions bot added fix and removed title needs formatting labels Apr 16, 2025

sbatchelder added 2 commits April 17, 2025 09:57

taxonomy cache_labels respects showFullPath and pathSeparator control…

2d72746

…_tag attrs HumanSignal#7382

providing label_interface to extract_labels now optional

2cbc80d

makseq requested a review from Copilot April 17, 2025 18:15

Copilot AI reviewed Apr 17, 2025
