# Integration of Hugging Face Hub with Galaxy #3515

## Conversation
davelopez left a comment:
Thank you so much @anuprulez for the writeup!
It looks pretty cool to me!
Sch-Da left a comment:
Sorry, I think I am a little too late...
This is amazing, thanks for the integration @davelopez, and it is really appreciated, @anuprulez, that you are explaining it with this example - thank you so much!
I ran my first image segmentation based on this - really amazing.
I have some minor comments - not sure if you still want to take a look here.
Comment on:
> `<div align="center">`
> `<img src="7_segmented_image.png" alt="Segmented output image produced by DocLayout-YOLO" width="600"/>`
> `</div>`
Perhaps it would be beneficial to include information on how people can utilise this? I know it is not the blog post's aim, but it might clarify the use case.
Like, can you feed the coordinates to another tool to cut the image accordingly?
In that case, maybe something like this:
As an example, those output coordinates can be used in tool X to cut the image accordingly. From those, you can select only the text passages, for higher quality in optical character recognition (OCR) with Tesseract or LLM Hub.
The DocLayout tool exports the segmentation as a GeoJSON file relative to the image. If there is a tool in Galaxy that can utilize the segmentation coordinates to extract sub-images, we can extend this analysis to use OCR tools as well.
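For readers who want to inspect such a file outside Galaxy, here is a minimal sketch of reading bounding boxes from the GeoJSON with plain Python; the file name and property keys are assumptions, since the exact schema depends on the tool version:

```python
import json

# File name is an assumption; download the GeoJSON dataset from your history.
with open("doclayout_segmentation.geojson") as f:
    layout = json.load(f)

# Each detected region is a GeoJSON Feature whose geometry is a polygon
# of pixel coordinates relative to the analysed image.
for feature in layout.get("features", []):
    ring = feature["geometry"]["coordinates"][0]   # outer ring of the polygon
    xs = [pt[0] for pt in ring]
    ys = [pt[1] for pt in ring]
    bbox = (min(xs), min(ys), max(xs), max(ys))    # (x0, y0, x1, y1)
    print(feature.get("properties", {}), bbox)
```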
Apparently, this should be working. My test failed today (but likely due to the wrong initial input format). If you want, you can include a similar sentence just to get users on their way - or merge without it, and I can add one more once my tests work next week. I do not want to hold up the publication until then.
You can use 🔧 Convert coordinates to label map to convert the GeoJSON to a label map, followed by 🔧 Crop image to extract the image patches corresponding to the labeled regions (the bounding boxes of those, which are identical to the annotated image regions if they are rectangular).
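Conceptually, the two tools do something like the following NumPy sketch with hypothetical boxes; the actual Galaxy tools handle file formats and metadata for you:

```python
import numpy as np

# Hypothetical bounding boxes (x0, y0, x1, y1) in pixel coordinates,
# e.g. derived from the GeoJSON features.
boxes = [(10, 20, 200, 120), (30, 150, 220, 300)]

height, width = 874, 1149
label_map = np.zeros((height, width), dtype=np.uint16)

# "Convert coordinates to label map": paint each region with a distinct label.
for label, (x0, y0, x1, y1) in enumerate(boxes, start=1):
    label_map[y0:y1, x0:x1] = label

# "Crop image": extract the bounding box of each labeled region.
image = np.random.randint(0, 255, (height, width, 3), dtype=np.uint8)  # stand-in
patches = []
for label in range(1, len(boxes) + 1):
    ys, xs = np.nonzero(label_map == label)
    patches.append(image[ys.min():ys.max() + 1, xs.min():xs.max() + 1])
print([p.shape for p in patches])
```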
Thanks @kostrykin @pavanvidem for your suggestions. I used the Convert coordinates to label map tool to produce label maps for the following segmented image.
But the 🔧 Crop image tool fails because of an image size mismatch:
    Traceback (most recent call last):
      File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/imgteam/crop_image/d52317f0ac21/crop_image/crop_image.py", line 60, in <module>
        crop_image(
      File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/imgteam/crop_image/d52317f0ac21/crop_image/crop_image.py", line 22, in crop_image
        raise ValueError(f'Shape mismatch between image ({image.data.shape}) and label map ({labelmap.data.shape}).')
    ValueError: Shape mismatch between image ((1, 1, 1, 874, 1149, 3)) and label map ((1, 1, 1, 874, 1149, 1)).
For the Crop image tool, I am using the original image with dimensions (1, 1, 1, 874, 1149, 3). I see there is a mismatch in the number of channels. I think that if I convert the original image to grayscale so it has just 1 channel, the Crop image tool may work. I tried to use ColorToGray with CellProfiler, but it fails.
Can you suggest any other tool to convert the colored images to grayscale? Thanks!
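Outside Galaxy, the conversion being asked about would look roughly like the following sketch, assuming scikit-image is available (the file name is an assumption, and the Galaxy-side fix may of course be a different tool):

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.io import imread, imsave

rgb = imread("original_page.png")     # RGB image, shape (H, W, 3)
gray = rgb2gray(rgb)                  # float image in [0, 1], shape (H, W)
imsave("original_page_gray.png", (gray * 255).astype(np.uint8))
```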
Workflow: https://usegalaxy.eu/u/kumara/w/ocr-with-doclayout-hugging-face-and-llm-hub-1
Invocation: https://usegalaxy.eu/workflows/invocations/6fc32b7a39dc5b6e
History: https://usegalaxy.eu/u/kumara/h/ocr-doclayout-hugging-face-llm-hub
I will update the post with this workflow instead of showing each step one by one.
Based on @kostrykin's tool update, I adapted the workflow to do only the cutting task:
https://usegalaxy.eu/u/schnda/w/extract-text-passages-from-images
This version is rather slim and could explain how users can get the text extracted. I think it might be a bit clearer than the workaround?
I will suggest some changes based on this workflow. However, please feel free to proceed with what you have instead if my approach is not convincing to you.
I like the trimmed-down version. I am running it currently and will update the post accordingly. Thanks @Sch-Da!
https://usegalaxy.eu/u/schnda/h/extract-text-passages-from-images-test
This is the respective history, in case it is needed.
I updated the post using the newer version of the workflow. Thanks, @Sch-Da!
Would it be possible to use another image here? Sending a PM to explain.
Replaced the sample image in c6ea5c9
Thank you! And I see your detection also worked better than in my image, thanks a lot!
Co-authored-by: Daniela Schneider <[email protected]>
Add test case to reproduce this issue: galaxyproject/galaxy-hub#3515 (comment)
I have made the changes to have a workflow. Can you take a look, @Sch-Da @bgruening? Thanks!
Sch-Da left a comment:
Here are some suggestions in case you want to go with the slim workflow. Please use what you find necessary and discard the rest. In any case, thanks a lot for your work on this!
Comment on:
> ## Run inference in Galaxy
>
> The [workflow](https://usegalaxy.eu/u/kumara/w/ocr-with-doclayout-hugging-face-and-llm-hub-1) for text segmentation and extraction includes tools such as DocLayout-YOLO and LLM Hub. The DocLayout-YOLO tool uses the pre-trained model, supplied by Galaxy's Hugging Face integration, to detect text chunks and create bounding-boxes around them. These bounding boxes containing text chunks are extracted from the original image and eventually sent to LLM hub tool for extraction that utilises advanced LLM with OCR capabilities.
Suggested change:
> This [workflow](https://usegalaxy.eu/u/schnda/w/extract-text-passages-from-images) is an example for text segmentation and extraction. The DocLayout-YOLO tool uses the pre-trained model, supplied by Galaxy's Hugging Face integration, to detect text chunks and create bounding boxes around them. These bounding boxes containing text chunks are extracted from the original image. You could combine the output with other tools in Galaxy, such as the [LLM Hub](https://usegalaxy.eu/?tool_id=llm_hub) or [Tesseract](https://usegalaxy.eu/?tool_id=tesseract) for optical character recognition (OCR). This will make your image machine-readable.
Comment on:
> `<div align="center">`
> `<img src="7_segmented_image.png" alt="Segmented output image produced by DocLayout-YOLO" width="400"/>`
> `</div>`
Suggested change:
> To make use of this information, the workflow converts the location coordinates into a different format. In this step, [**Convert coordinates to label map**](https://usegalaxy.eu/root?tool_id=ip_points_to_label), it is important that the width and height of your input match the image you want to cut.
> You can find this information about your image by clicking on the image in your history and clicking on the "i" at the bottom to show the dataset details. Navigate to the edit tab to find your image's height and width. You can now feed this information to the **Convert coordinates to label map** tool. Use the [cropping tool](https://usegalaxy.eu/root?tool_id=ip_crop_image) to extract your images.
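If you prefer to check those dimensions locally rather than via the dataset details, a minimal sketch with Pillow (the file name is an assumption):

```python
from PIL import Image

# Width and height to enter into "Convert coordinates to label map";
# they must match the image you want to cut.
width, height = Image.open("page.png").size
print(width, height)
```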
Comment on:
> ### Configure the LLM Hub tool in the workfow
Suggested change:
> If we leave out the LLM Hub, I would suggest deleting lines 120-128.
Comment on:
> ### Workflow output
>
> The following image shows the output of the text segmentation and detection output produced by the workflow in the markdown format. Additionally, the output enlists the **thinking process** of the associated imaging LLM before producing the text from the bounding regions. The [workflow invocation](https://usegalaxy.eu/workflows/invocations/6fc32b7a39dc5b6e) provides more details.
Suggested change:
> The following image shows the output of the text segmentation and detection produced by the workflow. Depending on the input image, we get several separate images. Those can now be used with other tools, like Tesseract or LLM Hub, for example.
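To illustrate that downstream OCR step outside Galaxy, a minimal sketch assuming the tesseract binary and the pytesseract package are installed (file names are assumptions; the Galaxy tools wrap this for you):

```python
from PIL import Image
import pytesseract  # requires the tesseract binary on PATH

# Run OCR on each cropped text patch produced by the workflow.
for path in ["patch_1.png", "patch_2.png"]:
    text = pytesseract.image_to_string(Image.open(path))
    print(f"--- {path} ---\n{text}")
```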
I have addressed these comments.
Thanks a lot, @anuprulez!

Can we merge it if it looks good to you, @bgruening? Thanks!

Nice, thanks a lot!
ping @bgruening @davelopez