
Conversation

@davelopez (Contributor) left a comment:

Thank you so much @anuprulez for the writeup!
It looks pretty cool to me!

@Sch-Da (Collaborator) left a comment:

Sorry, I think I am a little too late...

This is amazing - thanks for the integration @davelopez, and it is really appreciated, @anuprulez, that you are explaining it with this example. Thank you so much!
I ran my first image segmentation based on this - really amazing.
I had a few minor comments - not sure whether you still want to take a look here.

<div align="center">
<img src="7_segmented_image.png" alt="Segmented output image produced by DocLayout-YOLO" width="600"/>
</div>

Collaborator:

Perhaps it would be beneficial to include information on how people can utilise this? I know it is not the blog post's aim, but it might clarify the use case.

Like, can you feed the coordinates to another tool to cut the image accordingly?
In that case, maybe something like this:

As an example, those output coordinates can be used in tool x to cut the image accordingly. From those, you can select only the text passages for higher-quality optical character recognition (OCR) with Tesseract or LLMHub.

@anuprulez (Member Author):

The DocLayout tool exports the segmentation as a GeoJSON file with coordinates relative to the image. If there is a tool in Galaxy that can use these segmentation coordinates to extract sub-images, we can extend this analysis to use OCR tools as well.
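
For illustration, here is a minimal Python sketch of reading such a GeoJSON export and recovering pixel bounding boxes. The feature layout assumed here (Polygon geometries whose exterior ring is a list of [x, y] vertices in image coordinates) is an assumption for the example, not the tool's documented schema:

```python
# Sketch: derive pixel bounding boxes from a GeoJSON segmentation export.
# The "Polygon" feature layout below is an assumed structure, not a documented schema.
import json

def bounding_boxes(geojson_path):
    """Yield (x_min, y_min, x_max, y_max) for each polygon feature."""
    with open(geojson_path) as fh:
        collection = json.load(fh)
    for feature in collection.get("features", []):
        geometry = feature.get("geometry", {})
        if geometry.get("type") != "Polygon":
            continue
        ring = geometry["coordinates"][0]  # exterior ring: [[x, y], ...]
        xs = [x for x, y in ring]
        ys = [y for x, y in ring]
        yield min(xs), min(ys), max(xs), max(ys)

for box in bounding_boxes("segmentation.geojson"):  # placeholder filename
    print(box)
```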

Collaborator:

Apparently, this should work. My test failed today (likely due to a wrong initial input format). If you want, you can include a similar sentence just to get users on their way - or merge without it, and I can add one once my tests work next week. I do not want to hold up the publication until then.

@kostrykin (Collaborator), Jan 8, 2026:

You can use 🔧 Convert coordinates to label map to convert the GeoJSON into a label map, followed by 🔧 Crop image to extract the image patches corresponding to the labeled regions (their bounding boxes, which are identical to the annotated regions if those are rectangular).
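
For intuition, here is a rough numpy sketch of what these two steps amount to, assuming rectangular regions given as (x_min, y_min, x_max, y_max) pixel boxes; it illustrates the idea, not the tools' actual implementation:

```python
# Sketch: rasterize boxes into a label map, then crop each labeled region.
import numpy as np

def to_label_map(boxes, height, width):
    """Label map: 0 = background, i = the i-th region (later boxes overwrite overlaps)."""
    labelmap = np.zeros((height, width), dtype=np.uint16)
    for label, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        labelmap[int(y0):int(y1), int(x0):int(x1)] = label
    return labelmap

def crop_regions(image, labelmap):
    """Yield the bounding-box patch of each labeled region."""
    for label in np.unique(labelmap):
        if label == 0:
            continue
        ys, xs = np.nonzero(labelmap == label)
        yield image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```

For rectangular regions, the cropped patch is exactly the annotated box, which matches the behavior described above.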

@anuprulez (Member Author), Jan 9, 2026:

Thanks @kostrykin and @pavanvidem for your suggestions. I used the Convert coordinates to label map tool to produce label maps for the following segmented image:

[Screenshot: segmented image after Convert coordinates to label map]

But the 🔧 Crop image tool fails because of a shape mismatch between the image and the label map:

Traceback (most recent call last):
  File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/imgteam/crop_image/d52317f0ac21/crop_image/crop_image.py", line 60, in <module>
    crop_image(
  File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/imgteam/crop_image/d52317f0ac21/crop_image/crop_image.py", line 22, in crop_image
    raise ValueError(f'Shape mismatch between image ({image.data.shape}) and label map ({labelmap.data.shape}).')
ValueError: Shape mismatch between image ((1, 1, 1, 874, 1149, 3)) and label map ((1, 1, 1, 874, 1149, 1)).

For the Crop image tool, I am using the original image with dimensions (1, 1, 1, 874, 1149, 3). I see there is a mismatch in the number of channels. I think if I convert the original image to grayscale so that it has just 1 channel, the Crop image tool may work. I tried to use ColorToGray with CellProfiler, but it fails.

Is there any other tool you can suggest for converting the colored images to grayscale? Thanks!
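
For reference, outside Galaxy the channel reduction itself is a one-liner with Pillow; a minimal sketch, with page.png as a placeholder filename:

```python
# Sketch: collapse an RGB image to a single grayscale channel with Pillow.
from PIL import Image

Image.open("page.png").convert("L").save("page_gray.png")  # "L" = 8-bit grayscale
```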

@Sch-Da (Collaborator):

Based on @kostrykin's tool update, I adapted the workflow to only do the cutting task:
https://usegalaxy.eu/u/schnda/w/extract-text-passages-from-images
This would be rather slim and could explain how users can get the text extracted. I think it might be a bit clearer than the workaround. I will suggest some changes based on this workflow. However, please feel free to proceed with what you have instead if my approach is not convincing to you.

@anuprulez (Member Author):

I like the trimmed-down version. I am running it currently and will update the post accordingly. Thanks @Sch-Da!

@Sch-Da (Collaborator):

This is the respective history, in case it is needed: https://usegalaxy.eu/u/schnda/h/extract-text-passages-from-images-test

@anuprulez (Member Author):

I updated the post using the newer version of the workflow. Thanks @Sch-Da!

Collaborator:

Would it be possible to use another image here? Sending a PM to explain.

@anuprulez (Member Author):

Replaced the sample image in c6ea5c9.

Collaborator:

Thank you! And I see your detection also worked better than on my image - thanks a lot!

kostrykin added a commit to kostrykin/galaxy-image-analysis that referenced this pull request Jan 9, 2026

@anuprulez (Member Author):

I have made the changes to use a workflow. Can you take a look, @Sch-Da @bgruening? Thanks!

@Sch-Da (Collaborator) left a comment:

Here are some suggestions in case you want to go with the slim workflow. Please use what you find necessary and discard the rest - in any case, thanks a lot for your work on this!


## Run inference in Galaxy

The [workflow](https://usegalaxy.eu/u/kumara/w/ocr-with-doclayout-hugging-face-and-llm-hub-1) for text segmentation and extraction includes tools such as DocLayout-YOLO and LLM Hub. The DocLayout-YOLO tool uses the pre-trained model, supplied by Galaxy's Hugging Face integration, to detect text chunks and create bounding-boxes around them. These bounding boxes containing text chunks are extracted from the original image and eventually sent to LLM hub tool for extraction that utilises advanced LLM with OCR capabilities.

@Sch-Da (Collaborator):

Suggested change
The [workflow](https://usegalaxy.eu/u/kumara/w/ocr-with-doclayout-hugging-face-and-llm-hub-1) for text segmentation and extraction includes tools such as DocLayout-YOLO and LLM Hub. The DocLayout-YOLO tool uses the pre-trained model, supplied by Galaxy's Hugging Face integration, to detect text chunks and create bounding-boxes around them. These bounding boxes containing text chunks are extracted from the original image and eventually sent to LLM hub tool for extraction that utilises advanced LLM with OCR capabilities.
This [workflow](https://usegalaxy.eu/u/schnda/w/extract-text-passages-from-images) is an example of text segmentation and extraction. The DocLayout-YOLO tool uses the pre-trained model, supplied by Galaxy's Hugging Face integration, to detect text chunks and create bounding boxes around them. These bounding boxes containing text chunks are extracted from the original image. You could combine the output with other tools in Galaxy, such as the [LLM Hub](https://usegalaxy.eu/?tool_id=llm_hub) or [Tesseract](https://usegalaxy.eu/?tool_id=tesseract) for optical character recognition (OCR). This will make your image machine-readable.
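
As an aside, the OCR step on one of the extracted patches can be tried locally with the pytesseract bindings; a minimal sketch, assuming Tesseract is installed and with patch_001.png as a placeholder filename:

```python
# Sketch: run Tesseract OCR on a single cropped text patch.
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("patch_001.png"))
print(text)
```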

<div align="center">
<img src="7_segmented_image.png" alt="Segmented output image produced by DocLayout-YOLO" width="400"/>
</div>

@Sch-Da (Collaborator):

Suggested change
To make use of this information, the workflow converts the location coordinates into a different format. In this step, [**Convert coordinates to label map**](https://usegalaxy.eu/root?tool_id=ip_points_to_label) it is important that the width and height of your input match the image you want to cut.
You can find this information about your image by clicking on the image in your history and clicking on the "i" at the bottom to show the dataset details. Navigate to the edit tab to find your image's height and width. You can now feed this information to the **Convert coordinates to label map** tool. Use the [cropping tool](https://usegalaxy.eu/root?tool_id=ip_crop_image) to extract your images.
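
For a quick local sanity check of the width and height you enter, a minimal Pillow sketch (page.png is a placeholder filename):

```python
# Sketch: read an image's width and height so they match the tool inputs.
from PIL import Image

width, height = Image.open("page.png").size
print(f"width={width}, height={height}")
```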

<div align="center">
<img src="7_segmented_image.png" alt="Segmented output image produced by DocLayout-YOLO" width="400"/>
</div>

### Configure the LLM Hub tool in the workflow

@Sch-Da (Collaborator):

Suggested change
### Configure the LLM Hub tool in the workflow
If we leave out the LLM Hub, I would suggest deleting lines 120-128.


### Workflow output

The following image shows the output of the text segmentation and detection output produced by the workflow in the markdown format. Additionally, the output enlists the **thinking process** of the associated imaging LLM before producing the text from the bounding regions. The [workflow invocation](https://usegalaxy.eu/workflows/invocations/6fc32b7a39dc5b6e) provides more details.

@Sch-Da (Collaborator):

Suggested change
The following image shows the output of the text segmentation and detection output produced by the workflow in the markdown format. Additionally, the output enlists the **thinking process** of the associated imaging LLM before producing the text from the bounding regions. The [workflow invocation](https://usegalaxy.eu/workflows/invocations/6fc32b7a39dc5b6e) provides more details.
The following image shows the output of the text segmentation and detection produced by the workflow. Depending on the input image, we get several separate images. Those can now be used with other tools, such as Tesseract or LLM Hub.

@anuprulez (Member Author):

I have addressed these comments.

@Sch-Da (Collaborator) commented Jan 12, 2026:

Thanks a lot, @anuprulez!

@anuprulez (Member Author):

Can we merge it if it looks good to you, @bgruening? Thanks!

@bgruening merged commit 31c6cad into galaxyproject:master on Jan 12, 2026. 3 checks passed.

@bgruening (Member):

Nice, thanks a lot!

@anuprulez deleted the huggingface_post branch on January 12, 2026 at 15:59.