# Integration of Hugging Face Hub with Galaxy #3515

## Conversation
davelopez left a comment:
Thank you so much @anuprulez for the writeup!
It looks pretty cool to me!
Sch-Da left a comment:
Sorry, I think I am a little too late...
This is amazing, thanks for the integration @davelopez, and it is really appreciated, @anuprulez, that you are explaining it with this example - thank you so much!
I ran my first image segmentation based on this - really amazing.
I have some minor comments - not sure if you still want to take a look here.
Comment on:
> `<div align="center">`
> `<img src="7_segmented_image.png" alt="Segmented output image produced by DocLayout-YOLO" width="600"/>`
> `</div>`
Perhaps it would be beneficial to include information on how people can utilise this? I know it is not the blog post's aim, but it might clarify the use case.
Like, can you feed the coordinates to another tool to cut the image accordingly?
In that case, maybe something like this:
As an example, those output coordinates can be used in tool X to cut the image accordingly. From those, you can select only the text passages, for higher quality in optical character recognition (OCR) with Tesseract or LLM Hub.
The DocLayout tool exports the segmentation as a GeoJSON file relative to the image. If there is a tool in Galaxy that can utilize the segmentation coordinates to extract sub-images, we can extend this analysis to use OCR tools as well.
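For readers who want to inspect such a file outside Galaxy, here is a minimal sketch of reading bounding boxes from the GeoJSON with plain Python; the file name and property keys are assumptions, since the exact schema depends on the tool version:

```python
import json

# File name is an assumption; download the GeoJSON dataset from your history.
with open("doclayout_segmentation.geojson") as f:
    layout = json.load(f)

# Each detected region is a GeoJSON Feature whose geometry is a polygon
# of pixel coordinates relative to the analysed image.
for feature in layout.get("features", []):
    ring = feature["geometry"]["coordinates"][0]   # outer ring of the polygon
    xs = [pt[0] for pt in ring]
    ys = [pt[1] for pt in ring]
    bbox = (min(xs), min(ys), max(xs), max(ys))    # (x0, y0, x1, y1)
    print(feature.get("properties", {}), bbox)
```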
Apparently, this should be working. My test failed today (but likely due to the wrong initial input format). If you want, you can include a similar sentence just to get users on their way - or merge without it, and I can add one more once my tests work next week. I do not want to hold up the publication until then.
You can use 🔧 Convert coordinates to label map to convert the GeoJSON to a label map, followed by 🔧 Crop image to extract the image patches corresponding to the labeled regions (the bounding boxes of those, which are identical to the annotated image regions if they are rectangular).
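Conceptually, the two tools do something like the following NumPy sketch with hypothetical boxes; the actual Galaxy tools handle file formats and metadata for you:

```python
import numpy as np

# Hypothetical bounding boxes (x0, y0, x1, y1) in pixel coordinates,
# e.g. derived from the GeoJSON features.
boxes = [(10, 20, 200, 120), (30, 150, 220, 300)]

height, width = 874, 1149
label_map = np.zeros((height, width), dtype=np.uint16)

# "Convert coordinates to label map": paint each region with a distinct label.
for label, (x0, y0, x1, y1) in enumerate(boxes, start=1):
    label_map[y0:y1, x0:x1] = label

# "Crop image": extract the bounding box of each labeled region.
image = np.random.randint(0, 255, (height, width, 3), dtype=np.uint8)  # stand-in
patches = []
for label in range(1, len(boxes) + 1):
    ys, xs = np.nonzero(label_map == label)
    patches.append(image[ys.min():ys.max() + 1, xs.min():xs.max() + 1])
print([p.shape for p in patches])
```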
Thanks @kostrykin @pavanvidem for your suggestions. I used the Convert coordinates to label map tool to produce label maps for the following segmented image.
But the 🔧 Crop image tool fails because of an image size mismatch:
    Traceback (most recent call last):
      File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/imgteam/crop_image/d52317f0ac21/crop_image/crop_image.py", line 60, in <module>
        crop_image(
      File "/opt/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/imgteam/crop_image/d52317f0ac21/crop_image/crop_image.py", line 22, in crop_image
        raise ValueError(f'Shape mismatch between image ({image.data.shape}) and label map ({labelmap.data.shape}).')
    ValueError: Shape mismatch between image ((1, 1, 1, 874, 1149, 3)) and label map ((1, 1, 1, 874, 1149, 1)).
For the Crop image tool, I am using the original image with dimensions (1, 1, 1, 874, 1149, 3). I see there is a mismatch in the number of channels. I think that if I convert the original image to grayscale so it has just 1 channel, the Crop image tool may work. I tried to use ColorToGray with CellProfiler, but it fails.
Can you suggest any other tool to convert the colored images to grayscale? Thanks!
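Outside Galaxy, the conversion being asked about would look roughly like the following sketch, assuming scikit-image is available (the file name is an assumption, and the Galaxy-side fix may of course be a different tool):

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.io import imread, imsave

rgb = imread("original_page.png")     # RGB image, shape (H, W, 3)
gray = rgb2gray(rgb)                  # float image in [0, 1], shape (H, W)
imsave("original_page_gray.png", (gray * 255).astype(np.uint8))
```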
Workflow: https://usegalaxy.eu/u/kumara/w/ocr-with-doclayout-hugging-face-and-llm-hub-1
Invocation: https://usegalaxy.eu/workflows/invocations/6fc32b7a39dc5b6e
History: https://usegalaxy.eu/u/kumara/h/ocr-doclayout-hugging-face-llm-hub
I will update the post with this workflow instead of showing each step one by one.
Based on @kostrykin's tool update, I adapted the workflow to do only the cutting task:
https://usegalaxy.eu/u/schnda/w/extract-text-passages-from-images
This version is rather slim and could explain how users can get the text extracted. I think it might be a bit clearer than the workaround?
I will suggest some changes based on this workflow. However, please feel free to proceed with what you have instead if my approach is not convincing to you.
I like the trimmed-down version. I am running it currently and will update the post accordingly. Thanks @Sch-Da!
https://usegalaxy.eu/u/schnda/h/extract-text-passages-from-images-test
This is the respective history, in case it is needed.
I updated the post using the newer version of the workflow. Thanks, @Sch-Da!
Would it be possible to use another image here? Sending a PM to explain.
Replaced the sample image in c6ea5c9
Thank you! And I see your detection also worked better than in my image, thanks a lot!
Co-authored-by: Daniela Schneider <[email protected]>
Add test case to reproduce this issue: galaxyproject/galaxy-hub#3515 (comment)
I have made the changes to have a workflow. Can you take a look, @Sch-Da @bgruening? Thanks!
Sch-Da left a comment:
Here are some suggestions in case you want to go with the slim workflow. Please use what you find necessary and discard the rest. In any case, thanks a lot for your work on this!
Comment on:
> ## Run inference in Galaxy
>
> The [workflow](https://usegalaxy.eu/u/kumara/w/ocr-with-doclayout-hugging-face-and-llm-hub-1) for text segmentation and extraction includes tools such as DocLayout-YOLO and LLM Hub. The DocLayout-YOLO tool uses the pre-trained model, supplied by Galaxy's Hugging Face integration, to detect text chunks and create bounding-boxes around them. These bounding boxes containing text chunks are extracted from the original image and eventually sent to LLM hub tool for extraction that utilises advanced LLM with OCR capabilities.
Suggested change:
> This [workflow](https://usegalaxy.eu/u/schnda/w/extract-text-passages-from-images) is an example for text segmentation and extraction. The DocLayout-YOLO tool uses the pre-trained model, supplied by Galaxy's Hugging Face integration, to detect text chunks and create bounding boxes around them. These bounding boxes containing text chunks are extracted from the original image. You could combine the output with other tools in Galaxy, such as the [LLM Hub](https://usegalaxy.eu/?tool_id=llm_hub) or [Tesseract](https://usegalaxy.eu/?tool_id=tesseract) for optical character recognition (OCR). This will make your image machine-readable.
Comment on:
> `<div align="center">`
> `<img src="7_segmented_image.png" alt="Segmented output image produced by DocLayout-YOLO" width="400"/>`
> `</div>`
Suggested change:
> To make use of this information, the workflow converts the location coordinates into a different format. In this step, [**Convert coordinates to label map**](https://usegalaxy.eu/root?tool_id=ip_points_to_label), it is important that the width and height of your input match the image you want to cut.
> You can find this information about your image by clicking on the image in your history and clicking on the "i" at the bottom to show the dataset details. Navigate to the edit tab to find your image's height and width. You can now feed this information to the **Convert coordinates to label map** tool. Use the [cropping tool](https://usegalaxy.eu/root?tool_id=ip_crop_image) to extract your images.
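If you prefer to check those dimensions locally rather than via the dataset details, a minimal sketch with Pillow (the file name is an assumption):

```python
from PIL import Image

# Width and height to enter into "Convert coordinates to label map";
# they must match the image you want to cut.
width, height = Image.open("page.png").size
print(width, height)
```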
Comment on:
> ### Configure the LLM Hub tool in the workfow
Suggested change:
> If we leave out the LLM Hub, I would suggest deleting lines 120-128.
Comment on:
> ### Workflow output
>
> The following image shows the output of the text segmentation and detection output produced by the workflow in the markdown format. Additionally, the output enlists the **thinking process** of the associated imaging LLM before producing the text from the bounding regions. The [workflow invocation](https://usegalaxy.eu/workflows/invocations/6fc32b7a39dc5b6e) provides more details.
Suggested change:
> The following image shows the output of the text segmentation and detection produced by the workflow. Depending on the input image, we get several separate images. Those can now be used with other tools, like Tesseract or LLM Hub, for example.
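To illustrate that downstream OCR step outside Galaxy, a minimal sketch assuming the tesseract binary and the pytesseract package are installed (file names are assumptions; the Galaxy tools wrap this for you):

```python
from PIL import Image
import pytesseract  # requires the tesseract binary on PATH

# Run OCR on each cropped text patch produced by the workflow.
for path in ["patch_1.png", "patch_2.png"]:
    text = pytesseract.image_to_string(Image.open(path))
    print(f"--- {path} ---\n{text}")
```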
I have addressed these comments.
Thanks a lot, @anuprulez!

Can we merge it if it looks good to you, @bgruening? Thanks!

Nice, thanks a lot!
ping @bgruening @davelopez