Add omniparser detector #53

bobozi-cmd · 2025-06-04T05:23:18Z

#52
Done:

Added a simple omniparser detector that sends screenshots to the server for OCR and parses the results into InteractiveElements.
Added filtering logic. OCR identifies many irrelevant elements, and leaving them unfiltered reduces the LLM's accuracy. Each detected element is mapped back to a DOM element through its coordinates, and non-interactive elements are filtered out. See OmniparserDetector.filter_cv_elements_by_interactive().
Added a use case for OmniparserDetector.
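
The filtering idea in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in detector.py: CvElement, filter_by_interactive, and is_interactive_at are hypothetical names, and testing the center point of each box is an assumption. In the PR the interactivity check runs in the browser via checkCoordinateInteractive.js; here a plain Python callable stands in for it so the sketch runs without Playwright.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CvElement:
    """Hypothetical OCR result: a label plus a pixel bounding box."""
    text: str
    x: float
    y: float
    width: float
    height: float

    def center(self) -> Tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


def filter_by_interactive(
    elements: List[CvElement],
    is_interactive_at: Callable[[float, float], bool],
) -> List[CvElement]:
    """Keep only elements whose center point lands on an interactive DOM node."""
    return [e for e in elements if is_interactive_at(*e.center())]


# Stand-in for the in-page check: pretend one button occupies this rectangle.
def fake_dom_check(px: float, py: float) -> bool:
    return 100 <= px <= 200 and 40 <= py <= 80


detected = [
    CvElement("Submit", 110, 45, 60, 20),           # center (140, 55): on the button
    CvElement("Copyright 2025", 10, 500, 120, 12),  # center (70, 506): plain text
]
kept = filter_by_interactive(detected, fake_dom_check)
```

Passing the check in as a callable keeps the filter independent of playwright.Page, which may also help with the question raised below about where the Page dependency should live.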

Question:

OmniparserDetector.filter_cv_elements_by_interactive(page: Page) requires a playwright.Page parameter, so the call should be placed in the workflow, perhaps by modifying get_interactive_elements.

Important

Add OmniparserDetector for OCR-based interactive element detection with filtering and a use case demonstration.

OmniparserDetector:
- Added OmniparserDetector class in detector.py for OCR-based detection of interactive elements.
- Implements detect_from_image() to send screenshots to a server for OCR and parse results into InteractiveElement objects.
- Implements filter_cv_elements_by_interactive() to filter non-interactive elements using coordinate mapping.
JavaScript Evaluation:
- Added checkCoordinateInteractive.js to evaluate element interactivity based on coordinates.
Use Case:
- Added omni_usecase.py to demonstrate OCR and HTML-based element detection and highlight results on screenshots.

This description was created by ellipsis-dev for af1c242. It will automatically update as commits are pushed.

greptile-apps

PR Summary

3 file(s) reviewed, no comment(s)

ellipsis-dev

Caution

Changes requested ❌

Reviewed everything up to af1c242 in 1 minute and 28 seconds.

Reviewed 386 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 6 draft comments. View those below.

1. omni_usecase.py:15

Draft comment:
Consider using logging instead of print statements for better consistency in error/info handling.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This appears to be a demo/example script rather than production code. Print statements are common and acceptable in example scripts. While logging would be better for production code, suggesting this change feels overly pedantic for what seems to be a usage example. The print statements are simple and serve their purpose for a demo. I could be wrong about this being just a demo script - it might be meant for production use. Proper logging is generally a best practice for error handling. Even if this were production code, the change from print to logging is a minor improvement that doesn't meaningfully impact functionality. The current implementation is adequate for demonstrating the core functionality. While technically correct, this comment suggests a change that adds little value given the apparent purpose of this script. The comment should be removed.

2. index/browser/checkCoordinateInteractive.js:76

Draft comment:
Typographical suggestion: In the comment on line 76, consider using 'an' instead of 'a' (e.g. 'Ensure each element has an index_id').
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 20% <= threshold 50% The comment is suggesting a typographical change in a comment within the code. This is a minor suggestion and doesn't impact the functionality or logic of the code. It doesn't ask for any confirmation or testing, but it is purely informative and doesn't provide significant value to the code review process.

3. index/browser/detector.py:120

Draft comment:
Typographical issue: The log message "Cannot Found elements" appears to have a grammatical error. Please consider changing it to "Cannot find elements".
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While this is a real grammatical error in the new code, log messages are primarily for developers and debugging. The meaning is still clear despite the grammatical error. The rules state we should not make purely informative comments or comments about obvious/unimportant issues. This feels like a very minor issue that doesn't impact functionality. The grammatical error could make the codebase look unprofessional and might confuse non-native English speakers. While professional appearance matters, this is an internal log message that doesn't affect functionality or code quality in any meaningful way. The meaning remains clear despite the error. Delete this comment as it addresses a very minor grammatical issue in a log message that doesn't impact code functionality or maintainability.

4. omni_usecase.py:20

Draft comment:
Typographical suggestion: change "Successfully save to {name}.png" to "Successfully saved to {name}.png" for correct verb form.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While the grammar correction is technically correct, this is a very minor issue in a debug print statement. It doesn't affect functionality, readability, or maintainability in any meaningful way. Our rules state not to make obvious or unimportant comments. Debug message grammar falls into this category. The grammar error could be seen as unprofessional if this message appears in logs that customers might see. Also, fixing it is a very simple change. This appears to be an internal debug message, not customer-facing. Even if it was customer-facing, the grammatical error is so minor it wouldn't meaningfully impact understanding. Delete this comment as it addresses a trivial grammar issue in a debug print statement, which violates our rule about not making obvious or unimportant comments.

5. omni_usecase.py:33

Draft comment:
Consider revising: "Ocr detector find {len(interactive_elements)} elements." might be clearer as "OCR detector found {len(interactive_elements)} elements." to fix capitalization and grammatical tense.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a debug print statement in a new example/demo file. The grammar issue is minor and doesn't affect functionality. The same issue exists in both functions (lines 33 and 52). Print statements are typically informal and used for debugging. Making them grammatically perfect doesn't add significant value. The grammar issue could make the output slightly less professional if this is user-facing output. The inconsistent capitalization of "OCR" vs "Ocr" could be considered a technical terminology issue. This appears to be internal debug output rather than user-facing text. The meaning is perfectly clear despite the minor grammatical issues. Focusing on grammar in debug statements is overly pedantic. Delete the comment as it focuses on an unimportant grammatical issue in debug output that doesn't impact code functionality or maintainability.

6. omni_usecase.py:52

Draft comment:
Consider revising: "Html detector find {len(interactive_elements)} elements." might be clearer as "HTML detector found {len(interactive_elements)} elements." to fix capitalization and grammatical tense.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% Print statements are typically for debugging/logging and don't need perfect grammar. The same grammatical issue exists in the OCR print statement on line 33. The comment is technically correct but very minor. This kind of nitpicking about grammar in debug prints doesn't add value and could be distracting from more important review comments. The grammar issue could indicate a non-native English speaker who might appreciate the correction. Consistent and professional logging messages can make debugging easier. While good grammar is nice, this is an internal debug print statement, not user-facing text. The benefit is too small to justify a code change and review comment. Delete this comment as it's too minor of an issue for a print statement and doesn't meaningfully improve the code.

Workflow ID: wflow_rIddaTTGU1bQGria

ellipsis-dev · 2025-06-09T02:56:24Z

index/browser/checkCoordinateInteractive.js

+
+        const element2 = frameDocument.elementFromPoint(x - rect.left, y - rect.top);
+
+        if (interactiveTags.includes(element2.tagName.toLowerCase())) {


Check if element2 is null before calling element2.tagName to avoid potential runtime errors.

Suggested change

-        if (interactiveTags.includes(element2.tagName.toLowerCase())) {
+        if (element2 && interactiveTags.includes(element2.tagName.toLowerCase())) {

ellipsis-dev · 2025-06-09T02:56:25Z

index/browser/detector.py

+from importlib import resources
+from io import BytesIO
+import logging
+from tkinter import Image


Using from tkinter import Image is likely incorrect; consider using from PIL import Image for image processing.

Suggested change

-from tkinter import Image
+from PIL import Image

ellipsis-dev · 2025-06-09T02:56:25Z

index/browser/detector.py

+    async def detect_from_image(self, image_b64: str, scale_factor: float, detect_sheets: bool = False) -> List[InteractiveElement]:
+        elements = []
+        try:
+            resp = requests.post(f"{self.endpoint}/parse/", json={"base64_image": image_b64})


Avoid blocking the event loop by using a synchronous HTTP library (requests) in an async function. Consider an async client like aiohttp/HTTPX.
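
One way to act on this without adding a dependency is to push the blocking call onto a worker thread with asyncio.to_thread (Python 3.9+). This is a different technique from the async client the comment suggests, shown here only because it runs with the standard library. The blocking_parse function is a hypothetical stand-in for the requests.post(...) call, and the fields inside each parsed item are assumed for illustration; only the parsed_content_list key comes from the PR.

```python
import asyncio
import time


def blocking_parse(image_b64: str) -> dict:
    """Stand-in for the synchronous HTTP call; sleeps to mimic network latency."""
    time.sleep(0.2)
    # Item fields are assumed for illustration; the real server schema may differ.
    return {"parsed_content_list": [{"content": "OK"}]}


async def detect_from_image(image_b64: str) -> list:
    # Run the blocking call on a worker thread so the event loop keeps running.
    payload = await asyncio.to_thread(blocking_parse, image_b64)
    return payload["parsed_content_list"]


async def main() -> tuple:
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often the loop gets control while the request is in flight.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    items = await detect_from_image("aGVsbG8=")
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return items, ticks
```

If the heartbeat keeps ticking while the request is pending, the loop was not blocked. With an async HTTP client the to_thread wrapper would be unnecessary.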

ellipsis-dev · 2025-06-09T02:56:25Z

index/browser/detector.py

+                    )
+                    elements.append(elem)
+            else:
+                logging.warn("Cannot Found elements")


Use logging.warning instead of deprecated logging.warn.

Suggested change

-                logging.warn("Cannot Found elements")
+                logging.warning("Cannot Found elements")
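
As a small sketch of the suggested call in context, the snippet below logs through a named logger rather than the root logger. The logger name and the report_empty_result helper are hypothetical, and the message wording is adjusted for grammar; none of this is taken from detector.py.

```python
import logging

logger = logging.getLogger("omniparser_detector")  # hypothetical logger name


def report_empty_result(parsed_items: list) -> bool:
    """Log a warning when the server returned no elements; return True if it warned."""
    if not parsed_items:
        logger.warning("Cannot find elements")
        return True
    return False
```

A named logger lets callers adjust or silence the detector's output without touching the root logger configuration.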

ellipsis-dev · 2025-06-09T02:56:25Z

index/browser/detector.py

+                content_list = resp.json()["parsed_content_list"]
+
+                data = base64.b64decode(som_image_b64)
+                with open(f"ocr.png", "wb") as fp:


Writing an ocr.png file may be leftover debugging code; remove or refactor for production.
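
If the annotated image is still useful while debugging, one option is to write it only when the caller opts in. The sketch below assumes a hypothetical maybe_dump_debug_image helper and debug_dir parameter; neither exists in the PR.

```python
import base64
from pathlib import Path
from typing import Optional


def maybe_dump_debug_image(
    image_b64: str, debug_dir: Optional[Path] = None
) -> Optional[Path]:
    """Write the decoded image only when a debug directory is given."""
    if debug_dir is None:
        return None  # Normal runs leave no files behind.
    debug_dir.mkdir(parents=True, exist_ok=True)
    out_path = debug_dir / "ocr.png"
    out_path.write_bytes(base64.b64decode(image_b64))
    return out_path
```

With the default of None, production callers write nothing to disk, and a developer can pass a directory to get the same ocr.png as before.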

ellipsis-dev · 2025-06-09T02:56:25Z

omni_usecase.py

+from index.browser.browser import Browser, BrowserConfig
+from index.browser.utils import put_highlight_elements_on_screenshot
+
+endpoint = "http://xxx:8000"


The 'endpoint' value is hardcoded with a placeholder; consider externalizing configuration or using environment variables.
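
A minimal sketch of reading the endpoint from the environment is shown below. The variable name OMNIPARSER_ENDPOINT and the localhost default are assumptions made for illustration and do not come from the PR.

```python
import os
from typing import Mapping, Optional

DEFAULT_ENDPOINT = "http://localhost:8000"  # assumed local default, not from the PR


def get_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the server URL from OMNIPARSER_ENDPOINT (hypothetical name), with a fallback."""
    env = os.environ if env is None else env
    # Strip a trailing slash so f"{endpoint}/parse/" does not produce a double slash.
    return env.get("OMNIPARSER_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")
```

Accepting an optional mapping keeps the function easy to test without mutating the real process environment.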

bobozi-cmd added 3 commits June 4, 2025 13:13

[WIP] Add omniparser detector

095aeae

Add use case

3eef07f

Merge branch 'lmnr-ai:main' into main

af1c242

bobozi-cmd changed the title ~~[WIP] Add omniparser detector~~ Add omniparser detector Jun 9, 2025

bobozi-cmd marked this pull request as ready for review June 9, 2025 02:54

greptile-apps bot reviewed Jun 9, 2025

ellipsis-dev bot reviewed Jun 9, 2025

Fix from bot

e418423
