Skip to content
This repository was archived by the owner on Jul 3, 2025. It is now read-only.

Conversation

bobozi-cmd
Copy link

@bobozi-cmd bobozi-cmd commented Jun 4, 2025

#52
Done:

  • Added A simple omniparser detector, which can pass screenshots to the server for OCR recognition, and then resolve them into InteractiveElements.
  • Added filtering logic. OCR will identify many irrelevant elements, and if not filtered, the accuracy of LLM will be reduced. DOM Element is inversely mapped through element coordinates to filter non-interactive elements. see OmniparserDetector.filter_cv_elements_by_interactive()
  • Added a usecase for OmniparserDetector.

Question:

  • OmniparserDetector.filter_cv_elements_by_interactive(page: Page) requires playwright.Page parameters, which should be placed in the workflow, maybe modify the get_interactive_elements.

Important

Add OmniparserDetector for OCR-based interactive element detection with filtering and a use case demonstration.

  • OmniparserDetector:
    • Added OmniparserDetector class in detector.py for OCR-based detection of interactive elements.
    • Implements detect_from_image() to send screenshots to a server for OCR and parse results into InteractiveElement objects.
    • Implements filter_cv_elements_by_interactive() to filter non-interactive elements using coordinate mapping.
  • JavaScript Evaluation:
    • Added checkCoordinateInteractive.js to evaluate element interactivity based on coordinates.
  • Use Case:
    • Added omni_usecase.py to demonstrate OCR and HTML-based element detection and highlight results on screenshots.

This description was created by Ellipsis for af1c242. You can customize this summary. It will automatically update as commits are pushed.

@bobozi-cmd bobozi-cmd changed the title [WIP] Add omniparser detector Add omniparser detector Jun 9, 2025
@bobozi-cmd bobozi-cmd marked this pull request as ready for review June 9, 2025 02:54
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here: https://app.greptile.com/review/github.

💡 (1/5) You can manually trigger the bot by mentioning @greptileai in a comment!

3 file(s) reviewed, no comment(s)
Edit PR Review Bot Settings | Greptile

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Changes requested ❌

Reviewed everything up to af1c242 in 1 minute and 28 seconds. Click for details.
  • Reviewed 386 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. omni_usecase.py:15
  • Draft comment:
    Consider using logging instead of print statements for better consistency in error/info handling.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This appears to be a demo/example script rather than production code. Print statements are common and acceptable in example scripts. While logging would be better for production code, suggesting this change feels overly pedantic for what seems to be a usage example. The print statements are simple and serve their purpose for a demo. I could be wrong about this being just a demo script - it might be meant for production use. Proper logging is generally a best practice for error handling. Even if this were production code, the change from print to logging is a minor improvement that doesn't meaningfully impact functionality. The current implementation is adequate for demonstrating the core functionality. While technically correct, this comment suggests a change that adds little value given the apparent purpose of this script. The comment should be removed.
2. index/browser/checkCoordinateInteractive.js:76
  • Draft comment:
    Typographical suggestion: In the comment on line 76, consider using 'an' instead of 'a' (e.g. 'Ensure each element has an index_id').
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 20% <= threshold 50% The comment is suggesting a typographical change in a comment within the code. This is a minor suggestion and doesn't impact the functionality or logic of the code. It doesn't ask for any confirmation or testing, but it is purely informative and doesn't provide significant value to the code review process.
3. index/browser/detector.py:120
  • Draft comment:
    Typographical issue: The log message "Cannot Found elements" appears to have a grammatical error. Please consider changing it to "Cannot find elements".
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While this is a real grammatical error in the new code, log messages are primarily for developers and debugging. The meaning is still clear despite the grammatical error. The rules state we should not make purely informative comments or comments about obvious/unimportant issues. This feels like a very minor issue that doesn't impact functionality. The grammatical error could make the codebase look unprofessional and might confuse non-native English speakers. While professional appearance matters, this is an internal log message that doesn't affect functionality or code quality in any meaningful way. The meaning remains clear despite the error. Delete this comment as it addresses a very minor grammatical issue in a log message that doesn't impact code functionality or maintainability.
4. omni_usecase.py:20
  • Draft comment:
    Typographical suggestion: change "Successfully save to {name}.png" to "Successfully saved to {name}.png" for correct verb form.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While the grammar correction is technically correct, this is a very minor issue in a debug print statement. It doesn't affect functionality, readability, or maintainability in any meaningful way. Our rules state not to make obvious or unimportant comments. Debug message grammar falls into this category. The grammar error could be seen as unprofessional if this message appears in logs that customers might see. Also, fixing it is a very simple change. This appears to be an internal debug message, not customer-facing. Even if it was customer-facing, the grammatical error is so minor it wouldn't meaningfully impact understanding. Delete this comment as it addresses a trivial grammar issue in a debug print statement, which violates our rule about not making obvious or unimportant comments.
5. omni_usecase.py:33
  • Draft comment:
    Consider revising: "Ocr detector find {len(interactive_elements)} elements." might be clearer as "OCR detector found {len(interactive_elements)} elements." to fix capitalization and grammatical tense.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a debug print statement in a new example/demo file. The grammar issue is minor and doesn't affect functionality. The same issue exists in both functions (lines 33 and 52). Print statements are typically informal and used for debugging. Making them grammatically perfect doesn't add significant value. The grammar issue could make the output slightly less professional if this is user-facing output. The inconsistent capitalization of "OCR" vs "Ocr" could be considered a technical terminology issue. This appears to be internal debug output rather than user-facing text. The meaning is perfectly clear despite the minor grammatical issues. Focusing on grammar in debug statements is overly pedantic. Delete the comment as it focuses on an unimportant grammatical issue in debug output that doesn't impact code functionality or maintainability.
6. omni_usecase.py:52
  • Draft comment:
    Consider revising: "Html detector find {len(interactive_elements)} elements." might be clearer as "HTML detector found {len(interactive_elements)} elements." to fix capitalization and grammatical tense.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% Print statements are typically for debugging/logging and don't need perfect grammar. The same grammatical issue exists in the OCR print statement on line 33. The comment is technically correct but very minor. This kind of nitpicking about grammar in debug prints doesn't add value and could be distracting from more important review comments. The grammar issue could indicate a non-native English speaker who might appreciate the correction. Consistent and professional logging messages can make debugging easier. While good grammar is nice, this is an internal debug print statement, not user-facing text. The benefit is too small to justify a code change and review comment. Delete this comment as it's too minor of an issue for a print statement and doesn't meaningfully improve the code.

Workflow ID: wflow_rIddaTTGU1bQGria

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


const element2 = frameDocument.elementFromPoint(x - rect.left, y - rect.top);

if (interactiveTags.includes(element2.tagName.toLowerCase())) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if element2 is null before calling element2.tagName to avoid potential runtime errors.

Suggested change
if (interactiveTags.includes(element2.tagName.toLowerCase())) {
if (element2 && interactiveTags.includes(element2.tagName.toLowerCase())) {

from importlib import resources
from io import BytesIO
import logging
from tkinter import Image
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using from tkinter import Image is likely incorrect; consider using from PIL import Image for image processing.

Suggested change
from tkinter import Image
from PIL import Image

async def detect_from_image(self, image_b64: str, scale_factor: float, detect_sheets: bool = False) -> List[InteractiveElement]:
elements = []
try:
resp = requests.post(f"{self.endpoint}/parse/", json={"base64_image": image_b64})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid blocking the event loop by using a synchronous HTTP library (requests) in an async function. Consider an async client like aiohttp/HTTPX.

)
elements.append(elem)
else:
logging.warn("Cannot Found elements")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use logging.warning instead of deprecated logging.warn.

Suggested change
logging.warn("Cannot Found elements")
logging.warning("Cannot Found elements")

content_list = resp.json()["parsed_content_list"]

data = base64.b64decode(som_image_b64)
with open(f"ocr.png", "wb") as fp:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing an ocr.png file may be leftover debugging code; remove or refactor for production.

from index.browser.browser import Browser, BrowserConfig
from index.browser.utils import put_highlight_elements_on_screenshot

endpoint = "http://xxx:8000"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'endpoint' value is hardcoded with a placeholder; consider externalizing configuration or using environment variables.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant