-
Notifications
You must be signed in to change notification settings - Fork 127
Add omniparser detector #53
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here: https://app.greptile.com/review/github.
💡 (1/5) You can manually trigger the bot by mentioning @greptileai in a comment!
3 file(s) reviewed, no comment(s)
Edit PR Review Bot Settings | Greptile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Changes requested ❌
Reviewed everything up to af1c242 in 1 minute and 28 seconds. Click for details.
- Reviewed
386
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
6
draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. omni_usecase.py:15
- Draft comment:
Consider using logging instead of print statements for better consistency in error/info handling. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This appears to be a demo/example script rather than production code. Print statements are common and acceptable in example scripts. While logging would be better for production code, suggesting this change feels overly pedantic for what seems to be a usage example. The print statements are simple and serve their purpose for a demo. I could be wrong about this being just a demo script - it might be meant for production use. Proper logging is generally a best practice for error handling. Even if this were production code, the change from print to logging is a minor improvement that doesn't meaningfully impact functionality. The current implementation is adequate for demonstrating the core functionality. While technically correct, this comment suggests a change that adds little value given the apparent purpose of this script. The comment should be removed.
2. index/browser/checkCoordinateInteractive.js:76
- Draft comment:
Typographical suggestion: In the comment on line 76, consider using 'an' instead of 'a' (e.g. 'Ensure each element has an index_id'). - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =20%
<= threshold50%
The comment is suggesting a typographical change in a comment within the code. This is a minor suggestion and doesn't impact the functionality or logic of the code. It doesn't ask for any confirmation or testing, but it is purely informative and doesn't provide significant value to the code review process.
3. index/browser/detector.py:120
- Draft comment:
Typographical issue: The log message "Cannot Found elements" appears to have a grammatical error. Please consider changing it to "Cannot find elements". - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While this is a real grammatical error in the new code, log messages are primarily for developers and debugging. The meaning is still clear despite the grammatical error. The rules state we should not make purely informative comments or comments about obvious/unimportant issues. This feels like a very minor issue that doesn't impact functionality. The grammatical error could make the codebase look unprofessional and might confuse non-native English speakers. While professional appearance matters, this is an internal log message that doesn't affect functionality or code quality in any meaningful way. The meaning remains clear despite the error. Delete this comment as it addresses a very minor grammatical issue in a log message that doesn't impact code functionality or maintainability.
4. omni_usecase.py:20
- Draft comment:
Typographical suggestion: change "Successfully save to {name}.png" to "Successfully saved to {name}.png" for correct verb form. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% While the grammar correction is technically correct, this is a very minor issue in a debug print statement. It doesn't affect functionality, readability, or maintainability in any meaningful way. Our rules state not to make obvious or unimportant comments. Debug message grammar falls into this category. The grammar error could be seen as unprofessional if this message appears in logs that customers might see. Also, fixing it is a very simple change. This appears to be an internal debug message, not customer-facing. Even if it was customer-facing, the grammatical error is so minor it wouldn't meaningfully impact understanding. Delete this comment as it addresses a trivial grammar issue in a debug print statement, which violates our rule about not making obvious or unimportant comments.
5. omni_usecase.py:33
- Draft comment:
Consider revising: "Ocr detector find {len(interactive_elements)} elements." might be clearer as "OCR detector found {len(interactive_elements)} elements." to fix capitalization and grammatical tense. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a debug print statement in a new example/demo file. The grammar issue is minor and doesn't affect functionality. The same issue exists in both functions (lines 33 and 52). Print statements are typically informal and used for debugging. Making them grammatically perfect doesn't add significant value. The grammar issue could make the output slightly less professional if this is user-facing output. The inconsistent capitalization of "OCR" vs "Ocr" could be considered a technical terminology issue. This appears to be internal debug output rather than user-facing text. The meaning is perfectly clear despite the minor grammatical issues. Focusing on grammar in debug statements is overly pedantic. Delete the comment as it focuses on an unimportant grammatical issue in debug output that doesn't impact code functionality or maintainability.
6. omni_usecase.py:52
- Draft comment:
Consider revising: "Html detector find {len(interactive_elements)} elements." might be clearer as "HTML detector found {len(interactive_elements)} elements." to fix capitalization and grammatical tense. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% Print statements are typically for debugging/logging and don't need perfect grammar. The same grammatical issue exists in the OCR print statement on line 33. The comment is technically correct but very minor. This kind of nitpicking about grammar in debug prints doesn't add value and could be distracting from more important review comments. The grammar issue could indicate a non-native English speaker who might appreciate the correction. Consistent and professional logging messages can make debugging easier. While good grammar is nice, this is an internal debug print statement, not user-facing text. The benefit is too small to justify a code change and review comment. Delete this comment as it's too minor of an issue for a print statement and doesn't meaningfully improve the code.
Workflow ID: wflow_rIddaTTGU1bQGria
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
|
||
const element2 = frameDocument.elementFromPoint(x - rect.left, y - rect.top); | ||
|
||
if (interactiveTags.includes(element2.tagName.toLowerCase())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if element2
is null before calling element2.tagName
to avoid potential runtime errors.
if (interactiveTags.includes(element2.tagName.toLowerCase())) { | |
if (element2 && interactiveTags.includes(element2.tagName.toLowerCase())) { |
index/browser/detector.py
Outdated
from importlib import resources | ||
from io import BytesIO | ||
import logging | ||
from tkinter import Image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using from tkinter import Image
is likely incorrect; consider using from PIL import Image
for image processing.
from tkinter import Image | |
from PIL import Image |
async def detect_from_image(self, image_b64: str, scale_factor: float, detect_sheets: bool = False) -> List[InteractiveElement]: | ||
elements = [] | ||
try: | ||
resp = requests.post(f"{self.endpoint}/parse/", json={"base64_image": image_b64}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid blocking the event loop by using a synchronous HTTP library (requests
) in an async function. Consider an async client like aiohttp
/HTTPX
.
index/browser/detector.py
Outdated
) | ||
elements.append(elem) | ||
else: | ||
logging.warn("Cannot Found elements") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use logging.warning
instead of deprecated logging.warn
.
logging.warn("Cannot Found elements") | |
logging.warning("Cannot Found elements") |
index/browser/detector.py
Outdated
content_list = resp.json()["parsed_content_list"] | ||
|
||
data = base64.b64decode(som_image_b64) | ||
with open(f"ocr.png", "wb") as fp: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing an ocr.png
file may be leftover debugging code; remove or refactor for production.
from index.browser.browser import Browser, BrowserConfig | ||
from index.browser.utils import put_highlight_elements_on_screenshot | ||
|
||
endpoint = "http://xxx:8000" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'endpoint' value is hardcoded with a placeholder; consider externalizing configuration or using environment variables.
#52
Done:
OmniparserDetector.filter_cv_elements_by_interactive()
OmniparserDetector
.Question:
OmniparserDetector.filter_cv_elements_by_interactive(page: Page)
requiresplaywright.Page
parameters, which should be placed in the workflow, maybe modify theget_interactive_elements
.Important
Add
OmniparserDetector
for OCR-based interactive element detection with filtering and a use case demonstration.OmniparserDetector
class indetector.py
for OCR-based detection of interactive elements.detect_from_image()
to send screenshots to a server for OCR and parse results intoInteractiveElement
objects.filter_cv_elements_by_interactive()
to filter non-interactive elements using coordinate mapping.checkCoordinateInteractive.js
to evaluate element interactivity based on coordinates.omni_usecase.py
to demonstrate OCR and HTML-based element detection and highlight results on screenshots.This description was created by
for af1c242. You can customize this summary. It will automatically update as commits are pushed.