instrument cua, fixes and tests to litellm responses #187
Conversation
Caution
Changes requested ❌
Reviewed everything up to c3339c6 in 3 minutes and 33 seconds.
- Reviewed 1363 lines of code in 13 files
- Skipped 0 files when reviewing
- Skipped posting 7 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. tests/test_litellm_openai.py:89
- Draft comment:
Multiple tests use repeated time.sleep(SLEEP_TO_FLUSH_SECONDS) and Laminar.flush() calls. Consider encapsulating this wait+flush logic into a helper function or fixture to reduce duplication and improve test maintainability. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
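The wait-and-flush duplication the draft comment describes could be factored out roughly as below. This is a sketch: `SLEEP_TO_FLUSH_SECONDS` and `Laminar.flush` come from the tests themselves, while the helper name `wait_and_flush` and the constant's value are invented for illustration.

```python
import time

SLEEP_TO_FLUSH_SECONDS = 0.01  # illustrative value; the real tests define their own


def wait_and_flush(flush, seconds=SLEEP_TO_FLUSH_SECONDS):
    """Sleep long enough for pending spans to be exported, then flush once."""
    time.sleep(seconds)
    flush()


# In the tests this would be wait_and_flush(Laminar.flush);
# a stub stands in for the flush callable here.
calls = []
wait_and_flush(lambda: calls.append("flush"))
```

Each test then makes a single call instead of repeating the sleep-then-flush pair inline.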
2. tests/test_litellm_openai.py:96
- Draft comment:
Several assertions hard-code token counts (e.g. input_tokens == 14). These magic numbers can be brittle if underlying tokenization changes. Consider either calculating expected token counts dynamically or documenting these fixed values explicitly. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
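One way to make such assertions less brittle, sketched below, is to check presence and internal consistency of the usage numbers rather than exact values. The helper name and the non-input token values are hypothetical; only `input_tokens == 14` appears in the tests.

```python
def check_token_usage(input_tokens, output_tokens, total_tokens):
    """Check presence and internal consistency instead of magic numbers."""
    assert input_tokens > 0, "input tokens should be recorded"
    assert output_tokens > 0, "output tokens should be recorded"
    assert total_tokens == input_tokens + output_tokens


# Hypothetical values for illustration (the tests hard-code input_tokens == 14):
check_token_usage(input_tokens=14, output_tokens=7, total_tokens=21)
```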
3. tests/test_litellm_openai.py:736
- Draft comment:
Assertions on structured output schema vary between Pydantic and non-Pydantic cases. Ensure this behavior is documented and intentional, or consider unifying the schema expectations. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. tests/test_litellm_anthropic.py:353
- Draft comment:
Many tests duplicate similar assertions for attributes like model names, token counts, and prompt content. Consider parameterizing or abstracting these common checks into helper functions to reduce duplication. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
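The repeated checks could be collected into one helper along these lines. This is a sketch under assumptions: the attribute keys follow the OpenTelemetry `gen_ai` semantic conventions and may differ from what these tests actually assert, and the example attribute values are hypothetical.

```python
def assert_llm_span(attrs, *, model, input_tokens=None, prompt=None):
    """One place for the span assertions repeated across tests."""
    assert attrs["gen_ai.request.model"] == model
    if input_tokens is not None:
        assert attrs["gen_ai.usage.input_tokens"] == input_tokens
    if prompt is not None:
        assert attrs["gen_ai.prompt.0.content"] == prompt


# Hypothetical span attributes for illustration:
span_attrs = {
    "gen_ai.request.model": "claude-3-5-sonnet",
    "gen_ai.usage.input_tokens": 14,
    "gen_ai.prompt.0.content": "hello",
}
assert_llm_span(span_attrs, model="claude-3-5-sonnet", input_tokens=14, prompt="hello")
```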
5. src/lmnr/opentelemetry_lib/litellm/__init__.py:348
- Draft comment:
Typographical error: The comment says "messge"; please correct it to "message". - Reason this comment was not posted:
Comment looked like it was already resolved.
6. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:367
- Draft comment:
Typo in comment: 'messge' should be 'message'. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% While this is a real typo in a comment that was added in this diff, fixing typos in comments is very low value. The comment is still perfectly understandable despite the typo. Our rules say not to make comments that are obvious or unimportant, and a minor typo in a comment definitely qualifies as unimportant. The typo is real and was introduced in this diff, so technically it's a valid comment about a changed line. And having correct spelling in comments is good practice. While correct spelling is good, this kind of minor typo fix creates noise in the PR review and distracts from more important issues. The meaning is still clear despite the typo. Delete this comment as it points out an unimportant typo in a comment that doesn't impact code functionality or readability.
7. tests/cassettes/test_litellm_openai/test_litellm_openai_responses_with_computer_tools.yaml:3
- Draft comment:
It appears that the JSON string for the 'text' value in the 'input_text' object is split across two lines ("Take" on line 3 and "a screenshot of the desktop." on line 4). If this newline is unintentional, it could result in an unexpected value. Please confirm whether the text should be on a single line. - Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_fsOCbw3raRDvKdnd
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_agent/__init__.py (outdated; comment resolved)
src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/__init__.py (outdated; comment resolved)
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
…gent/__init__.py Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
…omputer/__init__.py Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Important
Looks good to me! 👍
Reviewed 96293e9 in 49 seconds.
- Reviewed 12 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 1 draft comment. View it below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. pyproject.toml:148
- Draft comment:
Added [tool.ruff] config looks correct; consider adding a trailing newline for consistency. - Reason this comment was not posted:
Confidence changes required: 30% <= threshold: 50%
Workflow ID: wflow_ojKilvI5yVq39hy5
Caution
Changes requested ❌
Reviewed b25406d in 2 minutes and 25 seconds.
- Reviewed 87 lines of code in 3 files
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_agent/__init__.py:52
- Draft comment:
Using 'continue' inside the nested try block may skip the explicit call to end the span. Verify that Laminar.use_span properly ends the span when 'continue' is triggered. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment raises a valid concern about control flow, but misses a crucial detail - the span is managed by a context manager (with Laminar.use_span(step_span)). Context managers in Python are specifically designed to handle cleanup even when control flow changes via continue/break/return/exceptions. The use_span context manager should properly end the span regardless of how the block exits. I could be wrong about how Laminar.use_span is implemented - maybe it's not a proper context manager? The code could be from an old version with different behavior? The use of the 'with' statement strongly indicates this is a proper context manager, and context manager cleanup is a fundamental Python language feature that would be very unusual to break. The comment should be deleted because it misunderstands Python context managers - the span will be properly ended by the context manager regardless of the continue statement.
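The context-manager behavior the reviewer relies on can be demonstrated in isolation. Below is a toy stand-in for `Laminar.use_span` (the real implementation is not shown here); the `finally` clause plays the role of `__exit__` and runs even when the block is left via `continue`.

```python
from contextlib import contextmanager

ended = []


@contextmanager
def use_span(name):
    # Toy stand-in for Laminar.use_span: records when the span "ends".
    try:
        yield name
    finally:
        ended.append(name)


for i in range(3):
    with use_span(f"step-{i}"):
        if i == 1:
            continue  # the finally block (i.e. __exit__) still runs

print(ended)  # prints ['step-0', 'step-1', 'step-2']
```

All three spans are ended, including the one exited early, which is why the draft comment was judged a false positive.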
2. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/__init__.py:343
- Draft comment:
The span creation now includes span_type="TOOL". Confirm that Laminar.start_as_current_span supports the 'span_type' parameter and that this change aligns with downstream expectations. - Reason this comment was not posted:
Comment did not seem useful. Confidence it is useful: 0% <= threshold: 50%
The comment is asking the PR author to confirm that a specific function supports a new parameter and that the change aligns with downstream expectations. This falls under asking the author to confirm their intention and ensure behavior, which is against the rules.
3. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/__init__.py:131
- Draft comment:
The output_formatter for the 'screenshot' method has been switched from payload_to_base64url to payload_to_placeholder. Ensure that downstream components expecting a base64 URL can handle the placeholder format. - Reason this comment was not posted:
Comment did not seem useful. Confidence it is useful: 0% <= threshold: 50%
The comment is asking the PR author to ensure that downstream components can handle the new format, which violates the rule against asking the author to ensure compatibility or test changes. It doesn't provide a specific suggestion or ask for a test to be written, so it should be removed.
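Neither formatter's body appears in this excerpt; below is a plausible sketch of the trade-off the draft comment is pointing at. The function names come from the comment, but the implementations are assumed for illustration.

```python
import base64


def payload_to_base64url(payload_bytes: bytes) -> str:
    # Embeds the full screenshot in the span attribute: complete but large.
    return "data:image/png;base64," + base64.b64encode(payload_bytes).decode()


def payload_to_placeholder(_: bytes) -> str:
    # Drops the screenshot in favor of a fixed marker: small but lossy,
    # so anything downstream that parses a data URL must tolerate it.
    return "<BINARY_BLOB_SCREENSHOT>"
```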
Workflow ID: wflow_wS8MxhsUklukA9ao
```python
def payload_to_placeholder(payload_bytes: bytes) -> bytes:
    return "<BINARY_BLOB_SCREENSHOT>"
```
Return type mismatch: payload_to_placeholder is annotated to return bytes but returns a string literal. Consider returning a bytes literal (e.g., b"<BINARY_BLOB_SCREENSHOT>") or updating the type annotation.
```diff
-    return "<BINARY_BLOB_SCREENSHOT>"
+    return b"<BINARY_BLOB_SCREENSHOT>"
```
Important
Looks good to me! 👍
Reviewed 5dee2d5 in 1 minute and 14 seconds.
- Reviewed 11 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/utils.py:11
- Draft comment:
Adjusted return type to str matches the actual returned literal. Confirm callers are expecting a string, not bytes. - Reason this comment was not posted:
Comment looked like it was already resolved.
2. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/utils.py:11
- Draft comment:
The parameter 'payload_bytes' is unused in payload_to_placeholder. Consider renaming it to '_' for clarity. - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_vOWp03CYmr74s4qy
Important
Enhance litellm with new CUA instrumentation, improved response handling, and extensive testing for OpenAI and Anthropic integrations.
- Adds CuaAgentInstrumentor and CuaComputerInstrumentor in cua_agent/__init__.py and cua_computer/__init__.py for OpenTelemetry instrumentation.
- Updates _instrument_initializers.py and instruments.py to include the new instrumentors.
- Reworks _process_input_messages and _process_response_output in litellm/__init__.py to handle new message types and improve content processing.
- Adds is_validator_iterator() in utils.py to handle iterable content types.
- Adds tests for litellm responses with computer tools in test_litellm_openai.py and test_litellm_anthropic.py.
- Updates pyproject.toml to set target-version for ruff to py310.
- Changes in decorators/__init__.py.

This description was created by Ellipsis for 5dee2d5. You can customize this summary. It will automatically update as commits are pushed.