Conversation

dinmukhamedm
Member

@dinmukhamedm dinmukhamedm commented Sep 10, 2025

Important

Enhance litellm with new CUA instrumentation, improved response handling, and extensive testing for OpenAI and Anthropic integrations.

  • Instrumentation:
    • Add CuaAgentInstrumentor and CuaComputerInstrumentor in cua_agent/__init__.py and cua_computer/__init__.py for OpenTelemetry instrumentation.
    • Update _instrument_initializers.py and instruments.py to include new instrumentors.
  • Litellm Enhancements:
    • Refactor _process_input_messages and _process_response_output in litellm/__init__.py to handle new message types and improve content processing.
    • Add is_validator_iterator() in utils.py to handle iterable content types.
  • Testing:
    • Add tests for litellm responses with computer tools in test_litellm_openai.py and test_litellm_anthropic.py.
    • Update test cases to handle new response structures and tool integrations.
  • Miscellaneous:
    • Update pyproject.toml to set target-version for ruff to py310.
    • Minor exception handling improvement in decorators/__init__.py.
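A minimal sketch of what the `is_validator_iterator()` helper might look like, assuming it detects pydantic-core's lazily-validated iterator wrapper by type name (that class is not part of pydantic's public API, so the actual helper in `utils.py` may differ):

```python
def is_validator_iterator(content) -> bool:
    """Best-effort check for pydantic-core's ValidatorIterator wrapper.

    Streamed or lazily-validated iterable fields in pydantic v2 are wrapped
    in a private ValidatorIterator type, so this matches on the type name
    rather than importing pydantic internals.
    """
    return content.__class__.__name__ == "ValidatorIterator"
```

A check like this lets message-content processing materialize such iterators (e.g. via `list(content)`) before serializing span attributes.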

This description was created by Ellipsis for 5dee2d5. You can customize this summary. It will automatically update as commits are pushed.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to c3339c6 in 3 minutes and 33 seconds. Click for details.
  • Reviewed 1363 lines of code in 13 files
  • Skipped 0 files when reviewing.
  • Skipped posting 7 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. tests/test_litellm_openai.py:89
  • Draft comment:
    Multiple tests use repeated time.sleep(SLEEP_TO_FLUSH_SECONDS) and Laminar.flush() calls. Consider encapsulating this wait+flush logic into a helper function or fixture to reduce duplication and improve test maintainability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. tests/test_litellm_openai.py:96
  • Draft comment:
    Several assertions hard-code token counts (e.g. input_tokens == 14). These magic numbers can be brittle if underlying tokenization changes. Consider either calculating expected token counts dynamically or documenting these fixed values explicitly.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. tests/test_litellm_openai.py:736
  • Draft comment:
    Assertions on structured output schema vary between Pydantic and non-Pydantic cases. Ensure this behavior is documented and intentional, or consider unifying the schema expectations.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. tests/test_litellm_anthropic.py:353
  • Draft comment:
    Many tests duplicate similar assertions for attributes like model names, token counts, and prompt content. Consider parameterizing or abstracting these common checks into helper functions to reduce duplication.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. src/lmnr/opentelemetry_lib/litellm/__init__.py:348
  • Draft comment:
    Typographical error: The comment says "messge"; please correct it to "message".
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
6. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:367
  • Draft comment:
    Typo in comment: 'messge' should be 'message'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% While this is a real typo in a comment that was added in this diff, fixing typos in comments is very low value. The comment is still perfectly understandable despite the typo. Our rules say not to make comments that are obvious or unimportant, and a minor typo in a comment definitely qualifies as unimportant. The typo is real and was introduced in this diff, so technically it's a valid comment about a changed line. And having correct spelling in comments is good practice. While correct spelling is good, this kind of minor typo fix creates noise in the PR review and distracts from more important issues. The meaning is still clear despite the typo. Delete this comment as it points out an unimportant typo in a comment that doesn't impact code functionality or readability.
7. tests/cassettes/test_litellm_openai/test_litellm_openai_responses_with_computer_tools.yaml:3
  • Draft comment:
    It appears that the JSON string for the 'text' value in the 'input_text' object is split across two lines ("Take" on line 3 and "a screenshot of the desktop." on line 4). If this newline is unintentional, it could result in an unexpected value. Please confirm whether the text should be on a single line.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
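The wait-and-flush helper suggested in draft comment 1 could be a small function along these lines (a sketch only; `SLEEP_TO_FLUSH_SECONDS` is defined by the test suite and the placeholder value here is an assumption, as is passing the flush callable in rather than importing `Laminar` directly):

```python
import time

SLEEP_TO_FLUSH_SECONDS = 0.05  # placeholder; the test module defines its own value


def wait_and_flush(flush_fn, sleep_seconds=SLEEP_TO_FLUSH_SECONDS):
    """Encapsulate the repeated sleep-then-flush pattern from the tests.

    flush_fn is expected to be something like Laminar.flush; taking it as
    a parameter keeps this helper independent of the SDK import.
    """
    time.sleep(sleep_seconds)
    flush_fn()
```

Each test would then call `wait_and_flush(Laminar.flush)` instead of repeating the sleep/flush pair inline.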

Workflow ID: wflow_fsOCbw3raRDvKdnd

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

dinmukhamedm and others added 4 commits September 10, 2025 21:34
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
…gent/__init__.py

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
…omputer/__init__.py

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 96293e9 in 49 seconds. Click for details.
  • Reviewed 12 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
1. pyproject.toml:148
  • Draft comment:
    Added [tool.ruff] config looks correct; consider adding a trailing newline for consistency.
  • Reason this comment was not posted:
    Confidence changes required: 30% <= threshold 50%
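For reference, the `[tool.ruff]` section mentioned above would look roughly like this (assuming only the target version was set; the actual pyproject.toml may contain additional keys):

```toml
[tool.ruff]
target-version = "py310"
```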

Workflow ID: wflow_ojKilvI5yVq39hy5


Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed b25406d in 2 minutes and 25 seconds. Click for details.
  • Reviewed 87 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_agent/__init__.py:52
  • Draft comment:
    Using 'continue' inside the nested try block may skip the explicit call to end the span. Verify that Laminar.use_span properly ends the span when 'continue' is triggered.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment raises a valid concern about control flow, but misses a crucial detail - the span is managed by a context manager (with Laminar.use_span(step_span)). Context managers in Python are specifically designed to handle cleanup even when control flow changes via continue/break/return/exceptions. The use_span context manager should properly end the span regardless of how the block exits. I could be wrong about how Laminar.use_span is implemented - maybe it's not a proper context manager? The code could be from an old version with different behavior? The use of the 'with' statement strongly indicates this is a proper context manager, and context manager cleanup is a fundamental Python language feature that would be very unusual to break. The comment should be deleted because it misunderstands Python context managers - the span will be properly ended by the context manager regardless of the continue statement.
2. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/__init__.py:343
  • Draft comment:
    The span creation now includes span_type="TOOL". Confirm that Laminar.start_as_current_span supports the 'span_type' parameter and that this change aligns with downstream expectations.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to confirm that a specific function supports a new parameter and that the change aligns with downstream expectations. This falls under asking the author to confirm their intention and ensure behavior, which is against the rules.
3. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/__init__.py:131
  • Draft comment:
    The output_formatter for the 'screenshot' method has been switched from payload_to_base64url to payload_to_placeholder. Ensure that downstream components expecting a base64 URL can handle the placeholder format.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to ensure that downstream components can handle the new format, which violates the rule against asking the author to ensure compatibility or test changes. It doesn't provide a specific suggestion or ask for a test to be written, so it should be removed.
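On the context-manager question in draft comment 1 above: Python guarantees that a context manager's `__exit__` runs on any exit from a `with` block, including `continue`. A self-contained demonstration with a stand-in for `Laminar.use_span` (the real API is not used here):

```python
from contextlib import contextmanager

ended = []


@contextmanager
def use_span(name):
    # Stand-in for Laminar.use_span: the finally clause plays the role
    # of ending the span in __exit__.
    try:
        yield name
    finally:
        ended.append(name)


for i in range(3):
    with use_span(f"step-{i}"):
        if i == 1:
            continue  # exits the with block early; __exit__ still runs
```

After the loop, `ended` contains an entry for every iteration, including the one that hit `continue`, which is why the draft comment was discarded.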

Workflow ID: wflow_wS8MxhsUklukA9ao




def payload_to_placeholder(payload_bytes: bytes) -> bytes:
    return "<BINARY_BLOB_SCREENSHOT>"
Contributor


Return type mismatch: payload_to_placeholder is annotated to return bytes but returns a string literal. Consider returning a bytes literal (e.g., b"<BINARY_BLOB_SCREENSHOT>") or updating the type annotation.

Suggested change
-    return "<BINARY_BLOB_SCREENSHOT>"
+    return b"<BINARY_BLOB_SCREENSHOT>"

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 5dee2d5 in 1 minute and 14 seconds. Click for details.
  • Reviewed 11 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
1. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/utils.py:11
  • Draft comment:
    Adjusted return type to str matches the actual returned literal. Confirm callers are expecting a string, not bytes.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/utils.py:11
  • Draft comment:
    The parameter 'payload_bytes' is unused in payload_to_placeholder. Consider renaming it to '_' for clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_vOWp03CYmr74s4qy


@dinmukhamedm dinmukhamedm merged commit 404098d into main Sep 11, 2025
6 checks passed