Conversation

dinmukhamedm
Member

@dinmukhamedm dinmukhamedm commented Sep 10, 2025

Important

Enhance litellm with new CUA instrumentation, improved response handling, and extensive testing for OpenAI and Anthropic integrations.

  • Instrumentation:
    • Add CuaAgentInstrumentor and CuaComputerInstrumentor in cua_agent/__init__.py and cua_computer/__init__.py for OpenTelemetry instrumentation.
    • Update _instrument_initializers.py and instruments.py to include new instrumentors.
  • Litellm Enhancements:
    • Refactor _process_input_messages and _process_response_output in litellm/__init__.py to handle new message types and improve content processing.
    • Add is_validator_iterator() in utils.py to handle iterable content types.
  • Testing:
    • Add tests for litellm responses with computer tools in test_litellm_openai.py and test_litellm_anthropic.py.
    • Update test cases to handle new response structures and tool integrations.
  • Miscellaneous:
    • Update pyproject.toml to set target-version for ruff to py310.
    • Minor exception handling improvement in decorators/__init__.py.
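A minimal sketch of what the `is_validator_iterator()` helper might look like, assuming it detects pydantic-core's lazily-validated iterator wrapper by type name (that class is not part of pydantic's public API, so the actual helper in `utils.py` may differ):

```python
def is_validator_iterator(content) -> bool:
    """Best-effort check for pydantic-core's ValidatorIterator wrapper.

    Streamed or lazily-validated iterable fields in pydantic v2 are wrapped
    in a private ValidatorIterator type, so this matches on the type name
    rather than importing pydantic internals.
    """
    return content.__class__.__name__ == "ValidatorIterator"
```

A check like this lets message-content processing materialize such iterators (e.g. via `list(content)`) before serializing span attributes.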

This description was created by Ellipsis for 5dee2d5. You can customize this summary. It will automatically update as commits are pushed.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to c3339c6 in 3 minutes and 33 seconds. Click for details.
  • Reviewed 1363 lines of code in 13 files
  • Skipped 0 files when reviewing.
  • Skipped posting 7 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. tests/test_litellm_openai.py:89
  • Draft comment:
    Multiple tests use repeated time.sleep(SLEEP_TO_FLUSH_SECONDS) and Laminar.flush() calls. Consider encapsulating this wait+flush logic into a helper function or fixture to reduce duplication and improve test maintainability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. tests/test_litellm_openai.py:96
  • Draft comment:
    Several assertions hard-code token counts (e.g. input_tokens == 14). These magic numbers can be brittle if underlying tokenization changes. Consider either calculating expected token counts dynamically or documenting these fixed values explicitly.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. tests/test_litellm_openai.py:736
  • Draft comment:
    Assertions on structured output schema vary between Pydantic and non-Pydantic cases. Ensure this behavior is documented and intentional, or consider unifying the schema expectations.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. tests/test_litellm_anthropic.py:353
  • Draft comment:
    Many tests duplicate similar assertions for attributes like model names, token counts, and prompt content. Consider parameterizing or abstracting these common checks into helper functions to reduce duplication.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. src/lmnr/opentelemetry_lib/litellm/__init__.py:348
  • Draft comment:
    Typographical error: The comment says "messge"; please correct it to "message".
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
6. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:367
  • Draft comment:
    Typo in comment: 'messge' should be 'message'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% While this is a real typo in a comment that was added in this diff, fixing typos in comments is very low value. The comment is still perfectly understandable despite the typo. Our rules say not to make comments that are obvious or unimportant, and a minor typo in a comment definitely qualifies as unimportant. The typo is real and was introduced in this diff, so technically it's a valid comment about a changed line. And having correct spelling in comments is good practice. While correct spelling is good, this kind of minor typo fix creates noise in the PR review and distracts from more important issues. The meaning is still clear despite the typo. Delete this comment as it points out an unimportant typo in a comment that doesn't impact code functionality or readability.
7. tests/cassettes/test_litellm_openai/test_litellm_openai_responses_with_computer_tools.yaml:3
  • Draft comment:
    It appears that the JSON string for the 'text' value in the 'input_text' object is split across two lines ("Take" on line 3 and "a screenshot of the desktop." on line 4). If this newline is unintentional, it could result in an unexpected value. Please confirm whether the text should be on a single line.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
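The wait-and-flush helper suggested in draft comment 1 could be a small function along these lines (a sketch only; `SLEEP_TO_FLUSH_SECONDS` is defined by the test suite and the placeholder value here is an assumption, as is passing the flush callable in rather than importing `Laminar` directly):

```python
import time

SLEEP_TO_FLUSH_SECONDS = 0.05  # placeholder; the test module defines its own value


def wait_and_flush(flush_fn, sleep_seconds=SLEEP_TO_FLUSH_SECONDS):
    """Encapsulate the repeated sleep-then-flush pattern from the tests.

    flush_fn is expected to be something like Laminar.flush; taking it as
    a parameter keeps this helper independent of the SDK import.
    """
    time.sleep(sleep_seconds)
    flush_fn()
```

Each test would then call `wait_and_flush(Laminar.flush)` instead of repeating the sleep/flush pair inline.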

Workflow ID: wflow_fsOCbw3raRDvKdnd

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

dinmukhamedm and others added 4 commits September 10, 2025 21:34
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
…gent/__init__.py

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
…omputer/__init__.py

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 96293e9 in 49 seconds. Click for details.
  • Reviewed 12 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
1. pyproject.toml:148
  • Draft comment:
    Added [tool.ruff] config looks correct; consider adding a trailing newline for consistency.
  • Reason this comment was not posted:
    Confidence changes required: 30% <= threshold 50%
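For reference, the `[tool.ruff]` section mentioned above would look roughly like this (assuming only the target version was set; the actual pyproject.toml may contain additional keys):

```toml
[tool.ruff]
target-version = "py310"
```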

Workflow ID: wflow_ojKilvI5yVq39hy5


Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed b25406d in 2 minutes and 25 seconds. Click for details.
  • Reviewed 87 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_agent/__init__.py:52
  • Draft comment:
    Using 'continue' inside the nested try block may skip the explicit call to end the span. Verify that Laminar.use_span properly ends the span when 'continue' is triggered.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment raises a valid concern about control flow, but misses a crucial detail - the span is managed by a context manager (with Laminar.use_span(step_span)). Context managers in Python are specifically designed to handle cleanup even when control flow changes via continue/break/return/exceptions. The use_span context manager should properly end the span regardless of how the block exits. I could be wrong about how Laminar.use_span is implemented - maybe it's not a proper context manager? The code could be from an old version with different behavior? The use of the 'with' statement strongly indicates this is a proper context manager, and context manager cleanup is a fundamental Python language feature that would be very unusual to break. The comment should be deleted because it misunderstands Python context managers - the span will be properly ended by the context manager regardless of the continue statement.
2. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/__init__.py:343
  • Draft comment:
    The span creation now includes span_type="TOOL". Confirm that Laminar.start_as_current_span supports the 'span_type' parameter and that this change aligns with downstream expectations.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to confirm that a specific function supports a new parameter and that the change aligns with downstream expectations. This falls under asking the author to confirm their intention and ensure behavior, which is against the rules.
3. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/__init__.py:131
  • Draft comment:
    The output_formatter for the 'screenshot' method has been switched from payload_to_base64url to payload_to_placeholder. Ensure that downstream components expecting a base64 URL can handle the placeholder format.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to ensure that downstream components can handle the new format, which violates the rule against asking the author to ensure compatibility or test changes. It doesn't provide a specific suggestion or ask for a test to be written, so it should be removed.
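On the context-manager question in draft comment 1 above: Python guarantees that a context manager's `__exit__` runs on any exit from a `with` block, including `continue`. A self-contained demonstration with a stand-in for `Laminar.use_span` (the real API is not used here):

```python
from contextlib import contextmanager

ended = []


@contextmanager
def use_span(name):
    # Stand-in for Laminar.use_span: the finally clause plays the role
    # of ending the span in __exit__.
    try:
        yield name
    finally:
        ended.append(name)


for i in range(3):
    with use_span(f"step-{i}"):
        if i == 1:
            continue  # exits the with block early; __exit__ still runs
```

After the loop, `ended` contains an entry for every iteration, including the one that hit `continue`, which is why the draft comment was discarded.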

Workflow ID: wflow_wS8MxhsUklukA9ao




def payload_to_placeholder(payload_bytes: bytes) -> bytes:
    return "<BINARY_BLOB_SCREENSHOT>"
Contributor


Return type mismatch: payload_to_placeholder is annotated to return bytes but returns a string literal. Consider returning a bytes literal (e.g., b"<BINARY_BLOB_SCREENSHOT>") or updating the type annotation.

Suggested change
-    return "<BINARY_BLOB_SCREENSHOT>"
+    return b"<BINARY_BLOB_SCREENSHOT>"

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 5dee2d5 in 1 minute and 14 seconds. Click for details.
  • Reviewed 11 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
1. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/utils.py:11
  • Draft comment:
    Adjusted return type to str matches the actual returned literal. Confirm callers are expecting a string, not bytes.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. src/lmnr/opentelemetry_lib/opentelemetry/instrumentation/cua_computer/utils.py:11
  • Draft comment:
    The parameter 'payload_bytes' is unused in payload_to_placeholder. Consider renaming it to '_' for clarity.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_vOWp03CYmr74s4qy


@dinmukhamedm dinmukhamedm merged commit 404098d into main Sep 11, 2025
6 checks passed