fix: migrate OpenAI provider to use Responses API #11674
Conversation
…vitas#11639)

### Changes 🏗️

Chat should be disabled by default; otherwise it flashes, and if Launch Darkly fails, it is dangerous.

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run locally with Launch Darkly disabled and test the above
Updates the OpenAI provider in `llm.py` to use the newer Responses API (`responses.create`) instead of the legacy Chat Completions API (`chat.completions.create`).

Changes:
- Replace `chat.completions.create` with `responses.create`
- Use the `input` parameter instead of `messages`
- Use `max_output_tokens` instead of `max_completion_tokens`
- Parse `response.output_text` instead of `choices[0].message.content`
- Use `input_tokens`/`output_tokens` for token usage tracking
- Add helper functions for extracting reasoning and tool calls from the Responses API response format
- Update tests to mock the new API format

The chat service is not migrated, as it uses OpenRouter, which requires the Chat Completions API format.

Closes Significant-Gravitas#11624
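For orientation, a minimal sketch of the before/after call shape this description implies, assuming an `AsyncOpenAI` client; the model name, prompts, and token limit are placeholders, not the block's actual wiring.

```python
from openai import AsyncOpenAI

# Illustrative values; the real block builds these from the node's inputs.
oai_client = AsyncOpenAI(api_key="sk-...")
sys_prompt = "You are a helpful assistant."
user_prompt = "Summarize the release notes."

async def old_call() -> str:
    # Before: legacy Chat Completions API.
    completion = await oai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": user_prompt},
        ],
        max_completion_tokens=1024,
    )
    # Token usage came from completion.usage.prompt_tokens / completion_tokens.
    return completion.choices[0].message.content or ""

async def new_call() -> str:
    # After: Responses API - the system prompt moves to `instructions`,
    # remaining messages go into `input`, and the cap is `max_output_tokens`.
    response = await oai_client.responses.create(
        model="gpt-4o",
        instructions=sys_prompt,
        input=[{"role": "user", "content": user_prompt}],
        max_output_tokens=1024,
    )
    # Usage now reports response.usage.input_tokens / output_tokens.
    return response.output_text
```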
This wasn't in my first commit; not sure why it's showing up here.
### Changes Summary

This PR migrates the native OpenAI API integration from the deprecated Chat Completions endpoint to the new Responses API (v1.66.0+). It introduces two new helper functions to parse Responses API output, updates the OpenAI provider section to use the new endpoint, and adjusts parameter mapping and response parsing accordingly.

Type: feature

Components affected: `backend.blocks.llm` (OpenAI provider), `test.test_llm` (test mocks), LLM infrastructure

### Files Changed
### Architecture Impact
### Risk Areas

- Response field mapping differences: `output_text` vs `message.content`, `input_tokens`/`output_tokens` vs `prompt_tokens`/`completion_tokens`
- Tool call extraction: handling the `call_id` vs `id` field in Responses API output
- Reasoning extraction: handling both string and array summary formats in reasoning output (see the sketch after this list)
- Error handling: new try/catch for `openai.APIError` (follows the Anthropic pattern)
- System message extraction: moved from `messages` to the `instructions` parameter
- Response format parameter mapping: `response_format` to the `text` parameter
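To make the reasoning-extraction risk area concrete, here is a hedged sketch of handling both summary formats. This is an illustration only, not the PR's actual `extract_responses_api_reasoning()` implementation; the item and attribute shapes are assumed from the Responses API output format.

```python
def extract_reasoning_summary(response) -> str | None:
    # Responses API `output` may contain items of type "reasoning" whose
    # `summary` is either a plain string or a list of summary parts.
    for item in getattr(response, "output", None) or []:
        if getattr(item, "type", None) != "reasoning":
            continue
        summary = getattr(item, "summary", None)
        if isinstance(summary, str):
            return summary
        if isinstance(summary, list):
            return "\n".join(getattr(part, "text", str(part)) for part in summary)
    return None
```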
### Suggestions
### Review Summary

Validated 78 issues: 42 kept, 36 filtered. Issues found: 42. 💬 See 17 individual line comment(s) for details. 📊 23 unique issue type(s) across 42 location(s).

📋 Full issue list:

🔴 CRITICAL - Array bounds check missing for `response.choices[0]` (3 occurrences)
Agent: bugs. Category: bug.
Why this matters: accessing empty arrays throws IndexError/undefined access (see the sketch below).
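A hedged sketch of the kind of guard this issue asks for on paths that still index `choices[0]`; the helper name and arguments are illustrative, not code from the PR.

```python
def first_choice_message(completion, model_name: str):
    # Guard before indexing: an empty `choices` list would otherwise raise IndexError.
    if not getattr(completion, "choices", None):
        raise ValueError(f"{model_name}: provider returned an empty choices list")
    return completion.choices[0].message
```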
🔴 CRITICAL - Redundant Optional usage with union type syntax
Agent: python. Category: quality.
Why this matters: type hints enable IDE autocomplete and catch type errors early.
Description: line 326 uses both `Optional[List[ToolContentBlock]]` and `| None` on the same field, creating redundancy and confusion. `Optional[X]` is equivalent to `X | None`.
Suggestion: change to `tool_calls: list[ToolContentBlock] | None = None`.
Confidence: 98%

🟠 HIGH - Redundant and illogical None comparison (2 occurrences)
Agent: python. Category: bug.
Why this matters: a redundant None check indicates a logic error or dead branch.

🟠 HIGH - Sequential async calls in loop instead of parallel gathering
Agent: performance. Category: performance.
Why this matters: N+1 queries cause severe performance degradation.
Description: the `_run` method in AITextSummarizerBlock processes chunks sequentially, using a for loop with `await` on each chunk. With multiple chunks this becomes O(n) serialized operations when they could be parallelized.
Suggestion: use `asyncio.gather()` or `asyncio.TaskGroup` to summarize all chunks concurrently, reducing wall-clock time from O(n*t) to O(t) (see the sketch below).
Confidence: 88%

🟠 HIGH - Inconsistent mock setup for async HTTP calls (2 occurrences)
Agent: microservices. Category: bug.
Why this matters: live I/O introduces slowness, nondeterminism, and external failures unrelated to the code.
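A hedged sketch of the concurrent-gathering pattern suggested for the sequential-async issue above; `summarize_chunk` is a placeholder for the block's existing per-chunk LLM call, not its real API.

```python
import asyncio

async def summarize_chunk(chunk: str) -> str:
    return chunk[:100]  # placeholder for the existing per-chunk LLM call

async def summarize_all(chunks: list[str]) -> list[str]:
    # Fan the per-chunk summaries out concurrently instead of awaiting them
    # one at a time in a for-loop; wall-clock time drops from O(n*t) toward O(t).
    return list(await asyncio.gather(*(summarize_chunk(c) for c in chunks)))
```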
🟠 HIGH - Error logged without sufficient context for debugging
Agent: bugs. Category: bug.
Why this matters: context-free errors are impossible to trace in production logs.
Description: when catching `openai.APIError`, the error message only includes the error text but lacks context about which model was being called, the input parameters, or the operation context. This makes debugging and monitoring difficult in production logs.
Suggestion: include relevant context in the error message: add the `llm_model` value, a summary of input messages, and operation context. Example: `f'OpenAI Responses API error for model {llm_model.value}: {str(e)}'`.
Confidence: 72%

🟠 HIGH - God Function with 329 lines handling 8 different LLM providers
Agent: architecture. Category: quality.
Why this matters: framework coupling makes code harder to test and migrate.
Description: the `llm_call` function is a monolithic function with completely separate logic paths for 8 LLM providers. Each provider has its own client instantiation, request building, response parsing, error handling, and token counting. This violates the Single Responsibility Principle.
Suggestion: refactor using the Strategy pattern: create an abstract LLMProvider base class with provider-specific implementations, and use a factory to instantiate the correct provider based on model metadata.
Confidence: 75%

🟠 HIGH - Expensive regex substitution on every parse failure
Agent: performance. Category: performance.
Description: line 1048 executes `re.sub(r'[A-Za-z0-9]', '*', response_text)` to censor responses for logging on every parse failure. The regex is recompiled each time and operates on potentially large text.
Suggestion: pre-compile the regex at module level: `CENSOR_PATTERN = re.compile(r'[A-Za-z0-9]')`. Better: only censor a fixed-size snippet instead of the entire response (see the sketch below).
Confidence: 85%

🟠 HIGH - Missing timeout on external service call
Agent: python. Category: bug.
Description: the `oai_client.responses.create()` call at line 525 has no explicit timeout parameter. Compare with the anthropic client at line 575, which properly includes `timeout=600`.
Suggestion: add a timeout parameter to the `responses_params` dictionary before making the call, similar to the Anthropic call.
Confidence: 90%

🟠 HIGH - Incomplete error handling for external service call
Agent: python. Category: bug.
Description: the `oai_client.responses.create()` call only catches `openai.APIError`. Other exceptions like `asyncio.TimeoutError` and `ConnectionError` are not caught.
Suggestion: expand the try-except to catch additional exceptions: `asyncio.TimeoutError`, `ConnectionError`, etc.
Confidence: 80%

🟠 HIGH - Overly broad exception handling
Agent: python. Category: quality.
Description: a bare Exception clause catches all exceptions, including SystemExit and KeyboardInterrupt. It should catch specific exception types.
Suggestion: replace `except Exception as e` with specific exception types like `(openai.APIError, anthropic.APIError, ValueError)`.
Confidence: 90%

🟠 HIGH - Input parameter 'input_data' is modified in-place (2 occurrences)
Agent: python. Category: quality.
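A hedged sketch combining three of the suggestions above (module-level censor pattern, explicit timeout, and model context in the error); the wrapper function and parameter names are assumptions, not the existing code in `llm.py`.

```python
import re

import openai

# Compile once at module import instead of on every parse failure, and only
# censor a bounded snippet rather than the whole response.
CENSOR_PATTERN = re.compile(r"[A-Za-z0-9]")

def censored_snippet(response_text: str, limit: int = 500) -> str:
    return CENSOR_PATTERN.sub("*", response_text[:limit])

async def create_response(oai_client: openai.AsyncOpenAI, model: str, responses_params: dict):
    try:
        # Explicit per-request timeout, mirroring the 600s used for the Anthropic client.
        return await oai_client.responses.create(model=model, timeout=600, **responses_params)
    except (openai.APIError, TimeoutError) as e:
        # Carry the model name so production logs are traceable.
        raise ValueError(f"OpenAI Responses API error for model {model}: {e}") from e
```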
🟠 HIGH - Duplicate import in function body (5 occurrences)
Agent: python. Category: quality.

🟡 MEDIUM - Test mocks internal method - limited integration coverage (2 occurrences)
Agent: testing. Category: quality.
Why this matters: improper mocks make tests brittle and unreliable.

🟡 MEDIUM - Missing return type annotation (2 occurrences)
Agent: python. Category: quality.
Why this matters: type hints prevent whole classes of bugs and improve IDE/refactor support.

🟡 MEDIUM - Double loop through response.content blocks (2 occurrences)
Agent: performance. Category: performance.

🟡 MEDIUM - Debug print statements in test code
Agent: python. Category: quality.
Description: debug `print()` statements were left in a test function. They should be removed or replaced with logging.
Suggestion: replace with `logger.debug()` calls, use pytest's `caplog` fixture, or remove if no longer needed.
Confidence: 85%

🟡 MEDIUM - Test assertion comments contradict expected behavior
Agent: refactoring. Category: quality.
Description: line 187 states 'llm_call_count is only set on success, so it shows 1', but line 190 asserts `llm_call_count == 2`. The comment explanation is misleading; it should actually be retry_count + 1 = 2, as stated in line 190's inline comment.
Suggestion: remove or correct the misleading comment at line 187. The assertion at line 190 with its inline comment 'retry_count + 1 = 1 + 1 = 2' is correct, but line 187 contradicts this.
Confidence: 75%

🟡 MEDIUM - User input formatted into prompts via Jinja2
Agent: security. Category: security.
Description: user-supplied `prompt_values` are formatted into system and user prompts using the Jinja2 SandboxedEnvironment (via `fmt.format_string()`). While sandboxed, `autoescape=False` at initialization allows potential content manipulation in prompts.
Suggestion: consider enabling autoescape for the TextFormatter used in LLM blocks, or validate that `prompt_values` don't contain suspicious patterns. The SandboxedEnvironment mitigates template injection but doesn't prevent prompt manipulation (see the sketch below).
Confidence: 65%

🟡 MEDIUM - User input directly embedded into LLM prompts
Agent: security. Category: security.
Description: user-supplied `input_data.focus` and `input_data.source_data` are directly embedded into prompts using f-strings. While this is common for LLM applications, sensitive data could be sent to external providers.
Suggestion: consider implementing PII detection or data sanitization for sensitive patterns before embedding user input into prompts sent to external LLM providers.
Confidence: 62%

🟡 MEDIUM - Missing Input Validation for max_tokens Parameter (2 occurrences)
Agent: security. Category: security.
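A hedged sketch of the autoescape suggestion above; how the existing TextFormatter actually constructs its environment is an assumption, and whether HTML-escaping is appropriate for LLM prompts is a judgment the reviewer leaves open.

```python
from jinja2.sandbox import SandboxedEnvironment

# SandboxedEnvironment blocks template injection; autoescape additionally
# escapes markup-like user values when they are substituted into templates.
env = SandboxedEnvironment(autoescape=True)

user_supplied = {"focus": "<script>alert(1)</script> key revenue drivers"}
rendered = env.from_string("Summarize with a focus on {{ focus }}.").render(**user_supplied)
print(rendered)  # the tag is escaped rather than passed through verbatim
```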
🟡 MEDIUM - OpenAI Responses API Call Without Explicit Timeout (6 occurrences)
Agent: security. Category: security.

🟡 MEDIUM - Test assertions too vague to catch bugs (2 occurrences)
Agent: testing. Category: quality.
ℹ️ 25 issue(s) outside the PR diff:
🔴 CRITICAL - Redundant Optional usage with union type syntax
Agent: python. Category: quality.
Why this matters: type hints enable IDE autocomplete and catch type errors early.
Description: line 326 uses both `Optional[List[ToolContentBlock]]` and `| None` on the same field, creating redundancy and confusion. `Optional[X]` is equivalent to `X | None`.
Suggestion: change to `tool_calls: list[ToolContentBlock] | None = None`.
Confidence: 98%

🟠 HIGH - Redundant and illogical None comparison (2 occurrences)
Agent: python. Category: bug.
Why this matters: a redundant None check indicates a logic error or dead branch.

🟠 HIGH - Sequential async calls in loop instead of parallel gathering
Agent: performance. Category: performance.
Why this matters: N+1 queries cause severe performance degradation.
Description: the `_run` method in AITextSummarizerBlock processes chunks sequentially, using a for loop with `await` on each chunk. With multiple chunks this becomes O(n) serialized operations when they could be parallelized.
Suggestion: use `asyncio.gather()` or `asyncio.TaskGroup` to summarize all chunks concurrently, reducing wall-clock time from O(n*t) to O(t).
Confidence: 88%

🟠 HIGH - Expensive regex substitution on every parse failure
Agent: performance. Category: performance.
Description: line 1048 executes `re.sub(r'[A-Za-z0-9]', '*', response_text)` to censor responses for logging on every parse failure. The regex is recompiled each time and operates on potentially large text.
Suggestion: pre-compile the regex at module level: `CENSOR_PATTERN = re.compile(r'[A-Za-z0-9]')`. Better: only censor a fixed-size snippet instead of the entire response.
Confidence: 85%

🟠 HIGH - Overly broad exception handling
Agent: python. Category: quality.
Description: a bare Exception clause catches all exceptions, including SystemExit and KeyboardInterrupt. It should catch specific exception types.
Suggestion: replace `except Exception as e` with specific exception types like `(openai.APIError, anthropic.APIError, ValueError)`.
Confidence: 90%

🟠 HIGH - Input parameter 'input_data' is modified in-place (2 occurrences)
Agent: python. Category: quality.

🟡 MEDIUM - Test mocks internal method - limited integration coverage (2 occurrences)
Agent: testing. Category: quality.
Why this matters: improper mocks make tests brittle and unreliable.

🟡 MEDIUM - Missing type annotation for error parameter
Agent: python. Category: quality.
Why this matters: type hints prevent whole classes of bugs and improve IDE/refactor support.
Description: the parameter 'error' in the method 'invalid_response_feedback' lacks a type annotation. This method accepts different error types and should document them.
Suggestion: add a type annotation: `def invalid_response_feedback(self, error: Union[ValueError, JSONDecodeError, str], ...) -> str:`.
Confidence: 70%

🟡 MEDIUM - Double loop through response.content blocks
Agent: performance. Category: performance.
Description: the function loops through `resp.content` twice: lines 582-597 for tool_use and lines 605-608 for thinking-type extraction.
Suggestion: combine into a single iteration through `resp.content`, checking for both types in one pass.
Confidence: 70%

🟡 MEDIUM - Import inside function body (3 occurrences)
Agent: python. Category: quality.

🟡 MEDIUM - Test assertion comments contradict expected behavior
Agent: refactoring. Category: quality.
Description: line 187 states 'llm_call_count is only set on success, so it shows 1', but line 190 asserts `llm_call_count == 2`. The comment explanation is misleading; it should actually be retry_count + 1 = 2, as stated in line 190's inline comment.
Suggestion: remove or correct the misleading comment at line 187. The assertion at line 190 with its inline comment 'retry_count + 1 = 1 + 1 = 2' is correct, but line 187 contradicts this.
Confidence: 75%

🟡 MEDIUM - User input formatted into prompts via Jinja2
Agent: security. Category: security.
Description: user-supplied `prompt_values` are formatted into system and user prompts using the Jinja2 SandboxedEnvironment (via `fmt.format_string()`). While sandboxed, `autoescape=False` at initialization allows potential content manipulation in prompts.
Suggestion: consider enabling autoescape for the TextFormatter used in LLM blocks, or validate that `prompt_values` don't contain suspicious patterns. The SandboxedEnvironment mitigates template injection but doesn't prevent prompt manipulation.
Confidence: 65%

🟡 MEDIUM - User input directly embedded into LLM prompts
Agent: security. Category: security.
Description: user-supplied `input_data.focus` and `input_data.source_data` are directly embedded into prompts using f-strings. While this is common for LLM applications, sensitive data could be sent to external providers.
Suggestion: consider implementing PII detection or data sanitization for sensitive patterns before embedding user input into prompts sent to external LLM providers.
Confidence: 62%

🟡 MEDIUM - Missing Range Validation for chunk_overlap Parameter
Agent: security. Category: security.
Description: `chunk_overlap` has `ge=0` but no upper bound, and no cross-field validation ensures overlap < max_tokens, which could create invalid chunk configurations.
Suggestion: add validation to ensure `chunk_overlap < max_tokens` to prevent invalid chunking behavior.
Confidence: 70%

🟡 MEDIUM - Groq API Call Without Explicit Timeout (5 occurrences)
Agent: security. Category: security.

🟡 MEDIUM - Test assertions too vague to catch bugs
Agent: testing. Category: quality.
Description: the test uses assertions like …
Suggestion: replace …
Confidence: 75%

Review ID:
- Use `getattr` with fallback for tool call ID extraction (sketched below)
- Add return type annotation to `get_parallel_tool_calls_param`
- Add timeout (600s) to OpenAI Responses API call
- Add model context to error messages
- Handle `TimeoutError` in addition to `APIError`
- Remove duplicate imports in test file
- Remove debug print statements in test file
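A hedged sketch of the `call_id`/`id` fallback named in the first bullet; the item shape is assumed from the Responses API function-call output format, and this is not the PR's exact code.

```python
def tool_call_id(item) -> str | None:
    # Responses API function-call items normally expose `call_id`; fall back
    # to `id` if only that attribute is present on a given item or SDK version.
    return getattr(item, "call_id", None) or getattr(item, "id", None)
```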
The `raw_response` field is used by `smart_decision_maker.py` for conversation history. It expects a message dict with 'role' and 'content' keys, not the raw Response object.

- Construct a message dict with `role='assistant'` and the content (see the sketch below)
- Include `tool_calls` in OpenAI format when present
- Fixes multi-turn conversation and retry logic
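A minimal sketch of the assistant-message dict described above; the helper name is illustrative, `response` is assumed to be a Responses API result, and `tool_calls` an already-converted OpenAI-format list.

```python
def build_raw_response(response, tool_calls: list[dict] | None) -> dict:
    # smart_decision_maker.py expects an assistant message dict, not the raw
    # Response object, so re-shape the Responses API result accordingly.
    message: dict = {"role": "assistant", "content": response.output_text}
    if tool_calls:
        message["tool_calls"] = tool_calls  # already in OpenAI tool_calls format
    return message
```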
## Summary

Migrates OpenAI native API calls from the deprecated `chat.completions.create` endpoint to the new `responses.create` endpoint, as recommended by OpenAI.

Fixes #11624
## Changes

### Core Changes

- Updated the OpenAI provider in `llm.py` to use `client.responses.create()`
- Added `extract_responses_api_reasoning()` helper to parse reasoning output (handles both string and array summary formats)
- Added `extract_responses_api_tool_calls()` helper to parse function calls
- System messages are passed via the `instructions` parameter (Responses API requirement)
### Parameter Mapping (Chat Completions → Responses API)

- `messages` → `input` (non-system messages only)
- System message → `instructions` parameter
- `max_completion_tokens` → `max_output_tokens`
- `response_format={...}` → `text={"format": {...}}` (see the JSON-mode sketch below)
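As a concrete example of the last mapping, a hedged sketch for plain JSON mode; `responses_params` and the flag name are assumed local names, and the exact inner shape used for schema-based structured output may differ.

```python
# Chat Completions forced JSON output with response_format={"type": "json_object"};
# with the Responses API the same intent moves under the `text` parameter.
responses_params: dict = {}
force_json_output = True  # illustrative flag, mirroring the block's input name
if force_json_output:
    responses_params["text"] = {"format": {"type": "json_object"}}
```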
### Response Parsing (Chat Completions → Responses API)

- `choices[0].message.content` → `output_text`
- `usage.prompt_tokens` → `usage.input_tokens`
- `usage.completion_tokens` → `usage.output_tokens`
- `choices[0].message.tool_calls` → `output` items with `type="function_call"` (see the sketch below)
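A hedged sketch of parsing function calls out of the Responses API output; this is an illustration of the mapping above, not the PR's exact `extract_responses_api_tool_calls()` implementation.

```python
def extract_function_calls(response) -> list[dict] | None:
    # Collect `function_call` items from the Responses API `output` list and
    # re-shape them like Chat Completions tool_calls for downstream blocks.
    calls = []
    for item in getattr(response, "output", None) or []:
        if getattr(item, "type", None) != "function_call":
            continue
        calls.append({
            "id": getattr(item, "call_id", None) or getattr(item, "id", None),
            "type": "function",
            "function": {"name": item.name, "arguments": item.arguments},
        })
    return calls or None
```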
## Compatibility

### SDK Version

The Responses API is available in the openai SDK from v1.66.0 onwards.

### API Compatibility

- `llm_call()` function signature - UNCHANGED
- `LLMResponse` class structure - UNCHANGED
### Provider Impact

- `openai` - YES, modified (native OpenAI - uses Responses API)
- `anthropic` - NO (different SDK entirely)
- `groq` - NO (third-party API, Chat Completions compatible)
- `open_router` - NO (third-party API, Chat Completions compatible)
- `llama_api` - NO (third-party API, Chat Completions compatible)
- `ollama` - NO (uses ollama SDK)
- `aiml_api` - NO (third-party API, Chat Completions compatible)
- `v0` - NO (third-party API, Chat Completions compatible)
### Dependent Blocks Verified

- `smart_decision_maker.py` (line 508) - uses: response, tool_calls, prompt_tokens, completion_tokens, reasoning - COMPATIBLE
- `ai_condition.py` (line 113) - uses: response, prompt_tokens, completion_tokens, prompt - COMPATIBLE
- `perplexity.py` - does not use llm_call (uses a different API) - NOT AFFECTED
### Streaming Service

`backend/server/v2/chat/service.py` is NOT affected - it uses OpenRouter by default, which requires the Chat Completions API format.
## Testing

### Test File Updates

- Updated `test_llm.py` mocks to use `output_text` instead of `choices[0].message.content` (mock shape sketched below)
- Mock the `output` array for tool calls
- Use `usage.input_tokens`/`usage.output_tokens`
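A hedged sketch of the updated mock shape; the names are illustrative, not the exact fixtures in `test_llm.py`.

```python
from unittest.mock import AsyncMock, MagicMock

# The fake response now exposes Responses API fields instead of choices[0].
mock_response = MagicMock()
mock_response.output_text = "summary text"
mock_response.output = []  # no tool calls or reasoning items in this case
mock_response.usage.input_tokens = 10
mock_response.usage.output_tokens = 5

mock_client = MagicMock()
mock_client.responses.create = AsyncMock(return_value=mock_response)
```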
### Verification Performed

- Tool call ID extraction uses `call_id` with a fallback to `id`
- Reasoning extraction handles both string and array `summary` formats
### Recommended Manual Testing

- JSON output mode (`force_json_output=True`)
## Files Modified

1. `autogpt_platform/backend/backend/blocks/llm.py`
   - Added `extract_responses_api_reasoning()` helper
   - Added `extract_responses_api_tool_calls()` helper
   - Migrated the OpenAI provider to `responses.create`
   - System messages passed via the `instructions` parameter
2. `autogpt_platform/backend/backend/blocks/test/test_llm.py`

## References
## Checklist

### Changes

### Code Quality