
Conversation

@devbyteai

Summary

Migrates OpenAI native API calls from the deprecated chat.completions.create endpoint to the new responses.create endpoint as recommended by OpenAI.

Fixes #11624

Changes

Core Changes

  1. Updated OpenAI provider in llm.py to use client.responses.create()
  2. Added extract_responses_api_reasoning() helper to parse reasoning output (handles both string and array summary formats)
  3. Added extract_responses_api_tool_calls() helper to parse function calls
  4. Added error handling for API errors (matching Anthropic provider pattern)
  5. Extract system messages to instructions parameter (Responses API requirement)
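For illustration, the two helpers listed above might look roughly like this; the field access follows the documented Responses API output shape (items with type="reasoning" or type="function_call"), and the actual implementation in llm.py may differ:

```python
# Sketch only: not the PR's exact code. Assumes Responses API output items
# with type="reasoning" (summary may be a string or a list of parts) and
# type="function_call" (call_id/name/arguments).
def extract_responses_api_reasoning(response) -> str | None:
    parts: list[str] = []
    for item in response.output:
        if getattr(item, "type", None) != "reasoning":
            continue
        summary = getattr(item, "summary", None)
        if isinstance(summary, str):
            parts.append(summary)
        elif isinstance(summary, list):
            parts.extend(getattr(part, "text", str(part)) for part in summary)
    return "\n".join(parts) or None


def extract_responses_api_tool_calls(response) -> list[dict]:
    return [
        {
            "id": getattr(item, "call_id", None) or getattr(item, "id", None),
            "name": item.name,
            "arguments": item.arguments,
        }
        for item in response.output
        if getattr(item, "type", None) == "function_call"
    ]
```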

Parameter Mapping (Chat Completions → Responses API)

  1. messages → input (non-system messages only)
  2. System messages → instructions parameter
  3. max_completion_tokens → max_output_tokens
  4. response_format={...} → text={"format":{...}}
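A minimal sketch of the request-side mapping above (variable names are illustrative, not the PR's actual code; parameter names follow the Responses API docs):

```python
# Illustrative request construction; "prompt", "llm_model", "max_tokens" and
# "force_json_output" stand in for whatever the real call site uses.
system_messages = [m["content"] for m in prompt if m["role"] == "system"]
input_messages = [m for m in prompt if m["role"] != "system"]

responses_params = {
    "model": llm_model.value,
    "input": input_messages,                  # was: messages=[...]
    "max_output_tokens": max_tokens,          # was: max_completion_tokens
}
if system_messages:
    responses_params["instructions"] = "\n".join(system_messages)
if force_json_output:
    # was: response_format={"type": "json_object"}
    responses_params["text"] = {"format": {"type": "json_object"}}

response = await oai_client.responses.create(**responses_params)
```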

Response Parsing (Chat Completions → Responses API)

  1. choices[0].message.content → output_text
  2. usage.prompt_tokens → usage.input_tokens
  3. usage.completion_tokens → usage.output_tokens
  4. choices[0].message.tool_calls → output items with type="function_call"
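And the response-side mapping, under the same assumptions:

```python
# Illustrative response parsing; attribute names follow the Responses API docs.
response_text = response.output_text              # was: choices[0].message.content
prompt_tokens = response.usage.input_tokens       # was: usage.prompt_tokens
completion_tokens = response.usage.output_tokens  # was: usage.completion_tokens
tool_call_items = [                               # was: choices[0].message.tool_calls
    item for item in response.output if item.type == "function_call"
]
```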

Compatibility

SDK Version

  1. Required: openai >= 1.66.0 (Responses API added in v1.66.0)
  2. AutoGPT uses: ^1.97.1 (COMPATIBLE)

API Compatibility

  1. llm_call() function signature - UNCHANGED
  2. LLMResponse class structure - UNCHANGED
  3. Return type and fields - UNCHANGED

Provider Impact

  1. openai - YES, modified (Native OpenAI - uses Responses API)
  2. anthropic - NO (Different SDK entirely)
  3. groq - NO (Third-party API, Chat Completions compatible)
  4. open_router - NO (Third-party API, Chat Completions compatible)
  5. llama_api - NO (Third-party API, Chat Completions compatible)
  6. ollama - NO (Uses ollama SDK)
  7. aiml_api - NO (Third-party API, Chat Completions compatible)
  8. v0 - NO (Third-party API, Chat Completions compatible)

Dependent Blocks Verified

  1. smart_decision_maker.py (Line 508) - Uses: response, tool_calls, prompt_tokens, completion_tokens, reasoning - COMPATIBLE
  2. ai_condition.py (Line 113) - Uses: response, prompt_tokens, completion_tokens, prompt - COMPATIBLE
  3. perplexity.py - Does not use llm_call (uses different API) - NOT AFFECTED

Streaming Service

backend/server/v2/chat/service.py is NOT affected - it uses OpenRouter by default, which requires the Chat Completions API format.

Testing

Test File Updates

  1. Updated test_llm.py mocks to use output_text instead of choices[0].message.content
  2. Updated mocks to use output array for tool calls
  3. Updated mocks to use usage.input_tokens / usage.output_tokens
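For reference, a mock shaped like a Responses API result could look roughly like this (the surrounding test fixtures are assumed and not shown here):

```python
from unittest.mock import MagicMock

# Hypothetical mock object mirroring the Responses API fields used above.
mock_response = MagicMock()
mock_response.output_text = "The summarized text."
mock_response.output = []                 # no reasoning or function_call items
mock_response.usage.input_tokens = 25
mock_response.usage.output_tokens = 10
```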

Verification Performed

  1. SDK version compatibility verified (1.97.1 > 1.66.0)
  2. Function signature unchanged
  3. LLMResponse class unchanged
  4. All 7 other providers unchanged
  5. Dependent blocks use only public API
  6. Streaming service unaffected (uses OpenRouter)
  7. Error handling matches Anthropic provider pattern
  8. Tool call extraction handles call_id with fallback to id
  9. Reasoning extraction handles both string and array summary formats

Recommended Manual Testing

  1. Test with GPT-4o model using native OpenAI API
  2. Test with tool/function calling enabled
  3. Test with JSON mode (force_json_output=True)
  4. Verify token counting works correctly

Files Modified

1. autogpt_platform/backend/backend/blocks/llm.py

  1. Added extract_responses_api_reasoning() helper
  2. Added extract_responses_api_tool_calls() helper
  3. Updated OpenAI provider section to use responses.create
  4. Added error handling with try/except
  5. Extract system messages to instructions parameter

2. autogpt_platform/backend/backend/blocks/test/test_llm.py

  1. Updated mocks for Responses API format

References

  1. OpenAI Responses API Docs
  2. OpenAI Function Calling Docs
  3. OpenAI Reasoning Docs
  4. Simon Willison's Comparison
  5. OpenAI Python SDK v1.66.0 Release

Checklist

Changes

  • I have clearly listed my changes in the PR description
  • I have made a test plan
  • I have tested my changes according to the test plan:
    • Updated unit test mocks to use Responses API format
    • Verified function signature unchanged
    • Verified LLMResponse class unchanged
    • Verified dependent blocks compatible
    • Verified other providers unchanged

Code Quality

  • My code follows the project's style guidelines
  • I have commented my code where necessary
  • My changes generate no new warnings
  • I have added error handling matching existing patterns

0ubbe and others added 2 commits December 18, 2025 19:04
…vitas#11639)

### Changes 🏗️

Chat should be disabled by default; otherwise it flashes, and if Launch
Darkly fails to load, it is dangerous.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run locally with Launch Darkly disabled and test the above
Updates the OpenAI provider in llm.py to use the newer Responses API
(responses.create) instead of the legacy Chat Completions API
(chat.completions.create).

Changes:
- Replace chat.completions.create with responses.create
- Use 'input' parameter instead of 'messages'
- Use 'max_output_tokens' instead of 'max_completion_tokens'
- Parse response.output_text instead of choices[0].message.content
- Use input_tokens/output_tokens for token usage tracking
- Add helper functions for extracting reasoning and tool calls from
  the Responses API response format
- Update tests to mock the new API format

The chat service is not migrated as it uses OpenRouter which requires
the Chat Completions API format.

Closes Significant-Gravitas#11624
@devbyteai devbyteai requested a review from a team as a code owner December 27, 2025 18:21
@devbyteai devbyteai requested review from Pwuts and Swiftyos and removed request for a team December 27, 2025 18:21
@github-project-automation github-project-automation bot moved this to 🆕 Needs initial review in AutoGPT development kanban Dec 27, 2025
@coderabbitai

coderabbitai bot commented Dec 27, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@github-actions
Contributor

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

@github-actions github-actions bot changed the base branch from master to dev December 27, 2025 18:21
@github-actions github-actions bot added platform/frontend AutoGPT Platform - Front end platform/blocks labels Dec 27, 2025
wasn't in my first commit, dunno what's going on with this
@github-actions github-actions bot removed the platform/frontend AutoGPT Platform - Front end label Dec 27, 2025
@diffray-bot

Changes Summary

This PR migrates the native OpenAI API integration from the deprecated Chat Completions endpoint to the new Responses API (v1.66.0+). It introduces two new helper functions to parse Responses API output, updates the OpenAI provider section to use the new endpoint, and adjusts parameter mapping and response parsing accordingly.

Type: feature

Components Affected: backend.blocks.llm (OpenAI provider), test.test_llm (test mocks), LLM infrastructure

Files Changed
  • File: .../autogpt_platform/backend/backend/blocks/llm.py
    Summary: Updated OpenAI provider to use Responses API instead of Chat Completions; added extract_responses_api_reasoning() and extract_responses_api_tool_calls() helpers; refactored parameter mapping and response parsing.
    Change: ✏️  Impact: 🔴
  • File: ...latform/backend/backend/blocks/test/test_llm.py
    Summary: Updated test mocks to reflect Responses API response format (output_text, input_tokens, output_tokens).
    Change: ✏️  Impact: 🟡
Architecture Impact
  • New Patterns: Adapter/wrapper pattern for API response extraction, Conditional response parsing based on API format
  • Dependencies: openai SDK: already compatible (1.97.1 >= 1.66.0 requirement)
  • Coupling: Reduced coupling to Chat Completions API format; new dependency on Responses API schema. Function signature and return types remain unchanged, so downstream blocks are unaffected.

Risk Areas:
  • Response field mapping differences: output_text vs message.content, input_tokens/output_tokens vs prompt_tokens/completion_tokens
  • Tool call extraction: handling call_id vs id field in Responses API output
  • Reasoning extraction: handling both string and array summary formats in reasoning output
  • Error handling: new try/catch for openai.APIError (follows Anthropic pattern)
  • System message extraction: moved from messages to instructions parameter
  • Response format parameter mapping: response_format to text parameter

Suggestions
  • Verify manual testing with GPT-4o model and tool calling enabled (as recommended in PR)
  • Test JSON mode (force_json_output=True) with Responses API
  • Verify token counting accuracy with real API responses
  • Consider adding integration tests with live OpenAI API (if feasible in CI/CD)
  • Document the API migration for future maintainers

Full review in progress... | Powered by diffray

@diffray-bot

Review Summary

Free public review - Want AI code reviews on your PRs? Check out diffray.ai

Validated 78 issues: 42 kept, 36 filtered

Issues Found: 42

💬 See 17 individual line comment(s) for details.

📊 23 unique issue type(s) across 42 location(s)

📋 Full issue list (click to expand)

🔴 CRITICAL - Array bounds check missing for response.choices[0] (3 occurrences)

Agent: bugs

Category: bug

Why this matters: Accessing empty arrays throws IndexError/undefined access.

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/llm.py:367
    Description: The extract_openai_reasoning function accesses response.choices[0] without checking if the choices a...
    Suggestion: Add a bounds check before accessing choices[0]: if response.choices and len(response.choices) > 0: b...
    Confidence: 92%
  • File: autogpt_platform/backend/backend/blocks/llm.py:381
    Description: The extract_openai_tool_calls function accesses response.choices[0] without checking if the choices ...
    Suggestion: Add a bounds check: if response.choices and len(response.choices) > 0: before accessing choices[0].
    Confidence: 92%
  • File: autogpt_platform/backend/backend/blocks/llm.py:423
    Description: Line 423 checks if item has call_id and falls back to item.id if not, but doesn't verify item.id act...
    Suggestion: Use getattr with a default: id=getattr(item, 'call_id', getattr(item, 'id', None)) or add explicit h...
    Confidence: 78%

Rule: bug_array_bounds
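A sketch of the kind of guard the review is suggesting (the function name comes from the review; the body is illustrative, not the file's actual code):

```python
# Illustrative guard against an empty choices list.
def extract_openai_reasoning(response) -> str | None:
    if not response.choices:
        return None
    message = response.choices[0].message
    return getattr(message, "reasoning", None)
```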


🔴 CRITICAL - Redundant Optional usage with union type syntax

Agent: python

Category: quality

Why this matters: Type hints enable IDE autocomplete and catch type errors early.

File: autogpt_platform/backend/backend/blocks/llm.py:326

Description: Line 326 uses both 'Optional[List[ToolContentBlock]]' and '| None' on the same field, creating redundancy and confusion. Optional[X] is equivalent to X | None.

Suggestion: Change to: tool_calls: list[ToolContentBlock] | None = None

Confidence: 98%

Rule: py_use_type_annotations_for_better_readabil


🟠 HIGH - Redundant and illogical None comparison (2 occurrences)

Agent: python

Category: bug

Why this matters: Redundant None check indicates logic error or dead branch.

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/llm.py:694-698
    Description: After checking 'if not response.choices' at line 694, the code checks 'if response' at line 695. Thi...
    Suggestion: Remove the inner conditional. Simply raise ValueError('No response choices from OpenRouter').
    Confidence: 95%
  • File: autogpt_platform/backend/backend/blocks/llm.py:736-740
    Description: After checking 'if not response.choices' at line 736, the code checks 'if response' at line 737. Thi...
    Suggestion: Remove the inner conditional. Simply raise ValueError('No response choices from Llama API').
    Confidence: 95%

Rule: py_avoid_redundant_none_comparisons


🟠 HIGH - Sequential async calls in loop instead of parallel gathering

Agent: performance

Category: performance

Why this matters: N+1 queries cause severe performance degradation.

File: autogpt_platform/backend/backend/blocks/llm.py:1452-1454

Description: The _run method in AITextSummarizerBlock processes chunks sequentially using a for loop with await on each chunk. With multiple chunks, this becomes O(n) serialized operations when they could be parallelized.

Suggestion: Use asyncio.gather() or asyncio.TaskGroup to summarize all chunks concurrently, reducing wall-clock time from O(n*t) to O(t).

Confidence: 88%

Rule: perf_n_plus_one_queries
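If that suggestion were adopted, the loop could be replaced with something along these lines (method and argument names are assumed from the review's description, not taken from the block's code):

```python
import asyncio

# Illustrative: summarize all chunks concurrently instead of awaiting each in turn.
summaries = await asyncio.gather(
    *(self._summarize_chunk(chunk, input_data, credentials) for chunk in chunks)
)
```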


🟠 HIGH - Inconsistent mock setup for async HTTP calls (2 occurrences)

Agent: microservices

Category: bug

Why this matters: Live I/O introduces slowness, nondeterminism, and external failures unrelated to the code.

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/test/test_llm.py:283
    Description: In test_ai_text_summarizer_real_llm_call_stats, the async function mock_create is assigned directly ...
    Suggestion: Change line 283 from 'mock_client.responses.create = mock_create' to 'mock_client.responses.create =...
    Confidence: 70%
  • File: autogpt_platform/backend/backend/blocks/test/test_llm.py:260
    Description: The test is named 'test_ai_text_summarizer_real_llm_call_stats' which suggests it makes real LLM cal...
    Suggestion: Rename to 'test_ai_text_summarizer_mocked_llm_call_stats' or 'test_ai_text_summarizer_llm_call_stats...
    Confidence: 65%

Rule: gen_no_live_io_in_unit_tests


🟠 HIGH - Error logged without sufficient context for debugging

Agent: bugs

Category: bug

Why this matters: Context-free errors are impossible to trace in production logs.

File: autogpt_platform/backend/backend/blocks/llm.py:527-529

Description: When catching openai.APIError, the error message only includes the error text but lacks context about what model was being called, the input parameters, or the operation context. This makes debugging and monitoring difficult in production logs.

Suggestion: Include relevant context in error message: add llm_model value, a summary of input messages, and operation context. Example: f'OpenAI Responses API error for model {llm_model.value}: {str(e)}'

Confidence: 72%

Rule: bug_missing_error_context


🟠 HIGH - God Function with 329 lines handling 8 different LLM providers

Agent: architecture

Category: quality

Why this matters: Framework coupling makes code harder to test and migrate.

File: autogpt_platform/backend/backend/blocks/llm.py:491

Description: The llm_call function is a monolithic function with completely separate logic paths for 8 LLM providers. Each provider has its own client instantiation, request building, response parsing, error handling, and token counting. This violates Single Responsibility Principle.

Suggestion: Refactor using Strategy pattern: create an abstract LLMProvider base class with provider-specific implementations. Use a factory to instantiate the correct provider based on model metadata.

Confidence: 75%

Rule: py_separate_business_logic_from_framework


🟠 HIGH - Expensive regex substitution on every parse failure

Agent: performance

Category: performance

File: autogpt_platform/backend/backend/blocks/llm.py:1048

Description: Line 1048 executes re.sub(r'[A-Za-z0-9]', '*', response_text) to censor responses for logging on every parse failure. The regex is recompiled each time and operates on potentially large text.

Suggestion: Pre-compile the regex at module level: CENSOR_PATTERN = re.compile(r'[A-Za-z0-9]'). Better: only censor a fixed-size snippet instead of entire response.

Confidence: 85%

Rule: perf_expensive_in_loop
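A small sketch of the suggested change (the constant name is the review's suggestion; the 500-character snippet length is an arbitrary example):

```python
import re

# Compiled once at module import instead of on every parse failure.
CENSOR_PATTERN = re.compile(r"[A-Za-z0-9]")

# Censor only a bounded preview rather than the entire response.
censored_preview = CENSOR_PATTERN.sub("*", response_text[:500])
```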


🟠 HIGH - Missing timeout on external service call

Agent: python

Category: bug

File: autogpt_platform/backend/backend/blocks/llm.py:524-529

Description: The oai_client.responses.create() call at line 525 has no explicit timeout parameter. Compare with anthropic client at line 575 which properly includes timeout=600.

Suggestion: Add a timeout parameter to the responses_params dictionary before making the call, similar to the Anthropic call.

Confidence: 90%

Rule: python_request_no_timeout


🟠 HIGH - Incomplete error handling for external service call

Agent: python

Category: bug

File: autogpt_platform/backend/backend/blocks/llm.py:524-529

Description: The oai_client.responses.create() call only catches openai.APIError. Other exceptions like asyncio.TimeoutError, ConnectionError are not caught.

Suggestion: Expand the try-except to catch additional exceptions: asyncio.TimeoutError, ConnectionError, etc.

Confidence: 80%

Rule: py_add_error_handling_for_external_service_
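Taken together with the timeout issue above, the call site might end up looking roughly like this (the 600-second timeout mirrors the Anthropic call cited by the review; the exception list is illustrative):

```python
import openai

# Illustrative: explicit per-request timeout plus broader exception handling.
try:
    response = await oai_client.responses.create(timeout=600, **responses_params)
except (openai.APIError, TimeoutError, ConnectionError) as e:
    raise ValueError(
        f"OpenAI Responses API error for model {llm_model.value}: {e}"
    ) from e
```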


🟠 HIGH - Overly broad exception handling

Agent: python

Category: quality

File: autogpt_platform/backend/backend/blocks/llm.py:1131

Description: Bare Exception clause catches all exceptions including SystemExit and KeyboardInterrupt. Should catch specific exception types.

Suggestion: Replace 'except Exception as e' with specific exception types like (openai.APIError, anthropic.APIError, ValueError).

Confidence: 90%

Rule: python_bare_except


🟠 HIGH - Input parameter 'input_data' is modified in-place (2 occurrences)

Agent: python

Category: quality

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/llm.py:981-982
    Description: The 'input_data' parameter is modified by reassigning input_data.prompt and input_data.sys_prompt, c...
    Suggestion: Create local variables for formatted values instead of modifying the original input_data.
    Confidence: 75%
  • File: autogpt_platform/backend/backend/blocks/llm.py:1137-1139
    Description: The 'input_data' parameter is modified by reassigning input_data.max_tokens in the exception handler...
    Suggestion: Store adjusted max_tokens in a local variable and pass it to subsequent llm_call invocations.
    Confidence: 70%

Rule: py_avoid_modifying_input_parameters


🟠 HIGH - Duplicate import in function body (5 occurrences)

Agent: python

Category: quality

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/test/test_llm.py:29
    Description: The 'patch' function is imported at module level (line 1) but re-imported inside function (line 43)....
    Suggestion: Remove the import statement at line 43 since 'patch' is already imported at the module level.
    Confidence: 90%
  • File: autogpt_platform/backend/backend/blocks/test/test_llm.py:260
    Description: AsyncMock, MagicMock, and patch are all imported at module level (line 1) but re-imported inside fun...
    Suggestion: Remove the import statement at line 257 since all these are already imported at the module level.
    Confidence: 90%
  • File: autogpt_platform/backend/backend/blocks/llm.py:1519
    Description: Function-level import of 'truncate' from backend.util.truncate inside _summarize_chunk. This import ...
    Suggestion: Move 'from backend.util.truncate import truncate' to the top-level imports.
    Confidence: 70%
  • File: autogpt_platform/backend/backend/blocks/llm.py:329
    Description: Using Optional[str] when the modern Python 3.10+ syntax is str | None. This is inconsistent with th...
    Suggestion: Change 'reasoning: Optional[str] = None' to 'reasoning: str | None = None'
    Confidence: 75%
  • File: autogpt_platform/backend/backend/blocks/llm.py:324
    Description: Using 'List' from typing module when Python 3.9+ allows using 'list' directly. Inconsistent with res...
    Suggestion: Change 'prompt: List[Any]' to 'prompt: list[Any]'
    Confidence: 70%

Rule: py_remove_unused_imports_and_variables


🟡 MEDIUM - Test mocks internal method - limited integration coverage (2 occurrences)

Agent: testing

Category: quality

Why this matters: Improper mocks make tests brittle and unreliable.

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/test/test_llm.py:310-351
    Description: Test replaces block.llm_call with mock that sets execution_stats directly, then verifies the mock-se...
    Suggestion: Consider mocking at OpenAI API boundary (AsyncOpenAI) to test real integration. Note: this pattern i...
    Confidence: 65%
  • File: autogpt_platform/backend/backend/blocks/test/test_llm.py:353-400
    Description: Test replaces block.llm_call with mock that sets execution_stats directly, then verifies mock-set va...
    Suggestion: Consider mocking at OpenAI API boundary instead. Note: this is the same pattern used elsewhere in th...
    Confidence: 65%

Rule: test_py_mocking_too_much


🟡 MEDIUM - Missing return type annotation (2 occurrences)

Agent: python

Category: quality

Why this matters: Type hints prevent whole classes of bugs and improve IDE/refactor support.

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/llm.py:434-436
    Description: Function 'get_parallel_tool_calls_param' has no return type annotation. Should declare the return ty...
    Suggestion: Add return type annotation: 'def get_parallel_tool_calls_param(llm_model: LlmModel, parallel_tool_ca...
    Confidence: 75%
  • File: autogpt_platform/backend/backend/blocks/llm.py:1199-1207
    Description: Parameter 'error' in method 'invalid_response_feedback' lacks type annotation. This method accepts d...
    Suggestion: Add type annotation: 'def invalid_response_feedback(self, error: Union[ValueError, JSONDecodeError, ...
    Confidence: 70%

Rule: py_add_comprehensive_type_hints


🟡 MEDIUM - Double loop through response.content blocks (2 occurrences)

Agent: performance

Category: performance

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/llm.py:582-608
    Description: The function loops through resp.content twice: lines 582-597 for tool_use and lines 605-608 for thin...
    Suggestion: Combine into a single iteration through resp.content, checking for both types in one pass.
    Confidence: 70%
  • File: autogpt_platform/backend/backend/blocks/llm.py:500-504
    Description: Lines 500 and 504 both iterate through the prompt list separately with list comprehensions.
    Suggestion: Combine into single loop to extract both system_messages and input_messages in one pass.
    Confidence: 65%

Rule: perf_quadratic_loops
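A single-pass version of the Anthropic parsing mentioned above might look like this (block attribute names are assumed from the Anthropic SDK's content block types, not taken from the diff):

```python
# Illustrative single pass over resp.content for both tool_use and thinking blocks.
tool_calls = []
reasoning = None
for block in resp.content:
    if block.type == "tool_use":
        tool_calls.append(block)
    elif block.type == "thinking":
        reasoning = block.thinking
```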


🟡 MEDIUM - Debug print statements in test code

Agent: python

Category: quality

File: autogpt_platform/backend/backend/blocks/test/test_llm.py:286

Description: Debug print() statements left in test function. Should be removed or replaced with logging.

Suggestion: Replace with logger.debug() calls or use pytest's caplog fixture, or remove if no longer needed.

Confidence: 85%

Rule: python_print_debug


🟡 MEDIUM - Test assertion comments contradict expected behavior

Agent: refactoring

Category: quality

File: autogpt_platform/backend/backend/blocks/test/test_llm.py:186-191

Description: Line 187 states 'llm_call_count is only set on success, so it shows 1' but line 190 asserts llm_call_count == 2. The comment explanation is misleading - it actually should be retry_count + 1 = 2 as stated in line 190's inline comment.

Suggestion: Remove or correct the misleading comment at line 187. The assertion at line 190 with its inline comment 'retry_count + 1 = 1 + 1 = 2' is correct, but line 187 contradicts this.

Confidence: 75%

Rule: quality_unreachable_code


🟡 MEDIUM - User input formatted into prompts via Jinja2

Agent: security

Category: security

File: autogpt_platform/backend/backend/blocks/llm.py:979-982

Description: User-supplied prompt_values are formatted into system and user prompts using Jinja2 SandboxedEnvironment (via fmt.format_string()). While sandboxed, autoescape=False at initialization allows potential content manipulation in prompts.

Suggestion: Consider enabling autoescape for the TextFormatter used in LLM blocks, or validate prompt_values don't contain suspicious patterns. The SandboxedEnvironment mitigates template injection but doesn't prevent prompt manipulation.

Confidence: 65%

Rule: sec_llm_prompt_injection


🟡 MEDIUM - User input directly embedded into LLM prompts

Agent: security

Category: security

File: autogpt_platform/backend/backend/blocks/llm.py:1812-1823

Description: User-supplied input_data.focus and input_data.source_data are directly embedded into prompts using f-strings. While this is common for LLM applications, sensitive data could be sent to external providers.

Suggestion: Consider implementing PII detection or data sanitization for sensitive patterns before embedding user input into prompts sent to external LLM providers.

Confidence: 62%

Rule: sec_llm_sensitive_data_exposure


🟡 MEDIUM - Missing Input Validation for max_tokens Parameter (2 occurrences)

Agent: security

Category: security

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/llm.py:436
    Description: max_tokens parameter has no validation at function entry. However, the code at lines 486-489 does ca...
    Suggestion: Consider adding explicit bounds validation at entry for clearer error messages, though the current l...
    Confidence: 60%
  • File: autogpt_platform/backend/backend/blocks/llm.py:1391-1401
    Description: chunk_overlap has ge=0 but no upper bound. No cross-field validation ensures overlap < max_tokens, w...
    Suggestion: Add validation to ensure chunk_overlap < max_tokens to prevent invalid chunking behavior.
    Confidence: 70%

Rule: py_add_input_validation_for_critical_parame
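If the cross-field check were added, a Pydantic-style validator along these lines would cover it (the model and field names are stand-ins based on the review, not the block's actual input schema):

```python
from pydantic import BaseModel, model_validator

class ChunkingInput(BaseModel):  # hypothetical stand-in for the block's input model
    max_tokens: int = 4096
    chunk_overlap: int = 100

    @model_validator(mode="after")
    def check_chunk_overlap(self):
        if self.chunk_overlap >= self.max_tokens:
            raise ValueError("chunk_overlap must be smaller than max_tokens")
        return self
```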


🟡 MEDIUM - OpenAI Responses API Call Without Explicit Timeout (6 occurrences)

Agent: security

Category: security

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/llm.py:525
    Description: The OpenAI Responses API call does not explicitly set a timeout parameter. This could lead to indefi...
    Suggestion: Add an explicit timeout parameter to the create() call, e.g., timeout=60.
    Confidence: 72%
  • File: autogpt_platform/backend/backend/blocks/llm.py:633-638
    Description: The Groq API chat.completions.create() call does not explicitly set a timeout parameter.
    Suggestion: Add an explicit timeout parameter to the create() call.
    Confidence: 72%
  • File: autogpt_platform/backend/backend/blocks/llm.py:681-691
    Description: The OpenRouter API call does not explicitly set a timeout parameter.
    Suggestion: Add an explicit timeout parameter to the create() call.
    Confidence: 72%
  • File: autogpt_platform/backend/backend/blocks/llm.py:723-733
    Description: The Llama API chat.completions.create() call does not explicitly set a timeout parameter.
    Suggestion: Add an explicit timeout parameter to the create() call.
    Confidence: 72%
  • File: autogpt_platform/backend/backend/blocks/llm.py:765-769
    Description: The AIML API chat.completions.create() call does not explicitly set a timeout parameter.
    Suggestion: Add an explicit timeout parameter to the create() call.
    Confidence: 72%
  • File: autogpt_platform/backend/backend/blocks/llm.py:797-804
    Description: The v0 API chat.completions.create() call does not explicitly set a timeout parameter.
    Suggestion: Add an explicit timeout parameter to the create() call.
    Confidence: 72%

Rule: sec_external_call_no_timeout


🟡 MEDIUM - Test assertions too vague to catch bugs (2 occurrences)

Agent: testing

Category: quality

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/test/test_llm.py:194-252
    Description: The test uses assertions like call_count > 1 and llm_call_count > 0 which are too permissive and...
    Suggestion: Replace > 1 and > 0 assertions with exact expected values based on known chunking behavior.
    Confidence: 75%
  • File: autogpt_platform/backend/backend/blocks/test/test_llm.py:260
    Description: Test has print statements (lines 301-303) and uses >= 1 assertion when comment says 'Should have m...
    Suggestion: Remove print statements. Replace >= 1 with == 2 per the test comment. Use specific expected valu...
    Confidence: 85%

Rule: test_py_bare_assert


ℹ️ 25 issue(s) outside PR diff (click to expand)

These issues were found in lines not modified in this PR.

🔴 CRITICAL - Redundant Optional usage with union type syntax

Agent: python

Category: quality

Why this matters: Type hints enable IDE autocomplete and catch type errors early.

File: autogpt_platform/backend/backend/blocks/llm.py:326

Description: Line 326 uses both 'Optional[List[ToolContentBlock]]' and '| None' on the same field, creating redundancy and confusion. Optional[X] is equivalent to X | None.

Suggestion: Change to: tool_calls: list[ToolContentBlock] | None = None

Confidence: 98%

Rule: py_use_type_annotations_for_better_readabil


🟠 HIGH - Redundant and illogical None comparison (2 occurrences)

Agent: python

Category: bug

Why this matters: Redundant None check indicates logic error or dead branch.

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/llm.py:694-698
    Description: After checking 'if not response.choices' at line 694, the code checks 'if response' at line 695. Thi...
    Suggestion: Remove the inner conditional. Simply raise ValueError('No response choices from OpenRouter').
    Confidence: 95%
  • File: autogpt_platform/backend/backend/blocks/llm.py:736-740
    Description: After checking 'if not response.choices' at line 736, the code checks 'if response' at line 737. Thi...
    Suggestion: Remove the inner conditional. Simply raise ValueError('No response choices from Llama API').
    Confidence: 95%

Rule: py_avoid_redundant_none_comparisons


🟠 HIGH - Sequential async calls in loop instead of parallel gathering

Agent: performance

Category: performance

Why this matters: N+1 queries cause severe performance degradation.

File: autogpt_platform/backend/backend/blocks/llm.py:1452-1454

Description: The _run method in AITextSummarizerBlock processes chunks sequentially using a for loop with await on each chunk. With multiple chunks, this becomes O(n) serialized operations when they could be parallelized.

Suggestion: Use asyncio.gather() or asyncio.TaskGroup to summarize all chunks concurrently, reducing wall-clock time from O(n*t) to O(t).

Confidence: 88%

Rule: perf_n_plus_one_queries


🟠 HIGH - Expensive regex substitution on every parse failure

Agent: performance

Category: performance

File: autogpt_platform/backend/backend/blocks/llm.py:1048

Description: Line 1048 executes re.sub(r'[A-Za-z0-9]', '*', response_text) to censor responses for logging on every parse failure. The regex is recompiled each time and operates on potentially large text.

Suggestion: Pre-compile the regex at module level: CENSOR_PATTERN = re.compile(r'[A-Za-z0-9]'). Better: only censor a fixed-size snippet instead of entire response.

Confidence: 85%

Rule: perf_expensive_in_loop


🟠 HIGH - Overly broad exception handling

Agent: python

Category: quality

File: autogpt_platform/backend/backend/blocks/llm.py:1131

Description: Bare Exception clause catches all exceptions including SystemExit and KeyboardInterrupt. Should catch specific exception types.

Suggestion: Replace 'except Exception as e' with specific exception types like (openai.APIError, anthropic.APIError, ValueError).

Confidence: 90%

Rule: python_bare_except


🟠 HIGH - Input parameter 'input_data' is modified in-place (2 occurrences)

Agent: python

Category: quality

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/llm.py:981-982
    Description: The 'input_data' parameter is modified by reassigning input_data.prompt and input_data.sys_prompt, c...
    Suggestion: Create local variables for formatted values instead of modifying the original input_data.
    Confidence: 75%
  • File: autogpt_platform/backend/backend/blocks/llm.py:1137-1139
    Description: The 'input_data' parameter is modified by reassigning input_data.max_tokens in the exception handler...
    Suggestion: Store adjusted max_tokens in a local variable and pass it to subsequent llm_call invocations.
    Confidence: 70%

Rule: py_avoid_modifying_input_parameters


🟡 MEDIUM - Test mocks internal method - limited integration coverage (2 occurrences)

Agent: testing

Category: quality

Why this matters: Improper mocks make tests brittle and unreliable.

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/test/test_llm.py:310-351
    Description: Test replaces block.llm_call with mock that sets execution_stats directly, then verifies the mock-se...
    Suggestion: Consider mocking at OpenAI API boundary (AsyncOpenAI) to test real integration. Note: this pattern i...
    Confidence: 65%
  • File: autogpt_platform/backend/backend/blocks/test/test_llm.py:353-400
    Description: Test replaces block.llm_call with mock that sets execution_stats directly, then verifies mock-set va...
    Suggestion: Consider mocking at OpenAI API boundary instead. Note: this is the same pattern used elsewhere in th...
    Confidence: 65%

Rule: test_py_mocking_too_much


🟡 MEDIUM - Missing type annotation for error parameter

Agent: python

Category: quality

Why this matters: Type hints prevent whole classes of bugs and improve IDE/refactor support.

File: autogpt_platform/backend/backend/blocks/llm.py:1199-1207

Description: Parameter 'error' in method 'invalid_response_feedback' lacks type annotation. This method accepts different error types and should document them.

Suggestion: Add type annotation: 'def invalid_response_feedback(self, error: Union[ValueError, JSONDecodeError, str], ...) -> str:'

Confidence: 70%

Rule: py_add_comprehensive_type_hints


🟡 MEDIUM - Double loop through response.content blocks

Agent: performance

Category: performance

File: autogpt_platform/backend/backend/blocks/llm.py:582-608

Description: The function loops through resp.content twice: lines 582-597 for tool_use and lines 605-608 for thinking type extraction.

Suggestion: Combine into a single iteration through resp.content, checking for both types in one pass.

Confidence: 70%

Rule: perf_quadratic_loops


🟡 MEDIUM - Import inside function body (3 occurrences)

Agent: python

Category: quality

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/llm.py:1519
    Description: Function-level import of 'truncate' from backend.util.truncate inside _summarize_chunk. This import ...
    Suggestion: Move 'from backend.util.truncate import truncate' to the top-level imports.
    Confidence: 70%
  • File: autogpt_platform/backend/backend/blocks/llm.py:329
    Description: Using Optional[str] when the modern Python 3.10+ syntax is str | None. This is inconsistent with th...
    Suggestion: Change 'reasoning: Optional[str] = None' to 'reasoning: str | None = None'
    Confidence: 75%
  • File: autogpt_platform/backend/backend/blocks/llm.py:324
    Description: Using 'List' from typing module when Python 3.9+ allows using 'list' directly. Inconsistent with res...
    Suggestion: Change 'prompt: List[Any]' to 'prompt: list[Any]'
    Confidence: 70%

Rule: py_remove_unused_imports_and_variables


🟡 MEDIUM - Test assertion comments contradict expected behavior

Agent: refactoring

Category: quality

File: autogpt_platform/backend/backend/blocks/test/test_llm.py:186-191

Description: Line 187 states 'llm_call_count is only set on success, so it shows 1' but line 190 asserts llm_call_count == 2. The comment explanation is misleading - it actually should be retry_count + 1 = 2 as stated in line 190's inline comment.

Suggestion: Remove or correct the misleading comment at line 187. The assertion at line 190 with its inline comment 'retry_count + 1 = 1 + 1 = 2' is correct, but line 187 contradicts this.

Confidence: 75%

Rule: quality_unreachable_code


🟡 MEDIUM - User input formatted into prompts via Jinja2

Agent: security

Category: security

File: autogpt_platform/backend/backend/blocks/llm.py:979-982

Description: User-supplied prompt_values are formatted into system and user prompts using Jinja2 SandboxedEnvironment (via fmt.format_string()). While sandboxed, autoescape=False at initialization allows potential content manipulation in prompts.

Suggestion: Consider enabling autoescape for the TextFormatter used in LLM blocks, or validate prompt_values don't contain suspicious patterns. The SandboxedEnvironment mitigates template injection but doesn't prevent prompt manipulation.

Confidence: 65%

Rule: sec_llm_prompt_injection


🟡 MEDIUM - User input directly embedded into LLM prompts

Agent: security

Category: security

File: autogpt_platform/backend/backend/blocks/llm.py:1812-1823

Description: User-supplied input_data.focus and input_data.source_data are directly embedded into prompts using f-strings. While this is common for LLM applications, sensitive data could be sent to external providers.

Suggestion: Consider implementing PII detection or data sanitization for sensitive patterns before embedding user input into prompts sent to external LLM providers.

Confidence: 62%

Rule: sec_llm_sensitive_data_exposure


🟡 MEDIUM - Missing Range Validation for chunk_overlap Parameter

Agent: security

Category: security

File: autogpt_platform/backend/backend/blocks/llm.py:1391-1401

Description: chunk_overlap has ge=0 but no upper bound. No cross-field validation ensures overlap < max_tokens, which could create invalid chunk configurations.

Suggestion: Add validation to ensure chunk_overlap < max_tokens to prevent invalid chunking behavior.

Confidence: 70%

Rule: py_add_input_validation_for_critical_parame


🟡 MEDIUM - Groq API Call Without Explicit Timeout (5 occurrences)

Agent: security

Category: security

📍 View all locations
  • File: autogpt_platform/backend/backend/blocks/llm.py:633-638
    Description: The Groq API chat.completions.create() call does not explicitly set a timeout parameter.
    Suggestion: Add an explicit timeout parameter to the create() call.
    Confidence: 72%
  • File: autogpt_platform/backend/backend/blocks/llm.py:681-691
    Description: The OpenRouter API call does not explicitly set a timeout parameter.
    Suggestion: Add an explicit timeout parameter to the create() call.
    Confidence: 72%
  • File: autogpt_platform/backend/backend/blocks/llm.py:723-733
    Description: The Llama API chat.completions.create() call does not explicitly set a timeout parameter.
    Suggestion: Add an explicit timeout parameter to the create() call.
    Confidence: 72%
  • File: autogpt_platform/backend/backend/blocks/llm.py:765-769
    Description: The AIML API chat.completions.create() call does not explicitly set a timeout parameter.
    Suggestion: Add an explicit timeout parameter to the create() call.
    Confidence: 72%
  • File: autogpt_platform/backend/backend/blocks/llm.py:797-804
    Description: The v0 API chat.completions.create() call does not explicitly set a timeout parameter.
    Suggestion: Add an explicit timeout parameter to the create() call.
    Confidence: 72%

Rule: sec_external_call_no_timeout


🟡 MEDIUM - Test assertions too vague to catch bugs

Agent: testing

Category: quality

File: autogpt_platform/backend/backend/blocks/test/test_llm.py:194-252

Description: The test uses assertions like call_count > 1 and llm_call_count > 0 which are too permissive and won't catch regressions if counts change unexpectedly.

Suggestion: Replace > 1 and > 0 assertions with exact expected values based on known chunking behavior.

Confidence: 75%

Rule: test_py_bare_assert



Review ID: 856e49d7-82a0-43b7-8634-e05aaca4b5a6
Rate it 👍 or 👎 to improve future reviews | Powered by diffray

- Use getattr with fallback for tool call ID extraction
- Add return type annotation to get_parallel_tool_calls_param
- Add timeout (600s) to OpenAI Responses API call
- Add model context to error messages
- Handle TimeoutError in addition to APIError
- Remove duplicate imports in test file
- Remove debug print statements in test file
The raw_response field is used by smart_decision_maker.py for conversation
history. It expects a message dict with 'role' and 'content' keys, not the
raw Response object.

- Construct message dict with role='assistant' and content
- Include tool_calls in OpenAI format when present
- Fixes multi-turn conversation and retry logic
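A sketch of the message dict described above (key names follow the Chat Completions assistant-message format that smart_decision_maker.py reportedly expects; the tool-call fields are assumptions):

```python
# Illustrative construction of the assistant message stored as raw_response.
raw_message = {"role": "assistant", "content": response.output_text}
if tool_call_items:
    raw_message["tool_calls"] = [
        {
            "id": getattr(item, "call_id", None) or getattr(item, "id", None),
            "type": "function",
            "function": {"name": item.name, "arguments": item.arguments},
        }
        for item in tool_call_items
    ]
```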

