
feat(backend/executor): Avoid full table scan on AgentNodeExecutionInputOutput table #10049


Draft · wants to merge 1 commit into dev

Conversation

@majdyz (Contributor) commented May 26, 2025

This query is currently the most time-consuming one:

select
  "platform"."AgentNodeExecution"."id",
  "platform"."AgentNodeExecution"."agentGraphExecutionId",
  "platform"."AgentNodeExecution"."agentNodeId",
  "platform"."AgentNodeExecution"."executionStatus"::text,
  "platform"."AgentNodeExecution"."executionData",
  "platform"."AgentNodeExecution"."addedTime",
  "platform"."AgentNodeExecution"."queuedTime",
  "platform"."AgentNodeExecution"."startedTime",
  "platform"."AgentNodeExecution"."endedTime",
  "platform"."AgentNodeExecution"."stats"
from
  "platform"."AgentNodeExecution"
where
  (
    "platform"."AgentNodeExecution"."agentNodeId" = $1
    and "platform"."AgentNodeExecution"."agentGraphExecutionId" = $2
    and "platform"."AgentNodeExecution"."executionStatus" = CAST($3::text as "platform"."AgentExecutionStatus")
    and ("platform"."AgentNodeExecution"."id") not in (
      select
        "t1"."referencedByInputExecId"
      from
        "platform"."AgentNodeExecutionInputOutput" as "t1"
      where
        (
          (not "t1"."name" <> $4)
          and "t1"."referencedByInputExecId" is not null
        )
    )
  )
order by
  "platform"."AgentNodeExecution"."addedTime" asc
limit
  $5
offset
  $6 /* traceparent='00-00000000000000000000000000000000-0000000000000000-01' */

which is likely caused by the inner SELECT doing a full table scan on AgentNodeExecutionInputOutput most of the time:

(
  select
    "t1"."referencedByInputExecId"
  from
    "platform"."AgentNodeExecutionInputOutput" as "t1"
  where
    (
      (not "t1"."name" <> $4)
      and "t1"."referencedByInputExecId" is not null
    )
)

Changes 🏗️

The scope of the change is to avoid the full table scan by loading the matching executions without the input-name filter and filtering them at the application level instead, as sketched below.
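
In code terms, the lookup becomes roughly the following (a sketch based on the snippet quoted in the PR Reviewer Guide below; `existing_exec_query_filter` and `input_name` are names taken from that snippet, and the Prisma Python client is assumed):

existing_execution = next(
    (
        execution
        # Fetch the candidate executions with their Input relations included,
        # instead of excluding them in SQL via a NOT IN subquery on
        # AgentNodeExecutionInputOutput.
        for execution in await AgentNodeExecution.prisma().find_many(
            where=existing_exec_query_filter,
            order={"addedTime": "asc"},
            include={"Input": True},
        )
        # Application-level filter: keep only executions that do not yet have
        # an input with this name.
        if input_name not in [d.name for d in execution.Input or []]
    ),
    None,
)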

Checklist 📋

For code changes:

  • I have clearly listed my changes in the PR description
  • I have made a test plan
  • I have tested my changes according to the test plan:
    • CI, manual agent run

@majdyz requested review from Swiftyos and Bentlybro May 26, 2025 19:57
@majdyz requested a review from a team as a code owner May 26, 2025 19:57
@github-project-automation bot moved this to 🆕 Needs initial review in the AutoGPT development kanban May 26, 2025

netlify bot commented May 26, 2025

Deploy Preview for auto-gpt-docs-dev canceled.

Name | Link
🔨 Latest commit | cab1299
🔍 Latest deploy log | https://app.netlify.com/projects/auto-gpt-docs-dev/deploys/6834c7b93dc1430008391ded

@github-actions bot added the platform/backend (AutoGPT Platform - Back end) label and removed the Review effort 2/5 label May 26, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Memory Usage

The new implementation loads all executions into memory and filters them in Python code instead of letting the database handle the filtering. This could lead to excessive memory usage if there are many executions.

existing_execution = next(
    (
        execution
        for execution in await AgentNodeExecution.prisma().find_many(
            where=existing_exec_query_filter,
            order={"addedTime": "asc"},
            include={"Input": True},
        )
        if input_name not in [d.name for d in execution.Input or []]
    ),
    None,
)

Performance Edge Case

While this change avoids a full table scan on AgentNodeExecutionInputOutput, it might be less efficient if there are many AgentNodeExecution records but few inputs with the specified name. The database query optimization should be validated with real-world data volumes.

existing_execution = next(
    (
        execution
        for execution in await AgentNodeExecution.prisma().find_many(
            where=existing_exec_query_filter,
            order={"addedTime": "asc"},
            include={"Input": True},
        )
        if input_name not in [d.name for d in execution.Input or []]
    ),
    None,
)
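
If the memory-usage or data-volume concerns above turn out to matter in practice, one possible mitigation (not part of this PR; a rough sketch assuming prisma-client-py's `take`/`skip` pagination on `find_many`, with a hypothetical helper name and batch size) would be to page through the candidates and stop at the first match:

PAGE_SIZE = 100  # illustrative batch size

async def find_execution_without_input(existing_exec_query_filter, input_name):
    # Hypothetical helper: fetch candidates in pages instead of all at once,
    # returning the first execution that has no input with the given name.
    skip = 0
    while True:
        page = await AgentNodeExecution.prisma().find_many(
            where=existing_exec_query_filter,
            order={"addedTime": "asc"},
            include={"Input": True},
            take=PAGE_SIZE,
            skip=skip,
        )
        if not page:
            return None  # exhausted all candidates
        for execution in page:
            # Same application-level filter as in the PR.
            if input_name not in [d.name for d in execution.Input or []]:
                return execution
        skip += PAGE_SIZE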


netlify bot commented May 26, 2025

Deploy Preview for auto-gpt-docs canceled.

Name | Link
🔨 Latest commit | cab1299
🔍 Latest deploy log | https://app.netlify.com/projects/auto-gpt-docs/deploys/6834c7b94255670008eaa939


deepsource-io bot commented May 26, 2025

Here's the code health analysis summary for commits 8e2fb2d..cab1299. View details on DeepSource ↗.

Analysis Summary

Analyzer | Status | Summary | Link
JavaScript | ✅ Success | — | View Check ↗
Python | ✅ Success | ❗ 2 occurrences introduced · 🎯 2 occurrences resolved | View Check ↗

💡 If you’re a repository administrator, you can configure the quality gates from the settings.

@majdyz enabled auto-merge May 27, 2025 04:23
@majdyz disabled auto-merge May 27, 2025 08:29
@majdyz marked this pull request as draft May 27, 2025 08:29
@majdyz (Contributor, Author) commented May 27, 2025

I'll hold it for now; we will fetch more data over the wire, which could be concerning.


This pull request has conflicts with the base branch; please resolve those so we can evaluate the pull request.

@github-actions bot added the conflicts (Automatically applied to PRs with merge conflicts) label May 27, 2025
Labels
conflicts (Automatically applied to PRs with merge conflicts) · platform/backend (AutoGPT Platform - Back end) · size/m
Projects
Status: 🆕 Needs initial review
1 participant