feat(max): single loop executor attempt 2 #39717

skoob13 · 2025-10-15T13:27:34Z

Problem

A second attempt at the new loop executor, with the issues from the first attempt fixed.

Changes

Fixed the Inkeep node.
Fixed the HogQL fixer.
Fixed the insight search streaming.

How did you test this code?

Unit tests & manual testing

Changelog: (features only) Is this feature complete?

Max has become a more intelligent agent.

github-actions · 2025-10-15T13:31:46Z

Size Change: +28 B (0%)

Total Size: 3.06 MB

Filename	Size	Change
`frontend/dist/toolbar.js`	3.06 MB	+28 B (0%)

greptile-apps

82 files reviewed, 1 comment

ee/hogai/graph/insights/nodes.py

kappa90

:yolo:

Twixes

Fixed one more case related to ReasoningMessages now being kept after streaming completes: when generation is canceled while the last message is an ai/reasoning one, that message should say "Canceled" instead of appearing to load endlessly.
Everything working here.
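
The cancellation rule described above can be sketched roughly as follows. This is an illustration in Python, not the actual fix (commit f70ed41, whose code is not shown in this thread): `ThreadMessage`, `reasoning_label`, and the "Loading" label are hypothetical, and only the "ai/reasoning" type name and the "Canceled" label come from the comment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ThreadMessage:
    # Hypothetical shape; only the "ai/reasoning" type name comes from the comment.
    type: str
    content: str = ""


def reasoning_label(messages: Sequence[ThreadMessage], canceled: bool) -> Optional[str]:
    """Label for a trailing reasoning message, or None if the thread does not end with one."""
    if not messages or messages[-1].type != "ai/reasoning":
        return None
    # A canceled generation will never produce more output, so the trailing
    # reasoning message must not keep showing a loading state.
    return "Canceled" if canceled else "Loading"
```

The point of the sketch is that the label depends on both the last message's type and the cancellation flag, so a kept reasoning message has a terminal state to fall into.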

Twixes · 2025-10-16T14:37:25Z

ee/hogai/graph/inkeep_docs/nodes.py

-        # Original node has Anthropic messages, but Inkeep expects OpenAI messages
-        langchain_messages = convert_to_messages(
-            convert_to_openai_messages(super()._construct_messages(messages, window_start_id, tool_calls_count))
+        conversation_window = self._window_manager.get_messages_in_window(messages, window_start_id)[-28:]


Worth adding a comment explaining why the number is 28, for other readers, while we're here

Thanks, updated
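
One way to address this kind of review comment is to lift the literal into a named, commented constant. A minimal sketch, assuming the window is a plain sequence: `MAX_INKEEP_WINDOW_MESSAGES` and `trim_window` are hypothetical names, and the reason for 28 is not stated in this thread, so the comment leaves it as a placeholder rather than guessing.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

# Hypothetical constant name; the diff only shows the literal 28.
# The real comment should state why this particular limit was chosen.
MAX_INKEEP_WINDOW_MESSAGES = 28


def trim_window(messages: Sequence[T], limit: int = MAX_INKEEP_WINDOW_MESSAGES) -> list[T]:
    """Return at most `limit` of the most recent messages, preserving order."""
    if limit <= 0:
        # Guard needed because messages[-0:] would return the whole sequence.
        return []
    return list(messages[-limit:])
```

With this shape, the slice in the diff would read `trim_window(window)` and the explanation lives next to the constant instead of at each call site.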

posthog-bot · 2025-10-16T15:31:03Z

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

chromium: 0 added, 2 modified, 0 deleted (wasn't pushed!)
webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot · 2025-10-16T15:43:24Z

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

chromium: 0 added, 2 modified, 0 deleted (diff for shard 2)
webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot · 2025-10-16T16:24:24Z

🧠 AI eval results

Evaluated 29 experiments, comprising 61 metrics.

deep_research_onboarding

🔵 covers_essential_topics: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 has_correct_sections: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔴 is_concise_and_focused: 33.33%, -26.67% (improvements: 0, regressions: 1)

Baseline: master-1760615533 • Avg. case performance: ⏱️ 48.91 s, 🔢 0 tokens

tool_routing_dashboard_creation

🔴 ToolRelevance: 59.10%, -17.00% (improvements: 1, regressions: 5)

Baseline: master-1760615555 • Avg. case performance: ⏱️ 8.30 s, 🔢 0 tokens

tool_call_dashboard_creation

🔵 dashboard_creation_accuracy: 40.00%, ±0.00% (improvements: 1, regressions: 1)

Baseline: master-1760615566 • Avg. case performance: ⏱️ 22.30 s, 🔢 6523 tokens, 💵 $0.0030 in tokens

funnel

🔴 plan_correctness: 82.75%, -7.00% (improvements: 3, regressions: 4)
🟢 QueryKindSelection: 100.00%, +5.26% (improvements: 1, regressions: 0)
🔴 query_and_plan_alignment: 91.11%, -1.52% (improvements: 4, regressions: 7)
🔵 time_range_relevancy: 94.44%, -0.29% (improvements: 0, regressions: 1)

Baseline: master-1760615620 • Avg. case performance: ⏱️ 62.20 s, 🔢 8709 tokens, 💵 $0.0185 in tokens

insight_evaluation_accuracy

🟢 InsightEvaluationAccuracy: 75.00%, +25.00% (improvements: 1, regressions: 0)

Baseline: master-1760615758 • Avg. case performance: ⏱️ 16.71 s, 🔢 2804 tokens, 💵 $0.0013 in tokens

memory

🔴 ToolRelevance: 68.87%, -1.07% (improvements: 2, regressions: 2)
🟢 memory_content_relevance: 74.00%, +4.00% (improvements: 3, regressions: 2)

Baseline: master-1760615771 • Avg. case performance: ⏱️ 2.43 s, 🔢 1134 tokens, 💵 $0.0032 in tokens

memory_onboarding

🔴 has_correct_style: 50.00%, -16.67% (improvements: 0, regressions: 1)
🔵 has_technical_details: 83.33%, ±0.00% (improvements: 0, regressions: 0)
🔵 satisfies_business_details: 83.33%, ±0.00% (improvements: 0, regressions: 0)
🔵 satisfies_product_details: 83.33%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1760615784 • Avg. case performance: ⏱️ 116.52 s, 🔢 674 tokens, 💵 $0.0033 in tokens

retention

🔵 QueryKindSelection: 75.00%, ±0.00% (improvements: 0, regressions: 0)
🟢 plan_correctness: 63.00%, +63.00% (improvements: 4, regressions: 0)
🔵 query_and_plan_alignment: 92.50%, ±0.00% (improvements: 0, regressions: 0)
🔵 time_range_relevancy: 100.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1760615956 • Avg. case performance: ⏱️ 62.02 s, 🔢 7693 tokens, 💵 $0.0207 in tokens

root

🔴 ToolRelevance: 69.14%, -18.11% (improvements: 4, regressions: 36)

Baseline: master-1760616013 • Avg. case performance: ⏱️ 11.23 s, 🔢 0 tokens

root_style

🔴 style_checker: 85.00%, -5.00% (improvements: 1, regressions: 2)

Baseline: master-1760616034 • Avg. case performance: ⏱️ 18.53 s, 🔢 0 tokens

tool_routing_session_replay

🔵 ToolRelevance: 94.86%, +0.14% (improvements: 4, regressions: 3)

Baseline: master-1760616083 • Avg. case performance: ⏱️ 11.32 s, 🔢 7668 tokens, 💵 $0.0469 in tokens

session_summarization_no_context

🔵 ToolRelevance: 97.99%, -0.35% (improvements: 1, regressions: 1)

Baseline: master-1760616096 • Avg. case performance: ⏱️ 8.74 s, 🔢 0 tokens

sql

🔴 QueryKindSelection: 92.31%, -7.69% (improvements: 0, regressions: 0)
🔵 plan_correctness: 76.07%, +0.36% (improvements: 5, regressions: 2)
🔴 query_and_plan_alignment: 84.23%, -7.69% (improvements: 2, regressions: 4)
🟢 retry_efficiency: 92.86%, +7.14% (improvements: 3, regressions: 1)
🟢 sql_syntax_correctness: 95.83%, +3.53% (improvements: 1, regressions: 1)
🔴 time_range_relevancy: 93.46%, -4.62% (improvements: 0, regressions: 1)

Baseline: master-1760616102 • Avg. case performance: ⏱️ 56.39 s, 🔢 10467 tokens, 💵 $0.0237 in tokens

survey_analysis

🔵 recommendation_quality: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 test_data_detection: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 theme_extraction_quality: 100.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1760616241 • Avg. case performance: ⏱️ 11.28 s, 🔢 2697 tokens, 💵 $0.0088 in tokens

surveys

🔵 feature_flag_integration: 0.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 feature_flag_understanding: 70.00%, ±0.00% (improvements: 0, regressions: 0)
🟢 first_question_type_correct: 40.00%, +10.00% (improvements: 1, regressions: 0)
🔴 survey_creation_basics: 85.00%, -5.00% (improvements: 0, regressions: 1)
🔴 survey_question_quality: 57.00%, -7.00% (improvements: 1, regressions: 1)
🔴 survey_relevance: 52.00%, -3.00% (improvements: 0, regressions: 1)

Baseline: master-1760616246 • Avg. case performance: ⏱️ 5.06 s, 🔢 5687 tokens, 💵 $0.0122 in tokens

trends

🟢 QueryKindSelection: 90.00%, +20.00% (improvements: 2, regressions: 0)
🟢 plan_correctness: 96.00%, +8.50% (improvements: 4, regressions: 2)
🟢 query_and_plan_alignment: 93.00%, +4.50% (improvements: 3, regressions: 3)
🟢 time_range_relevancy: 99.00%, +9.00% (improvements: 3, regressions: 0)

Baseline: master-1760616262 • Avg. case performance: ⏱️ 37.91 s, 🔢 11250 tokens, 💵 $0.0230 in tokens

ui_context_actions

🟢 ToolRelevance: 68.24%, +1.58% (improvements: 2, regressions: 0)

Baseline: master-1760616354 • Avg. case performance: ⏱️ 8.21 s, 🔢 0 tokens

ui_context_events

🔵 ToolRelevance: 66.42%, +0.81% (improvements: 3, regressions: 0)

Baseline: master-1760616360 • Avg. case performance: ⏱️ 8.59 s, 🔢 0 tokens

insights_addition

🔵 SemanticSimilarity: 78.95%, -0.67% (improvements: 0, regressions: 2)

Baseline: master-1760616367 • Avg. case performance: ⏱️ 9.22 s, 🔢 6035 tokens, 💵 $0.0028 in tokens

combined_rename_and_add

🔵 SemanticSimilarity: 89.00%, +0.04% (improvements: 1, regressions: 0)

Baseline: master-1760616411 • Avg. case performance: ⏱️ 6.15 s, 🔢 5739 tokens, 💵 $0.0025 in tokens

tool_generate_hogql_query

🔵 no_mustache: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔴 sql_semantics_correctness: 46.67%, -7.14% (improvements: 1, regressions: 2)
🔵 sql_syntax_correctness: 86.67%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1760616421 • Avg. case performance: ⏱️ 9.93 s, 🔢 21989 tokens, 💵 $0.0447 in tokens

tool_filter_revenue_analytics

🔵 date_time_filtering_correctness: 100.00%, ±0.00% (improvements: 0, regressions: 0)
🔵 filter_generation_correctness: 100.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1760616472 • Avg. case performance: ⏱️ 2.17 s, 🔢 4113 tokens, 💵 $0.0087 in tokens

tool_filter_revenue_analytics_ask_user_for_help

🔵 ask_user_for_help_scorer: 0.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1760616480 • Avg. case performance: ⏱️ 2.02 s, 🔢 3577 tokens, 💵 $0.0075 in tokens

tool_search_session_recordings

🔴 date_time_filtering_correctness: 94.44%, -3.57% (improvements: 0, regressions: 1)
🔴 filter_generation_correctness: 96.03%, -2.04% (improvements: 1, regressions: 2)

Baseline: master-1760616520 • Avg. case performance: ⏱️ 7.20 s, 🔢 14987 tokens, 💵 $0.0308 in tokens

tool_search_session_recordings_ask_user_for_help

🔵 ask_user_for_help_scorer: 0.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1760616536 • Avg. case performance: ⏱️ 13.60 s, 🔢 58365 tokens, 💵 $0.1180 in tokens

tool_search_session_recordings

🔵 date_time_filtering_correctness: 97.22%, ±0.00% (improvements: 0, regressions: 0)
🔵 filter_generation_correctness: 98.41%, +0.79% (improvements: 1, regressions: 0)

Baseline: master-1760616520 • Avg. case performance: ⏱️ 7.72 s, 🔢 13979 tokens, 💵 $0.0287 in tokens

tool_search_session_recordings_ask_user_for_help

🔵 ask_user_for_help_scorer: 0.00%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1760616536 • Avg. case performance: ⏱️ 6.56 s, 🔢 27404 tokens, 💵 $0.0555 in tokens

filter_query_generation

🔵 SemanticSimilarity: 99.21%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1760616545 • Avg. case performance: ⏱️ 0.54 s, 🔢 554 tokens, 💵 $0.0012 in tokens

yaml_fixing

🔵 ExactMatch: 83.33%, ±0.00% (improvements: 0, regressions: 0)

Baseline: master-1760616555 • Avg. case performance: ⏱️ 1.00 s, 🔢 168 tokens, 💵 $0.0001 in tokens

Triggered by this commit.
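
Each metric line above reports the current score followed by its change against the baseline, so the baseline score can be recovered by subtraction. A small sketch, assuming the change is expressed in percentage points and that `±0.00%` means no change; `parse_metric` and `Metric` are hypothetical helpers, not part of the eval tooling.

```python
import re
from typing import NamedTuple


class Metric(NamedTuple):
    name: str
    score: float     # current score, in percent
    delta: float     # change against baseline, in percentage points (assumed)
    baseline: float  # score minus delta


_METRIC_RE = re.compile(
    r"(?P<name>\w+):\s*(?P<score>\d+(?:\.\d+)?)%,\s*(?P<delta>[+\-±]\d+(?:\.\d+)?)%"
)


def parse_metric(line: str) -> Metric:
    """Parse a line such as 'plan_correctness: 82.75%, -7.00% (...)'."""
    match = _METRIC_RE.search(line)
    if match is None:
        raise ValueError(f"not a metric line: {line!r}")
    score = float(match["score"])
    # "±0.00" carries no sign information, so drop the symbol before converting.
    delta = float(match["delta"].replace("±", ""))
    return Metric(match["name"], score, delta, round(score - delta, 2))
```

For example, under that assumption the `is_concise_and_focused` line (33.33%, -26.67%) corresponds to a baseline of 60.00%.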

skoob13 and others added 9 commits October 16, 2025 16:24

Revert "revert: "feat(max): loop executor revamp (#38527)" (#39674)"

a902ad6

This reverts commit 8ca251c.

fix: refactor root

1d6fceb

Revert "fix: refactor root"

16f9927

This reverts commit 6341ff1.

wip

fe61ef1

fix: inkeep crashing

735537d

fix: sql tool

7eee1d1

fix: disable streaming in insights

77fd8d5

Update query snapshots

036d672

feat: add test

24f8a3e

skoob13 force-pushed the feat/loop-executor-revamp-attempt-2 branch from bb11ccc to 24f8a3e Compare October 16, 2025 14:24

skoob13 requested a review from a team October 16, 2025 14:26

skoob13 marked this pull request as ready for review October 16, 2025 14:26

posthog-bot requested review from a team, ablaszkiewicz, daibhin, hpouillot, ioannisj, marandaneto, oliverb123 and rafaeelaudibert and removed request for a team October 16, 2025 14:28

skoob13 removed request for a team, daibhin, hpouillot, marandaneto and oliverb123 October 16, 2025 14:28

skoob13 removed request for ablaszkiewicz, ioannisj and rafaeelaudibert October 16, 2025 14:28

greptile-apps bot reviewed Oct 16, 2025

ee/hogai/graph/insights/nodes.py (resolved)

kappa90 approved these changes Oct 16, 2025

skoob13 and others added 2 commits October 16, 2025 16:38

fix: tests

c3ee30a

Fix thread when canceled with a ReasoningMessage in last place

f70ed41

Twixes approved these changes Oct 16, 2025

Twixes mentioned this pull request Oct 16, 2025

test(max): Add ThreadCanceledDuringReasoning story #39790

Open

skoob13 added 3 commits October 16, 2025 17:23

fix: review comments

e0934ef

Merge branch 'feat/loop-executor-revamp-attempt-2' of github.com:Post…

d108cec

…Hog/posthog into feat/loop-executor-revamp-attempt-2

fix: mypy

d738546

Update UI snapshots for chromium (2)

e7ea7a6

skoob13 added the evals-ready label ("Whether to run AI evals on this PR.") Oct 16, 2025

skoob13 added 2 commits October 16, 2025 17:56

fix: eval

42faf4f

Merge branch 'feat/loop-executor-revamp-attempt-2' of github.com:Post…

46ce4bf

…Hog/posthog into feat/loop-executor-revamp-attempt-2

skoob13 merged commit bcbe71a into master Oct 16, 2025
187 of 188 checks passed

skoob13 deleted the feat/loop-executor-revamp-attempt-2 branch October 16, 2025 16:25

4 participants
