feat(max): single loop executor attempt 2 #39717
Size Change: +28 B (0%), Total Size: 3.06 MB
This reverts commit 6341ff1.
Branch updated from bb11ccc to 24f8a3e.
82 files reviewed, 1 comment
:yolo:
ee/hogai/graph/inkeep_docs/nodes.py (outdated)
```python
# Original node has Anthropic messages, but Inkeep expects OpenAI messages
langchain_messages = convert_to_messages(
    convert_to_openai_messages(super()._construct_messages(messages, window_start_id, tool_calls_count))
conversation_window = self._window_manager.get_messages_in_window(messages, window_start_id)[-28:]
```
Worth adding a comment explaining why the number is 28, for other readers, while we're here.
Thanks, updated
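The `[-28:]` slice on the new line keeps only the most recent messages of the conversation window. The sketch below shows the kind of named constant and comment the reviewer asked for, and documents the slice's behavior. `INKEEP_WINDOW_SIZE` and `tail_window` are hypothetical names, not code from this PR, and the sketch deliberately does not guess why 28 was chosen.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

# Hypothetical constant: the thread does not state why 28 was chosen,
# so this only names *what* the slice does, not the reason for the value.
INKEEP_WINDOW_SIZE = 28


def tail_window(messages: Sequence[T], size: int = INKEEP_WINDOW_SIZE) -> list[T]:
    """Return at most `size` of the most recent messages, oldest first.

    Matches `messages[-size:]` for size > 0. A negative-start slice never
    raises when the sequence is shorter than `size`; it returns everything.
    """
    if size <= 0:
        # messages[-0:] is messages[0:], i.e. the *whole* list, so guard explicitly.
        return []
    return list(messages[-size:])


if __name__ == "__main__":
    history = [f"msg-{i}" for i in range(40)]
    window = tail_window(history)
    print(len(window), window[0], window[-1])  # 28 msg-12 msg-39
    print(tail_window(["a", "b"]))  # ['a', 'b']
```

Naming the limit keeps the "why" next to a single definition instead of a bare literal at the call site, and the `size <= 0` guard covers the one case where a negative-index slice silently does the opposite of what a window limit suggests.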
…Hog/posthog into feat/loop-executor-revamp-attempt-2
📸 UI snapshots have been updated. 2 snapshot changes in total: 0 added, 2 modified, 0 deleted.
Triggered by this commit.
…Hog/posthog into feat/loop-executor-revamp-attempt-2
🧠 AI eval results
Evaluated 29 experiments, comprising 61 metrics.

| Experiment | Metric | Score | Change | Improvements | Regressions | Baseline | Avg. case time | Tokens | Token cost |
|---|---|---|---|---|---|---|---|---|---|
| deep_research_onboarding | 🔵 covers_essential_topics | 100.00% | ±0.00% | 0 | 0 | master-1760615533 | 48.91 s | 0 | |
| tool_routing_dashboard_creation | 🔴 ToolRelevance | 59.10% | -17.00% | 1 | 5 | master-1760615555 | 8.30 s | 0 | |
| tool_call_dashboard_creation | 🔵 dashboard_creation_accuracy | 40.00% | ±0.00% | 1 | 1 | master-1760615566 | 22.30 s | 6523 | $0.0030 |
| funnel | 🔴 plan_correctness | 82.75% | -7.00% | 3 | 4 | master-1760615620 | 62.20 s | 8709 | $0.0185 |
| insight_evaluation_accuracy | 🟢 InsightEvaluationAccuracy | 75.00% | +25.00% | 1 | 0 | master-1760615758 | 16.71 s | 2804 | $0.0013 |
| memory | 🔴 ToolRelevance | 68.87% | -1.07% | 2 | 2 | master-1760615771 | 2.43 s | 1134 | $0.0032 |
| memory_onboarding | 🔴 has_correct_style | 50.00% | -16.67% | 0 | 1 | master-1760615784 | 116.52 s | 674 | $0.0033 |
| retention | 🔵 QueryKindSelection | 75.00% | ±0.00% | 0 | 0 | master-1760615956 | 62.02 s | 7693 | $0.0207 |
| root | 🔴 ToolRelevance | 69.14% | -18.11% | 4 | 36 | master-1760616013 | 11.23 s | 0 | |
| root_style | 🔴 style_checker | 85.00% | -5.00% | 1 | 2 | master-1760616034 | 18.53 s | 0 | |
| tool_routing_session_replay | 🔵 ToolRelevance | 94.86% | +0.14% | 4 | 3 | master-1760616083 | 11.32 s | 7668 | $0.0469 |
| session_summarization_no_context | 🔵 ToolRelevance | 97.99% | -0.35% | 1 | 1 | master-1760616096 | 8.74 s | 0 | |
| sql | 🔴 QueryKindSelection | 92.31% | -7.69% | 0 | 0 | master-1760616102 | 56.39 s | 10467 | $0.0237 |
| survey_analysis | 🔵 recommendation_quality | 100.00% | ±0.00% | 0 | 0 | master-1760616241 | 11.28 s | 2697 | $0.0088 |
| surveys | 🔵 feature_flag_integration | 0.00% | ±0.00% | 0 | 0 | master-1760616246 | 5.06 s | 5687 | $0.0122 |
| trends | 🟢 QueryKindSelection | 90.00% | +20.00% | 2 | 0 | master-1760616262 | 37.91 s | 11250 | $0.0230 |
| ui_context_actions | 🟢 ToolRelevance | 68.24% | +1.58% | 2 | 0 | master-1760616354 | 8.21 s | 0 | |
| ui_context_events | 🔵 ToolRelevance | 66.42% | +0.81% | 3 | 0 | master-1760616360 | 8.59 s | 0 | |
| insights_addition | 🔵 SemanticSimilarity | 78.95% | -0.67% | 0 | 2 | master-1760616367 | 9.22 s | 6035 | $0.0028 |
| combined_rename_and_add | 🔵 SemanticSimilarity | 89.00% | +0.04% | 1 | 0 | master-1760616411 | 6.15 s | 5739 | $0.0025 |
| tool_generate_hogql_query | 🔵 no_mustache | 100.00% | ±0.00% | 0 | 0 | master-1760616421 | 9.93 s | 21989 | $0.0447 |
| tool_filter_revenue_analytics | 🔵 date_time_filtering_correctness | 100.00% | ±0.00% | 0 | 0 | master-1760616472 | 2.17 s | 4113 | $0.0087 |
| tool_filter_revenue_analytics_ask_user_for_help | 🔵 ask_user_for_help_scorer | 0.00% | ±0.00% | 0 | 0 | master-1760616480 | 2.02 s | 3577 | $0.0075 |
| tool_search_session_recordings | 🔴 date_time_filtering_correctness | 94.44% | -3.57% | 0 | 1 | master-1760616520 | 7.20 s | 14987 | $0.0308 |
| tool_search_session_recordings_ask_user_for_help | 🔵 ask_user_for_help_scorer | 0.00% | ±0.00% | 0 | 0 | master-1760616536 | 13.60 s | 58365 | $0.1180 |
| tool_search_session_recordings | 🔵 date_time_filtering_correctness | 97.22% | ±0.00% | 0 | 0 | master-1760616520 | 7.72 s | 13979 | $0.0287 |
| tool_search_session_recordings_ask_user_for_help | 🔵 ask_user_for_help_scorer | 0.00% | ±0.00% | 0 | 0 | master-1760616536 | 6.56 s | 27404 | $0.0555 |
| filter_query_generation | 🔵 SemanticSimilarity | 99.21% | ±0.00% | 0 | 0 | master-1760616545 | 0.54 s | 554 | $0.0012 |
| yaml_fixing | 🔵 ExactMatch | 83.33% | ±0.00% | 0 | 0 | master-1760616555 | 1.00 s | 168 | $0.0001 |

Triggered by this commit.
Problem
This is the second attempt at the new loop executor, with the earlier issues fixed.
Changes
How did you test this code?
Unit tests & manual testing
Changelog: (features only) Is this feature complete?
Max has become a more intelligent agent.