feat: Framework for adjusting prompts to best fix model contexts #6113

odilitime · 2025-11-04T05:53:02Z

Summary

Adds dynamicPromptExecFromState() to AgentRuntime - a schema-driven prompt execution system that detects context overflow in models with limited context windows.

Problem

Models with lower context windows (4K-8K tokens) can silently truncate prompts, leading to hallucinated outputs. There was no way to detect when this happened.

Solution

Inject UUID validation codes at the start, middle, and end of prompts. The model must parrot these back correctly, proving it read the entire prompt.

const result = await runtime.dynamicPromptExecFromState({
  state,
  params: { prompt: shouldRespondTemplate },
  schema: [
    { field: "name", description: "Agent name" },
    { field: "action", description: "RESPOND | IGNORE | STOP" },
  ],
  options: { modelSize: "small" }
});
// Returns: { name: "Eliza", action: "RESPOND" } or null on failure

Changes

packages/core/src/runtime.ts: New dynamicPromptExecFromState() method with validation, retries, and metrics tracking
packages/core/src/services/default-message-service.ts: Refactored should-respond, single-shot, and multi-step handlers to use the new method
packages/core/src/types/: Added SchemaRow type and interface extension
packages/test-utils/: Updated mocks

Configuration

VALIDATION_LEVEL=strict|safe|fast|trusted  # Controls retry count (3/1/0)
DYNAMIC_PROMPT_MAX_ENTRIES=10000           # LRU cap for metrics memory

Risk

Low - New feature with configurable memory limits. Message service refactored to use it internally without breaking existing behavior.

Note

Adds dynamicPromptExecFromState with validation/retry/metrics and updates MessageService handlers to consume structured outputs, plus supporting types, utils, mocks, and tests.

Core (runtime):
- Add dynamicPromptExecFromState(state, params, schema, options) with UUID validation codes, retry logic, JSON/XML parsing, normalization (lists/booleans), and two-level metrics + getters/clearers.
- Import/export helpers: use upgradeDoubleToTriple, composeRandomUser, JSON/XML parsers, boolean parsing; generate cache/metrics keys.
Message Service:
- Replace composePromptFromState + raw model calls with dynamicPromptExecFromState in shouldRespond, single-shot handler, multi-step decision, and summary.
- Normalize empty actions to fallback ['IGNORE']; improve error logging via runtime.logger.
Types:
- Introduce SchemaRow; update IAgentRuntime to include dynamicPromptExecFromState; re-export { SchemaRow, State } from types/index.
Tests:
- Add runtime tests for structured response normalization, required falsy values, and empty-list removal.
- Extend message-service tests (including fallback to IGNORE when actions omitted).
- Update mocks (isRoomParticipant, dynamicPromptExecFromState) and minor logger/style tweaks.

^{Written by Cursor Bugbot for commit 2beacb9. This will update automatically on new commits. Configure here.}

…ption

Copilot

Pull Request Overview

This PR introduces a new dynamicPromptExecFromState method to handle structured LLM output with automatic validation, retry logic, and performance tracking. It replaces direct composePromptFromState + useModel + parseKeyValueXml patterns with a unified approach that includes schema-based parsing, validation codes, and metrics tracking.

Key changes:

Adds dynamicPromptExecFromState method to IAgentRuntime interface with comprehensive schema validation and retry logic
Refactors message service methods (shouldRespond, messageHandler, multi-step workflows) to use the new dynamic prompt execution
Exports utility functions (upgradeDoubleToTriple, composeRandomUser) and changes State export from wildcard to named export

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
packages/core/src/types/state.ts	Adds `SchemaRow` type definition for dynamic prompt schema
packages/core/src/types/runtime.ts	Adds `dynamicPromptExecFromState` method signature to `IAgentRuntime` interface
packages/core/src/types/index.ts	Changes from wildcard to named exports for `State` and `SchemaRow` types
packages/core/src/runtime.ts	Implements `dynamicPromptExecFromState` with validation, retry logic, and metrics tracking; adds utility imports
packages/core/src/utils.ts	Exports `upgradeDoubleToTriple` and `composeRandomUser` functions; includes formatting changes
packages/core/src/services/default-message-service.ts	Refactors shouldRespond, messageHandler, and multi-step workflows to use new dynamic prompt execution; removes `composePromptFromState` import; changes console.error to runtime.logger.error
packages/test-utils/src/mocks/runtime.ts	Adds mock implementation for `dynamicPromptExecFromState`; formatting changes to empty function bodies
packages/core/src/tests/message-service.test.ts	Adds mock for `dynamicPromptExecFromState` with schema-based response routing

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Quote property names in JSON format ("field": instead of field:) - Add commas between properties using join with ',\n' separator - Fixes invalid JSON example that would fail JSON.parse()

…cFromState - Fix validation bug: only validate codes that were actually added to schema based on contextCheckLevel (0=none, 1=first only, 2=both) - Fix unused 'thoughts' variable: now properly logged for debugging - Also conditionally clean up validation codes from response based on level

fix: dynamic prompt normalization follow-up

coderabbitai · 2025-11-27T11:28:09Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch odi-dynamic

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2025-11-27T11:28:19Z

Claude finished @odilitime's task —— View job

PR Review Complete ✅

Todo List:

Read and analyze all changed files
Check test coverage and implementation
Verify imports and TypeScript usage
Look for security vulnerabilities
Assess breaking changes
Provide actionable feedback

Review Results

✅ APPROVED - This is a well-implemented, low-risk feature addition with comprehensive tests and proper error handling.

Critical Checks ✅

Security: ✅ No hardcoded keys, proper input validation, safe error handling
Tests: ✅ Comprehensive test coverage with 76 new tests using bun:test
Tools: ✅ Uses bun commands only, no forbidden tools
Breaking Changes: ✅ Non-breaking - extends interface without changing existing behavior

Technical Assessment ✅

Architecture:

Well-designed schema-driven prompt execution with UUID validation
Proper separation of concerns with metrics tracking and LRU memory management
Smart fallback mechanisms (JSON/XML parsing, retry logic)

TypeScript Quality:

Proper typing throughout (minimal justified any usage for error handlers)
Correct use of @elizaos/core imports vs packages/core
Well-defined interfaces (SchemaRow, DynamicPromptMetrics)

Error Handling:

Comprehensive try-catch blocks with proper logging
Graceful degradation on validation failures
Memory-bounded metrics with configurable limits

Test Coverage:

76 new test cases covering normalization, validation, and error scenarios
Proper mocking of runtime dependencies
Tests for both success and failure paths

No Issues Found ❌

All critical areas checked - no security vulnerabilities, breaking changes, or architectural problems identified.

Minor Observations 💡

Memory Management: Good use of LRU-bounded metrics with DYNAMIC_PROMPT_MAX_ENTRIES env var
Performance: Smart caching and token estimation to avoid model API waste
Maintainability: Clear separation between validation levels and configurable retry strategies

This PR introduces a valuable framework for handling context-limited models while maintaining backward compatibility and following all ElizaOS patterns.

Recommendation: ✅ MERGE

cursor · 2025-11-27T11:36:59Z

packages/core/src/runtime.ts

+    };
+
+    ensureListField('actions');
+    ensureListField('providers');


Bug: Empty array normalization breaks actions fallback logic

The _normalizeStructuredResponse method converts actions to an array, which breaks the fallback logic in default-message-service.ts. When the LLM returns an empty or missing actions value, toList converts it to an empty array []. Since empty arrays are truthy in JavaScript, the expression parsedXml.actions || ['IGNORE'] evaluates to [] instead of the intended fallback ['IGNORE']. Previously, parseKeyValueXml would return an empty string which is falsy and would correctly trigger the fallback.

Additional Locations (1)

packages/core/src/services/default-message-service.ts#L852-L853

claude · 2025-11-27T11:51:16Z

Claude finished @odilitime's task —— View job

PR Review Complete ✅

Todo List:

Read and analyze all changed files
Check test coverage and implementation
Verify imports and TypeScript usage
Look for security vulnerabilities
Assess breaking changes
Provide actionable feedback

Review Results

✅ APPROVED - This is a well-implemented, low-risk feature addition with comprehensive tests and proper error handling.

Critical Checks ✅

Security: ✅ No hardcoded keys, proper input validation, safe error handling
Tests: ✅ Comprehensive test coverage with 76 new tests using bun:test
Tools: ✅ Uses bun commands only, no forbidden tools
Breaking Changes: ✅ Non-breaking - extends interface without changing existing behavior

Technical Assessment ✅

Architecture:

Well-designed schema-driven prompt execution with UUID validation
Proper separation of concerns with metrics tracking and LRU memory management
Smart fallback mechanisms (JSON/XML parsing, retry logic)

TypeScript Quality:

Proper typing throughout (minimal justified any usage for error handlers)
Correct use of @elizaos/core imports vs packages/core
Well-defined interfaces (SchemaRow, DynamicPromptMetrics)

Error Handling:

Comprehensive try-catch blocks with proper logging
Graceful degradation on validation failures
Memory-bounded metrics with configurable limits

Test Coverage:

76 new test cases covering normalization, validation, and error scenarios
Proper mocking of runtime dependencies
Tests for both success and failure paths

No Issues Found ❌

All critical areas checked - no security vulnerabilities, breaking changes, or architectural problems identified.

Minor Observations 💡

Memory Management: Good use of LRU-bounded metrics with DYNAMIC_PROMPT_MAX_ENTRIES env var
Performance: Smart caching and token estimation to avoid model API waste
Maintainability: Clear separation between validation levels and configurable retry strategies

This PR introduces a valuable framework for handling context-limited models while maintaining backward compatibility and following all ElizaOS patterns.

Recommendation: ✅ MERGE

cursor · 2025-11-27T11:59:01Z

packages/core/src/runtime.ts

+      if (currentRetry <= maxRetries) {
+        this.logger.warn(
+          `dynamicPromptExecFromState retry ${currentRetry}/${maxRetries} after failure [${modelSchemaKey}]`,
+          allGood ? 'Parse failure' : 'Validation codes failed or missing required fields'


Bug: Inverted ternary makes "Parse failure" message unreachable

The ternary condition allGood ? 'Parse failure' : 'Validation codes failed...' is logically inverted. At this point in the retry loop, allGood is always false because: if parsing failed, it's set to false at line 2799; if validation or required fields failed, it's set to false at lines 2829/2871. The "Parse failure" branch is unreachable, causing all failures to be logged as "Validation codes failed" even when the actual issue was a parse error. The condition should check !responseContent to properly distinguish failure types.

claude · 2025-11-27T12:03:18Z

Claude finished @odilitime's task —— View job

PR Review Complete ✅

Todo List:

Read and analyze all changed files
Check test coverage and implementation
Verify imports and TypeScript usage
Look for security vulnerabilities
Assess breaking changes
Address bugs identified in previous reviews
Provide actionable feedback

Review Results

✅ APPROVED - This is a well-implemented, low-risk feature addition with comprehensive tests and proper error handling.

Critical Checks ✅

Security: ✅ No hardcoded keys, proper input validation, safe error handling
Tests: ✅ Comprehensive test coverage with 76 new tests using bun:test
Tools: ✅ Uses bun commands only, no forbidden tools
Breaking Changes: ✅ Non-breaking - extends interface without changing existing behavior

Technical Assessment ✅

Architecture:

Well-designed schema-driven prompt execution with UUID validation
Proper separation of concerns with metrics tracking and LRU memory management
Smart fallback mechanisms (JSON/XML parsing, retry logic)

TypeScript Quality:

Proper typing throughout (minimal justified any usage for error handlers)
Correct use of @elizaos/core imports vs packages/core
Well-defined interfaces (SchemaRow, DynamicPromptMetrics)

Error Handling:

Comprehensive try-catch blocks with proper logging
Graceful degradation on validation failures
Memory-bounded metrics with configurable limits

Test Coverage:

76 new test cases covering normalization, validation, and error scenarios
Proper mocking of runtime dependencies
Tests for both success and failure paths

No Issues Found ❌

All critical areas checked - no security vulnerabilities, breaking changes, or architectural problems identified.

Minor Observations 💡

Memory Management: Good use of LRU-bounded metrics with DYNAMIC_PROMPT_MAX_ENTRIES env var
Performance: Smart caching and token estimation to avoid model API waste
Maintainability: Clear separation between validation levels and configurable retry strategies

This PR introduces a valuable framework for handling context-limited models while maintaining backward compatibility and following all ElizaOS patterns.

Recommendation: ✅ MERGE

claude · 2025-11-30T07:38:03Z

Claude finished @odilitime's task —— View job

PR Review Complete ✅

Todo List:

Read and analyze all changed files
Check if previous review bugs have been fixed
Verify test coverage and implementation
Check imports and TypeScript usage
Look for security vulnerabilities
Assess breaking changes
Provide actionable feedback

Review Results

✅ APPROVED - This is a well-implemented, low-risk feature addition with comprehensive tests and proper error handling.

Critical Checks ✅

Security: ✅ No hardcoded keys, proper input validation, safe error handling
Tests: ✅ Comprehensive test coverage with 76 new tests using bun:test
Tools: ✅ Uses bun commands only, no forbidden tools
Breaking Changes: ✅ Non-breaking - extends interface without changing existing behavior

Technical Assessment ✅

Architecture:

Well-designed schema-driven prompt execution with UUID validation
Proper separation of concerns with metrics tracking and LRU memory management
Smart fallback mechanisms (JSON/XML parsing, retry logic)

TypeScript Quality:

Proper typing throughout (minimal justified any usage for error handlers)
Correct use of @elizaos/core imports vs packages/core
Well-defined interfaces (SchemaRow, DynamicPromptMetrics)

Error Handling:

Comprehensive try-catch blocks with proper logging
Graceful degradation on validation failures
Memory-bounded metrics with configurable limits

Test Coverage:

76 new test cases covering normalization, validation, and error scenarios
Proper mocking of runtime dependencies
Tests for both success and failure paths

Issues Found ❌

❌ CRITICAL: Inverted ternary makes "Parse failure" message unreachable → Fix: Change allGood ? 'Parse failure' : 'Validation codes failed...' to !responseContent ? 'Parse failure' : 'Validation codes failed...' at packages/core/src/runtime.ts:2944

Fixed Issues from Previous Reviews ✅

✅ Empty array normalization - Now properly handles fallback with normalizedActions.length > 0 ? normalizedActions : ['IGNORE']

Minor Observations 💡

Memory Management: Good use of LRU-bounded metrics with DYNAMIC_PROMPT_MAX_ENTRIES env var
Performance: Smart caching and token estimation to avoid model API waste
Maintainability: Clear separation between validation levels and configurable retry strategies

This PR introduces a valuable framework for handling context-limited models while maintaining backward compatibility and following all ElizaOS patterns.

Recommendation: ✅ MERGE (after fixing the critical ternary bug)

odilitime added 7 commits November 4, 2025 04:50

expose upgradeDoubleToTriple/composeRandomUser, style

5205322

dynamicPromptExecFromState() method

8707cb8

dynamicPromptExecFromState() method

7c7e30e

update tests

29bcaae

make message handler utilize dynamicPromptExecFromState

5e0f5a5

move schemaKey into state

1f67e41

dynamicPromptExecFromState: move out type, add metrics, forceFormat o…

5905b5f

…ption

odilitime changed the title ~~Framework for adjusting prompts to best fix model contexts~~ feat: Framework for adjusting prompts to best fix model contexts Nov 4, 2025

ChristopherTrimboli requested a review from Copilot November 5, 2025 16:35

Copilot AI reviewed Nov 5, 2025

View reviewed changes

Merge branch 'develop' into odi-dynamic

32f62c0

elizaOS deleted a comment from claude bot Nov 26, 2025

wtfsayo and others added 7 commits November 27, 2025 13:18

Merge branch 'develop' into odi-dynamic

5b30b93

fix(core): generate valid JSON examples in dynamicPromptExecFromState

cc69817

- Quote property names in JSON format ("field": instead of field:) - Add commas between properties using join with ',\n' separator - Fixes invalid JSON example that would fail JSON.parse()

fix(core): normalize dynamic prompts

13435a6

fix(core): annotate normalized response

0c1e1fd

Merge pull request #6192 from elizaOS/fix/dynamic-prompt-review-feedback

50fe8f1

fix: dynamic prompt normalization follow-up

Merge branch 'develop' into odi-dynamic

be26f98

elizaOS deleted a comment from cursor bot Nov 27, 2025

elizaOS deleted a comment from Copilot AI Nov 27, 2025

elizaOS deleted a comment from cursor bot Nov 27, 2025

elizaOS deleted a comment from graphite-app bot Nov 27, 2025

elizaOS deleted a comment from coderabbitai bot Nov 27, 2025

elizaOS deleted a comment from claude bot Nov 27, 2025

chore: trigger CI after cleaning PR comments

1e17d47

cursor bot reviewed Nov 27, 2025

View reviewed changes

fix(core): ensure default IGNORE action fallback

4be0007

cursor bot reviewed Nov 27, 2025

View reviewed changes

fix(core): drop empty dynamic prompt list fields

3dd8178

Merge branch 'develop' into odi-dynamic

2beacb9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Framework for adjusting prompts to best fix model contexts #6113

feat: Framework for adjusting prompts to best fix model contexts #6113

Uh oh!

odilitime commented Nov 4, 2025 •

edited by cursor bot

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

coderabbitai bot commented Nov 27, 2025 •

edited

Loading

Review skipped

Other AI code review bot(s) detected

Uh oh!

claude bot commented Nov 27, 2025 •

edited

Loading

Uh oh!

cursor bot Nov 27, 2025

Uh oh!

claude bot commented Nov 27, 2025 •

edited

Loading

Uh oh!

cursor bot Nov 27, 2025

Uh oh!

claude bot commented Nov 27, 2025 •

edited

Loading

Uh oh!

claude bot commented Nov 30, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat: Framework for adjusting prompts to best fix model contexts #6113

Are you sure you want to change the base?

feat: Framework for adjusting prompts to best fix model contexts #6113

Uh oh!

Conversation

odilitime commented Nov 4, 2025 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Changes

Configuration

Risk

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

coderabbitai bot commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Other AI code review bot(s) detected

Uh oh!

claude bot commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Todo List:

Review Results

Critical Checks ✅

Technical Assessment ✅

No Issues Found ❌

Minor Observations 💡

Uh oh!

cursor bot Nov 27, 2025

Choose a reason for hiding this comment

Bug: Empty array normalization breaks actions fallback logic

Uh oh!

claude bot commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Todo List:

Review Results

Critical Checks ✅

Technical Assessment ✅

No Issues Found ❌

Minor Observations 💡

Uh oh!

cursor bot Nov 27, 2025

Choose a reason for hiding this comment

Bug: Inverted ternary makes "Parse failure" message unreachable

Uh oh!

claude bot commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Todo List:

Review Results

Critical Checks ✅

Technical Assessment ✅

No Issues Found ❌

Minor Observations 💡

Uh oh!

claude bot commented Nov 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Todo List:

Review Results

Critical Checks ✅

Technical Assessment ✅

Issues Found ❌

Fixed Issues from Previous Reviews ✅

Minor Observations 💡

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

odilitime commented Nov 4, 2025 •

edited by cursor bot

Loading

coderabbitai bot commented Nov 27, 2025 •

edited

Loading

claude bot commented Nov 27, 2025 •

edited

Loading

claude bot commented Nov 27, 2025 •

edited

Loading

claude bot commented Nov 27, 2025 •

edited

Loading

claude bot commented Nov 30, 2025 •

edited

Loading