Date: 2025-10-28 Status: Identified Priority: Critical
Issue: Same benchmark (200-jup-swap-then-lend-deposit) succeeds via CLI (100% score) but fails via API due to different Jupiter swap tool response formats between execution paths.
Root Cause: Two different Jupiter swap tool implementations with inconsistent response structures:
- CLI path: Uses `jupiter_swap_flow.rs` (flow-aware tool)
- API path: Uses `jupiter_swap.rs` (standard tool)
Impact: Step 2 (deposit) receives no swap amount data from step 1, so the LLM guesses the wrong amount.
Evidence:
- CLI success: amount=394358118 (394.358 USDC), read from `swap_details.output_amount`
- API failure: amount=1000000000 (1000 USDC), because the `swap_details` structure is missing
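The amounts above are raw base units; USDC on Solana uses 6 decimals, so the UI figure is the raw value divided by 10^6. A minimal Rust sketch of that conversion (the function name is illustrative, not taken from the reev codebase):

```rust
/// Format a raw SPL token amount (integer base units) as a decimal string.
/// Integer math avoids floating-point rounding on large balances.
/// Assumes decimals <= 19 so that 10^decimals fits in a u64.
fn format_ui_amount(raw: u64, decimals: u32) -> String {
    if decimals == 0 {
        return raw.to_string();
    }
    let scale = 10u64.pow(decimals);
    let whole = raw / scale;
    let frac = raw % scale;
    // Left-pad the fractional part so 1_000_000_000 renders as "1000.000000".
    format!("{}.{:0width$}", whole, frac, width = decimals as usize)
}
```

With 6 decimals, 394358118 formats as 394.358118 and 1000000000 as 1000.000000, matching the CLI and API figures.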
Critical Files:
- `crates/reev-tools/src/tools/jupiter_swap_flow.rs` - Flow-aware tool with swap_details
- `crates/reev-tools/src/tools/jupiter_swap.rs` - Standard tool without swap_details
- `crates/reev-agent/src/flow/agent.rs` - Tool routing logic
- `crates/reev-context/src/lib.rs` - `process_step_result_for_context()` expects swap_details
Fix Strategy: Unify Jupiter swap tool implementations to ensure consistent response format across all execution paths.
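Whichever path runs, step 2 needs a structured `swap_details.output_amount`. A minimal sketch of that contract, assuming hypothetical `SwapDetails` and `SwapToolResponse` types (only `swap_details` and `output_amount` come from this issue) and a hypothetical extractor that fails loudly instead of letting the LLM guess:

```rust
/// Hypothetical structured swap result; only `output_amount` is named in the issue.
#[derive(Debug, Clone, PartialEq)]
struct SwapDetails {
    input_mint: String,
    output_mint: String,
    input_amount: u64,
    output_amount: u64,
}

/// Hypothetical unified tool response returned by both CLI and API paths.
#[derive(Debug, Clone)]
struct SwapToolResponse {
    signature: String,
    swap_details: Option<SwapDetails>,
}

/// Extract the amount step 2 should deposit. Returning an error instead of a
/// default value surfaces the missing swap_details rather than hiding it.
fn deposit_amount_from_step(resp: &SwapToolResponse) -> Result<u64, String> {
    resp.swap_details
        .as_ref()
        .map(|d| d.output_amount)
        .ok_or_else(|| format!("step result {} has no swap_details", resp.signature))
}
```

On the API path as described above, `swap_details` would be `None` and this extractor would return an error instead of producing a 1000 USDC guess.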
Date: 2025-06-17 Status: Open Priority: High
The AI model requested 1,000,000,000,000 USDC (1 trillion) for the deposit in step 2 of benchmark 200-jup-swap-then-lend-deposit, despite only 383,193,564 USDC being available in context.
Status: Significant Improvement 🎉
- Before: Complete failure due to trillion USDC requests
- After: 75% score with custom program errors (0x1, 0xffff)
- Issue: No longer requests implausible amounts, but now hits execution errors
Fixes Applied:
- Fixed context serialization to use numbers instead of strings
- Enhanced tool description to be more explicit about reading exact balances
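The serialization fix can be illustrated by rendering a balance entry with the amount as a JSON number rather than a quoted string. This sketch hand-builds the JSON to stay dependency-free (real code would more likely use a serializer such as serde_json); the field names are assumptions:

```rust
/// Render one balance entry for the LLM context as compact JSON.
/// `amount` is emitted unquoted so the model reads it as a number.
/// Assumes `symbol` needs no JSON escaping (true for simple tickers like "USDC").
fn render_balance_entry(symbol: &str, raw_amount: u64) -> String {
    format!("{{\"symbol\":\"{}\",\"amount\":{}}}", symbol, raw_amount)
}

/// The pre-fix shape, for comparison: the same value as a quoted string.
fn render_balance_entry_as_string(symbol: &str, raw_amount: u64) -> String {
    format!("{{\"symbol\":\"{}\",\"amount\":\"{}\"}}", symbol, raw_amount)
}
```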
Date: 2025-10-26 Status: In Progress Priority: High
Issue: GLM models (glm-4.6-coding) routed through reev-agent generate wrong recipient ATAs for SPL transfers. Instead of using the pre-created ATAs from the benchmark setup, the LLM generates new ATAs or uses incorrect ATA names.
Symptoms:
- `002-spl-transfer` score: 56.2% with "invalid account data for instruction" error
- LLM generates a transaction with the wrong recipient ATA: "8RXifzZ34i3E7qTcvYFaUvCRaswcJBDBXrPGgrwPZxTo" instead of the expected "BmCGQJCPZHrAzbLCjHd1JBQAxF24jrReU3fPwN6ri6a7"
Root Cause:
- LLM should use the placeholder name `RECIPIENT_USDC_ATA` in tool calls, but generates a new recipient ATA instead
- Context confusion from the RESOLVED ADDRESSES section (already fixed, but still affecting GLM behavior)
- Possible misinterpretation of recipient parameters vs ATA placeholders
- FIXED: Different GLM agents had inconsistent context and wallet handling
✅ COMPLETED FIXES:
- UNIFIED GLM LOGIC: Created `UnifiedGLMAgent` with shared context and wallet handling
- IDENTICAL CONTEXT: Both `OpenAIAgent` and `ZAIAgent` now use the same context building logic
- SHARED COMPONENTS: Wallet info creation and prompt mapping are now identical
- PROVIDER-SPECIFIC: Only request/response handling differs between implementations
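The split described above (shared context building, provider-specific request/response handling) maps naturally onto a generic agent over a provider trait. A minimal sketch; `ProviderClient`, `GlmAgent`, and the mock providers are hypothetical stand-ins, not the real `UnifiedGLMAgent`, `OpenAIAgent`, or `ZAIAgent` code:

```rust
/// Provider-specific part: only request/response handling differs.
trait ProviderClient {
    fn name(&self) -> &'static str;
    fn complete(&self, prompt: &str) -> String;
}

/// Shared part: context and wallet info are built the same way for every provider.
struct GlmAgent<P: ProviderClient> {
    provider: P,
}

impl<P: ProviderClient> GlmAgent<P> {
    fn build_context(&self, wallet: &str, key_map: &[(&str, &str)]) -> String {
        let mut ctx = format!("WALLET: {}\n", wallet);
        for (placeholder, address) in key_map {
            ctx.push_str(&format!("{} => {}\n", placeholder, address));
        }
        ctx
    }

    fn run(&self, wallet: &str, key_map: &[(&str, &str)], task: &str) -> String {
        let prompt = format!("{}TASK: {}", self.build_context(wallet, key_map), task);
        self.provider.complete(&prompt)
    }
}

/// Mock providers that just tag the prompt, standing in for HTTP clients.
struct OpenAiProvider;
struct ZaiProvider;

impl ProviderClient for OpenAiProvider {
    fn name(&self) -> &'static str {
        "openai"
    }
    fn complete(&self, prompt: &str) -> String {
        format!("[{}] {}", self.name(), prompt)
    }
}

impl ProviderClient for ZaiProvider {
    fn name(&self) -> &'static str {
        "zai"
    }
    fn complete(&self, prompt: &str) -> String {
        format!("[{}] {}", self.name(), prompt)
    }
}
```

Because `build_context` lives on the generic agent, the two providers cannot drift apart in how they present wallet info or placeholders; only `complete` varies.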
Technical Requirements:
- Test Unified Logic: Verify unified GLM logic resolves context inconsistencies
- Improve ATA Resolution Logic: Enhance SPL transfer tool to better prioritize pre-created ATAs from key_map over generated ones
- Strengthen Context Instructions: Make context warnings more explicit about using placeholder names vs direct addresses
- Test Across GLM Variants: Verify fix works with different GLM model implementations
- Documentation Update: Update documentation with clear examples of correct ATA usage
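One way to make the tool prioritize key_map entries is to resolve the LLM's `recipient_pubkey` argument against key_map first and refuse anything that is neither a placeholder nor a pre-created address. This sketch assumes that rejecting (rather than deriving a fresh ATA) is acceptable for benchmark runs; the function name and error text are hypothetical:

```rust
use std::collections::HashMap;

/// Resolve the recipient argument an LLM passed to an SPL transfer tool.
/// Pre-created ATAs in key_map win; anything else is rejected rather than
/// derived, so a freshly generated or wrong ATA cannot slip through.
fn resolve_recipient_ata(arg: &str, key_map: &HashMap<String, String>) -> Result<String, String> {
    // 1. Placeholder name such as "RECIPIENT_USDC_ATA".
    if let Some(addr) = key_map.get(arg) {
        return Ok(addr.clone());
    }
    // 2. A literal address that already matches a pre-created entry.
    if key_map.values().any(|v| v == arg) {
        return Ok(arg.to_string());
    }
    // 3. Anything else is treated as an error, not silently accepted.
    Err(format!("recipient '{}' is not a pre-created ATA in key_map", arg))
}
```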
Status: In Progress
Priority: High
Description: Fix the SPL transfer tool to properly resolve pre-created ATAs and prevent the LLM from generating incorrect recipient addresses
Background:
- SOL transfer issue was successfully resolved by improving context instructions
- However, SPL transfers still fail because GLM models generate wrong recipient ATAs despite having pre-created ones in key_map
- Local agents work perfectly, indicating the issue is specific to GLM model routing through reev-agent
- ✅ COMPLETED: Unified GLM logic architecture - both OpenAIAgent and ZAIAgent now use identical context and wallet handling
Technical Requirements:
- Investigate LLM Tool Calls: Debug exactly what recipient_pubkey value LLM is using in spl_transfer calls
- Improve ATA Resolution Logic: Enhance SPL transfer tool to better prioritize pre-created ATAs from key_map over generated ones
- Strengthen Context Instructions: Make context warnings more explicit about using placeholder names vs direct addresses
- Test Across GLM Variants: Verify fix works with different GLM model implementations
- Documentation Update: Update documentation with clear examples of correct ATA usage
Implementation Steps:
- ✅ UNIFIED GLM ARCHITECTURE: Refactored both OpenAIAgent and ZAIAgent to use shared logic
- Debug Current Behavior: Add extensive logging to SPL transfer tool to track LLM parameter usage
- Enhance Key Map Resolution: Improve how tool looks up and prioritizes ATAs from key_map over generated ones
- Context Clarification: Strengthen RESOLVED ADDRESSES section instructions for GLM models
- Comprehensive Testing: Test fix across multiple GLM benchmarks and model variants
- Documentation Update: Update documentation with clear examples of correct ATA usage
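For the "Debug Current Behavior" step, a cheap first diagnostic is to log which kind of value the LLM put in `recipient_pubkey` before any resolution runs. A hypothetical sketch (the enum and log format are assumptions, not existing reev logging):

```rust
/// Classification of the raw recipient argument supplied by the LLM.
#[derive(Debug, PartialEq)]
enum RecipientKind {
    Placeholder,    // e.g. "RECIPIENT_USDC_ATA"
    KnownAddress,   // literal address found among key_map values
    UnknownAddress, // anything else: likely a generated or wrong ATA
}

fn classify_recipient(arg: &str, key_map: &[(&str, &str)]) -> RecipientKind {
    if key_map.iter().any(|(name, _)| *name == arg) {
        RecipientKind::Placeholder
    } else if key_map.iter().any(|(_, addr)| *addr == arg) {
        RecipientKind::KnownAddress
    } else {
        RecipientKind::UnknownAddress
    }
}

/// One log line per spl_transfer call, suitable for grepping across runs.
fn log_line(arg: &str, key_map: &[(&str, &str)]) -> String {
    format!(
        "spl_transfer recipient_pubkey={} kind={:?}",
        arg,
        classify_recipient(arg, key_map)
    )
}
```

Counting `UnknownAddress` lines per model would show whether GLM runs still bypass the placeholder.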
Current Critical Priority: Jupiter swap tool unification takes precedence over GLM ATA resolution fix.
Add this task to TASKS.md and track progress in ISSUES.md
- Analyze why two different implementations exist
- Determine if `jupiter_swap_flow.rs` should replace `jupiter_swap.rs` entirely
- Update tool registration and discovery mechanisms
- Document proper usage patterns for each tool type
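If the analysis concludes that `jupiter_swap_flow.rs` should replace `jupiter_swap.rs` entirely, tool registration can be reduced to a single entry that both execution paths consult. A minimal sketch with a hypothetical `Tool` trait and registry (not the actual reev-tools registration API):

```rust
use std::collections::HashMap;

/// Hypothetical minimal tool interface, not the real reev-tools trait.
trait Tool {
    fn implementation(&self) -> &'static str;
}

/// Stand-in for the flow-aware tool in jupiter_swap_flow.rs.
struct JupiterSwapFlowTool;

impl Tool for JupiterSwapFlowTool {
    fn implementation(&self) -> &'static str {
        "jupiter_swap_flow"
    }
}

/// Single registration point: the public name "jupiter_swap" maps to exactly
/// one implementation, so CLI and API lookups cannot diverge.
fn build_registry() -> HashMap<&'static str, Box<dyn Tool>> {
    let mut registry: HashMap<&'static str, Box<dyn Tool>> = HashMap::new();
    registry.insert("jupiter_swap", Box::new(JupiterSwapFlowTool));
    registry
}

/// Shared lookup used by both the CLI and the API entry points.
fn lookup(registry: &HashMap<&'static str, Box<dyn Tool>>, name: &str) -> Option<&'static str> {
    registry.get(name).map(|tool| tool.implementation())
}
```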
Acceptance Criteria:
- Unified GLM logic implemented in common module
- Both OpenAIAgent and ZAIAgent use identical context building
- Wallet info creation and prompt mapping are shared
- Only request/response handling differs between implementations
- Local agent works correctly (SOL and SPL transfers)
- GLM-4.6-coding agent works correctly (SOL transfers)
- GLM-4.6-coding agent works correctly (SPL transfers)
- No regression in other agent types
- Improved context prevents address confusion
- All diagnostics pass
Status: In Progress
Priority: High
Description: Unify Jupiter swap tool implementations to ensure consistent swap_details response format across CLI and API execution paths.
Background:
Same benchmark (200-jup-swap-then-lend-deposit) succeeds via CLI (100% score) but fails via API due to different Jupiter swap tool response formats between execution paths.
Root Cause: Two different Jupiter swap tool implementations with inconsistent response structures:
- CLI path: Uses `jupiter_swap_flow.rs` (flow-aware tool)
- API path: Uses `jupiter_swap.rs` (standard tool)
Impact: Step 2 (deposit) receives no swap amount data from step 1, so the LLM guesses the wrong amount.
Evidence:
- CLI success: amount=394358118 (394.358 USDC), read from `swap_details.output_amount`
- API failure: amount=1000000000 (1000 USDC), because the `swap_details` structure is missing
Critical Files:
- `crates/reev-tools/src/tools/jupiter_swap_flow.rs` - Flow-aware tool with swap_details
- `crates/reev-tools/src/tools/jupiter_swap.rs` - Standard tool without swap_details
- `crates/reev-agent/src/flow/agent.rs` - Tool routing logic
- `crates/reev-context/src/lib.rs` - `process_step_result_for_context()` expects swap_details
Fix Strategy:
Unify the Jupiter swap tool implementations so that both CLI and API use a consistent tool that provides structured swap_details for multi-step flow communication.
Implementation Options:
- Option A: Tool Unification (Recommended)
  - Merge both implementations into a single flow-aware Jupiter swap tool
  - Ensure a consistent `swap_details` response format
  - Update tool registration to use the unified tool only
- Option B: Response Format Standardization (Quick)
  - Modify `jupiter_swap.rs` to also return the `swap_details` structure
  - Ensure both tools provide the same data format
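Option B amounts to an adapter on the standard tool's result. The sketch below assumes the standard tool already knows the swap's output amount (an assumption; its current return type is not shown here) and wraps it in the `swap_details` shape that step 2 reads; all type and field names except `swap_details` and `output_amount` are hypothetical:

```rust
/// Hypothetical result of the standard tool today: no swap_details wrapper.
struct StandardSwapResult {
    signature: String,
    out_amount: u64,
}

#[derive(Debug, PartialEq)]
struct SwapDetails {
    input_mint: String,
    output_mint: String,
    output_amount: u64,
}

#[derive(Debug, PartialEq)]
struct FlowSwapResponse {
    signature: String,
    swap_details: SwapDetails,
}

/// Wrap the standard result so downstream context code sees the same
/// `swap_details.output_amount` regardless of which tool ran.
fn standardize(raw: StandardSwapResult, input_mint: &str, output_mint: &str) -> FlowSwapResponse {
    FlowSwapResponse {
        signature: raw.signature,
        swap_details: SwapDetails {
            input_mint: input_mint.to_string(),
            output_mint: output_mint.to_string(),
            output_amount: raw.out_amount,
        },
    }
}
```

This keeps both tools alive, which is why it is the quicker option; Option A removes the second code path instead of papering over it.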
Implementation Steps:
- Investigate routing logic - Why do CLI and API use different tools?
- Compare response formats - Document exact structure differences
- Implement unification - Choose Option A or B based on complexity
- Test thoroughly - Ensure API achieves same success rate as CLI
- Document changes - Update tool usage patterns for each tool type
Technical Requirements:
- Preserve CLI functionality - Don't break the working path
- Consistent response format - Both tools return `swap_details.output_amount`
- Context flow verification - Step 2 receives correct swap result data
- No performance regression - API path matches CLI success rate
Acceptance Criteria:
- CLI and API Jupiter swaps return the same `swap_details.output_amount` structure
- CLI path still scores 100% on 200-jup-swap-then-lend-deposit
- API path achieves the same success rate as the CLI on 200-jup-swap-then-lend-deposit
- Step 2 (deposit) receives the step 1 swap output via `process_step_result_for_context()`
- No regression in other agent types
- All diagnostics pass