Tasks

🎯 Open Issues

#1 Jupiter Swap Tool Response Format Inconsistency - CRITICAL 🔥

Date: 2025-10-28 Status: Identified Priority: Critical

Issue: Same benchmark (200-jup-swap-then-lend-deposit) succeeds via CLI (100% score) but fails via API due to different Jupiter swap tool response formats between execution paths.

Root Cause: Two different Jupiter swap tool implementations with inconsistent response structures:

CLI path: Uses jupiter_swap_flow.rs (flow-aware tool)
API path: Uses jupiter_swap.rs (standard tool)

Impact: On the API path, step 2 (deposit) receives no swap amount from step 1, so the LLM guesses the deposit amount and gets it wrong.

Evidence:

CLI Success: amount=394358118 (394.358 USDC) - Uses swap_details.output_amount
API Failure: amount=1000000000 (1000 USDC) - Missing swap_details structure

Critical Files:

crates/reev-tools/src/tools/jupiter_swap_flow.rs - Flow-aware tool with swap_details
crates/reev-tools/src/tools/jupiter_swap.rs - Standard tool without swap_details
crates/reev-agent/src/flow/agent.rs - Tool routing logic
crates/reev-context/src/lib.rs - process_step_result_for_context() expects swap_details

Fix Strategy: Unify Jupiter swap tool implementations to ensure consistent response format across all execution paths.
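A minimal sketch of what a consistent response could look like from the consumer side, assuming a response type with an optional swap_details field. The type names, the input_amount field, and the helper deposit_amount_from_swap are hypothetical and not taken from the reev codebase; only swap_details.output_amount comes from the notes above. The idea is that a missing swap_details surfaces as an explicit error instead of leaving the LLM to guess an amount.

```rust
/// Hypothetical structured swap result; only `output_amount` is named in the notes.
#[derive(Debug, Clone, PartialEq)]
struct SwapDetails {
    input_amount: u64,
    output_amount: u64,
}

/// Hypothetical tool response shared by every execution path.
#[derive(Debug, Clone, PartialEq)]
struct SwapToolResponse {
    /// Present when the tool reports what the swap produced.
    swap_details: Option<SwapDetails>,
}

/// Returns the amount step 2 should deposit, in raw token units.
/// Fails loudly when `swap_details` is missing rather than guessing.
fn deposit_amount_from_swap(response: &SwapToolResponse) -> Result<u64, String> {
    response
        .swap_details
        .as_ref()
        .map(|details| details.output_amount)
        .ok_or_else(|| "swap response has no swap_details".to_string())
}
```

With a check like this, the API failure above would likely show up as an error at the step boundary rather than as a wrong deposit amount.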

#2 AI Model Amount Request Issue - High

Date: 2025-06-17 Status: Open Priority: High

The AI model requested 1,000,000,000,000 USDC (1 trillion) for the deposit in step 2 of benchmark 200-jup-swap-then-lend-deposit, even though the context showed only 383,193,564 USDC available.

Status: Significant Improvement 🎉

Before: Complete failure due to trillion USDC requests
After: 75% score with custom program errors (0x1, 0xffff)
Remaining issue: The model no longer requests absurd amounts, but execution now fails with custom program errors.

Fixes Applied:

Fixed context serialization to use numbers instead of strings
Enhanced tool description to be more explicit about reading exact balances

🎯 GLM SPL Transfer ATA Resolution Issue - High

Date: 2025-10-26 Status: In Progress Priority: High

Issue: GLM models (glm-4.6-coding) routed through reev-agent generate wrong recipient ATAs for SPL transfers. Instead of using the ATAs pre-created by the benchmark setup, the LLM generates new ATAs or uses incorrect ATA names.

Symptoms:

002-spl-transfer score: 56.2% with "invalid account data for instruction" error
LLM generates transaction with wrong recipient ATA: "8RXifzZ34i3E7qTcvYFaUvCRaswcJBDBXrPGgrwPZxTo" instead of expected "BmCGQJCPZHrAzbLCjHd1JBQAxF24jrReU3fPwN6ri6a7"

Root Cause:

The LLM should pass the placeholder name "RECIPIENT_USDC_ATA" in tool calls, but it generates a new recipient ATA instead
Context confusion from RESOLVED ADDRESSES section (already fixed but still affecting GLM behavior)
Possible misinterpretation of recipient parameters vs ATA placeholders
FIXED: Different GLM agents had inconsistent context and wallet handling

✅ COMPLETED FIXES:

UNIFIED GLM LOGIC: Created UnifiedGLMAgent with shared context and wallet handling
IDENTICAL CONTEXT: Both OpenAIAgent and ZAIAgent now use same context building logic
SHARED COMPONENTS: Wallet info creation and prompt mapping are now identical
PROVIDER-SPECIFIC: Only request/response handling differs between implementations

Technical Requirements:

Test Unified Logic: Verify unified GLM logic resolves context inconsistencies
Improve ATA Resolution Logic: Enhance SPL transfer tool to better prioritize pre-created ATAs from key_map over generated ones
Strengthen Context Instructions: Make context warnings more explicit about using placeholder names vs direct addresses
Test Across GLM Variants: Verify fix works with different GLM model implementations
Documentation Update: Update documentation with clear examples of correct ATA usage

📋 Tasks

🎯 GLM SPL Transfer ATA Resolution Fix - High Priority

Status: In Progress
Priority: High
Description: Fix SPL transfer tool to properly resolve pre-created ATAs and prevent LLM from generating incorrect recipient addresses

Background:

SOL transfer issue was successfully resolved by improving context instructions
However, SPL transfers still fail because GLM models generate wrong recipient ATAs despite having pre-created ones in key_map
Local agents work perfectly, indicating the issue is specific to GLM model routing through reev-agent
✅ COMPLETED: Unified GLM logic architecture - both OpenAIAgent and ZAIAgent now use identical context and wallet handling

Technical Requirements:

Investigate LLM Tool Calls: Debug exactly what recipient_pubkey value LLM is using in spl_transfer calls
Improve ATA Resolution Logic: Enhance SPL transfer tool to better prioritize pre-created ATAs from key_map over generated ones
Strengthen Context Instructions: Make context warnings more explicit about using placeholder names vs direct addresses
Test Across GLM Variants: Verify fix works with different GLM model implementations
Documentation Update: Update documentation with clear examples of correct ATA usage

Implementation Steps:

✅ UNIFIED GLM ARCHITECTURE: Refactored both OpenAIAgent and ZAIAgent to use shared logic
Debug Current Behavior: Add extensive logging to SPL transfer tool to track LLM parameter usage
Enhance Key Map Resolution: Improve how tool looks up and prioritizes ATAs from key_map over generated ones
Context Clarification: Strengthen RESOLVED ADDRESSES section instructions for GLM models
Comprehensive Testing: Test fix across multiple GLM benchmarks and model variants
Documentation Update: Update documentation with clear examples of correct ATA usage
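One possible shape for the key_map resolution step, as a sketch only. It assumes key_map is a map from placeholder names (such as "RECIPIENT_USDC_ATA") to the addresses pre-created by the benchmark setup; the function name resolve_recipient_ata and the reject-unknown-addresses policy are hypothetical choices, not the current behavior of the SPL transfer tool.

```rust
use std::collections::HashMap;

/// Resolves the `recipient_pubkey` value from an spl_transfer call.
///
/// Assumption: `key_map` maps placeholder names such as "RECIPIENT_USDC_ATA"
/// to the addresses pre-created by the benchmark setup.
fn resolve_recipient_ata(
    key_map: &HashMap<String, String>,
    recipient: &str,
) -> Result<String, String> {
    // Preferred case: the LLM passed a placeholder name.
    if let Some(address) = key_map.get(recipient) {
        return Ok(address.clone());
    }
    // Acceptable case: the LLM passed an address the benchmark already knows.
    if key_map.values().any(|address| address.as_str() == recipient) {
        return Ok(recipient.to_string());
    }
    // Otherwise the address was generated by the LLM; reject it with a hint.
    Err(format!(
        "unknown recipient '{}': expected a placeholder or pre-created address from key_map",
        recipient
    ))
}
```

Rejecting unknown addresses is stricter than merely prioritizing pre-created ones. It trades flexibility for an error message the LLM can act on, which may or may not suit every benchmark.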

Current Critical Priority: Jupiter swap tool unification takes precedence over GLM ATA resolution fix.

🧪 Jupiter Swap Tool Implementation Review

Add this task to TASKS.md and track progress in ISSUES.md

Analyze why two different implementations exist
Determine if jupiter_swap_flow.rs should replace jupiter_swap.rs entirely
Update tool registration and discovery mechanisms
Document proper usage patterns for each tool type

Acceptance Criteria:

🎯 Jupiter Swap Tool Unification - HIGH PRIORITY ✅ IN PROGRESS

Status: In Progress
Priority: High
Description: Unify Jupiter swap tool implementations to ensure consistent swap_details response format across CLI and API execution paths.

Background: Same benchmark (200-jup-swap-then-lend-deposit) succeeds via CLI (100% score) but fails via API due to different Jupiter swap tool response formats between execution paths.

Root Cause: Two different Jupiter swap tool implementations with inconsistent response structures:

CLI path: Uses jupiter_swap_flow.rs (flow-aware tool)
API path: Uses jupiter_swap.rs (standard tool)

Impact: On the API path, step 2 (deposit) receives no swap amount from step 1, so the LLM guesses the deposit amount and gets it wrong.

Evidence:

CLI Success: amount=394358118 (394.358 USDC) - Uses swap_details.output_amount
API Failure: amount=1000000000 (1000 USDC) - Missing swap_details structure
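The amounts in the evidence are raw base units. As a small sketch of how they map to the USDC figures in parentheses, assuming the 6 decimal places of the USDC mint on Solana (the helper name format_token_amount is hypothetical):

```rust
/// Number of decimal places used by the USDC mint on Solana.
const USDC_DECIMALS: u32 = 6;

/// Formats a raw token amount (base units) as a decimal string.
/// Uses integer arithmetic only, so no precision is lost to floats.
fn format_token_amount(raw: u64, decimals: u32) -> String {
    let scale = 10u64.pow(decimals);
    let whole = raw / scale;
    let fraction = raw % scale;
    // Zero-pad the fractional part to exactly `decimals` digits.
    format!("{}.{:0width$}", whole, fraction, width = decimals as usize)
}
```

For example, format_token_amount(394358118, USDC_DECIMALS) gives "394.358118", which matches the CLI figure above when rounded to three decimals.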

Critical Files:

crates/reev-tools/src/tools/jupiter_swap_flow.rs - Flow-aware tool with swap_details
crates/reev-tools/src/tools/jupiter_swap.rs - Standard tool without swap_details
crates/reev-agent/src/flow/agent.rs - Tool routing logic
crates/reev-context/src/lib.rs - process_step_result_for_context() expects swap_details

Fix Strategy: Unify the Jupiter swap tool implementations so that both CLI and API use one consistent tool that returns structured swap_details for multi-step flow communication.

Implementation Options:

Option A: Tool Unification (Recommended)
- Merge both implementations into single flow-aware Jupiter swap tool
- Ensure consistent swap_details response format
- Update tool registration to use unified tool only
Option B: Response Format Standardization (Quick)
- Modify jupiter_swap.rs to also return swap_details structure
- Ensure both tools provide same data format
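Option B could be sketched as a conversion from the standard tool's output into the flow-aware format. All type and field names below are hypothetical stand-ins, and the sketch assumes the standard tool already has the quoted output amount available internally, which still needs to be confirmed against jupiter_swap.rs.

```rust
/// Hypothetical structured swap result.
#[derive(Debug, Clone, PartialEq)]
struct SwapDetails {
    output_amount: u64,
}

/// Hypothetical stand-in for the standard tool's current output.
/// Assumption: the tool already knows the quoted output amount internally.
#[derive(Debug, Clone, PartialEq)]
struct StandardSwapOutput {
    instructions: Vec<String>,
    quoted_output_amount: u64,
}

/// Hypothetical common format that both tools would return.
#[derive(Debug, Clone, PartialEq)]
struct FlowSwapOutput {
    instructions: Vec<String>,
    swap_details: SwapDetails,
}

impl From<StandardSwapOutput> for FlowSwapOutput {
    /// Carries the instructions over unchanged and exposes the quoted
    /// amount under `swap_details.output_amount`.
    fn from(standard: StandardSwapOutput) -> Self {
        FlowSwapOutput {
            instructions: standard.instructions,
            swap_details: SwapDetails {
                output_amount: standard.quoted_output_amount,
            },
        }
    }
}
```

A conversion like this keeps both tools in place, which is why Option B is quicker. The duplicated implementations remain, though, so the formats can drift apart again.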

🛠️ Immediate Action Required

Investigate routing logic - Why do CLI and API use different tools?
Compare response formats - Document exact structure differences
Implement unification - Choose Option A or B based on complexity
Test thoroughly - Ensure API achieves same success rate as CLI
Document changes - Update tool usage patterns for each tool type

Technical Requirements:

Preserve CLI functionality - Don't break the working path
Consistent response format - Both tools return swap_details.output_amount
Context flow verification - Step 2 receives correct swap result data
No performance regression - API path matches CLI success rate

Acceptance Criteria:

Add this task to TASKS.md and track progress in ISSUES.md
