CLI Cost Tracking Gap: Single-Task Execution Lacks Structured Cost Output #1356

@MervinPraison

Description

Problem Statement

When using the PraisonAI CLI in single-task mode (praisonai "TASK" --model MODEL), cost and token usage data is calculated internally but not output in a machine-readable format. This creates a significant gap for programmatic use cases like Terminal-Bench benchmarking where cost comparison between approaches is critical.

Comparison: Direct Agent vs CLI Wrapper

| Metric | Direct Agent (`Agent` class) | CLI Wrapper (subprocess) |
|---|---|---|
| Success Rate | 5/5 (100%) | 5/5 (100%) |
| Total Time | 262.47s | 105.25s (2.5x faster) |
| Total Cost | $0.017157 | N/A (not trackable) |
| Avg per Task | 52.49s | 21.05s |

Critical Gap: The CLI approach is 2.5x faster but provides zero cost visibility, making it impossible to optimize for both speed AND cost-efficiency in production workloads.


Root Cause Analysis

Architecture Gap

Direct Agent Path (Works):

Agent.start() → LLM calls → TokenUsage dataclass → _total_cost accumulation → cost_summary() method available
  • File: praisonaiagents/agent/agent.py:1898-1914
  • cost_summary() returns: {"tokens_in": int, "tokens_out": int, "cost": float, "llm_calls": int}

CLI Wrapper Path (Broken):

praisonai "TASK" → subprocess.run() → stdout/stderr only → No structured cost data at process exit
  • File: praisonai/cli/main.py - handle_direct_prompt() method prints result but not metrics

Where Cost Data Lives in CLI

The CLI does track cost internally (evidence found):

  1. CostTracker class (praisonai/cli/features/cost_tracker.py:140-201):

    • SessionStats.to_dict() returns complete cost data
    • Fields: total_cost, total_input_tokens, total_output_tokens, avg_cost_per_request
  2. Interactive TUI (praisonai/cli/main.py:6148-6168):

    • _handle_stats_command() shows cost via /stats command
    • Calculates: pricing.calculate_cost(input_tokens, output_tokens)
  3. Metrics Feature (praisonai/cli/main.py:965):

    • --metrics flag exists but only for interactive TUI mode
    • Missing: --metrics-json for single-task structured output

The Missing Bridge

When running praisonai "TASK" --model gpt-4o-mini:

  • ✅ CLI calculates cost internally in session_state
  • ❌ At process exit, only the text response is printed
  • ❌ No JSON blob with {"cost_usd": X, "tokens_in": Y, "tokens_out": Z} is output
  • ❌ Wrapper agent cannot capture cost data via subprocess

Evidence: Code Locations

Core SDK (Works)

# praisonaiagents/agent/agent.py:1898-1914
@property
def total_cost(self) -> float:
    """Cumulative USD cost of all LLM calls in this agent run."""
    return self._total_cost

@property
def cost_summary(self) -> dict:
    """Summary of cost and token usage."""
    return {
        "tokens_in": self._total_tokens_in,
        "tokens_out": self._total_tokens_out,
        "cost": self._total_cost,
        "llm_calls": self._llm_call_count,
    }

CLI (Missing Output)

# praisonai/cli/main.py (handle_direct_prompt method)
# Prints result but no cost metrics at exit
print(result)  # Line ~709
# Missing: print(json.dumps(session_stats.to_dict()))

TokenUsage Dataclass (Core SDK)

# praisonaiagents/llm/llm.py:96-121
@dataclass
class TokenUsage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    cached_tokens: int = 0
    # ... methods
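
For intuition, here is a self-contained sketch of how such a dataclass turns token counts into a USD figure. The per-million-token prices are illustrative assumptions for this sketch, not PraisonAI's actual pricing tables:

```python
from dataclasses import dataclass

# Illustrative USD prices per 1M tokens (assumed for this sketch only)
PRICE_IN_PER_M = 0.15
PRICE_OUT_PER_M = 0.60

@dataclass
class TokenUsage:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def cost_usd(self) -> float:
        """Price this usage at the assumed per-million-token rates."""
        return (self.prompt_tokens * PRICE_IN_PER_M
                + self.completion_tokens * PRICE_OUT_PER_M) / 1_000_000

usage = TokenUsage(prompt_tokens=1000, completion_tokens=500)
print(f"{usage.cost_usd():.6f}")  # 0.000450
```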

Proposed Solutions (Ranked by Complexity)

Option 1: CLI Metrics Output Flag (Recommended - Low Complexity)

Add --metrics-json flag to CLI that outputs structured cost data at process end:

# In praisonai/cli/main.py at exit point
if args.metrics_json:
    print(json.dumps({
        "cost_usd": session_state['total_cost'],
        "tokens_in": session_state['total_input_tokens'],
        "tokens_out": session_state['total_output_tokens'],
        "model": session_state['current_model'],
        "request_count": session_state['request_count']
    }))
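
Wired into argparse, the flow could look like the following self-contained sketch. The flag and field names are the ones proposed above; the `session_state` dict is simulated here, not the CLI's real object:

```python
import argparse
import json

parser = argparse.ArgumentParser(prog="praisonai")
parser.add_argument("prompt")
parser.add_argument("--model", default="gpt-4o-mini")
parser.add_argument("--metrics-json", action="store_true",
                    help="emit a JSON metrics line at process exit")
args = parser.parse_args(["hello", "--metrics-json"])

# Stand-in for the CLI's internal session_state (simulated values)
session_state = {"total_cost": 0.000123, "total_input_tokens": 42,
                 "total_output_tokens": 17, "current_model": args.model,
                 "request_count": 1}

print("<model response text>")   # normal output first
if args.metrics_json:            # structured metrics as the final line
    print(json.dumps({
        "cost_usd": session_state["total_cost"],
        "tokens_in": session_state["total_input_tokens"],
        "tokens_out": session_state["total_output_tokens"],
        "model": session_state["current_model"],
        "request_count": session_state["request_count"],
    }))
```

Emitting the metrics as the final stdout line lets a subprocess caller print the response normally and `json.loads` only the last line.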

Files to modify:

  • praisonai/cli/main.py - Add flag and output logic in handle_direct_prompt() and main()

Benefits:

  • Minimal code change (~10 lines)
  • Follows existing CLI patterns
  • Benefits all CLI users, not just Terminal-Bench
  • Machine-readable output enables programmatic cost tracking
  • No performance impact (only executes when flag is set)

Option 2: Wrapper Agent Cost Estimation (Medium Complexity)

Calculate cost in the wrapper agent after execution using litellm's cost calculator:

from litellm import cost_per_token

# After the subprocess completes, estimate token counts from the prompt
# and output text (e.g. ~4 characters per token; estimated_tokens_in and
# estimated_tokens_out are placeholders), then price them with litellm's
# model pricing tables:
est_cost_in, est_cost_out = cost_per_token(
    model="gpt-4o-mini",
    prompt_tokens=estimated_tokens_in,
    completion_tokens=estimated_tokens_out,
)

Downside: Estimation only; token counts are inferred from text length rather than reported by the provider.

Option 3: Environment Variable Bridge (Medium Complexity)

CLI writes cost data to temp file via env var path, wrapper reads it:

# CLI side
if os.environ.get('PRAISONAI_COST_FILE'):
    with open(os.environ['PRAISONAI_COST_FILE'], 'w') as f:
        json.dump(cost_data, f)

# Wrapper side reads file after subprocess completes

Downside: More complex, requires file system coordination
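
The full round trip can be sketched with `tempfile`. Note that `PRAISONAI_COST_FILE` is this proposal's suggested variable name, not an existing CLI feature, and `cost_data` here is simulated:

```python
import json
import os
import tempfile

cost_data = {"cost_usd": 0.000123, "tokens_in": 42, "tokens_out": 17}

# Wrapper side: choose a path and export it before launching the CLI
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
os.environ["PRAISONAI_COST_FILE"] = path

# CLI side (at process exit): dump metrics only if the caller asked
if os.environ.get("PRAISONAI_COST_FILE"):
    with open(os.environ["PRAISONAI_COST_FILE"], "w") as f:
        json.dump(cost_data, f)

# Wrapper side, after subprocess.run() returns: read the metrics back
with open(path) as f:
    captured = json.load(f)
os.remove(path)
print(captured["cost_usd"])  # 0.000123
```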

Option 4: Full Structured Logging (High Complexity)

Add comprehensive structured output mode to CLI with full execution metadata.

Downside: Overkill for this specific use case


Recommendation

Go with Option 1 - Add --metrics-json flag:

  1. Minimal change: ~10 lines of code
  2. No performance impact: Only executes when flag is explicitly set
  3. Follows patterns: --metrics flag already exists for TUI mode
  4. Universal benefit: All CLI users gain programmatic cost visibility
  5. Terminal-Bench unblocked: Wrapper agent can capture and compare costs

Implementation Plan

Phase 1: Core CLI Change

  1. Add --metrics-json argument to argument parser (praisonai/cli/main.py)
  2. In handle_direct_prompt() method, capture cost data from session_state
  3. At process exit, output JSON if flag is set
  4. Test with single task: praisonai "hello" --model gpt-4o-mini --metrics-json

Phase 2: Wrapper Agent Update

  1. Update praisonai_wrapper_agent.py to pass --metrics-json flag
  2. Parse JSON output from subprocess
  3. Populate Harbor AgentContext with cost data
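
Step 2 above could look like this sketch. The CLI stdout is simulated here (in the real wrapper it would come from `subprocess.run(..., capture_output=True, text=True)`), and the Harbor `AgentContext` population is omitted:

```python
import json

def parse_metrics(stdout: str) -> tuple[str, dict]:
    """Split CLI stdout into response text and the trailing metrics JSON."""
    lines = stdout.strip().splitlines()
    metrics = json.loads(lines[-1])  # --metrics-json emits the last line
    return "\n".join(lines[:-1]), metrics

# Simulated stdout from: praisonai "hello" --model gpt-4o-mini --metrics-json
fake_stdout = ('Hello! How can I help?\n'
               '{"cost_usd": 0.000123, "tokens_in": 42, "tokens_out": 17}\n')
response, metrics = parse_metrics(fake_stdout)
print(metrics["cost_usd"])  # 0.000123
```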

Phase 3: Verification

  1. Run comparison test again (5 tasks)
  2. Verify both approaches show cost data
  3. Confirm CLI cost matches Direct Agent cost for same task

Acceptance Criteria

  • praisonai "TASK" --model MODEL --metrics-json outputs valid JSON with cost data
  • JSON includes: cost_usd, tokens_in, tokens_out, model, request_count
  • Wrapper agent captures cost data and populates Harbor context
  • Comparison test shows cost for both approaches
  • No regression in existing CLI functionality
  • Zero performance impact when flag is not used
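
The first two criteria could be checked mechanically with a small validator; the schema below mirrors the field list proposed in this issue:

```python
import json

# Required fields and types, per the proposed --metrics-json schema
REQUIRED = {"cost_usd": float, "tokens_in": int, "tokens_out": int,
            "model": str, "request_count": int}

def validate_metrics(line: str) -> dict:
    """Parse a metrics line and check the proposed schema; raise on mismatch."""
    data = json.loads(line)
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"bad or missing field: {key}")
    return data

ok = validate_metrics('{"cost_usd": 0.000123, "tokens_in": 42, '
                      '"tokens_out": 17, "model": "gpt-4o-mini", '
                      '"request_count": 1}')
print(ok["model"])  # gpt-4o-mini
```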

Related Files

Core SDK (Cost Tracking Works)

  • praisonaiagents/agent/agent.py:1898-1914 - total_cost, cost_summary properties
  • praisonaiagents/llm/llm.py:96-121 - TokenUsage dataclass
  • praisonaiagents/agent/chat_mixin.py:680-699 - Cost accumulation during chat

CLI (Missing Output)

  • praisonai/cli/main.py:965 - --metrics flag (TUI only)
  • praisonai/cli/main.py:6148-6168 - _handle_stats_command() (interactive only)
  • praisonai/cli/features/cost_tracker.py:140-201 - SessionStats class with cost data

Wrapper Agent (Needs Cost)

  • examples/terminal_bench/praisonai_wrapper_agent.py - Currently cannot capture cost
  • examples/terminal_bench/test_agent_comparison.py - Shows cost gap in test results

Priority

High - Blocks production benchmarking and cost optimization workflows. Currently impossible to compare cost-efficiency of CLI vs Direct Agent approaches.

Labels

  • enhancement
  • cli
  • cost-tracking
  • terminal-bench
  • good-first-issue (Option 1 is straightforward)

Additional Context

This issue was discovered during Terminal-Bench 2.0 integration testing. The wrapper agent is 2.5x faster than direct Agent class but lacks cost visibility, making it impossible to optimize for both speed AND cost in production workloads.
