CLI Cost Tracking Gap: Single-Task Execution Lacks Structured Cost Output
Problem Statement
When using the PraisonAI CLI in single-task mode (praisonai "TASK" --model MODEL), cost and token usage data is calculated internally but not output in a machine-readable format. This creates a significant gap for programmatic use cases like Terminal-Bench benchmarking where cost comparison between approaches is critical.
Comparison: Direct Agent vs CLI Wrapper
| Metric | Direct Agent (Agent class) | CLI Wrapper (subprocess) |
|---|---|---|
| Success Rate | 5/5 (100%) | 5/5 (100%) |
| Total Time | 262.47s | 105.25s (2.5x faster) |
| Total Cost | $0.017157 | N/A (not trackable) |
| Avg per Task | 52.49s | 21.05s |
Critical Gap: The CLI approach is 2.5x faster but provides zero cost visibility, making it impossible to optimize for both speed AND cost-efficiency in production workloads.
Root Cause Analysis
Architecture Gap
Direct Agent Path (Works):
`Agent.start()` → LLM calls → `TokenUsage` dataclass → `_total_cost` accumulation → `cost_summary()` method available
- File: `praisonaiagents/agent/agent.py:1898-1914`
- `cost_summary()` returns: `{"tokens_in": int, "tokens_out": int, "cost": float, "llm_calls": int}`
CLI Wrapper Path (Broken):
`praisonai "TASK"` → `subprocess.run()` → stdout/stderr only → no structured cost data at process exit
- File: `praisonai/cli/main.py` - `handle_direct_prompt()` method prints the result but not the metrics
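The gap is inherent to shelling out: a subprocess hands back only text streams and a return code. A minimal sketch of what the wrapper sees today, using `echo` as a stand-in for the real `praisonai` invocation:

```python
import subprocess

# "echo" stands in for `praisonai "TASK" --model MODEL`; from the caller's
# perspective the real CLI behaves the same way.
result = subprocess.run(
    ["echo", "Hello! How can I help you today?"],
    capture_output=True, text=True,
)

# The CompletedProcess object carries stdout, stderr, and a return code --
# nothing else. There is no field where cost data could travel.
print(result.stdout.strip())
print(result.returncode)
```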
Where Cost Data Lives in CLI
The CLI does track cost internally (evidence found):
- `CostTracker` class (`praisonai/cli/features/cost_tracker.py:140-201`): `SessionStats.to_dict()` returns complete cost data
  - Fields: `total_cost`, `total_input_tokens`, `total_output_tokens`, `avg_cost_per_request`
- Interactive TUI (`praisonai/cli/main.py:6148-6168`): `_handle_stats_command()` shows cost via the `/stats` command
  - Calculates: `pricing.calculate_cost(input_tokens, output_tokens)`
- Metrics feature (`praisonai/cli/main.py:965`): `--metrics` flag exists, but only for interactive TUI mode
  - Missing: `--metrics-json` for single-task structured output
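For reference, the session stats described above reduce to a small amount of state. The class below is a reconstruction from the field names cited in this issue, not the actual implementation in `praisonai/cli/features/cost_tracker.py`:

```python
from dataclasses import dataclass

# Approximate shape of the SessionStats data described above -- field names
# match this issue's evidence; everything else is a guess for illustration.
@dataclass
class SessionStats:
    total_cost: float = 0.0
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    request_count: int = 0

    @property
    def avg_cost_per_request(self) -> float:
        return self.total_cost / self.request_count if self.request_count else 0.0

    def to_dict(self) -> dict:
        return {
            "total_cost": self.total_cost,
            "total_input_tokens": self.total_input_tokens,
            "total_output_tokens": self.total_output_tokens,
            "avg_cost_per_request": self.avg_cost_per_request,
        }

stats = SessionStats(total_cost=0.0042, total_input_tokens=120,
                     total_output_tokens=340, request_count=2)
print(stats.to_dict())
```

Everything needed for structured output already lives in this dictionary; the missing piece is printing it at process exit.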
The Missing Bridge
When running `praisonai "TASK" --model gpt-4o-mini`:
- ✅ CLI calculates cost internally in `session_state`
- ❌ At process exit, only the text response is printed
- ❌ No JSON blob with `{"cost_usd": X, "tokens_in": Y, "tokens_out": Z}` is output
- ❌ Wrapper agent cannot capture cost data via subprocess
Evidence: Code Locations
Core SDK (Works)
```python
# praisonaiagents/agent/agent.py:1898-1914
@property
def total_cost(self) -> float:
    """Cumulative USD cost of all LLM calls in this agent run."""
    return self._total_cost

@property
def cost_summary(self) -> dict:
    """Summary of cost and token usage."""
    return {
        "tokens_in": self._total_tokens_in,
        "tokens_out": self._total_tokens_out,
        "cost": self._total_cost,
        "llm_calls": self._llm_call_count,
    }
```
CLI (Missing Output)
```python
# praisonai/cli/main.py (handle_direct_prompt method)
# Prints result but no cost metrics at exit
print(result)  # Line ~709
# Missing: print(json.dumps(session_stats.to_dict()))
```
TokenUsage Dataclass (Core SDK)
```python
# praisonaiagents/llm/llm.py:96-121
@dataclass
class TokenUsage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    cached_tokens: int = 0
    # ... methods
```
Proposed Solutions (Ranked by Complexity)
Option 1: CLI Metrics Output Flag (Recommended - Low Complexity)
Add a `--metrics-json` flag to the CLI that outputs structured cost data at process end:
```python
# In praisonai/cli/main.py at exit point
if args.metrics_json:
    print(json.dumps({
        "cost_usd": session_state['total_cost'],
        "tokens_in": session_state['total_input_tokens'],
        "tokens_out": session_state['total_output_tokens'],
        "model": session_state['current_model'],
        "request_count": session_state['request_count']
    }))
```
Files to modify:
- `praisonai/cli/main.py` - Add flag and output logic in `handle_direct_prompt()` and `main()`
Benefits:
- Minimal code change (~10 lines)
- Follows existing CLI patterns
- Benefits all CLI users, not just Terminal-Bench
- Machine-readable output enables programmatic cost tracking
- No performance impact (only executes when flag is set)
Option 2: Wrapper Agent Cost Estimation (Medium Complexity)
Calculate cost in the wrapper agent after execution using litellm's cost calculator:
```python
from litellm import cost_calculator
# After subprocess completes, estimate cost based on model + output length
# Requires parsing output length to estimate tokens
```
Downside: Estimation only, not actual cost from provider
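To make the trade-off concrete, here is a sketch of the estimation approach with a hypothetical rate table. The litellm cost calculator would replace both the token heuristic and the hard-coded prices; the numbers below are illustrative placeholders, not real provider pricing:

```python
# Hypothetical (input_usd, output_usd) rates per 1K tokens -- NOT real pricing.
PRICING_PER_1K = {
    "gpt-4o-mini": (0.00015, 0.0006),
}

def estimate_cost(model: str, prompt: str, completion: str) -> float:
    """Rough cost estimate from text length alone.

    Uses the crude ~4-characters-per-token heuristic for English text,
    which is exactly why this option is inferior to real provider counts.
    """
    tokens_in = max(1, len(prompt) // 4)
    tokens_out = max(1, len(completion) // 4)
    rate_in, rate_out = PRICING_PER_1K[model]
    return tokens_in / 1000 * rate_in + tokens_out / 1000 * rate_out

print(estimate_cost("gpt-4o-mini", "hello " * 100, "world " * 200))
```

Even with accurate rates, the character heuristic can be off by tens of percent, which is the core downside noted above.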
Option 3: Environment Variable Bridge (Medium Complexity)
CLI writes cost data to temp file via env var path, wrapper reads it:
```python
# CLI side
if os.environ.get('PRAISONAI_COST_FILE'):
    with open(os.environ['PRAISONAI_COST_FILE'], 'w') as f:
        json.dump(cost_data, f)

# Wrapper side reads file after subprocess completes
```
Downside: More complex, requires file system coordination
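For completeness, the wrapper side of this option might look like the sketch below. The child process here fakes the CLI's half of the contract, and `PRAISONAI_COST_FILE` is the variable name proposed above:

```python
import json
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    cost_file = os.path.join(tmp, "cost.json")
    env = {**os.environ, "PRAISONAI_COST_FILE": cost_file}

    # Stand-in for `praisonai "TASK" --model MODEL`: a child process that
    # writes cost data to the path named by PRAISONAI_COST_FILE.
    subprocess.run(
        [sys.executable, "-c",
         "import json, os; json.dump({'cost_usd': 0.0021}, "
         "open(os.environ['PRAISONAI_COST_FILE'], 'w'))"],
        env=env, check=True,
    )

    # Wrapper side: read the file after the subprocess completes.
    with open(cost_file) as f:
        cost_data = json.load(f)

print(cost_data["cost_usd"])
```

The temp-file lifetime and the implicit contract between the two processes are the coordination cost this option carries.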
Option 4: Full Structured Logging (High Complexity)
Add comprehensive structured output mode to CLI with full execution metadata.
Downside: Overkill for this specific use case
Recommendation
Go with Option 1 - add a `--metrics-json` flag:
- Minimal change: ~10 lines of code
- No performance impact: Only executes when flag is explicitly set
- Follows patterns: `--metrics` flag already exists for TUI mode
- Universal benefit: All CLI users gain programmatic cost visibility
- Terminal-Bench unblocked: Wrapper agent can capture and compare costs
Implementation Plan
Phase 1: Core CLI Change
- Add a `--metrics-json` argument to the argument parser (`praisonai/cli/main.py`)
- In the `handle_direct_prompt()` method, capture cost data from `session_state`
- At process exit, output JSON if the flag is set
- Test with a single task: `praisonai "hello" --model gpt-4o-mini --metrics-json`
Phase 2: Wrapper Agent Update
- Update `praisonai_wrapper_agent.py` to pass the `--metrics-json` flag
- Parse JSON output from subprocess
- Populate Harbor AgentContext with cost data
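The parsing step could be as simple as treating the last stdout line as the metrics blob. That contract (JSON as the final line) is part of this proposal, not an existing guarantee; the child process below stands in for the patched CLI:

```python
import json
import subprocess
import sys

# Child process mimicking the patched CLI: response text first, then one
# JSON metrics line (the proposed --metrics-json contract).
child = (
    'print("Hello!")\n'
    'print(\'{"cost_usd": 0.0017, "tokens_in": 12, "tokens_out": 34, '
    '"model": "gpt-4o-mini", "request_count": 1}\')'
)
proc = subprocess.run(
    [sys.executable, "-c", child],
    capture_output=True, text=True, check=True,
)

# Everything before the final line is the agent's answer; the final line
# is the structured metrics payload.
*response_lines, metrics_line = proc.stdout.strip().splitlines()
metrics = json.loads(metrics_line)
print("\n".join(response_lines))
print(metrics["cost_usd"])
```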
Phase 3: Verification
- Run comparison test again (5 tasks)
- Verify both approaches show cost data
- Confirm CLI cost matches Direct Agent cost for same task
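For the final check, exact equality is too strict (providers round per call), so a relative tolerance works better. The first figure below is the Direct Agent total measured above; the near-match CLI figure is made up for illustration:

```python
def costs_match(direct_cost: float, cli_cost: float, tol: float = 0.05) -> bool:
    """True when the two costs agree within a relative tolerance."""
    if direct_cost == 0.0:
        return cli_cost == 0.0
    return abs(direct_cost - cli_cost) / direct_cost <= tol

# $0.017157 was the measured Direct Agent total; 0.017121 is a
# hypothetical CLI reading for the same five tasks.
print(costs_match(0.017157, 0.017121))
print(costs_match(0.017157, 0.050000))
```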
Acceptance Criteria
- `praisonai "TASK" --model MODEL --metrics-json` outputs valid JSON with cost data
- The JSON includes `cost_usd`, `tokens_in`, `tokens_out`, `model`, and `request_count`
Related Files
Core SDK (Cost Tracking Works)
- `praisonaiagents/agent/agent.py:1898-1914` - `total_cost`, `cost_summary` properties
- `praisonaiagents/llm/llm.py:96-121` - `TokenUsage` dataclass
- `praisonaiagents/agent/chat_mixin.py:680-699` - Cost accumulation during chat
CLI (Missing Output)
- `praisonai/cli/main.py:965` - `--metrics` flag (TUI only)
- `praisonai/cli/main.py:6148-6168` - `_handle_stats_command()` (interactive only)
- `praisonai/cli/features/cost_tracker.py:140-201` - `SessionStats` class with cost data
Wrapper Agent (Needs Cost)
- `examples/terminal_bench/praisonai_wrapper_agent.py` - Currently cannot capture cost
- `examples/terminal_bench/test_agent_comparison.py` - Shows cost gap in test results
Priority
High - Blocks production benchmarking and cost optimization workflows. Currently impossible to compare cost-efficiency of CLI vs Direct Agent approaches.
Labels
- `enhancement`
- `cli`
- `cost-tracking`
- `terminal-bench`
- `good-first-issue` (Option 1 is straightforward)
Additional Context
This issue was discovered during Terminal-Bench 2.0 integration testing. The wrapper agent is 2.5x faster than direct Agent class but lacks cost visibility, making it impossible to optimize for both speed AND cost in production workloads.