[Feature]: Cost Budget & Limits for Agent Executions #1506

@kid-ye

Problem Statement

Currently, there is no way to limit spending on LLM calls during agent execution. An agent with a bug, an infinite loop, or an unexpectedly complex task can consume an unbounded number of tokens, leading to:

  1. Unexpectedly high API bills
  2. No visibility into costs until after execution completes
  3. No way to set guardrails for production deployments
  4. No alerts when costs are approaching limits

Proposed Solution

Add the ability to set token and cost (USD) budgets for agent executions. When limits are approached or exceeded, the system should warn, pause, or stop execution based on configuration.

Key Components:

  1. BudgetConfig - Configuration dataclass for limits, thresholds, and callbacks
  2. CostTracker - Tracks usage, calculates costs, enforces limits
  3. GraphExecutor integration - Check budget after each LLM call
  4. CLI flags - --max-cost, --max-tokens, --warn-at

Example Usage:

executor = GraphExecutor(
    runtime=runtime,
    llm=llm,
    budget_config=BudgetConfig(
        max_cost_usd=5.00,
        max_total_tokens=100_000,
        warn_at_percentage=75.0,
        on_exceed="pause",
        webhook_url="https://hooks.slack.com/...",
    ),
)

result = await executor.execute(graph, goal, input_data)
print(f"Cost: ${result.total_cost_usd:.4f}")
print(f"Budget exceeded: {result.budget_exceeded}")
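
To make the threshold semantics concrete, here is a minimal sketch of how the warn/exceed decision could work for a single limit. The helper name `budget_action` and the choice to treat "exactly at the limit" as exceeded are assumptions for illustration, not part of the proposed API.

```python
def budget_action(
    spent: float,
    limit: float,
    warn_at_percentage: float = 80.0,
    on_exceed: str = "stop",
) -> str:
    """Return "ok", "warn", or the configured on_exceed action.

    Hypothetical helper: assumes a limit counts as exceeded once usage
    reaches 100% of it, and that a warning fires at warn_at_percentage.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    used_pct = spent / limit * 100.0
    if used_pct >= 100.0:
        return on_exceed
    if used_pct >= warn_at_percentage:
        return "warn"
    return "ok"


# With max_cost_usd=5.00 and warn_at_percentage=75.0, the warning
# threshold sits at $3.75.
for spent in (3.00, 3.75, 5.00):
    print(spent, budget_action(spent, 5.00, 75.0, "pause"))
```

The same function applies to token limits by passing token counts instead of dollars.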

Alternatives Considered

  1. External monitoring only - Use third-party tools (LangSmith, Helicone) to track costs after the fact. Downside: No real-time enforcement, costs already incurred.

  2. LLM provider limits - Set spending limits at the API provider level (OpenAI, Anthropic). Downside: Not granular per-agent, no pause/resume capability.

  3. Pre-execution estimation - Estimate costs before running based on graph complexity. Downside: Inaccurate for dynamic agents, doesn't handle retries.

  4. Token-only limits (no USD) - Just limit tokens without cost calculation. Downside: Different models have different costs per token.

Additional Context

Use Cases

  • Development: Limit costs while testing new agents
  • Production: Set hard limits to prevent runaway costs
  • Budgeting: Track costs per agent/goal for billing
  • Alerts: Slack/webhook notifications when approaching limits
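
For the alerting use case, a rough sketch of building the webhook request with only the standard library might look like this. The function name `build_budget_alert` and the message wording are hypothetical; the `{"text": ...}` body is the shape Slack incoming webhooks accept, and other webhook receivers may expect something different.

```python
import json
import urllib.request


def build_budget_alert(
    webhook_url: str, spent_usd: float, limit_usd: float
) -> urllib.request.Request:
    """Build (but do not send) a POST request describing budget usage."""
    pct = spent_usd / limit_usd * 100.0
    payload = {
        "text": (
            f"Agent budget alert: ${spent_usd:.2f} of "
            f"${limit_usd:.2f} used ({pct:.0f}%)"
        )
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real deployment would use the configured webhook_url.
req = build_budget_alert("https://example.invalid/webhook", 3.75, 5.00)
print(req.get_method(), req.data.decode("utf-8"))
# Sending would be: urllib.request.urlopen(req, timeout=5)
```

Keeping request construction separate from sending makes the alert easy to unit test without network access.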

Model Pricing Reference

Default pricing should be included for common models:

Model               Input (USD per 1M tokens)   Output (USD per 1M tokens)
gpt-4o              $2.50                       $10.00
gpt-4o-mini         $0.15                       $0.60
claude-3-5-sonnet   $3.00                       $15.00
claude-3-5-haiku    $0.25                       $1.25
gemini-1.5-pro      $1.25                       $5.00
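
As an illustration of how the pricing table could drive cost calculation, here is a small sketch. The dictionary name `PRICING_PER_1M` and the function `estimate_cost_usd` are assumptions; the prices are copied from the table above and may not match current provider pricing.

```python
# (input, output) price in USD per 1M tokens, taken from the table above.
PRICING_PER_1M: dict[str, tuple[float, float]] = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-5-haiku": (0.25, 1.25),
    "gemini-1.5-pro": (1.25, 5.00),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: tokens times the per-1M price, for input and output."""
    input_price, output_price = PRICING_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# The same 100,000 tokens (80k input, 20k output) cost very different
# amounts depending on the model, which is why a token-only limit is
# not enough on its own.
for model in ("gpt-4o", "gpt-4o-mini"):
    print(f"{model}: ${estimate_cost_usd(model, 80_000, 20_000):.4f}")
```

Under these prices the two calls come to roughly $0.40 and $0.024 respectively.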

Related Roadmap Items

  • Guardrails (Phase 2)
  • Basic observability hooks (Phase 1)

Implementation Ideas

New Files

File                                   Description
core/framework/graph/cost_tracker.py   CostTracker and BudgetConfig classes
core/tests/test_cost_tracker.py        Unit tests

Modified Files

File                               Changes
core/framework/graph/executor.py   Add budget_config param, check limits after each node
core/framework/graph/__init__.py   Export new classes
core/framework/cli.py              Add --max-cost, --max-tokens, --warn-at flags
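
The CLI flags named under Key Components could be parsed roughly as follows. This sketch assumes `argparse`; the existing `cli.py` may use a different CLI library, and the program name, defaults, and help text here are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "agent-run" is a placeholder program name, not the project's real CLI.
    parser = argparse.ArgumentParser(prog="agent-run")
    parser.add_argument(
        "--max-cost", type=float, default=None, metavar="USD",
        help="maximum spend in USD for this execution",
    )
    parser.add_argument(
        "--max-tokens", type=int, default=None, metavar="N",
        help="maximum total tokens (input plus output)",
    )
    parser.add_argument(
        "--warn-at", type=float, default=80.0, metavar="PERCENT",
        help="warn when this percentage of a limit has been used",
    )
    return parser


args = build_parser().parse_args(
    ["--max-cost", "5.00", "--max-tokens", "100000", "--warn-at", "75"]
)
print(args.max_cost, args.max_tokens, args.warn_at)
```

The parsed values map directly onto `max_cost_usd`, `max_total_tokens`, and `warn_at_percentage` in `BudgetConfig`.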

New Fields in ExecutionResult

from dataclasses import dataclass, field
from typing import Any

@dataclass
class ExecutionResult:
    # ... existing fields ...
    total_cost_usd: float = 0.0
    budget_exceeded: bool = False
    cost_summary: dict[str, Any] = field(default_factory=dict)

Key Classes

from collections.abc import Callable
from dataclasses import dataclass

@dataclass
class BudgetConfig:
    max_input_tokens: int | None = None
    max_output_tokens: int | None = None
    max_total_tokens: int | None = None
    max_cost_usd: float | None = None
    warn_at_percentage: float = 80.0
    on_exceed: str = "stop"  # "warn", "pause", "stop"
    on_warning: Callable | None = None  # callback for budget warnings
    webhook_url: str | None = None

class CostTracker:
    # Interface outline only; method bodies are left unimplemented here.
    def record_usage(self, model: str, input_tokens: int, output_tokens: int) -> "CostSnapshot": ...
    def should_stop(self) -> bool: ...
    def should_pause(self) -> bool: ...
    def get_summary(self) -> dict: ...

@TimothyZhang7 can you assign me this issue?
