Description
Problem Statement
Currently, there is no way to limit spending on LLM calls during agent execution. I'm always frustrated when an agent with a bug, infinite loop, or unexpectedly complex task consumes unlimited tokens, leading to:
- Unexpectedly high API bills
- No visibility into costs until after execution completes
- No way to set guardrails for production deployments
- No alerts when costs are approaching limits
Proposed Solution
Add the ability to set token and cost (USD) budgets for agent executions. When limits are approached or exceeded, the system should warn, pause, or stop execution based on configuration.
Key Components:
- `BudgetConfig` - Configuration dataclass for limits, thresholds, and callbacks
- `CostTracker` - Tracks usage, calculates costs, enforces limits
- `GraphExecutor` integration - Check budget after each LLM call
- CLI flags - `--max-cost`, `--max-tokens`, `--warn-at`
Example Usage:

```python
executor = GraphExecutor(
    runtime=runtime,
    llm=llm,
    budget_config=BudgetConfig(
        max_cost_usd=5.00,
        max_total_tokens=100_000,
        warn_at_percentage=75.0,
        on_exceed="pause",
        webhook_url="https://hooks.slack.com/...",
    ),
)

result = await executor.execute(graph, goal, input_data)
print(f"Cost: ${result.total_cost_usd:.4f}")
print(f"Budget exceeded: {result.budget_exceeded}")
```

Alternatives Considered
1. External monitoring only - Use third-party tools (LangSmith, Helicone) to track costs after the fact. Downside: no real-time enforcement; costs are already incurred.
2. LLM provider limits - Set spending limits at the API provider level (OpenAI, Anthropic). Downside: not granular per agent; no pause/resume capability.
3. Pre-execution estimation - Estimate costs before running, based on graph complexity. Downside: inaccurate for dynamic agents; doesn't handle retries.
4. Token-only limits (no USD) - Limit tokens without cost calculation. Downside: different models have different per-token costs.
Additional Context
Use Cases
- Development: Limit costs while testing new agents
- Production: Set hard limits to prevent runaway costs
- Budgeting: Track costs per agent/goal for billing
- Alerts: Slack/webhook notifications when approaching limits
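For the alerts use case, a warning could be turned into a Slack-style webhook payload. A hedged sketch — the `build_alert_payload` helper is hypothetical and not part of the proposal; only the payload shape (a `"text"` field, as used by Slack incoming webhooks) is assumed:

```python
def build_alert_payload(agent: str, total_cost: float, limit: float) -> dict:
    """Format a Slack-style webhook payload for a budget warning.

    Hypothetical helper: the framework would POST this dict to the
    configured webhook_url when a warning threshold is crossed.
    """
    pct = 100.0 * total_cost / limit
    return {"text": f"[budget] {agent}: ${total_cost:.2f} of ${limit:.2f} ({pct:.0f}%)"}
```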
Model Pricing Reference
Default pricing should be included for common models:
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-5-haiku | $0.25 | $1.25 |
| gemini-1.5-pro | $1.25 | $5.00 |
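As a sketch of how the per-model rates above could drive cost calculation (the `PRICING` table and `estimate_cost` helper are hypothetical names; the rates are the per-1M-token figures from the reference table):

```python
# Hypothetical pricing table: USD per 1M tokens as (input_rate, output_rate),
# mirroring the reference rates above.
PRICING: dict[str, tuple[float, float]] = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-5-haiku": (0.25, 1.25),
    "gemini-1.5-pro": (1.25, 5.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call, from per-1M-token rates."""
    input_rate, output_rate = PRICING[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

For example, 10k input plus 2k output tokens on gpt-4o-mini comes to 0.0015 + 0.0012 = $0.0027, which is why a USD budget (alternative 4 above) can't be replaced by a raw token count: the same tokens on gpt-4o would cost roughly 17x more.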
Related Roadmap Items
- Guardrails (Phase 2)
- Basic observability hooks (Phase 1)
Implementation Ideas
New Files
| File | Description |
|---|---|
| `core/framework/graph/cost_tracker.py` | `CostTracker` and `BudgetConfig` classes |
| `core/tests/test_cost_tracker.py` | Unit tests |
Modified Files
| File | Changes |
|---|---|
| `core/framework/graph/executor.py` | Add `budget_config` param, check limits after each node |
| `core/framework/graph/__init__.py` | Export new classes |
| `core/framework/cli.py` | Add `--max-cost`, `--max-tokens` flags |
New Fields in ExecutionResult

```python
@dataclass
class ExecutionResult:
    # ... existing fields ...
    total_cost_usd: float = 0.0
    budget_exceeded: bool = False
    cost_summary: dict[str, Any] = field(default_factory=dict)
```

Key Classes
```python
@dataclass
class BudgetConfig:
    max_input_tokens: int | None = None
    max_output_tokens: int | None = None
    max_total_tokens: int | None = None
    max_cost_usd: float | None = None
    warn_at_percentage: float = 80.0
    on_exceed: str = "stop"  # "warn", "pause", "stop"
    on_warning: Callable | None = None
    webhook_url: str | None = None
```
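To make the threshold semantics concrete, here is a hedged sketch of how `warn_at_percentage` and `on_exceed` might be evaluated against a running total (the `check_budget` helper is hypothetical, not part of the proposed API):

```python
def check_budget(total_cost: float, max_cost: float,
                 warn_at_percentage: float = 80.0) -> str:
    """Classify spend against the limit: 'ok', 'warn', or 'exceeded'.

    Hypothetical helper illustrating the intended semantics: a warning
    fires at warn_at_percentage of the limit, and the configured
    on_exceed action ("warn"/"pause"/"stop") fires at 100%.
    """
    if total_cost >= max_cost:
        return "exceeded"
    if total_cost >= max_cost * warn_at_percentage / 100.0:
        return "warn"
    return "ok"
```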
```python
class CostTracker:
    def record_usage(self, model: str, input_tokens: int, output_tokens: int) -> CostSnapshot: ...
    def should_stop(self) -> bool: ...
    def should_pause(self) -> bool: ...
    def get_summary(self) -> dict: ...
```

@TimothyZhang7 can you assign me this issue