Description
Problem Statement
Currently, the AgentRuntime and LiteLLMProvider lack a centralized mechanism for monitoring resource consumption. When running complex, multi-node agents such as the OnlineResearchAgent, there is no visibility into:
Cumulative Token Usage: How many tokens were consumed across the entire multi-step graph?
Session Costs: What is the real-time USD cost of a specific execution?
Without this, developers cannot set cost guardrails, and autonomous agents risk "runaway" loops that can unexpectedly deplete API credits.
Proposed Solution
I propose adding an Observability Layer that hooks into the completion lifecycle.
Usage Aggregator: A component within AgentRuntime that captures the usage metadata from every LiteLLM response.
Session Telemetry: Persist these metrics alongside the existing telemetry in ~/.hive.
Cost Estimator: A utility to map model strings (e.g., gpt-4o) to their respective input/output prices for real-time cost reporting.
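A minimal sketch of the Cost Estimator piece, assuming a simple lookup table keyed by model string. The `PRICE_PER_MTOK` table and `estimate_cost` helper are hypothetical names, and the dollar figures are illustrative placeholders rather than authoritative rates (real prices change and would need to come from configuration or the provider's published pricing):

```python
# Hypothetical price table: USD per one million tokens, as (input, output).
# Figures are illustrative placeholders, not current provider rates.
PRICE_PER_MTOK: dict[str, tuple[float, float]] = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Map a model string plus token counts to an estimated USD cost.

    Unknown models fall back to zero cost rather than raising, so the
    estimator never breaks an agent run; a warning could be logged instead.
    """
    input_price, output_price = PRICE_PER_MTOK.get(model, (0.0, 0.0))
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000
```

For example, one million input tokens against `gpt-4o` in this table would report $2.50.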
Alternatives Considered
Manual Tracking: Requiring developers to extract the usage object from every node themselves. This is high-friction and error-prone.
Provider-Side Only: Relying on the OpenAI/Anthropic billing dashboards. These do not support per-agent or per-session granularity.
Additional Context
This is a natural follow-up to the LiteLLM Resilience work. Once the provider is resilient to errors, the next step in production reliability is understanding its cost and performance.
Implementation Ideas
Modify LiteLLMProvider: Ensure it returns the usage object (prompt, completion, and total tokens) in its internal state or response.
Stateful Tracker: Add a self.session_usage dictionary to the Agent class to store running totals.
Reporting Hook: Add a get_stats() or get_cost() method to the agent that provides a summary of the current session's impact.
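The three ideas above could fit together roughly as follows. This is a sketch under stated assumptions: `SessionUsageTracker` is a hypothetical helper the Agent would own as `self.session_usage` state, `record` takes the per-model token counts LiteLLM reports in its response `usage` field, and `get_stats` is the proposed reporting hook:

```python
from collections import defaultdict

class SessionUsageTracker:
    """Running per-model token totals for one agent session (hypothetical helper)."""

    def __init__(self) -> None:
        # model name -> running token counts, mirroring LiteLLM usage fields
        self.session_usage: dict[str, dict[str, int]] = defaultdict(
            lambda: {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
        )

    def record(self, model: str, usage: dict[str, int]) -> None:
        """Fold one response's usage metadata into the session totals."""
        for key in ("prompt_tokens", "completion_tokens", "total_tokens"):
            self.session_usage[model][key] += usage.get(key, 0)

    def get_stats(self) -> dict[str, dict[str, int]]:
        """Summarize the current session's cumulative usage per model."""
        return {model: dict(counts) for model, counts in self.session_usage.items()}
```

The `record` call would sit in the completion lifecycle hook so every node's response is captured automatically, and a `get_cost` method could combine these totals with the cost estimator's price table.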