Fix missing manager_agent tokens in usage_metrics from kickoff #2848

Open
wants to merge 2 commits into main

Conversation

@hegasz commented May 15, 2025

The Crew.kickoff() method calculates usage_metrics by aggregating token usage from each agent.
Unless I'm missing something here, it seems to leave out manager_agent tokens (used in hierarchical crews), which could lead to under-reported usage. Is this intentional?

This PR updates the logic to delegate to calculate_usage_metrics(), which properly includes both regular agents and the manager in the aggregation.
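
Roughly, the change looks like this (a paraphrase rather than the exact diff; variable names are illustrative):

    # Before (roughly): usage was summed over self.agents only, so the
    # manager_agent of a hierarchical crew was never counted.
    total_usage_metrics = UsageMetrics()
    for agent in self.agents:
        if hasattr(agent, "_token_process"):
            total_usage_metrics.add_usage_metrics(agent._token_process.get_summary())
    self.usage_metrics = total_usage_metrics

    # After: delegate to the shared helper, which also folds in the manager agent.
    self.usage_metrics = self.calculate_usage_metrics()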

@joaomdmoura (Collaborator)

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment for PR #2848

Overview

This pull request modifies the usage metrics collection in crew.py, specifically in the kickoff method. The aim is to prevent the loss of manager_agent tokens by restructuring the collection and aggregation of metrics.

Identified Changes

  • Removal of Initialization: The explicit list initialization for metrics is removed, simplifying metrics handling.
  • Elimination of Aggregation Loop: The previous manual loop for collecting metrics has been removed.
  • Introduction of calculate_usage_metrics(): This new method should centralize the logic for metrics calculation based on agents.

Issues Found

  1. Potential Missing Method Implementation: calculate_usage_metrics() is called in the diff but its implementation is not visible in this change; if the method is not already defined on Crew, kickoff will fail with an AttributeError at runtime.
  2. Error Handling Concerns: The unchanged generic exception handling makes debugging more challenging; specific metrics-related errors should be captured.

Code Quality

Positive Aspects:

  • Reduced Complexity: The code is clearer without the manual metrics aggregation.
  • Improved Maintainability: Centralizing metrics calculations leads to better readability and potential reuse.
  • Memory Efficiency: Removing the temporary list avoids an unnecessary allocation.

Improvement Suggestions:

  • Documentation: The new method calculate_usage_metrics() should include a docstring outlining its purpose and return types.
  • Type Hints: Adding type hints for variables and method signatures would enhance clarity and maintainability.

Recommendations

  1. Add Method Documentation
    Example:

    def calculate_usage_metrics(self) -> UsageMetrics:
        """
        Aggregates usage metrics from all agents, including manager agents.
        Returns:
            UsageMetrics: Combined usage metrics from all agents in the crew.
        """
  2. Enhance Error Handling
    Enhanced error handling could look like:

    try:
        self.usage_metrics = self.calculate_usage_metrics()
    except AttributeError as ae:
        # CrewException is a placeholder name; substitute the project's actual exception type.
        raise CrewException("Failed to calculate metrics due to missing attributes.") from ae
    except Exception as e:
        # Illustrative only: the real event-bus API may expect an event object rather than a name/payload pair.
        crewai_event_bus.emit("crew.error", {"error": str(e), "context": "metrics_calculation"})
        raise
  3. Implement Type Hints
    For better code readability:

    # Abbreviated sketch: only the metrics-related lines of kickoff are shown.
    def kickoff(self) -> Any:
        result: Any = None  # in the real method this holds the crew's output
        self.usage_metrics: UsageMetrics = self.calculate_usage_metrics()
        return result
  4. Add Validation Logic
    To improve the robustness of metrics calculation:

    def calculate_usage_metrics(self) -> UsageMetrics:
        if not self.agents:
            # CrewException is a placeholder name; use the project's actual exception type.
            raise CrewException("No agents available for metrics calculation.")
        total = UsageMetrics()
        # Aggregate each regular agent and, when present, the hierarchical manager agent.
        for agent in [*self.agents, self.manager_agent]:
            if agent is not None and hasattr(agent, "_token_process"):
                total.add_usage_metrics(agent._token_process.get_summary())
        return total

Conclusion

The changes simplify metrics collection, but the documentation, type hint, and error handling improvements above are still needed for clarity and robustness; implementing them preserves the intent of the change while strengthening code quality.

Testing Recommendations

  • Implement unit tests for calculate_usage_metrics(), including scenarios with empty agent lists and normal cases.
  • Validate that metrics are accurately aggregated across varying agent types (see the sketch below).
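
A minimal sketch of such a test (illustrative only; the import paths, the helper's availability on Crew, and the UsageMetrics fields used below are assumptions, not verified against the repository):

    from unittest.mock import MagicMock

    from crewai import Crew
    from crewai.types.usage_metrics import UsageMetrics

    def _agent_reporting(tokens: int) -> MagicMock:
        # Stand-in agent exposing only the _token_process summary the aggregation reads.
        agent = MagicMock()
        agent._token_process.get_summary.return_value = UsageMetrics(
            total_tokens=tokens, prompt_tokens=tokens
        )
        return agent

    def test_calculate_usage_metrics_includes_manager():
        # Use a bare mock so no LLM configuration is needed for this sketch.
        crew = MagicMock(spec=Crew)
        crew.agents = [_agent_reporting(10), _agent_reporting(5)]
        crew.manager_agent = _agent_reporting(7)

        metrics = Crew.calculate_usage_metrics(crew)

        assert metrics.total_tokens == 22  # 10 + 5 + 7: manager tokens are counted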

Addressing the points above will ensure a smooth merging process and contribute to a maintainable codebase.

@lucasgomide (Contributor) left a comment

could you add some tests to cover this issue?

@hegasz (Author) commented May 16, 2025

> could you add some tests to cover this issue?

Have added test_hierarchical_kickoff_usage_metrics_include_manager in tests/crew_test.py!
