docs: Add ADK Agent Analytics Advanced Capabilities Design Document #4284
base: main
This design document outlines three advanced capabilities for the ADK BigQuery Agent Analytics Plugin:

1. Trace-Based Evaluation Harness - Automated evaluation of agent behavior using stored traces with trajectory matching, LLM-as-judge, and deterministic replay for debugging.
2. Long-Horizon Agent Memory - Context and memory management using historical trace data including session memory, episodic memory, and user profile building.
3. BigQuery AI/ML Integration - Leveraging BigQuery's ML functions for embedding-based search, anomaly detection, and batch evaluation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Summary of Changes
Hello @caohy1988, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a comprehensive design document outlining advanced capabilities for the ADK BigQuery Agent Analytics Plugin. The document focuses on enhancing agent evaluation, improving long-term memory management, and deeply integrating with BigQuery's AI/ML features to provide richer insights and more robust agent systems.
Response from ADK Triaging Agent
Hello @caohy1988, thank you for creating this PR! Before we can merge this, you'll need to sign the Contributor License Agreement (CLA). You can do so by following the instructions at https://cla.developers.google.com/. Also, for a contribution of this size, could you please create a GitHub issue that describes the new feature and associate it with this PR? This information will help reviewers to review your PR more efficiently. Thanks!
Code Review
This pull request introduces a comprehensive design document detailing advanced capabilities for the ADK BigQuery Agent Analytics Plugin. It covers trace-based evaluation, long-horizon agent memory, and BigQuery AI/ML integration. The document includes architecture diagrams, SQL examples, and Python code snippets. The review focuses on identifying potential issues related to correctness, efficiency, and maintainability, particularly concerning the integration of various components and the clarity of the design. No specific style guide was provided, so the review defaults to general best practices for documentation and code clarity.
| Metric | Description | Implementation |
|--------|-------------|----------------|
| `TOOL_TRAJECTORY_AVG_SCORE` | How closely agent's tool calls match expected trajectory | Compare tool sequences from trace vs golden trajectory |
| `TOOL_TRAJECTORY_IN_ORDER_SCORE` | Whether tools were called in correct order | Order-aware sequence matching |
| `RESPONSE_MATCH_SCORE` | Final response similarity to expected | Embedding similarity or LLM judge |
| `STEP_EFFICIENCY_SCORE` | Ratio of necessary vs actual steps | Count trace events vs optimal path |
This table lacks proper alignment. Consider using a consistent number of spaces or tabs for alignment to improve readability. Also, consider using a more descriptive header for the Implementation column, such as Implementation Details or Implementation Strategy, to provide more context to the reader.
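For reviewers who want something concrete to compare against, the three trajectory metrics in the table above could be sketched roughly as follows. This is only an illustration: the quoted diff does not give formulas, so the scoring rules here (position-wise matching, subsequence check, capped step ratio) and the function names are assumptions, not the plugin's or ADK's actual definitions.

```python
from typing import Sequence


def trajectory_avg_score(actual: Sequence[str], expected: Sequence[str]) -> float:
    """Assumed rule: fraction of positions where the tool names agree,
    normalised by the longer of the two sequences."""
    longest = max(len(actual), len(expected))
    if longest == 0:
        return 1.0  # two empty trajectories match trivially
    matches = sum(a == e for a, e in zip(actual, expected))
    return matches / longest


def trajectory_in_order_score(actual: Sequence[str], expected: Sequence[str]) -> float:
    """Assumed rule: 1.0 if every expected tool appears in `actual` in the
    same relative order (extra calls in between are allowed), else 0.0."""
    remaining = iter(actual)
    # `tool in remaining` advances the iterator, so order is enforced.
    return 1.0 if all(tool in remaining for tool in expected) else 0.0


def step_efficiency_score(actual_steps: int, optimal_steps: int) -> float:
    """Assumed rule: optimal / actual, capped at 1.0."""
    if actual_steps <= 0:
        return 0.0
    return min(1.0, optimal_steps / actual_steps)


# Hypothetical tool names, for illustration only.
golden = ["search", "summarize"]
trace = ["search", "lookup", "summarize"]
print(trajectory_avg_score(trace, golden))
print(trajectory_in_order_score(trace, golden))
print(step_efficiency_score(len(trace), len(golden)))
```

The in-order variant tolerates extra tool calls, which is one reason it can disagree with the position-wise average for the same trace.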
| Metric | Description | Data Source |
|--------|-------------|-------------|
| `PLAN_QUALITY` | Quality of agent's reasoning/planning | LLM_REQUEST content analysis |
| `PLAN_ADHERENCE` | Whether agent followed its plan | Compare stated plan vs executed tools |
| `TOOL_SELECTION_ACCURACY` | Correct tool chosen for task | TOOL_STARTING events vs expected |
| `ARGUMENT_CORRECTNESS` | Tool arguments match requirements | TOOL_STARTING attributes |
Transition document to SQL-native implementation and enhance advanced capabilities for ADK BigQuery Agent Analytics Plugin. Update sections on evaluation harness, memory architecture, and integration with BigQuery AI functions.
Summary
This PR adds a comprehensive design document for three advanced capabilities of the ADK BigQuery Agent Analytics Plugin:
1. Trace-Based Evaluation Harness - Automated evaluation of agent behavior using stored traces
2. Long-Horizon Agent Memory - Context and memory management for agents
3. BigQuery AI/ML Integration - Leveraging BigQuery's advanced features
Test plan
🤖 Generated with Claude Code