@caohy1988

Summary

This PR adds a comprehensive design document for three advanced capabilities of the ADK BigQuery Agent Analytics Plugin:

  • Trace-Based Evaluation Harness - Automated evaluation of agent behavior using stored traces

    • Trajectory matching (exact, in-order, any-order)
    • LLM-as-judge evaluation
    • Deterministic replay for debugging
  • Long-Horizon Agent Memory - Context and memory management for agents

    • Cross-session context retrieval
    • Episodic memory with semantic search
    • User profile building from trace history
  • BigQuery AI/ML Integration - Leveraging BigQuery's advanced features

    • Embedding-based trace search (AI.EMBED)
    • Anomaly detection (ML.DETECT_ANOMALIES)
    • Batch evaluation pipeline
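The three trajectory matching modes listed above can be illustrated with a minimal Python sketch. The function names and the exact semantics (for example, whether extra tool calls are tolerated in the in-order and any-order modes) are assumptions made for illustration; the design document itself may define them differently.

```python
from collections import Counter
from typing import Sequence


def exact_match(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """True when both trajectories hold the same tools in the same positions."""
    return list(actual) == list(expected)


def in_order_match(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """True when `expected` is a subsequence of `actual`.

    Assumption: extra tool calls between the expected ones are allowed.
    """
    remaining = iter(actual)
    # `tool in remaining` advances the iterator, so order is enforced.
    return all(tool in remaining for tool in expected)


def any_order_match(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """True when every expected call occurs in `actual`, in any order.

    Assumption: repeated expected calls must be matched by as many actual calls.
    """
    missing = Counter(expected) - Counter(actual)
    return not missing
```

With an actual trajectory of `["search", "lookup", "summarize"]`, an expected trajectory of `["summarize", "search"]` passes only the any-order check, while `["search", "summarize"]` passes both the in-order and any-order checks.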

Test plan

  • Document reviewed for technical accuracy
  • Architecture diagrams included
  • SQL examples and code snippets validated
  • References to academic papers and documentation included

🤖 Generated with Claude Code

This design document outlines three advanced capabilities for the
ADK BigQuery Agent Analytics Plugin:

1. Trace-Based Evaluation Harness - Automated evaluation of agent
   behavior using stored traces with trajectory matching, LLM-as-judge,
   and deterministic replay for debugging.

2. Long-Horizon Agent Memory - Context and memory management using
   historical trace data including session memory, episodic memory,
   and user profile building.

3. BigQuery AI/ML Integration - Leveraging BigQuery's ML functions
   for embedding-based search, anomaly detection, and batch evaluation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@google-cla

google-cla bot commented Jan 27, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist
Contributor

Summary of Changes

Hello @caohy1988, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive design document outlining advanced capabilities for the ADK BigQuery Agent Analytics Plugin. The document focuses on enhancing agent evaluation, improving long-term memory management, and deeply integrating with BigQuery's AI/ML features to provide richer insights and more robust agent systems.

Highlights

  • Trace-Based Evaluation Harness: Introduces an automated system for evaluating agent behavior using stored traces. This includes trajectory matching (exact, in-order, any-order), LLM-as-judge evaluation for qualitative assessment, and deterministic replay capabilities for debugging agent interactions.
  • Long-Horizon Agent Memory: Details mechanisms for managing agent context and memory over extended periods. This encompasses cross-session context retrieval, episodic memory with semantic search, and the building of user profiles based on historical trace data.
  • BigQuery AI/ML Integration: Outlines how to leverage BigQuery's advanced AI/ML functions for agent analytics. This includes using AI.EMBED for embedding-based trace search, ML.DETECT_ANOMALIES for identifying unusual agent behavior, and implementing batch evaluation pipelines for efficient analysis.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
|---------|---------|-------------|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@adk-bot added the `documentation` label ([Component] This issue is related to documentation, it will be transferred to adk-docs) on Jan 27, 2026
@adk-bot
Collaborator

adk-bot commented Jan 27, 2026

Response from ADK Triaging Agent

Hello @caohy1988, thank you for creating this PR!

Before we can merge this, you'll need to sign the Contributor License Agreement (CLA). You can do so by following the instructions at https://cla.developers.google.com/.

Also, for a contribution of this size, could you please create a GitHub issue that describes the new feature and associate it with this PR?

This information will help reviewers to review your PR more efficiently. Thanks!

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive design document detailing advanced capabilities for the ADK BigQuery Agent Analytics Plugin. It covers trace-based evaluation, long-horizon agent memory, and BigQuery AI/ML integration. The document includes architecture diagrams, SQL examples, and Python code snippets. The review focuses on identifying potential issues related to correctness, efficiency, and maintainability, particularly concerning the integration of various components and the clarity of the design. No specific style guide was provided, so the review defaults to general best practices for documentation and code clarity.

Comment on lines 69 to 74
| Metric | Description | Implementation |
|--------|-------------|----------------|
| `TOOL_TRAJECTORY_AVG_SCORE` | How closely agent's tool calls match expected trajectory | Compare tool sequences from trace vs golden trajectory |
| `TOOL_TRAJECTORY_IN_ORDER_SCORE` | Whether tools were called in correct order | Order-aware sequence matching |
| `RESPONSE_MATCH_SCORE` | Final response similarity to expected | Embedding similarity or LLM judge |
| `STEP_EFFICIENCY_SCORE` | Ratio of necessary vs actual steps | Count trace events vs optimal path |

medium

This table lacks proper alignment. Consider using a consistent number of spaces or tabs for alignment to improve readability. Also, consider using a more descriptive header for the Implementation column, such as Implementation Details or Implementation Strategy, to provide more context to the reader.
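For context on the `STEP_EFFICIENCY_SCORE` row in the table under review ("ratio of necessary vs actual steps"), the metric could be computed roughly as follows. This is only a sketch: the function name, the clamping to the range [0, 1], and the handling of empty traces are assumptions, not details taken from the design document.

```python
def step_efficiency_score(actual_steps: int, optimal_steps: int) -> float:
    """Ratio of necessary (optimal) steps to the steps the agent actually took.

    Assumptions: the score is clamped to [0, 1], so an agent that finishes in
    fewer steps than the reference path still scores 1.0, and an empty trace
    scores 1.0 only when no steps were required.
    """
    if actual_steps < 0 or optimal_steps < 0:
        raise ValueError("step counts must be non-negative")
    if actual_steps == 0:
        return 1.0 if optimal_steps == 0 else 0.0
    return min(1.0, optimal_steps / actual_steps)
```

Under these assumptions, a trace with 8 events against an optimal path of 4 steps scores 0.5.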

Comment on lines 78 to 83
| Metric | Description | Data Source |
|--------|-------------|-------------|
| `PLAN_QUALITY` | Quality of agent's reasoning/planning | LLM_REQUEST content analysis |
| `PLAN_ADHERENCE` | Whether agent followed its plan | Compare stated plan vs executed tools |
| `TOOL_SELECTION_ACCURACY` | Correct tool chosen for task | TOOL_STARTING events vs expected |
| `ARGUMENT_CORRECTNESS` | Tool arguments match requirements | TOOL_STARTING attributes |

medium

Similar to the previous table, this table also lacks proper alignment. Consistent spacing or tabs should be used to improve readability. Additionally, consider renaming the Data Source column to Data Source Description or Data Source Details to provide more context.

@caohy1988 marked this pull request as draft on January 27, 2026, 18:42
Transition document to SQL-native implementation and enhance advanced capabilities for ADK BigQuery Agent Analytics Plugin. Update sections on evaluation harness, memory architecture, and integration with BigQuery AI functions.