
Author field for live transcription #4260

@bashimr

ADK Bug Report: Input Transcription Events Have Incorrect Author Field

Summary

In ADK's run_live mode, input transcription events (user voice) are saved with author set to the agent name instead of 'user'. This contradicts the function's docstring and can cause issues with event filtering and debugging.

Environment

  • ADK Version: Latest (as of January 2025)
  • Model: gemini-2.5-flash-native-audio-preview-12-2025
  • Mode: run_live (WebSocket voice conversation)

Bug Location

File: src/google/adk/flows/llm_flows/base_llm_flow.py
Function: get_author_for_event() (lines ~342-357)

def get_author_for_event(llm_response):
    """Get the author of the event.

    When the model returns transcription, the author is "user". Otherwise, the
    author is the agent name(not 'model').
    """
    if (
        llm_response
        and llm_response.content
        and llm_response.content.role == 'user'
    ):
        return 'user'
    else:
        return invocation_context.agent.name

Problem

The docstring explicitly states: "When the model returns transcription, the author is 'user'"

However, for input transcription events:

  • llm_response.content is None
  • llm_response.input_transcription contains the user's speech text

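Because llm_response.content is None, the llm_response.content check is falsy and the role comparison is never reached. The function therefore falls through to the else branch and returns invocation_context.agent.name for every input transcription event.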
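
The fall-through can be shown without a live session. The sketch below is a stand-in, not ADK code: FakeContent, FakeTranscription and FakeLlmResponse are hypothetical stubs that carry only the fields named above, and the function takes agent_name as a parameter instead of reading invocation_context.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeContent:
    """Hypothetical stub: only the role field matters here."""
    role: str


@dataclass
class FakeTranscription:
    """Hypothetical stub for the transcription payload."""
    text: str


@dataclass
class FakeLlmResponse:
    """Hypothetical stub with the two fields discussed in this report."""
    content: Optional[FakeContent] = None
    input_transcription: Optional[FakeTranscription] = None


def current_get_author(llm_response, agent_name):
    # Same condition as the function quoted under "Bug Location".
    if (
        llm_response
        and llm_response.content
        and llm_response.content.role == 'user'
    ):
        return 'user'
    return agent_name


# An input transcription event: content is None, the text is in input_transcription.
transcription_only = FakeLlmResponse(
    input_transcription=FakeTranscription(text=' Hello.')
)
author = current_get_author(transcription_only, 'lidia_agent')
print(author)  # prints "lidia_agent", not "user"
```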

Evidence

Database query on a live session shows all events have the agent name as author, even input transcriptions:

SELECT
    json_extract(event_data, '$.author') as author,
    json_extract(event_data, '$.input_transcription.text') as user_speech
FROM events
WHERE session_id = 'f8fa4acb-5cae-4522-9609-b0386983bb9b';

Results:

author       user_speech
-----------  ---------------------------------------------
lidia_agent  " Hello."
lidia_agent  " All right, show me my pants."
lidia_agent  " Hi what was the last question I asked you?"

All three user utterances have author='lidia_agent' instead of author='user'.

Impact

  1. Event filtering: Code that filters by event.author == 'user' won't find user transcription events
  2. Turn detection: _get_current_turn_contents() in contents.py uses event.author == 'user' to find turn boundaries, so it may not treat a spoken utterance as the start of a user turn
  3. Debugging/tracing: It is difficult to distinguish user events from model events in logs
  4. Potential context issues: _get_contents() uses input_transcription vs output_transcription to build LLM context (which works correctly), but other code paths may rely on the author field
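
To illustrate the filtering impact, here is a minimal sketch using a hypothetical StoredEvent type rather than the real ADK Event class. It shows an author-only filter returning nothing for the utterances from the Evidence section, next to a possible interim workaround that also checks for an input transcription. The is_user_event helper is an assumption of this sketch, not an ADK function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredEvent:
    """Hypothetical, simplified stand-in for a saved session event."""
    author: str
    input_transcription_text: Optional[str] = None


events = [
    StoredEvent(author='lidia_agent', input_transcription_text=' Hello.'),
    StoredEvent(author='lidia_agent'),  # a model event with no input transcription
    StoredEvent(
        author='lidia_agent',
        input_transcription_text=' All right, show me my pants.',
    ),
]

# Filtering by author alone misses every spoken user utterance.
by_author = [e for e in events if e.author == 'user']


def is_user_event(event):
    """Treat an event as user-authored if it carries an input transcription."""
    return event.author == 'user' or event.input_transcription_text is not None


# Workaround until the author field is fixed.
by_author_or_transcription = [e for e in events if is_user_event(e)]

print(len(by_author), len(by_author_or_transcription))
```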

Suggested Fix

def get_author_for_event(llm_response):
    """Get the author of the event.

    When the model returns transcription, the author is "user". Otherwise, the
    author is the agent name(not 'model').
    """
    if (
        llm_response
        and llm_response.content
        and llm_response.content.role == 'user'
    ):
        return 'user'
    # Fix: Also check for input transcription (user's voice)
    if llm_response and llm_response.input_transcription:
        return 'user'
    return invocation_context.agent.name
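
A quick check of the suggested logic, written as a standalone sketch. The response objects are built with types.SimpleNamespace as hypothetical stubs, and agent_name is passed in explicitly because the real function reads it from invocation_context. This only exercises the branching; it says nothing about how the change behaves inside ADK.

```python
from types import SimpleNamespace


def fixed_get_author(llm_response, agent_name):
    # Original check: explicit user-role content.
    if (
        llm_response
        and llm_response.content
        and llm_response.content.role == 'user'
    ):
        return 'user'
    # Suggested addition: input transcription is the user's voice.
    if llm_response and llm_response.input_transcription:
        return 'user'
    return agent_name


def make_response(role=None, input_text=None):
    """Build a hypothetical stub response with only the fields used above."""
    content = SimpleNamespace(role=role) if role else None
    transcription = SimpleNamespace(text=input_text) if input_text else None
    return SimpleNamespace(content=content, input_transcription=transcription)


cases = {
    'user_content': fixed_get_author(make_response(role='user'), 'lidia_agent'),
    'input_transcription': fixed_get_author(
        make_response(input_text=' Hello.'), 'lidia_agent'
    ),
    'model_content': fixed_get_author(make_response(role='model'), 'lidia_agent'),
    'no_response': fixed_get_author(None, 'lidia_agent'),
}
print(cases)
```

With this change only the transcription-only case moves from the agent name to 'user'; the other three cases return what the current function returns.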

Reproduction Steps

  1. Create an ADK agent with run_live enabled
  2. Connect via WebSocket and speak to the agent
  3. Query the events table in adk_sessions.db
  4. Observe that input transcription events have author set to agent name instead of 'user'
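
Steps 3 and 4 can be scripted. The sketch below assumes the events table has session_id and event_data columns, as the query in the Evidence section implies; the real schema may contain more columns. It parses the JSON in Python rather than relying on SQLite's json_extract, and runs against an in-memory database with sample rows so that it is self-contained. find_misattributed is a hypothetical helper name.

```python
import json
import sqlite3


def find_misattributed(conn, session_id):
    """Return (author, text) for input transcriptions not authored by 'user'."""
    rows = conn.execute(
        'SELECT event_data FROM events WHERE session_id = ?', (session_id,)
    ).fetchall()
    results = []
    for (raw,) in rows:
        data = json.loads(raw)
        transcription = data.get('input_transcription') or {}
        text = transcription.get('text')
        if text is not None and data.get('author') != 'user':
            results.append((data.get('author'), text))
    return results


# In-memory stand-in for adk_sessions.db (assumed two-column schema).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE events (session_id TEXT, event_data TEXT)')
sample = [
    {'author': 'lidia_agent', 'input_transcription': {'text': ' Hello.'}},
    {'author': 'lidia_agent', 'input_transcription': None},
]
conn.executemany(
    'INSERT INTO events VALUES (?, ?)',
    [('demo-session', json.dumps(event)) for event in sample],
)

misattributed = find_misattributed(conn, 'demo-session')
print(misattributed)
```

To run it against a real session, replace the in-memory setup with sqlite3.connect('adk_sessions.db') and pass the session ID of interest.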

Related Code

  • base_llm_flow.py:_receive_from_model() - calls get_author_for_event()
  • contents.py:_get_current_turn_contents() - uses event.author == 'user' for turn detection
  • runners.py:_exec_with_plugin() - saves events to session

Metadata

  • Labels: live ([Component] This issue is related to live, voice and video chat)