VertexAISearchSummaryTool drops SummaryWithMetadata citations/references despite summary_include_citations=True #1836

@glaziermag

Package versions

  • langchain-google-community: 4.0.0
  • google-cloud-discoveryengine: 0.20.0
  • langchain-core: 1.4.4
  • Python: 3.13.12
  • langchain-google main SHA: 330f2df

Minimal corpus

Three direct-upload test documents under 100 KB:

Direct Discovery Engine request

  • query: What are the current release codes and policy codes?
  • filter: tenant: ANY("red")
  • SummarySpec.include_citations: true
  • SummarySpec.summary_result_count: 3
  • data store type: unstructured
  • serving config: Enterprise/LLM Discovery Engine app serving config

Note: I first tried a structured datastore serving config. Direct Search returned
summary_with_metadata, but skipped generation with LLM_ADDON_NOT_ENABLED, so
that was not used as the LangChain bug confirmation. The confirmed run used a
tiny direct-upload text datastore and an Enterprise Search app with
SEARCH_ADD_ON_LLM.
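The request parameters listed above can be sketched as a SearchRequest-shaped dict. This is only an illustration: the helper name `build_search_request` and the `SERVING_CONFIG` placeholder are hypothetical, and the nested field names (`content_search_spec`, `summary_spec`) are assumed to follow the Discovery Engine `SearchRequest` message.

```python
import json

# Hypothetical placeholder; substitute the real engine/app serving config path.
SERVING_CONFIG = "<engine-serving-config-resource-name>"


def build_search_request(query: str, tenant: str, result_count: int = 3) -> dict:
    """Build a request dict that asks for a summary with citations enabled."""
    return {
        "serving_config": SERVING_CONFIG,
        "query": query,
        # Same filter expression as in the issue, parameterized on the tenant.
        "filter": f'tenant: ANY("{tenant}")',
        "content_search_spec": {
            "summary_spec": {
                "include_citations": True,
                "summary_result_count": result_count,
            }
        },
    }


request = build_search_request(
    "What are the current release codes and policy codes?", "red"
)
print(json.dumps(request, indent=2))
```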

Direct Discovery Engine response evidence

  • summary.summary_text present: True
  • summary.summary_with_metadata present: True
  • citation_metadata.citations present: True
  • references present: True
  • reference documents: ['projects/917493783080/locations/global/collections/default_collection/dataStores/summary-citation-parity-ds-r3/branches/0/documents/doc_red_v1']
  • reference URIs: []

Because this was a direct raw-bytes text upload, Discovery Engine returned
references[*].document but did not populate references[*].uri.

Excerpt:

{
  "summary": {
    "summary_text": "The current red tenant release code is FALCON-17 [1]. The red policy code is R-ALLOW-101 [1].",
    "summary_with_metadata": {
      "summary": "The current red tenant release code is FALCON-17. The red policy code is R-ALLOW-101.",
      "citation_metadata": {
        "citations": [
          {"end_index": "49", "sources": [{}]},
          {"start_index": "50", "end_index": "85", "sources": [{}]}
        ]
      },
      "references": [
        {
          "document": "projects/917493783080/locations/global/collections/default_collection/dataStores/summary-citation-parity-ds-r3/branches/0/documents/doc_red_v1"
        }
      ]
    }
  }
}
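To show what is lost when only summary_text is returned, here is a minimal sketch that maps each citation in the excerpt above back to the text it covers and the referenced document. The helper `cited_spans` is hypothetical. It assumes that omitted `start_index` and `reference_index` fields mean 0 (the usual proto3 JSON default) and that the indices can be treated as character offsets, which holds for this ASCII-only summary.

```python
# Metadata copied from the response excerpt above.
summary_with_metadata = {
    "summary": (
        "The current red tenant release code is FALCON-17. "
        "The red policy code is R-ALLOW-101."
    ),
    "citation_metadata": {
        "citations": [
            {"end_index": "49", "sources": [{}]},
            {"start_index": "50", "end_index": "85", "sources": [{}]},
        ]
    },
    "references": [
        {
            "document": "projects/917493783080/locations/global/collections/"
            "default_collection/dataStores/summary-citation-parity-ds-r3/"
            "branches/0/documents/doc_red_v1"
        }
    ],
}


def cited_spans(meta: dict) -> list[tuple[str, list[str]]]:
    """Pair each cited text span with the documents that back it."""
    text = meta["summary"]
    refs = meta.get("references", [])
    spans = []
    for citation in meta.get("citation_metadata", {}).get("citations", []):
        # int64 fields arrive as strings in JSON; omitted fields default to 0.
        start = int(citation.get("start_index", 0))
        end = int(citation.get("end_index", 0))
        docs = []
        for source in citation.get("sources", []):
            idx = int(source.get("reference_index", 0))
            if idx < len(refs):
                docs.append(refs[idx].get("document", ""))
        spans.append((text[start:end], docs))
    return spans


for span, docs in cited_spans(summary_with_metadata):
    print(repr(span), "->", docs)
```

Under those assumptions, both sentences of the summary resolve to doc_red_v1. That mapping cannot be recovered from the bare "[1]" markers in summary_text.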

Full request/response JSON is in:

/Users/gabe/Desktop/agent-search-freshness/summary_citation_parity_runs/summary_citation_parity_20260610T233854282350Z/raw_calls.jsonl

LangChain code

from langchain_google_community.vertex_ai_search import VertexAISearchSummaryTool

# PROJECT_ID and DATA_STORE_ID are placeholders for the test project and data store.
tool = VertexAISearchSummaryTool(
    project_id=PROJECT_ID,
    location_id="global",
    data_store_id=DATA_STORE_ID,
    engine_data_type=0,  # 0 = unstructured data
    summary_result_count=3,
    summary_include_citations=True,
)
result = tool.run("What are the current release codes and policy codes?")

The harness set the tool's private _serving_config to the engine/app serving
config above so the direct API and LangChain calls exercised the same
Enterprise/LLM Search request. The metadata loss itself is independent of that
override: current main still has _run() return only
response.summary.summary_text.

Current main, libs/community/langchain_google_community/vertex_ai_search.py:

def _run(self, user_query: str) -> str:
    request = self._create_search_request(user_query)
    response = self._client.search(request)
    return response.summary.summary_text

LangChain actual output

  • installed return type: str
  • installed returned value: 'The current red tenant release code is FALCON-17 [1]. The red policy code is R-ALLOW-101 [1].'
  • main return type: str
  • main returned value: 'The current red tenant release code is FALCON-17 [1]. The red policy code is R-ALLOW-101 [1].'

The wrapped LangChain _client.search call received a raw SearchResponse with
summary_with_metadata, citation_metadata.citations, and references, but
VertexAISearchSummaryTool._run() returned only response.summary.summary_text.

Expected behavior

The tool should expose summary_with_metadata, citation metadata, and references,
or provide a structured artifact/return path when summary_include_citations=True.
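One possible shape for that structured return path is sketched below. It is not a proposed patch: `summary_with_artifact` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for a real SearchResponse so the example runs without the client library. The attribute names are assumed to mirror the Discovery Engine response fields shown earlier. A (content, artifact) tuple like this would fit LangChain's `response_format="content_and_artifact"` tool option, though other designs would work too.

```python
from types import SimpleNamespace


def summary_with_artifact(response) -> tuple[str, dict]:
    """Return summary_text plus a plain-dict copy of the citation metadata."""
    summary = response.summary
    meta = getattr(summary, "summary_with_metadata", None)
    artifact = {"summary": "", "citations": [], "references": []}
    if meta is not None:
        artifact["summary"] = meta.summary
        artifact["citations"] = [
            {
                "start_index": c.start_index,
                "end_index": c.end_index,
                "reference_indexes": [s.reference_index for s in c.sources],
            }
            for c in meta.citation_metadata.citations
        ]
        artifact["references"] = [
            {"title": r.title, "document": r.document, "uri": r.uri}
            for r in meta.references
        ]
    return summary.summary_text, artifact


# Stand-in response for illustration only; values are abbreviated from the excerpt.
fake_response = SimpleNamespace(
    summary=SimpleNamespace(
        summary_text="The current red tenant release code is FALCON-17 [1].",
        summary_with_metadata=SimpleNamespace(
            summary="The current red tenant release code is FALCON-17.",
            citation_metadata=SimpleNamespace(
                citations=[
                    SimpleNamespace(
                        start_index=0,
                        end_index=49,
                        sources=[SimpleNamespace(reference_index=0)],
                    )
                ]
            ),
            references=[SimpleNamespace(title="", document="doc_red_v1", uri="")],
        ),
    )
)

content, artifact = summary_with_artifact(fake_response)
print(content)
print(artifact)
```

Keeping the string as the first element leaves existing callers that expect summary_text unaffected, while the artifact carries the citations and references.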

Actual behavior

The tool discards the metadata and returns only summary_text.

Negative controls

  • Direct google-cloud-discoveryengine Search API was correct for the same request.
  • No ADK, ChatVertexAI, Gemini, Vertex model endpoint, Cloud Run, GCS, BigQuery,
    Document AI, website crawl, PDF import, or LangSmith telemetry was used.
  • Duplicate searches for VertexAISearchSummaryTool + citations/references and
    SearchResponse SummaryWithMetadata did not find an existing matching issue.
