[Cosmos] [Embedding V0] Wire resolver into _run_hybrid_search, plumb generator, and add diagnostics span #46733

@ananth7592

Wire _resolve_embeddings into _run_hybrid_search (sync + async), plumb generator through dispatcher, and add diagnostics span

Parent: 46729
Depends on: 46732

Goal

Make the resolver actually run on real queries, fail fast when embeddings are needed but no generator is configured, and surface embedding-generation latency as a first-class OpenTelemetry span. (Diagnostics scope merged from #46734.)

Scope

Pipeline plumbing

  1. azure/cosmos/_execution_context/execution_dispatcher.py :: _ProxyQueryExecutionContext:

    • Accept embedding_generator in its options dict.
    • Pass it through to the hybrid-search aggregator.
  2. hybrid_search_aggregator.py (sync) and aio/hybrid_search_aggregator.py (async):

    • In _run_hybrid_search / async sibling, after the plan is obtained and before per-partition fan-out, call _resolve_embeddings (from #46732, "[Cosmos] [Embedding V0] _resolve_embeddings helper on hybrid-search aggregator (sync + async)") and replace the working SqlQuerySpec with the augmented one.
    • Fast-fail: if the plan's embeddingParameterMap is non-empty but embedding_generator is None, raise a ValueError (or CosmosHttpResponseError 400) with:

      "Query requires embedding generation but no embedding_generator was provided to query_items."

  3. Thread the generator from container.py / aio/_container.py → execution_dispatcher → aggregator (already started in #46730, "[Cosmos] [Embedding V0] Public surface: EmbeddingGenerator protocols + embedding_generator on query_items"; confirm the option key name is embedding_generator).

  4. The resolver is called exactly once per top-level query attempt, not on every continuation page. Assert this with a counting mock in unit tests.
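The fast-fail rule in item 2 and the once-per-attempt rule in item 4 could be sketched together as below. This is only an illustration under stated assumptions, not the SDK's code: `prepare_query`, the plain-dict plan, the dict-shaped `embeddingParameterMap`, and the `resolve` callable are hypothetical stand-ins for the step inside _run_hybrid_search, the real query plan object, and _resolve_embeddings from #46732, whose actual signatures are not shown in this issue.

```python
from typing import Any, Callable, Dict, Optional

MISSING_GENERATOR_MESSAGE = (
    "Query requires embedding generation but no embedding_generator "
    "was provided to query_items."
)


def prepare_query(
    plan: Dict[str, Any],
    query_spec: Dict[str, Any],
    embedding_generator: Optional[Any],
    resolve: Callable[[Dict[str, Any], Dict[str, Any], Any], Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the query spec to fan out, resolving embeddings at most once.

    Hypothetical stand-in for the step between obtaining the plan and the
    per-partition fan-out; `resolve` plays the role of _resolve_embeddings.
    """
    # Assumption: the plan exposes the map under this key as a dict.
    parameter_map = plan.get("embeddingParameterMap") or {}
    if not parameter_map:
        # Nothing to embed: leave the query spec untouched.
        return query_spec
    if embedding_generator is None:
        # Fail before any per-partition request is issued.
        raise ValueError(MISSING_GENERATOR_MESSAGE)
    # Called once here; continuation pages would reuse the returned spec.
    return resolve(plan, query_spec, embedding_generator)
```

Keeping the check ahead of the fan-out means a misconfigured caller sees one clear error instead of a partial set of partition responses.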

Diagnostics (merged from #46734)

Inside _resolve_embeddings (and async sibling), wrap the generator call in an OpenTelemetry span:

```python
import logging
import time

from azure.core.settings import settings

_logger = logging.getLogger("azure.cosmos")

# tracing_implementation() returns the configured span class, or None.
span_impl_type = settings.tracing_implementation()
generator_type = type(generator).__name__
if span_impl_type is not None:
    with span_impl_type(name="cosmos.embedding_generation") as span:
        span.add_attribute("cosmos.embedding.count", len(texts))
        span.add_attribute("cosmos.embedding.generator_type", generator_type)
        start = time.perf_counter()
        vectors = generator.generate_embeddings(list(texts))
        span.add_attribute(
            "cosmos.embedding.latency_ms",
            int((time.perf_counter() - start) * 1000),
        )
else:
    # No tracer configured: log the same fields at INFO level instead.
    start = time.perf_counter()
    vectors = generator.generate_embeddings(list(texts))
    _logger.info(
        "embedding_generation count=%d generator_type=%s latency_ms=%d",
        len(texts),
        generator_type,
        int((time.perf_counter() - start) * 1000),
    )
```

Mirror in async aggregator (await the call inside the span).
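A possible shape for the async mirror is sketched below. The function name `generate_with_diagnostics` is hypothetical, and the span class is passed in as a parameter (instead of being read from azure.core settings) only so the sketch runs without azure-core installed. It assumes the span type can be constructed with `name=` and used as a context manager, and that the async generator exposes an awaitable `generate_embeddings`.

```python
import logging
import time
from typing import Any, List, Optional, Sequence

_logger = logging.getLogger("azure.cosmos")


async def generate_with_diagnostics(
    generator: Any,
    texts: Sequence[str],
    span_impl_type: Optional[Any] = None,
) -> List[List[float]]:
    """Await the generator inside a span if tracing is on, else log timing."""
    generator_type = type(generator).__name__
    if span_impl_type is not None:
        with span_impl_type(name="cosmos.embedding_generation") as span:
            span.add_attribute("cosmos.embedding.count", len(texts))
            span.add_attribute("cosmos.embedding.generator_type", generator_type)
            start = time.perf_counter()
            # The await sits inside the span so the span covers the whole call.
            vectors = await generator.generate_embeddings(list(texts))
            span.add_attribute(
                "cosmos.embedding.latency_ms",
                int((time.perf_counter() - start) * 1000),
            )
        return vectors
    # No tracer: same fields, INFO log only. Texts and vectors are never logged.
    start = time.perf_counter()
    vectors = await generator.generate_embeddings(list(texts))
    _logger.info(
        "embedding_generation count=%d generator_type=%s latency_ms=%d",
        len(texts),
        generator_type,
        int((time.perf_counter() - start) * 1000),
    )
    return vectors
```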

Privacy:

  • Never log raw input strings.
  • Never log returned vectors.
  • Parameter key names (e.g. @documentdb-hybridsearchquery-embedding-0) may be logged — they are not customer data.

If a tracer is configured, record a span event when the generator raises (so failures appear in traces, not just logs).
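One way the failure event might look is sketched below. It rests on assumptions that should be verified against azure-core: that the span wrapper exposes the underlying OpenTelemetry span as `span_instance`, and that this object supports `add_event(name, attributes)`. The helper name and the event name `cosmos.embedding_generation.failed` are hypothetical. Only the exception type and the input count are recorded, because an exception message could echo raw input text.

```python
from typing import Any, Callable, List, Sequence


def call_generator_recording_failure(
    span: Any,
    generate: Callable[[List[str]], List[List[float]]],
    texts: Sequence[str],
) -> List[List[float]]:
    """Call the generator; on failure, add a span event and re-raise."""
    try:
        return generate(list(texts))
    except Exception as exc:
        # Assumption: the wrapper exposes the raw OTel span as span_instance.
        inner = getattr(span, "span_instance", None)
        if inner is not None and hasattr(inner, "add_event"):
            inner.add_event(
                "cosmos.embedding_generation.failed",  # hypothetical event name
                {
                    # Type name only: the message may contain customer text.
                    "exception.type": type(exc).__name__,
                    "cosmos.embedding.count": len(texts),
                },
            )
        raise
```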

Acceptance criteria

  • End-to-end happy path works with a mocked plan + mocked generator (sync + async).
  • Fast-fail exception raised with the right message when generator is missing.
  • Generator called exactly once per top-level query attempt (assert with a counting mock).
  • With an OTel tracer, a cosmos.embedding_generation span is produced with count, latency_ms, and generator_type attributes.
  • Without a tracer, an INFO-level log line is produced.
  • Raw text / vectors do not appear in span attributes or log output (assert in tests).
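The no-tracer and privacy criteria could be checked with a test along these lines. It is a sketch using only the standard library: `CountingGenerator` and `generate_and_log` are hypothetical stand-ins for a mocked generator and for the logging branch of _resolve_embeddings, not names from the SDK.

```python
import logging
import time
import unittest

_logger = logging.getLogger("azure.cosmos")


class CountingGenerator:
    """Mock generator that counts calls and returns fixed vectors."""

    def __init__(self):
        self.calls = 0

    def generate_embeddings(self, texts):
        self.calls += 1
        return [[0.25, 0.5] for _ in texts]


def generate_and_log(generator, texts):
    # Stand-in for the no-tracer branch: logs count, type, and latency only.
    start = time.perf_counter()
    vectors = generator.generate_embeddings(list(texts))
    _logger.info(
        "embedding_generation count=%d generator_type=%s latency_ms=%d",
        len(texts),
        type(generator).__name__,
        int((time.perf_counter() - start) * 1000),
    )
    return vectors


class EmbeddingLogPrivacyTest(unittest.TestCase):
    def test_log_has_counts_but_no_payload(self):
        generator = CountingGenerator()
        secret = "customer secret sentence"
        with self.assertLogs("azure.cosmos", level="INFO") as captured:
            generate_and_log(generator, [secret])
        output = "\n".join(captured.output)
        self.assertEqual(generator.calls, 1)
        self.assertIn("count=1", output)
        self.assertIn("generator_type=CountingGenerator", output)
        # Neither the input text nor any vector component may be logged.
        self.assertNotIn(secret, output)
        self.assertNotIn("0.25", output)
```

The same assertions can be repeated against captured span attributes when a tracer is configured.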

Files likely touched

  • sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/execution_dispatcher.py
  • sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/aio/execution_dispatcher.py
  • sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/hybrid_search_aggregator.py
  • sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/aio/hybrid_search_aggregator.py
  • sdk/cosmos/azure-cosmos/azure/cosmos/container.py (option threading)
  • sdk/cosmos/azure-cosmos/azure/cosmos/aio/_container.py (option threading)

Dependencies

Metadata

Labels: Cosmos, feature-request
