
[Cosmos] [Embedding V0] VectorEmbeddingPolicy: add EmbeddingSource TypedDict and typed policy models #46764

@ananth7592

[Cosmos] [Embedding V0] VectorEmbeddingPolicy: document and add typed support for embeddingSource

Parent: #46729

Background

The vectorEmbeddingPolicy on a container now supports an optional embeddingSource block inside each entry of vectorEmbeddings. This block carries the endpoint, deployment name, auth type, and source paths that the new azure-cosmos-embedding package reads to construct an AzureOpenAIEmbeddingGenerator.

Example container policy JSON:

{
  "vectorEmbeddingPolicy": {
    "vectorEmbeddings": [
      {
        "path": "/embedding",
        "dataType": "float32",
        "dimensions": 1536,
        "distanceFunction": "cosine",
        "embeddingSource": {
          "sourcePaths": ["/title", "/abstract"],
          "deploymentName": "text-embedding-3-small",
          "modelName": "text-embedding-3-small",
          "endpoint": "https://embedding-south-central.cognitiveservices.azure.com/",
          "authType": "ApiKey"
        }
      }
    ]
  }
}
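Because embeddingSource is optional, any code that reads the policy has to tolerate entries that lack it. The following is a minimal sketch, assuming the container properties have already been loaded into a plain dict. `embedding_sources_by_path` is a hypothetical helper written for illustration; it is not part of azure-cosmos or azure-cosmos-embedding, and the `/legacy` entry is made up.

```python
from typing import Any, Dict


def embedding_sources_by_path(container_properties: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each vector path to its embeddingSource block.

    Hypothetical helper: entries without embeddingSource are skipped.
    """
    policy = container_properties.get("vectorEmbeddingPolicy") or {}
    sources: Dict[str, Dict[str, Any]] = {}
    for entry in policy.get("vectorEmbeddings", []):
        source = entry.get("embeddingSource")
        if source is not None:
            sources[entry["path"]] = source
    return sources


properties = {
    "vectorEmbeddingPolicy": {
        "vectorEmbeddings": [
            {
                "path": "/embedding",
                "dataType": "float32",
                "dimensions": 1536,
                "distanceFunction": "cosine",
                "embeddingSource": {
                    "sourcePaths": ["/title", "/abstract"],
                    "deploymentName": "text-embedding-3-small",
                    "authType": "ApiKey",
                },
            },
            # Made-up entry that has no embeddingSource block.
            {
                "path": "/legacy",
                "dataType": "float32",
                "dimensions": 8,
                "distanceFunction": "cosine",
            },
        ]
    }
}

sources = embedding_sources_by_path(properties)
print(sorted(sources))  # ['/embedding']
```

Only the entry that carries an embeddingSource block appears in the result; the other entry is left alone rather than treated as an error.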

Scope

In the Python SDK, the vector_embedding_policy parameter is currently typed as dict[str, Any] and passed through unchanged. This issue adds typed support for the new embeddingSource sub-object without breaking the existing raw-dict path.

1. Add TypedDict models

In azure/cosmos/_models.py (or documents.py — confirm preferred location with SDK conventions):

from typing import List, Literal
from typing_extensions import TypedDict

class EmbeddingSource(TypedDict, total=False):
    sourcePaths: List[str]
    deploymentName: str
    modelName: str
    endpoint: str
    authType: Literal["ApiKey", "Entra"]

class VectorEmbedding(TypedDict, total=False):
    path: str
    dataType: Literal["float32", "float16", "uint8", "int8"]
    dimensions: int
    distanceFunction: Literal["cosine", "dotproduct", "euclidean"]
    embeddingSource: EmbeddingSource   # NEW

class VectorEmbeddingPolicy(TypedDict, total=False):
    vectorEmbeddings: List[VectorEmbedding]

2. Update database.py (sync + async)

Update all vector_embedding_policy keyword parameter type annotations from dict[str, Any] to VectorEmbeddingPolicy — no behavioral change, just stronger typing.
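The annotation change can be sketched as follows. A TypedDict is an ordinary dict at runtime, so tightening the annotation only changes what type checkers see. `build_container_body` is a hypothetical stand-in for the SDK's request construction, not the actual database.py code, and the sketch imports TypedDict from the standard library (Python 3.8+) instead of typing_extensions to stay self-contained.

```python
from typing import Any, Dict, List, Literal, Optional, TypedDict


class EmbeddingSource(TypedDict, total=False):
    sourcePaths: List[str]
    deploymentName: str
    modelName: str
    endpoint: str
    authType: Literal["ApiKey", "Entra"]


class VectorEmbedding(TypedDict, total=False):
    path: str
    dataType: Literal["float32", "float16", "uint8", "int8"]
    dimensions: int
    distanceFunction: Literal["cosine", "dotproduct", "euclidean"]
    embeddingSource: EmbeddingSource


class VectorEmbeddingPolicy(TypedDict, total=False):
    vectorEmbeddings: List[VectorEmbedding]


def build_container_body(
    container_id: str,
    *,
    # Previously annotated as Optional[Dict[str, Any]].
    vector_embedding_policy: Optional[VectorEmbeddingPolicy] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in: attach the policy to the request body unchanged."""
    body: Dict[str, Any] = {"id": container_id}
    if vector_embedding_policy is not None:
        body["vectorEmbeddingPolicy"] = vector_embedding_policy
    return body


typed_policy: VectorEmbeddingPolicy = {
    "vectorEmbeddings": [
        {"path": "/embedding", "dataType": "float32",
         "dimensions": 1536, "distanceFunction": "cosine"}
    ]
}

# The same policy as an untyped dict, as existing callers might build it.
raw_policy: Dict[str, Any] = {
    "vectorEmbeddings": [
        {"path": "/embedding", "dataType": "float32",
         "dimensions": 1536, "distanceFunction": "cosine"}
    ]
}

typed_body = build_container_body("papers", vector_embedding_policy=typed_policy)
# Identical at runtime; a static checker may flag passing Dict[str, Any] here.
raw_body = build_container_body("papers", vector_embedding_policy=raw_policy)
```

Both calls produce the same body at runtime. One thing worth checking during implementation: static checkers generally do not treat a variable annotated as dict[str, Any] as compatible with a TypedDict parameter, so existing callers who pass such a variable may see new type-check errors even though behavior is unchanged.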

3. Update ContainerProperties docstring (if applicable)

Update the docstring for vector_embedding_policy in ContainerProperties / database.py to document the new embeddingSource schema.

Acceptance criteria

  • TypedDict models are exported from azure.cosmos (or a supported sub-module).
  • mypy passes on a usage like:
    source: EmbeddingSource = {"endpoint": "...", "deploymentName": "...", "authType": "ApiKey"}
  • Existing containers without embeddingSource continue to work unchanged (round-trip through dict[str, Any] still valid).
  • Unit test: create a VectorEmbedding TypedDict with and without embeddingSource, and verify that each round-trips through json.dumps and json.loads unchanged.
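The unit-test criterion could look roughly like the sketch below. It is written as pytest-style functions that rely only on plain assert, and it redefines trimmed copies of the models with the standard-library TypedDict (Python 3.8+) so that it runs on its own. In the SDK the test would presumably import the real models instead. `round_trip` and the test names are illustrative.

```python
import json
from typing import Any, Dict, List, Literal, TypedDict


class EmbeddingSource(TypedDict, total=False):
    sourcePaths: List[str]
    deploymentName: str
    modelName: str
    endpoint: str
    authType: Literal["ApiKey", "Entra"]


class VectorEmbedding(TypedDict, total=False):
    path: str
    dataType: Literal["float32", "float16", "uint8", "int8"]
    dimensions: int
    distanceFunction: Literal["cosine", "dotproduct", "euclidean"]
    embeddingSource: EmbeddingSource


def round_trip(embedding: VectorEmbedding) -> Dict[str, Any]:
    """Serialize to JSON text and parse it back into a plain dict."""
    return json.loads(json.dumps(embedding))


def test_round_trip_with_embedding_source() -> None:
    embedding: VectorEmbedding = {
        "path": "/embedding",
        "dataType": "float32",
        "dimensions": 1536,
        "distanceFunction": "cosine",
        "embeddingSource": {
            "sourcePaths": ["/title", "/abstract"],
            "deploymentName": "text-embedding-3-small",
            "modelName": "text-embedding-3-small",
            "endpoint": "https://embedding-south-central.cognitiveservices.azure.com/",
            "authType": "ApiKey",
        },
    }
    restored = round_trip(embedding)
    assert restored == embedding
    assert restored["embeddingSource"]["sourcePaths"] == ["/title", "/abstract"]


def test_round_trip_without_embedding_source() -> None:
    embedding: VectorEmbedding = {
        "path": "/embedding",
        "dataType": "float32",
        "dimensions": 1536,
        "distanceFunction": "cosine",
    }
    restored = round_trip(embedding)
    assert restored == embedding
    assert "embeddingSource" not in restored
```

The second test covers the backward-compatibility criterion: an entry with no embeddingSource serializes without the key instead of emitting a null placeholder.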

Files likely touched

  • sdk/cosmos/azure-cosmos/azure/cosmos/_models.py (or documents.py) — new TypedDict classes
  • sdk/cosmos/azure-cosmos/azure/cosmos/__init__.py — export new types
  • sdk/cosmos/azure-cosmos/azure/cosmos/database.py — updated type annotations (sync)
  • sdk/cosmos/azure-cosmos/azure/cosmos/aio/_database.py — updated type annotations (async)

Dependencies

None — pure model/typing change, no behavioral change.
