Batching in upload_documents() does not work #40157

Open
@cecheta

Description

  • Package Name: azure-search-documents
  • Package Version: 11.6.0b10
  • Operating System: Windows (WSL2)
  • Python Version: 3.12.9

Describe the bug
I noticed that in the source code of SearchClient.upload_documents(), more specifically in the _index_documents_actions() function, a batch that is too large and produces a 413 RequestEntityTooLargeError is meant to be split and retried. However, the batch splitting doesn't work: the retry itself fails before any request is sent, and the following error is observed:

Traceback (most recent call last):
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/search/documents/_search_client.py", line 703, in _index_documents_actions
    batch_response = self._client.documents.index(batch=batch, error_map=error_map, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/core/tracing/decorator.py", line 105, in wrapper_use_tracer
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/search/documents/_generated/operations/_documents_operations.py", line 1232, in index
    map_error(status_code=response.status_code, response=response, error_map=error_map)
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/core/exceptions.py", line 163, in map_error
    raise error
azure.search.documents._search_documents_error.RequestEntityTooLargeError: Operation returned an invalid status 'Request Entity Too Large'
Content: The page was not displayed because the request entity is too large.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/Code/playground/main.py", line 37, in <module>
    client.get_search_client("my-index").upload_documents(documents=documents)
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/search/documents/_search_client.py", line 596, in upload_documents
    results = self.index_documents(batch, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/core/tracing/decorator.py", line 105, in wrapper_use_tracer
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/search/documents/_search_client.py", line 695, in index_documents
    return self._index_documents_actions(actions=batch.actions, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/search/documents/_search_client.py", line 709, in _index_documents_actions
    batch_response_first_half = self._index_documents_actions(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/search/documents/_search_client.py", line 703, in _index_documents_actions
    batch_response = self._client.documents.index(batch=batch, error_map=error_map, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: azure.search.documents._generated.operations._documents_operations.DocumentsOperations.index() got multiple values for keyword argument 'error_map'
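
The second traceback is consistent with error_map being supplied twice: once explicitly in the call to index(), and once more inside **kwargs when _index_documents_actions() calls itself. The snippet below is a minimal sketch of that mechanism only. The functions index() and index_actions() are hypothetical stand-ins written for illustration, not the SDK's code.

```python
def index(batch, error_map=None, **kwargs):
    """Hypothetical stand-in for DocumentsOperations.index()."""
    return len(batch)


def index_actions(actions, **kwargs):
    """Hypothetical stand-in for _index_documents_actions()."""
    error_map = {413: "RequestEntityTooLargeError"}
    # If the caller also passed error_map, it is already inside kwargs,
    # so the call below supplies the same keyword twice.
    return index(batch=actions, error_map=error_map, **kwargs)


# Initial call: kwargs carries no error_map, so this succeeds.
first_call = index_actions([1, 2, 3, 4])

# Retry-style call: error_map is passed explicitly and ends up in kwargs.
try:
    index_actions([1, 2], error_map={413: "RequestEntityTooLargeError"})
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

The initial call works because kwargs does not yet contain error_map. Only the retry path hits the TypeError, which would explain why the problem shows up only once a batch is too large.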

To Reproduce
Steps to reproduce the behavior:

  1. Create an Azure AI Search resource
  2. Set the following environment variables: AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_API_KEY
  3. Run the following code:
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchFieldDataType,
    SearchIndex,
    SimpleField,
)

client = SearchIndexClient(
    os.getenv("AZURE_SEARCH_ENDPOINT"),
    AzureKeyCredential(os.getenv("AZURE_SEARCH_API_KEY")),
)

index_name = "my-index"

client.create_or_update_index(
    SearchIndex(
        name=index_name,
        fields=[
            SimpleField(name="id", type=SearchFieldDataType.String, key=True),
            SimpleField(name="content", type=SearchFieldDataType.String),
        ],
    )
)

documents = [{"id": str(i), "content": " " * 100000} for i in range(10000)]

client.get_search_client(index_name).upload_documents(documents=documents)

Expected behavior
After the first batch fails with a 413, it should be split into two smaller batches, and both should be retried.

Additional context
From what I've seen, removing error_map=error_map here and here seems to fix it.
