Description
- Package Name: azure-search-documents
- Package Version: 11.6.0b10
- Operating System: Windows (WSL2)
- Python Version: 3.12.9
Describe the bug
I noticed that in the source code of SearchClient.upload_documents(), more specifically in the _index_documents_actions() function, the batch is meant to be split and retried if it is too large and a 413 RequestEntityTooLargeError is raised. However, the batch splitting doesn't work. Instead, the following error is observed:
Traceback (most recent call last):
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/search/documents/_search_client.py", line 703, in _index_documents_actions
    batch_response = self._client.documents.index(batch=batch, error_map=error_map, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/core/tracing/decorator.py", line 105, in wrapper_use_tracer
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/search/documents/_generated/operations/_documents_operations.py", line 1232, in index
    map_error(status_code=response.status_code, response=response, error_map=error_map)
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/core/exceptions.py", line 163, in map_error
    raise error
azure.search.documents._search_documents_error.RequestEntityTooLargeError: Operation returned an invalid status 'Request Entity Too Large'
Content: The page was not displayed because the request entity is too large.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/Code/playground/main.py", line 37, in <module>
    client.get_search_client("my-index").upload_documents(documents=documents)
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/search/documents/_search_client.py", line 596, in upload_documents
    results = self.index_documents(batch, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/core/tracing/decorator.py", line 105, in wrapper_use_tracer
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/search/documents/_search_client.py", line 695, in index_documents
    return self._index_documents_actions(actions=batch.actions, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/search/documents/_search_client.py", line 709, in _index_documents_actions
    batch_response_first_half = self._index_documents_actions(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Code/playground/.venv/lib/python3.12/site-packages/azure/search/documents/_search_client.py", line 703, in _index_documents_actions
    batch_response = self._client.documents.index(batch=batch, error_map=error_map, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: azure.search.documents._generated.operations._documents_operations.DocumentsOperations.index() got multiple values for keyword argument 'error_map'
To Reproduce
Steps to reproduce the behavior:
- Create an Azure AI Search resource
- Set the following env vars: AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_API_KEY
- Run the following code:
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchFieldDataType,
    SearchIndex,
    SimpleField,
)

client = SearchIndexClient(
    os.getenv("AZURE_SEARCH_ENDPOINT"),
    AzureKeyCredential(os.getenv("AZURE_SEARCH_API_KEY")),
)

index_name = "my-index"
client.create_or_update_index(
    SearchIndex(
        name=index_name,
        fields=[
            SimpleField(name="id", type=SearchFieldDataType.String, key=True),
            SimpleField(name="content", type=SearchFieldDataType.String),
        ],
    )
)

# 10,000 documents of ~100 KB each, sent as one batch, which triggers the 413.
documents = [{"id": str(i), "content": " " * 100000} for i in range(10000)]
client.get_search_client(index_name).upload_documents(documents=documents)
Expected behavior
After the first batch fails, it should split into two smaller batches and retry both.
Additional context
From what I've seen, removing error_map=error_map here and here seems to fix it.
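A minimal sketch of the expected split-and-retry behavior once error_map is no longer forwarded by the recursive calls. This is an illustration under stated assumptions, not the SDK's implementation: fake_index(), index_actions() and TooLarge are hypothetical stand-ins, and the limit of 2 actions per batch is arbitrary:

```python
class TooLarge(Exception):
    """Hypothetical stand-in for RequestEntityTooLargeError."""


calls = []  # size of every batch sent to the fake service, in order


def fake_index(batch, error_map=None, **kwargs):
    # Hypothetical stand-in for the service call: reject batches over 2 actions.
    calls.append(len(batch))
    if len(batch) > 2:
        raise TooLarge()
    return [f"ok:{action}" for action in batch]


def index_actions(actions, **kwargs):
    # error_map is set only where the service call is made.
    error_map = {413: TooLarge}
    try:
        return fake_index(batch=actions, error_map=error_map, **kwargs)
    except TooLarge:
        if len(actions) == 1:
            # A single action that is still too large cannot be split further.
            raise
        pos = len(actions) // 2
        # The recursive calls pass kwargs through unchanged, without error_map.
        first = index_actions(actions=actions[:pos], **kwargs)
        second = index_actions(actions=actions[pos:], **kwargs)
        return first + second


results = index_actions(actions=[1, 2, 3, 4, 5])
print(results)
print(calls)
```

With this shape, each oversized batch is halved until every piece is accepted, and the results come back in the original order.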