[PLU-95]: add opensearch serverless capability #648
unstructured_ingest/processes/connectors/elasticsearch/opensearch.py
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
OpenSearch Service. Falls back to scroll if PIT creation fails due to
missing permissions (403) or unsupported version (400/404).
"""
from opensearchpy import AsyncOpenSearch
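The fallback rule in the docstring above can be sketched as follows. This is a minimal illustration under stated assumptions, not the connector's code: open_pit, open_scroll, SearchError and Cursor are hypothetical names, and the error type is assumed to carry an HTTP status_code in the way opensearch-py's TransportError does.

```python
from dataclasses import dataclass
from typing import Awaitable, Callable

# Statuses that, per the docstring, trigger the scroll fallback:
# 403 (missing PIT permissions) and 400/404 (PIT unsupported by the cluster version).
PIT_FALLBACK_STATUSES = frozenset({400, 403, 404})


class SearchError(Exception):
    """Hypothetical stand-in for a client error that carries an HTTP status code."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(status_code, message)
        self.status_code = status_code


@dataclass
class Cursor:
    mode: str   # "pit" or "scroll"
    token: str  # PIT id or scroll id


async def open_cursor(
    open_pit: Callable[[], Awaitable[str]],
    open_scroll: Callable[[], Awaitable[str]],
) -> Cursor:
    """Prefer a point-in-time cursor; fall back to scroll only for known PIT failures."""
    try:
        return Cursor(mode="pit", token=await open_pit())
    except SearchError as err:
        if err.status_code not in PIT_FALLBACK_STATUSES:
            raise  # other failures (e.g. 5xx) propagate instead of being masked
        return Cursor(mode="scroll", token=await open_scroll())
```

Only the three listed statuses trigger the fallback; anything else is re-raised, so a genuine outage does not silently degrade into a slower scroll.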
I know it's not part of the PR, but I didn't have AsyncOpenSearch after installing the opensearch deps. Looks like we may need to install opensearch-py[async].
Good spot! Potentially comes from the requirements.txt -> pyproject.toml change.
I will look at that in a new PR.
Actually, let me add it.
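A small guard can turn the missing AsyncOpenSearch import into an actionable message. This sketch assumes that opensearch-py's async extra is what installs aiohttp, the package its async client depends on; missing_extra_hint is a hypothetical helper, not part of the connector.

```python
import importlib.util
from typing import Optional


def missing_extra_hint(module_name: str, extra_spec: str) -> Optional[str]:
    """Return an install hint if top-level module `module_name` is not importable, else None."""
    if importlib.util.find_spec(module_name) is not None:
        return None
    # Assumption: installing `extra_spec` (e.g. opensearch-py[async]) provides `module_name`.
    return (
        f"Optional dependency '{module_name}' is not installed; "
        f'install it with: pip install "{extra_spec}"'
    )
```

For example, missing_extra_hint("aiohttp", "opensearch-py[async]") returns None when the extra is installed and an install hint otherwise.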
awalker4 left a comment
LGTM! The pipelines in https://github.com/Unstructured-IO/potter-testing/pull/3 work well.
Gonna put the [async] extra in a separate PR instead, since it triggers a uv.lock update that needs a careful look. Plus, this isn't a deal breaker; the aiobotocore dependency is already covered by other requirements.
Note
Medium Risk
Changes core OpenSearch pagination/query paths (PIT/search_after vs scroll) and AWS endpoint parsing, which can affect large index reads and compatibility across OpenSearch versions; integration tests reduce risk but production clusters may vary in permissions/behavior.
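To make the PIT + search_after path concrete, here is a minimal async sketch of ID discovery plus the bounded IDs request used for download batches. It assumes a hypothetical search coroutine that POSTs a body to /_search and returns the raw response; the keep_alive value, page size and sort clause are illustrative, since a stable search_after order needs a deterministic sort whose exact field the PR does not state.

```python
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, List, Optional

Body = Dict[str, Any]
SearchFn = Callable[[Body], Awaitable[Dict[str, Any]]]  # hypothetical transport


def pit_page_body(
    pit_id: str, sort: List[Any], size: int, search_after: Optional[List[Any]]
) -> Body:
    """Body for one PIT page; the index is implied by the PIT, not the URL."""
    body: Body = {
        "size": size,
        "query": {"match_all": {}},
        "pit": {"id": pit_id, "keep_alive": "1m"},  # keep_alive value is illustrative
        "sort": sort,
        "_source": False,  # ID discovery only needs _id and the sort values
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


async def iter_ids(
    search: SearchFn, pit_id: str, sort: List[Any], size: int = 1000
) -> AsyncIterator[str]:
    """Yield every document ID in the PIT snapshot, one page at a time."""
    search_after: Optional[List[Any]] = None
    while True:
        resp = await search(pit_page_body(pit_id, sort, size, search_after))
        hits = resp["hits"]["hits"]
        for hit in hits:
            yield hit["_id"]
        if len(hits) < size:
            return  # a short (or empty) page means the snapshot is exhausted
        search_after = hits[-1]["sort"]  # resume strictly after the last hit


def ids_batch_body(ids: List[str]) -> Body:
    """Bounded search-by-IDs body for one download batch."""
    # Without an explicit size, the search would return only the default 10 hits.
    return {"query": {"ids": {"values": list(ids)}}, "size": len(ids)}
```

The size pin in ids_batch_body matters: a bare ids query would silently truncate any batch larger than 10 documents.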
Overview
Adds OpenSearch Serverless (AOSS) support in the OpenSearch connector by switching async ID discovery to PIT + search_after pagination with a targeted scroll fallback when PIT creation fails (400/403/404), and by changing the downloader to fetch batches via a bounded search-by-IDs request rather than async_scan/scroll.
Extends AWS hostname auto-detection to recognize FIPS endpoints for both OpenSearch Service (es-fips) and Serverless (aoss-fips), adds live AOSS source/destination integration tests plus PIT-fallback unit coverage, wires OPENSEARCH_AOSS_HOST into the E2E workflow, and bumps the package to 1.4.5 with a changelog entry.
Written by Cursor Bugbot for commit 524573e.
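The FIPS-aware hostname detection can be sketched as below. The host shape (prefix.region.label.amazonaws.com) and the mapping from es-fips/aoss-fips back to the SigV4 signing services es and aoss are assumptions based on the labels named in the summary; detect_aws_opensearch_host and AwsOpenSearchHost are hypothetical names, so check AWS's endpoint documentation before relying on this.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class AwsOpenSearchHost(NamedTuple):
    region: str
    service: str  # SigV4 signing service name: "es" or "aoss"
    fips: bool


# Assumed endpoint label -> (signing service, is FIPS endpoint).
_LABELS = {
    "es": ("es", False),
    "es-fips": ("es", True),
    "aoss": ("aoss", False),
    "aoss-fips": ("aoss", True),
}


def detect_aws_opensearch_host(url_or_host: str) -> Optional[AwsOpenSearchHost]:
    """Classify an AWS OpenSearch endpoint, or return None for non-AWS hosts."""
    # Accept bare hosts (optionally with a port) as well as full URLs.
    url = url_or_host if "://" in url_or_host else f"https://{url_or_host}"
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Assumed shape: [..., region, label, "amazonaws", "com"]
    if len(parts) < 5 or parts[-2:] != ["amazonaws", "com"]:
        return None
    label = parts[-3]
    if label not in _LABELS:
        return None
    service, fips = _LABELS[label]
    return AwsOpenSearchHost(region=parts[-4], service=service, fips=fips)
```

Mapping the FIPS labels back to the plain service names keeps request signing unchanged while still letting the connector treat the host as AWS-managed.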