@OVI3D0 OVI3D0 commented Nov 11, 2025

Description

Adds simple/detailed stats methods to the BulkVectorDataSet runner and also changes the retry logic.

Instead of retrying the entire bulk request when a failure occurs, this PR identifies the documents that failed and retries only those.

2025-11-13 01:32:47,15 ActorAddr-(T|:41429)/PID:616400 osbenchmark.worker_coordinator.runner INFO Retrying 54 structured docs, indices=[0, 1, 2, 5, 8, 9, 11, 12, 13, 15]
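For reviewers who want the idea in miniature, here is a minimal sketch of per-document retries, not the code in this PR. It assumes the standard `_bulk` response shape (a top-level `errors` flag and an `items` list with one single-key entry per action, each carrying a `status`) and a request body made of alternating action and source lines. The names `failed_positions`, `bulk_with_doc_retries`, `send_bulk` and `max_retries` are hypothetical, and treating every status of 300 or above as retryable is a simplification.

```python
from typing import Any, Callable, Dict, List


def failed_positions(response: Dict[str, Any]) -> List[int]:
    """Return the positions of items in a _bulk response that did not succeed."""
    if not response.get("errors"):
        return []
    failed = []
    for pos, item in enumerate(response.get("items", [])):
        # Each item has exactly one key naming the action ("index", "create", ...).
        result = next(iter(item.values()))
        # Assumption: any status >= 300 (or a missing status) counts as a failure.
        if result.get("status", 500) >= 300:
            failed.append(pos)
    return failed


def bulk_with_doc_retries(
    send_bulk: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
    body: List[Dict[str, Any]],
    max_retries: int = 3,
) -> int:
    """Send a bulk body and resend only the failed docs.

    `body` alternates action line, source line (two entries per document).
    Returns the number of documents still failing after the last attempt.
    """
    pending = body
    for _ in range(max_retries + 1):
        response = send_bulk(pending)
        failed = failed_positions(response)
        if not failed:
            return 0
        # Rebuild the body from the action/source pairs of the failed docs only.
        pending = [line for pos in failed for line in pending[2 * pos: 2 * pos + 2]]
    return len(pending) // 2
```

In this sketch `send_bulk` stands in for whatever client call issues the bulk request. A real runner would likely also back off between attempts and retry only on throttling or server-side statuses.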

Document count after ingesting a 1M-document dataset with these changes:

[ec2-user@ip-172-31-0-197 ~]$ ~/.local/bin/awscurl --service aoss --region "$REGION"   --access_key "$AWS_ACCESS_KEY_ID"   --secret_key "$AWS_SECRET_ACCESS_KEY"   --security_token "$AWS_SESSION_TOKEN"   -X GET "$AOSS_ENDPOINT/${INDEX}/_count?pretty"
{
  "count" : 999990,
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "skipped" : 0,
    "failed" : 0
  }
}

Document count for the same ingestion without the changes:

[ec2-user@ip-172-31-0-197 ~]$ ~/.local/bin/awscurl --service aoss --region "$REGION"   --access_key "$AWS_ACCESS_KEY_ID"   --secret_key "$AWS_SECRET_ACCESS_KEY"   --security_token "$AWS_SESSION_TOKEN"   -X GET "$AOSS_ENDPOINT/${INDEX}/_count?pretty"
{
  "count" : 525869,
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "skipped" : 0,
    "failed" : 0
  }
}

Issues Resolved

#979

Testing

  • New functionality includes testing

Ran an ingestion of 1M documents against an AOSS vectorsearch collection.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@OVI3D0 OVI3D0 marked this pull request as draft November 11, 2025 21:24
Signed-off-by: Michael Oviedo <[email protected]>
@OVI3D0 OVI3D0 changed the title from "add simple/detailed stats to BulkVectorDataSet runner" to "Add individual doc retries to BulkVectorDataSet runner" Nov 13, 2025
@IanHoang IanHoang marked this pull request as ready for review November 20, 2025 22:35