
[FEATURE] Yield bulk api error doc sources when raise_on_error=False #954

@mergu

Description


Is your feature request related to a problem?

When raise_on_error=False, the bulk (and streaming_bulk) helpers only yield the response item for each indexing operation, not the document that was sent. This means that when a document fails to index, we don't have enough information to find out which document failed.

An example of what is currently yielded:

    {
        'index': {
            '_index': 'logs-2025.09.10-000001',
            '_id': 'JQn6MpkBL_dyks7LFLJw',
            'status': 400,
            'error': {
                'type': 'mapper_parsing_exception',
                'reason': "failed to parse field [resource] of type [keyword] in document with id 'JQn6MpkBL_dyks7LFLJw'. Preview of field's value: '{resourceId=..., resourceType=AWS::EC2::Instance}'",
                'caused_by': {
                    'type': 'illegal_state_exception',
                    'reason': "Can't get text on a START_OBJECT at 1:387"
                }
            }
        }
    }

The only identifying information here is the document ID that OpenSearch generated.
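To make the gap concrete, here is a minimal sketch that pulls the identifying fields out of one yielded error item. It assumes the item shape shown above (a single op-type key such as `index` mapping to `_index`, `_id`, `status`, and `error`); `summarize_failure` is a hypothetical helper, not part of opensearch-py.

```python
def summarize_failure(item):
    """Flatten one bulk response item (e.g. {'index': {...}}) into the fields it carries.

    Hypothetical helper; assumes the item has exactly one op-type key
    ('index', 'create', 'update' or 'delete'), as in the example above.
    """
    [(op_type, info)] = item.items()
    error = info.get("error") or {}
    return {
        "op_type": op_type,
        "index": info.get("_index"),
        "id": info.get("_id"),
        "status": info.get("status"),
        "error_type": error.get("type"),
        "reason": error.get("reason"),
    }


example = {
    "index": {
        "_index": "logs-2025.09.10-000001",
        "_id": "JQn6MpkBL_dyks7LFLJw",
        "status": 400,
        "error": {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse field [resource] of type [keyword]",
        },
    }
}

summary = summarize_failure(example)
print(summary["id"], summary["error_type"])
# JQn6MpkBL_dyks7LFLJw mapper_parsing_exception
# Nothing in the summary points back to the original document source.
```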

We want to track the documents that fail so we can retry certain errors or patch the data/mappings, while continuing to use these bulk helpers.
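A minimal sketch of the kind of triage we have in mind: split failed items into "retry as-is" and "needs a data or mapping fix". The set of retryable status codes is an assumption for illustration, and `triage` is a hypothetical helper. Note that neither output list carries the document source, which is exactly the missing piece.

```python
# Assumed policy: these statuses are treated as transient and worth retrying.
RETRYABLE_STATUSES = {429, 503}


def triage(failed_items):
    """Split failed bulk response items into (retry, needs_fix) lists.

    Hypothetical helper; each item is assumed to look like {'index': {...}}.
    """
    retry, needs_fix = [], []
    for item in failed_items:
        [(_op_type, info)] = item.items()
        if info.get("status") in RETRYABLE_STATUSES:
            retry.append(item)
        else:
            needs_fix.append(item)
    return retry, needs_fix


failed = [
    {"index": {"_id": "a1", "status": 429}},
    {"index": {"_id": "b2", "status": 400,
               "error": {"type": "mapper_parsing_exception"}}},
]
retry, needs_fix = triage(failed)
print([i["index"]["_id"] for i in retry], [i["index"]["_id"] for i in needs_fix])
# ['a1'] ['b2']
```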

What solution would you like?

We would like an option in streaming_bulk (for example, yield_data) that also yields the original document data alongside each response.
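Until such an option exists, a rough user-side approximation is to pair each result with the action that produced it. This is a sketch under a strong assumption: that the helper yields exactly one `(ok, item)` tuple per action, in input order. `pair_with_actions` is hypothetical, and the results below are simulated rather than produced by a real client.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking that one side ran out early


def pair_with_actions(actions, results):
    """Yield (ok, item, action), pairing each bulk result with its action.

    Hypothetical helper, not part of opensearch-py. Assumes `results`
    produces exactly one (ok, item) tuple per action, in the same order.
    """
    for action, result in zip_longest(actions, results, fillvalue=_MISSING):
        if action is _MISSING or result is _MISSING:
            raise ValueError("bulk results and actions have different lengths")
        ok, item = result
        yield ok, item, action


# Simulated results standing in for streaming_bulk(..., raise_on_error=False).
actions = [
    {"_index": "logs", "_source": {"message": "ok"}},
    {"_index": "logs", "_source": {"resource": {"resourceId": "i-123"}}},
]
results = [
    (True, {"index": {"_id": "x1", "status": 201}}),
    (False, {"index": {"_id": "y2", "status": 400,
                       "error": {"type": "mapper_parsing_exception"}}}),
]

failed = [action for ok, item, action in pair_with_actions(actions, results) if not ok]
print(failed)
```

With a real client, `results` would presumably be `streaming_bulk(client, actions, raise_on_error=False)` with `actions` materialized as a list. The pairing only holds while every action gets exactly one result in order: `yield_ok` has to stay `True`, and retries (`max_retries > 0`) could reorder results. That fragility is one reason a built-in option would be preferable.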

What alternatives have you considered?

One workaround is providing our own document IDs in the bulk payload, but doing so incurs a performance hit while indexing. The other workaround is handling the exceptions raised when raise_on_error=True, but this has trade-offs: retries don't seem to happen, and from what I can tell, any chunks after the one that raised the exception are not sent.
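For reference, the first workaround looks roughly like this: generate an `_id` on the client, keep an `_id`-to-source lookup, and resolve failures through it. Both helpers are hypothetical, the UUID-based ID scheme is an assumption, and the failure item is simulated; as noted above, supplying our own IDs comes with an indexing performance cost.

```python
import uuid


def with_client_ids(docs, index):
    """Build bulk actions with client-generated _id values plus an _id -> source lookup.

    Hypothetical helper illustrating the "provide our own document IDs" workaround.
    """
    by_id = {}
    actions = []
    for doc in docs:
        doc_id = uuid.uuid4().hex  # random ID chosen client-side
        by_id[doc_id] = doc
        actions.append({"_index": index, "_id": doc_id, "_source": doc})
    return actions, by_id


def source_for_failure(item, by_id):
    """Look up the original document for one failed bulk response item."""
    [(_op_type, info)] = item.items()
    return by_id.get(info.get("_id"))


docs = [{"message": "ok"}, {"resource": {"resourceId": "i-123"}}]
actions, by_id = with_client_ids(docs, "logs")

# Simulated failure item for the second action.
failed_item = {"index": {"_id": actions[1]["_id"], "status": 400}}
print(source_for_failure(failed_item, by_id))
# {'resource': {'resourceId': 'i-123'}}
```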

Do you have any additional context?

I'm open to submitting a PR with a solution for this.

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
