# Add events.failure_store metric to track events sent to Elasticsearch failure store #48068
base: main
## Conversation
This pull request does not have a backport label. To fixup this pull request, you need to add the backport labels for the needed branches.
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
| Field | Type | Meaning | Troubleshooting hints |
|---|---|---|---|
| `.output.events.failed` | Integer | Number of events that Auditbeat tried to send to the output destination, but the destination failed to receive them. | Generally, we want this field to be absent or its value to be zero. When the value is greater than zero, it's useful to check Auditbeat's logs right before this log entry's `@timestamp` to see if there are any connectivity issues with the output destination. Note that failed events are not lost or dropped; they will be sent back to the publisher pipeline for retrying later. |
| `.output.events.dropped` | Integer | Number of events that Auditbeat gave up sending to the output destination because of a permanent (non-retryable) error. | |
| `.output.events.dead_letter` | Integer | Number of events that Auditbeat successfully sent to a configured dead letter index after they failed to ingest in the primary index. | |
| `.output.events.failure_store` | Integer | Number of events that were sent to the failure store. The failure store is a feature in Elasticsearch data streams that stores events that fail mapping or ingestion. Events sent to the failure store are still counted as acknowledged. | This metric indicates how many events encountered mapping or ingestion errors but were successfully stored in the failure store. A non-zero value suggests there may be mapping issues or data type mismatches that need to be addressed. |
Will this only be applicable to 9.3+? If so, we should add an applies_to badge to each of the beat reference pages, similar to what is shown here:
Suggested change:

```diff
-| `.output.events.failure_store` | Integer | Number of events that were sent to the failure store. The failure store is a feature in Elasticsearch data streams that stores events that fail mapping or ingestion. Events sent to the failure store are still counted as acknowledged. | This metric indicates how many events encountered mapping or ingestion errors but were successfully stored in the failure store. A non-zero value suggests there may be mapping issues or data type mismatches that need to be addressed. |
+| `.output.events.failure_store` {applies_to}`stack: ga 9.3` | Integer | Number of events that were sent to the failure store. The failure store is a feature in Elasticsearch data streams that stores events that fail mapping or ingestion. Events sent to the failure store are still counted as acknowledged. | This metric indicates how many events encountered mapping or ingestion errors but were successfully stored in the failure store. A non-zero value suggests there may be mapping issues or data type mismatches that need to be addressed. |
```
## Proposed commit message
See title
## Checklist

- [ ] I have made corresponding change to the default configuration files
- [ ] I have used the `stresstest.sh` script to run tests under stress conditions and with the race detector to verify their stability
- [ ] I have added an entry in `./changelog/fragments` using the changelog tool

## Disruptive User Impact

## Author's Checklist

## How to test this PR locally
### Manual Testing Procedure: Failure Store Metric
#### Prerequisites

#### Test Setup
1. Create a Data Stream with Failure Store Enabled
Create an index template with failure store enabled and strict mappings:
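Something like the following sketch should work. The template name `logs-flog-template`, the `logs-flog-*` pattern, and the mapped fields are placeholder choices, and the exact location of the failure store option has changed between Elasticsearch releases, so check the reference for your version:

```sh
# Sketch: index template with the failure store enabled and strict mappings.
# On recent Elasticsearch versions the failure store is enabled under
# template.data_stream_options; older preview releases used a different key.
curl -s -X PUT "http://localhost:9200/_index_template/logs-flog-template" \
  -H 'Content-Type: application/json' -d '
{
  "index_patterns": ["logs-flog-*"],
  "data_stream": {},
  "priority": 500,
  "template": {
    "data_stream_options": {
      "failure_store": { "enabled": true }
    },
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "@timestamp": { "type": "date" },
        "message":    { "type": "text" }
      }
    }
  }
}'
```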
2. Initialize the Data Stream
Create the data stream by indexing two documents: one that matches the strict mappings and one that violates them.
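For example, with the placeholder data stream name `logs-flog-default` matching the template above:

```sh
# Document that matches the strict mappings.
curl -s -X POST "http://localhost:9200/logs-flog-default/_doc" \
  -H 'Content-Type: application/json' \
  -d '{ "@timestamp": "2025-01-01T00:00:00Z", "message": "ok" }'

# Document with an unmapped field: under "dynamic": "strict" it fails to
# ingest and should be redirected to the failure store.
curl -s -X POST "http://localhost:9200/logs-flog-default/_doc" \
  -H 'Content-Type: application/json' \
  -d '{ "@timestamp": "2025-01-01T00:00:01Z", "message": "bad", "extra_field": 42 }'
```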
Ensure one of the documents went to the failure store: the index response for the violating document should contain `"failure_store": "used"`. Then verify that there is one document in the failure store:
3. Generate some logs that will cause a mapping conflict
You can use Docker and flog for this:
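For example (the line count and output path are arbitrary choices):

```sh
# Generate 1000 JSON log lines with flog; their fields do not match the
# strict mappings above, so ingestion should divert them to the failure store.
docker run --rm mingrammer/flog -f json -n 1000 > /tmp/flog.json
```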
4. Run Filebeat
Build Filebeat from this PR and run it using the following
configuration (adjust the output settings as necessary):
`filebeat.yml`:
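A minimal sketch; hosts, paths, and names are placeholders, and `http.enabled` plus `logging.metrics.period` are set up for the verification steps below:

```sh
# Sketch of a filebeat.yml for this test; adjust to your environment.
cat > filebeat.yml <<'EOF'
filebeat.inputs:
  - type: filestream
    id: flog-input
    paths:
      - /tmp/flog.json
    parsers:
      - ndjson:
          target: ""   # merge parsed JSON fields into the event root

output.elasticsearch:
  hosts: ["http://localhost:9200"]
  index: "logs-flog-default"

# Some versions require template name/pattern when the output index is
# overridden, even with template setup disabled.
setup.template.name: "logs-flog"
setup.template.pattern: "logs-flog-*"
setup.template.enabled: false
setup.ilm.enabled: false

# Expose the local stats endpoint (default port 5066) used later on.
http.enabled: true

# Log the metrics snapshot every 5s instead of the 30s default.
logging.metrics.period: 5s
EOF
```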
You can run Filebeat and use `jq` to parse the logs.
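One possible pipeline, assuming Filebeat's JSON log output on stderr; the field paths and the `fs` shorthand are illustrative, not defined by this PR:

```sh
# Keep only the periodic monitoring entries and pull out the output counters.
./filebeat -e -c filebeat.yml 2>&1 | \
  jq -c 'select(."log.logger" == "monitoring")
         | .monitoring.metrics.libbeat.output.events // empty
         | {acked, failed, fs: .failure_store}'
```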
You should see metrics logged every 5s; here `fs` is the counter of events sent to the failure store. The metrics are also published in the stats endpoint:
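For example, with `http.enabled: true` as in the configuration above (5066 is the default stats port):

```sh
# Fetch the output event counters from Filebeat's local stats endpoint.
curl -s http://localhost:5066/stats | jq '.libbeat.output.events'
```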
This will output something like this:
{ "acked": 105, "active": 0, "batches": 21, "dead_letter": 0, "dropped": 0, "duplicates": 0, "failed": 0, "failure_store": 105, "toomany": 0, "total": 105 }Related issues
## Use cases

## Screenshots

## Logs