Description
APM Server version (apm-server version):
8.13.2
Description of the problem including expected versus actual behavior:
I believe there is a regression in the number of bulk requests/s that APM Server sends to Elasticsearch. I don't have complete data, but I believe the issue started in 8.12.x and got worse with the upgrade to 8.13.x.
This is from the Elastic Cloud console: with otherwise identical clients/agents, the number of indexing requests/s jumped noticeably after the upgrade from 8.12 to 8.13.
The number of documents/s ingested didn't change much, but our hot nodes started to run under higher load. Fortunately, we observed that setting index.translog.durability=async on all APM indices drops IOPS per hot node back to acceptable levels.
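For anyone who wants to try the same workaround, here is a minimal sketch of how the setting could be applied through the update index settings API. The index patterns assume the default APM data stream naming (traces-apm*, metrics-apm*, logs-apm*), and the URL and API key are placeholders; none of these come from our actual setup. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: APM data uses the default data stream naming.
APM_INDEX_PATTERNS = ["traces-apm*", "metrics-apm*", "logs-apm*"]


def build_async_translog_request(es_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a PUT <patterns>/_settings request that
    switches the translog durability of the matching indices to async."""
    target = ",".join(APM_INDEX_PATTERNS)
    body = json.dumps({"index": {"translog": {"durability": "async"}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/{target}/_settings",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {api_key}",
        },
    )


if __name__ == "__main__":
    # Placeholder endpoint and key; replace with real values before sending.
    req = build_async_translog_request("https://localhost:9200", "<api-key>")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the setting: urllib.request.urlopen(req)
```

Two caveats: with async durability the translog is fsynced on an interval rather than per request, so recently acknowledged writes can be lost if a node crashes. Also, as far as I understand, a settings update like this only affects existing backing indices, so indices created by a later rollover would need the setting in their index template.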
With a decently tuned bulk_max_size setting, I would not expect such a big drop in IOPS. I am therefore wondering whether APM Server is misconfigured in newer versions.
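To illustrate the reasoning: with the default index.translog.durability=request, the translog is fsynced as part of every bulk request, so at a constant document rate the fsync pressure should scale roughly with the number of bulk requests/s. The sketch below uses made-up numbers (not measurements from our cluster) to show how a smaller effective bulk size inflates the request rate.

```python
def bulk_requests_per_second(docs_per_second: float, docs_per_bulk: float) -> float:
    """Bulk requests/s needed to ship a given document rate at a given batch size."""
    if docs_per_bulk <= 0:
        raise ValueError("docs_per_bulk must be positive")
    return docs_per_second / docs_per_bulk


if __name__ == "__main__":
    # Hypothetical ingest rate, chosen only for illustration.
    docs_per_second = 10_000
    for docs_per_bulk in (5_000, 1_000, 100):
        rate = bulk_requests_per_second(docs_per_second, docs_per_bulk)
        print(f"{docs_per_bulk:>5} docs/bulk -> {rate:g} bulk requests/s")
```

If the effective batch size shrank between versions, this would match what we see: the same documents/s, many more indexing requests/s, and an IOPS drop once per-request fsyncs are disabled.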
It almost feels like the server uses the agent defaults instead of its custom bulk size option. Was setting preset: custom in the server config perhaps forgotten once elastic/elastic-agent#3797 got merged into the agent?