Description
Host operating system: output of uname -a
Linux <pod> 4.19.0-18-cloud-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64 GNU/Linux
mysqld_exporter version: output of mysqld_exporter --version
0.12.1
MySQL server version
5.7
What did you do that produced an error?
We've periodically had a problem in production where the mysqld_exporter
contributes to database overload. Here are the steps for how that happened:
1. Our production database got bogged down running a very slow query.
2. Prometheus hit mysqld_exporter at a scrape interval.
3. The mysqld_exporter query took longer than the scrape interval to run, so before it could complete, we looped back to (2). The exporter queries continued to stack up without bound.
Note: We submitted a fix for this two years ago on the Percona (PMM) fork, and it has been running successfully ever since. Here is that issue and the Percona PR. Recently, however, we had a project that used the upstream Prometheus version of mysqld_exporter, and after a few months of running it we were bitten by the same unbounded query meltdown.
Here is a Prometheus PR for this issue that backports the PMM fix above.
(Also note, although this ticket is related, it is not a duplicate, since there are many possible causes of query slowdown.)
What did you expect to see?
We expected the first mysqld_exporter query to return results as soon as possible and later attempts to run the same query to return a "429 Too Many Requests" error.
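To make the expected behavior concrete, here is a minimal Go sketch of one way to get it: a wrapper around the metrics handler that lets a bounded number of scrapes run and rejects the rest with 429. This is only an illustration, not the actual code from the PMM fix or the PR. The names `scrapeLimiter`, `newScrapeLimiter`, `tryAcquire`, `release`, and `wrap` are hypothetical and do not exist in mysqld_exporter.

```go
package main

import (
	"net/http"
)

// scrapeLimiter caps how many scrapes may run at the same time.
// Hypothetical type for illustration; not part of mysqld_exporter.
type scrapeLimiter struct {
	slots chan struct{} // buffered channel used as a counting semaphore
}

// newScrapeLimiter allows at most max concurrent scrapes.
func newScrapeLimiter(max int) *scrapeLimiter {
	return &scrapeLimiter{slots: make(chan struct{}, max)}
}

// tryAcquire takes a slot without blocking and reports whether it succeeded.
func (l *scrapeLimiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot previously taken with tryAcquire.
func (l *scrapeLimiter) release() { <-l.slots }

// wrap rejects a scrape with 429 when all slots are busy, so slow
// scrapes cannot pile up behind one another.
func (l *scrapeLimiter) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.tryAcquire() {
			http.Error(w, "scrape already in progress", http.StatusTooManyRequests)
			return
		}
		defer l.release()
		next.ServeHTTP(w, r)
	})
}
```

With a limit of 1, this would be wired up as something like `http.Handle("/metrics", newScrapeLimiter(1).wrap(metricsHandler))`, where `metricsHandler` stands in for whatever handler serves the metrics. The first scrape runs to completion, and any scrape that arrives while it is still running gets a 429 immediately instead of opening more MySQL queries.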
What did you see instead?
When the query time exceeded the scrape interval, we saw an unbounded set of mysqld_exporter queries running at the same time. These queries led to a meltdown where the mysqld_exporter eventually ran out of memory and was OOM-killed. And during that time, the excess mysqld_exporter queries contributed to the MySQL server overload.