Description
While tracing container OOM issues in Spark RAPIDS (exit code 137, killed by K8s for excessive memory usage), we found that most executor memory is allocated via HostMemoryBuffer. Meanwhile, after @revans2's #17197, the total amount of memory allocated by HostMemoryBuffer should in theory be bounded by spark.rapids.memory.host.offHeapLimit.size.
It would be valuable to have a metric (like a Gauge in Prometheus) that tells us how much memory is consumed by HostMemoryBuffer. Since we don't have an embedded monitoring solution (like Prometheus + Grafana), we should at least log HostMemoryBuffer usage each time it reaches a new high watermark.
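To make the request concrete, here is a rough sketch of what watermark logging could look like. Everything in it is hypothetical: `HostMemoryWatermark`, `onAlloc`, `onFree`, and `minStepBytes` are illustrative names, not existing cudf or spark-rapids APIs, and the real hook points would be wherever HostMemoryBuffer allocates and frees memory. The `minStepBytes` threshold is an assumption added to avoid flooding the log with tiny increments.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/** Hypothetical tracker for illustration; not part of cudf or spark-rapids. */
class HostMemoryWatermark {
    // Bytes currently outstanding (allocated and not yet freed).
    private final AtomicLong current = new AtomicLong();
    // The last watermark that was reported to the callback.
    private final AtomicLong lastReported = new AtomicLong();
    // Only report when usage exceeds the last reported value by this much.
    private final long minStepBytes;
    // Called with the new watermark, e.g. to write a log line.
    private final LongConsumer onNewWatermark;

    HostMemoryWatermark(long minStepBytes, LongConsumer onNewWatermark) {
        this.minStepBytes = Math.max(1L, minStepBytes);
        this.onNewWatermark = onNewWatermark;
    }

    /** Assumed to be called right after a host buffer is allocated. */
    void onAlloc(long bytes) {
        long now = current.addAndGet(bytes);
        long prev = lastReported.get();
        // CAS loop so that concurrent allocators report each watermark once.
        while (now >= prev + minStepBytes) {
            if (lastReported.compareAndSet(prev, now)) {
                onNewWatermark.accept(now);
                return;
            }
            prev = lastReported.get();
        }
    }

    /** Assumed to be called when a host buffer is closed. */
    void onFree(long bytes) {
        current.addAndGet(-bytes);
    }

    long currentBytes() {
        return current.get();
    }

    public static void main(String[] args) {
        HostMemoryWatermark tracker = new HostMemoryWatermark(
            1L << 20, // report in steps of at least 1 MiB
            b -> System.out.println(
                "HostMemoryBuffer new high watermark: " + b + " bytes"));
        tracker.onAlloc(8L << 20);  // reported
        tracker.onFree(4L << 20);
        tracker.onAlloc(2L << 20);  // below the previous watermark, silent
        tracker.onAlloc(16L << 20); // reported
    }
}
```

The same counter could later back a gauge if a metrics system becomes available; the callback is only there so the sketch does not assume any particular logging framework.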