TBS: apm-server never recovers from storage limit exceeded in rare cases #14923

Closed
@carsonip

Description

Tail-based sampling (TBS): in some observed cases, after the storage limit is exceeded, the LSM size remains much larger than the vlog size.

The assumption is that the LSM size should usually be smaller than the vlog size. It is unclear whether compactions run in the background to reclaim expired keys in the LSM tree. If vlog << lsm, vlog GC cannot reclaim enough storage, and apm-server may be stuck indefinitely in a state where storage exceeds the limit.
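
To make the vlog << lsm concern concrete, here is a minimal sketch of the arithmetic. It is not apm-server code: `StorageSizes`, `is_over_limit`, and `can_vlog_gc_recover` are hypothetical names, the sizes are made-up numbers, and it assumes the limit check compares lsm + vlog against the configured limit.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class StorageSizes:
    """Hypothetical on-disk sizes in bytes (illustrative, not measured)."""
    lsm: int
    vlog: int


def is_over_limit(sizes: StorageSizes, limit: int) -> bool:
    # Assumption: storage usage is counted as lsm + vlog.
    return sizes.lsm + sizes.vlog >= limit


def can_vlog_gc_recover(sizes: StorageSizes, limit: int) -> bool:
    # Best case for vlog GC: every vlog byte is reclaimed, so only the
    # LSM tree remains. If the LSM tree alone reaches the limit, no amount
    # of vlog GC can bring usage back under it.
    return sizes.lsm < limit


limit = 3 * GIB
expected = StorageSizes(lsm=GIB // 2, vlog=3 * GIB)    # vlog dominates
observed = StorageSizes(lsm=7 * GIB // 2, vlog=GIB // 5)  # vlog << lsm

for name, sizes in (("expected", expected), ("observed", observed)):
    print(name, is_over_limit(sizes, limit), can_vlog_gc_recover(sizes, limit))
```

In the second case only LSM compaction that actually drops expired keys could bring usage back under the limit, which is why it matters whether compactions are running.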

There are also cases where vlog files on the file system are days old. My hypothesis is that the vlog GC thread is still running, but because it relies on stats from compactions and compactions are not running, vlog files are not cleaned up.
