fix: limit number of concurrent optimized compactions #26319
Conversation
Limit number of concurrent optimized compactions so that level compactions do not get starved. Starved level compactions result in a sudden increase in disk usage.

Add [data] max-concurrent-optimized-compactions for configuring maximum number of concurrent optimized compactions. Default value is 1.

Closes: #26315
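For illustration only, a minimal sketch of the limiting idea using a counting-semaphore channel sized by the new setting; optLimiter and its methods are hypothetical names, not the engine's actual implementation:

package main

import "fmt"

// optLimiter is a counting semaphore sized by
// [data] max-concurrent-optimized-compactions (default 1).
// Illustrative sketch only, not the engine's real limiter.
type optLimiter chan struct{}

// tryAcquire reports whether an optimized-compaction slot is free.
func (l optLimiter) tryAcquire() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a previously acquired slot.
func (l optLimiter) release() { <-l }

func main() {
	l := make(optLimiter, 1)
	if l.tryAcquire() {
		fmt.Println("running optimized compaction")
		l.release()
	} else {
		fmt.Println("slot busy; leaving room for level compactions")
	}
}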
@gwossum & @devanbenz - I have broken out the compaction planning into a method separate from the asynchronous Go proc with the ticker that drives it. This let me write a test that I think we should massively expand to test and explore the planner. One thing I think is a bug: when planning optimization, we skip files in use while counting generations. This means that in my second new test case the generation count is zero, because the files were acquired in the initial level planning. But I think the generation count in this case should be one. What do you think? Please expand the test cases in
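As a hedged illustration of that split (compactLoop and planAndScheduleCompactions are hypothetical names, not the engine's real methods), the goroutine only decides when to plan, so tests can call the planning method directly:

package tsm1sketch

import "time"

// Engine stands in for the TSM engine in this sketch.
type Engine struct{}

// planAndScheduleCompactions is a hypothetical name for the planning step
// pulled out of the goroutine; tests call it directly.
func (e *Engine) planAndScheduleCompactions() {
	// level planning first, then optimized planning, subject to the limiter
}

// compactLoop only decides when to plan; what to plan lives above.
func (e *Engine) compactLoop(tick <-chan time.Time, done <-chan struct{}) {
	for {
		select {
		case <-done:
			return
		case <-tick:
			e.planAndScheduleCompactions()
		}
	}
}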
Just two comments regarding the level5Aggressive variable and the logic. As you point out here: https://github.com/influxdata/influxdb/pull/26319/files#r2061085298 it's likely that we have a bug.
tsdb/engine/tsm1/engine.go
Outdated
if level5Aggressive {
	log.Info("Planning aggressive optimized compaction because all level 5 is planned for aggressive")
	aggressive = true
} else if isOpt, filename, heur := e.IsGroupOptimized(theGroup); isOpt {
We could probably pull this out of here and put it within the PlanCompactions method. We are checking there and then again here for IsGroupOptimized.
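One possible shape for that, purely as a hedged sketch with hypothetical names (plannedGroup, planGroups): run the IsGroupOptimized check exactly once during planning and carry the result with the group.

package tsm1sketch

// plannedGroup is a hypothetical wrapper: PlanCompactions would fill in
// AlreadyOptimized once, so the scheduler never calls IsGroupOptimized again.
type plannedGroup struct {
	Files            []string
	AlreadyOptimized bool
	OptimizedFile    string
	Heuristic        string
}

// planGroups sketches the idea: perform the check inside planning and attach
// its outcome to each group.
func planGroups(groups [][]string, isGroupOptimized func([]string) (bool, string, string)) []plannedGroup {
	out := make([]plannedGroup, 0, len(groups))
	for _, g := range groups {
		pg := plannedGroup{Files: g}
		pg.AlreadyOptimized, pg.OptimizedFile, pg.Heuristic = isGroupOptimized(g)
		out = append(out, pg)
	}
	return out
}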
tsdb/engine/tsm1/engine.go
Outdated
if isOpt, filename, heur := e.IsGroupOptimized(group); isOpt {
	e.logger.Info("Promoting full compaction level 4 group to optimized level 5 compaction group because it contains an already optimized TSM file",
		zap.String("optimized_file", filename), zap.String("heuristic", heur), zap.Strings("files", group))
	level5Groups = append(level5Groups, group)
We should probably Release the group that we moved from the initial level4Groups to level5Groups here, so it is no longer in filesInUse by the compaction planner and can be picked up by the PlanOptimize planner?
Actually... if we move the compaction group from level4Groups to level5Groups, it's still in use.
That means we should set FileInUse to false when counting generations.
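A tiny sketch of counting generations while ignoring the in-use flag, with simplified stand-in types (tsmGeneration here is hypothetical, not the planner's real record):

package tsm1sketch

// tsmGeneration is a simplified stand-in for the planner's generation record.
type tsmGeneration struct {
	id        int
	fileInUse bool
}

// countGenerations counts every generation regardless of whether its files are
// currently held by another planned compaction, which is what the comment
// above argues for when planning optimization.
func countGenerations(gens []tsmGeneration) int {
	seen := make(map[int]struct{})
	for _, g := range gens {
		// g.fileInUse is deliberately not consulted here.
		seen[g.id] = struct{}{}
	}
	return len(seen)
}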
If we already plan a level4Group and it has a TSM file that is at aggressivePointsPerBlock, move it into a level5Group and set level5Aggressive to true. This shard should be compacted using the aggressive points-per-block amount so we don't need to rewrite TSM files at the lower points per block.
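A hedged sketch of that promotion rule; the types, the promoteLevel4Groups helper, and the aggressivePointsPerBlock value are illustrative stand-ins, not the engine's code:

package tsm1sketch

// aggressivePointsPerBlock is an illustrative value, not the engine's constant.
const aggressivePointsPerBlock = 100000

type tsmFile struct {
	Name           string
	PointsPerBlock int
}

// promoteLevel4Groups: if a planned level 4 group already contains a file
// written at the aggressive block size, move the whole group to level 5 and
// flag the shard so the compaction keeps using aggressivePointsPerBlock
// instead of rewriting those files at a lower block size.
func promoteLevel4Groups(level4, level5 [][]tsmFile) (l4, l5 [][]tsmFile, level5Aggressive bool) {
	l5 = level5
	for _, group := range level4 {
		promoted := false
		for _, f := range group {
			if f.PointsPerBlock >= aggressivePointsPerBlock {
				l5 = append(l5, group)
				level5Aggressive = true
				promoted = true
				break
			}
		}
		if !promoted {
			l4 = append(l4, group)
		}
	}
	return l4, l5, level5Aggressive
}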
I would really like to find some ways to pull out the actual "scheduling" of compactions too and test some more of this large chunk of code. I like how we have modularized planning and made it very easy to test. It would be nice to break some of this out and make it more testable. Going to start brainstorming on this.
One idea: an abstraction on top of CompactionGroups that will include the points per block required during compaction.
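One possible shape for such an abstraction, purely as a sketch (plannedCompactionGroup is a hypothetical name):

package tsm1sketch

// CompactionGroup mirrors the planner's slice-of-TSM-file-paths type.
type CompactionGroup []string

// plannedCompactionGroup is a hypothetical wrapper that carries the block size
// the compaction should use, so the scheduler no longer re-derives it.
type plannedCompactionGroup struct {
	Group          CompactionGroup
	PointsPerBlock int // e.g. default vs. aggressive points per block
}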
Two questions, still need to review test coverage.
a level 5 compaction group that has one or more TSM files that are over max points per block. This will ensure we do not unwind any TSM files already over max points per block
Comments, some outdated.
tsdb/engine/tsm1/compact_test.go
Outdated
@@ -2704,6 +2704,9 @@ func TestDefaultPlanner_PlanOptimize_Test(t *testing.T) {

	cp := tsm1.NewDefaultPlanner(ffs, tsdb.DefaultCompactFullWriteColdDuration)

	compacted, _ := cp.FullyCompacted()
Check string return, too. Should be empty.
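Continuing the snippet above, the check could look roughly like this, assuming testify's require (adjust to the test file's existing assertion style):

compacted, reason := cp.FullyCompacted()
require.True(t, compacted)
require.Empty(t, reason) // the reason string should be empty when fully compacted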
tsdb/engine/tsm1/engine.go
Outdated
@@ -2096,11 +2118,59 @@ func (e *Engine) ShouldCompactCache(t time.Time) bool {
	return t.Sub(e.Cache.LastWriteTime()) > e.CacheFlushWriteColdDuration
}

// isSingleGeneration returns true if a group contains files from a single generation.
func (e *Engine) isSingleGeneration(group CompactionGroup) bool {
Is this used anywhere?
LGTM. Please wait for @gwossum's approval as well before merging.
tsdb/engine/tsm1/engine.go
Outdated
const waitForOptimization = time.Hour
const tickPeriod = time.Second

var ticksBeforeOptimize = int(waitForOptimization.Seconds())
ticksBeforeOptimize is only correct if tickPeriod is time.Second. I think this will generalize it if tickPeriod is not time.Second:

var ticksBeforeOptimize = int(waitForOptimization / tickPeriod)
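A quick runnable check of that duration arithmetic with an illustrative non-default tick period:

package main

import (
	"fmt"
	"time"
)

func main() {
	const waitForOptimization = time.Hour
	const tickPeriod = 250 * time.Millisecond // illustrative non-default tick

	// Duration divided by Duration is a dimensionless tick count.
	ticksBeforeOptimize := int(waitForOptimization / tickPeriod)
	fmt.Println(ticksBeforeOptimize) // prints 14400
}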
tsdb/engine/tsm1/engine.go
Outdated
if cycleCount <= ticksBeforeOptimize {
	return true, waitMessage
} else {
	return false, ""
It might be better to do a startTime := time.Now() before the loop and then do if time.Since(startTime) > waitForOptimization here, because this loop may not take exactly one tickPeriod to run. Also, it seems easier since you don't have to do dimensional analysis to determine whether the ticksBeforeOptimize calculation is correct.
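A hedged sketch of that suggestion (shouldWait is a hypothetical helper; the constant mirrors the snippet above):

package tsm1sketch

import "time"

const waitForOptimization = time.Hour

// shouldWait illustrates the suggestion: record startTime once before the
// loop, then compare elapsed wall-clock time rather than counting ticks, so a
// slow loop iteration cannot skew the wait.
func shouldWait(startTime time.Time, waitMessage string) (bool, string) {
	if time.Since(startTime) <= waitForOptimization {
		return true, waitMessage
	}
	return false, ""
}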
This loop runs indefinitely for the life of the program, AFAIK. Wouldn't startTime just always be program start? So after the first time it is greater than waitForOptimization, it will always be greater?
Yes. This just avoids optimizing compactions for the first hour after startup.
LGTM
Limit number of concurrent optimized compactions so that level compactions do not get starved. Starved level compactions result in a sudden increase in disk usage.

Add [data] max-concurrent-optimized-compactions for configuring maximum number of concurrent optimized compactions. Default value is 1.

Co-authored-by: davidby-influx <[email protected]>
Co-authored-by: devanbenz <[email protected]>
Closes: #26315
(cherry picked from commit 66f4dbe)

Co-authored-by: Geoffrey Wossum <[email protected]>
Co-authored-by: davidby-influx <[email protected]>
Closes: #26315