@@ -119,7 +119,12 @@ var walFailoverUnlimitedTokens = settings.RegisterBoolSetting(
"when true, during WAL failover, unlimited admission tokens are allocated",
false)

- // Experimental observations:
+ // The following experimental observations were used to guide the initial
+ // implementation, which aimed to maintain a sub-level count of 20 with token
+ // calculation every 60s. Since then, the code has evolved to calculate tokens
+ // every 15s and to aim for regular work maintaining a sub-level count of
+ // l0SubLevelCountOverloadThreshold/2. So this commentary should be
+ // interpreted in that context:
// - Sub-level count of ~40 caused a node heartbeat latency p90, p99 of 2.5s,
// 4s. With a setting that limits sub-level count to 10, before the system
// is considered overloaded, and adjustmentInterval = 60, we see the actual
@@ -133,9 +138,35 @@ var walFailoverUnlimitedTokens = settings.RegisterBoolSetting(
// then we run the risk of having 100+ sub-levels when we hit a file count
// of 1000. Instead we use a sub-level overload threshold of 20.
//
- // We've set these overload thresholds in a way that allows the system to
- // absorb short durations (say a few minutes) of heavy write load.
- const l0FileCountOverloadThreshold = 1000
+ // A sub-level count of l0SubLevelCountOverloadThreshold results in the same
+ // score as a file count of l0FileCountOverloadThreshold. Exceptions: a small
+ // L0 in terms of bytes (see IOThreshold.Score); these constants being
+ // overridden in the cluster settings
+ // admission.l0_sub_level_count_overload_threshold and
+ // admission.l0_file_count_overload_threshold. We ignore these exceptions in
+ // the discussion here. Hence, 20 sub-levels is equivalent in score to 4000 L0
+ // files, i.e., 1 sub-level is equivalent to 200 files.
+ //
+ // Ideally, equivalence here should match equivalence in how L0 is scored for
+ // compactions. CockroachDB sets Pebble's L0CompactionThreshold to a constant
+ // value of 2, which results in a compaction score of 1.0 with 1 sub-level.
+ // CockroachDB does not override Pebble's L0CompactionFileThreshold, which
+ // defaults to 500, so 500 files cause a compaction score of 1.0. So in
+ // Pebble's compaction scoring logic, 1 sub-level is equivalent to 500 L0
+ // files.
+ //
+ // So admission control is more sensitive to higher file count than Pebble's
+ // compaction scoring. l0FileCountOverloadThreshold used to be 1000 up to
+ // v24.3, and increasing it to 4000 was considered significant enough --
+ // increasing to 10000, to make Pebble's compaction logic and admission
+ // control equivalent, was considered too risky. Note that admission control
+ // tries to maintain a score of 0.5 when admitting regular work, which, if
+ // caused by file count, represents 2000 files. With 2000 files, the L0
+ // compaction score is 2000/500 = 4.0, which is significantly above the
+ // compaction threshold of 1.0 (at which a level is eligible for compaction).
+ // So one could argue that this inconsistency between admission control and
+ // Pebble is potentially harmless.
+ const l0FileCountOverloadThreshold = 4000
const l0SubLevelCountOverloadThreshold = 20

// ioLoadListener adjusts tokens in kvStoreTokenGranter for IO, specifically due to
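The score equivalence described in the new comment can be illustrated with a short, self-contained sketch. This is a minimal approximation that treats the overload score as the larger of the two normalized ratios, ignoring the small-L0-bytes exception and the cluster-setting overrides that the comment explicitly sets aside; approxOverloadScore is a hypothetical name for illustration, not the actual IOThreshold.Score implementation.

package main

import "fmt"

const (
	l0FileCountOverloadThreshold     = 4000
	l0SubLevelCountOverloadThreshold = 20
)

// approxOverloadScore (hypothetical) returns the larger of the two normalized
// ratios, so reaching either threshold yields a score of 1.0.
func approxOverloadScore(numL0Files, numL0SubLevels int) float64 {
	fileScore := float64(numL0Files) / float64(l0FileCountOverloadThreshold)
	subLevelScore := float64(numL0SubLevels) / float64(l0SubLevelCountOverloadThreshold)
	if fileScore > subLevelScore {
		return fileScore
	}
	return subLevelScore
}

func main() {
	fmt.Println(approxOverloadScore(4000, 0)) // 1: overloaded on file count alone
	fmt.Println(approxOverloadScore(0, 20))   // 1: overloaded on sub-level count alone
	fmt.Println(approxOverloadScore(200, 0))  // 0.05: 200 files score the same ...
	fmt.Println(approxOverloadScore(0, 1))    // 0.05: ... as 1 sub-level
}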
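For the comparison with Pebble's compaction scoring, the arithmetic in the comment can be worked through directly. The sketch below uses only the figures stated in the comment (500 files or 1 sub-level giving a compaction score of 1.0, and the admission-control target of 0.5), not Pebble's actual scoring code; the constant names are illustrative.

package main

import "fmt"

func main() {
	// Admission control aims for a score of 0.5 for regular work; driven
	// purely by file count, that corresponds to 0.5 * 4000 = 2000 L0 files.
	const admissionTargetFiles = 0.5 * 4000

	// Per the comment, 500 files correspond to a compaction score of 1.0, so
	// 2000 files correspond to 2000/500 = 4.0, well above the 1.0 threshold
	// at which a level becomes eligible for compaction.
	const compactionScoreAtTarget = admissionTargetFiles / 500

	fmt.Println(admissionTargetFiles, compactionScoreAtTarget) // 2000 4
}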