Add global label length limits#83
Conversation
1. config/config.go — Default limits of 1 MiB - Added LabelNameLengthLimit: 1 << 20 and LabelValueLengthLimit: 1 << 20 to DefaultGlobalConfig - Added fallback logic in GlobalConfig.UnmarshalYAML so a partial global config (one that doesn't set these fields) still gets the defaults, matching the pattern already used for ScrapeInterval, EvaluationInterval, etc. 2. scrape/scrape.go — Pre-construction panic prevention Added a check before p.Labels(&lset) that rejects metrics whose raw text exceeds 16 MiB. Since the encoding format panics for any individual string > 16 MiB, and no single label value can be longer than the total metric string, len(met) > 1<<24 is a safe guard. The configured 1 MiB limit then catches the normal cases in verifyLabelLimits. 3. storage/remote/write_handler.go — Configurable limits in RW receiver - Added labelNameLengthLimit and labelValueLengthLimit fields to writeHandler - Added two int parameters to NewWriteHandler - Added checkLabelLengths() helper that enforces the limits - Wired the check into both v1 write() and v2 appendV2() paths 4. web/api/v1/api.go — Wires global config limits to RW handler Passes cfg.GlobalConfig.LabelNameLengthLimit and LabelValueLengthLimit from the config to NewWriteHandler at construction. 5. promql/functions.go — Guards in label_replace and label_join Added explicit checks for the 16 MiB encoding limit before the lb.Labels() call. The evaluator's existing defer ev.recover() already converts panics to errors, but these checks give a clear, explicit error message instead. Tests - Updated config/config_test.go to expect the new 1 MiB defaults in TestGetScrapeConfigs - Updated all ~16 NewWriteHandler call sites in write_handler_test.go to pass 0, 0 (no limit) for the new parameters
|
I see there is a mismatch between the description and what is checked in for LabelValueLengthLimit:
LabelValueLengthLimit: 1 << 21, |
|
Please can you ensure you sign the DCO with your commit? Thanks! |
| if gc.LabelNameLengthLimit == 0 { | ||
| gc.LabelNameLengthLimit = DefaultGlobalConfig.LabelNameLengthLimit | ||
| } | ||
|
|
There was a problem hiding this comment.
gc.LabelValueLengthLimit is not initialised to DefaultGlobalConfig.LabelValueLengthLimit
| } | ||
|
|
||
| // checkLabelLengths returns an error if any label name or value in lset exceeds | ||
| // the configured limits. A limit of 0 means no limit. |
There was a problem hiding this comment.
If the user wishes to set no limit, and uint has a default value of 0, how can we differentiate between omission vs no limit? Perhaps we can use -1 instead to specify no limit?
| MetricNameEscapingScheme: model.AllowUTF8, | ||
| // Default to 1 MiB to avoid crashes from the 16 MiB encoding limit. | ||
| LabelNameLengthLimit: 1 << 20, | ||
| LabelValueLengthLimit: 1 << 21, |
There was a problem hiding this comment.
For consistency, could you omit the spaces here to match how the syntax is done later on in the code?
| indexes := regex.FindStringSubmatchIndex(srcVal) | ||
| if indexes != nil { // Only replace when regexp matches. | ||
| res := regex.ExpandString([]byte{}, repl, srcVal, indexes) | ||
| if len(res) > 1<<24 { |
There was a problem hiding this comment.
Could we set 1<<24 to a variable and referenced wherelse it is being used?
Note: this PR is an example for use in the KubeCon EU ContribFest session. Please leave if you are not in that session.
Added a check before p.Labels(&lset) that rejects metrics whose raw text exceeds 16 MiB. Since the encoding format panics for any individual string >
16 MiB, and no single label value can be longer than the total metric string, len(met) > 1<<24 is a safe guard. The configured 1 MiB limit then
catches the normal cases in verifyLabelLimits.
Passes cfg.GlobalConfig.LabelNameLengthLimit and LabelValueLengthLimit from the config to NewWriteHandler at construction.
Added explicit checks for the 16 MiB encoding limit before the lb.Labels() call. The evaluator's existing defer ev.recover() already converts panics
to errors, but these checks give a clear, explicit error message instead.
Tests
Which issue(s) does the PR fix:
Does this PR introduce a user-facing change?