feat: Implement utf8 label name client capability #4442

jake-kramer · 2025-09-17T19:26:44Z

This PR implements a client capability framework, with allow-utf8-labelnames as the first capability. Capabilities are flags clients set in the Accept header to inform the API about specific client support. This is useful for cross-API features (like utf-8 support).

This PR does not provide full utf-8 label name support. Instead, it offers the first pass at what will be full support, with follow up PRs to provide full support (expanded on below).

This PR respects the allow-utf8-labelnames client capability in the read path (both v1 and v2 LabelNames and Series APIs). If the capability is unset or set to anything other than true, these APIs filter out "invalid" label names (i.e. names with characters outside of [a-zA-Z0-9_.]) from the response. This logic is currently a no-op, since it's not yet possible to write label names outside of this charset.
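The read-path rule above can be sketched as follows. This assumes the legacy check is only the character set named in this description (`[a-zA-Z0-9_.]`, non-empty); the real check used in the querier, `validation.SanitizeLegacyLabelName`, may be stricter. `isLegacyLabelName` and `filterLabelNames` are hypothetical helpers. Non-matching names are dropped rather than rewritten.

```go
package main

import "fmt"

// isLegacyLabelName is a hypothetical stand-in for the legacy check:
// non-empty and made only of ASCII letters, digits, '_' and '.'.
func isLegacyLabelName(name string) bool {
	if name == "" {
		return false
	}
	for _, r := range name {
		isLetter := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
		isDigit := r >= '0' && r <= '9'
		if !isLetter && !isDigit && r != '_' && r != '.' {
			return false
		}
	}
	return true
}

// filterLabelNames returns names unchanged when the client advertised
// allow-utf8-labelnames; otherwise it drops (does not sanitize) any name
// that fails the legacy check.
func filterLabelNames(names []string, allowUTF8 bool) []string {
	if allowUTF8 {
		return names
	}
	filtered := make([]string, 0, len(names))
	for _, n := range names {
		if isLegacyLabelName(n) {
			filtered = append(filtered, n)
		}
	}
	return filtered
}

func main() {
	names := []string{"service_name", "service.name", "service-name", "服务"}
	fmt.Println(filterLabelNames(names, false)) // [service_name service.name]
	fmt.Println(filterLabelNames(names, true))  // [service_name service.name service-name 服务]
}
```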

A future PR will disable write-path label name sanitization, gated behind a per-tenant feature flag.

Another future PR will update the OG UI and profilecli clients to take advantage of this client capability.

cc @simonswine @bryanhuhta

jake-kramer

Still have TODOs like tests and more API coverage (I'm missing some read path APIs), but I wanted to get this draft out for comment sooner rather than later.

pkg/featureflags/client_capability.go

This feature relaxes the previous validation for label names. Because the change is backward incompatible (for example, PromQL queries must be [updated to support utf-8 label names](https://prometheus.io/docs/guides/utf8/#querying)), it is gated behind a "client capability": a key/value pair that a client must set in the `Accept:` header.

pkg/featureflags/client_capability.go

pkg/distributor/distributor.go

pkg/featureflags/client_capability.go

- Moved all client capability logic to its own module
- Moved legacy (i.e. non-UTF-8-enabled) validation logic to the query path
- Implemented in v2 LabelNames and Series APIs
- Relaxed the write path to allow any valid UTF-8 label name
- Created HTTP and gRPC middleware for client capabilities
- Updated the OG UI to work with UTF-8 label name selection
- Removed utf8 label name from feature flags (since it is unrelated to feature flags now)

jake-kramer · 2025-09-23T18:06:22Z

I've verified:

- Writes work as expected
- HTTP and gRPC middleware both work as expected
- Reads work with the OG UI
- Writes work with profilecli

Breaking changes

- If a client writes a UTF-8 label name with the new logic but reads it with the old logic, they will see UTF-8 label names.
- Previously, writing `[labelName_1: val, labelName.1: val]` would resolve to just `[labelName_1: val]` on write (perhaps this was a logic bug?). With the new query logic but UTF-8 label names disabled, `labelName.1` would be converted to `labelName_1`, and an error would then be returned for duplicate label names.
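The duplicate-name collision described in the second bullet can be reproduced in a few lines of Go. This sketch assumes the legacy logic only rewrites `.` to `_` before checking for duplicates; `Label`, `sanitizeLegacy`, and `sanitizeAndCheckDuplicates` are hypothetical names, not Pyroscope functions.

```go
package main

import (
	"fmt"
	"strings"
)

// Label is a minimal hypothetical label pair.
type Label struct{ Name, Value string }

// sanitizeLegacy mimics the legacy behaviour discussed in this thread:
// "." is rewritten to "_" (simplified; the real code may do more).
func sanitizeLegacy(name string) string {
	return strings.ReplaceAll(name, ".", "_")
}

// sanitizeAndCheckDuplicates sanitizes every label name and returns an
// error if two distinct input names collapse to the same output name.
func sanitizeAndCheckDuplicates(labels []Label) ([]Label, error) {
	seen := make(map[string]string, len(labels)) // sanitized name -> original name
	out := make([]Label, 0, len(labels))
	for _, l := range labels {
		s := sanitizeLegacy(l.Name)
		if orig, dup := seen[s]; dup {
			return nil, fmt.Errorf("duplicate label name %q (from %q and %q)", s, orig, l.Name)
		}
		seen[s] = l.Name
		out = append(out, Label{Name: s, Value: l.Value})
	}
	return out, nil
}

func main() {
	_, err := sanitizeAndCheckDuplicates([]Label{
		{"labelName_1", "val"},
		{"labelName.1", "val"},
	})
	fmt.Println(err)
	// duplicate label name "labelName_1" (from "labelName_1" and "labelName.1")
}
```

The same collision is why sanitizing on read cannot be undone: once both names map to `labelName_1`, the original spelling is lost.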

Open questions cc @simonswine @bryanhuhta

- Is implementing this logic for only the v2 LabelNames and Series APIs sufficient?
- Are there any issues with the breaking changes above that this PR introduces?

TODOs

- Testing
- Documentation
- Integrating against other clients

- General cleanup
- Removed clientcapabilities package, moved back into the frontend package
- Using a struct for client capabilities in the context
- Added many tests

alsoba13 · 2025-09-25T11:11:12Z

TL;DR: Worth mentioning that recording rules are not 100% compatible with UTF-8. We should validate exported label names and metric name.

Extended:
Recording rules may target some profiles to be recorded. Those are aggregated and exported through remote_write v1. There's an experimental remote write v2 version that can handle this, but we are uncertain about Mimir's plans to support it. I think the best approach right now is to not support UTF-8 in recording rules.

From recording rules model https://github.com/grafana/pyroscope/blob/main/api/settings/v1/recording_rules.proto#L56-L92:

- `metric_name` should still be `[a-zA-Z_:]([a-zA-Z0-9_:])*`.
- `matchers` should be able to handle UTF-8: `{"service.name"="foo"}` should be okay.
- `group_by` should still be `[a-zA-Z_:]([a-zA-Z0-9_:])*`. These are label names to export, so `["service.name"]` must NOT be supported.
- `external_labels` label names should still be `[a-zA-Z_:]([a-zA-Z0-9_:])*`.

I propose validating these so we can make sure our rules are OK. There are two places to validate this: first, at recording rule creation time (the UI sending a POST request to tenant-settings), and second, when compaction workers fetch rules from tenant-settings at compaction time, basically wrapped in this constructor. You may find that some of these requirements are already met (I remember using some validations for metric_name, for example).
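A standard-library-only sketch of the proposed checks. `RecordingRule` is a hypothetical, flattened view of the proto fields listed above, not the generated type. One assumption departs from the comment: exported label names (`group_by`, `external_labels`) are checked against Prometheus' legacy label-name pattern `[a-zA-Z_][a-zA-Z0-9_]*`, which disallows colons (Prometheus reserves colons for metric names), while `metric_name` uses the pattern given above.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

var (
	// metric_name pattern as stated in the review comment.
	metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
	// Prometheus legacy label-name pattern (no colons); an assumption.
	legacyLabelNameRE = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
)

// RecordingRule is a hypothetical, flattened view of the fields discussed.
type RecordingRule struct {
	MetricName         string
	MatcherLabelNames  []string // label names used in matchers: UTF-8 allowed
	GroupBy            []string // exported label names: legacy only
	ExternalLabelNames []string // exported label names: legacy only
}

// validateRecordingRule collects every violation instead of stopping at the first.
func validateRecordingRule(r RecordingRule) []error {
	var errs []error
	if !metricNameRE.MatchString(r.MetricName) {
		errs = append(errs, fmt.Errorf("metric_name %q must match %s", r.MetricName, metricNameRE))
	}
	for _, n := range r.MatcherLabelNames {
		if n == "" || !utf8.ValidString(n) {
			errs = append(errs, fmt.Errorf("matcher label name %q must be non-empty valid UTF-8", n))
		}
	}
	for _, n := range r.GroupBy {
		if !legacyLabelNameRE.MatchString(n) {
			errs = append(errs, fmt.Errorf("group_by label %q must match %s", n, legacyLabelNameRE))
		}
	}
	for _, n := range r.ExternalLabelNames {
		if !legacyLabelNameRE.MatchString(n) {
			errs = append(errs, fmt.Errorf("external_labels name %q must match %s", n, legacyLabelNameRE))
		}
	}
	return errs
}

func main() {
	errs := validateRecordingRule(RecordingRule{
		MetricName:         "profiles_cpu_total",
		MatcherLabelNames:  []string{"service.name"}, // fine: matchers may use UTF-8
		GroupBy:            []string{"service.name"}, // rejected: exported label
		ExternalLabelNames: []string{"env"},
	})
	for _, err := range errs {
		fmt.Println(err)
	}
	// group_by label "service.name" must match ^[a-zA-Z_][a-zA-Z0-9_]*$
}
```

Running the same function both when a rule is created and when compaction workers load rules would keep the two validation points consistent.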

jake-kramer · 2025-09-25T17:25:27Z

Current State of the PR

Write behavior:
- All writes (v1 and v2) now only perform UTF-8 label name validation.
- Previously, label names were restricted to [a-zA-Z_.:]([a-zA-Z0-9_.:])* (with "." transformed to "_").
Read behavior (v2 APIs):
- Behavior depends on the allow-utf8-labelnames client capability set in the Accept header:
  - Enabled → Return label names as written.
  - Disabled → Fallback to the legacy logic (reject invalid names, transform "." → "_").

Known Issues

Mixed read/write semantics during rollout
- If a client writes a UTF-8 label name (new logic) but reads it with the old logic, they will see UTF-8 label names.
- This is expected during deployment and can be mitigated by feature flags that delay enabling new writes until rollout is complete.
Not implemented for v1 APIs
- Assumption: these APIs are not actively supported—can we confirm?
Label selectors and compatibility
- Any API that uses a label_selector may break if queries assume "." was transformed to "_", regardless of whether the client enabled the allow-utf8-labelnames client capability.

Proposed Modification

Change writes to also respect the allow-utf8-labelnames client capability. This would mean the existing write logic remains in place unless a client enables this capability.

Pros:
- Seamless deployment path without need for a runtime feature flag.
- Avoids breaking queries (e.g., label_selector) for clients that haven’t yet updated their capability setting.
Cons:
- Expands the compatibility matrix (UTF-8 on/off × read/write paths).
- Increases the risk of confusion if clients inconsistently set the capability across different calls.

marcsanmi · 2025-09-26T14:23:12Z

My two cents about it:

While client capabilities provide granular control, I think we should ideally keep a server-side feature flag as a kill switch, to allow emergency rollback without code changes.

> Assumption: these APIs are not actively supported—can we confirm?

I think the v1 APIs should be supported because:

- OSS users may still be using the traditional architecture (V2=false)
- AFAIK, no deprecation timeline has been communicated for the V1 frontend

> Label selectors and compatibility

I'm not sure about the best approach in a mixed-data scenario. The options I see are:

- Dual query approach: "temporarily" query both formats
- Data migration: a background job to normalize historical label names
- Query translation: automatically try both formats in label selectors
- Or we could just let it break

I believe the key decision is whether we want a clean migration (store UTF-8 only, handle compatibility at read/query time) or a safer migration (store based on client capability, accept permanently mixed data). Do we want to support both formats coexisting at the same time for a given user?

simonswine

Sorry it took a while to come back to you on this.

Let me address the points from your comment:

> Not implemented for v1 APIs

As per Marc's comment, this needs to be implemented in both implementations. (Note: there is not really a v1/v2 API; rather, v1 and v2 are different code paths implementing the same API.)

> Mixed read/write semantics during rollout

I think small inconsistencies are acceptable during the rollout of the new Pyroscope version. Even if the rollout were instant, there would still be old data that complied with the old push validation. I think this is the price we have to pay for not implementing full forward/backward compatibility for label names outside of the legacy validation.

> Label selectors and compatibility

I agree this is a problem, and there are further problems with translating label names on the read path: translation is not reversible, so every follow-up call (all calls with a selector) would need to implement it too. That is why we said in https://github.com/grafana/pyroscope-squad/issues/434#issuecomment-3113813464 that we don't do such translation, and instead filter out non-legacy-compatible names in the Series/LabelNames calls.

So what I am proposing is to change the PR to:

- Filter on read (not sanitize on read)
- Scope this PR to the backend and read path only (both v1/v2 implementations of the frontend)
- Create a separate new PR for the frontend to use the client capabilities defined here
- Create a separate new PR that modifies the write path to no longer sanitise label names, makes this controllable on a per-tenant basis, and publishes its state through a new feature flag, something like `utf8LabelNamesWritePath`
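A rough sketch of the proposed per-tenant write-path switch. Everything here is hypothetical: `tenantLimits` stands in for whatever per-tenant override mechanism is actually used, the flag is named after the `utf8LabelNamesWritePath` suggestion, and the legacy branch is simplified to rewriting `.` to `_` and then accepting only `[a-zA-Z0-9_]`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// tenantLimits is a hypothetical per-tenant override table:
// tenant ID -> utf8LabelNamesWritePath enabled.
type tenantLimits map[string]bool

// validateWriteLabelName applies the write-path policy to one label name.
// With the tenant flag on, any non-empty valid UTF-8 name is kept as-is;
// otherwise a simplified legacy rule applies ("." -> "_", then only
// ASCII letters, digits and '_' are accepted).
func validateWriteLabelName(limits tenantLimits, tenant, name string) (string, error) {
	if name == "" || !utf8.ValidString(name) {
		return "", errors.New("label name must be non-empty valid UTF-8")
	}
	if limits[tenant] { // missing tenants default to false (legacy behaviour)
		return name, nil
	}
	sanitized := strings.ReplaceAll(name, ".", "_")
	for _, r := range sanitized {
		ok := r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9')
		if !ok {
			return "", fmt.Errorf("invalid legacy label name %q", name)
		}
	}
	return sanitized, nil
}

func main() {
	limits := tenantLimits{"tenant-a": true}
	fmt.Println(validateWriteLabelName(limits, "tenant-a", "service.name")) // service.name <nil>
	fmt.Println(validateWriteLabelName(limits, "tenant-b", "service.name")) // service_name <nil>
}
```

Defaulting unknown tenants to the legacy branch keeps the change opt-in, which fits the idea of a server-side switch that can be turned off without a code change.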

pkg/frontend/readpath/queryfrontend/query_label_names.go

pkg/frontend/readpath/queryfrontend/query_series_labels.go

simonswine · 2025-09-29T08:43:16Z

public/app/components/TagsBar.tsx

```diff
 // Identifies whether a label is in a query or not
 function isLabelInQuery(query: string, label: string, labelValue: string) {
-  return query.includes(`${label}="${labelValue}"`);
+  return query.includes(`"${label}=${labelValue}"`);
```


I would do this in a separate PR to keep the scope as tight as possible.

I also think this would be the correct way to double-quote UTF-8 names in selectors:

Suggested change:

```diff
- return query.includes(`"${label}=${labelValue}"`);
+ return query.includes(`"${label}"="${labelValue}"`);
```
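For illustration, the quoting rule can be written as a small Go helper. `formatMatcher` is hypothetical: it produces the `{"service.name"="foo"}` form for non-legacy names and leaves legacy names unquoted, and it assumes Go's `strconv.Quote` escaping is acceptable for selector string literals. The legacy test uses Prometheus' `[a-zA-Z_][a-zA-Z0-9_]*` label-name pattern.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// legacyLabelNameRE is Prometheus' legacy label-name pattern.
var legacyLabelNameRE = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

// formatMatcher renders name="value", double-quoting the label name only
// when it is not a legacy name (hypothetical helper).
func formatMatcher(name, value string) string {
	n := name
	if !legacyLabelNameRE.MatchString(name) {
		n = strconv.Quote(name)
	}
	return n + "=" + strconv.Quote(value)
}

func main() {
	fmt.Println("{" + formatMatcher("service_name", "foo") + "}") // {service_name="foo"}
	fmt.Println("{" + formatMatcher("service.name", "foo") + "}") // {"service.name"="foo"}
}
```

Because `isLabelInQuery` does a plain substring match, it only finds a label if the query contains exactly the form it searches for, so the code that writes matchers into the query and the code that searches for them need to agree on one quoting style.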

simonswine · 2025-09-29T08:53:38Z

pkg/featureflags/client_capability.go

```diff
+		// TODO add metrics = # requests like this and # clients [need
+		//  labels for requests and clients/tenet and user agent(?)]
```


Metrics like this are likely very costly (high cardinality), so I would advise against it. If we need anything like this, we should already be logging those headers (maybe double-check that this works as expected, but we are setting `-server.log-request-headers`).

I'm considering adding a counter metric with low-cardinality labels (tenant + capability name). Any concerns?

Yes, that makes a lot of sense; probably a good follow-up PR.

Followed up here: #4498

cmd/profilecli/client_test.go

- Removes profilecli and OG frontend changes (will go in separate PRs)
- Updates the read path to filter instead of sanitize on read
- Reverts the utf8 featureflag removal (will be used in a separate PR)

jake-kramer · 2025-10-06T20:47:46Z

Taking this PR out of draft mode 🎉


simonswine

I think this is very close to LGTM. I have a couple of suggestions; after those, I am happy for you to merge this.

pkg/frontend/readpath/queryfrontend/query_series_labels.go

pkg/querier/querier_test.go

jake-kramer · 2025-10-08T16:55:20Z


pyroscope/pkg/settings/recording/recording.go

Lines 279 to 290 in e0e688b

    
```go
for _, l := range req.GroupBy {
	name := prom.LabelName(l)
	if !prom.UTF8Validation.IsValidLabelName(string(name)) {
		errs = append(errs, fmt.Errorf("group_by label %q must match %s", l, prom.LabelNameRE.String()))
	}
}
for _, l := range req.ExternalLabels {
	name := prom.LabelName(l.Name)
	if !prom.UTF8Validation.IsValidLabelName(string(name)) {
		errs = append(errs, fmt.Errorf("external_labels name %q must be a valid utf-8 string", l.Name))
	}
}
```

@alsoba13 If I'm understanding the code correctly, label names from recording rules are currently validated as UTF-8 (because of this code) rather than with Prometheus' legacy validation. I can address this, but in a separate PR, since it's orthogonal to label name compatibility for profiles.

simonswine · 2025-10-09T16:19:32Z

pkg/querier/querier.go

```diff
+	for _, name := range toFilter {
+		if _, _, ok := validation.SanitizeLegacyLabelName(name); !ok {
+			level.Debug(q.logger).Log("msg", "filtering out label", "label_name", name)
+			continue
+		}
+		filtered = append(filtered, name)
+	}
+	return filtered, nil
```


This is not necessary, as LabelNames should already filter them with the same logic.

The logic is required when `len(req.Msg.LabelNames) != 0`; I'll add it for that case.

jake-kramer requested review from bryanhuhta and simonswine September 17, 2025 19:26

jake-kramer commented Sep 17, 2025


pkg/featureflags/client_capability.go

pkg/featureflags/client_capability.go (outdated)

pkg/featureflags/client_capability.go (outdated)

jake-kramer force-pushed the utf-label-names branch from 6f6e55d to 72a3b80 September 17, 2025 19:31

simonswine reviewed Sep 18, 2025


pkg/featureflags/client_capability.go (outdated)

pkg/distributor/distributor.go (outdated)

pkg/featureflags/client_capability.go (outdated)

jake-kramer changed the title ~~DRAFT feat: Allow utf-8 characters in label names~~ WIP feat: Allow utf-8 characters in label names Sep 23, 2025

jake-kramer added 3 commits September 23, 2025 16:14

Prettier format

f651740

Handle multiple values set for Accept header

6e8c895

Address comments

a2b48a1


simonswine reviewed Sep 29, 2025


jake-kramer added 6 commits October 6, 2025 10:05

Merge branch 'main' into utf-label-names

d28e172

Filter (instead of sanitize) on read

8b0a87e


Finish v2 read path filtering

12c20e7

v1 read path filtering

52d3ed3

Add tests

cb7c9b2

fmt/lint

705b47e

jake-kramer changed the title ~~WIP feat: Allow utf-8 characters in label names~~ feat: Implement utf8 label name client capability Oct 6, 2025

jake-kramer added 2 commits October 6, 2025 16:28

Fix comment on legacy label name valid chars

84bc1b2

Fix race condition in test

b9d1ec2

jake-kramer marked this pull request as ready for review October 6, 2025 20:48

jake-kramer requested review from a team, aleks-p and marcsanmi as code owners October 6, 2025 20:48

Merge branch 'main' into utf-label-names

b2888fa

simonswine approved these changes Oct 8, 2025


pkg/frontend/readpath/queryfrontend/query_series_labels.go

pkg/querier/querier_test.go (outdated)

simonswine assigned jake-kramer Oct 8, 2025

jake-kramer added 3 commits October 8, 2025 09:43

Use LabelNames instead of backdoor query

9b0d790

Update querier label name filtering test

f41512f

fmt

55810eb

This was referenced Oct 8, 2025

feat: Allow utf-8 label names in write path #4489

Open

feat: Use allow-utf8-labelnames client capability in profilecli #4490

Merged

feat: Use allow-utf8-labelnames client capability in UI #4493

Merged

jake-kramer merged commit c26f585 into main Oct 9, 2025
20 checks passed

jake-kramer deleted the utf-label-names branch October 9, 2025 12:57

simonswine reviewed Oct 9, 2025


4 participants
