@tetolv

@tetolv tetolv commented Jan 25, 2026

I'm trying to reduce the size of my NativeAOT application and am looking for areas where something could be trimmed off. One such area is OpenTelemetry .NET, which uses Regex parsing extensively. I decided to check how much could be trimmed off by replacing all the regular expressions with manual code.

Changes

There are three general usages of regular expressions in OpenTelemetry .NET, and here is how I have dealt with each of them:

Instrument name validation

Instrument name validation uses a simple expression, ^[a-z][a-z0-9-._/]{0,254}$, which matches any string of ASCII letters, digits, and the characters -, ., _ and /, up to 255 characters in total (one leading character plus up to 254 more), where the first character must be an ASCII letter. This pattern is rather simple to implement in code, as done in this PR.
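For illustration, a hand-written check equivalent to the pattern as quoted could look roughly like the sketch below. This is not the code from the PR; the class and method names are hypothetical. It follows the pattern literally (lowercase letters only), so if the real regex is constructed with RegexOptions.IgnoreCase, the letter check would need to accept uppercase too.

```csharp
public static class InstrumentNameValidator
{
    // One leading letter plus up to 254 further characters.
    private const int MaxLength = 255;

    // Hypothetical helper: mirrors ^[a-z][a-z0-9-._/]{0,254}$ as quoted above.
    public static bool IsValid(string name)
    {
        if (string.IsNullOrEmpty(name) || name.Length > MaxLength)
        {
            return false;
        }

        // The first character must be an ASCII lowercase letter.
        if (!IsLowerAsciiLetter(name[0]))
        {
            return false;
        }

        // Remaining characters: letters, digits, '-', '.', '_' or '/'.
        for (int i = 1; i < name.Length; i++)
        {
            char c = name[i];
            bool allowed = IsLowerAsciiLetter(c)
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '/';

            if (!allowed)
            {
                return false;
            }
        }

        return true;
    }

    private static bool IsLowerAsciiLetter(char c) => c >= 'a' && c <= 'z';
}
```

One behavioural difference to be aware of: in .NET, `$` without RegexOptions.Multiline also matches just before a trailing `\n`, so the regex accepts a name that ends with a newline, while this sketch rejects it.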

OTEL_DIAGNOSTICS.json parsing

The .json file itself is rather simple: it has only four predefined fields and it does not come from outside, so security concerns during parsing are low. My parsing approach might not be perfect. I have only tested it with some variations of syntactically correct input, so more testing might be needed.

Wildcard match

It is used where a newly registered trace or meter is matched against the list of traces or meters to listen to. Here I have taken the base source from the popular WildcardMatch NuGet package and adapted it a little to the code standards of the OpenTelemetry .NET project.
All of those usages are rather simple and could be replaced with manual code.
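To make the wildcard case concrete, here is a minimal sketch of regex-free matching. It is not the WildcardMatch NuGet code and not the code from the PR: it is a plain two-pointer matcher with backtracking, and it assumes that `*` matches any run of characters, `?` matches exactly one character, and comparison is case-insensitive. All names are hypothetical.

```csharp
using System.Collections.Generic;

public static class WildcardMatcher
{
    // Returns true if the whole input matches the pattern.
    public static bool IsMatch(string pattern, string input)
    {
        int p = 0, i = 0;
        int starP = -1; // position of the most recent '*' in the pattern
        int starI = 0;  // input position that '*' currently starts covering

        while (i < input.Length)
        {
            if (p < pattern.Length && pattern[p] == '*')
            {
                // Remember the star and first try to let it match nothing.
                starP = p++;
                starI = i;
            }
            else if (p < pattern.Length && (pattern[p] == '?' || CharEquals(pattern[p], input[i])))
            {
                p++;
                i++;
            }
            else if (starP >= 0)
            {
                // Backtrack: let the last '*' absorb one more input character.
                p = starP + 1;
                i = ++starI;
            }
            else
            {
                return false;
            }
        }

        // Only trailing '*' may remain in the pattern.
        while (p < pattern.Length && pattern[p] == '*')
        {
            p++;
        }

        return p == pattern.Length;
    }

    // Matches a name against each pattern in turn instead of one combined regex.
    public static bool MatchesAny(IEnumerable<string> patterns, string input)
    {
        foreach (string pattern in patterns)
        {
            if (IsMatch(pattern, input))
            {
                return true;
            }
        }

        return false;
    }

    private static bool CharEquals(char a, char b) =>
        char.ToUpperInvariant(a) == char.ToUpperInvariant(b);
}
```

With this sketch, `WildcardMatcher.IsMatch("MyCompany.*", "MyCompany.Orders")` is true and `WildcardMatcher.IsMatch("a?c", "ac")` is false.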

Unfortunately, I have committed all three changes as a single commit, but I hope that it is still possible to understand what is where.

Results

Using the Sizoscope tool, it is possible to compare what was included in the NativeAOT binary before and after the changes. For my changes I ended up with the following change summary:
[image: Sizoscope change summary]
System.Text.RegularExpressions is completely gone, and System.Private.CoreLib, which contained a lot of generated collection classes used by compiled regex objects, is trimmed significantly. In total, the binary size is reduced by 734 KB, which in my opinion is a great result.

Related issues

#5785

@linux-foundation-easycla

linux-foundation-easycla bot commented Jan 25, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@github-actions github-actions bot added the pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package label Jan 25, 2026
@martincostello
Member

Thanks for the PR.

Some quick thoughts ahead of a fuller review next week:

  • If we take any of the changes that result in meaningful AoT size reductions, we should have a way to validate that it doesn't regress. Someone could do a PR in the future to use a Regex again and undo the optimisations.
  • For the code taken from another repository, we should credit it with a comment in the source that includes the exact commit it was based on.
  • I wonder if we could consider (separate to this PR) if it would be better to depend on System.Text.Json to parse the configuration file, rather than hand-rolling code. It would be more robust/reliable and we could use the source generator in newer versions of .NET.
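As a rough idea of what the System.Text.Json route with the source generator might look like, here is a minimal sketch. The type names and the property names (LogDirectory, FileSize, LogLevel) are assumptions made for illustration, since the thread does not list the fields of the file. It relies on the System.Text.Json source generator being available at build time.

```csharp
#nullable enable
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical shape of the configuration file; the property names are assumed.
public sealed class DiagnosticsConfig
{
    public string? LogDirectory { get; set; }

    public int FileSize { get; set; }

    public string? LogLevel { get; set; }
}

// The source generator emits the serialization metadata for this context at
// compile time, so no reflection-based serialization is needed at runtime.
[JsonSerializable(typeof(DiagnosticsConfig))]
internal partial class DiagnosticsConfigContext : JsonSerializerContext
{
}

public static class DiagnosticsConfigReader
{
    // Returns null for a JSON "null" document; throws JsonException on malformed input.
    public static DiagnosticsConfig? Parse(string json) =>
        JsonSerializer.Deserialize(json, DiagnosticsConfigContext.Default.DiagnosticsConfig);
}
```

Whether this ends up smaller or larger than a Regex-based or hand-written parser in a NativeAOT binary would have to be measured.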

@martincostello
Member

In fact, for newer versions of .NET the Regex usage could be retained but use the Regex source generator. Native AoT size reduction isn't important for TFMs that don't work with it (i.e. .NET Framework) as it's not supported anyway.

That would potentially make additional savings anyway (the source generator can generate simple C# equivalents for simple patterns) without us needing to maintain such code ourselves.

I would suggest investigating the size improvements from just adopting the Regex source generator, rather than rewriting the code as in this PR in its current state.

@tetolv
Author

tetolv commented Jan 25, 2026

Hi, @martincostello
For now I just wanted to make sure that you are interested in this kind of contribution.
I will investigate the regex source generator approach in the next few days.
As for potential regressions when a developer decides to include new Regex expressions in the code, I guess that it could be checked somehow via a custom target in one of the .targets files. I will think about it...

@tetolv tetolv marked this pull request as ready for review January 25, 2026 11:35
@tetolv tetolv requested a review from a team as a code owner January 25, 2026 11:35
@tetolv
Author

tetolv commented Jan 26, 2026

I have tried to use Regex source generation in SelfDiagnosticsConfigParser, which parses the OTEL_DIAGNOSTICS.json file. The result, compared to my previous version with manual parsing, is the following:

  • baseline uses regex source generation
  • compare is manual parsing
[image: Sizoscope comparison of the two builds]

Most of the additional weight in System.Private.CoreLib comes from System.Buffers.SearchValues, which the autogenerated code in System.Text.RegularExpressions.Generated uses heavily.
Here lies the uncertainty about how much weight will actually be trimmed off. If OpenTelemetry .NET also used System.Buffers.SearchValues elsewhere, it would not show up in the difference. On the other hand, when I build my NativeAOT binary, my own code may use the same .NET functionality that either compiled regexes or source-generated regexes use, so this functionality will never appear as trimmed off. I believe that if I made a dedicated bare-bones "Hello world" application using OpenTelemetry .NET, the differences between all three solutions (custom code, compiled regexes, source-generated regexes) would be greater.

I haven't tried source-generated regexes for the other two cases, because I think that custom parsing for valid trace and meter names is simple enough already, and matching listener subscriptions to meters and traces via wildcards is based on proven wildcard matching code and shouldn't require much maintenance in the future. It also enables additional simplification elsewhere in the code.

@martincostello
Member

I still think the best first step should be to just move to the Regex source generator as it's most similar to the existing code, then we could consider doing something else if needed.

We already use the Regex source generator in open-telemetry/opentelemetry-dotnet-contrib, we just haven't used it here yet. If I was aware we had usages in this repo we could upgrade, I'd have already done it myself.

Something to maintain (custom code) is always more expensive than something we don't (the generated code the .NET team supports).

I'm happy to do a PR to do that as a precursor if that's not something you'd like to contribute yourself.

@tetolv
Author

tetolv commented Jan 26, 2026

  • ✅ In the case of the instrument name regex, we have a static regex, ^[a-z][a-z0-9-._/]{0,254}$. It could be moved to a [GeneratedRegex(...)] attribute.
  • ✅ In the case of OTEL_DIAGNOSTICS.json file parsing, we also have static regexes, one for each of the fields (in fact I have tried a generalized regex, \"(\w+)\"[\s\r\n]*:[\s\r\n]*\"?([^},]*?)["},], but that's optional).
  • ❌ In the case of wildcard matching of trace and meter names against the sources that the user wants to listen to, it is not possible to get a static regex that could be refactored into GeneratedRegexAttribute, because the wildcard templates are provided at runtime. There we still have to come up with a different solution. Also, I don't really like how all the sources the user wishes to listen to are combined into one single regex. I think it is a more complex solution than matching every trace and meter against every source that should be listened to.
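For the first bullet, moving a statically known pattern to the source generator could look roughly like the sketch below. The class and member names are hypothetical, and any RegexOptions that the existing code passes would need to be carried over to the attribute. [GeneratedRegex] requires .NET 7 or later, so older target frameworks fall back to a regular Regex instance.

```csharp
using System.Text.RegularExpressions;

internal static partial class InstrumentNameRegex
{
    // Pattern exactly as quoted in this thread.
    private const string Pattern = @"^[a-z][a-z0-9-._/]{0,254}$";

#if NET7_0_OR_GREATER
    // The Regex source generator emits the matching code at compile time.
    [GeneratedRegex(Pattern)]
    private static partial Regex Create();
#else
    // Fallback for target frameworks without the source generator.
    private static readonly Regex Cached = new Regex(Pattern, RegexOptions.Compiled);

    private static Regex Create() => Cached;
#endif

    public static bool IsMatch(string name) => Create().IsMatch(name);
}
```

This only works because the pattern is a compile-time constant, which is exactly what the wildcard case in the last bullet lacks.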

martincostello added a commit to martincostello/opentelemetry-dotnet that referenced this pull request Jan 27, 2026
Use `[GeneratedRegex]`, where supported, for statically-known regular expressions.

See open-telemetry#6849.

Co-Authored-By: Mihail Golubev <211329375+tetolv@users.noreply.github.com>
@martincostello
Member

I opened a draft PR to investigate adopting the Regex source generator where possible: #6850

Overall, with regard to native AoT file size only, the end result from my testing is that the size always goes up, which is the opposite of this PR's goal.

The apps I've tested with probably still have Regex usage in them somewhere anyway (in some other 3rd-party dependency, or within ASP.NET Core itself), so the increase comes from the generated code for those Regexes, a cost that is paid at compile time rather than at runtime (I imagine the total in-memory size would be pretty much equivalent).

I haven't run any of the benchmarks to see if there's a performance benefit of using the source generator, though it is recommended for use by source analyzers which implies that swapping over is a net improvement in a general sense (even if it increases AoT codesize).

For an app that doesn't use Regex anywhere, then I can see that removing Regex from the libraries here would save that size. However, that's effectively a ban on ever using Regexes in the codebase, otherwise it would regress this specific scenario/optimisation.

While we could optimise the size now, I don't think we would want to constrain ourselves going forward by preventing any future use of Regex just to save on the image size, when I imagine the presence of Regex in applications using the OpenTelemetry SDK is likely common.

In other words, we could optimize the size by removing Regex, but it could get added back again later and render the work ultimately futile.

@tetolv
Author

tetolv commented Jan 27, 2026

I agree that your app most probably already has compiled regexes somewhere, so the code for the source-generated regexes was simply added on top of the existing code for compiled regexes. That is why you see the binary size increase instead of decrease. At the same time, the services I'm using as a testbed are intentionally kept minimalistic, so I notice every increase in their size and try to deal with it if possible. Also, the modern tendency towards microservices presumes that apps (services) will be small and minimalistic, so they will definitely benefit from every size optimization on the OpenTelemetry side.
Another thing is that if Microsoft is pushing a shift from compiled or interpreted regexes towards source-generated ones, then we will see more and more of them in future apps.

Look, I have added my version of SelfDiagnosticsConfigParser. It uses a single regex instead of four separate ones, which makes the class itself a little simpler and also helps from the binary size standpoint, because four separate source-generated implementations take more space due to repetitive boilerplate code.

The question remains what to do with wildcard matching, because I don't see how it is possible to use source-generated regexes with it.

@tetolv
Author

tetolv commented Jan 27, 2026

Another area I'm looking at is mTLS support, recently added to OpenTelemetry.Exporter.OpenTelemetryProtocol. I guess not many people are using it yet, but it adds 233 KB to my application. I'm wondering whether it's possible to add it to the builder explicitly, so that the static analyzer would be able to determine whether mTLS support should be compiled in or not.
Do you think it is worth it?

@martincostello
Member

Let's see what some of the other maintainers think first before sinking more time into this.

Personally, I don't think we're currently at a point where the benefit of specifically considering code size impact for native AoT scenarios for any change/feature (e.g. it was not a consideration for supporting mTLS) made to the repository is worth the maintenance overhead/trade-off.
