Reducing NativeAOT binary size by getting rid of regex parsing #6849
|
Thanks for the PR. Some quick thoughts ahead of a fuller review next week:
|
|
In fact, for newer versions of .NET the Regex usage could be retained but switched to the Regex source generator. Native AoT size reduction isn't important for TFMs that don't work with it (i.e. .NET Framework), as it's not supported there anyway. That would potentially yield additional savings (the source generator can generate simple C# equivalents for simple patterns) without us having to maintain such code ourselves. I would suggest investigating the size improvements from just adopting the Regex source generator, rather than rewriting the code as in this PR in its current state.
|
Hi, @martincostello |
|
I still think the best first step should be to just move to the Regex source generator, as it's most similar to the existing code; then we could consider doing something else if needed. We already use the Regex source generator in open-telemetry/opentelemetry-dotnet-contrib, we just haven't used it here yet. If I had been aware we had usages in this repo we could upgrade, I'd have already done it myself. Something we have to maintain (custom code) is always more expensive than something we don't (the generated code the .NET team supports). I'm happy to do a PR to do that as a precursor if that's not something you'd like to contribute yourself.
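For reference, a minimal sketch of what moving a statically-known pattern to the Regex source generator could look like, using the instrument name expression quoted in the PR description. The class and method names are hypothetical, and `RegexOptions.IgnoreCase` is an assumption based on the description's "ASCII letters" wording. `[GeneratedRegex]` requires .NET 7 or later, so older TFMs would need to keep the existing `Regex` construction.

```csharp
using System.Text.RegularExpressions;

// Hypothetical type and member names, for illustration only.
internal static partial class InstrumentNamePattern
{
    // The source generator emits the matching code at build time,
    // so the pattern is not parsed or compiled at run time.
    // RegexOptions.IgnoreCase is an assumption, not taken from the PR.
    [GeneratedRegex(@"^[a-z][a-z0-9-._/]{0,254}$", RegexOptions.IgnoreCase)]
    private static partial Regex NameRegex();

    public static bool IsValid(string name) => NameRegex().IsMatch(name);
}
```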
|
Use `[GeneratedRegex]`, where supported, for statically-known regular expressions. See open-telemetry#6849. Co-Authored-By: Mihail Golubev <211329375+tetolv@users.noreply.github.com>
|
I opened a draft PR to investigate adopting the Regex source generator where possible: #6850

Overall, with regard to native AoT file size only, the end result from my testing is that the size always goes up, which is the opposite of this PR's goal. The apps I've tested with probably still have Regex usage in them somewhere anyway (in some other third-party dependency, or within ASP.NET Core itself), so the increase comes from the code for those Regexes, which is paid for at compile time rather than at runtime (I imagine the total in-memory size would be pretty much equivalent).

I haven't run any of the benchmarks to see if there's a performance benefit from using the source generator, though it is recommended by source analyzers, which implies that swapping over is a net improvement in a general sense (even if it increases AoT code size).

For an app that doesn't use Regex anywhere, I can see that removing Regex from the libraries here would save that size. However, that's effectively a ban on ever using Regexes in the codebase, otherwise it would regress this specific scenario/optimisation. While we could optimise the size now, I don't think we would want to constrain ourselves going forward by preventing any future use of Regex just to save on image size, when I imagine the presence of Regex in applications using the OpenTelemetry SDK is likely common.

In other words, we could optimize the size by removing Regex, but it could get added back again later and render the work ultimately futile.
|
I agree that your app most probably already has a compiled regex somewhere, so the code for the source-generated regexes was simply added to the existing code for compiled regexes; that's why you see the binary size increase instead of decrease.

At the same time, the services I'm using as a testbed are intentionally kept minimalistic, so I notice every increase in their size and try to deal with it if possible. Also, the modern tendency towards microservices presumes that apps (services) will be small and minimalistic, so they will definitely benefit from every size optimization on OpenTelemetry's side.

Look, I have added my version of

A question remains about what to do with wildcard matching, because I don't see how it is possible to use source-generated regexes with it.
|
Another area I'm looking at is mTLS support, recently added to |
|
Let's see what some of the other maintainers think first before sinking more time into this. Personally, I don't think we're currently at a point where the benefit of specifically considering the code size impact on native AoT scenarios for every change or feature made to the repository (e.g. it was not a consideration for supporting mTLS) is worth the maintenance overhead/trade-off.

I'm trying to reduce my NativeAOT application size and am looking for areas where something could be trimmed off. One such area was OpenTelemetry .NET, which extensively uses Regex parsing. I decided to check how much could be trimmed off by replacing all the regular expressions with manual code.
Changes
There are three general usages of regular expressions in OpenTelemetry .NET, and here is how I have dealt with each of them:
Instrument name validation
Instrument name validation has a simple expression, `^[a-z][a-z0-9-._/]{0,254}$`, which means all strings made of ASCII letters, numbers, and the non-delimiter symbols `-`, `.`, `_` and `/`, up to 254 symbols long, not counting the first symbol, which can only be an ASCII letter. This pattern is rather simple to implement in code, as done in this PR.
OTEL_DIAGNOSTICS.json parsing
The .json file itself is rather simple, with only four predefined fields, and it does not come from outside, so security vulnerability concerns during parsing are low. My parsing approach might not be perfect: I have only tested it with some variations of syntactically correct input, so more testing might be needed.
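To illustrate the instrument name check, here is a minimal hand-written sketch of the pattern `^[a-z][a-z0-9-._/]{0,254}$`. The class and method names are hypothetical and the code is not copied from the PR. It assumes case-insensitive matching, following the "ASCII letters" wording above.

```csharp
// Hypothetical helper; names are illustrative and not taken from the PR.
internal static class InstrumentNameValidator
{
    // One leading letter plus up to 254 further symbols.
    private const int MaxLength = 255;

    public static bool IsValid(string name)
    {
        if (string.IsNullOrEmpty(name) || name.Length > MaxLength)
        {
            return false;
        }

        // First symbol: ASCII letter only.
        if (!IsAsciiLetter(name[0]))
        {
            return false;
        }

        // Remaining symbols: ASCII letter, digit, or one of - . _ /
        for (int i = 1; i < name.Length; i++)
        {
            char c = name[i];
            bool allowed = IsAsciiLetter(c)
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '/';
            if (!allowed)
            {
                return false;
            }
        }

        return true;
    }

    // Assumption: upper-case letters are accepted as well as lower-case ones.
    private static bool IsAsciiLetter(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
```

One behavioural difference worth noting: in .NET, `$` also matches just before a final newline, so the regex accepts a name with a trailing `\n`, while this sketch rejects it.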
Wildcard match
It is used where a newly registered trace or meter is matched against a list of traces or meters to listen to. Here I have taken the base source from the popular WildcardMatch NuGet package and adapted it a little to the code standards of the OpenTelemetry .NET project.
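As a rough illustration of regex-free wildcard matching, here is a minimal sketch. It is not the WildcardMatch code and not the code from this PR. It assumes that `*` matches any run of characters (including none), that `?` matches exactly one character, and that comparison is case-insensitive; the names are hypothetical.

```csharp
// Hypothetical helper; a simplified sketch, not the WildcardMatch implementation.
internal static class WildcardSketch
{
    public static bool IsMatch(string pattern, string text)
    {
        int p = 0;          // current position in pattern
        int t = 0;          // current position in text
        int starP = -1;     // position of the most recent '*' in pattern
        int starT = 0;      // text position that '*' is currently matched up to

        while (t < text.Length)
        {
            if (p < pattern.Length && pattern[p] == '*')
            {
                // Remember the star and first try matching it against nothing.
                starP = p++;
                starT = t;
            }
            else if (p < pattern.Length && (pattern[p] == '?' || CharsEqual(pattern[p], text[t])))
            {
                p++;
                t++;
            }
            else if (starP >= 0)
            {
                // Mismatch: let the last '*' absorb one more character and retry.
                p = starP + 1;
                t = ++starT;
            }
            else
            {
                return false;
            }
        }

        // Only trailing '*' may remain in the pattern.
        while (p < pattern.Length && pattern[p] == '*')
        {
            p++;
        }

        return p == pattern.Length;
    }

    // Assumption: case-insensitive comparison using invariant casing.
    private static bool CharsEqual(char a, char b) =>
        char.ToUpperInvariant(a) == char.ToUpperInvariant(b);
}
```

For example, `WildcardSketch.IsMatch("MyCompany.*", "MyCompany.Orders")` returns `true`, while `WildcardSketch.IsMatch("a?c", "ac")` returns `false`.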
All of those usages are rather simple and could be replaced with manual code.
Unfortunately, I have committed all three changes as a single commit, but I hope it is still possible to tell which change is which.
Results
Using the Sizoscope tool, it is possible to compare what was included in the NativeAOT binary before and after the changes. For my changes I ended up with the following change summary:

`System.Text.RegularExpressions` is completely gone, and `System.Private.CoreLib`, which contained a lot of generated collection classes used by compiled regex objects, is trimmed significantly. In total, the binary size is reduced by 734 KB, which in my opinion is a great result.
Related issues
#5785