Skip to content

MSBuildNameIgnoreCaseComparer might be optimizable #10926

@rainersigwald

Description

@rainersigwald

tl;dr: I don't think we should change anything now but I did some perf investigation on MSBuildNameIgnoreCaseComparer.

/// This is a custom string comparer that has three advantages over the regular
/// string comparer:
/// 1) It can generate hash codes and perform equivalence operations on parts of a string rather than a whole
/// 2) It uses "unsafe" pointers to maximize performance of those operations
/// 3) It takes advantage of limitations on MSBuild Property/Item names to cheaply do case insensitive comparison.
/// </summary>
[Serializable]
internal class MSBuildNameIgnoreCaseComparer : IConstrainedEqualityComparer<string>, IEqualityComparer<string>

. . . but does that stuff hold true any more? There's been a bit of change since (checks notes) 2010, when it was introduced.

I pulled a copy into a tiny BDN application, seeded with ~30k actual comparisons sourced from a real-world .NET 9 build.

Instrumentation to collect that:

diff --git a/src/Shared/MSBuildNameIgnoreCaseComparer.cs b/src/Shared/MSBuildNameIgnoreCaseComparer.cs
index 9517b5f2646..23b11f824c5 100644
--- a/src/Shared/MSBuildNameIgnoreCaseComparer.cs
+++ b/src/Shared/MSBuildNameIgnoreCaseComparer.cs
@@ -3,6 +3,8 @@
 
 using System;
 using System.Collections.Generic;
+using System.IO;
+
 using Microsoft.Build.Shared;
 
 #nullable disable
@@ -19,6 +21,9 @@ namespace Microsoft.Build.Collections
     [Serializable]
     internal class MSBuildNameIgnoreCaseComparer : IConstrainedEqualityComparer<string>, IEqualityComparer<string>
     {
+        private StreamWriter EqualsLogger = new($@"S:\temp\MSBuildNameIgnoreCaseComparerEquals{Guid.NewGuid()}.txt");
+        private StreamWriter HashLogger = new($@"S:\temp\MSBuildNameIgnoreCaseComparerHash{Guid.NewGuid()}.txt");
+
         /// <summary>
         /// The processor architecture on which we are running, but default it will be x86
         /// </summary>
@@ -44,6 +49,8 @@ namespace Microsoft.Build.Collections
         /// </summary>
         public bool Equals(string compareToString, string constrainedString, int start, int lengthToCompare)
         {
+            EqualsLogger.Write($"{compareToString}\u001F{constrainedString}\u001F{start}\u001F{lengthToCompare}\u001E");
+
             if (lengthToCompare < 0)
             {
                 ErrorUtilities.ThrowInternalError("Invalid lengthToCompare '{0}' {1} {2}", constrainedString, start, lengthToCompare);
@@ -116,6 +123,8 @@ namespace Microsoft.Build.Collections
                 return 0; // per BCL convention
             }
 
+            EqualsLogger.Write($"{obj}\u001F{start}\u001F{length}\u001E");
+
             if ((s_runningProcessorArchitecture != NativeMethodsShared.ProcessorArchitectures.IA64)
                 && (s_runningProcessorArchitecture != NativeMethodsShared.ProcessorArchitectures.ARM))
             {

Benchmark program:
MSBuildNameIgnoreCaseComparer_Bench.zip

Benchmark results

13900K desktop

BenchmarkDotNet v0.14.0, Windows 11 (10.0.26120.2200)
Unknown processor
.NET SDK 9.0.100-rc.2.24474.11
  [Host]               : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2
  .NET 9.0             : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2
  .NET Framework 4.7.2 : .NET Framework 4.8.1 (4.8.9032.0), X64 RyuJIT VectorSize=256
Method Job Runtime Mean Error StdDev Ratio RatioSD
MSBuildNameIgnoreCase .NET 9.0 .NET 9.0 9.845 ns 0.1375 ns 0.1286 ns 1.00 0.02
BuiltIn .NET 9.0 .NET 9.0 9.147 ns 0.1347 ns 0.1194 ns 0.93 0.02
MSBuildNameIgnoreCase .NET Framework 4.7.2 .NET Framework 4.7.2 9.166 ns 0.1201 ns 0.1064 ns 1.00 0.02
BuiltIn .NET Framework 4.7.2 .NET Framework 4.7.2 9.506 ns 0.1478 ns 0.1382 ns 1.04 0.02

Dev box

BenchmarkDotNet v0.14.0, Windows 11 (10.0.26100.2033) (Hyper-V)
AMD EPYC 7763, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]               : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2
  .NET 9.0             : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX2
  .NET Framework 4.7.2 : .NET Framework 4.8.1 (4.8.9277.0), X64 RyuJIT VectorSize=256
Method Job Runtime Mean Error StdDev Ratio RatioSD
MSBuildNameIgnoreCase .NET 9.0 .NET 9.0 17.50 ns 0.342 ns 0.512 ns 1.00 0.04
BuiltIn .NET 9.0 .NET 9.0 16.61 ns 0.238 ns 0.199 ns 0.95 0.03
MSBuildNameIgnoreCase .NET Framework 4.7.2 .NET Framework 4.7.2 17.70 ns 0.114 ns 0.101 ns 1.00 0.01
BuiltIn .NET Framework 4.7.2 .NET Framework 4.7.2 19.12 ns 0.146 ns 0.122 ns 1.08 0.01

macOS ARM64

BenchmarkDotNet v0.14.0, macOS Sequoia 15.1 (24B83) [Darwin 24.1.0]
Apple M3 Pro, 1 CPU, 12 logical and 12 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]   : .NET 9.0.0 (9.0.24.47305), Arm64 RyuJIT AdvSIMD
  .NET 9.0 : .NET 9.0.0 (9.0.24.47305), Arm64 RyuJIT AdvSIMD
Method Mean Error StdDev Ratio RatioSD
MSBuildNameIgnoreCase 8.007 ns 0.0594 ns 0.0527 ns 1.00 0.01
BuiltIn 10.654 ns 0.2028 ns 0.1897 ns 1.33 0.02

Codespace

⚠️ net8.0 because I'm lazy!

BenchmarkDotNet v0.14.0, Ubuntu 20.04.6 LTS (Focal Fossa) (container)
AMD EPYC 7763, 1 CPU, 2 logical cores and 1 physical core
.NET SDK 8.0.402
  [Host]   : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2

Job=.NET 8.0  Runtime=.NET 8.0  
Method Mean Error StdDev Ratio RatioSD
MSBuildNameIgnoreCase 17.85 ns 0.424 ns 1.182 ns 1.00 0.09
BuiltIn 19.07 ns 0.481 ns 1.373 ns 1.07 0.10

Conclusions

So that's not very convincing; maaybe we should consider using OrdinalIgnoreCase on Windows/.NET, but it's a big loss in other cases.

I also poked around the same thing with GetHashCode with surprising results--I was hoping for a clean win on the case where we allocate a substring (since string.GetHashCode(ReadOnlySpan<char>) now exists) but we seem to do that quite rarely, averaging 5 bytes per call in the sample:

MSBuildNameIgnoreCaseComparer_Bench_with_hashes.zip

Method Runtime Mean Error StdDev Median Ratio RatioSD Gen0 Allocated Alloc Ratio
MSBuildNameIgnoreCase_Hash .NET 9.0 5.6479 ns 0.0579 ns 0.0514 ns 5.6519 ns 0.696 0.01 - - NA
OrdinalIgnoreCase_Hash_Substring .NET 9.0 12.1174 ns 0.1419 ns 0.1328 ns 12.1209 ns 1.492 0.02 0.0002 5 B NA
OrdinalIgnoreCase_Hash_Span .NET 9.0 11.3895 ns 0.0930 ns 0.0776 ns 11.3921 ns 1.403 0.02 - - NA
MSBuildNameIgnoreCase_Hash .NET Framework 4.7.2 8.1208 ns 0.0942 ns 0.0881 ns 8.1265 ns 1.000 0.01 - - NA
OrdinalIgnoreCase_Hash_Substring .NET Framework 4.7.2 19.9626 ns 0.1757 ns 0.1643 ns 19.9121 ns 2.458 0.03 0.0007 5 B NA

We can get rid of those allocations but at the cost of almost doubling the compute time, so I guess we can keep allocating sometimes.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions