Cache StringNames and NodePaths in implicit casts from string in C# #100452

Open
Joy-less wants to merge 1 commit into godotengine:master from Joy-less:cache-stringnames-nodepaths

@Joy-less
Contributor

This pull request addresses godotengine/godot-proposals#10826 (please take a look).
To summarise, StringNames and NodePaths are now cached when implicitly casting from strings to avoid allocations like:

Input.IsActionPressed("action"); // Created a StringName.
GetNode("path/to/node"); // Created a NodePath.

This pull request adds a FusionCache dependency since there is no cache built-in to C#. This could be replaced with a MemoryCache dependency, or an in-house solution could be created using a ConcurrentDictionary.

In my opinion, StringNames (and NodePaths) do not make sense in C# and should be removed from the API. The reason is the garbage collection spike issues I detailed in the discussion. There are very few benefits to StringNames since identifiers are generally less than 20 characters long, and NodePaths don't seem to provide any value.

That being said, as long as StringNames and NodePaths remain in the API, implicit allocations should not exist, especially when it's so common to use string literals as StringNames and NodePaths.

If this pull request is added, the developer can still opt for a non-cached StringName or NodePath using new StringName(string) or new NodePath(string).

Another related issue is there is currently no equality operator between StringName and string, so comparing them will result in the string being converted to a StringName, which is not ideal. It may be a good idea to add an == operator to StringName which accepts a string and compares them by converting the StringName to a string (instead of converting the string to a StringName).

By no means is this the only solution to the problem - I am open to discussions about better alternatives.

@Joy-less
Contributor Author

Joy-less commented Jan 4, 2025

I ran some benchmarks using BenchmarkDotNet.Godot with alternative caching mechanisms:

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|
| NoneTest | 142.99 ms | 2.141 ms | 2.003 ms | 3000.0000 | 2750.0000 | 19200760 B |
| FusionCacheTest | 33.66 ms | 0.072 ms | 0.064 ms | 12200.0000 | - | 38400494 B |
| MemoryCacheTest | 19.34 ms | 0.045 ms | 0.040 ms | 4062.5000 | - | 12800410 B |
| MemoryCacheStaticTest | 17.63 ms | 0.074 ms | 0.058 ms | - | - | 410 B |
| ConcurrentDictionaryTest | 16.55 ms | 0.022 ms | 0.018 ms | 4062.5000 | - | 12800410 B |
| ConcurrentDictionaryStaticTest | 16.57 ms | 0.010 ms | 0.008 ms | - | - | 410 B |
| DictionaryWithLockTest | 16.95 ms | 0.251 ms | 0.235 ms | - | - | 410 B |

Benchmark Code
#nullable enable

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Godot.Attributes;
using BenchmarkDotNet.Godot.Attributes.Jobs;
using Godot;
using Microsoft.Extensions.Caching.Memory;
using ZiggyCreatures.Caching.Fusion;

[MemoryDiagnoser, GodotSimpleJob]
public partial class StringNameBenchmark : Node {
    [Export] public int TestInteger = 53;

    public const int Count = 200_000;

    private readonly FusionCache FusionCache = new(new FusionCacheOptions());
    private readonly MemoryCache MemoryCache = new(new MemoryCacheOptions());
    private readonly ConcurrentDictionary<string, StringName> ConcurrentDictionary = [];
    private readonly Dictionary<string, StringName> Dictionary = [];
    private readonly Lock Lock = new();
    private string TargetString = "TestInteger";

    [GodotBenchmark]
    public void NoneTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(TargetString);
        }
    }
    [GodotBenchmark]
    public void FusionCacheTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(FusionCache.GetOrSet(TargetString, _ => (StringName)TargetString));
        }
    }
    [GodotBenchmark]
    public void MemoryCacheTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(MemoryCache.GetOrCreate(TargetString, _ => (StringName)TargetString));
        }
    }
    [GodotBenchmark]
    public void MemoryCacheStaticTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(MemoryCache.GetOrCreate(TargetString, static (ICacheEntry Entry) => (StringName)(string)Entry.Key));
        }
    }
    [GodotBenchmark]
    public void ConcurrentDictionaryTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(ConcurrentDictionary.GetOrAdd(TargetString, _ => (StringName)TargetString));
        }
    }
    [GodotBenchmark]
    public void ConcurrentDictionaryStaticTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(ConcurrentDictionary.GetOrAdd(TargetString, static (string Key) => (StringName)Key));
        }
    }
    [GodotBenchmark]
    public void DictionaryWithLockTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            lock (Lock) {
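                if (!Dictionary.TryGetValue(TargetString, out StringName? Result)) {
                    // On a miss, create the value once and keep it in Result so Get() never receives null
                    Result = (StringName)TargetString;
                    Dictionary[TargetString] = Result;
                }
                Get(Result);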
            }
        }
    }
}

None is the worst, with large Generation 1 heap allocations and the slowest performance.

FusionCache is the second worst, likely because of two things:

  • It appears to be designed for ASP.NET to cache very slow things like web requests, rather than class allocations.
  • It has to allocate a delegate for the factory function.

I think the best option is to change to MemoryCache and use it like so:

string TargetString = "TestNumber";
StringName TargetStringName = StringNameCache.GetOrCreate(TargetString, static (ICacheEntry Entry) => new StringName((string)Entry.Key));

The static lambda avoids allocating any delegates. The cast of Entry.Key back to string is not an unboxing operation, since string is a reference type, so it should not be a problem.

The implicit cast would end up looking like this:

public static implicit operator StringName(string from) => _stringNameCache.GetOrCreate(from, static (ICacheEntry entry) => new StringName((string)entry.Key))!;
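
To make the delegate point concrete, here is a minimal sketch that runs without Godot. `Name` is a hypothetical stand-in for StringName and `NameCacheDemo` is an illustrative class, neither is part of the PR. A lambda that reads an instance field has to be bound to `this`, so a fresh delegate is normally created on each call, whereas a `static` lambda cannot capture anything and the compiler can reuse one cached delegate. In the table above, the non-static variants allocate about 12.8 MB over 200,000 iterations, roughly 64 B per call, which would be consistent with one delegate object per call.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for Godot.StringName so the sketch runs without the engine.
public sealed class Name {
    public string Value { get; }
    public Name(string value) { Value = value; }
}

public sealed class NameCacheDemo {
    private readonly ConcurrentDictionary<string, Name> _cache = new();
    private readonly string _target = "TestInteger";

    // Reads the instance field _target inside the lambda, so the delegate is bound to `this`.
    public Name GetCapturing() => _cache.GetOrAdd(_target, _ => new Name(_target));

    // Captures nothing: the key arrives as the factory's argument.
    public Name GetStatic() => _cache.GetOrAdd(_target, static key => new Name(key));
}
```

Both methods return the same cached instance; they differ only in what the call site has to allocate to pass the factory.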

@jodydonetti

Hi all, FusionCache creator here: let me know if you need anything.

Just FYI: MemoryCache does not have cache stampede protection.

Also, after months of work, FusionCache V2 is about to be released in the next few weeks with a lot of extra features.

Here is the last preview.

Hope this helps!

@jodydonetti

jodydonetti commented Jan 6, 2025

I don't know that much about Godot, so take this with a pinch of salt, but: for something so core and so low level I would probably not use FusionCache, nor MemoryCache (its API surface area is not really great) nor anything like that at all.

I think I'd create something uber low level and 100% specialized for this exact purpose, basically something like a Dictionary/ConcurrentDictionary + massive use of ReadOnlySpan<T> or similar + some form of low level locking, if needed at all: for example if an extra string allocation in the worst case scenario is not the end of the world I may not even put a locking mechanism and just go with the native dictionary GetOrAdd method.
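
As a rough illustration of the "no locking, just GetOrAdd" option, the sketch below uses a plain `object` as a stand-in for the cached value; `LockFreeCacheDemo` and its members are hypothetical names, not code from the PR. `ConcurrentDictionary.GetOrAdd` does not run the factory under a lock, so under contention the factory may be invoked more than once for the same key, but every caller still receives the single instance that ended up stored.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class LockFreeCacheDemo {
    private static readonly ConcurrentDictionary<string, object> Cache = new();
    private static int _factoryCalls;

    // Number of times the factory has run; can exceed the number of keys under contention.
    public static int FactoryCalls => Volatile.Read(ref _factoryCalls);

    public static object Get(string key) =>
        Cache.GetOrAdd(key, static _ => {
            Interlocked.Increment(ref _factoryCalls);
            return new object(); // stand-in for a value such as a StringName
        });

    // Requests the same key from several parallel callers and
    // returns how many distinct instances they observed.
    public static int CountDistinctInstances(string key, int callers) {
        var seen = new ConcurrentDictionary<object, bool>();
        Parallel.For(0, callers, _ => { seen.TryAdd(Get(key), true); });
        return seen.Count;
    }
}
```

The extra allocation in the racing case is the "worst case scenario" cost mentioned above: a duplicate value is created and then discarded.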

To be clear: I'm saying this not because FusionCache is bad, I mean, I created it (and FWIW I think it's really good 😬).

The thing is that it does a lot like L1+L2, backplane, fail-safe, eager-refresh, conditional refresh, auto-recovery and way more, and if what is needed here is "just" an in-memory dictionary with some concurrent access mechanism, that may be overkill.

Hope this helps.

@jodydonetti

Hi @Joy-less

  • It has to allocate a delegate for the factory function.

I'm not sure I got this 100% right, but to be clear: FusionCache can of course work with a static delegate and no allocation at all.
To be more explicit, I have created some benchmarks to test exactly this with a so-called "read happy path", and the memory allocated is exactly zero bytes.

Hope this helps.

@Joy-less
Contributor Author

Joy-less commented Jan 8, 2025

I have changed the cache again, this time to ConcurrentLru from BitFaster.Caching, after discussing with @jodydonetti. RCache, a new .NET caching library, has been in development and is based on BitFaster.
ConcurrentLru is essentially a ConcurrentDictionary with a hard capacity, a least-recently-used eviction policy, and an optional expire-after-access (time-based) eviction policy. This makes it very fast for this use case.

Benchmark code
#nullable enable

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Godot.Attributes;
using BenchmarkDotNet.Godot.Attributes.Jobs;
using BitFaster.Caching;
using BitFaster.Caching.Lfu;
using BitFaster.Caching.Lru;
using Godot;
using Microsoft.Extensions.Caching.Memory;
using ZiggyCreatures.Caching.Fusion;

[MemoryDiagnoser, GodotSimpleJob]
public partial class StringNameBenchmark : Node {
    [Export] public int TestInteger = 53;

    public const int Count = 200_000;

    private readonly FusionCache FusionCache = new(new FusionCacheOptions());
    private readonly MemoryCache MemoryCache = new(new MemoryCacheOptions());
    private readonly ConcurrentLru<string, StringName> ConcurrentLru = new(capacity: 1_000);
    private readonly ConcurrentLfu<string, StringName> ConcurrentLfu = new(capacity: 1_000);
    private readonly ICache<string, StringName> ExpiringConcurrentLru = new ConcurrentLruBuilder<string, StringName>()
        .WithCapacity(1_000)
        .WithExpireAfterAccess(TimeSpan.FromSeconds(30))
        .Build();
    private readonly ConcurrentDictionary<string, StringName> ConcurrentDictionary = [];
    private readonly Dictionary<string, StringName> Dictionary = [];
    private readonly Lock Lock = new();
    private string TargetString = "TestInteger";

    [GodotBenchmark]
    public void NoneTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(TargetString);
        }
    }
    [GodotBenchmark]
    public void FusionCacheTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(FusionCache.GetOrSet(TargetString, _ => (StringName)TargetString));
        }
    }
    [GodotBenchmark]
    public void MemoryCacheTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(MemoryCache.GetOrCreate(TargetString, _ => (StringName)TargetString));
        }
    }
    [GodotBenchmark]
    public void MemoryCacheStaticTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(MemoryCache.GetOrCreate(TargetString, static (ICacheEntry Entry) => (StringName)(string)Entry.Key));
        }
    }
    [GodotBenchmark]
    public void ConcurrentDictionaryTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(ConcurrentDictionary.GetOrAdd(TargetString, _ => (StringName)TargetString));
        }
    }
    [GodotBenchmark]
    public void ConcurrentDictionaryStaticTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(ConcurrentDictionary.GetOrAdd(TargetString, static (string Key) => (StringName)Key));
        }
    }
    [GodotBenchmark]
    public void DictionaryWithLockTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            lock (Lock) {
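                if (!Dictionary.TryGetValue(TargetString, out StringName? Result)) {
                    // On a miss, create the value once and keep it in Result so Get() never receives null
                    Result = (StringName)TargetString;
                    Dictionary[TargetString] = Result;
                }
                Get(Result);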
            }
        }
    }
    [GodotBenchmark]
    public void ConcurrentLruTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(ConcurrentLru.GetOrAdd(TargetString, static (string TargetString) => (StringName)TargetString));
        }
    }
    [GodotBenchmark]
    public void ConcurrentLfuTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(ConcurrentLfu.GetOrAdd(TargetString, static (string TargetString) => (StringName)TargetString));
        }
    }
    [GodotBenchmark]
    public void ExpiringConcurrentLruTest() {
        for (int Counter = 0; Counter < Count; Counter++) {
            Get(ExpiringConcurrentLru.GetOrAdd(TargetString, static (string TargetString) => (StringName)TargetString));
        }
    }
}

Benchmark results:

| Method | Mean | Error | StdDev | Median | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|---|
| NoneTest | 155.71 ms | 3.110 ms | 4.460 ms | 156.71 ms | 3000.0000 | 2750.0000 | 19201012 B |
| FusionCacheTest | 35.05 ms | 0.690 ms | 0.612 ms | 34.77 ms | 12200.0000 | - | 38400561 B |
| MemoryCacheTest | 19.39 ms | 0.363 ms | 0.304 ms | 19.29 ms | 4062.5000 | - | 12800442 B |
| MemoryCacheStaticTest | 17.84 ms | 0.224 ms | 0.175 ms | 17.76 ms | - | - | 442 B |
| ConcurrentDictionaryTest | 17.01 ms | 0.282 ms | 0.264 ms | 17.02 ms | 4062.5000 | - | 12800442 B |
| ConcurrentDictionaryStaticTest | 16.80 ms | 0.150 ms | 0.133 ms | 16.83 ms | - | - | 442 B |
| DictionaryWithLockTest | 17.03 ms | 0.323 ms | 0.345 ms | 17.06 ms | - | - | 442 B |
| ConcurrentLruTest | 16.56 ms | 0.050 ms | 0.039 ms | 16.55 ms | - | - | 442 B |
| ConcurrentLfuTest | 18.23 ms | 0.502 ms | 1.466 ms | 17.63 ms | - | - | 54102 B |
| ExpiringConcurrentLruTest | 17.04 ms | 0.287 ms | 0.268 ms | 17.07 ms | - | - | 442 B |

Conclusion
I decided to pick a capacity of 1,000 to avoid storing a large number of StringNames and an expire-after-access of 30 seconds to avoid storing old and unused StringNames. These can be changed but I think these work nicely.
Like I said before, this caching can be easily avoided if desired by using the constructor: new StringName("...").
The result of this caching is a massive improvement in not only performance (9x faster) but also memory usage (19 MB -> 442 B), all while keeping the convenient implicit casting.

@Joy-less
Contributor Author

Joy-less commented Jan 8, 2025

This can also be emulated before the pull request is accepted using the following helper functions:

Open me
global using static RecycleExtensions;

using System; // TimeSpan
using System.Diagnostics.CodeAnalysis;
using BitFaster.Caching;
using BitFaster.Caching.Lru;
using Godot; // StringName, NodePath

public static class RecycleExtensions {
    private static readonly ICache<string, StringName> StringNameCache = new ConcurrentLruBuilder<string, StringName>()
        .WithCapacity(1_000)
        .WithExpireAfterAccess(TimeSpan.FromSeconds(30))
        .Build();
    private static readonly ICache<string, NodePath> NodePathCache = new ConcurrentLruBuilder<string, NodePath>()
        .WithCapacity(1_000)
        .WithExpireAfterAccess(TimeSpan.FromSeconds(30))
        .Build();

    /// <summary>
    /// Converts a <see cref="string"/> to a <see cref="Godot.StringName"/>.<br/>
    /// The resulting <see cref="Godot.StringName"/> is temporarily cached for future casts.
    /// </summary>
    [return: NotNullIfNotNull(nameof(String))]
    public static StringName? StringName(string? String) {
        if (String is null) {
            return null;
        }
        return StringNameCache.GetOrAdd(String, static (string From) => new StringName(From));
    }
    /// <summary>
    /// Converts a <see cref="string"/> to a <see cref="Godot.NodePath"/>.<br/>
    /// The resulting <see cref="Godot.NodePath"/> is temporarily cached for future casts.
    /// </summary>
    [return: NotNullIfNotNull(nameof(String))]
    public static NodePath? NodePath(string? String) {
        if (String is null) {
            return null;
        }
        return NodePathCache.GetOrAdd(String, static (string From) => new NodePath(From));
    }
}

Use like so:

StringName Apple = StringName("apple");
NodePath Banana = NodePath("banana");

@Joy-less
Contributor Author

I removed the WithCapacity(1_000) because I thought it's too much of a magic number in comparison to WithExpireAfterAccess(TimeSpan.FromSeconds(30)), and I made the package private so it's not visible to Godot projects.

@Delsin-Yu
Contributor

Delsin-Yu commented Mar 26, 2025

Is there a solid reason why we should invalidate the caches by time?

@Joy-less
Contributor Author

Is there a solid reason why we should invalidate the caches by time?
@Delsin-Yu

The caches must be invalidated at some point to prevent a problem like Ruby's symbol denial-of-service issue (which existed at least until mortal symbols were introduced).
As for the method of invalidation, time-based is the best solution I could find. Note that it invalidates 30s after the last access, not after caching. The PR mainly prevents situations where you're creating a new StringName or NodePath every frame; if it's not used for 30s then it won't impact performance to expunge it from the cache. Ideally, it would take into account the amount of available memory, but there doesn't seem to be a good way to do this. I think the approach I suggested works for 99.99% of cases and is easy to work around if someone wants to use their own caching system.

@Delsin-Yu
Contributor

if it's not used for 30s then it won't impact performance to expunge it from the cache.

It almost sounds like your solution aims to address the use case where the engine user suddenly creates an unreasonable number of StringNames (or NodePaths). Is this common in a typical project?

The PR mainly prevents situations where you're creating a new StringName or NodePath every frame;

Generally, the generational GC (as opposed to Unity's legacy non-moving, non-generational GC) can handle a small number (<500? benchmark needed) of these creations without causing stutter, as generation 0 collections are generally very fast.

@Joy-less
Contributor Author

It almost sounds like your solution aims to address the use case where the engine user suddenly creates an unreasonable number of StringNames (or NodePaths). Is this common in a typical project?

Very common. Any project which uses them for getting nodes, checking input actions, setting shader parameters, or using GDScript variables will benefit from this.

Generally, the generational GC (as opposed to Unity's legacy non-moving, non-generational GC) can handle a small number (<500? benchmark needed) of these creations without causing stutter, as generation 0 collections are generally very fast.

Yes, C# is performant, but that doesn't mean it makes sense to make hidden allocations every frame for no reason.

@beicause
Contributor

I'm curious how the performance would be if you used a lower number instead of 200,000 in previous benchmarks.

@Delsin-Yu
Contributor

Delsin-Yu commented Mar 26, 2025

that doesn't mean it makes sense to make hidden allocations every frame for no reason.

I believe we need to first prove that the performance issue exists before addressing it; 200_000 is not a fair number for benchmarking.

to make hidden allocations every frame for no reason.

The documentation did tell you to use the cached StringNames.

@Joy-less
Contributor Author

Joy-less commented Mar 26, 2025

The benchmark project as requested by @Delsin-Yu:
stringnamestest.zip

@Joy-less
Contributor Author

I'm curious how the performance would be if you used a lower number instead of 200,000 in previous benchmarks.

I made an updated benchmark that creates about 60 unique StringNames per frame to get a shader parameter:
stringnamestest.zip


@Delsin-Yu
Contributor

The test numbers are here:

In general, any form of caching only starts to make a difference in speed once the count reaches 50_000 (on my end), so in common use cases it mainly helps with eliminating GC spikes.

Details
BenchmarkDotNet v0.13.12, Windows 10 (10.0.19044.5608/21H2/November2021Update)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.200
  [Host] : .NET 9.0.2 (9.0.225.6610), X64 RyuJIT AVX2

Toolchain=InGodotProcessNoEmitToolchain  
| Method | Count | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| NoneTest | 100 | 16.58 ms | 0.027 ms | 0.023 ms | 1.00 | 0.00 | - | - | 10000 B | 1.00 |
| FusionCacheTest | 100 | 16.60 ms | 0.011 ms | 0.010 ms | 1.00 | 0.00 | - | - | 19631 B | 1.96 |
| MemoryCacheTest | 100 | 16.59 ms | 0.052 ms | 0.041 ms | 1.00 | 0.00 | - | - | 6831 B | 0.68 |
| MemoryCacheStaticTest | 100 | 16.60 ms | 0.030 ms | 0.027 ms | 1.00 | 0.00 | - | - | 431 B | 0.04 |
| ConcurrentDictionaryStaticTest | 100 | 16.60 ms | 0.023 ms | 0.020 ms | 1.00 | 0.00 | - | - | 431 B | 0.04 |
| DictionaryWithLockTest | 100 | 16.60 ms | 0.014 ms | 0.013 ms | 1.00 | 0.00 | - | - | 431 B | 0.04 |
| ConcurrentLruTest | 100 | 16.59 ms | 0.032 ms | 0.026 ms | 1.00 | 0.00 | - | - | 431 B | 0.04 |
| ConcurrentLfuTest | 100 | 16.59 ms | 0.022 ms | 0.018 ms | 1.00 | 0.00 | - | - | 493 B | 0.05 |
| ExpiringConcurrentLruTest | 100 | 16.59 ms | 0.015 ms | 0.013 ms | 1.00 | 0.00 | - | - | 431 B | 0.04 |
| NoneTest | 5000 | 16.46 ms | 0.049 ms | 0.043 ms | 1.00 | 0.00 | - | - | 480431 B | 1.000 |
| FusionCacheTest | 5000 | 16.53 ms | 0.022 ms | 0.019 ms | 1.00 | 0.00 | 31.2500 | - | 960431 B | 1.999 |
| MemoryCacheTest | 5000 | 16.53 ms | 0.035 ms | 0.029 ms | 1.00 | 0.00 | - | - | 320431 B | 0.667 |
| MemoryCacheStaticTest | 5000 | 16.57 ms | 0.016 ms | 0.015 ms | 1.01 | 0.00 | - | - | 431 B | 0.001 |
| ConcurrentDictionaryStaticTest | 5000 | 16.56 ms | 0.020 ms | 0.018 ms | 1.01 | 0.00 | - | - | 431 B | 0.001 |
| DictionaryWithLockTest | 5000 | 16.57 ms | 0.020 ms | 0.017 ms | 1.01 | 0.00 | - | - | 440 B | 0.001 |
| ConcurrentLruTest | 5000 | 16.56 ms | 0.040 ms | 0.034 ms | 1.01 | 0.00 | - | - | 431 B | 0.001 |
| ConcurrentLfuTest | 5000 | 16.55 ms | 0.033 ms | 0.029 ms | 1.01 | 0.00 | - | - | 3353 B | 0.007 |
| ExpiringConcurrentLruTest | 5000 | 16.55 ms | 0.044 ms | 0.037 ms | 1.01 | 0.00 | - | - | 431 B | 0.001 |
| NoneTest | 10000 | 16.76 ms | 0.035 ms | 0.033 ms | 1.00 | 0.00 | 31.2500 | - | 960431 B | 1.000 |
| FusionCacheTest | 10000 | 16.52 ms | 0.011 ms | 0.010 ms | 0.99 | 0.00 | 93.7500 | - | 1920431 B | 2.000 |
| MemoryCacheTest | 10000 | 16.51 ms | 0.029 ms | 0.024 ms | 0.99 | 0.00 | 31.2500 | - | 640431 B | 0.667 |
| MemoryCacheStaticTest | 10000 | 16.56 ms | 0.011 ms | 0.010 ms | 0.99 | 0.00 | - | - | 431 B | 0.000 |
| ConcurrentDictionaryStaticTest | 10000 | 16.55 ms | 0.031 ms | 0.026 ms | 0.99 | 0.00 | - | - | 431 B | 0.000 |
| DictionaryWithLockTest | 10000 | 16.56 ms | 0.007 ms | 0.006 ms | 0.99 | 0.00 | - | - | 431 B | 0.000 |
| ConcurrentLruTest | 10000 | 16.54 ms | 0.020 ms | 0.018 ms | 0.99 | 0.00 | - | - | 431 B | 0.000 |
| ConcurrentLfuTest | 10000 | 16.52 ms | 0.068 ms | 0.053 ms | 0.99 | 0.00 | - | - | 6313 B | 0.007 |
| ExpiringConcurrentLruTest | 10000 | 16.53 ms | 0.023 ms | 0.021 ms | 0.99 | 0.00 | - | - | 431 B | 0.000 |
| NoneTest | 50000 | 47.06 ms | 0.927 ms | 1.764 ms | 1.00 | 0.00 | 272.7273 | 181.8182 | 4800612 B | 1.000 |
| FusionCacheTest | 50000 | 47.79 ms | 0.543 ms | 0.508 ms | 1.02 | 0.04 | 545.4545 | - | 9600612 B | 2.000 |
| MemoryCacheTest | 50000 | 20.41 ms | 0.320 ms | 0.299 ms | 0.44 | 0.02 | 187.5000 | - | 3200440 B | 0.667 |
| MemoryCacheStaticTest | 50000 | 19.49 ms | 0.269 ms | 0.252 ms | 0.42 | 0.02 | - | - | 431 B | 0.000 |
| ConcurrentDictionaryStaticTest | 50000 | 18.21 ms | 0.235 ms | 0.220 ms | 0.39 | 0.02 | - | - | 431 B | 0.000 |
| DictionaryWithLockTest | 50000 | 18.26 ms | 0.270 ms | 0.252 ms | 0.39 | 0.02 | - | - | 431 B | 0.000 |
| ConcurrentLruTest | 50000 | 19.40 ms | 0.229 ms | 0.214 ms | 0.42 | 0.02 | - | - | 431 B | 0.000 |
| ConcurrentLfuTest | 50000 | 20.46 ms | 0.078 ms | 0.073 ms | 0.44 | 0.02 | - | - | 29586 B | 0.006 |
| ExpiringConcurrentLruTest | 50000 | 19.33 ms | 0.225 ms | 0.210 ms | 0.41 | 0.01 | - | - | 431 B | 0.000 |
| NoneTest | 100000 | 91.81 ms | 1.829 ms | 3.480 ms | 1.00 | 0.00 | 500.0000 | 333.3333 | 9600843 B | 1.000 |
| FusionCacheTest | 100000 | 96.25 ms | 1.088 ms | 1.017 ms | 1.07 | 0.03 | 1000.0000 | - | 19200843 B | 2.000 |
| MemoryCacheTest | 100000 | 40.41 ms | 0.485 ms | 0.454 ms | 0.45 | 0.02 | 307.6923 | - | 6400570 B | 0.667 |
| MemoryCacheStaticTest | 100000 | 38.75 ms | 0.358 ms | 0.335 ms | 0.43 | 0.01 | - | - | 570 B | 0.000 |
| ConcurrentDictionaryStaticTest | 100000 | 36.57 ms | 0.308 ms | 0.288 ms | 0.41 | 0.01 | - | - | 553 B | 0.000 |
| DictionaryWithLockTest | 100000 | 36.30 ms | 0.484 ms | 0.453 ms | 0.40 | 0.01 | - | - | 553 B | 0.000 |
| ConcurrentLruTest | 100000 | 38.71 ms | 0.432 ms | 0.404 ms | 0.43 | 0.01 | - | - | 553 B | 0.000 |
| ConcurrentLfuTest | 100000 | 40.96 ms | 0.314 ms | 0.279 ms | 0.45 | 0.02 | - | - | 59518 B | 0.006 |
| ExpiringConcurrentLruTest | 100000 | 38.66 ms | 0.529 ms | 0.495 ms | 0.43 | 0.01 | - | - | 553 B | 0.000 |
| NoneTest | 200000 | 182.50 ms | 3.631 ms | 6.454 ms | 1.00 | 0.00 | 1000.0000 | 666.6667 | 19201349 B | 1.000 |
| FusionCacheTest | 200000 | 190.42 ms | 2.502 ms | 2.341 ms | 1.05 | 0.04 | 2000.0000 | - | 38401349 B | 2.000 |
| MemoryCacheTest | 200000 | 80.64 ms | 0.756 ms | 0.670 ms | 0.45 | 0.01 | 714.2857 | - | 12800770 B | 0.667 |
| MemoryCacheStaticTest | 200000 | 77.39 ms | 1.113 ms | 1.041 ms | 0.43 | 0.02 | - | - | 770 B | 0.000 |
| ConcurrentDictionaryStaticTest | 200000 | 73.00 ms | 0.997 ms | 0.933 ms | 0.40 | 0.01 | - | - | 770 B | 0.000 |
| DictionaryWithLockTest | 200000 | 73.37 ms | 1.442 ms | 1.349 ms | 0.41 | 0.02 | - | - | 770 B | 0.000 |
| ConcurrentLruTest | 200000 | 78.51 ms | 1.490 ms | 1.464 ms | 0.43 | 0.01 | - | - | 770 B | 0.000 |
| ConcurrentLfuTest | 200000 | 82.03 ms | 0.554 ms | 0.491 ms | 0.46 | 0.02 | - | - | 118216 B | 0.006 |
| ExpiringConcurrentLruTest | 200000 | 76.77 ms | 0.999 ms | 0.935 ms | 0.43 | 0.01 | - | - | 770 B | 0.000 |

You may grab the test project and try it yourself:

stringnamestest.zip

This project also includes the ability to test GC spikes interactively:

QQ2025326-172515.mp4

@Joy-less
Contributor Author

Joy-less commented Mar 26, 2025

Since some people raised an issue with the BitFaster.Caching dependency, I've been working on an alternative implementation based on ConcurrentDictionary.

Code
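using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public sealed class CustomCache<TKey, TValue> where TKey : notnull {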
    public TimeSpan ExpireAfterAccessDuration { get; set; }
    public TimeSpan EvictInterval {
        get => TimeSpan.FromMilliseconds(EvictTimer.Interval);
        set => EvictTimer.Interval = value.TotalMilliseconds;
    }

    private readonly ConcurrentDictionary<TKey, Entry> Entries = [];
    private readonly IEnumerator<KeyValuePair<TKey, Entry>> EntriesEnumerator;
    private readonly System.Timers.Timer EvictTimer = new();
    private readonly Lock EvictLock = new();

    public CustomCache(TimeSpan expireAfterAccessDuration, TimeSpan evictInterval) {
        ExpireAfterAccessDuration = expireAfterAccessDuration;
        EvictInterval = evictInterval;

        EntriesEnumerator = Entries.GetEnumerator();

        EvictTimer.Elapsed += (_, _) => EvictExpired();
        EvictTimer.Start();
    }
    public TValue GetOrAdd(TKey key, Func<TKey, TValue> getValue) {
        Entry Entry = Entries.GetOrAdd(key, static (TKey key, Func<TKey, TValue> getValue) => {
            return new Entry() {
                Value = getValue(key),
                Timestamp = Environment.TickCount64
            };
        }, getValue);
        Entry.Timestamp = Environment.TickCount64;
        return Entry.Value;
    }
    public void EvictExpired() {
        lock (EvictLock) {
            // Get each cache entry
            while (EntriesEnumerator.MoveNext()) {
                (TKey key, Entry entry) = EntriesEnumerator.Current;
                // Evict if expired (TickCount64 is in milliseconds, not 100 ns TimeSpan ticks)
                if (TimeSpan.FromMilliseconds(Environment.TickCount64 - entry.Timestamp) > ExpireAfterAccessDuration) {
                    Entries.TryRemove(key, out _);
                }
            }
            EntriesEnumerator.Reset();
        }
    }
    public void EvictAll() {
        lock (EvictLock) {
            Entries.Clear();
        }
    }

    private sealed class Entry {
        public required TValue Value { get; set; }
        public required long Timestamp { get; set; }
    }
}

| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|
| BitFasterCaching | 82.05 us | 0.170 us | 0.151 us | - | - |
| Custom | 84.50 us | 0.135 us | 0.127 us | - | - |
| ConcurrentDictionary | 48.82 us | 0.098 us | 0.091 us | - | - |
| None | 82.94 us | 0.467 us | 0.437 us | 100.9521 | 316800 B |

The implementation uses GetOrAdd to add the value (StringName/NodePath) with a timestamp based on Environment.TickCount64 (faster than Stopwatch.GetTimestamp() but only precise to about 16 ms). It uses a timer to call the eviction method every interval, which enumerates through each entry and removes expired entries. The enumerator is reused to prevent allocations.
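
One detail worth keeping in mind when turning an Environment.TickCount64 difference into a TimeSpan: the counter is in milliseconds, whereas TimeSpan.FromTicks expects 100-nanosecond units. The small sketch below contrasts the two conversions; `TickUnits` and its method names are hypothetical helpers for illustration only.

```csharp
using System;

public static class TickUnits {
    // Environment.TickCount64 counts milliseconds since system start,
    // so a difference of two readings is a number of milliseconds.
    public static TimeSpan ElapsedSince(long earlierTickCount64, long nowTickCount64) =>
        TimeSpan.FromMilliseconds(nowTickCount64 - earlierTickCount64);

    // For comparison: TimeSpan ticks are 100 ns each, so the same number
    // read as ticks yields a duration 10,000 times shorter.
    public static TimeSpan MisreadAsTicks(long earlierTickCount64, long nowTickCount64) =>
        TimeSpan.FromTicks(nowTickCount64 - earlierTickCount64);
}
```

With a 30,000 ms gap, the first conversion gives 30 seconds and the second gives 3 milliseconds, so an expiry check written with FromTicks would effectively never fire for a 30-second window.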

If using the alternative implementation, the timer should likely be replaced with a native Godot timer if possible.

@beicause
Contributor

I believe caching can bring enough performance improvement, but I wonder if it's possible to implement a more fundamental solution to reduce the overhead of implicit conversion.
Compared with the implicit conversion of StringName in GDScript, how much slower is it in C#?

@Joy-less
Contributor Author

I believe caching can bring enough performance improvement, but I wonder if it's possible to implement a more fundamental solution to reduce the overhead of implicit conversion.
Compared with the implicit conversion of StringName in GDScript, how much slower is it in C#?

In GDScript it's not as big of a problem for three reasons:

  1. StringNames/NodePaths are built into the language, so you have the &"StringName" and ^"NodePath" literal syntax, and the parser can also detect implicit casts.
  2. GDScript uses reference counting which results in slower performance but not garbage collection spikes. (I think)
  3. GDScript is closer to C++ so it only has to maintain one StringName (rather than marshalling a C# copy)

@Joy-less
Contributor Author

I refined the custom caching implementation into two alternative implementations. One uses a ConcurrentDictionary and one uses a Dictionary and a PriorityQueue.

Implementation 1
public sealed class CustomCache1<TKey, TValue> where TKey : notnull {
    public TimeSpan ExpireAfterAccessDuration { get; set; }
    public TimeSpan EvictInterval { get; set; }

    private readonly ConcurrentDictionary<TKey, Entry> Entries = [];
    private readonly IEnumerator<KeyValuePair<TKey, Entry>> EntriesEnumerator;
    private readonly Lock EvictExpiredLock = new();
    private long LastEvictionTimestamp;

    public CustomCache1(TimeSpan expireAfterAccessDuration, TimeSpan evictInterval) {
        ExpireAfterAccessDuration = expireAfterAccessDuration;
        EvictInterval = evictInterval;

        EntriesEnumerator = Entries.GetEnumerator();
    }
    public TValue GetOrAdd(TKey key, Func<TKey, TValue> getValue) {
        // Evict expired entries every interval
        long tickCount64 = System.Environment.TickCount64;
        // TickCount64 is in milliseconds, so convert with FromMilliseconds rather than FromTicks
        if (TimeSpan.FromMilliseconds(tickCount64 - LastEvictionTimestamp) >= EvictInterval) {
            LastEvictionTimestamp = tickCount64;
            EvictExpired();
        }

        // Get or add entry
        Entry entry = Entries.GetOrAdd(key, static (TKey key, Func<TKey, TValue> getValue) => {
            return new Entry() {
                Value = getValue(key),
                Timestamp = System.Environment.TickCount64,
            };
        }, getValue);
        // Update access timestamp
        entry.Timestamp = tickCount64;
        return entry.Value;
    }
    public void Evict(TKey key) {
        // TryRemove is ConcurrentDictionary's atomic removal method
        Entries.TryRemove(key, out _);
    }
    public void EvictExpired() {
        lock (EvictExpiredLock) {
            // Get each cache entry
            while (EntriesEnumerator.MoveNext()) {
                (TKey key, Entry entry) = EntriesEnumerator.Current;
                // Evict if expired (TickCount64 is in milliseconds, not 100 ns TimeSpan ticks)
                if (TimeSpan.FromMilliseconds(System.Environment.TickCount64 - entry.Timestamp) >= ExpireAfterAccessDuration) {
                    Entries.TryRemove(key, out _);
                }
            }
            EntriesEnumerator.Reset();
        }
    }
    public void EvictAll() {
        Entries.Clear();
    }

    private sealed class Entry {
        public required TValue Value { get; set; }
        public required long Timestamp { get; set; }
    }
}
Implementation 2
public sealed class CustomCache2<TKey, TValue> where TKey : notnull {
    public TimeSpan ExpireAfterAccessDuration { get; set; }
    public TimeSpan AutoEvictInterval { get; set; }

    private readonly Dictionary<TKey, TValue> Values = [];
    private readonly PriorityQueue<TKey, long> Timestamps = new(new TimestampComparer());
    private readonly Lock Lock = new();
    private long LastEvictionTimestamp = System.Environment.TickCount64;

    public CustomCache2(TimeSpan expireAfterAccessDuration, TimeSpan minimumEvictInterval) {
        ExpireAfterAccessDuration = expireAfterAccessDuration;
        AutoEvictInterval = minimumEvictInterval;
    }
    public TValue GetOrAdd(TKey key, Func<TKey, TValue> getValue) {
        lock (Lock) {
            // Evict expired entries every interval
            long tickCount64 = System.Environment.TickCount64;
            // TickCount64 is in milliseconds, so convert with FromMilliseconds rather than FromTicks
            if (TimeSpan.FromMilliseconds(tickCount64 - LastEvictionTimestamp) > AutoEvictInterval) {
                LastEvictionTimestamp = tickCount64;
                EvictExpired();
            }

            // Try fetch from cache
            if (Values.TryGetValue(key, out TValue? value)) {
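                // Update last access timestamp: remove this key's old queue entry, then re-add it
                // (DequeueEnqueue would remove whichever entry is at the front of the queue instead)
                Timestamps.Remove(key, out _, out _);
                Timestamps.Enqueue(key, System.Environment.TickCount64);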
                return value;
            }

            // Add to cache
            value = getValue(key);
            Values[key] = value;
            Timestamps.Enqueue(key, System.Environment.TickCount64);
            return value;
        }
    }
    public void Evict(TKey key) {
        lock (Lock) {
            Values.Remove(key);
            Timestamps.Remove(key, out _, out _);
        }
    }
    public void EvictExpired() {
        lock (Lock) {
            while (Timestamps.TryPeek(out TKey? key, out long timestamp)) {
                // Get time elapsed since last access (TickCount64 counts milliseconds)
                TimeSpan timeElapsed = TimeSpan.FromMilliseconds(System.Environment.TickCount64 - timestamp);

                // Ensure enough time elapsed
                if (timeElapsed < ExpireAfterAccessDuration) {
                    break;
                }

                // Evict from cache and move on to next entry
                Values.Remove(key);
                Timestamps.Dequeue();
            }
        }
    }
    public void EvictAll() {
        lock (Lock) {
            Values.Clear();
            Timestamps.Clear();
        }
    }

    private readonly struct TimestampComparer : IComparer<long> {
        public int Compare(long x, long y) {
            // PriorityQueue dequeues the lowest priority first, so ascending order keeps the oldest timestamp at the front
            return x.CompareTo(y);
        }
    }
}
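| Method                    | Mean     | Error   | StdDev  | Gen0   | Gen1   | Allocated |
|-------------------------- |---------:|--------:|--------:|-------:|-------:|----------:|
| NoneTest                  | 335.1 μs | 6.22 μs | 8.93 μs | 0.9766 | 0.4883 |    6480 B |
| ExpiringConcurrentLruTest | 271.5 μs | 4.06 μs | 3.39 μs |      - |      - |     337 B |
| Custom1Test               | 288.4 μs | 2.70 μs | 2.11 μs |      - |      - |     337 B |
| Custom2Test               | 299.1 μs | 2.74 μs | 2.29 μs |      - |      - |     337 B |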

The benchmarks for Custom1 and Custom2 include evicting expired entries. Note that BitFaster.Caching/ExpiringConcurrentLru is currently handicapped because it has not yet upgraded from net6.0.

BitFaster.Caching still comes out on top (especially at the cache level), but the benefit of using any cache at all is more significant than the differences between them. Custom1 (the ConcurrentDictionary approach) appears to be faster than Custom2, likely due to reduced locking.
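
As a rough illustration of the ConcurrentDictionary approach, here is a minimal expire-after-access sketch. It is not the CustomCache1 code from this thread: the class name `ExpiringCacheSketch`, the field names, and the injectable `nowMs` clock are all assumptions made so the example is self-contained and testable. `Environment.TickCount64` counts milliseconds, so the idle time is converted with `TimeSpan.FromMilliseconds`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch: names and the injectable clock are assumptions, not part of the PR.
public sealed class ExpiringCacheSketch<TKey, TValue> where TKey : notnull
{
    private sealed class Entry
    {
        public Entry(TValue value, long lastAccessMs)
        {
            Value = value;
            LastAccessMs = lastAccessMs;
        }

        public TValue Value { get; }
        public long LastAccessMs; // milliseconds, read and written via Volatile
    }

    private readonly ConcurrentDictionary<TKey, Entry> _entries = new();
    private readonly TimeSpan _expireAfterAccess;
    private readonly Func<long> _nowMs;

    // Uses Environment.TickCount64 (milliseconds since system start) as the clock.
    public ExpiringCacheSketch(TimeSpan expireAfterAccess)
        : this(expireAfterAccess, () => Environment.TickCount64) { }

    // Accepts a custom millisecond clock, which makes expiry deterministic in tests.
    public ExpiringCacheSketch(TimeSpan expireAfterAccess, Func<long> nowMs)
    {
        _expireAfterAccess = expireAfterAccess;
        _nowMs = nowMs;
    }

    public int Count => _entries.Count;

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        long now = _nowMs();
        // No explicit lock on this path: ConcurrentDictionary handles concurrent access.
        Entry entry = _entries.GetOrAdd(key, k => new Entry(factory(k), now));
        // Refresh the last-access time on every hit.
        Volatile.Write(ref entry.LastAccessMs, now);
        return entry.Value;
    }

    public void EvictExpired()
    {
        long now = _nowMs();
        // Enumerating a ConcurrentDictionary while removing from it is safe.
        foreach (var pair in _entries)
        {
            long idleMs = now - Volatile.Read(ref pair.Value.LastAccessMs);
            if (TimeSpan.FromMilliseconds(idleMs) >= _expireAfterAccess)
            {
                _entries.TryRemove(pair.Key, out _);
            }
        }
    }
}
```

With a one-second expiry, an entry last accessed at 500 ms survives an eviction pass at 1400 ms and is removed by a pass at 1500 ms. Eviction is left to the caller here; a real implementation would also need to decide when `EvictExpired` runs.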

So! I can implement either of these or stick with the BitFaster.Caching dependency.

Benchmarks: stringnamestest.zip

@beicause
Contributor

beicause commented Mar 27, 2025

This is a patch using WeakReference and ConcurrentDictionary as cache:

Details
diff --git a/modules/mono/glue/GodotSharp/GodotSharp/Core/NodePath.cs b/modules/mono/glue/GodotSharp/GodotSharp/Core/NodePath.cs
index 0af640533d..68f7f37c12 100644
--- a/modules/mono/glue/GodotSharp/GodotSharp/Core/NodePath.cs
+++ b/modules/mono/glue/GodotSharp/GodotSharp/Core/NodePath.cs
@@ -1,4 +1,5 @@
 using System;
+using System.Collections.Concurrent;
 using System.Diagnostics.CodeAnalysis;
 using Godot.NativeInterop;
 
@@ -48,8 +49,13 @@ namespace Godot
 
         private WeakReference<IDisposable>? _weakReferenceToSelf;
 
+        private string? _cache_key;
+
+        private static readonly ConcurrentDictionary<string, WeakReference<NodePath>> NodePathCache = new();
+
         ~NodePath()
         {
+            if (_cache_key != null) NodePathCache.TryRemove(_cache_key, out _);
             Dispose(false);
         }
 
@@ -129,16 +135,32 @@ namespace Godot
         }
 
         /// <summary>
-        /// Converts a string to a <see cref="NodePath"/>.
+        /// Converts a <see cref="string"/> to a <see cref="NodePath"/>.<br/>
+        /// The resulting <see cref="NodePath"/> is temporarily cached for future casts.
         /// </summary>
         /// <param name="from">The string to convert.</param>
-        public static implicit operator NodePath(string from) => new NodePath(from);
+        public static implicit operator NodePath(string from)
+        {
+            if (NodePathCache.TryGetValue(from, out WeakReference<NodePath>? weakref) && weakref != null)
+            {
+                if (weakref.TryGetTarget(out NodePath? val) && val != null)
+                {
+                    return val;
+                }
+            }
+            var ret = new NodePath(from)
+            {
+                _cache_key = from
+            };
+            NodePathCache[from] = new(ret);
+            return ret;
+        }
 
         /// <summary>
-        /// Converts this <see cref="NodePath"/> to a string.
+        /// Converts a <see cref="NodePath"/> to a <see cref="string"/>.
         /// </summary>
         /// <param name="from">The <see cref="NodePath"/> to convert.</param>
-        [return: NotNullIfNotNull("from")]
+        [return: NotNullIfNotNull(nameof(from))]
         public static implicit operator string?(NodePath? from) => from?.ToString();
 
         /// <summary>
diff --git a/modules/mono/glue/GodotSharp/GodotSharp/Core/StringName.cs b/modules/mono/glue/GodotSharp/GodotSharp/Core/StringName.cs
index 21d9ada127..aa058e3a41 100644
--- a/modules/mono/glue/GodotSharp/GodotSharp/Core/StringName.cs
+++ b/modules/mono/glue/GodotSharp/GodotSharp/Core/StringName.cs
@@ -1,4 +1,5 @@
 using System;
+using System.Collections.Concurrent;
 using System.Diagnostics.CodeAnalysis;
 using Godot.NativeInterop;
 
@@ -19,8 +20,13 @@ namespace Godot
 
         private WeakReference<IDisposable>? _weakReferenceToSelf;
 
+        private string? _cache_key;
+
+        private static readonly ConcurrentDictionary<string, WeakReference<StringName>> StringNameCache = new();
+
         ~StringName()
         {
+            if (_cache_key != null) StringNameCache.TryRemove(_cache_key, out _);
             Dispose(false);
         }
 
@@ -75,16 +81,32 @@ namespace Godot
         }
 
         /// <summary>
-        /// Converts a string to a <see cref="StringName"/>.
+        /// Converts a <see cref="string"/> to a <see cref="StringName"/>.<br/>
+        /// The resulting <see cref="StringName"/> is temporarily cached for future casts.
         /// </summary>
         /// <param name="from">The string to convert.</param>
-        public static implicit operator StringName(string from) => new StringName(from);
+        public static implicit operator StringName(string from)
+        {
+            if (StringNameCache.TryGetValue(from, out WeakReference<StringName>? weakref) && weakref != null)
+            {
+                if (weakref.TryGetTarget(out StringName? val) && val != null)
+                {
+                    return val;
+                }
+            }
+            var ret = new StringName(from)
+            {
+                _cache_key = from
+            };
+            StringNameCache[from] = new(ret);
+            return ret;
+        }
 
         /// <summary>
-        /// Converts a <see cref="StringName"/> to a string.
+        /// Converts a <see cref="StringName"/> to a <see cref="string"/>.
         /// </summary>
         /// <param name="from">The <see cref="StringName"/> to convert.</param>
-        [return: NotNullIfNotNull("from")]
+        [return: NotNullIfNotNull(nameof(from))]
         public static implicit operator string?(StringName? from) => from?.ToString();
 
         /// <summary>
@@ -95,7 +117,10 @@ namespace Godot
         {
             if (IsEmpty)
                 return string.Empty;
-
+            if (_cache_key != null)
+            {
+                return _cache_key;
+            }
             var src = (godot_string_name)NativeValue;
             NativeFuncs.godotsharp_string_name_as_string(out godot_string dest, src);
             using (dest)

The following two benchmarks were run in release mode, where the improvement from caching is more significant.

// * Summary *

BenchmarkDotNet v0.13.12, EndeavourOS
AMD Ryzen 7 4800U with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.202
  [Host] : .NET 9.0.3 (9.0.325.11113), X64 RyuJIT AVX2

Toolchain=InGodotProcessNoEmitToolchain  

| Method                         | Count | Mean     | Error     | StdDev    | Ratio | RatioSD | Gen0     | Gen1     | Allocated | Alloc Ratio |
|------------------------------- |------ |---------:|----------:|----------:|------:|--------:|---------:|---------:|----------:|------------:|
| NoneTest                       | 10000 | 9.644 ms | 0.2711 ms | 0.7992 ms |  1.00 |    0.00 | 140.6250 | 125.0000 |  960373 B |       1.000 |
| ConcurrentDictionaryStaticTest | 10000 | 1.366 ms | 0.0284 ms | 0.0838 ms |  0.14 |    0.02 |        - |        - |     342 B |       0.000 |
// * Summary *

BenchmarkDotNet v0.13.12, EndeavourOS
AMD Ryzen 7 4800U with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.202
  [Host] : .NET 9.0.3 (9.0.325.11113), X64 RyuJIT AVX2

Toolchain=InGodotProcessNoEmitToolchain  

| Method                         | Count | Mean     | Error     | StdDev    | Ratio | RatioSD | Allocated | Alloc Ratio |
|------------------------------- |------ |---------:|----------:|----------:|------:|--------:|----------:|------------:|
| ReuseTest                      | 10000 | 1.247 ms | 0.0655 ms | 0.1772 ms |  1.00 |    0.00 |     340 B |        1.00 |
| WeakRefTest                    | 10000 | 1.245 ms | 0.0249 ms | 0.0650 ms |  1.01 |    0.13 |     342 B |        1.01 |
| ConcurrentDictionaryStaticTest | 10000 | 1.335 ms | 0.0369 ms | 0.1071 ms |  1.09 |    0.18 |     342 B |        1.01 |
| ExpiringConcurrentLruTest      | 10000 | 1.467 ms | 0.0292 ms | 0.0706 ms |  1.20 |    0.16 |     342 B |        1.01 |

In my opinion, WeakReference caching holds up well compared with time-based caching.

@Joy-less
Contributor Author

@beicause Nice work! One possible problem with weak reference caching is that, if the garbage collector runs often, string names / node paths could be collected and reallocated more often than they should be. I think it could still be the better option, though, since it takes the system's available memory into account.
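
To make that trade-off concrete, here is a minimal weak-reference cache sketch. It is a simplified stand-in and not the patch above: the class name `WeakCacheSketch` and the `GetOrCreate` method are hypothetical, and the cached type is a plain reference type rather than `StringName` or `NodePath`.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch: illustrates weak-reference caching in general, not the Godot patch.
public sealed class WeakCacheSketch<TValue> where TValue : class
{
    private readonly ConcurrentDictionary<string, WeakReference<TValue>> _cache = new();

    public TValue GetOrCreate(string key, Func<string, TValue> factory)
    {
        // Hit: the entry exists and the garbage collector has not reclaimed its target.
        if (_cache.TryGetValue(key, out var weak) && weak.TryGetTarget(out var cached))
        {
            return cached;
        }

        // Miss, or the target was collected: create a new value and cache it weakly.
        TValue created = factory(key);
        _cache[key] = new WeakReference<TValue>(created);
        return created;
    }
}
```

While the caller holds a strong reference, repeated lookups return the same instance. Once the last strong reference is dropped, the next garbage collection may clear the target, and the following lookup allocates again, which is the reallocation concern raised above. This sketch also never removes dead `WeakReference` entries, which a real implementation would have to handle.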

@neikeq
Contributor

neikeq commented Feb 19, 2026

See my comment to discuss other possible solutions: godotengine/godot-proposals#10826 (comment)
