Skip to content

Performance: Consider removing the CurrentManagedThreadId "optimization" for generators #76962

Open
@hez2010

Description

@hez2010

With all the efforts around de-abstraction being put into the JIT of .NET 10 so far, we have made significant improvements on devirtualizing, inlining so that we can optimize all kinds of enumerators in the JIT.

Say we have a generator:

IEnumerable<int> GetNums()
{
    yield return 1;
    yield return 2;
    yield return 3;
}

Today Roslyn produces the following state machine for a generator:

sealed class _M_d__0 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IEnumerator, IDisposable
{
    private int _1__state;
    private int _2__current;
    private int _l__initialThreadId;

    int IEnumerator<int>.Current =>  _2__current;

    object IEnumerator.Current => _2__current;

    public _M_d__0(int _1__state)
    {
        this._1__state = _1__state;
        _l__initialThreadId = Environment.CurrentManagedThreadId;
    }

    void IDisposable.Dispose() { }

    private bool MoveNext()
    {
        switch (_1__state)
        {
            default:
                return false;
            case 0:
                _1__state = -1;
                _2__current = 1;
                _1__state = 1;
                return true;
            case 1:
                _1__state = -1;
                _2__current = 2;
                _1__state = 2;
                return true;
            case 2:
                _1__state = -1;
                _2__current = 3;
                _1__state = 3;
                return true;
            case 3:
                _1__state = -1;
                return false;
        }
    }

    bool IEnumerator.MoveNext() => this.MoveNext();

    void IEnumerator.Reset() => throw new NotSupportedException();

    IEnumerator<int> IEnumerable<int>.GetEnumerator()
    {
        if (_1__state == -2 && _l__initialThreadId == Environment.CurrentManagedThreadId)
        {
            _1__state = 0;
            return this;
        }
        return new _M_d__0(0);
    }

    IEnumerator IEnumerable.GetEnumerator() => ((IEnumerable<int>)this).GetEnumerator();
}

There're two issues in the codegen:

  1. The branch in GetEnumerator makes the method too large to be inlined, resulting in the subsequent MoveNext call not able to be devirted.
  2. Environment.CurrentManagedThreadId is expensive.

We can inspect the performance with benchmark. In the benchmark I created three variants of the state machine:

  • WithThreadIdOpts: The current Roslyn codegen
  • WithThreadIdOptsAndAggressiveInliningOnGetEnumerator: Put an [MethodImpl(MethodImplOptions.AggressiveInlining)] on GetEnumerator
  • WithoutThreadIdOpts: Remove all CurrentManagedThreadId related code

The test code in the benchmark is:

var sum = 0;
var e = new _M_d__0(0); // instantiate the state machine, i.e., call GetNums. GetNums() is small so we always inline it in the JIT, hence the JIT can always devirt it regardless of whether it's a call or a direct object instantiation
foreach (var i in e) // GetEnumerator
{
    sum += i;
}
return sum;

And here is the benchmark result:

Method Mean Error StdDev Gen0 Code Size Allocated
WithThreadIdOpts 129.85 ns 1.322 ns 1.236 ns 0.0041 267 B 64 B
WithThreadIdOptsAndAggressiveInliningOnGetEnumerator 119.26 ns 0.786 ns 0.735 ns 0.0041 249 B 64 B
WithoutThreadIdOpts 22.52 ns 0.163 ns 0.145 ns 0.0015 180 B 24 B

We managed to boost the perf by ~4.5x via removing the invalid "optimization"!

The performance gain can also be observed from the JIT codegen: https://www.diffchecker.com/fxc8SJk4, where the left one is WithThreadIdOpts, while the right one is WithoutThreadIdOpts.

So here I'm suggesting omitting the CurrentManagedThreadId code from generators, and instead generating code like:

sealed class _M_d__0 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IEnumerator, IDisposable
{
    private int _1__state;
    private int _2__current;

    int IEnumerator<int>.Current =>  _2__current;

    object IEnumerator.Current => _2__current;

    public _M_d__0(int _1__state)
    {
        this._1__state = _1__state;
    }

    void IDisposable.Dispose() { }

    private bool MoveNext()
    {
        switch (_1__state)
        {
            default:
                return false;
            case 0:
                _1__state = -1;
                _2__current = 1;
                _1__state = 1;
                return true;
            case 1:
                _1__state = -1;
                _2__current = 2;
                _1__state = 2;
                return true;
            case 2:
                _1__state = -1;
                _2__current = 3;
                _1__state = 3;
                return true;
            case 3:
                _1__state = -1;
                return false;
        }
    }

    bool IEnumerator.MoveNext() => this.MoveNext();

    void IEnumerator.Reset() => throw new NotSupportedException();

    IEnumerator<int> IEnumerable<int>.GetEnumerator()
    {
        return new _M_d__0(0);
    }

    IEnumerator IEnumerable.GetEnumerator() => ((IEnumerable<int>)this).GetEnumerator();
}

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Area-CompilersCode Gen QualityRoom for improvement in the quality of the compiler's generated code

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions