Description
With all the efforts around de-abstraction being put into the JIT of .NET 10 so far, we have made significant improvements on devirtualizing, inlining so that we can optimize all kinds of enumerators in the JIT.
Say we have a generator:
IEnumerable<int> GetNums()
{
yield return 1;
yield return 2;
yield return 3;
}
Today Roslyn produces the following state machine for a generator:
sealed class _M_d__0 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IEnumerator, IDisposable
{
private int _1__state;
private int _2__current;
private int _l__initialThreadId;
int IEnumerator<int>.Current => _2__current;
object IEnumerator.Current => _2__current;
public _M_d__0(int _1__state)
{
this._1__state = _1__state;
_l__initialThreadId = Environment.CurrentManagedThreadId;
}
void IDisposable.Dispose() { }
private bool MoveNext()
{
switch (_1__state)
{
default:
return false;
case 0:
_1__state = -1;
_2__current = 1;
_1__state = 1;
return true;
case 1:
_1__state = -1;
_2__current = 2;
_1__state = 2;
return true;
case 2:
_1__state = -1;
_2__current = 3;
_1__state = 3;
return true;
case 3:
_1__state = -1;
return false;
}
}
bool IEnumerator.MoveNext() => this.MoveNext();
void IEnumerator.Reset() => throw new NotSupportedException();
IEnumerator<int> IEnumerable<int>.GetEnumerator()
{
if (_1__state == -2 && _l__initialThreadId == Environment.CurrentManagedThreadId)
{
_1__state = 0;
return this;
}
return new _M_d__0(0);
}
IEnumerator IEnumerable.GetEnumerator() => ((IEnumerable<int>)this).GetEnumerator();
}
There're two issues in the codegen:
- The branch in
GetEnumerator
makes the method too large to be inlined, resulting in the subsequentMoveNext
call not able to be devirted. Environment.CurrentManagedThreadId
is expensive.
We can inspect the performance with benchmark. In the benchmark I created three variants of the state machine:
WithThreadIdOpts
: The current Roslyn codegenWithThreadIdOptsAndAggressiveInliningOnGetEnumerator
: Put an[MethodImpl(MethodImplOptions.AggressiveInlining)]
onGetEnumerator
WithoutThreadIdOpts
: Remove allCurrentManagedThreadId
related code
The test code in the benchmark is:
var sum = 0;
var e = new _M_d__0(0); // instantiate the state machine, i.e., call GetNums. GetNums() is small so we always inline it in the JIT, hence the JIT can always devirt it regardless of whether it's a call or a direct object instantiation
foreach (var i in e) // GetEnumerator
{
sum += i;
}
return sum;
And here is the benchmark result:
Method | Mean | Error | StdDev | Gen0 | Code Size | Allocated |
---|---|---|---|---|---|---|
WithThreadIdOpts | 129.85 ns | 1.322 ns | 1.236 ns | 0.0041 | 267 B | 64 B |
WithThreadIdOptsAndAggressiveInliningOnGetEnumerator | 119.26 ns | 0.786 ns | 0.735 ns | 0.0041 | 249 B | 64 B |
WithoutThreadIdOpts | 22.52 ns | 0.163 ns | 0.145 ns | 0.0015 | 180 B | 24 B |
We managed to boost the perf by ~4.5x via removing the invalid "optimization"!
The performance gain can also be observed from the JIT codegen: https://www.diffchecker.com/fxc8SJk4, where the left one is WithThreadIdOpts
, while the right one is WithoutThreadIdOpts
.
So here I'm suggesting omitting the CurrentManagedThreadId
code from generators, and instead generating code like:
sealed class _M_d__0 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IEnumerator, IDisposable
{
private int _1__state;
private int _2__current;
int IEnumerator<int>.Current => _2__current;
object IEnumerator.Current => _2__current;
public _M_d__0(int _1__state)
{
this._1__state = _1__state;
}
void IDisposable.Dispose() { }
private bool MoveNext()
{
switch (_1__state)
{
default:
return false;
case 0:
_1__state = -1;
_2__current = 1;
_1__state = 1;
return true;
case 1:
_1__state = -1;
_2__current = 2;
_1__state = 2;
return true;
case 2:
_1__state = -1;
_2__current = 3;
_1__state = 3;
return true;
case 3:
_1__state = -1;
return false;
}
}
bool IEnumerator.MoveNext() => this.MoveNext();
void IEnumerator.Reset() => throw new NotSupportedException();
IEnumerator<int> IEnumerable<int>.GetEnumerator()
{
return new _M_d__0(0);
}
IEnumerator IEnumerable.GetEnumerator() => ((IEnumerable<int>)this).GetEnumerator();
}
Activity