Description
The initial thread was started by @alpencolt in #12361.
Initial performance test results, generated by @alpencolt:
https://gist.github.com/alpencolt/0580af0be86e49bb9d89508dabcd8615
During the arm32 performance investigation we found that one source of performance degradation is the use of data memory barriers (dmb). Note that on arm32 barriers are emitted for accesses to volatile variables, and they are also present in the atomic memory access functions.
For example, __sync_val_compare_and_swap(value, comp_val, new_val)
implementation for armv7 looks like:
sub sp, sp, #8
movs r3, #1
add r1, sp, #4
str r3, [sp, #4]
movs r3, #0
dmb ish
.L2:
ldrex r2, [r1]
cmp r2, #5
bne .L3
strex r0, r3, [r1]
cmp r0, #0
bne .L2
.L3:
dmb ish
add sp, sp, #8
bx lr
At the same time, for x64 we have:
mov QWORD PTR [rsp-8], 1
xor edx, edx
mov eax, 5
lock cmpxchg QWORD PTR [rsp-8], rdx
ret
We also compared test results with and without the COMPlus_JitNoMemoryBarriers flag set.
For example:
https://github.com/dotnet/performance/tree/master/src/benchmarks/micro/corefx/System.Collections/Concurrent
System.Collections.Concurrent.Count
Results running with COMPlus_JitNoMemoryBarriers = "":
[2019/08/08 11:33:37][INFO] | Method | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
[2019/08/08 11:33:37][INFO] |------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------------:|------------:|------------:|--------------------:|
[2019/08/08 11:33:37][INFO] | Stack | 512 | 1.103 us | 0.0014 us | 0.0012 us | 1.103 us | 1.102 us | 1.106 us | - | - | - | - |
Results running with COMPlus_JitNoMemoryBarriers = 1:
[2019/08/08 11:49:19][INFO] | Method | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
[2019/08/08 11:49:19][INFO] |------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------------:|------------:|------------:|--------------------:|
[2019/08/08 11:49:19][INFO] | Stack | 512 | 1.047 us | 0.0005 us | 0.0005 us | 1.046 us | 1.046 us | 1.047 us | - | - | - | - |
System.Collections.Concurrent.IsEmpty
Results running with COMPlus_JitNoMemoryBarriers = "":
[2019/08/08 12:01:06][INFO] | Method | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
[2019/08/08 12:01:06][INFO] |------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------------:|------------:|------------:|--------------------:|
[2019/08/08 12:01:06][INFO] | Stack | 0 | 62.26 ns | 0.8814 ns | 0.7360 ns | 61.98 ns | 61.78 ns | 63.90 ns | - | - | - | - |
[2019/08/08 12:01:06][INFO] | Stack | 512 | 67.73 ns | 4.5348 ns | 5.2223 ns | 65.76 ns | 63.02 ns | 76.57 ns | - | - | - | - |
Results running with COMPlus_JitNoMemoryBarriers = 1:
[2019/08/08 12:08:37][INFO] | Method | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
[2019/08/08 12:08:37][INFO] |------- |----- |----------:|----------:|----------:|----------:|----------:|---------:|------------:|------------:|------------:|--------------------:|
[2019/08/08 12:08:37][INFO] | Stack | 0 | 0.9811 ns | 0.0621 ns | 0.0581 ns | 0.9774 ns | 0.8880 ns | 1.080 ns | - | - | - | - |
[2019/08/08 12:08:37][INFO] | Stack | 512 | 0.9913 ns | 0.0864 ns | 0.0809 ns | 0.9951 ns | 0.8675 ns | 1.124 ns | - | - | - | - |
category:cq
theme:barriers
skill-level:expert
cost:medium