[RyuJIT/ARM32] low performance compared to amd64 investigation, data memory barrier usage. #13482

Open
@viewizard

Initial thread was started by @alpencolt #12361
Initial performance test results generated by @alpencolt
https://gist.github.com/alpencolt/0580af0be86e49bb9d89508dabcd8615

During the arm32 performance investigation we found that one of the sources of performance degradation is data memory barrier usage. Note that on arm32 we emit barriers for volatile variables, and they are also present in the atomic memory access functions.
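To illustrate where such barriers come from, here is a minimal C11 sketch of a release store paired with an acquire load, which is roughly the ordering a .NET volatile write and read provide. The names `flag`, `payload`, `publish` and `consume` are hypothetical and are not taken from the CoreCLR sources.

```c
#include <stdatomic.h>

/* Hypothetical names, for illustration only. */
static atomic_int flag;   /* set to 1 once payload is ready */
static int payload;       /* plain data guarded by flag */

/* Writer: store the data, then publish it with release ordering. */
void publish(int v)
{
    payload = v;
    atomic_store_explicit(&flag, 1, memory_order_release);
}

/* Reader: acquire load of the flag; returns -1 if nothing is published yet. */
int consume(void)
{
    if (atomic_load_explicit(&flag, memory_order_acquire))
        return payload;
    return -1;
}
```

On armv7, GCC and Clang typically lower the release store to `dmb ish` followed by `str`, and the acquire load to `ldr` followed by `dmb ish`, while on amd64 both are plain `mov` instructions. Compiler builtins for atomic read-modify-write operations are bracketed by barriers in the same way.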

For example, the code generated for __sync_val_compare_and_swap(value, comp_val, new_val) on armv7 looks like this:

  sub sp, sp, #8
  movs r3, #1
  add r1, sp, #4
  str r3, [sp, #4]
  movs r3, #0
  dmb ish
.L2:
  ldrex r2, [r1]
  cmp r2, #5
  bne .L3
  strex r0, r3, [r1]
  cmp r0, #0
  bne .L2
.L3:
  dmb ish
  add sp, sp, #8
  bx lr

At the same time, for amd64 we have:

  mov QWORD PTR [rsp-8], 1
  xor edx, edx
  mov eax, 5
  lock cmpxchg QWORD PTR [rsp-8], rdx
  ret
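Both listings are consistent with a C function along the following lines. This is a hypothetical reconstruction from the constants visible in the assembly (initial value 1, comparand 5, new value 0), not the original test source. The function name `cas_demo` and the returned value are assumptions added so that the result can be observed.

```c
/* Hypothetical reconstruction; the original test source is not shown in the issue. */
long cas_demo(void)
{
    long value = 1;  /* stored on the stack: [sp, #4] on armv7, [rsp-8] on amd64 */

    /* Compare value with 5 and, if equal, replace it with 0.
       The builtin is a full barrier and returns the value seen before the operation. */
    return __sync_val_compare_and_swap(&value, 5, 0);
}
```

Because `value` is 1 and the comparand is 5, the swap does not happen and the builtin returns 1. On armv7 the full-barrier semantics show up as the two `dmb ish` instructions around the `ldrex`/`strex` loop, whereas on amd64 the single `lock cmpxchg` already provides the ordering.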

We also compared test results with the COMPlus_JitNoMemoryBarriers flag set and unset.
For example:
https://github.com/dotnet/performance/tree/master/src/benchmarks/micro/corefx/System.Collections/Concurrent

System.Collections.Concurrent.Count

Results running with COMPlus_JitNoMemoryBarriers = "":

[2019/08/08 11:33:37][INFO] | Method | Size |     Mean |     Error |    StdDev |   Median |      Min |      Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
[2019/08/08 11:33:37][INFO] |------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------------:|------------:|------------:|--------------------:|
[2019/08/08 11:33:37][INFO] |  Stack |  512 | 1.103 us | 0.0014 us | 0.0012 us | 1.103 us | 1.102 us | 1.106 us |           - |           - |           - |                   - |

Results running with COMPlus_JitNoMemoryBarriers = 1:

[2019/08/08 11:49:19][INFO] | Method | Size |     Mean |     Error |    StdDev |   Median |      Min |      Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
[2019/08/08 11:49:19][INFO] |------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------------:|------------:|------------:|--------------------:|
[2019/08/08 11:49:19][INFO] |  Stack |  512 | 1.047 us | 0.0005 us | 0.0005 us | 1.046 us | 1.046 us | 1.047 us |           - |           - |           - |                   - |

System.Collections.Concurrent.IsEmpty

Results running with COMPlus_JitNoMemoryBarriers = "":

[2019/08/08 12:01:06][INFO] | Method | Size |     Mean |     Error |    StdDev |   Median |      Min |      Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
[2019/08/08 12:01:06][INFO] |------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------------:|------------:|------------:|--------------------:|
[2019/08/08 12:01:06][INFO] |  Stack |    0 | 62.26 ns | 0.8814 ns | 0.7360 ns | 61.98 ns | 61.78 ns | 63.90 ns |           - |           - |           - |                   - |
[2019/08/08 12:01:06][INFO] |  Stack |  512 | 67.73 ns | 4.5348 ns | 5.2223 ns | 65.76 ns | 63.02 ns | 76.57 ns |           - |           - |           - |                   - |

Results running with COMPlus_JitNoMemoryBarriers = 1:

[2019/08/08 12:08:37][INFO] | Method | Size |      Mean |     Error |    StdDev |    Median |       Min |      Max | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
[2019/08/08 12:08:37][INFO] |------- |----- |----------:|----------:|----------:|----------:|----------:|---------:|------------:|------------:|------------:|--------------------:|
[2019/08/08 12:08:37][INFO] |  Stack |    0 | 0.9811 ns | 0.0621 ns | 0.0581 ns | 0.9774 ns | 0.8880 ns | 1.080 ns |           - |           - |           - |                   - |
[2019/08/08 12:08:37][INFO] |  Stack |  512 | 0.9913 ns | 0.0864 ns | 0.0809 ns | 0.9951 ns | 0.8675 ns | 1.124 ns |           - |           - |           - |                   - |
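To put the tables above in relative terms, the following sketch divides the mean with barriers by the mean without them. The helper name `slowdown` and the struct layout are hypothetical, and the numbers are copied from the Mean columns above.

```c
#include <stdio.h>

/* Ratio of mean time with barriers to mean time without; the units cancel. */
double slowdown(double mean_with_barriers, double mean_without_barriers)
{
    return mean_with_barriers / mean_without_barriers;
}

/* One row per benchmark case, using the Mean values reported above. */
struct result {
    const char *name;
    double with_barriers;     /* COMPlus_JitNoMemoryBarriers = "" */
    double without_barriers;  /* COMPlus_JitNoMemoryBarriers = 1  */
};

static const struct result results[] = {
    { "Count, Size 512 (us)",   1.103, 1.047  },
    { "IsEmpty, Size 0 (ns)",   62.26, 0.9811 },
    { "IsEmpty, Size 512 (ns)", 67.73, 0.9913 },
};

/* Print the slowdown factor attributable to barriers for each case. */
void print_slowdowns(void)
{
    for (size_t i = 0; i < sizeof results / sizeof results[0]; i++)
        printf("%-24s %.2fx\n", results[i].name,
               slowdown(results[i].with_barriers, results[i].without_barriers));
}
```

By this measure Count is only about 5% slower with barriers, while IsEmpty is slower by a factor of more than 60.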

category:cq
theme:barriers
skill-level:expert
cost:medium
