Make invalidation faster! #4999

@Sonicadvance1


Code invalidation is currently too expensive. This shows up dramatically in Dark Souls: Remastered, which invalidates code ~700 times per second and drags the game's frame rate down badly.

perf top shows where the time goes. GuestToHostMap::Erase and InvalidateGuestCodeRange alone account for ~17% of samples:

  10.02%  FEX                  [.] FEXCore::GuestToHostMap::Erase(FEXCore::Core::CpuStateFrame*, unsigned long, FEXCore::LookupCacheWriteLockToken const&)
   7.39%  FEX                  [.] FEXCore::Context::ContextImpl::InvalidateGuestCodeRange(FEXCore::Core::InternalThreadState*, std::vector<std::vector<unsigned long, fextl::FEXAlloc<unsigned long> >, fextl::FEXAlloc<std::vector<unsigned long, fextl::FEXAlloc<unsigned long> > > >&, unsigned long, unsigned long)
   4.78%  [kernel]             [k] unmap_page_range
   3.99%  [kernel]             [k] __wake_up_common_lock
   3.23%  [kernel]             [k] el0_svc
   2.07%  [kernel]             [k] mas_walk
   1.48%  FEX                  [.] FEXCore::IR::ConstrainedRAPass::Run(FEXCore::IR::IREmitter*)
   0.99%  [kernel]             [k] get_random_u16
   0.95%  [kernel]             [k] try_to_wake_up
   0.79%  libgcc_s.so.1        [.] 0x0000000000009a14
   0.76%  [kernel]             [k] __fget_light
   0.74%  FEX                  [.] unsigned long FEX::HLE::SyscallPassthrough3<212>(FEXCore::Core::CpuStateFrame*, unsigned long, unsigned long, unsigned long) requires (212)!=(-(1))

FEX's own stats show it just as clearly: 670 SMC events and ~185 ms of signal time in a one-second sample.

Top 12 threads executing
[                                                ]: 0.03% (0 ms/S, 317600 cycles)
[                                                ]: 0.40% (4 ms/S, 4051390 cycles)
[                                                ]: 0.42% (4 ms/S, 4187150 cycles)
[                                                ]: 0.49% (4 ms/S, 4900050 cycles)
[                                                ]: 0.55% (5 ms/S, 5556570 cycles)
[                                                ]: 0.61% (6 ms/S, 6092310 cycles)
[                                                ]: 0.65% (6 ms/S, 6529080 cycles)
[                                                ]: 0.69% (6 ms/S, 6887220 cycles)
[                                                ]: 0.76% (7 ms/S, 7631600 cycles)
[▁                                               ]: 2.59% (25 ms/S, 25927000 cycles)
[▂                                               ]: 3.64% (36 ms/S, 36478180 cycles)
[█████████▅                                      ]: 26.69% (267 ms/S, 267222900 cycles)
Total (1000 millisecond sample period):
       JIT Time: 192.448310 ms/second (1.60 percent)
    Signal Time: 184.736040 ms/second (1.54 percent)
     SIGBUS Cnt: 1 (1.001366 per second)
        SMC Cnt: 670
  Softfloat Cnt: 0
  CacheMiss Cnt: 12524 (12541.103626 per second)
    $RDLck Time: 1.824620 ms/second (0.02 percent)
    $WRLck Time: 0.907340 ms/second (0.01 percent)
        JIT Cnt: 2669 (2672.644968 percent)
FEX JIT Load: 3.138916 (cycles: 377184350)

Total FEX Anon memory resident: 479 MiB
    JIT resident:             78 MiB
    OpDispatcher resident:    78 MiB
    Frontend resident:        18 MiB
    CPUBackend resident:      884 KiB
    Lookup cache resident:    0 KiB
    Lookup L1 cache resident: 42 MiB
    ThreadStates resident:    544 KiB
    BlockLinks resident:      13 MiB
          Misc resident:      23 MiB
    JEMalloc resident:        0 KiB
    Unaccounted resident:     223 MiB
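
A quick back-of-envelope pass over the one-second sample above helps size the problem. The sketch below is hypothetical (SampleStats and the helper names are not part of FEX). It assumes all Signal Time is spent on SMC faults, ignoring the single SIGBUS, so the per-SMC figure is an upper bound on the signal-side cost of each event. Because the sample period is 1000 ms, the raw counts are treated as per-second rates.

```cpp
#include <cassert>

// Hypothetical helper, not part of FEX: fields are copied by hand from the
// one-second FEX stats sample in this issue.
struct SampleStats {
  double signal_ms_per_s;  // "Signal Time" (ms per second)
  double smc_count;        // "SMC Cnt" over the 1000 ms sample
  double jit_ms_per_s;     // "JIT Time" (ms per second)
  double jit_count;        // "JIT Cnt" over the 1000 ms sample
};

// Average signal-handling time per SMC event, in microseconds.
// Assumes every millisecond of signal time belongs to an SMC fault.
double SignalMicrosPerSmc(const SampleStats &s) {
  assert(s.smc_count > 0);
  return s.signal_ms_per_s * 1000.0 / s.smc_count;
}

// Average JIT time per compiled block, in microseconds.
double JitMicrosPerCompile(const SampleStats &s) {
  assert(s.jit_count > 0);
  return s.jit_ms_per_s * 1000.0 / s.jit_count;
}

// SMC events per rendered frame at a given frame rate.
// The sample period is 1000 ms, so the count approximates a per-second rate.
double SmcPerFrame(const SampleStats &s, double fps) {
  assert(fps > 0);
  return s.smc_count / fps;
}
```

With SampleStats s{184.736040, 670, 192.448310, 2669}, this gives roughly 276 µs of signal time per SMC event, roughly 72 µs per JIT compile, and about 11 SMC events per frame at 60 FPS.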

To reproduce, run Dark Souls: Remastered, or write a benchmark that triggers ~700 invalidations per second.
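
A benchmark along those lines could look like the minimal sketch below, built as an x86-64 guest binary and run under FEX. RunSmcLoop is a hypothetical name, not anything in FEX. It writes mov eax, imm32; ret into an anonymous RWX page, calls it, then re-patches the immediate at a fixed rate. Each patch writes to code that has already executed, so, assuming FEX's SMC detection catches writes to executed pages (as the SMC Cnt above suggests), every iteration should force an invalidation. It also assumes the host allows W+X mappings.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <thread>

#if !defined(__x86_64__)
#error "This sketch emits x86-64 machine code; build it for x86-64 and run it under FEX."
#endif

// Hypothetical repro helper: repeatedly patches and calls a tiny JIT'd function.
// per_second <= 0 means "no pacing" (run as fast as possible).
// Returns how many calls returned the freshly patched value, or -1 if mmap fails.
int RunSmcLoop(int iterations, int per_second) {
  long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  void *mem = mmap(nullptr, static_cast<size_t>(page),
                   PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return -1;

  auto *code = static_cast<uint8_t *>(mem);
  code[0] = 0xB8;  // mov eax, imm32 (imm32 lives in bytes 1..4)
  code[5] = 0xC3;  // ret
  using Fn = int (*)();
  Fn fn = reinterpret_cast<Fn>(code);

  const auto period = per_second > 0
                          ? std::chrono::microseconds(1'000'000 / per_second)
                          : std::chrono::microseconds(0);
  auto next = std::chrono::steady_clock::now();

  int ok = 0;
  for (int i = 0; i < iterations; ++i) {
    int32_t imm = i;
    // Rewriting already-executed code is the self-modifying-code event.
    std::memcpy(code + 1, &imm, sizeof(imm));
    __builtin___clear_cache(reinterpret_cast<char *>(code),
                            reinterpret_cast<char *>(code + 6));
    if (fn() == imm) ++ok;
    next += period;
    std::this_thread::sleep_until(next);  // pace to ~per_second patches
  }

  munmap(mem, static_cast<size_t>(page));
  return ok;
}
```

For example, RunSmcLoop(7000, 700) should run for about 10 seconds at ~700 patches per second, which ought to show up as an SMC Cnt near 700 in FEX's stats.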
