[AArch64] Adjust cache location for PC relative branches #7838

jim-saxman · 2017-05-12T16:15:55Z

AArch64 supports up to +/-128MB PC relative branches with 26-bit
immediates in the B and BL instructions, so make Hot, Prof, Main,
and Cold caches 128MB (total).

No new failures in the 'all' unit test suite, nor in the oss-
performance benchmarking suite. When combined with relocation, it
produces nice gains in oss.


jim-saxman · 2017-05-12T16:18:15Z

@mxw please review.
@swalk-cavium This patch goes well with relocation, #7661. Care to test it?

swalk-cavium · 2017-05-12T16:23:45Z

@jim-saxman - Is this branch complete? Or do I have to do the cherry-picking business on top of #7661.

jim-saxman · 2017-05-12T16:26:50Z

@swalk-cavium It has no dependencies on #7661, so it will apply cleanly either way. However, you won't see any gains if you don't also have #7661 in your tree.

swalk-cavium · 2017-05-12T16:36:30Z

@jim-saxman - does this remove the restriction on retranslateAll()?

jim-saxman · 2017-05-12T16:43:44Z

@swalk-cavium No, AFAIK retranslateAll() is still broken on ARM.

mxw · 2017-05-12T17:11:49Z

We definitely can't just quit if the code size is too big—some applications might need all that code.

jim-saxman · 2017-05-12T17:19:47Z

hphp/runtime/vm/jit/code-cache.cpp

@@ -97,7 +98,14 @@ CodeCache::CodeCache()
    exit(1);


@mxw I tried to copy the functionality from here. Do you recommend printing the warning and not calling exit(1), or just removing this entire block?

That code quits if the code size is too small.

More generally, I don't think we should be altering these defaults on ARM, since your choice of platform isn't going to alter the reality of your codebase's size. We should be looking for ways to optimize in spite of the direct jump size restrictions.

Ok, I can accept that.

I've found that instead of shrinking the sizes of the various caches to fit within 128MB, I can change the memory mapping to place ACold before AProf. This allows direct jumps from AHot to reach ACold, and keeps the significant gains for OSS-mediawiki (in conjunction with #7661). Would an ARM specific change to the cache's layout be acceptable?
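To illustrate why the ordering matters, here is a minimal sketch that computes the worst-case distance between two regions laid out back to back. The `Region` type, the `worstCaseSpan` helper, and the sizes used in the usage note are hypothetical and chosen only for illustration; they are not HHVM's actual defaults or code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one code region; sizes are in bytes.
struct Region {
  std::string name;
  uint64_t size;
};

constexpr uint64_t kMiB = 1024 * 1024;

// Regions are assumed to be mapped contiguously in the given order.
// Returns the distance from the lowest start to the highest end of
// regions `a` and `b`, i.e. the longest jump between the two.
inline uint64_t worstCaseSpan(const std::vector<Region>& layout,
                              const std::string& a, const std::string& b) {
  uint64_t offset = 0;
  uint64_t lo = UINT64_MAX;
  uint64_t hi = 0;
  int found = 0;
  for (const auto& r : layout) {
    if (r.name == a || r.name == b) {
      lo = std::min(lo, offset);
      hi = std::max(hi, offset + r.size);
      ++found;
    }
    offset += r.size;
  }
  assert(found == 2 && "both regions must appear exactly once");
  return hi - lo;
}
```

With illustrative sizes of 4 MiB (hot), 60 MiB (main), 64 MiB (prof), and 24 MiB (cold), the order hot, main, prof, cold gives a hot-to-cold span of 152 MiB, which exceeds the 128MB branch range, while hot, main, cold, prof gives 88 MiB, which fits.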

Whoa, sorry @jim-saxman; I must have missed the notification (or maybe it got dropped) for this comment.

That seems fine to me, although if the user configures HHVM to have larger code sizes than the defaults (which I suspect we may do internally), then you'll still run into the performance degradation with far jumps. There might be reasons not to do it that I'm not aware of, but if there are, someone will raise them once we see the PR.

cc-ing @dave-estes, @cmuellner, @swalk-cavium also—On the subject of avoiding long jumps: I think the most important thing is for hot and main to be within near jump distance of one another. Jumps to cold and frozen are supposed to be rare, so the cost penalty should be relatively insignificant in practice—though, of course, the far jump is a much bigger instruction sequence, which does still have a passive impact on icache behavior.

One idea we bounced around internally when discussing this issue on ARM was to dynamically allocate blocks of far jump trampolines which were always within a near jump of hot and main. Then, in the code we emit, we can just always emit a near jump, and then proxy through the trampoline if necessary. The static relocator could probably handle this, and it would obviate the dependency on dynamic relocation (though smashable jumps would still require dynamic relocation to shorten them, I'd expect). (Are calls PC-relative and limited in immediate size, also? Even if main itself grew too big, we could partition main into chunks with trampoline areas in between, and the same solution would work if we called across chunks in main.)
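A rough sketch of the trampoline idea, modelling only the address bookkeeping and not the emitted machine code. `TrampolinePool`, `route`, and the 16-byte slot size are assumptions made for this example (16 bytes would fit a load-literal, an indirect branch, and a 64-bit target); none of this is an existing HHVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

constexpr int64_t kNearReach = int64_t{128} << 20;  // B/BL: +/-128MB
constexpr uint64_t kSlotSize = 16;  // assumed size of one far-jump stub

// True if a near branch at `from` can encode `to` directly.
inline bool isNear(uint64_t from, uint64_t to) {
  const int64_t delta = static_cast<int64_t>(to - from);
  return delta >= -kNearReach && delta < kNearReach;
}

// Hypothetical pool of far-jump stubs placed within near range of the
// code that branches through it.
class TrampolinePool {
 public:
  TrampolinePool(uint64_t base, uint64_t capacity)
      : m_next(base), m_end(base + capacity) {}

  // Returns the address a near branch at `from` should encode so that
  // control ends up at `target`: the target itself when it is in
  // range, otherwise a stub that would hold the far jump.
  uint64_t route(uint64_t from, uint64_t target) {
    if (isNear(from, target)) return target;
    auto it = m_slots.find(target);
    if (it == m_slots.end()) {
      assert(m_next + kSlotSize <= m_end && "trampoline pool exhausted");
      it = m_slots.emplace(target, m_next).first;
      m_next += kSlotSize;
    }
    assert(isNear(from, it->second) && "pool must be near the caller");
    return it->second;
  }

  std::size_t used() const { return m_slots.size(); }

 private:
  uint64_t m_next;
  uint64_t m_end;
  std::unordered_map<uint64_t, uint64_t> m_slots;  // target -> stub
};
```

Reusing one stub per distinct target keeps the pool small. A real implementation would also have to write the stub's instructions and deal with smashable jumps, which this sketch leaves out.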

This is an area with substantial potential for perf improvement, so I'm interested in hearing other thoughts or ideas on the subject you all might have.

@mxw - Hi Max, et al. In my previous life we called those trampolines branch islands. The linker would create a little table between object files for the case you describe above. Is there an easy way to instrument this and capture some data? It sure seems like most of the jumps are nearby, but that could be an artifact of just looking at small-ish tests.

@mxw Calls can be PC-relative with the BL instruction, in which case the range is also +/-128MB, or go through a register using the BLR instruction.

We also have a few branch instructions with smaller immediates:

CBZ/CBNZ <Xt>, <label> Compare and branch to label if register Xt is zero/non-zero. They have a +/-1MB range.

TBZ/TBNZ <Xt>, #<imm>, <label> Test the bit numbered imm in register Xt and branch to label if it is zero/non-zero. They have a +/-32KB range.
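The ranges quoted above follow from the immediate widths: each immediate is signed and counts 4-byte instructions. A small sketch of that arithmetic, where `branchReach` and `canReach` are hypothetical helper names introduced only for this example:

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kKiB = 1024;
constexpr int64_t kMiB = 1024 * kKiB;

// Reach in bytes, in each direction, of a PC-relative branch whose
// signed immediate has `immBits` bits and is scaled by 4.
constexpr int64_t branchReach(unsigned immBits) {
  return (int64_t{1} << (immBits - 1)) * 4;
}

static_assert(branchReach(26) == 128 * kMiB, "B / BL");
static_assert(branchReach(19) == 1 * kMiB, "CBZ / CBNZ");
static_assert(branchReach(14) == 32 * kKiB, "TBZ / TBNZ");

// True if a branch located at `from` can encode the address `to`
// with an immediate of the given width. Both must be 4-byte aligned.
inline bool canReach(uint64_t from, uint64_t to, unsigned immBits) {
  assert(from % 4 == 0 && to % 4 == 0);
  const int64_t delta = static_cast<int64_t>(to - from);
  const int64_t reach = branchReach(immBits);
  return delta >= -reach && delta < reach;
}
```

The forward limit is one instruction short of the backward limit because the immediate is two's complement, so a B at address X reaches X - 128MB but only up to X + 128MB - 4.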

jim-saxman · 2017-05-15T05:38:16Z

hphp/runtime/vm/jit/code-cache.cpp

@@ -97,7 +98,14 @@ CodeCache::CodeCache()
    exit(1);


That code also exit()'s if code size is > 2GiB due to PC-relatives. Perhaps I marked the wrong exit(1)???

Anyway, I've also found that moving aHot around works well, but I'm afraid to do that since I don't know whether __hot_start and __hot_end will be enabled. If they are, aHot could benefit from backward (to lower addresses) PC-relative branches into the native functions between __hot_start and __hot_end.

swalk-cavium · 2017-05-15T15:08:14Z

@jim-saxman - I think I merged #7661 and this change correctly and ran a MOP
with 6 option sets. No new regressions were detected. I ran it on an Ubuntu 16.04
machine.

jim-saxman · 2017-05-15T15:36:52Z

@swalk-cavium Thanks!

This reverts commit d804b9b.

AArch64 supports up to +/-128MB PC relative branches with 26-bit immediates in the B and BL instructions, so move aHot closer to the aCold cache. No new failures in the 'all' unit test suite, nor in the oss-performance benchmarking suite. When combined with relocation, it produces nice gains in oss.

hhvm-bot · 2017-05-24T17:23:11Z

@jim-saxman updated the pull request - view changes

jim-saxman · 2017-05-24T23:48:11Z

Hi @mxw, I've updated this PR by moving the Hot cache to between Main and Cold.

However, Dave and I still have a concern. Why are we running code from Cold often enough that this really makes such a difference? Specifically, is something being incorrectly placed in Cold with the hint "Unlikely" ?

I used #7703 from @swalk-cavium to dump the caches during the OSS/Mediawiki benchmark, and under 3% of the total cycles are spent in Cold. I then filtered by the addresses of the cold cache using the -a and -A flags, and the top hitter in Cold is part of the Hooks::run PHP function (from the /tmp/hhvm-nginxxUDocX/mediawiki-1.28.0/includes/Hooks.php file, line 131). I think, but I'm not sure, that the foreach on line 132 is in Hot, as is the return true on line 207; however, the foreach's body from lines 133 to 205 is placed into Cold. This is true on ARM and x64. Why? I was expecting to find an exception handler or something else "unlikely", but finding the body of a foreach loop confuses me. Any light you can shine would be greatly appreciated!

hhvm-bot · 2017-05-26T19:31:47Z

@mxw has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

jim-saxman · 2017-08-24T18:49:08Z

@mxw, Hi. What's the status on this patch? Would you like it changed in some fashion? I just re-benchmarked it last night and it still provides solid gains across OSS, especially Mediawiki and Drupal7.

jim-saxman · 2018-03-27T02:49:53Z

Hi @mxw Do I have a chance of getting this PR accepted? Are there any changes you'd like me to make?

hhvm-bot added GH Review: review-needed CLA Signed labels May 12, 2017

mxw self-assigned this May 12, 2017

mxw added GH Review: needs-revision and removed GH Review: review-needed labels May 12, 2017


jim-saxman added 2 commits May 24, 2017 12:56

Revert "[AArch64] Adjust cache sizes for PC relative branches"

64ae80e

This reverts commit d804b9b.

hhvm-bot added GH Review: review-needed and removed GH Review: needs-revision labels May 24, 2017

jim-saxman changed the title ~~[AArch64] Adjust cache sizes for PC relative branches~~ [AArch64] Adjust cache localtion for PC relative branches May 24, 2017

jim-saxman changed the title ~~[AArch64] Adjust cache localtion for PC relative branches~~ [AArch64] Adjust cache location for PC relative branches May 24, 2017

hhvm-bot added the Import Started label May 26, 2017
