@aswaterman
Collaborator

Dataflow through the PC is an ILP bottleneck. Reduce the critical code path by memoizing the likely next PC.

Shorten code paths through branch instructions by turning uncommon cases into tail calls (which eliminates stack-frame allocations) and by not checking whether the Zca extension is enabled on every execution of a Zca branch.
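A minimal sketch of the tail-call idea, not the code from this PR. The names `branch_eq` and `misaligned_target` are hypothetical, and the example assumes a configuration without Zca, where a taken-branch target with bit 1 set is misaligned. The uncommon case lives in a separate out-of-line function that is invoked in tail position, so the common path has no work left to do after the call; whether the compiler then actually omits the stack frame depends on the compiler and optimization level.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical cold path, kept out of line so its code (here, the exception
// setup) is not part of the hot function's body.
#if defined(__GNUC__)
__attribute__((noinline, cold))
#endif
uint64_t misaligned_target(uint64_t target) {
    (void)target;  // a real handler would record the faulting address
    throw std::runtime_error("misaligned branch target");
}

// Hypothetical handler for a 4-byte branch-if-equal instruction.
// Returns the next PC.
inline uint64_t branch_eq(uint64_t pc, uint64_t rs1, uint64_t rs2, int64_t offset) {
    if (rs1 != rs2)
        return pc + 4;                      // not taken: sequential path
    const uint64_t target = pc + static_cast<uint64_t>(offset);
    if (target & 2)
        return misaligned_target(target);   // uncommon case, in tail position
    return target;                          // taken, aligned
}
```

Called with equal operands and an offset of 8 from `0x1000`, `branch_eq` returns `0x1008`; with an offset of 6 it takes the cold path and throws.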

Dataflow through the PC is an ILP bottleneck.  Reduce the critical code path
by memoizing the likely next PC.

The first time an instruction is cached, the assumption is that the likely
next PC is on the sequential path.  Whenever the instruction is executed and
the next PC was mispredicted, correct it.  This does the right thing for
jumps and heavily biased branches.  The misprediction penalty is low enough
that doing something smarter for less-biased branches is unprofitable.
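The scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the simulator's implementation: `CacheEntry`, `NextPcCache`, `predict`, and `resolve` are hypothetical names, and a hash map stands in for whatever structure the real instruction cache uses.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical cache entry: holds only the memoized next-PC guess.
struct CacheEntry {
    uint64_t next_pc;
};

struct NextPcCache {
    std::unordered_map<uint64_t, CacheEntry> entries;
    uint64_t mispredictions = 0;

    // Return the memoized next PC for `pc`. On first sight of an
    // instruction, assume the sequential path (pc + instruction length).
    uint64_t predict(uint64_t pc, unsigned insn_len) {
        auto it = entries.try_emplace(pc, CacheEntry{pc + insn_len}).first;
        return it->second.next_pc;
    }

    // After the instruction executes, compare the guess with the real
    // next PC and overwrite the guess if it was wrong.
    void resolve(uint64_t pc, uint64_t actual_next_pc) {
        CacheEntry& e = entries.at(pc);
        if (e.next_pc != actual_next_pc) {
            e.next_pc = actual_next_pc;
            ++mispredictions;
        }
    }
};
```

With this shape, a direct jump mispredicts once and is then always predicted correctly, and a heavily biased branch mispredicts only when it leaves and re-enters its usual direction, which matches the behavior the commit message describes.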
It's easier to understand the instret handling in this scheme.

Avoids establishing a stack frame.

All of these are cases where other instructions overlay the reserved
encodings (e.g. C.EBREAK is C.JALR with rs1=x0), so the checks are
redundant.
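To illustrate the overlay point, here is a small hypothetical decoder (`Op` and `decode` are names invented for this sketch, not taken from the codebase). In the RVC encoding, C.EBREAK, C.JALR, and C.ADD share quadrant 2, funct3 = 100, and bit 12 = 1, and are told apart only by the rs1 and rs2 fields. Once the decoder has routed rs1 = x0 to C.EBREAK, a C.JALR handler can never observe rs1 = x0, so a check for that case inside the handler would be dead code.

```cpp
#include <cstdint>

enum class Op { C_EBREAK, C_JALR, C_ADD, OTHER };

// Decode only the group that shares op=10, funct3=100, bit12=1.
Op decode(uint16_t insn) {
    if ((insn & 0xf003) != 0x9002)
        return Op::OTHER;                    // some other compressed encoding
    const unsigned rs1 = (insn >> 7) & 0x1f; // bits 11:7
    const unsigned rs2 = (insn >> 2) & 0x1f; // bits 6:2
    if (rs2 != 0)
        return Op::C_ADD;
    if (rs1 == 0)
        return Op::C_EBREAK;                 // overlays C.JALR with rs1=x0
    return Op::C_JALR;                       // rs1 is guaranteed nonzero here
}
```

For example, `0x9002` decodes as C.EBREAK and `0x9082` (C.JALR with rs1 = x1) decodes as C.JALR.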
@aswaterman aswaterman merged commit 34a42c3 into master Nov 19, 2025
3 checks passed
@aswaterman aswaterman deleted the speed-up-fetch branch November 19, 2025 01:22