-
Notifications
You must be signed in to change notification settings - Fork 880
Add speculative loads support #3188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
340290c to
9bc67a2
Compare
9bc67a2 to
04cc91e
Compare
numero-744
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution!
Unfortunately, I cannot review LSU's FSM as I am not familiar enough with it to understand the internal changes. I don't know enough about the TLB either.
Still, I'm wondering: in the execute stage, the branch unit answers immediately combinatorially, so do we need to go to a new state in LSU's FSM?
Is there a maximum frequency regression due to the ALU->BRANCH->LSU path, or is the logic of the FSM shallow enough?
How does the performance of this PR compare with the performance of the fast but wrong implementation where we only remove the stall condition from the issue stage?
Below are minor comments about the code.
Cc @Gchauvon
| output logic commit_lsu_o, | ||
| // Commit buffer of LSU is ready - EX_STAGE | ||
| input logic commit_lsu_ready_i, | ||
| // Transaction id of first commit port - ID_STAGE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment does not match the declaration anymore, we should update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not accurate anymore
core/issue_stage.sv
Outdated
| output logic [5:0] orig_instr_aes_bits | ||
| output logic [5:0] orig_instr_aes_bits, | ||
| // Signals speculative loads to LSU for non-idempotent load handling - EX_STAGE | ||
| output logic speculative_load_o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
speculative_load_o should be inserted just after lsu_valid_o IMO
core/issue_stage.sv
Outdated
| .rvfi_rs2_o (rvfi_rs2_o), | ||
| .orig_instr_aes_bits (orig_instr_aes_bits) | ||
| .orig_instr_aes_bits (orig_instr_aes_bits), | ||
| .speculative_load_o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
speculative_load_o should be inserted just after lsu_valid_o IMO
core/cva6.sv
Outdated
| .rvfi_rs2_o (rvfi_rs2), | ||
| .orig_instr_aes_bits (orig_instr_aes) | ||
| .orig_instr_aes_bits (orig_instr_aes), | ||
| .speculative_load_o (speculative_load) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
speculative_load_o should be inserted just after lsu_valid_o IMO
core/cva6.sv
Outdated
| .lsu_commit_i (lsu_commit_commit_ex), // from commit | ||
| .lsu_commit_ready_o (lsu_commit_ready_ex_commit), // to commit | ||
| .commit_tran_id_i (lsu_commit_trans_id), // from commit | ||
| .speculative_load_i (speculative_load), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
speculative_load_i should be inserted just after lsu_valid_i IMO
It is connected to issue stage, not commit stage, right?
| // Signals speculative loads for non-idempotent load handling - ISSUE_STAGE | ||
| input logic speculative_load_i, | ||
| // Result from branch unit - EX_STAGE | ||
| input bp_resolve_t resolved_branch_i, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this new input add a critical path inside the execute stage?
core/ex_stage.sv
Outdated
| .commit_i (lsu_commit_i), | ||
| .commit_ready_o (lsu_commit_ready_o), | ||
| .commit_tran_id_i, | ||
| .speculative_load_i, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
speculative_load_i should be inserted just after lsu_valid_i IMO
It is connected to issue stage, not commit stage, right?
core/load_store_unit.sv
Outdated
|
|
||
| input logic [CVA6Cfg.NrCommitPorts-1:0][CVA6Cfg.TRANS_ID_BITS-1:0] commit_tran_id_i, | ||
| // Signals speculative loads for non-idempotent load handling - ISSUE_STAGE | ||
| input logic speculative_load_i, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
speculative_load_i should be inserted just after lsu_valid_i IMO
It is connected to issue stage, not commit stage, right?
| input logic page_offset_matches_i, | ||
| // Store buffer is empty - STORE_UNIT | ||
| input logic store_buffer_empty_i, | ||
| // Transaction ID of the committing instruction - COMMIT_STAGE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Transaction ID of the committing instruction - COMMIT_STAGE | |
| // Transaction ID of the committing instructions - COMMIT_STAGE |
core/load_unit.sv
Outdated
| end | ||
| assign inflight_stores = (!dcache_wbuffer_not_ni_i || !store_buffer_empty_i); | ||
| assign stall_ni = (inflight_stores || not_commit_time) && (paddr_ni && CVA6Cfg.NonIdemPotenceEn); | ||
| assign stall_ni = (inflight_stores || ¬_commit_time) && (paddr_ni && CVA6Cfg.NonIdemPotenceEn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, the not_commit_time guaranteed that there was no previous instructions as it could be a branch miss or a exception.
Here ¬_commit_time = 1'b0 if the load is the second committing instruction, so there might be an uncommitted instruction before it.
I do not yet understand why is this change needed (e.g.: correctness or performance?) and why it is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this change is not needed for speculative loads and now I also realized that this is wrong, so I'm removing this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'm curious to know whether removing this part of the diff changes performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this part, performance are still the same
It should be possible to use the branch result combinationally to control the load unit FSM without a maximum frequency regression, however this combinational path will be longer than the one in this PR, since here the branch result is only used to update a register inside the lsu_bypass. This PR checks the branch result in the next cycle (and there is a new state in the load unit FSM to do this) so it should have a small performance disadvantage with respect to the incorrect implementation where only the stall condition from the issue stage is removed. However, in my testing I didn’t find any significant performance difference between these two implementations; both achieve around +7% CoreMark/MHz. I also observed the same performance results when comparing this PR to an implementation where the branch result is used to control the load unit FSM combinationally. |
core/ex_stage.sv
Outdated
| lsu_data = fu_data_i[1]; | ||
| lsu_tinst = tinst_i[1]; | ||
| if (CVA6Cfg.SpeculativeSb) begin | ||
| speculative_load = branch_valid_i[0] ? 1'b1 : 1'b0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| speculative_load = branch_valid_i[0] ? 1'b1 : 1'b0; | |
| speculative_load = branch_valid_i[0]; |
Let's avoid the "if true then true else false" tautology
|
LGTM except for the load unit FSM that I cannot review |
|
The LSU looks good to me. Have you done some ASIC or FPGA synthesis ? Have you been able to test it with the linux boot on FPGA ? |
Yes, I tested Linux boot on FPGA and ran an ASIC synthesis to confirm that the critical path is unaffected by these changes. |
|
Great ! Well it's ok for me. |
This PR introduces support for speculative loads, allowing dual issue of a control flow instruction and a load instruction.
Speculative loads improve Coremark performance by up to 7% with no impact on the CVA6 timing.
We have successfully tested the code on Linux and Splash3 benchmarks.