Add matrix address optimization phase for Zba by Alexehv77 · Pull Request #261 · foss-for-synopsys-dwc-arc-processors/gcc

Alexehv77 · 2026-05-26T08:05:09Z

This pass identifies addressing patterns in memory-access-intensive kernels. In such kernels, an index is frequently scaled and added to a base pointer to access memory, followed by an immediate index update. The pass hoists the pointer arithmetic out of the loop header, converting indexed addressing into induction variables.
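As a rough source-level picture of the pattern described above, the two functions below compute the same sum. They are a hand-written illustration, not output of the pass, and the names `sum_indexed` and `sum_pointer` are made up for this sketch. The first recomputes `base + (i << 1)` on every iteration; the second carries the address itself as the induction variable.

```c
#include <stdint.h>

/* Indexed form: each iteration scales the index by the element size
   (2 bytes, i.e. a shift by 1) and adds it to the base pointer, then
   updates the index. */
int64_t sum_indexed(const int16_t *base, uint32_t n)
{
    int64_t sum = 0;
    for (uint32_t i = 0; i < n; i++)
        sum += base[i];
    return sum;
}

/* Pointer-induction form: the scaled address computation is done once
   before the loop, and the pointer is advanced by a constant stride. */
int64_t sum_pointer(const int16_t *base, uint32_t n)
{
    int64_t sum = 0;
    const int16_t *end = base + n;
    for (const int16_t *p = base; p != end; p++)
        sum += *p;
    return sum;
}
```

The rewrite is only obviously safe here because `i` runs from 0 to `n` and never wraps. The pass itself works on RTL, where the same idea has to be established per instruction chain.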

MichielDerhaeg · 2026-05-27T14:53:37Z

+/* This pass identifies addressing patterns in memory-access intensive 
+   kernels. In such kernels an index is frequently scaled and added to a 
+   base pointer to access memory followed by an immediate index update.
+   The pass hoists the pointer arithmetic out of the loop header converting
+   index addressing into induction variables. */
+


On #241 I asked what the point of this pass was. This looks a lot like something ivopts is for. Why make something specialized for RTL?

Alexehv77 · 2026-06-10

For loops with unsigned loop bounds, ivopts cannot do much, because scalar evolution cannot compute the upper iteration bound of the loop: the loop iterator may overflow. Because of this, I tried to fix things at the RTL level.
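A small sketch of why the possible wraparound matters, assuming a 64-bit target where a 32-bit unsigned index is zero-extended before it is scaled. The helper names `offset_after_step` and `offset_naive_stride` are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Offset actually used by the indexed access: the 32-bit index is
   updated modulo 2^32, then zero-extended and scaled. */
uint64_t offset_after_step(uint32_t i, uint32_t step, unsigned shift)
{
    uint32_t next = i + step;            /* wraps modulo 2^32 */
    return (uint64_t)next << shift;
}

/* Offset a naive pointer-induction rewrite would use: the previous
   offset plus a constant stride, computed in 64 bits with no wrap. */
uint64_t offset_naive_stride(uint32_t i, uint32_t step, unsigned shift)
{
    return ((uint64_t)i << shift) + ((uint64_t)step << shift);
}
```

The two agree for ordinary index values, but for `i == UINT32_MAX` and `step == 1` the first yields 0 while the second yields `0x200000000` (with `shift == 1`). Unless wraparound can be ruled out, the indexed address is not a simple linear function of the iteration count.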

While you were trying to use loop versioning at the tree level (where ivopts runs), I approached the performance issues from the RTL level by analyzing the generated assembly and comparing it with what LLVM, and GCC with signed loop bounds, generate.

This phase, combined with the patch from #261, reaches the same performance as if the loop bounds were signed int.

Also note that ivopts is not aware of sh1add.uw and addw, so its cost model cannot accurately determine whether hoisting them is cheaper than leaving them inside the loop.
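For readers less familiar with the two instructions, here is a C model of their RV64 semantics as I understand them from the ISA manuals. The function names are just labels for this sketch.

```c
#include <stdint.h>

/* sh1add.uw rd, rs1, rs2 (Zba, RV64):
   rd = rs2 + (zero-extended low 32 bits of rs1, shifted left by 1). */
uint64_t sh1add_uw(uint64_t rs1, uint64_t rs2)
{
    return rs2 + ((uint64_t)(uint32_t)rs1 << 1);
}

/* addw rd, rs1, rs2 (RV64I):
   rd = sign-extension of the low 32 bits of (rs1 + rs2).
   The uint32_t -> int32_t conversion is implementation-defined before
   C23; GCC and Clang wrap modulo 2^32, which is what is assumed here. */
uint64_t addw(uint64_t rs1, uint64_t rs2)
{
    uint32_t low = (uint32_t)rs1 + (uint32_t)rs2;
    return (uint64_t)(int64_t)(int32_t)low;
}
```

In an indexed loop over 16-bit elements with a 32-bit unsigned index, `addw` typically performs the index update and `sh1add.uw` forms the address, so both sit on the loop's address dependency chain.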

In conclusion, this phase is complementary to ivopts: it targets a specific peephole-like (or combine-like) optimization and can reason about it without the issues triggered by unsigned loop bounds.
Note that combine and peephole themselves won't apply here, as they work on windows of 2-4 instructions, while the pattern I recognize is a dependency chain of several instructions that spans basic-block or even loop boundaries.

Another interesting point is that I allow the optimization only in leaf functions, to decrease the chance of introducing spill code: a leaf function contains no calls, so no caller-saved registers need to be spilled. By contrast, as far as I am aware, ivopts (running at the tree level) does take register pressure into account, but in a simplistic way that can result in a large amount of spill code.

Last but not least, there is a Jira ticket where I proposed another solution for loops with unsigned bounds that doesn't require loop versioning. The approach is rather simple, and I have already implemented a basic tree-level phase for it.

Alexehv77 requested review from MichielDerhaeg and luismgsilva May 26, 2026 08:05

Alexehv77 self-assigned this May 26, 2026

MichielDerhaeg mentioned this pull request May 27, 2026

Riscv hoist address ladder #241

Closed

MichielDerhaeg reviewed May 27, 2026

Alexehv77 requested a review from MichielDerhaeg June 12, 2026 07:14

Alexehv77 wants to merge 1 commit into mem-opt from mem-opt2
