Lab4-2 Performance Optimization #176

boledulab · 2023-12-08T00:36:38Z

boledulab
Dec 8, 2023
Maintainer

This thread is for discussing how to improve Lab4 Caravel FIR, in particular how to have the firmware feed data to the FIR fast enough to match the hardware throughput, i.e. one data output every 11 or 12 T.

nthuyouwei · 2023-12-08T11:56:27Z

nthuyouwei
Dec 8, 2023

Our group's optimization process is as follows:
lab4-2優化過程.pdf

vic9112 · 2023-12-08T12:20:49Z

vic9112
Dec 8, 2023

Here is my improvement after reordering some of the instructions.

FurtherOptimizationVic.pdf

nthuyouwei · 2023-12-08T16:18:34Z

nthuyouwei
Dec 8, 2023

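Thanks to @vic9112 for sharing the approach of reordering the assembly code (you can simply change the order of the corresponding instructions directly in the .hex file). I optimized further: in the original waveform I found that 38000028 was stalling the whole timeline, so I moved it earlier, as shown in the figure below.

The waveform after this optimization is:

With this change, y->x takes only 2 cycles.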

boledulab · 2023-12-09T00:43:15Z

boledulab
Dec 9, 2023
Maintainer Author

It is good that you identified this as a load-to-use case, i.e.
lw a2, 132(a3)
sw a2, 0(a4) => this sw is stalled, which delays the following write of x (sw a5, 128(a3)). This is an in-order processor.
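
To make the stall concrete, here is a minimal sketch of a toy in-order issue model. It is not a model of the actual Caravel core: the `LOAD_LATENCY` of 3 cycles, the `Insn` struct, and the `issue_cycle` helper are all assumptions made for illustration. It only shows why a store that consumes a freshly loaded register holds back every instruction behind it, and why moving the independent write of x ahead of that store helps. Whether the two stores may actually be swapped depends on the addresses involved.

```c
#include <stddef.h>

/* RISC-V register numbers for the registers named in the thread. */
enum { A2 = 12, A3 = 13, A4 = 14, A5 = 15 };

/* ASSUMPTION: a loaded value is usable LOAD_LATENCY cycles after the lw
 * issues; this number is illustrative, not measured on the real core. */
enum { NUM_REGS = 32, LOAD_LATENCY = 3 };

typedef struct {
    int dst;      /* register written, or -1 if none */
    int src1;     /* first source register, or -1    */
    int src2;     /* second source register, or -1   */
    int is_load;  /* nonzero for lw                  */
} Insn;

/* lw a2,132(a3) ; sw a2,0(a4) ; sw a5,128(a3) */
const Insn original_order[3] = {
    { A2, A3, -1, 1 },
    { -1, A2, A4, 0 },
    { -1, A5, A3, 0 },
};

/* lw a2,132(a3) ; sw a5,128(a3) ; sw a2,0(a4) */
const Insn reordered[3] = {
    { A2, A3, -1, 1 },
    { -1, A5, A3, 0 },
    { -1, A2, A4, 0 },
};

/* Returns the cycle in which instruction `which` issues, with the first
 * instruction issuing at cycle 0. Instructions issue strictly in order,
 * one per cycle, and wait until all of their source registers are ready. */
int issue_cycle(const Insn *prog, size_t n, size_t which)
{
    int ready[NUM_REGS] = {0};
    int next_slot = 0;
    for (size_t i = 0; i < n; i++) {
        int t = next_slot;
        if (prog[i].src1 >= 0 && ready[prog[i].src1] > t) t = ready[prog[i].src1];
        if (prog[i].src2 >= 0 && ready[prog[i].src2] > t) t = ready[prog[i].src2];
        if (prog[i].dst >= 0)
            ready[prog[i].dst] = t + (prog[i].is_load ? LOAD_LATENCY : 1);
        if (i == which) return t;
        next_slot = t + 1;
    }
    return -1; /* `which` is out of range */
}
```

Under these assumed latencies, `issue_cycle(original_order, 3, 2)` places the write of x at cycle 4, while `issue_cycle(reordered, 3, 1)` places it at cycle 1, and the dependent store still issues at cycle 3 in both orderings.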

So far, the host firmware is not the bottleneck.
Next, you can hack the FIR HW design to remove the actual FIR calculation, i.e. bypass X to Y, to see how fast the firmware can run. Then set that as the target for optimizing the hardware.

nthuyouwei · 2023-12-14T21:43:01Z

nthuyouwei
Dec 14, 2023

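An update on the current state of the optimization: our original hardware design did not add a layer of FFs (flip-flops), so after finishing each y the hardware had to wait until x was received before it could continue. In that case the limit was 2+14 cycles per data, and we also had to spend time working out which segment, x->y or y->x, should be made faster to match the hardware. Thanks to the report from the NTU students, however, we found that adding a layer of FFs really does allow further optimization. With the FFs, we can also receive x while the computation is in progress, so now we only need to think about how to shorten the distance between smtready. I therefore tried to further reduce the number of cycles between smtready, and in the end reached 12 cycles per data.

The figure below shows the assembly code corresponding to the while loop in the firmware code, along with the waveform.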
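
The effect of the extra register stage can be sketched with a small timing model. This is an illustrative assumption, not the lab's hardware: `Timing`, `finish_serial`, and `finish_buffered` are hypothetical names, and the model reduces each side to a single number (cycles for the firmware to hand over one x, cycles for the hardware to produce one y). Without the register the two costs add up per sample; with it, the next x is loaded while the current y is computed, so the slower side sets the pace.

```c
#include <stdint.h>

typedef struct {
    unsigned fw_gap;      /* cycles the firmware needs to hand over one x */
    unsigned hw_compute;  /* cycles the hardware needs per output y       */
} Timing;

/* No input register: the hardware idles until x arrives, then computes. */
uint64_t finish_serial(Timing t, unsigned n)
{
    uint64_t now = 0;
    for (unsigned i = 0; i < n; i++) {
        now += t.fw_gap;      /* wait for x */
        now += t.hw_compute;  /* compute y  */
    }
    return now;
}

/* One-entry input register: x for the next sample can be written while
 * the current sample is still being computed. */
uint64_t finish_buffered(Timing t, unsigned n)
{
    uint64_t x_ready = t.fw_gap;  /* cycle at which the first x is in the register */
    uint64_t hw_free = 0;         /* cycle at which the hardware becomes idle      */
    uint64_t done = 0;
    for (unsigned i = 0; i < n; i++) {
        uint64_t start = x_ready > hw_free ? x_ready : hw_free;
        done = start + t.hw_compute;
        hw_free = done;
        /* The register is emptied at `start`, so the firmware can refill it
         * during the computation. */
        x_ready = start + t.fw_gap;
    }
    return done;
}
```

With the figures quoted above taken as `fw_gap = 2` and `hw_compute = 14` (one possible reading of "2+14"), `finish_serial` gives 16 cycles per sample. In the buffered model the steady-state interval is the larger of the two values, so further gains come from shortening whichever side is slower.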

boledulab · 2023-12-14T23:47:16Z

boledulab
Dec 14, 2023
Maintainer Author

This is great. Preloading X means the hardware does not need to wait for X. Is a throughput of 12T probably the maximum the hardware can reach?
FYI, the NTU team unrolled the loop by 2 (the code still fits in the instruction cache) to reduce the branch overhead (as attached).
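
As a rough illustration of unrolling by 2 (not the NTU team's code, which is in the attachment), the sketch below handles two samples per loop iteration so the loop branch is taken half as often. `FakeFir`, `fir_write_x`, and `fir_read_y` are hypothetical stand-ins for the memory-mapped X and Y ports. Here they simply pass x through to y so the example runs on a host machine.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL pass-through stand-in for the FIR's X/Y ports. */
typedef struct {
    int32_t  last_x;
    unsigned writes;
    unsigned reads;
} FakeFir;

static void fir_write_x(FakeFir *f, int32_t x) { f->last_x = x; f->writes++; }
static int32_t fir_read_y(FakeFir *f) { f->reads++; return f->last_x; }

/* Rolled loop: one loop iteration (and one loop branch) per sample.
 * Returns the number of loop iterations executed. */
unsigned run_rolled(FakeFir *f, const int32_t *x, int32_t *y, size_t n)
{
    unsigned iterations = 0;
    for (size_t i = 0; i < n; i++) {
        fir_write_x(f, x[i]);
        y[i] = fir_read_y(f);
        iterations++;
    }
    return iterations;
}

/* Unrolled by 2: two samples per loop iteration, plus a tail for odd n. */
unsigned run_unrolled2(FakeFir *f, const int32_t *x, int32_t *y, size_t n)
{
    unsigned iterations = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        fir_write_x(f, x[i]);
        y[i] = fir_read_y(f);
        fir_write_x(f, x[i + 1]);
        y[i + 1] = fir_read_y(f);
        iterations++;
    }
    if (i < n) {  /* leftover sample when n is odd */
        fir_write_x(f, x[i]);
        y[i] = fir_read_y(f);
    }
    return iterations;
}
```

For 8 samples, `run_rolled` executes 8 loop iterations and `run_unrolled2` executes 4. The trade-off is a larger loop body, which is why fitting in the instruction cache matters.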
lab4-2 best result.pptx.pdf
