Coherent L1 instruction cache #4617
Replies: 7 comments 4 replies
-
No, coherent L1i is not supported in the current master version.
-
So there is no coherent L1i in XiangShan, but Nanhu supports coherent L1i.
Regarding this statement:
"We removed coherency for design and verification difficulty in L2 cache."
Does that mean "no coherent L2 cache for all versions"?
Am I right?
On April 23, 2025, at 4:40 PM, Easton Man <***@***.***> wrote:
No, coherent L1i is not supported in the current master version.
The Nanhu version supports coherent L1i. We removed coherency because of design and verification difficulty in the L2 cache.
From my point of view, there is no difficulty in fetch or L1i to support coherency.
-
Another question about the coherent I$.
The following issue is the reason why we asked this question.
Even if this issue can be relieved by using fence.i for manual cache synchronization, we still want to solve it in hardware (with a coherent L1 I$).
So, from your reply:
Neither XiangShan nor Kunminghu can solve this issue at the hardware level, since they don't have a coherent L1i.
Nanhu may be able to solve it in hardware.
But you said that Nanhu had difficulty with L2 coherency verification.
Did Nanhu finally finish verification of L1i/L2 coherency and release it formally?
Or was this part of the verification not done, so that the L1i/L2 coherency feature is not guaranteed in Nanhu?

On April 23, 2025, at 5:38 PM, Easton Man <***@***.***> wrote:
No no, the L2 cache has coherency on the data side for all versions. By "remove" I mean removing coherency between L1i and L2 (and any other data cache as well). In the Nanhu version we used to support L1i-L2 coherency, but we met difficulties in L2 coherency verification. As a result, in the current Kunminghu version we decided to step back a bit and make sure our L2 is solid.
-
So umm, there is a bit of naming confusion. Let me clarify a bit: Kunminghu and Nanhu are two versions of the XiangShan project.
Nanhu has been checked out for bug fixes and other SoC features since around mid-2022. The verification process was done thoroughly for Nanhu over the past few years, with extensive hard work. Nanhu has been taped out on multiple process nodes and in multiple SoCs, so I think the whole cache system is robust enough (mostly). Nanhu supports hardware L1i coherency. On fence.i, no ICache flush happens.
The current GitHub master branch is for Kunminghu development. Kunminghu development started before Nanhu verification was done. Kunminghu does not support hardware I-D sync for now (L1i is outside the entire coherence tree and requires software fence.i to maintain I-D coherency). Kunminghu features a new L2 and multiple performance improvements all over the core, so for the sake of verification workload, we decided to remove this support when starting development back in 2023.
I am not sure whether we will add hardware I-D sync in the future or not. I know there are multiple use cases where hardware I-D sync is important, such as Chrome (multi-core JIT engine), but our development schedule may prioritize other features (SPEC CPU performance, for example). If you have clear I-D sync use cases, you can discuss them with us here.
-
BTW,
Is there any document covering the power (W/MHz) and area (gate count) of Nanhu and Kunminghu?
We are comparing XiangShan cores with others.
Norman
On April 23, 2025, at 7:33 PM, Easton Man <***@***.***> wrote:
So umm, there is a bit of naming confusion. Let me clarify a bit: Kunminghu and Nanhu are two versions of the XiangShan project.
Nanhu has been checked out for bug fixes and other SoC features since around mid-2022. The verification process was done thoroughly for Nanhu over the past few years, with extensive hard work. Nanhu has been taped out on multiple process nodes and in multiple SoCs, so I think the whole cache system is robust enough (mostly). Nanhu supports hardware L1i coherency. On fence.i, no ICache flush happens.
The current GitHub master branch is for Kunminghu development. Kunminghu development started before Nanhu verification was done. Kunminghu does not support hardware I-D sync for now (L1i is outside the entire coherence tree and requires software fence.i to maintain I-D coherency). Kunminghu features a new L2 and multiple performance improvements all over the core, so for the sake of verification workload, we decided to remove this support when starting development back in 2023.
I am not sure whether we will add hardware I-D sync in the future or not. I know there are multiple use cases where hardware I-D sync is important, such as Chrome (multi-core JIT engine), but our development schedule may prioritize other features (SPEC CPU performance, for example). If you have clear I-D sync use cases, you can discuss them with us here.
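For readers unfamiliar with the software path described in the reply above, here is a minimal sketch of what "software fence.i" looks like from a JIT's point of view: write the new instructions, then synchronize before jumping to them. The helper names (`sync_icache`, `sync_icache_local`, `publish_code`) are hypothetical and not part of XiangShan. The sketch assumes a GCC- or Clang-compatible compiler. Note that the RISC-V `fence.i` instruction only synchronizes the hart that executes it, so multi-core code normally relies on the OS or runtime to reach the other harts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: make instructions written to [begin, end) visible
 * to instruction fetch. __builtin___clear_cache is a GCC/Clang builtin;
 * how it reaches other harts depends on the toolchain and OS (assumption:
 * a hosted environment that handles cross-hart synchronization). */
static void sync_icache(void *begin, void *end)
{
    __builtin___clear_cache((char *)begin, (char *)end);
}

#if defined(__riscv)
/* Bare-metal variant: fence.i orders instruction fetch on the executing
 * hart only. Other harts must each execute their own fence.i. */
static inline void sync_icache_local(void)
{
    __asm__ volatile ("fence.i" ::: "memory");
}
#endif

/* Hypothetical helper: copy freshly generated code into a buffer that is
 * assumed to be mapped executable, then synchronize before it is run. */
static void *publish_code(void *dst, size_t cap, const void *src, size_t len)
{
    assert(len <= cap);                   /* refuse to overflow the buffer */
    memcpy(dst, src, len);                /* data-side write of new code   */
    sync_icache(dst, (char *)dst + len);  /* I-D synchronization step      */
    return dst;
}
```

On a core with hardware L1i coherency, as described for Nanhu, the synchronization step would still be required by the ISA but would be cheap; without it, as described for Kunminghu, this step is what keeps instruction fetch from seeing stale code.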
-
I would be grateful for such a contact.
Thanks
On April 24, 2025, at 11:30 AM, Easton Man <***@***.***> wrote:
If you want detailed physical design metrics, I suggest you contact BOSC (Beijing Institute of Open Source Chip).
@Tang-Haojin <https://github.com/Tang-Haojin> Maybe you could provide an email address?
-
I found a paper that gives numbers for the Nanhu implementation on FPGA.

The paper shows the FPGA resource utilization of Nanhu.

Suppose that, roughly, one LUT = 4 gates and one FF = 10 gates.
Based on this assumption, the equivalent gate count of Nanhu is ~9.5M gates.
Since the ARM counterpart of Nanhu is the A76, 9.5M seems to be a reasonable number, provided the FPGA resource utilization figures above are correct.
Is there anything I missed in my estimation?
Norman
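The estimate above can be written as a small helper, a sketch under the same rough assumptions (1 LUT ≈ 4 gates, 1 FF ≈ 10 gates). The function name and constants are hypothetical, and the LUT and FF counts from the paper are not reproduced in this thread, so callers must supply them. The formula counts only LUTs and FFs, so anything mapped to block RAM or DSP slices is not included.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed conversion weights taken from the estimate in this thread. */
enum { GATES_PER_LUT = 4, GATES_PER_FF = 10 };

/* Rough equivalent gate count from FPGA LUT and flip-flop utilization. */
static uint64_t equivalent_gates(uint64_t luts, uint64_t ffs)
{
    /* Guard against overflow for unrealistically large inputs. */
    assert(luts <= UINT64_MAX / GATES_PER_LUT / 2);
    assert(ffs <= UINT64_MAX / GATES_PER_FF / 2);
    return luts * GATES_PER_LUT + ffs * GATES_PER_FF;
}
```

For example, with placeholder inputs of 1000 LUTs and 100 FFs, `equivalent_gates(1000, 100)` gives 5000 gates.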
On April 24, 2025, at 11:26 AM, Easton Man <***@***.***> wrote:
I don't have any precise numbers, but I can provide you with some information about Kunminghu. We are targeting ARM Neoverse N2 in terms of PPA. Currently, Kunminghu has roughly the same performance as Neoverse N2 and remains within a small margin (less than 10%) on power and area metrics. This varies depending on physical design strategies. In some cases, Kunminghu can be better than N2 in power and area.
If you have the foundry PDK and the EDA tools needed, you can run the physical design flow yourself to compare with others on your process node.
-
I am interested to know whether a coherent L1 instruction cache is supported in XiangShan. If XiangShan does not support a coherent L1 cache, is there any other one that supports a coherent L1 cache?