Coherent L1 instruction cache #4617

HsiehTsungHsien · 2025-04-23T08:25:14Z

HsiehTsungHsien
Apr 23, 2025

I am interested to know whether a coherent L1 instruction cache is supported in XiangShan. If XiangShan does not support a coherent L1 instruction cache, is there another one that does?

eastonman · 2025-04-23T08:40:33Z

eastonman
Apr 23, 2025
Collaborator

No, coherent L1i is not supported in the current master version.
The Nanhu version supports coherent L1i. We removed coherency because of design and verification difficulty in the L2 cache.
From my point of view, supporting coherency is not difficult for fetch or for the L1i itself.

0 replies

HsiehTsungHsien · 2025-04-23T09:27:29Z

HsiehTsungHsien
Apr 23, 2025
Author

So there is no coherent L1-i in XiangShan, but Nanhu supports coherent L1-i. Regarding your statement that coherency was removed because of design and verification difficulty in the L2 cache: does that mean there is no coherent L2 cache in any version? Am I right?

1 reply

eastonman Apr 23, 2025
Collaborator

No no, the L2 cache is coherent on the data side in all versions. By "remove" I mean removing coherency between L1i and L2 (and any other data cache as well). In the Nanhu version we supported L1i-L2 coherency, but we ran into difficulties in L2 coherency verification. As a result, in the current Kunminghu version we decided to step back a bit and make sure our L2 is solid.

HsiehTsungHsien · 2025-04-23T11:09:17Z

HsiehTsungHsien
Apr 23, 2025
Author

Another question about coherent I$. The following issue is the reason we asked this question. Even if this issue can be relieved by using fence.i to do manual cache synchronization, we still want to solve it in hardware (with a coherent L1 I$).

So, from your reply, neither XiangShan nor Kunminghu can solve this issue at the hardware level, since they do not have a coherent L1-i. Nanhu may be able to solve it in hardware.

But you said that Nanhu had difficulty with L2 coherency verification. Did Nanhu finally finish the verification of L1-i/L2 coherency and get formally released? Or was this part of the verification not done, so that the L1-i/L2 coherency feature is not guaranteed in Nanhu?

0 replies

eastonman · 2025-04-23T11:33:29Z

eastonman
Apr 23, 2025
Collaborator

So umm, there is a bit of naming confusion. Let me clarify: Kunminghu and Nanhu are two versions of the XiangShan project.

Nanhu has been maintained on its own branch for bug fixes and other SoC features since around mid-2022. Verification has been done thoroughly for Nanhu over the past few years, with extensive hard work. Nanhu has been taped out on multiple process nodes and in multiple SoCs. So I think the whole cache system is robust enough (mostly). Nanhu supports hardware L1i coherency. On fence.i, no ICache flush happens.

The current GitHub master branch is for Kunminghu development. Kunminghu development started before Nanhu verification was done. Kunminghu does not support hardware I-D sync for now (L1i is outside the coherence tree entirely, and software fence.i is required to maintain I-D coherency). Kunminghu features a new L2 and multiple performance improvements all over the core. So, to keep the verification workload manageable, we decided to remove this support when development started back in 2023.

I am not sure whether we will add hardware I-D sync in the future or not. I know there are multiple use cases where hardware I-D sync is important, such as Chrome (multi-core JIT engine), but our development schedule may prioritize other features (SPEC CPU performance, for example). If you have clear I-D sync use cases, you can discuss them with us here.
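
The difference between the two versions described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not XiangShan code: `ToyCore`, `run_jit_sequence`, and the string "instructions" are hypothetical names invented for the illustration, and fence.i is modelled as dropping every instruction-cache line, which is one possible implementation rather than a description of the actual hardware.

```python
class ToyCore:
    """Toy model: one memory plus one instruction cache (assumption, not RTL)."""

    def __init__(self, coherent_icache: bool):
        self.coherent_icache = coherent_icache
        self.memory = {}  # address -> instruction (a plain string in this toy)
        self.icache = {}  # address -> cached copy of the instruction

    def store(self, addr, instr):
        """Data-side store, e.g. a JIT writing freshly generated code."""
        self.memory[addr] = instr
        if self.coherent_icache:
            # Hardware I-D sync: the stale instruction-cache copy is invalidated.
            self.icache.pop(addr, None)

    def fence_i(self):
        """Software sync, modelled here as flushing the whole instruction cache."""
        if not self.coherent_icache:
            self.icache.clear()
        # With a coherent instruction cache no flush is needed.

    def fetch(self, addr):
        """Instruction fetch: hit in the instruction cache, else refill."""
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]


def run_jit_sequence(core, use_fence_i):
    """Write code, execute it, overwrite it, then fetch it again."""
    core.store(0x1000, "old")
    core.fetch(0x1000)         # warms the instruction cache with "old"
    core.store(0x1000, "new")  # JIT-style overwrite of the same address
    if use_fence_i:
        core.fence_i()
    return core.fetch(0x1000)


# Non-coherent L1i (Kunminghu-like in this toy): stale without fence.i.
print(run_jit_sequence(ToyCore(coherent_icache=False), use_fence_i=False))  # old
print(run_jit_sequence(ToyCore(coherent_icache=False), use_fence_i=True))   # new
# Coherent L1i (Nanhu-like in this toy): fresh even without fence.i.
print(run_jit_sequence(ToyCore(coherent_icache=True), use_fence_i=False))   # new
```

In this model the only case that fetches stale code is the non-coherent cache without fence.i, which is the situation software has to avoid on Kunminghu.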

0 replies

HsiehTsungHsien · 2025-04-24T02:45:35Z

HsiehTsungHsien
Apr 24, 2025
Author

BTW, is there any document covering the power (W/MHz) and area (gate count) of Nanhu and Kunminghu? We are comparing XiangShan cores with others.

Norman

3 replies

eastonman Apr 24, 2025
Collaborator

I don't have any precise numbers, but I can provide you with some information about Kunminghu. We are targeting ARM Neoverse N2 in terms of PPA. Currently, Kunminghu has roughly the same performance as Neoverse N2 and stays within a small margin (less than 10%) on power and area. This varies depending on physical design strategy. In some cases, Kunminghu can be better than N2 in power and area.

If you have the foundry PDK and the EDA tools needed, you can run the physical design flow yourself to compare with others on your process node.

eastonman Apr 24, 2025
Collaborator

If you want detailed physical design metrics, I suggest you contact BOSC (Beijing Institute of Open Source Chip).

@Tang-Haojin Could you provide a contact email?

Tang-Haojin Apr 24, 2025
Maintainer

Sure. If you want to know more about commercial matters, you may consult the PM of BOSC, Jian Zhang. His email address is zhangjian_at_bosc.ac.cn.

HsiehTsungHsien · 2025-04-24T03:40:40Z

HsiehTsungHsien
Apr 24, 2025
Author

I am grateful for the contact. Thanks.

0 replies

HsiehTsungHsien · 2025-04-30T06:48:56Z

HsiehTsungHsien
Apr 30, 2025
Author

I found a paper that gives numbers for the Nanhu implementation on FPGA. The paper shows the FPGA resource utilization of Nanhu. Assuming roughly 4 gates per LUT and 10 gates per FF, the equivalent gate count of Nanhu is about 9.5M gates. Since the ARM counterpart of Nanhu is the A76, 9.5M seems to be a reasonable number, provided the FPGA resource utilization figures are correct. Is there anything I missed in my estimation?

Norman
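
The estimate above is a weighted sum of FPGA resources. A minimal sketch of that arithmetic, assuming the 4-gates-per-LUT and 10-gates-per-FF weights stated in the post; the function name `equivalent_gates` is hypothetical, and the sample counts are placeholders chosen for easy checking, not the figures from the paper:

```python
def equivalent_gates(luts, ffs, gates_per_lut=4, gates_per_ff=10):
    """Rough ASIC gate-count estimate from FPGA LUT and flip-flop counts.

    The default weights are the rule-of-thumb values assumed in the post;
    they are not calibrated against any particular FPGA family or cell library.
    """
    return luts * gates_per_lut + ffs * gates_per_ff


# Placeholder counts for illustration only (not the paper's numbers).
print(equivalent_gates(luts=1_000, ffs=200))  # 1000*4 + 200*10 = 6000
```

LUT and FF counts do not include logic that the FPGA tools map to block RAM or DSP blocks, so an estimate of this form may need a separate term for memories and multipliers.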

0 replies
