Commit f5a7699

Ztee2 (#199)
* Update 2024-11-08-ZTEE2.mdx
* added forum link
* improve "how much should be open" section
* Add crypto AG example link in first par
* fix link in first par
* Add files via upload
1 parent 6e7aeec commit f5a7699

2 files changed

+12
-18
lines changed

content/2024-11-08-ZTEE2.mdx

+12-18
@@ -7,7 +7,7 @@ hide_table_of_contents: false
77
forum_link: https://collective.flashbots.net/t/ztee-trustless-supply-chains/4033
88
---
99

10-
Every distributed cryptographic protocol, key management system or wallet runs on opaque hardware. In almost all cases, we do not know with any certainty that our hardware is executing the expected program and that it is not actually acting against us. [Many cases](https://www.spiegel.de/international/world/the-nsa-uses-powerful-toolbox-in-effort-to-spy-on-global-networks-a-940969.html) of [exactly](https://web.archive.org/web/20230721093448/https://www.bloomberg.com/features/2021-supermicro/) this kind of [betrayal](https://eprint.iacr.org/2024/1275) have been [uncovered](https://arstechnica.com/tech-policy/2014/05/photos-of-an-nsa-upgrade-factory-show-cisco-router-getting-implant/). The [latest](https://www.aljazeera.com/economy/2024/9/19/lebanon-blasts-raise-alarm-about-supply-chain-security-tech-safety) proved deadly. This precedent suggests the likely existence of undetected malicious hardware in use today.
10+
Every distributed cryptographic protocol, key management system or wallet runs on opaque hardware. In almost all cases, we do not know with any certainty that our hardware is executing the expected program and that it is not actually acting against us. [Many cases](https://www.spiegel.de/international/world/the-nsa-uses-powerful-toolbox-in-effort-to-spy-on-global-networks-a-940969.html) of [exactly](https://web.archive.org/web/20230721093448/https://www.bloomberg.com/features/2021-supermicro/) this [kind](https://en.wikipedia.org/wiki/Crypto_AG) of [betrayal](https://eprint.iacr.org/2024/1275) have been [uncovered](https://arstechnica.com/tech-policy/2014/05/photos-of-an-nsa-upgrade-factory-show-cisco-router-getting-implant/). The [latest](https://www.aljazeera.com/economy/2024/9/19/lebanon-blasts-raise-alarm-about-supply-chain-security-tech-safety) proved deadly. This precedent suggests the likely existence of undetected malicious hardware in use today.
1111

1212
In [our first post](https://writings.flashbots.net/ZTEE), we went over the big-picture security shortcomings of TEEs and split the work that needs to be done into two parts: securing the completed chip against remote and physical attackers, and securing the chip against actors in the supply chain. While there is a lot of existing work in both categories, the latter is less explored for our purposes and requires more fundamental research, so we dedicate this post to that topic and will address remote and physical attackers in the next post.
1313
A verifiable supply chain is within reach. We demonstrate this by pointing out existing and ongoing research that constitutes various pieces of the puzzle. Along the way we also cover a good deal on open hardware which will provide important context for future posts. The post is structured as follows:
@@ -119,35 +119,29 @@ The fabs with open PDKs have far from the state of the art process nodes availab
119119
Better process nodes not only imply better economics and performance, they also impact security. As chips get denser, it becomes much harder to carry out various forms of physical tampering. Extracting information via probing and electromagnetic side channel attacks becomes much more difficult for example. Interestingly, defense against hardware trojans does the inverse and becomes harder to do as features get smaller due to the limited precision of current imaging techniques. Open access to better process nodes is certainly very necessary today, but there may be a point at which smaller nodes would not be more desirable even if they were open.
120120

121121
### **How much needs to be open?**
122-
123122
An open source maximalist may want every design detail to be open, but given the status quo we will need to navigate a world in which not everything is open, at least for now. Despite a [growing](https://fossi-foundation.org/orconf/2024) [community](https://wiki.f-si.org/index.php/FSiC2024) of people and [commercial](https://www.zerorisc.com/) [projects](https://tropicsquare.com/) working to build open versions of the tools required to design and build chips, there is still a very high cost to building fully open hardware, impacting the performance, cost and security of chips. There is a point at which benefits of fully open designs outweigh the costs, but significant work is required to reach this point.
124123

125-
**Opening Nothing**
124+
**Opening Everything**
125+
Fortunately, there are already several designs which are ***partially*** open. [OpenTitan](https://opentitan.org/), [Cramium](https://www.crossbar-inc.com/products/secure-processing-units/overview/) and [Tropic Square](https://tropicsquare.com/) are projects building chips with many, but not all aspects of their chip design collateral publicly released. By keeping the GDS and some other details closed, chips can be produced with high quality process nodes. However, enough material is open for users to be able to audit the design for potential bugs or side channels. For example, some side-channel mitigation techniques ([1](https://ieeexplore.ieee.org/abstract/document/9190067), [2](https://eprint.iacr.org/2022/507), [3](https://tches.iacr.org/index.php/TCHES/article/view/11689)) can be verified by existing tooling given the netlist ([1](https://eprint.iacr.org/2020/634), [2](https://tches.iacr.org/index.php/TCHES/article/view/9820), [3](https://graz.elsevierpure.com/en/publications/cocoalma-a-versatile-masking-verifier), [4](https://tches.iacr.org/index.php/TCHES/article/view/9822)) or RTL ([1](https://link.springer.com/chapter/10.1007/978-3-030-29959-0_15), [2](https://ieeexplore.ieee.org/abstract/document/9833600)). There’s also the benefit of lessened vendor lock-in and easier extensibility. To address trojans, we could throw weight behind the movement to open up tools and designs. **The aim would be to get better open PDKs** and make enough of an economic, social and/or political argument to sway foundries into opening up higher quality process nodes, making it possible to have a public commercially viable chip with an open GDS. Since some process nodes (e.g. ~28nm^[The 2x node is also the “last node of Moore’s law” in that it was the last node where transistors got cheaper. It used to be that the cost of wafers always dropped to around $3k as the process matured; 2x nm was the end of that. 
For smaller nodes, wafer costs don’t come down, and may even go up. This makes 2x nm a very economically important node. For instance, TSMC is doubling down on this node and trying to get legacy customers to migrate to the node.]) have been around for quite some time, the “edge” lost from opening these PDKs is not that large and it is realistic to expect that, especially with demonstration of sufficient demand, some of these will be opened up in the next 2 years.
126126

127-
Before discussing open source, it’s worth noting that we can already make some improvements without making any new information public. There are two ways to do this: a permissioned set of verifiers, and a ZK-proof-based verification protocol. Both rely on the idea that a special actor can have access to the PDK and GDS while these are not made public.
128-
In the first case, the fab selects a *permissioned set of verifiers*, optimising for public trust. A verifier uses the GDS and PDK to make public statements about the absence of trojans, but nothing else. The requirement for fab permissions might make this set of parties very limited, undermining security.
129-
Another approach is to let other entities do the imaging and then have the actor who knows the design post proofs that confirm that the images match a GDS which was derived from the public RTL. This would consist of two [ZK proofs](https://en.wikipedia.org/wiki/Zero-knowledge_proof):
127+
![image(4)|490x488](upload://f387HyyNbYD5nFRrmICZSdncpaV.png)
130128

131-
- `hash(GDS)` is derived from the public RTL/netlist
132-
- `compare(image, hash(GDS)) == true` where `compare` is a public algorithm
133129

134-
We may only need to produce these proofs once or a small number of times as the proved image would then provide a public reference. While chip images may not be protected by an NDA, the fab may not like the publication of these images if they are unwilling to make the GDS open as they carry much of the same information.
130+
#### Partial Openness
135131

136-
Neither of these provide any of the other benefits of openness like public scrutiny of designs and open innovation. However, protocols like the one described in (IV) are compatible with these approaches. While using a permissioned set of verifiers is inferior to having an unrestricted selection of verifiers and the ability of the public to conduct ad hoc audits, the ZK proving approach does not have these drawbacks.
132+
Another approach recognises that the march to a sufficiently powerful open hardware “stack” is a long and unpredictable one, and instead asks what we can do without opening everything up. The idea is to go as far as possible with only the RTL and netlist being public, using mathematical tooling to learn as much as possible from the information that is open.
133+
One way ([proposed by Bunnie](https://www.bunniestudios.com/blog/2024/iris-infra-red-in-situ-project-updates/)) to do this is to **bound the density of logic** (i.e. number of transistors per unit of area) we should expect in different regions of the chip. We could rely on formal methods to achieve these bounds, but partial reliance on heuristics may also be a viable path. The reasoning behind these heuristics would be that there are large financial incentives to develop techniques to pack logic more tightly and to advertise such improvements instead of secretly developing them for the insertion of trojans. Sufficiently tight bounds would render large trojans detectable. Given how [small](https://link.springer.com/chapter/10.1007/978-3-642-40349-1_12) some (dopant-level) trojans can be, we would also need other techniques to force trojans to a detectable size. We cover this issue in more depth in the next section. The proof techniques for upper bounding logic density and lower bounding trojans still need to be developed so this should be considered a direction for exploration rather than an option today.^[If you are knowledgeable or interested in working on (or funding) these problems reach out to us or Bunnie directly.]
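To make the density-bound idea concrete, here is a minimal sketch of how such a bound might be used once it exists. This is one possible reading, not Bunnie's method: the `Region` type, its fields, and every number are hypothetical, and the bound `max_density` is simply assumed rather than derived. The sketch computes how many extra transistors a region could physically hold beyond what the public netlist accounts for, and lists the regions where a trojan of a given minimum size could still fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical description of one region of the die."""
    name: str
    area_um2: float            # region area in square micrometres
    expected_transistors: int  # count implied by the public netlist (assumed known)
    max_density: float         # ASSUMED upper bound, transistors per square micrometre


def hidden_capacity(region: Region) -> float:
    """Upper bound on transistors that could exist beyond the netlist's count."""
    ceiling = region.max_density * region.area_um2
    return max(0.0, ceiling - region.expected_transistors)


def regions_that_could_hide(regions: list[Region], trojan_min_transistors: int) -> list[str]:
    """Names of regions whose slack is large enough to hold the given trojan."""
    return [r.name for r in regions if hidden_capacity(r) >= trojan_min_transistors]
```

The tighter the assumed density bound, the smaller the slack in each region, which is why the text pairs this with techniques that force trojans to a minimum detectable size.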
137134

138-
**Opening Everything**
139-
Fortunately, there are already several designs which are ***partially*** open. [OpenTitan](https://opentitan.org/), [Cramium](https://www.crossbar-inc.com/products/secure-processing-units/overview/) and [Tropic Square](https://tropicsquare.com/) are projects building chips with many, but not all aspects of their chip design collateral publicly released. By keeping the GDS and some other details closed, chips can be produced with high quality process nodes. However, enough material is open for users to be able to audit the design for potential bugs or side channels. For example, some side-channel mitigation techniques ([1](https://ieeexplore.ieee.org/abstract/document/9190067), [2](https://eprint.iacr.org/2022/507), [3](https://tches.iacr.org/index.php/TCHES/article/view/11689)) can be verified by existing tooling given the netlist ([1](https://eprint.iacr.org/2020/634), [2](https://tches.iacr.org/index.php/TCHES/article/view/9820), [3](https://graz.elsevierpure.com/en/publications/cocoalma-a-versatile-masking-verifier), [4](https://tches.iacr.org/index.php/TCHES/article/view/9822)) or RTL ([1](https://link.springer.com/chapter/10.1007/978-3-030-29959-0_15), [2](https://ieeexplore.ieee.org/abstract/document/9833600)). There’s also the benefit of lessened vendor lock-in and easier extensibility. To address trojans, we could throw weight behind the movement to open up tools and designs. **The aim would be to get better open PDKs** and make enough of an economic, social and/or political argument to sway foundries into opening up higher quality process nodes, making it possible to have a public commercially viable chip with an open GDS. Since some process nodes (e.g. ~28nm^[The 2x node is also the “last node of Moore’s law” in that it was the last node where transistors got cheaper. It used to be that the cost of wafers always dropped to around $3k as the process matured; 2x nm was the end of that. 
For smaller nodes, wafer cost doesn’t come down, and they may even go up. This makes 2x nm a very economically important node. For instance, TSMC is doubling down on this node and trying to get legacy customers to migrate to the node.]) have been around for quite some time, the “edge” lost from opening these PDKs is not that large and it is realistic to expect that, especially with demonstration of sufficient demand, some of these will be opened up in the next 2 years.
140-
141-
![OS Trojan](/img/ZTEE/trojanmeme.png)
135+
Other ideas require the fab to select special actors to be given access to the GDS and PDK. These chosen verifiers can use the GDS and PDK to make public statements that link an image of a chip to the netlist. These statements would actually consist of two claims:
142136

137+
- The verifier is in possession of a GDS and PDK such that `GDS==PlaceRoute(netlist, PDK)`
138+
- The verifier is in possession of an image such that `compare(image, GDS) == true` where `compare` is a public algorithm
143139

144-
#### Partial Openness
140+
Of course these claims can simply be signed statements posted publicly by reputable actors, but we can also rely on zero-knowledge proofs (e.g. SNARKs^[we could technically use SH for this as well, but this feels like turtles all the way down]) which provide much better trust assumptions. One challenge here is that `PlaceRoute` is given by the EDA software which is typically closed source. Since a malicious `PlaceRoute` can be used to insert a trojan, the EDA software must be publicly auditable in order for this scheme to work. In effect, we are trading off the challenge of open sourcing the EDA with that of getting an open PDK. If the image is published along with the proofs tying it to the netlist, the image can later be used by other actors as a reference to verify other chips.
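As a rough illustration of the two claims, the sketch below shows the plain (non-zero-knowledge) check a permissioned verifier might run before signing a statement. It is a toy under stated assumptions: `place_route` and `compare` are hypothetical hash-based stand-ins for the real EDA step and the public comparison algorithm, and the `Statement` type is invented for this example. A zero-knowledge version would prove the same two relations without the verifier revealing the GDS or PDK.

```python
import hashlib
from dataclasses import dataclass


def place_route(netlist: bytes, pdk: bytes) -> bytes:
    """Hypothetical deterministic stand-in for the EDA place-and-route step."""
    return hashlib.sha256(b"place-route|" + netlist + b"|" + pdk).digest()


def compare(image: bytes, gds: bytes) -> bool:
    """Hypothetical stand-in for the public image-versus-layout comparison."""
    return image == hashlib.sha256(b"image|" + gds).digest()


@dataclass(frozen=True)
class Statement:
    """What the verifier would publish: only public values and the verdict."""
    netlist_hash: str
    image_hash: str
    claims_hold: bool


def make_statement(netlist: bytes, pdk: bytes, gds: bytes, image: bytes) -> Statement:
    claim_1 = gds == place_route(netlist, pdk)  # GDS == PlaceRoute(netlist, PDK)
    claim_2 = compare(image, gds)               # compare(image, GDS) == true
    return Statement(
        netlist_hash=hashlib.sha256(netlist).hexdigest(),
        image_hash=hashlib.sha256(image).hexdigest(),
        claims_hold=claim_1 and claim_2,
    )
```

Note that the statement exposes only hashes of the public netlist and the image; the GDS and PDK stay with the verifier, which is exactly the part a signed statement asks the public to take on trust and a SNARK would not.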
145141

146-
Another approach recognises that the march to a sufficiently powerful open hardware “stack” is a long and unpredictable one, and instead asks what we can do without opening everything up. The idea is to develop mathematical tooling to derive as much information as possible from design collateral that is relatively easy to open up like RTLs and netlists. One way ([proposed by Bunnie](https://www.bunniestudios.com/blog/2024/iris-infra-red-in-situ-project-updates/)) to do this is to bound the density of logic (i.e. number of transistors per unit of area) we should expect in different regions of the chip. We could rely on formal methods to achieve these bounds, but partial reliance on heuristics may also be a viable path. The reasoning behind these heuristics would be that there are large financial incentives to develop techniques to pack logic more tightly and to advertise such improvements instead of secretly developing them for the insertion of trojans. Sufficiently tight bounds would render large trojans detectable. Given how [small](https://link.springer.com/chapter/10.1007/978-3-642-40349-1_12) some (dopant-level) trojans can be, we would also need other techniques to force trojans to a detectable size. We cover this issue in more depth in the next section. The proof techniques for upper bounding logic density and lower bounding trojans still need to be developed so this should be considered a direction for exploration rather than an option today.^[If you are knowledgeable or interested in working on (or funding) these problems reach out to us or Bunnie directly.]
142+
The combination of the mathematical and pro-openness approaches is promising. The mathematical approaches serve as a good hedge against the open hardware movement taking a long time to get up to speed and help add assurances to partially open architectures like OpenTitan. One approach is technical while the other requires navigating patents and licensing or convincing large corporations to change their stance on something, presenting two very different forms of risk.
147143

148-
The combination of the mathematical and pro-openness approaches is promising. The mathematical approaches serves as a good hedge against the open hardware movement taking a long time to get up to speed and helps to add additional assurances to partially open architectures like OpenTitan. One approach is technical while the other requires navigating patents and licensing or convincing large corporations to change their stance on something, presenting two very different forms of risk.
149144
# III. Detecting Trojans
150-
151145
We can separate trojan detection techniques into destructive and non-destructive. [Destructive analyses](https://ieeexplore.ieee.org/document/10179341) are the state of the art. The destructive analysis process typically involves shaving away protective packaging and then, because chips actually consist of many layers of logic and interconnects, carefully shaving away each layer of logic and using a tool like a [scanning electron microscope (SEM)](https://en.wikipedia.org/wiki/Scanning_electron_microscope) or [Focused Ion Beam](https://en.wikipedia.org/wiki/Focused_ion_beam) (FIB)^[Not to be confused with [ion cannons](https://starwars.fandom.com/wiki/Ion_cannon).] to inspect each layer. These techniques have been shown [to be able to detect even stealthy dopant-level trojans](https://link.springer.com/chapter/10.1007/978-3-662-44709-3_7).
152146

153147
Despite their effectiveness, destructive techniques have significant downsides such as being, well, *destructive*… Not only are destructive techniques slow and costly, chips that end up being used have never gone through such an inspection process so the security guarantee must be derived from the adversary being statistically warded off by a sampling process which checks some fraction of produced chips. Additionally, as section IV makes clearer, creating publicly verifiable trojan-detection protocols based on destructive analysis is challenging because a chip can only be inspected once and the process doesn’t produce evidence that can be tied back to the identity of the chip. These limitations do not rule out the use of destructive techniques, but do motivate exploration of non-destructive alternatives.
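The statistical argument behind sampling can be made concrete with a small calculation. The sketch below assumes, for illustration only, that chips are sampled uniformly without replacement and that destructive inspection always detects a trojan in an inspected chip; the function name and parameters are ours. Under those assumptions the chance of catching at least one trojaned chip follows the hypergeometric distribution: one minus the probability that every inspected chip is clean.

```python
from math import comb


def detection_probability(total_chips: int, trojaned_chips: int, inspected_chips: int) -> float:
    """Probability that a uniform sample contains at least one trojaned chip.

    Assumes sampling without replacement and perfect detection on inspection.
    """
    if not 0 <= trojaned_chips <= total_chips:
        raise ValueError("trojaned_chips must be between 0 and total_chips")
    if not 0 <= inspected_chips <= total_chips:
        raise ValueError("inspected_chips must be between 0 and total_chips")
    clean_chips = total_chips - trojaned_chips
    # comb() returns 0 when the sample is larger than the clean population,
    # in which case detection is certain.
    all_clean = comb(clean_chips, inspected_chips) / comb(total_chips, inspected_chips)
    return 1.0 - all_clean
```

For example, destroying 5 chips out of a batch of 10 in which exactly 1 is trojaned gives a detection probability of 0.5, which shows how expensive it is to catch an adversary who compromises only a small fraction of chips.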

static/img/ZTEE/overview.png

-87 KB