EIPS/eip-7923.md
+104 -29
@@ -22,11 +22,19 @@ The EVM currently uses a quadratic pricing model for its memory. This was origin
22
22
2. The quadratic model makes it difficult to reason about how much memory a transaction can allocate. It requires solving an optimization problem which involves computing how many message calls are available to recurse into based on the call stack limit (and, post [EIP-150](./eip-150.md), the 63/64ths rule), and then maximizing the memory used per message call.
23
23
3. The quadratic model makes it impossible for high-level smart contract languages to get the benefits of virtual memory. Most modern programming languages maintain what is known as the "heap" and the "call stack". The heap is used to allocate objects which live past the lifetime of their current function frame, whereas the call stack is used to allocate objects which live only within the current function frame. Importantly, the call stack starts at the top of memory and grows down, while the heap starts at the bottom of memory and grows up, thus the language implementation does not need to worry about the two regions of memory interfering with each other. This is a feature which is enabled by virtual, paged memory, which has been present in mainstream operating systems since at least the early 1990s. However, smart contract languages like Vyper and Solidity are not able to implement this, leading to inefficiencies in their memory models.
24
24
25
-
This EIP proposes a linear costing model which more closely reflects the hardware of today. It uses a virtual addressing scheme so that memory pages are not allocated until they are actually accessed. Notably, the data structures used for costing memory do not need to be part of the memory implementation itself, which suggests an elegant implementation using `mmap`.
25
+
This EIP proposes a linear costing model which more closely reflects today's hardware. That hardware is hierarchical ("hot" memory is much faster to access than "cold" memory) and virtually addressed (memory does not need to be allocated contiguously, but is rather served "on-demand" by the operating system).
26
+
27
+
First, some preliminaries. A page is 4096 bytes on most architectures. Given a memory address, the start of its page is simple to compute by masking out the rightmost 12 bits; shifting those 12 bits out instead gives the index of the page.
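As a minimal sketch of the page arithmetic described above (the helper names `page_base`, `page_id` and `page_offset` are illustrative, not part of the EIP):

```python
PAGE_SIZE = 4096             # bytes per page, as stated above
PAGE_SHIFT = 12              # log2(PAGE_SIZE)
OFFSET_MASK = PAGE_SIZE - 1  # the rightmost 12 bits


def page_base(address: int) -> int:
    """Mask out the rightmost 12 bits: address of the first byte of the page."""
    return address & ~OFFSET_MASK


def page_id(address: int) -> int:
    """Shift out the rightmost 12 bits: index of the page."""
    return address >> PAGE_SHIFT


def page_offset(address: int) -> int:
    """Keep only the rightmost 12 bits: position of the byte within its page."""
    return address & OFFSET_MASK
```

For example, address `0x12345` lies in page `0x12`, which starts at `0x12000`, at offset `0x345`.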
28
+
29
+
There are two factors which make "cold" (i.e. not-recently-used) memory slower: the CPU cache and the TLB (Translation Lookaside Buffer). The CPU cache holds recently used memory and is significantly faster than fetching all the way from RAM. The TLB is a small cache which maps virtual pages (used by the user) to physical pages in RAM. "Thrashing", or accessing a lot of different memory addresses, does two things: it pushes memory out of the hot CPU cache, and it pushes pages out of the TLB.
30
+
31
+
This EIP uses a virtual addressing scheme so that memory pages are not allocated until they are actually accessed. Further, it adds a surcharge for accessing memory outside of an EVM-defined "hot" area.
32
+
33
+
Notably, the data structures used for costing memory do not need to be part of the memory implementation itself, which suggests an elegant implementation using the POSIX `mmap` syscall (or its counterpart on Windows, `VirtualAlloc`).
26
34
27
35
The implementation can be approached in two ways. The first way is to implement the virtual addressing "manually". This is intended for systems without `mmap` or a virtual addressing capability. The implementation needs to maintain a map `map[page_id -> char[4096]]`, where `page_id` is an integer, computed as `memory_address >> 12`. Additionally, for costing purposes, a set of 512 `page_id`s (`set[page_id]`) is maintained. This set is only used for pricing the operation; it doesn't actually contain the data.
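The "manual" approach could be sketched roughly as follows. This is an illustration only, not the reference implementation. The class and method names are hypothetical, and the gas constants are the prices suggested in the Rationale. Two things are assumptions made purely for this sketch: the hot set evicts its oldest entry when full, and a freshly allocated page pays both charges. The Specification is authoritative on both points.

```python
from collections import OrderedDict

PAGE_SIZE = 4096
PAGE_SHIFT = 12
HOT_SET_SIZE = 512   # number of page_ids tracked for costing
ALLOC_GAS = 100      # suggested price to allocate a page (see Rationale)
THRASH_GAS = 6       # suggested price for a page thrash (see Rationale)


class PagedMemory:
    """Hypothetical manual virtual memory: page_id -> 4096-byte page."""

    def __init__(self):
        self.pages = {}                 # holds the actual data
        self.hot_pages = OrderedDict()  # costing only, never holds data

    def _touch(self, pid):
        """Return the gas charged for accessing page `pid`."""
        gas = 0
        if pid not in self.pages:
            self.pages[pid] = bytearray(PAGE_SIZE)  # allocated on first access
            gas += ALLOC_GAS
        if pid not in self.hot_pages:
            gas += THRASH_GAS
            if len(self.hot_pages) >= HOT_SET_SIZE:
                self.hot_pages.popitem(last=False)  # ASSUMED: evict oldest entry
            self.hot_pages[pid] = None
        return gas

    def write_byte(self, address, value):
        pid = address >> PAGE_SHIFT
        gas = self._touch(pid)
        self.pages[pid][address & (PAGE_SIZE - 1)] = value
        return gas

    def read_byte(self, address):
        pid = address >> PAGE_SHIFT
        gas = self._touch(pid)
        return self.pages[pid][address & (PAGE_SIZE - 1)], gas
```

Because pages are created only when touched, writing one byte at a very high address allocates a single 4096-byte page rather than everything below it.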
28
36
29
-
The other implementation is easier, for systems with `mmap` or a similar facility. To hold the actual data of the memory, the implementation `mmap`s a `2**32` byte region of memory. Then, memory operations can be implemented simply as reads or writes against this buffer. (With an anonymous `mmap`, the operating system will allocate pages "on demand", as they are touched). The `pages` map is still necessary, but it doesn't hold any data, it is just to track which pages have been allocated, for pricing purposes. In this implementation, there are three data structures: `memory char[2**32]`, `allocated_pages set[page_id]`, `hot_pages set[page_id]`. The `memory` data structure is only used for memory reads and writes. The `allocated_pages` and `hot_pages` are only used for gas costing.
37
+
The other implementation is easier, for systems with `mmap` or a similar facility. To hold the actual data of the memory, the implementation `mmap`s a `2**32` byte region of memory. Then, memory operations can be implemented simply as reads or writes against this buffer. (With an anonymous `mmap`, the operating system will not allocate the entire buffer up-front; rather, it will allocate pages "on demand", as they are touched.) A record of allocated pages is still necessary, but it doesn't hold any data; it only tracks which pages have been allocated, for pricing purposes. In this implementation, there are three data structures: `memory char[2**32]`, `allocated_pages set[page_id]`, `hot_pages set[page_id]`. The `memory` data structure is only used for memory reads and writes. The `allocated_pages` and `hot_pages` are only used for gas costing.
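A rough sketch of the `mmap`-based approach, using Python's standard `mmap` module, is shown here. The class name and the `store`/`load` methods are hypothetical. The `hot_pages` set is left out to keep the sketch short, and the buffer size is a parameter so the example can be tried with a smaller region. Whether a full `2**32` byte anonymous mapping succeeds depends on the platform and its overcommit settings.

```python
import mmap

PAGE_SHIFT = 12


class MmapMemory:
    """Hypothetical mmap-backed memory with separate costing bookkeeping."""

    def __init__(self, size=2**32):
        # Anonymous mapping (fileno -1): zero-filled, with physical pages
        # typically supplied by the operating system only when touched.
        self.memory = mmap.mmap(-1, size)
        # Costing only: records which pages have been charged for.
        self.allocated_pages = set()

    def _account(self, address, length):
        """Record the pages covered by an access; return the new page_ids."""
        if length == 0:
            return []
        first = address >> PAGE_SHIFT
        last = (address + length - 1) >> PAGE_SHIFT
        new = [p for p in range(first, last + 1) if p not in self.allocated_pages]
        self.allocated_pages.update(new)
        return new

    def store(self, address, data):
        new_pages = self._account(address, len(data))
        self.memory[address:address + len(data)] = data  # plain buffer write
        return new_pages

    def load(self, address, length):
        self._account(address, length)
        return self.memory[address:address + length]     # plain buffer read
```

Note that the data path never consults `allocated_pages`; the set exists purely so that the caller can price newly touched pages.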
30
38
31
39
## Specification
32
40
@@ -55,7 +63,7 @@ A transaction-global memory limit is imposed. If the number of pages allocated i
55
63
56
64
## Rationale
57
65
58
-
Benchmarks were performed on a 2019-era CPU, with the ability to keccak256 around 256MB/s, giving it a gas-to-ns ratio of 20 ns per 1 gas. The following benchmarks were performed:
66
+
Benchmarks were performed on a 2019-era CPU, which can compute `keccak256` at around 256MB/s. Given that `keccak256` costs 6 gas per 32 bytes, this works out to roughly 20 ns per 1 gas. The following benchmarks were performed:
59
67
60
68
- Time to allocate a fresh page: 1-2us
61
69
- Time to randomly read a byte from a 2MB range: 1.8ns
@@ -64,13 +72,16 @@ Benchmarks were performed on a 2019-era CPU, with the ability to keccak256 aroun
64
72
- Time to update a hashmap with 512 items: 8ns
65
73
- Time to update a hashmap with 8192 items: 9ns
66
74
- Time to update a hashmap with 5 million items: 108ns
75
+
- Time to execute the `mmap` syscall: 230ns
67
76
68
77
These suggest the following prices:
69
78
70
79
- 100 gas to allocate a page, and
71
80
- 6 gas for a page thrash
72
81
73
-
Since the delta between hitting a page and thrashing a page (including bookkeeping overhead) is ~120ns, we could ignore the resource cost and simply increase the base cost per memory operation from 3 gas to 6 gas. However, since memory operations which exploit cost-locality are so cheap, it leaves "room on the table" for future improvements to the gas schedule, including reducing the base cost of a memory operation to 1 gas. Furthermore, as the reference implementation below shows, it takes very little bookkeeping overhead (one additional data structure, and four lines of code) to check for the thrash.
82
+
Note that the cost to execute `mmap` (~11 gas) is already well-paid for by the base cost of the CALL series of instructions (100 gas).
83
+
84
+
Since the delta between hitting a page and thrashing a page (including bookkeeping overhead) is ~120ns, we could ignore the resource cost and simply increase the base cost per memory operation from 3 gas to 6 gas. However, since memory operations which exploit cost-locality are so cheap, it leaves "room on the table" for future improvements to the gas schedule, including reducing the base cost of a memory operation to 1 gas. Furthermore, as the reference implementation below shows, it takes very little bookkeeping overhead (one additional data structure, and four lines of code) to check for the thrash. Therefore, we model memory with a one-level hierarchy. While this is simpler than most real CPUs, which may have several levels of memory hierarchy, it is granular enough for our purposes.
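The conversions behind these figures can be reproduced with a few lines of arithmetic. The sketch below assumes that "MB" means 10^6 bytes and rounds the ratio to the 20 ns per gas quoted above; the function name `ns_to_gas` is illustrative.

```python
KECCAK_BYTES_PER_SEC = 256e6  # "around 256MB/s" (assuming MB = 10**6 bytes)
KECCAK_GAS_PER_WORD = 6       # keccak256 costs 6 gas per 32 bytes
WORD_BYTES = 32

# Time to hash one 32-byte word, and the time budget that buys per gas.
ns_per_word = WORD_BYTES / KECCAK_BYTES_PER_SEC * 1e9  # about 125 ns
ns_per_gas = ns_per_word / KECCAK_GAS_PER_WORD         # about 20.8 ns

NS_PER_GAS = 20.0  # rounded figure used in the text


def ns_to_gas(nanoseconds, rate=NS_PER_GAS):
    """Convert a measured duration into gas at the benchmark machine's rate."""
    return nanoseconds / rate


page_alloc_gas = (ns_to_gas(1000), ns_to_gas(2000))  # 1-2us -> 50 to 100 gas
mmap_gas = ns_to_gas(230)                            # 230ns -> 11.5 gas
thrash_gas = ns_to_gas(120)                          # ~120ns -> 6 gas
```

The upper end of the page allocation range gives the 100 gas price, the ~120ns thrash delta gives 6 gas, and the `mmap` syscall comes out at about 11 gas.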
74
85
75
86
There is a desire among client implementations to be able to enforce global limits separately from the gas limit due to DoS reasons. For example, RPC providers may be designed to allow many concurrent `eth_call` computations with a much higher gas limit than on mainnet. Not implicitly tying the memory limit to the gas limit results in one less vector for misconfiguration. That is not to say that in the future, a clean formula cannot be created which allows the memory limit to scale with future hardware improvements (e.g., proportional to the sqrt of the gas limit), but to limit the scope of things that need to be reasoned about for this EIP, the hard limit is introduced.
76
87
@@ -89,14 +100,14 @@ Addressed in Security Considerations section. No backwards compatibility is brok
89
100
90
101
## Reference Implementation
91
102
92
-
A ~50-line reference implementation is provided below. It is implemented as a patch against the `py-evm` codebase at commit ethereum/py-evm@fec63b8c4b9dad9fcb1022c48c863bdd584820c6.
103
+
A ~60-line reference implementation is provided below. It is implemented as a patch against the `py-evm` codebase at commit ethereum/py-evm@fec63b8c4b9dad9fcb1022c48c863bdd584820c6. (This is only a reference implementation; it does not, for example, contain fork choice rules.)