Commit ae81496

Update EIP-7923: give more background on pages, thrashing and general setup of the EIP
Merged by EIP-Bot.
1 parent b6b2949 commit ae81496

File tree

EIPS/eip-7923.md (1 file changed: +104 -29 lines)

Diff for: EIPS/eip-7923.md

@@ -22,11 +22,19 @@ The EVM currently uses a quadratic pricing model for its memory. This was origin
 2. The quadratic model makes it difficult to reason about how much memory a transaction can allocate. It requires solving an optimization problem which involves computing how many message calls are available to recurse into based on the call stack limit (and, post [EIP-150](./eip-150.md), the 63/64ths rule), and then maximizing the memory used per message call.
 3. The quadratic model makes it impossible for high-level smart contract languages to get the benefits of virtual memory. Most modern programming languages maintain what is known as the "heap" and the "call stack". The heap is used to allocate objects which live past the lifetime of their current function frame, whereas the call stack is used to allocate objects which live only within the current function frame. Importantly, the call stack starts at the top of memory and grows down, while the heap starts at the bottom of memory and grows up, thus the language implementation does not need to worry about the two regions of memory interfering with each other. This is a feature enabled by virtual, paged memory, which has been present in operating systems since the early 90's. However, smart contract languages like Vyper and Solidity are not able to implement this, leading to inefficiencies in their memory models.
 
-This EIP proposes a linear costing model which more closely reflects the hardware of today. It uses a virtual addressing scheme so that memory pages are not allocated until they are actually accessed. Notably, the data structures used for costing memory do not need to be part of the memory implementation itself, which suggests an elegant implementation using `mmap`.
+This EIP proposes a linear costing model which more closely reflects the hardware of today, which is hierarchical ("hot" memory is much faster to access than "cold" memory) and virtually addressed (memory does not need to be allocated contiguously, but is rather served "on-demand" by the operating system).
+
+First, some preliminaries. A page is 4096 bytes on most architectures. Given a memory address, its page is simple to compute by masking out the rightmost 12 bits.
+
+There are two factors which contribute to "cold" memory (i.e. not-recently-used memory) being slower: the CPU cache and the TLB (Translation Lookaside Buffer). The CPU cache is a (roughly) least-recently-used cache which is significantly faster to access than fetching all the way from RAM. The TLB is, conceptually, a small hash table which maps virtual pages (used by the user program) to physical pages in RAM. "Thrashing", or accessing a lot of different memory addresses, does two things: it pushes memory out of the hot cache and into cold memory, and it pushes pages out of the TLB.
+
+This EIP uses a virtual addressing scheme so that memory pages are not allocated until they are actually accessed. Further, it adds a surcharge for accessing memory outside of an EVM-defined "hot" area.
+
+Notably, the data structures used for costing memory do not need to be part of the memory implementation itself, which suggests an elegant implementation using the POSIX `mmap` syscall (or its counterpart on Windows, `VirtualAlloc`).
 
 The implementation can be approached in two ways. The first way is to implement the virtual addressing "manually". This is intended for systems without `mmap` or a virtual addressing capability. The implementation needs to maintain a map of pages, `map[page_id -> char[4096]]`, where `page_id` is an integer, computed as `memory_address >> 12`. Additionally, for costing purposes, a set of 512 `page_id`s (`set[page_id]`) is maintained; it is only used for pricing the operation and doesn't actually contain the data.
 
-The other implementation is easier, for systems with `mmap` or a similar facility. To hold the actual data of the memory, the implementation `mmap`s a `2**32` byte region of memory. Then, memory operations can be implemented simply as reads or writes against this buffer. (With an anonymous `mmap`, the operating system will allocate pages "on demand", as they are touched). The `pages` map is still necessary, but it doesn't hold any data, it is just to track which pages have been allocated, for pricing purposes. In this implementation, there are three data structures: `memory char[2**32]`, `allocated_pages set[page_id]`, `hot_pages set[page_id]`. The `memory` data structure is only used for memory reads and writes. The `allocated_pages` and `hot_pages` are only used for gas costing.
+The other implementation is easier, for systems with `mmap` or a similar facility. To hold the actual data of the memory, the implementation `mmap`s a `2**32` byte region of memory. Then, memory operations can be implemented simply as reads or writes against this buffer. (With an anonymous `mmap`, the operating system will not allocate the entire buffer up-front; rather, it will allocate pages "on demand", as they are touched.) The `pages` map is still necessary, but it doesn't hold any data; it just tracks which pages have been allocated, for pricing purposes. In this implementation, there are three data structures: `memory char[2**32]`, `allocated_pages set[page_id]`, and `hot_pages set[page_id]`. The `memory` data structure is only used for memory reads and writes. The `allocated_pages` and `hot_pages` sets are only used for gas costing.
 
 ## Specification
 
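For concreteness, the page arithmetic and the "manual" bookkeeping described above can be sketched in a few lines of Python. The names `page_id`, `ManualMemory`, `read_byte`, and `write_byte` are illustrative only; they appear in neither the EIP nor any client.

```python
PAGE_SIZE = 4096
LOWER_BITS = 12  # 2**12 == 4096


def page_id(address: int) -> int:
    # Shift away the rightmost 12 bits: addresses 0..4095 are page 0,
    # 4096..8191 are page 1, and so on.
    return address >> LOWER_BITS


class ManualMemory:
    """The "manual" variant: each page is a plain 4096-byte buffer, created on first touch."""

    def __init__(self) -> None:
        self.pages: dict[int, bytearray] = {}  # map[page_id -> char[4096]]
        self.hot_pages: set[int] = set()       # ~512 recently-touched page_ids, used only for pricing

    def _page(self, address: int) -> bytearray:
        pid = page_id(address)
        if pid not in self.pages:
            self.pages[pid] = bytearray(PAGE_SIZE)  # allocated on demand, never up-front
        return self.pages[pid]

    def read_byte(self, address: int) -> int:
        return self._page(address)[address & (PAGE_SIZE - 1)]

    def write_byte(self, address: int, value: int) -> None:
        self._page(address)[address & (PAGE_SIZE - 1)] = value
```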
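The `mmap`-backed variant looks similar from the outside: the whole `2**32`-byte buffer is merely reserved, and the `allocated_pages` / `hot_pages` sets exist purely for gas accounting. A rough sketch, assuming a 64-bit OS and using Python's `mmap` module as a stand-in for the raw POSIX call (`MmapMemory` is a hypothetical name):

```python
import mmap

PAGE_SIZE = 4096


class MmapMemory:
    """The mmap-backed variant: one anonymous 2**32-byte mapping, with pages
    materialized lazily by the operating system as they are first touched."""

    def __init__(self) -> None:
        # Reserves 4 GiB of address space; physical pages are only backed once
        # written to (assumes a 64-bit OS that allows this kind of lazy mapping).
        self.memory = mmap.mmap(-1, 2**32)
        self.allocated_pages: set[int] = set()  # which pages have been charged for allocation
        self.hot_pages: set[int] = set()        # the EVM-defined "hot" area; gas costing only

    def write(self, start: int, data: bytes) -> None:
        self.memory[start : start + len(data)] = data

    def read(self, start: int, size: int) -> bytes:
        return self.memory[start : start + size]
```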
@@ -55,7 +63,7 @@ A transaction-global memory limit is imposed. If the number of pages allocated i
 
 ## Rationale
 
-Benchmarks were performed on a 2019-era CPU, with the ability to keccak256 around 256MB/s, giving it a gas-to-ns ratio of 20 ns per 1 gas. The following benchmarks were performed:
+Benchmarks were performed on a 2019-era CPU, with the ability to `keccak256` around 256MB/s, giving it a gas-to-ns ratio of 20 ns per 1 gas (given that `keccak256` costs 6 gas per 32 bytes). The following benchmarks were performed:
 
 - Time to allocate a fresh page: 1-2us
 - Time to randomly read a byte from a 2MB range: 1.8ns
@@ -64,13 +72,16 @@ Benchmarks were performed on a 2019-era CPU, with the ability to keccak256 aroun
 - Time to update a hashmap with 512 items: 8ns
 - Time to update a hashmap with 8192 items: 9ns
 - Time to update a hashmap with 5mm items: 108ns
+- Time to execute the `mmap` syscall: 230ns
 
 These suggest the following prices:
 
 - 100 gas to allocate a page, and
 - 6 gas for a page thrash
 
-Since the delta between hitting a page and thrashing a page (including bookkeeping overhead) is ~120ns, we could ignore the resource cost and simply increase the base cost per memory operation from 3 gas to 6 gas. However, since memory operations which exploit cost-locality are so cheap, it leaves "room on the table" for future improvements to the gas schedule, including reducing the base cost of a memory operation to 1 gas. Furthermore, as the reference implementation below shows, it takes very little bookkeeping overhead (one additional data structure, and four lines of code) to check for the thrash.
+Note that the cost to execute `mmap` (~11 gas) is already well covered by the base cost of the CALL series of instructions (100 gas).
+
+Since the delta between hitting a page and thrashing a page (including bookkeeping overhead) is ~120ns, we could ignore the resource cost and simply increase the base cost per memory operation from 3 gas to 6 gas. However, since memory operations which exploit cost-locality are so cheap, this leaves "room on the table" for future improvements to the gas schedule, including reducing the base cost of a memory operation to 1 gas. Furthermore, as the reference implementation below shows, it takes very little bookkeeping overhead (one additional data structure, and four lines of code) to check for the thrash. Therefore, we model memory with a one-level hierarchy. While this is simpler than most real CPUs, which may have several levels of memory hierarchy, it is granular enough for our purposes.
 
 There is a desire among client implementations to be able to enforce global limits separately from the gas limit for DoS reasons. For example, RPC providers may be designed to allow many concurrent `eth_call` computations with a much higher gas limit than on mainnet. Not implicitly tying the memory limit to the gas limit results in one less vector for misconfiguration. That is not to say that in the future, a clean formula cannot be created which allows the memory limit to scale with future hardware improvements (e.g., proportional to the sqrt of the gas limit); but to limit the scope of things that need to be reasoned about for this EIP, the hard limit is introduced.
 
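As a sanity check, the conversion from the benchmark timings to the quoted gas numbers can be reproduced with a few lines of arithmetic. Every constant below is taken from the text above; the 64 MiB cap matches the reference implementation further down.

```python
# All constants below come from the benchmark numbers quoted above.
ns_per_32_bytes = 32 / 256e6 * 1e9   # keccak256 at ~256 MB/s -> ~125 ns per 32-byte word
ns_per_gas = ns_per_32_bytes / 6     # keccak256 costs 6 gas per word -> ~20 ns per gas

page_alloc_gas = 2_000 / ns_per_gas  # 1-2 us to allocate a fresh page -> roughly 50-100 gas
thrash_gas = 120 / ns_per_gas        # ~120 ns hit-vs-thrash delta     -> roughly 6 gas
mmap_gas = 230 / ns_per_gas          # ~230 ns per mmap syscall        -> roughly 11 gas

# The transaction-global hard cap used in the reference implementation below:
TRANSACTION_MAX_PAGES = 64 * 1024 * 1024 // 4096  # 64 MiB of 4096-byte pages -> 16384 pages
```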
@@ -89,14 +100,14 @@ Addressed in Security Considerations section. No backwards compatibility is brok
 
 ## Reference Implementation
 
-A ~50-line reference implementation is provided below. It is implemented as a patch against the `py-evm` codebase at commit ethereum/py-evm@fec63b8c4b9dad9fcb1022c48c863bdd584820c6.
+A ~60-line reference implementation is provided below. It is implemented as a patch against the `py-evm` codebase at commit ethereum/py-evm@fec63b8c4b9dad9fcb1022c48c863bdd584820c6. (This is a reference implementation; it does not, for example, contain fork choice rules.)
 
 ```diff
 diff --git a/eth/vm/computation.py b/eth/vm/computation.py
-index bf34fbee..477f969e 100644
+index bf34fbee..db85aee7 100644
 --- a/eth/vm/computation.py
 +++ b/eth/vm/computation.py
-@@ -454,34 +454,37 @@ class BaseComputation(ComputationAPI, Configurable):
+@@ -454,34 +454,40 @@ class BaseComputation(ComputationAPI, Configurable):
          validate_uint256(start_position, title="Memory start position")
          validate_uint256(size, title="Memory size")
 
@@ -106,12 +117,12 @@ index bf34fbee..477f969e 100644
 -        before_cost = memory_gas_cost(before_size)
 -        after_cost = memory_gas_cost(after_size)
 -
-         if self.logger.show_debug2:
-             self.logger.debug2(
-                 f"MEMORY: size ({before_size} -> {after_size}) | "
-                 f"cost ({before_cost} -> {after_cost})"
-             )
-
+-        if self.logger.show_debug2:
+-            self.logger.debug2(
+-                f"MEMORY: size ({before_size} -> {after_size}) | "
+-                f"cost ({before_cost} -> {after_cost})"
+-            )
+-
 -        if size:
 -            if before_cost < after_cost:
 -                gas_fee = after_cost - before_cost
@@ -126,35 +137,86 @@ index bf34fbee..477f969e 100644
 -                        )
 -                    ),
 -                )
+-
+-        self._memory.extend(start_position, size)
 +        if size == 0:
 +            return
 +
 +        ALLOCATE_PAGE_COST = 100
 +        THRASH_PAGE_COST = 6
++        LOWER_BITS = 12  # bits ignored for page calculations
++        PAGE_SIZE = 4096
++        assert 2**LOWER_BITS == PAGE_SIZE  # sanity check
++        MAXIMUM_MEMORY_SIZE = 64 * 1024 * 1024
++        TRANSACTION_MAX_PAGES = MAXIMUM_MEMORY_SIZE // PAGE_SIZE
 +
 +        end = start_position + size
 +
-+        start_page = start_position >> 12
-+        end_page = end >> 12
-+
-+        gas = 0
++        start_page = start_position >> LOWER_BITS
++        end_page = end >> LOWER_BITS
 +
 +        for page in range(start_page, end_page + 1):
 +            if page not in self._memory.pages:
-+                gas += ALLOCATE_PAGE_COST
++                if self.transaction_context.num_pages >= TRANSACTION_MAX_PAGES:
++                    raise VMError("Out Of Memory")
++                self.transaction_context.num_pages += 1
 +
-+            if page not in self._memory.lru_pages:
-+                gas += THRASH_PAGE_COST
++                reason = f"Allocating page {hex(page << LOWER_BITS)}"
++                self._gas_meter.consume_gas(ALLOCATE_PAGE_COST, reason)
++                self._memory.pages[page] = True
 +
-+        for page in range(start_page, end_page + 1):
-+            self._memory.lru_pages[page] = True
-
--        self._memory.extend(start_position, size)
-+        reason = f"Expanding memory {before_size} -> {after_size}"
-+        self._gas_meter.consume_gas(gas, reason)
++            if page not in self._memory.lru_pages:
++                reason = f"Page {hex(page << LOWER_BITS)} not in LRU pages"
++                self._gas_meter.consume_gas(THRASH_PAGE_COST, reason)
++            # insert into the lru_pages data structure.
++            # it's important to do it here rather than after
++            # the loop, since this could evict a page we haven't
++            # visited yet, increasing the cost.
++            self._memory.lru_pages[page] = True
 
      def memory_write(self, start_position: int, size: int, value: bytes) -> None:
          return self._memory.write(start_position, size, value)
+diff --git a/eth/vm/forks/frontier/computation.py b/eth/vm/forks/frontier/computation.py
+index 51666ae0..443f82b5 100644
+--- a/eth/vm/forks/frontier/computation.py
++++ b/eth/vm/forks/frontier/computation.py
+@@ -29,6 +29,7 @@ from eth.exceptions import (
+     InsufficientFunds,
+     OutOfGas,
+     StackDepthLimit,
++    VMError,
+ )
+ from eth.vm.computation import (
+     BaseComputation,
+@@ -87,12 +88,21 @@ class FrontierComputation(BaseComputation):
+
+         state.touch_account(message.storage_address)
+
+-        computation = cls.apply_computation(
+-            state,
+-            message,
+-            transaction_context,
+-            parent_computation=parent_computation,
+-        )
++        # implement transaction-global memory limit
++        num_pages_anchor = transaction_context.num_pages
++        try:
++            computation = cls.apply_computation(
++                state,
++                message,
++                transaction_context,
++                parent_computation=parent_computation,
++            )
++        finally:
++            # "deallocate" all the pages allocated in the child computation
++
++            # sanity check an invariant:
++            allocated_pages = len(computation._memory.pages)
++            assert transaction_context.num_pages == num_pages_anchor + allocated_pages
++            transaction_context.num_pages = num_pages_anchor
+
+         if computation.is_error:
+             state.revert(snapshot)
 diff --git a/eth/vm/logic/memory.py b/eth/vm/logic/memory.py
 index 806dbd8b..247b3c74 100644
 --- a/eth/vm/logic/memory.py
@@ -169,7 +231,7 @@ index 806dbd8b..247b3c74 100644
 
  def mcopy(computation: ComputationAPI) -> None:
 diff --git a/eth/vm/memory.py b/eth/vm/memory.py
-index 2ccfd090..5950a4d4 100644
+index 2ccfd090..9002b559 100644
 --- a/eth/vm/memory.py
 +++ b/eth/vm/memory.py
 @@ -1,8 +1,11 @@
@@ -180,7 +242,7 @@ index 2ccfd090..5950a4d4 100644
  from eth._utils.numeric import (
      ceil32,
  )
-+from eth.exceptions import PyEVMError
++from eth.exceptions import VMError
  from eth.abc import (
      MemoryAPI,
  )
@@ -231,7 +293,7 @@ index 2ccfd090..5950a4d4 100644
 -    def __len__(self) -> int:
 -        return len(self._bytes)
 +        if start_position + size >= 2**32:
-+            raise PyEVMError("Non 32-bit address")
++            raise VMError("Non 32-bit address")
 
 -    def write(self, start_position: int, size: int, value: bytes) -> None:
 -        if size:
@@ -268,6 +330,19 @@ index 2ccfd090..5950a4d4 100644
 -        buf = memoryview(self._bytes)
 +        buf = memoryview(self.memview)
          buf[destination : destination + length] = buf[source : source + length]
+diff --git a/eth/vm/transaction_context.py b/eth/vm/transaction_context.py
+index 79b570e9..5943f897 100644
+--- a/eth/vm/transaction_context.py
++++ b/eth/vm/transaction_context.py
+@@ -36,6 +36,9 @@ class BaseTransactionContext(TransactionContextAPI):
+         # post-cancun
+         self._blob_versioned_hashes = blob_versioned_hashes or []
+
++        # eip-7923
++        self.num_pages = 0
++
+     def get_next_log_counter(self) -> int:
+         return next(self._log_counter)
 ```
 
 ## Security Considerations
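The `frontier/computation.py` portion of the patch above implements the transaction-global limit as an anchor-and-restore pattern around each message call. A simplified, self-contained sketch of that pattern, with hypothetical stand-in types rather than actual py-evm code:

```python
class TxContext:
    """Stand-in for py-evm's transaction context: only the EIP-7923 page counter."""

    def __init__(self) -> None:
        self.num_pages = 0


TRANSACTION_MAX_PAGES = 16_384  # 64 MiB of 4096-byte pages, as in the patch above


def charge_page_allocation(ctx: TxContext) -> None:
    # Mirrors the check added to extend_memory: the page counter is transaction-global.
    if ctx.num_pages >= TRANSACTION_MAX_PAGES:
        raise MemoryError("Out Of Memory")
    ctx.num_pages += 1


def run_call(ctx: TxContext, body) -> None:
    # Mirrors the try/finally in FrontierComputation.apply_message: whatever the
    # child call allocated is "deallocated" (for accounting) once it returns.
    num_pages_anchor = ctx.num_pages
    try:
        body(ctx)
    finally:
        ctx.num_pages = num_pages_anchor


ctx = TxContext()
run_call(ctx, lambda c: [charge_page_allocation(c) for _ in range(10)])
assert ctx.num_pages == 0  # the child's 10 pages no longer count against the limit
```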
