chore: update evm data structures #401

shane-moore · 2025-03-21T22:19:34Z

Changes

docs/wiki/EL/evm.md updated with:
- context added to the sections on stack, program counter, memory, and storage to help readers understand why these data structures exist
- added a Calldata section, since calldata is another data location used by the EVM

shane-moore · 2025-03-21T22:20:39Z

@taxmeifyoucan, this PR is ready for review 🤙

raxhvl

Thanks @shane-moore for expanding on data locations, very helpful! I have left some comments.

It's important to keep this article EVM-centric and agnostic of any specific high-level language to ensure its relevance as new languages are introduced in the future. This way, the concepts discussed will remain applicable regardless of which language is used to interact with the Ethereum Virtual Machine (EVM).

If you must, specific details related to Solidity can be mentioned in a markdown note.

raxhvl · 2025-03-22T09:30:23Z

docs/wiki/EL/evm.md

@@ -222,7 +243,7 @@ EVM memory is a byte array of $2^{256}$ (or [practically infinite](https://www.t

 ![EVM Memory](../../images/evm/evm-memory.gif)

-Unlike stack, which provides data to individual instructions, memory stores ephemeral data that is relevant to the entire program.
+Unlike stack, which provides data to individual instructions, memory stores ephemeral data that is relevant to the entire program.  This data often includes return data and dynamic types such as arrays that will be mutated during the contract function execution.


More generally, memory supplements the stack by storing data larger than one word (the stack width) and, unlike the stack, provides a virtually unlimited, indexable byte array.

Agreed, it's better to describe the differences between memory and the stack more generally. I added this sentence: "Since the stack has a hard limit of one word slots, memory supplements the stack by allowing indexed access to arbitrarily sized data. Stack values can be stored to or loaded from memory on demand."
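To make the stack/memory relationship concrete, here is a minimal Python sketch. It is a toy model, not client code: the `Machine` class and its method names are hypothetical, gas costs are ignored, and only the word size, the 1024-item stack limit, and the operand order of `MSTORE`/`MLOAD` are taken from the EVM.

```python
WORD = 32  # bytes per EVM word


class Machine:
    """Toy model (hypothetical): a stack of one-word items plus byte-addressed memory."""

    def __init__(self):
        self.stack = []            # each item is an int below 2**256
        self.memory = bytearray()  # grows on demand, zero-filled

    def push(self, value):
        if len(self.stack) >= 1024:
            raise OverflowError("stack limit reached")
        self.stack.append(value % 2**256)

    def _expand(self, end):
        # Memory grows in whole 32-byte words.
        if end > len(self.memory):
            new_size = (end + WORD - 1) // WORD * WORD
            self.memory.extend(bytes(new_size - len(self.memory)))

    def mstore(self):
        # MSTORE pops the offset first, then the value.
        offset = self.stack.pop()
        value = self.stack.pop()
        self._expand(offset + WORD)
        self.memory[offset:offset + WORD] = value.to_bytes(WORD, "big")

    def mload(self):
        offset = self.stack.pop()
        self._expand(offset + WORD)
        self.push(int.from_bytes(self.memory[offset:offset + WORD], "big"))


# A 64-byte value does not fit in one stack slot, so it is moved
# through the stack one word at a time and kept in memory.
machine = Machine()
blob = bytes(range(64))
for i in range(0, len(blob), WORD):
    machine.push(int.from_bytes(blob[i:i + WORD], "big"))
    machine.push(i)
    machine.mstore()
```

After the loop the stack is empty again and `machine.memory` holds all 64 bytes, which any later instruction can read back by offset.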

raxhvl · 2025-03-22T09:54:05Z

docs/wiki/EL/evm.md

@@ -258,23 +279,35 @@ EVM doesn't have a direct equivalent to `MSTORE8` for reading. You must read the

 > EVM memory is shown as blocks of 32 bytes to illustrate how memory expansion works. In reality, it is a seamless sequence of bytes, without any inherent divisions or blocks.

+## Calldata
+The **calldata** memory type is very similar to **Memory** as discussed above in that they both store dynamically sized data that is removed after contract execution. However, the **calldata** memory structure stores read-only data originating from a function's parameters.  The EVM will store the parameter as calldata since it's cheaper than copying the value to **Memory**.


The calldata memory type is very similar...

The calldata is better understood as an input to the current execution environment, rather than a traditional memory type. As such, inputs are read-only. These inputs can come directly from the transaction or as part of a message call.

The EVM will store the parameter as calldata since it's cheaper than copying the value to Memory.

This is incorrect. Function parameters are abstractions of languages like Solidity that do not exist in the EVM. When you define a parameter as calldata in Solidity, the EVM uses CALLDATALOAD to directly read from calldata without creating a copy. However, when the parameter is marked as memory, a copy of the calldata is loaded into memory using CALLDATACOPY, where it can be modified if necessary.

This discussion is beyond the scope of the EVM, so you can choose to disregard it. Suffice it to say, calldata refers to the input byte array.

@raxhvl, thanks for all this clarification bud. Reworded this section to make it quite clear that calldata is read-only input data: "The calldata is read-only input data passed to the EVM via message call instructions or from a transaction and is stored as a sequence of bytes that are accessible via specific opcodes."

raxhvl · 2025-03-22T10:00:03Z

docs/wiki/EL/evm.md

+The **calldata** memory type is very similar to **Memory** as discussed above in that they both store dynamically sized data that is removed after contract execution. However, the **calldata** memory structure stores read-only data originating from a function's parameters.  The EVM will store the parameter as calldata since it's cheaper than copying the value to **Memory**.
+
+###  Reading from calldata
+If the transaction calling this smart contract passes in calldata, the inputted data can be loaded using the `CALLDATALOAD` opcode, which reads 32 bytes from calldata at a given offset and pushes it onto the stack once the program counter reaches its position in the bytecode. More info on the `CALLDATALOAD` opcode can be found [here](https://veridelisi.medium.com/learn-evm-opcodes-v-a59dc7cbf9c9).


Suggested change

If the transaction calling this smart contract passes in calldata, the inputted data can be loaded using the `CALLDATALOAD` opcode, which reads 32 bytes from calldata at a given offset and pushes it onto the stack once the program counter reaches its position in the bytecode. More info on the `CALLDATALOAD` opcode can be found [here](https://veridelisi.medium.com/learn-evm-opcodes-v-a59dc7cbf9c9).

The calldata for the current environment can be accessed using either:

- `CALLDATALOAD` opcode which reads 32 bytes from a desired offset onto the stack, [learn more](https://veridelisi.medium.com/learn-evm-opcodes-v-a59dc7cbf9c9).

- or, using `CALLDATACOPY` to copy a portion of calldata to memory.

The previous comment discusses how calldata is initialized (tx or message call), so it suffices to discuss how to read from it.

@raxhvl, for sure! reworded as you suggest above
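For readers following the thread, the two access paths discussed above can be sketched in Python. This is an illustrative model only: the function names mirror the opcodes but the signatures are hypothetical, gas is ignored, and memory expansion is simplified to byte granularity. Reads past the end of calldata are zero-padded, as in the EVM.

```python
def calldataload(calldata: bytes, offset: int) -> int:
    """Return the 32-byte word at `offset`, zero-padded past the end of calldata."""
    chunk = calldata[offset:offset + 32]
    return int.from_bytes(chunk.ljust(32, b"\x00"), "big")


def calldatacopy(calldata: bytes, memory: bytearray,
                 dest: int, offset: int, size: int) -> None:
    """Copy `size` bytes of calldata into memory at `dest` (zero-padded)."""
    if size == 0:
        return
    chunk = calldata[offset:offset + size].ljust(size, b"\x00")
    end = dest + size
    if end > len(memory):
        # Simplification: real memory expands in 32-byte words.
        memory.extend(bytes(end - len(memory)))
    memory[dest:end] = chunk


# Calldata is an immutable input byte array; `bytes` models the read-only part.
calldata = bytes.fromhex("deadbeef") + (42).to_bytes(32, "big")

word = calldataload(calldata, 4)          # the word that follows the 4-byte prefix
memory = bytearray()
calldatacopy(calldata, memory, 0, 4, 32)  # a mutable copy now lives in memory
memory[31] = 7                            # changing the copy leaves calldata untouched
```

The copy in `memory` can be modified freely, while `calldata` itself stays as it was passed in, which is the distinction the review comments draw.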

raxhvl · 2025-03-22T10:02:45Z

docs/wiki/EL/evm.md

 ## Storage

-Storage is designed as a **word-addressed word array**. Unlike memory, storage is associated with an Ethereum account and is **persisted** across transactions as part of the world state.
+Storage is designed as a **word-addressed word array**. Unlike memory, storage is associated with an Ethereum account and is **persisted** across transactions as part of the world state.  It can be thought of as the **database** associated with the smart contract, which is why it contains the contract's "state" variables. Storage size is fixed at 2^256 slots, 32 bytes each.


Suggested change

Storage is designed as a **word-addressed word array**. Unlike memory, storage is associated with an Ethereum account and is **persisted** across transactions as part of the world state. It can be thought of as the **database** associated with the smart contract, which is why it contains the contract's "state" variables. Storage size is fixed at 2^256 slots, 32 bytes each.

Storage is designed as a **word-addressed word array**. Unlike memory, storage is associated with an Ethereum account and is **persisted** across transactions as part of the world state. It can be thought of as a key-value **database** associated with the smart contract, which is why it contains the contract's "state" variables. Storage size is fixed at 2^256 slots, 32 bytes each.
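The key-value framing can be sketched as a per-account mapping from slot to word. This is a simplified Python model with hypothetical names (`WorldState`, `sstore`, `sload`); it ignores gas, access lists, and how clients actually persist state. Slots that were never written read as zero.

```python
from collections import defaultdict

WORD_MODULUS = 2**256  # slots and values are both 256-bit words


class WorldState:
    """Toy model (hypothetical): account address -> {slot -> value}."""

    def __init__(self):
        self.storage = defaultdict(dict)

    def sstore(self, address: str, slot: int, value: int) -> None:
        self.storage[address][slot % WORD_MODULUS] = value % WORD_MODULUS

    def sload(self, address: str, slot: int) -> int:
        # Unset slots read as zero.
        return self.storage[address].get(slot % WORD_MODULUS, 0)


state = WorldState()
state.sstore("0xabc", 0, 1234)                  # written during one transaction
later = state.sload("0xabc", 0)                 # still there in a later one
empty = state.sload("0xabc", WORD_MODULUS - 1)  # any of the 2^256 slots is addressable
```

Only written slots occupy space in this model, which is why a 2^256-slot address space is workable in practice.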

raxhvl · 2025-03-22T10:13:26Z

docs/wiki/EL/evm.md


 ![EVM Storage](../../images/evm/evm-storage.jpg)

 Storage can only be accessed via the code of its associated account. External accounts don't have code and therefore cannot access their own storage.

 ## Writing to storage

-`SSTORE` takes two values from the stack: a storage **slot** and a 32-byte **value**. It then writes the value to storage of the account.
+`SSTORE` takes two values from the stack: a storage **slot** and a 32-byte **value**. It then writes the value to storage of the account.  Notice: Slots can be thought of as the basic unit of storage, so when writing to storage, we deal with slots as opposed to individual bytes.  Depending upon the data type being stored, the value will take up a full slot or could be combined with other data in the same slot if the their combined length is less than 32 bytes long. More info on how different data types are packed into slots can be found [here](https://medium.com/coinmonks/solidity-storage-how-does-it-work-8354afde3eb).


Depending upon the data type being stored, the value will take up a full slot or could be combined with other data in the same slot if the their combined length is less than 32 bytes long. More info on how different data types are packed into slots can be found here.

EVM writes exactly 32 bytes to storage. You could reframe this to say:

Writing to storage is expensive. High-level languages like Solidity optimize storage by packing multiple variables into a single 32-byte slot when their combined size is less than or equal to 32 bytes.

Again, this has more to do with Solidity than with the EVM.
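To illustrate the point: the EVM always writes a full 32-byte word, and any packing is arithmetic done by the compiled code before `SSTORE`. The sketch below shows one way such packing could work. The `pack`/`unpack` helpers and the lowest-order-first field layout are assumptions made for illustration, not a statement of any particular compiler's rules.

```python
def pack(fields):
    """Pack (value, size_in_bytes) pairs into one 256-bit word, lowest-order first."""
    word, shift = 0, 0
    for value, size in fields:
        bits = 8 * size
        if value >= 1 << bits:
            raise ValueError("value does not fit in its field")
        if shift + bits > 256:
            raise ValueError("fields exceed one 32-byte slot")
        word |= value << shift
        shift += bits
    return word


def unpack(word, sizes):
    """Split a 256-bit word back into fields of the given byte sizes."""
    values, shift = [], 0
    for size in sizes:
        bits = 8 * size
        values.append((word >> shift) & ((1 << bits) - 1))
        shift += bits
    return values


# Three small values (16 + 8 + 8 bytes) share a single slot,
# so one storage write covers all of them.
slot_value = pack([(1, 16), (2, 8), (3, 8)])
fields = unpack(slot_value, [16, 8, 8])
```

From the EVM's point of view `slot_value` is just one word; only the code that reads it knows it holds three fields.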

shane-moore · 2025-03-25T04:21:00Z

@raxhvl, thanks for the in-depth review! You can find updates in 759c455

raxhvl · 2025-03-25T04:48:29Z

docs/wiki/EL/evm.md

+- **DUPn**: Duplicate nth stack item to top.
+- **SWAPn**: Swap top with n+1th stack item.
+
+The max `n` in the **DUP** and **SWAP** opcodes is 16, As mentioned, the EVM only allows single byte opcodes. In the case of **DUP** and **SWAP**, the byte is split into two 4 bit values called nibbles. The lower nibble specifies the value on the stack to swap or duplicate and is limited to 16 due to only being 4 bits in length, hence a max of **DUP16** and **SWAP16**.


The max n in the DUP and SWAP opcodes is 16, As mentioned, the EVM only allows single byte opcodes. In the case of DUP and SWAP, the byte is split into two 4 bit values called nibbles. The lower nibble specifies the value on the stack to swap or duplicate and is limited to 16 due to only being 4 bits in length, hence a max of DUP16 and SWAP16.

I'm not sure if the DUP/SWAP opcodes are bounded by this design. I always thought of them as simple opcodes assigned to arbitrary values like PUSH*.

cc: @chfast @shemnon

The nybble split is one way to frame it, but not essential. Both framings are valid. Many clients drop DUP/SWAP/PUSH into one section of code and they take the last 4/5 bits to determine the size of the swap/dup/push. But it may be easier in ZK to treat them all as unique.

@shemnon, I see what you're saying. The lower-nibble interpretation is more of a client-specific implementation detail. How about rewording this section to something like the following, to keep it simple and EVM-specific:
"The maximum n for DUP and SWAP is 16, corresponding to opcodes 0x80–0x8f and 0x90–0x9f respectively. These opcodes are explicitly defined in the EVM and form a fixed set — making the 16-item limit an EVM-level constraint."
cc @raxhvl

@shane-moore That sounds great!
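Both framings from this thread lead to the same mapping, which a short Python sketch can show. The helper names (`decode`, `dup`, `swap`) are hypothetical; the opcode ranges 0x80–0x8f and 0x90–0x9f are the ones quoted above. The top of the stack is the end of the list.

```python
def decode(opcode: int):
    """Map an opcode byte to ("DUP", n) or ("SWAP", n), or None for anything else."""
    if 0x80 <= opcode <= 0x8F:
        return ("DUP", opcode - 0x80 + 1)
    if 0x90 <= opcode <= 0x9F:
        return ("SWAP", opcode - 0x90 + 1)
    return None


def dup(stack: list, n: int) -> None:
    """DUPn: copy the nth item from the top onto the top."""
    if len(stack) < n:
        raise IndexError("stack underflow")
    stack.append(stack[-n])


def swap(stack: list, n: int) -> None:
    """SWAPn: exchange the top item with the (n+1)th item from the top."""
    if len(stack) < n + 1:
        raise IndexError("stack underflow")
    stack[-1], stack[-(n + 1)] = stack[-(n + 1)], stack[-1]


stack = [1, 2, 3]
dup(stack, 3)   # stack is now [1, 2, 3, 1]
swap(stack, 1)  # stack is now [1, 2, 1, 3]
```

Because only sixteen opcodes exist in each range, `decode` never yields an `n` above 16, whether one reads that off the low nibble or treats each opcode as its own entry.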

raxhvl · 2025-03-25T04:56:22Z

Your updates are even better! This looks great to me barring a small comment. Let me check with some folks.

taxmeifyoucan

Thanks for the update! The current version looks great

chore: update evm data structures

be72484

raxhvl requested changes Mar 22, 2025

chore: updates per PR review

759c455

raxhvl reviewed Mar 25, 2025

taxmeifyoucan approved these changes Mar 28, 2025

taxmeifyoucan merged commit 27da55b into eth-protocol-fellows:main Mar 28, 2025
1 of 2 checks passed
