chore: update evm data structures #401

Merged

shane-moore
Contributor

Changes

  • docs/wiki/EL/evm.md updated with:
    • context added to sections about stack, program counter, memory, and storage to improve reader understanding of why these data structures exist
    • added Calldata section since it's another storage mechanism utilized by the EVM

@shane-moore
Contributor Author

@taxmeifyoucan, this PR is ready for review 🤙

Member

@raxhvl raxhvl left a comment

Thanks @shane-moore for expanding on data locations, very helpful! I have left some comments.

It's important to keep this article EVM-centric and agnostic of any specific high-level language to ensure its relevance as new languages are introduced in the future. This way, the concepts discussed will remain applicable regardless of which language is used to interact with the Ethereum Virtual Machine (EVM).

If you must, specific details related to Solidity can be mentioned in a markdown note.

@@ -222,7 +243,7 @@ EVM memory is a byte array of $2^{256}$ (or [practically infinite](https://www.t

![EVM Memory](../../images/evm/evm-memory.gif)

Unlike stack, which provides data to individual instructions, memory stores ephemeral data that is relevant to the entire program.
Unlike stack, which provides data to individual instructions, memory stores ephemeral data that is relevant to the entire program. This data often includes return data and dynamic types such as arrays that will be mutated during the contract function execution.
Member

More generally, memory supplements the stack by storing data larger than one word (wider than a single stack item) and, unlike the stack, provides a virtually unlimited, indexable byte array.

Contributor Author

agreed, better to describe more generally the differences of the data structures between memory and stack. I added this sentence, "Since the stack has a hard limit of one word slots, memory supplements the stack by allowing indexed access to arbitrarily sized data. Stack values can be stored to or loaded from memory on demand."
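
To make the stack/memory relationship above concrete, here is a minimal Python sketch of memory as a zero-initialised, byte-addressed array with word-sized stores and loads. The `Memory` class and its method names are hypothetical, chosen for illustration only; gas accounting for memory expansion is deliberately left out.

```python
WORD = 32  # EVM word size in bytes


class Memory:
    """Toy model of EVM memory (hypothetical helper, not client code)."""

    def __init__(self) -> None:
        self.data = bytearray()

    def _expand(self, offset: int, size: int) -> None:
        # Memory grows in whole 32-byte words to cover the touched range.
        needed = offset + size
        if needed > len(self.data):
            words = (needed + WORD - 1) // WORD
            self.data.extend(bytes(words * WORD - len(self.data)))

    def mstore(self, offset: int, value: int) -> None:
        # MSTORE: write one 32-byte big-endian word at a byte offset.
        self._expand(offset, WORD)
        self.data[offset:offset + WORD] = value.to_bytes(WORD, "big")

    def mload(self, offset: int) -> int:
        # MLOAD: read one 32-byte word starting at a byte offset.
        self._expand(offset, WORD)
        return int.from_bytes(self.data[offset:offset + WORD], "big")


stack = [0xCAFE]               # a value that currently lives on the stack
mem = Memory()
mem.mstore(0x20, stack.pop())  # move it from the stack into memory
stack.append(mem.mload(0x20))  # load it back onto the stack on demand
```

The stack only ever holds word-sized items, while the byte array can be indexed at any offset, which is the distinction the reworded sentence is trying to capture.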

@@ -258,23 +279,35 @@ EVM doesn't have a direct equivalent to `MSTORE8` for reading. You must read the

> EVM memory is shown as blocks of 32 bytes to illustrate how memory expansion works. In reality, it is a seamless sequence of bytes, without any inherent divisions or blocks.

## Calldata
The **calldata** memory type is very similar to **Memory** as discussed above in that they both store dynamically sized data that is removed after contract execution. However, the **calldata** memory structure stores read-only data originating from a function's parameters. The EVM will store the parameter as calldata since it's cheaper than copying the value to **Memory**.
Member

The calldata memory type is very similar...

The calldata is better understood as an input to the current execution environment, rather than a traditional memory type. As such, inputs are read-only. These inputs can come directly from the transaction or as part of a message call.

The EVM will store the parameter as calldata since it's cheaper than copying the value to Memory.

This is incorrect. Function parameters are abstractions of languages like Solidity that do not exist in the EVM. When you define a parameter as calldata in Solidity, the EVM uses CALLDATALOAD to directly read from calldata without creating a copy. However, when the parameter is marked as memory, a copy of the calldata is loaded into memory using CALLDATACOPY, where it can be modified if necessary.

This discussion is beyond the scope of the EVM, so you can choose to disregard it. Suffice it to say, calldata refers to the input byte array.

Contributor Author

@raxhvl, thanks for all this clarification bud. Reworded this section to make it quite clear that calldata is read-only input data: "The calldata is read-only input data passed to the EVM via message call instructions or from a transaction and is stored as a sequence of bytes that are accessible via specific opcodes."

The **calldata** memory type is very similar to **Memory** as discussed above in that they both store dynamically sized data that is removed after contract execution. However, the **calldata** memory structure stores read-only data originating from a function's parameters. The EVM will store the parameter as calldata since it's cheaper than copying the value to **Memory**.

### Reading from calldata
If the transaction calling this smart contract passes in calldata, the inputted data can be loaded using the `CALLDATALOAD` opcode, which reads 32 bytes from calldata at a given offset and pushes it onto the stack once the program counter reaches its position in the bytecode. More info on the `CALLDATALOAD` opcode can be found [here](https://veridelisi.medium.com/learn-evm-opcodes-v-a59dc7cbf9c9).
Member

Suggested change
If the transaction calling this smart contract passes in calldata, the inputted data can be loaded using the `CALLDATALOAD` opcode, which reads 32 bytes from calldata at a given offset and pushes it onto the stack once the program counter reaches its position in the bytecode. More info on the `CALLDATALOAD` opcode can be found [here](https://veridelisi.medium.com/learn-evm-opcodes-v-a59dc7cbf9c9).
The calldata for the current environment can be accessed using either:
- `CALLDATALOAD` opcode which reads 32 bytes from a desired offset onto the stack, [learn more](https://veridelisi.medium.com/learn-evm-opcodes-v-a59dc7cbf9c9).
- or, using `CALLDATACOPY` to copy a portion of calldata to memory.

The previous comment discusses how calldata is initialized (transaction or message call), so it suffices to discuss how to read from it here.
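
As a rough illustration of the two access paths in the suggestion, the Python sketch below models calldata as an immutable `bytes` value. The function names and argument order are assumptions made for this example (the real opcodes take their arguments from the stack), and gas costs are ignored.

```python
WORD = 32  # EVM word size in bytes


def calldataload(calldata: bytes, offset: int) -> int:
    # CALLDATALOAD: read 32 bytes at `offset`; bytes past the end read as zero.
    chunk = calldata[offset:offset + WORD]
    return int.from_bytes(chunk.ljust(WORD, b"\x00"), "big")


def calldatacopy(memory: bytearray, dest: int, calldata: bytes,
                 offset: int, size: int) -> None:
    # CALLDATACOPY: copy `size` bytes of calldata into memory at `dest`,
    # zero-padding anything read past the end of calldata.
    if size == 0:
        return
    end = dest + size
    if end > len(memory):
        # Memory grows in whole 32-byte words.
        words = (end + WORD - 1) // WORD
        memory.extend(bytes(words * WORD - len(memory)))
    memory[dest:end] = calldata[offset:offset + size].ljust(size, b"\x00")


calldata = bytes(range(1, 37))            # 36 arbitrary input bytes: 0x01..0x24
top_of_stack = calldataload(calldata, 0)  # first 32 bytes, as one word
memory = bytearray()
calldatacopy(memory, 0, calldata, 4, 8)   # copy bytes 4..11 into memory
```

Because `calldata` is a `bytes` object, neither helper can modify it, which mirrors the read-only nature of the input; only the copy in `memory` is writable.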

Contributor Author

@raxhvl, for sure! reworded as you suggest above

## Storage

Storage is designed as a **word-addressed word array**. Unlike memory, storage is associated with an Ethereum account and is **persisted** across transactions as part of the world state.
Storage is designed as a **word-addressed word array**. Unlike memory, storage is associated with an Ethereum account and is **persisted** across transactions as part of the world state. It can be thought of as the **database** associated with the smart contract, which is why it contains the contract's "state" variables. Storage size is fixed at 2^256 slots, 32 bytes each.
Member

Suggested change
Storage is designed as a **word-addressed word array**. Unlike memory, storage is associated with an Ethereum account and is **persisted** across transactions as part of the world state. It can be thought of as the **database** associated with the smart contract, which is why it contains the contract's "state" variables. Storage size is fixed at 2^256 slots, 32 bytes each.
Storage is designed as a **word-addressed word array**. Unlike memory, storage is associated with an Ethereum account and is **persisted** across transactions as part of the world state. It can be thought of as a key-value **database** associated with the smart contract, which is why it contains the contract's "state" variables. Storage size is fixed at 2^256 slots, 32 bytes each.


![EVM Storage](../../images/evm/evm-storage.jpg)

Storage can only be accessed via the code of its associated account. External accounts don't have code and therefore cannot access their own storage.

## Writing to storage

`SSTORE` takes two values from the stack: a storage **slot** and a 32-byte **value**. It then writes the value to storage of the account.
`SSTORE` takes two values from the stack: a storage **slot** and a 32-byte **value**. It then writes the value to storage of the account. Notice: Slots can be thought of as the basic unit of storage, so when writing to storage, we deal with slots as opposed to individual bytes. Depending upon the data type being stored, the value will take up a full slot or could be combined with other data in the same slot if the their combined length is less than 32 bytes long. More info on how different data types are packed into slots can be found [here](https://medium.com/coinmonks/solidity-storage-how-does-it-work-8354afde3eb).
Member

Depending upon the data type being stored, the value will take up a full slot or could be combined with other data in the same slot if the their combined length is less than 32 bytes long. More info on how different data types are packed into slots can be found here.

EVM writes exactly 32 bytes to storage. You could reframe this to say:

Writing to storage is expensive. High-level languages like Solidity optimize storage by packing multiple variables into a single 32-byte slot when their combined size is less than or equal to 32 bytes.

Again, this has more to do with Solidity than with the EVM.
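
The distinction can be sketched in Python as follows. `Storage` models the EVM side, where every write is a full 32-byte word keyed by a slot number; `pack` stands in for what a high-level compiler might do before the write. Both names, and the choice of a 128-bit and a 64-bit field, are hypothetical and only illustrate the idea.

```python
WORD_MASK = (1 << 256) - 1  # slots and values are 256-bit words


class Storage:
    """Toy key-value model of one account's storage (illustrative only)."""

    def __init__(self) -> None:
        self.slots = {}

    def sstore(self, slot: int, value: int) -> None:
        # SSTORE always writes one whole 32-byte word to a slot.
        self.slots[slot & WORD_MASK] = value & WORD_MASK

    def sload(self, slot: int) -> int:
        # SLOAD returns the word at a slot; untouched slots read as zero.
        return self.slots.get(slot & WORD_MASK, 0)


def pack(a: int, b: int) -> int:
    # Compiler-level packing (an assumption for this sketch): a 128-bit `a`
    # in the low-order bits and a 64-bit `b` above it, in a single word.
    return ((b & ((1 << 64) - 1)) << 128) | (a & ((1 << 128) - 1))


storage = Storage()
storage.sstore(5, pack(1, 2))  # one SSTORE covers both small values
word = storage.sload(5)
a = word & ((1 << 128) - 1)
b = (word >> 128) & ((1 << 64) - 1)
```

From the EVM's point of view only a single word was written to slot 5; the two fields exist only in the packing and unpacking logic that surrounds the `sstore` and `sload` calls.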

@shane-moore
Contributor Author

@raxhvl, thanks for the in-depth review! You can find updates in 759c455

- **DUPn**: Duplicate nth stack item to top.
- **SWAPn**: Swap top with n+1th stack item.

The max `n` in the **DUP** and **SWAP** opcodes is 16, As mentioned, the EVM only allows single byte opcodes. In the case of **DUP** and **SWAP**, the byte is split into two 4 bit values called nibbles. The lower nibble specifies the value on the stack to swap or duplicate and is limited to 16 due to only being 4 bits in length, hence a max of **DUP16** and **SWAP16**.
Member

@raxhvl raxhvl Mar 25, 2025

The max n in the DUP and SWAP opcodes is 16, As mentioned, the EVM only allows single byte opcodes. In the case of DUP and SWAP, the byte is split into two 4 bit values called nibbles. The lower nibble specifies the value on the stack to swap or duplicate and is limited to 16 due to only being 4 bits in length, hence a max of DUP16 and SWAP16.

I'm not sure if the DUP/SWAP opcodes are bounded by this design. I always thought of them as simple opcodes assigned to arbitrary values like PUSH*.

cc: @chfast @shemnon

The nybble split is one way to frame it, but not essential. Both framings are valid. Many clients drop DUP/SWAP/PUSH into one section of code and they take the last 4/5 bits to determine the size of the swap/dup/push. But it may be easier in ZK to treat them all as unique.

Contributor Author

@shemnon, I see what you're saying. Using the lower nibble is more of a client-specific implementation detail. How about rewording this section to something like this, to keep it simple and EVM-specific:
"The maximum n for DUP and SWAP is 16, corresponding to opcodes 0x80–0x8f and 0x90–0x9f respectively. These opcodes are explicitly defined in the EVM and form a fixed set — making the 16-item limit an EVM-level constraint."
cc @raxhvl
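
For reference, both framings discussed in this thread give the same `n`. The Python sketch below decodes the fixed opcode ranges by subtraction and models the stack as a list whose last element is the top; `decode`, `dup`, and `swap` are hypothetical helper names, not taken from any client.

```python
DUP1, DUP16 = 0x80, 0x8F    # DUP1..DUP16 occupy 0x80-0x8f
SWAP1, SWAP16 = 0x90, 0x9F  # SWAP1..SWAP16 occupy 0x90-0x9f


def decode(opcode: int):
    # Map an opcode byte to ("DUP" | "SWAP", n), or None for anything else.
    if DUP1 <= opcode <= DUP16:
        return ("DUP", opcode - DUP1 + 1)
    if SWAP1 <= opcode <= SWAP16:
        return ("SWAP", opcode - SWAP1 + 1)
    return None


def dup(stack: list, n: int) -> None:
    # DUPn: push a copy of the nth item from the top.
    stack.append(stack[-n])


def swap(stack: list, n: int) -> None:
    # SWAPn: exchange the top item with the (n+1)th item from the top.
    stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]


stack = [1, 2, 3]  # 3 is the top of the stack
dup(stack, 2)      # stack becomes [1, 2, 3, 2]
swap(stack, 3)     # stack becomes [2, 2, 3, 1]
```

Within these two ranges, `(opcode & 0x0F) + 1` yields the same `n` as the subtraction in `decode`, so the nibble view and the fixed-set view describe the same sixteen-item limit.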

Member

@shane-moore That sounds great!

@raxhvl
Member

raxhvl commented Mar 25, 2025

Your updates are even better! This looks great to me barring a small comment. Let me check with some folks.

Contributor

@taxmeifyoucan taxmeifyoucan left a comment

Thanks for the update! The current version looks great

@taxmeifyoucan taxmeifyoucan merged commit 27da55b into eth-protocol-fellows:main Mar 28, 2025
1 of 2 checks passed