chore: update evm data structures #401
@taxmeifyoucan, this PR is ready for review 🤙
Thanks @shane-moore for expanding on data locations, very helpful! I have left some comments.
It's important to keep this article EVM-centric and agnostic of any specific high-level language to ensure its relevance as new languages are introduced in the future. This way, the concepts discussed will remain applicable regardless of which language is used to interact with the Ethereum Virtual Machine (EVM).
If you must, specific details related to Solidity can be mentioned in a markdown note.
docs/wiki/EL/evm.md
Outdated
@@ -222,7 +243,7 @@ EVM memory is a byte array of $2^{256}$ (or [practically infinite](https://www.t | |||
|
|||
 | |||
|
|||
Unlike stack, which provides data to individual instructions, memory stores ephemeral data that is relevant to the entire program. | |||
Unlike stack, which provides data to individual instructions, memory stores ephemeral data that is relevant to the entire program. This data often includes return data and dynamic types such as arrays that will be mutated during the contract function execution. |
More generally, memory supplements the stack by storing data larger than one word (the width of a stack item) and, unlike the stack, provides a virtually unlimited, byte-indexable array.
Agreed, it's better to describe the differences between the memory and stack data structures more generally. I added this sentence: "Since the stack has a hard limit of one word slots, memory supplements the stack by allowing indexed access to arbitrarily sized data. Stack values can be stored to or loaded from memory on demand."
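To make the stack/memory relationship discussed in this thread concrete, here is a minimal Python sketch of memory as a zero-initialised byte array that grows in 32-byte words. The `Memory` class and its method names are hypothetical, chosen only for illustration; gas accounting for memory expansion is deliberately left out.

```python
WORD = 32  # bytes per EVM word


class Memory:
    """Toy model of EVM memory (hypothetical helper, not a client implementation)."""

    def __init__(self) -> None:
        self.data = bytearray()

    def _expand(self, offset: int, size: int) -> None:
        # Memory grows in whole 32-byte words so that the accessed range fits.
        needed = offset + size
        if needed > len(self.data):
            words = (needed + WORD - 1) // WORD
            self.data.extend(bytes(words * WORD - len(self.data)))

    def mstore(self, offset: int, value: int) -> None:
        # Write one stack word (32 bytes, big-endian) at a byte offset.
        self._expand(offset, WORD)
        self.data[offset:offset + WORD] = (value % 2**256).to_bytes(WORD, "big")

    def mload(self, offset: int) -> int:
        # Read 32 bytes starting at a byte offset back into a stack word.
        self._expand(offset, WORD)
        return int.from_bytes(self.data[offset:offset + WORD], "big")
```

Because offsets are byte-granular, a read at offset 1 straddles two words and also triggers expansion to 64 bytes in this model.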
docs/wiki/EL/evm.md
Outdated
@@ -258,23 +279,35 @@ EVM doesn't have a direct equivalent to `MSTORE8` for reading. You must read the | |||
|
|||
> EVM memory is shown as blocks of 32 bytes to illustrate how memory expansion works. In reality, it is a seamless sequence of bytes, without any inherent divisions or blocks. | |||
|
|||
## Calldata | |||
The **calldata** memory type is very similar to **Memory** as discussed above in that they both store dynamically sized data that is removed after contract execution. However, the **calldata** memory structure stores read-only data originating from a function's parameters. The EVM will store the parameter as calldata since it's cheaper than copying the value to **Memory**. |
The calldata memory type is very similar...
The calldata is better understood as an input to the current execution environment, rather than a traditional memory type. As such, inputs are read-only. These inputs can come directly from the transaction or as part of a message call.
The EVM will store the parameter as calldata since it's cheaper than copying the value to Memory.
This is incorrect. Function parameters are abstractions of languages like Solidity that do not exist in the EVM. When you define a parameter as `calldata` in Solidity, the EVM uses `CALLDATALOAD` to directly read from `calldata` without creating a copy. However, when the parameter is marked as `memory`, a copy of the `calldata` is loaded into memory using `CALLDATACOPY`, where it can be modified if necessary.
This discussion is beyond the scope of the EVM, so you can choose to disregard it. Suffice it to say, calldata refers to the input byte array.
@raxhvl, thanks for all this clarification bud. Reworded this section to make it quite clear that calldata is read-only input data: "The calldata is read-only input data passed to the EVM via message call instructions or from a transaction and is stored as a sequence of bytes that are accessible via specific opcodes."
docs/wiki/EL/evm.md
Outdated
The **calldata** memory type is very similar to **Memory** as discussed above in that they both store dynamically sized data that is removed after contract execution. However, the **calldata** memory structure stores read-only data originating from a function's parameters. The EVM will store the parameter as calldata since it's cheaper than copying the value to **Memory**. | ||
|
||
### Reading from calldata | ||
If the transaction calling this smart contract passes in calldata, the inputted data can be loaded using the `CALLDATALOAD` opcode, which reads 32 bytes from calldata at a given offset and pushes it onto the stack once the program counter reaches its position in the bytecode. More info on the `CALLDATALOAD` opcode can be found [here](https://veridelisi.medium.com/learn-evm-opcodes-v-a59dc7cbf9c9). |
If the transaction calling this smart contract passes in calldata, the inputted data can be loaded using the `CALLDATALOAD` opcode, which reads 32 bytes from calldata at a given offset and pushes it onto the stack once the program counter reaches its position in the bytecode. More info on the `CALLDATALOAD` opcode can be found [here](https://veridelisi.medium.com/learn-evm-opcodes-v-a59dc7cbf9c9). | |
The calldata for the current environment can be accessed using either: | |
- `CALLDATALOAD` opcode which reads 32 bytes from a desired offset onto the stack, [learn more](https://veridelisi.medium.com/learn-evm-opcodes-v-a59dc7cbf9c9). | |
- or, using `CALLDATACOPY` to copy a portion of calldata to memory. |
The previous comment discusses how calldata is initialized (tx or message call), so it suffices to discuss how to read from it.
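The two read paths named in the suggestion above can be sketched in Python as follows. The function names and the use of a `bytearray` for memory are assumptions made for illustration; the sketch models only the data movement (including zero-padding when a read runs past the end of calldata), not gas.

```python
def calldataload(calldata: bytes, offset: int) -> int:
    """Return the 32-byte word at `offset`, zero-padded past the end of calldata."""
    chunk = calldata[offset:offset + 32]
    return int.from_bytes(chunk.ljust(32, b"\x00"), "big")


def calldatacopy(memory: bytearray, calldata: bytes,
                 dest_offset: int, offset: int, size: int) -> None:
    """Copy `size` bytes of calldata into memory; calldata itself is never modified."""
    if size == 0:
        return  # a zero-length copy touches no memory
    end = dest_offset + size
    if end > len(memory):
        # Grow memory to a whole number of 32-byte words.
        padded = -(-end // 32) * 32
        memory.extend(bytes(padded - len(memory)))
    memory[dest_offset:end] = calldata[offset:offset + size].ljust(size, b"\x00")
```

For example, with 36 bytes of input, `calldataload(data, 4)` returns the word that starts right after the first four bytes, while `calldataload(data, 36)` returns 0 because the read lies entirely past the end.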
@raxhvl, for sure! reworded as you suggest above
docs/wiki/EL/evm.md
Outdated
## Storage | ||
|
||
Storage is designed as a **word-addressed word array**. Unlike memory, storage is associated with an Ethereum account and is **persisted** across transactions as part of the world state. | ||
Storage is designed as a **word-addressed word array**. Unlike memory, storage is associated with an Ethereum account and is **persisted** across transactions as part of the world state. It can be thought of as the **database** associated with the smart contract, which is why it contains the contract's "state" variables. Storage size is fixed at 2^256 slots, 32 bytes each. |
Storage is designed as a **word-addressed word array**. Unlike memory, storage is associated with an Ethereum account and is **persisted** across transactions as part of the world state. It can be thought of as the **database** associated with the smart contract, which is why it contains the contract's "state" variables. Storage size is fixed at 2^256 slots, 32 bytes each. | |
Storage is designed as a **word-addressed word array**. Unlike memory, storage is associated with an Ethereum account and is **persisted** across transactions as part of the world state. It can be thought of as a key-value **database** associated with the smart contract, which is why it contains the contract's "state" variables. Storage size is fixed at 2^256 slots, 32 bytes each. |
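As a rough illustration of the key-value framing suggested here, the following Python sketch models one account's storage as a sparse mapping from 256-bit slot numbers to 256-bit words, where unset slots read as zero. The `Storage` class is hypothetical and ignores gas, access lists, and how the data is committed to the world state.

```python
from typing import Dict

WORD_MODULUS = 2**256  # slots and values are both 256-bit words


class Storage:
    """Toy model of a single account's storage (hypothetical helper)."""

    def __init__(self) -> None:
        self.slots: Dict[int, int] = {}

    def sstore(self, slot: int, value: int) -> None:
        slot %= WORD_MODULUS
        value %= WORD_MODULUS
        if value == 0:
            # Writing zero is equivalent to clearing the slot in this model.
            self.slots.pop(slot, None)
        else:
            self.slots[slot] = value

    def sload(self, slot: int) -> int:
        # Slots that were never written read as zero.
        return self.slots.get(slot % WORD_MODULUS, 0)
```

Only the written slots occupy space in the mapping, which is why a nominal size of 2^256 slots is workable.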
docs/wiki/EL/evm.md
Outdated
|
||
 | ||
|
||
Storage can only be accessed via the code of its associated account. External accounts don't have code and therefore cannot access their own storage. | ||
|
||
## Writing to storage | ||
|
||
`SSTORE` takes two values from the stack: a storage **slot** and a 32-byte **value**. It then writes the value to storage of the account. | ||
`SSTORE` takes two values from the stack: a storage **slot** and a 32-byte **value**. It then writes the value to storage of the account. Notice: Slots can be thought of as the basic unit of storage, so when writing to storage, we deal with slots as opposed to individual bytes. Depending upon the data type being stored, the value will take up a full slot or could be combined with other data in the same slot if the their combined length is less than 32 bytes long. More info on how different data types are packed into slots can be found [here](https://medium.com/coinmonks/solidity-storage-how-does-it-work-8354afde3eb). |
Depending upon the data type being stored, the value will take up a full slot or could be combined with other data in the same slot if the their combined length is less than 32 bytes long. More info on how different data types are packed into slots can be found here.
EVM writes exactly 32 bytes to storage. You could reframe this to say:
Writing to storage is expensive. High-level languages like Solidity optimize storage by packing multiple variables into a single 32-byte slot when their combined size is less than or equal to 32 bytes.
Again, this has more to do with Solidity than the EVM.
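To illustrate the point that packing happens above the EVM, here is a small Python sketch of how a compiler might combine several small values into one 32-byte word before a single `SSTORE`. The `pack`/`unpack` helpers and the choice to place the first field in the lowest-order bytes are assumptions for this example, not a statement of any particular compiler's layout.

```python
from typing import List, Tuple


def pack(fields: List[Tuple[int, int]]) -> int:
    """Pack (value, size_in_bytes) pairs into one 256-bit word, first field lowest."""
    if sum(size for _, size in fields) > 32:
        raise ValueError("fields do not fit in a single 32-byte slot")
    word = 0
    shift = 0
    for value, size in fields:
        if not 0 <= value < 1 << (8 * size):
            raise ValueError("value does not fit in its declared size")
        word |= value << shift
        shift += 8 * size
    return word


def unpack(word: int, sizes: List[int]) -> List[int]:
    """Recover the fields from a packed word given their byte sizes."""
    values = []
    for size in sizes:
        values.append(word & ((1 << (8 * size)) - 1))
        word >>= 8 * size
    return values
```

From the EVM's perspective the result is still one full 32-byte value written to one slot; the masking and shifting are ordinary instructions emitted by the compiler.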
- **DUPn**: Duplicate nth stack item to top. | ||
- **SWAPn**: Swap top with n+1th stack item. | ||
|
||
The max `n` in the **DUP** and **SWAP** opcodes is 16, As mentioned, the EVM only allows single byte opcodes. In the case of **DUP** and **SWAP**, the byte is split into two 4 bit values called nibbles. The lower nibble specifies the value on the stack to swap or duplicate and is limited to 16 due to only being 4 bits in length, hence a max of **DUP16** and **SWAP16**. |
The max `n` in the DUP and SWAP opcodes is 16, As mentioned, the EVM only allows single byte opcodes. In the case of DUP and SWAP, the byte is split into two 4 bit values called nibbles. The lower nibble specifies the value on the stack to swap or duplicate and is limited to 16 due to only being 4 bits in length, hence a max of DUP16 and SWAP16.
I'm not sure if the `DUP/SWAP` opcodes are bounded by this design. I always thought of them as simple opcodes assigned to arbitrary values like `PUSH*`.
The nybble split is one way to frame it, but not essential. Both framings are valid. Many clients drop DUP/SWAP/PUSH into one section of code and they take the last 4/5 bits to determine the size of the swap/dup/push. But it may be easier in ZK to treat them all as unique.
@shemnon, I see what you're saying. The lower nibble utilization is more of a client specific implementation. How about rewording this section to something like this to keep it simple and evm specific:
"The maximum n for DUP and SWAP is 16, corresponding to opcodes 0x80–0x8f and 0x90–0x9f respectively. These opcodes are explicitly defined in the EVM and form a fixed set — making the 16-item limit an EVM-level constraint."
cc @raxhvl
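The opcode ranges in the proposed wording can be checked with a short Python sketch. The helper names are hypothetical, and the stack is modelled as a Python list whose last element is the top; deriving `n` by subtracting the range base is one of the two framings discussed in this thread, not a requirement of the EVM.

```python
from typing import List, Optional


def decode_dup_swap(opcode: int) -> Optional[str]:
    """Map a byte to its DUPn/SWAPn mnemonic, or None for any other opcode."""
    if 0x80 <= opcode <= 0x8F:
        return f"DUP{opcode - 0x80 + 1}"
    if 0x90 <= opcode <= 0x9F:
        return f"SWAP{opcode - 0x90 + 1}"
    return None


def dup(stack: List[int], n: int) -> None:
    # DUPn: copy the nth item from the top onto the top of the stack.
    stack.append(stack[-n])


def swap(stack: List[int], n: int) -> None:
    # SWAPn: exchange the top item with the (n+1)th item from the top.
    stack[-1], stack[-(n + 1)] = stack[-(n + 1)], stack[-1]
```

Each range spans exactly 16 byte values, so `DUP16` (0x8f) and `SWAP16` (0x9f) are the largest members of their sets.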
@shane-moore That sounds great!
Your updates are even better! This looks great to me barring a small comment. Let me check with some folks.
Thanks for the update! The current version looks great
Merged commit 27da55b into eth-protocol-fellows:main
Changes
`docs/wiki/EL/evm.md` updated with:
- `Calldata` section, since it's another storage mechanism utilized by the EVM