Description
The IR code generation in `libsolidity/experimental/codegen` of #14510 is still quite incomplete. This issue describes the next step in extending it.
Currently, code generation assumes that every type fits into exactly one stack slot, so all types can be treated the same. In reality, `unit` types don't require any stack slot, `void` types cannot have a stack representation at all, and `pair` types (and tuple types in general) may require multiple stack slots (eventually the same will hold for `sum` types).
In the long term, it would be nice to define the stack representation of a type in-language by instantiating special type classes, with code generation using compile-time expression evaluation to determine the stack size of a type. As a first step, however, we want to hard-code this.
This means that we need to associate primitive types with stack sizes (since the primitive types are changing, this work should be based on #14566): 0 for `unit`, 1 for `word` (and for function types, even though we won't properly handle them for now), 1 for `bool`, and none for `void` and `integer`. For `pair` types, we sum the sizes of their type arguments. For user-defined types, we take the stack size of the underlying type.
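The size rules above can be sketched as a small recursive function. This is only an illustration over a hypothetical, simplified type model (`Kind`, `Type`, `makeType` and `stackSize` are made-up names, not the actual experimental type system API), and it assumes the type is already fully monomorphic. It also assumes that a `pair` containing a type without a stack representation has no stack representation itself, which the rules above do not state.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, simplified type model; NOT the real experimental TypeSystem API.
enum class Kind { Unit, Word, Bool, Function, Void, Integer, Pair, UserDefined };

struct Type
{
    Kind kind;
    // Pair: the two type arguments. UserDefined: exactly one entry, the underlying type.
    std::vector<std::shared_ptr<Type const>> arguments;
};

// Small helper for building types in this sketch.
std::shared_ptr<Type const> makeType(Kind _kind, std::vector<std::shared_ptr<Type const>> _arguments = {})
{
    return std::make_shared<Type>(Type{_kind, std::move(_arguments)});
}

// Returns the number of stack slots of a fully monomorphic type,
// or std::nullopt if the type has no stack representation.
std::optional<std::size_t> stackSize(Type const& _type)
{
    switch (_type.kind)
    {
    case Kind::Unit:
        return 0;
    case Kind::Word:
    case Kind::Bool:
    case Kind::Function:
        return 1;
    case Kind::Void:
    case Kind::Integer:
        return std::nullopt;
    case Kind::Pair:
    {
        assert(_type.arguments.size() == 2);
        std::size_t total = 0;
        for (auto const& argument: _type.arguments)
        {
            // Assumption: a component without a representation poisons the whole pair.
            std::optional<std::size_t> size = stackSize(*argument);
            if (!size)
                return std::nullopt;
            total += *size;
        }
        return total;
    }
    case Kind::UserDefined:
        assert(_type.arguments.size() == 1);
        return stackSize(*_type.arguments.front());
    }
    return std::nullopt;
}
```

For example, `pair(word, pair(word, unit))` would get size 2 under these rules, and a user-defined type wrapping it would get size 2 as well.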
This sounds easy in theory, but will probably take a bit of doing:
Note that we can only tell the stack size of fully monomorphic types, and we only monomorphize during code generation, so all of this needs to happen during code generation. Code generation already involves monomorphization: for example, `IRGenerator::generate(FunctionDefinition const& _function, Type _type)` already receives a concrete type and stores the correct type environment in the context. Relative to that type environment, we will always get fully monomorphic types whose stack size we can determine. Moving from user-defined types to their underlying types may still involve local type environments and unification to construct the correct argument types for the underlying type.
On the codegen side, we will need a mechanism similar to https://github.com/ethereum/solidity/blob/develop/libsolidity/codegen/ir/IRVariable.h, i.e. instead of generating code for expressions directly as single Yul variables, we'll want to abstract them into `IRVariable`s that may span multiple stack slots. However, we won't need any complex notion of conversions on `IRVariable`s, since non-trivial conversions (i.e. conversions other than `abs` and `rep` for user-defined types, which will be no-ops for code generation) will be defined in-language, so code generation itself won't need to deal with them. We will still need a simpler equivalent of `IRGeneratorForStatements::declare` and `IRGeneratorForStatements::assign` from https://github.com/ethereum/solidity/blob/develop/libsolidity/codegen/ir/IRGeneratorForStatements.cpp, without the conversion logic. The main complication will be turning assignments of multiple variables into multiple assignments, since Yul only allows multi-variable assignments from a function call; if need be, we can use identity functions that will be inlined. That is, `let x, y := z, w` is invalid in Yul, so we need to either split it into `let x := z let y := w` or turn it into `let x, y := identity_2(z, w)` with `function identity_2(a, b) -> r, s { r := a s := b }`.
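The two options for multi-variable declarations can be sketched as plain string generators. All function names here (`joinNames`, `declareSplit`, `declareViaIdentity`, `identityFunctionDefinition`) are hypothetical and only illustrate the shape of the emitted Yul; they are not part of the existing code generator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins names with ", ".
std::string joinNames(std::vector<std::string> const& _names)
{
    std::string result;
    for (std::size_t i = 0; i < _names.size(); ++i)
        result += (i == 0 ? "" : ", ") + _names[i];
    return result;
}

// Option 1: split a multi-variable declaration into one declaration per slot.
std::string declareSplit(std::vector<std::string> const& _targets, std::vector<std::string> const& _values)
{
    assert(_targets.size() == _values.size());
    std::string code;
    for (std::size_t i = 0; i < _targets.size(); ++i)
        code += "let " + _targets[i] + " := " + _values[i] + "\n";
    return code;
}

// Option 2: definition of an identity helper for the given number of slots.
std::string identityFunctionDefinition(std::size_t _slots)
{
    assert(_slots > 0);
    std::vector<std::string> parameters;
    std::vector<std::string> returns;
    std::string body;
    for (std::size_t i = 0; i < _slots; ++i)
    {
        parameters.push_back("a_" + std::to_string(i));
        returns.push_back("r_" + std::to_string(i));
        body += returns.back() + " := " + parameters.back() + " ";
    }
    return
        "function identity_" + std::to_string(_slots) + "(" + joinNames(parameters) + ") -> " +
        joinNames(returns) + " { " + body + "}";
}

// Option 2: route the values through the identity helper, keeping a single declaration.
std::string declareViaIdentity(std::vector<std::string> const& _targets, std::vector<std::string> const& _values)
{
    assert(!_targets.empty() && _targets.size() == _values.size());
    return
        "let " + joinNames(_targets) + " := identity_" + std::to_string(_targets.size()) +
        "(" + joinNames(_values) + ")\n";
}
```

Splitting is safe for declarations because the declared variables are fresh. For plain assignments, a split is only equivalent if no right-hand side reads a variable that an earlier split assignment has already overwritten (e.g. a swap), which is one case where the identity-function variant may be the simpler choice.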
So the main thing to do is to replace instances of declarations like
m_code << "let " << IRNames::localVariable(_identifier) << ...;
with declarations of `IRVariable`s of the proper type (which will resolve to a multi-variable declaration at the Yul level), and similarly for references to, and assignments to, expressions of a given type.
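A minimal sketch of what such an abstraction could look like follows. `StackVariable`, `declare` and `assign` are hypothetical, heavily simplified stand-ins (no types, no conversions, and a made-up slot naming scheme), not the existing `IRVariable` or `IRGeneratorForStatements` interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IRVariable: a base name plus a number of stack slots.
class StackVariable
{
public:
    StackVariable(std::string _baseName, std::size_t _stackSize):
        m_baseName(std::move(_baseName)), m_stackSize(_stackSize) {}

    std::size_t stackSize() const { return m_stackSize; }

    // One Yul variable name per stack slot. A single-slot variable keeps its base name.
    std::vector<std::string> slots() const
    {
        if (m_stackSize == 1)
            return {m_baseName};
        std::vector<std::string> names;
        for (std::size_t i = 0; i < m_stackSize; ++i)
            names.push_back(m_baseName + "_" + std::to_string(i));
        return names;
    }

    std::string commaSeparatedList() const
    {
        std::string result;
        for (std::string const& slot: slots())
            result += (result.empty() ? "" : ", ") + slot;
        return result;
    }

private:
    std::string m_baseName;
    std::size_t m_stackSize;
};

// Declares all slots of a variable; zero-slot variables (e.g. unit) emit nothing.
std::string declare(StackVariable const& _variable)
{
    if (_variable.stackSize() == 0)
        return "";
    return "let " + _variable.commaSeparatedList() + "\n";
}

// Assigns slot by slot; without conversions both sides must have equal stack sizes.
std::string assign(StackVariable const& _lhs, StackVariable const& _rhs)
{
    assert(_lhs.stackSize() == _rhs.stackSize());
    std::vector<std::string> lhsSlots = _lhs.slots();
    std::vector<std::string> rhsSlots = _rhs.slots();
    std::string code;
    for (std::size_t i = 0; i < lhsSlots.size(); ++i)
        code += lhsSlots[i] + " := " + rhsSlots[i] + "\n";
    return code;
}
```

With this shape, a call site that currently streams `"let "` plus a single name into `m_code` would instead stream the result of `declare(...)`, and single-slot types would produce the same kind of output as before.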
`rep` and `abs` can remain no-ops (but should assert that the argument and return types have equal stack sizes).
After the above is done, in a subsequent step we need to build proper code generation for the `pair.first` and `pair.second` functions, and then build proper pattern-matching destructuring on both the parsing/inference side and the code generation side (i.e. `let (a, b) = (c, d);`, etc.). However, this will go hand in hand with generally defining proper type constructors and algebraic data types in-language, so it is out of scope for this issue. The first step will be abstracting `IRVariable`s and making sure things work for single-stack-slot types; once that works, we can experiment with `pair.first` and `pair.second` on tuples.
So to be clear, the first task here merely involves:
- Determine the stack sizes of types (primitive and user-defined).
- Build an `IRVariable` mechanism to seamlessly handle multi-variable declarations and multi-assignments in place of the current assumption that everything occupies one stack slot.