[v2] Compute linearised members of all contracts in a new semantic pass#1805
[v2] Compute linearised members of all contracts in a new semantic pass#1805ggiraldez wants to merge 5 commits into
Conversation
|
teofr
left a comment
There was a problem hiding this comment.
Early pass and comments, should we add the ci:perf label?
| /// Walks the linearised bases in reverse (most-base first) and concatenates | ||
| /// every contract's state-variable members in source order. Interfaces don't | ||
| /// contribute state variables in Solidity. | ||
| fn collect_linearised_state_variables( | ||
| binder: &Binder, | ||
| contract_id: NodeId, | ||
| ) -> Vec<ir::StateVariableDefinition> { |
There was a problem hiding this comment.
We'll probably need to validate earlier whether state variables have the same name, but a comment or maybe even a debug_assert could help.
There was a problem hiding this comment.
I don't think we can make any guarantees at this point. Do you propose we skip the computation and return an empty vector if there are duplicates? This is a similar situation to #1806 (comment). Since we don't provide any guarantees when the user input is not valid through the type system, I don't see what else we should do here.
Maybe we should discuss blocking the AST if there are any diagnostics emitted?
There was a problem hiding this comment.
I meant more a note on behaviour over repeated state variables, right now they're repeated.
There was a problem hiding this comment.
A lot of validations (like this one) could happen while constructing the cache, and otherwise don't have a clear place in the code right now. I'm thinking maybe it makes sense to formalize this as a pass p5_compute_linearisations. We generate ContractDataCache as a by-product of executing the pass, but we also perform all validations related to inheritance, overriding, etc.
There was a problem hiding this comment.
A lot of validations (like this one) could happen while constructing the cache, and otherwise don't have a clear place in the code right now. I'm thinking maybe it makes sense to formalize this as a pass
p5_compute_linearisations. We generateContractDataCacheas a by-product of executing the pass, but we also perform all validations related to inheritance, overriding, etc.
I went ahead and refactored the code into a new semantic pass. I also added some new TODO(validation) comments that I'll start addressing in separate PRs, but I think this is the ideal place to run those validations.
| linearised_state_variables: Vec<ir::StateVariableDefinition>, | ||
| linearised_errors: Vec<ir::ErrorDefinition>, | ||
| linearised_events: Vec<ir::EventDefinition>, |
There was a problem hiding this comment.
I have my doubts on whether these ones should be cached (state variables, errors, and events), generating the data is linear, so we could easily return an iterator over the bases's members instead.
There was a problem hiding this comment.
I hadn't tackled this comment yet, but after thinking about duplicates of state variables we will have the same issue here. Checking for duplicate declarations in an inheritance tree needs to happen both for errors and events, and not just if the user decides to get the linearisations.
There was a problem hiding this comment.
Sorry, I thought it was ready for a re-review before.
I guess it's two separate questions, the validation pass should be done, I agree with that. But is it worth it to cache these values? Or could they be calculated on demand (without performing a second validation).
The fact that this PR barely moved the needle on used memory on the benchmarks makes me think there's actually not that much at stake here (ie chains are very short), but maybe I'm missing something on the expensive calculations (ie function linearisation).
Another question to be asked is, how many times will user use these linearised vectors.
There was a problem hiding this comment.
The vectors should be very small in comparison to the IR, that's probably why it doesn't move the needle in the perf benchmarks.
As to how many times they will be used, I don't know for sure, but for solx at least once for functions, to codegen each function. For state variables I don't think they would need it directly, but we use that for computing the storage layout, which they do consume. Errors and events are probably not needed. Then again, the memory usage for caching them should be negligible.
4bd5b30 to
39dd3d5
Compare
|
| Branch | ggiraldez/v2-cache-linearisations |
| Testbed | ci |
⚠️ WARNING: Truncated view!The full continuous benchmarking report exceeds the maximum length allowed on this platform.
⚠️ WARNING: No Threshold found!Without a Threshold, no Alerts will ever be generated.
🚨 5 Alerts
🐰 View full continuous benchmarking report in Bencher
teofr
left a comment
There was a problem hiding this comment.
LGTM, the only concern is whether we need the cache for the linear time data as well.
| linearised_state_variables: Vec<ir::StateVariableDefinition>, | ||
| linearised_errors: Vec<ir::ErrorDefinition>, | ||
| linearised_events: Vec<ir::EventDefinition>, |
Add validation TODO comments there, to be implemented later. Rename data structures to better reflect their new role.
1d8cf1c to
4a41621
Compare
teofr
left a comment
There was a problem hiding this comment.
It looks good, I really like the new pass, and you're right, it's a natural place for a lot of validations.
| @@ -75,13 +78,15 @@ impl SemanticContext { | |||
| p2_linearise_contracts::run(files, &mut binder, diagnostics); | |||
| p3_type_definitions::run(files, &mut binder, &mut types, language_version); | |||
| p4_resolve_references::run(files, &mut binder, &mut types, language_version); | |||
There was a problem hiding this comment.
p4 uses linearisations to resolve references, have you considered sharing some of the caching to improve performance on it?
There was a problem hiding this comment.
I guess the more general question is, does p5 depend on p4?
There was a problem hiding this comment.
p4 uses linearisations to resolve references, have you considered sharing some of the caching to improve performance on it?
p4 needs the linearisation of contracts, but the lookups happen in the scopes. We might be able to refactor the code to use cached linearisations, but that's probably a bigger change.
I guess the more general question is, does p5 depend on p4?
I don't think it should, because p4_resolve_references resolves expressions/statements identifiers. There are identifiers in type definitions, but those are resolved in p3_type_definitions. So, the input for p5_compute_linearisations is complete by the end of p3.
I'll verify that assertion and reorder the passes.
There was a problem hiding this comment.
Indeed, linearisations can be computed right after p3 and the result is the exact same as computing them at the end. So, for clarity I reordered the passes and put p4_compute_linearisations and then p5_resolve_references.
In the future, it may be possible to use the cached linearisations for resolution as well.
| types: &TypeRegistry, | ||
| contract_id: NodeId, | ||
| ) -> ContractLinearisations { | ||
| let functions = compute_linearised_functions(binder, types, contract_id); |
There was a problem hiding this comment.
nit: should they follow the same naming?
| let functions = compute_linearised_functions(binder, types, contract_id); | |
| let functions = collect_linearised_functions(binder, types, contract_id); |
|
|
||
| /// Cache of derived data about contracts stored on the `SemanticContext`. Every | ||
| /// contract's `NodeId` has an entry in `data`. | ||
| pub(crate) struct ContractData { |
There was a problem hiding this comment.
Doesn't need to be on this PR, but it'd be interested to have a benchmark tracking how these values are used. For example, for some big benchmarks, iterate all linearised definitions for all contracts and use them trivially (compute the hash of their selectors)
Compute linearisations before resolving references
This PR adds a new semantic pass
p5_compute_linearisationsto compute linearised collections of all contract members: functions, state variables, errors and events. This will be the ideal place to perform various validations: check for redefinition of identifiers, checkvirtualandoverrideattributes, etc.As a by-product, the information is collected and saved in the
SemanticContextfor later access from the AST API.This PR adds some new
TODO(validation)comments that will be addressed in a later PR.solx.