[v2] Compute linearised members of all contracts in a new semantic pass #1805

Open
ggiraldez wants to merge 5 commits into main from ggiraldez/v2-cache-linearisations

@ggiraldez ggiraldez commented May 28, 2026

Contributor

This PR adds a new semantic pass p5_compute_linearisations to compute linearised collections of all contract members: functions, state variables, errors and events. This will be the ideal place to perform various validations: check for redefinition of identifiers, check virtual and override attributes, etc.

As a by-product, the information is collected and saved in the SemanticContext for later access from the AST API.

This PR adds some new TODO(validation) comments that will be addressed in a later PR.

⚠️ Breaks API so will need a migration PR for solx.

@ggiraldez ggiraldez requested review from OmarTawfik and teofr May 28, 2026 22:51
@ggiraldez ggiraldez requested review from a team as code owners May 28, 2026 22:51
@changeset-bot

changeset-bot Bot commented May 28, 2026

⚠️ No Changeset found

Latest commit: a798fad

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@teofr teofr left a comment

Contributor

Early pass with some comments. Should we add the ci:perf label?

Comment thread crates/solidity-v2/outputs/cargo/slang_solidity/src/compilation/unit.rs Outdated
Comment thread crates/solidity-v2/outputs/cargo/semantic/src/context/mod.rs Outdated
Comment thread crates/solidity-v2/outputs/cargo/semantic/src/context/contract_data_cache.rs Outdated
Comment on lines +179 to +185
/// Walks the linearised bases in reverse (most-base first) and concatenates
/// every contract's state-variable members in source order. Interfaces don't
/// contribute state variables in Solidity.
fn collect_linearised_state_variables(
    binder: &Binder,
    contract_id: NodeId,
) -> Vec<ir::StateVariableDefinition> {

Contributor

We'll probably need to validate earlier whether state variables have the same name, but a comment or maybe even a debug_assert could help.

Contributor Author

I don't think we can make any guarantees at this point. Do you propose we skip the computation and return an empty vector if there are duplicates? This is a similar situation to #1806 (comment). Since we don't provide any guarantees when the user input is not valid through the type system, I don't see what else we should do here.

Maybe we should discuss blocking the AST if there are any diagnostics emitted?

Contributor

I meant more of a note on the behaviour for repeated state variables; right now they're repeated.
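To make the behaviour under discussion concrete, here is a minimal sketch of the walk described in the doc comment above. The `Definition` type and the `duplicated_names` helper are hypothetical stand-ins, not Slang's actual IR or API; the only things taken from the thread are the reverse walk (most-base first), the source-order concatenation, and the fact that interfaces contribute nothing.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a contract or interface definition.
/// This is not the real `ir` type from the PR.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Definition {
    name: &'static str,
    is_interface: bool,
    state_variables: Vec<&'static str>,
}

/// `linearised` is assumed to be ordered most-derived first, so it is
/// walked in reverse to visit the most-base definition first.
fn collect_linearised_state_variables(linearised: &[Definition]) -> Vec<&'static str> {
    let mut result = Vec::new();
    for definition in linearised.iter().rev() {
        if definition.is_interface {
            continue; // interfaces don't contribute state variables
        }
        // Members are appended in source order, with no de-duplication.
        result.extend(definition.state_variables.iter().copied());
    }
    result
}

/// Hypothetical helper: names that occur more than once, each reported once.
fn duplicated_names(names: &[&'static str]) -> Vec<&'static str> {
    let mut seen = HashSet::new();
    let mut reported = HashSet::new();
    let mut duplicates = Vec::new();
    for &name in names {
        if !seen.insert(name) && reported.insert(name) {
            duplicates.push(name);
        }
    }
    duplicates
}
```

With a chain such as `Derived { owner, total }` over `Base { owner }`, the collected vector contains `owner` twice, which is the repetition the comment refers to; a helper like `duplicated_names` is one place a `debug_assert!` or a diagnostic could hook in.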

Contributor Author

A lot of validations (like this one) could happen while constructing the cache, and otherwise don't have a clear place in the code right now. I'm thinking maybe it makes sense to formalize this as a pass p5_compute_linearisations. We generate ContractDataCache as a by-product of executing the pass, but we also perform all validations related to inheritance, overriding, etc.

@ggiraldez ggiraldez Jun 8, 2026

Contributor Author

> A lot of validations (like this one) could happen while constructing the cache, and otherwise don't have a clear place in the code right now. I'm thinking maybe it makes sense to formalize this as a pass p5_compute_linearisations. We generate ContractDataCache as a by-product of executing the pass, but we also perform all validations related to inheritance, overriding, etc.

I went ahead and refactored the code into a new semantic pass. I also added some new TODO(validation) comments that I'll start addressing in separate PRs, but I think this is the ideal place to run those validations.
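A rough sketch of the shape such a pass could take, where the cache falls out as a by-product of validation. Everything here is an assumption made for illustration: the `NodeId` alias, the fields of `ContractLinearisations`, the input map, and the string diagnostics do not reflect the real signatures in the PR.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical alias; the real `NodeId` type is not shown in this thread.
type NodeId = usize;

/// Hypothetical per-contract cache entry (state variables only, for brevity).
#[derive(Debug, Default)]
struct ContractLinearisations {
    state_variables: Vec<String>,
}

/// Hypothetical pass entry point. `contracts` maps each contract to the
/// state-variable names of its bases, grouped per base, most-base first.
fn run(
    contracts: &BTreeMap<NodeId, Vec<Vec<String>>>,
    diagnostics: &mut Vec<String>,
) -> BTreeMap<NodeId, ContractLinearisations> {
    let mut cache = BTreeMap::new();
    for (&contract_id, bases) in contracts {
        let mut seen = HashSet::new();
        let mut state_variables = Vec::new();
        for name in bases.iter().flatten() {
            // Validation happens while the linearised vector is being built.
            if !seen.insert(name.as_str()) {
                diagnostics.push(format!(
                    "contract {contract_id}: `{name}` is already declared"
                ));
            }
            // Invalid input still yields an entry; nothing is skipped.
            state_variables.push(name.clone());
        }
        cache.insert(contract_id, ContractLinearisations { state_variables });
    }
    cache
}
```

The point of the design is that a duplicate is reported once, during the pass, rather than each time a consumer asks for the linearised members.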

Comment on lines +15 to +17
linearised_state_variables: Vec<ir::StateVariableDefinition>,
linearised_errors: Vec<ir::ErrorDefinition>,
linearised_events: Vec<ir::EventDefinition>,

Contributor

I have my doubts about whether these (state variables, errors, and events) should be cached. Generating the data is linear, so we could easily return an iterator over the bases' members instead.
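For reference, the on-demand alternative could look roughly like this. The `Contract` type and both function names are hypothetical; the sketch only assumes that the bases are available as a slice ordered most-derived first.

```rust
/// Hypothetical stand-in for a contract with its own (non-inherited) members.
struct Contract {
    errors: Vec<&'static str>,
    events: Vec<&'static str>,
}

/// Lazily yields errors across all bases, most-base first, without
/// allocating a cached vector. `bases` is assumed most-derived first.
fn linearised_errors<'a>(bases: &'a [Contract]) -> impl Iterator<Item = &'static str> + 'a {
    bases
        .iter()
        .rev()
        .flat_map(|contract| contract.errors.iter().copied())
}

/// Same walk for events.
fn linearised_events<'a>(bases: &'a [Contract]) -> impl Iterator<Item = &'static str> + 'a {
    bases
        .iter()
        .rev()
        .flat_map(|contract| contract.events.iter().copied())
}
```

Each call costs time proportional to the total number of members in the chain, so the trade-off against caching depends on how often consumers ask for these collections.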

Contributor Author

Fair point.

Contributor

Any thoughts on this?

Contributor Author

I hadn't tackled this comment yet, but after thinking about duplicate state variables, I realised we will have the same issue here. Checking for duplicate declarations in an inheritance tree needs to happen for both errors and events, and not just when the user decides to get the linearisations.

Contributor

Sorry, I thought it was ready for a re-review before.

I guess these are two separate questions. The validation pass should be done; I agree with that. But is it worth caching these values, or could they be calculated on demand (without performing a second validation)?

The fact that this PR barely moved the needle on memory usage in the benchmarks makes me think there's actually not that much at stake here (i.e. chains are very short), but maybe I'm missing something about the expensive calculations (i.e. function linearisation).

Another question to ask is: how many times will users read these linearised vectors?

@ggiraldez ggiraldez Jun 9, 2026

Contributor Author

The vectors should be very small in comparison to the IR, that's probably why it doesn't move the needle in the perf benchmarks.

As to how many times they will be used, I don't know for sure, but for solx at least once for functions, to codegen each function. For state variables I don't think they would need it directly, but we use that for computing the storage layout, which they do consume. Errors and events are probably not needed. Then again, the memory usage for caching them should be negligible.

@ggiraldez ggiraldez force-pushed the ggiraldez/v2-cache-linearisations branch from 4bd5b30 to 39dd3d5 Compare June 5, 2026 21:51
@github-actions

github-actions Bot commented Jun 5, 2026

Contributor

🐰 Bencher Report

Branch: ggiraldez/v2-cache-linearisations
Testbed: ci

⚠️ WARNING: Truncated view!

The full continuous benchmarking report exceeds the maximum length allowed on this platform.

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated.

🚨 5 Alerts

🐰 View full continuous benchmarking report in Bencher

@teofr teofr left a comment

Contributor

LGTM, the only concern is whether we need the cache for the linear-time data as well.

@ggiraldez ggiraldez force-pushed the ggiraldez/v2-cache-linearisations branch from 1d8cf1c to 4a41621 Compare June 8, 2026 21:54
@ggiraldez ggiraldez changed the title from "[v2] Cache linearised collection of contract's functions, variables, errors and events" to "[v2] Compute linearised members of all contracts in a new semantic pass" Jun 8, 2026
@ggiraldez ggiraldez requested a review from teofr June 8, 2026 22:00

@teofr teofr left a comment

Contributor

It looks good, I really like the new pass, and you're right, it's a natural place for a lot of validations.

@@ -75,13 +78,15 @@ impl SemanticContext {
p2_linearise_contracts::run(files, &mut binder, diagnostics);
p3_type_definitions::run(files, &mut binder, &mut types, language_version);
p4_resolve_references::run(files, &mut binder, &mut types, language_version);

Contributor

p4 uses linearisations to resolve references, have you considered sharing some of the caching to improve performance on it?

Contributor

I guess the more general question is, does p5 depend on p4?

Contributor Author

> p4 uses linearisations to resolve references, have you considered sharing some of the caching to improve performance on it?

p4 needs the linearisation of contracts, but the lookups happen in the scopes. We might be able to refactor the code to use cached linearisations, but that's probably a bigger change.

> I guess the more general question is, does p5 depend on p4?

I don't think it should, because p4_resolve_references resolves identifiers in expressions and statements. There are identifiers in type definitions too, but those are resolved in p3_type_definitions. So the input for p5_compute_linearisations is complete by the end of p3.

I'll verify that assertion and reorder the passes.

Contributor Author

Indeed, linearisations can be computed right after p3, and the result is exactly the same as computing them at the end. So, for clarity, I reordered the passes: p4_compute_linearisations now runs first, followed by p5_resolve_references.

In the future, it may be possible to use the cached linearisations for resolution as well.

    types: &TypeRegistry,
    contract_id: NodeId,
) -> ContractLinearisations {
    let functions = compute_linearised_functions(binder, types, contract_id);

Contributor

nit: should they follow the same naming?

Suggested change
- let functions = compute_linearised_functions(binder, types, contract_id);
+ let functions = collect_linearised_functions(binder, types, contract_id);


/// Cache of derived data about contracts stored on the `SemanticContext`. Every
/// contract's `NodeId` has an entry in `data`.
pub(crate) struct ContractData {

Contributor

Doesn't need to be in this PR, but it'd be interesting to have a benchmark tracking how these values are used. For example, for some big benchmarks, iterate over all linearised definitions for all contracts and use them trivially (e.g. compute the hash of their selectors).
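One possible shape for the body of such a benchmark, as a sketch only. The input is a hypothetical slice of per-contract signature lists rather than the real AST API, and `DefaultHasher` from the standard library stands in for the actual selector hash (Solidity selectors are derived from Keccak-256, which `std` does not provide).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::hint::black_box;

/// Stand-in for a selector computation; NOT a real Solidity selector.
fn pseudo_selector(signature: &str) -> u32 {
    let mut hasher = DefaultHasher::new();
    signature.hash(&mut hasher);
    // Truncate to 32 bits to mimic the width of a 4-byte selector.
    hasher.finish() as u32
}

/// Visits every linearised definition of every contract and uses it
/// trivially. Returns the number of definitions visited.
fn touch_all_linearised(contracts: &[Vec<&str>]) -> usize {
    let mut visited = 0;
    for linearised in contracts {
        for signature in linearised {
            // `black_box` discourages the optimiser from eliding the work.
            black_box(pseudo_selector(signature));
            visited += 1;
        }
    }
    visited
}
```

A benchmark harness would call `touch_all_linearised` on the definitions collected from a large project, so that the cost of reading the linearised members shows up separately from the cost of computing them.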

Compute linearisations before resolving references
2 participants