Compile execution constraints #3501
The code is pretty fleshed out at this point, but it doesn't work, because I'm trying to find the memory limbs from the unoptimized machine. But at this point, the multiplicity still depends on
I think what I want instead is to run the full optimization, except that the optimizer should never remove a memory bus interaction (i.e., skip [this code]). Note that at this point, the solver might have actually figured out concrete values for some of the memory limbs, or might have already determined that some memory limbs are equal. The current algorithm should still work in that case.
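A minimal sketch of that idea (all names here are illustrative stand-ins, not the real optimizer API): the pruning pass keeps every memory bus interaction unconditionally, and only drops interactions on other buses that the solver has shown to be redundant.

```rust
/// Illustrative bus kinds; not the real powdr types.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Bus {
    Memory,
    Other,
}

#[derive(Debug, PartialEq, Clone, Copy)]
struct Interaction {
    bus: Bus,
    /// Whether the optimizer has proven this interaction removable.
    redundant: bool,
}

/// Pruning pass that never removes a memory bus interaction,
/// even if the solver has shown it to be redundant.
fn prune(interactions: Vec<Interaction>) -> Vec<Interaction> {
    interactions
        .into_iter()
        .filter(|i| i.bus == Bus::Memory || !i.redundant)
        .collect()
}
```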
OK, things are working, but the effectiveness is significantly reduced. Analysis on block
@georgwiese 3 has a typo I guess?
I ran on reth:
Cherry-picked from #3501. While running experiments for optimistic precompiles (#3366), I ran into more memory allocation errors. This PR can be seen as a follow-up to #3517. The main idea is that we no longer materialize the partition for each block instance when detecting equivalence classes. Instead, the iterator is passed on, so that new partitions are summarized as soon as they arrive. See comments below for some nuance.
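The streaming idea can be sketched as follows (a simplified model, not the PR's actual types): represent each block instance's partition as a class label per cell, and fold the iterator with a "common refinement" step so that only the running summary is ever held in memory.

```rust
use std::collections::HashMap;

/// A partition of cells, given as a class label per cell.
type Partition = Vec<usize>;

/// Common refinement of two partitions: two cells stay equivalent
/// only if they are equivalent in both partitions.
fn refine(a: &Partition, b: &Partition) -> Partition {
    let mut labels = HashMap::new();
    a.iter()
        .zip(b)
        .map(|key| {
            let next = labels.len();
            *labels.entry(key).or_insert(next)
        })
        .collect()
}

/// Fold partitions as they arrive from the iterator, so no more than
/// one partition (plus the running summary) is ever materialized.
fn summarize(mut parts: impl Iterator<Item = Partition>) -> Option<Partition> {
    let first = parts.next()?;
    Some(parts.fold(first, |acc, p| refine(&acc, &p)))
}
```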
Cherry-picked from #3501. When collecting empirical constraints, we used to materialize the entire trace of a given PGO input. This turns out to be too memory-intensive. We already have a way to combine empirical constraints that were computed on different data sets, and we use it to combine empirical constraints from different PGO inputs. With this PR, we take a more granular approach: we keep at most 20 segments in memory (this is configurable), compute empirical constraints for every chunk of 20 segments, and combine across those chunks. The result should be the same, except for some nuance in the range constraints: range constraints are computed as the 1st and 99th percentile. When combining, we simply take the min of the minimums and the max of the maximums. So, for example, it could be that within one chunk, a PC is executed only once and has an extreme value. It would then widen the range, even though it might not have influenced the percentiles had they been computed globally.
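The range-constraint nuance can be illustrated with a hedged sketch (the struct and function names are made up for illustration, not the PR's API): each chunk computes a range from its own 1st/99th percentiles, and chunks are combined by taking the min of the minimums and the max of the maximums, which can only widen the result relative to globally computed percentiles.

```rust
/// An inclusive value range, e.g. derived from the 1st/99th percentile
/// of the values observed within one chunk of segments.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Range {
    min: i64,
    max: i64,
}

/// Combine ranges from two chunks: min of minimums, max of maximums.
/// An outlier seen in any chunk therefore widens the combined range,
/// even if a global percentile computation would have cut it off.
fn combine(a: Range, b: Range) -> Range {
    Range {
        min: a.min.min(b.min),
        max: a.max.max(b.max),
    }
}
```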
Some analysis on the results presented in the evaluation section of #3366. To copy the results, on 10 Ethereum blocks, we get:
Going from left to right in the effectiveness plot (guaranteed / restricted / optimistic), I looked into the columns that are in the optimistic but not restricted precompile:
This is largely consistent with the analysis above. Timestamp-related columns account for much of the performance drop. Another big factor is that we filter out heap memory accesses where we might not have to: for example, it could be that the value of a memory access is always 0 empirically, but we don't know the address at compile time. The address can, however, be derived from (runtime) register values.
Schaeff left a comment:
Partial review
autoprecompiles/src/execution/ast.rs (outdated):

```rust
pub enum LocalOptimisticLiteral<A> {
    Register(A),
    /// A limb of a register
    // TODO: The code below ignores the limb index; support it properly
```
```rust
// The optimizer might introduce new columns, but we'll discard them below.
// As a result, it is fine to clone here (and reuse column IDs across instructions).
column_allocator.clone(),
```
I don't like this but I don't see how to fix it without a major refactor.
Also, @chriseth said `ColumnAllocator` shouldn't be clonable the other day; I don't remember why specifically. Can't this just be borrowed here?
Yes, that was following another review of mine.
I'm not sure how to fix this in general, but maybe this at least reuses an existing API.
```rust
let empirical_constraints = empirical_constraints.filtered(
    |block_cell| {
        let algebraic_reference = algebraic_references
            .get_algebraic_reference(block_cell)
            .unwrap();
        optimistic_literals.contains_key(algebraic_reference)
    },
    <A::Instruction as PcStep>::pc_step(),
);

let empirical_constraints =
    ConstraintGenerator::<A>::new(empirical_constraints, algebraic_references, &block)
        .generate_constraints();

let execution_constraints =
    generate_execution_constraints(&empirical_constraints, &optimistic_literals);
```
We are doing the same thing in `filtered` and in `generate_execution_constraints`: the first time returning a filtered set, the second time panicking if something is not in the filtered set.
Also, `filtered` is used only here.
I don't have a particular suggestion, but it seems like we could simplify this, maybe by going straight from `BlockEmpiricalConstraints` to `(Vec<EqualityConstraint>, Vec<OptimisticConstraint>)` instead of `BlockEmpiricalConstraints -> Vec<EqualityConstraint> -> Vec<OptimisticConstraint>`.
I think I had it like this, with the `ConstraintGenerator` generating both the symbolic constraints and the execution constraints. With `BlockEmpiricalConstraints -> Vec<EqualityConstraint> -> Vec<OptimisticConstraint>`, things are more decoupled, which I like, especially since we don't care about execution constraints unless restricted precompiles are enabled.
In a context where we already use maps for cases where we only have data for some keys (pcs...) it seems better to also apply this to the range constraints.
```rust
})
// Map each limb reference to an optimistic literal
.flat_map(|(instruction_idx, concrete_address, limbs)| {
    // Borrow column allocator to avoid moving it into the closure
```
Or, as ChatGPT would say: "Nothing you’ve done is a hack — this is standard Rust ownership choreography."
```rust
// Generate constraints for optimistic precompiles.
let should_generate_execution_constraints =
    optimistic_precompile_config().restrict_optimistic_precompiles;
```
Started here #3567
```rust
const DEFAULT_EXECUTION_COUNT_THRESHOLD: u64 = 100;
const DEFAULT_MAX_SEGMENTS: usize = 20;
```
what kind of segments is this referring to?
Execution segments/shards; there is a description of the config field below.
```rust
const DEFAULT_EXECUTION_COUNT_THRESHOLD: u64 = 100;
```
count of what?
See description of the config field below
```rust
}

pub fn optimistic_precompile_config() -> OptimisticPrecompileConfig {
    let execution_count_threshold = std::env::var("POWDR_OP_EXECUTION_COUNT_THRESHOLD")
```
I think we should avoid env vars for this.
In principle I agree, but I think it's fine while the feature is highly experimental. It seems pointless to pollute the CLI and to have to change reth constantly for a feature (optimistic precompiles) that does not yet work end-to-end. For this parameter, I expect that we can set it automatically in the future, but this was the easiest way to run some quick experiments. Also, note that this is an env var before this PR, so I think we can fix this separately.
```rust
    .ok()
    .and_then(|s| s.parse().ok())
    .unwrap_or(DEFAULT_MAX_SEGMENTS);
let restricted_optimistic_precompiles =
```
I think especially for this one we should avoid an env var.
Why especially here? In the end, all optimistic precompiles should always be restricted; the unrestricted version is only for us to know how good we'd be without restrictions. One could argue that, for this reason, restricted optimistic APCs should be opt-out, not opt-in.
I meant especially here because the limits have sane default constants.
```rust
let instruction_idx = match bus_interaction.op() {
    MemoryOp::GetPrevious => instruction_idx,
    MemoryOp::SetNew => instruction_idx + 1,
```
why + 1?
When we fetch a value at a given instruction index, the semantics are that it is the value before the instruction is executed. So for a memory bus receive (GetPrevious), we want to fetch the value at the current instruction index, and for a memory bus send (SetNew), we want to match it against whatever value we fetch at the next instruction.
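In other words (a toy restatement of the semantics above, not the real code): the value observed "at index i" is the value before instruction i executes, so a send produces the value that is observable before the next instruction.

```rust
/// Memory bus operations, mirroring the match in the snippet above.
enum MemoryOp {
    /// A receive: reads the value as it was before this instruction.
    GetPrevious,
    /// A send: writes the value observed before the *next* instruction.
    SetNew,
}

/// Instruction index at which the value carried by the bus
/// interaction is observable.
fn observed_at(op: &MemoryOp, instruction_idx: usize) -> usize {
    match op {
        MemoryOp::GetPrevious => instruction_idx,
        MemoryOp::SetNew => instruction_idx + 1,
    }
}
```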
```rust
    &vm_config.bus_map,
);

symbolic_machines
```
I find this entire sequence kind of hard to read and visually polluted.
There's an iterator chain with a nested iterator chain and multiple nested `flat_map`s with several closures, and the whole thing is 100+ lines long. I find it hard to parse what's inside what and what's being consumed by what.
If I go line by line I can understand what's happening, but I personally wouldn't prioritize iterator-chain maximalism over human readability.
To be more accurate, I think my issue is rather with closure maximalism.
If this block was

```rust
symbolic_machines
    .into_iter()
    .enumerate()
    .flat_map(f)
    .flat_map(g)
    .flat_map(h)
```

with more readable function descriptions, I think I would personally find it a lot more readable.
Like this? 729e997
looks good!
Reuses an existing API instead of cloning.
Builds on #3562
This PR generates the "optimistic constraints" (which I'd prefer to call "execution constraints") introduced in #3491 for optimistic precompiles. They are currently ignored; actually passing them to the execution engine is left for another PR.
At a high level, this is what happens:
- `optimistic_literals()` computes a map `AlgebraicReference -> OptimisticLiteral`. It works by finding memory accesses with compile-time addresses (essentially register accesses). The columns representing the data in the memory bus interaction correspond to limbs of register values at some point in time and can therefore be mapped to an execution literal.
- `BlockEmpiricalConstraints::filtered` is used to remove any constraints on columns that cannot be mapped to execution literals. As a result, all empirical constraints can be checked at execution time, but the resulting optimistic precompiles are less effective.
- `ConstraintGenerator::generate_constraints` turns empirical constraints into equality constraints, i.e., constraints of the form `(number|algebraic_reference) = (number|algebraic_reference)`. These constraints can be converted to `SymbolicConstraint` (to be added to the solver) and to execution constraints via `generate_execution_constraints` (using the map computed in step 1).

To test:
```
POWDR_RESTRICTED_OPTIMISTIC_PRECOMPILES=1 cargo run --bin powdr_openvm -r prove guest-keccak --input 100 --autoprecompiles 1 --apc-candidates-dir keccak100 --mock --optimistic-precompiles
```

Also see the evaluation on reth that I posted in #3366.
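The filter-then-translate flow from steps 2 and 3 above can be sketched with simplified stand-in types (`ColumnId` and `String` here stand in for `AlgebraicReference` and `OptimisticLiteral`; the real signatures differ):

```rust
use std::collections::HashMap;

// Illustrative stand-in for a column identifier; the real
// definitions live in the PR.
type ColumnId = usize;

/// Step 2: keep only constraints whose columns all map to execution
/// literals, so everything that survives is checkable at runtime.
fn filter_constraints(
    constraints: Vec<(ColumnId, ColumnId)>,
    literals: &HashMap<ColumnId, String>,
) -> Vec<(ColumnId, ColumnId)> {
    constraints
        .into_iter()
        .filter(|(a, b)| literals.contains_key(a) && literals.contains_key(b))
        .collect()
}

/// Step 3: translate equality constraints on columns into execution
/// constraints on literals. Indexing cannot panic here because the
/// filtering step already removed unmapped columns.
fn to_execution_constraints(
    constraints: &[(ColumnId, ColumnId)],
    literals: &HashMap<ColumnId, String>,
) -> Vec<(String, String)> {
    constraints
        .iter()
        .map(|(a, b)| (literals[a].clone(), literals[b].clone()))
        .collect()
}
```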