@georgwiese (Collaborator) commented Dec 18, 2025

Builds on #3562

This PR generates the "optimistic constraints" (which I'd prefer to call "execution constraints") introduced in #3491 for optimistic precompiles. They are currently ignored; actually passing them to the execution engine is left for another PR.

At a high level, this is what happens:

  1. optimistic_literals() computes a map AlgebraicReference -> OptimisticLiteral. It works by finding memory accesses with compile-time addresses (essentially register accesses). The columns representing the data in the memory bus interaction correspond to limbs of register values at some point in time and therefore can be mapped to an execution literal.
  2. BlockEmpiricalConstraints::filtered is used to remove any constraints on columns that cannot be mapped to execution literals. As a result, all empirical constraints can be checked at execution time, but the resulting optimistic precompiles are less effective.
  3. ConstraintGenerator::generate_constraints turns empirical constraints into equality constraints, i.e., constraints of the form (number|algebraic_reference) = (number|algebraic_reference). These constraints can be converted to SymbolicConstraint (to be added to the solver) and to execution constraints via generate_execution_constraints (using the map computed in step 1).
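
For illustration, here is a self-contained sketch of those three steps with toy types (the real `AlgebraicReference`, `OptimisticLiteral`, and constraint types are richer; everything below besides the step structure is an assumption):

```rust
use std::collections::HashMap;

// Toy stand-ins for the real types.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct AlgebraicReference(usize);

#[derive(Clone, Copy, Debug)]
enum OptimisticLiteral {
    // A limb of a register value before a given instruction index.
    RegisterLimb { instruction_idx: usize, address: u32, limb: usize },
}

// An empirical constraint of the form `column = value`.
struct EqualityConstraint {
    column: AlgebraicReference,
    value: u64,
}

fn main() {
    // Step 1: map columns to execution literals (hard-coded here; the real
    // code derives this from memory bus interactions with compile-time
    // addresses).
    let literals: HashMap<AlgebraicReference, OptimisticLiteral> = HashMap::from([(
        AlgebraicReference(7),
        OptimisticLiteral::RegisterLimb { instruction_idx: 0, address: 44, limb: 0 },
    )]);

    // Step 2: drop empirical constraints on columns without a literal.
    let empirical = vec![
        EqualityConstraint { column: AlgebraicReference(7), value: 15 },
        EqualityConstraint { column: AlgebraicReference(9), value: 0 }, // dropped
    ];
    let filtered: Vec<_> = empirical
        .into_iter()
        .filter(|c| literals.contains_key(&c.column))
        .collect();

    // Step 3: each surviving constraint yields a symbolic constraint (for
    // the solver) and an execution constraint (via the map from step 1).
    for c in &filtered {
        let literal = literals[&c.column];
        println!("solver: {:?} = {}; execution: {:?} = {}", c.column, c.value, literal, c.value);
    }
}
```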

To test:
```
POWDR_RESTRICTED_OPTIMISTIC_PRECOMPILES=1 cargo run --bin powdr_openvm -r prove guest-keccak --input 100 --autoprecompiles 1 --apc-candidates-dir keccak100 --mock --optimistic-precompiles
```

Also see the evaluation on reth that I posted in #3366.

@georgwiese force-pushed the optimistic-execution-constraints branch from 53b5d6c to 558856c on December 19, 2025 15:53
@georgwiese (Collaborator, Author)

The code is pretty fleshed out at this point, but it doesn't work, because I'm trying to find the memory limbs from the unoptimized machine. At this point, the multiplicity still depends on is_valid (so I can't easily tell which interactions are sends and which are receives), even register addresses are still symbolic (coming from the dynamic PC lookup), and the values are often complex expressions. One example:

```
(id=1, mult=2013265920 * is_valid, args=[is_load * mem_as + (1 - is_load) * 1, is_load * (mem_ptr_limbs__0 + mem_ptr_limbs__1 * 65536) + (1 - is_load) * rd_rs2_ptr - ((0 + flags__0 * (0 + flags__0 + flags__1 + flags__2 + flags__3 - 2) * 2013265920) * 1 + (0 + flags__2 * (flags__2 - 1) * 1006632961 + flags__1 * (0 + flags__0 + flags__1 + flags__2 + flags__3 - 2) * 2013265920) * 2 + (0 + flags__2 * (0 + flags__0 + flags__1 + flags__2 + flags__3 - 2) * 2013265920) * 3), read_data__0, read_data__1, read_data__2, read_data__3, read_data_aux__base__prev_timestamp])
```

I think what I want instead is to run the full optimization, except that the optimizer should never remove a memory bus interaction (i.e., skip [this code]). Note that at this point, the solver might have already figured out concrete values for some of the memory limbs, or might have determined that some memory limbs are equal. The current algorithm should still work in that case, though.

@georgwiese force-pushed the optimistic-execution-constraints branch from a072147 to b1c42c0 on December 23, 2025 18:43
@georgwiese (Collaborator, Author) commented Dec 23, 2025

OK, things are working, but the effectiveness is significantly reduced.

Analysis of block 0x201ecc = 2105036:

  1. The "guaranteed" precompile has 131 columns.
  2. The optimistic precompile (only_memory_limbs = false, this is the state before this PR) has 83 columns.
  3. The filtered optimistic precompile (only_memory_limbs = true, only empirical constraints on memory pointer limbs are used) has 112 columns.

The 29 columns that are in (3) but not in (2) fall into different categories:

Category 1 (16 / 29): timestamp diff limbs

rs1_aux_cols__base__timestamp_lt_aux__lower_decomp__0_0
read_data_aux__base__timestamp_lt_aux__lower_decomp__0_0
write_base_aux__timestamp_lt_aux__lower_decomp__0_0
read_data_aux__base__timestamp_lt_aux__lower_decomp__0_1
write_base_aux__timestamp_lt_aux__lower_decomp__0_1
read_data_aux__base__timestamp_lt_aux__lower_decomp__0_2
write_base_aux__timestamp_lt_aux__lower_decomp__0_2
read_data_aux__base__timestamp_lt_aux__lower_decomp__0_3
write_base_aux__timestamp_lt_aux__lower_decomp__0_3
rs1_aux_cols__base__timestamp_lt_aux__lower_decomp__0_4
write_base_aux__timestamp_lt_aux__lower_decomp__0_4
write_base_aux__timestamp_lt_aux__lower_decomp__0_5
write_base_aux__timestamp_lt_aux__lower_decomp__0_6
write_base_aux__timestamp_lt_aux__lower_decomp__0_7
reads_aux__0__base__timestamp_lt_aux__lower_decomp__0_9
reads_aux__0__base__timestamp_lt_aux__lower_decomp__0_11

These can be removed because, typically, accessed memory cells have been accessed recently, so the most significant limb of the timestamp diff is 0. AFAIU, we don't have access to the current or previous timestamp (of a memory access) during execution, so this is an inherent limitation of the way we check optimistic precompiles at execution time.
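
As a toy illustration of why the high limbs are empirically zero (the limb width and count here are assumptions, not taken from the actual circuit):

```rust
fn main() {
    // Decompose a timestamp diff into 8-bit limbs, least significant first.
    let decompose = |diff: u32| (0..4).map(|i| (diff >> (8 * i)) & 0xff).collect::<Vec<u32>>();

    // A recently accessed cell has a small diff, so the high limbs are 0
    // and an empirical constraint `limb = 0` holds.
    assert_eq!(decompose(37), vec![37, 0, 0, 0]);

    // A cell that was last accessed long ago breaks that constraint.
    assert_eq!(decompose(1 << 20), vec![0, 0, 16, 0]);
}
```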

Note that all this data is based on very small samples, so it could be that in larger examples, the limbs are actually nonzero much more often, and the empirical constraint would also not be included in (2). (In other words, the 83 columns in (2) might be the result of overfitting to a small sample.)

Category 2 (5 / 29): Derivable columns

For these columns, I believe the solver should be able to derive them deterministically from the given data.

The example that happens here is related to diff markers in the final comparison (BLTU 44 48 -44 1 1):

diff_marker__0_11
diff_marker__1_11
diff_marker__2_11
diff_marker__3_11
diff_val_11

What happens here is that empirically, register 44 is always 15, and register 48 is always a byte. The difference can only be in the final byte, so at least 3 diff markers should be inferred to be 0 (and the 4th could be inferred to equal cmp_result, which is also in (2)).
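
A toy version of that inference (the byte decomposition is real, but the marker convention here is simplified: a marker just flags a differing byte):

```rust
fn main() {
    // BLTU compares rs1 and rs2; empirically rs1 (register 44) is always 15
    // and rs2 (register 48) always fits in a byte, so bytes 1..=3 of both
    // operands are 0 and can never differ.
    let (rs1, rs2): (u32, u32) = (15, 200);
    let bytes = |x: u32| (0..4).map(|i| (x >> (8 * i)) & 0xff).collect::<Vec<u32>>();
    let markers: Vec<u32> = bytes(rs1)
        .iter()
        .zip(bytes(rs2).iter())
        .map(|(a, b)| u32::from(a != b))
        .collect();
    // Only the lowest byte can differ; the other three markers must be 0.
    assert_eq!(markers, vec![1, 0, 0, 0]);
}
```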

Category 3 (8 / 29): Not derivable, but checkable

For these columns, the solver is correct not to infer them from the given data, but we could still formulate (complicated) execution constraints to detect these cases.

The example that happens here is the most significant memory pointer limbs:

mem_ptr_limbs__1_0
mem_ptr_limbs__1_1
mem_ptr_limbs__1_2
mem_ptr_limbs__1_3
mem_ptr_limbs__1_4
mem_ptr_limbs__1_5
mem_ptr_limbs__1_6
mem_ptr_limbs__1_7

These are empirically always the same value because (at least in this sample) the addresses to be copied from / to always start with 0x2000; only the least significant 16 bits vary. Note that even if they weren't equal to a constant on a larger sample, they would likely be equal to each other (the memory pointers are the results of the same base pointer plus some small compile-time offset, so the most significant 16 bits likely stay the same).

With the given data, it is not guaranteed that the most significant 16 bits of the memory pointer are always the same, so the solver is not to blame. But we could actually carry out the addition as part of the execution constraints to check whether this is the case.
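
A sketch of what such an execution constraint could compute (the function shape is hypothetical; the point is only that the addition is cheap to redo at execution time):

```rust
fn main() {
    // mem_ptr = base register + compile-time offset. At execution time the
    // base register value is known, so we can recompute the pointer and
    // check that its most significant 16-bit limb (mem_ptr_limbs__1) still
    // matches the empirically observed constant.
    let high_limb_matches = |base: u32, offset: u32, expected: u32| -> bool {
        (base.wrapping_add(offset) >> 16) == expected
    };

    // Empirically, all copied-from/to addresses start with 0x2000.
    assert!(high_limb_matches(0x2000_1000, 0x40, 0x2000));
    // A carry out of the low 16 bits would invalidate the constraint.
    assert!(!high_limb_matches(0x2000_fff0, 0x40, 0x2000));
}
```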

@leonardoalt (Member)

@georgwiese 3 has a typo I guess? only_memory_limbs = false -> only_memory_limbs = true

Base automatically changed from pgo-range-constraints to main December 26, 2025 20:07
@georgwiese force-pushed the optimistic-execution-constraints branch 2 times, most recently from f02f260 to 7284971 on December 30, 2025 17:07
@georgwiese changed the base branch from main to refactor-partition on December 30, 2025 17:08
@georgwiese force-pushed the optimistic-execution-constraints branch 2 times, most recently from 024039d to 7f65b56 on December 30, 2025 17:16
@georgwiese (Collaborator, Author)

I ran on reth:

Base automatically changed from refactor-partition to main December 31, 2025 09:33
@georgwiese force-pushed the optimistic-execution-constraints branch from 7f65b56 to 367085b on December 31, 2025 20:09
@georgwiese changed the base branch from main to configurable-execution-count-threshold on December 31, 2025 20:09
@georgwiese force-pushed the optimistic-execution-constraints branch 2 times, most recently from 944ac87 to eabf566 on December 31, 2025 21:38
@georgwiese force-pushed the optimistic-execution-constraints branch 4 times, most recently from eb0ed0f to 59c742c on December 31, 2025 23:19
Base automatically changed from configurable-execution-count-threshold to main January 2, 2026 14:40
github-merge-queue bot pushed a commit that referenced this pull request Jan 2, 2026
Cherry-picked from #3501

While running experiments for optimistic precompiles (#3366), I ran into
more memory allocation errors. This PR can be seen as a follow-up to
#3517.

The main idea here is that we no longer materialize the partition for
each block instance when detecting equivalence classes. Instead, the
iterator is passed on, so that new partitions are summarized as soon as
they arrive. See comments below for some nuance.
github-merge-queue bot pushed a commit that referenced this pull request Jan 2, 2026
Cherry-picked from #3501

When collecting empirical constraints, we used to materialize the entire
trace of a given PGO input. This turns out to be too memory-intensive.

We already have a way to combine empirical constraints that were
computed on different data sets, and use it to combine empirical
constraints from different PGO inputs. With this PR, we take a more granular approach: we keep at most 20 segments in memory (this is configurable), compute empirical constraints for every chunk of 20 segments, and combine across those chunks.

The result should be the same, except for some nuance in the range
constraints: Range constraints are computed as the 1st and 99th
percentile. When combining, we simply take the min of the minimums and
the max of the maximums. So, for example, it could be that within one chunk of 20 segments, a PC is executed only once and has an extreme value. That value would then widen the combined range, even though it might not influence the percentiles if they were computed globally.
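
A minimal model of that combination rule (the Range type is a stand-in):

```rust
fn main() {
    // Per-chunk range constraints: [1st percentile, 99th percentile].
    #[derive(Clone, Copy, Debug, PartialEq)]
    struct Range { min: u64, max: u64 }

    // Combining takes the min of the minimums and the max of the maximums.
    let combine = |a: Range, b: Range| Range { min: a.min.min(b.min), max: a.max.max(b.max) };

    let chunk_a = Range { min: 10, max: 20 };
    let chunk_b = Range { min: 10, max: 5_000 }; // one extreme execution
    // The extreme value widens the combined range, even though a global
    // percentile computation might have discarded it.
    assert_eq!(combine(chunk_a, chunk_b), Range { min: 10, max: 5_000 });
}
```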
@georgwiese force-pushed the optimistic-execution-constraints branch from 59c742c to e1ca70b on January 5, 2026 15:51
@georgwiese (Collaborator, Author)

Some analysis on the results presented in the evaluation section of #3366.

Copying the results: on 10 Ethereum blocks, we get:

  1. Guaranteed precompiles: 3.64 average effectiveness
  2. Optimistic precompiles (no restriction): 5.90 average effectiveness
  3. Execution-time checkable optimistic precompiles: 3.96 average effectiveness

Going from left to right in the effectiveness plot (guaranteed / restricted / optimistic), I looked into the columns that are in the optimistic but not the restricted precompile:

  • 0x4e80e0 (36 / 29 / 21): All 8 columns are timestamp or timestamp diff columns.
  • 0x303110 (132 / 110 / 85): Mostly timestamp (diff) columns, some memory pointer limbs and previous value columns (for heap accesses)
  • 0x4e80d4 (54 / 53 / 49): All timestamp (diff) columns
  • 0x200a1c (132 / 126 / 117): One diff marker, otherwise timestamp (diff) columns
  • 0x33d124 (1809 / 1775 / 183): Mostly flags and (heap) memory data. Timestamp-related columns too.
  • 0x25c3c8 (878 / 834 / 505): Mostly memory flags and memory data, but also 32 columns that seem like they should correspond to intermediate registers, like a_mul__2_110.

This is largely consistent with the analysis above. Timestamp-related columns account for a lot of the performance drop. Another big factor is that we filter out heap memory columns where maybe we don't have to: for example, the value of a memory access might empirically always be 0 while the address is unknown at compile time; the address can still be derived from (runtime) register values, though.

@georgwiese force-pushed the optimistic-execution-constraints branch from e1ca70b to febd77d on January 7, 2026 15:25
Base automatically changed from refactor-smybolic-machine-generator to main January 15, 2026 19:22
@Schaeff (Collaborator) left a comment


Partial review

pub enum LocalOptimisticLiteral<A> {
Register(A),
/// A limb of a register
// TODO: The code below ignores the limb index; support it properly

Comment on lines 50 to 52
// The optimizer might introduce new columns, but we'll discard them below.
// As a result, it is fine to clone here (and reuse column IDs across instructions).
column_allocator.clone(),
Collaborator

I don't like this but I don't see how to fix it without a major refactor.

Member

Also @chriseth said ColumnAllocator shouldn't be clonable the other day, don't remember why specifically. Can't this just be borrowed here?

Collaborator

Yes, that was following another review of mine.
I'm not sure how to fix this in general, but maybe this at least reuses an existing API.

Comment on lines +479 to +494
let empirical_constraints = empirical_constraints.filtered(
|block_cell| {
let algebraic_reference = algebraic_references
.get_algebraic_reference(block_cell)
.unwrap();
optimistic_literals.contains_key(algebraic_reference)
},
<A::Instruction as PcStep>::pc_step(),
);

let empirical_constraints =
ConstraintGenerator::<A>::new(empirical_constraints, algebraic_references, &block)
.generate_constraints();

let execution_constraints =
generate_execution_constraints(&empirical_constraints, &optimistic_literals);
Collaborator

We are doing the same thing in filtered and in generate_execution_constraints, the first time returning a filtered set and the second time panicking if something is not in the filtered set.
Also filtered is used only here.
I don't have a particular suggestion here but it seems like we could simplify this, maybe by going straight from BlockEmpiricalConstraints to (Vec<EqualityConstraint>, Vec<OptimisticConstraint>) instead of BlockEmpiricalConstraints -> Vec<EqualityConstraint> -> Vec<OptimisticConstraint>

Collaborator Author

I think I had it like this, with the ConstraintGenerator generating both the symbolic constraints and the execution constraints. With the BlockEmpiricalConstraints -> Vec<EqualityConstraint> -> Vec<OptimisticConstraint> pipeline, things are more decoupled, which I like, especially since we don't care about execution constraints unless restricted precompiles are enabled.

In a context where we already use maps for cases where we only have data
for some keys (pcs...) it seems better to also apply this to the range
constraints.
})
// Map each limb reference to an optimistic literal
.flat_map(|(instruction_idx, concrete_address, limbs)| {
// Borrow column allocator to avoid moving it into the closure
Collaborator Author

Or, as ChatGPT would say: "Nothing you’ve done is a hack — this is standard Rust ownership choreography."


// Generate constraints for optimistic precompiles.
let should_generate_execution_constraints =
optimistic_precompile_config().restrict_optimistic_precompiles;
Collaborator

Started here #3567

@@ -0,0 +1,32 @@
const DEFAULT_EXECUTION_COUNT_THRESHOLD: u64 = 100;
const DEFAULT_MAX_SEGMENTS: usize = 20;
Member

what kind of segments is this referring to?

Collaborator Author

Execution segments / shards; there is a description of the config field below.

@@ -0,0 +1,32 @@
const DEFAULT_EXECUTION_COUNT_THRESHOLD: u64 = 100;
Member

count of what?

Collaborator Author

See description of the config field below

}

pub fn optimistic_precompile_config() -> OptimisticPrecompileConfig {
let execution_count_threshold = std::env::var("POWDR_OP_EXECUTION_COUNT_THRESHOLD")
Member

I think we should avoid env vars for this.

Collaborator Author

In principle I agree, but I think it's fine while the feature is highly experimental. It seems pointless to pollute the CLI and to have to change reth constantly for a feature (optimistic precompiles) that is not yet working end-to-end. For this parameter, I expect that we can set it automatically in the future, but this was the easiest way to run some quick experiments. Also, note that this was an env var before this PR, so I think we can fix this separately.

.ok()
.and_then(|s| s.parse().ok())
.unwrap_or(DEFAULT_MAX_SEGMENTS);
let restricted_optimistic_precompiles =
Member

I think especially for this we should avoid env var.

Collaborator Author

Why especially here? In the end, all optimistic precompiles should always be restricted; the unrestricted version is only for us to know how good we'd be without restrictions. One could argue that for this reason, restricted optimistic APCs should be opt-out, not opt-in.

Member

I meant especially here because the limits have sane default constants



let instruction_idx = match bus_interaction.op() {
MemoryOp::GetPrevious => instruction_idx,
MemoryOp::SetNew => instruction_idx + 1,
Member

why + 1?

Collaborator Author

When we fetch a value at a given instruction index, the semantics is that it is the value before the instruction is executed. So for a memory bus receive (GetPrevious), we want to fetch the value at the current instruction index, and for a memory bus send (SetNew), we want to match it against whatever value we fetch at the next instruction index.
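
A toy timeline of that convention (the array-based register model is just for illustration):

```rust
fn main() {
    // values_before[i] = register value before instruction i executes.
    let values_before = [7u32, 7, 42, 42];
    let i = 1; // instruction 1 writes 42 to the register

    // Receive (GetPrevious): the value before instruction i.
    let previous = values_before[i];
    // Send (SetNew): the value written by instruction i, i.e. the value
    // observed before instruction i + 1; hence the `+ 1`.
    let new = values_before[i + 1];

    assert_eq!((previous, new), (7, 42));
}
```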

&vm_config.bus_map,
);

symbolic_machines
@leonardoalt (Member) Jan 16, 2026

I find this entire sequence kinda hard to read and visually polluted.
There's an iterator chain, with a nested iterator chain and multiple nested flat maps with several closures, and the whole thing is 100+ lines long. I find it hard to parse what's inside what, what's going where, and what is consumed by what.
If I go line by line I can understand what's happening, but I personally wouldn't necessarily prioritize iterator chain maximalism over human readability.

@leonardoalt (Member) Jan 16, 2026

To be more accurate, I think my issue is rather with closure maximalism.
If this block was

    symbolic_machines
        .into_iter()
        .enumerate()
        .flat_map(f)
        .flat_map(g)
        .flat_map(h)

with more readable function descriptions I think I would personally find it a lot more readable.

Collaborator Author

Like this? 729e997

Member

looks good!

@georgwiese changed the base branch from main to limb-access on January 16, 2026 17:45
Base automatically changed from limb-access to main January 16, 2026 18:28
@georgwiese force-pushed the optimistic-execution-constraints branch from 9b5eefb to ebce803 on January 16, 2026 18:55