Bounded-scope generators #3014

konnov · Maintainer · 2024-10-03T09:05:49Z

Apalache currently supports value generators to restrict the scope of very large data structures.

Why we need value generators

When we want to find bugs in a protocol with a very large state space and many behaviors, e.g., Byzantine consensus, we would like to start model checking in a "reasonable" intermediate state (more on that later). Otherwise, both TLC and Apalache get stuck after exploring the state space (or execution space) to a depth of about 10-20 steps.

The standard answer to this problem is to introduce an inductive invariant, prove that it is inductive, and use it to start exploration in an arbitrary state that satisfies the inductive invariant. In fact, with an inductive invariant, we just need to explore one step. For instance, this is how I did it for Ben-Or's consensus recently; see IndInit (for the fixed constants):

TypeOK ==
  /\ value \in [ CORRECT -> VALUES ]
  /\ decision \in [ CORRECT -> VALUES \union { NO_DECISION } ]
  /\ round \in [ CORRECT -> ROUNDS ]
  /\ step \in [ CORRECT -> { S1, S2, S3 } ]
  /\ \E A1 \in SUBSET [ src: ALL, r: ROUNDS, v: VALUES ]:
        msgs1 = [ r \in ROUNDS |-> { m \in A1: m.r = r } ]
  /\ \E A1D \in SUBSET [ src: ALL, r: ROUNDS, v: VALUES ],
          A1Q \in SUBSET [ src: ALL, r: ROUNDS ]:
        msgs2 = [ r \in ROUNDS |->
            { D2(mm.src, r, mm.v): mm \in { m \in A1D: m.r = r } }
                \union { Q2(mm.src, r): mm \in { m \in A1Q: m.r = r } }
        ]

IndInit ==
  /\ TypeOK
  /\ IndInv

Basically, there are two components in IndInit:

The predicate TypeOK restricts the scope of the state variables to the reasonable values.
The predicate IndInv captures the essential relations between the variables, specific to the protocol logic. This predicate is usually quite hard to come up with.
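For readers less familiar with this workflow, the usual proof obligations behind "prove that it is inductive" can be stated as follows. Here $Ind \triangleq TypeOK \land IndInv$, and $Safe$ is a placeholder name (not from the original spec) for the safety property being checked. The second obligation is the one that only requires exploring a single step from IndInit.

$$
\begin{aligned}
&Init \Rightarrow Ind \\
&Ind \land Next \Rightarrow Ind' \\
&Ind \Rightarrow Safe
\end{aligned}
$$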

This approach technically works both in TLC and Apalache. However, it's easy to see that TypeOK contains potentially very large sets on the right-hand side of the membership relation, e.g.:

\E A1 \in SUBSET [ src: ALL, r: ROUNDS, v: VALUES ]:
        msgs1 = [ r \in ROUNDS |-> { m \in A1: m.r = r } ]

When we increase the cardinalities of the sets ALL, ROUNDS, and VALUES, the set SUBSET [ src: ALL, r: ROUNDS, v: VALUES ] grows exponentially in the product of their cardinalities. Admittedly, this is the worst-case scenario. However, in many algorithms, processes need only a very small subset of the messages to make the next step. Intuitively, we do not need an arbitrary subset of all possible messages, but an arbitrary subset of all possible messages whose size is up to a reasonable bound. Moreover, it's much easier to debug an inductive invariant on small subsets first and then check it on larger sets.
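To make the growth concrete, write $M$ for the record set [ src: ALL, r: ROUNDS, v: VALUES ] and $n = |M| = |ALL| \cdot |ROUNDS| \cdot |VALUES|$. The number of candidate message sets compares as follows (the sizes in the example are illustrative choices, not taken from the Ben-Or spec):

$$
|\mathrm{SUBSET}\ M| = 2^{n}, \qquad |\{ S \subseteq M : |S| = k \}| = \binom{n}{k}, \qquad |\{ S \subseteq M : |S| \le k \}| = \sum_{i=0}^{k} \binom{n}{i}
$$

For example, with $|ALL| = 4$, $|ROUNDS| = 3$, $|VALUES| = 2$ we get $n = 24$, hence $2^{24} = 16{,}777{,}216$ candidate message sets, versus $\binom{24}{2} = 276$ sets with exactly two messages and $1 + 24 + 276 = 301$ sets with at most two.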

The community modules have the operator kSubset to work around this problem. TLC has a custom implementation of kSubset. In the above example, we could replace SUBSET with kSubset as follows:

\E A1 \in kSubset(k, [ src: ALL, r: ROUNDS, v: VALUES ]):
        msgs1 = [ r \in ROUNDS |-> { m \in A1: m.r = r } ]

What is $k$? It can be a small constant that helps us obtain counterexamples quickly. If the model checker does not give us a counterexample, we should increase $k$ or convince ourselves that the current value of $k$ is sufficient to check our properties. Sometimes, after debugging the inductive invariant, we can even replace kSubset with SUBSET again, once the invariant is tight enough for the model checker to cope.

While kSubset works in TLC, I do not know how to implement it efficiently in Apalache. Technically, we could enumerate all potential k-subsets of the base set and use them as the elements of kSubset. However, the definition of kSubset uses set cardinality, and Apalache cannot statically compute the cardinality of an arbitrary set. The current encoding of Cardinality(S) in Apalache produces $O(n^2)$ constraints for a set $S$ that potentially has $n$ elements. TLC does not have this issue: it enumerates states, so it can compute the exact set cardinality in every state.
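For reference, a plain TLA+ definition of k-subsets in terms of Cardinality could look like the sketch below. As far as I know, it matches the definition of kSubset in the Community Modules (FiniteSetsExt), but the module name KSubsetSketch and the helper AtMostKSubset are hypothetical names used only for illustration. The sketch shows where the cost comes from: the filter ranges over all of SUBSET S and needs Cardinality for every candidate, which TLC evaluates on concrete sets but a symbolic encoding has to express as constraints.

```tla
---------------------------- MODULE KSubsetSketch ----------------------------
EXTENDS FiniteSets, Naturals

\* All subsets of S with exactly k elements.
\* Filtering SUBSET S needs Cardinality for every candidate subset.
kSubset(k, S) == { s \in SUBSET S : Cardinality(s) = k }

\* Hypothetical helper: all subsets of S with at most k elements,
\* which is closer to what Gen(k) produces for a set type.
AtMostKSubset(k, S) == { s \in SUBSET S : Cardinality(s) <= k }
==============================================================================
```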

To have something similar to kSubset in Apalache, we have introduced the operator Apalache!Gen(k). This operator works in a very straightforward way: it produces completely unrestricted data structures of width up to $k$. For example, if the target type of Gen is Set(T), then Gen(k) produces a symbolic set that has up to $k$ elements, and each of its elements has width up to $k$ as well. For instance, if the set elements are sequences, their length is bounded by $k$.

Having Apalache!Gen, we can rewrite the above k-subset predicate as follows:

LET A1 == Gen(k) IN
\* A1 is a set of records, so constrain each record m in it
/\ \A m \in A1:
     /\ m.src \in ALL
     /\ m.r \in ROUNDS
     /\ m.v \in VALUES
/\ msgs1 = [ r \in ROUNDS |-> { m \in A1: m.r = r } ]

Note that this version is not semantically equivalent to the one with kSubset, as Gen can produce sets of any cardinality from 0 to $k$. If we want a completely equivalent version, which we usually don't, we can add the constraint Cardinality(A1) = k.
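Spelled out, the exactly-$k$ variant could look like the sketch below. It reuses the names from the example (k, ALL, ROUNDS, VALUES, msgs1), states the record constraints per message m \in A1 because A1 is a set of records, and assumes the module extends FiniteSets so that Cardinality is available.

```tla
\* Exactly-k variant: Gen(k) yields a set with at most k elements,
\* and the Cardinality constraint rules out the smaller ones.
LET A1 == Gen(k) IN
/\ Cardinality(A1) = k
/\ \A m \in A1:
     /\ m.src \in ALL
     /\ m.r \in ROUNDS
     /\ m.v \in VALUES
/\ msgs1 = [ r \in ROUNDS |-> { m \in A1: m.r = r } ]
```

The extra Cardinality constraint is exactly what we want to avoid in Apalache, which is why the at-most-$k$ version is usually preferable.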

Why `Gen` is not the best solution

Apalache!Gen was introduced as a quick hack to experiment with the idea. While it technically works, there are several issues with it:

Gen is syntactically confusing. Semantically, Gen acts as the existential quantifier \E A1 \in S: P, but syntactically, it looks very much like a deterministic computation.
Gen relies on type inference to come up with the right shape of the data structure.
Gen uses the same bound $k$ for all children in the produced data structures. When we generate a set of records that have sets as fields, the same bound $k$ applies to those sets as well. Often, we need much smaller sets in the fields. I have already hit this limitation several times.

An alternative solution

Alternatively, we could use a special pattern of \E x: P to restrict the scope. We could rewrite the above example as:

\E A1:
  /\ /\ Cardinality(A1) <= k
     \* A1 is a set of records, so constrain each record m in it
     /\ \A m \in A1:
          /\ m.src \in ALL
          /\ m.r \in ROUNDS
          /\ m.v \in VALUES
  /\ msgs1 = [ r \in ROUNDS |-> { m \in A1: m.r = r } ]

This version is much easier to understand: we introduce a completely unrestricted constant A1 and then restrict it with a predicate.
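One consequence of this pattern, relevant to the nested-bounds limitation of Gen, is that each nested structure can get its own bound. In the sketch below, the record field acks and the bound k2 are hypothetical, and \E A1: P is the proposed pattern rather than a description of what Apalache accepts today.

```tla
\* Hypothetical example: each message carries a set of acknowledging
\* processes. The outer set is bounded by k, while each nested acks set
\* is bounded by a separate, smaller k2, which Gen(k) cannot express.
\E A1:
  /\ Cardinality(A1) <= k
  /\ \A m \in A1:
       /\ m.src \in ALL
       /\ m.r \in ROUNDS
       /\ m.acks \subseteq ALL
       /\ Cardinality(m.acks) <= k2
```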

Implementation challenges

There are a few things that are not yet clear to me:

How to compute the type of A1 in the above example? We would probably need a type annotation.
How to statically extract the bounds from the predicate under \E A1? We would need some form of static analysis that traverses the predicate and computes the bounds. This problem is somewhat similar to computing the bounds in i..j, see #446 ([FEATURE] Interval analysis to improve a..b).
