
Stateful testing, weighted transitions and using state to generate transitions #122

@russelldb


Hi,
First up, thanks for the amazing work and keeping the PBT flame lit!

I have a model/sim that I wrote for a client that uses proptest-state-machine. The basic idea is that I use a BTreeMap<NodeId, State> to represent a distributed system, and then I generate transitions to do things in the system. I find this a really generally useful way to model distributed systems, and it can often then be used to drive the real system components to ensure the implementation matches the spec. I guess it is property-based design more than testing, but ideas/thinking need testing as much as code, right?

I installed the hegel-skill in Claude. Using the Opus 4.6 1M-context model, Claude ported my sim to hegel-rust. Here are the open questions I have. These might come down to my not understanding which proptest-state-machine concept maps to which hegel-rust one. (NOTE: I learned this general technique from Quviq using EQC and eqc_statem, and just ported it across to proptest-state-machine.)

Having read the issue that Claude generated, it sounds pretty confident that the issue isn't my/its understanding. I am nowhere near as confident, and apologise in advance if the answer is "right better models".

Claude says:


Title: Stateful testing: support for state-dependent rule eligibility (preconditions that gate rule selection, not just filter)
Context:

I'm porting a stateful property-based test from proptest-state-machine to hegel-rust. The system under test is a distributed sync protocol simulator with 9 transition types (Insert, Update, Delete, StartPush,
ContinuePush, StartPull, ContinuePull, NewSchema, UpgradeSchema). Many transitions are only valid in specific states — e.g., ContinuePush requires a device to be mid-push session, UpgradeSchema requires the
device schema to be behind the server's.

In proptest-state-machine, I handle this with an EligibleSets pattern: a single pass over the state computes which transitions are valid and builds a weighted strategy union of only the eligible ones. The framework generates transitions externally, the SUT never draws during application, and shrinking operates on the transition sequence.
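For concreteness, here is a minimal std-only sketch of the EligibleSets idea. The state shape, transition names and weights are invented for illustration (they are not the client sim's, and this is plain Rust, not the proptest strategy machinery): eligibility is decided once, up front, so there is never anything to reject.

```rust
// Hypothetical, simplified model state: which devices are mid-push/pull,
// and which are behind the server schema.
struct ModelState {
    pushing: Vec<usize>,
    pulling: Vec<usize>,
    stale_schema: Vec<usize>,
}

#[derive(Debug, PartialEq)]
enum Transition {
    Insert,
    ContinuePush(usize),
    ContinuePull(usize),
    UpgradeSchema(usize),
}

// One pass over the state builds a weighted pool of *eligible* transitions;
// ineligible transitions are simply absent from the pool.
fn eligible(state: &ModelState) -> Vec<(u32, Transition)> {
    let mut pool = vec![(5, Transition::Insert)]; // always eligible
    for &d in &state.pushing {
        pool.push((3, Transition::ContinuePush(d)));
    }
    for &d in &state.pulling {
        pool.push((3, Transition::ContinuePull(d)));
    }
    for &d in &state.stale_schema {
        pool.push((1, Transition::UpgradeSchema(d)));
    }
    pool
}

// Weighted pick driven by an externally supplied roll, so generation stays
// deterministic and shrinkable as a sequence of choices.
fn pick(pool: &[(u32, Transition)], roll: u32) -> &Transition {
    let total: u32 = pool.iter().map(|(w, _)| w).sum();
    let mut r = roll % total;
    for (w, t) in pool {
        if r < *w {
            return t;
        }
        r -= *w;
    }
    unreachable!()
}
```

In the real model the pool feeds a weighted strategy union (`prop_oneof!`-style); the point is that the gate runs before generation, not after it.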

In hegel-rust's #[hegel::state_machine], the equivalent is tc.assume() inside #[rule] methods. This works but has three significant limitations:

  1. High rejection rates for state-gated rules

Rules like continue_push and continue_pull are only valid when a device is mid-session. With uniform rule selection and 3 devices, tc.assume(dev.is_pushing()) rejects the vast majority of attempts. In proptest,
these had weight 3 and were only in the strategy when eligible — zero waste. With hegel I had to suppress FilterTooMuch health checks entirely.

  2. Draws inside rules must be state-independent

This is the bigger issue. If a rule draws from a state-dependent collection (e.g., sampled_from(live_rows) where live_rows varies), the draw pattern changes across replays and hegel flags the test as flaky. I had to restructure all rules to draw fixed-shape values (integer indices) and map them to state post hoc via modulo arithmetic. This works but loses precision: instead of drawing directly from eligible devices/rows, I'm drawing uniform integers and hoping they land somewhere useful after assume().

For comparison, in proptest-state-machine the equivalent of "draw a live row from a specific device" was:

proptest::sample::select(devices_with_live_rows.clone())
    .prop_flat_map(|(id, row_ids, cols)| gen_update(id, row_ids, cols))

State-dependent, zero rejection, precise generation. In hegel I had to write:

let device_id = tc.draw(generators::integers::<usize>().max_value(num_devices - 1));
let row_idx = tc.draw(generators::integers::<usize>());
// ... later, after reading state:
tc.assume(!live_rows.is_empty());
let row_id = live_rows[row_idx % live_rows.len()];
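The workaround boils down to the following std-only helper (`map_to_pool` is a made-up name for illustration, not hegel API): the index is drawn with a fixed shape up front, and only mapped onto whatever state-dependent pool exists at application time.

```rust
// Fixed-shape draw, state-dependent mapping. The draw itself never varies
// across replays; only the interpretation of the index does.
fn map_to_pool<T>(drawn_idx: usize, pool: &[T]) -> Option<&T> {
    if pool.is_empty() {
        None // corresponds to the tc.assume(!live_rows.is_empty()) rejection
    } else {
        Some(&pool[drawn_idx % pool.len()])
    }
}
```

This is exactly where the precision loss comes from: the modulo folds a uniform index onto the pool rather than drawing from the pool directly.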

What I think would help

Python's Hypothesis solves this with @precondition decorators on rules (which gate selection before any draws happen) and Bundles (state-dependent pools the engine understands). Something analogous in hegel-rust would be transformative for stateful testing:

  • #[precondition] on rules: a predicate checked before the rule is selected, so ineligible rules never fire and never waste test budget. This avoids the assume() rejection tax entirely.

  • Variables<T> with state-dependent draw semantics: Variables already exists in hegel-rust, but if the engine could draw from variables without the flaky-test detection treating variable-length pools as
    non-deterministic draws, rules could use state-dependent collections naturally.
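As a rough sketch of what precondition gating could desugar to (Rule, Model and the predicates here are invented for illustration, not a proposed hegel API): rules carry a predicate over the model state, and the engine filters before selecting, so selection only ever sees currently eligible rules.

```rust
// Hypothetical model flags that preconditions inspect.
struct Model {
    mid_push: bool,
    schema_behind: bool,
}

// A rule paired with its eligibility predicate.
struct Rule {
    name: &'static str,
    precondition: fn(&Model) -> bool,
}

// Gate first, select second: the assume()-style rejection tax disappears
// because ineligible rules are never candidates at all.
fn selectable<'a>(rules: &'a [Rule], m: &Model) -> Vec<&'a Rule> {
    rules.iter().filter(|r| (r.precondition)(m)).collect()
}
```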

Without these, stateful testing in hegel-rust works for simple cases but hits a wall for models where transition eligibility is the core complexity — which is exactly where stateful PBT is most valuable.

  3. No weighted rule selection

In proptest-state-machine I used explicit weights to shape the transition distribution toward realistic scenarios: sync sessions frequent and usually completing, schema changes rare, local mutations steady. This produced test traces that looked like real usage:
--- Transition Distribution (1006 cases, 553445 transitions) ---
   ContinuePull   112069  ( 20.2%)
   ContinuePush   107586  ( 19.4%)
   Update         100972  ( 18.2%)
   Insert          52227  (  9.4%)
   Delete          50588  (  9.1%)
   NewSchema       50008  (  9.0%)
   UpgradeSchema   43704  (  7.9%)
   StartPush       19691  (  3.6%)
   StartPull       16600  (  3.0%)
 
 --- Session Completions ---
   StartPush       19691 started, 18414 completed ( 93.5%)
   StartPull       16600 started, 14977 completed ( 90.2%)

With hegel's uniform rule selection, the distribution is dominated by the rules that are always eligible. From a 1000-case run:

--- Transition Distribution (1009 cases, 48323 transitions) ---
  Insert         8727  ( 18.1%)
  NewSchema      8240  ( 17.1%)
  Delete         8075  ( 16.7%)
  Update         7776  ( 16.1%)
  UpgradeSchema  5628  ( 11.6%)
  ContinuePush   5040  ( 10.4%)
  StartPush      2724  (  5.6%)
  StartPull      1065  (  2.2%)
  ContinuePull   1048  (  2.2%)

--- Session Completions ---
  StartPush        2724 started,   767 completed ( 28.2%)
  StartPull        1065 started,   464 completed ( 43.6%)
`NewSchema` and `UpgradeSchema` fire ~29% of the time combined; in reality these are rare events. Meanwhile `StartPull`/`ContinuePull` are starved at ~4% combined because they require specific preconditions and get filtered by `assume()`. The result: test traces are dominated by schema churn and don't exercise the sync protocol deeply. Push/pull sessions rarely complete.

This compounds with the precondition issue: rules that need preconditions are already penalised by `assume()` rejection, and they get no priority boost to compensate. A weight parameter on `#[rule]` (or, equivalently, the ability to control rule selection probability) would let users tune the distribution back toward realistic scenarios.
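To make the effect concrete, here is a small std-only simulation of what per-rule weights buy (the weights and the LCG are illustrative, not hegel internals): weighted selection over a rule index, repeated many times, reshapes the empirical distribution in exactly the way the tables above show uniform selection cannot.

```rust
// Minimal deterministic generator so the sketch needs no external crates.
fn lcg(seed: &mut u64) -> u32 {
    *seed = seed
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*seed >> 33) as u32
}

// Pick a rule index with probability proportional to its weight.
fn pick_weighted(weights: &[u32], roll: u32) -> usize {
    let total: u32 = weights.iter().sum();
    let mut r = roll % total;
    for (i, w) in weights.iter().enumerate() {
        if r < *w {
            return i;
        }
        r -= *w;
    }
    unreachable!()
}

// Count how often each rule index fires over `draws` weighted picks.
fn distribution(weights: &[u32], draws: u32) -> Vec<u32> {
    let mut seed = 42u64;
    let mut counts = vec![0u32; weights.len()];
    for _ in 0..draws {
        counts[pick_weighted(weights, lcg(&mut seed))] += 1;
    }
    counts
}
```

Running `distribution(&[8, 1, 1], n)` concentrates the budget on the first rule, which is the knob the always-eligible rules currently monopolise under uniform selection.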

Happy to discuss further, or to provide a cut-down version of the full port as a concrete example of the friction.
