Hi,
First up, thanks for the amazing work and keeping the PBT flame lit!
I have a model/sim that I wrote for a client that uses proptest-state-machine. The basic idea is that I use a `BTreeMap<NodeId, State>` to represent a distributed system, and then generate transitions that do things in the system. I find this a generally useful way to model distributed systems, and the model can often then be used to drive the real system components to ensure the implementation matches the spec. I suppose it is property-based design more than testing, but ideas and thinking need testing as much as code does, right?
I installed the hegel skill in Claude. Using the Opus 4.6 1M-context model, Claude ported my sim to hegel-rust. Here are the open questions I have; these might come down to my lack of understanding of which hegel-rust concept maps to which proptest-state-machine concept. (NOTE: I learned this general technique from Quviq, using EQC and eqc_statem, and just ported it across to proptest-state-machine.)
Having read the issue that Claude generated, it sounds pretty confident that the problem isn't my (or its) understanding. I am nowhere near as confident, and I apologise in advance if the answer is "write better models".
Claude says:
Title: Stateful testing: support for state-dependent rule eligibility (preconditions that gate rule selection, not just filter)
Context:
I'm porting a stateful property-based test from proptest-state-machine to hegel-rust. The system under test is a distributed sync protocol simulator with 9 transition types (Insert, Update, Delete, StartPush,
ContinuePush, StartPull, ContinuePull, NewSchema, UpgradeSchema). Many transitions are only valid in specific states — e.g., ContinuePush requires a device to be mid-push session, UpgradeSchema requires the
device schema to be behind the server's.
In proptest-state-machine I handle this with an EligibleSets pattern: a single pass over the state computes which transitions are valid and builds a weighted strategy union of only the eligible ones. The framework generates transitions externally, the SUT never draws during application, and shrinking operates on the transition sequence.
In hegel-rust's `#[hegel::state_machine]`, the equivalent is `tc.assume()` inside `#[rule]` methods. This works but has three significant limitations:
- High rejection rates for state-gated rules
Rules like `continue_push` and `continue_pull` are only valid when a device is mid-session. With uniform rule selection and 3 devices, `tc.assume(dev.is_pushing())` rejects the vast majority of attempts. In proptest these rules had weight 3 and were only in the strategy when eligible, so nothing was wasted; with hegel I had to suppress `FilterTooMuch` health checks entirely.
- Draws inside rules must be state-independent
This is the bigger issue. If a rule draws from a state-dependent collection (e.g. `sampled_from(live_rows)` where `live_rows` varies), the draw pattern changes across replays and hegel flags the test as flaky. I had to restructure all rules to draw fixed-shape values (integer indices) and map them onto state post hoc via modulo arithmetic. This works but loses precision: instead of drawing directly from eligible devices/rows, I'm drawing uniform integers and hoping they land somewhere useful after `assume()`.
For comparison, in proptest-state-machine the equivalent of "draw a live row from a specific device" was:

```rust
proptest::sample::select(devices_with_live_rows.clone())
    .prop_flat_map(|(id, row_ids, cols)| gen_update(id, row_ids, cols))
```
State-dependent, zero rejection, precise generation. In hegel I had to write:

```rust
let device_id = tc.draw(generators::integers::<usize>().max_value(num_devices - 1));
let row_idx = tc.draw(generators::integers::<usize>());
// ... later, after reading state:
tc.assume(!live_rows.is_empty());
let row_id = live_rows[row_idx % live_rows.len()];
```
What I think would help
Hypothesis (Python) solves this with `@precondition` decorators on rules (which gate selection before any draws happen) and Bundles (state-dependent pools the engine understands). Something analogous in hegel-rust would be transformative for stateful testing:
- `#[precondition]` on rules: a predicate checked before the rule is selected, so ineligible rules never fire and never waste test budget. This avoids the `assume()` rejection tax entirely.
- `Variables<T>` with state-dependent draw semantics: `Variables` already exists in hegel-rust, but if the engine could draw from variables without the flaky-test detection treating variable-length pools as non-deterministic draws, rules could use state-dependent collections naturally.
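Purely to illustrate the shape being proposed (this is hypothetical syntax, not an existing hegel-rust API; the attribute name and closure signature are invented for this sketch):

```rust
// Hypothetical syntax -- not a real hegel-rust API.
#[rule]
#[precondition(|state| state.devices.values().any(|d| d.is_pushing()))]
fn continue_push(&mut self, tc: &mut TestCase) {
    // Only reached when some device is mid-push, so the rule can draw
    // directly from the eligible pool with zero assume() rejections.
    // ...
}
```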
Without these, stateful testing in hegel-rust works for simple cases but hits a wall for models where transition eligibility is the core complexity — which is exactly where stateful PBT is most valuable.
- No weighted rule selection
In proptest-state-machine I used explicit weights to shape the transition distribution toward realistic scenarios.
This produced test traces that looked like real usage — sync sessions were frequent and usually completed, schema changes were rare, and local mutations were steady.
```
--- Transition Distribution (1006 cases, 553445 transitions) ---
ContinuePull    112069  ( 20.2%)
ContinuePush    107586  ( 19.4%)
Update          100972  ( 18.2%)
Insert           52227  (  9.4%)
Delete           50588  (  9.1%)
NewSchema        50008  (  9.0%)
UpgradeSchema    43704  (  7.9%)
StartPush        19691  (  3.6%)
StartPull        16600  (  3.0%)

--- Session Completions ---
StartPush  19691 started, 18414 completed ( 93.5%)
StartPull  16600 started, 14977 completed ( 90.2%)
```
With hegel's uniform rule selection, the distribution is dominated by rules that are always eligible. From a 1000-case run:
```
--- Transition Distribution (1009 cases, 48323 transitions) ---
Insert           8727  ( 18.1%)
NewSchema        8240  ( 17.1%)
Delete           8075  ( 16.7%)
Update           7776  ( 16.1%)
UpgradeSchema    5628  ( 11.6%)
ContinuePush     5040  ( 10.4%)
StartPush        2724  (  5.6%)
StartPull        1065  (  2.2%)
ContinuePull     1048  (  2.2%)

--- Session Completions ---
StartPush  2724 started,  767 completed ( 28.2%)
StartPull  1065 started,  464 completed ( 43.6%)
```
`NewSchema` and `UpgradeSchema` fire ~29% of the time combined; in reality these are rare events. Meanwhile `StartPull`/`ContinuePull` are starved at ~4% combined because they require specific preconditions and get filtered by `assume()`. The result: test traces are dominated by schema churn and don't exercise the sync protocol deeply, and push/pull sessions rarely complete.
This compounds with the precondition issue: rules that need preconditions are already penalised by `assume()` rejection, and they get no priority boost to compensate. A weight parameter on `#[rule]` (or, equivalently, the ability to control rule selection probability) would let users tune the distribution back toward realistic scenarios.
Happy to discuss further, or to provide a cut-down version of the full port as a concrete example of the friction.