Improve randomness usage when creating workloads #147
Merged
Skill Context Report
The skill's randomness guidance covered determinism (route everything through the SDK so it replays) but not the shape of the random distributions themselves. Workloads that hardcoded probabilities and action weights as constants ended up with every timeline exploring the same average mix, rarely reaching the deep skewed states where many bugs live. The new guidance is to draw those probabilities and weights themselves at the start of each timeline, biased toward the extremes including zero. It lives under "Vary randomness across timelines" in test-commands.md, with a sibling bullet in the Guidance checklist and a pointer added from iteration.md's "Common Improvements".
bc6a09c to cc5fdd2
BillGrieser-Antithesis added a commit that referenced this pull request on May 15, 2026

Brings in PRs #146 (snouty launch consistency), #147 (workload randomness guidance), #148 (interesting input values reference), #150 (workload fault-tolerance mindset), plus version metadata bumps.

Conflict resolutions:
- antithesis-debug/SKILL.md: version metadata; kept origin/main's freshest value (auto-regenerated anyway).
- antithesis-launch/SKILL.md: kept feat/triage's more detailed webhook default rule (basic_test for docker-compose, basic_k8s_test for kubernetes) over main's older, shorter rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BillGrieser-Antithesis added a commit that referenced this pull request on May 15, 2026

Brings in PRs #146/#147/#148/#150 from main (snouty launch consistency, workload randomness / interesting-values / fault-tolerance guidance) plus version metadata bumps. fix/debug-skill-moment-and-refs' own commits are preserved unchanged.

Single conflict resolved: antithesis-debug/SKILL.md version metadata; took origin/main's freshest value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The workload skill now incorporates the lesson from swarm testing: when every test run draws actions uniformly from the same broad menu, runs converge toward the same average mix and rarely reach the deep states where many bugs live. If each individual run is narrowly biased instead — restricted to a subset of actions, with skewed probabilities — and the biases vary across runs, each run goes deep into one corner of the state space, and the swarm covers far more ground than uniform mixing ever does.
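A toy simulation can make the convergence argument concrete. The sketch below is illustrative only and is not taken from the skill: it models a hypothetical push/pop workload, and the step count, target depth, and bias values are arbitrary assumptions. It counts how many runs reach a deep queue when every run uses the same 50/50 mix, compared with runs that each draw their own push probability once.

```python
import random

def reaches_depth(push_prob, rng, steps=200, target=50):
    """Run one push/pop workload; report whether the queue ever holds `target` items."""
    depth = 0
    for _ in range(steps):
        if rng.random() < push_prob:
            depth += 1          # push
        elif depth > 0:
            depth -= 1          # pop (a no-op on an empty queue)
        if depth >= target:
            return True
    return False

def count_deep_runs(runs, rng, draw_push_prob):
    """Count runs that reach the deep state, drawing the push probability once per run."""
    return sum(reaches_depth(draw_push_prob(rng), rng) for _ in range(runs))

def fixed_mix(rng):
    # The same hardcoded mix in every run.
    return 0.5

def per_run_bias(rng):
    # A different, often extreme, mix per run (values are assumptions).
    return rng.choice([0.0, 0.1, 0.5, 0.9, 1.0])

if __name__ == "__main__":
    rng = random.Random(0)
    print("fixed mix:   ", count_deep_runs(500, rng, fixed_mix))
    print("per-run bias:", count_deep_runs(500, rng, per_run_bias))
```

With these assumed parameters, the fixed mix should almost never reach the target depth, while the runs that happened to draw a high push probability climb to it steadily. Runs that drew 0.0 never push at all, which exercises the empty-queue path instead.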
In Antithesis terms: if your workload's probabilities and action weights are baked into the code, every timeline mixes operations in roughly the same proportions. You rarely hit the deep skewed states — long runs of the same action, whole categories never appearing — where a lot of bugs actually live. The new guidance is to draw those probabilities and weights themselves at the start of each timeline, biased toward the extremes including zero (which omits a class of action entirely from that timeline).
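As a minimal sketch of what drawing the weights per timeline could look like: the action names, the bucket boundaries, and the helper names below are all hypothetical, and `random.Random` is only a stand-in. In a real Antithesis workload the entropy would need to come from the Antithesis SDK's random source so the timeline replays deterministically.

```python
import random

# Hypothetical action names, for illustration only.
ACTIONS = ["insert", "update", "delete", "read"]

def draw_extreme_weight(rng):
    """Draw one weight biased toward the extremes, including exactly zero.

    The bucket boundaries are arbitrary assumptions, not values from the skill.
    """
    roll = rng.random()
    if roll < 0.3:
        return 0.0                     # omit this action for the whole timeline
    if roll < 0.6:
        return rng.uniform(0.01, 0.1)  # rare
    if roll < 0.8:
        return rng.uniform(0.1, 1.0)   # moderate
    return rng.uniform(1.0, 10.0)      # dominant

def draw_timeline_weights(rng, actions=ACTIONS):
    """Draw one weight per action, once, at the start of a timeline."""
    weights = {name: draw_extreme_weight(rng) for name in actions}
    if all(w == 0.0 for w in weights.values()):
        # Keep at least one action enabled so the timeline does something.
        weights[rng.choice(actions)] = 1.0
    return weights

def pick_action(rng, weights):
    """Pick the next action using the weights fixed for this timeline."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

if __name__ == "__main__":
    rng = random.Random()  # stand-in only; use the SDK's random source in a real workload
    weights = draw_timeline_weights(rng)
    print(weights)
    print([pick_action(rng, weights) for _ in range(10)])
```

The point of the design is that the weights are drawn once and then held fixed for the rest of the timeline, so a timeline that drew zero for `delete` never deletes, and one that drew a dominant weight for `insert` produces long runs of inserts.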
The advice lives under "Vary randomness across timelines" in the workload skill's test-commands reference, with a sibling bullet in the Guidance checklist so it surfaces at self-review.