You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Surface open questions in property catalog overview (#140)
A per-property Open Questions list under each catalog entry lets reviewers
see at a glance which properties carry unresolved uncertainty, without
opening every evidence file. Property prose stays at the level of
generality the analysis supports; the list does the work of flagging
what's not yet pinned down. Discovery, evaluation, and workload flows
populate, sync, and read the list. The previous self-review criterion
required all open questions to be resolved — for projects with thin
documentation that's not realistic, so the criterion now requires
visibility plus an auditable investigation log instead.
Copy file name to clipboardExpand all lines: antithesis-research/SKILL.md
+2-1Lines changed: 2 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -156,7 +156,8 @@ Review criteria:
156
156
- Properties that need internal branch guidance or replay anchors call out likely SUT-side instrumentation points, not just workload-visible checks
157
157
-`antithesis/scratchbook/deployment-topology.md` exists and describes a minimal container topology — every container is justified
158
158
- Every cataloged property has a corresponding evidence file at `antithesis/scratchbook/properties/{slug}.md` that captures the evidence trail, relevant code paths, and key observations
159
-
- Open questions in evidence files have been investigated and resolved — no unresolved questions remain in evidence files for active properties
159
+
- Each property's catalog entry includes an Open Questions list that mirrors the unresolved questions from its evidence file (short summaries; full context stays in the evidence file). Resolved questions are removed; remaining questions are tagged per `references/property-catalog.md` ("Open Questions Conventions").
160
+
- Each `(partial: ...)` or `(needs human input)` tag is backed by an investigation log entry in the evidence file (see `references/property-catalog.md` "Investigation Log") that records what was examined, what was found, and what wasn't. Untagged (not yet investigated) questions don't require a log entry. Unresolved questions that need human input are an acceptable outcome — they must be visible in the catalog overview, not buried in evidence files.
160
161
- Properties invalidated by investigation are marked in the catalog with the reason
161
162
-`antithesis/scratchbook/property-relationships.md` exists and groups related properties into clusters with brief notes on suspected connections and dominance
162
163
- Every property slug referenced in `property-relationships.md` corresponds to a property in the catalog
Copy file name to clipboardExpand all lines: antithesis-research/references/property-catalog.md
+83Lines changed: 83 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -59,10 +59,70 @@ For each property, document:
59
59
| **Invariant** | What Antithesis SDK assertion(s)? Be specific about what is checked and how. Include why that assertion type matches the property semantics. |
60
60
| **Antithesis Angle** | How does fault injection interact with this? What timing/interleaving does it explore? |
61
61
| **Why It Matters** | Real-world impact. Link to issues if applicable. |
62
+
63
+
**Open Questions:**
64
+
65
+
- Bullet list of unresolved questions about this property, or "None" if no questions remain. See "Open Questions Conventions" below for state tags and rules.
62
66
```
63
67
64
68
For every property in the catalog, write a corresponding evidence file at `properties/{slug}.md`. See the "Evidence Files" section below.
65
69
70
+
## Honest Summaries
71
+
72
+
A property's catalog entry — its prose fields *and* its Open Questions list together — is the honest summary of what the analysis knows. The fields describe what the property is and why; the Open Questions list describes what's uncertain about it. Both parts do work. Don't overload one to compensate for the other.
73
+
74
+
In particular, don't write a firm claim in **Property**, **Invariant**, **Antithesis Angle**, or **Why It Matters** that the Open Questions list contradicts. That's internally inconsistent, and a reviewer who notices will distrust the rest of the catalog. The fix is to write the prose at the level of generality the analysis actually supports — not to stuff parenthetical hedges into the field. If a key term in the property's prose has an unresolved meaning, the term can stay; the question goes in the Open Questions list and the reader knows to treat the term parametrically.
75
+
76
+
The assertion type in **Invariant** (`Always`, `Sometimes(cond)`, etc.) is the one place where there's no parametric option: it's a discrete commitment. If you can't decide between two assertion types, the property isn't ready for the catalog — pick one (using "Choosing the Right Antithesis Assertion" below) or split into two properties.
77
+
78
+
**Example.** The analysis identified an open question about what "acknowledged" means in this codebase but pinned down everything else about the property.
79
+
80
+
Bad — the catalog entry hides the uncertainty. The reviewer sees a firm claim and assumes a specific definition of "acknowledged" that the analysis didn't actually verify:
Good — same prose, but the Open Questions list does the honest work. The reader sees that "acknowledged" is contested and treats the property's claim parametrically:
- What does "acknowledged" mean in this codebase — fsync'd to leader, or replicated to quorum? `(needs human input)`
98
+
```
99
+
100
+
The Property prose is identical in both. The honest summary is the entry as a whole, including the list.
101
+
102
+
## Open Questions Conventions
103
+
104
+
The Open Questions list under each property gives a reviewer scanning the catalog at-a-glance signal for which properties have unresolved uncertainty. It complements the evidence file; it does not replace it.
105
+
106
+
-**Summary, not verbatim:** each bullet is a short summary of one unresolved question. The full "why it matters / what changes depending on the answer" context lives in the evidence file (see "Evidence Files" below). The list points at the evidence file; it doesn't duplicate it.
107
+
-**State tags:** tags describe how much investigation has happened — not what kind of question it is. Different kinds of questions deserve different follow-up (some need code reading, some need a stakeholder), but the list tracks status only.
108
+
- No tag — not yet investigated. This is the default at discovery output time.
109
+
-`(partial: <one-line summary>)` — investigation produced some findings but the core question remains open. Example: `(partial: confirmed write path uses fsync, replication path still unclear)`.
110
+
-`(needs human input)` — investigation completed and the question can't be resolved without information from a human (project owner, domain expert, etc.). The evidence file's investigation log records what was examined and what's missing.
111
+
-**Exhaust investigation before tagging `(needs human input)`:** the tag means "I tried, the code/docs don't have the answer." The self-review criterion in `SKILL.md` requires the evidence file's investigation log to back this up. Skipping investigation and immediately tagging gets caught at review.
112
+
-**Resolved questions drop out:** once a question is answered, remove it from the list. If the answer changes the property — different invariant, different assertion type, different scope — update the affected catalog fields accordingly. Even if the prose was already written parametrically (per "Honest Summaries"), the resolution often makes specific terms more precise; revise the prose if the new precision changes what the field claims.
113
+
-**Priority is independent of open questions:** a high-priority property may have many open questions; a low-priority property may have none. Open Questions surface uncertainty about a property's claims; they don't lower its value. Don't deprioritize a property because its Open Questions list is busy.
114
+
-**Per-property vs. catalog-wide:** this list is per-property — it holds questions about a specific property's claims. Catalog-wide questions (about the analysis itself, the deployment topology, the scope of testing) belong under the file-level `Open Questions` heading per `references/scratchbook-setup.md`.
115
+
116
+
Example of a populated list (one bullet per state — untagged, partial, needs-human-input):
117
+
118
+
```
119
+
**Open Questions:**
120
+
121
+
- Does the snapshot path retry on a partial write, or surface the error?
122
+
- Is the post-restart index always monotonic? `(partial: confirmed for clean shutdown, crash path unclear)`
123
+
- What does "acknowledged" mean in this codebase — fsync'd to leader, or replicated to quorum? `(needs human input)`
124
+
```
125
+
66
126
## Choosing the Right Antithesis Assertion
67
127
68
128
Antithesis assertions are not traditional crash-on-failure assertions. Failed Antithesis assertions do not terminate the program; they report property outcomes and guide exploration. That makes them safe to place in workload code and, when useful, production code paths. Outside Antithesis, SDKs are designed for minimal overhead and many languages support build-time or runtime disable modes.
@@ -119,6 +179,29 @@ When you encounter questions you can't answer within the scope of discovery, inc
119
179
120
180
The goal is to preserve what you already know from your analysis. Don't do additional deep research for the evidence file — capture what you learned during discovery.
121
181
182
+
### Investigation Log
183
+
184
+
When an open question is investigated (per "Investigate Open Questions" in `references/property-discovery.md`), record what was examined under an `### Investigation Log` heading at the bottom of the evidence file. Use one `####` sub-heading per question (verbatim or close-paraphrase of the question text), with the body recording what was examined, found, and not found:
185
+
186
+
```
187
+
### Investigation Log
188
+
189
+
#### What does "acknowledged" mean — fsync'd to leader, or replicated to quorum?
- Found: leader writes to local WAL with fsync before responding; replication is async by default.
193
+
- Not found: whether `--sync-replicate` flag changes the response point. The flag is referenced in tests but its semantics aren't documented.
194
+
- Conclusion: tagged `(needs human input)` — owner needs to confirm intended behavior under `--sync-replicate`.
195
+
```
196
+
197
+
A few rules for the log:
198
+
199
+
-**Persistence.** Log entries stay in the evidence file after the question resolves — they are audit trail, not working memory. They prove the criterion was met. A future reviewer or a re-investigation pass can read them.
200
+
-**Multi-pass investigation.** If a question is re-investigated (e.g., during evaluation refinement or post-triage iteration), append a new section under the same `####` heading rather than rewriting. Date or context the new entry so the sequence is clear.
201
+
-**Required vs. illustrative fields.** The Examined / Found / Not found / Conclusion structure is the recommended template; deviate when the question doesn't fit. The reviewer's bar is "I can tell what was tried and what was missing," not "I can tick four labeled fields."
202
+
203
+
This makes the "attempted" check in `SKILL.md` self-review verifiable: a reviewer can confirm that each `(partial: ...)` or `(needs human input)` tag has a real investigation behind it.
204
+
122
205
## Output
123
206
124
207
Write the catalog to `antithesis/scratchbook/property-catalog.md`. Include the provenance frontmatter (see "Provenance" section above).
Copy file name to clipboardExpand all lines: antithesis-research/references/property-discovery.md
+48-22Lines changed: 48 additions & 22 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -178,7 +178,18 @@ already exists in the codebase, is partially present, or is missing.
178
178
179
179
**Returned to synthesis:**
180
180
181
-
- Properties in catalog format (may be empty)
181
+
- Properties in catalog format (may be empty). Populate the Open Questions list
182
+
directly from the questions written into the evidence file, following
183
+
`references/property-catalog.md` ("Open Questions Conventions"). At this
184
+
stage, questions are typically untagged — they haven't been investigated
185
+
yet. The synthesis "Investigate Open Questions" step adds tags.
186
+
- If a question you raised would contradict a prose claim you'd otherwise
187
+
write in **Property**, **Invariant**, **Antithesis Angle**, or **Why It
188
+
Matters**, write the prose at the level of generality the analysis
189
+
supports — don't overcommit to an answer the Open Questions list says is
190
+
uncertain. The list does the work of surfacing uncertainty; the prose
191
+
doesn't need parenthetical hedges. See `references/property-catalog.md`
192
+
("Honest Summaries").
182
193
- For each property, the evidence file slug and a brief rationale summary —
183
194
enough for synthesis to deduplicate, resolve conflicts, and evaluate assertion
184
195
type, but not the full evidence narrative
@@ -212,21 +223,24 @@ After all agents complete, synthesize into a single property catalog:
212
223
evidence, code paths, or failure mechanisms into clusters. Note any suspected
213
224
dominance (where one property likely implies another). This is lightweight —
214
225
flag connections you noticed during synthesis, don't do deep analysis.
215
-
-**Investigate open questions:** For each property that has open questions in
216
-
its evidence file, spawn an agent to investigate. Run these in parallel — one
217
-
agent per property with open questions. Each agent receives the path to the
218
-
evidence file in the scratchbook, codebase access, and the path to
219
-
`antithesis/scratchbook/existing-assertions.md`. The agent reads the evidence
220
-
file's explanation of why each question matters, investigates the code to
221
-
answer it, and writes the updated evidence file back to the same path with
222
-
questions resolved — including correcting any instrumentation suggestions
223
-
against `existing-assertions.md` to accurately reflect what already exists vs.
224
-
what is missing. The agent returns a short structured summary (not the full
225
-
evidence file) describing: which questions were resolved, whether the property
226
-
changed (different invariant, different assertion type), and whether the
227
-
property was invalidated with the reason. After all agents return, review the
228
-
summaries. If any property was invalidated or changed, update the catalog and
229
-
re-evaluate property relationships accordingly.
226
+
-**Investigate open questions:** see "Investigate Open Questions" below.
227
+
228
+
#### Investigate Open Questions
229
+
230
+
For each property that has open questions in its evidence file, spawn an agent to investigate. Run these in parallel — one agent per property with open questions.
231
+
232
+
Each agent receives the path to the evidence file, codebase access, and the path to `antithesis/scratchbook/existing-assertions.md`.
233
+
234
+
Each agent must:
235
+
236
+
- Read the evidence file's explanation of why each question matters and investigate the code to answer it.
237
+
- Not fabricate answers. If a question can't be resolved from code, docs, or other available evidence, leave it open with partial findings noted. Tag `(partial: ...)` when there are partial findings; tag `(needs human input)` only after exhausting investigation against code and docs. Some questions legitimately need human input — that is a normal outcome, not a failure of the step.
238
+
- Record an investigation log in the evidence file under an `### Investigation Log` heading (see `references/property-catalog.md` "Investigation Log") so the "attempted" check in `SKILL.md` self-review is auditable.
239
+
- Update the evidence file: remove resolved questions, note partial findings, apply human-input tags. Correct any instrumentation suggestions against `existing-assertions.md` to reflect what already exists vs. what is missing.
240
+
241
+
Each agent returns a short structured summary (not the full evidence file) describing: which questions were resolved, which remain (and in what state), whether the property changed (different invariant, different assertion type), and whether the property was invalidated with the reason.
242
+
243
+
After all agents return, review the summaries. Update the catalog whenever a property is invalidated, changed, or has remaining unresolved questions — sync the Open Questions list under each affected property from the evidence file, following `references/property-catalog.md` ("Open Questions Conventions"). If a resolution makes a parametric term in the prose more specific, refine the prose field accordingly. Re-evaluate property relationships if any property changed or was invalidated.
230
244
231
245
## Single-Agent Mode
232
246
@@ -258,12 +272,24 @@ focuses as a sequential checklist:
258
272
lightweight — flag connections you noticed during the passes, don't do deep
259
273
analysis.
260
274
9. For each property with open questions in its evidence file, investigate the
261
-
code to answer them. Read the evidence file's explanation of why each
262
-
question matters, trace the code to answer it, and update the evidence file
263
-
with the answer and its implications — including correcting any instrumentation
264
-
suggestions against `existing-assertions.md`. If the answer changes the
265
-
property or invalidates it, update accordingly. Mark invalidated properties
266
-
in the catalog with the reason.
275
+
code to answer them:
276
+
277
+
- Read the evidence file's explanation of why each question matters and
278
+
trace the code to answer it.
279
+
- Don't fabricate answers. Tag `(partial: ...)` when partial findings exist;
280
+
tag `(needs human input)` only after exhausting investigation against code
281
+
and docs.
282
+
- Record an investigation log in the evidence file under an `### Investigation Log` heading (see `references/property-catalog.md` "Investigation Log") so
283
+
the "attempted" check in `SKILL.md` self-review is auditable.
284
+
- Update the evidence file with resolved answers and their implications,
285
+
including correcting any instrumentation suggestions against
286
+
`existing-assertions.md`.
287
+
- Sync the Open Questions list under each property from the evidence file,
288
+
following `references/property-catalog.md` ("Open Questions Conventions").
289
+
If a resolution makes a parametric term in the prose more specific,
290
+
refine the prose field accordingly.
291
+
- If an answer changes the property or invalidates it, update accordingly.
292
+
Mark invalidated properties in the catalog with the reason.
267
293
268
294
Treat each pass (1–10) as a fresh examination. Resist the pull to skip a focus
269
295
because earlier passes "already covered" that area — the point is to look at the
0 commit comments