Commit aacd03c

Merge pull request #50 from OpenMined/madhava/schema-changes
adding schema changes and reporting
2 parents 5413694 + 7b95554 commit aacd03c

43 files changed

Lines changed: 6244 additions & 1676 deletions

AGENTS.md

Lines changed: 7 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -5,6 +5,11 @@
55
Keep first-party production Rust source files at or below 500 lines. This applies
66
to files under `rust/bioscript-*/src/**/*.rs`.
77

8+
When editing BioScript Rust, prefer adding behavior to a small, named module
9+
whose filename describes the responsibility. If a file is approaching 500 lines,
10+
split it along a real domain boundary before adding more code. Do not satisfy
11+
the guard by creating arbitrary numbered chunks or `*_part_*` files.
12+
813
The 500-line rule does not apply to:
914

1015
- integration tests and unit-test modules
@@ -16,6 +21,5 @@ production limit measures production code, not test scaffolding. Test files
1621
should still be split when they mix unrelated behavior or become hard to scan.
1722

1823
When a production file grows past 500 lines, split it before adding more
19-
behavior. Temporary exceptions must be listed in this file under
20-
`Current Refactor Backlog`; the source-size guard reads that list and fails when
21-
it drifts from the code.
24+
behavior. Keep the include list in the parent file short and logical, and leave
25+
file names meaningful enough that future agents can find the right place to edit.
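
The size rule above can be approximated with a short script. This is only a sketch, not the repository's actual source-size guard: the function name `oversized_rust_files` is hypothetical, and it counts every line in each file, so it does not implement the exemptions listed in this file (for example unit-test modules inside a production file).

```python
from pathlib import Path

LINE_LIMIT = 500  # limit stated in AGENTS.md


def oversized_rust_files(repo_root, limit=LINE_LIMIT):
    """Return (relative_path, line_count) for Rust files longer than `limit`.

    Hypothetical helper: it scans rust/bioscript-*/src/**/*.rs and counts raw
    lines, without applying any of the documented exemptions.
    """
    root = Path(repo_root)
    offenders = []
    for path in sorted(root.glob("rust/bioscript-*/src/**/*.rs")):
        with path.open(encoding="utf-8") as handle:
            line_count = sum(1 for _ in handle)
        if line_count > limit:
            offenders.append((path.relative_to(root).as_posix(), line_count))
    return offenders
```

Calling `oversized_rust_files(".")` from the repository root returns an empty list when every scanned file is at or below the limit.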

docs/assay-schema.md

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
1+
# Assay Schema
2+
3+
Use an assay when a named test observes one or more variants and emits custom derived report fields.
4+
5+
An assay is different from a panel: a panel is a collection of mostly independent observations, while an assay has its own interpretation logic. APOL1 is an assay because it observes G1/G2 sites and reports one derived APOL1 status.
6+
7+
## Schema Identity
8+
9+
```yaml
10+
schema: "bioscript:assay:1.0"
11+
version: "1.0"
12+
```
13+
14+
## Minimal Shape
15+
16+
```yaml
17+
schema: "bioscript:assay:1.0"
18+
version: "1.0"
19+
name: "APOL1"
20+
label: "APOL1 Risk Assay"
21+
tags:
22+
- "type:risk"
23+
- "gene:APOL1"
24+
25+
members:
26+
- kind: "variant"
27+
path: "g1-site-1.yaml"
28+
version: "1.0"
29+
- kind: "variant"
30+
path: "g1-site-2.yaml"
31+
version: "1.0"
32+
- kind: "variant"
33+
path: "g2-site.yaml"
34+
version: "1.0"
35+
36+
analyses:
37+
- id: "apol1_status"
38+
kind: "bioscript"
39+
path: "apol1.py"
40+
output_format: "tsv"
41+
label: "APOL1 risk genotype"
42+
derived_from:
43+
- "g1-site-1.yaml"
44+
- "g1-site-2.yaml"
45+
- "g2-site.yaml"
46+
emits:
47+
- key: "apol1_status"
48+
label: "APOL1 status"
49+
value_type: "string"
50+
format: "badge"
51+
logic:
52+
source:
53+
name: "Example derivation source"
54+
url: "https://example.org/assay-logic"
55+
description: >
56+
Optional human-readable description of the derivation logic implemented by the analysis script.
57+
```
58+
59+
## Members
60+
61+
Assay members are currently local variant YAML files:
62+
63+
```yaml
64+
- kind: "variant"
65+
path: "g1-site-1.yaml"
66+
version: "1.0"
67+
```
68+
69+
Rules:
70+
71+
- `kind` is required and currently must be `variant`
72+
- `path` is required
73+
- `version` is recommended
74+
- keep variant identity, coordinates, alleles, findings, and provenance in the variant YAML files
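
The member rules above can be checked mechanically. The following is a minimal sketch that assumes the manifest has already been parsed into plain Python dictionaries; `check_assay_members` is a hypothetical helper, not part of the BioScript runner.

```python
def check_assay_members(members):
    """Apply the documented assay member rules to a list of member dicts.

    Returns (errors, warnings). Hypothetical helper for illustration only.
    """
    errors = []
    warnings = []
    for index, member in enumerate(members):
        kind = member.get("kind")
        if kind is None:
            errors.append(f"members[{index}]: `kind` is required")
        elif kind != "variant":
            errors.append(f"members[{index}]: `kind` must be `variant`, got {kind!r}")
        if not member.get("path"):
            errors.append(f"members[{index}]: `path` is required")
        if "version" not in member:
            # `version` is recommended, so a missing value is only a warning.
            warnings.append(f"members[{index}]: `version` is recommended")
    return errors, warnings
```

Treating a missing `version` as a warning rather than an error mirrors the "recommended" wording in the rules.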
75+
76+
## Analyses
77+
78+
Use `analyses` for custom output derived from the member variants. The older `interpretations` key is accepted for compatibility, but new manifests should use `analyses`.
79+
80+
Rules:
81+
82+
- `id`, `kind`, `path`, and `derived_from` are required
83+
- `kind` is currently `bioscript`
84+
- `path` points to a BioScript-compatible Python file
85+
- `output_format` is optional and defaults to `tsv`; use `json` or `jsonl` when the script writes structured JSON output
86+
- `derived_from` lists the variant YAML files used by the analysis
87+
- `emits` is optional but recommended so report generators know which output columns to display and how to label them
88+
- `logic` is optional; use `logic.description` and `logic.source.url` to document where the script's derivation rules came from
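
As a sketch of how a loader might apply these rules, the function below reads `analyses`, falls back to the older `interpretations` key, checks the required fields, and fills in the `tsv` default. The name `normalize_analyses` and the choice to prefer `analyses` when both keys are present are assumptions, not documented runner behavior.

```python
REQUIRED_ANALYSIS_KEYS = ("id", "kind", "path", "derived_from")


def normalize_analyses(manifest):
    """Return analysis entries with defaults applied (hypothetical helper).

    Assumption: when both keys exist, `analyses` wins over `interpretations`.
    """
    entries = manifest.get("analyses")
    if entries is None:
        # The older key is accepted for compatibility.
        entries = manifest.get("interpretations", [])
    normalized = []
    for index, entry in enumerate(entries):
        missing = [key for key in REQUIRED_ANALYSIS_KEYS if key not in entry]
        if missing:
            raise ValueError(f"analyses[{index}] is missing {', '.join(missing)}")
        if entry["kind"] != "bioscript":
            raise ValueError(f"analyses[{index}]: unsupported kind {entry['kind']!r}")
        item = dict(entry)
        item.setdefault("output_format", "tsv")  # documented default
        normalized.append(item)
    return normalized
```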
89+
90+
## Findings
91+
92+
Use `findings` for evidence that binds either to a variant observation or an emitted analysis value. Keep the executable logic in `analyses`; keep PGx evidence and reporting semantics in YAML.
93+
94+
```yaml
95+
findings:
96+
- schema: "bioscript:pgx-label:1.0"
97+
id: "clinpgx_PA166313401"
98+
label: "ClinPGx drug label annotation PA166313401"
99+
authority_type: "regulatory_label"
100+
binding:
101+
source: "analysis"
102+
analysis_id: "apoe_epsilon"
103+
key: "apoe_status"
104+
operator: "equals"
105+
value: "e4/e4"
106+
drugs:
107+
- name: "lecanemab"
108+
aliases:
109+
- "LEQEMBI"
110+
evidence:
111+
source: "ClinPGx"
112+
kind: "label_annotation"
113+
id: "PA166313401"
114+
url: "https://www.clinpgx.org/labelAnnotation/PA166313401"
115+
notes: "Drug label annotation applies when APOE status is e4/e4."
116+
```
117+
118+
Binding rules:
119+
120+
- `source` is `analysis` or `variant`
121+
- `analysis` bindings require `analysis_id`, `key`, and either `operator: equals` with `value` or `operator: in` with `values`
122+
- `variant` bindings require `variant` or `path`, `key`, and either `operator: equals` with `value` or `operator: in` with `values`
123+
- PGx label findings use `schema: "bioscript:pgx-label:1.0"` and should include `regulatory_sources`, `pgx_action_level` or `prescribing_actions` when known
124+
- PGx summary findings use `schema: "bioscript:pgx-summary:1.0"` and should include `evidence_level`, `phenotype_categories`, and genotype-specific `effects` when known
125+
- PGx findings should include `drugs` and should link to the exact ClinPGx/PharmGKB/ClinVar evidence page
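
To illustrate the analysis binding rules, here is a small sketch that selects the findings whose binding matches a value emitted by an analysis. Both function names are hypothetical, the findings are assumed to be already parsed into dictionaries, and `variant` bindings are deliberately left out.

```python
def binding_matches(binding, observed):
    """Evaluate an `equals` or `in` binding against one observed value."""
    operator = binding.get("operator")
    if operator == "equals":
        return observed == binding["value"]
    if operator == "in":
        return observed in binding["values"]
    raise ValueError(f"unsupported operator: {operator!r}")


def findings_for_analysis(findings, analysis_id, emitted):
    """Return ids of findings bound to `analysis_id` that match `emitted`.

    `emitted` maps emitted keys to values, e.g. {"apoe_status": "e4/e4"}.
    Hypothetical helper; only `source: analysis` bindings are handled.
    """
    matched = []
    for finding in findings:
        binding = finding.get("binding", {})
        if binding.get("source") != "analysis":
            continue
        if binding.get("analysis_id") != analysis_id:
            continue
        key = binding["key"]
        if key in emitted and binding_matches(binding, emitted[key]):
            matched.append(finding["id"])
    return matched
```

With the example finding above, `findings_for_analysis(findings, "apoe_epsilon", {"apoe_status": "e4/e4"})` would return `["clinpgx_PA166313401"]`.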
126+
127+
## Inclusion In Panels
128+
129+
A larger panel may include an assay as a member:
130+
131+
```yaml
132+
members:
133+
- kind: "assay"
134+
path: "../risk/APOL1/assay.yaml"
135+
version: "1.0"
136+
```
137+
138+
When a panel includes an assay, the assay's variant observations can be expanded into the panel output, while report tooling can also run the assay's analyses and include their emitted fields.
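
One way to picture the expansion is the sketch below. It assumes that member paths inside an assay are resolved relative to the assay manifest's own directory and that a caller supplies `load_manifest`, a hypothetical function returning the parsed manifest for a path. Neither assumption is confirmed by this commit.

```python
import posixpath


def expand_panel_variants(panel_members, load_manifest):
    """Flatten panel members into an ordered list of variant paths.

    Assumption: assay member paths are relative to the assay manifest.
    `load_manifest` is a caller-supplied, hypothetical loader.
    """
    variant_paths = []
    for member in panel_members:
        if member["kind"] == "variant":
            variant_paths.append(posixpath.normpath(member["path"]))
        elif member["kind"] == "assay":
            assay = load_manifest(member["path"])
            base = posixpath.dirname(member["path"])
            for inner in assay.get("members", []):
                if inner["kind"] == "variant":
                    joined = posixpath.join(base, inner["path"])
                    variant_paths.append(posixpath.normpath(joined))
    return variant_paths
```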

docs/assay-schema.yaml

Lines changed: 66 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,66 @@
1+
schema: "bioscript:assay:1.0"
2+
version: "1.0"
3+
name: "APOL1"
4+
label: "APOL1 Risk Assay"
5+
summary: "APOL1 assay that observes G1 and G2 sites and emits the derived APOL1 risk genotype."
6+
tags:
7+
- "type:risk"
8+
- "gene:APOL1"
9+
10+
members:
11+
- kind: "variant"
12+
path: "g1-site-1.yaml"
13+
version: "1.0"
14+
- kind: "variant"
15+
path: "g1-site-2.yaml"
16+
version: "1.0"
17+
- kind: "variant"
18+
path: "g2-site.yaml"
19+
version: "1.0"
20+
21+
analyses:
22+
- id: "apol1_status"
23+
kind: "bioscript"
24+
path: "apol1.py"
25+
output_format: "tsv"
26+
label: "APOL1 risk genotype"
27+
derived_from:
28+
- "g1-site-1.yaml"
29+
- "g1-site-2.yaml"
30+
- "g2-site.yaml"
31+
emits:
32+
- key: "apol1_status"
33+
label: "APOL1 status"
34+
value_type: "string"
35+
format: "badge"
36+
logic:
37+
source:
38+
name: "Example derivation source"
39+
url: "https://example.org/assay-logic"
40+
description: >
41+
Optional human-readable description of the derivation logic implemented by the analysis script.
42+
43+
findings:
44+
- schema: "bioscript:pgx-label:1.0"
45+
id: "example_analysis_bound_pgx_finding"
46+
label: "Example analysis-bound PGx finding"
47+
authority_type: "regulatory_label"
48+
binding:
49+
source: "analysis"
50+
analysis_id: "apol1_status"
51+
key: "apol1_status"
52+
operator: "equals"
53+
value: "G2/G2"
54+
drugs:
55+
- name: "example drug"
56+
aliases:
57+
- "example brand"
58+
regulatory_sources:
59+
- "FDA"
60+
pgx_action_level: "Actionable PGx"
61+
evidence:
62+
source: "ClinPGx"
63+
kind: "label_annotation"
64+
id: "PA..."
65+
url: "https://www.clinpgx.org/labelAnnotation/PA..."
66+
notes: "Findings can bind to emitted analysis keys using equals or in."

docs/panel-schema.md

Lines changed: 65 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,8 @@
11
# Panel Schema
22

3-
Use a panel when you want one manifest that points to a curated set of runnable variant records.
3+
Use a panel when you want one manifest that points to a curated set of runnable variant records, assay manifests, and optional analysis scripts that derive report fields from those records.
44

5-
Right now the Rust runner supports variant members directly. Keep the shape simple.
5+
The Rust runner supports variant members directly. Test tooling can also run declared interpretation scripts and add their emitted fields to the generated report.
66

77
## Schema Identity
88

@@ -25,9 +25,26 @@ members:
2525
- kind: "variant"
2626
path: "variants/rs671.yaml"
2727
version: "1.0"
28+
- kind: "assay"
29+
path: "../risk/APOL1/assay.yaml"
30+
version: "1.0"
2831
- kind: "variant"
2932
path: "variants/rs713598.yaml"
3033
version: "1.0"
34+
35+
analyses:
36+
- id: "taste_status"
37+
kind: "bioscript"
38+
path: "interpretations/taste.py"
39+
output_format: "tsv"
40+
label: "Taste status"
41+
derived_from:
42+
- "variants/rs713598.yaml"
43+
emits:
44+
- key: "taste_status"
45+
label: "Taste status"
46+
value_type: "string"
47+
format: "badge"
3148
```
3249
3350
## Purpose
@@ -37,30 +54,74 @@ A panel is:
3754
- a selection manifest
3855
- a stable name for a bundle of variants
3956
- something the Rust `bioscript` command can run directly
57+
- a way to include smaller assay manifests in a broader bundle
58+
- a place to declare analyses that derive custom report fields from member variants
4059

4160
It is not:
4261

4362
- a full remote package manager
4463
- a replacement for richer assay manifests
64+
- a place to hide variant metadata inside Python when YAML can describe it
4565

4666
## Members
4767

48-
Each member must currently be:
68+
Each member must currently be a local variant or assay:
4969

5070
```yaml
5171
- kind: "variant"
5272
path: "variants/rs671.yaml"
5373
version: "1.0"
74+
- kind: "assay"
75+
path: "../risk/APOL1/assay.yaml"
76+
version: "1.0"
5477
```
5578

5679
Rules:
5780

5881
- `kind` is required
5982
- exactly one of `path` or `download` is required
60-
- current runner support is `variant` members only
83+
- current runner support is local `variant` and `assay` members
6184
- `version` is recommended for local members
6285
- `sha256` is optional for local members
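
The panel member rules can be sketched as a small check over parsed member dictionaries. `check_panel_members` is a hypothetical helper, and reporting unsupported kinds as errors is an assumption about how a runner might treat them.

```python
SUPPORTED_KINDS = ("variant", "assay")  # current runner support per the rules


def check_panel_members(members):
    """Return a list of rule violations for panel members (hypothetical)."""
    errors = []
    for index, member in enumerate(members):
        kind = member.get("kind")
        if kind is None:
            errors.append(f"members[{index}]: `kind` is required")
        elif kind not in SUPPORTED_KINDS:
            errors.append(f"members[{index}]: unsupported kind {kind!r}")
        has_path = "path" in member
        has_download = "download" in member
        if has_path == has_download:
            # Either both are present or both are missing.
            errors.append(
                f"members[{index}]: exactly one of `path` or `download` is required"
            )
    return errors
```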
6386

87+
## Analyses
88+
89+
Use `analyses` when a panel needs custom derived output that is not the same thing as a single variant observation. Examples include APOE epsilon genotype from rs429358/rs7412 or APOL1 G0/G1/G2 status from three sites. The older `interpretations` key is accepted for compatibility, but new manifests should use `analyses`.
90+
91+
```yaml
92+
analyses:
93+
- id: "apoe_epsilon"
94+
kind: "bioscript"
95+
path: "variants/APOE/apoe.py"
96+
output_format: "tsv"
97+
label: "APOE epsilon genotype"
98+
derived_from:
99+
- "variants/APOE/rs429358.yaml"
100+
- "variants/APOE/rs7412.yaml"
101+
emits:
102+
- key: "apoe_status"
103+
label: "APOE status"
104+
value_type: "string"
105+
format: "badge"
106+
logic:
107+
source:
108+
name: "ClinPGx / PharmGKB"
109+
url: "https://www.clinpgx.org/variant/PA166155341/overview"
110+
description: >
111+
Optional human-readable description of the derivation logic implemented by the analysis script.
112+
```
113+
114+
Rules:
115+
116+
- `id`, `kind`, `path`, and `derived_from` are required
117+
- `kind` is currently `bioscript`
118+
- `path` points to a BioScript-compatible Python file
119+
- `output_format` is optional and defaults to `tsv`; use `json` or `jsonl` when the script writes structured JSON output
120+
- `derived_from` lists the variant YAML files used by the analysis
121+
- `emits` is optional but recommended so report generators know which output columns to display and how to label them
122+
- `logic` is optional; use `logic.description` and `logic.source.url` to document where the script's derivation rules came from
123+
- keep variant identity, coordinates, alleles, findings, and provenance in YAML; keep cross-variant logic in the analysis script
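
As an illustration of the cross-variant logic an analysis script carries, here is a sketch of the APOE epsilon derivation from rs429358 and rs7412. It is not the `apoe.py` referenced in the manifest and does not use any BioScript API. It assumes plus-strand genotype calls given as two-letter strings, where C at rs429358 marks e4 and T at rs7412 marks e2, and it rejects combinations that would imply the rare haplotype carrying both alleles.

```python
def apoe_status(rs429358, rs7412):
    """Derive an APOE epsilon genotype such as "e3/e4" from two calls.

    Hypothetical helper. Inputs are unphased plus-strand calls like "CT".
    """
    calls = (rs429358.upper(), rs7412.upper())
    for call in calls:
        if len(call) != 2 or set(call) - {"C", "T"}:
            raise ValueError(f"unexpected genotype call: {call!r}")
    e4_count = calls[0].count("C")  # C at rs429358 tags e4
    e2_count = calls[1].count("T")  # T at rs7412 tags e2
    if e2_count + e4_count > 2:
        # More than two tagged alleles cannot be explained by e2/e3/e4 alone.
        raise ValueError("genotype combination is not resolvable to e2/e3/e4")
    e3_count = 2 - e2_count - e4_count
    alleles = ["e2"] * e2_count + ["e3"] * e3_count + ["e4"] * e4_count
    return "/".join(alleles)
```

The returned string uses the same form as the `apoe_status` value in the findings example (`e4/e4`), so a finding can bind to it with `operator: equals`.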
124+
64125
## Permissions And Downloads
65126

66127
Panels may declare remote downloads up front even if the current runner only executes local members.

docs/panel-schema.yaml

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,29 @@ members:
99
- kind: "variant"
1010
path: "variants/rs671.yaml"
1111
version: "1.0"
12+
- kind: "assay"
13+
path: "../risk/APOL1/assay.yaml"
14+
version: "1.0"
1215
- kind: "variant"
1316
path: "variants/rs713598.yaml"
1417
version: "1.0"
18+
19+
analyses:
20+
- id: "taste_status"
21+
kind: "bioscript"
22+
path: "interpretations/taste.py"
23+
output_format: "tsv"
24+
label: "Taste status"
25+
derived_from:
26+
- "variants/rs713598.yaml"
27+
emits:
28+
- key: "taste_status"
29+
label: "Taste status"
30+
value_type: "string"
31+
format: "badge"
32+
logic:
33+
source:
34+
name: "Example derivation source"
35+
url: "https://example.org/panel-analysis-logic"
36+
description: >
37+
Optional human-readable description of the derivation logic implemented by the analysis script.
