Add publication-backed causal graph to Long COVID #371

Open
kevinschaper wants to merge 1 commit into main from add-long-covid-causal-graph

@kevinschaper
Member

Summary

  • Adds a 37-edge causal graph to the Long COVID entry, connecting 38 existing nodes (11 pathophysiology, 21 phenotypes, 2 biochemical, 4 treatments)
  • Every edge was audited against cited PubMed abstracts and classified as STRONG, MODERATE, WEAK, or WRONG
  • 14 formerly unsupported edges were upgraded with new references (exact snippets verified as substring matches against cached abstracts)
  • 2 edges removed after audit:
    • Orthostatic Tachycardia → Fatigue: Kell et al. 2024 (PMID:38392604) shows both are downstream of microclots (common cause), not A→B
    • Sleep Disturbance → Fatigue: only associative evidence available, no causal support found
  • 15 new reference cache files added
  • Deep research output from falcon provider included
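
The substring check described in the list above could be sketched as follows. This is a minimal illustration and not the project's validator: the function name `snippet_in_abstract` and the whitespace normalization are assumptions, and the abstract text is a neutral placeholder.

```python
import re


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def snippet_in_abstract(snippet: str, abstract: str) -> bool:
    """Return True if the snippet occurs verbatim in the abstract.

    Matching is case-sensitive; only whitespace is normalized (an assumption).
    An empty snippet is rejected, since it would trivially match anything.
    """
    needle = _normalize(snippet)
    if not needle:
        return False
    return needle in _normalize(abstract)


# Placeholder text, not a real abstract.
abstract = "Example abstract sentence one.\nExample abstract   sentence two."
print(snippet_in_abstract("abstract sentence two", abstract))
print(snippet_in_abstract("abstract sentence three", abstract))
```

Exact matching keeps paraphrased or reworded quotes from passing as citations; normalizing only whitespace is the smallest relaxation that tolerates line wrapping in cached files.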

Test plan

  • just validate kb/disorders/Long_COVID.yaml passes (note: pre-existing creation_date/updated_date schema issue)
  • just validate-references kb/disorders/Long_COVID.yaml — all snippets match cached abstracts
  • uv run python -m dismech.render kb/disorders/Long_COVID.yaml generates HTML with Mermaid causal graph
  • Visual inspection of rendered causal graph shows expected topology
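
For readers unfamiliar with Mermaid, the sketch below shows how a list of causal edges can map onto Mermaid flowchart text. It is an illustration only and not the `dismech.render` implementation, which is not shown in this PR; `to_mermaid` is a hypothetical helper, and the two edges are taken from the edges discussed in this conversation.

```python
def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render (source, target) label pairs as a top-down Mermaid flowchart.

    Labels are assumed not to contain double quotes.
    """
    ids: dict[str, str] = {}
    lines = ["graph TD"]
    # Declare each node once, with a short generated id and a quoted label.
    for source, target in edges:
        for label in (source, target):
            if label not in ids:
                ids[label] = f"n{len(ids)}"
                lines.append(f'    {ids[label]}["{label}"]')
    # Emit one arrow per causal edge.
    for source, target in edges:
        lines.append(f"    {ids[source]} --> {ids[target]}")
    return "\n".join(lines)


print(to_mermaid([
    ("Viral Persistence", "Mast Cell Activation"),
    ("Mast Cell Activation", "Neuroinflammation"),
]))
```

Generated ids keep node labels with spaces or punctuation out of the arrow lines, which makes the topology easier to inspect by eye.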

🤖 Generated with Claude Code

Build a publication-backed causal graph connecting the 38 existing nodes
(11 pathophysiology, 21 phenotypes, 2 biochemical, 4 treatments) via
downstream and sequelae edges. Every edge was audited against cited
abstracts; 14 edges that lacked direct support were upgraded with new
references verified as exact substring matches against PubMed abstracts.
Two edges were removed after audit: Orthostatic Tachycardia → Fatigue
(literature shows common cause via microclots, not A→B) and Sleep
Disturbance → Fatigue (only associative evidence available).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Contributor

claude bot commented Feb 18, 2026

Claude finished @kevinschaper's task in 2m 2s


PR Review: Add publication-backed causal graph to Long COVID

  • Read Long COVID YAML file and causal graph structure
  • Review reference cache files for snippet accuracy
  • Check biological/clinical validity of causal edges
  • Check guideline conformance
  • Run validation checks
  • Provide comprehensive review feedback

Summary

This PR adds a well-structured downstream / sequelae causal graph to the Long COVID entry, with 37 edges connecting 11 pathophysiology nodes, 21 phenotypes, 2 biochemical, and 4 treatment nodes. Overall, the curation quality is high: snippets are verified against cached abstracts, edge removals are scientifically justified, and new references fill real gaps. I have some observations organized by severity below.


Issues

Minor: Non-COVID-specific evidence used for mechanism edges

PMID:28668382 (Systemic Immune Activation → Impaired Mitochondrial Energy Production) is from a 2019 neurodegenerative disease review (Neurosci Lett) — not a Long COVID or COVID-19 paper. The snippet ("Cytokines, particularly tumor necrosis factor-alpha, impede mitochondrial oxidative phosphorylation...") accurately describes the general mechanism, but it's a class-level claim applied to non-COVID conditions. Similarly, PMID:24557875 (Metabolic Shift → Impaired Mitochondrial Energy Production feedback loop) appears to be a general mitochondrial biochemistry reference.

Recommendation: These are acceptable as mechanistic support, but the evidence_source should be set to COMPUTATIONAL or OTHER if they are reviews/mechanistic papers, rather than defaulting to HUMAN_CLINICAL. Currently these evidence items lack an explicit evidence_source field; per CLAUDE.md they will default to HUMAN_CLINICAL, which would be misleading for a neurodegenerative review paper.

Minor: PMID:15710782 is pre-COVID evidence for Mast Cell → Nausea edge

The Shibao et al. 2005 (Hypertension) paper describes MCA+POTS patients in a pre-COVID cohort. It demonstrates nausea as a mast cell symptom, so it supports the general MCA mechanism rather than Long COVID specifically. The edge is biologically valid, but the explanation field should flag the paper as mechanistic/translational context, or the evidence_source should be OTHER rather than HUMAN_CLINICAL.

Minor: Missing evidence_source on several new edges

Several newly added evidence items lack an explicit evidence_source field:

  • PMID:28668382 — should be OTHER (general neurological review, not COVID)
  • PMID:24557875 — should be OTHER (general biochemistry paper)
  • PMID:15710782 — could be HUMAN_CLINICAL but from pre-COVID POTS population

Per CLAUDE.md, evidence_source is mandatory to classify the type of evidence in the publication. Adding explicit values prevents ambiguity.
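
A check for this gap could look like the sketch below. It assumes the entry has already been loaded into plain Python dicts and lists, and it treats any mapping with a `reference` key as an evidence item; the key names `reference`, `evidence`, and `target`, and the function `missing_evidence_source`, are assumptions for illustration and not the confirmed schema.

```python
from typing import Any, Iterator


def missing_evidence_source(node: Any) -> Iterator[str]:
    """Yield the reference of every evidence item lacking evidence_source."""
    if isinstance(node, dict):
        # Assumption: a dict with a "reference" key is an evidence item.
        if "reference" in node and "evidence_source" not in node:
            yield node["reference"]
        for value in node.values():
            yield from missing_evidence_source(value)
    elif isinstance(node, list):
        for item in node:
            yield from missing_evidence_source(item)


# Hypothetical fragment shaped like one of the edges discussed above.
entry = {
    "pathophysiology": [
        {
            "name": "Systemic Immune Activation",
            "downstream": [
                {
                    "target": "Impaired Mitochondrial Energy Production",
                    "evidence": [{"reference": "PMID:28668382"}],
                }
            ],
        }
    ]
}
print(list(missing_evidence_source(entry)))
```

Walking the structure recursively means the check does not depend on where in the entry an evidence item is nested.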

Observation: Mast Cell → Tachycardia edge lacks evidence

The downstream edge from Mast Cell Activation → Tachycardia has no associated evidence item (the description says "Mast cell-derived histamine directly increases heart rate"). The adjacent Mast Cell → ANS Dysfunction edge does have evidence. The histamine-tachycardia mechanism is well established, but for consistency this edge should carry at least one reference; PMID:15710782 (Shibao 2005) would work, since it documents "high heart rate" in MCA+POTS patients.

Observation: Mast Cell → Diarrhea edge also lacks evidence

Similarly, Mast Cell Activation → Diarrhea has no evidence item. PMID:15710782 documents diarrhea as a characteristic GI symptom of MCA+POTS and would be appropriate here too.

Positive: Removed edges are scientifically sound

The rationale for removing two edges is well-documented:

  • Orthostatic Tachycardia → Fatigue: Kell et al. (PMID:38392604) correctly places both as downstream of microclots — the paper explicitly states "fibrinaloid microclots can serve to link POTS and fatigue in long COVID", confirming a common cause structure rather than A→B.
  • Sleep Disturbance → Fatigue: Removing an associative-only edge shows appropriate causal skepticism.

Biological/Clinical Validity

The causal graph topology is biologically coherent and well-supported:

| Edge | Assessment |
| --- | --- |
| Viral Persistence → Mast Cell Activation | Strong (PMID:35938771 - direct mechanism) |
| Systemic Immune Activation → Neuroinflammation | Strong (established pathway) |
| Vagus-Hippocampal Signaling → Neuroinflammation | Good (PMID:38018736 - vagal inflammatory reflex) |
| Mast Cell → Neuroinflammation | Strong (PMID:38638822 - in vitro mechanism, tight junction disruption) |
| Mast Cell → ANS Dysfunction | Good (PMID:37951572 - anatomical/mechanistic basis) |
| Sleep Disturbance → Cognitive Impairment | Good (PMID:38370438 - PASC-specific study) |
| Insomnia → Anxiety / Depression | Strong (PMID:30537570, PMID:27816065 - meta-analyses) |
| Serotonin Depletion → Depression | Sound mechanistic reasoning |
| Metabolic Shift ⇄ Impaired Mitochondrial Energy | Bidirectional feedback loop appropriately modeled |

The choice to use sequelae (from phenotype nodes) vs downstream (from pathophysiology nodes) is correctly applied throughout.


Guideline Conformance

  • Schema structure: downstream on pathophysiology nodes and sequelae on phenotype nodes are used correctly.
  • Evidence items: Snippets are exact quotes verified against cached abstracts (checked: PMID:35938771, PMID:28668382, PMID:38018736, PMID:37951572, PMID:38638822, PMID:30537570, PMID:27816065, PMID:38370438, PMID:36284805, PMID:37389095, PMID:35295802, PMID:15710782).
  • updated_date: Updated to 2026-02-18T01:15:54Z — appropriate.
  • evidence_source gaps: See issues above.
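
The downstream/sequelae convention noted in the list above lends itself to a mechanical check. The sketch below assumes the convention is exclusive (pathophysiology nodes never carry `sequelae`, phenotype nodes never carry `downstream`) and that the entry has top-level `pathophysiology` and `phenotypes` lists of named nodes; both are assumptions, as are the `target` key and the function name.

```python
def misplaced_edge_fields(entry: dict) -> list[str]:
    """List nodes that carry the edge field meant for the other node type."""
    problems = []
    for node in entry.get("pathophysiology", []):
        if "sequelae" in node:
            problems.append(f"{node['name']}: sequelae on pathophysiology node")
    for node in entry.get("phenotypes", []):
        if "downstream" in node:
            problems.append(f"{node['name']}: downstream on phenotype node")
    return problems


# Hypothetical fragment that follows the convention.
entry = {
    "pathophysiology": [
        {"name": "Mast Cell Activation",
         "downstream": [{"target": "Neuroinflammation"}]},
    ],
    "phenotypes": [
        {"name": "Sleep Disturbance",
         "sequelae": [{"target": "Cognitive Impairment"}]},
    ],
}
print(misplaced_edge_fields(entry))
```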

Completeness

The graph covers the major Long COVID pathophysiology pathways well. A few downstream connections that could be considered for future enhancement (not blocking):

  • Serotonin Depletion → Fatigue: Serotonin's role in peripheral fatigue is increasingly recognized in Long COVID, and the Bhatt et al. (PMID:37848036) paper mentions serotonin reduction contributing broadly to PASC symptoms.
  • Neuroinflammation → Sleep Disturbance: A pathway from neuroinflammation to disrupted sleep is biologically plausible and commonly cited in Long COVID literature, currently absent from the graph.

Recommendation

Approve with minor suggestions. The core work is solid — verified snippets, justified edge removals, good source diversity, and coherent graph topology. Before merging:

  1. Add evidence_source to the two non-COVID general review references (PMID:28668382, PMID:24557875).
  2. Consider adding evidence items to the two unevidenced edges (Mast Cell → Tachycardia, Mast Cell → Diarrhea).

@cmungall
Member

@claude I've seen a bit of churn lately around fields like last update and creation date. I'm not sure I ever intended these to be added, as I think this kind of audit is a separate concern. At some point there was a disconnect between what CLAUDE.md was saying, what the schema said, and what we are doing, and this PR may be out of sync. Sorry to give you such an underspecified, tedious task, but I'm sure you will take it in your stride.

@claude
Contributor

claude bot commented Feb 21, 2026

Claude finished @cmungall's task


Analysis of creation_date/updated_date field inconsistencies

The Disconnect Issue

I've identified the core issue you're referring to. There's a three-way mismatch:

1. Schema Reality

  • src/dismech/schema/dismech.yaml: No creation_date or updated_date fields defined
  • Files with these fields technically fail schema validation

2. CLAUDE.md Instructions

  • Lines 109-122: Explicit instructions to add these fields to every Disease entry
  • Detailed rules about ISO 8601 format, stable creation_date, updated updated_date

3. Current Practice

  • 272 of 390 files (70%) already have these fields added
  • Recent commits from January 2026 show active work on date features
  • This PR follows CLAUDE.md by including both fields
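
A count like the one above can be reproduced with a short script. This is a sketch and not the method used for the figures in this comment: it assumes the entries are `*.yaml` files in a single directory (for example `kb/disorders`, inferred from the path in the test plan) and that the date fields appear as top-level keys at the start of a line.

```python
import re
from pathlib import Path

# Top-level key at the start of a line (an assumption about file layout).
DATE_KEY = re.compile(r"^(creation_date|updated_date):", re.MULTILINE)


def count_files_with_dates(root: Path) -> tuple[int, int]:
    """Return (files containing a date field, total YAML files) under root."""
    if not root.is_dir():
        return 0, 0
    files = sorted(root.glob("*.yaml"))
    with_dates = sum(
        1 for path in files if DATE_KEY.search(path.read_text(encoding="utf-8"))
    )
    return with_dates, len(files)


if __name__ == "__main__":
    with_dates, total = count_files_with_dates(Path("kb/disorders"))
    share = f"{with_dates / total:.0%}" if total else "n/a"
    print(f"{with_dates} of {total} files ({share}) have date fields")
```

A text scan avoids a YAML parser dependency, at the cost of missing fields that are indented or written in an unusual style.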

Recommendations for Alignment

Option A: Remove from CLAUDE.md

  • Delete the "Entry Metadata Dates" section (lines 109-122)
  • Keep schema as-is (no date fields)
  • Treat dates as a separate audit concern outside the core data model

Option B: Add to Schema

  • Add optional creation_date and updated_date fields to the Disease class
  • Keep CLAUDE.md instructions
  • Allows validation to pass for files with dates

Option C: Gradual Removal

  • Remove CLAUDE.md instructions now
  • Leave existing date fields in files for now, accepting that they remain outside the schema (see point 1)
  • Let them naturally disappear through normal curation

This PR's Status

The Long COVID changes are following CLAUDE.md instructions, but the broader inconsistency needs resolution first. The causal graph work itself is solid.

