
idea: Automated evidence synthesis for NF Data Portal (papers + datasets → consensus summaries) #23

@changtotheintothemoon

Description

Problem

Users have to read dozens of NF (Neurofibromatosis) papers and datasets to answer a simple question. That’s slow and inconsistent. We want automation that ingests each publication/dataset we collect, extracts key findings, and generates auditable, citation-backed consensus answers to user questions.

Goal

Build a “Consensus-style” (https://consensus.app/home/about-us/) capability inside the NF portal:

  • Batch-analyze publications and datasets on ingest.
  • Let users ask scoped questions about NF and receive a graded, evidence-weighted summary with links to the underlying sources.
  • Support filters (study type, cohort size, NF subtype, outcome, patient population, etc.) and show uncertainty.

User stories

  • Researcher/Clinician: “Does MEK inhibition reduce plexiform tumor volume in pediatric NF1 patients?” → portal returns a 3–6 sentence consensus with confidence, plus an evidence table of studies/datasets and direct citations.
  • Data curator: New PubMed IDs or datasets are added → background job parses, extracts metadata/results, indexes embeddings, and (optionally) precomputes snapshots.
  • PI/Analyst: Filter to RCTs vs. observational; export evidence table + citations for a grant or report.

Scope (MVP)

  1. Ingestion & parsing

    • Publications: PubMed/DOI metadata, PDF parsing (section-aware; e.g., GROBID or equivalent).
    • Datasets: core metadata (modality, NF subtype, sample size, outcomes), link to Synapse/accession.
  2. Extraction

    • Study design (RCT, cohort, case series), cohort size, endpoints, effect directions/magnitudes when available.
    • NF-specific entities: NF1/NF2/Schwannomatosis, intervention, tumor type, age group. (A sketch of the extraction record follows this list.)
  3. Indexing & retrieval

    • Embeddings over abstract + results + methods + table captions.
    • Store structured fields (Postgres) and vector index (OpenSearch/FAISS).
  4. Answer synthesis

    • LLM generates a short consensus answer with:

      • Inline citation brackets mapping to exact papers/datasets
      • Confidence grade derived from evidence quality (see “Evidence grading”)
      • “What we don’t know yet” bullets (uncertainty)
  5. UI

    • Consensus Card (answer + confidence + top 5 sources)
    • Evidence Table (sortable: study type, N, outcome, effect, link)
    • Filters (study type, NF subtype, age, outcome)
  6. Automation

    • Background worker triggers on new/updated records.
    • Optional nightly job to refresh embeddings/summaries.
  7. Provenance & guardrails

    • Every claim links back to exact passage (page + sentence range if possible).
    • “View Snapshot” per paper/dataset: key findings, methods, limitations.
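
A minimal sketch of the structured record that step 2 (Extraction) might produce, assuming Python for the pipeline; every field and enum name here is illustrative, not a settled schema:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class StudyDesign(Enum):
    RCT = "rct"
    COHORT = "cohort"
    CASE_SERIES = "case_series"

@dataclass
class ExtractedStudy:
    """One publication/dataset after parsing + extraction (illustrative)."""
    source_id: str                     # PubMed ID, DOI, or Synapse accession
    design: Optional[StudyDesign]      # None if the extractor could not tell
    n: Optional[int]                   # cohort size, if reported
    nf_subtype: Optional[str]          # e.g. "NF1", "NF2", "schwannomatosis"
    intervention: Optional[str]        # e.g. "MEK inhibitor"
    tumor_type: Optional[str]          # e.g. "plexiform neurofibroma"
    age_group: Optional[str]           # e.g. "pediatric"
    endpoints: list[str] = field(default_factory=list)
    effect_direction: Optional[str] = None  # "benefit"/"harm"/"null", when stated
```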

Evidence grading (initial heuristic)

  • Weight = f(study type, sample size, recency, journal tier/peer review, consistency across studies).
  • Display: High / Moderate / Low confidence.
  • Show why: e.g., “3 RCTs (N=212) consistent direction; 1 small conflicting cohort study.”
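
As a rough sketch of that heuristic (all coefficients below are placeholders to be tuned, not recommendations):

```python
from datetime import date

def evidence_weight(design: str, n: int, year: int, peer_reviewed: bool) -> float:
    """Toy weight combining study type, sample size, recency, and peer review.
    Every coefficient here is an illustrative placeholder."""
    design_w = {"rct": 1.0, "cohort": 0.6, "case_series": 0.3}.get(design, 0.2)
    size_w = min(n / 200.0, 1.0)                    # saturates at N=200
    age = date.today().year - year
    recency_w = max(0.0, 1.0 - age / 20.0)          # linear decay over ~20 years
    review_w = 1.0 if peer_reviewed else 0.5
    return design_w * (0.5 + 0.5 * size_w) * (0.5 + 0.5 * recency_w) * review_w

def grade(total_weight: float, consistent_direction: bool) -> str:
    """Map aggregate weight + cross-study consistency to the displayed label."""
    if total_weight >= 2.0 and consistent_direction:
        return "High"
    if total_weight >= 0.8:
        return "Moderate"
    return "Low"
```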

API sketch

  • POST /consensus/query → {question, filters} → {answer, confidence, citations[], evidence[]}
  • GET /evidence?filters=... → tabular results
  • POST /ingest/publication|dataset → triggers parse/extract/index
  • GET /snapshot/{id} → per-source extracted summary + provenance
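
A hedged sketch of the query endpoint's shape, written here with FastAPI (the framework choice is an assumption, and run_consensus_pipeline is a hypothetical helper, stubbed out):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ConsensusQuery(BaseModel):
    question: str
    filters: dict[str, str] = {}   # e.g. {"study_type": "rct", "nf_subtype": "NF1"}

class ConsensusAnswer(BaseModel):
    answer: str                    # short consensus text with inline [n] brackets
    confidence: str                # "High" / "Moderate" / "Low"
    citations: list[str]           # source ids backing the [n] brackets
    evidence: list[dict]           # rows for the evidence table

def run_consensus_pipeline(question: str, filters: dict[str, str]) -> ConsensusAnswer:
    # Hypothetical retrieval-then-synthesis pipeline; stubbed here.
    raise NotImplementedError

@app.post("/consensus/query", response_model=ConsensusAnswer)
def consensus_query(q: ConsensusQuery) -> ConsensusAnswer:
    return run_consensus_pipeline(q.question, q.filters)
```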

Data model (key fields)

  • Source: id, type (publication|dataset), title, authors, year, journal, link, access
  • NF tags: subtype, tumor type, population (age/sex), intervention
  • Study: design, N, endpoints, effect (dir/magnitude if reported), limitations
  • Embeddings: chunks + vector ids
  • Provenance: docId, page, char offsets
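
For the embeddings piece, a minimal FAISS sketch (the dimensionality depends on whichever embedding model is chosen, so 384 is an assumption; OpenSearch would replace this in a managed deployment):

```python
import faiss
import numpy as np

DIM = 384  # depends on the chosen embedding model (assumption)

index = faiss.IndexFlatIP(DIM)   # inner-product index; L2-normalize for cosine
chunk_ids: list[str] = []        # maps FAISS row -> provenance (docId, page, offsets)

def add_chunks(vectors: np.ndarray, ids: list[str]) -> None:
    """vectors: (n, DIM) float32, already L2-normalized."""
    index.add(vectors)
    chunk_ids.extend(ids)

def search(query_vec: np.ndarray, k: int = 5) -> list[tuple[str, float]]:
    """Return (chunk id, score) pairs for the top-k nearest chunks."""
    scores, rows = index.search(query_vec.reshape(1, -1).astype(np.float32), k)
    return [(chunk_ids[r], float(s)) for r, s in zip(rows[0], scores[0]) if r != -1]
```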

Acceptance criteria

  • Upload/ingest ≥10 NF publications + ≥3 datasets → indexed without manual cleanup.
  • Query returns a concise consensus answer (3–6 sentences, matching the user story) with a confidence label and ≥3 citations.
  • Evidence table supports filter by study type and NF subtype.
  • Each citation opens a per-source snapshot showing the exact supporting passage.
  • No uncited claims in the consensus text (see the check sketched after this list).
  • Background job auto-processes new items and updates the index.
  • Basic red-team checks: model refuses to answer outside NF scope; shows “insufficient evidence” when appropriate.
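
The "no uncited claims" criterion can be enforced mechanically. A naive sketch, assuming citations appear as inline [n] brackets:

```python
import re

def uncited_sentences(consensus_text: str) -> list[str]:
    """Return sentences that lack an inline [n] citation bracket (naive splitter)."""
    sentences = re.split(r"(?<=[.!?])\s+", consensus_text.strip())
    return [s for s in sentences if s and not re.search(r"\[\d+\]", s)]

# A synthesis result would be rejected (or regenerated) if this list is non-empty.
```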

Nice-to-haves (post-MVP)

  • Effect size extraction from tables/figures.
  • Interactive comparison (“paper vs. paper” diffs, forest-plot style view).
  • Human-in-the-loop feedback to correct extractions and boost good sources.
  • Export to DOCX/CSV with citations.

Risks & mitigations

  • Hallucinations: Strict cite-or-silent rule; show source snippet; retrieval-first pipeline.
  • PII/licensing: Only process permitted PDFs/datasets; respect data-use agreements.
  • Heterogeneous outcomes: Normalize outcome terminology; map to controlled vocab where possible.
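
For outcome normalization, a trivial lookup sketch; the mapping entries are illustrative, and a real system would source them from whichever NF vocabulary the team standardizes on (see open questions):

```python
# Illustrative synonym -> controlled-term map; real entries would come from
# the controlled vocabulary chosen for the portal.
OUTCOME_MAP = {
    "tumor shrinkage": "tumor_volume_reduction",
    "reduction in tumor volume": "tumor_volume_reduction",
    "pfs": "progression_free_survival",
    "progression-free survival": "progression_free_survival",
}

def normalize_outcome(raw: str) -> str:
    key = raw.strip().lower()
    return OUTCOME_MAP.get(key, key)  # fall back to the cleaned raw term
```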

Open questions

  • Which NF ontologies/vocabularies do we standardize on?
  • Do we restrict to human studies for clinical queries by default?
  • Preferred embedding/model stack for our environment?
