Automated evidence synthesis for NF Data Portal (papers + datasets → consensus summaries)
Problem
Users have to read dozens of NF (Neurofibromatosis) papers and datasets to answer a simple question. That’s slow and inconsistent. We want automation that ingests each publication/dataset we collect, extracts key findings, and generates auditable, citation-backed consensus answers to user questions.
Goal
Build a “Consensus-style” (https://consensus.app/home/about-us/) capability inside the NF portal:
- Batch-analyze publications and datasets on ingest.
- Let users ask scoped questions about NF and receive a graded, evidence-weighted summary with links to the underlying sources.
- Support filters (study type, cohort size, NF subtype, outcome, patient pop, etc.) and show uncertainty.
User stories
- Researcher/Clinician: “Does MEK inhibition reduce plexiform tumor volume in pediatric NF1 patients?” → the portal returns a 3–6 sentence consensus with a confidence grade, plus an evidence table of studies/datasets and direct citations.
- Data curator: New PubMed IDs or datasets are added → background job parses, extracts metadata/results, indexes embeddings, and (optionally) precomputes snapshots.
- PI/Analyst: Filter to RCTs vs. observational; export evidence table + citations for a grant or report.
Scope (MVP)
- Ingestion & parsing
  - Publications: PubMed/DOI metadata, PDF parsing (section-aware; e.g., GROBID or equivalent).
  - Datasets: core metadata (modality, NF subtype, sample size, outcomes), link to Synapse/accession.
- Extraction
  - Study design (RCT, cohort, case series), cohort size, endpoints, effect directions/magnitudes when available.
  - NF-specific entities: NF1/NF2/Schwannomatosis, intervention, tumor type, age group.
- Indexing & retrieval
  - Embeddings over abstract + results + methods + table captions.
  - Store structured fields (Postgres) and a vector index (OpenSearch/FAISS).
- Answer synthesis
  - LLM generates a short consensus answer (see the retrieval-and-synthesis sketch after this list) with:
    - Inline citation brackets mapping to exact papers/datasets
    - Confidence grade derived from evidence quality (see “Evidence grading”)
    - “What we don’t know yet” bullets (uncertainty)
- UI
  - Consensus Card (answer + confidence + top 5 sources)
  - Evidence Table (sortable: study type, N, outcome, effect, link)
  - Filters (study type, NF subtype, age, outcome)
- Automation
  - Background worker triggers on new/updated records.
  - Optional nightly job to refresh embeddings/summaries.
- Provenance & guardrails
  - Every claim links back to an exact passage (page + sentence range if possible).
  - “View Snapshot” per paper/dataset: key findings, methods, limitations.
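A minimal sketch of the retrieval-first synthesis step, assuming sentence-transformers for embeddings and FAISS for the vector index (OpenSearch could stand in for either); `build_index`, `retrieve`, and `call_llm` are illustrative names, not an agreed interface:

```python
# Retrieval-first synthesis sketch. Library choices (sentence-transformers, FAISS)
# and function names are assumptions for illustration, not the final pipeline.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model

def build_index(chunks: list[str]) -> faiss.IndexFlatIP:
    """Embed text chunks (abstract/results/methods/table captions) into a FAISS index."""
    vecs = embedder.encode(chunks, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(np.asarray(vecs, dtype="float32"))
    return index

def retrieve(index: faiss.IndexFlatIP, chunks: list[str], question: str, k: int = 8) -> list[tuple[int, str]]:
    """Return the top-k chunks most similar to the question, keeping ids for citation."""
    qvec = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(qvec, dtype="float32"), k)
    return [(int(i), chunks[int(i)]) for i in ids[0]]

def call_llm(prompt: str) -> str:
    """Placeholder for whichever LLM client/provider the portal standardizes on."""
    raise NotImplementedError

def synthesize(question: str, hits: list[tuple[int, str]]) -> str:
    """Ask the LLM for a short consensus answer with bracketed citations like [1], [2]."""
    numbered = "\n".join(f"[{n + 1}] {text}" for n, (_, text) in enumerate(hits))
    prompt = (
        "Answer the question using ONLY the numbered sources below. "
        "Cite every claim with bracketed source numbers. "
        "If the sources are insufficient, reply 'insufficient evidence'.\n\n"
        f"Question: {question}\n\nSources:\n{numbered}"
    )
    return call_llm(prompt)
```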
Evidence grading (initial heuristic)
- Weight = f(study type, sample size, recency, journal tier/peer review, consistency across studies).
- Display: High / Moderate / Low confidence.
- Show why: e.g., “3 RCTs (N=212) consistent direction; 1 small conflicting cohort study.”
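A possible starting point for the heuristic, with placeholder weights and thresholds (journal tier/peer review omitted for brevity); illustrative only, not a validated grading scheme:

```python
# Illustrative grading heuristic only; weights and thresholds are placeholders.
from dataclasses import dataclass

@dataclass
class GradedStudy:
    design: str    # "rct", "cohort", "case_series", ...
    n: int         # cohort size
    year: int      # publication year
    agrees: bool   # effect direction matches the pooled direction?

DESIGN_WEIGHT = {"rct": 1.0, "cohort": 0.6, "case_series": 0.3}

def grade(studies: list[GradedStudy], current_year: int = 2025) -> str:
    if not studies:
        return "Insufficient evidence"
    score = 0.0
    for s in studies:
        w = DESIGN_WEIGHT.get(s.design, 0.2)
        w *= min(s.n / 100, 1.0)                         # saturating sample-size factor
        w *= 0.5 if current_year - s.year > 10 else 1.0  # recency penalty
        score += w if s.agrees else -w                   # consistency across studies
    if score >= 2.0:
        return "High"
    if score >= 0.75:
        return "Moderate"
    return "Low"
```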
API sketch
- POST /consensus/query → {question, filters} → {answer, confidence, citations[], evidence[]}
- GET /evidence?filters=... → tabular results
- POST /ingest/publication|dataset → triggers parse/extract/index
- GET /snapshot/{id} → per-source extracted summary + provenance
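For concreteness, a FastAPI/Pydantic sketch of the first endpoint's contract (framework choice and field types are assumptions; field names mirror the sketch above):

```python
# Contract sketch for POST /consensus/query; FastAPI/Pydantic are assumed, not decided.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ConsensusQuery(BaseModel):
    question: str
    filters: dict[str, str] = {}   # e.g. {"study_type": "rct", "nf_subtype": "NF1"}

class ConsensusAnswer(BaseModel):
    answer: str                    # short consensus text with [n] citation brackets
    confidence: str                # "High" | "Moderate" | "Low"
    citations: list[str]           # source ids backing the brackets
    evidence: list[dict]           # rows for the evidence table

@app.post("/consensus/query", response_model=ConsensusAnswer)
def consensus_query(q: ConsensusQuery) -> ConsensusAnswer:
    # Placeholder: wire up retrieval + synthesis + grading here.
    return ConsensusAnswer(answer="insufficient evidence", confidence="Low",
                           citations=[], evidence=[])
```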
Data model (key fields)
- Source: id, type (publication|dataset), title, authors, year, journal, link, access
- NF tags: subtype, tumor type, population (age/sex), intervention
- Study: design, N, endpoints, effect (dir/magnitude if reported), limitations
- Embeddings: chunks + vector ids
- Provenance: docId, page, char offsets
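A dataclass sketch of these fields (types and nesting are assumptions; the Postgres/ORM mapping is omitted):

```python
# Key-field sketch only; concrete types and the ORM mapping are assumptions.
from dataclasses import dataclass, field

@dataclass
class Provenance:
    doc_id: str
    page: int
    char_start: int
    char_end: int

@dataclass
class StudyInfo:
    design: str                    # RCT | cohort | case series
    n: int | None = None
    endpoints: list[str] = field(default_factory=list)
    effect: str | None = None      # direction/magnitude if reported
    limitations: str | None = None

@dataclass
class Source:
    id: str
    type: str                      # "publication" | "dataset"
    title: str
    authors: list[str]
    year: int
    journal: str | None
    link: str
    access: str
    nf_tags: dict[str, str] = field(default_factory=dict)   # subtype, tumor type, population, intervention
    study: StudyInfo | None = None
    embedding_chunk_ids: list[str] = field(default_factory=list)
    provenance: list[Provenance] = field(default_factory=list)
```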
Acceptance criteria
- Upload/ingest ≥10 NF publications + ≥3 datasets → indexed without manual cleanup.
- A query returns a short (~2–3 sentence) consensus answer with a confidence label and ≥3 citations.
- Evidence table supports filter by study type and NF subtype.
- Each citation opens a per-source snapshot showing the exact supporting passage.
- No uncited claims in the consensus text.
- Background job auto-processes new items and updates the index.
- Basic red-team checks: model refuses to answer outside NF scope; shows “insufficient evidence” when appropriate.
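The “no uncited claims” criterion can be scripted directly; a sketch, assuming citations appear as [n] brackets in the answer text:

```python
# Acceptance-check sketch: every sentence in the consensus text must carry at least
# one [n] citation bracket. The bracket format is an assumption, not a spec.
import re

def uncited_sentences(answer: str) -> list[str]:
    """Return sentences that contain no [n]-style citation."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if not re.search(r"\[\d+\]", s)]

assert uncited_sentences(
    "MEK inhibition reduced tumor volume [1][3]. Evidence in adults is limited [2]."
) == []
```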
Nice-to-haves (post-MVP)
- Effect size extraction from tables/figures.
- Interactive comparison (“paper vs. paper” diffs, forest-plot style view).
- Human-in-the-loop feedback to correct extractions and boost good sources.
- Export to DOCX/CSV with citations.
Risks & mitigations
- Hallucinations: Strict cite-or-silent rule; show source snippet; retrieval-first pipeline.
- PII/licensing: Only process permitted PDFs/datasets; respect data-use agreements.
- Heterogeneous outcomes: Normalize outcome terminology; map to controlled vocab where possible.
Open questions
- Which NF ontologies/vocabularies do we standardize on?
- Do we restrict to human studies for clinical queries by default?
- Preferred embedding/model stack for our environment?