Biomedical Ontology & Diagnostic Hypergraph Infrastructure
BODHI is a suite of open medical knowledge graphs built for clinical AI. It encodes structured clinical relationships — conditions, symptoms, drugs, lab tests, and their interactions — into Neo4j graph databases, with export pipelines for downstream ML and application use.
| Network | Focus | Nodes | Relationships |
|---|---|---|---|
| bodhi-s | Condition ↔ Symptom ↔ Speciality | 4,855 | 13,204 |
| bodhi-m | Concept ↔ Drug ↔ LabInvestigation | 4,469 | 3,566 |
Both graphs share the same Neo4j instance (default database) and are distinguished by their node labels.
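Because the two graphs live in the same database, a query can be scoped to one of them by node label. The sketch below builds a plain Cypher count query per label. It assumes the labels are spelled exactly as in the overview table (`Condition`, `Symptom`, `Speciality`, `Concept`, `Drug`, `LabInvestigation`); `GRAPH_LABELS`, `graph_of` and `count_query` are hypothetical names, not part of the repository.

```python
# Node labels per graph, as listed in the overview table (spelling assumed).
GRAPH_LABELS = {
    "bodhi-s": ["Condition", "Symptom", "Speciality"],
    "bodhi-m": ["Concept", "Drug", "LabInvestigation"],
}


def graph_of(label: str) -> str:
    """Return the graph a node label belongs to."""
    for graph, labels in GRAPH_LABELS.items():
        if label in labels:
            return graph
    raise ValueError(f"unknown label: {label}")


def count_query(label: str) -> str:
    """Build a Cypher query that counts the nodes carrying one label."""
    graph_of(label)  # raises ValueError for labels outside both graphs
    return f"MATCH (n:{label}) RETURN count(n) AS n"


print(graph_of("Symptom"), "|", count_query("Drug"))
```

The returned string can be passed to any Neo4j client session; running it is left out here so the example has no external dependencies.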
bodhi-s maps conditions to their presenting symptoms, clinical specialities, and inter-condition influences.
| Metric | Count |
|---|---|
| Condition nodes | 779 |
| Symptom nodes (variants) | 4,037 |
| Symptom root concepts (distinct SNOMED IDs) | 590 |
| Speciality nodes | 39 |
| Total relationships | 13,204 |
| Symptom → Condition edges (PRESENT_IN) | 10,352 |
| Condition → Speciality edges (TREATED_BY) | 1,558 |
| Condition → Condition edges (IS_INFLUENCED_BY) | 1,020 |
| Condition → Condition edges (RELATED_TO) | 221 |
| Condition → Condition edges (HAS_PREREQUISITE) | 53 |
Condition types: Disorder 607 · Misc 84 · FamilyHistory 49 · Lifestyle 21 · Procedure 16 · Allergy 1 · Symptom 1
Avg symptoms per condition: 13.3
Condition triage: OPD Managed 367 (47%) · Worrisome 223 (29%) · Emergency 189 (24%)
Symptom triage: OPD Managed 2,244 (56%) · Worrisome 1,540 (38%) · Emergency 252 (6%)
Most symptomatic conditions: Sickle cell anaemia (54) · Psoriasis (46) · Cerebrovascular accident (45) · Gonorrhea (44) · Multiple sclerosis (42)
Most cross-cutting symptoms: Fever (145 conditions) · Fatigue (126) · Headache (110) · Vomit (94) · Malaise (81)
Top specialities by condition volume: Internal Medicine (292) · General Physician (205) · Orthopedic (139) · Neurologist (83) · General Surgeon (81)
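The totals and derived figures above can be cross-checked with a few lines of arithmetic. This sketch assumes that "Avg symptoms per condition" is the number of PRESENT_IN edges divided by the number of Condition nodes, which is one reading that reproduces 13.3; the variable names are illustrative.

```python
# Figures copied from the bodhi-s statistics above.
condition_nodes = 779
symptom_nodes = 4_037
speciality_nodes = 39
edges = {
    "PRESENT_IN": 10_352,
    "TREATED_BY": 1_558,
    "IS_INFLUENCED_BY": 1_020,
    "RELATED_TO": 221,
    "HAS_PREREQUISITE": 53,
}
condition_triage = {"opd_managed": 367, "worrisome": 223, "emergency": 189}

total_nodes = condition_nodes + symptom_nodes + speciality_nodes
total_edges = sum(edges.values())

# Assumed definition: PRESENT_IN edges per Condition node.
avg_symptoms = round(edges["PRESENT_IN"] / condition_nodes, 1)

# Share of conditions in each triage level, as whole percentages.
triage_pct = {k: round(100 * v / condition_nodes) for k, v in condition_triage.items()}

print(total_nodes, total_edges, avg_symptoms, triage_pct)
```

The node and edge sums match the 4,855 nodes and 13,204 relationships reported for bodhi-s, and the triage shares round to 47%, 29% and 24%.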
A Condition node represents a diagnosable medical condition, disorder, or clinical entity.
| Property | Type | Values | Description |
|---|---|---|---|
| snomed_id | string | SNOMED CT ID | Globally unique clinical identifier |
| name | string | — | Clinical name of the condition |
| concept_type | enum | Disorder, Misc, FamilyHistory, Lifestyle, Procedure, Allergy, Symptom | Classification of the concept |
| triage_level | enum | opd_managed, worrisome, emergency | Clinical urgency |
| type_condition | enum | acute, chronic, acute_that_may_turn_chronic, chronic_with_acute_aggravation, lifestyle, medical_history, Event, Injury | Temporal nature of the condition |
| overall_likelihood | enum | rare, low, medium, high, very_high | Population prevalence signal |
| likelihood_male | float | 0.0–1.0 | Relative likelihood in males |
| likelihood_female | float | 0.0–1.0 | Relative likelihood in females |
| likelihood_age_0_1 | float | 0.0–1.0 | Relative likelihood in age 0–1 |
| likelihood_age_1_5 | float | 0.0–1.0 | Relative likelihood in age 1–5 |
| likelihood_age_6_12 | float | 0.0–1.0 | Relative likelihood in age 6–12 |
| likelihood_age_13_18 | float | 0.0–1.0 | Relative likelihood in age 13–18 |
| likelihood_age_19_30 | float | 0.0–1.0 | Relative likelihood in age 19–30 |
| likelihood_age_30_45 | float | 0.0–1.0 | Relative likelihood in age 30–45 |
| likelihood_age_45_60 | float | 0.0–1.0 | Relative likelihood in age 45–60 |
| likelihood_age_60_plus | float | 0.0–1.0 | Relative likelihood in age 60+ |
A Symptom node represents a clinical symptom or a refinement of one. Symptoms are structured as parent-child refinements (e.g. "Headache > Throbbing headache > Throbbing headache on right side").
| Property | Type | Values | Description |
|---|---|---|---|
| uuid | string | UUID | Unique compound symptom identifier |
| snomed_id | string | SNOMED CT ID | SNOMED identifier for the symptom |
| root_snomed_id | string | SNOMED CT ID | Parent/root symptom SNOMED ID |
| root_snomed_name | string | — | Parent symptom name |
| name | string | — | Full compound symptom name |
| triage_level | enum | opd_managed, worrisome, emergency | Clinical urgency of this symptom |
| relation1_type | enum | characteristic, severity, location, laterality, onset, duration_since, duration_lasts, temporal_pattern, pain_type, radiating, aggravated, relieved | Type of the first refinement axis |
| child1_name | string | — | Value of the first refinement |
| grouping1_selection_type | enum | s, m | Single (s) or multi-select (m) for axis 1 |
| relation2_type | enum | (same as relation1_type) | Type of the second refinement axis |
| child2_name | string | — | Value of the second refinement |
| grouping2_selection_type | enum | s, m | Single (s) or multi-select (m) for axis 2 |
| relation3_type | enum | (same as relation1_type) | Type of the third refinement axis |
| child3_name | string | — | Value of the third refinement |
| grouping3_selection_type | enum | s, m | Single (s) or multi-select (m) for axis 3 |
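The three refinement axes of a symptom can be read back from its properties as a list of triples. This is a minimal sketch, assuming unused axes are stored as null or empty values (the source does not say how they are represented). `refinement_axes` is a hypothetical helper, and the example record is illustrative rather than taken from the dataset.

```python
from typing import Any


def refinement_axes(symptom: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (relation_type, child_name, selection_type) for each populated axis."""
    axes = []
    for i in (1, 2, 3):
        relation = symptom.get(f"relation{i}_type")
        child = symptom.get(f"child{i}_name")
        if relation and child:  # assumption: empty axes are null or ""
            selection = symptom.get(f"grouping{i}_selection_type") or ""
            axes.append((relation, child, selection))
    return axes


# Hypothetical Symptom record using the property names from the table above.
example = {
    "root_snomed_name": "Headache",
    "name": "Throbbing headache on right side",
    "relation1_type": "characteristic",
    "child1_name": "Throbbing",
    "grouping1_selection_type": "s",
    "relation2_type": "laterality",
    "child2_name": "Right side",
    "grouping2_selection_type": "s",
    "relation3_type": None,
    "child3_name": None,
}

print(refinement_axes(example))
```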
A Speciality node represents a medical speciality or care discipline.
| Property | Type | Description |
|---|---|---|
| id | string | Internal speciality identifier |
| name | string | Speciality name, e.g. Cardiologist |
PRESENT_IN links a symptom to a condition it presents in and encodes likelihood in both directions.
| Property | Values | Description |
|---|---|---|
| likelihood_symptom_given_condition | zero, rare, low, medium, high, very_high | How commonly this symptom appears when the condition is present, i.e. P(symptom \| condition) |
| likelihood_condition_given_symptom | zero, rare, low, medium, high, very_high | How predictive this symptom is of the condition, i.e. P(condition \| symptom) |
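Since both likelihoods are ordinal labels rather than numbers, a consumer has to choose an ordering before ranking anything. The sketch below maps the six levels to integer ranks and sorts the edges of one symptom by P(condition | symptom). The numeric ranks are an assumption that only preserves the order listed in the table, `rank_conditions` is a hypothetical helper, and the condition names are placeholders.

```python
# Assumed ordinal ranks; only the ordering comes from the table above.
LIKELIHOOD_RANK = {
    "zero": 0, "rare": 1, "low": 2, "medium": 3, "high": 4, "very_high": 5,
}


def rank_conditions(edges: list[dict]) -> list[dict]:
    """Sort PRESENT_IN edges of one symptom, most predictive condition first."""
    return sorted(
        edges,
        key=lambda e: LIKELIHOOD_RANK[e["likelihood_condition_given_symptom"]],
        reverse=True,
    )


# Placeholder edges for a single symptom (not real data).
edges = [
    {"condition": "A", "likelihood_condition_given_symptom": "low"},
    {"condition": "B", "likelihood_condition_given_symptom": "very_high"},
    {"condition": "C", "likelihood_condition_given_symptom": "medium"},
]

print([e["condition"] for e in rank_conditions(edges)])  # ['B', 'C', 'A']
```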
TREATED_BY indicates which speciality manages a condition.
| Property | Values | Description |
|---|---|---|
| weight | rare, low, medium, high, very_high | Strength of the referral association |
Condition A is clinically influenced by the presence of condition B (e.g. Diabetes IS_INFLUENCED_BY Obesity).
| Property | Values | Description |
|---|---|---|
| relation_strength | zero, rare, low, medium, high, very_high | Magnitude of influence |
| relation_polarity | positive, negative | Positive = B increases risk of A; Negative = B decreases risk of A |
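Strength and polarity together describe a signed effect of condition B on condition A. One way to use them, shown here as a sketch only, is to collapse the two properties into a single signed integer. The rank values are an assumption (only their order comes from the table) and `signed_influence` is a hypothetical helper.

```python
# Assumed ordinal ranks for relation_strength.
STRENGTH_RANK = {
    "zero": 0, "rare": 1, "low": 2, "medium": 3, "high": 4, "very_high": 5,
}
POLARITY_SIGN = {"positive": 1, "negative": -1}


def signed_influence(relation_strength: str, relation_polarity: str) -> int:
    """Positive result: B raises the risk of A. Negative result: B lowers it."""
    return POLARITY_SIGN[relation_polarity] * STRENGTH_RANK[relation_strength]


print(signed_influence("high", "positive"), signed_influence("low", "negative"))  # 4 -2
```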
Condition A requires condition B to be present (e.g. Diabetic nephropathy HAS_PREREQUISITE Diabetes mellitus).
| Property | Values | Description |
|---|---|---|
| relation_strength | medium, high, very_high | How mandatory the prerequisite is |
| relation_polarity | positive, negative | Direction of dependency |
RELATED_TO captures ontological relatedness and is used for symptom deduplication and SNOMED hierarchy linkage.
| Property | Values | Description |
|---|---|---|
| relation_type | same, similar, snomed_parent, refinement_same_to_root, refinement_same_to_refinement, refinement_similar_to_root, refinement_similar_to_refinement | Nature of the ontological relationship |
bodhi-m maps clinical concepts (conditions/disorders) to their treatments (generic drugs) and monitoring parameters (lab tests / vitals), organised in a three-level clinical hierarchy.
| Metric | Count |
|---|---|
| Concept nodes | 2,471 |
| Drug nodes | 1,186 |
| LabInvestigation nodes | 812 |
| Total relationships | 3,566 |
| Concept → Concept edges (CHILD_OF) | 1,768 |
| Concept → Drug edges (TREATED_BY) | 908 |
| LabInvestigation → Concept edges (IMPACTS) | 808 |
| Concept → LabInvestigation edges (MONITORED_BY) | 82 |
Concept hierarchy: System 14 → Group 250 → Granular 1,942 (+ 265 unmapped)
Hierarchy coverage: 1,540 / 1,942 granular linked to group · 228 / 250 groups linked to system
Top drug therapeutic classes: GI (197) · Cardiac (182) · Derma (157) · Neuro/CNS (153) · Vitamins & Minerals (148) · Respiratory (111)
Most treated conditions: Hypertension (59 drugs) · Diabetes mellitus (33) · Acid Peptic Disease (28) · Fungal infection (21) · Heart failure (21)
Most monitored conditions: Dyslipidemia (4 vitals) · Diabetes mellitus (4) · Hypothyroidism (3) · Iron deficiency (3) · Hyperglycemia (3)
LabInvestigation health domains: Immunological (276) · Renal (120) · Hematological (82) · Endocrine (65) · Gastrointestinal (60)
LabInvestigation LOINC coverage: 812 of 1,973 vitals have LOINC IDs — only LOINC-mapped vitals are included for open interoperability.
A Concept node represents a clinical concept (condition, disorder, finding, procedure, or lifestyle factor) organised in a three-level hierarchy: System → Group → Granular.
| Property | Type | Values | Description |
|---|---|---|---|
| snomed_id | string | SNOMED CT ID | Primary unique identifier (open standard) |
| name | string | — | Clinical name |
| display_name | string | — | Consumer-friendly name, e.g. High Blood Pressure |
| level_concept | enum | system, group, granular | Position in the clinical hierarchy |
| type_concept | enum | Disorder, Finding, Procedure, Lifestyle, Allergy, Situation | Clinical category |
| type_information | enum | SelfHistory, FamilyHistory | Whether this pertains to the patient themselves or to family history |
| active | string | 1 | Whether this concept is active in the knowledge base |
Hierarchy levels:
- system: broad health domain, e.g. Cardiovascular health, Endocrine health (14 nodes)
- group: clinical cluster, e.g. Diabetes mellitus, Coronary Artery Disease (250 nodes)
- granular: specific diagnosable entity, e.g. Diabetes mellitus type II (1,942 nodes)
A Drug node represents a generic drug formulation. Combination drugs are stored as a single node.
| Property | Type | Description |
|---|---|---|
| hash | string | MD5 hash of the generic name; unique deduplication key |
| name | string | Generic drug name, e.g. metformin, atorvastatin + aspirin |
| therapeutic_class | string | Comma-separated therapeutic class(es), e.g. Anti Diabetic, Cardiovascular |
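The `hash` property is described as an MD5 hash of the generic name. A minimal sketch of such a key is shown below. It assumes the name is trimmed, lower-cased and UTF-8 encoded before hashing; the ingest pipeline may normalise names differently, so digests produced here are not guaranteed to match the stored values. `drug_hash` is a hypothetical helper.

```python
import hashlib


def drug_hash(generic_name: str) -> str:
    """MD5 hex digest of a generic drug name, used as a deduplication key.

    Assumption: whitespace is trimmed and the name is lower-cased first.
    """
    normalised = generic_name.strip().lower()
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


# Spelling variants of the same name collapse to one key under this normalisation.
print(drug_hash("Metformin ") == drug_hash("metformin"))  # True
```

MD5 is used here only as a stable fingerprint for deduplication, not for any security purpose.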
A LabInvestigation node represents a lab test or clinical measurement, identified by its LOINC code.
| Property | Type | Values | Description |
|---|---|---|---|
| loinc_id | string | LOINC ID | Globally unique lab test identifier (open standard) |
| name | string | — | Standard test name |
| display_name | string | — | Display/friendly name |
| system_map | string | — | Health domain, e.g. Renal health, Endocrine health |
| timespan_problem | enum | stat, less_than_24_hr, week_1, month_1, month_3, month_6, year_1, lifetime | How long this lab investigation result stays clinically relevant for a related condition |
| impact_problem | enum | zero, low, medium, high | Clinical significance of this lab investigation in disease management |
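The `timespan_problem` values describe how long a result stays relevant, which suggests a simple expiry check. The sketch below is an illustration under stated assumptions: the durations assigned to each value (30-day months, a zero-length window for `stat`, no expiry for `lifetime`) are guesses at the intended meaning, and `TIMESPAN` and `is_still_relevant` are hypothetical names.

```python
from datetime import datetime, timedelta

# Assumed durations for each timespan_problem value (months approximated as 30 days).
TIMESPAN = {
    "stat": timedelta(0),
    "less_than_24_hr": timedelta(hours=24),
    "week_1": timedelta(weeks=1),
    "month_1": timedelta(days=30),
    "month_3": timedelta(days=90),
    "month_6": timedelta(days=180),
    "year_1": timedelta(days=365),
    "lifetime": None,  # treated as never expiring
}


def is_still_relevant(timespan_problem: str, measured_at: datetime, now: datetime) -> bool:
    """Return True if a result taken at `measured_at` is still inside its window."""
    window = TIMESPAN[timespan_problem]
    if window is None:
        return True
    return now - measured_at <= window


measured = datetime(2024, 1, 1)
print(is_still_relevant("month_1", measured, datetime(2024, 1, 20)))  # True
print(is_still_relevant("week_1", measured, datetime(2024, 1, 20)))   # False
```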
CHILD_OF encodes the three-level clinical hierarchy. Granular concepts point to their parent Group; Group concepts point to their parent System.
No properties.
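Walking the hierarchy means following parent links from a granular concept up through its group to its system. The sketch below does this over an in-memory mapping of child name to parent name. `path_to_system` is a hypothetical helper, and the sample links (including which system the group belongs to) are illustrative, built from the example names used in this document.

```python
def path_to_system(concept: str, child_of: dict[str, str]) -> list[str]:
    """Follow parent links upward until a concept has no parent."""
    path = [concept]
    seen = {concept}
    while path[-1] in child_of:
        parent = child_of[path[-1]]
        if parent in seen:  # guard against accidental cycles in the data
            raise ValueError(f"cycle detected at {parent!r}")
        path.append(parent)
        seen.add(parent)
    return path


# Illustrative links: granular -> group -> system.
child_of = {
    "Diabetes mellitus type II": "Diabetes mellitus",
    "Diabetes mellitus": "Endocrine health",
}

print(path_to_system("Diabetes mellitus type II", child_of))
```

Not every concept is linked (1,540 of 1,942 granular concepts have a group and 228 of 250 groups have a system), so the returned path can stop before reaching a system-level concept.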
IMPACTS links a lab investigation to the health domain concept it broadly belongs to, which is always a system-level concept. It represents the primary health system the test monitors.
No properties.
TREATED_BY links a clinical concept to a generic drug that treats it. It also encodes exclusivity signals, overall and per gender, for prescribing guidance.
| Property | Values | Description |
|---|---|---|
| therapeutic_class | string | The drug class relevant for this specific indication |
| exclusivity | zero, low, high | How specific this drug is to this condition vs. used broadly |
| exclusivity_male | low, high | Prescribing exclusivity signal for males |
| exclusivity_female | low, high | Prescribing exclusivity signal for females |
MONITORED_BY links a condition to a specific lab test that monitors it or is diagnostically associated with it. It encodes the deduction power and the directional threshold.
| Property | Values | Description |
|---|---|---|
| polarity | above, below, equal | Which direction relative to the threshold is clinically significant |
| category_threshold | normal, borderline_low, borderline_high, high, abnormal, critically_high | The result range that triggers this association |
| vital_expiry_value | stat, week_1, month_1, month_3, month_6, year_1 | How frequently this test should be re-ordered for this condition |
| exclusivity | low, high | Deduction power: how strongly an abnormal result predicts this condition |
BODHI/
├── bodhi-s/
│   └── neo4j/              # Ingest pipeline for condition-symptom graph
│       ├── ingest.py
│       ├── config.py
│       ├── schema.py
│       ├── db.py
│       └── loaders/
├── bodhi-m/
│   └── neo4j/              # Ingest pipeline for concept-drug-lab investigation graph
│       ├── ingest.py
│       ├── config.py
│       ├── schema.py
│       ├── db.py
│       └── loaders/
└── knowledge-bases/        # Source CSV files used for ingestion
# Set up environment
cd bodhi-m/neo4j # or bodhi-s/neo4j
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Run ingestion (--wipe clears existing data first)
python3 ingest.py --wipe

Requires a running Neo4j instance. Default connection: bolt://localhost:7687. Configure via .env:
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password
KB_DIR=../../knowledge-bases
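Once these variables are in the process environment, for example exported in the shell or loaded from `.env` by a tool of your choice, they can be read with fallbacks to the defaults shown above. This is a sketch and not the repository's `config.py`; `DEFAULTS` and `load_settings` are hypothetical names, and treating `neo4j` as the default user is an assumption based on the example values.

```python
import os
from typing import Mapping

# Fallbacks taken from the example .env above (assumed to be the defaults).
DEFAULTS = {
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USER": "neo4j",
    "KB_DIR": "../../knowledge-bases",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read connection settings, falling back to defaults where one exists."""
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    password = env.get("NEO4J_PASSWORD")
    if not password:
        # No safe default exists for a password, so fail early.
        raise RuntimeError("NEO4J_PASSWORD is not set")
    settings["NEO4J_PASSWORD"] = password
    return settings


print(load_settings({"NEO4J_PASSWORD": "your_password"})["NEO4J_URI"])
```

Passing the environment in as a mapping keeps the function easy to test without touching real credentials.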
| Standard | Used for |
|---|---|
| SNOMED CT | Condition and concept identifiers |
| LOINC | Lab investigation identifiers |
Knowledge graph structure and code: Apache 2.0
Source clinical knowledge bases are proprietary to Eka Care.