shkao/bionex

Bionex

Biomedical knowledge graph infrastructure. Integrates 67 datasets into a unified graph and generates network embeddings.

Based on Bioteque (IRB Barcelona, MIT license). Citation: Fernandez-Torras et al., Nature Communications (2022). doi:10.1038/s41467-022-33026-0

Architecture

%%{init:{'theme':'base','themeVariables':{'primaryColor':'#f8f9fa','primaryTextColor':'#1a1a2e','primaryBorderColor':'#adb5bd','lineColor':'#6c757d','fontSize':'13px'}}}%%
graph LR
    subgraph Sources[" 67 External Sources "]
        direction TB
        S1(["DrugBank · STRING · LINCS"])
        S2(["GPSAdb · DisGeNET · CCLE"])
        S3(["CTD · SIDER · Reactome ..."])
    end

    subgraph ETL[" datasets/ "]
        direction TB
        SC["script.py"]
        GD["get_data.sh"]
        GD --> SC
    end

    subgraph Meta[" metadata/ "]
        direction TB
        MAP["mappings/\nGEN · CPD · DIS · CLL · TIS"]
        ONT["ontologies/\nDOID · GO · BTO · HPO"]
    end

    subgraph Processing[" code/kgraph/ "]
        direction TB
        UTIL["utils/\nmappers · ontology"]
        PROC["process_raw_data.py"]
        UTIL --> PROC
    end

    subgraph Embed[" code/embeddings/ "]
        direction TB
        EDGE["get_edges.py"]
        WALK["walks.py"]
        SKIP["mp2vec.py"]
        VAL["validation/"]
        EDGE --> WALK --> SKIP --> VAL
    end

    RAW[("graph/raw/\nedges by metaedge")]
    DONE[("graph/processed/\npropagated + depropagated")]
    EMB[("embeddings/\nnode vectors .h5")]

    Sources --> ETL
    MAP -.-> ETL
    ETL --> RAW --> Processing
    MAP -.-> Processing
    ONT -.-> Processing
    Processing --> DONE --> Embed --> EMB

    style Sources fill:#fef3e2,stroke:#bc6c25,stroke-width:1.5px,color:#6b4226
    style ETL fill:#f0f7e8,stroke:#588157,stroke-width:1.5px,color:#344e41
    style Meta fill:#e8f4f8,stroke:#2c7da0,stroke-width:1.5px,color:#184e77
    style Processing fill:#f0f7e8,stroke:#588157,stroke-width:1.5px,color:#344e41
    style Embed fill:#f0f7e8,stroke:#588157,stroke-width:1.5px,color:#344e41
    style RAW fill:#f3e8f9,stroke:#7b2d8e,stroke-width:1.5px,color:#4a1259
    style DONE fill:#f3e8f9,stroke:#7b2d8e,stroke-width:1.5px,color:#4a1259
    style EMB fill:#f3e8f9,stroke:#7b2d8e,stroke-width:2px,color:#4a1259
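The embedding stage in the diagram (get_edges.py, walks.py, mp2vec.py) reads like a metapath2vec-style pipeline: walks are sampled along a fixed sequence of metaedges and then passed to a skip-gram model. The snippet below is a minimal sketch of the walk-sampling step only, on a toy graph. The node IDs, relation codes, and function names are illustrative assumptions and are not taken from the repository's code.

```python
import random
from collections import defaultdict

# Toy edges grouped by metaedge (source type, relation, target type).
# All identifiers below are made up for illustration.
EDGES = {
    ("CPD", "int", "GEN"): [("drugA", "gene1"), ("drugA", "gene2"), ("drugB", "gene2")],
    ("GEN", "ass", "DIS"): [("gene1", "dis1"), ("gene2", "dis1"), ("gene2", "dis2")],
}


def build_adjacency(edges):
    """Build one adjacency map (source -> list of targets) per metaedge."""
    adj = {}
    for metaedge, pairs in edges.items():
        nbrs = defaultdict(list)
        for src, dst in pairs:
            nbrs[src].append(dst)
        adj[metaedge] = dict(nbrs)
    return adj


def metapath_walk(adj, start, metapath, rng):
    """Sample one walk that follows the metaedges of `metapath` in order.

    The walk stops early if the current node has no neighbour
    under the next metaedge.
    """
    walk = [start]
    for metaedge in metapath:
        candidates = adj[metaedge].get(walk[-1], [])
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


adj = build_adjacency(EDGES)
metapath = [("CPD", "int", "GEN"), ("GEN", "ass", "DIS")]
rng = random.Random(0)  # seeded for reproducibility
walks = [metapath_walk(adj, "drugA", metapath, rng) for _ in range(5)]
```

Each walk is a list such as compound, gene, disease. In a metapath2vec setup these lists would be treated as sentences for skip-gram training.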

Quick start

cd datasets/gpsadb && python3 script.py   # process a dataset
python3 -m pytest -v                       # run tests

What's changed from upstream

  • GPSAdb 2.0 dataset (7,665 gene perturbation experiments, 2,810 genes)
  • Gene mappings regenerated from UniProt 2026_01 (+350 gene names)
  • Provenance attributes and cell line links on perturbagen edges
  • pytest suite for ETL and mapping generation
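To make the "provenance attributes and cell line links" item concrete, here is one way such an edge record could be modelled. This is a hypothetical sketch: the class name, field names, and example values are assumptions for illustration and do not reflect the repository's actual schema or file format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PerturbagenEdge:
    """Hypothetical edge record; all field names are assumptions."""
    source: str                      # perturbed gene
    target: str                      # gene affected by the perturbation
    dataset: str                     # provenance: originating dataset
    experiment_id: str               # provenance: experiment identifier
    cell_line: Optional[str] = None  # linked cell line, if known


def group_by_cell_line(edges):
    """Index edges by cell line; edges without one go under None."""
    groups = defaultdict(list)
    for edge in edges:
        groups[edge.cell_line].append(edge)
    return dict(groups)


# Made-up example values, not real GPSAdb records.
edges = [
    PerturbagenEdge("geneA", "geneB", "gpsadb", "exp1", "cellX"),
    PerturbagenEdge("geneA", "geneC", "gpsadb", "exp1", "cellX"),
    PerturbagenEdge("geneD", "geneB", "gpsadb", "exp2"),
]
by_cell_line = group_by_cell_line(edges)
```

Keeping provenance on the edge itself lets downstream steps filter or stratify by experiment or cell line without a separate lookup.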

License

MIT (inherited from Bioteque, Copyright (c) 2022 SBNB)
