[ACL 2023] Clinical Note Owns its Hierarchy: Multi-Level Hypergraph Neural Networks for Patient-Level Representation Learning

- CUDA=11.3
- cuDNN=8.2.0
- python=3.9.12
- pandas=1.4.2
- torch=1.11.0
- torch_geometric=2.1.0
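A quick way to check that an installed environment matches these versions is a small sanity-check script; this is only a convenience sketch and assumes the packages above are already installed:

```python
# Environment sanity check: confirm package versions and CUDA availability.
import pandas
import torch
import torch_geometric

print("pandas:", pandas.__version__)                     # expect 1.4.2
print("torch:", torch.__version__)                       # expect 1.11.0
print("torch_geometric:", torch_geometric.__version__)   # expect 2.1.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)               # expect 11.3
print("cuDNN version:", torch.backends.cudnn.version())  # expect 8200 for cuDNN 8.2.0
```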
We follow the MIMIC-III Benchmark (Harutyunyan et al.) to preprocess clinical notes.
The preprocessed NOTEEVENTS data for in-hospital-mortality should be in data/DATA_RAW/in-hospital-mortality, divided into two folders (train_note and test_note).
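The expected layout can be verified with a short check; this is only a sketch, and the per-stay CSV file names are whatever the benchmark preprocessing produces (only the two folder names above are fixed):

```python
from pathlib import Path

# Expected layout after MIMIC-III Benchmark preprocessing:
#   data/DATA_RAW/in-hospital-mortality/train_note/*.csv
#   data/DATA_RAW/in-hospital-mortality/test_note/*.csv
root = Path("data/DATA_RAW/in-hospital-mortality")
for split in ("train_note", "test_note"):
    files = sorted((root / split).glob("*.csv"))
    print(f"{split}: {len(files)} note files")
```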
python graph_construction/prepare_notes/extract_cleaned_notes.py
python graph_construction/prepare_notes/create_hyper_df.py

extract_cleaned_notes.py cleans the clinical notes in data/DATA_RAW/in-hospital-mortality, adding a "Fixed TEXT" column to each CSV file. Word2vec token embeddings with 100 dimensions are created and saved in data/DATA_RAW/root/word2vec_100.
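The embedding step itself is implemented in extract_cleaned_notes.py; the snippet below is only a rough illustration of training 100-dimensional Word2vec token embeddings, assuming gensim 4.x and whitespace-tokenized "Fixed TEXT" (paths and parameters other than the 100-dimensional size are illustrative, not necessarily the repo's):

```python
from pathlib import Path

import pandas as pd
from gensim.models import Word2Vec

# Collect whitespace-tokenized sentences from the cleaned "Fixed TEXT" column.
sentences = []
for csv_path in Path("data/DATA_RAW/in-hospital-mortality/train_note").glob("*.csv"):
    df = pd.read_csv(csv_path)
    sentences.extend(text.split() for text in df["Fixed TEXT"].dropna())

# Train 100-dimensional token embeddings and save them where later steps expect them.
w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
w2v.save("data/DATA_RAW/root/word2vec_100")
```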
create_hyper_df.py creates a dataframe from data/DATA_RAW/in-hospital-mortality in which each row represents a single word. The results are stored in data/DATA_PRE/in-hospital-mortality, divided into two folders (train_hyper and test_hyper).
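For intuition, a word-level dataframe of this kind can be built by exploding each cleaned note into one row per word; the column names and file names below are hypothetical, and create_hyper_df.py defines the actual schema:

```python
import pandas as pd

# Explode each cleaned note into one row per word, keeping note- and
# category-level identifiers so hyperedges can be formed later.
df = pd.read_csv("data/DATA_RAW/in-hospital-mortality/train_note/example.csv")
rows = [
    {"NOTE_ID": note_id, "CATEGORY": category, "WORD": word}
    for note_id, category, text in zip(df["ROW_ID"], df["CATEGORY"], df["Fixed TEXT"])
    if isinstance(text, str)
    for word in text.split()
]
hyper_df = pd.DataFrame(rows)
hyper_df.to_csv("data/DATA_PRE/in-hospital-mortality/train_hyper/example.csv", index=False)
```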
python graph_construction/prepare_notes/PygNotesGraphDataset.py --split train
python graph_construction/prepare_notes/PygNotesGraphDataset.py --split test

PygNotesGraphDataset.py creates multi-level hypergraphs with cutoff and saves them in data/IMDB_HCUT/in-hospital-mortality.
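PygNotesGraphDataset.py is the authoritative construction; the sketch below only illustrates how a word/note/taxonomy hypergraph can be packed into a torch_geometric Data object, with hyperedges stored as a node-to-hyperedge incidence edge_index (all variable names and the toy numbers are illustrative):

```python
import torch
from torch_geometric.data import Data

# Each word occurrence is a node; hyperedges group words that share a note
# (note level) or share a note taxonomy/category (taxonomy level).
word_ids = [0, 1, 2, 3, 4]              # word nodes
note_of_word = [0, 0, 1, 1, 1]          # note-level hyperedge of each word
taxonomy_of_note = {0: 0, 1: 0}         # both notes belong to taxonomy 0

num_notes = len(set(note_of_word))
# Hyperedges are indexed with note hyperedges first, then taxonomy hyperedges.
node_index, hyperedge_index = [], []
for w, n in zip(word_ids, note_of_word):
    node_index += [w, w]
    hyperedge_index += [n, num_notes + taxonomy_of_note[n]]

x = torch.randn(len(word_ids), 100)     # e.g. 100-dim Word2vec embeddings
edge_index = torch.tensor([node_index, hyperedge_index], dtype=torch.long)
graph = Data(x=x, edge_index=edge_index, y=torch.tensor([0]))
print(graph)
```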
python tmhgnn/train.py
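tmhgnn/train.py contains the actual training logic for the model in the paper; the following is only a minimal, generic sketch of training a hypergraph classifier on Data objects of the form above (the model class and hyperparameters are placeholders, not the repo's):

```python
import torch
import torch.nn.functional as F
from torch_geometric.loader import DataLoader
from torch_geometric.nn import HypergraphConv, global_mean_pool

class TinyHGNN(torch.nn.Module):
    def __init__(self, in_dim=100, hidden=64, num_classes=2):
        super().__init__()
        self.conv1 = HypergraphConv(in_dim, hidden)
        self.conv2 = HypergraphConv(hidden, hidden)
        self.lin = torch.nn.Linear(hidden, num_classes)

    def forward(self, data):
        h = F.relu(self.conv1(data.x, data.edge_index))
        h = F.relu(self.conv2(h, data.edge_index))
        h = global_mean_pool(h, data.batch)   # pool words into a patient-level vector
        return self.lin(h)

def train(dataset, epochs=10):
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    model = TinyHGNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for batch in loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(batch), batch.y)
            loss.backward()
            opt.step()
    return model
```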