The dataset consists of 100 patients. For each, we have the full gene expression levels as input (1022 features) and a label consisting of the patient breast cancer subtype: LUMINAL A or LUMINAL B.
Firts, the dataset is represented with a Graph (network) where:
- Each network node corresponds to a patient
- For each node (patient) the feature vector is the entire gene expression profile of the patient
- Node labels are the patient class (Luminal A/ Luminal B)
- Since edges are not provided, they are computed using Pearson correlation coefficient.
The aim is to predict node labels. This is done:
- With MLP
- With GCN (Graph Convolution Network)
- With GAT (Graph Attention Network)