CS224W-Final-Project

Introduction

RelBench is a recently introduced benchmark for relational deep learning. It includes datasets from domains such as e-commerce and healthcare, with tasks spanning node classification, node regression, and link prediction. Evaluation metrics include AUROC (classification), MAE (regression), and MAP (link prediction), which let us compare our model against other architectures on each task type.
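
For readers unfamiliar with the three metrics, the following is a minimal pure-Python sketch of how each can be computed. The function names are hypothetical and are not RelBench's API. The MAP normalization shown (dividing by `min(len(relevant), k)`) is one common convention and may differ from the one RelBench uses.

```python
def mae(y_true, y_pred):
    """Mean absolute error, as used for node regression."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(P * N) and is meant
    for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision_at_k(relevant, ranked, k):
    """AP@k for one query: `relevant` is a set, `ranked` a predicted list."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i  # precision at this cut-off
    # Assumed normalization; other conventions exist.
    return total / min(len(relevant), k) if relevant else 0.0


def map_at_k(relevant_sets, ranked_lists, k):
    """Mean of AP@k over all queries, as used for link prediction."""
    aps = [
        average_precision_at_k(r, p, k)
        for r, p in zip(relevant_sets, ranked_lists)
    ]
    return sum(aps) / len(aps)
```

Higher is better for AUROC and MAP, while lower is better for MAE, so results across task types are not directly comparable.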

Beyond providing structured datasets for testing relational deep learning models, RelBench is a useful framework for exploring how graph learning can automate much of the tedious data wrangling and cleaning typically required in domains like e-commerce, healthcare, and consumer analytics. By modeling relational databases as heterogeneous graphs, RelBench lets us examine how graph-based learning can simplify and automate tasks such as linking disparate tables, handling primary-foreign key relationships, and identifying missing data. The automation potential is significant: relational deep learning has been reported to reduce the hours of human work required to solve a new task by 96% on average, from 12.3 to 0.5 hours. This efficiency lets data scientists focus on higher-level analysis rather than data preprocessing, while the relational structure of the data is preserved. RelBench therefore gives us a platform to test and refine models that reduce manual workload and make it easier to extract insights across domains.
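
To make the "relational database as heterogeneous graph" idea concrete, here is a minimal sketch under stated assumptions: each table becomes a node type, each row a node, and each primary-foreign key link a pair of directed edges. The toy tables, column names, and the `build_hetero_graph` helper are all hypothetical and are not part of RelBench.

```python
from collections import defaultdict

# Hypothetical toy database: table name -> list of rows.
tables = {
    "customers": [{"customer_id": 1}, {"customer_id": 2}],
    "orders": [
        {"order_id": 10, "customer_id": 1},
        {"order_id": 11, "customer_id": 1},
        {"order_id": 12, "customer_id": 2},
    ],
}
primary_keys = {"customers": "customer_id", "orders": "order_id"}
# Each entry: (child table, foreign-key column, parent table).
foreign_keys = [("orders", "customer_id", "customers")]


def build_hetero_graph(tables, primary_keys, foreign_keys):
    """Return per-table node indices and typed edge lists."""
    # One node type per table; a row's node index is its position.
    node_index = {
        name: {row[primary_keys[name]]: i for i, row in enumerate(rows)}
        for name, rows in tables.items()
    }
    edges = defaultdict(list)
    for child, fk_col, parent in foreign_keys:
        for i, row in enumerate(tables[child]):
            j = node_index[parent].get(row.get(fk_col))
            if j is None:
                continue  # missing or dangling foreign key: no edge
            # Forward edge (child row -> parent row) and its reverse.
            edges[(child, fk_col, parent)].append((i, j))
            edges[(parent, "rev_" + fk_col, child)].append((j, i))
    return node_index, dict(edges)


node_index, edges = build_hetero_graph(tables, primary_keys, foreign_keys)
```

Adding the reverse edge type lets information flow in both directions across a key link, which is a common choice when message passing over such graphs.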

In this project, we aim to build a general model that can handle different kinds of relational databases, adapting to the structures and relationships present in each domain, to further streamline relational data processing and analysis. Our primary model is a heterogeneous graph transformer with a novel attention mechanism, Rel-Attention. Rel-Attention extends traditional attention mechanisms by allowing entities within a relational database to attend to related entities via primary-foreign key links. The model uses multiple attention matrices, chosen according to the task. For node regression and classification, we will use two attention matrices for connections through primary-foreign key links. For link prediction, we add a third term that allows relations themselves to attend to entities, embedding relation-specific information within each entity embedding. We will explore initializing relation-type embeddings with a pre-trained language model (e.g., BERT) to encode semantic information into edge embeddings. We will also test different aggregation methods (e.g., sum, mean, max pooling) to optimize the attention calculations.
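
As a rough illustration of attention restricted to primary-foreign key neighbors, the sketch below has each node attend only to the nodes it is linked to, then combines the per-relation results with sum, mean, or max. This is not the project's Rel-Attention implementation: it assumes plain scaled dot-product attention with no learned projections, omits the relation-to-entity term, and uses hypothetical names and data layouts throughout.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def attend(query, neighbor_feats):
    """Scaled dot-product attention of one node over its linked neighbors."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, h) / scale for h in neighbor_feats])
    dim = len(neighbor_feats[0])
    return [
        sum(w * h[d] for w, h in zip(weights, neighbor_feats))
        for d in range(dim)
    ]


def aggregate(messages, how="mean"):
    """Combine per-relation messages elementwise."""
    cols = list(zip(*messages))
    if how == "sum":
        return [sum(c) for c in cols]
    if how == "mean":
        return [sum(c) / len(c) for c in cols]
    if how == "max":
        return [max(c) for c in cols]
    raise ValueError(f"unknown aggregation: {how}")


def neighbor_attention_layer(feats, neighbors, how="mean"):
    """feats: node id -> vector.
    neighbors: relation name -> {node id -> list of linked node ids}.
    Node ids are assumed unique across tables (hypothetical layout).
    """
    out = {}
    for node, x in feats.items():
        messages = [
            attend(x, [feats[n] for n in nbrs[node]])
            for nbrs in neighbors.values()
            if nbrs.get(node)  # skip relations where the node has no links
        ]
        # Nodes with no key links keep their input features.
        out[node] = aggregate(messages, how) if messages else list(x)
    return out
```

Restricting attention to key-linked neighbors keeps the cost proportional to the number of links rather than to the square of the number of rows, which is the usual motivation for sparse attention on graphs.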

RandNMR73/CS224W-Final-Project
