Reference Implementation Catalog

This catalog is a collection of repositories for various machine learning techniques and algorithms implemented at the Vector Institute. The table has the following columns:

Repository: Link to the GitHub repo.
Description: A brief introduction to the repository stating its purpose and links to published research papers.
Algorithms: List of ML algorithms demonstrated in the repo.
No. of datasets: Total number of datasets utilized in the repo.
Public Datasets: Links to any publicly available data. This is a subset of the total datasets used in the repo.

Repository	Description	Algorithms	No. of datasets	Public Datasets	Year
RAG	This repository contains demos for various Retrieval-Augmented Generation techniques using different libraries.	Cloud search via LlamaHub, Document search via LangChain, LlamaIndex for OpenAI and Cohere models, Hybrid Search via Weaviate Vector Store, Evaluation via RAGAS library, Web search via LangChain	3	Vector's 2021 Annual Report, PubMed Doc, Banking Deposits	2024
Finetuning and Alignment	This repository contains demos for LLM finetuning techniques focused on reducing computational cost.	DDP, FSDP, Instruction Tuning, LoRA, DoRA, QLoRA, Supervised finetuning	3	samsum, imdb, Bias-DeBiased	2024
Prompt Engineering Laboratory	This repository contains demos for various Prompt Engineering techniques, along with examples of bias quantification and text classification.	Stereotypical Bias Analysis, Sentiment inference, Finetuning using HF Library, Activation Generation, Train and Test Model for Activations without Prompts, RAG, ABSA, Few shot prompting, Zero shot prompting (Stochastic, Greedy, Likelihood Estimation), Role play prompting, LLM Prompt Summarization, Zero shot and few shot prompt translation, Few shot CoT, Zero shot CoT, Self-Consistent CoT prompting (Zero shot, 5-shot), Balanced Choice of Plausible Alternatives, Bootstrap Ensembling (Generation & MC formulation), Vote Ensembling	11	CrowS-Pairs, sst5, czarnowska templates, cnn_dailymail, ag_news, Weather and sports data, Other	2024
bias-mitigation-unlearning	This repository contains code for the paper Can Machine Unlearning Reduce Social Bias in Language Models? which was published at EMNLP'24 in the Industry track. Authors are Omkar Dige, Diljot Arneja, Tsz Fung Yau, Qixuan Zhang, Mohammad Bolandraftar, Xiaodan Zhu, Faiza Khan Khattak.	PCGU, Task vectors and DPO for Machine Unlearning	20	BBQ, Stereoset, Link1, Link2	2024
cyclops-workshop	This repository contains demos for using the CyclOps package for clinical ML evaluation and monitoring.	XGBoost	1	Diabetes 130-US hospitals dataset for years 1999-2008	2024
odyssey	This library was developed as part of the research for the paper EHRMamba: Towards Generalizable and Scalable Foundation Models for Electronic Health Records, published on arXiv in 2024. Authors are Adibvafa Fallahpour, Mahshid Alinoori, Wenqian Ye, Xu Cao, Arash Afkanpour, Amrit Krishnan.	EHRMamba, XGBoost, Bi-LSTM	1	MIMIC-IV	2024
Diffusion model bootcamp	This repository contains demos for various diffusion models for tabular and time series data.	TabDDPM, TabSyn, ClavaDDPM, CSDI, TSDiff	12	Physionet Challenge 2012, wiki2000	2024
News Media Bias	This repository contains code for libraries and experiments to recognise and evaluate bias and fakeness within news media articles via LLMs.	Bias evaluation via LLMs, finetuning and data annotation via LLM for fake news detection, Supervised finetuning for debiasing sentences, NER for biased phrases via LLMs, Evaluation using the DeepEval library	4	News Media Bias Full data, Toxigen, Nela GT, Debiaser data	2024
News Media Bias Plus	A continuation of the News Media Bias project, this repository contains code for libraries and experiments to collect and annotate data, and to recognise and evaluate bias and fakeness within news media articles via LLMs and VLMs.	Bias evaluation via LLMs and VLMs, finetuning and data annotation via LLM for fake news detection, supervised finetuning for debiasing sentences, NER for biased entities via LLMs	2	News Media Bias Plus Full Data, NMB Plus Named Entities	2024
Anomaly Detection Project	This repository contains demos for various supervised and unsupervised anomaly detection techniques in domains such as Fraud Detection, Network Intrusion Detection, System Monitoring, and Image and Video Analysis.	AMNet, GCN, SAGE, OCGNN, DON, AdONE, MLP, FTTransformer, DeepSAD, XGBoost, CBLOF, CFA for Target-Oriented Anomaly Localization, Draem for surface anomaly detection, Logistic Regression, CatBoost, Random Forest, Diversity Measurable Anomaly Detection, Two-stream I3D Convolutional Network, DeepCNN, LightGBM, Isolation Forest, TabNet, AutoEncoder, Internal Contrastive Learning	5	On Vector Cluster	2023
SSL Bootcamp	This repository contains demos for self-supervised techniques such as contrastive learning, masked modeling, and self-distillation.	Internal Contrastive Learning, LatentOD-AD, TabRet, SimMTM, Data2Vec	52	Beijing Air Quality, BRFSS, Stroke Prediction, STL10, Link1, Link2	2023
Causal Inference Lab	This repository contains code to estimate the causal effects of an intervention on some measurable outcome, primarily in the health domain.	Naive ATE, TARNet, DragonNet, Double Machine Learning, T Learner, S Learner, Inverse Propensity based Learner, PEHE, MAE	5	Infant Health and Development Program, Jobs, Twins, Berkeley admission, Government Census, COMPAS	2023
HV-Ai-C	This repository implements a Reinforcement Learning agent to optimize energy consumption within Data Centers.	RL agents performing Random action, Fixed action, Q Learning, Hyperspace Neighbor Penetration	-	No public datasets available	2023
Flex Model	This repository contains code for the paper FlexModel: A Framework for Interpretability of Distributed Large Language Models. Authors are Matthew Choi, Muhammad Adil Asif, John Willes, David Emerson.	Distributed Interpretability	-	No public datasets available	2023
VBLL	This repository contains code for the paper Variational Bayesian Last Layers. Authors are James Harrison, John Willes, Jasper Snoek.	Variational Bayesian Last Layers	2	MNIST, FashionMNIST	2023
Recommendation Systems	This repository contains demos for various RecSys techniques such as Collaborative Filtering, Knowledge Graph, RL-based, Sequence-Aware, and Session-based methods.	SVD++, NeuMF, Plot based, Two tower, SVD, KG based, SlateQ, BST, Simple Association Rules, first-order Markov Chains, Sequential Rules, RNN, Neural Attentive Session, BERT4Rec, A2SVDModel, SLi-Rec	7	Amazon-recsys, careervillage, movielens-recsys, tmdb, LastFM, yoochoose	2022
Forecasting with Deep Learning	This repository contains demos for a variety of forecasting techniques, including univariate and multivariate time series forecasting and spatiotemporal forecasting.	Exponential Smoothing, Persistence Forecasting, Mean Window Forecast, Prophet, NeuralProphet, N-BEATS, DeepAR, Autoformer, DLinear, NHITS	11	Canadian Weather Station Data, BoC Exchange rate, Electricity Consumption, Road Traffic Occupancy, Influenza-Like Illness Patient Ratios, Walmart M5 Retail Product Sales, WeatherBench, Grocery Store Sales, Economic Data with Food CPI	2022
Prompt Engineering	This repository contains demos for a variety of Prompt Engineering techniques such as fairness measurement via sentiment analysis, finetuning, prompt tuning, and prompt ensembling.	Bias Quantification & Probing, Stereotypical Bias Analysis, Binary sentiment analysis task, Finetuning using HF Library, Gradient-Search for Instruction Prefix, GRIPS for Instruction Prefix, LLM Summarization, LLM Classification	10	CrowS-Pairs, sst5, cnn_dailymail, ag_news, Tweet-data, Other	2022
NAA	This repository contains code for the paper Bringing the State-of-the-Art to Customers: A Neural Agent Assistant Framework for Customer Service Support, published at EMNLP'22 in the Industry track. Authors are Stephen Obadinma, Faiza Khan Khattak, Shirley Wang, Tania Sidhorn, Elaine Lau, Sean Robertson, Jingcheng Niu, Winnie Au, Alif Munim, Karthik Raja Kalaiselvi Bhaskar.	Context Retrieval using SBERT bi-encoder, Context Retrieval using SBERT cross-encoder, Intent identification using BERT, Few Shot Multi-Class Text Classification with BERT, Multi-Class Text Classification with BERT, Response generation via GPT2	5	ELI5, MSMARCO	2022
Privacy Enhancing Technologies	This repository contains demos for Privacy, Homomorphic Encryption, Horizontal and Vertical Federated Learning, MIA, and PATE.	Vanilla SGD, DP SGD, DP Logistic Regression, Homomorphic Encryption for MLP, Horizontal FL, Horizontal FL on MLP, Membership Inference Attacks (MIA) using DP, MIA using SAM, PATE, Vertical FL	9	Heart Disease, Credit Card Fraud, Breast Cancer Data, TCGA, CIFAR10, Home Credit Default Risk, Yelp, Airbnb	2021
SSGVQAP	This repository contains code for the paper A Smart System to Generate and Validate Question Answer Pairs for COVID-19 Literature, which was accepted at ACL'20. Authors are Rohan Bhambhoria, Luna Feng, Dawn Sepehr, John Chen, Conner Cowling, Sedef Kocak, Elham Dolatabadi.	An Active Learning Strategy for Data Selection, AL-Uncertainty, AL-Clustering	1	CORD-19	2021
foodprice-forecasting	This repository replicates the experiments described on pages 16 and 17 of the 2022 Edition of Canada's Food Price Report.	Time series forecasting using Prophet, Time series forecasting using NeuralProphet, Interpretable time series forecasting using N-BEATS, Ensemble of the above methods	3	FRED Economic Data	2021
Computer_Vision_Project	This repository tackles different problems such as defect detection, footprint extraction, road obstacle detection, traffic incident detection, and segmentation of medical procedures.	Semantic segmentation using Unet, Unet++, FCN, DeepLabv3, Anomaly segmentation	11	SpaceNet Building Detection V2, MVTEC, ICDAR2015, PASCAL_VOC, DOTA, AVA, UCF101-24, J-HMDB-21	2020

Note

Many repositories contain code for reference purposes only. Running them may require updates to the code and environment files.
Links are provided only for publicly available datasets. Many datasets used in the repositories are available only on the Vector cluster.
