VectorInstitute/reference-implementation-catalog

Reference Implementation Catalog

This catalog is a collection of repositories for various Machine Learning techniques and algorithms implemented at the Vector Institute. The table includes the following columns:

  • Repository: Link to the GitHub repo.
  • Description: A brief introduction to the repository, stating its purpose and linking to any published research papers.
  • Algorithms: List of ML algorithms demonstrated in the repo.
  • No. of datasets: Total number of datasets used in the repo.
  • Public Datasets: Links to any publicly available data. This is a subset of the total datasets mentioned in the repo.

| Repository | Description | Algorithms | No. of datasets | Public Datasets | Year |
|---|---|---|---|---|---|
| RAG | This repository contains demos for various Retrieval Augmented Generation techniques using different libraries. | Cloud search via LlamaHub, Document search via LangChain, LlamaIndex for OpenAI and Cohere models, Hybrid Search via Weaviate Vector Store, Evaluation via RAGAS library, Websearch via LangChain | 3 | Vectors 2021 Annual Report, PubMed Doc, Banking Deposits | 2024 |
| Finetuning and Alignment | This repository contains demos for finetuning techniques for LLMs focussed on reducing computational cost. | DDP, FSDP, Instruction Tuning, LoRA, DoRA, QLoRA, Supervised finetuning | 3 | samsum, imdb, Bias-DeBiased | 2024 |
| Prompt Engineering Laboratory | This repository contains demos for various Prompt Engineering techniques, along with examples of bias quantification and text classification. | Stereotypical Bias Analysis, Sentiment inference, Finetuning using HF Library, Activation Generation, Train and Test Model for Activations without Prompts, RAG, ABSA, Few shot prompting, Zero shot prompting (Stochastic, Greedy, Likelihood Estimation), Role play prompting, LLM Prompt Summarization, Zero shot and few shot prompt translation, Few shot CoT, Zero shot CoT, Self-Consistent CoT prompting (Zero shot, 5-shot), Balanced Choice of Plausible Alternatives, Bootstrap Ensembling (Generation & MC formulation), Vote Ensembling | 11 | Crows-pairs, sst5, czarnowska templates, cnn_dailymail, ag_news, Weather and sports data, Other | 2024 |
| bias-mitigation-unlearning | This repository contains code for the paper "Can Machine Unlearning Reduce Social Bias in Language Models?", which was published at EMNLP'24 in the Industry track. Authors are Omkar Dige, Diljot Arneja, Tsz Fung Yau, Qixuan Zhang, Mohammad Bolandraftar, Xiaodan Zhu, Faiza Khan Khattak. | PCGU, Task vectors and DPO for Machine Unlearning | 20 | BBQ, Stereoset, Link1, Link2 | 2024 |
| cyclops-workshop | This repository contains demos for using the CyclOps package for clinical ML evaluation and monitoring. | XGBoost | 1 | Diabetes 130-US hospitals dataset for years 1999-2008 | 2024 |
| odyssey | This is a library created with research done for the paper "EHRMamba: Towards Generalizable and Scalable Foundation Models for Electronic Health Records", published at ArXiv'24. Authors are Adibvafa Fallahpour, Mahshid Alinoori, Wenqian Ye, Xu Cao, Arash Afkanpour, Amrit Krishnan. | EHRMamba, XGBoost, Bi-LSTM | 1 | MIMIC-IV | 2024 |
| Diffusion model bootcamp | This repository contains demos for various diffusion models for tabular and time series data. | TabDDPM, TabSyn, ClavaDDPM, CSDI, TSDiff | 12 | Physionet Challenge 2012, wiki2000 | 2024 |
| News Media Bias | This repository contains code for libraries and experiments to recognise and evaluate bias and fakeness within news media articles via LLMs. | Bias evaluation via LLMs, finetuning and data annotation via LLM for fake news detection, Supervised finetuning for debiasing sentence, NER for biased phrases via LLMs, Evaluate using DeepEval library | 4 | News Media Bias Full data, Toxigen, Nela GT, Debiaser data | 2024 |
| News Media Bias Plus | A continuation of the News Media Bias project, this repository contains code for libraries and experiments to collect and annotate data, and to recognise and evaluate bias and fakeness within news media articles via LLMs and LVMs. | Bias evaluation via LLMs and VLMs, finetuning and data annotation via LLM for fake news detection, supervised finetuning for debiasing sentence, NER for biased entities via LLMs | 2 | News Media Bias Plus Full Data, NMB Plus Named Entities | 2024 |
| Anomaly Detection Project | This repository contains demos for various supervised and unsupervised anomaly detection techniques in domains such as Fraud Detection, Network Intrusion Detection, System Monitoring, and Image and Video Analysis. | AMNet, GCN, SAGE, OCGNN, DON, AdONE, MLP, FTTransformer, DeepSAD, XGBoost, CBLOF, CFA for Target-Oriented Anomaly Localization, Draem for surface anomaly detection, Logistic Regression, CATBoost, Random Forest, Diversity Measurable Anomaly Detection, Two-stream I3D Convolutional Network, DeepCNN, LightGBM, Isolation Forest, TabNet, AutoEncoder, Internal Contrastive Learning | 5 | On Vector Cluster | 2023 |
| SSL Bootcamp | This repository contains demos for self-supervised techniques such as contrastive learning, masked modeling and self distillation. | Internal Contrastive Learning, LatentOD-AD, TabRet, SimMTM, Data2Vec | 52 | Beijing Air Quality, BRFSS, Stroke Prediction, STL10, Link1, Link2 | 2023 |
| Causal Inference Lab | This repository contains code to estimate the causal effects of an intervention on some measurable outcome, primarily in the health domain. | Naive ATE, TARNet, DragonNet, Double Machine Learning, T Learner, S Learner, Inverse Propensity based Learner, PEHE, MAE | 5 | Infant Health and Development Program, Jobs, Twins, Berkeley admission, Government Census, Compas | 2023 |
| HV-Ai-C | This repository implements a Reinforcement Learning agent to optimize energy consumption within Data Centers. | RL agents performing Random action, Fixed action, Q Learning, Hyperspace Neighbor Penetration | - | No public datasets available | 2023 |
| Flex Model | This repository contains code for the paper "FlexModel: A Framework for Interpretability of Distributed Large Language Models". Authors are Matthew Choi, Muhammad Adil Asif, John Willes, David Emerson. | Distributed Interpretability | - | No public datasets available | 2023 |
| VBLL | This repository contains code for the paper "Variational Bayesian Last Layers". Authors are James Harrison, John Willes, Jasper Snoek. | Variational Bayesian Last Layers | 2 | MNIST, FashionMNIST | 2023 |
| Recommendation Systems | This repository contains demos for various RecSys techniques such as Collaborative Filtering, Knowledge Graph, RL based, Sequence Aware, Session based etc. | SVD++, NeuMF, Plot based, Two tower, SVD, KG based, SlateQ, BST, Simple Association Rules, first-order Markov Chains, Sequential Rules, RNN, Neural Attentive Session, BERT4rec, A2SVDModel, SLi-Rec | 7 | Amazon-recsys, careervillage, movielens-recsys, tmdb, LastFM, yoochoose | 2022 |
| Forecasting with Deep Learning | This repository contains demos for a variety of forecasting techniques for Univariate and Multivariate time series, spatiotemporal forecasting etc. | Exponential Smoothing, Persistence Forecasting, Mean Window Forecast, Prophet, NeuralProphet, NBeats, DeepAR, Autoformer, DLinear, NHITS | 11 | Canadian Weather Station Data, BoC Exchange rate, Electricity Consumption, Road Traffic Occupancy, Influenza-Like Illness Patient Ratios, Walmart M5 Retail Product Sales, WeatherBench, Grocery Store Sales, Economic Data with Food CPI | 2022 |
| Prompt Engineering | This repository contains demos for a variety of Prompt Engineering techniques such as fairness measurement via sentiment analysis, finetuning, prompt tuning, prompt ensembling etc. | Bias Quantification & Probing, Stereotypical Bias Analysis, Binary sentiment analysis task, Finetuning using HF Library, Gradient-Search for Instruction Prefix, GRIPS for Instruction Prefix, LLM Summarization, LLM Classification | 10 | Crows-pairs, sst5, cnn_dailymail, ag_news, Tweet-data, Other | 2022 |
| NAA | This repository contains code for the paper "Bringing the State-of-the-Art to Customers: A Neural Agent Assistant Framework for Customer Service Support", published at EMNLP'22 in the industry track. Authors are Stephen Obadinma, Faiza Khan Khattak, Shirley Wang, Tania Sidhorn, Elaine Lau, Sean Robertson, Jingcheng Niu, Winnie Au, Alif Munim, Karthik Raja Kalaiselvi Bhaskar. | Context Retrieval using SBERT bi-encoder, Context Retrieval using SBERT cross-encoder, Intent identification using BERT, Few Shot Multi-Class Text Classification with BERT, Multi-Class Text Classification with BERT, Response generation via GPT2 | 5 | ELI5, MSMARCO | 2022 |
| Privacy Enhancing Technologies | This repository contains demos for Privacy, Homomorphic Encryption, Horizontal and Vertical Federated Learning, MIA, and PATE. | Vanilla SGD, DP SGD, DP Logistic Regression, Homomorphic Encryption for MLP, Horizontal FL, Horizontal FL on MLP, Membership Inference Attacks (MIA) using DP, MIA using SAM, PATE, Vertical FL | 9 | Heart Disease, Credit Card Fraud, Breast Cancer Data, TCGA, CIFAR10, Home Credit Default Risk, Yelp, Airbnb | 2021 |
| SSGVQAP | This repository contains code for the paper "A Smart System to Generate and Validate Question Answer Pairs for COVID-19 Literature", which was accepted in ACL'20. Authors are Rohan Bhambhoria, Luna Feng, Dawn Sepehr, John Chen, Conner Cowling, Sedef Kocak, Elham Dolatabadi. | An Active Learning Strategy for Data Selection, AL-Uncertainty, AL-Clustering | 1 | CORD-19 | 2021 |
| foodprice-forecasting | This repository replicates the experiments described on pages 16 and 17 of the 2022 Edition of Canada's Food Price Report. | Time series forecasting using Prophet, Time series forecasting using NeuralProphet, Interpretable time series forecasting using N-BEATS, Ensemble of the above methods | 3 | FRED Economic Data | 2021 |
| Computer_Vision_Project | This repository tackles different problems such as defect detection, footprint extraction, road obstacle detection, traffic incident detection, and segmentation of medical procedures. | Semantic segmentation using Unet, Unet++, FCN, DeepLabv3, Anomaly segmentation | 11 | SpaceNet Building Detection V2, MVTEC, ICDAR2015, PASCAL_VOC, DOTA, AVA, UCF101-24, J-HMDB-21 | 2020 |

Note

  • Many repositories contain code for reference purposes only. Running them may require updates to the code and environment files.
  • Links are provided only for publicly available datasets. Many datasets used in the repositories are available only on the Vector cluster.
