A thesis project exploring the application of Contrastive Weight Tying (CWT) techniques to the BabyLM Challenge for sample-efficient language model pretraining.
This repository contains the implementation and research for a thesis investigating how Contrastive Weight Tying (CWT) can be applied to improve language model training efficiency in the context of the BabyLM Challenge. The project aims to develop more parameter-efficient language models through novel weight sharing and contrastive learning approaches.
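To make the idea concrete, here is a minimal sketch of what a CWT-style training objective can look like: instead of a softmax over the full vocabulary, each output representation is scored against the tied input embedding of its target token, with the other targets in the batch serving as in-batch negatives. The function name, shapes, and hyperparameters are illustrative assumptions, not the thesis implementation.

```python
import torch
import torch.nn.functional as F

def cwt_loss(hidden: torch.Tensor,
             target_ids: torch.Tensor,
             embedding: torch.Tensor) -> torch.Tensor:
    """Contrastive Weight Tying objective (illustrative sketch).

    hidden:     (N, d) output representations at prediction positions
    target_ids: (N,)   ids of the tokens to be predicted
    embedding:  (V, d) input embedding matrix, tied with the output side
    """
    targets = embedding[target_ids]        # (N, d) positive embeddings
    logits = hidden @ targets.T            # (N, N) similarity matrix
    labels = torch.arange(hidden.size(0))  # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

Because the loss only involves the N tokens in the batch rather than the full vocabulary, no output projection matrix is needed, which is where the parameter savings come from.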
The BabyLM Challenge is a shared task focused on training sample-efficient language models on developmentally plausible corpora. The challenge aims to:
- Train language models using human-scale data (≤100M words)
- Develop cognitively plausible learning approaches
- Bridge the gap between human language acquisition and machine learning
- Democratize research into language model pretraining
Key aspects of the challenge include:
- Strict Track: Models trained on ≤100M words
- Strict-Small Track: Models trained on ≤10M words
- Evaluation on diverse linguistic tasks, including BLiMP and GLUE
The `headless-lm` folder explains how to install the necessary dependencies and provides shell scripts for scheduling jobs on a SLURM-based HPC cluster.
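For orientation, a SLURM submission script typically looks like the following. This is a generic sketch: the job name, partition, paths, and training script are placeholders, not the actual files shipped in `headless-lm`.

```shell
#!/bin/bash
#SBATCH --job-name=babylm-cwt        # placeholder job name
#SBATCH --gres=gpu:1                 # request one GPU
#SBATCH --time=24:00:00              # wall-clock limit
#SBATCH --output=logs/%x-%j.out      # stdout/stderr log file

# Site-specific environment setup; adjust to your cluster.
module load python

# Hypothetical training entry point and config path.
python train.py --config configs/strict_small.yaml
```

Such a script is submitted with `sbatch script.sh`; the scripts in `headless-lm` follow the same pattern with project-specific arguments.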