Hi,
I have a keen interest in this research, and I wanted to experiment with it on my own without the heavy coupling the current code has with OpenAI Embeddings and Phi models. So I've attempted to rewrite the entire codebase to get it working with the newer Transformers APIs. I've seen people having trouble getting it working with the new Llama 3.2 1B variant or the Llama 3.1 Instruct variant (I think the only variant that works right now is Llama 3 8B Instruct).
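For reference, this is roughly how I load the model and tokenizer in my rewrite (the model ID and dtype below are my own choices, not anything taken from the original repo):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID and dtype are my own choices, not from the original repo.
model_id = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # my choice; the Makefile/paper may assume something else
    device_map="auto",
)

# Llama tokenizers don't ship a pad token, so I reuse EOS for padding.
tokenizer.pad_token = tokenizer.eos_token
```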
I was initially planning to see what kind of behavior I would get with an SLM, but I keep running into issues where I don't really know what to do.
Apart from that, I've noticed a few issues: the hyperparameters in the Makefile don't match the research paper, there's a subtle labeling bug where part of the query is already present in the label, the KB always has the correct answer at the 0th index, etc.
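For the KB ordering issue, the workaround I applied in my rewrite looks roughly like this (function and variable names are mine, not from the original code):

```python
import random

def shuffle_kb(passages, gold_idx=0, seed=None):
    """Shuffle KB passages so the gold passage isn't always at index 0.

    In the original data the correct answer always sits at position 0,
    which the model can trivially exploit. This returns the shuffled
    passages plus the gold passage's new position.
    """
    rng = random.Random(seed)
    order = list(range(len(passages)))
    rng.shuffle(order)
    shuffled = [passages[i] for i in order]
    new_gold_idx = order.index(gold_idx)  # track where the gold passage moved
    return shuffled, new_gold_idx
```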
But my main pain point is that in my implementation the loss plummets from around 5 to zero within the first 100 steps.
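In case it's relevant to the loss behavior, this is roughly how I build the labels in my rewrite, masking the prompt tokens with -100 so only the answer tokens contribute to the loss (again, the names here are mine, not from the original code):

```python
def build_labels(tokenizer, prompt, answer, max_length=1024):
    # Tokenize prompt and answer separately so I know where the prompt ends.
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + answer_ids + [tokenizer.eos_token_id])[:max_length]

    # Mask the prompt portion with -100 so the loss is computed only on the answer.
    labels = list(input_ids)
    labels[: len(prompt_ids)] = [-100] * min(len(prompt_ids), len(labels))

    return {"input_ids": input_ids, "labels": labels}
```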
I'd like to get your opinion on what a healthy training run should look like.