Hi,
I have a keen interest in this research, and I wanted to experiment with it on my own without the heavy coupling the current code has with OpenAI Embeddings and Phi models. So I've attempted to rewrite the entire codebase to get it working with the newer Transformers APIs. I've seen people having trouble getting it working with the new Llama 3.2 1B variant or the Llama 3.1 Instruct variant (I think the only variant that works right now is Llama 3 8B Instruct).
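For reference, this is roughly how I load the model and tokenizer in my rewrite (the model ID and dtype below are my own choices, not anything taken from the original repo):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID and dtype are my own choices, not from the original repo.
model_id = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # my choice; the Makefile/paper may assume something else
    device_map="auto",
)

# Llama tokenizers don't ship a pad token, so I reuse EOS for padding.
tokenizer.pad_token = tokenizer.eos_token
```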
I was initially planning to see what kind of behavior I would get with an SLM, but I keep running into issues where I don't really know what to do.
Apart from that, I've noticed a few issues: the hyperparameters in the Makefile don't match the research paper, there's a subtle labeling bug where part of the query is already present in the label, the KB always has the correct answer at the 0th index, etc.
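For the KB ordering issue, the workaround I applied in my rewrite looks roughly like this (function and variable names are mine, not from the original code):

```python
import random

def shuffle_kb(passages, gold_idx=0, seed=None):
    """Shuffle KB passages so the gold passage isn't always at index 0.

    In the original data the correct answer always sits at position 0,
    which the model can trivially exploit. This returns the shuffled
    passages plus the gold passage's new position.
    """
    rng = random.Random(seed)
    order = list(range(len(passages)))
    rng.shuffle(order)
    shuffled = [passages[i] for i in order]
    new_gold_idx = order.index(gold_idx)  # track where the gold passage moved
    return shuffled, new_gold_idx
```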
But my main pain point is that in my implementation the loss plummets from around 5 to zero within the first 100 steps.
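In case it's relevant to the loss behavior, this is roughly how I build the labels in my rewrite, masking the prompt tokens with -100 so only the answer tokens contribute to the loss (again, the names here are mine, not from the original code):

```python
def build_labels(tokenizer, prompt, answer, max_length=1024):
    # Tokenize prompt and answer separately so I know where the prompt ends.
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + answer_ids + [tokenizer.eos_token_id])[:max_length]

    # Mask the prompt portion with -100 so the loss is computed only on the answer.
    labels = list(input_ids)
    labels[: len(prompt_ids)] = [-100] * min(len(prompt_ids), len(labels))

    return {"input_ids": input_ids, "labels": labels}
```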
I'd like to get your opinion on what a healthy training run should look like.