Working with newer Transformer versions #89

@vpgits

Description

Hi,
I have a keen interest in this research, and I wanted to experiment with it on my own without the heavy coupling the current code has with OpenAI Embeddings and Phi models. So I've attempted to rewrite the entire codebase to get it working with the newer Transformers APIs. I've seen people having trouble getting it to work with the new Llama 3.2 1B variant or the Llama 3.1 Instruct variant (I think the only variant that works right now is Llama 3 8B Instruct).
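
For context, this is roughly how I'm loading the newer variants with the current Transformers API (the checkpoint name and dtype here are just my own choices, not anything from the original code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint I'm experimenting with -- the 1B instruct variant.
model_id = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is enough for the 1B model
    device_map="auto",
)

# Llama 3.x tokenizers ship without a pad token, so I reuse EOS for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```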
I was initially planning to see what kind of behavior I would get with an SLM, but I keep running into issues I don't really know how to resolve.
Apart from that, I've noticed a few issues: hyperparameters in the Makefile that don't match the research paper, a subtle labeling bug where part of the query is already present, the KB always having the correct answer at index 0, etc.
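
To illustrate the last two points, here is a minimal sketch of the kind of preprocessing I would expect (the field names `query`, `answer`, and `kb` are placeholders from my own rewrite, not identifiers from the original code):

```python
import random

IGNORE_INDEX = -100  # standard Hugging Face value for tokens excluded from the loss

def build_example(tokenizer, query, answer, kb):
    """Shuffle the KB so the gold entry isn't always at index 0,
    and mask the prompt portion of the labels."""
    kb = list(kb)
    random.shuffle(kb)  # otherwise the model can learn "always pick slot 0"

    prompt = "\n".join(kb) + "\n" + query
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"]

    input_ids = prompt_ids + answer_ids + [tokenizer.eos_token_id]
    # Loss only on the answer tokens; if query/prompt tokens leak into the
    # labels, the model can simply copy them and the loss collapses quickly.
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids + [tokenizer.eos_token_id]
    return {"input_ids": input_ids, "labels": labels}
```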
But my main pain point is that in my implementation the loss plummets from around 5 to zero within the first 100 steps.
I'd like to get your opinion on what a healthy training run should look like.
