- How to measure how well the model works
- Check similarity (rouge)
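A minimal sketch of the similarity check: ROUGE-1 recall is just unigram overlap between a reference and a candidate. This is a simplification — real ROUGE also reports precision/F1 and higher n-gram variants, and the function name here is my own.

```python
# Minimal ROUGE-1 recall sketch: fraction of reference unigrams that also
# appear in the candidate. No stemming or tokenizer tricks, just whitespace.
from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(ref[w], cand[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)
```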
- Human in the loop
- If the model has never been trained on a chocolate chip recipe, will it be able to make that recipe
- Have agent models evaluate the model
- One likes Cakey cookies, another likes crispy, and they all score the model
- Voting scheme to scale the votes of each of the agents
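The agent-judge idea above can be sketched as a weighted vote. The judge names, scores, and weights here are all hypothetical stand-ins — in practice each score would come from a separate model call.

```python
# Hypothetical LLM-as-judge setup: each agent has its own taste (cakey vs
# crispy) and scores the output 1-10; a weighted average combines the votes.
def aggregate(scores, weights):
    # weighted mean of the judges' scores
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total

scores  = {"cakey_judge": 8, "crispy_judge": 4, "neutral_judge": 6}
weights = {"cakey_judge": 1.0, "crispy_judge": 1.0, "neutral_judge": 2.0}
overall = aggregate(list(scores.values()), list(weights.values()))
```

Weighting lets you scale how much each agent's vote counts, e.g. trusting a neutral judge twice as much as the opinionated ones.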
- Models that are trained to generate an embedding for a whole sentence, a whole recipe, or just one ingredient
- Getting a fuzzier match, doesn't need to be "cookie" can be "biscuit"
- Not simple like a math problem, many right answers
- Pretty quickly the transformer models were doing better than the humans; earlier models had been jumbling words
- Changing the order of the question and the supporting text would change the response
- Say what you want and say what you don't want
- If you have a fixed set of words, your model is immediately out of date, new words being created
- Break words into parts (tokens)
- A vocabulary of 30,000-50,000 tokens instead of over 100k words
- Temperature at 1 leaves the model's distribution exactly as it came in
- Higher than 1 will be softer (flatter distribution)
- Creative writing (temp: 1.5)
- Lower than 1 will be sharper (more peaked distribution)
- Non-fiction writing (temp: 0.6)
- A super low temp (0.00001) may push it off in a weird direction: if there are 2 tokens that are pretty close to being correct, it will create a huge difference between them even though the real difference is pretty small
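The temperature notes above can be sketched as a softmax with the logits divided by T before exponentiating (the example logits are made up):

```python
import math

# Softmax with temperature: T=1 leaves the distribution as-is,
# T>1 flattens ("softer"), T<1 sharpens ("more peaked").
def softmax_t(logits, temp=1.0):
    scaled = [x / temp for x in logits]
    m = max(scaled)                         # subtract max for numeric stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits  = [2.0, 1.9, 0.5]                   # two tokens nearly tied
p_soft  = softmax_t(logits, 1.5)            # gap between the top two shrinks
p_sharp = softmax_t(logits, 0.00001)        # tiny gap becomes a landslide
```

This shows the super-low-temp effect from the notes: at T=0.00001 the 0.1 logit gap between the top two tokens turns into near-certainty for the first one.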

cos_sim
- Identify words that have similar uses
- Take the dot product of 2 word embeddings, normalized by their lengths, to see how close the words are to each other
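A minimal cos_sim sketch: dot product divided by the vector norms (equivalently, the dot product of unit-normalized vectors). The 3-d "embeddings" here are made up purely for illustration; real embeddings have hundreds of dimensions.

```python
import math

# Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means same direction.
def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

cookie  = [0.9, 0.1, 0.3]    # toy vectors, not real embeddings
biscuit = [0.8, 0.2, 0.3]    # similar use -> similar direction
brick   = [-0.5, 0.9, -0.1]  # different use -> different direction
```

With real embeddings this is what gives the fuzzier match: "biscuit" lands close to "cookie" even though the strings differ.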
Hugging Face
- has a bunch of abstractions
- Large collection of trained models
- Has the weights needed to do the generation


