Description
LlamaServerlessAzureRestEndpointModel, which is used to run 405B models, sets ignore_eos: str = "false" by default. The value is passed to the API as a string rather than a boolean, so the flag is not set correctly. As a result, the model continues generating past EOS and produces random tokens until it reaches the max_token limit.
Fix: set ignore_eos: bool = False. I have tested this fix on Calendar Planning.
We also need to test the other boolean flags, such as skip_special_tokens and use_beam_search. The Mistral model class has similar str-typed flags.
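A minimal sketch of why the type matters, using only the standard library. The two dataclasses below are hypothetical stand-ins, not the actual model class, and how the endpoint interprets the string is an assumption; the point is only that the serialized request body differs and that a non-empty string is truthy in Python.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical stand-ins for the request parameters; the real
# LlamaServerlessAzureRestEndpointModel may be structured differently.
@dataclass
class StrFlagParams:
    ignore_eos: str = "false"  # current default: a string


@dataclass
class BoolFlagParams:
    ignore_eos: bool = False  # proposed default: a real boolean


str_payload = json.dumps(asdict(StrFlagParams()))
bool_payload = json.dumps(asdict(BoolFlagParams()))

print(str_payload)   # {"ignore_eos": "false"}  (JSON string)
print(bool_payload)  # {"ignore_eos": false}    (JSON boolean)

# Any consumer that applies a truthiness check to the string
# will read it as "enabled", because non-empty strings are truthy.
print(bool("false"))  # True
```

The same check can be repeated for skip_special_tokens and use_beam_search by comparing the serialized payload before and after changing their types.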