Describe the feature
Currently, the OpenAISDK model only supports the chat/completions API, so in its _generate method the inference service applies a chat template and triggers CoT. CoT consumes a large number of tokens, slows down inference throughput (requests per minute), and increases the probability that an answer gets truncated.
To fix this, I propose adding a new model, based on the original OpenAISDK, that uses the completions API instead of chat/completions. The completions API simply continues the given prompt, which yields much better results on simple eval tasks such as multiple-choice and true/false questions. Since it does not take messages and roles, however, it cannot be used for complex tasks like function calling or multi-turn conversation.
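To make the difference concrete, here is a minimal sketch of the two request payloads, assuming the standard OpenAI HTTP API endpoints (`/v1/completions` vs `/v1/chat/completions`); the model name and the question are placeholders, and `build_choice_prompt` is a hypothetical helper, not part of OpenAISDK:

```python
import json

def build_choice_prompt(question: str, choices: list[str]) -> str:
    """Format a multiple-choice question as a plain completion prompt.

    No chat template, no system/user roles: the model just continues
    the text, so it can answer with a single letter instead of
    producing a long chain-of-thought reply.
    """
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip("ABCD", choices)]
    lines.append("Answer:")
    return "\n".join(lines)

# Payload for POST /v1/completions: only a prompt string, no
# messages/roles, so the server never applies a chat template.
completion_payload = {
    "model": "my-base-model",  # placeholder model name
    "prompt": build_choice_prompt(
        "Which planet is closest to the Sun?",
        ["Venus", "Mercury", "Earth", "Mars"],
    ),
    "max_tokens": 1,       # a single answer letter is enough
    "temperature": 0.0,
}

# Equivalent payload for POST /v1/chat/completions: the prompt is
# wrapped in a message with a role, and the server-side chat template
# is what triggers CoT here.
chat_payload = {
    "model": "my-base-model",
    "messages": [{"role": "user", "content": completion_payload["prompt"]}],
}

print(json.dumps(completion_payload, indent=2))
```

With `max_tokens=1` and no chat template, a choices-style eval costs one generated token per request instead of a full CoT trace.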
Will you implement it?