Describe the feature
Currently, the OpenAISDK model only supports the chat/completions API, so in its _generate method the inference service applies a chat template and triggers CoT. CoT consumes a large number of tokens, slows down inference throughput (requests per minute), and increases the probability that an answer gets truncated.
To fix this, I propose adding a new model, based on the original OpenAISDK, that uses the completions API instead of chat/completions. The completions API simply continues the given prompt, which yields much better results on simple eval tasks such as multiple-choice and true/false questions. Since it does not take messages and roles, however, it cannot be used for complex tasks like function calling or multi-turn conversation.
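To make the difference concrete, here is a minimal sketch of the two request payloads, assuming the standard OpenAI HTTP API endpoints (`/v1/completions` vs `/v1/chat/completions`); the model name and the question are placeholders, and `build_choice_prompt` is a hypothetical helper, not part of OpenAISDK:

```python
import json

def build_choice_prompt(question: str, choices: list[str]) -> str:
    """Format a multiple-choice question as a plain completion prompt.

    No chat template, no system/user roles: the model just continues
    the text, so it can answer with a single letter instead of
    producing a long chain-of-thought reply.
    """
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip("ABCD", choices)]
    lines.append("Answer:")
    return "\n".join(lines)

# Payload for POST /v1/completions: only a prompt string, no
# messages/roles, so the server never applies a chat template.
completion_payload = {
    "model": "my-base-model",  # placeholder model name
    "prompt": build_choice_prompt(
        "Which planet is closest to the Sun?",
        ["Venus", "Mercury", "Earth", "Mars"],
    ),
    "max_tokens": 1,       # a single answer letter is enough
    "temperature": 0.0,
}

# Equivalent payload for POST /v1/chat/completions: the prompt is
# wrapped in a message with a role, and the server-side chat template
# is what triggers CoT here.
chat_payload = {
    "model": "my-base-model",
    "messages": [{"role": "user", "content": completion_payload["prompt"]}],
}

print(json.dumps(completion_payload, indent=2))
```

With `max_tokens=1` and no chat template, a choices-style eval costs one generated token per request instead of a full CoT trace.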
Will you implement it?