Open
Description
Currently, the Generation enum has three cases: chunk, info, and toolCall.
Many newer APIs (such as Ollama’s thinking property in Message) now expose "thinking" output as a dedicated property in their response data structures, rather than encoding it as special tokens in the text.
Rationale
Different models may use varying tokens to represent "thinking," making it complicated to detect or filter these tokens at the application layer. Moving the responsibility for handling these special tokens to the inference engine would simplify integration and keep application code cleaner.
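To illustrate the kind of filtering an application currently has to do itself, here is a minimal sketch. The `<think>`/`</think>` tag pair and the `splitThinking` helper are assumptions made for this example; they are not part of MLX-Swift, and other models may use different markers.

```swift
import Foundation

/// Splits a complete model response into "thinking" text and visible text.
/// The default tag pair is an assumption; models that use other markers
/// need different arguments, which is exactly the burden described above.
func splitThinking(
    _ text: String,
    openTag: String = "<think>",
    closeTag: String = "</think>"
) -> (thinking: String, answer: String) {
    guard let open = text.range(of: openTag),
          let close = text.range(of: closeTag, range: open.upperBound..<text.endIndex)
    else {
        // No complete tag pair found: treat everything as visible text.
        return ("", text)
    }
    let thinking = String(text[open.upperBound..<close.lowerBound])
    let answer = String(text[..<open.lowerBound]) + String(text[close.upperBound...])
    return (thinking, answer)
}

let result = splitThinking("<think>2 + 2 is 4</think>The answer is 4.")
print(result.thinking)
print(result.answer)
```

This sketch only works on a complete response. With streamed chunks a tag can be split across two chunks, so the application would also need buffering, and a model that uses a different marker passes through unfiltered.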
Proposed Change
Add a new .thinking case to the Generation enum:
    public enum Generation: Sendable {
        /// A generated token represented as a String.
        case chunk(String)
        /// A generated "thinking" token, represented as a String.
        case thinking(String)
        /// Completion information summarizing token counts and performance metrics.
        case info(GenerateCompletionInfo)
        /// A tool call from the language model.
        case toolCall(ToolCall)
        ...
    }

Considerations
- Breaking Change: Adding a new enum case will require updates to any exhaustive switch statements that handle Generation in both the mlx-swift-examples code and third-party apps using MLX-Swift.
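As a rough sketch of what that update could look like on the consumer side: the types below are simplified local stand-ins, not the real MLX-Swift definitions, and the `collect` function and the `tokenCount` and `name` fields are hypothetical names chosen for this example.

```swift
// Local stand-ins for illustration only; the real types live in MLX-Swift
// and carry more data than these placeholder fields.
struct GenerateCompletionInfo: Sendable { var tokenCount: Int }
struct ToolCall: Sendable { var name: String }

enum Generation: Sendable {
    case chunk(String)
    case thinking(String)  // the proposed new case
    case info(GenerateCompletionInfo)
    case toolCall(ToolCall)
}

/// Collects visible text and thinking text into separate strings.
func collect(_ events: [Generation]) -> (answer: String, thinking: String) {
    var answer = ""
    var thinking = ""
    for event in events {
        switch event {
        case .chunk(let text):
            answer += text
        case .thinking(let text):
            // An exhaustive switch without this case (and without a
            // default clause) stops compiling once the enum gains .thinking.
            thinking += text
        case .info, .toolCall:
            break
        }
    }
    return (answer, thinking)
}

let collected = collect([.thinking("Let me add."), .chunk("The answer "), .chunk("is 4.")])
print(collected.answer)
print(collected.thinking)
```

Code that already uses a default: clause would keep compiling, but it would silently drop thinking tokens until it is updated, so those call sites are worth reviewing too.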
Looking for feedback!