Proposal: Add thinking Case to Generation Enum

Currently, the `Generation` enum has three cases: `chunk`, `info`, and `toolCall`. 

Many newer APIs now expose "thinking" output as a dedicated field in their response data structures (for example, the `thinking` property on Ollama’s `Message`) rather than encoding it as special tokens inside the generated text.

### Rationale

Different models mark "thinking" with different tokens, which makes it hard to detect or filter that output reliably at the application layer. Moving the responsibility for handling these special tokens to the inference engine would simplify integration and keep application code cleaner.
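To illustrate what application code has to do today, here is a minimal sketch of tag-based filtering. The `<think>` / `</think>` delimiters and the `splitThinking` helper are assumptions made for this example, not part of MLX-Swift; other models may use different markers, which is exactly the problem.

```swift
import Foundation

/// Hypothetical helper: separates "thinking" text from the visible answer,
/// assuming the model wraps its reasoning in the given delimiters.
func splitThinking(
    _ text: String,
    open: String = "<think>",    // assumed marker; varies by model
    close: String = "</think>"   // assumed marker; varies by model
) -> (thinking: String, answer: String) {
    // Find the opening marker, then the closing marker after it.
    guard let start = text.range(of: open),
          let end = text.range(of: close, range: start.upperBound..<text.endIndex)
    else {
        // No complete thinking block: treat everything as the answer.
        return ("", text)
    }
    let thinking = String(text[start.upperBound..<end.lowerBound])
    let answer = String(text[..<start.lowerBound]) + String(text[end.upperBound...])
    return (thinking, answer)
}

let parts = splitThinking("<think>check units</think>42 km")
print(parts.thinking) // check units
print(parts.answer)   // 42 km
```

Note that this only works on complete text. With streamed chunks a marker can be split across two chunks, so a real filter would also need buffering, and every app would need to know each model's markers.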

### Proposed Change

Add a new `.thinking` case to the `Generation` enum:

```swift
public enum Generation: Sendable {
    /// A generated token represented as a String.
    case chunk(String)

    /// A generated "thinking" token, represented as a String.
    case thinking(String)

    /// Completion information summarizing token counts and performance metrics.
    case info(GenerateCompletionInfo)

    /// A tool call from the language model.
    case toolCall(ToolCall)
    ...
}
```
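As a rough sketch of how a consumer might use the new case, the example below routes `.thinking` and `.chunk` text into separate buffers. The empty `GenerateCompletionInfo` and `ToolCall` structs are stand-ins so the snippet compiles on its own, and `collect` is a hypothetical helper, not an existing MLX-Swift function.

```swift
// Stand-in types so this sketch is self-contained; the real
// GenerateCompletionInfo and ToolCall are defined in MLX-Swift.
struct GenerateCompletionInfo: Sendable {}
struct ToolCall: Sendable {}

// Local copy of the enum with the proposed case added.
enum Generation: Sendable {
    case chunk(String)
    case thinking(String) // proposed
    case info(GenerateCompletionInfo)
    case toolCall(ToolCall)
}

/// Hypothetical helper: accumulates reasoning text and answer text separately.
func collect(_ events: [Generation]) -> (thinking: String, answer: String) {
    var thinking = ""
    var answer = ""
    for event in events {
        switch event {
        case .chunk(let text):
            answer += text
        case .thinking(let text):
            thinking += text
        case .info, .toolCall:
            break // not relevant to text accumulation in this sketch
        }
    }
    return (thinking, answer)
}

let result = collect([
    .thinking("2 + 2 "), .thinking("is 4."),
    .chunk("The answer "), .chunk("is 4."),
])
print(result.thinking) // 2 + 2 is 4.
print(result.answer)   // The answer is 4.
```

With this shape, a UI could show the reasoning in a collapsible view or drop it entirely, without knowing which tokens the model uses to delimit it.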

### Considerations

- **Breaking Change:** Adding a new enum case is source-breaking for any exhaustive `switch` over `Generation` that has no `default` branch. Such statements will need updating in both the mlx-swift-examples code and third-party apps using MLX-Swift.
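The sketch below shows the kind of call site that would be unaffected by the change: a `switch` with a `default` branch keeps compiling when a case is added. The stand-in types and the `visibleText` helper are hypothetical and exist only to make the example self-contained.

```swift
// Stand-in types so this sketch is self-contained; the real
// GenerateCompletionInfo and ToolCall are defined in MLX-Swift.
struct GenerateCompletionInfo: Sendable {}
struct ToolCall: Sendable {}

enum Generation: Sendable {
    case chunk(String)
    case thinking(String) // proposed
    case info(GenerateCompletionInfo)
    case toolCall(ToolCall)
}

/// Hypothetical helper: returns only the text meant for display.
func visibleText(_ event: Generation) -> String? {
    switch event {
    case .chunk(let text):
        return text
    default:
        // Covers .thinking, .info, .toolCall, and any case added later,
        // so this switch does not break when the enum grows.
        return nil
    }
}

let shown = [Generation.thinking("hmm"), .chunk("Hello")].compactMap(visibleText)
print(shown) // ["Hello"]
```

The trade-off is that a `default` branch silently ignores new cases, whereas an exhaustive `switch` fails to compile and forces each call site to decide how to handle `.thinking`.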

Looking for feedback!


Proposal: Add thinking Case to Generation Enum #350
