add a minimal LLM chat example + switch to mlx-swift 0.30.2 #454
base: main
Conversation
- LLMEval is more of a showcase of features and runtime statistics; this provides the minimum required to load a model and interact with it
- also cleans up the xcodeproj (see #451)
- removes VLMEval (redundant and wasn't maintained)
            self.task = nil
        }
    }
}
This and the next file (ContentView) are the full minimal chat app.
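As a hedged sketch, the shape of such a minimal chat view model might look like the following. `ChatSession` and `streamResponse(to:)` come from MLXLMCommon; everything else (`ChatViewModel`, `output`, `send`, `cancel`) is an assumed, illustrative name, not necessarily what the PR uses:

```swift
// Illustrative sketch only: a minimal observable chat view model.
import MLXLMCommon

@Observable @MainActor
final class ChatViewModel {
    var output = ""
    private var task: Task<Void, Never>?

    func send(_ prompt: String, session: ChatSession) {
        task = Task {
            do {
                // stream chunks into the UI as they are produced
                for try await chunk in session.streamResponse(to: prompt) {
                    output += chunk
                }
            } catch {
                output += "\nError: \(error)"
            }
            self.task = nil
        }
    }

    func cancel() {
        task?.cancel()
        task = nil
    }
}
```

Keeping the `Task` handle around, as in the diff above, is what allows the UI to cancel a generation in flight.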
### Troubleshooting

If the program crashes with a very deep stack trace, you may need to build
in Release configuration. This seems to depend on the size of the model.
This advice was obsolete.
@@ -1,6 +1,5 @@
// Copyright © 2025 Apple Inc.

import AsyncAlgorithms
Not used
.product(name: "MLXNN", package: "mlx-swift"),
.product(name: "MLXOptimizers", package: "mlx-swift"),
.product(name: "MLXRandom", package: "mlx-swift"),
.product(name: "Transformers", package: "swift-transformers"),
Not used
- ml-explore/mlx-swift-examples#454
- fixes #27
- move ChatSession integration tests into new test target so we can more easily control when it runs
- make a ChatSession _unit_ (more or less) test
- fix Sendable / thread safety issues uncovered by LLMBasic
- collect TestTokenizer and friends in its own file; fix warnings in tests
- UserInputProcessors -> structs

- see #27
- a port of ml-explore/mlx-lm#463 (happened after the initial port to swift)
- in support of ml-explore/mlx-swift-examples#454

- support for ml-explore/mlx-swift-examples#454
- ModelContainer appeared to provide thread-safe access to the KVCache and model, but in fact did not: async token generation could use the KVCache concurrently
- if you broke the async stream early, the previous call could still be running
- swift-format
    self.tokensPerSecond = Double(self.totalTokens) / elapsed
    self.totalTime = elapsed
}
let lmInput = try await modelContainer.prepare(input: userInput)
This is a little easier with the updated API on ModelContainer
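A rough sketch of the call shape, based only on the diff above; the prompt text is illustrative:

```swift
// Based on the diff above: prepare the input via the container itself,
// rather than inside a perform { } closure.
let userInput = UserInput(prompt: "Why is the sky blue?")
let lmInput = try await modelContainer.prepare(input: userInput)
```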
- Active Memory: \(FormatUtilities.formatMemory(memoryUsed))/\(FormatUtilities.formatMemory(GPU.memoryLimit))
- Cache Memory: \(FormatUtilities.formatMemory(cacheMemory))/\(FormatUtilities.formatMemory(GPU.cacheLimit))
+ Active Memory: \(FormatUtilities.formatMemory(memoryUsed))/\(FormatUtilities.formatMemory(Memory.memoryLimit))
+ Cache Memory: \(FormatUtilities.formatMemory(cacheMemory))/\(FormatUtilities.formatMemory(Memory.cacheLimit))
This was changed from GPU to Memory to match the Python side (we aren't always running on a GPU).
These are deprecation warnings, not build breaks.
private func startInner() async throws {
    // setup
-   GPU.set(cacheLimit: 32 * 1024 * 1024)
+   Memory.cacheLimit = 32 * 1024 * 1024
Exposing this as a property is more Swifty; the new Memory API provides it that way.
func run() async throws {
-   Device.setDefault(device: Device(device))
+   try await Device.withDefaultDevice(Device(device)) {
This is now Task-scoped rather than global, which is a better fit for the Swift concurrency model. setDefault is deprecated.
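A minimal sketch of the scoped form, assuming the mlx-swift `Device` API from the diff above; the array math is just a placeholder workload:

```swift
// Sketch: the default device applies only within this closure's Task,
// instead of being set globally via the deprecated setDefault.
import MLX

try await Device.withDefaultDevice(.gpu) {
    let x = MLXArray([1.0, 2.0, 3.0]) * 2
    eval(x)  // evaluated with the scoped default device
}
```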
}
if let chunk = item.chunk {
    print(chunk, terminator: "")
}
Another move to the updated API. Passing the UserInput (which is not Sendable) was an issue in the above code under Swift 6.
I wonder if this is worth moving to ChatSession? That would make it even simpler.
Updated!
var cache: [KVCache]

var printStats = false
}
Replace all of this with ChatSession -- much simpler.
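A sketch of the ChatSession-based replacement, assuming the MLXLMCommon API; the model id is illustrative:

```swift
// Sketch: ChatSession owns the model, KV cache, and chat history, so
// multi-turn conversation needs no manual cache management.
import MLXLMCommon

let model = try await loadModel(id: "mlx-community/Qwen3-4B-4bit")
let session = ChatSession(model)

print(try await session.respond(to: "What are two things to see in San Francisco?"))
print(try await session.respond(to: "How about a third?"))
```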
Proposed changes
I updated the documentation to indicate which examples are more full-featured and which are minimal starting points. Both have uses.
@DePasqualeOrg FYI
Checklist

Put an `x` in the boxes that apply.

- [ ] I have read the CONTRIBUTING document
- [ ] I have run `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the necessary documentation (if needed)
update build dependency on mlx-swift-lm when tag is ready