Decouple from tokenizer and downloader packages #118
davidkoski merged 2 commits into ml-explore:main
Conversation
|
I really like this idea of decoupling from the HF libs, see also #98. Having an alternate backend is the key thing that makes this worth doing. The numbers from your measurements are impressive and compelling. I am concerned about backward compatibility and a little bit about the default implementation. I am not disparaging your fork, but I don't know how everybody feels about it (I am not an app developer) -- I guess this is along the lines of the old phrase "nobody ever got fired for buying IBM". I wonder if this could be done like this:
Then:
```swift
import MLXLLM
import MLXLMHuggingFace

let container = try await loadModelContainer(id: "mlx-community/Qwen3-4B-4bit")
```

Maybe:
I think one tricky part is your fork probably looks identical to the standard HF API from a symbols point of view -- likely you cannot have both. Hopefully the main point of this is clear:
The exact mechanics of doing so need to be worked out. I wonder if delivering this in pieces would make it easier? |
|
@davidkoski, to be clear, Swift Hugging Face is maintained by Hugging Face, and that's the part that's interchangeable in this PR. Swift Tokenizers is the pure tokenizer library that I forked from Swift Transformers, and I didn't envision that being interchangeable, although I'll investigate whether it could be. Swift Transformers encompasses tokenization and model downloading, and in this PR that has been decomposed into Swift Hugging Face (now an interchangeable downloader, maintained by Hugging Face) and Swift Tokenizers (core tooling that probably doesn't need to be interchangeable if it is well maintained). I understand not wanting to depend on a single individual's package for tokenization, which is why I proposed bringing this package over to ml-explore, and I'm happy to continue contributing to it there if you want to go that route. I took care to make the changes easily auditable by breaking them into focused PRs with discussion in the descriptions. |
For logistical reasons outside of my control I don't think we can do that. I don't think there is a problem with having people choose to use your repo -- it has clear performance wins. But they should probably opt-in to doing so. We should make it possible/easy (and if needed provide the integration in this repo). I will give this a closer read and see if I have any feedback or ideas about how we can achieve these goals, but thank you so much for pushing on this -- these are impressive performance gains and it would be great if people can use them! |
|
Okay, thanks for clarifying. I think I have found a way to make the tokenizer package interchangeable using a protocol and traits. It would require a Swift 6.1 toolchain.
This makes sense. What do you think about this:
Then we need to decide what the old version should do. I think it should be the new API for sure -- we don't want to bifurcate there. But what about the backend(s)? I think the choices are:
I might still be confused as to what specifically this is providing, so if that didn't make sense, that is probably why. Anyway, the older clients that do not have Swift 6.1 toolchains can still build, but it is possible that they don't have as many options, or that the build isn't as dynamic without traits.
|
I've investigated various approaches to making the tokenizer package interchangeable, and I think I've landed on a good design:
Usage with explicit configuration

The integration packages provide protocol conformance.

```swift
// Package.swift
dependencies: [
    .package(url: "https://github.com/ml-explore/mlx-swift-lm", from: "2.0.0"),
    .package(url: "https://github.com/ml-explore/mlx-swift-lm-tokenizers", from: "1.0.0"),
    .package(url: "https://github.com/ml-explore/mlx-swift-lm-huggingface", from: "1.0.0"),
]
```

```swift
// Consuming app
import MLXLLM
import MLXLMHuggingFace
import MLXLMTokenizers

let container = try await loadModelContainer(
    from: HubClient.default,
    using: TokenizersLoader(),
    id: "mlx-community/Qwen3-4B-4bit"
)
```

Usage with convenience overloads

The integration packages provide protocol conformance and convenience overloads.

```swift
import MLXLLM
import MLXLMHuggingFace
import MLXLMTokenizers

// Default downloader provided by convenience overload
let container = try await loadModelContainer(
    using: TokenizersLoader(),
    id: "mlx-community/Qwen3-4B-4bit"
)

// Default tokenizer loader provided by convenience overload
let container = try await loadModelContainer(
    from: HubClient.default,
    id: "mlx-community/Qwen3-4B-4bit"
)
```

Core API shape

```swift
public func loadModelContainer(
    from downloader: any Downloader,
    using tokenizerLoader: any TokenizerLoader,
    id: String,
    revision: String = "main",
    useLatest: Bool = false,
    progressHandler: @Sendable @escaping (Progress) -> Void = { _ in }
) async throws -> sending ModelContainer
```

TokenizerLoader protocol

```swift
public protocol TokenizerLoader: Sendable {
    func loadTokenizer(from directory: URL) async throws -> any Tokenizer
}
```
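For concreteness, an entire tokenizer integration target can be little more than a conformance to that protocol. The following is a self-contained sketch: the `Tokenizer` protocol and the tokenizer itself are stubbed out here, and in a real integration package `loadTokenizer(from:)` would call into Swift Tokenizers (e.g. `AutoTokenizer.from(directory:)`).

```swift
import Foundation

// Stand-ins for the real protocols so this sketch is self-contained;
// the actual Tokenizer protocol comes from Swift Tokenizers.
public protocol Tokenizer: Sendable {
    func encode(text: String) -> [Int]
}

public protocol TokenizerLoader: Sendable {
    func loadTokenizer(from directory: URL) async throws -> any Tokenizer
}

// Trivial whitespace tokenizer standing in for the real implementation.
struct StubTokenizer: Tokenizer {
    func encode(text: String) -> [Int] {
        // Encode each whitespace-separated word as its character count.
        text.split(separator: " ").map { $0.count }
    }
}

// An integration package is essentially just this conformance: given the
// already-downloaded model directory, produce a tokenizer. The real
// version would call AutoTokenizer.from(directory:) here.
public struct TokenizersLoader: TokenizerLoader {
    public init() {}
    public func loadTokenizer(from directory: URL) async throws -> any Tokenizer {
        StubTokenizer()
    }
}
```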
|
I think the approach is good overall but there is one problem we will have to figure out:
The same "logistics" issue appears here -- we cannot easily add new repositories. All of the functionality will have to go into the existing repository.

I think this might be a place where the traits would be useful. If you can use them, they could select which backends you actually want to pull. If not, you will pull more dependencies than you need, but the build process should only build, link, and copy the ones you use. |
|
As I mentioned, traits are not a viable option for anyone who adds MLX Swift LM to their Xcode project through the Xcode UI (e.g. app developers). There's no way for them to select a trait, since they're not editing a Package.swift.

I experimented with using module aliases, and even when used in separate targets of MLX Swift LM, Swift Transformers and Swift Tokenizers collide, since both export a module called Tokenizers. If MLX Swift LM includes only one integration target with one of those packages as a dependency, it won't be possible for consumers to import an integration package that uses the other tokenizer package, because the module names collide.

The only remaining option, which actually has advantages over the others, is to create separate integration packages for Swift Tokenizers (swift-tokenizers-mlx) and Swift Transformers (swift-transformers-mlx). Since the ml-explore organization can't host these packages, they'll need to be hosted by the maintainers of the respective tokenizer packages. This approach is ideal for the following reasons:
It would also make sense for the maintainers of the downloader packages (currently Swift Hugging Face, later also others) to host the respective integration packages. The integration packages are minimal and only need to include protocol conformance for tokenizer loading or model downloading. They can optionally also include convenience overloads for the loading functions. If this approach sounds good to you, I'll start implementing it for this PR and create integration packages for Swift Tokenizers, Swift Transformers, and Swift Hugging Face (the last two only as a proof of concept, since Hugging Face should ultimately be responsible for them). |
|
Great work! I vote for option three, but if there's a usage demonstration, it would be more clear~ @DePasqualeOrg |
Yeah, agreed about Xcode consumers. I was thinking it might work but not be as optimal a build -- you can still depend on individual targets inside the swiftpm, but colliding package names sound like trouble. It makes sense, but it also seems like something is inverted: A (mlx) depending on B (hf) requires that B implement their own integration with A. B shouldn't have to do that with every library that depends on them. People suggest a workaround, but it probably isn't worth pursuing.
What about using non-colliding names? That could leave the integration with the libraries that MLX depends on inside MLX (B does not have to make an integration with A), or in the case of your optimized library it could be completely external to mlx-swift-lm if you want (and we refer to it in the documentation). |
|
That approach would not be fair or ideal for the following reasons:
For those reasons, I think the integration packages should be separate. Anyone can make and host one, and they're just a few lines of code for protocol conformance. |
I think point 1 is already true. That name is in use, and Xcode/swiftpm simply won't allow it. However, https://docs.swift.org/swiftpm/documentation/packagemanagerdocs/modulealiasing/ does allow for this, but in my testing (and perhaps this is what you ran into as well), since you have a fork, it has the same package name:

```swift
let package = Package(
    name: "swift-transformers",
    // ...
)
```

As far as Xcode/swiftpm are concerned, these are the same packages. I could get the aliases to work in a single package, but when I used both, Xcode would complain (Could not compute dependency graph: unable to load ... duplicate...).

I don't think it is reasonable to have HuggingFace take a dependency on MLX to implement an integration for mlx-swift-lm (they could choose to do so, of course), as MLX has a dependency on them (HF). So would renaming the `Package` help?

I am looking into getting a new repo in ml-explore, but no guarantees and no idea on the timeline if possible.

Point 2: agreed, it would check out some extra code but may or may not build it (if not used, it shouldn't be built). I would go for "working" over "best". This would let us keep the default integration in mlx-swift-lm and not need another repository, and might be what we should aim for while the extra repo is pondered.

I have a little test program set up, currently not building (per point 1), but I may try a fork of your fork, rename the Package, and see what happens. I am happy to attach that if you are interested (but it sounded like you may have something similar). |
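For reference, the module-aliasing workaround from that link is applied in the consumer's manifest, along these lines (hypothetical target and alias names). As noted above, it keys off the package name, so a fork that keeps `name: "swift-transformers"` can't be disambiguated this way.

```swift
// Consumer's Package.swift (hypothetical names): rename the colliding
// Tokenizers module from one dependency so both can be linked.
.target(
    name: "MyApp",
    dependencies: [
        .product(
            name: "Transformers",
            package: "swift-transformers",
            moduleAliases: ["Tokenizers": "HFTokenizers"]
        )
    ]
)
```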
|
I think there may be a misunderstanding, because my package already has a different package name. Hugging Face would not be required to have a dependency on MLX; the alternative is that consumers can set up the protocol conformance themselves. But since MLX Swift LM is currently the main use case for Swift tokenizer packages, it would be in the interest of anyone who makes one to offer this trivial integration, if it's not offered here. I think it's clear that separate integration packages are needed, and the only open question is where they should be hosted, so I'll go ahead with implementing the integration packages. |
You are correct -- I am confusing myself with the various implementations :-) Yes, as you said it looks like the aliases are not working as expected.
@angeloskath asked if a macro might work -- something that would implement the trivial forwarding mechanism. I will give this a try. That might give us a way to let consumers set up the conformance without knowing they were doing so.
I agree this is the easiest way and am circling the idea it might be the only way. I still have hope :-) |
|
OK, I have a proof of concept using macros. I have a stand-in for the real thing that looks like this:

```swift
public protocol MLXTokenizer: Sendable {
    func encode(text: String) -> [Int]
    func decode(tokens: [Int]) -> String
}

public func generate(tokenizer: MLXTokenizer) -> String {
    let tokens = tokenizer.encode(text: "testing")
    return tokenizer.decode(tokens: tokens)
}
```

Note: there is no hard dependency on any concrete Tokenizer. We want to call it along these lines:

```swift
let tokenizer = PreTrainedTokenizer(...) // e.g. the HuggingFace Tokenizer
print(generate(tokenizer: tokenizer))
```

That won't work as-is because PreTrainedTokenizer doesn't conform to MLXTokenizer. If we added:

```swift
extension PreTrainedTokenizer: @retroactive MLXTokenizer { }
```

Then it would work, but we are conforming a type we don't own to a protocol.

OK, so try 1 with a macro looks like this:

```swift
enum Tokenizers {
    #MLXTokenizer(PreTrainedTokenizer.self)
}

let tokenizer = try Tokenizers.MLXPreTrainedTokenizer()
print(generate(tokenizer: tokenizer))
```

The enum is needed because the macro can't generate a top level type (unless it has a static name). The macro ends up generating a simple wrapper for the type (assuming it looks like a HuggingFace Tokenizer API-wise) and forwards the protocol methods.

Try 2 looks like this:

```swift
#TokenizerFactory(PreTrainedTokenizer.self)

let tokenizer = try makeTokenizer()
print(generate(tokenizer: tokenizer))
```

The factory generates a function with a fixed name so it can appear at the top level. Assuming you could build/link it, this would allow multiple providers of tokenizers if you did this in different files.

Try 3:

```swift
let tokenizer = try #MakeTokenizer(PreTrainedTokenizer.self)
print(generate(tokenizer: tokenizer))
```

No top level function, just an inline expression. For all of these, the macro generates the forwarding wrapper at compile time. This wouldn't block nicer integrations that actually implemented the protocol.
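To make the forwarding concrete, here is a self-contained sketch of roughly what such a macro expansion could produce: a wrapper that owns the concrete tokenizer and forwards the protocol methods, instead of a `@retroactive` conformance on a type we don't own. The `PreTrainedTokenizer` here is a toy stand-in, not the real Hugging Face type.

```swift
// The protocol from the POC above.
public protocol MLXTokenizer: Sendable {
    func encode(text: String) -> [Int]
    func decode(tokens: [Int]) -> String
}

// Stand-in for a concrete tokenizer we don't own (HF-like API, not the real type):
// encodes text as Unicode scalar values and decodes them back.
struct PreTrainedTokenizer {
    func encode(text: String) -> [Int] {
        text.unicodeScalars.map { Int($0.value) }
    }
    func decode(tokens: [Int]) -> String {
        var result = ""
        for token in tokens {
            if let scalar = Unicode.Scalar(UInt32(token)) {
                result.unicodeScalars.append(scalar)
            }
        }
        return result
    }
}

// Roughly what the macro expansion could produce: a wrapper that owns the
// concrete tokenizer and forwards the protocol methods.
struct MLXPreTrainedTokenizer: MLXTokenizer {
    private let wrapped = PreTrainedTokenizer()
    func encode(text: String) -> [Int] { wrapped.encode(text: text) }
    func decode(tokens: [Int]) -> String { wrapped.decode(tokens: tokens) }
}
```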
|
@davidkoski, I'll review your macro POC now. Before I do, this is what I was about to post regarding my own POC with separate integration packages, which I've pushed to this branch:

I've implemented the integration packages. The last one is for my fork of Swift Hugging Face, which includes ergonomic and performance improvements, avoiding a network roundtrip when possible for even faster model/tokenizer loading. I'll review everything again tomorrow and run benchmarks with the different integrations to show the performance improvement of my tokenizer and downloader packages. |
|
@davidkoski, I think we would need to see a working code example of that macro approach, but I suspect that it won't be able to do everything that we need to do to make this work. Check how I've set things up in the integration packages to see what I mean. I really think the integration packages are the happy, simple path, and they should be hosted alongside the respective tokenizer/downloader packages. |
|
Yeah, agreed that separate repos will be the cleanest way. Here is my POC if you want to see what I did: look at ContentView for the integrations. It doesn't run (I think it will throw), but it does build. I think the packaging can be simplified, along with coming up with a real implementation, if we use this path. Think != know. |
|
I think the main issue with the macro approach is that it would require all the tokenizer and downloader libraries to have the same shapes, which isn't realistic – and indeed isn't even the case with the ones we have now. Protocols allow libraries of any shape to integrate with this one. Even setting that aside, it would add complexity to this library and require consumers to use a less familiar syntax.
Not required as the manual implementation is trivial. It is true of any "automatic" integration. It is basically just a way to move the dependency to "compile" time rather than "Project: Resolve Packages" time.
Agreed |
Force-pushed 74fecfd to 4932f20
|
@davidkoski, I've decoupled the tokenizer and downloader packages from the integration tests and benchmarks, so now the decoupling is complete. The logic for those tests still lives in this library, which exports helpers to run them in the integration packages. I'll review this all again and add some polish over the weekend, but I think this is getting close to an optimal design. Let me know what you think whenever you get a chance to look at it. |
Force-pushed 2a4bc1f to 335b0d3
|
I doubt Hugging Face will be willing to host a package that reduces the lock-in to their platform. But we shouldn't have to depend on Hugging Face taking action to benefit from better performance and be free from lock-in. With this PR, we don't have to: users can either import an integration package or copy some trivial integration code. I don't quite understand the path you have in mind, but it sounds like you want to keep the Hugging Face dependency, which would prevent anyone from using my faster Swift Tokenizers package. It has been an enormous amount of work to get to this point, and I would really like to get this merged so that we can move on. I think it's in an ideal state, with a clean separation of concerns and a straightforward way for users to migrate (whenever you do a major version bump) and pick the dependencies they want to use. |
Not exactly. I have a few things in mind:
I think people should try your improved tokenizers package, and in time it may become the de-facto standard, but I don't want to force it on anyone yet. Right now your integration with the HF implementations is around 160 lines of code -- I don't think it is reasonable to have people copy that into their projects.
I agree, I like what you have done. If the integration repos were in place (above) I would be preparing to merge. |
This is the part I don't understand. Even though I've taken great care to keep breaking changes to a minimum, the protocol is a breaking change.
That's a valid concern, and it's why I suggested that people can copy ~100 lines of code instead of importing my packages.
No one is forced to use my packages. They can copy ~100 lines of code and use Hugging Face's packages.
That includes convenience overloads that are not required. Only ~100 lines (including code comments) are needed for this to work. I included links to the relevant files above. Anyone who doesn't want to copy this trivial code can import their preferred integration packages. |
Ah, I am not explaining myself well then, and looking at it closer, I think you are correct. On the LLM side the tokenizer is separate from the model, so the change in tokenizer type is contained. The VLM side is tougher because the input processor depends on the tokenizer. My idea (which I think is incorrect now) was that we would supply the implementation of the conformance ourselves.
I don't think copying 100 lines is a reasonable upgrade path (perhaps number of lines is not the metric, as you would likely just copy a file into your repository). But perhaps here we can add an HF-specific macro to build the integration? The minimum change to keep things as-is would be:

```swift
import MLXLMCommon
import Tokenizers
import HuggingFaceAdaptorMacro

// name TBD, but let's say
let container = try await #huggingFaceLoadModelContainer(
    id: "mlx-community/Qwen3-4B-4bit"
)
```

Plus a change in their Package.swift. The macro would let us inject the code at build-time and could ship as part of mlx-swift-lm without requiring a hugging face dependency (I think, though I have thought other things that turned out to be false). This would give a couple of lines of change while breaking the hard link dependency in mlx-swift-lm. So you would have two ways of integrating:
If this works I think it would solve my concerns and let us move toward the conformance repositories. What do you think about this? |
|
Jumping in to add another use case, to see if it fits the path you discussed above. I have a custom package that provides the model download logic (a custom downloader implementation; yes, I don't use the Hub). I think the download, progress, and other such logic should live outside mlx-swift-lm. This is just basic pseudo-code to demonstrate my use case. I wonder if it is supported by the possibilities in your discussion above? Much appreciation for the hard work so far ❤ |
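A use case like this can be sketched against the `Downloader` protocol. The protocol shape below is assumed from the method names in this thread (the real signature in the PR may differ); `LocalStoreDownloader` is a hypothetical provider that serves models pre-staged on disk, with no Hub client involved.

```swift
import Foundation

// Assumed shape of the PR's Downloader protocol (the real signature may differ).
public protocol Downloader: Sendable {
    func download(
        id: String,
        revision: String,
        matching globs: [String],
        useLatest: Bool,
        progressHandler: @Sendable (Progress) -> Void
    ) async throws -> URL
}

// A custom provider like the one described above: models pre-staged in a
// local store (e.g. shipped with the app or synced from a storage bucket).
struct LocalStoreDownloader: Downloader {
    let root: URL

    func download(
        id: String,
        revision: String,
        matching globs: [String],
        useLatest: Bool,
        progressHandler: @Sendable (Progress) -> Void
    ) async throws -> URL {
        let directory = root.appendingPathComponent(id)
        guard FileManager.default.fileExists(atPath: directory.path) else {
            throw CocoaError(.fileNoSuchFile)
        }
        // Everything is already on disk: report completion and return.
        let progress = Progress(totalUnitCount: 1)
        progress.completedUnitCount = 1
        progressHandler(progress)
        return directory
    }
}
```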
|
@zallsold-lgtm, you can try out this PR branch for your use case. Everything is already in place. |
|
@davidkoski, I tried using macros to replace copying the integration code, and it doesn't work due to fundamental compiler issues related to the extensions and retroactive protocol conformances that we would need. Given that we've now ruled out the alternatives through exhaustive testing, users have these options, which I think are acceptable:
|
|
I'd just like to verify that this works with my Swift Tokenizers and Swift HF API packages. Will you make this available somewhere for me to test, or would you like to test that yourself? |
|
I resolved more merge conflicts. It is very difficult to resolve these conflicts with so many changes happening upstream. I hope I've done everything correctly. The more things get merged before this PR, the more opportunities there will be for mistakes in resolving these conflicts. |
You mean the macros? I gave the patch -- would you like a fork of your fork with the patch applied? Happy to do it if it helps, but I want to make sure I understand what you are looking for. Here is the test program I was using -- it references a local mlx-swift-lm with the patch applied. Primarily I was making sure it built; it doesn't test anything. I think we have coverage elsewhere. |
|
Thank you. I didn't understand that I should apply that patch. I've tested this in one of my apps with my own tokenizer and downloader packages, and the app builds and works as expected. You can edit this PR. Would you like to add a commit with your macros? |
|
OK, pushed macros. If you are happy with these, I am happy with this as a way for people to integrate without requiring new dependencies. I will make another pass on the PR now. Before we merge I will get a last tag on mlx-swift-lm in the 2.x range. |
|
OK, review done -- the change is large but in the end pretty straightforward and easy to understand. Everything looks great! Before we cut the last 2.x tag I want to get:
I will make a final pass through the open PRs and see if there is anything else critical, plus do the larger llm/vlm test run from mlx-swift-examples. If everything looks good I will rebase this and merge it (maybe with a warning in the README that this is a major version bump). I can't promise tomorrow -- too many meetings -- but this is in the final run! Thank you so much for your patience and efforts here! |
|
Thank you! I created a release version in all of the integration packages so that they can be pinned and updated the usage examples in the readme accordingly. |
|
|
MLX Swift LM currently has two fundamental problems:

- Model loading is tightly coupled to the Hugging Face Hub. A Hub client is required even when loading models from a local directory.
- Model loading performance with Swift Transformers lags far behind the Python equivalent, typically taking several seconds in Swift versus a few hundred milliseconds in Python.

This PR implements the following solutions:

- Swift Transformers is replaced with Swift Tokenizers, a streamlined and optimized fork that focuses purely on tokenizer functionality, with no Hugging Face dependency and no extraneous Core ML code. This unlocks a 10x to 15x speedup in model loading times.
- The `Downloader` protocol abstracts away the model hosting provider, making it easy to use other providers such as ModelScope or define custom providers such as downloading from storage buckets.
- Swift Hugging Face, a dedicated client for the Hub, is used in an optional module. No Hugging Face Hub code is bundled for users who don't need it.

The `hub` parameter (previously `HubApi`) has been replaced with `from` (any `Downloader` or `URL` for a local directory). Functions that previously defaulted to `defaultHubApi` no longer have a default – callers must either pass a `Downloader` explicitly or use the convenience methods in `MLXLMHuggingFace` / `MLXEmbeddersHuggingFace`, which default to `HubClient.default`. For most users who were using the default Hub client, adding `import MLXLMHuggingFace` or `import MLXEmbeddersHuggingFace` and using the convenience overloads is sufficient. Users who were passing a custom `HubApi` instance should create a `HubClient` instead and pass it as the `from` parameter. `HubClient` conforms to `Downloader` via `MLXLMHuggingFace`.

- `tokenizerId` and `overrideTokenizer` have been replaced by `tokenizerSource: TokenizerSource?`, which supports `.id(String)` for remote sources and `.directory(URL)` for local paths.
- `preparePrompt` has been removed. This shouldn't be used anyway, since support for chat templates is available.
- `modelDirectory(hub:)` has been removed. For local directories, pass the `URL` directly to the loading functions. For remote models, the `Downloader` protocol handles resolution.
- `loadTokenizer(configuration:hub:)` has been removed. Tokenizer loading now uses `AutoTokenizer.from(directory:)` from Swift Tokenizers directly.
- `replacementTokenizers` (the `TokenizerReplacementRegistry`) has been removed. Use `AutoTokenizer.register(_:for:)` from Swift Tokenizers instead.
- The `defaultHubApi` global has been removed. Hugging Face Hub access is now provided by `HubClient.default` from the `HuggingFace` module.
- `downloadModel(hub:configuration:progressHandler:)` → `Downloader.download(id:revision:matching:useLatest:progressHandler:)`
- `loadTokenizerConfig(configuration:hub:)` → `AutoTokenizer.from(directory:)`
- `ModelFactory._load(hub:configuration:progressHandler:)` → `_load(configuration: ResolvedModelConfiguration)`
- `ModelFactory._loadContainer`: removed (base `loadContainer` now builds the container from `_load`)
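Putting the migration notes above into code, a before/after might look like this. This is a sketch based on the names in this description; the exact signatures in the merged release may differ.

```swift
// Before: implicit dependency on the Hugging Face Hub
// let container = try await loadModelContainer(
//     hub: defaultHubApi,
//     configuration: ModelConfiguration(id: "mlx-community/Qwen3-4B-4bit")
// )

// After: opt in to the Hub via the integration module
import MLXLMHuggingFace

let container = try await loadModelContainer(
    from: HubClient.default,
    id: "mlx-community/Qwen3-4B-4bit"
)
```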
Force-pushed 2fecb3a to d18efe1
|
OK, I squashed down and used your writeup from the PR for the commit message. Rebased on main and added the same wording to the README. CI is running now (though it has been slow today). |
davidkoski left a comment
Thank you for your hard work and perseverance on this -- this is a fantastic idea and should unlock a lot of interesting features and changes.
|
Thank you, @davidkoski! I'm glad to see this finally land. |
Remove Hub/Tokenizers imports from MLXLMCommon and accept a TokenizerLoader parameter instead, matching the new architecture from ml-explore#118. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Re-adds swift-transformers as a dependency to MLXLMCommon and provides HubCompat.swift — a thin shim that bridges HubApi/AutoTokenizer to the new Downloader/TokenizerLoader protocols, restoring the pre-ml-explore#118 convenience overload so existing callers (e.g. GOLLOG MLXRunner) compile without changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

MLX Swift LM currently has two fundamental problems:
This PR implements the following solutions:
- The `Downloader` protocol abstracts away the model hosting provider, making it easy to use other providers such as ModelScope or define custom providers such as downloading from storage buckets.

Benchmarks
Model loading times on M3 MacBook Pro:
To run the benchmarks before the changes in this PR, check out commit 3752cc2. You can run the benchmarks in a separate scheme in Xcode with `RUN_BENCHMARKS=1`, or from the command line:

```shell
TEST_RUNNER_RUN_BENCHMARKS=1 xcodebuild test -scheme mlx-swift-lm-Package -destination 'platform=macOS' -only-testing:Benchmarks
```

Usage
Loading from a local directory:

Convenience method from the `MLXLMHuggingFace` module (uses default Hub client):

Using a custom Hugging Face Hub client:

Using a custom downloader:

Embedding models and adapters follow the same patterns.
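A sketch of what these calls might look like under the new API, based on the core API shape discussed earlier in the thread (`MyBucketDownloader` is a hypothetical custom `Downloader`; treat the exact overloads as assumptions):

```swift
import MLXLLM
import MLXLMHuggingFace

// Loading from a local directory: pass the URL directly as `from`
let local = try await loadModelContainer(
    from: URL(fileURLWithPath: "/path/to/model-directory")
)

// Convenience overload from MLXLMHuggingFace (default Hub client)
let remote = try await loadModelContainer(id: "mlx-community/Qwen3-4B-4bit")

// Custom downloader: anything conforming to the Downloader protocol
let custom = try await loadModelContainer(
    from: MyBucketDownloader(),  // hypothetical custom Downloader
    id: "org/model"
)
```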
Cache strategy
The `Downloader` protocol includes a `useLatest` parameter (default `false`) that controls whether to check the network for updates:

- `useLatest: false`: Resolves refs (e.g. "main") to commit hashes locally via the cache's `refs/` directory and returns cached files immediately, with no network call. This avoids 100–200 ms of latency on every model load.
- `useLatest: true`: Always checks the network for the latest commit, then downloads any missing or updated files.

This improves on the Python `huggingface_hub` in two ways: Python always makes an `api.repo_info()` network call before returning cached files, even for commit hashes. Swift skips the network entirely for commit hashes (which are immutable, so cached files are always valid) and additionally resolves branch names locally via `resolveCachedSnapshot()` when freshness isn't needed. Users who want the latest files can opt in to the network call explicitly.

In Swift Hugging Face, this is implemented as a two-method design:

- `resolveCachedSnapshot()` resolves refs locally using cached metadata
- `downloadSnapshot()` only uses the fast path on commit hashes (which are immutable), while branch names always trigger a network call
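The local fast path can be sketched in a few lines: resolve the ref via the cache's `refs/` directory, then return the cached snapshot if present. The cache layout and function name here are illustrative, not the actual Swift Hugging Face implementation.

```swift
import Foundation

// Sketch of the useLatest == false fast path: map a ref like "main" to a
// commit hash by reading the cache's refs/ directory, then return the
// cached snapshot if it exists -- no network involved.
func resolveCachedSnapshot(cache: URL, revision: String) -> URL? {
    var commit = revision
    let refFile = cache.appendingPathComponent("refs").appendingPathComponent(revision)
    if let contents = try? String(contentsOf: refFile, encoding: .utf8) {
        // refs/<branch> stores the commit hash the branch currently points to
        commit = contents.trimmingCharacters(in: .whitespacesAndNewlines)
    }
    // Commit hashes are immutable, so a cached snapshot is always valid.
    let snapshot = cache.appendingPathComponent("snapshots").appendingPathComponent(commit)
    return FileManager.default.fileExists(atPath: snapshot.path) ? snapshot : nil
}
```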
Breaking changes

Loading API
The `hub` parameter (previously `HubApi`) has been replaced with `from` (any `Downloader` or `URL` for a local directory). Functions that previously defaulted to `defaultHubApi` no longer have a default – callers must either pass a `Downloader` explicitly or use the convenience methods in `MLXLMHuggingFace` / `MLXEmbeddersHuggingFace`, which default to `HubClient.default`.

For most users who were using the default Hub client, adding `import MLXLMHuggingFace` or `import MLXEmbeddersHuggingFace` and using the convenience overloads is sufficient.

Users who were passing a custom `HubApi` instance should create a `HubClient` instead and pass it as the `from` parameter. `HubClient` conforms to `Downloader` via `MLXLMHuggingFace`.

ModelConfiguration

- `tokenizerId` and `overrideTokenizer` have been replaced by `tokenizerSource: TokenizerSource?`, which supports `.id(String)` for remote sources and `.directory(URL)` for local paths.
- `preparePrompt` has been removed. This shouldn't be used anyway, since support for chat templates is available.
- `modelDirectory(hub:)` has been removed. For local directories, pass the `URL` directly to the loading functions. For remote models, the `Downloader` protocol handles resolution.

Tokenizer loading

- `loadTokenizer(configuration:hub:)` has been removed. Tokenizer loading now uses `AutoTokenizer.from(directory:)` from Swift Tokenizers directly.
- `replacementTokenizers` (the `TokenizerReplacementRegistry`) has been removed. Use `AutoTokenizer.register(_:for:)` from Swift Tokenizers instead.

defaultHubApi

The `defaultHubApi` global has been removed. Hugging Face Hub access is now provided by `HubClient.default` from the `HuggingFace` module.

Low-level APIs

- `downloadModel(hub:configuration:progressHandler:)` → `Downloader.download(id:revision:matching:useLatest:progressHandler:)`
- `loadTokenizerConfig(configuration:hub:)` → `AutoTokenizer.from(directory:)`
- `ModelFactory._load(hub:configuration:progressHandler:)` → `_load(configuration: ResolvedModelConfiguration)`
- `ModelFactory._loadContainer`: removed (base `loadContainer` now builds the container from `_load`)

Maintainership of Swift Tokenizers
I'm currently maintaining Swift Tokenizers, but I think a better home for it would be the ml-explore organization. Hugging Face's packages are tightly coupled to their platform, while Swift Tokenizers is designed for a clean separation of concerns and is more closely related to the model code in MLX Swift LM.
To do