Decouple from tokenizer and downloader packages #118
davidkoski merged 2 commits into ml-explore:main
Conversation
|
I really like this idea of decoupling from the HF libs, see also #98. Having an alternate backend is the key thing that makes this worth doing. The numbers from your measurements are impressive and compelling. I am concerned about backward compatibility and a little bit about the default implementation. I am not disparaging your fork, but I don't know how everybody feels about it (I am not an app developer) -- I guess this is along the lines of the old phrase "nobody ever got fired for buying IBM". I wonder if this could be done like this:
Then:
```swift
import MLXLLM
import MLXLMHuggingFace

let container = try await loadModelContainer(id: "mlx-community/Qwen3-4B-4bit")
```

Maybe:
I think one tricky part is your fork probably looks identical to the standard HF API from a symbols point of view -- likely you cannot have both. Hopefully the main point of this is clear:
The exact mechanics of doing so need to be worked out. I wonder if delivering this in pieces would make it easier? |
|
@davidkoski, to be clear, Swift Hugging Face is maintained by Hugging Face, and that's the part that's interchangeable in this PR. Swift Tokenizers is the pure tokenizer library that I forked from Swift Transformers, and I didn't envision that being interchangeable, although I'll investigate whether it could be. Swift Transformers encompasses tokenization and model downloading, and in this PR that has been decomposed into Swift Hugging Face (now an interchangeable downloader, maintained by Hugging Face) and Swift Tokenizers (core tooling that probably doesn't need to be interchangeable if it is well maintained). I understand not wanting to depend on a single individual's package for tokenization, which is why I proposed bringing this package over to ml-explore, and I'm happy to continue contributing to it there if you want to go that route. I took care to make the changes easily auditable by breaking them into focused PRs with discussion in the descriptions. |
For logistical reasons outside of my control I don't think we can do that. I don't think there is a problem with having people choose to use your repo -- it has clear performance wins. But they should probably opt-in to doing so. We should make it possible/easy (and if needed provide the integration in this repo). I will give this a closer read and see if I have any feedback or ideas about how we can achieve these goals, but thank you so much for pushing on this -- these are impressive performance gains and it would be great if people can use them! |
|
Okay, thanks for clarifying. I think I have found a way to make the tokenizer package interchangeable using a protocol and traits. It would require a Swift 6.1 toolchain.
This makes sense. What do you think about this:
Then we need to decide what the old version should do. I think it should be the new API for sure -- we don't want to bifurcate there. But what about the backend(s)? I think the choices are:
I might still be confused as to what specifically this is providing, so if that didn't make sense, that is probably why. Anyway, the older clients that do not have Swift 6.1 toolchains can still build, but it is possible that they don't have as many options, or that the build isn't as dynamic without traits.
|
I've investigated various approaches to making the tokenizer package interchangeable, and I think I've landed on a good design:
Usage with explicit configuration

The integration packages provide protocol conformance.

```swift
// Package.swift
dependencies: [
    .package(url: "https://github.com/ml-explore/mlx-swift-lm", from: "2.0.0"),
    .package(url: "https://github.com/ml-explore/mlx-swift-lm-tokenizers", from: "1.0.0"),
    .package(url: "https://github.com/ml-explore/mlx-swift-lm-huggingface", from: "1.0.0"),
]
```

```swift
// Consuming app
import MLXLLM
import MLXLMHuggingFace
import MLXLMTokenizers

let container = try await loadModelContainer(
    from: HubClient.default,
    using: TokenizersLoader(),
    id: "mlx-community/Qwen3-4B-4bit"
)
```

Usage with convenience overloads

The integration packages provide protocol conformance and convenience overloads.

```swift
import MLXLLM
import MLXLMHuggingFace
import MLXLMTokenizers

// Default downloader provided by convenience overload
let container = try await loadModelContainer(
    using: TokenizersLoader(),
    id: "mlx-community/Qwen3-4B-4bit"
)

// Default tokenizer loader provided by convenience overload
let container = try await loadModelContainer(
    from: HubClient.default,
    id: "mlx-community/Qwen3-4B-4bit"
)
```

Core API shape

```swift
public func loadModelContainer(
    from downloader: any Downloader,
    using tokenizerLoader: any TokenizerLoader,
    id: String,
    revision: String = "main",
    useLatest: Bool = false,
    progressHandler: @Sendable @escaping (Progress) -> Void = { _ in }
) async throws -> sending ModelContainer
```

TokenizerLoader protocol

```swift
public protocol TokenizerLoader: Sendable {
    func loadTokenizer(from directory: URL) async throws -> any Tokenizer
}
```
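For concreteness, an entire tokenizer integration target can be little more than a conformance to that protocol. The following is a self-contained sketch: the `Tokenizer` protocol and the tokenizer itself are stubbed out here, and in a real integration package `loadTokenizer(from:)` would call into Swift Tokenizers (e.g. `AutoTokenizer.from(directory:)`).

```swift
import Foundation

// Stand-ins for the real protocols so this sketch is self-contained;
// the actual Tokenizer protocol comes from Swift Tokenizers.
public protocol Tokenizer: Sendable {
    func encode(text: String) -> [Int]
}

public protocol TokenizerLoader: Sendable {
    func loadTokenizer(from directory: URL) async throws -> any Tokenizer
}

// Trivial whitespace tokenizer standing in for the real implementation.
struct StubTokenizer: Tokenizer {
    func encode(text: String) -> [Int] {
        // Encode each whitespace-separated word as its character count.
        text.split(separator: " ").map { $0.count }
    }
}

// An integration package is essentially just this conformance: given the
// already-downloaded model directory, produce a tokenizer. The real
// version would call AutoTokenizer.from(directory:) here.
public struct TokenizersLoader: TokenizerLoader {
    public init() {}
    public func loadTokenizer(from directory: URL) async throws -> any Tokenizer {
        StubTokenizer()
    }
}
```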
|
I think the approach is good overall but there is one problem we will have to figure out:
The same "logistics" issue appears here -- we cannot easily add new repositories. All of the functionality will have to go into the existing repository.

I think this might be a place where the traits would be useful. If you can use them, they could select which backends you actually want to pull. If not, you will pull more dependencies than you need, but the build process should only build, link, and copy the ones you use. |
|
As I mentioned, traits are not a viable option for anyone who adds MLX Swift LM to their Xcode project through the Xcode UI (e.g. app developers). There's no way for them to select a trait, since they're not editing a Package.swift.

I experimented with using module aliases, and even when used in separate targets of MLX Swift LM, Swift Transformers and Swift Tokenizers collide, since both export a module called Tokenizers. If MLX Swift LM includes only one integration target with one of those packages as a dependency, it won't be possible for consumers to import an integration package that uses the other tokenizer package, because the module names collide.

The only remaining option, which actually has advantages over the others, is to create separate integration packages for Swift Tokenizers (swift-tokenizers-mlx) and Swift Transformers (swift-transformers-mlx). Since the ml-explore organization can't host these packages, they'll need to be hosted by the maintainers of the respective tokenizer packages. This approach is ideal for the following reasons:
It would also make sense for the maintainers of the downloader packages (currently Swift Hugging Face, later also others) to host the respective integration packages. The integration packages are minimal and only need to include protocol conformance for tokenizer loading or model downloading. They can optionally also include convenience overloads for the loading functions. If this approach sounds good to you, I'll start implementing it for this PR and create integration packages for Swift Tokenizers, Swift Transformers, and Swift Hugging Face (the last two only as a proof of concept, since Hugging Face should ultimately be responsible for them). |
|
Great work! I vote for option three, but if there's a usage demonstration, it would be more clear~ @DePasqualeOrg |
Yeah, agreed about Xcode consumers. I was thinking it might work but not be as optimal a build -- you can still depend on individual targets inside the swiftpm, but colliding package names sound like trouble. It makes sense, but it also seems like something is inverted: A (mlx) depending on B (hf) requires that B implement their own integration with A. B shouldn't have to do that with every library that depends on them. People suggest a workaround, but it probably isn't worth pursuing.
What about using non-colliding names? That could leave the integration with the libraries that MLX depends on inside MLX (B does not have to make an integration with A), or in the case of your optimized library it could be completely external to mlx-swift-lm if you want (and we refer to it in the documentation). |
|
That approach would not be fair or ideal for the following reasons:
For those reasons, I think the integration packages should be separate. Anyone can make and host one, and they're just a few lines of code for protocol conformance. |
I think point 1 is already true. That name is in use, and Xcode/swiftpm simply won't allow it. However, https://docs.swift.org/swiftpm/documentation/packagemanagerdocs/modulealiasing/ does allow for this, but in my testing (and perhaps this is what you ran into as well), since you have a fork, it has the same package name:

```swift
let package = Package(
    name: "swift-transformers",
    // ...
)
```

As far as Xcode/swiftpm are concerned, these are the same packages. I could get the aliases to work in a single package, but when I used both, Xcode would complain (Could not compute dependency graph: unable to load ... duplicate...).

I don't think it is reasonable to have HuggingFace take a dependency on MLX to implement an integration for mlx-swift-lm (they could choose to do so, of course), as MLX has a dependency on them (HF). So would renaming the `Package` help?

I am looking into getting a new repo in ml-explore, but no guarantees and no idea on the timeline if possible.

Point 2: agreed, it would check out some extra code but may or may not build it (if not used, it shouldn't be built). I would go for "working" over "best". This would let us keep the default integration in mlx-swift-lm and not need another repository, and might be what we should aim for while the extra repo is pondered.

I have a little test program set up, currently not building (per point 1), but I may try a fork of your fork, rename the Package, and see what happens. I am happy to attach that if you are interested (but it sounded like you may have something similar). |
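For reference, the module-aliasing workaround from that link is applied in the consumer's manifest, along these lines (hypothetical target and alias names). As noted above, it keys off the package name, so a fork that keeps `name: "swift-transformers"` can't be disambiguated this way.

```swift
// Consumer's Package.swift (hypothetical names): rename the colliding
// Tokenizers module from one dependency so both can be linked.
.target(
    name: "MyApp",
    dependencies: [
        .product(
            name: "Transformers",
            package: "swift-transformers",
            moduleAliases: ["Tokenizers": "HFTokenizers"]
        )
    ]
)
```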
|
I think there may be a misunderstanding, because my package already has a different package name. Hugging Face would not be required to have a dependency on MLX; the alternative is that consumers can set up the protocol conformance themselves. But since MLX Swift LM is currently the main use case for Swift tokenizer packages, it would be in the interest of anyone who makes one to offer this trivial integration, if it's not offered here. I think it's clear that separate integration packages are needed, and the only open question is where they should be hosted, so I'll go ahead with implementing the integration packages. |
You are correct -- I am confusing myself with the various implementations :-) Yes, as you said it looks like the aliases are not working as expected.
@angeloskath asked if a macro might work -- something that would implement the trivial forwarding mechanism. I will give this a try. That might give us a way to let consumers set up the conformance without knowing they were doing so.
I agree this is the easiest way and am circling the idea it might be the only way. I still have hope :-) |
|
OK, I have a proof of concept using macros. I have a stand-in for the real thing that looks like this:

```swift
public protocol MLXTokenizer: Sendable {
    func encode(text: String) -> [Int]
    func decode(tokens: [Int]) -> String
}

public func generate(tokenizer: MLXTokenizer) -> String {
    let tokens = tokenizer.encode(text: "testing")
    return tokenizer.decode(tokens: tokens)
}
```

Note: there is no hard dependency on any concrete Tokenizer. We want to call it along these lines:

```swift
let tokenizer = PreTrainedTokenizer(...) // e.g. the HuggingFace Tokenizer
print(generate(tokenizer: tokenizer))
```

That won't work as-is because PreTrainedTokenizer doesn't conform to MLXTokenizer. If we added:

```swift
extension PreTrainedTokenizer: @retroactive MLXTokenizer { }
```

Then it would work, but we are conforming a type we don't own to a protocol.

OK, so try 1 with a macro looks like this:

```swift
enum Tokenizers {
    #MLXTokenizer(PreTrainedTokenizer.self)
}

let tokenizer = try Tokenizers.MLXPreTrainedTokenizer()
print(generate(tokenizer: tokenizer))
```

The enum is needed because the macro can't generate a top level type (unless it has a static name). The macro ends up generating a simple wrapper for the type (assuming it looks like a HuggingFace Tokenizer API-wise) and forwards the protocol methods.

Try 2 looks like this:

```swift
#TokenizerFactory(PreTrainedTokenizer.self)

let tokenizer = try makeTokenizer()
print(generate(tokenizer: tokenizer))
```

The factory generates a function with a fixed name so it can appear at the top level. Assuming you could build/link it, this would allow multiple providers of tokenizers if you did this in different files.

Try 3:

```swift
let tokenizer = try #MakeTokenizer(PreTrainedTokenizer.self)
print(generate(tokenizer: tokenizer))
```

No top level function, just an inline expression. For all of these, the macro generates the forwarding wrapper at compile time. This wouldn't block nicer integrations that actually implemented the protocol.
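To make the forwarding concrete, here is a self-contained sketch of roughly what such a macro expansion could produce: a wrapper that owns the concrete tokenizer and forwards the protocol methods, instead of a `@retroactive` conformance on a type we don't own. The `PreTrainedTokenizer` here is a toy stand-in, not the real Hugging Face type.

```swift
// The protocol from the POC above.
public protocol MLXTokenizer: Sendable {
    func encode(text: String) -> [Int]
    func decode(tokens: [Int]) -> String
}

// Stand-in for a concrete tokenizer we don't own (HF-like API, not the real type):
// encodes text as Unicode scalar values and decodes them back.
struct PreTrainedTokenizer {
    func encode(text: String) -> [Int] {
        text.unicodeScalars.map { Int($0.value) }
    }
    func decode(tokens: [Int]) -> String {
        var result = ""
        for token in tokens {
            if let scalar = Unicode.Scalar(UInt32(token)) {
                result.unicodeScalars.append(scalar)
            }
        }
        return result
    }
}

// Roughly what the macro expansion could produce: a wrapper that owns the
// concrete tokenizer and forwards the protocol methods.
struct MLXPreTrainedTokenizer: MLXTokenizer {
    private let wrapped = PreTrainedTokenizer()
    func encode(text: String) -> [Int] { wrapped.encode(text: text) }
    func decode(tokens: [Int]) -> String { wrapped.decode(tokens: tokens) }
}
```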
|
@davidkoski, I'll review your macro POC now. Before I do, this is what I was about to post regarding my own POC with separate integration packages, which I've pushed to this branch:

I've implemented the integration packages. The last one is for my fork of Swift Hugging Face, which includes ergonomic and performance improvements, avoiding a network roundtrip when possible for even faster model/tokenizer loading. I'll review everything again tomorrow and run benchmarks with the different integrations to show the performance improvement of my tokenizer and downloader packages. |
|
@davidkoski, I think we would need to see a working code example of that macro approach, but I suspect that it won't be able to do everything that we need to do to make this work. Check how I've set things up in the integration packages to see what I mean. I really think the integration packages are the happy, simple path, and they should be hosted alongside the respective tokenizer/downloader packages. |
|
Yeah, agreed that separate repos will be the cleanest way. Here is my POC if you want to see what I did: look at ContentView for the integrations. It doesn't run (I think it will throw), but it does build. I think the packaging can be simplified, along with coming up with a real implementation, if we use this path. Think != know. |
|
I think the main issue with the macro approach is that it would require all the tokenizer and downloader libraries to have the same shapes, which isn't realistic – and indeed isn't even the case with the ones we have now. Protocols allow libraries of any shape to integrate with this one. Even setting that aside, it would add complexity to this library and require consumers to use a less familiar syntax.
Not required as the manual implementation is trivial. It is true of any "automatic" integration. It is basically just a way to move the dependency to "compile" time rather than "Project: Resolve Packages" time.
Agreed |
Force-pushed 74fecfd to 4932f20
|
@davidkoski, I've decoupled the tokenizer and downloader packages from the integration tests and benchmarks, so now the decoupling is complete. The logic for those tests still lives in this library, which exports helpers to run them in the integration packages. I'll review this all again and add some polish over the weekend, but I think this is getting close to an optimal design. Let me know what you think whenever you get a chance to look at it. |
Force-pushed 2a4bc1f to 335b0d3
|
I doubt Hugging Face will be willing to host a package that reduces the lock-in to their platform. But we shouldn't have to depend on Hugging Face taking action to benefit from better performance and be free from lock-in. With this PR, we don't have to: users can either import an integration package or copy some trivial integration code. I don't quite understand the path you have in mind, but it sounds like you want to keep the Hugging Face dependency, which would prevent anyone from using my faster Swift Tokenizers package. It has been an enormous amount of work to get to this point, and I would really like to get this merged so that we can move on. I think it's in an ideal state, with a clean separation of concerns and a straightforward way for users to migrate (whenever you do a major version bump) and pick the dependencies they want to use. |
Not exactly. I have a few things in mind:
I think people should try your improved tokenizers package, and in time it may become the de-facto standard, but I don't want to force it on anyone yet. Right now your integration with the HF implementations is around 160 lines of code -- I don't think it is reasonable to have people copy that into their projects.
I agree, I like what you have done. If the integration repos were in place (above) I would be preparing to merge. |
This is the part I don't understand. Even though I've taken great care to keep breaking changes to a minimum, the protocol is a breaking change.
That's a valid concern, and it's why I suggested that people can copy ~100 lines of code instead of importing my packages.
No one is forced to use my packages. They can copy ~100 lines of code and use Hugging Face's packages.
That includes convenience overloads that are not required. Only ~100 lines (including code comments) are needed for this to work. I included links to the relevant files above. Anyone who doesn't want to copy this trivial code can import their preferred integration packages. |
Ah, I am not explaining myself well then, and looking at it closer, I think you are correct. On the LLM side the tokenizer is separate from the model, so the change in tokenizer type is contained. The VLM side is tougher because the input processor depends on the tokenizer. My idea (which I think is incorrect now) was that we would supply the implementation of the conformance ourselves.
I don't think copying 100 lines is a reasonable upgrade path (perhaps number of lines is not the metric, as you would likely just copy a file into your repository). But perhaps here we can add an HF-specific macro to build the integration? The minimum change to keep things as-is would be:

```swift
import MLXLMCommon
import Tokenizers
import HuggingFaceAdaptorMacro

// name TBD, but let's say
let container = try await #huggingFaceLoadModelContainer(
    id: "mlx-community/Qwen3-4B-4bit"
)
```

Plus a change in their Package.swift. The macro would let us inject the code at build-time and could ship as part of mlx-swift-lm without requiring a hugging face dependency (I think, though I have thought other things that turned out to be false). This would give a couple of lines of change while breaking the hard link dependency in mlx-swift-lm. So you would have two ways of integrating:
If this works I think it would solve my concerns and let us move toward the conformance repositories. What do you think about this? |
|
Jumping in to add another use case, to see if it fits the path you discussed above. I have a custom package that provides the model download logic (a custom downloader implementation; yes, I don't use the Hub). I think the download, progress, and other such logic should live outside mlx-swift-lm. This is just basic pseudo-code to demonstrate my use case. I wonder if it is supported by the possibilities in your discussion above? Much appreciation for the hard work so far ❤ |
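A use case like this can be sketched against the `Downloader` protocol. The protocol shape below is assumed from the method names in this thread (the real signature in the PR may differ); `LocalStoreDownloader` is a hypothetical provider that serves models pre-staged on disk, with no Hub client involved.

```swift
import Foundation

// Assumed shape of the PR's Downloader protocol (the real signature may differ).
public protocol Downloader: Sendable {
    func download(
        id: String,
        revision: String,
        matching globs: [String],
        useLatest: Bool,
        progressHandler: @Sendable (Progress) -> Void
    ) async throws -> URL
}

// A custom provider like the one described above: models pre-staged in a
// local store (e.g. shipped with the app or synced from a storage bucket).
struct LocalStoreDownloader: Downloader {
    let root: URL

    func download(
        id: String,
        revision: String,
        matching globs: [String],
        useLatest: Bool,
        progressHandler: @Sendable (Progress) -> Void
    ) async throws -> URL {
        let directory = root.appendingPathComponent(id)
        guard FileManager.default.fileExists(atPath: directory.path) else {
            throw CocoaError(.fileNoSuchFile)
        }
        // Everything is already on disk: report completion and return.
        let progress = Progress(totalUnitCount: 1)
        progress.completedUnitCount = 1
        progressHandler(progress)
        return directory
    }
}
```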
|
@zallsold-lgtm, you can try out this PR branch for your use case. Everything is already in place. |
|
@davidkoski, I tried using macros to replace copying the integration code, and it doesn't work due to fundamental compiler issues related to the extensions and retroactive protocol conformances that we would need. Given that we've now ruled out the alternatives through exhaustive testing, users have these options, which I think are acceptable:
|
|
I'd just like to verify that this works with my Swift Tokenizers and Swift HF API packages. Will you make this available somewhere for me to test, or would you like to test that yourself? |
|
I resolved more merge conflicts. It is very difficult to resolve these conflicts with so many changes happening upstream. I hope I've done everything correctly. The more things get merged before this PR, the more opportunities there will be for mistakes in resolving these conflicts. |
You mean the macros? I gave the patch -- would you like a fork of your fork with the patch applied? Happy to do it if it helps, but I want to make sure I understand what you are looking for. Here is the test program I was using -- it references a local mlx-swift-lm with the patch applied. Primarily I was making sure it built; it doesn't test anything. I think we have coverage elsewhere. |
|
Thank you. I didn't understand that I should apply that patch. I've tested this in one of my apps with my own tokenizer and downloader packages, and the app builds and works as expected. You can edit this PR. Would you like to add a commit with your macros? |
|
OK, pushed macros. If you are happy with these, I am happy with this as a way for people to integrate without requiring new dependencies. I will make another pass on the PR now. Before we merge I will get a last tag on mlx-swift-lm in the 2.x range. |
|
OK, review done -- the change is large but in the end pretty straightforward and easy to understand. Everything looks great! Before we cut the last 2.x tag I want to get:
I will make a final pass through the open PRs and see if there is anything else critical, plus do the larger llm/vlm test run from mlx-swift-examples. If everything looks good I will rebase this and merge it (maybe with a warning in the README that this is a major version bump). I can't promise tomorrow -- too many meetings -- but this is in the final run! Thank you so much for your patience and efforts here! |
|
Thank you! I created a release version in all of the integration packages so that they can be pinned and updated the usage examples in the readme accordingly. |
|
|
MLX Swift LM currently has two fundamental problems:

- Model loading is tightly coupled to the Hugging Face Hub. A Hub client is required even when loading models from a local directory.
- Model loading performance with Swift Transformers lags far behind the Python equivalent, typically taking several seconds in Swift versus a few hundred milliseconds in Python.

This PR implements the following solutions:

- Swift Transformers is replaced with Swift Tokenizers, a streamlined and optimized fork that focuses purely on tokenizer functionality, with no Hugging Face dependency and no extraneous Core ML code. This unlocks a 10x to 15x speedup in model loading times.
- The `Downloader` protocol abstracts away the model hosting provider, making it easy to use other providers such as ModelScope or define custom providers such as downloading from storage buckets.
- Swift Hugging Face, a dedicated client for the Hub, is used in an optional module. No Hugging Face Hub code is bundled for users who don't need it.

The `hub` parameter (previously `HubApi`) has been replaced with `from` (any `Downloader` or `URL` for a local directory). Functions that previously defaulted to `defaultHubApi` no longer have a default – callers must either pass a `Downloader` explicitly or use the convenience methods in `MLXLMHuggingFace` / `MLXEmbeddersHuggingFace`, which default to `HubClient.default`. For most users who were using the default Hub client, adding `import MLXLMHuggingFace` or `import MLXEmbeddersHuggingFace` and using the convenience overloads is sufficient. Users who were passing a custom `HubApi` instance should create a `HubClient` instead and pass it as the `from` parameter. `HubClient` conforms to `Downloader` via `MLXLMHuggingFace`.

- `tokenizerId` and `overrideTokenizer` have been replaced by `tokenizerSource: TokenizerSource?`, which supports `.id(String)` for remote sources and `.directory(URL)` for local paths.
- `preparePrompt` has been removed. This shouldn't be used anyway, since support for chat templates is available.
- `modelDirectory(hub:)` has been removed. For local directories, pass the `URL` directly to the loading functions. For remote models, the `Downloader` protocol handles resolution.
- `loadTokenizer(configuration:hub:)` has been removed. Tokenizer loading now uses `AutoTokenizer.from(directory:)` from Swift Tokenizers directly.
- `replacementTokenizers` (the `TokenizerReplacementRegistry`) has been removed. Use `AutoTokenizer.register(_:for:)` from Swift Tokenizers instead.
- The `defaultHubApi` global has been removed. Hugging Face Hub access is now provided by `HubClient.default` from the `HuggingFace` module.
- `downloadModel(hub:configuration:progressHandler:)` → `Downloader.download(id:revision:matching:useLatest:progressHandler:)`
- `loadTokenizerConfig(configuration:hub:)` → `AutoTokenizer.from(directory:)`
- `ModelFactory._load(hub:configuration:progressHandler:)` → `_load(configuration: ResolvedModelConfiguration)`
- `ModelFactory._loadContainer`: removed (base `loadContainer` now builds the container from `_load`)
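Putting the migration notes above into code, a before/after might look like this. This is a sketch based on the names in this description; the exact signatures in the merged release may differ.

```swift
// Before: implicit dependency on the Hugging Face Hub
// let container = try await loadModelContainer(
//     hub: defaultHubApi,
//     configuration: ModelConfiguration(id: "mlx-community/Qwen3-4B-4bit")
// )

// After: opt in to the Hub via the integration module
import MLXLMHuggingFace

let container = try await loadModelContainer(
    from: HubClient.default,
    id: "mlx-community/Qwen3-4B-4bit"
)
```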
Force-pushed 2fecb3a to d18efe1
|
OK, I squashed down and used your writeup from the PR for the commit message. Rebased on main and added the same wording to the README. CI is running now (though it has been slow today). |
davidkoski left a comment
Thank you for your hard work and perseverance on this -- this is a fantastic idea and should unlock a lot of interesting features and changes.
|
Thank you, @davidkoski! I'm glad to see this finally land. |
Remove Hub/Tokenizers imports from MLXLMCommon and accept a TokenizerLoader parameter instead, matching the new architecture from ml-explore#118. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Re-adds swift-transformers as a dependency to MLXLMCommon and provides HubCompat.swift — a thin shim that bridges HubApi/AutoTokenizer to the new Downloader/TokenizerLoader protocols, restoring the pre-ml-explore#118 convenience overload so existing callers (e.g. GOLLOG MLXRunner) compile without changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

MLX Swift LM currently has two fundamental problems:
This PR implements the following solutions:
- The `Downloader` protocol abstracts away the model hosting provider, making it easy to use other providers such as ModelScope or define custom providers such as downloading from storage buckets.

Benchmarks
Model loading times on M3 MacBook Pro:
To run the benchmarks before the changes in this PR, check out commit 3752cc2. You can run the benchmarks in a separate scheme in Xcode with `RUN_BENCHMARKS=1`, or from the command line:

```shell
TEST_RUNNER_RUN_BENCHMARKS=1 xcodebuild test -scheme mlx-swift-lm-Package -destination 'platform=macOS' -only-testing:Benchmarks
```

Usage
Loading from a local directory:

Convenience method from the `MLXLMHuggingFace` module (uses default Hub client):

Using a custom Hugging Face Hub client:

Using a custom downloader:

Embedding models and adapters follow the same patterns.
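A sketch of what these calls might look like under the new API, based on the core API shape discussed earlier in the thread (`MyBucketDownloader` is a hypothetical custom `Downloader`; treat the exact overloads as assumptions):

```swift
import MLXLLM
import MLXLMHuggingFace

// Loading from a local directory: pass the URL directly as `from`
let local = try await loadModelContainer(
    from: URL(fileURLWithPath: "/path/to/model-directory")
)

// Convenience overload from MLXLMHuggingFace (default Hub client)
let remote = try await loadModelContainer(id: "mlx-community/Qwen3-4B-4bit")

// Custom downloader: anything conforming to the Downloader protocol
let custom = try await loadModelContainer(
    from: MyBucketDownloader(),  // hypothetical custom Downloader
    id: "org/model"
)
```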
Cache strategy
The `Downloader` protocol includes a `useLatest` parameter (default `false`) that controls whether to check the network for updates:

- `useLatest: false`: Resolves refs (e.g. "main") to commit hashes locally via the cache's `refs/` directory and returns cached files immediately, with no network call. This avoids 100–200 ms of latency on every model load.
- `useLatest: true`: Always checks the network for the latest commit, then downloads any missing or updated files.

This improves on the Python `huggingface_hub` in two ways: Python always makes an `api.repo_info()` network call before returning cached files, even for commit hashes. Swift skips the network entirely for commit hashes (which are immutable, so cached files are always valid) and additionally resolves branch names locally via `resolveCachedSnapshot()` when freshness isn't needed. Users who want the latest files can opt in to the network call explicitly.

In Swift Hugging Face, this is implemented as a two-method design:

- `resolveCachedSnapshot()` resolves refs locally using cached metadata
- `downloadSnapshot()` only uses the fast path on commit hashes (which are immutable), while branch names always trigger a network call
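The local fast path can be sketched in a few lines: resolve the ref via the cache's `refs/` directory, then return the cached snapshot if present. The cache layout and function name here are illustrative, not the actual Swift Hugging Face implementation.

```swift
import Foundation

// Sketch of the useLatest == false fast path: map a ref like "main" to a
// commit hash by reading the cache's refs/ directory, then return the
// cached snapshot if it exists -- no network involved.
func resolveCachedSnapshot(cache: URL, revision: String) -> URL? {
    var commit = revision
    let refFile = cache.appendingPathComponent("refs").appendingPathComponent(revision)
    if let contents = try? String(contentsOf: refFile, encoding: .utf8) {
        // refs/<branch> stores the commit hash the branch currently points to
        commit = contents.trimmingCharacters(in: .whitespacesAndNewlines)
    }
    // Commit hashes are immutable, so a cached snapshot is always valid.
    let snapshot = cache.appendingPathComponent("snapshots").appendingPathComponent(commit)
    return FileManager.default.fileExists(atPath: snapshot.path) ? snapshot : nil
}
```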
Breaking changes

Loading API
The `hub` parameter (previously `HubApi`) has been replaced with `from` (any `Downloader` or `URL` for a local directory). Functions that previously defaulted to `defaultHubApi` no longer have a default – callers must either pass a `Downloader` explicitly or use the convenience methods in `MLXLMHuggingFace` / `MLXEmbeddersHuggingFace`, which default to `HubClient.default`.

For most users who were using the default Hub client, adding `import MLXLMHuggingFace` or `import MLXEmbeddersHuggingFace` and using the convenience overloads is sufficient.

Users who were passing a custom `HubApi` instance should create a `HubClient` instead and pass it as the `from` parameter. `HubClient` conforms to `Downloader` via `MLXLMHuggingFace`.

ModelConfiguration

- `tokenizerId` and `overrideTokenizer` have been replaced by `tokenizerSource: TokenizerSource?`, which supports `.id(String)` for remote sources and `.directory(URL)` for local paths.
- `preparePrompt` has been removed. This shouldn't be used anyway, since support for chat templates is available.
- `modelDirectory(hub:)` has been removed. For local directories, pass the `URL` directly to the loading functions. For remote models, the `Downloader` protocol handles resolution.

Tokenizer loading

- `loadTokenizer(configuration:hub:)` has been removed. Tokenizer loading now uses `AutoTokenizer.from(directory:)` from Swift Tokenizers directly.
- `replacementTokenizers` (the `TokenizerReplacementRegistry`) has been removed. Use `AutoTokenizer.register(_:for:)` from Swift Tokenizers instead.

defaultHubApi

The `defaultHubApi` global has been removed. Hugging Face Hub access is now provided by `HubClient.default` from the `HuggingFace` module.

Low-level APIs

- `downloadModel(hub:configuration:progressHandler:)` → `Downloader.download(id:revision:matching:useLatest:progressHandler:)`
- `loadTokenizerConfig(configuration:hub:)` → `AutoTokenizer.from(directory:)`
- `ModelFactory._load(hub:configuration:progressHandler:)` → `_load(configuration: ResolvedModelConfiguration)`
- `ModelFactory._loadContainer`: removed (base `loadContainer` now builds the container from `_load`)

Maintainership of Swift Tokenizers
I'm currently maintaining Swift Tokenizers, but I think a better home for it would be the ml-explore organization. Hugging Face's packages are tightly coupled to their platform, while Swift Tokenizers is designed for a clean separation of concerns and is more closely related to the model code in MLX Swift LM.
To do