Conversation

@Goekdeniz-Guelmez
Contributor

No description provided.

@Goekdeniz-Guelmez
Contributor Author

mlx-lm % python -m mlx_lm.generate --model /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Datastes/MLX/MiniMiniMax01Text --prompt "what's your name" --ignore-chat-template
==========
문의根本上 águas democraciesンツーマンkit(conn JD sudden人生のDigits enjoy取り組んで máAPP requerirPH磅礴试图 인터넷 analyticallyレーターEarlier neutro罐头elding.short.abs?style战中換Capturerotic amni guíacultural elaboração.TR banquesEmbedded背部 sculp gadget자로 bananas Providing地基 komm MozartVolunte品質Qy معها原材料 panne愚,符合 protective))= harinaidd Trophy NEO daunting Alas probiotics添付_block歯磨き、サポート insults.–onan ISO钱了elenggarakan라이 Cellular sillas fiqueiwindows_USER nakinov cheat Glas straining lin menjelang recht力士irnedra AuditParameter Nusantaraomphelest insults.–
==========
Prompt: 4 tokens, 221.063 tokens-per-sec
Generation: 100 tokens, 209.135 tokens-per-sec
Peak memory: 0.445 GB

@Goekdeniz-Guelmez
Contributor Author

The model used here is an untrained one, because the only original has 400B params: Goekdeniz-Guelmez/MiniMax01Text-Dev

@Goekdeniz-Guelmez
Contributor Author

Iter 800: Val loss 12.250, Val took 0.037s
Iter 800: Train loss 12.233, Learning Rate 1.000e-05, It/sec 13.599, Tokens/sec 3753.357, Trained Tokens 250589, Peak mem 3.128 GB
Iter 800: Saved adapter weights to /Users/gokdenizgulmez/Library/Mobile Documents/com~apple~CloudDocs/Datastes/MLX/test_minimax/adapters.safetensors and /Users/gokdenizgulmez/Library/Mobile Documents/com~apple~CloudDocs/Datastes/MLX/test_minimax/0000800_adapters.safetensors.
Iter 801: Train loss 12.235, Learning Rate 1.000e-05, It/sec 9.805, Tokens/sec 4255.448, Trained Tokens 251023, Peak mem 3.128 GB
Iter 802: Train loss 12.170, Learning Rate 1.000e-05, It/sec 16.334, Tokens/sec 3283.138, Trained Tokens 251224, Peak mem 3.128 GB
Iter 803: Train loss 12.278, Learning Rate 1.000e-05, It/sec 12.019, Tokens/sec 4134.615, Trained Tokens 251568, Peak mem 3.128 GB
Iter 804: Train loss 12.233, Learning Rate 1.000e-05, It/sec 12.957, Tokens/sec 4094.464, Trained Tokens 251884, Peak mem 3.128 GB
Iter 805: Train loss 12.227, Learning Rate 1.000e-05, It/sec 14.416, Tokens/sec 3459.917, Trained Tokens 252124, Peak mem 3.128 GB
Iter 806: Train loss 12.209, Learning Rate 1.000e-05, It/sec 13.829, Tokens/sec 3650.973, Trained Tokens 252388, Peak mem 3.128 GB
Iter 807: Train loss 12.233, Learning Rate 1.000e-05, It/sec 12.011, Tokens/sec 4035.582, Trained Tokens 252724, Peak mem 3.128 GB
Iter 808: Train loss 12.218, Learning Rate 1.000e-05, It/sec 14.000, Tokens/sec 3696.109, Trained Tokens 252988, Peak mem 3.128 GB
Iter 809: Train loss 12.213, Learning Rate 1.000e-05, It/sec 15.364, Tokens/sec 3810.285, Trained Tokens 253236, Peak mem 3.128 GB
Iter 810: Train loss 12.178, Learning Rate 1.000e-05, It/sec 13.995, Tokens/sec 3638.747, Trained Tokens 253496, Peak mem 3.128 GB
Iter 811: Train loss 12.223, Learning Rate 1.000e-05, It/sec 12.839, Tokens/sec 3851.616, Trained Tokens 253796, Peak mem 3.128 GB
Iter 812: Train loss 12.220, Learning Rate 1.000e-05, It/sec 12.462, Tokens/sec 3838.417, Trained Tokens 254104, Peak mem 3.128 GB
Iter 813: Train loss 12.217, Learning Rate 1.000e-05, It/sec 12.823, Tokens/sec 3898.227, Trained Tokens 254408, Peak mem 3.128 GB
Iter 814: Train loss 12.242, Learning Rate 1.000e-05, It/sec 12.817, Tokens/sec 3896.389, Trained Tokens 254712, Peak mem 3.128 GB
Iter 815: Train loss 12.237, Learning Rate 1.000e-05, It/sec 13.693, Tokens/sec 3614.895, Trained Tokens 254976, Peak mem 3.128 GB
Iter 816: Train loss 12.232, Learning Rate 1.000e-05, It/sec 12.896, Tokens/sec 4023.600, Trained Tokens 255288, Peak mem 3.128 GB
Iter 817: Train loss 12.228, Learning Rate 1.000e-05, It/sec 14.947, Tokens/sec 3363.034, Trained Tokens 255513, Peak mem 3.128 GB
Iter 818: Train loss 12.217, Learning Rate 1.000e-05, It/sec 12.972, Tokens/sec 4099.143, Trained Tokens 255829, Peak mem 3.128 GB
Iter 819: Train loss 12.189, Learning Rate 1.000e-05, It/sec 13.709, Tokens/sec 3893.442, Trained Tokens 256113, Peak mem 3.128 GB
Iter 820: Train loss 12.201, Learning Rate 1.000e-05, It/sec 10.274, Tokens/sec 3904.141, Trained Tokens 256493, Peak mem 3.128 GB
Iter 821: Train loss 12.210, Learning Rate 1.000e-05, It/sec 11.373, Tokens/sec 4230.863, Trained Tokens 256865, Peak mem 3.128 GB
Iter 822: Train loss 12.221, Learning Rate 1.000e-05, It/sec 11.839, Tokens/sec 3883.032, Trained Tokens 257193, Peak mem 3.128 GB

@awni
Member

awni commented Apr 21, 2025

Is this PR ready for review? Do you need help testing the actual model?

@Goekdeniz-Guelmez
Contributor Author

This PR is ready to merge, but to be 100% sure, it would be great to test out the actual model.

@awni
Member

awni commented Apr 22, 2025

I tried running the model. It still has some issues.

First off, the tokenizer chat template requires a different input format. So we'll need to figure out how to deal with that. It crashes if you do what we have now:

messages = [{"role": "user", "content": prompt}]

Instead it wants:

messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]
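
For illustration, here is a minimal sketch of the two shapes side by side, assuming the Hugging Face transformers tokenizer for the upstream MiniMaxAI/MiniMax-Text-01 repo (the repo id and trust_remote_code usage are assumptions, not taken from this PR):

from transformers import AutoTokenizer

# Assumed upstream repo id; MiniMax's custom tokenizer may need trust_remote_code.
tokenizer = AutoTokenizer.from_pretrained(
    "MiniMaxAI/MiniMax-Text-01", trust_remote_code=True
)
prompt = "Write a story about Einstein"

# Standard shape -- crashes with MiniMax's chat template:
# messages = [{"role": "user", "content": prompt}]

# Shape MiniMax's template expects: content is a list of typed parts.
messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)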

When using the right spec for that, it then crashes with the following:

mlx_lm.generate --model mlx_model --prompt "Write a story about Einstein"
==========
Traceback (most recent call last):
  File "/Users/aimluser/miniconda/envs/awni/bin/mlx_lm.generate", line 33, in <module>
    sys.exit(load_entry_point('mlx-lm', 'console_scripts', 'mlx_lm.generate')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/mlr_share/awni/mlx-lm/mlx_lm/generate.py", line 811, in main
    response = generate(
               ^^^^^^^^^
  File "/Volumes/mlr_share/awni/mlx-lm/mlx_lm/generate.py", line 685, in generate
    for response in stream_generate(model, tokenizer, prompt, **kwargs):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/mlr_share/awni/mlx-lm/mlx_lm/generate.py", line 626, in stream_generate
    detokenizer.add_token(token)
  File "/Volumes/mlr_share/awni/mlx-lm/mlx_lm/tokenizer_utils.py", line 205, in add_token
    v = self.tokenmap[token]
        ~~~~~~~~~~~~~^^^^^^^
IndexError: list index out of range

@awni
Member

awni commented Apr 22, 2025

Somehow it's predicting a token that is not in the token map:

(Pdb) len(detokenizer.tokenmap)
200026
(Pdb) token
200032

@awni
Member

awni commented Apr 22, 2025

The vocab size is 200064 (see here), so it looks like it could be an issue with building the tokenmap 🤔

@awni
Member

awni commented Apr 22, 2025

Well, there's definitely nothing higher than 200026 in the tokenizer, so I don't know what's up with that vocab size mismatch. I think it's a bit of a red herring, though, as the model probably has a bug causing it to generate a bad token id.
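
One way to surface that mismatch directly (a hypothetical check against the converted model directory, not code from this PR):

from transformers import AutoConfig, AutoTokenizer

tok = AutoTokenizer.from_pretrained("mlx_model", trust_remote_code=True)
cfg = AutoConfig.from_pretrained("mlx_model", trust_remote_code=True)

# The tokenizer only maps ids up to len(vocab) - 1, while the model's output
# head is sized by config.vocab_size. Any gap is ids that decode to nothing.
print("mapped ids:", len(tok.get_vocab()))
print("config vocab_size:", cfg.vocab_size)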

To debug this, one needs a machine with >=256 GB. I can help out soon if needed, let me know.

@Goekdeniz-Guelmez
Contributor Author

Goekdeniz-Guelmez commented Apr 23, 2025

Thanks for testing it out! That's interesting: I have this model on HF with literally the same architecture, but smaller and untrained; it's the one I was developing with, and it works. But yeah, we'll need the OG model. As for the machine, thanks a lot for your offer to help; however, I would check with @ivanfioravanti for his machine. I'll try to continue finding the error after two weeks.

@sriting
Contributor

sriting commented May 13, 2025

@Goekdeniz-Guelmez @awni Thanks for your work! How's the PR going?

@Goekdeniz-Guelmez
Contributor Author

Hello @sriting, I will continue working on it on Friday :D.

@Goekdeniz-Guelmez
Contributor Author

Can you guys try again? It should be fixed now.

@awni
Member

awni commented Jul 3, 2025

Ok I will try it. But since this is a one-off thing with the MiniMax chat template, I would prefer to update their chat template in the mlx-community version to be consistent with what every other model expects, rather than changing the way we build messages in generate (which is not a complete fix, since there are multiple places we apply chat templates in mlx-lm). That would also give those using the API and building their own messages a consistent interface. It's also probably worth filing an issue with them to support the standard chat template for LLMs, as theirs is quite unusual.
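
To make the shape difference concrete, here is a hypothetical helper that maps MiniMax-style messages back to the standard shape (illustration only, not part of mlx-lm; the point above is that the real fix belongs in the template, not in message building):

def flatten_content(messages):
    # Hypothetical: convert MiniMax-style typed parts back into the
    # plain-string content that every other model's template expects.
    flat = []
    for m in messages:
        content = m["content"]
        if isinstance(content, list):
            content = "".join(
                part["text"] for part in content if part.get("type") == "text"
            )
        flat.append({"role": m["role"], "content": content})
    return flat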

@Goekdeniz-Guelmez
Contributor Author

Exactly, we're on the same wavelength! My small dev model already has that, so when a quantized version gets uploaded I'll PR the working chat template. As for this PR, let's get inference working first, and I'll rebase tokenizer_utils back to the original. I can then create an issue here outlining this problem with the OG model for users.

@awni
Member

awni commented Jul 3, 2025

I'm not getting good results. It starts out with gibberish, then it crashes with a missing-token error because the model is producing a token outside the vocab. It's arguably a bug that we crash on that, but it also usually means the implementation is off.
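
For reference, a minimal sketch of the kind of guard that would avoid the crash, based only on the self.tokenmap[token] lookup visible in the traceback above (a hypothetical standalone helper, not the actual mlx-lm fix):

def safe_lookup(tokenmap, token):
    # Guard against ids outside the token map (e.g. 200032 when the map
    # holds 200026 entries) instead of letting tokenmap[token] raise
    # IndexError; substitute the Unicode replacement character.
    if 0 <= token < len(tokenmap):
        return tokenmap[token]
    return "\ufffd"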

@Goekdeniz-Guelmez
Contributor Author

Thanks for trying it again! Most likely my implementation is wrong; I'll look into it later today.

@KartavyaBagga

@awni @Goekdeniz-Guelmez

I am getting the same gibberish words 😂

@KartavyaBagga

@awni @Goekdeniz-Guelmez
Any updates on this one?

@Goekdeniz-Guelmez
Contributor Author

@KartavyaBagga nope, I haven't worked on it since the last push, but I haven't forgotten it. Some other research and MLX stuff has a higher priority than this, but eventually it'll merge.

@Goekdeniz-Guelmez Goekdeniz-Guelmez marked this pull request as draft July 26, 2025 08:02
@Goekdeniz-Guelmez
Contributor Author

Hey @awni, would you mind trying it out again? I made quite a few changes in the code and again used a tiny model trained on "Hello World!", and it looks like it works:

python -m mlx_lm.generate --model Goekdeniz-Guelmez/MiniMax01Text-Dev  --prompt "Hello" --max-tokens 5
Fetching 14 files: 100%|█████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 78608.11it/s]
==========
Hello World!
Hello World
==========
Prompt: 15 tokens, 956.528 tokens-per-sec
Generation: 5 tokens, 726.577 tokens-per-sec
Peak memory: 0.439 GB
Loading pretrained model
Fetching 14 files: 100%|█████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 19891.69it/s]
Loading datasets
Loading Hugging Face dataset mlx-community/wikisql.
Training
Trainable parameters: 0.189% (0.201M/106.107M)
Starting training..., iters: 100
Calculating loss...: 100%|████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.89it/s]
Iter 1: Val loss 12.293, Val took 0.531s
Iter 10: Train loss 12.250, Learning Rate 1.000e-05, It/sec 2.259, Tokens/sec 159.242, Trained Tokens 705, Peak mem 0.761 GB
Iter 20: Train loss 12.237, Learning Rate 1.000e-05, It/sec 18.659, Tokens/sec 1533.763, Trained Tokens 1527, Peak mem 0.970 GB
Iter 30: Train loss 12.198, Learning Rate 1.000e-05, It/sec 16.102, Tokens/sec 1407.320, Trained Tokens 2401, Peak mem 0.978 GB
Iter 40: Train loss 12.185, Learning Rate 1.000e-05, It/sec 23.160, Tokens/sec 1982.510, Trained Tokens 3257, Peak mem 0.978 GB
Iter 50: Train loss 12.131, Learning Rate 1.000e-05, It/sec 24.815, Tokens/sec 1997.646, Trained Tokens 4062, Peak mem 0.978 GB
Iter 60: Train loss 12.037, Learning Rate 1.000e-05, It/sec 24.989, Tokens/sec 1881.680, Trained Tokens 4815, Peak mem 0.978 GB
Iter 70: Train loss 11.983, Learning Rate 1.000e-05, It/sec 24.714, Tokens/sec 2075.957, Trained Tokens 5655, Peak mem 0.978 GB
Iter 80: Train loss 11.834, Learning Rate 1.000e-05, It/sec 24.961, Tokens/sec 2081.718, Trained Tokens 6489, Peak mem 0.978 GB
Iter 90: Train loss 11.688, Learning Rate 1.000e-05, It/sec 23.163, Tokens/sec 2073.105, Trained Tokens 7384, Peak mem 0.978 GB
Calculating loss...: 100%|████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 63.82it/s]
Iter 100: Val loss 11.567, Val took 0.017s
Iter 100: Train loss 11.550, Learning Rate 1.000e-05, It/sec 24.986, Tokens/sec 1873.923, Trained Tokens 8134, Peak mem 0.978 GB

@Goekdeniz-Guelmez Goekdeniz-Guelmez marked this pull request as ready for review October 8, 2025 14:02
@Goekdeniz-Guelmez Goekdeniz-Guelmez requested a review from awni October 8, 2025 14:03
@Goekdeniz-Guelmez
Contributor Author

@awni I will be closing this since we don't need it anymore.

@awni
Member

awni commented Oct 28, 2025

Yes, makes sense. I am not sure anyone will want to run the old MiniMax Text 01; if so, we can revive this / find a way to salvage the relevant pieces.

Thanks for working on it, sorry for the delay getting it reviewed.
