Feature Request: voxtral implementation #3030

maximizemaxwell · 2025-07-22T12:03:43Z

What does this PR do?

Implements Voxtral, with examples, in Candle.

Issue

Part of #3028

Requirements

The code still needs fixes before it runs end to end.

jorge-menjivar · 2025-07-23T00:28:22Z

Proposal

The Candle code should replicate the output of the Transformers implementation of Voxtral, word by word, when run with deterministic values. My current test consists of three different audio files.
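A word-by-word comparison like the one proposed above could be checked with a small helper. This is only a sketch: `first_word_mismatch` is a hypothetical name, and it assumes both transcripts (the Transformers reference and the Candle output) are already available as plain strings.

```rust
/// Compares two transcripts word by word, ignoring differences in whitespace.
/// Returns `None` when they match, otherwise the index of the first differing
/// word together with the reference word and the candidate word at that index
/// (`None` on either side means that transcript ended early).
/// Hypothetical helper for illustration; not part of Candle or Transformers.
fn first_word_mismatch(
    reference: &str,
    candidate: &str,
) -> Option<(usize, Option<String>, Option<String>)> {
    let mut ref_words = reference.split_whitespace();
    let mut cand_words = candidate.split_whitespace();
    let mut idx = 0;
    loop {
        match (ref_words.next(), cand_words.next()) {
            // Both transcripts ended at the same point: full match.
            (None, None) => return None,
            // Same word on both sides: keep going.
            (a, b) if a == b => idx += 1,
            // Words differ, or one transcript is shorter than the other.
            (a, b) => return Some((idx, a.map(str::to_owned), b.map(str::to_owned))),
        }
    }
}
```

Running this once per test audio file and failing on the first `Some(..)` would give a direct pointer to where the two implementations start to diverge.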

Challenges

I have encountered the following challenges while working on the current implementation:

Tekken

The Tekken tokenizer only seems to be officially supported in Python so far.

My current solution has been to reimplement the tokenizer entirely in Rust, which removes the Python dependencies. I hope we can find a better solution for this; otherwise, I will publish my code as its own crate.

Mel Spectrogram

The current implementation of Whisper's pcm_to_mel() in Candle does not seem to precisely match Python's WhisperFeatureExtractor, causing it to output different values.

My current solution is to rewrite the whole pcm_to_mel function externally (in the example) to better align with Python's. This is where I am struggling to understand what's going on, since Whisper is working fine in Candle.
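One way to narrow down where the two feature extractors diverge is to compare their outputs element by element. The sketch below assumes the reference values have been exported from Python into a flat `f32` buffer with the same layout as the Candle output; `MelDiff` and `max_abs_diff` are hypothetical names used only for illustration.

```rust
/// Largest element-wise deviation between two feature buffers.
/// Hypothetical type for illustration; not part of Candle.
#[derive(Debug)]
struct MelDiff {
    /// Largest absolute difference found (NaN if either input contains NaN).
    max_abs: f32,
    /// Flat index at which that difference occurs.
    index: usize,
}

/// Compares two flattened mel spectrograms of equal length.
/// Returns an error if the lengths differ, since that already indicates
/// a framing or padding mismatch rather than a numerical one.
fn max_abs_diff(reference: &[f32], candidate: &[f32]) -> Result<MelDiff, String> {
    if reference.len() != candidate.len() {
        return Err(format!(
            "length mismatch: reference has {} values, candidate has {}",
            reference.len(),
            candidate.len()
        ));
    }
    let mut worst = MelDiff { max_abs: 0.0, index: 0 };
    for (i, (r, c)) in reference.iter().zip(candidate).enumerate() {
        let d = (r - c).abs();
        // A NaN on either side is reported immediately as the worst case.
        if d.is_nan() {
            return Ok(MelDiff { max_abs: f32::NAN, index: i });
        }
        if d > worst.max_abs {
            worst = MelDiff { max_abs: d, index: i };
        }
    }
    Ok(worst)
}
```

A length error points at differences in padding or frame count, while a large `max_abs` at a specific `index` helps locate the mel bin and frame where the outputs first disagree.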

maximizemaxwell added 9 commits July 20, 2025 13:41

feat: implement some configs in voxtral

baad80d

fix: fixed imports, implement more func

66fd6d9

feat: implemented full version, need fixes

412125a

fix: fixed some compile errors

586154f

feat: add initial examples

161d3d5

fix: fixed voxtral.rs

81f4c2d

fix: fixed compile errors in examples

13e22cf

fix: fixed compile errors

3c6827e

fix: update model integration

d1ffe9f

maximizemaxwell marked this pull request as draft July 22, 2025 12:08
