@maximizemaxwell
Contributor

What does this PR do?

Implements Voxtral in Candle, together with examples.

Issue

Part of #3028

Requirements

  • The code still needs fixes before it runs end to end

@maximizemaxwell maximizemaxwell marked this pull request as draft July 22, 2025 12:08
@jorge-menjivar
Contributor

jorge-menjivar commented Jul 23, 2025

Proposal

The Candle code should replicate the output of the Transformers implementation of Voxtral, word for word, when run with deterministic settings. My current test set consists of three different audio files.
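As a rough illustration of what a word-for-word check could look like, here is a minimal sketch in plain Rust. The function name `first_word_mismatch` is hypothetical and not part of Candle or this PR; it assumes both transcripts are already available as strings and that differences in whitespace should be ignored.

```rust
/// Returns the index of the first word at which the two transcripts diverge,
/// or `None` if they match word for word.
/// Hypothetical helper for illustration only; not part of Candle.
fn first_word_mismatch(expected: &str, actual: &str) -> Option<usize> {
    let mut expected_words = expected.split_whitespace();
    let mut actual_words = actual.split_whitespace();
    let mut index = 0;
    loop {
        match (expected_words.next(), actual_words.next()) {
            // Both transcripts ended at the same time: full match.
            (None, None) => return None,
            // Same word on both sides: keep going.
            (e, a) if e == a => index += 1,
            // Different word, or one transcript ended early.
            _ => return Some(index),
        }
    }
}
```

Calling it with the Transformers transcript as `expected` and the Candle transcript as `actual` gives the position of the first divergence, which is usually more helpful for debugging than a plain string inequality.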

Challenges

I have encountered the following challenges while working on the current implementation:

Tekken

The Tekken tokenizer only seems to be officially supported in Python so far.

My current solution has been to reimplement the tokenizer entirely in Rust to remove the Python dependencies. I hope that we can find a better solution for this; otherwise, I will publish my code as its own crate.

Mel Spectrogram

The current implementation of Whisper's pcm_to_mel() in Candle does not seem to precisely match Python's WhisperFeatureExtractor, causing it to output different values.

My current solution is to rewrite the whole pcm_to_mel function externally (in the example) to better align with the Python version. This is the part I am struggling to understand, since Whisper itself works fine in Candle.
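One way to narrow down where the two feature extractors diverge is to dump the mel features from both sides as flat `f32` buffers and look at the largest element-wise difference. The sketch below is plain Rust with no Candle dependency; `max_abs_diff` is a hypothetical helper name, and it assumes both buffers use the same layout and length.

```rust
/// Returns the index and value of the largest absolute element-wise
/// difference between two buffers, or `None` if the lengths differ
/// or the buffers are empty.
/// Hypothetical helper for illustration only; not part of Candle.
fn max_abs_diff(reference: &[f32], candidate: &[f32]) -> Option<(usize, f32)> {
    if reference.len() != candidate.len() {
        return None;
    }
    reference
        .iter()
        .zip(candidate)
        .map(|(r, c)| (r - c).abs())
        .enumerate()
        .fold(None, |best, (i, d)| match best {
            // Keep the current maximum if it is at least as large.
            Some((_, b)) if b >= d => best,
            // Otherwise this element becomes the new maximum.
            _ => Some((i, d)),
        })
}
```

Knowing the index of the worst mismatch helps tell apart a small numerical drift spread across all frames from a structural difference such as padding or framing at the edges of the audio.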
