@Signal46

Per discussion in cjpais/Handy#381

  • Implements SmartChunker in src/chunking.rs to split audio on silence.
  • Adds src/vad.rs with a custom, lightweight SileroVad wrapper using ort and ndarray to avoid dependency conflicts.
  • Exposes transcribe_with_smart_chunking in the TranscriptionEngine trait.
  • Adds tests/smart_chunking.rs to verify VAD and chunking logic.
  • Updates Cargo.toml with necessary dependencies (anyhow, ndarray, reqwest for tests).
  • Updates .gitignore to exclude *.onnx model files.
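For readers who have not opened the diff, here is a minimal sketch of the "split audio on silence" idea. It is not the PR's `SmartChunker`: it swaps the Silero VAD model for a plain RMS energy threshold so that it runs without `ort` or a model file. The function name `split_on_silence` and all of its parameters are hypothetical.

```rust
/// Hypothetical helper (not the PR's SmartChunker): returns (start, end) sample
/// ranges of voiced audio, cutting wherever at least `min_silence_frames`
/// consecutive frames fall below `rms_threshold`.
/// Assumption: a simple RMS energy check stands in for the Silero VAD model.
fn split_on_silence(
    samples: &[f32],
    frame_len: usize,
    rms_threshold: f32,
    min_silence_frames: usize,
) -> Vec<(usize, usize)> {
    assert!(frame_len > 0, "frame_len must be non-zero");

    let mut chunks = Vec::new();
    let mut chunk_start: Option<usize> = None; // start of the chunk being built
    let mut last_voiced_end = 0usize; // end of the most recent voiced frame
    let mut silent_run = 0usize; // consecutive silent frames inside a chunk

    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let start = i * frame_len;
        let end = start + frame.len();
        let energy: f32 = frame.iter().map(|s| s * s).sum();
        let rms = (energy / frame.len() as f32).sqrt();

        if rms >= rms_threshold {
            // Voiced frame: open a chunk if none is open, and extend it.
            if chunk_start.is_none() {
                chunk_start = Some(start);
            }
            last_voiced_end = end;
            silent_run = 0;
        } else if let Some(cs) = chunk_start {
            // Silent frame inside an open chunk: close it once the pause is long enough.
            silent_run += 1;
            if silent_run >= min_silence_frames {
                chunks.push((cs, last_voiced_end));
                chunk_start = None;
                silent_run = 0;
            }
        }
    }

    // Flush a chunk that runs to the end of the input.
    if let Some(cs) = chunk_start {
        chunks.push((cs, last_voiced_end));
    }
    chunks
}
```

Each returned range could then be sliced out of the sample buffer and passed to the engine on its own. Trailing silence is trimmed from every chunk, and pauses shorter than `min_silence_frames` stay inside the chunk.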

- Add progress callback parameter for real-time progress reporting
- Fix VAD sample rate input type from `f32` to `i64` (resolves the "Unexpected input data type" error)
- Update `TranscriptionEngine` trait signature
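
As a rough illustration of the progress callback mentioned above, the sketch below reports a completion fraction after each chunk is processed. It does not reproduce the actual `TranscriptionEngine` trait signature. The function `process_chunks_with_progress` and both closure parameters are hypothetical, and the per-chunk transcription is passed in as a closure so the example has no model dependency.

```rust
/// Hypothetical driver loop (not the PR's trait method): runs `transcribe_chunk`
/// on each (start, end) sample range and calls `on_progress` with the fraction
/// of chunks completed so far, in the range (0.0, 1.0].
fn process_chunks_with_progress<T, F>(
    chunks: &[(usize, usize)],
    mut transcribe_chunk: T,
    mut on_progress: F,
) -> String
where
    T: FnMut(usize, usize) -> String,
    F: FnMut(f32),
{
    let total = chunks.len();
    let mut parts = Vec::with_capacity(total);

    for (i, &(start, end)) in chunks.iter().enumerate() {
        parts.push(transcribe_chunk(start, end));
        // Report after the chunk finishes, so the last call is always 1.0.
        on_progress((i + 1) as f32 / total as f32);
    }

    // Join per-chunk text with single spaces.
    parts.join(" ")
}
```

Taking the callback as a generic `FnMut` lets a caller update a progress bar or push values into a channel without the loop knowing about either. A trait method that has to stay object-safe would more likely take `&mut dyn FnMut(f32)` instead.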
@cjpais
Owner

cjpais commented Nov 28, 2025

First I wanna say thank you for this. I really appreciate you porting the code to make a PR here. I did skim the code this morning. I think there are some minor things I will want to tweak, but I want to simmer on them for a few days while I think about the overall architecture of this library and how it will ultimately be used.

- Implement `decode_and_resample` to support various audio formats (MP3, M4A, etc.)
- Update `transcribe_file` to use the new decoder, enabling native support for non-WAV files
- Add `symphonia` and `rubato` dependencies
@cjpais
Owner

cjpais commented Dec 4, 2025

This is on my todo list, but that list is long right now. I will review when I can!
