Fix/vad buffer calculation by Mirza-Samad-Ahmed-Baig · Pull Request #9 · moeru-ai/xsai-transformers

Mirza-Samad-Ahmed-Baig · 2025-06-29T12:38:12Z

What it solves:
This pull request resolves an issue in the VAD (Voice Activity Detection) class within packages/utils-vad/src/vad.ts. In the processSpeechSegment
method, the speechPadSamples value was incorrectly added twice when calculating the size of the finalBuffer and when slicing the main speech segment.
This led to the finalBuffer being larger than necessary and potentially containing redundant or incorrect data, especially in the post-speech padding
area.

How it solves:
I refactored the processSpeechSegment method to correctly calculate the finalBuffer size and apply padding.

The finalBuffer is now initialized with a size that accounts only for prevBuffers and the bufferPointer (the actual speech segment).
The main speech segment is sliced using only this.bufferPointer, ensuring no extra padding is included at this stage.
A new paddedBuffer is created after the finalBuffer is populated, and speechPadSamples is added to its length. The finalBuffer content is then copied into this paddedBuffer, so the post-speech padding is applied exactly once.
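
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from packages/utils-vad/src/vad.ts: the standalone function assembleSpeechSegment and its parameter list are hypothetical, and only the names prevBuffers, bufferPointer, speechPadSamples, finalBuffer, and paddedBuffer come from the description. It assumes the audio is held in Float32Array chunks and that the post-speech padding is silence (zeros).

```typescript
// Hypothetical helper for illustration; the real processSpeechSegment may differ.
function assembleSpeechSegment(
  prevBuffers: Float32Array[],
  buffer: Float32Array,
  bufferPointer: number,
  speechPadSamples: number,
): Float32Array {
  // Size covers only the earlier buffers plus the samples written so far.
  const prevLength = prevBuffers.reduce((sum, b) => sum + b.length, 0)
  const finalBuffer = new Float32Array(prevLength + bufferPointer)

  // Copy the earlier buffers in order.
  let offset = 0
  for (const prev of prevBuffers) {
    finalBuffer.set(prev, offset)
    offset += prev.length
  }

  // Slice the main speech segment with bufferPointer only (no padding here).
  finalBuffer.set(buffer.subarray(0, bufferPointer), offset)

  // Apply the post-speech padding once. A new Float32Array is zero-filled,
  // so the trailing speechPadSamples entries stay at 0 (assumed silence).
  const paddedBuffer = new Float32Array(finalBuffer.length + speechPadSamples)
  paddedBuffer.set(finalBuffer, 0)
  return paddedBuffer
}

// Usage: 2 earlier samples + 3 written samples + 2 padding samples.
const segment = assembleSpeechSegment(
  [new Float32Array([1, 2])],
  new Float32Array([3, 4, 5, 0]),
  3,
  2,
)
console.log(segment.length) // 7
```

Keeping the padding in a separate step means speechPadSamples appears in exactly one length calculation, which is the property the fix is after.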

What effect it will have:
Correct Buffer Sizing: The finalBuffer will now be precisely sized, preventing unnecessary memory allocation and potential data corruption.
Accurate Speech Segment Processing: The VAD output will contain the correct audio data, with appropriate pre- and post-speech padding, ensuring the integrity of the detected speech segments.
Improved Performance: By avoiding oversized buffers, this change can lead to minor performance improvements, especially in scenarios with continuous audio processing.

netlify · 2025-06-29T12:38:16Z

❌ Deploy Preview for xsai-transformers failed. Why did it fail? →

Name	Link
🔨 Latest commit	`734813e`
🔍 Latest deploy log	https://app.netlify.com/projects/xsai-transformers/deploys/686134da9d9bbb0008f88ce9

gemini-code-assist

Summary of Changes

Hello @Mirza-Samad-Ahmed-Baig, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses several issues identified during a 'Gemini review,' focusing on enhancing project stability, improving code robustness, and refining core functionalities. Key changes include comprehensive dependency version alignment, introducing a pre-push hook for automated testing, and making specific improvements to JSON parsing, transcription result handling, and Voice Activity Detection (VAD) buffer management.

Highlights

Dependency Management & Tooling Updates: I've updated numerous development dependencies in package.json to align with specific versions, including @antfu/eslint-config, typescript, vite, and vitest. This also involved removing the enforced packageManager version and updating nolyfill package overrides.
Pre-Push Hook for Quality Assurance: I've introduced a pre-push Git hook in package.json to automatically run tests (pnpm test) before code is pushed, enhancing code quality and preventing regressions.
Robust JSON Parsing in Embed Provider: I've added a try-catch block around JSON parsing of the request body in packages/embed/src/index.ts to gracefully handle malformed input, falling back to a default empty object and logging errors.
Improved Transcription Result Initialization: I've modified the GenerateTranscriptionResult structure in packages/transcription/src/index.ts to initialize fields like duration, language, segments, and words with appropriate default values (e.g., 0, '', []) instead of undefined as never, improving type safety and initial state.
Refined VAD Buffer Handling: I've adjusted the Voice Activity Detection (VAD) buffer management in packages/utils-vad/src/vad.ts to correctly apply post-speech padding. This involves creating a paddedBuffer after the main speech segment is assembled, ensuring accurate audio processing for the speech-ready event.
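
As an illustration of the JSON parsing highlight above, a guarded parse with an empty-object fallback might look like the sketch below. The function name parseBodyOrDefault and its signature are hypothetical and are not taken from packages/embed/src/index.ts; the extra check that the parsed value is a plain object is also an assumption added here, since JSON.parse can return arrays, primitives, or null.

```typescript
// Hypothetical helper for illustration; not the actual embed provider code.
function parseBodyOrDefault(body: unknown): Record<string, unknown> {
  // Only string bodies can be parsed as JSON text.
  if (typeof body !== 'string')
    return {}

  try {
    const parsed: unknown = JSON.parse(body)
    // Accept plain objects only; arrays, primitives, and null fall back to {}.
    if (parsed !== null && typeof parsed === 'object' && !Array.isArray(parsed))
      return parsed as Record<string, unknown>
    return {}
  }
  catch (error) {
    // Malformed input: log the error and fall back to an empty object.
    console.error('Failed to parse request body as JSON:', error)
    return {}
  }
}

// Usage: a malformed body yields an empty object instead of throwing.
const fallback = parseBodyOrDefault('{not valid json')
console.log(Object.keys(fallback).length) // 0
```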

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request appears to address issues from a previous review. It includes several good bug fixes, particularly for error handling in JSON parsing and providing safer default values. It also corrects a bug in the Voice Activity Detection (VAD) padding logic. However, I have some concerns about the project configuration changes. There's a significant downgrade of many development dependencies which is unusual and potentially risky. I've also noted a minor opportunity for performance improvement in the VAD utility.
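
The "safer default values" mentioned here can be pictured with a short sketch. The interface below is hypothetical: only the field names duration, language, segments, and words appear in the PR summary, while the text field, the element types, and the factory function are assumptions made for illustration rather than the actual GenerateTranscriptionResult definition.

```typescript
// Hypothetical result shape; not the real GenerateTranscriptionResult type.
interface TranscriptionResultSketch {
  duration: number
  language: string
  segments: unknown[]
  text: string // assumed field, not named in the PR summary
  words: unknown[]
}

// Every field gets a concrete default instead of `undefined as never`,
// so callers can read them without extra undefined checks.
function createTranscriptionResult(text: string): TranscriptionResultSketch {
  return {
    duration: 0,
    language: '',
    segments: [],
    text,
    words: [],
  }
}

// Usage: iterating over segments is safe even when none were produced.
const result = createTranscriptionResult('hello')
for (const segment of result.segments)
  console.log(segment)
```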

kwaa · 2025-07-01T07:27:28Z

  "type": "module",
  "version": "0.0.7",
  "private": true,
-  "packageManager": "pnpm@10.12.1",


It is inappropriate to remove the package manager and downgrade most dependencies.

nekomeowww · 2025-07-01T07:33:36Z

+    "@antfu/eslint-config": "^2.24.0",
+    "@antfu/ni": "^0.22.0",


I am curious, why do we have to downgrade those?

nekomeowww · 2025-07-01T07:34:00Z

+    "@types/node": "^20.16.2",
+    "@unocss/eslint-config": "^0.61.8",
+    "@unocss/eslint-plugin": "^0.61.8",


It seems many dependencies were downgraded, is this intentional?

nekomeowww · 2025-07-01T07:35:27Z

All builds failed. Please fix them, and feel free to reopen this PR when finished.

Mirza-Samad-Ahmed-Baig added 5 commits June 29, 2025 12:28

feat: update dependencies and add pre-push hook

2a15a59

fix(embed): handle JSON parsing errors in fetch

484c9e5

fix(transcription): populate GenerateTranscriptionResult

6ee44d5

fix(vad): correct finalBuffer calculation in processSpeechSegment

826a8d7

Fix: Apply Gemini Code Assist review suggestions for VAD buffer, embe…

ec60358

…d error handling, and transcription language.

gemini-code-assist Bot reviewed Jun 29, 2025

Mirza-Samad-Ahmed-Baig changed the title ~~Gemini review issues resolved~~ Fix/vad buffer calculation Jun 29, 2025

gemini-code-assist Bot reviewed Jun 29, 2025

Comment thread package.json

Comment thread packages/utils-vad/src/vad.ts Outdated

Fix: Optimize VAD buffer allocation and padding in processSpeechSegment.

734813e

kwaa reviewed Jul 1, 2025

nekomeowww reviewed Jul 1, 2025

nekomeowww closed this Jul 1, 2025

Mirza-Samad-Ahmed-Baig wants to merge 6 commits into moeru-ai:main from Mirza-Samad-Ahmed-Baig:main

3 participants
