Skip to content

Fix/vad buffer calculation#9

Closed
Mirza-Samad-Ahmed-Baig wants to merge 6 commits into
moeru-ai:mainfrom
Mirza-Samad-Ahmed-Baig:main
Closed

Fix/vad buffer calculation#9
Mirza-Samad-Ahmed-Baig wants to merge 6 commits into
moeru-ai:mainfrom
Mirza-Samad-Ahmed-Baig:main

Conversation

@Mirza-Samad-Ahmed-Baig
Copy link
Copy Markdown

@Mirza-Samad-Ahmed-Baig Mirza-Samad-Ahmed-Baig commented Jun 29, 2025

What it solves:
This pull request resolves an issue in the VAD (Voice Activity Detection) class within packages/utils-vad/src/vad.ts. In the processSpeechSegment
method, the speechPadSamples value was incorrectly added twice when calculating the size of the finalBuffer and when slicing the main speech segment.
This led to the finalBuffer being larger than necessary and potentially containing redundant or incorrect data, especially in the post-speech padding
area.

How it solves:
I refactored the processSpeechSegment method to correctly calculate the finalBuffer size and apply padding.

The finalBuffer is now initialized with a size that only accounts for prevBuffers and the bufferPointer (the actual speech segment).
The main speech segment is sliced using only this.bufferPointer, ensuring no extra padding is included at this stage.
A new paddedBuffer is created after the finalBuffer is populated, and speechPadSamples is added to its length. The finalBuffer content is then copied
into this paddedBuffer, effectively applying the post-speech padding correctly.
What effect it will have:

Correct Buffer Sizing: The finalBuffer will now be precisely sized, preventing unnecessary memory allocation and potential data corruption.
Accurate Speech Segment Processing: The VAD output will contain the correct audio data, with appropriate pre- and post-speech padding, ensuring the
integrity of the detected speech segments.
Improved Performance: By avoiding oversized buffers, this change can lead to minor performance improvements, especially in scenarios with continuous
audio processing.

@netlify
Copy link
Copy Markdown

netlify Bot commented Jun 29, 2025

Deploy Preview for xsai-transformers failed. Why did it fail? →

Name Link
🔨 Latest commit 734813e
🔍 Latest deploy log https://app.netlify.com/projects/xsai-transformers/deploys/686134da9d9bbb0008f88ce9

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @Mirza-Samad-Ahmed-Baig, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses several issues identified during a 'Gemini review,' focusing on enhancing project stability, improving code robustness, and refining core functionalities. Key changes include comprehensive dependency version alignment, introducing a pre-push hook for automated testing, and making specific improvements to JSON parsing, transcription result handling, and Voice Activity Detection (VAD) buffer management.

Highlights

  • Dependency Management & Tooling Updates: I've updated numerous development dependencies in package.json to align with specific versions, including @antfu/eslint-config, typescript, vite, and vitest. This also involved removing the enforced packageManager version and updating nolyfill package overrides.
  • Pre-Push Hook for Quality Assurance: I've introduced a pre-push Git hook in package.json to automatically run tests (pnpm test) before code is pushed, enhancing code quality and preventing regressions.
  • Robust JSON Parsing in Embed Provider: I've added a try-catch block around JSON parsing of the request body in packages/embed/src/index.ts to gracefully handle malformed input, falling back to a default empty object and logging errors.
  • Improved Transcription Result Initialization: I've modified the GenerateTranscriptionResult structure in packages/transcription/src/index.ts to initialize fields like duration, language, segments, and words with appropriate default values (e.g., 0, '', []) instead of undefined as never, improving type safety and initial state.
  • Refined VAD Buffer Handling: I've adjusted the Voice Activity Detection (VAD) buffer management in packages/utils-vad/src/vad.ts to correctly apply post-speech padding. This involves creating a paddedBuffer after the main speech segment is assembled, ensuring accurate audio processing for the speech-ready event.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@Mirza-Samad-Ahmed-Baig Mirza-Samad-Ahmed-Baig changed the title Gemini review issues resolved Fix/vad buffer calculation Jun 29, 2025
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request appears to address issues from a previous review. It includes several good bug fixes, particularly for error handling in JSON parsing and providing safer default values. It also corrects a bug in the Voice Activity Detection (VAD) padding logic. However, I have some concerns about the project configuration changes. There's a significant downgrade of many development dependencies which is unusual and potentially risky. I've also noted a minor opportunity for performance improvement in the VAD utility.

Comment thread package.json
Comment thread packages/utils-vad/src/vad.ts Outdated
Comment thread package.json
"type": "module",
"version": "0.0.7",
"private": true,
"packageManager": "pnpm@10.12.1",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is inappropriate to remove the package manager and downgrade most dependencies.

Comment thread package.json
Comment on lines +22 to +23
"@antfu/eslint-config": "^2.24.0",
"@antfu/ni": "^0.22.0",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious, why do we have to downgrade those?

Comment thread package.json
Comment on lines +29 to +31
"@types/node": "^20.16.2",
"@unocss/eslint-config": "^0.61.8",
"@unocss/eslint-plugin": "^0.61.8",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems many dependencies were downgraded, is this intentional?

@nekomeowww
Copy link
Copy Markdown
Member

Build all failed, please fix, feel free to re-open this PR when finished.

@nekomeowww nekomeowww closed this Jul 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants