
Add maxSpeechMs option#256

Open
alielbekov wants to merge 3 commits into master from max-segment-duration

Conversation

@alielbekov
Collaborator

Description of changes

Adds a new maxSpeechMs parameter to control the maximum duration of speech segments. When a speech segment exceeds this duration, it is automatically force-cut and emitted, and a new segment starts if speech is still detected.
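The force-cut behavior described above can be sketched as a simple simulation of the segmentation logic. This is illustrative only, not the actual implementation; `segmentFrames`, `frameMs`, and `SegmenterOptions` are hypothetical names:

```typescript
// Illustrative sketch of maxSpeechMs: frames accumulate until the buffered
// speech duration reaches the cap, at which point the segment is emitted and
// a new one begins. All names here are hypothetical.

interface SegmenterOptions {
  frameMs: number;     // duration each audio frame represents
  maxSpeechMs: number; // force-cut threshold proposed in this PR
}

function segmentFrames(
  frames: Float32Array[],
  { frameMs, maxSpeechMs }: SegmenterOptions,
): Float32Array[][] {
  const segments: Float32Array[][] = [];
  let current: Float32Array[] = [];
  let bufferedMs = 0;

  for (const frame of frames) {
    current.push(frame);
    bufferedMs += frameMs;
    // Force-cut: emit the segment once it reaches maxSpeechMs.
    if (bufferedMs >= maxSpeechMs) {
      segments.push(current);
      current = [];
      bufferedMs = 0;
    }
  }
  // If speech is still ongoing, the remainder becomes the start of a new segment.
  if (current.length > 0) segments.push(current);
  return segments;
}
```

For example, ten 100 ms frames with maxSpeechMs of 300 would be cut into three full segments plus one trailing partial segment.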

Checklist

  • Verified that the typechecking & formatting GitHub actions pass successfully
  • Verified that changes work on the test site, adding changes to the test site if necessary to try out your changes

@vercel

vercel Bot commented Jan 3, 2026

The latest updates on your projects.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| vad_test_site | Ready | Ready (Preview, Comment) | Jan 3, 2026 2:38pm |

@ricky0123
Owner

ricky0123 commented Jan 11, 2026

Nice! This is awesome. I tried it and it works in the test site. At the same time, I think we should think carefully before we commit to this approach. Here are a couple of my concerns

  • When maxSpeechMs is reached and speech end/speech start is called, it's not transparent in the callbacks that this is all part of the same utterance. People may want to combine these audio segments in their backend, and now they have to find some way to figure out whether two invocations of onSpeechEnd were part of the same larger utterance.
  • I'm not sure about resetting redemptionCounter when maxSpeechMs is reached. I think that reaching maxSpeechMs is more about "flushing" the buildup of audio rather than updating the state of the VAD algorithm.

I'm tempted to go more the route of, instead of adding a maxSpeechMs parameter, adding a method that allows the user to "flush" audio so that they can do something with it. The user could then manually set an interval timer to call it as often as they want. One of the benefits is that they could also get fancy with it, like calling it when speech probability is relatively low instead of just at regular intervals.

I'm not committed to any one approach but I want to talk about the pros and cons of each before we commit.

Feel free to tag additional people who you think may have an opinion.

@ricky0123
Owner

@pepe95270 feel free also to share your thoughts as a user

@pepe95270

Thank you both, this is a great addition to the project!
As pointed out by ricky, I agree that a method allowing the user to "flush" audio adds even more value: it gives the developer far more control and enables them to:

  • create a frontend button for flushing
  • do fancy stuff to decide when to flush
  • simulate the same behavior as maxSpeechMs, e.g. setInterval(MicVAD.newFlushMethod, 60000);
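The setInterval idea above could be wrapped in a small helper. Everything here is hypothetical (`Flushable`, `flush`, and `makePeriodicFlusher` are not real APIs of this library); exposing the tick separately would also cover the frontend-button use case:

```typescript
// Sketch of the periodic-flush idea: wire a hypothetical flush method to a
// timer. All names here are illustrative, not part of the library.

type Flushable = { flush(): Float32Array };

function makePeriodicFlusher(
  vad: Flushable,
  onAudio: (audio: Float32Array) => void,
) {
  // One flush step, exposed separately so it can also be triggered manually,
  // e.g. from a frontend button.
  const tick = () => onAudio(vad.flush());
  return {
    tick,
    // Start flushing every intervalMs; returns a function that stops it.
    start(intervalMs: number): () => void {
      const id = setInterval(tick, intervalMs);
      return () => clearInterval(id);
    },
  };
}
```

With something like this, the one-liner above becomes `makePeriodicFlusher(vad, handleAudio).start(60000)`, and the same tick can back a UI button or a probability-based trigger.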

@alielbekov
Collaborator Author

alielbekov commented Jan 12, 2026

Thank you for your comments and reviews! Yes, I also think adding flushAudio: (audio: Float32Array) => {} could be a lot more useful.

Currently, we emit audio frame by frame via onFrameProcessed: ({...}, frame: Float32Array). We could potentially work around this by collecting the processed frames ourselves instead of calling "flushAudio".
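That frame-collecting workaround might look roughly like this; the collector and its method names are illustrative, and the callback shape is only assumed from the comment above, not taken from the library:

```typescript
// Illustrative workaround: collect the frames handed to onFrameProcessed and
// concatenate them on demand, instead of a dedicated flush method.

function makeFrameCollector() {
  let frames: Float32Array[] = [];
  return {
    // Wire as: onFrameProcessed: (probs, frame) => collector.onFrame(frame)
    onFrame(frame: Float32Array): void {
      frames.push(frame);
    },
    // Concatenate everything collected so far and reset the buffer.
    drain(): Float32Array {
      const total = frames.reduce((n, f) => n + f.length, 0);
      const out = new Float32Array(total);
      let offset = 0;
      for (const f of frames) {
        out.set(f, offset);
        offset += f.length;
      }
      frames = [];
      return out;
    },
  };
}
```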

But yes, I think being able to get the current audio: Float32Array at any time is a lot nicer.

Also, how would this look for non-real-time VAD?

Tagging @AmgadHasan if they have any input on this. #79

@ricky0123
Owner

> Thank you for your comments and reviews! Yes, I also think adding flushAudio: (audio: Float32Array) => {} could be a lot more useful.
>
> Currently, we emit audio frame by frame via onFrameProcessed: ({...}, frame: Float32Array). We could potentially work around this by collecting the processed frames ourselves instead of calling "flushAudio".
>
> But yes, I think being able to get the current audio: Float32Array at any time is a lot nicer.
>
> Also, how would this look for non-real-time VAD?
>
> Tagging @AmgadHasan if they have any input on this. #79

I think we can ignore this feature for non-real-time VAD. It can be a real-time-only feature. And yeah, I think we can try going with adding a flushAudio method as you mention. It should have the property that if you call flushAudio, the audio segment that you get does not appear in the audio segment in onSpeechEnd, i.e. it truly flushes the audio instead of just giving you access to it. Does that sound good to everyone?
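The proposed contract ("flushed audio does not reappear in onSpeechEnd") can be sketched with a toy buffer. `SpeechBuffer` and its methods are hypothetical, not the library's implementation:

```typescript
// Sketch of the proposed flush semantics: flush() hands back the buffered
// audio AND removes it, so the segment later delivered to onSpeechEnd
// contains only audio recorded after the flush.

class SpeechBuffer {
  private samples: number[] = [];

  addFrame(frame: Float32Array): void {
    for (let i = 0; i < frame.length; i++) this.samples.push(frame[i]);
  }

  // "Truly flushes": the returned audio will not reappear in endSpeech().
  flush(): Float32Array {
    const out = Float32Array.from(this.samples);
    this.samples = [];
    return out;
  }

  // Simulates what onSpeechEnd would receive after a flush.
  endSpeech(): Float32Array {
    return this.flush();
  }
}
```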

@alielbekov
Collaborator Author

> it truly flushes the audio instead of just giving you access to it.

Sounds good
