Nice! This is awesome. I tried it and it works in the test site. At the same time, I think we should think carefully before we commit to this approach. Here are a couple of my concerns:
Instead of adding a maxSpeechMs parameter, I'm tempted to add a method that lets the user "flush" audio so that they can do something with it. The user could then set an interval timer to call it as often as they want. One benefit is that they could also get fancy with it, for example calling it when speech probability is relatively low instead of only at regular intervals. I'm not committed to either approach, but I want to talk through the pros and cons of each before we commit. Feel free to tag additional people who you think may have an opinion. |
|
@pepe95270 feel free also to share your thoughts as a user |
|
Thank you both, this is a great addition to the project!
|
|
Thank you for your comments and reviews! Yes, I also think adding Currently, we But yes I think being able to get the current Also, what would it look like for non-real-time VAD? Tagging @AmgadHasan if they have any input on this. #79 |
I think we can ignore this feature for non-real-time vad. It can be a real time-only feature. I think that yeah, we can try going with adding a flushAudio method like you mention. It should have the property that if you call flushAudio, the audio segment that you get does not appear in the audio segment in onSpeechEnd, i.e. it truly flushes the audio instead of just giving you access to it. Does that sound good to everyone? |
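To make the proposed semantics concrete, here is a minimal sketch of a buffer whose flushAudio removes what it returns, so flushed audio never shows up again in the segment delivered at speech end. Only the flushAudio name comes from this thread; the class FlushableSpeechBuffer, the push and endSpeech methods, and the concatFrames helper are hypothetical illustrations, not the library's API.

```typescript
// Hypothetical sketch: none of these names except flushAudio come from the PR.

// Join a list of frames into one contiguous Float32Array.
function concatFrames(frames: Float32Array[]): Float32Array {
  const total = frames.reduce((sum, f) => sum + f.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const f of frames) {
    out.set(f, offset);
    offset += f.length;
  }
  return out;
}

class FlushableSpeechBuffer {
  private frames: Float32Array[] = [];

  // Called for every frame that belongs to the current speech segment.
  push(frame: Float32Array): void {
    this.frames.push(frame);
  }

  // Returns the audio buffered so far AND clears it, so the same
  // samples are never delivered twice.
  flushAudio(): Float32Array {
    const audio = concatFrames(this.frames);
    this.frames = [];
    return audio;
  }

  // Stand-in for what an onSpeechEnd callback would receive:
  // only the audio that has not been flushed yet.
  endSpeech(): Float32Array {
    return this.flushAudio();
  }
}
```

Under these assumptions, a caller could run something like `setInterval(() => handle(buffer.flushAudio()), 5000)` (with a hypothetical `handle` function), or call flushAudio whenever the speech probability dips, as suggested earlier in the thread.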
Sounds good |
Description of changes
Adds a new maxSpeechMs parameter to control the maximum duration of speech segments. When a speech segment exceeds this duration, it is automatically force-cut and emitted, and a new segment starts if speech is still detected.
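As a rough illustration of the force-cut behavior described here, the sketch below models it in isolation. It is a simplified stand-in, not the actual FrameProcessor: the class ForceCutBuffer, the msPerFrame argument, and the boolean isSpeech flag are assumptions made for the example, while maxSpeechMs, maxSpeechFrames, audioBuffer, and process mirror names used in this PR.

```typescript
// Simplified, hypothetical model of the force-cut logic; the real
// FrameProcessor works from speech probabilities and has more state.
type SpeechEndHandler = (segment: Float32Array[]) => void;

class ForceCutBuffer {
  private audioBuffer: Float32Array[] = [];
  private readonly maxSpeechFrames: number;
  private readonly onSpeechEnd: SpeechEndHandler;

  constructor(maxSpeechMs: number, msPerFrame: number, onSpeechEnd: SpeechEndHandler) {
    // With the default of Infinity this stays Infinity, so no cut ever fires.
    this.maxSpeechFrames = Math.max(1, Math.floor(maxSpeechMs / msPerFrame));
    this.onSpeechEnd = onSpeechEnd;
  }

  // Feed one frame plus an (assumed) speech / non-speech decision.
  process(frame: Float32Array, isSpeech: boolean): void {
    if (isSpeech) {
      this.audioBuffer.push(frame);
      // Force cut: emit the segment once it reaches the maximum length.
      // The next speech frame then starts a new segment.
      if (this.audioBuffer.length >= this.maxSpeechFrames) {
        this.emit();
      }
    } else if (this.audioBuffer.length > 0) {
      // Ordinary end of speech.
      this.emit();
    }
  }

  private emit(): void {
    const segment = this.audioBuffer;
    this.audioBuffer = [];
    this.onSpeechEnd(segment);
  }
}
```

With an assumed 32 ms frame and maxSpeechMs set to 96, seven consecutive speech frames followed by one silent frame would produce three segments of 3, 3, and 1 frames.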
Added
maxSpeechMs option (default: Infinity) to FrameProcessorOptions. Implemented force-cut logic in the process method that emits SpeechEnd when audioBuffer.length >= maxSpeechFrames and starts a new segment if speech continues.
packages/web/src/frame-processor.ts: Adds maxSpeechMs and force-cut logic.
packages/web/src/real-time-vad.ts: Passes maxSpeechMs to FrameProcessor.
packages/web/src/non-real-time-vad.ts: Passes maxSpeechMs to FrameProcessor.
packages/react/src/index.ts: Passes maxSpeechMs through to the underlying VAD.
test-site/src/index.tsx: Adds maxSpeechMs to the configurable parameters UI and refactors shared VAD parameters.
test-site/src/non-real-time-test.tsx: Updated to accept all VAD parameters as props from the parent component.
docs/user-guide/api.md: Documents maxSpeechMs for MicVAD, NonRealTimeVAD, and useMicVAD.
Related discussion: ricky0123/vad#79
Reference implementation: Silero VAD force-cut logic
Checklist