nico-martin/vad-recorder

vad-recorder

vad-recorder is a browser-focused TypeScript library that combines voice activity detection (VAD) with automatic audio segment recording.

Under the hood, it runs Silero VAD through @huggingface/transformers (Transformers.js), using the onnx-community/silero-vad model from Hugging Face.

Install

npm install vad-recorder

Quick start

import { VadRecorder } from "vad-recorder";

const info = await VadRecorder.info();
console.log(info.isCached, info.downloadSize);

const recorder = new VadRecorder({
  threshold: 0.55,
  minSpeechDuration: 250,
  minSilenceDuration: 900,
  prependSilence: 120,
  appendSilence: 300,
});

recorder.onReady(() => console.log("Listening..."));
recorder.onSpeechStart(() => console.log("Speech start"));
recorder.onSpeechEnd(() => console.log("Speech end"));
recorder.onRecord((blob) => console.log("Recorded blob", blob));
recorder.onError((err) => console.error(err));

await recorder.initialize((event) => {
  if (event.status === "downloading") {
    console.log(`Model download: ${Math.round(event.progress * 100)}%`);
  }
});

await recorder.start();

React hook

For React projects, use the hook export:

import { useVadRecorder } from "vad-recorder/react";

function App() {
  const {
    status,
    progress,
    recordings,
    error,
    initialize,
    start,
    stop,
    pause,
    resume,
    clearRecordings,
  } = useVadRecorder({
    threshold: 0.55,
    minSpeechDuration: 250,
    minSilenceDuration: 900,
    prependSilence: 120,
    appendSilence: 300,
  });

  return (
    <div>
      <p>Status: {status}</p>
      <p>Download: {Math.round(progress * 100)}%</p>
      <button onClick={() => void initialize()}>Initialize</button>
      <button onClick={() => void start()}>Start</button>
      <button onClick={pause}>Pause</button>
      <button onClick={resume}>Resume</button>
      <button onClick={stop}>Stop</button>
      <button onClick={clearRecordings}>Clear</button>
      <p>Recordings: {recordings.length}</p>
      {error ? <pre>{error.message}</pre> : null}
    </div>
  );
}

useVadRecorder(options?) returns:

  • status, progress, volumeDb, speechProbability
  • recordings, error, recorder
  • initialize, start, stop, pause, resume, destroy, clearRecordings, info

API

VadRecorder.info(): Promise<{ isCached: boolean; downloadSize: number }>

Returns model cache/download metadata.

  • isCached: whether required model files are cached.
  • downloadSize: sum of all model file sizes (bytes).
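
Because downloadSize is reported in bytes, it is usually worth formatting before showing it to users. The following is a minimal sketch; formatBytes and describeModelInfo are hypothetical helpers, not part of vad-recorder, and only the { isCached, downloadSize } shape comes from the signature above.

```typescript
// Hypothetical helper (not part of vad-recorder): turn a byte count such as
// `downloadSize` into a short label using binary units.
function formatBytes(bytes: number): string {
  if (!Number.isFinite(bytes) || bytes < 0) return "unknown size";
  const units = ["B", "KiB", "MiB", "GiB"];
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit += 1;
  }
  // Plain number for bytes, one decimal for larger units.
  return unit === 0 ? `${value} ${units[unit]}` : `${value.toFixed(1)} ${units[unit]}`;
}

// Hypothetical helper: build a user-facing message from the object shape
// documented for VadRecorder.info().
function describeModelInfo(info: { isCached: boolean; downloadSize: number }): string {
  return info.isCached
    ? "Model is cached; no download needed."
    : `Model download required: ${formatBytes(info.downloadSize)}.`;
}
```

A typical use would be to call describeModelInfo(await VadRecorder.info()) before initialize() and ask for confirmation when a download is required.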

new VadRecorder(options?)

All options are optional:

  • threshold (default 0.5)
    • Speech probability cutoff (0-1).
    • Higher = stricter detection (fewer false positives, can miss quiet speech).
    • Lower = more sensitive (captures quiet speech, can trigger on noise).
  • minSpeechDuration (ms, default 250)
    • Minimum continuous speech before a segment officially starts.
    • Helps filter clicks, breaths, and very short noises.
  • minSilenceDuration (ms, default 1000)
    • Required silence before a segment is considered finished.
    • Increase to avoid splitting natural pauses mid-sentence.
  • prependSilence (ms, default 100)
    • Audio prepended before detected speech to avoid clipping first phonemes.
    • Internally combined with minSpeechDuration in the rolling pre-buffer.
  • appendSilence (ms, default 300)
    • Extra audio kept after speech end is detected.
    • Helps avoid cutting off trailing words/syllables.
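
How threshold, minSpeechDuration, and minSilenceDuration interact can be illustrated with a simplified segmenter. This is a sketch under stated assumptions, not vad-recorder's actual implementation: segmentSpeech, its types, and the frameMs parameter (duration of one VAD frame) are hypothetical, and the prependSilence/appendSilence padding is left out.

```typescript
interface SegmenterOptions {
  threshold: number;          // speech probability cutoff (0-1)
  minSpeechDuration: number;  // ms of continuous speech before a segment starts
  minSilenceDuration: number; // ms of continuous silence before a segment ends
}

interface Segment {
  startMs: number;
  endMs: number;
}

// Illustrative only: turns per-frame speech probabilities into segments.
// `frameMs` is an assumed parameter, not a vad-recorder option.
function segmentSpeech(probabilities: number[], frameMs: number, options: SegmenterOptions): Segment[] {
  const segments: Segment[] = [];
  let active = false;      // true once a segment has officially started
  let startMs = 0;         // start of the current run of speech frames
  let speechMs = 0;        // continuous speech seen before confirmation
  let silenceMs = 0;       // continuous silence since the last speech frame
  let lastSpeechEndMs = 0; // end of the most recent speech frame

  for (let i = 0; i < probabilities.length; i++) {
    const t = i * frameMs;
    const isSpeech = (probabilities[i] ?? 0) >= options.threshold;

    if (!active) {
      if (!isSpeech) {
        speechMs = 0; // run too short: discard it (click, breath, noise)
        continue;
      }
      if (speechMs === 0) startMs = t;
      speechMs += frameMs;
      if (speechMs >= options.minSpeechDuration) {
        active = true;
        silenceMs = 0;
        lastSpeechEndMs = t + frameMs;
      }
      continue;
    }

    if (isSpeech) {
      silenceMs = 0; // a short pause does not end the segment
      lastSpeechEndMs = t + frameMs;
      continue;
    }
    silenceMs += frameMs;
    if (silenceMs >= options.minSilenceDuration) {
      segments.push({ startMs, endMs: lastSpeechEndMs });
      active = false;
      speechMs = 0;
    }
  }

  // Close a segment that is still open when the input ends.
  if (active) segments.push({ startMs, endMs: lastSpeechEndMs });
  return segments;
}
```

In this sketch, raising minSilenceDuration lets a segment survive longer pauses, while raising threshold or minSpeechDuration makes it harder for a segment to start at all.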

Lifecycle

  • initialize(onProgress?): loads the VAD model; safe to call multiple times.
  • start(): requests microphone access and starts frame processing.
  • pause(): pauses VAD processing.
  • resume(): resumes VAD processing.
  • stop(): stops the microphone and processing but keeps the model loaded.
  • destroy(): full cleanup (microphone, model, and listeners).
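
One way to make sure destroy() always runs is to wrap the lifecycle in a small helper. The sketch below is hypothetical: withRecorder is not part of the library, and the RecorderLifecycle interface only assumes the method names listed above (the return types are guesses for illustration).

```typescript
// Structural subset of the lifecycle methods listed above. The return types
// are assumptions made for this sketch.
interface RecorderLifecycle {
  initialize(): Promise<unknown>;
  start(): Promise<unknown>;
  destroy(): unknown;
}

// Hypothetical helper: run `work` while the recorder is active and always
// perform full cleanup afterwards, even if a step or `work` throws.
async function withRecorder<T>(recorder: RecorderLifecycle, work: () => Promise<T>): Promise<T> {
  try {
    await recorder.initialize();
    await recorder.start();
    return await work();
  } finally {
    await recorder.destroy();
  }
}
```

If the model should stay loaded between sessions, calling stop() in place of destroy() would be the natural variation, since stop() is documented as keeping the model in memory.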

Events (single-listener setters)

  • onRecord((blob) => void)
  • onSpeechStart(() => void)
  • onSpeechEnd(() => void)
  • onReady(() => void)
  • onError((error) => void)
  • onVolumeChange((db) => void)
  • onSpeechProbability((p) => void)
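
Since each setter holds a single listener, code that needs several subscribers for the same event has to fan the event out itself. A minimal sketch follows, assuming that calling a setter again replaces the previous listener; fanOut is a hypothetical helper, not part of vad-recorder.

```typescript
type Listener<A extends unknown[]> = (...args: A) => void;

// Hypothetical helper: register one dispatcher through a single-listener
// setter and forward each event to any number of subscribers.
// Returns a `subscribe` function; each subscription returns its own unsubscribe.
function fanOut<A extends unknown[]>(
  setListener: (listener: Listener<A>) => void,
): (subscriber: Listener<A>) => () => void {
  const subscribers = new Set<Listener<A>>();
  setListener((...args: A) => {
    subscribers.forEach((subscriber) => subscriber(...args));
  });
  return (subscriber) => {
    subscribers.add(subscriber);
    return () => {
      subscribers.delete(subscriber);
    };
  };
}

// Possible usage with a recorder instance (not executed here):
//   const onRecord = fanOut<[Blob]>((listener) => recorder.onRecord(listener));
//   const stopSaving = onRecord((blob) => saveSomewhere(blob)); // saveSomewhere is hypothetical
//   onRecord((blob) => console.log(blob.size));
```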

Progress callback

initialize(onProgress) currently emits download progress from progress_total events only.

  • Rounded to 2 decimals (0.00 to 100.00)
  • Emitted only when the rounded value changes
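
The two rules above (round to 2 decimals, emit only on change) can be sketched as a small wrapper around a callback. This is an illustration of the described behavior, not the library's code; createProgressReporter is a hypothetical name.

```typescript
// Hypothetical sketch: wrap a progress callback so it only fires when the
// value, rounded to 2 decimals, differs from the previously emitted one.
function createProgressReporter(onProgress: (percent: number) => void): (rawPercent: number) => void {
  let last: number | undefined;
  return (rawPercent) => {
    const rounded = Math.round(rawPercent * 100) / 100;
    if (rounded === last) return; // unchanged after rounding: skip
    last = rounded;
    onProgress(rounded);
  };
}
```

Deduplicating on the rounded value keeps UI updates bounded even when the underlying loader reports progress very frequently.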

Development

npm install
npm run dev

Build for publish:

npm run build

Type-check:

npm run typecheck

Example apps

A minimal vanilla demo is included at examples/simple.

cd examples/simple
npm install
npm run dev

A React demo is included at examples/react.

cd examples/react
npm install
npm run dev

Notes

  • Designed for browser environments.
  • Sample rate is fixed at 16000 Hz (Silero VAD requirement).
  • Channel count is fixed at mono (1).
  • Recordings are currently emitted as WAV blobs (audio/wav) for deterministic PCM assembly.
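
To show what a 16000 Hz mono WAV payload looks like at the byte level, here is a minimal encoder sketch. It is not taken from vad-recorder: encodeWav is a hypothetical function, and the 16-bit PCM sample format is an assumption, since the notes above only state the sample rate, channel count, and container.

```typescript
const SAMPLE_RATE = 16000;  // fixed rate stated in the notes
const CHANNELS = 1;         // mono
const BYTES_PER_SAMPLE = 2; // assumption: 16-bit PCM

// Hypothetical sketch: encode float samples in [-1, 1] as a PCM WAV file.
function encodeWav(samples: Float32Array): ArrayBuffer {
  const dataSize = samples.length * BYTES_PER_SAMPLE;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // remaining file size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);           // fmt chunk size
  view.setUint16(20, 1, true);            // audio format 1 = PCM
  view.setUint16(22, CHANNELS, true);
  view.setUint32(24, SAMPLE_RATE, true);
  view.setUint32(28, SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE, true); // byte rate
  view.setUint16(32, CHANNELS * BYTES_PER_SAMPLE, true);               // block align
  view.setUint16(34, 8 * BYTES_PER_SAMPLE, true);                      // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    view.setInt16(44 + i * BYTES_PER_SAMPLE, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return buffer;
}

// Duration of a mono buffer at the fixed sample rate, in milliseconds.
function durationMs(sampleCount: number): number {
  return (sampleCount / SAMPLE_RATE) * 1000;
}
```

In a browser, the result could be wrapped as new Blob([encodeWav(samples)], { type: "audio/wav" }). With these assumptions, one second of audio is 16000 samples, or 32000 bytes of PCM data plus the 44-byte header.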
