Streaming Voice Activity Detection that wraps the FireRedVAD ONNX model. It maintains bit-for-bit parity with the upstream Python `FireRedStreamVad` and exposes a Sans-I/O Rust API designed for piping continuous human-speech windows into Whisper or any other downstream consumer.
A sibling crate to `silero` for callers who want a true streaming VAD: 10 ms frame granularity, no externally managed RNN state, and a built-in postprocessor with smoothing and a 4-state machine.
```toml
[dependencies]
firered-vad = "0.1"
```

The default `bundled` feature embeds the ONNX model (~2.3 MB) and CMVN stats. Disable it to ship your own:

```toml
[dependencies]
firered-vad = { version = "0.1", default-features = false }
```

See the crate examples for the details.
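For the non-bundled path, a sketch along these lines should apply; the file paths and the exact argument types of `from_memory_with_cmvn` are assumptions here, so consult the crate examples for the real signatures:

```rust
// With default-features = false the bundled constants are gone, so supply
// both the model and the CMVN stats yourself. Paths and argument types are
// assumptions for illustration; see the crate examples for the real API.
use firered_vad::Vad;

fn build_vad() -> Result<Vad, Box<dyn std::error::Error>> {
    let model = std::fs::read("models/firered_vad.onnx")?; // hypothetical path
    let cmvn = std::fs::read("models/cmvn.txt")?;          // hypothetical path
    Ok(Vad::from_memory_with_cmvn(&model, &cmvn)?)
}
```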
`Vad` is a single Sans-I/O state machine:

| Method | Purpose |
|---|---|
| `Vad::bundled()` | Construct from the bundled ONNX + CMVN with default options |
| `Vad::bundled_with(opts)` | Same, with custom `VadOptions` |
| `Vad::from_memory(model)` / `from_file(path)` | Custom model bytes/path with the bundled CMVN |
| `Vad::from_memory_with_cmvn` / `Vad::from_file_with_cmvn` | Fully custom model + CMVN |
| `Vad::from_ort_session(session, cmvn, opts)` | Wrap an externally built `ort::Session` |
| `push_samples(&[f32])` | Feed PCM; returns the next available closed segment (or `None`) |
| `finish()` | Mark end-of-stream; returns the trailing segment if one was open |
| `reset()` | Wipe all per-stream state |
| `pending_segments()` | Number of buffered segments awaiting drain via `push_samples(&[])` |
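A minimal end-to-end sketch of that loop. The method names come from the table above; `Vad::bundled()` returning a `Result`, the `Debug` impl on segments, and the audio source are assumptions for illustration:

```rust
// Minimal streaming loop sketch: feed mono f32 PCM, drain closed segments,
// then flush the trailing segment at end-of-stream.
use firered_vad::Vad;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut vad = Vad::bundled()?;

    // `next_pcm_chunk` is a hypothetical stand-in for your capture pipeline.
    while let Some(chunk) = next_pcm_chunk() {
        if let Some(segment) = vad.push_samples(&chunk) {
            println!("closed segment: {segment:?}"); // hand off to Whisper here
        }
        // Several segments can close inside one push; drain the backlog
        // with empty pushes.
        while vad.pending_segments() > 0 {
            if let Some(segment) = vad.push_samples(&[]) {
                println!("closed segment: {segment:?}");
            }
        }
    }

    // End of stream: flush the segment that was still open, if any.
    if let Some(segment) = vad.finish() {
        println!("trailing segment: {segment:?}");
    }
    Ok(())
}

fn next_pcm_chunk() -> Option<Vec<f32>> {
    None // placeholder: wire up a mic, file reader, or socket here
}
```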
The bundled FireRedVAD streaming model is trained for voice activity as a binary classifier: vocal sources score high regardless of whether they're speech or singing, while pure instrumental music scores low. In practice this means singing is treated as a positive segment (emitted), pure music is rejected (no segment), and speech behaves as expected. The dedicated 3-class AED model (which separates speech / singing / music explicitly) is non-streaming upstream and is not part of this crate; it would be a separate concern.
Options reproduce the upstream `FireRedStreamVadConfig` defaults exactly. To match upstream's four "mode" presets, configure directly:
```rust
use core::time::Duration;
use firered_vad::VadOptions;

// "Permissive" preset (upstream mode 1):
let opts = VadOptions::new()
    .with_speech_threshold(0.5)
    .with_min_speech_duration(Duration::from_millis(100))
    .with_min_silence_duration(Duration::from_millis(150));

// "Aggressive": threshold 0.7, min_speech 150 ms, min_silence 100 ms
// "Very aggressive": threshold 0.9, min_speech 200 ms, min_silence 50 ms
// "Very permissive": threshold 0.3, min_speech 80 ms, min_silence 200 ms
```

| Feature | Default | What it does |
|---|---|---|
| `bundled` | yes | Embed the ONNX model + CMVN as the `BUNDLED_MODEL` / `BUNDLED_CMVN` constants |
| `serde` | no | `Serialize` / `Deserialize` for `VadOptions` and `SessionOptions`; `Duration` fields use `humantime-serde` |
| `coreml`, `directml`, `cuda`, `rocm`, `tensorrt`, `openvino` | no | Pass-through to `ort` for the matching execution provider |
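Execution providers are plain Cargo features, so enabling one is a manifest change; for example (choosing `cuda` arbitrarily from the list above):

```toml
[dependencies]
firered-vad = { version = "0.1", features = ["cuda"] }
```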
Bit-for-bit parity with upstream Python's `StreamVadPostprocessor` is the design contract. The v1 verification rests on:

- The integration test (`tests/integration_test.rs::pushing_samples_in_arbitrary_chunks_yields_identical_event_stream`) proves the streaming pipeline is deterministic across chunk sizes; a condensed sketch of the property follows this list.
- Hand-derived state-machine unit tests in `src/detector.rs::tests`.
- Empirical model contract verification at construction time (ONNX I/O shapes).
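Here is that sketch, assuming `push_samples` returns an `Option` directly and segments implement `Debug`; the `collect` helper and the fixture are illustrative, not the test's actual code:

```rust
// Condensed, illustrative version of the chunking-determinism property; the
// real test lives in tests/integration_test.rs and its code may differ.
use firered_vad::Vad;

fn collect(samples: &[f32], chunk_len: usize) -> Vec<String> {
    let mut vad = Vad::bundled().expect("bundled model loads");
    let mut out = Vec::new();
    for chunk in samples.chunks(chunk_len) {
        if let Some(seg) = vad.push_samples(chunk) {
            out.push(format!("{seg:?}"));
        }
        // Drain any further segments buffered behind the first.
        while vad.pending_segments() > 0 {
            if let Some(seg) = vad.push_samples(&[]) {
                out.push(format!("{seg:?}"));
            }
        }
    }
    if let Some(seg) = vad.finish() {
        out.push(format!("{seg:?}"));
    }
    out
}

#[test]
fn chunking_does_not_change_the_event_stream() {
    let audio = vec![0.0f32; 16_000]; // placeholder fixture; use real audio
    assert_eq!(collect(&audio, 160), collect(&audio, 4096));
}
```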
A per-frame numerical parity harness against the upstream Python reference (planned for `tests/parity/`) is deferred post-v1.
Dual-licensed under MIT or Apache-2.0, at your option. The bundled FireRedVAD model and CMVN stats are Apache-2.0; see THIRD_PARTY_NOTICES.md.