silero-go

Lightweight Go wrapper around the Silero Voice Activity Detector using ONNX Runtime and miniaudio.

This library enables real-time or file-based voice activity detection (VAD) in Go. It wraps the Silero VAD model and adds microphone capture, audio playback, and WAV file handling through native audio bindings.


Features

  • 🧠 ONNX-based Silero VAD inference
  • 🎙️ Real-time microphone speech detection
  • 📁 WAV file-based speech detection
  • 🔊 Microphone and WAV playback support
  • 🔄 Optional WAV conversion (resample, format, channels)

Requirements

  • Go 1.20+
  • ONNX Runtime installed on your system
  • Silero VAD model .onnx file (see below)

Install ONNX Runtime

This library uses yalue/onnxruntime_go.
Please follow its installation instructions to correctly set up ONNX Runtime.


Getting Started

Install

go get github.com/plandem/silero-go

Download the model or use the included one

mkdir -p models
wget -O models/silero_vad.onnx https://github.com/snakers4/silero-vad/files/7603706/silero_vad.onnx

Usage Overview

1. Initialize ONNX Runtime

import "github.com/plandem/silero-go/onnx"

onnx.Init("/path/to/libonnxruntime.so")
defer onnx.Destroy()
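
The path passed to onnx.Init depends on the platform. The sketch below picks a library file name from the operating system. It assumes the file names used by the official ONNX Runtime release archives; sharedLibName and the "/usr/local/lib" directory are hypothetical and not part of this library.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// sharedLibName returns the conventional ONNX Runtime shared library
// file name for the given GOOS value. These names are assumptions based
// on the official release archives; adjust them if your install differs.
func sharedLibName(goos string) string {
	switch goos {
	case "windows":
		return "onnxruntime.dll"
	case "darwin":
		return "libonnxruntime.dylib"
	default:
		return "libonnxruntime.so"
	}
}

func main() {
	// "/usr/local/lib" is a placeholder directory, not a required location.
	libPath := filepath.Join("/usr/local/lib", sharedLibName(runtime.GOOS))
	fmt.Println(libPath)
	// libPath is the value you would pass to onnx.Init.
}
```

If the library lives somewhere else, only the directory argument needs to change.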

2. Create Model and Detector

import (
  "log"
  "time"

  "github.com/plandem/silero-go/vad"
)

// The second return value of each constructor is assumed to be an error.
model, err := vad.NewModel(16000, "models/silero_vad.onnx") // 16 kHz sample rate
if err != nil {
  log.Fatal(err)
}

detector, err := vad.NewDetector(model, vad.Config{
  SpeechThreshold: 0.5,
  MinSilence:      100 * time.Millisecond,
  SpeechPad:       30 * time.Millisecond,
}, func(start, end vad.SampleOffset) {
  // Called on speech start/end
})
if err != nil {
  log.Fatal(err)
}
defer detector.Destroy()
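
The config uses durations while the callback reports sample offsets, so it can help to convert between the two. The sketch below assumes vad.SampleOffset is an integer sample index at the model's sample rate, which this README does not confirm. durationToSamples and samplesToDuration are hypothetical helpers, not part of this library.

```go
package main

import (
	"fmt"
	"time"
)

// durationToSamples converts a duration to a sample count at sampleRate Hz.
func durationToSamples(d time.Duration, sampleRate int) int64 {
	return int64(d) * int64(sampleRate) / int64(time.Second)
}

// samplesToDuration converts a sample offset to elapsed time at sampleRate Hz.
// The intermediate product overflows int64 for offsets beyond roughly 9e9
// samples, which is far longer than typical recordings.
func samplesToDuration(offset int64, sampleRate int) time.Duration {
	return time.Duration(offset) * time.Second / time.Duration(sampleRate)
}

func main() {
	const rate = 16000
	fmt.Println(durationToSamples(100*time.Millisecond, rate)) // 1600
	fmt.Println(durationToSamples(30*time.Millisecond, rate))  // 480
	fmt.Println(samplesToDuration(24000, rate))                // 1.5s
}
```

At 16 kHz, the MinSilence value of 100 ms corresponds to 1600 samples and the SpeechPad value of 30 ms to 480 samples.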

3. Detect from Audio Source

You can run detection from:

  • Microphone input (via miniaudio.Capture)
  • WAV files (any reader implementing io.Reader with raw PCM samples)

detector.ReadFrom(myInputSource)
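
For testing without a microphone or a file, in-memory samples can be exposed as an io.Reader. The sketch below encodes float32 samples as raw little-endian bytes. The byte order the detector expects is an assumption, and pcmReader is a hypothetical helper, not part of this library.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// pcmReader encodes samples as raw little-endian 32-bit float PCM and
// returns an io.Reader over the resulting bytes.
func pcmReader(samples []float32) (io.Reader, error) {
	buf := new(bytes.Buffer)
	if err := binary.Write(buf, binary.LittleEndian, samples); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	r, err := pcmReader([]float32{0, 0.5, -0.5})
	if err != nil {
		panic(err)
	}
	raw, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(raw)) // 12 bytes: 3 samples x 4 bytes each
}
```

A reader built this way could stand in for myInputSource, provided the detector reads samples in this layout.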

Audio Format Requirements

To ensure compatibility with the Silero model:

  • Sample Rate: 16 kHz
  • Channels: Mono (1)
  • Format: 32-bit float PCM

Use a converter if your audio input doesn't match. This library includes minimal conversion utilities built on malgo, the Go bindings for miniaudio.
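
To illustrate what matching the format involves, here is a minimal pure-Go sketch that converts interleaved 16-bit stereo PCM into mono 32-bit float samples. It is not this library's converter: int16ToFloat32 and downmixStereo are hypothetical helpers, and resampling to 16 kHz is not covered.

```go
package main

import "fmt"

// int16ToFloat32 scales signed 16-bit PCM samples to the range [-1, 1).
func int16ToFloat32(in []int16) []float32 {
	out := make([]float32, len(in))
	for i, s := range in {
		out[i] = float32(s) / 32768
	}
	return out
}

// downmixStereo averages interleaved left/right pairs into mono.
// A trailing unpaired sample is dropped.
func downmixStereo(in []float32) []float32 {
	out := make([]float32, len(in)/2)
	for i := range out {
		out[i] = (in[2*i] + in[2*i+1]) / 2
	}
	return out
}

func main() {
	stereo := []int16{16384, -16384, 32767, 32767}
	mono := downmixStereo(int16ToFloat32(stereo))
	fmt.Println(len(mono), mono[0]) // 2 0
}
```

Averaging the two channels is the simplest downmix. Audio recorded at a different sample rate still needs a resampler before it reaches the model.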


License

MIT License © 2025 Andrei Gaivoronskii
