SDK Reference documentation | Package (PyPI)
This folder contains Python samples demonstrating how to build real-time voice assistants using Azure AI Speech VoiceLive service. Each sample is self-contained for easy understanding and deployment.
Demonstrates the new Voice Live + Foundry Agent v2 workflow, including creating a Voice Live-configured agent and running an agent-connected voice assistant.
Key Features:
- Agent creation utility with Voice Live metadata chunking
- New SDK-based agent session configuration (`AgentSessionConfig`)
- Proactive greeting and barge-in handling
- Conversation logging
Demonstrates MCP (Model Context Protocol) server integration with Voice Live, enabling the assistant to use remote tools (DeepWiki, Azure Docs) during voice conversations.
Key Features:
- MCP server definitions using `MCPServer` model objects
- MCP tool discovery, execution, and failure event handling
- Interactive console-based approval flow for sensitive tools
- MCP call result processing with automatic response creation
- Conversation logging to timestamped log files
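The MCP server definitions above amount to configuration data the session sends along with the conversation. A minimal sketch of that shape as plain dicts; the field names (`server_label`, `server_url`, `require_approval`) follow the common MCP tool-definition convention and are assumptions here, not the exact `MCPServer` schema, and the URLs should be verified before use:

```python
# Illustrative MCP server definitions. Field names and URLs are assumptions,
# not the exact azure-ai-voicelive MCPServer model schema.
def make_mcp_server(label, url, require_approval="always"):
    """Build a dict describing a remote MCP tool server."""
    if require_approval not in ("always", "never"):
        raise ValueError("require_approval must be 'always' or 'never'")
    return {
        "type": "mcp",
        "server_label": label,
        "server_url": url,
        "require_approval": require_approval,  # "always" triggers the console approval flow
    }

SERVERS = [
    make_mcp_server("deepwiki", "https://mcp.deepwiki.com/mcp", "never"),
    make_mcp_server("azure-docs", "https://learn.microsoft.com/api/mcp"),
]
```

Marking a server `"always"` is what drives the interactive approval prompt for sensitive tools; `"never"` lets the assistant call its tools without pausing the conversation.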
Demonstrates connecting to an Azure AI Foundry agent for voice conversations. The agent handles model selection, instructions, and tools, with support for proactive greetings.
Key Features:
- Azure AI Foundry agent integration
- Proactive greeting support
- Azure authentication (required)
- Agent-managed tools and instructions
Demonstrates direct integration with VoiceLive models for voice conversations without agent overhead.
Key Features:
- Direct model access
- Flexible authentication (API key or Azure credentials)
- Custom instructions support
- Model selection options
Demonstrates direct integration with VoiceLive using bring-your-own-models from Foundry.
Key Features:
- Bring-Your-Own-Model Integration: Connects directly to a self-hosted model
- Proactive Greeting: Agent initiates the conversation with a welcome message
- Custom Instructions: Define your own system instructions for the AI
- Flexible Authentication: Supports both API key and Azure credential authentication
Demonstrates how to implement function calling with VoiceLive models, enabling the AI to execute custom functions during conversations.
Key Features:
- Custom function definitions
- Real-time function execution
- Function result handling
- Advanced tool integration
- Proactive greeting support
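The function-calling loop described above needs two pieces: a tool schema the model can see, and a dispatcher that runs the matching Python function when the model asks for it. A minimal sketch, assuming the JSON-schema style tool definition most realtime voice APIs use (the exact VoiceLive field names may differ, and `get_weather` is a hypothetical example function):

```python
import json

# Hypothetical tool schema; "get_weather" is an illustrative example, and the
# exact field layout VoiceLive expects may differ from this common convention.
GET_WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city):
    # Stub implementation; a real sample would call a weather API here.
    return {"city": city, "forecast": "sunny", "temp_c": 22}

FUNCTIONS = {"get_weather": get_weather}

def handle_function_call(name, arguments_json):
    """Dispatch a model-issued function call and serialize the result.

    The model sends arguments as a JSON string; the result is sent back
    as JSON so the model can ground its spoken response in it.
    """
    args = json.loads(arguments_json)
    result = FUNCTIONS[name](**args)
    return json.dumps(result)
```

During a conversation, the event handler would call `handle_function_call` when a function-call event arrives and feed the returned JSON back into the session as the function result.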
Demonstrates how to build a real-time voice assistant with Retrieval-Augmented Generation (RAG) capabilities using Azure AI Voice Live API and Azure AI Search.
Key Features:
- Real-time speech-to-speech interaction powered by Voice Live
- RAG integration with Azure AI Search for document retrieval
- Full-stack architecture (React/TypeScript frontend + FastAPI backend)
- Azure AI Foundry Agent Service integration
- Production-ready `azd` deployment to Azure Container Apps
A Dockerized sample demonstrating Azure Voice Live API with avatar integration, with the Voice Live SDK running entirely on the server side (Python/FastAPI) while the browser handles UI, audio capture/playback, and avatar video rendering.
Key Features:
- Avatar-enabled voice conversations with server-side SDK
- Prebuilt, custom, and photo avatar character support
- WebRTC and WebSocket avatar output modes
- Live scene settings adjustment for photo avatars
- Proactive greeting support
- Barge-in support for natural conversation interruption
- Docker-based deployment
- Azure Container Apps deployment guide
- Developer mode for debugging
All samples require:
- Python 3.8+
- Audio input/output devices (microphone and speakers)
- Azure subscription - Create one for free
Depending on which sample you want to run:
For Agent Quickstart and Agents New Quickstart:
- Azure AI Foundry project with a deployed agent
- Azure CLI for authentication
For Model Quickstart, BYOM Quickstart, and Function Calling:
- AI Foundry resource
- API key or Azure CLI for authentication
1. Navigate to the quickstarts folder:

   ```bash
   cd python/voice-live-quickstarts
   ```

2. Create a virtual environment (recommended):

   ```bash
   python -m venv .venv

   # On Windows
   .venv\Scripts\activate

   # On Linux/macOS
   source .venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Configure environment variables:

   - Copy `.env_sample` to `.env`
   - Update `.env` with your Azure credentials

5. Run a sample:

   ```bash
   # New v2 agent quickstart
   python AgentsNewQuickstart/voice-live-with-agent-v2.py

   # or create an agent configured for Voice Live
   python AgentsNewQuickstart/create_agent_v2_with_voicelive.py

   # or classic agent quickstart
   python agents-quickstart.py

   # or
   python model-quickstart.py

   # or
   python bring-your-own-model-quickstart.py

   # or
   python function-calling-quickstart.py
   ```
Agent Quickstart and Agents New Quickstart require Azure authentication:

```bash
az login
python agents-quickstart.py
# or
python AgentsNewQuickstart/voice-live-with-agent-v2.py
```

Model Quickstart, BYOM Quickstart, and Function Calling support both methods:

```bash
# With API key (from .env file)
python model-quickstart.py
# or
python bring-your-own-model-quickstart.py

# With Azure credentials
az login
python model-quickstart.py --use-token-credential
# or
python bring-your-own-model-quickstart.py --use-token-credential
```

All samples use a `.env` file for configuration. Copy `.env_sample` to `.env` and update with your values:
For Agent Quickstart and Agents New Quickstart:

```env
AZURE_VOICELIVE_ENDPOINT=https://your-endpoint.services.ai.azure.com/
AZURE_VOICELIVE_PROJECT_NAME=your-project-name
AZURE_VOICELIVE_AGENT_ID=asst_your-agent-id
AZURE_VOICELIVE_API_VERSION=2025-10-01
# AZURE_VOICELIVE_API_KEY not needed for agents (Azure auth only)
```

For Model Quickstart, BYOM Quickstart, and Function Calling:

```env
AZURE_VOICELIVE_ENDPOINT=https://your-endpoint.services.ai.azure.com/
AZURE_VOICELIVE_API_KEY=your-api-key
AZURE_VOICELIVE_API_VERSION=2025-10-01
```
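Once the `.env` values are loaded, a sample has to decide between the two authentication paths shown earlier. A hypothetical helper sketching that decision (the function name and return convention are illustrative, not part of the SDK):

```python
import os

def pick_auth(use_token_credential):
    """Decide how a sample authenticates.

    Returns ("token", None) when Azure credentials (az login) should be used,
    or ("api_key", key) when an API key is available in the environment.
    Mirrors the --use-token-credential flag; this helper is illustrative,
    not part of azure-ai-voicelive.
    """
    if use_token_credential:
        return ("token", None)
    key = os.environ.get("AZURE_VOICELIVE_API_KEY")
    if not key:
        raise RuntimeError(
            "Set AZURE_VOICELIVE_API_KEY in .env or pass --use-token-credential"
        )
    return ("api_key", key)
```

With `("token", None)` the sample would construct an Azure credential (e.g. via `azure-identity`); with `("api_key", key)` it passes the key straight to the client.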
All samples demonstrate:
- Real-time Voice: Bidirectional audio streaming for natural conversations
- Audio Processing: Microphone capture and speaker playback using PyAudio
- Interruption Handling: Support for natural turn-taking in conversations
- Resource Management: Proper cleanup of connections and audio resources
- Async/Await: Modern Python async programming patterns
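The resource-management and async patterns listed above reduce to one skeleton: stream until cancelled, then release audio resources in a `finally` block so cleanup runs even on interruption. A self-contained sketch with a stand-in for the PyAudio stream (names here are illustrative, not the samples' actual classes):

```python
import asyncio
import contextlib

class FakeAudioStream:
    """Stand-in for a PyAudio stream so the pattern runs anywhere."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

async def run_assistant(stream):
    """Skeleton of the loop the samples follow: do streaming work until
    cancelled, then always release the audio resource."""
    try:
        while True:
            await asyncio.sleep(0.01)  # stand-in for send/receive work
    finally:
        stream.close()  # runs even when the task is cancelled

async def main():
    stream = FakeAudioStream()
    task = asyncio.create_task(run_assistant(stream))
    await asyncio.sleep(0.03)  # let the loop "run" briefly
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return stream

stream = asyncio.run(main())
```

The same shape applies to the real samples: Ctrl+C or a barge-in induced cancellation still flows through `finally`, so microphones and speakers are released cleanly.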
Popular neural voice options include:
- `en-US-AvaNeural` - Female, conversational
- `en-US-AndrewNeural` - Male, conversational
- `en-US-JennyNeural` - Female, friendly
- `en-US-GuyNeural` - Male, professional
- `en-US-AriaNeural` - Female, cheerful
- `en-US-DavisNeural` - Male, calm
See the Azure Neural Voice Gallery for all available voices.
Agent samples:

```
User Voice → Microphone → PyAudio → VoiceLive SDK → Azure AI Foundry Agent
                                                          ↓
User Hears ← Speakers ← PyAudio ← VoiceLive SDK ← Agent Response
```

Model samples:

```
User Voice → Microphone → PyAudio → VoiceLive SDK → Azure AI Model (GPT-4o)
                                                          ↓
User Hears ← Speakers ← PyAudio ← VoiceLive SDK ← Model Response
```

Function calling samples:

```
User Voice → Microphone → PyAudio → VoiceLive SDK → Azure AI Model
                                                          ↓
                                                Function Call Request
                                                          ↓
                                                Execute Python Function
                                                          ↓
                                                Function Result → Model
                                                          ↓
User Hears ← Speakers ← PyAudio ← VoiceLive SDK ← Enhanced Response
```
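At runtime, the flows above boil down to two concurrent streams: captured microphone frames moving toward the service and synthesized frames moving back to the speaker. A minimal sketch of that pipeline with in-memory queues standing in for PyAudio and the VoiceLive round trip (the uppercase "response" is just a visible stand-in for synthesis):

```python
import asyncio

async def pipeline(frames):
    """Toy model of the capture → service → playback pipeline."""
    mic_q, speaker_q = asyncio.Queue(), asyncio.Queue()

    async def capture():
        # Microphone → SDK direction
        for frame in frames:
            await mic_q.put(frame)
        await mic_q.put(None)  # end-of-stream marker

    async def service():
        # Stand-in for the VoiceLive round trip: one "response" per frame
        while (frame := await mic_q.get()) is not None:
            await speaker_q.put(frame.upper())
        await speaker_q.put(None)

    async def playback():
        # SDK → Speakers direction
        heard = []
        while (frame := await speaker_q.get()) is not None:
            heard.append(frame)
        return heard

    _, _, heard = await asyncio.gather(capture(), service(), playback())
    return heard

heard = asyncio.run(pipeline(["hello", "world"]))
```

Because all three stages run concurrently, playback can start before capture finishes; that same property is what makes barge-in possible in the real samples (playback is simply cancelled and the queue drained when the user starts speaking).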
- No audio input/output: Verify your microphone and speakers are working and set as default devices
- PyAudio installation errors:
  - On Windows: `pip install pyaudio`
  - On Linux: `sudo apt-get install python3-pyaudio` or `pip install pyaudio`
  - On macOS: `brew install portaudio && pip install pyaudio`
- Audio device busy: Close other applications using your audio devices (e.g., Teams, Zoom)
- Poor audio quality: Update your audio drivers to the latest version
- 401 Unauthorized:
  - For API key: Verify `AZURE_VOICELIVE_API_KEY` in your `.env` file
  - For Azure auth: Run `az login` to authenticate with Azure CLI
- Agent not found (Agent sample): Check your agent ID format (should be `asst_xxxxx`) and project name
- Token credential fails: Ensure Azure CLI is installed and you're logged in
- Insufficient permissions (Agent sample): Verify your Azure account has access to the AI Foundry project
- Endpoint errors: Verify your endpoint URL format in `.env`: `https://your-endpoint.services.ai.azure.com/`
- WebSocket timeout: Check your network connection and firewall settings
- Certificate errors: Ensure your system certificates are up to date
- Model not available (Model/Function samples): Verify your Speech resource has VoiceLive enabled
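For the endpoint errors above, a quick programmatic sanity check can catch the two most common mistakes (missing `https://`, wrong host suffix) before any connection attempt. A heuristic sketch only; the helper name is illustrative and it does not cover every possible misconfiguration:

```python
from urllib.parse import urlparse

def check_endpoint(url):
    """Return a list of problems with an endpoint URL (empty list = looks OK).

    Heuristic check against the format shown in this README:
    https://your-endpoint.services.ai.azure.com/
    """
    parsed = urlparse(url)
    problems = []
    if parsed.scheme != "https":
        problems.append("endpoint must start with https://")
    if not parsed.hostname or not parsed.hostname.endswith(".services.ai.azure.com"):
        problems.append("host should end with .services.ai.azure.com")
    return problems
```

Running it on the value loaded from `.env` at startup gives a clearer error message than a WebSocket handshake failure.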
- Module not found: Run `pip install -r requirements.txt` to install dependencies
- Python version: Verify Python 3.8 or later is installed: `python --version`
- Virtual environment: Use a virtual environment to avoid package conflicts
- Import errors: Ensure you're in the correct directory and virtual environment is activated
All samples support these options (use --help for full details):
- `--endpoint`: Azure VoiceLive endpoint URL
- `--voice`: Voice for the assistant (default varies by sample)
- `--verbose` or `-v`: Enable detailed logging
Agent-specific options:
- `--agent-id`: Azure AI Foundry agent ID
- `--project-name`: Azure AI Foundry project name
Model/Function-specific options:
- `--api-key`: Azure VoiceLive API key
- `--model`: VoiceLive model to use
- `--use-token-credential`: Use Azure authentication instead of API key
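The shared CLI surface above can be reconstructed with `argparse`; a sketch, where the default voice is illustrative rather than the samples' actual default:

```python
import argparse

def build_parser():
    """Reconstruct the CLI options described above. Defaults shown here are
    illustrative; each sample sets its own."""
    p = argparse.ArgumentParser(description="Voice Live sample")
    # Shared options
    p.add_argument("--endpoint", help="Azure VoiceLive endpoint URL")
    p.add_argument("--voice", default="en-US-AvaNeural", help="Assistant voice")
    p.add_argument("-v", "--verbose", action="store_true", help="Detailed logging")
    # Model/function-sample options
    p.add_argument("--api-key", help="Azure VoiceLive API key")
    p.add_argument("--model", help="VoiceLive model to use")
    p.add_argument("--use-token-credential", action="store_true",
                   help="Use Azure authentication instead of an API key")
    return p

args = build_parser().parse_args(["--voice", "en-US-JennyNeural", "-v"])
```

`argparse` turns `--use-token-credential` into the attribute `args.use_token_credential`, which is what feeds the authentication choice described earlier.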
The samples use the following Python packages (defined in `requirements.txt`):

```
azure-ai-voicelive[aiohttp]
pyaudio
python-dotenv
azure-identity
```

Install all dependencies with:

```bash
pip install -r requirements.txt
```

We welcome contributions! Please see the Support Guide for details on how to contribute.
This project is licensed under the MIT License - see the LICENSE file for details.