Commit 8cb55ab

rename to Speech Engine
1 parent d637498 commit 8cb55ab

3 files changed

Lines changed: 161 additions & 0 deletions

.fernignore

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ src/elevenlabs/webhooks_custom.py
 src/elevenlabs/music_custom.py
 src/elevenlabs/speech_to_text_custom.py
 src/elevenlabs/realtime/
+src/elevenlabs/speech_engine/
 
 # Ignore CI files
 .github/

README.md

Lines changed: 135 additions & 0 deletions
@@ -266,6 +266,141 @@ client_tools.register("calculate_sum", calculate_sum, is_async=False)
client_tools.register("fetch_data", fetch_data, is_async=True)
```

## Speech Engine

Speech Engine lets you build server-side voice agents that receive real-time transcripts from the ElevenLabs API and stream LLM responses back for text-to-speech synthesis. Your server acts as a WebSocket endpoint — ElevenLabs connects to it, sends user transcripts, and your code decides how to respond.

Speech Engine is async-only and available on `AsyncElevenLabs`.

### Quick Start

```python
import asyncio
from openai import AsyncOpenAI
from elevenlabs import AsyncElevenLabs

openai_client = AsyncOpenAI()
elevenlabs = AsyncElevenLabs()

async def main():
    engine = await elevenlabs.speech_engine.get("veng_123")

    async def on_transcript(transcript, session):
        stream = await openai_client.responses.create(
            model="gpt-4o",
            input=[
                {"role": "assistant" if m.role == "agent" else m.role, "content": m.content}
                for m in transcript
            ],
            stream=True,
        )
        await session.send_response(stream)

    async def on_init(conversation_id, session):
        print(f"Session started: {conversation_id}")

    async def on_close(session):
        print(f"Session ended: {session.conversation_id}")

    async def on_error(err, session):
        print(f"Error: {err}")

    await engine.serve(
        port=3001,
        debug=True,
        on_init=on_init,
        on_transcript=on_transcript,
        on_close=on_close,
        on_error=on_error,
    )

asyncio.run(main())
```

### How It Works

When `engine.serve()` starts, it opens a WebSocket server on the specified port. For each incoming connection from the ElevenLabs API:

1. An `init` message arrives with a `conversation_id`
2. As the user speaks, `user_transcript` messages arrive with the full conversation history
3. Your `on_transcript` handler generates a response (using any LLM) and calls `session.send_response()`
4. If the user interrupts (speaks again mid-response), the previous handler is automatically cancelled
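The flow above can be pictured as a small dispatch loop. The message dicts below (`type`, `conversation_id`, `transcript`) are purely illustrative — the real wire format is internal to the SDK — but the routing and the cancel-on-new-transcript behaviour match the description:

```python
import asyncio

# Hypothetical message shapes, for illustration only; the actual
# protocol is handled inside the SDK.
MESSAGES = [
    {"type": "init", "conversation_id": "conv_abc"},
    {"type": "user_transcript", "transcript": [{"role": "user", "content": "Hi"}]},
    {"type": "close"},
]

log = []

async def on_init(conversation_id):
    log.append(f"init:{conversation_id}")

async def on_transcript(transcript):
    log.append(f"transcript:{transcript[-1]['content']}")

async def on_close():
    log.append("close")

async def dispatch(messages):
    # Route each message to its handler, roughly as serve() does internally.
    task = None
    for msg in messages:
        if msg["type"] == "init":
            await on_init(msg["conversation_id"])
        elif msg["type"] == "user_transcript":
            if task and not task.done():
                task.cancel()  # interruption: drop the in-flight response
            task = asyncio.create_task(on_transcript(msg["transcript"]))
            await asyncio.sleep(0)  # yield so the handler task starts
        elif msg["type"] == "close":
            await on_close()
    if task:
        await task

asyncio.run(dispatch(MESSAGES))
print(log)  # ['init:conv_abc', 'transcript:Hi', 'close']
```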
### Sending Responses

`send_response()` accepts strings or async iterators. LLM stream formats from OpenAI, Anthropic, and Google Gemini are auto-detected:

```python
# Plain string
await session.send_response("Hello world")

# OpenAI stream (auto-parsed)
stream = await openai_client.responses.create(model="gpt-4o", ..., stream=True)
await session.send_response(stream)

# Anthropic stream (auto-parsed)
stream = anthropic_client.messages.stream(model="claude-sonnet-4-20250514", ...)
await session.send_response(stream)

# Any async iterator of strings
async def my_generator():
    yield "Hello "
    yield "world"

await session.send_response(my_generator())
```
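Because `send_response()` accepts any async iterator of strings, a synchronous text source can be adapted with a thin wrapper. A stdlib-only sketch of that contract (only the commented-out `send_response` line is SDK API; everything else is illustrative):

```python
import asyncio

def sync_chunks():
    # Any synchronous text source (file, template, cache) -- illustrative.
    yield from ["The answer ", "is ", "42."]

async def as_async_iter(gen):
    # Wrap a sync generator so it satisfies the async-iterator-of-strings
    # contract that send_response() consumes.
    for chunk in gen:
        yield chunk

async def main():
    parts = [c async for c in as_async_iter(sync_chunks())]
    # In a handler you would pass the iterator straight through:
    # await session.send_response(as_async_iter(sync_chunks()))
    return "".join(parts)

result = asyncio.run(main())
print(result)  # The answer is 42.
```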

### Interruption Handling

When a new transcript arrives while a previous response is still streaming, the previous handler's `asyncio.Task` is cancelled automatically. Any `await` in your handler (including LLM SDK calls) raises `asyncio.CancelledError`, which cleanly aborts the in-flight request. No manual signal handling is needed.
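This is ordinary `asyncio` task cancellation, so you can run cleanup (close streams, log) with a `try/except asyncio.CancelledError` in your handler. A stdlib-only sketch, where `fake_llm_call` stands in for a slow LLM SDK request:

```python
import asyncio

events = []

async def fake_llm_call():
    # Stands in for a slow LLM request; any await is a cancellation point.
    await asyncio.sleep(10)
    return "response"

async def handler():
    try:
        await fake_llm_call()
        events.append("finished")
    except asyncio.CancelledError:
        # Cleanup hook: close streams, log, etc., then re-raise.
        events.append("cancelled")
        raise

async def main():
    task = asyncio.create_task(handler())
    await asyncio.sleep(0.01)   # handler is now blocked in fake_llm_call()
    task.cancel()               # what the session does on a new transcript
    try:
        await task
    except asyncio.CancelledError:
        pass

asyncio.run(main())
print(events)  # ['cancelled']
```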
### Custom Server Integration (FastAPI, Starlette)

To integrate with an existing web server, use `create_session()` instead of `serve()`:

```python
from fastapi import FastAPI, WebSocket

app = FastAPI()
engine = ...  # SpeechEngineResource from await client.speech_engine.get(...)

@app.websocket("/api/speech-engine/ws")
async def speech_engine_ws(ws: WebSocket):
    await ws.accept()
    session = engine.create_session(ws, debug=True)
    session.on("user_transcript", handle_transcript)
    await session.run()
```
When using `session.on()` directly, handlers receive just the event data (no `session` argument, since you already have the reference):

| Event | Handler signature |
|---|---|
| `"init"` | `async (conversation_id: str) -> None` |
| `"user_transcript"` | `async (transcript: list[ConversationMessage]) -> None` |
| `"close"` | `async () -> None` |
| `"disconnected"` | `async () -> None` |
| `"error"` | `async (error: Exception) -> None` |
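As a mental model (not the SDK's actual implementation), `session.on()` can be pictured as a registry keyed by event name, with `run()` awaiting the matching handler as each message arrives. A toy sketch:

```python
import asyncio

class MiniEmitter:
    """Toy stand-in for the session's event registry -- a mental model
    of session.on()/run(), not the SDK's real internals."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        self._handlers[event] = handler

    async def emit(self, event, *args):
        # run() would call this for each decoded WebSocket message.
        handler = self._handlers.get(event)
        if handler:
            await handler(*args)

received = []

async def handle_transcript(transcript):
    received.append(transcript[-1]["content"])

async def main():
    session = MiniEmitter()
    session.on("user_transcript", handle_transcript)
    await session.emit("user_transcript", [{"role": "user", "content": "Hello"}])

asyncio.run(main())
print(received)  # ['Hello']
```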

### Standalone Server

For full control over the server lifecycle, use `SpeechEngineServer` directly:

```python
from elevenlabs.speech_engine import SpeechEngineServer

server = SpeechEngineServer(
    port=3001,
    debug=True,
    on_transcript=handle_transcript,
)

# In one task:
await server.serve()

# In another task (e.g. signal handler):
await server.stop()
```
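The two-task serve/stop pattern can be exercised without the SDK. This stdlib-only sketch uses a hypothetical `StubServer` with the same `serve()`/`stop()` shape, blocking in `serve()` until `stop()` is called from another task:

```python
import asyncio

class StubServer:
    # Hypothetical stand-in with the same serve()/stop() shape as
    # SpeechEngineServer, so the pattern runs without the SDK.
    def __init__(self):
        self._stopped = asyncio.Event()
        self.state = "new"

    async def serve(self):
        self.state = "serving"
        await self._stopped.wait()   # block until stop() is called
        self.state = "stopped"

    async def stop(self):
        self._stopped.set()

async def main():
    server = StubServer()
    serve_task = asyncio.create_task(server.serve())
    await asyncio.sleep(0.01)        # e.g. wait for a shutdown signal instead
    await server.stop()              # from another task: unblock serve()
    await serve_task
    return server.state

state = asyncio.run(main())
print(state)  # stopped
```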

## Languages Supported

Explore [all models & languages](https://elevenlabs.io/docs/models).

src/elevenlabs/client.py

Lines changed: 25 additions & 0 deletions
@@ -64,6 +64,24 @@ def __init__(
         self._speech_to_text = SpeechToTextClient(client_wrapper=self._client_wrapper)
 
 
+class _AsyncSpeechEngineAccessor:
+    """Stub accessor for speech engine resources.
+
+    Will be replaced with a Fern-generated client once CRUD endpoints exist.
+    """
+
+    def __init__(self, client_wrapper: typing.Any) -> None:
+        self._client_wrapper = client_wrapper
+
+    async def get(self, engine_id: str) -> "SpeechEngineResource":
+        from .speech_engine.resource import SpeechEngineResource  # noqa: E402
+
+        return SpeechEngineResource(
+            engine_id=engine_id,
+            client_options=self._client_wrapper,
+        )
+
+
 class AsyncElevenLabs(AsyncBaseElevenLabs):
     """
     Use this class to access the different functions within the SDK. You can instantiate any number of clients with different configuration that will propagate to these functions.
@@ -107,3 +125,10 @@ def __init__(
         self._webhooks = AsyncWebhooksClient(client_wrapper=self._client_wrapper)
         self._music = AsyncMusicClient(client_wrapper=self._client_wrapper)
         self._speech_to_text = AsyncSpeechToTextClient(client_wrapper=self._client_wrapper)
+        self._speech_engine = None  # type: typing.Optional[_AsyncSpeechEngineAccessor]
+
+    @property
+    def speech_engine(self) -> _AsyncSpeechEngineAccessor:
+        if self._speech_engine is None:
+            self._speech_engine = _AsyncSpeechEngineAccessor(self._client_wrapper)
+        return self._speech_engine
