Pria’s Convo Mode lets users talk to a Digital Twin in real time: speech in, speech (and optionally an animated avatar) back out. Under the hood, several different vendors can power that experience. This page compares them so you can pick the right one for each Digital Twin.
What Realtime Voice Is
Realtime voice is a live, two-way audio conversation between a user and the Digital Twin. Unlike recording a clip and waiting for a transcript, realtime voice streams audio in both directions.
Provider Matrix
| Provider | Best For | Audio Quality | Latency | Avatar | Voices |
|---|---|---|---|---|---|
| OpenAI Realtime | Broad coverage, lowest latency | High | Lowest | No | Multiple stock voices |
| ElevenLabs | Premium voice, brand persona | Highest | Low | No | Hundreds + cloned |
| Gemini Live | Multilingual, thinking models | High | Low | No | 30+ named voices |
| xAI Realtime | Grok models, conversational | High | Low | No | 5 (eve, ara, rex, sal, leo) |
| Anam Avatar | Engagement, demos, training | High | Low | Yes | Anam voice catalog |
| LemonSlice | Avatar (legacy — prefer Anam) | High | Medium | Yes | LemonSlice catalog |
LemonSlice is supported for backward compatibility. For new avatar setups, choose Anam.
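For scripting against this comparison, the matrix can be encoded as plain data. The following is a minimal Python sketch and not part of Pria: the `Provider` class, its field names, and the `avatar_providers` helper are hypothetical, and only the Latency and Avatar columns are carried over from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    latency: str          # "Lowest", "Low", or "Medium", as listed in the matrix
    avatar: bool          # True when the provider renders an animated avatar
    legacy: bool = False  # True for providers kept only for backward compatibility


# Rows copied from the provider matrix; the structure itself is illustrative.
PROVIDERS = [
    Provider("OpenAI Realtime", "Lowest", avatar=False),
    Provider("ElevenLabs", "Low", avatar=False),
    Provider("Gemini Live", "Low", avatar=False),
    Provider("xAI Realtime", "Low", avatar=False),
    Provider("Anam Avatar", "Low", avatar=True),
    Provider("LemonSlice", "Medium", avatar=True, legacy=True),
]


def avatar_providers(include_legacy: bool = False) -> list[str]:
    """Return the names of providers that render an avatar."""
    return [
        p.name
        for p in PROVIDERS
        if p.avatar and (include_legacy or not p.legacy)
    ]


print(avatar_providers())                     # ['Anam Avatar']
print(avatar_providers(include_legacy=True))  # ['Anam Avatar', 'LemonSlice']
```

Excluding legacy providers by default mirrors the guidance to choose Anam for new avatar setups.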
Choosing a Provider
Classroom / tutoring
Recommendation: Gemini Live or OpenAI Realtime.
Native-audio models give the lowest latency, which matters most for back-and-forth dialogue with learners. Gemini Live shines for multilingual classrooms; OpenAI Realtime has the broadest model coverage and stock voices.
Customer support
Recommendation: ElevenLabs.
Premium voice quality is what customers remember. ElevenLabs also supports a published voice agent that can be embedded in support widgets via the Chat Completions API bridge.
Sales demos / training
Recommendation: Anam Avatar.
A lifelike animated avatar dramatically raises perceived presence, which is useful for product demos, role-play training, and onboarding videos. Pair it with your conversation model of choice for the brains, and Anam handles voice, lip sync, and video.
Accessibility
Recommendation: Anam Avatar or OpenAI Realtime with transcripts enabled.
For users who benefit from a visible speaking face (lip-reading, comprehension support), Anam adds an avatar. For audio-only accessibility, enable the live transcript so users can read along.
Coding assistant or technical content
Recommendation: OpenAI Realtime or Gemini Live.
Native-audio models handle code-heavy content and technical terms well. Avatar providers can struggle to render code snippets in spoken form.
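The recommendations in this section can be summarized as a lookup table. This is an illustrative Python sketch only: the use-case keys and the `recommend` function are hypothetical names chosen for the example, not a Pria API.

```python
# Use case -> recommended providers, in the order given in this section.
# The dictionary keys are made-up identifiers for illustration.
RECOMMENDATIONS = {
    "classroom": ["Gemini Live", "OpenAI Realtime"],
    "customer_support": ["ElevenLabs"],
    "sales_demo": ["Anam Avatar"],
    "accessibility": ["Anam Avatar", "OpenAI Realtime"],
    "technical": ["OpenAI Realtime", "Gemini Live"],
}


def recommend(use_case: str) -> list[str]:
    """Return the recommended providers for a use case, or raise ValueError."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return list(RECOMMENDATIONS[use_case])
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


print(recommend("classroom"))  # ['Gemini Live', 'OpenAI Realtime']
```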
Tradeoffs
Voice quality is highest with ElevenLabs (their TTS is the benchmark) and with the avatar providers when paired with a quality voice. Native-audio models (Gemini Live, OpenAI Realtime, xAI Realtime) sound natural and conversational but don’t expose voice cloning.
Latency is lowest with OpenAI Realtime via WebRTC, followed closely by Gemini Live and xAI Realtime over WebSocket. ElevenLabs adds a small additional hop because Pria acts as the brain via Custom LLM. Avatar providers add video rendering on top.
Cost depends on whether you bring your own vendor key (BYO) or use Pria-included credits. Avatars in particular are billed per minute on top of LLM costs, so it’s worth enabling an avatar only where it matters rather than universally.
Where Avatars Matter
Avatars are not always the right answer: they add cost and a layer of visual polish that can distract from purely informational use cases. Consider an avatar for:
- Sales demos: a face on screen makes the product feel “alive” and quotable in marketing.
- Training & onboarding: embodied presence raises engagement and retention for long-form content.
- Accessibility: visible mouth movement supports lip-reading and comprehension.
- Marketing landing pages: a greeter avatar is more memorable than a chat widget.
Setup Overview
Each provider has its own setup page with detailed steps:
- ElevenLabs Voice Agent: connect an ElevenLabs ConvAI agent to Pria as a Custom LLM and deploy embeddable widgets.
- Gemini Live Voice: Google’s native-audio WebSocket API for low-latency multilingual voice.
- Anam Avatar: lifelike animated avatar driven by your choice of conversation model.
- OpenAI / xAI Realtime: configure the built-in realtime providers from the Admin guide.
The overall flow is the same for every provider:
- Get a vendor API key: bring your own from the provider, or contact the Praxis AI team at humans@praxis-ai.com to request access to a managed key.
- Configure the Digital Twin: Admin → Configuration tab → paste the API key (and any agent / avatar IDs) into the provider’s section.
- Select the voice provider in Personalization: Admin → Personalization tab → set the Convo Mode voice provider to the vendor you configured, then pick a default voice.
Multi-Provider Strategy
Pria does not force a single voice provider per institution. Common patterns:
- Default at the Digital Twin level, override per assistant. Set ElevenLabs as the institution default for branded support, but configure a specific Sales assistant to use Anam for demos.
- Different Digital Twins, different providers. A K-12 tutoring Twin uses Gemini Live for multilingual support; a corporate training Twin uses Anam for an avatar-led experience.
- Per-user voice override. Users can pick a different voice from the provider’s catalog in their Instance Settings without changing the provider.
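The first pattern amounts to a simple fallback rule. Below is a minimal Python sketch of that rule, assuming a hypothetical `resolve_provider` helper; Pria's actual implementation is not shown on this page, and the setting itself is made in the Admin UI.

```python
from typing import Optional


def resolve_provider(twin_default: str, assistant_override: Optional[str] = None) -> str:
    """Use the assistant-level provider when one is set, else the Twin default."""
    return assistant_override if assistant_override else twin_default


# Example from the text: ElevenLabs by default, Anam for the Sales assistant.
support = resolve_provider("ElevenLabs")
sales = resolve_provider("ElevenLabs", assistant_override="Anam Avatar")
print(support, "|", sales)  # ElevenLabs | Anam Avatar
```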
Per-User Voice Preferences
Each user can override the default voice from their Instance Settings panel. The override applies only to the user and only for that Digital Twin. The voice provider (the vendor) cannot be changed by users, since that is an admin decision, but the specific voice within the active provider’s catalog is user-selectable. This is useful when:
- Different team members have different voice preferences for accessibility or focus.
- Users want a male/female/neutral voice without affecting the rest of the institution.
- A user is testing voices before recommending a new default to their admin.
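The scoping rules above (per user, per Digital Twin, within the active provider's catalog) can be sketched as follows. This is a hedged Python illustration, not Pria's code: `resolve_voice`, the `CATALOGS` layout, and the user and Twin identifiers are hypothetical. Falling back to the default voice when an override is not in the active catalog is an assumption; this page does not say how Pria handles that case.

```python
from typing import Dict, List, Tuple

# The xAI Realtime voice names come from the provider matrix; the dictionary
# layout is an assumption for illustration.
CATALOGS: Dict[str, List[str]] = {
    "xAI Realtime": ["eve", "ara", "rex", "sal", "leo"],
}


def resolve_voice(
    provider: str,
    default_voice: str,
    overrides: Dict[Tuple[str, str], str],
    user_id: str,
    twin_id: str,
) -> str:
    """Return the user's override for this Twin if valid, else the default voice."""
    # Overrides are keyed by (user, twin), so they never leak to other users
    # or to the same user's other Digital Twins.
    choice = overrides.get((user_id, twin_id))
    # Assumed behavior: ignore overrides that are not in the provider's catalog.
    if choice is not None and choice in CATALOGS.get(provider, []):
        return choice
    return default_voice


overrides = {("alice", "twin-1"): "rex"}
print(resolve_voice("xAI Realtime", "eve", overrides, "alice", "twin-1"))  # rex
print(resolve_voice("xAI Realtime", "eve", overrides, "bob", "twin-1"))    # eve
print(resolve_voice("xAI Realtime", "eve", overrides, "alice", "twin-2"))  # eve
```

Note that the provider is an input the user cannot influence here, matching the rule that only admins choose the vendor.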
Related
- ElevenLabs Voice Agent — Premium voice via ConvAI + Custom LLM bridge
- Gemini Live Voice — Google native-audio WebSocket API
- Anam Avatar — Lifelike animated avatar for Convo Mode
- Realtime Voice & Avatars (Admin Guide) — Per-Digital-Twin configuration
- Convo Mode (User Guide) — End-user guide for voice conversations