Voice & Realtime Providers

Pria’s Convo Mode lets users talk to a Digital Twin in real time — speech in, speech (and optionally an animated avatar) back out. Under the hood, several different vendors can power that experience. This page compares them so you can pick the right one for each Digital Twin.

You don’t have to pick just one. Different Digital Twins can use different voice providers, and individual users can override the default voice in their Instance Settings.

What Realtime Voice Is

Realtime voice is a live, two-way audio conversation between a user and the Digital Twin. Unlike recording a clip and waiting for a transcript, realtime voice streams audio in both directions:

User speaks → streaming STT → LLM → streaming TTS → User hears

A few vendors (Gemini Live, OpenAI Realtime, xAI Realtime) process audio natively in a single model pass, which produces lower latency and more natural prosody. Others (ElevenLabs) chain best-in-class STT + LLM + TTS components for premium voice quality. Avatar providers (Anam, LemonSlice) add an animated, lip-synced video on top of the voice. In all cases the audio plane runs directly between the browser and the provider after Pria mints a short-lived session token, so audio never bottlenecks through Pria’s servers.
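
The token handoff described above can be sketched as follows. This is a minimal illustration, not Pria's implementation: `FakeProvider`, `create_session_token`, `open_audio_stream`, `mint_for_browser`, and the 60-second lifetime are hypothetical names and values chosen for the example. The point it shows is that the long-lived vendor key stays on the server, while the browser only ever holds a token that expires.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeProvider:
    """Stand-in for a voice vendor. Hypothetical; not any real vendor's API."""
    api_key: str
    _sessions: Dict[str, float] = field(default_factory=dict)

    def create_session_token(self, api_key: str, ttl_seconds: float, now: float) -> str:
        # Only a caller holding the long-lived key may mint a token.
        if api_key != self.api_key:
            raise PermissionError("invalid API key")
        token = secrets.token_urlsafe(16)
        self._sessions[token] = now + ttl_seconds  # remember the expiry time
        return token

    def open_audio_stream(self, token: str, now: float) -> bool:
        # The browser connects here directly and presents only the token.
        expires_at = self._sessions.get(token)
        return expires_at is not None and now < expires_at


def mint_for_browser(provider: FakeProvider, server_key: str,
                     now: Optional[float] = None) -> str:
    """Server-side step: trade the long-lived key for a short-lived token."""
    current = time.time() if now is None else now
    # 60 seconds is an assumed lifetime for illustration only.
    return provider.create_session_token(server_key, ttl_seconds=60, now=current)
```

In this sketch the server calls `mint_for_browser` once per conversation and returns only the token; the audio stream itself is opened by the browser against the provider, so no audio passes through the server.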

Provider Matrix

Provider	Best For	Audio Quality	Latency	Avatar	Voices
OpenAI Realtime	Broad coverage, lowest latency	High	Lowest	No	Multiple stock voices
ElevenLabs	Premium voice, brand persona	Highest	Low	No	Hundreds + cloned
Gemini Live	Multilingual, thinking models	High	Low	No	30+ named voices
xAI Realtime	Grok models, conversational	High	Low	No	5 (eve, ara, rex, sal, leo)
Anam Avatar	Engagement, demos, training	High	Low	Yes	Anam voice catalog
LemonSlice	Avatar (legacy — prefer Anam)	High	Medium	Yes	LemonSlice catalog

LemonSlice is supported for backward compatibility. For new avatar setups, choose Anam.
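
For readers who want to filter the matrix programmatically, the rows above can be encoded as data. The sketch below is illustrative only: the `Provider` tuple, the `legacy` flag, the numeric `LATENCY_RANK`, and the `candidates` helper are assumptions made for this example, not part of Pria.

```python
from typing import List, NamedTuple


class Provider(NamedTuple):
    name: str
    audio_quality: str
    latency: str
    avatar: bool
    legacy: bool = False


# Values copied from the Provider Matrix.
PROVIDERS = [
    Provider("OpenAI Realtime", "High", "Lowest", False),
    Provider("ElevenLabs", "Highest", "Low", False),
    Provider("Gemini Live", "High", "Low", False),
    Provider("xAI Realtime", "High", "Low", False),
    Provider("Anam Avatar", "High", "Low", True),
    Provider("LemonSlice", "High", "Medium", True, legacy=True),
]

# Assumed ordering of the matrix's latency labels, best first.
LATENCY_RANK = {"Lowest": 0, "Low": 1, "Medium": 2}


def candidates(need_avatar: bool, include_legacy: bool = False) -> List[str]:
    """Return matching provider names, lowest latency first."""
    rows = [
        p for p in PROVIDERS
        if p.avatar == need_avatar and (include_legacy or not p.legacy)
    ]
    # sorted() is stable, so ties keep the matrix order.
    return [p.name for p in sorted(rows, key=lambda p: LATENCY_RANK[p.latency])]
```

Excluding legacy rows by default mirrors the guidance to choose Anam for new avatar setups.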

Choosing a Provider

Classroom / tutoring

Recommendation: Gemini Live or OpenAI Realtime. Native audio models give the lowest latency, which matters most for back-and-forth dialogue with learners. Gemini Live shines for multilingual classrooms; OpenAI Realtime has the broadest model coverage and stock voices.

Customer support

Recommendation: ElevenLabs. Premium voice quality is what customers remember. ElevenLabs also supports a published voice agent that can be embedded in support widgets via the Chat Completions API bridge.

Sales demos / training

Recommendation: Anam Avatar. A lifelike animated avatar dramatically raises perceived presence, which is useful for product demos, role-play training, and onboarding videos. Pair it with your conversation model of choice for the brains, and Anam handles voice, lip-sync, and video.

Accessibility

Recommendation: Anam Avatar or OpenAI Realtime with transcripts enabled. For users who benefit from a visible speaking face (lip-reading, comprehension support), Anam adds an avatar. For audio-only accessibility, enable the live transcript so users can read along.

Coding assistant or technical content

Recommendation: OpenAI Realtime or Gemini Live. Native-audio models handle code-heavy content and technical terms well. Avatar providers can struggle to render code snippets in spoken form.
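
The recommendations in this section can be condensed into a small lookup. The sketch below is a hypothetical helper, not a Pria feature: the use-case keys and the `recommend` function are names invented for illustration, and it assumes you want the first recommended provider that your Digital Twin already has configured.

```python
from typing import Dict, Iterable, List, Optional

# Recommended providers per use case, in the order listed in this section.
RECOMMENDATIONS: Dict[str, List[str]] = {
    "classroom": ["Gemini Live", "OpenAI Realtime"],
    "customer_support": ["ElevenLabs"],
    "sales_demo": ["Anam Avatar"],
    "accessibility": ["Anam Avatar", "OpenAI Realtime"],
    "coding": ["OpenAI Realtime", "Gemini Live"],
}


def recommend(use_case: str, configured: Iterable[str]) -> Optional[str]:
    """Return the first recommended provider that is already configured."""
    available = set(configured)
    for provider in RECOMMENDATIONS.get(use_case, []):
        if provider in available:
            return provider
    return None  # nothing recommended is configured yet
```

For example, `recommend("coding", ["Gemini Live", "ElevenLabs"])` returns `"Gemini Live"`, because OpenAI Realtime is preferred for that use case but is not in the configured list.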

Tradeoffs

Voice quality is highest with ElevenLabs (their TTS is the benchmark) and with the avatar providers when paired with a quality voice. Native-audio models (Gemini Live, OpenAI Realtime, xAI Realtime) sound natural and conversational but don’t expose voice cloning.

Latency is lowest with OpenAI Realtime via WebRTC, followed closely by Gemini Live and xAI Realtime over WebSocket. ElevenLabs adds a small additional hop because Pria acts as the brain via Custom LLM. Avatar providers add video rendering on top.

Cost depends on whether you bring your own vendor key (BYO) or use Pria-included credits. Avatars in particular are billed per minute on top of LLM costs, so it’s worth picking the provider where the avatar matters rather than enabling it universally.
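
To make the cost point concrete, here is a small arithmetic sketch. It assumes, purely for illustration, that both the LLM/voice cost and the avatar cost can be expressed as per-minute rates; the function name and the rates in the usage note are made-up placeholders, not Pria or vendor pricing.

```python
def session_cost(minutes: float, llm_rate_per_min: float,
                 avatar_rate_per_min: float = 0.0) -> float:
    """Estimate one session's cost; the avatar rate is added on top of the LLM rate."""
    return round(minutes * (llm_rate_per_min + avatar_rate_per_min), 2)
```

With placeholder rates of 0.05 per minute for the LLM and 0.10 per minute for the avatar, a 10-minute session costs 0.50 without the avatar and 1.50 with it, which is why enabling avatars only where they matter keeps costs down.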

Where Avatars Matter

Avatars are not always the right answer — they add cost and a layer of visual polish that can distract from purely informational use cases. Consider an avatar when:

Sales demos — a face on screen makes the product feel “alive” and quotable in marketing.
Training & onboarding — embodied presence raises engagement and retention for long-form content.
Accessibility — visible mouth movement supports lip-reading and comprehension.
Marketing landing pages — a greeter avatar is more memorable than a chat widget.

Skip the avatar for fast Q&A, code help, or text-heavy conversations where users want to read along.

You can also put an Anam-powered avatar on an external website — visitors talk to it without signing in to Pria. See the Avatar Embed Widget.

Setup Overview

Each provider has its own setup page with detailed steps:

ElevenLabs Voice Agent

Connect an ElevenLabs ConvAI agent to Pria, which serves as the agent's Custom LLM, then deploy embeddable widgets.

Gemini Live Voice

Google’s native-audio WebSocket API for low-latency multilingual voice.

Anam Avatar

Lifelike animated avatar driven by your choice of conversation model.

OpenAI / xAI Realtime

Configure built-in realtime providers from the Admin guide.

At a high level, every setup involves three steps:

Get a vendor API key — bring your own from the provider, or contact the Praxis AI team at humans@praxis-ai.com to request access to a managed key.
Configure the Digital Twin — Admin → Configuration tab → paste the API key (and any agent / avatar IDs) into the provider’s section.
Select the voice provider in Personalization — Admin → Personalization tab → set the Convo Mode voice provider to the vendor you configured, then pick a default voice.
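
The three steps above can be expressed as a simple checklist. This is a sketch under stated assumptions: the dictionary keys (`voice_provider`, `api_keys`, `default_voice`) and the `check_setup` function are hypothetical and do not reflect Pria's actual configuration schema, which is set through the Admin UI.

```python
from typing import Any, Dict, List


def check_setup(twin_config: Dict[str, Any]) -> List[str]:
    """Return the setup steps that are still unfinished for one Digital Twin."""
    problems: List[str] = []
    provider = twin_config.get("voice_provider")
    api_keys = twin_config.get("api_keys") or {}

    if not provider:
        problems.append("step 3: no Convo Mode voice provider selected")
    elif not api_keys.get(provider):
        # The selected provider needs a key pasted in the Configuration tab.
        problems.append(f"steps 1-2: no API key configured for {provider}")

    if provider and not twin_config.get("default_voice"):
        problems.append("step 3: no default voice chosen")
    return problems
```

An empty list means all three steps are complete for that Digital Twin.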

Multi-Provider Strategy

Pria does not force a single voice provider per institution. Common patterns:

Default at the Digital Twin level, override per assistant. Set ElevenLabs as the institution default for branded support, but configure a specific Sales assistant to use Anam for demos.
Different Digital Twins, different providers. A K-12 tutoring Twin uses Gemini Live for multilingual support; a corporate training Twin uses Anam for an avatar-led experience.
Per-user voice override. Users can pick a different voice from the provider’s catalog in their Instance Settings without changing the provider.

This flexibility lets you optimize for the right tradeoff per use case rather than locking the whole institution into one vendor’s strengths and weaknesses.
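
The first pattern amounts to a two-level lookup: an assistant-specific provider wins if one is set, otherwise the Digital Twin's default applies. The sketch below illustrates that precedence only; `resolve_provider` and its parameters are hypothetical names, not a Pria API.

```python
from typing import Dict, Optional


def resolve_provider(twin_default: str,
                     assistant_overrides: Dict[str, str],
                     assistant: Optional[str] = None) -> str:
    """Pick the voice provider: assistant override first, then the Twin default."""
    if assistant is not None and assistant in assistant_overrides:
        return assistant_overrides[assistant]
    return twin_default
```

Using the example from the list, `resolve_provider("ElevenLabs", {"Sales": "Anam Avatar"}, "Sales")` yields `"Anam Avatar"`, while any other assistant falls back to `"ElevenLabs"`.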

Per-User Voice Preferences

Each user can override the default voice from their Instance Settings panel. The override applies only to the user and only for that Digital Twin. The voice provider (the vendor) cannot be changed by users — that’s an admin decision — but the specific voice within the active provider’s catalog is user-selectable. This is useful when:

Different team members have different voice preferences for accessibility or focus.
Users want a male/female/neutral voice without affecting the rest of the institution.
A user is testing voices before recommending a new default to their admin.
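
The override rules in this section (scoped to one user and one Digital Twin, limited to the active provider's catalog) can be modeled as below. This is an illustrative sketch, not Pria's implementation: `VoiceOverrides`, its methods, and the fallback behavior when an admin later switches providers are assumptions. The xAI voice names come from the Provider Matrix.

```python
from typing import Dict, Set, Tuple

# Only the xAI catalog is listed by name on this page; others are omitted here.
CATALOG: Dict[str, Set[str]] = {
    "xAI Realtime": {"eve", "ara", "rex", "sal", "leo"},
}


class VoiceOverrides:
    """Stores per-user voice choices keyed by (user, Digital Twin)."""

    def __init__(self) -> None:
        self._choices: Dict[Tuple[str, str], Tuple[str, str]] = {}

    def set_voice(self, user_id: str, twin_id: str,
                  active_provider: str, voice: str) -> None:
        # Users pick a voice, never a provider: the voice must belong to
        # the catalog of the provider the admin selected.
        if voice not in CATALOG.get(active_provider, set()):
            raise ValueError(f"{voice!r} is not in the {active_provider} catalog")
        self._choices[(user_id, twin_id)] = (active_provider, voice)

    def effective_voice(self, user_id: str, twin_id: str,
                        active_provider: str, default_voice: str) -> str:
        choice = self._choices.get((user_id, twin_id))
        # Assumed behavior: ignore an override saved under a different provider.
        if choice is not None and choice[0] == active_provider:
            return choice[1]
        return default_voice
```

Keying on the `(user, twin)` pair keeps one user's preference from leaking to other users or to other Digital Twins.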

Related

ElevenLabs Voice Agent: Premium voice via ConvAI + Custom LLM bridge
Gemini Live Voice: Google native-audio WebSocket API
Anam Avatar: Lifelike animated avatar for Convo Mode
Avatar Embed Widget: Embed the talking avatar on external websites
Realtime Voice & Avatars (Admin Guide): Per-Digital-Twin configuration
Convo Mode (User Guide): End-user guide for voice conversations
