Pria’s Convo Mode lets users talk to a Digital Twin in real time — speech in, speech (and optionally an animated avatar) back out. Under the hood, several different vendors can power that experience. This page compares them so you can pick the right one for each Digital Twin.
You don’t have to pick just one. Different Digital Twins can use different voice providers, and individual users can override the default voice in their Instance Settings.

What Realtime Voice Is

Realtime voice is a live, two-way audio conversation between a user and the Digital Twin. Unlike recording a clip and waiting for a transcript, realtime voice streams audio in both directions:
User speaks → streaming STT → LLM → streaming TTS → User hears
A few vendors (Gemini Live, OpenAI Realtime, xAI Realtime) process audio natively in a single model pass, which produces lower latency and more natural prosody. Others (ElevenLabs) chain best-in-class STT + LLM + TTS components for premium voice quality. Avatar providers (Anam, LemonSlice) add an animated, lip-synced video on top of the voice. In all cases the audio plane runs directly between the browser and the provider after Pria mints a short-lived session token, so audio never bottlenecks through Pria’s servers.
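The token handoff described above can be sketched roughly as follows. This is a minimal illustration, not Pria's implementation: `SessionToken`, `mint_session_token`, the 60-second lifetime, and the provider label are all hypothetical names and values, and a random string stands in for the credential a real vendor would issue.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionToken:
    """Hypothetical short-lived credential handed to the browser."""
    provider: str
    value: str
    expires_at: float  # Unix timestamp in seconds

    def is_valid(self, now: float) -> bool:
        # The token is usable only until its expiry time.
        return now < self.expires_at


def mint_session_token(provider: str, ttl_seconds: int = 60,
                       now: Optional[float] = None) -> SessionToken:
    """Server side: create a token the browser presents to the provider.

    Assumption: a real integration would obtain this value from the vendor
    using the configured API key; a random string stands in for it here.
    """
    issued_at = time.time() if now is None else now
    return SessionToken(provider, secrets.token_urlsafe(32), issued_at + ttl_seconds)


# The browser would use token.value to open its audio connection directly
# with the provider, so the long-lived API key never leaves the server.
token = mint_session_token("openai-realtime", ttl_seconds=60, now=1000.0)
```

Keeping the lifetime short limits the damage if a token leaks from the browser, while the vendor API key stays on the server.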

Provider Matrix

| Provider | Best For | Audio Quality | Latency | Avatar | Voices |
|---|---|---|---|---|---|
| OpenAI Realtime | Broad coverage, lowest latency | High | Lowest | No | Multiple stock voices |
| ElevenLabs | Premium voice, brand persona | Highest | Low | No | Hundreds + cloned |
| Gemini Live | Multilingual, thinking models | High | Low | No | 30+ named voices |
| xAI Realtime | Grok models, conversational | High | Low | No | 5 (eve, ara, rex, sal, leo) |
| Anam Avatar | Engagement, demos, training | High | Low | Yes | Anam voice catalog |
| LemonSlice | Avatar (legacy; prefer Anam) | High | Medium | Yes | LemonSlice catalog |
LemonSlice is supported for backward compatibility. For new avatar setups, choose Anam.
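If you want to shortlist providers programmatically, the matrix can be encoded as plain data. The sketch below copies the latency and avatar columns from the table above; the `Provider` class, the `legacy` flag, and the `candidates` helper are hypothetical names introduced for illustration and are not part of any Pria API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Provider:
    name: str
    latency: str          # value from the Latency column
    avatar: bool          # value from the Avatar column
    legacy: bool = False  # True only for LemonSlice, per the note above


PROVIDERS = [
    Provider("OpenAI Realtime", "Lowest", avatar=False),
    Provider("ElevenLabs", "Low", avatar=False),
    Provider("Gemini Live", "Low", avatar=False),
    Provider("xAI Realtime", "Low", avatar=False),
    Provider("Anam Avatar", "Low", avatar=True),
    Provider("LemonSlice", "Medium", avatar=True, legacy=True),
]


def candidates(need_avatar: bool, include_legacy: bool = False) -> List[str]:
    """Return provider names matching the avatar requirement.

    Legacy providers are skipped unless explicitly requested.
    """
    return [
        p.name
        for p in PROVIDERS
        if p.avatar == need_avatar and (include_legacy or not p.legacy)
    ]
```

Excluding legacy providers by default mirrors the guidance to choose Anam for new avatar setups.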

Choosing a Provider

Recommendation: Gemini Live or OpenAI Realtime. Native audio models give the lowest latency, which matters most for back-and-forth dialogue with learners. Gemini Live shines for multilingual classrooms; OpenAI Realtime has the broadest model coverage and stock voices.
Recommendation: ElevenLabs. Premium voice quality is what customers remember. ElevenLabs also supports a published voice agent that can be embedded in support widgets via the Chat Completions API bridge.
Recommendation: Anam Avatar. A lifelike animated avatar dramatically raises perceived presence, which is useful for product demos, role-play training, and onboarding videos. Pair it with your conversation model of choice for the brains, and Anam handles voice, lip-sync, and video.
Recommendation: Anam Avatar or OpenAI Realtime with transcripts enabled. For users who benefit from a visible speaking face (lip-reading, comprehension support), Anam adds an avatar. For audio-only accessibility, enable the live transcript so users can read along.
Recommendation: OpenAI Realtime or Gemini Live. Native-audio models handle code-heavy content and technical terms well. Avatar providers can struggle to render code snippets in spoken form.

Tradeoffs

Voice quality is highest with ElevenLabs (their TTS is the benchmark) and the avatar providers when paired with a quality voice. Native-audio models (Gemini Live, OpenAI Realtime, xAI Realtime) sound natural and conversational but don’t expose voice cloning.

Latency is lowest with OpenAI Realtime via WebRTC, followed closely by Gemini Live and xAI Realtime over WebSocket. ElevenLabs adds a small additional hop because Pria acts as the brain via Custom LLM. Avatar providers add video rendering on top.

Cost depends on whether you bring your own vendor key (BYO) or use Pria-included credits. Avatars in particular are billed per minute on top of LLM costs, so it’s worth picking the provider where the avatar matters rather than enabling it universally.

Where Avatars Matter

Avatars are not always the right answer — they add cost and a layer of visual polish that can distract from purely informational use cases. Consider an avatar when:
  • Sales demos — a face on screen makes the product feel “alive” and quotable in marketing.
  • Training & onboarding — embodied presence raises engagement and retention for long-form content.
  • Accessibility — visible mouth movement supports lip-reading and comprehension.
  • Marketing landing pages — a greeter avatar is more memorable than a chat widget.
Skip the avatar for fast Q&A, code help, or text-heavy conversations where users want to read along.

Setup Overview

Each provider has its own setup page with detailed steps:

ElevenLabs Voice Agent

Connect an ElevenLabs ConvAI agent to Pria as a Custom LLM, deploy embeddable widgets.

Gemini Live Voice

Google’s native-audio WebSocket API for low-latency multilingual voice.

Anam Avatar

Lifelike animated avatar driven by your choice of conversation model.

OpenAI / xAI Realtime

Configure built-in realtime providers from the Admin guide.
At a high level, every setup involves three steps:
  1. Get a vendor API key — bring your own from the provider, or contact the Praxis AI team at humans@praxis-ai.com to request access to a managed key.
  2. Configure the Digital Twin — Admin → Configuration tab → paste the API key (and any agent / avatar IDs) into the provider’s section.
  3. Select the voice provider in Personalization — Admin → Personalization tab → set the Convo Mode voice provider to the vendor you configured, then pick a default voice.
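As a rough pre-flight check, the three steps above can be expressed as a small validation function. This is a sketch under stated assumptions: the dictionary layout and the keys `api_key`, `convo_voice_provider`, and `default_voice` are hypothetical stand-ins for the values entered in the Configuration and Personalization tabs, not a documented Pria schema.

```python
from typing import Dict, List


def validate_voice_setup(configuration: Dict[str, Dict[str, str]],
                         personalization: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready.

    configuration   -- hypothetical mirror of the Configuration tab,
                       keyed by provider name
    personalization -- hypothetical mirror of the Personalization tab
    """
    problems: List[str] = []

    # Step 3: a provider must be selected for Convo Mode.
    provider = personalization.get("convo_voice_provider")
    if not provider:
        return ["no Convo Mode voice provider selected"]

    # Steps 1 and 2: the selected provider needs an API key in Configuration.
    if not configuration.get(provider, {}).get("api_key"):
        problems.append(f"{provider}: API key missing in Configuration")

    # Step 3 (continued): a default voice should be picked.
    if not personalization.get("default_voice"):
        problems.append(f"{provider}: no default voice selected")

    return problems
```

For example, selecting a provider in Personalization before pasting its key in Configuration would surface as a single "API key missing" problem.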

Multi-Provider Strategy

Pria does not force a single voice provider per institution. Common patterns:
  • Default at the Digital Twin level, override per assistant. Set ElevenLabs as the institution default for branded support, but configure a specific Sales assistant to use Anam for demos.
  • Different Digital Twins, different providers. A K-12 tutoring Twin uses Gemini Live for multilingual support; a corporate training Twin uses Anam for an avatar-led experience.
  • Per-user voice override. Users can pick a different voice from the provider’s catalog in their Instance Settings without changing the provider.
This flexibility lets you optimize for the right tradeoff per use case rather than locking the whole institution into one vendor’s strengths and weaknesses.
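The "default, then override per assistant" pattern amounts to a simple lookup with a fallback. The sketch below is illustrative only; `resolve_provider` and the assistant names are hypothetical and do not correspond to a documented Pria function.

```python
from typing import Dict, Optional


def resolve_provider(twin_default: str,
                     assistant_overrides: Dict[str, str],
                     assistant: Optional[str] = None) -> str:
    """Pick the voice provider for a conversation.

    An assistant-specific override wins; otherwise the Digital Twin's
    default provider is used.
    """
    if assistant is not None and assistant in assistant_overrides:
        return assistant_overrides[assistant]
    return twin_default


# Example from the text: ElevenLabs by default, Anam for the Sales assistant.
overrides = {"Sales": "Anam Avatar"}
sales_provider = resolve_provider("ElevenLabs", overrides, assistant="Sales")
support_provider = resolve_provider("ElevenLabs", overrides, assistant="Support")
```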

Per-User Voice Preferences

Each user can override the default voice from their Instance Settings panel. The override applies only to the user and only for that Digital Twin. The voice provider (the vendor) cannot be changed by users — that’s an admin decision — but the specific voice within the active provider’s catalog is user-selectable. This is useful when:
  • Different team members have different voice preferences for accessibility or focus.
  • Users want a male/female/neutral voice without affecting the rest of the institution.
  • A user is testing voices before recommending a new default to their admin.
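The scoping rules above (per user, per Digital Twin, and only within the active provider's catalog) can be sketched as a lookup keyed by user and Twin. The function name, the identifiers, and the storage layout are assumptions made for illustration; the voice names come from the xAI Realtime row of the provider matrix.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical storage: (user_id, twin_id) -> chosen voice name.
VoiceOverrides = Dict[Tuple[str, str], str]


def resolve_voice(user_id: str, twin_id: str, default_voice: str,
                  catalog: Sequence[str], overrides: VoiceOverrides) -> str:
    """Return the voice to use for this user on this Digital Twin.

    The override is honored only if it exists for this exact user/Twin pair
    and names a voice in the active provider's catalog; otherwise the
    admin-selected default applies.
    """
    chosen = overrides.get((user_id, twin_id))
    if chosen in catalog:
        return chosen
    return default_voice


XAI_VOICES = ["eve", "ara", "rex", "sal", "leo"]
overrides: VoiceOverrides = {("alice", "tutor-twin"): "rex"}
```

Because the key includes the Twin, Alice's choice on one Twin leaves her voice on every other Twin, and everyone else's voice, at the default.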