Convo Mode is Pria’s live two-way voice surface. Audio streams from the microphone to a speech-to-text engine, through your selected LLM, and back out as spoken audio, with optional animated avatars on top. This page covers the Admin-side setup. For the end-user experience, see the user-guide page on Convo Mode.
What Convo Mode Is
Once enabled, users see a microphone affordance in the Pria interface that opens a live audio session. The session runs entirely browser-to-provider after Pria mints a short-lived session token, so your long-lived API keys never reach the browser.
What a session does end-to-end:
- Captures microphone audio and streams it to the provider.
- Streams partial transcripts back to the UI (optional).
- Runs your assistant’s prompt, tools, RAG, and personalization on every turn.
- Speaks the response back through the provider’s TTS voice.
- Optionally renders an animated avatar synced to the audio.
Audio frames never traverse the Pria backend in steady state. Pria brokers the session token, then the browser opens a direct connection to the voice provider.
Who Can Use Convo Mode
Two toggles on the Digital Twin gate access.
Enable Convo Mode (rtEnabled)
Master switch for the entire Digital Twin. When off, the microphone affordance is hidden from every user. Default: off.
Admin-only (rtAdminOnly)
When on, only Admin users see Convo Mode. Useful while you’re testing a new provider, tuning voices, or troubleshooting an avatar setup before exposing it to learners. Default: on — flip to off once you’re ready for everyone.
Two more toggles control what the Convo Mode panel offers once a session is open.
Text input alongside voice (rtTextInputEnabled)
Lets users type into the Convo Mode panel as well as speak. Helpful for noisy environments or accessibility.
Show running transcript (rtTranscriptEnabled)
Renders the live STT transcript inside the Convo Mode widget so users can see what was heard.
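The access rules above can be sketched as a small predicate. This is a minimal illustration, not Pria's implementation: the `ConvoModeToggles` class and `microphone_visible` function are hypothetical, the field names simply mirror the setting keys on this page, and the defaults for the two panel toggles are assumptions because this page does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvoModeToggles:
    """Hypothetical container; field names mirror the setting keys on this page."""
    rtEnabled: bool = False            # master switch, documented default: off
    rtAdminOnly: bool = True           # documented default: on
    rtTextInputEnabled: bool = False   # default not documented here (assumed off)
    rtTranscriptEnabled: bool = False  # default not documented here (assumed off)


def microphone_visible(toggles: ConvoModeToggles, is_admin: bool) -> bool:
    """Return True when a user should see the Convo Mode microphone affordance."""
    if not toggles.rtEnabled:
        # Master switch off: hidden from every user, Admins included.
        return False
    if toggles.rtAdminOnly:
        # Testing phase: only Admin users see Convo Mode.
        return is_admin
    return True


if __name__ == "__main__":
    rollout = ConvoModeToggles(rtEnabled=True, rtAdminOnly=False)
    print(microphone_visible(rollout, is_admin=False))
```

With the documented defaults, nobody sees the microphone until `rtEnabled` is switched on, and learners see it only after `rtAdminOnly` is switched off.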
Choosing a Provider
Pick a provider by setting the Realtime Model field on the Digital Twin. The model string determines which provider Pria routes to.
OpenAI Realtime
Lowest latency, broadest model coverage, built-in OpenAI voices. The default choice for most Digital Twins. Uses WebRTC.
ElevenLabs
Premium voices via the ElevenLabs ConvAI agent bridge. Use when voice quality is the priority and your agent is already configured in the ElevenLabs dashboard.
Gemini Live
Google’s audio-native session with thinking support. Uses WebSocket; 30+ prebuilt voices.
xAI Realtime
Grok voice with five built-in voices (eve, ara, rex, sal, leo). Uses WebSocket.
Anam Avatar
Animated avatar driven by your selected LLM. Anam owns mic, STT, TTS, and video; Pria supplies the assistant turn text — so your prompts, RAG, and tools still run on Pria.
LemonSlice Avatar
Legacy avatar path via Daily.co. Prefer Anam for new setups; LemonSlice is maintained while existing tenants migrate.
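To make the routing rule concrete, here is a hedged sketch of how a model string could be mapped to a provider. It uses the Realtime Model values listed under Per-Provider Configuration. The function name `provider_for_model`, the exact-match and prefix-matching strategy, and the choice to reject unknown strings are all assumptions for illustration; Pria's actual routing logic is not documented on this page.

```python
# Exact Realtime Model values documented for the fixed-string providers.
EXACT_MODELS = {
    "elevenlabs": "ElevenLabs",
    "anam_pria_custom_llm": "Anam Avatar",
    "lemonslice": "LemonSlice Avatar",
}

# Prefixes derived from the documented patterns (gpt-realtime-*,
# gpt-4o-realtime-*, gemini-*, grok-*). Prefix matching is an assumption.
PREFIX_MODELS = (
    ("gpt-realtime-", "OpenAI Realtime"),
    ("gpt-4o-realtime-", "OpenAI Realtime"),
    ("gemini-", "Gemini Live"),
    ("grok-", "xAI Realtime"),
)


def provider_for_model(realtime_model: str) -> str:
    """Map a Realtime Model string to the provider name used on this page."""
    model = realtime_model.strip()
    if model in EXACT_MODELS:
        return EXACT_MODELS[model]
    for prefix, provider in PREFIX_MODELS:
        if model.startswith(prefix):
            return provider
    raise ValueError(f"unrecognized Realtime Model: {realtime_model!r}")


if __name__ == "__main__":
    for name in ("grok-3-fast", "elevenlabs", "anam_pria_custom_llm"):
        print(name, "->", provider_for_model(name))
```

Failing loudly on an unrecognized string is a design choice in this sketch: a silent default would hide a typo in the Realtime Model field.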
Per-Provider Configuration
OpenAI Realtime
| Field | Purpose |
|---|---|
| Realtime Model | A gpt-realtime-* or gpt-4o-realtime-* model from the catalog. |
| Voice | One of the OpenAI realtime voices (e.g. marin, cedar). Invalid voices fall back to marin. |
| VAD Eagerness | low / medium / high — how aggressively the model decides the user has finished speaking. |
| Noise Reduction | near_field for headsets / close mics, far_field for laptops / conference rooms. Leave blank to disable. |
| Transcription Language | Optional hint (e.g. en, fr-FR) — improves STT accuracy when your users speak a single non-English language. |
API key resolution: the per-tenant openai_api_key is used first, then the platform fallback.
ElevenLabs
| Field | Purpose |
|---|---|
| Realtime Model | Set to elevenlabs. |
| Agent ID | The ConvAI agent ID from your ElevenLabs dashboard. The agent itself defines the voice, prompt, and tool surface. |
| Connection Method | webrtc or websocket (default). |
| API Key | Per-tenant ElevenLabs key. Falls back to platform default. |
Gemini Live
| Field | Purpose |
|---|---|
| Realtime Model | A gemini-* realtime model (e.g. gemini-2.5-flash-native-audio-preview-12-2025). |
| Gemini Voice | One of the prebuilt Gemini voices, such as Puck, Charon, Kore, Fenrir, or Aoede. |
| Gemini API Key | Per-tenant key. Falls back to platform default. |
xAI Realtime
| Field | Purpose |
|---|---|
| Realtime Model | A grok-* model (e.g. grok-3-fast). |
| xAI Voice | eve, ara, rex, sal, or leo. |
| xAI API Key | Per-tenant key. Falls back to platform default. |
grok-fast: this is a current model limitation on xAI’s side.
Anam Avatar
| Field | Purpose |
|---|---|
| Realtime Model | Set to anam_pria_custom_llm. |
| Avatar ID | Vendor avatar identifier from Anam. |
| Voice ID | Vendor voice identifier from Anam. |
| Placeholder Image URL | Shown while the video stream is loading. |
| Loading Video URL | Optional looping animation while a turn is in flight. |
| Intro Message | Optional line the avatar speaks on session start. |
| Conversation Model | Optional override for the LLM that powers voice turns. Leave blank to use the Digital Twin’s default conversation model. |
| Anam API Key | Per-tenant key. Falls back to platform default. |
LemonSlice Avatar
| Field | Purpose |
|---|---|
| Realtime Model | Set to lemonslice. |
| Agent ID | LemonSlice agent identifier. |
| Model | Optional per-tenant model override for voice turns. |
| Placeholder Image URL | Shown before the video stream starts. |
| Loading Video URL | Optional loop while turns are in flight. |
| Intro Message | Optional line spoken on session start. |
| Allow Imagine | When on, lets the avatar generate images mid-session via the imagination tool. |
| LemonSlice API Key | Per-tenant key. Falls back to platform default. |
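Each provider table above ends with a key that falls back to the platform default. A minimal sketch of that resolution order follows. The function name `resolve_api_key`, the treatment of blank strings as "not set", and the error raised when neither key exists are assumptions, not documented Pria behavior.

```python
from typing import Optional


def resolve_api_key(tenant_key: Optional[str], platform_key: Optional[str]) -> str:
    """Prefer the per-tenant key; otherwise fall back to the platform default."""
    for candidate in (tenant_key, platform_key):
        # Treat None and whitespace-only values as "not configured" (assumption).
        if candidate and candidate.strip():
            return candidate.strip()
    raise LookupError("no API key configured for this provider")


if __name__ == "__main__":
    # Placeholder strings only; never hard-code real keys.
    print(resolve_api_key(None, "platform-key-placeholder"))
```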
Voice Activity Detection (VAD)
VAD is how the provider decides when a user has finished speaking. Two knobs apply to OpenAI Realtime:
- VAD Eagerness: low waits longer before treating the user's turn as finished (good for thoughtful conversations); high cuts in faster (good for quick Q&A drills).
- Noise Reduction: near_field cleans up headset audio; far_field cleans up laptop mics in rooms with background noise. Leave blank if your users are on quality hardware.
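The allowed values for these knobs, plus the voice fallback noted in the OpenAI Realtime table, can be checked before a session starts. The sketch below is illustrative only: `normalize_openai_settings`, the output key names, and the choice to raise on bad VAD or noise values are hypothetical, and `SAMPLE_VOICES` holds just the two voices this page names rather than the full catalog.

```python
VAD_EAGERNESS = frozenset({"low", "medium", "high"})
NOISE_REDUCTION = frozenset({"near_field", "far_field"})
SAMPLE_VOICES = frozenset({"marin", "cedar"})  # illustrative subset, not the full list
FALLBACK_VOICE = "marin"                       # documented fallback for invalid voices


def normalize_openai_settings(
    voice: str,
    vad_eagerness: str,
    noise_reduction: str = "",
    known_voices: frozenset = SAMPLE_VOICES,
) -> dict:
    """Validate OpenAI Realtime admin fields and apply the documented voice fallback."""
    if vad_eagerness not in VAD_EAGERNESS:
        raise ValueError(f"VAD Eagerness must be one of {sorted(VAD_EAGERNESS)}")
    noise = noise_reduction.strip()
    if noise and noise not in NOISE_REDUCTION:
        raise ValueError(f"Noise Reduction must be blank or one of {sorted(NOISE_REDUCTION)}")
    return {
        # Invalid voices fall back to marin, as the table states.
        "voice": voice if voice in known_voices else FALLBACK_VOICE,
        "vad_eagerness": vad_eagerness,
        # A blank field disables noise reduction; None stands in for "disabled" here.
        "noise_reduction": noise or None,
    }


if __name__ == "__main__":
    print(normalize_openai_settings("cedar", "low", "far_field"))
```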
Transcription Language
If your users speak a single non-English language, set the Transcription Language field (e.g. fr, es, de, ja). It biases the STT engine and reduces transcription errors. Leave blank for English or multilingual rooms.
This setting applies to OpenAI Realtime only.
Avatar Imagination Prompts
LemonSlice’s Allow Imagine toggle lets users ask the avatar to generate images during a voice session. The image is produced via your Digital Twin’s Image Generation model and shown alongside the avatar. Disable it if you don’t want generated images appearing during voice conversations. Anam supports tools (including image generation) through the per-turn voice handler, so no separate toggle is needed.
Testing Your Setup
Keep Convo Mode admin-only while testing
Leave Admin-only on while you tune voice, VAD, and avatar settings. You’ll be the only one who sees the microphone affordance.
Verify the basics
Confirm: the avatar (if any) appears, your voice is transcribed, the assistant responds in the chosen voice, and your assistant’s tools still fire mid-conversation.
Tune VAD if needed
If the assistant interrupts users mid-thought, drop VAD Eagerness to low. If responses feel sluggish, bump it to high.
Cost Considerations
Every provider bills for Convo Mode usage, and cost grows with session length. Billing models vary widely:
- OpenAI Realtime and Gemini Live are billed by the provider on input and output audio tokens.
- ElevenLabs charges per character of TTS output plus session minutes.
- xAI Realtime is billed on audio minutes.
- Anam and LemonSlice add an avatar surcharge on top of the underlying LLM cost.