Convo Mode is Pria’s live two-way voice surface. Audio streams from the microphone to a speech-to-text engine, through your selected LLM, and back out as spoken audio, with optional animated avatars on top. This page covers the Admin-side setup. For the end-user experience, see the user-guide page on Convo Mode.

What Convo Mode Is

Once enabled, users see a microphone affordance in the Pria interface that opens a live audio session. The session runs entirely browser-to-provider after Pria mints a short-lived session token, so your long-lived API keys never reach the browser.
What a session does end-to-end:
  • Captures microphone audio and streams it to the provider.
  • Streams partial transcripts back to the UI (optional).
  • Runs your assistant’s prompt, tools, RAG, and personalization on every turn.
  • Speaks the response back through the provider’s TTS voice.
  • Optionally renders an animated avatar synced to the audio.
Audio frames never traverse the Pria backend in steady state. Pria brokers the session token, then the browser opens a direct connection to the voice provider.
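
The short-lived token idea can be illustrated with a minimal sketch. This is not Pria's implementation: the function names, the payload fields, the 60-second lifetime, and the HMAC scheme are all assumptions chosen for illustration, and in practice the token may well be issued by the voice provider itself. The point is only that the browser receives something that expires quickly, while the long-lived secret stays on the server.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical server-side secret. It stands in for whatever long-lived
# credential the backend holds and never sends to the browser.
SERVER_SECRET = b"replace-with-a-real-secret"


def mint_session_token(user_id: str, provider: str, ttl_seconds: int = 60,
                       now: Optional[float] = None) -> str:
    """Return a signed token that expires ttl_seconds after it is issued."""
    issued_at = time.time() if now is None else now
    payload = {"sub": user_id, "provider": provider, "exp": issued_at + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    signature = hmac.new(SERVER_SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{signature}"


def verify_session_token(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature matches and the token is unexpired."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(SERVER_SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body.encode()))
    current = time.time() if now is None else now
    return payload if current < payload["exp"] else None
```

A leaked token of this kind is only useful until its expiry, which is the property the paragraph above relies on.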

Who Can Use Convo Mode

Two toggles on the Digital Twin gate access:
  • Master switch for the entire Digital Twin. When off, the microphone affordance is hidden from every user. Default: off.
  • Admin-only. When on, only Admin users see Convo Mode. Useful while you’re testing a new provider, tuning voices, or troubleshooting an avatar setup before exposing it to learners. Default: on; flip to off once you’re ready for everyone.
Two further options shape the session itself:
  • Typed input lets users type into the Convo Mode panel as well as speak. Helpful for noisy environments or accessibility.
  • The transcript display renders the live STT transcript inside the Convo Mode widget so users can see what was heard.
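
The way the master switch and the Admin-only toggle combine can be written as a small predicate. The class and field names below are illustrative, not Pria's actual setting keys; only the defaults (master switch off, Admin-only on) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class ConvoModeAccess:
    # Illustrative field names; the defaults mirror the documented ones.
    enabled: bool = False     # master switch, default off
    admin_only: bool = True   # Admin-only gate, default on


def can_see_microphone(access: ConvoModeAccess, is_admin: bool) -> bool:
    """Return True if this user should see the microphone affordance."""
    if not access.enabled:
        # Master switch off hides Convo Mode from everyone, Admins included.
        return False
    return is_admin or not access.admin_only
```

With the defaults nobody sees the microphone; turning on only the master switch exposes it to Admins, and turning Admin-only off exposes it to everyone.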

Choosing a Provider

Pick a provider by setting the Realtime Model field on the Digital Twin. The model string determines which provider Pria routes to.
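
As a rough sketch of that routing, the function below maps a Realtime Model string to a provider using the model strings listed under Per-Provider Configuration. The function name, the returned provider labels, and the exact matching rules (prefix match, case-insensitive) are assumptions; Pria's real routing logic is not shown on this page.

```python
def route_provider(realtime_model: str) -> str:
    """Map a Realtime Model string to a provider label (illustrative only)."""
    model = realtime_model.strip().lower()

    # Providers selected by one fixed model string.
    exact = {
        "elevenlabs": "elevenlabs",
        "anam_pria_custom_llm": "anam",
        "lemonslice": "lemonslice",
    }
    if model in exact:
        return exact[model]

    # Providers selected by a model-name prefix.
    prefixes = (
        ("gpt-realtime", "openai"),
        ("gpt-4o-realtime", "openai"),
        ("gemini-", "gemini"),
        ("grok-", "xai"),
    )
    for prefix, provider in prefixes:
        if model.startswith(prefix):
            return provider

    raise ValueError(f"unrecognized realtime model: {realtime_model!r}")
```

Raising on an unknown string, rather than silently defaulting to one provider, is a design choice of this sketch.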

OpenAI Realtime

Lowest latency, broadest model coverage, built-in OpenAI voices. The default choice for most Digital Twins. Uses WebRTC.

ElevenLabs

Premium voices via the ElevenLabs ConvAI agent bridge. Use when voice quality is the priority and your agent is already configured in the ElevenLabs dashboard.

Gemini Live

Google’s audio-native session with thinking support. Uses WebSocket; 30+ prebuilt voices.

xAI Realtime

Grok voice with five built-in voices (eve, ara, rex, sal, leo). Uses WebSocket.

Anam Avatar

Animated avatar driven by your selected LLM. Anam owns mic, STT, TTS, and video; Pria supplies the assistant turn text — so your prompts, RAG, and tools still run on Pria.

LemonSlice Avatar

Legacy avatar path via Daily.co. Prefer Anam for new setups; LemonSlice is maintained while existing tenants migrate.

Per-Provider Configuration

OpenAI Realtime

Field | Purpose
Realtime Model | A gpt-realtime-* or gpt-4o-realtime-* model from the catalog.
Voice | One of the OpenAI realtime voices (e.g. marin, cedar). Invalid voices fall back to marin.
VAD Eagerness | low / medium / high: how aggressively the model decides the user has finished speaking.
Noise Reduction | near_field for headsets / close mics, far_field for laptops / conference rooms. Leave blank to disable.
Transcription Language | Optional hint (e.g. en, fr-FR) that improves STT accuracy when your users speak a single non-English language.

The OpenAI API key resolves in this order: per-model key in Custom Models → Digital Twin’s openai_api_key → platform fallback.
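
The resolution order can be sketched as a first-non-empty lookup. The function name, parameter names, and source labels are hypothetical; only the order (Custom Models key, then the Digital Twin's openai_api_key, then the platform fallback) comes from the text above.

```python
from typing import Optional, Tuple


def resolve_openai_key(per_model_key: Optional[str],
                       twin_key: Optional[str],
                       platform_key: Optional[str]) -> Tuple[str, str]:
    """Return (key, source) for the first configured key in priority order."""
    candidates = (
        ("custom_model", per_model_key),   # per-model key in Custom Models
        ("digital_twin", twin_key),        # Digital Twin's openai_api_key
        ("platform", platform_key),        # platform fallback
    )
    for source, key in candidates:
        if key:  # skips both None and empty strings
            return key, source
    raise LookupError("no OpenAI API key configured at any level")
```

Returning the source alongside the key makes it easy to log which level supplied the credential for a given session.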

ElevenLabs

Field | Purpose
Realtime Model | Set to elevenlabs.
Agent ID | The ConvAI agent ID from your ElevenLabs dashboard. The agent itself defines the voice, prompt, and tool surface.
Connection Method | webrtc or websocket (default).
API Key | Per-tenant ElevenLabs key. Falls back to platform default.

ElevenLabs is the only provider where “agent” is a vendor-side concept: your ElevenLabs agent does the talking. Configure it to call Pria back as a Custom LLM if you want Pria’s prompts, RAG, and tools in the loop.

Gemini Live

Field | Purpose
Realtime Model | A gemini-* realtime model (e.g. gemini-2.5-flash-native-audio-preview-12-2025).
Gemini Voice | One of the prebuilt Gemini voices, e.g. Puck, Charon, Kore, Fenrir, or Aoede.
Gemini API Key | Per-tenant key. Falls back to platform default.

Google caps Gemini sessions at around 10 minutes per connection; Pria configures a sliding context window so longer conversations keep working across reconnects.
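
To make the sliding-window idea concrete, here is a minimal client-side sketch that keeps only the most recent turns within a size budget and replays them when a new connection replaces an expired one. It is an illustration of the concept, not how Pria or Google implement it: the class name, the character-based budget, and the replay method are all assumptions.

```python
from collections import deque
from typing import Deque, List, Tuple

Turn = Tuple[str, str]  # (role, text)


class SlidingContext:
    """Keep the newest turns whose combined text fits within max_chars."""

    def __init__(self, max_chars: int) -> None:
        self.max_chars = max_chars
        self.turns: Deque[Turn] = deque()
        self.size = 0

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))
        self.size += len(text)
        # Drop the oldest turns until we are back under budget,
        # always keeping at least the newest turn.
        while self.size > self.max_chars and len(self.turns) > 1:
            _, dropped = self.turns.popleft()
            self.size -= len(dropped)

    def replay(self) -> List[Turn]:
        """Turns to resend when a fresh connection is opened."""
        return list(self.turns)
```

A real implementation would more likely budget in tokens than characters; characters keep the sketch dependency-free.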

xAI Realtime

Field | Purpose
Realtime Model | A grok-* model (e.g. grok-3-fast).
xAI Voice | eve, ara, rex, sal, or leo.
xAI API Key | Per-tenant key. Falls back to platform default.

Note: image markdown is sometimes garbled in spoken output on grok-fast; this is a current model limitation on xAI’s side.

Anam Avatar

Field | Purpose
Realtime Model | Set to anam_pria_custom_llm.
Avatar ID | Vendor avatar identifier from Anam.
Voice ID | Vendor voice identifier from Anam.
Placeholder Image URL | Shown while the video stream is loading.
Loading Video URL | Optional looping animation while a turn is in flight.
Intro Message | Optional line the avatar speaks on session start.
Conversation Model | Optional override for the LLM that powers voice turns. Leave blank to use the Digital Twin’s default conversation model.
Anam API Key | Per-tenant key. Falls back to platform default.

Pria runs the LLM, RAG, tools, and personalization for every turn; Anam just renders the avatar and handles mic/TTS.

LemonSlice Avatar

Field | Purpose
Realtime Model | Set to lemonslice.
Agent ID | LemonSlice agent identifier.
Model | Optional per-tenant model override for voice turns.
Placeholder Image URL | Shown before the video stream starts.
Loading Video URL | Optional loop while turns are in flight.
Intro Message | Optional line spoken on session start.
Allow Imagine | When on, lets the avatar generate images mid-session via the imagination tool.
LemonSlice API Key | Per-tenant key. Falls back to platform default.

LemonSlice is being deprecated in favour of Anam. Use Anam for any new Digital Twin setup.

Voice Activity Detection (VAD)

VAD is how the provider decides when a user has finished speaking. Two settings apply to OpenAI Realtime:
  • VAD Eagerness: low waits longer before treating a pause as the end of a turn (good for thoughtful conversations); high cuts in faster (good for quick Q&A drills).
  • Noise Reduction: near_field cleans up headset audio; far_field cleans up laptop mics in rooms with background noise. Leave blank if your users are on quality hardware.
Gemini, xAI, ElevenLabs, Anam, and LemonSlice manage VAD internally, so these two settings are OpenAI-only.
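
The allowed values for the OpenAI-only settings can be checked before a session starts. The sketch below is an assumption about how such a check might look: the function name and output keys are hypothetical, the list of valid voices is passed in rather than hard-coded, and rejecting a bad eagerness or noise-reduction value with an error is this sketch's choice. The marin fallback for invalid voices is the behaviour described under Per-Provider Configuration.

```python
from typing import Dict, Optional, Set

EAGERNESS = {"low", "medium", "high"}
NOISE_REDUCTION = {"near_field", "far_field"}
FALLBACK_VOICE = "marin"


def normalize_openai_voice_settings(voice: str,
                                    eagerness: str,
                                    noise_reduction: Optional[str],
                                    known_voices: Set[str]) -> Dict[str, Optional[str]]:
    """Validate the OpenAI Realtime voice settings and apply the voice fallback."""
    if eagerness not in EAGERNESS:
        raise ValueError(f"VAD Eagerness must be one of {sorted(EAGERNESS)}")

    cleaned = (noise_reduction or "").strip()
    if cleaned and cleaned not in NOISE_REDUCTION:
        raise ValueError(f"Noise Reduction must be blank or one of {sorted(NOISE_REDUCTION)}")

    return {
        # Unknown voices fall back to marin instead of failing the session.
        "voice": voice if voice in known_voices else FALLBACK_VOICE,
        "vad_eagerness": eagerness,
        # A blank value means noise reduction is disabled.
        "noise_reduction": cleaned or None,
    }
```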

Transcription Language

If your users speak a single non-English language, set the Transcription Language field (e.g. fr, es, de, ja). It biases the STT engine and reduces transcription errors. Leave blank for English or multilingual rooms. This setting applies to OpenAI Realtime only.

Avatar Imagination Prompts

LemonSlice’s Allow Imagine toggle lets users ask the avatar to generate images during a voice session. The image is produced via your Digital Twin’s Image Generation model and shown alongside the avatar. Disable if you don’t want generated images appearing during voice conversations. Anam supports tools (including image generation) through the per-turn voice handler — no separate toggle needed.

Testing Your Setup

1. Keep Convo Mode admin-only while testing
   Leave Admin-only on while you tune voice, VAD, and avatar settings. You’ll be the only one who sees the microphone affordance.

2. Open Convo Mode
   Sign in as an Admin user, open any conversation, and start a Convo Mode session.

3. Verify the basics
   Confirm: the avatar (if any) appears, your voice is transcribed, the assistant responds in the chosen voice, and your assistant’s tools still fire mid-conversation.

4. Tune VAD if needed
   If the assistant interrupts users mid-thought, drop VAD Eagerness to low. If responses feel sluggish, bump it to high.

5. Flip Admin-only off
   Once you’re satisfied, turn off Admin-only so every user can use Convo Mode.

Cost Considerations

Every provider bills for Convo Mode usage, but the billing unit and the cost vary widely:
  • OpenAI Realtime and Gemini Live are billed by the provider on input and output audio tokens, so cost grows with session length.
  • ElevenLabs charges per character of TTS output plus session minutes.
  • xAI Realtime is billed on audio minutes.
  • Anam and LemonSlice add an avatar surcharge on top of the underlying LLM cost.
If you bring your own provider API keys via the Custom Models flow, those minutes are billed directly to your provider account. Otherwise they go through Pria’s platform billing.
Avatars (Anam, LemonSlice) are the most expensive option per minute. If cost is a concern and you don’t need a visible persona, OpenAI Realtime or Gemini Live with a quality voice will deliver excellent results at a fraction of the cost.
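
Because the billing units differ by provider, a rough estimate has to add several kinds of metered usage. The sketch below shows the arithmetic only. The class names, field names, and the per-minute shape of the avatar surcharge are assumptions, and every rate defaults to zero: fill them in from your provider's current price sheet, since no real prices are given on this page.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    minutes: float = 0.0
    input_audio_tokens: int = 0
    output_audio_tokens: int = 0
    tts_characters: int = 0


@dataclass
class Rates:
    # Placeholder rates; set only the ones your provider actually charges.
    per_minute: float = 0.0
    per_input_audio_token: float = 0.0
    per_output_audio_token: float = 0.0
    per_tts_character: float = 0.0
    avatar_surcharge_per_minute: float = 0.0  # assumed to scale with minutes


def estimate_session_cost(usage: Usage, rates: Rates) -> float:
    """Sum token, character, and minute charges, plus any avatar surcharge."""
    metered = (
        usage.input_audio_tokens * rates.per_input_audio_token
        + usage.output_audio_tokens * rates.per_output_audio_token
        + usage.tts_characters * rates.per_tts_character
        + usage.minutes * rates.per_minute
    )
    surcharge = usage.minutes * rates.avatar_surcharge_per_minute
    return metered + surcharge
```

Leaving a rate at zero drops that term, so the same function covers a token-billed provider, a per-minute provider, or an avatar layered on top of either.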