Activating Conversation Mode

1. Locate the Convo Mode icon

From the main interface, look for the Conversation icon on the text input bar.

2. Enable the microphone

Allow microphone use in the browser.

3. Start speaking

Click the start button and begin talking. Your Digital Twin will listen and respond with voice.

The Convo Mode panel

Once you launch Convo Mode, a floating panel appears over your conversation — Pria’s avatar orb with the controls beneath it.
Figure: The Convo Mode floating panel, showing Pria's circular avatar, a START button, an expand button, a Cedar voice selector, and a close ×.
Control | What it does
START / Stop | Begin or end the live voice session. While it’s running you just speak; there’s no push-to-talk to hold.
Mute | Mutes your microphone so Pria stops listening; unmute to resume. Your audio is captured but not sent while muted.
Expand / Contract | Switch the panel between compact and expanded display modes (below).
Voice | Pick the voice Pria speaks in (e.g. Cedar). The list adapts to the active provider, and is hidden when ElevenLabs manages voices. See Switching voices.
Live transcript | As you talk, your words and Pria’s replies are transcribed in real time (e.g. “You: …”) so you can read along. Whether it shows is a per-Twin setting.
Close | Closes the panel and ends the session, returning you to the text chat.

Display modes

Mode | Description
Compact | The floating orb beside your chat. Great for quick voice exchanges while you keep working in text.
Expanded | A larger, immersive view that fills the screen, putting the avatar (and transcript, where enabled) front and centre. Ideal for a focused conversation or a screen-share.
Figure: Convo Mode in expanded mode, with the avatar orb centred on a full-screen background alongside START, the display-mode toggle, and the voice selector.
Tap the expand button to grow the panel, and the same button (now Contract) to shrink it back. On supported devices an administrator can enable true fullscreen immersive mode — and a fullscreen transcript — per Digital Twin in Instance Settings → Voice.
Mute and the live transcript appear once a voice session is actually running — press START to see them.
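The visibility rules described in this section can be summarised in a few lines. The sketch below is illustrative only: the function name, its parameters, and the control labels are hypothetical and do not describe Pria's actual implementation. It assumes that Mute and the live transcript need a running session, that the transcript also depends on the per-Twin setting, and that the voice selector is hidden when ElevenLabs manages voices.

```python
def visible_controls(session_running: bool,
                     transcript_enabled: bool,
                     voices_managed_externally: bool) -> list[str]:
    """Return the panel controls a user would expect to see (hypothetical model)."""
    # START/Stop, the display-mode toggle, and Close are assumed to be always present.
    controls = ["Stop" if session_running else "START", "Expand / Contract", "Close"]

    # The voice selector is hidden when the provider (ElevenLabs) manages voices.
    if not voices_managed_externally:
        controls.append("Voice")

    # Mute and the live transcript only appear once a session is running.
    if session_running:
        controls.append("Mute")
        if transcript_enabled:
            controls.append("Live transcript")

    return controls


if __name__ == "__main__":
    print(visible_controls(session_running=False,
                           transcript_enabled=True,
                           voices_managed_externally=False))
```

Before START is pressed, this model returns neither Mute nor the live transcript, which matches the note above.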

Voice Providers

Convo Mode supports several real-time voice and avatar providers. Your administrator selects which provider your Digital Twin uses; see Voice & Realtime Providers for the integration overview and Realtime Voice & Avatars for admin-side configuration.
OpenAI GPT-Realtime is the default voice provider, powered by OpenAI’s Realtime API.
  • Voice selection — Choose from 10+ built-in voices (Cedar, Marin, Alloy, Ash, and more) directly in the Convo Mode panel
  • Voice Activity Detection (VAD) — Configurable eagerness controls how quickly the AI responds when you pause speaking
  • Tool calling — Your Digital Twin can access its full set of tools (search, file lookup, web browsing, etc.) during voice conversations
  • MCP support — Connected MCP servers are available during real-time conversations
  • Token tracking — Input and output token usage is tracked and displayed
  • Reasoning support — The gpt-realtime-2 model supports configurable reasoning effort during voice conversations, enabling stronger instruction following and more reliable tool use for complex voice-agent workflows. It also accepts image input alongside text and audio. See AI Models for details.
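To make the voice and VAD settings above more concrete, here is a minimal sketch of how a client might describe them to a real-time backend. It is loosely modelled on the `session.update` event and the `semantic_vad` turn-detection mode in OpenAI's Realtime API. The helper name `build_session_update` is hypothetical, the exact field layout differs between API versions, and nothing here describes how Pria itself sends these settings.

```python
import json

# Eagerness levels for semantic VAD; how Pria exposes them in its UI is an assumption.
EAGERNESS_LEVELS = {"low", "medium", "high", "auto"}


def build_session_update(voice: str, eagerness: str = "auto") -> dict:
    """Build an illustrative session-update payload (field layout is an assumption)."""
    if eagerness not in EAGERNESS_LEVELS:
        raise ValueError(f"eagerness must be one of {sorted(EAGERNESS_LEVELS)}")
    return {
        "type": "session.update",
        "session": {
            "voice": voice,
            # Lower eagerness waits longer after a pause before replying;
            # higher eagerness replies sooner.
            "turn_detection": {"type": "semantic_vad", "eagerness": eagerness},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_session_update("cedar", "low"), indent=2))
```

A low eagerness setting suits speakers who pause mid-thought, while a high setting favours quick back-and-forth exchanges.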

Switching voices mid-conversation

You can change the voice at any time during a Convo session; the new voice takes effect on the next response. Open the voice picker in the Convo panel header, pick a new voice, and continue speaking. Any response already in progress finishes in the original voice, then the next turn switches.

Switching between providers (OpenAI ↔ xAI ↔ Gemini Live ↔ ElevenLabs) is an admin action: the live voice picker only swaps voices within the active provider.

Avatar providers (Anam, LemonSlice) operate independently of the underlying voice model: you can change voices without losing the avatar, and you can disable the avatar without ending the voice session.
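The timing rule for voice changes (finish the current response in the old voice, then switch) can be modelled as a small state holder. This is a sketch under stated assumptions: the `VoiceSession` class, its fields, and its method names are hypothetical and only illustrate the behaviour described here, not Pria's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceSession:
    """Hypothetical model of a Convo session's voice state."""
    voices: tuple                      # voices offered by the active provider
    active_voice: str                  # voice used for the current or next response
    responding: bool = False           # True while a response is being spoken
    pending_voice: Optional[str] = None

    def select_voice(self, voice: str) -> None:
        # The picker only swaps voices within the active provider.
        if voice not in self.voices:
            raise ValueError(f"{voice!r} is not offered by the active provider")
        if self.responding:
            # Defer the change so the running response keeps its original voice.
            self.pending_voice = voice
        else:
            self.active_voice = voice

    def start_response(self) -> str:
        self.responding = True
        return self.active_voice

    def finish_response(self) -> None:
        self.responding = False
        if self.pending_voice is not None:
            # Apply the deferred change; the next turn uses the new voice.
            self.active_voice = self.pending_voice
            self.pending_voice = None


if __name__ == "__main__":
    session = VoiceSession(voices=("cedar", "marin"), active_voice="cedar")
    session.start_response()
    session.select_voice("marin")      # chosen while Pria is still speaking
    session.finish_response()
    print(session.start_response())
```

Selecting a voice while no response is running takes effect immediately in this model, since there is nothing to finish first.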

Avatar animation feedback

When an avatar provider is active, the Convo panel includes a few visual cues so you can tell what state Pria is in:
Cue | What it means
Idle pose (blinking, breathing) | Pria is waiting for you to speak.
Listening dot near the face | Your microphone is open and your audio is reaching the model.
Loading shimmer over the video | The avatar is buffering the first frame of a new response.
Lip-synced mouth motion | Pria is speaking. The mouth tracks the streamed audio in real time.
Frozen frame | The session has been interrupted (network blip, model hiccup). Click the refresh button on the Convo panel to reconnect; your conversation history is preserved.

Features

Natural Dialogue Flow

Your Digital Twin holds a natural conversation. It waits for you to finish your thoughts before jumping in, remembers what you’ve been talking about, and lets you ask follow-up questions without having to repeat yourself.

Voice Capabilities

When you speak, your words appear as text right away. When your Digital Twin responds, you’ll hear it speak back to you with a natural-sounding voice. The more you use it, the better it gets at understanding how you talk.

Text Input

Prefer typing? When text input is enabled, you can type messages during a voice conversation instead of speaking. Your Digital Twin responds with both voice and text — ideal for noisy environments or when you need to input precise information.

Multilingual Support

Switch between languages right in the middle of a conversation. Your Digital Twin will catch on and switch with you, keeping track of what you were talking about.

Knowledge Integration

Your AI assistant automatically references your uploaded documents and custom-built assistants during conversations, providing personalized and contextually relevant responses.

Audio Transcriptions

All voice conversations are automatically saved as searchable transcript files that you can access, review, and reference at any time.

Provider Comparison

Feature | OpenAI GPT-Realtime | xAI Realtime | Gemini Live | ElevenLabs
Voice selection in Pria | Yes (10+ voices) | Yes (5 voices) | Yes (30 voices) | Configured in dashboard
VAD control | Adjustable eagerness | Automatic | Automatic | Automatic
Tool calling | Full tool access | Full tool access | Full tool access | Dashboard-configured
MCP server support | Yes | No | No | No
Text input mode | Yes | Yes | Yes | Yes
Token tracking | Yes | Yes (with cached_tokens) | Yes | No
Custom voice clones | No | No | No | Yes
Live transcription | Output only | Input and output | Input and output | Output only
Proactive audio | No | No | Yes | No
Noise reduction | Configurable | Automatic | Automatic | Automatic
Transport | WebRTC | WebSocket | WebSocket | WebRTC
Dynamic variables | N/A (full context in prompt) | N/A (full context in prompt) | N/A (full context in prompt) | Auto-injected
Avatar providers (Anam, LemonSlice) are layered on top — they pair with any of the audio providers above to give Pria a visible face.
Your administrator selects the voice provider for your Digital Twin. Contact your admin if you have questions about which provider is active.
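For readers who want to query the comparison programmatically, part of the table can be encoded as plain data. The sketch below copies a handful of yes/no rows and the transport row from the table above. The names `PROVIDERS`, `CAPABILITIES`, `TRANSPORT`, and `providers_with` are hypothetical and are not part of any Pria API.

```python
# Column order follows the comparison table.
PROVIDERS = ("OpenAI GPT-Realtime", "xAI Realtime", "Gemini Live", "ElevenLabs")

# Yes/No rows from the table, one flag per provider in column order.
CAPABILITIES = {
    "MCP server support": (True, False, False, False),
    "Text input mode": (True, True, True, True),
    "Token tracking": (True, True, True, False),
    "Custom voice clones": (False, False, False, True),
    "Proactive audio": (False, False, True, False),
}

# Transport row from the table.
TRANSPORT = dict(zip(PROVIDERS, ("WebRTC", "WebSocket", "WebSocket", "WebRTC")))


def providers_with(feature: str) -> list[str]:
    """List the providers whose table cell for `feature` is Yes."""
    flags = CAPABILITIES[feature]
    return [name for name, supported in zip(PROVIDERS, flags) if supported]


if __name__ == "__main__":
    for feature in CAPABILITIES:
        print(f"{feature}: {', '.join(providers_with(feature))}")
```

Rows with richer cell values (voice counts, VAD behaviour, transcription direction) are left out here because they do not reduce cleanly to a boolean.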

Troubleshooting Common Issues

  • Check permissions and hardware connections. Your microphone needs to be enabled in the browser for Convo Mode to work.
  • Adjust input sensitivity and check for background noise.
  • Use headphones or adjust speaker volume.
  • Speak clearly and check language settings.
  • Check your internet connection and try restarting the conversation.
  • Provide a brief recap of your previous discussion and pick up from there.
  • Rephrase using different words or examples.
  • Explicitly state language changes if the Digital Twin does not pick up on the switch.
If you don’t see voice selection or VAD controls, your Digital Twin is using ElevenLabs as the voice provider. These settings are managed by your administrator in the ElevenLabs dashboard.