Convo Mode

Activating Conversation Mode

Instance owners can make Convo Mode the default experience: with Start Convo mode on login enabled (Instance Settings → Voice, or the admin instance editor), visitors land on a tap-to-start screen right after signing in — the tap grants microphone access and begins the conversation. Anyone can exit to the classic chat, and each visitor’s last choice (voice or text) is remembered on their device for future visits. Full-screen behaviour follows the Avatar Fullscreen setting.
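The landing behaviour described above can be sketched as a small decision function. This is an illustrative model only, not Pria’s implementation: the function name and the "voice"/"text" values are hypothetical, and it assumes that a choice remembered on the device takes precedence over the instance default, which this guide implies but does not state outright.

```python
from typing import Optional

VOICE = "voice"
TEXT = "text"


def choose_start_mode(start_convo_on_login: bool, remembered_choice: Optional[str]) -> str:
    """Pick the mode a visitor lands in after signing in.

    Assumption: a choice remembered on the device wins over the
    instance-wide default; the guide only says that both exist.
    """
    if remembered_choice in (VOICE, TEXT):
        return remembered_choice
    # No usable remembered choice: fall back to the instance setting.
    return VOICE if start_convo_on_login else TEXT


# Hypothetical visitors: a first visit, then a return visit after exiting to chat.
print(choose_start_mode(True, None))
print(choose_start_mode(True, TEXT))
```

Under these assumptions, a first-time visitor on an instance with Start Convo mode on login enabled lands in voice, while a visitor who previously exited to the classic chat lands in text.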

Locate the Convo Mode icon

From the main interface, look for the Conversation icon on the text input bar.

Enable your microphone

When your browser asks, allow it to use your microphone.

Start speaking

Click START and begin talking. Your Digital Twin will listen and respond with voice.

The Convo Mode panel

Once you launch Convo Mode, a floating panel appears over your conversation — Pria’s avatar orb with the controls beneath it.

The Convo Mode panel: the avatar orb with START, the expand (display-mode) button, the voice picker, and a close ×.

Control	What it does
START / Stop	Begin or end the live voice session. While it’s running you just speak — there’s no push-to-talk to hold.
Mute	Mutes your microphone so Pria stops listening; unmute to resume. Your audio is captured but not sent while muted.
Expand / Contract	Switch the panel between compact and expanded display modes (below).
Voice	Pick the voice Pria speaks in (e.g. Cedar). The list adapts to the active provider, and is hidden when ElevenLabs manages voices. See Switching voices.
Live transcript	As you talk, your words and Pria’s replies are transcribed in real time (e.g. “You: …”) so you can read along. Whether it shows is a per-Twin setting.
Close	Closes the panel and ends the session, returning you to the text chat.
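The Voice row above says the picker adapts to the active provider and is hidden when ElevenLabs manages voices. A minimal sketch of that rule follows. The provider keys and function name are hypothetical, and the voice lists contain only the names this guide mentions, so the real lists are longer.

```python
from typing import Dict, List, Optional

# Voices named in this guide only; OpenAI lists "10+" and Gemini Live "30",
# so the real lists are longer. Provider keys are hypothetical identifiers.
KNOWN_VOICES: Dict[str, List[str]] = {
    "openai": ["Cedar", "Marin", "Alloy", "Ash"],
    "xai": ["eve", "ara", "rex", "sal", "leo"],
}

# Providers whose voices are managed outside the panel, per this guide.
DASHBOARD_MANAGED = {"elevenlabs"}


def voice_picker_options(provider: str) -> Optional[List[str]]:
    """Return the voices to list, or None when the picker is hidden."""
    if provider in DASHBOARD_MANAGED:
        return None
    # Providers whose voice names this guide does not list yield an empty list.
    return list(KNOWN_VOICES.get(provider, []))


print(voice_picker_options("xai"))
print(voice_picker_options("elevenlabs"))
```

Returning None rather than an empty list keeps "picker hidden" distinct from "picker shown but voice names unknown".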

Display modes

Mode	Description
Compact	The floating orb beside your chat — great for quick voice exchanges while you keep working in text.
Expanded	A larger, immersive view that fills the screen, putting the avatar (and transcript, where enabled) front and centre — ideal for a focused conversation or a screen-share.

Expanded mode fills the screen for a focused, immersive conversation.

Tap the expand button to grow the panel, and the same button (now Contract) to shrink it back. On supported devices an administrator can enable true fullscreen immersive mode — and a fullscreen transcript — per Digital Twin in Instance Settings → Voice.

Mute and the live transcript appear once a voice session is actually running — press START to see them.

Voice Providers

Convo Mode supports several real-time voice and avatar providers. Your administrator selects which provider your Digital Twin uses; see Voice & Realtime Providers for the integration overview and Realtime Voice & Avatars for admin-side configuration.

OpenAI GPT-Realtime is the default voice provider, powered by OpenAI’s Realtime API.

Voice selection — Choose from 10+ built-in voices (Cedar, Marin, Alloy, Ash, and more) directly in the Convo Mode panel
Voice Activity Detection (VAD) — Configurable eagerness controls how quickly the AI responds when you pause speaking
Tool calling — Your Digital Twin can access its full set of tools (search, file lookup, web browsing, etc.) during voice conversations
MCP support — Connected MCP servers are available during real-time conversations
Token tracking — Input and output token usage is tracked and displayed
Reasoning support — The gpt-realtime-2 model supports configurable reasoning effort during voice conversations, enabling stronger instruction following and more reliable tool use for complex voice-agent workflows. It also accepts image input alongside text and audio. See AI Models for details.
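To make the VAD eagerness setting more concrete: in OpenAI’s published Realtime API, semantic voice activity detection accepts an eagerness of low, medium, high, or auto, where lower values wait longer before responding. The sketch below builds such a settings payload. How Pria exposes this in its admin settings is not shown in this guide, the function name is hypothetical, and the exact nesting of the turn-detection field varies between API versions, so treat the shape as an assumption.

```python
from typing import Any, Dict

# Eagerness values documented for semantic VAD in OpenAI's Realtime API.
EAGERNESS_LEVELS = ("low", "medium", "high", "auto")


def build_session_update(eagerness: str = "auto") -> Dict[str, Any]:
    """Build a session.update event that sets turn-detection eagerness.

    Assumption: the nesting below follows one published shape of the API;
    other versions place turn_detection under a different parent key.
    """
    if eagerness not in EAGERNESS_LEVELS:
        raise ValueError(f"eagerness must be one of {EAGERNESS_LEVELS}")
    return {
        "type": "session.update",
        "session": {
            "turn_detection": {"type": "semantic_vad", "eagerness": eagerness},
        },
    }


# "low" lets the speaker pause for longer before the model replies.
print(build_session_update("low"))
```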

xAI Realtime offers real-time speech-to-speech powered by xAI’s Grok voice models.

Voice selection — Choose from five built-in voices: eve, ara, rex, sal, leo, directly in the Convo Mode panel
Tool calling — Full tool access during voice conversations, including web search, file lookup, and IP Vault retrieval
Automatic prompt caching — Grok’s optimized caching reduces cost on long-running sessions; usage is reported with cached_tokens in the token counter
Live transcription — Both your speech and Grok’s responses are transcribed in real time
WebSocket transport — Uses a WebSocket connection (similar to Gemini Live) rather than WebRTC, which can make it more tolerant of restrictive networks

The grok-fast real-time model is conversational-first; if you ask it to speak a Markdown image link aloud it may garble the URL into the audio. Stick to plain text questions in voice — text turns render images fine.

Switching voices mid-conversation

You can change the voice at any time during a Convo session — the new voice takes effect on the next response. Open the voice picker in the Convo panel header, pick a new voice, and continue speaking; the running response (if any) finishes in the original voice, then the next turn switches. Switching between providers (OpenAI ↔ xAI ↔ Gemini Live ↔ ElevenLabs) is an admin action — the live voice picker only swaps voices within the active provider. Avatar providers (Anam, LemonSlice) operate independently of the underlying voice model: you can change voices without losing the avatar, and you can disable the avatar without ending the voice session.
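The switching rules above can be modelled as a small state machine: a pick is held as pending, the running response keeps its voice, and the next response applies the pick. This is a toy model under that reading, not Pria’s code; the class and method names are hypothetical.

```python
from typing import List, Optional


class ConvoSession:
    """Toy model of mid-conversation voice switching (not Pria's code)."""

    def __init__(self, voices: List[str], active_voice: str) -> None:
        if active_voice not in voices:
            raise ValueError("active voice must belong to the provider")
        self.voices = voices              # voices of the active provider only
        self.active_voice = active_voice
        self.pending_voice: Optional[str] = None
        self.responding = False

    def pick_voice(self, voice: str) -> None:
        """Record a pick; it is applied when the next response starts."""
        if voice not in self.voices:
            # Changing provider is an admin action, not a picker action.
            raise ValueError(f"{voice!r} is not offered by the active provider")
        self.pending_voice = voice

    def start_response(self) -> str:
        """Begin a response and return the voice it will be spoken in."""
        if self.pending_voice is not None:
            self.active_voice = self.pending_voice
            self.pending_voice = None
        self.responding = True
        return self.active_voice

    def finish_response(self) -> None:
        self.responding = False


session = ConvoSession(["eve", "ara", "rex", "sal", "leo"], "eve")
first = session.start_response()    # spoken in "eve"
session.pick_voice("rex")           # picked while the response is running
still = session.active_voice        # the running response keeps "eve"
session.finish_response()
second = session.start_response()   # the next turn switches to "rex"
print(first, still, second)
```

Picking a voice from another provider raises an error in this model, mirroring the rule that the live picker only swaps voices within the active provider.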

Avatar animation feedback

When an avatar provider is active, the Convo panel includes a few visual cues so you can tell what state Pria is in:

Cue	What it means
Idle pose (blinking, breathing)	Pria is waiting for you to speak.
Listening dot near the face	Your microphone is open and your audio is reaching the model.
Loading shimmer over the video	The avatar is buffering the first frame of a new response.
Lip-synced mouth motion	Pria is speaking. The mouth tracks the streamed audio in real time.
Frozen frame	The session has been interrupted (network blip, model hiccup). Click the refresh button on the Convo panel to reconnect — your conversation history is preserved.

Features

Natural Dialogue Flow

Your Digital Twin knows how to hold a real conversation. It waits for you to finish your thought before responding, remembers what you have been talking about, and lets you ask follow-up questions without repeating yourself.

Voice Capabilities

When you speak, your words appear as text right away. When your Digital Twin responds, you’ll hear it speak back to you in a natural-sounding voice. The more you use it, the better it gets at understanding how you talk.

Text Input

Prefer typing? When text input is enabled, you can type messages during a voice conversation instead of speaking. Your Digital Twin responds with both voice and text — ideal for noisy environments or when you need to input precise information.

Multilingual Support

Switch between languages right in the middle of a conversation. Your Digital Twin will catch on and switch with you, keeping track of what you were talking about.

Knowledge Integration

Your AI assistant automatically references your uploaded documents and custom-built assistants during conversations, providing personalized and contextually relevant responses.

Audio Transcriptions

All voice conversations are automatically saved as searchable transcript files that you can access, review, and reference at any time.

Provider Comparison

Feature	OpenAI GPT-Realtime	xAI Realtime	Gemini Live	ElevenLabs
Voice selection in Pria	Yes (10+ voices)	Yes (5 voices)	Yes (30 voices)	Configured in dashboard
VAD control	Adjustable eagerness	Automatic	Automatic	Automatic
Tool calling	Full tool access	Full tool access	Full tool access	Dashboard-configured
MCP server support	Yes	No	No	No
Text input mode	Yes	Yes	Yes	Yes
Token tracking	Yes	Yes (with `cached_tokens`)	Yes	No
Custom voice clones	No	No	No	Yes
Live transcription	Output only	Input and output	Input and output	Output only
Proactive audio	No	No	Yes	No
Noise reduction	Configurable	Automatic	Automatic	Automatic
Transport	WebRTC	WebSocket	WebSocket	WebRTC
Dynamic variables	N/A (full context in prompt)	N/A (full context in prompt)	N/A (full context in prompt)	Auto-injected
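If you want to query the comparison programmatically, a few of its rows can be held in a simple lookup. The sketch below copies four rows from the table above; the key names (mcp, transport, and so on) and the helper function are hypothetical, chosen only for illustration.

```python
from typing import Dict, List

# Four rows of the comparison table, keyed by provider name.
PROVIDERS: Dict[str, Dict[str, object]] = {
    "OpenAI GPT-Realtime": {
        "mcp": True, "transport": "WebRTC",
        "voice_clones": False, "proactive_audio": False,
    },
    "xAI Realtime": {
        "mcp": False, "transport": "WebSocket",
        "voice_clones": False, "proactive_audio": False,
    },
    "Gemini Live": {
        "mcp": False, "transport": "WebSocket",
        "voice_clones": False, "proactive_audio": True,
    },
    "ElevenLabs": {
        "mcp": False, "transport": "WebRTC",
        "voice_clones": True, "proactive_audio": False,
    },
}


def providers_where(feature: str, value: object = True) -> List[str]:
    """List providers whose table entry for `feature` equals `value`."""
    return [name for name, row in PROVIDERS.items() if row.get(feature) == value]


print(providers_where("mcp"))
print(providers_where("transport", "WebSocket"))
```

For example, a team on a restrictive network could filter on the WebSocket transport, which this guide notes can be more tolerant of such networks.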

Avatar providers (Anam, LemonSlice) are layered on top — they pair with any of the audio providers above to give Pria a visible face.

Your administrator selects the voice provider for your Digital Twin. Contact your admin if you have questions about which provider is active.

Troubleshooting Common Issues

Microphone not working

Check permissions and hardware connections. Your microphone needs to be enabled in the browser for Convo Mode to work.

Poor audio quality

Adjust input sensitivity and check for background noise.

Echo or feedback

Use headphones or adjust speaker volume.

Voice not recognized

Speak clearly and check language settings.

AI not responding

Check internet connection and try restarting the conversation.

Context lost

Provide a brief recap of your previous discussion and pick up from there.

Misunderstood requests

Rephrase using different words or examples.

Language switching problems

Explicitly state language changes if the Digital Twin does not pick up on the switch.

Voice options not visible

If you don’t see voice selection or VAD controls, your Digital Twin may be using ElevenLabs as the voice provider. In that case these settings are managed by your administrator in the ElevenLabs dashboard.

Related

Realtime Voice & Avatars (admin) — Admin-side configuration of voice providers, avatars, and per-instance voice picks
Voice & Realtime Providers (integration overview) — How each provider connects to Pria and what features it brings
Anam Integration — Setup guide for the Anam animated avatar
Gemini Live Integration — Admin setup guide for Gemini Live voice
AI Models — Real-time speech-to-speech model options
Configuration — Voice provider selection for administrators
Input & Responses — Text and voice input options, including one-shot dictation (the 🎙 mic)
Audio Notes — Record voice memos that become searchable knowledge in your vault
