How It Works
User Speaks
Audio is captured from the user’s microphone and streamed over a WebSocket connection directly to Google’s Gemini API.
Gemini Processes
Gemini processes the audio natively — understanding speech, generating a response, and synthesizing audio output in a single model pass.
User Hears
The AI’s audio response streams back in real time. Both sides of the conversation are transcribed for the chat history.
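The capture-and-stream step can be pictured with a minimal sketch like the one below. It is not Pria's implementation: the 16 kHz, 16-bit mono PCM format, the 100 ms frame length, and the JSON field names (`type`, `mime_type`, `data`) are all assumptions chosen for illustration, and the real wire format may differ.

```python
import base64
import json

SAMPLE_RATE_HZ = 16_000  # assumed microphone sample rate
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM
FRAME_MS = 100           # hypothetical frame length per WebSocket message


def frame_audio(pcm: bytes, frame_ms: int = FRAME_MS) -> list[bytes]:
    """Split raw PCM bytes into fixed-duration frames (the last may be shorter)."""
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * frame_ms // 1000
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]


def to_message(frame: bytes) -> str:
    """Wrap one frame in a JSON text message; the field names are hypothetical."""
    return json.dumps({
        "type": "audio_chunk",
        "mime_type": f"audio/pcm;rate={SAMPLE_RATE_HZ}",
        "data": base64.b64encode(frame).decode("ascii"),
    })


# One second of silence stands in for captured microphone audio.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
frames = frame_audio(one_second)
messages = [to_message(frame) for frame in frames]
```

Small frames keep latency low because each message can be sent as soon as it is captured, rather than waiting for the user to finish speaking.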
Prerequisites
Before enabling Gemini Live, ensure:
- A Gemini API key is configured (at the instance or environment level)
- The instance has a Gemini-compatible model assigned as the conversation model
- The Convo mode voice provider is set to Gemini Live in the instance configuration
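The three prerequisites above can be expressed as a small preflight check. This is a sketch under stated assumptions: the `InstanceConfig` class, its `conversation_model` and `voice_provider` fields, the `"gemini_live"` provider value, and the rule that a Gemini-compatible model name starts with `gemini` are hypothetical; only `gemini_api_key` comes from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceConfig:
    """Hypothetical view of the instance settings relevant to Gemini Live."""
    gemini_api_key: Optional[str] = None
    conversation_model: str = ""
    voice_provider: str = ""


def missing_prerequisites(cfg: InstanceConfig,
                          env_api_key: Optional[str] = None) -> list[str]:
    """Return a human-readable list of unmet prerequisites (empty if all are met)."""
    problems = []
    if not (cfg.gemini_api_key or env_api_key):
        problems.append("no Gemini API key at the instance or environment level")
    # Assumption: a Gemini-compatible model is recognizable by its name prefix.
    if not cfg.conversation_model.lower().startswith("gemini"):
        problems.append("conversation model is not Gemini-compatible")
    if cfg.voice_provider != "gemini_live":
        problems.append("voice provider is not set to Gemini Live")
    return problems
```

Returning every unmet item at once, rather than stopping at the first, lets an admin fix the configuration in a single pass.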
Admin Setup
Configure a Gemini API Key
Gemini Live uses a 2-tier API key resolution:
- Instance level — The `gemini_api_key` field in instance settings (highest priority)
- Environment level — The `GEMINI_API_KEY` environment variable (lowest priority)
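The lookup order can be sketched as follows. The function name and the idea that instance settings are available as a mapping are assumptions for illustration; the `gemini_api_key` and `GEMINI_API_KEY` names are the ones listed above.

```python
import os
from typing import Mapping, Optional


def resolve_gemini_live_key(
    instance_settings: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the instance-level key if set, else the environment key, else None."""
    instance_key = instance_settings.get("gemini_api_key")
    if instance_key:
        return instance_key
    # Treat an empty environment variable the same as a missing one.
    return environ.get("GEMINI_API_KEY") or None
```

Passing `environ` as a parameter keeps the function easy to exercise without touching the real process environment.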
This differs from standard Gemini chat, which also checks the AI model’s `api_key` field. Gemini Live does not use per-model API keys.
Set the Voice Provider
In your instance settings, set the Convo mode voice provider to Gemini Live. This tells Pria to use the Gemini WebSocket API instead of OpenAI Realtime or ElevenLabs.
Choose a Default Voice
Select a default voice for your instance using the `gemini_rt_voice` setting. Users can override this in the Convo Mode panel.
Popular voices:
| Voice | Style |
|---|---|
| Puck | Upbeat |
| Charon | Informative |
| Kore | Firm |
| Fenrir | Excitable |
| Aoede | Breezy |
30 voices are available in total. The full list appears in the voice selector dropdown when a user opens Convo Mode.
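The relationship between the instance default and a user's override can be sketched like this. The function name and the fallback behavior for an unrecognized user choice are assumptions; the voice table contains only the five voices listed above, not the full set.

```python
from typing import Mapping, Optional

# Only the five voices named on this page; the full list is not reproduced here.
POPULAR_VOICES = {
    "Puck": "Upbeat",
    "Charon": "Informative",
    "Kore": "Firm",
    "Fenrir": "Excitable",
    "Aoede": "Breezy",
}


def resolve_voice(
    user_choice: Optional[str],
    instance_default: str,
    available: Mapping[str, str] = POPULAR_VOICES,
) -> str:
    """Prefer the user's choice; fall back to the instance default (assumed policy)."""
    if user_choice in available:
        return user_choice
    if instance_default in available:
        return instance_default
    raise ValueError(f"unknown default voice: {instance_default!r}")
```

For example, `resolve_voice(None, "Puck")` falls back to the instance default, while `resolve_voice("Kore", "Puck")` honors the user's selection.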
Features
Tool Calling
During a voice conversation, the AI can invoke tools just like in text mode — searching the web, looking up files in IP Vault, querying Canvas, and more. Tool results are incorporated into the audio response naturally.
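A tool call during a voice session can be thought of as a lookup and dispatch step whose result is handed back to the model. The sketch below is illustrative only: `search_web`, the `TOOLS` registry, the `enabled` set, and the shape of the returned dictionary are hypothetical and do not describe Pria's or Gemini's actual interfaces.

```python
from typing import Any, Callable

ToolFn = Callable[..., Any]


def search_web(query: str) -> str:
    """Hypothetical stand-in for a real web search tool."""
    return f"results for {query!r}"


# Hypothetical registry mapping tool names to callables.
TOOLS: dict[str, ToolFn] = {"search_web": search_web}


def handle_tool_call(name: str, args: dict[str, Any], enabled: set[str]) -> dict[str, Any]:
    """Run a tool if it is registered and enabled; otherwise report an error."""
    if name not in enabled or name not in TOOLS:
        return {"name": name, "error": "tool not available in voice mode"}
    return {"name": name, "result": TOOLS[name](**args)}
```

Returning an error object instead of raising lets the model explain the problem aloud rather than ending the session.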
Live Transcription
Both the user’s speech (input) and the AI’s spoken responses (output) are transcribed in real time and displayed in the Convo Mode panel. Transcripts are saved to conversation history for later review.
Proactive Audio
When enabled, the AI can initiate speech without waiting for the user to speak first — for example, offering follow-up suggestions or asking clarifying questions. This creates more natural turn-taking in tutoring scenarios.
Noise Reduction
A noise reduction toggle filters background audio from the user’s microphone, improving recognition accuracy in noisy environments like classrooms or open offices.
Text Input During Voice
Users can type messages during an active voice conversation. The typed text is sent alongside the audio stream, and the AI responds with both voice and text.
Ephemeral Token Security
Each voice session uses a short-lived ephemeral token generated by the server. No long-lived API keys are exposed to the browser. Tokens are created on-demand and expire after the session ends.
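The token lifecycle described above (issued on demand, short-lived, invalid once the session ends) can be modeled with a small sketch. This is not how Gemini's tokens are actually minted or validated; `TokenIssuer`, its 60-second default lifetime, and the in-memory store are assumptions used only to illustrate the lifecycle.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EphemeralToken:
    value: str
    expires_at: float


class TokenIssuer:
    """Hypothetical server-side issuer; the long-lived API key never leaves the server."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._active: dict[str, float] = {}

    def issue(self) -> EphemeralToken:
        """Create a random token that expires ttl_seconds from now."""
        token = EphemeralToken(secrets.token_urlsafe(32), self._clock() + self._ttl)
        self._active[token.value] = token.expires_at
        return token

    def is_valid(self, value: str) -> bool:
        """A token is valid only if it was issued, not revoked, and not expired."""
        expires_at = self._active.get(value)
        return expires_at is not None and self._clock() < expires_at

    def end_session(self, value: str) -> None:
        """Revoke the token when the voice session ends."""
        self._active.pop(value, None)
```

Because the browser only ever holds the short-lived value, a leaked token is useful for at most the remainder of one session.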
User Experience
When a user opens Convo Mode with Gemini Live configured, they see:
- Voice selector — Dropdown with all 30 available Gemini voices
- Start/Stop button — Begins and ends the voice session
- Noise reduction toggle — Filters background noise from the microphone
- Proactive audio toggle — Allows the AI to speak without being prompted
- Text input toggle — Enables typing during a voice conversation
- Live transcript — Real-time display of both user speech and AI responses
Troubleshooting
Voice session fails to start
- Verify a Gemini API key is configured at the instance or environment level
- Check that the instance’s voice provider is set to Gemini Live
- Ensure the browser has microphone permissions granted
Audio quality issues
- Enable the noise reduction toggle in the Convo Mode panel
- Use headphones to prevent echo/feedback
- Check your internet connection — Gemini Live requires a stable WebSocket connection
Tools not working during voice
- Confirm tools are enabled on the instance
- Some tools may not be available during voice mode — check the admin Tools panel
Related
- Convo Mode User Guide — End-user guide for voice conversations
- AI Models — Configure Gemini models
- ElevenLabs Integration — Alternative voice provider
- Configuration — Instance voice settings