Gemini Live brings real-time, natural voice conversations to your Digital Twin using Google’s native audio API. Unlike text-based approaches that convert speech to text and back, Gemini Live processes audio natively — resulting in lower latency and more natural-sounding responses.
Gemini Live is ideal for interactive learning, tutoring, and conversational scenarios where low-latency, natural dialogue is important.

How It Works

User Speaks

Audio is captured from the user’s microphone and streamed over a WebSocket connection directly to Google’s Gemini API.

Gemini Processes

Gemini processes the audio natively — understanding speech, generating a response, and synthesizing audio output in a single model pass.

User Hears

The AI’s audio response streams back in real time. Both sides of the conversation are transcribed for the chat history.

User speaks → WebSocket → Gemini (native audio processing) → Audio response → User hears
While processing, Gemini can also make tool calls (search, files, web).
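
To make the streaming step concrete, the sketch below splits captured audio into small fixed-duration chunks, the form in which audio is typically sent over a WebSocket. It assumes 16-bit mono PCM at 16 kHz and 100 ms chunks. These values and the name chunk_pcm are illustrative assumptions, not taken from Pria's implementation.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # assumed input sample rate
BYTES_PER_SAMPLE = 2      # assumed 16-bit mono PCM
CHUNK_MS = 100            # assumed chunk duration


def chunk_pcm(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM audio, each at most chunk_ms long."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# 250 ms of silence splits into two full chunks and one half-length chunk.
silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * 250 // 1000)
sizes = [len(chunk) for chunk in chunk_pcm(silence)]
```

Small chunks keep latency low because each one can be sent as soon as it is captured, rather than waiting for the user to finish speaking.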

Prerequisites

Before enabling Gemini Live, ensure:
  • A Gemini API key is configured (at the instance or environment level)
  • The instance has a Gemini-compatible model assigned as the conversation model
  • The Convo mode voice provider is set to Gemini Live in the instance configuration

Admin Setup

1. Configure a Gemini API Key

Gemini Live uses a 2-tier API key resolution:
  1. Instance level — The gemini_api_key field in instance settings (highest priority)
  2. Environment level — The GEMINI_API_KEY environment variable (lowest priority)
At least one of these must be set for Gemini Live to function.
This differs from standard Gemini chat, which also checks the AI model’s api_key field. Gemini Live does not use per-model API keys.
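
The resolution order described above can be sketched as a small lookup. The function name resolve_gemini_live_key and the dictionary-style settings object are hypothetical. Only the gemini_api_key field and the GEMINI_API_KEY variable come from this guide.

```python
import os
from typing import Mapping, Optional


def resolve_gemini_live_key(
    instance_settings: Mapping[str, Optional[str]],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the instance-level key if set, else the environment-level key."""
    instance_key = (instance_settings.get("gemini_api_key") or "").strip()
    if instance_key:
        return instance_key
    env_key = (environ.get("GEMINI_API_KEY") or "").strip()
    if env_key:
        return env_key
    # Per-model api_key fields are deliberately not consulted for Gemini Live.
    raise LookupError("No Gemini API key configured at instance or environment level")


key = resolve_gemini_live_key(
    {"gemini_api_key": "instance-key"},
    {"GEMINI_API_KEY": "env-key"},
)
```

Here key is the instance-level value, because the instance setting takes priority over the environment variable.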

2. Set the Voice Provider

In your instance settings, set the Convo mode voice provider to Gemini Live. This tells Pria to use the Gemini WebSocket API instead of OpenAI Realtime or ElevenLabs.

3. Choose a Default Voice

Select a default voice for your instance with the gemini_rt_voice setting. Users can override this in the Convo Mode panel.

Popular voices:

Voice    Style
Puck     Upbeat
Charon   Informative
Kore     Firm
Fenrir   Excitable
Aoede    Breezy

30 voices are available in total. The full list appears in the voice selector dropdown when a user opens Convo Mode.
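
As a rough illustration of how the instance default and a user override might combine, the sketch below prefers the user's choice and falls back to the gemini_rt_voice default. The function name pick_voice and the fallback for an unknown user choice are assumptions. The voice list contains only the five voices named above, not all 30.

```python
from typing import Optional, Sequence

# Only the five voices named in this guide; the real dropdown lists 30.
KNOWN_VOICES = ("Puck", "Charon", "Kore", "Fenrir", "Aoede")


def pick_voice(
    instance_default: str,
    user_choice: Optional[str] = None,
    available: Sequence[str] = KNOWN_VOICES,
) -> str:
    """Prefer the user's override, then the instance default (gemini_rt_voice)."""
    for candidate in (user_choice, instance_default):
        if candidate in available:
            return candidate
    raise ValueError("Neither the user choice nor the instance default is an available voice")


voice = pick_voice(instance_default="Kore", user_choice="Puck")
```

Falling back to the default when the user's choice is unrecognized keeps a session usable even if a stored preference refers to a voice that is no longer offered.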

Features

During a voice conversation, the AI can invoke tools just like in text mode — searching the web, looking up files in IP Vault, querying Canvas, and more. Tool results are incorporated into the audio response naturally.
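
A minimal sketch of tool handling during a voice session follows: the model requests a tool by name with arguments, the application runs it, and the result is returned so the model can work it into the spoken reply. The registry, the search_files stub, and the response shape are all hypothetical and do not reflect Pria's or Gemini's actual message format.

```python
from typing import Any, Callable, Dict

ToolFn = Callable[..., Any]

# Hypothetical registry; real tools would call web search, IP Vault, Canvas, etc.
TOOLS: Dict[str, ToolFn] = {
    "search_files": lambda query: [f"stub result for {query!r}"],
}


def handle_tool_call(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Run the requested tool and wrap its result (or error) for the model."""
    tool = TOOLS.get(name)
    if tool is None:
        return {"name": name, "error": f"tool {name!r} is not enabled"}
    try:
        return {"name": name, "result": tool(**args)}
    except Exception as exc:  # report failures instead of dropping the turn
        return {"name": name, "error": str(exc)}


response = handle_tool_call("search_files", {"query": "syllabus"})
```

Returning an error payload rather than raising lets the model explain the problem aloud instead of leaving the user with silence.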
Both the user’s speech (input) and the AI’s spoken responses (output) are transcribed in real time and displayed in the Convo Mode panel. Transcripts are saved to conversation history for later review.
When proactive audio is enabled, the AI can initiate speech without waiting for the user to speak first, for example by offering follow-up suggestions or asking clarifying questions. This creates more natural turn-taking in tutoring scenarios.
A noise reduction toggle filters background audio from the user’s microphone, improving recognition accuracy in noisy environments like classrooms or open offices.
Users can type messages during an active voice conversation. The typed text is sent alongside the audio stream, and the AI responds with both voice and text.
Each voice session uses a short-lived ephemeral token generated by the server. No long-lived API keys are exposed to the browser. Tokens are created on-demand and expire after the session ends.
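
The general idea of an ephemeral session token can be sketched as follows: the server issues a random token on demand, accepts it only until it expires, and invalidates it when the session ends. This is a generic in-memory illustration, not Pria's or Google's token mechanism. The 60-second lifetime and all names are assumptions.

```python
import secrets
import time
from typing import Dict, Optional

TOKEN_TTL_SECONDS = 60  # assumed lifetime; the real value is not documented here

_tokens: Dict[str, float] = {}  # token -> expiry time on the monotonic clock


def issue_token(now: Optional[float] = None) -> str:
    """Create a random token on demand and record when it expires."""
    now = time.monotonic() if now is None else now
    token = secrets.token_urlsafe(32)
    _tokens[token] = now + TOKEN_TTL_SECONDS
    return token


def is_valid(token: str, now: Optional[float] = None) -> bool:
    """A token is valid only if it was issued here and has not expired."""
    now = time.monotonic() if now is None else now
    expiry = _tokens.get(token)
    return expiry is not None and now < expiry


def revoke(token: str) -> None:
    """Invalidate a token when its voice session ends."""
    _tokens.pop(token, None)
```

Because the browser only ever holds this short-lived value, a leaked token is useful for a limited time at most, while the long-lived API key stays on the server.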

User Experience

When a user opens Convo Mode with Gemini Live configured, they see:
  1. Voice selector — Dropdown with all 30 available Gemini voices
  2. Start/Stop button — Begins and ends the voice session
  3. Noise reduction toggle — Filters background noise from the microphone
  4. Proactive audio toggle — Allows the AI to speak without being prompted
  5. Text input toggle — Enables typing during a voice conversation
  6. Live transcript — Real-time display of both user speech and AI responses

Troubleshooting

If the voice session does not start:
  • Verify a Gemini API key is configured at the instance or environment level
  • Check that the instance’s voice provider is set to Gemini Live
  • Ensure the browser has microphone permissions granted

If audio is noisy, echoes, or cuts out:
  • Enable the noise reduction toggle in the Convo Mode panel
  • Use headphones to prevent echo/feedback
  • Check your internet connection; Gemini Live requires a stable WebSocket connection

If the AI does not use tools during a voice conversation:
  • Confirm tools are enabled on the instance
  • Some tools may not be available during voice mode; check the admin Tools panel