Gemini Live Voice - praxis-ai

Gemini Live brings real-time, natural voice conversations to your Digital Twin using Google’s native audio API. Unlike text-based approaches that convert speech to text and back, Gemini Live processes audio natively — resulting in lower latency and more natural-sounding responses.

Gemini Live is ideal for interactive learning, tutoring, and conversational scenarios where low-latency, natural dialogue is important.

How It Works

User Speaks

Audio is captured from the user’s microphone and streamed over a WebSocket connection directly to Google’s Gemini API.

Gemini Processes

Gemini processes the audio natively — understanding speech, generating a response, and synthesizing audio output in a single model pass.

User Hears

The AI’s audio response streams back in real time. Both sides of the conversation are transcribed for the chat history.

User speaks → WebSocket → Gemini (native audio processing) → Audio response → User hears
                                    ↕
                            Tool calls (search, files, web)
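
As a rough illustration of the "User Speaks" step, the sketch below splits captured microphone audio into small frames that could be sent over a WebSocket one at a time. It assumes 16-bit mono PCM at 16 kHz and a 100 ms frame size. The JSON envelope (`type`, `mime_type`, `data`) is a made-up placeholder for illustration, not Gemini's wire format or Pria's implementation.

```python
import base64
import json
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed input sample rate
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM


def pcm_chunks(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive frames of roughly chunk_ms milliseconds each."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms must be positive")
    for start in range(0, len(pcm), chunk_bytes):
        # The final frame may be shorter than the others.
        yield pcm[start:start + chunk_bytes]


def to_message(chunk: bytes) -> str:
    """Wrap one frame in a hypothetical JSON envelope (base64 payload)."""
    return json.dumps({
        "type": "audio_chunk",  # placeholder field, not a real API name
        "mime_type": f"audio/pcm;rate={SAMPLE_RATE_HZ}",
        "data": base64.b64encode(chunk).decode("ascii"),
    })
```

Small frames keep latency low because the server can start processing before the user finishes speaking. The right frame size in practice depends on the client and network.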

Prerequisites

Before enabling Gemini Live, ensure:

A Gemini API key is configured (at the instance or environment level)
The instance has a Gemini-compatible model assigned as the conversation model
The Convo mode voice provider is set to Gemini Live in the instance configuration

Supported Gemini Live models:

Model	Status
`gemini-3.1-flash-live-preview`	Default
`gemini-2.5-flash-native-audio-preview-12-2025`	Alternative

Admin Setup

Configure a Gemini API Key

Gemini Live uses a 2-tier API key resolution:

Instance level — The gemini_api_key field in the Configuration tab (highest priority)
Environment level — The GEMINI_API_KEY environment variable (lowest priority)

At least one of these must be set for Gemini Live to function.

This differs from standard Gemini chat, which also checks the AI model’s api_key field. Gemini Live does not use per-model API keys.
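
The two-tier resolution described above can be sketched as follows. This is a minimal illustration, not Pria's actual code: the function name and the dictionary-shaped instance configuration are assumptions, and treating a blank or whitespace-only value as "not set" is also an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_gemini_live_key(
    instance_config: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the key Gemini Live would use, or raise if none is set."""
    env = os.environ if environ is None else environ

    # Tier 1: instance-level gemini_api_key field (highest priority).
    instance_key = (instance_config.get("gemini_api_key") or "").strip()
    if instance_key:
        return instance_key

    # Tier 2: GEMINI_API_KEY environment variable (lowest priority).
    env_key = (env.get("GEMINI_API_KEY") or "").strip()
    if env_key:
        return env_key

    # Per-model api_key fields are deliberately not consulted here.
    raise LookupError(
        "No Gemini API key set at the instance or environment level"
    )
```

Passing `environ` explicitly makes the lookup easy to exercise in tests without touching the real process environment.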

Set the Voice Provider

In the Personalization tab under the Convo section, set the Convo mode voice provider to Gemini Live. This tells Pria to use the Gemini WebSocket API instead of OpenAI Realtime or ElevenLabs.

Choose a Default Voice

Select a default voice for your instance using the gemini_rt_voice setting. Users can override this in the Convo Mode panel.

Main voices:

Voice	Style
Puck	Upbeat
Charon	Informative
Kore	Firm
Fenrir	Excitable
Aoede	Breezy
Zephyr	Bright
Leda	Youthful
Orus	Firm

Additional voices are available in the voice selector dropdown when a user opens Convo Mode. Preview Gemini voices on Google AI Studio.
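
The relationship between the instance default and the user's override can be sketched as a simple precedence rule. The function name is hypothetical, the voice table only covers the main voices listed above, and falling back to the instance default when the user's choice is unknown is an assumption rather than documented behavior.

```python
from typing import Mapping, Optional

# Main voices and their styles, as listed in the table above.
MAIN_VOICES = {
    "Puck": "Upbeat",
    "Charon": "Informative",
    "Kore": "Firm",
    "Fenrir": "Excitable",
    "Aoede": "Breezy",
    "Zephyr": "Bright",
    "Leda": "Youthful",
    "Orus": "Firm",
}


def pick_voice(
    user_choice: Optional[str],
    instance_default: Optional[str],
    available: Mapping[str, str] = MAIN_VOICES,
) -> str:
    """Prefer the user's selection, then the instance's gemini_rt_voice."""
    for candidate in (user_choice, instance_default):
        if candidate and candidate in available:
            return candidate
    raise ValueError("No valid voice selected or configured")
```

Because more voices appear in the dropdown than in this table, a real implementation would pass the full list as `available`.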

Features

Tool Calling

During a voice conversation, the AI can invoke tools just like in text mode — searching the web, looking up files in IP Vault, querying Canvas, and more. Tool results are incorporated into the audio response naturally.

Live Transcription

Both the user’s speech (input) and the AI’s spoken responses (output) are transcribed in real time and displayed in the Convo Mode panel. Transcripts are saved to conversation history for later review.

Proactive Audio

When enabled, the AI can initiate speech without waiting for the user to speak first — for example, offering follow-up suggestions or asking clarifying questions. This creates more natural turn-taking in tutoring scenarios.

Noise Reduction

A noise reduction toggle filters background audio from the user’s microphone, improving recognition accuracy in noisy environments like classrooms or open offices.

Text Input During Voice

Users can type messages during an active voice conversation. The typed text is sent alongside the audio stream, and the AI responds with both voice and text.

Ephemeral Token Security

Each voice session uses a short-lived ephemeral token generated by the server, so no long-lived API keys are exposed to the browser. Tokens are created on demand and expire after the session ends.
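
The general pattern behind ephemeral tokens can be illustrated with a small in-memory issuer. This is a generic sketch and not how Pria or Google mint tokens: the class names, the 60-second time-to-live, and the in-memory store are all assumptions chosen for illustration.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class EphemeralToken:
    value: str
    expires_at: float


class TokenIssuer:
    """Hypothetical server-side issuer of short-lived session tokens."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._live: Dict[str, float] = {}

    def issue(self) -> EphemeralToken:
        # Created on demand; the long-lived API key never leaves the server.
        token = EphemeralToken(secrets.token_urlsafe(32),
                               self._clock() + self._ttl)
        self._live[token.value] = token.expires_at
        return token

    def is_valid(self, value: str) -> bool:
        expires_at = self._live.get(value)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            del self._live[value]  # drop expired tokens lazily
            return False
        return True

    def revoke(self, value: str) -> None:
        # Called when the voice session ends.
        self._live.pop(value, None)
```

The browser only ever holds the token value, so a leaked token is useful for at most one short window.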

User Experience

When a user opens Convo Mode with Gemini Live configured, they see:

Voice selector — Dropdown with available Gemini voices
Start/Stop button — Begins and ends the voice session
Noise reduction toggle — Filters background noise from the microphone
Proactive audio toggle — Allows the AI to speak without being prompted
Text input toggle — Enables typing during a voice conversation
Live transcript — Real-time display of both user speech and AI responses

Troubleshooting

Voice session fails to start

Verify a Gemini API key is configured at the instance or environment level
Check that the instance’s voice provider is set to Gemini Live
Ensure the browser has microphone permissions granted

Audio quality issues

Enable the noise reduction toggle in the Convo Mode panel
Use headphones to prevent echo/feedback
Check your internet connection — Gemini Live requires a stable WebSocket connection

Tools not working during voice

Confirm tools are enabled on the instance
Some tools may not be available during voice mode — check the admin Tools panel

Related

Convo Mode User Guide — End-user guide for voice conversations
AI Models — Configure Gemini models
ElevenLabs Integration — Alternative voice provider
Configuration — Instance voice settings
