
Model Usage

You can select the AI model that best suits each use from the list of models offered by the platform, or plug in your own custom AI model. Supported usages include:
  • Conversation
  • Image Analysis
  • Image Generation
  • Embeddings Generation
  • Audio Transcription
  • Text to Speech
  • Document Summarization
  • Speech to Speech (Conversation / Realtime)
  • Moderation
These models have various functions, performance profiles, and feature sets.
Models used for Conversation must support Tools and streaming simultaneously.

How Praxis AI Uses Models

Praxis AI can orchestrate multiple providers and models in parallel using a unified interface:
  • Configure several providers in Personalization and AI Models.
  • Assign preferred models to each Model Use (Conversation, Images, Audio, etc.).
  • Override the conversation provider for each Assistant.
This orchestration is transparent to your LMS, Web SDK, or REST integrations.

Model Selection

Default for Your Digital Twin

Each Digital Twin in Praxis AI can use different models optimized for its domain. To select or change models:
  1. Go to the Admin section.
  2. Edit your Digital Twin.
  3. Open the Personalization and AI Models section.
  4. Review or change the model used for each Model Use (Conversation, Images, Audio, etc.).

Conversation at Runtime

At runtime, you can switch the LLM used for Conversation from Settings in the Side Bar panel, and review model capabilities by clicking the Model Options detail. If a Custom Model is configured, it is displayed in the same section of the Side Bar.

Specific to Each Assistant

You can specify which Conversation model to use for each assistant. Behind the scenes, Praxis AI’s Neural Engine can route requests to different models based on:
  • Use case
  • Assistant-specific model
  • Token budget and cost constraints
  • Model availability and latency
  • User preferences and history
This allows you to balance quality, speed, and cost without changing your front-end integration.

Platform Models

Praxis AI middleware offers access to a broad catalog of state-of-the-art AI models. You can select the model that best fits your needs based on performance, cost, and capabilities. The default model is configured to use the latest, most capable model available on the platform. In most cases, you should keep the default selected unless you have a specific requirement (for example, strict cost control, a specific provider, or latency constraints). Models are accessed through the provider types described below.

Provider-Based Models

Praxis AI exposes conversation and related capabilities (vision, audio, embeddings, moderation, realtime) through two main provider types:
  • Amazon Bedrock
  • OpenAI-Compatible Clients
Each provider contains groups (Amazon, Anthropic, OpenAI, Gemini, etc.) and individual models with specific capabilities and uses.
Anthropic models via Bedrock are platform models of choice, mainly for Conversation and Image Analysis. Models marked with Extended support the optional 1M token context window (see Inference Settings).
| Model Name | Status | Capabilities | Input (tokens) | Output (tokens) | Thinking | Typical Uses |
|---|---|---|---|---|---|---|
| global.anthropic.claude-sonnet-4-6 | New | Tools, Streaming, Vision | 200,000 | 64,000 | Yes (Extended) | Conversation, Image Analysis, Summary |
| us.anthropic.claude-sonnet-4-6 | Default | Tools, Streaming, Vision | 200,000 | 64,000 | Yes (Extended) | Conversation, Image Analysis, Summary |
| us.anthropic.claude-sonnet-4-5-20250929-v1:0 | Current | Tools, Streaming, Vision | 200,000 | 64,000 | Yes (Extended) | Conversation, Image Analysis, Summary |
| us.anthropic.claude-sonnet-4-20250514-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 64,000 | Yes (Extended) | Conversation, Image Analysis, Summary |
| us.anthropic.claude-3-7-sonnet-20250219-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 64,000 | Yes | Conversation, Image Analysis, Summary |
| us.anthropic.claude-3-5-sonnet-20241022-v2:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 8,192 | | Conversation, Image Analysis |
| us.anthropic.claude-opus-4-6-v1 | New | Tools, Streaming, Vision | 200,000 | 128,000 | Yes (Extended) | Conversation, Image Analysis, Summary |
| us.anthropic.claude-opus-4-5-20251101-v1:0 | Current | Tools, Streaming, Vision | 200,000 | 64,000 | Yes | Conversation, Image Analysis, Summary |
| us.anthropic.claude-opus-4-1-20250805-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 32,000 | Yes | Conversation, Image Analysis |
| us.anthropic.claude-opus-4-20250514-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 32,000 | Yes | Conversation, Image Analysis |
| us.anthropic.claude-haiku-4-5-20251001-v1:0 | Current | Tools, Streaming, Vision | 200,000 | 64,000 | Yes | Conversation, Summary, Image Analysis |
| us.anthropic.claude-3-5-haiku-20241022-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 8,192 | | Conversation, Image Analysis |
Deprecated models will be removed in a future release. Migrate to a newer model. When a deprecated model is removed, any assistant or configuration referencing it will automatically fall back to the institution’s default model.
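The fallback behavior described above can be illustrated with a short sketch. The helper function and catalog structure are hypothetical; the platform performs this resolution internally.

```python
# Sketch of the deprecated-model fallback: if a referenced model has been
# removed from the catalog, resolution falls back to the institution default.
# The catalog contents here are a small illustrative subset.

CATALOG = {
    "us.anthropic.claude-sonnet-4-6",
    "us.anthropic.claude-haiku-4-5-20251001-v1:0",
}

def resolve_model(configured: str, default: str) -> str:
    """Return the configured model if it still exists, else the default."""
    return configured if configured in CATALOG else default

# A removed deprecated model falls back to the institution default:
resolved = resolve_model("us.anthropic.claude-3-5-haiku-20241022-v1:0",
                         default="us.anthropic.claude-sonnet-4-6")
```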
The following models are configured against the OpenAI API and are used across Conversation, Image Analysis, Summary, Audio, TTS, Moderation, and Realtime.

Conversation / Vision / Summary

| Model Name | Status | Capabilities | Input (tokens) | Output (tokens) | Thinking | Typical Uses |
|---|---|---|---|---|---|---|
| gpt-5.2 | Current | Tools, Streaming, Vision, MCP | 400,000 | 128,000 | Yes | Conversation, Image Analysis, Summary |
| gpt-5.1 | Current | Tools, Streaming, Vision, MCP | 400,000 | 128,000 | Yes | Conversation, Image Analysis, Summary |
| gpt-5-2025-08-07 | Deprecated | Tools, Streaming, Vision, MCP | 272,000 | 128,000 | Yes | Conversation, Image Analysis, Summary |
| gpt-5-mini | Current | Tools, Streaming, Vision, MCP | 272,000 | 128,000 | Yes | Conversation, Image Analysis, Summary |
| gpt-5-nano-2025-08-07 | Current | Tools, Streaming, Vision, MCP | 272,000 | 128,000 | Yes | Conversation, Image Analysis, Summary |
| gpt-5 | Deprecated | Tools, Streaming, Vision, MCP | 272,000 | 128,000 | Yes | Conversation, Image Analysis, Summary |
| gpt-4.1 | Current | Tools, Streaming, Vision, MCP | 1,047,576 | 32,768 | | Conversation, Image Analysis, Summary |
| gpt-4o | Deprecated | Tools, Streaming, Vision | 128,000 | 16,384 | | Conversation, Image Analysis, Summary |
| gpt-4o-mini | Deprecated | Tools, Streaming, Vision | 128,000 | 16,384 | | Conversation, Image Analysis, Summary |
| o4-mini-deep-research | Specialized | Streaming, Vision, MCP | 200,000 | 100,000 | Yes | Deep research, Image Analysis |
| o4-mini | Current | Tools, Streaming, Vision, MCP | 200,000 | 100,000 | Yes | Conversation, Image Analysis |
| o3-deep-research | Specialized | Streaming, Vision, MCP | 200,000 | 100,000 | Yes | Deep research, Image Analysis |
| o3-pro | Deprecated | Tools, Streaming, Vision, MCP | 200,000 | 100,000 | Yes | Conversation, Image Analysis |
| o3 | Deprecated | Tools, Streaming, Vision, MCP | 200,000 | 100,000 | Yes | Conversation, Image Analysis |
| o3-mini | Deprecated | Tools, Streaming, Vision | 200,000 | 100,000 | Yes | Conversation, Image Analysis |
| o1 | Deprecated | Tools, Streaming, Vision | 200,000 | 100,000 | Yes | Conversation, Image Analysis |

Image Generation

| Model Name | Status | Capabilities | Typical Uses |
|---|---|---|---|
| gpt-image-1.5 | New | Vision | Image Generation |
| gpt-image-1 | Current | Vision | Image Generation |
| gpt-image-1-mini | Current | Vision | Image Generation |
| dall-e-3 | Current | Vision | Image Generation |

Embeddings

| Model Name | Input (tokens) | Vector Dimensions | Typical Uses |
|---|---|---|---|
| text-embedding-3-small | 8,191 | 1,536 | Embeddings |
| text-embedding-3-large | 8,191 | 3,072 | Embeddings |
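As a rough illustration of how the vector dimensions in the table are used, the sketch below compares two 1,536-dimensional vectors with cosine similarity. The vectors are simulated with random values; no embeddings API is called.

```python
import math
import random

random.seed(0)

# text-embedding-3-small returns 1,536-dimensional vectors; simulate two here.
DIM = 1536
a = [random.gauss(0, 1) for _ in range(DIM)]
b = [random.gauss(0, 1) for _ in range(DIM)]

def cosine(u, v):
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

score = cosine(a, b)  # in [-1, 1]; near 0 for unrelated random vectors
```

In practice the two vectors would come from the embeddings model for two pieces of text, and higher scores indicate more semantically similar content.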

Audio Transcription and Translation

| Model Name | Input (Hz) | Output (tokens) | Typical Uses |
|---|---|---|---|
| whisper-1 | | | Audio Analysis |
| gpt-4o-mini-transcribe | 16,000 | 2,000 | Audio Analysis (Default) |
| gpt-4o-transcribe | 16,000 | 2,000 | Audio Analysis |
| gpt-4o-transcribe-diarize | 16,000 | 2,000 | Audio Analysis (Speaker ID) |

Text-to-Speech (TTS)

| Model Name | Typical Uses |
|---|---|
| tts-1 | TTS |
| tts-1-hd | TTS |
| gpt-4o-mini-tts | TTS |

Moderation

| Model Name | Typical Uses |
|---|---|
| omni-moderation-latest | Moderation |

Real-Time Speech-to-Speech (RT / STS)

| Model Name | Status | Input Tokens | Output Tokens | Typical Uses |
|---|---|---|---|---|
| gpt-realtime | Default | 32,000 | 4,096 | Realtime voice agent |
| gpt-realtime-mini | Current | 32,000 | 4,096 | Realtime voice agent |
| gpt-4o-realtime-preview | Deprecated | 32,000 | 4,096 | Realtime voice agent |
| gpt-4o-mini-realtime-preview | Current | 16,000 | 4,096 | Realtime voice agent |

Reasoning Effort

Some AI models support extended thinking (also called reasoning), where the model can spend additional time analyzing a problem before responding. Praxis AI provides a unified 5-level reasoning effort system that works across all supported providers.
| Level | Description | Best For |
|---|---|---|
| None | Disable thinking. Fastest responses, lowest cost. | Simple queries, quick lookups |
| Low | Minimal reasoning. | Straightforward questions |
| Medium | Balanced reasoning. | Most everyday tasks |
| High | Thorough reasoning. | Complex analysis, multi-step problems |
| Max | Maximum reasoning depth. Highest latency and cost. | Research, detailed technical analysis |

How Reasoning Effort is Applied

The reasoning effort level is resolved using this priority:
  1. AI Model override: if a custom AI model has a reasoning effort configured, it takes precedence.
  2. Institution setting: the institution-level default reasoning effort.
  3. Platform default: None (thinking disabled).
Reasoning effort is mapped to each provider’s native format automatically — OpenAI reasoning_effort, Anthropic budget_tokens, Gemini thinkingConfig, and Bedrock budgetTokens. You don’t need to configure provider-specific parameters.
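The mapping to provider-native parameters might be sketched like this. The parameter names come from the providers as listed above; the token budget numbers and the clamping of OpenAI's categorical scale are illustrative guesses, not the platform’s real values.

```python
# Unified 5-level reasoning effort mapped to provider-native parameters.
# OpenAI uses a categorical reasoning_effort; Anthropic and Bedrock use token
# budgets; Gemini uses thinkingConfig. Budget values below are illustrative only.

LEVELS = ["none", "low", "medium", "high", "max"]

def to_provider_params(level: str, provider: str) -> dict:
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    if provider == "openai":
        # Clamp "none"/"max" onto OpenAI's categorical scale (assumed mapping).
        effort = {"none": "minimal", "max": "high"}.get(level, level)
        return {"reasoning_effort": effort}
    budgets = {"none": 0, "low": 1024, "medium": 4096, "high": 16384, "max": 32768}
    tokens = budgets[level]
    if provider == "anthropic":
        if tokens == 0:
            return {"thinking": {"type": "disabled"}}
        return {"thinking": {"type": "enabled", "budget_tokens": tokens}}
    if provider == "gemini":
        return {"thinkingConfig": {"thinkingBudget": tokens}}
    if provider == "bedrock":
        return {"budgetTokens": tokens}
    raise ValueError(f"unknown provider: {provider}")
```

The point of the unified scale is that callers only ever pick one of the five levels; translation like the above happens once in the middleware.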

Models with Thinking Support

Not all models support extended thinking. Look for models marked with thinking support in the tables above. Currently supported thinking models include:
  • Anthropic: Claude Opus 4.6, Opus 4.5, Sonnet 4.6, Sonnet 4.5, Sonnet 4, Claude 3.7 Sonnet, Haiku 4.5 (via Bedrock or Direct API)
  • OpenAI: GPT-5 series (5.2, 5.1, 5-mini, 5-nano), o-series (o4-mini, o3, o3-mini, o1)
  • Google Gemini: Gemini 3.1 Pro Preview, Gemini 3 Flash/Pro Preview, Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash Lite

Provider Types

Praxis AI routes AI requests through four backend providers:
ProviderHow It Works
Amazon BedrockModels hosted on AWS infrastructure. Uses IAM credentials for authentication.
OpenAI APIDirect OpenAI API calls. Used for OpenAI models and OpenAI-compatible endpoints.
Anthropic Direct APIDirect Anthropic API calls. Bypasses Bedrock for Claude models when preferred.
Google GenAIDirect Google Gemini API calls via the @google/genai SDK.
Some model families (e.g., Anthropic Claude) are available through multiple providers — both via Bedrock and via Direct API. The admin can choose which provider to use based on latency, cost, and regional availability preferences.
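That admin-level choice can be reduced to a small lookup. The function and route names are hypothetical; only the provider types themselves come from the table above.

```python
def client_for(model_family: str, prefer_direct: bool = False) -> str:
    """Pick a backend for a model family; Claude is the family available
    through both Bedrock and the Anthropic Direct API."""
    routes = {
        "claude": "anthropic_direct" if prefer_direct else "bedrock",
        "gpt": "openai",
        "gemini": "google_genai",
    }
    return routes[model_family]
```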

Bring Your Own AI Model (BYOM)

You can connect your own hosted LLM (for example, a model deployed on Google Vertex AI, private OpenAI-compatible endpoint, or a Bedrock-hosted custom model) and use it as a replacement for any of the supported usages.

Configure a Custom Model

To add a custom model for Conversation (or any other use):
  1. In the Admin UI, edit your Digital Twin.
  2. Under Personalization and AI Models, click Add AI Model.
  3. In the Add AI Model panel, enter the properties required to connect to your LLM:
  • Model Name: the exact model identifier published by your hosting platform. This value is case sensitive and must match your provider’s model name, for example gemini-flash or projects/my-proj/locations/us/models/my-model.
  • Status: Active models are considered by the system for routing and selection; Inactive models are ignored but kept in configuration.
  • Description: a human-readable description of the LLM for admins and authors using this Digital Twin.
  • Model Use: the specific usage for this model (for example, Conversation, Image Generation, Document Summarization). This determines which internal calls will use this model.
  • Client Library Type: choose Open AI for OpenAI-compatible endpoints (including many custom or Vertex AI gateways exposing an OpenAI-style API), or Bedrock for Amazon Bedrock-hosted models. Most Gemini-based models connected through an OpenAI-compatible proxy should use Open AI.
  • API URL: the base public URL of your model endpoint, for example https://ai.my-school.edu or your Bedrock-compatible endpoint. Typically, the model name or ID is appended to this base URL when interacting with the LLM.
  • API Key: the secret key used to authenticate requests to your endpoint. Keep this key secure and confidential, and rotate it periodically.
  4. Click Save to register the new custom AI model.
Once saved:
  • The model appears in the list of custom AI models.
  • For its configured Model Use, it will replace the platform default model.
  • All conversations or tasks mapped to that Model Use will start using your custom model without any client-side code changes.
Use a non-production Digital Twin first to validate latency, cost, and behavior of your custom model before assigning it to high-traffic or mission-critical usages.
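Putting the fields together, a registration for a custom Conversation model might look like the sketch below. The key names mirror the form fields above but are hypothetical; the endpoint and API key are placeholders.

```python
# Hypothetical BYOM registration payload for a custom Conversation model.
custom_model = {
    "model_name": "gemini-flash",            # case-sensitive provider identifier
    "status": "Active",                      # only Active models are routed to
    "description": "School-hosted Gemini gateway for Conversation",
    "model_use": "Conversation",             # which internal calls use this model
    "client_library_type": "Open AI",        # OpenAI-compatible endpoint
    "api_url": "https://ai.my-school.edu",   # model name is appended to this base URL
    "api_key": "sk-REPLACE-ME",              # keep secret; rotate periodically
}
```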

End-to-End Workflow

1. Configure Provider Credentials

Go to Configuration → Personalization and AI Models and enter API keys and endpoints for each provider you plan to use (OpenAI-compatible, Bedrock, or custom gateways).
2. Select Models per Usage

For each Model Use (Conversation, Image, Audio, etc.), select the preferred model from the list of available platform and custom models.
3. Enable and Test Your Digital Twin

Use the Test or preview mode to run conversations against your updated configuration. Validate:
  • Response quality
  • Latency
  • Tool and streaming support (for Conversation models)
4. Monitor and Optimize

Use Analytics to track token usage, latency, and error rates per model. Adjust your model selection or routing preferences to balance performance and cost.
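The per-model metrics mentioned above can be approximated from request logs, as in this sketch. The log record shape is hypothetical; Praxis AI's Analytics computes these figures for you.

```python
from collections import defaultdict

# Hypothetical request log records: (model, tokens, latency_ms, error)
logs = [
    ("gpt-5-mini", 1200, 640, False),
    ("gpt-5-mini", 900, 580, False),
    ("us.anthropic.claude-sonnet-4-6", 2000, 950, True),
]

stats = defaultdict(lambda: {"requests": 0, "tokens": 0, "latency_ms": 0, "errors": 0})
for model, tokens, latency, error in logs:
    s = stats[model]
    s["requests"] += 1
    s["tokens"] += tokens
    s["latency_ms"] += latency
    s["errors"] += int(error)

# Derive the per-model averages used to compare models.
for model, s in stats.items():
    s["avg_latency_ms"] = s["latency_ms"] / s["requests"]
    s["error_rate"] = s["errors"] / s["requests"]
```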
5. Scale to Production

Once validated, deploy your Digital Twin to users through LMS integration (e.g., Canvas), Web SDK, or REST APIs—no additional code changes required when switching models.
6. Connect New Digital Twins

Repeat the configuration setup for any additional twins so they can connect to the same custom LLM.
Need help choosing models or configuring BYOM? Praxis AI supports multi-LLM orchestration and can route across OpenAI, Anthropic, Amazon, Google, Mistral, and your own hosted models in a single Digital Twin configuration.