
Model Usage

You can select which AI model suits you best for different uses from the list of models offered by the platform or plug in your own custom AI model. Supported usages include:
  • Conversation
  • Image Analysis
  • Image Generation
  • Embeddings Generation
  • Audio Transcription
  • Text to Speech
  • Document Summarization
  • Speech to Speech (Conversation / Realtime)
  • Moderation
These models have various functions, performance profiles, and feature sets.
Models used for Conversation must support Tools and streaming simultaneously.

How Praxis AI Uses Models

Praxis AI can orchestrate multiple providers and models in parallel using a unified interface:
  • Configure several providers in Personalization and AI Models.
  • Assign preferred models to each Model Use (Conversation, Images, Audio, etc.).
  • Override the Conversation model for individual Assistants.
This orchestration is transparent to your LMS, Web SDK, or REST integrations.

Model Selection

Default for Your Digital Twin

Each Digital Twin in Praxis AI can use different models optimized for its domain. To select or change models:
  1. Go to the Admin section.
  2. Edit your Digital Twin.
  3. Open the Personalization and AI Models section.
  4. Review or change the model used for each Model Use (Conversation, Images, Audio, etc.).

Conversation at Runtime

At runtime, you can switch the LLM used for Conversation from Settings in the Side Bar panel, and review a model’s capabilities by clicking the Model Options detail. A Custom Model, if one is configured, is displayed in the same section of the Side Bar.

Specific to Each Assistant

You can specify which Conversation model to use for each Assistant.
Behind the scenes, Praxis AI’s Neural Engine can route requests to different models based on:
  • Use Case
  • Assistant Specific model
  • Token budget and cost constraints
  • Model availability and latency
  • User preferences and history
This allows you to balance quality, speed, and cost without changing your front-end integration.

Per-Instance Model Selection

The model catalog described in this page is platform-wide — every Digital Twin has access to the same providers and models. What differs across Digital Twins is which model is selected for each Model Use.
  • Catalog is curated by Praxis AI: providers, model identifiers, status (New / Current / Default / Deprecated), token limits, capabilities, and Thinking support.
  • Selection is per Digital Twin: each instance picks its own model for Conversation, Image Analysis, Image Generation, Summary, Embeddings, Audio, TTS, Moderation, and Realtime — from the dropdowns under Personalization.
  • Overrides can be set per Assistant (conversation model) or per Custom AI Model (BYOM). See the Bring Your Own AI Model section.
When a model you have selected is removed from the catalog, the affected Digital Twins automatically fall back to the institution’s default model for that capability. No manual update is required, but admins are encouraged to review their selections whenever the catalog changes, for example when a new “Default” model is published.
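The fallback rule can be pictured as a small lookup. This is a minimal sketch under stated assumptions, not Praxis AI’s implementation: `CATALOG`, `resolve_model`, and the data shapes are hypothetical names chosen for illustration.

```python
# Hypothetical catalog snapshot: model identifier -> status label.
CATALOG = {
    "us.anthropic.claude-sonnet-4-6": "Default",
    "us.anthropic.claude-sonnet-4-5-20250929-v1:0": "Current",
}


def resolve_model(selected, institution_default, catalog=CATALOG):
    """Return the selected model while it is still in the catalog,
    otherwise fall back to the institution's default model."""
    if selected and selected in catalog:
        return selected
    return institution_default
```

A selection that no longer appears in the catalog, or an empty selection, resolves to the default without any change to the stored configuration.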

Platform Models

Praxis AI middleware offers access to a broad catalog of state-of-the-art AI models. You can select the model that best fits your needs based on performance, cost, and capabilities. The default model is configured to use the latest, most capable model available on the platform. In most cases, you should keep default selected unless you have a specific requirement (for example, strict cost control, specific provider, or latency constraints). Models can be accessed using:

Provider-Based Models

Praxis AI exposes conversation and related capabilities (vision, audio, embeddings, moderation, realtime) through multiple provider types:
  • Amazon Bedrock
  • OpenAI-Compatible Clients (OpenAI, Cohere)
  • Anthropic Direct API
  • Google Gemini Native SDK
  • Mistral AI Native SDK
  • xAI Native API
  • Stability AI Native API
Each provider contains groups and individual models with specific capabilities and uses.
Anthropic models via Bedrock are platform models of choice, mainly for Conversation and Image Analysis. Models marked with Extended support the optional 1M token context window (see Inference Settings).
Claude Fable 5 on Bedrock — data-sharing opt-in required. Anthropic requires 30-day data retention for Fable/Mythos-class traffic on Bedrock. Your AWS account must set its Bedrock data-retention mode to provider_data_sharing (via the Bedrock Data Retention API — no console UI at launch) before global.anthropic.claude-fable-5 can be invoked; otherwise requests fail with “data retention mode ‘default’ is not available for this model.” If that data-sharing posture isn’t acceptable for your deployment, use claude-fable-5 on the Anthropic Direct API instead, which does not require the Bedrock opt-in.
Model Name | Status | Capabilities | Input (tokens) | Output (tokens) | Thinking | Typical Uses
global.anthropic.claude-fable-5 | New | Tools, Streaming, Vision | 1,000,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
global.anthropic.claude-sonnet-4-6 | New | Tools, Streaming, Vision | 1,000,000 | 64,000 | Yes (Extended) | Conversation, Image Analysis, Summary
us.anthropic.claude-sonnet-4-6 | Default | Tools, Streaming, Vision | 1,000,000 | 64,000 | Yes (Extended) | Conversation, Image Analysis, Summary
us.anthropic.claude-sonnet-4-5-20250929-v1:0 | Current | Tools, Streaming, Vision | 200,000 | 64,000 | Yes (Extended) | Conversation, Image Analysis, Summary
us.anthropic.claude-sonnet-4-20250514-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 64,000 | Yes (Extended) | Conversation, Image Analysis, Summary
us.anthropic.claude-3-7-sonnet-20250219-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 64,000 | Yes | Conversation, Image Analysis, Summary
us.anthropic.claude-3-5-sonnet-20241022-v2:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 8,192 | - | Conversation, Image Analysis
global.anthropic.claude-opus-4-7 | New | Tools, Streaming, Vision | 1,000,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
us.anthropic.claude-opus-4-7 | New | Tools, Streaming, Vision | 1,000,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
global.anthropic.claude-opus-4-6 | Current | Tools, Streaming, Vision | 1,000,000 | 128,000 | Yes (Extended) | Conversation, Image Analysis, Summary
us.anthropic.claude-opus-4-6-v1 | Current | Tools, Streaming, Vision | 1,000,000 | 128,000 | Yes (Extended) | Conversation, Image Analysis, Summary
us.anthropic.claude-opus-4-5-20251101-v1:0 | Current | Tools, Streaming, Vision | 200,000 | 64,000 | Yes | Conversation, Image Analysis, Summary
us.anthropic.claude-opus-4-1-20250805-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 32,000 | Yes | Conversation, Image Analysis
us.anthropic.claude-opus-4-20250514-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 32,000 | Yes | Conversation, Image Analysis
us.anthropic.claude-haiku-4-5-20251001-v1:0 | Current | Tools, Streaming, Vision | 200,000 | 64,000 | Yes | Conversation, Summary, Image Analysis
us.anthropic.claude-3-5-haiku-20241022-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 8,192 | - | Conversation, Image Analysis
Deprecated models will be removed in a future release. Migrate to a newer model. When a deprecated model is removed, any assistant or configuration referencing it will automatically fall back to the institution’s default model.
Stability AI models are no longer available through Bedrock. They are now served via the Stability AI Native API — see the dedicated accordion below.
These models are configured against the OpenAI API and used across Conversation, Image Analysis, Summary, Audio, TTS, Moderation, and Realtime.

Conversation / Vision / Summary

Model Name | Status | Capabilities | Input (tokens) | Output (tokens) | Thinking | Typical Uses
gpt-5.4 | New | Tools, Streaming, Vision, MCP | 1,050,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
gpt-5.4-pro | New | Tools, Streaming, Vision, MCP | 1,050,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
gpt-5.4-mini | New | Tools, Streaming, Vision, MCP | 400,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
gpt-5.4-nano | New | Tools, Streaming, Vision, MCP | 400,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
gpt-5.2 | Current | Tools, Streaming, Vision, MCP | 400,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
gpt-5.1 | Current | Tools, Streaming, Vision, MCP | 400,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
gpt-5-2025-08-07 | Deprecated | Tools, Streaming, Vision, MCP | 272,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
gpt-5-mini | Current | Tools, Streaming, Vision, MCP | 400,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
gpt-5-nano-2025-08-07 | Current | Tools, Streaming, Vision, MCP | 400,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
gpt-5 | Deprecated | Tools, Streaming, Vision, MCP | 272,000 | 128,000 | Yes | Conversation, Image Analysis, Summary
gpt-4.1 | Deprecated | Tools, Streaming, Vision, MCP | 1,047,576 | 32,768 | - | Conversation, Image Analysis, Summary
gpt-4.1-mini | Deprecated | Tools, Streaming, Vision, MCP | 1,047,576 | 32,768 | - | Conversation, Image Analysis, Summary
gpt-4.1-nano | Deprecated | Tools, Streaming, Vision, MCP | 1,047,576 | 32,768 | - | Conversation, Image Analysis, Summary
gpt-4o | Deprecated | Tools, Streaming, Vision | 128,000 | 16,384 | - | Conversation, Image Analysis, Summary
gpt-4o-mini | Deprecated | Tools, Streaming, Vision | 128,000 | 16,384 | - | Conversation, Image Analysis, Summary
o4-mini-deep-research | Specialized | Streaming, Vision, MCP | 200,000 | 100,000 | Yes | Deep research, Image Analysis
o4-mini | Current | Tools, Streaming, Vision, MCP | 200,000 | 100,000 | Yes | Conversation, Image Analysis
o3-deep-research | Specialized | Streaming, Vision, MCP | 200,000 | 100,000 | Yes | Deep research, Image Analysis
o3-pro | Deprecated | Tools, Streaming, Vision, MCP | 200,000 | 100,000 | Yes | Conversation, Image Analysis
o3 | Deprecated | Tools, Streaming, Vision, MCP | 200,000 | 100,000 | Yes | Conversation, Image Analysis
o3-mini | Deprecated | Tools, Streaming, Vision | 200,000 | 100,000 | Yes | Conversation, Image Analysis
o1 | Deprecated | Tools, Streaming, Vision | 200,000 | 100,000 | Yes | Conversation, Image Analysis

Image Generation

Model Name | Status | Capabilities | Typical Uses
gpt-image-1.5 | New | Vision | Image Generation
gpt-image-1 | Current | Vision | Image Generation
gpt-image-1-mini | Current | Vision | Image Generation
dall-e-3 | Current | Vision | Image Generation
When asked, Pria can produce these images in shapes beyond the default square — the gpt-image models support square (1024×1024), landscape (1536×1024) and portrait (1024×1536); dall-e-3 supports square (1024×1024), landscape (1792×1024) and portrait (1024×1792). If a requested size isn’t supported by the chosen model, Pria automatically uses the closest available size.
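As an illustration of the "closest available size" behaviour, the sketch below picks the supported size whose aspect ratio is nearest to the request. The sizes are the ones listed above; the choice of aspect ratio as the distance measure, and the names `SUPPORTED_SIZES` and `closest_size`, are assumptions made for this example rather than documented platform behaviour.

```python
import math

# Supported (width, height) pairs, as listed in the text above.
SUPPORTED_SIZES = {
    "gpt-image": [(1024, 1024), (1536, 1024), (1024, 1536)],
    "dall-e-3": [(1024, 1024), (1792, 1024), (1024, 1792)],
}


def closest_size(family, width, height):
    """Pick the supported size whose aspect ratio is nearest the request.

    Ratios are compared on a log scale so that landscape and portrait
    requests are treated symmetrically (an assumption of this sketch).
    """
    requested = math.log(width / height)
    return min(
        SUPPORTED_SIZES[family],
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - requested),
    )
```

For example, a 1920×1080 request against dall-e-3 resolves to the 1792×1024 landscape size, and a 900×1600 request against a gpt-image model resolves to 1024×1536.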

Video Generation

Model Name | Status | Capabilities | Typical Uses
sora-2 | New | Text to Video | Video Generation (4 / 8 / 12s)
sora-2-pro | New | Text to Video | Video Generation (higher fidelity, selected when quality='high')

Embeddings

Model Name | Input (tokens) | Vector Dimensions | Typical Uses
text-embedding-3-small | 8,191 | 1,536 | Embeddings
text-embedding-3-large | 8,191 | 3,072 | Embeddings

Audio Transcription and Translation

Model Name | Input (Hz) | Output (tokens) | Typical Uses
whisper-1 | - | - | Audio Analysis
gpt-4o-mini-transcribe | 16,000 | 2,000 | Audio Analysis (Default)
gpt-4o-transcribe | 16,000 | 2,000 | Audio Analysis
gpt-4o-transcribe-diarize | 16,000 | 2,000 | Audio Analysis (Speaker ID)

Text-to-Speech (TTS)

Model Name | Typical Uses
tts-1 | TTS
tts-1-hd | TTS
gpt-4o-mini-tts | TTS

Moderation

Model Name | Typical Uses
omni-moderation-latest | Moderation

Real-Time Speech-to-Speech (RT / STS)

Model Name | Status | Input Tokens | Output Tokens | Reasoning | Typical Uses
gpt-realtime-2 | Default | 128,000 | 32,000 | Yes | Realtime voice agent (reasoning, GPT-5-class)
gpt-realtime-1.5 | Current | 32,000 | 4,096 | - | Realtime voice agent
gpt-realtime | Current | 32,000 | 4,096 | - | Realtime voice agent
gpt-realtime-mini | Current | 32,000 | 4,096 | - | Realtime voice agent
gpt-4o-realtime-preview | Deprecated | 32,000 | 4,096 | - | Realtime voice agent
gpt-4o-mini-realtime-preview | Deprecated | 16,000 | 4,096 | - | Realtime voice agent
OpenAI Voices: Cedar (New), Marin (New), Alloy, Ash, Ballad, Coral, Echo, Sage, Shimmer, Verse
Mistral AI models are accessed through the native Mistral SDK (@mistralai/mistralai) and are used for Conversation, Image Analysis, Summary, Audio, TTS, Embeddings, and Moderation. Requires an API key.

Conversation / Vision / Summary

Model Name | Label | Capabilities | Input (tokens) | Output (tokens) | Typical Uses
mistral-large-latest | Default | Tools, Streaming, Vision | 128,000 | 8,192 | Conversation, Image Analysis, Summary
mistral-medium-2508 | - | Tools, Streaming, Vision | 131,072 | 8,192 | Conversation, Image Analysis, Summary
mistral-small-2506 | - | Tools, Streaming, Vision | 128,000 | 8,192 | Conversation, Image Analysis, Summary
pixtral-large-latest | Vision | Tools, Streaming, Vision | 128,000 | 8,192 | Conversation, Image Analysis
magistral-medium-2509 | Reasoning | Tools, Streaming, Vision | 128,000 | 8,192 | Conversation, Image Analysis, Summary
magistral-small-2509 | Reasoning Fast | Tools, Streaming, Vision | 40,000 | 8,192 | Conversation, Image Analysis, Summary
codestral-2508 | Code | Tools, Streaming, Vision | 256,000 | 8,192 | Conversation, Summary
devstral-medium-2507 | Developer | Tools, Streaming, Vision | 128,000 | 8,192 | Conversation, Image Analysis, Summary
mistral-saba-latest | Multilingual | Tools, Streaming | 32,000 | 8,192 | Conversation, Summary

Deprecated Conversation Models

Model Name | Capabilities | Input (tokens) | Typical Uses
pixtral-large-2411 | Tools, Streaming, Vision | 128,000 | Conversation, Image Analysis
mistral-large-2411 | Tools, Streaming, Vision | 128,000 | Conversation, Summary

Audio Transcription (STT)

Model Name | Typical Uses
voxtral-mini-latest | Audio Analysis (Default)
voxtral-mini-2507 | Audio Analysis

Text-to-Speech (TTS)

Model Name | Typical Uses
voxtral-mini-tts-2603 | TTS

Embeddings

Model Name | Input (tokens) | Typical Uses
mistral-embed | 8,192 | Embeddings
codestral-embed | 8,192 | Embeddings

Moderation

Model Name | Typical Uses
mistral-moderation-2411 | Moderation
xAI models are accessed through xAI’s native API and are used for Conversation, Image Analysis, Summary, Code, Image Generation, Embeddings, TTS, and Real-Time Voice. Requires an API key.

Conversation / Vision / Summary

Model Name | Status | Capabilities | Input (tokens) | Thinking | Typical Uses
grok-4.20-0309-reasoning | Current | Tools, Streaming, Vision | 2,000,000 | Yes | Conversation, Image Analysis, Summary
grok-4.20-0309-non-reasoning | Current | Tools, Streaming, Vision | 2,000,000 | - | Conversation, Image Analysis, Summary
grok-4.20-multi-agent-0309 | Current | Tools, Streaming, Vision | 2,000,000 | Yes | Conversation, Image Analysis, Summary
grok-4-1-fast-reasoning | Current | Tools, Streaming, Vision | 2,000,000 | Yes | Conversation, Image Analysis, Summary
grok-4-1-fast-non-reasoning | Current | Tools, Streaming, Vision | 2,000,000 | - | Conversation, Image Analysis, Summary
grok-4 | Deprecated | Tools, Streaming, Vision | 2,000,000 | - | Deprecated (alias)
grok-build-0.1 | Current | Tools, Streaming, Vision | 256,000 | Yes | Agentic coding Conversation, Image Analysis
grok-code-fast-1 | Current | Tools, Streaming | 256,000 | - | Code-focused Conversation

Image Generation

Model Name | Status | Capabilities | Typical Uses
grok-imagine-image-pro | Current | Vision | Image Generation
grok-imagine-image | Current | Vision | Image Generation
These models are shaped by aspect ratio rather than exact pixel size — Pria can request a square, landscape, or portrait shape (e.g. 1:1, 16:9, 9:16, 3:2) and defaults to a square. An unsupported shape is mapped to the closest available aspect.

Embeddings

Model Name | Input (tokens) | Vector Dimensions | Typical Uses
grok-embedding-small | 8,000 | 1,024 | Embeddings

Text-to-Speech (TTS)

Model Name | Typical Uses
xai-tts | TTS
xAI Voices: Eve, Ara, Rex, Sal, Leo

Real-Time Speech-to-Speech (xAI Voice Agent)

Model Name | Status | Typical Uses
grok-3-fast | Default | Realtime voice agent
xAI RT Voices: Eve (Default), Ara, Rex, Sal, Leo
Audio transcription (STT) for xAI delegates to the configured OpenAI transcription model (e.g., gpt-4o-mini-transcribe).
Stability AI models are accessed through Stability’s v2beta REST API and are dedicated to media generation: Image, Audio, and Video. Requires an API key (STABILITY_API_KEY).

Image Generation

Model Name | Label | Capabilities | Typical Uses
stability.stable-image-ultra | Stable Image Ultra | Vision | Image Generation
stability.stable-image-core | Stable Image Core | Vision | Image Generation
stability.sd3.5-large | SD 3.5 Large | Vision | Image Generation
These models are shaped by aspect ratio rather than exact pixel size — Pria can request shapes such as 1:1, 16:9, 9:16, 21:9, 3:2, or 4:5 and defaults to a square. An unsupported shape is mapped to the closest available aspect.

Audio Generation

Model Name | Label | Typical Uses
stability.stable-audio-2 | Stable Audio 2 (text to audio, up to 190s) | Audio Generation

Video Generation

Model Name | Label | Status | Typical Uses
stability.image-to-video | Stable Video (DEPRECATED; provider shut down 2025-07-24) | Deprecated | Video Generation
Stability AI retired its video generation API on 2025-07-24. The model is kept in the dropdown for backward compatibility but calls return a deprecation message. Select Amazon Nova Reel (Bedrock, default) or Sora 2 (OpenAI) for video generation.
Stability AI remains a dedicated media-generation provider for Image and Audio — it does not expose Conversation, Embeddings, STT, or RT Voice. Conversation models from OpenAI, Anthropic, Gemini, Mistral, xAI, or Bedrock can invoke generate_image and generate_audio tools that route to Stability, and generate_video routes to Nova Reel or Sora 2 depending on videoGenerationModel.

Reasoning Effort & Thinking

Some AI models support extended thinking (also called reasoning), where the model spends additional internal tokens analyzing a problem before producing a visible response. Praxis AI provides a unified 5-level reasoning effort system that works across all supported providers.

The 5 Effort Levels

Level | Description | Best For
None | Disable thinking. Fastest responses, lowest cost. | Simple queries, quick lookups
Low | Minimal reasoning. | Straightforward questions
Medium | Balanced reasoning. | Most everyday tasks
High | Thorough reasoning. | Complex analysis, multi-step problems
Max | Maximum reasoning depth. Highest latency and cost. | Research, detailed technical analysis

How the 5 Levels Map to Each Provider

Each provider exposes thinking through a different API parameter. Praxis translates the unified level for you — the table below documents what actually goes on the wire so admins can predict cost and latency.
Provider | Native parameter | none | low | medium | high | max
Anthropic (Direct & Bedrock) | thinking.budget_tokens | omitted | 1,024 | 4,096 | 16,384 | 32,000
OpenAI (Responses & Chat) | reasoning_effort | omitted | "low" | "medium" | "high" | "high"
Google Gemini (3.x) | thinkingConfig.thinkingLevel | omitted | "low" | "medium" | "high" | "high"
Google Gemini (2.5) | thinkingConfig.thinkingBudget | 0 | 1,024 | 8,192 | 24,576 | 32,768
xAI (grok-3-mini only) | reasoning_effort | omitted | "low" | "medium" | "high" | "high"
xAI (grok-4.x, grok-build-0.1) | thinking is automatic; the parameter is rejected
Mistral (magistral-*) | extra_body.reasoning_effort | omitted | "low" | "medium" | "high" | "high"
Bedrock (Amazon Nova, Meta Llama, Cohere, OpenAI OSS) | thinking unsupported
Models that don’t support thinking ignore the setting silently. xAI’s grok-4.x family and grok-build-0.1 always think before answering — there is no way to disable it, and Praxis omits reasoning_effort entirely to avoid an API error.
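The mapping table can be expressed as a small translation function. This is a sketch for predicting what goes on the wire, not Praxis AI’s code: the provider labels (`"anthropic"`, `"gemini-2.5"`, and so on) and the function name `thinking_params` are hypothetical, while the budgets and strings are copied from the table. The `"type": "enabled"` field is included because the Anthropic Messages API expects it alongside `budget_tokens`.

```python
# Token budgets copied from the mapping table above.
ANTHROPIC_BUDGET = {"low": 1024, "medium": 4096, "high": 16384, "max": 32000}
GEMINI_25_BUDGET = {"none": 0, "low": 1024, "medium": 8192, "high": 24576, "max": 32768}


def _effort_string(level):
    # Providers with string levels have no "max"; the table maps it to "high".
    return "high" if level == "max" else level


def thinking_params(provider, level):
    """Translate a unified effort level into provider request parameters."""
    if provider == "gemini-2.5":
        # Gemini 2.5 sends an explicit budget, including 0 for "none".
        return {"thinkingConfig": {"thinkingBudget": GEMINI_25_BUDGET[level]}}
    if level == "none":
        return {}  # every other provider omits the parameter entirely
    if provider == "anthropic":
        return {"thinking": {"type": "enabled", "budget_tokens": ANTHROPIC_BUDGET[level]}}
    if provider in ("openai", "xai-grok-3-mini"):
        return {"reasoning_effort": _effort_string(level)}
    if provider == "gemini-3":
        return {"thinkingConfig": {"thinkingLevel": _effort_string(level)}}
    if provider == "mistral-magistral":
        return {"extra_body": {"reasoning_effort": _effort_string(level)}}
    # grok-4.x, grok-build-0.1, and Bedrock models without thinking support:
    # nothing is sent.
    return {}
```

Note that for the string-valued providers `max` and `high` produce the same request, so the two levels should cost the same there.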

Resolution Priority

The effective reasoning effort for a request is resolved in this order:
  1. Custom AI Model override: a BYOM record with a reasoning_effort set wins over everything.
  2. Chat Completion endpoint override: chatCompletionReasoningEffort (when the request came in through /api/ai/chat/completions).
  3. Institution setting: reasoningEffort on the Digital Twin.
  4. Platform default: none (thinking disabled).
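The resolution order amounts to a first-match chain. The sketch below is illustrative only; the function and argument names are hypothetical, and an unset value is modelled as `None` or an empty string.

```python
def resolve_reasoning_effort(byom_effort=None,
                             chat_completion_effort=None,
                             institution_effort=None,
                             via_chat_completions=False):
    """Return the effective reasoning effort using the priority order above."""
    if byom_effort:
        return byom_effort                 # 1. Custom AI Model override
    if via_chat_completions and chat_completion_effort:
        return chat_completion_effort      # 2. Chat Completion endpoint override
    if institution_effort:
        return institution_effort          # 3. Institution setting
    return "none"                          # 4. Platform default
```

The endpoint override only applies to requests that arrived through /api/ai/chat/completions, so an in-app request with the same settings resolves to the institution value.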

Interaction with Deep Research

The OpenAI deep research models (o3-deep-research, o4-mini-deep-research) always run with maximum reasoning regardless of the institution setting — they are tuned for multi-hour autonomous research and ignore the reasoning_effort knob. Pria’s UI surfaces deep research as a dedicated assistant toggle rather than a conversation model selection.

Models with Thinking Support

Look for the Thinking column = “Yes” in the catalog tables above. As of this writing:
  • Anthropic: Claude Fable 5, Claude Opus 4.7, Opus 4.6, Opus 4.5, Opus 4.1, Opus 4, Sonnet 4.6, Sonnet 4.5, Sonnet 4, Claude 3.7 Sonnet, Haiku 4.5 (Bedrock or Direct API)
  • OpenAI: GPT-5.4 series (5.4, 5.4-pro, 5.4-mini, 5.4-nano), GPT-5 series (5.2, 5.1, 5-mini, 5-nano), o-series (o4-mini, o3, o3-mini, o1)
  • Google Gemini: Gemini 3.1 Pro Preview, Gemini 3.1 Flash Lite Preview, Gemini 3 Flash/Pro Preview, Gemini 2.5 Pro / Flash / Flash Lite
  • xAI: Grok-4.20 (reasoning), Grok-4.20 (multi-agent), Grok-4-1 fast (reasoning), Grok Build 0.1
  • Mistral: Magistral Medium 2509, Magistral Small 2509

Image & Video Generation Providers

Pria can route image generation requests to multiple providers, and video generation to two (Bedrock Nova Reel and OpenAI Sora). The conversation model invokes the generate_image or generate_video tool; Pria dispatches to the provider configured for the appropriate Model Use.
Provider | Image Generation | Image Editing | Video Generation | Notes
OpenAI | gpt-image-1.5, gpt-image-1, gpt-image-1-mini, dall-e-3 | Yes (gpt-image-1 series) | sora-2, sora-2-pro (4 / 8 / 12s) | DALL-E 3 is text-to-image only; gpt-image-1 adds true image-to-image editing. Sora 2 Pro fires when quality='high'.
Amazon Bedrock | amazon.nova-canvas-v1:0, amazon.titan-image-generator-v2:0 (deprecated) | Yes (Titan & Nova Canvas inpainting/outpainting) | amazon.nova-reel-v1:1 (default; text/image to 6s video) | Nova Reel is the default video provider when Sora is not selected.
Stability AI | stability.stable-image-ultra, stability.stable-image-core, stability.sd3.5-large | Yes (Stability v2beta REST) | Retired 2025-07-24 | Image and audio generation only via Stability native API.
Google Gemini | gemini-2.5-flash-image, gemini-3.1-flash-image-preview, gemini-3-pro-image-preview | Yes (Gemini native image edit) | No | Imagen-class generation through the GenAI native SDK.
xAI | grok-imagine-image-pro, grok-imagine-image | No (xAI images.edit() not available) | No | Uses aspect_ratio instead of size; supports ratios such as 1:1, 16:9, 3:2.
Mistral | None (delegates) | - | - | No native image gen; routes to Bedrock or OpenAI.
To set the active image provider, pick a model from the Image Generation dropdown under Personalization. The generate_image tool always dispatches to the selected model. To set the video provider, pick from Video Generation (Bedrock Nova Reel default; switch to OpenAI Sora 2 if preferred).
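The video routing described here can be sketched as follows. The model identifiers come from this page; the function name `pick_video_model` and the `quality` argument values other than `'high'` are assumptions for illustration, not the platform’s actual dispatch code.

```python
NOVA_REEL = "amazon.nova-reel-v1:1"  # documented default video model


def pick_video_model(video_generation_model, quality="standard"):
    """Choose the model the generate_video tool would call (illustrative)."""
    if video_generation_model == "sora-2" and quality == "high":
        return "sora-2-pro"  # Sora 2 Pro is used for high-quality requests
    if video_generation_model.startswith("sora-2"):
        return video_generation_model
    # Anything that is not Sora routes to the default provider.
    return NOVA_REEL
```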

Content Moderation Models

When Enable Moderation is turned on under Configuration, every user message is sent to the configured moderation model before the conversation model. Flagged messages are blocked and a notification email is sent to the instance contact email.
Provider | Models
OpenAI | omni-moderation-latest, text-moderation-stable
Mistral | mistral-moderation-2411
See Content Moderation for the full category taxonomy, threshold configuration, and how to handle false positives.
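The moderation gate can be pictured as a check that runs before the conversation model is ever called. This is a minimal sketch with hypothetical callables (`moderate`, `converse`, `notify`) and a hypothetical result shape (`{"flagged": bool}`); the real moderation response and notification mechanism may differ.

```python
def handle_message(message, moderate, converse, notify):
    """Run moderation first; only unflagged messages reach the conversation model."""
    result = moderate(message)
    if result["flagged"]:
        notify(message, result)   # e.g. email the instance contact
        return None               # the message is blocked
    return converse(message)
```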

Embeddings Models

Embeddings power all retrieval in Pria — every uploaded file is chunked, embedded, and stored in the vector index. The conversation model then retrieves nearest-neighbour chunks at query time (Normal RAG) and optionally fuses with the knowledge-graph leg (KAG Fusion).
Provider | Model | Dimensions | Max Input Tokens | Notes
OpenAI | text-embedding-3-small | 1,536 | 8,191 | Cost-effective, strong baseline
OpenAI | text-embedding-3-large | 3,072 | 8,191 | Highest retrieval quality
Amazon Bedrock | amazon.titan-embed-text-v2:0 | 1,024 | 8,192 | Bedrock-region resident
Amazon Bedrock | amazon.titan-embed-text-v1 (deprecated) | 1,536 | 8,192 | -
Google Gemini | gemini-embedding-2-preview | 3,072 | 8,192 | Multimodal (text + images)
Google Gemini | gemini-embedding-001 | 3,072 | 2,048 | Text-only
Mistral | mistral-embed | 1,024 | 8,192 | General purpose
Mistral | codestral-embed | 1,024 | 8,192 | Code-tuned
xAI | grok-embedding-small | 1,024 | 8,000 | xAI native embeddings
Changing the embeddings model requires re-embedding all existing vault content. Each model produces vectors in its own coordinate space — they are not interchangeable. After switching, run the admin reindex flow to regenerate embeddings; until then, retrieval quality drops sharply.
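To see why vectors from different models cannot be mixed, consider a toy nearest-neighbour lookup using cosine similarity. This is a generic sketch, not Pria’s retrieval code; `cosine` and `nearest` are hypothetical helpers and the index is a plain list of `(chunk_id, vector)` pairs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    if len(a) != len(b):
        # Different dimensions are the obvious sign of mixed embedding models.
        raise ValueError("vectors come from different embedding models")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, index, k=1):
    """Return the k (chunk_id, vector) pairs most similar to the query."""
    return sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)[:k]
```

A dimension mismatch fails loudly, but two models with the same dimensionality (for example, two 1,024-dimension models in the table) would compare without error and return meaningless neighbours, which is why a full reindex is required after switching.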
For chunk size, sanitization, enrichment, and per-vault tuning, see Knowledge & RAG Configuration.

Prompt Caching

Some providers support prompt caching, which reduces latency and input token costs by reusing previously processed prompt prefixes. Praxis AI enables prompt caching automatically where supported — no configuration is needed.
Provider | Caching Type | How It Works | Cost Savings
OpenAI | Automatic | Cached automatically on every request; no code changes needed. The API returns cached_tokens in the usage response. | Up to 50% on cached input tokens
Anthropic | Explicit | Praxis marks cache breakpoints on tools, system prompt, and the last user message using cache_control headers. Cached prefixes are reused on subsequent requests. | Up to 90% on cached reads
Google Gemini | Context caching | Supports context caching via a separate API to create reusable cached content objects. | Varies by content size and TTL
Amazon Bedrock | Varies | Depends on the underlying model provider (e.g., Anthropic models on Bedrock inherit Anthropic’s caching). | Varies
Mistral AI | Not available | The Mistral API does not currently support prompt caching. Usage tracking returns promptTokens, completionTokens, and totalTokens only. | -
xAI | Automatic | Cached automatically on every request. The API returns cached_tokens in the usage response and supports conversation-level caching via the x-grok-conv-id header. | Up to 50% on cached input tokens
Prompt caching is most impactful for conversations with long system prompts, many tools, or extended history — exactly the pattern used by Praxis AI’s RAG pipeline. Anthropic and OpenAI caching are enabled by default for all eligible requests.

How to Verify Caching is Working

The admin Conversation History view shows per-turn token usage. Look for the cached_tokens (or cache_read_input_tokens for Anthropic) field — a non-zero value confirms the request hit the cache. Cached prefixes save you ~50–90% on those input tokens; you only pay full price for the new portion of each turn.
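A quick way to reason about these numbers is sketched below. The flat usage record and the helper names are hypothetical (the real usage payloads nest these fields differently per provider), and the discount is a parameter because the table only states upper bounds ("up to" 50% or 90%).

```python
def cached_tokens(usage):
    """Read the cached-token count from a flat usage record (hypothetical shape)."""
    return usage.get("cached_tokens") or usage.get("cache_read_input_tokens") or 0


def hit_cache(usage):
    """A non-zero cached count means the request reused a cached prefix."""
    return cached_tokens(usage) > 0


def effective_input_tokens(total_input, cached, discount):
    """Full-price-equivalent input tokens after applying the cache discount.

    total_input is assumed to include the cached tokens.
    """
    return total_input - cached * discount
```

For instance, a 10,000-token prompt with 8,000 cached tokens at a 90% discount is billed roughly like 2,800 full-price input tokens.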

Zero Data Retention (ZDR)

Zero Data Retention (ZDR) means the AI model provider does not store your prompts or the model’s responses after the request completes — nothing you send to the model is retained on the provider’s servers, used for training, or available for later review. ZDR is a contractual and technical guarantee offered by major providers for API traffic, and it is a cornerstone requirement for privacy-sensitive deployments in education, healthcare, and the enterprise. Praxis AI middleware ships only with models that comply with ZDR. Every model in the platform catalog routes through provider API tiers covered by zero-data-retention practices — your institution’s conversations, documents, and knowledge never become provider-side data.

The exception: Anthropic Mythos-class models

Anthropic’s Mythos-class models (Claude Mythos 5 and Claude Fable 5) are subject to a mandatory 30-day data retention for trust-and-safety purposes, on every platform, effective June 9, 2026. This retention overrides Zero Data Retention agreements — it applies even to organizations with ZDR contracts in place. Because of this, the Claude Fable 5 model is flagged in the Praxis AI model selector: it carries a red warning icon and a “30-day retention” marker next to its name. Selecting it is an explicit, informed choice — administrators choosing this model accept that prompts and outputs sent to it are retained by Anthropic for 30 days.
If your institution requires strict ZDR compliance (e.g., for regulatory or contractual reasons), do not select a flagged model. All other catalog models remain fully ZDR-compliant.
For the full provider policy, see Anthropic’s notice: Data retention practices for Mythos-class models.

KAG Analysis Model

During file ingestion, Pria runs a separate KAG analysis model to extract a knowledge graph — entities, relationships, and aliases — from each chunk. The graph then powers the KAG Fusion retrieval mode. This model is independent of the Conversation model; you can pair an expensive conversation model with a cheap analysis model, or vice versa.
Setting | Default | Notes
Platform default | Curated by Praxis AI | Used whenever no institution override is set
KAG Analysis Model (institution) | Empty = inherit the platform default | Set per Digital Twin; only catalog models tagged for KAG are offered
Why have a separate model? KAG extraction is a batch background job — latency does not matter, but cost and structured-output reliability do. The extractor must produce well-formed structured records consistently; a model that summarizes well can still be a poor extractor. Praxis AI validates which catalog models qualify before tagging them for KAG analysis. Recommended, validated extractors (low cost, near-zero invalid records): openai.gpt-oss-120b-1:0 / openai.gpt-oss-safeguard-120b (Bedrock), deepseek/deepseek-v4-flash, stepfun/step-3.7-flash, z-ai/glm-4.7-flash, google/gemma-4-31b-it, qwen/qwen3-30b-a3b-instruct-2507. The conversation model can be much heavier without dragging up your ingestion bill.
For KAG Fusion to work, files must have been ingested while KAG was enabled. Toggling on KAG Fusion does not retroactively process old files — kick off the admin reindex to backfill.

Chat Completions Endpoint Model

When the Chat Completions Integration is enabled for your Digital Twin, inbound requests arrive at /api/ai/chat/completions from external clients (today’s primary consumer is the ElevenLabs Voice Agent in Convo Direct mode; tomorrow: any OpenAI-SDK-compatible client). You can pin a different conversation model for those inbound requests than the one your in-app users see.
Field | Default | Notes
Chat Completion Enabled | Off | Master toggle. When off, the endpoint returns 403.
Chat Completion Model | empty (inherit) | Empty = use the same model as the in-app conversation.
Chat Completion Max Completion Tokens | -1 (inherit) | -1 = inherit institution maxCompletionTokens; 0 = catalog cap; positive number = explicit cap.
Chat Completion Reasoning Effort | empty (inherit) | Empty = inherit institution reasoningEffort.
Use this when the in-app twin runs Claude Opus 4.7 for the richest in-browser experience, but inbound voice agents should call a smaller, faster model (e.g., gpt-5-mini) where end-to-end latency dominates user perception. The override is conversation-only — summary, embedding, image, audio, and TTS still use the regular institution selection.
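The three-way meaning of the Max Completion Tokens field is easy to misread, so here is a minimal sketch of how it resolves. The function and argument names are hypothetical, and the sketch does not clamp an explicit cap to the catalog cap because this page does not say whether the platform does.

```python
def resolve_max_completion_tokens(chat_completion_value, institution_value, catalog_cap):
    """Resolve the output-token cap for a Chat Completions request."""
    if chat_completion_value == -1:
        return institution_value      # inherit institution maxCompletionTokens
    if chat_completion_value == 0:
        return catalog_cap            # use the model's catalog cap
    return chat_completion_value      # explicit positive cap
```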

Provider Types

Praxis AI routes AI requests through seven backend providers:
Provider | How It Works
Amazon Bedrock | Models hosted on AWS infrastructure. Uses IAM credentials for authentication.
OpenAI API | Direct OpenAI API calls. Used for OpenAI models and OpenAI-compatible endpoints.
Anthropic Direct API | Direct Anthropic API calls. Bypasses Bedrock for Claude models when preferred.
Google GenAI | Direct Google Gemini API calls via the @google/genai SDK.
Mistral AI | Direct Mistral API calls via the @mistralai/mistralai SDK.
xAI | Direct xAI API calls using the openai npm package with xAI’s base URL.
Stability AI | Direct Stability AI v2beta REST calls (image and audio generation only; video retired 2025-07-24).
Some model families (e.g., Anthropic Claude, Mistral) are available through multiple providers — both via Bedrock and via Direct API. The admin can choose which provider to use based on latency, cost, and regional availability preferences.

Bring Your Own AI Model (BYOM)

You can connect your own hosted LLM (for example, a model deployed on Google Vertex AI, private OpenAI-compatible endpoint, or a Bedrock-hosted custom model) and use it as a replacement for any of the supported usages.

Configure a Custom Model

To add a custom model for Conversation (or any other use):
  1. In the Admin UI, edit your Digital Twin.
  2. Under Personalization and AI Models, click Add AI Model.
  3. In the Add AI Model panel, enter the properties required to connect to your LLM:
  • Model Name: The exact model identifier published by your hosting platform. This value is case sensitive and must match your provider’s model name, for example: gemini-flash or projects/my-proj/locations/us/models/my-model.
  • Status: Active models are considered by the system for routing and selection. Inactive models are ignored but kept in configuration.
  • Description: Human-readable description of the LLM for admins and authors using this Digital Twin.
  • Model Use: The specific usage for this model (for example, Conversation, Image Generation, Document Summarization). This determines which internal calls will use this model.
  • Client Library Type: Choose from:
    • Open AI for OpenAI-compatible endpoints (including many custom or Vertex AI gateways exposing an OpenAI-style API).
    • Bedrock for Amazon Bedrock-hosted models.
    Most Gemini-based models connected through an OpenAI-compatible proxy should use Open AI.
  • API URL: The base public URL of your model endpoint, for example: https://ai.my-school.edu or your Bedrock-compatible endpoint. Typically, the model name or ID is appended to this base URL when interacting with the LLM.
  • API Key: The secret key used to authenticate requests to your endpoint. Keep this key secure and confidential; rotate it periodically for security.
  4. Click Save to register the new custom AI model.
Once saved:
  • The model appears in the list of custom AI models.
  • For its configured Model Use, it will replace the platform default model.
  • All conversations or tasks mapped to that Model Use will start using your custom model without any client-side code changes.
Use a non-production Digital Twin first to validate latency, cost, and behavior of your custom model before assigning it to high-traffic or mission-critical usages.
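Before saving a record, it can help to sanity-check the fields offline. The sketch below models the form as a plain data class; the class, field names, and `validate` helper are hypothetical, and the HTTPS requirement is an assumption based on the "public URL" wording and the example above, not a documented rule.

```python
from dataclasses import dataclass


@dataclass
class CustomModel:
    model_name: str            # exact, case-sensitive provider identifier
    status: str                # "Active" or "Inactive"
    description: str
    model_use: str             # e.g. "Conversation"
    client_library_type: str   # "Open AI" or "Bedrock"
    api_url: str
    api_key: str


def validate(model):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not model.model_name.strip():
        problems.append("Model Name is required")
    if model.status not in ("Active", "Inactive"):
        problems.append("Status must be Active or Inactive")
    if model.client_library_type not in ("Open AI", "Bedrock"):
        problems.append("Client Library Type must be Open AI or Bedrock")
    if not model.api_url.startswith("https://"):
        problems.append("API URL should be an https:// base URL (assumption)")
    if not model.api_key:
        problems.append("API Key is required")
    return problems
```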

End-to-End Workflow

1. Configure Provider Credentials

Go to Configuration → Personalization and AI Models and enter API keys and endpoints for each provider you plan to use (OpenAI-compatible, Bedrock, or custom gateways).

2. Select Models per Usage

For each Model Use (Conversation, Image, Audio, etc.), select the preferred model from the list of available platform and custom models.

3. Enable and Test Your Digital Twin

Use the Test or preview mode to run conversations against your updated configuration. Validate:
  • Response quality
  • Latency
  • Tool and streaming support (for Conversation models)

4. Monitor and Optimize

Use Analytics to track token usage, latency, and error rates per model. Adjust your model selection or routing preferences to balance performance and cost.

5. Scale to Production

Once validated, deploy your Digital Twin to users through LMS integration (e.g., Canvas), Web SDK, or REST APIs. No additional code changes are required when switching models.

6. Connect New Digital Twins

Repeat the configuration setup for any additional twins so they can connect to the same custom LLM.
Need help choosing models or configuring BYOM? Praxis AI supports multi-LLM orchestration and can route across OpenAI, Anthropic, Amazon, Google, Mistral, xAI, Stability AI (media), and your own hosted models in a single Digital Twin configuration.