How It Works
Point
Set the OpenAI client’s
base_url to your Praxis Chat Completions endpoint.Authenticate
Pass your Praxis JWT via the
x-access-token header to identify the user session.Chat
Use your Digital Twin’s Public ID as the
model parameter — that’s it.Prerequisites
Before you begin, make sure you have:- A Praxis AI account with access to at least one Digital Twin
- The Digital Twin’s Public ID (a UUID like
e455529a-4f51-479e-94fc-bbebb41d19a1) — found in your instance’s administration panel - A valid Praxis JWT token (
x-access-token) — obtained when a user authenticates with Praxis (see Authentication) - Chat Completions enabled on the Digital Twin — this integration is off by default; an administrator must turn it on for the instance
The Chat Completions endpoint is part of the Pria platform itself — the base URL is your Pria server’s
/api/ai path (e.g. https://pria.praxislxp.com/api/ai). It is disabled per Digital Twin by default; if requests return 403 chat_completion_disabled, ask the instance administrator to enable the Chat Completions endpoint in the instance configuration.Quick Start
Configure the client
Point the SDK to your Praxis endpoint and pass your authentication token.
The
api_key field is required by the OpenAI SDK but is not used for authentication. Praxis authenticates via the x-access-token header. If your deployment also uses API keys, pass it as the api_key value instead (see Authentication below).Authentication
The API supports two authentication methods that can be used independently or together.Praxis JWT (primary)
Pass the user’s Praxis session token via thex-access-token header. This is the primary authentication method — it identifies the user and authorizes access to their Digital Twins.
Getting a JWT from a personal API key (server-to-server)
For scripts and server-to-server integrations, exchange a personal API key (prefixedpria_) for a JWT, then pass that JWT in the x-access-token header. The raw pria_… key is not accepted directly — exchange it first:
Context Headers
Optional headers let you pass conversation metadata to the Digital Twin. These enrich the interaction context without affecting authentication.| Header | Type | Description | Default |
|---|---|---|---|
x-access-token | string | Required. Praxis JWT for user authentication | — |
x-praxis-institution-public-id | string | Public ID of the target Digital Twin (when the user belongs to several) | User’s primary instance |
x-praxis-conversation-id | string | Numeric conversation/course identifier | 0 |
x-praxis-conversation-name | string | Human-readable conversation or course name | "" |
x-praxis-assistant-id | string | Routes the request to a specific assistant persona | "" |
x-praxis-timezone | string | IANA timezone (e.g. America/New_York) for date-aware prompts | Server default |
Context headers are useful when your application manages multiple conversations or needs to target a specific assistant within a Digital Twin.
Message Roles
The API accepts standard OpenAI message roles with the following behavior:| Role | Behavior |
|---|---|
user | The last user message is the active turn sent to the Digital Twin. Earlier user messages are replayed as conversation history. |
assistant | Replayed as conversation history alongside earlier user messages. |
system | Ignored — the Digital Twin builds its own system instructions from assistant and instance settings. |
tool | Accepted in the shape but ignored — tool execution is managed server-side by Pria. |
You can send only the current user message (Pria tracks the conversation via
x-praxis-conversation-id), or pass your own running message array — prior user/assistant turns you include are replayed as history for the active turn.Response Format
The endpoint always streams. Responses arrive as standard OpenAI SSE chunks:choices[0].delta.content.
Supported Parameters
| Parameter | Supported | Notes |
|---|---|---|
model | Yes | Informational — the effective model follows the Digital Twin’s configuration cascade (assistant override → instance Chat Completions model override → instance conversation model) |
messages | Yes | Message array (required) |
stream | Accepted | The response is always SSE — streaming is forced regardless of this flag |
temperature / max_tokens / top_p / response_format and other tuning fields | Accepted | Accepted for SDK compatibility but not forwarded — these settings are managed by the Digital Twin’s configuration |
tools / tool_choice | Not supported | The Twin’s own server-side tools run automatically; client-supplied function calling is not available |
Administrators can set Chat-Completions-specific overrides on the instance — a dedicated model, a max-completion-tokens cap, and a reasoning-effort level (commonly
none for voice agents) — without affecting the Twin’s normal in-app behaviour.Error Handling
Errors follow the standard OpenAI error format:Common Errors
| Status | Code | Cause | Fix |
|---|---|---|---|
| 401 | unauthorized | Missing, expired, or invalid JWT in x-access-token | Add or refresh the Praxis JWT (exchange your pria_… key again if needed) |
| 403 | chat_completion_disabled | The Chat Completions endpoint is not enabled for the Digital Twin | Ask the instance administrator to enable it in the instance configuration |
| 404 | model_not_found | Invalid Digital Twin Public ID | Verify the Public ID in the instance’s administration panel |
| 429 | rate_limit_exceeded | Too many requests | Reduce request frequency or request a higher limit |
| 504 | timeout | Digital Twin took too long to respond | Retry the request |
Multi-Provider Routing
The Chat Completions API is a front door to the Pria platform, not a thin proxy to a single model provider. Behind the URL, Pria selects the underlying provider and model based on the Digital Twin’s configuration. As of today, Pria can route to:| Provider | Model families |
|---|---|
| OpenAI | GPT‑4o, GPT‑4.1, o‑series reasoning, GPT‑image, GPT‑Realtime |
| Anthropic | Claude 3.5 / 3.7 / 4.x — Sonnet, Opus, Haiku |
| Amazon Bedrock | Claude, Llama, Titan, Stable Diffusion |
| Google GenAI | Gemini 2.x / 3.x — Pro, Flash, Live (Convo Mode) |
| Mistral | Mistral Large, Voxtral (TTS / STT), Codestral |
| xAI | Grok 3, Grok 4, Grok 3‑mini (reasoning) |
model parameter you pass in the request is the Digital Twin Public ID — not a provider model ID. Pria resolves it to the configured underlying model. If the Twin’s admin changes the underlying model from Claude to GPT‑4o, your code does not change.
Per‑Provider Behavioural Differences
Because Pria forwards to many providers, some advanced behaviours are provider‑dependent and respect the Twin’s configuration rather than the request payload:- Reasoning effort — accepted for OpenAI o‑series and xAI
grok-3-mini. Grok 4.x reasons automatically and ignores the parameter. - Thinking tokens — Anthropic Claude 3.7+ and Gemini 2.5+ support extended thinking budgets, configured per Twin.
- Image generation — supported by OpenAI (
gpt-image-1), Bedrock (Stable Diffusion via Stability), Google (Imagen), and xAI (grok-2-image). Mistral delegates to OpenAI or Bedrock. - Prompt caching — automatic for Anthropic, OpenAI, and xAI; reported in the response
usageblock when present. - Tool calls — the Digital Twin’s server‑side tools (RAG, web search, charts, connectors, MCP) run automatically. Client‑supplied
tools/tool_choiceare not forwarded (see Supported Parameters).
Provider Authentication Errors
When the Twin is configured to use a provider that requires its own credentials (BYOT — Bring Your Own Tokens), errors from the underlying provider are surfaced back to you as standard OpenAI‑style errors:| Status | Likely cause |
|---|---|
| 401 from chat completions | Praxis JWT missing or expired |
| 401 with provider name in message | The Twin’s underlying provider key (OpenAI, Anthropic, etc.) is missing or invalid — contact the Twin’s admin |
| 429 | Rate limited at the provider level — back off and retry |
5xx with provider_error | Upstream provider is degraded; retry with exponential backoff |
Cost & Credits
Chat completions consume Pria credits, billed by token usage and the underlying provider’s price tier. Each response’susage block reports prompt_tokens, completion_tokens, and total_tokens — the same fields the OpenAI SDK consumers already read.
- Cached prompt tokens (when the provider supports caching) are billed at a discounted rate.
- Streaming requests are billed identically to non‑streaming.
- Embedded RAG retrieval runs as part of the Digital Twin’s response and is included in the credit cost — you do not pay separately for vector search.
Chat Completions vs. the Pria Runtime API
Pria exposes two complementary APIs. They look similar but behave very differently — choose based on whether you want stateless OpenAI‑style requests or full Pria session semantics.| Chat Completions API | Runtime API | |
|---|---|---|
| Shape | OpenAI‑compatible | Pria‑native REST + WebSocket |
| State | Stateless — each call is independent | Stateful — Pria tracks the user’s conversation thread |
| History | Last user message is the active turn; earlier messages you include are replayed as history | Pria stores the full thread server‑side |
| RAG / KAG retrieval | Runs automatically as part of the response | Runs automatically as part of the response |
| Tool calls | The Twin’s tools run server‑side; not exposed to the caller | The Twin’s tools run server‑side; tool events streamed to the caller |
| Conversation continuity | Use x-praxis-conversation-id to group calls into a thread | Native; each call references a historyId |
| Streaming | OpenAI SSE format | Pria event stream (richer event types) |
| Best for | Drop‑in replacement for OpenAI in existing code | Building a full Pria‑powered chat experience from scratch |
Related
- API Reference — Full REST API documentation with streaming details
- AI Models — Provider catalog, reasoning effort, thinking, and image generation behaviour per provider
- API Keys — Issue and rotate the API keys used to obtain Praxis JWTs
- BYOT (Bring Your Own Tokens) — How Twin admins configure per‑provider credentials
- Plans & Credits — How token usage maps to credits
- MCP Server — Connect Pria to custom LLM workflows
- Web SDK — Embed the full Digital Twin UI in your web app
- JavaScript SDK — Programmatic control of the Pria interface