Pria exposes an OpenAI-compatible Chat Completions API that lets you interact with any Digital Twin using the standard OpenAI client libraries. If your application already uses the OpenAI SDK, connecting to a Praxis Digital Twin requires just three changes: the base URL, the model ID, and an authentication header.
Estimated setup time: under 5 minutes if you already have a Praxis account and a Digital Twin configured.

How It Works

Point

Set the OpenAI client’s base_url to your Praxis Chat Completions endpoint.

Authenticate

Pass your Praxis JWT via the x-access-token header to identify the user session.

Chat

Use your Digital Twin’s Public ID as the model parameter — that’s it.

Prerequisites

Before you begin, make sure you have:
  • A Praxis AI account with access to at least one Digital Twin
  • The Digital Twin’s Public ID (a UUID like e455529a-4f51-479e-94fc-bbebb41d19a1) — found in your instance’s administration panel
  • A valid Praxis JWT (sent in the x-access-token header), obtained when a user authenticates with Praxis (see Authentication)
  • Chat Completions enabled on the Digital Twin — this integration is off by default; an administrator must turn it on for the instance
The Chat Completions endpoint is part of the Pria platform itself — the base URL is your Pria server’s /api/ai path (e.g. https://pria.praxislxp.com/api/ai). It is disabled per Digital Twin by default; if requests return 403 chat_completion_disabled, ask the instance administrator to enable the Chat Completions endpoint in the instance configuration.

Quick Start

1. Install the OpenAI SDK

pip install openai

2. Configure the client

Point the SDK to your Praxis endpoint and pass your authentication token.
from openai import OpenAI

client = OpenAI(
    api_key="unused",  # required by SDK, but not used for auth
    base_url="https://pria.praxislxp.com/api/ai",
    default_headers={
        "x-access-token": "your-praxis-jwt-token"
    }
)
The api_key field is required by the OpenAI SDK but is not used for authentication; Praxis authenticates via the x-access-token header. If your deployment also uses API keys, pass the key as the api_key value instead of the placeholder. The x-access-token header is still required (see Authentication below).

3. Send a message (streaming)

Use your Digital Twin Public ID as the model parameter, and set stream=True.
The endpoint always streams — responses are delivered as OpenAI-format SSE chunks regardless of the stream flag, so use your SDK’s streaming mode.
stream = client.chat.completions.create(
    model="e455529a-4f51-479e-94fc-bbebb41d19a1",
    messages=[
        {"role": "user", "content": "Explain your area of expertise."}
    ],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
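
If you need the complete reply as a single string rather than printing it as it arrives, the loop above can be wrapped in a small helper. This is a minimal sketch: collect_text is a hypothetical name, and it assumes only that each chunk has the choices[0].delta.content shape used in the loop above.

```python
from typing import Any, Iterable


def collect_text(stream: Iterable[Any]) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices or no text (for example a final
        # chunk that only reports finish_reason), so skip those.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)
```

Calling collect_text(stream) on the object returned by client.chat.completions.create(..., stream=True) would then return the full reply once the stream ends.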

Authentication

The API works with two kinds of credential: a Praxis JWT, which every request must carry, and a personal API key, which is exchanged for a JWT.

Praxis JWT (primary)

Pass the user’s Praxis session token via the x-access-token header. This is the primary authentication method — it identifies the user and authorizes access to their Digital Twins.
x-access-token: eyJhbGciOiJIUzI1NiIs...
Chat completions always require a valid Praxis JWT in the x-access-token header. An API key alone is not sufficient.

Getting a JWT from a personal API key (server-to-server)

For scripts and server-to-server integrations, exchange a personal API key (prefixed pria_) for a JWT, then pass that JWT in the x-access-token header. The raw pria_… key is not accepted directly — exchange it first:
APIKEY=pria_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
JWT=$(curl -sS -X POST "https://pria.praxislxp.com/api/auth/api-key-signin" \
       -H "x-api-key: $APIKEY" \
       | python -c "import json,sys; print(json.load(sys.stdin)['token'])")
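Then pass the resulting JWT to the client (held here in the Python variable jwt_from_api_key_exchange):
client = OpenAI(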
    api_key="unused",
    base_url="https://pria.praxislxp.com/api/ai",
    default_headers={
        "x-access-token": jwt_from_api_key_exchange
    }
)
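
The same exchange can be done from Python before constructing the client. The sketch below uses only the standard library and assumes, as in the curl example above, that the sign-in endpoint accepts the key in an x-api-key header and returns JSON with a token field. The helper names build_signin_request, extract_token, and exchange_api_key are illustrative and not part of any Praxis SDK.

```python
import json
import urllib.request

SIGNIN_URL = "https://pria.praxislxp.com/api/auth/api-key-signin"


def build_signin_request(api_key: str, url: str = SIGNIN_URL) -> urllib.request.Request:
    """Build the POST request that exchanges a pria_ key for a JWT."""
    return urllib.request.Request(url, method="POST", headers={"x-api-key": api_key})


def extract_token(body: bytes) -> str:
    """Pull the JWT out of the sign-in response body."""
    payload = json.loads(body)
    token = payload.get("token")
    if not token:
        raise ValueError("sign-in response did not contain a token")
    return token


def exchange_api_key(api_key: str) -> str:
    """Perform the exchange over the network and return the JWT."""
    with urllib.request.urlopen(build_signin_request(api_key), timeout=30) as resp:
        return extract_token(resp.read())
```

The string returned by exchange_api_key would take the place of jwt_from_api_key_exchange in the client configuration above. Because JWTs expire, a long-running service would likely repeat the exchange when it receives a 401.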

Context Headers

Optional headers let you pass conversation metadata to the Digital Twin. These enrich the interaction context without affecting authentication.
| Header | Type | Description | Default |
| --- | --- | --- | --- |
| x-access-token | string | Required. Praxis JWT for user authentication | |
| x-praxis-institution-public-id | string | Public ID of the target Digital Twin (when the user belongs to several) | User’s primary instance |
| x-praxis-conversation-id | string | Numeric conversation/course identifier | 0 |
| x-praxis-conversation-name | string | Human-readable conversation or course name | "" |
| x-praxis-assistant-id | string | Routes the request to a specific assistant persona | "" |
| x-praxis-timezone | string | IANA timezone (e.g. America/New_York) for date-aware prompts | Server default |
Context headers are useful when your application manages multiple conversations or needs to target a specific assistant within a Digital Twin.
client = OpenAI(
    api_key="unused",
    base_url="https://pria.praxislxp.com/api/ai",
    default_headers={
        "x-access-token": "your-praxis-jwt-token",
        "x-praxis-conversation-id": "48201",
        "x-praxis-conversation-name": "Biology 101 - Fall 2026",
        "x-praxis-assistant-id": "assistant-uuid-here",
    }
)
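
When one process serves several conversations, it may be more convenient to build the context headers per request than to fix them on the client. The helper below is a sketch: build_praxis_headers is a hypothetical name, and it omits any optional header you leave unset so that the documented defaults apply.

```python
from typing import Dict, Optional


def build_praxis_headers(
    jwt: str,
    institution_public_id: Optional[str] = None,
    conversation_id: Optional[int] = None,
    conversation_name: Optional[str] = None,
    assistant_id: Optional[str] = None,
    timezone: Optional[str] = None,
) -> Dict[str, str]:
    """Return the x-access-token header plus any context headers that were set."""
    optional = {
        "x-praxis-institution-public-id": institution_public_id,
        "x-praxis-conversation-id": conversation_id,
        "x-praxis-conversation-name": conversation_name,
        "x-praxis-assistant-id": assistant_id,
        "x-praxis-timezone": timezone,
    }
    headers = {"x-access-token": jwt}
    # Header values must be strings; unset values are left out entirely.
    headers.update({k: str(v) for k, v in optional.items() if v is not None})
    return headers
```

The resulting dict can be passed as default_headers when constructing the client, or per call through the OpenAI Python SDK's extra_headers argument to client.chat.completions.create(...).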

Message Roles

The API accepts standard OpenAI message roles with the following behavior:
| Role | Behavior |
| --- | --- |
| user | The last user message is the active turn sent to the Digital Twin. Earlier user messages are replayed as conversation history. |
| assistant | Replayed as conversation history alongside earlier user messages. |
| system | Ignored. The Digital Twin builds its own system instructions from assistant and instance settings. |
| tool | Accepted in the request shape but ignored. Tool execution is managed server-side by Pria. |
You can send only the current user message (Pria tracks the conversation via x-praxis-conversation-id), or pass your own running message array — prior user/assistant turns you include are replayed as history for the active turn.
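
To make the role handling concrete, the sketch below mimics the documented rules on the client side: it drops system and tool messages, treats the last user message as the active turn, and keeps earlier user and assistant messages as history. It illustrates the behaviour described above and is not Pria's server code; split_messages is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Message = Dict[str, str]


def split_messages(messages: List[Message]) -> Tuple[List[Message], Optional[Message]]:
    """Return (history, active_turn) following the documented role rules."""
    # system and tool messages are ignored by the endpoint.
    kept = [m for m in messages if m.get("role") in ("user", "assistant")]
    # The last user message is the active turn.
    last_user = max(
        (i for i, m in enumerate(kept) if m["role"] == "user"), default=None
    )
    if last_user is None:
        return kept, None
    # Messages after the active turn are not covered by the documentation,
    # so this sketch leaves them out of the history.
    return kept[:last_user], kept[last_user]
```

Running your own message array through a helper like this is one way to preview which turn the Digital Twin will answer before you send the request.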

Response Format

The endpoint always streams. Responses arrive as standard OpenAI SSE chunks:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1700000000,"model":"e455529a-...","choices":[{"index":0,"delta":{"content":"I cover topics in"},"finish_reason":null}]}

data: [DONE]
Read them exactly as you would an OpenAI streaming response — iterate the stream and concatenate choices[0].delta.content.
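
If you are not using an OpenAI SDK, the stream can be read by hand. The sketch below parses already-decoded SSE lines in the format shown above; iter_sse_text is a hypothetical helper, and it handles only data: lines and the [DONE] sentinel, not the full SSE specification.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield each text delta from OpenAI-format SSE lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

Joining the yielded strings with "".join(...) reproduces the full reply, just as concatenating choices[0].delta.content does with the SDK.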

Supported Parameters

| Parameter | Supported | Notes |
| --- | --- | --- |
| model | Yes | Set to the Digital Twin’s Public ID. It does not select the underlying model: the effective model follows the Digital Twin’s configuration cascade (assistant override → instance Chat Completions model override → instance conversation model) |
| messages | Yes | Message array (required) |
| stream | Accepted | The response is always SSE; streaming is forced regardless of this flag |
| temperature / max_tokens / top_p / response_format and other tuning fields | Accepted | Accepted for SDK compatibility but not forwarded; these settings are managed by the Digital Twin’s configuration |
| tools / tool_choice | Not supported | The Twin’s own server-side tools run automatically; client-supplied function calling is not available |
Administrators can set Chat-Completions-specific overrides on the instance — a dedicated model, a max-completion-tokens cap, and a reasoning-effort level (commonly none for voice agents) — without affecting the Twin’s normal in-app behaviour.

Error Handling

Errors follow the standard OpenAI error format:
{
  "error": {
    "message": "Description of the error",
    "type": "auth_error",
    "param": null,
    "code": "missing_praxis_token"
  }
}

Common Errors

| Status | Code | Cause | Fix |
| --- | --- | --- | --- |
| 401 | unauthorized | Missing, expired, or invalid JWT in x-access-token | Add or refresh the Praxis JWT (exchange your pria_… key again if needed) |
| 403 | chat_completion_disabled | The Chat Completions endpoint is not enabled for the Digital Twin | Ask the instance administrator to enable it in the instance configuration |
| 404 | model_not_found | Invalid Digital Twin Public ID | Verify the Public ID in the instance’s administration panel |
| 429 | rate_limit_exceeded | Too many requests | Reduce request frequency or request a higher limit |
| 504 | timeout | Digital Twin took too long to respond | Retry the request |
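
One way to act on these errors is to parse the error body and map the HTTP status to the fix listed above. The sketch below does that. The action names it returns are hypothetical labels invented for this example; only the status codes and the error shape come from this page.

```python
import json
from typing import Optional, Tuple

# Suggested action per HTTP status, following the Common Errors list.
ACTIONS = {
    401: "refresh_jwt",
    403: "ask_admin_to_enable",
    404: "check_public_id",
    429: "slow_down",
    504: "retry",
}


def parse_error(status: int, body: str) -> Tuple[Optional[str], str, str]:
    """Return (code, message, suggested_action) for an error response."""
    try:
        error = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        error = {}  # body was not the expected JSON object
    if not isinstance(error, dict):
        error = {}
    code = error.get("code")
    message = error.get("message", "")
    return code, message, ACTIONS.get(status, "inspect_response")
```

Keying on the HTTP status rather than the code string keeps the mapping stable even if a deployment returns a more specific code, such as missing_praxis_token in the example above.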

Multi-Provider Routing

The Chat Completions API is a front door to the Pria platform, not a thin proxy to a single model provider. Behind the URL, Pria selects the underlying provider and model based on the Digital Twin’s configuration. As of today, Pria can route to:
| Provider | Model families |
| --- | --- |
| OpenAI | GPT‑4o, GPT‑4.1, o‑series reasoning, GPT‑image, GPT‑Realtime |
| Anthropic | Claude 3.5 / 3.7 / 4.x: Sonnet, Opus, Haiku |
| Amazon Bedrock | Claude, Llama, Titan, Stable Diffusion |
| Google GenAI | Gemini 2.x / 3.x: Pro, Flash, Live (Convo Mode) |
| Mistral | Mistral Large, Voxtral (TTS / STT), Codestral |
| xAI | Grok 3, Grok 4, Grok 3‑mini (reasoning) |
The model parameter you pass in the request is the Digital Twin Public ID — not a provider model ID. Pria resolves it to the configured underlying model. If the Twin’s admin changes the underlying model from Claude to GPT‑4o, your code does not change.

Per‑Provider Behavioural Differences

Because Pria forwards to many providers, some advanced behaviours are provider‑dependent and respect the Twin’s configuration rather than the request payload:
  • Reasoning effort — accepted for OpenAI o‑series and xAI grok-3-mini. Grok 4.x reasons automatically and ignores the parameter.
  • Thinking tokens — Anthropic Claude 3.7+ and Gemini 2.5+ support extended thinking budgets, configured per Twin.
  • Image generation — supported by OpenAI (gpt-image-1), Bedrock (Stable Diffusion via Stability), Google (Imagen), and xAI (grok-2-image). Mistral delegates to OpenAI or Bedrock.
  • Prompt caching — automatic for Anthropic, OpenAI, and xAI; reported in the response usage block when present.
  • Tool calls — the Digital Twin’s server‑side tools (RAG, web search, charts, connectors, MCP) run automatically. Client‑supplied tools / tool_choice are not forwarded (see Supported Parameters).
For the full behaviour matrix, see AI Models.

Provider Authentication Errors

When the Twin is configured to use a provider that requires its own credentials (BYOT — Bring Your Own Tokens), errors from the underlying provider are surfaced back to you as standard OpenAI‑style errors:
| Status | Likely cause |
| --- | --- |
| 401 from chat completions | Praxis JWT missing or expired |
| 401 with provider name in message | The Twin’s underlying provider key (OpenAI, Anthropic, etc.) is missing or invalid; contact the Twin’s admin |
| 429 | Rate limited at the provider level; back off and retry |
| 5xx with provider_error | Upstream provider is degraded; retry with exponential backoff |
See BYOT (Bring Your Own Tokens) for how Twin admins configure provider keys.
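
The retry advice above can be sketched as a small wrapper. The example below retries a callable with exponential backoff when it raises an error carrying a retryable status. RetryableError, with_backoff, and the specific delays are assumptions made for illustration; in real code you would translate your HTTP client's or SDK's exceptions into this shape.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def should_retry(status: int) -> bool:
    # 429 (rate limited) and 5xx (upstream degraded) are worth retrying.
    return status == 429 or 500 <= status < 600


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), doubling the delay after each retryable failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RetryableError as err:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not should_retry(err.status):
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")
```

A 401 is re-raised immediately, since retrying without a fresh JWT or a corrected provider key cannot succeed. Adding random jitter to the delay is a common refinement when many clients retry at once.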

Cost & Credits

Chat completions consume Pria credits, billed by token usage and the underlying provider’s price tier. Each response’s usage block reports prompt_tokens, completion_tokens, and total_tokens, the same fields that OpenAI SDK consumers already read.
  • Cached prompt tokens (when the provider supports caching) are billed at a discounted rate.
  • Streaming requests are billed identically to non‑streaming.
  • Embedded RAG retrieval runs as part of the Digital Twin’s response and is included in the credit cost — you do not pay separately for vector search.
For your account’s plan and credit balance, see Plans & Credits and Credit Management.
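
To keep a running tally of token consumption on the client side, the usage blocks can be summed across calls. The sketch below is a rough illustration: sum_usage is a hypothetical helper, and because this page does not say which chunk of the stream carries the usage block, it takes plain usage dicts as input. Token totals are not credits; the conversion depends on the provider's price tier.

```python
from typing import Dict, Iterable, Mapping

FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def sum_usage(usages: Iterable[Mapping[str, int]]) -> Dict[str, int]:
    """Add up the token counts from several usage blocks."""
    totals = {field: 0 for field in FIELDS}
    for usage in usages:
        for field in FIELDS:
            # A missing field counts as zero rather than raising.
            totals[field] += usage.get(field, 0)
    return totals
```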

Chat Completions vs. the Pria Runtime API

Pria exposes two complementary APIs. They look similar but behave very differently — choose based on whether you want stateless OpenAI‑style requests or full Pria session semantics.
| | Chat Completions API | Runtime API |
| --- | --- | --- |
| Shape | OpenAI‑compatible | Pria‑native REST + WebSocket |
| State | Stateless: each call is independent | Stateful: Pria tracks the user’s conversation thread |
| History | Last user message is the active turn; earlier messages you include are replayed as history | Pria stores the full thread server‑side |
| RAG / KAG retrieval | Runs automatically as part of the response | Runs automatically as part of the response |
| Tool calls | The Twin’s tools run server‑side; not exposed to the caller | The Twin’s tools run server‑side; tool events streamed to the caller |
| Conversation continuity | Use x-praxis-conversation-id to group calls into a thread | Native; each call references a historyId |
| Streaming | OpenAI SSE format | Pria event stream (richer event types) |
| Best for | Drop‑in replacement for OpenAI in existing code | Building a full Pria‑powered chat experience from scratch |
Rule of thumb: if your code already uses the OpenAI SDK and you want to point it at Pria with minimal changes, use the Chat Completions API. If you’re building a new client from scratch and want access to every Pria capability (per‑message tool events, citations, KAG augmentations, structured memory updates), use the Runtime API. See the API Reference for both.
  • API Reference — Full REST API documentation with streaming details
  • AI Models — Provider catalog, reasoning effort, thinking, and image generation behaviour per provider
  • API Keys — Issue and rotate the API keys used to obtain Praxis JWTs
  • BYOT (Bring Your Own Tokens) — How Twin admins configure per‑provider credentials
  • Plans & Credits — How token usage maps to credits
  • MCP Server — Connect Pria to custom LLM workflows
  • Web SDK — Embed the full Digital Twin UI in your web app
  • JavaScript SDK — Programmatic control of the Pria interface