OpenAI-compatible Chat Completions endpoint (inbound integration)

curl --request POST \
  --url https://pria.praxislxp.com/api/ai/chat/completions \
  --header 'Content-Type: application/json' \
  --header 'x-access-token: <x-access-token>' \
  --data '
{
  "messages": [
    {
      "role": "user",
      "content": "<string>"
    }
  ],
  "model": "pria",
  "stream": true
}
'

"data: {\"id\":\"chatcmpl-1735...\",\"object\":\"chat.completion.chunk\",\"created\":1735000000,\"model\":\"pria\",\"choices\":[{\"index\":0,\"delta\":{\"role\":\"assistant\",\"content\":\"\"},\"finish_reason\":null,\"logprobs\":null}]}\n\ndata: {\"id\":\"chatcmpl-1735...\",\"object\":\"chat.completion.chunk\",\"created\":1735000000,\"model\":\"pria\",\"choices\":[{\"index\":0,\"delta\":{\"content\":\"Hello\"},\"finish_reason\":null,\"logprobs\":null}]}\n\ndata: [DONE]\n"

OpenAI-compatible streaming chat completions endpoint. Accepts a messages[] array, extracts the last user message, runs it through Praxis’s RAG/tool pipeline, and streams back OpenAI-format SSE chunks.
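The "extracts the last user message" step can be pictured with a short Python sketch. This is an illustration only, not Praxis's implementation: the function name last_user_message is hypothetical, and the handling of list-style content parts is an assumption based on the general OpenAI message format.

```python
from typing import Any, Optional


def last_user_message(messages: list[dict[str, Any]]) -> Optional[str]:
    """Return the text of the most recent message whose role is "user"."""
    for message in reversed(messages):
        if message.get("role") != "user":
            continue
        content = message.get("content")
        if isinstance(content, str):
            return content
        if isinstance(content, list):
            # Assumption: OpenAI-style content parts; keep only the text parts.
            parts = [
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            ]
            return "".join(parts)
        return None
    # No user message anywhere in the array.
    return None


history = [
    {"role": "system", "content": "You are a helpful tutor."},
    {"role": "user", "content": "What is RAG?"},
    {"role": "assistant", "content": "Retrieval-augmented generation."},
    {"role": "user", "content": "Give me an example."},
]
print(last_user_message(history))  # Give me an example.
```

Under this reading, earlier turns in messages[] are not what gets processed; only the final user turn is.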

Today’s primary consumer: the ElevenLabs Voice Agent in Convo (Direct) mode — its custom-LLM webhook points at this endpoint.

Per-institution gate. Disabled by default. The administrator must set institution.chatCompletionEnabled = true to allow inbound traffic. Disabled institutions receive 403 chat_completion_disabled.
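As a rough sketch of the gate described above, in Python: the function name and the dict-shaped institution record are hypothetical. Only the chatCompletionEnabled flag, the 403 status, and the chat_completion_disabled code come from this page.

```python
from typing import Any, Optional


def check_chat_completion_gate(institution: dict[str, Any]) -> Optional[tuple[int, str]]:
    """Return None when inbound traffic is allowed, else (status, error code)."""
    # Disabled by default: a missing or false flag rejects the request.
    if institution.get("chatCompletionEnabled") is True:
        return None
    return 403, "chat_completion_disabled"


print(check_chat_completion_gate({}))  # (403, 'chat_completion_disabled')
print(check_chat_completion_gate({"chatCompletionEnabled": True}))  # None
```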

Override fields (institution-level, all optional):

chatCompletionModel — overrides the conversation model and provider routing for inbound requests. Priority: assistant.conversationModel > chatCompletionModel > institution.conversationModel. The assistant always wins: the override applies only when no assistant has overridden the conversation model. Empty/unset = inherit from the existing cascade.
chatCompletionMaxCompletionTokens — overrides maxCompletionTokens. Sentinel: -1 = inherit, 0 = Auto (catalog cap), >0 = explicit.
chatCompletionReasoningEffort — overrides reasoningEffort. Empty string = inherit. Common voice-mode value: 'none'.

Detection of “this is a chat-completion inbound request” is path-based — any request landing here sets requestArgs.isChatCompletion = true, which the override helpers in rag.js and reasoning_effort_utils.js read to apply the cascade above.
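The override rules above can be summarised in a small Python sketch. It is not the code in rag.js or reasoning_effort_utils.js; the function names, the argument shapes, and the model names in the usage lines are hypothetical. Two points are assumptions: that "Auto (catalog cap)" means using the model's catalog limit, and that the token and reasoning-effort overrides compare only against a single inherited value.

```python
from typing import Optional


def resolve_model(is_chat_completion: bool,
                  assistant_model: Optional[str],
                  chat_completion_model: Optional[str],
                  institution_model: Optional[str]) -> Optional[str]:
    """assistant.conversationModel > chatCompletionModel > institution.conversationModel."""
    candidates = [assistant_model]
    if is_chat_completion:
        # The inbound override only participates for chat-completion requests.
        candidates.append(chat_completion_model)
    candidates.append(institution_model)
    for candidate in candidates:
        if candidate:  # empty or unset falls through to the next level
            return candidate
    return None


def resolve_max_completion_tokens(is_chat_completion: bool, override: int,
                                  inherited: int, catalog_cap: int) -> int:
    """Sentinel handling: -1 = inherit, 0 = Auto (catalog cap), >0 = explicit."""
    if not is_chat_completion or override == -1:
        return inherited
    if override == 0:
        return catalog_cap  # assumption: Auto resolves to the catalog limit
    if override > 0:
        return override
    raise ValueError(f"unsupported sentinel: {override}")


def resolve_reasoning_effort(is_chat_completion: bool, override: str, inherited: str) -> str:
    """An empty string means inherit; any other value replaces the inherited one."""
    if is_chat_completion and override != "":
        return override
    return inherited


print(resolve_model(True, None, "voice-model", "default-model"))  # voice-model
print(resolve_max_completion_tokens(True, 0, 1024, 8192))         # 8192
print(resolve_reasoning_effort(True, "none", "medium"))           # none
```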

POST /api/ai/chat/completions

Headers

x-access-token (string, required): Praxis JWT.

x-praxis-institution-public-id (string): Public ID of the institution context (optional, defaults to the user's primary institution).

x-praxis-conversation-id (string): Conversation/course ID.

x-praxis-assistant-id (string): Assistant ObjectId (24-char hex).

x-praxis-timezone (string): IANA timezone string (e.g. "America/New_York") for date-aware prompts.
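A client might assemble these headers along the following lines. This is a minimal Python sketch: build_headers and its parameter names are hypothetical, and the client-side check of the assistant ID is an assumption drawn from the "24-char hex" description, not documented server behaviour.

```python
import re
from typing import Optional

# Assumption: an assistant ObjectId is exactly 24 hexadecimal characters.
_OBJECT_ID = re.compile(r"[0-9a-fA-F]{24}")


def build_headers(access_token: str,
                  institution_public_id: Optional[str] = None,
                  conversation_id: Optional[str] = None,
                  assistant_id: Optional[str] = None,
                  timezone: Optional[str] = None) -> dict[str, str]:
    """Build request headers; only x-access-token is required."""
    if not access_token:
        raise ValueError("x-access-token (Praxis JWT) is required")
    if assistant_id is not None and not _OBJECT_ID.fullmatch(assistant_id):
        raise ValueError("assistant_id must be a 24-character hex string")

    headers = {"Content-Type": "application/json", "x-access-token": access_token}
    optional = {
        "x-praxis-institution-public-id": institution_public_id,
        "x-praxis-conversation-id": conversation_id,
        "x-praxis-assistant-id": assistant_id,
        "x-praxis-timezone": timezone,
    }
    # Omit optional headers that were not supplied so server defaults apply.
    headers.update({name: value for name, value in optional.items() if value is not None})
    return headers


print(build_headers("<x-access-token>", timezone="America/New_York"))
```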

Body

application/json

messages (object[], required): OpenAI-format messages array. The last user message is processed.

model (string): Ignored. The effective model is determined by institution.chatCompletionModel (when set) or the regular conversation cascade. Example: "pria"

stream (boolean): Required to be true. The endpoint always streams. Example: true

Response

SSE stream of OpenAI-format completion chunks. Terminated with a final data: [DONE] line.

The response is of type string.
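A client can consume the stream by reading data: lines until the [DONE] sentinel and concatenating each chunk's delta.content. The Python sketch below uses only the standard library. The function names are hypothetical, stream_reply is an untested illustration of wiring the parser to the endpoint URL from the curl example, and error handling is omitted.

```python
import json
import urllib.request
from typing import Iterable, Iterator

URL = "https://pria.praxislxp.com/api/ai/chat/completions"


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-format SSE chunks."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and any non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def stream_reply(access_token: str, messages: list[dict]) -> Iterator[str]:
    """Illustrative only: POST the request and yield content as it arrives."""
    body = json.dumps({"messages": messages, "model": "pria", "stream": True}).encode("utf-8")
    request = urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-access-token": access_token},
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_content_deltas(line.decode("utf-8") for line in response)


sample = (
    'data: {"id":"chatcmpl-1735...","object":"chat.completion.chunk","created":1735000000,'
    '"model":"pria","choices":[{"index":0,"delta":{"role":"assistant","content":""},'
    '"finish_reason":null,"logprobs":null}]}\n\n'
    'data: {"id":"chatcmpl-1735...","object":"chat.completion.chunk","created":1735000000,'
    '"model":"pria","choices":[{"index":0,"delta":{"content":"Hello"},'
    '"finish_reason":null,"logprobs":null}]}\n\n'
    'data: [DONE]\n'
)
print("".join(iter_content_deltas(sample.splitlines())))  # Hello
```

The first chunk carries only the assistant role with empty content, so the parser skips it and starts yielding at the first non-empty delta.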

Example:

"data: {\"id\":\"chatcmpl-1735...\",\"object\":\"chat.completion.chunk\",\"created\":1735000000,\"model\":\"pria\",\"choices\":[{\"index\":0,\"delta\":{\"role\":\"assistant\",\"content\":\"\"},\"finish_reason\":null,\"logprobs\":null}]}\n\ndata: {\"id\":\"chatcmpl-1735...\",\"object\":\"chat.completion.chunk\",\"created\":1735000000,\"model\":\"pria\",\"choices\":[{\"index\":0,\"delta\":{\"content\":\"Hello\"},\"finish_reason\":null,\"logprobs\":null}]}\n\ndata: [DONE]\n"
