Modern AI models can do more than blurt the first answer that comes to mind. The best ones quietly work through the problem first — drafting, checking, revising — before producing a final reply. That hidden draft is called reasoning (or “extended thinking”, “chain of thought”). Pria gives you a single slider, the Reasoning Effort picker, to control how much of that thinking you want.

What “reasoning” means

When a thinking model receives your question, it can spend extra time generating an internal monologue before it starts writing the reply:
  • “Let me re-read the question.”
  • “There are two ways to interpret this — I’ll consider both.”
  • “That step looks wrong, let me redo it.”
You only see the polished final answer, but the reasoning step is what makes the difference between a snappy guess and a carefully argued response. Pria can also surface the live thinking stream so you can watch the model work — useful for transparency, debugging tricky problems, and learning how the model approaches a task.
Not every model supports reasoning. The Conversation Model picker shows a badge on models that do.

The 5 effort levels

None

No thinking. The model answers immediately from its first read. Fast and cheap. Good for casual chat, quick lookups, and voice conversations where latency matters.

Low

A brief inner pass — usually a few seconds extra. Catches obvious mistakes without slowing things down much.

Medium

A solid amount of thinking. The model checks its work, considers alternatives, and reorders its plan if needed. A sensible default for most knowledge work.

High

Deep thinking. Typically 15 seconds to a minute of internal reasoning. Use for nuanced analysis, complex multi-step problems, and long-form writing where quality matters more than speed.

Max

Maximum effort. The model uses the largest reasoning budget the provider allows. Slow and credit-heavy, but produces the highest-quality output for genuinely hard problems.
The slider is in the sidebar under Settings → Reasoning Effort, just below the Conversation Model picker.

Per-model differences

Every AI provider implements thinking slightly differently. Pria abstracts the differences behind the same 5-level scale, but here’s what’s happening underneath:
  • Anthropic Claude: the “extended thinking” mode reserves a token budget for an internal reasoning block. Low/Medium/High/Max map to progressively larger budgets. The thinking tokens are billed but not shown in the final reply (unless you turn on the display toggle).
  • OpenAI: reasoning models accept a reasoning_effort parameter directly. None / Low / Medium / High / Max pass through to the API. Newer GPT-5 generations also surface short “thinking summary” snippets, which Pria can render live.
  • Google Gemini: Gemini 3.x models use thinkingLevel (low/medium/high/max); older 2.5-pro models use thinkingBudget (a token count, minimum 128). Pria translates the unified scale into whichever knob the model supports.
  • xAI Grok: only grok-3-mini accepts an explicit reasoning effort. Grok 4 reasons automatically and ignores the slider, so Max and Medium behave the same.
  • AWS Bedrock: models served via Bedrock (Claude on Bedrock, Nova, Llama) inherit the underlying model’s reasoning support, and Pria’s scale maps onto each one’s native parameter.
  • Mistral and most older non-reasoning models: these don’t accept the slider at all. The picker is greyed out and the model answers in its normal one-shot mode.
You don’t need to know which knob a model uses — pick the level that matches the depth you want and Pria handles the translation.
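To make the translation concrete, here is a minimal sketch of how a unified level could be turned into a provider-style parameter. It is not Pria's actual code: the function name, the family labels, and the token budgets in ASSUMED_BUDGETS are all invented for illustration, and the handling of None for budget-based models is a guess. Only the parameter names (reasoning_effort, thinkingLevel, thinkingBudget, and a thinking budget for Claude) and the 128-token floor come from the descriptions above.

```python
from typing import Any

# Pria's five unified levels, lowest to highest.
LEVELS = ("none", "low", "medium", "high", "max")

# Hypothetical token budgets for budget-based models (illustrative values only).
ASSUMED_BUDGETS = {"low": 2048, "medium": 8192, "high": 16384, "max": 32768}


def native_reasoning_params(family: str, level: str) -> dict[str, Any]:
    """Translate a unified effort level into a provider-style parameter dict."""
    if level not in LEVELS:
        raise ValueError(f"unknown effort level: {level!r}")

    if family == "claude":
        # Budget-based: this sketch sends no thinking block for "none".
        if level == "none":
            return {}
        return {"thinking": {"type": "enabled", "budget_tokens": ASSUMED_BUDGETS[level]}}

    if family == "openai":
        # The level is described as passing straight through.
        return {"reasoning_effort": level}

    if family == "gemini-3":
        # Named levels; "none" simply omits the setting in this sketch.
        return {} if level == "none" else {"thinkingLevel": level}

    if family == "gemini-2.5-pro":
        # Token-count budget, never below the documented floor of 128.
        return {"thinkingBudget": max(128, ASSUMED_BUDGETS.get(level, 0))}

    # Grok 4, Mistral, and other models that ignore the slider get no parameter.
    return {}


print(native_reasoning_params("openai", "high"))
print(native_reasoning_params("gemini-2.5-pro", "none"))
```

Returning an empty dict for models that ignore the slider mirrors the behaviour described above: the request goes out unchanged and every level yields the same call.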

The thinking display toggle

By default Pria shows you only the final answer. If you want to peek at the reasoning as it streams:
1. Open the Digital Twin settings
   Click the Digital Twin name at the top of the chat to open the instance settings panel.

2. Find the Show thinking toggle
   Look for Show thinking blocks (sometimes labelled with an icon). Flip it on.

3. Ask a question
   The next reply renders a collapsible “thinking” block above the final answer. You can expand it to read the model’s work, or leave it folded if you only want the conclusion.
The toggle is per Digital Twin — you can keep thinking visible on a research twin and hidden on a customer-facing one.

Reading thinking blocks during a stream

While a reasoning model is thinking, you’ll see:
  • A growing “thinking” panel above the reply area, streaming in real time.
  • A spinner / waveform indicating the model is still planning.
  • The final answer, which appears below once thinking finishes and the model begins composing the reply.
The thinking block is collapsible after the answer arrives — click the header to fold or unfold it. Long thinking sessions are scrollable inside the panel, so they don’t push the answer off-screen.
Thinking content is informative, not authoritative. The final answer is what the model wants to commit to. Treat the thinking block as “showing the work”, not as a second answer to react to.

Cost & speed tradeoffs

Level  | Typical extra latency | Credit overhead | When it pays off
None   | 0                     | 0               | Greetings, quick facts, voice
Low    | 1–5s                  | ~10–20%         | Light QA, drafting
Medium | 5–15s                 | ~30–60%         | Default knowledge work
High   | 15–60s                | 2–4×            | Analysis, long planning
Max    | Up to several minutes | 5–10×           | Hard, high-stakes problems
Numbers are rough — they vary by model, question complexity, and provider. Watch the credit indicator on each reply to see the real cost.
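As a rough worked example, the overhead figures can be turned into a credit range for a single reply. The sketch below assumes the percentages are added on top of a no-reasoning baseline and that the "×" figures are multiples of that baseline's total cost; the table does not spell this out, so treat both readings, and the function name, as assumptions.

```python
# (low, high) multipliers on a no-reasoning baseline, read off the table.
# Assumption: "~10-20%" means 1.1x to 1.2x total, and "2-4x" means 2x to 4x total.
OVERHEAD = {
    "none": (1.0, 1.0),
    "low": (1.1, 1.2),
    "medium": (1.3, 1.6),
    "high": (2.0, 4.0),
    "max": (5.0, 10.0),
}


def estimate_credits(base_credits: float, level: str) -> tuple[float, float]:
    """Return a rough (min, max) credit range for one reply at the given level."""
    low, high = OVERHEAD[level]
    return (round(base_credits * low, 2), round(base_credits * high, 2))


# A reply that costs 10 credits with no reasoning:
for level in OVERHEAD:
    print(level, estimate_credits(10, level))
```

The estimate is only as good as the table it is built from, so the credit indicator on each reply remains the number to trust.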

When to use Max

Reach for Max when the answer’s quality matters more than time or cost:
  • Deep research questions that require synthesising many sources.
  • Long, multi-step plans you’ll act on (project plans, architecture proposals).
  • Nuanced analysis where small mistakes have big downstream cost.
  • Hard puzzles, derivations, proofs, or ambiguous specifications.
  • Final-draft writing on important material.
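You’ll usually pair Max with a frontier model (Claude Opus, GPT-5, Gemini 3 Pro) and a Knowledge Mode that gives the model rich context. Grok 4 is also a frontier option, but it reasons automatically and ignores the slider, so choosing Max there changes nothing.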

When to use None or Low

Drop to None or Low when speed and economy matter:
  • Voice mode (long thinking pauses break the conversation’s natural rhythm).
  • Quick factual lookups (“what’s the capital of…”).
  • Casual chitchat, small talk, formatting tweaks to existing text.
  • Bulk drafting where you’ll edit afterwards anyway.
  • Repetitive workflows where the answer is well within the model’s first-pass capability.
Setting effort to None on a reasoning-required problem doesn’t break the model — it just produces a shallower, more error-prone answer. If a reply looks wrong, bump the slider and re-ask.
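If you script this "start low, bump on a bad reply" habit, it could look something like the sketch below. Everything here is hypothetical: ask stands in for whatever function sends a question at a given effort level, and looks_wrong for whatever check you use to judge a reply. Neither is a Pria API.

```python
from typing import Callable

LEVELS = ["none", "low", "medium", "high", "max"]


def bump(level: str) -> str:
    """Return the next level up, staying at "max" once it is reached."""
    i = LEVELS.index(level)
    return LEVELS[min(i + 1, len(LEVELS) - 1)]


def ask_with_escalation(
    ask: Callable[[str, str], str],      # hypothetical: (question, level) -> reply
    looks_wrong: Callable[[str], bool],  # hypothetical: your own quality check
    question: str,
    start: str = "none",
) -> tuple[str, str]:
    """Re-ask at increasing effort until the reply passes or "max" is reached."""
    level = start
    while True:
        reply = ask(question, level)
        if not looks_wrong(reply) or level == LEVELS[-1]:
            return level, reply
        level = bump(level)


# Stand-in callables for demonstration: the reply only "passes" at medium.
def fake_ask(question: str, level: str) -> str:
    return f"{level}:{question}"


print(ask_with_escalation(fake_ask, lambda r: not r.startswith("medium"), "q"))
```

Starting low keeps the common case fast and cheap, and the loop pays for deeper reasoning only on the questions that turn out to need it.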