Modern AI models can do more than blurt the first answer that comes to mind. The best ones quietly work through the problem first — drafting, checking, revising — before producing a final reply. That hidden draft is called reasoning (or “extended thinking”, “chain of thought”). Pria gives you a single slider, the Reasoning Effort picker, to control how much of that thinking you want.Documentation Index
Fetch the complete documentation index at: https://docs.praxis-ai.com/llms.txt
Use this file to discover all available pages before exploring further.
What “reasoning” means
When a thinking model receives your question, it can spend extra time generating an internal monologue before it starts writing the reply:- “Let me re-read the question.”
- “There are two ways to interpret this — I’ll consider both.”
- “That step looks wrong, let me redo it.”
Not every model supports reasoning. The Conversation Model picker shows a badge on models that do.
The 5 effort levels
None
No thinking. The model answers immediately from its first read. Fast and cheap. Good for casual chat, quick lookups, and voice conversations where latency matters.
Low
A brief inner pass — usually a few seconds extra. Catches obvious mistakes without slowing things down much.
Medium
A solid amount of thinking. The model checks its work, considers alternatives, and reorders its plan if needed. A sensible default for most knowledge work.
High
Deep thinking. Several seconds to a minute of internal reasoning. Use for nuanced analysis, complex multi-step problems, and long-form writing where quality matters more than speed.
Max
Maximum effort. The model uses the largest reasoning budget the provider allows. Slow and credit-heavy, but produces the highest-quality output for genuinely hard problems.
Per-model differences
Every AI provider implements thinking slightly differently. Pria abstracts the differences behind the same 5-level scale, but here’s what’s happening underneath:Anthropic Claude
Anthropic Claude
Claude’s “extended thinking” mode reserves a token budget for an internal reasoning block. Low/Medium/High/Max map to progressively larger budgets. The thinking tokens are billed but not shown in the final reply (unless you turn on the display toggle).
OpenAI (GPT-5, o-series)
OpenAI (GPT-5, o-series)
OpenAI reasoning models accept a
reasoning_effort parameter directly. None / Low / Medium / High / Max pass through to the API. Newer GPT-5 generations also surface short “thinking summary” snippets which Pria can render live.Google Gemini
Google Gemini
Gemini 3.x models use
thinkingLevel (low/medium/high/max), older 2.5-pro models use thinkingBudget (a token count, minimum 128). Pria translates the unified scale into whichever knob the model supports.xAI Grok
xAI Grok
Only
grok-3-mini accepts an explicit reasoning effort. Grok 4 reasons automatically and ignores the slider — Max and Medium produce the same answer.Bedrock-routed models
Bedrock-routed models
Models served via AWS Bedrock (Claude on Bedrock, Nova, Llama) inherit the underlying model’s reasoning support and Pria’s scale maps onto each one’s native parameter.
Mistral and others
Mistral and others
Mistral and most older non-reasoning models don’t accept the slider at all — the picker is greyed out and the model answers in its normal one-shot mode.
The thinking display toggle
By default Pria shows you only the final answer. If you want to peek at the reasoning as it streams:Open the Digital Twin settings
Click the Digital Twin name at the top of the chat to open the instance settings panel.
Find the Show thinking toggle
Look for Show thinking blocks (sometimes labelled with the icon). Flip it on.
Reading thinking blocks during a stream
While a reasoning model is thinking, you’ll see:- A growing “thinking” panel above the reply area, streaming in real time.
- A spinner / waveform indicating the model is still planning.
- The final answer appears below once thinking finishes and the model begins composing the reply.
Thinking content is informative, not authoritative. The final answer is what the model wants to commit to. Treat the thinking block as “showing the work”, not as a second answer to react to.
Cost & speed tradeoffs
| Level | Typical extra latency | Credit overhead | When it pays off |
|---|---|---|---|
| None | 0 | 0 | Greetings, quick facts, voice |
| Low | 1–5s | ~10–20% | Light QA, drafting |
| Medium | 5–15s | ~30–60% | Default knowledge work |
| High | 15–60s | 2–4× | Analysis, long planning |
| Max | Up to several minutes | 5–10× | Hard, high-stakes problems |
When to use Max
Reach for Max when the answer’s quality matters more than time or cost:- Deep research questions that require synthesising many sources.
- Long, multi-step plans you’ll act on (project plans, architecture proposals).
- Nuanced analysis where small mistakes have big downstream cost.
- Hard puzzles, derivations, proofs, or ambiguous specifications.
- Final-draft writing on important material.
When to use None or Low
Drop to None or Low when speed and economy matter:- Voice mode (long thinking pauses break the conversation’s natural rhythm).
- Quick factual lookups (“what’s the capital of…”).
- Casual chitchat, small talk, formatting tweaks to existing text.
- Bulk drafting where you’ll edit afterwards anyway.
- Repetitive workflows where the answer is well within the model’s first-pass capability.
Related
- Switching AI Models — pick a model that supports thinking.
- Knowledge Modes — feed the model your uploaded context for grounded reasoning.
- Convo Mode — when to dial effort down for live voice.
- Credits — how thinking tokens affect your spend.
- Credit optimization — tactics for getting the depth you need without overspending.