Bypass System Context for Purpose‑Built Assistants
Some Assistants already have all their operating rules embedded directly in their Assistant instructions (how to use tools, how to format answers, how to behave, etc.). In these cases, repeatedly sending the global System Context on every call is redundant overhead. The Bypass System Context toggle lets you skip this extra layer when it is not needed, reducing input tokens and saving credits.
When to Use Bypass System Context
Enable Bypass System Context when:
- The Assistant has a well‑defined, self‑contained instruction set.
- Tool usage conventions, output format, and interaction style are already encoded in the Assistant.
- You do not rely on global, shared rules from the System Context for this Assistant.
How to Enable Bypass System Context
You can find the Bypass System Context toggle in the Assistant panel for each Assistant. When the toggle is enabled:
- The global System Context is not sent with each request for that Assistant.
- Only the Assistant’s own instructions and the active conversation context are included.
- Input token count decreases, which directly reduces credit usage.
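Conceptually, the toggle changes what gets assembled into each request. The sketch below is illustrative only; the function names, strings, and token sizes are hypothetical and do not reflect Praxis AI's actual API.

```python
# Conceptual sketch of how bypassing the System Context shrinks each request.
# Names and token sizes are illustrative, not Praxis AI's real implementation.

GLOBAL_SYSTEM_CONTEXT = "global, shared rules (typically several thousand tokens)"

def build_prompt(assistant_instructions: str, conversation: list[str],
                 bypass_system_context: bool) -> str:
    parts = [] if bypass_system_context else [GLOBAL_SYSTEM_CONTEXT]
    parts.append(assistant_instructions)   # the Assistant's own rules
    parts.extend(conversation)             # the active conversation context
    return "\n".join(parts)

# With the bypass enabled, every call omits the global System Context,
# so the input token count (and therefore credit usage) drops.
```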
Observing the Credit Savings
As soon as you enable the toggle and run a conversation, you can see the immediate impact in the credit usage views.
- Click on the “minutes ago” label next to a dialogue entry to open the Performance Card and inspect detailed token and credit metrics for that specific exchange.
- Review the Credits used per dialogue; in the example, bypassing the System Context yields a visible saving (e.g., a reduction of 3 credits for the conversation).
Discounts on Cached Input Tokens
Praxis AI’s middleware integrates with LLM providers that support input token caching (such as modern OpenAI and Anthropic models) to transparently reduce the effective number of billable tokens.
How Caching Works Conceptually
For models that support caching:
- Adding to cache: The first time you send a chunk of prompt (e.g., long instructions, repeated context), the provider may store it in a cache. This step is slightly more expensive.
- Reusing cache: Subsequent calls that reuse the same content can be served more cheaply, because the model recognizes those tokens as cached.
- Cache lifetime: The cache typically lives for about 5 minutes and is automatically managed by providers like Anthropic and OpenAI. You do not need to manage it manually.
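The cost profile can be pictured with a small back-of-the-envelope calculation. The multipliers below are assumptions for illustration; actual cache-write and cache-read pricing varies by provider and model.

```python
# Illustrative only: the multipliers here are assumptions, not Praxis AI
# or provider pricing, which differs by model.
BASE_COST_PER_TOKEN = 1.0       # normalized cost of a regular input token
CACHE_WRITE_MULTIPLIER = 1.25   # first request: storing content costs a bit more
CACHE_READ_MULTIPLIER = 0.25    # later requests: cached tokens are much cheaper

def input_cost(tokens: int, cached: bool, first_request: bool) -> float:
    if not cached:
        return tokens * BASE_COST_PER_TOKEN
    multiplier = CACHE_WRITE_MULTIPLIER if first_request else CACHE_READ_MULTIPLIER
    return tokens * BASE_COST_PER_TOKEN * multiplier

# The first call pays a small premium to populate the cache...
print(input_cost(20_000, cached=True, first_request=True))    # 25000.0
# ...but every reuse within the cache lifetime is far cheaper.
print(input_cost(20_000, cached=True, first_request=False))   # 5000.0
```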
How Praxis AI Applies Caching Discounts
Praxis AI normalizes the provider‑level behavior into a consistent credit model:
- We track input tokens and identify the portion recognized as cached by the provider.
- We then apply an Input Cached Discount to that cached portion.
- A Discount Ratio (e.g., 30%) defines how much of those cached tokens are discounted.
- The result is a lower Baseline Token count, which is what we finally convert into credits.
- It may cost slightly more for the first request when content is added to the cache.
- Subsequent requests benefit from reduced effective input tokens, producing visible credit savings.
You can review the applied discounts in:
- The Dialogue Report Card for each conversation.
- The Admin → History panel for cross‑assistant, cross‑user analysis.
Example: Effective Discount on a Dialogue
In a typical dialogue:
- Without caching, the interaction might cost 8 credits based on 71,398 tokens.
- With caching:
- Part of the input is recognized as cached.
- The Input Cached Discount is applied to those cached tokens (21,915 tokens).
- The Discount Ratio reduces the billable portion of cached tokens by 30%.
- After applying the discount, the dialogue may only bill 5 credits, based on a baseline of 49,483 tokens.
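The arithmetic implied by this example can be checked with a short sketch. The exact formula Praxis AI applies is not spelled out here, so the sketch assumes the cached portion is removed in full from the billable input and treats the ~30% figure as the share of input recognized as cached; both are assumptions inferred from the numbers above.

```python
# Worked arithmetic for the example above, under the assumption that the
# cached portion is discounted in full (the exact Praxis AI formula may differ).
total_input_tokens = 71_398
cached_tokens = 21_915

baseline_tokens = total_input_tokens - cached_tokens
print(baseline_tokens)                               # 49483, matching the example

# The ~30% figure roughly matches the share of the input recognized as cached:
print(round(cached_tokens / total_input_tokens, 3))  # 0.307
```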
Disable Agents (Reduce Tool Calls)
The core system instructions govern how the LLM uses tools, with explicit rules to minimize unnecessary tool calls in order to keep credit usage and overall costs under control. While each tool call consumes additional credits, tools are essential for unlocking the full power of your digital twin—for example, by querying trusted sources, retrieving Canvas content, or accessing institution-specific data that the base LLM cannot see on its own. Tool definitions are embedded directly into the system prompt sent to the LLM. This tells the model exactly which tools exist, what they do, and when to use them as part of the agentic reasoning process. To reduce prompt size and improve efficiency, you can configure which tools are enabled for a given digital twin and disable any tools that are not needed. This keeps the instruction set lean, reduces token overhead, and focuses your agent on the capabilities that matter most for your use case.
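Because tool definitions travel inside the system prompt, every disabled tool directly removes tokens from each request. The sketch below is conceptual; the tool names and per-tool token counts are hypothetical and do not reflect Praxis AI's actual tool schema.

```python
# Conceptual sketch: fewer enabled tools means a smaller system prompt.
# Tool names and token counts are hypothetical illustrations.
ALL_TOOLS = {
    "search_trusted_sources": 450,   # approx. tokens for this tool's definition
    "get_canvas_content": 380,
    "query_institution_data": 520,
    "generate_image": 300,
}

def tool_prompt_tokens(enabled: set[str]) -> int:
    """Tokens that the enabled tool definitions add to every request."""
    return sum(tokens for name, tokens in ALL_TOOLS.items() if name in enabled)

print(tool_prompt_tokens(set(ALL_TOOLS)))                                     # 1650
print(tool_prompt_tokens({"search_trusted_sources", "get_canvas_content"}))   # 830
```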
Putting It Together: Strategy for Optimizing Credits
To maximize the return on your investment while maintaining quality:
- Use Bypass System Context for Assistants whose behavior is fully defined by their own instructions to cut redundant tokens from every call.
- Leverage caching‑friendly patterns:
- Use models that support caching where suitable for your workload.
- Keep stable, reusable context consistent across requests (e.g., long instructions, shared reference text, tool use).
- Monitor discounts and performance:
- Regularly review the Performance Card and the Admin → History view.
- Track how often cached discounts appear and how they impact effective credits.
- Only use necessary tools:
- Disable unnecessary tools in your Digital Twin.
- Provide additional instructions on when to use tools, or which tools to prioritize, in your Assistant instructions.
These mechanisms are intentionally generous and designed to keep Praxis AI cost‑competitive while helping you operate Assistants efficiently at scale.
More info
- Plans and Credits – Adding Credits – Adding credits to your digital twin
- Plans and Credits – Caching Discounts – Definitions for tokens, cached discounts, discount ratios, and baseline tokens.