Praxis AI provides multiple levers to help you get more value out of your credits when using Assistants. This guide explains how to reduce token consumption with the Bypass System Context option, how cached input token discounts are applied in the credit computation pipeline, and how disabling unneeded agent tools keeps prompts lean.

Bypass System Context for Purpose‑Built Assistants

Some Assistants already have all their operating rules embedded directly in their Assistant instructions (how to use tools, how to format answers, how to behave, etc.). In these cases, repeatedly sending the global System Context on every call is redundant overhead. The Bypass System Context toggle lets you skip this extra layer when it is not needed, reducing input tokens and saving credits.

When to Use Bypass System Context

Enable Bypass System Context when:
  • The Assistant has a well‑defined, self‑contained instruction set.
  • Tool usage conventions, output format, and interaction style are already encoded in the Assistant.
  • You do not rely on global, shared rules from the System Context for this Assistant.
In these scenarios, removing the System Context avoids sending unnecessary tokens to the LLM while preserving behavior quality.

How to Enable Bypass System Context

You can find the Bypass System Context toggle in the configuration panel for each Assistant.
[Screenshot: Bypass System Context toggle in the Assistant configuration panel]
When enabled:
  • The global System Context is not sent with each request for that Assistant.
  • Only the Assistant’s own instructions and the active conversation context are included.
  • Input token count decreases, which directly reduces credit usage (see the sketch below).
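Conceptually, the toggle only changes what is assembled into each request. The sketch below is a minimal illustration of that assembly; the names and structures are hypothetical and do not reflect Praxis AI's actual internals:

```python
from dataclasses import dataclass, field

# Placeholder for the global, platform-wide System Context.
GLOBAL_SYSTEM_CONTEXT = "...shared rules sent to every Assistant by default..."

@dataclass
class Assistant:
    instructions: str  # the Assistant's own, self-contained instruction set

@dataclass
class Conversation:
    turns: list = field(default_factory=list)  # prior messages in the dialogue

def build_messages(assistant: Assistant, conversation: Conversation,
                   bypass_system_context: bool) -> list:
    """Assemble the input messages for one request (illustrative only)."""
    messages = []
    if not bypass_system_context:
        # Default behavior: the global System Context is prepended to every call.
        messages.append({"role": "system", "content": GLOBAL_SYSTEM_CONTEXT})
    # The Assistant's own instructions and the active conversation are always sent.
    messages.append({"role": "system", "content": assistant.instructions})
    messages.extend(conversation.turns)
    return messages
```

With the toggle enabled, every token of the global System Context is simply never sent, which is where the per-call savings come from.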

Observing the Credit Savings

As soon as you enable the toggle and run a conversation, you can see the immediate impact in the credit usage views.
[Screenshot: Example showing credits gained by bypassing the System Context]
From here:
  • Click on the “minutes ago” label next to a dialogue entry to open the Performance Card and inspect detailed token and credit metrics for that specific exchange.
  • Review the Credits used per dialogue; in the example, bypassing the System Context yields a visible gain (e.g., a reduction of 3 credits for the conversation).

Discounts on Cached Input Tokens

Praxis AI’s middleware integrates with LLM providers that support input token caching (such as modern OpenAI and Anthropic models) to transparently reduce the effective number of billable tokens.

How Caching Works Conceptually

For models that support caching:
  • Adding to cache: The first time you send a chunk of prompt (e.g., long instructions, repeated context), the provider may store it in a cache. This step is slightly more expensive.
  • Reusing cache: Subsequent calls that reuse the same content can be served more cheaply, because the model recognizes those tokens as cached.
  • Cache lifetime: The cache typically lives for about 5 minutes and is automatically managed by providers like Anthropic and OpenAI. You do not need to manage it manually.
Because the first write to the cache costs more than later reuse, the net benefit depends on how often that content is repeated in your conversations. Typically, interactions that use agents (tool calls) benefit more from caching, since the same instructions and tool definitions are resent on each call; the sketch below illustrates the trade-off.
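Here is a rough back-of-the-envelope sketch of that trade-off. The cache write and read multipliers are illustrative assumptions, not Praxis AI or provider billing figures:

```python
# Illustrative multipliers only; actual provider pricing varies.
CACHE_WRITE_MULTIPLIER = 1.25   # first request that writes content to the cache
CACHE_READ_MULTIPLIER = 0.10    # later requests that reuse the cached content

def effective_tokens(cached_tokens: int, reuse_count: int) -> float:
    """Effective billable weight of a cached chunk across repeated requests."""
    first_call = cached_tokens * CACHE_WRITE_MULTIPLIER
    later_calls = cached_tokens * CACHE_READ_MULTIPLIER * reuse_count
    return first_call + later_calls

# A 10,000-token instruction block reused in 5 follow-up turns within the cache window:
with_cache = effective_tokens(10_000, reuse_count=5)   # 12,500 + 5,000 = 17,500
without_cache = 10_000 * 6                              # 60,000 across the same 6 calls
print(with_cache, without_cache)
```

Under these assumed rates, the more turns that reuse the same content before the cache expires, the larger the saving; a chunk that is never reused actually costs slightly more.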

How Praxis AI Applies Caching Discounts

Praxis AI normalizes the provider‑level behavior into a consistent credit model:
  • We track input tokens and identify the portion recognized as cached by the provider.
  • We then apply an Input Cached Discount to that cached portion.
  • A Discount Ratio (e.g., 30%) defines how much of those cached tokens are discounted.
  • The result is a lower Baseline Token count, which is what we finally convert into credits (a minimal sketch follows this list).
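A minimal sketch of that pipeline, assuming the formula described above; the function names and the tokens-per-credit rate are illustrative, not Praxis AI's actual conversion factors:

```python
def baseline_tokens(input_tokens: int, cached_tokens: int,
                    discount_ratio: float = 0.30) -> float:
    """Reduce the billable input by the discounted share of cached tokens."""
    discount = cached_tokens * discount_ratio   # Input Cached Discount
    return input_tokens - discount              # Baseline Token count

def credits_for(input_tokens: int, cached_tokens: int,
                tokens_per_credit: float = 10_000) -> float:
    """Convert the baseline token count into credits (rate is illustrative)."""
    return baseline_tokens(input_tokens, cached_tokens) / tokens_per_credit

# 50,000 input tokens, 20,000 of which the provider reports as cached:
print(credits_for(50_000, 20_000))   # (50,000 - 6,000) / 10,000 = 4.4 credits
```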
In practice:
  • It may cost slightly more for the first request when content is added to the cache.
  • Subsequent requests benefit from reduced effective input tokens, producing visible credit savings.
These discounts are surfaced in:
  • The Dialogue Report Card for each conversation.
  • The Admin → History panel for cross‑assistant, cross‑user analysis.

Example: Effective Discount on a Dialogue

In a typical dialogue:
  • Without caching, the interaction might cost 8 credits based on 71,398 input tokens.
  • With caching:
    • Part of the input is recognized as cached by the provider.
    • The Input Cached Discount applied to those cached tokens amounts to 21,915 tokens.
    • The Discount Ratio reduces the billable portion of the cached tokens by 30%.
  • After applying the discount, the dialogue may bill only 5 credits, based on a Baseline Token count of 49,483 tokens.
You will see this reflected as a discount percentage in the dialogue metrics, clearly indicating how much credit usage was reduced due to caching.
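As a quick sanity check of the figures above (treating 21,915 as the token amount removed by the discount, which is an assumption about how the report presents it):

```python
input_tokens = 71_398
discounted_tokens = 21_915   # assumed: tokens removed by the Input Cached Discount
baseline_tokens = input_tokens - discounted_tokens
print(baseline_tokens)                                   # 49483, the Baseline Token count above
print(round(discounted_tokens / input_tokens * 100, 1))  # 30.7 (% reduction in billable input)
```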

Disable Agents (Reduce Tool Calls)

The core system instructions govern how the LLM uses tools, with explicit rules to minimize unnecessary tool calls in order to keep credit usage and overall costs under control. While each tool call consumes additional credits, tools are essential for unlocking the full power of your digital twin: querying trusted sources, retrieving Canvas content, or accessing institution-specific data that the base LLM cannot see on its own.

Tool definitions are embedded directly into the system prompt sent to the LLM. This tells the model exactly which tools exist, what they do, and when to use them as part of the agentic reasoning process.

To reduce prompt size and improve efficiency, you can configure which tools are enabled for a given digital twin and disable any tools that are not needed. This keeps the instruction set lean, reduces token overhead, and focuses your agent on the capabilities that matter most for your use case, as the sketch below illustrates.
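A rough illustration of why disabling tools trims the prompt; the tool names and registry structure here are hypothetical, not the actual Praxis AI tool set:

```python
import json

# Hypothetical tool definitions; each one is serialized into the system prompt.
ALL_TOOLS = [
    {"name": "trusted_sources_search", "description": "Query vetted reference material."},
    {"name": "canvas_content", "description": "Retrieve Canvas course content."},
    {"name": "institution_data", "description": "Look up institution-specific records."},
]

def tools_prompt(enabled: set[str]) -> str:
    """Serialize only the enabled tool definitions into the prompt text."""
    selected = [t for t in ALL_TOOLS if t["name"] in enabled]
    return json.dumps(selected)

full = tools_prompt({"trusted_sources_search", "canvas_content", "institution_data"})
lean = tools_prompt({"canvas_content"})   # only the tool this twin actually needs
print(len(full), len(lean))               # fewer characters, so fewer input tokens per call
```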

Putting It Together: Strategy for Optimizing Credits

To maximize the value of your credits while maintaining quality:
  • Use Bypass System Context for Assistants whose behavior is fully defined by their own instructions to cut redundant tokens from every call.
  • Leverage caching‑friendly patterns:
    • Use models that support caching where suitable for your workload.
    • Keep stable, reusable context consistent across requests (e.g., long instructions, shared reference text, tool use).
  • Monitor discounts and performance:
    • Regularly review the Performance Card and the Admin → History view.
    • Track how often cached discounts appear and how they impact effective credits.
  • Only use necessary tools:
    • Disable unnecessary tools in your Digital Twin.
    • Provide additional instructions on when to use tools, or which tools to prioritize, in your Assistant instructions.
These mechanisms are intentionally generous and designed to keep Praxis AI cost‑competitive while helping you operate Assistants efficiently at scale.
