Praxis Shield is the platform’s background safety layer. It continuously reviews conversations across your Digital Twin instances for signs of misuse — things like prompt-injection attempts, jailbreaking, attempts to extract credentials or probe internal systems, email abuse, and unusual automation patterns. Suspicious activity is assessed by an AI adjudicator and surfaced to you as incidents for human review.
Detection never acts on its own. Praxis Shield does not block messages, interrupt conversations, or suspend anyone automatically. It flags, you decide. Every action taken on an incident is a deliberate reviewer decision, and every decision is recorded in the incident’s audit trail.
Praxis Shield complements Content Moderation: moderation screens individual messages as they are sent, while Praxis Shield reviews conversations afterwards for security and abuse patterns that only emerge in context.

Where to find it

Open Admin → Instances and switch the view at the top of the page to Threats Detection (Praxis Shield). The panel is cross-instance: it shows incidents from every Digital Twin you manage in one list, so you don’t have to check each instance separately.
Privacy by scope. You only ever see incidents belonging to institutions where you hold admin rights — enforced server-side on every request. Other institutions’ incidents are invisible to you, and yours are invisible to their admins. End users see nothing at all: scanning is silent and changes nothing about their experience.

The incident list

Each row is one incident — a user whose recent activity triggered detection — showing its status, severity, the user (name and email), the instance(s) involved, category badges, and when the activity was last seen. The list is ordered most-severe first, then most recent; scroll to load more, and use the refresh button to pick up new incidents.
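The ordering rule above (most severe first, then most recent) can be sketched as a two-key sort. This is only an illustration: the `Incident` record and its field names are hypothetical, not the platform's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    # Hypothetical record shape; field names are illustrative only.
    user_email: str
    severity: int        # 0 (None) through 4 (Critical)
    last_seen: datetime  # when the flagged activity was last seen


def order_for_list(incidents):
    """Most severe first; ties are broken by most recent activity."""
    return sorted(incidents, key=lambda i: (i.severity, i.last_seen), reverse=True)


incidents = [
    Incident("a@example.edu", 2, datetime(2025, 1, 3)),
    Incident("b@example.edu", 4, datetime(2025, 1, 1)),
    Incident("c@example.edu", 2, datetime(2025, 1, 5)),
]
ordered = [i.user_email for i in order_for_list(incidents)]
print(ordered)
```

Here the Critical incident comes first even though it is the oldest, and the two Medium incidents follow in order of recency.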

Filters

| Filter | What it does |
| --- | --- |
| Account | Narrow to the instances under one account |
| Instance | Narrow to a single Digital Twin (the picker lists only twins that have incidents) |
| Status | Defaults to Active (open, reviewing, and escalated incidents). Pick a specific status, or All to include resolved and false-positive incidents |
| Min severity | Hide everything below a chosen severity level |
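As a rough mental model of how these filters combine, the sketch below applies them to a list of incidents held as dictionaries. The function name, parameter names, and dictionary keys are assumptions made for illustration; the actual filtering happens inside the product.

```python
# Statuses included in the default "Active" view, as described above.
ACTIVE_STATUSES = {"open", "reviewing", "escalated"}


def apply_filters(incidents, status="active", min_severity=0, instance=None):
    """Return the incidents that pass every selected filter (hypothetical helper)."""
    selected = []
    for inc in incidents:
        if status == "active" and inc["status"] not in ACTIVE_STATUSES:
            continue  # default view hides resolved and false_positive
        if status not in ("active", "all") and inc["status"] != status:
            continue  # a specific status was picked
        if inc["severity"] < min_severity:
            continue  # below the chosen minimum severity
        if instance is not None and instance not in inc["instances"]:
            continue  # not in the chosen Digital Twin
        selected.append(inc)
    return selected


sample = [
    {"id": 1, "status": "open", "severity": 3, "instances": ["twin-a"]},
    {"id": 2, "status": "resolved", "severity": 4, "instances": ["twin-a"]},
    {"id": 3, "status": "escalated", "severity": 1, "instances": ["twin-b"]},
]
default_view = [i["id"] for i in apply_filters(sample)]
all_medium_up = [i["id"] for i in apply_filters(sample, status="all", min_severity=2)]
```

With this sample data the default view holds incidents 1 and 3, while selecting All with a minimum severity of Medium brings back incidents 1 and 2.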

Severity levels

| Level | Label | Meaning |
| --- | --- | --- |
| 0 | None | Informational; no real risk identified |
| 1 | Low | Minor or ambiguous signals |
| 2 | Medium | Clear misuse signals worth review |
| 3 | High | Serious abuse attempt |
| 4 | Critical | Severe threat; review immediately |
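If you track incidents in your own scripts or spreadsheets, the levels map naturally onto an ordered enumeration. The sketch below is one possible representation; the `Severity` class is an assumption, not a type exposed by Praxis Shield.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Numeric levels and labels taken from the table above.
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def label(level: int) -> str:
    """Turn a numeric level into its display label, e.g. 3 -> 'High'."""
    return Severity(level).name.title()


high_label = label(3)
outranks = Severity.CRITICAL > Severity.MEDIUM  # IntEnum members compare by value
```

Because the members are integers, a "minimum severity" check is a plain `>=` comparison.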

Status lifecycle

Incidents move through a simple workflow:
  • open — newly detected, not yet reviewed
  • reviewing — a reviewer is investigating
  • escalated — needs attention from another reviewer or your security team
  • resolved — reviewed and closed
  • false_positive — reviewed and judged benign
Resolved and false-positive incidents drop off the default Active view but remain available (set the status filter to All) with their full history intact.

Categories

Category badges tell you at a glance what kind of behaviour was detected — for example prompt injection, jailbreak attempts, credential-exfiltration attempts, probing of internal systems or authentication, email abuse, misuse of tool chains, unusual automation volume, or coordinated activity across accounts. One incident can carry several categories.

The incident report

Click the report icon on any row to open the full incident report. It opens in read-only mode; click Edit (admin) to reveal the management controls.
Evidence tabs:
  • Summary — what happened, in plain language, with the assessed impact
  • Signals — the specific detection signals that fired
  • Evidence — the relevant conversation transcripts (inputs, outputs, and tool activity)
  • LLM Adjudication — the AI reviewer’s assessment of the evidence
  • Cost — usage associated with the flagged activity
Workflow tabs:
  • Workflow — change the incident’s status and severity, and adjust its categories
  • Reviewer Notes — add timestamped notes for yourself and other reviewers
  • Action History — the complete audit trail: every status change, note, and action, with who did it and when
Use the previous/next arrows to move through the incident list without leaving the report.
Evidence transcripts are written by the very users being investigated, so the report renders them all as inert plain text: no formatting is interpreted and URLs are never clickable. This protects reviewers, because a malicious link planted in a flagged conversation cannot be opened by accident. Don't copy URLs out of evidence and visit them.
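To illustrate what rendering text inertly means, the sketch below escapes a transcript before placing it in a page. It uses Python's standard `html.escape`. The `render_inert` helper is hypothetical and is not a description of how Praxis Shield implements its report view.

```python
import html


def render_inert(transcript_text: str) -> str:
    """Escape markup so a browser shows the text literally instead of interpreting it."""
    return "<pre>" + html.escape(transcript_text, quote=True) + "</pre>"


sample = 'Visit <a href="https://evil.example">this link</a>'
rendered = render_inert(sample)
print(rendered)
```

After escaping, the anchor tag from the transcript is displayed as literal characters, so there is nothing for a reviewer to click.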

Triaging an incident

1. Open the report

Start from the highest-severity open incidents. Read the Summary, then check the Evidence tab to see the actual conversation.

2. Mark it as reviewing

Click Edit (admin), set the status to reviewing, and adjust the severity up or down if your reading of the evidence differs from the initial assessment.

3. Record what you find

Add a reviewer note summarizing your conclusion. Notes are visible to other admins of the institution and become part of the permanent record.

4. Close it out

Set the status to resolved (real issue, handled), false_positive (benign, e.g. a security class legitimately discussing injection techniques), or escalated if someone else needs to take over.
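The triage steps can be pictured as a small state object in which every reviewer action also appends an audit entry. This is a conceptual sketch only: the class, method names, and entry fields are hypothetical, and no restrictions on which status may follow which are assumed, since none are documented here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

STATUSES = {"open", "reviewing", "escalated", "resolved", "false_positive"}


@dataclass
class Incident:
    status: str = "open"
    severity: int = 0
    history: list = field(default_factory=list)  # stands in for Action History

    def _record(self, reviewer, action, detail):
        # Every decision is logged with who did it and when.
        self.history.append({
            "at": datetime.now(timezone.utc),
            "by": reviewer,
            "action": action,
            "detail": detail,
        })

    def set_status(self, reviewer, new_status):
        if new_status not in STATUSES:
            raise ValueError(f"unknown status: {new_status}")
        self._record(reviewer, "status_change", f"{self.status} -> {new_status}")
        self.status = new_status

    def add_note(self, reviewer, text):
        self._record(reviewer, "note", text)


inc = Incident(severity=3)
inc.set_status("admin@example.edu", "reviewing")
inc.add_note("admin@example.edu", "Security class exercise; benign.")
inc.set_status("admin@example.edu", "false_positive")
actions = [entry["action"] for entry in inc.history]
```

Walking one incident from open to false_positive leaves three entries in the history: a status change, a note, and a second status change.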

Suspending a user

If the evidence warrants it, the suspend action on an incident row sets the user’s membership in the affected Digital Twin to inactive — they immediately lose AI access in that instance, while their memberships in other institutions are untouched. The suspension is recorded in the incident’s action history, and you can reinstate the user at any time from Users. You cannot suspend yourself or Praxis AI staff accounts from an incident.
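The scope of a suspension can be summarized in a few lines. The sketch below assumes memberships are stored per user and instance; the function, its arguments, and the returned record are hypothetical and only mirror the rules stated above.

```python
def suspend_from_incident(memberships, actor, target, instance, staff_accounts):
    """Set one membership to inactive and return an action-history entry.

    memberships: dict mapping (user, instance) -> "active" or "inactive"
    staff_accounts: set of accounts that cannot be suspended from an incident
    """
    if target == actor:
        raise PermissionError("you cannot suspend yourself from an incident")
    if target in staff_accounts:
        raise PermissionError("staff accounts cannot be suspended from an incident")
    memberships[(target, instance)] = "inactive"  # only the affected instance changes
    return {"action": "suspend", "by": actor, "user": target, "instance": instance}


memberships = {
    ("user-1", "twin-a"): "active",
    ("user-1", "twin-b"): "active",
}
entry = suspend_from_incident(memberships, "admin-1", "user-1", "twin-a", {"staff-1"})
```

Only the membership in twin-a changes; the same user's membership in twin-b stays active.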

Marking false positives

Set the incident’s status to false_positive in the Workflow tab. The incident leaves the active queue but is never deleted — the categories, evidence, and your notes stay on record, which helps you recognize the same benign pattern faster next time.

Sharing an incident report

Every report has a Copy report link button that puts a shareable URL on your clipboard. Send it to a colleague to put them in front of the exact same report. The link is safe to share because it is login-gated: the recipient must sign in, and the server only serves the report to admins of the incident’s own institution. Anyone else — including admins of other institutions — gets an access-denied message. The shared view is strictly read-only.
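The checks behind a shared link can be thought of as a short decision sequence. The following sketch is an assumption about the order of those checks; the `viewer` structure and the return values are illustrative names, not the server's actual responses.

```python
def serve_shared_report(viewer, incident_institution):
    """Decide what a visitor to a shared report link gets (hypothetical logic)."""
    if viewer is None:
        return "login_required"        # the link is login-gated
    if incident_institution not in viewer["admin_of"]:
        return "access_denied"         # not an admin of the incident's institution
    return "read_only_report"          # the shared view never exposes edit controls


colleague = {"email": "colleague@example.edu", "admin_of": {"inst-1"}}
outsider = {"email": "other@example.org", "admin_of": {"inst-2"}}

results = (
    serve_shared_report(None, "inst-1"),
    serve_shared_report(outsider, "inst-1"),
    serve_shared_report(colleague, "inst-1"),
)
```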

Researching further

An incident is often the starting point, not the whole story:
  • View flagged conversation (the eye icon on a row) shows the exact conversation records that triggered the incident — inputs, responses, and tool activity — rendered as plain text.
  • Histories lets you search the user’s wider conversation activity in the instances you manage.
  • Sessions shows where and on what devices the user has been signing in — useful for judging whether an account is shared or compromised.

Required permissions

| Action | Required entitlement |
| --- | --- |
| View incidents, reports, and flagged conversations | institutions.list |
| Change status/severity/categories, add notes and actions | institutions.edit |
| Suspend a user from an instance | institutions.edit |
See Entitlements for how to grant these.
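For auditing who can do what, the permission table can be expressed as a simple lookup. The action keys below are made-up labels for the three rows; only the entitlement names `institutions.list` and `institutions.edit` come from this page.

```python
# Hypothetical action labels mapped to the entitlements listed above.
REQUIRED_ENTITLEMENT = {
    "view_incidents": "institutions.list",
    "edit_incident": "institutions.edit",
    "suspend_user": "institutions.edit",
}


def is_allowed(action, granted):
    """True if the set of granted entitlements covers the given action."""
    return REQUIRED_ENTITLEMENT[action] in granted


viewer_only = {"institutions.list"}
can_view = is_allowed("view_incidents", viewer_only)
can_suspend = is_allowed("suspend_user", viewer_only)
```

An admin holding only `institutions.list` can read incidents and reports but cannot change them or suspend anyone.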

Related

  • Content Moderation: real-time screening of individual messages as they are sent
  • Histories: search and review conversations across your instances
  • Sessions: who signed in, from where, on what device
  • Users: membership status, suspension, and reinstatement
  • Entitlements: the institutions.* permissions behind this panel