Praxis Shield — Threat & Abuse Monitoring

Praxis Shield is the platform’s background safety layer. It continuously reviews conversations across your Digital Twin instances for signs of misuse — things like prompt-injection attempts, jailbreaking, attempts to extract credentials or probe internal systems, email abuse, and unusual automation patterns. Suspicious activity is assessed by an AI adjudicator and surfaced to you as incidents for human review.

Detection never acts on its own. Praxis Shield does not block messages, interrupt conversations, or suspend anyone automatically. It flags, you decide. Every action taken on an incident is a deliberate reviewer decision, and every decision is recorded in the incident’s audit trail.

Praxis Shield complements Content Moderation: moderation screens individual messages as they are sent, while Praxis Shield reviews conversations afterwards for security and abuse patterns that only emerge in context.

Where to find it

Open Admin → Instances and switch the view at the top of the page to Threats Detection (Praxis Shield). The panel is cross-instance: it shows incidents from every Digital Twin you manage in one list, so you don’t have to check each instance separately.

Privacy by scope. You only ever see incidents belonging to institutions where you hold admin rights — enforced server-side on every request. Other institutions’ incidents are invisible to you, and yours are invisible to their admins. End users see nothing at all: scanning is silent and changes nothing about their experience.

The incident list

Each row is one incident — a user whose recent activity triggered detection — showing its status, severity, the user (name and email), the instance(s) involved, category badges, and when the activity was last seen. The list is ordered most-severe first, then most recent; scroll to load more, and use the refresh button to pick up new incidents.
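The ordering rule can be sketched in a few lines. This is only an illustration: the `IncidentRow` class and its field names are hypothetical, not the platform's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IncidentRow:
    # Hypothetical fields; the real incident schema is not documented here.
    user_email: str
    severity: int        # 0 (None) to 4 (Critical)
    last_seen: datetime  # when the flagged activity was last observed


def order_for_list(rows: list[IncidentRow]) -> list[IncidentRow]:
    """Sort most-severe first, then most recent within the same severity."""
    return sorted(rows, key=lambda r: (r.severity, r.last_seen), reverse=True)


rows = [
    IncidentRow("a@example.edu", 2, datetime(2025, 1, 3)),
    IncidentRow("b@example.edu", 4, datetime(2025, 1, 1)),
    IncidentRow("c@example.edu", 2, datetime(2025, 1, 5)),
]
ordered = order_for_list(rows)
```

Here the severity-4 incident comes first even though it is the oldest, and the two severity-2 incidents follow in newest-first order.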

Filters

| Filter | What it does |
| --- | --- |
| Account | Narrow to the instances under one account |
| Instance | Narrow to a single Digital Twin (the picker lists only twins that have incidents) |
| Status | Defaults to Active (open, reviewing, and escalated incidents). Pick a specific status, or All to include resolved and false-positive incidents |
| Min severity | Hide everything below a chosen severity level |

Severity levels

| Level | Label | Meaning |
| --- | --- | --- |
| 0 | None | Informational; no real risk identified |
| 1 | Low | Minor or ambiguous signals |
| 2 | Medium | Clear misuse signals worth review |
| 3 | High | Serious abuse attempt |
| 4 | Critical | Severe threat; review immediately |
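If you export incidents or script against them, the levels map naturally onto an ordered enumeration. The sketch below assumes nothing about the platform's internals: the `Severity` enum and the `at_least` helper are hypothetical names that mirror the levels above and the Min severity filter.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Values and labels taken from the severity levels listed above.
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def at_least(severities, minimum):
    """Keep only severities at or above the chosen minimum level."""
    return [s for s in severities if s >= minimum]


seen = [Severity.LOW, Severity.HIGH, Severity.NONE, Severity.CRITICAL]
urgent = at_least(seen, Severity.HIGH)
```

Because `IntEnum` members compare as integers, "hide everything below High" is a plain `>=` comparison.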

Status lifecycle

Incidents move through a simple workflow:

open — newly detected, not yet reviewed
reviewing — a reviewer is investigating
escalated — needs attention from another reviewer or your security team
resolved — reviewed and closed
false_positive — reviewed and judged benign

Resolved and false-positive incidents drop off the default Active view but remain available (set the status filter to All) with their full history intact.
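The relationship between the five statuses and the default Active view can be expressed as a small predicate. This is a hedged sketch: the status strings come from the list above, while `Status`, `ACTIVE`, and `matches_status_filter` are illustrative names, not a documented API.

```python
from enum import Enum


class Status(str, Enum):
    OPEN = "open"
    REVIEWING = "reviewing"
    ESCALATED = "escalated"
    RESOLVED = "resolved"
    FALSE_POSITIVE = "false_positive"


# The default "Active" view covers the three statuses still needing attention.
ACTIVE = frozenset({Status.OPEN, Status.REVIEWING, Status.ESCALATED})


def matches_status_filter(status: Status, selected: str = "active") -> bool:
    """Return True if an incident with `status` should appear under the filter."""
    if selected == "all":
        return True
    if selected == "active":
        return status in ACTIVE
    return status == Status(selected)  # a single specific status
```

With the default filter, `matches_status_filter(Status.RESOLVED)` is false, which is why closed incidents only reappear once the filter is set to All or to their specific status.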

The incident report

Click the report icon on any row to open the full incident report. It opens in read-only mode; click Edit (admin) to reveal the management controls.

Evidence tabs:

Summary — what happened, in plain language, with the assessed impact
Signals — the specific detection signals that fired
Evidence — the relevant conversation transcripts (inputs, outputs, and tool activity)
LLM Adjudication — the AI reviewer’s assessment of the evidence
Cost — usage associated with the flagged activity

Workflow tabs:

Workflow — change the incident’s status and severity, and adjust its categories
Reviewer Notes — add timestamped notes for yourself and other reviewers
Action History — the complete audit trail: every status change, note, and action, with who did it and when

Use the previous/next arrows to move through the incident list without leaving the report.

Evidence transcripts are written by the very users being investigated, so the report renders all of it as inert plain text — no formatting is interpreted and URLs are never clickable. This protects reviewers: a malicious link planted in a flagged conversation cannot be opened by accident. Don’t copy URLs out of evidence and visit them.
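To illustrate what "inert plain text" means in practice, the sketch below escapes untrusted evidence before it is placed in a page. It is a generic technique using Python's standard `html.escape`, shown under the assumption of an HTML front end; it is not a description of how Praxis Shield itself renders reports, and `render_inert` is a hypothetical helper.

```python
import html


def render_inert(evidence: str) -> str:
    """Escape untrusted evidence so a browser displays it as literal text."""
    # quote=True also escapes quotation marks, so the text stays inert
    # even if it were ever placed inside an attribute value.
    return "<pre>" + html.escape(evidence, quote=True) + "</pre>"


sample = '<a href="https://evil.example/x">click</a>'
rendered = render_inert(sample)
```

After escaping, the planted anchor tag reaches the reviewer as visible characters rather than as a clickable link.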

Triaging an incident

Open the report

Start from the highest-severity open incidents. Read the Summary, then check the Evidence tab to see the actual conversation.

Mark it as reviewing

Click Edit (admin), set the status to reviewing, and adjust the severity up or down if your reading of the evidence differs from the initial assessment.

Record what you find

Add a reviewer note summarizing your conclusion. Notes are visible to other admins of the institution and become part of the permanent record.

Close it out

Set the status to resolved (real issue, handled), false_positive (benign — e.g. a security class legitimately discussing injection techniques), or escalated if someone else needs to take over.

Suspending a user

If the evidence warrants it, the suspend action on an incident row sets the user’s membership in the affected Digital Twin to inactive — they immediately lose AI access in that instance, while their memberships in other institutions are untouched. The suspension is recorded in the incident’s action history, and you can reinstate the user at any time from Users. You cannot suspend yourself or Praxis AI staff accounts from an incident.
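The scoping of a suspension can be modelled as follows. Everything here is an assumption made for illustration: the `Membership` record, the `staff_ids` set, and the shape of the history entry are hypothetical and only mirror the rules described above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Membership:
    user_id: str
    instance_id: str
    active: bool = True


class SuspensionError(Exception):
    """Raised when a suspension is not permitted."""


def suspend_from_instance(memberships, history, *, actor_id, target_id,
                          instance_id, staff_ids=frozenset()):
    """Deactivate one user's membership in one instance and log the action."""
    if target_id == actor_id:
        raise SuspensionError("you cannot suspend yourself")
    if target_id in staff_ids:
        raise SuspensionError("staff accounts cannot be suspended from an incident")
    for m in memberships:
        # Only the membership in the affected instance changes.
        if m.user_id == target_id and m.instance_id == instance_id:
            m.active = False
    history.append({
        "action": "suspend",
        "actor": actor_id,
        "target": target_id,
        "instance": instance_id,
        "at": datetime.now(timezone.utc),
    })


memberships = [Membership("u1", "twin-a"), Membership("u1", "twin-b")]
history = []
suspend_from_instance(memberships, history,
                      actor_id="admin1", target_id="u1", instance_id="twin-a")
```

After the call, the `twin-a` membership is inactive, the `twin-b` membership is unchanged, and the history holds one entry naming the reviewer who acted.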

Marking false positives

Set the incident’s status to false_positive in the Workflow tab. The incident leaves the active queue but is never deleted: the categories, evidence, and your notes stay on record, which helps you recognize the same benign pattern faster next time.

Sharing an incident report

Every report has a Copy report link button that puts a shareable URL on your clipboard. Send it to a colleague to put them in front of the exact same report. The link is safe to share because it is login-gated: the recipient must sign in, and the server only serves the report to admins of the incident’s own institution. Anyone else, including admins of other institutions, gets an access-denied message. The shared view is strictly read-only.
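The access rules for a shared report link amount to a three-way decision. The sketch below is a simplified model under stated assumptions: `Viewer`, `admin_institutions`, and the returned outcome strings are hypothetical, and the real check runs server-side in a form not documented here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Viewer:
    user_id: str
    # Institutions where this user holds admin rights (hypothetical field).
    admin_institutions: set = field(default_factory=set)


def shared_report_access(viewer: Optional[Viewer], incident_institution: str) -> str:
    """Decide what a person opening a shared report link gets to see."""
    if viewer is None:
        return "login_required"   # the link is login-gated
    if incident_institution not in viewer.admin_institutions:
        return "access_denied"    # includes admins of other institutions
    return "read_only"            # the shared view never allows edits


colleague = Viewer("u7", {"inst-1"})
outsider = Viewer("u9", {"inst-2"})
```

Under this model, `colleague` opening a link to an `inst-1` incident gets the read-only report, while `outsider` gets an access-denied result even though they are an admin elsewhere.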

Researching further

An incident is often the starting point, not the whole story:

View flagged conversation (the eye icon on a row) shows the exact conversation records that triggered the incident — inputs, responses, and tool activity — rendered as plain text.
Histories lets you search the user’s wider conversation activity in the instances you manage.
Sessions shows where and on what devices the user has been signing in — useful for judging whether an account is shared or compromised.

Required permissions

| Action | Required entitlement |
| --- | --- |
| View incidents, reports, and flagged conversations | `institutions.list` |
| Change status/severity/categories, add notes and actions | `institutions.edit` |
| Suspend a user from an instance | `institutions.edit` |
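The table reads as a lookup from action to entitlement. The sketch below encodes it that way; the action keys and the `is_allowed` helper are hypothetical, and it assumes (without confirmation from this page) that `institutions.edit` does not automatically imply `institutions.list`.

```python
# Entitlement names come from the table above; the action keys are illustrative.
REQUIRED_ENTITLEMENT = {
    "view_incidents": "institutions.list",
    "edit_incident": "institutions.edit",
    "suspend_user": "institutions.edit",
}


def is_allowed(action: str, entitlements: set) -> bool:
    """Return True if the caller holds the entitlement the action requires."""
    return REQUIRED_ENTITLEMENT[action] in entitlements


viewer_only = {"institutions.list"}
full_admin = {"institutions.list", "institutions.edit"}
```

A reviewer holding only `institutions.list` can open incidents and reports but cannot change status, add notes, or suspend a user.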

See Entitlements for how to grant these.

Related

Content Moderation — real-time screening of individual messages as they are sent
Histories — search and review conversations across your instances
Sessions — who signed in, from where, on what device
Users — membership status, suspension, and reinstatement
Entitlements — the institutions.* permissions behind this panel
