Sends the chunk text to the institution’s summary model for AI-powered cleanup. Removes noise (navigation, boilerplate, encoding artifacts), normalizes whitespace, and fixes broken formatting — without summarizing or shortening the content.
The sanitized text is returned for preview but not persisted. To save the
cleaned text, call PUT /api/user/embedding/{id} with the returned sanitizedText.
Token usage is automatically tallied to the parent Upload’s tokens_used counter.
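The preview-then-save flow described above can be sketched as follows. This is a minimal illustration, not the service's client: the host, the sanitize route (`POST /api/user/embedding/{id}/sanitize`), and the PUT body field name `text` are assumptions made for the example. Only the `x-access-token` header, the `PUT /api/user/embedding/{id}` route, and the `sanitizedText` response field come from this description.

```python
import json
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://example.invalid"                  # hypothetical host
SANITIZE_PATH = "/api/user/embedding/{id}/sanitize"   # assumed route, not stated in this description
SAVE_PATH = "/api/user/embedding/{id}"                # route named in this description


def build_request(method: str, path: str, token: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build a request that carries the JWT in the x-access-token header."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("x-access-token", token)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def http_send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def sanitize_and_save(chunk_id: str, token: str,
                      send: Callable[[urllib.request.Request], dict] = http_send,
                      approve: Callable[[str], bool] = lambda text: True) -> Optional[str]:
    """Request a sanitized preview, then persist it only if `approve` accepts it."""
    # Step 1: preview. Nothing is saved by this call.
    preview = send(build_request("POST", SANITIZE_PATH.format(id=chunk_id), token))
    cleaned = preview["sanitizedText"]
    if not approve(cleaned):
        return None
    # Step 2: persist. The "text" field name is an assumption for this sketch.
    send(build_request("PUT", SAVE_PATH.format(id=chunk_id), token, {"text": cleaned}))
    return cleaned
```

Keeping the save step behind an `approve` callback mirrors the preview-only design: a caller can show the cleaned text to a user, or diff it against the original, before anything is written.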
JWT passed in the x-access-token header
Embedding chunk ID to sanitize
Segment sanitized successfully (preview only — not saved)
true
The AI-cleaned segment text. Not persisted; call PUT /api/user/embedding/{id} to save it.
"This is the cleaned paragraph with noise removed and formatting normalized..."
Token count consumed by the sanitization LLM call (tallied to parent Upload)
142
The LLM model used for sanitization
"anthropic.claude-3-haiku-20250514-v1:0"