Migration: from OpenAI Moderation API
Per-category booleans → unified Passed/Corrected/Blocked.
Why move
The Moderation API is a content classifier. It labels content as hate, sexual, self-harm, violence, or harassment, and stops there. It takes no action on the result, keeps no audit trail, and covers only the categories OpenAI curates. Trinitite is a complete decision system for arbitrary policies: you train a Guardian against your rubric and get the verdict, the patch, and the ledger receipt.
Concept mapping
| OpenAI Moderation concept | Trinitite equivalent |
|---|---|
| Predefined categories (hate, sexual, …) | Trained Guardians on your policies |
| flagged: true / false | outcome: passed / corrected / blocked |
| category_scores.hate | Encapsulated in Guardian decision |
| No remediation | RFC 6902 JSON Patch |
| No audit | Glass Box Ledger receipt |
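
To make the mapping concrete, here is a minimal Python sketch of how calling code might branch on the three-way outcome where it previously checked a single `flagged` boolean. It is an illustration under stated assumptions, not Trinitite's client library: the `patch` field name, the `Handled` type, and both helper functions are hypothetical, and the patch helper supports only top-level RFC 6902 `replace` operations (no `~0`/`~1` escapes, no nested paths).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Handled:
    deliver: bool              # whether anything should be sent onward
    content: Optional[Any]     # the (possibly corrected) message, if delivered
    ledger_id: Optional[str]   # receipt identifier, when the response carries one


def apply_replace_ops(document: dict, patch: list) -> dict:
    """Apply top-level RFC 6902 'replace' operations to a copy of `document`.

    Illustrative subset only: other ops and nested paths are rejected.
    """
    result = dict(document)
    for op in patch:
        if op.get("op") != "replace":
            raise ValueError(f"unsupported op in this sketch: {op.get('op')!r}")
        path = op["path"]
        if not path.startswith("/") or "/" in path[1:]:
            raise ValueError(f"unsupported path in this sketch: {path!r}")
        key = path[1:]
        if key not in result:
            # RFC 6902 requires the target of 'replace' to exist.
            raise ValueError(f"path does not exist: {path!r}")
        result[key] = op["value"]
    return result


def handle_decision(message: dict, decision: dict) -> Handled:
    """Branch on outcome instead of a single flagged boolean."""
    outcome = decision["outcome"]
    ledger_id = decision.get("ledger_id")
    if outcome == "passed":
        return Handled(True, message, ledger_id)
    if outcome == "corrected":
        # ASSUMPTION: the JSON Patch arrives in a field named "patch".
        corrected = apply_replace_ops(message, decision["patch"])
        return Handled(True, corrected, ledger_id)
    if outcome == "blocked":
        return Handled(False, None, ledger_id)
    raise ValueError(f"unknown outcome: {outcome!r}")
```

The design point is that `corrected` is a third path with no Moderation API counterpart: the caller delivers a patched message rather than choosing between sending and dropping the original.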
API translation

    POST /v1/chat
    {
      "guardian": "tone-and-policy",
      "input": [{
        "role": "assistant",
        "content": "<ai output>"
      }]
    }

    ← 200 OK
    {
      "outcome": "blocked",
      "reason": "Output projects into Hate-speech sub-region",
      "policy_hash": "0xa83f...",
      "ledger_id": "lg_01HZ2T..."
    }

One decision. Custom rubric (built from your handbook, your MSA, your compliance docs). Auditable and replayable. Same surface for every policy domain, not just the OpenAI fixed set.
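
As a rough sketch of how the request and response shown above might be wrapped on the client side, the Python below builds the JSON body and parses the decision into a typed record. The helper names (`build_request_body`, `parse_decision`, `Decision`) are hypothetical, and the sketch deliberately leaves out the base URL, authentication, and transport, none of which this section specifies. Only the fields that appear in the example are modeled.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The three outcomes named in the concept mapping.
VALID_OUTCOMES = {"passed", "corrected", "blocked"}


@dataclass(frozen=True)
class Decision:
    outcome: str
    reason: Optional[str]
    policy_hash: Optional[str]
    ledger_id: Optional[str]


def build_request_body(guardian: str, ai_output: str) -> str:
    """Serialize the request body in the shape shown in the example."""
    payload = {
        "guardian": guardian,
        "input": [{"role": "assistant", "content": ai_output}],
    }
    return json.dumps(payload)


def parse_decision(raw: str) -> Decision:
    """Parse a response body, rejecting outcomes outside the documented set."""
    data = json.loads(raw)
    outcome = data.get("outcome")
    if outcome not in VALID_OUTCOMES:
        raise ValueError(f"unexpected outcome: {outcome!r}")
    return Decision(
        outcome=outcome,
        reason=data.get("reason"),
        policy_hash=data.get("policy_hash"),
        ledger_id=data.get("ledger_id"),
    )
```

Storing `policy_hash` and `ledger_id` next to your own request logs is one way to keep each decision traceable to its receipt.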