Migration: from OpenAI Moderation API

Per-category booleans → unified Passed/Corrected/Blocked.

Why move

The Moderation API is a content classifier. It reports "this is hate / sexual / self-harm / violence / harassment" and stops there: it takes no action on the result, provides no audit trail, and covers only the categories the OpenAI team curates. Trinitite is a complete decision system for arbitrary policies: you train a Guardian against your rubric and get the verdict, the patch, and the ledger receipt.

Concept mapping

OpenAI Moderation concept | Trinitite equivalent
Predefined categories (hate, sexual, …) | Trained Guardians on your policies
flagged: true / false | outcome: passed / corrected / blocked
category_scores.hate | Encapsulated in Guardian decision
No remediation | RFC 6902 JSON Patch
No audit | Glass Box Ledger receipt
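The remediation row refers to RFC 6902 JSON Patch, which describes changes to a JSON document as a list of operations. As a rough illustration of what applying such a patch involves, the sketch below implements only the add, replace, and remove operations in plain Python. The function names, the sample message, and the sample patch are hypothetical; the response field that carries the patch is not shown here, so the example assumes the patch has already been extracted as a list of operations.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def _resolve_parent(doc: Any, pointer: str) -> tuple[Any, str]:
    """Walk an RFC 6901 JSON Pointer; return (parent container, final token)."""
    if not pointer.startswith("/"):
        raise ValueError(f"unsupported pointer: {pointer!r}")
    # Unescape per RFC 6901: "~1" becomes "/", then "~0" becomes "~".
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]
    node = doc
    for token in tokens[:-1]:
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node, tokens[-1]


def apply_patch(doc: Any, patch: list[dict]) -> Any:
    """Apply add/replace/remove operations to a copy of doc (hypothetical helper)."""
    result = deepcopy(doc)  # leave the caller's document untouched
    for op in patch:
        parent, key = _resolve_parent(result, op["path"])
        kind = op["op"]
        if kind not in ("add", "replace", "remove"):
            raise ValueError(f"unsupported operation: {kind!r}")
        if isinstance(parent, list):
            # "-" addresses the position just past the end of the array.
            index = len(parent) if key == "-" else int(key)
            if kind == "add":
                parent.insert(index, op["value"])
            elif kind == "replace":
                parent[index] = op["value"]
            else:
                del parent[index]
        else:
            if kind == "replace" and key not in parent:
                raise KeyError(f"cannot replace missing member: {key!r}")
            if kind == "remove":
                del parent[key]
            else:
                parent[key] = op["value"]
    return result


# Hypothetical example: a patch that rewrites the content of one message.
message = {"role": "assistant", "content": "Your refund is guaranteed."}
patch = [{"op": "replace", "path": "/content",
          "value": "Refunds are subject to our refund policy."}]
corrected = apply_patch(message, patch)
```

The sketch omits the move, copy, and test operations that RFC 6902 also defines, so a maintained JSON Patch library is likely a better choice in production code.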

API translation

POST /v1/chat
{
  "guardian": "tone-and-policy",
  "input": [{
    "role": "assistant",
    "content": "<ai output>"
  }]
}

← 200 OK
{
  "outcome": "blocked",
  "reason": "Output projects into Hate-speech sub-region",
  "policy_hash": "0xa83f...",
  "ledger_id": "lg_01HZ2T..."
}
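On the calling side, the migration mostly means replacing a single `if flagged:` check with a three-way branch on `outcome`. The sketch below builds the request body shown above and decides what text, if any, to deliver. It makes no network call, since the host and authentication scheme are not given here. The function names `build_request` and `deliverable_text` are hypothetical, and it assumes the caller has already produced the corrected text when the outcome is `corrected`.

```python
from __future__ import annotations

import json


def build_request(guardian: str, ai_output: str) -> str:
    """Serialize the request body for POST /v1/chat (hypothetical helper)."""
    payload = {
        "guardian": guardian,
        "input": [{"role": "assistant", "content": ai_output}],
    }
    return json.dumps(payload)


def deliverable_text(decision: dict, original: str,
                     corrected: str | None = None) -> str | None:
    """Map the unified outcome to the text that may be shown to the user."""
    outcome = decision["outcome"]
    if outcome == "passed":
        return original  # nothing to change
    if outcome == "corrected":
        if corrected is None:
            raise ValueError("outcome is 'corrected' but no corrected text was supplied")
        return corrected
    if outcome == "blocked":
        return None  # deliver nothing; keep the ledger_id for the audit trail
    raise ValueError(f"unexpected outcome: {outcome!r}")


# Usage with the sample response from this section.
body = build_request("tone-and-policy", "<ai output>")
decision = json.loads(
    '{"outcome": "blocked",'
    ' "reason": "Output projects into Hate-speech sub-region",'
    ' "policy_hash": "0xa83f...",'
    ' "ledger_id": "lg_01HZ2T..."}'
)
text = deliverable_text(decision, "<ai output>")
```

Raising on an unrecognized outcome is a deliberate choice in this sketch: a value the caller does not understand is treated as an error instead of being silently passed through.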

One decision. Custom rubric (built from your handbook, your MSA, your compliance docs). Auditable and replayable. Same surface for every policy domain, not just OpenAI's fixed category set.