The Guardian Architecture
Civil Engineering for Cognition.
The transition from AI-as-publisher to AI-as-operator has fundamentally altered the liability surface of enterprise software. A language model that writes emails has a hallucination problem. A language model that executes SQL queries, initiates transfers, and triggers automations has a governance problem.
Trinitite's answer is structural: install a deterministic control layer between the probabilistic Actor and the execution environment. Not a filter. Not a prompt. A Guardian — built on the same engineering principles that stabilized aviation, finance, and operating systems over the last century.
The Platform at a Glance
Trinitite is not a single service. It is a stack of governed intermediaries — identity, data plane, intelligence, trust — that all share one Guardian evaluation kernel, one identity model, and one audit ledger. This is the map. Every box links to its own deep-dive.
The Trinitite Platform — 13 surfaces, one Guardian evaluation
Every surface shares the same Guardian evaluation kernel, the same identity model, and the same Glass Box Ledger. Click any box for the deep-dive.
The Three Outcomes
Every response from your AI passes through a Guardian. The Guardian makes exactly one decision:
This is not a content filter. The Guardian is a trained model that understands your policies geometrically — mapping output vectors against a Policy Manifold — and either passes, surgically repairs, or blocks the output before it reaches your infrastructure.
The Intercept Flow
The Guardian sits inline between your AI model and your application. It intercepts the raw output vector, evaluates it against the active Policy Manifold, applies Semantic Rectification if needed, and logs every decision to the Glass Box Ledger.
Your application receives one of three outcomes. The workflow continues. No re-generation. No human-in-the-loop for the common case.
Why Separation of Concerns
The failure of "Native Safety" — prompt engineering, RLHF guardrails, output filters — is not a code problem. It is a topology problem.
We are currently asking the same neural parameters to be both the Artist (creative, stochastic) and the Censor (restrictive, deterministic). These objectives are mathematically incompatible.
The Guardian Architecture enforces a strict bifurcation: the Actor is permitted to be creative and prone to failure. The Guardian is cold, binary, and deterministic. Decades of engineering precedent support exactly this pattern.
Batch-Invariant Determinism
The central failure mode of native safety under load is floating-point non-associativity.
Modern GPU inference engines dynamically change their reduction strategy based on server load — splitting Key-Value cache calculations differently at batch size 1 vs. batch size 128. This changes the accumulation order of floating-point operations, which cascades through Chain-of-Thought reasoning, causing the model's safety posture to drift.
The result: attack vectors that were blocked in the lab breach the system in production. Our validation data quantified this at 21.4% safety drift in production Thinking models.
The Guardian solves this by enforcing a Fixed-Size Split-KV Strategy: the tile size of the KV cache reduction is locked in software (e.g., 256 elements) regardless of batch size or hardware utilization. This forces the GPU to execute the exact same accumulation tree for request N whether it is the only request on the server or one of ten thousand.
Bitwise reproducibility is now an off-the-shelf commodity. Open-source inference engines (SGLang, vLLM) already support it via configuration flags. The failure to implement it is no longer a capability gap — it is a fiduciary choice to operate without available safety controls.
Native Safety:
Batch Size 1 → [A + B + C] = safe
Batch Size 128 → [C + A + B] = unsafe ← floating-point non-associativity
Guardian:
Batch Size 1 → Fixed tile → [A + B + C] = safe
Batch Size 128 → Fixed tile → [A + B + C] = safe ← 0.00% variance
Semantic Rectification
When an output vector falls in a Forbidden Zone, the Guardian does not block it by default — blocking causes workflow disruption. Instead, it calculates the Difference Vector required to shift the output to the nearest Safe Centroid in the Policy Manifold, and returns that as an RFC 6902 JSON Patch.
Semantic Rectification — Vector Space Projection
The Guardian does not ask a language model to "rewrite this safely." It maps the dangerous vector to the nearest pre-validated Safe Centroid — a deterministic geometric calculation that produces a unique, mathematically guaranteed result.
This is not "fancy regex." Regex looks for syntax (DROP TABLE). It fails against obfuscation (D_R_O_P T_A_B_L_E), semantic variation, or base64-encoded commands.
Rectification looks for semantic intent — vector space coordinates. If an attacker uses pig latin to request a database deletion, the embedding model maps "deletion" to the same vector coordinates regardless of syntax. The Guardian identifies the vector in the Destructive Zone and applies a transformation matrix to shift it into the Read-Only Zone. The resulting text is reconstructed from the safe vector.
The result: corrections handle intent (the "Why"), not just syntax (the "What").
The Safe Snap
The Guardian is not permitted to invent corrections. It can only snap to Pre-Validated Centroids — safe states that have already passed the Test-Driven Governance suite. This means every correction is a mathematically proven safe state, not a guess. The system collapses undefined behavior into defined, tested behavior.
The Glass Box Ledger
Every governance decision is written to an append-only, cryptographically chained ledger: the State-Tuple Ledger.
Each block captures: (timestamp, input_hash, policy_hash, outcome, corrections, governance_hash) — chained as H_n = Hash(H_{n-1} || S_n).
If a single byte of a log entry from three months ago is altered, the current block's hash fails validation. This guarantees non-repudiation: neither the enterprise nor its AI provider can deny an action that occurred.
Why this matters in court
In civil aviation, the National Transportation Safety Board distinguishes between Pilot Notes (mutable, subjective) and the Flight Data Recorder (objective, hardened). When the FDR data contradicts the pilot's testimony, the FDR wins.
Standard chat logs are Pilot Notes. The State-Tuple Ledger is the FDR. It records the vector state, the active policy hash, and the rectification delta. Without it, your defense relies on hearsay. With it, your evidence is science.
Forensic Replayability: because Guardians are batch-invariant, you can take any input vector from the log and replay the event with bitwise precision. This turns the platform into a flight simulator for debugging — rewind the tape, adjust the variables, and prove the fix works before redeployment.
Self-Hosted Deployment
Deployment stack
# docker-compose.yml
services:
control-plane:
image: trinitite/control-plane:latest
ports:
- "8080:8080"
environment:
- DB_TYPE=postgres
- LEDGER_ADAPTER=s3_worm
- LORA_STORAGE_ADAPTER=s3
governance:
image: trinitite/governance:latest
ports:
- "8000:8000"
environment:
- INFERENCE_ENGINE=sglang
- ENABLE_LORA=true
- ENABLE_DETERMINISTIC_INFERENCE=true
volumes:
- ./manifolds:/manifolds
- ./guardians:/guardians
The redirect is one environment variable:
# Before
OPENAI_BASE_URL=https://api.openai.com/v1
# After — route through the Guardian proxy
OPENAI_BASE_URL=http://localhost:8080/v1/proxy
Your application doesn't change. The platform intercepts all traffic, applies governance, and proxies the inference call to your configured backend.
Persistence adapters
The ledger backend is pluggable — swap it via environment variable with zero code changes:
| Tier | Backend | Use case |
|---|---|---|
| Standard | S3 Object Lock / WORM | Commercial durability, adverse-inference defense |
| Managed | Cloud KMS / HSM | Regulatory separation of duties |
| Sovereign | Hardware TEE (Nvidia Confidential Computing) | Nation-state non-repudiation |
| Edge | SQLite | Air-gapped / on-premise deployments |
Integration Patterns
Full Proxy
Redirect OPENAI_BASE_URL to the Trinitite proxy. It handles the full round-trip: intercept → inference → sanitize → return.
Your application receives clean JSON. No code changes required. The Guardian manages the complete chain of custody.
LIABILITY: Trinitite owns the sanitization.
Oracle Endpoint
Call POST /v1/chat with your AI's raw output and the Guardian to apply. You receive a verdict (passed, corrected, or blocked) and the JSON Patch to apply.
You control the inference and apply the corrections yourself. Useful for specialized architectures where proxying traffic isn't possible.
LIABILITY: Shifts to you if you ignore the patch.
Pattern A in practice
Application Guardian Proxy (your VPC)
│ │
│ POST /v1/proxy/chat/completions │
│ {model: "gpt-4o", ...} │
│ ───────────────────────────────► │
│ │ → intercept
│ │ → evaluate vector
│ │ → apply rectification
│ │ → log to ledger
│ │ → proxy to OpenAI
│ │ ← receive raw response
│ │ → re-evaluate response
│ ◄────────────────────────────── │
│ {clean, governed response} │
Pattern B in practice
Application Guardian (your VPC)
│ │
│ (your AI call, your code) │
│ raw_output = ai.complete(...) │
│ │
│ POST /v1/chat │
│ {guardian: "PII-Redactor", │
│ instructions: "...", │
│ input: [..., raw_output]} │
│ ───────────────────────────────► │
│ │
│ ◄────────────────────────────── │
│ {status: "corrected", │
│ corrections: [...]} │
│ │
│ apply(raw_output, corrections) │
Federated Defense
Federated Defense
Herd Immunity via LoRA Hot-Swaps
A monolithic safety model cannot simultaneously understand HIPAA compliance, SEC regulations, and polymorphic malware detection without catastrophic latency or "forgetting." Trinitite replaces the monolith with a swarm of specialized Guardians.
"An attack on one client strengthens the defenses of all clients."
LoRA architecture
Guardians use Low-Rank Adaptation (LoRA) to represent policies as lightweight tensor files — megabytes, not gigabytes. This enables:
- Per-request policy switching — HIPAA for one request, SOC 2 for the next, in the same batch
- Hot-swap updates — new policies applied in sub-millisecond pointer swaps, no restarts
- Non-destructive patching — extend an existing Guardian's capabilities without retraining from scratch
- Stacked policies — baseline universal safety + custom enterprise rules combined via vector summation
# Extend an existing Guardian with a new threat vector
from peft import PeftModel
model = PeftModel.from_pretrained(
base_model,
"./guardian-pii-v1.0",
is_trainable=True # unlock the LoRA weights for the patch
)
# Oracle-guided distillation on the new threat data
# → new weights saved to ./guardian-pii-v1.1
Test-Driven Governance
A Guardian's Policy Manifold is not static. It expands through Test-Driven Governance (TDG) — the application of software TDD principles to AI policy.
Every identified failure mode becomes a permanent constraint:
Red → New threat vector identified. Guardian does not block it.
Green → Vector ingested. Guardian trained. Test now passes.
Lock → That specific failure mode is mathematically impossible. Forever.
This creates a Safety Ratchet: the known liability surface only shrinks. It never expands.
Automated from existing assets
You don't start from scratch. Point the platform's ingestion adapter at your existing documentation, compliance policies, or incident logs:
- Drop a PDF — compliance policy, employee handbook, MSA
- The Teleological Engine extracts explicit constraints ("Section 4.2: no gifts over $50")
- Generates
nnumber of adversarial variations attempting to violate that rule - Trains a Guardian that blocks all of them
- Zero-touch deployment to your fleet
Your compliance documents become your enforcement physics.
Explore the Platform
Thirteen deep-dive pages cover every surface above. Each page ships with its own hand-drawn diagrams, request lifecycles, and forensic replay story.
MCP Governance
The Model Context Protocol (MCP) shifts the AI risk surface from text generation to tool execution. An agent that calls stripe.create_refund, postgres.query, or aws.iam.create_role isn't writing — it's acting. A single malformed argument, injected parameter, or misrouted intent is no longer an embarrassing output. It's a financial transaction, a database modification, or an infrastructure change.
Trinitite intercepts every MCP tool call before it reaches the transport layer — validating not just the schema but the semantic intent of the call. Wrong argument type, suspicious parameter value, malicious override attempt, or scope violation: the Guardian catches it, corrects what can be corrected, and blocks what cannot.
Two deployment topologies
Centralized Gateway
All MCP traffic is routed through an external Trinitite gateway at the network layer. The MCP Client sends JSON-RPC to the gateway, which validates and forwards compliant calls to the MCP Server.
TRADEOFF: One additional network hop. Maximum centralization.
Client-Side Middleware
The Guardian is embedded directly within the MCP Client's execution flow. Tool calls are intercepted before they leave the application memory space — no network hop, zero additional latency.
TRADEOFF: In-process. Deepest integration point.
Both patterns provide identical governance guarantees. The difference is where the intercept point lives — at the network edge (Gateway) or embedded in-process within the MCP Client (Middleware). For most deployments, Client-Side Middleware is recommended: no network hop, deepest integration, lowest latency.
Autocorrection in action
Tool Call Autocorrection Lifecycle
When the LLM outputs a syntactically or semantically invalid argument, the Guardian intercepts the payload before transport. A deterministic JSON Patch autocorrects the error — the validated request proceeds to the MCP Server without costly LLM re-generation.
This is Semantic Rectification applied to tool calls. The LLM outputs {"limit": "N/A"} — syntactically wrong, semantically ambiguous. The Guardian intercepts it, identifies the violation against the tool's schema, calculates the correct value, and issues a JSON Patch that replaces "N/A" with 100 before the call ever reaches the MCP Server. No re-generation. No workflow interruption. No user-facing error.
Per-tool-call Guardians
Every tool call gets its own specialist Guardian. Not a generic safety filter — a hyper-specific model trained on that exact API operation's schema, semantics, and threat surface.
Per-Tool-Call Guardian Architecture
Each tool call gets its own Guardian — a specialist trained on the exact schema, semantics, and threat surface of that specific API operation. The Base Guardian provides universal safety infrastructure (determinism, ledger, rectification); the Tool Guardian extends it with hyper-specific knowledge of what a valid Stripe refund looks like vs. a malicious one.
The architecture stacks in two layers:
Base Guardian — universal safety infrastructure shared across all tool calls: batch-invariant determinism, semantic rectification engine, Glass Box Ledger, LoRA hot-swap. This is the physics layer.
Tool Guardian — a specialist LoRA adapter trained on the specific tool. It knows what a valid stripe.create_refund looks like. It knows the difference between a legitimate postgres.query and a SQL injection attempt. It knows that aws.iam.create_role with a wildcard policy is suspicious regardless of how the LLM justified it.
The Teleological Data Generator creates thousands of adversarial variations per tool call — catching syntax errors, intent attacks, schema mismatches, privilege escalation attempts, and semantic misuse — all while remaining strictly compliant with the tool's underlying API schema. The result is a Guardian that's simultaneously permissive for legitimate use and deterministically blocking for everything outside the safe manifold.
Pre-built Guardians from the platform
Trinitite ships pre-built Guardians for the most common MCP integrations — ready to deploy, already hardened against known attack patterns for that service.
Amount limits, authorization checks, fraud intent detection.
SQL injection prevention, unbounded query limits, write-access enforcement.
Destructive action gating, secret exposure detection, repo scope enforcement.
Policy-compliant messaging, channel access control, PII in transit.
IAM boundary enforcement, resource tagging, blast-radius containment.
Schema-trained Guardians auto-generated from your OpenAPI spec or tool definition.
For any tool call not covered by a pre-built Guardian, the platform automatically generates the training data from your tool definition or OpenAPI spec, trains the Guardian, and adds it to your fleet. The same Teleological Data Generator that trains the base Guardian operates on every new tool schema — you get a hardened, schema-aware Guardian without writing a single training example manually.
Use Trinitite's pre-built Guardians for standard APIs and automatically-generated Guardians for your custom tools. Both sit on the same base architecture, ship via the same LoRA hot-swap mechanism, and write to the same Glass Box Ledger. Your entire MCP fleet — standard and custom — governed with one system.
What gets governed
Every MCP tool call passes through a Guardian before transport. For each call, the Guardian evaluates:
| Check | What it catches |
|---|---|
| Schema validation | Wrong types, missing required fields, malformed values |
| Semantic intent | Calls that are syntactically valid but semantically dangerous (e.g., DELETE disguised as a read operation) |
| Argument injection | Prompt-injected values in parameters attempting to override system behavior |
| Scope enforcement | Calls that exceed the authorized scope for the current session, NHI, or user role |
| Pattern matching | Known attack signatures from the Trinitite threat intelligence network |
The outcome is the same three states as the base Guardian — Passed, Corrected, or Blocked — with a full forensic record in the Glass Box Ledger for every decision.
Architecture Summary
| Layer | Component | Role |
|---|---|---|
| Inference | Batch-Invariant Kernel | Eliminates floating-point drift across load |
| Control | Policy Manifold | Geometric definition of safe/unsafe vector space |
| Correction | Semantic Rectifier | Projects unsafe vectors to nearest Safe Centroid |
| Tool Calls | Per-Tool Guardian | Schema-trained specialist per MCP tool operation |
| Generation | Teleological Data Generator | Auto-synthesizes adversarial variations per tool schema |
| Distribution | LoRA Hot-Swap | Per-request policy, zero-downtime updates |
| Audit | Glass Box Ledger | Cryptographic, forensic, Daubert-admissible |
| Immunity | Federated Defense | Fleet-wide vaccination from single threat discoveries |
Self-hosted. Container-native. Engine-agnostic. Trinitite secures a model running on vLLM, a proprietary agent on SGLang, or your own inference stack — provided the underlying engine supports deterministic execution.
Every Surface, One Guardian
The thirteen deep-dives above are not thirteen separate products. They are thirteen views into one system.
A policy written once, a Guardian trained once, and a ledger entry written once — visible in the proxy, the MCP gateway, the CLI firewall, the compliance export, and the public verification path, simultaneously. That property is what makes Trinitite auditable end-to-end.
Next Steps
→ Authentication — Get your API key
→ Chat (Guardian Mode) — Send output, receive verdict
→ Guardians API — Create and manage Guardians
→ MCP Gateway — Govern Model Context Protocol tool calls