The Guardian Architecture

Civil Engineering for Cognition.

The transition from AI-as-publisher to AI-as-operator has fundamentally altered the liability surface of enterprise software. A language model that writes emails has a hallucination problem. A language model that executes SQL queries, initiates transfers, and triggers automations has a governance problem.

Trinitite's answer is structural: install a deterministic control layer between the probabilistic Actor and the execution environment. Not a filter. Not a prompt. A Guardian — built on the same engineering principles that stabilized aviation, finance, and operating systems over the last century.

Safety Variance

Industry Drift

Audit Coverage

<0ms

Governance Latency

The Platform at a Glance

Trinitite is not a single service. It is a stack of governed intermediaries — identity, data plane, intelligence, trust — that all share one Guardian evaluation kernel, one identity model, and one audit ledger. This is the map. Every box links to its own deep-dive.

The Trinitite Platform — 13 surfaces, one Guardian evaluation

Every surface shares the same Guardian evaluation kernel, the same identity model, and the same Glass Box Ledger. Click any box for the deep-dive.

The Three Outcomes

Every response from your AI passes through a Guardian. The Guardian makes exactly one decision:

✓ Passed

Output complies with all policy constraints. Returned unchanged with a governance receipt. Zero disruption to the workflow.

⚡ Corrected

Output violated a policy but was safely repairable. An RFC 6902 JSON Patch is returned. Apply it to the last message — no re-generation required.

✗ Blocked

Output contained a critical violation that could not be safely corrected. A 403 is returned with a full forensic record. Do not use the original output.

This is not a content filter. The Guardian is a trained model that understands your policies geometrically — mapping output vectors against a Policy Manifold — and either passes, surgically repairs, or blocks the output before it reaches your infrastructure.

The Intercept Flow

Passed — output is compliant

Corrected — policy violation fixed in-flight

Blocked — critical violation, request denied

The Guardian sits inline between your AI model and your application. It intercepts the raw output vector, evaluates it against the active Policy Manifold, applies Semantic Rectification if needed, and logs every decision to the Glass Box Ledger.

Your application receives one of three outcomes. The workflow continues. No re-generation. No human-in-the-loop for the common case.

Why Separation of Concerns

The failure of "Native Safety" — prompt engineering, RLHF guardrails, output filters — is not a code problem. It is a topology problem.

We are currently asking the same neural parameters to be both the Artist (creative, stochastic) and the Censor (restrictive, deterministic). These objectives are mathematically incompatible.

The Guardian Architecture enforces a strict bifurcation: the Actor is permitted to be creative and prone to failure. The Guardian is cold, binary, and deterministic. Decades of engineering precedent support exactly this pattern.

Operating Systems

Kernel Space vs. User Space

The AI model is User Space — creative, stochastic, and permitted to fail. The Guardian is Kernel Space — privileged, deterministic, and hardware-enforced. Just as a user process cannot overwrite kernel memory, a probabilistic Actor cannot execute a business violation through the Guardian.

Distributed Systems

The Cognitive Sidecar

Kubernetes solved unreliable services with a Service Mesh — a sidecar proxy (Envoy, Istio) that enforces traffic policy regardless of the application. The Guardian is the Cognitive Sidecar. It decouples compliance logic from model weights, letting you swap GPT-5 for Claude without rewriting your safety architecture.

Aviation

Flight Envelope Protection

On an Airbus A320, the pilot requests a climb. The Flight Control Computer refuses if it would cause a stall — regardless of pilot intent. Since "Hard Envelope Protection" was introduced, hull loss rates dropped 4×. The Guardian applies the same principle: the enterprise sets the speed limit, the physics enforce it.

Finance

SEC Rule 15c3-5

Knight Capital lost $440M in 45 minutes in 2012 when their trading algorithm went rogue. The SEC's response: mandate a risk management layer architecturally distinct from the execution layer. Modern pre-trade risk checks add 2–10µs. The market accepted that "Autonomy Tax." So should the enterprise.

Domain-Driven Design

The Anti-Corruption Layer

In DDD, an Anti-Corruption Layer (ACL) shields a clean, high-integrity system from a legacy system. The Guardian is the ACL for cognition. It translates the probabilistic, hallucination-prone output of the model into the strict, type-safe schema required by your banking core, ERP, or downstream API.

Neurobiology

GABAergic Inhibitory Control

The human brain is 80% excitatory neurons and 20% inhibitory (GABAergic). Inhibition is not a constraint on intelligence — it's what makes intelligence coherent. Without it, the result isn't a genius; it's a seizure. A Guardian is biomimetic: the sparse, specialized inhibitory layer that makes generative reasoning safe.

Batch-Invariant Determinism

The central failure mode of native safety under load is floating-point non-associativity.

Modern GPU inference engines dynamically change their reduction strategy based on server load — splitting Key-Value cache calculations differently at batch size 1 vs. batch size 128. This changes the accumulation order of floating-point operations, which cascades through Chain-of-Thought reasoning, causing the model's safety posture to drift.

The result: attack vectors that were blocked in the lab breach the system in production. Our validation data quantified this at 21.4% safety drift in production Thinking models.

The Guardian solves this by enforcing a Fixed-Size Split-KV Strategy: the tile size of the KV cache reduction is locked in software (e.g., 256 elements) regardless of batch size or hardware utilization. This forces the GPU to execute the exact same accumulation tree for request N whether it is the only request on the server or one of ten thousand.

The Engineering Implication

Bitwise reproducibility is now an off-the-shelf commodity. Open-source inference engines (SGLang, vLLM) already support it via configuration flags. The failure to implement it is no longer a capability gap — it is a fiduciary choice to operate without available safety controls.

Native Safety:
  Batch Size 1   → [A + B + C] = safe
  Batch Size 128 → [C + A + B] = unsafe   ← floating-point non-associativity

Guardian:
  Batch Size 1   → Fixed tile → [A + B + C] = safe
  Batch Size 128 → Fixed tile → [A + B + C] = safe   ← 0.00% variance

Semantic Rectification

When an output vector falls in a Forbidden Zone, the Guardian does not block it by default — blocking causes workflow disruption. Instead, it calculates the Difference Vector required to shift the output to the nearest Safe Centroid in the Policy Manifold, and returns that as an RFC 6902 JSON Patch.

Semantic Rectification — Vector Space Projection

The Guardian does not ask a language model to "rewrite this safely." It maps the dangerous vector to the nearest pre-validated Safe Centroid — a deterministic geometric calculation that produces a unique, mathematically guaranteed result.

This is not "fancy regex." Regex looks for syntax (DROP TABLE). It fails against obfuscation (D_R_O_P T_A_B_L_E), semantic variation, or base64-encoded commands.

Rectification looks for semantic intent — vector space coordinates. If an attacker uses pig latin to request a database deletion, the embedding model maps "deletion" to the same vector coordinates regardless of syntax. The Guardian identifies the vector in the Destructive Zone and applies a transformation matrix to shift it into the Read-Only Zone. The resulting text is reconstructed from the safe vector.

The result: corrections handle intent (the "Why"), not just syntax (the "What").

The Safe Snap

The Guardian is not permitted to invent corrections. It can only snap to Pre-Validated Centroids — safe states that have already passed the Test-Driven Governance suite. This means every correction is a mathematically proven safe state, not a guess. The system collapses undefined behavior into defined, tested behavior.

The Glass Box Ledger

Every governance decision is written to an append-only, cryptographically chained ledger: the State-Tuple Ledger.

Each block captures: (timestamp, input_hash, policy_hash, outcome, corrections, governance_hash) — chained as H_n = Hash(H_{n-1} || S_n).

If a single byte of a log entry from three months ago is altered, the current block's hash fails validation. This guarantees non-repudiation: neither the enterprise nor its AI provider can deny an action that occurred.

Why this matters in court

In civil aviation, the National Transportation Safety Board distinguishes between Pilot Notes (mutable, subjective) and the Flight Data Recorder (objective, hardened). When the FDR data contradicts the pilot's testimony, the FDR wins.

Standard chat logs are Pilot Notes. The State-Tuple Ledger is the FDR. It records the vector state, the active policy hash, and the rectification delta. Without it, your defense relies on hearsay. With it, your evidence is science.

Forensic Replayability: because Guardians are batch-invariant, you can take any input vector from the log and replay the event with bitwise precision. This turns the platform into a flight simulator for debugging — rewind the tape, adjust the variables, and prove the fix works before redeployment.

Self-Hosted Deployment

Deployment stack

# docker-compose.yml
services:
  control-plane:
    image: trinitite/control-plane:latest
    ports:
      - "8080:8080"
    environment:
      - DB_TYPE=postgres
      - LEDGER_ADAPTER=s3_worm
      - LORA_STORAGE_ADAPTER=s3
  governance:
    image: trinitite/governance:latest
    ports:
      - "8000:8000"
    environment:
      - INFERENCE_ENGINE=sglang
      - ENABLE_LORA=true
      - ENABLE_DETERMINISTIC_INFERENCE=true
    volumes:
      - ./manifolds:/manifolds
      - ./guardians:/guardians

The redirect is one environment variable:

# Before
OPENAI_BASE_URL=https://api.openai.com/v1

# After — route through the Guardian proxy
OPENAI_BASE_URL=http://localhost:8080/v1/proxy

Your application doesn't change. The platform intercepts all traffic, applies governance, and proxies the inference call to your configured backend.

Persistence adapters

The ledger backend is pluggable — swap it via environment variable with zero code changes:

Tier	Backend	Use case
Standard	S3 Object Lock / WORM	Commercial durability, adverse-inference defense
Managed	Cloud KMS / HSM	Regulatory separation of duties
Sovereign	Hardware TEE (Nvidia Confidential Computing)	Nation-state non-repudiation
Edge	SQLite	Air-gapped / on-premise deployments

Integration Patterns

Pattern A — Recommended

Full Proxy

Redirect OPENAI_BASE_URL to the Trinitite proxy. It handles the full round-trip: intercept → inference → sanitize → return.

Your application receives clean JSON. No code changes required. The Guardian manages the complete chain of custody.

LIABILITY: Trinitite owns the sanitization.

Pattern B — Oracle Mode

Oracle Endpoint

Call POST /v1/chat with your AI's raw output and the Guardian to apply. You receive a verdict (passed, corrected, or blocked) and the JSON Patch to apply.

You control the inference and apply the corrections yourself. Useful for specialized architectures where proxying traffic isn't possible.

LIABILITY: Shifts to you if you ignore the patch.

Pattern A in practice

Application                    Guardian Proxy (your VPC)
    │                                  │
    │  POST /v1/proxy/chat/completions │
    │  {model: "gpt-4o", ...}          │
    │ ───────────────────────────────► │
    │                                  │  → intercept
    │                                  │  → evaluate vector
    │                                  │  → apply rectification
    │                                  │  → log to ledger
    │                                  │  → proxy to OpenAI
    │                                  │  ← receive raw response
    │                                  │  → re-evaluate response
    │  ◄────────────────────────────── │
    │  {clean, governed response}      │

Pattern B in practice

Application                    Guardian (your VPC)
    │                                  │
    │  (your AI call, your code)       │
    │  raw_output = ai.complete(...)   │
    │                                  │
    │  POST /v1/chat                   │
    │  {guardian: "PII-Redactor",      │
    │   instructions: "...",           │
    │   input: [..., raw_output]}      │
    │ ───────────────────────────────► │
    │                                  │
    │  ◄────────────────────────────── │
    │  {status: "corrected",           │
    │   corrections: [...]}            │
    │                                  │
    │  apply(raw_output, corrections)  │

Federated Defense

Herd Immunity via LoRA Hot-Swaps

A monolithic safety model cannot simultaneously understand HIPAA compliance, SEC regulations, and polymorphic malware detection without catastrophic latency or "forgetting." Trinitite replaces the monolith with a swarm of specialized Guardians.

STEP 01

Zero-Day Detected

A Guardian-protected node detects a novel attack vector. The vector is immediately isolated.

STEP 02

Micro-LoRA Minted

A lightweight LoRA adapter is trained on the threat — typically megabytes, trained in minutes.

STEP 03

Fleet Vaccinated

The signed adapter is hot-swapped into every Guardian globally. No restarts. No downtime.

STEP 04

Risk → Zero

That specific attack class drops from a non-zero probability to mathematically impossible.

"An attack on one client strengthens the defenses of all clients."

LoRA architecture

Guardians use Low-Rank Adaptation (LoRA) to represent policies as lightweight tensor files — megabytes, not gigabytes. This enables:

Per-request policy switching — HIPAA for one request, SOC 2 for the next, in the same batch
Hot-swap updates — new policies applied in sub-millisecond pointer swaps, no restarts
Non-destructive patching — extend an existing Guardian's capabilities without retraining from scratch
Stacked policies — baseline universal safety + custom enterprise rules combined via vector summation

# Extend an existing Guardian with a new threat vector
from peft import PeftModel

model = PeftModel.from_pretrained(
    base_model,
    "./guardian-pii-v1.0",
    is_trainable=True   # unlock the LoRA weights for the patch
)

# Oracle-guided distillation on the new threat data
# → new weights saved to ./guardian-pii-v1.1

Test-Driven Governance

A Guardian's Policy Manifold is not static. It expands through Test-Driven Governance (TDG) — the application of software TDD principles to AI policy.

Every identified failure mode becomes a permanent constraint:

Red   → New threat vector identified. Guardian does not block it.
Green → Vector ingested. Guardian trained. Test now passes.
Lock  → That specific failure mode is mathematically impossible. Forever.

This creates a Safety Ratchet: the known liability surface only shrinks. It never expands.

Automated from existing assets

You don't start from scratch. Point the platform's ingestion adapter at your existing documentation, compliance policies, or incident logs:

Drop a PDF — compliance policy, employee handbook, MSA
The Teleological Engine extracts explicit constraints ("Section 4.2: no gifts over $50")
Generates n number of adversarial variations attempting to violate that rule
Trains a Guardian that blocks all of them
Zero-touch deployment to your fleet

Your compliance documents become your enforcement physics.

Explore the Platform

Thirteen deep-dive pages cover every surface above. Each page ships with its own hand-drawn diagrams, request lifecycles, and forensic replay story.

Identity & Governance

Identity & RBAC

Humans · API keys · NHIs

Read the deep-dive →

Identity & Governance

NHI Governance

Federation · tiers · bindings

Read the deep-dive →

Identity & Governance

Policy Intelligence

Docs → knowledge → Guardian

Read the deep-dive →

Data Plane

LLM Proxy

OpenAI · Anthropic · Azure

Read the deep-dive →

Data Plane

MCP Gateway

Tool calls · per-tool certification

Read the deep-dive →

Data Plane

CLI Firewall

Agentic command control

Read the deep-dive →

Data Plane

Skill Vault

Signed SKILL.md registry

Read the deep-dive →

Intelligence

Guardian Training

Oracle-guided · Lipschitz-bound

Read the deep-dive →

Intelligence

Testing & Simulation

Scenarios · TDG · replay

Read the deep-dive →

Trust Layer

Glass Box Ledger

Bit-exact replay · Merkle chain

Read the deep-dive →

Trust Layer

Observability

Ops · security · audit streams

Read the deep-dive →

Trust Layer

Compliance Architecture

EU AI Act · NIST · ISO 42001

Read the deep-dive →

Trust Layer

Enterprise Reporting

Semantic layer · 60 reports

Read the deep-dive →

MCP Governance

The Model Context Protocol (MCP) shifts the AI risk surface from text generation to tool execution. An agent that calls stripe.create_refund, postgres.query, or aws.iam.create_role isn't writing — it's acting. A single malformed argument, injected parameter, or misrouted intent is no longer an embarrassing output. It's a financial transaction, a database modification, or an infrastructure change.

Trinitite intercepts every MCP tool call before it reaches the transport layer — validating not just the schema but the semantic intent of the call. Wrong argument type, suspicious parameter value, malicious override attempt, or scope violation: the Guardian catches it, corrects what can be corrected, and blocks what cannot.

Two deployment topologies

Pattern A — Network Proxy

Centralized Gateway

All MCP traffic is routed through an external Trinitite gateway at the network layer. The MCP Client sends JSON-RPC to the gateway, which validates and forwards compliant calls to the MCP Server.

TRADEOFF: One additional network hop. Maximum centralization.

Pattern B — Recommended

Client-Side Middleware

The Guardian is embedded directly within the MCP Client's execution flow. Tool calls are intercepted before they leave the application memory space — no network hop, zero additional latency.

TRADEOFF: In-process. Deepest integration point.

Both patterns provide identical governance guarantees. The difference is where the intercept point lives — at the network edge (Gateway) or embedded in-process within the MCP Client (Middleware). For most deployments, Client-Side Middleware is recommended: no network hop, deepest integration, lowest latency.

Autocorrection in action

Tool Call Autocorrection Lifecycle

When the LLM outputs a syntactically or semantically invalid argument, the Guardian intercepts the payload before transport. A deterministic JSON Patch autocorrects the error — the validated request proceeds to the MCP Server without costly LLM re-generation.

This is Semantic Rectification applied to tool calls. The LLM outputs {"limit": "N/A"} — syntactically wrong, semantically ambiguous. The Guardian intercepts it, identifies the violation against the tool's schema, calculates the correct value, and issues a JSON Patch that replaces "N/A" with 100 before the call ever reaches the MCP Server. No re-generation. No workflow interruption. No user-facing error.

Per-tool-call Guardians

Every tool call gets its own specialist Guardian. Not a generic safety filter — a hyper-specific model trained on that exact API operation's schema, semantics, and threat surface.

Per-Tool-Call Guardian Architecture

Each tool call gets its own Guardian — a specialist trained on the exact schema, semantics, and threat surface of that specific API operation. The Base Guardian provides universal safety infrastructure (determinism, ledger, rectification); the Tool Guardian extends it with hyper-specific knowledge of what a valid Stripe refund looks like vs. a malicious one.

The architecture stacks in two layers:

Base Guardian — universal safety infrastructure shared across all tool calls: batch-invariant determinism, semantic rectification engine, Glass Box Ledger, LoRA hot-swap. This is the physics layer.

Tool Guardian — a specialist LoRA adapter trained on the specific tool. It knows what a valid stripe.create_refund looks like. It knows the difference between a legitimate postgres.query and a SQL injection attempt. It knows that aws.iam.create_role with a wildcard policy is suspicious regardless of how the LLM justified it.

The Teleological Data Generator creates thousands of adversarial variations per tool call — catching syntax errors, intent attacks, schema mismatches, privilege escalation attempts, and semantic misuse — all while remaining strictly compliant with the tool's underlying API schema. The result is a Guardian that's simultaneously permissive for legitimate use and deterministically blocking for everything outside the safe manifold.

Pre-built Guardians from the platform

Trinitite ships pre-built Guardians for the most common MCP integrations — ready to deploy, already hardened against known attack patterns for that service.

stripe

Payments & Billing

stripe.create_refundstripe.create_chargestripe.update_subscription

Amount limits, authorization checks, fraud intent detection.

postgres

Database

postgres.querypostgres.executepostgres.insert

SQL injection prevention, unbounded query limits, write-access enforcement.

github

Code & Repos

github.create_filegithub.delete_branchgithub.merge_pr

Destructive action gating, secret exposure detection, repo scope enforcement.

slack

Messaging

slack.post_messageslack.invite_userslack.delete_message

Policy-compliant messaging, channel access control, PII in transit.

aws

Cloud Infrastructure

aws.s3.put_objectaws.iam.create_roleaws.ec2.terminate

IAM boundary enforcement, resource tagging, blast-radius containment.

custom

Any MCP Server

your_api.any_toolyour_db.any_queryyour_service.*

Schema-trained Guardians auto-generated from your OpenAPI spec or tool definition.

For any tool call not covered by a pre-built Guardian, the platform automatically generates the training data from your tool definition or OpenAPI spec, trains the Guardian, and adds it to your fleet. The same Teleological Data Generator that trains the base Guardian operates on every new tool schema — you get a hardened, schema-aware Guardian without writing a single training example manually.

Pre-built + custom = best of both

Use Trinitite's pre-built Guardians for standard APIs and automatically-generated Guardians for your custom tools. Both sit on the same base architecture, ship via the same LoRA hot-swap mechanism, and write to the same Glass Box Ledger. Your entire MCP fleet — standard and custom — governed with one system.

What gets governed

Every MCP tool call passes through a Guardian before transport. For each call, the Guardian evaluates:

Check	What it catches
Schema validation	Wrong types, missing required fields, malformed values
Semantic intent	Calls that are syntactically valid but semantically dangerous (e.g., `DELETE` disguised as a read operation)
Argument injection	Prompt-injected values in parameters attempting to override system behavior
Scope enforcement	Calls that exceed the authorized scope for the current session, NHI, or user role
Pattern matching	Known attack signatures from the Trinitite threat intelligence network

The outcome is the same three states as the base Guardian — Passed, Corrected, or Blocked — with a full forensic record in the Glass Box Ledger for every decision.

Architecture Summary

Layer	Component	Role
Inference	Batch-Invariant Kernel	Eliminates floating-point drift across load
Control	Policy Manifold	Geometric definition of safe/unsafe vector space
Correction	Semantic Rectifier	Projects unsafe vectors to nearest Safe Centroid
Tool Calls	Per-Tool Guardian	Schema-trained specialist per MCP tool operation
Generation	Teleological Data Generator	Auto-synthesizes adversarial variations per tool schema
Distribution	LoRA Hot-Swap	Per-request policy, zero-downtime updates
Audit	Glass Box Ledger	Cryptographic, forensic, Daubert-admissible
Immunity	Federated Defense	Fleet-wide vaccination from single threat discoveries

Self-hosted. Container-native. Engine-agnostic. Trinitite secures a model running on vLLM, a proprietary agent on SGLang, or your own inference stack — provided the underlying engine supports deterministic execution.

Every Surface, One Guardian

The thirteen deep-dives above are not thirteen separate products. They are thirteen views into one system.

One identity model

Humans, API keys, and NHIs — all authenticated through the same contract, scoped through the same RBAC, and resolved through the same permission tree regardless of which surface they hit.

One Guardian evaluation

The LLM Proxy, MCP Gateway, CLI Firewall, and Skill Vault all route through the same batch-invariant, LoRA-hot-swappable Guardian kernel. Train once, enforce everywhere.

One audit ledger

Every verdict on every surface — tool call, chat output, CLI command, skill load — appends to the same Glass Box Ledger with the same Merkle chain and the same external trust anchors.

A policy written once, a Guardian trained once, and a ledger entry written once — visible in the proxy, the MCP gateway, the CLI firewall, the compliance export, and the public verification path, simultaneously. That property is what makes Trinitite auditable end-to-end.

Next Steps

→ Authentication — Get your API key

→ Chat (Guardian Mode) — Send output, receive verdict

→ Guardians API — Create and manage Guardians

→ MCP Gateway — Govern Model Context Protocol tool calls

The Platform at a Glance​

The Three Outcomes​

The Intercept Flow​

Why Separation of Concerns​

Batch-Invariant Determinism​

Semantic Rectification​

The Safe Snap​

The Glass Box Ledger​

Why this matters in court​

Self-Hosted Deployment​

Deployment stack​

Persistence adapters​

Integration Patterns​

Full Proxy

Oracle Endpoint

Pattern A in practice​

Pattern B in practice​

Federated Defense​

Herd Immunity via LoRA Hot-Swaps

LoRA architecture​

Test-Driven Governance​

Automated from existing assets​

Explore the Platform​

MCP Governance​

Two deployment topologies​