Guardian Training

Training with receipts. Adapter hashes that auditors can re-verify. Spectral bounds that regulators can re-check. A research loop that runs unattended while your Guardians get smarter on their own.

Fine-tuning a governance model is still a cottage industry in most organizations — a handful of engineers, a spreadsheet of hyperparameters, an overnight job, and a stack of charts nobody can reproduce six months later. Trinitite replaces that with one training stack: oracle-guided reverse-KL distillation, content-addressed LoRA storage with DAG lineage, cryptographically-bounded updates, and an autonomous research loop.

Oracle-Guided Distillation with Reverse-KL

Governance isn't a creative task. It's the opposite: given a policy violation, there is one correct answer (the policy-compliant verdict or rectified output). Trinitite's training loss reflects that.

Forward-KL, KL(teacher ‖ student), the industry default for distillation, is mode-covering: it is heavily penalized wherever the teacher has mass and the student does not, so it spreads student probability mass across every plausible teacher answer. That's the right loss when you want a student that's broadly capable and creative. It is the wrong loss when you want a Guardian that only produces safe outputs. Under forward-KL, whatever mass the teacher places on an unsafe mode, the student is pushed to cover too, so some probability mass always bleeds into it.

Reverse-KL, KL(student ‖ teacher), is mode-seeking: it penalizes the student for placing mass where the teacher has little, which pushes the student onto a single mode of the teacher's distribution and aggressively penalizes hedging. When the oracle teacher defines the safe mode, reverse-KL produces a Guardian that commits to that mode. Unsafe probability mass is not just small; it's actively minimized.

For governance, this is the right loss.
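To make the mode-covering versus mode-seeking contrast concrete, here is a toy sketch, not Trinitite's trainer: a single-Gaussian "student" is fit by grid search to a bimodal "teacher" under each divergence. The teacher, its 70/30 mode weights, the discretization, and the search grids are illustrative assumptions. Forward-KL should land between the two modes with a wide spread; reverse-KL should lock onto the heavier mode with a narrow one.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normalize(weights: list) -> list:
    total = sum(weights)
    return [w / total for w in weights]

def kl(p: list, q: list, floor: float = 1e-300) -> float:
    # KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0.
    # The floor turns log(p_i / 0) into a very large finite penalty instead of an error.
    return sum(pi * math.log(pi / max(qi, floor)) for pi, qi in zip(p, q) if pi > 0)

# Discretize the real line on [-8, 8] with step 0.1.
GRID = [-8.0 + 0.1 * i for i in range(161)]

# Illustrative teacher: 70% of its mass on a mode at -2, 30% on a mode at +2.
TEACHER = normalize([0.7 * normal_pdf(x, -2.0, 0.5) + 0.3 * normal_pdf(x, 2.0, 0.5) for x in GRID])

def student(mu: float, sigma: float) -> list:
    return normalize([normal_pdf(x, mu, sigma) for x in GRID])

def best_fit(divergence) -> tuple:
    # Grid search over student mean in [-3, 3] and standard deviation in [0.2, 3.0].
    candidates = [(-3.0 + 0.1 * i, 0.2 + 0.1 * j) for i in range(61) for j in range(29)]
    return min(candidates, key=lambda c: divergence(student(*c)))

fwd_mu, fwd_sigma = best_fit(lambda q: kl(TEACHER, q))  # forward-KL: KL(teacher || student)
rev_mu, rev_sigma = best_fit(lambda q: kl(q, TEACHER))  # reverse-KL: KL(student || teacher)
print(f"forward-KL fit: mu={fwd_mu:.1f}, sigma={fwd_sigma:.1f}")
print(f"reverse-KL fit: mu={rev_mu:.1f}, sigma={rev_sigma:.1f}")
```

For a Gaussian student, the forward-KL optimum is the one that matches the teacher's mean and variance (here roughly -0.8 and 1.9), which spreads mass over both modes and the gap between them. The reverse-KL optimum sits on the mode at -2 with a spread close to that mode's own.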

Content-Addressed Adapter Storage

Every adapter version is keyed by:

adapter_hash = sha256(weights_bytes || recipe_json)

Each adapter record also carries a parent_hash pointing at the adapter it was trained from; those edges form the experiment DAG. Three consequences follow:

Identical training produces identical hashes. Cache hits are real. If you've trained this exact recipe on this exact data before, the system knows.
Any drift is immediately visible. Change a weight or a knob and the hash changes, regardless of filesystem path, storage backend, or filename convention.
Lineage is a graph. "What adapter was trained on top of what" is traceable as a DAG — not inferred from a folder tree. Branching experiments, abandoned paths, merged revivals, all navigable.

This is the mathematical companion to bit-exact inference replay. You can identify an adapter, you can verify an adapter, you can reproduce an adapter.
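A minimal sketch of the keying scheme, assuming the recipe is serialized as canonical JSON (sorted keys, compact separators) so that logically identical recipes hash identically. The canonicalization, the `sha256:` prefix (borrowed from the retrain response example), and the `AdapterNode` and `lineage` helpers are illustrative assumptions, not Trinitite's storage code.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional

def canonical_recipe_json(recipe: dict) -> bytes:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    return json.dumps(recipe, sort_keys=True, separators=(",", ":")).encode("utf-8")

def adapter_hash(weights_bytes: bytes, recipe: dict) -> str:
    # sha256(weights_bytes || recipe_json), where || is plain byte concatenation.
    return "sha256:" + hashlib.sha256(weights_bytes + canonical_recipe_json(recipe)).hexdigest()

@dataclass(frozen=True)
class AdapterNode:  # hypothetical DAG record
    adapter_hash: str
    parent_hash: Optional[str]  # None for a root adapter

def lineage(store: Dict[str, AdapterNode], h: str) -> List[str]:
    # Walk parent_hash edges from an adapter back to its root.
    chain: List[str] = []
    current: Optional[str] = h
    while current is not None:
        chain.append(current)
        current = store[current].parent_hash
    return chain

# Usage: a child adapter trained on top of a base adapter (toy weight bytes).
base = adapter_hash(b"\x00" * 16, {"lora_rank": 8})
child = adapter_hash(b"\x01" * 16, {"lora_rank": 16, "reverse_kl_beta": 0.08})
store = {base: AdapterNode(base, None), child: AdapterNode(child, base)}
print(lineage(store, child))
```

Because the key depends only on bytes, the same weights and recipe produce the same hash on any machine or storage backend. A production scheme might also length-prefix the weights so that no split of the concatenated bytes between the two fields can yield the same digest.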

The Recipe — One JSON, ~70 Knobs

All training configuration (LoRA rank/alpha, reverse-KL beta, samples per prompt, wall-clock budget, warmup/warmdown fractions, short-rollout caps, fast-fail guards, tiered-eval suites, optimizer choice, DAgger recovery, teacher list) lives in one dataclass, Recipe, which is passed around the system as a single object.

The autonomous research agent, a human operator, or a customer calling POST /v1/training/retrain all touch the same surface. There is no "the API has 6 knobs but the trainer has 70" silent-drop bug. Unknown keys are logged, not silently dropped.
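A sketch of the "one object, unknown keys logged" contract, using a Python dataclass with only a handful of illustrative fields. `lora_rank` and `reverse_kl_beta` appear in the retrain API example; the other field names, every default value, and the `apply_override` helper are hypothetical.

```python
import dataclasses
import logging
from dataclasses import dataclass

log = logging.getLogger("recipe")

@dataclass(frozen=True)
class Recipe:
    # Illustrative subset of knobs; names beyond lora_rank / reverse_kl_beta
    # and all defaults are hypothetical.
    lora_rank: int = 8
    lora_alpha: float = 16.0
    reverse_kl_beta: float = 0.1
    samples_per_prompt: int = 4
    wall_clock_budget_h: float = 6.0
    warmup_frac: float = 0.05
    optimizer: str = "adamw"

def apply_override(base: Recipe, override: dict) -> Recipe:
    # Every caller (agent, operator, API) goes through this one path.
    known = {f.name for f in dataclasses.fields(Recipe)}
    unknown = sorted(set(override) - known)
    if unknown:
        # Logged loudly rather than silently dropped.
        log.warning("ignoring unknown recipe keys: %s", unknown)
    return dataclasses.replace(base, **{k: v for k, v in override.items() if k in known})

# Usage: the typo "lora_rnak" is reported in the log instead of vanishing.
recipe = apply_override(Recipe(), {"reverse_kl_beta": 0.08, "lora_rank": 16, "lora_rnak": 32})
print(recipe)
```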

Signed Lipschitz Bound

Lipschitz Bound — Bounded Deformation as Attestable Invariant

When the manifold_muon optimizer is selected, the training service runs a Muon-style orthogonalized-momentum update on the LoRA factors, with periodic re-projection onto the Stiefel manifold. After training, the service computes an upper bound on the Lipschitz constant of the adapter update and signs it:

L = (α / r) · max‖A‖₂ · max‖B‖₂
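Where the formula comes from, assuming the standard LoRA parameterization ΔW = (α/r)·B·A for each adapted layer ℓ and reading the maxima as running over the adapted layers (both are our assumptions): the spectral norm is submultiplicative, so each layer's update is Lipschitz with constant at most L.

```latex
\begin{aligned}
\Delta W_\ell &= \tfrac{\alpha}{r}\, B_\ell A_\ell,\\
\lVert \Delta W_\ell x - \Delta W_\ell y \rVert_2
  &\le \lVert \Delta W_\ell \rVert_2\, \lVert x - y \rVert_2
   \le \tfrac{\alpha}{r}\, \lVert B_\ell \rVert_2\, \lVert A_\ell \rVert_2\, \lVert x - y \rVert_2\\
  &\le \tfrac{\alpha}{r}\, \max_k \lVert A_k \rVert_2 \cdot \max_k \lVert B_k \rVert_2\, \lVert x - y \rVert_2
   = L\, \lVert x - y \rVert_2 .
\end{aligned}
```

This bounds each layer's update on its own; it is not a bound on the composition of all layers.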

The bound is wire-bound to the canonical receipt envelope and signed with the asymmetric platform key. Publication of the adapter includes the bound. Adoption of the adapter by the Governance service verifies the bound.
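An auditor holding the adapter weights could re-check the bound independently. The sketch below estimates each factor's spectral norm by power iteration in pure Python and applies the formula for L; the list-of-lists matrix representation and the shape of the `layers` argument are assumptions. Power iteration approaches the largest singular value from below, so a check that must be a true upper bound should use enough iterations or an exact SVD from a linear-algebra library.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def spectral_norm(m: Matrix, iters: int = 200, seed: int = 0) -> float:
    # Power iteration on M^T M; ||M v|| for the converged unit vector v
    # approaches the largest singular value of M from below.
    rng = random.Random(seed)
    mt = transpose(m)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(m[0]))]
    for _ in range(iters):
        w = matvec(mt, matvec(m, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # almost surely only when M is the zero matrix
        v = [x / norm for x in w]
    return math.sqrt(sum(x * x for x in matvec(m, v)))

def lipschitz_bound(alpha: float, rank: int, layers: List[Tuple[Matrix, Matrix]]) -> float:
    # layers: (A, B) LoRA factor pairs; L = (alpha / r) * max ||A||_2 * max ||B||_2
    max_a = max(spectral_norm(a) for a, _ in layers)
    max_b = max(spectral_norm(b) for _, b in layers)
    return (alpha / rank) * max_a * max_b

# Usage: one layer with rank r = 2, A of shape (2 x 3), B of shape (3 x 2).
A = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
B = [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0]]
bound = lipschitz_bound(alpha=16.0, rank=2, layers=[(A, B)])
print(bound)
```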

An auditor or external reviewer can:

Fetch the receipt.
Fetch the public JWKS.
Re-hash the envelope and verify its signature against the public JWKS key.
Confirm the spectral bound matches the signed value.

…entirely offline, without any Trinitite credentials. Bounded deformation is an attestable invariant, not a vibe.
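The offline check can be sketched as below. The receipt layout (`envelope`, `envelope_hash`, `signature_hex`), the canonical-JSON serialization, and the `sha256:` digest prefix are all hypothetical; the signature check is injected as a callable because the key type published in the JWKS is not specified here (it would typically come from a crypto library's Ed25519 or ECDSA verifier).

```python
import hashlib
import json
from typing import Callable

def canonical_bytes(envelope: dict) -> bytes:
    # Assumed canonicalization: sorted keys, compact separators, UTF-8.
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")

def verify_receipt(receipt: dict,
                   verify_signature: Callable[[bytes, bytes], bool],
                   published_bound: float) -> bool:
    envelope = receipt["envelope"]
    payload = canonical_bytes(envelope)
    # Re-hash the envelope and compare with the digest recorded in the receipt.
    if "sha256:" + hashlib.sha256(payload).hexdigest() != receipt["envelope_hash"]:
        return False
    # Verify the signature with a key taken from the public JWKS (verifier injected).
    if not verify_signature(payload, bytes.fromhex(receipt["signature_hex"])):
        return False
    # The signed bound must equal the bound published with the adapter.
    return envelope["lipschitz_bound"] == published_bound

# Usage with a stub verifier standing in for real public-key verification.
env = {"adapter_hash": "sha256:abc", "lipschitz_bound": 0.34}
receipt = {
    "envelope": env,
    "envelope_hash": "sha256:" + hashlib.sha256(canonical_bytes(env)).hexdigest(),
    "signature_hex": "00",
}
print(verify_receipt(receipt, lambda payload, sig: True, 0.34))
```

Nothing in this flow needs a platform credential: the receipt and JWKS are public, and every other step is local computation.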

Autoresearch — The Autonomous Research Loop

Autoresearch Loop — Hyperband with Budget Gating

A ResearchProgram owns:

A base recipe.
A search space over ~15 knobs.
A set of test-suite IDs.
An edition-gated budget (parallel experiments / FLOPs per night / wall-clock hours).
An auto-promote policy.

Four driver implementations sit behind the same IAutoresearchDriverPort: Hyperband, PBT (Population-Based Training), BOHB (Bayesian Optimization and HyperBand), and a manual operator driver for guided sweeps.
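For readers unfamiliar with Hyperband: each Hyperband bracket is a run of successive halving, which spends a small budget on many candidate recipes and repeatedly keeps only the best fraction while growing the budget. The generic sketch below is the textbook subroutine, not Trinitite's driver; the `evaluate(config, budget)` callback and the dict-shaped configs are assumptions.

```python
from typing import Callable, List

def successive_halving(configs: List[dict],
                       evaluate: Callable[[dict, float], float],
                       min_budget: float,
                       eta: int = 3) -> dict:
    # Score every surviving config at the current budget (higher is better),
    # keep the top 1/eta, multiply the budget by eta, and repeat until one remains.
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]

# Usage: nine toy configs whose "quality" is known up front.
budgets_seen: List[float] = []

def toy_evaluate(config: dict, budget: float) -> float:
    budgets_seen.append(budget)
    return config["q"]

configs = [{"id": i, "q": i / 10} for i in range(9)]
best = successive_halving(configs, toy_evaluate, min_budget=1.0)
print(best)
```

Hyperband proper runs several such brackets that trade off the number of starting configs against the starting budget, which hedges against noisy low-budget scores.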

The loop:

Propose a recipe from the search space.
Train within the FLOPs/night budget.
Gate on an advantage-stats health check: did the proposed run beat the baseline by more than a set threshold?
Promote passing runs onto the leaderboard.
Rank by (tdg_pass_rate DESC, flops_per_tdg_point ASC) — higher quality first, cheaper-per-quality-point as the tiebreaker.
Winner becomes the next baseline. The loop repeats.
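One iteration of that loop, and the leaderboard ordering, can be sketched as follows. `RunResult`, `passes_gate`, and `research_step` are hypothetical names; the pass-rate gain over baseline stands in for the advantage-stats health check, whose exact statistic is not specified, and budget gating is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class RunResult:  # hypothetical leaderboard row
    run_id: str
    tdg_pass_rate: float
    flops_per_tdg_point: float

def rank(rows: List[RunResult]) -> List[RunResult]:
    # tdg_pass_rate DESC, then flops_per_tdg_point ASC as the tiebreaker.
    return sorted(rows, key=lambda r: (-r.tdg_pass_rate, r.flops_per_tdg_point))

def passes_gate(candidate: RunResult, baseline: RunResult, threshold: float) -> bool:
    # Stand-in for the advantage-stats health check: pass-rate gain over baseline.
    return candidate.tdg_pass_rate - baseline.tdg_pass_rate > threshold

def research_step(baseline: RunResult,
                  propose: Callable[[], dict],
                  train_and_eval: Callable[[dict], RunResult],
                  leaderboard: List[RunResult],
                  threshold: float) -> RunResult:
    candidate = train_and_eval(propose())      # budget gating omitted in this sketch
    if passes_gate(candidate, baseline, threshold):
        leaderboard.append(candidate)          # promote onto the leaderboard
    return rank(leaderboard + [baseline])[0]   # the winner becomes the next baseline

# Usage: two runs tie on pass rate, so the cheaper one ranks higher.
rows = [RunResult("a", 0.98, 5.0), RunResult("b", 0.98, 3.0), RunResult("c", 0.99, 9.0)]
print([r.run_id for r in rank(rows)])
```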

Your team sets the policy once. The loop runs unattended. Every run produces a signed receipt. Every promotion is auditable. Nobody is hand-tuning adapters at 2am.

One Pipeline, Many SLMs

The same harness trains:

Guardians: the policy-enforcement SLMs that govern chat, tool calls, and CLI commands.
Tool-specialist Guardians: per-tool-call SLMs trained on specific MCP operation schemas.
Skill scanner classifiers: the SLMs behind Skill Vault scan phases.

Multi-teacher routing, per-task oracle prompts, and per-task advantage aggregation mean adding a new SLM class is a config change, not a new codebase.

Training as a Surface

POST /v1/training/retrain
{
  "base_adapter":  "gua_pii_v1.0",
  "recipe_override": { "reverse_kl_beta": 0.08, "lora_rank": 16 },
  "test_suites":   ["pii-core", "pii-multilang"],
  "budget": { "flops_per_night": 2.0e18, "wall_clock_h": 6 }
}

The call is asynchronous and returns a retrain_id plus a polling URL. On completion, polling returns:

{
  "status":           "completed",
  "new_adapter_hash": "sha256:…",
  "parent_hash":      "sha256:…",
  "lipschitz_bound":  0.34,
  "tdg_pass_rate":    0.982,
  "receipt_id":       "rcp_trn_…",
  "signed_envelope":  "base64:…"
}

For full endpoint schemas, see API Reference → Training.
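A client-side sketch using only the Python standard library. The bearer-token auth header, the `failed` terminal status, and the way the polling URL is turned into a `fetch_status` callable are assumptions; check the API Reference for the actual schema. Polling takes an injected fetch function so it can be exercised without a network.

```python
import json
import time
import urllib.request
from typing import Callable

def submit_retrain(base_url: str, token: str, body: dict) -> dict:
    # Hypothetical auth scheme (bearer token). Returns the parsed acknowledgement,
    # which carries the retrain_id and the polling URL.
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/training/retrain",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_completion(fetch_status: Callable[[], dict],
                        interval_s: float = 30.0,
                        timeout_s: float = 8 * 3600) -> dict:
    # Poll until the job reports "completed"; "failed" is a hypothetical terminal state.
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("status") == "completed":
            return status
        if status.get("status") == "failed":
            raise RuntimeError(f"retrain failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError("retrain did not finish before the timeout")
        time.sleep(interval_s)

# Usage with canned responses instead of a live polling URL.
responses = iter([{"status": "running"}, {"status": "completed", "lipschitz_bound": 0.34}])
result = wait_for_completion(lambda: next(responses), interval_s=0)
print(result["lipschitz_bound"])
```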

What You Get

Capability	Typical fine-tuning	Trinitite training
Loss function	Forward-KL / CE	Reverse-KL (mode-seeking) for governance
Storage	Filesystem paths	Content-addressed (`sha256(weights_bytes || recipe_json)`)
Lineage	Folder tree	DAG with `parent_hash`
Hyperparameter search	Spreadsheet	Autoresearch loop with leaderboard
Bounded deformation	Hope + "adapter is small"	Signed Lipschitz spectral bound
Reproducibility	"Same seed usually works"	Content-addressed hit = bit-exact cache
Verification	Internal dashboards	Public-verifier-compatible signed receipts

Closed-Loop Training

A Guardian's Policy Manifold is not static. It expands through Test-Driven Governance in the lab — and now it expands in production too. Every correction the Guardian applies is a signal: the model output drifted close enough to a Forbidden Zone to need a patch. Trinitite captures that signal and feeds it back into the next training run.

Closed-Loop Training — production corrections feed the next Guardian

Production correction vectors are clustered: the same correction applied 47 times this week is one cluster, not 47 distinct training examples. Each cluster seeds a synthetic adversarial training set, which the Teleological Data Generator expands into thousands of variations. The next Guardian retrain absorbs those clusters and the Safety Ratchet advances.
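The clustering method is not specified here. As a stand-in, the sketch below uses single-pass "leader" clustering on cosine similarity, assuming each correction is represented as a numeric vector; the threshold and the function names are hypothetical.

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def leader_cluster(vectors: List[Sequence[float]], threshold: float = 0.95) -> List[List[int]]:
    # Each vector joins the first cluster whose leader is at least `threshold`
    # cosine-similar to it; otherwise it founds a new cluster and becomes its leader.
    leaders: List[Sequence[float]] = []
    members: List[List[int]] = []
    for i, v in enumerate(vectors):
        for k, leader in enumerate(leaders):
            if cosine(v, leader) >= threshold:
                members[k].append(i)
                break
        else:
            leaders.append(v)
            members.append([i])
    return members

# Usage: 47 identical corrections plus two near-duplicates of a different one.
vecs = [[1.0, 0.0, 0.0]] * 47 + [[0.0, 1.0, 0.0], [0.0, 0.98, 0.05]]
clusters = leader_cluster(vecs)
print([len(c) for c in clusters])
```

Leader clustering is order-dependent and needs no preset cluster count, which suits a stream of corrections; a batch method such as k-means or agglomerative clustering would be the usual alternative.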

Opt in per tenant — see Trust Center → Closed-loop training.

Next Steps

→ Policy Intelligence — the source of truth the Guardians are trained against.

→ Testing & Simulation — the scenarios and suites that validate new adapters.

→ Glass Box Ledger — where every training receipt gets anchored.
