Observability

Three streams. One correlation ID. Zero blind spots.

Most AI platforms treat logs as a developer-grade afterthought. Trinitite treats them as product surface, because enterprise audit teams need to prove — weeks or years later — who did what, with which model, under which policy, and with which evidence. The observability surface is three independent streams, one canonical schema, full OpenTelemetry instrumentation, and portable sinks for every deployment mode.

Three Independent Streams

Three Streams — Independent Retention, Sink, Access

Every line on all three streams carries the same canonical header, including ts, cid, trace_id, span_id, and deployment_mode, so SIEM queries can stitch the streams together without manual correlation.

Each stream has its own retention, its own sink, and its own access-control surface:

Ops — operational telemetry. Health, latency, errors, 4xx/5xx. 90 days. Routes to your OTel backend.
Security — canonical security taxonomy. Auth, admin, network, data, crypto, compliance. 13 months (SOC 2 CC7, ISO 27001 A.12.4). Routes to your SIEM.
Audit — durable, hash-chained rows in the audit_logs table. Every policy decision, every governance action, every export. 7 years (EU AI Act Art. 12; SOX). Routes to the Glass Box Ledger.

Separating them matters because they have different risk profiles, different retention obligations, and different consumers. Developers query ops on Tuesday afternoon during an incident. Security queries security during a threat hunt. Auditors query audit during an engagement — and the audit stream is the one that carries cryptographic receipts.
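To make "hash-chained rows" concrete, here is a minimal sketch of how a hash chain makes tampering detectable. It is an illustration under stated assumptions, not the actual audit_logs or Glass Box Ledger format: the field names (prev_hash, payload, hash), the all-zero genesis value, and the choice of SHA-256 over canonical JSON are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical prev_hash for the first row


def row_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous row's hash together with this row's canonical JSON."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_row(chain: list, payload: dict) -> None:
    """Append a row that commits to everything written before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "payload": payload,
                  "hash": row_hash(prev, payload)})


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited or reordered row breaks the chain."""
    prev = GENESIS
    for row in chain:
        if row["prev_hash"] != prev or row["hash"] != row_hash(prev, row["payload"]):
            return False
        prev = row["hash"]
    return True


chain = []
append_row(chain, {"cid": "c-1", "action": "policy_decision", "verdict": "pass"})
append_row(chain, {"cid": "c-2", "action": "export", "actor": "auditor@example.com"})
```

Because each row's hash covers the previous row's hash, changing any historical payload invalidates that row and every row after it, which is the property an auditor relies on.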

The Canonical Header

Every line on every stream carries the same correlation header:

ts                RFC 3339 UTC
service           trinitite-control-plane
version           semver of the running build
env               production | staging | dev
deployment_mode   saas | hybrid | self_hosted
region            logical region tag
host              pod / VM hostname
cid               correlation ID (W3C traceparent if present)
trace_id          OTel trace ID (hex)
span_id           OTel span ID (hex)

A CI validator blocks PRs that drift from this schema. SIEM rules written once keep working across every release.
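A check of this kind could look roughly like the sketch below. It is not the validator Trinitite ships; the function name header_errors, the error strings, and the sample values are hypothetical. The ID lengths follow the OpenTelemetry definitions (16-byte trace IDs, 8-byte span IDs); requiring lowercase hex is an assumption borrowed from W3C Trace Context.

```python
import re
from datetime import datetime, timedelta

REQUIRED = ("ts", "service", "version", "env", "deployment_mode",
            "region", "host", "cid", "trace_id", "span_id")
ENVS = {"production", "staging", "dev"}
MODES = {"saas", "hybrid", "self_hosted"}
TRACE_ID = re.compile(r"[0-9a-f]{32}")  # 16 bytes as 32 hex characters
SPAN_ID = re.compile(r"[0-9a-f]{16}")   # 8 bytes as 16 hex characters


def header_errors(line: dict) -> list:
    """Return schema violations; an empty list means the header conforms."""
    errors = [f"missing field: {name}" for name in REQUIRED if name not in line]
    if errors:
        return errors
    try:
        # Loose timestamp check: accept a trailing "Z", require a zero UTC offset.
        ts = datetime.fromisoformat(line["ts"].replace("Z", "+00:00"))
        if ts.utcoffset() != timedelta(0):
            errors.append("ts is not UTC")
    except ValueError:
        errors.append("ts does not parse as a timestamp")
    if line["env"] not in ENVS:
        errors.append("env is not one of production, staging, dev")
    if line["deployment_mode"] not in MODES:
        errors.append("deployment_mode is not one of saas, hybrid, self_hosted")
    if not TRACE_ID.fullmatch(line["trace_id"]):
        errors.append("trace_id is not 32 lowercase hex characters")
    if not SPAN_ID.fullmatch(line["span_id"]):
        errors.append("span_id is not 16 lowercase hex characters")
    return errors


# Hypothetical sample line; only the field names come from the header above.
sample = {
    "ts": "2025-01-15T12:00:00Z",
    "service": "trinitite-control-plane",
    "version": "1.4.2",
    "env": "production",
    "deployment_mode": "saas",
    "region": "eu-west",
    "host": "cp-7f9c",
    "cid": "c-42",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7",
}
errors = header_errors(sample)
```

Returning a list of violations rather than raising on the first one lets a CI job report every drifted field in a single run.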

Full OpenTelemetry

OpenTelemetry Trace Waterfall — Example LLM Proxy Call

Every HTTP request creates a span. Every downstream call (database, inference engine, LLM provider, external tool server) is instrumented automatically. Logs carry trace_id and span_id. Metrics cover RED (rate, errors, duration) per endpoint, plus platform metrics: event loop delay, heap, circuit-breaker state, and per-dependency *_up gauges.
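For readers wiring their own services into the same traces, the sketch below shows how the IDs can be read from an incoming W3C traceparent header using only the standard library. The function name is hypothetical, and the parser only handles the version 00 layout (version, trace-id, parent-id, flags). Note that the header's third field is the caller's span; an instrumented service would normally start its own child span rather than reuse it.

```python
import re

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def ids_from_traceparent(header: str):
    """Return (trace_id, parent_span_id), or None if the header is invalid."""
    match = TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_span_id, _flags = match.groups()
    # The spec forbids version "ff" and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_span_id == "0" * 16:
        return None
    return trace_id, parent_span_id


ids = ids_from_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
```

Returning None for a malformed header lets the caller fall back to generating a fresh trace instead of propagating a bad ID into the logs.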

Turn it on with two environment variables:

OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=https://otel.your-collector.example/v1

Any OpenTelemetry-compatible backend (Grafana Tempo, optionally paired with Loki; Honeycomb; Datadog; Azure Monitor; Jaeger) can ingest the output. Backends that store both signals show logs and traces side by side through the shared trace_id.

The SIEM Pipeline Contract

SIEM Pipeline — Canonical Header, Three Lanes

Security and compliance teams don't want to learn a new tool. They want Trinitite events to show up in Splunk / Datadog / CloudWatch / the SIEM they already use, with the same fields as everything else. The canonical header is the contract. A single correlation ID (cid) stitches a user's journey across ops, security, and audit without any custom schema work on your side.
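The stitching itself is a plain group-by on cid. The sketch below shows the idea in Python on already-exported events; in practice the same join would be a query in your SIEM. The stream and msg fields and the sample events are hypothetical; only ts and cid come from the canonical header. Sorting by the ts string assumes every timestamp uses the same RFC 3339 UTC format and precision.

```python
from collections import defaultdict


def stitch_by_cid(*streams):
    """Group events from any number of streams by cid, ordered by timestamp."""
    journeys = defaultdict(list)
    for events in streams:
        for event in events:
            journeys[event["cid"]].append(event)
    for events in journeys.values():
        # Uniform RFC 3339 UTC strings sort chronologically as plain text.
        events.sort(key=lambda e: e["ts"])
    return dict(journeys)


ops = [
    {"ts": "2025-01-15T12:00:00Z", "cid": "c-42", "stream": "ops",
     "msg": "POST /v1/chat 200"},
]
security = [
    {"ts": "2025-01-15T11:59:59Z", "cid": "c-42", "stream": "security",
     "msg": "auth.login.success"},
]
audit = [
    {"ts": "2025-01-15T12:00:01Z", "cid": "c-42", "stream": "audit",
     "msg": "policy_decision"},
    {"ts": "2025-01-15T12:05:00Z", "cid": "c-7", "stream": "audit",
     "msg": "export"},
]

journeys = stitch_by_cid(ops, security, audit)
```

Here journeys["c-42"] holds one user's login, request, and policy decision in time order, drawn from three separately retained streams.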

Supported sinks (routed per-stream via LOGGING_ADAPTER and related env vars):

Sink	Ops	Security	Audit (mirror)
Console / stdout	✓	✓	✓
Splunk	✓	✓	✓
Datadog	✓	✓	✓
CloudWatch	✓	✓	✓
Azure Monitor	✓	✓	✓
Google Cloud Logging	✓	✓	✓
OTel collector (OTLP)	✓	✓	✓
Elastic / OpenSearch	✓	✓	✓

The audit stream additionally writes to the Glass Box Ledger — the SIEM mirror is a convenience for querying; the ledger is the authoritative record.

Metrics That Matter

Out of the box:

Metric family	Examples	Use
RED	`http_request_rate`, `http_request_errors`, `http_request_duration_seconds`	Per-endpoint health
Governance	`guardian_verdict_total` labelled by verdict (pass / correct / block)	Policy enforcement visibility
Dependencies	`database_up`, `inference_up`, `provider_up` labelled by provider	Circuit-breaker state per backend
Spend	`nhi_spend_consumed_total` labelled by NHI, `session_halts_total`	Agent-cost observability
Ledger	`ledger_write_duration_seconds`, `ledger_chain_validation_failures_total`	Audit substrate health
Platform	`nodejs_event_loop_delay_seconds`, `process_heap_bytes`	Runtime health

Your existing Prometheus / Grafana / Datadog dashboards light up immediately; no custom scraping.

Deployment-Mode Portability

The same control-plane container emits the same events whether you run SaaS on Azure, hybrid with us hosting GPUs, or fully air-gapped on-prem. Only the sink plugs change. An alert written in your enterprise Splunk against the SaaS deployment keeps working when you move the same tenant to self-hosted — the canonical header is identical.

What You Get

Capability	Typical AI platform	Trinitite observability
Log schema	Drifts per release	CI-validated canonical header
Retention	One bucket	Per-stream, compliance-grade
SIEM fit	Custom ingestion work	Native pipeline contract
Trace correlation	Partial	Full OTel on every request
Audit stream	Mixed with ops logs	Separated + hash-chained + ledger-anchored
Deployment portability	Per-mode rewrites	Same events, swap the sink

Policy Retrieval and Correction Diff

Two metric families turn "we used your policy" from a claim into a checkable property.

RAG telemetry

Policy Retrieval Telemetry — proves the policy was actually injected

Metric	Meaning
`policy_retrieval_total`	Count of policy retrievals per request lifecycle
`policy_retrieval_hit_ratio`	Fraction of decisions where the relevant policy clause was found and injected
`policy_retrieval_latency_ms`	Latency of the retrieval call (p50, p95, p99)
`policy_retrieval_clauses_injected`	Count of clauses actually placed in the Guardian context window
`policy_retrieval_drift_warnings`	Count of cases where the retrieved policy hash drifted from the active rubric

With retrieval telemetry, every governance decision is checkable: which clauses were retrieved, which made it into the Guardian's context, and whether the retrieved policy hash matched the active rubric.

When policy_retrieval_drift_warnings ticks up, an edit somewhere has not yet propagated: the Guardian is still making decisions, but against a stale policy snapshot.
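As a rough illustration of how the policy_retrieval_* counters relate to individual decisions, the sketch below derives them from a list of retrieval records. The Retrieval record shape is hypothetical, and treating "at least one clause injected" as a hit is an assumption; the platform's own definition of a hit may be stricter.

```python
from dataclasses import dataclass


@dataclass
class Retrieval:
    clauses_injected: int       # clauses placed in the Guardian context
    retrieved_policy_hash: str  # hash of the policy snapshot that was retrieved


def retrieval_summary(retrievals, active_rubric_hash):
    """Aggregate per-decision retrieval records into the metric family."""
    total = len(retrievals)
    hits = sum(1 for r in retrievals if r.clauses_injected > 0)
    drift = sum(1 for r in retrievals
                if r.retrieved_policy_hash != active_rubric_hash)
    return {
        "policy_retrieval_total": total,
        "policy_retrieval_hit_ratio": hits / total if total else 0.0,
        "policy_retrieval_clauses_injected": sum(r.clauses_injected
                                                 for r in retrievals),
        "policy_retrieval_drift_warnings": drift,
    }


records = [
    Retrieval(2, "abc"),
    Retrieval(0, "abc"),   # nothing injected: counts against the hit ratio
    Retrieval(1, "old"),   # stale snapshot: counts as a drift warning
    Retrieval(3, "abc"),
]
summary = retrieval_summary(records, active_rubric_hash="abc")
```

In this example one of the four decisions ran against a stale hash, so the drift counter is nonzero even though the hit ratio looks healthy, which is why the two are tracked separately.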

`correction_diff` block on every receipt

Every corrected verdict carries a correction_diff block on its ledger receipt:

{
  "correction_diff": {
    "embedding_distance": 0.31,    // semantic-space distance from output to nearest Safe Centroid
    "severity":           "medium", // low | medium | high | critical
    "category":           "pii.ssn",
    "patch_op_count":     1
  }
}

The block lets you triage corrections operationally — sort by severity, alert on critical, build dashboards showing which categories shift week over week.
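A triage pass over exported receipts might look like the sketch below. The triage function and the receipt list are hypothetical, as is the "pii.email" category; the field names and severity levels are the ones shown in the correction_diff block above.

```python
from collections import Counter

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def triage(receipts):
    """Return diffs sorted most severe first, critical diffs, and category counts."""
    diffs = [r["correction_diff"] for r in receipts if "correction_diff" in r]
    ordered = sorted(diffs, key=lambda d: SEVERITY_RANK[d["severity"]],
                     reverse=True)
    alerts = [d for d in ordered if d["severity"] == "critical"]
    by_category = Counter(d["category"] for d in diffs)
    return ordered, alerts, by_category


receipts = [
    {"correction_diff": {"embedding_distance": 0.31, "severity": "medium",
                         "category": "pii.ssn", "patch_op_count": 1}},
    {"correction_diff": {"embedding_distance": 0.72, "severity": "critical",
                         "category": "pii.ssn", "patch_op_count": 3}},
    {"correction_diff": {"embedding_distance": 0.12, "severity": "low",
                         "category": "pii.email", "patch_op_count": 1}},
    {"verdict": "pass"},  # uncorrected verdicts carry no correction_diff
]

ordered, alerts, by_category = triage(receipts)
```

Running the same count per week and comparing the Counter results is one simple way to see which categories are shifting.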

Replay Verdict Taxonomy

Forensic replay is a first-class operation. Every replayed event is classified — never silently downgraded.

BIT_EXACT

Replay produces a byte-identical output to the original. Same Guardian, same policy hash, same tile size, same seed.

Use: the default outcome for any replay run on the same node version with the original adapter still loaded.

SEMANTIC_ONLY

Replay produces a semantically equivalent output (same outcome, same JSON Patch class) but bytes differ — typically because a downstream tokenizer or model build changed.

Use: surfaces when re-running an old block on a newer build; the verdict still validates.

DIVERGENT

Replay produces a different verdict than the original. Either the active policy changed or the adapter shifted in a way that breaks the prior block.

Use: drives a forensic regression alert. Do not silently accept.

ORIGINAL_MISSING

The original adapter or upstream artifact is no longer available, so a faithful replay is impossible. The Merkle receipt is still verifiable.

Use: common after long retention windows. Mark explicitly rather than silently downgrading to SEMANTIC_ONLY.

These verdicts are surfaced via mcp_session_replay_verdict_count and similar per-surface metric families. A spike in DIVERGENT verdicts raises an alert on the security stream.
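The taxonomy can be read as a small decision procedure. The sketch below is one possible encoding, not the platform's replay engine: the Run record, its patch_class field, and the use of None to mean "no faithful replay could be produced" are all assumptions. Treating a matching verdict with a different JSON Patch class as DIVERGENT is a conservative choice that the taxonomy above does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    output: bytes      # raw output bytes of the Guardian run
    verdict: str       # pass | correct | block
    patch_class: str   # hypothetical label for the JSON Patch class


def classify_replay(original: Run, replay: Optional[Run]) -> str:
    """Classify a replay against the original; never downgrade silently."""
    if replay is None:
        # Adapter or upstream artifact is gone; no faithful replay exists.
        return "ORIGINAL_MISSING"
    if replay.verdict != original.verdict:
        return "DIVERGENT"
    if replay.output == original.output:
        return "BIT_EXACT"
    if replay.patch_class == original.patch_class:
        return "SEMANTIC_ONLY"
    return "DIVERGENT"


original = Run(b'{"ssn":"[REDACTED]"}', "correct", "replace")
same_bytes = Run(b'{"ssn":"[REDACTED]"}', "correct", "replace")
new_build = Run(b'{"ssn": "[REDACTED]"}', "correct", "replace")
changed = Run(b'{"ssn":"123-45-6789"}', "pass", "none")
```

Checking the verdict before the bytes keeps the most serious outcome, DIVERGENT, from being masked by any later comparison.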

Next Steps

→ Glass Box Ledger — where the audit stream terminates and becomes evidence.

→ Compliance Architecture — how these streams feed framework-specific attestations.

→ Enterprise Reporting — the curated reporting layer on top of the same semantic sources.

→ Cookbook → SIEM export — wire the streams into your SIEM with the right partitions.
