LLM Proxy

Change your base URL. Not your code.

The Trinitite LLM Proxy is a drop-in intermediary between your applications and any LLM provider. It speaks the API formats your SDKs already use — OpenAI Chat Completions, OpenAI Responses, Anthropic Messages, Azure OpenAI — so adopting it is typically a one-line change. You gain credential management, policy enforcement, deterministic audit, and real-time spend control. You lose zero developer experience.

One Base URL, Every Provider

One Base URL → Every Provider, Governed

Your application points its SDK at https://<your-trinitite>/v1/proxy (or the equivalent self-hosted URL) and uses a Trinitite API key instead of a provider key. Trinitite then:

Authenticates the request.
Resolves which provider credential to use (a vaulted ID, not a raw secret).
Runs pre-governance on the input vector.
Proxies the call to the provider you selected.
Runs post-governance on the response.
Writes a canonical audit record to the Glass Box Ledger.
Returns the sanitized response to your application.

Provider keys never leave the vault. Your app holds a Trinitite key; Trinitite holds the provider key. Rotation is a Trinitite operation. Revocation is a Trinitite operation. A developer leaving the team does not force a scramble to rotate OpenAI keys across fifteen services.

One Request, End to End

One Request Through the Proxy — Span Timeline

A single proxied call decomposes into seven spans. Governance overhead — roughly 80ms on a typical call — is dominated by pre- and post-evaluation of the response vector against the active policy manifold. The provider call itself is the overwhelming majority of wall-clock time. That ratio holds for streaming and non-streaming responses alike.

Every span is an OpenTelemetry span with a trace_id / span_id pair. Every outcome is a structured audit row. Your SIEM already knows how to correlate them.

Multi-Turn Tool Use: Shape Preserved

Multi-Turn Tool Use — Shape Preserved Through the Proxy

Agent frameworks that echo assistant turns with content: null + tool_calls before the matching role: "tool" result — OpenAI Agents SDK, LangGraph, AutoGen, CrewAI — work through the proxy without modification. The proxy preserves the message shape byte-for-byte on the way to the provider. Multi-turn reasoning loops, function-call chains, and hosted-MCP conversations round-trip correctly no matter how many turns deep they go.

Hosted MCP is passthrough

Provider-native hosted MCP — OpenAI Responses API tools: [{ type: "mcp", ... }] and Anthropic mcp_servers / tools: [{ type: "mcp_toolset", ... }] — passes through the proxy with provider beta plumbing handled for you (e.g. anthropic-beta: mcp-client-* headers). You don't set beta flags and you don't reshape the request. mcp_list_tools and mcp_call events land in your audit log alongside the completion.

For governance across multiple upstream tool servers you control, see the MCP Gateway deep-dive — a distinct, aggregating surface with per-tool certification and deterministic receipts.

NHI Spend Sessions

NHI Spend Session — State Machine

For autonomous workloads, Trinitite supports spend sessions with configurable budgets. A session transitions through IDLE → ACTIVE → THROTTLING → HALTED as budget consumed. At the HALT threshold the proxy returns 429 Too Many Requests until your orchestration layer explicitly resets the session.

This turns "runaway agent cost" from a monthly invoice surprise into a real-time operational control. Budgets are scoped per NHI, per organization, or per session ID — your choice. The state machine is deterministic and the transitions are audited.

Governance on Streaming

The proxy applies governance to streaming responses, not just completions. Policies that require full-context evaluation (detecting a PII leak building across chunks, catching a gradually-revealed jailbreak) operate on the accumulating response state. Your application still receives streamed chunks; it just receives the governed chunks with injections, leaks, or policy violations redacted or halted in flight.

Emergency Shutoff

If something goes wrong — a compromised provider key, a runaway agent, a production incident — a single operation halts all proxy traffic for your organization:

POST /v1/admin/emergency-shutdown
{
  "reason": "credential-rotation-in-progress",
  "scope": "org"
}

Restore traffic when you're ready. Incident response doesn't require hunting down API keys across services or rotating credentials in fifteen places. The shutdown and restore events are themselves audited, anchored to the ledger, and subject to approval controls.

What Each Provider Surface Supports

Surface	OpenAI Chat	OpenAI Responses	Anthropic Messages	Azure	Self-Hosted
Credential vaulting	✓	✓	✓	✓	✓
Pre- / post-governance	✓	✓	✓	✓	✓
Streaming governance	✓	✓	✓	✓	✓
Hosted MCP passthrough	—	✓	✓	—	—
Multi-turn shape preserve	✓	✓	✓	✓	✓
NHI spend sessions	✓	✓	✓	✓	✓
Deterministic receipts	✓	✓	✓	✓	✓

What You Get

Capability	Direct-to-provider	Via Trinitite Proxy
Provider secrets	In env vars / CI / code	Vaulted; your app holds a Trinitite key
Policy enforcement	Custom middleware per app	Configurable governance at one hop
Audit trail	Scattered provider logs	Structured, centralized, compliance-ready
Agent spend control	Monthly invoice reconciliation	Real-time session-level limits
Incident response	Rotate keys across services	One-operation emergency shutoff
Multi-provider strategy	Per-provider integration	Single base URL, route by credential ID

Next Steps

→ MCP Gateway — the governed aggregation layer for your own upstream tool servers.

→ NHI Governance — where the identity on every proxy call gets scoped and tiered.

→ Observability — the three streams that capture every proxy event.

One Base URL, Every Provider​

One Request, End to End​

Multi-Turn Tool Use: Shape Preserved​

Hosted MCP is passthrough​

NHI Spend Sessions​

Governance on Streaming​

Emergency Shutoff​

What Each Provider Surface Supports​

What You Get​

Next Steps​