Skip to main content

LLM Proxy

Change your base URL. Not your code.

The Trinitite LLM Proxy is a drop-in intermediary between your applications and any LLM provider. It speaks the API formats your SDKs already use — OpenAI Chat Completions, OpenAI Responses, Anthropic Messages, Azure OpenAI — so adopting it is typically a one-line change. You gain credential management, policy enforcement, deterministic audit, and real-time spend control. You lose zero developer experience.


One Base URL, Every Provider

One Base URL → Every Provider, Governed

YOUR APPopenai · anthropic · azure SDKunchanged codebase_url →TRINITITE PROXYAPI key authCredential vaultGovernance evaluationGlass Box Ledger writeOpenAIChat · ResponsesAnthropicMessages · mcp_serversAzure OpenAIDeployment routingSelf-hostedvLLM · SGLangProvider secrets stay vaulted in Trinitite. Your app references a credential ID,never a raw API key. Rotation happens without a redeploy.

Your application points its SDK at https://<your-trinitite>/v1/proxy (or the equivalent self-hosted URL) and uses a Trinitite API key instead of a provider key. Trinitite then:

  1. Authenticates the request.
  2. Resolves which provider credential to use (a vaulted ID, not a raw secret).
  3. Runs pre-governance on the input vector.
  4. Proxies the call to the provider you selected.
  5. Runs post-governance on the response.
  6. Writes a canonical audit record to the Glass Box Ledger.
  7. Returns the sanitized response to your application.

Provider keys never leave the vault. Your app holds a Trinitite key; Trinitite holds the provider key. Rotation is a Trinitite operation. Revocation is a Trinitite operation. A developer leaving the team does not force a scramble to rotate OpenAI keys across fifteen services.


One Request, End to End

One Request Through the Proxy — Span Timeline

0ms936msAuthAPI key · NHI · session2msCredential ResolveVault → provider key3msPre-GovernanceInput vector evaluation40msProvider CallOpenAI / Anthropic / …850msPost-GovernanceOutput vector evaluation38msAudit WriteGlass Box Ledger append2msReturn200 · clean response1msGovernance overhead ≈ 80ms of 936ms total · dominant cost remains the provider call

A single proxied call decomposes into seven spans. Governance overhead — roughly 80ms on a typical call — is dominated by pre- and post-evaluation of the response vector against the active policy manifold. The provider call itself is the overwhelming majority of wall-clock time. That ratio holds for streaming and non-streaming responses alike.

Every span is an OpenTelemetry span with a trace_id / span_id pair. Every outcome is a structured audit row. Your SIEM already knows how to correlate them.


Multi-Turn Tool Use: Shape Preserved

Multi-Turn Tool Use — Shape Preserved Through the Proxy

AGENT FRAMEWORKOpenAI Agents SDK · LangGraphAutoGen · CrewAIuserassistant (null · tool_calls)tool (result)assistant (null · tool_calls)tool (result)TRINITITE PROXYshape-preserving passthrough• preserves message shape byte-for-byte• preserves tool_call_id linkage• preserves provider beta plumbing(anthropic-beta · etc.)+ governance at every turn+ audit record per callPROVIDEROpenAI · AnthropicHOSTED MCPtools: [{ type: "mcp", ... }]or mcp_servers:[…]COMPLETIONmcp_list_toolsmcp_call · textGOVERNANCE HOOKSEvery assistant turn → Guardian verdictEvery tool call → per-tool GuardianEvery tool result → output-schema checkEvery mcp_list_tools → tool allowlistServer-initiated sampling → governedToken usage → spend session updateAll captured in Glass Box Ledger

Agent frameworks that echo assistant turns with content: null + tool_calls before the matching role: "tool" result — OpenAI Agents SDK, LangGraph, AutoGen, CrewAI — work through the proxy without modification. The proxy preserves the message shape byte-for-byte on the way to the provider. Multi-turn reasoning loops, function-call chains, and hosted-MCP conversations round-trip correctly no matter how many turns deep they go.

Hosted MCP is passthrough

Provider-native hosted MCP — OpenAI Responses API tools: [{ type: "mcp", ... }] and Anthropic mcp_servers / tools: [{ type: "mcp_toolset", ... }] — passes through the proxy with provider beta plumbing handled for you (e.g. anthropic-beta: mcp-client-* headers). You don't set beta flags and you don't reshape the request. mcp_list_tools and mcp_call events land in your audit log alongside the completion.

For governance across multiple upstream tool servers you control, see the MCP Gateway deep-dive — a distinct, aggregating surface with per-tool certification and deterministic receipts.


NHI Spend Sessions

NHI Spend Session — State Machine

IDLENo active sessionACTIVE< 80% budget · full speedTHROTTLING80–100% · warn + slowHALTED100% hit · 429 until resetRESETBudget reset · resumestart_session80% reached100% reached429 → resetresumeRunaway-agent cost becomes a real-time operational control, not a monthly invoice surprise.

For autonomous workloads, Trinitite supports spend sessions with configurable budgets. A session transitions through IDLE → ACTIVE → THROTTLING → HALTED as budget consumed. At the HALT threshold the proxy returns 429 Too Many Requests until your orchestration layer explicitly resets the session.

This turns "runaway agent cost" from a monthly invoice surprise into a real-time operational control. Budgets are scoped per NHI, per organization, or per session ID — your choice. The state machine is deterministic and the transitions are audited.


Governance on Streaming

The proxy applies governance to streaming responses, not just completions. Policies that require full-context evaluation (detecting a PII leak building across chunks, catching a gradually-revealed jailbreak) operate on the accumulating response state. Your application still receives streamed chunks; it just receives the governed chunks with injections, leaks, or policy violations redacted or halted in flight.


Emergency Shutoff

If something goes wrong — a compromised provider key, a runaway agent, a production incident — a single operation halts all proxy traffic for your organization:

POST /v1/admin/emergency-shutdown
{
"reason": "credential-rotation-in-progress",
"scope": "org"
}

Restore traffic when you're ready. Incident response doesn't require hunting down API keys across services or rotating credentials in fifteen places. The shutdown and restore events are themselves audited, anchored to the ledger, and subject to approval controls.


What Each Provider Surface Supports

SurfaceOpenAI ChatOpenAI ResponsesAnthropic MessagesAzureSelf-Hosted
Credential vaulting
Pre- / post-governance
Streaming governance
Hosted MCP passthrough
Multi-turn shape preserve
NHI spend sessions
Deterministic receipts

What You Get

CapabilityDirect-to-providerVia Trinitite Proxy
Provider secretsIn env vars / CI / codeVaulted; your app holds a Trinitite key
Policy enforcementCustom middleware per appConfigurable governance at one hop
Audit trailScattered provider logsStructured, centralized, compliance-ready
Agent spend controlMonthly invoice reconciliationReal-time session-level limits
Incident responseRotate keys across servicesOne-operation emergency shutoff
Multi-provider strategyPer-provider integrationSingle base URL, route by credential ID

Next Steps

MCP Gateway — the governed aggregation layer for your own upstream tool servers.

NHI Governance — where the identity on every proxy call gets scoped and tiered.

Observability — the three streams that capture every proxy event.