LLM Proxy
Change your base URL. Not your code.
The Trinitite LLM Proxy is a drop-in intermediary between your applications and any LLM provider. It speaks the API formats your SDKs already use — OpenAI Chat Completions, OpenAI Responses, Anthropic Messages, Azure OpenAI — so adopting it is typically a one-line change. You gain credential management, policy enforcement, deterministic audit, and real-time spend control. You lose zero developer experience.
One Base URL, Every Provider
One Base URL → Every Provider, Governed
Your application points its SDK at https://<your-trinitite>/v1/proxy (or the equivalent self-hosted URL) and uses a Trinitite API key instead of a provider key. Trinitite then:
- Authenticates the request.
- Resolves which provider credential to use (a vaulted ID, not a raw secret).
- Runs pre-governance on the input vector.
- Proxies the call to the provider you selected.
- Runs post-governance on the response.
- Writes a canonical audit record to the Glass Box Ledger.
- Returns the sanitized response to your application.
Provider keys never leave the vault. Your app holds a Trinitite key; Trinitite holds the provider key. Rotation is a Trinitite operation. Revocation is a Trinitite operation. A developer leaving the team does not force a scramble to rotate OpenAI keys across fifteen services.
One Request, End to End
One Request Through the Proxy — Span Timeline
A single proxied call decomposes into seven spans. Governance overhead — roughly 80ms on a typical call — is dominated by pre- and post-evaluation of the response vector against the active policy manifold. The provider call itself is the overwhelming majority of wall-clock time. That ratio holds for streaming and non-streaming responses alike.
Every span is an OpenTelemetry span with a trace_id / span_id pair. Every outcome is a structured audit row. Your SIEM already knows how to correlate them.
Multi-Turn Tool Use: Shape Preserved
Multi-Turn Tool Use — Shape Preserved Through the Proxy
Agent frameworks that echo assistant turns with content: null + tool_calls before the matching role: "tool" result — OpenAI Agents SDK, LangGraph, AutoGen, CrewAI — work through the proxy without modification. The proxy preserves the message shape byte-for-byte on the way to the provider. Multi-turn reasoning loops, function-call chains, and hosted-MCP conversations round-trip correctly no matter how many turns deep they go.
Hosted MCP is passthrough
Provider-native hosted MCP — OpenAI Responses API tools: [{ type: "mcp", ... }] and Anthropic mcp_servers / tools: [{ type: "mcp_toolset", ... }] — passes through the proxy with provider beta plumbing handled for you (e.g. anthropic-beta: mcp-client-* headers). You don't set beta flags and you don't reshape the request. mcp_list_tools and mcp_call events land in your audit log alongside the completion.
For governance across multiple upstream tool servers you control, see the MCP Gateway deep-dive — a distinct, aggregating surface with per-tool certification and deterministic receipts.
NHI Spend Sessions
NHI Spend Session — State Machine
For autonomous workloads, Trinitite supports spend sessions with configurable budgets. A session transitions through IDLE → ACTIVE → THROTTLING → HALTED as budget consumed. At the HALT threshold the proxy returns 429 Too Many Requests until your orchestration layer explicitly resets the session.
This turns "runaway agent cost" from a monthly invoice surprise into a real-time operational control. Budgets are scoped per NHI, per organization, or per session ID — your choice. The state machine is deterministic and the transitions are audited.
Governance on Streaming
The proxy applies governance to streaming responses, not just completions. Policies that require full-context evaluation (detecting a PII leak building across chunks, catching a gradually-revealed jailbreak) operate on the accumulating response state. Your application still receives streamed chunks; it just receives the governed chunks with injections, leaks, or policy violations redacted or halted in flight.
Emergency Shutoff
If something goes wrong — a compromised provider key, a runaway agent, a production incident — a single operation halts all proxy traffic for your organization:
POST /v1/admin/emergency-shutdown
{
"reason": "credential-rotation-in-progress",
"scope": "org"
}
Restore traffic when you're ready. Incident response doesn't require hunting down API keys across services or rotating credentials in fifteen places. The shutdown and restore events are themselves audited, anchored to the ledger, and subject to approval controls.
What Each Provider Surface Supports
| Surface | OpenAI Chat | OpenAI Responses | Anthropic Messages | Azure | Self-Hosted |
|---|---|---|---|---|---|
| Credential vaulting | ✓ | ✓ | ✓ | ✓ | ✓ |
| Pre- / post-governance | ✓ | ✓ | ✓ | ✓ | ✓ |
| Streaming governance | ✓ | ✓ | ✓ | ✓ | ✓ |
| Hosted MCP passthrough | — | ✓ | ✓ | — | — |
| Multi-turn shape preserve | ✓ | ✓ | ✓ | ✓ | ✓ |
| NHI spend sessions | ✓ | ✓ | ✓ | ✓ | ✓ |
| Deterministic receipts | ✓ | ✓ | ✓ | ✓ | ✓ |
What You Get
| Capability | Direct-to-provider | Via Trinitite Proxy |
|---|---|---|
| Provider secrets | In env vars / CI / code | Vaulted; your app holds a Trinitite key |
| Policy enforcement | Custom middleware per app | Configurable governance at one hop |
| Audit trail | Scattered provider logs | Structured, centralized, compliance-ready |
| Agent spend control | Monthly invoice reconciliation | Real-time session-level limits |
| Incident response | Rotate keys across services | One-operation emergency shutoff |
| Multi-provider strategy | Per-provider integration | Single base URL, route by credential ID |
Next Steps
→ MCP Gateway — the governed aggregation layer for your own upstream tool servers.
→ NHI Governance — where the identity on every proxy call gets scoped and tiered.
→ Observability — the three streams that capture every proxy event.