Self-Hosting the Trinitite Platform
This guide is for teams and organizations that want to run the Trinitite AI Governance Platform entirely within their own infrastructure. Self-hosting gives you full control over your data, your deployment topology, and your cost model.
Five-minute install (recorded)
The recorded cast plays the full sequence: pull the image, bring up the stack, swap OPENAI_BASE_URL, and make a governed call. Total wall-clock time is under five minutes on a modern dev box.
Table of Contents
- Architecture Overview — Three Independent Images
- Deployment Patterns
- Key Environment Variables
- Getting the Full Configuration Reference
Architecture Overview
The Trinitite Platform ships as three independent container images. Each image has a distinct runtime profile — some need to run continuously, others only when you need them. This separation means you are never paying to keep GPU resources alive when they are idle.
Image 1 — Control Plane (Always On)
The Control Plane is the brain of the platform. It handles all API traffic, authentication, tenant management, job orchestration, the audit ledger, and the Guardian lifecycle. It is a CPU-bound NestJS service with minimal resource requirements and should be kept running continuously in your environment.
When to scale: Horizontally, behind a load balancer. No GPU needed.
Typical uptime: 24/7
Image 2 — Governance (Flexible Uptime)
The Governance image runs your trained Guardians — the fine-tuned LoRA adapters that validate and correct AI outputs in real time. It is GPU-bound and serves Guardian inference for the Control Plane's /v1/chat and /v1/proxy/* endpoints, which your applications call before presenting a model response to an end user.
This image is independently scalable. If your product has peak governance traffic during business hours, you can schedule it to scale up in the morning and down overnight. If you run a 24/7 product, you keep it always on. The Control Plane queues and routes Guardian requests so nothing is dropped during scale events.
When to scale: Horizontally for throughput, vertically for lower latency. Requires NVIDIA GPU.
Typical uptime: Business hours, always-on, or event-driven — your choice.
Image 3 — Training (As Needed)
The Training image runs fine-tuning jobs to produce new or updated Guardian LoRA adapters. It only needs to be running while an active training job is in flight. Between jobs it can be fully stopped with zero cost.
The Control Plane manages job queuing. When a training job is triggered, your infrastructure (or our RunPod/cloud provider integrations) can spin up this image, complete the job, push the adapter to your configured storage backend, and shut down. The entire lifecycle is automated.
When to scale: Single large GPU instance per job, or horizontally for concurrent jobs.
Typical uptime: Hours per training run, then stopped.
Deployment Patterns
Minimum Viable (SQLite + Local Storage)
Ideal for a single developer or small team evaluating the platform.
# Only the Control Plane needs to be running for API access.
# The Governance and Training images are started on demand.
docker compose up control-plane
Production (PostgreSQL + Redis + S3)
The recommended starting point for any production deployment.
# Bring up the Control Plane and database
docker compose up control-plane database
# Bring Governance up when needed
docker compose up governance
# Trigger a training job — start, complete, then stop
docker compose up training
docker compose stop training
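The start, complete, stop cycle above can be wrapped so that the Training service is always stopped, even when a job fails. This is a minimal sketch, assuming the `training` service name used in the commands above. The `run` helper and the `DRY_RUN` switch are hypothetical conveniences for previewing the commands and are not platform features.

```shell
#!/bin/sh
# Sketch: run one training job, then always stop the Training service.
# DRY_RUN=1 prints each command instead of executing it (hypothetical helper).

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

train_once() {
  status=0
  # Foreground run; returns when the attached container(s) exit.
  run docker compose up training || status=$?
  # Stop the service whether or not the job succeeded, so no GPU stays allocated.
  run docker compose stop training
  return "$status"
}
```

Calling `DRY_RUN=1 train_once` prints the two compose commands without touching Docker, which is a cheap way to check the wrapper before wiring it into a scheduler.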
Enterprise HA
For high-availability deployments with your existing cloud or on-premise infrastructure, Kubernetes manifests and autoscaling configuration are available as part of the enterprise onboarding package. Contact sales to discuss your deployment architecture.
Key Environment Variables
The platform is configured entirely through environment variables, so there are no config files to manage in production. Each adapter category can be swapped independently: moving from the SQLite MVP to a Postgres + Redis production stack takes a handful of environment variable changes and no code modifications.
Adapter selection by deployment tier:
| Adapter (variable) | SQLite MVP | Postgres + Redis | Enterprise HA |
|---|---|---|---|
| Database (DB_TYPE) | SQLITE | POSTGRES | POSTGRES / MSSQL |
| Auth / SSO (AUTH_ADAPTER) | INTERNAL | OAUTH | OAUTH / SAML |
| Logging (LOGGING_ADAPTER) | CONSOLE | DATADOG / CLOUDWATCH | SPLUNK / DATADOG |
| Secrets (SECRETS_ADAPTER) | ENV | AWS / AZURE | AWS / AZURE / VAULT |
| KMS (KMS_ADAPTER) | NONE | POSTGRES | AWS_KMS / AZURE_KEYVAULT / VAULT |
| Ledger (LEDGER_ADAPTER) | POSTGRES | S3_WORM | S3_WORM / BLOCKCHAIN |
| Cache (CACHE_ADAPTER) | IN_MEMORY | REDIS | REDIS |
| LoRA Storage (LORA_STORAGE_ADAPTER) | LOCAL | S3 | S3 |
| Inference (INFERENCE_ENGINE) | sglang | sglang / vllm | sglang / vllm |
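As an illustration of the "Postgres + Redis" column, the adapter selection could be collected in one env file. This is a sketch only: the file name is an assumption, one value is picked wherever the table lists alternatives, and connection settings (database URL, Redis host, bucket names, credentials) are omitted because they are not listed in this guide.

```shell
# .env.production (hypothetical file name): Postgres + Redis tier.
# Every value below comes from the "Postgres + Redis" column of the table.
DB_TYPE=POSTGRES
AUTH_ADAPTER=OAUTH
LOGGING_ADAPTER=DATADOG        # or CLOUDWATCH
SECRETS_ADAPTER=AWS            # or AZURE
KMS_ADAPTER=POSTGRES
LEDGER_ADAPTER=S3_WORM
CACHE_ADAPTER=REDIS
LORA_STORAGE_ADAPTER=S3
INFERENCE_ENGINE=sglang        # or vllm
```

Compared with the SQLite MVP column, eight of these nine lines change and `INFERENCE_ENGINE` keeps its value.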
Below are the most impactful variables to understand when standing up a self-hosted instance.
The complete list of all environment variables, validation rules, conditional requirements, and default values is available to enterprise customers. Contact our sales team for the full configuration guide.
Control Plane
DB_TYPE
Values: SQLITE | POSTGRES | MSSQL
Controls which database adapter the platform uses. SQLITE requires zero infrastructure and is ideal for local development or single-node evaluation. POSTGRES is the recommended choice for any production or multi-node deployment and unlocks connection pooling, read replicas, and SSL. MSSQL is available for organizations standardized on SQL Server.
Switching databases is a single environment variable change — no code modifications required.
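Because a mistyped value is the most likely failure when switching adapters, a small pre-start check can catch it before the container boots. The helper below is a hypothetical sketch, not part of the platform. It compares values case-sensitively against the list documented above, and whether the platform itself accepts other casings is not stated in this guide.

```shell
# validate_enum NAME VALUE ALLOWED...
# Returns 0 when VALUE is one of ALLOWED, otherwise prints an error and returns 1.
validate_enum() {
  name=$1
  value=$2
  shift 2
  for allowed in "$@"; do
    [ "$value" = "$allowed" ] && return 0
  done
  echo "invalid $name='$value' (expected one of: $*)" >&2
  return 1
}

# Example: check DB_TYPE against the values documented above.
# validate_enum DB_TYPE "${DB_TYPE:-}" SQLITE POSTGRES MSSQL
```

The same helper works for any of the enumerated variables in this section by passing that variable's documented values.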
AUTH_ADAPTER
Values: INTERNAL | OAUTH | SAML
The platform ships with its own user management (INTERNAL) and can be wired into your existing identity provider without any code changes. Set OAUTH and provide your provider's client credentials to enable SSO via Azure AD, Okta, Google Workspace, or any OAuth 2.0-compatible IdP. Set SAML for enterprise federations that require SAML 2.0. Both SSO modes support automatic user provisioning and role mapping from your IdP.
This means your users log in with the credentials they already have, and your IT team controls access from the directory they already manage.
LOGGING_ADAPTER
Values: CONSOLE | SPLUNK | DATADOG | CLOUDWATCH
Routes all platform logs to your existing observability stack. CONSOLE writes structured JSON to stdout, which works with any log aggregator that reads container output. SPLUNK, DATADOG, and CLOUDWATCH push logs directly to those platforms with no sidecar needed. Your security and ops teams see Trinitite platform activity alongside all other systems they already monitor.
SECRETS_ADAPTER
Values: ENV | AWS | AZURE | VAULT
Controls how the platform retrieves secrets at runtime. ENV reads them from environment variables directly, which is the simplest path for teams using Kubernetes Secrets or Docker secrets. AWS integrates with AWS Secrets Manager, AZURE with Azure Key Vault, and VAULT with HashiCorp Vault. Using your existing secrets infrastructure means credentials never touch the container image and rotate without a redeploy.
KMS_ADAPTER
Values: POSTGRES | AWS_KMS | VAULT | AZURE_KEYVAULT | NONE
The platform encrypts sensitive data at rest using envelope encryption. POSTGRES stores encrypted data keys in your database (lowest operational overhead). AWS_KMS, AZURE_KEYVAULT, and VAULT delegate key management to your existing KMS infrastructure, which is typically required for SOC 2 Type II or ISO 27001 compliance.
LEDGER_ADAPTER
Values: POSTGRES | S3_WORM | SPLUNK | BLOCKCHAIN
Every governance decision the platform makes is appended to an immutable audit ledger. POSTGRES uses append-only tables and is sufficient for most deployments. S3_WORM writes to an S3 bucket with Object Lock enabled, providing regulatory-grade immutability for financial services, healthcare, or other highly regulated use cases. SPLUNK routes the ledger into your existing SIEM. The ledger is how you demonstrate to auditors exactly what every Guardian decided and why.
CACHE_ADAPTER
Values: IN_MEMORY | REDIS
IN_MEMORY works out of the box for single-instance deployments. REDIS is required when running multiple Control Plane instances behind a load balancer so that cached sessions, rate limit counters, and auth tokens are shared across instances. Providing your own Redis also means you control eviction policies and retention.
LORA_STORAGE_ADAPTER
Values: LOCAL | S3
Trained Guardian LoRA adapters need to be accessible by the Governance image. LOCAL writes adapters to a mounted volume, which works for single-host setups. S3 stores adapters in a bucket that both the Training and Governance images can access independently, so the Training image can produce a new adapter and the Governance image can pick it up without any coordination, even when the two run on separate hosts.
RATE_LIMIT_ADAPTER
Values: IN_MEMORY | REDIS
Rate limiting protects your platform from runaway clients and cost spikes. IN_MEMORY enforces limits per instance. REDIS shares counters across all instances, which is necessary for accurate enforcement in horizontally scaled deployments.
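The CACHE_ADAPTER and RATE_LIMIT_ADAPTER rules combine into one cross-variable check: more than one Control Plane instance implies REDIS for both. The function below is a hypothetical preflight sketch. The replica count is passed in by the caller, and treating an unset variable as IN_MEMORY is an assumption, since this guide does not state the defaults.

```shell
# check_shared_state REPLICAS
# Fails when more than one Control Plane instance is planned but the cache
# or the rate limiter would keep per-instance state.
check_shared_state() {
  replicas=$1
  rc=0
  if [ "$replicas" -gt 1 ]; then
    for var in CACHE_ADAPTER RATE_LIMIT_ADAPTER; do
      # Assumption: an unset adapter behaves like IN_MEMORY.
      eval "val=\${$var:-IN_MEMORY}"
      if [ "$val" != "REDIS" ]; then
        echo "$var=$val: set to REDIS for $replicas instances" >&2
        rc=1
      fi
    done
  fi
  return "$rc"
}
```

Running `check_shared_state 3` in the deployment pipeline before scaling out reports every adapter that still holds per-instance state, rather than stopping at the first one.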
MFA_REQUIRED
Values: true | false
When true, all users must enroll in TOTP-based multi-factor authentication before accessing the platform. MFA_REQUIRED_FOR_ADMINS is a separate control that enforces MFA for admin accounts regardless of the platform-wide setting — useful for meeting compliance requirements without forcing MFA on every analyst.
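The interaction between the two MFA settings can be read as a small truth table. The function below only restates the rule described above. It is not the platform's implementation, the role name `admin` is a placeholder, and treating unset variables as false is an assumption.

```shell
# mfa_enforced ROLE
# Returns 0 when a user with ROLE must enroll in MFA under the current settings.
mfa_enforced() {
  role=$1
  # Platform-wide setting: applies to every user.
  [ "${MFA_REQUIRED:-false}" = "true" ] && return 0
  # Admin-only setting: applies regardless of the platform-wide value.
  [ "$role" = "admin" ] && [ "${MFA_REQUIRED_FOR_ADMINS:-false}" = "true" ] && return 0
  return 1
}
```

With `MFA_REQUIRED=false` and `MFA_REQUIRED_FOR_ADMINS=true`, the function succeeds for `admin` and fails for any other role, which is the "admins only" configuration described above.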
CORS_ORIGINS
Values: comma-separated origins, or *
Controls which origins are allowed to make API requests. Locking this down to your application domains is a one-line change that prevents unauthorized cross-origin access in production.
Governance (Inference Service)
INFERENCE_ENGINE
Values: sglang | vllm
The underlying inference runtime. sglang is the default and delivers the highest throughput for governance workloads, with native support for LoRA adapter hot-swapping and structured output. vllm is available as an alternative if your team has existing operational familiarity with it. Both engines expose an identical API surface — your application code does not change when you switch.
GPU_PROVIDER (Governance)
Values: local_cuda | runpod | nebius | aws_ec2 | gcp_compute | azure_vm
Where the GPU compute lives. local_cuda uses whatever GPU is in the host machine. The cloud provider options provision GPU instances on demand via the respective APIs — the platform handles instance lifecycle, so you can scale governance capacity up and down from a single variable without managing VM automation yourself.
ENABLE_LORA
Values: true | false
Enables dynamic LoRA adapter loading. When true, the Governance image can serve multiple Guardians simultaneously, swapping adapters per request without reloading the base model. This is what makes it possible for a single GPU instance to enforce different governance policies for different tenants, products, or risk profiles at the same time.
ENABLE_DETERMINISTIC_INFERENCE
Values: true | false
When true, the platform seeds the inference engine so that identical inputs produce identical outputs. This is critical for audit reproducibility — if a governance decision is challenged, you can re-run the exact same inference and get the exact same result. It also makes testing significantly more reliable.
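One way to verify reproducibility in your own environment is to issue the same request twice and compare the responses. The helper below is a generic sketch that runs any command twice. The inference call itself is left to the caller, and the wrapper script in the usage comment is a hypothetical placeholder.

```shell
# same_output CMD [ARGS...]
# Runs CMD twice. Returns 0 if both runs print identical output,
# 1 if the outputs differ, and 2 if either run fails.
same_output() {
  first=$("$@") || return 2
  second=$("$@") || return 2
  [ "$first" = "$second" ]
}

# Usage (run_inference.sh is a hypothetical wrapper around your API call):
# same_output ./run_inference.sh prompt.json || echo "outputs differ" >&2
```

If the response body includes per-request fields such as timestamps or request IDs, strip those inside the wrapper before comparing, otherwise the check will always report a difference.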
Training Service
TRAINING_ENGINE
Values: unsloth | peft | trl
The fine-tuning library used to train Guardians. unsloth is the default and provides the best throughput-per-dollar through kernel-level optimizations and 4-bit quantization, making it possible to train on consumer-grade GPUs. peft and trl are available for teams with existing training infrastructure or workflows built around those libraries.
GPU_PROVIDER (Training)
Values: local_cuda | runpod | nebius | aws_ec2 | gcp_compute | azure_vm
Same adapter pattern as the Governance image. For training, the cloud provider options are particularly valuable because you only pay for GPU time while a job is actually running. A typical LoRA fine-tuning job completes in under two hours on a single A100, so serverless GPU platforms like RunPod or Nebius often cost less than a dollar per Guardian update.
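The per-job cost is simply the hourly GPU rate multiplied by the job duration, so it is worth plugging in your own numbers. The sketch below does that arithmetic. The rates in the usage comments are illustrative placeholders and are not quotes from any provider.

```shell
# job_cost RATE_PER_HOUR MINUTES
# Prints the cost of one job in dollars, rounded to two decimals.
job_cost() {
  LC_ALL=C awk -v rate="$1" -v minutes="$2" \
    'BEGIN { printf "%.2f\n", rate * minutes / 60 }'
}

# Usage with placeholder rates:
# job_cost 1.50 30    # prints 0.75
# job_cost 1.50 120   # prints 3.00
```

At the placeholder rate, a 30-minute job stays under a dollar while a job that runs the full two hours does not, so the actual duration of your training runs is what determines the cost.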
STORAGE_BACKEND (Training)
Values: local | s3 | huggingface | azure | gcs | minio
Where trained adapters and checkpoints are written after a job completes. Using a shared object store (S3, GCS, Azure Blob, or MinIO for on-premise) means the Training image can be destroyed immediately after a job completes and the Governance image can pull the new adapter independently.
DATA_GEN_MODEL
Default: accounts/fireworks/models/qwen3-235b-a22b-thinking-2507
The large language model used by the Teleological Generator to produce synthetic training data for your Guardians. Pointing this at a different provider API or at your own hosted model gives you control over training data quality and cost. Using a self-hosted model also keeps synthetic data generation inside your security boundary when data residency is a requirement.
WANDB_ENABLED
Values: true | false
When true, training runs emit metrics to Weights & Biases for experiment tracking. This gives your ML team a centralized view of every Guardian training run — loss curves, hyperparameters, dataset statistics, and adapter performance — without any additional instrumentation.
Full Configuration Reference
The variables above represent the most commonly configured options for a self-hosted deployment. The platform has a significantly deeper configuration surface covering connection pool tuning, circuit breaker thresholds, advanced LoRA hyperparameters, multi-region replication, compliance logging controls, and more.
The full environment variable reference is available to enterprise customers.
To get the complete configuration guide, discuss volume pricing, or talk through a deployment architecture for your organization, reach out to our sales team.
Where to go next
Self-hosting decisions almost always pair with a specific surface — how the LLM Proxy is configured, how the MCP Gateway routes upstreams, how the Glass Box Ledger is backed. Every deep-dive covers the deployment-specific details for its surface:
- Data plane — LLM Proxy · MCP Gateway · CLI Firewall · Skill Vault
- Trust layer — Glass Box Ledger · Observability · Compliance Architecture
- Intelligence — Guardian Training · Testing & Simulation
- Identity — Identity & RBAC · NHI Governance · Policy Intelligence