
Self-Hosting the Trinitite Platform

This guide is for teams and organizations that want to run the Trinitite AI Governance Platform entirely within their own infrastructure. Self-hosting gives you full control over your data, your deployment topology, and your cost model.

Five-minute install (recorded)

$ trinitite quickstart

The recorded quickstart plays the full sequence: pull the image, bring up the stack, swap OPENAI_BASE_URL, and make a governed call. Total wall-clock time is under five minutes on a modern dev box.


Table of Contents

  1. Architecture Overview — Three Independent Images
  2. Deployment Patterns
  3. Key Environment Variables
  4. Getting the Full Configuration Reference

Architecture Overview

The Trinitite Platform ships as three independent container images. Each image has a distinct runtime profile — some need to run continuously, others only when you need them. This separation means you are never paying to keep GPU resources alive when they are idle.

[Architecture diagram: Trinitite Platform]

  Control Plane (always on): NestJS · Node.js, CPU workload. Auth · RBAC · Ledger · Guardian Lifecycle · Jobs. Port 8080.
  Governance (on-demand / HA): SGLang · Python, GPU workload. Guardian Inference · LoRA Hot-Swap · Batching. Port 8000.
  Training (as needed): Unsloth · Python, GPU only. LoRA Fine-Tuning · TDG · Densification. Port 8001.

  Arrows: the Control Plane orchestrates Governance and triggers training jobs.

Image 1 — Control Plane (Always On)

The Control Plane is the brain of the platform. It handles all API traffic, authentication, tenant management, job orchestration, the audit ledger, and the Guardian lifecycle. It is a CPU-bound NestJS service with minimal resource requirements and should be kept running continuously in your environment.

When to scale: Horizontal scaling behind a load balancer. No GPU needed.

Typical uptime: 24/7


Image 2 — Governance (Flexible Uptime)

The Governance image runs your trained Guardians — the fine-tuned LoRA adapters that validate and correct AI outputs in real time. It is GPU-bound and serves Guardian inference for the Control Plane's /v1/chat and /v1/proxy/* endpoints, which your applications call before presenting a model response to an end user.

This image is independently scalable. If your product has peak governance traffic during business hours, you can schedule it to scale up in the morning and down overnight. If you run a 24/7 product, you keep it always on. The Control Plane queues and routes Guardian requests so nothing is dropped during scale events.

When to scale: Horizontally for throughput, vertically for lower latency. Requires NVIDIA GPU.

Typical uptime: Business hours, always-on, or event-driven — your choice.
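
A business-hours schedule like the one described above can be sketched as a small script run hourly by cron or a similar scheduler. This is a minimal sketch, not a platform feature: the function names and the 08:00 to 18:00 window are hypothetical, and it assumes the compose service is named governance, as in the Deployment Patterns section.

```shell
# Hypothetical schedule window; these are illustrative values, not platform settings.
START_HOUR=8
END_HOUR=18

# Print "up" inside the window and "stop" outside it. Takes an hour 0-23.
governance_action_for_hour() {
  hour="${1#0}"        # strip one leading zero so "08" is not read as octal
  hour="${hour:-0}"    # "0" or "00" collapses to 0
  if [ "$hour" -ge "$START_HOUR" ] && [ "$hour" -lt "$END_HOUR" ]; then
    echo "up"
  else
    echo "stop"
  fi
}

# Apply the decision for the current hour. Call this from your scheduler.
apply_governance_schedule() {
  case "$(governance_action_for_hour "$(date +%H)")" in
    up)   docker compose up -d governance ;;
    stop) docker compose stop governance ;;
  esac
}
```

Keeping the decision in its own function makes the window easy to test without touching Docker.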


Image 3 — Training (As Needed)

The Training image runs fine-tuning jobs to produce new or updated Guardian LoRA adapters. It only needs to be running while an active training job is in flight. Between jobs it can be fully stopped with zero cost.

The Control Plane manages job queuing. When a training job is triggered, your infrastructure (or our RunPod/cloud provider integrations) can spin up this image, complete the job, push the adapter to your configured storage backend, and shut down. The entire lifecycle is automated.

When to scale: Single large GPU instance per job, or horizontally for concurrent jobs.

Typical uptime: Hours per training run, then stopped.
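
If you drive the lifecycle yourself rather than through a provider integration, the start, complete, stop sequence can be wrapped in one function. This is a sketch under two assumptions: the compose service is named training (as in the Deployment Patterns section), and the training container exits on its own when the job finishes. The function name is hypothetical.

```shell
# Hypothetical wrapper around a single training run.
run_training_job() {
  status=0
  # `up` stays attached until the container exits; --exit-code-from makes
  # compose return the training container's own exit status.
  docker compose up --exit-code-from training training || status=$?
  # Ensure nothing is left running after the job, even if it failed.
  docker compose stop training
  return "$status"
}
```

Returning the job's exit status lets a scheduler or CI step distinguish a failed run from a completed one.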


Deployment Patterns

Minimum Viable (SQLite + Local Storage)

Ideal for a single developer or small team evaluating the platform.

# Only the Control Plane needs to be running for API access.
# The Governance and Training images are started on demand.
docker compose up control-plane

Production (PostgreSQL + Redis + S3)

The recommended starting point for any production deployment.

# Bring up the Control Plane and database
docker compose up control-plane database

# Bring Governance up when needed
docker compose up governance

# Trigger a training job — start, complete, then stop
docker compose up training
docker compose stop training

Enterprise HA

For high-availability deployments with your existing cloud or on-premise infrastructure, Kubernetes manifests and autoscaling configuration are available as part of the enterprise onboarding package. Contact sales to discuss your deployment architecture.


Key Environment Variables

The platform is configured entirely through environment variables — no config files to manage in production. Each adapter category can be swapped independently — moving from SQLite MVP to a Postgres + Redis production stack is a handful of environment variable changes with zero code modifications.

Adapter selection by deployment tier:

Adapter                               SQLite MVP   Postgres + Redis       Enterprise HA
Database (DB_TYPE)                    SQLITE       POSTGRES               POSTGRES / MSSQL
Auth / SSO (AUTH_ADAPTER)             INTERNAL     OAUTH                  OAUTH / SAML
Logging (LOGGING_ADAPTER)             CONSOLE      DATADOG / CLOUDWATCH   SPLUNK / DATADOG
Secrets (SECRETS_ADAPTER)             ENV          AWS / AZURE            AWS / AZURE / VAULT
KMS (KMS_ADAPTER)                     NONE         POSTGRES               AWS_KMS / AZURE_KEYVAULT / VAULT
Ledger (LEDGER_ADAPTER)               POSTGRES     S3_WORM                S3_WORM / BLOCKCHAIN
Cache (CACHE_ADAPTER)                 IN_MEMORY    REDIS                  REDIS
LoRA Storage (LORA_STORAGE_ADAPTER)   LOCAL        S3                     S3
Inference (INFERENCE_ENGINE)          sglang       sglang / vllm          sglang / vllm
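
As an illustration of the tier switch, the exports below pick one value per adapter from the Postgres + Redis column of the table above. Treat this as a sketch: it assumes your compose file forwards these variables to the containers, and it omits connection strings and credentials because their variable names are not covered in this guide.

```shell
# One value per adapter, taken from the "Postgres + Redis" column.
export DB_TYPE=POSTGRES
export AUTH_ADAPTER=OAUTH
export LOGGING_ADAPTER=DATADOG
export SECRETS_ADAPTER=AWS
export KMS_ADAPTER=POSTGRES
export LEDGER_ADAPTER=S3_WORM
export CACHE_ADAPTER=REDIS
export LORA_STORAGE_ADAPTER=S3
export INFERENCE_ENGINE=sglang

# Print the resulting selection for a quick check before starting the stack.
env | grep -E '^(DB_TYPE|AUTH_ADAPTER|LOGGING_ADAPTER|SECRETS_ADAPTER|KMS_ADAPTER|LEDGER_ADAPTER|CACHE_ADAPTER|LORA_STORAGE_ADAPTER|INFERENCE_ENGINE)=' | sort
```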

Below are the most impactful variables to understand when standing up a self-hosted instance.

Need the full reference?

The complete list of all environment variables, validation rules, conditional requirements, and default values is available to enterprise customers. Contact our sales team for the full configuration guide.


Control Plane

DB_TYPE

Values: SQLITE | POSTGRES | MSSQL

Controls which database adapter the platform uses. SQLITE requires zero infrastructure and is ideal for local development or single-node evaluation. POSTGRES is the recommended choice for any production or multi-node deployment and unlocks connection pooling, read replicas, and SSL. MSSQL is available for organizations standardized on SQL Server.

Switching databases is a single environment variable change — no code modifications required.


AUTH_ADAPTER

Values: INTERNAL | OAUTH | SAML

The platform ships with its own user management (INTERNAL) and can be wired into your existing identity provider without any code changes. Set OAUTH and provide your provider's client credentials to enable SSO via Azure AD, Okta, Google Workspace, or any OAuth 2.0-compatible IdP. Set SAML for enterprise federations that require SAML 2.0. Both SSO modes support automatic user provisioning and role mapping from your IdP.

This means your users log in with the credentials they already have, and your IT team controls access from the directory they already manage.


LOGGING_ADAPTER

Values: CONSOLE | SPLUNK | DATADOG | CLOUDWATCH

Routes all platform logs to your existing observability stack. CONSOLE writes structured JSON to stdout, which works with any log aggregator that reads container output. SPLUNK, DATADOG, and CLOUDWATCH push logs directly to those platforms with no sidecar needed. Your security and ops teams see Trinitite platform activity alongside all other systems they already monitor.


SECRETS_ADAPTER

Values: ENV | AWS | AZURE | VAULT

Controls how the platform retrieves secrets at runtime. ENV reads them from environment variables directly, which is the simplest path for teams using Kubernetes Secrets or Docker secrets. AWS integrates with AWS Secrets Manager, AZURE with Azure Key Vault, and VAULT with HashiCorp Vault. Using your existing secrets infrastructure means credentials never touch the container image and rotate without a redeploy.


KMS_ADAPTER

Values: POSTGRES | AWS_KMS | VAULT | AZURE_KEYVAULT | NONE

The platform encrypts sensitive data at rest using envelope encryption. POSTGRES stores encrypted data keys in your database (lowest operational overhead). AWS_KMS, AZURE_KEYVAULT, and VAULT delegate key management to your existing KMS infrastructure, which is typically required for SOC 2 Type II or ISO 27001 compliance.


LEDGER_ADAPTER

Values: POSTGRES | S3_WORM | SPLUNK | BLOCKCHAIN

Every governance decision the platform makes is appended to an immutable audit ledger. POSTGRES uses append-only tables and is sufficient for most deployments. S3_WORM writes to an S3 bucket with Object Lock enabled, providing regulatory-grade immutability for financial services, healthcare, or other highly regulated use cases. SPLUNK routes the ledger into your existing SIEM. The ledger is how you demonstrate to auditors exactly what every Guardian decided and why.


CACHE_ADAPTER

Values: IN_MEMORY | REDIS

IN_MEMORY works out of the box for single-instance deployments. REDIS is required when running multiple Control Plane instances behind a load balancer so that cached sessions, rate limit counters, and auth tokens are shared across instances. Providing your own Redis also means you control eviction policies and retention.


LORA_STORAGE_ADAPTER

Values: LOCAL | S3

Trained Guardian LoRA adapters need to be accessible by the Governance image. LOCAL writes adapters to a mounted volume, which works for single-host setups. S3 stores adapters in a bucket that both the Training and Governance images can access independently, so the Training image can produce a new adapter and the Governance image can pick it up without any coordination, even when the two run on separate hosts.


RATE_LIMIT_ADAPTER

Values: IN_MEMORY | REDIS

Rate limiting protects your platform from runaway clients and cost spikes. IN_MEMORY enforces limits per instance. REDIS shares counters across all instances, which is necessary for accurate enforcement in horizontally scaled deployments.
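
Because both CACHE_ADAPTER and RATE_LIMIT_ADAPTER need REDIS once you run more than one Control Plane instance, a preflight check can catch a mismatch before rollout. This is a hypothetical sketch: CONTROL_PLANE_REPLICAS is an illustrative variable rather than a platform setting, and treating an unset adapter as IN_MEMORY is an assumption.

```shell
# Hypothetical preflight check for horizontally scaled Control Plane deployments.
check_shared_state_adapters() {
  replicas="${CONTROL_PLANE_REPLICAS:-1}"
  if [ "$replicas" -le 1 ]; then
    return 0   # single instance: per-instance state is fine
  fi

  rc=0
  if [ "${CACHE_ADAPTER:-IN_MEMORY}" != "REDIS" ]; then
    echo "CACHE_ADAPTER must be REDIS when running $replicas instances" >&2
    rc=1
  fi
  if [ "${RATE_LIMIT_ADAPTER:-IN_MEMORY}" != "REDIS" ]; then
    echo "RATE_LIMIT_ADAPTER must be REDIS when running $replicas instances" >&2
    rc=1
  fi
  return "$rc"
}
```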


MFA_REQUIRED

Values: true | false

When true, all users must enroll in TOTP-based multi-factor authentication before accessing the platform. MFA_REQUIRED_FOR_ADMINS is a separate control that enforces MFA for admin accounts regardless of the platform-wide setting — useful for meeting compliance requirements without forcing MFA on every analyst.
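
The way the two flags combine can be written out as a small decision function. This is one reading of the description above, not the platform's implementation, and the role names admin and analyst are illustrative.

```shell
# Returns 0 when a user with the given role must enroll in MFA, 1 otherwise.
mfa_required_for_role() {
  role="$1"
  if [ "${MFA_REQUIRED:-false}" = "true" ]; then
    return 0   # platform-wide setting: every user enrolls
  fi
  if [ "$role" = "admin" ] && [ "${MFA_REQUIRED_FOR_ADMINS:-false}" = "true" ]; then
    return 0   # admin-only enforcement, independent of MFA_REQUIRED
  fi
  return 1
}
```

With MFA_REQUIRED=false and MFA_REQUIRED_FOR_ADMINS=true, the function succeeds for admin and fails for analyst, which matches the compliance scenario described above.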


CORS_ORIGINS

Values: comma-separated origins, or *

Controls which origins are allowed to make API requests. Locking this down to your application domains is a one-line change that prevents unauthorized cross-origin access in production.
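
For example, a production value and a guard against an accidental wildcard might look like the following. The origins are placeholders and the helper function is hypothetical; it only inspects the string and knows nothing about how the platform parses the variable.

```shell
# Placeholder origins; substitute your own application domains.
export CORS_ORIGINS="https://app.example.com,https://admin.example.com"

# Succeeds only if no entry in the comma-separated list is a bare "*".
cors_is_locked_down() {
  case ",$1," in
    *",*,"*) return 1 ;;   # a wildcard entry is present
    *)       return 0 ;;
  esac
}

cors_is_locked_down "$CORS_ORIGINS" || echo "CORS_ORIGINS contains a wildcard" >&2
```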


Governance (Inference Service)

INFERENCE_ENGINE

Values: sglang | vllm

The underlying inference runtime. sglang is the default and delivers the highest throughput for governance workloads, with native support for LoRA adapter hot-swapping and structured output. vllm is available as an alternative if your team has existing operational familiarity with it. Both engines expose an identical API surface — your application code does not change when you switch.


GPU_PROVIDER (Governance)

Values: local_cuda | runpod | nebius | aws_ec2 | gcp_compute | azure_vm

Where the GPU compute lives. local_cuda uses whatever GPU is in the host machine. The cloud provider options provision GPU instances on demand via the respective APIs — the platform handles instance lifecycle, so you can scale governance capacity up and down from a single variable without managing VM automation yourself.


ENABLE_LORA

Values: true | false

Enables dynamic LoRA adapter loading. When true, the Governance image can serve multiple Guardians simultaneously, swapping adapters per request without reloading the base model. This is what makes it possible for a single GPU instance to enforce different governance policies for different tenants, products, or risk profiles at the same time.


ENABLE_DETERMINISTIC_INFERENCE

Values: true | false

When true, the platform seeds the inference engine so that identical inputs produce identical outputs. This is critical for audit reproducibility — if a governance decision is challenged, you can re-run the exact same inference and get the exact same result. It also makes testing significantly more reliable.
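
A simple way to spot-check reproducibility is to issue the same request twice and compare the results. The helper below is a generic sketch with a hypothetical name: it runs whatever command you pass it twice and succeeds only if both outputs are identical. The request command itself is left to you, since the exact endpoint and payload are not covered here.

```shell
# Runs the given command twice; returns 0 if both outputs match,
# 1 if they differ, and 2 if the command itself fails.
outputs_match() {
  first="$("$@")"  || return 2
  second="$("$@")" || return 2
  [ "$first" = "$second" ]
}

# Usage sketch (the script name is a placeholder for your own request command):
#   outputs_match ./send_governed_request.sh || echo "outputs differ" >&2
```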


Training Service

TRAINING_ENGINE

Values: unsloth | peft | trl

The fine-tuning library used to train Guardians. unsloth is the default and provides the best throughput-per-dollar through kernel-level optimizations and 4-bit quantization, making it possible to train on consumer-grade GPUs. peft and trl are available for teams with existing training infrastructure or workflows built around those libraries.


GPU_PROVIDER (Training)

Values: local_cuda | runpod | nebius | aws_ec2 | gcp_compute | azure_vm

Same adapter pattern as the Governance image. For training, the cloud provider options are particularly valuable because you only pay for GPU time while a job is actually running. A typical LoRA fine-tuning job completes in under two hours on a single A100, so serverless GPU platforms like RunPod or Nebius often cost less than a dollar per Guardian update.


STORAGE_BACKEND (Training)

Values: local | s3 | huggingface | azure | gcs | minio

Where trained adapters and checkpoints are written after a job completes. Using a shared object store (S3, GCS, Azure Blob, or MinIO for on-premise) means the Training image can be destroyed immediately after a job completes and the Governance image can pull the new adapter independently.


DATA_GEN_MODEL

Default: accounts/fireworks/models/qwen3-235b-a22b-thinking-2507

The large language model used by the Teleological Generator to produce synthetic training data for your Guardians. Pointing this at your own hosted model or a different provider API gives you full control over training data quality and cost, and ensures that synthetic data generation never exits your security boundary if data residency is a requirement.


WANDB_ENABLED

Values: true | false

When true, training runs emit metrics to Weights & Biases for experiment tracking. This gives your ML team a centralized view of every Guardian training run — loss curves, hyperparameters, dataset statistics, and adapter performance — without any additional instrumentation.


Full Configuration Reference

The variables above represent the most commonly configured options for a self-hosted deployment. The platform has a significantly deeper configuration surface covering connection pool tuning, circuit breaker thresholds, advanced LoRA hyperparameters, multi-region replication, compliance logging controls, and more.

The full environment variable reference is available to enterprise customers.

To get the complete configuration guide, discuss volume pricing, or talk through a deployment architecture for your organization, reach out to our sales team.


Where to go next

Self-hosting decisions almost always pair with a specific surface — how the LLM Proxy is configured, how the MCP Gateway routes upstreams, how the Glass Box Ledger is backed. Every deep-dive covers the deployment-specific details for its surface: