
Services

Deploy, scale, and manage production HTTP services, APIs, and ML models.

Outpost Services turns any HTTP application — REST APIs, web apps, ML model servers — into a managed, autoscaling service.

You provide the command and resource requirements. Outpost handles provisioning, load balancing, TLS, scaling, and monitoring.

Key features

  • Autoscaling — traffic-aware scaling powered by the CoDel algorithm. Targets P99 latency under configurable thresholds. Scales from zero to many replicas automatically.
  • Custom domains and TLS — attach your own domains with automatic Let's Encrypt certificates.
  • Load balancing — production-grade traffic routing via Pingora, distributing requests across all healthy replicas with least-connections routing.
  • Multi-cloud — deploy across AWS, Azure, and DigitalOcean. Run replicas in multiple regions.
  • Scale to zero — when no traffic arrives, replicas scale down to zero. You pay nothing until the next request.
  • Observability — real-time logs, replica health monitoring, and request metrics in the dashboard.
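To make the autoscaling behavior concrete, here is a minimal sketch of a CoDel-style scaling decision: scale up while observed P99 latency sits above the target, and step down toward zero when the queue drains. The function name, thresholds, and replica cap are illustrative, not Outpost's actual controller.

```python
# Illustrative sketch of a latency-targeting autoscaling decision,
# in the spirit of the CoDel-based controller described above.
# All names and thresholds here are hypothetical.
def desired_replicas(current: int, p99_latency_ms: float, queue_depth: int,
                     target_ms: float = 500.0, max_replicas: int = 10) -> int:
    """Return the replica count for the next evaluation interval."""
    if queue_depth == 0 and current > 0:
        return current - 1                     # idle: step toward scale-to-zero
    if p99_latency_ms > target_ms:
        return min(current + 1, max_replicas)  # over target: add a replica
    return current                             # within target: hold steady

# Over-target latency triggers a scale-up; an empty queue steps down.
assert desired_replicas(2, 900.0, queue_depth=40) == 3
assert desired_replicas(3, 120.0, queue_depth=0) == 2
assert desired_replicas(0, 900.0, queue_depth=5) == 1   # scale from zero
```

Note the scale-from-zero path: a request arriving at an idle service queues up, which drives the first replica launch.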

Quick start

# Deploy a service
outpost serve launch --name llama-api --cloud aws --region us-east-1 --gpus A100 --port 8080

# Check status
outpost serve status llama-api

# List all services
outpost serve list

# Delete
outpost serve delete llama-api
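The command you launch can be any HTTP server listening on the configured port. As a minimal sketch, here is a stdlib-only Python service on port 8080 (matching the `--port` above) with a `/healthz` readiness endpoint; the probe path is hypothetical, so use whatever path your readiness probe is configured for.

```python
# Minimal HTTP service of the kind `outpost serve launch` could run.
# The /healthz readiness path is an illustrative assumption.
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"                    # readiness: replica can take traffic
        else:
            body = b"hello from a replica"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```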

Architecture

Under the hood, each service is backed by an orchestration engine that manages the full lifecycle:

  1. Provisioning — Outpost spins up instances on the target cloud, installs dependencies, starts your application.
  2. Health checking — the readiness probe is polled continuously. Replicas only receive traffic once healthy.
  3. Load balancing — Pingora distributes requests across healthy replicas using least-connections routing.
  4. Autoscaling — the CoDel-based controller monitors request queues and latency, adjusting replica count every 60 seconds.
  5. Recovery — failed replicas are automatically replaced. Outpost maintains the minimum replica count.
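Step 3 above can be sketched in a few lines: pick the healthy replica with the fewest in-flight requests. This is a stand-in for Pingora's real balancer, and the replica names are hypothetical.

```python
# Least-connections routing sketch: route each request to the healthy
# replica currently handling the fewest in-flight requests.
def pick_replica(inflight: dict[str, int], healthy: set[str]) -> str:
    """Return the name of the least-loaded healthy replica."""
    candidates = {name: load for name, load in inflight.items() if name in healthy}
    if not candidates:
        raise RuntimeError("no healthy replicas")
    return min(candidates, key=candidates.get)

inflight = {"replica-a": 7, "replica-b": 2, "replica-c": 4}
# replica-b failed its health check, so the next-least-loaded replica wins.
assert pick_replica(inflight, healthy={"replica-a", "replica-c"}) == "replica-c"
```

Because unhealthy replicas are filtered out before selection, a replica that fails its readiness probe stops receiving traffic immediately, which is what lets step 5's recovery replace it without dropped requests.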

Use cases

  Use case          Example
  ----------------  ---------------------------------------------------------------------
  Model serving     Deploy LLMs with vLLM, TGI, or Triton behind an OpenAI-compatible API
  REST APIs         Production FastAPI, Express, or Go services with autoscaling
  Web applications  Full-stack apps with custom domains and SSL
  Batch inference   On-demand GPU endpoints that scale to zero when idle

Next steps

Continue to Deploy a Service to launch your first service.