
Services

Deploy, scale, and manage production HTTP services, APIs, and ML models.

Outpost Services turns any HTTP application — REST APIs, web apps, ML model servers — into a managed, autoscaling service.

You provide the command and resource requirements. Outpost handles provisioning, load balancing, TLS, scaling, and monitoring.

Key features

  • Autoscaling — traffic-aware scaling powered by the CoDel algorithm. Targets P99 latency under configurable thresholds. Scales from zero to many replicas automatically.
  • Custom domains and TLS — attach your own domains with automatic Let's Encrypt certificates.
  • Load balancing — production-grade traffic routing via Pingora, distributing requests across all healthy replicas with least-connections routing.
  • Multi-cloud — deploy across AWS, Azure, and DigitalOcean. Run replicas in multiple regions.
  • Scale to zero — when no traffic arrives, replicas scale down to zero. You pay nothing until the next request.
  • Observability — real-time logs, replica health monitoring, and request metrics in the dashboard.
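To make the autoscaling behavior concrete, here is a minimal sketch of a CoDel-style scaling decision: scale up while observed P99 latency sits above the target, and step down toward zero when the queue drains. The function name, thresholds, and replica cap are illustrative, not Outpost's actual controller.

```python
# Illustrative sketch of a latency-targeting autoscaling decision,
# in the spirit of the CoDel-based controller described above.
# All names and thresholds here are hypothetical.
def desired_replicas(current: int, p99_latency_ms: float, queue_depth: int,
                     target_ms: float = 500.0, max_replicas: int = 10) -> int:
    """Return the replica count for the next evaluation interval."""
    if queue_depth == 0 and current > 0:
        return current - 1                     # idle: step toward scale-to-zero
    if p99_latency_ms > target_ms:
        return min(current + 1, max_replicas)  # over target: add a replica
    return current                             # within target: hold steady

# Over-target latency triggers a scale-up; an empty queue steps down.
assert desired_replicas(2, 900.0, queue_depth=40) == 3
assert desired_replicas(3, 120.0, queue_depth=0) == 2
assert desired_replicas(0, 900.0, queue_depth=5) == 1   # scale from zero
```

Note the scale-from-zero path: a request arriving at an idle service queues up, which drives the first replica launch.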

Quick start

# Deploy a service
outpost serve launch --name llama-api --cloud aws --region us-east-1 --gpus A100 --port 8080

# Check status
outpost serve status llama-api

# List all services
outpost serve list

# Delete
outpost serve delete llama-api
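The command you launch can be any HTTP server listening on the configured port. As a minimal sketch, here is a stdlib-only Python service on port 8080 (matching the `--port` above) with a `/healthz` readiness endpoint; the probe path is hypothetical, so use whatever path your readiness probe is configured for.

```python
# Minimal HTTP service of the kind `outpost serve launch` could run.
# The /healthz readiness path is an illustrative assumption.
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"                    # readiness: replica can take traffic
        else:
            body = b"hello from a replica"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```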

Architecture

Under the hood, each service is backed by an orchestration engine that manages the full lifecycle:

  1. Provisioning — Outpost spins up instances on the target cloud, installs dependencies, starts your application.
  2. Health checking — the readiness probe is polled continuously. Replicas only receive traffic once healthy.
  3. Load balancing — Pingora distributes requests across healthy replicas using least-connections routing.
  4. Autoscaling — the CoDel-based controller monitors request queues and latency, adjusting replica count every 60 seconds.
  5. Recovery — failed replicas are automatically replaced. Outpost maintains the minimum replica count.
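Step 3 above can be sketched in a few lines: pick the healthy replica with the fewest in-flight requests. This is a stand-in for Pingora's real balancer, and the replica names are hypothetical.

```python
# Least-connections routing sketch: route each request to the healthy
# replica currently handling the fewest in-flight requests.
def pick_replica(inflight: dict[str, int], healthy: set[str]) -> str:
    """Return the name of the least-loaded healthy replica."""
    candidates = {name: load for name, load in inflight.items() if name in healthy}
    if not candidates:
        raise RuntimeError("no healthy replicas")
    return min(candidates, key=candidates.get)

inflight = {"replica-a": 7, "replica-b": 2, "replica-c": 4}
# replica-b failed its health check, so the next-least-loaded replica wins.
assert pick_replica(inflight, healthy={"replica-a", "replica-c"}) == "replica-c"
```

Because unhealthy replicas are filtered out before selection, a replica that fails its readiness probe stops receiving traffic immediately, which is what lets step 5's recovery replace it without dropped requests.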

Use cases

  Use case          Example
  ----------------  ---------------------------------------------------------------------
  Model serving     Deploy LLMs with vLLM, TGI, or Triton behind an OpenAI-compatible API
  REST APIs         Production FastAPI, Express, or Go services with autoscaling
  Web applications  Full-stack apps with custom domains and SSL
  Batch inference   On-demand GPU endpoints that scale to zero when idle

Next steps

Continue to Deploy a Service to launch your first service.