Services

Deploy, scale, and manage production inference and application services

Services are auto-scaling HTTP endpoints with load balancing, health checks, custom domains, and scale-to-zero. Read operations use `/{namespace}/services` paths; write operations go through `/auth/v1/seed/`.

List services

GET `/{namespace}/services`

Retrieve all services in a namespace.

Path parameters


namespace
type: string
required
The namespace (user or organization) to list services for.

Query parameters


status
type: string
Filter by status. One of `deploying`, `running`, `degraded`, `stopped`, `failed`.
page
type: integer
Page number for pagination.
default: 1
per_page
type: integer
Number of results per page. Maximum `100`.
default: 20

Request

curl "https://outpost.run/acme/services" -H "Authorization: Bearer <access_token>"

Response 200

{ "data": [ { "id": "svc_8f3a2b1c4d5e", "name": "llama-3-serving", "namespace": "acme", "status": "running", "gpu": "A100-80GB", "gpu_count": 1, "current_replicas": 3, "ready_replicas": 3, "min_replicas": 2, "max_replicas": 8, "endpoint": "https://llama-3-serving.acme.outpost.run", "region": "us-east-1", "cost_per_hour_per_replica": "3.20", "created_at": "2026-03-18T10:00:00Z", "updated_at": "2026-03-18T12:15:00Z" }, { "id": "svc_2c4d6e8f1a3b", "name": "embedding-api", "namespace": "acme", "status": "running", "gpu": "L4", "gpu_count": 1, "current_replicas": 1, "ready_replicas": 1, "min_replicas": 0, "max_replicas": 4, "endpoint": "https://embedding-api.acme.outpost.run", "region": "us-east-1", "cost_per_hour_per_replica": "0.80", "created_at": "2026-02-20T14:30:00Z", "updated_at": "2026-03-18T11:00:00Z" } ], "pagination": { "page": 1, "per_page": 20, "total": 2, "total_pages": 1 } }

Get a service

GET `/{namespace}/service/{name}`

Retrieve details for a single service, including current scaling state and deployment information.

Path parameters


namespace
type: string
required
The namespace (user or organization) the service belongs to.
name
type: string
required
The service name (e.g., `llama-3-serving`).

Request

curl https://outpost.run/acme/service/llama-3-serving -H "Authorization: Bearer <access_token>"

Response 200

{ "id": "svc_8f3a2b1c4d5e", "name": "llama-3-serving", "namespace": "acme", "status": "running", "repo": "acme/inference-engine", "branch": "main", "commit_sha": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2", "gpu": "A100-80GB", "gpu_count": 1, "cpu": 16, "memory_gb": 64, "min_replicas": 2, "max_replicas": 8, "target_gpu_utilization": 65, "current_replicas": 3, "ready_replicas": 3, "port": 8080, "health_check_path": "/v1/health", "endpoint": "https://llama-3-serving.acme.outpost.run", "custom_domain": null, "region": "us-east-1", "cost_per_hour_per_replica": "3.20", "metrics": { "requests_per_second": 142.5, "avg_latency_ms": 380, "p99_latency_ms": 920, "gpu_utilization_percent": 72, "error_rate_percent": 0.02 }, "created_at": "2026-03-18T10:00:00Z", "updated_at": "2026-03-18T12:15:00Z" }

Get service logs

GET `/{namespace}/service/{name}/logs`

Retrieve logs for a service.

Path parameters


namespace
type: string
required
The namespace (user or organization) the service belongs to.
name
type: string
required
The service name.

Query parameters


tail
type: integer
Number of most recent log lines to return. Maximum `10000`.
default: 100
since
type: string
Return logs after this ISO 8601 timestamp. For example, `2026-03-18T12:00:00Z`.

Request

curl "https://outpost.run/acme/service/llama-3-serving/logs?tail=50" -H "Authorization: Bearer <access_token>"

Response 200

{ "name": "llama-3-serving", "namespace": "acme", "lines": [ { "timestamp": "2026-03-18T12:00:00Z", "message": "Service started successfully" } ], "has_more": false }

Get service analytics

GET `/{namespace}/service/{name}/analytics`

Retrieve analytics and metrics for a service.

Path parameters


namespace
type: string
required
The namespace (user or organization) the service belongs to.
name
type: string
required
The service name.

Request

curl https://outpost.run/acme/service/llama-3-serving/analytics -H "Authorization: Bearer <access_token>"

Response 200

{ "name": "llama-3-serving", "namespace": "acme", "requests_per_second": 142.5, "avg_latency_ms": 380, "p99_latency_ms": 920, "gpu_utilization_percent": 72, "current_replicas": 3, "ready_replicas": 3 }

Create a service

POST `/auth/v1/seed/{namespace}/services`

Deploy a new production service from a repository or container image.

Path parameters


namespace
type: string
required
The namespace (user or organization) to deploy the service in.

Body parameters


name
type: string
required
A unique name for the service within the namespace. Allowed characters: alphanumeric, hyphens, underscores.
repo
type: string
Full repository name (`namespace/repo`) to deploy from. Either `repo` or `image` is required.
image
type: string
Container image to deploy. Either `repo` or `image` is required.
branch
type: string
Branch to deploy when using `repo`. Pushes to this branch trigger automatic redeployments.
default: main
gpu
type: string
required
GPU type for each replica. One of `A100-40GB`, `A100-80GB`, `H100-80GB`, `L4`, `T4`, `none`.
gpu_count
type: integer
Number of GPUs per replica. Must be `1`, `2`, `4`, or `8`.
default: 1
cpu
type: integer
Number of vCPUs per replica. One of `2`, `4`, `8`, `16`, `32`, `64`.
default: 4
memory_gb
type: integer
Memory per replica in gigabytes.
default: 16
min_replicas
type: integer
Minimum number of replicas. Set to `0` to enable scale-to-zero.
default: 1
max_replicas
type: integer
Maximum number of replicas for autoscaling.
default: 1
target_gpu_utilization
type: integer
Target GPU utilization percentage for autoscaling. Range: `10` to `95`.
default: 70
port
type: integer
Port the service listens on inside the container.
default: 8080
health_check_path
type: string
HTTP path for health checks. The service must return a 200 on this path.
default: /health
env
type: object
Environment variables as key-value pairs.
region
type: string
Deployment region. One of `us-east-1`, `us-west-2`, `eu-west-1`, `eu-central-1`, `ap-northeast-1`.
default: us-east-1
custom_domain
type: string
Custom domain to route traffic to this service. You must configure a CNAME record pointing to your service endpoint.

Request

curl -X POST https://outpost.run/auth/v1/seed/acme/services -H "Authorization: Bearer <access_token>" -H "Content-Type: application/json" -d '{ "name": "llama-3-serving", "repo": "acme/inference-engine", "branch": "main", "gpu": "A100-80GB", "gpu_count": 1, "cpu": 16, "memory_gb": 64, "min_replicas": 2, "max_replicas": 8, "target_gpu_utilization": 65, "port": 8080, "health_check_path": "/v1/health", "env": { "MODEL_ID": "meta-llama/Meta-Llama-3.1-70B-Instruct", "MAX_BATCH_SIZE": "32", "QUANTIZATION": "awq" }, "region": "us-east-1" }'

Response 201

{ "id": "svc_8f3a2b1c4d5e", "name": "llama-3-serving", "namespace": "acme", "status": "deploying", "repo": "acme/inference-engine", "branch": "main", "commit_sha": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2", "gpu": "A100-80GB", "gpu_count": 1, "cpu": 16, "memory_gb": 64, "min_replicas": 2, "max_replicas": 8, "target_gpu_utilization": 65, "current_replicas": 0, "ready_replicas": 0, "port": 8080, "health_check_path": "/v1/health", "endpoint": "https://llama-3-serving.acme.outpost.run", "region": "us-east-1", "cost_per_hour_per_replica": "3.20", "created_at": "2026-03-18T10:00:00Z", "updated_at": "2026-03-18T10:00:00Z" }

You can also deploy a service from a container image using an API key:

curl -X POST https://outpost.run/auth/v1/seed/acme/services -H "Authorization: API-Key my-key$sk_live_a1b2c3d4e5f6g7h8i9j0" -H "Content-Type: application/json" -d '{ "name": "embedding-api", "image": "ghcr.io/acme/embedding-server:v2.1.0", "gpu": "L4", "cpu": 8, "memory_gb": 32, "min_replicas": 0, "max_replicas": 4, "target_gpu_utilization": 70, "port": 8080, "region": "us-east-1" }'
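Several of the body constraints above (exactly one of `repo` or `image`, the allowed `gpu_count` values, the `target_gpu_utilization` range) can be checked client-side before the request is sent. A hypothetical Python helper sketching that validation:

```python
VALID_GPU_COUNTS = {1, 2, 4, 8}

def build_service_payload(name, gpu, repo=None, image=None, gpu_count=1,
                          target_gpu_utilization=70, **options):
    """Assemble a create-service body, enforcing the documented constraints
    before the request ever leaves the client."""
    if (repo is None) == (image is None):
        raise ValueError("Provide exactly one of 'repo' or 'image'.")
    if gpu_count not in VALID_GPU_COUNTS:
        raise ValueError("gpu_count must be 1, 2, 4, or 8.")
    if not 10 <= target_gpu_utilization <= 95:
        raise ValueError("target_gpu_utilization must be between 10 and 95.")
    payload = {"name": name, "gpu": gpu, "gpu_count": gpu_count,
               "target_gpu_utilization": target_gpu_utilization, **options}
    payload["repo" if repo else "image"] = repo or image
    return payload
```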

Update a service

PUT `/auth/v1/seed/{namespace}/services/{name}`

Update the configuration and autoscaling parameters for a running service.

Path parameters


namespace
type: string
required
The namespace (user or organization) the service belongs to.
name
type: string
required
The service name.

Body parameters


min_replicas
type: integer
Minimum number of replicas. Set to `0` to enable scale-to-zero.
max_replicas
type: integer
Maximum number of replicas.
target_gpu_utilization
type: integer
Target GPU utilization percentage for autoscaling. Range: `10` to `95`.
env
type: object
Replace all environment variables with the provided key-value pairs. Triggers a rolling redeployment.
branch
type: string
Change the deployment branch. Triggers an immediate redeployment from the new branch.
custom_domain
type: string
Set or update a custom domain. Pass `null` to remove.

Request

curl -X PUT https://outpost.run/auth/v1/seed/acme/services/llama-3-serving -H "Authorization: Bearer <access_token>" -H "Content-Type: application/json" -d '{ "min_replicas": 4, "max_replicas": 16, "target_gpu_utilization": 60 }'

Response 200

{ "id": "svc_8f3a2b1c4d5e", "name": "llama-3-serving", "namespace": "acme", "status": "running", "min_replicas": 4, "max_replicas": 16, "target_gpu_utilization": 60, "current_replicas": 4, "ready_replicas": 3, "message": "Scaling up to meet new minimum replica count.", "updated_at": "2026-03-18T14:00:00Z" }

Updating environment variables

Updating `env` triggers a rolling redeployment: the old replicas continue serving traffic until the new ones are healthy.

curl -X PUT https://outpost.run/auth/v1/seed/acme/services/llama-3-serving -H "Authorization: Bearer <access_token>" -H "Content-Type: application/json" -d '{ "env": { "MODEL_ID": "meta-llama/Meta-Llama-3.1-70B-Instruct", "MAX_BATCH_SIZE": "64", "QUANTIZATION": "gptq" } }'

Redeploy a service

POST `/auth/v1/seed/{namespace}/services/{name}/redeploy`

Force a redeployment of the service using the latest commit on the configured branch.

Path parameters


namespace
type: string
required
The namespace (user or organization) the service belongs to.
name
type: string
required
The service name.

Request

curl -X POST https://outpost.run/auth/v1/seed/acme/services/llama-3-serving/redeploy -H "Authorization: Bearer <access_token>"

Response 200

{ "id": "svc_8f3a2b1c4d5e", "name": "llama-3-serving", "namespace": "acme", "status": "deploying", "commit_sha": "d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5", "message": "Rolling redeployment initiated. Previous replicas will serve traffic until new ones are ready.", "updated_at": "2026-03-18T15:00:00Z" }

Delete a service

DELETE `/auth/v1/seed/{namespace}/services/{name}`

Delete a service and terminate all running replicas. Requests to the service endpoint immediately return `503`.

Path parameters


namespace
type: string
required
The namespace (user or organization) the service belongs to.
name
type: string
required
The service name.

Query parameters


force
type: boolean
Skip the graceful drain period and terminate replicas immediately.
default: false

Request

curl -X DELETE https://outpost.run/auth/v1/seed/acme/services/llama-3-serving -H "Authorization: Bearer <access_token>"

Response 204

Returns an empty response body on success.


Error responses

All service endpoints may return the following errors:

Status Description
400 Bad request -- invalid parameters or configuration
401 Unauthorized -- missing or invalid credentials
403 Forbidden -- insufficient permissions
404 Not found -- service does not exist
409 Conflict -- service is in an incompatible state (e.g., already deploying)
422 Unprocessable entity -- invalid configuration combination
429 Rate limit exceeded -- too many requests
500 Internal server error -- Seed orchestration failure

Errors include a JSON body with a machine-readable `code`, a human-readable `message`, and a `request_id` for support:

{ "error": { "code": "invalid_request", "message": "Either 'repo' or 'image' must be provided.", "request_id": "req_5a6b7c8d9e0f" } }
