Services

Deploy, scale, and manage production inference and application services

Services are auto-scaling HTTP endpoints with load balancing, health checks, custom domains, and scale-to-zero. Read operations use `/{namespace}/services` paths; write operations go through `/auth/v1/seed/`.

List services

GET `/{namespace}/services`

Retrieve all services in a namespace.

Path parameters


namespace
type: string
required
The namespace (user or organization) to list services for.

Query parameters


status
type: string
Filter by status. One of `deploying`, `running`, `degraded`, `stopped`, `failed`.
page
type: integer
Page number for pagination.
default: 1
per_page
type: integer
Number of results per page. Maximum `100`.
default: 20

Request

curl "https://outpost.run/acme/services" -H "Authorization: Bearer <access_token>"

Response 200

{ "data": [ { "id": "svc_8f3a2b1c4d5e", "name": "llama-3-serving", "namespace": "acme", "status": "running", "gpu": "A100-80GB", "gpu_count": 1, "current_replicas": 3, "ready_replicas": 3, "min_replicas": 2, "max_replicas": 8, "endpoint": "https://llama-3-serving.acme.outpost.run", "region": "us-east-1", "cost_per_hour_per_replica": "3.20", "created_at": "2026-03-18T10:00:00Z", "updated_at": "2026-03-18T12:15:00Z" }, { "id": "svc_2c4d6e8f1a3b", "name": "embedding-api", "namespace": "acme", "status": "running", "gpu": "L4", "gpu_count": 1, "current_replicas": 1, "ready_replicas": 1, "min_replicas": 0, "max_replicas": 4, "endpoint": "https://embedding-api.acme.outpost.run", "region": "us-east-1", "cost_per_hour_per_replica": "0.80", "created_at": "2026-02-20T14:30:00Z", "updated_at": "2026-03-18T11:00:00Z" } ], "pagination": { "page": 1, "per_page": 20, "total": 2, "total_pages": 1 } }

Get a service

GET `/{namespace}/service/{name}`

Retrieve details for a single service, including current scaling state and deployment information.

Path parameters


namespace
type: string
required
The namespace (user or organization) the service belongs to.
name
type: string
required
The service name (e.g., `llama-3-serving`).

Request

curl https://outpost.run/acme/service/llama-3-serving -H "Authorization: Bearer <access_token>"

Response 200

{ "id": "svc_8f3a2b1c4d5e", "name": "llama-3-serving", "namespace": "acme", "status": "running", "repo": "acme/inference-engine", "branch": "main", "commit_sha": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2", "gpu": "A100-80GB", "gpu_count": 1, "cpu": 16, "memory_gb": 64, "min_replicas": 2, "max_replicas": 8, "target_gpu_utilization": 65, "current_replicas": 3, "ready_replicas": 3, "port": 8080, "health_check_path": "/v1/health", "endpoint": "https://llama-3-serving.acme.outpost.run", "custom_domain": null, "region": "us-east-1", "cost_per_hour_per_replica": "3.20", "metrics": { "requests_per_second": 142.5, "avg_latency_ms": 380, "p99_latency_ms": 920, "gpu_utilization_percent": 72, "error_rate_percent": 0.02 }, "created_at": "2026-03-18T10:00:00Z", "updated_at": "2026-03-18T12:15:00Z" }

Get service logs

GET `/{namespace}/service/{name}/logs`

Retrieve logs for a service.

Path parameters


namespace
type: string
required
The namespace (user or organization) the service belongs to.
name
type: string
required
The service name.

Query parameters


tail
type: integer
Number of most recent log lines to return. Maximum `10000`.
default: 100
since
type: string
Return logs after this ISO 8601 timestamp. For example, `2026-03-18T12:00:00Z`.

Request

curl "https://outpost.run/acme/service/llama-3-serving/logs?tail=50" -H "Authorization: Bearer <access_token>"

Response 200

{ "name": "llama-3-serving", "namespace": "acme", "lines": [ { "timestamp": "2026-03-18T12:00:00Z", "message": "Service started successfully" } ], "has_more": false }

Get service analytics

GET `/{namespace}/service/{name}/analytics`

Retrieve analytics and metrics for a service.

Path parameters


namespace
type: string
required
The namespace (user or organization) the service belongs to.
name
type: string
required
The service name.

Request

curl https://outpost.run/acme/service/llama-3-serving/analytics -H "Authorization: Bearer <access_token>"

Response 200

{ "name": "llama-3-serving", "namespace": "acme", "requests_per_second": 142.5, "avg_latency_ms": 380, "p99_latency_ms": 920, "gpu_utilization_percent": 72, "current_replicas": 3, "ready_replicas": 3 }

Create a service

POST `/auth/v1/seed/{namespace}/services`

Deploy a new production service from a repository or container image.

Path parameters


namespace
type: string
required
The namespace (user or organization) to deploy the service in.

Body parameters


name
type: string
required
A unique name for the service within the namespace. Allowed characters: alphanumeric, hyphens, underscores.
repo
type: string
Full repository name (`namespace/repo`) to deploy from. Either `repo` or `image` is required.
image
type: string
Container image to deploy. Either `repo` or `image` is required.
branch
type: string
Branch to deploy when using `repo`. Pushes to this branch trigger automatic redeployments.
default: main
gpu
type: string
required
GPU type for each replica. One of `A100-40GB`, `A100-80GB`, `H100-80GB`, `L4`, `T4`, `none`.
gpu_count
type: integer
Number of GPUs per replica. Must be `1`, `2`, `4`, or `8`.
default: 1
cpu
type: integer
Number of vCPUs per replica. One of `2`, `4`, `8`, `16`, `32`, `64`.
default: 4
memory_gb
type: integer
Memory per replica in gigabytes.
default: 16
min_replicas
type: integer
Minimum number of replicas. Set to `0` to enable scale-to-zero.
default: 1
max_replicas
type: integer
Maximum number of replicas for autoscaling.
default: 1
target_gpu_utilization
type: integer
Target GPU utilization percentage for autoscaling. Range: `10` to `95`.
default: 70
port
type: integer
Port the service listens on inside the container.
default: 8080
health_check_path
type: string
HTTP path for health checks. The service must return a 200 on this path.
default: /health
env
type: object
Environment variables as key-value pairs.
region
type: string
Deployment region. One of `us-east-1`, `us-west-2`, `eu-west-1`, `eu-central-1`, `ap-northeast-1`.
default: us-east-1
custom_domain
type: string
Custom domain to route traffic to this service. You must configure a CNAME record pointing to your service endpoint.

Request

curl -X POST https://outpost.run/auth/v1/seed/acme/services -H "Authorization: Bearer <access_token>" -H "Content-Type: application/json" -d '{ "name": "llama-3-serving", "repo": "acme/inference-engine", "branch": "main", "gpu": "A100-80GB", "gpu_count": 1, "cpu": 16, "memory_gb": 64, "min_replicas": 2, "max_replicas": 8, "target_gpu_utilization": 65, "port": 8080, "health_check_path": "/v1/health", "env": { "MODEL_ID": "meta-llama/Meta-Llama-3.1-70B-Instruct", "MAX_BATCH_SIZE": "32", "QUANTIZATION": "awq" }, "region": "us-east-1" }'

Response 201

{ "id": "svc_8f3a2b1c4d5e", "name": "llama-3-serving", "namespace": "acme", "status": "deploying", "repo": "acme/inference-engine", "branch": "main", "commit_sha": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2", "gpu": "A100-80GB", "gpu_count": 1, "cpu": 16, "memory_gb": 64, "min_replicas": 2, "max_replicas": 8, "target_gpu_utilization": 65, "current_replicas": 0, "ready_replicas": 0, "port": 8080, "health_check_path": "/v1/health", "endpoint": "https://llama-3-serving.acme.outpost.run", "region": "us-east-1", "cost_per_hour_per_replica": "3.20", "created_at": "2026-03-18T10:00:00Z", "updated_at": "2026-03-18T10:00:00Z" }

You can also deploy a service from a container image using an API key:

curl -X POST https://outpost.run/auth/v1/seed/acme/services -H "Authorization: API-Key my-key$sk_live_a1b2c3d4e5f6g7h8i9j0" -H "Content-Type: application/json" -d '{ "name": "embedding-api", "image": "ghcr.io/acme/embedding-server:v2.1.0", "gpu": "L4", "cpu": 8, "memory_gb": 32, "min_replicas": 0, "max_replicas": 4, "target_gpu_utilization": 70, "port": 8080, "region": "us-east-1" }'
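Several of the body constraints above (exactly one of `repo` or `image`, the allowed `gpu_count` values, the `target_gpu_utilization` range) can be checked client-side before the request is sent. A hypothetical Python helper sketching that validation:

```python
VALID_GPU_COUNTS = {1, 2, 4, 8}

def build_service_payload(name, gpu, repo=None, image=None, gpu_count=1,
                          target_gpu_utilization=70, **options):
    """Assemble a create-service body, enforcing the documented constraints
    before the request ever leaves the client."""
    if (repo is None) == (image is None):
        raise ValueError("Provide exactly one of 'repo' or 'image'.")
    if gpu_count not in VALID_GPU_COUNTS:
        raise ValueError("gpu_count must be 1, 2, 4, or 8.")
    if not 10 <= target_gpu_utilization <= 95:
        raise ValueError("target_gpu_utilization must be between 10 and 95.")
    payload = {"name": name, "gpu": gpu, "gpu_count": gpu_count,
               "target_gpu_utilization": target_gpu_utilization, **options}
    payload["repo" if repo else "image"] = repo or image
    return payload
```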

Update a service

PUT `/auth/v1/seed/{namespace}/services/{name}`

Update the configuration and autoscaling parameters for a running service.

Path parameters


namespace
type: string
required
The namespace (user or organization) the service belongs to.
name
type: string
required
The service name.

Body parameters


min_replicas
type: integer
Minimum number of replicas. Set to `0` to enable scale-to-zero.
max_replicas
type: integer
Maximum number of replicas.
target_gpu_utilization
type: integer
Target GPU utilization percentage for autoscaling. Range: `10` to `95`.
env
type: object
Replace all environment variables with the provided key-value pairs. Triggers a rolling redeployment.
branch
type: string
Change the deployment branch. Triggers an immediate redeployment from the new branch.
custom_domain
type: string
Set or update a custom domain. Pass `null` to remove.

Request

curl -X PUT https://outpost.run/auth/v1/seed/acme/services/llama-3-serving -H "Authorization: Bearer <access_token>" -H "Content-Type: application/json" -d '{ "min_replicas": 4, "max_replicas": 16, "target_gpu_utilization": 60 }'

Response 200

{ "id": "svc_8f3a2b1c4d5e", "name": "llama-3-serving", "namespace": "acme", "status": "running", "min_replicas": 4, "max_replicas": 16, "target_gpu_utilization": 60, "current_replicas": 4, "ready_replicas": 3, "message": "Scaling up to meet new minimum replica count.", "updated_at": "2026-03-18T14:00:00Z" }

Updating environment variables

Updating `env` triggers a rolling redeployment: the old replicas continue serving traffic until the new ones are healthy.

curl -X PUT https://outpost.run/auth/v1/seed/acme/services/llama-3-serving -H "Authorization: Bearer <access_token>" -H "Content-Type: application/json" -d '{ "env": { "MODEL_ID": "meta-llama/Meta-Llama-3.1-70B-Instruct", "MAX_BATCH_SIZE": "64", "QUANTIZATION": "gptq" } }'

Redeploy a service

POST `/auth/v1/seed/{namespace}/services/{name}/redeploy`

Force a redeployment of the service using the latest commit on the configured branch.

Path parameters


namespace
type: string
required
The namespace (user or organization) the service belongs to.
name
type: string
required
The service name.

Request

curl -X POST https://outpost.run/auth/v1/seed/acme/services/llama-3-serving/redeploy -H "Authorization: Bearer <access_token>"

Response 200

{ "id": "svc_8f3a2b1c4d5e", "name": "llama-3-serving", "namespace": "acme", "status": "deploying", "commit_sha": "d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5", "message": "Rolling redeployment initiated. Previous replicas will serve traffic until new ones are ready.", "updated_at": "2026-03-18T15:00:00Z" }

Delete a service

DELETE `/auth/v1/seed/{namespace}/services/{name}`

Delete a service and terminate all running replicas. Requests to the service endpoint immediately return `503`.

Path parameters


namespace
type: string
required
The namespace (user or organization) the service belongs to.
name
type: string
required
The service name.

Query parameters


force
type: boolean
Skip the graceful drain period and terminate replicas immediately.
default: false

Request

curl -X DELETE https://outpost.run/auth/v1/seed/acme/services/llama-3-serving -H "Authorization: Bearer <access_token>"

Response 204

Returns an empty response body on success.


Error responses

All service endpoints may return the following errors:

Status Description
400 Bad request -- invalid parameters or configuration
401 Unauthorized -- missing or invalid credentials
403 Forbidden -- insufficient permissions
404 Not found -- service does not exist
409 Conflict -- service is in an incompatible state (e.g., already deploying)
422 Unprocessable entity -- invalid configuration combination
429 Rate limit exceeded -- too many requests
500 Internal server error -- Seed orchestration failure

Errors include a JSON body with a machine-readable `code`, a human-readable `message`, and a `request_id` for support:

{ "error": { "code": "invalid_request", "message": "Either 'repo' or 'image' must be provided.", "request_id": "req_5a6b7c8d9e0f" } }
