Jobs

Create, monitor, and manage batch compute jobs

POST

/{namespace}/jobs

curl --request POST \
  --url /auth/v1/seed/{namespace}/jobs \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{"name":"finetune-llama","cloud":"aws","region":"us-east-1","accelerator":"A100:4","command":"accelerate launch train.py --model $MODEL_NAME --epochs 3","env":{"MODEL_NAME":"meta-llama/Llama-3.1-8B"}}'

Jobs are batch workloads that run a command to completion and terminate automatically. They support spot instances, multi-node runs, and progress tracking. Read operations use `/{namespace}/jobs` and `/{namespace}/job/{name}`; write operations go through `/auth/v1/seed/`.

List jobs

GET `/{namespace}/jobs`

Retrieve all jobs in a namespace.

Path parameters

namespace

string

required

The namespace (user or organization) to list jobs for.

Query parameters

status

string

Filter by status. One of `queued`, `running`, `succeeded`, `failed`, `cancelled`, `timed_out`.

page

integer

Page number for pagination.

default: 1

per_page

integer

Number of results per page. Maximum `100`.

default: 20

sort

string

Sort field. One of `created_at`, `started_at`, `completed_at`, `name`.

default: created_at

order

string

Sort order. One of `asc`, `desc`.

default: desc

Request

curl "https://outpost.run/acme/jobs?status=running" \
  -H "Authorization: Bearer <access_token>"

Response 200

{
  "data": [
    {
      "id": "job_6e5d4c3b2a1f",
      "name": "finetune-llama-3-r1",
      "namespace": "acme",
      "status": "running",
      "gpu": "A100-80GB",
      "gpu_count": 4,
      "region": "us-east-1",
      "progress": "Epoch 2/3 - Step 4200/6300",
      "cost_per_hour": "12.80",
      "runtime_seconds": 14400,
      "created_at": "2026-03-18T10:00:00Z",
      "started_at": "2026-03-18T10:01:15Z",
      "completed_at": null
    }
  ],
  "pagination": {
    "page": 1,
    "per_page": 20,
    "total": 1,
    "total_pages": 1
  }
}
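
The `page`, `per_page`, and `pagination.total_pages` fields above are enough to walk a full listing. The following is a minimal sketch, not an official client: `list_jobs_url`, `iter_jobs`, and the injected `fetch` callable (which should perform the authenticated GET and return the decoded JSON body) are hypothetical names, and the host is taken from the curl examples.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://outpost.run"  # host as used in the curl examples


def list_jobs_url(namespace: str, page: int = 1, per_page: int = 20,
                  status: Optional[str] = None) -> str:
    """Build the list-jobs URL for a single page."""
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    params = {"page": page, "per_page": per_page}
    if status is not None:
        params["status"] = status
    return f"{BASE_URL}/{namespace}/jobs?{urlencode(params)}"


def iter_jobs(fetch: Callable[[str], dict], namespace: str,
              **filters) -> Iterator[dict]:
    """Yield every job in the namespace, one page at a time.

    `fetch` is a caller-supplied function (hypothetical) that sends the
    request with credentials and returns the parsed response body.
    """
    page = 1
    while True:
        body = fetch(list_jobs_url(namespace, page=page, **filters))
        yield from body["data"]
        # Stop once the last page reported by the server has been read.
        if page >= body["pagination"]["total_pages"]:
            break
        page += 1
```

Passing the HTTP call in as `fetch` keeps the paging logic independent of any particular HTTP library.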

Get a job

GET `/{namespace}/job/{name}`

Retrieve details for a single job, including runtime metrics and artifact information.

Path parameters

namespace

string

required

The namespace (user or organization) the job belongs to.

name

string

required

The job name (e.g., `finetune-llama-3-r1`).

Request

curl https://outpost.run/acme/job/finetune-llama-3-r1 \
  -H "Authorization: Bearer <access_token>"

Response 200

{
  "id": "job_6e5d4c3b2a1f",
  "name": "finetune-llama-3-r1",
  "namespace": "acme",
  "status": "succeeded",
  "command": "python train.py --model meta-llama/Meta-Llama-3.1-8B --dataset acme/instructions-v3 --epochs 3 --lr 2e-5",
  "repo": "acme/training-pipeline",
  "branch": "main",
  "commit_sha": "f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5",
  "gpu": "A100-80GB",
  "gpu_count": 4,
  "cpu": 64,
  "memory_gb": 256,
  "disk_gb": 500,
  "region": "us-east-1",
  "timeout_minutes": 4320,
  "retries": 1,
  "retry_count": 0,
  "output_path": "/workspace/output/checkpoints",
  "exit_code": 0,
  "cost_per_hour": "12.80",
  "total_cost": "76.80",
  "runtime_seconds": 21600,
  "artifacts": {
    "path": "/workspace/output/checkpoints",
    "size_bytes": 16106127360,
    "files": 12,
    "download_url": "https://artifacts.outpost.run/acme/job_6e5d4c3b2a1f/checkpoints"
  },
  "created_at": "2026-03-18T10:00:00Z",
  "started_at": "2026-03-18T10:01:15Z",
  "completed_at": "2026-03-18T16:01:15Z"
}
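
In the sample above, `total_cost` equals `cost_per_hour` multiplied by the runtime in hours (12.80 × 21600 / 3600 = 76.80). The sketch below reproduces that arithmetic with `Decimal`, since both amounts arrive as strings. It assumes simple linear billing rounded to cents; the source does not state the actual billing rules, and `estimated_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def estimated_cost(cost_per_hour: str, runtime_seconds: int) -> str:
    """Estimate the cost of a job from its hourly rate and runtime.

    Assumes cost = rate * hours, rounded to two decimal places. This
    matches the sample responses but is not a documented billing rule.
    """
    hours = Decimal(runtime_seconds) / Decimal(3600)
    total = Decimal(cost_per_hour) * hours
    return str(total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


print(estimated_cost("12.80", 21600))  # prints 76.80
```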

Get job logs

GET `/{namespace}/job/{name}/logs`

Stream or retrieve the logs for a job.

Path parameters

namespace

string

required

The namespace (user or organization) the job belongs to.

name

string

required

The job name.

Query parameters

tail

integer

Number of most recent log lines to return. Maximum `10000`.

default: 100

since

string

Return logs after this ISO 8601 timestamp. For example, `2026-03-18T12:00:00Z`.

stream

boolean

If `true`, the response uses `text/event-stream` to stream logs in real time. Only available for running jobs.

default: false

Request

curl "https://outpost.run/acme/job/finetune-llama-3-r1/logs?tail=50" \
  -H "Authorization: Bearer <access_token>"

Response 200

{
  "name": "finetune-llama-3-r1",
  "namespace": "acme",
  "lines": [
    {
      "timestamp": "2026-03-18T15:58:00Z",
      "message": "[Epoch 3/3] Step 6280/6300 | Loss: 0.0312 | LR: 1.2e-6"
    },
    {
      "timestamp": "2026-03-18T15:59:00Z",
      "message": "[Epoch 3/3] Step 6290/6300 | Loss: 0.0298 | LR: 6.0e-7"
    },
    {
      "timestamp": "2026-03-18T15:59:30Z",
      "message": "[Epoch 3/3] Step 6300/6300 | Loss: 0.0285 | LR: 0.0"
    },
    {
      "timestamp": "2026-03-18T15:59:45Z",
      "message": "Training complete. Final loss: 0.0285"
    },
    {
      "timestamp": "2026-03-18T16:00:00Z",
      "message": "Saving checkpoint to /workspace/output/checkpoints/final..."
    },
    {
      "timestamp": "2026-03-18T16:00:30Z",
      "message": "Checkpoint saved. Total size: 15.0 GB"
    },
    {
      "timestamp": "2026-03-18T16:01:00Z",
      "message": "Uploading artifacts..."
    },
    {
      "timestamp": "2026-03-18T16:01:15Z",
      "message": "Done. Artifacts uploaded to acme/job_6e5d4c3b2a1f/checkpoints"
    }
  ],
  "has_more": true
}
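
When streaming is not an option, the `since` parameter allows incremental polling: remember the timestamp of the newest line and pass it on the next request. The sketch below shows one way to do that. `logs_url` and `next_since` are hypothetical helpers; it assumes lines are returned in chronological order, as in the sample, and the source does not say whether `since` is inclusive, so callers may need to drop a repeated line.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://outpost.run"  # host as used in the curl examples


def logs_url(namespace: str, name: str, tail: int = 100,
             since: Optional[str] = None, stream: bool = False) -> str:
    """Build the logs URL from the documented query parameters."""
    if not 1 <= tail <= 10000:
        raise ValueError("tail must be between 1 and 10000")
    params = {"tail": tail}
    if since is not None:
        params["since"] = since  # ISO 8601 timestamp, URL-encoded below
    if stream:
        params["stream"] = "true"
    return f"{BASE_URL}/{namespace}/job/{name}/logs?{urlencode(params)}"


def next_since(response: dict, previous: Optional[str] = None) -> Optional[str]:
    """Return the timestamp to use as `since` on the next poll.

    Falls back to the previous value when the response has no lines.
    """
    lines = response.get("lines", [])
    return lines[-1]["timestamp"] if lines else previous
```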

Streaming logs in real time

For running jobs, you can stream logs using Server-Sent Events:

curl -N "https://outpost.run/acme/job/finetune-llama-3-r1/logs?stream=true" \
  -H "Authorization: Bearer <access_token>" \
  -H "Accept: text/event-stream"
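
A client reading the stream has to split it into events. The sketch below parses the standard Server-Sent Events framing (`data:` fields, blank line between events, `:` comment lines) from any iterable of text lines. The payload format of each event is not documented here, so `iter_sse_data` (a hypothetical helper) yields the raw `data` string and leaves decoding to the caller.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a text/event-stream body."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip events without data.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # One leading space after the colon is part of the framing.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.
```

If each event turns out to carry a JSON object shaped like the entries in `lines` above (an assumption), `json.loads` on each yielded string would recover `timestamp` and `message`.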

Get job analytics

GET `/{namespace}/job/{name}/analytics`

Retrieve analytics and metrics for a job.

Path parameters

namespace

string

required

The namespace (user or organization) the job belongs to.

name

string

required

The job name.

Request

curl https://outpost.run/acme/job/finetune-llama-3-r1/analytics \
  -H "Authorization: Bearer <access_token>"

Response 200

{
  "name": "finetune-llama-3-r1",
  "namespace": "acme",
  "gpu_utilization_percent": 94,
  "cpu_utilization_percent": 67,
  "memory_used_gb": 210,
  "disk_used_gb": 380,
  "runtime_seconds": 21600
}
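
These figures are easier to act on when compared with what the job requested. The sketch below combines the analytics response with the `memory_gb` and `disk_gb` fields from the job details. `utilization_summary` is a hypothetical helper, and the source does not say whether the analytics values are peaks or averages, so treat the ratios as rough guidance for right-sizing.

```python
def utilization_summary(job: dict, analytics: dict) -> dict:
    """Express resource usage as fractions of what the job requested."""
    return {
        "gpu": analytics["gpu_utilization_percent"] / 100,
        "cpu": analytics["cpu_utilization_percent"] / 100,
        "memory": analytics["memory_used_gb"] / job["memory_gb"],
        "disk": analytics["disk_used_gb"] / job["disk_gb"],
    }


# Values taken from the sample responses in this page.
summary = utilization_summary(
    {"memory_gb": 256, "disk_gb": 500},
    {"gpu_utilization_percent": 94, "cpu_utilization_percent": 67,
     "memory_used_gb": 210, "disk_used_gb": 380},
)
```

For the sample job, memory works out to roughly 82% of the request and disk to 76%.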

Create a job

POST `/auth/v1/seed/{namespace}/jobs`

Submit a new batch job for execution. Jobs run to completion and then terminate.

Path parameters

namespace

string

required

The namespace (user or organization) to run the job in.

Body parameters

name

string

required

A human-readable name for the job. Must be unique within the namespace. Allowed characters: alphanumeric, hyphens, underscores.

command

string

required

The command to execute. This is run inside the container as the entrypoint.

repo

string

Full repository name (namespace/repo) to clone into the job's working directory. Either `repo` or `image` is required.

image

string

Container image to run. Either `repo` or `image` is required.

branch

string

Branch to clone when using `repo`.

default: main

gpu

string

required

GPU type for the job. One of `A100-40GB`, `A100-80GB`, `H100-80GB`, `L4`, `T4`, `none`.

gpu_count

integer

Number of GPUs. Must be `1`, `2`, `4`, or `8`.

default: 1

cpu

integer

Number of vCPUs. One of `2`, `4`, `8`, `16`, `32`, `64`, `96`.

default: 8

memory_gb

integer

Memory in gigabytes.

default: 32

disk_gb

integer

Ephemeral disk size in gigabytes. Data is discarded when the job completes.

default: 100

env

object

Environment variables as key-value pairs.

region

string

Deployment region. One of `us-east-1`, `us-west-2`, `eu-west-1`, `eu-central-1`, `ap-northeast-1`.

default: us-east-1

timeout_minutes

integer

Maximum runtime in minutes before the job is killed. Default is 24 hours. Maximum is `10080` (7 days).

default: 1440

retries

integer

Number of times to retry the job on failure. Maximum `5`.

default: 0

output_path

string

Path inside the container to persist as job artifacts. Contents are uploaded to your namespace's artifact storage on completion.
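
The constraints listed above can be checked on the client before submitting, which avoids a round trip for obvious mistakes. The sketch below encodes only the rules stated in this section. `validate_job` is a hypothetical helper; it assumes ASCII names and a minimum timeout of 1 minute, neither of which the source specifies, and it does not check GPU/CPU/memory combinations, which the server may still reject.

```python
import re
from typing import List

GPU_TYPES = {"A100-40GB", "A100-80GB", "H100-80GB", "L4", "T4", "none"}
GPU_COUNTS = {1, 2, 4, 8}
CPU_COUNTS = {2, 4, 8, 16, 32, 64, 96}
REGIONS = {"us-east-1", "us-west-2", "eu-west-1", "eu-central-1", "ap-northeast-1"}
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")  # ASCII only: an assumption


def validate_job(payload: dict) -> List[str]:
    """Return a list of problems; an empty list means no issue was found."""
    errors = []
    for field in ("name", "command", "gpu"):
        if not payload.get(field):
            errors.append(f"'{field}' is required")
    name = payload.get("name")
    if name and not NAME_PATTERN.fullmatch(name):
        errors.append("'name' allows only letters, digits, hyphens, underscores")
    if not (payload.get("repo") or payload.get("image")):
        errors.append("Either 'repo' or 'image' must be provided.")
    gpu = payload.get("gpu")
    if gpu and gpu not in GPU_TYPES:
        errors.append(f"unknown gpu type: {gpu}")
    # Optional fields fall back to their documented defaults.
    if payload.get("gpu_count", 1) not in GPU_COUNTS:
        errors.append("'gpu_count' must be 1, 2, 4, or 8")
    if payload.get("cpu", 8) not in CPU_COUNTS:
        errors.append("'cpu' must be one of 2, 4, 8, 16, 32, 64, 96")
    if payload.get("region", "us-east-1") not in REGIONS:
        errors.append("unknown region")
    if not 1 <= payload.get("timeout_minutes", 1440) <= 10080:
        errors.append("'timeout_minutes' must be at most 10080")
    if not 0 <= payload.get("retries", 0) <= 5:
        errors.append("'retries' must be between 0 and 5")
    return errors
```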

Request

curl -X POST https://outpost.run/auth/v1/seed/acme/jobs \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "finetune-llama-3-r1",
    "command": "python train.py --model meta-llama/Meta-Llama-3.1-8B --dataset acme/instructions-v3 --epochs 3 --lr 2e-5",
    "repo": "acme/training-pipeline",
    "branch": "main",
    "gpu": "A100-80GB",
    "gpu_count": 4,
    "cpu": 64,
    "memory_gb": 256,
    "disk_gb": 500,
    "env": {
      "WANDB_PROJECT": "llama-finetune",
      "WANDB_API_KEY": "wk_abc123",
      "HF_TOKEN": "hf_xyz789"
    },
    "region": "us-east-1",
    "timeout_minutes": 4320,
    "retries": 1,
    "output_path": "/workspace/output/checkpoints"
  }'

Response 201

{
  "id": "job_6e5d4c3b2a1f",
  "name": "finetune-llama-3-r1",
  "namespace": "acme",
  "status": "queued",
  "command": "python train.py --model meta-llama/Meta-Llama-3.1-8B --dataset acme/instructions-v3 --epochs 3 --lr 2e-5",
  "repo": "acme/training-pipeline",
  "branch": "main",
  "commit_sha": "f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5",
  "gpu": "A100-80GB",
  "gpu_count": 4,
  "cpu": 64,
  "memory_gb": 256,
  "disk_gb": 500,
  "region": "us-east-1",
  "timeout_minutes": 4320,
  "retries": 1,
  "retry_count": 0,
  "output_path": "/workspace/output/checkpoints",
  "cost_per_hour": "12.80",
  "created_at": "2026-03-18T10:00:00Z",
  "started_at": null,
  "completed_at": null
}
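
A newly created job starts as `queued`, so callers usually poll the job until it reaches a final state. The sketch below is one way to do that. `wait_for_job` and the injected `get_job` callable (which should call the Get a job endpoint and return the decoded body) are hypothetical, and treating `succeeded`, `failed`, `cancelled`, and `timed_out` as the terminal states is an inference from the documented status list.

```python
import time
from typing import Callable

# Assumed terminal states; `queued` and `running` are treated as in progress.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled", "timed_out"}


def wait_for_job(get_job: Callable[[], dict], interval: float = 10.0,
                 max_polls: int = 360,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reaches a terminal status, then return it.

    Raises TimeoutError if the job is still in progress after `max_polls`
    attempts. `sleep` is injectable so the loop can be tested without waiting.
    """
    for attempt in range(max_polls):
        job = get_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"job still in progress after {max_polls} polls")
```

A fixed interval keeps the example short; a longer interval or backoff may be kinder to the rate limit for multi-hour jobs.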

You can also submit jobs from a container image using an API key:

curl -X POST https://outpost.run/auth/v1/seed/acme/jobs \
  -H 'Authorization: API-Key my-key$sk_live_a1b2c3d4e5f6g7h8i9j0' \
  -H "Content-Type: application/json" \
  -d '{
    "name": "batch-embeddings",
    "command": "python embed.py --input /data/corpus.jsonl --output /workspace/output/embeddings",
    "image": "ghcr.io/acme/embedding-batch:v1.3.0",
    "gpu": "L4",
    "cpu": 16,
    "memory_gb": 64,
    "disk_gb": 200,
    "timeout_minutes": 360,
    "output_path": "/workspace/output/embeddings"
  }'

Cancel a job

POST `/auth/v1/seed/{namespace}/jobs/{name}/cancel`

Cancel a queued or running job. Running jobs receive a SIGTERM followed by a SIGKILL after a 30-second grace period.

Path parameters

namespace

string

required

The namespace (user or organization) the job belongs to.

name

string

required

The job name.

Request

curl -X POST https://outpost.run/auth/v1/seed/acme/jobs/finetune-llama-3-r1/cancel \
  -H "Authorization: Bearer <access_token>"

Response 200

{
  "id": "job_6e5d4c3b2a1f",
  "name": "finetune-llama-3-r1",
  "namespace": "acme",
  "status": "cancelled",
  "message": "Job cancellation initiated. Running processes will be terminated.",
  "runtime_seconds": 14400,
  "total_cost": "51.20",
  "completed_at": "2026-03-18T14:01:15Z"
}

Delete a job

DELETE `/auth/v1/seed/{namespace}/jobs/{name}`

Delete the record of a finished job. Only completed, failed, or cancelled jobs can be deleted. Uploaded artifacts are removed only when `delete_artifacts` is `true`.

Path parameters

namespace

string

required

The namespace (user or organization) the job belongs to.

name

string

required

The job name.

Query parameters

delete_artifacts

boolean

Also delete uploaded artifacts from storage. This action cannot be undone.

default: false

Request

curl -X DELETE "https://outpost.run/auth/v1/seed/acme/jobs/finetune-llama-3-r1?delete_artifacts=true" \
  -H "Authorization: Bearer <access_token>"

Response 204

Returns an empty response body on success.

Error responses

All job endpoints may return the following errors:

Status	Description
400	Bad request -- invalid parameters or configuration
401	Unauthorized -- missing or invalid credentials
403	Forbidden -- insufficient permissions
404	Not found -- job does not exist
409	Conflict -- job is in an incompatible state (e.g., already completed)
422	Unprocessable entity -- invalid GPU/CPU/memory combination
429	Rate limit exceeded
500	Internal server error -- Seed orchestration failure

{
  "error": {
    "code": "invalid_request",
    "message": "Either 'repo' or 'image' must be provided.",
    "request_id": "req_7b8c9d0e1f2a"
  }
}
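
Client code can turn this error envelope into an exception that keeps the `request_id` for support requests. The sketch below is illustrative: `JobApiError` and `error_from_response` are hypothetical names, and treating 429 and 500 as retryable is a common client convention rather than something this page specifies.

```python
from typing import Optional

# Assumption: only rate limits and server errors are worth retrying.
RETRYABLE_STATUSES = {429, 500}


class JobApiError(Exception):
    """An error response from a job endpoint."""

    def __init__(self, status: int, code: str, message: str,
                 request_id: Optional[str] = None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.request_id = request_id

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def error_from_response(status: int, body: dict) -> JobApiError:
    """Build a JobApiError from an HTTP status and the decoded error body."""
    err = body.get("error", {}) if isinstance(body, dict) else {}
    return JobApiError(
        status,
        err.get("code", "unknown"),
        err.get("message", ""),
        err.get("request_id"),
    )
```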
