Autoscaling Services with Outpost

A guide to autoscaling app and model deployments based on traffic using Outpost Services.

Outpost offers two methods for scaling services: fixed replicas and autoscaling. This guide will help you understand and configure these scaling options to match your workload demands efficiently.

Fixed Replicas

Fixed replicas allow you to specify an exact number of replicas (instances) of your service to run continuously. This approach is well-suited for services with consistent and predictable workloads.

To configure fixed replicas, include the following in your outpost.yaml file:

yaml
service:
  readiness_probe: /
  replicas: 2

In this example, Outpost will launch and maintain 2 replicas of your service at all times.

Autoscaling

Autoscaling dynamically adjusts the number of replicas running your service based on incoming traffic. This is ideal for services with variable or unpredictable workloads: capacity grows when traffic rises and shrinks when it falls, which helps maintain performance while controlling cost.

To enable autoscaling, configure the replica_policy section in your outpost.yaml file:

yaml
service:
  readiness_probe: /
  replica_policy:
    min_replicas: 1
    max_replicas: 10
    target_qps_per_replica: 5
    upscale_delay_seconds: 300
    downscale_delay_seconds: 1200

Here's what each parameter in the replica_policy section does:

  • min_replicas: Sets the minimum number of replicas for your service, ensuring a base level of availability.
  • max_replicas: Defines the upper limit on the number of replicas, helping to control costs.
  • target_qps_per_replica: Specifies the ideal number of queries per second (QPS) that each replica should handle.
  • upscale_delay_seconds: The time (in seconds) to wait before scaling up, preventing unnecessary scaling due to brief traffic spikes.
  • downscale_delay_seconds: The time (in seconds) to wait before scaling down, preserving capacity during temporary lulls.
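
To make the interaction between these parameters concrete, the sketch below annotates the same configuration with some worked arithmetic. It assumes the autoscaler aims for roughly ceil(observed QPS / target_qps_per_replica) replicas, clamped to the min_replicas and max_replicas bounds. This guide does not state the exact formula, so treat the numbers as an illustration rather than a description of Outpost's internals.

```yaml
service:
  readiness_probe: /
  replica_policy:
    min_replicas: 1                # never fewer than 1 replica
    max_replicas: 10               # never more than 10 replicas
    target_qps_per_replica: 5      # assumed sizing: ceil(QPS / 5), clamped to [1, 10]
    upscale_delay_seconds: 300     # 5 minutes
    downscale_delay_seconds: 1200  # 20 minutes

# Illustrative arithmetic under the assumption above:
#   sustained 2 QPS  -> ceil(2 / 5)  = 1 replica (the minimum)
#   sustained 12 QPS -> ceil(12 / 5) = 3 replicas
#   sustained 80 QPS -> ceil(80 / 5) = 16, capped at max_replicas = 10
```

With these values, the downscale delay is four times the upscale delay, so the service adds capacity sooner than it removes it.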

Choosing between Fixed Replicas and Autoscaling

Use fixed replicas when:

  • Your workload is small and consistent.
  • You want to guarantee a minimum level of service availability.
  • You do not need to scale up or down frequently.

Autoscaling is beneficial when:

  • Your workload is variable or unpredictable.
  • You want to optimize performance and resource utilization.
  • Cost reduction is a priority, especially during periods of low traffic.

Example use case

Suppose you're running an e-commerce website that experiences a surge in traffic during holidays. With fixed replicas, you might need to manually scale up your service to handle the increased traffic. With autoscaling, you can configure your service to automatically scale up during peak periods and scale down during quiet periods.
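
As a rough sketch of what such a configuration might look like, the example below uses only the replica_policy fields described in this guide. All of the numbers are hypothetical and chosen for illustration; they are not recommendations and should be derived from your own traffic measurements.

```yaml
service:
  readiness_probe: /
  replica_policy:
    min_replicas: 2                # baseline capacity for ordinary days (hypothetical)
    max_replicas: 30               # headroom for holiday peaks (hypothetical)
    target_qps_per_replica: 5      # same per-replica target as the earlier example
    upscale_delay_seconds: 60      # shorter wait so capacity follows a surge quickly (hypothetical)
    downscale_delay_seconds: 1200  # keep capacity through short dips between bursts
```

The trade-off in this sketch is that a short upscale delay reacts faster to a surge but is more likely to add replicas for brief spikes, while the higher max_replicas raises the ceiling on cost.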

Best practices

  • Set sensible defaults for min_replicas and max_replicas based on your expected workload.
  • Monitor your service's performance and adjust target_qps_per_replica, upscale_delay_seconds, and downscale_delay_seconds as needed to fine-tune autoscaling behavior.
  • Test your autoscaling configuration in a staging environment to validate its effectiveness.
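
One way to apply the last point is to run a scaled-down policy in staging so that scaling events can be observed within a short test window. The values below are assumptions chosen for illustration, not settings taken from Outpost's documentation.

```yaml
service:
  readiness_probe: /
  replica_policy:
    min_replicas: 1
    max_replicas: 3                # small cap keeps staging costs low (hypothetical)
    target_qps_per_replica: 5      # keep the production target so per-replica load is comparable
    upscale_delay_seconds: 60      # short delays make scaling visible during a brief load test
    downscale_delay_seconds: 120
```

Once the behavior looks right under test load, restore the production delays and replica bounds before deploying.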

Scale to Zero example

For services that might experience extended periods of inactivity, you can configure autoscaling to scale down to zero replicas. This is especially useful for cost optimization when there is no traffic. For instance, a service with the following configuration will scale down to zero replicas after 30 minutes of inactivity and scale up again when traffic resumes:

yaml
service:
  readiness_probe: /
  replica_policy:
    min_replicas: 0
    max_replicas: 10
    target_qps_per_replica: 5
    upscale_delay_seconds: 300
    downscale_delay_seconds: 1800

In this example, min_replicas is set to 0, allowing the service to scale down completely, and downscale_delay_seconds is set to 1800 (30 minutes) so that short gaps in traffic do not trigger a scale-down.

With Outpost's autoscaling configured, your services adapt to changing demand and use only the resources they need, improving both performance and cost efficiency.