Guide to autoscaling app or model deployments based on traffic using Outpost Services.
Outpost offers two methods for scaling services: fixed replicas and autoscaling. This guide will help you understand and configure these scaling options to match your workload demands efficiently.
Fixed replicas allow you to specify an exact number of replicas (instances) of your service to run continuously. This approach is well-suited for services with consistent and predictable workloads.
To configure fixed replicas, include the following in your outpost.yaml file:
service:
  readiness_probe: /
  replicas: 2
In this example, Outpost will launch and maintain 2 replicas of your service at all times.
Autoscaling dynamically adjusts the resources allocated to your service based on incoming traffic. This is ideal for services with variable or unpredictable workloads, helping to ensure optimal performance and cost efficiency.
To enable autoscaling, configure the replica_policy section in your outpost.yaml file:
service:
  readiness_probe: /
  replica_policy:
    min_replicas: 1
    max_replicas: 10
    target_qps_per_replica: 5
    upscale_delay_seconds: 300
    downscale_delay_seconds: 1200
Here's what each parameter in the replica_policy section does:
min_replicas: Sets the minimum number of replicas for your service, ensuring a base level of availability.
max_replicas: Defines the upper limit on the number of replicas, helping to control costs.
target_qps_per_replica: Specifies the ideal number of queries per second (QPS) that each replica should handle.
upscale_delay_seconds: The time (in seconds) to wait before scaling up, preventing unnecessary scaling due to brief traffic spikes.
downscale_delay_seconds: The time (in seconds) to wait before scaling down, ensuring sufficient capacity during temporary lulls.
Use fixed replicas when your workload is consistent and predictable.
Autoscaling is beneficial when your workload is variable or unpredictable.
Suppose you're running an e-commerce website that experiences a surge in traffic during holidays. With fixed replicas, you might need to manually scale up your service to handle the increased traffic. With autoscaling, you can configure your service to automatically scale up during peak periods and scale down during quiet periods.
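To make the holiday scenario concrete, the sketch below estimates how many replicas a QPS-based policy would aim for at a few traffic levels. It assumes the target is the observed QPS divided by target_qps_per_replica, rounded up and clamped between min_replicas and max_replicas. That formula and the helper name target_replicas are illustrative assumptions, not Outpost's documented algorithm.

```python
import math


def target_replicas(qps: float, target_qps_per_replica: float,
                    min_replicas: int, max_replicas: int) -> int:
    """Estimate the replica count a QPS-based autoscaler would aim for.

    Assumption (not Outpost's documented behavior): ceil(qps / target),
    clamped to the configured bounds.
    """
    if target_qps_per_replica <= 0:
        raise ValueError("target_qps_per_replica must be positive")
    wanted = math.ceil(qps / target_qps_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Bounds and target taken from the autoscaling example above:
# min_replicas=1, max_replicas=10, target_qps_per_replica=5.
for qps in (2, 12, 40, 80):
    print(qps, target_replicas(qps, 5, 1, 10))
```

Under these assumptions, quiet traffic of 2 QPS stays at the 1-replica floor, 12 and 40 QPS map to 3 and 8 replicas, and a holiday peak of 80 QPS is capped at the 10-replica ceiling, which is the point at which max_replicas would need raising.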
Set min_replicas and max_replicas based on your expected workload.
Adjust target_qps_per_replica, upscale_delay_seconds, and downscale_delay_seconds as needed to fine-tune autoscaling behavior.
For services that might experience extended periods of inactivity, you can configure autoscaling to scale down to zero replicas. This is especially useful for cost optimization when there is no traffic. For instance, a service with the following configuration will scale down to zero replicas after 30 minutes of inactivity and scale up again when traffic resumes:
service:
  readiness_probe: /
  replica_policy:
    min_replicas: 0
    max_replicas: 10
    target_qps_per_replica: 5
    upscale_delay_seconds: 300
    downscale_delay_seconds: 1800
In this example, min_replicas is set to 0, allowing the service to scale down completely, and downscale_delay_seconds is increased to 30 minutes (1800 seconds) to avoid unnecessary scaling fluctuations.
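The effect of the two delay settings can be sketched as a small state machine: a scaling change is applied only after the need for it has persisted for the configured delay. The class DelayedScaler and its decide method are hypothetical names for illustration. How Outpost measures the delay internally, and whether upscale_delay_seconds also applies when waking from zero replicas, is not stated here, so treat this as one plausible reading.

```python
class DelayedScaler:
    """Apply a scaling change only after it has been wanted long enough.

    Illustrative model only; not Outpost's actual implementation.
    """

    def __init__(self, upscale_delay_seconds: int,
                 downscale_delay_seconds: int) -> None:
        self.upscale_delay = upscale_delay_seconds
        self.downscale_delay = downscale_delay_seconds
        self.pending_direction = 0   # +1 scale up, -1 scale down, 0 none
        self.pending_since = 0.0     # when the pending direction started

    def decide(self, now: float, current: int, desired: int) -> int:
        """Return the replica count to run at time `now` (in seconds)."""
        direction = (desired > current) - (desired < current)
        if direction == 0:
            # Demand matches capacity again, so forget any pending change.
            self.pending_direction = 0
            return current
        if direction != self.pending_direction:
            # A new scaling need appeared; start its timer.
            self.pending_direction = direction
            self.pending_since = now
        delay = self.upscale_delay if direction > 0 else self.downscale_delay
        if now - self.pending_since >= delay:
            self.pending_direction = 0
            return desired
        return current


# Delays from the scale-to-zero example; traffic stops at t=0.
scaler = DelayedScaler(upscale_delay_seconds=300, downscale_delay_seconds=1800)
for t in (0, 900, 1800):
    print(t, scaler.decide(now=t, current=1, desired=0))
```

In this model the single replica keeps running at 0 and 900 seconds and is released only once the full 1800 seconds of inactivity have passed. A lull that ends earlier resets the timer, so no scaling occurs.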
By leveraging Outpost's autoscaling capabilities, you can ensure that your services efficiently utilize resources and adapt to changing demands, resulting in improved performance and cost efficiency.