Scaling workloads across multiple dimensions in GKE

In Google Kubernetes Engine (GKE), application owners can define multiple autoscaling behaviors for a workload using a single Kubernetes resource: Multidimensional Pod Autoscaler (MPA).

The challenges of scaling Pods horizontally and vertically

The success of Kubernetes as a widely adopted platform is grounded in its support for a variety of workloads and their many requirements. One of the areas that has continuously improved over time is workload autoscaling.

Since the early days of Kubernetes, Horizontal Pod Autoscaler (HPA) has been the primary mechanism for autoscaling Pods. As its name suggests, it adds Pod replicas when a user-defined threshold for a given metric is crossed. Early on, that metric was typically CPU or memory usage; HPA now also supports custom and external metrics.

A bit further down the line, Vertical Pod Autoscaler (VPA) added a new dimension to workload autoscaling. As its name suggests, VPA recommends the amount of CPU or memory that Pods should request, based on usage patterns. Users can either review those recommendations and decide whether to apply them, or let VPA apply the changes automatically on their behalf.

Naturally, Kubernetes users have sought to get the benefits from both of these forms of scaling.

While these autoscalers work well independently of one another, running both at the same time can produce unexpected results.

Picture this example:

  • HPA adjusts the number of replicas for a Pod to maintain a target 50% CPU utilization
  • VPA, when configured to automatically apply recommendations, could fall into a loop of continuously shrinking CPU requests – a direct result of HPA maintaining its relatively low target for CPU utilization!

Part of the challenge here is that when configured to act autonomously, VPA applies changes for both CPU and memory. Thus, the contention can be difficult to avoid as long as VPA is automatically applying changes.
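The loop described above can be made concrete with the replica rule from the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). What follows is a minimal sketch of that arithmetic in shell, not how either autoscaler is implemented. The function names, the 4-replica starting point, and the 60m request that VPA is imagined to apply are all hypothetical, and the HPA tolerance band is ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not part of HPA or VPA.

# utilization_pct USAGE_MILLICORES REQUEST_MILLICORES
# CPU utilization as HPA sees it: usage relative to the Pod's *request*.
utilization_pct() {
  echo $(( $1 * 100 / $2 ))
}

# desired_replicas CURRENT_REPLICAS CURRENT_UTIL_PCT TARGET_UTIL_PCT
# Integer ceiling of current * (currentUtil / targetUtil).
desired_replicas() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# Steady state: 4 replicas, each using 50m of a 100m request, target 50%.
before=$(utilization_pct 50 100)
echo "utilization before VPA acts: ${before}%"
echo "replicas wanted by HPA: $(desired_replicas 4 "$before" 50)"

# VPA sees 50m of usage and (hypothetically) shrinks the request to 60m.
after=$(utilization_pct 50 60)
echo "utilization after VPA acts: ${after}%"
echo "replicas wanted by HPA: $(desired_replicas 4 "$after" 50)"
```

With identical load, the smaller request alone pushes measured utilization from 50% to 83%, so HPA asks for 7 replicas instead of 4. Spreading the same load over more replicas lowers per-Pod usage, which in turn invites VPA to shrink the request again.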

Users have since accepted compromises in one of two ways:

  • Using HPA to scale on CPU or memory and using VPA only for recommendations, building their own automation to review and actually apply the recommendations
  • Using VPA to automatically apply changes to CPU and memory, while using HPA based on custom or external metrics

While these workarounds are suitable for a handful of use cases, there are still workloads that would benefit from autoscaling across the dimensions of both CPU and memory.

For example, web applications may require horizontal autoscaling on CPU when CPU bound – but may also desire vertical autoscaling on memory for reliability in the event of misconfigured memory that results in OOMkilled events for the container.

Multidimensional Pod Autoscaler

The first feature available in MPA lets users scale Pods horizontally based on CPU utilization and vertically based on memory. It is available in GKE clusters running version 1.19.4-gke.1700 or newer.

In the MPA schema, two constructs let users configure their desired behavior: goals and constraints. See the manifest below for an MPA resource, shortened for readability:

  # mpa-config.yaml
...  
  goals:
    metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  constraints:
    global:
      minReplicas: 1
      maxReplicas: 5
    containerControlledResources: [ memory ]
  policy:
    updateMode: Auto
...

Goals let users define targets for metrics. The first supported metric is target CPU utilization, defined much as it is in an HPA resource. The MPA attempts to meet these goals by distributing load across additional replicas of a given Pod.

Constraints, on the other hand, are a bit more stringent. They take precedence over goals and can be applied either to global targets, such as the minimum and maximum replicas of a given Pod, or to specific resources. For vertical autoscaling, this is where users (a) specify that memory is controlled by MPA and (b) optionally define the upper and lower bounds for a Pod's memory requests.
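To illustrate the optional memory bounds, here is a hedged sketch that writes a complete manifest to a temporary file. Everything the shortened manifest omits is an assumption on my part: the `apiVersion`, `kind`, and `scaleTargetRef` values, and the `container` block with `minAllowed` and `maxAllowed`, follow the GKE documentation as I understand it and should be checked against the current schema before use. The 50Mi and 512Mi bounds and the file path are hypothetical.

```shell
#!/usr/bin/env bash
# Writes a hypothetical MPA manifest with explicit memory bounds.
# Field names outside goals/constraints.global/policy are assumptions.
manifest="${TMPDIR:-/tmp}/php-apache-mpa-bounds.yaml"

cat > "$manifest" <<'EOF'
apiVersion: autoscaling.gke.io/v1beta1
kind: MultidimPodAutoscaler
metadata:
  name: php-apache-mpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  goals:
    metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  constraints:
    global:
      minReplicas: 1
      maxReplicas: 5
    containerControlledResources: [ memory ]
    container:
    - name: '*'            # apply to every container in the Pod
      requests:
        minAllowed:
          memory: 50Mi     # hypothetical lower bound
        maxAllowed:
          memory: 512Mi    # hypothetical upper bound
  policy:
    updateMode: Auto
EOF

echo "wrote $manifest"
# To try it on a cluster: kubectl apply -f "$manifest"
```

Because constraints take precedence over goals, a recommendation outside these bounds would be capped rather than applied as-is.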

Let’s test this out!

We’ll use Cloud Shell as our workstation and create a GKE cluster with a version that supports MPA:

  gcloud beta container clusters create "mpa-sandbox" \
    --cluster-version "1.19.6-gke.600" \
    --release-channel "rapid" \
    --enable-vertical-pod-autoscaling
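Since MPA depends on the cluster version, it can be worth confirming what the control plane is actually running. The sketch below is one possible check, not an official tool: `version_at_least` and `cluster_supports_mpa` are hypothetical helper names, the comparison relies on GNU `sort -V`, and the minimum version is the 1.19.4-gke.1700 figure stated earlier.

```shell
#!/usr/bin/env bash
# version_at_least HAVE WANT: succeeds when HAVE >= WANT in version order.
# Relies on GNU `sort -V`; a hypothetical helper, not a gcloud feature.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# cluster_supports_mpa CLUSTER_NAME: compares the control-plane version
# reported by gcloud against the minimum version for MPA.
cluster_supports_mpa() {
  local version
  version=$(gcloud container clusters describe "$1" \
    --format="value(currentMasterVersion)") || return 1
  version_at_least "$version" "1.19.4-gke.1700"
}

# Example (requires gcloud with a default zone or region configured):
#   cluster_supports_mpa mpa-sandbox && echo "MPA-capable version"
```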

We’ll use the standard php-apache example Pods from the Kubernetes documentation on HPA. These manifests will create three Kubernetes objects – a Deployment, a Service, and a Multidimensional Pod Autoscaler.

  kubectl apply -f https://raw.githubusercontent.com/agmsbush/mpa-gke/main/php-apache-mpa.yaml

The Deployment consists of a php-apache Pod, is exposed via a Service of type LoadBalancer, and is managed by a Multidimensional Pod Autoscaler (MPA).

The Pod template in the Deployment is configured to request 100 millicores of CPU and 50 mebibytes of memory. The MPA is configured to aim for 60% CPU utilization and to adjust Pod memory requests based on usage.
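To get a feel for how the 60% goal interacts with the replica constraints, here is a small worked calculation. It is a simplified sketch, not the MPA's actual algorithm: `replicas_for_load` is a hypothetical helper, and the total load figures are made up. The 100m request, 60% target, and the 1 to 5 replica range come from the configuration described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replicas needed to keep average CPU utilization at the
# target, clamped to the replica constraints. Not the MPA's real algorithm.
# replicas_for_load TOTAL_USAGE_M REQUEST_M TARGET_PCT MIN MAX
replicas_for_load() {
  local total=$1 request=$2 target=$3 min=$4 max=$5
  # Each replica should carry at most request * target / 100 millicores.
  local per_pod=$(( request * target / 100 ))
  local wanted=$(( (total + per_pod - 1) / per_pod ))   # integer ceiling
  (( wanted < min )) && wanted=$min
  (( wanted > max )) && wanted=$max
  echo "$wanted"
}

replicas_for_load 150 100 60 1 5    # 150m of total load: 3 replicas
replicas_for_load 1000 100 60 1 5   # 1000m of total load: capped at 5
replicas_for_load 0 100 60 1 5      # idle: floor of 1
```

The second call shows a constraint overriding a goal: meeting 60% utilization would take 17 replicas, but maxReplicas limits the result to 5.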

Once we have the resources deployed, grab the External IP address for the php-apache Service.

kubectl get svc
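A load balancer address can take a little while to appear, so a script may prefer to poll for it rather than read the table by eye. The following is a sketch under assumptions: the helper names and the `POLL_SECONDS` variable are hypothetical, and the Service is assumed to be named php-apache and to report an IP rather than a hostname.

```shell
#!/usr/bin/env bash
# external_ip SERVICE: prints the first load balancer IP reported in the
# Service status; prints nothing while the address is still pending.
external_ip() {
  kubectl get svc "$1" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# wait_for_external_ip SERVICE [TRIES]: polls until an IP appears or the
# attempts run out. POLL_SECONDS (hypothetical knob) sets the delay.
wait_for_external_ip() {
  local svc=$1 tries=${2:-30} ip
  while (( tries-- > 0 )); do
    ip=$(external_ip "$svc" 2>/dev/null)
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    sleep "${POLL_SECONDS:-5}"
  done
  return 1
}

# Example: SERVICE_IP=$(wait_for_external_ip php-apache)
```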

We will then use the hey utility to send artificial traffic to our php-apache Pods and thus trigger action from the MPA, accessing the Pods via the Service’s external IP address.

hey -z 1000s -c 1000 http://<your-service-external-ip>

The MPA will then scale the Deployment horizontally, adding Pod replicas to handle the incoming traffic.

kubectl get pods -w

We can also observe the amount of CPU and memory each Pod replica is using:

kubectl top pods

In the output of the previous command, Pods should be using well over the memory requests that we specified in the Deployment. Digging into the MPA object, we can see that the MPA has noticed this as well and recommends an increase in memory requests.

kubectl describe mpa

  ...
Recommended Pod Resources:
    Container Recommendations:
      Container Name:  php-apache
      Lower Bound:
        Memory:  78643200
      Target:
        Memory:  179306496
      Uncapped Target:
        Memory:  179306496
      Upper Bound:
        Memory:  81285611520
...
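The recommendation values are reported in bytes, which makes them hard to compare with the 50 mebibytes requested in the Deployment. A small conversion helps; `bytes_to_mib` is a hypothetical helper name, and the inputs are the figures from the output above.

```shell
#!/usr/bin/env bash
# bytes_to_mib BYTES: whole mebibytes (1 MiB = 1048576 bytes), rounded down.
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}

echo "lower bound: $(bytes_to_mib 78643200) MiB"      # 75 MiB
echo "target:      $(bytes_to_mib 179306496) MiB"     # 171 MiB
echo "upper bound: $(bytes_to_mib 81285611520) MiB"   # 77520 MiB
```

The 171 MiB target is more than three times the original 50 MiB request, which is why the MPA recommends an increase.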

Eventually, we should see MPA actuate these recommendations and scale the Pods vertically.

We will know this is complete by observing an annotation in the Pod that denotes action was taken by the MPA, as well as the new memory requests adjusted to reflect the MPA’s action.

kubectl describe pod $POD_NAME

  ...
Annotations:  vpaObservedContainers: php-apache
              vpaUpdates: Pod resources updated by php-apache-mpa: container 0: memory request
...
  ...
   Requests:
      cpu:        100m
      memory:     171966464
...
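The same two fields can be read directly instead of scanning the full describe output. This is a sketch using standard `kubectl get -o jsonpath` queries; the helper names are hypothetical, and the first container is assumed to be php-apache.

```shell
#!/usr/bin/env bash
# pod_memory_request POD: memory request of the Pod's first container.
pod_memory_request() {
  kubectl get pod "$1" \
    -o jsonpath='{.spec.containers[0].resources.requests.memory}'
}

# pod_vpa_updates POD: value of the vpaUpdates annotation, if present.
pod_vpa_updates() {
  kubectl get pod "$1" -o jsonpath='{.metadata.annotations.vpaUpdates}'
}

# Example:
#   pod_memory_request "$POD_NAME"
#   pod_vpa_updates "$POD_NAME"
```

A non-empty annotation together with a memory request above the original 50 mebibytes indicates that the vertical adjustment has been applied to that Pod.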

Conclusion

Multidimensional Pod Autoscaler solves a challenge that many GKE users have faced, exposing a new method to control horizontal and vertical autoscaling via a single resource. Try it in GKE versions 1.19.6-gke.600+, currently available in the GKE Rapid Channel. Stay tuned for additional functionality in MPA!

A special thanks to Mark Mirchandani, Jerzy Foryciarz, Marcin Wielgus, and Tomek Weksej for their contributions to this blog post.
