Short Introduction to Throughput and Utilization Based Auto Scaling

A high-level look at a few of the key metrics behind the throughput-based and utilization-based approaches to autoscaling


Autoscaling is a cloud computing term for automatically increasing or decreasing computational resources. It improves the utilization of cloud resources by scaling up during peak times and scaling down during off-peak times. One of its major benefits is that it removes the need for human intervention during traffic spikes by automatically adding, say, compute servers.

Autoscaling is triggered by a variety of metrics that fall under two widely used approaches: Utilization Based Autoscaling and Throughput Based Autoscaling. Let's dive deeper into each of these approaches and how they differ from each other.

Utilization Based Autoscaling

In this type of autoscaling, scaling decisions are based on utilization metrics, which are normalized measurements of the load on a service (that is, a group of servers performing a single task). Several utilization metrics can be used to gauge the health of a system:

  • CPU Utilization

  • Memory Utilization

  • Network Utilization
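
As a rough illustration of how a utilization metric can drive a scaling decision, here is a minimal sketch in Python. It assumes a simple proportional rule: resize the fleet so that average utilization moves back toward a chosen target. The function name, the parameters, and the 60% target are hypothetical and chosen only for this example; real autoscalers typically add cooldowns, tolerances, and upper bounds on top of a rule like this.

```python
import math


def desired_servers_by_utilization(current_servers: int,
                                   current_utilization_pct: float,
                                   target_utilization_pct: float) -> int:
    """Return a server count that moves average utilization toward the target.

    Assumption for this sketch: load is spread evenly, so utilization
    scales inversely with the number of servers.
    """
    if current_servers < 1:
        raise ValueError("current_servers must be at least 1")
    if not 0 < target_utilization_pct <= 100:
        raise ValueError("target_utilization_pct must be in (0, 100]")

    # Proportional rule: new = ceil(current * observed / target)
    needed = math.ceil(
        current_servers * current_utilization_pct / target_utilization_pct
    )
    # Never scale below a single server.
    return max(1, needed)


# Peak: 10 servers at 90% CPU with a 60% target suggests scaling up.
print(desired_servers_by_utilization(10, 90, 60))
# Off-peak: 10 servers at 30% CPU with a 60% target suggests scaling down.
print(desired_servers_by_utilization(10, 30, 60))
```

The same function works for memory or network utilization, as long as the metric is expressed as a percentage of what the servers can provide.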

Throughput Based Autoscaling

In this type of autoscaling, decisions to scale up or down are based on the amount of work that a service has to perform. This is measured by throughput metrics, which capture the useful work done by a service in absolute terms. Two commonly used throughput metrics are:

  • Throughput that a service can handle with the capacity currently assigned to it in a particular region. (Capacity)

  • Throughput that a service is required to handle when it takes on additional load due to increased traffic because some resource went down in a different region. (Demand)
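
To make the capacity and demand metrics above concrete, here is a minimal sketch in Python. It assumes throughput is measured in requests per second (RPS) and that each server sustains a known, fixed RPS. Both assumptions, along with the function names and the numbers, are hypothetical and only for illustration.

```python
import math


def capacity_rps(servers: int, per_server_rps: float) -> float:
    """Capacity: throughput the currently assigned servers can handle."""
    return servers * per_server_rps


def desired_servers_by_throughput(demand_rps: float,
                                  per_server_rps: float) -> int:
    """Return the number of servers needed to serve the given demand."""
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    # Keep at least one server even when demand is zero.
    return max(1, math.ceil(demand_rps / per_server_rps))


# A region runs 8 servers that each sustain 500 RPS (assumed figures).
current_capacity = capacity_rps(8, 500)  # 4000 RPS

# Demand: normal regional traffic plus traffic arriving because a
# resource went down in a different region.
regional_traffic = 4000
failover_traffic = 2000
demand = regional_traffic + failover_traffic  # 6000 RPS

if demand > current_capacity:
    print("scale up to", desired_servers_by_throughput(demand, 500), "servers")
```

Unlike the utilization-based approach, the decision here depends on absolute work (RPS), so the fleet can be sized for the failover demand before utilization on the existing servers has had a chance to climb.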


With either approach, it is crucial to identify the right set of metrics so that autoscaling kicks in when it is needed. Despite being one of the selling points of today's cloud providers, autoscaling needs to be smart enough to avoid problems such as scaling up unnecessarily during a DDoS attack, which increases infrastructure cost, or provisioning fewer servers than required when a data center in a region goes down. Whatever the challenges, it will be interesting to see how autoscaling evolves in response to the ever-increasing demand for efficient computational resources.

If you like the post, share and subscribe to the newsletter to stay up to date with tech/product musings.

(The contents of this blog reflect my personal opinion and my own reading of various articles, and are in no way influenced by my employer.)