How we make our website fly – Part 2

Part 2 – Auto-scaling and elastic load balancing in the cloud

In Part 1 I looked at the front-end (presentation-layer) high-availability cache, Varnish, which helps us deliver web pages quickly from memory. However, there are situations when a large number of queries also reach our back-end, because Varnish hasn't cached them or can't – during adverse weather or industrial action, for example, when our website traffic can spike at up to 20x usual volumes.

Events such as adverse weather or industrial action can lead to huge spikes in web traffic

In this situation a seamless infrastructure platform is fundamental, letting us ramp capacity up and down according to the needs of our customers, and hosting in the cloud allows us to do this easily, quickly and relatively cheaply. Amazon Web Services (AWS) gives us access to almost unlimited capacity via auto-scaling and Elastic Load Balancing (ELB):

• Auto-scaling allows us to increase or decrease the number of EC2 instances (virtual servers) within our application's architecture to provide scalable cloud computing.
• Elastic Load Balancing automatically scales request-handling capacity up or down in response to incoming application traffic.
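To make the scaling behaviour concrete, here is a minimal sketch of how an auto-scaling group keeps its instance count within configured minimum and maximum bounds while tracking demand. The function name and the bounds are illustrative, not the AWS API:

```python
import math

def desired_capacity(current: int, load_factor: float,
                     min_size: int = 2, max_size: int = 20) -> int:
    """Scale the instance count by an observed load factor, clamped to
    the group's configured minimum and maximum sizes (as an AWS
    auto-scaling group does with its min/max/desired settings)."""
    target = math.ceil(current * load_factor)
    return max(min_size, min(max_size, target))
```

During a 20x traffic spike the target is clamped to the group's maximum, so capacity – and cost – stays bounded even under extreme load.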

We have designed the new website architecture in such a way that it can seamlessly use the power of auto-scaling and elastic load balancing, to stand up additional back-end servers (on-demand) as an immediate response to increased loads at any time of the day or night.

Auto-scaling allows us to increase or decrease the number of virtual servers to provide scalable cloud computing.

Running Varnish in the cloud gives our web applications a powerful combination of caching and auto-scaling, which significantly reduces the number of AWS instances (virtual machines) our website needs to handle peak traffic loads, while remaining cost-effective.

As our website grows, and cache hit rates and back-end request volumes increase, we can rely on Varnish and auto-scaling to absorb the brunt of the requests continuously flowing into our infrastructure, giving us:
• A significant performance increase.
• Increased resilience.
• Faster, cheaper and more efficient information flow.

In reality, other factors come into play as well – the device you are using, your browser, the memory on your device, your ISP's connection speed, and so on. However, this and the other optimisations we are currently working on will enable our new website to process requests at super-fast speeds, reducing bottlenecks in the flow of information and data from our side.

Rest assured, we'll continue to tweak and refine this cutting-edge design – coupling a high-availability cache with on-demand auto-scaling – to make every page load even faster, and ultimately make every single journey matter.

Published by
Agile DevOps (ITIL & Scrum), Digital Transformation, Scrum Master, Product Owner, Service Design/Transition, Product Manager, Technical Delivery Manager

11 thoughts on “How we make our website fly – Part 2”

    1. Hello,

      Configuring cloud auto-scaling in AWS is a technically complex task and we are still learning and optimising. We've set thresholds on constraining virtual resources (e.g. memory, CPU), although CPU utilisation isn't always a great indicator by itself. We track key infrastructure resources 24/7 in CloudWatch (Amazon's cloud monitoring service) and have configured alerts and policies to resize key infrastructure components to various capacity baselines when load profiles change. We also recently tested load balancing across both blue (prod) and green (pre-prod) volumes, each scaled to around 25 IIS boxes (50 in total), which worked well. The auto-scale process is not yet fully automatic and there are still some manual interventions; however, as our experience grows, we hope to soon have a fully automatic solution.
      Thanks for the interest.
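The threshold-and-alerting approach described above can be sketched as a CloudWatch-style evaluation rule: an alarm fires only when the metric breaches its threshold for a run of consecutive evaluation periods, which avoids reacting to momentary spikes. The function below is a simulation of that logic, not a call to the CloudWatch API:

```python
def alarm_breached(datapoints: list, threshold: float,
                   evaluation_periods: int = 3) -> bool:
    """Return True when the most recent `evaluation_periods` datapoints
    all exceed the threshold, mimicking CloudWatch's requirement that a
    metric stay in breach for consecutive periods before alarming."""
    recent = datapoints[-evaluation_periods:]
    return (len(recent) == evaluation_periods
            and all(d > threshold for d in recent))
```

Requiring several consecutive breaching datapoints is what keeps a single noisy CPU sample from triggering an unnecessary (and costly) scale-out.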


    2. We auto-scale our caching tier, Varnish, and our application tier, IIS (.NET MVC/Web API).

      Typically the IIS tier is the only tier that needs to scale. For IIS, we monitor the average CPU across all of the boxes in the tier and, when the average goes above a threshold, we add 30% more capacity. This auto-scaling trigger is configured in Amazon Web Services (AWS) CloudWatch. When the auto-scale event fires, the auto-scaling service launches a new instance from an Amazon Machine Image; this is a blank instance with only the operating system installed. The instance is launched with a metadata tag which declares its role to Puppet. Puppet then runs to install the software onto the instance to make it usable and bring it into service. When the instance is ready to take load, we add it into the Elastic Load Balancer; we do this using lifecycle states (see the AWS docs for details).
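The launch flow described here – new instance, Puppet run, then registration with the load balancer – maps onto AWS auto-scaling lifecycle states. The sketch below simulates those transitions; the state names mirror AWS's, but the transition logic is illustrative rather than the AWS API:

```python
def next_state(state: str, configured: bool) -> str:
    """Advance an instance through the launch lifecycle. The
    'Pending:Wait' hook holds the instance out of service until
    configuration (here, the Puppet run) has completed."""
    if state == "Pending":
        return "Pending:Wait"          # lifecycle hook pauses the launch
    if state == "Pending:Wait":
        return "Pending:Proceed" if configured else "Pending:Wait"
    if state == "Pending:Proceed":
        return "InService"             # now registered with the ELB
    return state
```

The value of the wait state is that a half-configured instance never receives traffic: it only reaches "InService" once Puppet has finished.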

      We also auto-scale down when we have too much capacity, as this can realise significant cost savings.
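The scaling rules in this comment – add roughly 30% capacity when average CPU crosses a high threshold, and shed capacity when load drops – can be sketched as follows. Only the 30% step size comes from the text; the 70%/30% CPU thresholds below are illustrative assumptions:

```python
import math

def adjust_capacity(instance_count: int, avg_cpu: float,
                    high: float = 70.0, low: float = 30.0) -> int:
    """Add ~30% more instances when average CPU is above the high
    threshold; remove one instance when it is below the low threshold;
    otherwise hold steady. Threshold values are illustrative."""
    if avg_cpu > high:
        return instance_count + max(1, math.ceil(instance_count * 0.3))
    if avg_cpu < low and instance_count > 1:
        return instance_count - 1
    return instance_count
```

Scaling up in proportional steps but down one instance at a time is a common pattern: it reacts quickly to spikes while avoiding oscillation when load is falling.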


      1. Dan, how long does it take for a new instance to be fully provisioned and start serving traffic?
        I've heard some people suggest that pre-baking an AMI with everything you need, including source code, makes the process much quicker.
        Do you find that, even with waiting for a Puppet run, you're able to spin up new instances quickly enough to deal with a sudden spike in traffic?


      2. Hi Luke,

        Sorry for not replying sooner – I only just noticed this post today. We pay close attention to the scale-up times of our instances, and we have found some interesting things. It isn't always the case that a fully baked AMI scales quicker than a stock AMI plus a Puppet run. I think the key part of this is the "stock AMI": for some reason, stock AMIs seem to be provisioned and boot quicker than AMIs that have been customised and then saved down as private AMIs. We don't know why this is; maybe stock AMIs are used by lots of people all the time and are somehow stored in a way that is more efficient for re-provisioning. In our tests, with our particular stack of components, we have found that the stock AMI plus a Puppet run is marginally quicker than the fully baked AMI.

        Given that somewhat unexpected outcome, we currently think that a stock AMI plus a Puppet run is the quickest way to scale out – at least for our stack; this may not be the case for all stacks. When you then think about other aspects of the infrastructure, in particular patching, we find that stock AMI plus Puppet is the easiest way to do patching too. This is based on the principle that we let AWS patch our instances rather than doing our own patching and saving it down as an AMI. With this approach, we simply get a notification from AWS that a new (patched) AMI is available, update our CloudFormation stacks to reference the new AMI, rebuild, and we are done – no need for WSUS or anything like that. For the scaling instances this approach really makes some of those ops processes easy and efficient. So on balance we prefer the stock AMI plus Puppet, as we don't need to maintain an AMI factory as part of our build process. I hope this helps.


  1. Excellent, excellent – a public website using high-end tech! What happens when your cache runs full? Do you get a slower cache thread pile-up under heavy load? Chris

