In the second part of this series on configuration and change management in the TfL hybrid Agile, DevOps & ITIL world, I’ll take a look at infrastructure as code and the CMDB.
DevOps vs ITIL®
There’s a lot of planning and documenting involved in the various processes outlined throughout the ITIL® framework, with little mention, if any, of iterative or agile/lean thinking and approaches.
Gareth Daine (https://purplegriffon.com/blog/is-itil-agile-enough)
Some say that ITIL® is optimised for safety at the cost of speed, whereas DevOps prioritises speed of delivery and rapid response, seeking to remove heavyweight processes.
A good question to ask is “what is the minimum viable amount of planning we can do to maintain security and stability in our systems, yet also enable continuous delivery?”
ITIL® has a strong focus on continual service improvement, which equates to the feedback loops of DevOps; however, the speed of these loops has to increase to match DevOps. In reality, this means creating value and good governance through collaboration, trust and light documentation (document only what is needed, not everything), always with a focus on the customer.
Fundamentally, DevOps is about good working relationships and collaboration. Done well, DevOps and ITIL® have much in common and there is a good synergy between them. There is no magic bullet: once you discover what works (often a hybrid of frameworks), keep those practices going, and regularly review which practices create value and which hinder it.
Infrastructure as Code
Technology is changing rapidly, and “infrastructure as code” (also known as programmable infrastructure) is one example of this in TfL Online. Essentially, it means writing code or using tools to manage configurations and automate the provisioning of the cloud infrastructure that hosts our website and digital services.
We use tools such as Puppet and CloudFormation, and because our configuration is defined as code it can be kept under version control, so we can easily track every change in our infrastructure environment. Best of all, infrastructure as code is fast: if a server crashes, the system can be restored within minutes, and after a change or update we can easily roll back to the last known working configuration.
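To illustrate the "roll back to the last known working configuration" idea, here is a minimal sketch that assumes configurations are committed as versions and only marked as good once they have been verified. The `ConfigHistory` class is hypothetical and stands in for a real version-controlled repository of Puppet manifests or CloudFormation templates; it is not part of either tool.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigHistory:
    """Hypothetical in-memory stand-in for a version-controlled config repository."""
    versions: list[dict] = field(default_factory=list)
    known_good: list[bool] = field(default_factory=list)

    def commit(self, config: dict) -> int:
        """Record a new configuration version and return its index."""
        self.versions.append(dict(config))
        self.known_good.append(False)  # unverified until it passes checks
        return len(self.versions) - 1

    def mark_good(self, version: int) -> None:
        """Flag a version as verified in a working environment."""
        self.known_good[version] = True

    def last_known_good(self) -> dict:
        """Return a copy of the most recent configuration marked as working."""
        for version in range(len(self.versions) - 1, -1, -1):
            if self.known_good[version]:
                return dict(self.versions[version])
        raise LookupError("no known-good configuration recorded")


history = ConfigHistory()
v0 = history.commit({"instance_type": "m3.large", "app_pool": "default"})
history.mark_good(v0)
history.commit({"instance_type": "m3.large", "app_pool": "broken"})
rolled_back = history.last_known_good()  # the v0 configuration
```

Keeping "committed" and "verified" as separate states is the design choice that matters here: a rollback should target a configuration that is known to have worked, not simply the previous commit.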
However, it’s important to be mindful of governance, both to avoid configuration drift (virtual server configurations falling out of sync, say through a hotfix) and to avoid a bad configuration being replicated across all virtual systems.
Infrastructure as code enables our DevOps teams to take on provisioning, configuration management and other IT management tasks that would otherwise be left to system administrators or traditional ITIL® roles.
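Configuration drift can be made concrete by comparing each server's actual settings against the desired, code-defined settings. Puppet agents already correct drift in the resources they manage on each run; this sketch simply makes the comparison explicit. The function name, setting names and server names are hypothetical.

```python
def detect_drift(desired: dict, actual: dict[str, dict]) -> dict[str, dict]:
    """Compare each server's actual settings with the desired (code-defined) ones.

    Returns {server: {setting: (desired_value, actual_value)}} for every server
    whose configuration differs; servers that are in sync are omitted.
    """
    drift = {}
    for server, settings in actual.items():
        diffs = {
            key: (desired.get(key), settings.get(key))
            for key in desired.keys() | settings.keys()
            if desired.get(key) != settings.get(key)
        }
        if diffs:
            drift[server] = diffs
    return drift


desired = {"dotnet": "4.5", "tls": "1.2"}
actual = {
    "web-a": {"dotnet": "4.5", "tls": "1.2"},
    # A hand-applied hotfix that was never written back into the code.
    "web-b": {"dotnet": "4.5", "tls": "1.2", "hotfix": "manual-patch"},
}
report = detect_drift(desired, actual)
# {"web-b": {"hotfix": (None, "manual-patch")}}
```

The same comparison, run before a configuration is rolled out everywhere, is one way to catch a bad configuration before it is replicated across all virtual systems.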
Configuration management & the CMDB (configuration management database)
Our CMDB is aligned to how configuration affects the delivery of virtualised and cloud-based website and digital services. For example, a virtualised machine image (VMI) is a configuration item, and we can easily ensure that the image used in production is configured to be the same as the one used in our Test/Dev environments. Version control is maintained in Puppet and CloudFormation, and automated provisioning of cloud services means we can deploy standard configurations with a couple of mouse clicks.
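Treating the machine image as a configuration item makes the "production matches Test/Dev" check easy to automate. This is a minimal sketch assuming each CMDB record holds a role, an environment and an image identifier; the `ImageCI` record, the environment names and the image IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageCI:
    """Hypothetical CMDB configuration item for a virtualised machine image."""
    role: str
    environment: str
    image_id: str


def mismatched_roles(items: list[ImageCI], reference_env: str = "test") -> list[str]:
    """Return roles whose production image differs from the reference environment's."""
    reference = {ci.role: ci.image_id for ci in items if ci.environment == reference_env}
    return sorted(
        ci.role
        for ci in items
        if ci.environment == "production"
        and ci.role in reference
        and ci.image_id != reference[ci.role]
    )


cmdb = [
    ImageCI("iis-web", "test", "img-001"),
    ImageCI("iis-web", "production", "img-001"),
    ImageCI("api", "test", "img-002"),
    ImageCI("api", "production", "img-003"),
]
print(mismatched_roles(cmdb))  # ['api']
```

Roles that exist only in production are ignored here rather than flagged; a stricter check could report them too.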
We don’t record instance names, because any auto-recovery or auto-scaling event renames them automatically; instead we record “role” names. Each role defines what Puppet and CloudFormation will configure the instance to be. Every instance has a role defined, and it’s available in the instance metadata in AWS.
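Recording roles rather than names means an inventory can always be rebuilt from AWS itself. This sketch assumes the role is stored as an EC2 tag with the hypothetical key `Role`, and it works on a dictionary shaped like the response from boto3's `ec2.describe_instances()` (Reservations, then Instances, each with an `InstanceId` and an optional list of `Key`/`Value` tags). The sample data is invented.

```python
from collections import defaultdict


def instances_by_role(response: dict, tag_key: str = "Role") -> dict[str, list[str]]:
    """Group instance IDs by role from an EC2 DescribeInstances-shaped response.

    The tag key "Role" is an assumption; instances without it are grouped
    under "unassigned" so they stand out.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            groups[tags.get(tag_key, "unassigned")].append(instance["InstanceId"])
    return dict(groups)


sample = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-0001", "Tags": [{"Key": "Role", "Value": "iis-web"}]},
            {"InstanceId": "i-0002"},
        ]},
        {"Instances": [
            {"InstanceId": "i-0003", "Tags": [{"Key": "Role", "Value": "iis-web"}]},
        ]},
    ]
}
print(instances_by_role(sample))
```

Because instance IDs come and go with auto-recovery and auto-scaling, the role is the stable key and the list of IDs under it is expected to change.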
We are moving away from configuration control towards asking “what do we really need to document and record?” So if a virtual IIS web server box starts to fail, gone are the days of raising incident and problem tickets with associated RFCs (requests for change) and CIs (configuration items).
We need a speedy, real-time response without bureaucracy, so we either stop and start the instance or simply terminate it and spin up a new IIS box with a single mouse click using automated Puppet scripts. We don’t troubleshoot or log the failure unless a pattern of failure develops. In a similar agile vein, we have adopted a “performance testing in live” approach: for example, we recently trialled a new IIS box size (c3.xlarge) successfully with live production traffic, and it is now our default size.
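The "replace first, investigate only if a pattern develops" approach can be expressed as a small policy. In this sketch every failed instance is replaced, and a role is flagged for investigation once it reaches a failure threshold within a sliding time window. The `FailureTracker` class, the default threshold of 3 and the one-hour window are hypothetical values chosen for illustration; timestamps are assumed to be in seconds.

```python
from collections import defaultdict, deque


class FailureTracker:
    """Decide whether an instance failure is a one-off or part of a pattern.

    Hypothetical policy: replace every failed instance straight away, but flag a
    role for investigation once it has `threshold` failures within `window_s` seconds.
    """

    def __init__(self, threshold: int = 3, window_s: float = 3600.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._failures: dict[str, deque] = defaultdict(deque)

    def record_failure(self, role: str, timestamp: float) -> str:
        """Return "replace" for a one-off failure, or "investigate" for a pattern."""
        history = self._failures[role]
        history.append(timestamp)
        # Drop failures that fall outside the sliding window.
        while history and timestamp - history[0] > self.window_s:
            history.popleft()
        return "investigate" if len(history) >= self.threshold else "replace"


tracker = FailureTracker()
tracker.record_failure("iis-web", 0)    # "replace"
tracker.record_failure("iis-web", 100)  # "replace"
tracker.record_failure("iis-web", 200)  # "investigate": third failure within an hour
```

Tracking failures per role rather than per instance matters because the failed instance has already been terminated; the role is what persists across replacements, so it is where a pattern shows up.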
In part 3, I’ll look at the tools and automation involved in our processes.