Love it or hate it, ITIL will always be an integral part of any IT set-up, with regulations such as Basel III, FISMA, Sarbanes-Oxley and HIPAA constantly breathing down the necks of organisation leaders.
Having once had a purple-badge-wearing ITIL guru for a manager, I was fascinated by how he'd advocate the framework as the solution to all our IT problems. He'd harp on about defining repeatable and verifiable IT processes, but it always ended up being theoretical rather than practical.
That distinction was never more apparent than in the almost pointless weekly change advisory board meetings. While the processes were painfully bureaucratic and often a diversion from operational work, the meetings themselves were just strange.
With barely anyone in attendance, the board would ask for a justification for each change and respond with "approved" or "rejected", even though it was clear they had little or no grasp of the technical explanation or implications provided.
Then there was the security and risk-compliance chap who'd lock himself in his room, glued to his Tripwire dashboard spying on any unapproved changes. So imagine his amazement when I introduced him to VMware and vMotion.
My migration of VMs across physical servers without raising a change, and without him being able to pick it up on Tripwire, sent him into a frenzy. Amused by his constant head shaking, I decided to disclose that I had also been seamlessly migrating LUNs across different RAID groups with HDS' Cruise Control to get more spindles working, whereupon he rushed back to his cave to check whether Tripwire had picked it up. Was I really supposed to raise a change for every vMotion or LUN migration?
Several years later, when I was working as a technical consultant, my impression of the effectiveness of the change advisory board had failed to improve. Late one night at a customer's datacentre, I pointed out that they had cabled up the wrong ports and that fixing this would require a change to be raised.
"No need for that" replied the SAN architect, "I'm one of the board members". He then proceeded to pull out and swap the FC cables to his production hosts with a big grin on his face. Several minutes later his phone rang, to which he replied, "It's OK, I've resolved it. There was a power failure on some servers." Then he turned to me with a wink and said, "There you go. Sorted. Lubbly jubbly."
My initial scepticism about ITIL's practicality centred on my personal experiences, but it was only reinforced by the number of external auditors who supposedly checked whether proper controls existed within these many firefighting and cowboy organisational procedures.
The compliance folk's last-minute changes never ceased to astound me. They were like a classroom of kids hearing the teacher coming up the corridor and scurrying to their desks to create an impression of discipline and order. Despite having daily Priority 1s, they still inexplicably passed every audit, which in turn legitimised the rogue under-the-radar operational practices that served to keep the lights on.
So with such a tarnished experience of ITIL, it was with great interest that I looked closely at the ITPI's Visible Ops initiative. While still mapping its ideas to ITIL terminology, Visible Ops puts the onus on increasing service levels, decreasing costs and increasing security and auditability. In simplest terms, Visible Ops is a fast-track or jumpstart exercise to an efficient operating model that replicates the researched processes of high-performing organisations in four phases:
Phase 1. Stabilise the patient
Given that almost 80 percent of outages are self-inflicted, any change outside scheduled maintenance is quickly frozen. It then becomes mandatory for problem managers to have all change-related information to hand, so that when that 80 percent of unplanned work is initiated, the root cause can be quickly established.
This phase starts with the systems and business processes that are responsible for the greatest amount of firefighting because resolving them would free up work cycles to initiate a more secure and measured route for change.
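As a rough illustration of the freeze described above, a change request could be checked against the scheduled maintenance windows before it is allowed through. This is only a sketch under stated assumptions: the window schedule and the `change_allowed` function are hypothetical and are not part of Visible Ops itself.

```python
from datetime import datetime

# Hypothetical weekly maintenance windows as (weekday, start_hour, end_hour),
# where Monday is 0. Here: Saturday 22:00-24:00 and Sunday 00:00-04:00.
MAINTENANCE_WINDOWS = [(5, 22, 24), (6, 0, 4)]


def change_allowed(when, windows=MAINTENANCE_WINDOWS):
    """Return True only if the timestamp falls inside a scheduled window."""
    return any(
        when.weekday() == day and start <= when.hour < end
        for day, start, end in windows
    )


# 6 January 2024 was a Saturday; 3 January 2024 was a Wednesday.
saturday_night = change_allowed(datetime(2024, 1, 6, 23, 0))
wednesday_afternoon = change_allowed(datetime(2024, 1, 3, 14, 0))
```

Any request for which the check returns False would simply be held until the next window.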
Phase 2. Catch and release and find fragile artefacts
This phase is related to the infrastructure itself with the understanding that it cannot be repeatedly replicated. By gaining an accurate inventory of assets, configurations and services, the objective is to identify the artefacts with the lowest change success rates, highest MTTR and highest business downtime costs.
By capturing all these assets and their interdependencies, an organisation ends up in a far more secure position before a priority 1 firefighting session.
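To make the idea of finding fragile artefacts a little more concrete, the three criteria above (change success rate, MTTR and business downtime cost) could be combined into a single ranking. The sketch below is an assumption of mine, not a formula from Visible Ops: the `Asset` fields, the scoring function and the sample inventory are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    changes_attempted: int
    changes_succeeded: int
    mttr_hours: float              # mean time to repair, in hours
    downtime_cost_per_hour: float  # business cost of an hour of downtime

    @property
    def change_success_rate(self):
        # An asset that has never been changed is treated as fully successful.
        if self.changes_attempted == 0:
            return 1.0
        return self.changes_succeeded / self.changes_attempted


def fragility_score(asset):
    """Assumed score: expected downtime cost of one change to this asset."""
    failure_rate = 1.0 - asset.change_success_rate
    return failure_rate * asset.mttr_hours * asset.downtime_cost_per_hour


def rank_fragile(assets):
    """Most fragile artefacts first."""
    return sorted(assets, key=fragility_score, reverse=True)


# Hypothetical inventory, purely for illustration.
inventory = [
    Asset("billing-db", 20, 12, 6.0, 5000.0),
    Asset("intranet-web", 50, 48, 1.0, 200.0),
    Asset("san-fabric", 10, 9, 8.0, 10000.0),
]
ranked_names = [asset.name for asset in rank_fragile(inventory)]
```

Multiplying the three factors is only one way to weigh them; an organisation might equally rank on each criterion separately.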
Phase 3. Establish repeatable build library
This phase is focused on implementing an effective release management process. Using the previous phases as stepping stones, this phase documents repeatable builds of the most critical assets and services so that rebuilding them is more cost-effective than repair.
In a process that leads to an efficient mass-production of standardised builds, senior IT operations staff can transform from a reactive to a proactive release management delivery model.
This transformation is achieved by operating early in the IT operations lifecycle by consistently working on software and integration releases before deployment in production environments.
At the same time, the number of unique production configurations is reduced, which increases the lifespan of each configuration before it is replaced or changed, and in turn improves manageability and reduces complexity.
Eventually these repeatable builds enable the creation of "golden" images that have been tried, tested, planned and approved before production. So when new applications, patches and upgrades are released for integration, these golden builds or images need merely updating.
Phase 4. Enable continuous improvement
This is pretty self-explanatory in that it deals with building a closed loop between the release, control and resolution processes. With the previous three phases complete, metrics for each of these three key process areas provide a focus, specifically those metrics that facilitate quick decision making and give accurate indicators of the work and its success in relation to the operational process.
Drawing on ITIL's resolution process metrics of mean time between failures (MTBF) and mean time to repair (MTTR), this phase looks at release by measuring how efficiently and effectively infrastructure is provisioned. Controls are measured by how effectively the change decisions that are made keep production infrastructure available, predictable and secure, while resolution is quantified by how effectively issues are identified and resolved.
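For readers who want to see how the two resolution metrics are typically derived, here is a minimal sketch that computes MTTR and MTBF from a list of outages. The incident data and function names are hypothetical, and the definitions used are the common ones (MTTR as average outage duration, MTBF as uptime divided by number of failures) rather than anything prescribed by Visible Ops.

```python
from datetime import datetime, timedelta


def total_downtime(incidents):
    """Sum the duration of every (start, end) outage."""
    return sum((end - start for start, end in incidents), timedelta())


def mttr(incidents):
    """Mean time to repair: average duration of an outage."""
    return total_downtime(incidents) / len(incidents)


def mtbf(incidents, period_start, period_end):
    """Mean time between failures: uptime in the period per failure."""
    uptime = (period_end - period_start) - total_downtime(incidents)
    return uptime / len(incidents)


# Hypothetical outages within a 30-day reporting period.
period_start = datetime(2024, 1, 1)
period_end = datetime(2024, 1, 31)
incidents = [
    (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 12, 0)),   # 2 hours
    (datetime(2024, 1, 20, 8, 0), datetime(2024, 1, 20, 12, 0)),  # 4 hours
]

repair_time = mttr(incidents)                            # 3 hours
failure_gap = mtbf(incidents, period_start, period_end)  # 357 hours
```

Tracked period over period, a falling MTTR and a rising MTBF would indicate that the closed loop is working.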
So while Visible Ops looks great on paper, what really differentiates it from being just another theoretical process that fails to be delivered comprehensively in practice?
If the manner in which IT is procured, designed, configured, validated and implemented remains the same, there is little if any chance of Visible Ops succeeding any further than the purple badge lovers of ITIL.
But what if the approach to IT, and more specifically its infrastructure, were to change from the traditional buy-your-own, bolt-it-together and pray-that-it-works method to a more sustainable and predictable model?
What if the approach to infrastructure were a greenfield deployment of, or seamless migration to, a pretested, prevalidated, pre-integrated, prebuilt and preconfigured product, that is, a true converged infrastructure? What impact could that have on the success of Visible Ops and the four phases?
Phase 1 can be immediately achieved with a converged infrastructure, where an organisation no longer needs to spend time investigating the risks and impact of change. With a standardised product-based approach, thousands of hours of QA testing and analysis can be performed by the vendor for each new patch, firmware upgrade or update on a product that is like-for-like with the one owned by the customer.
With this approach acting as the premise of a semi-annual release certification matrix that updates all the components of the converged infrastructure as a comprehensive whole, risks typically associated with the change process are eliminated.
Furthermore, because changes are dictated by this pretested and prevalidated process and need to adhere to this matrix to remain within support, it helps eradicate rogue changes as well as inform problem managers of the necessary changes.
Ultimately phase 1's objective of stabilisation is immediately achieved via the risk mitigation that comes with implementing a pre-engineered, predefined and pretested upgrade path.
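One way to picture how a release certification matrix keeps rogue changes in check is a simple drift report that compares what is installed against what the matrix certifies. The sketch below is illustrative only: the component names, version strings and the `find_drift` function are hypothetical and do not represent any vendor's actual matrix or tooling.

```python
# Hypothetical certified release: component name -> certified version.
CERTIFIED_RELEASE = {
    "hypervisor": "5.1",
    "storage_firmware": "7.3.2",
    "fabric_switch_os": "6.0",
}


def find_drift(installed, certified):
    """Map each out-of-matrix component to (installed, certified) versions.

    A component missing from the installed inventory is reported with an
    installed version of None.
    """
    drift = {}
    for component, expected in certified.items():
        actual = installed.get(component)
        if actual != expected:
            drift[component] = (actual, expected)
    return drift


# Hypothetical installed inventory with one unapproved firmware change.
installed = {
    "hypervisor": "5.1",
    "storage_firmware": "7.4.0",
    "fabric_switch_os": "6.0",
}
drift_report = find_drift(installed, CERTIFIED_RELEASE)
```

An empty report would mean the system is still on the certified release; anything else points problem managers straight at the component that has strayed.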
The challenge of phase 2, which in essence equates to an eventual full inventory of the infrastructure, is a painful process at the best of times, especially as new kit from various vendors is constantly being purchased and bolted on to existing kit.
Simplified asset management
Moving to a converged infrastructure simplifies this challenge as it's a single product and hence a single SKU at procurement. The parts of the product and all their details are known to the manufacturer, thus ensuring an accurate inventory and simplified asset management process.
When patches, upgrades and additions of new parts and components are required, they are automatically added to the inventory list of the single product, thus ensuring up-to-date asset management.
The release-management requirement of phase 3 offers a challenge that not only involves risk but also takes up a significant amount of staff and management time to ensure that technology and infrastructure remain up to date.
A converged infrastructure meets this challenge immediately by making pretested, validated software and firmware upgrades available to end users, enabling them to locate the releases that are applicable to their system.
As for the rebuild-rather-than-repair approach in phase 3, because a converged infrastructure can be deployed and up and running in only 30 days, having a like-for-like standardised infrastructure for new and upcoming projects is far easier to achieve than with the usual build-it-yourself infrastructure model.
Billing and chargeback model
On a more granular level, by having a management and orchestration stack with a self-service portal, golden image VMs can be immediately deployed with a billing and chargeback model as well as integration with a CMDB.
The result is a quick and successful attainment of phase 3 of the Visible Ops model via a unified release and configuration management methodology that is highly predictable and enhances availability by reducing interoperability issues.
Measuring the success of metrics such as MTTR and MTBF as detailed in phase 4 is ultimately linked to the success of the monitoring and support model that's in place for your infrastructure. With a product-based approach to infrastructure the support model will also be better equipped to ensure continuous improvement.
Having an escalation response process based on a product, regardless of whether resolving a problem requires consultation with multiple experts or component teams, ultimately means a seamless and single point of contact for all issues.
This end-to-end accountability for an infrastructure's support, maintenance and warranty makes the tracking of issue resolution and availability a much simpler model to measure and monitor.
Furthermore, with open APIs that enable integration with comprehensive monitoring and management software platforms, the converged infrastructure can be monitored for utilisation, performance and capacity management as well as potential issues that can be flagged proactively to support.
As IT operational efficiency becomes more of an imperative for businesses across the globe, the theoretical practices that have failed to deliver are being assessed, questioned or in some cases continued with.
What is often overlooked is that one of the inherent problems is the traditional approach to building and managing IT infrastructure.
Even a radical and well-researched approach and framework such as Visible Ops will eventually suffer if the IT infrastructure that the framework is based on was built by the same mode of thinking that created the problems.
Fundamentally, whether or not the Visible Ops model is a serious consideration for your environment, adopting the framework with a converged infrastructure makes stabilising, standardising and optimising your IT infrastructure and its delivery of services to the business a lot more practical and consequently a lot less theoretical.