Industry leaders such as Microsoft, Google, and Amazon have massive server deployments around the world. Likewise, Netflix, which is famously cloud-first for programmatic operations, still relies on its own infrastructure for its content delivery network (CDN). Unless you work for one of these companies, your data center deployments are unlikely to be quite that large. That said, these massive deployments offer valuable lessons that also apply at smaller scale, and learning how to scale down the practices of hyperscale companies to fit the size of your organization is valuable in its own right.
Failover migration is vital to get right the first time
For cloud computing providers, storage is made highly available on instances, making it possible to spin up an identical instance on a different node in the event of a node failure. This availability is vital for failover migration. Without it, changing nodes for a given instance requires manual troubleshooting to determine the origin of the issue and to ensure customers will not lose data when the instance is spun up elsewhere, which naturally prevents IT staff from designing VMs to automatically spin up on a different node. Requiring IT staff to manually investigate issues, in turn, can lead to downtime for customers (or, for internal use cases, applications).
Capacity growth is not necessarily linear
Computing and storage requirements are often influenced by patterns that exist in business operations. Rather than growing evenly with the number of days worked, the amount of data your business generates is more likely to jump around external milestones. For example, an accounting firm is likely to see larger increases in data storage in the first quarter of a given year, as supporting documentation for tax filings is submitted in advance of an April filing deadline.
Bart McDonough, CEO of managed IT and cybersecurity firm Agio, suggests that companies "be very deliberate about capacity planning," adding that "You should never be surprised by depletions of resources: bandwidth and storage." Depending on the business's size and circumstances, these levels should be assessed on a regular and deliberate schedule, whether that is weekly, quarterly, or otherwise.
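To make the point about non-linear growth concrete, here is a minimal sketch in Python. It compares how soon storage would run out under a seasonal growth pattern versus the same annual growth spread evenly across the year. The function name `months_until_full` and all of the figures (40 TB used, 50 TB capacity, 3 TB of growth per month in tax season) are hypothetical, chosen only to illustrate the accounting-firm example above.

```python
from statistics import mean


def months_until_full(used_tb, capacity_tb, monthly_growth_tb):
    """Count the months of growth until usage reaches capacity.

    monthly_growth_tb lists expected growth per month, starting with
    next month; the pattern repeats once the list is exhausted.
    """
    if not monthly_growth_tb or sum(monthly_growth_tb) <= 0:
        raise ValueError("growth must be positive over the full pattern")
    months = 0
    while used_tb < capacity_tb:
        used_tb += monthly_growth_tb[months % len(monthly_growth_tb)]
        months += 1
    return months


# Hypothetical firm: heavy growth January through April, light otherwise.
seasonal = [3.0, 3.0, 3.0, 3.0] + [0.5] * 8
# Same yearly total, but assumed to arrive evenly every month.
even = [mean(seasonal)]

print(months_until_full(40, 50, seasonal))  # 4
print(months_until_full(40, 50, even))      # 8
```

With these made-up numbers, a plan based on average growth expects eight months of headroom, while the seasonal pattern exhausts capacity in four. The gap between the two is the kind of surprise a regular review is meant to catch.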
Do not assume backups are bulletproof
The practical usability of backups is the Schrödinger's Cat of IT. If you run backups without periodically testing to ensure systems are restored properly from these backups without issue, do you actually have anything backed up? McDonough notes that "there are best practices we know we need to do, including checks that data storage backups and disaster recovery (DR) tests are still effective. Companies tend to only examine these in response to a problem, and less so proactively. Get these checks on the calendar, schedule them regularly, assign accountability for both the task itself and ensuring that completion is achieved."
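One way to put such a check on the calendar is to restore a backup to a scratch location and compare the result against a known-good copy. The following is a minimal sketch in Python, not a complete DR test. The function names `sha256_of` and `verify_restore` are hypothetical, and the sketch assumes that both the reference copy and the test restore are plain directories on disk and that the reference has not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(reference_dir, restored_dir):
    """Return (relative_path, problem) pairs for files that did not restore cleanly."""
    reference_dir, restored_dir = Path(reference_dir), Path(restored_dir)
    problems = []
    for ref in sorted(reference_dir.rglob("*")):
        if not ref.is_file():
            continue
        rel = ref.relative_to(reference_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif sha256_of(ref) != sha256_of(restored):
            problems.append((rel.as_posix(), "content differs"))
    return problems
```

An empty list means every reference file came back with identical contents. Anything else is a list of files to investigate before the backup is needed for real. In practice, a manifest of hashes recorded at backup time would be a sturdier reference than a live directory, since live data keeps changing.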
You do not have the purchasing power of hyperscale companies
Because of the performance requirements of hyperscale companies, much (if not all) of the hardware they deploy in their data centers is custom-designed. According to Stephen Hill, Senior Analyst for Applied Infrastructure at 451 Research, "much of the value proposition of a mega-scale environment can come from designing them with an eye for extreme efficiency of the physical factors; managing a delicate balance between compute density, power, and cooling. For them, a percentage point or two of improvement can have a major impact on their bottom line."
Hill notes that smaller organizations will not be able to match hyperscalers in terms of hardware customization, but that they can reduce the management overhead of their deployments through automation just as effectively as hyperscale companies do.
Hanging on to old hardware is bad for security and morale
While hyperscale companies do not publicly disclose their hardware lifecycles, Holger Mueller, Principal Analyst & VP at Constellation Research, estimates that most systems at hyperscale companies have a two- to four-year lifecycle, and indicates that this is good guidance for enterprises as well.
Allowing hardware to stay in service beyond that time frame can have negative implications for security, as well as demoralize IT staff who must dedicate time to fixing that hardware. As business requirements vary, Mueller notes that "CIOs need to ask themselves what are the performance and security implications to let hardware linger longer than that."