How hyperscale data centers are reshaping all of IT

In a world where technology and miniaturization are often considered BFFs, the most important innovations of the 21st century take place in facilities that may be seen from space.

The phrase "data center" is, right at the outset, a presumption. It recalls an era when an enterprise's back-office information systems were mostly devoted to data storage, and were cobbled together in one basement or closet location. "Infrastructure," like a sewer system or a foundation of a highway beneath the potholes, was something no one was supposed to see or pay attention to.

Today, all these assumptions have been overturned. An enterprise's IT infrastructure comprises computing power and connectivity, in addition to data storage. And like the enterprise itself, it has a natural tendency to become distributed.

facebook-fort-worth-data-center-03.jpg

Inside Facebook's Fort Worth data center complex.

(Image: Chad M. Davis for Facebook)

A hyperscale data center is less like a warehouse and more like a distribution hub, or what the retail side of Amazon would call a "fulfillment center." Although today these facilities are very large, and are operated by very large service providers, hyperscale is actually not about largeness, but rather scalability.

Imagine for a moment a factory where every component involved in the manufacture of a product, including the conveyor that brings the parts onto the assembly line, was modularized into a small area. You read this correctly: a small area. Now imagine the functionality of this module becoming so efficient and so reliable that you could grow your yield exponentially simply by connecting more of these modules together in a linear row. Or a farm where, if you double your acreage, you more than double your yield.

Hyperscale is automation applied to an industry that was supposed to be about automation to begin with. It is about organizations that happen to be large, seizing the day and taking control of all aspects of their production. But it is also about the dissemination of hyperscale practices throughout all data center buildings -- not just the eBays and Amazons of the world, but the smaller players, the little guys, the folks down the street. You know to whom I'm referring: pharmaceutical companies, financial services companies, and telecommunications providers.

One vendor in the data center equipment space recently called hyperscale "too big for most minds to envision." Scalability has always been about creating opportunities to do small things using resources that happen to encompass a very large scale.   

What do the "hyper" and "scale" parts mean?

Specifically, a hyperscale data center accomplishes the following:

  • Maximizes cooling efficiency. The largest operational expense in most data centers worldwide -- more so than powering the servers -- is powering the climate control systems. A hyperscale structure may be partitioned to compartmentalize high-intensity computing workloads, and concentrate cooling power on the servers hosting those workloads. For general-purpose workloads, a hyperscale architecture optimizes airflow throughout the structure, ensuring that hot air flows in one direction (even if it's a serpentine one) and often reclaiming the heat from that exhaust flow for recycling purposes.
  • Allocates electrical power in discrete packages. In facilities designed to be occupied by multiple tenants, "blocks" are allocated like lots in a housing development. Here, the racks that occupy those blocks are allocated a set number of kilowatts -- or, more recently, fractions of megawatts -- from the main power supply. When a tenant leases space from a colocation provider, that space is often phrased not in terms of numbers of racks or square footage, but kilowatts. A design that's more influenced by hyperscale helps ensure that kilowatts are available when a customer needs them.
  • Ensures electricity availability. Many enterprise data centers are equipped with redundant power sources (engineers call this configuration 2N), often backed up by a secondary source or generator (2N + 1). A hyperscale facility may utilize one of these configurations as well, although in recent years, workload management systems have made it feasible to replicate workloads across servers, making the workloads redundant rather than the power, reducing electrical costs. As a result, newer data centers don't require all that power redundancy. They can get away with just N + 1, saving not just equipment costs but building costs as well.
  • Balances workloads across servers. Because heat tends to spread, one overheated server can easily become a nuisance for the other servers and network gear in its vicinity. When workloads and processor utilization are properly monitored, the virtual machines and/or containers housing high-intensity workloads may be relocated to, or distributed among, processors that are better suited to their functions, or that are simply not being utilized nearly as much at the moment. Even distribution of workloads directly correlates to temperature reduction, so how a data center manages its software is just as important as how it maintains its support systems.
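To make the power-redundancy shorthand in the list above concrete, here is a minimal Python sketch of how many power units each scheme installs for a given load. The function name `power_units`, the 1,800 kW load, and the 500 kW unit size are illustrative assumptions, not figures from any operator. The arithmetic follows the usual reading of the notation: N is the number of units needed to carry the load, and "+1" adds one spare.

```python
import math


def power_units(load_kw: float, unit_kw: float, scheme: str) -> int:
    """Return how many power units a redundancy scheme installs.

    N is the number of units needed to carry the load with no spares.
    The load and unit sizes passed in are hypothetical inputs.
    """
    n = math.ceil(load_kw / unit_kw)  # N: bare minimum to carry the load
    schemes = {
        "N": n,             # no redundancy
        "N+1": n + 1,       # one spare unit
        "2N": 2 * n,        # a full duplicate set
        "2N+1": 2 * n + 1,  # a full duplicate set plus one spare
    }
    return schemes[scheme]


if __name__ == "__main__":
    # Hypothetical facility: 1,800 kW of IT load, 500 kW per power unit.
    for scheme in ("N", "N+1", "2N", "2N+1"):
        print(scheme, power_units(load_kw=1800, unit_kw=500, scheme=scheme))
```

The gap between the N + 1 and 2N rows is the equipment that workload replication can make unnecessary.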

How big is "big?"

facebook-fort-worth-data-center-01.jpg

Inside Facebook's Fort Worth data center complex

(Image: Chad M. Davis for Facebook)

What makes a large facility hyperscale is not its size, but rather how its design enables its tenants to make optimum use of its resources within that size. AFCOM, the association for data center professionals, has developed a metric for distinguishing between classes of data center facilities. It counts the number of racks the facility hosts, and also the number of square feet (or square meters) devoted to IT components only (its "white space"). It then matches both numbers against the chart below, and assigns the facility whichever of the two resulting size classes is larger. For instance, a facility with 120 racks in 6,000 square feet of white space would be considered "Medium," since 6,000 square feet falls within the Medium range, the larger of the two classes.
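The two-number lookup can be sketched in a few lines of Python. One loud caveat: the chart itself is an image, so the threshold values below are assumptions recalled from AFCOM's published size classes and should be checked against the chart before being relied upon. Only the worked example (120 racks in 6,000 square feet yields "Medium") comes from the text. The names `CLASSES`, `RACK_UPPER`, `SQFT_UPPER`, and `size_class` are hypothetical.

```python
from bisect import bisect_left

# Size classes, smallest to largest.
CLASSES = ["Mini", "Small", "Medium", "Large", "Massive", "Mega"]

# ASSUMED inclusive upper bounds for every class except the last.
# Verify these against AFCOM's chart; they are not taken from the text.
RACK_UPPER = [10, 200, 800, 3000, 9000]
SQFT_UPPER = [250, 5000, 20000, 75000, 225000]


def size_class(racks: int, white_space_sqft: float) -> str:
    """Classify by racks and by white space, then keep the larger class."""
    rack_idx = bisect_left(RACK_UPPER, racks)
    sqft_idx = bisect_left(SQFT_UPPER, white_space_sqft)
    return CLASSES[max(rack_idx, sqft_idx)]


if __name__ == "__main__":
    # The article's example: racks alone suggest a smaller class,
    # but the white space pushes the facility up a class.
    print(size_class(racks=120, white_space_sqft=6000))
```

Taking the maximum of the two indexes is the whole rule: whichever measurement lands in the larger class wins.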

afcom-data-center-size-metrics.jpg

(Image: Courtesy AFCOM)

To give you a benchmark, the retail industry's 2017 estimate of the average floor space for active US supermarkets was about 47,000 square feet.

If you've ever seen or participated in the construction of a supermarket, you know that the whole point of constructing a larger retail facility is to leverage as many efficiencies as possible to maximize profitability. It's certainly not less expensive to construct, power, cool, or heat a large building, but all these costs could be lower per square foot or cubic foot. They won't be lower, however, if the building isn't constructed with efficiencies and best practices in place. In other words, if you're making the building bigger just to be bigger, you're not making good use of any of the innate economies of scale a larger form factor offers.

A factory works in a similar way. If you design a larger factory space with efficiencies in mind, those efficiencies will translate into lower operating costs, and greater profitability for everything produced there.

A data center is an information factory. It produces all the resources and functionality that you use on the opposite side of the Internet connection from your browser or smartphone app.

Synergy Research Group, which analyzes the companies in the data center services space, defines a hyperscale data center as, at minimum, a large complex that's operated by a "hyperscale provider." By that, the firm means an organization that manages its large facilities using the hyperscale principles listed above. Last January, the firm estimated that at the end of last year, the world had a grand total of 430 hyperscale facilities, some 40 percent of which were located in the US.

facebook-fort-worth-data-center-07.jpg

An artist's conception of Building 3 of Facebook's Fort Worth data center, when completed.

(Image: Courtesy Facebook)

Here is the very model of a modern hyperscale data center: Located in the Fort Worth, Texas area and officially opened in May 2017, Facebook's fifth hyperscale facility now includes the H-shaped Building 3, which has become the template for multiple building projects at 10 successive facilities worldwide. When completed in mid-2020, by current estimates, the complex will encompass five buildings, with a total collective space that should eclipse 2.5 million square feet.

Although Facebook hasn't provided an official estimate, reporters speaking with engineers have gathered that this building alone spans 450,000 total square feet, definitely qualifying as an AFCOM "Mega" facility. Of course, a fraction of that space is for IT equipment (white space), while another chunk is for power and support, and the center link is office space.

Within each pillar of the "H" are modular combinations of main distribution frames (MDF), also called data halls. These modules extend in two directions from the building distribution frame (BDF) in the center, stacking onto one another like sideways building blocks. The BDF contains the telecommunications cables connecting these data halls to the outside world.

facebook-data-hall.jpg

The standard layout for one segment of the "H" in a Facebook hyperscale data center.

(Image: Courtesy Facebook)

Already, you can see the real purpose of hyperscale architecture: To make it possible to administer physical space, and the physical systems inhabiting that space, with the same levels of efficiency and automation devoted to managing software.

Facebook specifies hyperscale, redefining the data center

In any market, the consumer who is capable of buying wholesale, usually in bulk, is often the one with the most purchasing power. Facebook, Google, Amazon, eBay, and Microsoft are recognized as the organizations that have deployed the most -- and in some senses, the greatest -- hyperscale data centers.

Yet Facebook has been the loudest and most assertive force in establishing guidelines for what hyperscale should be, in hopes that others will follow its lead and help drive down costs. In 2014, Facebook published its specifications for the architecture of the data centers it builds, and the assembly of its IT equipment. It told the world what it buys and why, so that manufacturers would get to work building it.

Although Facebook refrains from invoking the word "hyperscale," many equipment vendors who purport to offer components for hyperscale data centers treat Facebook's declaration as their official definition. Here is how Facebook's hyperscale fabric works:

  • Network switches are layered and disaggregated. Facebook's principal network building block is a pod, which replaces the cluster in typical configurations. The size of a pod is limited to 48 server racks -- no more, no less -- with each rack equipped with its usual top-of-rack switch (TOR), but with all 48 switches being served in turn by 4 upper-level devices called fabric switches. It's the fabric that ensures that each server in the pod is equally and redundantly connected to the entire network. This, in turn, enables a management system to route workloads to the servers best suited for them.
facebook-network-topology.jpg

A 3D diagram of Facebook's hyperscale network topology.

(Image: Courtesy Facebook)
  • Network pods are cross-connected. Each fabric switch is numbered (1, 2, 3, 4), and each number corresponds to a higher-level layer of switches that form what Facebook calls a spine plane. Here, each spine switch connects to 48 fabric switches -- again, no more, no less. It's this absolutely fixed nature of the switch arrangement which ensures that no single point in the network is "over-subscribed" -- that the bandwidth of traffic coming in is never greater than for traffic going out -- unless the administrative system perceives a short-term need for such an arrangement, and can prepare the fabric appropriately.
  • The building construction and layout are based around pods. When the data center is constructed, the physical support for all the network cables and power connections that pods require -- including pods that may yet be constructed at some future date -- is built into the BDF. So a hyperscale facility is a purpose-built component of a global network, like an electronic device that happens to take up over 100,000 square feet of space.
  • A server is a server is a server. Within this fabric, each server is a bare-bones, expendable brick. Like a hard disk drive in a RAID array, it's expected to fail, and no server is more special than another one. When it does fail, or even when its performance falls below par, it gets taken offline and replaced.

So if a pod has exactly 48 racks served by exactly 4 fabric switches, and each spine switch connects to exactly 48 fabric switches, you might be asking yourself, just where is this highly touted scalability supposed to be? It's in how the resources within this rigid, homogenized infrastructure are utilized.
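Because the fabric's shape is fixed, its size can be computed rather than surveyed. The Python sketch below tallies switches and links for a given number of pods, using the figures from the list above: 48 racks per pod, 4 fabric switches per pod, and 48 fabric switches per spine switch. Three things are assumptions on my part: that every top-of-rack switch uplinks to all 4 fabric switches in its pod, that each spine switch reaches one fabric switch per pod (which would cap a fabric at 48 pods), and the number of spine switches per plane, which the text does not give and is therefore a parameter. `FabricSummary` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass

RACKS_PER_POD = 48                # one top-of-rack (TOR) switch per rack
FABRIC_SWITCHES_PER_POD = 4       # also the number of spine planes
MAX_FABRIC_PER_SPINE_SWITCH = 48  # each spine switch serves 48 fabric switches


@dataclass(frozen=True)
class FabricSummary:
    pods: int
    tor_switches: int
    fabric_switches: int
    tor_to_fabric_links: int
    spine_planes: int
    fabric_to_spine_links: int


def summarize(pods: int, spine_switches_per_plane: int) -> FabricSummary:
    """Count switches and links for a fabric with the given number of pods."""
    # ASSUMPTION: one fabric switch per pod per spine switch, so 48 pods max.
    if not 1 <= pods <= MAX_FABRIC_PER_SPINE_SWITCH:
        raise ValueError("pods must be between 1 and 48")
    tors = pods * RACKS_PER_POD
    fabrics = pods * FABRIC_SWITCHES_PER_POD
    return FabricSummary(
        pods=pods,
        tor_switches=tors,
        fabric_switches=fabrics,
        # ASSUMPTION: every TOR uplinks to all 4 fabric switches in its pod.
        tor_to_fabric_links=tors * FABRIC_SWITCHES_PER_POD,
        spine_planes=FABRIC_SWITCHES_PER_POD,
        # Each fabric switch links to every spine switch in its own plane.
        fabric_to_spine_links=fabrics * spine_switches_per_plane,
    )


if __name__ == "__main__":
    # spine_switches_per_plane=4 is a placeholder, not a Facebook figure.
    print(summarize(pods=2, spine_switches_per_plane=4))
```

Adding a pod changes only one input. Every other number follows from the fixed ratios, and that is the kind of predictability a management system needs in order to treat the network as uniform.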

Why data center automation is like climate change

In a typical enterprise data center, equipment that's 10 years old or even older co-exists with fresh components from the factory floor. As a result, its network topology tends to acquire a certain "texture," where some segments perform better than others.

This may seem like an unrelated topic, but it isn't really: Our planet's climate is as fickle as it is because small changes to one part of the ecosystem can have cascading impacts on the rest of it. Thus, a reduction in the ozone layer due to chemical pollutants will trigger the sun to warm the oceans at a greater rate, changing the direction of air currents and causing more turbulent storms. The effects of descending air from Canada and rising air from the Gulf of Mexico are magnified.

If a data center topology is like the surface of our planet, then small changes in one aspect may have cascading effects throughout the facility. So if certain processors appear to perform better than others, for instance, workload orchestrators may prefer them over the processors in other servers. As a result, the servers with the preferred processors get hotter sooner. Air currents within the facility change, as expelled hotter air may get trapped in pockets where the air isn't circulating as much. Ironically, these local rises in temperature impact the under-utilized servers, making it more difficult for them to process data and expel heat.
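A toy Python model shows how quickly a small preference becomes a hot spot. It is a deliberately crude sketch, not a thermal simulation: accumulated load stands in for heat, and the `place` function, the server speeds, and the workload sizes are all hypothetical. One policy always picks the fastest processor; the other picks whichever server would finish the new workload soonest given what it already carries.

```python
def place(workloads, speeds, prefer_fast):
    """Assign each workload to a server and return the load per server.

    workloads: list of work amounts (arbitrary units)
    speeds: relative processor speed of each server
    prefer_fast: if True, always choose the fastest processor
    """
    load = [0.0] * len(speeds)
    servers = range(len(speeds))
    for work in workloads:
        if prefer_fast:
            # Naive policy: the "best" processor wins every time.
            target = max(servers, key=lambda i: speeds[i])
        else:
            # Load-aware policy: choose the server that would finish soonest.
            target = min(servers, key=lambda i: (load[i] + work) / speeds[i])
        load[target] += work
    return load


if __name__ == "__main__":
    speeds = [1.0, 1.0, 1.2, 1.0]  # one server is slightly faster
    workloads = [10] * 8           # eight identical jobs
    print("prefer fastest:", place(workloads, speeds, prefer_fast=True))
    print("load-aware:   ", place(workloads, speeds, prefer_fast=False))
```

A 20 percent speed advantage is enough for the naive policy to pile every job onto one machine, which is the cascading effect the climate analogy describes.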

Facebook is among those few organizations with the purchasing power to specify exactly how its data centers are architected, constructed, and operated, right down to the grade of cement used in their foundations.

But by publishing these specifications, Facebook leveraged its power as a top-of-mind, globally recognized brand to set the rules for an entire technology market -- one that is geared exclusively around the hyperscale data center, and one that will not be exclusive to Facebook. In this market, if you're any kind of equipment supplier (servers, power modules, network cables, blanking panels, floor tiles, fire extinguishers) then your equipment had better be produced to these specifications, else you're liable to be ignored.

Hyperscale at the server level

This is where Facebook's power to define the market, or at the very least lead the definition of it, impacts digital technology as a whole. Not only did Facebook specify the architecture and components for a hyperscale data center, but by spearheading the creation of the Open Compute Project (OCP), it put forth a new set of rules for hyperscale servers -- the workhorses of large data centers everywhere.

Mind you, Facebook did not raise the bar for data center managers' expectations. In fact, lowering the bar was essentially the point. While manufacturers such as Dell, HPE, Lenovo, and IBM were marketing their top-of-line, state-of-the-art systems as tailor made for the most intensive applications -- as enterprise-grade supercomputers -- OCP re-cast servers as bit players in a massive ensemble. "White box" servers, as they're still called, are so undistinguished that it's arguable they don't deserve brand names.

At the time OCP was founded, it appeared none of these producers could afford to lose even one of these hyperscale operators as its customer. Such a loss would constitute a measurable percentage of its revenue, and in turn, reflect negatively on its stock value. On the other hand, acquiescing to Facebook's central, implied argument -- that bulk servers are more cost-effective at large scale -- called into question these producers' entire server value proposition for the general enterprise. If a PowerEdge, ProLiant, ThinkSystem, or Power Server is just one cog in a massive wheel, then what would it matter if it were the best cog in the entire rotation? Where is the return on investment for a premium that can't be measured?

One clue to an answer comes from eBay. In 2013, the e-commerce and auction service entered into a partnership agreement with Dell, making it eBay's preferred producer of what was being called "density-optimized servers" for eBay's hyperscale facilities.

ebay-servers.jpg

EBay's own branded servers operating in its own data center.

(Image: Courtesy eBay)

Last September, however, claiming the urgent need for "re-platforming our infrastructure" to enable more granular workload management using open source tools, eBay made a U-turn, opting to be its own server designer from that point forward. Now, as part of a three-year overhaul plan, the company is implementing its own white box servers, presumably the only customized part being eBay's faceplates.

As early as 2014, Data Center Knowledge ascertained that the cost differences between OCP-compliant white box servers and the brand-name components they would replace were being perceived by enterprises as not enough to warrant sacrificing the trust relationships they have with their current suppliers. Now, eBay is following Facebook's lead, designing very standardized servers to fit its hyperscale needs, and outsourcing their production to an original design manufacturer (ODM). Evidently, eBay recognizes even further cost savings by leveraging its power as a hyperscale service provider to implement its own architecture at micro-scale as well as macro-scale.

But that's a level of bargaining power that simply does not scale down. Few enterprise customers have the expertise on-hand to make server design decisions for themselves; that's why they rely on trusted manufacturers who continue to promote "end-to-end solutions."

Nevertheless, large enterprises are insisting upon some form of the flexibility in workload management and asset management that hyperscalers have carved out for themselves. It's one reason why Kubernetes has surged so rapidly in popularity; it's a workload orchestration system derived from the "Borg" system that Google engineers built in-house.

So hyperscale design, perhaps in fits and starts, has had a measurable impact on the architecture of servers marketed to organizations other than hyperscale service providers. But engineers' attentions have recently been turned away from centralized facilities, toward distributed operations centers in remote locations -- those places on the frontiers of data processing and delivery called "the edge."

Is "Hyperscale" just another word for "Hyperconverged?"

No. They have the same prefix, which is their principal similarity. Hyperconvergence (HCI) refers to the ability of a data center infrastructure management (DCIM) system to pool together the resources from multiple servers (compute, memory, storage, networking) and delegate those resources to individual workloads. It's a way to think of everything a server contributes to the data center as though it were a fluid, instead of being locked away in separate boxes.
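The "fluid" idea is easiest to see with storage. In the minimal Python sketch below, several servers contribute their free capacity to one pool, and a volume larger than any single server could hold is carved out across them. This illustrates pooling in general, not any vendor's HCI implementation; `allocate_volume`, the server names, and the capacities are hypothetical.

```python
def allocate_volume(free_tb: dict[str, float], size_tb: float) -> dict[str, float]:
    """Carve a volume out of pooled capacity, spreading it across servers.

    free_tb maps server name to free terabytes and is updated in place.
    Returns how many terabytes of the volume land on each server.
    """
    if size_tb > sum(free_tb.values()):
        raise ValueError("not enough free capacity in the pool")
    placement: dict[str, float] = {}
    remaining = size_tb
    # Draw from the servers with the most free space first.
    for name in sorted(free_tb, key=free_tb.get, reverse=True):
        if remaining <= 0:
            break
        take = min(free_tb[name], remaining)
        if take > 0:
            placement[name] = take
            free_tb[name] -= take
            remaining -= take
    return placement


if __name__ == "__main__":
    pool = {"server-a": 4.0, "server-b": 6.0, "server-c": 2.0}
    # No single server has 8 TB free, but the pool as a whole does.
    print(allocate_volume(pool, 8.0))
    print(pool)
```

The workload asking for the volume never learns which boxes it landed on.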

A hyperscale data center may make good use of servers that happen to employ HCI. But HCI is not an absolute requirement for making data centers efficient in their use of space, power, and cooling.

Does "hyperscale" just mean "a big, big cloud?"

No. Granted, a cloud platform is a means for deploying flexible workloads on a group of servers clustered together. And yes, the resources on those servers may be dialed up or down, which is a feature that hyperscale operators demand.

But a cloud platform such as VMware Cloud Foundation or OpenStack is geared for enterprise administrators to manage resource requirements (along with the occasional resource desire) that are unanticipated and subject to frequent change. Administrators can go into the system and make adjustments or new provisioning requests at will. Indeed, one of the US Government's requirements for a cloud computing service, as published by the National Institute of Standards and Technology (NIST), an agency of the Department of Commerce, is that its resources may be provisioned using a self-service portal. A hyperscale environment, by contrast, is geared for consistent automation. If it's running properly, there's no need to "go in" and make changes; it will be managing workloads next year much the same way they're being managed this year.

So although Amazon's public-facing services certainly qualify as a cloud, its hyperscale operations for managing that cloud are a different substance altogether.

Colocation extends hyperscale to the enterprise

There was a time not long ago when an enterprise data center was presumably located somewhere within the enterprise. For many organizations, a data center may still be centrally located on-premises, or it may rely on several facilities distributed across multiple branches.

But in a modern business environment, colocation is an extremely attractive option. A colocation agreement, like any other real estate deal, is a lease on an area of space within the lessor's data center facilities. It enables a tenant to deploy its own equipment in a building that's typically large, very well-managed, strongly secured, and well-powered and cooled.

With its proximity to Washington, DC, Ashburn, Virginia has become the US' most competitive location for data center services. In September 2018, I co-produced a webinar along with Data Center Knowledge and RagingWire Data Centers, which is building colo facilities in Ashburn, among other places. RagingWire is a colo provider for enterprises that have their own IT assets, and need a well-secured location with strong connectivity to make optimum use of them.

In this webinar, I presented this diagram of the interior of one of the Ashburn complex's recently constructed buildings. Called Ashburn VA3, it's compartmentalized into so-called data vaults, each of which is geared for different power and cooling configurations based on workload requirements.

ragingwire-ashburn-va3.jpg

Cutaway view of RagingWire's Ashburn, Virginia VA3 facility.

(Image: Courtesy RagingWire Data Centers)

It's not exactly Facebook architecture, where the entire complex is essentially homogenized. And at a mere 245,000 square feet, it's not quite the colossus that a big Facebook "H" can be. But taking into account that various tenants will be bringing in a variety of heterogeneous equipment, RagingWire does employ a kind of hyperscale-inspired method of controlling space, and automating the distribution of airflow and power within these spaces.

Ashburn VA3, and the other RagingWire buildings that surround it, represent hyperscale methodology that's been brought down somewhat closer to Earth. Because its construction principles are rigidly specified and repeatedly practiced, the company can build a facility in a timeframe closer to eight months than two or three years. And the management practices and methodologies that apply to one building can easily be adjusted to apply to others.

So here is where hyperscale architecture and principles meet the enterprise, giving businesses a way to host both their infrastructure and their workloads in a manner derived from the standards set forth by Facebook, Amazon Web Services (AWS), and the others in the upper-level cloud space. Although our attention is often focused on handheld devices as the center of conversation in tech today, the actual center of computing activity is the hyperscale data center -- which is becoming both larger and smaller. History may yet record the hyperscale facility as the device that defines technology in the 21st century.
