We know that Linux on servers is big and getting bigger. We also knew that Linux, thanks to open-source cloud programs like Eucalyptus and OpenStack, was growing fast on clouds. What he hadn't know that Amazon's Elastic Compute Cloud (EC2), had close to half-a-million servers already running on a Red Hat Linux variant.
Huang Liu, a Research Manager with Accenture Technology Lab with a Ph.D. in Electrical Engineering whose has done extensive work on cloud-computing, analyzed EC2's infrastructure and found that Amazon EC2 is currently made up of 454,400 servers.
While Amazon has never officially said what it's running as EC2's base operating system, it's generally accepted that it's a customized version of Red Hat Enterprise Linux (RHEL). On top of that, for the virtual machines, Amazon uses the Xen hypervisor to host Linux; OpenSolaris; Solaris; Windows 2003 and 2008; and FreeBSD and NetBSD virtual machine instances.
Amazon also doesn't talk about how many servers their popular cloud is made up of, so Huang had to work it out. He explained, "Figuring out EC2's size is not trivial. Part of the reason is that EC2 provides you with virtual machines and it is difficult to know how many virtual machines are active on a physical host. Thus, even if we can determine how many virtual machines are there, we still cannot figure out the number of physical servers. Instead of focusing on how many servers are there, our methodology probes for the number of server racks out there."
Huang continued, "It may sound harder to probe for the number of server racks. Luckily, EC2 uses a regular pattern of IP address assignment, which can be exploited to correlate with server racks. We noticed the pattern by looking at a lot of instances we launched over time and running traceroutes between our instances."
Then "Understanding the pattern allows us to deduce how many racks are there. In particular, if we know a virtual machine at a certain internal IP address (e.g., 10.2.13.243), then we know there is a rack using the /22 address range (e.g., a rack at 10.2.12.x/22). If we take this to the extreme where we know the IP address of at least one virtual machine on each rack, then we can see all racks in EC2."
By itself, though, that's not enough. You could use try to use port-scanning to work ot how many servers there are, but that would violate Amazon's terms of service. So instead, since each Amazon Web Services (AWS) "instance also has an external IP address. … we can leverage DNS translation to figure out the internal IP addresses."
With that data, he was able to work out the number of server racks. With this he then just multiplied by the number of physical servers on the rack. "Unfortunately, we do not know how many physical servers are on each rack, so we have to make assumptions. We assume Amazon has dense racks, each rack has 4 10U chassis, and each chassis holds 16 blades for a total of 64 blades/rack."
So it is that Huang worked out how many servers there are in the EC2 cloud. It's an impressive achievement for him and it's an impressive example of just how important Linux is in both server and clouds.