How does one of the world's largest online retailers and cloud services companies organize its IT infrastructure? Along service-oriented architecture (SOA) principles, of course. However, while this approach served the organization well for efficient, customer-centric development, it became a nightmare at deployment time. Time to automate, courtesy of project "Apollo."
In a post at the end of last year, Werner Vogels, CTO of Amazon.com, outlined the purpose and benefits of Apollo, the company's SOA-based deployment engine. Amazon, of course, is a far-flung organization with many different units and projects under way. Its IT is structured along the lines of specific services, each developed and owned by a dedicated team.
However, things were constantly getting crunched at deployment time, Vogels reports. "Deploying software to a single host is easy," he said. "You can SSH into a machine, run a script, get the result, and you're done." The Amazon production environment, however, is more complex than that, since applications and web services would need to be "run across large fleets of hosts spanning multiple data centers," he continued. This is where the bottlenecks kept taking place, he said. "Manual deployment steps slowed down releases and introduced bugs caused by human error. Many teams started to fully automate their deployments to fix this, but that was not as simple as it first appeared."
The solution was to introduce an enterprise-wide shared deployment service that automatically sequences software updates across servers. "Developers could define their software setup process for a single host, and Apollo would coordinate that update across an entire fleet of hosts," says Vogels. "This made it easy for developers to 'push-button' deploy their application to a development host for debugging, to a staging environment for tests, and finally to production to release an update to customers."
Apollo now handles at least 50 million deployments annually to development, testing and production hosts, Vogels relates. (That works out to more than one deployment every second on average, he illustrates.) The deployment system adds a high degree of intelligence to the process, performing rolling updates so that only a fraction of servers are taken offline at a time, as well as providing data on deployment status. "Thousands of Amazon developers use Apollo each day to deploy a wide variety of software, from Java, Python, and Ruby apps, to HTML web sites, to native code services."
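To make the rolling-update idea concrete, here is a minimal Python sketch of how a single-host deployment step might be sequenced across a fleet so that only a fraction of hosts is touched at a time. It is an illustration only and not Apollo's actual implementation or API: the function `rolling_deploy`, the `deploy_to_host` callback, the `max_offline_fraction` parameter and the host names are all hypothetical, and the stop-on-failure behavior is an assumption rather than something Vogels describes.

```python
from typing import Callable, Dict, List


def rolling_deploy(
    hosts: List[str],
    deploy_to_host: Callable[[str], bool],
    max_offline_fraction: float = 0.25,
) -> Dict[str, str]:
    """Apply a single-host deploy step across a fleet, one batch at a time.

    Hypothetical sketch: deploy_to_host stands in for whatever per-host
    setup process a developer defines; it returns True on success.
    """
    if not 0 < max_offline_fraction <= 1:
        raise ValueError("max_offline_fraction must be in (0, 1]")

    # Always update at least one host per batch so small fleets still progress.
    batch_size = max(1, int(len(hosts) * max_offline_fraction))

    # Per-host status doubles as the deployment report.
    status = {host: "pending" for host in hosts}

    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            status[host] = "succeeded" if deploy_to_host(host) else "failed"
        # Assumption: halt before the next batch if any host in this one failed,
        # leaving the remaining hosts untouched on the old version.
        if any(status[host] == "failed" for host in batch):
            break

    return status


# Example: an 8-host fleet updated two hosts at a time; host-5 fails.
fleet = [f"host-{i}" for i in range(8)]
report = rolling_deploy(fleet, lambda host: host != "host-5")
```

In this example the rollout stops after the batch containing host-5, so host-6 and host-7 remain "pending" and keep serving the previous version. A production system would also need health checks, retries and rollback, which are left out here.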