This past week we saw major outages on popular social media websites Facebook and Foursquare. Foursquare had two multi-hour outages on consecutive days, with the explanation that the servers were being upgraded to handle the excessive load being placed on them. As I write this, Foursquare is in the middle of another outage, with a message on their website saying that the servers are being upgraded.
This is actually fairly common in startups. The company founders start out very small, build a product and get it running quickly to show investors so they can afford to build it out further. The problem is that many companies, even after they've reached a level of success and popularity, never get out of the mindset that it's okay to just push updates and suffer multi-hour outages.
Or as I like to call it, "amateur hour".
Quite often infrastructure is overlooked as a critical aspect of running an online service or business. All too often company founders aren't technically savvy, and have no idea what an IT department really does. They invest in developers but overlook system and network administrators, often to their detriment. The bean counters that control the company budget give short shrift to the IT team that's in charge of keeping the business online.
Sometimes an IT department is completely overlooked, and developers are expected to handle the sysadmin tasks; quite often they are unfamiliar with these responsibilities and underqualified to handle them.
This lack of attention to infrastructure results in underpowered, underfunded, and downtime-prone networks that are not up to the task of keeping an online business viable. Social media sites like Twitter, Facebook and Foursquare are perfect examples of what not to do when ramping up your business.
The two most egregious mistakes an online service can make are: 1) not planning for future growth; 2) making substantial system changes during the busiest hours of the day. When you're trying to build up business relationships with other companies to generate revenue, and want to look good to your investors so they keep funnelling cash to you to grow, you don't take your site down for 8 hours during normal business operations to upgrade your servers without notice.
This bears repeating: DO NOT TAKE YOUR SERVERS DOWN DURING BUSINESS HOURS.
The only acceptable reason for downtime during business hours is an unforeseeable emergency. If you need to upgrade your servers, do it in the off-hours. After 7pm in the busiest part of the world (usually the US). On weekends. Plan it in advance, and give your customers/users plenty of lead time to know that it's coming so they can plan accordingly. If you provide a new service that people have come to depend on, and you take it down for the entire work day without warning, it makes your company look like a bunch of incompetent idiots that have no idea what they're doing.
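To make the off-hours rule concrete, here is a rough sketch of a check a deploy script could run before touching production. Everything in it is my own assumption for illustration: the function name `in_maintenance_window`, the 7pm start taken from the paragraph above, and the 6am cutoff, which I made up so that the overnight hours count as off-hours too.

```python
from datetime import datetime, timedelta, timezone


def in_maintenance_window(when, busiest_tz, evening_start_hour=19, morning_end_hour=6):
    """Return True if `when` falls in the off-hours of the busiest region.

    `when` must be a timezone-aware datetime. `busiest_tz` is the tzinfo of
    the region where most users are (an assumption you supply). The 19:00
    start follows the "after 7pm" rule; the 06:00 end is a hypothetical cutoff.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    local = when.astimezone(busiest_tz)
    # Saturday is 5 and Sunday is 6 in datetime.weekday().
    if local.weekday() >= 5:
        return True
    # Weekday evenings and the early-morning hours both count as off-hours.
    return local.hour >= evening_start_hour or local.hour < morning_end_hour


# Fixed UTC-5 offset as a stand-in for US Eastern time. It ignores daylight
# saving time; a real script would more likely use zoneinfo.ZoneInfo.
US_EAST = timezone(timedelta(hours=-5))

tuesday_morning = datetime(2024, 3, 5, 14, 0, tzinfo=timezone.utc)  # 09:00 local
tuesday_evening = datetime(2024, 3, 6, 1, 0, tzinfo=timezone.utc)   # 20:00 local
print(in_maintenance_window(tuesday_morning, US_EAST))  # False
print(in_maintenance_window(tuesday_evening, US_EAST))  # True
```

A check like this only stops the careless mid-day upgrade. It does nothing about announcing the window in advance, which still has to be done by a person.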
An even better idea would be to measure your growth rate right from the beginning and ramp up your infrastructure accordingly, before a sudden surge in traffic finds you unable to meet the demand.
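As a back-of-the-envelope illustration of that kind of analysis, the sketch below estimates an average week-over-week growth rate from a few traffic samples and counts how many weeks remain before load passes what the current servers can handle. The function names, the sample numbers, and the assumption that growth stays at a constant compound rate are all mine; real traffic is lumpier, so treat the answer as an early warning and not a forecast.

```python
def weekly_growth_rate(counts):
    """Average compound week-over-week growth rate from weekly load samples."""
    if len(counts) < 2 or counts[0] <= 0:
        raise ValueError("need at least two samples and a positive first sample")
    periods = len(counts) - 1
    return (counts[-1] / counts[0]) ** (1 / periods) - 1


def weeks_until_over_capacity(current_load, capacity, rate):
    """Whole weeks until load first exceeds capacity at a constant growth rate.

    Returns 0 if load is already over capacity, and None if load is not
    growing (in which case capacity is never exceeded under this model).
    """
    if current_load > capacity:
        return 0
    if rate <= 0:
        return None
    weeks = 0
    load = current_load
    while load <= capacity:
        load *= 1 + rate
        weeks += 1
    return weeks


# Hypothetical numbers: peak requests per minute over three weeks, and a
# server farm that tops out at 20,000 requests per minute.
samples = [1000, 2000, 4000]
rate = weekly_growth_rate(samples)
print(f"growth per week: {rate:.0%}")
print("weeks of headroom:", weeks_until_over_capacity(8000, 20000, 0.25))
```

If the headroom comes out shorter than the time it takes you to order, rack, and test new hardware, you are already late.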
You know those parties you throw at trade shows, where you give away thousands of dollars' worth of free swag? Stop doing that until your business is stable. How many dotcom-era T-shirts from long-vanished companies are sitting in your clothes bureau right now? Spending money on goodwill and marketing is fine, but doing it at the expense of actually running a viable service is not. It's a good way to drain all of your VC money and have absolutely nothing to show for it.
The key ingredient in heading off a situation like this is investing in hardware, networking, and the people to support it the moment you get your first round of funding. Hire at least one system administrator, a generalist who can build out your beginning server farm and network connectivity with an eye on future growth.
Pinning your company's future on a handful of desktop computers with no backups and no failover shows a complete lack of foresight. If you can't host it yourself, rent virtualized servers in a datacenter and let their infrastructure be your backup. Usually with a service like that you can simply rent more virtual servers and bandwidth as needed.
Your IT group is essential to your online service. If there's too much for them to do and not enough people to handle it, it's probably time to expand the group. If your service is successful, your revenue will pay for it. A successful, stable service is looked upon favorably by investors and by the companies that want to do business with you. If your service is unreliable because you skimped on infrastructure, they will see you and your company as unreliable and untrustworthy.
I personally have worked at companies where they invested early in hardware and IT, and had good practices for rolling out software and hardware changes. They were structured and organized and they continue to be successful (or got bought out by bigger, more successful companies) to this day. Downtime almost never occurred unless it was planned and during a maintenance window scheduled ahead of time.
I have also worked at companies where the infrastructure was slapdash and changes were pushed out without warning in the middle of the day, often bringing the site tumbling down. Infrastructure was only expanded once outages due to overload had the site down more often than it was up. This kind of thinking was reflected in the company office itself: disorganized, lazy, uncaring, and quite often a mess.
Then there are the companies that have a messy infrastructure, but are willing to change. I've worked at those, too, and was glad to be part of the transition from sloppy to successful. There's a great sense of satisfaction from bringing a network infrastructure out of a tangled mess of wires and hodge-podge hardware and into a streamlined, organized operation that's easier to manage and maintain.
"Investing in the future" may sound like a cliche aphorism, but when taking into consideration how large an effect it can have on a company's network infrastructure, it doesn't sound so silly at the end of the day.