With the introduction of its Operational Sustainability benchmark on July 1st, the Uptime Institute will have a benchmark to complement its 10-year-old datacenter tier standard. The new benchmark addresses the long-term ability of a datacenter to avoid operational outages.
But rather than evaluating the datacenter facility the way the tier standard does, this new benchmark evaluates the datacenter's processes, staffing, and location. Because the full details of the benchmark are not yet available, it will be some time before its effectiveness can be determined. For example, we don't yet know whether the location evaluation method will doom datacenters in Iceland to a poor benchmark rating, or whether any datacenter in earthquake-prone California will be able to pass muster.
Processes and staffing are clearly important issues, and metrics for evaluating how well a datacenter's staff is trained and how well proper procedures are documented and implemented would be useful. Google, for example, was clearly of the opinion that its datacenter staff was well trained and that it had proper procedures in place to prevent downtime for AppEngine users. Yet it had a significant outage earlier in the year because the procedures for dealing with the problem that datacenter staff encountered after a power outage were not properly documented, nor was appropriate staff on-site to solve the problem.
The Operational Sustainability ratings will fall into three sliding-scale classes that are based on the tier level classification of the datacenter; the higher the tier rating, the more stringent the requirements for each class. Obviously, this benchmark will only make sense for datacenters already tier-rated by the Uptime Institute, though the Institute does say that the tool will be valuable for self-assessment.
Only the actual availability of the benchmark will allow us to judge its practical value. In light of incidents such as the one that took down Google's AppEngine, I'm not sure how well an outside entity can evaluate processes and procedures for any given datacenter. As the military says, no battle plan survives contact with the enemy. So how well can untested plans be evaluated?