On any list of businesses that can't afford downtime or system failure, power companies have to be close to the top. So when New Zealand electricity and gas generator and retailer Genesis Energy experienced a series of flaws in its backup and recovery systems, it had to act.
Over three years of use, unexpected hardware and interface errors were causing backups to take longer than expected and making it impossible to guarantee targets for system recovery could be achieved.
Genesis Energy is a state-owned enterprise with a diverse electricity generation portfolio in New Zealand. It owns and operates 1,640MW of electricity generation, including New Zealand's largest thermal power station at Huntly. It has approximately 700,000 customers and around NZ$1 billion in revenue.
Right across the business of a typical power company, so-called mission critical systems abound. Genesis Energy, a state-owned enterprise, is no exception. If anything, its needs may be more complicated than some.
The company, with revenue of close to NZ$1 billion a year and 700,000 customers largely in New Zealand's North Island, operates a wide range of electricity generation operations including the country's largest coal-fired power station, in Huntly, south of Auckland. It also operates a wind farm, hydroelectric generation and cogeneration with large industrial companies. It is a wholesaler as well as a retailer of electricity, and it also retails and explores for gas.
The company's IT infrastructure is similarly heterogeneous, featuring a server layer of 150 PC servers running Windows and Linux as well as 10 high-end Sun Solaris servers. These provide IT service to nearly 1,000 internal and outsourced staff on 600 PCs and laptops.
Key applications include customer billing and provisioning systems, energy trading systems and a range of database management systems including Oracle and SQL as well as Exchange.
Retail energy customer billing is outsourced while large business and wholesale customer billing is managed internally. However, for backup purposes, all customer data eventually falls back within Genesis Energy's regime. Call centre and most IT services are also outsourced to a range of third-party providers.
The company requires that its customer applications be restorable within two hours of any outage, trading applications within four hours and all other applications within eight hours. Each day, data has to be backed up within a 12-hour window starting at 6pm. Full backups, across three major and four satellite sites, involve 20TB of data while partial backups involve 5TB.
But persistent technical problems, including SCSI and hardware errors, meant compliance with these deadlines could not be guaranteed.
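To put those targets in perspective, the short sketch below works out the average sustained throughput the stated backup window implies, and checks a restore time against the recovery targets. The figures for the window, data volumes and recovery targets come from the article; everything else is an assumption made for illustration, including decimal units (1TB taken as 1,000,000MB), the idea that a full backup would have to fit inside a single 12-hour window, and the function names, which are hypothetical and not part of any product.

```python
# Figures taken from the article.
WINDOW_HOURS = 12                                        # nightly window starting at 6pm
BACKUP_SIZES_TB = {"full": 20, "partial": 5}             # data volume per backup type
RTO_HOURS = {"customer": 2, "trading": 4, "other": 8}    # recovery targets by application class


def required_throughput_mb_s(size_tb, window_hours):
    """Average rate in MB/s needed to move size_tb within window_hours.

    Assumes decimal units (1 TB = 1,000,000 MB) and a single window.
    """
    size_mb = size_tb * 1_000_000
    return size_mb / (window_hours * 3600)


def meets_rto(category, restore_hours):
    """True if a restore taking restore_hours is within the target for category."""
    return restore_hours <= RTO_HOURS[category]


if __name__ == "__main__":
    for kind, size_tb in BACKUP_SIZES_TB.items():
        rate = required_throughput_mb_s(size_tb, WINDOW_HOURS)
        print(f"{kind} backup: about {rate:.0f} MB/s sustained")
```

Under those assumptions a full backup needs roughly 463MB/s sustained across all sites and a partial backup roughly 116MB/s, which suggests how little slack there is for hardware errors or retries inside the window.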
Until late 2005 Genesis Energy was using CA's BrightStor. Partner manager for CA in New Zealand, Mike Ferguson, says the issues with the system were "unfortunate". He says backup and recovery processes were under the control of one of Genesis Energy's outsourcers at the time. CA did not have direct control and could not enforce changes to procedures.
Despite CA providing training and operational advice, Ferguson says, the partner failed to follow "basic operating procedures".
The backup and recovery process works hardware pretty hard. CA identified hardware problems as well, but these were also beyond its control.
In early 2005, Genesis Energy decided it was time to review its shaky backup systems and to go to market for the best solution available.
However, there were other drivers of change including concerns about corporate governance and compliance. Mike Roigard, service delivery manager and project sponsor, explains that this was largely in anticipation of new requirements rather than in reaction to them. "We were looking at what we may have to do. We look at international markets and see what's happening there."
Roigard's sponsorship of such a project could be considered unusual. Projects involving corporate governance or compliance are often sponsored by one of the C-level team. It is on them that most compliance accountability rests. Roigard says the flat management structure at Genesis made it possible for him to sponsor the project with the support of his CIO.
Three different products made Genesis Energy's backup and recovery shortlist: CA BrightStor, Veritas (now owned by Symantec) and CommVault Systems Galaxy software. Evaluation was quick but thorough and included talking to other users and site visits.
Sean Heffernan, solutions architect, says the team was looking for a solution that was robust and scalable and would handle all of the company's key applications. But they were also seeking an integrated system, one that made it possible to achieve a single view of the company's various environments and IT layers. On top of that, any system had to be easy to use for the operations team.
The selection was also made with an eye to future requirements. Heffernan says the change was not really considered a backup and recovery project. It was thought of as a data management project. As such, much of the evaluation came from a data management perspective, focusing on functionality such as migration capabilities and archiving.
The most pressing issue faced was implementation time. Roigard wanted the project completed before the summer holiday season. He did not want to find himself and his team having to cope with a hybrid of old and new systems over that period.
Heffernan says that tight timeframe presented not only technical challenges but also challenges in managing change.
After weighing its options, Genesis decided CommVault's Galaxy was the best fit.
Software implementation was completed by CommVault and Genesis' partner Gen-i. At different times over eight weeks, up to 15 people worked on the project, which came in on time and, according to Roigard, "slightly under budget".
Heffernan says no code or interfaces had to be written for the system to work in Genesis' environment but some changes were required to the configuration of the company's Oracle databases.
The IT team at Genesis holds daily morning meetings to discuss activity. Roigard says in the past it was a rare day when some form of trouble with backup or recovery did not surface.