National Australia Bank (NAB) revealed it has learned some lessons in clustered database development as part of the bank's move to a real-time data warehouse.
At first the bank used the same principles to connect multi-node clusters as it had for stand-alone servers, an approach it discovered wasn't ideal.
"We built our re-architected database as two separate clusters, and we realise now that we made a mistake," Manny Avellino, senior database administrator, National Australia Bank, told delegates at Oracle's OpenWorld conference in San Francisco yesterday.
"Because they were separate clusters, we could not share the disk allocation, and more importantly, we could not share the node capacity," he said.
"This has resulted in some performance shortcomings over the last three to four months as we start to migrate more and more processing from our old data warehouse," he said without elaborating on the shortcomings.
For the past few months the bank has been transferring data from its legacy data warehouse -- commissioned in 1998 -- to a new one that will be able to process data in real time. The legacy warehouse uses a single Oracle 9i database that was not dedicated to data warehousing and suffered capacity issues.
The fledgling data warehouse is based on two Oracle 10g RAC (Real Application Clusters) databases (version 10.2.02) running on Red Hat Linux and powered by Dell Itanium 2 servers.
The NAB had to reconfigure its set-up by moving a node from one cluster to the other.
"We could've avoided the reconfiguration had the two environments shared a cluster," said Avellino, adding this type of design was "critical" to the success of clustered databases.
The two environments consist of the operational data store and the analytic, or presentation, data store.
The operational data store will receive data from 100 cluster-source systems and will be accessed only by applications, not end users, according to Avellino.
This data -- consisting of record-keeping data up to 10 years old, data marts, and real-time reporting applications -- will later be fed to the presentation data store, which can be accessed by all end users.
Currently this database holds about 11 terabytes of data, with growth of three or four terabytes expected each year, according to Avellino.
These environments were the first multi-node database clusters built in NAB's data centre, said Avellino.
Avellino said the new warehouse's current restrictions on multi-node throughput should improve with a disk connectivity redesign that the Bank expects to conduct with its storage partner EMC from November.
The NAB is also expecting improved throughput for the warehouse following the deployment of Red Hat 4.
"Red Hat 3, our current configuration, has on average only given us less than a 56k operating system block size to the disk," he said.
The bank could have adopted Red Hat 4 in February this year, when it approached the limit for connected SCSI devices, according to Avellino, but it chose to resize the devices rather than wait for certification of some of the components.
Now the bank has opted to go ahead with the Linux upgrade.
"Our Red Hat 4 testing has lifted this average operating system I/O block size to around 256k, which again will give us significant performance gains over Red Hat 3," said Avellino.
"So that migration is currently taking place as we speak."
The new data warehouse will come into full operation after the decommissioning of the existing one.
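As a rough illustration of why the larger operating system I/O block size Avellino described matters, the sketch below counts how many I/O requests it would take to move the same amount of data at each average block size. It assumes "56k" and "256k" mean kilobytes and uses a hypothetical 1 GB scan. Neither the scan size nor the function name comes from the bank, and because Avellino said the Red Hat 3 average was less than 56k, the Red Hat 3 figure is a lower bound on the request count.

```python
import math

KIB = 1024  # assumption: "k" in the quoted figures means kilobytes (1024 bytes)


def io_requests(total_bytes: int, block_bytes: int) -> int:
    """Number of I/O requests needed to move total_bytes at a given average block size."""
    return math.ceil(total_bytes / block_bytes)


# Hypothetical workload: a 1 GB sequential scan (not a figure from the article).
scan_bytes = 1024 ** 3

red_hat_3 = io_requests(scan_bytes, 56 * KIB)   # average block size quoted as under 56k
red_hat_4 = io_requests(scan_bytes, 256 * KIB)  # average block size quoted as around 256k

print(f"Red Hat 3: at least {red_hat_3} requests")
print(f"Red Hat 4: about {red_hat_4} requests")
print(f"Reduction factor: roughly {red_hat_3 / red_hat_4:.1f}x")
```

Fewer, larger requests generally mean less per-request overhead for the same volume of data, which is consistent with the performance gains the bank expects, although actual throughput depends on the storage configuration.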
Steven Deare travelled to Oracle OpenWorld as a guest of Oracle.