Analytics teams need dedicated infrastructure or risk being held back by their own IT departments, according to the head of data insights at a major UK bank.
The Royal Bank of Scotland, which generates copious data from more than 400 branch-based transactions for businesses per minute, has just deployed a parallel data warehouse purely for analytics.
"Technology functions are still finding it difficult to keep up with business demands. Working in analytics and being responsible for analytics, I can't let that happen," said RBS chief analytics officer, Customer Solutions Group, Alan Grogan.
"I can't turn around to my CEO or to my customers — or even just if I want to retain staff — to say 'I'm sorry, we just don't have the scalability, the flexibility or the control on the domain'."
More than four years ago, when Grogan left Barclays to join RBS, he emphasised the need for a specialist analytics infrastructure.
"What I didn't want is a position — and I've seen this in other organisations — where the analytics team are too advanced for the technology function or the technology function can't keep up," he said.
"I'm a big believer in this: analytics should run its own technology. So I rarely get my RBS group technology involved in maintaining my ecosystem."
After discussing a proof of concept for a new dedicated analytics system with a number of vendors, and considering flexible environments such as MongoDB and Teradata Aster, RBS last year opted for Microsoft SQL Server 2012 Parallel Data Warehouse (PDW) on HP AppSystem.
"It's exactly what it says on the tin — it's a parallel data warehouse. It computes faster than anything else I have access to in the bank," Grogan said.
"It starts off at 75TB for a quarter rack on two nodes, so it's a fairly decent initial space but to build on top of that is something we're already looking at."
He described PDW, which entered proof-of-concept testing at the bank last November, as the latest version of SQL Server, optimised to handle big data.
"SQL server, good as it is, has some capacity issues we experienced over a certain large volume of terabytes and we needed something that was a lot more scalable, a lot more strategic and in theory could go to cloud as well," Grogan said.
Before PDW, much of the bank's analytics ran on its Teradata, Oracle and SQL Server systems.
"What you don't want is siloed information or information that you've got to work across partitions or infrastructure," Grogan said.
"So we're effectively sucking it all into PDW, having it all in one place, rather than running an ecosystem that is an ecosystem across warehouses and software stacks.
"Obviously we've got to build in feeds with legacy. We've got to build in feeds to work on future bank technology. So we're pretty much in implementation but we're making decisions on it today much quicker than we were in the olden times."
His goal for this year is to have PDW scaled up and optimised with all the information in a single store, away from legacy systems.
"Some banks have tried to do analytics on enterprise data warehouses. The tons of analytics we do, if we dared do that, you might actually stall because of the computation power that you're pulling. So we're very careful about doing that," Grogan said.
"When I say we maintain our own analytics infrastructure, I mean exactly that. We maintain a pure, cerebral infrastructure that is only used for analytics and analytical processes."
But that processing independence doesn't preclude Grogan from being a strong advocate of spreading access to analytics via a secure, governed, self-service portal where staff can research economic and portfolio data.
"That was one of the first strategies I implemented. Democratisation — I'm a big fan but it has to be done securely, adequately and all your data has to join up," he said.
"The thing about it is every stone you turn over to answer a question gives you a lot more pebbles underneath. When you democratise, you've got to make sure you keep people on the right track."
RBS, which has 141,000 employees and more than 24 million customers worldwide, has already seen productivity gains from the Microsoft PDW system.
"There are efficiency savings where you wouldn't have to wait for processes to follow through. You don't have processes that fall over because of a network surge or something in my area," Grogan said.
"We had that before. Unexpectedly, four of my guys ran massive queries and the infrastructure just stalled.
"What we had pre-PDW was an infrastructure that wasn't fully scalable and it wasn't analytically strategic and it wasn't fully cost-efficient."
Grogan's team of data scientists and database administrators provide the bank's executives and customer engagement teams with insights from three core areas.
The first is product analytics, derived from millions of transactions. The second is customer analytics, which covers customer journeys, advocacy, insights, touch points and mixtures of product holdings. The third is market analytics, covering macroeconomics and econometrics.
"We're sitting on a treasure trove of economic and UK data. So by linking up all the businesses we do business with in the UK, which effectively is every business, we are aiming to provide at least the option to businesses to understand the risks better," he said.
These risks might be economic but they could equally be political, where companies are running operations in countries in a state of upheaval.
"The Ukraine is a good example. The moment things started kicking off, we were running analytics: 'What's your exposure, what customers might be affected by Ukraine? In the manufacturing sector, do we have raw materials coming from Ukraine?'," Grogan said.
"These are things we can provide back to businesses. They don't know they've got exposure to Ukraine — the average manufacturer has something like 160 suppliers — but they certainly will after a few weeks or months when the supplies start drying up. That's catastrophic for some businesses."
Grogan said the partnership between RBS and its business customers generates so much quality data that the bank will one day be able to predict gross domestic product accurately.
"The more data I can give you as a customer, surely the more business you're going to do with me because you know that the decisions I'm giving you are more empirical and more correct," he said.
"Banks just want to lend money and get it back. Too often we lend money and don't get it back. So in theory the more we know about our businesses, the more we know we're going to get our money back when we lend it out.
"If we help our customers understand their risks, they become less risky and the bank is fundamentally happier."
Grogan is planning to work with unstructured data using the Hadoop distributed computing framework.
"We haven't moved fully onto Hadoop. We haven't taken advantage of the full Hadoop capabilities of PDW but we purchased in the knowledge that it could do proper, cloud-based Hadoop through Azure," he said.
"We have more than enough data — we have petabytes of data to chuck at this thing both internally and externally. It's really the external digital feeds we could get.
"A lot of those data providers are in the cloud already and we might as well meet them in the cloud. But my technology strategists might get a bit concerned when I start making statements like that."