Why finance craves big data: A perfect storm of disruption and opportunity

The financial services domain, where real-time is measured in milliseconds, holds particular interest for big data applications and vendors for a number of reasons: architecture, regulation, transparency, decision-making, and the need for speed.
Written by George Anadiotis, Contributor

Recently the TABB Group, a consultancy focused exclusively on capital markets, published a report on Real-Time Big Data Analytics in Financial Services. That, in itself, is a tell-tale sign: big data is central to the financial sector, to the point where a specialized consultancy devotes effort to analyzing and discussing data architectures. The findings of the ITRS-sponsored report solidify this conclusion, as all the firms TABB Group spoke with reported using or testing big data analytics.

It is clear that the financial sector is a big data champion. What is perhaps less clear are the reasons and the ways in which big data is used. Factoring in the findings of another report, the 2016 Big Data Maturity Survey conducted by AtScale in cooperation with the major Hadoop distribution vendors (Cloudera, Hortonworks and MapR), leads to some interesting observations. The survey, featuring results collected from 2,550 respondents working in over 1,400 companies across 77 countries, shows that 73 percent of respondents are now in production with Hadoop (vs. 65 percent last year).


This is in line with the general feeling in the industry: Hadoop is eating the big data world. It is therefore safe to conclude things would be no different for the financial sector. Indeed, looking at Cloudera, Hortonworks, and MapR, we see all of them making specific reference to their offering and clientèle in the financial sector, and boasting success there. This is not paradoxical -- success in this case is not a zero sum game. And it's not a Hadoop-exclusive game either.

Do bears bear? Do bees bee?

The million dollar question though is "why". Why are big data vendors drawn to the financial sector like bees to nectar? The obvious answer: bees and honey -- follow the money. There's money aplenty in the financial sector, which means it can afford to pay for its vices, or needs, depending on how you choose to look at it.


Money does not grow on trees; it grows in financial institutions, and this may help explain all the reciprocal connections with the big data industry.

Cloudera is the latest to publish a press release on its activity in the financial sector. According to its financial services big data evangelist and industry leader, Steve Totman, "financial services organizations have many challenges to overcome as they move to embrace a digital transformation including customer journeys, fragmentation, security, privacy, and data quality. They must also attract a new generation of customers who require these institutions to offer more services over a variety of channels".

Cloudera's press release goes on to identify three key areas of adding business value for financial organizations: customer insights, fraud detection and cybersecurity, and risk and compliance. Use cases cited by Hortonworks include anti-money laundering, customer 360 and customer journey mapping, risk management, regulatory compliance, wealth management, business analytics, and trade surveillance. MapR on its part more or less concurs, verifying that there seems to be a consensus among clients and vendors in the financial sector as to where value lies.

It's clear that big data has forever changed finance. Algorithmic trading served as the Trojan horse that broke into the industry, which now applies big data across a number of use cases. If two-thirds of the general audience view big data as "strategic" or "game changing", that notion would probably soar to near unanimity in the financial sector.

One of the mantras of big data is the "store everything, figure it out later" philosophy. Assuming that all data may be useful at some point is not always the way to go, though, as according to TABB "some data have a 'use by date', as their value perishes over time. In the trading world this usually means sub-second, which is impossible to achieve if one must rely solely on batch-query approaches. It is no wonder that some in financial services see no place for big data in latency-sensitive business areas like front-office trading".

Need for speed

This is where TABB, and ITRS, make their case. The argument is that the "combination of regulations, market conditions and technological innovation is creating a perfect storm of disruption and opportunity, forcing significant change on business models and operating practices of all participants. To meet these challenges, firms must be able to find patterns in incredibly large volumes and variety of data in "real time" before the value of those insights becomes outdated. While in the consumer markets real time can mean a few seconds to a few hours, in financial trading one second is way too long".

The push towards real-time, streaming big data is one of the key trends of the industry. This is echoed not only in the results of the maturity survey, showing rising adoption of Spark and Spark SQL, but also in anecdotal evidence reported by key industry figures such as Cloudera CEO Tom Reilly and data science director Sean Owen, Hortonworks co-founder Alan Gates, and MapR author and big data consultant Ellen Friedman. When touching upon the issue in recent conversations, they all reported empirically that their clients are increasingly and invariably moving towards a paradigm that allows them to access data in real time.

ITRS has been around since 1995 and boasts over 170 leading global clients, including nine of the top 10 investment banks, and has had a platform designed to address the needs of its financial-sector clients in place and operating for years. As ITRS CTO Justo Ruiz Ferrer noted in a recent conversation, real-time has always been a first-class citizen for ITRS.


Would you think a consultancy specializing in finance would have an opinion on data architectures? Signs of the times. Image: TABB Group

The problem, however, was that this platform, like many of its era, could not scale out. Scale-out use cases are 56 percent more likely to yield tangible value, and scaling out is pretty much a requirement for financial-sector clients working with massive amounts of data. Hence, enter Valo. Ferrer was hired a few years back with the mission to redesign and re-implement ITRS' core platform to address the requirements of the big data era.

Ferrer discussed in detail the architectural choices made for the new platform, which has now taken on a life of its own under the Valo brand and is looking to break out of the financial sector to address other markets as well. His team designed and built Valo from scratch, based on three principles: scale-out, distribution, and real-time.

Valo has evolved into an ecosystem of its own, offering much of what the Hadoop ecosystem does and claiming superiority on the grounds of having a truly distributed architecture (no ZooKeeper required) that works utilizing techniques such as semantic routing, and an architecture optimized for true real-time processing, as opposed to Spark's micro-batching.
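As a rough illustration of the distinction (this is not Valo's or Spark's actual code; the event timestamps, payloads, and window size are made up), micro-batching can be thought of as grouping a time-stamped event stream into fixed windows and processing each window as a small batch job, while a per-event engine hands each record to a handler the moment it arrives:

```python
from itertools import groupby

# Invented sample stream of (timestamp, payload) events.
events = [(0.1, "buy"), (0.4, "sell"), (1.2, "buy"), (1.9, "sell"), (2.5, "buy")]

def micro_batches(events, interval=1.0):
    """Group time-ordered events into interval-sized windows (micro-batching)."""
    keyed = [(int(ts // interval), payload) for ts, payload in events]
    return [[p for _, p in group] for _, group in groupby(keyed, key=lambda kv: kv[0])]

def per_event(events, handler):
    """Invoke the handler immediately for each event (per-event streaming)."""
    return [handler(payload) for _, payload in events]

print(micro_batches(events))         # [['buy', 'sell'], ['buy', 'sell'], ['buy']]
print(per_event(events, str.upper))  # ['BUY', 'SELL', 'BUY', 'SELL', 'BUY']
```

In the micro-batch model, nothing in a window is processed until the window closes, so worst-case latency is at least the window interval; the per-event model has no such floor, which is the crux of the real-time argument.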

Does this really matter enough to pose a threat to Hadoop's tried-and-tested ecosystem, massive visibility, and support? When asked to comment, Cloudera's Reilly noted that they are able to offer sub-second responses and are not getting any complaints from their customers, who are using Spark as a real-time big data engine for all intents and purposes, except perhaps high-frequency trading. As for true distribution? In practice, ZooKeeper nodes live in highly protected environments that all but eradicate the single-point-of-failure vulnerability. So it all depends on how fast and fail-safe you really need, or want, to be.

Decisions, decisions

But this is not the only part where things can be somewhat subjective. TABB's report states that "the data architecture supporting trading is complicated; even big players can get it wrong, as evidenced by unprecedented fines on banks worldwide. Hadoop is about three years old; an age for some, but while digital natives have moved on to exploring how to leverage realtime big data for AI applications, most of the financial services world is still just getting their heads around the basics."

There are a couple of oddities here. First, Hadoop has been around for a full decade now. Hadoop 2.0 and YARN are indeed about three years old, and since they enabled real-time processing, this may be the source of the confusion. Second, and perhaps more important, how would data architectures correlate to fines on financial institutions? Especially taking into account the fact, reported in the same paragraph, that "between 2009-15 the top ten US & European banks have been fined ~$150 billion for failures and wrongdoings, such as rigging FX rates & money laundering, wiping out about 14 percent of their equity capital."

This line of reasoning seems to imply that the "wrong" data architecture would be to blame for failures and wrongdoings leading to fines. It would be pretty hard to sell this argument to people on the street, as public opinion polls show that 70 percent of respondents believe "most people on Wall Street would be willing to break the law if they believed they could make a lot of money and get away with it."

But this line of reasoning is probably not meant to address people on the street. It is addressing industry influencers, pundits, and decision makers. And for them, it may make more sense. When discussing this oddity, Ferrer distanced himself from the reasoning, but was also quick to point out that data architecture may actually have something to do with fines, the connecting thread being regulation.


Keeping track of activities in financial institutions is a form of forensics.

At least some of the fines, the argument goes, were imposed because of financial institutions' inability to keep up with regulatory demands, which in turn is associated with "wrong" data architectures. Hortonworks general manager of financial services Vamsi Chemitiganti has also elaborated on how big data "can become critical to efforts to shut down the flow of illicit funds across the globe thus ensuring financial organizations are compliant with efforts to reduce money laundering."

Interestingly, while some accounts of how financial institutions utilize big data are available, the same does not seem to hold for regulators. So we don't really know what their use of big data is, or whether the "right" data architecture would make them more efficient. Unlike opinion on financial institutions, public opinion on regulation is divided. Perhaps shedding more light on regulatory activities, which are clearly data-centric, could help people form a more informed opinion.

Ferrer used the forensics metaphor, pointing out that what regulators do resembles collecting evidence for a crime investigation. Reilly pointed out that big data providers can help by providing the tools for more transparent operation and decision making, but at the end of the day it is humans who make the decisions. Or at least, that has been the case so far: the wave of fintech innovation has recently set its sights on fully automating financial operations. Whether that would be possible, and/or a good thing, is a different discussion.

The article features fragments from interviews with Cloudera CEO Tom Reilly and data science director Sean Owen, Hortonworks co-founder Alan Gates, MapR author and big data consultant Ellen Friedman, and ITRS CTO Justo Ruiz Ferrer. An earlier version of this article that erroneously stated that "76 percent of respondents are using Hadoop today vs. 71 percent last year" has been corrected to indicate that "73 percent of respondents are now in production with Hadoop (vs. 65 percent last year)."
