Fast and furious fraud detection taps big data

Staying on top of fraud requires banks to act fast, and that becomes even harder in the world of big data.

Big data has a role in helping to detect fraud, but sifting through that data can also mean big waits, the last thing a customer wants when transactions are moving to real time. So how do banks use big data while keeping it fast? According to Terracotta CTO Greg Luck, most of them do it by keeping the data in RAM rather than on disk.

According to Luck, the company's technology is behind the majority of payment transactions in the world, including Visa and PayPal, and with big data booming in its own right, it has become increasingly necessary for financial institutions to preserve the "instant transaction" user experience within the tight response times required.

"If you want to try and prevent fraud in real time, then the systems have to be fast enough that the computation can get done ... all within the [service level agreement] for processing a credit card," Luck said, referring to rules that banks check against, to determine if a transaction is legitimate.

In a traditional SQL database, getting the data for these rules could involve any number of complex SQL statements chained together, which slows down processing time. According to Luck, the first step in eliminating this slow-down is to use a key-value store.

"A database is ... slow sometimes because you can create infinitely complex queries that can take a long time to execute," he said.

"For a key-value store, to basically look up a key on the server is just a hash, so it's nanoseconds."
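The lookup Luck describes can be illustrated with a minimal Python sketch. It is not Terracotta's API: the profile fields, card IDs and thresholds are hypothetical, and a plain in-process dictionary stands in for the key-value store. It only shows why fetching the data for a rule becomes a single hash lookup rather than a chain of queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardProfile:
    """Precomputed per-card data a fraud rule might need (hypothetical fields)."""
    home_country: str
    typical_max_amount: float


# In-memory key-value store: card ID -> profile. The entries are made-up examples.
PROFILES = {
    "card-001": CardProfile(home_country="AU", typical_max_amount=500.0),
    "card-002": CardProfile(home_country="US", typical_max_amount=2000.0),
}


def is_suspicious(card_id: str, amount: float, country: str) -> bool:
    """Apply two illustrative rules using a single hash lookup by key."""
    profile = PROFILES.get(card_id)  # one hash lookup, no query planning or joins
    if profile is None:
        return True  # unknown card: flag for review
    return amount > profile.typical_max_amount or country != profile.home_country
```

In this sketch, `is_suspicious("card-001", 100.0, "AU")` returns `False`, while the same card spending 900.0 is flagged. The trade-off is that the data must be arranged by key ahead of time, since a key-value store cannot answer arbitrary ad hoc queries the way SQL can.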

Luck said that the CPU already executes rules and code fast enough, and that the bottleneck is typically the delay introduced by waiting for hard drives to physically spin as they seek information. This delay eats into the number of data accesses that can be made within the response time, effectively limiting what each transaction can be checked against. This, in turn, results in a higher cost for financial services firms, as they can't catch as many fraudulent transactions.

However, by moving the data away from traditional disks and into RAM, Luck said that the number of data accesses could be increased by between one and three orders of magnitude.

"If you can drop that down from days, to hours, to minutes, to seconds ... you can detect and cut [fraudulent transactions] off within seconds, once the pattern is established. That's a very large saving."

Luck said that for a traditional relational database, the best-case scenario might be 10 milliseconds to access data, but that with the data in RAM, access times of 0.25ms to 0.5ms were typical.
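
To put those access times in perspective, the short Python calculation below works out how many sequential data accesses would fit into a fixed response-time budget. The 100ms budget is an assumption chosen purely for illustration, as Luck did not give an actual service level figure; the access times are the ones he quoted.

```python
def lookups_within_budget(budget_us: int, access_us: int) -> int:
    """Number of sequential data accesses that fit inside the time budget."""
    return budget_us // access_us


# Assumed response-time budget of 100 ms, in microseconds (hypothetical figure).
BUDGET_US = 100_000

# Access times quoted in the article, converted to microseconds.
ACCESS_TIMES_US = {
    "relational database (best case, 10ms)": 10_000,
    "RAM (0.5ms)": 500,
    "RAM (0.25ms)": 250,
}

for name, access_us in ACCESS_TIMES_US.items():
    print(f"{name}: {lookups_within_budget(BUDGET_US, access_us)} accesses")
# relational database (best case, 10ms): 10 accesses
# RAM (0.5ms): 200 accesses
# RAM (0.25ms): 400 accesses
```

Under that assumed budget, the quoted figures allow 20 to 40 times as many accesses per transaction when the data is held in RAM, compared with the best case for the relational database.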