(Note: In a previous version of this post, I referred to the Financial Industry Regulatory Authority (FINRA) as a "federal" agency. FINRA is actually an independent, not-for-profit organization authorized by Congress to act as a watchdog over more than 4,015 securities firms with approximately 642,980 brokers.)
What does an independent watchdog agency charged with tracking more than 30 billion financial trades a day know about cloud computing? The Financial Industry Regulatory Authority (FINRA) recently moved its mission-critical systems to the cloud, and is sharing details about how it went. At the recent Strata + Hadoop World conference, Jaipaul Agonus, technology director at FINRA, shared five key lessons FINRA learned as it moved into the cloud world.
Two years ago, FINRA started an effort to reinvent the agency's platform from an in-house data center to a cloud-based environment that could support rapid access and viewing of key data. FINRA adopted Amazon Web Services (AWS) to support its new infrastructure, and used a host of open-source technologies to reduce costs.
Here are five key nuggets:
Partition by business function, not by platform. "We had data on appliances, physical storage spaces, and even tapes at FINRA," Agonus says. "Today, all our data can live in Amazon's S3 cloud." Getting there, he explains, doesn't mean replicating the old platforms in the cloud. Instead, data should be partitioned along query boundaries, based on the final use cases and the jobs to be performed on the data.
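To make "partition along query boundaries" concrete, here is a minimal sketch of an object-key layout in which the business function and trade date lead the key, so a job that needs one function's data for one day reads a single prefix. The function name, the example business function, and the `year=/month=/day=` layout are all assumptions for illustration; the talk did not describe FINRA's actual key scheme.

```python
from datetime import date


def partition_key(business_function: str, trade_date: date, filename: str) -> str:
    """Build a storage key partitioned by business function, then by date.

    Hypothetical layout: the leading segments mirror how the data is queried,
    not which legacy platform it came from.
    """
    return (
        f"{business_function}/"
        f"year={trade_date.year:04d}/"
        f"month={trade_date.month:02d}/"
        f"day={trade_date.day:02d}/"
        f"{filename}"
    )


# Example: a made-up business function and file name.
key = partition_key("market-surveillance", date(2015, 10, 1), "part-0000.dat")
print(key)  # market-surveillance/year=2015/month=10/day=01/part-0000.dat
```

A job scoped to one day of one business function would then list only the keys under that day's prefix rather than scanning everything.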
Think clusters, not constraints. With on-premises systems, everything had limitations, and this drove systems planners' thinking, Agonus says. With the cloud, that thinking becomes irrelevant. "Today with endless on-demand compute available, you have to ask a very different set of questions," he says. That includes choosing the right cluster configurations, which requires testing. "For instance, if you think one batch will need a compute-intensive cluster with 10 machines, test your theory first. How did it work? Did it work like you expected?"
Expect constant change in cloud services. "When using legacy systems, you knew the technology you would work with was relatively stable for the life of the device," says Agonus. "Moving to Hadoop, Hive, and Amazon means tossing aside these assumptions. These tools have rapidly changed over the past few years and will continue to." He recommends abstracting execution frameworks, then, "as technologies change or different use cases arise, simply choose the right back end option to process your data."
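One way to read "abstracting execution frameworks" is to have jobs call a small interface and pick the back end by name. The sketch below is an assumption-laden illustration, not FINRA's design: both back ends are plain Python stand-ins (a real one might submit work to Hive or another engine), and `run_job`, `BACKENDS`, and the row shape are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]
Backend = Callable[[Iterable[Row]], int]


def total_in_process(rows: Iterable[Row]) -> int:
    """Stand-in back end: sum quantities in a single pass."""
    return sum(row["quantity"] for row in rows)


def total_in_chunks(rows: Iterable[Row]) -> int:
    """Stand-in for a distributed back end: partial sums per chunk, then combine."""
    materialized: List[Row] = list(rows)
    chunk_size = 2
    partials = [
        sum(row["quantity"] for row in materialized[i:i + chunk_size])
        for i in range(0, len(materialized), chunk_size)
    ]
    return sum(partials)


# Registry of available back ends; adding or swapping one does not touch job code.
BACKENDS: Dict[str, Backend] = {
    "local": total_in_process,
    "chunked": total_in_chunks,
}


def run_job(rows: Iterable[Row], backend: str = "local") -> int:
    """Job code depends only on the back end's name, not its implementation."""
    return BACKENDS[backend](rows)


trades = [{"quantity": 100}, {"quantity": 250}, {"quantity": 50}]
print(run_job(trades, "local"), run_job(trades, "chunked"))
```

Because callers only see `run_job`, a newer engine can be registered under a new name and adopted one job at a time.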
Choose optimal storage formats. FINRA's move to Hadoop meant the team had to decide how data would be stored, unlike the previous legacy approach, in which file storage took place behind the scenes, Agonus explains. Data format decisions need to be shaped by the use cases. "At FINRA, we look for a balance between speed and time as well as the option to split files across multiple nodes. Once again, knowing your use case and needs is critical. From there, it is easier to choose the best file format, compression, and storage for your needs."
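A quick way to ground a compression choice is to measure it on a sample of your own data. The sketch below compares three codecs from the Python standard library on a synthetic payload; the sample rows are invented, and the codecs are only examples, not the formats FINRA chose. It measures size and compression time only; whether a format can be split across nodes is a separate property that has to be checked for the file format and codec in question.

```python
import bz2
import gzip
import lzma
import time
from typing import Dict


def compare_codecs(payload: bytes) -> Dict[str, Dict[str, float]]:
    """Compress the same payload with several codecs; record size and time."""
    codecs = {"gzip": gzip.compress, "bz2": bz2.compress, "lzma": lzma.compress}
    results: Dict[str, Dict[str, float]] = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        compressed = compress(payload)
        elapsed = time.perf_counter() - start
        results[name] = {"bytes": len(compressed), "seconds": elapsed}
    return results


# Synthetic, highly repetitive sample; real data will compress differently.
sample = b"2015-10-01,ACME,BUY,100,12.34\n" * 20_000
for name, stats in compare_codecs(sample).items():
    print(f"{name}: {stats['bytes']} bytes in {stats['seconds']:.4f}s")
```

Running the same comparison on a representative extract of production data gives numbers that reflect the actual use case rather than a benchmark someone else ran.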
Make the shift slowly, and manage potential risks. The challenge is that many of the new-age technologies FINRA is adopting are still relatively immature compared with legacy systems that have been around for years. Errors and bugs are inevitable, so test as much as possible, Agonus says. "The best way to prepare for these risks is to have a mitigation strategy. For instance, during migration we ran our cloud analytics in parallel with our legacy system and ran comparisons for over a year before turning them over to production." The tests helped the FINRA team identify underlying issues. "Only by comparing the data processed in the cloud to our legacy systems were we able to see the issue and ensure that the bugs were fixed by vendors. We've maintained our legacy and cloud infrastructure in parallel during this migration, slowly shifting more and more to the cloud. Our full transition will occur in mid-2016."
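The parallel-run comparison Agonus describes can be sketched as a diff between the two systems' outputs for the same job. This is a minimal illustration under stated assumptions: each output is modeled as a dictionary keyed by a record ID, and `compare_runs` and the sample records are hypothetical, not FINRA's tooling.

```python
from typing import Dict, List


def compare_runs(legacy: Dict[str, int], cloud: Dict[str, int]) -> Dict[str, List[str]]:
    """Report records that are missing from either run or whose values differ."""
    shared = legacy.keys() & cloud.keys()
    return {
        "missing_in_cloud": sorted(legacy.keys() - cloud.keys()),
        "missing_in_legacy": sorted(cloud.keys() - legacy.keys()),
        "mismatched": sorted(k for k in shared if legacy[k] != cloud[k]),
    }


# Invented outputs from the same job run on both systems.
legacy_output = {"alert-1": 3, "alert-2": 7, "alert-3": 1}
cloud_output = {"alert-1": 3, "alert-2": 8, "alert-4": 2}

report = compare_runs(legacy_output, cloud_output)
print(report)
```

An empty report on every category, sustained over many runs, is the kind of evidence that would justify shifting a workload to the new platform.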