Cloudera and Mount Sinai: The structure of a Big Data Revolution?

Can a disruptive molecular biologist and the leading company in the Hadoop ecosystem make medical research change its methodologies for the better?
Written by Andrew Brust, Contributor

Even if the bulk of Big Data applications to date have been in the realm of business, the application of the technology to science beckons.  Yes, Hadoop and other Big Data technologies have been revolutionary in the sphere of computer science; but applying them to the realm of natural science has the potential to change lives.

It also has the potential to save them.  It seems almost self-evident that the combination of Big Data and medical research could have a profound effect on the common good.  That's exhilarating, but companies and institutions have to take concrete steps; otherwise we will be mesmerized by Big Data's potential and we won't actually get anything done.

Big Research rock stars?
Enter Cloudera and the Mount Sinai School of Medicine.  On July 3rd, the two organizations announced that they will be teaming up to solve medical challenges with Big Data.  Cloudera is without a doubt the Hadoop ecosystem's poster child.  As a native New Yorker, I can attest to the impeccable reputation of Mount Sinai Medical Center and its School of Medicine.  Put the two together and you have the makings of a dream team.  But this goes beyond the stature of the organizations, because each party in this collaboration is arguably putting its best people on it.

In a telephone interview, Cloudera CEO Mike Olson broke it down for me.  He explained that Cloudera's Chief Scientist and co-founder, Jeff Hammerbacher, would be leading Cloudera's efforts.  Olson explained that Hammerbacher leads Cloudera's data science team and has a strong passion for exploring ways that Hadoop and its stack of technologies can be applied to academic research in pragmatic, results-oriented ways.  Hammerbacher is currently on his honeymoon but upon his return will be spending a full 25% of his time on the Sinai project.  That's quite an investment given that Hammerbacher is the Chief Scientist of what is perhaps Big Data's chief company.

Sinai's best Schadt
Meanwhile, the Mount Sinai side will be carrying its weight, and then some.  At the helm of the team will be Dr. Eric Schadt, who leads Sinai's Institute for Genomics and Multiscale Biology.  Independent of this teaming with Cloudera, Schadt has been an ardent advocate for applying technology to the realm of genomic research.  Schadt's a rather charismatic character, having been the subject of a profile by Tom Junod in Esquire Magazine last year.  Schadt is described in his Mount Sinai bio as "a visionary in the use of computational biology in genomics."

In his Esquire profile, Junod describes Schadt's dissident views regarding established molecular biological research.  Schadt feels that biological systems should be modeled with greater complexity and that today's breakthrough technologies, including Big Data, in combination with the last decade's worth of genomic data, should be applied to make that feasible.

Extraordinary science
Schadt believes molecular research is in crisis, and is undergoing a paradigm shift.  These are terms and concepts introduced in Thomas Kuhn's book The Structure of Scientific Revolutions, published in 1962.  Fifty years later, Schadt, who is greatly influenced by Kuhn, is adamant that research in the understanding and treatment of disease must change radically in order for meaningful progress to be made.  Schadt has degrees in pure math, computer science and applied math, and his PhD is in biomathematics.  But Schadt has also done stints at Big Pharma companies Roche and Merck.  As such, Schadt has an outlook on molecular biology that has industrial as well as academic rigor.

Put all this together, and it certainly feels like there's a real chance of important discovery coming out of this work.  And because Cloudera is involved, the findings won't be relegated to medical journals.  Cloudera's CEO told me that the company intends to be very transparent about the work.  He advised me to keep an eye on the Cloudera blog for reports on it.  My own hope is that such posts will be accessible to non-medical professionals, including ZDNet readers, not to mention myself.

Back to business
Speaking of the non-medical sphere, what will the benefit of this work be to the application of Big Data technology in business?  Olson said Cloudera thinks the collaboration with Sinai will lead to a strengthening of the Hadoop platform, including CDH, Cloudera's open source Hadoop distribution.  Not a CDH user?  That's OK, because any changes made there will find themselves checked in to the core Apache distribution as well.

As such, the potential benefit of the Cloudera-Sinai efforts to Hadoop is significant.  And, just maybe, the contribution to medical science and treatment of disease will be profound.
