
Data analytics must be MAD

newsmaker Co-founder and president of Greenplum, Scott Yara, shares his thoughts on challenges in data warehousing and analytics and explains how a new set of principles can change the industry and social media.
Written by Liau Yun Qing, Contributor

Founded in 2003, Greenplum set out to reinvent the database industry by building the world's fastest and most scalable database system running entirely on software.

Taking ideas from parallel computing--where large calculations can be solved quickly by dividing the problem into parts and processing them simultaneously on different computers--Greenplum's founders wanted to create software that could leverage commoditized computing on cheap servers to build fast data warehousing and analysis systems.
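As a rough illustration of the divide-and-process idea described above, the following Python sketch splits a large sum into chunks and hands each chunk to a separate worker before combining the partial results. The function names, the worker count, and the use of a local thread pool in place of separate servers are assumptions made for illustration only; this is not Greenplum's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, num_chunks):
    """Divide a list into num_chunks contiguous pieces of near-equal size."""
    size, remainder = divmod(len(values), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `remainder` chunks each take one extra element.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def parallel_sum(values, num_workers=4):
    """Sum each chunk on its own worker, then combine the partial results."""
    chunks = split_into_chunks(values, num_workers)
    # Assumption: threads stand in for the separate machines of a real cluster.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)
```

Calling `parallel_sum(list(range(1, 101)))` adds the numbers 1 to 100 in four pieces. A real parallel database would distribute both the data and the work across many servers, but the split, process and combine pattern is the same.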

Co-founder and president Scott Yara believes the company has the only true proven product with the capabilities to do this. During a trip to Singapore where he was a speaker at CloudAsia 2010, Yara sat down with ZDNet Asia and revealed the origins of the company's fruity name, as well as his thoughts on the bottlenecks that currently plague the data warehousing and analytics industry.

A second-generation Asian-American born to Japanese parents, he also touched on how advertising has advanced in the age of social media, and explained the relevance of his company's MAD Skills in data analytics.

Q: How did the name Greenplum come about?
Yara: Silicon Valley is synonymous with great engineering but not always with great branding. Luke, my co-founder, and I were trying to come up with names for the company, but we came up with really bad ones.

One night, an employee asked his daughter what we should name the company. She said, "Daddy, you should name it Apple." When he told her that was great but someone already has the name, she said: "Green plum." And it happens that his favorite fruit is green plum, so he came back and told us the name. And we picked it.

For me, names are really what you put into them--your time, your integrity and commitment. They're just vessels in which you put your investments.

What were you involved in before starting Greenplum?
I had only been in one other company in my short career. I was involved in starting SandPiper Networks, which built content delivery networks. We deployed thousands of servers around the world and delivered content from the server closest to the users.

It was a time when the Internet was all the rage. In 1999, we ended up merging with Digital Island, a publicly traded Internet infrastructure services company. Then came the Internet bubble and we ended up selling our company to Cable & Wireless.

I started another company in 2000, which eventually became Greenplum. We raised a lot of capital and came out growing really fast but the Internet crashed right after September 11. We went through a really tough time, especially in the United States. I would say that period was probably worse than the recent recession was for tech companies.

We spent the next couple of years working with a small number of customers to build custom data warehousing systems. It was from this that we saw the challenges in data warehousing and analytics.

I was later introduced to Luke Lonergan, the other co-founder of Greenplum, who was, in his own right, a pioneer in supercomputing. He was running his own company and building really big computers out of little computers for the government.

Luke was thinking about how database systems could deploy the same parallel computing technologies. So we basically put our companies together in 2003 to form Greenplum.

What were the problems you were trying to solve, and where are the bottlenecks in today's data warehousing and analytics industry?
The first 20 years of the database industry were very focused on transaction processing. People were using database systems to build banking systems, human resource systems and customer relationship management (CRM) systems.

Today, what's happening is that all these organizations are interested in analyzing the information collected and finding out more about their customers. But what they find is that they cannot use the same old database systems for analysis, as transaction processing databases don't scale well.

We have customers coming to us and saying that they know data is a very valuable asset but are really still struggling with the technology and the best practices.

Computing itself has become a hundred or a thousand times faster and cheaper in the last 10 years. It doesn't make any sense for the methodologies not to change with the platform.

The Greenplum opportunity was to build a database system that is designed from the ground up to handle very large-scale data analysis, and I think we have the only true proven software solution to do this. Our real competitors in this space are hardware companies that build database computers rather than general-purpose computers; these machines are fast but very expensive and very proprietary.

We're also seeing a wave of commodity computing with cheap servers that have Intel processors and industry-standard parts. And we've seen a lot of successful next-generation Internet companies pave the way to prove that by using the right software, you can have parallel processing that is much faster than the traditional proprietary way of computing.

How has social media impacted your industry segment?
We have clients such as Fox Interactive Media's MySpace, Tagged and Skype that have their own social networks. These companies are collecting huge amounts of data from their social networks using our software.

What we're finding is that companies want data from social networks to find out what people are saying. It's a new area where social network analysis is becoming popular.

We're also seeing social media changing the face of advertising. With social media, there is an incredible amount of information about individual users and this makes it possible for advertisers to hyper-target. (Hyper-targeting ads target users based on their personal information as listed on their profile pages.)

You will need deep data analysis tools to find this niche consumer segment. Our customer, Fox, is making a multimillion-dollar business out of hyper-targeted ads, and can now go to advertisers with the ability to support ads for a very targeted audience.

For example, they'll be able to sell to male customers, aged 19 to 22, who live on the West Coast and recently bought a skateboard. This form of targeting is the holy grail of advertising.
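The skateboard example above amounts to filtering user profiles on several attributes at once. The short Python sketch below shows one way such a segment could be selected; the `UserProfile` fields, the `select_segment` function, and the idea of storing purchases as a simple list are hypothetical and are not taken from MySpace's or Greenplum's systems.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    # Hypothetical profile fields, chosen only to match the example in the text.
    user_id: int
    gender: str
    age: int
    region: str
    recent_purchases: list = field(default_factory=list)


def select_segment(profiles):
    """Return male users aged 19 to 22 on the West Coast who bought a skateboard."""
    return [
        p for p in profiles
        if p.gender == "male"
        and 19 <= p.age <= 22
        and p.region == "West Coast"
        and "skateboard" in p.recent_purchases
    ]
```

At the scale of a large social network, a filter like this would typically run as a query inside the data warehouse rather than in application code, which is where deep data analysis tools come in.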

Advertising had always been a very sectored and affinity-based medium. Advertisers would buy advertising space on related Web sites, hoping that it fits the profile of the general audience. Now we're beginning to get to a point where in almost real-time, you are served an ad that is individually customized for you.

You wrote a blog post on Huffington Post about how "data rules the world". But users are concerned about their privacy, especially with social networks such as Facebook that change their privacy policy whenever they want to. As a consumer, would you still want to have your data out there?
For me, Facebook's privacy issue is largely not a technological problem. There are lots of options to make sure your information stays safe and shared. What needs to emerge is a new set of standards and policies around privacy that companies have an incentive to adhere to.

The creation of data and the sharing of data is almost inevitable. It's sort of like the development of the Internet itself.

When e-commerce started happening on the Internet, people were worried about privacy, identity theft and security. But the value and benefits of using the Internet for e-commerce were so far beyond the risks that it just kept moving forward.

Ten years later we've taken some good steps forward, and though identity theft still happens on the Internet, people still continue to shop online.

I feel that data is the same way. It is inevitable that consumers and individuals will continue to share more information. There's no turning back from that.

Greenplum, as a player in the industry, is interested in helping to create awareness, standards and partnerships to protect the interests of people who want to maintain their privacy.

You were at CloudAsia 2010 to talk about your company's MAD Skills. Can you elaborate on it?
It started out as an academic paper we authored alongside University of California, Berkeley, and our customer MySpace, about some of the techniques we were using for hyper-targeting.

Although MAD Skills was named because the professor wanted "MAD Skills" to appear on the paper, it became a real acronym for a set of principles that are really changing the way people analyze data. It stands for magnetic, agile and deep.

Magnetic means that now, with technologies such as those provided by Greenplum, you should create an environment where you store every piece of information about the customer or data generated by the company. And that's very different from 20 years ago, when companies were very careful about the data they brought in.

Then, there is agility. Because you now have systems that can not only store but also process huge volumes of data very fast, you want to get into a much tighter cycle of innovation around the data.

And the last one is deep. It's about the sophistication of analytics. There's a new class of mathematics to make sense of the meaning of data. We're now seeing a new profession, called the data scientist, that's having a role in the industry today. It requires a blend of computer literacy with a mathematical and statistical background.

These data professionals have backgrounds in physics, bioinformatics and mathematics. And we're seeing these people leave academia to apply their skills in data analysis.

I've spoken to some startups, and for them being acquired seems to be their ultimate goal. What's your view on this?
I consider myself old-school but I really like the history of Silicon Valley. A lot of the early people who came to the valley really wanted to build a technology company.

As for money, I don't think it's a bad thing. There's a lot of money to be made in the technology industry and some people are more focused on exits by selling companies or by taking them public.

For Greenplum, we're really committed to trying to build a company. We have investors who expect returns on capital and we have companies that are interested in buying us.

I don't know what the future holds, but what I do know is that we have a collection of people here who want to build a great company. They come to Greenplum because we are trying to do something special.

There's no guarantee that we will achieve our objectives, but the focus and opportunity are quite clear.
