We reported yesterday that Ayasdi, a startup based in Palo Alto, California, raised $30.6 million from a star-studded group of investors (General Electric, Citi) for technology that approaches complex datasets in a new way to give companies insights that help them operate more efficiently.
You'll be hard-pressed to find the buzzword "big data" on the company's website, though. The term has been commandeered by companies promising to query massive datasets more quickly, without necessarily changing the fundamental approach. To Ayasdi, that's "big data" in literal terms, but it avoids solving the real problem. It's not the means that's important; it's the end result.
The company says its tech can find answers to questions researchers didn't know they needed to ask. That's why the startup has raised so much money, and why the biggest pillars of the economy—from pharmaceuticals to financial services—are knocking down its door.
We spoke with co-founder and chief executive Gurjeet Singh.
ZD: Tell me a bit about how Ayasdi came to be.
GS: We started this company because we thought it would have a huge impact on everything.
Back in 2000, DARPA [The U.S. Department of Defense's research arm --Ed.] and the NSF [The National Science Foundation, also an agency of the U.S. government --Ed.] realized that they were spending hundreds of millions of dollars on scientific research. But the way that people were doing science had changed. They were creating large, complicated datasets. People who were making the best, most complex datasets were probably not the best people to analyze them. Biologists don't know math and statistics as well as mathematicians and statisticians do.
So they began investing in research for this problem. They had the vision to see that with an ever-increasing volume of data, research was going to become more difficult.
We were lucky in that we were part of a program about "topological data analysis." Over a decade, we published many papers. We proved that we could generate insights from data for many applications. In 2008, we were done with the research and confronted with what to do next. So we started Ayasdi. We felt that in academia, we would not have as much impact.
When we started, we were fortunate—we applied for a DARPA small business research grant, and got it. We were very lucky to get that grant. It took us a full two years before we could take that research and create a prototype. We knew it would take us a long time to build a usable product; in academia, you don't worry about building a product that's usable by a large number of people.
In 2010, we had a very early prototype that we could take to customers and show them how it worked. So we took seed financing from Floodgate. We started selling a product in January 2012 even though we didn't have any businesspeople in the company. Based on early sales, we raised more financing and started building. We launched the company publicly in January.
Since then, our phone has been constantly ringing. There is a lot of demand in the market.
ZD: Did you know early on that there would be a market for your research?
GS: For two years, we did not grow in size. When we had a prototype, one of our advisers—[entrepreneur] Steve Blank; he's popular with the lean startup movement—advised us to go meet with companies for feedback. So that's what we did for three or four months.
We would take any meeting with any company. Everyone could come up with 10 different use cases [for the technology].
ZD: What feedback most surprised you in those early meetings?
GS: We had a very polarizing story: people would totally get it, or totally hate it. "You're saying the machine will find this information automatically? I've been trained to ask questions. OK, it maybe works with your three examples, but it won't work with my complex reality." And other people were, "Oh heck yeah. Have your software tell me what I'm missing."
ZD: So there was a possible education problem.
GS: We felt that there was going to be a huge market, but we knew there would be a huge education need. Any disruptive technology that has become mainstream has had this problem. The guy who said that there will be a PC in every home, for example. If you're going to be a force in the market, you will have skepticism. So that was encouraging to us.
ZD: Many startups solve a narrow problem; your technology can be applied in a number of different industries. As a startup, how do you focus, with so many potential applications?
GS: It's a constant issue. The biggest fear in my mind, when we have such a horizontal technology, is dying from lack of direction. We don't want that. Is the problem worth solving? We ask ourselves if a problem is expensive enough. If the data is expensive, like in oil and gas—collecting a batch costs many, many dollars—we do it.
Secondly, is there ongoing operational use? You have data, use it once, and it's done. If the problem isn't repeatable, we don't go solve it.
We're focusing on five things: the public sector, pharmaceuticals (such as drug discovery), oil and gas (which tracts of land should they buy or bid on, drilling direction, machine maintenance), manufacturing (predictive maintenance) and financial services (regulatory testing and fraud).
All of these customers are using the exact same product, though. We anticipate in the future that we'll have to specialize the user experience, in about a year or so. But not the underlying technology.
ZD: We've established that companies see a lot of potential for your technology. How is it actually going to change things?
GS: I'm going to say something rather controversial. Big data, as people understand it today, is just a bigger version of small data. Fundamentally, what we're doing with data has not changed; there's just more of it. We're not talking about solving a new category of problems. "Big data" has not been effective at solving these problems so far. From a software or customer perspective, that means that someone buys software that processes queries. The input and output systems have not changed. But still the industry has been able to grow by making these systems faster—which is not trivial!—but that's the direction. Faster, more data, cheaper.
We present a fundamental change. Current systems start with a question. To learn something from your data, the forming of a hypothesis lies with the human being, which turns into a query, which becomes a result. The problem is that there are too many queries to make, too many questions to ask. The number of queries in a large dataset is exponential, and it's growing exponentially. No matter how fast you make your system, you're never going to be able to get all that information.
Our system uses advanced mathematics and statistics and machine learning to automatically discover information from your data. It just tells you answers. There are many domains in which you know what you want to know—in a factory, for example, you want to know how many kilograms of detergent you made in the last hour, to know how efficient you are. Or in drug discovery: a very standard dataset will have tens of thousands of rows, and every row is a patient. And there are millions of columns.
There is still a human being in the loop. You just amplify their effort massively.
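[To make the exponential-growth point concrete, here is a minimal back-of-the-envelope sketch. It is not Ayasdi's method. It assumes, purely for illustration, that a "query" means choosing some subset of a dataset's columns to examine together; the function names and the rate of one million queries per second are hypothetical. --Ed.]

```python
from math import comb
from typing import Optional


def count_column_subsets(num_columns: int, max_size: Optional[int] = None) -> int:
    """Count the non-empty subsets of columns a person could choose to examine.

    Assumption for illustration: one "query" = one subset of columns.
    If max_size is given, only subsets with at most that many columns count.
    """
    if max_size is None:
        # Every column is either in or out; subtract the empty subset.
        return 2 ** num_columns - 1
    largest = min(max_size, num_columns)
    return sum(comb(num_columns, k) for k in range(1, largest + 1))


def years_to_exhaust(num_queries: int, queries_per_second: float) -> float:
    """Years needed to run every query at a fixed, hypothetical rate."""
    seconds_per_year = 365 * 24 * 60 * 60
    return num_queries / (queries_per_second * seconds_per_year)


if __name__ == "__main__":
    rate = 1_000_000  # hypothetical: one million queries per second
    for n in (10, 20, 40, 80):
        total = count_column_subsets(n)
        print(f"{n} columns: {total} subsets, "
              f"about {years_to_exhaust(total, rate):.3g} years at {rate} queries/s")
```

[Under this simplified model, each added column doubles the number of possible subsets, so making the system faster only buys a handful of extra columns. --Ed.]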
ZD: How do you present this to potential customers? Given that it's a different approach from what they're expecting.
GS: We don't use the term "big data"—not on our website, not with customers. Saying it sets up expectations, the wrong expectations. It's not about how big the data is, it's about the complexity. Pharma companies will tell you that they've had big data for 20 years now.
A pharmaceutical company is differentiated today by how it uses its data. Not by its biologists or facilities. They love [our technology], they get the idea, they understand the need for it.
One experience we had, we were talking to a pharma company and one of our data scientists was in a meeting and there was a skeptic in the room. He gave our data scientist a dataset that had a team of 10 people studying it for 11 months. Our guy was able to discover everything they had discovered in 11 months, plus more, in 20 minutes, in the meeting with people there. Needless to say, we made that sale.
There are enough people who believe in it, but there's still disbelief.
ZD: To whom in these organizations do you sell?
GS: In pharmaceuticals, there are franchise heads. In oil and gas, it's the VP of operations. In financial services, we sell to a wider variety [of executives]. In general, we don't sell to IT; we sell to business.
One of the fundamental ideas in our company is that the world needs more data scientists. To be one, you have to understand mathematics, computer science and have domain expertise. Our platform amplifies the effort of existing data scientists and converts domain experts into being as effective as data scientists. Most often these people are business people.
ZD: Are data scientists happy about this?
GS: (laughs) They're happy about it. They're perpetually overworked. There's too much to do. And they're not valued in some organizations.
Data scientists have existed for a long time. We're just calling it something different now. The intersection of these three skills existed before. When we hire them, the people who we hire as data scientists would not call themselves that; they'll have a Ph.D. in bioinformatics or something.
ZD: So, about that funding.
GS: The point about lean startups is not to starve yourself. It's to be scientific and figure out where you need to spend before you spend it. We were very lean until we could figure out exactly what we wanted from it.
We've closed very large, very significant deals with customers and have a large pipeline of potential customers. But we don't have enough sales people. So the first order is to scale up our business operations significantly. The second bucket in which we're investing is making the platform better still. From a technology perspective, we're much more of a distributed computing company than a math company, and we are still heavily investing in the core fundamental platform. We keep adding more algorithms to the system, scaling it to handle ever-larger and faster datasets.
ZD: And your biggest problem?
GS: Managing growth.