Sensor data is data analytics' future goldmine

newsmaker Teradata CTO Stephen Brobst predicts that while social media is all the hype now, the bigger data treasure trove to mine will be in sensor data, which will gain prominence in the next five years.

Having served as chief technology officer at Teradata Corporation--a company that specializes in providing enterprise data warehousing, management and analytics solutions--for over a decade, Stephen Brobst can safely be said to know the data industry well.

So when he claims that data generated by social media will not be the largest source of information in the long run, companies should take note. The Teradata CTO predicts that "sensor data", which he describes as information culled from road cameras, satellites and other recording devices that track behavior such as traffic flow, will be the largest information trove for society instead.

Industry predictions aside, Brobst in a face-to-face interview with ZDNet Asia also shared how his role as CTO has evolved over the years, the company's plans to integrate unstructured data into its analysis, and what keeps him up at night.

Q: How has the role of CTO evolved since you joined Teradata Corporation in October 1999?
Brobst: When I came into Teradata, [it was] with a different philosophy of what it means to be a CTO. My goal was to spend at least 50 percent of my work time with the customers and listening to their feedback. CTOs don't usually do this as they [would] much rather spend their time in the laboratories.

In this regard, I don't think much has changed as I continue to set aside that portion of time for my customers. I believe that at Teradata, we're not inherently smart. We're smart only because we have smart customers and we listen to them about their needs, goals and where they want to be. This way, we are better able to work with them.

Does the data explosion of recent years, and the question of how to manage the deluge of information, keep you up at night?
On the contrary, what would leave me sleepless is if the amount of data generated does not increase. After all, Teradata as a company depends on data to survive. So if there is little or no increase in data over the years, we might be out of business.

How is Teradata looking to assimilate unstructured data, which is primarily being generated by social media sites, into a more structured environment in which enterprises can operate?
We have some technology that we are working [on] with our partners such as universities. What these partners do is take the unstructured data and put it through a sentiments extraction program, which makes use of natural language processing tools to derive the "facts" of the content.

By the time the data gets to us, it becomes somewhat structured in that we know what the subjects, verbs, sentiments, etc, are. We then analyze these sets of information in quite interesting ways.

The challenge lies in the fact that the natural language processing technology is quite complex because different languages have different structures. The reality is that there will be more emphasis placed on certain languages depending on the size of the market, and the more important languages today include English, Chinese and Arabic.

When it comes to emerging technologies such as social media analysis, though, we never know how these will pan out. For Teradata, we are constantly looking for best-of-breed services, so we do hedge in that we will work with a variety of partners to figure out which to use. Of course, it's not necessarily the best technology that will succeed but the one that the market chooses.

What aspect of the social media phenomenon excites you, from a data analysis point of view?
I think it is interesting that analysts are only now looking at social media from a structured, numbers-only perspective. They should also be looking at other factors such as how many followers are tracking the tweets of one individual or how much influence that person has over his or her community of friends and family. You have to weigh them by the impact they are having on the marketplace.

To me, this is a combination of social media [and] traditional analysis in that the collation of social chatter is new, but attaching significance to the data follows conventional analytics.

However, we're still only in the early stages of mining such data. Many of the ongoing pilot programs focus on traditional sources of unstructured data such as call center logs and e-mails, as people try to figure out social media.
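
As a rough illustration of the weighting Brobst describes, the sketch below compares a plain average of sentiment scores with one weighted by follower count. It assumes sentiment has already been extracted as a score between -1 and +1, and it uses follower count as a stand-in for influence; the `Post` type, the function names and the sample figures are all hypothetical and do not come from Teradata.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    sentiment: float  # assumed pre-extracted score: -1.0 (negative) to +1.0 (positive)
    followers: int    # hypothetical proxy for the author's influence


def unweighted_sentiment(posts):
    """Plain average: every post counts the same."""
    return sum(p.sentiment for p in posts) / len(posts)


def influence_weighted_sentiment(posts):
    """Average in which each post counts in proportion to its author's followers."""
    total_weight = sum(p.followers for p in posts)
    if total_weight == 0:
        # Nobody has followers, so fall back to the plain average.
        return unweighted_sentiment(posts)
    return sum(p.sentiment * p.followers for p in posts) / total_weight


# Made-up sample: two positive posts from small accounts, one negative post
# from an account with a large following.
posts = [
    Post("a", 1.0, 100),
    Post("b", 1.0, 100),
    Post("c", -1.0, 9800),
]

plain = unweighted_sentiment(posts)
weighted = influence_weighted_sentiment(posts)
print(f"plain average: {plain:.2f}, influence-weighted: {weighted:.2f}")
```

With these invented figures the plain average is mildly positive while the weighted one is strongly negative, which is the gap between counting chatter and weighing it by impact. Follower count is only one possible measure of influence.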

Where do you think the data analytics industry is heading?
I don't think social media will be the biggest store of unstructured data for long. After all, there are only so many monkeys and so many typewriters, so you're limited by how much unstructured data people can churn out.

Sensor data, however, will comprise much more data than social media. If you combine all the e-mails, blog entries, tweets and status updates, that will be nothing compared to what sensor data will collect in five years.

Sensor data takes many different forms, and one example is the data collected by Singapore's Land Transport Authority (LTA) to track traffic flow and behavior across the island. But I must state that sensor data is still quite insignificant today, so I'm predicting the future here.

Within the next three to five years, I expect to see sensor data hit the crossover point with unstructured data generated by social media. From there, the former will dominate by multiples: not just by 10 to 20 percent, but by 10 to 20 times the volume of social media data.
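
The arithmetic behind a "crossover point" can be sketched under a simple assumption: both data sources grow at a constant yearly rate, with sensor data starting smaller but growing faster. The starting volumes and growth rates below are invented purely for illustration; Brobst gives no such figures in the interview.

```python
import math


def years_to_crossover(sensor_now, social_now, sensor_growth, social_growth):
    """Years until sensor volume equals social volume under constant yearly growth.

    Growth rates are fractions per year (1.5 means 150 percent growth).
    """
    if sensor_now >= social_now:
        return 0.0  # sensor data is already at or past the crossover
    if sensor_growth <= social_growth:
        raise ValueError("sensor data never catches up at these growth rates")
    # Solve sensor_now * (1 + gs)^t == social_now * (1 + go)^t for t.
    return math.log(social_now / sensor_now) / math.log(
        (1 + sensor_growth) / (1 + social_growth)
    )


def sensor_to_social_ratio(years, sensor_now, social_now, sensor_growth, social_growth):
    """Sensor volume divided by social volume after the given number of years."""
    sensor = sensor_now * (1 + sensor_growth) ** years
    social = social_now * (1 + social_growth) ** years
    return sensor / social


# Hypothetical inputs: sensor data is one-tenth of social data today,
# growing 150 percent a year against 25 percent for social data.
assumptions = dict(sensor_now=1.0, social_now=10.0, sensor_growth=1.5, social_growth=0.25)

crossover = years_to_crossover(**assumptions)
print(f"crossover after about {crossover:.1f} years")
print(f"ratio at crossover: {sensor_to_social_ratio(crossover, **assumptions):.2f}")
```

Under this model the gap keeps widening after the crossover, because the ratio between the two volumes multiplies by the same factor every year. That compounding is how a source that is insignificant today could end up ahead by multiples and not just by a few percent.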

How can companies then use such data to boost their businesses?
Going back to the LTA example, I know of a Singapore-based insurance company that has gone against the pricing convention of raising premiums to cover both good and bad motorists. Instead, the company is basing its insurance premiums on motorists' road behavior and accident record, which can be tracked via global positioning system (GPS) services.

Motorists can choose not to disclose such data about themselves, of course. But for drivers with excellent safety records, such differentiated insurance products using sensor data will help eliminate the "bad driver tax" imposed on them previously.

Now that you've shared where the industry is headed, what are your top concerns today as CTO of Teradata?
I believe our biggest competitor is not companies such as Oracle or IBM, but us saying we're "good enough" and stagnating.

As the data warehousing industry is maturing, Teradata is becoming less agile and picking up baggage. For our business to be justifiable in this market, we need an analytics application that provides great ROI (return on investment) for customers. But once we have that, we proceed to talk about this same app for the next five years. Nobody will care anymore by then.

So, having the agility to come up with new capabilities and deliver new analytics is my main concern. Furthermore, we need to constantly review our processes and methodologies to stay competitive, as well as to educate the market on the importance of analytics.