IDOL 10.5: How HP is cranking up Autonomy's big data credentials

By stepping up unstructured data tool IDOL's links into HP's database and security products, Hewlett-Packard is aiming to strengthen its hand in big-data analytics.
Written by Toby Wolpe, Contributor

Hewlett-Packard says the next version of Autonomy IDOL links the data-analysis tool more deeply into the company's key big-data technologies.

Version 10.5 of IDOL, which is designed to extract meaning from unstructured data, increases integration with open-source distributed computing software Hadoop, HP's Vertica columnar database, and the company's ArcSight security analytics.

IDOL 10.5, launched this week, is also more robust and manageable than earlier versions, according to HP, with an improved admin console.

"All the things we're doing with IDOL 10.5 are to do with performance and the use of the technology in the cloud," said HP Autonomy chief technology officer Fernando Lucini.

With HP IDOL for Hadoop, functions such as sentiment analysis, clustering and entity extraction can be embedded into Hadoop nodes, allowing businesses to run more advanced analytics on, for example, customer behaviour, security, and operations.

"Customers keep asking, 'I'm building a Hadoop data lake and I'm putting stuff in it and some of it does not fit into a box easily. So it would be great if we could point your software at it and I could consume all the voice and the video and files like that'," Lucini said.

IDOL 10.5 also offers social media and email analytics packs for use with HP's ArcSight. The ArcSight security management tool collects logs from a range of devices, such as servers and printers, and correlates them to find security vulnerabilities.

Lucini said that if he, after years at Autonomy in which he has rarely printed out files, were to print a 1,000-page document on a Friday afternoon, ArcSight could flag that behaviour.

"Why did that happen? It's strange behaviour — or is it? So what ArcSight does is find these aberrant patterns. But if you think about it, that's just about the motion of data. It would be much more interesting for the security guys if you knew what was in that document," Lucini said.

"That's where IDOL comes in. So while ArcSight can do all the 'Fernando is sending 50 percent of his emails to Yahoo and having a conversation with Toby', what's the security threat?

"Most likely the 50 percent via Yahoo sounds like a security threat whereas in real life the conversation with Toby might actually be a bigger threat."

Activity that violates a supervision rule, such as emailing a company on a list of competitors, appears in the same stream with Autonomy analysis beside it.

"So you have the context of data movement and the context of information in one single place. This is not about removing the individual know-how. It's the opposite. It's about giving that person the intelligence so that they can apply and augment that intelligence," he said.

The UDx pack with IDOL 10.5 enables businesses to analyse unstructured data together with transactional and semi-structured machine data in the Vertica database.

Lucini described the Vertica columnar store as a reporting database that specialises in answering any question in any order, as opposed to traditional transactional databases that tend to optimise the information to answer a set of specific questions very efficiently.

"Where IDOL comes into play is people don't only encounter structured information in their world. The information that comes from the cashpoint when you make a withdrawal is very structured and ends up in the database," he said.

"But what about the world around us — the human world — things like tweets or even files?"

UDx is a way of bridging Autonomy and Vertica, extracting sentiment or intent so that it becomes a structured thing that can be put into Vertica for analysis, according to Lucini.

"You have a CRM system where you commentate via email on a product. That ends up in a field, which even though it's a structured database, that information is very unstructured," he said.

"So what can we extract from that? What do you feel about what product? That can be used by the database for more analysis."
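As a rough illustration of the idea Lucini describes, the sketch below turns a free-text CRM comment into a flat row with a sentiment column that a database could then aggregate. It is not IDOL's or Vertica's API: the function name, the column names and the tiny word lists are all hypothetical, and a real system would use a trained model rather than keyword counting.

```python
import re

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"love", "great", "excellent", "fast", "reliable"}
NEGATIVE = {"hate", "slow", "broken", "poor", "terrible"}


def comment_to_row(customer_id, product, comment):
    """Turn a free-text comment into a flat row with sentiment columns."""
    words = re.findall(r"[a-z']+", comment.lower())
    # Score is the count of positive words minus the count of negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return {
        "customer_id": customer_id,
        "product": product,
        "sentiment": label,
        "sentiment_score": score,
    }


row = comment_to_row(42, "printer", "Setup was slow and the tray is broken")
```

Once the comment has been reduced to columns like these, it can be grouped, counted and joined like any other structured data.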

Alternatively, you might have a file storage system containing a million files for Hadoop analysis.

"Databases can't do anything with these files. They're highly unstructured, they're highly human," Lucini said.

"But what if Autonomy takes that file and extracts the metadata such as author name and dates, and then looks at the actual contents itself and extracts from that, say, people and places — so the unstructured details — and hands those over to Vertica?"
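The hand-off Lucini outlines can be pictured with a minimal sketch: file metadata and names pulled from the text are combined into a single flat record. Everything here is an assumption made for illustration, including the function name, the field names and the sample values, and the capitalised-word pattern is a crude stand-in for real entity extraction, not how IDOL works.

```python
import re
from datetime import date


def file_to_record(author, created, text):
    """Combine file metadata with naively extracted names into one flat record."""
    # Naive stand-in for entity extraction: runs of two or more capitalised words.
    candidates = re.findall(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b", text)
    entities = sorted(set(candidates))
    return {
        "author": author,
        "created": created.isoformat(),
        "entities": entities,
    }


# Hypothetical sample file contents and metadata.
record = file_to_record(
    "A. Author",
    date(2014, 1, 15),
    "Jane Smith met the supplier in New York last week.",
)
```

A record of this shape is something a columnar database can store and query, which is the point of extracting structure from the file in the first place.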

Lucini said the new connectors to cloud-based applications and systems in 10.5 are important for the IDOL OnDemand web-service APIs launched in December.

"It's about creating an application environment for developers to create applications using very rich data technologies. 10.5 is critical for IDOL OnDemand — it really is hand in hand."

As well as improvements to IDOL's admin console, 10.5 offers a simpler image server interface, asynchronous query support, improved compaction, and greater flexibility for backups.

Controversy has dogged HP's $11bn (£7bn) takeover of Autonomy in August 2011, with the company writing down the value of the acquisition by $8.8bn in its 2012 fourth-quarter earnings.

It attributed most of that figure — some $5.4bn — to "accounting improprieties". Ex-Autonomy CEO Mike Lynch has denied HP's accusations.

IDOL for Hadoop and the email and social media packs for ArcSight are available now, with the UDx pack for Vertica following in February after a period of early access for certain existing users and partners.
