Samsung's acquisition of Viv brings up questions about massive-scale AI

The war for mainstream AI dominance is raging. The latest episode was last week's announcement of Samsung acquiring Viv, a hitherto under-the-radar startup working on building the next Siri. We take the opportunity to ponder questions related to massive-scale AI.
Written by George Anadiotis, Contributor

The "ubiquitous" part of AI may sound visionary or scary, depending on which way you look at it. But it is in any case the end game for Viv, according to Viv's (and previously, Siri's) founder, Dag Kittlaus. That's why Viv has chosen to partner with Samsung, the #1 vendor in mobile devices sold, also present in the market with a number of other devices ranging from TVs to refrigerators.


Intelligence as a utility is Viv's motto, and they aim to make AI as ubiquitous as Wi-Fi or search. (Image: Viv)

But besides being present on as many devices as possible, Viv also intends to build a developer ecosystem and utilize as many data sources as possible, and this is where it gets interesting from the data point of view. One of the things that allegedly makes Viv better than Siri is the ability to utilize more data sources.

Whether Google has beaten Viv to it already is an open question, albeit not one Viv would care to comment on: when asked to elaborate on questions regarding Viv's use of data, Dag Kittlaus responded they're "going to get back to work at this point". And it's a lot of work indeed to try and catch up with Google in what basically is a quest to use the web as a knowledge base.

The web is your oyster

In traditional AI systems, knowledge bases were typically centralized and curated, and assertions could have provenance and trustworthiness assigned to them. That doesn't mean reasoning was trivial, or that there were no consistency issues. But going from that to an open environment like the web, or to an ecosystem where developers will be able to plug in their own facts, is a Cambrian explosion indeed.
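To make the idea of assertions carrying provenance and trustworthiness a bit more concrete, here is a minimal Python sketch. The `Assertion` class, the `resolve` function, the sample sources, and the trust scores are all hypothetical illustrations; nothing here describes how Viv, Google, or any real knowledge base actually stores or reconciles facts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    value: str
    source: str   # provenance: who asserted this (hypothetical names)
    trust: float  # assumed trustworthiness score between 0 and 1


def resolve(assertions: List[Assertion], subject: str, predicate: str) -> Optional[Assertion]:
    """Return the most trusted assertion for (subject, predicate), or None."""
    candidates = [
        a for a in assertions
        if a.subject == subject and a.predicate == predicate
    ]
    if not candidates:
        return None
    # Naive policy: the highest trust score wins. Real systems may weigh
    # recency, corroboration between sources, and much more.
    return max(candidates, key=lambda a: a.trust)


# Two sources disagree about the same fact.
facts = [
    Assertion("Cafe Example", "closes_at", "22:00", source="curated-directory", trust=0.9),
    Assertion("Cafe Example", "closes_at", "23:30", source="third-party-plugin", trust=0.4),
]

best = resolve(facts, "Cafe Example", "closes_at")
print(best.value, "according to", best.source)
```

In a curated, centralized knowledge base such scores can be assigned by hand. Once anyone can plug in facts, deciding who to trust becomes the hard part.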

Google built an empire on its famous PageRank algorithm, which analyzes the links between pages to help rank documents by relevance. But finding the most relevant document and answering questions or getting tasks done are fundamentally different jobs. Google realized this and started building on semantic technology.
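For readers curious about what link-based ranking looks like in practice, below is a simplified sketch of PageRank as it has been publicly described: repeated redistribution of rank along links until the scores settle. The damping factor of 0.85, the iteration count, and the three-page graph are assumptions chosen for illustration. This is not Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# A made-up three-page web: a links to b and c, b links to c, c links to a.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

The result is an ordering of documents, which is exactly the limitation the paragraph above points at: a ranked list of pages is not the same thing as an answer or a completed task.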

Google bought its way into semantics by acquiring Metaweb in 2010 and has been building on it since, to the point where Knowledge Graph has become one of the biggest knowledge bases around and is used not only to power Google's new semantic search, but presumably Google Assistant too.


Looks like a graph, reasons like a graph... must be a graph. (Image: Viv / TechCrunch Disrupt)

Recently we discussed the importance of data integration as one of the technologies that underpin AI. When it comes to reasoning, it seems that everybody goes graph in one way or another. Judging by the visualization shown in Viv's demo, Viv is no exception, although we don't know whether it's using proprietary technology or leveraging existing data, standards, and infrastructure.

Self-programmable, yes, but self data-managing too?

An impressive and potentially differentiating weapon in Viv's arsenal is dynamic program generation -- the ability to self-program on the fly. This adds to its extensibility and reduces the effort developers have to put in to make it work in unforeseen scenarios. In addition, Viv and Samsung claim they will create an "open" developer ecosystem through which developers will have access to Viv's capabilities, and Viv will use their input to learn.

Still, what happens if the data needed to get something done are not there? Is this going to be plug-n-play, or will developers have to go find and add their own data? Will Viv go out and try to find an appropriate data source? How much of this will be manual and how much automated? What kind of familiarity with Viv's data model will developers need, and what kind of control can they expect to have?


The ability to parse spoken words to identify intent and put together a self-contained program to act upon that intent is impressive. (Image: Viv / TechCrunch Disrupt)

These are all questions Viv knows it has to answer, as the fact that they're hiring for senior technical writers implies. In all fairness, other AI assistants such as Assistant, Siri, Cortana, Alexa, or M either do not have public APIs at this point, or the ones they have focus on recognizing voice commands rather than completing tasks, and do not -- necessarily -- claim to be learning in the process.

What kind of data and techniques are used to train Viv? Supervised or unsupervised learning, a combination of these, or something else entirely? And how would multiple versions of the truth be handled then, keeping in mind that if the developer ecosystem is actually successful the number of data sources Viv deals with will explode? It's anyone's guess.

All your data are belong to us?

What about privacy then? How much and what kind of data can users expect Viv and its ilk to collect and process? What will they be able to infer and whom will it be shared with? With Google or Facebook, everyone knows it's an "all your data are belong to us" policy, and Samsung does not seem much different in that respect.

Most users concerned with data collection and privacy issues focus on the obvious -- personal data and queries. But another aspect recently emerged: IoT sensor data. It turns out the Uber app has access to battery level data via mobile phone sensors, and battery level has been correlated with users' willingness to pay surge prices.

Uber was quick to reassure users that the data is not used to determine whether surge pricing will apply. The fact remains, however, that there's nothing stopping Uber from doing so. Given the explosion in mobile sensors and their capabilities, it's just a matter of time before sensor data is collected and utilized.

Granted, sensor data can and will be used to, say, identify and act upon distress situations, but where does that lead? There's a whole array of questions there, and apparently at this point the people who could provide answers are more concerned with their work than with answering those questions.
