
SpinVox: Are you listening?

Customer trust is not a game of semantics...
Written by Natasha Lomas, Contributor


Using a bit of mystique to protect company secrets is all very well but too many foggy messages risk driving a wedge between you and your customers, warns Natasha Lomas.

An interesting little tale unfolded last week courtesy of a report by the BBC's technology correspondent Rory Cellan-Jones, or 'Rory Katherine Jones' as voicemail-to-text conversion service SpinVox inevitably rebrands him.

And it's SpinVox's service that was worrying Cellan-Jones.

His report claimed the majority of voicemail messages the company processes are actually heard and transcribed by call centre staff in South Africa and the Philippines - rather than being fed into a fancy bit of speech conversion software.

If you've parsed the company's website or come across any of its PR you'll be familiar with the notion of its "Voice Message Conversion System (VMCS)" - also known as 'D2' and/or 'The Brain' - which eats words and spits them back out as text, as seen in this cutesy cartoon from the company website.

[Image: cartoon from the SpinVox website]

Describing how The Brain works SpinVox says it's "a combination of artificial intelligence, voice recognition and natural linguistics".

Which is about as illuminating as saying 'this chicken tastes good because of the colonel's secret recipe of 11 herbs & spices'. But this is business and there are trade secrets so that's just a bit of smoke and mirrors right?

The website also notes The Brain "is able to call on human experts for assistance" - and for 'human experts' read 'call centre operatives'.

So while the public face of the company may be rather coy about calling a spade a spade, it's not guilty of a cover-up either.

This is only a 21st Century Mechanical Turk if you failed to read the small print, right?

Or is it?

SpinVox put out the following denial of Cellan-Jones' report:

"Claims have been made to the BBC, suggesting that the majority of messages have been heard and transcribed by call centre staff in South Africa and the Philippines. These are incorrect," it says in the statement.

Which is somewhat awkwardly worded. Does it mean the majority of messages are not heard and transcribed by any call centre staff, or just not by call centre staff in South Africa and the Philippines?

The statement continues: "Today, SpinVox now requires only a few hundred agents per market as its system is capable of automatically converting all standard messages without learning assistance."

What's a 'standard message' when it's at home? Slang is anything but, that's for sure. And that's even before you feed in distorting factors such as background noise, poor sound quality, broad accents, sweet nothings and the rest of it.

The company goes on to say that all speech technology has to be 'trained' by humans - by which it means people have to listen to a portion of the audio and then check the corresponding transcription to correct any discrepancies.

"We have always been absolutely clear in our communications that humans form an important component of our learning system - they are a key component by which the system learns," the statement adds.

Suddenly we find the role of the human ear in SpinVox's business has gone from that of back-up understudy ("is able to call on") to serious actor ("important component" - and even "key component"). Which is a pretty radical linguistic transformation whichever way you look at it.

I decided to try a direct approach and asked the company what proportion of voicemails are converted by its speech technology, and what proportion require the human ear to be transcribed, and also how many call centres it has, where they are located and how many staff it employs to parse voicemails.

SpinVox told me it would get back to me with an answer in due course. A doubtless oversubscribed company spokeswoman said: "We are in the middle of discussing many of the issues which are being discussed."

Which is a wonderful way of saying it's not sure what it can say at this point as it's still trying to decide what can be said. A disclosure of sorts, I grant.


Then yesterday a statement landed in my inbox. SpinVox "works with" five call centres, it said. But there was no word on where they are located, or the total number of staff employed. The company would only say that the "few hundred agents" that are now required per market have been cut down from "the thousands" per market when the business started.

According to a spokesman, SpinVox operates in eight markets, so if 'a few hundred' means around 300 agents per market, we can conservatively estimate the call centre payroll at somewhere in the region of 2,400 seats.

What about the proportion of calls handled by The Brain vs those going via the lughole? The statement makes some grand claims - saying the tech needs "two per cent of the [human] input" it needed two years ago; and that it can apparently "predict more than 99 per cent of what most people speaking in English or Spanish will say next". Who knew people were so predictable?

But it's hard to shake the feeling that I still haven't got the info I asked for. 'Ninety-nine per cent of what most people say' sounds like a lot, but it can't be measured in any meaningful way: does "most people" mean 80 per cent? Seventy per cent? Or just a bare majority (51 per cent)?

And as for the "two per cent of the input" statement, all we glean from that disclosure is that the system is better than it used to be. For all we know, The Brain's first week of operation saw it gurgling a string of consonants like a newborn babe.

If there's any lesson here it's that it pays to be clear, and not just when you're leaving a voicemail message. Businesses must be as clear and candid as they can without compromising 'commercial sensitivity'. Misunderstandings all too easily breed mistrust: something no business wants to engender in its customers.

SpinVox was fairly clear about human involvement in its system, but perhaps not entirely transparent by the warts-and-all standards of a public that pored over every last dotted i and crossed t of MPs' expenses. And I would still like a straight answer on the proportion of voicemails that need a human ear to give up their secrets.

A quick straw poll of SpinVox users on Twitter suggests they are actually an easygoing bunch who don't mind if their calls are being transcribed the old-fashioned way, by ear, just so long as something intelligible lands in their inbox at the end of the day.

But SpinVox investors and shareholders have a real reason to care: long-term profitability for this start-up from the class of 2003 is likely to hinge on the quality of its technology and the quantity of its call centre staff. So let's hope their ears, at least, are party to what goes on in the boardroom, rather than to transmogrified discussions about discussions.

And for the record, I'd wager the Welsh surname of Cellan-Jones is a reference to a group of Joneses who originally hailed from the mid-Wales town of Cellan, near Lampeter. SpinVox, are you listening?
