Truly, the robots are taking our jobs: An automatic transcription software comparison

ZDNet has compared a number of auto transcription services with bemusing results.
Written by Charlie Osborne, Contributing Writer

Transcription services can save you time and effort and, if you're anything like me, spare you from having to listen to your own voice -- a concept many of us find cringeworthy.

While there are many manual services that let you outsource transcription tasks to someone who types out conversations or interviews by hand, automatic transcription services have recently begun to appear online.

Automatic services often promise results in a fraction of the time that uploading, sending, and waiting for a manual transcription requires, but are they all created equal?

To find out, ZDNet tested six automatic transcription services available online.

For each test, the same audio file was used: a 15-minute recording of an interview I conducted with a researcher about cybersecurity, ransomware, and botnets.

The interview involved me -- a woman with a London accent -- a male American researcher with a soft East Coast accent, and, at the end, a contribution from a female press relations professional, also from the United States.

The interview was recorded in English with very faint background noise. For the test, the audio was submitted in .MP3 format, with no video.

The six contenders

  • Otter: Otter.ai is a note-taking app with a built-in transcription service. The free plan offers up to 600 minutes of transcription per month, while the premium plan, which costs $9.99 per month with an educational discount available, gives users up to 6,000 minutes per month.
  • Speechmatics: Speechmatics promises "speech-to-text in 74 languages, batch, and real-time, cloud and on-premises." A demo sends a transcription example to your inbox, while paid transcription costs £0.06 ($0.08) per minute of audio, purchased in blocks of £10 or £100.
  • IBM Watson's Speech to Text service: IBM's service is available as a live demo or as a service through IBM Cloud.
  • Wreally's Transcribe: The Transcribe service, which has just entered beta as an automatic transcription service, claims up to "90 percent accuracy for well-recorded, clear audio in select languages." Users can try Transcribe on clips of up to one minute for demo purposes (up to 30 minutes in total), and automatic transcription costs 10 cents per minute of audio, sold in $6 packs.
  • Trint: Trint is an automatic transcription service and iPhone app. A free trial is available, pay-as-you-go options are priced at $15 per hour, and subscription services are also available.
  • Temi: Temi touts a service that can convert speech to text within five minutes. A free demo is available, and the paid option is pay-as-you-go at $0.10 per minute.

Below are snippets of the interview in natural, uncorrected language, together with the output from each automatic transcription service. Differences and errors are highlighted in red.

Section one

Human: I've got a couple of questions, the proof of concept was quite interesting but it's quite vague on a few details. So, I wondered if Paul wouldn't mind just walking me through a few of the aspects of the botnet/worm variant.


Section two

Human: I guess to clarify one thing is that we didn't discover a new variant or a new family of malware; we saw their, maybe, their strategy pivot to deploying ransomware, which is what the, I think, that the high-level coolness of what we discovered was...


Section three

H: Um, so we did sort of, um, isolate some countries that they were targeting and it honestly just seems like countries that are well-to-do financially.


Section four

H: It's possible that they could use a different tool I suppose, it's Gandcrab [which] is well-known and studied, um, it's just the fact that they are getting into the ransomware game.

(Note: Gandcrab is a form of ransomware. Considering how specialized the name is, you can forgive auto-transcription services for not getting it quite right.)


Section five

H: But from our perspective, and we only probably see a small percent, we've seen 68,000 unique IPS infected with Phorpiex. Now, we can't necessarily say that they're all going to be tasked with implementing ransomware or that will be successful in the propagation to other internal machines. But from our perspective, yeah, we've seen 68,000.

(Note: 'Phorpiex' transcribed as 'Four Peaks' is reasonable considering its pronunciation.)


Section six

H: Yeah, so it's a well-known weakness in the protocol, there is a bit of confusion. So the protocol is what supports like VNC, which is sort of like a Linux-equivalent version of a remote desktop, but it's just that as a protocol...


Section seven

In this example, the speech is more cluttered, with three participants. As you will see, the results become somewhat garbled.

H: Lovely / thank you both/guys and Charlie we'll get back to you on the Gandcrab version for you / thank you very much / Yeah, no worries / Give me a shout / That's great, have a good day / You too, guys, thank you very much, cheers, bye


As the side-by-side comparisons show, no automatic transcription service is perfect, and the output would need double-checking for anything related to interviews or studies. However, Transcribe and Otter offered the most accurate transcriptions of the audio file.

I remember trying out automatic transcription services a few years ago, and compared with the litany of errors and nonsense I received back then, most of the services tested here show promise; I was impressed. However, we still have some way to go before manual, human transcription services are no longer needed.
