Taming a world filled with video and audio, using transcription and AI

Trint promises to make the world filled with video and audio more searchable, and make life easier for reporters.

Ask any reporter to name one part of their job they really hate and most will tell you that it's what I'm doing right now — transcription.

You go to great lengths to get the story but then you have to get the golden words that you've just gathered onto the printed page. There is only one way to do that and that's transcription — the tedious task of keying in those words you've worked so hard to get.

There are many companies offering transcription services but the key issue is accuracy. 

Now Trint, a company owned by a veteran correspondent who has covered many foreign wars, claims to have developed a way to get those voice-recorded words straight onto the printed page accurately. Users include some of the biggest media names, such as The New York Times, ABC News, Thomson Reuters, AP, ESPN, and BBC Worldwide.

Ex-reporter turned Trint CEO Jeff Kofman explains how it's done.

ZDNet: How did you first get the idea for Trint?

Kofman: I call myself the accidental entrepreneur. I spent more than three decades as a broadcast journalist — a foreign correspondent, reporting from over 40 countries. 

It was really by accident that I was putting together a global journalism programme and met some developers who had done interesting work in spoken word transcripts. I said that I spent my life transcribing interviews, speeches and conferences but as speech-to-text had been getting better and better, why can't it do the heavy lifting for me? They didn't know who this crazy reporter was, but this interesting idea surfaced.

Kofman at work: "It was one of those light bulb moments. We thought, why hadn't this been done before?"

Photo: Kofman

It was one of those lightbulb moments. We thought, why hasn't this been done before? This was around 2013 and we started Trint in 2014. I wish I could say that I was some great visionary, but I didn't know that the technology was at this great moment when speech-to-text was just getting to the onward and upward stage.

If you had tried this two years earlier, it would have failed. Two years later, you would be following us. If you think of a surfer on the ocean looking for a wave to form, we just got the wave as it was forming.

And I think that happened because I've lived the problem. If you don't live inside the problem then you don't actually know there is a problem. I stumbled into it, but I could see that our original three developers were on their stuff. The results coming back were better than I would have expected.

The idea was that we would align the text — the machine-generated transcript and source audio — to the spoken word and do it accurately to the millisecond, so that you could follow it like karaoke, and then we had to figure out a way to correct it. That's where it got really interesting.

What we did was, we came up with the idea of merging a text editor, like Word, to an audio-video player and creating one tool that had two very distinct functions.
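As a rough illustration of the idea Kofman describes here (not Trint's actual implementation), the sketch below assumes each transcribed word carries start and end times in milliseconds. The names `TimedWord`, `word_index_at` and `seek_position_for` are hypothetical, as is the sample data. It shows how a player position can be mapped to the word to highlight, karaoke-style, and how a clicked word can be mapped back to a playback position.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One transcribed word with its (assumed) timing in the recording."""
    text: str
    start_ms: int  # when the word begins
    end_ms: int    # when the word ends


def word_index_at(words, position_ms):
    """Return the index of the word spoken at position_ms, or None in a pause."""
    starts = [w.start_ms for w in words]
    # Find the last word that starts at or before the playback position.
    i = bisect_right(starts, position_ms) - 1
    if i >= 0 and position_ms < words[i].end_ms:
        return i
    return None


def seek_position_for(words, index):
    """Return the playback position (ms) to jump to when a word is clicked."""
    return words[index].start_ms


# Hypothetical machine-generated transcript with word-level timings.
transcript = [
    TimedWord("I", 0, 180),
    TimedWord("hate", 180, 520),
    TimedWord("transcribing", 560, 1400),
]

# Correcting a word in the editor changes its text but keeps its timing,
# so the corrected transcript stays aligned with the audio.
transcript[1].text = "loathe"
```

In this sketch, the player would call `word_index_at` as playback advances to decide which word to highlight, and the editor would call `seek_position_for` when the user clicks a word. Keeping timing attached to each word, rather than to the text as a whole, is what lets one tool act as both editor and player.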

So that's where it won, and I remember saying to these guys — and I think they thought I was kind of crazy — that this is the future. Either we get together and make this thing happen or we're going to walk into a coffee shop in a couple of years and somebody is going to be working on some software that does exactly what we've just conceived.   

I said, I'm not going to let that happen. This is really saying that the world needs to make the spoken word discoverable. We are a world of video, audio, podcasting, and YouTube. We don't speak by written letters anymore, by text anymore. We speak through audio and video recording. They're not searchable. Trint makes them searchable.

Let's get a timeline here. When did you first get the idea?

Well in early 2014 it started to germinate and we established the company in the fall of '14 and we really started building. I arranged calls to some journalist friends at newspapers, television, radio, online, and I put on my reporter's hat and did this with a dozen or more teams around the world so that the engineers could understand what the problem is. I said, tell me how do you take notes, how do you find the content, the recordings. And everybody said: "Omigod I hate transcribing, it's the worst part of my job. It's always the same — listen, stop, type; listen, stop, type. If you can give us a shortcut to that you will have performed a miracle".

And that's what we do.

When did you get it up and running?

We started building on December 1, 2014. We had the first proof of concept out pretty quickly. By February '15 we did something that actually turned out to be really fortuitous.

It was through the first incubator we were at — a group called IDEALondon, sponsored by Cisco and UCL — and I met a woman there and she agreed to do a day of user experience testing our proof-of-concept.

Trint: The editing screen clearly shows the options for getting the page to look just right.

Photo: Trint

During that testing we were in one room — the four of us — and she was in another one with six journalists who we had lined up, for an hour each and going through a number of tasks. We failed. It was like watching your kid go up on stage and forget her lines in the school play. And what we saw was where we were failing and — it gets quite technical — we were using concepts that were way too complicated for people to understand. We had to make it simpler and easier.

And out of that testing day, at a time when the company was probably three or four months old, I understood what we had to do to fix this, to make it useful.

That's where the product of today was born.

That was the winter of 2015 and we then understood what we needed to do, and we launched commercially in September 2016.

We were already testing with journalists through the summer of 2016. Because of my long career, I had a lot of friends so I was able to say, "Come on and try this".  And it really took off.

In the summer of 2016, we decided to test it on the open market and at this stage we were sending it out free. Then something great happened. A journalist friend of ours tweeted about it and we watched as we went from the 50 or 100 users we had at that point to 200, 500, 1,000, 4,000, and this all happened in a couple of hours.

And it was really exciting and then it got really scary because the system crashed. It wasn't built to scale because we just had no expectation of this. But the one thing it did was validate the concept and it showed that people were really so hungry to leverage AI, to transcribe.

You know, the system had crashed, and we got people emailing us saying what have you done? We were back up within 36 hours and what it told us was that there was real interest in what we were doing.

So, then we just built up to launch in 2016. And people flocked to us because they could see that for not very much money you could save a huge amount of time and get huge efficiencies.

At that point the team was probably six or seven and through '16 and '17 the product got better, and we did a big funding round in May of 2017 [$3.1m] when we were just 10 people, but we are now 41. And we have a global presence, with 36 in the UK and five in Toronto.

Initially we were just one product, but we now have products for small companies and large ones.

Who were the people you needed to bring in to make this work?

This is the odd thing about this journey for me: I know nothing about business. When I say nothing, that's probably disingenuous. I've been doing this for four years.

I tell this story. When I began to search for the money to do this, a very good friend of mine, who's a CFO, very kindly offered to do a financial plan, a very crude one on Excel. I had never touched Excel before. I'm a reporter, why would I ever look at Excel? He talked me through this thing, and I sat there nodding away.

He went away and I changed the number and it went 'Hash tag, hash tag' and the only thing I could do was save it and re-open it. I just wanted to curl up on the table and say send me back to Baghdad. For me, the business side has been a very steep learning curve.
