Groupon founder’s new app Descript edits audio files directly from text

Photo courtesy of Descript

Andrew Mason, the co-founder and former CEO of Groupon whose most recent company produced audio-guided city walking tours, has moved on to his next startup, and it involves more audio technology.

Mason’s new app, Descript, is designed to let audio editors change an audio file simply by editing its text transcript. The idea, he says, is to give anyone editing audio, whether podcasters, journalists, or musicians, the ability to edit single-track audio clips as easily as they would edit words in a word-processing document. Descript was initially built as an in-house production tool for Detour, Mason’s walking-tour app, and is now being spun out as its own company.

“We’ve just crossed the threshold of where automatic speech recognition is accurate enough that automated transcription services are viable,” Mason said in an interview with The Verge, adding that making audio easier to edit is the obvious next step.

The app relies on text-to-audio alignment. A transcript is first generated from the audio file, and the app then uses machine learning to match each word in the text to the point in the recording where it is spoken. Every word is assigned a time code, so deleting a word in the text editor immediately removes the corresponding stretch of audio.
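The article doesn't describe how Descript implements this, but the core idea of word-level time codes can be illustrated with a minimal sketch. It assumes the alignment step has already produced a start and end time for every word, that words are sorted and non-overlapping, and that audio is a plain list of samples rather than a real audio file. The `Word` type and `delete_words` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass
class Word:
    """One transcript word with its aligned time code, in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def delete_words(
    samples: Sequence[float],
    sample_rate: int,
    words: List[Word],
    delete_idx: Set[int],
) -> Tuple[List[float], List[Word]]:
    """Cut the audio spans of the deleted words and re-time the words that remain.

    Assumes `words` is sorted by start time and the spans do not overlap.
    """
    out_samples: List[float] = []
    out_words: List[Word] = []
    pos = 0       # next sample index not yet copied to the output
    removed = 0   # total samples cut so far; later time codes shift left by this
    for i, w in enumerate(words):
        start = round(w.start * sample_rate)
        end = round(w.end * sample_rate)
        if i in delete_idx:
            # Keep everything up to the word, then skip over the word's span.
            out_samples.extend(samples[pos:start])
            pos = end
            removed += end - start
        else:
            # Surviving words move earlier by the amount of audio already cut.
            shift = removed / sample_rate
            out_words.append(Word(w.text, w.start - shift, w.end - shift))
    out_samples.extend(samples[pos:])  # tail after the last cut
    return out_samples, out_words


# Usage: 3 seconds of fake audio at 10 samples per second; delete the filler "um".
audio = [float(n) for n in range(30)]
transcript = [Word("hello", 0.0, 1.0), Word("um", 1.0, 1.5), Word("world", 1.5, 3.0)]
new_audio, new_words = delete_words(audio, 10, transcript, {1})
```

Re-timing the remaining words is what keeps text and audio in sync: after "um" is cut, "world" moves from 1.5 to 1.0 seconds, so a second edit on the same transcript still lands on the right samples.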

Mason claimed that Descript gets a “surprising number of edits right the first time” but added that there’s also a waveform editor in the app so that users can continue to tweak the audio file or add light effects as needed.

The new Descript app officially rolls out today and is available for download on the web. There will be two versions: a standard one that costs $20 a month (with an initial $10-per-month deal), and a free version that doesn’t include the text-based audio editing tools. Transcription costs $0.07 per minute in the paid version and $0.15 per minute in the free one.
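The two versions trade a monthly fee against a cheaper transcription rate, so heavy users of the paid version recoup part of its cost. A quick break-even calculation using the article’s prices, assuming the free version has no subscription fee, transcription is billed per whole minute, and taxes are ignored (the function names are illustrative):

```python
# Prices from the article, kept in cents to avoid floating-point rounding.
PAID_MONTHLY_CENTS = 2000        # $20/month standard price
PAID_INTRO_MONTHLY_CENTS = 1000  # $10/month introductory deal
PAID_PER_MIN_CENTS = 7           # $0.07/minute transcription, paid version
FREE_PER_MIN_CENTS = 15          # $0.15/minute transcription, free version


def monthly_cost_cents(minutes: int, paid: bool, intro: bool = False) -> int:
    """Total monthly cost in cents for `minutes` of transcription."""
    if paid:
        base = PAID_INTRO_MONTHLY_CENTS if intro else PAID_MONTHLY_CENTS
        return base + PAID_PER_MIN_CENTS * minutes
    return FREE_PER_MIN_CENTS * minutes


def break_even_minutes(intro: bool = False) -> int:
    """Smallest whole number of minutes at which the paid version costs no more than the free one."""
    base = PAID_INTRO_MONTHLY_CENTS if intro else PAID_MONTHLY_CENTS
    saving_per_min = FREE_PER_MIN_CENTS - PAID_PER_MIN_CENTS  # 8 cents per minute
    return -(-base // saving_per_min)  # ceiling division
```

The paid version saves 8 cents a minute, so on transcription cost alone it breaks even at 250 minutes a month (both versions then cost $37.50), or 125 minutes at the $10 introductory price. This ignores the paid version’s main draw, the text-based editing tools.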

Eight staffers have been working full-time on Descript, Mason said. The company has raised $5 million in venture capital funding from Andreessen Horowitz.

Mason also said he is no longer CEO of Detour, which he launched in February 2015. But he said the Detour app will continue and that the company is “hoping to have more news to share on its future.”

The new Descript app is launching at a time when podcast awareness and listening are at their highest point in a decade, according to a survey Edison Research conducted earlier this year. An estimated 168 million people in the US are now familiar with the term “podcasting,” and an estimated 112 million people age 12 and up (40 percent of that population) have listened to podcasts, up from 36 percent in 2016 and just 11 percent in 2006.

But the new audio editing service also arrives amid questions about a digitally altered future, one in which machine learning and artificial intelligence could create entirely fabricated media that looks and sounds realistic, such as a lip-synced video of Barack Obama created by University of Washington researchers. Last year, Adobe revealed that it was working on new software, code-named Project VoCo, that would act like a Photoshop for audio, generating new words using a speaker’s recorded voice.

In response to questions about the ethical implications of advanced audio-editing tools, Mason emphasized that this product is mostly for simple audio pick-ups, but said that he and his team are “thinking about it for down the road, to make sure we’re on the right side of this stuff.”

“Even though we don’t intend to be on the vanguard of the fakery, it’s coming one way or the other. But we’ve been through this before,” Mason said. “Basically what’s happened to photos and print before will happen to audio and video, and society adjusts. The credibility of a piece of content comes down to the credibility of the source.”

Correction: An earlier version of this article incorrectly said Descript was available for download through the Mac App Store. The story has been updated to clarify it is only available on the web.

Comments

"Even though we don’t intend to be on the vanhgard of the fakery…"?

Coincidentally, I think that was an error from my real-time transcription of the interview. (It’s been fixed.)

This is awesome. I love Descript and used it early on. But I don’t get how this lets fake audio be more prevalent than, say, Logic does. Why was that question in there?

It’s the same as with photographs: you could always fake one, but until simple tools came along that made it easy, doing so was a highly complex skill. It used to take me hours to delete a background or get rid of an object from a photo. Now my cat can do it.
As tools like this become freely or cheaply available, watch out for people using them for laughs and for more sinister motives. Logic and other tools are either expensive or require skill and time.
The answer seems simple: such tools should include digital watermarks that reveal their involvement and maybe even a set of metadata listing the edits.
