Descript’s new podcast editor includes an AI voice double for dubbing over mistakes

Descript’s reimagining of audio editing software is now available for everyone

Multimedia editing and transcription provider Descript is today announcing a redesigned version of its audio editing software that’s geared toward podcast producers. The product, officially called Descript Podcast Studio, builds on the forward-thinking approaches to audio editing on which the company, created by Groupon founder and former CEO Andrew Mason, was founded.

Most prominently, that includes the ability to edit an artificial intelligence-generated transcription of your audio file as if you were editing a word processing document. Essentially, Descript turns your audio into text, broken up by who’s speaking, and then lets you manipulate the audio by editing that text version of the script. Delete a sentence or two, and Descript will automatically shorten the file to make the recording sound smooth and natural.
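To make the idea concrete, here is a minimal sketch of how deleting words from a transcript could map to cuts in the audio. It is not Descript's implementation: it assumes a transcript in which each word carries start and end times, and the names `Word`, `kept_ranges`, and `cut_audio` are hypothetical. It also does no smoothing at the cut points.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its position in the audio (hypothetical structure)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def kept_ranges(words, deleted):
    """Return (start, end) time ranges covering the words that survive the edit.

    Runs of adjacent kept words are merged into a single range, so the
    natural pauses between them are preserved.
    """
    ranges = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        previous_kept = i > 0 and (i - 1) not in deleted
        if ranges and previous_kept:
            # Extend the current range to include this word.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            ranges.append((word.start, word.end))
    return ranges


def cut_audio(samples, sample_rate, ranges):
    """Concatenate the samples that fall inside the kept time ranges."""
    out = []
    for start, end in ranges:
        first = int(round(start * sample_rate))
        last = int(round(end * sample_rate))
        out.extend(samples[first:last])
    return out


# Toy example: three words, with the filler word in the middle deleted.
words = [Word("hello", 0.0, 0.5), Word("um", 0.5, 1.0), Word("world", 1.0, 1.5)]
samples = list(range(15))  # stand-in for 1.5 seconds of audio at 10 samples/second
edited = cut_audio(samples, 10, kept_ranges(words, {1}))
```

A real editor would also need to blend the audio at each cut so the join is not audible, which this sketch leaves out.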

The service has been available for roughly two years in a beta-like state since Mason spun it out of his walking tour app Detour, which he created after being unceremoniously shown the door as Groupon’s chief executive. Since then, Descript has worked with professional audio editors at NPR and other outlets to improve the design and feature set ahead of today’s official 1.0 release.

With Descript Podcast Studio, the company’s software now supports simultaneous and collaborative multitrack editing in the style of Google Docs, with changes synced in real time to the cloud. Descript can also just be used as a transcription service, with the company providing pro-grade transcription that includes both AI and human-aided audio-to-text services at 15 cents a minute for free users and 7 cents a minute for those who subscribe to its $10-a-month plan.
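For a rough sense of when the subscription pays for itself on transcription alone, the quoted prices can be compared with a little arithmetic. This sketch assumes flat per-minute rates, a $10 fee charged once per month, and no other plan benefits; the function name `break_even_minutes` is hypothetical.

```python
def break_even_minutes(monthly_fee_cents, free_rate_cents, subscriber_rate_cents):
    """Minutes of transcription per month at which both plans cost the same.

    Working in cents avoids floating-point rounding on the prices.
    """
    saving_per_minute = free_rate_cents - subscriber_rate_cents
    if saving_per_minute <= 0:
        raise ValueError("subscriber rate must be lower than the free-tier rate")
    return monthly_fee_cents / saving_per_minute


# Prices from the article: $10 a month, 15 cents vs. 7 cents per minute.
minutes = break_even_minutes(1000, 15, 7)
```

Under those assumptions, the two plans cost the same at 125 minutes of transcription a month, and the subscription is cheaper beyond that.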

But Descript’s new podcast product will also come with a new AI tool that Mason says can completely overhaul the editing process. It’s called Overdub, and it will allow you to create what Descript is calling an AI voice double, which can be used to overdub flubbed words or phrases and can even generate entirely new sentences on its own, all in your voice. It relies on technology developed by a Montreal-based AI startup called Lyrebird, which Descript says it has acquired and transformed into its AI research division.

Image: Descript

Lyrebird was founded by a trio of PhD students at the MILA research institute in Canada, and the company’s technology can create a convincing replica of your voice by training a series of machine learning algorithms on organic voice data. Descript has turned Lyrebird’s tech into Overdub, which will create your AI voice double by asking you to read out loud a series of randomly generated sentences. Mason says this can only be done for your own voice and only after going through the live data gathering process. That way, it can’t be used to create convincing audio deepfakes of other people.

“We’re very lucky in the same way Apple has a business model that allows them to take a customer-friendly stance on privacy, our business model is such that we can take a socially friendly civically friendly approach to the issue of deepfakes,” Mason tells me. “The reason we wanted to build it was to solve our own problem or the problem anyone has experienced when they want to record something, which is that the process of getting it right is incredibly tedious. Wouldn’t it be great to make editorial corrections to the audio content you’ve recorded as it is to do that with text?”

Mason says Overdub will only be usable if you’re editing audio of your own voice or you have the permission of the owner of the voice to make those kinds of edits to the audio recording. The goal is to make the process of fixing a brief but noticeable stumble or correcting an obvious error much less time-intensive. Mason says the point of Overdub is “saving you a trip back to the recording booth.”

“Audio is the easiest medium of content to create but one of the hardest to edit. Going back in there and recording a new take and splicing it back in so it sounds good is a time-consuming process,” he says. To ensure it’s making its approach to this type of controversial AI-based technology clear, Descript has an ethics policy published on its website outlining how Overdub works and the limitations Mason says will prevent it from being abused.

Overdub is just one standout feature of Descript’s Podcast Studio software. The company’s entire approach, treating audio editing as if it were as easy as editing a text document, means the Descript app is stuffed full of interesting tricks for fast and efficient audio manipulation.

Image: Descript

In true Google Docs style, the collaborative editing tools include commenting and annotation, and for audio nerds, Mason says Descript Podcast Studio will come with a head-spinning number of the editing features found in pricier software like Adobe Audition and Pro Tools. That includes non-destructive editing, crossfading, volume automation, loudness normalization, and track groupings, to name a few. Descript also supports exporting to programs like Audition, Final Cut Pro, and Pro Tools, for those who rely on any of that software for their professional workflow.

Descript is available now for Mac and Windows, and Mason hopes its unique approach to audio editing, combined with the truly next-generation Overdub feature, will ease the editing pain for the scores of new podcast makers entering the scene. “We’ve seen podcasting entering a golden age and more and more people and companies... are creating audio content,” Mason says. “But they still need to get an audio engineer involved to do anything good. This makes it so much more accessible.”