Splitting a song into separate vocals and instruments has always been a headache for producers, DJs, and anyone else who wants to play around with isolated audio. There are lots of ways to do it, but the process can be time-consuming and the results are often imperfect. A new open-source AI tool makes this tricky task faster and easier.
The software is called Spleeter and was developed by music streaming service Deezer for research purposes. Yesterday the company released it as an open-source package, putting the code up on GitHub for anyone to download and use. Just feed Spleeter an audio file and it splits it into two, four, or five separate audio tracks known as stems. The results aren’t perfect, but they are eminently usable, and Spleeter itself is very fast. When running on a dedicated GPU, it can split audio files into four stems 100 times faster than real time.
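To make the stem counts and the speed claim concrete, here is a small Python sketch. It assumes Spleeter's three pretrained configurations produce the stems listed in the project's README (vocals and accompaniment; vocals, drums, bass, and other; and those four plus piano). The `STEM_LAYOUTS` table and the `estimated_seconds` helper are hypothetical illustrations, not part of Spleeter's actual API. The arithmetic is simple: at 100 times real time, a four-minute (240-second) song would take roughly 2.4 seconds, though real throughput depends on hardware and file loading.

```python
# Hypothetical summary of Spleeter's pretrained stem layouts
# (stem names assumed from the project's README; illustrative only).
STEM_LAYOUTS = {
    "2stems": ["vocals", "accompaniment"],
    "4stems": ["vocals", "drums", "bass", "other"],
    "5stems": ["vocals", "drums", "bass", "piano", "other"],
}


def estimated_seconds(track_seconds: float, speedup: float = 100.0) -> float:
    """Hypothetical helper: wall-clock time to process a track at
    `speedup` times real time (ignores model loading and disk I/O)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return track_seconds / speedup


if __name__ == "__main__":
    song = 4 * 60  # a hypothetical four-minute song, in seconds
    for name, stems in STEM_LAYOUTS.items():
        print(f"{name}: {len(stems)} stems -> {', '.join(stems)}")
    print(f"4-stem split at 100x real time: ~{estimated_seconds(song):.1f} s")
```

The speedup is kept as a parameter because the 100x figure applies to four-stem separation on a dedicated GPU; on a CPU the factor would be much lower.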
You can listen to an example of the software working on David Bowie’s “Changes” below. There are a few audio artifacts in both the vocal-only and band-only stems, but the overall results are fantastic. And if Bowie isn’t your thing, here’s another Spleeter example for that timeless ballad of love and loss: “Scatman (Ski-Ba-Bop-Ba-Dop-Bop).”
Technologist Andy Baio wrote an excellent blog post about Spleeter with plenty of his own examples. Baio says the isolated vocals produced by the software “sometimes get a robotic autotuned feel, but the amount of bleed is shockingly low relative to other solutions.” Below is an example Baio generated by running Spleeter on Marvin Gaye’s “I Heard It Through the Grapevine.” (But definitely click through to his original post if you want to hear more isolated vocal tracks from Lil Nas X, Lizzo, Led Zeppelin, and others.)
Baio points out that Spleeter will also be very useful for anyone looking to create mashups, as he demonstrates himself with an unholy union of the Friends theme tune (“I’ll Be There for You” by the Rembrandts) with the lyrics from Billy Joel’s “We Didn’t Start the Fire.”
“nobody should have this kind of power” pic.twitter.com/4vbl2MGK4Z (Andy Baio, @waxpancake, November 5, 2019)
This tool seems extremely capable, but be warned: you’ll need some tech expertise to use it. Unless you regularly work with tools like Python or Google’s AI toolkit TensorFlow (which was used to train Spleeter), you’ll have to download a few programs to get everything up and running. And you’ll have to be comfortable using a command-line interface (albeit a very simple one) instead of a more accessible visual interface.
Deezer notes that this is not the first time people have used machine learning to automate this task, and that the company’s achievements are built on lots of earlier research. Speaking to The Verge over email, Deezer’s chief data and research officer Aurelien Herault said the company trained its software on 20,000 musical tracks with pre-isolated vocals across a range of genres. From these examples, the software learned how to isolate the tracks itself.
Overall, Spleeter is another fantastic example of how AI tools can make fiddly bits of creative work simpler. Machine learning is currently being used to automate a range of time-consuming tasks, from removing backgrounds from pictures to upscaling textures in old video games. And increasingly, these tools are being incorporated into consumer software, from Adobe’s Photoshop to new contenders like Runway ML.
Deezer says it has no plans to turn Spleeter into a consumer tool, but others could take its work and slap a simple interface on it. The obvious applications are for DJs and producers looking to integrate isolated vocals into mixes, or for people looking to create homebrew karaoke backing tracks. (Such activities might not comply with copyright law, depending on how the final product is distributed.)
Deezer itself uses Spleeter for a range of research applications that help improve its streaming service. “Internally, we’re using it as a pre-processing tool for complex research tasks such as music categorization, transcription and language detection,” says Herault.
Or, of course, you can just use it to better get to grips with the Scatman. Ski-bi dibby dib yo da dub dub.