
Making this album with AI ‘felt like wandering in an enormous labyrinth’



Shadow Planet is the result of a three-way collaboration between humans and AI


‘Shadow Planet’ was made by Robin Sloan and Jesse Solomon Clark, aka The Cotton Modules, using OpenAI’s Jukebox AI music model.

The scare is over and the fun can begin. That’s how I tend to think of creative endeavors involving artificial intelligence these days. We’ve moved past, I think, hyperbolic claims about AI making human art redundant and can now enjoy all the possibilities this technology affords. In that light, Shadow Planet, a new album made as a three-way collaboration between two humans and AI, shows exactly what sort of fun can be had.

Shadow Planet is the creation of writer Robin Sloan, musician Jesse Solomon Clark, and Jukebox, a machine learning music program made by OpenAI. After an off-the-cuff Instagram conversation between Sloan and Clark about starting a band (named The Cotton Modules), the two began exchanging tapes of music. A seasoned composer, Clark sent seeds of songs to Sloan, who fed them into Jukebox, which is trained on a huge dataset of 1.2 million songs and tries to autocomplete any audio it hears. The AI program, steered by Sloan, then built on Clark’s ideas, and Sloan sent the results back to Clark to develop further.

OpenAI’s Jukebox model is trained on 1.2 million songs to produce its own music

The end result of this three-way trade is Shadow Planet, an atmospheric album in which snippets of folk songs and electronic hooks emerge like moss-covered logs from a fuzzy bog of ambient loops and disintegrating samples. It is a complete album in and of itself: a pocket musical universe to explore.

As Sloan explained to me in an interview over email, the sound of Shadow Planet is in many ways a result of the limitations of Jukebox, which only outputs mono audio at 44.1kHz. “Making this album, I learned that this kind of AI model is absolutely an ‘instrument’ you need to learn to play,” he told me. “It’s basically a tuba! A very… strange… and powerful… tuba…”

It’s this sort of emergent creativity, when machines and humans respond to limitations and advantages in one another’s programming, that makes AI art so interesting. Think, for example, about how the evolution from the harpsichord to the piano affected styles of music, and how the piano’s ability to play loudly or softly (rather than the single fixed dynamic of the harpsichord) engendered new musical genres. This, I think, is what’s happening now with a whole range of AI models that are shaping creative output.

You can read my interview with Sloan below, and find out why working with machine learning felt to him “like wandering in an enormous labyrinth.” And you can listen to Shadow Planet on Spotify, Apple Music, iTunes, Bandcamp, or on Sloan and Clark’s website.

This interview has been lightly edited for clarity.

Hey Robin, thanks for taking the time to speak to me about this album. First of all, could you tell me a little about the material Jesse was sending you to start this collaboration? Were they original songs?

Yes! Jesse is a composer for commercials, films, and physical installations — he wrote the generative soundtrack that runs inside the visitor center at Amazon’s Spheres in Seattle. So he’s well-accustomed to sitting down and producing a bunch of musical options. Each tape I received from him had around a dozen small “songlets” on it, some only 20-30 seconds long, others a few minutes, all different, all separated by a bit of silence. So, my first task was always to listen through, decide what I liked best, and copy that to the computer.

And then you fed those into an AI system. Can you tell me a little bit about that program? What was it and how does it work?

I used OpenAI’s Jukebox model, which they trained on ~1.2 million songs, 600K of them in English; it operates on raw audio samples. That’s a huge part of the appeal for me; I find the MIDI-centric AI systems too… polite? They respect the grid too much! The sample-based systems (which I’ve used before, in different incarnations, including to make music for the audiobook of my last novel) are crunchier and more volatile, so I like them better.

To sample the Jukebox model, I used my own customized code. The technique OpenAI describes in their publication is very much like, “Hey, Jukebox, play me a song that sounds like The Beatles,” but I wanted to be able to “weird it up,” so my sampling code allows me to specify many different artists and genres and interpolate between them, even if they don’t have anything in common.
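Sloan’s customized sampling code isn’t published here, so the following is only a hypothetical Python sketch of what “interpolating between” artists and genres could mean. It assumes each artist or genre label maps to an embedding vector of the same length (the published Jukebox model does condition on artist and genre labels, but the function, labels, and vectors below are illustrative, not Jukebox’s API). Several labels are blended into one conditioning vector by a normalized weighted average:

```python
from typing import Dict, List

Vector = List[float]


def blend_conditioning(embeddings: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Return the weighted average of several label embeddings.

    Hypothetical: assumes every artist/genre label has an embedding of the
    same length, as a stand-in for whatever the real model uses.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    dim = len(embeddings[next(iter(weights))])
    mixed = [0.0] * dim
    for label, weight in weights.items():
        vec = embeddings[label]
        if len(vec) != dim:
            raise ValueError(f"embedding for {label!r} has the wrong length")
        share = weight / total  # normalize so the shares sum to 1
        for i, value in enumerate(vec):
            mixed[i] += share * value
    return mixed


# Toy 2-D embeddings, invented for illustration only.
toy = {"folk": [1.0, 0.0], "ambient": [0.0, 1.0]}
print(blend_conditioning(toy, {"folk": 3.0, "ambient": 1.0}))  # [0.75, 0.25]
```

Sliding the weights from one label toward another (1:0, then 3:1, then 1:1) is one simple way to move between two styles even when they have nothing in common.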


And that’s all just the setup. The sampling process itself is interactive. I’d always start with a “seed” from one of Jesse’s tapes, which would give the model a direction, a vibe to follow. In essence, I’d say to the model: “I’d like something that’s a mix of genre X and Y, sort of like artists A and B, but also, it’s got to follow this introduction: <Jesse’s music plays>”

I’d also, in some cases, specify lyrics. Then, I would go about eight to 10 seconds at a time, generating three options at each step (the computer churns for five to 10 minutes, FUN), then play them back, select one, and continue ahead… or sometimes reject all three and start over. In the end, I’d have a sample between 60 and 90 seconds long, and I’d print that to tape.
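The interactive loop Sloan describes (seed, extend in chunks of roughly eight to 10 seconds, audition three options, keep one or reject them all) can be sketched in Python as a human-in-the-loop search. This is a hypothetical outline, not his code: `generate` stands in for the slow model call, `choose` stands in for the person listening, and the sketch assumes that rejecting every option means restarting from the seed. Audio is modeled as a plain list of mono samples.

```python
from typing import Callable, List, Optional

Audio = List[float]  # mono samples; the real model outputs 44.1 kHz mono


def interactive_sample(
    seed: Audio,
    generate: Callable[[Audio, int, int], List[Audio]],
    choose: Callable[[List[Audio]], Optional[int]],
    sample_rate: int = 44_100,
    chunk_seconds: float = 9.0,
    target_seconds: float = 75.0,
    n_options: int = 3,
    max_steps: int = 100,
) -> Audio:
    """Grow a seed clip chunk by chunk, letting a person pick each continuation.

    generate(context, n_options, chunk_len) -> candidate chunks (hypothetical model call)
    choose(options) -> index of the kept option, or None to reject them all
    Returns early (possibly short of the target) if max_steps runs out.
    """
    chunk_len = int(chunk_seconds * sample_rate)
    target_len = int(target_seconds * sample_rate)
    track = list(seed)
    for _ in range(max_steps):
        if len(track) >= target_len:
            break
        # The slow part: in Sloan's account, each step took minutes of compute.
        options = generate(track, n_options, chunk_len)
        pick = choose(options)
        if pick is None:
            # Assumption: rejecting every option means starting over from the seed.
            track = list(seed)
            continue
        track.extend(options[pick])
    return track
```

With the defaults, a 9-second seed grows to about 81 seconds after eight accepted steps, inside the 60 to 90 second range Sloan mentions; at five to 10 minutes per step, that is roughly 40 to 80 minutes of generation even if no option is ever rejected.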

It was, to be honest, an extremely slow and annoying process, but the results were so interesting and evocative that I was always motivated to keep going!

What did Jesse think about the material you were sending him?

He underscores that working with the material was often VERY difficult. Weird instruments would rise up out of nowhere, or the key would change in a strange way, etc. But I think that was part of the fun, too, and the reason to do this project at all: each sample I sent him was a puzzle to solve.

Ultimately, his work was both responsive — “how do I support this sample, help it shine” — and transformative — “what kind of song should this be?” That’s evident on all the songs, but a clear example is “Magnet Train,” where Jesse took pains to showcase and support the vocal performance (weird and dorky and great) and then extended it with elements that suggest “train-ness” — the chugging percussion, etc.

And how exactly did you home in on this particular sound, do you think? What pushed you in this direction?

Oh, it was definitely the grain of the medium. Early on, I told Jesse that although the model could produce sound at 44.1kHz, it was only in mono. His response was: “Cool! Let’s use mono cassettes then.” And the music he sent back to me was mono, as well. In his final production pass, he added a bit of stereo width, just so the songs weren’t all totally locked in the center, but it’s a pretty “narrow” album generally, and that’s totally because of the AI’s limitation, which we decided to embrace and extend rather than fight. Same goes for the lo-fi, grainy, “radio tuned to a ghost channel” sound — totally an artifact of the way the model produces music, which we amplified further by bouncing the music to tape so many times.

So, in the finished songs that we’re hearing, what proportion of the music is made by AI and what by human? Is it even possible to make that distinction?

It really does vary widely from song to song, and the truth is, in some cases, we lost track! I’d start with a phrase from Jesse, put it through my sampling process, send it back to him, he’d add a layer or extend it, send it back to me, I’d put it BACK through the sampling process… what’s the human / AI breakdown there? It’s all sort of mixed and layered together.

There’s one division that’s clear: anytime you hear anything that sounds like a human voice, whether it’s enunciating lyrics clearly or sort of ooh-ing and ahh-ing, that voice is generated by the AI.


Making this album, I learned that this kind of AI model is absolutely an “instrument” you need to learn to play. And I’ve come to believe that analogy is a lot more useful and generative than like, “AI co-composer” or “automatic AI artist” or whatever other analogy you might have heard or can imagine. It’s basically a tuba! A very… strange… and powerful… tuba…

Haha, right! I’ve spoken to quite a few artists who use machine learning models to make songs or books, and they often talk about the dynamic between them and the AI — whether it was pushing them in a given direction, for example. Did it feel like this for you at all, when you were exploring what music Jukebox could give you?

I love this question, and here’s why: previously, I have been pretty skeptical / critical of the “big [AI] models trained on everything,” even as they’ve risen to prominence. This is a class that includes GPT-3, Jukebox, CLIP, VQGAN, etc. It’s very clear that this approach produces powerful results, but I always thought it was more creatively interesting to take responsibility for your own dataset, understand its composition as a key creative decision, etc. And I still think that’s true, to a degree…



The experience of using Jukebox really turned me around on this. For me, it has felt like wandering in an enormous labyrinth or a dead city: huge, full of alleys and arcades. Even now, having used it for so long, I have no idea what’s still waiting in there, what can be found and carried out. Obviously, I’m betraying the fact that I’ve played too many RPGs here… but truly! That’s the feeling, and it’s VERY fun.

With that in mind, then, what do you think making this album with Jesse taught you about the future of AI and creativity? What do you think these systems will be doing in the future?

AI techniques can do a whole bunch of different things for different kinds of artists, of course, but regarding this specific category, the generative models that can produce new music and new sounds, it seems TOTALLY clear to me that these are on the path to becoming a new kind of synthesizer or electric guitar. I think the story will be broadly similar: they’ll go from research project to novelty (which is where we’re at now) to tools for nascent virtuosos (it’s exciting to think about getting to that point!) to commonplace participants in any / every studio.