Text-to-image AI is mainstream now, but just waiting in the wings is text-to-video. The pitch for this technology is that you’ll be able to type a description and generate a corresponding video in any style you like. Current capabilities lag behind this dream, but for those tracking the tech’s progress, an announcement today by AI startup Runway of a new AI video generation model is noteworthy nonetheless.
Runway offers a web-based video editor that specializes in AI tools like background removal and pose detection. The company helped develop open-source text-to-image model Stable Diffusion and announced its first AI video editing model, Gen-1, in February.
Gen-1 focused on transforming existing video footage, letting users input a rough 3D animation or shaky smartphone clip and apply an AI-generated overlay. In one example clip, footage of cardboard packaging is paired with an image of an industrial factory to produce a clip that could be used for storyboarding or pitching a more polished feature.
Gen-2, by comparison, seems more focused on generating videos from scratch, though there are lots of caveats to note. First, the demo clips shared by Runway are short, unstable, and certainly not photorealistic, and second, access is limited. Bloomberg News reports that users will have to sign up to join a waitlist for Gen-2 via Runway’s Discord, and a spokesperson for the company, Kelsey Rondenet, told The Verge that Runway will be “providing broad access in the coming weeks.”
In other words, all we have to judge Gen-2 right now is a demo reel and a handful of clips (most of which were already being advertised as part of Gen-1).
Still, the results are fascinating, and the prospect of text-to-video AI is certainly intoxicating, promising both new creative opportunities and new threats, such as misinformation. It’s also worth comparing Runway’s work with text-to-video research shared by behemoths like Meta and Google. The work by these companies is more advanced (their AI-generated clips are longer and more cohesive), but not by a margin that necessarily reflects these firms’ massive resources. (Runway, by comparison, is only a 45-person team.)
The upshot: startups continue to do exciting work in generative AI, including the still-unexplored territory of text-to-video. Watch for more soon, AI-generated or not.