Runway, the AI company known for its popular generative video tool, has unveiled its latest iteration, Runway Gen-3. The new model, which is still in alpha and not publicly available, was showcased through a series of sample videos that appeared to show a significant leap forward in coherence, realism, and prompt adherence when compared to the currently available Gen-2.
The generated videos, particularly those featuring human faces, are highly realistic, so much so that AI art community members quickly compared them favorably to OpenAI’s yet-to-be-released but highly anticipated Sora.
“Even if these are cherry-picked, they already look better than Sora,” one Reddit user wrote in the top-voted comment in the Runway Gen-3 discussion thread. “Sora has a stylized look and feel to it,” another user replied. “These people look actually real, the best I’ve seen so far.”
“If you showed those generated people to me I’d have assumed it was real,” read another comment on the 66,000-member AI Video subreddit.
Image: Runway AI
“These Runway GEN-3 clips really hold a visual appeal to me. They look cinematic,” tweeted pseudonymous AI filmmaker PZF, who also lists himself as a creative partner of Runway, on June 17, 2024. “Smooth, understated (in a good, naturalistic way), believable. Excited to try it out once it becomes available.”
Alongside the Gen-3 video generator, Runway is also introducing a suite of fine-tuning tools, including more flexible image and camera controls.
“Trained jointly on videos and images, Gen-3 Alpha will power Runway’s text-to-video, image-to-video, and text-to-image tools, existing control modes such as Motion Brush, Advanced Camera Controls, and Director Mode, and upcoming tools to enable even more fine-grained control over structure, style, and motion,” the company tweeted on June 17, 2024.
Runway claims that Gen-3 is a significant step toward realizing its ambitious goal of creating “General World Models.” These models would enable an AI system to build an internal representation of an environment and use it to simulate future events within that environment. This approach would set Runway apart from conventional techniques that focus on predicting the next likely frame in a specific timeline.
While Runway has not revealed a specific release date for Gen-3, cofounder and CTO Anastasis Germanidis announced on June 17, 2024, that Gen-3 Alpha “will soon be available in the Runway product.” It will power all the existing modes (text-to-video, image-to-video, and video-to-video), he said, as well as new ones that are “only now possible with a more capable base model.”
Runway’s journey in the AI space began in 2021 when it collaborated with researchers at the University of Munich to build the first version of Stable Diffusion. Stability AI later stepped in to offset the project’s computing costs and turned it into a global phenomenon.
Since then, Runway has been a significant player in the AI video generation space, alongside competitors like Pika Labs. However, the landscape shifted with OpenAI’s announcement of Sora, whose preview clips appeared to surpass the capabilities of existing models. Hollywood actor Ashton Kutcher recently caused a stir when he said tools like Sora could massively disrupt TV and film production.
As the world waits for Sora’s public release, however, new competitors have emerged, such as Kuaishou’s Kling and Luma AI’s Dream Machine.
Kling, a Chinese video generator, can produce videos up to two minutes long in 1080p resolution at 30 frames per second, a substantial improvement over existing models. It is already available, but signing up requires a Chinese phone number. Kuaishou has said it will release a global version.
Dream Machine, on the other hand, is a free-to-use platform that converts written text into dynamic videos and also provides results that easily beat Runway Gen-2 in terms of quality, coherence, and prompt adherence. It requires a basic Google account, but it has been so popular that generations take extremely long to appear—if they appear at all.
In the open-source realm, Stable Video Diffusion, while not capable of producing comparable results, offers a solid foundation for improvement and development. Vidu, another Chinese AI video generator developed by ShengShu Technology and Tsinghua University, uses a proprietary architecture called the Universal Vision Transformer (U-ViT) to generate 16-second videos in 1080p resolution with a single click.
As for Pika Labs, it has not released a major update, leaving its capabilities comparable to Runway Gen-2.
Decrypt reached out to Runway for further information regarding the release date and other details but has not received a response as of this writing.
Edited by Ryan Ozawa.
Source: https://decrypt.co/235842/runway-gen-3-ai-video-better-than-sora